Open Web Application Security Project: Pushing Left, Like a Boss — Part 5.1 — Input Validation, Output Encoding and Parameterized Queries

As previously published on my blog, SheHacksPurple.

The previous article in this series was Part 4 - Secure coding.

After writing up my secure coding guideline and finding it was over 11 pages, my editor informed me that it was inappropriate to publish as a single blog post. With compromise in mind, and in the hopes that people continue reading my blog, I agreed to break up the guideline into several shorter posts. The first few posts will be in-depth details of several of the items for the guideline, then a final post which will be a short, concise guideline, with links to each of the previous posts for further explanation.

Input Validation, Output Encoding and Parameterized Queries

Input Validation

Any input that you receive, from anywhere, must be validated to ensure that it is what you are expecting. For instance:

Is it the right type of data? For example: date, string, integer, float.
Is it within the appropriate range for size? Is it too long? Too short? Does that day actually exist? (June 31st is not a real day.)
Is the data appropriate? If you are expecting a username, why does it contain characters other than a-z, A-Z, 0–9? If the field is for the date of a future event, why is the date entered in the past? Business logic should be applied here.
Is the data in the correct format? If it’s a call to an API, is the call following the protocol of requested input? Is the XML in the correct format? Is it MM/DD/YY, DD/MM/YY or YYYY/DD/MM?

The most important thing is ensuring that the data you are receiving is *valid*. If it is not valid, reject it, then issue an error to the user. Do not try to sanitize it; that is where many programmers get into trouble. Just tell the user that what they entered was wrong and let them try again.
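As a rough illustration of the checks above, here is a minimal Python sketch that validates a "date of a future event" field. The function name parse_event_date, the YYYY-MM-DD format, and the error messages are assumptions made for this example, not part of the guideline. It checks type, length, format, whether the day exists, and the business rule that the date is in the future. If any check fails, it rejects the input rather than trying to repair it.

```python
from datetime import date, datetime

def parse_event_date(raw, today):
    """Return a valid future date, or raise ValueError. Never repairs input."""
    # Type and size: we expect a 10-character string such as "2030-06-30".
    if not isinstance(raw, str) or len(raw) != 10:
        raise ValueError("Date must be in YYYY-MM-DD format.")
    # Format and existence: strptime rejects other layouts and also
    # rejects days that do not exist, such as June 31st.
    try:
        parsed = datetime.strptime(raw, "%Y-%m-%d").date()
    except ValueError:
        raise ValueError("Date must be a real date in YYYY-MM-DD format.") from None
    # Business logic: a future event cannot be dated today or earlier.
    if parsed <= today:
        raise ValueError("Date must be in the future.")
    return parsed
```

The caller would catch the ValueError and show its message to the user, who can then try again.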

Note #1: all validation of input should be performed on the server-side (definition below), not the client-side. The reason for this is that client-side validation is performed in JavaScript, which can be easily circumvented with a web proxy, such as OWASP ZAP. If you require speed, you can validate on the client-side AND the server-side, but the final decision must always be made server-side.

Client-side versus server-side: Client-side actions happen on the user’s computer, generally in the browser. Client-side actions, such as JavaScript input validation, can often be easily manipulated with a web proxy. Server-side actions are things that happen on the server where your web app is hosted, and hence cannot be changed with the use of a web proxy.

Note #2: A whitelist is always recommended when performing input validation.

Whitelist versus blacklist: A blacklist is a list of “known bad” characters that you do not want to allow (for instance tags that you may think would be part of a script). It is very difficult to get right, and often simple for an attacker to avoid. A whitelist is a list of “known good” characters that you will accept. For instance, when you want someone to create a username, you only allow [a-z, A-Z, 0–9]. If a character is not in the list of “known good”, then it is rejected, plain and simple.

There are many ways for malicious actors to circumvent blacklists, as illustrated in detail in the OWASP SQLi Filter Evasion Cheat Sheet.
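To make the difference concrete, here is a small Python sketch that compares a whitelist check for usernames with a deliberately naive blacklist. The 3 to 20 character length limit and the single blocked string are assumptions chosen for illustration; real blacklists are longer, but they fail in the same way.

```python
import re

# Whitelist: only a-z, A-Z and 0-9 are accepted.
# The 3 to 20 character limit is an assumption for this sketch.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9]{3,20}")

def is_valid_username(candidate):
    # fullmatch requires the entire string to match, so a valid prefix
    # followed by unexpected characters is still rejected.
    return isinstance(candidate, str) and USERNAME_PATTERN.fullmatch(candidate) is not None

# Naive blacklist, shown only for contrast.
BLOCKED_SUBSTRINGS = ["<script>"]

def passes_blacklist(candidate):
    return not any(bad in candidate for bad in BLOCKED_SUBSTRINGS)

attack = "<img src=x onerror=alert(1)>"
print(passes_blacklist(attack))     # True: the blacklist did not anticipate this
print(is_valid_username(attack))    # False: not made of "known good" characters
print(is_valid_username("alice42")) # True
```

The blacklist has to predict every bad input in advance. The whitelist only has to describe what good input looks like.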

Output Encoding

When displaying information to the screen, if it was received from a data source (rather than being part of the labels and other information programmed into the interface of the application), it needs to be output encoded. When something is output encoded, any ‘power’ it has is stripped away, and it is treated only as text. This means that if a script was accidentally passed into the application, API or database, it would be rendered as text, not as a script, when it is output by the program.

This is a perfect example of the layering of security measures in practice, as we covered in “Defense in Depth.” Only valid input should be accepted into the program, then we output encode it just to be sure.

Many programming frameworks, such as .NET Core, apply output encoding automatically.
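If your framework does not do this for you, the idea can be sketched in a few lines of Python using the standard library’s html.escape. The render_comment function and the surrounding paragraph tag are hypothetical, and this sketch only covers data placed in the body of an HTML page. Data placed inside JavaScript, URLs, or CSS needs encoding suited to that context.

```python
import html

def render_comment(comment):
    # html.escape turns &, <, > and both quote characters into HTML
    # entities, so the browser displays them as text instead of
    # interpreting them as markup.
    return "<p>" + html.escape(comment) + "</p>"

print(render_comment("<script>alert(1)</script>"))
# prints: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The script is still stored and displayed exactly as the user typed it, but it has lost its ‘power’ and is shown as plain text.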

Parameterized Queries

When sending queries to the database, it is important that we use parameterized queries (also known as prepared statements) rather than inline SQL or inline queries in any other database language. Inline SQL is the pasting of user input together with database query language, then submitting it directly to the database for execution, which is highly dangerous.

Parameterized queries are safer because user input placed into parameters will either 1) be the correct data type and function normally, or 2) be incorrect and fail. For instance, if you inject a script into a date field, it will cause the query to fail. Parameters also strip any special powers that some characters may have, similar to output encoding: the database treats the value only as data, never as part of the query. This strategy of using parameterized queries (including stored procedures, as long as they do not build SQL from their inputs) is a huge win against any sort of database injection attack.
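Here is a minimal sketch of the difference, using Python’s built-in sqlite3 module and an in-memory database. The users table, the function names, and the attack string are all made up for this example. Other databases and drivers use different placeholder syntax, but the principle is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
conn.execute("INSERT INTO users (name) VALUES (?)", ("bob",))

def find_user_inline(name):
    # UNSAFE, for contrast only: user input is pasted into the query text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user(name):
    # Parameterized: the ? placeholder keeps the input as data only.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_inline(payload))  # returns every row in the table
print(find_user(payload))         # returns no rows: nobody has that name
print(find_user("alice"))         # returns the one matching row
```

With the inline version, the quote characters in the input change the meaning of the query. With the parameterized version, the same input is simply a strange username that does not exist.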

* For those of you who are unaware, injection attacks are the #1 most damaging and dangerous type of web application attack, and are generally considered to be rated as “critical” if found in a live application.

The next article in the “Pushing Left, Like a Boss” series is 5.2 — Use Safe Dependencies.