Regular Expression Injection
What is regular expression injection?
Regular expression injection is a way of exploiting the use of certain regular expressions. It potentially allows an attacker to execute code on the web server. This can lead to compromise of the website and the web server.
What makes a site vulnerable?
Some languages implement a feature that allows a regular expression operation to execute code (usually through the e modifier, as in Perl and in PHP's preg_replace before PHP 7.0). Injection may be possible if the regular expression or its replacement is constructed using user-supplied input, and this input is not properly validated before insertion. In such cases, an attacker may be able to craft the input so that the resulting expression executes malicious code.
Impact of the attack
Regular expression injection potentially allows an attacker to execute arbitrary code on the web server. This code will be in the same language used to write the website code (e.g. PHP). As a result, the attacker will be able to perform any operation on the web server that is permitted to the user account under which the website runs. This may include modifying the website source code, accessing site data, creating arbitrary files on the web server, and executing commands on the underlying operating system.
Example of regular expression injection
Wenz gives an example in PHP of regular expression-based code that changes a user’s password stored unencrypted in an XML file. The main regular expression from the example is (with code to handle whitespace removed for clarity):
$users = preg_replace( "#(<username>$username</username><passwd>)$passwd</passwd>#e", "'\\1'.strtolower('$newpasswd').'</passwd>'", $users);
If the new password is well-behaved (alphanumeric, for example), it will be correctly swapped in for the old password. If, however, the attacker specifies the new password as something like:
'.system('rm index.php').'
then the following code will be executed as part of the replacement expression:
strtolower(''.system('rm index.php').'')
This will result in the index.php file (possibly the site’s home page) being deleted from the current directory. Of course, as the attacker is executing commands directly on the web server, they will probably be able to exercise full control over the website.
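To see why the payload breaks out of the intended string, it can help to look at the replacement expression after PHP's double-quoted string interpolation but before any evaluation. The following is a minimal sketch that only builds and prints that string; it does not call preg_replace and executes nothing. The variable name $replacement is introduced here for illustration.

```php
<?php
// Attacker-controlled value from the example above.
$newpasswd = "'.system('rm index.php').'";

// Double-quoted interpolation inserts $newpasswd into the replacement
// expression before the regular expression engine ever sees it.
$replacement = "'\\1'.strtolower('$newpasswd').'</passwd>'";

// Print the expression that the e modifier would have evaluated as PHP code.
echo $replacement, PHP_EOL;
// → '\1'.strtolower(''.system('rm index.php').'').'</passwd>'
```

The quotes supplied by the attacker close the string literal passed to strtolower, so the call to system becomes part of the code rather than part of the data.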
Side note: Aside from the introduction of a regular expression injection vulnerability, there are two other obvious security problems with this code. The first is the choice to store passwords unencrypted in an XML file (hashing should be used). The second is to force all passwords to be lower case (which will significantly weaken some passwords).
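As a rough illustration of the hashing point, the sketch below uses PHP's built-in password_hash and password_verify functions (available since PHP 5.5). The helper names make_password_record and check_password are hypothetical and not part of Wenz's example; note that the password is not lower-cased, so case remains significant.

```php
<?php
// Hypothetical helper: returns a salted hash suitable for storage
// in place of the plain-text password.
function make_password_record(string $plain): string
{
    return password_hash($plain, PASSWORD_DEFAULT);
}

// Hypothetical helper: checks a login attempt against the stored hash.
function check_password(string $plain, string $storedHash): bool
{
    return password_verify($plain, $storedHash);
}

$hash = make_password_record('Correct-Horse9');
var_dump(check_password('Correct-Horse9', $hash)); // bool(true)
var_dump(check_password('correct-horse9', $hash)); // bool(false)
```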
Preventing regular expression injection attacks
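The easiest way to avoid regular expression injection attacks is to avoid the use of executable code in regular expressions altogether. In PHP, the e modifier was deprecated in version 5.5 and removed in version 7.0; preg_replace_callback provides the same flexibility without evaluating a string as code. In addition, user input should be properly validated before use (for example, ensuring that the new password contains only safe characters such as letters, numbers and a few choice symbols).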
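Both measures can be combined in one possible rewrite of the password-change step, sketched below. The function name change_password, the allowed character set, and the length limit are assumptions made for illustration and are not taken from Wenz's code. The sketch uses preg_replace_callback instead of the e modifier, escapes the user-supplied parts of the pattern with preg_quote, and omits the strtolower call criticised in the side note.

```php
<?php
// Hypothetical helper: replaces one user's password in the XML string.
function change_password(string $users, string $username, string $passwd, string $newpasswd): string
{
    // Allow-list validation; the exact character set is an assumption.
    if (!preg_match('/\A[A-Za-z0-9_.!-]{1,64}\z/', $newpasswd)) {
        throw new InvalidArgumentException('New password contains unsupported characters.');
    }

    // preg_quote() stops user input from being read as regex syntax.
    $pattern = '#(<username>' . preg_quote($username, '#') . '</username><passwd>)'
             . preg_quote($passwd, '#') . '</passwd>#';

    // The callback receives the match as data; no string is evaluated as code.
    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($newpasswd): string {
            return $m[1] . $newpasswd . '</passwd>';
        },
        $users
    );

    // preg_replace_callback() returns null on error; keep the input in that case.
    return $result ?? $users;
}

$xml = '<username>alice</username><passwd>old</passwd>';
echo change_password($xml, 'alice', 'old', 'New-Pass1'), PHP_EOL;
// → <username>alice</username><passwd>New-Pass1</passwd>
```

With this version, the payload from the example is rejected by the validation step, and even without that step it would be inserted as plain text rather than executed.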