Quick Tip: Detect and Encode Curly Brackets in URL Validation

Posted on Oct 26 2009 in Web Development

Validating user input is always a good idea for both usability and security. URLs are complex, but they have to adhere to a very strict pattern. From a data perspective, this is great news, since we can validate for what we want instead of trying to detect what we don't.

However, a lot of modern URLs don't do a great job of following RFC 1738. Specifically, I'm looking at you .NET guys who insist on putting UUIDs wrapped in curly brackets in query strings and the like. RFC 1738 lists curly brackets among the "unsafe" characters, which should be percent-encoded within a URL ("{" as %7B and "}" as %7D).

So, technically, curly brackets are fine in URLs (if encoded). But when a user pastes a URL with raw curly brackets into your site and you pass it through your (probably regex-based) validation routine, validation is likely to fail because the curly brackets aren't allowed. Now, sure, you could pass the entire URL through something like urlencode() or rawurlencode() -- but that encodes everything, including the colon and slashes the URL needs! What I do is simple: I just replace the curly brackets and then continue on my merry way:

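// Percent-encode only the curly brackets; leave the rest of the URL untouched.
$replace = array('{' => '%7B', '}' => '%7D');
$newval = strtr($user_url, $replace);
// my_url_validation_func() and my_error_function() stand in for your own routines.
if (!my_url_validation_func($newval)) {
  my_error_function("Hey, you need a valid URL!");
}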
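To make the difference concrete, here is a minimal sketch comparing rawurlencode() on the whole URL with the targeted replacement. The helper name encode_curly_brackets() and the example URL are made up for illustration, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of PHP or Drupal): percent-encode only
// the curly brackets and leave every other character alone.
function encode_curly_brackets(string $url): string
{
    return strtr($url, array('{' => '%7B', '}' => '%7D'));
}

// Made-up example: a brace-wrapped GUID in the query string.
$user_url = 'http://example.com/page.aspx?id={3F2504E0-4F89-11D3-9A0C-0305E82C3301}';

// rawurlencode() also escapes ":", "/", "?" and "=", so the result
// is no longer usable as a URL.
$everything = rawurlencode($user_url);

// The targeted replacement touches only the brackets.
$targeted = encode_curly_brackets($user_url);

echo $everything, "\n";
echo $targeted, "\n";
```

The second line of output is still recognizably the same URL, which is what a validation routine needs to see.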

If you're saving the URL, you should probably save the encoded version. You don't have to, though, as long as you encode the curly brackets on output to stay within RFC 1738 (although most modern browsers are fine if you don't).
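Either choice works because the replacement is safe to apply more than once: the encoded form contains no curly brackets, so encoding again on output changes nothing. A small sketch of that property follows, assuming a hypothetical helper named encode_curly_brackets() and a made-up URL (type hints assume PHP 7 or later).

```php
<?php
// Hypothetical helper: percent-encode only the curly brackets.
function encode_curly_brackets(string $url): string
{
    return strtr($url, array('{' => '%7B', '}' => '%7D'));
}

// Made-up example input, as the user pasted it.
$raw = 'http://example.com/?id={abc}';

// Option 1: store the encoded version.
$stored_encoded = encode_curly_brackets($raw);

// Option 2: store the raw version and encode on output.
$output_from_raw = encode_curly_brackets($raw);

// Encoding an already-encoded value on output is a no-op,
// so both options produce the same string.
$output_from_encoded = encode_curly_brackets($stored_encoded);

var_dump($output_from_raw === $output_from_encoded);
```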

I've found this especially useful while doing URL validation in Drupal, as its built-in (and contributed) URL validation routines all seem to adhere pretty closely to RFC 1738.