Understanding URL Encoding

A look at how special characters get represented in web addresses.

URL encoding — you'll also hear it called percent-encoding — is how characters get represented in a web address when they can't appear there directly. If you've ever noticed a %20 where a space should be, that's it in action. It gives you a standard way to slip in characters that either have special meaning or aren't allowed in a URL at all. There's a URL encoding tool on this site if you'd like to try it as you read.

Why encoding is needed

A URL can only hold a limited set of characters. A few of them — the question mark, the ampersand, the slash — carry specific meaning and are used to separate the parts of the address. Others, like spaces, aren't permitted at all. So when you need to include data that might contain any of these, it has to be encoded first. Otherwise those characters would collide with the structure of the URL itself.
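To see that collision concretely, here is a small sketch using Python's standard urllib.parse module. The value "salt & pepper" is made-up data for illustration, and the variable names are hypothetical.

```python
from urllib.parse import parse_qs, quote

value = "salt & pepper"  # hypothetical data to pass as a query value

# Unencoded: the ampersand is read as a separator, so the value is cut short.
raw_query = "q=" + value
broken = parse_qs(raw_query)

# Encoded: the ampersand travels as %26 and is decoded back into the value.
safe_query = "q=" + quote(value, safe="")
intact = parse_qs(safe_query)

print(broken["q"])  # ['salt ']
print(intact["q"])  # ['salt & pepper']
```

In the unencoded case the parser has no way to tell that the ampersand was meant as data, so everything after it is lost from the value.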

How percent-encoding works

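The idea is simple: a character gets replaced by a percent sign followed by two hexadecimal digits that give its byte value. A space, for instance, becomes %20, because its byte value is 32, which is 20 in hexadecimal (form submissions sometimes write a space as + instead). A character outside ASCII is first converted to bytes, typically as UTF-8, and each byte is encoded on its own, so é becomes %C3%A9. This only happens to characters that are reserved or disallowed. Letters, digits, and a handful of other characters are considered safe and pass through untouched. What you end up with is a string made entirely of permitted characters that still represents the original data exactly.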
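The mechanism is small enough to write by hand. The sketch below is illustrative only: the function name percent_encode is made up, it assumes the text is converted to UTF-8 bytes first, and it treats only letters, digits, hyphen, period, underscore, and tilde as safe. In real code you would normally reach for a library function such as Python's urllib.parse.quote.

```python
# Characters assumed safe to leave as-is in this sketch.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Encode every byte outside UNRESERVED as % plus two hex digits."""
    out = []
    for byte in text.encode("utf-8"):  # assumption: UTF-8 byte encoding
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)  # safe character passes through untouched
        else:
            out.append(f"%{byte:02X}")  # e.g. a space (byte 32) -> %20
    return "".join(out)


print(percent_encode("a b"))       # a%20b
print(percent_encode("50% off?"))  # 50%25%20off%3F
print(percent_encode("café"))      # caf%C3%A9
```

Note that the percent sign itself is encoded as %25, which is what keeps the result unambiguous when it is decoded again.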

Reserved and unreserved characters

You'll often see characters in a URL split into two groups: reserved and unreserved. Reserved characters do a job, like separating the components of the address, so they need to be encoded whenever you mean them as plain data rather than as separators. Unreserved characters (the letters, the digits, and four symbols: hyphen, period, underscore, and tilde) can sit in a URL as-is. Once that distinction clicks, it's much clearer why some characters get encoded and others don't.
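A quick way to check the two groups is to run them through an encoder and see what changes. This sketch uses Python's urllib.parse.quote with safe="" so that nothing is exempted beyond the encoder's built-in safe set. The two character lists are written out here as assumptions, following the usual reserved and unreserved sets from the URI standard.

```python
import string
from urllib.parse import quote

# Assumed character groups, written out by hand for this illustration.
unreserved = string.ascii_letters + string.digits + "-._~"
reserved = ":/?#[]@!$&'()*+,;="

# Unreserved characters come back unchanged.
unchanged = quote(unreserved, safe="") == unreserved

# Each reserved character is replaced by its percent-encoded form.
encoded = {ch: quote(ch, safe="") for ch in reserved}

print(unchanged)     # True
print(encoded["?"])  # %3F
print(encoded["&"])  # %26
print(encoded["/"])  # %2F
```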

Encoding components versus whole URLs

More often than not, you only want to encode part of a URL (a value being passed within it, say) rather than the whole thing. Encoding an entire URL blindly would mangle the very characters that give it its structure. That's why many programming environments hand you separate functions for the two jobs: JavaScript, for example, has encodeURI for a full URL and encodeURIComponent for an individual component. Picking the right one comes down to what you're actually encoding.
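Here is a rough illustration of the difference in Python, where urllib.parse.quote and its safe parameter play the role of component encoding. The address example.com/search and the search term are invented for the example.

```python
from urllib.parse import parse_qs, quote, urlsplit

base = "https://example.com/search"  # hypothetical address
term = "rock & roll/jazz?"           # hypothetical value to pass along

# Encoding the whole URL blindly also rewrites the separators.
mangled = quote(f"{base}?q={term}", safe="")

# Encoding only the component leaves the URL's structure intact.
url = f"{base}?q={quote(term, safe='')}"

print(mangled)  # https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Drock%20%26%20roll%2Fjazz%3F
print(url)      # https://example.com/search?q=rock%20%26%20roll%2Fjazz%3F

# The receiving side can split the URL and recover the original value.
recovered = parse_qs(urlsplit(url).query)["q"][0]
print(recovered == term)  # True
```

The first result is no longer usable as an address, since the scheme, slashes, and question mark have all been turned into data. The second keeps them in place and encodes only the value.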

Summary

URL encoding swaps reserved or disallowed characters for a percent sign and two hexadecimal digits, which lets data ride along in a web address without breaking its structure. Unreserved characters don't need any of this. And it's worth being deliberate about whether you're encoding a whole URL or just a component, so you don't accidentally rewrite the parts that carry special meaning.
