HTML Entity Encoder
Encode and decode HTML entities for safe display in web pages.
What HTML entities are
HTML entities are escape codes for characters that would otherwise have special meaning in markup. A bare < would start a tag; written as &lt; it shows up as a literal less-than sign. The same goes for >, &, and the quote characters: if you want them to appear as text instead of being interpreted as HTML syntax, you encode them.
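Two forms exist: named (&amp;, &copy;, &mdash;) and numeric (&#38;, &#169;, or hex &#x2014;). Named entities are easier to read, but only a limited set of characters have names (a few hundred in HTML 4, roughly two thousand in the current HTML standard). Numeric works for any Unicode code point.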
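As a rough illustration of the two forms, the sketch below uses Python's standard html module to decode the named, decimal, and hex spellings of the same character. The variable names are only for illustration.

```python
import html

# Three spellings of the same character: named, decimal, and hexadecimal.
forms = ["&copy;", "&#169;", "&#xA9;"]

# html.unescape resolves all three forms to the same code point.
decoded = [html.unescape(form) for form in forms]

print(decoded)  # ['©', '©', '©']
```

All three decode identically, so the choice between them is about readability of the source, not about what the browser displays.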
This tool encodes plain text into safe HTML, decodes HTML back to plain text, and lists every named entity with its character and code points for reference.
The five characters that always need encoding
- < → &lt;: would otherwise start a tag
- > → &gt;: closes tags
- & → &amp;: starts an entity itself, must be escaped first
- " → &quot;: needed inside double-quoted attributes
- ' → &#39;: needed inside single-quoted attributes (the named &apos; isn't reliable in older HTML)
Almost everything else is either safe to use directly (modern UTF-8 documents) or a stylistic choice.
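A minimal sketch of escaping the five reserved characters by hand, written to show why the ampersand has to go first. The function names escape_html and escape_html_wrong_order are hypothetical; in practice Python's built-in html.escape does the same job (it emits &#x27; for the apostrophe).

```python
def escape_html(text: str) -> str:
    """Escape the five reserved characters; '&' must be handled first."""
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#39;")
    )


def escape_html_wrong_order(text: str) -> str:
    """Buggy variant: escaping '&' last re-escapes the entity just produced."""
    return text.replace("<", "&lt;").replace("&", "&amp;")


print(escape_html('<a href="x">Tom & Jerry</a>'))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

print(escape_html_wrong_order("<"))
# &amp;lt;
```

The second function shows the failure mode: the & introduced by &lt; gets escaped again, so the reader sees the literal text &lt; instead of a less-than sign.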
When to encode
- Code samples in documentation. Any time you display HTML, XML, or any tag-like syntax in a web page, you have to escape the angle brackets or the browser parses them as real tags.
- User-generated content. Comments, forum posts, anything a user typed that ends up rendered. Encoding before rendering is the basic defense against XSS.
- Email templates. Some legacy email clients prefer entities for non-ASCII characters; modern UTF-8 emails handle them fine.
- Strict-XML contexts. XML and XHTML are stricter about ampersands than HTML; an unescaped & in an attribute value will fail validation.
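For the user-generated content case, one possible shape is to escape every untrusted value at the moment it is placed into markup. This is a sketch only: render_comment and the data-author attribute are made up for the example, and a real application would more likely rely on a template engine's auto-escaping.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment, escaping untrusted values as they are rendered."""
    # quote=True (the default) also escapes " and ', which matters in attributes.
    safe_author = html.escape(author, quote=True)
    safe_body = html.escape(body)
    return f'<article data-author="{safe_author}"><p>{safe_body}</p></article>'


fragment = render_comment('eve"x', "<script>alert(1)</script>")
print(fragment)
# <article data-author="eve&quot;x"><p>&lt;script&gt;alert(1)&lt;/script&gt;</p></article>
```

The submitted script tag ends up as visible text, and the stray quote in the author name cannot close the attribute early.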
When you don't need to
Modern HTML pages declared as UTF-8 (which is essentially all of them) can include any printable Unicode character directly in source. &copy; works, but so does ©. &mdash; works, but so does —. Entities are only required for the five reserved characters above, plus anything that the source file's encoding can't represent. For UTF-8 source, that's nothing.
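To make the encoding point concrete, the sketch below assumes a target that can only hold ASCII. Python's built-in xmlcharrefreplace error handler swaps each unrepresentable character for a numeric entity, while UTF-8 stores the same text with nothing to replace. The sample string is arbitrary.

```python
text = "Café © 2024"

# ASCII cannot represent é or ©, so each becomes a numeric character reference.
ascii_safe = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_safe)  # Caf&#233; &#169; 2024

# UTF-8 can represent every character, so the text round-trips unchanged.
utf8_round_trip = text.encode("utf-8").decode("utf-8")
print(utf8_round_trip == text)  # True
```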
People still entity-encode by reflex. It's harmless, just visually noisier than necessary. Internal codebase consistency is the only argument for choosing one style over the other.
Useful named entities to know by sight
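- &nbsp;: non-breaking space (won't line-break)
- &ndash; (–) and &mdash; (—): en-dash and em-dash
- &copy; (©), &reg; (®), &trade; (™): legal symbols
- &ldquo; (“), &rdquo; (”), &lsquo; (‘), &rsquo; (’): smart quotes
- &hellip; (…): ellipsis
- &times; (×) and &divide; (÷): multiplication and division signs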
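If you want to check the code points behind these names, Python's standard library ships a lookup table for the HTML 4 entity set. The sketch below prints the named, hex, and decimal forms for a handful of them; the selection of names is arbitrary.

```python
from html.entities import name2codepoint

names = ["nbsp", "ndash", "mdash", "copy", "hellip", "times"]

# Each row pairs the named entity with its code point and decimal numeric form.
rows = [(f"&{name};", f"U+{name2codepoint[name]:04X}", f"&#{name2codepoint[name]};")
        for name in names]

for named, code_point, numeric in rows:
    print(named, code_point, numeric)
```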
Numeric encoding
For anything without a named entity (emoji, and many accented letters and math symbols), numeric encoding works. &#128075; or &#x1F44B; both render as 👋. The hex form (with x) is usually what you'll see in modern code, because it matches the way Unicode code points are written everywhere else.
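A small sketch of producing both numeric forms for a single character. The helper to_numeric_entity is a hypothetical name, and it assumes its argument is exactly one code point.

```python
import html


def to_numeric_entity(ch: str, hexadecimal: bool = True) -> str:
    """Return the numeric character reference for a single character."""
    code_point = ord(ch)  # raises TypeError if ch is not exactly one character
    return f"&#x{code_point:X};" if hexadecimal else f"&#{code_point};"


wave = "\U0001F44B"  # waving hand emoji

print(to_numeric_entity(wave))                     # &#x1F44B;
print(to_numeric_entity(wave, hexadecimal=False))  # &#128075;

# Both forms decode back to the original character.
print(html.unescape(to_numeric_entity(wave)) == wave)  # True
```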
Security note
Entity encoding is a useful piece of XSS defense, but only when applied at the right layer. Encoding once, when content is rendered into HTML, is correct. Encoding twice (once in storage, again on render) produces visible &amp; garbage. Not encoding at the render boundary because "the data is already clean" is how XSS happens. Pick one boundary and encode there.
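The double-encoding problem is easy to reproduce. In this sketch, html.escape stands in for whatever encoder an application uses, and the sample string is arbitrary.

```python
import html

raw = "Tom & Jerry <3"

once = html.escape(raw)    # encoded at the render boundary: correct
twice = html.escape(once)  # encoded again somewhere else: the bug

print(once)   # Tom &amp; Jerry &lt;3
print(twice)  # Tom &amp;amp; Jerry &amp;lt;3

# A browser decodes one level, so the reader sees the entity text itself.
print(html.unescape(twice))  # Tom &amp; Jerry &lt;3
```

Storing the raw text and escaping only when rendering keeps exactly one level of encoding in the output.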