HTML Entity Encoder & Decoder - Online XSS Protection Tool

Safely process user input by converting sensitive characters into HTML entities. Prevent XSS attacks and solve character encoding issues in legacy systems.

Usage Tips

  • Basic Encoding: Converts < > " ' & into HTML entities to prevent XSS injection.
  • Deep Encoding: Converts all non-ASCII characters (Chinese, symbols, emojis) into numeric entities for legacy system compatibility.
  • Click "Live Preview" to see how the browser renders the decoded entities without executing scripts.
  • All conversions happen locally in your browser; no data is ever sent to a server.

About the HTML Entity Encoder & Decoder

What is HTML Entity Encoding?

HTML entity encoding is the process of replacing reserved or special characters with their corresponding character entity references. This tool provides two modes for different security and compatibility problems. Basic Mode targets the five characters covered by the predefined XML entities (the less-than sign, greater-than sign, ampersand, double quote, and single quote) and replaces them with their named equivalents. Applied at output time, this is an effective defense against Cross-Site Scripting (XSS) when the text is placed in HTML element content or inside a quoted attribute value; other contexts, such as inline JavaScript or URLs, need their own escaping rules. Deep Encoding Mode extends this by converting every non-ASCII character, including multilingual text like Chinese characters, typographic punctuation, currency symbols, and emojis, into decimal numeric character references. The result keeps content intact when it must pass through legacy protocols, databases with limited character sets, or older email gateways that might otherwise corrupt Unicode text.
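
The Basic Mode replacement described above can be sketched in a few lines of JavaScript. This is an illustration only: the name encodeBasic and the lookup table are assumptions, not the tool's actual source.

```javascript
// Hypothetical helper; the tool's real implementation is not shown on this page.
const BASIC_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&apos;",
};

// Replace each of the five reserved characters in a single pass, so an
// ampersand produced by one replacement is never encoded a second time.
function encodeBasic(text) {
  return text.replace(/[&<>"']/g, (ch) => BASIC_ENTITIES[ch]);
}

console.log(encodeBasic(`<a href="x">Tom & Jerry's</a>`));
// → &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;
```

Doing the substitution in one pass avoids the classic ordering bug where "<" becomes "&lt;" and its ampersand is then encoded again.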

Why Escape Characters? Understanding XSS and Encoding Issues

Cross-Site Scripting (XSS) remains one of the most prevalent web security threats. Attackers inject malicious browser-side scripts, often wrapped in standard HTML tags, into web applications. When an application fails to encode user-generated content on output, a simple snippet like <script> can become an executable payload. Converting the building blocks of HTML syntax (angle brackets, ampersands, and quotation marks) into their inert entity representations makes the browser treat the string as plain text for display rather than as markup to execute. This neutralization is the cornerstone of output encoding strategies.

Beyond security, character encoding mismatches are a persistent source of garbled text. Some COBOL-based financial systems, older mainframe databases, and SMTP servers still assume a 7-bit ASCII or other single-byte environment. Sending raw UTF-8 into such a pipeline can produce question marks, hollow boxes, or indecipherable mojibake. The tool's deep encoding function avoids this by translating "中文" into a pure ASCII sequence, &#20013;&#25991;, which contains no bytes above 127 for a 7-bit transport to corrupt.
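
The "中文" example can be reproduced with a short sketch. The function name encodeNonAscii is hypothetical, and the regular expression is one possible approach rather than the tool's confirmed implementation.

```javascript
// Hypothetical helper: convert every character above code point 127 into a
// decimal numeric character reference. The "u" flag makes the character
// class match whole code points, so emojis are not split into two halves.
function encodeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/gu, (ch) => `&#${ch.codePointAt(0)};`);
}

const encoded = encodeNonAscii("中文 OK");
console.log(encoded); // → &#20013;&#25991; OK

// The output contains only 7-bit characters.
console.log(/^[\x00-\x7F]*$/.test(encoded)); // → true
```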

How to Use the HTML Entity Converter

Using this online HTML entity encoder is straightforward. Begin by typing or pasting your raw text into the input field above. If you need to sanitize user input for display in a web page, leave the "Deep Encoding" toggle unchecked and click the "Basic Encoding" button. The tool replaces the characters that hold structural meaning in HTML. If you are preparing data for a legacy archive or a system that rejects non-Latin alphabets, activate the "Deep Encoding" checkbox first. This tells the processor to convert every character outside the ASCII range (code points above 127) into a decimal numeric entity reference. After the conversion runs, the encoded output appears in the result panel. You can validate the transformation by clicking the "Live Preview" button, which uses the browser's native HTML parser to render the encoded string visually, confirming that it displays correctly without executing any hidden script tags. Finally, use the "Copy Result" button to copy the encoded string for your production environment.

A Glossary of Entity Types

Predefined XML Entities

&lt; for <
&gt; for >
&amp; for &
&quot; for "
&apos; for '

Decimal Numeric Entities

&#65; renders A
&#20320; renders 你
&#128514; renders 😂

Hexadecimal Entities

&#x3C; renders <
&#x4E2D; renders 中
Some legacy parsers only recognize decimal format, which is what this tool uses for maximum compatibility.
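
The decimal and hexadecimal forms in this glossary are two spellings of the same Unicode code point. As a small sketch, with hypothetical helper names, both can be derived from a character like this:

```javascript
// Hypothetical helpers: both read the first code point of the given string.
function toDecimalEntity(ch) {
  return `&#${ch.codePointAt(0)};`;
}

function toHexEntity(ch) {
  return `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`;
}

console.log(toDecimalEntity("A"));  // → &#65;
console.log(toHexEntity("<"));      // → &#x3C;
console.log(toDecimalEntity("中")); // → &#20013;
console.log(toHexEntity("中"));     // → &#x4E2D;
```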

Frequently Asked Questions

What is the difference between basic and deep encoding in this tool?

Basic encoding targets a minimal set of five critical characters that have syntactic meaning in HTML: the less-than sign, greater-than sign, ampersand, double quote, and single quote. This mode is lightweight and sufficient for preventing XSS attacks in standard HTML element content. Deep encoding, conversely, is a much more aggressive transformation. It scans the entire input string and converts any character whose code point is above 127 into a decimal numeric entity reference. This includes all accented letters, logographic scripts like Chinese and Japanese, mathematical symbols, and emojis. The benefit of deep encoding is not necessarily security, but rather universal transportability. When deep encoding is applied, the resulting output is a pure 7-bit ASCII string that can pass through virtually any legacy middleware, email server, or database connection without corruption.

Is it safe to use the live preview feature?

Yes, provided the string being previewed has actually been encoded. When you activate the preview, the tool assigns the encoded entity string to a div element through the browser's built-in innerHTML property, entirely on the client side. The HTML specification requires that script elements inserted via innerHTML are not executed, so a stray script tag in the preview stays inert. That rule covers script elements only: a plain div is not a sandbox, and other markup assigned through innerHTML, such as an img tag with an onerror handler, can still run code. Properly encoded output contains no raw angle brackets, so the parser creates only text nodes, and the preview shows exactly how the text and special symbols will appear to an end user before you deploy it.
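
Because innerHTML still activates inline event handlers such as onerror even though it skips script elements, a cautious preview can refuse to parse anything that still contains a raw "<". The sketch below is one possible guard, not the tool's actual code; renderPreview and its return values are hypothetical, and container stands for any element-like object.

```javascript
// Hypothetical guard: only hand the string to the HTML parser when it
// contains no raw "<", which means the parser cannot create any elements.
// Otherwise fall back to textContent, which never parses markup.
function renderPreview(container, encoded) {
  if (encoded.includes("<")) {
    container.textContent = encoded;
    return "text";
  }
  container.innerHTML = encoded;
  return "html";
}

// In a browser this would be called as, for example:
//   renderPreview(document.getElementById("preview"), encodedString);
```

With this check, fully encoded output is rendered through the parser so its entities display as characters, while unencoded markup is shown literally instead of being interpreted.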

Can I decode HTML entities back to original characters?

Absolutely. The decode function is the inverse operation of the encoder. It accepts a string containing HTML named entities like &lt; or numeric entities like &#38; and converts them back to their original character representations. The decoding algorithm implemented in the CryptoService recognizes both the standard XML named entities and a wide range of decimal and hexadecimal numeric character references. This feature is particularly useful when you need to recover the original human-readable text from a sanitized database entry or when debugging templates that may have been double-encoded by mistake.
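
As a rough sketch of the inverse operation, the function below decodes the five predefined named entities plus decimal and hexadecimal references in a single pass. It is an assumption-laden illustration: decodeEntities and NAMED are hypothetical names, and the tool's own decoder may recognize more named entities than these five.

```javascript
// Hypothetical decoder covering only the five predefined XML entities.
const NAMED = new Map([
  ["lt", "<"],
  ["gt", ">"],
  ["amp", "&"],
  ["quot", '"'],
  ["apos", "'"],
]);

function decodeEntities(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g, (match, body) => {
    if (body[0] !== "#") {
      // Named entity: leave unknown names untouched.
      return NAMED.has(body) ? NAMED.get(body) : match;
    }
    const isHex = body[1] === "x" || body[1] === "X";
    const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    // Values beyond the Unicode range are left as written.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeEntities("&lt;b&gt; &#38; &#x4E2D;")); // → <b> & 中
console.log(decodeEntities("&amp;lt;"));                 // → &lt;
```

The second call shows the double-encoding case: one decoding pass peels off exactly one layer, which makes it easy to see how many times a template encoded its output.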

Does this tool handle all Unicode characters?

The tool supports the entire Unicode range, both the BMP (Basic Multilingual Plane) and the supplementary planes. When deep encoding is active, any character outside the ASCII range (0-127) is converted into its decimal Unicode code point wrapped in an entity format. This covers everything from Latin-1 supplements and Greek letters to emojis and mathematical alphanumeric symbols. The conversion relies on the JavaScript string iterator, which walks the string by code point and so correctly handles surrogate pairs for characters like emojis that are represented by two UTF-16 code units. A character above U+FFFF is therefore encoded as a single numeric entity rather than being broken into two invalid ones. Emoji sequences built from several code points, such as those joined with a zero-width joiner, produce one entity per code point and reassemble when decoded.
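
The difference between iterating by UTF-16 code unit and by code point is easiest to see side by side. Both function names below are hypothetical; the first shows the mistake, the second the iterator-based approach described above.

```javascript
// Naive version: indexes by UTF-16 code unit, so a character above U+FFFF
// is emitted as two surrogate values, which are not valid references.
function encodePerCodeUnit(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out += unit > 127 ? `&#${unit};` : text[i];
  }
  return out;
}

// Iterator version: for...of yields whole code points, so a surrogate
// pair becomes a single entity.
function encodePerCodePoint(text) {
  let out = "";
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 127 ? `&#${codePoint};` : ch;
  }
  return out;
}

console.log(encodePerCodeUnit("\u{1F602}"));  // → &#55357;&#56834;
console.log(encodePerCodePoint("\u{1F602}")); // → &#128514;

// A zero-width-joiner sequence yields one entity per code point.
console.log(encodePerCodePoint("\u{1F469}\u200D\u{1F4BB}"));
// → &#128105;&#8205;&#128187;
```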

Why would I need to convert Chinese characters into HTML entities?

Converting Chinese characters and other non-Latin scripts into HTML numeric entities is a proven strategy for preventing mojibake, the garbled text that appears when the encoding used to write text differs from the one used to read it. Many enterprise environments still run on legacy infrastructure where the default character set is Latin-1 or even ASCII. When a modern UTF-8 document containing Chinese text is ingested by a system that interprets it as ISO-8859-1, each byte of a multibyte sequence is decoded as a separate Latin-1 character, which corrupts the text. Pre-encoding the Chinese text into pure ASCII entity references preserves the content through such systems, provided the final consumer renders it as HTML so that the entities are decoded back into characters. This technique is useful for email newsletters targeting international audiences, data archival in mixed-encoding databases, or content migration projects involving older CMS platforms.
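
The mismatch described above can be simulated by encoding text as UTF-8 and then reading each byte as a separate ISO-8859-1 character. The helper name misreadUtf8AsLatin1 is hypothetical; TextEncoder is the standard API available in browsers and in Node.js.

```javascript
// Hypothetical simulation of a legacy system: produce UTF-8 bytes, then
// interpret every byte as its own ISO-8859-1 character.
function misreadUtf8AsLatin1(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// One accented letter turns into two unrelated characters.
console.log(misreadUtf8AsLatin1("\u00E9")); // → Ã©

// Two Chinese characters (three UTF-8 bytes each) become six characters.
console.log(misreadUtf8AsLatin1("中文").length); // → 6

// The entity-encoded form is pure ASCII, so the misreading changes nothing.
console.log(misreadUtf8AsLatin1("&#20013;&#25991;")); // → &#20013;&#25991;
```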