HTML Entity Parser - Solution & Explanation

Q: How to solve HTML Entity Parser in O(n)?

Traverse the string and detect potential entities whenever you see '&'. Extract the substring up to the next ';' and check it against a predefined hash map of valid entities like &amp;, &lt;, and &gt;. If a match exists, append the mapped character; otherwise keep the original text. Because every entity is at most 7 characters long, each position needs only constant work, resulting in linear time.

Medium · Hash Table, String · 11 min read · Asked at: Amazon, Meta, Oracle, Google

Problem Statement

An HTML entity parser takes HTML code as input and replaces all the entities of the special characters with the characters themselves.

The special characters and their entities for HTML are:

Quotation Mark: the entity is &quot; and symbol character is ".
Single Quote Mark: the entity is &apos; and symbol character is '.
Ampersand: the entity is &amp; and symbol character is &.
Greater Than Sign: the entity is &gt; and symbol character is >.
Less Than Sign: the entity is &lt; and symbol character is <.
Slash: the entity is &frasl; and symbol character is /.

Given the input text string to the HTML parser, you have to implement the entity parser.

Return the text after replacing the entities by the special characters.

Example 1:

Input: text = "&amp; is an HTML entity but &ambassador; is not."
Output: "& is an HTML entity but &ambassador; is not."
Explanation: The parser will replace the &amp; entity by &

Example 2:

Input: text = "and I quote: &quot;...&quot;"
Output: "and I quote: \"...\""

Constraints:

1 <= text.length <= 10⁵
The string may contain any possible characters out of all the 256 ASCII characters.

Approach Overview

Problem Overview: You receive a string containing HTML entities such as &amp;, &lt;, or &gt;. The task is to replace these encoded entities with their corresponding characters while leaving all other text unchanged.

Approach 1: Using Map for Entity Replacement (O(n) time, O(1) space)

This approach scans the string from left to right and replaces known HTML entities using a lookup table. Store all valid entities in a hash map where the key is the encoded entity (for example &amp;) and the value is the decoded character (&). When you encounter the character '&', check the substring up to the next ';'. If the substring matches a key in the map, append the mapped character to the result and skip the processed characters. Otherwise, treat the '&' as normal text and continue from the next character. No entity is longer than 7 characters (&frasl;), so each check does a constant amount of work and the total time complexity is O(n). The map contains a constant number of entities, so the extra space beyond the output string is O(1). This method relies on fast hash lookups, making it a practical use of a hash table combined with efficient string traversal.

The key insight is recognizing that the number of valid HTML entities is fixed. Instead of complex parsing logic, you simply check whether the substring matches one of the allowed tokens. This keeps the implementation straightforward and avoids unnecessary backtracking.
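The scan described above can be sketched in Python as follows. This is a minimal illustration, assuming the six entities from the problem statement; the names entity_parser_scan, ENTITIES, and MAX_ENTITY_LEN are chosen here for the example. Limiting the search for ';' to the longest entity length is one way to keep the work per '&' constant.

```python
# Lookup table: encoded entity -> decoded character.
ENTITIES = {
    "&quot;": '"',
    "&apos;": "'",
    "&amp;": "&",
    "&gt;": ">",
    "&lt;": "<",
    "&frasl;": "/",
}
MAX_ENTITY_LEN = max(len(entity) for entity in ENTITIES)  # 7, for "&frasl;"


def entity_parser_scan(text: str) -> str:
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == "&":
            # Look for ';' only within the span a valid entity could cover.
            end = text.find(";", i + 1, i + MAX_ENTITY_LEN)
            if end != -1:
                candidate = text[i:end + 1]
                if candidate in ENTITIES:
                    out.append(ENTITIES[candidate])
                    i = end + 1  # skip the whole entity
                    continue
        # Not a known entity: keep the character and move on by one.
        out.append(text[i])
        i += 1
    return "".join(out)


print(entity_parser_scan("&amp; is an HTML entity but &ambassador; is not."))
```

On a failed match the sketch advances by a single character rather than jumping to the ';', so an input such as "&&gt;" still decodes the entity that starts at the second '&'.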

Approach 2: Regular Expression Replacement (O(n) time, O(1) space)

Another clean solution uses a regular expression to match any valid entity pattern and replace it through a callback or replacement function. Define a regex pattern that matches the allowed entities such as &(quot|apos|amp|gt|lt|frasl);. During replacement, map the captured entity name to its decoded character. Most modern regex engines process this in linear time relative to the string length, giving an effective O(n) time complexity with O(1) auxiliary space aside from the output string.

This approach reduces manual parsing logic and keeps the code concise. It works especially well in languages with strong regex support such as JavaScript or C#. Conceptually, the regex acts as a filter that only targets valid entity tokens, leaving all other characters untouched.
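As a rough sketch of the single-pattern idea in Python (the text names JavaScript and C#, so the choice of Python and the names ENTITY_CHARS, ENTITY_RE, and entity_parser_regex are assumptions made for this example), re.sub accepts a function as the replacement and calls it once per match:

```python
import re

# Entity name (without '&' and ';') -> decoded character.
ENTITY_CHARS = {
    "quot": '"',
    "apos": "'",
    "amp": "&",
    "gt": ">",
    "lt": "<",
    "frasl": "/",
}

# One pattern that matches only the six allowed entities.
ENTITY_RE = re.compile(r"&(quot|apos|amp|gt|lt|frasl);")


def entity_parser_regex(text: str) -> str:
    # The callback maps the captured entity name to its character.
    return ENTITY_RE.sub(lambda m: ENTITY_CHARS[m.group(1)], text)


print(entity_parser_regex("and I quote: &quot;...&quot;"))
```

Because the whole string is rewritten in one pass, text produced by a replacement is never scanned again, so "&amp;lt;" becomes "&lt;" and is not decoded a second time.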

Recommended for interviews: The hash map scanning approach is typically preferred. It demonstrates clear control over string parsing, efficient use of a hash table, and predictable linear complexity. Interviewers often expect candidates to show how they detect an entity starting at '&', check the substring, and append the correct character. Regex solutions are valid but may hide the algorithmic reasoning behind library abstractions.

Approach 1: Using Map for Entity Replacement

In this approach, we create a mapping dictionary for all the known HTML entities and their corresponding special characters. As we traverse the input text, we check for any substrings starting with '&' that match an entity in our dictionary. If a match is found, we replace it with its corresponding character. This approach efficiently handles replacements using a single pass through the text.

The function entityParser initializes a dictionary with known HTML entities. It then iterates over the input string, checking for entities starting with &. If an entity from the dictionary is found, it appends the corresponding character to the result list and skips to the next unmatched character. It appends characters that do not start entities directly to the result list.
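One possible shape for such a function, sketched in Python. Rather than searching for ';', this variant tests each known entity as a prefix at the current position with str.startswith. The snake_case name entity_parser and the dictionary name are assumptions made for the example.

```python
ENTITY_TO_CHAR = {
    "&quot;": '"',
    "&apos;": "'",
    "&amp;": "&",
    "&gt;": ">",
    "&lt;": "<",
    "&frasl;": "/",
}


def entity_parser(text: str) -> str:
    result = []  # collected pieces, joined once at the end
    i = 0
    while i < len(text):
        if text[i] == "&":
            # Try every known entity as a prefix starting at index i.
            for entity, char in ENTITY_TO_CHAR.items():
                if text.startswith(entity, i):
                    result.append(char)
                    i += len(entity)  # skip past the matched entity
                    break
            else:
                # No entity starts here: '&' is ordinary text.
                result.append("&")
                i += 1
        else:
            result.append(text[i])
            i += 1
    return "".join(result)


print(entity_parser("x &gt; y &amp;&amp; x &lt; y is always false"))
```

Appending to a list and joining once avoids repeated string concatenation, which matters for inputs near the 10⁵ length limit.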

Complexity

Time Complexity: O(n * m), where n is the length of the input text and m is the maximum length of an entity. Because the set of entities is fixed (m is at most 7, for &frasl;), this effectively reduces to O(n).
Space Complexity: O(n), for storing the output result.

Approach 2: Regular Expression Replacement

This approach utilizes regular expressions to simplify the replacement of HTML entities. By compiling regular expressions for each entity, we can quickly substitute occurrences within the string.

The entityParser function uses a dictionary to store the HTML entities. For each entity-character pair, it constructs a regex and replaces all occurrences of the entity in the text with the character.
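A hedged Python sketch of the per-entity variant described above; the names REPLACEMENTS and entity_parser_sequential are illustrative. One detail the description leaves implicit is that order matters with sequential replacement: decoding &amp; first would turn "&amp;lt;" into "&lt;", and a later pass would then wrongly decode it to "<". This sketch therefore handles &amp; last.

```python
import re

# Ordered (entity, character) pairs; "&amp;" is deliberately last so that
# an ampersand it produces is never re-read as the start of another entity.
REPLACEMENTS = [
    ("&quot;", '"'),
    ("&apos;", "'"),
    ("&gt;", ">"),
    ("&lt;", "<"),
    ("&frasl;", "/"),
    ("&amp;", "&"),
]


def entity_parser_sequential(text: str) -> str:
    for entity, char in REPLACEMENTS:
        # One full regex pass over the text per entity.
        text = re.sub(re.escape(entity), char, text)
    return text


print(entity_parser_sequential("&amp;lt; stays encoded as an entity"))
```

Each of the six passes scans the whole string, which is where the extra factor in the complexity for this approach comes from.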

Complexity

Time Complexity: O(n * m), where n is the length of text and m is the number of entities. Each entity replacement triggers a separate regex operation.
Space Complexity: O(n), considering the space needed for the resultant string.

Approach 3: Hash Table + Simulation

We can use a hash table to store the corresponding character for each character entity. Then, we traverse the string, and when we encounter a character entity, we replace it with the corresponding character.

The time complexity is O(n × l), and the space complexity is O(l). Here, n is the length of the string, and l is the total length of the character entities.
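A small Python sketch of this simulation, under the assumption that "encountering a character entity" means trying every substring length an entity can have at the current index. The function name entity_parser_sim and the variable names are made up for the example.

```python
def entity_parser_sim(text: str) -> str:
    table = {
        "&quot;": '"',
        "&apos;": "'",
        "&amp;": "&",
        "&gt;": ">",
        "&lt;": "<",
        "&frasl;": "/",
    }
    min_len = min(map(len, table))  # 4, e.g. "&gt;"
    max_len = max(map(len, table))  # 7, "&frasl;"

    ans = []
    i, n = 0, len(text)
    while i < n:
        # Try each possible entity length starting at index i.
        for length in range(min_len, max_len + 1):
            piece = text[i:i + length]
            if piece in table:
                ans.append(table[piece])
                i += len(piece)
                break
        else:
            # No entity begins here: copy one character through.
            ans.append(text[i])
            i += 1
    return "".join(ans)


print(entity_parser_sim("and I quote: &quot;...&quot;"))
```

Every index tries a handful of short slices, so the running time grows with both the string length and the entity lengths, in line with the bound stated above.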

Complexity Comparison

Approach	Time	Space
Using Map for Entity Replacement	O(n * m), where m is the maximum entity length; effectively O(n) because the entity set is fixed	O(n), for storing the output result
Regular Expression Replacement	O(n * m), where m is the number of entities; each entity triggers a separate regex pass	O(n), for the resultant string
Hash Table + Simulation	O(n × l), where l is the total length of the character entities	O(l)

Detailed Complexity Analysis

Approach	Time	Extra Space (excluding output)	When to Use
Map-Based Entity Replacement	O(n)	O(1)	Best general solution for interviews and clear algorithmic control over parsing
Regular Expression Replacement (single combined pattern)	O(n)	O(1)	Useful when the language has strong regex support and you want concise code

Frequently Asked Questions

Is HTML Entity Parser easy or hard?

HTML Entity Parser is generally rated Medium difficulty. The main challenge is correctly identifying entity boundaries and replacing them without misinterpreting normal text. Once the entity mapping is defined, the implementation becomes a straightforward linear scan.

HTML Entity Parser Python/Java solution

In Python or Java, the typical solution defines a dictionary or HashMap containing mappings such as "&amp;" to "&" and "&lt;" to "<". Iterate through the string and check substrings whenever '&' appears. Replace recognized entities and append characters to a result builder.

What is the best approach for HTML Entity Parser?

The hash map scanning approach is the most practical solution. Store all valid HTML entities in a map and iterate through the string, checking substrings that start with '&'. Each lookup takes constant time, giving an overall time complexity of O(n) with O(1) extra space.

Is HTML Entity Parser asked at Google/Amazon/Meta?

Problems involving string parsing and token replacement frequently appear in interviews at companies like Google, Amazon, and Meta. HTML Entity Parser specifically tests careful string traversal, pattern recognition, and efficient use of hash tables.

What data structure is used in HTML Entity Parser?

The most common data structure is a hash table that maps encoded entities to their decoded characters. This allows constant-time lookup when an entity substring is detected. The algorithm also relies on sequential string traversal.

What is the time complexity of HTML Entity Parser?

The optimal solution runs in O(n) time where n is the length of the input string. Each character is processed once, and entity detection uses constant-time hash lookups. Auxiliary space is O(1) because the entity mapping contains a fixed number of entries; the output string itself takes O(n).

Related Problems

Two Sum
Longest Substring Without Repeating Characters

Problem Info

Difficulty: Medium
Acceptance: 50.5%
Approaches: 4
Reading time: 11 min
Asked at: Amazon, Meta, Oracle, Google
