HTML entity parser is the parser that takes HTML code as input and replace all the entities of the special characters by the characters itself.
The special characters and their entities for HTML are:
" and symbol character is ".' and symbol character is '.& and symbol character is &.> and symbol character is >.< and symbol character is <.⁄ and symbol character is /.Given the input text string to the HTML parser, you have to implement the entity parser.
Return the text after replacing the entities by the special characters.
Example 1:
Input: text = "& is an HTML entity but &ambassador; is not." Output: "& is an HTML entity but &ambassador; is not." Explanation: The parser will replace the & entity by &
Example 2:
Input: text = "and I quote: "..."" Output: "and I quote: \"...\""
Constraints:
1 <= text.length <= 105Problem Overview: You receive a string containing HTML entities such as &, <, or >. The task is to replace these encoded entities with their corresponding characters while leaving all other text unchanged.
Approach 1: Using Map for Entity Replacement (O(n) time, O(1) space)
This approach scans the string from left to right and replaces known HTML entities using a lookup table. Store all valid entities in a hash map where the key is the encoded entity (for example &) and the value is the decoded character (&). When you encounter the character '&', start checking substrings up to the next ';'. If the substring matches a key in the map, append the mapped character to the result and skip the processed characters. Otherwise, treat it as normal text. Because each character is processed at most once, the total time complexity is O(n). The map contains a constant number of entities, so the extra space is O(1). This method relies heavily on fast hash lookups, making it a practical use of a hash table combined with efficient string traversal.
The key insight is recognizing that the number of valid HTML entities is fixed. Instead of complex parsing logic, you simply check whether the substring matches one of the allowed tokens. This keeps the implementation straightforward and avoids unnecessary backtracking.
Approach 2: Regular Expression Replacement (O(n) time, O(1) space)
Another clean solution uses a regular expression to match any valid entity pattern and replace it through a callback or replacement function. Define a regex pattern that matches the allowed entities such as &(quot|apos|amp|gt|lt|frasl);. During replacement, map the captured entity name to its decoded character. Most modern regex engines process this in linear time relative to the string length, giving an effective O(n) time complexity with O(1) auxiliary space aside from the output string.
This approach reduces manual parsing logic and keeps the code concise. It works especially well in languages with strong regex support such as JavaScript or C#. Conceptually, the regex acts as a filter that only targets valid entity tokens, leaving all other characters untouched.
Recommended for interviews: The hash map scanning approach is typically preferred. It demonstrates clear control over string parsing, efficient use of a hash table, and predictable linear complexity. Interviewers often expect candidates to show how they detect an entity starting at '&', check the substring, and append the correct character. Regex solutions are valid but may hide the algorithmic reasoning behind library abstractions.
In this approach, we create a mapping dictionary for all the known HTML entities and their corresponding special characters. As we traverse the input text, we check for any substrings starting with '&' that match an entity in our dictionary. If a match is found, we replace it with its corresponding character. This approach efficiently handles replacements using a single pass through the text.
The function entityParser initializes a dictionary with known HTML entities. It then iterates over the input string, checking for entities starting with &. If an entity from the dictionary is found, it appends the corresponding character to the result list and skips to the next unmatched character. It appends characters that do not start entities directly to the result list.
Time Complexity: O(n * m), where n is the length of the input text and m is the average length of the entities. Since there are a fixed number of entities, this effectively reduces to O(n).
Space Complexity: O(n), for storing the output result.
This approach utilizes regular expressions to simplify the replacement of HTML entities. By compiling regular expressions for each entity, we can quickly substitute occurrences within the string.
The entityParser function uses a dictionary to store the HTML entities. For each entity-character pair, it constructs a regex and replaces all occurrences of the entity in the text with the character.
JavaScript
C#
Time Complexity: O(n * m), where n is the length of text and m is the number of entities. Each entity replacement triggers a separate regex operation.
Space Complexity: O(n), considering the space needed for the resultant string.
We can use a hash table to store the corresponding character for each character entity. Then, we traverse the string, and when we encounter a character entity, we replace it with the corresponding character.
The time complexity is O(n times l), and the space complexity is O(l). Here, n is the length of the string, and l is the total length of the character entities.
Python
Java
C++
Go
TypeScript
TypeScript
| Approach | Complexity |
|---|---|
| Using Map for Entity Replacement | Time Complexity: O(n * m), where n is the length of the input text and m is the average length of the entities. Since there are a fixed number of entities, this effectively reduces to O(n). |
| Regular Expression Replacement | Time Complexity: O(n * m), where n is the length of text and m is the number of entities. Each entity replacement triggers a separate regex operation. |
| Hash Table + Simulation | — |
| Default Approach | — |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Map-Based Entity Replacement | O(n) | O(1) | Best general solution for interviews and clear algorithmic control over parsing |
| Regular Expression Replacement | O(n) | O(1) | Useful when the language has strong regex support and you want concise code |
Leetcode 1410. HTML Entity Parser • Fraz • 914 views views
Watch 5 more video solutions →Practice HTML Entity Parser with our built-in code editor and test cases.
Practice on FleetCode