Watch 6 video solutions for HTML Entity Parser, a medium level problem involving Hash Table, String. This walkthrough by Fraz has 914 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
HTML entity parser is the parser that takes HTML code as input and replace all the entities of the special characters by the characters itself.
The special characters and their entities for HTML are:
" and symbol character is ".' and symbol character is '.& and symbol character is &.> and symbol character is >.< and symbol character is <.⁄ and symbol character is /.Given the input text string to the HTML parser, you have to implement the entity parser.
Return the text after replacing the entities by the special characters.
Example 1:
Input: text = "& is an HTML entity but &ambassador; is not." Output: "& is an HTML entity but &ambassador; is not." Explanation: The parser will replace the & entity by &
Example 2:
Input: text = "and I quote: "..."" Output: "and I quote: \"...\""
Constraints:
1 <= text.length <= 105Problem Overview: You receive a string containing HTML entities such as &, <, or >. The task is to replace these encoded entities with their corresponding characters while leaving all other text unchanged.
Approach 1: Using Map for Entity Replacement (O(n) time, O(1) space)
This approach scans the string from left to right and replaces known HTML entities using a lookup table. Store all valid entities in a hash map where the key is the encoded entity (for example &) and the value is the decoded character (&). When you encounter the character '&', start checking substrings up to the next ';'. If the substring matches a key in the map, append the mapped character to the result and skip the processed characters. Otherwise, treat it as normal text. Because each character is processed at most once, the total time complexity is O(n). The map contains a constant number of entities, so the extra space is O(1). This method relies heavily on fast hash lookups, making it a practical use of a hash table combined with efficient string traversal.
The key insight is recognizing that the number of valid HTML entities is fixed. Instead of complex parsing logic, you simply check whether the substring matches one of the allowed tokens. This keeps the implementation straightforward and avoids unnecessary backtracking.
Approach 2: Regular Expression Replacement (O(n) time, O(1) space)
Another clean solution uses a regular expression to match any valid entity pattern and replace it through a callback or replacement function. Define a regex pattern that matches the allowed entities such as &(quot|apos|amp|gt|lt|frasl);. During replacement, map the captured entity name to its decoded character. Most modern regex engines process this in linear time relative to the string length, giving an effective O(n) time complexity with O(1) auxiliary space aside from the output string.
This approach reduces manual parsing logic and keeps the code concise. It works especially well in languages with strong regex support such as JavaScript or C#. Conceptually, the regex acts as a filter that only targets valid entity tokens, leaving all other characters untouched.
Recommended for interviews: The hash map scanning approach is typically preferred. It demonstrates clear control over string parsing, efficient use of a hash table, and predictable linear complexity. Interviewers often expect candidates to show how they detect an entity starting at '&', check the substring, and append the correct character. Regex solutions are valid but may hide the algorithmic reasoning behind library abstractions.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Map-Based Entity Replacement | O(n) | O(1) | Best general solution for interviews and clear algorithmic control over parsing |
| Regular Expression Replacement | O(n) | O(1) | Useful when the language has strong regex support and you want concise code |