Given a string representing a code snippet, implement a tag validator to parse the code and return whether it is valid.
A code snippet is valid if all the following rules hold:
<TAG_NAME>TAG_CONTENT</TAG_NAME>. Among them, <TAG_NAME> is the start tag, and </TAG_NAME> is the end tag. The TAG_NAME in start and end tags should be the same. A closed tag is valid if and only if the TAG_NAME and TAG_CONTENT are valid.TAG_NAME only contain upper-case letters, and has length in range [1,9]. Otherwise, the TAG_NAME is invalid.TAG_CONTENT may contain other valid closed tags, cdata and any characters (see note1) EXCEPT unmatched <, unmatched start and end tag, and unmatched or closed tags with invalid TAG_NAME. Otherwise, the TAG_CONTENT is invalid.< is unmatched if you cannot find a subsequent >. And when you find a < or </, all the subsequent characters until the next > should be parsed as TAG_NAME (not necessarily valid).<![CDATA[CDATA_CONTENT]]>. The range of CDATA_CONTENT is defined as the characters between <![CDATA[ and the first subsequent ]]>.CDATA_CONTENT may contain any characters. The function of cdata is to forbid the validator to parse CDATA_CONTENT, so even it has some characters that can be parsed as tag (no matter valid or invalid), you should treat it as regular characters.
Example 1:
Input: code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>" Output: true Explanation: The code is wrapped in a closed tag : <DIV> and </DIV>. The TAG_NAME is valid, the TAG_CONTENT consists of some characters and cdata. Although CDATA_CONTENT has an unmatched start tag with invalid TAG_NAME, it should be considered as plain text, not parsed as a tag. So TAG_CONTENT is valid, and then the code is valid. Thus return true.
Example 2:
Input: code = "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>" Output: true Explanation: We first separate the code into : start_tag|tag_content|end_tag. start_tag -> "<DIV>" end_tag -> "</DIV>" tag_content could also be separated into : text1|cdata|text2. text1 -> ">> ![cdata[]] " cdata -> "<![CDATA[<div>]>]]>", where the CDATA_CONTENT is "<div>]>" text2 -> "]]>>]" The reason why start_tag is NOT "<DIV>>>" is because of the rule 6. The reason why cdata is NOT "<![CDATA[<div>]>]]>]]>" is because of the rule 7.
Example 3:
Input: code = "<A> <B> </A> </B>" Output: false Explanation: Unbalanced. If "<A>" is closed, then "<B>" must be unmatched, and vice versa.
Constraints:
1 <= code.length <= 500code consists of English letters, digits, '<', '>', '/', '!', '[', ']', '.', and ' '.This approach uses a stack to manage nested tags and ensure that for every opening tag there is a corresponding closing tag with the same name. The strategy is to iterate through the string while checking for the start or end of tags and CDATA sections. We push the start tags onto the stack and pop from the stack when a valid matching end tag is found. Special care is given to correctly handling CDATA sections.
The function isValid iterates through the code using a while loop, checking the conditions for CDATA, start tag, and end tag at each step. When a CDATA section is found, it skips past it since its contents are not relevant for tag validation. For end tags, it ensures the top of the stack has a matching start tag before popping it off. For start tags, it checks the format and valid characters before pushing it to the stack. The function returns true only if all tags are properly closed and matched, leaving the stack empty at the end.
Java
Time Complexity: O(n), where n is the length of the code string since we are scanning through the string once.
Space Complexity: O(n), due to the use of a stack that can, in the worst case, contain the nested tags.
This approach uses regular expressions to match valid patterns directly within the code string. The idea is to replace valid components iteratively, reducing the code string size incrementally until no more valid patterns can be matched. If at the end the code string is empty, it indicates that all tags and content were valid.
Here, the isValid function uses Python's re library to define one regex pattern that matches either a CDATA block or valid nested tags. It uses this pattern within a loop that iteratively removes all occurrences of these valid patterns from the string. The process continues until no further replacements can be made. If by the end the entire string is reduced to empty, this means the input was a valid code. This uses regex to simplify the process of identifying and reducing valid sections by treating them as non-overlapping transformations.
JavaScript
Time Complexity: O(n2), due to the repeated reduction of the code string in the while loop, where each replacement operation can be considered linear within the current string length.
Space Complexity: O(n), predominantly due to the storage requirements of the regex engine and the intermediate strings produced during processing.
| Approach | Complexity |
|---|---|
| Stack-Based Parsing | Time Complexity: O(n), where n is the length of the code string since we are scanning through the string once. Space Complexity: O(n), due to the use of a stack that can, in the worst case, contain the nested tags. |
| Regex-Based Validation | Time Complexity: O(n2), due to the repeated reduction of the code string in the while loop, where each replacement operation can be considered linear within the current string length. Space Complexity: O(n), predominantly due to the storage requirements of the regex engine and the intermediate strings produced during processing. |
The LeetCode Fallacy • NeetCode • 628,353 views views
Watch 9 more video solutions →Practice Tag Validator with our built-in code editor and test cases.
Practice on FleetCodePractice this problem
Open in Editor