Tag Validator - Solution & Explanation

Hard | String, Stack | 11 min read | Asked at: Microsoft, Faire

Problem Statement

Given a string representing a code snippet, implement a tag validator to parse the code and return whether it is valid.

A code snippet is valid if all the following rules hold:

1. The code must be wrapped in a valid closed tag. Otherwise, the code is invalid.
2. A closed tag (not necessarily valid) has exactly the following format: <TAG_NAME>TAG_CONTENT</TAG_NAME>. Among them, <TAG_NAME> is the start tag, and </TAG_NAME> is the end tag. The TAG_NAME in start and end tags should be the same. A closed tag is valid if and only if the TAG_NAME and TAG_CONTENT are valid.
3. A valid TAG_NAME only contains upper-case letters and has length in range [1,9]. Otherwise, the TAG_NAME is invalid.
4. A valid TAG_CONTENT may contain other valid closed tags, cdata, and any characters EXCEPT unmatched <, unmatched start and end tags, and unmatched or closed tags with invalid TAG_NAME. Otherwise, the TAG_CONTENT is invalid.
5. A start tag is unmatched if no end tag exists with the same TAG_NAME, and vice versa. However, you also need to consider unbalanced tags when they are nested.
6. A < is unmatched if you cannot find a subsequent >. When you find a < or </, all the subsequent characters until the next > should be parsed as TAG_NAME (not necessarily valid).
7. The cdata has the following format: <![CDATA[CDATA_CONTENT]]>. The range of CDATA_CONTENT is defined as the characters between <![CDATA[ and the first subsequent ]]>.
8. CDATA_CONTENT may contain any characters. The function of cdata is to forbid the validator from parsing CDATA_CONTENT, so even if it has some characters that can be parsed as a tag (no matter valid or invalid), you should treat them as regular characters.
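Rule 3 is easy to get subtly wrong. Below is a minimal Python check, assuming ASCII input as the constraints guarantee; `is_valid_tag_name` is a hypothetical helper name, not taken from the original solutions.

```python
def is_valid_tag_name(name: str) -> bool:
    """Rule 3: TAG_NAME is 1 to 9 characters, each an upper-case letter A-Z."""
    # An explicit A-Z range is used on purpose: str.isupper() returns True for "A1"
    # because digits are uncased, and str.isalpha() would accept non-ASCII letters.
    return 1 <= len(name) <= 9 and all("A" <= ch <= "Z" for ch in name)
```

For example, "DIV" and "ABCDEFGHI" pass, while "div", "", "A1" and the 10-letter "ABCDEFGHIJ" fail.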

Example 1:

Input: code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>"
Output: true
Explanation: 
The code is wrapped in a closed tag: <DIV> and </DIV>.
The TAG_NAME is valid, the TAG_CONTENT consists of some characters and cdata. 
Although CDATA_CONTENT has an unmatched start tag with invalid TAG_NAME, it should be considered as plain text, not parsed as a tag.
So TAG_CONTENT is valid, and then the code is valid. Thus return true.

Example 2:

Input: code = "<DIV>>>  ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"
Output: true
Explanation:
We first separate the code into: start_tag|tag_content|end_tag.
start_tag -> "<DIV>"
end_tag -> "</DIV>"
tag_content could also be separated into: text1|cdata|text2.
text1 -> ">>  ![cdata[]] "
cdata -> "<![CDATA[<div>]>]]>", where the CDATA_CONTENT is "<div>]>"
text2 -> "]]>>]"
The start_tag is NOT "<DIV>>>" because of rule 6: after a <, the TAG_NAME ends at the next >.
The cdata is NOT "<![CDATA[<div>]>]]>]]>" because of rule 7: CDATA_CONTENT ends at the first subsequent ]]>.
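Both splitting decisions in Example 2 come down to "take the first match after the current position", which maps directly onto Python's `str.find`. This is a small illustrative sketch on the Example 2 input only, not part of a full validator; the variable names are made up for illustration.

```python
code = "<DIV>>>  ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"

# Rule 6: the TAG_NAME after "<" runs only up to the NEXT ">".
name_end = code.find(">", 1)
start_tag = code[: name_end + 1]

# Rule 7: CDATA_CONTENT ends at the FIRST "]]>" after "<![CDATA[".
cdata_open = code.find("<![CDATA[")
content_start = cdata_open + len("<![CDATA[")
content_end = code.find("]]>", content_start)
cdata_content = code[content_start:content_end]

# Whatever follows that first "]]>" (before the end tag) is ordinary text.
text2 = code[content_end + len("]]>") : code.rfind("</DIV>")]

print(start_tag, cdata_content, text2)  # <DIV> <div>]> ]]>>]
```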

Example 3:

Input: code = "<A>  <B> </A>   </B>"
Output: false
Explanation: Unbalanced. If "<A>" is closed, then "<B>" must be unmatched, and vice versa.

Constraints:

1 <= code.length <= 500
code consists of English letters, digits, '<', '>', '/', '!', '[', ']', '.', and ' '.

Approach Overview

Problem Overview: You are given a string representing code written with XML-style tags. A valid code snippet must follow strict rules: tag names must be 1 to 9 upper-case letters, tags must be properly nested, the whole snippet must be wrapped in a single root tag, and the contents of CDATA sections must be treated as plain text rather than parsed. The task is to check whether the entire string forms a valid tagged structure.

Approach 1: Stack-Based Parsing (O(n) time, O(n) space)

This approach simulates how a real parser validates nested tags. Iterate through the string character by character. When you encounter an opening tag like <TAG>, push the tag name onto a stack. When a closing tag like </TAG> appears, verify that it matches the stack's top element and pop it. If the order breaks or a closing tag appears without a corresponding opening tag, the code is invalid.
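The push/pop matching on its own can be sketched as below. This is a minimal illustration that assumes the input has already been split into tag tokens; `tags_nest_properly` is a hypothetical helper, and it deliberately ignores text, CDATA, tag-name validity and the single-root rule.

```python
from __future__ import annotations


def tags_nest_properly(tags: list[str]) -> bool:
    """Check nesting of pre-tokenized tags such as ["<A>", "<B>", "</B>", "</A>"]."""
    stack: list[str] = []  # names of currently open tags, innermost on top
    for tag in tags:
        if tag.startswith("</"):
            name = tag[2:-1]
            # A closing tag must match the most recently opened tag.
            if not stack or stack[-1] != name:
                return False
            stack.pop()
        else:
            stack.append(tag[1:-1])
    return not stack  # leftover open tags mean some start tag was never closed
```

With the tags of Example 3, `["<A>", "<B>", "</A>", "</B>"]`, the first closing tag `</A>` meets `B` on top of the stack, so the function returns False.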

CDATA blocks such as <![CDATA[...]]> require special handling. Once detected, skip ahead to the first subsequent ]]>, because content inside CDATA must not be parsed as tags; if no ]]> follows, the code is invalid. Tag names must be between 1 and 9 upper-case letters, which you validate while parsing. To enforce the single-root rule, every position after index 0 must lie inside an open tag: if the stack becomes empty before the last character, the code is invalid. At the end, the stack must be empty. This approach runs in O(n) time because each character is processed a constant number of times, with O(n) space for the stack in the worst case. It relies on stack processing and string parsing.
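The CDATA skip is a single `str.find` for the first "]]>" after the opener. A minimal sketch, assuming `i` already points at "<![CDATA["; `skip_cdata` is a hypothetical helper name, and it returns -1 to signal an unterminated block.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def skip_cdata(code: str, i: int) -> int:
    """Return the index just past the first ']]>' after the opener at i, or -1."""
    end = code.find(CDATA_CLOSE, i + len(CDATA_OPEN))
    return -1 if end == -1 else end + len(CDATA_CLOSE)


sample = "<A><![CDATA[<B>]]></A>"
print(sample[skip_cdata(sample, 3):])  # </A>
```

Because the search starts after the opener and stops at the first terminator, a tag-like string such as `<B>` inside the block is never inspected.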

Approach 2: Regex-Based Validation (O(n)–O(n²) time, O(n) space)

A regex-driven strategy repeatedly simplifies the string until only a valid root tag remains. First replace CDATA sections using a pattern that matches <![CDATA[...]]> so the parser ignores their internal content. Next apply a regex that matches valid tag structures such as <TAG>content</TAG>. Each successful match can be replaced with a placeholder character.

By repeatedly applying this replacement, nested tags collapse layer by layer. If the final string reduces to a single placeholder representing the root element, the code is valid. While this technique is concise and expressive in languages with strong regex support, repeated replacements may re-scan the string multiple times. That pushes the practical complexity closer to O(n²) in worst cases. It works well for quick validation scripts but is less predictable for large inputs.
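The layer-by-layer collapse can be sketched as follows. This minimal illustration assumes `t` as the placeholder character; `collapse_layers` is a hypothetical helper that records each intermediate string, and it deliberately skips CDATA handling and does not guard against a literal `t` already present in the input.

```python
from __future__ import annotations

import re

# An innermost closed tag: valid TAG_NAME, content with no "<", matching end tag.
TAG = re.compile(r"<([A-Z]{1,9})>[^<]*</\1>")


def collapse_layers(code: str) -> list[str]:
    """Replace innermost closed tags with 't' until nothing changes; keep every step."""
    steps = [code]
    while True:
        reduced = TAG.sub("t", code)
        if reduced == code:
            return steps
        code = reduced
        steps.append(code)


print(collapse_layers("<A><B></B><C></C></A>"))
# ['<A><B></B><C></C></A>', '<A>tt</A>', 't']
```

Two sibling roots such as `<A></A><B></B>` end as `tt` rather than a single `t`, which is why the final check compares against exactly one placeholder.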

Recommended for interviews: The stack-based parser is the expected solution. It demonstrates clear reasoning about nested structures, explicit state management, and careful handling of edge cases like CDATA and tag length rules. Regex-based solutions can work but often hide logic inside complex patterns and repeated replacements. Showing the stack approach proves you understand parsing fundamentals and data structure design.

Approach 1: Stack-Based Parsing

This approach uses a stack to manage nested tags and ensure that for every opening tag there is a corresponding closing tag with the same name. The strategy is to iterate through the string while checking for the start or end of tags and CDATA sections. We push the start tags onto the stack and pop from the stack when a valid matching end tag is found. Special care is given to correctly handling CDATA sections.

The function isValid iterates through the code using a while loop, checking the conditions for CDATA, start tag, and end tag at each step. When a CDATA section is found, it skips past it since its contents are not relevant for tag validation. For end tags, it ensures the top of the stack has a matching start tag before popping it off. For start tags, it checks the format and valid characters before pushing it to the stack. The function returns true only if all tags are properly closed and matched, leaving the stack empty at the end.

Code
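The original Python and Java code is not reproduced here. The following is a Python sketch of the stack-based parser as described above, not the article's exact implementation; the name `is_valid` is a Python-style stand-in for the `isValid` function the text mentions.

```python
def is_valid(code: str) -> bool:
    """Stack-based Tag Validator sketch."""
    CDATA_OPEN, CDATA_CLOSE = "<![CDATA[", "]]>"
    stack = []  # names of the tags that are currently open
    i, n = 0, len(code)
    while i < n:
        # Single-root rule: every index after 0 must lie inside an open tag.
        if i > 0 and not stack:
            return False
        if code.startswith(CDATA_OPEN, i):
            if not stack:  # cdata may only appear inside TAG_CONTENT
                return False
            end = code.find(CDATA_CLOSE, i + len(CDATA_OPEN))
            if end == -1:  # unterminated cdata
                return False
            i = end + len(CDATA_CLOSE)
        elif code.startswith("</", i):
            end = code.find(">", i + 2)
            if end == -1:  # unmatched "<" (rule 6)
                return False
            name = code[i + 2:end]
            if not stack or stack[-1] != name:  # unmatched or unbalanced end tag
                return False
            stack.pop()
            i = end + 1
        elif code[i] == "<":
            end = code.find(">", i + 1)
            if end == -1:  # unmatched "<" (rule 6)
                return False
            name = code[i + 1:end]  # everything up to the next ">" is TAG_NAME
            if not (1 <= len(name) <= 9 and all("A" <= c <= "Z" for c in name)):
                return False
            stack.append(name)
            i = end + 1
        else:
            i += 1  # ordinary character inside TAG_CONTENT
    return not stack  # every start tag must have been closed
```

The CDATA check comes first so that `<![CDATA[` is never misread as a start tag, and the "stack empty after index 0" check rejects both leading text and anything that follows the root's end tag.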


Complexity

Time Complexity: O(n), where n is the length of the code string since we are scanning through the string once.

Space Complexity: O(n), because in the worst case (deeply nested tags) the stack holds O(n) open tag names.


Approach 2: Regex-Based Validation

This approach uses regular expressions to match valid patterns directly within the code string. The idea is to replace valid components iteratively with a placeholder, shrinking the code string until no more valid patterns can be matched. If the string has collapsed to exactly one placeholder, the input was a single valid root tag; any leftover characters mean the code is invalid.

Here, the isValid function uses Python's re library: one pattern matches CDATA blocks, and another matches an innermost closed tag <TAG_NAME>...</TAG_NAME> whose content contains no <. Inside a loop, every match is replaced with a placeholder, and the loop stops when a pass makes no further replacements. The input is valid only if the result is exactly one placeholder. Checking for an empty string instead would be wrong, because it would accept two sibling root tags such as <A></A><B></B>, which violates the rule that the code must be wrapped in one closed tag.

Code
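The original Python and JavaScript code is not reproduced here. Below is a Python sketch of the regex reduction, following a common community variant rather than the article's exact code; `is_valid_regex`, `CDATA_OR_T` and `CLOSED_TAG` are hypothetical names, and `t` is an assumed placeholder character.

```python
import re

# A cdata block (non-greedy, so it ends at the FIRST "]]>") or a literal "t".
CDATA_OR_T = re.compile(r"<!\[CDATA\[.*?\]\]>|t")
# An innermost closed tag: 1-9 upper-case letters, content without "<", same end tag.
CLOSED_TAG = re.compile(r"<([A-Z]{1,9})>[^<]*</\1>")


def is_valid_regex(code: str) -> bool:
    # Step 1: turn cdata blocks and any literal "t" into "-", so the only
    # "t" characters left later are placeholders created in step 2.
    code = CDATA_OR_T.sub("-", code)
    # Step 2: collapse innermost valid tags into "t" until a pass changes nothing.
    prev = None
    while code != prev:
        prev = code
        code = CLOSED_TAG.sub("t", code)
    # Step 3: valid only if everything collapsed into exactly one root tag.
    return code == "t"
```

Neutralizing literal `t` characters first matters: without it, the one-character input `t` would be accepted as a valid root. A cdata block outside every tag becomes a stray `-`, so it also fails the final check.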


Complexity

Time Complexity: O(n²), due to the repeated reduction of the code string in the while loop, where each replacement operation can be considered linear within the current string length.

Space Complexity: O(n), predominantly due to the storage requirements of the regex engine and the intermediate strings produced during processing.
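The quadratic bound comes from the number of passes: each pass peels only the innermost layer, so a string nested d levels deep needs d shrinking passes, each scanning O(n) characters. A small sketch to observe this, assuming the same innermost-tag pattern; `reduction_passes` and `nested` are hypothetical helpers.

```python
import re

CLOSED_TAG = re.compile(r"<([A-Z]{1,9})>[^<]*</\1>")


def reduction_passes(code: str) -> int:
    """Count the regex passes that actually shrink the string."""
    passes = 0
    while True:
        reduced = CLOSED_TAG.sub("t", code)
        if reduced == code:
            return passes
        code, passes = reduced, passes + 1


def nested(depth: int) -> str:
    """Build <A><A>...</A></A> with the given nesting depth."""
    return "<A>" * depth + "</A>" * depth


print([reduction_passes(nested(d)) for d in (1, 3, 10)])  # [1, 3, 10]
```

Since the depth can grow linearly with the input length (each level adds 7 characters here), the total work grows roughly quadratically.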


Approach 3: Default Approach

Code


Complexity Comparison

Approach	Time	Space
Stack-Based Parsing	O(n)	O(n)
Regex-Based Validation	O(n²)	O(n)
Default Approach	not stated	not stated

Detailed Complexity Analysis

Approach	Time	Space	When to Use
Stack-Based Parsing	O(n)	O(n)	Best general solution for interviews and production parsers
Regex-Based Validation	O(n)–O(n²)	O(n)	Quick validation scripts where concise regex is preferred

Video Solution

[Daily Problem] 591 Tag Validator, 05/10/2019 • Huifeng Guan • 669 views


Frequently Asked Questions

Is Tag Validator easy or hard?

Tag Validator is classified as a Hard problem on LeetCode with an acceptance rate around 40%. The challenge comes from multiple validation rules, CDATA handling, and strict tag formatting. Correctly managing edge cases while keeping the parser linear time requires careful implementation.

Tag Validator Python/Java solution

Both Python and Java implementations typically follow the same stack-based parsing logic. The code scans the string, extracts tag names, validates them, and pushes or pops from a stack accordingly. Handling CDATA sections with substring checks ensures internal text does not interfere with parsing.

How to solve Tag Validator in O(n)?

Scan the string sequentially and simulate an XML parser using a stack. Push tag names when encountering opening tags and pop them when matching closing tags appear. Skip content inside CDATA blocks so it is not parsed as markup. The string is valid if the structure never breaks, the stack is never empty before the final character (so a single root tag wraps everything), and the stack is empty at the end.

What is the best approach for Tag Validator?

The stack-based parsing approach is the most reliable solution. It processes the string from left to right, pushing opening tags onto a stack and matching them with closing tags. This ensures correct nesting and structure while handling CDATA sections properly. The algorithm runs in O(n) time and O(n) space.

Is Tag Validator asked at Google/Amazon/Meta?

This page lists Microsoft and Faire as companies that have asked Tag Validator. It is a classic parsing and stack problem that tests string parsing, edge-case handling, and correct use of stacks for nested structures, which are common interview topics at large companies in general.

What data structure is used in Tag Validator?

A stack is the primary data structure used to track nested tags. Each opening tag is pushed onto the stack and must match the next closing tag encountered. This structure naturally models nested hierarchies and ensures proper tag ordering.

What is the time complexity of Tag Validator?

The optimal stack-based solution runs in O(n) time where n is the length of the input string. Each character is scanned once while parsing tags and CDATA blocks. Space complexity is O(n) due to the stack used to store currently open tags.


Related Problems

Add Bold Tag in String

Problem Info

Difficulty: Hard

Acceptance: 40.4%

Approaches: 3

Reading time: 11 min

Asked at: Microsoft, Faire
