UTF-8 Validation - Solution & Explanation

MediumArray Bit Manipulation15 min readAsked at: Meta, Tesla, Google +1

Problem Statement

Given an integer array data representing the data, return whether it is a valid UTF-8 encoding (i.e. it translates to a sequence of valid UTF-8 encoded characters).

A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules:

For a 1-byte character, the first bit is a 0, followed by its Unicode code.
For an n-bytes character, the first n bits are all one's, the n + 1 bit is 0, followed by n - 1 bytes with the most significant 2 bits being 10.

This is how the UTF-8 encoding would work:

     Number of Bytes   |        UTF-8 Octet Sequence
                       |              (binary)
   --------------------+-----------------------------------------
            1          |   0xxxxxxx
            2          |   110xxxxx 10xxxxxx
            3          |   1110xxxx 10xxxxxx 10xxxxxx
            4          |   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

x denotes a bit in the binary form of a byte that may be either 0 or 1.

Note: The input is an array of integers. Only the least significant 8 bits of each integer is used to store the data. This means each integer represents only 1 byte of data.

Example 1:

Input: data = [197,130,1]
Output: true
Explanation: data represents the octet sequence: 11000101 10000010 00000001.
It is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.

Example 2:

Input: data = [235,140,4]
Output: false
Explanation: data represented the octet sequence: 11101011 10001100 00000100.
The first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.
The next byte is a continuation byte which starts with 10 and that's correct.
But the second continuation byte does not start with 10, so it is invalid.

Constraints:

1 <= data.length <= 2 * 10⁴
0 <= data[i] <= 255

Approach Overview

Problem Overview: You receive an array of integers where each value represents a byte (only the lowest 8 bits are used). Determine whether the sequence forms a valid UTF-8 encoded character stream. UTF-8 characters may use 1 to 4 bytes, and continuation bytes must follow strict bit patterns.

Approach 1: Iterative Bit Pattern Validation (O(n) time, O(1) space)

This approach scans the array once while tracking how many continuation bytes are expected for the current UTF‑8 character. UTF‑8 encodings follow specific leading bit patterns: 0xxxxxxx for 1 byte, 110xxxxx for 2 bytes, 1110xxxx for 3 bytes, and 11110xxx for 4 bytes. Continuation bytes must always match 10xxxxxx. Iterate through the array and inspect the highest bits using bit masking and shifting. If a leading byte is encountered, determine how many continuation bytes should follow. For each expected continuation byte, verify the 10 prefix. If a mismatch occurs or the sequence ends early, the encoding is invalid. This method uses constant extra memory and relies purely on bit checks, making it the most practical solution. Problems like this commonly appear in bit manipulation practice where understanding binary prefixes is critical.

Approach 2: Recursive Sequence Validation (O(n) time, O(n) recursion depth worst case)

The recursive approach models the validation as processing one UTF‑8 character at a time. Start from index i, inspect the leading byte, and determine the total number of bytes in the character using the prefix bits. Then recursively validate the required continuation bytes and move to the next character start index. Each continuation byte must match the 10xxxxxx pattern before recursion proceeds. This method makes the structure of UTF‑8 characters explicit, but recursion adds overhead and stack usage. While still linear time, it consumes additional call stack space and is less practical than the iterative scan. However, it can help when reasoning about parsing sequences in array traversal problems or recursive parsers.

Recommended for interviews: The iterative bit‑masking approach is what interviewers typically expect. It demonstrates that you understand UTF‑8 byte prefixes and can efficiently validate them with bit operations. The recursive approach proves conceptual understanding of character grouping, but the iterative version shows stronger control over bit manipulation and constant‑space streaming validation.

Approach 1: Approach 1: Iterative Solution

This approach involves solving the problem using an iterative method, where we use loops to perform the necessary calculations. This can be more efficient in terms of space complexity, especially if recursion would lead to excessive function call overhead.

In C, we iterate through a loop to perform operations for each element. This avoids recursion and reduces memory usage as we're only using a single loop with constant space.

Code

C C++Java Python C#JavaScript

C++

Java

Python

JavaScript

Complexity

Time Complexity: O(n), where n is the number of elements.
Space Complexity: O(1) since we are not using any extra space proportional to the input size.

Try this approach in the editor →

Approach 2: Approach 2: Recursive Solution

This approach explores solving the problem through recursion, which can offer simplicity and expressiveness. However, care must be taken with recursion depth to avoid stack overflow.

The recursive solution in C uses a helper parameter `i` that tracks the current state, printing the current value, and making a recursive call with `i + 1` until `i` reaches `n`.

Code

C C++Java Python C#JavaScript

C++

Java

Python

JavaScript

Complexity

Time Complexity: O(n)
Space Complexity: O(n) due to the call stack.

Try this approach in the editor →

Approach 3: Single Pass

We use a variable cnt to record the current number of bytes that need to be filled starting with 10, initially cnt = 0.

For each integer v in the array:

If cnt > 0, then check if v starts with 10. If not, return false, otherwise decrement cnt.
If the highest bit of v is 0, then cnt = 0.
If the highest two bits of v are 110, then cnt = 1.
If the highest three bits of v are 1110, then cnt = 2.
If the highest four bits of v are 11110, then cnt = 3.
Otherwise, return false.

Finally, if cnt = 0, return true, otherwise return false.

The time complexity is O(n), where n is the length of the array data. The space complexity is O(1).

Code

Python Java C++Go TypeScript

Python

Java

C++

TypeScript

Try this approach in the editor →

Complexity Comparison

Approach	Complexity
Approach 1: Iterative Solution	Time Complexity: O(n), where n is the number of elements. Space Complexity: O(1) since we are not using any extra space proportional to the input size.
Approach 2: Recursive Solution	Time Complexity: O(n) Space Complexity: O(n) due to the call stack.
Single Pass	—

Detailed Complexity Analysis

Approach	Time	Space	When to Use
Iterative Bit Pattern Validation	O(n)	O(1)	Best general solution. Single pass validation using bit masks and counters.
Recursive Sequence Validation	O(n)	O(n) worst case recursion	Useful for conceptual parsing of UTF‑8 characters or recursive stream validation.

Video Solution

UTF-8 Validation Leetcode • Venkatesh Thallam • 4,942 views views

Watch 9 more video solutions →

Frequently Asked Questions

Is UTF-8 Validation easy or hard?

UTF-8 Validation is generally classified as a medium difficulty problem. The challenge comes from understanding UTF‑8 byte patterns and implementing precise bit checks. Once the encoding rules are clear, the implementation becomes a straightforward linear scan.

UTF-8 Validation Python/Java solution

Python and Java solutions typically implement the iterative bit-mask approach. In Python, bit shifting and masking are used to detect prefix bits, while Java uses integer bit operations for the same checks. Both versions achieve O(n) time and O(1) space.

How to solve UTF-8 Validation in O(n)?

Traverse the byte array while tracking the number of continuation bytes required. For each leading byte, determine its type using bit masks such as 110, 1110, or 11110. Then ensure the following bytes match the continuation pattern 10xxxxxx. Each element is processed once, giving O(n) time complexity.

What is the best approach for UTF-8 Validation?

The iterative bit‑manipulation approach is the most efficient and widely used solution. It scans the array once, determines how many continuation bytes are required for each leading byte, and verifies their bit pattern. The algorithm runs in O(n) time with O(1) extra space, making it ideal for interviews and production code.

Is UTF-8 Validation asked at Google/Amazon/Meta?

UTF-8 parsing and bit manipulation questions appear in interviews at companies like Google, Amazon, and Meta, especially for backend and systems roles. The problem tests understanding of binary representation, encoding rules, and careful iteration through arrays.

What data structure is used in UTF-8 Validation?

The problem mainly uses arrays and bit manipulation operations. The algorithm iterates through the array and uses bit masks or shifts to inspect leading bits in each byte. No complex data structures are required beyond simple counters.

What is the time complexity of UTF-8 Validation?

UTF-8 Validation can be solved in O(n) time because each byte in the array is examined once. The algorithm only performs constant-time bit checks and counter updates per element. Space complexity is O(1) when implemented iteratively.

Ready to solve this problem?

Practice UTF-8 Validation with our built-in code editor and test cases.

Practice on FleetCode

Two Sum

Median of Two Sorted Arrays

Problem Info

DifficultyMedium

Acceptance46.1%

Approaches3

Reading time15 min

Asked at

Meta Tesla Google Snapchat

Practice this problem

Open in Editor

UTF-8 Validation - Solution & Explanation

Problem Statement

Approach Overview

Approach 1: Approach 1: Iterative Solution

Code

Complexity

Approach 2: Approach 2: Recursive Solution

Code

Complexity

Approach 3: Single Pass

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Ready to solve this problem?

Problem Info

Table of Contents

UTF-8 Validation - Solution & Explanation

Problem Statement

Approach Overview

Approach 1: Approach 1: Iterative Solution

Code

Complexity

Approach 2: Approach 2: Recursive Solution

Code

Complexity

Approach 3: Single Pass

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Ready to solve this problem?

Problem Info

Table of Contents

Problem Statement

Approach Overview

Approach 1: Approach 1: Iterative Solution

Code

Complexity

Approach 2: Approach 2: Recursive Solution

Code

Complexity

Approach 3: Single Pass

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Related Problems

Ready to solve this problem?

Problem Info

Table of Contents

Problem Statement

Approach Overview

Approach 1: Approach 1: Iterative Solution

Code

Complexity

Approach 2: Approach 2: Recursive Solution

Code

Complexity

Approach 3: Single Pass

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Related Problems

Ready to solve this problem?

Problem Info

Table of Contents