#2506 Count Pairs Of Similar Strings - Solution

You are given a 0-indexed string array words.

Two strings are similar if they consist of the same characters.

For example, "abca" and "cba" are similar since both consist of characters 'a', 'b', and 'c'.
However, "abacba" and "bcfd" are not similar since they do not consist of the same characters.

Return the number of pairs (i, j) such that 0 <= i < j <= word.length - 1 and the two strings words[i] and words[j] are similar.

Example 1:

Input: words = ["aba","aabb","abcd","bac","aabc"]
Output: 2
Explanation: There are 2 pairs that satisfy the conditions:
- i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'. 
- i = 3 and j = 4 : both words[3] and words[4] only consist of characters 'a', 'b', and 'c'.

Example 2:

Input: words = ["aabb","ab","ba"]
Output: 3
Explanation: There are 3 pairs that satisfy the conditions:
- i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'. 
- i = 0 and j = 2 : both words[0] and words[2] only consist of characters 'a' and 'b'.
- i = 1 and j = 2 : both words[1] and words[2] only consist of characters 'a' and 'b'.

Example 3:

Input: words = ["nba","cba","dba"]
Output: 0
Explanation: Since there does not exist any pair that satisfies the conditions, we return 0.

Constraints:

1 <= words.length <= 100
1 <= words[i].length <= 100
words[i] consist of only lowercase English letters.

In #2506 Count Pairs Of Similar Strings, two strings are considered similar if they contain the same set of characters, regardless of order or frequency. The key idea is to transform each word into a representation that captures its unique characters.

A common approach is to build a bitmask for every string. Since there are only 26 lowercase letters, each character can map to a specific bit in an integer. For every character in the word, set its corresponding bit using bit manipulation. Words that share the same set of characters will produce the same bitmask.

After generating these masks, use a hash table to count how many times each mask appears. For every mask that appears k times, the number of valid pairs is k * (k - 1) / 2. This avoids comparing every pair of strings and keeps the solution efficient.

This approach reduces repeated comparisons and ensures near-linear performance with respect to the total number of characters processed.