In this approach, the goal is to group the data by both `date_id` and `make_name`. While iterating over the rows, maintain two sets for each group: one for distinct `lead_id`s and one for distinct `partner_id`s. Because a set stores each value only once, duplicate IDs are ignored automatically. Once all rows have been processed, the size of each set gives the number of unique IDs for that group.
The time complexity is O(n), where n is the number of entries, because each entry is processed once and set insertions take O(1) time on average. The space complexity is O(u), where u is the total number of distinct IDs stored across all (date_id, make_name) groups; in the worst case, when every ID is distinct, this is O(n).
```python
from collections import defaultdict


def count_distinct_ids(data):
    result = []
    # sales[date_id][make_name] -> {'leads': set of lead_ids, 'partners': set of partner_ids}
    sales = defaultdict(lambda: defaultdict(lambda: {'leads': set(), 'partners': set()}))

    # Add every ID to its group's sets; repeated IDs are absorbed by the sets.
    for entry in data:
        date_id, make_name, lead_id, partner_id = entry['date_id'], entry['make_name'], entry['lead_id'], entry['partner_id']
        sales[date_id][make_name]['leads'].add(lead_id)
        sales[date_id][make_name]['partners'].add(partner_id)

    # The size of each set is the distinct count for that (date_id, make_name) group.
    for date_id in sales:
        for make_name in sales[date_id]:
            unique_leads = len(sales[date_id][make_name]['leads'])
            unique_partners = len(sales[date_id][make_name]['partners'])
            result.append({
                'date_id': date_id,
                'make_name': make_name,
                'unique_leads': unique_leads,
                'unique_partners': unique_partners
            })
    return result


# Example usage
data = [
    {'date_id': '2020-12-8', 'make_name': 'toyota', 'lead_id': 0, 'partner_id': 1},
    {'date_id': '2020-12-8', 'make_name': 'toyota', 'lead_id': 1, 'partner_id': 0},
    # ... other data rows ...
]

print(count_distinct_ids(data))
```

In the Python solution, a dictionary of nested dictionaries groups the data by `date_id` and `make_name`. Each combination holds two sets, one for distinct `lead_id`s and one for distinct `partner_id`s, and the size of each set is appended to the result as that group's count.
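For comparison, the two nested levels can be flattened into a single dictionary keyed by the `(date_id, make_name)` tuple. This is a sketch of an alternative layout, not the original solution; `count_distinct_ids_flat` is a hypothetical name, and the last two sample rows are made up only to show a repeated lead and a second make.

```python
from collections import defaultdict


def count_distinct_ids_flat(data):
    # Hypothetical variant: one dict keyed by (date_id, make_name);
    # each value is a pair of sets (lead_ids, partner_ids).
    groups = defaultdict(lambda: (set(), set()))
    for entry in data:
        leads, partners = groups[(entry['date_id'], entry['make_name'])]
        leads.add(entry['lead_id'])
        partners.add(entry['partner_id'])
    # Set sizes are the distinct counts; dicts keep first-seen group order.
    return [
        {'date_id': date_id, 'make_name': make_name,
         'unique_leads': len(leads), 'unique_partners': len(partners)}
        for (date_id, make_name), (leads, partners) in groups.items()
    ]


data = [
    {'date_id': '2020-12-8', 'make_name': 'toyota', 'lead_id': 0, 'partner_id': 1},
    {'date_id': '2020-12-8', 'make_name': 'toyota', 'lead_id': 1, 'partner_id': 0},
    # Hypothetical extra rows for illustration:
    {'date_id': '2020-12-8', 'make_name': 'toyota', 'lead_id': 1, 'partner_id': 2},
    {'date_id': '2020-12-8', 'make_name': 'honda', 'lead_id': 1, 'partner_id': 2},
]
print(count_distinct_ids_flat(data))
```

A tuple key avoids the inner `defaultdict` and makes each group a single lookup; the counts are the same as with the nested layout.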
SQL can efficiently handle this task using aggregation functions. The plan here is to use the GROUP BY clause with COUNT(DISTINCT ...) to find distinct lead and partner counts for each combination of `date_id` and `make_name`.
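The time complexity depends on the database engine and the available indexes. A sort-based plan for grouping and deduplication typically costs O(n log n), while a hash-based plan can be closer to O(n) on average. The space complexity is driven by the intermediate state the engine keeps: one entry per group, O(g) for g distinct (date_id, make_name) groups, plus the distinct values tracked for each COUNT(DISTINCT ...), which can reach O(n) in the worst case.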
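To make the sort-based cost concrete, here is a sketch of grouping and deduplication done purely by sorting. It illustrates the kind of strategy behind the O(n log n) estimate; it is not how any particular database executes the query, and `count_distinct_sorted`, `count_runs`, and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def count_runs(values):
    """Count distinct values in a sorted list by counting where the value changes."""
    return sum(1 for i, v in enumerate(values) if i == 0 or v != values[i - 1])


def count_distinct_sorted(rows):
    """rows are (date_id, make_name, lead_id, partner_id) tuples."""
    result = []
    # Sorting is the O(n log n) step: rows with equal group keys become adjacent.
    for (date_id, make_name), group in groupby(sorted(rows), key=itemgetter(0, 1)):
        group = list(group)
        # Sort each ID column within the group so duplicates sit next to each other.
        unique_leads = count_runs(sorted(r[2] for r in group))
        unique_partners = count_runs(sorted(r[3] for r in group))
        result.append((date_id, make_name, unique_leads, unique_partners))
    return result


rows = [
    ('2020-12-8', 'toyota', 0, 1),
    ('2020-12-8', 'toyota', 1, 0),
    ('2020-12-8', 'toyota', 1, 2),
    ('2020-12-7', 'toyota', 0, 2),
]
print(count_distinct_sorted(rows))
# [('2020-12-7', 'toyota', 1, 1), ('2020-12-8', 'toyota', 2, 3)]
```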
```sql
SELECT date_id,
       make_name,
       COUNT(DISTINCT lead_id)    AS unique_leads,
       COUNT(DISTINCT partner_id) AS unique_partners
FROM DailySales  -- table name assumed; substitute the actual sales table
GROUP BY date_id, make_name;
```

The SQL approach uses the GROUP BY clause to aggregate rows by `date_id` and `make_name`. COUNT(DISTINCT ...) is applied twice to count the unique `lead_id`s and `partner_id`s within each group, so the result is a table with one row of counts per unique date and make.
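One way to try the query is Python's built-in `sqlite3` module with an in-memory database. The table name `DailySales` and the sample rows are assumptions for illustration; an extra `lead_rows` column using plain `COUNT(lead_id)` is included only to show how `DISTINCT` changes the result when a lead repeats within a group.

```python
import sqlite3

# Hypothetical sample rows: lead 1 appears twice for the same date and make.
rows = [
    ('2020-12-8', 'toyota', 0, 1),
    ('2020-12-8', 'toyota', 1, 0),
    ('2020-12-8', 'toyota', 1, 2),
    ('2020-12-7', 'toyota', 0, 2),
]

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
conn.execute('CREATE TABLE DailySales '
             '(date_id TEXT, make_name TEXT, lead_id INTEGER, partner_id INTEGER)')
conn.executemany('INSERT INTO DailySales VALUES (?, ?, ?, ?)', rows)

# COUNT(lead_id) counts every non-NULL row; COUNT(DISTINCT ...) counts each value once.
# ORDER BY only makes the output order deterministic for this demo.
query = '''
SELECT date_id,
       make_name,
       COUNT(lead_id)             AS lead_rows,
       COUNT(DISTINCT lead_id)    AS unique_leads,
       COUNT(DISTINCT partner_id) AS unique_partners
FROM DailySales
GROUP BY date_id, make_name
ORDER BY date_id, make_name
'''
result = conn.execute(query).fetchall()
conn.close()
print(result)
# [('2020-12-7', 'toyota', 1, 1, 1), ('2020-12-8', 'toyota', 3, 2, 3)]
```

For the 2020-12-8 group, three rows produce `lead_rows = 3` but only two distinct leads, which is the count the task asks for.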