2026-05-18 15:31:52 -04:00
parent 269d560847
commit 41be5a2e24
3 changed files with 384 additions and 239 deletions

@@ -1,4 +1,6 @@
# Analysis #XX — Two Sum Is Not About Numbers
# Analysis #03 — Two Sum Is Not About Numbers
---
## Problem
@@ -9,9 +11,10 @@ At first glance, the problem looks trivial:
This is one of the most well-known interview questions, commonly referred to as **Two Sum**.
It is simple, clean, and perfectly defined:
- a static array
- exact arithmetic
- a guaranteed answer
- a static array
- exact arithmetic
- a guaranteed answer
And that's exactly why it works so well in interviews.
@@ -21,10 +24,10 @@ And that's exactly why it works so well in interviews.
A candidate is expected to go through a familiar progression:
1. Start with brute force (O(n²))
2. Recognize inefficiency
3. Optimize using a hash map
4. Achieve O(n) time complexity
1. Start with brute force (O(n²))
2. Recognize inefficiency
3. Optimize using a hash map
4. Achieve O(n) time complexity
```cpp
unordered_map<int, int> seen;
@@ -40,7 +43,7 @@ for (int i = 0; i < n; ++i) {
}
```
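For reference, the hash map pattern in step 3 can be sketched as a self-contained function. This is a minimal illustration, not the article's exact code: the name `two_sum`, the `{-1, -1}` "not found" result, and the assumption that `target - nums[i]` does not overflow `int` are all choices made for this example.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Returns indices (i, j) with i < j such that nums[i] + nums[j] == target,
// or {-1, -1} if no such pair exists.
// Assumption: values are small enough that target - nums[i] fits in int.
std::pair<int, int> two_sum(const std::vector<int>& nums, int target) {
    std::unordered_map<int, int> seen;  // value -> index where it was seen
    for (int i = 0; i < static_cast<int>(nums.size()); ++i) {
        const int complement = target - nums[i];
        const auto it = seen.find(complement);
        if (it != seen.end()) {
            return std::make_pair(it->second, i);
        }
        seen[nums[i]] = i;  // remember this value for later elements
    }
    return std::make_pair(-1, -1);
}
```

Each element is looked up and inserted once, which gives the expected O(n) average time at the cost of O(n) extra memory.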
The “correct” answer is not about solving the problem.
The “correct” answer is not really about solving the problem.
It is about recognizing the pattern.
@@ -50,35 +53,41 @@ It is about recognizing the pattern.
Despite its simplicity, this problem evaluates:
- familiarity with standard patterns
- ability to choose a data structure
- understanding of time complexity
- familiarity with standard patterns
- ability to choose a data structure
- understanding of time complexity
But most importantly:
> it tests whether you have seen this problem before.
A candidate who has already practiced this family of tasks will likely reach the expected answer quickly.
A candidate who has spent years solving real engineering problems may still pause — not because the problem is hard, but because the interview expects a very specific kind of answer.
---
## A Subtle Shift
Now let's take the same idea and move it one step closer to reality.
Now let's move the same idea one step closer to reality.
Instead of numbers, we have **log events**.
Instead of a static array, we have a **stream**.
Instead of a clean equality, we have **imperfect data and thresholds**.
Instead of a clean equality, we have **imperfect data, context, and thresholds**.
---
## Synthetic Log Example
```
```text
2026-04-16T10:15:01.123Z service=api event=parse_input latency=12ms request_id=req-1001
2026-04-16T10:15:01.130Z service=cache event=cache_miss latency=48ms request_id=req-1001
2026-04-16T10:15:01.135Z service=db event=read_user latency=55ms request_id=req-1001
2026-04-16T10:15:01.141Z service=auth event=token_check latency=18ms request_id=req-2001
2026-04-16T10:15:01.144Z service=net event=external_call latency=47ms request_id=req-1001
2026-04-16T10:15:01.149Z service=db event=read_user latency=22ms request_id=req-2001
2026-04-16T10:15:01.151Z service=cache event=cache_miss latency=60ms request_id=req-3001
2026-04-16T10:15:01.154Z service=net event=external_call latency=52ms request_id=req-3001
```
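Before any correlation can happen, each line has to become a structured event. The sketch below shows one possible way to parse the format above. The names `LogEvent` and `parse_log_line` are hypothetical, and it assumes three-digit milliseconds, a `latency` value that starts with an integer, and events that all fall on the same day (the date is ignored).

```cpp
#include <cstddef>
#include <cstdio>
#include <exception>
#include <sstream>
#include <string>

struct LogEvent {
    long long time_ms = 0;  // milliseconds since midnight (date ignored)
    std::string service;
    std::string event;
    int latency_ms = 0;
    std::string request_id;
};

// Parses one log line into `out`. Returns false if the timestamp,
// latency, or request_id is missing or malformed.
bool parse_log_line(const std::string& line, LogEvent& out) {
    std::istringstream in(line);
    std::string ts;
    if (!(in >> ts)) return false;

    // The date part is skipped with %*d; the trailing 'Z' is not checked.
    int hour = 0, minute = 0, second = 0, millis = 0;
    if (std::sscanf(ts.c_str(), "%*d-%*d-%*dT%d:%d:%d.%d",
                    &hour, &minute, &second, &millis) != 4) {
        return false;
    }
    out.time_ms = ((hour * 60LL + minute) * 60 + second) * 1000 + millis;

    bool has_latency = false;
    bool has_request = false;
    std::string token;
    while (in >> token) {
        const std::size_t eq = token.find('=');
        if (eq == std::string::npos) continue;  // ignore tokens without '='
        const std::string key = token.substr(0, eq);
        const std::string value = token.substr(eq + 1);
        if (key == "service") {
            out.service = value;
        } else if (key == "event") {
            out.event = value;
        } else if (key == "request_id") {
            out.request_id = value;
            has_request = true;
        } else if (key == "latency") {
            try {
                // std::stoi stops at the first non-digit, so "48ms" -> 48.
                out.latency_ms = std::stoi(value);
                has_latency = true;
            } catch (const std::exception&) {
                return false;
            }
        }
    }
    return has_latency && has_request;
}
```

Returning `false` on malformed input instead of throwing keeps the caller in control of what to do with partial or noisy lines.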
@@ -92,9 +101,9 @@ We are no longer asked to find two numbers.
Instead, the problem becomes:
> Detect whether there exist two events:
> - belonging to the same request
> - occurring close in time
> - whose combined latency exceeds a threshold
> - belonging to the same request
> - occurring within a time window
> - whose combined latency exceeds a threshold
This still *looks* like Two Sum.
@@ -102,60 +111,104 @@ But it is not.
---
## How LeetCode Thinking Tries to Adapt
The first instinct is to simplify.
Take the log stream, ignore most of the structure, extract just the latency values, and reduce everything back to “numbers in an array”.
That leads to a familiar line of thinking:
1. Collect latencies
2. Search for matching pairs
3. Try to reuse the same hash map pattern
4. Treat the task as another variation of Two Sum
This is exactly what interview training encourages:
> reduce the problem until it matches a known template.
That works beautifully in interviews.
But this is also where the model starts to break.
---
## Where the Interview Model Breaks
### 1. No Exact Match
### 1. It Is Not an Exact-Match Problem
Interview version:
```
a + b == target
```
Real version:
```
a + b > threshold
```
We are not searching for a perfect complement.
a + b > threshold
We are not searching for a perfect complement.
We are evaluating a condition.
---
### 2. Context Is Mandatory
### 2. Context Cannot Be Ignored
You cannot combine arbitrary events.
A latency of 55ms from one request and 52ms from another may exceed the threshold.
A latency spike only makes sense **within the same request**.
But together they mean nothing.
Without context, the result is meaningless.
Without context, the result is technically correct — and completely useless.
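
The difference can be made concrete with two brute-force checks. This is an illustrative sketch: the `Event` struct, both function names, and the 100 ms threshold used in the note below are assumptions for the example, not values taken from the article.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Event {
    std::string request_id;
    int latency_ms;
};

// Interview-style reduction: events are treated as bare numbers.
bool any_pair_exceeds(const std::vector<Event>& events, int threshold_ms) {
    for (std::size_t i = 0; i < events.size(); ++i) {
        for (std::size_t j = i + 1; j < events.size(); ++j) {
            if (events[i].latency_ms + events[j].latency_ms > threshold_ms) {
                return true;
            }
        }
    }
    return false;
}

// Context-aware check: only events of the same request are comparable.
bool same_request_pair_exceeds(const std::vector<Event>& events,
                               int threshold_ms) {
    for (std::size_t i = 0; i < events.size(); ++i) {
        for (std::size_t j = i + 1; j < events.size(); ++j) {
            if (events[i].request_id == events[j].request_id &&
                events[i].latency_ms + events[j].latency_ms > threshold_ms) {
                return true;
            }
        }
    }
    return false;
}
```

Given one 55 ms event from `req-1001` and one 52 ms event from `req-3001`, with a threshold of 100 ms, the first function reports a match and the second does not. The only change is a single equality check on `request_id`.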
---
### 3. Time Matters
### 3. Time Makes the Problem Harder
Events are not just values — they exist in time.
Two events five seconds apart may not be related at all.
Two events may belong to the same request and still be unrelated if they are too far apart.
This introduces:
- time windows
- ordering issues
- temporal constraints
- time windows
- ordering
- eviction
---
### 4. Data Is Not Static
### 4. The Data Is Not Static
LeetCode assumes:
- full dataset
- already loaded
- perfectly ordered
Interview assumptions:
- full dataset available
- stable ordering
- perfect input
Reality:
- streaming input
- delayed events
- missing entries
- out-of-order delivery
- streaming data
- out-of-order events
- missing logs
- duplicates
The “single clean pass over an array” stops being a valid model.
---
### 5. Pattern Matching Becomes a Trap
The more familiar the pattern, the stronger the temptation:
> “This is just Two Sum.”
But in reality:
- request_id defines grouping
- timestamp defines relevance
- streaming defines constraints
These are not details.
They are the problem.
---
@@ -169,21 +222,22 @@ It becomes:
> “determine which events are comparable at all”
And that is a fundamentally different problem.
The arithmetic is trivial.
The system is not.
---
## Real Engineering Approach
Instead of solving a mathematical puzzle, we build a system.
Instead of solving a puzzle, we build a mechanism.
### Core Idea
Maintain a sliding window of recent events per request.
Maintain a sliding window per request_id.
### Pseudocode
```
for each incoming event:
bucket = active_events[event.request_id]
@@ -194,7 +248,6 @@ for each incoming event:
report anomaly
add event to bucket
```
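
One way to turn this pseudocode into C++ is sketched below. The steps hidden by the diff are filled in as an assumption (evict expired events, then compare the new event against what remains). The class name `PairLatencyDetector`, the millisecond timestamps, and the requirement that events for a given request arrive in timestamp order are all choices made for this example.

```cpp
#include <deque>
#include <string>
#include <unordered_map>

struct Event {
    std::string request_id;
    long long time_ms;  // event timestamp in milliseconds
    int latency_ms;
};

class PairLatencyDetector {
public:
    PairLatencyDetector(long long window_ms, int threshold_ms)
        : window_ms_(window_ms), threshold_ms_(threshold_ms) {}

    // Processes one event. Returns true if it forms a pair with an earlier
    // event of the same request, inside the time window, whose combined
    // latency exceeds the threshold.
    // Assumption: events of one request arrive in timestamp order.
    bool on_event(const Event& e) {
        std::deque<Event>& bucket = active_[e.request_id];

        // Evict events that fell out of the time window.
        while (!bucket.empty() &&
               e.time_ms - bucket.front().time_ms > window_ms_) {
            bucket.pop_front();
        }

        // Compare the new event with every event still in the window.
        bool anomaly = false;
        for (const Event& prev : bucket) {
            if (prev.latency_ms + e.latency_ms > threshold_ms_) {
                anomaly = true;
                break;
            }
        }

        bucket.push_back(e);
        return anomaly;
    }

private:
    long long window_ms_;
    int threshold_ms_;
    std::unordered_map<std::string, std::deque<Event>> active_;
};
```

This sketch never removes a request's bucket once the request is finished, so memory still grows with the number of distinct `request_id` values. It also does not handle out-of-order or duplicate events.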
---
@@ -202,17 +255,52 @@ for each incoming event:
Now we must deal with:
- bounded memory
- streaming constraints
- time-based eviction
- correlation logic
- bounded memory
- streaming constraints
- time-based eviction
- request-level grouping
And beyond that:
And then reality hits:
- out-of-order events
- duplicate logs
- partial data
- noise filtering
- out-of-order events
- duplicate logs
- partial data
- noise
At this point, the original Two Sum is almost unrecognizable.
---
## Demo
See example implementation:
- examples/two_sum_logs_demo.cpp
---
## Example Output
Interview-style reduction:
combines events from different request_id → false positive
Streaming solution:
finds valid pair within same request and time window
---
## Explanation
The interview-style solution produces a mathematically valid result.
But it mixes unrelated events.
The streaming solution respects:
- request boundaries
- time constraints
Which makes the result meaningful.
---
@@ -222,20 +310,20 @@ The difficulty is not in computing a sum.
The difficulty is in defining:
- what data is valid
- what events belong together
- what “close enough” means
- how the system behaves under imperfect conditions
- what data is valid
- what events belong together
- what “close enough” means
- how the system behaves under imperfect conditions
---
## Key Takeaway
Two Sum is often presented as a problem about numbers.
Two Sum is not about numbers.
In reality, it is a problem about assumptions.
It is about assumptions.
Remove those assumptions, and the problem changes completely.
Remove those assumptions and the problem changes completely.
> The challenge is not finding two values.
> The challenge is understanding whether those values should ever be compared.
@@ -245,7 +333,14 @@ Remove those assumptions, and the problem changes completely.
## Project Perspective
Exists in real engineering?
→ Yes, but as event correlation under constraints
→ Yes, but as event correlation under constraints
Exists in interview form?
→ Yes, but stripped of context and complexity
→ Yes, but stripped of context and complexity
---
## Final Note
The algorithm was never the hard part.
The assumptions were.