second
251
analysis/03-Two_Sum_Is_Not_About_Numbers/readme.md
Normal file
@@ -0,0 +1,251 @@
# Analysis #XX: Two Sum Is Not About Numbers

## Problem

At first glance, the problem looks trivial:

> Given a list of values, find two elements whose sum equals a target.

This is one of the most well-known interview questions, commonly referred to as **Two Sum**.

It is simple, clean, and perfectly defined:
- a static array
- exact arithmetic
- a guaranteed answer

And that’s exactly why it works so well in interviews.

---

## Typical Interview Thinking

A candidate is expected to go through a familiar progression:

1. Start with brute force (O(n²))
2. Recognize inefficiency
3. Optimize using a hash map
4. Achieve O(n) time complexity

```cpp
#include <unordered_map>
#include <vector>

// Returns the indices of two elements that sum to target.
// Returns an empty vector if no such pair exists.
std::vector<int> twoSum(const std::vector<int>& nums, int target) {
    std::unordered_map<int, int> seen;  // value -> index of an earlier element

    for (int i = 0; i < static_cast<int>(nums.size()); ++i) {
        int complement = target - nums[i];

        auto it = seen.find(complement);
        if (it != seen.end()) {
            return {it->second, i};
        }

        seen[nums[i]] = i;
    }
    return {};
}
```
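
For contrast, step 1 of the progression might look like the sketch below. The function name `twoSumBruteForce` and the `{-1, -1}` "not found" convention are choices made for this illustration, not part of the original problem statement.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Brute-force baseline: test every pair (i, j) with i < j.
// O(n^2) time, O(1) extra space.
// Returns {-1, -1} when no pair sums to target (a convention of this sketch).
std::pair<int, int> twoSumBruteForce(const std::vector<int>& nums, int target) {
    for (std::size_t i = 0; i < nums.size(); ++i) {
        for (std::size_t j = i + 1; j < nums.size(); ++j) {
            if (nums[i] + nums[j] == target) {
                return {static_cast<int>(i), static_cast<int>(j)};
            }
        }
    }
    return {-1, -1};
}
```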

The “correct” answer is not about solving the problem.

It is about recognizing the pattern.

---

## What This Actually Tests

Despite its simplicity, this problem evaluates:

- familiarity with standard patterns
- ability to choose a data structure
- understanding of time complexity

But most importantly:

> it tests whether you have seen this problem before.

---

## A Subtle Shift

Now let’s take the same idea and move it one step closer to reality.

Instead of numbers, we have **log events**.

Instead of a static array, we have a **stream**.

Instead of a clean equality, we have **imperfect data and thresholds**.

---

## Synthetic Log Example

```
2026-04-16T10:15:01.123Z service=api event=parse_input latency=12ms request_id=req-1001
2026-04-16T10:15:01.130Z service=cache event=cache_miss latency=48ms request_id=req-1001
2026-04-16T10:15:01.135Z service=db event=read_user latency=55ms request_id=req-1001
2026-04-16T10:15:01.144Z service=net event=external_call latency=47ms request_id=req-1001
2026-04-16T10:15:01.151Z service=cache event=cache_miss latency=60ms request_id=req-3001
2026-04-16T10:15:01.154Z service=net event=external_call latency=52ms request_id=req-3001
```
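
Before any correlation can happen, lines like these have to become structured records. The sketch below is one possible way to do that in C++17. The `Event` struct, its field names, and `parseLine` are hypothetical, and for brevity the timestamp is reduced to milliseconds since midnight, which assumes all events fall on the same UTC day and that latency is always given in milliseconds.

```cpp
#include <cstdio>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record type for this sketch; fields mirror the log keys.
struct Event {
    long long time_ms = 0;   // milliseconds since midnight UTC (date ignored)
    std::string service;
    std::string name;        // value of "event="
    int latency_ms = 0;
    std::string request_id;
};

// Parses one line in the format shown above.
// Returns std::nullopt when the timestamp, latency, or request_id is missing.
std::optional<Event> parseLine(const std::string& line) {
    std::istringstream in(line);
    std::string token;
    if (!(in >> token)) return std::nullopt;

    int h = 0, m = 0, s = 0, ms = 0;
    // %*d skips year, month, and day; only the time of day is kept.
    if (std::sscanf(token.c_str(), "%*d-%*d-%*dT%d:%d:%d.%dZ", &h, &m, &s, &ms) != 4) {
        return std::nullopt;
    }

    Event e;
    e.time_ms = ((h * 60LL + m) * 60 + s) * 1000 + ms;
    bool has_latency = false;

    while (in >> token) {
        auto eq = token.find('=');
        if (eq == std::string::npos) continue;  // ignore tokens without '='
        std::string key = token.substr(0, eq);
        std::string value = token.substr(eq + 1);

        if (key == "service") {
            e.service = value;
        } else if (key == "event") {
            e.name = value;
        } else if (key == "request_id") {
            e.request_id = value;
        } else if (key == "latency") {
            // std::stoi stops at the first non-digit, so "48ms" yields 48.
            try {
                e.latency_ms = std::stoi(value);
                has_latency = true;
            } catch (const std::exception&) {
                return std::nullopt;
            }
        }
    }

    if (e.request_id.empty() || !has_latency) return std::nullopt;
    return e;
}
```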

---

## Real Problem

We are no longer asked to find two numbers.

Instead, the problem becomes:

> Detect whether there exist two events:
> - belonging to the same request
> - occurring close in time
> - whose combined latency exceeds a threshold

This still *looks* like Two Sum.

But it is not.
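
Written as a predicate over two events, the three conditions might look like the sketch below. The `Event` struct is redeclared in minimal form so the snippet compiles on its own. `Limits`, `isSuspiciousPair`, and the idea of expressing "close in time" as a fixed millisecond window are assumptions of this illustration; the text does not prescribe concrete values.

```cpp
#include <string>

// Minimal event record for this sketch.
struct Event {
    std::string request_id;
    long long time_ms;   // event timestamp in milliseconds
    int latency_ms;
};

// Hypothetical tuning knobs; the concrete values are up to the system owner.
struct Limits {
    long long window_ms;        // what "close in time" means
    int latency_threshold_ms;   // what "too slow together" means
};

// True only if all three conditions from the problem statement hold.
bool isSuspiciousPair(const Event& a, const Event& b, const Limits& limits) {
    // 1. Same request: events from different requests are never comparable.
    if (a.request_id != b.request_id) return false;

    // 2. Close in time: the absolute gap must fit inside the window.
    long long gap = a.time_ms > b.time_ms ? a.time_ms - b.time_ms
                                          : b.time_ms - a.time_ms;
    if (gap > limits.window_ms) return false;

    // 3. Combined latency exceeds the threshold (an inequality, not an equality).
    return a.latency_ms + b.latency_ms > limits.latency_threshold_ms;
}
```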

---

## Where the Interview Model Breaks

### 1. No Exact Match

Interview version:
```
a + b == target
```

Real version:
```
a + b > threshold
```

We are not searching for a perfect complement.
We are evaluating a condition.
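
One consequence is that the hash-map lookup from the interview solution no longer applies: there is no single complement to look up. As a rough illustration that deliberately ignores request and time constraints, an existence check for `a + b > threshold` over plain numbers only needs the largest value seen so far. The function name `anyPairExceeds` is made up for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns true if any two distinct elements sum to more than threshold.
// Tracks the running maximum instead of looking up an exact complement.
bool anyPairExceeds(const std::vector<int>& latencies, int threshold) {
    if (latencies.size() < 2) return false;

    int max_seen = latencies[0];
    for (std::size_t i = 1; i < latencies.size(); ++i) {
        // If the current value does not exceed the threshold together with
        // the largest earlier value, it cannot do so with any earlier value.
        if (latencies[i] + max_seen > threshold) return true;
        max_seen = std::max(max_seen, latencies[i]);
    }
    return false;
}
```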

---

### 2. Context Is Mandatory

You cannot combine arbitrary events.

A latency spike only makes sense **within the same request**.

Without context, the result is meaningless.

---

### 3. Time Matters

Events are not just values: they exist in time.

Two events five seconds apart may not be related at all.

This introduces:
- time windows
- ordering issues
- temporal constraints

---

### 4. Data Is Not Static

LeetCode assumes:
- full dataset
- already loaded
- perfectly ordered

Reality:
- streaming input
- delayed events
- missing entries
- out-of-order delivery
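
One common way to cope with delayed and out-of-order delivery is a small reorder buffer: hold events briefly and release them in timestamp order once they are older than the newest timestamp seen minus an allowed lateness. The sketch below is only an illustration of that idea, not something the text prescribes. `ReorderBuffer` and `max_lateness_ms` are hypothetical names, and the buffer carries bare timestamps instead of full events to stay short. Events that arrive later than the allowed lateness are still emitted, just out of order.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <vector>

// Buffers timestamps and releases them in ascending order once they are at
// least max_lateness_ms older than the newest timestamp seen so far.
class ReorderBuffer {
public:
    explicit ReorderBuffer(long long max_lateness_ms)
        : max_lateness_ms_(max_lateness_ms) {}

    // Adds one timestamp and returns every buffered timestamp now safe to emit.
    std::vector<long long> push(long long time_ms) {
        heap_.push(time_ms);
        if (time_ms > newest_ms_) newest_ms_ = time_ms;

        std::vector<long long> ready;
        while (!heap_.empty() && heap_.top() <= newest_ms_ - max_lateness_ms_) {
            ready.push_back(heap_.top());
            heap_.pop();
        }
        return ready;
    }

private:
    long long max_lateness_ms_;
    long long newest_ms_ = std::numeric_limits<long long>::min();
    // Min-heap: the oldest buffered timestamp is always on top.
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>> heap_;
};
```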

---

## What the Problem Really Becomes

At this point, the challenge is no longer:

> “find two numbers”

It becomes:

> “determine which events are comparable at all”

And that is a fundamentally different problem.

---

## Real Engineering Approach

Instead of solving a mathematical puzzle, we build a system.

### Core Idea

Maintain a sliding window of recent events per request.

### Pseudocode

```
for each incoming event:
    bucket = active_events[event.request_id]

    remove events outside time window

    for each old_event in bucket:
        if event.latency + old_event.latency > threshold:
            report anomaly

    add event to bucket
```
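
The pseudocode above could be turned into C++ roughly as follows. This is a minimal sketch under stated assumptions: `PairCorrelator`, `onEvent`, and the minimal `Event` struct are hypothetical names, events for a given request are assumed to arrive in non-decreasing time order, and buckets of finished requests are never removed, so memory is bounded per request but not across requests.

```cpp
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal event record for this sketch.
struct Event {
    std::string request_id;
    long long time_ms;
    int latency_ms;
};

class PairCorrelator {
public:
    PairCorrelator(long long window_ms, int threshold_ms)
        : window_ms_(window_ms), threshold_ms_(threshold_ms) {}

    // Processes one event and returns every (older, newer) pair within the
    // window whose combined latency exceeds the threshold.
    std::vector<std::pair<Event, Event>> onEvent(const Event& event) {
        std::deque<Event>& bucket = active_[event.request_id];

        // Time-based eviction: drop events that fell out of the window.
        while (!bucket.empty() &&
               event.time_ms - bucket.front().time_ms > window_ms_) {
            bucket.pop_front();
        }

        // Compare the new event against everything still in the window.
        std::vector<std::pair<Event, Event>> anomalies;
        for (const Event& old_event : bucket) {
            if (event.latency_ms + old_event.latency_ms > threshold_ms_) {
                anomalies.emplace_back(old_event, event);
            }
        }

        bucket.push_back(event);
        return anomalies;
    }

private:
    long long window_ms_;
    int threshold_ms_;
    std::unordered_map<std::string, std::deque<Event>> active_;
};
```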

---

## What This Introduces

Now we must deal with:

- bounded memory
- streaming constraints
- time-based eviction
- correlation logic

And beyond that:

- out-of-order events
- duplicate logs
- partial data
- noise filtering
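
As one small example from the second list, duplicate logs could be filtered before correlation by remembering which events have already been seen. The sketch below assumes that a duplicate is a line with the same request ID, event name, and timestamp; that definition, along with the `Deduplicator` name, is an assumption of this illustration. The set also grows without bound here, so a real system would need to expire old keys along with the time window.

```cpp
#include <string>
#include <unordered_set>

class Deduplicator {
public:
    // Returns true the first time a (request_id, event name, timestamp)
    // triple is seen, and false for every repeat.
    bool isFirstOccurrence(const std::string& request_id,
                           const std::string& event_name,
                           long long time_ms) {
        std::string key = request_id + '|' + event_name + '|' + std::to_string(time_ms);
        return seen_.insert(key).second;
    }

private:
    std::unordered_set<std::string> seen_;  // unbounded in this sketch
};
```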

---

## The Real Insight

The difficulty is not in computing a sum.

The difficulty is in defining:

- what data is valid
- what events belong together
- what “close enough” means
- how the system behaves under imperfect conditions

---

## Key Takeaway

Two Sum is often presented as a problem about numbers.

In reality, it is a problem about assumptions.

Remove those assumptions, and the problem changes completely.

> The challenge is not finding two values.
> The challenge is understanding whether those values should ever be compared.

---

## Project Perspective

Exists in real engineering?
→ Yes, but as event correlation under constraints

Exists in interview form?
→ Yes, but stripped of context and complexity