Closes #01

2026-05-18 15:31:52 -04:00
parent 269d560847
commit 41be5a2e24
3 changed files with 384 additions and 239 deletions
--- a/analysis/03-Two_Sum_Is_Not_About_Numbers/readme.md
+++ b/analysis/03-Two_Sum_Is_Not_About_Numbers/readme.md
@@ -1,4 +1,6 @@
-# Analysis #XX — Two Sum Is Not About Numbers
+# Analysis #03 — Two Sum Is Not About Numbers
+
+---

 ## Problem

@@ -9,9 +11,10 @@ At first glance, the problem looks trivial:
 This is one of the most well-known interview questions, commonly referred to as **Two Sum**.

 It is simple, clean, and perfectly defined:
- a static array
- exact arithmetic
- a guaranteed answer
+
+- a static array  
+- exact arithmetic  
+- a guaranteed answer  

 And that’s exactly why it works so well in interviews.

@@ -21,10 +24,10 @@ And that’s exactly why it works so well in interviews.

 A candidate is expected to go through a familiar progression:

-1. Start with brute force (O(n²))
-2. Recognize inefficiency
-3. Optimize using a hash map
-4. Achieve O(n) time complexity
+1. Start with brute force (O(n²))  
+2. Recognize inefficiency  
+3. Optimize using a hash map  
+4. Achieve O(n) time complexity  

 ```cpp
 unordered_map<int, int> seen;
@@ -40,7 +43,7 @@ for (int i = 0; i < n; ++i) {
 }
 ```

-The “correct” answer is not about solving the problem.
+The “correct” answer is not really about solving the problem.

 It is about recognizing the pattern.

@@ -50,35 +53,41 @@ It is about recognizing the pattern.

 Despite its simplicity, this problem evaluates:

- familiarity with standard patterns
- ability to choose a data structure
- understanding of time complexity
+- familiarity with standard patterns  
+- ability to choose a data structure  
+- understanding of time complexity  

 But most importantly:

 > it tests whether you have seen this problem before.

+A candidate who has already practiced this family of tasks will likely reach the expected answer quickly.
+
+A candidate who has spent years solving real engineering problems may still pause — not because the problem is hard, but because the interview expects a very specific kind of answer.
+
 ---

 ## A Subtle Shift

-Now let’s take the same idea and move it one step closer to reality.
+Now let’s move the same idea one step closer to reality.

 Instead of numbers, we have **log events**.

 Instead of a static array, we have a **stream**.

-Instead of a clean equality, we have **imperfect data and thresholds**.
+Instead of a clean equality, we have **imperfect data, context, and thresholds**.

 ---

 ## Synthetic Log Example

-```
+```text
 2026-04-16T10:15:01.123Z service=api    event=parse_input   latency=12ms request_id=req-1001
 2026-04-16T10:15:01.130Z service=cache  event=cache_miss    latency=48ms request_id=req-1001
 2026-04-16T10:15:01.135Z service=db     event=read_user     latency=55ms request_id=req-1001
+2026-04-16T10:15:01.141Z service=auth   event=token_check   latency=18ms request_id=req-2001
 2026-04-16T10:15:01.144Z service=net    event=external_call latency=47ms request_id=req-1001
+2026-04-16T10:15:01.149Z service=db     event=read_user     latency=22ms request_id=req-2001
 2026-04-16T10:15:01.151Z service=cache  event=cache_miss    latency=60ms request_id=req-3001
 2026-04-16T10:15:01.154Z service=net    event=external_call latency=52ms request_id=req-3001
 ```
@@ -92,9 +101,9 @@ We are no longer asked to find two numbers.
 Instead, the problem becomes:

 > Detect whether there exist two events:
-> - belonging to the same request
-> - occurring close in time
-> - whose combined latency exceeds a threshold
+> - belonging to the same request  
+> - occurring within a time window  
+> - whose combined latency exceeds a threshold  

 This still *looks* like Two Sum.

@@ -102,60 +111,104 @@ But it is not.

 ---

+## How LeetCode Thinking Tries to Adapt
+
+The first instinct is to simplify.
+
+Take the log stream, ignore most of the structure, extract just the latency values, and reduce everything back to “numbers in an array”.
+
+That leads to a familiar line of thinking:
+
+1. Collect latencies  
+2. Search for matching pairs  
+3. Try to reuse the same hash map pattern  
+4. Treat the task as another variation of Two Sum  
+
+This is exactly what interview training encourages:
+
+> reduce the problem until it matches a known template.
+
+That works beautifully in interviews.
+
+But this is also where the model starts to break.
+
+---
+
 ## Where the Interview Model Breaks

-### 1. No Exact Match
+### 1. It Is Not an Exact-Match Problem

 Interview version:
-```
+
 a + b == target
-```

 Real version:
-```
-a + b > threshold
-```

-We are not searching for a perfect complement.
+a + b > threshold
+
+We are not searching for a perfect complement.  
 We are evaluating a condition.

 ---

-### 2. Context Is Mandatory
+### 2. Context Cannot Be Ignored

-You cannot combine arbitrary events.
+A latency of 55ms from one request and 52ms from another may add up to more than the threshold.

-A latency spike only makes sense **within the same request**.
+But together they mean nothing.

-Without context, the result is meaningless.
+Without context, the result is technically correct — and completely useless.

 ---

-### 3. Time Matters
+### 3. Time Makes the Problem Harder

 Events are not just values — they exist in time.

-Two events five seconds apart may not be related at all.
+Two events may belong to the same request and still be unrelated if they are too far apart in time.

 This introduces:
- time windows
- ordering issues
- temporal constraints
+
+- time windows  
+- ordering  
+- eviction  

 ---

-### 4. Data Is Not Static
+### 4. The Data Is Not Static

-LeetCode assumes:
- full dataset
- already loaded
- perfectly ordered
+Interview assumptions:
+
+- full dataset available  
+- stable ordering  
+- perfect input  

 Reality:
- streaming input
- delayed events
- missing entries
- out-of-order delivery
+
+- streaming data  
+- out-of-order events  
+- missing logs  
+- duplicates  
+
+The “single clean pass over an array” stops being a valid model.
+
+---
+
+### 5. Pattern Matching Becomes a Trap
+
+The more familiar the pattern, the stronger the temptation:
+
+> “This is just Two Sum.”
+
+But in reality:
+
+- request_id defines grouping  
+- timestamp defines relevance  
+- streaming defines constraints  
+
+These are not details.
+
+They are the problem.

 ---

@@ -169,21 +222,22 @@ It becomes:

 > “determine which events are comparable at all”

-And that is a fundamentally different problem.
+The arithmetic is trivial.
+
+The system is not.

 ---

 ## Real Engineering Approach

-Instead of solving a mathematical puzzle, we build a system.
+Instead of solving a puzzle, we build a mechanism.

 ### Core Idea

-Maintain a sliding window of recent events per request.
+Maintain a sliding window of recent events per request_id.

 ### Pseudocode

-```
 for each incoming event:
    bucket = active_events[event.request_id]

@@ -194,7 +248,6 @@ for each incoming event:
            report anomaly

    add event to bucket
-```

 ---

@@ -202,17 +255,52 @@ for each incoming event:

 Now we must deal with:

- bounded memory
- streaming constraints
- time-based eviction
- correlation logic
+- bounded memory  
+- streaming constraints  
+- time-based eviction  
+- request-level grouping  

-And beyond that:
+And then reality hits:

- out-of-order events
- duplicate logs
- partial data
- noise filtering
+- out-of-order events  
+- duplicate logs  
+- partial data  
+- noise  
+
+At this point, the original Two Sum is almost unrecognizable.
+
+---
+
+## Demo
+
+See the example implementation:
+
+- examples/two_sum_logs_demo.cpp
+
+---
+
+## Example Output
+
+Interview-style reduction:
+  combines events from different request_id values → false positive
+
+Streaming solution:
+  finds valid pair within same request and time window
+
+---
+
+## Explanation
+
+The interview-style solution produces a mathematically valid result.
+
+But it mixes unrelated events.
+
+The streaming solution respects:
+
+- request boundaries  
+- time constraints  
+
+Which makes the result meaningful.

 ---

@@ -222,20 +310,20 @@ The difficulty is not in computing a sum.

 The difficulty is in defining:

- what data is valid
- what events belong together
- what “close enough” means
- how the system behaves under imperfect conditions
+- what data is valid  
+- what events belong together  
+- what “close enough” means  
+- how the system behaves under imperfect conditions  

 ---

 ## Key Takeaway

-Two Sum is often presented as a problem about numbers.
+Two Sum is not about numbers.

-In reality, it is a problem about assumptions.
+It is about assumptions.

-Remove those assumptions, and the problem changes completely.
+Remove those assumptions — and the problem changes completely.

 > The challenge is not finding two values.  
 > The challenge is understanding whether those values should ever be compared.
@@ -245,7 +333,14 @@ Remove those assumptions, and the problem changes completely.
 ## Project Perspective

 Exists in real engineering?  
-→ Yes, but as event correlation under constraints
+→ Yes, but as event correlation under constraints  

 Exists in interview form?  
-→ Yes, but stripped of context and complexity
+→ Yes, but stripped of context and complexity  
+
+---
+
+## Final Note
+
+The algorithm was never the hard part.  
+The assumptions were.
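
The per-request sliding window described under "Core Idea" and "Pseudocode" in the diff above can be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions, not the contents of examples/two_sum_logs_demo.cpp, which this diff does not show. The `Event` struct, the `PairDetector` class, and the `window_ms` and `threshold_ms` parameters are all hypothetical names introduced for illustration. The sketch also assumes that events arrive in timestamp order and without duplicates, which is exactly the assumption the readme warns will not hold in production.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical event record; fields mirror the synthetic log lines.
struct Event {
    std::int64_t ts_ms;      // timestamp in milliseconds
    std::string request_id;  // grouping key
    int latency_ms;          // reported latency
};

// Hypothetical detector; all names and parameters are illustrative.
class PairDetector {
public:
    PairDetector(std::int64_t window_ms, int threshold_ms)
        : window_ms_(window_ms), threshold_ms_(threshold_ms) {}

    // Returns true if the incoming event, combined with an earlier event
    // of the same request inside the time window, exceeds the threshold.
    bool observe(const Event& e) {
        std::deque<Event>& bucket = active_[e.request_id];

        // Time-based eviction. Assumes in-order arrival (a simplification).
        while (!bucket.empty() &&
               e.ts_ms - bucket.front().ts_ms > window_ms_) {
            bucket.pop_front();
        }

        // Evaluate a condition (a + b > threshold), not an exact match,
        // so the hash-map complement lookup from Two Sum does not apply.
        bool anomaly = false;
        for (const Event& prev : bucket) {
            if (prev.latency_ms + e.latency_ms > threshold_ms_) {
                anomaly = true;
                break;
            }
        }

        bucket.push_back(e);
        return anomaly;
    }

private:
    std::int64_t window_ms_;
    int threshold_ms_;
    // One bucket of recent events per request_id.
    std::unordered_map<std::string, std::deque<Event>> active_;
};
```

With a 50 ms window and a 100 ms threshold, feeding the synthetic log events in order flags the `read_user` event of req-1001 (48 + 55) and the `external_call` event of req-3001 (60 + 52). The 55 ms event from req-1001 is never combined with the 52 ms event from req-3001. Note that buckets for finished requests are never removed here, so the memory bound mentioned in the readme would still need a separate cleanup policy.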