second

2026-04-16 20:41:31 -04:00
parent 9ad604196e
commit 269d560847
1 changed files with 251 additions and 0 deletions
--- a/analysis/03-Two_Sum_Is_Not_About_Numbers/readme.md
+++ b/analysis/03-Two_Sum_Is_Not_About_Numbers/readme.md
@@ -0,0 +1,251 @@
 # Analysis #XX — Two Sum Is Not About Numbers
 ## Problem
 At first glance, the problem looks trivial:
 > Given a list of values, find two elements whose sum equals a target.
 This is one of the most well-known interview questions, commonly referred to as **Two Sum**.
 It is simple, clean, and perfectly defined:
 - a static array
 - exact arithmetic
 - a guaranteed answer
 And that’s exactly why it works so well in interviews.
 ---
 ## Typical Interview Thinking
 A candidate is expected to go through a familiar progression:
 1. Start with brute force (O(n²))
 2. Recognize inefficiency
 3. Optimize using a hash map
 4. Achieve O(n) time complexity
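 ```cpp
 #include <unordered_map>
 #include <vector>
 // Returns the indices of two elements that sum to target, or {} if no such pair exists.
 std::vector<int> twoSum(const std::vector<int>& nums, int target) {
     std::unordered_map<int, int> seen;  // value -> index where it was first stored
     for (int i = 0; i < static_cast<int>(nums.size()); ++i) {
         int complement = target - nums[i];
         auto it = seen.find(complement);
         if (it != seen.end()) {
             return {it->second, i};
         }
         seen[nums[i]] = i;
     }
     return {};  // no pair found
 }
 ```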
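 For comparison, the brute-force starting point from step 1 might look like the sketch below. The function name `twoSumBruteForce` and the `{-1, -1}` "not found" convention are illustrative choices, not part of the original problem statement.
 ```cpp
 #include <cstddef>
 #include <utility>
 #include <vector>
 // Hypothetical helper: checks every pair (i, j) with i < j.
 // O(n^2) time, O(1) extra space. Returns {-1, -1} when no pair sums to target.
 std::pair<int, int> twoSumBruteForce(const std::vector<int>& nums, int target) {
     for (std::size_t i = 0; i < nums.size(); ++i) {
         for (std::size_t j = i + 1; j < nums.size(); ++j) {
             if (nums[i] + nums[j] == target) {
                 return {static_cast<int>(i), static_cast<int>(j)};
             }
         }
     }
     return {-1, -1};
 }
 ```
 The hash map version trades O(n) extra memory for a single pass over the input.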
 The “correct” answer is not about solving the problem.
 It is about recognizing the pattern.
 ---
 ## What This Actually Tests
 Despite its simplicity, this problem evaluates:
 - familiarity with standard patterns
 - ability to choose a data structure
 - understanding of time complexity
 But most importantly:
 > it tests whether you have seen this problem before.
 ---
 ## A Subtle Shift
 Now let’s take the same idea and move it one step closer to reality.
 Instead of numbers, we have **log events**.
 Instead of a static array, we have a **stream**.
 Instead of a clean equality, we have **imperfect data and thresholds**.
 ---
 ## Synthetic Log Example
 ```
 2026-04-16T10:15:01.123Z service=api    event=parse_input   latency=12ms request_id=req-1001
 2026-04-16T10:15:01.130Z service=cache  event=cache_miss    latency=48ms request_id=req-1001
 2026-04-16T10:15:01.135Z service=db     event=read_user     latency=55ms request_id=req-1001
 2026-04-16T10:15:01.144Z service=net    event=external_call latency=47ms request_id=req-1001
 2026-04-16T10:15:01.151Z service=cache  event=cache_miss    latency=60ms request_id=req-3001
 2026-04-16T10:15:01.154Z service=net    event=external_call latency=52ms request_id=req-3001
 ```
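 Before lines like these can be correlated, each one has to become a structured record. The following is a minimal sketch, assuming exactly the `key=value` layout shown above and a C++17 compiler. The `LogEvent` struct and `parseLogLine` function are hypothetical names, and the timestamp is kept as text to avoid committing to a particular time library.
 ```cpp
 #include <optional>
 #include <sstream>
 #include <string>
 // Hypothetical record type for one log line; field names mirror the keys in the sample.
 struct LogEvent {
     std::string timestamp;   // ISO 8601 text, left unparsed in this sketch
     std::string service;
     std::string event;
     int latency_ms = 0;
     std::string request_id;
 };
 // Parses "<timestamp> key=value ...". Returns nullopt if the latency or request_id
 // is missing, or if the latency is not a short run of digits followed by "ms".
 std::optional<LogEvent> parseLogLine(const std::string& line) {
     std::istringstream in(line);
     LogEvent ev;
     if (!(in >> ev.timestamp)) return std::nullopt;
     bool has_latency = false;
     std::string token;
     while (in >> token) {
         const auto eq = token.find('=');
         if (eq == std::string::npos) continue;  // skip tokens that are not key=value
         const std::string key = token.substr(0, eq);
         const std::string value = token.substr(eq + 1);
         if (key == "service") {
             ev.service = value;
         } else if (key == "event") {
             ev.event = value;
         } else if (key == "request_id") {
             ev.request_id = value;
         } else if (key == "latency") {
             if (value.size() < 3 || value.compare(value.size() - 2, 2, "ms") != 0) {
                 return std::nullopt;
             }
             const std::string digits = value.substr(0, value.size() - 2);
             // At most 9 digits so std::stoi cannot overflow a 32-bit int.
             if (digits.size() > 9 ||
                 digits.find_first_not_of("0123456789") != std::string::npos) {
                 return std::nullopt;
             }
             ev.latency_ms = std::stoi(digits);
             has_latency = true;
         }
     }
     if (!has_latency || ev.request_id.empty()) return std::nullopt;
     return ev;
 }
 ```
 Returning `std::optional` rather than throwing reflects the premise of this analysis: malformed or partial lines are expected input, not exceptional input.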
 ---
 ## Real Problem
 We are no longer asked to find two numbers.
 Instead, the problem becomes:
 > Detect whether there exist two events:
 > - belonging to the same request
 > - occurring close in time
 > - whose combined latency exceeds a threshold
 This still *looks* like Two Sum.
 But it is not.
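 Writing the three conditions as a single predicate makes the difference from an equality check concrete. This is only a sketch: the `Event` struct, the millisecond timestamps, and the `max_gap_ms` and `threshold_ms` parameters are assumptions, since the problem statement does not say what "close" means or what the threshold is.
 ```cpp
 #include <cstdint>
 #include <cstdlib>
 #include <string>
 // Hypothetical event type; ts_ms is milliseconds since an arbitrary epoch.
 struct Event {
     std::string request_id;
     std::int64_t ts_ms;
     int latency_ms;
 };
 // True when a and b satisfy all three conditions from the problem statement.
 bool isSuspiciousPair(const Event& a, const Event& b,
                       std::int64_t max_gap_ms, int threshold_ms) {
     const bool same_request = a.request_id == b.request_id;
     const bool close_in_time = std::llabs(a.ts_ms - b.ts_ms) <= max_gap_ms;
     const bool over_threshold = a.latency_ms + b.latency_ms > threshold_ms;
     return same_request && close_in_time && over_threshold;
 }
 ```
 Two of the three checks have nothing to do with arithmetic, and both depend on parameters that someone has to choose.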
 ---
 ## Where the Interview Model Breaks
 ### 1. No Exact Match
 Interview version:
 ```
 a + b == target
 ```
 Real version:
 ```
 a + b > threshold
 ```
 We are not searching for a perfect complement.
 We are evaluating a condition.
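 One consequence is that the hash map lookup no longer applies: there is no single complement key to search for, because any sufficiently large partner will do. For the bare existence question on a plain list of latencies, a sketch like the one below works instead, by tracking the two largest values. The function name and the plain vector input are assumptions for illustration, and the sketch deliberately ignores request and time constraints.
 ```cpp
 #include <algorithm>
 #include <cstddef>
 #include <vector>
 // Hypothetical helper: does any pair of distinct elements sum to more than threshold?
 // Some pair exceeds the threshold exactly when the two largest values do.
 bool anyPairExceeds(const std::vector<int>& latencies, int threshold) {
     if (latencies.size() < 2) return false;
     int largest = std::max(latencies[0], latencies[1]);
     int second = std::min(latencies[0], latencies[1]);
     for (std::size_t i = 2; i < latencies.size(); ++i) {
         if (latencies[i] > largest) {
             second = largest;
             largest = latencies[i];
         } else if (latencies[i] > second) {
             second = latencies[i];
         }
     }
     return largest + second > threshold;
 }
 ```
 The data structure that made the interview solution fast is tied to the equality; change the comparison and a different structure is needed.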
 ---
 ### 2. Context Is Mandatory
 You cannot combine arbitrary events.
 A latency spike only makes sense **within the same request**.
 Without context, the result is meaningless.
 ---
 ### 3. Time Matters
 Events are not just values — they exist in time.
 Two events five seconds apart may not be related at all.
 This introduces:
 - time windows
 - ordering issues
 - temporal constraints
 ---
 ### 4. Data Is Not Static
 LeetCode assumes:
 - full dataset
 - already loaded
 - perfectly ordered
 Reality:
 - streaming input
 - delayed events
 - missing entries
 - out-of-order delivery
 ---
 ## What the Problem Really Becomes
 At this point, the challenge is no longer:
 > “find two numbers”
 It becomes:
 > “determine which events are comparable at all”
 And that is a fundamentally different problem.
 ---
 ## Real Engineering Approach
 Instead of solving a mathematical puzzle, we build a system.
 ### Core Idea
 Maintain a sliding window of recent events per request.
 ### Pseudocode
 ```
 for each incoming event:
    bucket = active_events[event.request_id]
    remove events outside time window
    for each old_event in bucket:
        if event.latency + old_event.latency > threshold:
            report anomaly
    add event to bucket
 ```
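 One way the pseudocode could translate into C++ is sketched below. The `Event` struct, the `Correlator` class, the integer millisecond timestamps, and the constructor parameters are all assumptions made for illustration. The sketch also assumes events arrive in timestamp order within a request, which is not guaranteed in practice.
 ```cpp
 #include <cstdint>
 #include <deque>
 #include <string>
 #include <unordered_map>
 #include <utility>
 #include <vector>
 // Hypothetical event type; ts_ms is milliseconds since an arbitrary epoch.
 struct Event {
     std::string request_id;
     std::int64_t ts_ms;
     int latency_ms;
 };
 class Correlator {
 public:
     Correlator(std::int64_t window_ms, int threshold_ms)
         : window_ms_(window_ms), threshold_ms_(threshold_ms) {}
     // Processes one event and returns every (older, newer) pair in the same
     // request whose combined latency exceeds the threshold.
     std::vector<std::pair<Event, Event>> onEvent(const Event& ev) {
         std::vector<std::pair<Event, Event>> anomalies;
         std::deque<Event>& bucket = active_[ev.request_id];
         // Evict events that fell out of the time window; the oldest sit at the front.
         while (!bucket.empty() && ev.ts_ms - bucket.front().ts_ms > window_ms_) {
             bucket.pop_front();
         }
         for (const Event& old_event : bucket) {
             if (ev.latency_ms + old_event.latency_ms > threshold_ms_) {
                 anomalies.emplace_back(old_event, ev);
             }
         }
         bucket.push_back(ev);
         return anomalies;
     }
 private:
     std::int64_t window_ms_;
     int threshold_ms_;
     std::unordered_map<std::string, std::deque<Event>> active_;
 };
 ```
 Fed the six sample log lines with a 50 ms window and a 100 ms threshold, this reports three pairs: `cache_miss` + `read_user` (103 ms) and `read_user` + `external_call` (102 ms) for `req-1001`, and `cache_miss` + `external_call` (112 ms) for `req-3001`. Note that buckets for finished requests are never removed here, so the map itself grows without bound.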
 ---
 ## What This Introduces
 Now we must deal with:
 - bounded memory
 - streaming constraints
 - time-based eviction
 - correlation logic
 And beyond that:
 - out-of-order events
 - duplicate logs
 - partial data
 - noise filtering
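 Of these, bounded memory is the easiest to illustrate. If per-request buckets live in a map, a request that never receives another event stays there forever unless something removes it. Below is a minimal sketch of a periodic sweep, assuming per-request deques ordered by timestamp. The `Buckets` alias, the `sweepIdleRequests` name, and the idea of calling it on a timer are illustrative assumptions.
 ```cpp
 #include <cstddef>
 #include <cstdint>
 #include <deque>
 #include <string>
 #include <unordered_map>
 // Hypothetical event type; ts_ms is milliseconds since an arbitrary epoch.
 struct Event {
     std::string request_id;
     std::int64_t ts_ms;
     int latency_ms;
 };
 using Buckets = std::unordered_map<std::string, std::deque<Event>>;
 // Drops every bucket that is empty or whose newest event is older than
 // now_ms - window_ms. Returns the number of requests removed.
 std::size_t sweepIdleRequests(Buckets& active, std::int64_t now_ms,
                               std::int64_t window_ms) {
     std::size_t dropped = 0;
     for (auto it = active.begin(); it != active.end();) {
         const std::deque<Event>& bucket = it->second;
         if (bucket.empty() || now_ms - bucket.back().ts_ms > window_ms) {
             it = active.erase(it);  // erase returns the next valid iterator
             ++dropped;
         } else {
             ++it;
         }
     }
     return dropped;
 }
 ```
 The sweep is linear in the number of tracked requests, so how often to run it is itself a trade-off between memory and CPU.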
 ---
 ## The Real Insight
 The difficulty is not in computing a sum.
 The difficulty is in defining:
 - what data is valid
 - what events belong together
 - what “close enough” means
 - how the system behaves under imperfect conditions
 ---
 ## Key Takeaway
 Two Sum is often presented as a problem about numbers.
 In reality, it is a problem about assumptions.
 Remove those assumptions, and the problem changes completely.
 > The challenge is not finding two values.  
 > The challenge is understanding whether those values should ever be compared.
 ---
 ## Project Perspective
 Exists in real engineering?  
 → Yes, but as event correlation under constraints
 Exists in interview form?  
 → Yes, but stripped of context and complexity