# #03 — Two Sum Is Not About Numbers

---

## Problem

At first glance, the problem looks trivial:

> Given a list of values, find two elements whose sum equals a target.

This is one of the most well-known interview questions, commonly referred to as **Two Sum**.

It is simple, clean, and perfectly defined:

- a static array  
- exact arithmetic  
- a guaranteed answer  

And that’s exactly why it works so well in interviews.

---

## Typical Interview Thinking

A candidate is expected to go through a familiar progression:

1. Start with brute force (O(n²))  
2. Recognize inefficiency  
3. Optimize using a hash map  
4. Achieve O(n) expected time complexity  

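```cpp
#include <unordered_map>
#include <vector>

// Returns the indices of two elements that sum to target, or {} if none exist.
std::vector<int> twoSum(const std::vector<int>& nums, int target) {
    std::unordered_map<int, int> seen;  // value -> index where it was seen

    for (int i = 0; i < static_cast<int>(nums.size()); ++i) {
        int complement = target - nums[i];

        auto it = seen.find(complement);  // one lookup instead of count() + operator[]
        if (it != seen.end()) {
            return {it->second, i};
        }

        seen[nums[i]] = i;
    }
    return {};
}
```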
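
For comparison, step 1 of the progression can be sketched as below. This is a minimal illustration, not a reference solution; the name `twoSumBruteForce` and the `{-1, -1}` "not found" convention are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Brute-force Two Sum: try every pair (i, j) with i < j. O(n^2) time, O(1) extra space.
// Returns {-1, -1} when no pair sums to target (a convention chosen for this sketch).
std::pair<int, int> twoSumBruteForce(const std::vector<int>& nums, int target) {
    for (std::size_t i = 0; i < nums.size(); ++i) {
        for (std::size_t j = i + 1; j < nums.size(); ++j) {
            if (nums[i] + nums[j] == target) {
                return {static_cast<int>(i), static_cast<int>(j)};
            }
        }
    }
    return {-1, -1};
}
```

The hash map version trades the inner loop for a lookup, which is the whole "optimization" the interview is waiting for.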

The “correct” answer is not really about solving the problem.

It is about recognizing the pattern.

---

## What This Actually Tests

Despite its simplicity, this problem evaluates:

- familiarity with standard patterns  
- ability to choose a data structure  
- understanding of time complexity  

But most importantly:

> it tests whether you have seen this problem before.

A candidate who has already practiced this family of tasks will likely reach the expected answer quickly.

A candidate who has spent years solving real engineering problems may still pause — not because the problem is hard, but because the interview expects a very specific kind of answer.

---

## A Subtle Shift

Now let’s move the same idea one step closer to reality.

Instead of numbers, we have **log events**.

Instead of a static array, we have a **stream**.

Instead of a clean equality, we have **imperfect data, context, and thresholds**.

---

## Synthetic Log Example

```text
2026-04-16T10:15:01.123Z service=api    event=parse_input   latency=12ms request_id=req-1001
2026-04-16T10:15:01.130Z service=cache  event=cache_miss    latency=48ms request_id=req-1001
2026-04-16T10:15:01.135Z service=db     event=read_user     latency=55ms request_id=req-1001
2026-04-16T10:15:01.141Z service=auth   event=token_check   latency=18ms request_id=req-2001
2026-04-16T10:15:01.144Z service=net    event=external_call latency=47ms request_id=req-1001
2026-04-16T10:15:01.149Z service=db     event=read_user     latency=22ms request_id=req-2001
2026-04-16T10:15:01.151Z service=cache  event=cache_miss    latency=60ms request_id=req-3001
2026-04-16T10:15:01.154Z service=net    event=external_call latency=52ms request_id=req-3001
```
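
To make the shape of the data concrete, here is one possible way to turn such a line into a struct. It is a sketch only: `LogEvent` and `parseLogLine` are hypothetical names, the timestamp is kept as a raw string, and the format is assumed to be exactly "timestamp followed by whitespace-separated `key=value` pairs" as in the lines above.

```cpp
#include <exception>
#include <sstream>
#include <string>

// Hypothetical in-memory form of one log line; fields mirror the keys in the example.
struct LogEvent {
    std::string timestamp;   // raw ISO 8601 string, not parsed further here
    std::string service;
    std::string event;
    int latency_ms = 0;
    std::string request_id;
};

// Parses "TIMESTAMP key=value key=value ...". Unknown keys are ignored;
// a malformed latency value leaves latency_ms at 0.
LogEvent parseLogLine(const std::string& line) {
    LogEvent ev;
    std::istringstream in(line);
    in >> ev.timestamp;  // the first token carries no key

    std::string token;
    while (in >> token) {
        const auto eq = token.find('=');
        if (eq == std::string::npos) {
            continue;  // not a key=value token
        }
        const std::string key = token.substr(0, eq);
        const std::string value = token.substr(eq + 1);

        if (key == "service") {
            ev.service = value;
        } else if (key == "event") {
            ev.event = value;
        } else if (key == "request_id") {
            ev.request_id = value;
        } else if (key == "latency") {
            try {
                ev.latency_ms = std::stoi(value);  // stoi stops at the "ms" suffix
            } catch (const std::exception&) {
                // keep the default of 0 for unparsable values
            }
        }
    }
    return ev;
}
```

Even this tiny parser already has to decide what to do with missing or malformed fields, which the array version of the problem never asks.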

---

## Real Problem

We are no longer asked to find two numbers.

Instead, the problem becomes:

> Detect whether there exist two events:
> - belonging to the same request  
> - occurring within a time window  
> - whose combined latency exceeds a threshold  

This still *looks* like Two Sum.

But it is not.

---

## How LeetCode Thinking Tries to Adapt

The first instinct is to simplify.

Take the log stream, ignore most of the structure, extract just the latency values, and reduce everything back to “numbers in an array”.

That leads to a familiar line of thinking:

1. Collect latencies  
2. Search for matching pairs  
3. Try to reuse the same hash map pattern  
4. Treat the task as another variation of Two Sum  

This is exactly what interview training encourages:

> reduce the problem until it matches a known template.

That works beautifully in interviews.

But this is also where the model starts to break.

---

## Where the Interview Model Breaks

### 1. It Is Not an Exact-Match Problem

Interview version:

`a + b == target`

Real version:

`a + b > threshold`

We are not searching for a perfect complement, so a hash lookup for one exact value no longer applies.  
We are evaluating a condition.
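
One way to see the difference, assuming we only ask whether *any* pair in a single group exceeds the threshold: the complement lookup is useless, but remembering the largest value seen so far is enough. The function name `anyPairExceeds` is made up for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Returns true if some pair of distinct elements sums to strictly more than threshold.
// Only the largest earlier value matters: if the current value does not exceed
// the threshold together with it, it cannot do so with any smaller earlier value.
bool anyPairExceeds(const std::vector<int>& values, int threshold) {
    if (values.empty()) {
        return false;
    }
    int maxSeen = values[0];
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] + maxSeen > threshold) {
            return true;
        }
        if (values[i] > maxSeen) {
            maxSeen = values[i];
        }
    }
    return false;
}
```

The data structure changed because the predicate changed, not because the input did.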

---

### 2. Context Cannot Be Ignored

A 55ms latency from one request and a 52ms latency from another may add up to more than the threshold.

But the two events have nothing to do with each other, so their sum means nothing.

Without context, the result is technically correct — and completely useless.

---

### 3. Time Makes the Problem Harder

Events are not just values — they exist in time.

Two events may belong to the same request and still be unrelated if they are too far apart.

This introduces:

- time windows  
- ordering  
- eviction  
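
Eviction, for example, might look like the sketch below. It assumes integer millisecond timestamps and in-order arrival, neither of which the logs guarantee; `TimedLatency` and `evictExpired` are hypothetical names.

```cpp
#include <cstdint>
#include <deque>

// Assumption for this sketch: timestamps are integer milliseconds and arrive in order.
struct TimedLatency {
    std::int64_t timestamp_ms;
    int latency_ms;
};

// Drops events from the front of the window that are more than window_ms older than now_ms.
// Events exactly window_ms old are kept.
void evictExpired(std::deque<TimedLatency>& window,
                  std::int64_t now_ms,
                  std::int64_t window_ms) {
    while (!window.empty() && now_ms - window.front().timestamp_ms > window_ms) {
        window.pop_front();
    }
}
```

A deque fits here because new events are appended at the back while expired ones leave from the front.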

---

### 4. The Data Is Not Static

Interview assumptions:

- full dataset available  
- stable ordering  
- perfect input  

Reality:

- streaming data  
- out-of-order events  
- missing logs  
- duplicates  
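
Duplicates alone already force a decision about identity. A minimal sketch, assuming that the combination of request_id, event name, and timestamp identifies a log line (an assumption, not something the log format promises); `DuplicateFilter` is a hypothetical helper.

```cpp
#include <string>
#include <unordered_set>
#include <utility>

// Remembers which events were already processed.
// Assumption: (request_id, event, timestamp) uniquely identifies a log line.
class DuplicateFilter {
public:
    // Returns true the first time a key is seen, false for repeats.
    bool firstTime(const std::string& request_id,
                   const std::string& event,
                   const std::string& timestamp) {
        std::string key = request_id + '|' + event + '|' + timestamp;
        return seen_.insert(std::move(key)).second;
    }

private:
    // Grows without bound in this sketch; a real system would expire old keys.
    std::unordered_set<std::string> seen_;
};
```

Note that the filter itself now needs eviction, or it becomes the memory problem.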

The “single clean pass over an array” stops being a valid model.

---

### 5. Pattern Matching Becomes a Trap

The more familiar the pattern, the stronger the temptation:

> “This is just Two Sum.”

But in reality:

- request_id defines grouping  
- timestamp defines relevance  
- streaming defines constraints  

These are not details.

They are the problem.

---

## What the Problem Really Becomes

At this point, the challenge is no longer:

> “find two numbers”

It becomes:

> “determine which events are comparable at all”

The arithmetic is trivial.

The system is not.

---

## Real Engineering Approach

Instead of solving a puzzle, we build a mechanism.

### Core Idea

Maintain a sliding window per request_id.

### Pseudocode

    for each incoming event:
        bucket = active_events[event.request_id]

        remove events outside time window

        for each old_event in bucket:
            if event.latency + old_event.latency > threshold:
                report anomaly

        add event to bucket
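
The pseudocode above could be sketched in C++17 roughly as follows. This is an illustration, not the code from the demo: `Event` and `PairDetector` are hypothetical names, timestamps are assumed to be integer milliseconds, and events are assumed to arrive in order.

```cpp
#include <cstdint>
#include <deque>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Assumptions for this sketch: integer millisecond timestamps, in-order arrival.
struct Event {
    std::int64_t timestamp_ms;
    std::string request_id;
    int latency_ms;
};

class PairDetector {
public:
    PairDetector(int threshold_ms, std::int64_t window_ms)
        : threshold_ms_(threshold_ms), window_ms_(window_ms) {}

    // Feeds one event. Returns the first (older, newer) pair from the same
    // request whose combined latency exceeds the threshold, if there is one.
    std::optional<std::pair<Event, Event>> onEvent(const Event& ev) {
        std::deque<Event>& bucket = active_[ev.request_id];

        // Remove events outside the time window.
        while (!bucket.empty() &&
               ev.timestamp_ms - bucket.front().timestamp_ms > window_ms_) {
            bucket.pop_front();
        }

        // Compare the new event with everything still in the window.
        std::optional<std::pair<Event, Event>> hit;
        for (const Event& old_event : bucket) {
            if (old_event.latency_ms + ev.latency_ms > threshold_ms_) {
                hit = std::make_pair(old_event, ev);
                break;
            }
        }

        bucket.push_back(ev);
        return hit;
    }

private:
    int threshold_ms_;
    std::int64_t window_ms_;
    // One sliding window per request_id. Empty buckets are never erased here,
    // so memory is not yet bounded across requests.
    std::unordered_map<std::string, std::deque<Event>> active_;
};
```

With a 100ms threshold and a 20ms window, feeding events at 135ms (55ms latency) and 144ms (47ms latency) for the same request reports a pair; the same two latencies on different requests, or 50ms apart, report nothing.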

---

## What This Introduces

Now we must deal with:

- bounded memory  
- streaming constraints  
- time-based eviction  
- request-level grouping  

And then reality hits:

- out-of-order events  
- duplicate logs  
- partial data  
- noise  

At this point, the original Two Sum is almost unrecognizable.

---

## Demo

See the example implementation:

- `examples/two_sum_logs_demo`

---

## Example Output

```text
Synthetic log stream:
  2026-04-16T10:15:01.100Z service=api    event=parse_input   latency=12ms request_id=req-1001
  2026-04-16T10:15:01.110Z service=cache  event=cache_miss    latency=48ms request_id=req-1001
  2026-04-16T10:15:01.120Z service=auth   event=token_check   latency=58ms request_id=req-2001
  2026-04-16T10:15:01.130Z service=db     event=read_user     latency=43ms request_id=req-3001
  2026-04-16T10:15:01.135Z service=db     event=read_user     latency=55ms request_id=req-1001
  2026-04-16T10:15:01.144Z service=net    event=external_call latency=47ms request_id=req-1001
  2026-04-16T10:15:01.200Z service=cache  event=cache_miss    latency=60ms request_id=req-3001
  2026-04-16T10:15:01.260Z service=net    event=external_call latency=52ms request_id=req-3001

Threshold: 100ms
Time window: 20ms

Interview-style reduction (ignores request_id and time):
  first : 2026-04-16T10:15:01.110Z service=cache  event=cache_miss    latency=48ms request_id=req-1001
  second: 2026-04-16T10:15:01.120Z service=auth   event=token_check   latency=58ms request_id=req-2001
  combined latency: 106ms

Streaming sliding-window detection:
  first : 2026-04-16T10:15:01.135Z service=db     event=read_user     latency=55ms request_id=req-1001
  second: 2026-04-16T10:15:01.144Z service=net    event=external_call latency=47ms request_id=req-1001
  combined latency: 102ms

Notes:
  - The interview-style version can produce a false correlation.
  - In this dataset, it first matches 48ms from req-1001 with 58ms from req-2001.
  - That pair exceeds the threshold, but it is operationally meaningless.
  - The streaming version only correlates events from the same request_id
    and only within the configured time window.
```

---

## Explanation

The interview-style solution produces a mathematically valid result.

But it mixes unrelated events.

The streaming solution respects:

- request boundaries  
- time constraints  

Which makes the result meaningful.
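
For contrast, the interview-style reduction can be sketched as a scan over bare latencies. This is an assumption about what such a reduction looks like, not the demo's actual code; `firstPairIgnoringContext` is a made-up name.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Interview-style reduction: latencies only, no request_id, no timestamps.
// Scans in arrival order and returns the indices (i, j), i < j, of the first
// pair whose sum exceeds the threshold.
std::optional<std::pair<std::size_t, std::size_t>>
firstPairIgnoringContext(const std::vector<int>& latencies_ms, int threshold_ms) {
    for (std::size_t j = 1; j < latencies_ms.size(); ++j) {
        for (std::size_t i = 0; i < j; ++i) {
            if (latencies_ms[i] + latencies_ms[j] > threshold_ms) {
                return std::make_pair(i, j);
            }
        }
    }
    return std::nullopt;
}
```

On the latencies from the stream above (12, 48, 58, 43, 55, 47, 60, 52) with a 100ms threshold, this returns indices 1 and 2: the 48ms and 58ms events, which belong to different requests.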

---

## The Real Insight

The difficulty is not in computing a sum.

The difficulty is in defining:

- what data is valid  
- what events belong together  
- what “close enough” means  
- how the system behaves under imperfect conditions  

---

## Key Takeaway

Two Sum is not about numbers.

It is about assumptions.

Remove those assumptions — and the problem changes completely.

> The challenge is not finding two values.  
> The challenge is understanding whether those values should ever be compared.

---

## Project Perspective

Exists in real engineering?  
→ Yes, but as event correlation under constraints  

Exists in interview form?  
→ Yes, but stripped of context and complexity  

---

## Final Note

The algorithm was never the hard part.  
The assumptions were.