# Kindle Clippings → LaTeX Converter

A small console utility that converts Amazon Kindle *My Clippings* text exports into structured LaTeX.

The tool parses Kindle highlights and groups them by book title, producing a LaTeX structure with:

* `\section{}`: one per book
* `\subsection{}`: one per highlight, titled with the highlight's metadata line
* The highlight text, inserted as plain LaTeX body content
* `\subsubsection{notes}`: an empty placeholder for comments you add later
---

## Architecture

This project demonstrates **two different parsing approaches** solving the same problem:
### 1️⃣ FSM-based parser

Implemented using a template-based finite state machine (`fsm.h`).

Characteristics:

* Compile-time validated transitions
* Strong type safety
* Explicit state/event model
* Strict contract enforcement

This version is useful when:

* The input format is more complex
* You want compile-time guarantees for state transitions
* The parsing logic may grow over time
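To illustrate what "compile-time validated transitions" can mean in C++17, here is a small sketch. It is not the interface of `fsm.h`: `State`, `Event`, `Transition`, `classify`, and `step` are hypothetical names, and only state tracking is shown (storing the title or collecting text would hang off the transitions). Each allowed transition is an explicit specialization of a template that is declared but never defined, so code that names a transition missing from the table fails to compile. Malformed blocks fall into a `Skip` state until the next `==========`. A driver would call `step(state, classify(line))` once per input line.

```cpp
#include <string_view>

// Hypothetical states and events for one Kindle clipping block
// (illustrative names, not the fsm.h API).
enum class State { ExpectTitle, ExpectMeta, ExpectBlank, InText, Skip };
enum class Event { Line, Empty, Separator };

// Declared but never defined: naming a transition that has no
// specialization below is a compile-time error.
template <State From, Event On>
struct Transition;

// The transition table: one specialization per allowed (state, event) pair.
template <> struct Transition<State::ExpectTitle, Event::Empty> { static constexpr State to = State::ExpectTitle; };
template <> struct Transition<State::ExpectTitle, Event::Line> { static constexpr State to = State::ExpectMeta; };
template <> struct Transition<State::ExpectMeta, Event::Line> { static constexpr State to = State::ExpectBlank; };
template <> struct Transition<State::ExpectBlank, Event::Empty> { static constexpr State to = State::InText; };
template <> struct Transition<State::InText, Event::Line> { static constexpr State to = State::InText; };
template <> struct Transition<State::InText, Event::Empty> { static constexpr State to = State::InText; };
template <> struct Transition<State::InText, Event::Separator> { static constexpr State to = State::ExpectTitle; };

// Classify one raw input line.
constexpr Event classify(std::string_view line) {
    if (line.empty()) return Event::Empty;
    if (line == "==========") return Event::Separator;
    return Event::Line;
}

// Runtime dispatch onto the table. Every Transition<...> named here must
// exist, so dispatching to a pair the table does not allow fails to compile.
constexpr State step(State s, Event e) {
    switch (s) {
    case State::ExpectTitle:
        if (e == Event::Empty) return Transition<State::ExpectTitle, Event::Empty>::to;
        if (e == Event::Line) return Transition<State::ExpectTitle, Event::Line>::to;
        break;
    case State::ExpectMeta:
        if (e == Event::Line) return Transition<State::ExpectMeta, Event::Line>::to;
        break;
    case State::ExpectBlank:
        if (e == Event::Empty) return Transition<State::ExpectBlank, Event::Empty>::to;
        break;
    case State::InText:
        if (e == Event::Line) return Transition<State::InText, Event::Line>::to;
        if (e == Event::Empty) return Transition<State::InText, Event::Empty>::to;
        return Transition<State::InText, Event::Separator>::to;
    case State::Skip:
        break;
    }
    // Any other pair means a malformed block: skip lines until the next
    // separator, then start a fresh block.
    return e == Event::Separator ? State::ExpectTitle : State::Skip;
}
```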
---
### 2️⃣ TypeFactory-based parser

Implemented using a registration-based factory (`typefactory.h`).

Characteristics:

* Stage-driven pipeline
* One handler per parsing stage
* Runtime validation of handler registration
* No per-line handler allocations (handlers are created once and cached)

Compared with the FSM version, this one is:

* Simpler
* More readable
* Easier to debug
* Well suited for linear, protocol-like formats

Both implementations produce identical LaTeX output.

---
## Input Format

Expected input is a standard Kindle `My Clippings.txt` export.

Each clipping block follows this structure:

```
Book Title
- Your Highlight on Location 123-125 | Added on ...

Highlighted text line 1
Highlighted text line 2
==========
```
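Mapped onto this block structure, the TypeFactory approach could be sketched as one handler per stage: title, metadata line, blank line, and highlight text. This is a hypothetical illustration, not the interface of `typefactory.h`; `Stage`, `Handler`, `Factory`, `Pipeline`, `defaultFactory`, and the handler classes are made-up names. Handlers are registered with the factory, created once when the `Pipeline` is constructed (construction throws if a stage has no handler), and then reused for every line. A `==========` line closes the current block, blocks that never reach the text stage are dropped, and calling `flush()` at end of input keeps a final block that lacks a trailing separator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical parsed clipping.
struct Clipping {
    std::string title, meta;
    std::vector<std::string> text;
};

// One handler per stage; each consumes a line and names the next stage.
enum class Stage { Title, Meta, Blank, Text };
constexpr std::size_t kStageCount = 4;

struct Handler {
    virtual ~Handler() = default;
    virtual Stage handle(const std::string& line, Clipping& c) = 0;
};
struct TitleHandler : Handler {
    Stage handle(const std::string& line, Clipping& c) override {
        if (line.empty()) return Stage::Title;  // tolerate stray blank lines
        c.title = line;
        return Stage::Meta;
    }
};
struct MetaHandler : Handler {
    Stage handle(const std::string& line, Clipping& c) override { c.meta = line; return Stage::Blank; }
};
struct BlankHandler : Handler {
    Stage handle(const std::string& line, Clipping& c) override {
        if (!line.empty()) c.text.push_back(line);  // lenient if the blank line is missing
        return Stage::Text;
    }
};
struct TextHandler : Handler {
    Stage handle(const std::string& line, Clipping& c) override { c.text.push_back(line); return Stage::Text; }
};

// Registration-based factory: maps each stage to a creator function.
class Factory {
public:
    template <typename T>
    void add(Stage s) { creators_[s] = [] { return std::make_unique<T>(); }; }
    std::unique_ptr<Handler> create(Stage s) const {
        auto it = creators_.find(s);
        if (it == creators_.end()) throw std::runtime_error("no handler registered for stage");
        return it->second();
    }
private:
    std::unordered_map<Stage, std::function<std::unique_ptr<Handler>()>> creators_;
};

class Pipeline {
public:
    // Handlers are created once here and reused for every line.
    explicit Pipeline(const Factory& f) {
        for (std::size_t i = 0; i < kStageCount; ++i)
            handlers_[i] = f.create(static_cast<Stage>(i));  // throws if a stage is unregistered
    }
    void feed(const std::string& line) {
        if (line == "==========") { flush(); return; }
        Handler* h = handlers_[static_cast<std::size_t>(stage_)].get();
        assert(h != nullptr);  // guaranteed by the constructor
        stage_ = h->handle(line, current_);
    }
    // Close the current block, keeping it only if it reached the text stage.
    // Call once at end of input so a final block without "==========" is kept.
    void flush() {
        if (stage_ == Stage::Text && !current_.text.empty()) done_.push_back(std::move(current_));
        current_ = Clipping{};
        stage_ = Stage::Title;
    }
    const std::vector<Clipping>& clippings() const { return done_; }
private:
    std::array<std::unique_ptr<Handler>, kStageCount> handlers_;
    Stage stage_ = Stage::Title;
    Clipping current_;
    std::vector<Clipping> done_;
};

// A factory with all four stages registered.
Factory defaultFactory() {
    Factory f;
    f.add<TitleHandler>(Stage::Title);
    f.add<MetaHandler>(Stage::Meta);
    f.add<BlankHandler>(Stage::Blank);
    f.add<TextHandler>(Stage::Text);
    return f;
}
```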
---

## Output Format

Generated LaTeX structure:

```latex
\section{Book Title}

\subsection{- Your Highlight on Location 123-125 | Added on ...}
Highlighted text line 1
Highlighted text line 2
\subsubsection{notes}
```
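The grouping behind this output can be sketched as follows, assuming highlights have already been parsed into a hypothetical `Clipping` struct (`Book`, `groupByTitle`, and `toLatex` are illustrative names as well). A hash map records each title's position in a vector, so books keep the order in which they first appear in the input. LaTeX escaping is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical parsed clipping.
struct Clipping {
    std::string title, meta;
    std::vector<std::string> text;
};

// A book and its clippings. The pointers borrow from the input vector,
// which must outlive the result.
struct Book {
    std::string title;
    std::vector<const Clipping*> clippings;
};

// Group clippings by title, keeping books in first-appearance order.
std::vector<Book> groupByTitle(const std::vector<Clipping>& all) {
    std::vector<Book> books;
    std::unordered_map<std::string, std::size_t> position;  // title -> index in books
    for (const Clipping& c : all) {
        auto [it, inserted] = position.try_emplace(c.title, books.size());
        if (inserted) books.push_back(Book{c.title, {}});
        books[it->second].clippings.push_back(&c);
    }
    return books;
}

// Emit the section / subsection / notes layout (no escaping here).
std::string toLatex(const std::vector<Book>& books) {
    std::string out;
    for (const Book& b : books) {
        out += "\\section{" + b.title + "}\n\n";
        for (const Clipping* c : b.clippings) {
            out += "\\subsection{" + c->meta + "}\n";
            for (const std::string& line : c->text) out += line + "\n";
            out += "\\subsubsection{notes}\n\n";
        }
    }
    return out;
}

// Example:
//   std::vector<Clipping> all = {{"T", "M", {"a", "b"}}};
//   assert(toLatex(groupByTitle(all)) ==
//          "\\section{T}\n\n\\subsection{M}\na\nb\n\\subsubsection{notes}\n\n");
```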
Highlights are grouped by book title: all highlights from the same book end up under a single `\section{}`, even when they are scattered through the input file.

---

## Build

Requires a C++17-compatible compiler.

```bash
g++ -std=gnu++17 -Wall -Wextra -O2 -o kindle2latex main.cpp
```

---

## Usage

```bash
./kindle2latex --input input.txt --output output.tex
```

Arguments:

| Argument   | Description                                  |
| ---------- | -------------------------------------------- |
| `--input`  | Path to the Kindle `My Clippings.txt` file   |
| `--output` | Path of the LaTeX file to generate           |
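A minimal, hypothetical sketch of parsing these two arguments (not necessarily how `kindle2latex` does it): flags may come in either order, and `parseArgs` returns `std::nullopt` for an unknown flag, a flag without a value, or a missing path, so `main` can print a usage message. It is meant to be called as `parseArgs(argc, argv)`; the `char* argv[]` of `main` converts implicitly to the `const char* const*` parameter.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

struct Options {
    std::string input;   // --input: Kindle clippings file
    std::string output;  // --output: generated LaTeX file
};

// Parse "--input <path> --output <path>" in any order.
std::optional<Options> parseArgs(int argc, const char* const argv[]) {
    Options opt;
    for (int i = 1; i < argc; ++i) {
        const std::string_view flag = argv[i];
        if (i + 1 >= argc) return std::nullopt;  // flag without a value
        if (flag == "--input") opt.input = argv[++i];
        else if (flag == "--output") opt.output = argv[++i];
        else return std::nullopt;                // unknown argument
    }
    if (opt.input.empty() || opt.output.empty()) return std::nullopt;
    return opt;
}

// Example:
//   const char* args[] = {"kindle2latex", "--output", "out.tex", "--input", "in.txt"};
//   assert(parseArgs(5, args)->input == "in.txt");
```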
---

## Design Notes

* Handlers are created once and cached, so no handler objects are allocated per input line.
* Books appear in the order in which they first occur in the input file.
* LaTeX special characters are escaped automatically.
* Incomplete clipping blocks are skipped instead of producing partial output.
* The final block is flushed even if the file does not end with `==========`.
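The automatic escaping of LaTeX special characters can be done with a per-character switch, as in this sketch. The function name `escapeLatex` and the exact replacements are assumptions, not necessarily the tool's own table: seven characters take a leading backslash, while `\`, `^`, and `~` map to `\textbackslash{}`, `\textasciicircum{}`, and `\textasciitilde{}`. Bytes outside ASCII pass through unchanged, so UTF-8 text survives as-is.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Escape LaTeX's ten special characters: \ { } $ & # % _ ^ ~
std::string escapeLatex(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        switch (ch) {
        case '\\': out += "\\textbackslash{}"; break;
        case '^':  out += "\\textasciicircum{}"; break;
        case '~':  out += "\\textasciitilde{}"; break;
        case '{': case '}': case '$': case '&':
        case '#': case '%': case '_':
            out += '\\';  // these only need a leading backslash
            out += ch;
            break;
        default:
            out += ch;    // ordinary characters and UTF-8 bytes pass through
            break;
        }
    }
    return out;
}

// Example:
//   assert(escapeLatex("50% of $5") == "50\\% of \\$5");
```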
---

## Why Two Implementations?

This repository intentionally keeps two different parsing styles:

* The FSM version demonstrates strict compile-time state control.
* The TypeFactory version demonstrates a clean, extensible runtime pipeline.

The goal is architectural exploration and comparison, not just solving the parsing task.