kindle2latex/README.md
2026-02-28 22:20:44 -05:00

# Kindle Clippings → LaTeX Converter
A small console utility that converts Amazon Kindle *My Clippings* text exports into structured LaTeX.
The tool parses Kindle highlights and groups them by book title, producing a LaTeX structure with:
* `\section{}` — per book
* `\subsection{}` — per highlight (metadata line)
* Highlight text — inserted as plain LaTeX content
* `\subsubsection{notes}` — placeholder for future comments
---
## Architecture
This project demonstrates **two different parsing approaches** solving the same problem:
### 1. FSM-based parser
Implemented using a template-based finite state machine (`fsm.h`).
Characteristics:
* Compile-time validated transitions
* Strong type safety
* Explicit state/event model
* Strict contract enforcement
This version is useful when:
* The input format is more complex
* You want compile-time guarantees for state transitions
* The parsing logic may grow over time
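To illustrate what "compile-time validated transitions" can look like, here is a minimal sketch. It is not the API of `fsm.h`; the `State`, `Event`, `Transition`, and `step` names are assumptions made for this example only. A transition that has no specialization is rejected by `static_assert` before the program ever runs.

```cpp
#include <type_traits>

// Hypothetical states and events for one clipping block (assumed names).
enum class State { Title, Meta, Text };
enum class Event { Line, Separator };

// Primary template: every transition is invalid unless specialized below.
template <State S, Event E>
struct Transition : std::false_type {};

// Title --Line--> Meta
template <>
struct Transition<State::Title, Event::Line> : std::true_type {
    static constexpr State next = State::Meta;
};

// Meta --Line--> Text
template <>
struct Transition<State::Meta, Event::Line> : std::true_type {
    static constexpr State next = State::Text;
};

// Text --Line--> Text (a highlight may span several lines)
template <>
struct Transition<State::Text, Event::Line> : std::true_type {
    static constexpr State next = State::Text;
};

// Text --Separator--> Title (the block is complete)
template <>
struct Transition<State::Text, Event::Separator> : std::true_type {
    static constexpr State next = State::Title;
};

// Returns the next state; compilation fails for undeclared transitions.
template <State S, Event E>
constexpr State step() {
    static_assert(Transition<S, E>::value, "transition not allowed");
    return Transition<S, E>::next;
}

static_assert(step<State::Title, Event::Line>() == State::Meta,
              "a title line is followed by the metadata line");
// step<State::Title, Event::Separator>() would not compile.
```

The trade-off is that states and events must be known at compile time, which is where the stricter contract comes from.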
---
### 2. TypeFactory-based parser
Implemented using a registration-based factory (`typefactory.h`).
Characteristics:
* Stage-driven pipeline
* One handler per parsing stage
* Runtime validation of handler registration
* No per-line allocations (handlers cached once)
This version is:
* Simpler
* More readable
* Easier to debug
* Well suited for linear, protocol-like formats
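A registration-based, stage-driven pipeline could be sketched roughly as follows. This is not the interface of `typefactory.h`; `Stage`, `Clipping`, `StageHandlers`, and `make_handlers` are hypothetical names chosen for the example. Handlers are registered once, checked at runtime, and then reused for every line.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical parsing stages; Count is only used to size the table.
enum class Stage : std::size_t { Title, Meta, Text, Count };

struct Clipping {
    std::string title, meta, text;
};

class StageHandlers {
public:
    // A handler consumes one line and returns the stage for the next line.
    using Handler = std::function<Stage(Clipping&, const std::string&)>;

    void add(Stage s, Handler h) { handlers_[index(s)] = std::move(h); }

    // Runtime validation: every stage must have a registered handler.
    void validate() const {
        for (const auto& h : handlers_)
            if (!h) throw std::logic_error("missing stage handler");
    }

    Stage dispatch(Stage s, Clipping& c, const std::string& line) const {
        return handlers_[index(s)](c, line);
    }

private:
    static std::size_t index(Stage s) { return static_cast<std::size_t>(s); }
    std::array<Handler, static_cast<std::size_t>(Stage::Count)> handlers_{};
};

// Registers one handler per stage and validates the table once.
inline StageHandlers make_handlers() {
    StageHandlers f;
    f.add(Stage::Title, [](Clipping& c, const std::string& l) {
        c.title = l;
        return Stage::Meta;
    });
    f.add(Stage::Meta, [](Clipping& c, const std::string& l) {
        c.meta = l;
        return Stage::Text;
    });
    f.add(Stage::Text, [](Clipping& c, const std::string& l) {
        if (!c.text.empty()) c.text += '\n';
        c.text += l;
        return Stage::Text;
    });
    f.validate();
    return f;
}
```

A caller would create the table once with `make_handlers()` and then call `dispatch` for each input line, resetting the stage to `Stage::Title` after a separator.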
Both implementations produce identical LaTeX output.
---
## Input Format
Expected input is a standard Kindle `My Clippings.txt` export.
Each clipping block follows this structure:
```
Book Title
- Your Highlight on Location 123-125 | Added on ...
Highlighted text line 1
Highlighted text line 2
==========
```
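As a rough illustration of how this block structure can be read, the sketch below splits the input on the `==========` separator. It is independent of both parsers in this repository; `ClippingBlock` and `parse_clippings` are hypothetical names. Skipping blank lines and stripping a trailing `\r` are assumptions about real-world exports, not documented behavior of this tool.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One parsed clipping: title line, metadata line, then the highlight text.
struct ClippingBlock {
    std::string title;
    std::string meta;
    std::vector<std::string> text;
};

inline std::vector<ClippingBlock> parse_clippings(std::istream& in) {
    std::vector<ClippingBlock> out;
    std::vector<std::string> block;  // non-empty lines of the current block

    // Emit the current block if it is complete, then start a new one.
    auto flush = [&] {
        if (block.size() >= 3) {  // title + metadata + at least one text line
            ClippingBlock c;
            c.title = block[0];
            c.meta = block[1];
            c.text.assign(block.begin() + 2, block.end());
            out.push_back(std::move(c));
        }
        block.clear();  // incomplete blocks are dropped here
    };

    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF input
        if (line == "==========") {
            flush();
            continue;
        }
        if (!line.empty()) block.push_back(line);  // assumption: skip blank lines
    }
    flush();  // handle a final block that has no trailing separator
    return out;
}

// Convenience overload for in-memory input.
inline std::vector<ClippingBlock> parse_clippings(const std::string& text) {
    std::istringstream in(text);
    return parse_clippings(in);
}
```

For file input, passing a `std::ifstream` to the first overload works the same way.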
---
## Output Format
Generated LaTeX structure:
```latex
\section{Book Title}
\subsection{- Your Highlight on Location 123-125 | Added on ...}
Highlighted text line 1
Highlighted text line 2
\subsubsection{notes}
```
Highlights are grouped by book title.
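The grouping step might look something like the sketch below, which emits one `\section{}` per title in first-seen order. `Highlight` and `render_latex` are hypothetical names for this example, and the strings are assumed to be LaTeX-escaped already.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// One highlight; all fields are assumed to be escaped for LaTeX already.
struct Highlight {
    std::string title, meta, text;
};

inline std::string render_latex(const std::vector<Highlight>& items) {
    std::vector<std::string> order;  // book titles in first-seen order
    std::unordered_map<std::string, std::vector<const Highlight*>> by_title;

    for (const auto& h : items) {
        auto& bucket = by_title[h.title];
        if (bucket.empty()) order.push_back(h.title);  // first time we see it
        bucket.push_back(&h);
    }

    std::string out;
    for (const auto& title : order) {
        out += "\\section{" + title + "}\n";
        for (const Highlight* h : by_title[title]) {
            out += "\\subsection{" + h->meta + "}\n";
            out += h->text + "\n";
            out += "\\subsubsection{notes}\n";
        }
    }
    return out;
}
```

Keeping a separate `order` vector is what preserves the original book order, since `std::unordered_map` makes no ordering guarantee.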
---
## Build
Requires a C++17-compatible compiler.
```bash
g++ -std=gnu++17 -Wall -Wextra -O2 -o kindle2latex main.cpp
```
---
## Usage
```bash
./kindle2latex --input input.txt --output output.tex
```
Arguments:
| Argument | Description |
| ---------- | ----------------------------- |
| `--input` | Path to Kindle clippings file |
| `--output` | Path to generated LaTeX file |
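For readers curious how the two flags could be handled, here is a minimal sketch. It is not taken from `main.cpp`; `Options` and `parse_args` are hypothetical, and treating both flags as required is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

struct Options {
    std::string input;   // value of --input
    std::string output;  // value of --output
};

// Returns the parsed options, or std::nullopt if the command line is invalid.
inline std::optional<Options> parse_args(int argc, const char* const* argv) {
    Options opts;
    for (int i = 1; i < argc; ++i) {
        const std::string_view arg = argv[i];
        const bool is_input = (arg == "--input");
        const bool is_output = (arg == "--output");
        // Reject unknown flags and flags that have no value after them.
        if ((!is_input && !is_output) || i + 1 >= argc) return std::nullopt;
        (is_input ? opts.input : opts.output) = argv[++i];
    }
    // Assumption: both paths must be provided.
    if (opts.input.empty() || opts.output.empty()) return std::nullopt;
    return opts;
}
```

In `main`, a `std::nullopt` result would typically lead to printing a usage message and returning a non-zero exit code.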
---
## Design Notes
* No dynamic allocations per input line (handlers are cached).
* Order of books is preserved as in the original file.
* LaTeX special characters are escaped automatically.
* Incomplete clipping blocks are safely ignored.
* The final block is flushed even if the file does not end with `==========`.
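As an illustration of the escaping note above, the following sketch handles the ten characters that LaTeX treats specially in normal text. The function name `escape_latex` and the chosen replacements are assumptions; the tool's actual escaping rules may differ.

```cpp
#include <cassert>
#include <string>

// Escapes characters that have special meaning in LaTeX text mode.
inline std::string escape_latex(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        switch (ch) {
            // These only need a leading backslash.
            case '#': case '$': case '%': case '&':
            case '_': case '{': case '}':
                out += '\\';
                out += ch;
                break;
            // These need a named command; "{}" terminates the command name.
            case '~':  out += "\\textasciitilde{}";   break;
            case '^':  out += "\\textasciicircum{}";  break;
            case '\\': out += "\\textbackslash{}";    break;
            default:   out += ch;
        }
    }
    return out;
}
```

Escaping has to happen before the text is wrapped in `\section{}` or `\subsection{}`, otherwise the generated commands themselves would be escaped.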
---
## Why Two Implementations?
This repository intentionally keeps two different parsing styles:
* The FSM version demonstrates strict compile-time state control.
* The TypeFactory version demonstrates a clean, extensible runtime pipeline.
The goal is architectural exploration and comparison, not just solving the parsing task.