Kindle Clippings → LaTeX Converter
A small console utility that converts Amazon Kindle My Clippings text exports into structured LaTeX.
The tool parses Kindle highlights and groups them by book title, producing a LaTeX structure with:
- \section{}: one per book
- \subsection{}: one per highlight (the metadata line)
- Highlight text: inserted as plain LaTeX content
- \subsubsection{notes}: placeholder for future comments
Architecture
This project demonstrates two different parsing approaches solving the same problem:
1️⃣ FSM-based parser
Implemented using a template-based finite state machine (fsm.h).
Characteristics:
- Compile-time validated transitions
- Strong type safety
- Explicit state/event model
- Strict contract enforcement
This version is useful when:
- The input format is more complex
- You want compile-time guarantees for state transitions
- The parsing logic may grow over time
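As an illustration of what "compile-time validated transitions" can look like, here is a minimal sketch. It is not the API of fsm.h: the `State` and `Event` enums, the `Transition` trait, and `step()` are hypothetical names chosen for this example, and the transition table is only a guess at how a clipping block might be modelled.

```cpp
#include <type_traits>

// Hypothetical states for one clipping block (not the names used in fsm.h).
enum class State { Title, Meta, Text };
// Hypothetical events, derived from the kind of line that was just read.
enum class Event { Line, Separator };

// Primary template: any (state, event) pair not specialised below is invalid.
template <State S, Event E>
struct Transition : std::false_type {};

// Helper that records the target state of a valid transition.
template <State Next>
struct To : std::true_type {
    static constexpr State next = Next;
};

// The transition table, expressed as explicit specialisations.
template <> struct Transition<State::Title, Event::Line>      : To<State::Meta>  {};
template <> struct Transition<State::Meta,  Event::Line>      : To<State::Text>  {};
template <> struct Transition<State::Text,  Event::Line>      : To<State::Text>  {};
template <> struct Transition<State::Text,  Event::Separator> : To<State::Title> {};

// Compile-time step: asking for an undefined transition fails to compile.
template <State S, Event E>
constexpr State step() {
    static_assert(Transition<S, E>::value, "transition is not defined");
    return Transition<S, E>::next;
}

// A title line must be followed by the metadata line.
static_assert(step<State::Title, Event::Line>() == State::Meta,
              "Title + Line should lead to Meta");
```

In this sketch, writing `step<State::Title, Event::Separator>()` is rejected by the compiler, because no specialisation exists for that pair.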
2️⃣ TypeFactory-based parser
Implemented using a registration-based factory (typefactory.h).
Characteristics:
- Stage-driven pipeline
- One handler per parsing stage
- Runtime validation of handler registration
- Handlers are created once and cached, so no handler is allocated per input line
This version is:
- Simpler
- More readable
- Easier to debug
- Well suited for linear, protocol-like formats
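The stage-driven idea can be sketched roughly as follows. This is not the interface of typefactory.h: `Stage`, `Clipping`, `Handler`, `StageRegistry`, and `make_registry` are hypothetical names, and the sketch assumes three stages per block (title, metadata, text).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stages of one clipping block (not the names used in typefactory.h).
enum class Stage : std::size_t { Title = 0, Meta, Text, Count };

// Hypothetical result of parsing one block.
struct Clipping {
    std::string title, meta, text;
};

// A handler consumes one line, updates the clipping, and returns the next stage.
using Handler = std::function<Stage(const std::string&, Clipping&)>;

class StageRegistry {
public:
    // Registering the same stage twice is treated as a programming error.
    void add(Stage s, Handler h) {
        auto& slot = handlers_[index(s)];
        if (slot) throw std::logic_error("stage already registered");
        slot = std::move(h);
    }

    // Runtime validation: every stage needs a handler before parsing starts.
    void validate() const {
        for (const auto& h : handlers_)
            if (!h) throw std::logic_error("missing handler for a stage");
    }

    // Handlers are stored once and looked up by index for every line.
    Stage dispatch(Stage s, const std::string& line, Clipping& c) const {
        assert(s != Stage::Count);  // Count is a sentinel, not a real stage
        return handlers_[index(s)](line, c);
    }

private:
    static std::size_t index(Stage s) { return static_cast<std::size_t>(s); }
    std::array<Handler, static_cast<std::size_t>(Stage::Count)> handlers_{};
};

// Builds a registry with one handler per stage and validates it.
inline StageRegistry make_registry() {
    StageRegistry r;
    r.add(Stage::Title, [](const std::string& l, Clipping& c) { c.title = l; return Stage::Meta; });
    r.add(Stage::Meta,  [](const std::string& l, Clipping& c) { c.meta = l;  return Stage::Text; });
    r.add(Stage::Text,  [](const std::string& l, Clipping& c) {
        if (l == "==========") return Stage::Title;  // separator closes the block
        if (!c.text.empty()) c.text += '\n';
        c.text += l;
        return Stage::Text;
    });
    r.validate();
    return r;
}
```

A caller would keep a current `Stage`, feed each input line to `dispatch`, and store the returned stage for the next line. A missing handler is reported by `validate()` at startup rather than in the middle of a file.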
Both implementations produce identical LaTeX output.
Input Format
Expected input is a standard Kindle My Clippings.txt export.
Each clipping block follows this structure:
Book Title
- Your Highlight on Location 123-125 | Added on ...
Highlighted text line 1
Highlighted text line 2
==========
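To make the block structure concrete, here is a minimal sketch of a reader for it. It follows neither of the two parsers in this repository: `Clipping`, `parse_clippings`, and `parse_clippings_text` are hypothetical names. It also assumes that a block is kept only when it has a title, a metadata line, and at least one text line, and that a trailing carriage return should be dropped.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one clipping block.
struct Clipping {
    std::string title, meta, text;
};

// Reads blocks separated by "==========" lines.
inline std::vector<Clipping> parse_clippings(std::istream& in) {
    std::vector<Clipping> out;
    Clipping cur;
    int line_no = 0;  // position of the next line inside the current block

    auto flush = [&] {
        // Assumption: title + metadata + at least one text line make a complete block.
        if (line_no >= 3) out.push_back(std::move(cur));
        cur = Clipping{};
        line_no = 0;
    };

    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF input
        if (line == "==========") { flush(); continue; }
        if (line_no == 0) {
            cur.title = line;
        } else if (line_no == 1) {
            cur.meta = line;
        } else {
            if (!cur.text.empty()) cur.text += '\n';
            cur.text += line;
        }
        ++line_no;
    }
    flush();  // keep the last block even without a trailing separator
    return out;
}

// Convenience overload for in-memory text.
inline std::vector<Clipping> parse_clippings_text(const std::string& text) {
    std::istringstream in(text);
    return parse_clippings(in);
}
```

Incomplete blocks (for example a title followed directly by a separator) are dropped by the `line_no >= 3` check instead of producing partial output.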
Output Format
Generated LaTeX structure:
\section{Book Title}
\subsection{- Your Highlight on Location 123-125 | Added on ...}
Highlighted text line 1
Highlighted text line 2
\subsubsection{notes}
Highlights are grouped by book title.
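One way to group highlights by title while keeping the books in first-seen order is sketched below. The names `Clipping`, `Grouped`, `group_by_title`, and `to_latex` are hypothetical and not taken from the project sources. The sketch assumes one \subsubsection{notes} per highlight and leaves out LaTeX escaping to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record for one parsed clipping.
struct Clipping {
    std::string title, meta, text;
};

// Books in first-seen order, each with its highlights in input order.
using Grouped = std::vector<std::pair<std::string, std::vector<Clipping>>>;

inline Grouped group_by_title(const std::vector<Clipping>& clippings) {
    Grouped books;
    std::unordered_map<std::string, std::size_t> index;  // title -> position in books
    for (const auto& c : clippings) {
        // try_emplace only inserts when the title has not been seen before.
        auto [it, inserted] = index.try_emplace(c.title, books.size());
        if (inserted) books.emplace_back(c.title, std::vector<Clipping>{});
        assert(it->second < books.size());
        books[it->second].second.push_back(c);
    }
    return books;
}

// Emits the structure shown above. Escaping is intentionally omitted here.
inline std::string to_latex(const Grouped& books) {
    std::string out;
    for (const auto& [title, clips] : books) {
        out += "\\section{" + title + "}\n";
        for (const auto& c : clips) {
            out += "\\subsection{" + c.meta + "}\n";
            out += c.text + "\n";
            out += "\\subsubsection{notes}\n";  // assumption: one placeholder per highlight
        }
    }
    return out;
}
```

The vector keeps the book order and the map only stores positions, so looking up a title does not disturb the order in which books first appeared.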
Build
Requires a C++17-compatible compiler.
g++ -std=gnu++17 -Wall -Wextra -O2 -o kindle2latex main.cpp
Usage
./kindle2latex --input input.txt --output output.tex
Arguments:
| Argument | Description |
|---|---|
| --input | Path to Kindle clippings file |
| --output | Path to generated LaTeX file |
Design Notes
- No handler objects are allocated per input line; handlers are created once and cached.
- Order of books is preserved as in the original file.
- LaTeX special characters are escaped automatically.
- Incomplete clipping blocks are safely ignored.
- The final block is flushed even if the file does not end with ==========.
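The escaping step could look roughly like the sketch below. The function name `escape_latex` and the chosen replacements are assumptions for illustration; the project's actual escaping rules may differ.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Escapes the ten characters that LaTeX treats specially in running text.
// Hypothetical helper, not taken from the project sources.
inline std::string escape_latex(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        switch (ch) {
            // These three have no simple backslash form and need a text macro.
            case '\\': out += "\\textbackslash{}";   break;
            case '~':  out += "\\textasciitilde{}";  break;
            case '^':  out += "\\textasciicircum{}"; break;
            // These seven are escaped by prefixing a backslash.
            case '&': case '%': case '$': case '#':
            case '_': case '{': case '}':
                out += '\\';
                out += ch;
                break;
            default:
                out += ch;
        }
    }
    assert(out.size() >= in.size());  // escaping never shortens the text
    return out;
}
```

Because the input is processed in a single pass, the braces emitted as part of `\textbackslash{}` are not escaped a second time.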
Why Two Implementations?
This repository intentionally keeps two different parsing styles:
- The FSM version demonstrates strict compile-time state control.
- The TypeFactory version demonstrates a clean, extensible runtime pipeline.
The goal is architectural exploration and comparison, not just solving the parsing task.