Paperboy at the Terminal

Command-Line Edition · For Correspondents

For the correspondent who prefers a teletype to a touchscreen, Paperboy ships a proper command-line edition. One install, two main commands, and an editor at your terminal who will convert files to markdown or crawl an entire site without so much as a popup.

The `convert` command turns PDFs, Word documents, spreadsheets, e-books, web pages, images (with local OCR), and plain text into markdown next to the source file. The `crawl` command discovers a site’s pages via its sitemap, fetches them in order, and writes one combined markdown file — ready for an indexer, an agent, or a careful reader.
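The examples here pass `convert` one file at a time. To sweep a whole folder, a shell loop is enough. This is a minimal sketch, assuming `paperboy-cli convert FILE` with no `-o` writes the markdown beside FILE as described above. `convert_tree` and the `PAPERBOY` override are hypothetical, and the extension list is only a subset of the supported formats.

```bash
#!/usr/bin/env bash
# Sketch: convert every matching document under a directory, in place.
# Assumes `paperboy-cli convert FILE` writes the markdown next to FILE.
# PAPERBOY is a hypothetical override (e.g. PAPERBOY=echo for a dry run).
convert_tree() {
  local root="${1:-.}" cmd="${PAPERBOY:-paperboy-cli}" failed=0 f
  # NUL-separated paths survive spaces and newlines in file names.
  while IFS= read -r -d '' f; do
    # </dev/null keeps the converter from swallowing the file list on stdin.
    if ! $cmd convert "$f" </dev/null; then
      printf 'convert failed: %s\n' "$f" >&2
      failed=1
    fi
  done < <(find "$root" -type f \( -iname '*.pdf' -o -iname '*.docx' \
             -o -iname '*.xlsx' -o -iname '*.epub' -o -iname '*.rtf' \) -print0)
  # Non-zero if any single conversion failed.
  return "$failed"
}
```

Run `PAPERBOY=echo convert_tree ./docs` first to see what would be converted. Because `.md` outputs are not in the extension list, re-running the loop does not try to convert its own results.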

Configuration is optional. Conversion works offline, locally, and deterministically; the network is touched only when you ask for it, as with a crawl. AI features exist only if you bring your own key and call them by name.

Today’s Dispatch

$ paperboy-cli convert ./report.pdf -o report.md
  → 12,438 chars · 2 images OCR'd · 4 tables
  Wrote report.md (3.2 KB)

$ paperboy-cli crawl https://example.com \
    --max-pages 50 --output-mode single \
    -o site.md
  Discovered 47 URLs from sitemap.xml
  Crawled 47/47 pages · 184,920 words
  Wrote site.md (412 KB)

$ paperboy-cli doctor --offline
  11 OK · 1 warn · 0 fail. Ready to dispatch.

Capabilities, Catalogued

  • Convert · file → markdown
    PDF, DOCX, XLSX, EPUB, HTML, RTF, plain text, CSV. Images (PNG, JPG, GIF, BMP, WebP, TIFF) are OCR’d locally via tesseract.js.
  • Crawl · site → one markdown file
    Reads sitemap.xml first, falls back to internal-link discovery. Same-origin only. Strips tracking params. Outputs single file, mirrored tree, or JSONL for RAG pipelines.
  • Doctor · pre-flight diagnostics
    Verifies Node version, config, AI provider connectivity, converter smoke test, outbound network, proxy env, and writable output directory.
  • Offline by default
    No network calls unless you opt in. All conversion runs in-process. Files never leave the machine unless you configure a remote AI provider or pipe them elsewhere yourself.
  • Stable spec, scriptable
    Every command supports --json for machine output. Exit codes follow the Unix convention: zero on success, non-zero on failure. Ideal for build pipelines and shell scripts.
  • Optional AI image descriptions
    If you configure an Ollama, OpenRouter, or local-endpoint provider, the converter can add an “Image Description” section alongside OCR. You bring the key; nothing is enabled by default.
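The Crawl entry's two URL rules, same-origin only and tracking parameters stripped, can be illustrated in plain shell. This is a rough sketch, not Paperboy's code. `origin_of`, `same_origin`, and `strip_tracking` are hypothetical helpers. Origins are compared as scheme plus host and optional port, without normalizing case or default ports. Which query keys count as tracking is not stated in the catalog, so the `utm_*`, `fbclid`, and `gclid` set below is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: illustrates the idea of the crawl filters, not Paperboy's code.

# Origin = scheme://host[:port]. Simplified: no case or default-port normalizing.
origin_of() {
  local url="$1" scheme rest
  scheme="${url%%://*}"
  rest="${url#*://}"
  # Host (and optional port) runs up to the first /, ?, or #.
  printf '%s://%s\n' "$scheme" "${rest%%[/?#]*}"
}

same_origin() {
  [[ "$(origin_of "$1")" == "$(origin_of "$2")" ]]
}

# Drop query parameters assumed to be tracking keys; keep everything else.
strip_tracking() {
  local url="$1" frag="" base query param kept=""
  local -a params
  # Set the #fragment aside so it survives untouched.
  if [[ "$url" == *'#'* ]]; then
    frag="#${url#*#}"
    url="${url%%#*}"
  fi
  if [[ "$url" != *'?'* ]]; then
    printf '%s%s\n' "$url" "$frag"
    return 0
  fi
  base="${url%%\?*}"
  query="${url#*\?}"
  IFS='&' read -r -a params <<< "$query"
  for param in "${params[@]}"; do
    case "$param" in
      ''|utm_*|fbclid=*|gclid=*) ;;  # assumed tracking keys; empty pairs dropped too
      *) kept="${kept:+$kept&}$param" ;;
    esac
  done
  printf '%s%s%s\n' "$base" "${kept:+?$kept}" "$frag"
}
```

For example, `strip_tracking 'https://example.com/a?utm_source=x&id=3#top'` keeps `id=3` and the fragment, and `same_origin` rejects `https://docs.example.com` when the crawl started at `https://example.com`.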

Distribution

Install (npm)

npm install -g @proticom/paperboy-cli

Requires Node 20 or later. Install globally from npm and you're ready to convert; the project also lives in a GitHub repository.
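To refuse early on an old runtime, a small pre-install check can compare the major version against the stated minimum of 20. This is a minimal sketch: `node_is_supported` is a hypothetical helper that reads `node --version` (which prints a string like `v20.11.1`). Once installed, `paperboy-cli doctor` performs its own Node version check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the Node major version is 20 or later.
# Pass a version string to test it, or omit it to ask the installed node.
node_is_supported() {
  local version="${1:-$(node --version 2>/dev/null)}" major
  version="${version#v}"        # "v20.11.1" -> "20.11.1"
  major="${version%%.*}"        # "20.11.1"  -> "20"
  [[ "$major" =~ ^[0-9]+$ ]] && (( major >= 20 ))
}
```

Chained in front of the installer, it reads `node_is_supported && npm install -g @proticom/paperboy-cli`.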

First-run sequence

# Confirm your environment is ready
paperboy-cli doctor --offline

# Convert a file
paperboy-cli convert ./notes.docx

# Crawl a documentation site
paperboy-cli crawl https://docs.example.com --max-pages 100
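Because exit codes follow the Unix convention, the first-run sequence above can double as a CI gate that stops before crawling if pre-flight fails. This is a minimal sketch, assuming `doctor --offline` exits non-zero when any check fails (the catalog implies this but doesn't spell it out). `run_step` is a hypothetical helper, not part of paperboy-cli.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one pipeline step, report failure, pass the status on.
run_step() {
  local label="$1" status=0
  shift
  # Capture the real exit status without tripping `set -e`.
  "$@" || status=$?
  if (( status != 0 )); then
    printf '%s failed (exit %d)\n' "$label" "$status" >&2
  fi
  return "$status"
}
```

A job script might then run `run_step pre-flight paperboy-cli doctor --offline || exit 1` followed by `run_step crawl paperboy-cli crawl https://docs.example.com --max-pages 100 || exit 1`, so a failed check never reaches the crawl.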

Full options are documented in the README and via paperboy-cli --help. The CLI is the same converter that powers the desktop app, the Chrome extension, and the embeddable widget — one engine, four mastheads.