Docpull

Documentation web scraper

One command. Entire documentation site. Clean markdown output.

Why This Exists

Training AI on documentation means copying hundreds of pages by hand or writing a custom scraper for each site. Neither scales.

Docpull runs once and pulls everything. HTML becomes markdown with YAML headers. Ready for fine-tuning, Claude skills, or offline reference.
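To make "markdown with YAML headers" concrete, here is a minimal sketch of what one output file could look like when assembled. The function name `with_front_matter` and the header fields (`title`, `source`, `fetched`) are assumptions for illustration only; the fields Docpull actually writes are not listed here.

```python
from datetime import date


def with_front_matter(markdown_body: str, title: str, source_url: str, fetched: date) -> str:
    """Prefix a markdown body with a small YAML front-matter block.

    Hypothetical helper: the field names are illustrative, not Docpull's schema.
    """
    def quote(value: str) -> str:
        # Double-quote and escape so colons or quotes in a value stay valid YAML.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    header = [
        "---",
        f"title: {quote(title)}",
        f"source: {quote(source_url)}",
        f"fetched: {fetched.isoformat()}",
        "---",
    ]
    # One blank line separates the header from the body; end with a newline.
    return "\n".join(header) + "\n\n" + markdown_body.strip() + "\n"


if __name__ == "__main__":
    print(with_front_matter(
        "# Charges\n\nCreate a charge.",
        title="Charges: API",
        source_url="https://example.com/docs/charges",
        fetched=date(2024, 1, 2),
    ))
```

Keeping the source URL and fetch date in the header means each file carries its own provenance, which helps when the pages are later filtered or deduplicated for a training corpus.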

How It Works

Point it at a docs site. It crawls, converts, and organizes. Parallel processing handles large sites fast. Caching means re-runs fetch only new content.
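The parallel-fetch-plus-cache idea can be sketched in a few lines. This is an illustration under stated assumptions, not Docpull's implementation: the `crawl` function, the in-memory dict cache keyed by URL, and the injected `fetch` callable are all hypothetical, and a real cache would more likely live on disk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    cache: Dict[str, str],
    workers: int = 8,
) -> Dict[str, str]:
    """Fetch every URL that is not already cached, in parallel.

    Hypothetical sketch: `fetch` is any function mapping a URL to page text,
    and `cache` is a plain dict that persists between calls.
    """
    unique = list(dict.fromkeys(urls))  # de-duplicate while keeping order
    missing = [u for u in unique if u not in cache]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its page.
        for url, page in zip(missing, pool.map(fetch, missing)):
            cache[url] = page

    return {u: cache[u] for u in unique}


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        # Stand-in for a real HTTP request so the sketch runs offline.
        return f"<html>{url}</html>"

    store: Dict[str, str] = {}
    crawl(["/intro", "/api"], fake_fetch, store)
    crawl(["/intro", "/api", "/new"], fake_fetch, store)  # only /new is fetched
    print(sorted(store))
```

Threads suit this workload because fetching pages is dominated by network waits rather than CPU time. On the second call only the URL missing from the cache triggers a fetch, which is the behavior the caching claim describes.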

Pre-built configs exist for Stripe, Next.js, React, and others.

Use Cases

AI training data: build corpora from public documentation.
Offline docs: mirror sites for air-gapped work or travel.
Claude skills: generate knowledge bases for specialized AI assistance.

Stack

Python 3.9+
Web scraping + markdown conversion
YAML configuration
pytest, codecov, bandit
