© 2025

Docpull

Documentation web scraper

One command. Entire documentation site. Clean markdown output.

Why This Exists

Training AI on documentation means copying hundreds of pages by hand. Or writing custom scrapers for each site. Neither scales.

Docpull runs once and pulls everything. HTML becomes markdown with YAML headers. Ready for fine-tuning, Claude skills, or offline reference.
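As a rough illustration of what "markdown with YAML headers" can look like, the sketch below prepends a small front-matter block to a page that has already been converted. The function name `with_front_matter` and the `title`, `source`, and `fetched` fields are assumptions made for this example, not Docpull's documented output schema.

```python
import json
from datetime import date


def with_front_matter(markdown_body: str, title: str, source_url: str, fetched: date) -> str:
    """Prepend a YAML header to an already-converted markdown body.

    Hypothetical helper: the field names are illustrative assumptions.
    """
    # json.dumps emits double-quoted strings, which are also valid YAML scalars,
    # so titles containing colons or quotes stay safe without a YAML library.
    header = [
        "---",
        f"title: {json.dumps(title)}",
        f"source: {json.dumps(source_url)}",
        f"fetched: {fetched.isoformat()}",
        "---",
    ]
    return "\n".join(header) + "\n\n" + markdown_body.strip() + "\n"


if __name__ == "__main__":
    page = with_front_matter(
        "# Webhooks\n\nListen for events.",
        title="Webhooks: overview",
        source_url="https://example.com/docs/webhooks",
        fetched=date(2024, 10, 1),
    )
    print(page)
```

Keeping the metadata in a header leaves the body as plain markdown, so the same file can serve as training data or as offline reference.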

How It Works

Point it at a docs site. It crawls, converts, and organizes. Parallel processing handles large sites fast. Caching means re-runs fetch only new content.
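The parallel-plus-cache idea can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not Docpull's implementation: `fetch_all`, `cache_path`, and the on-disk layout are hypothetical names, and the `fetch` callable is injected so the download step can be swapped or stubbed.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a stable file name inside the cache directory."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.html"


def fetch_all(urls: Iterable[str], fetch: Callable[[str], str],
              cache_dir: Path, max_workers: int = 8) -> Dict[str, str]:
    """Return {url: html}, downloading only URLs missing from the cache."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    unique_urls = list(dict.fromkeys(urls))  # de-duplicate, keep first-seen order

    def load(url: str) -> str:
        path = cache_path(cache_dir, url)
        if path.exists():
            # Cache hit: a re-run skips the network for this page.
            return path.read_text(encoding="utf-8")
        html = fetch(url)
        path.write_text(html, encoding="utf-8")
        return html

    # Threads suit this workload because fetching is I/O-bound.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(unique_urls, pool.map(load, unique_urls)))
```

In practice `fetch` could wrap `urllib.request.urlopen`. Note that this sketch treats a page as "new" only when its URL has not been seen before; detecting changed pages would need something extra, such as conditional requests.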

Pre-built configs exist for Stripe, Next.js, React, and others.
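To make the idea of a per-site config concrete, here is one way such a profile might be modelled. Everything in it is an assumption for illustration: the `SiteProfile` class, its fields, and the example entries are hypothetical, and the page does not show Docpull's real config schema (the stack lists YAML; Python is used here only to keep the sketch runnable).

```python
from dataclasses import dataclass
from typing import Dict, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class SiteProfile:
    """Hypothetical shape for a per-site config; not Docpull's real schema."""
    name: str
    start_url: str
    include_prefixes: Tuple[str, ...] = ("/",)

    def allows(self, url: str) -> bool:
        """True when the URL is on the profile's host and under an included path."""
        start, target = urlparse(self.start_url), urlparse(url)
        return (target.netloc == start.netloc
                and target.path.startswith(self.include_prefixes))


# Illustrative entries only; the start URLs and prefixes are assumptions.
PROFILES: Dict[str, SiteProfile] = {
    "nextjs": SiteProfile("nextjs", "https://nextjs.org/docs", ("/docs",)),
    "react": SiteProfile("react", "https://react.dev/learn", ("/learn", "/reference")),
}

if __name__ == "__main__":
    print(PROFILES["nextjs"].allows("https://nextjs.org/docs/app"))  # on-site docs page
    print(PROFILES["nextjs"].allows("https://nextjs.org/blog"))      # outside /docs
```

A profile like this keeps a crawl from wandering into blogs or marketing pages on the same host.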

Use Cases

  • AI training data. Build corpora from public documentation.
  • Offline docs. Mirror sites for air-gapped work or travel.
  • Claude skills. Generate knowledge bases for specialized AI assistance.

Stack

  • Python 3.9+
  • Web scraping + markdown conversion
  • YAML configuration
  • pytest, codecov, bandit


Status

functional

Published

October 1, 2024

Links

Code
Article

Technologies

Python
Web Scraping
Documentation
Markdown
AI
Developer Tools
Automation
