Firecrawl
Description
Firecrawl is an API that turns any URL into clean, LLM‑ready data. It crawls sites (no sitemap required), scrapes dynamic pages, and outputs markdown, HTML, structured JSON, screenshots, links, and metadata. Use it via hosted cloud or run locally for development.
Features
- Scrape single pages into markdown, HTML, screenshots, links, and metadata.
- Crawl a URL and its accessible subpages; returns job status and results.
- Map websites to enumerate most URLs, with optional in‑site search.
- Search the web and (optionally) scrape result pages in one call.
- Extract structured data from single/multiple pages or whole domains (prompt and/or schema).
- LLM Extraction (beta) to return JSON shaped by a schema (Zod/Pydantic) or by prompt only.
- Actions (cloud‑only): click, type, wait, scroll, and screenshot before extraction for dynamic sites.
- Batching (new): submit thousands of URLs asynchronously for large‑scale scraping.
- Reliability & anti‑bot handling: proxies, JS‑rendered content, output parsing, orchestration.
- Customizability: exclude tags, headers/auth, crawl depth, media parsing (PDF, DOCX, images).
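Crawl and batch jobs are described above as asynchronous: a job is submitted, then its status is checked until results are ready. A minimal sketch of that polling pattern follows. It is not Firecrawl's client code; `wait_for_job`, the `fetch_status` callable, and the status strings `"completed"` and `"failed"` are illustrative assumptions, so check the actual status values against the API reference.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() repeatedly until the job reaches a terminal state.

    fetch_status is any function that returns the job's current state as a
    dict with a "status" key (hypothetical shape, for illustration only).
    """
    for _ in range(max_polls):
        job = fetch_status()
        # Assumed terminal status strings; adjust to the real API's values.
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(interval_s)
    raise TimeoutError(f"job did not finish after {max_polls} polls")


if __name__ == "__main__":
    # Simulated status responses stand in for real HTTP calls.
    fake_responses = iter([
        {"status": "scraping"},
        {"status": "completed", "data": ["page-1", "page-2"]},
    ])
    result = wait_for_job(lambda: next(fake_responses), sleep=lambda s: None)
    print(result["status"], len(result["data"]))
```

Passing the status fetcher and the sleep function as parameters keeps the loop independent of any HTTP library and makes it easy to exercise without network access.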
Technology Stack
- REST API (v2)
- SDKs: Python (firecrawl-py), Node.js (@mendable/firecrawl-js)
- LLM frameworks: LangChain (Python/JS), LlamaIndex, CrewAI, Composio, PraisonAI, Superinterface, Vectorize
- Low‑code tools: Dify, Langflow, Flowise AI, Cargo, Pipedream
- Automation: Zapier, Pabbly Connect
Requirements
- Firecrawl account and API key for hosted API usage
- Local usage supported (see contributing guide); full self‑hosting is still in progress
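As a rough illustration of how the API key is used with the hosted REST API, the sketch below builds (but does not send) an authenticated scrape request using only the Python standard library. The base URL, the `/scrape` path, the `url` and `formats` body fields, the bearer-token header, and the `FIRECRAWL_API_KEY` environment variable name are all assumptions here; confirm them against the official API reference before relying on them.

```python
import json
import os
import urllib.request
from typing import Optional, Sequence

# Assumed base URL for the hosted v2 API; verify against the API reference.
API_BASE = "https://api.firecrawl.dev/v2"


def build_scrape_request(
    url: str,
    api_key: Optional[str] = None,
    formats: Sequence[str] = ("markdown",),
) -> urllib.request.Request:
    """Build a POST request for a single-page scrape without sending it."""
    # FIRECRAWL_API_KEY is an assumed environment variable name.
    key = api_key or os.environ.get("FIRECRAWL_API_KEY")
    if not key:
        raise ValueError("an API key is required for the hosted API")
    # Assumed request body shape: target URL plus the output formats wanted.
    body = json.dumps({"url": url, "formats": list(formats)}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=body,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON response; the official SDKs wrap this step.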
GitHub Metrics
- Stars: 52,251
- Forks: 4,515
- Contributors: 4,515
- Last Updated: 8/26/2025
Deploy firecrawl on DigitalOcean
Get started with $200 in free credits and deploy your application in minutes.
Trusted by 600,000+ developers