Firecrawl

Description

Firecrawl is an API that turns any URL into clean, LLM‑ready data. It crawls sites (no sitemap required), scrapes dynamic pages, and outputs markdown, HTML, structured JSON, screenshots, links, and metadata. Use it through the hosted cloud API, or run it locally for development.

Features

  • Scrape single pages into markdown, HTML, screenshots, links, and metadata.
  • Crawl a URL and its accessible subpages; the crawl runs as a job that reports its status and returns the results.
  • Map websites to enumerate most URLs, with optional in‑site search.
  • Search the web and (optionally) scrape result pages in one call.
  • Extract structured data from single/multiple pages or whole domains (prompt and/or schema).
  • LLM Extraction (beta) to return JSON shaped by a schema (Zod/Pydantic) or by prompt only.
  • Actions (cloud‑only): click, type, wait, scroll, and screenshot before extraction for dynamic sites.
  • Batching (new): submit thousands of URLs asynchronously for large‑scale scraping.
  • Reliability & anti‑bot handling: proxies, JS‑rendered content, output parsing, orchestration.
  • Customizability: exclude tags, headers/auth, crawl depth, media parsing (PDF, DOCX, images).
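As a rough illustration of the single-page scrape feature listed above, the sketch below builds a REST request using only the Python standard library. The base URL `https://api.firecrawl.dev/v2`, the `/scrape` path, the `url` and `formats` body fields, and Bearer-token authentication are assumptions here and should be checked against the official API reference; the helper names `build_scrape_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumed base URL for the hosted v2 REST API; verify against the API reference.
API_BASE = "https://api.firecrawl.dev/v2"


def build_scrape_request(api_key, url, formats=("markdown",)):
    """Build (but do not send) a POST request asking for one page in the given formats."""
    # Assumed request body: the target URL plus the list of output formats.
    payload = {"url": url, "formats": list(formats)}
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed Bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60.0):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made; calling `send(build_scrape_request(key, "https://example.com"))` would perform the actual scrape under the assumptions above.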

Technology Stack

  • REST API (v2)
  • SDKs: Python (firecrawl-py), Node.js (@mendable/firecrawl-js)
  • LLM frameworks: LangChain (Python/JS), LlamaIndex, CrewAI, Composio, PraisonAI, Superinterface, Vectorize
  • Low‑code tools: Dify, Langflow, Flowise AI, Cargo, Pipedream
  • Automation: Zapier, Pabbly Connect

Requirements

  • Firecrawl account and API key for hosted API usage
  • Local usage supported (see contributing guide); full self‑hosting is still in progress
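Because hosted usage requires an API key, one common pattern is to read the key from the environment rather than hard-coding it. The following is a minimal sketch of that pattern; the variable name `FIRECRAWL_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not something this page specifies.

```python
import os


def load_api_key(env_var="FIRECRAWL_API_KEY"):
    """Return the API key stored in the given environment variable, or fail early."""
    # The variable name is an assumption; use whatever your deployment defines.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your Firecrawl API key")
    return key
```

Failing at startup when the key is missing tends to produce a clearer error than an authentication failure on the first request.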

GitHub Metrics

  • Stars: 52,251
  • Forks: 4,515
  • Contributors: 4,515
  • Last Updated: 8/26/2025