One platform to collect, extract, and cross-reference information from 54 data sources. AI-powered entity extraction, relationship mapping, and multi-agent verification, all self-hosted.
Trusted by investigative journalists, compliance teams, and academic researchers worldwide.
Critical information is scattered across corporate filings, court records, news archives, and the Wayback Machine. Connecting the dots manually takes weeks.
SEC filings, OpenCorporates, Wikidata, ProPublica, GLEIF, court records: each sits behind its own interface, data format, and search syntax.
You find a name in a filing, then search it in five other databases, copy-paste into a spreadsheet, and hope you didn't miss a connection. There's a better way.
Maltego costs $1,000+/yr. Palantir requires a government contract. SpiderFoot's hosted plan is $500+/mo. Public-interest research shouldn't require a budget.
From collection to export, Super Scraper handles the full intelligence lifecycle.
Paste any URL. Auto-fetches, parses, and triggers AI extraction in one step.
Groq-powered tool calling extracts people, orgs, roles, and relationships automatically.
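Groq's chat API accepts OpenAI-style `tools` definitions, so one way to drive extraction is a single function tool whose JSON-schema parameters describe what the model should return. The sketch below is illustrative only: the tool name `record_entities`, its fields, and the `parseExtraction` helper are hypothetical, not Super Scraper's actual schema. It shows the tool definition and how the JSON `arguments` string from a tool call can be validated before anything reaches the graph.

```typescript
// Hypothetical tool definition in the OpenAI-compatible "tools" format.
// The tool name and schema are illustrative assumptions, not the app's real schema.
const extractionTool = {
  type: "function",
  function: {
    name: "record_entities",
    description: "Record people, organizations, roles, and relationships found in the text.",
    parameters: {
      type: "object",
      properties: {
        entities: {
          type: "array",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              kind: { type: "string", enum: ["person", "organization"] },
              role: { type: "string" },
            },
            required: ["name", "kind"],
          },
        },
        relationships: {
          type: "array",
          items: {
            type: "object",
            properties: {
              source: { type: "string" },
              target: { type: "string" },
              relation: { type: "string" },
            },
            required: ["source", "target", "relation"],
          },
        },
      },
      required: ["entities", "relationships"],
    },
  },
} as const;

type Entity = { name: string; kind: "person" | "organization"; role?: string };
type Relationship = { source: string; target: string; relation: string };

// Treat anything that is not a non-null object as an empty record.
function asRecord(value: unknown): Record<string, unknown> {
  return typeof value === "object" && value !== null ? (value as Record<string, unknown>) : {};
}

// Validate the JSON string the model returns as tool-call arguments.
// Malformed items are dropped instead of being trusted.
function parseExtraction(argsJson: string): { entities: Entity[]; relationships: Relationship[] } {
  const raw = asRecord(JSON.parse(argsJson));
  const entities: Entity[] = [];
  for (const e of Array.isArray(raw.entities) ? raw.entities : []) {
    const item = asRecord(e);
    if (typeof item.name === "string" && (item.kind === "person" || item.kind === "organization")) {
      const entity: Entity = { name: item.name, kind: item.kind };
      if (typeof item.role === "string") entity.role = item.role;
      entities.push(entity);
    }
  }
  const relationships: Relationship[] = [];
  for (const r of Array.isArray(raw.relationships) ? raw.relationships : []) {
    const item = asRecord(r);
    if (typeof item.source === "string" && typeof item.target === "string" && typeof item.relation === "string") {
      relationships.push({ source: item.source, target: item.target, relation: item.relation });
    }
  }
  return { entities, relationships };
}
```

Validating on the way in matters because a model can return syntactically valid JSON that still ignores the schema.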
Interactive force-directed graph. See who connects to whom, find clusters and anomalies.
Cross-check claims with models on Groq, SambaNova, and Cerebras. Three independent checks, one verdict.
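One plausible way to turn three independent checks into a single verdict is a strict majority vote. The sketch below assumes each provider's model has already labeled the claim; the `Verdict` labels, the `AgentVote` shape, and the rule that a split vote yields "disputed" are assumptions for illustration, not the app's documented behavior.

```typescript
// Hypothetical verdict labels; the app's real labels may differ.
type Verdict = "supported" | "refuted" | "unverifiable";

interface AgentVote {
  provider: string; // e.g. "groq", "sambanova", "cerebras"
  verdict: Verdict;
}

// Strict majority: a verdict wins only if more than half of the votes agree.
// With three voters that means at least two; otherwise the claim is "disputed".
function combineVotes(votes: AgentVote[]): Verdict | "disputed" {
  const counts: Record<string, number> = {};
  for (const v of votes) {
    counts[v.verdict] = (counts[v.verdict] ?? 0) + 1;
  }
  for (const verdict of Object.keys(counts) as Verdict[]) {
    if ((counts[verdict] ?? 0) * 2 > votes.length) return verdict;
  }
  return "disputed";
}
```

Surfacing "disputed" rather than forcing a tie-break keeps disagreement between models visible to the researcher.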
Query 11 free databases: OpenCorporates, SEC EDGAR, Wikidata, ProPublica, GLEIF, and more.
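Free public databases are rate-limited and occasionally down, so a fan-out query should tolerate partial failure. A minimal sketch, assuming a hypothetical `SourceAdapter` interface that each source (OpenCorporates, SEC EDGAR, and so on) would implement around its own HTTP API; the adapters themselves are not shown.

```typescript
// Hypothetical adapter interface: each source wraps its own API behind one signature.
interface SourceAdapter {
  name: string;
  search(query: string): Promise<string[]>; // matching record identifiers
}

interface FanOutResult {
  hits: { source: string; record: string }[];
  failures: { source: string; error: string }[];
}

// Query every source concurrently; one failing source does not sink the rest.
async function queryAllSources(adapters: SourceAdapter[], query: string): Promise<FanOutResult> {
  const settled = await Promise.all(
    adapters.map(async (adapter) => {
      try {
        const records = await adapter.search(query);
        return { source: adapter.name, records, error: null as string | null };
      } catch (err) {
        return { source: adapter.name, records: [] as string[], error: String(err) };
      }
    }),
  );
  const result: FanOutResult = { hits: [], failures: [] };
  for (const s of settled) {
    if (s.error !== null) result.failures.push({ source: s.source, error: s.error });
    for (const record of s.records) result.hits.push({ source: s.source, record });
  }
  return result;
}
```

Keeping failures in the result, instead of silently dropping them, lets the UI show which sources were not actually checked.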
Self-expanding pipeline: seed a query, search, analyze gaps, loop until complete.
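The self-expanding loop behaves like a breadth-first search over leads: each round searches the current terms, asks what is still missing, and queues only terms not seen before. A minimal sketch, assuming synchronous `search` and `findGaps` callbacks (in practice these would be network and AI calls) and a hypothetical `maxRounds` budget as a safety stop alongside "no new gaps".

```typescript
// Hypothetical callbacks: `search` collects results for a term,
// `findGaps` inspects those results and proposes follow-up terms.
type SearchFn = (term: string) => string[];
type GapFn = (results: string[]) => string[];

// Expand from a seed query until no unseen terms remain or the round budget runs out.
// Returns every term visited, in discovery order.
function expand(seed: string, search: SearchFn, findGaps: GapFn, maxRounds = 5): string[] {
  const seen = new Set<string>([seed]);
  let frontier = [seed];
  for (let round = 0; round < maxRounds && frontier.length > 0; round++) {
    const next: string[] = [];
    for (const term of frontier) {
      for (const gap of findGaps(search(term))) {
        if (!seen.has(gap)) {
          seen.add(gap);
          next.push(gap);
        }
      }
    }
    frontier = next;
  }
  return Array.from(seen);
}
```

The `seen` set is what guarantees termination on cyclic data, such as two companies that list each other as shareholders.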
OCR images, transcribe audio, and analyze documents with Groq and Gemini vision.
Search across 5 academic APIs. Find peer-reviewed sources for any claim.
Pull historical snapshots. See what a page said before they changed it.
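Assuming snapshots come from the Wayback Machine (named at the top of this page as one of the scattered sources), the Internet Archive's public availability API returns the capture closest to a given timestamp. The helper names below are hypothetical, and the network request is left out so the URL building and response parsing stay testable.

```typescript
// Build a Wayback Machine availability query.
// `timestamp` is optional: 1 to 14 digits of YYYYMMDDhhmmss.
function availabilityUrl(target: string, timestamp?: string): string {
  let url = "https://archive.org/wayback/available?url=" + encodeURIComponent(target);
  if (timestamp) url += "&timestamp=" + encodeURIComponent(timestamp);
  return url;
}

interface Snapshot {
  url: string;
  timestamp: string;
}

// Extract the closest snapshot from the API's JSON body, or null when none exists
// (the API returns {"archived_snapshots": {}} in that case).
function parseClosestSnapshot(body: string): Snapshot | null {
  const data = JSON.parse(body) as {
    archived_snapshots?: { closest?: { available?: boolean; url?: string; timestamp?: string } };
  } | null;
  const closest = data?.archived_snapshots?.closest;
  if (!closest || !closest.available || !closest.url || !closest.timestamp) return null;
  return { url: closest.url, timestamp: closest.timestamp };
}
```

The returned `url` points into web.archive.org and can be fetched and parsed like any other page.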
Watch URLs for changes. Get diffs and AI-generated summaries when content updates.
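Change detection can start by comparing the stored copy of a page with a fresh fetch, line by line. The sketch below is a generic longest-common-subsequence diff, not the app's implementation; the `-` and `+` lines it produces are the kind of compact input a summarization step could receive.

```typescript
type DiffLine = { op: " " | "-" | "+"; text: string };

// Line-level diff from a longest-common-subsequence table.
// O(n*m) time and memory: fine for a web page, not for very large documents.
function diffLines(before: string[], after: string[]): DiffLine[] {
  const n = before.length;
  const m = after.length;
  // lcs[i][j] = LCS length of before[i..] and after[j..]
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = before[i] === after[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out: DiffLine[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (before[i] === after[j]) {
      out.push({ op: " ", text: before[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push({ op: "-", text: before[i++] }); // line removed
    } else {
      out.push({ op: "+", text: after[j++] }); // line added
    }
  }
  while (i < n) out.push({ op: "-", text: before[i++] });
  while (j < m) out.push({ op: "+", text: after[j++] });
  return out;
}
```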
Semantic search powered by Gemini embeddings. Find related documents by meaning, not keywords.
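Searching "by meaning" typically comes down to ranking documents by the cosine similarity between embedding vectors. A minimal sketch, assuming the vectors have already been produced by an embedding model such as Gemini's (that call is not shown) and using a hypothetical `Doc` shape; at scale, a vector index in the database would replace this linear scan.

```typescript
// Cosine similarity: dot product divided by the product of the vector norms.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let k = 0; k < a.length; k++) {
    dot += a[k] * b[k];
    na += a[k] * a[k];
    nb += b[k] * b[k];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Doc {
  id: string;
  embedding: number[];
}

// Return the topK documents most similar to the query vector, best first.
function rankBySimilarity(query: number[], docs: Doc[], topK = 3): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```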
Export as JSON, CSV, Markdown, or full evidence packages with sourced citations.
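CSV exports usually end up in a spreadsheet, where names like "Acme, Inc." break naive comma joining. A minimal sketch of RFC 4180 quoting, assuming a flat entity row with hypothetical column names; real exports may carry more columns.

```typescript
// Quote a field when it contains a comma, double quote, CR, or LF,
// doubling any embedded quotes (RFC 4180).
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

// Hypothetical row shape for an entity export.
interface EntityRow {
  name: string;
  kind: string;
  source_url: string;
}

// Header plus one line per row, joined with CRLF as RFC 4180 specifies.
function toCsv(rows: EntityRow[]): string {
  const lines = [["name", "kind", "source_url"].join(",")];
  for (const r of rows) {
    lines.push([r.name, r.kind, r.source_url].map(csvField).join(","));
  }
  return lines.join("\r\n");
}
```

Keeping `source_url` on every row means each exported entity still points back to the page it was sourced from.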
From raw URL to structured intelligence in minutes.
Paste a URL, upload a document, or run an OSINT search. Super Scraper fetches, parses, and stores everything.
AI automatically extracts entities, relationships, and timeline events. Everything gets linked in the graph.
Query your data with AI chat, explore the connections graph, verify claims with 3-agent cross-checks, and export evidence packages.
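Linking only works if the same real-world entity found in two documents becomes one node. A minimal sketch of that merge step, assuming a deliberately naive normalization rule (case-folding and stripping common punctuation) and hypothetical `Mention` and `EntityGraph` names; real entity resolution usually needs more than string normalization.

```typescript
// Hypothetical shapes; field names are illustrative.
interface Mention {
  name: string;
  kind: string;
}
interface Link {
  from: string;
  to: string;
  relation: string;
}

// Naive normalization: "ACME Corp." and "Acme Corp" both become "acme corp".
function nodeKey(name: string): string {
  return name.toLowerCase().replace(/[.,;:'"()]/g, "").replace(/\s+/g, " ").trim();
}

class EntityGraph {
  readonly nodes = new Map<string, Mention>();
  readonly edges = new Map<string, Link>();

  // Returns the node key; the first mention seen keeps its original spelling.
  addMention(m: Mention): string {
    const key = nodeKey(m.name);
    if (!this.nodes.has(key)) this.nodes.set(key, m);
    return key;
  }

  // Edges are keyed by (from, relation, to), so extracting the same fact
  // from another document does not create a duplicate edge.
  addLink(from: Mention, to: Mention, relation: string): void {
    const a = this.addMention(from);
    const b = this.addMention(to);
    this.edges.set(`${a}|${relation}|${b}`, { from: a, to: b, relation });
  }
}
```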
Self-host for free, forever. Or let us handle the infrastructure.
- Self-hosted
- Managed hosting
- Early supporter (73 of 100 spots remaining)
Professional-grade OSINT without the professional-grade invoice.
| Feature | Super Scraper | Maltego | Palantir | SpiderFoot |
|---|---|---|---|---|
| Starting price | Free | $1,000+/yr | Custom ($$$$) | $500+/mo (hosted) |
| Self-hosted | ✓ | ✗ | ✗ | ✓ |
| Open source | ✓ | ✗ | ✗ | ✓ |
| AI entity extraction | ✓ | ✗ | ✓ | ✗ |
| Multi-agent verification | ✓ | ✗ | ✗ | ✗ |
| Semantic search | ✓ | ✗ | ✓ | ✗ |
| No paid API keys required | ✓ | ✗ | ✗ | ✗ |
Clone the repo and deploy in under 10 minutes. All you need is a free Supabase account and free API keys from Groq, Gemini, SambaNova, and Cerebras.
```bash
# Clone and install
git clone https://github.com/joecattt/archive-intelligence.git
cd archive-intelligence
npm install

# Set up Supabase (log in and link your own project before pushing)
npx supabase login
npx supabase init
npx supabase link --project-ref <your-project-ref>
npx supabase db push
npx supabase functions deploy

# Configure your free API keys
npx supabase secrets set GROQ_API_KEY="your-key"
npx supabase secrets set GEMINI_API_KEY="your-key"
npx supabase secrets set SAMBANOVA_API_KEY="your-key"
npx supabase secrets set CEREBRAS_API_KEY="your-key"

# Build and launch
npm run build && npm run preview
```
The information is public. The connections are hidden. Archive Intelligence makes them visible.