Open source · Self-hosted · No vendor lock-in

See What They Don't Want You to Find

One platform to collect, extract, and cross-reference across 54 data sources. AI-powered entity extraction, relationship mapping, and multi-agent verification — all self-hosted.

Trusted by investigative journalists, compliance teams, and academic researchers worldwide.

The data is there. It's just buried.

Critical information is scattered across corporate filings, court records, news archives, and the Wayback Machine. Connecting the dots manually takes weeks.

🔍

Fragmented Sources

SEC filings, OpenCorporates, Wikidata, ProPublica, GLEIF, court records — each behind a different interface, different format, different search syntax.

Manual Correlation

You find a name in a filing, then search it in five other databases, copy-paste into a spreadsheet, and hope you didn't miss a connection. There's a better way.

💸

Prohibitive Tooling

Maltego costs $1,000+/yr. Palantir requires a government contract. SpiderFoot's hosted plan is $500+/mo. Public-interest research shouldn't require a budget.

Everything you need to investigate

From collection to export, Super Scraper handles the full intelligence lifecycle.

🌐

URL Collection

Paste any URL. Auto-fetches, parses, and triggers AI extraction in one step.

🧠

AI Entity Extraction

Groq-powered tool calling extracts people, orgs, roles, and relationships automatically.
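Under the hood this is standard tool calling: Groq's chat completions API is OpenAI-compatible, so the extractor can declare a tool whose JSON Schema the model must fill with structured entities. A minimal sketch — the tool and field names here are illustrative, not the project's actual schema:

```typescript
// Illustrative tool (function-calling) schema for entity extraction.
// The model responds with `arguments` matching this JSON Schema instead
// of free text, which is what makes the extraction machine-readable.
type Tool = {
  type: "function";
  function: { name: string; description: string; parameters: object };
};

const extractEntitiesTool: Tool = {
  type: "function",
  function: {
    name: "record_entities",
    description:
      "Record people, organizations, roles, and relationships found in the text.",
    parameters: {
      type: "object",
      properties: {
        entities: {
          type: "array",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              kind: { type: "string", enum: ["person", "org"] },
              role: { type: "string" },
            },
            required: ["name", "kind"],
          },
        },
        relationships: {
          type: "array",
          items: {
            type: "object",
            properties: {
              from: { type: "string" },
              to: { type: "string" },
              label: { type: "string" }, // e.g. "director of"
            },
            required: ["from", "to", "label"],
          },
        },
      },
      required: ["entities", "relationships"],
    },
  },
};

// A request body would then include: { model, messages, tools: [extractEntitiesTool] }
```
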

🕸️

Connections Graph

Interactive force-directed graph. See who connects to whom, find clusters and anomalies.

3-Agent Verification

Cross-check claims across Groq, SambaNova, and Cerebras. Three models, one verdict.

🔎

OSINT Search

Query 11 free databases: OpenCorporates, SEC EDGAR, Wikidata, ProPublica, GLEIF, and more.

🔄

Research Loop

Self-expanding pipeline: seed a query, search, analyze gaps, loop until complete.
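The loop above reduces to a small control flow. In this sketch, `search` and `findGaps` are injected stubs standing in for the real search and gap-analysis steps (names illustrative, not the project's actual functions):

```typescript
// Minimal sketch of a self-expanding research loop: seed a query, collect
// results, ask the gap analyzer for follow-up queries, and repeat until
// no gaps remain or a round limit is hit.
type SearchFn = (query: string) => string[];       // returns document ids
type GapFn = (collected: Set<string>) => string[]; // returns follow-up queries

function researchLoop(
  seed: string,
  search: SearchFn,
  findGaps: GapFn,
  maxRounds = 5,
): Set<string> {
  const collected = new Set<string>();
  let queue = [seed];
  for (let round = 0; round < maxRounds && queue.length > 0; round++) {
    for (const q of queue) {
      for (const doc of search(q)) collected.add(doc);
    }
    queue = findGaps(collected); // empty queue => investigation is "complete"
  }
  return collected;
}
```

The round limit is the important design choice: a gap analyzer that keeps proposing queries would otherwise never terminate.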

📸

Media Analysis

OCR images, transcribe audio, analyze documents with Groq and Gemini vision.

📚

Academic Papers

Search across 5 academic APIs. Find peer-reviewed sources for any claim.

🏛️

Wayback Machine

Pull historical snapshots. See what a page said before they changed it.
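Snapshot lookup can go through the Internet Archive's public availability API, which returns the capture closest to a requested date (the response's `archived_snapshots.closest.url` points at the capture). A sketch of building that request:

```typescript
// Build a Wayback Machine "available" API request for the snapshot
// closest to a given date (YYYYMMDD).
function waybackLookupUrl(target: string, yyyymmdd: string): string {
  const params = new URLSearchParams({ url: target, timestamp: yyyymmdd });
  return `https://archive.org/wayback/available?${params.toString()}`;
}

// e.g. waybackLookupUrl("example.com/about", "20200101")
```
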

🔔

Change Monitoring

Watch URLs for changes. Get diffs and AI-generated summaries when content updates.
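A naive sketch of the change-detection step: compare two snapshots line by line and report what appeared or disappeared. A real monitor would use a proper diff (e.g. LCS-based) before handing the result to an LLM for the summary; this only shows the shape of the data:

```typescript
// Report lines added and removed between two snapshots of a page.
// Set-based comparison: order changes and duplicate lines are ignored.
function lineChanges(before: string, after: string) {
  const a = new Set(before.split("\n"));
  const b = new Set(after.split("\n"));
  return {
    added: [...b].filter((line) => !a.has(line)),
    removed: [...a].filter((line) => !b.has(line)),
  };
}
```
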

📊

Vector Search

Semantic search powered by Gemini embeddings. Find related documents by meaning, not keywords.
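Semantic ranking reduces to comparing embedding vectors. In the real pipeline the vectors come from Gemini's embedding API and live in Postgres, but the math is plain cosine similarity:

```typescript
// Cosine similarity between two embedding vectors (assumed same length).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by similarity to an embedded query, best match first.
function rankByMeaning(query: number[], docs: { id: string; vec: number[] }[]) {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score);
}
```
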

📦

Evidence Export

Export as JSON, CSV, Markdown, or full evidence packages with sourced citations.

Three steps. Full picture.

From raw URL to structured intelligence in minutes.

1

Collect

Paste a URL, upload a document, or run an OSINT search. Super Scraper fetches, parses, and stores everything.

2

Extract

AI automatically extracts entities, relationships, and timeline events. Everything gets linked in the graph.

3

Analyze

Query your data with AI chat, explore the connections graph, verify claims with 3-agent cross-checks, and export evidence packages.

Simple, honest pricing

Self-host for free, forever. Or let us handle the infrastructure.

Free

Self-hosted

$0 /forever
  • All 17 edge functions
  • Unlimited projects
  • Full source code
  • Community support
  • BYO API keys (free tiers)
Clone Repo

Pro

Managed hosting

$5 /month
  • Everything in Free
  • Managed Supabase instance
  • Pre-configured API keys
  • Automatic updates
  • Priority support
Get Started

Founding Member

Early supporter

$40 /year

73 of 100 spots remaining

  • Everything in Pro
  • Price locked at $40/yr forever
  • Founding member badge
  • Direct roadmap input
  • Early access to new features
Claim Your Spot

How we compare

Professional-grade OSINT without the professional-grade invoice.

Feature                      Super Scraper   Maltego      Palantir         SpiderFoot
Starting price               Free            $1,000/yr    Custom ($$$$)    $500/mo
Self-hosted                  ✓               ✓            ✗                ✓
Open source                  ✓               ✗            ✗                ✓
AI entity extraction         ✓               ✗            ✓                ✗
Multi-agent verification     ✓               ✗            ✗                ✗
Semantic search              ✓               ✗            ✓                ✗
No paid API keys required    ✓               ✗            ✗                ✗

Open source. Run it yourself.

Clone the repo and deploy in under 10 minutes. All you need is a free Supabase account and free API keys from Groq, Gemini, SambaNova, and Cerebras.

# Clone and install
git clone https://github.com/joecattt/archive-intelligence.git
cd archive-intelligence
npm install

# Set up Supabase (db push and deploy need a linked project)
npx supabase init
npx supabase login
npx supabase link --project-ref your-project-ref
npx supabase db push
npx supabase functions deploy

# Configure your free API keys
npx supabase secrets set GROQ_API_KEY="your-key"
npx supabase secrets set GEMINI_API_KEY="your-key"
npx supabase secrets set SAMBANOVA_API_KEY="your-key"
npx supabase secrets set CEREBRAS_API_KEY="your-key"

# Build and launch
npm run build && npm run preview
View on GitHub

Frequently asked questions

Is this really free?
Yes. The core platform is open source and always will be. You self-host on your own Supabase instance (free tier is generous). The AI providers we use — Groq, Gemini, SambaNova, Cerebras — all have free tiers sufficient for serious research. No credit card needed for any of it.
What data sources does it search?
The OSINT search function queries 11 free databases including OpenCorporates, SEC EDGAR, Wikidata (SPARQL), ProPublica Nonprofit Explorer, GLEIF (Legal Entity Identifiers), OFAC sanctions, and more. Plus you can collect any URL, crawl sites, pull Wayback Machine snapshots, and search 5 academic paper APIs.
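For the Wikidata leg, queries go to the public SPARQL endpoint (https://query.wikidata.org/sparql, GET with `?query=...&format=json`). An illustrative label-match query, not necessarily the exact one the search function issues:

```typescript
// Build a SPARQL query that finds Wikidata entities whose English label
// matches a name exactly. The label service fills in human-readable labels.
function wikidataLabelQuery(name: string, limit = 10): string {
  return `
SELECT ?item ?itemLabel WHERE {
  ?item rdfs:label "${name}"@en .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT ${limit}`.trim();
}
```
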
How does multi-agent verification work?
When you verify a claim, Super Scraper sends it to three independent AI providers (Groq, SambaNova, Cerebras) running different models. Each assesses the claim independently. You get a confidence rating based on agreement, plus specific supporting and contradicting evidence from each model. A single model's hallucination is unlikely to survive the cross-check.
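The consensus step can be sketched as a simple majority vote with an agreement score. Field and function names here are illustrative, not the project's actual API:

```typescript
// One provider's independent assessment of a claim.
type Verdict = { provider: string; supports: boolean; confidence: number };

// Majority vote across providers, with an agreement ratio and the
// dissenting verdicts surfaced so the user can inspect disagreement.
function consensus(verdicts: Verdict[]) {
  const yes = verdicts.filter((v) => v.supports).length;
  const supported = yes * 2 > verdicts.length; // simple majority of models
  const agreement =
    Math.max(yes, verdicts.length - yes) / verdicts.length; // 1.0 = unanimous
  return {
    supported,
    agreement,
    dissenting: verdicts.filter((v) => v.supports !== supported),
  };
}
```
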
Is my research data private?
When self-hosted, your data lives in your own Supabase instance with row-level security. Documents, extractions, and entity graphs never leave your infrastructure. AI queries are sent to third-party providers for processing; review each provider's data-retention policy, as free tiers may handle prompts differently from paid tiers.
Can I use this for journalism?
Absolutely. Super Scraper is built for public-interest research including investigative journalism, academic research, compliance, and due diligence. The evidence export feature creates citation-ready packages with source URLs and timestamps. Multiple journalists and newsrooms already use it.
What's the tech stack?
React 18 + TypeScript + Vite + Tailwind on the frontend. Supabase (Postgres + Deno Edge Functions) on the backend. AI providers are Groq (Llama 3.3 70B), SambaNova (DeepSeek-R1), Cerebras (Qwen 3 235B), and Gemini (embeddings + vision). All free tier, no paid APIs.

Start investigating

The information is public. The connections are hidden. Super Scraper makes them visible.