AI Deep Research Platforms — State of the Field, May 2026

Tier 01

Consumer chat-app deep research.

Triggered from a chat UI, returns one long-form cited report per query. Best for ad-hoc research where you'll read the answer yourself. Not built for programmatic enrichment of thousands of rows.

ChatGPT Deep Research · OpenAI · GPT-5.4

The reference implementation. Asks clarifying questions before launching, then runs 7–20 min, reads 25–100+ pages, and produces a structured report with inline citations. Strongest at multi-domain synthesis and complex query decomposition. More reliable than Perplexity, at the cost of longer runs.

Best balance · Asks clarifying Qs · Slow (7–20 min)

Access Plus / Pro

Free: 2/mo · Plus: 10/mo · Pro: $200/mo unlimited

Sweet spot Reliable reports

Claude Research · Anthropic · Opus 4.7

Multi-agent orchestration. Researches ~261 sources in ~6 min in independent tests. Strongest writing quality of the chat-app cohort — closest to "perfection" in nuanced evals per multiple independent reviewers. Pairs naturally with Cowork/Skills for repeatable research workflows.

Best prose · Multi-agent · Skills-compatible

Access Pro / Max

Included with Max plan ($100–$200/mo)

Sweet spot Synthesis + writing

Gemini Deep Research · Google · Gemini 3 Pro

Unique ecosystem play: now reads Gmail, Drive, Docs, Slides, Sheets, Chat directly inside a research run. Presents an editable research plan before executing (unlike OpenAI's autonomous launch). Slower than competitors in tests (62 sources in 15+ min), but the private-data integration is unmatched.

Gmail/Drive access · Editable plan · 1M context

Access Google AI Pro

$19.99/mo · bundles NotebookLM, Veo, Antigravity

Sweet spot Workspace research

Perplexity Deep Research · Perplexity · Sonar + routed

Fastest of the cohort (~3 min). New Model Council compares 19 models in parallel; Perplexity Computer (Feb 2026) executes complex agentic workflows. SimpleQA ~94%. But: hallucination rates are significantly higher than OpenAI/Claude in deep-research mode, and in 2026 Perplexity downgraded paid queries to cheaper models without disclosure. Treat as a discovery layer, not a source of truth.

Fastest (~3 min) · Model Council · Trust issues

Access Pro / Max

Pro $20/mo · Max $200/mo (Comet browser)

Sweet spot Fast discovery

NotebookLM · Google · Gemini-backed

Different category — operates only on sources you upload (PDFs, Docs, YouTube transcripts, URLs). Zero web hallucination because the corpus is closed. Generates audio overviews, video overviews, mind maps, flashcards, infographics, slide decks. The canonical workflow: Perplexity for source discovery → NotebookLM for synthesis on the curated set.

Closed corpus · Audio/video output · Free tier

Access Free / Plus

Plus bundled in Google AI Pro · Ultra $250/mo

Sweet spot Your own docs

Grok DeepSearch · xAI · Grok 4

Speed-first. ~10× faster than ChatGPT Deep Research and reads ~3× more pages — but report depth and visual richness consistently rated weakest of the major players in head-to-head testing. Real-time X integration is the only genuine differentiator; useful for breaking-news contexts where Twitter signal matters.

Fastest scan · X/Twitter real-time

Access X Premium+ / SuperGrok

$22–40/mo

Sweet spot News + social

Manus Wide Research · Manus (Singapore)

Orthogonal model: instead of one deep agent, spins up 100+ parallel subagents on one task. Each subagent is a full Manus instance (not a specialized role) that can independently scour the web. Demoed comparing 100 sneakers or generating 50 poster styles into a downloadable spreadsheet/ZIP. Genuinely useful when the task is enumerative, not deep.

100+ parallel agents · VM-per-session

Access Pro ($199)

Basic $19 · Plus $39 · Pro $199/mo

Sweet spot Enumerative tasks
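
The wide-research pattern described above (many independent workers, one item each, results collected into rows) can be sketched with a thread pool. This is only an illustration of the fan-out shape, not Manus's implementation: `research_item` is a hypothetical stand-in for one subagent, and the item names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def research_item(item: str) -> dict:
    """Hypothetical stand-in for one subagent; a real worker would browse the web."""
    return {"item": item, "summary": f"notes on {item}"}


def wide_research(items: list[str], max_workers: int = 100) -> list[dict]:
    """Fan one enumerative task out to many independent workers, one item each."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so output rows line up with `items`.
        return list(pool.map(research_item, items))


rows = wide_research([f"sneaker-{i}" for i in range(100)])
```

Each worker is independent, so wall-clock time is bounded by the slowest single item rather than the sum of all items, which is why this shape suits enumerative tasks.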

Genspark · Genspark Inc.

Mixture-of-Agents: 8 LLMs + 80 tools orchestrated dynamically per query. Outputs "Sparkpages" — generated landing-page-style reports rather than markdown documents. Has a native AI voice-calling module (genuinely unusual). Reported to beat ChatGPT-4 and Manus on GAIA precision (92.4%). Good free tier.

MoA architecture · Voice calling · Generous free

Access Free / Plus / Pro

Plus ~$25/mo

Sweet spot Multimedia output

Qwen Deep Research · Alibaba

Open-weight family entrant. The free preview tier explicitly collects prompts for training — disqualifies it for any sensitive strategy work until GA pricing lands with proper data handling. Capabilities competitive on Chinese-language sources and emerging-market coverage. Worth watching, not yet worth trusting with proprietary signal.

Collects prompts · Strong CN coverage

Access Free preview

GA pricing TBA

Sweet spot CN-language R&D

Tier 02

Agentic API research infrastructure.

Programmatic deep research — call from your agent, get back structured JSON with citations and confidence scores. The right layer for Loop 3 strategy research, database enrichment, monitoring, and any workflow where the consumer of the output is software, not a human reading a report.
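
When software consumes the output, confidence scores and citations become filter inputs rather than reading aids. The sketch below assumes a hypothetical response shape (a list of fields, each with a numeric confidence and a citation list); real APIs in this tier differ in field names and may report confidence as labels rather than numbers.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    value: str
    confidence: float      # assumed 0.0-1.0 scale; hypothetical
    citations: list[str]


def accept(fields: list[Field], min_confidence: float = 0.8) -> dict[str, str]:
    """Keep only fields that are both confident and cited; the rest need review."""
    return {
        f.name: f.value
        for f in fields
        if f.confidence >= min_confidence and f.citations
    }


# Made-up enrichment result for one database row.
result = [
    Field("hq_city", "Lisbon", 0.93, ["https://example.com/about"]),
    Field("employee_count", "250", 0.41, ["https://example.com/jobs"]),
    Field("founder", "unknown", 0.90, []),
]
enriched = accept(result)
```

Only `hq_city` passes here: the second field is below the threshold and the third has no citation to back it.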

Parallel.ai · Parallel Web Systems

State of the art on BrowseComp. 48% accuracy on the OpenAI BrowseComp benchmark at $1,200/1K, beating ChatGPT Pro, Gemini Deep Research, Perplexity, and Exa Research on the Pareto frontier of cost/accuracy at every price point. Tiered processor architecture (Lite → Ultra4x) lets you match compute to query complexity. Task API, Search API, Extract API, Chat API, Monitor API, FindAll API: the full toolkit. Best-in-class for programmatic deep research.

SOTA BrowseComp · Structured JSON · Confidence scores · Async + sync

Pricing $5–$1,200 / 1K

Lite $5 · Base $10 · Core $25 · Pro $100 · Ultra $300 · Ultra2x $600 · Ultra4x $1,200

Latency 5s → 25 min

Async-first; fast variants pull from web index
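
Matching compute to query complexity comes down to per-query arithmetic on the price list above. A minimal sketch, using only the per-1K prices quoted in this entry; the function names are hypothetical and nothing here calls the Parallel API.

```python
from typing import Optional

# Per-1K prices in USD, copied from the pricing line above.
PRICE_PER_1K = {
    "lite": 5, "base": 10, "core": 25, "pro": 100,
    "ultra": 300, "ultra2x": 600, "ultra4x": 1200,
}


def cost_per_query(tier: str) -> float:
    """Convert the quoted per-1K price into a per-query price."""
    return PRICE_PER_1K[tier] / 1000


def best_tier_within(budget_per_query: float) -> Optional[str]:
    """Return the priciest tier whose per-query cost fits the budget, if any."""
    affordable = [t for t in PRICE_PER_1K if cost_per_query(t) <= budget_per_query]
    return max(affordable, key=PRICE_PER_1K.get) if affordable else None
```

For example, a budget of $0.30 per query selects `ultra`, while $0.05 per query tops out at `core`.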

Exa · Exa Labs

Neural/semantic-first search built specifically for AI agents. Exa 2.0 split into three endpoints: Exa Fast (sub-350ms p50, billed as the fastest search API available), Exa Auto (general), Exa Deep (agentic research, field-level citations). 10 contents free per search. Strong on company/people/code search; embeddings find conceptually related content that keyword search misses.

Semantic-first · Sub-350ms Fast · Field citations

Pricing $5–$1,500 / 1K

$10 free credits; Deep tier competes with Parallel Ultra

Latency 350ms → 3 min

Tavily · Tavily (acquired by Nebius, Feb 2026)

Pre-processed, AI-ready search results with content extraction built in. 99.99% uptime, 180ms p50, native LangChain integrations. Has a /research endpoint that breaks a question into sub-queries and returns cited synthesis. Best balance of speed-cost-predictability for RAG and chatbot use cases. Roadmap uncertainty after Nebius acquisition — worth monitoring before deep platform bet.

RAG-optimized · Flat pricing · Acquired by Nebius

Pricing $8 / 1K credits

$30/mo entry · 1K free/mo · predictable flat rate

Latency ~180ms p50

Firecrawl · Firecrawl Inc. (OSS)

The scraping/extraction layer most often paired with a search API. Open source + self-hostable for data residency. Handles JavaScript rendering, SPAs, authenticated content, complex PDFs. Outputs clean markdown or structured JSON ready for RAG ingestion. Pairs well with Exa or Tavily upstream. The right tool when you have URLs and need the contents.

Open source · Self-hostable · JS rendering

Pricing $14+/mo

1 credit/page · Free tier 500 pages

Latency ~1.3s avg

Perplexity Sonar API · Perplexity

The same Sonar models that power Perplexity Pro, exposed as an API with citations baked in. One call = live search + in-house LLM synthesis + sources. Convenient when you want a single endpoint instead of orchestrating search + LLM separately. Sonar Pro for higher-quality multi-hop. Caveats inherited from Perplexity proper: occasional hallucination, less control over which sources are cited.

Citations built in · One-call synthesis

Pricing $5–$30 / 1K

Sonar $5 · Sonar Pro $15 · Sonar Reasoning $30

Latency 2–10s

Valyu · Valyu

Specialized for financial and time-sensitive research. Direct access to SEC filings + FRED data rather than scraping aggregator coverage. 79% on FreshQA (Google 39%, Exa 24%), 73% on finance-specific queries (Google 55%, Exa 63%). If Loop 3 research touches public-company filings or macro data, this is the surgical tool, not a general-purpose research API.

SEC + FRED direct · 79% FreshQA

Pricing Usage-based

Custom enterprise plans

Sweet spot Finance/macro

Brave Search API · Brave

Independent web index (not Google/Bing resold). Highest Agent Score in recent agentic-search benchmarks (14.89), with the lowest latency in those benchmarks (~669ms avg). Privacy-respecting by default. Three tiers: Free (2K queries/mo, no commercial use), Base AI ($5/1K), Pro AI ($9/1K, unlimited). The independent-index hedge against Google/Bing dependency.

Independent index · Top Agent Score · Lowest latency

Pricing $5–$9 / 1K

Free 2K/mo · Pro AI $9/1K unlimited

Latency ~669ms

Linkup · Linkup (EU)

European entrant. GDPR-native, flat predictable pricing (€0.005 per query, no token math), source-licensed content (Le Monde and similar partnerships). Smaller index than the US incumbents but a clean answer for EU-regulated workloads or anyone tired of pricing surprises.

GDPR-native · Licensed content · Flat pricing

Pricing €5/mo in free credit

Then pay-per-query, flat rate

Sweet spot EU compliance

OpenAI Deep Research API · OpenAI

The same engine as ChatGPT Deep Research, available as o3-deep-research / o4-mini-deep-research through the OpenAI API. Strong synthesis quality. But: on BrowseComp it benchmarks lower than Parallel Ultra at 6× the cost — pay the OpenAI tax for the brand familiarity, get worse accuracy per dollar. Use it when your stack is already OpenAI-native and integration matters more than benchmark.

Brand reliability · Lower $/accuracy

Pricing ~$250 / 1K

o3 + o4-mini deep-research models

Latency 7–20 min

Anthropic Research · Anthropic

Claude with extended research tool-use, exposed via API. Strong on reasoning chains and citation discipline. On parallel.ai's own BrowseComp comparison, Claude Sonnet 4 with search at $1,168/1K hits 6% — i.e. the API-driven Anthropic stack is significantly more expensive than purpose-built deep research APIs for the same task. Best used as the synthesis engine on top of cheaper retrieval (Tavily, Exa, Firecrawl), not as the retrieval layer itself.

Strong synthesis · Expensive as retriever

Pricing ~$1,000 / 1K

Sonnet/Opus + web_search tool use

Latency ~5 min
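
The "synthesis engine on top of cheaper retrieval" arrangement can be sketched as a three-stage pipeline with injected callables. Everything here is an assumption for illustration: the type aliases, the `research` function, and the stub lambdas are hypothetical, and no vendor SDK is called. Real search, extraction, and model clients would be wrapped to fit the same signatures.

```python
from typing import Callable

Search = Callable[[str], list[str]]                 # query -> URLs
Extract = Callable[[str], str]                      # URL -> page text
Synthesize = Callable[[str, dict[str, str]], str]   # question, {url: text} -> answer


def research(question: str, search: Search, extract: Extract,
             synthesize: Synthesize, max_pages: int = 5) -> str:
    """Cheap retrieval first; the expensive model only sees the extracted pages."""
    urls = search(question)[:max_pages]
    pages = {url: extract(url) for url in urls}
    return synthesize(question, pages)


# Stub wiring for illustration only.
answer = research(
    "example question",
    search=lambda q: ["https://example.com/a", "https://example.com/b"],
    extract=lambda url: f"text of {url}",
    synthesize=lambda q, pages: f"{q}: answered from {len(pages)} sources",
)
```

Keeping the stages behind plain callables means the retrieval vendor can be swapped without touching the synthesis step, and `max_pages` caps how much text the most expensive stage has to read.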

Benchmark · BrowseComp (OpenAI)

Cost vs accuracy on BrowseComp, a set of 1,266 multi-hop research questions.

Platform	Cost / 1K (USD)	Accuracy
Parallel 1200 (Ultra4x)	1,200	48%
Parallel 600 (Ultra2x)	600	39%
Parallel Ultra	300	27%
Parallel Pro	100	17%
Exa Research	275	14%
Perplexity Deep Research	880	8%
Claude Sonnet 4 + search	1,168	6%
GPT-4.1 + browsing	53	1%

Source: parallel.ai/blog/deep-research benchmark (100-question subset, constant across competitors).
Translation: at $100/1K, Parallel Pro beats Exa Research at $275/1K. At $1,200/1K, Parallel beats Perplexity at $880/1K by 6×.
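
The comparison above can be checked mechanically. The sketch below copies the table's figures and computes which entries sit on the cost/accuracy frontier, meaning they are more accurate than every cheaper entry. The function name is hypothetical, and the data is only what the table reports.

```python
# (platform, cost per 1K queries in USD, accuracy %) copied from the table above.
RESULTS = [
    ("Parallel Ultra4x", 1200, 48),
    ("Parallel Ultra2x", 600, 39),
    ("Parallel Ultra", 300, 27),
    ("Parallel Pro", 100, 17),
    ("Exa Research", 275, 14),
    ("Perplexity Deep Research", 880, 8),
    ("Claude Sonnet 4 + search", 1168, 6),
    ("GPT-4.1 + browsing", 53, 1),
]


def pareto_frontier(rows):
    """Keep rows that are more accurate than every cheaper row."""
    frontier = []
    best_accuracy = -1
    # Walk from cheapest to priciest; ties on cost put the more accurate row first.
    for name, cost, acc in sorted(rows, key=lambda r: (r[1], -r[2])):
        if acc > best_accuracy:
            frontier.append(name)
            best_accuracy = acc
    return frontier


frontier = pareto_frontier(RESULTS)
ratio = 48 / 8  # Parallel Ultra4x accuracy vs Perplexity Deep Research
```

On these numbers the frontier is GPT-4.1 + browsing (only because it is the cheapest entry) followed by the four Parallel tiers; Exa Research, Perplexity Deep Research, and Claude Sonnet 4 + search are each dominated by a cheaper, more accurate Parallel tier.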

Tier 03

Academic deep search.

Search over peer-reviewed papers (mostly Semantic Scholar-backed), not the open web. Not what you want for crypto/markets/agent research — these are the right tools when you need ML literature reviews, RL papers, or scientific grounding for swarm.ing's R&D.

Elicit · Ought

The most production-ready academic research tool. Two key features: Research Reports (one-click long-form reports from Semantic Scholar) and Systematic Review (structured PRISMA-style protocol where you intervene at each step — inclusion/exclusion criteria, data extraction columns). Best output structure for literature reviews. Upload-papers feature lets you mix proprietary docs with the corpus.

Systematic review · Best UX · PDF upload

Pricing Free / Plus / Pro

Pro ~$15–$20/mo · API for teams

Sweet spot Lit reviews

Undermind · Undermind (YC)

Considered by academic librarians the most thorough AI literature discovery tool, particularly for niche topics. Runs 8–10 min comprehensive multi-step searches. "Chat with Expert" lets you query all retrieved papers with a frontier model and 9 default expert prompts. Best when you need recall over speed and the topic is obscure.

Highest recall · Niche topics · Slow but thorough

Pricing $25–$30/mo

Sweet spot Novelty checks

Consensus · Consensus

The "is there scientific consensus on X?" tool. Returns peer-reviewed answers with vote-counted study positions (yes/possibly/no). Scholar Agent in Pro and Deep Search modes runs multi-step searches with rigorous academic filters. LibKey integration for institutional access. Best for clinical, policy, and health-research-style queries where consensus matters more than novelty.

Consensus scoring · LibKey integration

Pricing Free / Premium $9/mo

Sweet spot Clinical Qs
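
Vote counting over study positions is simple to illustrate. The sketch below is not Consensus's scoring method, which is not described here; it only shows the basic idea of tallying yes/possibly/no stances into shares, using made-up stances for eight hypothetical studies.

```python
from collections import Counter

STANCES = ("yes", "possibly", "no")


def tally(positions: list[str]) -> dict[str, float]:
    """Share of studies per stance, as percentages rounded to one decimal."""
    counts = Counter(positions)
    total = sum(counts[s] for s in STANCES)
    if total == 0:
        return {s: 0.0 for s in STANCES}
    return {s: round(100 * counts[s] / total, 1) for s in STANCES}


# Made-up stances, one per study.
shares = tally(["yes", "yes", "yes", "yes", "yes", "possibly", "possibly", "no"])
```

A plain tally treats every study equally; weighting by study design or sample size would be a natural refinement, but that goes beyond what the description above states.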

SciSpace · SciSpace

More of an end-to-end academic workflow: search → chat-with-paper → formatting → journal matching. Less rigorous than Elicit on systematic review structure, but stronger on the writing and submission tail. The "explain this passage" feature is the best of the cohort for actually reading unfamiliar papers.

Workflow-first · Paper chat

Pricing ~$15/mo Pro

Sweet spot Reading + writing

Ai2 Scholar QA · Allen Institute for AI

Free, open, Semantic Scholar-backed. The "what does the literature say about X?" interface from the team that runs Semantic Scholar itself. Outputs cited responses with snippet-level provenance. Underrated; produces some of the cleanest direct-quote provenance in the cohort.

Free + open · Clean provenance

Pricing Free

Sweet spot Free baseline

STORM / Co-STORM · Stanford OVAL

Open-source research project, not a polished product. Spins up multiple specialized LLM agents that hold a roundtable conversation to generate Wikipedia-style long-form articles. Co-STORM mode lets you join the agent conversation as a participant. The architectural inspiration for many later commercial products; worth studying if you're building your own multi-agent research orchestration in Hermes.

Open source · Architectural ref

Pricing OSS / free demo

Sweet spot Reference design

ResearchRabbit + Connected Papers · Independent

The visual graph layer: these tools neither generate reports nor answer questions. They show citation networks and "this paper led to these papers" relationships as interactive visualizations. Pair with Elicit/Undermind to make sure the AI didn't miss the canonical paper everyone in the field cites. Free.

Free · Graph visualizations

Pricing Free

Sweet spot Citation network

Decision Matrix

Pick by job-to-be-done.

The platforms aren't substitutes; they're job-specific. Here's how I'd map them to swarm.ing workflows.

One-off market scan

Use ChatGPT Deep Research.

For "what's the current state of MPC wallet UX in 2026?" — type, walk away, come back to a clean cited report. Best balance of reliability and effort. Claude Research if you want better writing. Perplexity if you need it in 3 minutes and the topic isn't load-bearing.

Cost: ~$0 (sub usage) · Time: 10 min · Output: report

Loop 3 strategy research

Use Parallel Ultra in Archon pipelines.

Programmatic deep research called from YAML pipelines, structured JSON outputs with confidence scores, citations, excerpts. State of the art on BrowseComp. Far better $/accuracy than OpenAI or Anthropic deep-research APIs.

Cost: $0.30–$1.20/query · Time: 5–25 min · Async

Loop 1 signal enrichment

Use Exa Fast or Brave Search API.

Sub-second latency, RAG-ready, $5–$9 per 1K requests. Exa for semantic relevance, Brave for independent index and lowest latency. Skip deep-research APIs at this tier — they're too slow and too expensive for live signal work.

Cost: ~$0.005–$0.009/query · Time: <1s

Sensitive proprietary docs

Use NotebookLM or Claude Projects.

When the sources are internal strategy docs, partner contracts, or trading logs — closed-corpus tools eliminate web hallucination entirely. NotebookLM for multimedia outputs (audio briefings on the road), Claude Projects when synthesis quality matters.

Cost: ~$20/mo · Time: instant · Closed corpus

Enumerative comparison (100 things)

Use Manus Wide Research or Parallel FindAll.

"Compare every DEX aggregator with sub-1bps slippage on cross-chain swaps" — enumerative tasks need parallelism, not depth. Manus spins 100 subagents; Parallel FindAll returns structured entity datasets with per-row enrichment. Either beats sequential deep-research for this shape of problem.

Cost: $199/mo (Manus) or per-match · Time: minutes

RL / agent research papers

Use Elicit Systematic Review.

When you need to know "what's the SOTA on hierarchical RL with offline-to-online finetuning" — academic deep-research tools are the right layer, not Perplexity. Elicit's structured review process pairs well with your Obsidian research vault workflow.

Cost: ~$20/mo · Time: 10–20 min · Semantic Scholar corpus
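
The matrix above amounts to a lookup from job type to tool. A minimal sketch of that routing, with the mapping transcribed from the entries above; the job keys and the `pick_tool` name are hypothetical labels, not part of any pipeline described here.

```python
# Job-to-tool mapping transcribed from the decision matrix above.
ROUTES = {
    "market_scan": "ChatGPT Deep Research",
    "strategy_research": "Parallel Ultra",
    "signal_enrichment": "Exa Fast or Brave Search API",
    "proprietary_docs": "NotebookLM or Claude Projects",
    "enumerative_comparison": "Manus Wide Research or Parallel FindAll",
    "academic_papers": "Elicit Systematic Review",
}


def pick_tool(job: str) -> str:
    """Fail loudly on unknown jobs rather than defaulting to a general-purpose tool."""
    try:
        return ROUTES[job]
    except KeyError:
        raise ValueError(f"no route for job {job!r}") from None
```

Raising on an unknown job is a deliberate choice: a silent fallback to a general-purpose tool is how academic questions end up on an open-web research product.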

Synthesis

What this actually means.

Take 01

The infrastructure layer (Parallel, Exa, Brave) is winning on cost-efficiency.

Purpose-built deep-research APIs decisively beat general-LLM-with-web-search on BrowseComp accuracy per dollar. Claude + web_search at $1,168/1K hits 6% accuracy where Parallel Pro at $100/1K hits 17%. The moat is the orchestration harness, not the LLM (same lesson as Factory Droid). For Loop 3 work, default to Parallel; only fall back to LLM-with-search when integration friction outweighs cost.

Take 02

Chat-app deep research is converging on UX, diverging on trust.

ChatGPT, Claude, Gemini, and Perplexity all produce roughly the same shape of output (a long-form cited report), in anywhere from ~3 to 20 minutes. The differentiation is now trust: Perplexity got caught silently downgrading paid queries to cheaper models. Gemini reads your Gmail/Drive. Claude has Skills. ChatGPT has the largest user-habit base. Pick on trust posture, not on raw output quality.

Take 03

Closed-corpus tools (NotebookLM, Claude Projects) are underused.

When the source is yours — internal docs, partner agreements, your own research notes — closed-corpus tools eliminate web hallucination entirely. For anything touching Openclaw's proprietary strategy work, this is the only safe option. NotebookLM's audio overviews convert dense research into commute-time briefings; genuinely useful for Loop 3 governance review on the go.

Take 04

The academic stack is its own world — don't bridge.

Don't ask Perplexity to find SOTA RL papers; ask Elicit or Undermind. Don't ask Elicit about crypto market structure; ask Parallel. The corpora are different (Semantic Scholar vs open web), the citation quality requirements are different, and using general deep-research for academic synthesis routinely produces hallucinated citations. Keep these workflows separate in your Obsidian vault.

Take 05

Manus and Genspark are bets on a different architecture.

While everyone else iterates on the "one deep agent, long task" pattern, Manus Wide Research parallelizes (100 subagents on one task) and Genspark uses Mixture-of-Agents (8 LLMs + 80 tools). Both architectures are closer to what Hermes is converging toward than ChatGPT Deep Research is. Worth studying their orchestration patterns, not just their output.

Take 06

For swarm.ing's stack: Parallel + Exa + Elicit + NotebookLM.

Parallel.ai Task API for Loop 3 programmatic research (you already have this). Exa Fast for Loop 1 sub-second signal enrichment when nv protocol needs web context. Elicit for RL/agent literature reviews into your Obsidian research vault. NotebookLM for synthesizing internal docs into audio briefings. Skip the chat-app deep-research products for production — useful for personal ad-hoc, not load-bearing for the platform.

The research layer has split in three.