SWARM.ING/research dispatch
Field Report · Deep Research Platforms

The research layer has split in three.

In May 2026 there is no single "deep research" market. There are three: consumer chat-app research for one-off questions, agentic API infrastructure for programmatic enrichment at scale, and academic deep search bound to peer-reviewed indexes. Picking the wrong tier wastes money and burns hours. This is the map.

Platforms surveyed: 22
Top accuracy (BrowseComp): 48%
Cost spread (per 1K): $5 → $2,400
Latency spread: <5s → 25 min
Tier 01

Consumer chat-app deep research.

Triggered from a chat UI, returns one long-form cited report per query. Best for ad-hoc research where you'll read the answer yourself. Not built for programmatic enrichment of thousands of rows.

ChatGPT Deep Research · OpenAI · GPT-5.4
The reference implementation. Asks clarifying questions before launching, then runs 7–20 min, reads 25–100+ pages, and produces a structured report with inline citations. Strongest at multi-domain synthesis and complex query decomposition. More reliable than Perplexity, at the cost of longer run times.
Best balance · Asks clarifying Qs · Slow (7–20 min)
Access Plus / Pro
Free: 2/mo · Plus: 10/mo · Pro: $200/mo unlimited
Sweet spot Reliable reports
Claude Research · Anthropic · Opus 4.7
Multi-agent orchestration. Researches ~261 sources in ~6 min in independent tests. Strongest writing quality of the chat-app cohort — closest to "perfection" in nuanced evals per multiple independent reviewers. Pairs naturally with Cowork/Skills for repeatable research workflows.
Best prose · Multi-agent · Skills-compatible
Access Pro / Max
Included with Max plan ($100–$200/mo)
Sweet spot Synthesis + writing
Gemini Deep Research · Google · Gemini 3 Pro
Unique ecosystem play: now reads Gmail, Drive, Docs, Slides, Sheets, Chat directly inside a research run. Presents an editable research plan before executing (unlike OpenAI's autonomous launch). Slower than competitors in tests (62 sources in 15+ min), but the private-data integration is unmatched.
Gmail/Drive access · Editable plan · 1M context
Access Google AI Pro
$19.99/mo · bundles NotebookLM, Veo, Antigravity
Sweet spot Workspace research
Perplexity Deep Research · Perplexity · Sonar + routed
Fastest of the cohort (~3 min). New Model Council compares 19 models in parallel; Perplexity Computer (Feb 2026) executes complex agentic workflows. SimpleQA ~94%. But: hallucination rates are significantly higher than OpenAI/Claude in deep-research mode, and in 2026 Perplexity silently downgraded paid queries to cheaper models without disclosure. Treat as a discovery layer, not a source of truth.
Fastest (~3 min) · Model Council · Trust issues
Access Pro / Max
Pro $20/mo · Max $200/mo (Comet browser)
Sweet spot Fast discovery
NotebookLM · Google · Gemini-backed
Different category — operates only on sources you upload (PDFs, Docs, YouTube transcripts, URLs). Zero web hallucination because the corpus is closed. Generates audio overviews, video overviews, mind maps, flashcards, infographics, slide decks. The canonical workflow: Perplexity for source discovery → NotebookLM for synthesis on the curated set.
Closed corpus · Audio/video output · Free tier
Access Free / Plus
Plus bundled in Google AI Pro · Ultra $250/mo
Sweet spot Your own docs
Grok DeepSearch · xAI · Grok 4
Speed-first. ~10× faster than ChatGPT Deep Research and reads ~3× more pages — but report depth and visual richness consistently rated weakest of the major players in head-to-head testing. Real-time X integration is the only genuine differentiator; useful for breaking-news contexts where Twitter signal matters.
Fastest scan · X/Twitter real-time
Access X Premium+ / SuperGrok
$22–40/mo
Sweet spot News + social
Manus Wide Research · Manus (Singapore)
Orthogonal model: instead of one deep agent, it spins up 100+ parallel subagents on one task. Each subagent is a full Manus instance, not a specialized role, and can independently scour the web. Demoed comparing 100 sneakers or generating 50 poster styles into a downloadable spreadsheet/ZIP. Genuinely useful when the task is enumerative, not deep.
100+ parallel agents · VM-per-session
Access Pro ($199)
Basic $19 · Plus $39 · Pro $199/mo
Sweet spot Enumerative tasks
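The wide pattern is easy to picture in code. The sketch below is not Manus's implementation; `research_item` is a hypothetical stand-in for one subagent, and the worker count is an arbitrary assumption. It only shows the shape: one enumerative task fanned out over many independent workers, with results collected into rows.

```python
from concurrent.futures import ThreadPoolExecutor


def research_item(item: str) -> dict:
    """Hypothetical stand-in for one subagent; a real one would browse and extract."""
    return {"item": item, "summary": f"notes on {item}"}


def wide_research(items, max_workers=16):
    """Fan one enumerative task out over independent workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(research_item, items))


rows = wide_research([f"sneaker-{i}" for i in range(100)])
print(len(rows), rows[0]["item"])  # 100 sneaker-0
```

Each worker sees only its own item, which is why this shape suits enumeration and not questions that need one agent to hold the whole picture.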
Genspark · Genspark Inc.
Mixture-of-Agents: 8 LLMs + 80 tools orchestrated dynamically per query. Outputs "Sparkpages" — generated landing-page-style reports rather than markdown documents. Has a native AI voice-calling module (genuinely unusual). Reported to beat ChatGPT-4 and Manus on GAIA precision (92.4%). Good free tier.
MoA architecture · Voice calling · Generous free
Access Free / Plus / Pro
Plus ~$25/mo
Sweet spot Multimedia output
Qwen Deep Research · Alibaba
Open-weight family entrant. The free preview tier explicitly collects prompts for training — disqualifies it for any sensitive strategy work until GA pricing lands with proper data handling. Capabilities competitive on Chinese-language sources and emerging-market coverage. Worth watching, not yet worth trusting with proprietary signal.
Collects prompts · Strong CN coverage
Access Free preview
GA pricing TBA
Sweet spot CN-language R&D
Tier 02

Agentic API research infrastructure.

Programmatic deep research — call from your agent, get back structured JSON with citations and confidence scores. The right layer for Loop 3 strategy research, database enrichment, monitoring, and any workflow where the consumer of the output is software, not a human reading a report.
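When the consumer is software, the useful habit is to gate on confidence and citations before anything is written downstream. A minimal sketch, assuming a result shape with per-field confidence and citations; the class names, field names, and the 0.8 threshold are all hypothetical and do not reflect any vendor's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    url: str
    excerpt: str


@dataclass
class ResearchField:
    """One field of a structured research result (hypothetical shape)."""
    name: str
    value: str
    confidence: float  # assumed to be normalized to 0.0-1.0
    citations: list[Citation] = field(default_factory=list)


def accept(fields, min_confidence=0.8):
    """Keep only fields that are confident enough and backed by at least one citation."""
    return [f for f in fields if f.confidence >= min_confidence and f.citations]


fields = [
    ResearchField("hq_city", "Lisbon", 0.93,
                  [Citation("https://example.com/about", "Based in Lisbon")]),
    ResearchField("headcount", "120", 0.41,
                  [Citation("https://example.com/team", "about 120 people")]),
    ResearchField("founded", "2019", 0.88),  # confident but uncited, so rejected
]
accepted = [f.name for f in accept(fields)]
print(accepted)  # ['hq_city']
```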

Parallel.ai · Parallel Web Systems
State of the art on BrowseComp. 48% accuracy on the OpenAI BrowseComp benchmark at $1,200/1K — beats ChatGPT Pro, Gemini Deep Research, Perplexity, and Exa Research on the Pareto frontier of cost/accuracy at every price point. Eight-tier processor architecture (Lite → Ultra4x) lets you match compute to query complexity. Task API, Search API, Extract API, Chat API, Monitor API, FindAll API — full toolkit. Best-in-class for programmatic deep research.
SOTA BrowseComp · Structured JSON · Confidence scores · Async + sync
Pricing $5–$1,200 / 1K
Lite $5 · Base $10 · Core $25 · Pro $100 · Ultra $300 · Ultra2x $600 · Ultra4x $1,200
Latency 5s → 25 min
Async-first; fast variants pull from web index
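Matching compute to query complexity is mostly budget arithmetic. The sketch below uses only the per-1K prices listed above; the function names are illustrative, and picking the priciest affordable tier is an assumption that treats price as a proxy for depth.

```python
# Per-1K prices in USD, copied from the pricing line above.
PRICE_PER_1K = {
    "Lite": 5, "Base": 10, "Core": 25, "Pro": 100,
    "Ultra": 300, "Ultra2x": 600, "Ultra4x": 1200,
}


def cost_per_query(tier: str) -> float:
    """Convert a per-1K price into a per-query price."""
    return PRICE_PER_1K[tier] / 1000


def best_tier_within(budget_usd: float, n_queries: int):
    """Return the most expensive tier whose total cost fits the budget, or None."""
    affordable = [
        tier for tier, price in PRICE_PER_1K.items()
        if price * n_queries / 1000 <= budget_usd
    ]
    return max(affordable, key=PRICE_PER_1K.get) if affordable else None


print(cost_per_query("Ultra"))    # 0.3
print(best_tier_within(50, 200))  # Pro (200 queries cost $20; Ultra would cost $60)
```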
Exa · Exa Labs
Neural/semantic-first search built specifically for AI agents. Exa 2.0 split into three endpoints: Exa Fast (sub-350ms p50 — fastest search API available), Exa Auto (general), Exa Deep (agentic research, field-level citations). 10 contents free per search. Strong on company/people/code search; embeddings find conceptually related content keyword search misses.
Semantic-first · Sub-350ms Fast · Field citations
Pricing $5–$1,500 / 1K
$10 free credits; Deep tier competes with Parallel Ultra
Latency 350ms → 3 min
Tavily · Tavily (acq. Nebius Feb 2026)
Pre-processed, AI-ready search results with content extraction built in. 99.99% uptime, 180ms p50, native LangChain integrations. Has a /research endpoint that breaks a question into sub-queries and returns cited synthesis. Best balance of speed-cost-predictability for RAG and chatbot use cases. Roadmap uncertainty after the Nebius acquisition: worth monitoring before making a deep platform bet.
RAG-optimized · Flat pricing · Acquired by Nebius
Pricing $8 / 1K credits
$30/mo entry · 1K free/mo · predictable flat rate
Latency ~180ms p50
Firecrawl · Firecrawl Inc. (OSS)
The scraping/extraction layer most often paired with a search API. Open source + self-hostable for data residency. Handles JavaScript rendering, SPAs, authenticated content, complex PDFs. Outputs clean markdown or structured JSON ready for RAG ingestion. Pairs well with Exa or Tavily upstream. The right tool when you have URLs and need the contents.
Open source · Self-hostable · JS rendering
Pricing $14+/mo
1 credit/page · Free tier 500 pages
Latency ~1.3s avg
Perplexity Sonar API · Perplexity
The same Sonar models that power Perplexity Pro, exposed as an API with citations baked in. One call = live search + in-house LLM synthesis + sources. Convenient when you want a single endpoint instead of orchestrating search + LLM separately. Sonar Pro for higher-quality multi-hop. Caveats inherited from Perplexity proper: occasional hallucination, less control over which sources are cited.
Citations built in · One-call synthesis
Pricing $5–$30 / 1K
Sonar $5 · Sonar Pro $15 · Sonar Reasoning $30
Latency 2–10s
Valyu · Valyu
Specialized for financial and time-sensitive research. Direct access to SEC filings + FRED data rather than scraping aggregator coverage. 79% on FreshQA (Google 39%, Exa 24%), 73% on finance-specific (Google 55%, Exa 63%). If Loop 3 research touches public-company filings or macro data, this is the surgical tool — not a general-purpose research API.
SEC + FRED direct · 79% FreshQA
Pricing Usage-based
Custom enterprise plans
Sweet spot Finance/macro
Brave Search API · Brave
Independent web index (not Google/Bing resold). Highest Agent Score in recent agentic-search benchmarks (14.89) and the lowest latency reported there (~669ms avg). Privacy-respecting by default. Three tiers: Free (2K queries/mo, no commercial use), Base AI ($5/1K), Pro AI ($9/1K, unlimited). The independent-index hedge against Google/Bing dependency.
Independent index · Top Agent Score · Lowest latency
Pricing $5–$9 / 1K
Free 2K/mo · Pro AI $9/1K unlimited
Latency ~669ms
Linkup · Linkup (EU)
European entrant. GDPR-native, flat predictable pricing (€0.005 per query, no token math), source-licensed content (Le Monde and similar partnerships). Smaller index than the US incumbents but a clean answer for EU-regulated workloads or anyone tired of pricing surprises.
GDPR-native · Licensed content · Flat pricing
Pricing €5/mo free
Then pay-per-query, flat rate
Sweet spot EU compliance
OpenAI Deep Research API · OpenAI
The same engine as ChatGPT Deep Research, available as o3-deep-research / o4-mini-deep-research through the OpenAI API. Strong synthesis quality. But: on BrowseComp it benchmarks lower than Parallel Ultra at 6× the cost — pay the OpenAI tax for the brand familiarity, get worse accuracy per dollar. Use it when your stack is already OpenAI-native and integration matters more than benchmark.
Brand reliability · Lower $/accuracy
Pricing ~$250 / 1K
o3 + o4-mini deep-research models
Latency 7–20 min
Anthropic Research · Anthropic
Claude with extended research tool-use, exposed via API. Strong on reasoning chains and citation discipline. On parallel.ai's own BrowseComp comparison, Claude Sonnet 4 with search at $1,168/1K hits 6% — i.e. the API-driven Anthropic stack is significantly more expensive than purpose-built deep research APIs for the same task. Best used as the synthesis engine on top of cheaper retrieval (Tavily, Exa, Firecrawl), not as the retrieval layer itself.
Strong synthesis · Expensive as retriever
Pricing ~$1,000 / 1K
Sonnet/Opus + web_search tool use
Latency ~5 min
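The "synthesis on top of cheaper retrieval" split can be kept honest by giving the two stages separate interfaces. A minimal sketch with stub functions: `fake_retrieve` and `fake_synthesize` are hypothetical placeholders for a search API and an LLM call, not real client code for any product named here.

```python
from typing import Callable

# Stage interfaces: retrieval returns documents, synthesis turns them into an answer.
Retriever = Callable[[str], list]
Synthesizer = Callable[[str, list], str]


def research(question: str, retrieve: Retriever, synthesize: Synthesizer, max_docs: int = 5) -> str:
    """Run cheap retrieval first, then pass only the top documents to the synthesis model."""
    docs = retrieve(question)[:max_docs]
    return synthesize(question, docs)


def fake_retrieve(question: str) -> list:
    """Stub retriever standing in for a search or scraping API."""
    return [{"url": f"https://example.com/{i}", "text": f"passage {i} about {question}"}
            for i in range(8)]


def fake_synthesize(question: str, docs: list) -> str:
    """Stub synthesizer standing in for an LLM call."""
    sources = ", ".join(d["url"] for d in docs)
    return f"Answer to {question!r} from {len(docs)} sources: {sources}"


answer = research("MPC wallet UX", fake_retrieve, fake_synthesize)
print(answer)
```

Because the stages only meet at the function signature, the retriever can be swapped without touching the synthesis prompt, and the expensive model never pays for browsing.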
Benchmark · BrowseComp (OpenAI)
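Cost vs accuracy on BrowseComp's multi-hop research questions (1,266 in the full benchmark; the figures below come from a 100-question subset). The original chart plots cost on a log scale.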
Platform | Cost / 1K (USD) | Accuracy
Parallel 1200 (Ultra4x) | 1,200 | 48%
Parallel 600 (Ultra2x) | 600 | 39%
Parallel Ultra | 300 | 27%
Parallel Pro | 100 | 17%
Exa Research | 275 | 14%
Perplexity Deep Research | 880 | 8%
Claude Sonnet 4 + search | 1,168 | 6%
GPT-4.1 + browsing | 53 | 1%
Source: parallel.ai/blog/deep-research benchmark (100-question subset, constant across competitors).
Translation: at $100/1K, Parallel Pro beats Exa Research at $275/1K. At $1,200/1K, Parallel beats Perplexity at $880/1K by 6×.
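The same comparison can be checked mechanically. This sketch takes the cost and accuracy figures from the benchmark rows above and computes which platforms sit on the cost/accuracy Pareto frontier, plus the best accuracy per dollar. The dominance rule (cheaper or equal, at least as accurate, strictly better on one axis) is the standard definition, not something the source benchmark specifies.

```python
# (platform, cost per 1K queries in USD, accuracy in %) from the benchmark rows above.
RESULTS = [
    ("Parallel Ultra4x", 1200, 48),
    ("Parallel Ultra2x", 600, 39),
    ("Parallel Ultra", 300, 27),
    ("Parallel Pro", 100, 17),
    ("Exa Research", 275, 14),
    ("Perplexity Deep Research", 880, 8),
    ("Claude Sonnet 4 + search", 1168, 6),
    ("GPT-4.1 + browsing", 53, 1),
]


def pareto_frontier(rows):
    """Return names of rows that no other row beats on both cost and accuracy."""
    frontier = []
    for name, cost, acc in rows:
        dominated = any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for _, c, a in rows
        )
        if not dominated:
            frontier.append(name)
    return frontier


frontier = pareto_frontier(RESULTS)
best_per_dollar = max(RESULTS, key=lambda row: row[2] / row[1])[0]
print(frontier)
print(best_per_dollar)  # Parallel Pro
```

GPT-4.1 + browsing appears on the frontier only because it is the cheapest row; nothing can dominate the lowest-cost point. Every other non-Parallel row is dominated by Parallel Pro.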
Tier 03

Academic deep search.

Search over peer-reviewed papers (mostly Semantic Scholar-backed), not the open web. Not what you want for crypto/markets/agent research — these are the right tools when you need ML literature reviews, RL papers, or scientific grounding for swarm.ing's R&D.

Elicit · Ought
The most production-ready academic research tool. Two key features: Research Reports (one-click long-form reports from Semantic Scholar) and Systematic Review (structured PRISMA-style protocol where you intervene at each step — inclusion/exclusion criteria, data extraction columns). Best output structure for literature reviews. Upload-papers feature lets you mix proprietary docs with the corpus.
Systematic review · Best UX · PDF upload
Pricing Free / Plus / Pro
Pro ~$15–$20/mo · API for teams
Sweet spot Lit reviews
Undermind · Undermind (YC)
Considered by academic librarians the most thorough AI literature discovery tool, particularly for niche topics. Runs 8–10 min comprehensive multi-step searches. "Chat with Expert" lets you query all retrieved papers with a frontier model and 9 default expert prompts. Best when you need recall over speed and the topic is obscure.
Highest recall · Niche topics · Slow but thorough
Pricing $25–$30/mo
Sweet spot Novelty checks
Consensus · Consensus
The "is there scientific consensus on X?" tool. Returns peer-reviewed answers with vote-counted study positions (yes/possibly/no). Scholar Agent in Pro and Deep Search modes runs multi-step searches with rigorous academic filters. LibKey integration for institutional access. Best for clinical, policy, and health-research-style queries where consensus matters more than novelty.
Consensus scoring · LibKey integration
Pricing Free / Premium $9/mo
Sweet spot Clinical Qs
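Vote-counted study positions reduce to a simple tally. The sketch below illustrates the idea only; it is not Consensus's scoring method, and the sample positions are made up for illustration.

```python
from collections import Counter

LABELS = ("yes", "possibly", "no")


def tally(positions):
    """Return the percentage of studies taking each position, rounded to whole numbers."""
    counts = Counter(positions)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        raise ValueError("no study positions to tally")
    return {label: round(100 * counts[label] / total) for label in LABELS}


# Hypothetical sample: ten studies on one question.
sample = ["yes"] * 6 + ["possibly"] * 3 + ["no"] * 1
shares = tally(sample)
print(shares)  # {'yes': 60, 'possibly': 30, 'no': 10}
```

A raw tally treats every study equally; weighting by study quality or sample size would be a separate design decision.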
SciSpace · SciSpace
More of an end-to-end academic workflow: search → chat-with-paper → formatting → journal matching. Less rigorous than Elicit on systematic review structure, but stronger on the writing and submission tail. The "explain this passage" feature is the best of the cohort for actually reading unfamiliar papers.
Workflow-first · Paper chat
Pricing ~$15/mo Pro
Sweet spot Reading + writing
Ai2 Scholar QA · Allen Institute for AI
Free, open, Semantic Scholar-backed. The "what does the literature say about X?" interface from the team that runs Semantic Scholar itself. Outputs cited responses with snippet-level provenance. Underrated; produces some of the cleanest direct-quote provenance in the cohort.
Free + open · Clean provenance
Pricing Free
Sweet spot Free baseline
STORM / Co-STORM · Stanford OVAL
Open-source research project, not a polished product. Spins up multiple specialized LLM agents that hold a roundtable conversation to generate Wikipedia-style long-form articles. Co-STORM mode lets you join the agent conversation as a participant. The architectural inspiration for many later commercial products; worth studying if you're building your own multi-agent research orchestration in Hermes.
Open source · Architectural ref
Pricing OSS / free demo
Sweet spot Reference design
ResearchRabbit + Connected Papers · Independent
The visual graph layer: these tools neither generate reports nor answer questions. They show citation networks and "this paper led to these papers" relationships as interactive visualizations. Pair with Elicit/Undermind to make sure the AI didn't miss the canonical paper everyone in the field cites. Free.
Free · Graph visualizations
Pricing Free
Sweet spot Citation network
Decision Matrix

Pick by job-to-be-done.

The platforms aren't substitutes; they're job-specific. Here's how I'd map them to swarm.ing workflows.

01
One-off market scan

Use ChatGPT Deep Research.

For "what's the current state of MPC wallet UX in 2026?" — type, walk away, come back to a clean cited report. Best balance of reliability and effort. Claude Research if you want better writing. Perplexity if you need it in 3 minutes and the topic isn't load-bearing.

Cost: ~$0 (covered by subscription) · Time: 10 min · Output: report
02
Loop 3 strategy research

Use Parallel Ultra in Archon pipelines.

Programmatic deep research called from YAML pipelines, structured JSON outputs with confidence scores, citations, excerpts. State of the art on BrowseComp. Far better $/accuracy than OpenAI or Anthropic deep-research APIs.

Cost: $0.30–$1.20/query · Time: 5–25 min · Async
03
Loop 1 signal enrichment

Use Exa Fast or Brave Search API.

Sub-second latency, RAG-ready, $5–$9 per 1K requests. Exa for semantic relevance, Brave for independent index and lowest latency. Skip deep-research APIs at this tier — they're too slow and too expensive for live signal work.

Cost: ~$0.005–$0.009/query · Time: <1s
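The reason to skip deep-research APIs here shows up as soon as volume is multiplied in. A quick sketch using the per-1K prices quoted in this report; the 10,000 queries per day figure is an assumed workload, not a measured one.

```python
def monthly_cost(queries_per_day: int, price_per_1k: float, days: int = 30) -> float:
    """Total monthly spend in USD for a steady daily query volume."""
    return queries_per_day * days * price_per_1k / 1000


volume = 10_000  # assumed enrichment volume per day
print(monthly_cost(volume, 5))    # 1500.0  ($5/1K tier)
print(monthly_cost(volume, 9))    # 2700.0  ($9/1K tier)
print(monthly_cost(volume, 300))  # 90000.0 (a $300/1K deep-research tier)
```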
04
Sensitive proprietary docs

Use NotebookLM or Claude Projects.

When the sources are internal strategy docs, partner contracts, or trading logs — closed-corpus tools eliminate web hallucination entirely. NotebookLM for multimedia outputs (audio briefings on the road), Claude Projects when synthesis quality matters.

Cost: ~$20/mo · Time: instant · Closed corpus
05
Enumerative comparison (100 things)

Use Manus Wide Research or Parallel FindAll.

"Compare every DEX aggregator with sub-1bps slippage on cross-chain swaps" — enumerative tasks need parallelism, not depth. Manus spins 100 subagents; Parallel FindAll returns structured entity datasets with per-row enrichment. Either beats sequential deep-research for this shape of problem.

Cost: $199/mo (Manus) or per-match · Time: minutes
06
RL / agent research papers

Use Elicit Systematic Review.

When you need to know "what's the SOTA on hierarchical RL with offline-to-online finetuning" — academic deep-research tools are the right layer, not Perplexity. Elicit's structured review process pairs well with your Obsidian research vault workflow.

Cost: ~$20/mo · Time: 10–20 min · Semantic Scholar corpus
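The matrix above can live in a pipeline as a plain lookup. A minimal sketch: the job keys are hypothetical names invented here, and the values are the recommendations from the six entries above.

```python
# Job-to-tool routing table; keys are illustrative, values come from the matrix above.
ROUTES = {
    "one_off_market_scan": "ChatGPT Deep Research",
    "loop3_strategy_research": "Parallel Ultra",
    "loop1_signal_enrichment": "Exa Fast or Brave Search API",
    "sensitive_proprietary_docs": "NotebookLM or Claude Projects",
    "enumerative_comparison": "Manus Wide Research or Parallel FindAll",
    "rl_agent_papers": "Elicit Systematic Review",
}


def pick_tool(job: str) -> str:
    """Return the recommended tool for a job, failing loudly on unknown jobs."""
    try:
        return ROUTES[job]
    except KeyError:
        raise ValueError(f"no route for job {job!r}; add it to ROUTES explicitly") from None


print(pick_tool("loop1_signal_enrichment"))  # Exa Fast or Brave Search API
```

Failing on unknown jobs, with no default fallback, keeps a new workflow from silently landing on the wrong tier.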
Synthesis

What this actually means.

Take 01

The infrastructure layer (Parallel, Exa, Brave) is winning on cost-efficiency.

Purpose-built deep-research APIs decisively beat general-LLM-with-web-search at every price point on BrowseComp. Claude + web_search at $1,168/1K hits 6% accuracy where Parallel Pro at $100/1K hits 17%. The moat is the orchestration harness, not the LLM — same lesson as Factory Droid. For Loop 3 work, default to Parallel; only fall back to LLM-with-search when integration friction outweighs cost.

Take 02

Chat-app deep research is converging on UX, diverging on trust.

ChatGPT, Claude, Gemini, and Perplexity all produce roughly the same shape of output (long-form cited report), and all finish within minutes (roughly 3 to 20 in the tests cited above). The differentiation is now trust: Perplexity got caught silently downgrading paid queries to cheaper models. Gemini reads your Gmail/Drive. Claude has Skills. ChatGPT has the largest user-habit base. Pick on trust posture, not on raw output quality.

Take 03

Closed-corpus tools (NotebookLM, Claude Projects) are underused.

When the source is yours — internal docs, partner agreements, your own research notes — closed-corpus tools eliminate web hallucination entirely. For anything touching Openclaw's proprietary strategy work, this is the only safe option. NotebookLM's audio overviews convert dense research into commute-time briefings; genuinely useful for Loop 3 governance review on the go.

Take 04

The academic stack is its own world — don't bridge.

Don't ask Perplexity to find SOTA RL papers; ask Elicit or Undermind. Don't ask Elicit about crypto market structure; ask Parallel. The corpora are different (Semantic Scholar vs open web), the citation quality requirements are different, and using general deep-research tools for academic synthesis routinely produces hallucinated citations. Keep these workflows separate in your Obsidian vault.

Take 05

Manus and Genspark are bets on a different architecture.

While everyone else iterates on the "one deep agent, long task" pattern, Manus Wide Research parallelizes (100 subagents on one task) and Genspark uses Mixture-of-Agents (8 LLMs + 80 tools). Both architectures are closer to what Hermes is converging toward than ChatGPT Deep Research is. Worth studying their orchestration patterns, not just their output.

Take 06

For swarm.ing's stack: Parallel + Exa + Elicit + NotebookLM.

Parallel.ai Task API for Loop 3 programmatic research (you already have this). Exa Fast for Loop 1 sub-second signal enrichment when nv protocol needs web context. Elicit for RL/agent literature reviews into your Obsidian research vault. NotebookLM for synthesizing internal docs into audio briefings. Skip the chat-app deep-research products for production — useful for personal ad-hoc, not load-bearing for the platform.