Article · HuggingFace Blog

Measuring Open-Source Llama Nemotron Models on DeepResearch Bench

AI-Q combines Llama 3.3-70B Instruct and Llama-3.3-Nemotron-Super-49B-v1.5 to enable long-context retrieval, agentic reasoning, and tool use in open-source stacks. NVIDIA details the models' lineage, post-training, and transparent evaluation metrics (hallucination detection, multi-source synthesis, citation trust, RAGAS), and notes that the 49B Nemotron model runs on a single H100. DeepResearch Bench ranks AI-Q first among fully open stacks, with a score of 40.52 in the "LLM with Search" category (Aug 2025).

Published: AUG 04, 2025

Read the source: huggingface.co/blog/nvidia/ai-q-top-ranking-open-portable-deep-research-agent

Source: HuggingFace Blog
Ingested: AUG 04, 2025 · 19:10
Editorial score: 4.0 / 5