Measuring Open-Source Llama Nemotron Models on DeepResearch Bench

AI-Q combines Llama 3.3 70B Instruct and Llama-3.3-Nemotron-Super-49B-v1.5 to bring long-context retrieval, agentic reasoning, and tool use to a fully open-source stack. NVIDIA details the models' lineage and post-training and reports transparent evaluation metrics (hallucination detection, multi-source synthesis, citation trust, and RAGAS). It also describes the 49B Nemotron model running on a single H100 GPU. On DeepResearch Bench, AI-Q ranks first among fully open stacks in the "LLM with Search" category, with a score of 40.52 (August 2025).
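A quick back-of-the-envelope check helps put the single-H100 claim in context. The sketch below counts model weights only, ignoring KV cache, activations, and runtime overhead. It assumes the nominal 49B parameter count, the 80 GB H100 variant, and a few common weight precisions; the summary does not say which precision the article used, and the helper names are hypothetical.

```python
# Back-of-the-envelope check: do the weights of a ~49B-parameter model fit
# on one 80 GB H100? Weights only; KV cache, activations and runtime
# overhead are ignored. All precisions below are assumptions for illustration.

PARAMS = 49e9          # nominal parameter count taken from the model name
H100_MEMORY_GB = 80.0  # assumed H100 80 GB variant

# Bytes needed to store one parameter at each (assumed) precision.
BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits_on_one_gpu(params: float, bytes_per_param: float,
                    gpu_gb: float = H100_MEMORY_GB) -> bool:
    """True if the weights alone are smaller than the GPU's memory."""
    return weight_memory_gb(params, bytes_per_param) < gpu_gb


if __name__ == "__main__":
    for name, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(PARAMS, nbytes)
        print(f"{name}: {gb:.1f} GB, fits={fits_on_one_gpu(PARAMS, nbytes)}")
```

Under these assumptions, BF16 weights alone (about 98 GB) exceed 80 GB, while FP8 (about 49 GB) leaves headroom for the KV cache. A single-H100 deployment would therefore need something like reduced-precision weights or offloading; this sketch cannot tell which approach the article used.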

Published AUG 04, 2025
Source: HuggingFace Blog
Ingested: AUG 04, 2025 · 19:10
Editorial score: 4.0 / 5