Article · HuggingFace Blog

Measuring Open-Source Llama Nemotron Models on DeepResearch Bench

AI-Q combines Llama 3.3-70B Instruct and Llama-3.3-Nemotron-Super-49B-v1.5 to enable long-context retrieval, agentic reasoning, and tool use in open-source stacks. NVIDIA details the models' lineage and post-training, describes transparent evaluation metrics (hallucination detection, multi-source synthesis, citation trust, RAGAS), and notes that the 49B Nemotron model runs on a single H100. As of August 2025, DeepResearch Bench ranks AI-Q first among fully open stacks, with a score of 40.52 in the LLM with Search category.

Published 04 AUG 2025

Read the source: huggingface.co/blog/nvidia/ai-q-top-ranking-open-portable-deep-research-agent

Source: HuggingFace Blog
Ingested: 04 AUG 2025 · 19:10
Editorial score: 4.0 / 5