Article · HuggingFace Blog
BigCodeArena: Judging code generations end to end with code executions
BigCodeArena is a human-in-the-loop platform that evaluates AI code generation by executing code in sandboxed environments across multiple languages and frameworks. It enables interactive testing, multi-turn conversations, and community voting to rank models, addressing key evaluation gaps in code generation. The platform has gathered over 14,000 conversations since February 2025.
Published 07 OCT. 2025
Read the source: huggingface.co/blog/bigcode/arena
- Source: HuggingFace Blog
- Ingested: 07 OCT. 2025 · 19:10
- Editorial score: 4.0 / 5