HuggingFace Blog
BigCodeArena: Judging code generations end to end with code executions
BigCodeArena is a human-in-the-loop platform that evaluates AI code generation by executing code in sandboxed environments across multiple languages and frameworks. It enables interactive testing, multi-turn conversations, and community voting to rank models, addressing key evaluation gaps in code generation. The platform has gathered over 14,000 conversations since February 2025.
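The summary does not say how community votes become a model ranking. The sketch below assumes an Elo-style update over pairwise votes, a common choice for arena-style leaderboards; it is not BigCodeArena's published method. Each vote names two models and a winner (`"a"`, `"b"`, or `"tie"`). The function name, the `k` and `base` parameters, and the model names are hypothetical.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Turn pairwise votes into Elo ratings (illustrative sketch only).

    votes: iterable of (model_a, model_b, winner), where winner is
    "a", "b", or "tie". Every model starts at `base` rating.
    """
    ratings = defaultdict(lambda: base)
    for a, b, winner in votes:
        ra, rb = ratings[a], ratings[b]
        # Expected score of model a under the logistic Elo model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        # Actual score of model a: win = 1, loss = 0, tie = 0.5.
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Symmetric update: whatever a gains, b loses.
        ratings[a] = ra + k * (score_a - expected_a)
        ratings[b] = rb + k * ((1.0 - score_a) - (1.0 - expected_a))
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical votes between three placeholder models.
    votes = [
        ("model-x", "model-y", "a"),
        ("model-y", "model-z", "tie"),
        ("model-x", "model-z", "a"),
    ]
    for name, rating in sorted(elo_ratings(votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Online Elo depends on the order in which votes arrive. Leaderboards often fit a Bradley-Terry model over all votes at once instead, which removes that order dependence.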
Published Oct 07, 2025
Source: huggingface.co/blog/bigcode/arena