
BigCodeArena: Judging code generations end to end with code executions

BigCodeArena is a human-in-the-loop platform that evaluates AI code generation by executing code in sandboxed environments across multiple languages and frameworks. It enables interactive testing, multi-turn conversations, and community voting to rank models, addressing key evaluation gaps in code generation. The platform has gathered over 14,000 conversations since February 2025.
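The summary does not say how community votes are turned into a model ranking. As a rough illustration only, the sketch below applies a standard Elo update to pairwise votes. The Elo method, the `elo_ratings` helper, the vote format, and the model names are all assumptions made for this example, not details taken from BigCodeArena.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Compute Elo ratings from pairwise votes (hypothetical helper).

    votes: iterable of (model_a, model_b, winner), winner in {"a", "b", "tie"}.
    """
    ratings = defaultdict(lambda: base)
    outcome_for_a = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the Elo logistic model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = outcome_for_a[winner]
        # Zero-sum update: whatever model_a gains, model_b loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return dict(ratings)


# Made-up votes between placeholder model names.
votes = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "tie"),
    ("model-x", "model-z", "a"),
]
ratings = elo_ratings(votes)
ranking = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
for name, rating in ranking:
    print(f"{name}: {rating:.1f}")
```

Sequential Elo depends on the order in which votes arrive; a leaderboard built on thousands of conversations might instead fit all votes at once, which this sketch does not attempt.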

Published OCT 07, 2025
Read the source: huggingface.co/blog/bigcode/arena
Source: HuggingFace Blog
Ingested: OCT 07, 2025 · 19:10
Editorial score: 4.0 / 5