BigCodeArena: Judging code generations end to end with code executions
BigCodeArena is a human-in-the-loop platform that evaluates AI-generated code by executing it in sandboxed environments across multiple languages and frameworks. It supports interactive testing, multi-turn conversations, and community voting to rank models, addressing key gaps in how code generation is evaluated. The platform has gathered over 14,000 conversations since February 2025.
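
To make the idea of ranking models from community votes concrete, here is a minimal sketch of how pairwise votes could be turned into a leaderboard. The summary above does not say which rating method the platform uses, so the Elo-style update is an assumption chosen purely for illustration. The function name `elo_rankings`, the vote format, the parameters `k` and `base`, and the model names are all hypothetical.

```python
from collections import defaultdict


def elo_rankings(votes, k=32.0, base=1000.0):
    """Rank models from pairwise votes with an Elo-style update.

    Illustrative only: the rating method and vote format are assumptions,
    not the platform's documented approach.
    votes: iterable of (model_a, model_b, winner), winner in {"a", "b", "tie"}.
    """
    ratings = defaultdict(lambda: base)
    outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Probability that model_a wins, given the current rating gap.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = outcome[winner]
        # Zero-sum update: what model_a gains, model_b loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical votes: each tuple is one side-by-side comparison.
votes = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "a"),
    ("model-x", "model-z", "a"),
    ("model-y", "model-x", "tie"),
]
ranking = elo_rankings(votes)
for name, rating in ranking:
    print(f"{name}: {rating:.1f}")
```

Sequential Elo updates depend on the order in which votes arrive, so a production leaderboard might prefer an order-independent fit over the full vote set. The sketch is only meant to show how individual preferences can aggregate into a ranking.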