article · HuggingFace Blog

3LM: A Benchmark for Arabic LLMs in STEM and Code

3LM is a multi-domain benchmark for evaluating Arabic LLMs on STEM and code. It comprises three datasets (Native STEM MCQs, Synthetic STEM, and Arabic Code Benchmarks) and reports metrics such as pass@1, computed via EvalPlus. The construction pipeline combines OCR, LLM-based generation, and human verification. The datasets are available on HuggingFace and the code on GitHub.
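For readers unfamiliar with the metric, pass@1 is the share of coding problems for which a generated solution passes every test. Below is a minimal sketch of the widely used unbiased pass@k estimator, which reduces to c/n when k = 1. The function names and the sample counts are illustrative assumptions; this is not the 3LM evaluation code or the EvalPlus API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated for the problem
    c: samples that passed all tests
    k: budget of attempts being scored
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; each entry is (n, c)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: four problems, one greedy sample each.
example = [(1, 1), (1, 0), (1, 1), (1, 0)]
score = mean_pass_at_k(example, k=1)
```

With a single sample per problem, as in the hypothetical `example` list, pass@1 is simply the fraction of problems solved.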

published AUG 01, 2025

Read the source: huggingface.co/blog/tiiuae/3lm-benchmark

Source: HuggingFace Blog
Ingested: AUG 01, 2025 · 19:10
Editorial score: 4.0 / 5