
#ai-benchmarks

1 post
April 27, 2026
SWE-bench Verified Breaks Down: The Credibility Crisis in Public Coding Benchmarks and the Shift to a Next-Generation Evaluation Paradigm

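SWE-bench Verified breaks down: once frontier models start "memorizing the answers," public benchmark scores no longer equal real programming ability. Key takeaways: the AI evaluation topic on Hacker News today that llmapis.com most needs to follow is not that some model has set yet another record score, but that OpenAI has explicitly announced that SWE-bench Verified is no longer suitable for measuring frontier…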

swe-bench evaluation-contamination autonomous-software-engineering ai-benchmarks coding-agents
© 2026 LLMAPIS. ALL RIGHTS RESERVED. DESIGNED WITH CODE & TEA