
> Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)

> Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%

> USAMO: 97.6% / 42.3% / 95.2% / 74.4%

> The biggest jump in the numbers they quoted is 6%.

Just in the numbers you quoted, that's a 16.6-point jump in Terminal-Bench and a 55.3-point absolute increase in USAMO over their previous Opus 4.6 model.
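For anyone checking the math, these are plain percentage-point subtractions. Here is a quick sketch using only the Mythos and Opus 4.6 scores quoted above. The dictionary names are just illustrative.

```python
# Scores quoted in the thread, in percent.
mythos = {"Terminal-Bench 2.0": 82.0, "USAMO": 97.6}
opus_4_6 = {"Terminal-Bench 2.0": 65.4, "USAMO": 42.3}

# Absolute change in percentage points, rounded to one decimal
# to hide floating-point noise (e.g. 16.599999999999994).
deltas = {name: round(mythos[name] - opus_4_6[name], 1) for name in mythos}

for name, delta in deltas.items():
    print(f"{name}: +{delta} points")
```

Both differences are well above the 6% figure quoted in the parent comment.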




I don’t know if you’re willingly disregarding everything being said to you or there’s a language barrier here.

Can you please stop posting comments with personal swipes in them? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


You're right, I apologize for that. I have been responding with annoyance rather than walking away when I receive replies that appear to be ignoring context.

Appreciated! And of course, I know it's not easy. Believe me, I know...



