Hacker News | knollimar's comments

What about options and futures?

No, because that's real money.

When I see the word "genuine" or "why this works" my uncanny valley spidey senses tingle now. It always seems like it's trying to paper over a flawed argument with these, so instead of making it, it just "turns out" it's "genuinely" the answer

huh? I work in construction (electrical drafter) and I've been called out for my installs not being ADA (after the designer gave me a non-ADA compliant design).

a small harness that stores text files and manages context could be useful, otherwise you lose all ability to measure that skill (and that's important because it represents real world use cases on large code bases)

ARC-AGI isn't testing a model's ability to store files and code things. It's testing its ability to reason through puzzles given the same information as a human.

But that's the thing, as a human faced with a problem I'd often say "Sure, just let me get a pen, some paper and a calculator". Why shouldn't we make it easy for AIs to use their tools of choice?

if you tested my ability to reason and you gave me some challenging problems that involved arithmetic, it might be a better test if you gave me a scratch pad so I don't mess up the reasoning parts by failing arithmetic.

if anything, the "they'll just use a vpn" is an argument for the other way.

Law has privacy downsides and is trivially bypassable -> law is bad.


but you see, you own the hardware, but have no permission to modify the software I put on it >:)

Is the very first example not one without hierarchy and thus just a state machine?

Technically yes, that's just a state machine. On https://statecharts.dev/what-is-a-state-machine.html the website itself also admits that that example is a "simple state machine", and on https://statecharts.dev/what-is-a-statechart.html you get the better explanation with

> A statechart is a state machine where each state in the state machine may define its own subordinate state machines, called substates
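To make the distinction concrete, here is a minimal sketch in Python. It is not taken from statecharts.dev; the light-switch states, the event names, and the helpers `enter` and `step` are all made up for illustration. The top level alone is a flat state machine; giving `on` its own subordinate machine (`dim`/`bright`) is what would make it a statechart in the sense quoted above.

```python
# Hypothetical example: every state and event name here is invented.

# Top level: a plain (flat) state machine.
TOP = {
    ("off", "toggle"): "on",
    ("on", "toggle"): "off",
}

# Hierarchy: the "on" state defines its own subordinate machine (substates).
SUB = {
    "on": {
        "initial": "dim",
        "transitions": {
            ("dim", "brighten"): "bright",
            ("bright", "dimmer"): "dim",
        },
    },
}


def enter(top):
    """Return the full configuration reached when entering a top-level state."""
    sub = SUB.get(top)
    return (top, sub["initial"]) if sub else (top,)


def step(config, event):
    """Advance one event; the substate gets the first chance to handle it."""
    top = config[0]
    if len(config) == 2:
        nxt = SUB[top]["transitions"].get((config[1], event))
        if nxt is not None:
            return (top, nxt)
    # Not handled by a substate, so fall back to the parent's transitions.
    nxt_top = TOP.get((top, event))
    if nxt_top is not None:
        return enter(nxt_top)
    # Unknown event for this configuration: stay where we are.
    return config
```

Note that `toggle` is declared once on the parent `on` state yet works from either substate; in a flat machine you would have to repeat that transition for every `on` variant.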


wait why compare 2.6 to 2 instead of to 2.5?

Good question. We missed that release entirely. Our automated model checker only went live 2 months ago so they were manually curated prior to that. I'm adding it now. It'll be live in ~12 hours.

Update: Kimi K2.5 one-shot results are live. It wasn't a noteworthy release compared to K2.6: https://gertlabs.com/?mode=oneshot_coding

Can you add C# to the supported languages? It's widely used, and it would be helpful for people and companies to see how different models fare against each other.

Good idea.

yeah, putting the captcha on there to thwart the LLMs' ability to extract good pelicans was a really good idea

Disagree on the necessary only point.

I understand there is a point where it's harmful to take time away from them, but there's a point well before necessary where you're still conservative when asking for help but it's a net benefit to take their time.

If it took you 2 hours to avoid bothering someone for 10 minutes, asking wasn't necessary, but it would still have been a net benefit.


Agree there's an optimum here. I'm saying LLMs overall reduce the need to speak to your coworkers, and that's a good thing because it opens up more avenues for interesting conversations rather than "how do you set up this repo".
