knollimar · 2026-04-30T20:22:39 1777580559

What about options and futures?

blazespin · 2026-04-30T20:30:25 1777581025

No, because that's real money.

knollimar · 2026-04-30T15:26:19 1777562779

When I see the word "genuine" or "why this works" my uncanny valley spidey senses tingle now. It always seems like it's using these to paper over a flawed argument: instead of making the argument, it just "turns out" to be "genuinely" the answer.

knollimar · 2026-04-26T21:31:16 1777239076

huh? I work in construction (electrical drafter) and I've been called out for my installs not being ADA (after the designer gave me a non-ADA compliant design).

knollimar · 2026-04-26T16:12:38 1777219958

a small harness that stores text files and manages context could be useful, otherwise you lose all ability to measure that skill (and that's important because it represents real world use cases on large code bases)

anthonypasq · 2026-04-27T20:50:19 1777323019

arc agi isn't testing a model's ability to store files and code things. it's testing its ability to reason through puzzles given the same information as a human

falcor84 · 2026-04-30T17:08:06 1777568886

But that's the thing, as a human faced with a problem I'd often say "Sure, just let me get a pen, some paper and a calculator". Why shouldn't we make it easy for AIs to use their tools of choice?

knollimar · 2026-04-29T00:48:52 1777423732

if you tested my ability to reason and you gave me some challenging problems that involved arithmetic, it might be a better test if you gave me a scratch pad so I don't mess up the reasoning parts by failing arithmetic.

knollimar · 2026-04-26T16:05:23 1777219523

if anything, the "they'll just use a vpn" is an argument for the other way.

Law has privacy downsides and is trivially bypassable -> law is bad.

knollimar · 2026-04-26T14:51:20 1777215080

but you see, you own the hardware, but have no permission to modify the software I put on it >:)

knollimar · 2026-04-26T14:36:19 1777214179

Is the very first example not one without hierarchy and thus just a state machine?

embedding-shape · 2026-04-26T14:52:04 1777215124

Technically yes, that's just a state machine. On https://statecharts.dev/what-is-a-state-machine.html the website itself also admits that the example is a "simple state machine", and on https://statecharts.dev/what-is-a-statechart.html you get the better explanation:

> A statechart is a state machine where each state in the state machine may define its own subordinate state machines, called substates
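To make the difference concrete, here's a minimal sketch in Python. It is not taken from statecharts.dev; the `Machine` class, the light-switch states and the event names are all made up for illustration. With no substates it behaves as a plain state machine, and once a state gets its own child machine you have the hierarchy the quote describes.

```python
class Machine:
    """Hypothetical illustration: a transition table plus optional substates."""

    def __init__(self, initial, transitions, substates=None):
        self.initial = initial
        self.state = initial
        self.transitions = transitions      # {(state, event): next_state}
        self.substates = substates or {}    # {state: Machine}; empty means a flat state machine

    def reset(self):
        # Return this machine and all of its children to their initial states.
        self.state = self.initial
        for child in self.substates.values():
            child.reset()

    def send(self, event):
        # The active substate machine gets the first chance to handle the event.
        child = self.substates.get(self.state)
        if child is not None and child.send(event):
            return True
        nxt = self.transitions.get((self.state, event))
        if nxt is None:
            return False                    # nobody handled the event
        self.state = nxt
        # Assumption: entering a compound state restarts its child machine.
        entered = self.substates.get(nxt)
        if entered is not None:
            entered.reset()
        return True

    def configuration(self):
        # Active state followed by the active substates, outermost first.
        child = self.substates.get(self.state)
        return [self.state] + (child.configuration() if child else [])


toggle = {("off", "flick"): "on", ("on", "flick"): "off"}

# Flat: no hierarchy, so this is just a state machine.
flat = Machine("off", toggle)

# Hierarchical: the "on" state defines its own subordinate machine.
brightness = Machine("dim", {("dim", "up"): "bright", ("bright", "down"): "dim"})
chart = Machine("off", toggle, substates={"on": brightness})
```

With `chart`, sending "up" while the light is on is handled by the `brightness` child, while "flick" falls through to the parent. That fall-through is the part a flat transition table can't express without listing every combination of states.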

knollimar · 2026-04-21T02:26:05 1776738365

wait why compare 2.6 to 2 instead of to 2.5?

gertlabs · 2026-04-21T05:19:46 1776748786

Good question. We missed that release entirely. Our automated model checker only went live 2 months ago, so models were manually curated prior to that. I'm adding it now. It'll be live in ~12 hours.

gertlabs · 2026-04-21T16:35:22 1776789322

Update: Kimi K2.5 one-shot results are live. It wasn't a noteworthy release compared to K2.6: https://gertlabs.com/?mode=oneshot_coding

DeathArrow · 2026-04-22T04:57:56 1776833876

Can you add C# to supported languages? It's widely used and it would be helpful for people and companies to see how different models fare against each other.

gertlabs · 2026-04-22T15:11:59 1776870719

Good idea.

knollimar · 2026-04-21T02:21:51 1776738111

yeah putting the captcha on there to thwart the LLMs' ability to extract good pelicans was a really good idea

knollimar · 2026-04-19T18:27:47 1776623267

Disagree on the necessary only point.

I understand there is a point where it's harmful to take time away from them, but there's a point well before necessary where you're still conservative when asking for help but it's a net benefit to take their time.

If it took you 2 hours to not bother someone for 10 minutes, asking wasn't necessary, but it would still have been a net benefit.

simianwords · 2026-04-19T18:31:18 1776623478

Agree there's an optimum here. I'm saying LLMs overall reduce the need to speak to your coworkers, and that's a good thing because it opens up more avenues for interesting conversations instead of "how do you set up this repo".