When I see the word "genuine" or the phrase "why this works", my uncanny valley spidey senses tingle now. It always seems like it's trying to paper over a flawed argument with these, so instead of making the argument, it just "turns out" it's "genuinely" the answer.
huh? I work in construction (electrical drafter) and I've been called out for my installs not being ADA (after the designer gave me a non-ADA compliant design).
a small harness that stores text files and manages context could be useful, otherwise you lose all ability to measure that skill (and that's important because it represents real world use cases on large code bases)
ARC-AGI isn't testing a model's ability to store files and code things. It's testing its ability to reason through puzzles given the same information as a human.
But that's the thing, as a human faced with a problem I'd often say "Sure, just let me get a pen, some paper and a calculator". Why shouldn't we make it easy for AIs to use their tools of choice?
if you tested my ability to reason and you gave me some challenging problems that involved arithmetic, it might be a better test if you gave me a scratch pad so I don't mess up the reasoning parts by failing arithmetic.
Good question. We missed that release entirely. Our automated model checker only went live 2 months ago so they were manually curated prior to that. I'm adding it now. It'll be live in ~12 hours.
Can you add C# to supported languages? It's widely used and it would be helpful for people and companies to see how different models fare against each other.
I understand there is a point where it's harmful to take time away from them, but well before asking becomes strictly necessary there's a point where you're still being conservative about asking for help and yet it's a net benefit to take their time.
If it took you 2 hours to not bother someone for 10 minutes, asking wasn't strictly necessary but would still have been a net benefit.
Agree there's an optimum here. I'm saying LLMs overall reduce the need to speak to your coworkers, and that's a good thing because it opens up more avenues to have interesting conversations, and not "how do you set up this repo".