I believe we developers would love to see benchmarks or a few small code snippets rather than pure documentation. By the way, I noticed that Ryan Dahl of node.js mentioned you on Twitter (@ryah):
> Cute but only a fool would introduce this complexity and overhead for easing their C programming experience.
Anyway, that's cool. Keep up the good work, congratulations :)
Ryan has a vested interest in not having people adopt coroutines, since they show that his stupid "events with callbacks are faster and easier than threads" claim is bullshit. The truth is that if you have coroutines (and these are really easy in Unix with C), you don't need callbacks: you can make an event-based system look and work exactly like a thread-based system without the shared-resource drawbacks. With coroutines you can also do callbacks, so you get the best of all worlds, which you can't get with a pure callback-only system like Node.js.
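To make the "looks like a thread" point concrete, here is a minimal sketch using the `ucontext` calls available in glibc on Linux. This is only an illustration and is not how lthread itself switches contexts; the names `producer`, `run_producer`, and `STACK_SIZE` are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* assumed size, chosen for the demo */

static ucontext_t main_ctx, co_ctx;
static int co_done;
static int last_value;

/* The coroutine body reads like ordinary sequential code: the loop
 * counter simply stays on the coroutine's own stack across yields. */
static void producer(void)
{
    for (int i = 1; i <= 3; i++) {
        last_value = i * 10;
        swapcontext(&co_ctx, &main_ctx);   /* yield to the caller */
    }
    co_done = 1;
    /* returning from here resumes uc_link, i.e. main_ctx */
}

/* Drives the coroutine to completion and returns the sum of the
 * values it yielded. */
int run_producer(void)
{
    char *stack = malloc(STACK_SIZE);
    int sum = 0;

    assert(stack != NULL);
    co_done = 0;

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, producer, 0);

    for (;;) {
        swapcontext(&main_ctx, &co_ctx);   /* resume the coroutine */
        if (co_done)
            break;
        sum += last_value;
    }

    free(stack);
    return sum;
}
```

Calling `run_producer()` resumes the coroutine until it finishes; no state struct or callback registration is needed because the loop variable survives each yield on the coroutine's stack.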
Coroutines do impose some overhead - each coroutine still requires its own stack. You either have to allocate a stack big enough for the maximum depth that a coroutine will need (and when you're making complex library calls, that can be pretty deep) - OR - you save memory by assuming that the "suspend" call will only occur when there's relatively little stack space used, in which case you manually save/restore the stack by copying - and hope that the overhead of copying a stack for each context switch isn't a killer.
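The second option (copy only the live part of the stack on each switch) can be sketched roughly as follows. This simulates the idea on an ordinary byte buffer; a real implementation works on the machine stack and switches the stack pointer in assembly. All names here (`struct coroutine`, `save_stack`, `restore_stack`, `demo_save_restore`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SHARED_STACK_SIZE 4096   /* assumed size of the shared stack */

struct coroutine {
    unsigned char *saved;   /* heap copy of the live stack bytes */
    size_t saved_len;
};

/* The shared stack grows downward from `top`; the live bytes are
 * [sp, top). Only those bytes are copied out on suspend. */
int save_stack(struct coroutine *co, const unsigned char *sp,
               const unsigned char *top)
{
    size_t len = (size_t)(top - sp);
    unsigned char *buf = realloc(co->saved, len ? len : 1);

    if (buf == NULL)
        return -1;
    memcpy(buf, sp, len);
    co->saved = buf;
    co->saved_len = len;
    return 0;
}

/* Copies the saved bytes back to the same position below `top`. */
void restore_stack(const struct coroutine *co, unsigned char *top)
{
    memcpy(top - co->saved_len, co->saved, co->saved_len);
}

/* Simulates one suspend/resume cycle with 96 live bytes and returns
 * the number of bytes copied, or 0 if the contents did not survive. */
size_t demo_save_restore(void)
{
    static unsigned char shared[SHARED_STACK_SIZE];
    unsigned char *top = shared + SHARED_STACK_SIZE;
    struct coroutine co = { NULL, 0 };
    size_t copied = 0;

    memset(top - 96, 0xAB, 96);              /* coroutine's live frames */
    if (save_stack(&co, top - 96, top) != 0)
        return 0;
    memset(shared, 0, sizeof shared);        /* someone else ran here */
    restore_stack(&co, top);

    if (co.saved_len == 96 && top[-1] == 0xAB && top[-96] == 0xAB &&
        top[-97] == 0)
        copied = co.saved_len;
    free(co.saved);
    return copied;
}
```

The cost per switch is proportional to the live stack depth at the yield point, which is why yielding from shallow call depths matters with this scheme.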
I'm also guessing that it's not entirely true that there are no shared resource drawbacks. Whenever lthread moves a heavy computation into a pthread, all synchronization bets are presumably off. If you've got two "CPU intensive" workers that reference the same data structures, then you're still going to need mutexes, right?
Everything has overhead, and the only way to control it is to have options that fit your proposed workload and then optimize based on empirical evidence. If, however, your only option is the callback, then you have no way to work around its overhead.
Additionally, callbacks have a comparable amount of overhead, but it's not constant, because you have to create a side channel for state management. That means that instead of a single stack holding the state, you have a stack that exists only while the callback runs plus a structure or object that holds all the state even when the callback isn't active.
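A small sketch of that side channel, assuming a toy event source that delivers integer chunks: the running total cannot live in a local variable of the handler, so it has to sit in a separate structure passed through an opaque pointer. The names `sum_state`, `on_chunk`, `deliver_chunks`, and `sum_with_callbacks` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* State that must outlive every individual callback invocation. */
struct sum_state {
    long total;
    size_t chunks_seen;
};

typedef void (*chunk_cb)(void *state, const int *chunk, size_t len);

/* The handler's own stack frame disappears when it returns, so all
 * progress is written into the side-channel struct. */
static void on_chunk(void *opaque, const int *chunk, size_t len)
{
    struct sum_state *st = opaque;

    for (size_t i = 0; i < len; i++)
        st->total += chunk[i];
    st->chunks_seen++;
}

/* Toy stand-in for an event loop: delivers two fixed chunks. */
static void deliver_chunks(chunk_cb cb, void *state)
{
    static const int first[] = { 1, 2, 3 };
    static const int second[] = { 4, 5 };

    cb(state, first, 3);
    cb(state, second, 2);
}

/* Returns the sum of all delivered values and reports how many
 * chunks were handled. */
long sum_with_callbacks(size_t *chunks_seen)
{
    struct sum_state st = { 0, 0 };

    assert(chunks_seen != NULL);
    deliver_chunks(on_chunk, &st);
    *chunks_seen = st.chunks_seen;
    return st.total;
}
```

In a coroutine version the same logic would be a plain loop with `total` and `chunks_seen` as local variables, since the stack frame persists between reads.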
Yes, a coroutine user has to be aware that allocating on the stack has a penalty, similar to being aware that you cannot make a blocking call in an IO loop.
On average, yielding ~10 calls deep results in copying ~75 to 100 bytes, but it all depends on what is on the stack. One advantage of lthread is that it's easy to make use of multiple cores, which isn't very natural in IO loops.
Yes, you'll need a synchronization mechanism when accessing shared data structures from multiple CPU-intensive workers.
Wait a sec .. I just realized you're copying the entire stack. If I understand correctly, that means that when you move stuff to a compute_lthread, the addresses of local variables change, don't they?
I often take addresses of local variables -- if I understood correctly, this deserves a huge warning in the documentation.
Correct. The local variables' addresses change, but you can still access them by name and pass them to functions. What you cannot do is save a pointer to a variable beforehand and then dereference it inside begin()/end().
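The hazard can be simulated without lthread at all. In this sketch, a `memcpy` of a struct stands in for the stack copy (an assumption made purely for illustration; `struct frame` and `saved_pointer_is_stale` are hypothetical names): the variable moves, but a pointer saved before the copy still targets the old location.

```c
#include <assert.h>
#include <string.h>

/* Stands in for a stack frame that holds a local variable and a
 * pointer to that same variable. */
struct frame {
    int value;
    int *saved;   /* address of `value`, taken before the copy */
};

/* Returns 1 if, after the frame is copied, the saved pointer still
 * refers to the ORIGINAL frame rather than to the copy. */
int saved_pointer_is_stale(void)
{
    struct frame original, copy;

    original.value = 42;
    original.saved = &original.value;

    memcpy(&copy, &original, sizeof copy);   /* the "stack copy" */
    copy.value = 7;                          /* access by name: hits the copy */

    /* The pointer was copied bit for bit, so it still sees 42 in the
     * old frame and misses the update made to the copy. */
    return copy.saved == &original.value && *copy.saved == 42;
}
```

In the simulation the old frame is still alive, so the stale read is merely wrong. With a real relocated stack the old location may already be reused by something else.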
I thought I added a warning in the lthread_compute_begin() section but apparently not. I'll go ahead and add it.
It might also be possible to have a "debug mode" that scans the stack while copying it to the lthread_compute_begin() thread, and warns you if any of it looks like pointers that point into the copied stack. The cost will probably be negligible compared to a long-running thread (compare 60 pointers against a lower and upper bound), and it might have false positives occasionally, but it could save a lot of debugging time...
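A rough sketch of what such a scan could look like, treating the stack as an array of pointer-sized words. This is a hypothetical helper, not something lthread provides; `count_suspect_pointers` and `demo_scan` are invented names, and any integer that happens to fall in the range would be reported as a false positive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts the words in words[0..n) whose value lies inside [lo, hi),
 * i.e. that look like pointers into the region being copied. */
size_t count_suspect_pointers(const uintptr_t *words, size_t n,
                              uintptr_t lo, uintptr_t hi)
{
    size_t hits = 0;

    assert(words != NULL);
    for (size_t i = 0; i < n; i++) {
        if (words[i] >= lo && words[i] < hi)
            hits++;
    }
    return hits;
}

/* Builds a four-word fake stack containing two self-referencing
 * pointers and two plain integers, then scans it. */
size_t demo_scan(void)
{
    uintptr_t fake_stack[4];
    uintptr_t lo = (uintptr_t)&fake_stack[0];
    uintptr_t hi = (uintptr_t)&fake_stack[4];   /* one past the end */

    fake_stack[0] = 0;                          /* plain data */
    fake_stack[1] = (uintptr_t)&fake_stack[3];  /* points into the stack */
    fake_stack[2] = 1;                          /* plain data */
    fake_stack[3] = (uintptr_t)&fake_stack[0];  /* points into the stack */

    return count_suspect_pointers(fake_stack, 4, lo, hi);
}
```

The scan is a single pass with two comparisons per word, which matches the intuition above that it would be cheap next to a long-running computation.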
Well, to be fair, JavaScript is kind of limited, so Ryan does not have language support for coroutines unless he wants to redesign JavaScript. Even better for node.js would be something a bit more CSP-like, such as goroutines and channels, but again: JavaScript. If you want JavaScript with coroutines you can always use Lua.
Speaking of Lua, it doesn't actually have any language constructs for coroutines - coroutine creation and switching is handled entirely via library functions in the `coroutine` package.
So while adding coroutine support to V8 would probably require some substantial internal rewrites, it wouldn't necessitate changing the language - just add a global Coroutine object with the coroutine-switching functions like create, resume, yield...
Right, it just needs a sufficiently powerful C API and internals designed with coroutines in mind. ("just" sounds out of place, there...)
If you want to look at the Lua coroutine implementation, start at auxresume (http://www.lua.org/source/5.1/lbaselib.c.html#auxresume) in lbaselib.c, and the functions tagged luaB_ more generally. (In 5.2, they've been moved to their own file, lcorolib.c.)
I would like to know something here. Since there is only one stack per pthread, and the stack state is swapped on a context switch between coroutines within a pthread, wouldn't this be quite expensive if a coroutine has a lot of variables on its stack? Or is the swapping carried out in a different manner, e.g. caching, pre-allocated stacks, etc.?