dgrnbrg's comments | Hacker News

What about the 3rd founder? He got written out of the narrative early on, but he built the original platform and was bought out in the first few years.


Really? Can't find any info on this.


Ya you’re gonna have to have a source on that one


Am curious to know more if you have more info.


At least one commercial company I've heard of has a sophisticated permissioning model: the owner of the chain can publish it and have its causal history and/or integrity validated by all participants, but with specific permission scopes, so that subscribers to the blockchain can only see inside transactions they have permissions for.


It's not just the root's event log--it's all the event logs. The deferrals allow us to batch writes in an optimized way, and to do so with a fully incremental algorithm. In the IO cost model, each "block" read or write costs 1 IOP, so we can reduce the IOP cost of writes by a factor of 100-1000x; reads, however, will still have the same number of nodes to fetch. The event logs augmenting the tree are an effective IO optimization on a B+ tree.
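To make that IOP arithmetic concrete, here's a toy back-of-envelope model in Python. Everything here is illustrative: the function names and constants are mine, not from the project's code; the only assumption carried over is the cost model above (1 IOP per block read or write).

```python
# Toy cost model for write buffering in a tree of depth `depth`.
# Assumption (mine, not from the project): every node read or write
# costs exactly 1 IOP, matching the IO cost model described above.

def unbuffered_iops(n_writes, depth):
    # Without buffering, each write walks root -> leaf and rewrites
    # one node per level.
    return n_writes * depth

def buffered_iops(n_writes, depth, buffer_size):
    # With a per-node event log holding `buffer_size` deferred ops,
    # each level flushes once per `buffer_size` writes passing through,
    # and each flush rewrites one node: ~1 IOP.
    return (n_writes // buffer_size) * depth

print(unbuffered_iops(100_000, 3))       # 300000
print(buffered_iops(100_000, 3, 1_000))  # 300, i.e. a 1000x reduction
```

With a 1,000-entry log this toy model lands at the top of the 100-1000x range quoted above; reads are unchanged because they still touch one node per level.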


Thanks, makes sense.


Creator here. As other commenters noted, this data structure only requires keys to be sortable, not hashable, and the best sorted performance we can achieve is O(log(n)).

That said, when a Cuckoo hash gets very full and bounces entries around a lot, there might be an advantage to buffering operations and choosing insertion orders that reduce the batch's total insertion time. Then again, Cuckoo hashes already perform so well in the situations they're designed for that it's hard to improve on them with an event log overlay.


The code actually includes a benchmarking tool that's meant to help figure out this decision. Once you've selected your key size (or distribution), value size (or distribution), and backend, you can play with these factors. Intuitively, targeting the backend's natural "block" size should improve perf: 4k-1MB if you're in memory or on local disk, 1.5k-9k if you're over the network (depending on configuration).
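As a rough illustration of that block-size intuition, here's a hypothetical back-of-envelope helper (the function name, pointer size, and overhead constant are all my assumptions, not anything from the actual benchmarking tool):

```python
# Back-of-envelope fanout estimate: pick a branching factor so that a
# serialized node lands near the target block size. All constants are
# illustrative assumptions, not measurements from the real tool.

def fanout_for_block(target_block_bytes, avg_key_bytes,
                     avg_ptr_bytes=16, per_node_overhead=64):
    per_entry = avg_key_bytes + avg_ptr_bytes
    return max(2, (target_block_bytes - per_node_overhead) // per_entry)

print(fanout_for_block(4096, 24))  # 100 -> ~100-way nodes for 4k blocks
print(fanout_for_block(1500, 24))  # 35  -> much narrower nodes over the network
```

The same key size yields a very different fanout at the two ends of the 1.5k-1MB range, which is why the backend choice matters so much.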

I did some work to split nodes based on size rather than number of children, but that requires accurate & fast size estimation of the decompressed objects, which isn't possible in general.


Creator here. You could use this from Java using the Clojure API for Java; however, many of the extension hooks use Clojure protocols. I would recommend writing a shim to Java for your particular application, so that your data types are represented in the way you want.


That'd require the person doing the using to know Clojure, right?


Indeed that would.


That is very cool! When you end up doing reads, do you compute all the pending writes along the path from root to leaf, and then run queries on the projection?


For index-seeks, yes.

For key-seeks, it's a normal tree walk to the leaf (unless an INSERT is found at a higher level, iirc)
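A minimal sketch of that kind of key-seek (an assumed node structure, not the real implementation): ops buffered nearer the root are newer, so the first log hit on the way down shadows whatever the leaf holds.

```python
import bisect

# Hypothetical node layout for a buffered tree: each node carries a
# pending-op log; internal nodes route by pivot keys, leaves hold data.

class Node:
    def __init__(self, log=None, children=None, pivots=None, leaf=None):
        self.log = log or {}        # key -> ('insert', value) or ('delete',)
        self.children = children    # child Nodes, or None for a leaf
        self.pivots = pivots or []  # routing keys between children
        self.leaf = leaf or {}      # key -> value, for leaf nodes

def lookup(node, key):
    while True:
        op = node.log.get(key)
        if op is not None:          # newest buffered op wins
            return op[1] if op[0] == 'insert' else None
        if node.children is None:
            return node.leaf.get(key)
        node = node.children[bisect.bisect_right(node.pivots, key)]

# 'a' was re-inserted at the root after an older value reached a leaf.
left = Node(leaf={'a': 1, 'b': 2})
right = Node(leaf={'x': 9})
root = Node(log={'a': ('insert', 42)}, children=[left, right], pivots=['m'])
print(lookup(root, 'a'))  # 42 (root's log shadows the stale leaf value)
print(lookup(root, 'b'))  # 2
print(lookup(root, 'x'))  # 9
```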

In practice, the blocks are so huge that any tree is going to have at most depth 3, and the root page is likely in cache, so 1 or 2 s3 fetches (= roughly 200ms, let's say)

I've been toying with the idea of putting the root page (or maybe the top 2 levels of pages) in DynamoDB, which would make them fast even if they weren't in cache.


Creator here. There are no visualizations of this yet, but I'll be speaking about it this year at Strange Loop. For that talk, I'll be creating visualizations, so stay tuned!


Creator here. I would absolutely agree with this, but let me shed a bit more light to clarify: the underlying idea of B+ trees with inline caches is not patented (or maybe it was, but a long time ago?). Tokutek's patents are all focused on the methods they use to achieve concurrent operations on a single tree--they have a very interesting & novel locking protocol.

The Hitchhiker tree completely avoids their patents, since it makes very different decisions in order to become purely functional.


Thanks for explaining. That makes sense.

Great work by the way! Really like the description of the code structure on the README page.

If patent is indeed not an issue, I might play with translating that to Erlang. You already did the hard work and also made it functional. I'd have to learn to read Clojure.

Speaking of functional, I was surprised to find the other day that Erlang's queue module implements an Okasaki API, complete with funny function names like daeh, liat, and snoc.

http://erlang.org/doc/man/queue.html
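For anyone curious, the two-list trick behind that API is tiny. Here's a Python sketch of an Okasaki-style functional queue with the reversed-name operations (snoc enqueues at the rear; daeh/liat read and drop from the rear, mirroring head/tail); this is an illustration of the idea, not Erlang's actual implementation.

```python
# Okasaki-style queue as a pair (front, rear): front holds oldest-first
# elements, rear holds newest-first. All operations return a new queue.

def snoc(q, item):   # enqueue at the rear ("cons" spelled backwards)
    front, rear = q
    return (front, [item] + rear)

def head(q):         # oldest element
    front, rear = q
    return front[0] if front else rear[-1]

def tail(q):         # queue without its oldest element
    front, rear = q
    if not front:    # reverse the rear when the front runs dry
        front, rear = list(reversed(rear)), []
    return (front[1:], rear)

def daeh(q):         # newest element ("head" reversed)
    front, rear = q
    return rear[0] if rear else front[-1]

def liat(q):         # queue without its newest element ("tail" reversed)
    front, rear = q
    if not rear:
        rear, front = list(reversed(front)), []
    return (front, rear[1:])

q = ([], [])
for x in (1, 2, 3):
    q = snoc(q, x)
print(head(q))        # 1
print(daeh(q))        # 3
print(head(tail(q)))  # 2
```

Each element is reversed at most once from rear to front, which is what makes the amortized cost per operation constant.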


I wrote a similar project to this in Clojure, called Piplin: http://www.piplin.org. I worked on it for a year, and did some interesting things to enable REPL-driven FPGA development, but over time my interests shifted to macro-scale systems.


