It is interesting, but what I really need from the JSON functionality in PG is some internal representation that allows fast, efficient exploration of the JSON blob within the query: being able to refer to a single attribute in the SELECT/WHERE/GROUP BY clauses without paying the serialization/deserialization toll every time.
Wouldn't that kind of defeat the purpose of using a document database in the first place? Being able to throw unstructured data into the system, and then being able to query on that data once the space is better understood is where the document databases really shine.
If you are able to start with a rigid structure, then in many cases you could have just used a relational database to begin with.
In my (limited) experience with Mongo, it's just not like that.
You can't just (quickly) query any key in the document - you have to define indexes anyway or you'll be doing the equivalent of a table scan or Map/Reduce.
Document databases shine when you're just not doing that kind of query anyway, and you've got a ton of data to store. If you need to do e.g., fulltext and/or geospatial search you should probably use a search server anyway (I like Sphinx), and then just lookup by PK in whatever datastore you're using.
It's been mentioned elsewhere in this thread, but I'm also quite fond of the old FriendFeed method: serialize your arbitrary data into a (MySQL) BLOB, and then run a background process to build "indexes" (really tables) when you find out you need them.
It's ridiculously simple to start doing that, and there's no new tech to learn.
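To make the pattern concrete, here's a toy sketch of the FriendFeed approach, using Python and SQLite in place of MySQL (all table and column names here are made up for illustration):

```python
import json
import sqlite3

# Entities are opaque serialized blobs; an "index" is just another table
# mapping an attribute value back to entity ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, body BLOB)")
conn.execute("CREATE TABLE index_user (user TEXT, entity_id INTEGER)")

def save(entity):
    cur = conn.execute("INSERT INTO entities (body) VALUES (?)",
                       (json.dumps(entity),))
    return cur.lastrowid

# Throw unstructured data in first...
save({"user": "alice", "text": "hello"})
save({"user": "bob", "text": "world"})

# ...then a background process backfills the "index" table once you
# discover you need to query by that attribute.
for row_id, body in conn.execute("SELECT id, body FROM entities"):
    doc = json.loads(body)
    if "user" in doc:
        conn.execute("INSERT INTO index_user VALUES (?, ?)",
                     (doc["user"], row_id))

# Queries join through the index table instead of scanning blobs.
hits = conn.execute(
    "SELECT e.body FROM index_user i JOIN entities e ON e.id = i.entity_id "
    "WHERE i.user = ?", ("alice",)).fetchall()
```

The nice property is that the index tables are disposable: if the schema of your blobs changes, you drop and rebuild them without touching the entities.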
Defining 'right', and being able to predict 'right' for future iterations/versions of an app's life is where that mantra seems to fall down.
I'm not a huge NoSQL fan right now: even though I sometimes need document-style schemaless data storage, I always (eventually) need ad-hoc reporting and relational querying capabilities on my projects. With that (fore)knowledge, I may as well always choose a relational DB.
Well, what I need is expressive and flexible querying over document data that doesn't have a rigid schema. There are several document stores that provide filtering and retrieval matching that, but they all have various limitations when it comes to aggregation.
Serialized is already denormalized, you're talking about a kind of double-denormalization where you introduce metadata as a guide for the serialized-data queries. Spaghetti.
exactly. calling the find_in_obj() function can get fairly expensive, especially given the need to execute JSON.parse() for every call.
there is definitely a lot of room for improvement in the postgres native JSON toolkit. i am hoping that building this exposes more of those issues and helps move it forward.
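The cost being described is re-parsing the blob on every access. A quick Python analogy (standing in for JSON.parse inside a plv8 function, not the actual Postgres code path):

```python
import json
import timeit

blob = json.dumps({"a": {"b": list(range(100))}})

def per_call():
    # Re-parses the whole blob on every access, like calling
    # JSON.parse once per row inside the query.
    return json.loads(blob)["a"]["b"][0]

parsed = json.loads(blob)

def parsed_once():
    # With a pre-parsed internal representation, attribute access
    # is just a couple of dictionary lookups.
    return parsed["a"]["b"][0]
```

Both return the same value, but the per-call version pays the full parse cost each time, which is exactly what an internal binary representation would avoid.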
I haven't tried it, but I think you would be able to use a PostgreSQL functional index (i.e. indexing the result of a function call to extract a particular bit of data from the JSON.)
I was just coming here to say exactly this. You could also use a partial index (that is, an index with a WHERE clause; so, "WHERE json_field LIKE '%foo.bar = baz%'" or whatever).
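A sketch of both ideas, assuming a table `docs` with a json column `body` and a hypothetical extraction function `json_get(json, text)` (the actual function will depend on your toolkit, e.g. a plv8 helper):

```sql
-- Functional index: index the extracted value, not the whole blob.
-- PostgreSQL requires the indexed function to be declared IMMUTABLE.
CREATE INDEX docs_author_idx ON docs (json_get(body, 'author'));

-- Partial index: only rows matching the WHERE predicate are indexed.
CREATE INDEX docs_published_idx ON docs (id)
WHERE body::text LIKE '%"published": true%';
```

The functional index is generally the better fit here, since the partial-index LIKE predicate is fragile against variations in JSON formatting.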
Immutability is perfectly appropriate for that use-case, though. Creating an index via a function that may return different outputs when provided the same input is always going to lose.
But the same is true for MongoDB. If you're querying an attribute of a document that isn't indexed, Mongo has to scan every document in the collection. The solution is to add indexes on any attributes you want to query by, and that holds true both for Mongo and for this Postgres implementation.
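A toy illustration of the trade-off (plain Python, not either database's actual engine):

```python
# Without an index, matching on an attribute means examining every
# document: a collection scan, O(n) per query.
docs = [{"_id": i, "color": "red" if i % 2 else "blue"} for i in range(10)]

def scan(docs, key, value):
    return [d for d in docs if d.get(key) == value]

# Building an index trades write-time work for fast lookups.
index = {}
for d in docs:
    index.setdefault(d["color"], []).append(d["_id"])

# Both strategies answer the same query; only the cost differs.
assert [d["_id"] for d in scan(docs, "color", "red")] == index["red"]
```

Whether the blob lives in a Mongo collection or a Postgres json column, an unindexed attribute query pays the scan cost either way.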