It is interesting, but what I really need from the JSON functionality in PG is some internal representation that allows fast, efficient exploration of the JSON blob within the query: being able to refer to a single attribute in the SELECT/WHERE/GROUP BY clauses without paying the serialization/deserialization toll every time.
Wouldn't that kind of defeat the purpose of using a document database in the first place? Being able to throw unstructured data into the system, and then being able to query on that data once the space is better understood is where the document databases really shine.
If you are able to start with a rigid structure, then in many cases you could have just used a relational database to begin with.
In my (limited) experience with Mongo, it's just not like that.
You can't just (quickly) query any key in the document - you have to define indexes anyway or you'll be doing the equivalent of a table scan or Map/Reduce.
Document databases shine when you're just not doing that kind of query anyway, and you've got a ton of data to store. If you need to do e.g., fulltext and/or geospatial search you should probably use a search server anyway (I like Sphinx), and then just lookup by PK in whatever datastore you're using.
It's been mentioned elsewhere in this thread, but I'm also quite fond of the old FriendFeed method: serialize your arbitrary data into a (MySQL) BLOB, and then run a background process to build "indexes" (really tables) when you find out you need them.
It's ridiculously simple to start doing that, and there's no new tech to learn.
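To make the pattern concrete, here's a toy sketch of the FriendFeed approach, using Python and SQLite in place of MySQL (all table and column names here are made up for illustration):

```python
import json
import sqlite3

# Entities are opaque serialized blobs; an "index" is just another table
# mapping an attribute value back to entity ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, body BLOB)")
conn.execute("CREATE TABLE index_user (user TEXT, entity_id INTEGER)")

def save(entity):
    cur = conn.execute("INSERT INTO entities (body) VALUES (?)",
                       (json.dumps(entity),))
    return cur.lastrowid

# Throw unstructured data in first...
save({"user": "alice", "text": "hello"})
save({"user": "bob", "text": "world"})

# ...then a background process backfills the "index" table once you
# discover you need to query by that attribute.
for row_id, body in conn.execute("SELECT id, body FROM entities"):
    doc = json.loads(body)
    if "user" in doc:
        conn.execute("INSERT INTO index_user VALUES (?, ?)",
                     (doc["user"], row_id))

# Queries join through the index table instead of scanning blobs.
hits = conn.execute(
    "SELECT e.body FROM index_user i JOIN entities e ON e.id = i.entity_id "
    "WHERE i.user = ?", ("alice",)).fetchall()
```

The nice property is that the index tables are disposable: if the schema of your blobs changes, you drop and rebuild them without touching the entities.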
Defining 'right', and being able to predict 'right' for future iterations/versions of an app's life is where that mantra seems to fall down.
I'm not a huge NoSQL fan right now: even though I sometimes need document-style schemaless data storage, I always (eventually) need ad-hoc reporting and relational querying capabilities on my projects. With that (fore)knowledge, I may as well always choose a relational DB.
Well, what I need is expressive and flexible querying over document data that doesn't have a rigid schema. There are several document stores that provide filtering and retrieval matching that, but they all have various limitations when it comes to aggregation.
Serialized is already denormalized, you're talking about a kind of double-denormalization where you introduce metadata as a guide for the serialized-data queries. Spaghetti.
exactly. calling the find_in_obj() function can get fairly expensive, especially given the need to execute JSON.parse() for every call.
there is definitely a lot of room for improvement in the postgres native JSON toolkit. i am hoping that building this exposes more of those issues and helps move it forward.
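The cost being described is re-parsing the blob on every access. A quick Python analogy (standing in for JSON.parse inside a plv8 function, not the actual Postgres code path):

```python
import json
import timeit

blob = json.dumps({"a": {"b": list(range(100))}})

def per_call():
    # Re-parses the whole blob on every access, like calling
    # JSON.parse once per row inside the query.
    return json.loads(blob)["a"]["b"][0]

parsed = json.loads(blob)

def parsed_once():
    # With a pre-parsed internal representation, attribute access
    # is just a couple of dictionary lookups.
    return parsed["a"]["b"][0]
```

Both return the same value, but the per-call version pays the full parse cost each time, which is exactly what an internal binary representation would avoid.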
I haven't tried it, but I think you would be able to use a PostgreSQL functional index (i.e. indexing the result of a function call to extract a particular bit of data from the JSON.)
I was just coming here to say exactly this. You could also use a partial index (that is, an index with a WHERE clause; so, "WHERE json_field LIKE '%foo.bar = baz%'" or whatever).
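A sketch of both ideas, assuming a table `docs` with a json column `body` and a hypothetical extraction function `json_get(json, text)` (the actual function will depend on your toolkit, e.g. a plv8 helper):

```sql
-- Functional index: index the extracted value, not the whole blob.
-- PostgreSQL requires the indexed function to be declared IMMUTABLE.
CREATE INDEX docs_author_idx ON docs (json_get(body, 'author'));

-- Partial index: only rows matching the WHERE predicate are indexed.
CREATE INDEX docs_published_idx ON docs (id)
WHERE body::text LIKE '%"published": true%';
```

The functional index is generally the better fit here, since the partial-index LIKE predicate is fragile against variations in JSON formatting.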
Immutability is perfectly appropriate for that use-case, though. Creating an index via a function that may return different outputs when provided the same input is always going to lose.
But the same is true for MongoDB. If you're querying an attribute of a document that isn't indexed, Mongo has to scan every document in the collection. The solution is to add indexes on any attributes you want to query by, and that holds true both for Mongo and for this Postgres implementation.
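A toy illustration of the trade-off (plain Python, not either database's actual engine):

```python
# Without an index, matching on an attribute means examining every
# document: a collection scan, O(n) per query.
docs = [{"_id": i, "color": "red" if i % 2 else "blue"} for i in range(10)]

def scan(docs, key, value):
    return [d for d in docs if d.get(key) == value]

# Building an index trades write-time work for fast lookups.
index = {}
for d in docs:
    index.setdefault(d["color"], []).append(d["_id"])

# Both strategies answer the same query; only the cost differs.
assert [d["_id"] for d in scan(docs, "color", "red")] == index["red"]
```

Whether the blob lives in a Mongo collection or a Postgres json column, an unindexed attribute query pays the scan cost either way.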