NoSQL and SQL

Have a look at this video of Brian Aker's great 10 minute lightning talk about NoSQL. NoSQL is a database movement which promotes non-relational data stores that do not need a fixed schema.

With the exception of Memcached, the Drupal world is very SQL driven -- and for all the right reasons. However, both at Mollom and Acquia we have to deal with some big data volumes and have been using some of the NoSQL technologies like Hadoop, Thrift, Hbase, Tokio Cabinet, etc.

While these are great tools, at the end of the day, NoSQL is an "and" and not a "versus". Plus, I expect the gap to close as there are a couple of interesting projects under way that bring some of the NoSQL advantages to the SQL world. One of them is Brian Aker's own Drizzle project, another project I can't talk about yet ... Sufficient to say, don't ditch your relation database just yet.

Either way, have a look Brian's NoSQL presentation. It's funny!

Comments

chx (not verified):

Let me point out MongoDB. I had a talk at DrupalCamp Vienna recently hopefully the video will be up soon and I for sure will talk about it at the DCSF.

December 4, 2009 - 13:31
Marc (not verified):

I think he really got the point wrong and is basically mocking the NoSQL people.

The NoSQL movement doesn't say that SQL sucks. NoSQL just means that there are different solutions for different problems.

If you want to work with a lot of data and be able to run dynamic ad-hoc queries on it, use a relational database with SQL. Using a key value store doesn't make any sense for that unless you want to easily be able to distribute your workload on several machines without having to go though the hassle of setting up a relational database cluster.

If you want to just keep your objects in a persistent state and have high-performance access to them (e.g. a LOT of web applications), use a key value store.

Also a lot of the other stuff has just been misrepresented. He is right that a lot of key value stores don't have "transactions". They do however have atomic operations, they do also have the ability to script those atomic operations (Tokio Cabinet can e.g. understand Lua). relational databases rely on stored procedures for that kind of stuff (especially in high-load environments).

The success of things like Hybernate or active record have shown that there is a definite need for easy object persistence. Especially with patterns like MVC, you're eliminating a lot of dynamic queries and running 90% of the SQL just to get the object back in the way you saved it. Thanks to key value stores, you don't have to switch between "my model as an object" and "my model as 20 tables connected by other tables" but can just stay in your "original" language of choice.

December 4, 2009 - 13:42
Larry Garfield (not verified):

We were having the document-vs-relational debate at Palantir not long ago, mostly just for fun. (We're weird like that.) The basic conclusion we reached is that, in concept, document-centric databases (which, by the way, includes the file system on your hard drive) are really great for complex object retrieval, but not so hot on arbitrarily complex data querying and mining and analysis. SQL is really really good for arbitrarily complex data querying and mining and analysis (that's what it was created for), but not so hot for complex object retrieval. Just look at all the pain we go through in order to load and save nodes with a couple of multi-value CCK fields. Oy!

Of course, the main challenges are that 1) We need to support both complex object retrieval and arbitrarily complex querying, mining, and analysis. 2) There is a universally available and understood SQL database that we already support (MySQL), but there is no universally available and understood document/object database. OO databases have been around for decades, but none have achieved the sort of market universality that MySQL has. You can always count on Random Cheap Hosting, Inc having MySQL available (and if it doesn't, really, there's $5/month options that do). There is no such equivalent for document/object databases, nor is there a clear standard syntax that even approaches SQL (which is a huge trainwreck of non-standards-compliance on its own).

It's not an easy problem to solve. If there were a clear semi-standard like SQL, we could write an abstraction layer for that like we have for SQL databases. If we had some document/object DB that we could rely on always being available like we can with MySQL, we could add that support to core and do all sorts of amazing things with our entities. (Entities in document DB, main configuration in SQL? Win!) Both of those need to be solved first, though, before we can go that route nicely.

December 4, 2009 - 17:15
Cary Gordon (not verified):

As a set theory wonk at heart, I can relate to the person who exclaimed "I like my job" in response the the question, "Do we really need schemas."

December 8, 2009 - 20:45
Chris Charlton (not verified):

Making fun or not, it was very entertaining. I think the message was not No-SQL is inferior, but that we have depended on SQL for too long and the world's data has not changed but gotten larger and everyone is in the same pot - it does need to change.

December 24, 2009 - 18:29
Arnaud ZIEBA (not verified):

Whether the tone is somewhat funny and might even sound a bit deriding to some, this is a decent introduction to comparing both worlds.
SQL was designed with a specific purpose in mind and a solid conceptual background. Same for NoSQL. There is no exclusivity, it is just an alternative.

April 3, 2010 - 10:01
Josh (not verified):

Thanks for the introduction. It absolutely takes humor to bridge this conceptual gap!

For us folks who spend more of their day in programming and less in database land, this is the only way we're going to be able to make the brain space and make use of this new tool for added performance... when it's appropriate.

April 6, 2010 - 08:55
Alaric Snell-Pym (not verified):

SQL and NoSQL are really just *interfaces* to data, not places where the data is stored. After all, SQL and NoSQL databases generally shove the records in a B-Tree.

Eventual consistency, replication, and sharding are by no means monopolised by the NoSQL world - all can be done with SQL, and indeed MySQL supports them all. The NoSQL crowds, making up their conceptual models from scratch, have been making them easier to implement distributed data storage as that's a big trend at the moment, but that's all the difference.

So, I think, the biggest differences between SQL and NoSQL are simply that the NoSQL databases have been employing a more flexible schema model (I cringe at the term 'schemaless' and prefer the idea of a dynamic schema that grows itself automatically whenever you add new fields, but that's just a matter of perspective), and employing a simpler interface.

After all, SQL vs. NoSQL isn't even "relational" versus "non-relational". What makes a database relational, document-oriented, graph-oriented, object-oriented? It's all a matter of usage, really - the records are still a bag of key=value pairings. In the relational model, you might call one of them a foreign key relationship to another table. In a graph database, you might call that a relationship; in an object database, a pointer. But peel away the query layer and there'll still be some kind of field containing some kind of record identifier.

I think there's more in common than different between the different data models, to be honest. It's easy to look at them all and merge their features. Add inheritance and some kind of record type identifier, and you have an object database. Add an explicit model of references between records, and you have a graph/object database. Add the ability to do JOINs in your query language, and you have a relational database. Add the ability to store arbitrarily tree structures in your records, and it's a document database.

Then on top, add as many interfaces as you like: SQL and it looks like a relational database, ORM and it looks like an object store, HTTP and it looks like a document store, a funky native API and it looks like a NoSQL store...

Here at GenieDB where I work, we've recognised that each approach has its strengths. But different parts of the same system need different strengths. Having a sea of different databases with different strengths and putting bits of your data in each is a step in the right direction, but it causes administrative pain, and you can't do things like JOIN between your different data stores easily. So we're working to combine the different database models, but all talking to the same data. Right now, we've written an awesome replicated key-value/document NoSQL store (with many cool features of its own), but then we've written a MySQL storage engine so it can be viewed as SQL. When you create an SQL facade, you need to declare certain fields, while the underlying store has a dynamic schema - but we just return NULLs if MySQL expects a field that's not there, marrying the two data models; MySQL can't see fields it's not been told about, but that's OK. MySQL apps don't expect to see random unheard of fields appearing from nowhere.

Foreign key relationships to make ORM / graph store access work are in the pipeline.

Check out our site at http://www.geniedb.com/ and our tech blog at http://blog.geniedb.com/ for more information!

April 18, 2010 - 12:36

Add new comment

© 1999-2014 Dries Buytaert Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Drupal is a Registered Trademark of Dries Buytaert.