CouchDB: No Threat to RDBMS
Eventually I plan on getting around to doing some in-depth Erlang programming. Until then, it's nice to read up on current projects like the much-hyped CouchDB. According to the CouchDB website...

CouchDB is:
- A document database server, accessible via a RESTful JSON API.
- Ad-hoc and schema-free with a flat address space.
- Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.
- Query-able and index-able, featuring a table oriented reporting engine that uses Javascript as a query language.
...And although they claim it is not a replacement for RDBMSs, there are people who think it is. I have to say, this project looks very promising. I can envision lots of uses for it: simplifying apps that just want to quickly save some data, apps that store lots of "objects" that all differ slightly, or perhaps aggregating content from multiple sources. I love stuff like this that gets you to think about something in a whole new light.
But here's the problem. When a nifty project like this is over-sold and pushed beyond what it was designed for, I can't help but put on my critical hat. The problem is fueled by the fact that there are lots of Erlang programmers out there who are itching to see Erlang take off, so they'll gladly prop up an Erlang project with inflated claims. Let's take a look at some of the reasons why CouchDB will never be a threat to RDBMSs and some of the fallacies being spread.
No fixed columns = less ability to optimize
It's been a while since my database theory class in grad school, but I seem to recall discussion of the fixed structure of database columns as a key ingredient in achieving high performance. Knowing exactly how many bytes each record takes up in a binary file lets the engine jump straight to a record's position, so the data can be read much faster. And extracting the necessary record(s) from the file is where most of the time is spent for a given query. Because CouchDB takes a "no fixed structure" approach, it seems it will lose to an RDBMS in disk access, which is the most time-consuming operation.
Bidirectional replication and peer setup = extra overhead or transactional issues
Replication is not my area of expertise, but I do know how to count and how to think, so let's do some basic math and logic. For simplicity, let's pretend we have 4 beefy db servers. In configuration A, we have 1 master server and 3 slave servers. Whenever there is an update it takes place on the master server and the changes are sent to the 3 slave servers simultaneously. Thus, 1 update = 3 messages.
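The count for configuration A can be sketched as a toy in-process model. The `Replica` and `Master` classes below are hypothetical names made up for illustration, not anything from CouchDB or a real RDBMS, and a plain method call stands in for a network message.

```python
class Replica:
    """A toy database node that just stores key/value pairs in memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Master(Replica):
    """Hypothetical master node that pushes every update to its slaves."""

    def __init__(self, name, slaves):
        super().__init__(name)
        self.slaves = list(slaves)
        self.messages_sent = 0

    def update(self, key, value):
        # Apply the change locally, then send one message per slave.
        self.apply(key, value)
        for slave in self.slaves:
            slave.apply(key, value)  # stands in for one network message
            self.messages_sent += 1


slaves = [Replica(f"slave{i}") for i in range(1, 4)]
master = Master("master", slaves)
master.update("balance", 100)
print(master.messages_sent)  # 3: one message per slave
```

The model ignores acknowledgements, retries, and latency. It only shows that the message count grows with the number of slaves, not with the number of hops.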
In configuration B, we have 4 server nodes connected in a peer-to-peer fashion. If the machines are each connected to no more than 2 other machines, then messages will have to travel along multiple hops, which would increase the number of messages and the time it takes for the message to be broadcast to all peers. So, instead, let's assume that each peer is connected to every other peer. Now, 1 update = 3 messages, but a different problem has cropped up. Any peer wishing to update a record will have to check with every other peer for changes before it can safely do so. Or, peers will have to be assigned as authoritative for subsets of the global data. I'm not saying the P2P way of doing things is bad. In fact, I think it allows the system to scale rather easily. But we have to acknowledge the performance and transaction issues that come with this approach.
No tables = less structure = worse performance
Every means of organizing data also leads to performance optimizations. Storing data in tables is an additional means of classifying the data, which allows for faster extraction later. CouchDB's table-less approach will be great for allowing people to easily store and retrieve things because they don't need to think about tables, but there will be a performance hit.
Transactional weakness = no enterprise use
Ooops. I said the 'e' word. I know, I know. I don't like it when people throw around the word "enterprise" in a demeaning way, like "Your Ruby is not as enterprise-ish as my Java or .Net or C++". But sometimes, when the shoe fits... Personally, I would like to know that Fidelity Investments is safeguarding the integrity of its data and every transaction that takes place so that my account balance is accurate. Here's my favorite scary quote on this issue:
"Features like referential integrity, constraints and atomic updates are really important in the client-server world, but irrelevant in a world of services."
You're kidding, right? So, in a world of services, you don't need transactional support across a number of operations? And referential integrity is for the birds? That's fine if CouchDB wants to skimp on transactional support and use optimistic locking to attempt to boost performance, but you can't have your cake and eat it too. That may work for some apps, but it's not going to fly for ones in which the data and the operations on that data are critical to the business and stakeholders.
C is faster than Erlang
I don't program in C because it takes too long and is way too tedious. But you better believe the workhorse of my web apps (usually a database server) is going to be programmed in C. As mentioned previously, CouchDB is programmed in Erlang while all the major database players use C. I don't think this point needs to be discussed further.
I know this post is coming across a little hard on CouchDB, but it seems there needs to be some balance in the discussion. This seems like a great project, but let's acknowledge that it has some strengths and some weaknesses. It's a great concept, but as the documentation states, it's not a replacement for relational databases.
About this entry
You’re currently reading “CouchDB: No Threat to RDBMS,” an entry on VotanWeb
- Published:
- September 7th 09:40 AM
- Updated:
- September 8th 09:20 AM
- Sections:
- Erlang

