Aaronontheweb

Hacking .NET and Startups

Migrating from RavenDB to Cassandra

February 20, 2013 11:30 by Aaronontheweb in RavenDB, Cassandra // Comments (4)

Today on the MarkedUp Analytics Blog I authored a post entitled “Cassandra, Hive, and Hadoop: How We Picked Our Analytics Stack.”

In it I explain MarkedUp’s evaluation process for choosing a new database, how we selected Cassandra, and some benchmarks from our test. If you want to learn more, go read it.

I’ve written about RavenDB in the past and I’ve spoken about it at code camp before. I think it’s a great technology for prototyping, and it has some really interesting concepts not found in any other database that make Raven really simple and elegant to operate.

So I wanted to copy over a section from our blog post about building MarkedUp on Cassandra which explains why we ultimately needed to move away from RavenDB:

Looking back to what went wrong with RavenDB, we determined that it was fundamentally flawed in the following ways:

  • Raven’s indexing system is very expensive on disk, which makes it difficult to scale vertically – even on SSDs Raven’s indexing system would keep indexes stale by as much as three or four days;
  • Raven’s map/reduce system requires re-aggregation once it’s written by our data collection API, which works great at low volumes but scales at an inverted ratio to data growth – the more people using us, the worse the performance gets for everyone;
  • Raven’s sharding system is really more of a hack at the client level which marries your network topology to your data, which is a really bad design choice – it literally appends the ID of your server to all document identifiers;
  • Raven’s sharding system actually makes read performance on indices orders of magnitude worse (has to hit every server in the cluster on every request to an index) and doesn’t alleviate any issues with writing to indexes – no benefit there;
  • Raven’s map/reduce pipeline was too simplistic, which stopped us from being able to do some more in-depth queries that we wanted; and
  • We had to figure out everything related to RavenDB on our own – we even had to write our own backup software and our own index-building tool for RavenDB; there’s very little in the way of a RavenDB ecosystem.
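To make the two sharding complaints above concrete, here is a toy in-memory sketch. It is not RavenDB's client API: `ToyShardedStore`, `Shard`, the `server1/...` ID format, and the round-robin placement are all assumptions made up for illustration. It only shows the shape of the problem: the server name ends up inside the document ID, and a query that cannot route by ID has to ask every shard.

```python
# Toy model only: none of these names come from RavenDB's client API.
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    docs: dict = field(default_factory=dict)


class ToyShardedStore:
    def __init__(self, shard_names):
        self.shards = {name: Shard(name) for name in shard_names}
        self._order = list(shard_names)
        self._next = 0

    def store(self, local_id, doc):
        # Round-robin placement (an assumption). The shard name becomes
        # part of the document ID, tying the ID to the network topology.
        shard = self.shards[self._order[self._next % len(self._order)]]
        self._next += 1
        doc_id = f"{shard.name}/{local_id}"
        shard.docs[doc_id] = doc
        return doc_id

    def load(self, doc_id):
        # A load by ID reads the shard name from the ID prefix,
        # so it touches exactly one shard.
        shard_name = doc_id.split("/", 1)[0]
        return self.shards[shard_name].docs[doc_id], 1

    def query(self, predicate):
        # An index-style query has no ID to route on, so it must visit
        # every shard and merge the results.
        hits = []
        for shard in self.shards.values():
            hits.extend(d for d in shard.docs.values() if predicate(d))
        return hits, len(self.shards)
```

In this sketch, `load` always reports one shard contacted, while `query` reports as many as there are shards, so adding servers makes every index-style read fan out further.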

So based on all of this, we decided that our next database system needed to be capable of:

  1. Integrating with Hadoop and the Hadoop ecosystem, so we could get more powerful map/reduce capabilities;
  2. “Linear” hardware scale – make it easy for us to increase our service’s capacity with better / more hardware;
  3. Aggregate-on-write – eliminate the need to constantly iterate over our data set;
  4. Utilizing higher I/O – it’s difficult to get RavenDB to move any of its I/O to memory, hence why it’s so hard on disk;
  5. Fast setup time – need to be able to move quickly;
  6. Great ecosystem support – we don’t want to be the biggest company using whatever database we pick next.
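The "aggregate-on-write" requirement in the list above can be sketched with a small in-memory comparison. This is an illustration under assumptions, not MarkedUp's code or any database's API: both class names, the `(app_id, day)` key, and the `events_scanned` counter are hypothetical. The first store re-scans raw events on every read; the second bumps a counter when the event is written, so reads never iterate over the data set.

```python
# Illustrative only: hypothetical classes, not a real database client.
from collections import Counter


class ReaggregatingStore:
    """Keeps raw events and recomputes the aggregate on each read."""

    def __init__(self):
        self.events = []
        self.events_scanned = 0  # tracks how much work reads have done

    def record(self, app_id, day):
        self.events.append((app_id, day))

    def daily_count(self, app_id, day):
        # Every read walks the whole event list, so read cost grows
        # with total data volume.
        self.events_scanned += len(self.events)
        return sum(1 for event in self.events if event == (app_id, day))


class AggregateOnWriteStore:
    """Updates the aggregate at write time; reads are a single lookup."""

    def __init__(self):
        self.counts = Counter()

    def record(self, app_id, day):
        # The aggregation happens here, once per event.
        self.counts[(app_id, day)] += 1

    def daily_count(self, app_id, day):
        # Counter returns 0 for keys that were never written.
        return self.counts[(app_id, day)]
```

Both stores return the same counts; the difference is where the work happens. The trade-off in this sketch is that the aggregate-on-write store only answers questions it was set up to count at write time.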

So there you have it. If you want to learn more about our migration to Cassandra, check out our original blog post.

If you enjoyed this post, make sure you subscribe to my RSS feed!



Code Camp Talk: RavenDB vs MongoDB

June 27, 2012 06:19 by Aaronontheweb in MongoDB, RavenDB // Comments (1)

This past weekend at SoCal Code Camp I presented a session along with my friend Nuri Halperin entitled “Battle of the NoSQL Databases: RavenDB vs. MongoDB.”

I represented the RavenDB team, having used it in production now for a couple of months (and ditched Mongo to do it). I’ll blog more about the specifics of RavenDB and what it’s awesome at sometime in the future, but for now I wanted to post my slides here so you can see the bullet-by-bullet comparison between the databases.

We didn’t cover everything, but we did try to capture all of the high-level details:

NoSQL Shootout: RavenDB vs MongoDB
Update: Some errata, pointed out to me courtesy of Itamar Syn-Hershko of the Hibernating Rhinos team: Raven actually uses BSON internally as well, and has no auto-sharding support by design.



About

My name is Aaron, I'm an entrepreneur and a .NET developer who develops web, cloud, and mobile applications.

I left Microsoft recently to start my own company, MarkedUp - we provide analytics for desktop developers, focusing initially on Windows 8 developers.

You can find me on Twitter or on GitHub!
