Tuesday, November 29, 2011

Relational DBs vs NoSQL vs OLAP (Hadoop): When to Use What

There are hundreds of database technologies, and more keep appearing every day. Broadly, they can be classified into three categories:

Relational DBs
These are the old-school relational DBs that have served us wonderfully for three decades or more.

Databases in this category include:
- Oracle
- MySQL
- SQL Server
- Informix

These DBs are meant mainly for transactional processing (OLTP). A few years ago, people simply picked one of them according to their needs; with the rise of NoSQL databases, it has become all the more important to know when to use a relational DB and when to use a NoSQL DB. If your business involves transactional processing, such as processing credit card payments or managing highly sensitive customer accounts, a relational DB is still the best way to go. The ACID properties (Atomicity, Consistency, Isolation, Durability) are the main draw here, since they guarantee that transactions are processed reliably.
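To make the "A" in ACID concrete, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for the relational databases listed above. The accounts table, its CHECK constraint, and the transfer() helper are hypothetical. The transfer credits one account and then debits the other inside a single transaction; if the debit would overdraw the source account, the whole transaction rolls back, so the credit that already ran is undone as well.

```python
import sqlite3


def make_db():
    # Hypothetical schema: balances may never go negative.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first on purpose, so a failed debit shows the rollback.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

Calling transfer(conn, "alice", "bob", 30) succeeds and leaves balances of 70 and 80. A follow-up transfer of 500 fails the CHECK constraint, and because both updates belong to one transaction, bob's balance is not left inflated: the balances stay at 70 and 80.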

NoSQL DBs
This is the new kid in town, and it has drawn a lot of attention over the past few years. Some of the names you might have heard in this category:
- Cassandra
- MongoDB
- Apache CouchDB
- SimpleDB (part of Amazon's cloud offering)
- MarkLogic
- Riak

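Most of these DBs are designed with the CAP theorem (Consistency, Availability, Partition tolerance) in mind. The theorem says a distributed system cannot guarantee all three at once: when the network partitions, the system has to choose between staying consistent and staying available, and many NoSQL stores choose availability and settle for eventual consistency. So why would you need a NoSQL DB? It comes in handy when you need to process large volumes of data efficiently and do not need the transactional guarantees of a relational DB. For the most part, data in these systems is stored as key-value pairs.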
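The CAP trade-off is easier to see in a toy model. The sketch below is a hypothetical, heavily simplified Dynamo-style replicated key-value store (the design style that Riak, and Cassandra's replication, draw on), not the API of any product listed above. A write succeeds once at least W of N replicas accept it, and a read asks some replicas and keeps the highest-versioned answer. The class and method names, the single global version counter, and the replica_ids parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)  # key -> (version, value)
    up: bool = True  # False simulates a node cut off by a network partition


class ToyReplicatedKV:
    """Hypothetical Dynamo-style key-value store; not the API of any real product."""

    def __init__(self, n=3, w=2):
        self.replicas = [Replica() for _ in range(n)]
        self.w = w
        self.version = 0  # one global counter stands in for real timestamps/vector clocks

    def put(self, key, value):
        reachable = [rep for rep in self.replicas if rep.up]
        if len(reachable) < self.w:
            # Too few replicas reachable: refuse the write (give up availability).
            raise RuntimeError("write quorum not reachable")
        self.version += 1
        for rep in reachable:  # unreachable replicas simply miss this write
            rep.data[key] = (self.version, value)

    def get(self, key, replica_ids):
        # Ask only the listed, reachable replicas; the highest version wins.
        answers = [
            self.replicas[i].data[key]
            for i in replica_ids
            if self.replicas[i].up and key in self.replicas[i].data
        ]
        return max(answers)[1] if answers else None


kv = ToyReplicatedKV(n=3, w=2)
kv.put("cart:42", "book")         # version 1 lands on all three replicas
kv.replicas[2].up = False         # a network partition cuts off replica 2
kv.put("cart:42", "book,pen")     # still succeeds: 2 of 3 replicas accepted it (W = 2)
kv.replicas[2].up = True          # partition heals; replica 2 is now stale
print(kv.get("cart:42", [2]))     # R = 1: prints "book" (a stale read)
print(kv.get("cart:42", [2, 0]))  # R = 2, so R + W > N: prints "book,pen"
```

The store stayed available during the partition, but a read that consults only one replica can return stale data. Reading from enough replicas that R + W > N guarantees the read overlaps at least one replica holding the latest write. Real systems also repair stale replicas in the background, which this toy leaves out.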
One interesting thing is that each of these DBs has its own name for what a relational DB calls a table.

In Cassandra, the table-equivalent is called a column family. Names are always intriguing. Maybe it is a family of columns, and that is why it is called a column family. In that case, why not a row family? :-)
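The name makes more sense once you look at the shape of the data. In Cassandra's data model, a column family holds rows identified by a row key, and each row is its own sorted collection of columns, which can differ from row to row. The class below is a hypothetical in-memory sketch of that shape, not Cassandra's client API; the users rows and their columns are made-up sample data.

```python
from collections import defaultdict


class ToyColumnFamily:
    """Hypothetical in-memory model of a Cassandra-style column family
    (not Cassandra's API): row key -> {column name -> value}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        # No fixed schema: any row can gain any column at any time.
        self.rows[row_key][column] = value

    def get_row(self, row_key):
        # Return the row's columns sorted by name (Cassandra keeps the columns
        # within a row sorted according to the column family's comparator).
        return dict(sorted(self.rows.get(row_key, {}).items()))


users = ToyColumnFamily()
users.insert("jsmith", "name", "John Smith")
users.insert("jsmith", "email", "jsmith@example.com")
users.insert("mdoe", "name", "Mary Doe")
users.insert("mdoe", "twitter", "@mdoe")

print(users.get_row("jsmith"))  # {'email': 'jsmith@example.com', 'name': 'John Smith'}
print(users.get_row("mdoe"))    # {'name': 'Mary Doe', 'twitter': '@mdoe'}
```

The two rows share only the name column; nothing forces every row to carry the same set of columns, which is one of the main differences from a relational table.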

OLAP (Hadoop)
These days, it is all about Big Data and Hadoop. More on this soon.

Friday, November 18, 2011

Building an API

Why do we need them
Evaluating Technologies
Design
Feedback loop
Versioning
Supporting the customers
Best Practices
Scaling your system
Consumers

Storm: Realtime Data Processing at Twitter

Here is Nathan Marz's blog with all the details: http://nathanmarz.com/