DataStax Interview: Using Cassandra At SkillPages

Editor’s note:  This interview was first featured on the DataStax blog.  They chatted to George Ciubotaru, a software engineer and database administrator here at SkillPages, to hear more about how we use Apache Cassandra and DataStax Enterprise as part of the engineering behind SkillPages.  

DataStax: Thanks for taking the time to chat with us today, George.  Give us an idea of what you guys do at SkillPages.

George:  SkillPages is a social utility that connects people with skills to the people who need them.  We cover all skills, from software engineers to plumbers, babysitters to entertainers, and everything in between.  SkillPages now has over 7 million members from over 160 different countries, and we’re currently growing at a rate of one new member every 3 seconds!  Best of all, registering your skills and finding skilled people is completely free.

DataStax: What type of technical infrastructure do you use to support your platform?

George: We began by using all Microsoft technologies, and still use Microsoft in a number of areas.  For example, our main development language is C#.  However, we’ve been pushing more into Linux with our usage of Cassandra, Hadoop, and Solr.

DataStax: Do you run your systems on premises or in the cloud?

George: Our production instances run in the cloud; we use Amazon for that.  We do our development locally, however.

DataStax: So how did you come to start using Cassandra?

George: We’ve been a customer of DataStax for a while now and use both Cassandra and OpsCenter.  We started out using Microsoft SQL Server for our platform, but for our particular use case we needed a different approach to scale-out and I/O, which is why we arrived in the NoSQL world.

Our main use case for Cassandra is a graph; a social graph that we built.  While there are other NoSQL databases that specialize in graphs, we decided to use Cassandra instead.
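
Editor’s note:  The interview does not describe the actual schema.  As a rough illustration only, one common way to hold a graph in a wide-row store is an adjacency list, where each member's connections live together under that member's key.  The sketch below is an in-memory stand-in, not Cassandra code, and every name in it (AdjacencyStore, add_edge, the member names) is hypothetical.

```python
from collections import defaultdict


class AdjacencyStore:
    """In-memory stand-in for a wide-row table (hypothetical, not SkillPages' schema).

    One "partition" per member; each entry in the partition is one edge.
    """

    def __init__(self):
        # member id -> {neighbour id: relationship kind}
        self._partitions = defaultdict(dict)

    def add_edge(self, src, dst, kind):
        # Store the edge in both partitions so that either member's
        # connections can be read from a single partition.
        self._partitions[src][dst] = kind
        self._partitions[dst][src] = kind

    def neighbours(self, member):
        # A single-partition read: everything directly connected to `member`.
        return sorted(self._partitions.get(member, {}))

    def friends_of_friends(self, member):
        # Two hops: one read for the member, then one read per neighbour.
        direct = set(self.neighbours(member))
        second = set()
        for n in direct:
            second.update(self.neighbours(n))
        return sorted(second - direct - {member})


store = AdjacencyStore()
store.add_edge("alice", "bob", "knows")
store.add_edge("bob", "carol", "knows")
print(store.neighbours("bob"))
print(store.friends_of_friends("alice"))
```

The point of the layout is that a one-hop lookup touches a single key, which is the kind of access a horizontally partitioned store handles well; deeper traversals cost one extra read per neighbour.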

DataStax: Why?

George: NoSQL databases like Neo4j only scale vertically, and that limitation was too much for us.  We felt it wouldn’t scale the way we needed.

We also looked into HBase, but in the end we chose Cassandra for its scalability and because it is much easier to deploy than something like HBase.  Cassandra was very easy to install, set up, and test against.

That ease of deployment along with continuous availability and scalability were the primary reasons we chose Cassandra. Other reasons were product maturity, the strong Cassandra community, and the support we get from DataStax.

DataStax: How have you deployed Cassandra today?

George: Right now we have clusters that span two different data centers.  One is a production cluster of 12 nodes, and a second cluster of 6 nodes is about to go into production; it is based on bigger boxes with SSDs, more RAM, etc., on Amazon.  In terms of data volume, we keep around 3 TB on the clusters.

We also distribute data among 3 Amazon availability zones.  We’ve already experienced one availability zone going down and we had no outage problems at all where Cassandra and our application were concerned.
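
Editor’s note:  The interview does not state the replication factor or consistency level in use.  As a back-of-the-envelope illustration, assume a replication factor of 3 with one replica per availability zone and quorum reads and writes, where a quorum is floor(RF / 2) + 1 replicas.  The zone names and helper functions below are hypothetical, and the round-robin placement is a simplification of what a rack-aware placement strategy does.

```python
def quorum(replication_factor):
    # Quorum size: a majority of the replicas.
    return replication_factor // 2 + 1


def place_replicas(zones, replication_factor):
    # Simplified placement: spread replicas round-robin across the zones.
    return [zones[i % len(zones)] for i in range(replication_factor)]


def survives_zone_loss(zones, replication_factor, lost_zone):
    # True if a quorum of replicas is still reachable after one zone fails.
    replicas = place_replicas(zones, replication_factor)
    alive = [zone for zone in replicas if zone != lost_zone]
    return len(alive) >= quorum(replication_factor)


three_zones = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names
two_zones = ["zone-a", "zone-b"]

print(quorum(3))
print(survives_zone_loss(three_zones, 3, "zone-a"))
print(survives_zone_loss(two_zones, 3, "zone-a"))
```

Under these assumptions, losing one of three zones leaves 2 of 3 replicas, which still meets the quorum of 2.  With only two zones, one zone holds 2 of the 3 replicas, so losing that zone drops below quorum.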

DataStax: What advice would you give someone new to Cassandra?

George: I’ve found Cassandra very easy to explain, both in terms of the data model and how it operates. It’s very easy to learn in that respect.

The one thing someone should watch for if they’re coming from other databases like Microsoft SQL Server is that the management tools and utilities are not as advanced right now as they are in the RDBMS world. But, with each new release, tools like OpsCenter are getting there, which is nice to see.

DataStax: George, thanks for your time today.

George: Sure thing.