Category Archives: SkillPages Engineering

DataStax Interview: Using Cassandra At SkillPages

Editor’s note:  This interview was first featured on the DataStax blog.  They chatted to George Ciubotaru, a software engineer and database administrator here at SkillPages, to hear more about how we use Apache Cassandra and DataStax Enterprise as part of the engineering behind SkillPages.  

DataStax: Thanks for taking the time to chat with us today, George.  Give us an idea of what you guys do at SkillPages.

George:  SkillPages is a social utility that connects people with skills to the people who need them.  We cover all skills – from software engineering to plumbers, babysitters to entertainers and everything in between.  SkillPages now has over 7 million members from over 160 different countries and we’re currently growing at a rate of one new member every 3 seconds!  Best of all, registering your skills and finding skilled people is completely free.

DataStax: What type of technical infrastructure do you use to support your platform?

George: We began by using all Microsoft technologies, and still use Microsoft in a number of areas.  For example, our main development language is C#.  However, we’ve been pushing more into Linux with our usage of Cassandra, Hadoop, and Solr.

DataStax: Do you run your systems on premise or in the cloud?

George: Our production instances run in the cloud; we use Amazon for that.  We do our development locally, however.

DataStax: So how did you come to start using Cassandra?

George: We’ve been a customer of DataStax for a while now and use both Cassandra and OpsCenter.  We started out using Microsoft SQL Server for our platform, but for our particular use case we needed a different approach to scale-out and I/O, which is why we arrived in the NoSQL world.

Our main use case for Cassandra is a graph; a social graph that we built.  While there are other NoSQL databases that specialize in graphs, we decided to use Cassandra instead.

DataStax: Why?

George: NoSQL databases like Neo4j only scale vertically and that limitation was too much for us.  We felt it wouldn’t scale the way we needed.

We also looked into HBase, but in the end we chose Cassandra for its scalability and much easier method of deployment over something like HBase.  Cassandra was very easy to install, set up, and test against.

That ease of deployment along with continuous availability and scalability were the primary reasons we chose Cassandra. Other reasons were product maturity, the strong Cassandra community, and the support we get from DataStax.

DataStax: How have you deployed Cassandra today?

George: Right now we have clusters that span two different data centers.  One is a production cluster of 12 nodes, and a second cluster of 6 nodes is about to go into production; it is based on bigger boxes with SSDs, more RAM, etc., on Amazon.  In terms of data volume, we keep around 3 TB on the clusters.

We also distribute data among 3 Amazon availability zones.  We’ve already experienced one availability zone going down and we had no outage problems at all where Cassandra and our application were concerned.

DataStax: What advice would you give someone new to Cassandra?

George: I’ve found Cassandra very easy to explain, both in terms of the data model and how it operates. It’s very easy to learn in that respect.

The one thing someone should watch for if they’re coming from other databases like Microsoft SQL Server is that the management tools and utilities are not as advanced right now as they are in the RDBMS world. But, with each new release, tools like OpsCenter are getting there, which is nice to see.

DataStax: George, thanks for your time today.

George: Sure thing.

Presenting at Amazon’s Lean Cloud Event

SkillPages Team

I was recently invited by Amazon to present at their Lean Cloud Event in London, to share some insights and experiences around our engineering journey so far.  Contributing back to the community is something the whole engineering team here is passionate about so we were delighted to be part of the event.

The whole event was very enjoyable and informative.  I was blown away by the amount of energy that was evident.  It was inspiring to chat to so many people who are building really exciting start-ups and who were interested in our technology, methodologies and experiences.

If you’re curious about our AWS experience you can view the slides on SlideShare.  Based on some great follow up questions, I figured I’d take this opportunity to expand upon some aspects of our business that folks found interesting.

The Technology behind SkillPages

We get asked quite often about the technology behind SkillPages – we are a cloud-based platform, built on AWS, and we run a variety of best-of-breed software on top of this.  Essentially, we prefer to focus on building a really great platform that our members find useful – the infrastructure we leave to those that do it best.  We like to consider ourselves technology agnostic and prefer to use the best technology to solve the problem.  Our architecture is such that any of these elements can be easily replaced as technology advances and better solutions are found.

At the moment we are doing a lot of work with provisioned IOPS and High IOPS SSD-backed instances, which is showing great promise on our Cassandra clusters.  Interestingly enough, Cassandra is not our sole NoSQL element; we are also leveraging DynamoDB to handle elements such as atomic counters.  The key point is that AWS is a very cost-effective platform if properly managed, and there are a whole load of tools available to help with this, for example Bill Tagging.

Measure Everything

At SkillPages we are relentless about leveraging our data to improve how we serve our members.  Our motto is ‘measure everything’ – anything else is just opinion.  We have a fantastic infrastructure in place right now, using EMR to crunch a vast amount of data and pass it on to our BI toolsets for analysis and visualization.  Although it looks simple enough on a slide, it did take a few iterations to get this just right!

In the presentation I reference ‘Boyd’s Law of Iteration’.  For those not familiar with the history, I recommend reading a bit more about the background to Boyd’s theory – it’s a fascinating story based around two military aircraft.  Boyd gives rise to the principle ‘Speed of iteration always beats quality of iteration’.  Some may consider this controversial, but a better way to think about it when growing your business is ‘Where you are today does not matter so much, compared to where you are going tomorrow’.  Based on this we have found that iterating fast and measuring the data is something that works really well for us.

In terms of iterating, we always make provision for paying down technical debt accrued along the way.  We, along with every other company (including yours!), incur technical debt – it is inevitable.  Anybody that says otherwise is either in denial (!) or not innovating fast enough.

Our Greatest Asset

There were a few questions around our team, specifically ‘What size is the team?’  Our engineering team is 19 strong – a very talented bunch of people.  Everybody codes, tests and deploys.  This led into some really interesting questions around ‘How do we hire?’ and ‘Where do we find people with AWS experience?’  In truth, we don’t necessarily; the main criterion is that we hire smart people who know how to solve problems and get things done.  For us, our team is our greatest asset.

I hope you found this useful.  If you have any specific questions, don’t hesitate to get in contact.

Mike McCarthy, SkillPages CTO

Innovating In The Cloud With AWS

Delivering a product that is used by millions of people on a global scale takes quite a lot of effort and resources.  At SkillPages, we decided early on that we wanted to put our efforts into creating great software that delivers value to our members and not get overly stretched with building out and managing platform infrastructure.  Of course, the underlying platform has to be scalable and flexible enough to accommodate our requirements as we learn and evolve.  This is principally why we chose to base the SkillPages platform in the cloud, on AWS.

From the success we have had so far with AWS, Amazon have invited me to share our experiences at their upcoming London event.  The event is an introduction into what AWS can offer your business.  If you’re involved in a start-up looking to scale out your venture I highly recommend attendance.

What Will You Learn?

  • Learn about how AWS helps you get to market faster and also stay agile
  • Discover how to implement AWS to power a lean lifecycle when running, scaling, and iterating your product
  • Learn from existing AWS customers about their experiences with Amazon Web Services
  • Network with AWS technical representatives and executives and local technology leaders

Who Should Attend?

  • Founders, co-founders, technical and business-minded entrepreneurs
  • Startups building scalable, lean applications

When?

September 26th 13:00-18:00

Check out the AWS events page for registration and full details.  See you there!

We’re Hiring – Front End Developer Wanted

SkillPages is looking for a Front End Developer for its Dublin office.

Skills & Requirements

We’re looking for a smart cookie who has done (or can do!) the following:

  • Accomplished UI Developer with demonstrable expertise and experience
  • Skilled in developing to the highest quality standards – cleverly and pragmatically
  • Have a hands on approach to all aspects of GUI development
  • Be experienced in developing large scale, high-performance internationalised web applications using .Net technologies

Tech stuff you should be comfortable with:

  • 2+ years solid experience in .Net development (C#, ASP.NET, MVC, REST)
  • Extensive experience in web-based front end development (JavaScript, AJAX, jQuery, HTML, and CSS)
  • Expertise in addressing cross-browser compatibility challenges
  • Strong debugging and troubleshooting skills
  • Solid understanding of supporting technologies e.g. webserver, DNS, SSL, Cloud Infrastructure etc.

Nice to haves:

  • Familiarity with HTML5 and CSS3
  • Experience of developing mobile web sites and mobile web applications
  • Experience of frameworks such as jQuery, NodeJS, Ext JS etc.
  • Understanding of deep Facebook integration and application development
  • Prior experience of working in a high growth environment
  • A Bachelor’s Degree or higher in related field

For an insight into working at SkillPages check out our Inside SkillPages blog.

Interested? Apply Now

SkillGraph – Self-Learning Technology

Editor’s note: This is an update from the SkillPages Engineering Team to the development of SkillGraph.  See our previous posts for background information on SkillGraph.

Learning is a continuous human process and we apply the same principle to SkillGraph.  As more skills and opportunities are added, and more people interact with these entries on the platform, the graph absorbs this into its knowledge base.  The nodes in the graph and the edges that connect them continuously evolve and grow.  The rules that govern its structure and its use, as well as the semantic understanding behind them, are in constant flux.

Training Methods

We mix traditional knowledge based approaches with sophisticated “training” methods.  User feedback is an important part of this process.  If a member enters ambiguous wording for a skill, they are invited to choose their classification from a list of intelligent suggestions.  For example we might get a skill added for a ‘Tester’ but we don’t know exactly what area to classify them in if we don’t know what they test.  However, in this scenario the SkillGraph can evolve as more and more members, often sharing similar backgrounds, choose to categorize themselves as a ‘Software Test Engineer’. Going forward, the graph can make more confident assertions about subsequent users using the same piece of vocabulary to describe their skill.
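One way to picture this feedback loop is a per-term tally of the classifications members choose, where a suggestion is only asserted automatically once one choice clearly dominates.  The following is a minimal sketch under stated assumptions, not the SkillGraph implementation: the class name `FeedbackTally` and the thresholds `min_votes` and `min_share` are hypothetical.

```python
from collections import Counter, defaultdict


class FeedbackTally:
    """Hypothetical tally of the classifications members pick for a term."""

    def __init__(self, min_votes=5, min_share=0.7):
        # Both thresholds are illustrative assumptions.
        self.min_votes = min_votes
        self.min_share = min_share
        self.votes = defaultdict(Counter)

    def record(self, term, chosen_class):
        """Store one member's choice for an ambiguous term."""
        self.votes[term.lower()][chosen_class] += 1

    def confident_class(self, term):
        """Return the dominant class, or None while the term is still ambiguous."""
        counts = self.votes[term.lower()]
        total = sum(counts.values())
        if total == 0 or total < self.min_votes:
            return None
        best, best_count = counts.most_common(1)[0]
        return best if best_count / total >= self.min_share else None
```

With a tally like this, ‘Tester’ stays ambiguous until enough members have picked ‘Software Test Engineer’ for it to pass both thresholds.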

User rejections are just as helpful for the SkillGraph to self-learn.  Given rules are often adapted or even removed when a significant number of members are recorded as rejecting the categorizations associated with a particular rule in favour of others.  Take for example the word ‘publication’.  Based on a set of rules around stemming (a programmatic process for reducing words to common base forms), our algorithm understood this to have the same meaning as the word ‘public’.  However, as members with skill data containing ‘publication’ continuously favoured results from the field of publishing over, say, the public sector or public relations, the SkillGraph learnt that ‘publication’ had little to do with ‘public’ and more to do with ‘publish’, and so adjusted its rules accordingly.
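The ‘publication’ example can be reproduced with a toy stemmer.  This is only a sketch: the suffix list, the `StemRules` class and the `min_rejections` threshold are assumptions made for illustration, and real stemmers use far richer rules.

```python
from collections import Counter

# Toy suffix list, checked in order; an assumption for illustration only.
SUFFIXES = ("ation", "ing", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class StemRules:
    """Hypothetical rule set that overrides a stem after repeated rejections."""

    def __init__(self, min_rejections=3):
        self.min_rejections = min_rejections
        self.preferences = Counter()  # (word, preferred stem) -> count
        self.overrides = {}           # word -> learnt stem

    def stem(self, word):
        word = word.lower()
        return self.overrides.get(word, naive_stem(word))

    def record_preference(self, word, preferred_stem):
        """A member favoured results filed under a different stem."""
        word = word.lower()
        if preferred_stem == self.stem(word):
            return  # the current rule already agrees with the member
        self.preferences[(word, preferred_stem)] += 1
        if self.preferences[(word, preferred_stem)] >= self.min_rejections:
            self.overrides[word] = preferred_stem
```

Here `naive_stem("publication")` gives `public`, the mistake described above, and the override to `publish` only takes effect once enough members have preferred publishing results.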

All user selections and rejections are recorded as this is valuable data to feed back to further inform and develop the SkillGraph.  As a result, the graph is in a continual state of improvement.  With the variety and volume of new skills growing every day, as well as the interactions around them, the SkillGraph becomes more dense and rich, resulting in more accurate and relevant classifications.

Context Is Everything With Skill Search

Editor’s note: This is an update from the SkillPages Engineering Team to the development of SkillGraph.  See our previous post for background information on SkillGraph.

Language And Location Context

We are constantly improving and evolving SkillGraph to better match the most relevant skilled people to your need.  One issue we face is that natural languages are ambiguous and imprecise.  As inconvenient as this is for computers, it is a fact of life and must be managed.  The same words can have different meanings in different countries – an American user entering “football player” means something different from a British user entering the same title.  Or a user in Mumbai using the acronym “TC” (meaning Ticket Checker) describes their job succinctly within an Indian context; however, this acronym carries no relevance in other English-speaking countries.  Just like a brain uses context to disambiguate the meaning of a word, our classification engine capitalizes on the contextual information that a user provides.
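In its simplest form, location context can be pictured as a lookup keyed on both the term and the member’s country.  The table below is a hypothetical sketch: the classification labels and country codes are illustrative assumptions, not SkillGraph data.

```python
# Hypothetical sense table: (term, ISO country code) -> classification.
SENSES = {
    ("football player", "US"): "American Football Player",
    ("football player", "GB"): "Association Football Player",
    ("tc", "IN"): "Ticket Checker",
}


def classify(term, country):
    """Return the sense of this term in this country, or None if unknown."""
    return SENSES.get((term.strip().lower(), country.upper()))
```

A term with no entry for the member’s country, such as “TC” outside India, simply returns `None` and would need another signal to be classified.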

Social Context

In addition to typical methods of providing context, like language or location, we can also leverage the social underpinning of our platform.  Take a user who describes themselves as a “Coach”.  SkillGraph recognizes this as ambiguous, so then examines the skills/interests of the user’s social connections.  Combining this knowledge with language and location, we can make a more confident classification as to whether the user is a Sports Coach, Corporate Trainer or Life Advisor.
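The “Coach” example can be sketched as a vote among the skills of a member’s connections.  Everything here is an assumption for illustration: the candidate senses come from the text, but the indicative skills and the scoring are hypothetical, and the sketch ignores the language and location signals that would be combined with it.

```python
from collections import Counter

# Hypothetical candidate senses for "Coach", each with indicative skills.
CANDIDATES = {
    "Sports Coach": {"football", "fitness", "athletics"},
    "Corporate Trainer": {"management", "hr", "presentations"},
    "Life Advisor": {"counselling", "wellbeing", "mentoring"},
}


def disambiguate(connection_skills, candidates=CANDIDATES):
    """Pick the sense whose indicative skills appear most among connections."""
    seen = Counter(
        skill.lower() for skills in connection_skills for skill in skills
    )
    scores = {
        sense: sum(seen[hint] for hint in hints)
        for sense, hints in candidates.items()
    }
    best = max(scores, key=scores.get)
    # No supporting evidence at all means the term stays ambiguous.
    return best if scores[best] > 0 else None
```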

This is what helps ensure that our members get the most relevant opportunities for their skills.  Another step towards creating the perfect relevance engine, but still plenty more development yet to come.

SkillPages Partnership With Clarity Research Centre

If you’ve read the previous post from our CTO Mike you’ll know that we worked together with the Clarity Research Centre at University College Dublin to create The SkillGraph.  SkillGraph is at the very core of the SkillPages platform, underpinning everything we do.  Essentially it’s a market matching skill relevance engine, connecting people looking for skills to people who have the relevant skills in a more accurate way than was previously ever possible.

Check out the video below for more about SkillGraph and working with UCD Clarity.  Plus, you’ll also get a sneak peek into the SkillPages HQ and see the Engineering team hard at work.

SkillPages Engineering Introduce “The SkillGraph”

Editor’s note:  This is an update from the SkillPages Engineering Team.  Just to warn you, it’s a pretty detailed technical insight into the technology we’re developing at SkillPages!

When you look for somebody to fix your sink you probably figure you need a plumber.  A handyman might do it, or maybe a janitor, even a drain cleaner could do the job.  Your brain does something amazing here: in split seconds it generates a cloud of many different skill names and professions, guiding you to the best match for the task.  Knowing what you mean and having an idea of how various skills are related is what makes the skill ‘search & matching engine’ in your head extremely powerful.

Moving Beyond Simplistic Mechanical Matching

People’s skills and expertise often do not fall into rigid categories.  People use different terms to describe their skills and these skill terms mean different things to different people (think painter of paintings and painter of houses).  This causes a problem.  Traditional online search uses keyword matching to connect a search term to search results.  But this doesn’t work effectively with skill search (a search for a “carpenter” will never return a “woodworker”).  To create a truly powerful, effective and useful online skill search utility, we need to move far beyond simplistic mechanical matching of keywords.  To capture all the variations and subtleties of skill search, we need something very different.  That is why the SkillPages Engineering team and UCD Clarity Research Centre have developed a unique technology called SkillGraph.

SkillGraph is at the very core of the SkillPages platform, underpinning everything we do.  Essentially it’s a market matching skill relevance engine that connects people looking for skills to people who have the relevant skills in a more accurate way than was previously ever possible.

How SkillGraph Works

Rather than rely on keywords, SkillGraph captures the meaning of skill terms used by members on SkillPages.  For example, when an opportunity is created by a user on SkillPages, it is matched to people with relevant skills through SkillGraph.  This sees the terminology used in the opportunity “inflated” into a cloud of relevant terms.  This cloud is then instantaneously matched against equivalent clouds of words pertaining to each and every skill that members have added.  A relevancy threshold is used to identify only the most relevant skills, and the members who hold those skills are then sent a notification about the opportunity.  Put simply, this has the effect of connecting someone who is looking for a “pencil drawing artist” to an “illustrator”, a “roofer” to a “builder” and so on.
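The inflate-and-match idea can be sketched with weighted term clouds compared against a threshold.  This is not the SkillGraph algorithm: the related-term table and its weights are invented for illustration, and cosine similarity is used here only as one plausible way to compare two clouds.

```python
import math

# Hypothetical related-term table with weights in (0, 1].
RELATED = {
    "pencil drawing artist": {"drawing": 0.9, "sketching": 0.8, "illustration": 0.7},
    "illustrator": {"illustration": 0.9, "drawing": 0.8, "graphic design": 0.5},
    "roofer": {"roofing": 0.9, "construction": 0.6},
    "builder": {"construction": 0.9, "roofing": 0.5, "bricklaying": 0.7},
    "plumber": {"plumbing": 0.9, "pipe fitting": 0.8},
}


def inflate(term):
    """Expand a term into a weighted cloud that includes the term itself."""
    cloud = {term: 1.0}
    cloud.update(RELATED.get(term, {}))
    return cloud


def similarity(a, b):
    """Cosine similarity between two weighted term clouds."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match(opportunity, skills, threshold=0.3):
    """Return the skills whose clouds clear the threshold, best match first."""
    query = inflate(opportunity)
    scored = [(similarity(query, inflate(skill)), skill) for skill in skills]
    return [skill for score, skill in sorted(scored, reverse=True) if score >= threshold]
```

With this toy data, an opportunity for a “pencil drawing artist” matches “illustrator” even though the two share no keyword, because their clouds overlap on “drawing” and “illustration”.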

The SkillGraph algorithms evolve continuously, based on research and ongoing behavioral and data analysis of our rapidly growing user base.  The underlying data sets are based not just on public data sources such as Freebase and Wikipedia, but also on the real-world data of our members – the people who know most about their own skills.  Using techniques such as Map-Reduce, we are able to mathematically decompose and model the probability of relationships and relevance of domain-specific vocabularies and indeed cultures.
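As a rough illustration of the Map-Reduce style, the sketch below counts how often skills appear together on member profiles and turns the counts into a conditional probability.  It runs in a single process and is an assumption about the general shape of such a job, not the SkillPages pipeline; the function names and the sample profiles are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_profile(skills):
    """Map step: emit a count for each skill and each skill pair on a profile."""
    unique = sorted({skill.lower() for skill in skills})
    for skill in unique:
        yield (skill, skill), 1  # how many profiles list this skill
    for a, b in combinations(unique, 2):
        yield (a, b), 1          # how many profiles list both skills


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def conditional_probability(totals, given, other):
    """Estimate P(other | given) from the reduced counts."""
    pair = tuple(sorted((given, other)))
    base = totals.get((given, given), 0)
    return totals.get(pair, 0) / base if base else 0.0
```

A usage example with made-up profiles:

```python
profiles = [["Carpenter", "Woodworker"], ["Carpenter", "Joiner"], ["Plumber"]]
totals = reduce_counts(kv for p in profiles for kv in map_profile(p))
print(conditional_probability(totals, "carpenter", "woodworker"))
```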

Having access to this incredible data is of limited value if it cannot be quickly channeled back into the platform for the benefit of our users.  Consequently, the entire SkillPages platform is designed to natively leverage SkillGraph to ensure that both individual and wide scale reclassification actions are instantly applied whilst maintaining the integrity and quality of the repository via feedback loops, moderation and spam detection.

Relevance is key – everything a user does on SkillPages, from adding a skill, opportunity or profile to searching, browsing and opportunity matching, is driven by SkillGraph.

Finding The Best Matches

SkillGraph is rapidly developing as the definitive catalogue of all the world’s skills.  The more data it gets, the longer its tail stretches.  However, to continue to gratify our users, we must ensure that even those at the tip of this tail are found.  Take for example an opportunity looking for someone “to make an interesting piece of furniture as a centerpiece for a back garden”.  This is where the power of SkillGraph really kicks in.  Using the initial vocabulary as a starting point, SkillGraph can find many different skills to which this relates.  It not only makes obvious suggestions like “Furniture Maker” or “Landscape Gardener”, but also less obvious ones like “Stonemason” and even highly obscure ones like “Pooktre Tree Shaper”.

Building a system that can organize all human skills, capture their relations and make them searchable in a meaningful way is not trivial.  In addition to software engineering challenges like scalability and keeping latency low, it requires a good grasp of disciplines like statistical analysis and linguistics.  However, it’s very rewarding to see members gain success through SkillPages and our SkillGraph – it makes all the hard work worthwhile.