With Hadoop World 2012 in New York City now wrapped up, it is no surprise that there was a flurry of announcements from Cloudera and, of course, exciting news for Apache Hadoop practitioners!
Beyond the move to a new logo and tagline (which is quite nice) and a brand new Cloudera website design (very nice!), Cloudera’s biggest announcement addressed what had traditionally been a weakness of open source Apache Hadoop: its reliance on batch-oriented query processing. Enter Impala, a real-time query engine that lets users query data in HDFS and HBase directly using Hive SQL. And the beauty of it all is that Impala is 100% open source and currently available as a beta release. Expect the production release to land in early 2013.
In another announcement during Hadoop World, Cloudera unveiled a new “Introduction to Data Science” class that, again, focuses on the analytics portion of Big Data. This class is a bit different in that it is mostly hands-on, as is the accompanying Data Scientist certification process. In the two-step certification path, candidates must first pass a written exam before taking on the second part, a performance-based, hands-on evaluation in which they develop a recommendation system. More details on the Data Scientist certification path will be available in early 2013.
As you can see, Cloudera is making a concerted effort to develop a complete Big Data solution stack: the core Apache Hadoop infrastructure at the base, a developer/programming layer for ETL above it, and a final business intelligence and analytics layer that provides actionable information. While Apache Hadoop may be an amazing technology platform, the crucial piece is talented individuals who can use this new Big Data toolset to derive insight and action from archived organizational information.
Businesses buy technology to automate and reduce costs, or to drive competitive advantage and deliver better services and products. Data scientists who can pull together both the technology infrastructure and business knowledge form the final layer needed to really drive value out of a Big Data/Apache Hadoop deployment, and Cloudera's recent announcements show that it knows this. So yes: a new Cloudera logo, new Cloudera tools, and new Cloudera training. It’s a brave new world. Cloudera may now be challenging us to “Ask Bigger Questions,” but thankfully it is also delivering better tools and expertise to find better answers!