Hadoop – the early years
The origins of Apache™ Hadoop® go back as far as 2003, with the emergence of a new file system, followed by the introduction of MapReduce and the birth of Hadoop in 2006. It gained fame as the fastest system to sort a terabyte of data, and when it became an Apache open source project (Apache Hadoop) it sent a signal that it was ready for prime time. The world never looked back. Within IT shops and even boardrooms there was huge interest and excitement, even hype, with suggestions that it might replace the enterprise warehouse.
A quick Hadoop refresh
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
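To make the "simple programming models" idea concrete, here is a minimal sketch of the MapReduce pattern as a word count. It is an illustration only: it runs in a single Python process and does not use the Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this example. On a real cluster the framework would run the map and reduce steps on many machines and handle the grouping between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (done by the framework on a real cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values seen for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence, all in one local process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big insights"]))
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase only on its own key, the work can be split across machines and a failed piece can simply be run again elsewhere. That independence is what allows failures to be handled in software rather than in hardware.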
Maturity, BigInsights platforms and the move to cloud
All technologies move through a maturity cycle. Hadoop is no exception and is maturing fast. IBM® saw the opportunity to build enterprise-ready, mission-critical Hadoop-based solutions and delivered its BigInsights™ portfolio, which adds significant value to the core Apache Hadoop open source software. IBM helped lead and drive the Open Data Platform initiative (ODPi) [see ODPi.org] to encourage interoperability across different Hadoop vendors.
IBM BigInsights is built around a no-charge, open source based core called the IBM Open Platform with Apache Hadoop (IOP), which includes Apache Spark™. It brings a rich set of capabilities, from advanced, high-performance analytics such as BigSQL to visualization through BigSheets, all neatly brought together to meet the needs of different personas. IBM BigInsights is cited as a leader in The Forrester Wave™: Big Data Hadoop Distributions, Q1 2016, available from Forrester.
Usage of Hadoop varies widely across customers: some run clusters of several thousand nodes, others just 5 or 10. Having hundreds or thousands of nodes might be off-putting for some customers in terms of capital and management costs. With BigInsights established as a leader and IBM focused on a Cloud First strategy, we saw the opportunity to help customers reduce these capital and management costs so they can focus on running analytics for business advantage. We do this by providing BigInsights on a dynamic, elastic, scale-out infrastructure in the cloud, through IBM SoftLayer and Bluemix technologies, from any of our many data centers around the world.
The following report cites IBM as a leader: “The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016”, which states:
“IBM differentiates BigInsights with end-to-end advanced analytics. IBM BigInsights runs atop IBM’s SoftLayer cloud infrastructure and can be deployed on any of 17 global data centers. IBM’s client relationships require it to be flexible in how it offers Hadoop in the cloud and offer highly customized configurations. IBM is making significant investments in Spark, offering data science notebooks that run with the platform. Enterprises using IBM’s data management stack will find BigInsights a natural extension to their existing data platform. The company has also launched an ambitious open source project, Apache SystemML, from its newly minted Spark Technology Center. IBM’s customers value the maturity and depth of its Hadoop extensions, such as BigSQL, which is one of the fastest and most SQL-compliant of all the SQL-for-Hadoop engines. In addition, BigQuality, BigIntegrate, and IBM InfoSphere Big Match provide a mature and feature-rich set of tools that run natively with YARN to handle the toughest Hadoop use cases.”
The report shows IBM scored among the highest in the solution configuration, data security, data management, development, cloud platform integration, ability to execute, road map, professional services, fixes and partnerships criteria.
To conclude, there has never been a better time to invest in your BigInsights projects, whether on premises or in the cloud. The IBM Cloud First strategy is helping customers better manage their costs and focus on delivering business value and insight. IBM can help abstract the complexities of managing infrastructure in a high-performing, highly available, security-rich and elastic scale-out environment across 17 worldwide multi-tenant data centers. IBM BigInsights, combined with our focus on making data easy and our leadership and investment in Apache Spark, is helping deliver a next-generation analytics platform capable of advanced analytics, machine learning, streaming, powerful SQL, graph analytics and more.
For more information on IBM BigInsights, or to get started with BigInsights on Cloud, click here.
Dinesh Nirmal – Vice President, IBM Analytics Development
Follow me on Twitter: @dineshknirmal
TRADEMARK DISCLAIMER: Apache, Apache Hadoop, Hadoop, Apache Spark, Spark and the Spark logo are trademarks of The Apache Software Foundation. IBM, IBM BigInsights, BigInsights are trademarks of the IBM Corporation.