Hortonworks is a business computer software company based in Palo Alto, California. The company focuses on the development and support of Apache Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers. Architected, developed, and built completely in the open, the Hortonworks Data Platform (HDP) provides a distribution of Hadoop designed to meet the needs of enterprise data processing. HDP is a platform for multi-workload data processing across an array of processing methods, from batch through interactive to real-time, all supported with solutions for governance, integration, security, and operations.
- Completely Open: HDP is the only completely open Hadoop data platform available. All solutions in HDP are developed as projects through the Apache Software Foundation (ASF). There are NO proprietary extensions or add-ons required.
- Fundamentally Versatile: At its heart, HDP offers linearly scalable storage and compute across a wide range of access methods, from batch and interactive to real-time, search, and streaming. It includes a comprehensive set of capabilities across governance, integration, security, and operations.
- Wholly Integrated: HDP integrates with and augments your existing applications and systems so that you can take advantage of Hadoop with only minimal change to existing data architectures and skillsets. Deploy HDP in the cloud, on-premises, or from an appliance, on both Linux and Windows.
Hortonworks Data Platform enables Enterprise Hadoop: the full suite of essential Hadoop capabilities that are required by the enterprise and that serve as the functional definition of any data platform technology. This comprehensive set of capabilities is aligned to the following functional areas: Data Management, Data Access, Data Governance and Integration, Security, and Operations.

The core components of HDP are YARN and the Hadoop Distributed File System (HDFS). YARN is the data operating system of Hadoop, providing the resource management and pluggable architecture that enable a wide variety of data access methods to process data simultaneously. HDFS provides scalable, fault-tolerant, cost-efficient storage for big data.

YARN provides the foundation for a versatile range of processing engines that let you interact with the same data in multiple ways at the same time. This means applications can work with the data in whatever way suits them best: batch, interactive SQL, or low-latency access with NoSQL. Emerging use cases for search and streaming are also supported with Apache Solr and Apache Storm, and ecosystem partners provide even more specialized data access engines for YARN.

HDP extends data access and management with powerful tools for data governance and integration. These provide a reliable, repeatable, and simple framework for managing the flow of data into and out of Hadoop. This control structure, along with tooling to ease and automate the application of schema or metadata on sources, is critical for successfully integrating Hadoop into a modern data architecture.
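As a rough illustration of the batch model that classic Hadoop engines implement, here is a toy MapReduce-style word count in plain Python. It runs in a single process; in a real Hadoop job the map and reduce phases would run in parallel across YARN-managed containers, with the shuffle moving data between them.

```python
from collections import defaultdict
from itertools import chain

# Map phase: each mapper turns one line of input into (word, 1) pairs.
def map_phase(line):
    return [(word.lower(), 1) for word in line.split()]

# Shuffle phase: group intermediate pairs by key, as Hadoop does between map and reduce.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: each reducer aggregates all values emitted for one key.
def reduce_phase(key, values):
    return key, sum(values)

def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

print(word_count(["big data needs big clusters", "data at scale"]))
# {'big': 2, 'data': 2, 'needs': 1, 'clusters': 1, 'at': 1, 'scale': 1}
```

Because the map step works line by line and the reduce step works key by key, the same program scales from one machine to thousands with no change to the logic, which is the essence of the batch model.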
Hortonworks maintains engineering relationships with the major data management providers to enable their tools to work with and integrate into HDP.
Security is woven into HDP at multiple layers. Critical features for authentication, authorization, accountability, and data protection are in place so that you can secure HDP across these key requirements. Consistent with its approach across all of the enterprise Hadoop capabilities, HDP also ensures you can integrate and extend your current security solutions to provide a single, consistent, secure umbrella over your modern data architecture.

Operations teams deploy, monitor, and manage a Hadoop cluster within their broader enterprise data ecosystem. HDP delivers a complete set of operational capabilities that provide visibility into the health of your cluster as well as tooling to manage configuration and optimize performance across all data access methods. Apache Ambari provides APIs to integrate with existing management systems. Hortonworks has a thriving ecosystem of vendors providing additional capabilities and integration points; these partners contribute to and augment Hadoop with functionality across business intelligence and analytics, data management tools, and infrastructure, and systems integrators of all sizes are building skills to assist with integration and solution development.

HDP provides the broadest range of deployment options for Hadoop: from Windows Server or Linux to virtualized cloud deployments. It is the most portable Hadoop distribution, allowing you to easily and reliably migrate from one deployment type to another.

HDP demonstrates Hortonworks’ commitment to growing Hadoop and its subprojects with the community and completely in the open. HDP is assembled entirely from projects built through the Apache Software Foundation. How is this different from merely being open source, and why is it so important? Proprietary Hadoop extensions can be made open source simply by publishing them to GitHub. But compatibility issues will creep in, and as the extensions diverge from the trunk, so too does reliance on the extension’s vendor.
Community-driven development is different. By combining the efforts of technologists across a diverse range of companies, the roadmap is stronger and the quality deeper. In the long run, community-driven innovation will outpace that of any single company.

Cloudera needs to immediately answer a number of fundamental questions and begin marketing its message aggressively to counter Hortonworks’ implicit criticisms. We believe Hadoop World 2011 is Cloudera’s best near-term opportunity for this effort. Specifically, there are four areas in which we believe Cloudera must sharpen its marketing message. In addition, the company must make it clear why its strategy of becoming a software company rather than a services company is the most viable approach for customers. If Cloudera can answer these and similar questions, and effectively communicate the answers to the market, it has a good chance of regaining the momentum Hortonworks has snatched from it. The stakes are high for both vendors: the Big Data market will reach the multi-billion-dollar level in the next five years, and many tens of billions of dollars in market value are at stake.
Hadapt has developed the industry’s only Big Data analytic platform that natively integrates SQL with Apache Hadoop. It is a cloud-optimized system offering an analytical platform for performing complex analytics on structured and unstructured data. Hadapt provides a framework that integrates structured and unstructured Hadoop data, BI tools, and SQL without requiring the Hadoop data to be ingested elsewhere. In theory, this construct allows both operational and analytical systems to operate at the same time on the same data sources. The Hadapt approach is a significant improvement over the current approach of connectors ingesting data into existing data warehouse and BI solutions from Oracle, Teradata, and many other vendors. The Hadapt demonstration of its integration with Tableau is neat and illustrates the power of combining text and sentiment analysis on unstructured data with structured data.
The main business benefit of this approach is improved speed, efficiency, and effectiveness in analyzing big data. The process is simplified with fewer steps, the original data remains available to the BI analyst, and time to analysis results is improved. Hadapt’s Adaptive Analytical Platform brings together the performance and functionality of RDBMS technology with the scalability of Apache Hadoop. By taking advantage of the full power of the MapReduce framework while also drawing on the latest relational database research, Hadapt’s platform addresses the most pressing “big data” issues facing enterprises today:
Acceleration of Data Growth: Business data is the raw material for informed strategic and operational decisions. The exponential growth of data dictates the need for advanced technology to manage and analyze large-scale datasets.
- Insights from Structured & Unstructured Data: Valuable insights can arise from the combined analysis of structured and unstructured data. The current state of the art is an analytic database with a “Hadoop Connector”, which leaves the two kinds of data in separate silos. Hadapt provides a fully integrated platform that eliminates these silos.
- Unparalleled Speed & Reliability: By leveraging the MapReduce distributed computing framework, Hadapt provides fast analysis of massive datasets. Hadapt also provides cloud-ready fault tolerance, load balancing, and data replication, ensuring consistent performance in virtualized environments.
The Solution for RDBMS Scalability: Hadapt’s patent-pending technology features a hybrid architecture that brings the latest advances in relational database research to the Apache Hadoop platform. RDBMS technology has advanced significantly over the past several decades, but current analytic solutions were designed prior to the advent of Hadoop and the paradigm shift from appliance-based computing to distributed computing on clusters of inexpensive commodity hardware. Hadapt is built on Hadoop from the ground up and offers an all-in-one system for structured, unstructured, and “multi-structured” data.
Hadapt helps deliver faster results at any scale. Virtualized environments pose unique challenges for performing analytics on massive data sets. Shared computing environments are economical, but complex queries can perform poorly because of unpredictable node performance and availability. Current solutions plan queries entirely in advance, leading to a high probability of query failure in larger clusters. Hadapt’s platform uses an Adaptive Query Execution process to automatically load-balance queries in virtualized environments, leading to faster and more predictable analyses.
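A loose sketch of the adaptive idea in Python: a query is split into partitions assigned across nodes, and work lost to a failed or slow node is rescheduled on another node instead of failing the whole query. The node names and failure probabilities below are invented for the simulation; this is an illustration of the general load-balancing principle, not Hadapt’s actual algorithm.

```python
import random

# Hypothetical cluster: node name -> probability that any one task fails there.
NODES = {"node-a": 0.0, "node-b": 0.5, "node-c": 0.9}

def run_partition(node, partition):
    """Simulate executing one query partition; unreliable nodes may fail."""
    if random.random() < NODES[node]:
        raise RuntimeError(f"{node} timed out")
    return sum(partition)  # the per-partition work: a partial aggregate

def adaptive_execute(partitions):
    """Assign partitions round-robin; on failure, retry on the next node.

    Work lost to a slow or failed node is simply rescheduled elsewhere,
    so the query completes with the same answer despite node trouble.
    """
    results = []
    node_ring = list(NODES)
    for i, part in enumerate(partitions):
        attempt = i
        while True:
            node = node_ring[attempt % len(node_ring)]
            try:
                results.append(run_partition(node, part))
                break
            except RuntimeError:
                attempt += 1  # move on to another node and retry
    return sum(results)

data = list(range(100))
partitions = [data[i::4] for i in range(4)]  # split the "table" into 4 partitions
print(adaptive_execute(partitions))  # always 4950, regardless of node failures
```

A static plan, by contrast, would abort whenever an assigned node failed; here the answer is identical on every run because failed partitions migrate until they land on a healthy node.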
The company’s core technology is a full integration of Hadoop with relational database technology. In contrast with the “Hadoop connector” strategy employed by many MPP analytic database vendors, Hadapt uses Hadoop as the parallelization layer for its query processing, with each node’s structured and unstructured data stored in an RDBMS and in HDFS, respectively. Consolidating data into a single platform drastically reduces TCO, eliminates data silos, and allows for richer analytics through consumption of diverse data types.
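To make the split-storage idea concrete, here is a minimal Python sketch using an in-memory SQLite database as a stand-in for a node’s local RDBMS and a plain dict as a stand-in for documents stored in HDFS. The table, customer names, and documents are all invented for illustration; Hadapt’s real engine would plan SQL across many nodes rather than one process.

```python
import sqlite3

# Structured side: stand-in for the per-node RDBMS.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [(1, "Acme", "EMEA"), (2, "Globex", "APAC"), (3, "Initech", "EMEA")])

# Unstructured side: stand-in for raw documents kept in HDFS, keyed by customer id.
hdfs_docs = {
    1: "Support ticket: cluster outage, very unhappy",
    2: "Renewal note: expanding deployment next quarter",
    3: "Feedback: dashboard latency is acceptable",
}

def query(region, keyword):
    """Filter on the relational side, then scan only the matching raw documents."""
    rows = db.execute("SELECT id, name FROM customers WHERE region = ?", (region,))
    return [name for cid, name in rows if keyword in hdfs_docs[cid].lower()]

print(query("EMEA", "unhappy"))  # ['Acme']
```

The point of the single platform is visible even in this toy: the SQL filter prunes the structured side first, so the expensive scan of unstructured text touches only the relevant documents, and no connector ever copies data between systems.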
Hadapt’s universal SQL support allows enterprises to leverage existing BI tools while tackling enormous data sets. In addition, Hadapt consulting services help organizations address challenges in data integration and analytics and identify opportunities to monetize their data: for example, deploying and integrating Hadapt with other data analysis tools, developing the business logic and user interfaces needed to complement the Hadapt product, and forming an end-to-end business solution.
Dell Inc. – Hadoop solutions
Dell Inc. is a multinational computer technology company based in Round Rock, Texas. It sells personal computers, servers, data storage devices, network switches, and software, as well as electronics built by other manufacturers. The Dell Hadoop solution, offered in conjunction with Cloudera and called the Dell | Cloudera Solution for Apache Hadoop, lowers the barrier to adoption for businesses looking to use Hadoop in production. Dell’s customer-centered approach is to create rapidly deployable and highly optimized end-to-end Hadoop solutions running on commodity hardware. Dell provides all the hardware and software components and resources to meet the customer’s requirements, so no other supplier need be involved.
The hardware platform for the Dell | Hadoop solution is the Dell PowerEdge C Series. Dell PowerEdge C Series servers are focused on hyperscale and cloud capabilities. Rather than emphasizing gigahertz and gigabytes, these servers deliver maximum density, memory, and serviceability while minimizing total cost of ownership. It’s all about getting the processing customers need in the least amount of space and in an energy-efficient package that slashes operational costs.
Apache Hadoop is an open-source framework that scales from a single computer to thousands of servers and offers high availability in the software, rather than in the hardware. Hadoop, a leading open-source technology, provides the software library you need to store, process, and analyze large amounts of structured and unstructured data. The Dell | Hadoop solution offers the following:
- Streamlined Deployment: Cloudera Enterprise and Dell Crowbar enable enterprises to manage the complete operational lifecycle of their Apache Hadoop systems.
- Increased Efficiency: The Dell PowerEdge C Series is feature- and power-optimized to provide lower TCO; in addition to saving space and energy, it is an ideal platform for high-performance computing.
- Low-Risk: Tested, validated, and supported by Dell.
- Complete Support: Led by Dell, customers will have access to comprehensive and collaborative service and support for the entire solution, from installation, configuration, and deployment to optimization and tuning with existing IT infrastructures.
Recent developments combining in-memory technologies and Hadoop/MapReduce from ScaleOut Software point to a future where big data analytics and real-time processing, as it is defined in the financial markets, could meet. ScaleOut has just released ScaleOut hServer V2, an in-memory data grid, which it claims can boost Hadoop performance by 20x and make it suitable for processing “live data” to deliver “real-time analytics”. While ScaleOut is today looking to boost Hadoop performance so that applications that used to take hours or minutes now run in minutes or seconds, the performance trajectory could well follow that of the low-latency space, where milliseconds gave way to microseconds, and now nanoseconds.
With all large environments, deployment of the servers and software is an important consideration. Dell provides best practices for the deployment of Hadoop solutions, implemented through a set of tools that automate the configuration of the hardware, installation of the operating system (OS), and installation of the Hadoop software stack from Cloudera.

As with many other types of information technology (IT) solutions, change management and systems monitoring are primary considerations within Hadoop. The IT operations team needs to ensure tools are in place to properly track and implement changes, and to notify staff when unexpected events occur within the Hadoop environment. Hadoop is a constantly growing, complex ecosystem of software that provides no guidance on the best platform to run it on. The Hadoop community leaves platform decisions to end users, most of whom do not have a background in hardware or the lab environment necessary to benchmark all possible design solutions. Hadoop is a complex set of software with more than 200 tunable parameters. Each parameter affects the others as tuning is completed, and the right settings change over time as job structure changes, data layout evolves, and data volume grows. As data centers have grown and the number of servers under management for a given organization has expanded, users are more conscious of the impact new hardware will have on existing data centers and equipment.

When the original MapReduce algorithms were released, and Hadoop was subsequently developed around them, these tools were designed for specific uses. The original use was managing large data sets that needed to be easily searched. As time has progressed and the Hadoop ecosystem has evolved, several other specific uses have emerged for Hadoop as a powerful business solution:
- Large Data Sets – MapReduce paired with HDFS is a successful solution for storing large volumes of unstructured data.
- Scalable Algorithms – Any algorithm that can scale to many cores with minimal inter-process communication will be able to exploit the distributed processing capability of Hadoop.
- Log Management – Hadoop is commonly used for storage and analysis of large sets of logs from diverse locations. Because of the distributed nature and scalability of Hadoop, it creates a solid platform for managing, manipulating, and analyzing diverse logs from a variety of sources within an organization.
- Extract-Transform-Load (ETL) Platform – Many companies today have a variety of data warehouse and diverse relational database management system (RDBMS) platforms in their IT environments. Keeping data up to date and synchronized between these separate platforms can be a struggle. Hadoop can serve as a single, central location for that data, acting as an ETL hub between the platforms.
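As a toy illustration of the log-management use case above, the Python sketch below consolidates log lines from two invented sources in one place (standing in for logs aggregated into HDFS from across an organization) and aggregates a field extracted from each: HTTP status codes from web logs and severity levels from application logs.

```python
import re
from collections import Counter

# Hypothetical log lines from two different sources, consolidated centrally.
web_logs = [
    '10.0.0.1 - - [12/Mar/2014:10:02:01] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [12/Mar/2014:10:02:03] "GET /missing HTTP/1.1" 404 128',
]
app_logs = [
    '2014-03-12 10:02:05 ERROR payment service timeout',
    '2014-03-12 10:02:09 INFO request served',
]

STATUS_RE = re.compile(r'" (\d{3}) ')          # HTTP status after the request string
LEVEL_RE = re.compile(r'\b(INFO|WARN|ERROR)\b')  # application log severity

def summarize(lines, pattern):
    """Count occurrences of the extracted field across all log lines."""
    return Counter(m.group(1) for line in lines if (m := pattern.search(line)))

print(summarize(web_logs, STATUS_RE))  # Counter({'200': 1, '404': 1})
print(summarize(app_logs, LEVEL_RE))   # Counter({'ERROR': 1, 'INFO': 1})
```

At production scale the same extract-and-count shape maps directly onto a MapReduce job, which is why log analysis became one of Hadoop’s earliest widespread uses.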
Dell documents all the facility requirements, including space, weight, power, cooling, and cabling, for its defined reference architectures and provides this information to customers prior to any system purchase. This information allows you to make informed decisions regarding the best placement of your Hadoop solution and the ongoing operational costs associated with it. The Dell | Cloudera Hadoop solution addresses the challenges discussed in this white paper in a variety of ways to ensure streamlined deployment of low-risk solutions. The Dell | Hadoop solution includes the tools necessary to manage the environment’s configuration from a single location and to monitor the environment for unexpected changes. As part of the solution, Dell provides reference architectures and associated benchmarks to streamline the Hadoop design process and minimize the risk of performance bottlenecks. Dell also provides results from lab tests and recommendations for Hadoop tunable parameters, to shorten the time necessary to take a Hadoop installation from bare hardware to a production, operational environment. Dell has worked with Cloudera to design, test, and support an integrated solution of hardware and software for implementing the Hadoop ecosystem, engineered and validated to work together.
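As a small illustration of what such tuning recommendations involve, a cluster’s mapred-site.xml might override a handful of Hadoop’s parameters. The property names below are standard Hadoop 2.x MapReduce settings, but the values are purely illustrative and would need benchmarking against a given workload and hardware profile.

```xml
<configuration>
  <!-- Memory allotted to each map-task container; sized to the node's RAM. -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <!-- Buffer used while sorting map output; larger buffers mean fewer disk spills. -->
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>256</value>
  </property>
  <!-- Default number of reduce tasks per job. -->
  <property>
    <name>mapreduce.job.reduces</name>
    <value>8</value>
  </property>
</configuration>
```

Because parameters like these interact (a larger sort buffer competes with container memory, for example), vendor-supplied baselines such as Dell’s reduce the amount of trial-and-error needed to reach a stable production configuration.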
While the Big Data market experienced healthy growth in 2013 thanks to maturing technology and vendor support, barriers to adoption in the enterprise remain. While not an exhaustive list, these barriers include: a lack of best practices for integrating Big Data analytics into existing business processes and workflows; a still-volatile and fast-developing market of competing Big Data vendors and, though to a lesser degree in 2013, competing technologies and frameworks; and a lack of polished Big Data applications designed to solve specific business problems. Dell listened to its customers and designed a Hadoop solution that is fairly unique in the marketplace. Dell’s end-to-end solution approach means that the customer can be in production with Hadoop in the shortest time possible. The Dell | Hadoop solution embodies all the software functions and services needed to run Hadoop in a production environment; the customer is not left wondering, “What else is missing?” One of Dell’s chief contributions to Hadoop is a method to rapidly deploy and integrate Hadoop in production. Other major contributions include integrated backup, management, and security functions. These complementary functions are designed and implemented side by side with the core Hadoop technology.
Big Data market growth is forecast to slow slightly in 2014 to 53%, reaching $28.5 billion for the year. Looking ahead, the Big Data market is currently on pace to top $50 billion in 2017, which translates to a 38% compound annual growth rate over the six-year period from 2011. Big Data practitioners will generate significantly more value than Big Data vendors, so there is significant opportunity for vendors that can deliver Big Data solutions that speak to business rather than technical value. Finally, the biggest growth inhibitors for the Big Data market are security and privacy concerns.