Big data presents a new revenue source for Oracle as enterprises seek insights into their operations from social media, machine-generated data (manufacturing sensors, weblogs, smart meters) and traditional enterprise data (CRM systems, ERP data, web store transactions). McKinsey Global Institute estimates that this data will grow at a rate of 40% per year through 2020. Existing systems already demonstrate the scale: a single jet engine generates 10TB of data every 30 minutes, and multiplying this by more than 25,000 airline flights per day means that data collected from a single source runs into the petabyte scale. Big data collected from social media sites arrives at high velocity, and extracting useful insights requires collecting it across a variety of sources.
The benefits of big data are many. Local government bodies, for instance, can use insights from big data to deliver essential services more efficiently, including debris removal, public safety, street maintenance and education. New York City is a good example: it has a limited number of building inspectors, and big data analysis speeds up decisions by prioritizing where to investigate. City workers can, for instance, correlate a tax lien on a building with a nine-fold increase in the likelihood of a catastrophic fire.
Oracle Kenya gives a good demo of its Endeca software by analysing local Twitter hashtags. Endeca is built for unstructured information analysis. In analysing Twitter data, metrics include retweets, users, tweets and average sentiment; a hashtag cloud aggregates trending topics (al shabaab, alcoblow, quail eggs) and the general themes around which discussions form, both negative (e.g. Al Shabaab) and positive (e.g. the Safari 7s). By analyzing these measures, organizations can identify brand ambassadors (via influence analysis) and address negative feedback concerning their products and services. Filtering all this data into useful information requires data officers to analyze tag information from different sources (social media, e-commerce sites, web logs) and so identify genuine and emergent themes.
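To make the metrics above concrete, here is a minimal Python sketch of how tweet counts, distinct users, retweets, average sentiment and a hashtag cloud might be aggregated. It is not Endeca's API: the record fields (`user`, `text`, `retweets`, `sentiment`), the sample tweets and the function names are all hypothetical, and the sentiment scores are assumed to have been produced upstream by some other tool.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical sample records. The field names and sentiment scores are
# assumptions for illustration, not Endeca's or Twitter's schema.
SAMPLE_TWEETS = [
    {"user": "amina", "text": "Stuck at another #alcoblow checkpoint", "retweets": 2, "sentiment": -0.5},
    {"user": "brian", "text": "What a final! #Safari7s", "retweets": 5, "sentiment": 0.75},
    {"user": "amina", "text": "Tickets for #safari7s sold out", "retweets": 1, "sentiment": 0.25},
    {"user": "carol", "text": "#alcoblow on Thika Road tonight.", "retweets": 0, "sentiment": -0.25},
]


def extract_hashtags(text):
    """Return lower-cased hashtags from a tweet, without trailing punctuation."""
    tags = []
    for word in text.split():
        if word.startswith("#"):
            tag = word.rstrip(".,!?;:").lower()
            if len(tag) > 1:  # ignore a bare "#"
                tags.append(tag)
    return tags


def summarize(tweets):
    """Aggregate headline metrics plus per-hashtag counts and mean sentiment."""
    tag_counts = Counter()
    tag_sentiments = defaultdict(list)
    for tweet in tweets:
        # Use a set so a hashtag repeated within one tweet is counted once.
        for tag in set(extract_hashtags(tweet["text"])):
            tag_counts[tag] += 1
            tag_sentiments[tag].append(tweet["sentiment"])
    return {
        "tweets": len(tweets),
        "users": len({t["user"] for t in tweets}),
        "retweets": sum(t["retweets"] for t in tweets),
        "avg_sentiment": mean(t["sentiment"] for t in tweets) if tweets else 0.0,
        "tag_counts": dict(tag_counts),
        "tag_sentiment": {tag: mean(scores) for tag, scores in tag_sentiments.items()},
    }


summary = summarize(SAMPLE_TWEETS)
# Hashtags with a negative mean sentiment are candidates for follow-up.
negative_themes = sorted(t for t, s in summary["tag_sentiment"].items() if s < 0)
print(summary["tag_counts"], negative_themes)
```

Grouping sentiment by hashtag is what separates a negative theme from a positive one in this sketch; a real deployment would also need deduplication, language handling and a proper sentiment model.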
A Gartner report on big data shows that the share of enterprises investing in the technology rose from 58% in 2012 to 64% in 2013. Big data solutions are becoming an essential part of the enterprise, with 25% of big firms expected to have chief data officers (CDOs) by 2015. The importance of big data solutions can be felt across the board. Retailers, for example, can model customer behavior using social media data and web log files from their e-commerce sites to learn why the market prefers specific goods. This allows for effective micro customer segmentation and targeted marketing campaigns.
Platforms for big data span data acquisition, data organization and data analysis. The infrastructure requirements for each of these stages are outlined below:
- Acquisition: infrastructure for this stage must deliver low, predictable latency in capturing data and executing short, simple queries. NoSQL databases are deployed at this stage because they can handle dynamic data structures and are highly scalable.
- Organization: infrastructure for this stage must process and manipulate data in its original storage location. A large variety of data formats, both structured and unstructured, is handled during organization.
- Analysis: infrastructure for this stage should support statistical analysis and data mining on a wider variety of data types stored in diverse systems. It should also integrate analysis of big data combined with traditional enterprise data.
Oracle’s approach to serving big data consumers is to offer an integrated solution that provides infrastructure for all three stages. Several organizations already use Oracle’s data warehousing solutions, and the software giant gives them the option to pivot into big data by running their existing data silos alongside big data appliances.
Oracle’s solution for big data includes a big data appliance, a NoSQL database, big data connectors and several analytics tools. An extensive description of these solutions is given in the Oracle whitepaper “Oracle: Big Data for the Enterprise”. Oracle has a wealth of analytical software to choose from, including Oracle Database (which currently has a new in-memory version for big data solutions), Endeca and Exalytics.
Oracle Big Data Appliance
Oracle Big Data Appliance is an engineered system that combines optimized hardware with a comprehensive big data software stack to deliver a complete, easy-to-deploy solution for acquiring and organizing big data.
Oracle Big Data Appliance comes in a full rack configuration with 18 Sun servers for a total storage capacity of 648TB. Every server in the rack has 2 CPUs, each with 8 cores, for a total of 288 cores per full rack. Each server has 64GB of memory, for a total of 1152GB of memory per full rack. Oracle Big Data Appliance includes a combination of open source software and specialized software developed by Oracle to address enterprise big data requirements.
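The full-rack totals quoted above follow directly from the per-server figures. The short Python check below reproduces them. The storage per server is not stated in the text, so the value computed here is derived from the 648TB total and assumes storage is spread evenly across the 18 servers.

```python
# Per-server and per-rack figures taken from the text above.
SERVERS_PER_RACK = 18
CPUS_PER_SERVER = 2
CORES_PER_CPU = 8
MEMORY_GB_PER_SERVER = 64
RACK_STORAGE_TB = 648

total_cores = SERVERS_PER_RACK * CPUS_PER_SERVER * CORES_PER_CPU
total_memory_gb = SERVERS_PER_RACK * MEMORY_GB_PER_SERVER

# Derived value: assumes storage is divided evenly between servers.
storage_tb_per_server = RACK_STORAGE_TB / SERVERS_PER_RACK

print(total_cores)            # 288
print(total_memory_gb)        # 1152
print(storage_tb_per_server)  # 36.0
```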
The Oracle Big Data Appliance software includes:
- Full distribution of Cloudera’s Distribution including Apache Hadoop (CDH4)
- Oracle Big Data Appliance Plug-In for Enterprise Manager
- Cloudera Manager to administer all aspects of Cloudera CDH
- Oracle distribution of the statistical package R
- Oracle NoSQL Database Community Edition
- Oracle Enterprise Linux operating system and Oracle Java VM
Oracle NoSQL Database
Oracle NoSQL Database is a distributed, highly scalable key-value database based on Oracle Berkeley DB. It delivers a general-purpose, enterprise-class key-value store by adding an intelligent driver on top of distributed Berkeley DB. This intelligent driver keeps track of the underlying storage topology, shards the data and knows where data can be placed with the lowest latency. Unlike competitive solutions, Oracle NoSQL Database is easy to install, configure and manage, supports a broad set of workloads, and delivers enterprise-class reliability backed by enterprise-class Oracle support. The primary use cases for Oracle NoSQL Database are low-latency data capture and fast querying of that data, typically by key lookup. Oracle NoSQL Database comes with an easy-to-use Java API and a management framework. The product is available both in an open source community edition and in a priced enterprise edition for large distributed data centers. The community edition is installed as part of the Big Data Appliance integrated software.
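The idea of a driver that shards data by key can be illustrated with a small, self-contained Python sketch. This is not Oracle's Java API and not its actual placement algorithm: the class `ShardedKVStore`, its methods and the use of plain hash partitioning are assumptions made for illustration, and each "shard" here is just an in-memory dictionary rather than a storage node.

```python
import hashlib


class ShardedKVStore:
    """Toy key-value store that routes each key to one of N in-memory shards.

    Hypothetical illustration only: real systems also track topology,
    replicas and latency, which this sketch omits.
    """

    def __init__(self, num_shards):
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        """Map a string key to a shard index using a stable hash."""
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key, default=None):
        # A lookup touches only the shard that owns the key.
        return self.shards[self.shard_for(key)].get(key, default)


store = ShardedKVStore(num_shards=3)
store.put("sensor:42:latest", {"temp_c": 71.5})
print(store.get("sensor:42:latest"))
print(store.get("sensor:99:latest", default="missing"))
```

Because the same key always hashes to the same shard, a lookup by key touches exactly one shard, which is what keeps key-based reads fast as more shards are added. A stable hash such as MD5 is used instead of Python's built-in `hash`, which is randomized per process for strings.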
Oracle Big Data Connectors
Where Oracle Big Data Appliance makes it easy for organizations to acquire and organize new types of data, Oracle Big Data Connectors tightly integrates the big data environment with Oracle Exadata and Oracle Database, so that you can analyze all of your data together with extreme performance. Oracle Big Data Connectors consists of four components:
- Oracle Loader for Hadoop
- Oracle SQL Connector for Hadoop Distributed File System
- Oracle Data Integrator Application Adapter for Hadoop
- Oracle R Connector for Hadoop
In-Database Analytics
[…]
Connections between Oracle Big Data Appliance and Oracle Exadata are via InfiniBand, enabling high-speed data transfer for batch or query workloads. Oracle Exadata provides outstanding performance in hosting data warehouses and transaction processing databases.
Now that the data is in mass-consumption format, Oracle Exalytics can be used to deliver the wealth of information to the business analyst. Oracle Exalytics is an engineered system providing speed-of-thought data access for the business community. It is optimized to run Oracle Business Intelligence Enterprise Edition with in-memory aggregation capabilities built into the system. Oracle Big Data Appliance, in conjunction with Oracle Exadata Database Machine and the new Oracle Exalytics Business Intelligence Machine, delivers everything customers need to acquire, organize, analyze and maximize the value of big data within their enterprise.
Recent additions to Oracle’s big data suite include Big Data Lite, a virtual machine for the big data platform. There is great demand for applications that consume big data, and with this VM Oracle is courting developers. Applications developed on the VM can be ported to Oracle’s Big Data Appliance. Software bundled into the VM includes Oracle Database 12c Enterprise Edition, Oracle Advanced Analytics, Oracle NoSQL Database, Cloudera Distribution including Apache Hadoop, Oracle Data Integrator 12c and Oracle Big Data Connectors.
The Big Data Appliance also has some new add-ons: Kerberos and LDAP authentication, Oracle Audit Vault and Database Firewall. The latter can be applied to Hadoop audit trails to model activity of interest and generate reports for administrators. An XQuery connector has also been included with Oracle’s Hadoop offering; its queries can be executed in parallel across the Hadoop cluster. To scale performance in MapReduce jobs, Big Data Appliance now includes Perfect Balance.