DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics. While DataOps began as a set of best practices, it has now matured to become a new and independent approach to data analytics. DataOps applies to the entire data lifecycle from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations. From a process and methodology perspective, DataOps applies Agile software development, DevOps and the statistical process control used in lean manufacturing, to data analytics.
In DataOps, development of new analytics is streamlined using Agile software development, an iterative project management methodology that replaces the traditional Waterfall sequential methodology. Studies show that software development projects complete significantly faster and with far fewer defects when Agile Development is used. The Agile methodology is particularly effective in environments where requirements are quickly evolving -- a situation well known to data analytics professionals.
DevOps focuses on continuous delivery by leveraging on-demand IT resources and by automating test and deployment of analytics. This merging of software development and IT operations has improved velocity, quality, predictability and scale of software engineering and deployment. Borrowing methods from DevOps, DataOps seeks to bring these same improvements to data analytics.
Like lean manufacturing, DataOps utilizes statistical process control (SPC) to monitor and control the data analytics pipeline. With SPC in place, the data flowing through an operational system is constantly monitored and verified to be working. If an anomaly occurs, the data analytics team can be notified through an automated alert.
DataOps is not tied to a particular technology, architecture, tool, language or framework. Tools that support DataOps promote collaboration, orchestration, agility, quality, security, access and ease of use.
DataOps was first introduced by Lenny Liebmann, Contributing Editor, InformationWeek, in a blog post on the IBM Big Data & Analytics Hub titled "3 reasons why DataOps is essential for big data success" on June 19, 2014. The term DataOps was later popularized by Andy Palmer at Tamr. DataOps is a moniker for "Data Operations." 2017 was a significant year for DataOps with significant ecosystem development, analyst coverage, increased keyword searches, surveys, publications, and open source projects.
The volume of data is forecast to grow at a rate of 32% CAGR to 180 Zettabytes by the year 2025 (Source: IDC). DataOps seeks to provide the tools, processes, and organizational structures to cope with this significant increase in data. Automation streamlines the daily demands of managing large integrated databases, freeing the data team to develop new analytics in a more efficient and effective way.
DataOps embraces the need to manage many sources of data, numerous data pipelines and a wide variety of transformations. DataOps seeks to increase velocity, reliability, and quality of data analytics. It emphasizes communication, collaboration, integration, automation, measurement and cooperation between data scientists, analysts, data/ETL(extract, transform, load) engineers, information technology (IT), and quality assurance/governance. It aims to help organizations rapidly produce insight, turn that insight into operational tools, and continuously improve analytic operations and performance.
Individuals and organizations supporting DataOps have produced a DataOps manifesto, consisting of 18 DataOps principles, which summarize the mission, values, philosophies, goals and best practices of DataOps practitioners.
The DataOps Engineer orchestrates and automates the data analytics pipeline, promotes features to production and automates quality. In many organizations, the DataOps engineer is a separate role. In others, it is a shared function.
Manage research, learning and skills at defaultlogic.com. Create an account using LinkedIn to manage and organize your omni-channel knowledge. defaultlogic.com is like a shopping cart for information -- helping you to save, discuss and share.