Companies are increasingly turning to data science and data analytics solutions to leverage the sea of data for their own business.
The path to the data product - How you profit sustainably from Data Science
The data science project is defined. The proof of concept has been successfully implemented. And now? Unfortunately, this is often the end of the story. In fact, one of the greatest challenges lies in the operationalization of data science projects, i.e., in their successful transfer to productive business operations.
According to a study by Gartner, more than half of all data science projects are not fully operationalized. However, this step is the key to using data profitably in the long term and to achieving the actual added value of Data Science projects. To prevent you from falling into the same trap in the future, we would like to use this article to show you what operationalization is all about, highlight the differences between it and a Data Science project, and use the Data Science operationalization cycle to show you what to look out for and which questions need to be clarified in advance.
WHAT IS OPERATIONALIZATION?
Definition: The operationalization of Data Science projects describes the permanent integration of Data Science or analysis results into the IT infrastructure and operational business processes of a company. Operationalization thus refers to the continuous delivery of Data Science solutions to the end user, or as we say at pacemaker, the journey from data project to data product.
To get a better understanding of what data products can look like, here are a few examples from our own practice.
Forecasting: Development of a forecasting software for a logistics service provider for the online trade of a large fashion retailer to predict orders and returns and thus optimally plan e-commerce logistics.
Product Insights: Development of an interactive dashboard for continuous evaluation of customer feedback and device data for a 360° view of product and service quality for a manufacturer and distributor of household electronics.
Dynamic Route Optimization: Development of dynamic route planning software for a pharmaceutical logistics company to save time and costs through optimal route planning and to ensure faster supply to hospitals and patients.
Data products can therefore be very different and create real added value in diverse application areas. It is important to note that it is not always a matter of transferring all use cases into a data product. Some data science projects, for example, only serve as a one-time basis for decision-making and do not require a permanently operated data product.
WHAT IS THE DIFFERENCE BETWEEN A DATA SCIENCE PROJECT AND AN OPERATIONALIZATION PROJECT?
A Data Science project and an operationalization project are two different things with different requirements and goals, a fact that many end users are not aware of. But what exactly are the differences between the two?
A Data Science project focuses mostly on testing the feasibility of certain use cases, the so-called proof of concept (POC), and on developing analysis models. Operationalization, by contrast, is about developing Data Science software solutions that permanently integrate the analysis results into everyday business. The operationalization process therefore picks up where the Data Science project leaves off. At this point, a successful Data Science project becomes a software development project. The goal is to develop a software solution that meets the requirements of everyday business. It is therefore necessary to check whether the assumptions made in the POC also hold in the productive environment and with constantly updated data. This is also referred to as proof of scale (POS).
The diagram below provides an overview of the main differences between the underlying questions that need to be examined in the context of the two projects.
Comparison of the questions in the two test phases (source: own representation based on Gartner, 2020, Follow 4 Data Science Best Practices to Achieve Project Success).
Similar to the CRISP-DM approach for Data Science projects, successful implementation of data products also requires a systematic approach. In fact, according to Gartner, the lack of a systematic operationalization methodology is one of the main reasons why productization fails.
HOW DOES A SUCCESSFUL OPERATIONALIZATION PROCESS PROCEED?
The operationalization process is a continuous cycle. The graphic below provides an overview of the entire process, and the individual steps are then explained briefly.
The operationalization cycle as a structured process for implementing data products (source: own representation based on Gartner, 2018, How to Operationalize Machine Learning and Data Science Projects).
MODEL DEPLOYMENT & APPLICATION INTEGRATION ensure that the model runs reliably in the production environment and is integrated into it. This step includes deciding on hosting (on-premises, in the cloud, or a hybrid solution) as well as defining how data will be delivered and how analytics results will be integrated. For the latter, the options range from integration with existing systems to a custom-developed application. Efficient data pipelines must be established to ensure a reliable flow of data.
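To make the idea of an automated data flow more concrete, here is a minimal sketch of a validation step that a pipeline might run before records reach the model. The field names (`order_date`, `orders`), the plausibility rule, and the helpers `validate_record` and `validate_batch` are hypothetical and chosen purely for illustration.

```python
from datetime import date

# Hypothetical schema for an incoming order record; field names are assumptions.
REQUIRED_FIELDS = {"order_date": date, "orders": int}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty list = valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record or record[field] is None:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} has type {type(record[field]).__name__}")
    # Illustrative plausibility rule: order counts cannot be negative.
    if isinstance(record.get("orders"), int) and record["orders"] < 0:
        problems.append("orders is negative")
    return problems


def validate_batch(records: list[dict]):
    """Split a batch into valid records and rejected records with reasons."""
    valid, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            valid.append(record)
    return valid, rejected
```

Rejected records are returned together with their reasons instead of being dropped silently, so that recurring delivery problems can be reported back to the data source.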
If the desired results are not achieved, this can be due to various reasons. These may necessitate going through the Data Science project cycle again. Depending on the cause, it is decided at which step to start.
DATA DRIFT: A difference between the data used to build the model and the data actually used in production may cause the models to underperform. If the underlying data changes, it may be necessary to go back to the point of data ingestion. A change in the data may also be due to variations in data quality. Whereas in a Data Science project the data is cleaned manually during data preparation and every exceptional case can be taken into account, data preparation in a data product is largely automated. Problems in data quality therefore become all the more noticeable during operationalization, and the process may have to go back to the point of preparing the data again. To avoid this, it is important to take appropriate measures in advance that ensure data quality permanently.
As the explanations above show, operationalization is an ongoing project. After an initial set-up, the developed data product goes into regular operation, which is subject to continuous maintenance and support. Software as a Service (SaaS) models have proven their worth for this purpose.
If this systematic process is followed, a major hurdle of operationalization is overcome. Nevertheless, productization is and remains a complex process with very different challenges, both technical and organizational. In conclusion, we would therefore like to take a look at the main reasons for failure.
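One way such a change in the data could be detected automatically is the Population Stability Index (PSI), which compares how a feature is distributed in the training data and in current production data. The sketch below is one common option, not a method prescribed by this article; the function name, the bin count, and the clamping of out-of-range values are assumptions.

```python
import math


def population_stability_index(reference, current, bins=10):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant reference data

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = int((v - lo) / width)
            counts[max(0, min(idx, bins - 1))] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))
```

A call such as `population_stability_index(training_values, production_values)` returns 0 for identical distributions and grows as they diverge. The value above which an alert is raised has to be chosen per use case.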
WHY DO SO MANY DATA SCIENCE PROJECTS FAIL AT OPERATIONALIZATION?
Although the reasons vary, the following stand out as decisive:
- Discrepancy between training and production data.
- Poor integration of the solution into business processes or applications.
- Mistrust and skepticism about using the solution.
- Incorrect composition of the team for operationalization.
To ensure that you don't fall into the same traps and that the implementation of your data product is successful, there are a few questions that should be addressed in advance. These include the following:
- Who uses the analysis results in everyday business, and how? In other words, how must the results be provided to best support the end user?
- How and how often is the data transferred, and how often do the results need to be updated? In other words, how is a reliable data flow ensured on both the input and the output side?
- How is it ensured that the analysis models function optimally? In other words, how is model quality monitored, and how are deviations and deteriorations handled?
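The last question, monitoring model quality, can be illustrated with a minimal sketch that tracks the rolling error of a forecasting model and flags when it deteriorates relative to the level measured in the POC. The class name `ForecastMonitor`, the choice of mean absolute percentage error (MAPE), the tolerance factor, and the window size are all assumptions made for illustration.

```python
from collections import deque


class ForecastMonitor:
    """Track the rolling mean absolute percentage error (MAPE) of a forecast model."""

    def __init__(self, baseline_mape, tolerance=1.5, window=30):
        self.baseline_mape = baseline_mape  # error measured during the POC (assumed known)
        self.tolerance = tolerance          # allowed factor above the baseline (assumption)
        self.errors = deque(maxlen=window)  # keeps only the most recent observations

    def record(self, predicted, actual):
        """Store the absolute percentage error of one prediction."""
        if actual == 0:
            return  # percentage error is undefined for a zero actual; skip it
        self.errors.append(abs(predicted - actual) / abs(actual))

    def rolling_mape(self):
        """Mean error over the current window, or None if nothing was recorded."""
        return sum(self.errors) / len(self.errors) if self.errors else None

    def needs_attention(self):
        """True once the window is full and the error exceeds the tolerated level."""
        if len(self.errors) < self.errors.maxlen:
            return False
        return self.rolling_mape() > self.baseline_mape * self.tolerance
```

In regular operation, `record` would be called whenever an actual value becomes available, and a positive `needs_attention` could trigger a review or a retraining of the model.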
Successful implementation requires the interaction of an interdisciplinary team. In addition to the business departments, a representative of the IT department should be involved on the company side, because he or she knows the IT infrastructure into which the data product is to be integrated better than anyone else.
The key to using data profitably in the long term and achieving the actual added value of Data Science projects lies in operationalization. If this step is successful, productivity and revenues can be increased, costs reduced, and ultimately profits raised. It is therefore important to select use cases at the beginning of a Data Science project with regard to their suitability for productization and then to establish a systematic procedure for transferring them into permanent use. In this way, you will not only profit from a higher ROI of your Data Science initiatives, but also increase the acceptance of these topics in your own company.
WE DEVELOP YOUR DATA PRODUCT
As data science software experts, we at pacemaker are your contact for the realization of your data project and the development of your individual data product. We accompany you from the idea to the seamless integration into your IT infrastructure and operational business processes.
Contact us with your project inquiry.