Data Science

The path to the data product - How you profit sustainably from Data Science

The data science project is defined. The proof of concept has been successfully implemented. And now? Unfortunately, this is often the end of the story. In fact, one of the greatest challenges lies in the operationalization of data science projects, i.e., in their successful transfer to productive business operations.


According to a Gartner study, more than half of all data science projects are never fully operationalized. Yet this step is the key to using data profitably in the long term and to realizing the actual added value of Data Science projects. To help you avoid this trap, this article explains what operationalization is all about, highlights how it differs from a Data Science project, and uses the Data Science operationalization cycle to show what to look out for and which questions need to be clarified in advance.


Definition: The operationalization of Data Science projects describes the permanent integration of Data Science or analysis results into the IT infrastructure and operational business processes of a company. Operationalization thus refers to the continuous delivery of Data Science solutions to the end user, or as we say at pacemaker, the journey from data project to data product.

To get a better understanding of what data products can look like, here are a few examples from our own practice.

Forecasting: Development of forecasting software for a logistics service provider handling the online trade of a large fashion retailer, predicting orders and returns so that e-commerce logistics can be planned optimally.


Product Insights: Development of an interactive dashboard for continuous evaluation of customer feedback and device data for a 360° view of product and service quality for a manufacturer and distributor of household electronics.

Dynamic Route Optimization: Development of dynamic route planning software for a pharmaceutical logistics company to save time and costs through optimal route planning and to ensure faster supply to hospitals and patients.

Data products can therefore be very different and create real added value in diverse application areas. It is important to note that it is not always a matter of transferring all use cases into a data product. Some data science projects, for example, only serve as a one-time basis for decision-making and do not require a permanently operated data product.


A Data Science project and an operationalization project are in fact two different things with different requirements and goals, a fact that many users are often not aware of. But what exactly are the differences between the two?

While a Data Science project focuses on testing the feasibility of specific use cases, the so-called proof of concept (POC), and on developing analysis models, operationalization is about building Data Science software solutions that permanently integrate the analysis results into everyday business. The operationalization process therefore picks up where the Data Science project leaves off: at this point, a successful Data Science project becomes a software development project. The goal is a software solution that meets the demands of everyday business. It must therefore be verified that the assumptions made in the POC also hold in the production environment and with continuously updated data. This is referred to as proof of scale (POS).

The diagram below provides an overview of the main differences between the underlying questions that need to be examined in the context of the two projects.

Comparison of the questions in the two test phases (source: own representation based on Gartner, 2020, Follow 4 Data Science Best Practices to Achieve Project Success).

Similar to the CRISP-DM approach for Data Science projects, successfully implementing data products also requires a systematic approach. In fact, according to Gartner, the lack of a systematic operationalization methodology is one of the main reasons productization fails.


The operationalization process is a continuous cycle. The graphic below provides an overview of the entire process. The individual steps are briefly explained in more detail below.

The operationalization cycle as a structured process for implementing data products (source: own representation based on Gartner, 2018, How to Operationalize Machine Learning and Data Science Projects).

MODEL ACTIVATION marks the start of and transition into the operationalization cycle. Here, the results of the Data Science project are handed over in the form of a "production-ready" model: the Data Scientists pass the results of their work on to the team of Software Engineers.
MODEL DEPLOYMENT & APPLICATION INTEGRATION ensure that the solution runs in, and is integrated into, the production environment. This step includes deciding on hosting (on-premises, in the cloud, or a hybrid solution) as well as defining how data is delivered and how the analysis results are integrated. For the latter, the options range from integration into existing systems to a custom-developed application. Efficient data pipelines must be established to ensure a flawless flow of data.
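As a rough sketch of what such application integration can look like in code, the model from the Data Science project is typically wrapped behind a stable, validated interface, so that consuming systems never depend on model internals. All names here are illustrative, not a specific pacemaker API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PredictionRequest:
    """Input schema agreed between the data team and the consuming systems."""
    store_id: str
    horizon_days: int

def serve_prediction(model: Callable[[PredictionRequest], float],
                     request: PredictionRequest) -> dict:
    """Wrap the analysis model behind a stable interface.

    Input validation happens here, so malformed data from upstream
    systems is rejected before it ever reaches the model.
    """
    if request.horizon_days <= 0:
        raise ValueError("horizon_days must be positive")
    forecast = model(request)
    # A plain dict can be serialized by any integration path:
    # REST endpoint, message queue, or batch export.
    return {"store_id": request.store_id, "forecast": float(forecast)}
```

Whether this wrapper sits behind a REST API in the cloud or inside a batch job on-premises is exactly the hosting decision described above; the interface itself stays the same.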
PRODUCTION AUDIT & MODEL BEHAVIOR cover the technical monitoring of performance and of the Data Science results. In this step, monitoring and notification mechanisms are implemented to systematically flag deviations or other anomalies. This can be done through ad-hoc analyses or through the permanent recording and monitoring of predefined metrics such as response time, runtime, or the accuracy of the analysis results. For example, a notification can be sent as soon as a defined quality metric falls below a certain threshold. All of these measures build confidence in the solution and ensure that it continuously delivers added value.
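A minimal, hypothetical example of such a threshold-based notification mechanism: recorded metrics are compared against predefined lower bounds, and every violation produces an alert that can be routed to a notification channel:

```python
def check_metrics(metrics: dict, thresholds: dict) -> list:
    """Compare recorded quality metrics against predefined lower bounds.

    Returns one alert message per metric that fell below its threshold;
    an empty list means the solution is operating within expectations.
    """
    alerts = []
    for name, lower_bound in thresholds.items():
        value = metrics.get(name)
        if value is not None and value < lower_bound:
            alerts.append(f"{name}={value:.3f} fell below threshold {lower_bound:.3f}")
    return alerts

# In a real deployment these alerts would be forwarded to e-mail,
# a chat channel, or an incident system rather than just returned.
```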
KPI VALIDATION serves as the link between the Data Science cycle and the operationalization cycle. Here, everything revolves around comparing the requirements for the solution with the actual results from the user and business perspective. These KPIs were already defined at the start of the Data Science project as part of Business Understanding. This step is important because the top priority is to ensure that ongoing business value is delivered.

If the desired results are not achieved, this can be due to various reasons. These may necessitate going through the Data Science project cycle again. Depending on the cause, it is decided at which step to start.
BUSINESS DRIFT: The model is not delivering the desired business value. Among other things, this may be due to changing market conditions that do not necessarily have a direct impact on the data. In this case, it may be advisable to re-evaluate the originally defined KPIs.
DATA DRIFT: A discrepancy between the data used to build the model and the data actually processed in production can cause models to underperform. If the underlying data changes, it may be necessary to go back to the point of data ingestion. A change in the data may also stem from variations in data quality. Whereas in a Data Science project the data is cleaned manually during data preparation and every exceptional case can be taken into account, data preparation in a data product is largely automated. Data quality problems therefore become all the more noticeable during operationalization, and the process may have to return to the data preparation step. To avoid this, it is important to take appropriate measures in advance to ensure data quality permanently.
CONCEPT DRIFT: Deviations in model quality lead to reduced model performance. If radical changes in the patterns observed in the data become apparent, it may be necessary to go back to the modeling step and revise the analysis models manually.
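To illustrate how data drift can be detected automatically, one common technique (not necessarily the one used in the projects described above) is the population stability index, which compares a feature's distribution at training time with its distribution in production. The implementation and the usual rule-of-thumb thresholds below are a sketch for orientation only:

```python
import math

def _bin_fractions(values, lo, hi, bins):
    """Fraction of values per equal-width bin, floored to avoid log(0)."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [max(c / len(values), 1e-4) for c in counts]

def population_stability_index(expected, actual, bins=10):
    """PSI between training-time ('expected') and production ('actual') data.

    Common rule of thumb: < 0.1 stable, 0.1 to 0.25 moderate drift,
    > 0.25 strong drift (go back to data ingestion / preparation).
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    e = _bin_fractions(expected, lo, hi, bins)
    a = _bin_fractions(actual, lo, hi, bins)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Recording such a drift score as one of the monitored metrics closes the loop back to the production audit step: a threshold breach triggers a notification and, if confirmed, a new pass through the Data Science cycle.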

As these remarks show, operationalization is an ongoing effort. After the initial set-up, the developed data product goes into regular operation, with continuous maintenance and support. Software as a Service (SaaS) models have proven their worth for this purpose.

If this systematic process is followed, a major hurdle of operationalization is cleared. Nevertheless, productization is and remains a complex process with a wide range of technical and organizational challenges. In conclusion, we would therefore like to look at the main reasons for failure.


Although the reasons may vary, the following stand out as decisive:

  • Discrepancy between training and production data.
  • Poor integration of the solution into business processes or application.
  • Mistrust and skepticism about using the solution.
  • Incorrect composition of the team for operationalization.

To ensure that you don't fall into the same traps and that the implementation of your data product is successful, there are a few questions that should be addressed in advance. These include the following:

  • Who uses the analysis results in everyday work, and how? That is, how must the analysis results be provided to best support the end user?
  • How and how often is the data transferred, and how often do the results need to be updated? In other words, how is a flawless data flow ensured, on both the data input and data output sides?
  • How is it ensured that the analysis models perform optimally? In other words, how is the quality of the models monitored, and how are deviations and deteriorations responded to?

Successful implementation requires the interaction of an interdisciplinary team. In addition to the business departments, a representative of the IT department should be involved on the company side, because he or she knows the IT infrastructure into which the data product is to be integrated better than anyone else.


The key to using data profitably in the long term and to realizing the actual added value of Data Science projects lies in operationalization. If this step succeeds, productivity rises, costs fall, and revenues and ultimately profits grow. It is therefore important to select use cases at the start of a Data Science project with their suitability for productization in mind, and then to establish a systematic procedure for transferring them into permanent use. In this way, you will not only profit from a higher ROI on your Data Science initiatives but also increase the acceptance of these topics within your own company.


As data science software experts, we at pacemaker are your contact for the realization of your data project and the development of your individual data product. We accompany you from the idea to the seamless integration into your IT infrastructure and operational business processes.

Contact us with your project inquiry!
