According to a study by Gartner, more than half of all data science projects are never fully operationalized. Yet this step is the key to using data profitably in the long term and realizing the actual added value of data science projects. So that you don't fall into the same trap, this article explains what operationalization is all about, highlights how it differs from a data science project, and uses the data science operationalization cycle to show what to look out for and which questions need to be clarified in advance.
What is operationalization?
Definition: The operationalization of data science projects describes the permanent integration of data science and analysis results into a company's IT infrastructure and operational business processes. Operationalization therefore refers to the continuous provision of data science solutions for end users or, as we say at pacemaker, the path from data project to data product.
To get a better understanding of what data products can look like, here are a few examples from our own practice.
Forecasting: Development of forecasting software for a logistics service provider handling the online retail of a large fashion retailer, in order to predict orders and returns and thus plan e-commerce logistics optimally.
Product insights: Development of an interactive dashboard for continuous evaluation of customer feedback and device data for a 360° view of product and service quality for a manufacturer and distributor of household electronics.
Dynamic route optimization: Development of dynamic route planning software for a pharmaceutical logistics specialist in order to save time and costs through optimal route planning and to ensure faster care for hospitals and patients.
Data products can therefore take very different forms and create real added value in a wide range of application areas. It is important to note that not every use case needs to be translated into a data product. Some data science projects, for example, serve only as a one-off basis for a decision and do not require a permanently operated data product.
What is the difference between a data science project and an operationalization project?
A data science project and an operationalization project are in fact two different things with different requirements and goals, a fact that many end users are unaware of. But what exactly is the difference between the two?
While the focus of a data science project is usually on testing the feasibility of certain use cases, the so-called proof of concept (POC), and on developing analysis models, operationalization is about developing data science software solutions that permanently integrate the analysis results into everyday business. The operationalization process therefore picks up where the data science project ends: a successful data science project becomes a software development project. The aim is to develop a software solution that meets the requirements of day-to-day business. It is therefore important to check whether the assumptions made within the framework of the POC also hold in the productive environment and with continuously updated data. This is also referred to as a proof of scale (POS).
The chart below provides an overview of the key questions examined in each of the two projects and the main differences between them.
Similar to the CRISP-DM procedure for data science projects, the successful implementation of data products requires a systematic approach. In fact, according to Gartner, the lack of a systematic operationalization methodology is one of the main reasons why productification fails.
How does a successful operationalization process work?
The operationalization process is an ongoing cycle. The chart below gives an overview of the entire process. The individual steps are briefly explained in more detail below.
Model Activation marks the start of and transition to the operationalization cycle. Here, the results of a data science project are handed over in the form of a "production-ready" model: the data scientists pass the results of their work on to the team of software engineers.
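What such a hand-over can look like varies from team to team. As a minimal sketch, assuming a scikit-learn model (the forecasting scenario, feature names, and file names below are purely illustrative), the trained model can be serialized together with its metadata so the engineering team can load and version it independently of the training notebooks:

```python
# Minimal hand-over sketch: persist the trained model plus the metadata
# the engineering team needs for versioning. All names are illustrative.
import json
import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Illustrative training data; in practice this comes from the data science project.
rng = np.random.default_rng(42)
X_train = pd.DataFrame(
    rng.normal(size=(500, 3)),
    columns=["orders_lag_1", "orders_lag_7", "promo_flag"],
)
y_train = 0.8 * X_train["orders_lag_1"] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_train, y_train)

# Serialize the model artifact and a metadata file describing it.
joblib.dump(model, "order_forecast_v1.joblib")
with open("order_forecast_v1.meta.json", "w") as f:
    json.dump({"model_version": "1.0", "features": list(X_train.columns)}, f, indent=2)
```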
Model Deployment & Application Integration serve to ensure operability and integration in a productive environment. This step includes deciding on hosting (whether on-premises, in the cloud, or a hybrid solution), defining the data delivery and how to integrate the analysis results. With the latter, the options range from integration into existing systems to a specially developed application. It is important to set up efficient data pipelines that ensure a flawless data flow.
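As one hedged example of application integration, the handed-over model could be wrapped in a small REST service that existing systems call over HTTP. The following minimal sketch uses Flask; the endpoint, model file, and feature names are carried over from the illustrative hand-over above rather than taken from a real project:

```python
# Minimal deployment sketch: expose the handed-over model as a small
# REST service that existing systems can call via HTTP.
import joblib
import pandas as pd
from flask import Flask, jsonify, request

FEATURES = ["orders_lag_1", "orders_lag_7", "promo_flag"]

app = Flask(__name__)
model = joblib.load("order_forecast_v1.joblib")  # artifact from the hand-over

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    row = pd.DataFrame([payload])[FEATURES]  # enforce the expected column order
    forecast = float(model.predict(row)[0])
    return jsonify({"forecast": forecast})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

An ERP or planning system can then retrieve forecasts with a plain HTTP call, which keeps the integration loosely coupled from the model itself.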
Production Audit & Model Behavior cover the technical monitoring of performance and data science results. In this step, monitoring and notification mechanisms are implemented that systematically draw attention to deviations or other anomalies. This can be done through ad-hoc analyses or through the continuous recording and monitoring of predefined metrics, such as response and running time or the accuracy of the analysis results. For example, a notification can be sent as soon as a defined quality metric falls below a certain threshold, as sketched below. All of these measures serve to build trust in the solution and to ensure that continuous added value is achieved.
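Such a notification mechanism can be as simple as a scheduled check of a quality metric against an agreed threshold. This minimal sketch assumes a forecast evaluated by mean absolute error; the threshold value and the notify function are illustrative placeholders for a real alerting channel:

```python
# Minimal monitoring sketch: evaluate a predefined quality metric on
# recent, labelled production data and notify when it breaches the
# agreed threshold. Threshold, values, and notify target are illustrative.
from sklearn.metrics import mean_absolute_error

MAE_THRESHOLD = 15.0  # agreed quality bar for the forecast

def check_model_quality(y_true, y_pred, notify):
    """Compute the MAE and trigger a notification if it is too high."""
    mae = mean_absolute_error(y_true, y_pred)
    if mae > MAE_THRESHOLD:
        notify(f"Forecast quality degraded: MAE {mae:.1f} > {MAE_THRESHOLD}")
    return mae

# Dummy run; in production this would be a scheduled job.
check_model_quality(y_true=[100, 120, 90], y_pred=[130, 95, 60], notify=print)
```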
KPI Validation serves as the link between the data science and the operationalization cycle. Everything here is about comparing the requirements of the solution with the actual results from a user and business perspective. These KPIs were already defined at the start of the data science project as part of Business Understanding. This step is important because ensuring continuous business value is paramount.
If the desired results are not achieved, this can have various causes, some of which require a new run through the data science project cycle. Depending on the cause, it is decided which step to return to:
Business drift: The model is no longer delivering the desired business value. Among other things, this may be due to changing market conditions, which do not necessarily have a direct impact on the data. In this case, it may be advisable to reassess the originally defined KPIs.
Data drift: Discrepancies between the data that was used to build the model and the data actually used in production can result in the model not performing as intended. If the underlying data changes, it may be necessary to go back to data collection. A change in the data may also be due to problems with data quality. While the data in a data science project is cleaned manually as part of data preparation, so that exceptional cases can be handled individually, the data preparation of a data product is largely automated. Data quality problems therefore weigh all the more heavily in operationalization, and the process may have to return to data preparation. To avoid this, it is important to take appropriate measures in advance to ensure data quality over the long term; a minimal drift check is sketched after this list.
Concept drift: The patterns the model has learned no longer match reality, which reduces model performance. If radical changes in the patterns observed in the data become apparent, it may be necessary to go back to modelling and manually revise the analysis models.
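One way to detect data drift early, sketched here under the assumption of numeric input features, is to compare the distribution of each feature in production against its distribution in the training data, for example with a two-sample Kolmogorov-Smirnov test (the synthetic data and significance threshold below are illustrative):

```python
# Minimal data-drift sketch: compare a feature's distribution in
# production against the training data with a two-sample
# Kolmogorov-Smirnov test. Synthetic data and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=1000)    # data the model was built on
production_feature = rng.normal(loc=0.5, scale=1.2, size=1000)  # shifted live data

statistic, p_value = ks_2samp(training_feature, production_feature)
if p_value < 0.01:
    print(f"Possible data drift detected (KS statistic {statistic:.3f})")
```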
As these explanations show, operationalization is a permanent undertaking. After an initial set-up, the developed data product goes into regular operation, which is subject to continuous maintenance and support. So-called Software as a Service (SaaS) models have proven effective for this purpose.
If this systematic process is followed, a major hurdle of operationalization has been overcome. However, productification is and remains a complex process with a wide range of challenges, both technical and organizational. Finally, let us therefore take a look at the main reasons responsible for failure.
Why do so many data science projects fail due to operationalization?
Even though the reasons may vary, the following emerge as decisive:
- Discrepancy between training and production data.
- Poor integration of the solution with business processes or applications.
- Distrust and skepticism about using the solution.
- Wrong composition of the operationalization team.
So that you don't fall into the same traps and successfully implement your data product, there are a few questions that should be clarified in advance. These include the following:
- Who uses the analysis results, and how, in everyday work? In other words, how must the analysis results be provided in order to best support the end user?
- How and how often does the data transfer take place? And how often is it necessary to update the results? In other words, how is a flawless data flow ensured, both on the part of data input and data output?
- How do you ensure that the analysis models work optimally? In other words, how is the quality of the models monitored and how are deviations and deteriorations responded to?
Successful implementation requires the cooperation of an interdisciplinary team. In addition to the specialist departments, a representative of the IT department should be involved on the company's side, because he or she knows the IT infrastructure into which the data product is to be integrated like hardly anyone else.
Conclusion
Operationalization is the key to using data profitably in the long term and realizing the actual added value of data science projects. If this step succeeds, productivity can be increased, costs reduced, and sales and ultimately profits grown. It is therefore important to select use cases with regard to their productifiability right from the start of a data science project and then to establish a systematic procedure for transferring them into long-term use. In the end, you not only benefit from a higher ROI on your data science initiatives, but also increase acceptance of these topics within your own company.
We develop your data product
As data science software experts, we at pacemaker are your partner for implementing your data project and developing your individual data product. We support you from the initial idea to seamless integration into your IT infrastructure and operational business processes.