Data Science – Make or Buy?
Companies are increasingly turning to data science and data analytics solutions to leverage the sea of data for their own business.
With data science, the question is therefore no longer whether to use it, but how. Current studies confirm this: the main obstacles that keep companies from using these advanced technologies are, first and foremost, a lack of expertise and a lack of personnel.
Anyone who wants to benefit from data science therefore inevitably faces the fundamental question of how to obtain the necessary skills. There are two basic options: building up data science resources internally (make), or purchasing external data science services (buy). It is not an easy decision. In this article, we take a closer look at both options, focusing on their advantages and challenges.
OPTION 1: MAKE
The make decision means developing the required data science skills internally. This can be done either by training existing employees or by hiring data experts. The plural "data experts" is deliberate: because data science initiatives are complex, a team of specialists usually works together on such a project. In addition to the data scientist, this includes, for example, the data engineer, the machine learning engineer and the software developer. You can find a brief overview of what is behind the individual roles in our blog article "The right mix of roles and competencies for a successful Data Science project".
MAKE DECISION: WHAT ARE THE ARGUMENTS IN FAVOR?
There are clear benefits to building internal data science resources. The main reasons include the following:
INTERNAL COMPETENCE BUILDING.
Competencies are built up within the company for the long term and can be called upon at almost any time. Data science competencies can thus be used in every area of the company and, given good internal networking, across departments as well.
The company's own employees know the business in a way that an outsider rarely can. This is particularly advantageous when implementing use cases that require very specific domain knowledge.
Since data science resources are often scarce, it makes sense to have a data strategist in the company who can specifically prioritize internal needs and the associated use of resources according to the data strategy. This starts with questions about the most important data sources in the company and extends to the prioritization of concrete use cases to be implemented. In addition, to really get started with Data Science, you need the right tools (both in terms of IT infrastructure and technology stack). Here, too, it is advantageous if the selection of the right tools is made strategically by a responsible person.
Regardless of the advantages, the internal development of data science capacities also brings with it certain challenges.
MAKE DECISION: WHAT MAKES BUILDING A DATA SCIENCE TEAM SO HARD?
In order to achieve the hoped-for value of an in-house data science team, there are a number of hurdles to overcome, which we would like to briefly outline below.
SHORTAGE OF SKILLED AND EXPERIENCED STAFF.
Hiring data experts and building your own data science resources is no easy feat given the shortage of skilled staff. In fact, the number of job openings for IT professionals reached a new record high in 2019, with more than 124,000 open positions. These professionals include, first and foremost, software developers, but also data scientists. So those looking to build their own data science capabilities are in a battle for talent.
It is even more difficult to recruit data experts with several years of professional experience. The reason? The job profiles are still relatively new, and universities and colleges are for the most part only now beginning to train data scientists.
Training and further education of existing employees offer an alternative that requires an investment of time, but not necessarily a large financial outlay. In this context, the term Citizen Data Scientist, coined by Gartner, is often used for subject matter experts who are developed in this way. It describes people who have a basic understanding of working with data as well as some statistical and mathematical knowledge, and who are thus able to take on certain analytical tasks of the data scientist. Citizen Data Scientists do not replace data scientists, but they can take on a complementary role and, most importantly, work more productively with data experts.
PUTTING THE RIGHT TEAM TOGETHER
When building and assembling your Data Science resources, you should also consider the state of your company's existing IT infrastructure and the framework already in place. All these things have an impact on the required roles and the size of the team.
For example, if your company already has mature data management and a well-maintained data warehouse, the basic requirements and the starting point for the work of a data scientist are already in place. If this is not the case, the data is often still scattered across the respective source systems, and the appropriate groundwork must be laid before a data scientist can get started. This task requires its own experts, whom you can of course hire, but this in turn increases the size of the team and thus the costs.
INTEGRATION OF THE DATA SCIENCE TEAM IN THE COMPANY
Anyone who decides to set up their own data science team also faces the question of how the team should be integrated within the company. There are a number of options for this, ranging from completely decentralized integration of the experts in the respective specialist department to the establishment of a central unit in the form of a Center of Excellence (CoE), for example. Whether centralized or decentralized integration is the right option depends on various factors. Recommendations often differ here, and it is important to weigh up the advantages and disadvantages of the various approaches individually for your company.
The decision to build up your own data science resources is primarily a strategic one. Therefore, the type of integration and the required expertise in the form of the various data experts ultimately depend significantly on the goal pursued, i.e., the data strategy of your entire company.
OPTION 2: BUY
The alternative to an in-house department is the purchase of external services. The possibilities here range from purchasing standard software as a finished product to custom software that is tailored to your needs. This also includes consulting and data science projects that are carried out on a one-off basis to answer a specific question.
BUY DECISION: WHAT ARE THE ARGUMENTS IN FAVOR?
Purchasing external services provides quick access to the far-reaching benefits of data science solutions, usually at a fraction of the cost and, more importantly, time that in-house development would require. However, the reasons in favor of the buy approach are far more diverse. We have briefly summarized the most important ones:
First of all, one major advantage is clearly that external service providers have the sought-after data experts and can bring the optimal team and thus the right mix of required competencies to each project. In addition, external service providers can usually draw on many years of project experience with diverse customers and thus have a range of best practices at their disposal.
By working with external experts, you can learn a lot on both a technical and methodological level, for example with regard to the development and evaluation of use cases or the moderation of data workshops.
Often, certain, usually very specific competencies, such as for setting up a data lake or in the area of deep learning, are only needed sporadically. Instead of building up these competencies internally, it is more economical to buy them externally.
Another advantage of the buy option is the very short start-up time. Projects can start virtually immediately and the onboarding process for implementing standardized solutions often takes place within a few weeks.
FASTER ROI THROUGH SCALING
Particularly when starting out in data science, initial expenses are often necessary, e.g. for setting up a data warehouse or data lake, before the value-creating implementation of data science use cases can begin. To achieve a faster ROI, use case implementation can be scaled with an external service provider: for example, one use case is implemented internally while one or more others are implemented externally in parallel.
In contrast to the make option, the buy approach has a significantly lower capital commitment, as no major investment in personnel, tech stack and infrastructure is required.
External service providers are the ideal way to get started and make initial contact with data science. With the help of service providers, use cases can be quickly identified and certain structures established. By implementing projects that are often small and manageable at the beginning, experience and trust can be built up and acceptance for the topic can be created in your company.
BUY DECISION: WHAT CHALLENGES ARE THERE TO CONSIDER?
The buy approach also brings with it certain challenges that must not be ignored. Specifically, the following two factors are decisive here:
CHOOSING THE RIGHT PARTNER
Choosing the right data science service provider is the most important decision and therefore a challenging task. A service provider or vendor should not only have excellent data science competencies, but ideally also bring comprehensive industry and subject matter expertise. It is therefore advisable to ask for reference projects to find out who the service provider has worked with in the past, what specific challenges were solved and what results were achieved.
When purchasing external data science services, the following always applies: The better the service provider knows your company, your strategic goals, your internal processes, etc., the better the chances of success. A good service provider therefore takes time for mutual exchange and listens carefully in order to implement your wishes and ideas in the best possible way.
ONLY LIMITED COMPETENCE DEVELOPMENT
One disadvantage of the buy approach is that the company's own competence development is naturally lower and therefore there is a certain dependence on external partners. However, it should be noted here that good service providers pay great attention to knowledge transfer and promote the development of knowledge through targeted support.
As with so many things, there is no universal answer to the make or buy question. The simple answer is: If data science plays a central role in the corporate strategy and is part of the core business, then make is the right way to go. In most cases, however, the goal is to implement solutions that serve to improve internal business processes or support non-core business activities. In these cases, buy and the use of already proven products is usually the better option.
The long answer to the question is that finding the right solution depends on your strategic goals, your requirements, and your time and budget constraints. To determine the best option for your organization, you should consider the following factors, among others:
- Your business problem to solve
- The analytical maturity of your company
- Your availability of employees with appropriate skills
- The urgency to implement the solution
- Your available budget
And our final recommendation? It does not always have to be a make OR buy decision. Our experience has shown that the make AND buy path often proves promising: building internal competencies that cover daily needs, and relying on external support for more complex data science projects.