Since Harvard Business Review named the data scientist the “sexiest job” of the 21st century, data scientists have often been regarded as universal geniuses. However, much more than just a data scientist is needed to generate real added value from data and to make a data science project a success. It requires the interplay of different roles, abilities and competencies. We'll tell you which roles are needed, who is responsible for what and what is behind the new data jobs.
Which roles and competencies are required in a data science project?
Carrying out a data science project is a complex process with multiple phases and steps that requires the collaboration of an entire team. In order to successfully carry out all tasks of a data science project, various roles with different specialized knowledge are required. A role does not necessarily correspond to one person. Especially in smaller projects, several roles are usually taken on by one person. The graphic below gives an overview of the required roles and when they are needed during a data science project.
The main roles of every data science project include the employees of a specialist department (the domain experts), the data scientist, the data engineer and the software engineer. Their tasks within a data science project are explained in more detail below.
Would you like to know more about how a typical data science project works and what special features it entails? We explain step by step how to make your data science project a success.
What does a data scientist do?
As a problem solver and solution developer, the data scientist plays a central part in the entire data science project. The data scientist works closely with specialist departments as well as with management and must identify, abstract and implement their needs. Their job is to make use of the available mass of data. To do this, data scientists must be able to deal with precisely these large and heterogeneous amounts of data by extracting, cleaning, merging, processing and analyzing data from different sources, with or without a predefined objective (exploratory).
In a more specific sense, the data scientist is primarily responsible for a well-thought-out analysis strategy, the selection of the methodology, the execution of the analyses and the interpretation and visualization of the results. As far as the analysis part is concerned, data scientists are experts in the application of methods from the fields of artificial intelligence and machine learning. In addition to the actual analysis of the data, data scientists are also familiar with data collection and integration as well as the development of software products. In order to be able to support company management in making future business decisions, a basic set of “soft skills” is also required, meaning that communication and presentation skills must not be neglected by a data scientist. It is important to make complex analysis procedures understandable for specialist departments and, above all, to present the resulting benefits in a way that is easy to understand.
Due to the variety of tasks, the responsibilities of a data scientist are often divided into more specific roles, especially in larger projects, including the machine learning engineer (specialized in training and optimizing machine learning models) as well as the data engineer.
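The extract, clean, merge and analyze steps described above for the data scientist can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two CSV snippets, the column names and the function names are invented for this example and stand in for real source systems, which would usually be much larger and handled with dedicated tooling.

```python
import csv
import io
from statistics import mean

# Hypothetical raw exports from two source systems (made-up data).
ORDERS_CSV = """customer_id,amount
1,120.50
2,
3,80.00
1,35.25
"""
CUSTOMERS_CSV = """customer_id,segment
1,retail
2,wholesale
3,retail
"""


def extract(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(orders):
    """Drop rows without an amount and convert the remaining values to numbers."""
    return [
        {"customer_id": int(r["customer_id"]), "amount": float(r["amount"])}
        for r in orders
        if r["amount"].strip()
    ]


def merge(orders, customers):
    """Attach each customer's segment to their orders."""
    segment_by_id = {int(c["customer_id"]): c["segment"] for c in customers}
    return [{**o, "segment": segment_by_id[o["customer_id"]]} for o in orders]


def analyze(rows):
    """Compute the average order amount per customer segment."""
    by_segment = {}
    for r in rows:
        by_segment.setdefault(r["segment"], []).append(r["amount"])
    return {segment: mean(values) for segment, values in by_segment.items()}


def run_pipeline():
    """Run extract, clean, merge and analyze in sequence."""
    orders = clean(extract(ORDERS_CSV))
    customers = extract(CUSTOMERS_CSV)
    return analyze(merge(orders, customers))


if __name__ == "__main__":
    print(run_pipeline())
```

In this toy data set the order without an amount is discarded during cleaning, so only the segments that still have valid orders appear in the result. In a real project, the decision to drop or repair such records is exactly the kind of question that should be settled together with the domain experts.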
What does a data engineer do?
The data engineer has a more technical focus than the data scientist and provides the required IT infrastructure. Data engineering is often a specialization within software engineering. As part of a data science project, data engineers are particularly active in the early and late phases, as they create interfaces to the relevant systems. Data engineers therefore work primarily with databases and data warehousing tools and are at home in big data ecosystems and cloud environments. At the beginning of a data science project, they bring together, process and enrich all necessary data from the various sources and make it available for the subsequent analysis steps. At the end of a data science project, they are responsible for ensuring the seamless and lasting integration of analysis results into day-to-day business operations and processes.
What does a software engineer do?
The software engineer comes into play as soon as it is time to convert an analytical prototype into a data product. A successful data science project is therefore followed by a software development project. A software engineer works closely with user experience engineers and designers to design and develop user-friendly applications and software solutions so that users can permanently benefit from the analysis results. In addition, there is a close exchange with the data engineers to ensure a flawless flow of data into the software solution and back again.
What does a domain expert do?
The domain experts are usually not analysis experts, but they know the problem, the needs and the context of the issue better than anyone else. Depending on the application, this could be, for example, a marketing expert, a supply chain manager or a machine manufacturer. They are particularly important at the start of a project (business understanding) to ensure that the underlying business problem is well understood. However, their input is also extremely important in the phases of data understanding and data preparation.
What does a project manager do?
The project manager plans and coordinates the entire process of the data science project. In addition to traditional project management skills, this requires a good understanding of the technical and methodological features of a data science project as well as knowledge of the application domain. Agile project management is particularly suitable for meeting the specific requirements of a data science project. An agile approach enables regular exchange and feedback between stakeholders, in particular data scientists and domain experts, throughout the process. The advantage?
- Through the exchange with the specialist departments, data scientists learn more and more about the departments' day-to-day business and its special features. This is crucial in order to represent reality in model form in the best possible way.
- The department is always up to date and also gets insights into the steps and challenges of a data science project, which serves to transfer knowledge and build up know-how.
Conclusion
Data science is teamwork! For a data science project to be successful, it is important to have the right team composition: each role brings a wide range of competencies that complement each other and thus make a valuable contribution to achieving the joint goal. Putting together an interdisciplinary team is therefore decisive for success, but it is not everything. Success also depends on agile cooperation between the individual areas of responsibility across the various phases, because the phases are not self-contained but closely intertwined. Transparency, understanding of others' tasks, transfer of knowledge and communication on an equal footing therefore play a decisive role. In this way, the potential of data treasures is developed cooperatively.