Classification of data-science related jobs
Data science is progressively making its way into every sector of activity. Open positions for data scientists and related roles are increasing, and demand far exceeds what the market has to offer.
However, the spectrum covered by the job title "data scientist" is very wide and encompasses almost orthogonal jobs. Some require only basic knowledge of Microsoft Excel, while others require experience deploying and using a large Spark cluster. Some good answers have already been given on Quora.
In this article, I will try to classify some of the jobs related to data science. This classification is functional and may not fit every company.
This chart has two dimensions. The x axis represents the distance to the raw data and helps draw the distinction between the three main classes: engineering, science and business. The y axis represents the “bare-metal” distance, for lack of a better term: the closeness to the underlying infrastructure used to process and store raw data. The arrows represent the most common interactions between jobs.
A business analyst makes decisions based on a very high-level view of the data, in the form of plots and tables. This job is somewhat outside the scope of “data-science” jobs, but it is a useful reference point: the business analyst is far from the raw data and from any technological consideration of how the data is stored, replicated and analyzed. At the opposite end, a data engineer has to optimize the storage and retrieval of raw data, for instance in a NoSQL database.
In this chart, the term research scientist denotes the company equivalent of an academic researcher. The job consists of reading the literature to stay informed of the latest scientific discoveries, and of creating new models, tested against standard benchmarks, that could lead to a publication in a conference or journal. While a model can be tested on real company data, the principal goal of a research scientist is to contribute to general scientific knowledge.
We introduce the distinction between research engineer and data scientist here. The actual scalable implementation of a new algorithm or model in the company is done by the research engineer, who implements work coming from both the research scientist and the data scientist. In our definition, the data scientist develops new models and algorithms to extract new insights from the data at hand. The data scientist's job is to give the company a competitive edge and to report to either the data analyst or the business analyst.
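The interactions described in the paragraph above can be written down as a small directed graph. The following is only a minimal sketch: the names `INTERACTIONS`, `build_graph` and `receives_from` are hypothetical, and the edge list covers only the hand-offs stated in that paragraph, not every arrow of the chart.

```python
from collections import defaultdict

# Hand-offs stated in the text, as (source, target) pairs.
# Illustrative encoding only; the chart may contain other arrows.
INTERACTIONS = [
    ("research scientist", "research engineer"),  # models to implement
    ("data scientist", "research engineer"),      # models to implement
    ("data scientist", "data analyst"),           # reports insights
    ("data scientist", "business analyst"),       # reports insights
]


def build_graph(edges):
    """Map each job to the list of jobs it hands work to."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return dict(graph)


def receives_from(edges, job):
    """Return the sorted list of jobs that hand work to `job`."""
    return sorted(source for source, target in edges if target == job)


if __name__ == "__main__":
    graph = build_graph(INTERACTIONS)
    print(graph["data scientist"])
    print(receives_from(INTERACTIONS, "research engineer"))
```

Reading the classification as a graph makes it easy to ask who depends on whom: here the research engineer is the only job that receives work from two different sources.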
This concludes the first article on the subject. In a future article, I’ll try to illustrate the different skills each job requires, using my PhD work as an example.