The very model of a modern data platform
Data scientists have a tough job as it is, which is why they require a strong and effective analytics platform on which to work. Kelly Lu, a specialist in advanced analytics and artificial intelligence at SAS, unpacks the fundamentals such a platform should offer.
The prevailing view of data scientists is that their job is solely to build models, which significantly oversimplifies the role. In fact, most data scientists require a multitude of skills to do their work properly, as the job involves everything from data preparation to visualisation, and encompasses such skills as domain knowledge, communication, real-time deployment and much more. Add the countless new techniques and algorithms that emerge daily, and it's easy to understand why data science is a tough job.
Kelly Lu, a specialist in advanced analytics and artificial intelligence at SAS, explains that this means data scientists not only have to stay on top of the latest trends and industry knowledge, but essentially have to be a jack of all trades and a master of all of them too. This is because they are generally expected to have expert knowledge in all the fields mentioned above.
"Since data scientists are required to have such a wide range of skills, it is only fair of them to expect that a modern data science platform is designed to complement these skillsets. Ideally, you would want a platform that helps you to accelerate the analytics life cycle," she says.
"What I mean is that in the initial preparation phase, the user needs to be able to utilise any data, of any complexity, size or speed. This can include data from both traditional areas and emerging sources, like the social media platforms. Naturally, this means vast quantities of potential information, which is why you need a platform that can help you to streamline your data preparation."
Lu adds that the second stage - discovery - is usually where the data scientists have the most fun, and that the right platform will assist them by offering both breadth and depth of analytics, as well as programming language flexibility. In the end, the aim is to deliver analytics that are accessible to everyone, even non-data scientists.
"While many data scientists enjoy the coding aspect of the job, there are others who prefer to use visual interfaces. This is why flexibility is so important in the platform being used, as it should ideally offer both options, while also facilitating an environment that is as open as possible. This will enable users to combine the ease of use of open source technologies with the massive processing capabilities offered by the proprietary solutions.
"Finally, when it comes to the third stage, which is deployment, modern data science platforms should be designed to enable this in a faster and easier manner – some systems even offer one-click deployment – which means there is no need for the user to deliver the model to the IT department for deployment."
She points out that the other critical ability of a successful platform is that it lets the data scientist build once and then deploy anywhere, as this will have a significant and positive effect on their company's bottom line.
"Finally, you want a platform that is intuitive to use, but still offers a help page if the user gets stuck, and one that offers drag-and-drop capabilities that afford the user the opportunity to play around and to try different things. Once again, having such capabilities available will enable the user to accelerate the creation and deployment of data science models.
"The above-mentioned criteria are what most data scientists would consider to be the fundamentals of what is required of a platform, but very few of the current ones can offer all of the above. And what it boils down to is if you don't have all these fundamentals in place, then even the smallest of issues can create a considerable delay. The good news for data scientists is that the latest data science platforms that are coming onto the market are not only more fun to work with, but do meet the criteria above - something that will certainly make data scientists' jobs the world over that much easier," she concludes.