For a few years now, data science has captivated the minds of tech nerds and non-nerds alike. The amount of data that is being generated is staggering. You have probably seen this chart or something like it.
This explosion has led folks to dive headlong into what is now called “data science,” yet many people consider them little more than glorified statisticians.
What is happening?
To be fair, people have good reason to call themselves data scientists, because for the first time:
- We have generated so much data that meaningful insights can be drawn easily from the sheer volume, even with ordinary models. Some, in fact, say that it’s your data more than your model that makes you the winner.
- The explosion in data volume has coincided with the appearance of powerful GPUs, a topic this blog covered earlier. This made computing tasks faster and drew Silicon Valley as a whole into a subject that had mainly been statistician territory.
- That being said, a lot of math concepts that were considered unimportant in the nineties, because the first two points were absent, have resurfaced. Machine learning, for example, has been around since the sixties. AI, as a concept, was investigated by famous mathematicians like Alan Turing, after whom the famous Turing Test is named.
All these factors combined to make data science a discipline composed of three main areas: computer programming (for creating models), statistics (for the investigation of data metrics) and math (machine learning logic). This combination, hitherto unavailable, has made the profession highly sought after.
So what all does a data scientist do?
We can now take a closer look at some of the terms mentioned earlier. Let’s go from the simple to the complex:
- Simplifying or automating routine tasks, like combining lots of Excel sheets with one program script that does in seconds what might take you hours by hand.
- Cleaning data or converting it from one form to another. Again, this might take many passes between different software packages, but it can be very easy with data science software. Examples include taking a blog post like this one and putting it in a giant table, counting how many times I said “data,” or some other ‘transformation’ task.
- Creating metrics from business data, like a webpage or dashboard at a restaurant office showing how many orders have been placed so far, the average time to complete an order, and the average price. This is the more statistical part, but with code and user interface design, it can be made easy and intuitive.
- Making a simple forecast model of, say, next week’s stock price from a given company’s historical stock data. This is the simplest kind of AI: a basic regression machine learning model.
- Natural language processing, word classification, or image recognition with neural networks, which are specialized machine learning models loosely inspired by human brain cells.
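Two of the tasks above, counting how often a word appears and fitting a basic regression forecast, can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the sample sentence and the five prices are made up, and `count_words` and `fit_line` are hypothetical helper names, not part of any data science package. A real forecast would need far more data and a lot more care.

```python
import re
from collections import Counter


def count_words(text):
    """Lower-case the text, split it into words, and tally each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Transformation task: how often does "data" appear in a snippet of text?
# The sentence is a made-up sample, not taken from any real dataset.
sample_text = "Data science turns raw data into insight; more data often helps."
word_counts = count_words(sample_text)
data_mentions = word_counts["data"]

# Forecast task: hypothetical closing prices for five trading days.
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(days, prices)
forecast_day_6 = slope * 6 + intercept  # extend the fitted line by one day

print(data_mentions)   # 3
print(forecast_day_6)  # 20.0
```

The made-up prices rise by exactly 2 per day, so the fitted line simply continues that trend. Real stock data is noisy, which is why this counts as the simplest possible model rather than something to trade on.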
Sounds pretty cool, right?
It kinda is, actually. But where does all this go if it’s so hyped right now? Well, what goes up must come down. But don’t get me wrong, data science is here to stay. Some of you may not remember how hot Enterprise Resource Planning (ERP) systems were in the nineties. Companies like SAP, Baan, etc. trained thousands of individuals on their software for use in large firms. Are ERPs still used? Yes. Is ERP still white hot? No. Similarly, MS Excel was the best package for calculations in the same period. And maybe it still is, but no one calls themselves an Excel expert anymore because it is so common. Here are signs that the same thing is happening to data science:
- Data science tools are becoming drag-and-drop interfaces rather than intimidating command-line ones. Tableau and Alteryx are examples.
- Education for data science is getting democratized. Subjects like machine learning and deep learning are being taught at minimal prices and, at the other end, full-fledged degrees are being offered by Ivy League schools.
- Professionals without statistics, CS, or math degrees are also taking it up, since the information is no longer esoteric or too fundamental (though fundamental research is still taking place).
Now, data science is a vast subject, whereas Excel is a tool, so it may never be completely democratized. But the fact remains that in the future, a grandmother may tap a button on her smartphone and have a neural network predict how likely it is that her grandchild will visit her.
Does the future sound very analytical and robot-like? Don’t worry: these technologies will give our brains more time and energy for things that AI isn’t able to do yet, namely to introspect and contemplate.