Choosing which film to watch at the end of a long day can get to the best of us. S. Anand, an avid movie-watcher, decided to do something about it. After exhausting the IMDb top 250 list, this data scientist by day spent his nights expanding his movie database. Over several months, Anand catalogued every film he could find, going back to the early 1920s, in a piece of software. The result: all the films one might ever want to watch, on a single platform. He’s made it public (https://gramener.com/imdb/?#). Now, if you’re in a fix over what comedy to watch at night, all you need to do is type in the genre and the era, and take your pick.
While looking at the social and economic demography of a locality for a client who wanted to open a convenience store in the area, Laveesh Bhandari of Indicus Analytics realised that the income of the store would increase if it sold wine. The reason: many residents in the upmarket area were wine-drinkers. A detailed analysis of all the shops revealed that none were selling wine in the area. Residents had to go to stores further away from home to pick up their stock. Today, that shop is doing roaring business.
Sounds simple? Well, both Anand and Bhandari are smart guys. They are data scientists—mathematicians, statisticians, computer engineers, all rolled into one. Those belonging to this tribe tease out insights and patterns from big numbers. In doing so, they uncover opportunities and correlations people might not have seen or suspected before. Working behind the scenes, these data nerds draw out the essence of how crowds behave.
Laveesh Bhandari, founder of Indicus Analytics
That’s why the insurance salesman calling you knows exactly what product to sell you. It is thanks to targeting and technology that all the advertisements that pop up on your browser somehow almost perfectly fit your requirements. Similarly, a data-cruncher will help a restaurant opening shop in a locality to know exactly how much to charge for a burger. Using all these numbers, these data-crunchers have become adept at working the system (see graphic to catch their tips to get ahead of the crowd).
Most scientists in the profession have devised such shortcuts in their personal lives too. Abhishek Vaid, the co-founder of Frrole, a data analytics firm based in Bangalore, uses his expertise to find the best deals for gadgets online. He reveals that whenever a website has a sale, the best deals are always available on its competitors’ websites. Another area is managing one’s finances. Vivek Murugesan, a data scientist based in Chennai, couldn’t figure out how and where his money was being spent. So, using his analytical skills, he charted every transaction he had made over the previous three years to understand his spending patterns. He could cut costs substantially after that.
Data science is one of the fastest growing industries in India with an expected net worth of $2.3 billion by 2017, according to a report published by NASSCOM. India currently holds over 35 per cent share in the global analytics market, which is expected to only grow bigger in the future. As the industry grows, companies will look at more innovative ways of selling products. “Uncovering trends in a certain economy is always important as it helps plan ahead,” says Vaid. “The main job of a data scientist is to find such trends and get a more nuanced look at what the future holds.”
Neeraja Vaidya, analytics engineer at Aureus Analytics
A data scientist’s strength also lies in looking beyond the obvious. For instance, Vaid was spearheading a research project for a phone company. The focus was on discovering the features customers look for in their smartphones. While conducting the analysis, his team stumbled upon something strange. The data revealed that most smartphone users in tier 2 and tier 3 cities consider 4G to be an inherent feature of the phone rather than a service offered by the network carrier. The findings, when submitted to the client, became the basis of its next campaign, a hugely successful one at that.
The rise in demand for such specialists has also been unprecedented. Puneet Vanvaria, CEO, Corner Office Advisors, an HR consultancy firm, says demand for such professionals will increase by almost three times in the next three years. “There is already a huge gap between demand and supply. An average scientist with 2-3 years of experience is today paid anywhere between Rs 15 lakh and Rs 25 lakh per annum,” he says.
Quite unlike other professions, data science believes that quantity is much better than quality. The idea is simple: the more data there is to play around with, the more correlations one can draw. Anand, who works at Gramener, says that the aim of a data scientist is not to draw one single conclusion from the data but as many conclusions as possible. “We routinely analyse correlations between every parameter,” he says. He explains this with a personal example. While analysing exam marks in Tamil Nadu, he found by chance that students with north Indian surnames such as Jain, Shah and Agarwal scored significantly higher (85 per cent) than the average (65 per cent). “This was one among dozens of other less useful correlations, and was not something that we were looking for but was more insightful than any of the other patterns.”
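To make "correlations between every parameter" concrete, here is a minimal Python sketch of the idea. It is not Gramener's actual method: the column names and numbers are made up for illustration, and the helper functions `pearson` and `all_pairwise_correlations` are hypothetical. It simply computes the Pearson correlation for every pair of columns and ranks the pairs by strength.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant column; rank it last.
        return 0.0
    return cov / sqrt(var_x * var_y)


def all_pairwise_correlations(columns):
    """Return (name_a, name_b, r) for every pair of columns, strongest first."""
    results = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(results, key=lambda item: abs(item[2]), reverse=True)


# Made-up illustrative data, not taken from the article.
columns = {
    "hours_studied": [2, 4, 6, 8, 10],
    "marks": [50, 60, 65, 80, 90],
    "commute_km": [12, 3, 9, 5, 7],
}

ranked = all_pairwise_correlations(columns)
for name_a, name_b, r in ranked:
    print(f"{name_a} vs {name_b}: r = {r:+.2f}")
```

With only three columns there are three pairs to look at; with a few hundred columns there are tens of thousands, which is why the interesting pattern is often one nobody set out to find.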
Software systems play a huge role in making such large-scale computations possible. While several tools for analytics are available (Python, R and Julia, to name a few), most companies prefer developing their own systems. Each data analytics company deals with a different set of clients with different needs. Systems that process data need to be self-sufficient and capable of running on autopilot for more efficient output.
S. Anand, chief data scientist, Gramener
Incidentally, quantity, along with being preferred, also proves to be a big drawback in analytics. That’s not surprising given the huge debate over the quality of GDP numbers in India. “The wide mass of data available to us means that we have to clean it in order to start analysis,” says Bhandari, now chief economist at Nielsen, following its acquisition of Indicus. Most companies collect data either from a primary source or from the company they offer services to. It becomes imperative for scientists to ensure that the data used to run algorithms and reach conclusions is accurate. That’s why Frrole and Gramener employ teams to gather and assess data from primary sources such as social media, and sometimes even conduct research studies to extract information.
The danger is that irregular data can often result in inaccurate conclusions, and hence in clients getting the wrong advice. Bhandari remembers an instance where the data they were using had not been cleaned, resulting in a correlation between a certain piece of furniture and the price of pork in the market. “There was no reason for it to happen and it did not make sense, but the data simply gave that conclusion,” he chuckles, explaining that one must never underestimate the level of inaccuracy data may contain. For this reason, most scientists use tools to adjust data for variation before running analytics on it.
Correlation between data is used to target customers. So, if you search for an aftershave on an e-commerce site, it is likely you will be served up advertisements for male grooming products. However, if one studies a set of data long enough, all kinds of correlations are bound to crop up. For instance, companies have found a correlation between the per capita consumption of fruit and the number of civil engineering doctorates awarded in the country!
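Why do such meaningless correlations turn up? A small simulation helps. The Python sketch below is an illustration under stated assumptions, not any company's analysis: it generates 100 series of pure random noise, each 10 points long (both numbers, the seed and the helper `strongest_chance_correlation` are arbitrary choices made here), and then searches every pair for the strongest correlation. None of the series has anything to do with any other, yet with nearly 5,000 pairs to compare, a strong-looking correlation appears by chance alone.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_series=100, length=10, seed=0):
    """Largest |r| found among pairs of independent random series."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    series = [
        [rng.gauss(0, 1) for _ in range(length)]
        for _ in range(n_series)
    ]
    # Compare every series with every other one and keep the strongest link.
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


strongest = strongest_chance_correlation()
print(f"Strongest |r| among unrelated series: {strongest:.2f}")
```

The more pairs one compares, and the shorter each series, the more impressive the best accidental match looks. This is one reason the fruit-and-doctorates kind of finding needs a sanity check before anyone acts on it.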
As a result, data science becomes dependent on domain knowledge. “Specialisation makes it much easier for an analyst to understand which data is important,” says Neeraja Vaidya, a data scientist at Aureus Analytics in Bangalore. However, Piyush Sagar Mishra, apprentice leader at Mu Sigma, says that while expertise is desirable, making the right correlations depends more on asking the right questions.
The competition for talent has grown so sharply that companies have now started their own colleges for data analysts. “The gap between demand and supply for a data analyst is about 1,000 to one and this is primarily because of the lack of well-trained analysts in the country,” says Murugesan. Mu Sigma already runs an analyst school, and companies such as Microsoft and Gramener have followed suit.
For all the buzz, there is a lot that plagues the data industry. According to Bhandari, the biggest problem is that the IT crowd, which develops software and computation methods, remains “extremely lazy”. But all said, if there’s one way to figure out what makes us do what we do, data science should give us some answers. Hopefully.