Two years ago, Pooja plunged into a world of unusual data. Since then, she has persevered in the science of problem-solving, carving out complex insights from massive databases. At Zapr, she has made great friends out of good mentors, all the while bridging gaps between big data, engineering and business insights.
When I first heard of Zapr, I got really curious about our TV viewership repository. I’d never heard of a technology that could track media consumption at such a huge scale, in fact the largest in the world. So I moved cities, accepted the shock of not being able to communicate with everybody in Hindi (I was so used to it) and got on board.
Every single day there was a problem. And every day the problem was new.
Getting the basics right:
My work primarily involves extracting TV viewership data and insights from Zapr’s 40 million+ user base. The first challenge was deciding what to learn: I had to figure out the crux of every request and the technical components that went into formulating a query. With no set formula or system in place, I realized that I was performing the same task multiple times to get similar results.
Agam, my then mentor, pointed out ways to tap into multiple resources and run things far more efficiently. For large numbers, I started using Druid, a high-performance, column-oriented, distributed data store that computes aggregates quickly, in part by deriving approximations. I learnt that I could automate half of my data queries with Python. Next, I learnt how to write Bash shell scripts from our DevOps team. Ad-ops helped me put together reports and turn them into insights for our clients. The best part was that I could ask anybody for help at any point and learn new tools on the go.
Unboxing goodies @Zapr Hackathon
The quest for a single-source solution: Automation, the new mantra
Initially, each new request required tapping into various data stores such as Druid, Mongo, Hadoop and Redis, to name a few. Over time, the number and complexity of our tasks increased, and the volume of data we were handling grew to 1 TB each day, including over 100 GB of viewership records. When it’s just ten people, it’s okay to work piecemeal. But when we scale up as rapidly as we’ve done at Zapr, things need to be automated and readily available.
That’s how we got around to building the data lake using Hive, which can pull out the most granular insights across Zapr’s entire repository. We worked for months and finally built a one-stop platform where both tech and non-tech teams could access data smoothly.
Over time, we have refined the way we project our TV viewership data onto the TV-watching population. We reduced the turnaround time for our data insights projects and improved the accuracy of the data. We started delivering deeper and more interesting insights from the data available to us. Back then, the content team faced the most issues, as client requests were always given priority. To solve this, I created a dashboard for them, and from then on they started working on a new blog every alternate day!
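The post does not say how viewership is projected onto the TV-watching population. One common approach is stratified weighting, and the sketch below illustrates only that general idea. The strata names, population figures, user IDs and function names are all hypothetical, not Zapr's actual method or data.

```python
from collections import Counter

def stratum_weights(panel_strata, population_by_stratum):
    """Weight of one panel user = people in the stratum / panel users in it."""
    panel_counts = Counter(panel_strata.values())
    return {
        stratum: population_by_stratum[stratum] / count
        for stratum, count in panel_counts.items()
    }

def project_viewers(viewers, panel_strata, weights):
    """Estimate population viewers by summing the weights of panel viewers."""
    return sum(weights[panel_strata[user]] for user in viewers)

# Hypothetical panel: user ID -> stratum (assumed labels)
panel_strata = {
    "u1": "metro", "u2": "metro",
    "u3": "rural", "u4": "rural", "u5": "rural",
}
# Hypothetical TV-watching population per stratum
population_by_stratum = {"metro": 1000, "rural": 3000}

weights = stratum_weights(panel_strata, population_by_stratum)
# Panel users who watched a given show (assumed)
projected = project_viewers({"u1", "u3", "u4"}, panel_strata, weights)
print(projected)
```

Here each metro panel user stands in for 500 people and each rural panel user for 1000, so a show watched by one metro and two rural panel users projects to an estimated 2500 viewers.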
Working across teams and bridging the gaps:
On a daily basis, I interact directly with both the Engineering and Revenue teams. It’s like getting the perspectives of both sides of the coin. I’m able to gauge business expectations and then relay them to our engineering teams to figure out back-end solutions. Initially, I made my own checklist to validate whatever data I worked on. Now I’m able to preempt many issues that could go wrong in the data pipeline, almost instinctively!
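A validation checklist like the one mentioned above can be turned into a small script. The following is only a sketch of that idea. The record fields (user_id, channel, start, watch_seconds) and the specific checks are assumptions made for illustration, not the actual checklist used at Zapr.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def validate(records):
    """Return a list of (record_index, problem) pairs for a batch of records."""
    problems = []
    seen = set()
    for index, record in enumerate(records):
        # Check 1: required identifiers are present and non-empty
        if not record.get("user_id"):
            problems.append((index, "missing user_id"))
        if not record.get("channel"):
            problems.append((index, "missing channel"))
        # Check 2: watch time is a number within a single day
        seconds = record.get("watch_seconds")
        if not isinstance(seconds, (int, float)) or not 0 <= seconds <= SECONDS_PER_DAY:
            problems.append((index, "watch_seconds out of range"))
        # Check 3: the same viewing event is not recorded twice
        key = (record.get("user_id"), record.get("channel"), record.get("start"))
        if key in seen:
            problems.append((index, "duplicate record"))
        seen.add(key)
    return problems

# Hypothetical sample batch
records = [
    {"user_id": "u1", "channel": "news", "start": "2018-01-01T20:00", "watch_seconds": 1200},
    {"user_id": "u1", "channel": "news", "start": "2018-01-01T20:00", "watch_seconds": 1200},
    {"user_id": "", "channel": "sports", "start": "2018-01-01T21:00", "watch_seconds": -5},
]
problems = validate(records)
for index, problem in problems:
    print(index, problem)
```

Running checks like these on every batch before it goes into a report is one way to catch pipeline issues early rather than after a client has seen the numbers.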
The best part about working at Zapr is the positive vibes I get here.
There was this time when our founders had to present a report to Arnab Goswami. I knew it was really important, so despite it being an official holiday, I took the time and finished the project. Deepak, our COO, said I could ask for anything in return for the holiday I lost. I didn’t take it seriously and asked for plenty of Gulab Jamuns, my favourite sweet. The next day, our pantry was filled with Gulab Jamuns for everybody!
(left to right) Pooja, Agam and Deepak at a 10k run
The future of Data Analytics at Zapr:
Our focus for 2018 is on scaling up the data insights capabilities we offer our customers. We are leveraging Artificial Intelligence and automation to make our data and systems more scalable and robust. We are also integrating additional data sources apart from TV, like radio and cinemas; this expands the scope of media platforms we analyze and provides interesting opportunities to offer insights on the interrelation and consumption patterns across these different media.
All these great ideas have been made possible by the people here at Zapr. For a long time, I worked directly with Deepak (COO), and I have seen him give importance to work, but never more than to the person. He constantly encourages our hobbies and interests outside of work. Sajo (CTO) mentored me personally through all the gritty work of understanding and building our data systems. The three founders of Zapr have always appreciated our successes, big and small.
At Zapr, we’ve figured out incredibly complex systems and always kept the future in mind. Today our teams are moving towards more automation and highly streamlined work, accompanied by plenty of positivity!
Pooja Sharma
Senior Data Analyst, Zapr Media Labs
Join Pooja and other analysts at Zapr; work with cutting-edge systems and carve out media consumption insights from the world’s largest data repository!
Check out our team page and get in touch with us: http://bit.ly/2zapr-team-careers
Also read: Bala goes from Intern to Data guru: The Evolution of a Software Engineer at Zapr
From Intern to Team Lead in Four years: Agam Jain on building Zapr’s tech from scratch
How Atreyee Dey redefined both Big-Data problems and Gender Norms