ZAPR, India’s largest media consumption repository, holds a huge amount of data on what people watch on TV. The viewership data is time-series in nature, and queries such as top shows, the reach of a particular TV commercial, and other aggregates are frequent. Because our queries are OLAP in nature, we chose Druid, which is well suited to serving interactive OLAP queries.
Druid is an open-source, column-oriented, distributed database developed by Metamarkets. It is mostly used in business intelligence and OLAP applications to analyze high volumes of real-time and historical data.
At Zapr, we receive hundreds of millions of events every day. We use Druid extensively for interactive analysis of those events and to power many dashboards that show data to our clients across time and other dimensions. We also use Druid to identify trends in media consumption.
Druid is generally queried with JSON. Since Druid drives our dashboards, we write a lot of JSON queries for them. The code that builds this JSON becomes repetitive: every time we start a new code base, we end up writing the same code again. As more queries are added, it also becomes difficult to see what a given query is actually doing. Type and spelling checks have to be applied in every code base, because data type and format mistakes are easy to make. One small mistake and we have to go back and forth through the code to debug it. Hand-built JSON queries therefore tend to produce tedious bugs and code that is bigger, messier, and harder to follow.
With all this trouble in creating JSON Druid queries, we decided to build a library, “Druidry”, that takes care of it. A developer’s main focus should be writing the correct query, not type and spelling checks.
Druidry makes deep, meaningful conversations with Druid easier. With Druidry, developers don’t need to write JSON queries by hand; they use a simple Java-based query generator that writes the JSON for them.
Here, at ZAPR, we developed an in-house library, Druidry, which is a Java-based Druid query generator. Druidry automatically takes care of
Because Druidry takes care of these things, you as a developer no longer have to worry about the simple mistakes that creep in when writing JSON queries by hand. You only need to focus on the correctness of the query.
Seeing the ease of use and readability improvements Druidry provides, we open-sourced it as a contribution to the friendly, ever-growing Druid community.
Every query and each of its components has a corresponding Java class. To generate a query, the developer instantiates the relevant classes, and the resulting POJO is converted to JSON by the Jackson library.
For example, for aggregations we have the DruidAggregator class, and every type of aggregation extends it: LongSumAggregator, DoubleSumAggregator, and so on.
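The class-per-component idea can be sketched in plain Java. This is a minimal illustration, not Druidry's actual code: the names SketchAggregator, SketchLongSum and SketchDoubleSum are hypothetical, and the JSON is built by hand so that the example has no dependencies, whereas the library itself relies on Jackson for serialization. The "longSum" and "doubleSum" type strings are the ones Druid uses for these aggregators.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Demo entry point. All class names in this sketch are hypothetical.
class AggregatorSketch {
    public static void main(String[] args) {
        SketchAggregator agg = new SketchLongSum("total_views", "views");
        System.out.println(agg.toJson());
    }
}

// Base type: every aggregator has a "type" discriminator and an output name.
abstract class SketchAggregator {
    private final String type;
    private final String name;

    protected SketchAggregator(String type, String name) {
        this.type = type;
        this.name = name;
    }

    // Each subtype contributes its own fields to the ordered field map.
    protected abstract void addFields(Map<String, String> fields);

    // Hand-rolled serialization of string fields only (stand-in for Jackson).
    final String toJson() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("type", type);
        fields.put("name", name);
        addFields(fields);
        return fields.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    // Escapes backslashes and double quotes only; not a full JSON escaper.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}

// Sums a long-valued metric column.
class SketchLongSum extends SketchAggregator {
    private final String fieldName;

    SketchLongSum(String name, String fieldName) {
        super("longSum", name);
        this.fieldName = fieldName;
    }

    @Override
    protected void addFields(Map<String, String> fields) {
        fields.put("fieldName", fieldName);
    }
}

// Sums a double-valued metric column.
class SketchDoubleSum extends SketchAggregator {
    private final String fieldName;

    SketchDoubleSum(String name, String fieldName) {
        super("doubleSum", name);
        this.fieldName = fieldName;
    }

    @Override
    protected void addFields(Map<String, String> fields) {
        fields.put("fieldName", fieldName);
    }
}
```

Because the type string is fixed inside each subclass constructor, a caller cannot misspell "longSum" or forget the fieldName property; the compiler enforces the shape that would otherwise be checked by eye in raw JSON.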
Druid queries are divided into the following:
Time series
TopN
GroupBy
Time Boundary
Segment Metadata
Datasource Metadata
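Each of these query families is identified in Druid's native JSON by a "queryType" string. The sketch below, which is an illustration and not part of Druidry, maps the six families to those strings; SketchQueryType, QueryTypeSketch, minimalQuery and the data source name are all hypothetical.

```java
// Demo entry point; class and method names are hypothetical.
class QueryTypeSketch {
    // Builds only the common envelope: queryType plus dataSource.
    // Assumes dataSource contains no characters that need JSON escaping.
    // Most query types need further fields (intervals, granularity, ...);
    // a time boundary query is valid with just these two.
    static String minimalQuery(SketchQueryType type, String dataSource) {
        return "{\"queryType\":\"" + type.jsonName()
                + "\",\"dataSource\":\"" + dataSource + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(
                minimalQuery(SketchQueryType.TIME_BOUNDARY, "sample_datasource"));
    }
}

// Maps each query family to the "queryType" value Druid expects.
enum SketchQueryType {
    TIMESERIES("timeseries"),
    TOP_N("topN"),
    GROUP_BY("groupBy"),
    TIME_BOUNDARY("timeBoundary"),
    SEGMENT_METADATA("segmentMetadata"),
    DATASOURCE_METADATA("dataSourceMetadata");

    private final String jsonName;

    SketchQueryType(String jsonName) {
        this.jsonName = jsonName;
    }

    String jsonName() {
        return jsonName;
    }
}
```

Keeping the strings in one enum means the mixed casing ("topN", "dataSourceMetadata") is typed once rather than in every query.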
To understand the design of Druidry, knowledge of the Jackson JSON library is a prerequisite.
Package: used when components have different types or subtypes.
Class: a single type. Every main property in Druid has an abstract class in Java that is extended by all of its subtypes.
Druidry does not support every query type that Druid supports; it currently covers the queries most commonly used at Zapr. Druid adds new query types from time to time, so we welcome contributions to Druidry. If you need support for more queries or features, let us know in the Issues section.
It would be excellent if Druidry grew from a simple query generator into a full Druid Java client library that handles querying Druid end to end.