ZAPR, India’s largest media consumption repository, holds a huge amount of data on what people watch on TV. The viewership data is time-series in nature, and queries such as top shows, the reach of a particular TV commercial, and other aggregates are frequent. Because our queries are OLAP in nature, we decided to go with Druid, which is well suited to serving interactive OLAP queries.
Druid at Zapr
Druid is an open source column-oriented, distributed database, developed by Metamarkets. Druid is mostly used in business intelligence and OLAP applications to analyze high volumes of real-time and historical data.
At Zapr, we receive hundreds of millions of events every day. We use Druid extensively for interactive analysis of those events, and it powers many dashboards that show data to our clients across time and other dimensions. We also use Druid to identify trends in media consumption.
The Problem
Druid is generally queried with JSON. Since Druid drives our dashboards, we write a lot of JSON queries for different dashboards. The code that builds this JSON becomes repetitive: every time we start a new code base, we end up writing the same code again. As more queries are added, it also becomes difficult to understand what exactly a query is doing. Every code base needs its own type checks and spell checks, because it is easy to get a data type or a format wrong. One small mistake and we have to go back and forth through the code to debug it. Building JSON queries by hand sometimes leads to tedious bugs and to larger, messier, and less readable code.
Motivation For Druidry
Given all this trouble with hand-written JSON queries, we decided to build a library, “Druidry”, to take care of it. A developer’s main focus should be on writing the correct query, not on type checks and spell checks.
Druidry makes deep, meaningful conversations with Druid easier. With Druidry, developers don’t need to write JSON queries themselves; a simple Java-based query generator writes the JSON for them.
Introduction to Druidry
Here at ZAPR, we developed an in-house library, Druidry, a Java-based Druid query generator. Druidry automatically takes care of:
- Type Checking
- Spelling Checks
- Code reviewability and readability
Because Druidry takes care of these things, you as a developer no longer have to worry about the simple mistakes that creep in when creating JSON queries. You only need to focus on the correctness of the query.
Seeing the ease of use and readability improvements Druidry provides, we open-sourced it as a contribution to the friendly and ever-growing Druid community.
Druidry – Structure Design
Every query and each of its components has a corresponding Java class. To generate a query, the developer instantiates the relevant classes, and the resulting POJO is converted to JSON by the Jackson library.
For example, for aggregations we have a DruidAggregator class, and every type of aggregation extends it. So LongSumAggregator, DoubleSumAggregator, and so on all extend DruidAggregator.
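The hierarchy described above can be sketched roughly as follows. This is an illustration, not Druidry's actual source: the constructor signatures and the hand-written `toJson()` method are assumptions made so the example runs without any dependencies, whereas Druidry hands serialization to Jackson. The JSON keys `type`, `name`, and `fieldName` are the ones Druid uses for sum aggregators.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of an abstract base class shared by all aggregator types.
abstract class DruidAggregator {
    private final String type; // Druid's type discriminator, e.g. "longSum"
    private final String name; // name of the output column

    protected DruidAggregator(String type, String name) {
        this.type = type;
        this.name = name;
    }

    String getType() { return type; }
    String getName() { return name; }

    // Assumption: stands in for Jackson serialization. No string escaping is done.
    abstract String toJson();

    // Shared helper for aggregators that read a single metric column.
    protected String fieldAggregatorJson(String fieldName) {
        return String.format("{\"type\":\"%s\",\"name\":\"%s\",\"fieldName\":\"%s\"}",
                type, name, fieldName);
    }
}

final class LongSumAggregator extends DruidAggregator {
    private final String fieldName;

    LongSumAggregator(String name, String fieldName) {
        super("longSum", name);
        this.fieldName = fieldName;
    }

    @Override
    String toJson() { return fieldAggregatorJson(fieldName); }
}

final class DoubleSumAggregator extends DruidAggregator {
    private final String fieldName;

    DoubleSumAggregator(String name, String fieldName) {
        super("doubleSum", name);
        this.fieldName = fieldName;
    }

    @Override
    String toJson() { return fieldAggregatorJson(fieldName); }
}

class AggregatorDemo {
    public static void main(String[] args) {
        // Column names here are made up for illustration.
        List<DruidAggregator> aggregators = Arrays.asList(
                new LongSumAggregator("total_views", "views"),
                new DoubleSumAggregator("watch_time", "seconds"));
        for (DruidAggregator aggregator : aggregators) {
            System.out.println(aggregator.toJson());
        }
    }
}
```

Because the type string is fixed inside each subclass, a caller cannot misspell "longSum", and the compiler rejects a missing or wrongly typed argument.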
Class Design:
Druid queries are divided into the following:
- Aggregation Queries
  - Time series
  - TopN
  - GroupBy
- Metadata Queries
  - Time Boundary
  - Segment Metadata
  - Datasource Metadata
- Search
To understand the design of Druidry, knowledge of the Jackson JSON library is a prerequisite.
- Every Druid component has its own:
  - Package: if it has different types or subtypes
  - Class: if it has a single type. Every main property in Druid has an abstract class in Java, which is extended by all of its subtypes
- For properties that Druid requires, we use the @NotNull annotation so that the developer cannot leave them out by mistake.
- Constructors accept at most 3 query arguments. For more arguments, the Builder pattern is used, as it is extensible and readable.
- For date and time properties, we use the Joda-Time library to take input. To produce dates and times in the format Druid expects, we have getter functions annotated with @JsonValue.
- All classes include only non-null values when converted to JSON. We ensure this with the annotation @JsonInclude(JsonInclude.Include.NON_NULL) on every class.
- We use easily understandable names for the fields in all classes, and map them to the keys Druid requires with the @JsonProperty annotation. E.g. the functionAggregate field is annotated with @JsonProperty("fnAggregate").
- For any time property that is not in a common format, we use getter functions to generate the format required by Druid.
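Several of these conventions can be illustrated together in a small, dependency-free sketch. Everything in it is an assumption made for illustration rather than Druidry's real API: the class names `QueryInterval` and `TimeSeriesQuerySketch` are hypothetical, `java.time` replaces Joda-Time, `Objects.requireNonNull` plays the role of @NotNull, and a hand-written `toJson()` replaces Jackson's @JsonValue and @JsonInclude handling. Only a subset of the timeseries query fields is modelled.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical interval type; java.time is used instead of Joda-Time.
final class QueryInterval {
    private final Instant start;
    private final Instant end;

    QueryInterval(Instant start, Instant end) {
        this.start = Objects.requireNonNull(start, "start is required");
        this.end = Objects.requireNonNull(end, "end is required");
    }

    // Plays the role of a @JsonValue getter: emits an ISO-8601 "start/end" string.
    String toDruidString() {
        return start + "/" + end;
    }
}

// Hypothetical query class with more than three properties, hence a builder.
final class TimeSeriesQuerySketch {
    private final String dataSource;      // required
    private final String granularity;     // required
    private final QueryInterval interval; // required
    private final Boolean descending;     // optional, may stay null

    private TimeSeriesQuerySketch(Builder b) {
        // Stand-in for @NotNull: fail fast when a required property is missing.
        this.dataSource = Objects.requireNonNull(b.dataSource, "dataSource is required");
        this.granularity = Objects.requireNonNull(b.granularity, "granularity is required");
        this.interval = Objects.requireNonNull(b.interval, "interval is required");
        this.descending = b.descending;
    }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private String dataSource;
        private String granularity;
        private QueryInterval interval;
        private Boolean descending;

        Builder dataSource(String value) { this.dataSource = value; return this; }
        Builder granularity(String value) { this.granularity = value; return this; }
        Builder interval(QueryInterval value) { this.interval = value; return this; }
        Builder descending(boolean value) { this.descending = value; return this; }

        TimeSeriesQuerySketch build() { return new TimeSeriesQuerySketch(this); }
    }

    // Stand-in for Jackson with Include.NON_NULL: null properties are skipped.
    // No string escaping is done in this sketch.
    String toJson() {
        List<String> parts = new ArrayList<>();
        parts.add("\"queryType\":\"timeseries\"");
        parts.add("\"dataSource\":\"" + dataSource + "\"");
        parts.add("\"granularity\":\"" + granularity + "\"");
        parts.add("\"intervals\":[\"" + interval.toDruidString() + "\"]");
        if (descending != null) {
            parts.add("\"descending\":" + descending);
        }
        return "{" + String.join(",", parts) + "}";
    }
}

class QueryBuilderDemo {
    public static void main(String[] args) {
        // The data source name is made up for illustration.
        QueryInterval day = new QueryInterval(
                Instant.parse("2017-01-01T00:00:00Z"),
                Instant.parse("2017-01-02T00:00:00Z"));
        TimeSeriesQuerySketch query = TimeSeriesQuerySketch.builder()
                .dataSource("viewership")
                .granularity("day")
                .interval(day)
                .build();
        System.out.println(query.toJson());
    }
}
```

In this sketch a missing required property fails at `build()` instead of producing an invalid query, and the optional `descending` key is absent from the output unless it was set.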
Way forward
Druidry does not yet support every query that Druid supports. It currently covers the queries most commonly used at Zapr, and Druid keeps adding new query types over time. We would therefore welcome contributions from developers. If you want support for more queries or features, let us know in the Issues section.
It would be excellent if Druidry grew from a simple query generator into a full Druid Java client library that handles querying Druid end to end.