Over the past few years, many vendors in the digital analytics industry have standardised on JSON as the format of choice for capturing and processing event data. In the same period, the number of tools and applications that companies use for digital analytics has grown. With it has grown the need to plan and iterate on how data is structured so that consumers can work with it effectively, which has led many to adopt JSON Schema.
In the past, when you had a single system generating, processing and driving value from digital event data, schemas were less important; the single system simply needed to be able to meet the needs of its users. But in the current environment, schemas have emerged as a critical enabling technology. They facilitate the effective sharing of data from the many systems producing event data across multiple apps and platforms, to the many systems consuming that data.
Iglu was developed back in 2014 to address this need. By “decoupling” Iglu from the core Snowplow technology, our hope was that other companies would be able to adopt it to facilitate the definition of their event data, and drive the processing of data on their pipelines. However, the adoption of Iglu as the common standard for defining events in the wider digital industry hasn’t happened to date.
Recently, we have been prompted to revisit this idea following discussions with the teams at Mixpanel and Iterative.ly.
In the current environment, a company might work with multiple vendors to generate and process event data (e.g. Iterative.ly, Mixpanel and Snowplow). Is it possible that by all leveraging Iglu as an enabling technology, a company could define their events once as a set of Iglu schemas, and have those same schemas power Iterative.ly, Mixpanel and Snowplow?
Having to define & evolve a schema for an event multiple times in multiple places is cumbersome and can cause inconsistencies; enabling organisations to focus on getting that definition right once seems like a worthwhile goal, especially since the number of technologies a company works with for digital data is expanding all the time.
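To make "define once" concrete, here is a minimal sketch of what a single shared definition could look like. It uses the existing Iglu self-describing schema layout (a `self` block with vendor, name, format and version) and the `iglu:` URI form. The vendor `com.acme`, the event `checkout_completed`, its properties and the two helper functions are hypothetical names chosen for illustration, not part of any vendor's API.

```python
import json

# Hypothetical event definition. "com.acme" and "checkout_completed" are
# illustrative; the "$schema" value is the Iglu self-describing metaschema.
CHECKOUT_SCHEMA = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "description": "Fired when a customer completes checkout (illustrative)",
    "self": {
        "vendor": "com.acme",
        "name": "checkout_completed",
        "format": "jsonschema",
        "version": "1-0-0",
    },
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "total": {"type": "number", "minimum": 0},
    },
    "required": ["order_id", "total"],
    "additionalProperties": False,
}


def iglu_uri(schema: dict) -> str:
    """Build the iglu: URI that identifies a schema from its 'self' block."""
    s = schema["self"]
    return f"iglu:{s['vendor']}/{s['name']}/{s['format']}/{s['version']}"


def self_describing_event(schema: dict, data: dict) -> dict:
    """Wrap an event payload so it carries a reference to its own schema."""
    return {"schema": iglu_uri(schema), "data": data}


event = self_describing_event(
    CHECKOUT_SCHEMA, {"order_id": "A-1001", "total": 59.9}
)
print(json.dumps(event, indent=2))
```

Because the event carries only a reference to its schema, any tool that can resolve that reference could in principle validate or interpret the same payload without the definition being restated per vendor.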
The conversations to date with the teams at Mixpanel and Iterative.ly have been very enlightening. They have helped us to identify where the Iglu standard does not meet all their needs, and we have started to explore how the standard could be extended to better meet them, as well as the needs of the many other vendors working with event data.
We have discussed a number of ways to evolve the technology to facilitate broader adoption, including the following potential initial steps:
- Make it possible for individual vendors to extend their own Iglu metaschemas for their own Iglu Servers
  - This would make Iglu a lot more flexible and better able to meet the needs of individual technology vendors. However, it would drive discrepancies in the way different companies adopt Iglu, so in parallel we would look to:
- Put together a working group to evolve the Iglu standard as a whole
  - A big focus for this working group would be to look at the way different vendors have extended the Iglu schema, with a view to:
    - Incorporating extensions that are widely adopted, to promote interoperability, and
    - Finding ways to enable easy co-existence of extensions that different vendors require but that are specific to those vendors (i.e. do not appear to be interesting for the industry as a whole to adopt)
Beyond extending Iglu to better meet the needs of vendors adopting it, we are also interested in working with related standards, e.g. CloudEvents.
Before we invest and go too far in any particular direction, however, we are keen to get a feel for:
- How much interest there is in developing a common standard for defining digital events amongst the wider digital analytics community (beyond Snowplow, Iterative.ly and Mixpanel)
- Whether the initial approach outlined is a sensible set of first steps in starting to realise this vision
That is the motivation for posting this RFC. We look forward to your feedback!