I’m looking for your honest and considered feedback. This is most applicable to sites with long sales cycles, such as SaaS or high-end e-commerce, and to people who are familiar with SQL, R, Python, or Pandas.
We (Dripit.io) are working on a new product idea around data APIs.
Problem
Working with data is hard. On one end, you have a raw data API and have to spend long hours preparing the data before you can ask any reasonable question. On the other end, you have a limited dashboard that doesn’t answer the questions you are particularly interested in.
Solution
An API endpoint that returns full customer journeys with:
UserID;
JourneyID;
Acquisition channel + costs;
Touchpoints + costs;
Total acquisition cost of a particular customer;
Abandonment point and cumulative cost up to that point;
Time to conversion.
An existing Snowplow dataset can serve as the data source. You can use this endpoint to play with data in Google Data Studio, load it into your database, build or improve your predictive models, or build internal dashboards. Essentially, it is ETL with a focus on returning well-structured customer journey information.
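To make the idea concrete, here is a minimal sketch of how such an endpoint might be consumed with Pandas. The URL, authentication scheme, and field names are all assumptions based on the list above; the product does not exist yet, so treat this purely as an illustration:

```python
import requests
import pandas as pd

# Hypothetical endpoint and credentials -- placeholders, not a real API.
API_URL = "https://api.dripit.io/v1/journeys"

response = requests.get(
    API_URL,
    params={"from": "2019-01-01", "to": "2019-03-31"},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
response.raise_for_status()

# Assumed shape: one record per journey, carrying the fields listed above
# (user ID, journey ID, acquisition channel, touchpoints, costs, ...).
journeys = pd.json_normalize(response.json()["journeys"])

# Example question: average total acquisition cost per acquisition channel.
print(journeys.groupby("acquisition_channel")["total_acquisition_cost"].mean())
```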
At Goeuro, as well as at JustWatch, we used the Snowplow SQL data-modeler to create “derived” tables: ready-to-dashboard representations of the data that is most frequently analysed. Those tables are of course tailored to each business you work for, but here are our examples:
At Goeuro, the main focus is on the customer journey. We therefore built the following tables: searches, clickouts, bookings, session, and user.
At JustWatch, our focus is on user interactions with movies, hence the following list: interactions, session, and user.
At each granularity, we transform the raw events that land in atomic.events and enrich them with attributes such as “title campaigned” (for interaction), “first touchpoint channel” (for session), and “most frequently used device” (for user).
Those tables are granular enough to give us some flexibility for dashboarding, but they also contain all the relevant attributes for business reporting.
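The actual modeling here happens in SQL via the Snowplow data-modeler, but as a rough pandas analogue of what such a derived session table contains (the column names below are invented for illustration and do not match the real atomic.events schema):

```python
import pandas as pd

# Toy stand-in for a few atomic.events rows; the real Snowplow schema is
# much wider, and these column names are assumptions for illustration only.
events = pd.DataFrame({
    "user_id":    ["u1", "u1", "u2", "u2", "u2"],
    "session_id": ["s1", "s1", "s2", "s2", "s3"],
    "channel":    ["paid_search", "direct", "email", "email", "direct"],
    "timestamp":  pd.to_datetime([
        "2019-03-01 10:00", "2019-03-01 10:05",
        "2019-03-02 09:00", "2019-03-02 09:10",
        "2019-03-03 08:00",
    ]),
})

# Derive a session-level table with a "first touchpoint channel" attribute,
# analogous to the enrichment described above.
sessions = (
    events.sort_values("timestamp")
          .groupby("session_id")
          .agg(user_id=("user_id", "first"),
               first_touchpoint_channel=("channel", "first"),
               events_in_session=("channel", "size"))
          .reset_index()
)
print(sessions)
```

Once a table like this exists per granularity (session, user, ...), dashboards can query it directly instead of re-aggregating raw events for every question.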
I know some alternatives exist for the real-time pipeline with Spark, but I haven’t used them personally yet.
Hope this helps; happy to answer your questions if you need further details!
Thank you @SixtineVervial for sharing your experience.
Our goal would be to remove that complexity and the need for you to model the data yourself: we would connect directly to the raw data store and return structured customer journeys with all the metadata. We are already doing this with our own tracking data, which is close to Snowplow/Segment logs.
Right now we present that data through our dashboard and Google Data Studio, but we see interest in consuming raw customer journey data, and we have been looking at Snowplow as a data source for a while now.