Quite often we are asked about Snowplow ability to capture the values of the fields within a form submitted, say, on a registration page. An example of such a question follows:
Can someone let me know if or how I need to change the .js in order for form fields to be picked up? We’re going to need any email addresses entered to be stored so that we can join them to sales transactions when we build reports.
Form tracking is available out-of-the-box. In this tutorial, I will explain how to enable it, which is a simple task. However, you also should aware of its limitation. As a result, you might wish to consider a custom event/context to capture the required fields.
Enabling form tracking
All you need to do is to add the following line to your script:
snowplow('enableFormTracking');
where snowplow
is the name you gave to the global function to access the tracker (see the tag you used to load sp.js
), that is if you gave it a different name then amend the above line accordingly.
The above method allows you to capture 2 event types, namely change_form
and submit_form
.
You might feel the submit_form
is what you are after. However, the complication here is the fields’ values in the form will be available in Redshift in JSON format. That is to retrieve, say, the actual email address submitted with the form you would need to deploy some means to extract that value from JSON, which is not available in Redshift itself.
An example of such a value follows:
[{"name":"Message","value":"This is how you do it.","nodeName":"TEXTAREA"},{"name":"Name","value":"Ihor Tomilenko","nodeName":"INPUT","type":"text"},{"name":"Email","value":"ihor@properweb.ca","nodeName":"INPUT","type":"text"},{"name":"Subject","value":"Form Tracking","nodeName":"INPUT","type":"text"}]
As can be seen, the data is actually a varchar
representation of the list of JSONs. Each JSON in the list provides data associated with each (tracked) field of the form. Such a value would populate the (single) column elements
of the com_snowplowanalytics_snowplow_submit_form_1
table.
The change_form
event on the other hand records each individual value for each field in the form to the dedicated column, namely value
of com_snowplowanalytics_snowplow_change_form_1
table. Each time a field is changed (focus from the field is shifted) the change_form
event for the corresponding field is fired.
# select element_id, value
from atomic.com_snowplowanalytics_snowplow_change_form_1
limit 4;
element_id | value
------------+------------------------
Name | Ihor Tomilenko
Email | ihor@properweb.ca
Message | This is how you do it.
Subject | Form Tracking
If you are interested only in one field from the form (or just a few) you can apply a restriction on what is to be captured. This is achieved by means of whitelist and blacklist.
You can find out more about the form tracking from the following wiki page: 2 Specific event tracking with the Javascript tracker · snowplow/snowplow Wiki · GitHub
In addition to adding the script line, you would also have to create the corresponding Redshift tables as both of the events have been implemented as unstructured (self-describing) events. The corresponding DDLs could be found at the following links:
submit_form
: https://github.com/snowplow/iglu-central/blob/master/sql/com.snowplowanalytics.snowplow/submit_form_1.sqlchange_form
: https://github.com/snowplow/iglu-central/blob/master/sql/com.snowplowanalytics.snowplow/change_form_1.sql
You can link the records in those table back to atomic.events
by means of relation:
event_id = root_id and collector_tstamp = root_tstamp
The problem though is it’s not obvious when the change_form
event (which allows acquiring the values of the fields on the form easily) is related to any actual submit_form
event.
Alternative approach (custom event/context)
Alternatively, you could fire a custom event or attach a custom context to the Snowplow authored track...
event to capture a just email address or whatever fields you might be interested in.
Custom contexts could be added as a last parameter to the corresponding function. That could allow you to have your email address field captured into a dedicated field, for example, thus avoiding the need to parse JSON for submit_form
event.
Yet another solution is to create your own version of form submission event, so-called self-describing event.
These last two approaches are more involved, however. They required you to create your own JSON schema, JSONPaths file, and Redshift table. Though, it could be automated with the help of Schema Guru and/or lately with igluctl CLI.
You can read more about the events and contexts in this wiki section: Events and Contexts · snowplow/snowplow Wiki · GitHub
Here, I’m going to show you how you could capture the values of the form fields of your interest by means of the custom self-describing event with the aim of getting an easy access to it in Redshift.
Custom form submission event
For the purpose of this tutorial and to keep it simple, I aim to capture just an email address from the contact form (as per original question at the top of the post).
I will start with defining the JSON schema:
{
"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
"description": "Schema for form submission",
"self": {
"vendor": "ca.properweb",
"name": "submit_form",
"format": "jsonschema",
"version": "1-0-0"
},
"type": "object",
"properties": {
"email": {
"type": "string"
},
"form": {
"type": "string"
}
},
"minProperties":1,
"required": ["email"],
"additionalProperties": false
}
As can be seen, the only properties defined are email
and form
. Taken we also have the name of the event, submit_form
, as the value of the name
parameter in the self
section of the schema, the following values will be accessible from the dedicated Redshift table ca_properweb_submit_form_1
as summarised below.
submit_form
- event nameform
- form id/name submittedemail
- email address captured
Let’s try out our new CLI to generate the corresponding DDL and JSONpaths file. But first, I want to make sure the JSON schema is in a valid format. Provided the schema was saved as schemas/ca.properweb/submit_form/jsonschema/1-0-0
, I will use the lint
command for the task as shown below (with its output)
$ ./igluctl lint schemas/
SUCCESS: Schema [/home/ec2-user/schemas/ca.properweb/submit_form/jsonschema/1-0-0] is successfully validated
TOTAL: 1 Schemas were successfully validated
TOTAL: 0 errors were encountered
That’s a good start. Let’s generate the Redshift table definition and the corresponding JSONPaths file:
$ ./igluctl static generate --with-json-paths schemas/
File [/home/ec2-user/./sql/ca.properweb/submit_form_1.sql] was written successfully!
File [/home/ec2-user/./jsonpaths/ca.properweb/submit_form_1.json] was written successfully!
Here, we have them:
$ cat sql/ca.properweb/submit_form_1.sql
-- AUTO-GENERATED BY igluctl DO NOT EDIT
-- Generator: igluctl 0.1.0
-- Generated: 2016-08-19 18:44
CREATE SCHEMA IF NOT EXISTS atomic;
CREATE TABLE IF NOT EXISTS atomic.ca_properweb_submit_form_1 (
"schema_vendor" VARCHAR(128) ENCODE RUNLENGTH NOT NULL,
"schema_name" VARCHAR(128) ENCODE RUNLENGTH NOT NULL,
"schema_format" VARCHAR(128) ENCODE RUNLENGTH NOT NULL,
"schema_version" VARCHAR(128) ENCODE RUNLENGTH NOT NULL,
"root_id" CHAR(36) ENCODE RAW NOT NULL,
"root_tstamp" TIMESTAMP ENCODE LZO NOT NULL,
"ref_root" VARCHAR(255) ENCODE RUNLENGTH NOT NULL,
"ref_tree" VARCHAR(1500) ENCODE RUNLENGTH NOT NULL,
"ref_parent" VARCHAR(255) ENCODE RUNLENGTH NOT NULL,
"email" VARCHAR(4096) ENCODE LZO NOT NULL,
"form" VARCHAR(4096) ENCODE LZO,
FOREIGN KEY (root_id) REFERENCES atomic.events(event_id)
)
DISTSTYLE KEY
DISTKEY (root_id)
SORTKEY (root_tstamp);
COMMENT ON TABLE atomic.ca_properweb_submit_form_1 IS 'iglu:ca.properweb/submit_form/jsonschema/1-0-0';
$ cat jsonpaths/ca.properweb/submit_form_1.json
{
"jsonpaths": [
"$.schema.vendor",
"$.schema.name",
"$.schema.format",
"$.schema.version",
"$.hierarchy.rootId",
"$.hierarchy.rootTstamp",
"$.hierarchy.refRoot",
"$.hierarchy.refTree",
"$.hierarchy.refParent",
"$.data.email",
"$.data.form"
]
}
Once uploaded to the dedicated buckets on S3 and the correct permissions/access is set, I’m going to take care of the JavaScript event tracking function (using jQuery). Please, bear in mind the below code is for demonstration purposes only. It has to be more robust to be deployed to production.
<script type="text/javascript">
$('#contact-form').submit(function(){
var form_id = $(this).attr('id'); //form id
var email_address = $('input[name=Email]').val(); //email address value
window.snowplow('trackUnstructEvent', {
schema: 'iglu:ca.properweb/submit_form/jsonschema/1-0-0',
data: {
form: form_id,
email: email_address
}
});
});
</script>
After firing a few events and ensuring they were sent to the collector and were available as raw events in the raw
bucket, I launched the pipeline job. Let’s check the results.
# select derived_tstamp, event, schema_name as event_name, form, email
from atomic.events
join atomic.ca_properweb_submit_form_1
on event_id=root_id and collector_tstamp=root_tstamp
limit 3;
derived_tstamp | event | event_name | form | email
-------------------------+----------+-------------+--------------+-------------------
2016-08-19 02:23:56.999 | unstruct | submit_form | contact-form | ihor@properweb.ca
2016-08-19 02:21:24.999 | unstruct | submit_form | contact-form | ivan@pupkin.org
2016-08-19 02:22:42.496 | unstruct | submit_form | contact-form | alex@gmail.com
(3 rows)
Here we have it: the email addresses captured on submission of the form are available in the dedicated table.
Summary
Snowplow is a flexible platform giving you different ways of capturing data you are interested in. The form submission, in particular, could be handled in the following 3 ways, 2 of which are available “out-of-the-box”.
submit_form
- Snowplow authored event capturing the form values and available in Redshift as a list of JSONs. Despite its simplicity, the drawback is you would have to deploy some means of extracting the required values from JSONchange_form
event - it does capture fields values into dedicated columns. However, it’s not directly related to the form submission. If the same field was amended a few times prior to form submission, it’s hard to correlate the values with the actual values submitted with the formself-describing
event gives you the most flexibility. It’s up to you how you define it. The drawback, however, it is quite involved and requires a much deeper understanding of the Snowplow pipeline functioning and techniques involved
The choice is yours!