Thanks for the thoughts!
Do you have shared properties across multiple pages that you want to capture as attributes? If so, I would avoid having a separate context for each page: 100s of schemas is doable, but not really optimal. If possible, sharing an example of what you’d hypothetically like to send would make it easier to design this structure.
Yes, we’ll use the same properties across all pages. The custom context schema we’ve created so far is below (probably not fully validated).
Regarding the 100s of autogenerated schemas: is that suboptimal from a pipeline perspective, or just for managing them?
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for content classification",
  "self": {
    "vendor": "com.dtm",
    "name": "sentiment",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "automotive": {
      "type": "object",
      "properties": {
        "direction": { "type": "string" },
        "positivity": { "type": "number", "minimum": 0, "maximum": 100 },
        "negativity": { "type": "number", "minimum": 0, "maximum": 100 },
        "score": { "type": "number" },
        "words": { "type": "integer" },
        "sentences": { "type": "integer" },
        "precision": { "type": "number" },
        "recall": { "type": "number" },
        "f": { "type": "number" }
      },
      "additionalProperties": false
    },
    "books_literature": {
      "type": "object",
      "properties": {
        "direction": { "type": "string" },
        "positivity": { "type": "number", "minimum": 0, "maximum": 100 },
        "negativity": { "type": "number", "minimum": 0, "maximum": 100 },
        "score": { "type": "number" },
        "words": { "type": "integer" },
        "sentences": { "type": "integer" },
        "precision": { "type": "number" },
        "recall": { "type": "number" },
        "f": { "type": "number" }
      },
      "additionalProperties": false
    },
    "business_finance": {
      "type": "object",
      "properties": {
        "direction": { "type": "string" },
        "positivity": { "type": "number", "minimum": 0, "maximum": 100 },
        "negativity": { "type": "number", "minimum": 0, "maximum": 100 },
        "score": { "type": "number" },
        "words": { "type": "integer" },
        "sentences": { "type": "integer" },
        "precision": { "type": "number" },
        "recall": { "type": "number" },
        "f": { "type": "number" }
      },
      "additionalProperties": false
    },
    "travel": {
      "type": "object",
      "properties": {
        "direction": { "type": "string" },
        "positivity": { "type": "number", "minimum": 0, "maximum": 100 },
        "negativity": { "type": "number", "minimum": 0, "maximum": 100 },
        "score": { "type": "number" },
        "words": { "type": "integer" },
        "sentences": { "type": "integer" },
        "precision": { "type": "number" },
        "recall": { "type": "number" },
        "f": { "type": "number" }
      },
      "additionalProperties": false
    }
  },
  "additionalProperties": false
}
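For reference, an entity conforming to a schema like this could be attached to events as a self-describing JSON. A minimal sketch, assuming the vendor/name/version above; `buildSentimentContext` is a hypothetical helper, not part of any Snowplow API:

```javascript
// Hypothetical helper: wrap per-category sentiment scores in a
// self-describing entity. The iglu: URI is derived from the schema's
// self-description (vendor com.dtm, name sentiment, version 1-0-0).
function buildSentimentContext(categories) {
  return {
    schema: "iglu:com.dtm/sentiment/jsonschema/1-0-0",
    data: categories
  };
}

// Example usage with the v2 JavaScript tracker, where trackPageView
// accepts a custom title and an array of custom contexts:
//
// window.snowplow('trackPageView', null, [
//   buildSentimentContext({
//     travel: { direction: "positive", positivity: 100, negativity: 0,
//               score: 3, words: 221, sentences: 13,
//               precision: 0.8, recall: 0.4, f: 0.55 }
//   })
// ]);
```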
Is this your own content, or content on another site? If it’s your own content, I’d be tempted to do the sentiment scoring / NLP analysis in the enrichment part of the pipeline if possible, rather than in the frontend via a lookup on content ID.
Got it. The content is from multiple sites, and we have no control over it. By enrichment, do you mean the custom JavaScript enrichment?
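For reference, a custom JavaScript enrichment is a `process(event)` function that runs server-side during enrichment and can return derived contexts to attach to the event. A minimal sketch, assuming sentiment is resolved by page URL; `lookupSentiment` is a hypothetical stand-in for however the scores would actually be obtained:

```javascript
// Hypothetical lookup: in practice this might consult a precomputed
// dataset of sentiment scores keyed by URL.
function lookupSentiment(pageUrl) {
  if (!pageUrl) {
    return null;
  }
  // Placeholder result with the same shape the schema describes.
  return {
    travel: { direction: "positive", positivity: 100, negativity: 0,
              score: 3, words: 221, sentences: 13,
              precision: 0.8, recall: 0.4, f: 0.55 }
  };
}

// Entry point called by the enrichment for each event; the returned
// array of self-describing JSONs becomes the event's derived contexts.
function process(event) {
  var scores = lookupSentiment(event.getPage_url());
  if (!scores) {
    return [];
  }
  return [{
    schema: "iglu:com.dtm/sentiment/jsonschema/1-0-0",
    data: scores
  }];
}
```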