Weather Enrichment: Retrieving current weather fails

Hello all.

What I would like to achieve

Currently I am trying to get the Weather Enrichment (Snowplow docs) working. Since I only need current weather data, not historical data, I want to test the enrichment with the free plan, i.e. with api.openweathermap.org as the apiHost parameter.

What I have tried so far

So I generated an API key (denoted as owm_key) for free access to OpenWeatherMap. With curl I can query the weather API without any problems, e.g. for the current weather in London:

owm_url="https://api.openweathermap.org/data/2.5/weather?"
owm_url+="lat=51.509865&lon=-0.1118092&appid=${owm_key}"
curl "${owm_url}"

results in this output:

{
  "coord": {
    "lon": -0.1118,
    "lat": 51.5099
  },
  "weather": [
    {
      "id": 800,
      "main": "Clear",
      "description": "clear sky",
      "icon": "01d"
    }
  ],
  "base": "stations",
  "main": {
    "temp": 288.48,
    "feels_like": 287.09,
    "temp_min": 286.22,
    "temp_max": 290.25,
    "pressure": 1029,
    "humidity": 39
  },
  "visibility": 10000,
  "wind": {
    "speed": 5.14,
    "deg": 80
  },
  "clouds": {
    "all": 1
  },
  "dt": 1648145271,
  "sys": {
    "type": 2,
    "id": 2019646,
    "country": "GB",
    "sunrise": 1648101259,
    "sunset": 1648145945
  },
  "timezone": 0,
  "id": 2634341,
  "name": "City of Westminster",
  "cod": 200
}

Configuration

With this confirmation, I modified the example configuration as follows:

{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/weather_enrichment_config/jsonschema/1-0-0",
  "data": {
    "enabled": true,
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "weather_enrichment_config",
    "parameters": {
      "apiKey": "owm_key",
      "cacheSize": 5100,
      "geoPrecision": 1,
      "apiHost": "api.openweathermap.org",
      "timeout": 5
    }
  }
}

The IP lookup enrichment (Snowplow Docs) is already running successfully, so the values for the fields geo_latitude and geo_longitude needed for the weather enrichment are already generated.

The documentation says about the timeout parameter that …

timeout is a time in seconds after which request should be considered failed. Notice that failed weather enrichment will filter out whole your event, whether this failure be timeout or invalid API key

Unfortunately, this is exactly what happens: from the moment I activate this enrichment, all events are filtered out. I rule out a problem with the API key for now, as I have shown above that it works.

Are there any logs where I can look for error messages in case of problems with this enrichment? The logs of the Snowplow Enrich Pub/Sub docker container do not provide much detail:

$ docker logs --details -f enrich_container_name
 [pool-1-thread-1] INFO com.snowplowanalytics.snowplow.enrich.pubsub.Main - Initialising resources for Enrich job
 [pool-1-thread-1] INFO com.snowplowanalytics.snowplow.enrich.pubsub.io.FileSystem - Files found in /snowplow/enrichments: /snowplow/enrichments/referer_parser.json, /snowplow/enrichments/ua_parser_config.json, /snowplow/enrichments/ip_lookups.json, /snowplow/enrichments/cookie_extractor_config.json, /snowplow/enrichments/campaign_attribution.json, /snowplow/enrichments/weather_enrichment.json, /snowplow/enrichments/anon_ip.json
 [pool-1-thread-2] INFO com.snowplowanalytics.snowplow.enrich.pubsub.Environment - Parsed Iglu Client with following registries: Iglu Central, Iglu Central - GCP Mirror
 [pool-1-thread-2] INFO com.snowplowanalytics.snowplow.enrich.pubsub.Environment - Parsed following enrichments: referer_parser, ua_parser_config, campaign_attribution, anon_ip
 [pool-1-thread-2] INFO com.snowplowanalytics.snowplow.enrich.pubsub.Assets - Preparing enrichment assets
 [pool-1-thread-1] INFO com.snowplowanalytics.snowplow.enrich.pubsub.Assets - Downloading https://s3-eu-west-1.amazonaws.com/snowplow-hosted-assets/third-party/referer-parser/referers-latest.json
 [pool-1-thread-2] INFO com.snowplowanalytics.snowplow.enrich.pubsub.Environment - Enrich environment initialized
 [pool-1-thread-2] INFO com.snowplowanalytics.snowplow.enrich.pubsub.Main - Running enrichment stream

Do you have any idea what I did wrong or what I could test?

Thanks for your help
Richard

As I delve into this topic, I have come across some ambiguities regarding this enrichment:

When this enrichment was announced in 2015, the description of the configuration parameters pointed out that “failed weather enrichment will cause your whole enriched event to end up in the bad bucket”, which sounds reasonable to me.

However, the current documentation (2021-02-05) points out that “failed weather enrichment will filter out whole your event, whether this failure be timeout or invalid API key”, which is hard for me to understand. Why would you want your own raw data to be lost in the event of a possibly temporarily disrupted query of a weather service?

In addition, I noticed another passage in the documentation for pipelines on GCP (2020-10-22) that left me confused:

On the other hand, I see in the associated GitHub repository that the associated Scala app continues to be developed.

The readme for this app points out that the free access API is usable and even recommended, whereas the Enrichment documentation claims that the free API cannot be used at all and that a paid subscription is required instead.

So …

Which parts of the documentation are correct and which are not? Is this enrichment developed for current versions of Snowplow components or as stated only for Enrich 1.4.x and older? What is the design decision behind discarding raw data instead of continuing to collect it in a bad bucket? Can the API be used with a free plan?

Perhaps someone can shed some light on my confusion. Given the current state of the documentation, it is definitely a challenge to use this enrichment.

Thanks in advance
Richard

Hi Richard,

Sorry that you struggled with the weather enrichment.

Indeed, as stated in the docs, this enrichment is not available any more, for now. We have made this clearer in the docs. The reason is that the enrichment in Enrich currently uses the historical API for the weather, because the previous Enrich assets were batch-based, whereas now they are all streaming, so we need to use the current API instead.

We created this issue to take care of it, but it is not on our roadmap yet. Until this happens, one thing you could do is use the API enrichment with your own HTTP server that relays the calls to the OpenWeatherMap current weather API.
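
To make the relay idea a bit more concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not something we ship: the /weather path, the OWM_KEY environment variable, the port and the names build_owm_url, WeatherRelay and make_server are all hypothetical. The only part taken from this thread is the OpenWeatherMap current weather endpoint used in the curl test above.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen

# Current weather endpoint, the same one queried with curl earlier in the thread.
OWM_CURRENT_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_owm_url(lat, lon, api_key):
    """Build the OpenWeatherMap current weather URL for one coordinate pair."""
    return OWM_CURRENT_URL + "?" + urlencode({"lat": lat, "lon": lon, "appid": api_key})


class WeatherRelay(BaseHTTPRequestHandler):
    """Hypothetical relay: GET /weather?lat=..&lon=.. returns the upstream JSON."""

    def do_GET(self):
        parsed = urlparse(self.path)
        query = parse_qs(parsed.query)
        if parsed.path != "/weather" or "lat" not in query or "lon" not in query:
            self._send_json(400, {"error": "expected /weather?lat=..&lon=.."})
            return
        # Assumption: the API key is provided via an OWM_KEY environment variable.
        url = build_owm_url(query["lat"][0], query["lon"][0], os.environ.get("OWM_KEY", ""))
        try:
            with urlopen(url, timeout=5) as upstream:
                body = upstream.read()
        except OSError as exc:  # covers URLError, HTTPError and socket timeouts
            self._send_json(502, {"error": str(exc)})
            return
        self._send_bytes(200, body)

    def _send_json(self, status, payload):
        self._send_bytes(status, json.dumps(payload).encode("utf-8"))

    def _send_bytes(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    """Create the relay server; the caller decides when to start serving."""
    return HTTPServer(("0.0.0.0", port), WeatherRelay)
```

Running make_server().serve_forever() would start the relay, and the API enrichment could then be pointed at it with the event's geo_latitude and geo_longitude as the lat and lon query parameters. Returning a 502 on upstream failures instead of hanging is a design choice in this sketch; how the API enrichment should treat such a response is something to check against its own documentation.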

The reason why snowplow/scala-weather got updated is that we maintain this library for Scala developers who want to get the weather in their code.