Hi. We are exploring migrating from GA3 to our own Snowplow running in AWS. Our users want to have real-time stats: the most popular pages, how many visitors for each page, etc.
Has anyone implemented a visualization tool like OpenSearch Dashboards, Kibana, etc., or some other architecture to make this happen? If so, how did you make it work?
Right now we’ve set up the AWS bootstrap, which uses Kinesis. We have that loading into Redshift but need a different tool/architecture for real-time.
Hi @yangabanga - loading Elasticsearch / OpenSearch would likely be the best strategy here to deliver real-time dashboards. We also have an OS Terraform Module for this, which you can use to get started quickly.
The alternative, if Elasticsearch / Kibana is not something you are interested in, would be to set up the Postgres Loader, which is also real-time / streaming. This one is not battle-tested for really high ingress rates, however, so how well it works might vary.
With OpenSearch you get Kibana included, so you can easily spin up dashboards that way. With Postgres you are likely going to have an easier time plugging this into your existing BI tools and querying it in much the same syntax as Redshift.
Would either of these options work for what you are looking to implement?
Internally we exclusively use the Elasticsearch Loader with AWS Opensearch clusters which then has Kibana out of the box as well. So yes it for sure works!
Hi @josh, thank you for the link above. I also have a question related to this. My infra was launched via the AWS quick-start with the Postgres Loader. I would like to add the capability for a real-time dashboard with AWS OpenSearch and Kibana. Can (Terraform Registry) be added as is to the quick-start Terraform, or do I have to build new infra with the Elasticsearch Loader? Thank you!
Hi @alvin so the quick-start won’t support it but you can simply edit the terraform to have the extra module for the ES Loader and plug it into the existing Kinesis streams you have already created → ultimately forking the quick-start repo to be your own.
So you don’t need to start from scratch - just add the extra Terraform in for the loader you want!
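As a rough sketch of what "adding the extra Terraform" could look like in a forked quick-start, the block below declares the ES Loader module next to the existing ones. The module label matches the one in the error output further down this thread, and the input names are the ones that appear in the vars file posted later. The registry source string, the commented-out `version`, and the assumption that `vpc_id` is an input of this module are not confirmed in the thread, so check the module's registry page before relying on them.

```hcl
# Hypothetical sketch, not the official quick-start code.
# Assumption: the registry source below follows the same naming as the
# collector module linked in the vars file (snowplow-devops/<module>/aws).
module "elasticsearch-loader-kinesis-ec2" {
  source = "snowplow-devops/elasticsearch-loader-kinesis-ec2/aws"
  # version = "..." # pin to a release compatible with your other modules

  name         = var.name
  vpc_id       = var.vpc_id             # assumed input; verify on the registry page
  subnet_ids   = var.private_subnet_ids # must evaluate to a list of strings
  ssh_key_name = var.ssh_key_name

  # Point the loader at the Kinesis streams the quick-start already created
  in_stream_name  = var.in_stream_name
  in_stream_type  = var.in_stream_type
  bad_stream_name = var.bad_stream_name

  # Target Elasticsearch / OpenSearch cluster
  es_cluster_name          = var.es_cluster_name
  es_cluster_endpoint      = var.es_cluster_endpoint
  es_cluster_port          = var.es_cluster_port
  es_cluster_index         = var.es_cluster_index
  es_cluster_document_type = var.es_cluster_document_type
}
```

Reusing the quick-start's existing `private_subnet_ids` variable, rather than introducing a new one, keeps the loader in the same subnet layer as the other EC2 servers.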
Hi @josh, Thank you for your response! I truly appreciate it. I am encountering a blocking issue with the terraform module (Terraform Registry).
│ The given value is not suitable for module.elasticsearch-loader-kinesis-ec2.var.subnet_ids declared at
│ .terraform/modules/elasticsearch-loader-kinesis-ec2/variables.tf:11,1-22: list of string required.
I am confident that I declared the variable subnet_ids as a list of strings. The value I used for subnet_ids is the private subnet that PostgreSQL is using. I have also tried using two different values for the required variables of Elasticsearch, but both the AWS OpenSearch Engine (OpenSearch 1.3) and Elasticsearch 7.10 are throwing the same error for the subnet_ids in the module.
Would you be able to help me with checking this module? I can only use version 0.1.0 and version 0.1.1 due to version constraints on my current AWS Quickstart infrastructure. Thank you!
So the type on the module is set correctly - can you share the vars file as well that is being used and redact anything sensitive?
Upgrading to the latest module versions is always recommended here, so it's worth upgrading the other quick-start modules to remove this limitation.
Hi Josh, thanks for checking the module I provided. It seems that my quick-start modules are already at the latest version, but starting from es-loader v0.2.0, `terraform init` throws a version constraint error:
iglu_server | "provider_aws" | ~> 3.45.0 |
pipeline | "provider_aws" | ~> 3.45.0 |
$ terraform init -upgrade
Initializing provider plugins...
- Finding hashicorp/aws versions matching ">= 3.25.0, ~> 3.45.0, >= 3.75.0"...
- Finding hashicorp/random versions matching ">= 3.0.0, ~> 3.1.0"...
- Finding snowplow-devops/snowplow versions matching ">= 0.4.0"...
- Using previously-installed snowplow-devops/snowplow v0.7.1
- Using previously-installed hashicorp/random v3.1.3
╷
│ Error: Failed to query available provider packages
│
│ Could not retrieve the list of available versions for provider hashicorp/aws: no available releases match the given constraints >= 3.25.0, ~> 3.45.0, >= 3.75.0
This forces me to use the older es-loader v0.1.0/0.1.1.
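For readers hitting the same message, the conflict comes from how the `~>` operator works. The fragment below is illustrative only, not the actual contents of any Snowplow module: it shows what a module-level pin of the kind reported for `iglu_server` and `pipeline` typically looks like, with comments on why it cannot be combined with the `>= 3.75.0` requirement shown in the init output.

```hcl
# Illustrative only: a child module that pins the AWS provider like this
# restricts the whole configuration, because Terraform must find ONE
# provider version that satisfies every module's constraint at once.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"

      # "~> 3.45.0" is the pessimistic operator: only the last component
      # may increase, i.e. >= 3.45.0 and < 3.46.0.
      # A second module asking for ">= 3.75.0" therefore leaves no
      # version that satisfies both, and `terraform init` fails.
      version = "~> 3.45.0"
    }
  }
}
```

A constraint added in the root module cannot loosen a pin declared inside a child module, so the way out is to move the modules carrying the `~> 3.45.0` pin to releases that accept a newer AWS provider.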
Also, please see the vars file:
# Will be prefixed to all resource names
# Use this to easily identify the resources created and provide entropy for subsequent environments
prefix = "XXXXXX"
# --- S3
s3_bucket_name = "XXXXXX"
# To use an existing bucket set this to false
s3_bucket_deploy = true
# To save objects in a particular sub-directory you can pass in an optional prefix (e.g. 'foo/' )
s3_bucket_object_prefix = ""
# --- VPC
# Update to the VPC you would like to deploy into which must have public & private subnet layers across which to deploy
# different layers of the application
vpc_id = "XXXXXX"
# Load Balancer will be deployed in this layer
public_subnet_ids = ["subnet-XXXXXX", "subnet-XXXXXX"]
# EC2 Servers & RDS will be deployed in this layer
private_subnet_ids = ["subnet-XXXXXX", "subnet-XXXXXX"]
# --- SSH
# Update this to the internal IP of your Bastion Host
ssh_ip_allowlist = ["XXXXXX"]
# Generate a new SSH key locally with `ssh-keygen`
# ssh-keygen -t rsa -b 4096
ssh_public_key = "XXXXXX"
# --- Iglu Server Configuration
# Iglu Server DNS output from the Iglu Server stack
iglu_server_dns_name = "XXXXXX"
# Used for API actions on the Iglu Server
# Change this to the same UUID from when you created the Iglu Server
iglu_super_api_key = "XXXXXX"
# --- Snowplow Postgres Loader
pipeline_db = "XXXXXX"
postgres_db_name = "XXXXXX"
postgres_db_username = "XXXXXX"
# Change and keep this secret!
postgres_db_password = "XXXXXX"
# IP ranges that you want to query the Pipeline Postgres RDS from
# Note: these IP ranges will need to be internal to your VPC like from a Bastion Host
postgres_db_ip_allowlist = ["XXXXXX"]
# Controls the write throughput of the KCL tables maintained by the various consumers deployed
pipeline_kcl_write_max_capacity = 50
# See for more information: https://registry.terraform.io/modules/snowplow-devops/collector-kinesis-ec2/aws/latest#telemetry
# Telemetry principles: https://docs.snowplowanalytics.com/docs/open-source-quick-start/what-is-the-quick-start-for-open-source/telemetry-principles/
user_provided_id = ""
telemetry_enabled = false
# --- AWS IAM (advanced setting)
iam_permissions_boundary = "" # e.g. "arn:aws:iam::0000000000:policy/MyAccountBoundary"
# --- SSL Configuration (optional)
ssl_information = {
  certificate_arn = "XXXXXX"
  enabled         = true
}
# --- Extra Tags to append to created resources (optional)
tags = {}
# --- CloudWatch logging to ensure logs are saved outside of the server
cloudwatch_logs_enabled = false
#cloudwatch_logs_retention_days = 7
bad_stream_name = "XXXXXX"
es_cluster_name = "XXXXXX"
es_cluster_endpoint = "XXXXXX" #same error for both internet and vpc endpoint
es_cluster_index = "XXXXXX"
es_cluster_port = XXXXXX
es_cluster_document_type = "XXXXXX"
in_stream_name = "XXXXXX"
in_stream_type = "XXXXXX"
name = "XXXXXX"
ssh_key_name = "XXXXXX"
subnet_ids = ["subnet-XXXXXX", "subnet-XXXXXX"]
And lastly, can you share the new variable declarations? These are the configured values, but I need to see the actual types you have assigned to these new variables.
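For context, the declarations being asked about are the `variable` blocks in the root module, which are separate from the values in the vars file. A minimal sketch of what they might look like is below. The variable names are taken from the vars file above, while the descriptions and the choice of `number` for the port are assumptions. One plausible cause of "list of string required", not confirmed in this thread, is a root variable declared with `type = string` or a single string being passed to the module.

```hcl
# Hypothetical root-module declarations for the new ES Loader variables.
variable "subnet_ids" {
  description = "Subnets to deploy the ES Loader into"
  type        = list(string) # a plain `string` type here would not satisfy the module
}

variable "es_cluster_endpoint" {
  description = "Endpoint of the Elasticsearch / OpenSearch cluster"
  type        = string
}

variable "es_cluster_port" {
  description = "Port of the Elasticsearch / OpenSearch cluster"
  type        = number # assumption: the module expects a numeric port
}

variable "in_stream_name" {
  description = "Name of the Kinesis stream to read from"
  type        = string
}
```

With declarations like these, the module input should be passed as `subnet_ids = var.subnet_ids`, without extra brackets or quotes around the reference.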