Duplicate user_identifier in snowplow_unified_users

We are encountering duplicate user_identifier values in our derived.snowplow_unified_users table. This only started several days after a successful implementation of the Snowplow Unified package. There were no changes to the dbt project around the time the issue presented itself.

dbt error

[31mFailure in test unique_snowplow_unified_users_user_identifier (models/users/users.yml)[0m
[2024-07-25, 07:56:12 BST] {{ecs.py:131}} INFO - [2024-07-25 06:55:59,577] [0m06:55:59 Got 8792 results, configured to fail if != 0

Snowplow Unified: version 0.4.4

Warehouse: snowflake

snowplow__user_identifiers and snowplow__session_identifiers left as default

snowplow_unified:
  snowplow__conversion_events: redacted
  snowplow__backfill_limit_days: 10
  snowplow__start_date: '2023-12-01'
  snowplow__atomic_schema: redacted
  snowplow__database: redacted
  snowplow__enable_ua: true
  snowplow__enable_yauaa: true 
  snowplow__enable_mobile_context: true 
  snowplow__enable_screen_summary_context: true

Identifying where the duplicates occur

with user_aggs as (

    select
        user_identifier,
        count(user_identifier) over (
            partition by 
                user_identifier
        ) as count_user_identifier
    from
        scratch.snowplow_unified_users_aggs 
    qualify count_user_identifier > 1
),

user_sess_this_run as (

    select
        user_identifier,
        count(user_identifier) over (
            partition by 
                user_identifier
        ) as count_user_identifier
    from 
        scratch.snowplow_unified_users_sessions_this_run 
    qualify count_user_identifier > 1
),

unified_users_this_run as (

    select
        user_identifier,
        count(user_identifier) over (
            partition by 
                user_identifier
        ) as count_user_identifier
    from 
        scratch.snowplow_unified_users_this_run
    qualify count_user_identifier > 1
),

unified_users as (

    select
        user_identifier,
        count(user_identifier) over (
            partition by 
                user_identifier
        ) as count_user_identifier
    from 
        derived.snowplow_unified_users 
    qualify count_user_identifier > 1
)



select 
    'duplicates from snowplow_unified_users_aggs' as table_name,
    count(distinct user_identifier) as duplicates
from 
    user_aggs 

    union all 

select 
    'duplicates from unified_users_sessions_this_run' as table_name,
    count(distinct user_identifier) as duplicates
from 
    user_sess_this_run  

    union all 

select 
    'duplicates from snowlow_unified_users_this_run' as table_name,
    count(distinct user_identifier) as duplicates
from 
    unified_users_this_run 

    union all

select 
    'duplicates from derived.snowlow_unified_users' as table_name,
    count(distinct user_identifier) as duplicates
from 
    unified_users   

The current run from scratch.unified_users_sessions_this_run naturally shows multiple different sessions by the same users over the period the run took place. Downstream scratch.snowplow_unfied_users_aggs appears to group by user_identifier without issue as no duplicates are found in this table. It appears as though all rows in snowplow_unified_users_this_run are being appended into derived.snowplow_unified_users.