Page MenuHomePhabricator

Add 'first campaign' and 'first campaign status code' to CentralNotice banner_activity_minutely Turnilo cube and Druid source table
Closed, ResolvedPublic4 Estimated Story Points

Description

As reported in T411382: CentralNotice Banner Status Codes are broken in Turnilo the status codes reported in our existing Turnilo cube are for the last campaign considered, which is not necessarily the fundraising campaign when fallback is active. We can parse the first campaign's status code out of the campaignStatuses URL parameter, e.g.

...&campaignStatuses=[{"statusCode":"6","campaign":"soso","bannersCount":1}]&status=banner_shown&statusCode=6...

Let's add three new dimensions to the banner_activity_minutely Turnilo cube (in https://github.com/wikimedia/operations-puppet/blob/ef99835a63e71d5a1ebf5fa8c8a191b1c75fc7d4/modules/turnilo/templates/config.yaml.erb)

  • first_campaign
  • first_campaign_status_code
  • campaign_fallback (calculated boolean to indicate when campaign fallback occurs)

They also have to be added to the banner_activity_minutely source in Druid (in the analytics/refinery repo)

Event Timeline

Ejegg renamed this task from Add 'first campaign status to Add 'first campaign status' to CentralNotice banner_activity_minutely Turnilo cube.Jan 13 2026, 3:25 PM
Ejegg updated the task description. (Show Details)
Ejegg renamed this task from Add 'first campaign status' to CentralNotice banner_activity_minutely Turnilo cube to Add 'first campaign status' to CentralNotice banner_activity_minutely Turnilo cube and Druid source table.Jan 13 2026, 3:30 PM
Ejegg updated the task description. (Show Details)

Adding Avishua and Joseph as subscribers in case either of you have any suggestions for the parsing or know more about Druid config.

AKanji-WMF triaged this task as Medium priority.Feb 9 2026, 4:24 PM

sorry i missed this- I believe @JMando has some parsing logic that we can convert to druid. I dont know much about druid configs but if needed we can bring in P&T for guidance

Tx @AStein-WMF , @JMando please let us know if you want us to spin up a req.

The druid loading process is not straightforward. We have 2 jobs, one running daily and the other running monthly (compaction).
The jobs are defined in various config files:

If need new fields are needed in the cube, their definition needs to be defined precisely enough to be implemented in HQL.
Also, defining new fields will only be applied to newly indexed data, not past data. We can backfill up to 90 days in the past as that' the retention we have for the webrequest dataset.

I hope what I have written make sense, please gie us more details on what you're expecting to see, the parsing mechanism, on which timeframe you'd expect to see it etc :)

Thank you very much @JAllemandou ! The definition is pretty precise - it is always the statusCode property of the first element of the campaignStatuses array. My HQL is rusty, but would this work?

from_json(parse_url(url, 'QUERY', 'statusCode'))[0].statusCode AS first_campaign_status

If we can backfill 90 days, that would be great, as it would cover the late-Nov - December English fundraising campaign to give us numbers to compare with next year.
I can put some gerrit patches / gitlab pull requests up for review now that I know where to look.

Hi @Ejegg ,
Before we engage in changing the code, do you think there could be additional changes needed for the analysis at stake? I'm as backfilling is not really cheap, and I'd rather not have to do it multiple times.

Change #1238821 had a related patch set uploaded (by Ejegg; author: Ejegg):

[analytics/refinery@master] Add first_campaign_status_code for banner activity

https://gerrit.wikimedia.org/r/1238821

Hi @JAllemandou, we're confident that the single field we need to add here is the status for the first campaign. There's more context on the linked ticket T411382: CentralNotice Banner Status Codes are broken in Turnilo which was filed by the banner team when they didn't see the expected codes in Turnilo. When there's a fundraising campaign active, the first status code in campaignStatuses will always be for that one, which is what we want.

I've got a patch for the HQL which I'm pretty sure selects that first status code, but I need to smoke test it a bit: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1238821

Ejegg renamed this task from Add 'first campaign status' to CentralNotice banner_activity_minutely Turnilo cube and Druid source table to Add 'first campaign' and 'first campaign status code' to CentralNotice banner_activity_minutely Turnilo cube and Druid source table.Feb 13 2026, 3:23 AM
Ejegg claimed this task.
Ejegg updated the task description. (Show Details)
Ejegg set the point value for this task to 4.

So much for that early confidence! After fixing the patch with Joal's help and looking at some data, I realized that we also need the first campaign considered, not just its status code. I've updated the patch and the ticket to reflect that fact.

Change #1239232 had a related patch set uploaded (by Ejegg; author: Ejegg):

[analytics/refinery@master] Use names in banner activity GROUP BY

https://gerrit.wikimedia.org/r/1239232

Change #1239232 merged by Joal:

[analytics/refinery@master] Use names in banner activity GROUP BY

https://gerrit.wikimedia.org/r/1239232

Change #1238821 merged by Joal:

[analytics/refinery@master] Add first_campaign_status_code for banner activity

https://gerrit.wikimedia.org/r/1238821

Pipelines have been updated, I can starts to backfill. I'll rerun the data from December 1st 2025.

Change #1240253 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery@master] Fix druid banner_activity data prep job

https://gerrit.wikimedia.org/r/1240253

I have hotfixed production with the job with the patch above, some null data needed to be coalesced.

Thank you Joal! I see you made the Airflow patch as well. I'll make the patch for turnilo.

Change #1240298 had a related patch set uploaded (by Ejegg; author: Ejegg):

[operations/puppet@production] Add new dimensions to banner_activity in Turnilo

https://gerrit.wikimedia.org/r/1240298

Change #1240298 merged by Brouberol:

[operations/puppet@production] Add new dimensions to banner_activity in Turnilo

https://gerrit.wikimedia.org/r/1240298

Change #1240253 merged by Joal:

[analytics/refinery@master] Fix druid banner_activity data prep job

https://gerrit.wikimedia.org/r/1240253

Here we are, data is available in turnilo starting Dec 1st 2025.

image.png (1×2 px, 314 KB)

@Ejegg I let you close the task as you see fits :)

Change #1240762 had a related patch set uploaded (by Ejegg; author: Ejegg):

[operations/puppet@production] Fix for new banner activity dimension in Turnilo

https://gerrit.wikimedia.org/r/1240762

Change #1240762 merged by Gehel:

[operations/puppet@production] Fix for new banner activity dimension in Turnilo

https://gerrit.wikimedia.org/r/1240762

Mentioned in SAL (#wikimedia-analytics) [2026-03-05T18:43:01Z] <dr0ptp4kt> Deploying change 1240253 for refinery ( T414478 ), already hotfixed, should be no-op

Mentioned in SAL (#wikimedia-operations) [2026-03-05T19:03:56Z] <dr0ptp4kt> Deployed refinery change 1240253 ( T414478 ), 1240253 (no-op) for refinery ( T414478 ) using scap, then deployed onto hdfs

Mentioned in SAL (#wikimedia-analytics) [2026-03-05T19:04:00Z] <dr0ptp4kt> Deployed refinery change 1240253 ( T414478 ), 1240253 (no-op) for refinery ( T414478 ) using scap, then deployed onto hdfs