We have devised a more precise plan here: https://docs.google.com/document/d/1cCSGzLUfVWUHjqG5v5VdLADsbzmMklczQ1YG7oghGl8/edit?tab=t.u0mzcolehp3v
Description
Details
| Title | Reference | Author | Source Branch | Dest Branch |
|---|---|---|---|---|
| Update main webrequest factory to always refine | repos/data-engineering/airflow-dags!1217 | joal | update_main_webrequest_no_stop | main |
| Fix main webrequest_deprecated path | repos/data-engineering/airflow-dags!1213 | joal | fix_main_webrequest_deprecated | main |
| Fix HQL path in refine_webrequest_test job | repos/data-engineering/airflow-dags!1197 | joal | fix_webrequest_test | main |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | Antoine_Quhen | T354694 [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition |
| Resolved | | JAllemandou | T387750 [HAProxy migration] Compile expected migration delta, switch over plan and communicate |
| Resolved | | JAllemandou | T386177 Switch webrequest dataset to feed from HAProxy instead of VarnishKafka |
| Resolved | | JAllemandou | T386176 Add `is_redirect_to_pageview` field to `wmf_staging.webrequest` table |
| Resolved | | JAllemandou | T386174 Grow number of Gobblin mappers ingesting `webrequest_frontend` data |
| Resolved | | elukey | T390029 Migrate Benthos `webrequest_sampled_live` to feed from HAProxy data |
Event Timeline
Change #1131387 had a related patch set uploaded (by Joal; author: Joal):
[operations/puppet@production] Update analytics webrequest kafkatee
Change #1131387 merged by Btullis:
[operations/puppet@production] Update analytics webrequest kafkatee
Change #1131405 had a related patch set uploaded (by Joal; author: Joal):
[operations/puppet@production] Update hadoop-test webrequest gobblin/purge jobs
Change #1131652 had a related patch set uploaded (by Joal; author: Joal):
[analytics/refinery@master] Add webrequest_frontend_test gobblin job
Change #1131652 merged by Joal:
[analytics/refinery@master] Add webrequest_frontend_test gobblin job
Change #1131405 merged by Btullis:
[operations/puppet@production] Update hadoop-test webrequest gobblin/purge jobs
joal opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1197
Fix HQL path in refine_webrequest_test job
aqu merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1197
Fix HQL path in refine_webrequest_test job
Change #1132663 had a related patch set uploaded (by Joal; author: Joal):
[operations/alerts@master] Update data-eng gobblin alert
Change #1133068 had a related patch set uploaded (by Joal; author: Joal):
[analytics/refinery@master] Update webrequest schemas for HAProxy migration
Change #1133068 merged by Joal:
[analytics/refinery@master] Update webrequest schemas for HAProxy migration
I'm going to adjust retention of the following topics:
- webrequest_text goes from 7d to 3d
- webrequest_upload goes from 7d to 3d
- webrequest_frontend_text goes from 3d to 7d
- webrequest_frontend_upload goes from 3d to 7d
All good?
brouberol@kafka-jumbo1014:~$ kafka configs --alter --entity-type topics --entity-name webrequest_frontend_text --delete-config retention.ms
kafka-configs --zookeeper conf1007.eqiad.wmnet,conf1008.eqiad.wmnet,conf1009.eqiad.wmnet/kafka/jumbo-eqiad --alter --entity-type topics --entity-name webrequest_frontend_text --delete-config retention.ms
Completed Updating config for entity: topic 'webrequest_frontend_text'.
brouberol@kafka-jumbo1014:~$ kafka topics --describe --topic webrequest_frontend_text | head
kafka-topics --zookeeper conf1007.eqiad.wmnet,conf1008.eqiad.wmnet,conf1009.eqiad.wmnet/kafka/jumbo-eqiad --describe --topic webrequest_frontend_text
Topic:webrequest_frontend_text PartitionCount:256 ReplicationFactor:3 Configs:message.timestamp.type=LogAppendTime
Topic: webrequest_frontend_text Partition: 0 Leader: 1012 Replicas: 1012,1013,1009 Isr: 1009,1012,1013
Topic: webrequest_frontend_text Partition: 1 Leader: 1014 Replicas: 1014,1010,1009 Isr: 1014,1010,1009
I deleted the retention.ms override on webrequest_frontend_text, which caused it to use the server-side default of
brouberol@kafka-jumbo1014:~$ sudo grep retention /etc/kafka/server.properties
log.retention.hours=168
which is 7d.
brouberol@kafka-jumbo1014:~$ kafka configs --alter --entity-type topics --entity-name webrequest_frontend_upload --delete-config retention.ms
kafka-configs --zookeeper conf1007.eqiad.wmnet,conf1008.eqiad.wmnet,conf1009.eqiad.wmnet/kafka/jumbo-eqiad --alter --entity-type topics --entity-name webrequest_frontend_upload --delete-config retention.ms
Completed Updating config for entity: topic 'webrequest_frontend_upload'.
The webrequest_{text,upload} topics now have a retention of 3d.
brouberol@kafka-jumbo1014:~$ kafka configs --alter --entity-type topics --entity-name webrequest_text --add-config retention.ms=259200000
kafka-configs --zookeeper conf1007.eqiad.wmnet,conf1008.eqiad.wmnet,conf1009.eqiad.wmnet/kafka/jumbo-eqiad --alter --entity-type topics --entity-name webrequest_text --add-config retention.ms=259200000
Completed Updating config for entity: topic 'webrequest_text'.
brouberol@kafka-jumbo1014:~$ kafka configs --alter --entity-type topics --entity-name webrequest_upload --add-config retention.ms=259200000
kafka-configs --zookeeper conf1007.eqiad.wmnet,conf1008.eqiad.wmnet,conf1009.eqiad.wmnet/kafka/jumbo-eqiad --alter --entity-type topics --entity-name webrequest_upload --add-config retention.ms=259200000
Completed Updating config for entity: topic 'webrequest_upload'.
brouberol@kafka-jumbo1014:~$ kafka topics --describe --topic webrequest_text | head -n 4
kafka-topics --zookeeper conf1007.eqiad.wmnet,conf1008.eqiad.wmnet,conf1009.eqiad.wmnet/kafka/jumbo-eqiad --describe --topic webrequest_text
Topic:webrequest_text PartitionCount:24 ReplicationFactor:3 Configs:message.timestamp.type=LogAppendTime,retention.ms=259200000
Topic: webrequest_text Partition: 0 Leader: 1013 Replicas: 1013,1008,1011 Isr: 1011,1013,1008
Topic: webrequest_text Partition: 1 Leader: 1008 Replicas: 1008,1012,1015 Isr: 1012,1015,1008
brouberol@kafka-jumbo1014:~$ kafka topics --describe --topic webrequest_upload | head -n 4
kafka-topics --zookeeper conf1007.eqiad.wmnet,conf1008.eqiad.wmnet,conf1009.eqiad.wmnet/kafka/jumbo-eqiad --describe --topic webrequest_upload
Topic:webrequest_upload PartitionCount:24 ReplicationFactor:3 Configs:message.timestamp.type=LogAppendTime,retention.ms=259200000
Topic: webrequest_upload Partition: 0 Leader: 1010 Replicas: 1010,1014,1008 Isr: 1014,1010,1008
Topic: webrequest_upload Partition: 1 Leader: 1012 Replicas: 1012,1013,1007 Isr: 1012,1013,1007
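For anyone double-checking the numbers, here is a small sketch of the arithmetic behind the two values used in this change: the 3d override expressed as retention.ms, and the broker default of log.retention.hours=168 expressed in days. The helper names `retention_ms` and `hours_to_days` are made up for illustration and are not part of any Kafka tooling.

```python
from datetime import timedelta


def retention_ms(days: int) -> int:
    """Convert a retention period in days into a retention.ms value (hypothetical helper)."""
    return int(timedelta(days=days).total_seconds() * 1000)


def hours_to_days(hours: int) -> float:
    """Convert a log.retention.hours value into days (hypothetical helper)."""
    return timedelta(hours=hours) / timedelta(days=1)


# 3d override applied to webrequest_text and webrequest_upload
three_days_ms = retention_ms(3)

# Broker-level default that the webrequest_frontend_* topics fall back to
default_days = hours_to_days(168)

print(three_days_ms)  # 259200000
print(default_days)   # 7.0
```

The first value matches the `retention.ms=259200000` passed to `--add-config`, and the second confirms that removing the per-topic override leaves the frontend topics at 7d.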
Let the bonfire begin.
Change #1133084 had a related patch set uploaded (by Joal; author: Joal):
[analytics/refinery@master] Hotfix for webrequest migration
Change #1133084 merged by Joal:
[analytics/refinery@master] Hotfix for webrequest migration
Change #1132663 merged by jenkins-bot:
[operations/alerts@master] Update data-eng gobblin alert
Change #1133847 had a related patch set uploaded (by Joal; author: Joal):
[operations/alerts@master] Update GobblinLastSuccessfulRunTooLongAgo
Change #1133847 merged by jenkins-bot:
[operations/alerts@master] Update GobblinLastSuccessfulRunTooLongAgo
joal opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1213
Fix main webrequest_deprecated path
joal merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1213
Fix main webrequest_deprecated path
Migration impact and documentation: https://wikitech.wikimedia.org/w/index.php?title=Data_Platform%2FData_Lake%2FTraffic%2FWebrequest&diff=2289838&oldid=2248233
joal opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1217
Update main webrequest factory to always refine
mforns merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1217
Update main webrequest factory to always refine
Change #1187450 had a related patch set uploaded (by Joal; author: Joal):
[operations/puppet@production] Fix raw webrequest data purge job
Change #1187450 merged by Stevemunene:
[operations/puppet@production] Fix raw webrequest data purge job