As discussed in T228175: [Metrics Platform] Event Platform Client Libraries, we believe we can migrate existing EventLogging-extension-produced streams to Modern Event Platform components. This will finally allow us to decommission the EventLogging backend pieces:
- varnishkafka-eventlogging
- eventlogging-processor (and eventlog1003)
- meta.wikimedia.org schemas
- refine_eventlogging_analytics
To support existing EventLogging events in eventgate-analytics, we need to:
- Port meta.wikimedia.org schemas to draft-7 JSONSchema in a git schema repo, with common schema fields included via $ref.
- Add a stream config entry for each (active) EventLogging schema/stream.
- Change Schema revision extension attributes to use the new semver schema version.
- Adapt EL client-side code to produce the full event (with capsule fields) and to POST it to eventgate.
- Resolve the capsule userAgent type issue (it is a string in JSONSchema, but a struct in Hive).
Ideally, EventLogging will produce the full event, including EventCapsule fields, to eventgate-analytics-external, the same eventgate instance that new-style schemas will use. The same Refine job we use for eventgate-analytics events should then be able to refine the old EL-style events. Not all capsule fields will be set (e.g. seqId and recvFrom), but we can work with what is available on the client side. The main issue will be resolving the userAgent type discrepancy, as we will parse the user_agent during refinement.
We'll start by migrating a single high-volume EventLogging stream to MEP: SearchSatisfaction (T249261: Vertical: Migrate SearchSatisfaction EventLogging event stream to Event Platform).
Once T259163: Migrate legacy metawiki schemas to Event Platform is done, we should clean up all schemas on metawiki, either by deleting them or replacing their content with {}.
Steps
Undeploy varnishkafka-eventlogging
- Set ensure => 'absent' on varnishkafka::instance and nrpe::monitor_service in profile::cache::kafka::eventlogging.
- Apply puppet on all varnish cache nodes and ensure varnishkafka-eventlogging is stopped and config files are removed.
- Remove profile::cache::kafka::eventlogging - patch
Undeploy eventlogging-processor
- Set ensure => 'absent' on:
- eventlogging::service::processor in profile::eventlogging::analytics::processor
- scap::target, ssh::userkey, eventlogging::server, and eventlogging::monitoring::jobs in profile::eventlogging::analytics::server.
- Apply puppet on eventlog1003 and ensure all eventlogging-processor@client-side-* processes are stopped and gone.
- Remove all relevant eventlogging backend code from operations/puppet
Delete the legacy backend eventlogging Kafka topics in Kafka jumbo-eqiad cluster:
- To delete:
  - eventlogging-client-side
  - eventlogging-valid-mixed
  - eventLogging-valid-mixed
  - eventlogging-virtualpageview
DO NOT delete any eventlogging_* topics. These are migrated eventlogging topics.
Update gobblin job eventlogging_legacy.pull
The eventlogging_legacy Gobblin job is still using wildcard topic names to figure out which Kafka topics to import. Now that all legacy streams are migrated, we should use EventStreamConfig to determine which streams to import.
The eventlogging_legacy_test.pull job (running only in the analytics-test-hadoop cluster) already does this.
- Edit eventlogging_legacy.pull to use event_stream_config.settings_filters (you can use the same setting as in eventlogging_legacy_test.pull); do not set event_stream_config.stream_names. Patch: gobblin eventlogging_legacy - use EventStreamConfig to pull topics (1109477)
- Deploy refinery
- Ensure that the eventlogging_legacy Gobblin systemd timer job still works.
Remove refine_eventlogging_analytics job
This is a Refine job dedicated to ingesting not-yet-migrated legacy EventLogging data. It is not to be confused with refine_eventlogging_legacy, which ingests migrated legacy EventLogging data.
Now that we have finished the migration, we can delete the refine_eventlogging_analytics job.
- Set ensure => 'absent' on the profile::analytics::refinery::job::refine_job { 'eventlogging_analytics': } resource in profile::analytics::refinery::job::refine.
- Apply puppet on an-launcher1002
- Remove the puppet code for the eventlogging_analytics refine job from profile::analytics::refinery::job::refine:
- https://gerrit.wikimedia.org/r/c/operations/puppet/+/1109135
- https://gerrit.wikimedia.org/r/c/operations/puppet/+/1109137
Decommission eventlog1003
- File T383276: Delete ganeti VM eventlog1003.eqiad.wmnet and tag Data-Platform-SRE, asking them to delete the eventlog1003 ganeti virtual machine.
meta.wikimedia.org schemas
There is no consistent practice for 'deleting' metawiki schemas. Often the content is simply blanked by editing the schema page. We should probably do this for ALL metawiki schemas.
Alternatively, we could leave them as is; the system won't use them anymore, so it doesn't really matter from a technical perspective.
- Decision: empty all metawiki schema pages to {}
Code here: https://gitlab.wikimedia.org/-/snippets/208
Remove decommissioned Kafka topics and Hive tables
- Delete the Kafka topics, Hive tables, and HDFS data directories for the non-migrated schemas listed in the To Decommission tab of the EventLogging Audit Spreadsheet.
Update wikitech and mediawiki documentation
- Move wikitech EventLogging/ and subpages under Analytics/Archive
- mediawiki documentation is already up to date, and outdated pages are marked as such.