As discussed in T228175: Event Platform Client Libraries, we believe we can migrate the existing streams produced by the EventLogging extension to Modern Event Platform components. This will finally allow us to decommission the EventLogging backend pieces:
- varnishkafka-eventlogging
- eventlogging-processor (and eventlog1003)
- meta.wikimedia.org schemas
- refine_eventlogging_analytics
To support existing EventLogging events in eventgate-analytics, we need:
- meta.wikimedia.org schemas ported to JSONSchema draft 7 in a git schema repository, with the common schema included via $ref.
- A stream config entry for each (active) EventLogging schema/stream.
- Schema revision extension attributes changed to use the new semver schema version.
- EL client-side code adapted to produce the full event (with capsule fields) and to POST it to eventgate.
- The capsule userAgent type issue resolved (it is a string in JSONSchema but a struct in Hive).
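To make "the full event (with capsule fields)" concrete, here is a minimal sketch of what the client could assemble before POSTing to eventgate. It assumes a $schema URI layout of /analytics/legacy/<lowercased schema>/<semver> and capsule field names such as client_dt; the helper build_legacy_event is hypothetical, and the real field set is whatever the ported draft 7 schema (with the common schema pulled in via $ref) defines. The eventgate endpoint is deliberately not shown.

```python
import json
import uuid
from datetime import datetime, timezone


def build_legacy_event(schema, version, stream, event_fields, wiki, web_host, user_agent):
    """Hypothetical helper: assemble a legacy EL event including capsule fields.

    The $schema URI layout and the capsule field names are assumptions.
    """
    # ISO 8601 UTC timestamp with millisecond precision, e.g. 2024-01-01T00:00:00.000Z
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        # Semver schema version instead of a metawiki revision id (layout assumed).
        "$schema": f"/analytics/legacy/{schema.lower()}/{version}",
        "meta": {"stream": stream},
        # Capsule fields the client can fill in itself.
        "schema": schema,
        "wiki": wiki,
        "webHost": web_host,
        "uuid": uuid.uuid4().hex,
        "client_dt": now,
        "userAgent": user_agent,  # a plain string in JSONSchema
        # The schema-specific payload stays nested under "event".
        "event": event_fields,
        # seqId and recvFrom are intentionally absent: only the old backend set them.
    }


# Example: the JSON body that would be POSTed to eventgate-analytics-external.
example_body = json.dumps(build_legacy_event(
    "SearchSatisfaction", "1.0.0", "eventlogging_SearchSatisfaction",
    {"action": "searchResultPage"}, "enwiki", "en.wikipedia.org", "Mozilla/5.0",
))
```

Keeping the schema-specific fields nested under "event" preserves the old capsule shape, so the refined tables keep their familiar layout.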
Ideally, EventLogging will produce the full event, including EventCapsule fields, to eventgate-analytics-external, the same eventgate instance that new-style schemas will use. The same Refine job we use for eventgate analytics events should then be able to refine the old EL-style events. Not all capsule fields will be set (e.g. seqId and recvFrom), but we can work with what is available on the client side. The main issue will be resolving the userAgent type discrepancy, since we will parse the user_agent during refinement.
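One way to picture the userAgent discrepancy: the event arrives with a string, but the Hive column is a struct, so refinement has to replace the string with a parsed, struct-shaped value. The sketch below is illustrative only and is not the Refine code; the struct field names in UA_STRUCT_FIELDS are assumed, and naive_parser is a toy stand-in for a real user agent parser such as ua-parser.

```python
from typing import Callable, Dict, Optional

# Assumed subset of the Hive userAgent struct fields.
UA_STRUCT_FIELDS = ("browser_family", "browser_major", "os_family", "os_major", "device_family")

UserAgentParser = Callable[[str], Dict[str, Optional[str]]]


def user_agent_to_struct(event: dict, parse: UserAgentParser) -> dict:
    """Return a copy of event whose string userAgent is replaced by a struct-shaped dict."""
    refined = dict(event)  # do not mutate the incoming event
    raw = event.get("userAgent")
    if isinstance(raw, str):
        parsed = parse(raw)
        # Emit exactly the struct's fields so every row has the same shape;
        # anything the parser did not return becomes None (NULL in Hive).
        refined["userAgent"] = {name: parsed.get(name) for name in UA_STRUCT_FIELDS}
    return refined


def naive_parser(ua: str) -> Dict[str, Optional[str]]:
    """Toy stand-in for a real user agent parser."""
    return {"browser_family": "Firefox" if "Firefox/" in ua else "Other"}
```

Fixing the struct's field list up front, rather than copying whatever the parser returns, is what keeps the Hive schema stable across events.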
We'll start by migrating a single high-volume EventLogging stream to MEP: SearchSatisfaction (T249261: Vertical: Migrate SearchSatisfaction EventLogging event stream to Event Platform).
Once T259163: Migrate legacy metawiki schemas to Event Platform is done, we should clean up all schemas on metawiki, either by deleting them or emptying out their content with {}.
Steps
Undeploy varnishkafka-eventlogging
- Set ensure => 'absent' on varnishkafka::instance and nrpe::monitor_service in profile::cache::kafka::eventlogging.
- Apply puppet on all varnish cache nodes and ensure varnishkafka-eventlogging is stopped and config files are removed.
- Remove profile::cache::kafka::eventlogging from operations/puppet.
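The "ensure it is stopped and config files are removed" check can be scripted per cache node. A minimal sketch follows; check_undeployed and run_systemctl are hypothetical helpers, and the config file paths are left to the caller because the exact varnishkafka paths are not listed here. The command runner is injectable so the logic can be exercised without systemd.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List

Runner = Callable[[List[str]], str]


def run_systemctl(args: List[str]) -> str:
    """Run systemctl and return stdout; a non-zero exit is expected for stopped units."""
    result = subprocess.run(["systemctl", *args], capture_output=True, text=True, check=False)
    return result.stdout.strip()


def check_undeployed(unit: str, config_paths: Iterable[str], run: Runner = run_systemctl) -> List[str]:
    """Return a list of problems; an empty list means the unit is stopped and its files are gone."""
    problems = []
    state = run(["is-active", unit])
    # Treat a stopped ('inactive') or unrecognized ('unknown') unit as gone; report any other state.
    if state not in ("inactive", "unknown"):
        problems.append(f"{unit} is {state}")
    for path in config_paths:
        if Path(path).exists():
            problems.append(f"{path} still exists")
    return problems
```

On a node, check_undeployed("varnishkafka-eventlogging", [...]) should return an empty list once puppet has applied the absent resources.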
Undeploy eventlogging-processor
- Set ensure => 'absent' on:
  - eventlogging::service::processor in profile::eventlogging::analytics::processor.
  - scap::target, ssh::userkey, eventlogging::server, and eventlogging::monitoring::jobs in profile::eventlogging::analytics::server.
- Apply puppet on eventlog1003 and ensure all eventlogging-processor@client-side-* processes are stopped and gone.
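To confirm that no processor instances are left on eventlog1003, the templated units can be listed with `systemctl list-units --all --no-legend --plain 'eventlogging-processor@client-side-*'`, whose columns are UNIT LOAD ACTIVE SUB DESCRIPTION. The hypothetical helper below is a minimal sketch that parses that output and reports any unit not in the inactive state.

```python
from typing import List


def lingering_processors(list_units_output: str) -> List[str]:
    """Return units that are not inactive, given `systemctl list-units ... --no-legend --plain` output."""
    lingering = []
    for line in list_units_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "●":  # defensive: drop the status bullet if --plain was omitted
            fields = fields[1:]
        if len(fields) < 4:
            continue  # blank or malformed line
        unit, active = fields[0], fields[2]
        if active != "inactive":
            lingering.append(unit)
    return lingering
```

An empty result (or empty command output) means every eventlogging-processor@client-side-* unit is stopped or gone.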
Delete the legacy backend eventlogging Kafka topics in the Kafka jumbo-eqiad cluster:
To delete:
- eventlogging-client-side
- eventlogging-valid-mixed
- eventlogging-virtualpageview
DO NOT delete any eventlogging_* topics. These are migrated eventlogging topics.
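Because the legacy topics (hyphenated) and the migrated topics (eventlogging_ plus an underscore) look so similar, it is worth guarding the deletion mechanically. A minimal sketch: delete_commands is a hypothetical helper that builds one kafka-topics.sh --delete command per exact topic name and refuses anything matching eventlogging_. It assumes a Kafka version whose kafka-topics.sh accepts --bootstrap-server; the broker address is a placeholder, and brokers must have delete.topic.enable=true for deletion to take effect.

```python
import re
from typing import List

# Legacy backend topics to delete: exact names, no wildcards.
TOPICS_TO_DELETE = ["eventlogging-client-side", "eventlogging-valid-mixed", "eventlogging-virtualpageview"]

# Migrated legacy streams use an underscore (eventlogging_<Schema>). Never touch these.
MIGRATED_TOPIC = re.compile(r"^eventlogging_")


def delete_commands(topics: List[str], bootstrap_server: str) -> List[List[str]]:
    """Build one kafka-topics.sh --delete command per topic, refusing any migrated topic."""
    commands = []
    for topic in topics:
        if MIGRATED_TOPIC.match(topic):
            raise ValueError(f"refusing to delete migrated topic {topic}")
        commands.append(
            ["kafka-topics.sh", "--bootstrap-server", bootstrap_server, "--delete", "--topic", topic]
        )
    return commands
```

Passing exact names instead of a regex to --topic avoids accidentally matching a migrated eventlogging_* topic.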
Update Gobblin job eventlogging_legacy.pull
The eventlogging_legacy Gobblin job is still using wildcard topic names to figure out which Kafka topics to import. Now that all legacy streams are migrated, we should use EventStreamConfig to determine which streams to import.
The eventlogging_legacy_test.pull job (running only in the analytics-test-hadoop cluster) already does this.
- Edit eventlogging_legacy.pull to use event_stream_config.settings_filters. You can use the same setting as in eventlogging_legacy_test.pull. Do not set event_stream_config.stream_names.
- Deploy refinery
- Ensure that the eventlogging_legacy Gobblin systemd timer job still works.
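The difference between wildcard topic matching and settings filters is that a stream is imported only if its stream config settings match, not merely because its name looks right. The sketch below illustrates that selection logic in plain Python; it is not Gobblin's implementation. The setting keys used in the example (consumers.analytics_hadoop_ingestion.job_name and .enabled) are assumptions for illustration; the real values to use are whatever eventlogging_legacy_test.pull already sets.

```python
from typing import Any, Dict, List

StreamConfigs = Dict[str, Dict[str, Any]]


def get_setting(config: Dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted path such as 'consumers.analytics_hadoop_ingestion.job_name'."""
    value: Any = config
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def streams_matching(configs: StreamConfigs, filters: Dict[str, Any]) -> List[str]:
    """Return stream names whose settings equal every filter value."""
    return sorted(
        name
        for name, config in configs.items()
        if all(get_setting(config, key) == expected for key, expected in filters.items())
    )
```

With a wildcard such as eventlogging_.*, a stream with no (or disabled) ingestion settings would still be picked up; with filters, only streams whose config opts in are imported.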
Remove refine_eventlogging_analytics job
This is a Refine job dedicated to ingesting not-yet-migrated legacy EventLogging data. It is not to be confused with refine_eventlogging_legacy, which ingests migrated legacy EventLogging data.
Now that we have finished the migration, we can delete the refine_eventlogging_analytics job.
- Set ensure => 'absent' on the profile::analytics::refinery::job::refine_job { 'eventlogging_analytics': ... } resource in profile::analytics::refinery::job::refine.
- Apply puppet on an-launcher1002
- Remove the remaining Puppet code for the eventlogging_analytics Refine job from profile::analytics::refinery::job::refine.
Reconfigure refine_eventlogging_legacy job
We can now remove the manually maintained $eventlogging_legacy_table_include_list.
- Remove puppet code that references $eventlogging_legacy_table_include_list and table_include_regex => $eventlogging_legacy_table_include_regex from the EventLogging Legacy data refine job configuration
- Apply puppet on an-launcher1002
- Ensure that the refine_eventlogging_legacy job still works.
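Why the manually maintained list becomes redundant: it acts as an allowlist regex over table names, so every newly migrated schema had to be added by hand or its table was silently skipped. Once the imported data itself is selected via stream config, refining everything that arrives is equivalent. A minimal sketch of that filtering semantics, assuming tables are matched by an anchored alternation; include_regex and tables_to_refine are hypothetical and this is not the Refine implementation.

```python
import re
from typing import Iterable, List, Optional


def include_regex(tables: Iterable[str]) -> str:
    """Build an anchored alternation regex from a manually maintained table list."""
    return "^(" + "|".join(re.escape(t) for t in sorted(tables)) + ")$"


def tables_to_refine(candidates: Iterable[str], table_include_regex: Optional[str] = None) -> List[str]:
    """Without an include regex every candidate table is refined; with one, only matches are."""
    if table_include_regex is None:
        return sorted(candidates)
    pattern = re.compile(table_include_regex)
    return sorted(t for t in candidates if pattern.match(t))
```

For example, with the list ["searchsatisfaction"], a new table "newtable" is skipped until someone edits Puppet; without the regex, it is refined as soon as its data is imported.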
Decommission eventlog1003
- File a phab ticket and tag Data-Platform-SRE, asking them to delete the eventlog1003 ganeti virtual machine.
meta.wikimedia.org schemas
There is no consistent practice for 'deleting' metawiki schemas. Often, the content is simply blanked by editing the schema page and removing its content. We should probably do this for ALL metawiki schemas.
Alternatively, we could leave them as they are. The system no longer uses them, so from a technical perspective it doesn't matter.
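If we do blank them, the MediaWiki Action API edit module (action=edit on https://meta.wikimedia.org/w/api.php) can replace each Schema: page's content with {}. The sketch below only builds the POST form fields; obtaining a CSRF token (action=query&meta=tokens) and an authenticated session is out of scope. It assumes that the EventLogging schema content handler accepts {} as page content, as the earlier cleanup note suggests; the helper names and the edit summary are hypothetical.

```python
from typing import Dict, List


def blank_schema_request(schema_name: str, csrf_token: str) -> Dict[str, str]:
    """Form fields for an action=edit POST that replaces a Schema: page's content with {}."""
    return {
        "action": "edit",
        "format": "json",
        "title": f"Schema:{schema_name}",
        "text": "{}",
        "summary": "Blanking legacy EventLogging schema; migrated to Event Platform",
        "token": csrf_token,
    }


def blanking_batch(schema_names: List[str], csrf_token: str) -> List[Dict[str, str]]:
    """One edit request per schema page."""
    return [blank_schema_request(name, csrf_token) for name in schema_names]
```

Blanking rather than deleting keeps the page history, so old revisions remain available for reference.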