This task will track the migration of EventLogging schemas & streams to Event Platform schemas.
Tracking and planning of what schemas to migrate is being done in the [[ https://docs.google.com/spreadsheets/d/1WXbGPyuu2S6TYvrb-DvWWmrEx_K7TJ5rYPkjhvgWjoI/edit#gid=1715982822| EventLogging Schema Migration Audit spreadsheet ]].
---
====== Migration plan for a schema:
- Pick a schema to migrate from the [[ https://docs.google.com/spreadsheets/d/1WXbGPyuu2S6TYvrb-DvWWmrEx_K7TJ5rYPkjhvgWjoI/edit#gid=1715982822| EventLogging Schema Migration Audit spreadsheet ]].
-- Find the owner of that schema and ask if they need client IP and/or geocoded data in the Hive table.
- Create /analytics/legacy/<schemaname>/current.yaml schema (using [[ https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/scripts/eventlogging_legacy_schema_convert.js | eventlogging_legacy_schema_convert script ]]) in the [[ https://gerrit.wikimedia.org/r/admin/repos/schemas/event/secondary | schemas/event/secondary ]] repository:
```
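# Run from the root of a schemas/event/secondary checkout.
# Note: the directory name is lowercase, while the on-wiki schema title is CamelCase.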
schema_name=searchsatisfaction
mkdir ./jsonschema/analytics/legacy/$schema_name
node ./scripts/eventlogging_legacy_schema_convert.js SearchSatisfaction > ./jsonschema/analytics/legacy/$schema_name/current.yaml
```
You'll need to edit at least the JSONSchema `examples` in current.yaml. The easiest thing to do is get an event out of Kafka and use that as a starting point.
```
# Get the last event out of Kafka
kafkacat -C -b kafka-jumbo1001.eqiad.wmnet -o -1 -c 1 -t eventlogging_SearchSatisfaction
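# Optionally pretty-print it for pasting into the schema's `examples`
# (assumes jq is available on the host you run this from):
kafkacat -C -b kafka-jumbo1001.eqiad.wmnet -o -1 -c 1 -t eventlogging_SearchSatisfaction | jq .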
```
If the schema owner indicated that they need client IP and/or geocoded data in Hive, you'll need to add a $ref to the fragment/http/client_ip schema. [[ https://schema.wikimedia.org/repositories//primary/jsonschema/fragment/w3c/reportingapi/report/current.yaml | Example here ]].
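For orientation, a minimal sketch of what that might look like in current.yaml (the fragment version numbers and surrounding layout here are assumptions; copy the real structure from the linked example):
```
allOf:
  # Converted legacy schemas already $ref the EventCapsule fragment (version assumed):
  - $ref: /fragment/analytics/legacy/eventcapsule/1.0.0#
  # Add this to keep client_ip (and, downstream, geocoded data) in the Hive table:
  - $ref: /fragment/http/client_ip/1.0.0#
```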
- Manually evolve the Hive table to use the new schema:
```
table="event.${schema_name}"
schema_uri="/analytics/legacy/${schema_name}/latest"
# First run in dry-run mode (the default) to see what EvolveHiveTable will do.
spark2-submit \
    --conf spark.driver.extraClassPath=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/srv/deployment/analytics/refinery/artifacts/hive-jdbc-1.1.0-cdh5.10.0.jar:/srv/deployment/analytics/refinery/artifacts/hive-service-1.1.0-cdh5.10.0.jar \
    --driver-java-options='-Dhttp.proxyHost=webproxy.eqiad.wmnet -Dhttp.proxyPort=8080 -Dhttps.proxyHost=webproxy.eqiad.wmnet -Dhttps.proxyPort=8080' \
    --class org.wikimedia.analytics.refinery.job.refine.tool.EvolveHiveTable \
    /srv/deployment/analytics/refinery/artifacts/refinery-job.jar \
    --table="${table}" --schema_uri="${schema_uri}"

# If that looks good, evolve the table for real:
spark2-submit \
    --conf spark.driver.extraClassPath=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/srv/deployment/analytics/refinery/artifacts/hive-jdbc-1.1.0-cdh5.10.0.jar:/srv/deployment/analytics/refinery/artifacts/hive-service-1.1.0-cdh5.10.0.jar \
    --driver-java-options='-Dhttp.proxyHost=webproxy.eqiad.wmnet -Dhttp.proxyPort=8080 -Dhttps.proxyHost=webproxy.eqiad.wmnet -Dhttps.proxyPort=8080' \
    --class org.wikimedia.analytics.refinery.job.refine.tool.EvolveHiveTable \
    /srv/deployment/analytics/refinery/artifacts/refinery-job.jar \
    --table="${table}" --schema_uri="${schema_uri}" --dry-run=false
```
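After evolving, you can sanity-check the table's new columns, e.g. (assuming the hive CLI is available on the host):
```
hive -e "DESCRIBE ${table};"
```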
- Add entries to `wgEventStreams`, `wgEventLoggingStreamNames` and `wgEventLoggingSchemas`, then rolling-deploy the change to make the EventLogging extension produce data to EventGate (a hedged sketch of the entries follows the sub-item below). Example: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/607333/2/wmf-config/InitialiseSettings.php
-- You can check that events for your stream are still flowing through in [[ https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1 | this Grafana dashboard ]], or by consuming the `eventlogging_<SchemaName>` topic from Kafka.
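A rough sketch of the shape of those InitialiseSettings.php entries, using SearchSatisfaction as the example (all values here are illustrative; copy the real pattern from the linked change):
```
// Illustrative only -- mirror the linked mediawiki-config change, not this sketch.
'wgEventStreams' => [
	'default' => [
		[
			'stream' => 'eventlogging_SearchSatisfaction',
			'schema_title' => 'analytics/legacy/searchsatisfaction',
		],
	],
],
'wgEventLoggingStreamNames' => [
	'default' => [ 'eventlogging_SearchSatisfaction' ],
],
'wgEventLoggingSchemas' => [
	'default' => [
		'SearchSatisfaction' => '/analytics/legacy/searchsatisfaction/1.0.0',
	],
],
```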
- Once the legacy stream's data is fully produced through EventGate, switch to a Refine job that uses the schema repo instead of meta.wikimedia.org. Example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/610055/1/modules/profile/manifests/analytics/refinery/job/refine.pp
- Modify https://meta.wikimedia.org/wiki/Schema_talk:SchemaName and note that the schema has been migrated to the Event Platform.
- Edit the producing extension's extension.json and set EventLoggingSchemas to the new schema URI (a hedged sketch follows the sub-item below).
-- Once this change is fully deployed, edit wgEventLoggingSchemas in InitialiseSettings.php and remove the schema's entry.
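For reference, a sketch of what that extension.json fragment might look like; JSON allows no comments, so all hedging lives here: the attribute nesting and the version string are assumptions, and the real layout should be taken from an already-migrated extension.
```
"attributes": {
	"EventLogging": {
		"Schemas": {
			"SearchSatisfaction": "/analytics/legacy/searchsatisfaction/1.0.0"
		}
	}
}
```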
- Add the SchemaName to the eventlogging-processor disabled schemas list in puppet, in modules/eventlogging/files/plugins.py. This prevents eventlogging-processor from producing now-invalid legacy events sent by clients that are still running old code.