This task will track the migration of legacy EventLogging schemas & streams to Event Platform schemas.
Tracking and planning of what schemas to migrate is being done in the [[ https://docs.google.com/spreadsheets/d/1WXbGPyuu2S6TYvrb-DvWWmrEx_K7TJ5rYPkjhvgWjoI/edit#gid=1715982822| EventLogging Schema Migration Audit spreadsheet ]].
Explanation of what this means for legacy EventLogging schema owners:
https://wikitech.wikimedia.org/wiki/Event_Platform/EventLogging_legacy
---
====== Migration plan for a schema:
**1. Pick a schema to migrate**
Schemas to migrate are listed in [[ https://docs.google.com/spreadsheets/d/1WXbGPyuu2S6TYvrb-DvWWmrEx_K7TJ5rYPkjhvgWjoI/edit#gid=1715982822| EventLogging Schema Migration Audit spreadsheet ]].
**2. Create a new task to track this schema's migration**
```lang=bash
# This should work on macOS to open a new Phab task form in a browser with some fields already filled out.
function new_el_migration_phab_task() {
schema_name="$1"
open "https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?title=$schema_name Event Platform Migration&description=See: https://wikitech.wikimedia.org/wiki/Event_Platform/EventLogging_legacy
Unless otherwise notified, client IP and consequently geocoded data will no longer be collected for this event data after this migration. Please let us know if this should continue to be captured. See also T262626.& &parent=259163&tags=Event-Platform&subscribers=Ottomata,Mforns"
}
new_el_migration_phab_task SearchSatisfaction
```
Link this task in the [[ https://docs.google.com/spreadsheets/d/1WXbGPyuu2S6TYvrb-DvWWmrEx_K7TJ5rYPkjhvgWjoI/edit#gid=1715982822| EventLogging Schema Migration Audit spreadsheet ]].
On the task, contact the owner of the schema and ask if they need client IP and/or geocoded data in the Hive table.
**3. Create /analytics/legacy/<schemaname>/current.yaml schema**
Using the [[ https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/scripts/eventlogging_legacy_schema_convert.js | eventlogging_legacy_schema_convert script ]] in the [[ https://gerrit.wikimedia.org/r/admin/repos/schemas/event/secondary | schemas/event/secondary ]] repository:
```lang=bash
old_schema_name=SearchSatisfaction
new_schema_name=$(echo $old_schema_name | tr '[:upper:]' '[:lower:]')
mkdir ./jsonschema/analytics/legacy/$new_schema_name
node ./scripts/eventlogging_legacy_schema_convert.js $old_schema_name > ./jsonschema/analytics/legacy/$new_schema_name/current.yaml
```
You'll need to edit at least the JSONSchema `examples` in current.yaml. The easiest approach is to get an event out of Kafka and use that as a starting point.
```lang=bash
# Get the last event out of Kafka
kafkacat -C -b kafka-jumbo1001.eqiad.wmnet -o -1 -c 1 -t eventlogging_SearchSatisfaction
```
If the schema owner indicated that they need client IP and/or geocoded data in Hive, you'll need to add a $ref to the fragment/http/client_ip schema. [[ https://schema.wikimedia.org/repositories//primary/jsonschema/fragment/w3c/reportingapi/report/current.yaml | Example here ]].
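A sketch of what the `$ref` addition in current.yaml might look like, assuming an `allOf` composition against the client_ip fragment (the fragment path and version here are illustrative; verify them against existing schemas in the repository and the linked example):

```lang=yaml
# Hypothetical sketch: merging the client_ip fragment into a legacy schema.
# Verify the actual fragment path and version in jsonschema/fragment/http/.
allOf:
  - $ref: /fragment/http/client_ip/1.0.0
  - properties:
      event:
        type: object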
**4. Manually evolve the Hive table to use new schema**
Once the above schema is merged:
```lang=bash
old_schema_name=SearchSatisfaction
new_schema_name=$(echo $old_schema_name | tr '[:upper:]' '[:lower:]')
table="event.${new_schema_name}"
schema_uri="/analytics/legacy/${new_schema_name}/latest"
# First run in dry-run mode (the default) to see what EvolveHiveTable will do.
spark2-submit \
  --conf spark.driver.extraClassPath=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/srv/deployment/analytics/refinery/artifacts/hive-jdbc-1.1.0-cdh5.10.0.jar:/srv/deployment/analytics/refinery/artifacts/hive-service-1.1.0-cdh5.10.0.jar \
  --driver-java-options='-Dhttp.proxyHost=webproxy.eqiad.wmnet -Dhttp.proxyPort=8080 -Dhttps.proxyHost=webproxy.eqiad.wmnet -Dhttps.proxyPort=8080' \
  --class org.wikimedia.analytics.refinery.job.refine.tool.EvolveHiveTable \
  /srv/deployment/analytics/refinery/artifacts/refinery-job.jar \
  --table="${table}" --schema_uri="${schema_uri}"
# If that looks good, evolve the table:
spark2-submit \
  --conf spark.driver.extraClassPath=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/srv/deployment/analytics/refinery/artifacts/hive-jdbc-1.1.0-cdh5.10.0.jar:/srv/deployment/analytics/refinery/artifacts/hive-service-1.1.0-cdh5.10.0.jar \
  --driver-java-options='-Dhttp.proxyHost=webproxy.eqiad.wmnet -Dhttp.proxyPort=8080 -Dhttps.proxyHost=webproxy.eqiad.wmnet -Dhttps.proxyPort=8080' \
  --class org.wikimedia.analytics.refinery.job.refine.tool.EvolveHiveTable \
  /srv/deployment/analytics/refinery/artifacts/refinery-job.jar \
  --table="${table}" --schema_uri="${schema_uri}" --dry-run=false
```
**5. Add entries to `wgEventStreams`, `wgEventLoggingStreamNames` and `wgEventLoggingSchemas` in operations/mediawiki-config**
Do a rolling deploy of the changes so that the EventLogging extension produces data to EventGate. Example: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/607333/2/wmf-config/InitialiseSettings.php
-- You can check that events for your stream are still flowing through in [[ https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1 | this Grafana dashboard ]], or by consuming the `eventlogging_$old_schema_name` topic from Kafka.
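For reference, the shape of those three config entries is roughly as follows, using SearchSatisfaction as the example schema (a hedged sketch of wmf-config/InitialiseSettings.php; confirm keys and stream settings against the linked example change):

```lang=php
// Hypothetical sketch of the InitialiseSettings.php entries.
// Declares the stream and the schema it validates against.
'wgEventStreams' => [
	'default' => [
		[
			'stream' => 'eventlogging_SearchSatisfaction',
			'schema_title' => 'analytics/legacy/searchsatisfaction',
		],
	],
],
// Streams the EventLogging extension is allowed to produce to.
'wgEventLoggingStreamNames' => [
	'default' => [ 'eventlogging_SearchSatisfaction' ],
],
// Maps the legacy schema name to its new Event Platform schema URI.
'wgEventLoggingSchemas' => [
	'default' => [
		'SearchSatisfaction' => '/analytics/legacy/searchsatisfaction/latest',
	],
],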
**6. Once the legacy stream's data is fully produced through EventGate, switch to the Refine job that uses the schema repo instead of meta.wikimedia.org**
Example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/610055/1/modules/profile/manifests/analytics/refinery/job/refine.pp
**7. Edit https://meta.wikimedia.org/wiki/Schema_talk:$old_schema_name and note that the schema has been migrated to Event Platform**
**8. Edit the producer extension.json and set EventLoggingSchemas to the new schema URI**
Example: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/639578
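The extension.json change typically has roughly this shape (a hedged sketch; confirm the exact attribute structure and schema version against the linked ContentTranslation example):

```lang=json
{
	"attributes": {
		"EventLogging": {
			"Schemas": {
				"SearchSatisfaction": "/analytics/legacy/searchsatisfaction/1.0.0"
			}
		}
	}
}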
**9. Once the producer extension.json is fully deployed, edit `wgEventLoggingSchemas` in operations/mediawiki-config InitialiseSettings.php and remove the schema's entry.**
Example: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/639579
**10. Add the schema name to the eventlogging-processor disabled schemas list in puppet, in modules/eventlogging/files/plugins.py**
This prevents eventlogging-processor from producing what are now invalid legacy events from clients still running old code.
Example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/639548
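The puppet change amounts to appending the schema name to a list that eventlogging-processor consults before producing an event (a hedged sketch; the actual variable and function names in modules/eventlogging/files/plugins.py may differ — see the linked change):

```lang=python
# Hypothetical sketch; the real names in plugins.py may differ.
# eventlogging-processor drops events for any schema in this list.
disabled_schemas = [
    'SearchSatisfaction',  # migrated to Event Platform
]


def is_disabled(schema_name):
    """Return True if eventlogging-processor should drop events for this schema."""
    return schema_name in disabled_schemas
```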
**11. Mark the schema as migrated in the [[ https://docs.google.com/spreadsheets/d/1WXbGPyuu2S6TYvrb-DvWWmrEx_K7TJ5rYPkjhvgWjoI/edit#gid=1715982822| EventLogging Schema Migration Audit spreadsheet ]]**