Remove user is_registered field from mediawiki/page/change schema
Closed, Resolved, Public

Description

There is a lot of confusion as to what a 'registered' user is, now that temp users are a thing.

MediaWiki says that a registered user is any user with a user_id. However, colloquially, a registered user is a non-anonymous user. Temp users are a bit of a hybrid.

We need to agree on and document what the actual definitions of these user types are, and how they will be codified in data fields. By removing the is_registered boolean from the MW user entity event schema, we can avoid codifying the wrong decision.

is_registered is true whenever user_id > 0, so consumers of the field can derive the same value without it.
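
For consumers that relied on the removed field, the equivalence above can be sketched in Python. This is a minimal illustration, not EventBus code: the helper name `is_registered`, the `user_text` field, and the sample event values are assumptions made for the example; only `user_id`, `performer`, and `revision.editor` come from this task. A missing or zero `user_id` is treated here as an anonymous user.

```python
from typing import Any, Mapping, Optional


def is_registered(user: Optional[Mapping[str, Any]]) -> bool:
    """Derive the removed flag from a user entity: true when user_id > 0."""
    if not user:
        return False
    user_id = user.get("user_id")
    return isinstance(user_id, int) and user_id > 0


# Hypothetical, heavily trimmed page change event for illustration only.
event = {
    "performer": {"user_id": 42, "user_text": "ExampleUser"},
    "revision": {"editor": {"user_text": "192.0.2.1"}},  # no user_id: anonymous
}

performer_registered = is_registered(event.get("performer"))
editor_registered = is_registered(event["revision"].get("editor"))
```

Note that this follows MediaWiki's definition (any user with a user_id), not the colloquial "non-anonymous" one, so it does not settle how temp users should be classified.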

Since we haven't really officially announced the mediawiki.page_change.v1 stream yet, we can make this a backwards-incompatible change.

  • Stop producing is_registered in EventBus
  • Remove is_registered field and rematerialize 1.0.0 schemas, including mediawiki/page/prediction_classification_change (T328899)
  • Rebuild eventgate-wikimedia image with latest schema repo and redeploy eventgate-main
  • Drop the existing event.mediawiki_page_change_v1 and event.mediawiki_page_outlink_topic_prediction_change Hive tables so that the field is removed. (We don't yet care about the data in these tables.)

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 923253 had a related patch set uploaded (by TChin; author: TChin):

[schemas/event/primary@master] Remove is_registered field from user entity fragment

https://gerrit.wikimedia.org/r/923253

Change 923254 had a related patch set uploaded (by TChin; author: TChin):

[mediawiki/extensions/EventBus@master] Remove is_registered from UserEntitySerializer

https://gerrit.wikimedia.org/r/923254

Oh, foof, we will have to rematerialize prediction_classification_change too!

@achou I'm assuming you all don't use or reference the performer.is_registered or revision.editor.is_registered fields in your stuff. We'd like to remove these fields from the page/change schema, and do it in a backwards-incompatible way.

I don't see any events flowing for the mediawiki.page_outlink_topic_prediction_change stream yet, so I'm assuming it is safe to do this. We'll have to drop (or alter) the existing Hive table, but I think this will be okay?

Change 923254 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] Remove is_registered from UserEntitySerializer

https://gerrit.wikimedia.org/r/923254

Ottomata updated the task description.

Change 923253 merged by jenkins-bot:

[schemas/event/primary@master] Remove is_registered field from user entity fragment

https://gerrit.wikimedia.org/r/923253

Change 929716 had a related patch set uploaded (by Ottomata; author: Ottomata):

[eventgate-wikimedia@master] Bump primary schema repo sha to pick up change to mediawiki/page/change

https://gerrit.wikimedia.org/r/929716

Change 929716 merged by Ottomata:

[eventgate-wikimedia@master] Bump primary schema repo sha to pick up change to mediawiki/page/change

https://gerrit.wikimedia.org/r/929716

Change 929725 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] eventgate-main - Bump image version to pick up change to mediawiki/page/change

https://gerrit.wikimedia.org/r/929725

Change 929725 merged by jenkins-bot:

[operations/deployment-charts@master] eventgate-main - Bump image version to pick up change to mediawiki/page/change

https://gerrit.wikimedia.org/r/929725

Mentioned in SAL (#wikimedia-analytics) [2023-06-13T15:05:41Z] <ottomata> dropping hive table event.mediawiki_page_change_v1 to pick up backwards incompatible schema change - T337395

[@an-launcher1002:/home/otto] $ sudo -u analytics kerberos-run-command analytics hive
drop table event.mediawiki_page_change_v1;
$ sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /wmf/data/event/mediawiki_page_change_v1
23/06/13 15:07:01 INFO fs.TrashPolicyDefault: Moved: 'hdfs://analytics-hadoop/wmf/data/event/mediawiki_page_change_v1' to trash at: hdfs://analytics-hadoop/user/hdfs/.Trash/Current/wmf/data/event/mediawiki_page_change_v1

Next refinement will use latest schema without is_registered field. Will verify that table is created properly and refinement works.

Mentioned in SAL (#wikimedia-analytics) [2023-06-13T15:19:25Z] <ottomata> drop event.mediawiki_page_outlink_topic_prediction_change table and data - T337395

drop table event.mediawiki_page_outlink_topic_prediction_change;
sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /wmf/data/event/mediawiki_page_outlink_topic_prediction_change

Recreated Hive tables look good: no is_registered field!
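
The same kind of check can be run against sample events or a schema document outside Hive. The following is a rough sketch under stated assumptions, not the verification actually performed in this task (which was done by inspecting the recreated tables): the helper `find_field` and both sample events are hypothetical.

```python
from typing import Any, Iterator


def find_field(node: Any, name: str, path: str = "") -> Iterator[str]:
    """Yield the dotted path of every key called `name` in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == name:
                yield child
            yield from find_field(value, name, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_field(item, name, f"{path}[{index}]")


# Hypothetical trimmed events: one in the old shape, one in the new shape.
old_event = {
    "performer": {"user_id": 42, "is_registered": True},
    "revision": {"editor": {"user_id": 42, "is_registered": True}},
}
new_event = {
    "performer": {"user_id": 42},
    "revision": {"editor": {"user_id": 42}},
}

old_hits = list(find_field(old_event, "is_registered"))
new_hits = list(find_field(new_event, "is_registered"))
```

An empty `new_hits` list means the field no longer appears anywhere in the sampled structure.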

I think we are done.