
Prepare EventBus for temp accounts
Closed, Resolved (Public)

Description

Some things to look for (from parent task) in the codebase:

Use case | Search terms | Codesearch
Feature checks whether a user is registered | ->isAnon, ->isRegistered, mw.user.isAnon, mw.user.getId, mw.user.getRegistration | search results
Feature checks a user name (possibly then checking if it is registered) | mw.user.getName, $user->getName, $user->getRealName | search results
IP address utility functions are imported/used (IP addresses may be found via an anonymous username) | use Wikimedia\IPUtils, mw.util.isIPAddress | search results
Feature renders a user name | ::userLink, ::revUserLink, ::revUserTools | search results

Event Timeline

Feature checks a user name (possibly then checking if it is registered)

We do check for user names for serializing events posted to EventGate in these two places:

This field, like the serialized user object as a whole, is only read and forwarded to EventGate; we don't have any business logic that depends on this info. From the Epic description:

IP Masking is likely to affect anything that:

Does something different for registered vs anonymous accounts
Identifies an anonymous user by checking the user name or ID
Does anything with a user's IP address, having got it from the username

EventBus does not do any of that, and should not be affected by this change. We do check whether a user is registered, in order to set some fields required by the user schema:

		$userAttrs = [
			'user_text' => $user->getName(),
			'groups' => $this->userGroupManager->getUserEffectiveGroups( $user ),
			'is_bot' => $user->isRegistered() && $user->isBot(),
			'is_system' => $user->isSystemUser(),
			'is_temp' => $user->isTemp()
		];
		if ( $user->getId() ) {
			$userAttrs['user_id'] = $user->getId();
		}
		if ( $user->getRegistration() ) {
			$userAttrs['registration_dt'] =
				EventSerializer::timestampToDt( $user->getRegistration() );
		}
		if ( $user->isRegistered() ) {
			$userAttrs['edit_count'] = $user->getEditCount();
		}

But my understanding is that this behavior should not be impacted by IP Masking either.
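To make that concrete, here is a minimal, self-contained sketch of the same branching. StubUser and sketchUserAttrs are hypothetical names used only for illustration, not EventBus code. The stub assumes a temp account has a non-zero user ID and therefore counts as registered. Groups, bot flags, and the timestamp conversion are left out.

```php
<?php
// Hypothetical stand-in for the MediaWiki user object; only the methods
// used by the sketch below are stubbed.
final class StubUser {
	public function __construct(
		private string $name,
		private int $id = 0,
		private bool $temp = false,
		private ?string $registration = null,
		private int $editCount = 0
	) {
	}

	public function getName(): string { return $this->name; }
	public function getId(): int { return $this->id; }
	// Assumption: any account with a non-zero ID is registered, temp or not.
	public function isRegistered(): bool { return $this->id !== 0; }
	public function isTemp(): bool { return $this->temp; }
	public function getRegistration(): ?string { return $this->registration; }
	public function getEditCount(): int { return $this->editCount; }
}

// Same conditional structure as the snippet above, reduced to the
// fields that depend on registration state.
function sketchUserAttrs( StubUser $user ): array {
	$attrs = [
		'user_text' => $user->getName(),
		'is_temp' => $user->isTemp(),
	];
	if ( $user->getId() ) {
		$attrs['user_id'] = $user->getId();
	}
	if ( $user->getRegistration() ) {
		// The real code converts the timestamp; the raw value is kept here.
		$attrs['registration_dt'] = $user->getRegistration();
	}
	if ( $user->isRegistered() ) {
		$attrs['edit_count'] = $user->getEditCount();
	}
	return $attrs;
}

$anon = sketchUserAttrs( new StubUser( '192.0.2.1' ) );
$temp = sketchUserAttrs( new StubUser( 'ExampleTempName', 42, true, '20240101000000', 3 ) );
```

Nothing in the sketch parses the user name: the anonymous user simply ends up without user_id, registration_dt, and edit_count, while the temp user gets all of them plus is_temp set to true.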

EventBus has logic to redact performer info for suppressed revisions, but that is built atop RevisionRecord visibility bits, and not on user/performer details.
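As a rough illustration of why that redaction is independent of the account type, the sketch below decides purely on a visibility bitfield. The constant SKETCH_DELETED_USER, its value, the function name, and the event shape are all assumptions made for this example; the actual EventBus redaction code may look quite different.

```php
<?php
// Hypothetical bit flag meaning "the performer of this revision is hidden".
// The value is illustrative only.
const SKETCH_DELETED_USER = 4;

// Drop the performer from an event array when the visibility bitfield
// says the user is hidden. No user name, ID, or IP is inspected.
function sketchRedactPerformer( array $event, int $visibility ): array {
	if ( ( $visibility & SKETCH_DELETED_USER ) !== 0 ) {
		unset( $event['performer'] );
	}
	return $event;
}

$event = [ 'rev_id' => 1, 'performer' => [ 'user_text' => 'ExampleTempName' ] ];
$visible = sketchRedactPerformer( $event, 0 );
$suppressed = sketchRedactPerformer( $event, SKETCH_DELETED_USER );
```

Whether the performer is an IP user, a temp account, or a named account makes no difference to the outcome here.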

EventFactory.php (deprecated since 0.5.0).

The only implication I can think of is that the to-be-deprecated event streams, e.g. mediawiki.page-move, don't have an is_temp field. But, we'd like people to switch to mediawiki.page_change.v1 anyway, so if they need this we can ask them to switch.

This might be an issue for streams that aren't captured in mediawiki.page_change.v1, like mediawiki.page-links-change, or visibility changes to past revisions in mediawiki.revision-visibility-change. We'd like to make updated versions of these streams (e.g. T331399: Create new mediawiki links change streams based on fragment/mediawiki/state/change/page), but this has not been prioritized.

Hey @Ottomata

EventFactory.php (deprecated since 0.5.0).

The only implication I can think of is that the to-be-deprecated event streams, e.g. mediawiki.page-move, don't have an is_temp field. But, we'd like people to switch to mediawiki.page_change.v1 anyway, so if they need this we can ask them to switch.

Good point!

If downstream consumers have username/IP-related logic that would break, it could still be an issue, even if we added new is_temp metadata to legacy streams. They would still need to update their business logic to support temporary accounts (although the scale of the changes might differ). I can see how this could impact consumers of analytics data (Hive, etc.), but in those cases I'd assume information about the state of a user could be found with a lookup on MediaWiki tables.
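A hedged sketch of the kind of consumer logic I have in mind follows. Both function names and the performer array shape are hypothetical; the field names are borrowed from the user attributes discussed earlier. The first function infers anonymity from an IP-looking user name, the second relies on explicit fields.

```php
<?php
// Legacy-style heuristic (hypothetical consumer code): treat a performer
// as anonymous when the user name parses as an IP address.
function legacyIsAnonymous( array $performer ): bool {
	return filter_var( $performer['user_text'], FILTER_VALIDATE_IP ) !== false;
}

// Field-based classification: relies on is_temp and user_id rather than
// on the shape of the user name.
function classifyPerformer( array $performer ): string {
	if ( $performer['is_temp'] ?? false ) {
		return 'temp';
	}
	if ( isset( $performer['user_id'] ) ) {
		return 'named';
	}
	return 'anonymous';
}

$ipUser = [ 'user_text' => '192.0.2.1' ];
$tempUser = [ 'user_text' => 'ExampleTempName', 'user_id' => 42, 'is_temp' => true ];
// Same temp user as seen on a stream that carries no is_temp field.
$tempUserLegacy = [ 'user_text' => 'ExampleTempName', 'user_id' => 42 ];
```

With the name-based heuristic the temp user is no longer counted as anonymous. With the field-based version the result depends on is_temp being present: on a stream without that field the same user falls through to 'named', so consumers have to revisit their logic either way.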

This might be an issue for streams that aren't captured in mediawiki.page_change.v1, like mediawiki.page-links-change, or visibility changes to past revisions in mediawiki.revision-visibility-change.

FWIW I could not find any reference to those streams in code search other than in configuration files (puppet, airflow-dags).

We'd like to make updated versions of these streams (e.g. T331399: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page), but this has not been prioritized.

I’d prefer consumers transition away from legacy streams rather than adding features to a deprecated code path/streams. In my opinion, it’s safe to assume that, as of EventBus >= 0.5.0, those streams could already be in a broken state due to changes unrelated to temporary accounts.

I’d prefer consumers transition away from legacy streams rather than adding features to a deprecated code path/streams.

+1

FWIW I could not find any reference to those streams in code search

Aye, IIRC for T328899: Add a new outlink topic stream for EventGate main, they were going to use page-links-change, but we advised them to use page_create (with some downsides) until we did T331399.

But, I believe many of these streams are used externally, e.g. for Enterprise or Internet Archive:
https://grafana-rw.wikimedia.org/d/znIuUcsWz/eventstreams?orgId=1&refresh=1m

I've synced on this with @Milimetric and the team responsible for the analytics bits of the temp account feature.

We will not add fields to legacy streams via EventFactory.php. Data pipelines dependent on deprecated streams will be marked as deprecated, and docs will be provided about known issues / eventual event paths.

+1

@gmodena is there an estimated completion date for this task? Thanks!

Hey @kostajh

@gmodena is there an estimated completion date for this task? Thanks!

AFAIK there's no work needed on EventBus, and we can close this task.

@Milimetric's team will pick up downstream work on DPE EventBus stream consumers.

Sounds good, thank you.