
Make ORES topics and their translations easily available to MediaWiki extensions
Closed, Resolved. Public. 3 Estimated Story Points.

Description

As part of T362259, the Connection-Team would like to use ORES topics for a new feature. In particular, we would need a list of said topics, as well as translations. The same list and translations are currently already in use in two different MediaWiki extensions: GrowthExperiments and ContentTranslation (CX).

With this task, I am proposing that the list of topics and its translations be made easily available to MediaWiki extensions. In T368422 there had been an attempt to put these into the ORES extension, which is however not the appropriate place. Here, I'm proposing the WikimediaMessages extension. While not perfect, it has the advantages of being available on every wiki, and being Wikimedia-specific (just like the topic taxonomy).

There would be a new class basically identical to CX's ArticleTopicsDefinition class, providing both the raw topic names and their translated labels. The initial translations could presumably be imported from GrowthExperiments, since they've been around for a long time. The exact API is up for discussion. Now is a good time to identify what each team's needs are, and build it so that we all can use it.

I'm looking for general feedback on the proposal (for example, if there are better places than WikimediaMessages), and for specific feedback about what the new interface should provide. My initial proposal, based on the Campaigns team's needs, would be (method names are just examples):

/** Returns a plain list of topic IDs, for validation and the like */
public static function getTopicList(): array;

/**
 * The "main" entry point, could be identical to ArticleTopicsDefinition::getTopics(). The main difference is that we either
 * make this use message keys (with l10n up to the caller), or add a MessageLocalizer/ITextFormatter parameter.
 */
public static function getGroupedTopicMessages(): array;

/**
 * Returns localised labels for the given topic IDs. Like above, this could either return message keys, or take a
 * MessageLocalizer/ITextFormatter parameter.
 */
public static function getLocalizedLabels( array $topicIDs ): array;
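
To make the proposal more concrete, here is a minimal sketch of the "return message keys, leave l10n to the caller" variant. The class name `ExampleTopicRegistry`, the method name `getTopicMessages()` and the two-topic subset are illustrative assumptions, not a final API; the key names follow the `wikimedia-articletopics-topic-*` pattern used elsewhere in this task.

```php
<?php

/**
 * Illustrative sketch only: class name, method names and the topic subset are
 * assumptions made for this example, not an agreed interface.
 */
class ExampleTopicRegistry {
	/** Topic ID => message key. Keys are spelled out in full so that code search finds them. */
	private const TOPIC_MESSAGES = [
		'biography' => 'wikimedia-articletopics-topic-biography',
		'women' => 'wikimedia-articletopics-topic-women',
	];

	/** Returns a plain list of topic IDs, for validation and the like */
	public static function getTopicList(): array {
		return array_keys( self::TOPIC_MESSAGES );
	}

	/**
	 * Returns message keys for the given topic IDs, keyed by topic ID.
	 * Turning the keys into localised text is left to the caller.
	 */
	public static function getTopicMessages( array $topicIDs ): array {
		$unknown = array_diff( $topicIDs, self::getTopicList() );
		if ( $unknown ) {
			throw new InvalidArgumentException( 'Unknown topic IDs: ' . implode( ', ', $unknown ) );
		}
		return array_intersect_key( self::TOPIC_MESSAGES, array_flip( $topicIDs ) );
	}
}
```

A caller would validate user input against `getTopicList()` and pass each returned key to whatever localisation mechanism it already uses.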

In addition to us Connection-Team, I am tagging Growth-Team (ref), Language and Product Localization, and Research as technical stakeholders. Please let me know if I need to use different tags or processes to reach y'all. Thanks in advance!


Migration plan: T380825#10536518


Event Timeline


WikimediaMessages sounds good to me.

For code search, it would be better to avoid constructing the message keys dynamically like currently done in CX.

We probably want to preserve the existing translations (from CX?) and try to move them over to WikimediaMessages. We can handle that part once we are that far.

For code search, it would be better to avoid constructing the message keys dynamically like currently done in CX.

Agreed - that, or list the possible keys in a comment as usual.
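
For illustration, the two options could look roughly like this. The function names are hypothetical, only two topics are shown, and the exact wording of the comment in the second option is an assumption about the usual convention rather than a quote from any coding standard.

```php
<?php

// Option 1: spell out every key, so that searching the code base for a key finds its use.
// Only two topics are shown here; the real list is longer.
const TOPIC_MESSAGE_KEYS = [
	'biography' => 'cx-articletopics-topic-biography',
	'women' => 'cx-articletopics-topic-women',
];

function getTopicMessageKeyStatic( string $topic ): string {
	return TOPIC_MESSAGE_KEYS[$topic];
}

// Option 2: keep building the key dynamically, but list the possible results in a comment.
function getTopicMessageKeyDynamic( string $topic ): string {
	// Messages that can be used here:
	// * cx-articletopics-topic-biography
	// * cx-articletopics-topic-women
	return "cx-articletopics-topic-$topic";
}
```

Both functions return the same key for a known topic; the difference is only in whether a search for the full key leads back to this code.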

We probably want to preserve the existing translations (from CX?) and try to move them over to WikimediaMessages. We can handle that part once we are that far.

I was proposing to use Growth translations because they've been around for longer, so I thought they're probably more complete. But as you say, that can come later.


I guess one thing I forgot to mention in the proposal is: how would changes to the list be handled? In T362259 I briefly mentioned the idea of having versioning; I'm not sure if it's a good idea though. We could also just treat it as an ordinary shared code change, and have the person making the change make sure that no code will break (which might include reaching out to the maintainers of said code, i.e. the teams subscribed to this task).

Thanks @Daimona for putting this together! Just adding RecentChanges as another potential stakeholder where these types of labels might appear (and therefore I think @Samwalton9-WMF ?). Context: T245906

One more thing the Campaigns team would like to have is to keep old topics (from a previous version of the taxonomy) around. I'm thinking they could have an extra 'disabled' => true property or something. However, there's no need to implement this now. Instead, it can be done when we'll actually have a change to the taxonomy.
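
A rough sketch of how such a flag might be consumed, assuming a hypothetical data shape in which each topic is an array with a `msgKey` and an optional `disabled` entry. The `old-topic` entry is invented purely to stand in for a topic dropped from a later taxonomy version.

```php
<?php

// Hypothetical data shape; 'old-topic' is an invented placeholder for a retired topic.
$topics = [
	'biography' => [ 'msgKey' => 'wikimedia-articletopics-topic-biography' ],
	'old-topic' => [ 'msgKey' => 'wikimedia-articletopics-topic-old-topic', 'disabled' => true ],
];

/** Returns the IDs of topics that can still be offered for selection */
function getEnabledTopicIDs( array $topics ): array {
	$enabled = array_filter(
		$topics,
		static fn ( array $topic ): bool => !( $topic['disabled'] ?? false )
	);
	return array_keys( $enabled );
}

/** Retired topics stay known, so data stored under an older taxonomy version still validates */
function isKnownTopic( array $topics, string $id ): bool {
	return isset( $topics[$id] );
}
```

With this shape, pickers would use `getEnabledTopicIDs()`, while validation of already-stored values would use `isKnownTopic()`.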

I will make a proof of concept for the current proposal later today.

Change #1100553 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/extensions/WikimediaMessages@master] [POC] Introduce ArticleTopicsRegistry

https://gerrit.wikimedia.org/r/1100553

Just checking in: I'd like to confirm whether people have reviewed the proposal and the proof of concept to make sure that it fits their needs. If so, I would like to polish it up next week and put it up for review, so please let me know if you have any objections. Thanks!

Update: the patch is now in review. One thing I should mention is translations. I found the following differences between GrowthExperiments and ContentTranslation:

| Topic | GE | CX |
| ----- | -- | -- |
| Geography | Regions | Geography |
| Biography | Biography (all) | Biography |
| Women biographies | Biography (women) | Women |

I asked @ifried, and we would like to keep the GrowthExperiments versions of all three, because:

  • Geography is the study of Earth, but that category really only contains regions
  • "Women" alone is too broad to be used for just categories (could refer to other things such as women’s health, women’s history, women’s rights).
    • Consequently, we then need to have "Biography (all)".

There are also two messages that differ only in capitalization:

| GE | CX |
| -- | -- |
| History and Society | History and society |
| Science, Technology and Math | Science, technology and math |

For these, I believe the CX version is preferable due to standard capitalisation.

Please let me know if you have any feedback on the above!

Also, @Nikerabbit: can you please take care of moving the messages on TWN when the above patch is merged? I can prepare a complete map from old to new if you need it in a specific format.

Is the mapping straightforward enough to do with a pattern on Special:ReplaceText? Else I would use moveBatch.php but that needs full list of page titles.

Also, I cannot move stuff while CX is using them, so we should prepare a patch in CX to call WikimediaMessages if available and patch for translatewiki.net to setup this new group. This also affects testing locally.

Is the mapping straightforward enough to do with a pattern on Special:ReplaceText? Else I would use moveBatch.php but that needs full list of page titles.

This is actually a good question. I think it depends on how much we want to try and preserve existing translations, since we're trying to merge two separate sources. I guess a more refined solution would look like this.

  1. For each language and topic, compare growthexperiments-homepage-suggestededits-topic-name-$TOPIC and cx-articletopics-topic-$TOPIC. If they're identical, put the content in wikimedia-articletopics-topic-$TOPIC and delete both sources.
  2. Do the same for growthexperiments-homepage-suggestededits-topic-group-name-$GROUP and cx-articletopics-group-$GROUP --> wikimedia-articletopics-group-$GROUP.
  3. (Now we no longer have identical messages that exist in both sources)
  4. Move the following GrowthExperiments messages, if they exist, regardless of the CX version; delete both sources afterwards.
    1. growthexperiments-homepage-suggestededits-topic-group-name-geography to wikimedia-articletopics-group-geography
    2. growthexperiments-homepage-suggestededits-topic-name-biography to wikimedia-articletopics-topic-biography
    3. growthexperiments-homepage-suggestededits-topic-name-women to wikimedia-articletopics-topic-women
  5. For each topic, if only one exists of growthexperiments-homepage-suggestededits-topic-name-$TOPIC and cx-articletopics-topic-$TOPIC, move its content to wikimedia-articletopics-topic-$TOPIC and delete the source.
  6. Do the same for growthexperiments-homepage-suggestededits-topic-group-name-$GROUP XOR cx-articletopics-group-$GROUP --> wikimedia-articletopics-group-$GROUP.
  7. (Now we only have messages that exist in both sources but are different)
  8. For the following messages, use the CX version and then delete both sources:
    1. cx-articletopics-group-history-and-society -> wikimedia-articletopics-group-history-and-society
    2. cx-articletopics-group-science-technology-and-math -> wikimedia-articletopics-group-science-technology-and-math
  9. For the remaining messages: I don't know. I'd be inclined to keep the GE version because it's probably been around for longer than CX, but that doesn't necessarily mean it's best. Manual review would be ideal but I have no idea how many messages there will be.

I don't think the above can be achieved through any means other than a custom script. Or can it?

Also, I cannot move stuff while CX is using them, so we should prepare a patch in CX to call WikimediaMessages if available

I can make a quick patch to have CX use the new messages as soon as they exist; the code itself can be updated later. Same for GrowthExperiments. Both should end up being two-liner patches (plus removal of messages).

and patch for translatewiki.net to setup this new group. This also affects testing locally.

In r1100553 I'm adding messages to the existing "Wikimedia Messages" group. I thought about creating a new group for topics, but the migration already seemed complex enough and I left that aside. Could we maybe do it after the initial migration?

I don't think the above can be achieved through any means other than a custom script. Or can it?

Sounds like manual work to me. And a tricky one, in the sense that if we move or delete anything, the next daily export will remove the translations. This requires close coordination to avoid breaking anything.

I will check with my team that they are aware of the implications and okay with that.

Trying to summarize the outstanding questions, so we don't lose the thread:

  • @KStoller-WMF, @PWaigi-WMF: I made a proposal in T380825#10399214 about changing some of the existing labels. This will affect both GrowthExperiments and ContentTranslation, so I'd like to make sure that it's OK from a product perspective.
  • @Nikerabbit: Any feedback on the plan in T380825#10400786? Let me know if you need anything else or in a more specific format. We could also pair on the migration, if you'd like to.
  • @Isaac: In gerrit (thread), we've been trying to find a more accurate name for what we're currently calling "ORES topics". This would be specifically in relation to the taxonomy, and it would need to be a short name. Any suggestions on that?

  7. (Now we only have messages that exist in both sources but are different)

I assume you are referring here (and in all the other items in that list) to the English message, in the respective en.json files, right? Or do you plan to apply these steps per message?

For example, the English word in both cases might be "Technology", and GrowthExperiments' de.json might have "Technologie", however ContentTranslation's de.json might have "Technik". -- Would that be treated as the messages being the same or as them being different in your series of steps?

  7. (Now we only have messages that exist in both sources but are different)

I assume you are referring here (and in all the other items in that list) to the English message, in the respective en.json files, right? Or do you plan to apply these steps per message?

No, I'm planning to do this for every language. I realized only now that this is only mentioned in step 1 of T380825#10400786, but all those steps would be run for each language. Pseudo-code below if that helps.

For example, the English word in both cases might be "Technology", and GrowthExperiments' de.json might have "Technologie", however ContentTranslation's de.json might have "Technik". -- Would that be treated as the messages being the same or as them being different in your series of steps?

As being different. For the English messages, I've already verified that the GE and CX versions are identical, with the 5 exceptions mentioned in T380825#10399214. But for translations, we might have the problem you are mentioning. My proposed solution is to use the GE version, but just because it's been around for longer.


Pseudocode
<?php

$languages = [ /* list of all languages */ ];
$topics = [ /* list of all topics */ ];
$groups = [ /* list of all groups */ ];

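/**
 * Illustrative in-memory stand-in for the message store, keyed by "message-key/lang".
 * The real operations would run against translatewiki.net.
 */
$store = [];

/** Returns the content of the message ("key/lang"), or null if it does not exist */
function msg( string $key ): ?string {
	global $store;
	return $store[$key] ?? null;
}
/** Moves $from to $to, deleting $from in the process. No-op if $from does not exist */
function move( string $from, string $to ): void {
	global $store;
	if ( isset( $store[$from] ) ) {
		$store[$to] = $store[$from];
		unset( $store[$from] );
	}
}
/** Deletes the message, no-op if it does not exist */
function delete_msg( string $key ): void {
	global $store;
	unset( $store[$key] );
}
/** Checks whether the message ("key/lang") exists */
function msg_exists( string $key ): bool {
	global $store;
	return isset( $store[$key] );
}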

foreach ( $languages as $lang ) {
	// Note, below we always iterate the same topics & groups. In reality, we could remove processed topics from the
	// list after each step.

	// Step 1
	foreach ( $topics as $topic ) {
		if (
			msg( "growthexperiments-homepage-suggestededits-topic-name-$topic/$lang" ) ===
			msg( "cx-articletopics-topic-$topic/$lang" )
		) {
			move( "cx-articletopics-topic-$topic/$lang", "wikimedia-articletopics-topic-$topic/$lang" );
			delete_msg( "growthexperiments-homepage-suggestededits-topic-name-$topic/$lang" );
		}
	}
	// Step 2
	foreach ( $groups as $group ) {
		if (
			msg( "growthexperiments-homepage-suggestededits-topic-group-name-$group/$lang" ) ===
			msg( "cx-articletopics-group-$group/$lang" )
		) {
			move( "cx-articletopics-group-$group/$lang", "wikimedia-articletopics-group-$group/$lang" );
			delete_msg( "growthexperiments-homepage-suggestededits-topic-group-name-$group/$lang" );
		}
	}

	// Step 4
	if ( msg_exists( "growthexperiments-homepage-suggestededits-topic-group-name-geography/$lang" ) ) {
		move(
			"growthexperiments-homepage-suggestededits-topic-group-name-geography/$lang",
			"wikimedia-articletopics-group-geography/$lang"
		);
	}
	delete_msg( "cx-articletopics-group-geography/$lang" );
	$forceGETopics = [ 'biography', 'women' ];
	foreach ( $forceGETopics as $forceGETopic ) {
		if ( msg_exists( "growthexperiments-homepage-suggestededits-topic-name-$forceGETopic/$lang" ) ) {
			move(
				"growthexperiments-homepage-suggestededits-topic-name-$forceGETopic/$lang",
				"wikimedia-articletopics-topic-$forceGETopic/$lang"
			);
		}
		delete_msg( "cx-articletopics-topic-$forceGETopic/$lang" );
	}

	// Step 5
	foreach ( $topics as $topic ) {
		if (
			msg_exists( "growthexperiments-homepage-suggestededits-topic-name-$topic/$lang" ) &&
			!msg_exists( "cx-articletopics-topic-$topic/$lang" )
		) {
			move(
				"growthexperiments-homepage-suggestededits-topic-name-$topic/$lang",
				"wikimedia-articletopics-topic-$topic/$lang"
			);
		} elseif (
			msg_exists( "cx-articletopics-topic-$topic/$lang" ) &&
			!msg_exists( "growthexperiments-homepage-suggestededits-topic-name-$topic/$lang" )
		) {
			move(
				"cx-articletopics-topic-$topic/$lang",
				"wikimedia-articletopics-topic-$topic/$lang"
			);
		}
	}
	// Step 6
	foreach ( $groups as $group ) {
		if (
			msg_exists( "growthexperiments-homepage-suggestededits-topic-group-name-$group/$lang" ) &&
			!msg_exists( "cx-articletopics-group-$group/$lang" )
		) {
			move(
				"growthexperiments-homepage-suggestededits-topic-group-name-$group/$lang",
				"wikimedia-articletopics-group-$group/$lang"
			);
		} elseif (
			msg_exists( "cx-articletopics-group-$group/$lang" ) &&
			!msg_exists( "growthexperiments-homepage-suggestededits-topic-group-name-$group/$lang" )
		) {
			move(
				"cx-articletopics-group-$group/$lang",
				"wikimedia-articletopics-group-$group/$lang"
			);
		}
	}

	// Step 8
	$preferCXGroups = [ 'history-and-society', 'science-technology-and-math' ];
	foreach ( $preferCXGroups as $preferCXGroup ) {
		move(
			"cx-articletopics-group-$preferCXGroup/$lang",
			"wikimedia-articletopics-group-$preferCXGroup/$lang"
		);
		delete_msg( "growthexperiments-homepage-suggestededits-topic-group-name-$preferCXGroup/$lang" );
	}

	// Step 9 (choose GE)
	foreach ( $topics as $topic ) {
		move(
			"growthexperiments-homepage-suggestededits-topic-name-$topic/$lang",
			"wikimedia-articletopics-topic-$topic/$lang"
		);
		delete_msg( "cx-articletopics-topic-$topic/$lang" );
	}
	foreach ( $groups as $group ) {
		move(
			"growthexperiments-homepage-suggestededits-topic-group-name-$group/$lang",
			"wikimedia-articletopics-group-$group/$lang"
		);
		delete_msg( "cx-articletopics-group-$group/$lang" );
	}
}
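
The same decision logic can also be condensed into a single pure function that, for one message in one language, returns the translation that should end up under the new key. This is only a sketch for checking the rules against examples; the function name and the three policy strings are made up for illustration.

```php
<?php

/**
 * Decides which translation of one message (in one language) is kept.
 *
 * @param string|null $ge GrowthExperiments translation, null if missing
 * @param string|null $cx ContentTranslation translation, null if missing
 * @param string $policy 'force-ge' (step 4), 'prefer-cx' (step 8) or 'prefer-ge' (step 9)
 * @return string|null The translation to keep, or null if nothing is migrated
 */
function pickTranslation( ?string $ge, ?string $cx, string $policy = 'prefer-ge' ): ?string {
	if ( $policy === 'force-ge' ) {
		// Step 4: the CX version is never used, even if GE has no translation
		return $ge;
	}
	if ( $ge === null || $cx === null ) {
		// Steps 5-6: at most one source exists, take whichever is there
		return $ge ?? $cx;
	}
	if ( $ge === $cx ) {
		// Steps 1-2: identical in both sources
		return $ge;
	}
	// Steps 8-9: both exist and differ
	return $policy === 'prefer-cx' ? $cx : $ge;
}
```

For the German example discussed above, `pickTranslation( 'Technologie', 'Technik' )` keeps the GE version, while passing `'prefer-cx'` as the third argument would keep "Technik".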

From the user experience perspective, it is good to expose users to consistent terms. Having a single source of truth for topics seems like a good idea.

Regarding the question of which terms to use, we may consider how these elements can be presented to users in different contexts. Users can see the full catalog to pick from, they may see one item in isolation that applies to the current page, or a list of several items that is used as a filter. Thus, terms should make sense in isolation, alongside a few others, or in the full catalog. Having terms that are clear, easy to interpret and quick to process makes sense to me in this context.

On the specific decisions, I'll share some of the context to illustrate how we approached the naming, but I'm happy to hear other perspectives:

  • Regions/Geography. The current support in ORES is planned to evolve to include countries. I don't have a strong preference on the specific term but it would be great to find a concept that works well with different types of location.
  • Women biographies / Biography (women) / Women. Our thinking was that "Women" was the more direct way to represent the topic. If we ask to provide examples of women, I think the reasonable expectation is to get a list of people who are women. They may be represented with their names in the list, and as articles in the format of biographies on Wikipedia, but I think the essence of the topic can be better captured by just "Women" than "Woman names", "Woman articles" or "Woman biographies". If we really want to keep the biography notion, the more human-readable "Women biographies" may be preferred to "Biography (women)" as working better when present in isolation.
  • Biography (all)/Biography. The term "Biography (all)" does not seem to work well in isolation, since the "all" part is only understood when another item defines a specific subset (in this case, biographies of women). Along the lines of the previous case, if we wanted to connect more with the concept rather than the representation, we could consider naming it as "People"?
  • Capitalization. As mentioned, standard capitalization seems preferred.

I'm sharing the above thoughts for context. I think it is OK to unify on whichever approach we think is best, and iterate. This is something we can easily test through simple user research, asking users to classify some articles into categories and/or provide examples for a given set of categories, to learn which terms work best.

Hi @Pginer-WMF, thank you for providing this explanation behind the CX choices! It was very helpful to read through.

I'm sharing some responses below, and I'm interested in any thoughts or feedback you have :)

From the user experience perspective, it is good to expose users to consistent terms. Having a single source of truth for topics seems like a good idea.

Regarding the question of which terms to use, we may consider how these elements can be presented to users in different contexts. Users can see the full catalog to pick from, they may see one item in isolation that applies to the current page, or a list of several items that is used as a filter. Thus, terms should make sense in isolation, alongside a few others, or in the full catalog. Having terms that are clear, easy to interpret and quick to process makes sense to me in this context.

On the specific decisions, I'll share some of the context to illustrate how we approached the naming, but I'm happy to hear other perspectives:

  • Regions/Geography. The current support in ORES is planned to evolve to include countries. I don't have a strong preference on the specific term but it would be great to find a concept that works well with different types of location.

Thanks for pointing this out! I think that the concept of "region" makes more sense than "geography" when countries are introduced. Here's why: The relationship between a "region" and a "country" is clear, to me at least. For example, the country of Nigeria is in the region of West Africa... or in Wikimedia terms, Sub-Saharan Africa. I honestly don't even know what I would say for "geography" in this context.

I guess you could say "geography" is related to things like the lakes, mountains, etc of Nigeria. But that is just one part of the country -- not things like the culture or history. So, this also makes me think that "region" is a better fit. But perhaps I'm missing something, so I wonder what you think!

  • Women biographies / Biography (women) / Women. Our thinking was that "Women" was the more direct way to represent the topic. If we ask to provide examples of women, I think the reasonable expectation is to get a list of people who are women. They may be represented with their names in the list, and as articles in the format of biographies on Wikipedia, but I think the essence of the topic can be better captured by just "Women" than "Woman names", "Woman articles" or "Woman biographies". If we really want to keep the biography notion, the more human-readable "Women biographies" may be preferred to "Biography (women)" as working better when present in isolation.
  • Biography (all)/Biography. The term "Biography (all)" does not seem to work well in isolation, since the "all" part is only understood when another item defines a specific subset (in this case, biographies of women). Along the lines of the previous case, if we wanted to connect more with the concept rather than the representation, we could consider naming it as "People"?

Thanks for bringing this one up too! The women part is a bit complex, and I hear you on the fact that the term "women" commonly refers to people, rather than topics/concepts like women's health.

I have a few main reasons for going for "Biography (women)," which are the following:

  • It seems that the future proposed taxonomy (based on what is currently shared in Isaac's sandbox -- but @Isaac, please correct me if there are any changes!) is: Biography (women), Biography (men), and Biography (nonbinary). So this format conforms to the proposed new taxonomy.
  • There will be other categories related to women in the new taxonomy too, which still are being figured out, I think... but they could be things like Women's Health or Women's Rights. They are currently listed as Women's X in Isaac's sandbox. This could make "Women" as a standalone category a bit confusing.
  • We have heard from some gender organizers that they want more freedom and support to organize on women's topics that go beyond biography. Some of this has already translated into various projects and initiatives, such as Knowledge Gaps in Women's Health. This has also influenced some of the changes that are currently being determined for the updated taxonomy, I believe. So, since we have heard from some organizers about the importance of not equating work related to women and gender with biographical content, I reasoned that it would be good to explicitly call out which articles are about women's biographies vs. other women-related topics.
  • Capitalization. As mentioned, standard capitalization seems preferred.

I'm sharing the above thoughts for context. I think it is OK to unify on whichever approach we think is best, and iterate. This is something we can easily test through simple user research, asking users to classify some articles into categories and/or provide examples for a given set of categories, to learn which terms work best.

Yup, I also prefer the standard capitalization since it's what we generally see on the wikis, but I also think it sounds good to test through simple user research.

In gerrit (thread), we've been trying to find a more accurate name for what we're currently calling "ORES topics". This would be specifically in relation to the taxonomy, and it would need to be a short name. Any suggestions on that?

@Daimona sorry, I had a draft reply sitting in Gerrit but forgot to share it. Essentially, I think "articletopic" works for things related to the model and the LiftWing/Search APIs, while "topic filter" would be good terminology for how these topics are exposed to end-users.

They are currently listed as Women's X in Isaac's sandbox. This could make "Women" as a standalone category a bit confusing.

@ifried for what it's worth, I think this is one of the least likely topics to survive. It's proven very difficult for the model to accurately predict and has a lot of opportunity for harm given the sensitivity/nuance required around gender-related topics. To Ilana's point about desire to enable this functionality by organizers, I think we might end up depending a lot more on the WikiProject/campaign-based filters that you all have been working on too (where we'd try to support organizers in creating those lists but wouldn't want to automatically predict them as we do with these other high-level topics).

It seems that the future proposed taxonomy (based on what is currently shared in Isaac's sandbox -- but @Isaac, please correct me if there are any changes!) is: Biography (women), Biography (men), and Biography (nonbinary). So this format conforms to the proposed new taxonomy.

Yes, I'm fairly confident we'll go forward with using Wikidata for identifying biographies and then using sex-or-gender property to allow for further filtering so Ilana's suggestions there make sense to me.

@Isaac, apologies for the late reply and noted regarding the Women's X topic. Thanks for the clarification!

Change #1112100 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/extensions/WikimediaMessages@master] Add shared messages for ArticleTopicFiltersRegistry

https://gerrit.wikimedia.org/r/1112100

Daimona changed the task status from Stalled to Open. (Jan 16 2025, 10:04 PM)

Update: we (Campaigns) would like to release this work by the end of January. Unfortunately, moving translations is trickier than I expected. So, here's the current plan:

  • I have rewritten r1100553 to reference the existing GrowthExperiments and ContentTranslation messages directly (uses GE for everything except for the "history and society" and STEM topic groups, as described in T380825#10399214). This patch is now unblocked and ready for review.
  • I made a separate patch to consolidate these messages in the WikimediaMessages extension: r1112100. This one is blocked because of the translation issues.
  • We will be in touch with LPL team to figure out a reasonable migration plan for existing translations that isn't too difficult to implement but also doesn't waste too much work. Updates will be posted here.

Hi @Pginer-WMF, thank you for providing this explanation behind the CX choices! It was very helpful to read through.

Thanks for sharing your perspectives too, @ifried. I'm totally happy to try the proposed naming approach (especially when we are aligning with the people organizing events on those topics). We can later observe how these terms work in the different contexts, and consider further adjustments, but getting to a state of consistency would be a great first step.

Hi, @Pginer-WMF. I agree that it would be great if we all have a state of consistency as a first step. Thank you for this response! In that case, keep us updated if any changes are made from the LPL side, and we'll share any updates within this ticket. Much appreciated!

Change #1100553 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@master] Introduce ArticleTopicFiltersRegistry

https://gerrit.wikimedia.org/r/1100553

Marking as blocked again, while we figure out how to move forward with the translations.

2025-01-21 update

Campaigns engineers met with @Nikerabbit to figure out what to do with translations. #1 priority is that nothing breaks during the transition, and #2 is that we will try not to waste any existing translations if possible. The preferred transition method is to move message definitions, which would look something like this:

  1. Make a patch to delete messages from en.json and qqq.json of the source repo
  2. Make a patch to add those messages to en.json and qqq.json of WikimediaMessages, without changing message keys or content
  3. Have both patches merged together, wait 1 day for translation exports
  4. Rename messages in the target repo. This will be a simple rename.
    1. At the same time, update message key references in GE and CX
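
As a rough illustration of steps 1, 2 and 4, the following sketch operates on decoded i18n files (for example the array obtained from `json_decode( ..., true )` on `en.json`). The function names are hypothetical, and in practice the changes would be made as ordinary patches, with translatewiki.net and l10n-bot handling the other languages.

```php
<?php

/**
 * Steps 1-2: moves the given message keys from one decoded i18n file to another,
 * leaving keys and content untouched. Returns the updated [ source, target ] pair.
 */
function moveDefinitions( array $source, array $target, array $keys ): array {
	foreach ( $keys as $key ) {
		if ( !array_key_exists( $key, $source ) ) {
			continue;
		}
		$target[$key] = $source[$key];
		unset( $source[$key] );
	}
	return [ $source, $target ];
}

/** Step 4: renames keys according to an old => new map, keeping the original order */
function renameKeys( array $messages, array $renames ): array {
	$result = [];
	foreach ( $messages as $key => $value ) {
		$result[ $renames[$key] ?? $key ] = $value;
	}
	return $result;
}
```

The same old => new map passed to `renameKeys()` is also what the consumers (GE and CX) would need in order to update their message key references at the same time.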

The first question is what to use as the source repo. My intuition is to use GrowthExperiments because messages have been around for longer. However, we thought it would be nice to see numbers on how many messages we're talking about. So, I quickly wrote a script to get statistics on these messages. You can find the script in P72206. It's vanilla PHP, so you should be able to run it anywhere without installing anything special; also, in case something looks weird, yes, all the boilerplate stuff is GPT-generated. I ran this script for both groups of messages (topics and topic groups):

Summary for topics:
  • 1790 messages only in GrowthExperiments
  • 257 messages only in ContentTranslation
  • 97 messages with differences
  • 1007 unchanged messages

Summary for topic groups:
  • 186 messages only in GrowthExperiments
  • 58 messages only in ContentTranslation
  • 11 messages with differences
  • 76 unchanged messages

(Full output in P72207.)
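
For readers who want to see how the four buckets are defined, here is a minimal sketch of the classification for a single language. It is not the script from P72206; the function name and the input shape (topic ID => translation) are assumptions made for this example.

```php
<?php

/**
 * Classifies the messages of one language into the four buckets used in the summary.
 * $ge and $cx each map a topic ID to its translation in that language.
 */
function compareTranslations( array $ge, array $cx ): array {
	$stats = [ 'onlyGE' => 0, 'onlyCX' => 0, 'different' => 0, 'unchanged' => 0 ];
	// The array union gives every topic ID that appears in at least one source
	foreach ( array_keys( $ge + $cx ) as $topic ) {
		if ( !isset( $cx[$topic] ) ) {
			$stats['onlyGE']++;
		} elseif ( !isset( $ge[$topic] ) ) {
			$stats['onlyCX']++;
		} elseif ( $ge[$topic] === $cx[$topic] ) {
			$stats['unchanged']++;
		} else {
			$stats['different']++;
		}
	}
	return $stats;
}
```

Summing the per-language results over all languages would give totals of the kind shown above.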


Considering all the above, my current proposal would be to:

  • Migrate all message definitions from GrowthExperiments to WikimediaMessages, with the exception of 2 messages that we will take from CX instead, as described above.
    • This is going to be a total of 3 patches
  • I can make a list of messages that exist in CX but not GE, across all languages; could these then be moved on translatewiki? We'd lose attribution in the JSON files, but it's better than nothing.
  • Everything else (identical duplicates and differing messages) will be deleted

How does this sound?

Change #1117589 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/extensions/GrowthExperiments@master] i18n: delete messages for article topics being moved to WikimediaMessages

https://gerrit.wikimedia.org/r/1117589

Change #1117592 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/extensions/ContentTranslation@master] i18n: delete messages for article topics being moved to WikimediaMessages

https://gerrit.wikimedia.org/r/1117592

Change #1117602 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/extensions/WikimediaMessages@master] Import topic messages from GrowthExperiments and ContentTranslation

https://gerrit.wikimedia.org/r/1117602

@Daimona probably a silly question about the plan above for the messages... did you consider writing a script that copies some messages from GE/i18n/*.json and some from CX/i18n/*.json with all their translations, rename them to wikimediamessages-... and put them in wikimediamessages/i18n/*.json so they can be committed in one go? That would create the new messages with all existing translations with nice names. After that, consumers (GE, CX) could transition on their own timeline.
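
A sketch of what such a copy-and-rename script might do for a single language file, assuming a hypothetical "plan" that maps each new key to the source extension and old key it should be taken from. The function name and the plan format are invented for illustration, and the new key names follow the `wikimedia-articletopics-*` pattern used earlier in this task.

```php
<?php

/**
 * Builds the messages of one language file for the target extension.
 *
 * @param array $sources Decoded source files, e.g. [ 'GE' => [...], 'CX' => [...] ]
 * @param array $plan New key => [ source name, old key ]
 * @return array New key => content, for every planned message that exists in its source
 */
function buildTargetFile( array $sources, array $plan ): array {
	$target = [];
	foreach ( $plan as $newKey => [ $sourceName, $oldKey ] ) {
		if ( isset( $sources[$sourceName][$oldKey] ) ) {
			$target[$newKey] = $sources[$sourceName][$oldKey];
		}
	}
	return $target;
}
```

Running this once per language file would produce the full set of renamed messages in one go, at the cost of the attribution concerns discussed in this task.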

I made 3 patches for the first step of the migration plan outlined in T380825#10481384. @Nikerabbit I'd appreciate if you could confirm that this is the right approach; thanks in advance!

The current plan came out of a conversation between Campaigns and Niklas, and it seemed that this was the best way to migrate messages while preserving attribution. As I understand it, moving the messages first without changing the keys or content allows attribution to be preserved in both translatewiki and the JSON files. And files other than English and qqq will be done by l10n-bot.

  1. Rename messages in the target repo. This will be a simple rename.

Isn't it the case that after this step the link between the CX code and the newly renamed messages will be broken and the topics will be untranslated in the CX UI? I guess I don't understand how the continuity works there unless we do the migration, which we don't have in our short term plans, very quickly.

Ah yes, of course. There was a missing/implied fifth step that I've just added explicitly. As long as the patches are merged together, the messages would only remain untranslated briefly, which should not be a problem. The one exception would be unit tests that check for the existence of translations; those might need to be disabled while the rename is in progress.

Quick update: we (Campaigns) would like to move forward with the migration. There has been some confusion as to how the migration will be carried out (including on my part), so I am summarizing it here.


Step one: we will move all relevant messages from GrowthExperiments and ContentTranslation to WikimediaMessages, without changing message keys or contents. By doing so, existing translations will also be moved by the bot, while preserving attribution both in the JSON files and on translatewiki (AIUI). I have already written the patches for this: GrowthExperiments, ContentTranslation, WikimediaMessages. This is backwards-compatible, in that CX and GE will keep using their own messages as before. After this step, we will wait a few days for translation exports to catch up.

Step two: update CX and GE to use the centralized list of topics provided by the WikimediaMessages extension via the ArticleTopicFiltersRegistry class. This will hide implementation details, such as which message keys are being used, from the consuming extensions. Each extension will start using the new list of messages, which means some translations will be different, hopefully better. At this stage we can also delete leftover messages, i.e. those that still remain in CX and GE and have not been moved to the new shared list. I still haven't made patches for this step. I would appreciate it if Growth-Team and Language and Product Localization could do this for their own extensions; if not, I will make the changes later on but will need CR. Note that strictly speaking, this step is not blocked by step 1.

Step three: we will rename all messages in the WikimediaMessages extension to use a common key prefix that is extension-agnostic. At this point this will be a simple rename of messages within a single repo, meaning it doesn't need special attention and it's fully automated, as long as only the message keys change (not the content). This will be done in r1112100.
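To illustrate the "only the message keys change" condition, here is a minimal sketch of a check that one i18n file before and after the rename differ by key names alone. It is an illustration under assumptions, not part of r1112100: the function name and the sample keys are hypothetical.

```python
def is_pure_rename(before, after, renames):
    """True if `after` is `before` with keys renamed per `renames` and nothing else changed.

    before, after: decoded i18n JSON objects for the same language
    renames:       {old message key: new message key} (hypothetical)
    """
    expected = {renames.get(key, key): value for key, value in before.items()}
    # A shorter `expected` means two old keys collapsed into the same new key.
    return len(expected) == len(before) and expected == after


# Made-up sample data; real key names differ.
RENAMES = {"ge-topic-art": "shared-topic-art"}
BEFORE = {"@metadata": {"authors": ["Example"]}, "ge-topic-art": "Art"}

print(is_pure_rename(BEFORE, {"@metadata": {"authors": ["Example"]}, "shared-topic-art": "Art"}, RENAMES))
print(is_pure_rename(BEFORE, {"@metadata": {"authors": ["Example"]}, "shared-topic-art": "Arts"}, RENAMES))
```

The first call describes a rename that leaves the content alone; the second also changes the message text, which is the case the step above is meant to avoid.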


Action items, and pinging people that we've been in touch with thus far:

  • We (Campaigns) are planning to merge the changes for step 1 this week, no later than Thursday, barring any objections. @Nikerabbit
  • The patches for GrowthExperiments and ContentTranslation are ready for review. I would appreciate (virtual) +2s from the respective maintainers. The merge itself will be done together with the WikimediaMessages patch by Campaigns engineers. @SBisson, @Urbanecm_WMF, @Michael
  • If possible, patches to migrate GE and CX to the new ArticleTopicFiltersRegistry class would also be appreciated. Feel free to add me as reviewer. @SBisson, @Urbanecm_WMF, @Michael

I added a virtual +2 to the GrowthExperiments patch.

  • If possible, patches to migrate GE and CX to the new ArticleTopicFiltersRegistry class would also be appreciated. Feel free to add me as reviewer. @SBisson, @Urbanecm_WMF, @Michael

I filed T386018 for the Growth-Team to take a look.

Change #1117602 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@master] Import topic messages from GrowthExperiments and ContentTranslation

https://gerrit.wikimedia.org/r/1117602

Change #1117589 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] i18n: delete messages for article topics being moved to WikimediaMessages

https://gerrit.wikimedia.org/r/1117589

Change #1117592 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] i18n: delete messages for article topics being moved to WikimediaMessages

https://gerrit.wikimedia.org/r/1117592

Change #1119513 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/GrowthExperiments@master] chore: Add WikimediaMessages as a hard GE dependency

https://gerrit.wikimedia.org/r/1119513

Update: step 1 of the migration is complete, and L10n bot has moved the translations (WikimediaMessages, GrowthExperiments, ContentTranslation). Now we need to update GrowthExperiments and ContentTranslation to use the messages in WikimediaMessages, if available. I will get back to this on Monday next week and make patches as needed.

Update: Step two is now in progress, and that work is tracked in the subtasks T386018 and T387159. I made patches for both of them with the minimum necessary to unblock this task; further cleanup can be done later by the respective maintainers.

Update: Step 2 done: ContentTranslation and GrowthExperiments are now using the shared list. Now we can go ahead with step 3, in r1112100: renaming all messages to use the same prefix. This is now ready for review, so I've marked the gerrit patch as such.

Change #1112100 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@master] Rename article topic messages

https://gerrit.wikimedia.org/r/1112100

Update: step 3 also done, and translations have been updated in today's update, hence closing this task. Thanks everyone!