Implement articlecountry, a new CirrusSearch keyword
Closed, Resolved · Public · 3 Estimated Story Points

Description

Article country predictions are being populated in the search index; we should create a new search keyword to query them.

We probably want to adapt or generalize the ArticleTopic keyword, since it might work exactly the same way.

Predictions are currently country names. For better usability, we might want to provide a mapping of country code (3 letters, and 2 letters?) -> country name so that users do not have to write long names, which are prone to misspellings.
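
A minimal sketch of what such a mapping could look like, for illustration only. The table, the type, and the function name `countryNameForCode` are hypothetical and not taken from CirrusSearch; the codes are real ISO 3166-1 alpha-2 and alpha-3 codes, but the exact form of the country names stored in the index is an assumption. Lowercasing the input is one possible design choice, not a stated requirement.

```typescript
// Hypothetical lookup entry: ISO 3166-1 codes plus the country name
// as it is assumed to be stored in the search index.
interface CountryEntry {
  alpha2: string;
  alpha3: string;
  name: string;
}

// Only a few sample entries; a real table would cover the full list.
const COUNTRIES: CountryEntry[] = [
  { alpha2: "GL", alpha3: "GRL", name: "Greenland" },
  { alpha2: "DE", alpha3: "DEU", name: "Germany" },
  { alpha2: "AT", alpha3: "AUT", name: "Austria" },
  { alpha2: "CH", alpha3: "CHE", name: "Switzerland" },
];

// Index both the 2-letter and the 3-letter code under their lowercase form.
const CODE_TO_NAME = new Map<string, string>();
for (const country of COUNTRIES) {
  CODE_TO_NAME.set(country.alpha2.toLowerCase(), country.name);
  CODE_TO_NAME.set(country.alpha3.toLowerCase(), country.name);
}

// Resolve a user-supplied code to a country name, or undefined if unknown.
function countryNameForCode(code: string): string | undefined {
  return CODE_TO_NAME.get(code.trim().toLowerCase());
}
```

With this table, `countryNameForCode("grl")` and `countryNameForCode("GL")` both resolve to the same name, and an unknown code yields `undefined` so the caller can decide how to report it.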

AC:

  • Searching for articlecountry:grl would find all articles about Greenland
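
As a rough illustration of how the acceptance criterion could be exercised programmatically, the sketch below builds a request URL for the standard MediaWiki search API (`action=query`, `list=search`, `srsearch`). The endpoint host and the helper name `buildSearchUrl` are assumptions; substitute any wiki where the keyword is actually deployed.

```typescript
// Assumed endpoint; any wiki running CirrusSearch with the keyword enabled would do.
const API_ENDPOINT = "https://en.wikipedia.org/w/api.php";

// Build a search API URL for a given articlecountry value (hypothetical helper).
function buildSearchUrl(countryCode: string): string {
  const params: Array<[string, string]> = [
    ["action", "query"],
    ["list", "search"],
    ["srsearch", "articlecountry:" + countryCode],
    ["format", "json"],
  ];
  const query = params
    .map(([key, value]) => key + "=" + encodeURIComponent(value))
    .join("&");
  return API_ENDPOINT + "?" + query;
}
```

Calling `buildSearchUrl("grl")` produces a URL whose `srsearch` parameter is the percent-encoded form of `articlecountry:grl`; the sketch only builds the URL and does not send the request.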

Event Timeline

Thanks! Just linking to my long comment, where I output a couple of ways to build this mapping of country -> keyword, in case that's helpful: T301671#10500864

Reposting my comment from T301671#10545220 here:

Hello! Jumping into this discussion on behalf of the LPL team — we fully support adopting ISO 3166 codes, as suggested above. Case-insensitive support would be ideal (i.e., allowing both articlecountry:USA and articlecountry:usa), but if a single format must be chosen, lowercase would be our preference.

Thank you for all the work on this front!

Gehel set the point value for this task to 3. (Feb 17 2025, 4:34 PM)
gmodena updated Other Assignee, added: dcausse.

Hey @ngkountas,

Touching base to let you know that we picked up this task.

Ack on using ISO 3166-1 codes, that'd be our preference too. Case-insensitive support should be no problem.

Two questions:

  • Would you expect the country list to change often?
  • Would you like the ability to group countries by geographic areas to simplify OR predicates?

For example, you could define a grouping for countries in the DACH region as:

const DACH = [
  "DEU", // Germany
  "AUT", // Austria
  "CHE"  // Switzerland
];

And under the hood a search for articlecountry:dach would match articlecountry:deu|aut|che.

The only caveat is that CirrusSearch limits how long a query can be, but that holds regardless of how we generate the predicate.
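
The expansion described above could be sketched roughly as follows. This is not the CirrusSearch implementation; the `REGIONS` table and the function name `expandArticleCountry` are hypothetical, and the only grouping shown is the DACH example from this comment.

```typescript
// Hypothetical region table keyed by lowercase region name.
const REGIONS = new Map<string, string[]>([
  ["dach", ["DEU", "AUT", "CHE"]],
]);

// Expand the value of an articlecountry term: region names are replaced by
// their member country codes, plain codes pass through unchanged, and the
// result is joined with "|" to form an OR predicate.
function expandArticleCountry(value: string): string {
  const codes: string[] = [];
  for (const part of value.toLowerCase().split("|")) {
    const members = REGIONS.get(part);
    if (members !== undefined) {
      for (const member of members) {
        codes.push(member.toLowerCase());
      }
    } else {
      codes.push(part);
    }
  }
  return "articlecountry:" + codes.join("|");
}
```

Here `expandArticleCountry("dach")` yields `articlecountry:deu|aut|che`, and a mixed value such as `grl|dach` expands only the region part. The expanded string is what would count against the query length limit.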

Change #1120938 had a related patch set uploaded (by Gmodena; author: Gmodena):

[mediawiki/extensions/CirrusSearch@master] Query: add support for articlecountry keyword

https://gerrit.wikimedia.org/r/1120938

Thinking about some of the use cases related to Content Translation, where people could use countries to find relevant articles to translate, I can share some thoughts on the questions below:

Two questions:

  • Would you expect the country list to change often?

No. Countries are expected to be a stable list. In any case, if there is a change (e.g., when Czechoslovakia became two separate countries), the expectation is for that to be reflected in the list transparently. That is, users would be able to select countries to find articles related to them based on the latest version of the list.

  • Would you like the ability to group countries by geographic areas to simplify OR predicates?

We still need to explore which is the best way to expose countries to the final users. It seems useful to have regions that are meaningful for users. Being able to select continents or Wikimedia regions seems useful.

Change #1120938 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Query: add support for articlecountry keyword

https://gerrit.wikimedia.org/r/1120938

Hey @Pginer-WMF

Two questions:

  • Would you expect the country list to change often?

No. Countries are expected to be a stable list. In any case, if there is a change (e.g., when Czechoslovakia became two separate countries), the expectation is for that to be reflected in the list transparently. That is, users would be able to select countries to find articles related to them based on the latest version of the list.

Thanks for clarifying. That simplifies things. This feature should be available for testing with next week's deployment (1.44.0-wmf.19; 2025-03-04).

  • Would you like the ability to group countries by geographic areas to simplify OR predicates?

We still need to explore which is the best way to expose countries to the final users. It seems useful to have regions that are meaningful for users. Being able to select continents or Wikimedia regions seems useful.

For now, we have implemented the capability to support regions, but we have not included them yet. Some bikeshedding might be needed regarding region naming. We can follow up on this once you have more clarity on the best way to expose countries to end users.

Change #1124103 had a related patch set uploaded (by Gmodena; author: Gmodena):

[mediawiki/extensions/CirrusSearch@master] Query: add articlecountry to max len tests

https://gerrit.wikimedia.org/r/1124103

Change #1124103 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Query: add articlecountry to max len tests

https://gerrit.wikimedia.org/r/1124103