
Request for new search profile for Wikidata that boosts Items for languages
Closed, Resolved (Public)

Description

Problem:
The Wikidata Team is redoing the Special:NewLexeme page. One improvement we want to make is boosting languages in the selector where editors indicate the language of the Lexeme they are creating (See T298140). The language selector currently is a normal entity selector that searches through all Items in Wikidata. We'd like to have a new search profile that boosts Items representing languages in order to make selecting languages easier.

Screenshots:
From the current Special:NewLexeme page:

image.png (578×781 px, 48 KB)

Acceptance criteria:

  • a search profile is available that the Wikidata Team can use in the new Special:NewLexeme page that boosts languages

Ideas for how to determine which Items to boost:

Notes:

  • We still want Items not representing languages to be included. They should just be ranked lower.
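
The acceptance criterion and the note above describe re-ranking rather than filtering. A toy sketch in Python of that idea, assuming a hypothetical additive boost: the function name, the base scores, and the boost value are illustrative assumptions, not CirrusSearch's actual scoring. Only the labels are taken from the search results quoted later in this task.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    label: str
    base_score: float   # hypothetical text-match score (made up for illustration)
    is_language: bool   # whether the Item represents a language


def rank_with_language_boost(hits: list[Hit], boost: float = 10.0) -> list[Hit]:
    """Sort hits by score, adding `boost` to Items that represent languages.

    Nothing is filtered out: non-language Items stay in the list and are
    only ranked lower when a boosted language Item overtakes them.
    """
    def score(hit: Hit) -> float:
        return hit.base_score + (boost if hit.is_language else 0.0)

    return sorted(hits, key=score, reverse=True)


hits = [
    Hit("Engl", 5.0, False),
    Hit("ENGL", 4.0, False),
    Hit("English", 3.0, True),
    Hit("England", 2.0, False),
]
ranked = rank_with_language_boost(hits)
print([hit.label for hit in ranked])
```

In this sketch "English" moves to the top while the other three Items keep their relative order and remain in the result list.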

Details

Repo                                             Branch             Lines +/-
operations/mediawiki-config                      master             +5 -0
operations/mediawiki-config                      master             +22 -27
operations/mediawiki-config                      master             +17 -0
mediawiki/extensions/Wikibase                    wmf/1.39.0-wmf.21  +27 -8
mediawiki/extensions/Wikibase                    master             +27 -8
operations/mediawiki-config                      master             +0 -17
operations/mediawiki-config                      master             +17 -0
mediawiki/extensions/WikibaseCirrusSearch        master             +0 -1
mediawiki/extensions/WikimediaMessages           master             +4 -0
mediawiki/extensions/Wikibase                    master             +76 -21
mediawiki/extensions/WikibaseCirrusSearch        master             +3 -1
mediawiki/extensions/CirrusSearch                master             +5 -4
operations/mediawiki-config                      master             +4 -4
mediawiki/extensions/WikibaseCirrusSearch        wmf/1.39.0-wmf.17  +1 -1
mediawiki/extensions/WikibaseCirrusSearch        master             +1 -1
mediawiki/extensions/WikibaseCirrusSearch        wmf/1.39.0-wmf.17  +4 -6
operations/mediawiki-config                      master             +1 -1
mediawiki/extensions/WikibaseCirrusSearch        master             +4 -6
operations/mediawiki-config                      master             +80 -0
mediawiki/extensions/WikibaseCirrusSearch        wmf/1.39.0-wmf.17  +1 -1
mediawiki/extensions/WikibaseCirrusSearch        master             +1 -1
operations/mediawiki-config                      master             +80 -0
mediawiki/extensions/Wikibase                    master             +116 -40
mediawiki/extensions/PropertySuggester           master             +0 -1
mediawiki/extensions/WikibaseLexemeCirrusSearch  master             +7 -4
mediawiki/extensions/PropertySuggester           master             +3 -1
mediawiki/extensions/WikibaseCirrusSearch        master             +52 -7
mediawiki/extensions/WikibaseLexemeCirrusSearch  master             +47 -0

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 807902 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: DCausse):

[mediawiki/extensions/WikibaseCirrusSearch@wmf/1.39.0-wmf.17] Do not re-use "wikibase_config" for registering the language selector...

https://gerrit.wikimedia.org/r/807902

Change 807902 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@wmf/1.39.0-wmf.17] Do not re-use "wikibase_config" for registering the language selector...

https://gerrit.wikimedia.org/r/807902

Mentioned in SAL (#wikimedia-operations) [2022-06-23T15:11:26Z] <lucaswerkmeister-wmde@deploy1002> Synchronized php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/Hooks.php: Backport: [[gerrit:807902|Do not re-use "wikibase_config" for registering the language selector... (T307869)]] (duration: 03m 22s)

Change 808011 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[operations/mediawiki-config@master] [cirrus] Add a custom profile for the wikibase language selector

https://gerrit.wikimedia.org/r/808011

The above patch should fix the issue. I forgot that profile repositories must have unique names, sorry about that!

Thanks! I backported it to wmf.17 and scheduled a repeat of the config change for Monday.

Change 808011 merged by jenkins-bot:

[operations/mediawiki-config@master] [cirrus] Add a custom profile for the wikibase language selector

https://gerrit.wikimedia.org/r/808011

Mentioned in SAL (#wikimedia-operations) [2022-06-27T13:12:17Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808011|[cirrus] Add a custom profile for the wikibase language selector (T307869)]] (1/4) (duration: 03m 35s)

Mentioned in SAL (#wikimedia-operations) [2022-06-27T13:16:08Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:808011|[cirrus] Add a custom profile for the wikibase language selector (T307869)]] (2/4) (duration: 03m 33s)

Mentioned in SAL (#wikimedia-operations) [2022-06-27T13:20:16Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/SearchSettingsForWikibase.php: Config: [[gerrit:808011|[cirrus] Add a custom profile for the wikibase language selector (T307869)]] (3/4) (duration: 03m 32s)

Mentioned in SAL (#wikimedia-operations) [2022-06-27T13:24:04Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/SearchSettingsForWikidata.php: Config: [[gerrit:808011|[cirrus] Add a custom profile for the wikibase language selector (T307869)]] (4/4) (duration: 03m 29s)

It looks like the new profile isn’t fully working yet, at least when used via the maintenance script. I get the same results for “Engl” with or without --profile-context:

lucaswerkmeister-wmde@mwdebug1001:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en --profile-context language_selector_prefix <<<Engl 2> /dev/null | jq '.rows | .[] | .snippets | { term, type, text }'
{
  "term": "Engl",
  "type": "label",
  "text": "family name"
}
{
  "term": "ENGL",
  "type": "alias",
  "text": "protein-coding gene in the species Homo sapiens"
}
{
  "term": "English",
  "type": "label",
  "text": "West Germanic language"
}
{
  "term": "England",
  "type": "label",
  "text": "country in north-west Europe, part of the United Kingdom"
}
{
  "term": "English Wikipedia",
  "type": "label",
  "text": "English-language edition of Wikipedia"
}
lucaswerkmeister-wmde@mwdebug1001:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en <<<Engl 2> /dev/null | jq '.rows | .[] | .snippets | { term, type, text }'
{
  "term": "Engl",
  "type": "label",
  "text": "family name"
}
{
  "term": "ENGL",
  "type": "alias",
  "text": "protein-coding gene in the species Homo sapiens"
}
{
  "term": "English",
  "type": "label",
  "text": "West Germanic language"
}
{
  "term": "England",
  "type": "label",
  "text": "country in north-west Europe, part of the United Kingdom"
}
{
  "term": "English Wikipedia",
  "type": "label",
  "text": "English-language edition of Wikipedia"
}

Same for “Deu”:

lucaswerkmeister-wmde@mwdebug1001:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en --profile-context language_selector_prefix <<<Deu 2> /dev/null | jq '.rows | .[] | .snippets | { term, type, text }'
{
  "term": "Deutsche Eislauf-Union",
  "type": "label",
  "text": "voluntary association"
}
{
  "term": "Deutschland",
  "type": "alias",
  "text": "country in Central Europe"
}
{
  "term": "Deutsch",
  "type": "alias",
  "text": "West Germanic language spoken mainly in Central Europe"
}
{
  "term": "Deutsche Demokratische Republik",
  "type": "alias",
  "text": "1949–1990 country in central Europe, unified into modern Germany"
}
{
  "term": "Deutsches Kaiserreich",
  "type": "alias",
  "text": "empire in Central Europe between 1871 and 1918"
}
lucaswerkmeister-wmde@mwdebug1001:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en <<<Deu 2> /dev/null | jq '.rows | .[] | .snippets | { term, type, text }'
{
  "term": "Deutsche Eislauf-Union",
  "type": "label",
  "text": "voluntary association"
}
{
  "term": "Deutschland",
  "type": "alias",
  "text": "country in Central Europe"
}
{
  "term": "Deutsch",
  "type": "alias",
  "text": "West Germanic language spoken mainly in Central Europe"
}
{
  "term": "Deutsche Demokratische Republik",
  "type": "alias",
  "text": "1949–1990 country in central Europe, unified into modern Germany"
}
{
  "term": "Deutsches Kaiserreich",
  "type": "alias",
  "text": "empire in Central Europe between 1871 and 1918"
}

Or “Frenc”:

lucaswerkmeister-wmde@mwdebug1001:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en --profile-context language_selector_prefix <<<Frenc 2> /dev/null | jq '.rows | .[] | .snippets | { term, type, text }'
{
  "term": "French Republic",
  "type": "alias",
  "text": "country in Western Europe"
}
{
  "term": "French",
  "type": "label",
  "text": "Romance language"
}
{
  "term": "French Wikipedia",
  "type": "label",
  "text": "French-language edition of Wikipedia"
}
{
  "term": "French Revolution",
  "type": "label",
  "text": "1789 to 1799 social and political revolution in France"
}
{
  "term": "French Guiana",
  "type": "label",
  "text": "Overseas department of France in South America"
}
lucaswerkmeister-wmde@mwdebug1001:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en <<<Frenc 2> /dev/null | jq '.rows | .[] | .snippets | { term, type, text }'
{
  "term": "French Republic",
  "type": "alias",
  "text": "country in Western Europe"
}
{
  "term": "French",
  "type": "label",
  "text": "Romance language"
}
{
  "term": "French Wikipedia",
  "type": "label",
  "text": "French-language edition of Wikipedia"
}
{
  "term": "French Revolution",
  "type": "label",
  "text": "1789 to 1799 social and political revolution in France"
}
{
  "term": "French Guiana",
  "type": "label",
  "text": "Overseas department of France in South America"
}
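
The three comparisons above can also be checked mechanically instead of by eye. A minimal Python sketch, assuming the two jq outputs have been captured as strings; the helper names are hypothetical and not part of any Wikibase tooling:

```python
import json


def parse_json_stream(text: str) -> list[dict]:
    """Parse a stream of concatenated JSON objects, as jq prints them."""
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects


def same_ranking(with_context: str, without_context: str) -> bool:
    """Return True if both outputs list the same terms in the same order."""
    terms_a = [obj["term"] for obj in parse_json_stream(with_context)]
    terms_b = [obj["term"] for obj in parse_json_stream(without_context)]
    return terms_a == terms_b


# Abbreviated sample copied from the "Engl" output above.
sample = """
{
  "term": "Engl",
  "type": "label",
  "text": "family name"
}
{
  "term": "ENGL",
  "type": "alias",
  "text": "protein-coding gene in the species Homo sapiens"
}
"""
print(same_ranking(sample, sample))
```

If `same_ranking` returns True for a query where a language Item should have been promoted, the profile context is not influencing the ranking.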

This doesn’t look like a bug in the maintenance script. The profile context is at least making it to CirrusSearch, since a wrong value produces an error:

lucaswerkmeister-wmde@mwdebug1001:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en --profile-context unknown_profile_context <<<Engl
Please input search terms...
CirrusSearch\Profile\SearchProfileException from line 273 of /srv/mediawiki/php-1.39.0-wmf.17/extensions/CirrusSearch/includes/Profile/SearchProfileService.php: No default profile found for wikibase_prefix_querybuilder in context unknown_profile_context
#0 /srv/mediawiki/php-1.39.0-wmf.17/extensions/CirrusSearch/includes/Profile/SearchProfileService.php(258): CirrusSearch\Profile\SearchProfileService->getProfileName('wikibase_prefix...', 'unknown_profile...', Array)
#1 /srv/mediawiki/php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/EntitySearchElastic.php(158): CirrusSearch\Profile\SearchProfileService->loadProfile('wikibase_prefix...', 'unknown_profile...', NULL, Array)
#2 /srv/mediawiki/php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/EntitySearchElastic.php(206): Wikibase\Search\Elastic\EntitySearchElastic->loadProfile(Object(CirrusSearch\Search\SearchContext), 'en')
#3 /srv/mediawiki/php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/EntitySearchElastic.php(301): Wikibase\Search\Elastic\EntitySearchElastic->getElasticSearchQuery('Engl', 'en', 'item', false, Object(CirrusSearch\Search\SearchContext))
#4 /srv/mediawiki/php-1.39.0-wmf.17/extensions/Wikibase/repo/includes/Api/CombinedEntitySearchHelper.php(49): Wikibase\Search\Elastic\EntitySearchElastic->getRankedSearchResults('Engl', 'en', 'item', 5, false, 'unknown_profile...')
#5 /srv/mediawiki/php-1.39.0-wmf.17/extensions/Wikibase/repo/includes/Api/TypeDispatchingEntitySearchHelper.php(48): Wikibase\Repo\Api\CombinedEntitySearchHelper->getRankedSearchResults('Engl', 'en', 'item', 5, false, 'unknown_profile...')
#6 /srv/mediawiki/php-1.39.0-wmf.17/extensions/Wikibase/repo/maintenance/searchEntities.php(106): Wikibase\Repo\Api\TypeDispatchingEntitySearchHelper->getRankedSearchResults('Engl', 'en', 'item', 5, false, 'unknown_profile...')
#7 /srv/mediawiki/php-1.39.0-wmf.17/includes/OrderedStreamingForkController.php(142): Wikibase\Repo\Maintenance\SearchEntities->doSearch('Engl')
#8 /srv/mediawiki/php-1.39.0-wmf.17/includes/OrderedStreamingForkController.php(69): OrderedStreamingForkController->consumeNoFork()
#9 /srv/mediawiki/php-1.39.0-wmf.17/extensions/Wikibase/repo/maintenance/searchEntities.php(65): OrderedStreamingForkController->start()
#10 /srv/mediawiki/php-1.39.0-wmf.17/maintenance/includes/MaintenanceRunner.php(309): Wikibase\Repo\Maintenance\SearchEntities->execute()
#11 /srv/mediawiki/php-1.39.0-wmf.17/maintenance/doMaintenance.php(85): MediaWiki\Maintenance\MaintenanceRunner->run()
#12 /srv/mediawiki/php-1.39.0-wmf.17/extensions/Wikibase/repo/maintenance/searchEntities.php(160): require_once('/srv/mediawiki/...')
#13 /srv/mediawiki/multiversion/MWScript.php(120): require_once('/srv/mediawiki/...')
#14 {main}
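
The reasoning behind this check can be sketched with a toy model: if profile lookup is keyed by profile type and context, an unregistered context fails loudly, so an error proves the context value reached the lookup. The Python class below is a hypothetical illustration, not CirrusSearch's SearchProfileService, and "example_profile" is a made-up profile name; only the profile type, the context names, and the error wording come from the output above.

```python
class SearchProfileError(Exception):
    """Raised when no default profile is registered for a context."""


class ToyProfileService:
    """Toy stand-in for a profile lookup keyed by (type, context)."""

    def __init__(self) -> None:
        # Maps (profile_type, context) to the default profile name.
        self._defaults: dict[tuple[str, str], str] = {}

    def register_default(self, profile_type: str, context: str, name: str) -> None:
        self._defaults[(profile_type, context)] = name

    def get_profile_name(self, profile_type: str, context: str) -> str:
        try:
            return self._defaults[(profile_type, context)]
        except KeyError:
            raise SearchProfileError(
                f"No default profile found for {profile_type} in context {context}"
            ) from None


service = ToyProfileService()
service.register_default(
    "wikibase_prefix_querybuilder", "language_selector_prefix", "example_profile"
)
print(service.get_profile_name("wikibase_prefix_querybuilder", "language_selector_prefix"))

try:
    service.get_profile_name("wikibase_prefix_querybuilder", "unknown_profile_context")
except SearchProfileError as error:
    print(error)
```

A known context that resolves without error but still yields unchanged results therefore points at the profile's contents or registration, not at the maintenance script.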

Change 808903 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/mediawiki-config@master] Do not set wgWBCSLanguageSelectorRescoreProfile twice

https://gerrit.wikimedia.org/r/808903

Change 808904 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/WikibaseCirrusSearch@master] Use WBCS config when registering language selector profile

https://gerrit.wikimedia.org/r/808904

Sorry about that, there was yet another issue in the WikibaseCirrusSearch hook: the config was ignored, so the language selector profile context simply used exactly the same settings as the classic entity completion search.
There was also a typo in mw-config, fixed in one of the attached patches.

Change 808904 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@master] Use WBCS config when registering language selector profile

https://gerrit.wikimedia.org/r/808904

Change 808445 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: DCausse):

[mediawiki/extensions/WikibaseCirrusSearch@wmf/1.39.0-wmf.17] Use WBCS config when registering language selector profile

https://gerrit.wikimedia.org/r/808445

Change 808903 merged by jenkins-bot:

[operations/mediawiki-config@master] Do not set wgWBCSLanguageSelectorRescoreProfile twice

https://gerrit.wikimedia.org/r/808903

Mentioned in SAL (#wikimedia-operations) [2022-06-27T15:15:13Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/SearchSettingsForWikidata.php: Config: [[gerrit:808903|Do not set wgWBCSLanguageSelectorRescoreProfile twice (T307869)]] (duration: 03m 41s)

Change 808445 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@wmf/1.39.0-wmf.17] Use WBCS config when registering language selector profile

https://gerrit.wikimedia.org/r/808445

Mentioned in SAL (#wikimedia-operations) [2022-06-27T15:32:17Z] <lucaswerkmeister-wmde@deploy1002> Synchronized php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/Hooks.php: Backport: [[gerrit:808445|Use WBCS config when registering language selector profile (T307869)]] (duration: 03m 38s)

Change 808941 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/mediawiki-config@master] Increase weights on the language selector statement boosts

https://gerrit.wikimedia.org/r/808941

Change 808942 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/WikibaseCirrusSearch@master] Use LanguageSelectorStatementBoost instead of its plurar form

https://gerrit.wikimedia.org/r/808942

Change 808942 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@master] Use LanguageSelectorStatementBoost instead of its plurar form

https://gerrit.wikimedia.org/r/808942

Change 809118 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: DCausse):

[mediawiki/extensions/WikibaseCirrusSearch@wmf/1.39.0-wmf.17] Use LanguageSelectorStatementBoost instead of its plurar form

https://gerrit.wikimedia.org/r/809118

Change 809118 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@wmf/1.39.0-wmf.17] Use LanguageSelectorStatementBoost instead of its plurar form

https://gerrit.wikimedia.org/r/809118

Mentioned in SAL (#wikimedia-operations) [2022-06-28T13:03:28Z] <lucaswerkmeister-wmde@deploy1002> Synchronized php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/Hooks.php: Backport: [[gerrit:809118|Use LanguageSelectorStatementBoost instead of its plurar form (T307869)]] (duration: 03m 35s)

Change 809209 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Construct a match query from TermBoostScoreBuilder

https://gerrit.wikimedia.org/r/809209

There is yet another problem (the patch above should fix it). I'm sorry that deploying this profile is such a pain. It demonstrates a clear problem in the way we (the search team) deploy such features/profiles, and I filed T311528 to discuss and hopefully improve the situation.

Question from @Lea_WMDE and @Evelien_WMDE from today's LOD sync: will this have any influence on other Wikibase installations? Since they are having Elastic issues in wbcloud they want to make sure it's not getting worse for them accidentally due to us creating a new profile for all other Wikibase installations that use the new Lexeme creation page.

No new profiles should be created for other Wikibase installations: most of the Wikidata-specific options are managed in WMF-specific config, not in Wikibase or CirrusSearch, so the new Lexeme creation page should behave exactly as before.
The fixes we had to make in CirrusSearch should not impact anything unless other Wikibase installations had tuned these broken settings (which I doubt, since they were totally broken and ineffective).

The bugfix that might affect Wikibase installations relying on CirrusSearch&Elastic is:

  • Fixed the handling of the configuration variable wgWBCSStatementBoost which was ignored.

@Lea_WMDE @Evelien_WMDE do you have a link to such problems with Elastic in wbcloud?

I think this would depend on which versions of the CirrusSearch and WikibaseCirrusSearch extensions are used in those Wikibase installations, but IIUC this change should be non-breaking. @dcausse please correct me if I'm wrong.

Change 808941 merged by jenkins-bot:

[operations/mediawiki-config@master] Increase weights on the language selector statement boosts

https://gerrit.wikimedia.org/r/808941

Mentioned in SAL (#wikimedia-operations) [2022-06-29T15:51:40Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808941|Increase weights on the language selector statement boosts (T307869)]] (expected to be a no-op) (duration: 03m 21s)

Change 809209 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Construct a match query from TermBoostScoreBuilder

https://gerrit.wikimedia.org/r/809209

Looks like the new search profile is working now 🎉

lucaswerkmeister-wmde@mwdebug1002:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en --profile-context language_selector_prefix <<<Engl 2> /dev/null | jq '.rows | .[] | .snippets | { term, type, text }'
{
  "term": "English",
  "type": "label",
  "text": "West Germanic language"
}
{
  "term": "English (Canada)",
  "type": "alias",
  "text": "set of varieties of the English language native to Canada"
}
{
  "term": "Englesko-tahićanski",
  "type": "alias",
  "text": "dialect"
}
{
  "term": "English (United States)",
  "type": "alias",
  "text": "set of dialects of the English language spoken in the United States"
}
{
  "term": "Englisc",
  "type": "alias",
  "text": "earliest historical form of English"
}

lucaswerkmeister-wmde@mwdebug1002:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en --profile-context language_selector_prefix <<<Frenc 2> /dev/null | jq '.rows | .[] | .snippets | { term, type, text }'
{
  "term": "French",
  "type": "label",
  "text": "Romance language"
}
{
  "term": "French Sign Language",
  "type": "label",
  "text": "sign language of the deaf in the nation of France"
}
{
  "term": "French Cree language",
  "type": "alias",
  "text": "The language of the Métis people of Canada and the United States, who are the descendants of First Nations women and fur trade workers of European ancestry"
}
{
  "term": "French Guianan Creole",
  "type": "alias",
  "text": "French-based creole from French Guiana"
}
{
  "term": "French-Canadian dialect",
  "type": "alias",
  "text": "dialect of French, mainly spoken in Canada"
}

Potentially controversial question, but perhaps we want to slightly deboost dialects relative to languages? (E.g. change them from 50 to 40 while keeping languages at 50.) Currently, both “Deutsch” and “German” find a dialect before the main German language.

lucaswerkmeister-wmde@mwdebug1002:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en --profile-context language_selector_prefix <<<Deutsch 2> /dev/null | jq '.rows | .[] | .snippets | { term, type, text }'
{
  "term": "Deutsche Dialekte in der Schweiz",
  "type": "alias",
  "text": "Alemannic dialects spoken in the German-speaking part of Switzerland"
}
{
  "term": "Deutsch",
  "type": "alias",
  "text": "West Germanic language spoken mainly in Central Europe"
}
{
  "term": "Deutschschweizer Gebärdensprache",
  "type": "alias",
  "text": "sign language of Switzerland"
}
{
  "term": "Deutsche Sprache (Österreichisch)",
  "type": "alias",
  "text": "variety of Standard German written and spoken in Austria and North Italy"
}
{
  "term": "Deutsch",
  "type": "alias",
  "text": "German as used in Switzerland, mainly as written language"
}
lucaswerkmeister-wmde@mwdebug1002:~$ mwscript extensions/Wikibase/repo/maintenance/searchEntities.php wikidatawiki --entity-type item --language en --profile-context language_selector_prefix <<<German 2> /dev/null | jq '.rows | .[] | .snippets | { term, type, text }'
{
  "term": "germana elvețiană",
  "type": "label",
  "text": "Alemannic dialects spoken in the German-speaking part of Switzerland"
}
{
  "term": "German",
  "type": "label",
  "text": "West Germanic language spoken mainly in Central Europe"
}
{
  "term": "germana de jos",
  "type": "label",
  "text": "West Germanic language spoken mainly in northern Germany and the eastern part of the Netherlands"
}
{
  "term": "Germana din Pennsylvania",
  "type": "label",
  "text": "variety of West Central German"
}
{
  "term": "Germaneg",
  "type": "alias",
  "text": "language"
}

Change 806932 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@master] Pass $searchProfiles into SearchEntities API

https://gerrit.wikimedia.org/r/806932

Change 806386 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Add profile parameter to entity search APIs

https://gerrit.wikimedia.org/r/806386

Change 806929 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@master] Add messages for language entity search profile

https://gerrit.wikimedia.org/r/806929

Change 806933 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@master] Remove no-longer-used Phan suppression

https://gerrit.wikimedia.org/r/806933

Change 806930 merged by jenkins-bot:

[operations/mediawiki-config@master] Configure wbsearchentities profile parameter on Test Wikidata

https://gerrit.wikimedia.org/r/806930

Change 815969 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[operations/mediawiki-config@master] Revert "Configure wbsearchentities profile parameter on Test Wikidata"

https://gerrit.wikimedia.org/r/815969

Change 815969 merged by jenkins-bot:

[operations/mediawiki-config@master] Revert "Configure wbsearchentities profile parameter on Test Wikidata"

https://gerrit.wikimedia.org/r/815969

Change 815961 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Fix profile in wbsearchentities and wbsearch

https://gerrit.wikimedia.org/r/815961

Change 815970 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[operations/mediawiki-config@master] Configure wbsearchentities profile parameter on Test Wikidata (take 2)

https://gerrit.wikimedia.org/r/815970

Change 815961 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Fix profile in wbsearchentities and wbsearch

https://gerrit.wikimedia.org/r/815961

Change 815983 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@wmf/1.39.0-wmf.21] Fix profile in wbsearchentities and wbsearch

https://gerrit.wikimedia.org/r/815983

Change 815983 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@wmf/1.39.0-wmf.21] Fix profile in wbsearchentities and wbsearch

https://gerrit.wikimedia.org/r/815983

Mentioned in SAL (#wikimedia-operations) [2022-07-21T15:14:40Z] <lucaswerkmeister-wmde@deploy1002> Synchronized php-1.39.0-wmf.21/extensions/Wikibase/repo/: Backport: [[gerrit:815983|Fix profile in wbsearchentities and wbsearch (T307869)]] (duration: 03m 07s)

Change 815970 merged by jenkins-bot:

[operations/mediawiki-config@master] Configure wbsearchentities profile parameter on Test Wikidata (take 2)

https://gerrit.wikimedia.org/r/815970

Mentioned in SAL (#wikimedia-operations) [2022-07-21T15:21:22Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815970|Configure wbsearchentities profile parameter on Test Wikidata (take 2) (T307869)]] (1/2) (duration: 02m 59s)

Mentioned in SAL (#wikimedia-operations) [2022-07-21T15:25:23Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/SearchSettingsForWikibase.php: Config: [[gerrit:815970|Configure wbsearchentities profile parameter on Test Wikidata (take 2) (T307869)]] (2/2) (duration: 03m 13s)

Working on Test Wikidata:

image.png (229×1 px, 47 KB)

I think this is now waiting for the announcement; on the announced date, we can then deploy the config change to enable it on Wikidata, and later the Wikibase cleanup change.

Potentially controversial question, but perhaps we want to slightly deboost dialects relative to languages? (E.g. change them from 50 to 40 while keeping languages at 50.) Currently, both “Deutsch” and “German” find a dialect before the main German language.

I tested this a bit on mwdebug1002, and apparently dialects need to be dropped all the way to 20 before Deutsche Dialekte in der Schweiz and germana elvețiană no longer come first for Deutsch and German, respectively (i.e. they still come first at 25; I didn’t bother checking the exact point between 20 and 25).

And I also just realized that Deutsche Dialekte in der Schweiz and germana elvețiană are in fact aliases of the same item, Swiss German – which is an instance of language, modern language, and dialect, so it probably gets three boosts. So I guess this is perhaps a bit of a special case, and not necessarily representative of dialects dominating the search results in general.

Apart from German, I think most of the search results are actually acceptable even with the current weights (all 50). So perhaps we can leave it like that until Thursday, when we make the new profile available in the API on Wikidata? And in the meantime ask @dcausse: do you know if it would be possible to make these boosts not cumulative, so that Swiss German didn’t get such a high score from its redundant “instance of” statements? (Or would that be a bad idea in general?)
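
The difference between cumulative and non-cumulative boosts can be sketched in Python. This is only an illustration of the arithmetic, assuming each matching "instance of" class contributes its weight; the function names are hypothetical, and taking the maximum is just one possible way to make boosts non-cumulative, not necessarily what the eventual patch does. The weights of 50 and the three classes of Swiss German come from the comments above.

```python
def cumulative_boost(instance_of: list[str], weights: dict[str, float]) -> float:
    """Sum the weight of every matching class (boosts stack)."""
    return sum(weights.get(cls, 0.0) for cls in instance_of)


def non_cumulative_boost(instance_of: list[str], weights: dict[str, float]) -> float:
    """Use only the single highest matching weight (boosts do not stack)."""
    return max((weights.get(cls, 0.0) for cls in instance_of), default=0.0)


# Current weights as described above: all 50.
weights = {"language": 50.0, "modern language": 50.0, "dialect": 50.0}

swiss_german = ["language", "modern language", "dialect"]
single_class_item = ["language"]  # hypothetical Item with one matching class

print(cumulative_boost(swiss_german, weights), cumulative_boost(single_class_item, weights))
print(non_cumulative_boost(swiss_german, weights), non_cumulative_boost(single_class_item, weights))
```

With stacking, the three-class Item gets three times the boost of a single-class Item; with the maximum, both get the same boost and the text match decides the order.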

Change 817317 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/mediawiki-config@master] [WIP] Tune wikidata language selector autocomplete

https://gerrit.wikimedia.org/r/817317

@Lucas_Werkmeister_WMDE yes I think it's doable, attached a quick patch to demonstrate how

Change 817317 merged by jenkins-bot:

[operations/mediawiki-config@master] Tune the wikidata "language" profile for wbsearchentities

https://gerrit.wikimedia.org/r/817317

Mentioned in SAL (#wikimedia-operations) [2022-07-27T13:46:47Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817317|Tune the wikidata "language" profile for wbsearchentities (T307869)]] (1/2) (duration: 03m 29s)

Mentioned in SAL (#wikimedia-operations) [2022-07-27T13:50:21Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/SearchSettingsForWikidata.php: Config: [[gerrit:817317|Tune the wikidata "language" profile for wbsearchentities (T307869)]] (2/2) (duration: 03m 21s)

@Lucas_Werkmeister_WMDE yes I think it's doable, attached a quick patch to demonstrate how

Seems to be working very well, thanks for the quick response!

Change 806931 merged by jenkins-bot:

[operations/mediawiki-config@master] Configure wbsearchentities profile parameter on Wikidata

https://gerrit.wikimedia.org/r/806931

Mentioned in SAL (#wikimedia-operations) [2022-07-28T13:10:27Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806931|Configure wbsearchentities profile parameter on Wikidata (T307869)]] (duration: 03m 25s)
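
With the profile parameter configured on Wikidata, a client could presumably request language-boosted results along the following lines. This Python sketch only builds the request URL and does not send it. The profile value "language" is taken from the patch title above; the endpoint and the other parameter names follow the usual MediaWiki Action API form and should be checked against the live API help before use. The helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for Wikidata's Action API.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_language_search_url(search: str, language: str = "en") -> str:
    """Build a wbsearchentities URL that asks for the "language" profile."""
    params = {
        "action": "wbsearchentities",
        "search": search,
        "language": language,
        "type": "item",
        "profile": "language",  # profile name as given in the patch title
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(build_language_search_url("Engl"))
```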