Remove all bolding of search results on a variety of wikis
Closed, Resolved
Public
3 Estimated Story Points

Description

This task is a follow-up to T277256 (Bangla letters are getting broken in the search box), where we reverted to the behavior of the old search. It was pointed out that the bolding mechanism in the old search was already faulty as well. We will therefore remove all bolding from the search results on the wikis listed below.

Acceptance criteria

  • Remove bolding from all search results on the wikis listed below
  • This change is for wvui search only (a follow-up task will be created to fix this for the old Vector search)

The List

  • Arabic: Arabic (ar), Moroccan Arabic (ary), Egyptian Arabic (arz), Sorani (ckb), Persian (fa), Gilaki (glk), Kashmiri[†] (ks), Mazanderani (mzn), Western Punjabi (pnb), Pashto (ps), Sindhi (sd), Saraiki (skr), Uyghur (ug), Urdu (ur)
  • Bengali: Assamese (as), Bangla (bn), Bishnupriya Manipuri (bpy)
  • Devanagari: Awadhi (awa), Bhojpuri (bh), Doteli (dty), Goan Konkani (gom), Hindi (hi), Kashmiri[†] (ks), Maithili (mai), Marathi (mr), Nepali (ne), Newari (new), Pali (pi), Sanskrit (sa)
  • Gujarati: Gujarati (gu)
  • Gurmukhi: Punjabi (pa)
  • Kannada: Kannada (kn), Tulu (tcy)
  • Khmer: Khmer (km)
  • Malayalam: Malayalam (ml)
  • Odia: Odia (or)
  • Sinhala: Sinhala (si)
  • Tamil: Tamil (ta)
  • Telugu: Telugu (te)

lang  id                url
bn    bdwikimedia       bd.wikimedia.org
bn    wbwikimedia       wb.wikimedia.org
hi    hiwikimedia       hi.wikimedia.org
mai   maiwikimedia      mai.wikimedia.org
pa    punjabiwikimedia  punjabi.wikimedia.org

QA Results - Beta

AC  Status  Details
1           T281797#7229953
2           T281797#7229953
3           T281797#7229953

QA Results - Prod

Event Timeline

@TJones - do you know if we have a list of languages that the issues occur for? We should probably expand this task to cover as many as possible.

I don't know if we have a list, but I can come up with something. It may not be 100% complete, but it should be a good start.

That would be great, thank you!

Here's a list of scripts that I was able to verify have problems with conjuncts, ligatures, digraphs, etc.—plus the list of languages with wikis that use each script. Languages are listed with their code in parens.

I'm not sure if this will be configured by language, by script, or by wiki. If you need a list of wikis for a given language, see the Site Matrix, including the "Other Wikimedia Projects" section.

The List

  • Arabic: Arabic (ar), Moroccan Arabic (ary), Egyptian Arabic (arz), Sorani (ckb), Persian (fa), Gilaki (glk), Kashmiri[†] (ks), Mazanderani (mzn), Western Punjabi (pnb), Pashto (ps), Sindhi (sd), Saraiki (skr), Uyghur (ug), Urdu (ur)
  • Bengali: Assamese (as), Bangla (bn), Bishnupriya Manipuri (bpy)
  • Devanagari: Awadhi (awa), Bhojpuri (bh), Doteli (dty), Goan Konkani (gom), Hindi (hi), Kashmiri[†] (ks), Maithili (mai), Marathi (mr), Nepali (ne), Newari (new), Pali (pi), Sanskrit (sa)
  • Gujarati: Gujarati (gu)
  • Gurmukhi: Punjabi (pa)
  • Kannada: Kannada (kn), Tulu (tcy)
  • Khmer: Khmer (km)
  • Malayalam: Malayalam (ml)
  • Odia: Odia (or)
  • Sinhala: Sinhala (si)
  • Tamil: Tamil (ta)
  • Telugu: Telugu (te)

[†] Kashmiri is listed twice: it uses both Arabic and Devanagari.

Not Sure:

  • Javanese: Javanese (jv) (most pages in Latin, but a small number in Javanese)
  • Lontara: Buginese (bug) (most pages in Latin, not sure if Buginese supports bolding)

Let me know if you have any questions or need any other help!

ovasileva renamed this task from Remove all bolding of search results on bnwiki to Remove all bolding of search results on a variety of wikis. May 6 2021, 10:33 AM
ovasileva updated the task description.
ovasileva updated the task description.

Thank you!

I believe this can be achieved by adding a parameter to the wvui TypeaheadSearch component and then adding a new configuration option to $wgVectorWvuiSearchOptions
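
The approach proposed above could look roughly like the following TypeScript sketch. The names `SearchOptions`, `TypeaheadProps`, `toProps`, `titleSegments`, and the `highlightQuery` flag are assumptions made for illustration; they are not taken from the wvui or Vector source.

```typescript
// Hypothetical option bag standing in for $wgVectorWvuiSearchOptions.
interface SearchOptions {
  highlightQuery?: boolean;
}

// Hypothetical props for a typeahead search component.
interface TypeaheadProps {
  highlightQuery: boolean;
}

interface Segment {
  text: string;
  bold: boolean;
}

// Map the skin-level options onto component props; highlighting stays on
// unless a wiki explicitly turns it off.
function toProps(options: SearchOptions): TypeaheadProps {
  return { highlightQuery: options.highlightQuery ?? true };
}

// Split a suggestion title into plain and bold segments. With highlighting
// disabled, the whole title comes back as one plain segment.
function titleSegments(title: string, query: string, props: TypeaheadProps): Segment[] {
  const start = props.highlightQuery && query !== '' ? title.indexOf(query) : -1;
  if (start === -1) {
    return [{ text: title, bold: false }];
  }
  const end = start + query.length;
  return [
    { text: title.slice(0, start), bold: false },
    { text: title.slice(start, end), bold: true },
    { text: title.slice(end), bold: false },
  ].filter((segment) => segment.text !== '');
}
```

Defaulting the flag to `true` in this sketch means wikis that do not set the option keep their current bolding, and only the listed wikis need a configuration change.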

I went ahead and looked up all of the "Other" wikis that are specified as being in the languages in the list. These should have $wgVectorWvuiSearchOptions configured appropriately, too.

lang  id                url
bn    bdwikimedia       bd.wikimedia.org
bn    wbwikimedia       wb.wikimedia.org
hi    hiwikimedia       hi.wikimedia.org
mai   maiwikimedia      mai.wikimedia.org
pa    punjabiwikimedia  punjabi.wikimedia.org
ovasileva updated the task description.

Will split and make a separate task for fixing this within the old vector search

ovasileva set the point value for this task to 3.

Change 694392 had a related patch set uploaded (by Phuedx; author: Phuedx):

[wvui@master] Optionally disable query match highlighting

https://gerrit.wikimedia.org/r/694392

Change 694393 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/skins/Vector@master] skin: Add option to disable highlighting query

https://gerrit.wikimedia.org/r/694393

Change 695229 had a related patch set uploaded (by Phuedx; author: Phuedx):

[wvui@master] Partially revert "[typeahead-suggestion-title] Preserve graphemes during splitting"

https://gerrit.wikimedia.org/r/695229

This code is still useful on wikis that have highlighting/bolding enabled. For example, on French Wikipedia you can search for नेप and get partial matching on नेपाली; without this code it will be broken into two parts: नेप/ ाली. It works a lot of the time, and even when it doesn't preserve graphemes correctly it generally prevents partial highlights with combining characters from looking broken.
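
The behavior described above can be illustrated with a minimal TypeScript sketch. It is not the wvui implementation: `splitTitle`, `TitleChunks`, and `LEADING_MARKS` are hypothetical names, matching is case-sensitive for brevity, and the only rule applied is that a match is extended over any combining marks that immediately follow it.

```typescript
interface TitleChunks {
  before: string;
  match: string;
  after: string;
}

// Built with the RegExp constructor so the Unicode property escape is only
// evaluated at runtime; \p{M} matches any combining mark.
const LEADING_MARKS = new RegExp('^\\p{M}+', 'u');

// Split a title around the first occurrence of the query, then grow the
// match so that it never ends just before a combining mark.
function splitTitle(title: string, query: string): TitleChunks {
  const start = query === '' ? -1 : title.indexOf(query);
  if (start === -1) {
    return { before: title, match: '', after: '' };
  }
  let end = start + query.length;
  const marks = LEADING_MARKS.exec(title.slice(end));
  if (marks !== null) {
    end += marks[0].length;
  }
  return {
    before: title.slice(0, start),
    match: title.slice(start, end),
    after: title.slice(end),
  };
}
```

For the title नेपाली and the query नेप, this sketch returns the match नेपा and the remainder ली, so the vowel sign stays attached to its consonant. It does not handle conjuncts or a match that starts in the middle of a cluster, which is consistent with the caveat that graphemes are not always preserved correctly.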

Change 695229 abandoned by Phuedx:

[wvui@master] Partially revert "[typeahead-suggestion-title] Preserve graphemes during splitting"

Reason:

Per Trey Jones' comment above (thanks!).

https://gerrit.wikimedia.org/r/695229

Change 694392 merged by jenkins-bot:

[wvui@master] Optionally disable query match highlighting

https://gerrit.wikimedia.org/r/694392

Jdlrobson updated Other Assignee, added: cjming.
Jdlrobson updated Other Assignee, added: phuedx; removed: cjming.

This is blocked on the next WVUI release /cc @Volker_E

Change 697700 had a related patch set uploaded (by Catrope; author: Catrope):

[mediawiki/core@master] Update wvui to 0.2.0

https://gerrit.wikimedia.org/r/697700

Change 697700 merged by jenkins-bot:

[mediawiki/core@master] Update wvui to 0.2.0

https://gerrit.wikimedia.org/r/697700

Change 694393 merged by jenkins-bot:

[mediawiki/skins/Vector@master] search: Add option to disable highlighting query

https://gerrit.wikimedia.org/r/694393

Test Result - Beta|Prod

Status: ❌ Fail
Environment: various
OS: macOS Big Sur
Browser: Chrome
Device: MBP
Emulated Device: NA

Test Artifact(s):

QA Steps

Remove bolding from all search results on the list of wikis listed below

Note: This list is a sub-set of the languages called out in this task description. Only the wikis with a beta version are included.

❌ AC1: Arabic: Arabic (ar):

Screen Shot 2021-06-09 at 11.17.36 PM.png (679×605 px, 77 KB)

❌ Persian (fa):

Screen Shot 2021-06-09 at 11.31.14 PM.png (281×533 px, 23 KB)

❌ Hindi (hi):

Screen Shot 2021-06-09 at 11.34.03 PM.png (126×673 px, 12 KB)

@Jdlrobson I'm not sure if this is in Beta.

@phuedx I think we need to backport a config change here, right?

Change 699416 had a related patch set uploaded (by Phuedx; author: Phuedx):

[operations/mediawiki-config@master] WIP: vector: Disable highlighting query in search autocomplete

https://gerrit.wikimedia.org/r/699416

@Jdforrester-WMF: Would you be able to do a concept review of https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/699416/, which attempts to tag the relevant wikis (see the task description for the list). Specifically, is it necessary to create another file, disable_search_autocomplete_highlights.yaml say, simply to add a wikiTag line rather than add that line directly to the wiki config files?

My apologies if you're not the correct person to ping about this.

@Jdforrester-WMF: Would you be able to do a concept review of https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/699416/, which attempts to tag the relevant wikis (see the task description for the list).

Hey there,

If this is required to be done on a per-wiki basis rather than a per-language basis, then sadly this is roughly the correct approach, yes. (We're trying to reduce the number of dblists, as each one is loaded on each PHP request, so they get quite expensive.)

More simply, you could create a top-level disable-search-autocomplete-highlighting YAML file with some documentation explaining the decision path that should be followed to add or remove wikis in future and extend the relevant wikis' inheritsFrom list, rather than adding wikiTag manually.

I'd have thought this was a feature you'd want to flag on a language basis, but I defer to the team. :-)

Specifically, is it necessary to create another file, disable_search_autocomplete_highlights.yaml say, simply to add a wikiTag line rather than add that line directly to the wiki config files?

Yes, the YAML files aren't (yet) read in production, they're just the mechanism by which the list is generated. The plan back when I was working on this was to read the YAML files directly, but this change-over hasn't happened yet.

My apologies if you're not the correct person to ping about this.

It's not my thing any more, no, but happy to help out. More generally this is a Release Engineering concern.

Change 699416 abandoned by Phuedx:

[operations/mediawiki-config@master] WIP: vector: Disable highlighting query in search autocomplete

Reason:

Incorrect approach.

https://gerrit.wikimedia.org/r/699416

I'd have thought this was a feature you'd want to flag on a language basis, but I defer to the team. :-)

Hah! Good point.

Change 702373 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/skins/Vector@master] search: Disable query highlight for some languages

https://gerrit.wikimedia.org/r/702373

Change 702373 merged by jenkins-bot:

[mediawiki/skins/Vector@master] search: Disable query highlight for some languages

https://gerrit.wikimedia.org/r/702373

Test Result - Beta

Status: ✅ Pass
Environment: beta
OS: macOS Big Sur
Browser: Chrome
Device: MBP
Emulated Device: NA

Test Artifact(s):

QA Steps

Remove bolding from all search results on the list of wikis listed below

✅ AC1: Arabic: Arabic (ar):

Screen Shot 2021-07-22 at 7.23.12 AM.png (1×1 px, 327 KB)

✅ Persian (fa):

Screen Shot 2021-07-22 at 7.24.33 AM.png (1×1 px, 321 KB)

✅ Hindi (hi):

Screen Shot 2021-07-22 at 7.27.17 AM.png (1×1 px, 588 KB)

Note: @ovasileva This list is a sub-set of the languages called out in this task description. Only the wikis with a beta version are included. I also assumed that this didn't include the "Search for pages containing text" option.

Looks good!

Test Result - Prod

Status: ❌ Fail
Environment: various, see below.
OS: macOS Big Sur
Browser: Chrome
Device: MBP
Emulated Device: NA

Test Artifact(s):

QA Steps

Remove bolding from all search results on the list of wikis listed below

✅ AC1: Arabic: Arabic (ar), Moroccan Arabic (ary), Egyptian Arabic (arz), Sorani (ckb), Persian (fa), Gilaki (glk), Kashmiri[†] (ks), Mazanderani (mzn), Western Punjabi (pnb), Pashto (ps), Sindhi (sd), Saraiki (skr), Uyghur (ug), Urdu (ur)
✅ AC2: Bengali: Assamese (as), Bangla (bn), Bishnupriya Manipuri (bpy)
❌ AC3: Devanagari: Awadhi (awa), Bhojpuri (bh), Doteli (dty), Goan Konkani (gom), Hindi (hi), Kashmiri[†] (ks), Maithili (mai), Marathi (mr), Nepali (ne), Newari (new), Pali (pi), Sanskrit (sa)

wiki           screenshot
Bhojpuri (bh)  Screen Shot 2021-08-09 at 6.33.13 PM.png (680×777 px, 163 KB)

✅ AC4: Gujarati: Gujarati (gu)
✅ AC5: Gurmukhi: Punjabi (pa)
✅ AC6: Kannada: Kannada (kn), Tulu (tcy)
✅ AC6: Khmer: Khmer (km)
✅ AC7: Malayalam: Malayalam (ml)
✅ AC8: Odia: Odia (or)
✅ AC9: Sinhala: Sinhala (si)
✅ AC10: Tamil: Tamil (ta)
✅ AC11: Telugu: Telugu (te)
✅ AC12: bn bdwikimedia bd.wikimedia.org
✅ AC13: bn wbwikimedia wb.wikimedia.org
✅ AC14: hi hiwikimedia hi.wikimedia.org
✅ AC15: mai maiwikimedia mai.wikimedia.org
❓ AC16: pa punjabiwikimedia punjabi.wikimedia.org
I couldn't find any pages in this wiki to test the search results.

@ovasileva Please take a look at AC3 (Bhojpuri (bh)) and AC16 (Punjabi) above and let me know if AC3 needs more work and any suggestions for AC16.

Nice catch, @Edtadros. TIL that the code for the default interface language on https://bh.wikipedia.org is "bho" 🙂

Seems bhwiki is the expected wiki, but I'm seeing bolding as expected. Resolving

Screen Shot 2021-08-13 at 11.11.53 AM.png (1×1 px, 432 KB)