
Remove all bolding of search results on a variety of wikis
Open, Medium, Public, 3 Estimated Story Points

Description

This task is a follow-up to T277256: Bangla letters are getting broken in the search box, where we reverted to the behavior of the old search. It was pointed out that the bolding mechanism was already faulty there as well. We will continue by removing all bolding from the search results on the wikis listed below.

Acceptance criteria

  • Remove bolding from all search results on the wikis listed below
  • This change applies to wvui search only (a follow-up task will be created to fix this in the old Vector search)

The List

  • Arabic: Arabic (ar), Moroccan Arabic (ary), Egyptian Arabic (arz), Sorani (ckb), Persian (fa), Gilaki (glk), Kashmiri[†] (ks), Mazanderani (mzn), Western Punjabi (pnb), Pashto (ps), Sindhi (sd), Saraiki (skr), Uyghur (ug), Urdu (ur)
  • Bengali: Assamese (as), Bangla (bn), Bishnupriya Manipuri (bpy)
  • Devanagari: Awadhi (awa), Bhojpuri (bh), Doteli (dty), Goan Konkani (gom), Hindi (hi), Kashmiri[†] (ks), Maithili (mai), Marathi (mr), Nepali (ne), Newari (new), Pali (pi), Sanskrit (sa)
  • Gujarati: Gujarati (gu)
  • Gurmukhi: Punjabi (pa)
  • Kannada: Kannada (kn), Tulu (tcy)
  • Khmer: Khmer (km)
  • Malayalam: Malayalam (ml)
  • Odia: Odia (or)
  • Sinhala: Sinhala (si)
  • Tamil: Tamil (ta)
  • Telugu: Telugu (te)

[†] Kashmiri is listed twice: it uses both Arabic and Devanagari.

Other Wikimedia wikis in these languages:

lang  id                url
bn    bdwikimedia       bd.wikimedia.org
bn    wbwikimedia       wb.wikimedia.org
hi    hiwikimedia       hi.wikimedia.org
mai   maiwikimedia      mai.wikimedia.org
pa    punjabiwikimedia  punjabi.wikimedia.org

Event Timeline

@TJones - do you know if we have a list of languages that the issues occur for? We should probably expand this task to cover as many as possible.

I don't know if we have a list, but I can come up with something. It may not be 100% complete, but it should be a good start.

That would be great, thank you!

Here's a list of scripts that I was able to verify have problems with conjuncts, ligatures, digraphs, etc.—plus the list of languages with wikis that use each script. Languages are listed with their code in parens.

I'm not sure if this will be configured by language, by script, or by wiki. If you need a list of wikis for a given language, see the Site Matrix, including the "Other Wikimedia Projects" section.

The List

  • Arabic: Arabic (ar), Moroccan Arabic (ary), Egyptian Arabic (arz), Sorani (ckb), Persian (fa), Gilaki (glk), Kashmiri[†] (ks), Mazanderani (mzn), Western Punjabi (pnb), Pashto (ps), Sindhi (sd), Saraiki (skr), Uyghur (ug), Urdu (ur)
  • Bengali: Assamese (as), Bangla (bn), Bishnupriya Manipuri (bpy)
  • Devanagari: Awadhi (awa), Bhojpuri (bh), Doteli (dty), Goan Konkani (gom), Hindi (hi), Kashmiri[†] (ks), Maithili (mai), Marathi (mr), Nepali (ne), Newari (new), Pali (pi), Sanskrit (sa)
  • Gujarati: Gujarati (gu)
  • Gurmukhi: Punjabi (pa)
  • Kannada: Kannada (kn), Tulu (tcy)
  • Khmer: Khmer (km)
  • Malayalam: Malayalam (ml)
  • Odia: Odia (or)
  • Sinhala: Sinhala (si)
  • Tamil: Tamil (ta)
  • Telugu: Telugu (te)

[†] Kashmiri is listed twice: it uses both Arabic and Devanagari.

Not Sure:

  • Javanese: Javanese (jv) (most pages in Latin, but a small number in Javanese)
  • Lontara: Buginese (bug) (most pages in Latin, not sure if Buginese supports bolding)
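Since it is still open whether this will be configured by language, by script, or by wiki, here is a minimal TypeScript sketch of the by-language option: the verified scripts above are kept as a map for easy auditing, then flattened into a set of language codes. The names `AFFECTED_SCRIPTS`, `buildLanguageSet`, `NO_BOLD_LANGUAGES`, and `shouldDisableBolding` are hypothetical, and the "Not Sure" languages (jv, bug) are deliberately left out.

```typescript
// Hypothetical encoding of the verified list: script name -> language codes of wikis using it.
const AFFECTED_SCRIPTS: Record<string, string[]> = {
  Arabic: ["ar", "ary", "arz", "ckb", "fa", "glk", "ks", "mzn", "pnb", "ps", "sd", "skr", "ug", "ur"],
  Bengali: ["as", "bn", "bpy"],
  Devanagari: ["awa", "bh", "dty", "gom", "hi", "ks", "mai", "mr", "ne", "new", "pi", "sa"],
  Gujarati: ["gu"],
  Gurmukhi: ["pa"],
  Kannada: ["kn", "tcy"],
  Khmer: ["km"],
  Malayalam: ["ml"],
  Odia: ["or"],
  Sinhala: ["si"],
  Tamil: ["ta"],
  Telugu: ["te"],
};

// Flatten the per-script lists into one set of language codes.
function buildLanguageSet(scripts: Record<string, string[]>): Set<string> {
  const codes = new Set<string>();
  for (const script of Object.keys(scripts)) {
    for (const code of scripts[script]) {
      // A Set ignores the second "ks" entry (Kashmiri uses both Arabic and Devanagari).
      codes.add(code);
    }
  }
  return codes;
}

const NO_BOLD_LANGUAGES: Set<string> = buildLanguageSet(AFFECTED_SCRIPTS);

// True if suggestions on a wiki with this content language should not be bolded.
function shouldDisableBolding(lang: string): boolean {
  return NO_BOLD_LANGUAGES.has(lang);
}
```

Keying the final check on language keeps it a single set lookup per wiki; the script grouping only exists so the data can be compared line by line against the list above.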

Let me know if you have any questions or need any other help!

ovasileva renamed this task from "Remove all bolding of search results on bnwiki" to "Remove all bolding of search results on a variety of wikis". (Thu, May 6, 10:33 AM)
ovasileva updated the task description.

Thank you!

I believe this can be achieved by adding a parameter to the wvui TypeaheadSearch component and then exposing it through a new configuration option in $wgVectorWvuiSearchOptions.
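A minimal TypeScript sketch of what such a parameter could control, assuming a boolean option (called `highlightQuery` here, a hypothetical name and not the actual wvui API) that decides whether the matched prefix of each suggestion is wrapped in `<strong>`. With the option off, the title is returned as plain escaped text, so no tag boundary can fall inside a conjunct. For example, in the Bangla conjunct ক্ষ (U+0995 U+09CD U+09B7), a query of ক্ would otherwise put the bold/plain split right before ষ.

```typescript
// Hypothetical render options; the real component's prop name may differ.
interface SuggestionRenderOptions {
  highlightQuery: boolean;
}

// Escape text so it can be safely inserted as HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a suggestion title as HTML. If highlighting is on and the title starts
// with the query (case-insensitively), the matched prefix is wrapped in <strong>.
// If highlighting is off, the whole title is emitted as one plain run of text.
function renderTitle(title: string, query: string, options: SuggestionRenderOptions): string {
  if (!options.highlightQuery || query.length === 0) {
    return escapeHtml(title);
  }
  const prefix = title.slice(0, query.length);
  if (prefix.toLowerCase() !== query.toLowerCase()) {
    return escapeHtml(title);
  }
  // This split is by UTF-16 code units, which is exactly what can cut a conjunct in two.
  return `<strong>${escapeHtml(prefix)}</strong>${escapeHtml(title.slice(query.length))}`;
}
```

How the flag would be read from $wgVectorWvuiSearchOptions and passed down to the component is left out of this sketch; the idea is only that the affected wikis set it to false.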

I went ahead and looked up all of the "Other" wikis in the Site Matrix whose language is in the list. These should have $wgVectorWvuiSearchOptions configured appropriately, too.

lang  id                url
bn    bdwikimedia       bd.wikimedia.org
bn    wbwikimedia       wb.wikimedia.org
hi    hiwikimedia       hi.wikimedia.org
mai   maiwikimedia      mai.wikimedia.org
pa    punjabiwikimedia  punjabi.wikimedia.org
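The `lang` column matters more than the hostname here: bd, wb, and punjabi are not language codes, so a setting keyed on the subdomain would only catch hi and mai. A minimal TypeScript sketch, assuming each wiki's content language is available as in the table above; `OtherWiki`, `wikisNeedingNoBold`, `subdomainOf`, and `wikisMatchedBySubdomain` are hypothetical helpers for illustration.

```typescript
// One row of the table above (hypothetical shape).
interface OtherWiki {
  lang: string; // content language code, as listed in the Site Matrix
  id: string; // wiki ID, as in the table
  url: string;
}

const OTHER_WIKIS: OtherWiki[] = [
  { lang: "bn", id: "bdwikimedia", url: "bd.wikimedia.org" },
  { lang: "bn", id: "wbwikimedia", url: "wb.wikimedia.org" },
  { lang: "hi", id: "hiwikimedia", url: "hi.wikimedia.org" },
  { lang: "mai", id: "maiwikimedia", url: "mai.wikimedia.org" },
  { lang: "pa", id: "punjabiwikimedia", url: "punjabi.wikimedia.org" },
];

// Select wikis by their declared content language.
function wikisNeedingNoBold(wikis: OtherWiki[], affected: Set<string>): string[] {
  return wikis.filter((w) => affected.has(w.lang)).map((w) => w.id);
}

// Naive alternative, shown for contrast: treat the first hostname label as the language.
function subdomainOf(url: string): string {
  return url.split(".")[0];
}

function wikisMatchedBySubdomain(wikis: OtherWiki[], affected: Set<string>): string[] {
  return wikis.filter((w) => affected.has(subdomainOf(w.url))).map((w) => w.id);
}

// Only the languages relevant to these five wikis; the full set is the list in the description.
const affected = new Set(["bn", "hi", "mai", "pa"]);
```

Matching by content language selects all five wikis, while matching by subdomain misses bdwikimedia, wbwikimedia, and punjabiwikimedia.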

ovasileva updated the task description.

Will split this out into a separate task for fixing this in the old Vector search.

ovasileva set the point value for this task to 3.