
TJones (Trey Jones)
Sr. Computational Linguist, Search Platform Team

User Details

User Since
Jul 8 2015, 3:02 PM (310 w, 4 d)
Availability
Available
IRC Nick
Trey314159
LDAP User
Tjones
MediaWiki User
TJones (WMF)

I would have written a shorter comment, but I did not have the time.

I'm part of the Search Platform team and I spend my time working on search & relevance, trying to better support search in various languages, analyzing queries, and doing random mathy things. I tend to write long, detailed notes about my investigations (so as to improve the bus number of my work).

When I have to work on _GitHub,_ /‍‍/Phab,/‍‍/ and ''MediaWiki'' all on the same day, I sometimes suffer Severe Markup Incongruence Fatigue.

I � Unicode.

Recent Activity

Wed, Jun 16

TJones moved T280601: Reindex Commons and Wikidata on eqiad and cloudelastic from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Wed, Jun 16, 3:00 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Discovery-Search (Current work)
TJones updated the task description for T280601: Reindex Commons and Wikidata on eqiad and cloudelastic.
Wed, Jun 16, 2:59 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Discovery-Search (Current work)
TJones moved T280184: Enable reindexing the Commons "File" index in Cloudelastic by default from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Wed, Jun 16, 2:57 PM · MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), Discovery-Search (Current work)
TJones updated the task description for T280601: Reindex Commons and Wikidata on eqiad and cloudelastic.
Wed, Jun 16, 1:36 AM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Discovery-Search (Current work)

Tue, Jun 15

TJones added a comment to T280601: Reindex Commons and Wikidata on eqiad and cloudelastic.

Yeah... I just wasn't thinking about it. I have a tiny patch for T280184 that turns that fatal error into an output message, so it can continue on to the File index under normal operation.

Tue, Jun 15, 7:22 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Discovery-Search (Current work)
TJones moved T280184: Enable reindexing the Commons "File" index in Cloudelastic by default from In Progress to Needs review on the Discovery-Search (Current work) board.
Tue, Jun 15, 6:48 PM · MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), Discovery-Search (Current work)
TJones moved T280184: Enable reindexing the Commons "File" index in Cloudelastic by default from Incoming to In Progress on the Discovery-Search (Current work) board.
Tue, Jun 15, 6:38 PM · MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), Discovery-Search (Current work)
TJones edited projects for T280184: Enable reindexing the Commons "File" index in Cloudelastic by default, added: Discovery-Search (Current work); removed Discovery-Search.
Tue, Jun 15, 6:38 PM · MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), Discovery-Search (Current work)
TJones claimed T280184: Enable reindexing the Commons "File" index in Cloudelastic by default.

I'm running into this again, so I'm going to try to go ahead and fix it.

Tue, Jun 15, 6:38 PM · MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), Discovery-Search (Current work)
TJones updated the task description for T280601: Reindex Commons and Wikidata on eqiad and cloudelastic.
Tue, Jun 15, 4:42 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Discovery-Search (Current work)
TJones added a comment to T280601: Reindex Commons and Wikidata on eqiad and cloudelastic.

The Cloudelastic reindex of commonswiki finished without an explicit error, but died when it tried to create an archive index—so it didn't get to the file index.

Tue, Jun 15, 2:59 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Discovery-Search (Current work)
TJones updated the task description for T280601: Reindex Commons and Wikidata on eqiad and cloudelastic.
Tue, Jun 15, 2:58 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Discovery-Search (Current work)

Mon, Jun 14

TJones added a comment to T189511: Locally override the name of crh from "Crimean Turkish" to "Crimean Tatar".

We've been waiting for three years for a change from CLDR or clear feedback on this ticket, so I've decided to be bold and make the change in LocalNamesEn.php. In addition to being more correct (and making the speaker community happier), it is also more consistent, since crh-cyrl and crh-latn already have local English names of "Crimean Tatar (<x> script)".

Mon, Jun 14, 7:01 PM · MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), Language codes, Upstream, MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), WikimediaMessages, MediaWiki-extensions-CLDR
TJones claimed T280601: Reindex Commons and Wikidata on eqiad and cloudelastic.
Mon, Jun 14, 5:20 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Discovery-Search (Current work)
TJones moved T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Jun 14, 3:34 PM · Discovery-Search (Current work)
TJones moved T277213: Eliminate old M2 suggestions with improper tokenization from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Jun 14, 3:28 PM · Discovery-Search (Current work), Chinese-Sites

Wed, Jun 9

TJones placed T284691: Reindex Basque, Catalan, Danish Wikis up for grabs.
Wed, Jun 9, 7:22 PM · Discovery-Search (Current work)
TJones edited projects for T284691: Reindex Basque, Catalan, Danish Wikis, added: Discovery-Search; removed Discovery-Search (Current work).
Wed, Jun 9, 7:22 PM · Discovery-Search (Current work)
TJones updated the task description for T284691: Reindex Basque, Catalan, Danish Wikis.
Wed, Jun 9, 7:22 PM · Discovery-Search (Current work)
TJones updated the task description for T284185: Reindex German, Dutch, and Portugese Wikis.
Wed, Jun 9, 7:21 PM · Discovery-Search (Current work)
TJones updated the task description for T272606: [EPIC] Unpack all Elasticsearch analyzers.
Wed, Jun 9, 7:19 PM · Discovery-Search (Current work)
TJones moved T283366: Unpack Basque, Catalan, Danish Elasticsearch Analyzers from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Wed, Jun 9, 7:18 PM · Discovery-Search (Current work)
TJones set the point value for T284691: Reindex Basque, Catalan, Danish Wikis to 3.
Wed, Jun 9, 7:18 PM · Discovery-Search (Current work)
TJones created T284691: Reindex Basque, Catalan, Danish Wikis.
Wed, Jun 9, 7:17 PM · Discovery-Search (Current work)
TJones updated the task description for T284185: Reindex German, Dutch, and Portugese Wikis.
Wed, Jun 9, 6:58 PM · Discovery-Search (Current work)
TJones moved T283366: Unpack Basque, Catalan, Danish Elasticsearch Analyzers from In Progress to Needs review on the Discovery-Search (Current work) board.
Wed, Jun 9, 3:51 PM · Discovery-Search (Current work)

Tue, Jun 8

TJones added a comment to T283366: Unpack Basque, Catalan, Danish Elasticsearch Analyzers.

Basque, Catalan, and Danish Notes

  • Usual 10K sample over a 1–4 week period from Wikipedia and Wiktionary for each language.
  • Usual distribution of tokens—lots of CJK one-character tokens; long tokens are URLs, \u encoded tokens, file names, numbers, etc.
  • Stemming observations:
    • Catalan Wikipedia had up to 180(!) distinct tokens in stemming groups.
    • Basque Wikipedia had up to 200(!!) distinct tokens in stemming groups.
    • Danish Wikipedia had a mere 30 distinct tokens in its largest stemming group.
  • Unpacking was uneventful (disabled homoglyph and ICU normalization upgrades).
    • Note that word_break_helper is no longer configured. However, it doesn't do anything with a monolithic analyzer, so there is no change in functionality.
  • Enabled homoglyphs and found a handful of examples in all six samples.
    • Catalan Wikipedia had two mixed–Cyrillic/Greek/Latin tokens!
    • Found Greek/Latin examples in all three Wikipedias and Danish Wiktionary, and Greek/Cyrillic in Catalan Wikipedia.
  • Enabled ICU normalization and saw the usual normalizations.
    • The expected regression: Dotted I (İ) is lowercased as i̇ — fixed with a char_filter map
    • Most common normalizations: lots of ß and invisibles (soft-hyphen, bidi marks, etc.) all around; 1ª, 1º for Basque and Catalan Wikipedias, and some full-width characters for Catalan Wikipedia.
    • Catalan Wikipedia also loses a lot (12K+ out of 4.1M) of "E⎵" and "O⎵" tokens, where ⎵ represents a "zero-width no-break space" (U+FEFF). "e" and "o" are stop words—"o" means "or", but "e" just seems to refer to the letter; weird. The versions with U+FEFF seem to be used exclusively in coordinates ("E" stands for "est", which is "east"; "O" stands for "oest", which is "west"). Since the coords are very exact (e.g., "42.176388888889°N,3.0416666666667°E"), I don't think many people are searching for them specifically, and if they are, the plain field will help them out.
  • Enabled custom ICU folding for each language, saw lots of the usual folding effects.
    • Exempted [ñ] for Basque and [æ, ø, å] for Danish. [ç] was unclear for Basque and Catalan, but I let it be folded to c for both for the first pass.
    • ˈstressˌmarks, ɪᴘᴀ ɕɦɑʀɐƈʈɛʁʂ, and dìáçrïťɨčãł marks were normalized all around.
    • Basque: ç → c is not 100% clear in all cases, but seems to be overall beneficial.
    • Catalan Wiktionary: ç → c is not 100% clear in all cases, but seems to be overall beneficial.
    • Catalan Wikipedia:
      • Lots of high-impact collisions (ten or more distinct words merged into another group—often two largish groups merging). They came in three flavors:
        • The majority are ç → c; most look ok
        • A few ñ → n; these look good; mostly low frequency Spanish cognates merging with Catalan ones
        • Single letters merging with diacritical variants, like [eː, e̞, e͂, ê, ē, Ĕ, ɛ, ẹ, ẽ, ẽː] merging with [È, É, è, é]
      • Surprisingly, lots of Japanese Katakana changes, deleting the prolonged sound mark ー.
    • Danish: Also straightened a fair number of curly quotes.
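
The folding-with-exemptions idea in the notes above can be sketched in Python. This is only an approximation under stated assumptions, not the ICU folding filter itself: it strips combining marks after NFD decomposition, so letters with no decomposition (such as æ and ø) pass through untouched here, whereas real ICU folding would also fold them. The function name `fold` and its `exempt` parameter are hypothetical.

```python
import unicodedata


def fold(text, exempt=""):
    """Strip diacritics from text, leaving exempted letters alone.

    Hypothetical helper: approximates diacritic folding with NFD
    decomposition; it is not the ICU folding token filter.
    """
    # Exempt both the given letters and their uppercase forms.
    keep = set(exempt) | set(exempt.upper())
    # Compose first so an exempt letter typed as base + mark is recognized.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in keep:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        # Drop nonspacing marks (category Mn), keep everything else.
        out.append("".join(c for c in decomposed
                           if unicodedata.category(c) != "Mn"))
    return "".join(out)


print(fold("Barça"))                  # ç folded to c
print(fold("año", exempt="ñ"))        # ñ exempted (as for Basque)
print(fold("blåbær", exempt="æøå"))   # Danish letters exempted
```

With ñ exempted, "año" is left alone while "Barça" still folds to "Barca", which mirrors the first-pass decision to let ç fold to c.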
Tue, Jun 8, 10:05 PM · Discovery-Search (Current work)
TJones added a subtask for T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers: T284185: Reindex German, Dutch, and Portugese Wikis.
Tue, Jun 8, 3:45 PM · Discovery-Search (Current work)
TJones added a parent task for T284185: Reindex German, Dutch, and Portugese Wikis: T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers.
Tue, Jun 8, 3:45 PM · Discovery-Search (Current work)
TJones updated the task description for T272606: [EPIC] Unpack all Elasticsearch analyzers.
Tue, Jun 8, 3:43 PM · Discovery-Search (Current work)
TJones edited projects for T284578: Unpack Czech, Finnish, Galician Elasticsearch Analyzers, added: Discovery-Search; removed Discovery-Search (Current work).
Tue, Jun 8, 3:41 PM · Discovery-Search (Current work)
TJones created T284578: Unpack Czech, Finnish, Galician Elasticsearch Analyzers.
Tue, Jun 8, 3:41 PM · Discovery-Search (Current work)
TJones added a comment to T219550: [EPIC] Harmonize language analysis across languages.

Is this solely Wikipedias or will it be all WMF wikis? #AskingForAllTheNonWikipedias

Tue, Jun 8, 3:28 PM · Discovery-Search (Current work), Epic

Mon, Jun 7

TJones placed T280601: Reindex Commons and Wikidata on eqiad and cloudelastic up for grabs.

Removing Erik as the assignee because he worked on the code to improve reindexing (thanks!) but we still need to do the reindexing for these specific wikis, and anyone can pick up the task.

Mon, Jun 7, 4:02 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Discovery-Search (Current work)
TJones renamed T281359: Onboard teams with Grafana alerts to AlertManager from Onboard teams with Grafana alerts to AM to Onboard teams with Grafana alerts to AlertManager.
Mon, Jun 7, 3:44 PM · Wikidata, Wikidata-Query-Service, User-fgiunchedi, observability
TJones renamed T281454: Onboard teams with Prometheus-based alerts to AlertManager from Onboard teams with Prometheus-based alerts to AM to Onboard teams with Prometheus-based alerts to AlertManager.
Mon, Jun 7, 3:43 PM · Wikidata, Wikidata-Query-Service, User-fgiunchedi, observability
TJones added a comment to T11519: Search by page size.

insource:/.{3000}/ is a very expensive query. Don't use it without including a regular keyword of some sort to limit the number of articles to scan with the regex.
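
A minimal sketch of that advice, in Python: build the search string so that a regex is never sent without a plain keyword in front of it. The helper name `build_insource_query` and the example keyword are hypothetical; only the `insource:/.../` syntax comes from the comment above.

```python
def build_insource_query(keyword, regex):
    """Combine a plain keyword with an insource regex.

    Hypothetical helper: refuses to build a regex-only query, since the
    keyword is what limits how many pages the regex has to scan.
    """
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("add a regular keyword to limit the regex scan")
    return f"{keyword} insource:/{regex}/"


# "lighthouse" is an arbitrary example keyword.
print(build_insource_query("lighthouse", ".{3000}"))
```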

Mon, Jun 7, 3:18 PM · Discovery-Search, Discovery, CirrusSearch

Thu, Jun 3

TJones added a comment to T219550: [EPIC] Harmonize language analysis across languages.

Is this something we should report in Tech News, in that it will have some small effect on search results? Or is the user-facing effect too minimal and the benefits will mainly be seen on the backend side?

Thu, Jun 3, 10:15 PM · Discovery-Search (Current work), Epic

Wed, Jun 2

TJones moved T226812: de.wikipedia: search for "Bedusz" does not find "Będusz" from Language Stuff to needs triage on the Discovery-Search board.

After T281379 is deployed and T284185 is complete, recheck this ticket. I believe it should be fixed.

Wed, Jun 2, 8:04 PM · CirrusSearch, Discovery-Search
TJones moved T104814: Appropriately ignore diacritics for German-language wikis from Language Stuff to needs triage on the Discovery-Search board.

After T281379 is deployed and T284185 is complete, recheck this ticket. I believe it should be fixed.

Wed, Jun 2, 8:03 PM · Discovery-Search, Discovery, CirrusSearch
TJones updated the task description for T147505: [tracking] CirrusSearch: what is updated during re-indexing.
Wed, Jun 2, 8:01 PM · Tracking-Neverending, Epic, Discovery-Search (Current work), Discovery
TJones created T284185: Reindex German, Dutch, and Portugese Wikis.
Wed, Jun 2, 7:59 PM · Discovery-Search (Current work)

Thu, May 27

TJones removed a project from T87136: ~"daß" should not match "dass": Patch-For-Review.
Thu, May 27, 7:42 PM · Discovery-Search (Current work), Discovery, CirrusSearch
TJones moved T87136: ~"daß" should not match "dass" from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Thu, May 27, 7:32 PM · Discovery-Search (Current work), Discovery, CirrusSearch
TJones moved T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Thu, May 27, 7:32 PM · Discovery-Search (Current work)

Wed, May 26

TJones added a comment to T281797: Remove all bolding of search results on a variety of wikis.

Change 695229 had a related patch set uploaded (by Phuedx; author: Phuedx):

[wvui@master] Partially revert "[typeahead-suggestion-title] Preserve graphemes during splitting"

https://gerrit.wikimedia.org/r/695229

Wed, May 26, 3:02 PM · Patch-For-Review, MW-1.37-notes (1.37.0-wmf.9; 2021-06-07), Vector, Readers-Web-Backlog (Kanbanana-FY-2020-21), WVUI, Vue.js (Vue.js Search Experience (Vector modern)), Bengali-Sites, Desktop Improvements

Tue, May 25

TJones claimed T283366: Unpack Basque, Catalan, Danish Elasticsearch Analyzers.
Tue, May 25, 5:24 PM · Discovery-Search (Current work)

Mon, May 24

TJones moved T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

Unpacking + ICU Norm + ICU Folding Impact on Spanish Wikipedia

Mon, May 24, 10:55 PM · Discovery-Search (Current work)
TJones closed T282572: [Session] Let's share our Search challenges as Resolved.

Good discussion and excellent notes! Thanks!

Mon, May 24, 2:53 PM · Wikimedia-Hackathon-2021

Sat, May 22

TJones added a comment to T282572: [Session] Let's share our Search challenges.

Thanks for the offer, @Bmueller! We seem to have done okay; a few other people in the meeting jumped in to help keep track of questions, too. We had a big group—it was great!

Sat, May 22, 2:16 PM · Wikimedia-Hackathon-2021
TJones moved T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis from Needs Reporting to In Progress on the Discovery-Search (Current work) board.

Eh, I'm moving this back to in progress. The reindex is done but there's still a little analysis to do.

Sat, May 22, 6:15 AM · Discovery-Search (Current work)
TJones moved T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Sat, May 22, 6:13 AM · Discovery-Search (Current work)

Fri, May 21

TJones claimed T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis.
Fri, May 21, 11:31 PM · Discovery-Search (Current work)
TJones moved T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Fri, May 21, 11:31 PM · Discovery-Search (Current work)

TJones moved T283366: Unpack Basque, Catalan, Danish Elasticsearch Analyzers from Incoming to Ready for Development on the Discovery-Search (Current work) board.
May 21 2021, 3:32 PM · Discovery-Search (Current work)
TJones renamed T283366: Unpack Basque, Catalan, Danish Elasticsearch Analyzers from Unpack Basque, Catalan, Danish to Unpack Basque, Catalan, Danish Elasticsearch Analyzers.
May 21 2021, 3:31 PM · Discovery-Search (Current work)
TJones edited projects for T283366: Unpack Basque, Catalan, Danish Elasticsearch Analyzers, added: Discovery-Search; removed Discovery-Search (Current work).
May 21 2021, 3:30 PM · Discovery-Search (Current work)
TJones removed a project from T272606: [EPIC] Unpack all Elasticsearch analyzers: MW-1.36-notes (1.36.0-wmf.36; 2021-03-23).
May 21 2021, 3:29 PM · Discovery-Search (Current work)
TJones updated the task description for T272606: [EPIC] Unpack all Elasticsearch analyzers.
May 21 2021, 3:29 PM · Discovery-Search (Current work)
TJones created T283366: Unpack Basque, Catalan, Danish Elasticsearch Analyzers.
May 21 2021, 3:27 PM · Discovery-Search (Current work)

May 19 2021

TJones added a comment to T90875: Convert tests/phpunit/phpunit.php entrypoint to plain PHPUnit with bootstrap file.

As requested in the email on wikitech-l:

May 19 2021, 10:43 PM · MW-1.37-notes (1.37.0-wmf.9; 2021-06-07), Performance-Team (Radar), Patch-For-Review, User-kostajh, Code-Health-Metrics, Technical-Debt, MediaWiki-Core-Tests

May 18 2021

TJones moved T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers from In Progress to Needs review on the Discovery-Search (Current work) board.
May 18 2021, 9:16 PM · Discovery-Search (Current work)
TJones moved T87136: ~"daß" should not match "dass" from Incoming to Needs review on the Discovery-Search (Current work) board.
May 18 2021, 9:16 PM · Discovery-Search (Current work), Discovery, CirrusSearch
TJones edited projects for T87136: ~"daß" should not match "dass", added: Discovery-Search (Current work); removed Discovery-Search.
May 18 2021, 9:15 PM · Discovery-Search (Current work), Discovery, CirrusSearch
TJones claimed T87136: ~"daß" should not match "dass".
May 18 2021, 9:15 PM · Discovery-Search (Current work), Discovery, CirrusSearch
TJones added a comment to T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers.
  • Usual 10K sample each from Wikipedia and Wiktionary for each language
  • Unpacking was uneventful (disabled homoglyph and ICU normalization upgrades)
  • Note that word_break_helper is no longer configured. However, it doesn't do anything with a monolithic analyzer, so there is no change in functionality.
  • Enabled homoglyphs and found a few examples in all three Wiktionary samples and the Portuguese Wikipedia sample.
  • Enabled ICU normalization and saw the usual normalization in most cases (but see German Notes below)
    • The expected regression: Dotted I (İ) is lowercased as i̇ — fixed with a char_filter map
    • German required customization to maintain ß for stopword processing.
  • Enabled custom ICU folding for each language, saw lots of the usual folding effects.
    • Most impactful ICU folding for all three Wikipedias (and Portuguese Wiktionary) is converting curly apostrophes to straight apostrophes so that (mostly French and some English) words match either way: d'Europe vs d’Europe or Don’t vs Don't.
    • Most common ICU folding for the other two Wiktionaries is removing middle dots from syllabified versions of words: Xe·no·kra·tie vs Xenokratie or qua·dra·fo·ni·scher vs quadrafonischer. (Portuguese uses periods for syllabification, so they remain.)
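
The dotted I regression and the two folding effects above can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not the Elasticsearch configuration: `CHAR_FILTER_MAP`, `FOLD_MAP`, and `analyze` are hypothetical names, and the two maps only imitate a mapping char filter (applied before lowercasing) and a tiny slice of ICU folding (applied after).

```python
# Applied before lowercasing, like a mapping char filter.
CHAR_FILTER_MAP = {"\u0130": "I"}   # dotted capital I (İ) -> plain I

# Applied after lowercasing; a tiny stand-in for ICU folding.
FOLD_MAP = {
    "\u2019": "'",   # curly apostrophe -> straight apostrophe
    "\u00b7": "",    # middle dot used for syllabification -> removed
}


def analyze(token):
    """Hypothetical analysis chain: char map, lowercase, then fold."""
    for src, dst in CHAR_FILTER_MAP.items():
        token = token.replace(src, dst)
    token = token.lower()
    for src, dst in FOLD_MAP.items():
        token = token.replace(src, dst)
    return token


# Without the char map, İ lowercases to "i" plus a combining dot above.
print(len("\u0130".lower()))        # 2 code points, not 1
print(analyze("\u0130stanbul"))     # plain "istanbul"
print(analyze("d\u2019Europe") == analyze("d'Europe"))
print(analyze("Xe\u00b7no\u00b7kra\u00b7tie"))
```

Mapping İ before lowercasing avoids the stray combining dot, and the fold map makes the curly and straight apostrophe variants index to the same token.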
May 18 2021, 8:56 PM · Discovery-Search (Current work)
TJones added a comment to T87136: ~"daß" should not match "dass".

This is getting fixed as a side effect of unpacking the German analyzer in T281379.

May 18 2021, 8:24 PM · Discovery-Search (Current work), Discovery, CirrusSearch
TJones added a comment to T282572: [Session] Let's share our Search challenges.

I've scheduled us for Saturday at 13:00 UTC.

May 18 2021, 2:49 PM · Wikimedia-Hackathon-2021

May 17 2021

TJones reopened T127003: Inter language script detection in search as "Open".

This is not a duplicate of T138958. It's a different kind of cross-keyboard issue. (My examples are in Russian/Cyrillic because it's what I know best.)

May 17 2021, 9:51 PM · Discovery-Search
TJones updated the task description for T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis.
May 17 2021, 4:52 PM · Discovery-Search (Current work)
TJones moved T271851: Clean up gui from the wdqs deploy repo and puppet from Needs Reporting to In Progress on the Discovery-Search (Current work) board.
May 17 2021, 3:35 PM · Patch-For-Review, User-Ladsgroup, Discovery-Search (Current work), Wikidata Query UI, Wikidata
TJones moved T277699: Unpack Spanish Elasticsearch Analyzer from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
May 17 2021, 3:28 PM · MW-1.37-notes (1.37.0-wmf.4; 2021-05-04), Discovery-Search (Current work)

May 13 2021

TJones updated the task description for T147505: [tracking] CirrusSearch: what is updated during re-indexing.
May 13 2021, 6:23 PM · Tracking-Neverending, Epic, Discovery-Search (Current work), Discovery
TJones renamed T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis from Reindex Spanish-language wikis to Reindex Spanish-language wikis to enable unpacked version of Spanish analysis.
May 13 2021, 6:21 PM · Discovery-Search (Current work)
TJones reassigned T147505: [tracking] CirrusSearch: what is updated during re-indexing from debt to MPhamWMF.
May 13 2021, 6:21 PM · Tracking-Neverending, Epic, Discovery-Search (Current work), Discovery
TJones added a subtask for T277699: Unpack Spanish Elasticsearch Analyzer: T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis.
May 13 2021, 6:19 PM · MW-1.37-notes (1.37.0-wmf.4; 2021-05-04), Discovery-Search (Current work)
TJones added a parent task for T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis: T277699: Unpack Spanish Elasticsearch Analyzer.
May 13 2021, 6:19 PM · Discovery-Search (Current work)
TJones created T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis.
May 13 2021, 6:18 PM · Discovery-Search (Current work)
TJones updated the task description for T272606: [EPIC] Unpack all Elasticsearch analyzers.
May 13 2021, 6:16 PM · Discovery-Search (Current work)
TJones updated the task description for T272606: [EPIC] Unpack all Elasticsearch analyzers.
May 13 2021, 6:15 PM · Discovery-Search (Current work)
TJones added a comment to T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers.

I'm going to try to do three at once (well, sequentially, but as one patch). I've upped the points from 3 to 5... we'll see if that's reasonable!

May 13 2021, 3:08 PM · Discovery-Search (Current work)
TJones renamed T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers from Unpack German Elasticsearch Analyzer to Unpack German, Portuguese, and Dutch Elasticsearch Analyzers.
May 13 2021, 3:06 PM · Discovery-Search (Current work)
TJones moved T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers from Ready for Development to In Progress on the Discovery-Search (Current work) board.
May 13 2021, 3:04 PM · Discovery-Search (Current work)

May 12 2021

TJones moved T279722: Exempt keywords from query length restrictions from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
May 12 2021, 11:32 PM · MW-1.37-notes (1.37.0-wmf.6; 2021-05-18), Discovery-Search (Current work), Growth-Team-Filtering, CirrusSearch, GrowthExperiments-NewcomerTasks, Growth-Team
TJones added a comment to T279722: Exempt keywords from query length restrictions.

I forgot that we discussed and decided to exempt all keywords from length restrictions; we collectively forgot to update the ticket, too! I've updated the ticket and will update my patch shortly.

May 12 2021, 8:11 PM · MW-1.37-notes (1.37.0-wmf.6; 2021-05-18), Discovery-Search (Current work), Growth-Team-Filtering, CirrusSearch, GrowthExperiments-NewcomerTasks, Growth-Team
TJones renamed T279722: Exempt keywords from query length restrictions from Exempt hastemplate search keyword from query length restrictions to Exempt keywords from query length restrictions.
May 12 2021, 8:10 PM · MW-1.37-notes (1.37.0-wmf.6; 2021-05-18), Discovery-Search (Current work), Growth-Team-Filtering, CirrusSearch, GrowthExperiments-NewcomerTasks, Growth-Team

May 11 2021

TJones moved T279722: Exempt keywords from query length restrictions from In Progress to Needs review on the Discovery-Search (Current work) board.
May 11 2021, 7:59 PM · MW-1.37-notes (1.37.0-wmf.6; 2021-05-18), Discovery-Search (Current work), Growth-Team-Filtering, CirrusSearch, GrowthExperiments-NewcomerTasks, Growth-Team
TJones claimed T279722: Exempt keywords from query length restrictions.
May 11 2021, 4:49 PM · MW-1.37-notes (1.37.0-wmf.6; 2021-05-18), Discovery-Search (Current work), Growth-Team-Filtering, CirrusSearch, GrowthExperiments-NewcomerTasks, Growth-Team
TJones updated the task description for T282572: [Session] Let's share our Search challenges.
May 11 2021, 3:50 PM · Wikimedia-Hackathon-2021

May 7 2021

TJones added a comment to T281797: Remove all bolding of search results on a variety of wikis.

I went ahead and looked up all of the "Other" wikis that are specified as being in the languages in the list. These should have $wgVectorWvuiSearchOptions configured appropriately, too.

May 7 2021, 6:07 PM · Patch-For-Review, MW-1.37-notes (1.37.0-wmf.9; 2021-06-07), Vector, Readers-Web-Backlog (Kanbanana-FY-2020-21), WVUI, Vue.js (Vue.js Search Experience (Vector modern)), Bengali-Sites, Desktop Improvements

May 5 2021

TJones added a comment to T281797: Remove all bolding of search results on a variety of wikis.

Here's a list of scripts that I was able to verify have problems with conjuncts, ligatures, digraphs, etc.—plus the list of languages with wikis that use each script. Languages are listed with their code in parens.

May 5 2021, 8:52 PM · Patch-For-Review, MW-1.37-notes (1.37.0-wmf.9; 2021-06-07), Vector, Readers-Web-Backlog (Kanbanana-FY-2020-21), WVUI, Vue.js (Vue.js Search Experience (Vector modern)), Bengali-Sites, Desktop Improvements

May 4 2021

TJones added a comment to T281797: Remove all bolding of search results on a variety of wikis.

I don't know if we have a list, but I can come up with something. It may not be 100% complete, but it should be a good start.

May 4 2021, 8:10 PM · Patch-For-Review, MW-1.37-notes (1.37.0-wmf.9; 2021-06-07), Vector, Readers-Web-Backlog (Kanbanana-FY-2020-21), WVUI, Vue.js (Vue.js Search Experience (Vector modern)), Bengali-Sites, Desktop Improvements
TJones added a comment to T258094: Improve Breton language analysis.

I've fetched the document counts for dezhañ and dezhi, and I grabbed the document counts for all the stop words from Wikisource, too.

May 4 2021, 5:36 PM · Discovery-Search (Current work)

May 3 2021

TJones removed the point value for T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers.
May 3 2021, 3:27 PM · Discovery-Search (Current work)
TJones edited projects for T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers, added: Discovery-Search (Current work); removed Discovery-Search.
May 3 2021, 3:11 PM · Discovery-Search (Current work)

Apr 30 2021

TJones added a comment to T258094: Improve Breton language analysis.

@VIGNERON—that's great!

Apr 30 2021, 7:08 PM · Discovery-Search (Current work)

Apr 28 2021

TJones added a comment to T277256: Bangla letters are getting broken in the search box .

TL;DR: The best solution is probably to turn off partial-match highlighting for wikis (like bnwiki) written primarily in scripts that have conjuncts that don't do well when split by highlighting, but leave the generic combining-mark glomming logic in place for other wikis (like enwiki) that have highlighting and titles in such conjunct-using scripts, so the results don't look entirely ridiculous.
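
To illustrate why glomming combining marks is not enough for conjuncts, here is a rough Python sketch. It is not the WVUI implementation; `glom_combining_marks` is a hypothetical name, and the rule used (attach any character in a Unicode mark category to the preceding character) is an assumption about what generic glomming looks like.

```python
import unicodedata


def glom_combining_marks(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical sketch of generic combining-mark glomming; it knows
    nothing about script-specific conjuncts.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch      # attach the mark to its base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


# Latin: e + combining acute stays together, as intended.
print(glom_combining_marks("e\u0301a"))

# Bangla ka + virama + ssa is one conjunct on screen, but glomming
# yields two clusters, so a highlight boundary can still fall inside it.
print(glom_combining_marks("\u0995\u09cd\u09b7"))
```

The virama attaches to the first consonant, but the second consonant starts a new cluster, which is the kind of split that breaks the rendered conjunct.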

Apr 28 2021, 9:47 PM · Readers-Web-Backlog (Kanbanana-FY-2020-21), WVUI, Vue.js (Vue.js Search Experience (Vector modern)), Bengali-Sites, Desktop Improvements
TJones added a comment to T280131: Evaluate existing tools made to assist relevancy work.

This might be more of a team effort than something assigned to one person. It might also be a good time to improve, document, and share other tools that we individually have lying around. (For example, I have a few shell scripts that make working with vagrant easier. I've shared some with Maryum, but not with everyone else.)

Apr 28 2021, 6:00 PM · CirrusSearch, Discovery-Search
TJones set the point value for T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers to 5.
Apr 28 2021, 3:25 PM · Discovery-Search (Current work)
TJones created T281379: Unpack German, Portuguese, and Dutch Elasticsearch Analyzers.
Apr 28 2021, 3:25 PM · Discovery-Search (Current work)