TJones (Trey Jones)
Staff Computational Linguist, Search Platform Team

User Details

User Since: Jul 8 2015, 3:02 PM (544 w, 9 h)
Availability: Available
IRC Nick: Trey314159
LDAP User: Tjones
MediaWiki User: TJones (WMF)

I would have written a shorter comment, but I did not have the time.

I'm part of the Search Platform team and I spend my time working on search & relevance, trying to better support search in various languages, analyzing queries, and doing random mathy things. I tend to write long, detailed notes about my investigations (so as to improve the bus number of my work).

When I have to work on _GitHub,_ /‍‍/Phab,/‍‍/ and ''MediaWiki'' all on the same day, I sometimes suffer Severe Markup Incongruence Fatigue.

I � Unicode.

Recent Activity

Yesterday

TJones closed T411666: Investigate re-ranking second-try exact matches, a subtask of T375215: [EPIC] Support "second-try" transliteration or wrong-keyboard searches (aka N.O.R.M.), as Resolved.
Wed, Dec 10, 8:06 PM · CirrusSearch, Epic, Discovery-Search
TJones closed T411666: Investigate re-ranking second-try exact matches as Resolved.

After thinking about this more while writing up the description, and talking with @dcausse today, I think I'm going to close this ticket for now. If this shows up more often and more annoyingly than it seems to me that it will, someone can re-open the ticket.

Wed, Dec 10, 8:06 PM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch

Tue, Dec 9

TJones updated the task description for T411666: Investigate re-ranking second-try exact matches.
Tue, Dec 9, 4:37 PM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch

Mon, Dec 8

TJones added a comment to T404858: A/B test using defaultsort with the completion suggester.

Looking at the en/fr/he report and the 20-wiki report.

Mon, Dec 8, 10:35 PM · Patch-For-Review, Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), Essential-Work, CirrusSearch
TJones updated subscribers of T411933: Convert between language variants when searching in non-Chinese wikis.

@Bugreporter, the code you linked to is currently only used for as-you-type suggestions, but your mention of page contents sounds like you are looking for matching in fulltext search results, too. Are you thinking of one, the other, or both?

Mon, Dec 8, 9:24 PM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch

Thu, Dec 4

TJones moved T411112: Clean up documentation for regex searches from Needs Review to Done on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Thu, Dec 4, 5:26 PM · Discovery-Search (2025.10.20 - 2025.12.31), Documentation, CirrusSearch
TJones added a comment to T411112: Clean up documentation for regex searches.

@Pppery, thanks for your bolder edits. I'm always a little hesitant to jump in and make sweeping changes from my WMF account. I appreciate you taking the info I was able to provide and editing it into shape! If you feel like things are done, we can move this ticket to "Done" on the Discovery workboard.

Thu, Dec 4, 3:15 PM · Discovery-Search (2025.10.20 - 2025.12.31), Documentation, CirrusSearch

Wed, Dec 3

TJones added a comment to T408737: Enable Georgian Transliteration Second Try mappings for autocomplete.

I also added a blurb to next week's Tech News.

Wed, Dec 3, 10:13 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones moved T411112: Clean up documentation for regex searches from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.

Okay, I think I'm done with the updates. (Thanks to @Shirayuki for marking them for translation and making a few other updates, too.)

Wed, Dec 3, 10:11 PM · Discovery-Search (2025.10.20 - 2025.12.31), Documentation, CirrusSearch
TJones created T411666: Investigate re-ranking second-try exact matches.
Wed, Dec 3, 7:14 PM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
TJones added a comment to T375215: [EPIC] Support "second-try" transliteration or wrong-keyboard searches (aka N.O.R.M.).

Is there any plan to make it so that search pages like https://ru.wikipedia.org/wiki/Special:Search/,fhfr_j,fvf («Барак Обама», Barack Obama) display relevant results as well?

Wed, Dec 3, 6:21 PM · CirrusSearch, Epic, Discovery-Search

Tue, Dec 2

TJones added a comment to T411112: Clean up documentation for regex searches.

I've made some edits and added some information. I will try to finish reviewing the rest of the section tomorrow to soften the language throughout from "never use a bare regex" to "try not to use bare regexes". (Though if someone beats me to it, I won't complain!)

Tue, Dec 2, 10:05 PM · Discovery-Search (2025.10.20 - 2025.12.31), Documentation, CirrusSearch
TJones added a comment to T410758: Timeouts searching for terms and regular expressions too low.

We're off-topic for this ticket, but I'll reply to these here. @dcausse, do you have the link for the on-wiki help with regex searches you mentioned?

Tue, Dec 2, 8:39 PM · Discovery-Search, CirrusSearch
TJones changed the point value for T297761: Create a Latin-to-Devanagari transliteration second-chance search for Hindi wikis from 8 to 13.
Tue, Dec 2, 7:30 PM · Discovery-Search (2025.10.20 - 2025.12.31)
TJones claimed T411112: Clean up documentation for regex searches.

Started looking at this, and it's a little harder to match the style and tone than I expected.

Tue, Dec 2, 7:30 PM · Discovery-Search (2025.10.20 - 2025.12.31), Documentation, CirrusSearch

Wed, Nov 26

TJones added a comment to T410758: Timeouts searching for terms and regular expressions too low.

Indeed it is documented. But how to search for

||}}{{de|

on a wiki where "de" is a common preposition or pronoun? The stupid indexed search will ignore everything except the "de", and since "de" is on every page, it will not generate any "domain".

Maybe wikis would need a simple plain-text search, without all the flawed indexed "magic", and without the resource-hoggy regex magic.

Wed, Nov 26, 4:51 PM · Discovery-Search, CirrusSearch

Fri, Nov 21

TJones added a comment to T407520: Deploy various plugins to fix various things.

The plugins have been deployed to production CirrusSearch.

Fri, Nov 21, 6:40 PM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Thu, Nov 20

TJones claimed T297761: Create a Latin-to-Devanagari transliteration second-chance search for Hindi wikis.
Thu, Nov 20, 6:47 PM · Discovery-Search (2025.10.20 - 2025.12.31)
TJones moved T297761: Create a Latin-to-Devanagari transliteration second-chance search for Hindi wikis from Incoming to In Progress on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Thu, Nov 20, 6:46 PM · Discovery-Search (2025.10.20 - 2025.12.31)

Mon, Nov 17

TJones reassigned T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete from TJones to dcausse.
Mon, Nov 17, 5:36 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones added a comment to T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.

Added this:

The Wikimedia Search Team has recreated the "DWIM" gadget functionality server-side, for Russian and Hebrew Wikipedias. This feature appends cross-keyboard suggestions to the standard search-box suggestions. For example, searching for cxfcnmt on Russian Wikipedia will now append suggestions for счастье ("happiness"). We plan to enable this feature for other Russian and Hebrew wikis next week. See also Phabricator T408734 and related tickets.
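
The cross-keyboard idea in the blurb above can be illustrated with a minimal sketch. This is not the CirrusSearch implementation: the function name `second_try` is hypothetical, the table assumes a US QWERTY layout and the standard Russian ЙЦУКЕН layout, and it covers lowercase letter keys and a few punctuation keys only.

```python
# Hypothetical sketch: map each character typed on a US QWERTY layout to the
# character found on the same physical key of the Russian (ЙЦУКЕН) layout.
# Both strings list the keys row by row, so position i in one string is the
# same key as position i in the other.
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
RUSSIAN = "йцукенгшщзхъфывапролджэячсмитьбю"

QWERTY_TO_RU = str.maketrans(QWERTY, RUSSIAN)


def second_try(query: str) -> str:
    """Return the query as if it had been typed with the Russian layout active.

    Characters without an entry in the table (spaces, digits, uppercase
    letters) pass through unchanged.
    """
    return query.translate(QWERTY_TO_RU)


print(second_try("cxfcnmt"))
```

With this table, the example from the blurb, cxfcnmt, comes out as счастье. A real feature also has to decide when to offer the converted query and how to rank it, which this sketch does not attempt.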

Mon, Nov 17, 5:36 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones claimed T408745: Create "one-way" Russian wrong-keyboard mapping for English Wiki(pedia/s).
Mon, Nov 17, 4:17 PM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones added a comment to T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.

could you draft a message for the tech news ...

Mon, Nov 17, 3:27 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Nov 7 2025

TJones set the point value for T297761: Create a Latin-to-Devanagari transliteration second-chance search for Hindi wikis to 8.
Nov 7 2025, 7:50 PM · Discovery-Search (2025.10.20 - 2025.12.31)
TJones edited P85094 The CirrusSearch index dumps are moving.
Nov 7 2025, 2:58 PM

Nov 6 2025

TJones moved T297761: Create a Latin-to-Devanagari transliteration second-chance search for Hindi wikis from Language Stuff to needs triage on the Discovery-Search board.

Moving this to triage, since it is the next step in the NORM/SecondTry project.

Nov 6 2025, 8:36 PM · Discovery-Search (2025.10.20 - 2025.12.31)
TJones moved T408745: Create "one-way" Russian wrong-keyboard mapping for English Wiki(pedia/s) from Needs Review to To be Deployed on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 6 2025, 5:50 PM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Nov 4 2025

TJones moved T408745: Create "one-way" Russian wrong-keyboard mapping for English Wiki(pedia/s) from Incoming to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 4 2025, 8:56 PM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Nov 3 2025

TJones moved T407440: Regexes with four 32-bit characters throw errors from To be Deployed to Done on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 3 2025, 4:10 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
TJones moved T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis from To be Deployed to Done on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 3 2025, 4:09 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)
TJones added a comment to T405020: Harmonize Invisibles in Cirrus Language Analysis.

MediaWiki notes.

Nov 3 2025, 3:27 PM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Oct 31 2025

TJones moved T405020: Harmonize Invisibles in Cirrus Language Analysis from Needs Review to To be Deployed on the Discovery-Search (2025.10.20 - 2025.12.31) board.

Once this is live, we'll need to reindex. (Added to T408431)

Oct 31 2025, 3:42 PM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones updated the task description for T408431: Reindex all wikis.
Oct 31 2025, 3:41 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Oct 30 2025

TJones moved T405020: Harmonize Invisibles in Cirrus Language Analysis from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Oct 30 2025, 6:42 PM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Oct 29 2025

TJones added a project to T408745: Create "one-way" Russian wrong-keyboard mapping for English Wiki(pedia/s): Essential-Work.
Oct 29 2025, 8:26 PM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones created T408745: Create "one-way" Russian wrong-keyboard mapping for English Wiki(pedia/s).
Oct 29 2025, 8:25 PM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones created T408737: Enable Georgian Transliteration Second Try mappings for autocomplete.
Oct 29 2025, 7:29 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones renamed T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete from Enable RU & HE DWIM-style Second Try mappings on-wiki to Enable RU & HE DWIM-style Second Try mappings for autocomplete.
Oct 29 2025, 7:24 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones created T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.
Oct 29 2025, 7:22 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Oct 28 2025

TJones added a comment to T407514: Ignore MacOS .DS_Store in parent pom.

I often open the target folder to find the jar so I can deploy it locally. I also use the Finder with Java projects to recursively open folders to get all the way to the bottom quickly. (Anything that relies on people not opening folders is doomed to fail eventually.)

Oct 28 2025, 8:04 PM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Java-Scala-Standardization, Essential-Work
TJones added a comment to T407514: Ignore MacOS .DS_Store in parent pom.

I believe we should find what is causing them to appear in the first place

Oct 28 2025, 2:53 PM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Java-Scala-Standardization, Essential-Work

Oct 27 2025

TJones claimed T405020: Harmonize Invisibles in Cirrus Language Analysis.
Oct 27 2025, 8:00 PM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones updated the task description for T405020: Harmonize Invisibles in Cirrus Language Analysis.
Oct 27 2025, 4:33 PM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
TJones moved T407440: Regexes with four 32-bit characters throw errors from Needs Review to To be Deployed on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Oct 27 2025, 4:23 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
TJones moved T407440: Regexes with four 32-bit characters throw errors from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Oct 27 2025, 4:21 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
TJones moved T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis from Needs Review to To be Deployed on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Oct 27 2025, 4:20 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)

Oct 23 2025

TJones renamed T407520: Deploy various plugins to fix various things from Deploy cirrus-highlighter plugin to fix surrogate matching to Deploy various plugins to fix various things.
Oct 23 2025, 10:54 PM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
TJones added a comment to T407520: Deploy various plugins to fix various things.

As part of T407440: Regexes with four 32-bit characters throw errors I've made some changes to the opensearch-extra plugin which also needs to be deployed. Would it make sense to wait and do them both at once?

Oct 23 2025, 6:55 PM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Oct 20 2025

TJones claimed T407440: Regexes with four 32-bit characters throw errors.
Oct 20 2025, 3:54 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
TJones added a project to T406920: deepcategory search fails to show all expected results: Essential-Work.
Oct 20 2025, 3:51 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), CirrusSearch, Commons
TJones added a project to T407432: Follow-up AB test of dym language model variants: Essential-Work.
Oct 20 2025, 3:51 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), CirrusSearch
TJones moved T406920: deepcategory search fails to show all expected results from Incoming to Ready for Dev on the Discovery-Search (2025.09.26 - 2025.10.17) board.
Oct 20 2025, 3:49 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), CirrusSearch, Commons
TJones set the point value for T406920: deepcategory search fails to show all expected results to 5.
Oct 20 2025, 3:49 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), CirrusSearch, Commons
TJones renamed T405475: Search for L7 shows incomplete drop-down box from Search for L7 has shows incomplete drop-down box to Search for L7 shows incomplete drop-down box.
Oct 20 2025, 3:31 PM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Wikidata-Omega (Radar/Epics/Stalled), Wikidata-Query-Service, CirrusSearch, Wikidata

Oct 16 2025

TJones added a comment to T390858: Improve CirrusSearch DYM suggestions using the phrase suggester on more content.

Thanks for running all the individual language reports—very interesting to look at them all. I skimmed them, comparing the charts for each to the all_wikis charts.

Oct 16 2025, 6:24 PM · MW-1.45-notes (1.45.0-wmf.19; 2025-09-16), Epic, Discovery-Search, CirrusSearch
TJones added a project to T407440: Regexes with four 32-bit characters throw errors: Essential-Work.
Oct 16 2025, 4:48 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
TJones added a project to T407514: Ignore MacOS .DS_Store in parent pom: Essential-Work.
Oct 16 2025, 4:48 PM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Java-Scala-Standardization, Essential-Work
TJones created T407520: Deploy various plugins to fix various things.
Oct 16 2025, 4:47 PM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
TJones moved T404632: Searches for surrogate code units display search results with deleted characters from Needs Review to Done on the Discovery-Search (2025.09.26 - 2025.10.17) board.
Oct 16 2025, 4:45 PM · Essential-Work, Discovery-Search (2025.09.26 - 2025.10.17), CirrusSearch
TJones created T407514: Ignore MacOS .DS_Store in parent pom.
Oct 16 2025, 4:12 PM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Java-Scala-Standardization, Essential-Work
TJones added a comment to T404632: Searches for surrogate code units display search results with deleted characters.

One little spot in the code was counting characters instead of codepoints, so the last matched four-byte character got split in half by the closing highlight. Somewhere up the chain the unmatched high and low surrogates got stripped and the character disappeared.
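
A small sketch of why mixing the two counts is dangerous. It is an illustration only, not the highlighter's code: the helper names are hypothetical, and the sample string is three Gothic letters chosen because they lie outside the Basic Multilingual Plane, so each one takes two UTF-16 code units.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text (what Java calls 'chars')."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def splits_pair(text: str, unit_offset: int) -> bool:
    """True if cutting text at this UTF-16 offset lands inside a surrogate pair."""
    units = utf16_units(text)
    if not 0 < unit_offset < len(units):
        return False
    before, after = units[unit_offset - 1], units[unit_offset]
    # A high surrogate followed by a low surrogate is one character; an offset
    # between them cuts that character in half.
    return 0xD800 <= before <= 0xDBFF and 0xDC00 <= after <= 0xDFFF


gothic = "\U00010330\U00010339\U00010336"  # three supplementary-plane letters

print(len(gothic))               # length in code points
print(len(utf16_units(gothic)))  # length in UTF-16 code units
print(splits_pair(gothic, 1))    # a count of 1 code point used as a unit offset
print(splits_pair(gothic, 2))    # the correct unit offset after one character
```

If a length measured in one unit is used as an offset in the other, the closing highlight marker can land between a high and a low surrogate, leaving two unpaired halves that later processing may strip.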

Oct 16 2025, 3:27 PM · Essential-Work, Discovery-Search (2025.09.26 - 2025.10.17), CirrusSearch
TJones moved T404632: Searches for surrogate code units display search results with deleted characters from In Progress to Needs Review on the Discovery-Search (2025.09.26 - 2025.10.17) board.
Oct 16 2025, 3:22 PM · Essential-Work, Discovery-Search (2025.09.26 - 2025.10.17), CirrusSearch

Oct 15 2025

TJones added a project to T407440: Regexes with four 32-bit characters throw errors: CirrusSearch.
Oct 15 2025, 9:21 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
TJones created T407440: Regexes with four 32-bit characters throw errors.
Oct 15 2025, 9:18 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Oct 8 2025

TJones claimed T404632: Searches for surrogate code units display search results with deleted characters.

I've been poking around at this a bit, and it definitely seems to be the Experimental Highlighter that's causing the problem. When I disable it locally, highlighting works fine.

Oct 8 2025, 9:42 PM · Essential-Work, Discovery-Search (2025.09.26 - 2025.10.17), CirrusSearch

Sep 26 2025

TJones moved T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis from In Progress to Needs Review on the Discovery-Search (2025.09.05 - 2025.09.26) board.
Sep 26 2025, 1:53 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)

Sep 25 2025

TJones added a comment to T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis.

Patch is up, updated notes are done (new section here). Generally, there were a few hiccups, but only in cases where we are already limited to making a best guess, so the deviations from the previous algorithm are acceptable.

Sep 25 2025, 9:22 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)

Sep 24 2025

TJones updated the task description for T405482: Expand poolcounter heuristics to better capture automated requests.
Sep 24 2025, 4:08 PM · Essential-Work, Discovery-Search (2025.09.26 - 2025.10.17), MW-1.45-notes (1.45.0-wmf.21; 2025-09-30)

Sep 22 2025

TJones added a comment to T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis.

In-progress notes (after some vacation—yay!—and illness—ugh!) on MediaWiki. I think I've got a workable algorithm, but I need to do a bit more implementation and integration.

Sep 22 2025, 9:31 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)
TJones added a comment to T404632: Searches for surrogate code units display search results with deleted characters.

I also thought at first that it might be treating [𐌰𐌹𐌶] as a set of six surrogates rather than three characters, but even without the character class, you get the funky missing highlight: intitle:"𐌰𐌹𐌶" insource:/𐌰/

Sep 22 2025, 5:38 PM · Essential-Work, Discovery-Search (2025.09.26 - 2025.10.17), CirrusSearch
TJones renamed T405020: Harmonize Invisibles in Cirrus Language Analysis from Harmonize Invisibles to Harmonize Invisibles in Cirrus Language Analysis.
Sep 22 2025, 3:19 PM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Sep 19 2025

TJones moved T87548: Full text search needs to split words on dashes from In Progress to Done on the Discovery-Search (2025.09.05 - 2025.09.26) board.
Sep 19 2025, 2:28 PM · MW-1.45-notes (1.45.0-wmf.20; 2025-09-23), Discovery-Search (2025.09.05 - 2025.09.26), Discovery-ARCHIVED, CirrusSearch, MediaWiki-Search

Sep 18 2025

TJones moved T87548: Full text search needs to split words on dashes from Incoming to In Progress on the Discovery-Search (2025.09.05 - 2025.09.26) board.
Sep 18 2025, 6:06 PM · MW-1.45-notes (1.45.0-wmf.20; 2025-09-23), Discovery-Search (2025.09.05 - 2025.09.26), Discovery-ARCHIVED, CirrusSearch, MediaWiki-Search
TJones edited projects for T87548: Full text search needs to split words on dashes, added: Discovery-Search (2025.09.05 - 2025.09.26); removed Discovery-Search.
Sep 18 2025, 6:06 PM · MW-1.45-notes (1.45.0-wmf.20; 2025-09-23), Discovery-Search (2025.09.05 - 2025.09.26), Discovery-ARCHIVED, CirrusSearch, MediaWiki-Search
TJones claimed T87548: Full text search needs to split words on dashes.

@Gehel, we use the icu_tokenizer (or our version of it) almost everywhere, and it does split on lots of hyphens and dashes correctly. I tested a bunch of dash-like symbols (- ‐ ‑ ﹣ - ‒ – — ゠ ⹀), and tested all of the tokenizers we use: ICU, standard, smartCN, Hebrew, sudachi, Kuromoji, Nori, Thai.
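
For readers who want to try a similar check locally, here is a rough stand-in. It is not the ICU tokenizer or any of the analyzers named above; it assumes that "dash-like" can be approximated by the Unicode general category Pd (dash punctuation), and the function name `split_on_dashes` is hypothetical.

```python
import unicodedata


def split_on_dashes(text: str) -> list[str]:
    """Split text at every character whose Unicode category is Pd."""
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if unicodedata.category(ch) == "Pd":
            # A dash ends the current token; runs of dashes yield no empty tokens.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(split_on_dashes("full\u2013text"))  # en dash between the two words
print(split_on_dashes("a-b\u2014c"))      # hyphen-minus and em dash
```

Real tokenizers apply many more rules (word-break properties, script boundaries, dictionaries), so this only shows the shape of the test, not the expected production behavior.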

Sep 18 2025, 6:06 PM · MW-1.45-notes (1.45.0-wmf.20; 2025-09-23), Discovery-Search (2025.09.05 - 2025.09.26), Discovery-ARCHIVED, CirrusSearch, MediaWiki-Search
TJones created T405020: Harmonize Invisibles in Cirrus Language Analysis.
Sep 18 2025, 6:04 PM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Sep 11 2025

TJones changed the point value for T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis from 5 to 8.
Sep 11 2025, 5:48 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)
TJones moved T402858: Create new Cirrus class with Russian and Hebrew DWIM mappings from Needs Review to Done on the Discovery-Search (2025.09.05 - 2025.09.26) board.
Sep 11 2025, 5:47 PM · CirrusSearch, MW-1.45-notes (1.45.0-wmf.19; 2025-09-16), Discovery-Search (2025.09.05 - 2025.09.26)

Sep 10 2025

TJones added a comment to T403826: Evaluate did-you-mean suggestion variants and decide on an AB test plan.

default_1 looks like a winner if default_1_variant turns out to be too expensive. The A/B test results on latency and clicks will be interesting.

Sep 10 2025, 8:33 PM · Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch
TJones added a comment to T403826: Evaluate did-you-mean suggestion variants and decide on an AB test plan.

The variant option really is doing the heavy lifting.

Sep 10 2025, 4:46 PM · Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch

Sep 9 2025

TJones added a comment to T403826: Evaluate did-you-mean suggestion variants and decide on an AB test plan.

Nice write up! Your conclusions make sense to me. Do you want to do an A/B/C/D test, or a series of (three?) A/B tests?

Sep 9 2025, 6:35 PM · Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch

Sep 8 2025

TJones renamed T177251: Dead keys prevent autocomplete in search box of skins using mediawiki.searchSuggest from Dead keys prevent autocomplete in search box of skins usins mediawiki.searchSuggest to Dead keys prevent autocomplete in search box of skins using mediawiki.searchSuggest.
Sep 8 2025, 8:21 PM · User-TheDJ, JavaScript, MediaWiki-Search
TJones added a comment to T177251: Dead keys prevent autocomplete in search box of skins using mediawiki.searchSuggest.

@TheDJ, this seems to be working for the current default skin. Could we either close the ticket (under the theory that not every old skin must be upgraded to have every new feature) or change the title and description to reflect which skins need to be fixed (under the theory that Vector isn't that old)? In the second case, Monobook has the dead key problem, too.

Sep 8 2025, 7:41 PM · User-TheDJ, JavaScript, MediaWiki-Search
TJones closed T94830: Track down missing regex search results on commons as Resolved.

The example file no longer has the information template in it two times, and regex results look reasonable. If there is another example, please re-open or create a new ticket.

Sep 8 2025, 3:48 PM · Discovery-Search (2025.09.05 - 2025.09.26), Discovery-ARCHIVED, MediaWiki-Search, CirrusSearch
TJones closed T102400: Search does not ignore soft hyphens (U+00AD, ­ ) as Resolved.

In the intervening time, this has been taken care of by the icu_normalize filter which is enabled everywhere.

Sep 8 2025, 3:37 PM · Discovery-Search (2025.09.05 - 2025.09.26), MediaWiki-Search, Discovery-ARCHIVED

Aug 26 2025

TJones changed the point value for T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis from 3 to 5.
Aug 26 2025, 5:51 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)
TJones moved T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis from Incoming to In Progress on the Discovery-Search (2025.08.15 - 2025.09.05) board.
Aug 26 2025, 3:33 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)
TJones claimed T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis.
Aug 26 2025, 3:32 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)
TJones moved T402858: Create new Cirrus class with Russian and Hebrew DWIM mappings from In Progress to Needs Review on the Discovery-Search (2025.08.15 - 2025.09.05) board.
Aug 26 2025, 3:29 PM · CirrusSearch, MW-1.45-notes (1.45.0-wmf.19; 2025-09-16), Discovery-Search (2025.09.05 - 2025.09.26)

Aug 25 2025

TJones created T402864: Integrate RU & HE DWIM-style mappings into autocomplete.
Aug 25 2025, 9:24 PM · CirrusSearch, Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.25; 2025-10-28), Essential-Work
TJones added a comment to T402858: Create new Cirrus class with Russian and Hebrew DWIM mappings.

A single multi-script mapping turns out not to be so great. While most letters map one-to-one across keyboards (я ↔︎ z), punctuation does not, and there are chains of mappings. Those in RU/QWERTY:

Aug 25 2025, 9:07 PM · CirrusSearch, MW-1.45-notes (1.45.0-wmf.19; 2025-09-16), Discovery-Search (2025.09.05 - 2025.09.26)
TJones set the point value for T402858: Create new Cirrus class with Russian and Hebrew DWIM mappings to 3.
Aug 25 2025, 8:33 PM · CirrusSearch, MW-1.45-notes (1.45.0-wmf.19; 2025-09-16), Discovery-Search (2025.09.05 - 2025.09.26)
TJones created T402858: Create new Cirrus class with Russian and Hebrew DWIM mappings.
Aug 25 2025, 8:33 PM · CirrusSearch, MW-1.45-notes (1.45.0-wmf.19; 2025-09-16), Discovery-Search (2025.09.05 - 2025.09.26)

Aug 21 2025

TJones triaged T402220: Sudachi analysis chain fails on long emoji sequence as High priority.
Aug 21 2025, 5:24 PM · Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch, MW-1.45-notes (1.45.0-wmf.16; 2025-08-26)

Aug 20 2025

TJones moved T402220: Sudachi analysis chain fails on long emoji sequence from Needs Review to To be Deployed on the Discovery-Search (2025.08.15 - 2025.09.05) board.
Aug 20 2025, 3:47 PM · Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch, MW-1.45-notes (1.45.0-wmf.16; 2025-08-26)

Aug 19 2025

TJones moved T402220: Sudachi analysis chain fails on long emoji sequence from Incoming to Needs Review on the Discovery-Search (2025.07.25 - 2025.08.15) board.
Aug 19 2025, 6:55 PM · Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch, MW-1.45-notes (1.45.0-wmf.16; 2025-08-26)
TJones added a comment to T402220: Sudachi analysis chain fails on long emoji sequence.

I can generate the error with the right sequences of katakana, hiragana, emoji, Gothic, and Thai characters. Long sequences of multibyte characters seem to be a reliable source of errors. I never had errors under 8000 characters. Sometimes I got errors at 16000 characters, sometimes 32000 (depending on character type). It looks like the threshold for problems is often around 11,000 characters, but I didn't dig too deeply.

Aug 19 2025, 6:55 PM · Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch, MW-1.45-notes (1.45.0-wmf.16; 2025-08-26)

Aug 18 2025

TJones changed the status of T402220: Sudachi analysis chain fails on long emoji sequence from Open to In Progress.
Aug 18 2025, 10:38 PM · Discovery-Search (2025.09.05 - 2025.09.26), CirrusSearch, MW-1.45-notes (1.45.0-wmf.16; 2025-08-26)
TJones moved T317599: Allow ^ and $ in intitle regex search from To be Deployed to Done on the Discovery-Search (2025.07.25 - 2025.08.15) board.
Aug 18 2025, 3:28 PM · User-notice-archive, Discovery-Search (2025.08.15 - 2025.09.05), CirrusSearch
TJones moved T375567: Review indic_normalization for other Indic languages/scripts from To be Deployed to Done on the Discovery-Search (2025.07.25 - 2025.08.15) board.
Aug 18 2025, 3:23 PM · Essential-Work, Discovery-Search (2025.08.15 - 2025.09.05)