
TJones (Trey Jones)
Staff Computational Linguist, Search Platform Team

User Details

User Since
Jul 8 2015, 3:02 PM (458 w, 6 d)
Availability
Available
IRC Nick
Trey314159
LDAP User
Tjones
MediaWiki User
TJones (WMF)

I would have written a shorter comment, but I did not have the time.

I'm part of the Search Platform team and I spend my time working on search & relevance, trying to better support search in various languages, analyzing queries, and doing random mathy things. I tend to write long, detailed notes about my investigations (so as to improve the bus number of my work).

When I have to work on _GitHub,_ /‍‍/Phab,/‍‍/ and ''MediaWiki'' all on the same day, I sometimes suffer Severe Markup Incongruence Fatigue.

I � Unicode.

Recent Activity

Today

TJones claimed T72899: Search box needs some normalization for Arabic Family languages.
Wed, Apr 24, 1:18 PM · Discovery-Search (Current work), CirrusSearch, Discovery-ARCHIVED, I18n, MediaWiki-Search
TJones moved T72899: Search box needs some normalization for Arabic Family languages from Incoming to In Progress on the Discovery-Search (Current work) board.
Wed, Apr 24, 1:17 PM · Discovery-Search (Current work), CirrusSearch, Discovery-ARCHIVED, I18n, MediaWiki-Search
TJones moved T362501: וי (U+05D5 vav, U+05D9 yod) doesn't find ױ (U+05F1 Yiddish vav yod) from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Wed, Apr 24, 1:14 PM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Fri, Apr 19

TJones moved T362501: וי (U+05D5 vav, U+05D9 yod) doesn't find ױ (U+05F1 Yiddish vav yod) from In Progress to Needs review on the Discovery-Search (Current work) board.
Fri, Apr 19, 9:49 PM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch
TJones added a comment to T362501: וי (U+05D5 vav, U+05D9 yod) doesn't find ױ (U+05F1 Yiddish vav yod).

In reading up on the ligatures, I found another ligature (yod-yod-patah ײַ) that has several variants, one using a ligature from above (double-yod + patah ײַ), one with separate characters (yod + yod + patah ייַ), and a less common variant with the patah in the middle (yod + patah + yod יַי). It looks like icu_normalizer already converts the single-character form (ײַ) to one using the double-yod ligature (ײַ).

Fri, Apr 19, 8:40 PM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch
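
The ligature behaviour described in the comment above can be checked with a small sketch. It uses Python's `unicodedata` NFKC normalization as a stand-in for `icu_normalizer` (which also applies case folding, not shown here). The mapping table and the `expand_yiddish_ligatures` helper are hypothetical names for illustration, not the CirrusSearch implementation. The variant with the patah in the middle (yod + patah + yod) is not handled.

```python
import unicodedata

# Hypothetical mapping for illustration: Yiddish ligatures -> component letters.
# Unicode normalization leaves these three code points unchanged, which is why
# an explicit mapping is needed for vav + yod to match U+05F1.
YIDDISH_LIGATURES = {
    "\u05F0": "\u05D5\u05D5",  # double vav -> vav + vav
    "\u05F1": "\u05D5\u05D9",  # vav yod    -> vav + yod
    "\u05F2": "\u05D9\u05D9",  # double yod -> yod + yod
}
_TABLE = str.maketrans(YIDDISH_LIGATURES)


def expand_yiddish_ligatures(text: str) -> str:
    """Normalize with NFKC, then split the Yiddish ligatures into letters."""
    # NFKC already turns U+FB1F (yod-yod-patah) into U+05F2 + U+05B7.
    normalized = unicodedata.normalize("NFKC", text)
    return normalized.translate(_TABLE)


# The single-character form, the ligature + patah form, and the fully
# separate form all end up as the same string.
forms = ["\uFB1F", "\u05F2\u05B7", "\u05D9\u05D9\u05B7"]
print({expand_yiddish_ligatures(f) for f in forms})
```

With this sketch, all three spellings collapse to yod + yod + patah, so a search for one would match the others.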

Thu, Apr 18

TJones renamed T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping from 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping for other languages to 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping.
Thu, Apr 18, 7:06 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
TJones renamed T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping from Enable hiragana/katakana mapping for other languages to 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping for other languages.
Thu, Apr 18, 7:02 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
TJones claimed T362501: וי (U+05D5 vav, U+05D9 yod) doesn't find ױ (U+05F1 Yiddish vav yod).
Thu, Apr 18, 1:59 PM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch
TJones moved T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping from In Progress to To Be Deployed on the Discovery-Search (Current work) board.
Thu, Apr 18, 1:58 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Wed, Apr 17

TJones added a comment to T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping.

While we had planned to expand the deployment of the hiragana-to-katakana mapping from English to most other languages (though not Japanese), testing revealed that doing the mapping pre-tokenization interfered with the new ICU tokenizer's ability to parse Japanese text (on non-Japanese wikis).

Wed, Apr 17, 7:20 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
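
For readers unfamiliar with the mapping discussed above, here is a minimal sketch of what a hiragana-to-katakana character mapping does. It relies on the fact that hiragana U+3041 to U+3096 sit exactly 0x60 below their katakana counterparts. The function name is hypothetical and this is not the CirrusSearch character filter; it only illustrates the kind of pre-tokenization rewrite that, per the comment, interfered with the ICU tokenizer.

```python
# Hiragana U+3041..U+3096 map to katakana U+30A1..U+30F6 by adding 0x60.
HIRAGANA_TO_KATAKANA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def hiragana_to_katakana(text: str) -> str:
    """Replace each hiragana character with its katakana counterpart."""
    # Characters outside the hiragana range are left untouched.
    return text.translate(HIRAGANA_TO_KATAKANA)


print(hiragana_to_katakana("ひらがな"))
```

Because the rewrite happens before tokenization in the scenario described, the tokenizer would see text that is entirely katakana and kanji rather than the original mix of scripts.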

Tue, Apr 16

TJones merged T177876: Investigate changing ICU tokenization from whitelist to blacklist into T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair.
Tue, Apr 16, 2:45 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)
TJones merged task T177876: Investigate changing ICU tokenization from whitelist to blacklist into T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair.
Tue, Apr 16, 2:45 PM · Discovery-Search

Mon, Apr 15

TJones added a comment to T362442: The search In vector-2022 and minerva does not lead to the full destination of the redirect when searching for the exact name.

#REDIRECT [[Page#Anchor]] - going to it should still lead you to the Anchor, not to #top of that page - as is being done here.

Mon, Apr 15, 6:38 PM · Discovery-Search, CirrusSearch
TJones updated the task description for T362310: SUP rate-limit fetch.
Mon, Apr 15, 3:47 PM · serviceops-radar, Discovery-Search (Current work), CirrusSearch
TJones set the point value for T361950: Ensure that WDQS query throttling does not interfere with federation to 3.
Mon, Apr 15, 3:44 PM · Discovery-Search (Current work), Wikidata
TJones updated the task description for T361950: Ensure that WDQS query throttling does not interfere with federation.
Mon, Apr 15, 3:44 PM · Discovery-Search (Current work), Wikidata
TJones added a comment to T362442: The search In vector-2022 and minerva does not lead to the full destination of the redirect when searching for the exact name.

Showing the canonical page title in the suggestion was a design decision that was made for Vector 2022, though it can be confusing when the full title isn't obviously related to the redirect title. T303013 has some potential heuristics for deciding when to show the redirect info. Feel free to chime in over there if you don't feel like your use case would be covered (the specific artificial example here would be covered).

Mon, Apr 15, 2:58 PM · Discovery-Search, CirrusSearch
TJones merged T362442: The search In vector-2022 and minerva does not lead to the full destination of the redirect when searching for the exact name into T303013: Indicate when search results are from redirects (sometimes).
Mon, Apr 15, 2:57 PM · Web-Team-Backlog, Design-System-Team, Codex, Desktop Improvements (Vector 2022)
TJones merged task T362442: The search In vector-2022 and minerva does not lead to the full destination of the redirect when searching for the exact name into T303013: Indicate when search results are from redirects (sometimes).
Mon, Apr 15, 2:56 PM · Discovery-Search, CirrusSearch
TJones added a comment to T362495: Commons search for galleries and categories shows code in the results.

Is this substantially different from T331389? ("...<nowiki> output in search result descriptions")

Mon, Apr 15, 2:39 PM · Discovery-Search, CirrusSearch

Thu, Apr 11

TJones moved T361377: Refactor CirrusSearch AnalysisConfigBuilder Tests & Fixtures from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Thu, Apr 11, 4:21 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Patch-For-Review, Discovery-Search (Current work)

Wed, Apr 10

TJones added a comment to T358495: Enable dotted_I_fix (almost?) everywhere.

Not sure if this task fixes that; lowercasing I and dotted I (İ) returns different lowercase letters.

Wed, Apr 10, 9:44 PM · Patch-For-Review, Discovery-Search (Current work)
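
The lowercasing difference mentioned above can be seen in a short sketch using Python's `str.lower()`. The `dotted_i_fix_lower` helper is hypothetical. It assumes the fix amounts to mapping İ (U+0130) to plain I before lowercasing, which the source does not spell out, and the lowercasing in the actual analysis chain may behave differently from Python's.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE


def naive_lower(text: str) -> str:
    """Plain lowercasing: İ becomes 'i' followed by combining dot above."""
    return text.lower()


def dotted_i_fix_lower(text: str) -> str:
    """Assumed fix: map İ to I first, so lowercasing yields a plain 'i'."""
    return text.replace(DOTTED_CAPITAL_I, "I").lower()


word = "\u0130stanbul"  # İstanbul
print(len(naive_lower(word)), len(dotted_i_fix_lower(word)))
```

Without the mapping, the lowercased word carries an extra combining character (U+0307), so it does not match a query typed as plain "istanbul".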

Tue, Apr 9

TJones claimed T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping.
Tue, Apr 9, 8:28 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
TJones edited projects for T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping, added: Discovery-Search (Current work); removed Discovery-Search.
Tue, Apr 9, 8:27 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Fri, Apr 5

TJones moved T361377: Refactor CirrusSearch AnalysisConfigBuilder Tests & Fixtures from In Progress to Needs review on the Discovery-Search (Current work) board.
Fri, Apr 5, 8:18 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Patch-For-Review, Discovery-Search (Current work)

Wed, Apr 3

TJones changed the point value for T361377: Refactor CirrusSearch AnalysisConfigBuilder Tests & Fixtures from 5 to 3.
Wed, Apr 3, 7:20 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Patch-For-Review, Discovery-Search (Current work)
TJones moved T361377: Refactor CirrusSearch AnalysisConfigBuilder Tests & Fixtures from Incoming to In Progress on the Discovery-Search (Current work) board.
Wed, Apr 3, 5:01 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Patch-For-Review, Discovery-Search (Current work)
TJones claimed T361377: Refactor CirrusSearch AnalysisConfigBuilder Tests & Fixtures.
Wed, Apr 3, 5:01 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Patch-For-Review, Discovery-Search (Current work)
TJones moved T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping from Language Stuff to needs triage on the Discovery-Search board.
Wed, Apr 3, 5:00 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
TJones placed T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping up for grabs.
Wed, Apr 3, 4:59 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
TJones moved T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping from In Progress to Incoming on the Discovery-Search (Current work) board.
Wed, Apr 3, 4:58 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
TJones added a comment to T358495: Enable dotted_I_fix (almost?) everywhere.

Not getting automated tags for some reason, but this is included in 1.42.0-wmf.25, so it will be deployed soon.

Wed, Apr 3, 4:56 PM · Patch-For-Review, Discovery-Search (Current work)
TJones claimed T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping.
Wed, Apr 3, 3:33 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
TJones edited projects for T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping, added: Discovery-Search (Current work); removed Discovery-Search, Discovery-ARCHIVED.
Wed, Apr 3, 3:31 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
TJones moved T180387: 𝖤̶𝗇̶𝖺̶𝖻̶𝗅̶𝖾̶ Disable hiragana/katakana mapping from Incoming to In Progress on the Discovery-Search (Current work) board.
Wed, Apr 3, 3:31 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Fri, Mar 29

TJones created T361377: Refactor CirrusSearch AnalysisConfigBuilder Tests & Fixtures.
Fri, Mar 29, 4:05 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Patch-For-Review, Discovery-Search (Current work)
TJones triaged T359100: Analyze results of harmonization as High priority.
Fri, Mar 29, 2:47 PM · Discovery-Search (Current work)

Thu, Mar 28

TJones updated the task description for T219550: [EPIC] Harmonize language analysis across languages.
Thu, Mar 28, 9:44 PM · MW-1.41-notes (1.41.0-wmf.20; 2023-08-01), Discovery-Search (Current work), Epic
TJones moved T359100: Analyze results of harmonization from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

Full write-up (and it's a lot!) is on MediaWiki.

Thu, Mar 28, 9:33 PM · Discovery-Search (Current work)
TJones moved T358495: Enable dotted_I_fix (almost?) everywhere from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Thu, Mar 28, 2:24 PM · Patch-For-Review, Discovery-Search (Current work)

Mar 21 2024

TJones moved T359100: Analyze results of harmonization from Incoming to In Progress on the Discovery-Search (Current work) board.
Mar 21 2024, 12:01 AM · Discovery-Search (Current work)

Mar 19 2024

TJones added a comment to T358495: Enable dotted_I_fix (almost?) everywhere.

The full write-up is on MediaWiki.

Mar 19 2024, 9:29 PM · Patch-For-Review, Discovery-Search (Current work)
TJones claimed T353377: CirrusSearchIndexTooOld.

This is done for commons and wikidata for the production clusters (eqiad and codfw) as a result of T342444. (wikidata hasn't reindexed in cloudelastic yet, but it is in the queue.)

Mar 19 2024, 4:32 PM · Discovery-Search (Current work)

Mar 18 2024

TJones added a project to T358495: Enable dotted_I_fix (almost?) everywhere: Patch-For-Review.
Mar 18 2024, 9:22 PM · Patch-For-Review, Discovery-Search (Current work)
TJones moved T358495: Enable dotted_I_fix (almost?) everywhere from In Progress to Needs review on the Discovery-Search (Current work) board.
Mar 18 2024, 9:11 PM · Patch-For-Review, Discovery-Search (Current work)
TJones added a comment to T358495: Enable dotted_I_fix (almost?) everywhere.

Patch for review: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1011442

Mar 18 2024, 9:11 PM · Patch-For-Review, Discovery-Search (Current work)
TJones reassigned T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair from TJones to EBernhardson.

Swapping Assignee and Other Assignee with Erik, since he's still working on reindexing cloudelastic via the new update pipeline backfill mechanism.

Mar 18 2024, 3:33 PM · Discovery-Search (Current work)

Mar 6 2024

TJones updated Other Assignee for T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair, added: EBernhardson.

Adding Erik as "other assignee" (never done that before) and increasing the points because Erik is doing more than usual for reindexing (watching the cloudelastic reindex using the new update pipeline backfilling mechanism), and I've been doing more than usual, gathering stats while fretting a bit over reindex speed.

Mar 6 2024, 8:33 PM · Discovery-Search (Current work)

Mar 4 2024

TJones created T359100: Analyze results of harmonization.
Mar 4 2024, 7:50 PM · Discovery-Search (Current work)
TJones created T359092: Requesting access to kubernetes deployment for tjones.
Mar 4 2024, 6:01 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24), SRE, SRE-Access-Requests
TJones moved T332337: Repair multi-script tokens split by the ICU tokenizer from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 4 2024, 4:15 PM · Discovery-Search (Current work)
TJones moved T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 4 2024, 4:15 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)
TJones claimed T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair.
Mar 4 2024, 4:11 PM · Discovery-Search (Current work)
TJones moved T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair from Blocked/Waiting to In Progress on the Discovery-Search (Current work) board.
Mar 4 2024, 4:10 PM · Discovery-Search (Current work)
TJones added a comment to T356651: Rebuild and deploy textify plugin.

@RKemper, it does look like it's deployed everywhere it should be!

Mar 4 2024, 1:41 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Discovery-Search (Current work)

Feb 27 2024

TJones awarded T357473: Divehi wiki search button is misplaced on page load a Like token.
Feb 27 2024, 4:15 PM · Local-Wiki-Template-And-Gadget-Issues, Desktop Improvements (Vector 2022)

Feb 26 2024

TJones moved T358495: Enable dotted_I_fix (almost?) everywhere from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board.
Feb 26 2024, 3:38 PM · Patch-For-Review, Discovery-Search (Current work)
TJones claimed T358495: Enable dotted_I_fix (almost?) everywhere.

I prioritized this task to have a smaller task to work on as a break after the ginormous T332337 and T356643, and to have something more interruptible to work on while T342444 is running in the background.

Feb 26 2024, 3:38 PM · Patch-For-Review, Discovery-Search (Current work)
TJones updated the task description for T219550: [EPIC] Harmonize language analysis across languages.
Feb 26 2024, 3:33 PM · MW-1.41-notes (1.41.0-wmf.20; 2023-08-01), Discovery-Search (Current work), Epic
TJones created T358495: Enable dotted_I_fix (almost?) everywhere.
Feb 26 2024, 3:31 PM · Patch-For-Review, Discovery-Search (Current work)
TJones triaged T332342: Standardize ASCII-folding/ICU-folding across analyzers as High priority.
Feb 26 2024, 3:03 PM · Discovery-Search
TJones moved T332342: Standardize ASCII-folding/ICU-folding across analyzers from needs triage to Language Stuff on the Discovery-Search board.
Feb 26 2024, 3:03 PM · Discovery-Search
TJones placed T332342: Standardize ASCII-folding/ICU-folding across analyzers up for grabs.

Moving this back to the backlog in favor of a smaller next harmonization project.

Feb 26 2024, 3:02 PM · Discovery-Search

Feb 21 2024

TJones moved T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Feb 21 2024, 11:08 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)
TJones moved T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair from In Progress to Needs review on the Discovery-Search (Current work) board.
Feb 21 2024, 11:01 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)

Feb 20 2024

TJones changed the point value for T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair from 5 to 8.

Full write-up on MediaWiki.

Feb 20 2024, 10:21 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)

Feb 13 2024

TJones updated the task description for T357473: Divehi wiki search button is misplaced on page load.
Feb 13 2024, 9:25 PM · Local-Wiki-Template-And-Gadget-Issues, Desktop Improvements (Vector 2022)
TJones created T357473: Divehi wiki search button is misplaced on page load.
Feb 13 2024, 9:22 PM · Local-Wiki-Template-And-Gadget-Issues, Desktop Improvements (Vector 2022)

Feb 6 2024

TJones added a comment to T356651: Rebuild and deploy textify plugin.

T332337 has been committed, so this is ready to go.

Feb 6 2024, 4:21 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Discovery-Search (Current work)
TJones moved T332337: Repair multi-script tokens split by the ICU tokenizer from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Feb 6 2024, 4:19 PM · Discovery-Search (Current work)

Feb 5 2024

TJones updated the task description for T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair.
Feb 5 2024, 3:56 PM · Discovery-Search (Current work)
TJones added a parent task for T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair: T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair.
Feb 5 2024, 3:55 PM · Discovery-Search (Current work)
TJones added a subtask for T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair: T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair.
Feb 5 2024, 3:55 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)
TJones renamed T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair from Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, and word_break_helper to Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair.
Feb 5 2024, 3:55 PM · Discovery-Search (Current work)
TJones updated the task description for T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair.
Feb 5 2024, 3:50 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)
TJones added a subtask for T356651: Rebuild and deploy textify plugin: T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair.
Feb 5 2024, 3:49 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Discovery-Search (Current work)
TJones removed a subtask for T332337: Repair multi-script tokens split by the ICU tokenizer: T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair.
Feb 5 2024, 3:49 PM · Discovery-Search (Current work)
TJones edited parent tasks for T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair, added: T356651: Rebuild and deploy textify plugin; removed: T332337: Repair multi-script tokens split by the ICU tokenizer.
Feb 5 2024, 3:49 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)
TJones created T356651: Rebuild and deploy textify plugin.
Feb 5 2024, 3:48 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Discovery-Search (Current work)
TJones renamed T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair from Update AnalysisConfigBuilder to use icu_token_repair to Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair.
Feb 5 2024, 3:45 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)
TJones updated the task description for T332337: Repair multi-script tokens split by the ICU tokenizer.
Feb 5 2024, 2:33 PM · Discovery-Search (Current work)
TJones changed the status of T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair, a subtask of T332337: Repair multi-script tokens split by the ICU tokenizer, from Open to In Progress.
Feb 5 2024, 2:32 PM · Discovery-Search (Current work)
TJones changed the status of T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair from Open to In Progress.
Feb 5 2024, 2:32 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)
TJones created T356643: Enable icu_tokenizer (almost) everywhere and update AnalysisConfigBuilder to use icu_token_repair.
Feb 5 2024, 2:31 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Discovery-Search (Current work)

Jan 26 2024

TJones added a comment to T332337: Repair multi-script tokens split by the ICU tokenizer.

More detailed write-up (which partially overlaps the plugin docs) on MediaWiki.

Jan 26 2024, 9:34 PM · Discovery-Search (Current work)

Jan 24 2024

TJones added a comment to T332337: Repair multi-script tokens split by the ICU tokenizer.

Gerrit patch for the plugin (which wasn't added here automatically): https://gerrit.wikimedia.org/r/c/search/extra/+/972478

Jan 24 2024, 5:44 PM · Discovery-Search (Current work)
TJones moved T332337: Repair multi-script tokens split by the ICU tokenizer from In Progress to Needs review on the Discovery-Search (Current work) board.
Jan 24 2024, 3:45 PM · Discovery-Search (Current work)

Dec 5 2023

TJones renamed T311051: Missing space between paragraphs in extract received using API (all wikis) from Missing space between paragraphs in extract received using API (cswiki) to Missing space between paragraphs in extract received using API (all wikis).
Dec 5 2023, 9:52 PM · TextExtracts
TJones added a comment to T311051: Missing space between paragraphs in extract received using API (all wikis).

This happens across all wikis, not just cswiki.

Dec 5 2023, 9:51 PM · TextExtracts

Dec 4 2023

TJones renamed T352538: [EPIC] Evaluate the impact of the graph split from Evaluate the impact of the graph split to [EPIC] Evaluate the impact of the graph split.
Dec 4 2023, 4:36 PM · Epic, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
TJones moved T352538: [EPIC] Evaluate the impact of the graph split from Incoming to Epics on the Discovery-Search (Current work) board.
Dec 4 2023, 4:36 PM · Epic, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
TJones changed the point value for T332337: Repair multi-script tokens split by the ICU tokenizer from 8 to 13.
Dec 4 2023, 4:17 PM · Discovery-Search (Current work)

Nov 13 2023

TJones added a comment to T350974: search/glent fails on Java 11.

Not sure what to do about Spark, but the Java 11 failure is arguably a feature, not a bug! The script that changed is Adlam, and Java 11 got smarter about it. Some of the other changes listed there make me wonder what other texty corner cases are going to be affected by the upgrade.

Nov 13 2023, 7:37 PM · Discovery-Search (Current work), ci-test-error
TJones updated the task description for T351040: Re-implement the REST endpoint for related pages in PHP.
Nov 13 2023, 4:17 PM · Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, RESTBase Sunsetting

Oct 30 2023

TJones moved T346051: Refactor slow global analysis components from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Oct 30 2023, 4:04 PM · Discovery-Search (Current work)

Oct 26 2023

TJones claimed T332337: Repair multi-script tokens split by the ICU tokenizer.
Oct 26 2023, 8:35 PM · Discovery-Search (Current work)
TJones moved T332337: Repair multi-script tokens split by the ICU tokenizer from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board.
Oct 26 2023, 8:34 PM · Discovery-Search (Current work)
TJones awarded T349827: mediawiki.util "debounce (old signature)" test occasionally fails a Like token.
Oct 26 2023, 3:17 PM · MediaWiki-General, ci-test-error (WMF-deployed Build Failure)

Oct 24 2023

TJones added a comment to T346051: Refactor slow global analysis components.

Dev notes and details on MediaWiki.

Oct 24 2023, 10:21 PM · Discovery-Search (Current work)

Oct 23 2023

TJones moved T346051: Refactor slow global analysis components from In Progress to Needs review on the Discovery-Search (Current work) board.
Oct 23 2023, 8:14 PM · Discovery-Search (Current work)