TJones (Trey Jones)
Sr. Software Engineer, Search Platform Team


User Details

User Since: Jul 8 2015, 3:02 PM (184 w, 6 d)
Availability: Available
IRC Nick: Trey314159
LDAP User: Tjones
MediaWiki User: TJones (WMF)

I would have written a shorter comment, but I did not have the time.

I'm part of the Search Platform team and I spend my time working on search & relevance, trying to better support search in various languages, analyzing queries, and doing random mathy things. I tend to write long, detailed notes about my investigations (so as to improve the bus number of my work).

When I have to work on _GitHub,_ /‍‍/Phab,/‍‍/ and ''MediaWiki'' all on the same day, I sometimes suffer Severe Markup Incongruence Fatigue.

I ♥ Unicode.

Recent Activity

Today

TJones added a parent task for T214439: Review Manually re-built Hebmorph plugin: T194849: Investigate language analyzers in ElasticSearch 6.
Tue, Jan 22, 9:20 PM · Discovery-Search (Current work)
TJones added a subtask for T194849: Investigate language analyzers in ElasticSearch 6: T214439: Review Manually re-built Hebmorph plugin.
Tue, Jan 22, 9:20 PM · Discovery-Search (Current work)
TJones created T214439: Review Manually re-built Hebmorph plugin.
Tue, Jan 22, 9:19 PM · Discovery-Search (Current work)

Fri, Jan 18

TJones added a project to T213936: Deploy new version of TextCat: Discovery-Search (Current work).
Fri, Jan 18, 3:50 PM · Discovery-Search (Current work), Discovery
TJones moved T213931: Update TextCat with wrong-keyboard models from Needs review to Done on the Discovery-Search (Current work) board.
Fri, Jan 18, 3:49 PM · Patch-For-Review, Discovery-Search (Current work), Discovery

Thu, Jan 17

TJones added a comment to T213931: Update TextCat with wrong-keyboard models.

Updated Perl models on GitHub. PHP models are awaiting review in the patch above.

Thu, Jan 17, 6:23 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones removed projects from T213935: Revert changes to TextCat that add dependency on autoload.php: Patch-For-Review, Discovery-Search (Current work).
Thu, Jan 17, 2:28 PM · Discovery
TJones closed T213935: Revert changes to TextCat that add dependency on autoload.php as Declined.

Thanks for the info and explanation, @Smalyshev. I should have waited for your reply.

Thu, Jan 17, 2:27 PM · Discovery
TJones closed T213935: Revert changes to TextCat that add dependency on autoload.php, a subtask of T213931: Update TextCat with wrong-keyboard models, as Declined.
Thu, Jan 17, 2:27 PM · Patch-For-Review, Discovery-Search (Current work), Discovery

Wed, Jan 16

TJones added a comment to T213959: Decide order of operations for elastic 6 upgrade.

I agree that a custom version of Elastic 5.5.2 seems iffy, so unless someone has a strong argument for it, we should skip that option. The proposed "re-order deployment" plan seems reasonable.

Wed, Jan 16, 8:03 PM · MW-1.33-notes (1.33.0-wmf.12; 2019-01-08), Patch-For-Review, Discovery-Search (Current work)
TJones moved T213931: Update TextCat with wrong-keyboard models from In progress to Needs review on the Discovery-Search (Current work) board.
Wed, Jan 16, 7:49 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T213935: Revert changes to TextCat that add dependency on autoload.php from In progress to Needs review on the Discovery-Search (Current work) board.
Wed, Jan 16, 7:43 PM · Discovery
TJones renamed T213935: Revert changes to TextCat that add dependency on autoload.php from Revert changes to TextCat that make MediaWiki a dependency to Revert changes to TextCat that add dependency on autoload.php.
Wed, Jan 16, 3:56 PM · Discovery
TJones added a comment to T213935: Revert changes to TextCat that add dependency on autoload.php.

/vendor/autoload.php is provided by composer, not MediaWiki, isn't it?

Wed, Jan 16, 3:53 PM · Discovery
TJones moved T213931: Update TextCat with wrong-keyboard models from Backlog to In progress on the Discovery-Search (Current work) board.
Wed, Jan 16, 3:39 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T213935: Revert changes to TextCat that add dependency on autoload.php from Backlog to In progress on the Discovery-Search (Current work) board.
Wed, Jan 16, 3:39 PM · Discovery
TJones triaged T213936: Deploy new version of TextCat as High priority.
Wed, Jan 16, 3:39 PM · Discovery-Search (Current work), Discovery
TJones triaged T213935: Revert changes to TextCat that add dependency on autoload.php as High priority.
Wed, Jan 16, 3:36 PM · Discovery
TJones triaged T213931: Update TextCat with wrong-keyboard models as High priority.
Wed, Jan 16, 3:14 PM · Patch-For-Review, Discovery-Search (Current work), Discovery

Tue, Jan 15

TJones added a comment to T213105: Run wikidata entity autocomplete optimizer for more languages.

Very nice! This is one of those subtle improvements that doesn't make a ton of difference to any one search (one whole character saved!) but adds up over all the users who will benefit. Good stuff!

Tue, Jan 15, 10:22 PM · Discovery-Search (Current work)
TJones added a comment to T205746: Cleanup wikidata autocomplete logs.

That sounds reasonable. If it seems to be causing a problem in the future, we know where this ticket is.

Tue, Jan 15, 7:29 PM · User-Smalyshev, Discovery-Search (Current work)

Mon, Jan 14

TJones moved T212885: NLP contractor set up and access from In progress to Waiting/Blocked on the Discovery-Search (Current work) board.
Mon, Jan 14, 6:32 PM · Discovery-Search (Current work)

Tue, Jan 8

TJones added a comment to T212885: NLP contractor set up and access.

@EBernhardson, I've reviewed all the files. I made copies (adding _edit) and deleted a few lines from three of them. I'm assuming the various id fields are MD5 hashes or similar and not something that can be decoded.

Tue, Jan 8, 3:49 PM · Discovery-Search (Current work)

Thu, Jan 3

TJones edited projects for T212891: [EPIC-ish][Milestone 3] Implement NLP Search Suggestion Method 2 for CJK languages, added: Discovery-Search; removed Discovery-Search (Current work).
Thu, Jan 3, 8:10 PM · Chinese-Sites, Discovery-Search, Epic
TJones edited projects for T212888: [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 0 for English, added: Discovery-Search; removed Discovery-Search (Current work).
Thu, Jan 3, 8:10 PM · Discovery-Search, Epic
TJones edited projects for T212889: [EPIC-ish][Milestone 2] Implement NLP Search Suggestion Method 1 for 10 languages, added: Discovery-Search; removed Discovery-Search (Current work).
Thu, Jan 3, 8:09 PM · Discovery-Search, Epic
TJones updated the task description for T212891: [EPIC-ish][Milestone 3] Implement NLP Search Suggestion Method 2 for CJK languages.
Thu, Jan 3, 8:08 PM · Chinese-Sites, Discovery-Search, Epic
TJones triaged T212889: [EPIC-ish][Milestone 2] Implement NLP Search Suggestion Method 1 for 10 languages as Normal priority.
Thu, Jan 3, 8:08 PM · Discovery-Search, Epic
TJones updated the task description for T212884: [EPIC] Improve Search Suggestions with NLP.
Thu, Jan 3, 8:07 PM · Epic, Discovery-Search (Current work)
TJones triaged T212891: [EPIC-ish][Milestone 3] Implement NLP Search Suggestion Method 2 for CJK languages as Normal priority.
Thu, Jan 3, 8:06 PM · Chinese-Sites, Discovery-Search, Epic
TJones updated the task description for T212889: [EPIC-ish][Milestone 2] Implement NLP Search Suggestion Method 1 for 10 languages.
Thu, Jan 3, 8:05 PM · Discovery-Search, Epic
TJones created T212889: [EPIC-ish][Milestone 2] Implement NLP Search Suggestion Method 1 for 10 languages.
Thu, Jan 3, 8:04 PM · Discovery-Search, Epic
TJones triaged T212888: [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 0 for English as Normal priority.
Thu, Jan 3, 8:01 PM · Discovery-Search, Epic
TJones added a comment to T212885: NLP contractor set up and access.

@EBernhardson, Let me know if you want specific sub-tickets for any of these tasks. I'm okay with just checking them off as they get done and adding new ones as needed.

Thu, Jan 3, 7:54 PM · Discovery-Search (Current work)
TJones added a comment to T212884: [EPIC] Improve Search Suggestions with NLP.

@Julia.glen, you should be able to edit the task description if anything needs to be adjusted or corrected. (If not, let me know and I can make any edits).

Thu, Jan 3, 7:53 PM · Epic, Discovery-Search (Current work)
TJones moved T212885: NLP contractor set up and access from Backlog to In progress on the Discovery-Search (Current work) board.
Thu, Jan 3, 7:52 PM · Discovery-Search (Current work)
TJones updated the task description for T212884: [EPIC] Improve Search Suggestions with NLP.
Thu, Jan 3, 7:52 PM · Epic, Discovery-Search (Current work)
TJones triaged T212885: NLP contractor set up and access as Normal priority.
Thu, Jan 3, 7:51 PM · Discovery-Search (Current work)
TJones created T212884: [EPIC] Improve Search Suggestions with NLP.
Thu, Jan 3, 7:50 PM · Epic, Discovery-Search (Current work)

Dec 19 2018

TJones added a comment to T204688: cloudvps: shiny-r project trusty deprecation.

Done. Just tested everything and it's all good, so I've deleted the instances running Ubuntu Trusty. The only instances up are running Debian Stretch.

Dec 19 2018, 7:52 PM · Cloud-VPS (Ubuntu Trusty Deprecation)

Dec 18 2018

TJones added a comment to T211824: Investigate a “rare-character” index.

Another angle (from the Extension talk page): it might be useful to find articles with no rare characters (still looking for a concrete use case), so it makes sense in this initial investigation to track how many articles have no rare characters to see how well such a theoretical search conjunct would limit the scope of the more expensive part of a query.

Dec 18 2018, 8:23 PM · Discovery-Search

Dec 17 2018

TJones added a comment to T211824: Investigate a “rare-character” index.

Another idea from the Extension talk page: something equivalent to char:emoji to help people find weird editing errors and vandalism. See T59884, T126047, and T129310 for cases of weird editing bugs generating emoji.

Dec 17 2018, 10:36 PM · Discovery-Search
TJones added a comment to T211824: Investigate a “rare-character” index.

Another good use case from the Wiktionary discussion is regex searches without trigrams that the regex search acceleration can latch on to. As a use case, suppose you have a complex regex with no easy trigrams, but centered on finding specific cases of zero width non-joiners (ZWNJs). Adding a char:[ZWNJ] clause to the search vastly limits the universe of documents to be scanned with the much more complex regex, down to something that might finish before it times out.

Dec 17 2018, 9:11 PM · Discovery-Search
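
The prefiltering idea in the comment above can be sketched roughly as follows. This is only an illustration under assumptions: the `char:` keyword does not exist yet, and `build_char_index` and `filtered_regex_search` are hypothetical names, not CirrusSearch code. The sketch records which documents contain a given rare character, then runs the expensive regex only over those documents.

```python
import re
from collections import defaultdict

ZWNJ = "\u200c"  # zero width non-joiner

def build_char_index(docs, rare_chars):
    """Map each rare character to the set of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for ch in rare_chars:
            if ch in text:
                index[ch].add(doc_id)
    return index

def filtered_regex_search(docs, index, required_char, pattern):
    """Run the (expensive) regex only over docs containing required_char."""
    candidates = index.get(required_char, set())
    regex = re.compile(pattern)
    return sorted(d for d in candidates if regex.search(docs[d]))

# Toy corpus (hypothetical data).
docs = {
    1: "plain text with no special characters",
    2: "mi\u200cxaham",   # ZWNJ between letters
    3: "ab\u200c cd",     # ZWNJ followed by a space
}
index = build_char_index(docs, [ZWNJ])

# Find ZWNJs immediately followed by whitespace; doc 1 is never scanned.
hits = filtered_regex_search(docs, index, ZWNJ, ZWNJ + r"\s")
print(hits)
```

In a real index the candidate set would come from the search engine rather than an in-memory dictionary, but the cost argument is the same: the regex only touches documents that already passed the cheap character filter.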
TJones added a comment to T211824: Investigate a “rare-character” index.

I've been thinking for a while that having synonyms would be handy, for abbreviations too, for instance. The problem is how to curate such a list, and how to scale that to multiple languages. For English symbols you could easily source it from the Unicode definition table, of course, but other languages? Maybe generate mappings by feeding symbols through a Wikidata mapping and getting their labels? Hmm, Wikidata also has an abbreviation property, of course...

Dec 17 2018, 8:32 PM · Discovery-Search
TJones added a comment to T211824: Investigate a “rare-character” index.
Dec 17 2018, 3:11 PM · Discovery-Search
TJones added a comment to T95849: Search for unicode symbols like ★ is inconsistent and unpredictable.

It is indeed inconsistent, but it is predictable if you have spent waaaaaay too much time digging into all this. The short version is that the standard tokenizer—which breaks text into words—used by most analyzers (for languages with spaces) for regular search, treats "symbols" like ★ as non-word characters. It's the stuff between words, like whitespace and punctuation.

Dec 17 2018, 3:06 PM · Discovery, CirrusSearch
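
As a rough analogy for the behavior described above, the sketch below uses Python's `\w+` regex as a stand-in for a word tokenizer. It is not the Elasticsearch standard tokenizer, and `simple_tokenize` is a hypothetical name; it only shows how a symbol like ★ ends up being treated as "stuff between words" and never becomes a searchable token.

```python
import re

def simple_tokenize(text):
    """Split text into lowercase word tokens.

    Anything that is not a letter, digit, or underscore (whitespace,
    punctuation, and symbols like ★) acts as a separator and is dropped.
    """
    return re.findall(r"\w+", text.lower())

print(simple_tokenize("Five ★ review"))
print(simple_tokenize("★"))
```

With a tokenizer like this, a query consisting only of ★ produces no tokens at all, so there is nothing to match against the index.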
TJones updated the task description for T95849: Search for unicode symbols like ★ is inconsistent and unpredictable.
Dec 17 2018, 2:59 PM · Discovery, CirrusSearch

Dec 14 2018

TJones added a comment to T211824: Investigate a “rare-character” index.

Another thought from Wiktionary: searching for rare characters in titles (especially zero-width non-joiners, directionality markers, soft hyphens, punctuation/whitespace outside of the Basic Latin Block, combining diacritics, etc.) would be useful. So maybe a titles-only index would be nice, too.

Dec 14 2018, 8:48 PM · Discovery-Search
TJones added a comment to T211824: Investigate a “rare-character” index.

I've opened conversations on English Wikipedia (moved) and Wiktionary, and on Commons.

Dec 14 2018, 6:50 PM · Discovery-Search
TJones added a comment to T211824: Investigate a “rare-character” index.

Additional notes from the on-wiki discussion:

Dec 14 2018, 3:32 PM · Discovery-Search

Dec 12 2018

TJones created T211824: Investigate a “rare-character” index.
Dec 12 2018, 9:52 PM · Discovery-Search

Dec 11 2018

TJones closed T59832: Fails to find same word with an apostrophe before (French usage) as Resolved.

The problem here is that the language rules are customized for the wiki's language. Elision is handled in French but not English.

Dec 11 2018, 6:54 PM · Discovery-Search, Discovery, CirrusSearch

Dec 7 2018

TJones added a comment to T211033: Analyze wbsearchentities AB test from nov/doc.

Thanks for the report. It is odd that the number of characters didn't go down—as discussed elsewhere—but the change in clicks@1 vs clicks@2 is a nice clear step in a good direction.

Dec 7 2018, 9:22 PM · Discovery-Search (Current work), CirrusSearch, Wikidata, Discovery
TJones added a comment to T208917: Build pipeline to transform elastic explains into feature vectors and a tf graph.

Overall, I still think that while the model used for the tuning metric doesn't exactly match the users, it is a useful metric to improve upon.

Dec 7 2018, 8:40 PM · Patch-For-Review, Discovery-Search (Current work)

Dec 5 2018

TJones added a comment to T211201: Investigate possibility to show the generated history.

@MichaelSchoenitzer_WMDE —thanks for opening this ticket! It would be great to be able to see the generated query!

Dec 5 2018, 2:32 PM · Advanced-Search, TCB-Team

Nov 27 2018

TJones added a comment to T155104: Detect "wrong keyboard" queries for Hebrew/American keyboards on EN/HE Wikipedias.

Bummer. The DWIM edit works, in that it generates wrong-keyboard suggestions in the main search box on Special:Search, but it interacts improperly with OOUI and creates two sets of suggestions, so it needs to be undone.

Nov 27 2018, 7:09 PM · Discovery-Search, Discovery
TJones moved T209537: Review a few more current metrics for accuracy from Backlog to Tests & Analysis on the Discovery-Search (Current work) board.
Nov 27 2018, 6:23 PM · Product-Analytics, Discovery-Search (Current work)
TJones added a comment to F27316679: wbsearchentities tuning analysis.pdf.

Interesting report—nice ways to slice the very messy optimization data.

Nov 27 2018, 3:29 PM

Nov 21 2018

TJones added a comment to T138958: Detect "wrong keyboard" queries for Russian/American keyboards on EN/RU Wikipedias.

The Russian DWIM gadget has been updated! It works better with capital letters, and works on the main search input on the Special:Search page.

Nov 21 2018, 4:24 PM · Discovery-Search (Current work), Discovery
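
The gadget's own code isn't shown here, but the basic wrong-keyboard idea can be sketched as a key-position mapping between layouts. This is a minimal sketch, assuming the standard US QWERTY and Russian ЙЦУКЕН layouts; `fix_wrong_keyboard` is a hypothetical name, and shifted punctuation keys are deliberately left out.

```python
# Keys in the same physical positions on the two layouts.
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
JCUKEN = "йцукенгшщзхъфывапролджэячсмитьбю"

LATIN_TO_CYRILLIC = dict(zip(QWERTY, JCUKEN))
# Capital letters map to capital letters (letter keys only).
LATIN_TO_CYRILLIC.update(
    {a.upper(): b.upper() for a, b in zip(QWERTY, JCUKEN) if a.isalpha()}
)

def fix_wrong_keyboard(query):
    """Re-map a query typed on a US keyboard to the Russian layout.

    Characters with no mapping (spaces, digits, etc.) pass through as-is.
    """
    return "".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in query)

print(fix_wrong_keyboard("ghbdtn"))   # typed with the wrong layout active
print(fix_wrong_keyboard("Vjcrdf"))
```

A real suggester would also need to decide when a query looks like a wrong-keyboard query in the first place; this sketch only covers the re-mapping step.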

Nov 20 2018

TJones added a comment to T138958: Detect "wrong keyboard" queries for Russian/American keyboards on EN/RU Wikipedias.

I've left some comments on the discussion page for the DWIM Gadget suggesting changes that should improve the performance of the gadget and make it work on the main search field on Special:Search.

Nov 20 2018, 9:32 PM · Discovery-Search (Current work), Discovery
TJones added a comment to T155104: Detect "wrong keyboard" queries for Hebrew/American keyboards on EN/HE Wikipedias.

Related to this, @Amire80 was able to update the Hebrew DWIM gadget for me so it would work (or work again, I think) on the search box on the Special:Search page, so we effectively have wrong-keyboard detection for the completion suggester in the upper corner, and in the main search input on the search page.

Nov 20 2018, 9:20 PM · Discovery-Search, Discovery

Nov 17 2018

TJones updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Nov 17 2018, 4:18 PM · Discovery-Search (Current work), Discovery
TJones moved T209156: Re-index Chinese Wikis to fix Surrogate Split from In progress to Done on the Discovery-Search (Current work) board.

All done!

Nov 17 2018, 4:17 PM · Discovery-Search (Current work), Chinese-Sites, Discovery
TJones moved T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish] from Waiting/Blocked to Done on the Discovery-Search (Current work) board.

Reindexing for the live search cluster (eqiad) is complete, and the example link now gives 46 results instead of ~94K. The spare cluster (codfw) is still running, so I won't move the re-indexing task to done until it has finished.

Nov 17 2018, 3:27 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery

Nov 16 2018

TJones renamed T209156: Re-index Chinese Wikis to fix Surrogate Split from Re-index Chinese Wikis to Re-index Chinese Wikis to fix Surrogate Split.
Nov 16 2018, 8:25 PM · Discovery-Search (Current work), Chinese-Sites, Discovery
TJones added a comment to T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish].

Almost done. Reindexing (T209156) is still in progress. The smaller wikis (Wikiversity, Wikiquote, Wikibooks, Wikivoyage, Wikinews) are done and checked and everything looks good so far. Wikisource, Wiktionary, and Wikipedia are still processing.

Nov 16 2018, 8:11 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery
TJones moved T209156: Re-index Chinese Wikis to fix Surrogate Split from Backlog to In progress on the Discovery-Search (Current work) board.
Nov 16 2018, 3:48 PM · Discovery-Search (Current work), Chinese-Sites, Discovery

Nov 14 2018

TJones triaged T209537: Review a few more current metrics for accuracy as Normal priority.
Nov 14 2018, 8:56 PM · Product-Analytics, Discovery-Search (Current work)
TJones moved T197128: Review current search metrics for accuracy and documentation from In progress to Done on the Discovery-Search (Current work) board.

I think we can close this ticket. The general consensus seems to be that the "anomalies" are mostly not errors and are just unexplained variation in usage patterns, which we don't necessarily need to track down. (There's one that still bothers me, though.)

Nov 14 2018, 8:50 PM · Patch-For-Review, Product-Analytics, Discovery-Search (Current work)

Nov 13 2018

TJones placed T178923: Review Japanese Morphological Libraries up for grabs.
Nov 13 2018, 6:51 PM · Discovery-Search, Discovery
TJones placed T178924: Review Vietnamese Morphological Libraries up for grabs.
Nov 13 2018, 6:51 PM · Discovery-Search, Discovery
TJones moved T177888: Review use of CJK vs ICU default language analyzers for "Chinese" Wikis from Tech Debt/Misc to Later on the Discovery-Search board.
Nov 13 2018, 6:51 PM · Chinese-Sites, Discovery-Search
TJones moved T185721: Null or inconsistent search results using Khmer script from Up Next to Later on the Discovery-Search board.
Nov 13 2018, 6:47 PM · CirrusSearch, Discovery, Discovery-Search
TJones moved T186401: searching I as 1 in Kabardian Wikipedia from Up Next to Later on the Discovery-Search board.
Nov 13 2018, 6:47 PM · Elasticsearch, Discovery-Search, I18n, Discovery
TJones moved T203117: Greek language analysis generates unexpected empty tokens from Up Next to Later on the Discovery-Search board.
Nov 13 2018, 6:47 PM · Discovery-Search
TJones added a comment to T178923: Review Japanese Morphological Libraries.

We've moved on to other tasks and aren't spending time looking at morphological libraries these days.

Nov 13 2018, 6:45 PM · Discovery-Search, Discovery
TJones added a comment to T178924: Review Vietnamese Morphological Libraries.

We've moved on to other tasks and aren't spending time looking at morphological libraries these days.

Nov 13 2018, 6:40 PM · Discovery-Search, Discovery
TJones moved T178924: Review Vietnamese Morphological Libraries from Up Next to Later on the Discovery-Search board.
Nov 13 2018, 6:33 PM · Discovery-Search, Discovery
TJones moved T178923: Review Japanese Morphological Libraries from Up Next to Later on the Discovery-Search board.
Nov 13 2018, 6:33 PM · Discovery-Search, Discovery
TJones moved T138958: Detect "wrong keyboard" queries for Russian/American keyboards on EN/RU Wikipedias from Backlog to In progress on the Discovery-Search (Current work) board.
Nov 13 2018, 6:21 PM · Discovery-Search (Current work), Discovery
TJones added a comment to T209348: Port the elasticsearch plugin extra-analysis-surrogates to 6.4.2 as a noop plugin.

Is this necessary? If we have to reindex everything when we upgrade to ES 6, then we should be okay, because the analysis config builder checks for the presence of extra-analysis-surrogates and will configure itself correctly without it. Or is there some interim stage of the upgrade where it needs to exist?

Nov 13 2018, 3:12 PM · Patch-For-Review, Discovery-Search (Current work)

Nov 9 2018

TJones added a comment to T209156: Re-index Chinese Wikis to fix Surrogate Split.

Sorry, @Liuxinyu970226. I noticed that "Re-index Chinese Wikis" was recurring in T147505. "3rd", etc. seem just as arbitrary, and keeping count accurately could be hard. What about renaming this task "Re-index Chinese Wikis to fix T168427" or "Re-index Chinese Wikis to fix Surrogate Split" or something else that points to what the reindexing will enable?

Nov 9 2018, 3:22 PM · Discovery-Search (Current work), Chinese-Sites, Discovery
TJones updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Nov 9 2018, 3:08 PM · Discovery-Search (Current work), Discovery
TJones triaged T209156: Re-index Chinese Wikis to fix Surrogate Split as Normal priority.
Nov 9 2018, 3:07 PM · Discovery-Search (Current work), Chinese-Sites, Discovery
TJones raised the priority of T209155: Deploy extra-analysis-surrogates & the experimental highlighter 5.5.2.4 to production from Low to Normal.
Nov 9 2018, 3:05 PM · Discovery-Search (Current work), Chinese-Sites, Discovery
TJones triaged T209155: Deploy extra-analysis-surrogates & the experimental highlighter 5.5.2.4 to production as Low priority.
Nov 9 2018, 3:04 PM · Discovery-Search (Current work), Chinese-Sites, Discovery
TJones renamed T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish] from Characters in CJK extension C treated as U+FFFD when searching on zhWP to Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish].
Nov 9 2018, 3:01 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery
TJones moved T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish] from Backlog to Waiting/Blocked on the Discovery-Search (Current work) board.

I should have treated this ticket as an epic and created sub-tasks for it. The first part of the work (creating the plugin to re-merge surrogate pairs and setting up the config to use the new plugin) was done on this ticket and is complete, but there is more to do. I don't want to close this ticket because the problem isn't solved yet, but the work I was doing here is done. So, after flailing around on the workboard a bit, I've moved it to Waiting, and I'll open sub-tasks for the remaining related tasks.

Nov 9 2018, 3:00 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery
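
The plugin's implementation isn't shown in this feed, but the arithmetic behind re-merging a split surrogate pair can be sketched as below. This is a minimal sketch under assumptions: `merge_surrogates` is a hypothetical name, and it works on a plain list of UTF-16 code units rather than on an analysis-chain token stream.

```python
def merge_surrogates(units):
    """Combine UTF-16 high/low surrogate pairs into single code points.

    `units` is a list of UTF-16 code units (ints). Unpaired surrogates
    are passed through unchanged.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            # Standard UTF-16 decoding of a surrogate pair.
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out

# U+2A700 is the first code point of CJK Unified Ideographs Extension C;
# in UTF-16 it is the pair D869 DF00.
merged = merge_surrogates([0x4E2D, 0xD869, 0xDF00])
print([hex(cp) for cp in merged])
```

If the two halves are left split, each one is an invalid character on its own, which fits the ticket title's description of Extension C characters being treated as U+FFFD.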
TJones moved T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish] from Done to Backlog on the Discovery-Search (Current work) board.
Nov 9 2018, 2:55 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery
TJones moved T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish] from Needs review to Done on the Discovery-Search (Current work) board.
Nov 9 2018, 2:54 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery

Nov 7 2018

TJones added a comment to T208917: Build pipeline to transform elastic explains into feature vectors and a tf graph.

Very interesting stuff. Thanks for sharing the numbers. A few things come to mind.

Nov 7 2018, 10:31 PM · Patch-For-Review, Discovery-Search (Current work)
TJones added a comment to T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish].

Change 471204 merged by jenkins-bot:
https://gerrit.wikimedia.org/r/471204

Nov 7 2018, 8:02 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery

Nov 6 2018

TJones added a comment to T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish].

I ran a quick analysis of the effect of the analysis chain change to the indexing results. There isn't much to report, and nothing surprising, so I'm not going to do a full write up. I compared before and after the surrogate merging on 10,000 Wikipedia articles (out of ~1M) and 10,000 Wiktionary articles (out of ~800K).

Nov 6 2018, 7:45 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery

Nov 2 2018

TJones added a comment to T208496: search platform maven projects failing post merge build.

Thanks for hunting this down, @Gehel!

Nov 2 2018, 6:47 PM · Patch-For-Review, Release-Engineering-Team, Continuous-Integration-Config, Discovery-Search (Current work)
TJones awarded T208496: search platform maven projects failing post merge build a Like token.
Nov 2 2018, 6:46 PM · Patch-For-Review, Release-Engineering-Team, Continuous-Integration-Config, Discovery-Search (Current work)
TJones moved T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish] from In progress to Needs review on the Discovery-Search (Current work) board.
Nov 2 2018, 4:29 AM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery

Oct 31 2018

TJones moved T138958: Detect "wrong keyboard" queries for Russian/American keyboards on EN/RU Wikipedias from In progress to Backlog on the Discovery-Search (Current work) board.
Oct 31 2018, 7:28 PM · Discovery-Search (Current work), Discovery
TJones moved T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish] from Backlog to In progress on the Discovery-Search (Current work) board.
Oct 31 2018, 7:28 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery
TJones claimed T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish].
Oct 31 2018, 7:28 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery
TJones moved T168427: Characters in CJK extension C treated as U+FFFD when searching on zhWP [EPIC-ish] from Up Next to Current work on the Discovery-Search board.
Oct 31 2018, 7:27 PM · Epic, MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, Discovery-Search (Current work), Chinese-Sites, Discovery