TJones (Trey Jones)
Sr. Computational Linguist, Search Platform Team

User Details

User Since
Jul 8 2015, 3:02 PM (275 w, 6 d)
Availability
Available
IRC Nick
Trey314159
LDAP User
Tjones
MediaWiki User
TJones (WMF) [ Global Accounts ]

I would have written a shorter comment, but I did not have the time.

I'm part of the Search Platform team and I spend my time working on search & relevance, trying to better support search in various languages, analyzing queries, and doing random mathy things. I tend to write long, detailed notes about my investigations (so as to improve the bus number of my work).

When I have to work on _GitHub,_ /‍‍/Phab,/‍‍/ and ''MediaWiki'' all on the same day, I sometimes suffer Severe Markup Incongruence Fatigue.

I � Unicode.

Recent Activity

Mon, Oct 19

TJones moved T238151: Tune Glent Method 1 algorithm from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.

It looks like we are currently running glent 0.2.3 which includes the patches referenced above. Checked the attached patches and it looks like everything is shipped. Should we close this and move on to figuring out how we want to put it in front of users?

Mon, Oct 19, 7:38 PM · Discovery-Search (Current work)
TJones created T265931: Set up dashboard to track resource usage for Commons and Wikidata Elasticsearch indexes.
Mon, Oct 19, 5:50 PM · Discovery-Search
TJones updated the task description for T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices.
Mon, Oct 19, 5:42 PM · Discovery-Search (Current work)
TJones renamed T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices from Design the solution for Commons and Wikidata Elasticsearch indices to Investigate Resource Needs for Commons and Wikidata Elasticsearch indices.
Mon, Oct 19, 5:35 PM · Discovery-Search (Current work)
TJones renamed T265246: Make search-related phabricator tags less confusing from Make search related phabricator tags less confusing to Make search-related phabricator tags less confusing.
Mon, Oct 19, 5:21 PM · Discovery-Search (Current work), PM
TJones renamed T265056: Cirrus Search dumps failed for some wikis from CIrrus Search dumps failed for some wikis to Cirrus Search dumps failed for some wikis.
Mon, Oct 19, 5:20 PM · Discovery-Search, CirrusSearch, Dumps-Generation
TJones renamed T265081: Fix Chinese Analysis Chain for Glent M2 from Review Chinese Analysis Chain for Glent M2 to Fix Chinese Analysis Chain for Glent M2.
Mon, Oct 19, 5:18 PM · Discovery-Search (Current work), Chinese-Sites
TJones updated the task description for T264404: Determine a way of separating truthy queries.
Mon, Oct 19, 3:33 PM · Discovery-Search, Wikidata, Wikidata-Query-Service
TJones edited projects for T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices, added: Discovery-Search (Current work); removed Discovery-Search.
Mon, Oct 19, 3:29 PM · Discovery-Search (Current work)
TJones renamed T265621: [EPIC] Create a dedicated elasticsearch cluster for Commons and Wikidata from [EPIC]Create a dedicated elasticsearch cluster for Commons and Wikidata to [EPIC] Create a dedicated elasticsearch cluster for Commons and Wikidata.
Mon, Oct 19, 3:25 PM · Discovery-Search
TJones renamed T265641: Build integration test suite for search platform airflow + hadoop + spark integration from Build integration test suite for search platform airflow + hadoop +spark integration to Build integration test suite for search platform airflow + hadoop + spark integration.
Mon, Oct 19, 3:20 PM · Discovery-Search

Wed, Oct 14

TJones closed T149047: Investigate tuning CirrusSearch parameters with optimisation algorithms as Resolved.

I'll go ahead and close this because it's been investigated, it's been almost four years since anything happened, and if we were to follow up on it again, we should open a new ticket. Thanks, @Aklapper!

Wed, Oct 14, 9:25 PM · CirrusSearch, Discovery
TJones added a comment to T237364: Write Glent M0 A/B test report.

Excellent analysis, @EBernhardson! We talked about writing up a paragraph about the results, but this is much more detailed and in-depth. Thanks for taking care of it!

Wed, Oct 14, 8:18 PM · Discovery-Search (Current work), CirrusSearch

Fri, Oct 9

TJones added a comment to T185721: Null or inconsistent search results using Khmer script.

Ok, here we are a year later. Sorry for the significant delay. Too many other projects have pushed ahead of this one. I'm working on this again and hope to have it done by the end of the calendar year.

Fri, Oct 9, 6:38 PM · Discovery-Search (Current work), Discovery, CirrusSearch
TJones moved T185721: Null or inconsistent search results using Khmer script from Waiting to In Progress on the Discovery-Search (Current work) board.
Fri, Oct 9, 6:00 PM · Discovery-Search (Current work), Discovery, CirrusSearch
TJones moved T244800: Analysis of Method 2 Suggestion results from In Progress to Waiting on the Discovery-Search (Current work) board.
Fri, Oct 9, 6:00 PM · Discovery-Search (Current work), Chinese-Sites

Thu, Oct 8

TJones added a subtask for T244800: Analysis of Method 2 Suggestion results: T265081: Fix Chinese Analysis Chain for Glent M2.
Thu, Oct 8, 7:49 PM · Discovery-Search (Current work), Chinese-Sites
TJones added a parent task for T265081: Fix Chinese Analysis Chain for Glent M2: T244800: Analysis of Method 2 Suggestion results.
Thu, Oct 8, 7:49 PM · Discovery-Search (Current work), Chinese-Sites
TJones updated the task description for T265081: Fix Chinese Analysis Chain for Glent M2.
Thu, Oct 8, 7:26 PM · Discovery-Search (Current work), Chinese-Sites
TJones created T265081: Fix Chinese Analysis Chain for Glent M2.
Thu, Oct 8, 7:25 PM · Discovery-Search (Current work), Chinese-Sites
TJones added a comment to T244800: Analysis of Method 2 Suggestion results.

Data has been wrangled and prepped for review. I have a Japanese reviewer, a likely Korean reviewer, and I'm waiting to hear back on Chinese. Because of a technical glitch, I only have older Japanese data (from Feb), but it should be fine.

Thu, Oct 8, 7:11 PM · Discovery-Search (Current work), Chinese-Sites

Fri, Sep 25

TJones moved T262610: Enable ICUTokNorm() for Glent M0 and M1 from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

Today Erik and I worked through the process of getting this enabled. We ran into a few permissions and tool problems, but we improved the docs and will look into permissions to make it better for next time.

Fri, Sep 25, 7:24 PM · Discovery-Search (Current work)
TJones moved T262610: Enable ICUTokNorm() for Glent M0 and M1 from Waiting to In Progress on the Discovery-Search (Current work) board.
Fri, Sep 25, 3:05 PM · Discovery-Search (Current work)

Thu, Sep 24

TJones added a comment to T262610: Enable ICUTokNorm() for Glent M0 and M1.

Turns out there was a misunderstanding about how to access archiva.wikimedia.org, and all that is taken care of. Moved the ticket to waiting until Erik updates the filter that unintentionally filtered all of the Japanese data. When that's done, we can re-deploy the Glent jars just the once, then enable this.

Thu, Sep 24, 4:05 PM · Discovery-Search (Current work)
TJones moved T262610: Enable ICUTokNorm() for Glent M0 and M1 from In Progress to Waiting on the Discovery-Search (Current work) board.
Thu, Sep 24, 4:02 PM · Discovery-Search (Current work)

Tue, Sep 22

TJones added a comment to T262566: Enable DWIM support for Vue.js search.

I've updated the task description to more accurately describe what DWIM does. It is not transliteration or translation, and the previous tomato and matrix examples don't really illustrate it. The mappings are based on keyboards, not alphabets, too, and I believe the Hebrew and Russian DWIM code both assume the US keyboard. (Using the UK or French or German keyboard for the Latin half of the equation would give different results.)

Tue, Sep 22, 8:02 PM · Readers-Web-Backlog, Vue.js (Vue.js-Search)
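
The point above, that DWIM maps by key position rather than by alphabet, can be illustrated with a minimal sketch. This is not the actual DWIM code: the function names are hypothetical, and the tables cover only the lowercase letter rows of the standard US QWERTY and Russian ЙЦУКЕН layouts, which is the assumption the comment attributes to the Russian DWIM code.

```python
# Characters in the same physical key positions on the US QWERTY and
# Russian ЙЦУКЕН layouts (lowercase letter rows only; illustrative subset).
US_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
RU_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"

US_TO_RU = str.maketrans(US_KEYS, RU_KEYS)
RU_TO_US = str.maketrans(RU_KEYS, US_KEYS)


def as_if_typed_in_russian(text: str) -> str:
    """Reinterpret US-layout keystrokes as if the Russian layout had been active."""
    return text.translate(US_TO_RU)


def as_if_typed_in_latin(text: str) -> str:
    """Reinterpret Russian-layout keystrokes as if the US layout had been active."""
    return text.translate(RU_TO_US)


print(as_if_typed_in_russian("dbrbgtlbz"))  # википедия
print(as_if_typed_in_latin("цшлшзувшф"))    # wikipedia
```

Neither output is a transliteration of its input: "dbrbgtlbz" is simply what "википедия" looks like when the wrong layout is active. Swapping in a UK, French, or German table for the Latin side would change the results, as the comment notes.
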
TJones updated the task description for T262566: Enable DWIM support for Vue.js search.
Tue, Sep 22, 7:45 PM · Readers-Web-Backlog, Vue.js (Vue.js-Search)
TJones updated the task description for T262566: Enable DWIM support for Vue.js search.
Tue, Sep 22, 7:07 PM · Readers-Web-Backlog, Vue.js (Vue.js-Search)
TJones updated the task description for T262566: Enable DWIM support for Vue.js search.
Tue, Sep 22, 7:06 PM · Readers-Web-Backlog, Vue.js (Vue.js-Search)

Mon, Sep 21

TJones added a comment to T262566: Enable DWIM support for Vue.js search.

Sorry... lots of other things have been happening... I'll try to give a semi-thoughtful reply tomorrow!

Mon, Sep 21, 9:42 PM · Readers-Web-Backlog, Vue.js (Vue.js-Search)
TJones triaged T263088: Allow to download WDQS and WCQS results as Excel spreadsheet as Low priority.
Mon, Sep 21, 3:51 PM · Wikidata, Wikidata-Query-Service
TJones updated the task description for T263088: Allow to download WDQS and WCQS results as Excel spreadsheet.
Mon, Sep 21, 3:37 PM · Wikidata, Wikidata-Query-Service

Sep 17 2020

TJones added a comment to T257058: [Epic] Review / Improve Search Platform team documentation.

Help:CirrusSearch is very long, but also marked up for translation, so I'm trying to edit very lightly, since any edit requires translation in dozens of languages. Parts of the page are very out of date, however. It is slow going.

Sep 17 2020, 5:17 PM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service, Documentation, Epic
TJones closed T54656: CirrusSearch: searching for Africa finds África but the UI doesn't behave quite right as Declined.

The current expected behavior is that only exact matches (modulo upper/lowercase) are highlighted in the completion suggester suggestions. Searching for Affrica should not highlight África in the suggestion (nor would searching Africa).

Sep 17 2020, 3:51 PM · Discovery-Search, Discovery, CirrusSearch
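
The highlighting rule described above (exact match, ignoring only upper/lowercase) can be sketched as follows. This is an illustration, not the completion suggester's implementation: the function name and the bracket markers are hypothetical, and it assumes the match is tested against the start of the suggestion.

```python
def highlight_exact_prefix(query: str, suggestion: str) -> str:
    """Bracket the start of `suggestion` only if it equals `query` ignoring case.

    Assumes lowercasing does not change string length, which holds for the
    examples here but not for every Unicode character.
    """
    prefix = suggestion[:len(query)]
    if query and prefix.lower() == query.lower():
        return f"[{prefix}]{suggestion[len(query):]}"
    return suggestion


print(highlight_exact_prefix("afr", "Africa"))      # [Afr]ica
print(highlight_exact_prefix("Africa", "África"))   # África
print(highlight_exact_prefix("Affrica", "África"))  # África
```

The accented suggestion can still be returned for an unaccented or misspelled query; it just is not highlighted, because "Á" and "A" differ by more than case.
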
TJones updated the task description for T257058: [Epic] Review / Improve Search Platform team documentation.
Sep 17 2020, 3:29 PM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service, Documentation, Epic
TJones claimed T262610: Enable ICUTokNorm() for Glent M0 and M1.

Claiming this and moving this to "in progress"—though Erik is also doing work with documentation, and I'm on hold waiting for access to the right server.

Sep 17 2020, 2:53 PM · Discovery-Search (Current work)
TJones added a comment to T262610: Enable ICUTokNorm() for Glent M0 and M1.

I've asked for access to archiva.wikimedia.org after talking to David this morning. Will update when that's done.

Sep 17 2020, 2:52 PM · Discovery-Search (Current work)
TJones added a comment to T255603: Design spec for new Vue.js search experience.

@TJones I was wondering if you have any opinions regarding the question I raised here: T255603#6416710?

Sep 17 2020, 2:29 PM · Readers-Web-Backlog (Kanbanana-FY-2020-21), Vue.js (Vue.js-Search), Desktop Improvements

Sep 16 2020

TJones added a comment to T262610: Enable ICUTokNorm() for Glent M0 and M1.

Created Discovery/Analytics/Glent on wikitech. Bare bones but will rank highly in wikitech search and covers the commands used for releasing new jars along with links to configuration and the related analytics airflow.

Sep 16 2020, 8:51 PM · Discovery-Search (Current work)

Sep 15 2020

TJones updated the task description for T257058: [Epic] Review / Improve Search Platform team documentation.
Sep 15 2020, 4:08 PM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service, Documentation, Epic

Sep 14 2020

TJones updated the task description for T257058: [Epic] Review / Improve Search Platform team documentation.
Sep 14 2020, 8:51 PM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service, Documentation, Epic
TJones updated the task description for T257058: [Epic] Review / Improve Search Platform team documentation.
Sep 14 2020, 8:04 PM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service, Documentation, Epic
TJones triaged T257058: [Epic] Review / Improve Search Platform team documentation as Medium priority.
Sep 14 2020, 7:55 PM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service, Documentation, Epic
TJones updated TJones.
Sep 14 2020, 7:48 PM
TJones updated the task description for T257058: [Epic] Review / Improve Search Platform team documentation.
Sep 14 2020, 7:37 PM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service, Documentation, Epic
TJones added a comment to T262610: Enable ICUTokNorm() for Glent M0 and M1.

The plan is for @EBernhardson to document the process and for me to perform the process following the docs, so that bumped the estimation for the task up to 2.

Sep 14 2020, 6:00 PM · Discovery-Search (Current work)
TJones updated the task description for T257058: [Epic] Review / Improve Search Platform team documentation.
Sep 14 2020, 4:23 PM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service, Documentation, Epic

Sep 11 2020

TJones added a comment to T258094: Improve Breton language analysis.

Sorry that this is still moving slowly. We will get there eventually!

Sep 11 2020, 9:44 PM · Discovery-Search (Current work)
TJones moved T238151: Tune Glent Method 1 algorithm from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Sep 11 2020, 6:09 PM · Discovery-Search (Current work)

Sep 10 2020

TJones moved T244800: Analysis of Method 2 Suggestion results from Waiting to In Progress on the Discovery-Search (Current work) board.
Sep 10 2020, 9:04 PM · Discovery-Search (Current work), Chinese-Sites
TJones created T262612: Run an A/B test using suggestions generated using glent Method 1.
Sep 10 2020, 9:03 PM · Discovery-Search (Current work)
TJones created T262610: Enable ICUTokNorm() for Glent M0 and M1.
Sep 10 2020, 8:55 PM · Discovery-Search (Current work)
TJones moved T238151: Tune Glent Method 1 algorithm from Waiting to Needs review on the Discovery-Search (Current work) board.
Sep 10 2020, 8:44 PM · Discovery-Search (Current work)
TJones updated the task description for T238151: Tune Glent Method 1 algorithm.
Sep 10 2020, 8:43 PM · Discovery-Search (Current work)
TJones updated the task description for T238151: Tune Glent Method 1 algorithm.
Sep 10 2020, 8:43 PM · Discovery-Search (Current work)
TJones moved T238151: Tune Glent Method 1 algorithm from In Progress to Waiting on the Discovery-Search (Current work) board.
Sep 10 2020, 8:40 PM · Discovery-Search (Current work)

Sep 9 2020

TJones updated the task description for T238151: Tune Glent Method 1 algorithm.
Sep 9 2020, 8:57 PM · Discovery-Search (Current work)
TJones added a comment to T238151: Tune Glent Method 1 algorithm.

Update: M0 and M1 won't work the same because of the different frequency counts for M0. Also, the edit distance configuration for M1 is more restrictive than the default, and M0 works better with the less restrictive config. Rather than come up with a custom config for edit distance and a new frhedscore for M0, I'm going to use the current frhedscore for M0 (which will favor more frequent corrections when they exist) and let M0 always beat M1. More details on MediaWiki.

Sep 9 2020, 7:45 PM · Discovery-Search (Current work)
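
A minimal sketch of the "M0 always beats M1" rule described above. The `Suggestion` type and `pick_suggestion` function are hypothetical names, not Glent's own, and the score is treated as an opaque number where higher is better; how frhedscore is actually computed is not shown here.

```python
from typing import NamedTuple, Optional, Sequence


class Suggestion(NamedTuple):
    text: str
    method: str   # "M0" or "M1"
    score: float  # opaque score; higher is assumed to be better


def pick_suggestion(candidates: Sequence[Suggestion]) -> Optional[Suggestion]:
    """Return the best M0 candidate if any exist, otherwise the best M1 candidate."""
    m0 = [c for c in candidates if c.method == "M0"]
    pool = m0 if m0 else [c for c in candidates if c.method == "M1"]
    return max(pool, key=lambda c: c.score, default=None)


candidates = [
    Suggestion("first candidate", "M1", 0.9),
    Suggestion("second candidate", "M0", 0.2),
]
print(pick_suggestion(candidates).method)  # M0, despite the lower score
```

Under this rule the scores are only compared within a method, so M0 and M1 scores never need to be on the same scale.
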

Sep 3 2020

TJones added a comment to T261814: MediaSearch: Improve autocomplete for CJK langauges.

Note that my original comment was in reply to the idea of only making suggestions one "word" at a time, which doesn't work as one might expect for spaceless languages.

Sep 3 2020, 6:36 PM · Structured-Data-Backlog
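
To illustrate why suggesting one "word" at a time breaks down for spaceless languages, here is a small sketch. It assumes, hypothetically, that the "word" being completed is found by splitting the query on whitespace; `last_word` is an illustrative helper, not MediaSearch code.

```python
def last_word(query: str) -> str:
    """Return the final whitespace-delimited chunk of the query."""
    parts = query.split()
    return parts[-1] if parts else ""


print(last_word("new york wea"))  # wea
print(last_word("東京の天"))       # 東京の天
```

For the English query only the final partial word is left to complete, while the Japanese query has no spaces, so the whole string is treated as a single "word".
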

Sep 2 2020

TJones updated subscribers of T261515: Basic user testing for new search experience.

@TJones any chance y'all have thought about, or know about, do-over searches (see above comments for context)?

Sep 2 2020, 5:35 PM · Desktop Improvements, Readers-Web-Backlog (Kanbanana-FY-2020-21)

Sep 1 2020

TJones added a comment to T260292: [M] Add "did you mean" feature to Media Search.

@TJones see @mwilliams' question above - do you know if this functionality exists? Thanks!

Sep 1 2020, 4:44 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate2), Structured-Data-Backlog (Current Work)

Aug 27 2020

TJones raised the priority of T180387: Enable hiragana/katakana mapping for other languages from Medium to High.
Aug 27 2020, 9:46 PM · Discovery-Search, Discovery, CirrusSearch
TJones raised the priority of T219108: Investigate applying aggressive_splitting everywhere, not just on English-language wikis from Medium to High.
Aug 27 2020, 9:46 PM · Discovery, CirrusSearch, Discovery-Search
TJones raised the priority of T170625: Investigate disabling or modifying word_break_helper in language analyzers. from Medium to High.
Aug 27 2020, 9:46 PM · Discovery-Search
TJones triaged T141080: Wikidata items with two coordinates do not show up in geosearch as Medium priority.
Aug 27 2020, 9:27 PM · TestMe, Discovery-Search, Wikidata, GeoData
TJones triaged T78703: Support query continuation for Nearby requests as Medium priority.
Aug 27 2020, 9:27 PM · Discovery-Search, GeoData
TJones triaged T215716: Allow admin settings or user preferences for how many search results to show per default (pagination) as Low priority.
Aug 27 2020, 9:25 PM · MediaWiki-User-preferences, Discovery-Search, MediaWiki-Search
TJones triaged T240209: Consider adding "codesearch" and search for doc.wikimedia.org to mediawiki.org as Low priority.
Aug 27 2020, 9:22 PM · Discovery-Search, VPS-project-codesearch, Documentation, Discovery
TJones triaged T242892: Searching for a protected page doesn't show searchmenu-new message as Medium priority.
Aug 27 2020, 9:21 PM · MediaWiki-Search, Discovery-Search
TJones removed a project from T245905: Integrate CirrusSearch topic search capability with AdvancedSearch: Discovery-Search.
Aug 27 2020, 9:21 PM · Patch-For-Review, Wikimedia-Hackathon-2020, archived--TCB-Team, CirrusSearch, Advanced-Search
TJones lowered the priority of T221560: Searches with hyphens yield a database query error from High to Medium.
Aug 27 2020, 9:18 PM · Discovery-Search, MediaWiki-Search
TJones triaged T258094: Improve Breton language analysis as Medium priority.
Aug 27 2020, 9:12 PM · Discovery-Search (Current work)
TJones moved T243795: Add sort: keyword for explicit sort orders from needs triage to elastic / cirrus on the Discovery-Search board.
Aug 27 2020, 9:11 PM · Discovery-Search, CirrusSearch
TJones moved T242284: Some characters are lost in title and search snippet highlights from needs triage to elastic / cirrus on the Discovery-Search board.
Aug 27 2020, 9:11 PM · CirrusSearch, Discovery-Search
TJones moved T246566: Can't set https as elasticsearch server from needs triage to elastic / cirrus on the Discovery-Search board.
Aug 27 2020, 9:11 PM · Discovery-Search, CirrusSearch
TJones moved T241953: Search should let you search for the title of a book in any language and give results across languages. from needs triage to elastic / cirrus on the Discovery-Search board.
Aug 27 2020, 9:11 PM · Discovery-Search, CirrusSearch, Wikisource
TJones moved T250195: linksto should also search interwiki links from needs triage to elastic / cirrus on the Discovery-Search board.
Aug 27 2020, 9:11 PM · CirrusSearch, Discovery-Search
TJones moved T241953: Search should let you search for the title of a book in any language and give results across languages. from elastic / cirrus to needs triage on the Discovery-Search board.
Aug 27 2020, 9:11 PM · Discovery-Search, CirrusSearch, Wikisource
TJones moved T250195: linksto should also search interwiki links from elastic / cirrus to needs triage on the Discovery-Search board.
Aug 27 2020, 9:11 PM · CirrusSearch, Discovery-Search
TJones moved T246566: Can't set https as elasticsearch server from elastic / cirrus to needs triage on the Discovery-Search board.
Aug 27 2020, 9:11 PM · Discovery-Search, CirrusSearch
TJones moved T242284: Some characters are lost in title and search snippet highlights from elastic / cirrus to needs triage on the Discovery-Search board.
Aug 27 2020, 9:11 PM · CirrusSearch, Discovery-Search
TJones moved T243795: Add sort: keyword for explicit sort orders from elastic / cirrus to needs triage on the Discovery-Search board.
Aug 27 2020, 9:11 PM · Discovery-Search, CirrusSearch
TJones closed T245677: Reader searches with romanized version of non-Latin script as Declined.

The description still conflates DWIM and cross-script searching, which are completely different things. We have other tickets for DWIM-like functionality, and cross-script searching is much more complex. I can't triage it with this ambiguity, so I'm closing it.

Aug 27 2020, 9:04 PM · Discovery-Search, Story
TJones triaged T168652: Articles with dates should be sorted as Low priority.
Aug 27 2020, 9:00 PM · Discovery-Search
TJones triaged T188476: Document how to write a search extension as Low priority.
Aug 27 2020, 8:58 PM · Discovery-Search, Discovery, MediaWiki-Search, Documentation, MediaWiki-Documentation
TJones triaged T232565: case-sensitive equivalent of haswbstatement as Low priority.
Aug 27 2020, 8:57 PM · Wikidata, Discovery-Search
TJones triaged T259442: insource and intitle regular expression search doesn't allow final escaped slash as Medium priority.
Aug 27 2020, 8:56 PM · Discovery-Search, CirrusSearch
TJones triaged T250195: linksto should also search interwiki links as Low priority.
Aug 27 2020, 8:53 PM · CirrusSearch, Discovery-Search
TJones triaged T251671: Geosearch doesn't work for Icelandic Wikipedia; same query works on English Wikipedia as High priority.
Aug 27 2020, 8:53 PM · Discovery-Search (Current work), GeoData
TJones renamed T241953: Search should let you search for the title of a book in any language and give results across languages. from Search should let you search for the title of a book in any language and give results accross languages. to Search should let you search for the title of a book in any language and give results across languages..
Aug 27 2020, 8:44 PM · Discovery-Search, CirrusSearch, Wikisource
TJones triaged T241953: Search should let you search for the title of a book in any language and give results across languages. as Low priority.
Aug 27 2020, 8:44 PM · Discovery-Search, CirrusSearch, Wikisource
TJones triaged T248215: AugmentPageProps appears to be unused as Medium priority.
Aug 27 2020, 8:42 PM · User-DannyS712, Technical-Debt, Discovery-Search, MediaWiki-Search
TJones triaged T243795: Add sort: keyword for explicit sort orders as Low priority.
Aug 27 2020, 8:42 PM · Discovery-Search, CirrusSearch
TJones triaged T242327: QINU appears instead of math in search results as Low priority.
Aug 27 2020, 8:40 PM · Discovery-Search, MediaWiki-Parser, Math
TJones triaged T261146: Make search index creation for new wikis more robust as Medium priority.
Aug 27 2020, 8:38 PM · Discovery-Search
TJones raised the priority of T219912: Loosen limit on DYM suggestions blocking cross-language results from < 3 to < 5 from Medium to High.
Aug 27 2020, 8:34 PM · Discovery-Search
TJones raised the priority of T63080: CirrusSearch: intitle:¢ returns no results despite there being a redirect at [[¢]] from Low to Medium.
Aug 27 2020, 8:20 PM · Discovery-Search, Discovery, good first task, CirrusSearch
TJones renamed T118278: [EPIC] Improve Language Identification for use in Cirrus Search from EPIC: Improve Language Identification for use in Cirrus Search to [EPIC] Improve Language Identification for use in Cirrus Search.
Aug 27 2020, 8:16 PM · Discovery-Search, Epic
TJones moved T118278: [EPIC] Improve Language Identification for use in Cirrus Search from needs triage to [epic] on the Discovery-Search board.
Aug 27 2020, 8:16 PM · Discovery-Search, Epic
TJones edited projects for T118278: [EPIC] Improve Language Identification for use in Cirrus Search, added: Discovery-Search; removed Discovery.
Aug 27 2020, 8:16 PM · Discovery-Search, Epic