
TJones (Trey Jones)
Sr. Software Engineer, Search Platform Team

User Details

User Since
Jul 8 2015, 3:02 PM (193 w, 1 d)
Availability
Available
IRC Nick
Trey314159
LDAP User
Tjones
MediaWiki User
TJones (WMF)

I would have written a shorter comment, but I did not have the time.

I'm part of the Search Platform team and I spend my time working on search & relevance, trying to better support search in various languages, analyzing queries, and doing random mathy things. I tend to write long, detailed notes about my investigations (so as to improve the bus number of my work).

When I have to work on _GitHub,_ /‍‍/Phab,/‍‍/ and ''MediaWiki'' all on the same day, I sometimes suffer Severe Markup Incongruence Fatigue.

I � Unicode.

Recent Activity

Yesterday

TJones renamed T212891: [EPIC-ish][Milestone 2] Implement NLP Search Suggestion Method 2 for CJK languages from [EPIC-ish][Milestone 3] Implement NLP Search Suggestion Method 2 for CJK languages to [EPIC-ish][Milestone 2] Implement NLP Search Suggestion Method 2 for CJK languages.
Wed, Mar 20, 3:45 PM · Chinese-Sites, Discovery-Search, Epic
TJones renamed T212889: [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 1 for 10 languages from [EPIC-ish][Milestone 2] Implement NLP Search Suggestion Method 1 for 10 languages to [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 1 for 10 languages.
Wed, Mar 20, 3:44 PM · Discovery-Search, Epic
TJones renamed T212888: [EPIC-ish][Milestone 0] Implement NLP Search Suggestion Method 0 for English from [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 0 for English to [EPIC-ish][Milestone 0] Implement NLP Search Suggestion Method 0 for English.
Wed, Mar 20, 3:44 PM · Patch-For-Review, Discovery-Search, Epic

Tue, Mar 19

TJones updated the task description for T174116: Another look at multi-hyphen tokens on enwiki and zhwiki.
Tue, Mar 19, 5:25 PM · Discovery-Search (Current work), Chinese-Sites, Discovery

Tue, Mar 12

TJones moved T217602: Properly handle language-specific lowercasing in language analyzers from Needs review to Done on the Discovery-Search (Current work) board.
Tue, Mar 12, 1:45 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work)
TJones moved T203117: Greek language analysis generates unexpected empty tokens from Needs review to Done on the Discovery-Search (Current work) board.
Tue, Mar 12, 1:44 PM · Patch-For-Review, Discovery-Search (Current work)

Fri, Mar 8

TJones moved T216083: Update required version of TextCat in CirrusSearch from Needs review to Done on the Discovery-Search (Current work) board.
Fri, Mar 8, 3:13 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones added a comment to T216083: Update required version of TextCat in CirrusSearch.

Thanks, @Smalyshev & @EBernhardson, for the vendor patch!

Fri, Mar 8, 3:12 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery

Thu, Mar 7

TJones claimed T174116: Another look at multi-hyphen tokens on enwiki and zhwiki.
Thu, Mar 7, 6:10 PM · Discovery-Search (Current work), Chinese-Sites, Discovery
TJones moved T174116: Another look at multi-hyphen tokens on enwiki and zhwiki from Language Stuff to Current work on the Discovery-Search board.
Thu, Mar 7, 6:10 PM · Discovery-Search (Current work), Chinese-Sites, Discovery
TJones moved T216083: Update required version of TextCat in CirrusSearch from in progress to Needs review on the Discovery-Search (Current work) board.
Thu, Mar 7, 6:08 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones claimed T216083: Update required version of TextCat in CirrusSearch.
Thu, Mar 7, 6:06 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T216083: Update required version of TextCat in CirrusSearch from Language Stuff to Current work on the Discovery-Search board.
Thu, Mar 7, 6:06 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery

Wed, Mar 6

TJones added a comment to T217602: Properly handle language-specific lowercasing in language analyzers.

After refactoring the lowercase-to-ICU-normalization upgrade code for Greek (T203117) so that the lowercase filter is kept if it is language-specific, I needed to test it for the other language-specific cases: Turkish and Irish. The impact is positive but small because it is limited to the plain field and other fields besides the text field (where the lang-specific lowercasing is already in effect because the analyzers have not been unpacked). Full details on MediaWiki.

Wed, Mar 6, 11:17 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work)
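
To illustrate why the language-specific lowercasing mentioned in the T217602 comment above matters, here is a minimal Python sketch of Turkish-aware lowercasing. It is not CirrusSearch or Lucene code; the helper name `turkish_lower` is hypothetical, and it covers only the dotted/dotless "i" pairs, which are the best-known Turkish case mappings.

```python
import unicodedata

# Hypothetical helper for illustration only; not part of CirrusSearch.
# Turkish pairs the letters differently from the default Unicode mapping:
#   I (U+0049) lowercases to dotless ı (U+0131)
#   İ (U+0130) lowercases to plain i (U+0069)
_TURKISH_CASE_MAP = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish dotted/dotless i convention."""
    # Compose first so that "I" + combining dot above becomes a single U+0130.
    composed = unicodedata.normalize("NFC", text)
    # Handle the two Turkish-specific letters, then lowercase everything else.
    return composed.translate(_TURKISH_CASE_MAP).lower()
```

With generic lowercasing, "ISPARTA" becomes "isparta", whereas `turkish_lower("ISPARTA")` keeps the dotless "ı". A generic normalizer that replaces the language-specific filter loses that distinction, which is the kind of regression the comment describes.
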
TJones added a comment to T203117: Greek language analysis generates unexpected empty tokens.

Unpacking the Greek analyzer exposes the lowercase filter, which is upgraded to icu_normalizer, losing the Greek-specific processing therein! So, we need to keep the Greek lowercasing even if we do ICU normalization. After that, everything is copacetic. Full write-up on MediaWiki.

Wed, Mar 6, 11:14 PM · Patch-For-Review, Discovery-Search (Current work)
TJones updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Wed, Mar 6, 11:06 PM · Discovery-Search (Current work), Discovery
TJones renamed T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek) from Reindex Greek-language wikis to enable empty-token filtering to Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek).
Wed, Mar 6, 11:05 PM · Turkish-Sites, Discovery-Search
TJones added a subtask for T217602: Properly handle language-specific lowercasing in language analyzers: T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek).
Wed, Mar 6, 11:04 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work)
TJones added a parent task for T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek): T217602: Properly handle language-specific lowercasing in language analyzers.
Wed, Mar 6, 11:04 PM · Turkish-Sites, Discovery-Search
TJones moved T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek) from needs triage to Language Stuff on the Discovery-Search board.
Wed, Mar 6, 11:02 PM · Turkish-Sites, Discovery-Search
TJones edited projects for T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek), added: Discovery-Search; removed Discovery-Search (Current work).
Wed, Mar 6, 11:02 PM · Turkish-Sites, Discovery-Search
TJones created T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek).
Wed, Mar 6, 11:01 PM · Turkish-Sites, Discovery-Search
TJones moved T203117: Greek language analysis generates unexpected empty tokens from in progress to Needs review on the Discovery-Search (Current work) board.
Wed, Mar 6, 11:00 PM · Patch-For-Review, Discovery-Search (Current work)
TJones moved T217602: Properly handle language-specific lowercasing in language analyzers from in progress to Needs review on the Discovery-Search (Current work) board.
Wed, Mar 6, 11:00 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work)

Mon, Mar 4

TJones created T217602: Properly handle language-specific lowercasing in language analyzers.
Mon, Mar 4, 8:49 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work)

Tue, Feb 26

TJones claimed T203117: Greek language analysis generates unexpected empty tokens.
Tue, Feb 26, 4:49 PM · Patch-For-Review, Discovery-Search (Current work)
TJones moved T203117: Greek language analysis generates unexpected empty tokens from Language Stuff to Current work on the Discovery-Search board.
Tue, Feb 26, 4:48 PM · Patch-For-Review, Discovery-Search (Current work)

Thu, Feb 21

TJones moved T216740: Advanced search syntax for newbies from Backlog to Trainings / Skill sharing on the Wikimedia-Hackathon-2019 board.
Thu, Feb 21, 5:00 PM · Wikimedia-Hackathon-2019
TJones created T216740: Advanced search syntax for newbies.
Thu, Feb 21, 5:00 PM · Wikimedia-Hackathon-2019
TJones renamed T216738: Reindex Korean-language wikis to enable Nori analyzer from Reindex Korean-language wikis to Reindex Korean-language wikis to enable Nori analyzer.
Thu, Feb 21, 4:54 PM · Discovery-Search
TJones updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Thu, Feb 21, 4:54 PM · Discovery-Search (Current work), Discovery
TJones moved T216738: Reindex Korean-language wikis to enable Nori analyzer from needs triage to Language Stuff on the Discovery-Search board.
Thu, Feb 21, 4:52 PM · Discovery-Search
TJones created T216738: Reindex Korean-language wikis to enable Nori analyzer.
Thu, Feb 21, 4:52 PM · Discovery-Search
TJones moved T206874: Add Nori (Korean) configuration to AnalysisConfigBuilder from in progress to Done on the Discovery-Search (Current work) board.

We need to reindex, but not until after the ES6 upgrade is complete, and LTR has been disabled.

Thu, Feb 21, 4:47 PM · Patch-For-Review, Discovery-Search (Current work), Discovery

Wed, Feb 20

TJones added a comment to T215969: Measure mutation latency across the newly split elasticsearch clusters.

@EBernhardson, thanks for the explanation!

Wed, Feb 20, 10:36 PM · Patch-For-Review, Discovery-Search (Current work)
TJones added a comment to T215969: Measure mutation latency across the newly split elasticsearch clusters.

The spikes on create_index are pretty extreme, with 194s for chi-eqiad-with-archive and 291s for omega-eqiad-with-archive. Is that just bad luck, or is something going on with the archives that makes this sometimes take much longer?

Wed, Feb 20, 9:52 PM · Patch-For-Review, Discovery-Search (Current work)
TJones awarded T215969: Measure mutation latency across the newly split elasticsearch clusters a Pterodactyl token.
Wed, Feb 20, 9:50 PM · Patch-For-Review, Discovery-Search (Current work)

Feb 14 2019

TJones added a comment to T63080: CirrusSearch: intitle:¢ returns no results despite there being a redirect at [[¢]].

Bleh. It looks like that symbol is turned into a text boundary by the standard analyzer which isn't nice.

Feb 14 2019, 9:56 PM · Discovery-Search, good first bug, Discovery, CirrusSearch
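
The effect described in the T63080 comment above can be approximated in a few lines of Python. This is only a rough sketch: it uses a `\w+` regular expression as a stand-in for word-based tokenization and is not the Elasticsearch standard analyzer; the function name `rough_word_tokens` is hypothetical.

```python
import re


def rough_word_tokens(text: str) -> list:
    """Split text into runs of letters, digits, and underscores.

    Symbols such as the cent sign are not word characters, so they act
    as boundaries and never appear in the output.
    """
    return re.findall(r"\w+", text)
```

Under this approximation a query consisting only of "¢" yields no tokens at all, so there is nothing left to match against a title.
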

Feb 13 2019

TJones moved T216083: Update required version of TextCat in CirrusSearch from needs triage to Language Stuff on the Discovery-Search board.
Feb 13 2019, 10:38 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones renamed T216083: Update required version of TextCat in CirrusSearch from Update required version of TextCat in Mediawiki to Update required version of TextCat in CirrusSearch.
Feb 13 2019, 10:38 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones triaged T216083: Update required version of TextCat in CirrusSearch as Normal priority.
Feb 13 2019, 10:36 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T213936: Deploy new version of TextCat from in progress to Done on the Discovery-Search (Current work) board.
Feb 13 2019, 10:34 PM · Discovery-Search (Current work), Discovery
TJones assigned T213936: Deploy new version of TextCat to Smalyshev.

Cool! Thanks, @Smalyshev!

Feb 13 2019, 10:34 PM · Discovery-Search (Current work), Discovery
TJones added a comment to T215966: Requesting access to Production Shell for julia.glen.

Woo hoo!

Feb 13 2019, 9:18 PM · Patch-For-Review, Operations, SRE-Access-Requests
TJones added a comment to T215966: Requesting access to Production Shell for julia.glen.

Change 490412 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] admin: reset Julia SSH key

https://gerrit.wikimedia.org/r/490412

Feb 13 2019, 9:10 PM · Patch-For-Review, Operations, SRE-Access-Requests
TJones moved T206874: Add Nori (Korean) configuration to AnalysisConfigBuilder from Language Stuff to Current work on the Discovery-Search board.
Feb 13 2019, 7:00 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T138958: Detect "wrong keyboard" queries for Russian/American keyboards on EN/RU Wikipedias from Tech Debt/Misc to Language Stuff on the Discovery-Search board.

Removing this from current work and moving it to the "Language Stuff" backlog. I'm the only one who could work on this this quarter, and I'm a bit out of my depth with the integration. We'll reprioritize this for future work when we can assign a slightly larger team (≥2 people) to work on it.

Feb 13 2019, 6:59 PM · Discovery-Search, Russian-Sites, Discovery
TJones edited projects for T138958: Detect "wrong keyboard" queries for Russian/American keyboards on EN/RU Wikipedias, added: Discovery-Search; removed Discovery-Search (Current work).
Feb 13 2019, 6:58 PM · Discovery-Search, Russian-Sites, Discovery

Feb 12 2019

TJones added a comment to T215966: Requesting access to Production Shell for julia.glen.

@Julia.glen, I think this patch should give you an account, but as user juliaglen. You may need to add User juliaglen to your ssh config.

Feb 12 2019, 10:05 PM · Patch-For-Review, Operations, SRE-Access-Requests
TJones added a comment to T215916: ElasticSearch 6 migration plan checklist (search cluster).

Hmm—what about Nori (the Korean analyzer) and LTR? I believe we have to disable LTR for Korean, enable Nori, gather more data, then rebuild the LTR model. Sounds like maybe all of that should wait until after the ES upgrade, even though it means re-indexing Korean wikis at a later date.

Feb 12 2019, 4:49 PM · Discovery-Search
TJones added a comment to T215916: ElasticSearch 6 migration plan checklist (search cluster).

Looks good, and all the detail is much appreciated.

Feb 12 2019, 3:51 PM · Discovery-Search

Feb 11 2019

TJones added a comment to T212889: [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 1 for 10 languages.

Sounds good to me! If it turns out that the smallest volume languages have trouble, we can fall back to larger languages on the list.

Feb 11 2019, 8:43 PM · Discovery-Search, Epic
TJones added a comment to T212885: NLP contractor set up and access.

Should be done now—so try again, please!

Feb 11 2019, 8:11 PM · Discovery-Search (Current work)
TJones added a comment to T212885: NLP contractor set up and access.

@Julia.glen, my hue username has the same weird capitalization as Gerrit (Tjones), which I don't use elsewhere.

Feb 11 2019, 8:06 PM · Discovery-Search (Current work)
TJones added a comment to T212885: NLP contractor set up and access.

I am unable to access hue.wikimedia.org with my LDAP account. Could you take a look? Thanks.

Feb 11 2019, 7:32 PM · Discovery-Search (Current work)
TJones moved T212885: NLP contractor set up and access from in progress to Done on the Discovery-Search (Current work) board.
Feb 11 2019, 6:39 PM · Discovery-Search (Current work)
TJones updated the task description for T212885: NLP contractor set up and access.
Feb 11 2019, 6:38 PM · Discovery-Search (Current work)
TJones added a comment to T212889: [EPIC-ish][Milestone 1] Implement NLP Search Suggestion Method 1 for 10 languages.

What languages should we initially investigate?

Feb 11 2019, 2:47 PM · Discovery-Search, Epic

Feb 8 2019

TJones moved T212885: NLP contractor set up and access from Waiting/Blocked to in progress on the Discovery-Search (Current work) board.
Feb 8 2019, 8:44 PM · Discovery-Search (Current work)
TJones added a comment to T215346: Enable access to OOUI elements for DWIM gadgets (and maybe others).

Thanks, @Mooeypoo! That looks like it could work. I really appreciate your explanations and your patches!

Feb 8 2019, 5:51 PM · Patch-For-Review, OOUI
TJones moved T194849: Investigate language analyzers in ElasticSearch 6 from Needs review to Done on the Discovery-Search (Current work) board.
Feb 8 2019, 3:55 PM · Discovery-Search (Current work), Chinese-Sites
TJones added a comment to T194849: Investigate language analyzers in ElasticSearch 6.

Everything looks good now. Serbian (et al.) and Esperanto are working as expected. Thanks, @dcausse!

Feb 8 2019, 3:54 PM · Discovery-Search (Current work), Chinese-Sites
TJones added a comment to T215555: access to turnilo for members of search team.

I can log in now. Thanks!

Feb 8 2019, 2:45 PM · LDAP-Access-Requests
TJones moved T212885: NLP contractor set up and access from in progress to Waiting/Blocked on the Discovery-Search (Current work) board.
Feb 8 2019, 2:50 AM · Discovery-Search (Current work)
TJones updated the task description for T212885: NLP contractor set up and access.
Feb 8 2019, 2:49 AM · Discovery-Search (Current work)
TJones moved T194849: Investigate language analyzers in ElasticSearch 6 from in progress to Needs review on the Discovery-Search (Current work) board.
Feb 8 2019, 1:25 AM · Discovery-Search (Current work), Chinese-Sites
TJones added a comment to T194849: Investigate language analyzers in ElasticSearch 6.

First draft done. Full details on MediaWiki.

Feb 8 2019, 1:24 AM · Discovery-Search (Current work), Chinese-Sites

Feb 7 2019

TJones updated the task description for T194849: Investigate language analyzers in ElasticSearch 6.
Feb 7 2019, 10:13 PM · Discovery-Search (Current work), Chinese-Sites
TJones claimed T194849: Investigate language analyzers in ElasticSearch 6.
Feb 7 2019, 10:13 PM · Discovery-Search (Current work), Chinese-Sites
TJones moved T194849: Investigate language analyzers in ElasticSearch 6 from Language Stuff to Current work on the Discovery-Search board.
Feb 7 2019, 10:12 PM · Discovery-Search (Current work), Chinese-Sites
TJones added a comment to T215555: access to turnilo for members of search team.

By "LDAP" I assume you mean the login for Wikitech, etc. OIT uses "LDAP" to refer to your Google Apps login, too.

Feb 7 2019, 9:51 PM · LDAP-Access-Requests

Feb 6 2019

TJones added a comment to T215346: Enable access to OOUI elements for DWIM gadgets (and maybe others).

Thanks for looking into this, @Mooeypoo! It's too bad that there isn't a way to make it work now, but I'm glad this provides another use case for potential enhancements to OOUI. If the extra functionality ever gets implemented, ping me if you remember!

Feb 6 2019, 4:47 PM · Patch-For-Review, OOUI

Feb 5 2019

TJones added a comment to T215346: Enable access to OOUI elements for DWIM gadgets (and maybe others).

Well, users can do var searchInputWidget = OO.ui.infuse($('#searchText')); to get a handle on the OOUI widget. Is that not sufficient?

Feb 5 2019, 9:32 PM · Patch-For-Review, OOUI
TJones added a comment to T214623: Analytics query access for search platform NLP contractor @Julia.glen.

Thanks, @Dzahn!

Feb 5 2019, 9:18 PM · Patch-For-Review, Operations, SRE-Access-Requests, Discovery-Search (Current work)
TJones created T215346: Enable access to OOUI elements for DWIM gadgets (and maybe others).
Feb 5 2019, 8:45 PM · Patch-For-Review, OOUI
TJones updated the task description for T212885: NLP contractor set up and access.
Feb 5 2019, 6:46 PM · Discovery-Search (Current work)

Jan 31 2019

TJones added a comment to T170099: Search returns random results when search query begins with a hyphen.

I regret not expressing my gratitude or commenting here at the time.

Jan 31 2019, 4:49 PM · Discovery-Search, CirrusSearch, Discovery

Jan 30 2019

TJones closed T124291: Searching for an IRC channel name (beginning with '#') redirects to the main page as Resolved.

Seems to be fixed now.

Jan 30 2019, 11:05 PM · Discovery-Search, CirrusSearch, Discovery
TJones closed T48334: Searching for # reloads the page as Resolved.

Seems to be fixed now.

Jan 30 2019, 11:05 PM · Discovery-Search, MediaWiki-Search
TJones moved T139647: Search box at top right of pages should italicize redirects from later on... to UI tickets on the Discovery-Search board.
Jan 30 2019, 11:00 PM · CirrusSearch, Need-volunteer, good first bug, Discovery-Search, Discovery
TJones moved T72899: Search box needs some normalization for Arabic Family languages from later on... to Language Stuff on the Discovery-Search board.
Jan 30 2019, 10:58 PM · Discovery-Search, Discovery, CirrusSearch, I18n, MediaWiki-Search
TJones closed T155670: Investigate Ratio of First to Second Result Scores as a Confidence Measure, a subtask of T140289: Investigate Improvements and Confidence Measures for TextCat Language Detection, as Declined.
Jan 30 2019, 10:54 PM · Discovery-Search, Epic, CirrusSearch, Discovery
TJones closed T155670: Investigate Ratio of First to Second Result Scores as a Confidence Measure as Declined.
Jan 30 2019, 10:54 PM · Discovery-Search, CirrusSearch, Discovery
TJones closed T149323: Qualitative confidence score for TextCat, a subtask of T140289: Investigate Improvements and Confidence Measures for TextCat Language Detection, as Resolved.
Jan 30 2019, 10:53 PM · Discovery-Search, Epic, CirrusSearch, Discovery
TJones closed T149323: Qualitative confidence score for TextCat as Resolved.
Jan 30 2019, 10:53 PM · CirrusSearch, Discovery-Search, Discovery
TJones moved T157771: [UI Enhancement] Show media license in search results from later on... to UI tickets on the Discovery-Search board.
Jan 30 2019, 10:50 PM · CirrusSearch, Discovery-Search, Discovery
TJones closed T140289: Investigate Improvements and Confidence Measures for TextCat Language Detection as Resolved.

Closing this because after looking into it a while back I decided that internal confidence isn't really a thing for TextCat to do, and easy things to improve the quality of TextCat results were done.

Jan 30 2019, 10:50 PM · Discovery-Search, Epic, CirrusSearch, Discovery
TJones closed T140289: Investigate Improvements and Confidence Measures for TextCat Language Detection, a subtask of T118278: EPIC: Improve Language Identification for use in Cirrus Search, as Resolved.
Jan 30 2019, 10:50 PM · Epic, Discovery
TJones closed T155822: Inconsistent search behavior when asciifolding is not activated on text/plain as Resolved.

I think everything here is fixed. ö, ä, and å are all treated as independent letters and using a instead of ä is the same as using u instead of ä, and other diacritics like á are ignored. Depending on whether you use the completion suggester, go feature, or full text search, you get additional suggestions depending on the place of the typos or the frequency of the incorrect word—all as expected.

Jan 30 2019, 10:37 PM · Discovery-Search, CirrusSearch, Discovery
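
The folding behavior described in the T155822 comment above (ö, ä, and å kept as independent letters, other diacritics such as á ignored) can be sketched in Python as follows. This is an illustration under stated assumptions, not the production analysis chain: the function name `fold_diacritics` and the inclusion of the uppercase forms in the preserved set are assumptions.

```python
import unicodedata

# Letters kept as-is; the lowercase set comes from the comment above,
# the uppercase forms are an assumption added for this sketch.
PRESERVED = frozenset("åäöÅÄÖ")


def fold_diacritics(text: str, preserved=PRESERVED) -> str:
    """Strip combining marks from every character except the preserved ones."""
    out = []
    # Compose first so preserved letters are single code points.
    for ch in unicodedata.normalize("NFC", text):
        if ch in preserved:
            out.append(ch)
            continue
        # Decompose and drop the combining marks (accents, diaereses, ...).
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)
```

For example, `fold_diacritics("á")` gives "a", while `fold_diacritics("ä")` is returned unchanged, so a search for "a" does not silently match "ä".
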
TJones closed T38954: feature request: replace forbidden characters with lookalike UTF8 signs in the wikipedia search input control as Resolved.

I'm going to close this because it was written before we moved to Elasticsearch. The current behavior of Elasticsearch is the same for both these characters and their proposed normalization: all of them are ignored during tokenization. In general, we have implemented ICU Normalization for English-language projects, so most non-punctuation characters are normalized well.

Jan 30 2019, 10:18 PM · MediaWiki-Search, Discovery-Search, Discovery
TJones moved T140300: Provide language identification to the long-tail of wikis from later on... to Language Stuff on the Discovery-Search board.
Jan 30 2019, 10:17 PM · CirrusSearch, Discovery-Search, Discovery
TJones updated the task description for T75862: Update and/or enable custom entries for Hebmorph dictionary.
Jan 30 2019, 9:58 PM · CirrusSearch, Discovery-Search, Discovery, I18n, MediaWiki-Search
TJones renamed T75862: Update and/or enable custom entries for Hebmorph dictionary from Search "טריפלקס" in the Hebrew Wikipedia doesn't find an article with the word "וטריפלקס" to Update and/or enable custom entries for Hebmorph dictionary.
Jan 30 2019, 9:58 PM · CirrusSearch, Discovery-Search, Discovery, I18n, MediaWiki-Search
TJones reopened T75862: Update and/or enable custom entries for Hebmorph dictionary as "Stalled".

If we end up having to abandon HebMorph then either there won't be any morphological processing at all or, if we find a replacement, there will be a completely different set of specific errors. I guess we can leave it as stalled for as long as we have HebMorph. And I'll modify the description to be more generic since it isn't about this particular word, but about the ability to make additions to the HebMorph dictionary.

Jan 30 2019, 9:55 PM · CirrusSearch, Discovery-Search, Discovery, I18n, MediaWiki-Search
TJones closed T75862: Update and/or enable custom entries for Hebmorph dictionary as Declined.

We're not sure if we're going to be able to keep using Hebmorph because it hasn't been released for Elasticsearch 6. @dcausse recompiled it so we probably can go into ES6, but beyond that it's unclear, so putting any significant effort into fixing parses for specific words is unlikely to be something we can do.

Jan 30 2019, 9:45 PM · CirrusSearch, Discovery-Search, Discovery, I18n, MediaWiki-Search
TJones closed T170099: Search returns random results when search query begins with a hyphen as Declined.

I'm going to go ahead and close this. I don't think we're going to have time to explore option 3, and hopefully the documentation and the blog post can help people understand what's going on. Please re-open if you think it's closed in error.

Jan 30 2019, 9:33 PM · Discovery-Search, CirrusSearch, Discovery
TJones closed T193195: Outdated "insource" in search as Resolved.

It looks like the one problem document has been fixed. (There's one result at the moment, but it has the source in the query.) The immediate workaround may be a null edit. The medium-term fix is the "saneitizer" job that cleans up everything every two weeks (meaning a problem like this has an average life span of a week). Given that nothing showed up in the logs, there's nothing really we can do. If it happens again, please re-open this ticket or open another one and we'll see if the logs capture any odd behavior.

Jan 30 2019, 9:20 PM · Discovery-Search, Discovery, CirrusSearch
TJones moved T177251: Dead keys prevent autocomplete in search box from later on... to UI tickets on the Discovery-Search board.
Jan 30 2019, 9:12 PM · Discovery-Search, CirrusSearch, Discovery, MediaWiki-Search
TJones moved T214623: Analytics query access for search platform NLP contractor @Julia.glen from in progress to not in use - please delete on the Discovery-Search (Current work) board.
Jan 30 2019, 8:51 PM · Patch-For-Review, Operations, SRE-Access-Requests, Discovery-Search (Current work)
TJones moved T147505: [Recurring task] CirrusSearch: what is updated during re-indexing from in progress to not in use - please delete on the Discovery-Search (Current work) board.
Jan 30 2019, 7:41 PM · Discovery-Search (Current work), Discovery