TJones (Trey Jones)
Sr. Software Engineer, Search Platform Team

User Details

User Since
Jul 8 2015, 3:02 PM (123 w, 4 d)
Availability
Available
IRC Nick
Trey314159
LDAP User
Tjones
MediaWiki User
TJones (WMF)

I would have written a shorter comment, but I did not have the time.

I'm part of the Search Platform team and I spend my time working on search & relevance, trying to better support search in various languages, analyzing queries, and doing random mathy things. I tend to write long, detailed notes about my investigations (so as to improve the bus number of my work).

When I have to work on _GitHub,_ /‍‍/Phab,/‍‍/ and ''MediaWiki'' all on the same day, I sometimes suffer Severe Markup Incongruence Fatigue.

I � Unicode.

Recent Activity

Mon, Nov 13

TJones added a comment to T180387: Enable hiragana/katakana mapping for other languages.

Not directly part of this ticket, but part of the related discussion: we should decide whether we should keep going down the list of unpacked analyzers, and whether we should pro-actively unpack the other analyzers, and whether we should just enable it for languages where it has a small but positive impact (and doesn't cause problems with the analyzer).

Mon, Nov 13, 6:52 PM · Discovery-Search, Discovery, CirrusSearch
TJones renamed T180387: Enable hiragana/katakana mapping for other languages from Enable hiragana/katakana mapping for other languages. to Enable hiragana/katakana mapping for other languages.
Mon, Nov 13, 6:52 PM · Discovery-Search, Discovery, CirrusSearch
TJones created T180387: Enable hiragana/katakana mapping for other languages.
Mon, Nov 13, 6:30 PM · Discovery-Search, Discovery, CirrusSearch
TJones closed T180365: Remove unneeded language-specific config for lowercase filters as Declined.

Declining this since I opened it for the wrong field. D'oh.

Mon, Nov 13, 4:34 PM · Technical-Debt, Discovery-Search (Current work)
TJones added a comment to T180365: Remove unneeded language-specific config for lowercase filters.

On 1000 articles from Wikipedia and 1000 entries from Wiktionary it doesn't make any difference for the text analyzer... but it does make a difference in the "lowercase" analyzer! Whoops. TIL.

Mon, Nov 13, 4:34 PM · Technical-Debt, Discovery-Search (Current work)
TJones edited projects for T180365: Remove unneeded language-specific config for lowercase filters, added: Discovery-Search (Current work); removed Discovery-Search.
Mon, Nov 13, 3:58 PM · Technical-Debt, Discovery-Search (Current work)
TJones updated the task description for T180365: Remove unneeded language-specific config for lowercase filters.
Mon, Nov 13, 3:58 PM · Technical-Debt, Discovery-Search (Current work)
TJones moved T180365: Remove unneeded language-specific config for lowercase filters from Needs triage to Tech Debt/Misc on the Discovery-Search board.
Mon, Nov 13, 3:57 PM · Technical-Debt, Discovery-Search (Current work)
TJones created T180365: Remove unneeded language-specific config for lowercase filters.
Mon, Nov 13, 3:57 PM · Technical-Debt, Discovery-Search (Current work)
TJones renamed T177876: Investigate changing ICU tokenization from whitelist to blacklist from Investigate changing ICU tokenization from whitelist to blacklist. to Investigate changing ICU tokenization from whitelist to blacklist.
Mon, Nov 13, 3:48 PM · Discovery-Search

Thu, Nov 9

TJones added a comment to T180169: Make list of languages where using stemmed analyzer for Wikibase is beneficial.

@Smalyshev, I think this covers the info you need. Let me know if I can give more info or help with anything else. :)

Thu, Nov 9, 11:10 PM · MediaWiki-extensions-WikibaseRepository, Wikidata, Discovery-Search (Current work), Discovery

Wed, Nov 8

TJones added a comment to T176197: Allow hiragana searches to find katakana results and vice versa.

I've added posts on Italian Wikipedia & Wiktionary, and Swedish Wikipedia & Wiktionary.

Wed, Nov 8, 3:27 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Discovery

Tue, Nov 7

TJones added a comment to T170099: Search returns random results when search query begins with a hyphen.

@debt, is there anything left to do for this task? I don't think we want to completely disable single-term negation searching because of the three use cases @dcausse outlined above (T170099#3429153).

Tue, Nov 7, 6:53 PM · Discovery-Search, CirrusSearch, Discovery
TJones moved T178926: Review Serbian Morphological Libraries from Backlog to In progress on the Discovery-Search (Current work) board.
Tue, Nov 7, 5:44 PM · Discovery-Search (Current work), Discovery
TJones edited projects for T179945: Re-index English-language wikis to pick up kana mapping, added: Discovery-Search (Current work); removed Discovery-Search.
Tue, Nov 7, 3:46 PM · Discovery-Search (Current work), Discovery, CirrusSearch
TJones moved T176197: Allow hiragana searches to find katakana results and vice versa from Needs review to Done on the Discovery-Search (Current work) board.
Tue, Nov 7, 3:46 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Discovery
TJones added a comment to T176197: Allow hiragana searches to find katakana results and vice versa.

The code has been merged, but not deployed. I've created T179945 to re-index English-language wikis after the code is deployed, and added it to T147505.

Tue, Nov 7, 3:45 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Discovery
TJones updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Tue, Nov 7, 3:44 PM · Discovery-Search (Current work), Discovery
TJones created T179945: Re-index English-language wikis to pick up kana mapping.
Tue, Nov 7, 3:44 PM · Discovery-Search (Current work), Discovery, CirrusSearch

Mon, Nov 6

TJones added a comment to T176197: Allow hiragana searches to find katakana results and vice versa.

Bugs filed:

  • ICU Tokenizer: U+0370 and above affect tokenization of characters after whitespace: issue 27290
  • Standard tokenizer incorrectly tokenizes hiragana: issue 27291
  • ICU Normalizer adds spaces before certain non-combining dakuten and handakuten: issue 27292
Mon, Nov 6, 10:10 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Discovery
TJones added a comment to T176197: Allow hiragana searches to find katakana results and vice versa.

Posted messages to:

Mon, Nov 6, 7:48 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Discovery
TJones moved T176197: Allow hiragana searches to find katakana results and vice versa from In progress to Needs review on the Discovery-Search (Current work) board.
Mon, Nov 6, 7:46 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Discovery
TJones added a comment to T179500: Evaluation precision of discernatron results vs our retrieval query.

@EBernhardson: sorry, I should have clarified... that's not real data! I just made it up to show the format and the kinds of info we might see.

Mon, Nov 6, 6:15 PM · Discovery-Search, CirrusSearch, Discovery
TJones added a comment to T173650: Inappropriate/broken redirecting of Japanese in search.

FYI: my recommendation over on T176197 is to enable hiragana-to-katakana mapping for English, but not Japanese because it runs afoul of a couple of ugly tokenization bugs. We'll also look into whether the French- and Russian-language communities feel they might benefit from this; if so, we may expand beyond those two.

Mon, Nov 6, 4:52 PM · Discovery-Search, Discovery, CirrusSearch
TJones added a comment to T176197: Allow hiragana searches to find katakana results and vice versa.

Whew! What a ride. This turned out to be much more complicated than anticipated for the Japanese analysis. I found three tokenization bugs, one of which depends on context in unexpected ways and so made me question my data collection, which led to me re-running everything... Anyway, because of the bugs in the tokenization, I recommend not deploying this for Japanese.

Mon, Nov 6, 4:49 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Discovery
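
For readers unfamiliar with the idea behind T176197, here is a minimal sketch of a hiragana-to-katakana mapping. It is an illustration only, not the CirrusSearch or Elasticsearch implementation: the function name and constants are hypothetical. It relies on the fact that the main hiragana letters (U+3041 to U+3096) sit exactly 0x60 code points below their katakana counterparts in Unicode, and it ignores tokenization entirely, which is where the bugs described above arose.

```python
# Hypothetical helper for illustration; not taken from CirrusSearch.
HIRAGANA_START = 0x3041  # HIRAGANA LETTER SMALL A
HIRAGANA_END = 0x3096    # HIRAGANA LETTER SMALL KE
KANA_OFFSET = 0x60       # distance from a hiragana letter to its katakana twin


def hiragana_to_katakana(text: str) -> str:
    """Map each hiragana letter to katakana; leave all other characters alone."""
    mapped = []
    for ch in text:
        cp = ord(ch)
        if HIRAGANA_START <= cp <= HIRAGANA_END:
            mapped.append(chr(cp + KANA_OFFSET))
        else:
            mapped.append(ch)
    return "".join(mapped)
```

Applying the same function to both the indexed text and the query is what would let a hiragana query match a katakana title and vice versa.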

Thu, Nov 2

TJones added a comment to T179500: Evaluation precision of discernatron results vs our retrieval query.

Since the search engines have different recall profiles—with one being famously expansive in its recall—it makes sense to score them independently. We also have the human-scored results for <150 queries, graded on a 0-4 scale. I'm a big fan of tables of numbers, but not everyone is, so feel free to consider paring this down significantly.

Thu, Nov 2, 2:30 PM · Discovery-Search, CirrusSearch, Discovery
TJones added a comment to T170779: Wikidata search suggestions do not display on screen if character whose decomposition contains nukta is present in search query.

@Smalyshev, thanks for tracking this one down! That was some weird behavior, but things getting normalized and not matching makes sense.

Thu, Nov 2, 1:05 PM · MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)), Wikidata-Former-Sprint-Board, User-Smalyshev, Discovery-Search (Current work), ValueView, MediaWiki-extensions-WikibaseRepository, Wikidata

Fri, Oct 27

TJones updated the task description for T104814: Appropriately ignore diacritics for German-language wikis.
Fri, Oct 27, 4:57 PM · Discovery-Search, CirrusSearch, Discovery
TJones updated the task description for T104814: Appropriately ignore diacritics for German-language wikis.
Fri, Oct 27, 4:56 PM · Discovery-Search, CirrusSearch, Discovery
TJones added a comment to T179081: Full text search does not find article with accented word in dewiki.

Do you want to comment on the other e's that are also not folded correctly? I'll add a note to the other ticket to fix this documentation.

Fri, Oct 27, 4:55 PM · Discovery-Search, Discovery, Regression, CirrusSearch
TJones added a comment to T179081: Full text search does not find article with accented word in dewiki.

@FriedhelmW can you point me at the documentation you want to change? If you are referring to "Folds character families. Diacritical folding automatically matches foreign terms" then I agree it should be updated, but please be careful not to make it incorrect in a different way. Diacritical folding is turned on for most languages, though the set of characters that are folded differs from language to language.

Fri, Oct 27, 4:48 PM · Discovery-Search, Discovery, Regression, CirrusSearch
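
The point that the folded character set differs by language can be sketched as follows. This is a simplified illustration using Python's standard `unicodedata` module, not the analyzer configuration used on the wikis. The function name and the `keep` parameter are hypothetical, and the German exception set in the usage note is an assumption made for the example.

```python
import unicodedata


def fold_diacritics(text: str, keep: frozenset = frozenset()) -> str:
    """Strip combining marks, except on characters listed in `keep`.

    `keep` stands in for a per-language exception list (hypothetical).
    """
    folded = []
    # Normalize to NFC first so precomposed characters can be matched in `keep`.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            folded.append(ch)
            continue
        # Decompose the character and drop any combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(folded)
```

For example, `fold_diacritics("Eugénie")` folds the accented e to a plain e, while passing `keep=frozenset("äöüÄÖÜ")` (an assumed German-style exception set) would leave umlauts untouched and still fold other accents.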
TJones added a comment to T179081: Full text search does not find article with accented word in dewiki.

D'oh—thanks @FriedhelmW, I didn't check for that. The busy, busy WikiGnomes are always fixing things. So, I'd say that this is a specific example of what's happening with e's in T104814. That ticket is on my list for this year. Is it okay to close this ticket and/or fold it into T104814?

Fri, Oct 27, 4:35 PM · Discovery-Search, Discovery, Regression, CirrusSearch
TJones added a comment to T170779: Wikidata search suggestions do not display on screen if character whose decomposition contains nukta is present in search query.

Note that you don’t need to change your interface to Bengali to see these effects, and the fact that it is the Bengali keyword for “category” doesn’t seem to matter either. You can search for single characters and get the described behavior.

Fri, Oct 27, 4:30 PM · MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)), Wikidata-Former-Sprint-Board, User-Smalyshev, Discovery-Search (Current work), ValueView, MediaWiki-extensions-WikibaseRepository, Wikidata
TJones added a comment to T179081: Full text search does not find article with accented word in dewiki.

@FriedhelmW, can you post a screenshot or more detailed description of what you are seeing that is wrong? Or maybe another example? When I follow the link you provided, the article for Eugénie Grandet is the first result:

Fri, Oct 27, 2:53 PM · Discovery-Search, Discovery, Regression, CirrusSearch
Liuxinyu970226 awarded T177871: Re-index un-fallbacked languages a Baby Tequila token.
Fri, Oct 27, 12:01 PM · User-notice, Discovery-Search (Current work), Discovery, I18n

Tue, Oct 24

TJones moved T138958: Detect "wrong keyboard" queries for Russian/American keyboards on EN/RU Wikipedias from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:36 PM · Discovery, Discovery-Search
TJones moved T138858: Serbian language search does not allows for use of bald Latin alphabet from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:35 PM · CirrusSearch, Discovery-Search, Discovery, MediaWiki-Internationalization
TJones moved T138857: Serbian language search differentiates between Cyrillic and Latin alphabets from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:35 PM · CirrusSearch, Discovery-Search, MediaWiki-Internationalization, Discovery
TJones moved T140292: A/B Test TextCat settings on non-WP projects from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:35 PM · CirrusSearch, Discovery-Search, Discovery
TJones moved T149307: CirrusSearch: Replace double quotes with spaces in queries from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:35 PM · CirrusSearch, Discovery-Search, Discovery
TJones moved T149121: Go over discernatron data to get an idea of where we need to improve from This Quarter to Later on the Discovery-Search board.
Tue, Oct 24, 5:35 PM · Discovery-Search, Discovery, CirrusSearch
TJones moved T145564: Discernatron should remove redirects from result set from Tech Debt/Misc to Later on the Discovery-Search board.
Tue, Oct 24, 5:35 PM · Discovery-Search, Discovery
TJones moved T145564: Discernatron should remove redirects from result set from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:34 PM · Discovery-Search, Discovery
TJones moved T155104: Detect "wrong keyboard" queries for Hebrew/American keyboards on EN/HE Wikipedias from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:34 PM · Discovery-Search, Discovery
TJones moved T174621: Investigate dropping obvious question words ('what is' 'who is') to get better results from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:33 PM · Discovery-Search, Discovery
TJones moved T87136: ~"daß" should not match "dass" from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:31 PM · Discovery-Search, Discovery, CirrusSearch
TJones moved T174116: Another look at multi-hyphen tokens on enwiki and zhwiki from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:28 PM · Chinese-Sites, Discovery-Search, Discovery
TJones moved T177888: Review use of CJK vs ICU default language analyzers for "Chinese" Wikis from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:28 PM · Chinese-Sites, Discovery-Search
TJones moved T177877: Investigate enabling Nynorsk Light Stemmer from This Quarter to Tech Debt/Misc on the Discovery-Search board.
Tue, Oct 24, 5:28 PM · Discovery-Search
TJones moved T178923: Review Japanese Morphological Libraries from Needs triage to Up Next on the Discovery-Search board.
Tue, Oct 24, 5:15 PM · Discovery-Search, Discovery
TJones moved T178924: Review Vietnamese Morphological Libraries from Needs triage to Up Next on the Discovery-Search board.
Tue, Oct 24, 5:15 PM · Discovery-Search, Discovery
TJones moved T178925: Review Korean Morphological Libraries from Needs triage to Up Next on the Discovery-Search board.
Tue, Oct 24, 5:15 PM · Discovery-Search, Discovery
TJones moved T178928: Review Estonian Morphological Libraries from Needs triage to Up Next on the Discovery-Search board.
Tue, Oct 24, 5:15 PM · Discovery-Search, Discovery
TJones edited projects for T178923: Review Japanese Morphological Libraries, added: Discovery-Search; removed Discovery-Search (Current work).
Tue, Oct 24, 5:15 PM · Discovery-Search, Discovery
TJones edited projects for T178928: Review Estonian Morphological Libraries, added: Discovery-Search; removed Discovery-Search (Current work).
Tue, Oct 24, 5:14 PM · Discovery-Search, Discovery
TJones edited projects for T178925: Review Korean Morphological Libraries, added: Discovery-Search; removed Discovery-Search (Current work).
Tue, Oct 24, 5:14 PM · Discovery-Search, Discovery
TJones edited projects for T178924: Review Vietnamese Morphological Libraries, added: Discovery-Search; removed Discovery-Search (Current work).
Tue, Oct 24, 5:14 PM · Discovery-Search, Discovery
TJones moved T171652: Language Analysis Morphological Library Research Spike from Needs review to Done on the Discovery-Search (Current work) board.
Tue, Oct 24, 4:54 PM · Discovery-Search (Current work), Tamil-Sites, Malayalam-Sites, Bengali-Sites, Discovery
TJones moved T171652: Language Analysis Morphological Library Research Spike from In progress to Needs review on the Discovery-Search (Current work) board.
Tue, Oct 24, 4:54 PM · Discovery-Search (Current work), Tamil-Sites, Malayalam-Sites, Bengali-Sites, Discovery
TJones updated the task description for T171652: Language Analysis Morphological Library Research Spike.
Tue, Oct 24, 4:54 PM · Discovery-Search (Current work), Tamil-Sites, Malayalam-Sites, Bengali-Sites, Discovery
TJones updated the task description for T178924: Review Vietnamese Morphological Libraries.
Tue, Oct 24, 4:53 PM · Discovery-Search, Discovery
TJones added a comment to T171652: Language Analysis Morphological Library Research Spike.

Recurring themes:

  • Not everything is usefully licensed.
  • Code gets abandoned.
  • Useful code may exist that is not in English.
  • Java is easiest, but not everything is in Java.
  • Sometimes all that exists are research papers.
Tue, Oct 24, 4:52 PM · Discovery-Search (Current work), Tamil-Sites, Malayalam-Sites, Bengali-Sites, Discovery
TJones created T178929: Review Slovak Morphological Libraries.
Tue, Oct 24, 4:51 PM · Discovery-Search (Current work), Discovery
TJones created T178928: Review Estonian Morphological Libraries.
Tue, Oct 24, 4:51 PM · Discovery-Search, Discovery
TJones created T178926: Review Serbian Morphological Libraries.
Tue, Oct 24, 4:50 PM · Discovery-Search (Current work), Discovery
TJones created T178925: Review Korean Morphological Libraries.
Tue, Oct 24, 4:49 PM · Discovery-Search, Discovery
TJones created T178924: Review Vietnamese Morphological Libraries.
Tue, Oct 24, 4:48 PM · Discovery-Search, Discovery
TJones created T178923: Review Japanese Morphological Libraries.
Tue, Oct 24, 4:47 PM · Discovery-Search, Discovery
TJones updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Tue, Oct 24, 4:10 PM · Discovery-Search (Current work), Discovery
TJones placed T177876: Investigate changing ICU tokenization from whitelist to blacklist up for grabs.
Tue, Oct 24, 4:09 PM · Discovery-Search
TJones added a comment to T177871: Re-index un-fallbacked languages.

I've replied to Chris's comments on Babylon, and all of the Village Pumps that I had posted to before: Slovak, Mirandese, Occitan, Limburgish (which spilled over to my talk page), Egyptian Arabic, Gagauz, and Livvi-Karelian. I'll follow those conversations for the rest of the week to see if any concerns come up. I've also updated the page on MediaWiki to reflect what has happened rather than what will happen.

Tue, Oct 24, 3:00 PM · User-notice, Discovery-Search (Current work), Discovery, I18n
TJones added a comment to T177871: Re-index un-fallbacked languages.

  • Document this work on MediaWiki somewhere, to help with future searches if people do have questions
  • Add to Tech/News (I just did!)
  • Post an update to wikitech-l and wikitech-ambassadors
  • Add to the weekly Discovery update
Tue, Oct 24, 2:38 PM · User-notice, Discovery-Search (Current work), Discovery, I18n
TJones updated the task description for T177871: Re-index un-fallbacked languages.
Tue, Oct 24, 2:31 PM · User-notice, Discovery-Search (Current work), Discovery, I18n

Mon, Oct 23

RandomDSdevel awarded T177871: Re-index un-fallbacked languages a Baby Tequila token.
Mon, Oct 23, 8:01 PM · User-notice, Discovery-Search (Current work), Discovery, I18n
TJones moved T176197: Allow hiragana searches to find katakana results and vice versa from Backlog to In progress on the Discovery-Search (Current work) board.
Mon, Oct 23, 5:19 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Discovery

Oct 20 2017

TJones claimed T176197: Allow hiragana searches to find katakana results and vice versa.
Oct 20 2017, 2:43 PM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Discovery

Oct 18 2017

TJones added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

@chelsyx—thanks esp. for the joyplot updates! They are fun to stare at and ponder.

Oct 18 2017, 6:13 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery

Oct 17 2017

TJones added a comment to T156019: Develop plan for dealing with numerous second-try searches, aka "So Many Search Options".

@debt: While this is something we need to look at eventually, I think we could move it from "Up Next" to "Later". Until we have a new second-try search that needs to fit on the results page and a front-end/UI person with time to think about it, we probably aren't going to be motivated to do anything more on this.

Oct 17 2017, 6:02 PM · Discovery-Search, Discovery, CirrusSearch
TJones added a comment to T170099: Search returns random results when search query begins with a hyphen.

Two other things...

Oct 17 2017, 4:13 PM · Discovery-Search, CirrusSearch, Discovery
TJones added a comment to T170099: Search returns random results when search query begins with a hyphen.

Okay, I've edited the CirrusSearch Help page to explain how to do this. I also changed the example to -in-law because the intent is more obvious (in English) than -happy or -ridden.

Oct 17 2017, 4:02 PM · Discovery-Search, CirrusSearch, Discovery
TJones added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

@chelsyx —thanks for the updates! I'm glad everything seems more reasonable now. We always have weird outliers and general odd behavior. (OTOH, it could always be worse: Salesforce—according to their Solr/Revolution talk—can't look at their customers' data or queries; that's definitely doing it in hard mode.)

Oct 17 2017, 2:31 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery

Oct 16 2017

TJones added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

On second thought, I wonder if the difficulty of applying language analysis is also affecting our ability to group together similar queries which leads to less representative training data. Perhaps it would be possible to use hebmorph instead of lucene stemming during the grouping phase,

Oct 16 2017, 9:50 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
TJones updated the task description for T170473: Wikis have content namespaces that are possibly accidentally excluded from the default results of Special:Search.
Oct 16 2017, 5:20 PM · Readers-Community-Engagement, Bengali-Sites, Discovery-Search (Current work), Wikisource
TJones added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

@chelsyx—cool analysis overall! The fact that this is mostly automated is amazing!

Oct 16 2017, 5:19 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
TJones added a comment to T177871: Re-index un-fallbacked languages.

Thanks, @Johan! Sorry for the last-minute shuffle as we get all our ducks in a row.

Oct 16 2017, 2:26 PM · User-notice, Discovery-Search (Current work), Discovery, I18n
TJones added a comment to T176428: Search Relevance test #4 - action items.

It all sounds great, Erik! I have no strong feelings on the API requests, but do you have an estimate of how many a reasonably large number would be, say, per day? That way, those who worry could better calibrate their worry. I guess I'm not worried because it doesn't seem like it can be all that many.

Oct 16 2017, 2:25 PM · Patch-For-Review, Discovery-Search (Current work), Discovery

Oct 13 2017

TJones added a comment to T170099: Search returns random results when search query begins with a hyphen.

Let's take a look at this in the Elasticsearch parser and see if we can change how the UI handles a negative symbol before a word, so that the UI displays nothing if we don't have any results.

Oct 13 2017, 4:05 PM · Discovery-Search, CirrusSearch, Discovery
TJones added a comment to T177871: Re-index un-fallbacked languages.

@Johan, I was suggesting considering delaying it to Tech News 2017/43 (unless that's not possible) if we need to.

Oct 13 2017, 3:14 PM · User-notice, Discovery-Search (Current work), Discovery, I18n
TJones updated subscribers of T177871: Re-index un-fallbacked languages.

Should we post an update before we know when the re-indexing is going to happen? I think @dcausse is likely to be the person who does the actual updates (I don't have permissions on the relevant servers), so should we plan around when he thinks he can do it? If it is next week, then TechNews 2017/42 would be a good place to announce it. Otherwise should we wait for 43? The others are all easy to announce to right before we do it.

Oct 13 2017, 3:09 PM · User-notice, Discovery-Search (Current work), Discovery, I18n

Oct 11 2017

TJones added a comment to T177871: Re-index un-fallbacked languages.

All of the un-fallbacked languages seem okay with the default analyzer. Javanese script doesn't use spaces, but... (a) Javanese wikis mostly use the Javanese Latin alphabet, (b) the ICU tokenizer is already configured for Javanese as a result of earlier spaceless language config, and (c) none of the tokenizers do anything different with it anyway. So, we're ready to start re-indexing once the changes have been deployed.

Oct 11 2017, 8:30 PM · User-notice, Discovery-Search (Current work), Discovery, I18n
TJones added a comment to T177888: Review use of CJK vs ICU default language analyzers for "Chinese" Wikis.

And wouldn't it be lovely to do this not only on Wikipedias, but also on zhwiktionary, zhwikibooks, zhwikivoyage...?

Oct 11 2017, 12:33 PM · Chinese-Sites, Discovery-Search
TJones renamed T177888: Review use of CJK vs ICU default language analyzers for "Chinese" Wikis from Review use of CJK vs ICU default language analyzers for "Chinese" Wikipedias to Review use of CJK vs ICU default language analyzers for "Chinese" Wikis.
Oct 11 2017, 12:29 PM · Chinese-Sites, Discovery-Search
TJones added a comment to T177871: Re-index un-fallbacked languages.

That's a lot of languages, but hopefully the re-indexing won't be too much of a strain or take too long. :)

Oct 11 2017, 12:14 PM · User-notice, Discovery-Search (Current work), Discovery, I18n

Oct 10 2017

TJones added a comment to T177871: Re-index un-fallbacked languages.

Okay, the zh-* languages thing got out of control. It's complicated (see T177888) and none were/will be changed by the fallback changes, so I'm dropping that from this ticket. I'll try to review the rest of the un-fallbacked languages this week.

Oct 10 2017, 9:21 PM · User-notice, Discovery-Search (Current work), Discovery, I18n
TJones created T177888: Review use of CJK vs ICU default language analyzers for "Chinese" Wikis.
Oct 10 2017, 9:20 PM · Chinese-Sites, Discovery-Search
TJones updated the task description for T177871: Re-index un-fallbacked languages.
Oct 10 2017, 8:17 PM · User-notice, Discovery-Search (Current work), Discovery, I18n
TJones added a comment to T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.

T147959 should probably be added here?

Oct 10 2017, 8:16 PM · Discovery-Search (Current work), Discovery
TJones added a comment to T177877: Investigate enabling Nynorsk Light Stemmer.

@jhsoby—Ha! Thanks! Yeah, we want to test nn (Nynorsk) not no (Bokmål)—though presumably if we tested no it would become clear that it was a bad idea.

Oct 10 2017, 8:12 PM · Discovery-Search
TJones moved T147959: Generic language fallbacks in Mediawiki should not be used for Elasticsearch language analyzers from Needs review to Done on the Discovery-Search (Current work) board.
Oct 10 2017, 8:02 PM · MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), Patch-For-Review, Discovery-Search (Current work), Discovery, I18n, Epic
TJones added a comment to T147959: Generic language fallbacks in Mediawiki should not be used for Elasticsearch language analyzers.

I've created T177871 for re-indexing affected wikis and added it to T147505 (the recurring re-indexing ticket).

Oct 10 2017, 8:02 PM · MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), Patch-For-Review, Discovery-Search (Current work), Discovery, I18n, Epic