TJones (Trey Jones)
Sr. Software Engineer, Search Platform Team

User Details

User Since
Jul 8 2015, 3:02 PM (145 w, 6 d)
Availability
Available
IRC Nick
Trey314159
LDAP User
Tjones
MediaWiki User
TJones (WMF)

I would have written a shorter comment, but I did not have the time.

I'm part of the Search Platform team and I spend my time working on search & relevance, trying to better support search in various languages, analyzing queries, and doing random mathy things. I tend to write long, detailed notes about my investigations (so as to improve the bus number of my work).

When I have to work on _GitHub,_ /‍‍/Phab,/‍‍/ and ''MediaWiki'' all on the same day, I sometimes suffer Severe Markup Incongruence Fatigue.

I ♥ Unicode.

Recent Activity

Yesterday

TJones moved T192395: Create Croatian, Serbo-Croatian, and Bosnian Analysis Chains Using Serbian Morphological Libraries from In progress to Needs review on the Discovery-Search (Current work) board.
Tue, Apr 24, 5:23 PM · I18n, Discovery-Search (Current work), Discovery
TJones moved T191544: Deploy the analysis config for the new Slovak stemmer from In progress to Waiting/Blocked on the Discovery-Search (Current work) board.
Tue, Apr 24, 5:19 PM · Patch-For-Review, Discovery-Search (Current work)
TJones claimed T191544: Deploy the analysis config for the new Slovak stemmer.
Tue, Apr 24, 3:28 PM · Patch-For-Review, Discovery-Search (Current work)
TJones moved T191544: Deploy the analysis config for the new Slovak stemmer from Backlog to In progress on the Discovery-Search (Current work) board.
Tue, Apr 24, 3:28 PM · Patch-For-Review, Discovery-Search (Current work)
TJones moved T191543: Deploy updated search/extra plugin with Slovak Stemmer from Backlog to In progress on the Discovery-Search (Current work) board.
Tue, Apr 24, 3:28 PM · Patch-For-Review, Discovery-Search (Current work)

Mon, Apr 23

TJones added a comment to T192395: Create Croatian, Serbo-Croatian, and Bosnian Analysis Chains Using Serbian Morphological Libraries.

Everything looks good to me with the Serbian analyzer and ICU folding enabled, but it still needs review by speakers of the languages. If everything looks good to them, then I'll deploy the analysis chain and then re-index the relevant wikis.

Mon, Apr 23, 5:11 PM · I18n, Discovery-Search (Current work), Discovery

Fri, Apr 20

TJones closed T138857: Serbian language search differentiates between Cyrillic and Latin alphabets as Resolved.

This is working now on all Serbian-language wikis. Ranking still prefers exact matches, so Beograd and Београд give the same results but in a different order—and since there are over 40,000 results and many with partial title matches, the order difference can be significant.

Fri, Apr 20, 1:26 PM · CirrusSearch, Discovery-Search, MediaWiki-Internationalization, Discovery
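
The script-insensitive matching described in the T138857 comment above can be illustrated with a small sketch. This is not the deployed Serbian analysis chain (which runs inside Elasticsearch and also does stemming); it is a minimal Python illustration, assuming a plain letter-by-letter Serbian Cyrillic-to-Latin table. The helper names to_latin and fold_key are hypothetical.

```python
# Hypothetical sketch: fold Serbian Cyrillic and Latin spellings to one key.
# The mapping is the standard 30-letter Serbian Cyrillic-to-Latin table;
# the function names are illustrative and not taken from any plugin.
_CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text):
    """Transliterate Serbian Cyrillic letters to Latin, keeping other characters."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in _CYR_TO_LAT:
            lat = _CYR_TO_LAT[lower]
            # An uppercase Cyrillic letter maps to a capitalized Latin letter
            # or digraph (for example "Lj" rather than "LJ").
            out.append(lat.capitalize() if ch != lower else lat)
        else:
            out.append(ch)
    return "".join(out)


def fold_key(text):
    """Return a lowercase, Latin-script key so both spellings compare equal."""
    return to_latin(text).lower()


print(to_latin("Београд"))                          # Beograd
print(fold_key("Beograd") == fold_key("Београд"))   # True
```

Folding like this only makes both spellings match the same documents. As the comment notes, ranking can still prefer exact matches, so the order of results may differ between the two scripts.
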
TJones moved T189265: Re-index Serbian Wikis from In progress to Done on the Discovery-Search (Current work) board.
Fri, Apr 20, 1:17 PM · I18n, Discovery-Search (Current work), Discovery
TJones moved T189265: Re-index Serbian Wikis from Backlog to In progress on the Discovery-Search (Current work) board.
Fri, Apr 20, 1:17 PM · I18n, Discovery-Search (Current work), Discovery
TJones added a comment to T189265: Re-index Serbian Wikis.

The index is still catching up, but the re-index is done and search on all Serbian-language wikis is now using the Serbian analysis chain!

Fri, Apr 20, 1:17 PM · I18n, Discovery-Search (Current work), Discovery

Thu, Apr 19

TJones updated subscribers of T189265: Re-index Serbian Wikis.

Yesterday's message from @dcausse about reindexing on eqiad was actually reindexing on codfw, and is complete. Reindexing on eqiad is happening now.

Thu, Apr 19, 1:32 PM · I18n, Discovery-Search (Current work), Discovery
TJones moved T189239: Deploy initial version of the extra-analysis plugin from In progress to Done on the Discovery-Search (Current work) board.
Thu, Apr 19, 12:33 PM · Patch-For-Review, Discovery-Search (Current work), Discovery

Wed, Apr 18

TJones created T192502: Don't index empty strings in Elasticsearch.
Wed, Apr 18, 9:23 PM · Discovery-Search

Tue, Apr 17

TJones moved T192395: Create Croatian, Serbo-Croatian, and Bosnian Analysis Chains Using Serbian Morphological Libraries from Backlog to In progress on the Discovery-Search (Current work) board.
Tue, Apr 17, 6:29 PM · I18n, Discovery-Search (Current work), Discovery
TJones triaged T192395: Create Croatian, Serbo-Croatian, and Bosnian Analysis Chains Using Serbian Morphological Libraries as Normal priority.
Tue, Apr 17, 6:29 PM · I18n, Discovery-Search (Current work), Discovery
TJones added a comment to T188321: CRH Transliteration pattern matching fixes.

Okay... a plausibly final version of the patch is up. Just waiting for code review. Thanks to @DonAlessandro for all the help getting everything just right.

Tue, Apr 17, 3:35 PM · Patch-For-Review, MediaWiki-Language-converter

Wed, Apr 11

TJones updated the task description for T191925: Discuss use of Finite State Transducer based formalism for language variant implementations.
Wed, Apr 11, 4:54 PM · Services (watching), TechCom, Parsoid

Tue, Apr 10

TJones added a comment to T188321: CRH Transliteration pattern matching fixes.

No worries!

Tue, Apr 10, 3:22 PM · Patch-For-Review, MediaWiki-Language-converter

Mon, Apr 9

TJones added a comment to T188321: CRH Transliteration pattern matching fixes.

I will review it by tomorrow afternoon (UTC +3).

Mon, Apr 9, 1:40 PM · Patch-For-Review, MediaWiki-Language-converter

Sat, Apr 7

TJones added a comment to T188321: CRH Transliteration pattern matching fixes.

@DonAlessandro, I'd appreciate it if you could review the patch above for linguistic improvement. If you have any questions or suggestions, let me know!

Sat, Apr 7, 7:36 AM · Patch-For-Review, MediaWiki-Language-converter

Fri, Apr 6

TJones awarded T44085: Wikimedia needs a URL shortener (tracking) a Like token.
Fri, Apr 6, 10:47 PM · RfC, Patch-For-Review, TechCom-RFC (TechCom-Approved), Tracking, WorkType-NewFunctionality, Wikimedia-General-or-Unknown
TJones added a comment to T191114: Implement personalized search for logged-in wiki users.

It's true that publishing zero-result searches (or any searches) is a more obvious potential privacy leak than personalization, because the data for personalization should be hidden away internally. However, if there ever were a data breach of any kind, personalization data would be a concentrated source of information on users. I'm not an ops engineer, but storing and securing personalized data properly for hundreds of thousands to millions of users could entail more hardware, and seems likely to require more general complexity in our infrastructure.

Fri, Apr 6, 4:05 PM · Discovery, Discovery-Search

Thu, Apr 5

TJones added a comment to T95404: A Hebrew article title with an apostrophe cannot be found when searching without an apostrophe.

From the Description:

It's just a simple and common punctuation mark, so the search engine should be smart enough to find the article.

Thu, Apr 5, 8:54 PM · Discovery-Search, CirrusSearch, Discovery, I18n, MediaWiki-Search
TJones added a comment to T95404: A Hebrew article title with an apostrophe cannot be found when searching without an apostrophe.

Are apostrophes only normally omitted at the end of a word? Would it be omitted in צ'ארלס‬, and searched as צארלס‬?

Thu, Apr 5, 7:11 PM · Discovery-Search, CirrusSearch, Discovery, I18n, MediaWiki-Search
TJones updated the task description for T95404: A Hebrew article title with an apostrophe cannot be found when searching without an apostrophe.
Thu, Apr 5, 7:08 PM · Discovery-Search, CirrusSearch, Discovery, I18n, MediaWiki-Search
TJones added a comment to T95404: A Hebrew article title with an apostrophe cannot be found when searching without an apostrophe.

Term vectors for that page on title.near_match reports "סנדומייז " with an additional space that should probably not be on the end of the token. This seems like potentially a bug in the hebrew analyzer. @TJones any thoughts?

Thu, Apr 5, 6:44 PM · Discovery-Search, CirrusSearch, Discovery, I18n, MediaWiki-Search
TJones added a comment to T191485: [[MediaWiki:Cirrussearch-completion-profile-fuzzy-pref-desc/ml]] i18n issue.

This works most of the time but has some false positives like: Life assurance => Life insurance

Thu, Apr 5, 6:09 PM · Discovery-Search, Discovery, CirrusSearch, I18n
TJones updated subscribers of T191485: [[MediaWiki:Cirrussearch-completion-profile-fuzzy-pref-desc/ml]] i18n issue.

Summoning @dcausse—I'm happy to help with wordsmithing, but I'm not sure what a "close redirect" is, either.

Thu, Apr 5, 5:44 PM · Discovery-Search, Discovery, CirrusSearch, I18n
TJones updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Thu, Apr 5, 4:41 PM · Discovery-Search (Current work), Discovery
TJones updated the task description for T191543: Deploy updated search/extra plugin with Slovak Stemmer.
Thu, Apr 5, 4:37 PM · Patch-For-Review, Discovery-Search (Current work)
TJones updated the task description for T191544: Deploy the analysis config for the new Slovak stemmer.
Thu, Apr 5, 4:37 PM · Patch-For-Review, Discovery-Search (Current work)
TJones triaged T191545: Re-index Slovak Wikis after analysis chain is deployed as Normal priority.
Thu, Apr 5, 4:36 PM · Discovery-Search (Current work)
TJones triaged T191544: Deploy the analysis config for the new Slovak stemmer as Normal priority.
Thu, Apr 5, 4:36 PM · Patch-For-Review, Discovery-Search (Current work)
TJones triaged T191543: Deploy updated search/extra plugin with Slovak Stemmer as Normal priority.
Thu, Apr 5, 4:34 PM · Patch-For-Review, Discovery-Search (Current work)
TJones moved T190815: Create Slovak Elasticsearch Plugin/Analysis Chain Using Slovak Stemming Algorithm from Needs review to Done on the Discovery-Search (Current work) board.
Thu, Apr 5, 3:32 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T190815: Create Slovak Elasticsearch Plugin/Analysis Chain Using Slovak Stemming Algorithm from In progress to Needs review on the Discovery-Search (Current work) board.
Thu, Apr 5, 3:32 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones edited projects for T191535: Evaluate need for Myanmar Zawgyi encoding detection/transliteration in search, added: Discovery-Search; removed Discovery.
Thu, Apr 5, 2:58 PM · Discovery-Search
TJones added a project to T191535: Evaluate need for Myanmar Zawgyi encoding detection/transliteration in search: Discovery.
Thu, Apr 5, 2:57 PM · Discovery-Search
TJones created T191535: Evaluate need for Myanmar Zawgyi encoding detection/transliteration in search.
Thu, Apr 5, 2:55 PM · Discovery-Search

Mon, Apr 2

TJones added a comment to T190815: Create Slovak Elasticsearch Plugin/Analysis Chain Using Slovak Stemming Algorithm.

The update to the extra/search plugin above is a work in progress because it does not yet contain unit tests. However, I was able to use the plugin to test the full analysis chain. The write up is on MediaWiki. The key points:

Mon, Apr 2, 8:59 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T178929: Review Slovak Morphological Libraries from Needs review to Done on the Discovery-Search (Current work) board.
Mon, Apr 2, 8:43 PM · Discovery-Search (Current work), Discovery

Thu, Mar 29

TJones added a comment to T187148: Evaluate features provided by `query_explorer` functionality of ltr plugin.

I'm a bit behind in my reading! This is very cool stuff.

Thu, Mar 29, 1:59 PM · MW-1.31-release-notes (WMF-deploy-2018-04-10 (1.31.0-wmf.29)), MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Patch-For-Review, Discovery-Search (Current work), Discovery

Wed, Mar 28

TJones moved T190816: Add support for external stemmer to Analyzer Analysis tools from Needs review to Done on the Discovery-Search (Current work) board.
Wed, Mar 28, 2:00 PM · Patch-For-Review, Discovery-Search (Current work)

Tue, Mar 27

TJones moved T184771: Set up RelForge test of phonetic title search from Current work to This Quarter on the Discovery-Search board.
Tue, Mar 27, 5:38 PM · Discovery-Search
TJones moved T190816: Add support for external stemmer to Analyzer Analysis tools from In progress to Needs review on the Discovery-Search (Current work) board.
Tue, Mar 27, 2:45 PM · Patch-For-Review, Discovery-Search (Current work)
TJones moved T190816: Add support for external stemmer to Analyzer Analysis tools from Backlog to In progress on the Discovery-Search (Current work) board.
Tue, Mar 27, 2:43 PM · Patch-For-Review, Discovery-Search (Current work)
TJones created T190816: Add support for external stemmer to Analyzer Analysis tools.
Tue, Mar 27, 2:43 PM · Patch-For-Review, Discovery-Search (Current work)
TJones moved T190815: Create Slovak Elasticsearch Plugin/Analysis Chain Using Slovak Stemming Algorithm from Backlog to In progress on the Discovery-Search (Current work) board.
Tue, Mar 27, 2:34 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones triaged T190815: Create Slovak Elasticsearch Plugin/Analysis Chain Using Slovak Stemming Algorithm as Normal priority.
Tue, Mar 27, 2:34 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T178929: Review Slovak Morphological Libraries from In progress to Needs review on the Discovery-Search (Current work) board.
Tue, Mar 27, 2:15 PM · Discovery-Search (Current work), Discovery
TJones added a comment to T188321: CRH Transliteration pattern matching fixes.

Ugh. I got a decent version of #1 here and T189512 working yesterday. I was still struggling with Roman numerals—some anchoring of some particular regular expressions was misbehaving—and some of the other regexes still needed to be re-integrated, but it was working. I ran the unit tests and got an unexpected failure!: Compilation failed: regular expression is too large at offset 55600.

Tue, Mar 27, 2:13 PM · Patch-For-Review, MediaWiki-Language-converter

Mar 23 2018

TJones added a comment to T178929: Review Slovak Morphological Libraries.

Full write-up is on MediaWiki.

Mar 23 2018, 12:24 AM · Discovery-Search (Current work), Discovery

Mar 22 2018

TJones added a comment to T187148: Evaluate features provided by `query_explorer` functionality of ltr plugin.

So, almost entirely limited to title and redirect fields.

Mar 22 2018, 3:27 PM · MW-1.31-release-notes (WMF-deploy-2018-04-10 (1.31.0-wmf.29)), MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Patch-For-Review, Discovery-Search (Current work), Discovery

Mar 20 2018

TJones added a comment to T176428: Search Relevance test #4 - action items.

Should this go in "Needs Review"? It seems to cover multiple topics, so we need a column labeled "It's Complicated"

Mar 20 2018, 6:31 PM · Discovery-Search (Current work), Discovery
TJones added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.
  1. Crowdsourced labeling platform would be tedious indeed. And we can't guarantee the authenticity of volunteers. I gave one of my batchmates the same form. He started assigning random labels to each of result page of a particular query. The point being anyone will get annoyed in this tedious procedure.
Mar 20 2018, 3:15 PM · Possible-Tech-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Mar 15 2018

TJones created T189791: 2018 Hackathon: Tell Me Why Your Search Sucks!.
Mar 15 2018, 4:23 PM · Discovery-Search, Wikimedia-Hackathon-2018

Mar 14 2018

TJones added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

Also, I am wondering if you would be able to suggest two mentors (including you perhaps, but not sure if you will have time :P) who would be willing to mentor for this project if a student shows interest.

Mar 14 2018, 6:49 PM · Possible-Tech-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Mar 13 2018

TJones moved T189239: Deploy initial version of the extra-analysis plugin from Backlog to In progress on the Discovery-Search (Current work) board.
Mar 13 2018, 5:34 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones moved T178929: Review Slovak Morphological Libraries from Backlog to In progress on the Discovery-Search (Current work) board.
Mar 13 2018, 5:29 PM · Discovery-Search (Current work), Discovery
TJones added a comment to T183015: Create Serbian Elasticsearch Plugin/Analysis Chain Using Serbian Morphological Libraries.

When this is done and Serbian wikis are re-indexed (T189265), then T138857: Serbian language search differentiates between Cyrillic and Latin alphabets will be done, too.

Mar 13 2018, 1:43 PM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, I18n, Discovery-Search (Current work), Discovery
TJones added a comment to T138857: Serbian language search differentiates between Cyrillic and Latin alphabets.

We would still need to deal with the bald Latin search (T138858), but the upcoming Serbian analysis chain (T183015) will take care of the Cyrillic-vs-Latin search, in addition to doing some basic stemming.

Mar 13 2018, 1:42 PM · CirrusSearch, Discovery-Search, MediaWiki-Internationalization, Discovery
TJones added a comment to T189239: Deploy initial version of the extra-analysis plugin.

I was able to download and install the plugin from Maven Central, and I've updated the plugin docs with installation instructions (and SpotBugs instructions, also copied from search/extra). Patch here.

Mar 13 2018, 1:28 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones added a comment to T189511: Rename crh from "Crimean Turkish" to "Crimean Tatar".

Thanks, @Nikerabbit. I've opened a ticket with CLDR. I guess that's all we can do; should we go ahead and close the ticket here?

Mar 13 2018, 10:18 AM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, WikimediaMessages, MediaWiki-extensions-CLDR

Mar 12 2018

TJones added a comment to T189511: Rename crh from "Crimean Turkish" to "Crimean Tatar".

Thanks @Jayprakash12345!

Mar 12 2018, 6:13 PM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, WikimediaMessages, MediaWiki-extensions-CLDR
TJones created T189512: Crimean Tatar/crh transliteration should not block on "km²".
Mar 12 2018, 5:18 PM · Patch-For-Review, MediaWiki-Language-converter
TJones added a comment to T189511: Rename crh from "Crimean Turkish" to "Crimean Tatar".

I'm not sure where this info is kept, so I'm not sure what project it should belong to.

Mar 12 2018, 5:12 PM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, WikimediaMessages, MediaWiki-extensions-CLDR
TJones created T189511: Rename crh from "Crimean Turkish" to "Crimean Tatar".
Mar 12 2018, 5:12 PM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, WikimediaMessages, MediaWiki-extensions-CLDR
TJones added a comment to T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю.

Though this ticket is closed, I wanted to document what happened. There is a 12-hour cache for some of the components in the transliteration. It's possible to purge the cache, but waiting 12 hours also works. In this case, none of the changed pieces were incompatible, so waiting was a reasonable answer, even if it was a bit less satisfying.

Mar 12 2018, 3:51 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), Patch-For-Review, MediaWiki-Language-converter

Mar 9 2018

TJones added a comment to T189239: Deploy initial version of the extra-analysis plugin.

Awesome—thanks!

Mar 9 2018, 5:59 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones closed T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю as Resolved.

Woo hoo! It looks like either waiting was the answer, or some helpful WikiGnome purged a cache somewhere.

Mar 9 2018, 3:26 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), Patch-For-Review, MediaWiki-Language-converter
TJones added a comment to T189239: Deploy initial version of the extra-analysis plugin.

@Gehel—thanks, I knew there was more to the actual deployment of the plugin. If you could do the release, that'd be great. I appreciate all the help from you and @dcausse.

Mar 9 2018, 3:05 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones updated the task description for T189239: Deploy initial version of the extra-analysis plugin.
Mar 9 2018, 3:03 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones added a comment to T188686: Set up CI and github sync for new extra-analysis repo.

@hashar — so much happened while I was asleep! Thanks for getting everything into good shape!

Mar 9 2018, 2:59 PM · Patch-For-Review, Repository-Admins, Continuous-Integration-Config, GitHub-Mirrors, Release-Engineering-Team

Mar 8 2018

TJones edited parent tasks for T189265: Re-index Serbian Wikis, added: T189239: Deploy initial version of the extra-analysis plugin; removed: T183015: Create Serbian Elasticsearch Plugin/Analysis Chain Using Serbian Morphological Libraries.
Mar 8 2018, 11:10 PM · I18n, Discovery-Search (Current work), Discovery
TJones removed a subtask for T183015: Create Serbian Elasticsearch Plugin/Analysis Chain Using Serbian Morphological Libraries: T189265: Re-index Serbian Wikis.
Mar 8 2018, 11:10 PM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, I18n, Discovery-Search (Current work), Discovery
TJones added a subtask for T189239: Deploy initial version of the extra-analysis plugin: T189265: Re-index Serbian Wikis.
Mar 8 2018, 11:10 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones added a comment to T189265: Re-index Serbian Wikis.

I'm already working on the messaging, with a plan to post to the Serbian Wikipedia and Wiktionary Village Pumps and to Tech News before the re-indexing happens.

Mar 8 2018, 11:10 PM · I18n, Discovery-Search (Current work), Discovery
TJones triaged T189265: Re-index Serbian Wikis as Normal priority.
Mar 8 2018, 11:09 PM · I18n, Discovery-Search (Current work), Discovery
TJones moved T183015: Create Serbian Elasticsearch Plugin/Analysis Chain Using Serbian Morphological Libraries from Needs review to Done on the Discovery-Search (Current work) board.
Mar 8 2018, 11:06 PM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, I18n, Discovery-Search (Current work), Discovery
TJones added a comment to T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю.

"Thursday" in California may be Friday across the Atlantic, but it looks like the patch has been deployed (crhwiki is in "Group 2" and the patch is in "1.31.0-wmf.24" which is now live on Group 2), and there are many fewer errors in @DonAlessandro's example.

Mar 8 2018, 10:12 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), Patch-For-Review, MediaWiki-Language-converter
TJones placed T189239: Deploy initial version of the extra-analysis plugin up for grabs.
Mar 8 2018, 7:36 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones triaged T189239: Deploy initial version of the extra-analysis plugin as Normal priority.
Mar 8 2018, 7:36 PM · Patch-For-Review, Discovery-Search (Current work), Discovery
TJones added a comment to T188686: Set up CI and github sync for new extra-analysis repo.

Gerrit has indeed now replicated the repo to GitHub. Thanks!

Mar 8 2018, 7:20 PM · Patch-For-Review, Repository-Admins, Continuous-Integration-Config, GitHub-Mirrors, Release-Engineering-Team
TJones moved T183015: Create Serbian Elasticsearch Plugin/Analysis Chain Using Serbian Morphological Libraries from In progress to Needs review on the Discovery-Search (Current work) board.
Mar 8 2018, 4:18 PM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, I18n, Discovery-Search (Current work), Discovery
TJones added a comment to T183015: Create Serbian Elasticsearch Plugin/Analysis Chain Using Serbian Morphological Libraries.
  • The Serbian stemmer plugin (in the new search/extra-analysis plugin) is just about ready, in Change 415788 above.
  • The Analysis config to use it, with additional analysis chain config, including diacritic folding, is in Change 417299 above.
  • My write up of my analysis chain analysis is available on MediaWiki.
    • Summary: enabled ICU folding with Serbian exceptions, and it works well with the stemmer, nothing unexpected.
Mar 8 2018, 4:18 PM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, I18n, Discovery-Search (Current work), Discovery

Mar 6 2018

TJones updated subscribers of T188686: Set up CI and github sync for new extra-analysis repo.

@dcausse set up the CI here: https://gerrit.wikimedia.org/r/#/c/416743/

Mar 6 2018, 9:16 PM · Patch-For-Review, Repository-Admins, Continuous-Integration-Config, GitHub-Mirrors, Release-Engineering-Team
TJones moved T183015: Create Serbian Elasticsearch Plugin/Analysis Chain Using Serbian Morphological Libraries from Needs review to In progress on the Discovery-Search (Current work) board.
Mar 6 2018, 6:34 PM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, I18n, Discovery-Search (Current work), Discovery
TJones moved T183015: Create Serbian Elasticsearch Plugin/Analysis Chain Using Serbian Morphological Libraries from In progress to Needs review on the Discovery-Search (Current work) board.
Mar 6 2018, 2:35 PM · MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), Patch-For-Review, I18n, Discovery-Search (Current work), Discovery

Mar 5 2018

TJones added a comment to T188686: Set up CI and github sync for new extra-analysis repo.

Cool. Hopefully I'll be committing soon. Thanks again!

Mar 5 2018, 7:45 PM · Patch-For-Review, Repository-Admins, Continuous-Integration-Config, GitHub-Mirrors, Release-Engineering-Team
TJones added a comment to T188686: Set up CI and github sync for new extra-analysis repo.

Thanks, @Reedy! Does anything else need to be done for the github synchronization?

Mar 5 2018, 7:37 PM · Patch-For-Review, Repository-Admins, Continuous-Integration-Config, GitHub-Mirrors, Release-Engineering-Team
TJones updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Mar 5 2018, 5:49 PM · Discovery-Search (Current work), Discovery

Mar 2 2018

TJones created T188686: Set up CI and github sync for new extra-analysis repo.
Mar 2 2018, 1:23 AM · Patch-For-Review, Repository-Admins, Continuous-Integration-Config, GitHub-Mirrors, Release-Engineering-Team

Feb 28 2018

TJones added a comment to T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю.

I believe this should go out in next week's deployment, and should be live next Thursday if all goes well.

Feb 28 2018, 9:24 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), Patch-For-Review, MediaWiki-Language-converter
TJones added a comment to T188321: CRH Transliteration pattern matching fixes.

I spoke to some other developers on the search team, including folks who are more knowledgeable about PHP and how our servers are configured. It may be possible to use a giant regex (which would be parsed for maximum efficiency and cached by PHP) and preg_replace_callback() to implement the equivalent of preg_replace_callback_array() (which is available in PHP 7, so not yet available for MediaWiki).

Feb 28 2018, 7:26 PM · Patch-For-Review, MediaWiki-Language-converter
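
The "giant regex plus one callback" idea in the T188321 comment above can be sketched as follows. This is a Python analogue, not the PHP patch: re.sub with a function argument stands in for preg_replace_callback(), and the rule names, patterns, and callbacks are hypothetical placeholders chosen only to show the dispatch. One difference worth noting: PHP's preg_replace_callback_array() applies its patterns one after another, while a single combined alternation makes one pass and picks the first alternative that matches at each position.

```python
import re

# Hypothetical rules as (group name, pattern, callback). None of these come
# from the actual CRH transliteration patch; they only illustrate dispatch.
RULES = [
    ("unit", r"km²", lambda m: m.group(0)),                # exception: leave as is
    ("roman", r"\b[IVXLCDM]+\b", lambda m: m.group(0)),    # exception: leave as is
    ("word", r"[a-z]+", lambda m: m.group(0).upper()),     # stand-in for real conversion
]

# Combine every rule into one alternation of named groups. Earlier rules win
# when more than one could match at the same position.
COMBINED = re.compile("|".join("(?P<%s>%s)" % (name, pat) for name, pat, _ in RULES))
CALLBACKS = {name: callback for name, _, callback in RULES}


def convert(text):
    """Apply all rules in one pass, dispatching on the group that matched."""
    def dispatch(match):
        # match.lastgroup is the name of the top-level named group that matched.
        return CALLBACKS[match.lastgroup](match)
    return COMBINED.sub(dispatch, text)


print(convert("area 5 km² in XIV century"))  # AREA 5 km² IN XIV CENTURY
```

Because the exception rules come first in the alternation, "km²" and "XIV" are consumed before the general word rule can touch them, which is the behavior the exception-matching discussion is after.
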

Feb 27 2018

TJones added a comment to T156474: Add the possibility to do regex search on titles.

Don't forget to update the documentation! Is that a separate task?

Feb 27 2018, 3:32 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), Discovery-Search (Current work), Patch-For-Review, CirrusSearch, Discovery

Feb 26 2018

TJones added a comment to T188321: CRH Transliteration pattern matching fixes.

Notes on options for issue (1), matching exceptions:

Feb 26 2018, 10:02 PM · Patch-For-Review, MediaWiki-Language-converter
TJones created T188321: CRH Transliteration pattern matching fixes.
Feb 26 2018, 9:54 PM · Patch-For-Review, MediaWiki-Language-converter
TJones added a comment to T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю.

Based on my time constraints and the severity of the bugs I found and fixed, I've submitted the patch above, which fixes the incorrect Ö/ö -> Ё/ё and Ü/ü -> Ю/ю mappings and which should have the regex and exception tables loading properly in production.

Feb 26 2018, 6:12 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), Patch-For-Review, MediaWiki-Language-converter
TJones awarded T156474: Add the possibility to do regex search on titles a Like token.
Feb 26 2018, 5:40 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), Discovery-Search (Current work), Patch-For-Review, CirrusSearch, Discovery

Feb 23 2018

TJones added a comment to T186727: Cimean Tatar transliteration has trouble with ё, ь, э, ю.

Okay, using the example @DonAlessandro provided, I've been able to fix most of the errors. I also found two major bugs.

Feb 23 2018, 10:14 PM · MW-1.31-release-notes (WMF-deploy-2018-03-06 (1.31.0-wmf.24)), Patch-For-Review, MediaWiki-Language-converter