Cool! Thanks, @Smalyshev!
Feb 13 2019
Woo hoo!
In T215966#4952468, @gerritbot wrote: Change 490412 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] admin: reset Julia SSH key
Removing this from current work and moving it to the "Language Stuff" backlog. I'm the only one who could work on it this quarter, and I'm a bit out of my depth with the integration. We'll reprioritize it for future work when we can assign a slightly larger team (≥2 people) to it.
Feb 12 2019
@Julia.glen, I think this patch should give you an account, but as user juliaglen. You may need to add User juliaglen to your ssh config.
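For reference, an entry along these lines in your local ssh config maps your login name for a given host to the server-side account. The Host pattern below is only an example; substitute the actual host(s) you connect to:

```
# ~/.ssh/config — example only; adjust the Host pattern to the real hostname(s)
Host *.wikimedia.org
    User juliaglen
```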
Hmm—what about Nori (the Korean analyzer) and LTR? I believe we have to disable LTR for Korean, enable Nori, gather more data, then rebuild the LTR model. Sounds like maybe all of that should wait until after the ES upgrade, even though it means re-indexing Korean wikis at a later date.
Looks good, and all the detail is much appreciated.
Feb 11 2019
Sounds good to me! If it turns out that the smallest volume languages have trouble, we can fall back to larger languages on the list.
Should be done now—so try again, please!
@Julia.glen, my Hue username has the same odd capitalization as my Gerrit one (Tjones), which I don't use anywhere else.
In T212885#4944841, @Julia.glen wrote: I am unable to access hue.wikimedia.org with my LDAP account. Could you take a look? Thanks.
What languages should we initially investigate?
Feb 8 2019
Thanks, @Mooeypoo! That looks like it could work. I really appreciate your explanations and your patches!
Everything looks good now. Serbian (et al.) and Esperanto are working as expected. Thanks, @dcausse!
I can log in now. Thanks!
First draft done. Full details on MediaWiki.
Feb 7 2019
By "LDAP" I assume you mean the login for Wikitech, etc. OIT uses "LDAP" to refer to your Google Apps login, too.
Feb 6 2019
Thanks for looking into this, @Mooeypoo! It's too bad that there isn't a way to make it work now, but I'm glad this provides another use case for potential enhancements to OOUI. If the extra functionality ever gets implemented, ping me if you remember!
Feb 5 2019
In T215346#4929559, @Jdforrester-WMF wrote: Well, users can do var searchInputWidget = OO.ui.infuse($('#searchText')); to get a handle on the OOUI widget. Is that not sufficient?
Thanks, @Dzahn!
Jan 31 2019
In T170099#4920016, @TTO wrote: I regret not expressing my gratitude or commenting here at the time.
Jan 30 2019
Seems to be fixed now.
Seems to be fixed now.
Closing this because, after looking into it a while back, I decided that internal confidence isn't really something TextCat should report, and the easy improvements to the quality of TextCat's results have already been made.
I think everything here is fixed. ö, ä, and å are each treated as independent letters; substituting a for ä is handled the same as substituting u for ä, and other diacritics (like the one on á) are ignored. Depending on whether you use the completion suggester, the go feature, or full-text search, you get additional suggestions based on the position of the typo or the frequency of the misspelled word, all as expected.
I'm going to close this because it was written before we moved to Elasticsearch. The current behavior of Elasticsearch is the same for both these characters and their proposed normalization: all of them are ignored during tokenization. In general, we have implemented ICU Normalization for English-language projects, so most non-punctuation characters are normalized well.
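For anyone curious about what this kind of normalization does, Python's standard library can illustrate the general idea. This is NFKC via unicodedata, which is only a rough analogue of the ICU normalization in the search analysis chain, not the actual Elasticsearch configuration:

```python
import unicodedata

# NFKC compatibility normalization folds many "equivalent" characters to a
# canonical form; for example, the single-codepoint ligature "ﬁ" (U+FB01)
# becomes the two ASCII letters "fi".
print(unicodedata.normalize("NFKC", "\ufb01le"))  # prints "file"
```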
If we end up having to abandon HebMorph, then either there won't be any morphological processing at all or, if we find a replacement, there will be a completely different set of specific errors. I guess we can leave this as stalled for as long as we have HebMorph. I'll also modify the description to be more generic, since it isn't about this particular word but about the ability to make additions to the HebMorph dictionary.
We're not sure whether we'll be able to keep using HebMorph, because it hasn't been released for Elasticsearch 6. @dcausse recompiled it, so we can probably move to ES6, but beyond that it's unclear, so it's unlikely we can justify significant effort on fixing the parses of specific words.
I'm going to go ahead and close this. I don't think we're going to have time to explore option 3, and hopefully the documentation and the blog post can help people understand what's going on. Please re-open if you think it's closed in error.
It looks like the one problem document has been fixed. (There's one result at the moment, but it has the source in the query.) The immediate workaround may be a null edit. The medium-term fix is the "saneitizer" job that cleans up everything every two weeks, meaning a problem like this has an average life span of a week. Given that nothing showed up in the logs, there's not much we can do. If it happens again, please re-open this ticket or open another one, and we'll see if the logs capture any odd behavior.