
Reindex English and Italian wikis to enable homoglyph plugin
Closed, Resolved · Public · 1 Estimated Story Points

Description

Once T268730 is deployed, we need to reindex English- and Italian-language wikis to enable the homoglyph plugin.

Notes:

  • Italian wikis are done
  • English wikis other than Commons and Wikidata are done
  • See comments below for details.

Event Timeline

Gehel set the point value for this task to 1. Feb 15 2021, 4:22 PM

Mentioned in SAL (#wikimedia-operations) [2021-03-22T17:30:15Z] <Trey314159> reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T274200)

Mentioned in SAL (#wikimedia-operations) [2021-03-23T15:25:41Z] <Trey314159> reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T274200)

Mentioned in SAL (#wikimedia-operations) [2021-03-30T23:59:40Z] <Trey314159> reindexing English wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T274200)

Mentioned in SAL (#wikimedia-operations) [2021-04-13T15:12:15Z] <Trey314159> reindexing English wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (with some failures) (T274200)

The massive reindexing run is complete, but it was not entirely successful.

  • The File index failed for Commonswiki on eqiad.
  • The Wikidata reindex failed on both eqiad and cloudelastic.

I'll do some clean up and then we can regroup and decide how best to handle these.

Looks like there are no *new* broken indexes as a result of this update. I'll bring this up in the Wednesday Meeting tomorrow and we'll figure out what to try next rather than just repeat the reindexing and hope for a better outcome.

@MPhamWMF and others: The write-up on before-and-after testing is on MediaWiki.

Summary: We can't easily detect the impact in a sample of 5K queries, but we know it is a rare problem, so that isn't a huge surprise. (What was surprising is that there was a Latin/Cyrillic/Greek triple homoglyph example in the 5K query sample... so if we had Greek homoglyphs, too, it would have gotten some results!)

Thanks. It seems like the rarity of the problem was confirmed. Knowing what we know from having done the work, and how much effort it was, do you think it was 'worth it'? It seems relatively low impact (though I know we can't easily tell), but it also seemed to be a low-effort task (1pt estimated, though I suspect it is inflated a bit by how long it took to reindex English).

The reindexing wasn't that much work, but we spent a lot more time working on homoglyphs before reindexing:

Was it worth it? I'd say yes, even if not based on the impact it had on search results.

This was something I found a good while back, and then Erik re-discovered in T222669. I had done a little work on it as a 10% project because it really annoys me that vandals (and typos) can make text readable but unsearchable, and editors can't normally tell there's a problem. But it turned out to also be a good project for Maryum as she was ramping up on all things Search Platform, because building a plugin is fairly self-contained, at least compared to lots of other stuff we have going on.
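To make the "readable but unsearchable" problem concrete, here is a minimal Python sketch. It is only an illustration, not how the homoglyph plugin itself works: the mapping table is a tiny hypothetical sample, and `fold_homoglyphs` is a made-up name for this example.

```python
# Hypothetical, deliberately tiny table of Cyrillic letters that look like Latin ones.
# A real mapping would be much larger; this one is an assumption for illustration.
CYRILLIC_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def fold_homoglyphs(token: str) -> str:
    """Replace known Cyrillic lookalikes with their Latin counterparts."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in token)


latin = "chocolate"
mixed = "ch\u043ecolate"  # looks identical, but the first "o" is Cyrillic

print(latin == mixed)                   # False: a plain match on "chocolate" misses it
print(fold_homoglyphs(mixed) == latin)  # True: after folding, the tokens agree
```

The two strings render the same on screen, which is why editors can't normally tell there's a problem, yet an exact-match search for one will not find the other.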

It also highlighted the fact that our reindex process—especially when we reindex lots of wikis at once in batch mode—is not only a bit brittle, but also not very good at alerting us to failures. These were fairly bad failures, too—as in the given analysis chain will always fail on certain input, and we got no alerts. I think this puts more priority on T219507: Create cookbook to reindex into elasticsearch / cirrus and making it smart enough to tell you what did and didn't happen.

The failures in this round of re-reindexing also highlight that Commons and Wikidata are putting real, serious strain on our infrastructure. Both are so big it may not be possible to reindex them on machines that are actively serving queries, which makes reindexing much more complex. Reindexing Wikidata may also be too much for the cloudelastic cluster even though (I think!) it is not usually under too much stress.

All of that came from making a small change that was very widespread (it affected every wiki); it stressed our infrastructure in different ways than we usually do, and it has been enlightening.

(I also think that the original expected level of work for building and deploying the plugin was worth it because it fixes an "invisible" error on the wikis.)

The consensus of the Wednesday Brain Trust is that trying again is the best short-term solution, so I'll work on that, and report back here as events unfold.

Mentioned in SAL (#wikimedia-operations) [2021-04-15T19:42:09Z] <Trey314159> reindexing commons and wikidata on elastic@eqiad (T274200)

Mentioned in SAL (#wikimedia-operations) [2021-04-15T19:43:04Z] <Trey314159> reindexing wikidata on cloudelastic (T274200)

Mentioned in SAL (#wikimedia-operations) [2021-04-15T19:56:25Z] <Trey314159> reindexing wikidata on cloudelastic finished/failed (T274200)

Welp, reindexing wikidata on cloudelastic failed and failed fast:

        Creating index...ok
                Validating number of shards...
Unexpected Elasticsearch failure.
Elasticsearch failed in an unexpected way. This is always a bug in CirrusSearch.
Error type: Elastica\Exception\ResponseException
Message: index_not_found_exception: no such index

Looks like it failed to create the new index, though there was a new index there, which I've cleaned up. Since this is cloudelastic and not production, I'll let it sit until next week.
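Since batch reindexing doesn't alert on failures like this one, here is a rough Python sketch of scanning captured reindex output for failure lines. It is an assumption-laden illustration, not part of CirrusSearch or any existing cookbook: `FAILURE_MARKERS` and `find_failures` are hypothetical names, and the marker strings are simply taken from the output above.

```python
# Hypothetical marker strings, copied from the failure output above.
# A real check would need a more complete and better-maintained list.
FAILURE_MARKERS = (
    "Unexpected Elasticsearch failure.",
    "Error type:",
)


def find_failures(log_text: str) -> list[str]:
    """Return the stripped log lines that contain a known failure marker."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


sample = r"""
        Creating index...ok
                Validating number of shards...
Unexpected Elasticsearch failure.
Elasticsearch failed in an unexpected way. This is always a bug in CirrusSearch.
Error type: Elastica\Exception\ResponseException
Message: index_not_found_exception: no such index
"""

for line in find_failures(sample):
    print(line)
```

Running something like this over the per-wiki output of a batch run would at least list which wikis hit an error, instead of relying on someone reading the logs by hand.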

Mentioned in SAL (#wikimedia-operations) [2021-04-16T20:40:33Z] <Trey314159> reindexing wikidata on cloudelastic... AGAIN (T274200)

Mentioned in SAL (#wikimedia-operations) [2021-04-19T22:37:39Z] <Trey314159> reindexing commons and wikidata on elastic@eqiad finished/failed (T274200)

Mentioned in SAL (#wikimedia-operations) [2021-04-19T22:37:47Z] <Trey314159> reindexing wikidata on cloudelastic finished/failed (T274200)

I'm going to stop trying to get the last few indexes done on this ticket, since Italian is done and all of the English-specific wikis are done. I'll open a new sub-ticket with specific info on what still needs to be done.