
Cirrus-streaming-updater test: validate relforge indices are correctly updated
Closed, Resolved (Public)

Description

Per a 2023/10/26 conversation with @dcausse, @pfischer, and @EBernhardson:

There is a script called compare-clusters.py in the mediawiki-extensions-CirrusSearch repository. We could potentially use this script to validate the data created by the test streaming updater.
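
As a rough illustration of what such a cross-cluster check involves (this is not the actual compare-clusters.py implementation), a per-page comparison might look like the sketch below. The cluster URLs and index name are placeholders, not the real configuration:

```python
# Hypothetical sketch of a per-page comparison between two clusters.
# Cluster URLs and the index name are placeholders, not real configuration.
from elasticsearch import Elasticsearch, NotFoundError

prod = Elasticsearch("https://production-search.example:9243")  # placeholder
relforge = Elasticsearch("https://relforge.example:9243")       # placeholder
index = "enwiki_content"                                        # placeholder

def mismatched_fields(page_id: str) -> list[str]:
    """Fetch the same document from both clusters and return the
    top-level _source fields whose values differ."""
    sources = []
    for client in (prod, relforge):
        try:
            sources.append(client.get(index=index, id=page_id)["_source"])
        except NotFoundError:
            sources.append(None)
    prod_src, relforge_src = sources
    if prod_src is None or relforge_src is None:
        return ["<document missing from one cluster>"]
    fields = set(prod_src) | set(relforge_src)
    return sorted(f for f in fields if prod_src.get(f) != relforge_src.get(f))
```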

Creating this ticket to:

  • Validate that the relforge indices are correctly updated.

Details

Other Assignee
RKemper

Event Timeline

Per a pairing session today, the above script needs a small amount of work to fetch the entire document. We'd use it to compare the pages flowing into relforge (via the Cirrus Streaming Updater test) against the Elasticsearch documents in production. We (as in the SREs) should be able to handle this, so I'm moving this to our column.
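
For context, "fetching the entire document" here means pulling the full _source for each page rather than a subset of fields. A minimal sketch of bulk-fetching complete documents with elasticsearch-py's standard scroll helper (client, index name, and page IDs are placeholder assumptions) might look like:

```python
# Sketch of bulk-fetching complete documents (full _source) for comparison.
# The client, index name, and page IDs are placeholders/assumptions.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

def fetch_full_documents(client: Elasticsearch, index: str, page_ids: list[str]):
    """Yield (doc id, full _source) for the requested pages."""
    query = {"query": {"ids": {"values": page_ids}}}
    for hit in scan(client, index=index, query=query, _source=True):
        yield hit["_id"], hit["_source"]
```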

Gehel triaged this task as High priority. Nov 3 2023, 10:29 AM
Gehel moved this task from Incoming to Ready for Work on the Data-Platform-SRE board.
bking moved this task from Ready for Work to In Progress on the Data-Platform-SRE board.
bking updated Other Assignee, added: RKemper.

@EBernhardson @dcausse Based on chatter in Wikimedia-Search, it seems like we're already past this point, i.e. we're already running these same tests in cloudelastic? Let us know if we still need to do this, and feel free to close the ticket if not.

I've run this a few times; it claims the indices in relforge match the ones in production. I'm still a bit suspicious that it passed on the first try, so we could perhaps try harder to find something broken. But we've done the testing, and what we have so far claims to work.

@pfischer I wanted to ask you, since Erik is out for the holidays: what is your level of confidence that the indices are created correctly? In other words, can we close this one, or do we need to leave it open so we can continue testing?

Per a conversation with @pfischer yesterday, here are the latest numbers. The mismatches have decreased by an order of magnitude since the last run.

I did have a few questions that didn't seem to be answered by the script output:

  • What is the target amount of consistency, and
  • How can we measure it?

For example, we might want more detail about revision mismatches. Which one is newer? How many revisions behind is the trailing index, and how does that compare to the rate of edits on the article?

Apologies if this is already being discussed, just a few thoughts I had from reading the script output.
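
To make those questions concrete, a breakdown along the lines below could be added to the script output. This is only a sketch: it assumes the mismatched document pairs have already been collected, and that each document carries a numeric revision field (called "version" here), which is an assumption rather than a confirmed field name.

```python
# Illustrative sketch: classify revision mismatches by which cluster trails
# and by how far. The "version" field name is an assumption.
from collections import Counter

def classify_revision_lag(mismatches: list[tuple[dict, dict]]) -> Counter:
    """Given (prod _source, relforge _source) pairs, bucket them by lag."""
    buckets: Counter = Counter()
    for prod_src, relforge_src in mismatches:
        prod_rev = prod_src.get("version", 0)        # assumed field name
        relforge_rev = relforge_src.get("version", 0)
        lag = prod_rev - relforge_rev
        if lag == 0:
            buckets["same revision, other fields differ"] += 1
        elif lag > 0:
            buckets[f"relforge behind by {lag} revision(s)"] += 1
        else:
            buckets[f"production behind by {-lag} revision(s)"] += 1
    return buckets
```

Comparing those lag counts against per-article edit rates would then address the "rate of edits" part of the question.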

Gehel removed bking as the assignee of this task. Dec 20 2023, 2:35 PM
Gehel assigned this task to pfischer.
Gehel moved this task from Incoming to In Progress on the Discovery-Search (Current work) board.

We can also run the Saneitizer on Cloudelastic to see how much it diverges from the other clusters (this will be tracked in a separate ticket).

We followed this data over time and it seemed to stay in line. We've now progressed from relforge to a cloudelastic deployment, so we can probably consider this complete.