
Difftesting between staging and production
Closed, Resolved, Public

Authored By: Jgiannelos, Jan 22 2025, 7:37 PM
Referenced Files
F58355476: image.png (Feb 4 2025, 2:35 PM)
F58355469: image.png (Feb 4 2025, 2:34 PM)
F58355280: image.png (Feb 4 2025, 2:02 PM)
F58355223: image.png (Feb 4 2025, 1:53 PM)
F58317361: image.png (Jan 30 2025, 4:20 PM)

Description

Since most of the issues have been addressed on staging, and casual browsing doesn't reveal any rendering issues, we need to run difftesting between the current prod and staging to measure how many inconsistencies exist between the old and the new version.

Event Timeline

  1. I created a dataset of Kartotherian URLs by parsing Wikipedia articles that contain Kartotherian references, from {en,de,fa,zh,ja,ru}wiki.
  2. Shuffled and took a stratified sample of 100 articles from each wiki (600 articles in total).
  3. Fetched the Kartotherian snapshot URLs from the current prod (from the maps nodes) and from staging.
  4. Calculated the SSIM (structural similarity index) between the two versions.
  5. Exported the diff images to be able to inspect what's happening.
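The sampling and comparison steps above can be sketched roughly as follows. This is a minimal illustration, not the script used for these runs: the `(wiki, url)` record layout and the helpers `stratified_sample` and `global_ssim` are hypothetical. `global_ssim` also simplifies SSIM to a single window over the whole grayscale image, whereas common implementations (for example scikit-image's `structural_similarity`) average SSIM over local sliding windows. The constants use the usual K1 = 0.01, K2 = 0.03.

```python
import random
from collections import defaultdict

import numpy as np


def stratified_sample(records, per_stratum=100, seed=0):
    """Hypothetical helper: shuffle and take up to `per_stratum` URLs per wiki.

    `records` is an iterable of (wiki, url) pairs (assumed layout).
    """
    by_wiki = defaultdict(list)
    for wiki, url in records:
        by_wiki[wiki].append(url)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for wiki, urls in sorted(by_wiki.items()):
        k = min(per_stratum, len(urls))
        # random.sample draws without replacement, in shuffled order
        sample.extend((wiki, url) for url in rng.sample(urls, k))
    return sample


def global_ssim(img_a, img_b, data_range=255.0):
    """Hypothetical helper: single-window SSIM of two same-size grayscale images.

    Returns 1.0 for identical images; lower values mean less similar.
    """
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov_ab = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov_ab + c2)
    den = (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)
    return float(num / den)
```

With 250 candidate URLs for each of the six wikis, `stratified_sample` returns 600 pairs, 100 per wiki, which matches the sample size described above.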

From a quick look at the results:

results["ssim"].quantile([0, 0.25, 0.5, 0.75, 1.0])

| quantile | ssim |
|---------:|---------:|
| 0 | 0.429146 |
| 0.25 | 0.938396 |
| 0.5 | 0.977298 |
| 0.75 | 0.998523 |
| 1 | 1 |

This means that we do have some inconsistencies between staging and prod, but at a high level 75% of the sample has an SSIM above 0.93. Beyond similarity, this test is also a very good smoke test for spotting breakage in the upgrade/migration, and things look OK overall: no errors were raised, and apart from some transient issues most requests returned 200 status codes.
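The smoke-test side of the run amounts to tallying the HTTP status codes returned by each backend alongside the SSIM scores. A minimal sketch, assuming each result is a dict with hypothetical `status_a`, `status_b` and `ssim` keys (not taken from the original script), and a hypothetical `summarize` helper:

```python
from collections import Counter


def summarize(results, ssim_threshold=0.93):
    """Hypothetical helper: status codes per backend and share of similar images.

    Each result is a dict with `status_a`, `status_b` and `ssim` (assumed keys).
    """
    status_a = Counter(r["status_a"] for r in results)
    status_b = Counter(r["status_b"] for r in results)
    # Only score pairs where both backends answered 200 OK.
    scored = [r["ssim"] for r in results
              if r["status_a"] == 200 and r["status_b"] == 200]
    share_above = (sum(s > ssim_threshold for s in scored) / len(scored)
                   if scored else float("nan"))
    return {"status_a": status_a, "status_b": status_b,
            "share_above_threshold": share_above}
```

With the quantiles above, where the 0.25 quantile is about 0.938, roughly three quarters of the scored pairs would clear a 0.93 threshold.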

Change #1113816 had a related patch set uploaded (by Elukey; author: Elukey):

[mediawiki/services/kartotherian@master] blubber: add fonts-noto and fonts dejavu to the prod variant

https://gerrit.wikimedia.org/r/1113816

Results from a similar run, this time A/B testing staging (eqiad) against prod (eqiad), since the previous test targeted prod in codfw:

|    quantile  |     ssim |
|-----:|---------:|
| 0    | 0.799977 |
| 0.25 | 0.969635 |
| 0.5  | 0.993923 |
| 0.75 | 0.999367 |
| 1    | 1        |

Change #1113816 merged by jenkins-bot:

[mediawiki/services/kartotherian@master] blubber: add fonts-noto and fonts dejavu to the prod variant

https://gerrit.wikimedia.org/r/1113816

Change #1113835 had a related patch set uploaded (by Elukey; author: Elukey):

[mediawiki/services/kartotherian@master] blubber: add more fonts packages to close the gap with prod

https://gerrit.wikimedia.org/r/1113835

Change #1113835 abandoned by Elukey:

[mediawiki/services/kartotherian@master] blubber: add more fonts packages to close the gap with prod

https://gerrit.wikimedia.org/r/1113835

Change #1113842 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] services: bump kartotherian's Docker image

https://gerrit.wikimedia.org/r/1113842

Change #1113842 merged by Elukey:

[operations/deployment-charts@master] services: bump kartotherian's Docker image

https://gerrit.wikimedia.org/r/1113842

Change #1114420 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] services: update Kartotherian's docker image

https://gerrit.wikimedia.org/r/1114420

Change #1114420 merged by Elukey:

[operations/deployment-charts@master] services: update Kartotherian's docker image

https://gerrit.wikimedia.org/r/1114420

Latest diff test run:

| quantile | ssim |
|---------:|---------:|
| 0.25 | 0.99175 |
| 0.5 | 0.998551 |
| 0.75 | 1 |
| 0.9 | 1 |
| 0.95 | 1 |
| 0.99 | 1 |

Looks much better after the latest patches.

Change #1115049 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] services: set the Tegola's cluster local endpoint for Kartotherian

https://gerrit.wikimedia.org/r/1115049

Change #1115049 merged by Elukey:

[operations/deployment-charts@master] services: set the Tegola's cluster local endpoint for Kartotherian

https://gerrit.wikimedia.org/r/1115049

Latest difftesting run after fixing localization:

| quantile | ssim |
|---------:|---------:|
| 0.1 | 0.983166 |
| 0.2 | 0.992429 |
| 0.25 | 0.993921 |
| 0.5 | 0.998057 |
| 0.75 | 0.999939 |
| 0.9 | 1 |
| 0.95 | 1 |
| 0.99 | 1 |

@elukey I double-checked the results and the issue is fixed. I am looking at the diffs that still show some inconsistencies, but so far it's nothing to worry about.

Difftesting between current prod (bare metal) and k8s prod deployment (eqiad):

| quantile | ssim |
|---------:|---------:|
| 0.1 | 0.990876 |
| 0.2 | 0.994956 |
| 0.25 | 0.995939 |
| 0.5 | 0.999625 |
| 0.75 | 1 |
| 0.9 | 1 |
| 0.95 | 1 |
| 0.99 | 1 |

Quick back-of-the-napkin comparison of response latency between A and B:

A: kartotherian prod (maps1009)
B: kartotherian prod k8s (wikikube worker)

| quantile | Percentage diff of latency between A and B (%) |
|---------:|---------:|
| 0.1 | -29.5699 |
| 0.2 | -20.8101 |
| 0.25 | -19.2296 |
| 0.5 | -5.18895 |
| 0.75 | 6.43051 |
| 0.9 | 16.1314 |
| 0.95 | 26.5609 |
| 0.99 | 127.555 |
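The thread doesn't state the exact formula behind these percentages. A minimal sketch, assuming the per-URL change is measured relative to A as 100 × (B − A) / A, so negative values mean B (k8s) answered faster; the `pct_latency_diff` and `quantile` helpers are hypothetical, with `quantile` using linear interpolation between closest ranks, which is also pandas' default for `Series.quantile`:

```python
def pct_latency_diff(elapsed_a, elapsed_b):
    """Hypothetical helper: latency change of B relative to A, in percent.

    Assumed formula 100 * (B - A) / A; negative means B was faster.
    """
    return [100.0 * (b - a) / a for a, b in zip(elapsed_a, elapsed_b)]


def quantile(values, q):
    """Hypothetical helper: q-quantile with linear interpolation (pandas' default)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)          # fractional rank of the quantile
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)
```

For example, A latencies `[2.0, 4.0, 1.0, 1.0]` and B latencies `[1.0, 5.0, 1.0, 2.0]` give `[-50.0, 25.0, 0.0, 100.0]`, whose median under this interpolation is 12.5.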

Change #1115420 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] services: bump kartotherian's allowed millicores to 5k

https://gerrit.wikimedia.org/r/1115420

Change #1115420 merged by Elukey:

[operations/deployment-charts@master] services: bump kartotherian's allowed millicores to 5k

https://gerrit.wikimedia.org/r/1115420

After some back and forth with @elukey and increasing the CPU resources in the kartotherian deployment charts, here are some numbers that are a bit more useful:

results["diff_latency_ms"] = 1000 * (results["elapsed_b"] - results["elapsed_a"])
quantiles = results.diff_latency.quantile([0.1, 0.2, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99])
print(quantiles.to_markdown())

A: kartotherian in current bare metal prod
B: kartotherian in prod k8s pod

| quantile | Difference in latency (ms) |
|---------:|---------:|
| 0.1 | -619.083 |
| 0.2 | -343.267 |
| 0.25 | -263.88 |
| 0.5 | -52.129 |
| 0.75 | 49.224 |
| 0.9 | 135.147 |
| 0.95 | 195.49 |
| 0.99 | 680.492 |

The tests run ~1000 Kartographer URLs with a concurrency of 16 requests in parallel.
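A request loop with bounded concurrency like the one described can be sketched with a thread pool. This is an assumption about how such a harness might look, not the script used here: `default_fetch`, `time_fetch` and `run_ab` are hypothetical names, the default fetcher uses `urllib.request.urlopen`, and `elapsed` is wall-clock seconds, consistent with the ×1000 conversion to milliseconds in the snippet above.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def default_fetch(url, timeout=30):
    """Hypothetical fetcher returning (HTTP status, body bytes).

    Note: urlopen raises HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def time_fetch(url, fetch=default_fetch):
    """Fetch one URL and measure its wall-clock latency in seconds."""
    start = time.perf_counter()
    status, body = fetch(url)
    return {"url": url, "status": status, "body": body,
            "elapsed": time.perf_counter() - start}


def run_ab(paths, base_a, base_b, fetch=default_fetch, concurrency=16):
    """Fetch every path from both backends, at most `concurrency` requests at once."""
    urls = [base + path for path in paths for base in (base_a, base_b)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        timed = list(pool.map(lambda u: time_fetch(u, fetch), urls))
    # pool.map preserves input order, so results alternate A, B, A, B, ...
    return [
        {"path": path, "elapsed_a": a["elapsed"], "elapsed_b": b["elapsed"],
         "status_a": a["status"], "status_b": b["status"]}
        for path, a, b in zip(paths, timed[0::2], timed[1::2])
    ]
```

Passing a stub `fetch` makes the harness easy to exercise without touching real backends.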

Also, here is a histogram of the difference in latency:

[image.png: histogram of the difference in latency]
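A histogram of the per-URL latency differences can be computed with `numpy.histogram`. The 50 ms bin width and the `latency_histogram` helper are hypothetical choices for illustration, not what the original plot used:

```python
import numpy as np


def latency_histogram(diff_ms, bin_width=50.0):
    """Hypothetical helper: bucket B - A latency differences (ms) into fixed bins.

    Returns (counts, edges); bins left of 0 hold requests where B was faster.
    """
    diffs = np.asarray(diff_ms, dtype=np.float64)
    # Align bin edges to multiples of bin_width that cover all values.
    lo = np.floor(diffs.min() / bin_width) * bin_width
    hi = np.ceil(diffs.max() / bin_width) * bin_width
    if hi == lo:  # all differences identical: use a single bin
        hi = lo + bin_width
    edges = np.arange(lo, hi + bin_width, bin_width)
    counts, edges = np.histogram(diffs, bins=edges)
    return counts, edges
```

The resulting `counts` and `edges` can then be handed to any plotting library.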

After testing the outliers at the upper end, it looks like something is wrong with geoshapes rendering (a timeout?). Given how problematic geoshapes have historically been, this doesn't look like something to worry about.
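Pulling out the upper-tail outliers for manual inspection can be sketched as selecting the rows where B was slowest relative to A and flagging those that ran into the client timeout. The row layout, the `slowest_on_b` helper and the 30-second timeout are hypothetical assumptions for illustration:

```python
import heapq


def slowest_on_b(rows, top_fraction=0.01, timeout_s=30.0):
    """Hypothetical helper: rows where B was slowest relative to A.

    Each row is a dict with `url`, `elapsed_a` and `elapsed_b` in seconds
    (assumed layout). Rows where B took at least `timeout_s` are flagged.
    """
    k = max(1, round(top_fraction * len(rows)))
    worst = heapq.nlargest(k, rows, key=lambda r: r["elapsed_b"] - r["elapsed_a"])
    return [
        {"url": r["url"],
         "diff_ms": 1000 * (r["elapsed_b"] - r["elapsed_a"]),
         "likely_timeout": r["elapsed_b"] >= timeout_s}
        for r in worst
    ]
```

`heapq.nlargest` returns the selected rows largest first, so the most extreme slowdowns come first in the review list.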

Change #1116815 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/services/kartotherian@master] Make outgoing requests service mesh aware

https://gerrit.wikimedia.org/r/1116815

Change #1116815 merged by jenkins-bot:

[mediawiki/services/kartotherian@master] Make outgoing requests service mesh aware

https://gerrit.wikimedia.org/r/1116815

Change #1116833 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] kartotherian: update Docker image and geoshapes yaml config

https://gerrit.wikimedia.org/r/1116833

Change #1116833 merged by Elukey:

[operations/deployment-charts@master] kartotherian: update Docker image and geoshapes yaml config

https://gerrit.wikimedia.org/r/1116833

Change #1116881 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/services/kartotherian@master] Fix handling of getJSON with service mesh endpoints

https://gerrit.wikimedia.org/r/1116881

Change #1116881 merged by jenkins-bot:

[mediawiki/services/kartotherian@master] Fix handling of getJSON with service mesh endpoints

https://gerrit.wikimedia.org/r/1116881

A: kartotherian in current bare metal prod
B: kartotherian in prod k8s pod

Latest difftesting run after fixing the hanging geoshapes connections:

| quantile | ssim |
|---------:|---------:|
| 0.05 | 0.987757 |
| 0.1 | 0.993571 |
| 0.2 | 0.997578 |
| 0.25 | 0.998374 |
| 0.5 | 1 |
| 0.75 | 1 |
| 0.9 | 1 |
| 0.95 | 1 |
| 0.99 | 1 |

From a quick look at the diffs on the low-similarity end, the differences are mostly due to fonts: the k8s deployment uses new fonts, which are an improvement overall.
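Picking the lowest-similarity pairs for visual review can be sketched in the same pandas style as the snippets in this task. The `ssim` column name matches the earlier snippet; the `below_quantile` helper is hypothetical, and any other columns (such as a URL or the path of the exported diff image) are assumed to be carried along:

```python
import pandas as pd


def below_quantile(results: pd.DataFrame, q: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: rows with SSIM at or below the q-quantile.

    Sorted least similar first, so the worst diffs are reviewed first.
    """
    cutoff = results["ssim"].quantile(q)  # linear interpolation by default
    return results[results["ssim"] <= cutoff].sort_values("ssim")
```

With `q=0.05`, this selects roughly the slice summarized by the lowest row of the table above.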

Regarding latency:

results["diff_latency"] = 1000 * (results["elapsed_b"] - results["elapsed_a"])
quantiles = results.diff_latency.quantile([0.1, 0.2, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99])
print(quantiles.to_markdown())
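| quantile | diff_latency (ms) |
|---------:|---------:|
| 0.1 | -184.014 |
| 0.2 | -86.6914 |
| 0.25 | -65.8558 |
| 0.5 | 9.0725 |
| 0.75 | 73.6645 |
| 0.9 | 137.611 |
| 0.95 | 189.539 |
| 0.99 | 710.617 |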

And here is the histogram of the latency change.

[image.png: histogram of the latency change]

Here are the latency quantiles (in ms) for each A/B test run:

[image.png: latency quantiles (ms) for each A/B test run]

So overall, there is no large difference in latency in the k8s deployment.

I think it's pretty safe to continue with the migration. Closing this ticket for now; we can run the tests again in the future if needed.

Jgiannelos claimed this task.
Jgiannelos attached a referenced file: F58317361: image.png.