switchdc cache warmup should include URLs that warmup relevant Wikidata caches
Closed, Resolved · Public

Description

s8 was struggling today after the DC switchover, and it was suggested that we add some URLs to the cache warmup for Wikidata.

(log trimmed)

07:53:09 <_joe_> marostegui: do we have dbs suffering?
07:53:15 <@marostegui> yes, s8
07:54:08 <Amir1> (once this is all done): marostegui let's talk about if there is anything on wikibase side is needed
07:56:06 <jynus> is it Wikibase\Lib\Store\Sql\Terms\DatabaseTermInLangIdsResolver::selectTermsViaJoin ?
07:56:34 <Amir1> that's termstore replacement
07:56:42 <Amir1> I think because cache is cold
07:56:48 <Amir1> (memcached for term store)
07:56:52 <jynus> mine it is just a guess because it is the only query I am seeing
07:57:04 <jynus> on the heavy hit servers
07:57:34 <Amir1> maybe for the next dc switch we should "warm it up" beforehand
07:57:41 <_joe_> Amir1: yes definitely
07:57:50 <jynus> I see a few fetchterms too
07:57:51 <@legoktm> we can add some Wikidata URLs to the warmup
07:58:03 <@marostegui> legoktm: that'd be nice indeed
07:58:20 <Amir1> legoktm: it's a common misconception: most of reads on s8 are not coming from wikidata.org
07:58:30 <Amir1> this needs some parsing
07:59:17 <@legoktm> Amir1: "URLs that warmup the relevant Wikidata paths :)"
07:59:25 <Amir1> better :D

The URLs that are currently hit as part of the warmup process are located at https://gerrit.wikimedia.org/g/operations/puppet/+/545b517cea678d8fdeadeb6051e0f3757bd4ebff/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/

Event Timeline

Restricted Application added a subscriber: Aklapper.

https://grafana.wikimedia.org/d/000000548/wikibase-sql-term-storage?orgId=1&from=1631626432679&to=1631638695040

It is a bit scary that the cache hit rate dropping to 94% could bring down s8, but I guess there is no way around it.
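To put the 94% figure in perspective: database load follows the miss rate rather than the hit rate, so a few points of hit rate can multiply the number of backend reads. The sketch below is illustrative arithmetic only. The 99% baseline is a hypothetical value chosen for the example, not a number read off the dashboard linked above.

```python
def miss_load_multiplier(baseline_hit_rate: float, degraded_hit_rate: float) -> float:
    """Return how many times more cache misses occur at the degraded hit rate.

    Every miss falls through to the database, so this ratio approximates the
    increase in backend reads for a constant request rate.
    """
    baseline_miss = 1.0 - baseline_hit_rate
    degraded_miss = 1.0 - degraded_hit_rate
    if baseline_miss <= 0:
        raise ValueError("baseline hit rate must be below 1.0")
    return degraded_miss / baseline_miss


# ASSUMPTION: 0.99 is a made-up baseline; 0.94 is the figure from the comment.
multiplier = miss_load_multiplier(0.99, 0.94)
print(f"about {multiplier:.0f}x more cache misses reaching the database")
```

Under that assumed baseline, the drop to 94% means roughly six times as many reads falling through to s8, which is consistent with a modest-looking dip causing noticeable load.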

Correct me if I am wrong, but this would also be solved by this cache using WANCache rather than just a BagOStuff?

Marostegui triaged this task as Medium priority (Sep 15 2021, 1:51 PM).
Marostegui moved this task from Triage to Blocked on the DBA board.

Just to mention that this is indeed a nice-to-have, but not something that brought us down or anything close to that: it just required some load switching on the load balancer to help latency improve.
We are now in a much better position than we were a few switchovers ago, as we have added (in the last couple of years) additional capacity on our s8 (wikidata) section.

I don't know how much work this requires to make it happen but it is certainly something that would be a benefit especially as wikidata keeps growing and growing (yay!).

> Correct me if I am wrong, but this would also be solved by this cache using WANCache rather than just a BagOStuff?

My understanding is that WANCache only sets values in the current DC, but invalidation is relayed to both DCs. Main stash would set the values across both DCs, see https://www.mediawiki.org/wiki/Manual:Object_cache#Main_stash

> My understanding is that WANCache only sets values in the current DC, but invalidation is relayed to both DCs. Main stash would set the values across both DCs, see https://www.mediawiki.org/wiki/Manual:Object_cache#Main_stash

Right, and reading that we probably don't want to use Main stash for this.
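The behaviour described above (values set only in the local DC, invalidations relayed to every DC) can be sketched with a toy model. This is not the WANObjectCache API: the class, its method names, and the cache key are all hypothetical and exist only to show why such a cache would still be cold in the other DC after a switchover.

```python
class ToyPerDcCache:
    """Toy model: writes stay in one DC, deletes are broadcast to all DCs."""

    def __init__(self, datacenters):
        # One independent key/value store per datacenter.
        self.stores = {dc: {} for dc in datacenters}

    def set(self, dc, key, value):
        # A set only populates the store of the DC that handled the request.
        self.stores[dc][key] = value

    def get(self, dc, key):
        # Returns None on a miss, which would mean a database read.
        return self.stores[dc].get(key)

    def delete(self, key):
        # An invalidation is relayed to every DC.
        for store in self.stores.values():
            store.pop(key, None)


cache = ToyPerDcCache(["eqiad", "codfw"])
cache.set("eqiad", "hypothetical-term-key", "cached terms")

print(cache.get("eqiad", "hypothetical-term-key"))  # warm in the old primary
print(cache.get("codfw", "hypothetical-term-key"))  # cold in the new primary

cache.delete("hypothetical-term-key")
print(cache.get("eqiad", "hypothetical-term-key"))  # gone everywhere
```

In this model, moving traffic from eqiad to codfw turns every previously warm key into a miss, so switching the cache type alone would not remove the need for a warmup.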

Change 737498 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] [WIP] mediawiki-cache-warmup: Add support for POST requests

https://gerrit.wikimedia.org/r/737498

Hi @karapayneWMDE @Addshore and @Lydia_Pintscher. Can your team take a look at this? I made a non-working PoC for it already.

> I made a non-working PoC for it already.

Where? :)

> > I made a non-working PoC for it already.
>
> Where? :)

The patch attached to this ticket I guess ;)

@RLazarus You were working on rewriting the app server warmup script in Python, I suppose this is directly related.

Considering we are in a multi-DC state now, the impact on switchover should be smaller, but @Marostegui @Ladsgroup do you think work needs to happen on this before T327920: March 2023 Datacenter Switchover?

I personally don't think it's needed anymore due to multidc, but Manuel might disagree.

From the DB point of view, we don't really need this anymore.

Agree we don't need it in order to switch the RW site to codfw.

Are we also planning to depool eqiad for reads, and serve fully from codfw for any of the switchover period? If so, will we need cache warming in eqiad before repooling?

> Are we also planning to depool eqiad for reads, and serve fully from codfw for any of the switchover period? If so, will we need cache warming in eqiad before repooling?

According to https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Overall_switchover_flow we'll leave eqiad completely depooled at the traffic level for 1 week, followed by 7 weeks of multi-DC with eqiad as the RO datacenter.

If we are going to leave eqiad fully depooled for a week...can we do the switches operation during that time or leave eqiad depooled for longer and fit them there? @ayounsi thoughts?

It's technically possible but a more aggressive schedule would require the buy-in from everybody (including services that only live in eqiad like analytics).
We can see it as "ripping the band-aid" vs. going carefully.

In that case it sounds like yes, we do need cache warming in eqiad before repooling it -- and we'll need to add URLs to warm up s8, per this task.

The Python rewrite doesn't have to be related -- for now, the new script will read URLs from the same files as the old script (urls-cluster.txt and urls-server.txt) so feel free to add entries there without waiting for me. That might change, but if so I'll coordinate, so in the meantime don't worry about it.

The URLs that would cause reads on s8 are mostly POSTs; warming up s8 doesn't mean reading wikidata.org here. It means reparsing Hubble Space Telescope on English Wikipedia and reparsing several categories on Commons. So the warmup script needs to be able to make POST requests. That's the complicating factor; otherwise I would have done it last time.

POSTs will probably have to wait for the Python rewrite, but then they'll be easy. Can you recommend specific requests?

Yeah, rewrite in python sounds like a great idea. Stuff like:

(POST so they would actually reparse the page). This should probably happen in parallel on every appserver to warm up the APCu cache too.
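As a rough illustration of the idea (one POST per page, repeated against every appserver in parallel), here is a minimal Python sketch. It is not the actual warmup script, and the requests originally listed in this comment are not reproduced. The assumptions are: the MediaWiki `action=purge` API module with `forcelinkupdate` is used as a stand-in for "a POST that reparses the page", the server names are placeholders, and the Commons category title is only an example. The function names `build_purge_request` and `warm_up` are hypothetical.

```python
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_purge_request(server, wiki_host, title):
    """Build (but do not send) a POST to api.php asking for a purge of `title`.

    ASSUMPTION: action=purge with forcelinkupdate is used here as one way to
    trigger a reparse; the real warmup may use different requests.
    """
    body = urllib.parse.urlencode({
        "action": "purge",
        "titles": title,
        "forcelinkupdate": "1",
        "format": "json",
    }).encode("ascii")
    return urllib.request.Request(
        url=f"https://{server}/w/api.php",
        data=body,
        headers={"Host": wiki_host},  # address one appserver, name the wiki
        method="POST",
    )


def warm_up(servers, targets, send, max_workers=8):
    """Send one request per (server, page) pair in parallel.

    `send` is injected so the sketch can be exercised without network access.
    """
    requests = [
        build_purge_request(server, wiki_host, title)
        for server in servers
        for wiki_host, title in targets
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, requests))


# Placeholder server names and example pages; a dry run that only describes
# the requests instead of sending them.
servers = ["appserver-1.example", "appserver-2.example"]
targets = [
    ("en.wikipedia.org", "Hubble Space Telescope"),
    ("commons.wikimedia.org", "Category:Hubble Space Telescope"),
]
for line in warm_up(servers, targets, send=lambda r: f"{r.get_method()} {r.full_url}"):
    print(line)
```

A real run would pass a `send` function that actually performs the request (for example one built around `urllib.request.urlopen`) and handles timeouts, TLS, and errors. Hitting every appserver individually, rather than the load-balanced endpoint, is what lets each server populate its own APCu cache.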

Does Hubble use a lot of Wikidata data? Its infobox wikitext is fairly long, and South Pole Telescope is our more common Wikidata showcase page, where the infobox in wikitext is just {{Infobox telescope|refs=yes|suppressfields=namedafter}}. (The Commons category for Hubble probably uses Wikidata more, though.)

Change 892569 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/puppet@production] mediawiki-cache-warmup: Rename `Request` to `Task`

https://gerrit.wikimedia.org/r/892569

Change 892570 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/puppet@production] mediawiki-cache-warmup: Add POSTs

https://gerrit.wikimedia.org/r/892570

Sigh, this is not very useful:

mysql:research@s1-analytics-replica.eqiad.wmnet [enwiki]> select page_title, page_namespace, count(*) from wbc_entity_usage join page on eu_page_id = page_id where page_namespace = 0 group by eu_page_id order by count(*) desc limit 50;
+----------------------------------------------------------------+----------------+----------+
| page_title                                                     | page_namespace | count(*) |
+----------------------------------------------------------------+----------------+----------+
| List_of_lighthouses_in_China                                   |              0 |     2209 |
| List_of_dams_in_Hokkaido                                       |              0 |     2075 |
| List_of_lighthouses_in_Scotland                                |              0 |     1666 |
| List_of_Polish_mathematicians                                  |              0 |     1363 |
| List_of_crossings_of_the_River_Thames                          |              0 |     1324 |
| List_of_learned_societies                                      |              0 |     1287 |
| Ying_Fan_Reinfelder                                            |              0 |     1260 |
| List_of_lighthouses_in_Argentina                               |              0 |     1216 |
| Lidia_Morawska                                                 |              0 |     1007 |
| Listed_buildings_at_the_University_of_Leeds                    |              0 |      921 |
| Lightvessel_stations_of_Great_Britain                          |              0 |      816 |
| List_of_learned_societies_in_the_United_Kingdom                |              0 |      774 |
| List_of_dams_in_Akita_Prefecture                               |              0 |      742 |
| List_of_parks_and_open_spaces_in_the_London_Borough_of_Croydon |              0 |      731 |
| List_of_dams_in_Yamaguchi_Prefecture                           |              0 |      706 |
| List_of_dams_in_Toyama_Prefecture                              |              0 |      696 |
| List_of_dams_in_Yamagata_Prefecture                            |              0 |      673 |
| List_of_dams_in_Hyogo_Prefecture                               |              0 |      652 |
| List_of_dams_in_Gifu_Prefecture                                |              0 |      638 |
| List_of_monuments_in_Meknes                                    |              0 |      629 |
| List_of_lighthouses_in_Madagascar                              |              0 |      617 |
| Catenin_beta-1                                                 |              0 |      614 |
| List_of_monuments_in_Rabat,_Morocco                            |              0 |      591 |
| List_of_dams_in_Hiroshima_Prefecture                           |              0 |      565 |
| List_of_dams_in_Chiba_Prefecture                               |              0 |      562 |
| List_of_lightvessels_of_Great_Britain                          |              0 |      557 |
| List_of_dams_in_Nagano_Prefecture                              |              0 |      547 |
| List_of_dams_in_Kagawa_Prefecture                              |              0 |      539 |
| TGF_beta_1                                                     |              0 |      533 |
| List_of_lighthouses_in_Japan                                   |              0 |      521 |
| Bone_morphogenetic_protein_4                                   |              0 |      507 |
| List_of_mountain_peaks_by_prominence                           |              0 |      503 |
| List_of_dams_in_Fukuoka_Prefecture                             |              0 |      499 |
| Notch_1                                                        |              0 |      491 |
| Sonic_hedgehog_protein                                         |              0 |      487 |
| P53                                                            |              0 |      483 |
| Protein_Wnt-5a                                                 |              0 |      483 |
| Dopamine_receptor_D2                                           |              0 |      481 |
| List_of_dams_in_Shimane_Prefecture                             |              0 |      480 |
| AKT1                                                           |              0 |      475 |
| List_of_learned_societies_in_the_United_States                 |              0 |      468 |
| List_of_crossings_of_the_Danube                                |              0 |      467 |
| List_of_monuments_in_Marrakesh                                 |              0 |      461 |
| List_of_dams_in_Fukushima_Prefecture                           |              0 |      459 |
| Proto-oncogene_tyrosine-protein_kinase_Src                     |              0 |      457 |
| Epidermal_growth_factor_receptor                               |              0 |      454 |
| List_of_historic_places_in_the_Edmonton_Capital_Region         |              0 |      451 |
| CALM2                                                          |              0 |      450 |
| HLA-B                                                          |              0 |      439 |
| List_of_dams_in_Iwate_Prefecture                               |              0 |      437 |
+----------------------------------------------------------------+----------------+----------+
50 rows in set (2 min 59.544 sec)

Probably we should pick something that has the highest number of properties.

Change 892569 merged by RLazarus:

[operations/puppet@production] mediawiki-cache-warmup: Rename `Request` to `Task`

https://gerrit.wikimedia.org/r/892569

Change 892570 merged by RLazarus:

[operations/puppet@production] mediawiki-cache-warmup: Add POSTs

https://gerrit.wikimedia.org/r/892570

We can't really test this without draining a DC, but we'll find out at the next switchover whether this is still a problem, and if so we can open a new task. Resolving with bold optimism in the meantime.

Change 737498 abandoned by Ladsgroup:

[operations/puppet@production] [WIP] mediawiki-cache-warmup: Add support for POST requests

Reason:

Superseded by better patches

https://gerrit.wikimedia.org/r/737498