
Some PostgreSQL replicas are not fully updated
Closed, Resolved, Public

Description

Since approximately mid-November, Mapframe maps on enwiki have not always displayed markers when they should. See https://en.wikipedia.org/wiki/Module_talk:Mapframe#Maplink_markers_not_appearing_in_article_space for user reports. In short, for the same article the markers render only sometimes, or only for some users. For example, they appeared while editing but not after saving, appeared for some users but not others, or could be made to appear by saving an edit. Possibly they only work when rendered by some servers and not others?

Event Timeline

@Evad37 I've tested every machine currently serving production traffic, bypassing the cache, and the output looked fine to me. Maybe it was fixed after T267296#6609192. Is anyone still able to reproduce it?

I have also noticed that lately it takes hours or even days after a page is saved before the static snapshot image matches the dynamic map view. For example, in this article a map was added three days ago, and the snapshot still doesn't show the OSM object that is auto-positioned in the dynamic view. In this other article I observed the same thing yesterday; now the snapshot highlights the OSM object, but the object has disappeared from the dynamic view.

Could the issue be similar to T266807? If I'm not mistaken, Grafana shows that 100% CPU usage is still more or less an issue. I suppose users see different output because eqiad and codfw are unstable to different degrees?

@Pikne, this specific case was caused by caching, but thank you for reporting it; maybe we shouldn't cache geoshape responses for so long.

Thanks to @Evad37 for reporting. I just want to add that the marker/shape sometimes does show initially in article space, but it can then disappear and later reappear. That happens even after I link the Wikidata item to the OSM item and the initial cached render shows it. This article, for instance, I am sure showed it previously, but it doesn't at present.

Thanks for your assistance.

MSantos renamed this task from "Map markers not always being displayed" to "Purge maps.wikimedia.org/geoshapes and /geoline cache to fix map markers not always being displayed". (Dec 2 2020, 1:34 PM)
MSantos added a project: Traffic.

@The_Equalizer it looks like some responses cached during the maps outage are still being served. I'm tagging Traffic to help us find out whether it's possible to clean up cached responses from the maps.wikimedia.org/geoshape and /geoline endpoints that were created before T267296#6609192 (Nov 6~7th).

The max lifetime of any object in the Traffic CDN is 24 hours. Are you sure they're being cached there? Can you give a full example URL?

The full example URL is https://maps.wikimedia.org/geoshape?getgeojson=1&ids=Q2073436

I just tested every production machine (after your comment), and it looks like maps1001.eqiad.wmnet is the one still serving the same stale content for this URL; all other machines serve a different, correct output.
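A minimal sketch of the kind of per-host consistency check described in the comment above, assuming the response body for the same URL (e.g. https://maps.wikimedia.org/geoshape?getgeojson=1&ids=Q2073436) has already been fetched directly from each backend. Fetching is not shown; the host names other than maps1001.eqiad.wmnet and the toy payloads are hypothetical placeholders. The function hashes each body and flags hosts that disagree with the majority.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def find_divergent_hosts(responses: Dict[str, bytes]) -> List[str]:
    """Return hosts whose response body differs from the majority answer.

    On a tie for "majority", Counter.most_common picks the digest seen first.
    """
    # Hash each body so large GeoJSON payloads are compared cheaply.
    digests = {host: hashlib.sha256(body).hexdigest() for host, body in responses.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(host for host, digest in digests.items() if digest != majority_digest)


if __name__ == "__main__":
    # Hypothetical example: one stale backend returns an empty FeatureCollection.
    stale = b'{"type":"FeatureCollection","features":[]}'
    fresh = b'{"type":"FeatureCollection","features":[{"type":"Feature"}]}'
    responses = {
        "maps1001.eqiad.wmnet": stale,
        "maps1002.eqiad.wmnet": fresh,  # hypothetical host name
        "maps1003.eqiad.wmnet": fresh,  # hypothetical host name
    }
    print(find_divergent_hosts(responses))  # ['maps1001.eqiad.wmnet']
```

Comparing against the majority rather than a fixed "known good" host means no single backend has to be trusted in advance, though it assumes most replicas are up to date.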

MSantos renamed this task from "Purge maps.wikimedia.org/geoshapes and /geoline cache to fix map markers not always being displayed" to "Some PostgreSQL replicas are not fully updated". (Dec 2 2020, 1:58 PM)

Sounds like you chaps have a very quick handle on things - great stuff and keep up the good work.

jbond triaged this task as Medium priority. (Dec 9 2020, 12:00 PM)
jbond added a subscriber: hnowlan.

maps1001 is depooled and resyncing.

maps1001 is now in sync and serving data consistent with the other nodes.