
Community Relations support needed for several read-only windows (s2, s3, s4 and s8)
Open, High, Public

Description

What is the problem?

Due to on-site maintenance (T226778) on some of the racks that host our primary database servers, we need to switch over some of our current primary masters to other hosts, to ensure that there is no unexpected downtime as a result of this on-site maintenance.

This means we need read-only windows to perform this maintenance.

How can we help you?

By notifying the affected wikis (below) about the scheduled maintenance.

What does success look like?

Users of the affected wikis will know that there will be a read-only period (30 minutes requested; only a few minutes expected if everything goes fine).
Users will know what the impact is: writes will be blocked, and reads will remain unaffected.

What is your deadline?

These are the windows, days, time and affected wikis:

Event Timeline

Restricted Application added a subscriber: Aklapper. Tue, Aug 20, 10:35 AM
Marostegui updated the task description. Tue, Aug 20, 10:41 AM
Marostegui updated the task description.
Johan added a subscriber: Trizek-WMF. Edited Wed, Aug 21, 1:01 PM

OK.

I've added this to Tech News #36 which will go out on September 2 (no issue on Monday, all the usual writers are travelling and no one else picked up the torch). I've also posted on Project Chat on Wikidata as that is fairly soon.

In general, I recommend:

  • Posting on the Village Pump, to give a heads up, at least a couple of weeks prior to the read-only period. It would be good if this could be done for all wikis.
  • Mentioning it in Tech News. (This is being done.)
    • We can follow up in Tech News later, especially for September 10, 24 and 26. The wikis on September 24 are a) few and b) not Commons and Wikidata that are used by other wikis. If we want to remind the wikis on September 17, that's probably best done individually, outside of Tech News.
  • Setting up a banner maybe 30–60 minutes before the read-only period. This can be done by asking the communities or by a CentralNotice banner (ping @Trizek-WMF).

This is routine by now, and works well, so the above should be enough.

I'll be OoO for two and a half weeks starting tomorrow, so someone else will have to take this up from here on.

Johan added a comment. Wed, Aug 21, 1:04 PM

@Marostegui Something that would be helpful would be a simplified technical explanation (aimed at a fairly non-technical audience) of why we need the read-only periods, so that we could explain to the communities what's happening and why they can't edit. Would you have the time to do that?

(If you feel uncomfortable writing for a non-technical audience: we can help with editing, once we're back.)

Sure @Johan, let me know if this is good enough or you need more or less detail.

There is an on-site maintenance at our primary datacenter, where the primary database masters are located. The maintenance will be specifically on those racks, and it involves plugging and unplugging servers from one power source to another. Whilst all our servers have redundant power supplies and will most likely not lose power at any time, accidents can occur, and if that happens, the affected server can go off.
Our primary database masters are the ones that get the edits, and replicate them to other hosts (replicas) where reads happen.

If any of our primary database masters loses power, we don't only lose the ability to edit until it is restored (it can take a few minutes for a server to boot up), but we could also run into data corruption (due to the abrupt crash).
To avoid running into any of those unexpected scenarios, we prefer to switch our master to a host located in a rack that won't be affected by this maintenance.
The reason the read-only time is required is that we have to change the configuration to make MediaWiki point to the new host, and that needs to be done while no edits are happening, to avoid the possibility of a split brain (an edit happening at the exact moment the master switch is being propagated to all our MediaWiki servers).
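
To make the sequence described above more concrete, here is a minimal Python sketch of a single-primary switchover guarded by a read-only window. It is a toy model under loud assumptions, not how MediaWiki or the actual switchover tooling works: the `Section` class, its methods and the host names (`db-a`, `db-b`, `db-c`) are all hypothetical, and replication is modelled as instantaneous.

```python
class Section:
    """Toy model of one database section with a single writable primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self.read_only = False
        # Edits stored per host; every host starts empty.
        self.data = {host: [] for host in [primary, *self.replicas]}

    def write(self, edit):
        # Writes are refused for the duration of the read-only window.
        if self.read_only:
            raise RuntimeError("database is read-only, edit rejected")
        # Simplification: the edit reaches the primary and all replicas at once.
        for host in [self.primary, *self.replicas]:
            self.data[host].append(edit)

    def read(self):
        # Reads are served by a replica, so they keep working while read-only.
        return list(self.data[self.replicas[0]])

    def switch_primary(self, new_primary):
        if new_primary not in self.replicas:
            raise ValueError(f"{new_primary} is not a replica of this section")
        self.read_only = True  # start of the read-only window
        try:
            # With writes blocked, the candidate must hold every edit.
            if self.data[new_primary] != self.data[self.primary]:
                raise RuntimeError("candidate is missing edits")
            # Swap roles: the old primary becomes a replica.
            self.replicas.remove(new_primary)
            self.replicas.append(self.primary)
            self.primary = new_primary
        finally:
            self.read_only = False  # end of the read-only window


section = Section("db-a", ["db-b", "db-c"])
section.write("edit 1")
section.switch_primary("db-b")
section.write("edit 2")
print(section.primary, section.read())
```

Because no write can be accepted between the moment the flag is set and the moment the roles are swapped, there is never a point at which two hosts both believe they are the primary, which is the split-brain case the read-only window is meant to rule out.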

Johan added a comment. Wed, Aug 21, 5:20 PM

@Marostegui Thanks! One additional detail some people keep asking about, and where it'd be great if I could give them a good answer, is what in our setup creates this problem, when they can't remember it happening on other websites.

That is a broader discussion.
Essentially, the architecture we have at the moment (both systems and MW related) is designed to have only one active master at a time, so only one host receives writes at any given time. Allowing more than one host to do this is something we are discussing, but it requires lots of changes to both our systems architecture and the MW code itself.

There is a huge number of organizations out there with a similar architecture, and the key is to do this switchover as fast as possible so your users are not impacted much. We have made tremendous improvements to be able to do this in under 2 minutes (and with the latest changes we are now aiming to do it in under 1 minute).

There is one thing to keep in mind here: we do announce our read-only times, even though they are very short. Some other organizations don't, and they let their users experience an error, which is usually fixed the next time the write is attempted if the process is fast. So the feeling that other websites don't suffer from this problem doesn't mean it is not there; it is just handled in a different way.
In some organizations, showing errors for a minute (or less) is fine and is accepted as part of regular maintenance. We, however, prefer to announce our read-only times to make sure users and bots are aware, so we don't create unexpected inconveniences.

Hope this helps!
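
For bot operators, the "retry the write" behaviour mentioned above could look roughly like the following Python sketch. Everything here is an assumption for illustration: `ReadOnlyError` and `save_with_retry` are hypothetical names, and a real client would have to map its own API's read-only error onto something similar.

```python
import time


class ReadOnlyError(Exception):
    """Hypothetical error raised when a write hits a read-only database."""


def save_with_retry(save, attempts=5, delay=1.0, sleep=time.sleep):
    """Call save(); on ReadOnlyError wait and retry, up to `attempts` times.

    The wait grows linearly (delay, 2 * delay, ...), so a switchover that
    lasts about a minute is ridden out without hammering the servers.
    """
    for attempt in range(1, attempts + 1):
        try:
            return save()
        except ReadOnlyError:
            if attempt == attempts:
                raise  # give up: the window lasted longer than we can wait
            sleep(delay * attempt)


# Usage: a fake save that fails twice, as if the window were still open.
calls = {"count": 0}

def flaky_save():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ReadOnlyError("database is read-only")
    return "saved"

print(save_with_retry(flaky_save, delay=0.01))
```

Passing `sleep` as a parameter is only a convenience so the waiting can be replaced in tests.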

Elitre assigned this task to Trizek-WMF. Fri, Aug 23, 8:46 AM
Elitre added a subscriber: Elitre.

I'll assign this temporarily to Trizek, although Johan has already done most of the diligence for the first read-only period, and I'll make sure that he's aware of the few pending tasks for that.

Trizek-WMF triaged this task as High priority. Thu, Aug 29, 2:42 PM

I've worked on the announcement in Tech News.

Concerning the banners, it may be a bit more complicated, since we have multiple wiki families. I'll work on it tomorrow.

@Marostegui is this task only to coordinate our team support?

Yes :-)
The technical bits are on different tickets

Thanks for confirming! In this case, it'd be helpful to follow the structure suggested at https://office.wikimedia.org/wiki/Community_Relations#Public_requests_(standard) - we more or less know what it is that you may need from us, but more details for everyone else could also help - and thanks for the email heads-up BTW!

Elitre renamed this task from Several read-only windows needed for: s2, s3, s4 and s8 to Community Relations support needed for several read-only windows (s2, s3, s4 and s8). Thu, Aug 29, 4:10 PM
Marostegui updated the task description. Thu, Aug 29, 4:15 PM

I have edited the task with that template.

Elitre rescinded a token. Thu, Aug 29, 4:24 PM
Elitre awarded a token.

Here is the Great Banners Matrix.
Each cell is a separate banner. Since we have one banner template and I don't want to duplicate it, I will set up each banner one at a time.

| | 10th Sept | 17th Sept | 24th Sept | 26th Sept |
| wikimedia | | | am, be, br, ca, cn, co, dk, ec, et, fi, hi, id, il, mai, mk, mx, nl, no, nyc, nz, pa_us, pl, pt, punjabi, romd, rs, ru, se, tr, ua, wb | |
| wikipedia | | bg, cs, eo, fi, id, it, nl, no, pl, pt, sv, th, tr, zh | aa, ab, ace, ady, af, ak, als, am, ang, an, arc, arz, ast, as, atj, av, ay, azb, az, bar, bat_smg, ba, bcl, be_x_old, be, bh, bi, bjn, bm, bn, bo, bpy, br, bs, bug, bxr, cbk_zam, cdo, ce, cho, chr, ch, chy, ckb, co, crh, cr, csb, cu, cv, cy, da, din, diq, dsb, dty, dv, dz, ee, el, elm, et, eu, ext, fdc, ff, fiu_vro, fj, fo, frp, frr, fur, fy, gag, gan, ga, gd, glk, gl, gn, gom, gor, got, gu, gv, hak, ha, haw, hif, hi, ho, hr, hsb, ht, hy, hyw, hz, ia, ie, ig, ii, ik, ilo, inh, io, is, iu, jam, jbo, jv, kaa, kab, ka, kbd, kbp, kg, ki, kj, kk, kl, km, kn, koi, krc, kr, ksh, ks, ku, kv, kw, ky, lad, la, lbe, lb, lez, lfn, lg, lij, li, lmo, ln, lo, lrc, ltg, lt, lv, mai, mdf, mg, mhr, mh, min, mi, mk, ml, mn, mrj, mr, ms, mt, mus, mwl, myv, my, mzn, nah, nap, na, nds, ne, new, ng, nn, nov, nrm, nso, nv, ny, oc, olo, om, or, os, pag, pam, pap, pa, pcd, pdc, pfl, pih, pi, pms, pnb, pnt, ps, qu, rm, rmy, rn, roa_rup, roa_tara, rue, rw, sah, sat, sa, scn, sco, sc, sd, se, sg, shn, simple, si, sk, sl, sm, sn, so, sq, srn, ss, stq, st, su, sw, szl, ta, tcy, tet, te, tg, ti, tk, tl, tn, to, tpi, ts, tt, tum, tw, tyv, ty, udm, ug, ur, uz, vec, vep, ve, vls, vo, wa, wo, wuu, xal, xh, xmf, yi, yo, za, zea, zh_classical, zh_min_nan, zh_yue, zu | |
| wikibooks | | | ak, ang, ar, ast, as, ay, az, ba, be, bg, bi, bm, bn, bo, bs, ca, ch, co, cs, cv, cy, da, de, el, en, eo, es, et, eu, fa, fi, fr, fy, ga, gl, gn, got, gu, he, hi, hr, hu, hy, ia, id, ie, is, it, ja, ka, kk, km, kn, ko, ks, ku, ky, la, lb, li, ln, lt, lv, mg, mi, mk, ml, mn, mr, ms, my, nah, na, nds, ne, nl, no, oc, pa, pl, ps, pt, qu, rm, ro, ru, sa, se, simple, si, sk, sl, sq, sr, tr, su, sv, sw, ta, te, tg, th, tk, tl, tt, ug, uk, ur, uz, vi, vo, wa, xh, yo, za, zh_min_nan, zh, zu | |
| wikinews | | | ar, bg, bs, ca, cs, de, el, en, eo, es, fa, fi, fr, he, hu, it, ja, ko, li, nl, no, pl, pt, ro, ru, sd, sq, sr, sv, ta, th, tr, uk, zh | |
| wikiquote | | en | aa, af, ang, am, ar, ast, az, be, bg, bm, br, bs, ca, co, cr, cs, cy, da, de, el, eo, es, et, eu, fa, fi, fr, ga, gl, gu, he, hi, hr, hu, hy, id, is, it, kk, kn, ko, kr, ks, ku, kw, ky, la, lb, li, lt, ml, mr, na, nds, nl, nn, no, pl, pt, qu, ro, ru, sah, sa, simple, sk, sl, sq, sr, su, sv, ta, te, th, tk, tr, tt, ug, uk, ur, uz, vi, vo, wo, za, zh_min_nan, zh | |
| wikisource | | | ang, as, az, be, bg, bn, br, bs, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, gl, gu, he, hr, ht, hu, hy, id, is, it, ja, kn, ko, la, li, lt, mk, ml, mr, nap, nl, no, pa, pl, pms, pt, ro, ru, sah, sa, sk, sl, sr, sv, ta, te, th, tr, uk, vec, vi, yi, zh_min_nan, zh | |
| wikiversity | | | ar, cs, de, el, en, es, fi, fr, hi, it, ja, ko, pt, ru, sl, sv, zh | |
| wikivoyage | | | bn, de, el, es, fa, fi, fr, he, hi, it, nl, pl, ps, pt, ro, ru, sv, uk, vi, zh | |
| wiktionary | | bg, en | aa, ab, af, ak, am, ang, an, ar, ast, as, av, ay, az, be, bh, bi, bm, bn, bo, br, bs, ca, chr, ch, co, cr, csb, cs, cy, da, de, dv, dz, el, eo, es, et, eu, fa, fi, fj, fo, fy, ga, gd, gl, gn, gu, gv, ha, he, hif, hi, hr, hsb, hu, hy, ia, id, ie, ik, io, is, it, iu, ja, jbo, jv, ka, kk, kl, km, kn, ko, ks, ku, kw, ky, la, lb, li, ln, lo, lt, lv, mh, mi, mk, ml, mn, mr, ms, mt, my, nah, na, nds, ne, nl, nn, no, oc, om, or, pa, pi, pl, pnb, ps, pt, qu, rm, rn, roa_rup, ro, ru, rw, sa, scn, sc, sd, sg, sh, simple, si, sk, sl, sm, sn, so, sq, sr, ss, st, su, sv, sw, ta, te, tg, th, ti, tk, tl, tn, to, tpi, tr, ts, tt, tw, ug, uk, ur, uz, vec, vi, vo, wa, wo, xh, yi, yo, yue, za, zh_min_nan, zh, zu | |
| global wikis | wikidata | | mediawiki, outreach, species | commons |

Wikis that can't be covered:

  • arbcom
  • boardgovcom
  • board
  • beta
  • chairwiki
  • chapcomwiki
  • checkuserwiki
  • collabwiki
  • donate
  • electcomwiki
  • exec
  • noboard_chapters
  • nostalgia
  • foundation
  • fixcopyright
  • grants
  • id_internal
  • incubator
  • internal
  • legalteam
  • login
  • map_bms
  • movementroles
  • nds_nl
  • nyc
  • pa_us
  • office
  • otrs
  • ombudsmen
  • quality
  • searchcom
  • strategy
  • stewards
  • spcom
  • ten
  • techconduct
  • iegcom
  • projectcom
  • all test wikis
  • transitionteam
  • usability
  • vote
  • wg_en

We can skip them, since they don't seem to have a lot of traffic, or are used by people who know how to use them. So the locked database message should be enough, with no prior warning.

Wikimania wikis are read-only, except wikimania.wikimedia.org. But since Wikimania is over, we can rely on the locked database message alone.

Znotch190711 raised the priority of this task from High to Unbreak Now!. Thu, Sep 5, 2:19 AM
Restricted Application added a subscriber: Liuxinyu970226. Thu, Sep 5, 2:19 AM

@Znotch190711 why do you consider this as UBN?

RhinosF1 lowered the priority of this task from Unbreak Now! to High. Thu, Sep 5, 5:50 AM
RhinosF1 removed a subscriber: Liuxinyu970226.

Lowering: there is no reason for UBN! on a community relations task where the first impact is 5 days away and they're getting on with what the task asks of them perfectly fine.

UBN! is a drop-everything-and-fix priority, and no one needs to drop everything now to finish this.

Trizek-WMF added a comment. Edited Thu, Sep 5, 2:41 PM

Banner set for Sept 10. Will be displayed from 04:30 to 05:30 UTC.

s8 (wikidata) has been done today:
read-only start: Tue Sep 10 05:00:47 UTC 2019
read-only stop: Tue Sep 10 05:02:14 UTC 2019

Total read-only time: 1 minute 27 seconds.

Thank you for the update here @Marostegui , much appreciated!

Marostegui updated the task description. Wed, Sep 11, 9:50 AM

3 banners set for September 17.

I'm warming up for the 24th! :)

Just for the record, I realised that the banner for today on itwiki is wrong:

A breve verrà svolta della manutenzione tecnica. 17 settembre - 05:00 AM UTC - 05:30 AM UTC
Durante tale intervallo potresti non riuscire a salvare alcuna modifica. (dalle 15:00 alle 15:15 UTC del 19 marzo 2019)

So the first part is correct, 17th Sept from 05:00 AM to 05:30 AM UTC, but the second part looks wrong (19 March from 15:00 to 15:15 UTC).
Just saying it here in case you get some messages from people about it as it is a past date :-)

s2 switchover is done
read-only start: 05:00:44
read-only stop: 05:01:34

Total read-only time: 50 seconds.
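
The totals reported for s8 and s2 can be re-derived from the start and stop timestamps. A small Python sketch, assuming both timestamps of a switchover fall on the same UTC day; the function names are illustrative only.

```python
from datetime import datetime


def read_only_seconds(start, stop, fmt="%H:%M:%S"):
    """Length of a read-only window in whole seconds (same-day timestamps)."""
    delta = datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def describe(seconds):
    """Format a duration the way the totals are written in this task."""
    minutes, secs = divmod(seconds, 60)
    parts = []
    if minutes:
        parts.append(f"{minutes} minute{'s' if minutes != 1 else ''}")
    if secs or not parts:
        parts.append(f"{secs} second{'s' if secs != 1 else ''}")
    return " ".join(parts)


# Timestamps as reported for the two completed switchovers.
for name, start, stop in [
    ("s8", "05:00:47", "05:02:14"),
    ("s2", "05:00:44", "05:01:34"),
]:
    print(name, describe(read_only_seconds(start, stop)))
```

Both results are well inside the 30 minutes that were requested for each window.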

Marostegui updated the task description. Tue, Sep 17, 5:11 AM