Doing the others now:
The script finished on dewiktionary, and fixed all the rows with missing ar_sha1 values. I've undeleted the remaining revision on Böotien, and this bug should no longer affect undeletions on dewiktionary. I'll run the script for the other wikis tomorrow.
The testwiki run failed to remedy any of the 11 missing ar_sha1 entries because those archive rows point to broken text rows, with the external store URL DB://cluster20/0. I suspect this is the story behind most of the wikis that only have a handful of missing SHA1s. It's now running for dewikivoyage and that does seem to be doing something.
Mon, Jun 17
Fri, Jun 14
Yes, that's right.
Thu, Jun 13
It works fine for me on Android too:
Not all types of site notices work in no-JS, because some of them are based on stuff like geo location. However, if all our links/redirects that lead from the account creation / welcome survey flow to wherever users end up next add a query string parameter (say, ?showhomepagetour=1), then we could see that query string parameter on the server side and output a site notice into the HTML. No-JS users will then see that site notice. For JS users, that site notice HTML will also be delivered, and we can either let it stand, or hide it with CSS and instead show a GuidedTour (or display both).
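To make the server-side half of that idea concrete, here is a minimal sketch in Python rather than the PHP the real code would be written in. The parameter name showhomepagetour is the one floated above; the function name, the CSS class and the message text are hypothetical and only there for illustration.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def site_notice_html(url: str, message: str) -> str:
    """Return site notice HTML if the URL carries ?showhomepagetour=1.

    Hypothetical helper: a real implementation would live in a hook
    handler and read the parameter from the web request object.
    """
    params = parse_qs(urlsplit(url).query)
    if params.get("showhomepagetour") == ["1"]:
        # Emitted for everyone; JS clients may hide it with CSS and
        # show a GuidedTour instead.
        return '<div class="homepage-tour-notice">' + escape(message) + "</div>"
    return ""


print(site_notice_html(
    "https://example.org/wiki/Special:Homepage?showhomepagetour=1",
    "Welcome to your homepage!",
))
```

Because the notice is part of the HTML response, it does not depend on any client-side code running, which is the property we need for no-JS users.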
Wed, Jun 12
Strangely, I can't reproduce this locally, even in a Vagrant setup with CentralAuth. Perhaps it's specific to having the master and replica be different servers.
I think this happened because the change made DatabaseMysqlBase::doSelectDomain() use executeQuery() whereas it previously used doQuery(). The former has the assertIsWritableMaster() check but the latter bypasses it. USE queries appear to be considered write queries (because they're not explicitly marked as read queries by the regex in Database::isWriteQuery()), so this failure started happening every time we try to switch databases in MySQL.
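A rough illustration of that failure mode, in Python rather than PHP. The regex below is a made-up allow-list and not the actual pattern in Database::isWriteQuery(); the function and exception names are likewise hypothetical. The point is only that a classifier which treats everything not explicitly recognised as a read will flag USE as a write, so a role check in front of it rejects the statement.

```python
import re

# Hypothetical allow-list of read-only statements (an assumption, not the
# real MediaWiki regex). Anything that does not match counts as a write.
READ_ONLY = re.compile(r"^\s*(?:SELECT|SHOW|EXPLAIN)\b", re.IGNORECASE)


class ReadOnlyRoleError(Exception):
    """Stand-in for the error thrown when a write hits a replica."""


def is_write_query(sql: str) -> bool:
    return READ_ONLY.match(sql) is None


def execute_query(sql: str, is_replica_role: bool) -> str:
    # Analogue of the writable-master assertion: refuse writes on replicas.
    if is_replica_role and is_write_query(sql):
        raise ReadOnlyRoleError("write query sent to replica: " + sql)
    return "ok"


print(is_write_query("SELECT 1"))         # recognised as a read
print(is_write_query("USE centralauth"))  # not on the allow-list, so a write
```

Under these assumptions, execute_query("USE centralauth", is_replica_role=True) raises, which matches the behaviour we are seeing whenever a replica connection switches databases.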
I believe https://gerrit.wikimedia.org/r/c/mediawiki/core/+/512043 is likely to be the culprit.
The first error of this type appeared at 20:05:59 UTC today (June 12)
I was going to take a look but can't because of T225682: Login, account creation, anything else that accesses global user throws DBReadOnlyRoleError in beta labs, so I'm going to investigate that first.
There are 1043 Logstash events in the last 30 days that match +exception.trace:"RecentChangeSaveHookHandler" and only 9 of them match +exception.trace:"RecentChangeSaveHookHandler". However, the other 1034 all have the same stack trace, which is the one from T225200: Fatal error during CirrusSearch-LinksUpdate job (CirrusTitleJob) from JobQueueGroup->push. (I checked that all of the non-ORES events match +exception.trace:"requeueError".) So they probably have the same root cause (or similar root causes).
The preference is labeled "Show Wikidata edits *by default* in recent changes". If you don't have it enabled, the default state of the RC filters when you open RC will be such that Wikidata edits aren't shown, but if you then change the filters they may end up being shown. I'm not sure how we could make this clearer.
Tue, Jun 11
Some of the deleted revisions for this page have ar_sha1 set to an empty string, which is what causes this error. For this page, this only affects the revisions dated November 2012 and earlier. The ones after November 2012 do have ar_sha1 set, and I was able to undelete those successfully.
(for wiki in $(cat ~/all.dblist); do echo -n "$wiki: "; echo "select count(*) from archive where ar_sha1='';" | analytics-mysql $wiki -N; done) | tee ~/empty-sha1
cat ~/empty-sha1 | grep -v ': 0'
Mon, Jun 10
Note that one of the reasons we put the name and the date in the header was to help make the header (more likely to be) unique, because with duplicate headers it's a pain to link to the right one. But that's only true for wikitext talk pages, not in Flow where every topic has an internal ID that's used for links. That's not to say that having dozens of topics on the same page all named "Help panel question" wouldn't still be confusing/annoying, so that might still be a valid reason to keep the username, or the date, or both.
Thu, Jun 6
The code from CentralNotice that does this: https://github.com/wikimedia/mediawiki-extensions-CentralNotice/blob/master/includes/CentralNoticeHooks.php#L230-L240
Wed, Jun 5
OK, if we can get away with not TTLing the data at all, that would certainly be simpler!
I've submitted a patch on top of Stephane's that does this: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/514412
More than one day, probably close to a week. This also requires a combination of doing some thinking about how exactly to approach this, and getting a brain dump from Piotr.
Tue, Jun 4
I like Kosta's idea of showing a "your email is verified" message based on whether the user was redirected from Special:ConfirmEmail to the home page.
Yes, the user will lose their unsaved edit if they navigate to Special:ChangeEmail. Clicking the link would first show them a warning that they're about to lose unsaved content, and they have to click through that to leave the editor and go to Special:ChangeEmail. After they finish changing/setting their email address, they will see a link taking them back to the page they were on (because of returnto, as Marshall said), which will also open the editor again (because of returntoquery, which will be set to action=edit or something similar). I believe VisualEditor will recover the unsaved change in this case, but despite that I think that making the user navigate away during an edit session is bad and we should avoid it.
Wed, May 29
Thanks for cleaning this up, @Krinkle and @thcipriani. I heard about this problem during the hackathon, but forgot to work on it and then went on vacation. The way it was solved is exactly what I had in mind.
May 18 2019
May 17 2019
May 13 2019
This could be the same as / similar to the button in the email submodule of the account module on the home page. Technically that button is just a link, and it links to Special:ChangeEmail (not to preferences).
May 9 2019
This also applies to the help panel, which shares (much of) the relevant code with the help and mentorship dialogs.
Sounds sane to me; except that I think the scope might need to be extended to random other bits (beyond namespaces, log types and i18n) that we're not thinking of now but will discover when we try to undeploy an extension.
The "sitenotice" area mentioned in option B is where the Wiki Loves Earth banner is in the screenshot below. If no banner is shown, the sitenotice area is empty (so not easy to point out in a screenshot), but it still exists and we can put stuff in it.
We have three main options:
- A: Put something in the "chrome" area of the page (the upside-down-L-shaped area along the top and left edge of the screen), which includes the sidebar, personal links and tabs along the top
- B: Put something in the "sitenotice" area, which is where site notices, central notices and fundraising banners go
- C: Put something in the content area of the main page
May 8 2019
Migrating from one storage backend to another, and from individual keys to pairs, would be a bit complicated. @mobrovac suggested that it would be easier if we could do both of these migrations at the same time. One way to do that would be to create a config setting that takes an array of storage backends, and have the code try these in order when reading. That way we can configure it to read first from the new backend, then the old backend (and then the user preference, which we still have fallback code for today), but only write new values to the first one, and we could make the pair vs separate keys thing a property of each storage backend as well.
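A minimal sketch of the ordered-backends idea, in Python for brevity. All class and method names are hypothetical, and the in-memory dict backend merely stands in for whatever real stores would be configured; the pair-vs-separate-keys property of each backend is left out.

```python
from typing import Dict, Optional, Sequence


class DictBackend:
    """Toy in-memory backend, standing in for a real storage service."""

    def __init__(self, initial: Optional[Dict[str, str]] = None) -> None:
        self.data: Dict[str, str] = dict(initial or {})

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value


class FallbackStore:
    """Reads try each backend in order; writes only go to the first one."""

    def __init__(self, backends: Sequence[DictBackend]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)

    def get(self, key: str) -> Optional[str]:
        for backend in self.backends:
            value = backend.get(key)
            if value is not None:
                return value
        return None

    def set(self, key: str, value: str) -> None:
        self.backends[0].set(key, value)


new_backend = DictBackend()
old_backend = DictBackend({"seen-time": "20190501000000"})
store = FallbackStore([new_backend, old_backend])
print(store.get("seen-time"))  # still served from the old backend
store.set("seen-time", "20190508000000")
print(store.get("seen-time"))  # now served from the new backend
```

With this shape, the migration is mostly a config change: put the new backend first, and drop the old one from the list once enough values have been rewritten.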
I ran the script above to recompute the notification counts for every user who could potentially have been affected by this (users who had at least one unread login-success notification). It took a while to run, but it finished some time yesterday. This should completely resolve the issue: new cases shouldn't happen (unless someone changes a notification type from being available on web to unavailable on web again), and existing cases should now be fixed. Nobody should be seeing wrong counts or phantom notifications anymore.
May 7 2019
Wrong bug, sorry. Neither of these patches addresses this bug, although 508488 is semi-related.
May 6 2019
As I was talking this through with @Krinkle on IRC, he pointed out that in WMF production, we only run rebuildLocalisationCache.php once, on the deployment server, and then distribute the CDB files it generates to the application servers(*). rebuildLocalisationCache.php doesn't run on each app server. Together with $wgLocalisationCacheConf['manualRecache'] = true, this means that LocalisationCache::recache() only runs once, on the deployment server, many seconds (even minutes) before any of the app servers have the new CDB files. Reverting 7f1a3bc742 will cause MessageBlobStore::clear() to be called at this time, but that's not very useful. The MBS cache will be invalidated, but then it'll just be repopulated with stale data.
Since the reason that clearing the MessageBlobStore from LocalisationCache was originally disabled is now moot (MBS isn't DB-based anymore), and since MessageBlobStore::clear() is just a touchCheckKey() call nowadays, I think we can safely revert rOMWC7f1a3bc742ed: Disable MessageBlobStore::clear() via hook. I will upload a patch to do that now.
It turns out MessageBlobStore's normal update mechanism for when messages change was disabled in the WMF config by rOMWC7f1a3bc742ed: Disable MessageBlobStore::clear() via hook, because of T29320: MessageBlobStore::clear() causes scaling problems on multi-server setups with CDB l10ncache. However, that bug might be moot since MessageBlobStore no longer uses the database. I think this probably broke when we stopped running LocalisationUpdate, because the only thing that appears to run refreshMessageBlobs.php (referenced in the comment in that config patch) is l10nupdate-1, which I don't believe we run anymore.
For some reason, this particular message has failed to update on every wiki that it's been deployed to. I'll force-update it in about half an hour, and will investigate why this happened after.