
Office Wiki credentials inexplicably stop working
Closed, Resolved, Public

Description

Hi all,

We received four reports, yesterday and this morning, of people's office wiki passwords suddenly no longer working.

AVossbrinck_(WMF)
DDeJarnatt_(WMF)
JKelsoteel-WMF
DSeyfert_(WMF)

All of them say the password was working and then all of a sudden it was not. We don't believe this was user error and basic browser troubleshooting steps were taken. The issue was ultimately resolved through password resets, so it was not a big deal. But it would be good to determine root cause in case this comes up again.

Could there be a systematic explanation for this? Appreciate any help in looking into this.

Thanks!

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 16 2025, 6:50 PM

Note: We just had another user complaint about this, username of AJayadi-WMF.

Thanks!

matmarex triaged this task as Unbreak Now! priority. Apr 16 2025, 7:07 PM
matmarex subscribed.

I tested it and my password (saved in a password manager) isn't working on Officewiki either. Definitely seems like a software problem.

This doesn't seem to affect wikis other than Officewiki. I can log in on normal production wikis, and on collab.wikimedia.org, which should be configured similarly to office.wikimedia.org.

Shouldn't be. Has anyone reported similar problems on any of those wikis? I don't have an account on any of them, so I can't test.

If this is train related, we could probably confirm by rolling just officewiki back to 1.44.0-wmf.24...

I also can't test this with verbose logging enabled, which might shed some light on the issue, because Logstash log intake is currently broken: T390215#10749665.

> If this is train related, we could probably confirm by rolling just officewiki back to 1.44.0-wmf.24...

If it's something you can do easily, it would be worth trying.

Okay. I broke this. On the good side, there is nothing to worry about in terms of security. On the bad side, either I have to recover everyone's password hashes or everyone has to do a password reset. The context is T57420: Remove local wiki password hash when CentralAuth has attached account. I deleted the password hashes in all group0 databases, since they are all connected to CentralAuth and passwords should be central. Right? Right?? I learned otherwise later and didn't go on to the group1 wikis. I thought the exceptions would be just closed wikis, but now I checked and they include officewiki too. My bad. I'll fix it.

I'll just do a recovery of s3 and fix this.

@Ladsgroup Please log your maint script runs in the future, so that they are listed in https://wikitech.wikimedia.org/wiki/Server_Admin_Log – first thing I did here was to search that for 'password', and there is nothing.

Ladsgroup added a project: DBA.

> @Ladsgroup Please log your maint script runs in the future, so that they are listed in https://wikitech.wikimedia.org/wiki/Server_Admin_Log – first thing I did here was to search that for 'password', and there is nothing.

Generally speaking, I tend to avoid logging when it relates to security matters, but yeah.

@Ladsgroup

If I'm reading this correctly, that means some but not all office wiki users have had their passwords deleted? That would explain why the issue only seems to be affecting some folks and not everyone.

In either case, would the next step here just be to recommend that affected users use the password reset tool? Because if this isn't affecting everyone, I don't see why we should make it affect everyone by forcing company-wide password resets for all office wiki accounts.

Some 40 people have reset their Officewiki passwords over the last 2 days: https://logstash.wikimedia.org/goto/798fc6af2da2c87893feb4b927ac35d0 (probably a few more, but data is missing due to T390215#10749665), so you might want to let them know that their previous password will be restored, or at least be aware in case they report problems again (@JLam-WMF).

I think I will just re-insert the old password hash if the current one is empty. So if they have done a password reset, they keep the new one, and if they haven't done a password reset, they'll have the old hash.
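The rule described above (restore the old hash only where the current one is empty) can be sketched in a few lines. This is a minimal illustration, not the script that was actually run: the function name, the in-memory dictionaries, and the example users and hash strings are all hypothetical.

```python
def restore_missing_hashes(current, backup):
    """Return a copy of `current` with empty hashes refilled from `backup`.

    Both arguments map a user name to a password hash string (hypothetical
    in-memory stand-ins for the live table and the recovered data).
    """
    restored = dict(current)
    for user, old_hash in backup.items():
        # Only touch accounts that still exist and whose hash is empty.
        # A non-empty hash means the user already reset their password,
        # so the new value is left alone.
        if user in restored and not restored[user]:
            restored[user] = old_hash
    return restored


# Hypothetical data: Bob has already reset his password, the others have not.
current = {"Alice": "", "Bob": "new-hash-b", "Carol": ""}
backup = {"Alice": "old-hash-a", "Bob": "old-hash-b", "Carol": "old-hash-c"}

result = restore_missing_hashes(current, backup)
```

The empty-hash check is what makes the operation safe to repeat: running it a second time changes nothing, and it never overwrites a password chosen after the deletion.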

Ah, so we're reverting? Meaning the affected folks will be able to use their previous password again?

It's likely they will just end up using the password reset tool again.

Would it be possible to provide a list of the affected users?

Also, when is the change going into effect?

EDIT: disregard the above, just saw your comment @Ladsgroup .

So, that means the folks who have reset their password will keep theirs and the affected users who have not yet reset their passwords won't have to take any further action?

> So, that means the folks who have reset their password will keep theirs and the affected users who have not yet reset their passwords won't have to take any further action?

Yes. That's the idea.

Progress update: Isolated the deletion binlogs. Writing a script to process them and produce the update queries.

> The context is T57420: Remove local wiki password hash when CentralAuth has attached account. I deleted the password hashes in all group0 databases, since they are all connected to CentralAuth and passwords should be central. Right? Right?? I learned otherwise later and didn't go on to the group1 wikis. I thought the exceptions would be just closed wikis, but now I checked and they include officewiki too. My bad. I'll fix it.

I wonder if there is somewhere that we can make it easier to see what wikis are actually in the train's group0/1/2? https://versions.toolforge.org/ has them listed (click the ► after the MW version number), but that view is really compressed and difficult to scan for patterns. The dblists themselves like https://noc.wikimedia.org/conf/highlight.php?file=dblists/group0.dblist probably aren't a lot better.

The interesting thing in this particular case would be highlighting which wikis are "different" than the majority in the list. Maybe some fancy styling based on intersection with lists like private.dblist, fishbowl.dblist, ?special.dblist? I'm realizing in realtime that I don't know the exact purpose of some of these lists myself.

Progress update: Produced the update statements. Just testing it on my account to make sure it works.

Works just fine. About to run it on all users.

It should be done now. Can people confirm it fixes the issue?

>> The context is T57420: Remove local wiki password hash when CentralAuth has attached account. I deleted the password hashes in all group0 databases, since they are all connected to CentralAuth and passwords should be central. Right? Right?? I learned otherwise later and didn't go on to the group1 wikis. I thought the exceptions would be just closed wikis, but now I checked and they include officewiki too. My bad. I'll fix it.

> I wonder if there is somewhere that we can make it easier to see what wikis are actually in the train's group0/1/2? https://versions.toolforge.org/ has them listed (click the ► after the MW version number), but that view is really compressed and difficult to scan for patterns. The dblists themselves like https://noc.wikimedia.org/conf/highlight.php?file=dblists/group0.dblist probably aren't a lot better.

> The interesting thing in this particular case would be highlighting which wikis are "different" than the majority in the list. Maybe some fancy styling based on intersection with lists like private.dblist, fishbowl.dblist, ?special.dblist? I'm realizing in realtime that I don't know the exact purpose of some of these lists myself.

+111 Especially, a dblist that I've been needing for a while now is SUL wikis (or the opposite, non-SUL wikis; I'd assume that's fishbowl + private, but I'm not sure). I can't find something like that in https://noc.wikimedia.org/conf/ but maybe it's already there.

At some point, using dblists for minor things got discouraged for performance reasons. We could have documentation-only dblists but then there is always the risk that they get outdated.

SUL is $wmgUseCentralAuth which is everything not fishbowl/private, yeah.

If anyone else confirms that their old password works, I can close this ticket. Thanks!

> At some point, using dblists for minor things got discouraged for performance reasons.

Do we have a place to see the discussion/reasoning? The thing is that I want to use them for table catalog T363581: Build a machine-readable catalogue of mariadb tables in production to avoid listing all wikis for ores tables or others.

> We could have documentation-only dblists but then there is always the risk that they get outdated.

You can have a dblist generated from a dbexpr, e.g. https://noc.wikimedia.org/conf/highlight.php?file=dblists/group2.dbexpr, so it wouldn't go out of sync that easily.

> SUL is $wmgUseCentralAuth which is everything not fishbowl/private, yeah.

Thanks.

Yep, works now for me. Thank you!

Just to double-check whether any other wiki needs this:

ladsgroup@mwmaint1002:~$ comm -12  <(expanddblist '%% group0.dblist - closed.dblist') <(expanddblist '%% private.dblist + fishbowl.dblist')
officewiki
ladsgroup@mwmaint1002:~$

So we are good. My apologies for the inconvenience.
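For readers without access to the maintenance host, the check above can be mirrored with plain set arithmetic. The sketch below assumes the expression syntax seen in the command (`%%`, then list names joined by `+` for union and `-` for difference, applied left to right); the `expand` function and the example list contents are hypothetical and are not the real `expanddblist` tool or the real dblists.

```python
def expand(expr, dblists):
    """Evaluate a dblist expression such as '%% a.dblist - b.dblist'.

    `dblists` maps a list name (without the '.dblist' suffix) to a set of
    wiki database names. Operators are applied left to right (an assumption
    made for this sketch).
    """
    tokens = expr.split()
    if tokens and tokens[0] == "%%":
        tokens = tokens[1:]
    if not tokens:
        return set()

    def lookup(token):
        suffix = ".dblist"
        name = token[: -len(suffix)] if token.endswith(suffix) else token
        return set(dblists[name])

    result = lookup(tokens[0])
    # Remaining tokens come in (operator, operand) pairs.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        if op == "+":
            result |= lookup(operand)
        elif op == "-":
            result -= lookup(operand)
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return result


# Hypothetical list contents, for illustration only.
dblists = {
    "group0": {"testwiki", "officewiki", "oldclosedwiki"},
    "closed": {"oldclosedwiki"},
    "private": {"officewiki", "someprivatewiki"},
    "fishbowl": {"somefishbowlwiki"},
}

# Same shape as the `comm -12` call: wikis present in both expansions.
affected = expand("%% group0.dblist - closed.dblist", dblists) & expand(
    "%% private.dblist + fishbowl.dblist", dblists
)
```

With these made-up lists, `affected` contains only `officewiki`, matching the shape of the result in the transcript: non-closed group0 wikis that are also private or fishbowl, and therefore not on SUL.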

Change #1137087 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/mediawiki-config@master] dblists: Add sul.dbexpr and generated sul.dblist

https://gerrit.wikimedia.org/r/1137087

>> At some point, using dblists for minor things got discouraged for performance reasons.

> Do we have a place to see the discussion/reasoning?

Not sure. Maybe @Jdforrester-WMF would know.

There was (is?) a measurable extra cost for each dblist (wmf-config is code executed on every incoming request to any MW endpoint), and the consensus between ServiceOps and Performance at the time was to drive down the time cost of serving MW.

I vaguely think this is now less a concern in the MW-on-k8s universe, but I don't have data either way.

If that's still the case, I think we can get rid of a couple of dblists with a tiny bit of effort. For example: https://noc.wikimedia.org/conf/highlight.php?file=dblists/mediamoderation-continuous-scan.dblist is just one wiki. wikitech.dblist is just one wiki. etc.

Hi all,

Just a note. From the service desk standpoint, this is resolved as far as end users are concerned. Thanks!

>> At some point, using dblists for minor things got discouraged for performance reasons.

> There was (is?) a measurable extra cost for each dblist (wmf-config is code executed on every incoming request to any MW endpoint), and the consensus between ServiceOps and Performance at the time was to drive down the time cost of serving MW. […]

This concern no longer exists as of 2022 after https://gerrit.wikimedia.org/r/816029, which introduced dblist-index.php backed by Opcache.

The only requirement is that any dblist used by wmf-config (as opposed to used from the command-line via db list expressions) must be listed in the DB_LISTS constant so that it is included in this cache. Other than that, have at it :)

Change #1137087 merged by jenkins-bot:

[operations/mediawiki-config@master] dblists: Add sul.dbexpr and generated sul.dblist

https://gerrit.wikimedia.org/r/1137087

Mentioned in SAL (#wikimedia-operations) [2025-04-28T20:20:22Z] <bd808@deploy1003> Started scap sync-world: Backport for [[gerrit:1137087|dblists: Add sul.dbexpr and generated sul.dblist (T392142)]], [[gerrit:1139474|Design Research Participant Survey: Increase Coverage (T392325)]], [[gerrit:1138859|Remove Search AB test config (T388719)]]

Mentioned in SAL (#wikimedia-operations) [2025-04-28T20:25:02Z] <bd808@deploy1003> dani, bwang, bd808: Backport for [[gerrit:1137087|dblists: Add sul.dbexpr and generated sul.dblist (T392142)]], [[gerrit:1139474|Design Research Participant Survey: Increase Coverage (T392325)]], [[gerrit:1138859|Remove Search AB test config (T388719)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-04-28T20:34:57Z] <bd808@deploy1003> Finished scap sync-world: Backport for [[gerrit:1137087|dblists: Add sul.dbexpr and generated sul.dblist (T392142)]], [[gerrit:1139474|Design Research Participant Survey: Increase Coverage (T392325)]], [[gerrit:1138859|Remove Search AB test config (T388719)]] (duration: 14m 34s)