Page MenuHomePhabricator

Check detached accounts in DB with same username for "mediawiki" and "phab" sources but different uuid's (and merge if connected)
Closed, DeclinedPublic

Description

Duplicate usernames in current DB (get usernames and their uuids):

  • SELECT mw.username, DISTINCT(mw.uuid) FROM identities p JOIN identities mw WHERE mw.source = "mediawiki" AND p.source = "phabricator" AND mw.uuid != p.uuid AND mw.username = p.username AND mw.username IS NOT NULL AND p.username IS NOT NULL ORDER BY mw.username;
#!/bin/bash
mwusername=( username1 username2 username3 from sql query above)
uuid=( uuid1 uuid2 uuid3 from sql query above)
arrayLength=${#mwusername[@]}
# Output "jq: error (at <stdin>:0): Cannot iterate over null (null)" means that the mwusername has no associated Phab account
# Empty output means that the Phab username has no associated mw (but LDAP) account and hence cannot be queried by that
for (( i=0; i<${arrayLength}; i++ ));
do
  curl -s https://phabricator.wikimedia.org/api/user.mediawikiquery -d api.token=yourapitoken -d names[0]="${mwusername[$i]}" | jq '.result[] | .userName,.mediawiki_username'
  echo "https://wikimedia.biterg.io/identities/hatstall/${uuid[$i]}"
  sleep 10
done

Then clean up output.
Then if same name, merge (needs more thought how to do exactly)

Event Timeline

Aklapper changed the task status from Open to Stalled.Sep 4 2017, 4:27 PM

Stalled on an updated DB dump

...and using the usernames in the output of that script again as mw.username for four followup queries, to quickly construct sortinghat merge commands:

#Phab connected account is 'stronger' (got more sources attached as id != uuid):
SELECT mw.uuid,p.uuid FROM identities p JOIN identities mw WHERE mw.source = "mediawiki" AND p.source = "phabricator" AND mw.uuid != p.uuid AND mw.username = p.username AND mw.username IS NOT NULL AND p.username IS NOT NULL AND mw.id = mw.uuid AND p.id != p.uuid AND (mw.username = "123" OR mw.username = "456") ORDER BY mw.username;

#mw connected account is 'stronger' (got more sources attached as id != uuid):
SELECT p.uuid,mw.uuid FROM identities p JOIN identities mw WHERE mw.source = "mediawiki" AND p.source = "phabricator" AND mw.uuid != p.uuid AND mw.username = p.username AND mw.username IS NOT NULL AND p.username IS NOT NULL AND mw.id != mw.uuid AND p.id = p.uuid AND (mw.username = "123" OR mw.username = "456") ORDER BY mw.username;

#both are isolated and equally strong, hence prefer Phab one with potential real name:
SELECT mw.uuid,p.uuid FROM identities p JOIN identities mw WHERE mw.source = "mediawiki" AND p.source = "phabricator" AND mw.uuid != p.uuid AND mw.username = p.username AND mw.username IS NOT NULL AND p.username IS NOT NULL AND mw.id = mw.uuid AND p.id = p.uuid AND (mw.username = "123" OR mw.username = "456") ORDER BY mw.username;

#and the rest to be investigated:
SELECT mw.uuid,p.uuid FROM identities p JOIN identities mw WHERE mw.source = "mediawiki" AND p.source = "phabricator" AND mw.uuid != p.uuid AND mw.username = p.username AND mw.username IS NOT NULL AND p.username IS NOT NULL AND mw.id != mw.uuid AND p.id != p.uuid AND (mw.username = "123" OR mw.username = "456") ORDER BY mw.username;
Aklapper moved this task from Ready to Go to March on the Developer-Advocacy (Jan-Mar-2018) board.

The number of such accounts is pretty low nowadays (200) and as we concentrate on development stats this issue does not deserve its own task.

This query can be occasionally run; I've added it to https://www.mediawiki.org/wiki/User:AKlapper_(WMF)/Bitergia_data_quality_queries