Page MenuHomePhabricator

Unable to edit external-ids ref in notedb due to validation
Open, Stalled, MediumPublic

Description

Gerrit accounts were recently migrated to a new storage mechanism called notedb which stores data directly in git. When attempting to resolve T197083 I manually checked out the external-ids ref from the All-Users database in gerrit. I then made a commit to remove a duplicate email id from the external-ids ref. When attempting to push my change back to gerrit, my push was rejected due to many validation errors.

Apparently the migration created a notedb with a lot of invalid duplicate email records and resolving the issue will require a ton of disambiguation of the date.

I received a total of 283 errors, most of them are duplicate email errors resembling the following pattern:

remote: error: Email 'user@wikimedia.org' is not unique, it's used by the following external IDs: 'gerrit:user', 'mailto:user@wikimedia.org'

Event Timeline

mmodell created this task.
$ ssh -p 29418 gerrit.wikimedia.org 'gerrit gsql'
gerrit> select * from account_external_ids where email_address='jbranaa@wikimedia.org';
 account_id | email_address         | password | external_id
 -----------+-----------------------+----------+----------------
 59yy       | jbranaa@wikimedia.org | NULL     | gerrit:jbranaa
 45xx       | jbranaa@wikimedia.org | NULL     | gerrit:jrbranaa

Note how the same email is known with two different ids.

Some accounts have a mailto: external_id so I guess they are out of scope. In the end I came up with:

gerrit> select email_address, count(*) as occurences, group_concat(external_id order by account_id)  from account_external_ids where external_id LIKE 'gerrit:%' group by email_address having occurences > 1;
...
(114 rows)

There is at least one email that have four accounts and couple more addresses having three accounts.

@hashar that is old data :), it’s stored in All-Users now which you have to clone a repo.

See https://phabricator.wikimedia.org/T197083#4280194

You would also what to grep specific text to be able to find the user too.

Yup I got that Gerrit is now relying on NoteDB for the accounts data, but at least the database give us a snapshot of the state pre migration. I guess the conversion from ReviewDB to NoteDB has not been straightforward and cause a wild bunch of issues and errors.

If we can identify a pass to fix up the accounts, there is apparently "only" a hundred of them being affected. That might be fixable without too many troubles.

Ah I see, if it’s not too much trouble yeh we could fix it :).

Upstream has wrote a reply https://bugs.chromium.org/p/gerrit/issues/detail?id=9256#c10

Questions from upstream

Sorry, I don't understand this bug report.
Can you please take some time and write a clear summary, saying
- What is the issue that the user is facing?
- How is the user issue related to external IDs?
- How are the external IDs looking like in NoteDb?
- How do you intend to change them so that the problem gets fixed?

If we are talking about external IDs with the same email, this is an inconsistency and not a corruptions as the title of the issue suggests.

In case of external IDs with duplicate emails, the following information is important:
1. To which external ID schemes do these external IDs belong to?
2. What kind of authentication do you use? 
3. Do you have any plugins that create external IDs?
4. Do the external IDs belong to different accounts?
5. Are these accounts active?
6. When were these accounts registered?
7. When were the external IDs in question assigned to the accounts?
   E.g. where both external IDs assigned during the schema migration to NoteDb
   or was the email first assigned to one account and later assigned to a second account?
8. Do you have a backup of your ReviewDb before the external IDs were migrated to NoteDb? Can you check if the external IDs with the duplicated emails already existed back then?

Lets do the analysis on our Phabricator task based on upstream questions. Then we can come back to them with a detailed bug report. At least at first glance there is no duplicate external id, but we have some emails associated to multiple accounts. That was already the case in ReviewDB as one can see by looking at the accounts_external_ids table. I guess the migration did not take that in account, and since Gerrit 2.15 no enforce uniqueness of emails address we get screwed up.

Probably some accounts id are not used at all and might be deletable. For other accounts we would have to merge using carefully crafted queries that update the use of some account id from one account to another (user merging).

@hashar: yeah, essentially the problem is that migration didn't catch the duplicates but now gerrit has changed to enforcing the email uniqueness so we're stuck. I don't think it's going to be easy to fix, there are a lot of dupes and no tools available to repair the situation.

thcipriani lowered the priority of this task from High to Medium.Aug 29 2018, 4:06 PM
thcipriani subscribed.

@mmodell found a way for gerrit admins to be able to edit, although not ideal it does lower the priority of this task.

thcipriani changed the task status from Open to Stalled.Jan 22 2021, 9:25 PM