Investigate why there is a mismatch between six names and certain email address in mediawiki-identities data
Closed, Resolved
Public

Description

<tl;dr> Update on 2017-03-31: This is due to corrupted data in Wikimedia Git. There is no issue in Grimoire code. We will not fix this. See T123643#3143480 for more info.


Probably another artefact related to T119755:

There are six IDs all with the very same email address of ttijhof, but with names of other people. They all have "scm" as their source:

d3af6e4d2efc077a6efa26aa0b3c9511504d18a8
e891edbbfc7a9f3f7b2027cfb9de44e35d16cfb9
136155e528cf630d61a8352ab1a80ae2ddbc7a03
3f893381f154b7a05b4c1271cdc413c3291e15f0
4db496181f424e564fb692aadaab97fdbec793f9
a8f40dc4f3f4f6f7e15ce6c604e3f96ede41976d

As @Lcanasdiaz wrote via email,

Those people were grouped into a single identity a few weeks ago. Yes, something strange happened in the past with those accounts, and the result of our matching process seems a bit confusing. Besides that, Alvaro has just told me we had issues with those identities two years ago, before the sorting hat age! So, we'll have a deep look to clearly identify the reason.

Aklapper created this task. Jan 14 2016, 6:21 PM
Aklapper updated the task description.
Aklapper raised the priority of this task from to Normal.
Aklapper added subscribers: Aklapper, Lcanasdiaz.
Dicortazar set Security to None. Jan 22 2016, 10:44 AM
Aklapper lowered the priority of this task from Normal to Low. Feb 26 2016, 11:06 AM
Aklapper added a project: DevRel-March-2016.
Aklapper added a subscriber: Dicortazar.
Qgil added a subscriber: Qgil.

I am tentatively removing this task from the Developer-Advocacy sprints. If you commit to work on it, please bring it back to the corresponding quarter. Thank you.

Aklapper lowered the priority of this task from Low to Lowest. Nov 23 2016, 3:03 PM
Aklapper removed Dicortazar as the assignee of this task. Jan 30 2017, 8:33 PM
Aklapper raised the priority of this task from Lowest to Low. Mar 3 2017, 4:02 PM

In b856c99b5dbb02f0ccfaf48f32d5789069f2d155 I changed the names of those six profiles (not: identities) to "$ABC $DEF with the email address of $XYZ".
In db86767b3eead724732d206dc507d96583d2e66f I deleted that email address from the blacklist.

After T157898 is fixed I hope that searching in the "Discover" section for author_uuid will help us investigate/debug and fix. (While continuing to hide the inconsistency does not. :P )

Aklapper renamed this task from "Mismatch between six names and certain email address in mediawiki-identities data" to "Investigate why there is a mismatch between six names and certain email address in mediawiki-identities data". Mar 30 2017, 1:09 PM
Aklapper claimed this task.
Aklapper moved this task from Backlog to March on the Developer-Advocacy (Jan-Mar-2017) board.
Aklapper moved this task from Ready to Go to Doing on the Analytics-Tech-community-metrics board.
Aklapper closed this task as Resolved.

<tl;dr> This is wrong / corrupted data in Wikimedia Git. There is no issue in Grimoire code.
We now know that this affects fewer than 100 commits (negligible), in one repository, all made in Sep-Oct 2012 (ancient).
Neither finding out how to rewrite Git history nor implementing workarounds (= blacklisting) in Grimoire feels justified.
Hence declining.


Longer version:

136155e528cf630d61a8352ab1a80ae2ddbc7a03: Two commits in the mediawiki-config repo, on 2012-10-04 and 2012-10-24:

$:acko\> git remote show origin
* remote origin
  Fetch URL: ssh://aklapper@gerrit.wikimedia.org:29418/operations/mediawiki-config.git
$:acko\> git show b3a9d943c8084116ab7c49603419fa3b54919f32
commit b3a9d943c8084116ab7c49603419fa3b54919f32
Author: Leslie Carr <ttijhof@...>
Date:   Wed Oct 24 17:20:12 2012 +0000

    41355
    
    Change-Id: Ifb6f545ca9ffad988aa057f8db7bbc34b82ca728
$:acko\> git show 9ccdd75cef19b0fa205ab5c1e8685c1c24505119
commit 9ccdd75cef19b0fa205ab5c1e8685c1c24505119
Author: Leslie Carr <ttijhof@...>
Date:   Thu Oct 4 21:24:39 2012 +0000

    fixing account creation (grrr UTC time !)
    
    Change-Id: Ibfb33023474d7bc4f06af58d04c7c4fffaaac49f

d3af6e4d2efc077a6efa26aa0b3c9511504d18a8: One commit in the mediawiki-config repo, on 2012-10-29:

$:acko\> git show beb4da02c3abbbbf03b03284f77ba5ba72ebd1c9
commit beb4da02c3abbbbf03b03284f77ba5ba72ebd1c9
Author: Mark Bergsma <ttijhof@...>
Date:   Mon Oct 29 16:30:04 2012 +0000

    Add new bits servers
    
    Change-Id: Ife2fffaddccad9fde727f6d8c2ee20a2ea6e06d1

e891edbbfc7a9f3f7b2027cfb9de44e35d16cfb9: Same mediawiki-config repo, one commit on 2012-10-29.
3f893381f154b7a05b4c1271cdc413c3291e15f0: Same mediawiki-config repo, one commit on 2012-10-09.
4db496181f424e564fb692aadaab97fdbec793f9: Same mediawiki-config repo, 86 commits between 2012-09-27 and 2012-10-31.
a8f40dc4f3f4f6f7e15ce6c604e3f96ede41976d: Same mediawiki-config repo, 2 commits on 2012-10-09 and 2012-10-30.

I'm going to keep those specific identities (name vs other email address) whitelisted in the database and won't revert my commit, as the problem looks negligible in terms of the number of affected commits (<100), affected repositories (1), and time (2012).
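
For anyone who wants to repeat the check on another repository, here is a minimal sketch (not part of Grimoire) that groups commits by author email and reports emails that appear with more than one author name. The function names and the tab-separated `--format=%an%x09%ae` layout are my own assumptions for illustration, not something used in the investigation above.

```python
import subprocess
from collections import defaultdict


def names_per_email(log_lines):
    """Map each author email to a dict of {author name: commit count}.

    Each input line is expected to be "<author name>\t<author email>".
    """
    counts = defaultdict(lambda: defaultdict(int))
    for line in log_lines:
        line = line.rstrip("\n")
        # Split on the last tab so a tab inside a name does not break parsing.
        name, sep, email = line.rpartition("\t")
        if not sep:
            continue  # skip empty or malformed lines
        counts[email.lower()][name] += 1
    return {email: dict(names) for email, names in counts.items()}


def conflicting_emails(log_lines):
    """Return only the emails that occur with more than one author name."""
    return {
        email: names
        for email, names in names_per_email(log_lines).items()
        if len(names) > 1
    }


def read_git_log(repo_path):
    """Read "name<TAB>email" lines from a local clone (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an%x09%ae"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Calling `conflicting_emails(read_git_log("/path/to/mediawiki-config"))` on a local clone should list every email used with several names, together with the commit count per name. An email that one person uses with different spellings of their own name is flagged as well, so the output still needs a manual look.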

Aklapper updated the task description. Mar 30 2017, 1:11 PM
Qgil awarded a token. Apr 5 2017, 10:03 AM