
Explain inconsistent numbers between number widgets and list widgets (some accounts counted twice)
Closed, Resolved (Public)

Description

Steps:

  1. Go to https://wikimedia.biterg.io/goto/9c7f4ebdac33ec67030ffcc97fe598b5 (Gerrit Overview; all of Sep 2017)
  2. In the gerrit_main_numbers widget at the top, see "218 # Changeset Submitters" displayed
  3. In the gerrit_top_developers widget at the bottom, click "Export: Formatted"
  4. Open the resulting CSV file in LibreOffice

Expected outcome:
218 entries
Actual outcome:
220 entries
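
The comparison in the steps above can also be done without a spreadsheet. A minimal sketch, assuming the export has a single header row and one row per submitter; the function name, column names, and sample data are made up for illustration:

```python
import csv
import io


def count_csv_entries(csv_text: str) -> int:
    """Count data rows in a CSV export, skipping the header and blank rows."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row (assumed to be present)
    return sum(1 for row in reader if any(cell.strip() for cell in row))


# Hypothetical export contents; the real column names and values will differ.
sample_export = '"name","changesets"\n"Alice",12\n"Bob",7\n"Carol",3\n'
entries = count_csv_entries(sample_export)
print(entries)
```

The resulting count is what gets compared against the number shown in the gerrit_main_numbers widget.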

Upstreamed as https://gitlab.com/Bitergia/c/Wikimedia/support/issues/7

Testcase:
On https://wikimedia.biterg.io/goto/ead16eaea45dd8ec20746803fa66bc9b I get "59 new authors".
Entering author_name:"Jgleeson" as the search scope, I get "2 new authors", though only one author is displayed in the C_Gerrit_Demo_Table widget.

Note that the org pie displays both Independent and Wikimedia Foundation; maybe that is related.

Event Timeline

Aklapper triaged this task as Medium priority. Jan 11 2018, 4:44 PM
Aklapper created this task.
Aklapper updated the task description.
Aklapper renamed this task from Number of changeset submitters in "gerrit_main_numbers" widget differs from number of submitters in "gerrit_top_developers" widget to One account (in "gerrit_top_developers" widget) counted as two accounts (in "gerrit_main_numbers" widget). Jan 15 2018, 1:28 PM
Aklapper updated the task description.

Another example: the top of https://wikimedia.biterg.io/goto/5c35a039b8e89180d3b5cfd2311d4de4 says "58 new authors", but exporting the list of names in the "New Authors" widget at the bottom as CSV yields only 46 entries.

This has improved, but one of the four cases still shows inconsistent data.

This task is not assigned to anyone. Is it committed to this quarter?

Aklapper added a subscriber: Lcanasdiaz.

Reflecting https://gitlab.com/Bitergia/c/Wikimedia/support/issues/7 and assigning to @Lcanasdiaz as per his comment "I guess we are missing an unique count there. We're working on this."

Qgil raised the priority of this task from Medium to High. Feb 27 2018, 12:41 PM
Qgil moved this task from Backlog to Ready to Go on the Developer-Advocacy (Jan-Mar-2018) board.
Aklapper renamed this task from One account (in "gerrit_top_developers" widget) counted as two accounts (in "gerrit_main_numbers" widget) to Inconsistent numbers between number widgets and list widgets; some accounts counted twice. Apr 9 2018, 10:09 AM

Other example:

  1. Go to the Gerrit dashboard at https://wikimedia.biterg.io/app/kibana#/dashboard/Gerrit with the timeframe set to 2018-01-01 through 2018-03-31 (short link: https://wikimedia.biterg.io/goto/99b01d3abfa3d3188eb449b0904aa19d)
  2. See "330 # Changeset Submitters" in the top box
  3. In the search text box, replace * with NOT name:PortalsBuilder

Expected outcome: See "329 # Changeset Submitters" in top box

Actual outcome: See "328 # Changeset Submitters" in top box

238482n375 lowered the priority of this task from High to Lowest.
238482n375 moved this task from Next Up to In Code Review on the Analytics-Kanban board.
238482n375 edited subscribers, added: 238482n375; removed: Aklapper.

Aklapper raised the priority of this task from Lowest to High.

Some of the testcases are fixed, but the 218 vs 220 discrepancy remains.

Aklapper lowered the priority of this task from High to Medium. Jul 20 2018, 8:55 AM

This hopefully should get fixed with the Kibiter 6 upgrade in Sep/Oct 2018.

No, the problem still exists for the testcase in https://gitlab.com/Bitergia/c/Wikimedia/support/issues/7#note_98002739: go to https://wikimedia.biterg.io/app/kibana#/dashboard/Gerrit, set the timeframe to the last 1 year, and query for author_name:"Ryan Kaldari". You get "1 Changeset Submitters", but the "Submitters" widget lists two entries: "Kaldari" and "Valerie".

I currently cannot retest the "218 vs 220" issue from this very task due to https://gitlab.com/Bitergia/c/Wikimedia/support/issues/43 (the number of listed names must first be increased from 100 to something bigger).

Aklapper renamed this task from Inconsistent numbers between number widgets and list widgets; some accounts counted twice to Explain inconsistent numbers between number widgets and list widgets (some accounts counted twice). Dec 14 2018, 8:00 PM
Aklapper claimed this task.

Looking at the "Ryan Kaldari" case again: it shows up when searching for author_name:"Ryan Kaldari" AND name:"Valerie" in Bitergia's Gerrit DB, and that is actually correct, as the data in Gerrit shows the same. See for example https://gerrit.wikimedia.org/r/#/q/I7a4a37f3d83b475ff61566a7e31516ea07935e7f

The "cannot retest" part for the "218 vs 220" issue is solved, so under "Visualize" I went to gerrit_authors_changesets, just to realize that the right visualization is gerrit_top_developers instead (gosh, I already wrote the name of the visualization in this task's description). Then I increased the Size under Buckets to get more than 100 names listed in that widget.

Based on the above example I isolated a testcase in https://wikimedia.biterg.io/goto/dd710519f76d650b848843e7e9ddeb7c on the "Gerrit" dashboard by setting the timeframe from Oct 11 2018, 18:49:20 to Oct 11 2018, 18:53:59 and filtering for repository:"operations/mediawiki-config".
"# Changeset Submitters" shows 1, while the Submitters list shows 2.

"Visualize" (on the left) shows that the # Changeset Submitters metric value in the gerrit_main_numbers visualization is based on the author_uuid field.
"Visualize" also states for gerrit_top_developers that it is Linked to Saved Search “SCR Reviews scr gerrit_enrich”. Under "Management > Saved Objects", one can run that search and end up in "Discover", where there is no author_uuid column but a name column with 2 different names. I manually added the author_uuid column, and it shows the same single author_uuid for both names:

[Screenshot bt.png: "Discover" view with the author_uuid column added]
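
In Elasticsearch terms, the two visualizations presumably issue different aggregations. A rough sketch of the request bodies, assuming the number widget is a Kibana "Unique Count" metric (a cardinality aggregation) and the list is a terms bucket; the aggregation labels and the size value are hypothetical, and only the field names author_uuid and name come from the inspection above:

```python
import json

# Assumed shape of the number widget's query: count distinct author_uuid values.
unique_submitters_query = {
    "size": 0,  # no hits needed, only the aggregation result
    "aggs": {
        "submitters": {"cardinality": {"field": "author_uuid"}},
    },
}

# Assumed shape of the list widget's query: one bucket per distinct name.
# "size" stands in for the Buckets > Size setting (hypothetical value).
names_list_query = {
    "size": 0,
    "aggs": {
        "top_names": {"terms": {"field": "name", "size": 500}},
    },
}

print(json.dumps(unique_submitters_query, indent=2))
print(json.dumps(names_list_query, indent=2))
```

The two bodies aggregate on different fields, so nothing forces their results to agree.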

Mystery solved, I'd say:
The gerrit_main_numbers widget at the top bases its number on the author_uuid field.
The gerrit_top_developers widget at the bottom bases its number on the name field.
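
A small sketch of why the two widgets can disagree, using made-up documents in which one author_uuid appears under two names (the uuid and the names are hypothetical, not real data from the index):

```python
from collections import Counter

# Hypothetical enriched documents: one identity recorded under two names.
docs = [
    {"author_uuid": "uuid-0001", "name": "Name A"},
    {"author_uuid": "uuid-0001", "name": "Name B"},
    {"author_uuid": "uuid-0001", "name": "Name A"},
]

# What a number widget based on author_uuid would report.
unique_submitters = len({doc["author_uuid"] for doc in docs})

# What a list widget grouped by name would show: one row per name.
rows_by_name = Counter(doc["name"] for doc in docs)

print(unique_submitters)  # 1
print(len(rows_by_name))  # 2
```

With these sample documents the number widget would show 1 submitter while the list would show 2 rows, which matches the pattern seen in the isolated testcase.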

I've documented this in a new "Behavior which might surprise you" section in https://www.mediawiki.org/w/index.php?title=Community_metrics&type=revision&diff=3008572&oldid=3006446