
Maniphest burnup report incorrectly counts "All Time" opened and closed tasks (due to lack of transactions from Bugzilla/RT times)
Open, Lowest, Public

Description

Maniphest burnup report incorrectly counts "All Time" opened and closed tasks. It seems like it considers closed tasks migrated from Bugzilla to be still open.

For example, for MediaWiki-Page-editing, https://phabricator.wikimedia.org/maniphest/report/burn/?project=PHID-PROJ-hd4zy7ho6fpqtllay7jm reports 1,422 opened and 66 closed tasks, while there are in fact 310 open and 1,103 closed tasks. (And why these numbers don't match each other is yet another mystery.)

It seems that tasks migrated from Bugzilla that were already closed there do not have any "event" marking that they were closed. For example, T116178 has an entry "TheDJ closed this task as Resolved.", while T2015 has nothing like this. I think these should be backfilled (probably using @bzimport as the actor).
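To illustrate the mechanism (this is a toy sketch, not Phabricator code): a burnup computed purely from status transactions will count a Bugzilla-migrated, already-closed task as opened but never as closed, because no closing transaction exists for it. All task IDs and values below are just examples.

```python
# Toy model: count all-time opens/closes from status transactions only,
# the way the Maniphest burnup report does.
def burnup_counts(transactions):
    """Return (opened, closed) totals from a list of status transactions."""
    opened = sum(1 for t in transactions if t["newValue"] == "open")
    closed = sum(1 for t in transactions if t["newValue"] == "resolved")
    return opened, closed

# T116178-style task: closed after the migration, so both events exist.
native = [
    {"task": "T116178", "newValue": "open"},
    {"task": "T116178", "newValue": "resolved"},
]
# T2015-style task: closed in Bugzilla before the migration; only the
# synthetic "created" event exists, with no closing transaction to count.
migrated = [
    {"task": "T2015", "newValue": "open"},
]

print(burnup_counts(native + migrated))  # (2, 1): 2 opened, only 1 closed
```

Scaled up, this is exactly the shape of the discrepancy: "opened" inflated, "closed" deflated, and the two totals no longer reconcile with the live open/closed counts.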

Event Timeline

matmarex raised the priority of this task from to Needs Triage.
matmarex updated the task description.
matmarex added a project: Phabricator.
matmarex added subscribers: matmarex, bzimport.
Aklapper triaged this task as Lowest priority. Jan 20 2016, 1:07 PM
Aklapper changed the task status from Open to Stalled. Jan 20 2016, 1:07 PM

This doesn't seem to have been fixed by the subtask. :(

Aklapper changed the task status from Stalled to Open. May 22 2024, 8:59 PM

Hmm, right.
https://phabricator.wikimedia.org/source/phabricator/browse/wmf%252Fstable/src/applications/maniphest/controller/ManiphestReportController.php$74-413 is the code for the Burnup Rate graph. It queries the transaction table, which has no entries from Bugzilla times, i.e. from before 2014-11-21.

closedEpoch, which I backfilled today in T107254, was introduced to the codebase on 2018-02-08, and is used only for "Closed After" and "Closed Before" on https://phabricator.wikimedia.org/maniphest/query/advanced/

Aklapper renamed this task from Maniphest burnup report incorrectly counts "All Time" opened and closed tasks to Maniphest burnup report incorrectly counts "All Time" opened and closed tasks (due to lack of transactions from Bugzilla/RT times). May 22 2024, 9:08 PM
Aklapper removed a subscriber: bzimport.

I played with the code a bit and initially wondered whether it could be made more performant by relying on dateCreated and closedEpoch values instead of on our ever-growing maniphest_transaction table. However, I had to realize that this would make detecting task reopens (and their dates) in the chart impossible.
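A toy illustration (not Phabricator code) of why the two columns are not enough: a task that was closed, reopened, and closed again stores only its creation date and its final close date, so the intermediate reopen, and the dip it should cause in the chart, is unrecoverable. All epochs below are made up.

```python
# Full transaction history of one task, as (epoch, newStatus) pairs.
full_log = [
    (100, "open"),      # created
    (200, "resolved"),  # first close
    (300, "open"),      # reopened
    (400, "resolved"),  # final close
]

# What the task table alone retains about the same task:
date_created, closed_epoch = 100, 400

# The transaction log lets us count reopens (status back to "open"
# after creation); the two-column summary collapses the history into
# "opened once at 100, closed once at 400" and loses the reopen entirely.
reopens_from_log = sum(1 for _, status in full_log[1:] if status == "open")
print(reopens_from_log)  # 1
```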

Thus in the long term, I expect the upstream report/chart code to break anyway due to timeouts, given the current database structure.

I believe it should be possible to add custom downstream code to work around the missing BZ and RT transactions, but of course that will make things even less performant:

diff --git a/src/applications/maniphest/controller/ManiphestReportController.php b/src/applications/maniphest/controller/ManiphestReportController.php
index 012d6d136b..fc72ae7f05 100644
--- a/src/applications/maniphest/controller/ManiphestReportController.php
+++ b/src/applications/maniphest/controller/ManiphestReportController.php
@@ -174,6 +174,56 @@ final class ManiphestReportController extends ManiphestController {
 
     // Merge the synthetic rows into the real transactions.
     $data = array_merge($create_rows, $data);
+
+    /* WMF T119376 BEGIN - handle tasks imported from BZ + RT w/o transactions */
+    if ($project_phid) {
+      $wmf_joins = qsprintf(
+        $conn,
+        'JOIN %T p ON p.src = t.phid AND p.type = %d AND p.dst = %s',
+        PhabricatorEdgeConfig::TABLE_NAME_EDGE,
+        PhabricatorProjectObjectHasProjectEdgeType::EDGECONST,
+        $project_phid);
+    } else {
+      $wmf_joins = qsprintf($conn, '');
+    }
+
+    $wmf_legacy_rows_open = queryfx_all(
+      $conn,
+      'SELECT t.dateCreated
+        FROM %T t %Q
+        WHERE ((t.id > 2000 AND t.id < 75683)
+          OR (t.id > 78842 AND t.id < 84828))',
+      id(new ManiphestTask())->getTableName(),
+      $wmf_joins);
+    foreach ($wmf_legacy_rows_open as $key => $wmf_legacy_row_open) {
+      $wmf_legacy_rows_open[$key] = array(
+        'transactionType' => 'status',
+        'oldValue' => null,
+        'newValue' => $default_status, // open
+        'dateCreated' => $wmf_legacy_row_open['dateCreated'],
+      );
+    }
+    $wmf_legacy_rows_closed = queryfx_all(
+      $conn,
+      'SELECT t.closedEpoch
+        FROM %T t %Q
+        WHERE ((t.id > 2000 AND t.id < 75683)
+          OR (t.id > 78842 AND t.id < 84828))
+          AND (t.closedEpoch = 1418860800 OR t.closedEpoch = 1416614400)',
+      id(new ManiphestTask())->getTableName(),
+      $wmf_joins);
+    foreach ($wmf_legacy_rows_closed as $key => $wmf_legacy_row_closed) {
+      $wmf_legacy_rows_closed[$key] = array(
+        'transactionType' => 'status',
+        'oldValue' => $default_status, // open
+        'newValue' => 'resolved',
+        'dateCreated' => $wmf_legacy_row_closed['closedEpoch'],
+      );
+    }
+    $data = array_merge($wmf_legacy_rows_open, $data);
+    $data = array_merge($wmf_legacy_rows_closed, $data);
+    /* WMF T119376 END - handle tasks imported from BZ + RT w/o transactions */
+
     $data = array_values($data);
     $data = isort($data, 'dateCreated');

Uhm. I think that having these transactions is doable.

The first sub-task I see is creating a mapping between Bugzilla usernames and Phabricator ones.

I'm quite sure that somebody, somewhere, has this map. If not in a repo, then at least on their computer.

Anyway... plan B: I could maybe contribute to this map by scraping >=] our Bugzilla dump again and matching at least the task authors (skipping the Bugzilla bot, where it is set). Then, crowdsourcing/guessing the missing ones, or assigning the ghosts to the Bugzilla import bot.
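The plan above could be sketched roughly like this (a hypothetical sketch only; the function, the fallback name, and all the sample data are made up, and the description already suggests @bzimport as the actor for unmatched accounts):

```python
# Fallback actor for Bugzilla accounts with no known Phabricator user,
# per the suggestion in the task description.
BZIMPORT = "bzimport"

def build_actor_map(bugzilla_reporters, phab_users_by_email):
    """Map each Bugzilla reporter email to a Phabricator username,
    falling back to the import bot for unmatched ("ghost") accounts."""
    return {
        email: phab_users_by_email.get(email, BZIMPORT)
        for email in bugzilla_reporters
    }

# Example inputs: emails scraped from a Bugzilla dump, and whatever
# partial email->username map can be collected or crowdsourced.
reporters = ["alice@example.org", "ghost@example.org"]
known = {"alice@example.org": "alice"}

print(build_actor_map(reporters, known))
# {'alice@example.org': 'alice', 'ghost@example.org': 'bzimport'}
```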

> The first sub-task I see, is creating a mapping between Bugzilla usernames and Phabricator ones.

This task has nothing to do with people; it is only about tickets' statuses and dates. Not people.

I'm not sure this is worth the effort; I doubt that anyone needs burnup reports covering 10 years ago now. It probably wasn't even worth it in 2015 when I filed this.

If anyone wants to fix it, I'd try changing the code to use the dates we already imported (which would probably be more efficient too), rather than trying to import more data.