
iOS Milestones missing from Phlogiston
Closed, Resolved | Public | 8 Estimated Story Points

Description

Example: https://phabricator.wikimedia.org/T127391

This is tagged with Milestone, has children, and is in the recat file, but doesn't appear in the charts.

Event Timeline

JAufrecht triaged this task as Unbreak Now! priority. Feb 19 2016, 6:38 PM
JAufrecht added a subscriber: MBinder_WMF.

Preliminary finding: the Friday morning dump didn't run, so none of this data is present in the Phlogiston data load because it was created after the Thursday morning dump.

In this case, the milestone appears in Forecasts, but has no data despite children. The labels do not appear in Project Burnup or "Complete forecast dates".

Researching: the child tasks are present in the data (in task_history) but not associated with the milestone. The milestone task is missing from task_history:

phab=# select count(*) from task_history where id = '127391';
 count 
-------
     0
(1 row)

However, the milestone task is present in the loaded data:

phab=# select count(*) from maniphest_task where id = '127391';
 count 
-------
     1
(1 row)

The edge data shows that it belongs to only one project, and that the same row is duplicated:

  task  | project | edge_date  
--------+---------+------------
 127391 |    1656 | 2016-02-19
 127391 |    1656 | 2016-02-19
 127391 |    1656 | 2016-02-19
 127391 |    1656 | 2016-02-19
 127391 |    1656 | 2016-02-19
 127391 |    1656 | 2016-02-19
 127391 |    1656 | 2016-02-19

Project 1656 is Milestone.

So, next step is to dig into the routine that figures out the edges.

While reconstructing the data, Phlogiston builds a list of which tasks are relevant. This is derived from edge data in maniphest_transaction, which appears accurate:

phab=# select active_projects from maniphest_transaction where date_modified <= '2016-02-19' and task_id = 127391 and has_edge_data is true;
 active_projects 
-----------------
 {782}  
 {942}  
 {1656}
(3 rows)

Those three are Wikipedia-iOS-App-Product-Backlog, Epic, and Milestone, respectively. Phlogiston then reconstructs that information, day by day, in maniphest_edge. But, all it ends up reconstructing is that this task belongs to Milestone:

phab=# select * from maniphest_edge where task = '127391';
  task  | project | edge_date  
--------+---------+------------
 127391 |    1656 | 2016-02-23
 127391 |    1656 | 2016-02-22
 127391 |    1656 | 2016-02-21
 127391 |    1656 | 2016-02-20
 127391 |    1656 | 2016-02-19

However, it only does that for the single most recent project (the LIMIT 1 clause below). I don't know why; that seems like it would be wrong for most tasks, so if it really does that, I'm not sure why more isn't broken.

FOR taskrow IN SELECT id
                 FROM maniphest_task
                ORDER BY id
LOOP
    FOR projrow IN SELECT active_projects
                     FROM maniphest_transaction
                    WHERE date_modified <= run_date
                      AND task_id = taskrow.id
                      AND has_edge_data IS TRUE
                 ORDER BY date_modified DESC
                    LIMIT 1
    LOOP
        FOREACH project_id IN ARRAY projrow.active_projects & project_id_list
        LOOP
            INSERT INTO maniphest_edge
                 VALUES (taskrow.id, project_id, run_date);
        END LOOP;
    END LOOP;
END LOOP;

That's causing the problem, but I can't fix it until I figure out what that was intended to accomplish.
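
One possible reading of the loop, sketched in Python rather than PL/pgSQL: if each active_projects value is meant to hold the task's full project set as of that transaction, then taking only the newest row is what LIMIT 1 would be for. This is an assumption about intent, not something confirmed here. The function name, the in-memory transaction list and the project_id_list value are hypothetical; the three transactions are the first three rows that maniphest_transaction holds for task 126555, as shown later in this task. The sketch compares by calendar day, like the later queries that use date(mt.date_modified).

```python
from datetime import date, datetime


def edges_for_day(transactions, task_id, run_date, project_id_list):
    """Return (task, project, run_date) rows for one task on one day.

    Mirrors the inner loop: newest transaction with edge data on or
    before run_date, intersected with the projects being reported on.
    """
    candidates = [
        t for t in transactions
        if t["task_id"] == task_id
        and t["has_edge_data"]
        and t["date_modified"].date() <= run_date
    ]
    if not candidates:
        return []
    # Equivalent of ORDER BY date_modified DESC LIMIT 1
    latest = max(candidates, key=lambda t: t["date_modified"])
    wanted = set(latest["active_projects"]) & set(project_id_list)
    return [(task_id, project, run_date) for project in sorted(wanted)]


transactions = [
    {"task_id": 126555, "has_edge_data": True,
     "date_modified": datetime(2016, 2, 10, 23, 18, 16),
     "active_projects": [782]},
    {"task_id": 126555, "has_edge_data": True,
     "date_modified": datetime(2016, 2, 10, 23, 34, 14),
     "active_projects": [1724, 782]},
    {"task_id": 126555, "has_edge_data": True,
     "date_modified": datetime(2016, 2, 11, 18, 18, 17),
     "active_projects": [1724, 782, 1546]},
]

# Hypothetical list of projects the report cares about.
project_id_list = [782, 1546, 1656]

rows = edges_for_day(transactions, 126555, date(2016, 2, 28), project_id_list)
```

Under that reading, the newest row alone yields every edge for the day, so LIMIT 1 is only wrong if a transaction's active_projects is incomplete.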

Followup:
127391, of course, should not be in the Namespace milestone because it _is_ the Namespace milestone (T125910).

However, it has four children and they should be showing up in the reports, but only two or three are.

T124816: Don't show "Because you read..." items for disambiguation pages
T119010: [5.0.0.503] no-main namespace articles place empty 'Read more' in footer
T119007: [5.0.0.503] no-main space articles not handled properly from Home page
T126555: 3D-peeking at a non-mainspace (ns 0) article on the Explore tab shows preview of the top article on Explore tab

After removing the "limit 1" clause, two of the four show up with category=Namespace. 119010 is present for some days, but disappears after Feb 29; its project membership changes March 1, so that may be a separate bug or non-bug issue. 119007 is never present.

Debugging further, starting with earliest suspect table: maniphest_edge.

select * from maniphest_edge where task = 127391 order by project, edge_date;

Shows that 127391 isn't a milestone after Feb 23. Investigating this next.

Fresh data generation produces this:

phab=# select * from maniphest_edge where task = 127391 order by edge_date, project;
  task  | project | edge_date  
--------+---------+------------
 127391 |    1656 | 2016-02-19
 127391 |    1656 | 2016-02-20
 127391 |    1656 | 2016-02-21
 127391 |    1656 | 2016-02-22
 127391 |    1546 | 2016-02-23
 127391 |    1546 | 2016-02-24
 127391 |    1546 | 2016-02-25
 127391 |    1546 | 2016-02-26
 127391 |    1546 | 2016-02-27
 127391 |    1546 | 2016-02-28
 127391 |    1546 | 2016-02-29
 127391 |    1546 | 2016-03-01
 127391 |    1546 | 2016-03-02
 127391 |    1546 | 2016-03-03
 127391 |    1546 | 2016-03-04
(15 rows)

This suggests that only one edge per task is still being added, even when there should be more.

After reconstructing maniphest_edge without the LIMIT 1 clause (and with a check-for-existing clause to eliminate duplicate rows), maniphest_edge now shows the correct relationships. Data below is not quite correct because it combines queries from different test runs with different special debug configurations.

phab=# select edge_date, count(*) from maniphest_edge where task = 127391 group by edge_date order by edge_date;
 edge_date  | count 
------------+-------
 2016-02-19 |     2
 2016-02-20 |     2
 2016-02-21 |     2
 2016-02-22 |     2
 2016-02-23 |     3
 2016-02-24 |     3
 2016-02-25 |     3
 2016-02-26 |     3
 2016-02-27 |     3
 2016-02-28 |     3
 2016-02-29 |     3
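
The check-for-existing step mentioned above can be pictured roughly like this. It is only a sketch in Python, not the PL/pgSQL that was actually changed; the function name and the in-memory list standing in for maniphest_edge are hypothetical.

```python
def insert_edge_once(edge_table, task, project, edge_date):
    """Append the edge only if an identical row is not already present."""
    row = (task, project, edge_date)
    if row in edge_table:
        return False  # duplicate, skipped
    edge_table.append(row)
    return True


edge_table = []
# The same edge arriving seven times, like the duplicated rows seen earlier.
for _ in range(7):
    insert_edge_once(edge_table, 127391, 1656, "2016-02-19")
```

With the check in place, repeated inserts of the same (task, project, edge_date) leave a single row.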

But the number of tasks present in task_history is incorrect.
Several of the child tasks disappear from task_history:

phab=# select id, date from task_history where id = 126555;
   id   |        date         
--------+---------------------
[...]
 126555 | 2016-02-26 00:00:00
 126555 | 2016-02-27 00:00:00
 126555 | 2016-02-28 00:00:00
 126555 | 2016-02-29 00:00:00
(4 rows)

This data should extend to March 7 or 8, but ends March 2. That's not exactly the sa

phab=# select id, date from task_history where id = 126555 order by date;
   id   |        date         
--------+---------------------
[...]
 126555 | 2016-02-27 00:00:00
 126555 | 2016-02-28 00:00:00
 126555 | 2016-02-29 00:00:00
 126555 | 2016-03-01 00:00:00
 126555 | 2016-03-02 00:00:00

Examining one disappearing task, the issue seems to be the list of projects it belongs to:

phab=# SELECT mt.active_projects
phab-#           FROM maniphest_transaction mt
phab-#          WHERE date(mt.date_modified) <= '2016-03-02'
phab-#            AND mt.task_id = 126555
phab-#            AND mt.has_edge_data IS TRUE
phab-#          ORDER BY date_modified DESC
phab-#          LIMIT 1;
 active_projects 
-----------------
 {1735}
(1 row)

If we remove the LIMIT:

phab=# SELECT mt.active_projects
          FROM maniphest_transaction mt
         WHERE date(mt.date_modified) <= '2016-03-02'
           AND mt.task_id = 126555
           AND mt.has_edge_data IS TRUE
         ORDER BY date_modified DESC
         ;
 active_projects 
-----------------
 {1735}
 {1724,782,1546}
 {1724,782}
 {782}
(4 rows)

So, after the transaction on 2016-02-29, Phlogiston is now getting incorrect project membership data from the transaction history.

Fresh recap with clean data (starting Feb 26):

  1. On 2016-03-02 (for example), task 127391 in Phlogiston should have five children in the iOS report, because it meets the Phlogiston rules for that: it has five "child" (blocking) tasks, it is tagged as a Milestone, and it and all of its child tasks are in projects that are specified in ios_source.py. But on the burnup chart it has only two to four children:

[Image: ios_tranche6_burnup_count.png]

This data comes from task_history:

phab=# select date, count(*) from task_history_recat where category = 'Namespace' group by date order by date;

        date         | count 
---------------------+-------
 2016-02-26 00:00:00 |     4
 2016-02-27 00:00:00 |     4
 2016-02-28 00:00:00 |     4
 2016-02-29 00:00:00 |     3
 2016-03-01 00:00:00 |     2
 2016-03-02 00:00:00 |     2
 2016-03-03 00:00:00 |     2
 2016-03-04 00:00:00 |     3
 2016-03-05 00:00:00 |     3
 2016-03-06 00:00:00 |     3
 2016-03-07 00:00:00 |     2
 2016-03-08 00:00:00 |     2
 2016-03-09 00:00:00 |     2
 2016-03-10 00:00:00 |     2
(14 rows)

However, the milestone relationship is correct and complete:

phab=# select date, count(*) from task_milestone where milestone_id = 127391 group by date order by date;
        date         | count 
---------------------+-------
 2016-02-26 00:00:00 |     5
 2016-02-27 00:00:00 |     5
 2016-02-28 00:00:00 |     5
 2016-02-29 00:00:00 |     5
 2016-03-01 00:00:00 |     5
 2016-03-02 00:00:00 |     5
 2016-03-03 00:00:00 |     5
 2016-03-04 00:00:00 |     5
 2016-03-05 00:00:00 |     5
 2016-03-06 00:00:00 |     5
 2016-03-07 00:00:00 |     5
 2016-03-08 00:00:00 |     5
 2016-03-09 00:00:00 |     5
 2016-03-10 00:00:00 |     5
(14 rows)

One of the discrepant tasks is 126555, which disappears after Feb 29:

phab=# select id, date from task_history where id = 126555;
   id   |        date         
--------+---------------------
 126555 | 2016-02-26 00:00:00
 126555 | 2016-02-27 00:00:00
 126555 | 2016-02-28 00:00:00
 126555 | 2016-02-29 00:00:00
(4 rows)

The reason it disappears is that its project memberships (aka edges) are changed on March 1, and while the project memberships should keep it in the data, Phlogiston mis-parses the transaction data and incorrectly includes it in only one project. At the end of March 1, according to the UI, it should be in

Wikipedia-iOS-App-Product-Backlog
User-Josve05a
iOS-app-v5.0.1-Kiwi

But in Phlogiston, starting March 1, it belongs to only one project.

phab=# select date_modified, active_projects from maniphest_transaction where task_id = 126555 and has_edge_data is true order by date_modified;
     date_modified      | active_projects 
------------------------+-----------------
 2016-02-10 23:18:16+00 | {782}
 2016-02-10 23:34:14+00 | {1724,782}
 2016-02-11 18:18:17+00 | {1724,782,1546}
 2016-03-01 21:50:53+00 | {1735}

The raw Phabricator transaction data shows three projects, but this is apparently mis-parsed to one:

{"PHID-PROJ-pvhu7u7ahy32wuyadzui":
  {"src":"PHID-TASK-qnqhgkbijgcefgiu4snf",
    "type":"41",
    "dst":"PHID-PROJ-pvhu7u7ahy32wuyadzui",
    "dateCreated":"1455147254",
    "seq":"0",
    "dataID":null,
    "data":[]},
 "PHID-PROJ-ojylfmfiogxrvnxoiwlp":
   {"src":"PHID-TASK-qnqhgkbijgcefgiu4snf",
    "type":"41",
    "dst":"PHID-PROJ-ojylfmfiogxrvnxoiwlp",
    "dateCreated":"1455146296",
    "seq":"0",
    "dataID":null,
    "data":[]},
 "PHID-PROJ-kuvr3fgqaux3rqrxv4i6":
   {"dst":"PHID-PROJ-kuvr3fgqaux3rqrxv4i6",
    "type":41,
    "data":[]}}

Spotted the problem.

In a previous transaction, which is correctly processed, the type is a number:

2016-02-11 18:18:17+00 | {"PHID-PROJ-pvhu7u7ahy32wuyadzui":{"dst":"PHID-PROJ-pvhu7u7ahy32wuyadzui","type":41,"data":[]},
                          "PHID-PROJ-ojylfmfiogxrvnxoiwlp":{"dst":"PHID-PROJ-ojylfmfiogxrvnxoiwlp","type":41,"data":[]},
                          "PHID-PROJ-4hicuoywzukirrr5jg43":{"dst":"PHID-PROJ-4hicuoywzukirrr5jg43","type":41,"data":[]}}

But in the mis-parsed transaction, the type is a string in the first two projects and a number in the last one.

Code change (to force "41" to an int) has run on dev, and the problem appears fixed:

http://phlogiston-dev.wmflabs.org/ios.html
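
To illustrate the failure mode, here is a minimal Python sketch. It is not the actual Phlogiston code change; the function name, the coerce flag and the constant name are hypothetical. The JSON is the raw transaction value quoted above. Comparing the type field to the integer 41 without coercion matches only the last edge, while forcing it to int matches all three.

```python
import json

EDGE_TYPE_PROJECT = 41  # edge type seen on the project edges quoted above

RAW_EDGES = """
{"PHID-PROJ-pvhu7u7ahy32wuyadzui":
  {"src":"PHID-TASK-qnqhgkbijgcefgiu4snf", "type":"41",
   "dst":"PHID-PROJ-pvhu7u7ahy32wuyadzui", "dateCreated":"1455147254",
   "seq":"0", "dataID":null, "data":[]},
 "PHID-PROJ-ojylfmfiogxrvnxoiwlp":
  {"src":"PHID-TASK-qnqhgkbijgcefgiu4snf", "type":"41",
   "dst":"PHID-PROJ-ojylfmfiogxrvnxoiwlp", "dateCreated":"1455146296",
   "seq":"0", "dataID":null, "data":[]},
 "PHID-PROJ-kuvr3fgqaux3rqrxv4i6":
  {"dst":"PHID-PROJ-kuvr3fgqaux3rqrxv4i6", "type":41, "data":[]}}
"""


def project_phids(raw_json, coerce=True):
    """Return the project PHIDs whose edge type is 41.

    With coerce=False the comparison is strict, so "41" (a string)
    does not equal 41 and those edges are silently dropped.
    """
    phids = []
    for phid, edge in json.loads(raw_json).items():
        edge_type = edge["type"]
        if coerce:
            edge_type = int(edge_type)  # accept both "41" and 41
        if edge_type == EDGE_TYPE_PROJECT:
            phids.append(phid)
    return phids


strict = project_phids(RAW_EDGES, coerce=False)
fixed = project_phids(RAW_EDGES)
```

Here strict contains only PHID-PROJ-kuvr3fgqaux3rqrxv4i6, which is consistent with the single project {1735} recorded for the 2016-03-01 transaction, and fixed contains all three PHIDs.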

JAufrecht set the point value for this task to 8. Mar 10 2016, 6:50 PM