Provide Blocked By relationships in public dump
Closed, ResolvedPublic

Description

Phlogiston can interpret either Project or Projectcolumn membership as "category" in reporting. Some teams are using parent task (or rather, the "Blocked by" relationship in Phabricator in place of "parent") instead. Can Phlogiston use this information?

  • Identify level of effort and security concerns to get parent task information in dump
  • Add transactional core:edge data to dump
  • Add as-is core:edge data to dump

Example from iOS, including grandchildren: https://phabricator.wikimedia.org/T104510

Event Timeline

JAufrecht raised the priority of this task from to Needs Triage.
JAufrecht updated the task description.
JAufrecht added a project: Phlogiston.
JAufrecht added subscribers: JAufrecht, DStrine.

Chase, is this already in the dump, or something that could be added to the dump? I looked through 'edge' and 'info' of a sample task with children (on both the parent and child sides) and didn't see anything.

JAufrecht set Security to None.
JAufrecht moved this task from To Be Triaged to Backlog on the Phlogiston board.
chasemp added a subscriber: chasemp.

I don't know if this is included. Probably not, as we whitelist transactions and I think this is just a type.

But Specify how to differentiate parent task from regular blocked-by task

I don't think there is a difference at heart. It's one abstraction (where the child just copies settings on creation).

I don't have time at the moment, but if you can investigate and outline how this is stored (as a transaction, or something else?), I imagine I can get it included. There shouldn't be any real risk. The only edge case is hidden/private things as children etc., which I think I can work around.

This is included in edge. When I go to task 100, click "Edit 'Blocked By' Tasks" and set task 200 to make task 100 blocked by task 200, I get this:

MariaDB [phabricator_maniphest]> select * from edge where dateCreated > 1445340000;
+--------------------------------+------+--------------------------------+-------------+-----+--------+
| src                            | type | dst                            | dateCreated | seq | dataID |
+--------------------------------+------+--------------------------------+-------------+-----+--------+
| PHID-TASK-2hmlxisfesql6nmx6rq5 |    3 | PHID-TASK-6acczr47mr5ye6reaam6 |  1445342809 |   0 |   NULL |
| PHID-TASK-6acczr47mr5ye6reaam6 |    4 | PHID-TASK-2hmlxisfesql6nmx6rq5 |  1445342809 |   0 |   NULL |
+--------------------------------+------+--------------------------------+-------------+-----+--------+
2 rows in set (0.25 sec)

MariaDB [phabricator_maniphest]> select id,phid from maniphest_task where id = 200 or id = 100;
+-----+--------------------------------+
| id  | phid                           |
+-----+--------------------------------+
| 100 | PHID-TASK-2hmlxisfesql6nmx6rq5 |
| 200 | PHID-TASK-6acczr47mr5ye6reaam6 |
+-----+--------------------------------+
2 rows in set (0.06 sec)

When this relation between the two tasks is removed, the entry in edge is removed as well, and the only remaining record of it is in maniphest_transaction.
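
As a rough illustration of the two rows shown in the edge table above, the sketch below collapses a type 3 row and its inverse type 4 row into a single "blocked by" pair. The meaning of the type codes (3 = src is blocked by dst, 4 = src blocks dst) is inferred from this example only, and the function name and row shape are hypothetical.

```python
# Edge type codes as inferred from the example rows in this thread
# (an assumption, not taken from Phabricator documentation).
BLOCKED_BY = 3  # src is blocked by dst
BLOCKS = 4      # src blocks dst (the inverse row)


def blocked_by_pairs(edge_rows):
    """Return a set of (blocked_phid, blocker_phid) tuples.

    edge_rows is a hypothetical list of dicts with the same column
    names as the edge table: src, type, dst.
    """
    pairs = set()
    for row in edge_rows:
        edge_type = int(row["type"])  # type may arrive as "3" or 3
        if edge_type == BLOCKED_BY:
            pairs.add((row["src"], row["dst"]))
        elif edge_type == BLOCKS:
            pairs.add((row["dst"], row["src"]))
    return pairs


# The two rows from the query above (task 100 blocked by task 200).
rows = [
    {"src": "PHID-TASK-2hmlxisfesql6nmx6rq5", "type": 3,
     "dst": "PHID-TASK-6acczr47mr5ye6reaam6"},
    {"src": "PHID-TASK-6acczr47mr5ye6reaam6", "type": 4,
     "dst": "PHID-TASK-2hmlxisfesql6nmx6rq5"},
]
print(blocked_by_pairs(rows))
```

Both rows describe the same relationship, so the result contains one pair.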

Yes, I now have several teams wanting features that depend on this data, so it would be good to add edge transactions. I'm pretty sure it's not in the dump:

Test case: T117102: REVIEW strategy possibilities you didn't work on has as a parent T113053: [EPIC] TPG selects and documents a strategy

phab=# select * from maniphest_task where id=117102;
   id   |              phid              |                      title                       | story_points 
--------+--------------------------------+--------------------------------------------------+--------------
 117102 | PHID-TASK-bmuloioy4qwwltscekmw | REVIEW strategy possibilities you didn't work on | 
(1 row)

phab=# select * from maniphest_task where id=113053;
   id   |              phid              |           title            | story_points 
--------+--------------------------------+----------------------------+--------------
 113053 | PHID-TASK-paz3m3pz66cpbv6ezmkj | EPIC: TPG Strategy Process | 


>>> str(data['task']['117102']).find('paz3m3')
-1
>>> str(data['task']['113053']).find('bmu')
-1
>>> str(data['task']['113053']).find('117102')
-1
>>> str(data['task']['117102']).find('113053')
-1

JAufrecht raised the priority of this task from Low to High. Edited Nov 21 2015, 10:40 PM

Here are the transactions visible in the dump. core:edge is not the same thing as edge, I guess.

>>> data['task']['117102']['transactions'].keys()
dict_keys(['projectcolumn', 'status', 'priority', 'reassign', 'core:edge'])
>>>

chasemp lowered the priority of this task from High to Low. Nov 23 2015, 3:47 PM

@chasemp what is necessary to get this implemented? The priority of this has increased on our end due to forecasting needs of a few of the teams we support.

Change 254912 had a related patch set uploaded (by Rush):
Current task edge relationship data

https://gerrit.wikimedia.org/r/254912

Change 254912 merged by Rush:
Current task edge relationship data

https://gerrit.wikimedia.org/r/254912

I was already looking at this, actually, but it would be beneficial to discuss what the priority of Phlogiston is in general. My understanding was that its initial purpose was supplementary, experimental reporting. Has it now transitioned into a needed service? I'm happy to support it on a best-effort basis, but whoever the owner is may want to coordinate with Release-Engineering-Team to get it some official recognition (especially in relation to upgrades and maintenance support).

Note that the information asked for is actually already there as part of core:edge, but it is a bit tricky.

See an example from phab-01

             id: 816
           phid: PHID-XACT-TASK-6to72u2p2zs37uo
     authorPHID: PHID-USER-7t36l5d3llsm5abqfx3u
     objectPHID: PHID-TASK-cgxqoojz2d3yyvow7rx5
     viewPolicy: public
     editPolicy: PHID-USER-7t36l5d3llsm5abqfx3u
    commentPHID: NULL
 commentVersion: 0
transactionType: core:edge
       oldValue: {"PHID-TASK-pulvzzhstt6zrecwfchr":{"src":"PHID-TASK-cgxqoojz2d3yyvow7rx5","type":"3","dst":"PHID-TASK-pulvzzhstt6zrecwfchr","dateCreated":"1442803041","seq":"0","dataID":null,"data":[]}}
       newValue: {"PHID-TASK-pulvzzhstt6zrecwfchr":{"dst":"PHID-TASK-pulvzzhstt6zrecwfchr","type":3,"data":[]},"PHID-TASK-3wubyro6yvyeon2j6lav":{"dst":"PHID-TASK-3wubyro6yvyeon2j6lav","type":3,"data":[]}}
  contentSource: {"source":"web","params":{"ip":"10.68.21.68"}}
       metadata: {"edge:type":3}
    dateCreated: 1448306092
   dateModified: 1448306092
*************************** 9. row ***************************
             id: 818
           phid: PHID-XACT-TASK-anxbe4ehbcujhz5
     authorPHID: PHID-USER-7t36l5d3llsm5abqfx3u
     objectPHID: PHID-TASK-cgxqoojz2d3yyvow7rx5
     viewPolicy: public
     editPolicy: PHID-USER-7t36l5d3llsm5abqfx3u
    commentPHID: NULL
 commentVersion: 0
transactionType: core:edge
       oldValue: {"PHID-TASK-3wubyro6yvyeon2j6lav":{"src":"PHID-TASK-cgxqoojz2d3yyvow7rx5","type":"3","dst":"PHID-TASK-3wubyro6yvyeon2j6lav","dateCreated":"1448306092","seq":"0","dataID":null,"data":[]},"PHID-TASK-pulvzzhstt6zrecwfchr":{"src":"PHID-TASK-cgxqoojz2d3yyvow7rx5","type":"3","dst":"PHID-TASK-pulvzzhstt6zrecwfchr","dateCreated":"1442803041","seq":"0","dataID":null,"data":[]}}
       newValue: {"PHID-TASK-pulvzzhstt6zrecwfchr":{"dst":"PHID-TASK-pulvzzhstt6zrecwfchr","type":3,"data":[]}}
  contentSource: {"source":"web","params":{"ip":"10.68.21.68"}}
       metadata: {"edge:type":3}
    dateCreated: 1448306171
   dateModified: 1448306171
9 rows in set (0.01 sec)

These are transactions that reflect this relationship.
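
A minimal sketch of how one such core:edge transaction could be read: compare the keys of oldValue and newValue to see which destination PHIDs were added or removed. The function name is hypothetical, and the handling of empty values (None or an empty JSON list) is an assumption about how they might appear, not something shown in this thread.

```python
import json


def edge_changes(old_value, new_value):
    """Return (added, removed) destination PHIDs for one core:edge transaction.

    Both arguments are the JSON strings stored in oldValue / newValue.
    """
    old = json.loads(old_value) if old_value else {}
    new = json.loads(new_value) if new_value else {}
    # set() over a dict yields its keys; over an empty list it yields nothing.
    old_keys, new_keys = set(old), set(new)
    return sorted(new_keys - old_keys), sorted(old_keys - new_keys)


# Values copied from transaction id 816 above.
old_816 = '{"PHID-TASK-pulvzzhstt6zrecwfchr":{"src":"PHID-TASK-cgxqoojz2d3yyvow7rx5","type":"3","dst":"PHID-TASK-pulvzzhstt6zrecwfchr","dateCreated":"1442803041","seq":"0","dataID":null,"data":[]}}'
new_816 = '{"PHID-TASK-pulvzzhstt6zrecwfchr":{"dst":"PHID-TASK-pulvzzhstt6zrecwfchr","type":3,"data":[]},"PHID-TASK-3wubyro6yvyeon2j6lav":{"dst":"PHID-TASK-3wubyro6yvyeon2j6lav","type":3,"data":[]}}'

added, removed = edge_changes(old_816, new_816)
print("added:", added)
print("removed:", removed)
```

Read this way, transaction 816 adds PHID-TASK-3wubyro6yvyeon2j6lav, and transaction 818 (whose old and new values are the reverse) removes it again.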

I have added the actual point-in-time edge state now as well:

https://gerrit.wikimedia.org/r/#/c/254912/

Here's the change in today's dump, compared to before. The relationships are present in the dump:

>>> str(data['task']['117102']).find('paz3m3')
4656
>>> str(data['task']['113053']).find('bmu')
3168
>>> str(data['task']['113053']).find('117102')
-1
>>> str(data['task']['117102']).find('113053')
-1
>>>

Digging deeper, I can see the relationship in the point-in-time edge data:

>>> str(data['task']['113053']['edge']).find('bmu')
144

But it doesn't appear to be in the transactional data:

>>> str(data['task']['113053']['transactions']).find('bmu')
-1
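
The same kind of check can be run over every top-level section of a task record at once. This is only a sketch: the helper name is hypothetical, and the sample record below is made-up placeholder data that mimics the 'edge' and 'transactions' keys seen in the dump, not real dump content.

```python
def sections_mentioning(task_record, needle):
    """Return the top-level keys of a task record whose value mentions needle.

    Uses the same str(...).find style of search as the REPL sessions here,
    so it is a crude substring match rather than a structural lookup.
    """
    return sorted(key for key, value in task_record.items()
                  if needle in str(value))


# Placeholder record (hypothetical shape and values, for illustration only).
sample_record = {
    "info": ["EPIC: TPG Strategy Process"],
    "edge": [["PHID-TASK-paz3m3pz66cpbv6ezmkj", 3,
              "PHID-TASK-bmuloioy4qwwltscekmw"]],
    "transactions": {"status": [], "core:edge": []},
}

print(sections_mentioning(sample_record, "bmu"))
```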

I tried to work through the example above, but could not find that data in the dump by ID or PHID or objectPHID. Which Phabricator tasks are they?

So, with today's dump I have point-in-time edge data, but either the transactional edge data isn't there or I don't understand how to find it. Should I re-open, or make a new ticket?

I don't understand the significance of the historical parent-task transaction data in this context. The point-in-time edge relationships should be sufficient. I imagined the transaction logs as being less ideal, since they require the user to replay transactions to reach an end state that is now available as-is.

But either way, the example above says "See an example from phab-01".

Taking the example task https://phab-01.wmflabs.org/T113 on phab-01.eqiad.wmflabs

Running a new dump:

root@phab-01:/srv/phab/tools# time python public_task_dump.py
real 0m0.419s

Grepping for the relationships in the transaction log shown here:

cat /srv/dumps/phabricator_public.dump | python -m json.tool | grep -A 12 PHID-XACT-TASK-6to72u2p2zs37uo
                        "PHID-XACT-TASK-6to72u2p2zs37uo",
                        "PHID-USER-7t36l5d3llsm5abqfx3u",
                        "PHID-TASK-cgxqoojz2d3yyvow7rx5",
                        null,
                        0,
                        "core:edge",
                        "{\"PHID-TASK-pulvzzhstt6zrecwfchr\":{\"src\":\"PHID-TASK-cgxqoojz2d3yyvow7rx5\",\"type\":\"3\",\"dst\":\"PHID-TASK-pulvzzhstt6zrecwfchr\",\"dateCreated\":\"1442803041\",\"seq\":\"0\",\"dataID\":null,\"data\":[]}}",
                        "{\"PHID-TASK-pulvzzhstt6zrecwfchr\":{\"dst\":\"PHID-TASK-pulvzzhstt6zrecwfchr\",\"type\":3,\"data\":[]},\"PHID-TASK-3wubyro6yvyeon2j6lav\":{\"dst\":\"PHID-TASK-3wubyro6yvyeon2j6lav\",\"type\":3,\"data\":[]}}",
                        "{\"edge:type\":3}",
                        1448306092,
                        1448306092
                    ],
                    [
cat /srv/dumps/phabricator_public.dump | python -m json.tool | grep -A 12 PHID-XACT-TASK-anxbe4ehbcujhz5
                        "PHID-XACT-TASK-anxbe4ehbcujhz5",
                        "PHID-USER-7t36l5d3llsm5abqfx3u",
                        "PHID-TASK-cgxqoojz2d3yyvow7rx5",
                        null,
                        0,
                        "core:edge",
                        "{\"PHID-TASK-3wubyro6yvyeon2j6lav\":{\"src\":\"PHID-TASK-cgxqoojz2d3yyvow7rx5\",\"type\":\"3\",\"dst\":\"PHID-TASK-3wubyro6yvyeon2j6lav\",\"dateCreated\":\"1448306092\",\"seq\":\"0\",\"dataID\":null,\"data\":[]},\"PHID-TASK-pulvzzhstt6zrecwfchr\":{\"src\":\"PHID-TASK-cgxqoojz2d3yyvow7rx5\",\"type\":\"3\",\"dst\":\"PHID-TASK-pulvzzhstt6zrecwfchr\",\"dateCreated\":\"1442803041\",\"seq\":\"0\",\"dataID\":null,\"data\":[]}}",
                        "{\"PHID-TASK-pulvzzhstt6zrecwfchr\":{\"dst\":\"PHID-TASK-pulvzzhstt6zrecwfchr\",\"type\":3,\"data\":[]}}",
                        "{\"edge:type\":3}",
                        1448306171,
                        1448306171
                    ]
                ],
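
In the grep output above, each transaction appears as a positional list rather than a keyed record. The sketch below names the eleven fields visible there by matching them against the column names in the SQL output earlier in this thread. That mapping is an assumption, as is the function name. Because grep only shows what follows the matched line, the sketch locates the PHID-XACT element first rather than assuming it is at index 0.

```python
import json

# Field names guessed by lining the grep output up against the SQL columns.
FIELDS = ("phid", "authorPHID", "objectPHID", "commentPHID", "commentVersion",
          "transactionType", "oldValue", "newValue", "metadata",
          "dateCreated", "dateModified")


def parse_transaction_row(row):
    """Turn one positional transaction row into a dict with decoded JSON."""
    # Find where the transaction PHID sits; raises StopIteration if absent.
    start = next(i for i, value in enumerate(row)
                 if isinstance(value, str) and value.startswith("PHID-XACT-"))
    record = dict(zip(FIELDS, row[start:start + len(FIELDS)]))
    for key in ("oldValue", "newValue", "metadata"):
        if isinstance(record.get(key), str):
            record[key] = json.loads(record[key])
    return record


# Row copied from the first grep result above.
row = [
    "PHID-XACT-TASK-6to72u2p2zs37uo",
    "PHID-USER-7t36l5d3llsm5abqfx3u",
    "PHID-TASK-cgxqoojz2d3yyvow7rx5",
    None,
    0,
    "core:edge",
    '{"PHID-TASK-pulvzzhstt6zrecwfchr":{"src":"PHID-TASK-cgxqoojz2d3yyvow7rx5","type":"3","dst":"PHID-TASK-pulvzzhstt6zrecwfchr","dateCreated":"1442803041","seq":"0","dataID":null,"data":[]}}',
    '{"PHID-TASK-pulvzzhstt6zrecwfchr":{"dst":"PHID-TASK-pulvzzhstt6zrecwfchr","type":3,"data":[]},"PHID-TASK-3wubyro6yvyeon2j6lav":{"dst":"PHID-TASK-3wubyro6yvyeon2j6lav","type":3,"data":[]}}',
    '{"edge:type":3}',
    1448306092,
    1448306092,
]

record = parse_transaction_row(row)
print(record["transactionType"], record["metadata"], sorted(record["newValue"]))
```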

Phlogiston does historical reporting and trends, so it already replays the transactions to rebuild the complete state-in-time history. That makes the historical data more useful than the current-state data. In my comment above, I picked a random relationship in the live data and could not find it represented in the dump within core:edge (as determined by searching for the PHID), so I will try the same technique on your phab-01 example and see if I can figure out what I'm not understanding.

Notes primarily to myself in the vein of an H.P. Lovecraft protagonist recording the events that drove them to madness for the hypothetical benefit of future adventurers:

Was able to replicate Chase's results on phab-01: T113's relationship with T114 is present in a core:edge transaction for 113. Note that it doesn't appear to be in a core:edge transaction for 114, presumably because the relationship was created from 113, so it's a transaction only for 113.

Went back to the real data to re-investigate. I used three tasks as my sample data:

Eve   117924                 [EPIC] Run Strategy Tests                PHID-TASK-i2ouskz76mbmb4bdwhcu
          blocked by 
Cain      113053                 EPIC: TPG Strategy Process               PHID-TASK-paz3m3pz66cpbv6ezmkj
              blocked by          
Enoch         117102                 REVIEW strategy possibilities ...        PHID-TASK-bmuloioy4qwwltscekmw

Cain doesn't have Enoch's (T117102) PHID in any of Cain's core:edge transactions, nor does Enoch have Cain's PHID in any of Enoch's core:edge transactions. Enoch doesn't have any core:edge transactions of type 3 or 4. Enoch also doesn't have any mention of Blocked or Blocked by in the transaction log on the web page. This is where I got stuck in my previous investigation noted above.

But Eve (T117924) does have a core:edge transaction that includes Cain's (T113053) PHID. In fact, Eve has two, the second of which is for Abel (T117104) and contains Cain's PHID, I think either as the old value or for sort order or both. These both have an edge:type of 3, which I guess is "blocked by". And Cain has Eve's PHID in a core:edge transaction of edge:type 4, which must be "blocks".

My hypothesis is that if the subtask is created via the "Create Subtask" action, the blocked-by relationship is created as part of task creation and is not a transaction, whereas if the subtask relationship is created with the "Edit Blocked By" action, it is a transaction. I created some fresh tasks on phab-01 to test this, and it appears to be true.

So, a proper historical reconstruction would require:

  1. finding all current edge relationships for each task in task.N.edge instead of task.N.transactions.core:edge.
  2. checking to see if any of them is associated with a transaction record in task.N.transactions.core:edge
  3. if not, treating the creation date of the task as the transaction date of the relationship.
  4. merging that list with the list of transactions from task.N.transactions.core:edge
  5. handling deleted relationships (which Phlogiston already does for regular core:edge, so maybe that could be recycled)

So the new state-in-time data Chase added is necessary to have a complete list of blocked by relationships, either historical or at-present. I guess I'll use it as at-present data until there's a pressing need to have better reconstruction.
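
The five reconstruction steps listed above could be sketched roughly as follows. This is not Phlogiston's code: the function name, the input shapes (a task creation timestamp, a collection of current "blocked by" PHIDs, and a list of (timestamp, old PHIDs, new PHIDs) tuples already extracted from core:edge transactions) and the placeholder PHIDs are all assumptions made for illustration.

```python
def reconstruct_edge_history(task_created, current_edges, edge_transactions):
    """Return a sorted list of (timestamp, action, dst_phid) events.

    task_created      -- creation timestamp of the task
    current_edges     -- destination PHIDs in the point-in-time edge data
    edge_transactions -- iterable of (timestamp, old_phids, new_phids)
    """
    events = []
    added_by_transaction = set()

    # Steps 4 and 5: turn each transaction into add/remove events.
    for timestamp, old, new in sorted(edge_transactions, key=lambda t: t[0]):
        old, new = set(old), set(new)
        for dst in sorted(new - old):
            events.append((timestamp, "add", dst))
            added_by_transaction.add(dst)
        for dst in sorted(old - new):
            events.append((timestamp, "remove", dst))

    # Steps 1 to 3: a current edge with no matching "add" transaction is
    # assumed to date from task creation (e.g. made via "Create Subtask").
    for dst in sorted(current_edges):
        if dst not in added_by_transaction:
            events.append((task_created, "add", dst))

    return sorted(events)


# Placeholder PHIDs and timestamps, for illustration only.
history = reconstruct_edge_history(
    task_created=100,
    current_edges={"PHID-TASK-a", "PHID-TASK-b"},
    edge_transactions=[
        (200, ["PHID-TASK-a"], ["PHID-TASK-a", "PHID-TASK-c"]),
        (300, ["PHID-TASK-a", "PHID-TASK-c"], ["PHID-TASK-a"]),
        (400, ["PHID-TASK-a"], ["PHID-TASK-a", "PHID-TASK-b"]),
    ],
)
for event in history:
    print(event)
```

One gap in this sketch: a relationship that was created along with the task and later removed would show up only as a "remove" event, since it is no longer in the current edge data.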

JAufrecht renamed this task from Use parent tasks as categories to Provide Blocked By relationships in public dump. Dec 3 2015, 6:39 AM