Page MenuHomePhabricator

Use SprintFactDaemon to record and aggregate Maniphest Transactions related to points and status changes.
Closed, DeclinedPublic4 Estimated Story Points

Description

There is a functional problem with Sprint calculations in that when a task is closed, the points for that day are accumulated without a transaction record. So, if the same task is reopened the following day, the points cannot be deducted from the Sprint points closed for the previous day. The current behavior is that when a task is reopened the points are readded to the points remaining, in addition to the original points from when it was assigned to the Sprint. This yields an incorrect points remaining.

The solution is to build a real stats table that records the transactions related to task points and status changes using the Facts application.

Event Timeline

Christopher claimed this task.
Christopher raised the priority of this task from to Needs Triage.
Christopher updated the task description. (Show Details)
Christopher changed Security from none to None.
Christopher subscribed.
Christopher renamed this task from Create custom field transaction to record "points closed" to Create custom field transaction to record "points status".Dec 10 2014, 12:37 PM

I have made progress with using the Facts application to store Task transaction data. The current implementation of the PhabricatorFactCountEngine does not look at Transactions.

There are several steps with this. First, a new Fact Daemon called SprintFactDaemon is launched with phd.

The SprintFactDaemon retrieves a Maniphest Transaction object (identified with the getFactObjectsForAnalysis() method) from the Sprint Application.

The daemon then runs a LiskDAO query (i.e. loadAllWhere('transactionType = %s', 'core:customfield');) using the transaction object to retrieve typed transactions. The daemon iterates over each transaction group and then sends each transaction to the computeRawFactsForObject() method in the engine for analysis.

There is a new engine PhabricatorFactSprintEngine that gets the Task PHID from the transaction object id. Then, if the transaction type is a "core:customfield" type, it records the taskPHID, the oldvalue, the newvalue and the date in the table. If it is a type "status", it checks the old and new value and assigns "1" to open and "0" to closed. It ignores all other status types for now.

The daemon compiles all of the raw facts and then updates them in the database. So, now from this data, accurate point aggregates can be assembled and task status can be properly identified.

The engine should be able to be localized and contained within Sprint. The facts daemon can be started with phd, though it is not run by default.

Also, I do not think it will be necessary to create a new transaction type, so I am going to rename this task.

Christopher renamed this task from Create custom field transaction to record "points status" to Use SprintFactsDaemon to record and aggregate Maniphest Transactions related to points and status changes..Dec 15 2014, 2:44 PM

Change 179916 had a related patch set uploaded (by Christopher Johnson (WMDE)):
adds SprintFactDaemon and PhabricatorFactSprintEngine

https://gerrit.wikimedia.org/r/179916

Patch-For-Review

Christopher renamed this task from Use SprintFactsDaemon to record and aggregate Maniphest Transactions related to points and status changes. to Use SprintFactDaemon to record and aggregate Maniphest Transactions related to points and status changes..Dec 15 2014, 2:55 PM
Christopher updated the task description. (Show Details)

Is it really necessary to have a dedicated daemon for this? Would it be possible to trigger the SprintFactDaemon functionality from the existing phabricator worker queue or from a regular cron job? IMO, we should try to avoid the extra complexity of managing one more daemon with phd if there is a simpler way to do it ...

twentyafterfour:
is the fact app a dead-end? We've got a team building a scrum app for phabricator
and they decided to build part of it on top of the fact app.
Is it a horribly bad idea to depend on facts app in any way?

epriestley: the app is not a dead end but is definitely nowhere near stable enough to build on

The daemon could be launched from a cron job. It does not have to be a continuous background process. Using a cron launched daemon is a rather clean method to aggregate stats. It is not really "simple", but using redis or some async mq is probably more complex.

Facts is definitely immature, but its basic workflow functionality is only what is required for Sprint at the moment. The main benefit is that we can use the table schema. The local implementation is really thin in that it basically calls LiskDAO directly without depending on many Facts methods.

There are two main issues that are being addressed with your comment. Scalability and Stability.

Scale -> Complexity: If running a periodic task daemon that runs heavy queries (inserting thousands of rows at a time, perhaps) will stress the production installation, your concern is noted. However, if the instance is stressed now, when difffusion and differential are implemented, this problem will become even more real. I do not know exactly how big the production design will scale, but it seems to be a bit slow right now. Maybe established profiling numbers should be published so that this can be analyzed (from a programmatic viewpoint) more closely?

Stability: There is *bad* code in the "stable" version of Phabricator now. For example, the TaskEditController for example has one function that is 744 lines long and Scrutinizer gave "F"s to most of the Workboard classes. I would not consider this code to be immutable, yet it has many functional dependencies. Sprint is still experimental and a just an extension, so if upstream changes something with Facts, Sprint will just have to be modified accordingly.

The existing incremental calculation logic of Sprint cannot accommodate all of the weird exceptions that may occur with Tasks, and Facts is a solution to this problem. Using Facts to record transactions is something that could be useful for evaluating and reporting aggregates, not only for Sprint but for all of Phabricator. The Maniphest reports are all you have right now, and the code for these is way too complex. If you look at loadRecentlyClosedTasks() in ManiphestReportController (line 667), the comment says "// TODO: Gross. This table is not meant to be queried like this. Build real stats tables." Facts is the way forward, but as epriestley has said, reporting is a "low priority", and may take awhile to implement.

So, it is a question of time. I suggest that we test the modified prototype daemon to see what solutions it provides and also what problems it may cause. If it doesn't work, then the alternative is to wait patiently for upstream to make something better.

Change 179916 merged by Christopher Johnson (WMDE):
adds SprintFactDaemon and PhabricatorFactSprintEngine

https://gerrit.wikimedia.org/r/179916

@Christopher:Thank you for thoroughly addressing the issues I raised. This does help ease my concerns quite a bit.

I don't explicitly have a problem with extending the facts app, and if it fits your needs then that's great. My concern about upstream changes disrupting your work remains though. Would you be opposed to forking the facts application code and the db schema so that upstream changes won't matter? Just a suggestion for something to think about.

@chasemp: what do you think of all this, given the last response by @Christopher?

I no longer see the need for implementing Facts and agree with @mmodell about waiting for upstream to deliver something more robust.

The main issues that would have been addressed with this can be fixed with the following behavior assumptions.

  1. The current modified behavior of Sprint is to not allow closed points to be deducted if a task is reopened. This solves the previous problem of an incorrect points remaining when a task is reopened. Basically, if a task is closed more than once, it accumulates points closed for every time it is closed. Points remaining then is always consistent, because it is computed as the difference between points total and points closed for that day (plus points reopened and points added).
  1. Do not add resolved tasks to a Sprint. This is an upstream problem because task status has no exception hooks. If there were, it would be a question of not allowing this to happen.
  1. Scope reductions are not known to Phabricator projects so they cannot be known in Sprint. Upstream should implement a new project transaction types to record task actions in a project. The current method is "task centric" in that a project is added to a task as a task transaction. The project has no reciprocal transaction to record a task being added to it.

I am closing this issue and all blocking tasks based on this.