Closed tickets in Bugzilla migrated without closing event?
Open, LowestPublic

Description

We found in the Grimoire stats for Maniphest that the number of closed tickets per month up to 2014 was 0, which obviously is wrong, since Bugzilla tickets were migrated to Maniphest.

The problem seems to be due to the migration process. Apparently, tickets closed in Bugzilla were migrated to Maniphest as closed tickets, but with no closing event. And the trouble is that for calculating closed tickets per month, and the backlogs in the past, in Grimoire we use closing events, which have a date.

See for example these two, closed but without closing event:

Compared to these two (look for "closed this task as Resolved"):

I wonder if the missing closing events could be migrated to Maniphest. If that is not possible, we will look for some workaround in Grimoire, but the stats for before 2014 will probably render void.

Bugzilla had about 7500 duplicates, with this distribution over time:

jgbarah created this task.Jul 29 2015, 7:14 AM
jgbarah updated the task description. (Show Details)
jgbarah raised the priority of this task from to Needs Triage.
jgbarah moved this task to Backlog on the Analytics-Tech-community-metrics board.
jgbarah added a subscriber: jgbarah.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 29 2015, 7:14 AM
jgbarah updated the task description. (Show Details)Jul 29 2015, 7:26 AM
jgbarah set Security to None.
Qgil triaged this task as Low priority.Jul 29 2015, 7:29 AM
Qgil added a subscriber: Qgil.
Qgil added a subscriber: chasemp.Jul 29 2015, 7:32 AM

I hadn't notice that there is no closing event in our migrated tickets. I don't think we are going to touch that now, so I guess we will have to live with separate Bugzilla metrics for the period before November 2014.

This is not a huge deal, because the move to Phabricator also meant the move of tickets handled in Trello / Mingle / RT plus a wave of new activities, so in any case it doesn't make sense to map directly Bugzilla and Phabricator, other than to see the jump in the volume of activity.

So probably this is a task we will have to decline.

PS: CCing @chasemp, who knows the details about the Bugzilla migration just in case he has anything to add.

OK, waiting for @chasemp comments, if any.

I'm confused on what is wanted here. Is the idea that tasks from bugzilla should have a kanban workboard style card movement transaction to close them out?

Qgil added a comment.EditedJul 29 2015, 8:29 PM

No, this is about the blue lines flat between Jan 2002 and Nov 2014 at http://korma.wmflabs.org/browser/maniphest.html, because closed Bugzilla reports migrated to Maniphest don't state when they were closed, just that they were closed.

This is probably the main reason I still frequent the old bugzilla data.

In fact, the main reason would be simplifying (and making it more accurate) the retrieving of historical stats of almost any kind. For example, if at some point you're interested in studying the historical evolution of time-to-close, those closing records will be needed.

The solution would be to include a closing event (which could replicate the structure of Maniphest closing events) with the date of closing. That data is obviously in the Bugzilla database. But if it were convenient, we could probably provide as well a listing based on the information we have in the Bicho database compsed with Bugzilla records.

For sake of completeness, it could be interesting migrating not only the closing events, but all events (comments, status changes, etc.). .But maybe that's out of scope...

Aklapper updated the task description. (Show Details)Jul 30 2015, 8:57 AM

I've corrected the task desc to say "closed as resolved" (=task status change) instead of "moved to done" (=workboard column change) to avoid confusion.

Importing closures would be very welcome. While at it, it would be good to fix the status of RESOLVED DUPLICATE reports, which were incorrectly migrated to "Resolved" IIRC.

Among other issues, we currently have no way to know what the status of a report was before being reopened: reopening a report effectively destroys its history. https://phabricator.wikimedia.org/T51024#1465675

[off-topic]

While at it, it would be good to fix the status of RESOLVED DUPLICATE reports

Very different topic - if you feel strongly about it please create a separate task. In short: We considered that in November 2014 but setting "merged" in Maniphest tasks required creating edges between the master task and the duplicate task. Would have made the conversion logic more complicated for no strong-enough reason (excluding tasks with "duplicate" status from search results did not and does not seem important enough, so I'd decline such a request).

Among other issues, we currently have no way to know what the status of a report was before being reopened [...] https://phabricator.wikimedia.org/T51024#1465675

Not sure how often tasks closed before Nov 2014 get reopened and how often it's important to find out the previous status. Using https://static-bugzilla.wikimedia.org/show_bug.cgi?id=49024 seems to be the most simple way and sufficient enough for my needs.

Guys, any news about the details of the Bugzilla migration? Is it going to be possible to have this "closing" event in the migrated tickets?

I don't think anybody currently plans to add those John Doe closed this task as "Resolved". events to the resolved (etc.) tickets imported from Bugzilla.

Qgil added a comment.Aug 10 2015, 4:38 PM

I agree, I don't think any further work will be done to fix this at this point. I think we should decline this task and move forward with what we have. If this means that we will have one graph for Bugzilla and then one for Phabricator, then so be it.

I mean in theory we could do this I think as we have the data:

select header from bugzilla_meta where id=2002\G;

header: {"cf_hugglebeta": "---", "classification": "Unclassified", "creator": "hanzomon05@aol.com", "cc": [], "depends_on": [], "cf_platform": "---", "creation_time": "1114677600.0", "is_open": false, "keywords": [], "cf_browser": "---", "summary": "Spontaneous User Generated Bug", "id": 2002, "severity": "normal", "is_confirmed": true, "is_creator_accessible": true, "priority": "Normal", "platform": "PC", "version": "unspecified", "status": "RESOLVED", "product": "Wikimedia", "blocks": [], "qa_contact": "", "see_also": [], "component": "General/Unknown", "groups": [], "target_milestone": "---", "is_cc_accessible": true, "url": "", "whiteboard": "", "last_change_time": "1114793072.0", "flags": [], "assigned_to": "wikibugs-l@lists.wikimedia.org", "update_token": "1416530927-4w274KLqMmhGvbgRnBMyVxpjFVpPwdaqk5jD3WmRmIc", "resolution": "INVALID", "op_sys": "Windows XP"}

last_change and status are basically this, but it would take some work to do safely.

I'm going to agree with andre and qgil that it isn't prioritized atm. If some time in the future there is a case made for it we could in theory revisit.

Aklapper lowered the priority of this task from Low to Lowest.Aug 10 2015, 10:57 PM

I had this on my (admittedly not high priority) wishlist of things to fix up. Lemme copy over the email I sent to Greg last week:

I saw the recent discussion on T32106 Explicitly disallow GPL v3, where you pointed out that "I think I missed that this was already declined....???". I did too, so I skimmed the history of the issue. I didn't see who did it or when, which was a little frustrating.

I found the state change in my old email, which was when Brion marked it "RESOLVED/WONTFIX", concurrently with his first comment.

I did all this anal retentive sleuthing and documentation not because I deeply care about the GPLv3 issue that much, but because it'd be nice to make it so that a person doesn't have to have bugmails going back to 2011 in a searchable archive in order to find this information out. Is that something you think is worth looking into? It's not anywhere near the top of my "when I get back, I'm gonna..." list, but I thought I'd bring it up since I went this far already.

Thanks RobLa, that's a worthwhile example because it shows people are more likely to stay near to where they are (e.g. their email) and certainly they are not going to

  • understand that information important to them is missing from the report and instead lives elsewhere,
  • figure out the meaning of the mysterious bzXXX note hidden somewhere on the page,
  • empathically find out the existence of the domain static-bugzilla.wikimedia.org,
  • manually construct the URL to reach the history and finally put pieces together.

Related: T882.

Just putting a vague date / deadline to reach to a conclusion on this discussion.

Qgil raised the priority of this task from Lowest to Low.Sep 25 2015, 11:48 AM
Aklapper added a comment.EditedOct 23 2015, 5:19 PM

I mean in theory we could do this I think as we have the data:

select header from bugzilla_meta where id=2002\G;

header: {"cf_hugglebeta": "---", "classification": "Unclassified", "creator": "hanzomon05@aol.com", "cc": [], "depends_on": [], "cf_platform": "---", "creation_time": "1114677600.0", "is_open": false, "keywords": [], "cf_browser": "---", "summary": "Spontaneous User Generated Bug", "id": 2002, "severity": "normal", "is_confirmed": true, "is_creator_accessible": true, "priority": "Normal", "platform": "PC", "version": "unspecified", "status": "RESOLVED", "product": "Wikimedia", "blocks": [], "qa_contact": "", "see_also": [], "component": "General/Unknown", "groups": [], "target_milestone": "---", "is_cc_accessible": true, "url": "", "whiteboard": "", "last_change_time": "1114793072.0", "flags": [], "assigned_to": "wikibugs-l@lists.wikimedia.org", "update_token": "1416530927-4w274KLqMmhGvbgRnBMyVxpjFVpPwdaqk5jD3WmRmIc", "resolution": "INVALID", "op_sys": "Windows XP"}

last_change and status are basically this,

I'm afraid no.

If a BZ ticket got resolved as invalid in 1984 and reopened in 1988 and closed again as fixed in 1997 and I added a comment on it in 1999 and nothing ever happened after that, last_change_time would simply be 1999.
So it would not help constructing "JohnDoe closed this task as Invalid" and insert into the task around 1984 and constructing "JohnDoe closed this task as Resolved" and insert into the task around 1997.

@chasemp: So I'm not even sure if we would have the data to "fix" this task, regardless of any potential resources assignment.

Summarizing issues brought up here:

So I'm not even sure if we would have the data to "fix" this task, regardless of any potential resources assignment.

Starring at https://git.wikimedia.org/blob/phabricator%2Ftools.git/1af5b11e418918ab2c1f69ac4379bcb01305dc0d/bugzilla_fetch.py I think we didn't pull that data

mmodell added a subscriber: mmodell.EditedOct 23 2015, 6:40 PM

we could at least link to old-bugzilla.wikimedia.org/bugNN.html I guess, it doesn't really solve the problem but it makes the bug history sleuthing easier.

Oh and that is already a task: T882

Continuing on my comment in T107254#1748539

@chasemp: Am I right (you know that code best)?
If I am not right (data is still available somewhere in some DB): In two-three sentences, any idea how technically complex / time consuming / risky it would be to extract and merge that into the Phabricator DB?

chasemp added a comment.EditedNov 4 2015, 5:00 PM

If we know what we want from bugzilla it should be fairly trivial to update the row for each task in the bugzilla_migration.bugzilla_meta.

Looking at the two examples:

2002

https://static-bugzilla.wikimedia.org/bug2002.html

I believe the comment that accompanied the action is: https://phabricator.wikimedia.org/T4002#62639

We stored:

{"count": 2, "author": "avarab@<redacted>", "text": "INVALID, take block complaints somewhere else, like to the WikiEN mailing list,\nsee http://mail.wikipedia.org/mailman/listinfo/wikien-l", "creator": "avarab@<redacted>", "creation_time": "1114678092.0", "bug_id": 2002, "time": "1114678092.0", "id": 7170, "is_private": false}

There is nothing associated with this comment that we got from bugzilla that indicates it is tied to the closing event. It would be difficult to impossible to link this as we have it now.

The header is equally clueless:

header: {"cf_hugglebeta": "---", "classification": "Unclassified", "creator": "hanzomon05@<redacted>", "cc": [], "depends_on": [], "cf_platform": "---", "creation_time": "1114677600.0", "is_open": false, "keywords": [], "cf_browser": "---", "summary": "Spontaneous User Generated Bug", "id": 2002, "severity": "normal", "is_confirmed": true, "is_creator_accessible": true, "priority": "Normal", "platform": "PC", "version": "unspecified", "status": "RESOLVED", "product": "Wikimedia", "blocks": [], "qa_contact": "", "see_also": [], "component": "General/Unknown", "groups": [], "target_milestone": "---", "is_cc_accessible": true, "url": "", "whiteboard": "", "last_change_time": "1114793072.0", "flags": [], "assigned_to": "wikibugs-l@lists.wikimedia.org", "update_token": "1416530927-4w274KLqMmhGvbgRnBMyVxpjFVpPwdaqk5jD3WmRmIc", "resolution": "INVALID", "op_sys": "Windows XP"}

AFAICT we got last_change_time and status but not enough contexual information that would overcome the problem outlined in https://phabricator.wikimedia.org/T107254#1748539

7

https://static-bugzilla.wikimedia.org/bug7.html

wontfix action: https://phabricator.wikimedia.org/T2007#25011

We stored:

{"count": 15, "author": "gwicke@<redacted>", "text": "Describing all of WikiText in EBNF is simply impossible, as parts of it are context-sensitive. Closing as wontfix for that reason.", "creator": "gwicke@<redacted>", "creation_time": "1368229779.0", "bug_id": 7, "time": "1368229779.0", "id": 243303, "is_private": false}

So no indication from the comment metadata we got from bugzilla that this comment is associated with the status change.


TLDR: we don't seem to have the metadata in our migration db to provide this however...

I forget what version of the API / bugzilla we were on but do remember it wasn't current. This does exist though https://www.bugzilla.org/docs/5.0/en/html/api/Bugzilla/WebService/Bug.html#history. It's labeled "experimental" but was added in 3.4.

any idea how technically complex / time consuming / risky it would be to extract and merge that into the Phabricator DB?

We store metadata by bug id in a table as json. If someone had the relevant history we could easily update that. It could be as simple as a ['status_change'] = timestamp for each bug. From there we have to dummy up a "closing event" or look at how https://phabricator.wikimedia.org/T75922#786596 is stored. That isn't terribly bad, we did a lot of this kind of thing previously. For instance tasks created a certain way did not come w/ title transactions so we created them. I would say this is a bit tricky but with a bit of testing and right metadata pretty achievable.

steps:

  1. get the relevant status change history event timestamp, probably from the old-bugzilla DB(?).

save it as yaml or json or whatever.

  1. update the row for each bug in our migration history table, this is pretty easy
  2. write a job that creates a transaction for the event for each task (this would be close to what we did for titles as referenced above)

medium technical, low risk with testing, a bit time consuming

Thanks for the quick answer @chasemp!

I believe the comment that accompanied the action is: https://phabricator.wikimedia.org/T4002#62639

Yes it is - the comment was added in Bugzilla on 2005-04-28 08:48:12 UTC and the History view of bug2002 shows the same time stamp.

TLDR: we don't seem to have the metadata in our migration db to provide this however...

Alright. Thanks for confirming my impression.

https://www.bugzilla.org/docs/5.0/en/html/api/Bugzilla/WebService/Bug.html#history

We were running 4.4. JSON and XML are available in 4.4 but not REST.

Bugzilla ticket #7 saw a bunch of status and resolution changes. To take a look:

curl -X GET "http://bugs.wmflabs.org/jsonrpc.cgi?method=Bug.history&params=%5B+%7B+%22ids%22%3A+%5B7%5D+%7D+%5D"

We'd be only interested in "field_name":"resolution" and "field_name":"status" changes.

Two items about complexity:

  • Is importing all status and resolution changes a ticket has ever seen (and not just the last one) more complex, or would that "just" be inserting more closing and reopening events? Do we actually know how to insert a reopening event?
  • The status/resolution mapping in T212. For example we turned both status=RESOLVED&&resolution=FIXED and status=RESOLVED&&resolution=DUPLICATE into Phabricator task status=RESOLVED. Just pointing out that we still don't plan and will not mark duplicates as such - too complex.

Two items about complexity:

  • Is importing all status and resolution changes a ticket has ever seen (and not just the last one) more complex, or would that "just" be inserting more closing and reopening events? Do we actually know how to insert a reopening event?

It's not something i would choose to get into unless we have a need but the idea is the same. Some of the complexity shows in the reopening idea. Inserting an event that shows up on a task as an explicit closer for posterity is one thing, we are not actually bothering with the actual status of the task itself etc. We are only inserting a historical marker. If we want to insert a bunch of them in historical sequence that reflect the history in that way I'm not sure what the challenges are. Multiple types of events that all have to add up to the already set "real" state etc. It's not difficult I guess, but it is nuanced. i.e. task state is not the consequence of every replayed transaction but stored separately

my thinking was somehere in this line:

  1. have the closing timestamp on hand for bug
  2. create a closing transaction (confirming first the bug is now closed in the new system)
  3. insert closing transaction

the state for these issues is now stale in the old system so if the "event" we want imported doesn't now match the state in phab it will get confusing quick.

Just a minor note. From the point of view of the historical information, what is needed is that "closing transaction" (in Bugzilla, a change of state to closed) with the date of the closing action. That allows for calculating the backlog at any snapshot time in the past, for example.

I guess this is addressed in your last comments, but just to be sure you understand the problem we see...

Aklapper assigned this task to greg.Nov 10 2015, 9:45 AM

I think chasemp and I worked out the technical side of things in the last comments, so it's @greg 's call whether and how to assign resources to this task.
Temporarily assigning to Greg.

Restricted Application added a project: User-greg. · View Herald TranscriptNov 10 2015, 9:45 AM
Restricted Application added a subscriber: Luke081515. · View Herald TranscriptJan 19 2016, 2:03 PM
greg added a comment.Jan 19 2016, 3:52 PM

I think chasemp and I worked out the technical side of things in the last comments, so it's @greg 's call whether and how to assign resources to this task.
Temporarily assigning to Greg.

No time/resources yet.

greg removed greg as the assignee of this task.Feb 22 2016, 8:52 PM
greg removed a project: User-greg.
Aklapper lowered the priority of this task from Low to Lowest.Feb 23 2016, 8:57 AM
Restricted Application added a subscriber: TerraCodes. · View Herald TranscriptJul 2 2016, 7:55 PM

I don't think it's acceptable that now users (well, one user) have to manually re-mark as duplicate every single duplicated bug, spamming everyone and confusing the history of our tasks.

Nemo_bis updated the task description. (Show Details)Jul 31 2016, 8:06 AM
Aklapper added a comment.EditedMay 5 2017, 12:11 PM

Took another look at this due to hashar's last comment.

Regarding Grimoire, GrimoireELK/grimoire_elk/elk/phabricator.py looks like the relevant code.

If I close a task in Phab (which got created in Phab) there are (at least?) two DB table changes:

  • In DB phabricator_maniphest in table maniphest_task, the dateModified field + status fields get updated. Those values are already correct here.
  • More relevant, in DB phabricator_maniphest in table maniphest_transaction, a new row is created with the transactionType field set to status and the newValue field set to "resolved"/"invalid" etc. This seems to be the culprit here. Example rows:

MariaDB [phabricator_maniphest]> select * from maniphest_transaction WHERE id > 237115;

idphidauthorPHIDobjectPHIDviewPolicyeditPolicycommentPHIDcommentVersiontransactionTypeoldValuenewValuecontentSourcemetadatadateCreateddateModified
237116PHID-XACT-TASK-euxufvvpfgjznhqPHID-USER-cgilgxteicxndvcw5w2tPHID-TASK-p5cx22ewpdefj6lyr46apublicPHID-USER-cgilgxteicxndvcw5w2tNULL0status"open""resolved"{"source":"web","params":[]}[]14939840841493984084
237117PHID-XACT-TASK-pgenejesrmhwhzePHID-USER-cgilgxteicxndvcw5w2tPHID-TASK-q7ngormcr2yfovudv3xmpublicPHID-USER-cgilgxteicxndvcw5w2tNULL0status"open""invalid"{"source":"web","params":[]}[]14939846291493984629

So if we ever wanted to manually insert fake closing events into our DB, we'd probably need to find out how the phid value is created (to not collide with existing ones).
Quick and dirty, dateModified and dateCreated could be mass-set to 1416614400 (which equals Sat, 22 Nov 2014 00:00:00 GMT).
And of course we'd need to have a list of the tasks to actually manipulate.

So if we ever wanted to manually insert fake closing events into our DB, we'd probably need to find out how the phid value is created

Replying to myself: generatePHID() in PhabricatorApplicationTransaction.php calls PhabricatorPHID::generateNewPHID. PhabricatorPHID.php shows that the suffix of PHID-XACT-TASK- is simply created by calling Filesystem::readRandomCharacters(15);

So in theory we know all we'd have to do. In practice I wonder if someone else could comment on potential implications / risk.

I don't think it's acceptable that now users (well, one user) have to manually re-mark as duplicate every single duplicated bug

Closed duplicates are a special case. https://secure.phabricator.com/T12234#225298 provides information about what would be required in the database.

And of course we'd need to have a list of the tasks to actually manipulate.

Not covering tasks resolved as duplicates:

SELECT maniphest_task.id, maniphest_task.phid, maniphest_task.status 
FROM maniphest_task 
WHERE (maniphest_task.status != "open" AND maniphest_task.status != "stalled" AND maniphest_task.status != "duplicate") 
AND maniphest_task.id > 2000 AND maniphest_task.id < 75682 
AND maniphest_task.phid NOT IN 
(SELECT maniphest_task.phid 
FROM maniphest_task 
JOIN maniphest_transaction 
WHERE maniphest_task.phid = maniphest_transaction.objectPHID 
AND maniphest_transaction.dateModified > 1416697200 
AND maniphest_transaction.transactionType = "status" 
AND (maniphest_transaction.oldValue = "\"open\"" OR maniphest_transaction.oldValue = "\"stalled\"") 
AND (maniphest_transaction.newValue != "\"open\"" AND maniphest_transaction.newValue != "\"stalled\""));

The query explicitly excludes resolved tasks with status set to duplicate in the maniphest_task table, as proper fiddling with "resolved as dup" tasks requires inserting two rows in the maniphest_transaction table (one transactionType value is core:edge and the other one is mergedinto instead of status) and also changes in the edge table. See my previous comment.
1416697200 is the epoch when the first task T75682 was created after migration, as we only want resolved tasks which did not see any status change after 2014-11-23 and which are still resolved (but not resolved as a duplicate).

Qgil removed a subscriber: Qgil.Jun 1 2017, 12:08 AM
hashar removed a subscriber: hashar.Oct 19 2017, 6:11 AM