Phlogiston dump is missing projects
Closed, Resolved · Public

Assigned To

Authored By

	• JAufrecht
	Oct 25 2019, 5:53 PM

Description

Steps to reproduce

Download a Phlogiston dump
Open in python

> import json
> with open('../phabricator_public.dump') as dump_file:
      data = json.load(dump_file)

Examine the data.project.projects dict:

> len(data['project']['projects'])

Actual result

*** TypeError: object of type 'NoneType' has no len()

Expected Result
Thousands of keys found.

Note that the dump is not empty: it's 1.1 GB, and the adjacent data set, project.columns, is populated:

> len(data['project']['columns'])
15978
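Rather than calling len() on each section by hand, you can walk the dump and report every section that came through as JSON null. Below is a minimal sketch. It assumes the dump is a single JSON object with the data['project']['projects'] / data['project']['columns'] layout shown above. The helper names size_of, describe and describe_dump are hypothetical and not part of phabricator-tools.

```python
import json
from typing import Any, Dict, Optional


def size_of(value: Any) -> Optional[int]:
    """None for JSON null (the symptom in this task), len() for containers/strings, 1 otherwise."""
    if value is None:
        return None
    if isinstance(value, (dict, list, str)):
        return len(value)
    return 1


def describe(data: Dict[str, Any], section: str = "project") -> Dict[str, Optional[int]]:
    """Sizes of every top-level key, plus every key inside `section`.

    Only one section is expanded, so a large section such as "task" is
    summarised by a single count rather than one entry per task.
    """
    summary = {key: size_of(value) for key, value in data.items()}
    inner = data.get(section)
    if isinstance(inner, dict):
        for key, value in inner.items():
            summary[f"{section}.{key}"] = size_of(value)
    return summary


def describe_dump(path: str, section: str = "project") -> Dict[str, Optional[int]]:
    # The dump is one JSON document, so json.load reads all of it
    # (about 1.1 GB in this task) into memory at once.
    with open(path) as dump_file:
        return describe(json.load(dump_file), section)
```

On the broken dump, describe_dump('../phabricator_public.dump') would report project.projects as None and project.columns as 15978, matching the manual checks above.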

The Phlogiston dump has not been actively used in 2019, AFAIK, and the reports have been empty and/or erroring for many months, so this may not be a recent change.

Revisions and Commits

rPHTO phabricator-Tools
	rPHTOb3b4a8587022 When passing a tuple to string formatting, include an ending comma

Related Objects

Mentioned In: T108199: Evaluate Phabricator for its ability to export and import data
Mentioned Here: rPHTOb3b4a8587022: When passing a tuple to string formatting, include an ending comma

Event Timeline

• JAufrecht created this task. Oct 25 2019, 5:53 PM

Adding @mmodell since this is going to be something related to the phab puppet module or the phabricator-tools/public_task_dump.py script.

ArielGlenn moved this task from Backlog to Other teams on the Dumps-Generation board. Oct 29 2019, 2:59 PM

In T236507#5613861, @ArielGlenn wrote:

Adding @mmodell since this is going to be something related to the phab puppet module or the phabricator-tools/public_task_dump.py script.

Thanks @ArielGlenn for bringing this to my attention!

It's possible (likely, even) that the Phabricator schema changed and public_task_dump.py needs to be updated to match. I'll try to figure out what's happened.

• mmodell triaged this task as Medium priority. Oct 29 2019, 9:08 PM

So the schema didn't change as far as I can see. I can't actually find any reason why that part of the dump would be empty. I'll keep digging though.

Here is the relevant part of the dump code:

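data['task'] = taskdata
data['project'] = {}
# Public projects are fetched first...
data['project']['projects'] = phabdb.get_projectbypolicy(pdb, policy='public')
# ...then workboard columns (this is the section that came back populated)
data['project']['columns'] = phabdb.get_projectcolumns(pdb)

# Close both database connections (pdb here is a connection, not Python's debugger)
mdb.close()
pdb.close()

# Everything is serialized in a single call at the very end
with open('/srv/dumps/phabricator_public.dump', 'w') as f:
    f.write(json.dumps(data))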

Note that data['project']['columns'] is filled after data['project']['projects'], so a timeout or unhandled exception while fetching projects should have prevented data['project']['columns'] from being filled. There don't appear to be any exception handlers in the code either, so the presence of data in data['project']['columns'] should indicate that the projects fetch completed without any detectable errors. The final serialization is done all at once at the end with a call to json.dumps, so any error should prevent the JSON from being saved at all. 😕
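The reasoning above rules out errors that raise. It does not rule out a fetch helper that returns None without raising. In that case the script would carry on, fill columns, and write "projects": null, which loads back as None. One way to make that case fail loudly is to check the required sections before writing anything. The sketch below is minimal. check_sections, write_dump and REQUIRED_SECTIONS are hypothetical names, and the required keys are taken from the snippet above.

```python
import json
from typing import Any, Dict, Iterable, Tuple

# Hypothetical list of (section, key) pairs that must be present and non-empty.
REQUIRED_SECTIONS: Tuple[Tuple[str, str], ...] = (
    ("project", "projects"),
    ("project", "columns"),
)


def check_sections(data: Dict[str, Any],
                   required: Iterable[Tuple[str, str]] = REQUIRED_SECTIONS) -> None:
    """Raise ValueError if any required section is missing, null, or empty."""
    problems = []
    for section, key in required:
        container = data.get(section) or {}  # tolerate a missing or null section
        value = container.get(key)
        if not value:  # catches None as well as {} and []
            problems.append(f"{section}.{key} is {value!r}")
    if problems:
        raise ValueError("refusing to write dump: " + "; ".join(problems))


def write_dump(data: Dict[str, Any], path: str) -> None:
    # Validate before opening the file, so a bad run does not truncate
    # the previous good dump.
    check_sections(data)
    with open(path, "w") as f:
        json.dump(data, f)
```

Checking before open() matters because opening with "w" truncates the existing file immediately.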

Any suggestions on how to proceed? There is no specific deadline, but I would like to have some Phlogiston results for consideration in the next few weeks.

I haven't had a chance to look at this further but I will attempt to run the dump script manually and hopefully I can find further clues.

• mmodell added a project: Release-Engineering-Team-TODO. Nov 12 2019, 7:41 PM

• mmodell moved this task from Should be empty (use Release-Engineering-Team) to Soon-ish on the Release-Engineering-Team-TODO board.

for p in rules:
    if p['rule'] == "PhabricatorPolicyRuleProjects":
        allowedProjects = p['value']
        break
else:
    # for/else: this branch runs only if the loop finished without `break`
    allowedProjects = []

Where exactly are PhabricatorPolicyRuleProjects (and PhabricatorPolicyRuleUsers) defined?
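Python's for/else can be surprising. The snippet above behaves like the function below, which returns the value of the first projects rule or [] when none matches. It also shows an equivalent next() form. The rule-type strings are copied from the snippet. Whether current Phabricator still stores those exact class names is the open question here, and the sketch does not assume it. The sample rule dicts use only the 'rule' and 'value' keys that the snippet reads, and the function names are hypothetical.

```python
from typing import Any, Dict, List

PROJECTS_RULE = "PhabricatorPolicyRuleProjects"


def allowed_projects(rules: List[Dict[str, Any]]) -> List[str]:
    """Value of the first projects rule, or [] if there is none.

    Mirrors the for/else snippet: `else` runs only when the loop ends
    without hitting `break`.
    """
    for rule in rules:
        if rule["rule"] == PROJECTS_RULE:
            allowed = rule["value"]
            break
    else:
        allowed = []
    return allowed


def allowed_projects_next(rules: List[Dict[str, Any]]) -> List[str]:
    # Same result in one expression: next() with a default of [].
    return next((rule["value"] for rule in rules if rule["rule"] == PROJECTS_RULE), [])
```

If the stored rule names ever differ from PROJECTS_RULE, both versions silently fall through to [] rather than raising. That is worth keeping in mind when a dump section comes back unexpectedly empty.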

The Phlogiston dump has not been actively used in 2019, AFAIK, and the reports have been empty and/or erroring for many months, so this may not be a recent change.

If there is really no use case, then I wonder whether we should consider sunsetting this dump instead of maintaining it...
Phabricator has its Conduit API (which might lack some functionality, but that is hard to say without an analysis).
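An API-based replacement might look roughly like the sketch below. It pages through Conduit's project.search method by following the cursor's "after" value. The host URL and API token are placeholders. The HTTP call is passed in as a parameter so the paging logic can be tried without a server. Nobody has checked how this compares with the dump's field coverage, and iter_projects / conduit_call are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional

# Placeholders: substitute a real Phabricator host and Conduit API token.
PHAB_URL = "https://phabricator.example.org"
API_TOKEN = "api-xxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Caller = Callable[[str, Dict[str, str]], Dict[str, Any]]


def conduit_call(method: str, params: Dict[str, str]) -> Dict[str, Any]:
    """POST one form-encoded Conduit request and return its "result" object."""
    body = urllib.parse.urlencode({"api.token": API_TOKEN, **params}).encode()
    with urllib.request.urlopen(f"{PHAB_URL}/api/{method}", data=body) as resp:
        reply = json.load(resp)
    if reply.get("error_code"):
        raise RuntimeError(f"{reply['error_code']}: {reply.get('error_info')}")
    return reply["result"]


def iter_projects(call: Caller = conduit_call, limit: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every project visible to the token, following the "after" cursor."""
    after: Optional[str] = None
    while True:
        params = {"limit": str(limit)}
        if after is not None:
            params["after"] = str(after)
        result = call("project.search", params)
        yield from result["data"]
        after = result["cursor"]["after"]
        if after is None:
            break
```

Each yielded item is a project object with id, phid and a fields map. A token sees whatever its user can see, which is not necessarily the same set as the dump's policy='public' filter.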

Who was using it in the past? Is this https://www.mediawiki.org/wiki/Phlogiston/Data_Loading_Model actively used by anyone?

I like the idea of getting rid of cruft; otoh removing a public dataset seems suboptimal, if the data is not available for download in some other fashion. Call me a fence-sitter.

It has no active users at the moment and has been broken for months. However, I would like to reactivate it for further reporting exploration and prototyping. It may now be possible to replace it with calls to the API (which didn't have all the necessary info and/or wasn't stable when the code was originally written), but that would be a significant refactor. If there's a quicker fix for the dump, having the dump back sooner would be more helpful.

ArielGlenn mentioned this in T108199: Evaluate Phabricator for its ability to export and import data. Dec 6 2019, 5:01 AM

I'm running public_task_dump.py in a terminal session and found two coding errors that may be related. It's really strange that they didn't cause a problem before, but maybe there was a subtle change in Python behavior between versions and we upgraded to a newer Python somewhere along the way?

This whole dump script is some pretty messy Python 2 code. I'm not sure how maintainable it is in the long term, but hopefully I will be able to get it working again for the time being.

Does it need to be converted to Python 3? How much of a PITA is it going to be to port wmfphablib?

@ArielGlenn: I suspect it might be a pain. It's pretty old code not written with Python 3 in mind. However, it might not be too bad: it's mostly straightforward, but there is just quite a lot of it.

This is fixed in rPHTOb3b4a8587022: When passing a tuple to string formatting, include an ending comma. I ran the new code manually and confirmed that the dump now correctly includes the projects data.
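The commit title names a classic Python pitfall: parentheses alone do not make a tuple, the trailing comma does. The snippet below illustrates the general behavior only. This task does not quote the exact lines changed in public_task_dump.py or wmfphablib, so it does not assume which of these failure modes applied there. The sample PHIDs are hypothetical.

```python
policy = "public"
not_a_tuple = (policy)   # parentheses alone change nothing: still the str "public"
one_tuple = (policy,)    # the trailing comma is what makes a 1-element tuple

print(type(not_a_tuple).__name__, type(one_tuple).__name__)  # str tuple

# With one %s and a plain string, both spellings happen to give the same text...
print("policy=%s" % not_a_tuple)  # policy=public
print("policy=%s" % one_tuple)    # policy=public

# ...but code that treats its argument as a sequence of parameters
# sees six one-character values instead of one string.
print(list(not_a_tuple))  # ['p', 'u', 'b', 'l', 'i', 'c']
print(list(one_tuple))    # ['public']

# When the value is itself a tuple, omitting the comma breaks %-formatting outright.
ids = ("PHID-PROJ-a", "PHID-PROJ-b")  # hypothetical sample values
try:
    "ids=%s" % (ids)  # same as "ids=%s" % ids: two values for one placeholder
except TypeError as exc:
    print("TypeError:", exc)  # not all arguments converted during string formatting
print("ids=%s" % (ids,))  # ids=('PHID-PROJ-a', 'PHID-PROJ-b')
```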

• mmodell added a commit: rPHTOb3b4a8587022: When passing a tuple to string formatting, include an ending comma. Dec 16 2019, 4:09 PM

ArielGlenn moved this task from Other teams to Done on the Dumps-Generation board. Dec 20 2019, 11:43 AM

I ran a new Phlogiston process; it handled the new dump without error and got the projects data.
