
Evaluate Phabricator for its ability to export and import data
Closed, Declined (Public)

Description

Exporting data from Phabricator on an object level (task, project, user, etc.) can be quite difficult. Importing data into Phabricator is also not as easy as it should be. See T715 among others.

The schema design of Phabricator relies extensively on relational ids and there appears to be no easy way to export individual data sets as "open data" with object metadata descriptors. This seems to me to be a potential problem for integration with and possible future migration to other tools.

The premise that Phabricator will be the "be all, end all" software development tool is as yet unproven. In fact, it is probable that in the near future something new (and perhaps better) will appear. It behooves us to seriously consider how to get data out of Phabricator if the need arises.

I propose that the evaluation ask the following questions:

Question 1: Is it possible to readily export Phabricator data to another platform?
Question 2: Is the Phabricator API able to provide and facilitate open integration with other platforms?
Question 3: How does Phabricator measure up against other Software Development tools in regard to "open data" export and import?

Event Timeline

Christopher raised the priority of this task from to Needs Triage.
Christopher updated the task description.
Christopher added a project: Phabricator.
Christopher added subscribers: Christopher, JanZerebecki.
Christopher renamed this task from "Evaluate the Phabricator for its ability to export and import data" to "Evaluate Phabricator for its ability to export and import data". Aug 6 2015, 5:02 PM
Christopher set Security to None.

This came up because upstream refuses to take patches for anything that does import/export or integration with other systems unless you pay them $1500/hour. See https://secure.phabricator.com/T9066 for the discussion, and note that the upstream application discussed there is marked prototype and https://secure.phabricator.com/book/phabcontrib/article/contributing_code/ says "We do not accept patches against prototype applications.".

What is the intent of this task, in the end? Are you proposing that if it fails your three questions at this point in time we should move away from Phabricator back to Bugzilla? (not going to happen) :)

Or, shall we think of this more positively: We identify areas that could use improvement and, potentially, use some of our budget to get them supported (whatever that looks like, I'm not sure) in Phabricator upstream? Or something else (maintain our own Phab App?)?

Also, @JanZerebecki: I can't find a statement from upstream equal to "upstream refuses to take patches for anything that does import/export or integration with other systems unless you pay them $1500/hour". What I read in the linked upstream task is "this isn't a priority at all for us, if you want it to be, we can be told it is via a contract/payment". That's a reasonable response from them.

It's unclear to me what is supposed to happen once these three questions have been answered and this task is closed as resolved. (And who is supposed to investigate / judge.)

[...]

That comment came across as unnecessarily confrontational.

Also, @JanZerebecki: I can't find a statement from upstream equal to "upstream refuses to take patches for anything that does import/export or integration with other systems unless you pay them $1500/hour".

Yes, there is no direct quote for that. I combined statements from more than two separate documents and reasoned out what the logical outcome of following all of them is.

What I read in the linked upstream task is "this isn't a priority at all for us, if you want it to be, we can be told it is via a contract/payment". That's a reasonable response from them.

So if one were to upload a patch and then reviewing and merging that is not a priority at all? (As in nobody with the permission will do it.) Isn't that equivalent to refusing to take patches?

I read https://secure.phabricator.com/T9066 as Christopher asking if it is ok if he sends in a patch for that, it was refused, which is AFAIK consistent with their contribution rules I linked in my previous comment.

What is the intent of this task, in the end? Are you proposing that if it fails your three questions at this point in time we should move away from Phabricator back to Bugzilla? (not going to happen) :)

It's unclear to me what is supposed to happen once these three questions have been answered and this task is closed as resolved. (And who is supposed to investigate / judge.)

From my perspective: We find out all this is a misunderstanding, then we go and happily contribute upstream, cooperate with them and are able to integrate with other systems we need. That is why I pushed for discussing this with other Wikimedians who work with Phabricator upstream. However I can't speak for Christopher.

[...]

That comment came across as unnecessarily confrontational.

To not distract from the topic at hand I responded to this criticism of my person or tone via personal mail.

It's unclear to me what is supposed to happen once these three questions have been answered

Depends on the answer. If it's "we can export easily", all is good and no action is needed. If it's "we can't ever export", we should probably consider paying a bit for that feature.

Or, shall we think of this more positively: We identify areas that could use improvement and, potentially, use some of our budget to get them supported

I would expect it's this. Seems like the intention is finding out whether export is currently possible or not which then tells us if improvement is needed.

Phabricator upstream is very conservative about what they are willing to take on maintenance of, because they don't want to have unmaintained code in the upstream and they simply don't have enough time / enough people to maintain a lot of new features. This is perfectly reasonable of them, although challenging for those of us who want to contribute code.

As discouraging as it might be, their policy doesn't preclude features from being designed as extensions, with careful consideration given to the way in which the extension is integrated in order to avoid huge maintenance headaches.

The schema is difficult but not excessively so. They have a lot of existing infrastructure which can be used to export data. There is also the phabricator-tools framework, which was what we used to import our bugzilla data, and the data could be pulled out the same way it went in, without even touching the phabricator php codebase.

Having just spent many months researching "linked open data" and developing a native XML database application has definitely changed my perspective and perhaps given me a bias against certain elements of Phabricator's design. It seems to me now "ineffective" to not have a linked object data model directly accessible from the api or interface. There are many reasons why Phabricator should provide export (and import) functionality directly and as a first development priority. What surprises me, and is disconcerting, is how dismissive they are of the general need to integrate with other data sources. They genuinely seem to perceive Phabricator as the ultimate developer product. This attitude is sometimes referred to as "blind arrogance". Phabricator has many limitations, and clearly recognizing them can only provide insight as to how to chart a way forward for future implementations and expansion/extension of the data set that has already been created in the past months following the Bugzilla migration.

Do not misunderstand this as a criticism of the way that Phabricator works, which I entirely respect. I see the problem is more in the apparent lack of concern by Phacility about the "big data" picture and how it is entirely relevant to Phabricator. What I would like to see is a redirection in development efforts from trying to create every possible gizmo application for every possible use, to really creating one thing that allows people to implement their own data visualization solutions. This is what I perceive as the real strength in the Phabricator application architecture, flexibility.

Regarding the specific task at hand, I think that these questions should be used to guide an independent development effort towards "opening up" Phabricator data, which is entirely possible with little if any upstream intervention/support. (Maybe by providing a mirrored data instance on a different domain with a forked codebase?). How about an application called "Phlood" that provides RESTful api data exports for every object view for starters. (i.e. https://api.wikimedia.org/T1000/export that would provide complete JSON/RDF/XML data set for the task)?
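To make the per-object export idea a bit more concrete, here is a minimal sketch of what such an endpoint could serialize for one task. Everything in it is an assumption for illustration: the `export_task` helper, the flat task dictionary and its field names are hypothetical and are not part of Phabricator or of any existing "Phlood" application. It also assumes every field name is a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def export_task(task, fmt="json"):
    """Serialize a flat task dictionary (hypothetical shape) as JSON or XML."""
    if fmt == "json":
        # sort_keys keeps the output stable between exports
        return json.dumps(task, sort_keys=True)
    if fmt == "xml":
        root = ET.Element("task", id=str(task["id"]))
        for key in sorted(task):
            if key == "id":
                continue  # already carried as an attribute on the root element
            ET.SubElement(root, key).text = str(task[key])
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported export format: {fmt}")


example = {"id": 1000, "title": "Example", "status": "open"}
print(export_task(example, "json"))
print(export_task(example, "xml"))
```

An RDF serialization would slot in as a third branch; it is left out here because it needs an agreed vocabulary first.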

The import data and integration question is also very relevant, especially if Phabricator is perceived as providing a "console" view for all project development activity. The potential for "mashup" views exists with the Phabricator interface. Ultimately, I think that it can be said with some certainty that it is improbable that Phabricator will be the only tool that will be used by WMF for software development, so the reality is that Phabricator should be able to accommodate disparate and remote data source integration. And, Upstream will eventually need to allow application/json content type requests for this to happen...

Phabricator upstream is very conservative about what they are willing to take on maintenance of, because they don't want to have unmaintained code in the upstream and they simply don't have enough time / enough people to maintain a lot of new features. This is perfectly reasonable of them, although challenging for those of us who want to contribute code.

As discouraging as it might be, their policy doesn't preclude features from being designed as extensions, with careful consideration given to the way in which the extension is integrated in order to avoid huge maintenance headaches.

This is about a fix to existing code so that it can better work with other data sources. This fix is something they will need themselves for a prototype application that they already have in their code base, so your argument about maintenance burden is not applicable here. This fix is needed for the extension/application Christopher is building.

AFAIK their policy explicitly says that they will refuse any patches that are necessary for a 3rd party application/extension to work and also for any prototype application they already have in their own code base.

I'm not interested in if their stance is reasonable or not. I just want to know for sure if we can get such fixes upstream or not? If we can work with upstream or not? (And thus need to pay money for that or not use Phabricator or fork.)

As far as I can see, it does not make sense for WMF to be "held hostage" by Phacility. Their services as solution providers/consultants do not mesh well with their role as developers. The decision whether or not to pay them as consultants to solve integration issues seems obvious to me. Their rates are too high and what they are able to offer is simply not that valuable.

To feasibly address the main integration issues going forward will require some "big thinking" that includes and also extends beyond Phabricator. And I personally see that this solution should be built with LOD. The data aggregation would occur in a triple store and the query interface would be SPARQL. The relevant software development data concepts should be mapped with ontologies, with the goal being to abstract the data structures from the tools.
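As a rough illustration of what mapping tool data into triples could look like, the sketch below turns one task into N-Triples lines that a triple store could load. The `BASE` and `VOCAB` namespaces, the predicate names and the task dictionary are all made up for the example; no such ontology exists yet, and project slugs are assumed to be safe to embed in an IRI.

```python
BASE = "https://phabricator.example.org/"   # assumed instance URL
VOCAB = "https://example.org/devtools#"     # hypothetical ontology namespace


def literal(text):
    """Quote a string as an N-Triples literal, escaping the special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def task_to_ntriples(task):
    """Emit one triple per scalar field and one per project membership."""
    subject = f"<{BASE}T{task['id']}>"
    lines = [
        f"{subject} <{VOCAB}title> {literal(task['title'])} .",
        f"{subject} <{VOCAB}status> {literal(task['status'])} .",
    ]
    for project in task["projects"]:
        lines.append(f"{subject} <{VOCAB}inProject> <{BASE}tag/{project}/> .")
    return "\n".join(lines)


print(task_to_ntriples(
    {"id": 1000, "title": "Example task", "status": "open", "projects": ["phabricator"]}
))
```

Once tasks from several tools share predicates like these, a single SPARQL query over the store could span all of them, which is the abstraction from the tools argued for above.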

The objective of better data integration and smoother development workflow can be achieved by aggregating the tool data in the triple data store and allowing for transparent access (import and export) to this data store. "Phasing out" a viable tool (e.g. Gerrit) before this is done may actually hinder productivity without realizing the intended benefit to the community.

The use cases for having a consolidated data store for all tool generated data are many. I can see many developers possibly wanting to use their IDE to retrieve data with a SPARQL query rather than using the browser for example. The argument can readily be made why Phabricator is not the only way forward, and this discourse is merely trying to provide some new options....

See also: T123417: Create conduit method to query the feed and return records with relevant details populated instead of just a bunch of phids

I intend to build an extension that publishes the public phabricator change feed as a usable API with the individual object references (PHIDs) resolved to their data so that the consumer (primarily the Wikibugs bot) would not have to do a bunch of complex lookups in order to build a view of what's changed.
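Roughly, the resolution step would look like the sketch below. The story and handle shapes are simplified assumptions for illustration (a flat story with `authorPHID` and `objectPHID`, and a lookup table keyed by PHID), not the exact records the feed or any Conduit method returns, and `collect_phids` and `resolve_story` are hypothetical helper names.

```python
def collect_phids(stories):
    """Gather every PHID mentioned in the stories so they can be looked up in one batch."""
    phids = set()
    for story in stories:
        phids.add(story["authorPHID"])
        phids.add(story["objectPHID"])
    return phids


def resolve_story(story, handles):
    """Replace bare PHIDs with small records carrying a readable name."""
    def describe(phid):
        handle = handles.get(phid)
        # Unknown PHIDs are kept, with name None, rather than dropped
        return {"phid": phid, "name": handle["name"] if handle else None}

    return {
        "epoch": story["epoch"],
        "author": describe(story["authorPHID"]),
        "object": describe(story["objectPHID"]),
    }


stories = [{"epoch": 1438880520, "authorPHID": "PHID-USER-a", "objectPHID": "PHID-TASK-b"}]
handles = {"PHID-USER-a": {"name": "Christopher"}, "PHID-TASK-b": {"name": "T1000"}}
print(sorted(collect_phids(stories)))
print(resolve_story(stories[0], handles))
```

Collecting the PHIDs first means the consumer side needs one lookup per batch of stories instead of one per reference.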

I know it's not directly related to this but it is, perhaps, relevant to this discussion.

You are probably already aware of this, but the new edit engines feature provides standardized conduit APIs. This means it is becoming more straightforward to search and update a phabricator object from external systems. (Conduit API Documentation)
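For anyone who has not used it yet, a minimal sketch of calling one of these methods (`maniphest.search`) from an external script follows. It assumes Conduit's form-encoded calling convention (an `api.token` field plus bracketed parameter names such as `constraints[ids][0]`) and the usual `result` / `error_code` / `error_info` reply envelope; the base URL and token are placeholders, and `flatten`, `build_search_request` and `search_tasks` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def flatten(value, prefix):
    """Flatten nested dicts/lists into bracketed form keys, e.g. constraints[ids][0]."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, (list, tuple)):
        items = enumerate(value)
    else:
        return [(prefix, str(value))]
    pairs = []
    for key, item in items:
        pairs.extend(flatten(item, f"{prefix}[{key}]"))
    return pairs


def build_search_request(base_url, token, task_ids):
    """Build (but do not send) a POST request for maniphest.search."""
    fields = [("api.token", token)]
    fields += flatten({"ids": list(task_ids)}, "constraints")
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(f"{base_url}/api/maniphest.search", data=body)


def search_tasks(base_url, token, task_ids):
    """Send the request and return the list of matching task records."""
    request = build_search_request(base_url, token, task_ids)
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if reply.get("error_code"):
        raise RuntimeError(reply.get("error_info"))
    return reply["result"]["data"]
```

A call would then look like `search_tasks("https://phabricator.example.org", "api-...", [1000])`. Larger exports would also need to follow the paging cursor in the reply, which is omitted here.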

What is the intent of this task, in the end?

I'm also still wondering. There are some concerns in this task, some maybe valid, but as for the general feeling: my crystal ball is broken, so I cannot tell for any application or platform out there whether it will provide the perfect database architecture and support for exporting and importing data now and in the future.
There is some API in Conduit.
If you are afraid of vendor lock-in, we host the data ourselves, we can access our DB, can write (external) code to export and import (like we did when moving away from Bugzilla and RT).

If there are specific problems, those should be reported in dedicated tasks. Upstream is free to decline certain requests and downstream is free to implement and maintain the delta.

Or, shall we think of this more positively: We identify areas that could use improvement and, potentially, use some of our budget to get them supported (whatever that looks like, I'm not sure) in Phabricator upstream?

That's what we've been doing this year in T136213.

This task is high-level and abstract enough and missing "success criteria" that I propose to close it as declined nowadays.

We're on Phabricator for many things right now. Data is here in its database. There may indeed be another migration in the (hopefully distant) future as technologies evolve but for now that is not on the plate.

When the time comes to migrate (or some other need shows up) evaluating how to dump data out of Phab and into something else will be a part of that.

Similarly for importing data into Phabricator (as we've done with RT and BZ and Mingle and...).

If you are afraid of vendor lock-in, we host the data ourselves, we can access our DB, can write (external) code to export and import (like we did when moving away from Bugzilla and RT).

^ that.

I have not tried the Conduit API yet. It could be a step towards importing and exporting, but it would be much more relevant if there were an actual exporter/importer, even if it worked only with Phabricator and was not guaranteed to work between different versions.

But coupling other tools to the internals of phabricator (database) doesn't sound the best solution.

The gist of the topic is to be able to extract the data in some structure that is not coupled with Phabricator's internals, and also, if possible, follow some open standard, or even enforce a new one if needed.

I know that the development here is open-source, and you have got your own rules about how the upstream code shall exist for maintenance reasons.
But if this is truly the case I would suggest you change your philosophy and follow the CentOS/Fedora model by having a second upstream that can go wild for the sake of experimentation.

The point isn't that users want the developers to maintain stuff for free. Rather, Phabricator is great and users wish to see it made better through contributions.

@Weshouman I am not sure if that's what this is about but I want to point out there are Phabricator dumps at https://dumps.wikimedia.org/other/misc/ nowadays and the last comment on this ticket was a couple years ago.

The dumps referenced above are missing some content and have been a bit fidgety to maintain. For more on that, see T236507.

@Weshouman: Welcome to Wikimedia's Phabricator instance. If you have general comments about the Phabricator software (which are not specific about the instance of Phabricator running on Wikimedia servers), please contact the upstream Phabricator community at https://discourse.phabricator-community.org/ if you think that you can speak for "users" (as this is your very first comment on the Wikimedia Phabricator instance) - thanks! :)

Christopher was right in 2015.

We are now running into the predicted scenario.

It's hard to export individual tasks and Phabricator is no longer maintained upstream.

We are trying to use the community fork, Phorge, but can only rely on that still having the same database schema. Migrating to another tool would be a problem.

When the time comes to migrate (or some other need shows up) evaluating how to dump data out of Phab and into something else will be a part of that.

The time has come and I don't know how to dump data out of Phab, especially without including private tasks.
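One possible starting point for the private-task part, sketched under assumptions: if task records are fetched through the API rather than read from the database, each record can be checked for visibility before it is written to a dump. The snippet assumes records shaped like `maniphest.search` results, with the view policy under `fields.policy.view` and an optional `fields.spacePHID`; those field names should be verified against the instance first, and `is_exportable` and `exportable` are hypothetical helpers. It covers the task records only, not comments or attached files.

```python
def is_exportable(task, public_spaces=frozenset()):
    """Keep a task only if it is explicitly public and not in a restricted space."""
    fields = task.get("fields", {})
    # Conservative default: anything without an explicit "public" view policy is excluded
    if fields.get("policy", {}).get("view") != "public":
        return False
    space = fields.get("spacePHID")
    return space is None or space in public_spaces


def exportable(tasks, public_spaces=frozenset()):
    """Filter a list of task records down to the ones that are safe to dump."""
    return [task for task in tasks if is_exportable(task, public_spaces)]


tasks = [
    {"id": 1, "fields": {"policy": {"view": "public"}, "spacePHID": None}},
    {"id": 2, "fields": {"policy": {"view": "users"}, "spacePHID": None}},
    {"id": 3, "fields": {"policy": {"view": "public"}, "spacePHID": "PHID-SPCE-private"}},
]
print([task["id"] for task in exportable(tasks)])
```

Fetching with an API token that belongs to an account without access to private tasks would be a useful second safeguard, since the server then never returns them in the first place.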