
Technology selection for the Developer Hub
Closed, Resolved, Public

Description

After agreeing on the requirements of the Developer Hub infrastructure, we need to agree on the technology selection to fulfill these requirements.

Current proposal

After several discussions during the MediaWiki-Developer-Summit-2015, it looks like we have agreement on the basic technology selections:

  • The landing page of the Developer Hub, the basic content structure, showcases, tutorials, and other types of "manual writing" docs would all be powered by MediaWiki, specifically in its own namespace with its own skin on mediawiki.org.
  • The API reference documentation from each software component would be provided by the appropriate tool for each component. At least in this first phase, we would not seek any uniformity of tools, nor would we require API docs to be imported to wiki pages.

Links from the hub to the API docs and vice versa should make everything easy to find. Having a common search covering all these spaces would be nice, but is left for a later stage, or for the personal initiative of someone willing to work specifically on this problem.


Details

Reference
fl491


Event Timeline

flimport raised the priority of this task to High. (Sep 12 2014, 1:43 AM)
flimport added a project: Web-APIs-Hub.
flimport set Reference to fl491.

spage wrote on 2014-07-28 19:44:07 (UTC)

To the Tools list add

Additional reasons to use MediaWiki beyond the Etherpad:

  • allows all the resources to be at mediawiki.org
  • makes it easier for MW hackers to contribute to the code.
  • allows users to use standard Talk pages to comment on it.

Let's not treat "integrating API docs generated from source code" as some impossible story, let's solve the problems of integrating generated content into MediaWiki, to the general benefit of our software and community. Those problems are:

  1. How to publish generated pages to mw.org
  2. How to present the elaborate TOCs of generated content within an existing MediaWiki skin
  • One way to present static content is to have a Special page that finds it in the file system and renders it. A disadvantage is Special pages don't have talk pages, so the generated content would need links to "Talk about this".
  • Another way is to use the MW API to create a wiki page from each static page. Generated pages sometimes have weird names, so comments on generated docs might get lost when we rebuild.
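As a rough illustration of the second option, the sketch below builds the POST parameters for the MediaWiki Action API's `action=edit` module, one wiki page per generated file. The `Generated/` prefix, the title-cleaning rule, the edit summary, and the function names are assumptions made for this example, not something agreed in this thread. No request is sent; a real client would first need to log in and fetch a CSRF token.

```python
import re


def title_for(generated_name: str, prefix: str = "Generated") -> str:
    """Map a generated file name to a stable wiki page title (hypothetical rule)."""
    # Drop the file extension, then replace characters not allowed in page titles.
    stem = re.sub(r"\.html?$", "", generated_name)
    stem = re.sub(r"[#<>\[\]{}|]", "_", stem)
    return f"{prefix}/{stem}"


def edit_params(generated_name: str, body: str, csrf_token: str) -> dict:
    """Build the POST fields for api.php?action=edit (nothing is sent here)."""
    return {
        "action": "edit",
        "format": "json",
        "title": title_for(generated_name),
        "text": body,
        "summary": "Automatic import of generated documentation",
        "bot": "1",
        "token": csrf_token,
    }


# Placeholder token and file name, for illustration only.
params = edit_params("class_Foo<T>.html", "<p>docs</p>", "dummy+\\")
print(params["title"])
```

Keeping the title rule deterministic means a rebuild writes to the same page each time, which is one way to avoid losing the comments attached to a page with a "weird" generated name.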

Tools that generate content produce HTML, but Parsoid can turn hrefs and other HTML markup into wikitext.

"Collaborative editing" doesn't apply to generated content: you fix it by editing the source and committing changes to git.

qgil wrote on 2014-07-29 14:46:01 (UTC)

(Generous CC because this is probably the most important discussion of this project)

I agree that we should explore the possibilities of MediaWiki before deciding to use something else. MediaWiki already has many pieces in place. Also, MediaWiki is used as a documentation tool in many open source projects. If we find a way to publish API documentation automatically to a MediaWiki, I'm sure that this feature will be used by others as well.

This doesn't mean that we should ignore the existing tools out there and write our solution from scratch. Maybe we can use the best tool(s) to extract API documentation out of Git repositories, and then find a way to embed their output in protected MediaWiki pages, as @Spage suggests.

What do we need?

  1. A way to scan Git repositories and extract/update the API documentation whenever there are changes in the source. Is there one tool that fits all, or do we depend on different tools depending on the programming language used in the repo?
  2. A way to publish that output in wiki pages. Would MediaWiki's limited HTML support suffice, or should we look at Parsoid? Since nobody will edit these pages directly, we don't need to convert anything to wikitext if the HTML content renders correctly.

The rest would be provided by MediaWiki out of the box:

  • tree structure using subpages
  • page protection to avoid human edits
  • search (optionally restricted to API docs namespace)
  • possibility to watch pages for changes
  • discussion pages
  • possibility to transclude API docs sections and pages in other wiki pages e.g. tutorials
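The subpage-based tree structure in the first bullet could look roughly like this. The sketch turns slash-separated documentation paths into namespaced subpage titles and finds each page's parent. The `API` namespace prefix and the sample paths are placeholders chosen for illustration only.

```python
def subpage_title(namespace: str, path: str) -> str:
    """Turn a slash-separated doc path into a namespaced subpage title."""
    parts = [p for p in path.strip("/").split("/") if p]
    return f"{namespace}:{'/'.join(parts)}"


def parent_title(title: str):
    """Return the parent subpage title, or None for a top-level page."""
    head, sep, _tail = title.rpartition("/")
    return head if sep else None


# Hypothetical paths; a real list would come from the doc generator's output.
paths = ["RCStream", "RCStream/Events", "RCStream/Events/Edit"]
titles = [subpage_title("API", p) for p in paths]
for t in titles:
    print(t, "->", parent_title(t))
```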

Plus integration with the API Sandbox -- see T481: A single page describing the MediaWiki release checklist

anomie wrote on 2014-08-01 14:49:43 (UTC)

Since Quim asked for my opinions,

  • Where third-party tools are already in use for extracting documentation from code, I doubt it would be a good idea to go trying to change that. But for tutorials and other human-generated documentation, I'd much rather see MediaWiki used (with maybe an export to a static site) than someone trying to create a new content-authoring system.
  • Having a special page that finds content in the filesystem is probably not a great idea, since it doesn't scale very well: that content would have to be synced to every apache. I'd rather see an extension with a ContentHandler that displays data stored in the existing revision storage; it would likely have to disallow direct editing in favor of the content being directly injected by a maintenance script of some sort.
  • I'm not familiar enough with Parsoid to know how well it handles converting arbitrary HTML to wikitext, but I do know there are aspects of HTML that cannot be represented well in wikitext. Whether the third-party-tool HTML is convertible would need investigation.
  • Some third-party tools may well be able to output in formats other than HTML. They may even support MediaWiki wikitext.
  • As far as scanning git repositories and rebuilding documentation, it's likely that Jenkins could trigger a doc rebuild when changes are merged.

As far as sandboxes go, as I mentioned on T481, it may make more sense to have multiple sandboxes, one for each API, rather than cramming them all into one special page. In particular, I know Gabriel has mentioned using a third-party tool to auto-generate a sandbox for his content API.

qgil wrote on 2014-08-21 11:23:04 (UTC)

After some conversations at Wikimania, I believe the best approach we have considering all the factors is the following:

  • Use mediawiki.org - fresh and well maintained MediaWiki installation + common search with the rest of the developer content.
  • Use a specific namespace - good for navigation and search scope.
  • Use a custom CSS / skin unique to that namespace - requires some experimentation, but seems doable.
  • Use existing tools to export API documentation from source code repositories to protected wiki pages.
  • Use MediaWiki's features for watching pages, user comments, translations, etc.
  • Use a specific domain something.wikimedia.org to access directly the homepage - more sophisticated redirects to be discussed

I think this addresses most if not all the concerns from those that were reluctant to use MediaWiki, and preferred to work on a specific tool instead. I'd rather put our limited resources in CSS or skin customization, offering a solution that other open source software projects using MediaWiki can emulate and perhaps contribute to as well.

Do you agree?

legoktm wrote on 2014-08-25 21:04:36 (UTC)

In T491#18, @Qgil wrote:

After some conversations at Wikimania, I believe the best approach we have considering all the factors is the following:

  • Use mediawiki.org - fresh and well maintained MediaWiki installation + common search with the rest of the developer content.

Yes!

  • Use a specific namespace - good for navigation and search scope.

We already have the PD help namespace, which I think would be perfect for this.

  • Use a custom CSS / skin unique to that namespace - requires some experimentation, but seems doable.

Eh...why? Does Vector not look good enough?

  • Use existing tools to export API documentation from source code repositories to protected wiki pages.

Which existing tools are you referring to? By protected pages do you mean anomie's idea of a ContentHandler + maintenance script?

  • Use MediaWiki's features for watching pages, user comments, translations, etc.

+1

  • Use a specific domain something.wikimedia.org to access directly the homepage - more sophisticated redirects to be discussed

The content of https://toolserver.org/ was automatically pulled from https://wiki.toolserver.org/view/Toolserver:Homepage, which was nice. I think you're imagining something like that?

I think this addresses most if not all the concerns from those that were reluctant to use MediaWiki, and preferred to work on a specific tool instead. I'd rather put our limited resources in CSS or skin customization, offering a solution that other open source software projects using MediaWiki can emulate and perhaps contribute to as well.

Is the main problem really CSS or skin customization? TBH, I would think it's actually getting people to write and update the documentation.

MZMcBride wrote on 2014-08-26 00:44:25 (UTC)

You can consider T491#22 my response as well.

qgil wrote on 2014-08-26 06:33:57 (UTC)

Yes, let's make mediawiki.org's MediaWiki the core ingredient in the mix. This will provide features like watching pages, user comments, translations, etc.

And let's go for the details.

In T491#22, @Legoktm wrote:
  • Use a specific namespace - good for navigation and search scope.

We already have the PD help namespace, which I think would be perfect for this.

Since this is not a technology selection per se, let's discuss it at T555: Per-user projects for personal work in progress tracking.

  • Use a custom CSS / skin unique to that namespace - requires some experimentation, but seems doable.

Eh...why? Does Vector not look good enough?

Depends on the design we go for. Let's continue this specific discussion at http://fab.wmflabs.org/T480#14 since it has little impact in terms of technology selections.

  • Use existing tools to export API documentation from source code repositories to protected wiki pages.

Which existing tools are you referring to? By protected pages do you mean anomie's idea of a ContentHandler + maintenance script?

Tools that scan Git repositories and extract the API documentation, like the ones listed above in the description of the task. It seems that we agree that reusing existing tools is better than writing from scratch a MediaWiki extension to do this work.

By protected pages I mean that no human should be able to edit the content of those pages. This content comes from the code repositories, and that is the only place where it should be edited. Unless someone at some point builds a bridge from MediaWiki to Git, akin to GitHub's feature to edit files via the web and create the corresponding patches and pull requests -- something out of scope here and now.

  • Use a specific domain something.wikimedia.org to access directly the homepage - more sophisticated redirects to be discussed

The content of https://toolserver.org/ was automatically pulled from https://wiki.toolserver.org/view/Toolserver:Homepage, which was nice. I think you're imagining something like that?

Another discussion that deserves its own task because it doesn't affect technology selections: T558: Goal: Common project management guidelines to be followed by teams and individual contributors are published

Is the main problem really CSS or skin customization? TBH, I would think it's actually getting people to write and update the documentation.

Both are important problems solved by different people. We want good content in a good context.

legoktm wrote on 2014-08-26 17:26:48 (UTC)

In T491#26, @Qgil wrote:
  • Use existing tools to export API documentation from source code repositories to protected wiki pages.

Which existing tools are you referring to? By protected pages do you mean anomie's idea of a ContentHandler + maintenance script?

Tools that scan Git repositories and extract the API documentation, like the ones listed above in the description of the task. It seems that we agree that reusing existing tools is better than writing from scratch a MediaWiki extension to do this work.

By protected pages I mean that no human should be able to edit the content of those pages. This content comes from the code repositories, and that is the only place where it should be edited. Unless someone at some point builds a bridge from MediaWiki to Git, akin to GitHub's feature to edit files via the web and create the corresponding patches and pull requests -- something out of scope here and now.

All the tools listed above are for internal source code documentation (I suppose JSDuck is also documenting the public API), not for extracting api.php docs into wiki pages. Maybe I'm misunderstanding what specifically this project is planning to document? TBH, I'd much rather see a proper contenthandler solution.

gwicke wrote on 2014-08-26 18:33:43 (UTC)

In T491#27, @Legoktm wrote:

All the tools listed above are for internal source code documentation

Most of them are certainly not meant to document web APIs. In this list, Swagger would be the only tool actually dedicated to that task.

See https://www.mediawiki.org/wiki/Requests_for_comment/Content_API#Structured_API_specs for some more links to web / REST API spec & documentation tools.

qgil wrote on 2014-08-26 21:59:39 (UTC)

We are talking about two steps:

  1. Extract the API documentation from the source code repository. We'd rather solve this step with existing tools TBD.
  2. Publish that documentation in MediaWiki. We have to decide how to solve this step.

I'm just trying to stir the discussion; you are the experts. How can we solve this problem?

Let's look at the three APIs we have initially selected at T479:

  • RCStream
  • Content API
  • Wikidata API (especially with a focus on multimedia data and Commons)

Is Swagger the right tool to extract the documentation of these three APIs? If not, which tool(s)?

I guess first we need to identify the tool(s) and then we will be able to see which kind of output format(s) we can get from them. Then we can discuss how to import that output in MediaWiki, and how to keep code repositories and wiki pages in sync.

legoktm wrote on 2014-08-26 22:42:24 (UTC)

What is the "Content API"? The Wikidata API stores no multimedia data except for sitelinks to Commons. I'll take a look at Swagger, but my first impression is that it's for "RESTful APIs", and I don't think RCStream fits under that.

anomie wrote on 2014-08-27 15:40:57 (UTC)

In T491#29, @Qgil wrote:

Let's look at the three APIs we have initially selected at T479:

  • RCStream
  • Content API
  • Wikidata API (especially with a focus on multimedia data and Commons)

Is Swagger the right tool to extract the documentation of these three APIs? If not, which tool(s)?

Of these three, Swagger is probably only for the Content API.

I don't know much about RCStream. It depends on whether there's end-user documentation in-code; if not and they don't want it there, the best method may be to directly write the docs in wikitext on mediawiki.org.

For the Wikidata API, we might be able to extract something from the api.php documentation (or make the api.php documentation directly includable via some parser function?). But there will probably also want to be human-generated documentation (likely written directly on mediawiki.org) that goes more in depth, unless this already exists somewhere that I don't know about.
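For the api.php route, machine-readable module descriptions are exposed by the Action API's `action=paraminfo` module. A minimal sketch, assuming the Wikidata endpoint URL and the `wbgetentities` module name purely as examples; it only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode


def paraminfo_url(endpoint: str, modules: list) -> str:
    """Build an action=paraminfo URL for the given API modules."""
    query = {
        "action": "paraminfo",
        "format": "json",
        # Multiple values are pipe-separated in the Action API.
        "modules": "|".join(modules),
    }
    return f"{endpoint}?{urlencode(query)}"


# Example endpoint and module name, chosen for illustration.
url = paraminfo_url("https://www.wikidata.org/w/api.php", ["wbgetentities"])
print(url)
```

The JSON returned by such a request describes each parameter of the module, which an import script could turn into wiki page content alongside the human-written documentation.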

qgil wrote on 2014-08-28 07:02:10 (UTC)

For details about documenting each API, see

We can always write documentation manually in wiki pages. Let's focus the discussion here on the APIs that do have documentation in code. So far it looks like one tool will not be enough for all cases. We have to find a way to publish api.php content in wiki pages, and also content generated by Swagger. Ideally, the procedure to import API docs to wiki pages would be the same, working for different backends. Do you agree?

Qgil lowered the priority of this task from High to Medium. (Oct 7 2014, 9:06 PM)
Qgil removed Qgil as the assignee of this task. (Oct 11 2014, 6:27 PM)
Qgil subscribed.

After several discussions during the MediaWiki-Developer-Summit-2015, it looks like we have agreement on the basic technology selections:

  • The landing page of the Developer Hub, the basic content structure, showcases, tutorials, and other types of "manual writing" docs would all be powered by MediaWiki, specifically in its own namespace with its own skin on mediawiki.org.
  • The API reference documentation from each software component would be provided by the appropriate tool for each component. At least in this first phase, we would not seek any uniformity of tools, nor would we require API docs to be imported to wiki pages.

Links from the hub to the API docs and vice versa should make everything easy to find. Having a common search covering all these spaces would be nice, but is left for a later stage, or for the personal initiative of someone willing to work specifically on this problem.

Do we have an agreement here?

In T312#998615, @Qgil wrote:
  • The landing page of the Developer Hub, the basic content structure, showcases, tutorials, and other types of "manual writing" docs would all be powered by MediaWiki, specifically in its own namespace with its own skin on mediawiki.org.

Yup, I'm thinking dev: or doc: As for skin, the stuff @werdna and @Prtksxna built for http://living-style-guide.wmflabs.org/wiki/Main_Page is a good starting point.

  • The API reference documentation from each software component would be provided by the appropriate tool for each component. At least in this first phase, we would not seek any uniformity of tools, nor would we require API docs to be imported to wiki pages.

Links from the hub to the api docs and vice versa should make everything easy to find. Having a common search covering all these spaces would be nice, but left for a later stage, ...

It's pretty important. I created T87802: "Search the docs" that searches static documentation as well as on-wiki doc namespaces. E.g. if some team documents its database layout in project/docs/DB_TABLES.md and we publish this to doc.wikimedia.org, it would suck not to find this when searching on mediawiki.org.

Do we have an agreement here?

I agree and I'll be writing the content.

People are still confused between "a hub for developers and researchers interested in Wikimedia data and the APIs to interact with it" and an overall goal of improving our developer documentation. The former is a new, SUB-area of our developer documentation. For example, OOjs UI documentation is not part of it.

In T312#999513, @Spage wrote:

People are still confused between "a hub for developers and researchers interested in Wikimedia data and the APIs to interact with it" and an overall goal of improving our developer documentation. The former is a new, SUB-area of our developer documentation. For example, OOjs UI documentation is not part of it.

Yes. I think we need to be clear and consistent with the goal we want to complete here: "a hub for developers and researchers interested in Wikimedia data and the APIs to interact with it". This hub will link to other interesting venues for developers, and everybody should be able to find what they are looking for. But this hub doesn't plan to solve all the documentation problems.

Qgil lowered the priority of this task from Medium to Low. (Feb 4 2015, 8:50 AM)

We have decided to prioritize the creation of some actual content under the existing API: namespace in mediawiki.org.

In T312#3738, @flimport wrote:

qgil wrote on 2014-07-29 14:46:01 (UTC)

(Generous CC because this is probably the most important discussion of this project)

I agree that we should explore the possibilities of MediaWiki before deciding to use something else.

If this approach is used, licensing needs to be handled correctly. Generally, the code licenses used for MW (mainly GPL, though with notable exceptions) are not compatible with the default license for MediaWiki.org.

The simplest solution is to have a custom license for certain namespaces or pages. For example, the Help namespace on MediaWiki.org is CC0. Perhaps this namespace could be GPL, or it could be per-page, but that's more confusing. This should be addressed before any content is imported.

Releasing the documentation under a CC0 license will make it flexible and easy to use. Is there any existing licensed content that you plan to incorporate in the hub?

Releasing the documentation under a CC0 license will make it flexible and easy to use. Is there any existing licensed content that you plan to incorporate in the hub?

I guess Matt's concern arises from documentation that is generated, transcluded, or published from source code into a system like mediawiki.org or doc.wikimedia.org with a different license. IANAL :)

In T312#1107217, @Mattflaschen wrote:

This should be addressed before any content is imported.

  • I'm mostly writing content from scratch.
  • We already transclude generated API documentation into wiki pages, e.g. https://www.mediawiki.org/wiki/Extension:TextExtracts#API
  • The generated API documentation at api.php doesn't seem to mention a license, though its source code has a <link rel="copyright" href="//creativecommons.org/licenses/by-sa/3.0/" /> line, the same as a wiki content page.
  • Looking at doc.wikimedia.org, both JSDuck and PHPDoc seem careful not to mention any copyright to their generated pages.
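
The `<link rel="copyright">` element mentioned above can be checked mechanically. A minimal sketch using Python's standard `html.parser` module; the sample HTML string is made up for illustration and merely mirrors the tag quoted in the comment, and the class name is hypothetical.

```python
from html.parser import HTMLParser


class CopyrightLinkFinder(HTMLParser):
    """Collect href values of <link rel="copyright"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "copyright":
            self.hrefs.append(attr.get("href"))


# Made-up page source that mirrors the tag quoted in the comment.
sample = (
    '<html><head><link rel="copyright" '
    'href="//creativecommons.org/licenses/by-sa/3.0/" /></head></html>'
)
finder = CopyrightLinkFinder()
finder.feed(sample)
print(finder.hrefs)
```

Running the same check over generated pages would show which outputs declare a license link and which declare none.
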
In T312#1108671, @Spage wrote:

Releasing the documentation under a CC0 license will make it flexible and easy to use. Is there any existing licensed content that you plan to incorporate in the hub?

I guess Matt's concern arises from documentation that is generated, transcluded, or published from source code into a system like mediawiki.org or doc.wikimedia.org with a different license. IANAL :)

Got it. At the moment, there are a few different licenses in our docs:

  • Code and auto-generated portions of the docs: may be GPL v2 or any later version (per MediaWiki's license) or the license of the relevant extension (if the extension specified a different license).
  • Content on MediaWiki.org: Usually CC BY-SA 3.0 Unported (per MediaWiki.org's rightsinfo), CC0 (for Help pages), or GFDL (for Manual pages)
  • Content we write from scratch: We get to choose the license. I recommend a permissive license like CC0 for the sake of simplicity, but we can discuss this further if you have other concerns or wish to incorporate any CC BY-SA or GFDL text.
In T312#1107217, @Mattflaschen wrote:

This should be addressed before any content is imported.

  • I'm mostly writing content from scratch.
  • We already transclude generated API documentation into wiki pages, e.g. https://www.mediawiki.org/wiki/Extension:TextExtracts#API
  • The generated API documentation at api.php doesn't seem to mention a license, though its source code has a <link rel="copyright" href="//creativecommons.org/licenses/by-sa/3.0/" /> line, the same as a wiki content page.
  • Looking at doc.wikimedia.org, both JSDuck and PHPDoc seem careful not to mention any copyright to their generated pages.

If you use a permissive license (like CC0), then we should avoid incorporating any CC BY-SA content from elsewhere on MediaWiki.org. We can quote portions of code and auto-generated docs, but we should explain that it is under a different license (like GPL v2 or later) on a licensing page for your new docs. If you want to use CC0 (or decide to use another license), then follow up with me via email and I can send specific advice on how we should proceed.

In T312#1108671, @Spage wrote:
  • The generated API documentation at api.php doesn't seem to mention a license, though its source code has a <link rel="copyright" href="//creativecommons.org/licenses/by-sa/3.0/" /> line, the same as a wiki content page.

OutputPage adds that, and I don't see any hook or callback to override it. Pages like https://www.mediawiki.org/wiki/Help:Magic_words have the same deal despite being CC0.

The API documentation is in general presumably under the same license as the rest of our i18n, but I don't see any clear indication of what that is if it isn't GPLv2.

I thought the discussion about the namespace in mediawiki.org was clear and agreed, but @Krinkle (and probably @GWicke) are not on board, and considering their role in API and developer documentation in general, I think we need to discuss further until we all agree on a plan.

Krinkle challenges the idea of basing dev.wikimedia.org in a MediaWiki at T91626#1102477 and following comments. What would be the alternative, and is there something we can learn from https://doc.wikimedia.org ?

Gabriel seems to be skeptical of the idea of centralized developer documentation in general (? see T87702), and, with his team, has launched a different approach at https://rest.wikimedia.org ... but this is just part of the picture, right?

Meanwhile, we still have to create "a hub for developers and researchers interested in Wikimedia data and the APIs to interact with it". While the source documentation platforms are clear (MediaWiki for HowTo docs, git for API docs), we still miss a clear plan to serve our users (T92941) and satisfy our requirements (T310).

@Spage can continue working in API:Data_and_developer_hub under the assumption that we will have a namespace in mediawiki.org, since the homepage and howto docs need to be written no matter what, but we urgently need a common plan and need to move forward with its implementation.

@Qgil, I am on board with a central entry point with tutorials and other high-level documentation. The detailed API specs / docs need to be accurate and up to date though, which is why we decided not to use a wiki to document the API in detail. I also think that there is good value in having a sandbox environment that lets you try each end point easily.

Folks, I believe everyone is in violent agreement with the split in the two bullets of the task description here (as of March 17, I don't know how to permalink to a Phab task version). We'll continue to link to reference material, wherever it may be, when writing wiki pages for "a data and developer hub encouraging third-parties to access Wikimedia's data sets and experiment with our APIs."

Yes we need a plan for "documentation in git", but I don't think it's as urgent.

In T312#1107217, @Mattflaschen wrote:

Filed as T93994: Special:ApiHelp should include licensing information

  • The generated API documentation at api.php doesn't seem to mention a license, though its source code has a <link rel="copyright" href="//creativecommons.org/licenses/by-sa/3.0/" /> line, the same as a wiki content page.

Filed as T93995: api.php help should not output wrong link rel="copyright"

  • Looking at doc.wikimedia.org, both JSDuck and PHPDoc seem careful not to mention any copyright to their generated pages.

I think this is inadvertent, not careful.

Doxygen: T93996: Doxygen output should include MediaWiki license (in footer)
JSDuck: T93997: JSDuck output should include codebase's license (in footer)
Puppet RDoc: T93998: RDoc puppet documentation should state license
Sphinx: T94000: Sphinx generated documentation should state license in footer
Ruby Gems: T94001: Ruby gem documentation should state license

I'm not saying the above are high priority issues. But we shouldn't say "There are problems elsewhere, so we don't have to try to do it right here."

Are the Manual pages really GFDL-only? That text was originally added way back in 2006, well before the licensing update. IANAL, but IIRC the licensing update applied to all text.

OutputPage adds that, and I don't see any hook or callback to override it. Pages like https://www.mediawiki.org/wiki/Help:Magic_words have the same deal despite being CC0.

Good point. Filed as T94002: $wgRightsPage, $wgRightsUrl, and $wgRightsText should be customizable by namespace

Qgil claimed this task.

As far as I can see, we have made our technology selections for T101441: Goal: Integrate the new Web APIs hub with mediawiki.org. This is the tangible goal we have now and we will focus on it, leaving the other hypothetical discussions for later (unless someone comes with the resources to sync up with our goal on time).

Resolving. Feel free to open new tasks to tackle specific open topics, if any.