Set up a platform to create, edit and query meta wiki taxonomies
Closed, Resolved (Public)

Description

We're looking to create or set up a platform where users can create meta taxonomies of wiki entities (templates, automated users, etc.). The first example is an attempt to classify templates.

The system should be generic enough to meet the following use cases:

  • automatic generation of datasets (list of templates in en.wp, list of bots in ru.wp)
  • manual curation of metadata for items using their unique identifiers (User:CorenSearchBot = description: "copyvio detection bot", tag: "quality")

Event Timeline

Maniphest changed the visibility from "Public (No Login Required)" to "Custom Policy". Feb 19 2015, 11:56 PM
Maniphest changed the edit policy from "All Users" to "Custom Policy".
gpaumier claimed this task.
gpaumier triaged this task as Medium priority.
gpaumier updated the task description.
gpaumier changed Security from None to Other confidential issue.
gpaumier removed a project: WMF-NDA.
gpaumier changed Security from Other confidential issue to None.
gpaumier removed a subscriber: Aklapper.

Documenting my thinking of the past few weeks on this project:

The query part of the tool already exists, at least partially, in the form of Quarry. For anything that's available in the MediaWiki database, Quarry is sufficient.

However, we're also looking to add and maintain meta-taxonomies on those items, like the classification of templates, bots, etc. according to various criteria that won't be stored in the MediaWiki database. Therefore, we need a system to create and edit properties and statements about various items (see where I'm going yet?).

Requirements

Here's the list of requirements I came up with:

  • The system allows multiple users.
  • Users can create new items.
  • Users can edit items by adding and editing properties / statements about them.
  • Anyone can query the system to generate data sets.
  • The system keeps track of historical changes made to items.
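To make these requirements concrete, here is a minimal in-memory sketch of what they imply. It is only an illustration under stated assumptions: the names `Revision`, `Item` and `TaxonomyStore` are hypothetical, the user name is a placeholder, and none of this reflects Wikibase's actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One recorded change to an item (hypothetical structure)."""
    user: str
    prop: str
    value: str
    timestamp: datetime


@dataclass
class Item:
    """An entity known by its unique identifier, e.g. 'User:CorenSearchBot'."""
    item_id: str
    statements: dict = field(default_factory=dict)  # property -> current value
    history: list = field(default_factory=list)     # all Revisions, oldest first


class TaxonomyStore:
    """Toy store: multiple users, item creation, statement edits, history, queries."""

    def __init__(self):
        self.items = {}

    def create_item(self, item_id):
        if item_id in self.items:
            raise ValueError(f"item already exists: {item_id}")
        self.items[item_id] = Item(item_id)
        return self.items[item_id]

    def set_statement(self, user, item_id, prop, value):
        # Update the current value and record who changed what, and when.
        item = self.items[item_id]
        item.statements[prop] = value
        item.history.append(
            Revision(user, prop, value, datetime.now(timezone.utc))
        )

    def query(self, prop, value):
        # Anyone can list the items whose current statement matches.
        return sorted(
            item.item_id
            for item in self.items.values()
            if item.statements.get(prop) == value
        )


store = TaxonomyStore()
store.create_item("User:CorenSearchBot")
store.set_statement("ExampleUser", "User:CorenSearchBot",
                    "description", "copyvio detection bot")
store.set_statement("ExampleUser", "User:CorenSearchBot", "tag", "quality")
print(store.query("tag", "quality"))
```

Even this toy version needs identity, storage and history handling, which is the part an existing platform would provide out of the box.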

Existing solutions

MediaWiki and Wikibase give us a lot of those requirements for free. Here's a list of pros and cons:

Advantages

  • MediaWiki is a platform we know as sysadmins.
  • MediaWiki is a platform we know as users, meaning it's easier to collaborate on this project on a wider scale.
  • MediaWiki gives us editable pages, user management, history tracking, an API, and an army of scripts for batch importing & editing.
  • Wikibase gives us the structured data framework we need to create and maintain properties & statements about items.

Drawbacks

  • Wikidata queries are currently limited but they're under active development. We can also use a Quarry-like tool to query the database directly.
  • One may feel that MediaWiki+Wikibase is overkill for what we're trying to accomplish here (maintaining meta taxonomies). However, when considering the requirements, creating a new platform from scratch is going to require a lot of wheel-reinventing. Reusing existing tools may not be as fun as building something from scratch, but it may be more efficient.

Querying

We don't want to just maintain taxonomies; we're also looking to query them, possibly in conjunction with information that's stored in the MediaWiki database. This means querying from multiple sources, which isn't trivial. If we use MediaWiki+Wikibase for the main platform, we save time that we can devote to building a better querying tool with a nice interface. For example, I imagine we could offer "simple" predefined queries that don't require any SQL knowledge.
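As a rough sketch of the "predefined query" idea, the snippet below merges rows from one source (standing in for the MediaWiki database) with metadata from another (standing in for the taxonomy platform) and exposes a named query that needs no SQL. Everything here is an assumption made for illustration: the page titles, counts, field names, and the `join_sources`, `PREDEFINED_QUERIES` and `run_query` helpers are all made up.

```python
# Made-up rows, shaped like the result of a database query on a wiki.
wiki_rows = [
    {"page": "Template:Example A", "transclusions": 120},
    {"page": "Template:Example B", "transclusions": 7},
]

# Made-up taxonomy metadata, keyed by the same unique identifier.
taxonomy = {
    "Template:Example A": {"tag": "maintenance"},
    "Template:Example B": {"tag": "navigation"},
}


def join_sources(rows, metadata):
    """Attach taxonomy metadata to each database row, matching on page title."""
    return [{**row, **metadata.get(row["page"], {})} for row in rows]


# Each predefined query is a named function over the merged rows.
PREDEFINED_QUERIES = {
    "templates_by_tag": lambda rows, tag: [
        r["page"] for r in rows if r.get("tag") == tag
    ],
    "most_used": lambda rows, minimum: [
        r["page"] for r in rows if r["transclusions"] >= minimum
    ],
}


def run_query(name, rows, metadata, **params):
    """Run a predefined query by name; the user only supplies parameters."""
    merged = join_sources(rows, metadata)
    return PREDEFINED_QUERIES[name](merged, **params)


print(run_query("templates_by_tag", wiki_rows, taxonomy, tag="maintenance"))
```

A real tool would fetch both sources over their respective APIs or databases; the join on a shared unique identifier is the part this sketch is meant to show.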

Authentication

If we want to encourage collaboration on the taxonomies, we need a system as open as possible, but we also need some way of identifying users, preventing spam, etc. The easiest method is probably to require users to use an account, but we want to lower the barrier to entry, so it'd be better if they could reuse an existing account.

MediaWiki can be used both as an OAuth provider and as an OAuth client. Therefore, I can easily imagine using the taxonomy wiki as an OAuth client that would ask users to log in using their SUL account by using Meta-Wiki as the OAuth provider. This would allow users to log in and participate easily, while still making sure edits are identified on the taxonomy wiki, where local account creation could be disabled.

Thoughts?

This is currently where I'm at. What I've outlined above is the result of investigating this project on-and-off over the past few weeks. It's by no means exhaustive or final. It's a sketch of what this could look like, and I'm hoping that by putting my thoughts in public writing, it'll be easier to get some initial feedback and start a discussion.

Side note: Since we're talking about "meta taxonomies", I also considered hosting those taxonomies on Meta-Wiki. It would make some things easier (existing community, no need to maintain a separate wiki installation, etc.) but I was concerned that Wikibase might not play nice with a mixed-content MediaWiki site. Admittedly, my knowledge of Wikibase is limited so I may be overestimating the potential issues.

Aklapper changed the visibility from "Custom Policy" to "Public (No Login Required)". Feb 20 2015, 6:53 PM
Aklapper changed the edit policy from "Custom Policy" to "All Users".

@gpaumier I think the only viable path for use of Wikibase for this narrow purpose is if we can actually use Wikidata itself. Many templates do have Wikidata entities, so that may not be out of the question. Maybe broach the question on Wikidata itself to see what people think, giving some examples of the kinds of classifications that would be useful to us? I suspect they'd find it a bit too navel-gazey, but it may be worth a shot.

@Eloquence: I'll talk to the Wikidata community about it and see what they think.

Just to be clear, I'm not against creating a new tool from scratch if that's what we decide. I think it would be a fun project. I just want to make it clear that there'll be a time penalty, which may or may not be the same as the one we'd face if we went with a standalone Wikibase.

Understood. I think a standalone tool would have to be pretty minimal indeed to be worth our effort, and a custom Wikibase setup is by definition not minimal. :)

gpaumier lowered the priority of this task from Medium to Low. Mar 27 2015, 3:21 PM

What projects are supposed to be listed here?

Qgil added a subscriber: Qgil.

Just bringing some parents to this orphan task.

kostajh added subscribers: gpaumier, kostajh.

Untagging Contributors-Team per T300558.

@gpaumier if this task is still of interest to you, could you please find some other project tags that might be relevant?

gpaumier claimed this task.

This task was left over from an old project. Closing.