
Wikimedia Technical Conference 2019 Unconference: Federated MediaWiki
Closed, ResolvedPublic

Description

Unconference notes:

MediaWiki federation
TechConf 2019

  • Tgr: Some people are working on a project for having a wiki cluster on cloud services - can be used to test out new project ideas
    • Foundation had an exploratory phase 'til 2008, several new projects got created (wikiversity, wiktionary, wikinews...)
    • After 2008 exploration largely stopped
    • Experimenting with a project incubator wiki farm
  • TGR: hook wikis with central sources!?
  • [slide] What are the things a wiki might want to get from a central source?
    • InstantCommons and other foreign file repos (and global image usage)
    • Wikidata and other uses of a Wikibase client-server relationship
    • content sharing
    • identity sharing
    • imported content (and ExternalUserNames)
    • global user prefs
    • in the future, maybe cross-wiki page forks per T113004
  • maybe global templates and gadgets?
  • https://phabricator.wikimedia.org/T216112 "support data sharing in complex networks of MediaWiki wikis"
  • TGR: Most of functionality currently assumes you're in the same db cluster
    • Hoping we can move to a world where we rely mostly on APIs

Discussion

  • BD: This has been a discussion for a long time.
  • BD: This has a lot of overlap in general with what Birgit has been calling the small wiki toolkits.
    • We have a platform for collaborative knowledge, but on small wikis we do not have rules / governance / workflows. We don't have tools to help patrol, deal with spam, etc.
    • AdamShortland: from the Commons/Wikidata POV: data on Commons already builds on Wikidata. A lot of services are implemented based on a database lookup.
      • Once things get updated, how do caches get invalidated? All this work for multi-wiki support needs APIs that are nice to use.
      • Wikidata has items and properties; Commons has MediaInfo. How do you get the link? That has to be in Wikidata.
      • A use case for a small wiki could be: I want my entire template namespace to come from another wiki. Or even from 3 remote places, potentially falling back to a local implementation?
  • Piotr: For MobileFrontend / mobile web development, we came up with the idea of a content provider - we just point to production
    • Works well to retrieve templates, articles etc
    • The problem is the frontend - we need to overwrite the base URL that Ajax requests go to - CSRF issues
  • Addshore: We have entity sources which are essentially the same thing as content provider
  • Piotr: one can use their own database, MCS, or the MW media API
  • Proxying: How to do proxying for production RESTBase
  • AM: We have 3 different implementations?! Entity sources in Wikibase, Content Provider in Mobile Frontend, and ForeignAPI in MediaWiki core
  • AS:
  • BD: InstantCommons has 2 functional modes
  • AS: With all the refactoring that happened recently, mediawiki/core will make it easier to implement a potential federated / cross-wiki system.
  • tgr: taking InstantCommons as an example, there is no good way to find reuses of content. A file might be deleted for copyright reasons, but nothing would notify users (other wikis) of that content
    • mediawiki/core doesn't really have a concept of file authorship (?)
  • Piotr: Would be nice to talk about some general... Domain driven design?
  • Tgr: URLs ...
  • AS: Versioning of APIs - using the action API would be possible, but a lot more effort if there were nicer APIs.
  • BD: You could version the action API
  • Piotr: Quite difficult to imagine a 3rd party wiki that would like to act as a proxy
  • AS If global templates existed - you could say that one of my sources for templates is going to be English Wikipedia, for example
  • P: If we talk about sourcing data from place to place - do I want to have multiple data sources?
  • AS: In terms of Wikibase we definitely see use cases for multiple places
  • We should keep ... caching and performance in mind.
  • What happens when you have 2 wikipedias with the same page title?
  • BD: Search being federated. How would you discover templates/modules/gadgets that are out there somewhere else?
  • AS: We're going to have to do that for Wikibase. The moment you have a Wikibase that can use properties from both... Combined on the back end, calls out for one or more other things, reconcile and rank the results...
  • AM: Do we want to federate MediaWikis or centralize them?
    • Take search for example - searching across several wikis is algorithmically complicated.
    • It would be simpler if you had everything in a single wiki.
  • AS: Our initial version is gonna be the local versions appear first and then you can choose the remote one.
  • AM: You could make the same point for templates for example.
  • BD: But now you have one single template
  • AS: Not having it centralized makes it possible to differentiate between use cases and select / pick solely the ones you are interested in
    • There could be a global template source
  • ?: Template transclusion?
  • BD: Funding. 2030 strategy sort of level.
    • To be the infrastructure for open and the hub of free knowledge
    • freeknowledge.wikipedia.org ?!! Could federate all the best content in a central place
    • One master URL space
    • AS: 3rd party users
  • Piotr: all this sourcing of data from other places - is that one-way or both ways? Could we write back to a foreign wiki?!
  • AS: Wikidata bridge
    • Clients have a copy of API and talk to that
  • Tgr: going to Wikidata to vandalize pages that include its data
  • Tgr: This (federated editing?) is largely a social problem.
    • notifications / messages to the central user page should (?) be propagated
  • global user preferences / federated / global user prefs
  • page forking
  • Daren: you could imagine having a copy of a wiki on a Raspberry Pi for places without internet. Edits happen there, and the content produced is then upstreamed, requiring tools to handle the potential merge conflicts.
  • BD: main thing missing in data storage is putting an id for the wiki in everything
  • Speed - database access is faster than HTTP
    • You'd need caching
    • Change propagation
  • BD: Without the edge caching we do on English Wikipedia, there's no way we could serve any of our content. Worrying about whether that needs to be baked in from the start would probably keep you from getting anything done.
  • Tgr: What is stage 1 for this?
    • Volunteer work
    • Global authentication
    • Global userpages
    • BD: Writing up an RfC about the conceptual idea of federation
  • How do you make it clear to users how to edit something federated from another source?
  • How do you see changes that have affected something on your watchlist?
  • Oversight / deletion / concerns with content going away
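
A recurring idea in the discussion above - a template namespace sourced from another wiki, or even several remote places, with a local fallback - amounts to an ordered lookup chain. A minimal sketch (purely illustrative; none of these names exist in MediaWiki):

```python
def resolve_template(name, sources):
    """Return the first source that can provide the template, else None.

    `sources` is an ordered list of lookup callables - e.g. the local wiki
    first, then one or more remote wikis (or the reverse, for a remote
    namespace with local fallback). Names here are hypothetical.
    """
    for lookup in sources:
        text = lookup(name)
        if text is not None:
            return text
    return None


# Usage sketch: local templates shadow remote ones, per the "local versions
# appear first" idea mentioned in the discussion.
local = {"Infobox": "local infobox wikitext"}
remote = {"Infobox": "remote infobox wikitext", "Cite": "remote cite wikitext"}
sources = [local.get, remote.get]
```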

Action items:

  • Write the RfC
  • BD: "Federated RfC should have federated authorship"
    • AS: Wikibase side, we've got a lot of stakeholders


Event Timeline

  • TGR: Most of functionality currently assumes you're in the same db cluster
    • Hoping we can move to a world where we rely mostly on APIs

We'll probably wind up with the InstantCommons model: data accessed via a code abstraction, with implementations for direct DB access and for API access, as the former usually gives much better performance when it's possible.
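
That abstraction could be sketched roughly like this - class and method names are hypothetical, not actual MediaWiki code:

```python
from abc import ABC, abstractmethod


class FileRepo(ABC):
    """Abstraction over where file metadata comes from (hypothetical API)."""

    @abstractmethod
    def get_file_info(self, title: str) -> dict: ...


class DatabaseFileRepo(FileRepo):
    """Direct DB access: much faster, but only works within the same cluster."""

    def __init__(self, db):
        self.db = db

    def get_file_info(self, title):
        return self.db.select_one("image", {"img_name": title})


class ApiFileRepo(FileRepo):
    """HTTP API access: slower, but works across independent wikis."""

    def __init__(self, api_url, fetch):
        self.api_url = api_url
        self.fetch = fetch  # injected HTTP client, e.g. requests.get wrapper

    def get_file_info(self, title):
        return self.fetch(self.api_url, {"action": "query", "format": "json",
                                         "titles": "File:" + title,
                                         "prop": "imageinfo"})


def make_repo(local_db, api_url, fetch):
    # Prefer direct DB access whenever the repo lives in the same cluster.
    return DatabaseFileRepo(local_db) if local_db else ApiFileRepo(api_url, fetch)
```

Callers only see `FileRepo`, so switching a wiki from same-cluster to cross-wiki federation is a configuration change rather than a code change.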

  • AS: Versioning of APIs - using the action API would be possible, but a lot more effort if there were nicer APIs.
  • BD: You could version the action API

I doubt versioning would actually get you very much in practice. When the Action API[1] breaks compatibility, it's usually because of some underlying issue[2] where old versions of a versioned API would break or be discontinued too. When it's not, it's usually after a very long deprecation process to the point where a versioned API would work similarly by dropping old versions to reduce the maintenance burden. But I've had that argument elsewhere.

And if a particular Action API module does need a version, it's easy enough for the module to take a version parameter. Witness formatversion=2 for ApiFormatJson and ApiFormatPhp.
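
The per-module version parameter changes the shape of the output rather than the API surface. A toy model of the difference between the two JSON styles (this is a simplified illustration, not the real ApiFormatJson logic):

```python
def format_page(title: str, missing: bool, formatversion: int = 1) -> dict:
    """Toy model of formatversion=1 vs formatversion=2 output shapes."""
    if formatversion == 2:
        # formatversion=2: plain keys with native boolean values
        return {"title": title, "missing": missing}
    # formatversion=1: boolean flags appear as empty-string members,
    # and only when true
    page = {"title": title}
    if missing:
        page["missing"] = ""
    return page
```

Old clients keep getting the version-1 shape by default; new clients opt in per request, which is why dropping the parameter later is a deprecation exercise rather than a flag day.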

[1]: At least the parts in core, I can't speak to various extensions.
[2]: e.g. a security problem, or a change to the underlying data model.

@Quiddity: Thank you for proposing and/or hosting this session. This open task only has the archived project tag Wikimedia-Technical-Conference-2019.
If there is nothing more to do in this very task, please change the task status to resolved via the Add Action...Change Status dropdown.
If there is more to do, then please either add appropriate non-archived project tags to this task (via the Add Action...Change Project Tags dropdown), or make sure that appropriate follow up tasks have been created and resolve this very task. Thank you for helping clean up!

@Tgr @bd808 @Addshore (@AM?) - Are there any followup actions that need to be taken, from this session's notes and Anomie's comment?
If yes: Please comment or file tasks as needed.
If no: Please boldly Resolve this task.
Thanks!

I think the reasonable next step if anyone actually wants to make this happen is starting a technical RFC which would then probably spawn other technical RFCs as big issues were uncovered. I don't personally have the passion to drive that, but I would be happy to help someone who did want to drive. From my point of view this would be a many years long journey, but there are lots of cool things that could come out of it.

@Tgr @Addshore: If there is nothing more to do then please resolve this lingering task. Thanks!

Addshore claimed this task.

I think the reasonable next step if anyone actually wants to make this happen is starting a technical RFC which would then probably spawn other technical RFCs as big issues were uncovered. I don't personally have the passion to drive that, but I would be happy to help someone who did want to drive. From my point of view this would be a many years long journey, but there are lots of cool things that could come out of it.

I don't have the passion to drive that right now either.
However, as mentioned during the tech conf, right now WMDE is working on an initial version of federated properties for Wikibase.
We will also be tackling a further iteration of this later in the year.

This will eventually progress toward needing to formalize some of these harder questions in RFC form etc.