⚓ T96903 Identify and prioritize architectural challenges

		Status	Subtype	Assigned	Task
		Invalid		• RobLa-WMF	T87470 Streamline the Architecture Committee and the RfC process (tracking)
		Declined		None	T96903 Identify and prioritize architectural challenges

It's a good list, but for this not to become yet another sprawling etherpad of good ideas, this task must be timeboxed and produce two artifacts:

1. Evolving the MediaWiki architecture (incomplete) wiki page authored and signed off by the Architecture Committee, without any question marks.
2. The bulleted list in this task's description becomes a prioritized list of features to implement and RFCs to approve, expressed as {TNNNN} Phabricator task links.

@Spage, I hope that we can identify a shared list of the most urgent architectural questions we need clarity on. The next step can then be your 2), converting that into RFCs / tasks and working out the details until we get to an agreement.

Your step 1) sounds more like a vision statement, which I agree we need as well. It is a big task though, and we might have an easier time making progress towards one by focusing on concrete questions we all care about first.

• GWicke set Security to None. May 5 2015, 8:05 PM

• GWicke added a subscriber: • ssastry.

• brooke subscribed. May 6 2015, 8:06 PM

• GWicke updated the task description. May 12 2015, 5:20 PM

Move to HTML5, with wikitext as edit UI?

We need to think through the implications of this. Besides the usual things like having to sanitize HTML, additional aspects are:

impact on storage size
ability to represent in some form of normative wikitext as well as round-trip requirements through wikitext -- this requirement informs the next one
need for an HTML spec -- unless we decide we'll store arbitrary HTML (in which case we need to figure out what it means for wikitext editing)
some kind of normalization of HTML that is stored (this is also a fallout of the previous bullet point)
what are the implications when the HTML spec evolves to, say, HTML6? Are we thinking of running a mass conversion script to migrate old revisions to the new format? Or an on-demand format conversion script? Or is this not a concern?
what about a more compact representation that is not tied to a specific html version but which can be rapidly transformed to html or wikitext?

• GWicke updated the task description. May 12 2015, 11:13 PM

Today @brion, @Catrope, @ori & myself met and discussed overall priorities. We agreed that the area we should focus on first is T99088: [RFC] Evolving our content platform: Content adaptability, structured data and caching, based on the first candidate section in the task description.

We'll start to flesh out issues and solutions in the next days, likely as an RFC. We'll solicit wider input once we have a first outline in place, with a goal of discussing the topic at the Lyon hackathon.

• GWicke updated the task description. May 12 2015, 11:21 PM

• GWicke updated the task description.

In T96903#1280612, @ssastry wrote:

Move to HTML5, with wikitext as edit UI?

We need to think through the implications of this. Besides the usual things like having to sanitize HTML, additional aspects are:

impact on storage size

ability to represent in some form of normative wikitext as well as round-trip requirements through wikitext -- this requirement informs the next one

need for an HTML spec -- unless we decide we'll store arbitrary HTML (in which case we need to figure out what it means for wikitext editing)

some kind of normalization of HTML that is stored (this is also a fallout of the previous bullet point)

We already sanitize HTML, and will continue to do so. This defines a safe and somewhat semantic subset of HTML5. It would certainly be great to document this outside of the code, as a combination of the DOM spec and the sanitizer definitions.
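To make the "safe subset" idea more concrete, here is a minimal Python sketch of an allowlist-style filter that also normalizes tag case and attribute order. The `ALLOWED` table, the class name, and `sanitize` are hypothetical and are not MediaWiki's actual sanitizer definitions. The sketch does not check URL schemes, so treat it as an illustration of what a documented subset could look like, not as a security-grade sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> permitted attribute names.
ALLOWED = {"p": set(), "b": set(), "i": set(), "a": {"href"}}


class AllowlistSanitizer(HTMLParser):
    """Keeps allowlisted tags/attributes, drops other tags but keeps their text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag not in ALLOWED:
            return
        kept = sorted((k, v or "") for k, v in attrs if k in ALLOWED[tag])
        attr_text = "".join(f' {k}="{escape(v)}"' for k, v in kept)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so the output is well-formed.
        self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(sanitize('<P onclick="x()">Hi <B>there</B> <span>you</span></P>'))
```

Writing the allowlist down as data, separate from the parsing code, is one way the subset could be documented outside the code.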

what are the implications when the HTML spec evolves to, say, HTML6? Are we thinking of running a mass conversion script to migrate old revisions to the new format? Or an on-demand format conversion script? Or is this not a concern?

I don't see any issues that are different from migrations within HTML5.

what about a more compact representation that is not tied to a specific html version but which can be rapidly transformed to html or wikitext?

Anything in particular that you have in mind here?

• Spage mentioned this in T91744: Have an architecture guidelines & roadmap session at the Lyon hackathon.May 13 2015, 8:41 PM

• GWicke mentioned this in T99088: [RFC] Evolving our content platform: Content adaptability, structured data and caching.May 14 2015, 4:28 PM

• GWicke updated the task description.

• Spage mentioned this in T489: Create a wiki page about the Architecture Committee.May 15 2015, 1:43 AM

• BGerstle-WMF subscribed.May 18 2015, 9:41 PM

• GWicke added a subscriber: faidon.May 19 2015, 9:51 PM

@daniel points out we should compare this with the "pain points" Etherpads from the S.F. Developer Summit, see what clusters overlap.

daniel updated the task description. May 23 2015, 7:07 PM

• Mattflaschen-WMF updated the task description. May 24 2015, 1:08 PM

daniel mentioned this in T594: Architecture Committee proposal for Wikimedia Foundation engineering priorities in Apr-Jun 2015.May 25 2015, 1:11 PM

daniel added a project: TechCom.

At the meeting at the Lyon Hackathon, 2015-05-24, we identified several key points we think should serve as guiding principles and high level tasks for the development of the MediaWiki platform for the foreseeable future.

THIS IS AN INCOMPLETE DRAFT

Content Representation

HTML vs Wikitext
structured data, meta-data (Language-neutral)

Rationale: there is an increasing need to represent, store, and process other kinds of documents besides wikitext.

Multi-Content Revisions

meta-data and other "attachments"
primary vs derived
sub-revisions

Rationale: the ability to attach multiple types of content to a given revision, and the ability to edit multiple types of content in one logical edit, will allow us to become much more flexible with respect to integrating different types of media and structured data. Using the concept of page revisions, the management of media files and structured data can be integrated more closely into the wiki way of content curation.
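As a rough illustration of the multi-content idea, the following Python sketch models a revision as a set of named slots, where one logical edit can change several slots and untouched slots are carried over. All names here (`Slot`, `Revision`, `with_edit`, the role names) are hypothetical and only meant to make the data shape easier to discuss. This is not a proposed schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Slot:
    role: str            # e.g. "main" or "metadata" (hypothetical role names)
    content_model: str   # e.g. "wikitext" or "json"
    data: str
    derived: bool = False  # primary content vs. content derived from it


@dataclass
class Revision:
    rev_id: int
    slots: dict = field(default_factory=dict)

    def with_edit(self, new_id, *changed):
        """Return a new revision: changed slots are replaced, the rest inherited."""
        slots = dict(self.slots)
        for slot in changed:
            slots[slot.role] = slot
        return Revision(new_id, slots)


r1 = Revision(1, {"main": Slot("main", "wikitext", "Hello")})
# One logical edit touching two kinds of content at once.
r2 = r1.with_edit(
    2,
    Slot("main", "wikitext", "Hello world"),
    Slot("metadata", "json", '{"lang": "en"}'),
)
# A later edit that only changes metadata reuses the main slot unchanged.
r3 = r2.with_edit(3, Slot("metadata", "json", '{"lang": "fr"}'))
print(sorted(r3.slots))
```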

Generalized Transclusion

HTML-based transclusion
Late content assembly

Rationale: being able to render, store, and use bits and pieces of page content individually should improve performance, and make us more flexible in regards to which content can be used where, and how.
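A minimal sketch of what late content assembly could look like, assuming a page skeleton that contains placeholder elements and a separate cache of individually rendered fragments. The `<hole>` element, its `id` attribute, and the `assemble` function are hypothetical names chosen for this example.

```python
import re

# Hypothetical placeholder syntax for a fragment that is filled in late.
HOLE = re.compile(r'<hole id="([\w-]+)"></hole>')


def assemble(skeleton, fragments):
    """Fill each placeholder from `fragments`; leave unknown ones untouched."""

    def fill(match):
        return fragments.get(match.group(1), match.group(0))

    return HOLE.sub(fill, skeleton)


skeleton = '<p>Intro</p><hole id="infobox"></hole><hole id="nav"></hole>'
fragments = {"infobox": "<table>...</table>"}
print(assemble(skeleton, fragments))
```

Because the skeleton and the fragments are stored separately, each can be cached and invalidated on its own, and the final substitution step could run at the edge or on the client.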

Smart Caching

Late content assembly / widgets
CDN

Rationale: pushing the assembly and rendering to the edge of the cluster, or even to the client, will improve our ability to scale horizontally.

Modularity and Testability

Dependency Injection
Interface segregation
Unit testing vs. integration testing

Rationale: Modularity improves maintainability and reusability, as well as testability. Having better tests will allow more confident changes, and thus speed up development. Improving modularity and testability on all levels is key to achieving the other goals mentioned here.
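For readers less familiar with the terms, here is a small Python sketch of dependency injection against a narrow interface. The renderer receives its storage dependency instead of reaching for a global, so a unit test can pass an in-memory fake. `ContentStore`, `InMemoryStore`, and `PageRenderer` are hypothetical names and do not correspond to existing MediaWiki classes.

```python
from typing import Protocol


class ContentStore(Protocol):
    """Narrow interface: the renderer only needs to load content by title."""

    def load(self, title: str) -> str: ...


class InMemoryStore:
    """Test double that satisfies ContentStore without a database."""

    def __init__(self, pages):
        self._pages = dict(pages)

    def load(self, title):
        return self._pages[title]


class PageRenderer:
    def __init__(self, store: ContentStore):
        # The dependency is injected, not looked up globally.
        self._store = store

    def render(self, title):
        return f"<h1>{title}</h1><p>{self._store.load(title)}</p>"


renderer = PageRenderer(InMemoryStore({"Lyon": "A city."}))
print(renderer.render("Lyon"))
```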

Service Oriented Architecture

RestBase & co
No more LAMP?! What about 3rd party installs on shared hosting?

Rationale: the ability to move components to separate locations / hardware adds another degree of flexibility and scalability.

Client Diversity

Different rendering for different devices
Localized renderings of neutral content
Multilingual content

Rationale: improving our handling of different locales and devices is key to making content available to more people in more regions and languages. We need to improve support for this aspect of content delivery, especially with respect to caching.

Remove Assumptions

Do not assume wikitext (or any text)
Do not assume information is local
Do not assume information is static

Rationale: In general, dropping assumptions allows more freedom. In particular, dropping these assumptions is necessary to achieve the goals described above.

[Attached images: Hackathon2015-ArchPrios3-20150524_160152.jpg, Hackathon2015-ArchPrios2-20150524_160159.jpg, Hackathon2015-ArchPrios1-20150524_161135jpg]

daniel added a subscriber: tstarling.May 25 2015, 1:26 PM

Some related notes about wikitext, from when I was preparing a (not accepted) Wikimania talk on ways to evolve wikitext: https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Wikitext

Please ignore the specific syntactic details, but even where a page comes from wikitext markup, envisioning a page as being basic markup + HTML-DOM-shaped holes (which can be filled by transclusions, extensions, widgets, whatever) might be a useful abstraction and could fit within the multi-content revision / content representation headings.

• Spage updated the task description. May 28 2015, 12:22 AM

In T96903#1281257, @GWicke wrote:

In T96903#1280612, @ssastry wrote:

Move to HTML5, with wikitext as edit UI?

We need to think through the implications of this. Besides the usual things like having to sanitize HTML, additional aspects are:

impact on storage size

ability to represent in some form of normative wikitext as well as round-trip requirements through wikitext -- this requirement informs the next one

need for an HTML spec -- unless we decide we'll store arbitrary HTML (in which case we need to figure out what it means for wikitext editing)

some kind of normalization of HTML that is stored (this is also a fallout of the previous bullet point)

We already sanitize HTML, and will continue to do so. This defines a safe and somewhat semantic subset of HTML5. It would certainly be great to document this outside of the code, as a combination of the DOM spec and the sanitizer definitions.

This came up in a different context (T100225#1315556), but unless wikitext editing is going away, we need to provide a spec for input HTML (and hence storage HTML, if HTML is being stored) and normalization routines that preserve semantics but generate reasonable editable wikitext. With the caveat that I haven't spent a lot of time thinking about it, I do think that this is a non-trivial constraint that goes above and beyond sanitization requirements. On the other hand, just as editors can today write arbitrary HTML by using HTML tags instead of wikitext constructs, and editing policies / norms on wikis constrain wikitext input, I suppose editing norms might similarly provide the necessary constraints on HTML. In any case, this is worth pondering a bit.
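A toy Python sketch of the normalization and round-trip constraint described here, restricted to bold and italic only. The function names are made up for this example and the regex-based conversion is a deliberate simplification; real wikitext (for example nested bold-italic quotes) needs a proper parser such as Parsoid.

```python
import re


def normalize(html):
    """Map equivalent tags onto one canonical form (assumed subset only)."""
    html = re.sub(r"<(/?)strong>", r"<\1b>", html)
    return re.sub(r"<(/?)em>", r"<\1i>", html)


def to_wikitext(html):
    """Convert canonical bold/italic tags to wikitext quote markup."""
    html = re.sub(r"</?b>", "'''", html)
    return re.sub(r"</?i>", "''", html)


def to_html(wikitext):
    """Inverse of to_wikitext for the same tiny subset."""
    wikitext = re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)
    return re.sub(r"''(.+?)''", r"<i>\1</i>", wikitext)


raw = "A <strong>bold</strong> and <em>italic</em> word"
canonical = normalize(raw)
print(to_wikitext(raw))        # raw HTML tags leak into the edit view
print(to_wikitext(canonical))  # normalized input yields plain wikitext
```

Without the normalization step the HTML still round-trips, but editors see raw tags in the wikitext. With it, the stored HTML maps to ordinary wikitext and back to the same canonical HTML, which is the kind of constraint that goes beyond sanitization.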

what about a more compact representation that is not tied to a specific html version but which can be rapidly transformed to html or wikitext?

Anything in particular that you have in mind here?

Nothing specific at this time. I was tempted to offer this for 2 reasons: (a) with a markup format, you naturally constrain the kind of HTML you generate / accept. (b) you could potentially represent content more compactly. But, without a real proposal, that is just a vague idea at this time.

jeremyb-phone added a subscriber: jeremyb.Jun 8 2015, 2:24 PM

• GWicke updated the task description. Jun 9 2015, 3:00 PM

• GWicke updated the task description. Jun 9 2015, 6:06 PM

• GWicke updated the task description. Jun 9 2015, 8:48 PM

• GWicke updated the task description. Jun 9 2015, 8:52 PM

• GWicke updated the task description. Jun 9 2015, 8:58 PM

• GWicke renamed this task from Identify and prioritize architectural questions to Identify and prioritize architectural challenges. Jun 9 2015, 10:21 PM

• GWicke updated the task description.

• Fhocutt subscribed.Jun 20 2015, 12:23 AM

Ltrlg subscribed.Jun 20 2015, 8:19 PM

greg subscribed.Jun 22 2015, 6:55 PM

Smalyshev subscribed.Jun 23 2015, 8:03 PM

Glaisher subscribed.Sep 4 2015, 4:47 PM

• Spage mentioned this in T109612: Define main themes of the Wikimedia Developer Summit 2016.Sep 9 2015, 11:20 PM

cscott subscribed.Sep 18 2015, 6:13 PM

Apologies in advance.

Although the listed architectural challenges are useful and interesting, they are, by and large, completely *invisible* to our user community.

I am interested in "identifying and prioritizing architectural challenges" *that enable new ways of community interaction*. Is anybody else with me?

Just as a starting point: what about changing how we represent revisions in mediawiki? What are the architectural challenges preventing a fork-and-merge model of community contribution?

Or what about HTML-only wikis? What challenges are preventing us from moving past wikitext entirely, so our users never have to see or use it? (See T112999 for some answers.)

Perhaps this is the wrong forum. But if so, I would be interested in pointers toward a user-focused architectural discussion, if one exists.

• Gilles mentioned this in T113210: How should Wikimedia software support non-Wikimedia deployments of its software?.Sep 21 2015, 7:52 AM

In the interests of advancing concrete discussion, let me propose four concrete "user-focused" architectural challenges:

Moving to web technologies. It's been twenty years since wikitext was introduced. Wikitext, PHP, and (even) Lua are off of the modern mainstream, and so we force our users to climb barriers to entry before they can use mediawiki or contribute to development. Here are some ways we can resync with modern practice (not intended to be exhaustive, roughly arranged from least controversial to most):
1. T112999: HTML-only wikis. Decouple wikitext from mediawiki-core. This doesn't mean that we're going to turn it off for everyone! Just that we lay the foundation necessary to have wikis which use other representations: HTML-native, markdown, a refreshed wikitext 2.0 -- who knows what the future will bring. Let's refactor core so that we are not tied to wikitext 1.0 going forward.
2. JavaScript support for Scribunto. At the time Scribunto was first developed, heap- and time-limiting in the v8 engine was immature. That limitation is past, let's ensure that folks can use web technologies to script templates, so that learning a brand new language isn't a prerequisite to contributing to our project.
3. Programming language agnosticism in core. We can embed PHP in node, and vice-versa. Let's invest in the infrastructure necessary for mediawiki-core to play well in a multi-language environment. Perhaps a service-oriented architecture is part of this, so that more parts of core can be split into separate services and accessed via language-agnostic APIs. Perhaps it's investing in a PHP-node bridge so that extensions can be written in JavaScript and play nicely with core's PHP engine. It's too early (and unwise) to consider rewriting the PHP core of mediawiki -- but we can start the process of decoupling PHP from our identity, so that PHP isn't a wall for new contributors to climb.
4. Committing to standard contribution/collaboration mechanisms. We've started embracing composer, which is a good start. But we can also redouble our commitment to accepting patches via github/gitlabs, test suites runnable with travis, and other standard ad hoc mechanisms. Perhaps we should look at something like mattermost with an IRC bridge for a more newbie-friendly interface to our developers. The overall focus should be to embrace commonality with other open source products; metrics should include "how many other projects do this the same way we do", rather than narrowly focusing on ourselves. (Ob. disclaimer: with projects like mattermost and phabricator we obviously can't predict future uptake perfectly, but we can make reasonable guesses about the direction things are going.)
A Social Wiki. The social web has arrived, and it's not going away. Although academic models of interaction are different from facebook chatter, there are many successful social networks aimed at academics. We should be either embracing these features natively or actively integrating them from partner organizations. We need to acknowledge that our readers, editors, and template developers are people, and help them find and communicate with other people. Some related architectural features:
1. Real-time chat
2. "Groups of users" support in core.
3. Mechanisms to make user pages more like a blog, conversation, or social stream by default.
4. Collaborative editing by default. (Just like etherpad or google docs is collaborative by default; you don't have to "turn it on", you just have to invite someone.)
5. Surfacing the activities of our users. "Watch live edits" mode in article view, for example.
A Thousand Flowers Bloom, or a fork-and-merge model (T113004). Centralized development is so CVS; let's embrace the "fork first" social model pioneered by git and github. This includes:
1. Tweaks to core to factor out versioning schemes. Allow branches and merges to be represented. (T40795)
2. A (extensible) merge engine in core, and a model for user-guided conflict resolution (T108664).
3. Better diff mechanisms. (T26617 might be one component, and ties in with better automatic merges, but better UX for diffs is equally important as tweaking the underlying diff/merge.)
Polyglot Wikimedia. Mediawiki supports a number of different mechanisms for accommodating content in all the world's languages, but technical development of these features has stalled: they are not supported in our latest VisualEditor/Flow work, for example, and there is no plan for this. We should turn this around, and restart active development on polyglot features. Let's embrace ContentTranslation. This experiment seems to have succeeded, time to bring its UX into core. Rather than treating our projects in different languages as isolated silos, we should make it as easy as possible for content in wiki A to borrow from or translate content from wiki B (regardless of "variant" or whether A and B share a database, etc). For example:
1. Fine-grained content tagging (like Parsoid's stable IDs) so that CX can permanently relate translated sections.
2. An easily-accessible split-screen view, so that (for example) the author of an article on enwiki on (say) a South American country can very easily see a google-translated version of the eswiki article on that topic, and translate/incorporate information from it.
3. A CX workflow for editors to keep translated sections up to date. Once we have persistent data that section A is a translated version of section B, edits to section B should be visible to editors who are maintaining section A.
4. A process to migrate Language Converter and the Translate extension to using this same mechanism. ContentTranslation should be able to work on articles in different languages residing on the same wiki, in the way the Translate extension does, for example. We need to identify and develop whatever features are currently missing in ContentTranslation to enable this.
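To give the fork-and-merge challenge above (a merge engine plus user-guided conflict resolution) something concrete to react to, here is a minimal Python sketch of a three-way merge at section granularity. It assumes a page can be represented as a mapping from a stable section ID to section text; that representation and the name `merge3` are assumptions for this example, and a real engine would also need to merge within sections.

```python
def merge3(base, ours, theirs):
    """Three-way merge of {section_id: text} dicts.

    Returns (merged, conflicts). A section missing from a dict counts as
    deleted (or not yet created) on that side.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            result = o          # both sides agree
        elif o == b:
            result = t          # only their side changed
        elif t == b:
            result = o          # only our side changed
        else:
            conflicts.append(key)  # both changed differently: ask the user
            continue
        if result is not None:
            merged[key] = result
    return merged, conflicts


base = {"intro": "A", "history": "B"}
ours = {"intro": "A2", "history": "B"}
theirs = {"intro": "A", "history": "B", "geo": "C"}
print(merge3(base, ours, theirs))
```

Sections listed in `conflicts` are the ones that would be handed to a user-guided resolution step. Everything else merges automatically.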

I would love to see architectural working groups formed around top-down challenges such as these, incorporating not only developers but also UX designers and community members, charged with writing and coordinating specific low-level tasks/RFCs necessary to address the challenge. The top-level architectural committee work can decide on top-level tasks (such as "social wiki"), set up the working groups for each, and then audit the results and ensure coordination on points of overlap. For example, if multiple working groups would benefit from (say) storage backend refactoring, then the top-level architecture committee can identify and prioritize that subtask and coordinate the work so that the result will satisfy all potential users.

cscott added a project: Wikimedia-Developer-Summit-2016.Sep 21 2015, 4:51 PM

Congratulations! This is one of the 52 proposals that made it through the first deadline of the Wikimedia-Developer-Summit-2016 selection process. Please pay attention to the next one: > By 6 Nov 2015, all Summit proposals must have active discussions and a Summit plan documented in the description. Proposals not reaching this critical mass can continue at their own path out of the Summit.

Qgil moved this task from Backlog to Missing expected fields on the Wikimedia-Developer-Summit-2016 board.Oct 12 2015, 9:20 PM

• MZMcBride subscribed.Oct 16 2015, 5:11 AM

@GWicke, are you proposing this task as a Summit proposal, or is it a source for possible ideas? I mean, do you expect to have a session about "Identify and prioritize architectural challenges" and a prior discussion here?

Today is November 6, and this proposal is basically not on track. Unless the situation suddenly changes and/or @RobLa-WMF and the Architecture Committee really want to schedule it, it will be removed as a Wikimedia-Developer-Summit-2016 proposal.

Qgil mentioned this in T116024: WikiDev16 program .Nov 12 2015, 3:27 PM

This proposal has the support of the Architecture Committee. I still have a hard time understanding what this is about in the context of the Summit, though. Is this a session about how to identify and prioritize architectural changes? About going through the priorities currently listed in the description? Something else?

Also, there hasn't been any discussion so far. @GWicke, are you planning to launch this discussion?

• RobLa-WMF mentioned this in T119032: WikiDev 16 working area: Software engineering.Nov 19 2015, 1:02 AM

Qgil moved this task from Missing expected fields to Missing active discussion on the Wikimedia-Developer-Summit-2016 board.Nov 23 2015, 9:29 AM

I'm starting to share @Qgil's discomfort with this as a session at Wikimedia-Developer-Summit-2016. @cscott's comment at T96903#1659718 had four different proposals; each of which we can (and should?) have a separate conversation about (perhaps in newly filed Phab tasks).

If we decide to have this conversation, what outcome do we hope for?

I think this session could be useful if it was scoped purely as a prioritization exercise for a reasonably sized list of technical projects that we know need to be undertaken. If it is yet another venue to catalog deficiencies real or imagined in the MediaWiki and Wikimedia technical stacks I think it would be a waste of time.

As a prioritization exercise I think it would be most useful as a component of a wrap up session where we try to figure out how to actually resource and move forward on initiatives that are approved in other sessions during the summit.

• RobLa-WMF mentioned this in T119022: WikiDev 16 working area: Content format.Dec 9 2015, 2:49 AM

• RobLa-WMF mentioned this in T119018: Working groups/areas for macro-organization of RfCs for the summit.Dec 9 2015, 5:04 AM

LikeLifer subscribed.Dec 10 2015, 9:46 PM

Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open. If the session in this task took place, please make sure 1) that the session Etherpad notes are linked from this task, 2) that followup tasks for any actions identified have been created and linked from this task, 3) to change the status of this task to "resolved". If this session did not take place, change the task status to "declined". If this task itself has become a well-defined action which is not finished yet, drag and drop this task into the "Work continues after Summit" column on the project workboard. Thank you for your help!