
Next steps along the Service Oriented Architecture path, plenary session at 2015 MediaWiki Developer Summit
Closed, Resolved (Public)

Description

https://www.mediawiki.org/wiki/MediaWiki_Developer_Summit_2015#Schedule
Tuesday 27, 9:45am

Etherpad: http://etherpad.wikimedia.org/p/MDS_2015_SOA_plenary

Agenda

  • Introduction: ~15 minutes
  • Discussion on open questions: ~30 minutes

Why SOA?

  • well-defined interfaces / APIs:
    • org / team scaling: don't need to understand every detail to get started
    • parallelize development behind stable interfaces
    • testing: clear & narrow interface to test against, can mock dependencies
    • horizontal layers / interfaces can reduce vertical silo tendency & help to identify common concerns / patterns
  • performance: parallelism using distribution
  • security & robustness: least privilege, fault isolation & monitoring
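
As a rough illustration of the testing point above (a clear, narrow interface that lets dependencies be mocked), here is a minimal Python sketch. The names `MathRenderer`, `FakeMathRenderer`, and `render_page` are hypothetical and invented for this example; they are not the API of Mathoid or of any other existing service.

```python
from typing import List, Protocol


class MathRenderer(Protocol):
    """Hypothetical narrow interface for a formula-rendering service."""

    def render(self, tex: str) -> str:
        ...


class FakeMathRenderer:
    """Test double that stands in for the real (remote) service."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def render(self, tex: str) -> str:
        # Record the call so a test can inspect how the interface was used.
        self.calls.append(tex)
        return "<svg data-tex='" + tex + "'/>"


def render_page(body: str, formulas: List[str], renderer: MathRenderer) -> str:
    """Append rendered formulas to a page body, using only the interface."""
    return body + "".join(renderer.render(tex) for tex in formulas)
```

Because `render_page` depends only on the interface, a test can pass in `FakeMathRenderer()` and check both the output and the recorded calls without any service running.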

What we have done so far

  • Parsoid
  • feature services: mathoid, citoid, hieroglyphs?
  • PHP API improvements
    • HHVM perf -> graph
    • features for mobile

What we learned

  • Stable APIs are great:
    • PHP API powers apps, Parsoid, OCG, ..
    • Parsoid API powers VE, Flow, Kiwix, content translation, ..
  • Isolation is great
    • example: had issue in OCG; service isolation meant that potential damage was limited
  • Got to use third-party code, new contributors: MathJax, Zotero
    • Both use client-side code on server
  • No standard / convenient solutions for
    • monitoring
    • caching
    • some security aspects
  • Infrastructure & deployment
    • shortage of manpower in ops for puppet, reviews
    • mostly using trebuchet, which has improved but still has plenty of issues (mostly in salt land)
      • can we find something simpler / more reliable & better integrated with config management?
    • Unclear responsibilities: what happens when stuff breaks?
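
To make the isolation point above more concrete, here is a minimal Python sketch of containing a failing service call so that only its own feature degrades. It is an assumption-laden illustration, not how OCG or any deployed service is wired up; `IsolatedCall` and its failure counter (a stand-in for real monitoring) are hypothetical.

```python
from typing import Any, Callable


class IsolatedCall:
    """Wrap a call to another service so a failure stays local to one feature."""

    def __init__(self, name: str, call: Callable[..., Any], fallback: Any) -> None:
        self.name = name
        self.call = call
        self.fallback = fallback
        self.failures = 0  # crude stand-in for a monitoring counter

    def __call__(self, *args: Any) -> Any:
        try:
            return self.call(*args)
        except Exception:
            # The broken service costs us its own output only;
            # the caller carries on with the fallback value.
            self.failures += 1
            return self.fallback
```

A real deployment would also want timeouts and process-level separation; this sketch covers only the error-containment part.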

What we are currently working on

  • Parsoid: Perfecting rendering, media support in preparation for Parsoid-powered views
  • Wikidata Query Service: see later session
  • API improvements
    • lots of work on PHP API
      • Performance: HHVM, ongoing per-request overhead optimizations
      • Features in support of mobile
    • RESTBase close to first deploy
      • light-weight & high performance REST API driven by Swagger specs
      • optimized for storage / caching backed by internal services like PHP API, Parsoid, Mathoid etc
        • aims to provide standard monitoring, security headers, CSRF validation, sanitization and authorization facilities; avoid duplication of effort in each service.
      • initial focus on HTML content and revision metadata, including services like revscore
      • looking into HTML section editing and retrieval
  • Coming up:
    • Auth service (Wikia: Helios)
    • Image scaling (Wikia: Vignette)
    • Small feature services like svg to png renderer or hieroglyphs
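
The idea of a spec-driven front layer that applies shared security checks once, instead of in every backend service, could be sketched roughly as follows. This is not RESTBase's implementation: the route table is a heavily simplified stand-in for a Swagger spec, and `make_router`, `COMMON_HEADERS`, and the `csrf_token` parameter name are assumptions made for this example.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], str]
Response = Tuple[int, Dict[str, str], str]

# Headers every response gets, so individual backends need not set them.
COMMON_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "SAMEORIGIN",
}


def make_router(spec: Dict[Tuple[str, str], Handler], csrf_token: str):
    """Build a request handler from a (method, path) -> backend mapping."""

    def handle(method: str, path: str, params: dict) -> Response:
        handler = spec.get((method, path))
        if handler is None:
            return 404, dict(COMMON_HEADERS), "not found"
        # State-changing requests must carry the expected token.
        if method != "GET" and params.get("csrf_token") != csrf_token:
            return 403, dict(COMMON_HEADERS), "bad CSRF token"
        return 200, dict(COMMON_HEADERS), handler(params)

    return handle
```

The backends behind such a layer stay small: each one is just a function from parameters to content, while routing, headers, and token checks live in one place.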

Possible discussion topics

  • scaling down for testing and third-party use (covered in https://phabricator.wikimedia.org/T86559?)
    • should we start targeting cheap VMs with packages, images, vagrant, puppet, docker files or [insert here]?
  • Keeping complexity in check
    • granularity
    • communication patterns
    • stateless services
    • limited number of platforms: PHP, Node, some Java & Python
  • Can we move faster without breaking?
    • test coverage
    • ability to restart & roll back: stateless vs. stateful
    • security: XSS, CSRF
    • responsibility: who is getting pages?
    • isolation, CI & deployment: next session
  • Front-ends as API consumers
    • Should we gradually rework the desktop and mobile skins as API consumers?
  • Can we achieve fully cached page views for logged-in readers?
    • ESI vs. client-side
  • Using Parsoid HTML for views
    • Parsoid eventually becoming the default parser?

This session should ideally be followed by T86138 and T86372.

Event Timeline

Qgil created this task. Dec 22 2014, 4:54 PM
Qgil assigned this task to GWicke.
Qgil raised the priority of this task from to Needs Triage.
Qgil updated the task description.
Qgil changed Security from none to None.
Qgil added subscribers: Aklapper, Qgil.
MZMcBride renamed this task from Nexts steps along the Service Oriented Architecture path, plenary session at #MWDS15 to Nexts steps along the Service Oriented Architecture path, plenary session at 2015 MediaWiki Developer Summit. Jan 6 2015, 10:16 PM

Adding @tstarling and @ori to the conversation. I believe the most important aspect of SOA to talk about at the summit is what our infrastructure should look like to clients, where the "client" here is "anything", including but not limited to:

  • Mobile apps
  • Bots
  • A cleanly-separated web front end

Ori and I discussed this earlier, and he pointed out that our long-term strategy for creating this clean separation can be one of three approaches:

  • Rewrite everything
  • Build a new layer that hides the ugly bits behind a sanity layer, then replace the ugly bits at our leisure
  • Incrementally improve our existing APIs

We still disagree on this, but at a minimum, we should come up with a plan for a plan at the summit, if we can't actually agree on a plan. This plenary should be a tool to get us there.

GWicke added a comment. Edited Jan 8 2015, 6:46 AM

@RobLa-WMF, I think we are in pretty broad agreement about the importance of building our front-ends on top of a performant API, for all the obvious reasons. This is what mobile has been moving to for a while, and I believe there is also growing support for doing the same for desktop.

So, the question is how to best get to a point where we can support high request volumes with low latencies. I believe nobody is proposing a complete rewrite, which leaves options 2 (wrapper with performance work targeted at high-volume end points) and 3 (incremental improvements). Currently we are following option 2 with restbase & the REST content API, and option 3 with the PHP API.

So let's definitely discuss this. I am actually pretty optimistic that we can come to an agreement on this, as the options aren't even mutually exclusive.

GWicke updated the task description. Jan 8 2015, 6:53 AM
GWicke triaged this task as High priority. Jan 8 2015, 7:26 AM
GWicke updated the task description.
GWicke added subscribers: faidon, mark, ssastry.
GWicke updated the task description. Jan 8 2015, 5:07 PM
GWicke updated the task description.
GWicke updated the task description. Jan 9 2015, 6:27 PM
GWicke updated the task description.
bd808 added a subscriber: bd808. Jan 12 2015, 4:42 PM
GWicke updated the task description. Jan 12 2015, 9:40 PM
GWicke updated the task description. Jan 12 2015, 10:17 PM
GWicke updated the task description.
GWicke updated the task description.
ori added a comment. Jan 12 2015, 10:48 PM
  • "Why SOA?" should be cut, in my opinion. There should be a very quick overview for the benefit of curious newcomers, but it should not take more than two minutes at the most. There is a lot of excellent material on this online that we can just direct people to in an e-mail before the actual summit.
  • An alternative would be to recast "Why SOA?" into a "Should my pet project be a service?" discussion. The idea would be to help developers identify opportunities for converting their project into a standalone service.
  • I'd cut "what has happened so far", too. People can go to MediaWiki:Services to catch up.
  • Re: "learnings: what has worked, what hasn't". This could work. What are you planning to bring up?

I have grave concerns that service development by community developers is currently unfeasible, because successfully seeing a service through to deployment on Wikimedia's cluster requires meeting a wide range of expectations that are currently half-articulated, undocumented, or contentious. In the absence of clear expectations, people have to rely on social relationships to get feedback and have their code deployed. Rather than talk about our social organization (which team does what, who has the authority to decide what), we should strive to set clear, objective criteria for quality.

It's essential to solicit feedback at all stages (pre-summit planning, summit itself, retrospective) from community members.

@TheDJ, @Jackmcbarn: I'd love to have your input on this.

GWicke added a comment. Edited Jan 12 2015, 11:05 PM

@ori: The entire intro including the learnings part is intended to last less than 10 minutes, so that we can spend the bulk of the time on discussion and working towards better alignment by tackling important and far-reaching issues that would be too large in most other venues.

Regarding your point about development by community members: What we learned from mathoid (@Physikerwelt) and citoid (@Mvolz) is that the difficulties are not so much in developing the services themselves, but in the complexities of deployment and exposing things in a reasonable API. Some of the infrastructure bits are being addressed with restbase and the auth service, but many operational and structural challenges still remain.

This is a big and important topic, which is why I believe we would have trouble covering it sufficiently in this 45 minute session alone. Instead of trying to do so, we currently have the next session in the same room about "SOA and operations" (T86138), followed by "Service virtualization, deployment & CI" (T86372). We can use the learnings as well as the last part of the discussion to highlight the issue, and then follow up with in-depth discussions in the other sessions.

GWicke updated the task description. Jan 12 2015, 11:52 PM
GWicke updated the task description. Jan 12 2015, 11:54 PM
GWicke updated the task description. Jan 13 2015, 12:01 AM
GWicke updated the task description. Jan 13 2015, 12:04 AM
GWicke updated the task description. Jan 13 2015, 2:05 AM
GWicke updated the task description. Jan 13 2015, 2:18 AM
In T85154#972282, @ori wrote:

I have grave concerns that service development by community developers is currently unfeasible, because successfully seeing a service through to deployment on Wikimedia's cluster requires meeting a wide range of expectations that are currently half-articulated, undocumented, or contentious. In the absence of clear expectations, people have to rely on social relationships to get feedback and have their code deployed.
Rather than talk about our social organization (which team does what, who has the authority to decide what), we should strive to set clear, objective criteria for quality.

... the difficulties are not so much in developing the services themselves, but in the complexities of deployment and exposing things in a reasonable API.

IMHO, the problems you two have mentioned are newbie problems in the sense that most of the code and development processes have been targeting MW directly. With the planned move to SOA, things are going to get new aspects, including community development. As an example, we could provide appropriate packages for SOA-related MW components, which people would then install and develop further (or create their own services utilising the provided packages). They would then deploy that on their own infrastructure and test. What would be left to do would be to bridge the gap between their infrastructure and the WM cluster. I am not implying that this last step would be an easy feat; my point is that I believe it would require a much smaller effort than it currently does.

I definitely agree, @ori, that we need to discuss the exact way in which to proceed, explain it cleanly to the community and get as much feedback as possible.

GWicke added a comment. Edited Jan 13 2015, 3:54 PM

IMHO, the problems you two have mentioned are newbie problems in the sense that most of the code and development processes have been targeting MW directly. With the planned move to SOA, things are going to get new aspects, including community development.

The difficulties mentioned are definitely solvable, and *are* solved in many other engineering organizations. For example, Ryan will likely speak to that from his new perspective at Lyft. So far, however, it has been a major problem, especially for volunteer projects like mathoid and citoid.

@GWicke Naturally, because services have not been the modus operandi thus far (at least that's my humble understanding). What I meant to highlight is that the difficulty of adopting community-developed services is likely to decrease as we ourselves switch to SOA. And for that, as was pointed out earlier, we need to agree on clear ways to get there :)

Oh, and another thought. Moving forward to SOA and targeting cheap VPS deployment solutions would actually allow community developers to:

  • thoroughly test and improve upon their services
  • use their deployment until they are up and running on WM's cluster

Naturally, this would also allow our engineers to test the service and give developers direct guidance on the changes needed to ease the transition to WM's cluster.

Please, tell me if I'm going overboard a bit here :P

GWicke updated the task description. Jan 14 2015, 12:24 AM
GWicke lowered the priority of this task from High to Normal. Jan 14 2015, 12:27 AM
GWicke updated the task description. Jan 14 2015, 4:24 PM
In T85154#972282, @ori wrote:

I have grave concerns that service development by community developers is currently unfeasible, because successfully seeing a service through to deployment on Wikimedia's cluster requires meeting a wide range of expectations that are currently half-articulated, undocumented, or contentious. In the absence of clear expectations, people have to rely on social relationships to get feedback and have their code deployed. Rather than talk about our social organization (which team does what, who has the authority to decide what), we should strive to set clear, objective criteria for quality.

I agree with the concerns. Broadly, I'm still uncomfortable with the idea of a wiki page requiring dozens of independent services in order to be rendered, and what this means for non-Wikimedia Foundation wiki users.

We should make setting clear architecture, security, and performance standards a high priority.

I'm also interested in hearing more input from people outside of the Wikimedia Foundation Services team about what services we might have and how they might fit into the Wikimedia development ecosystem. I've been reviewing the various pages on mediawiki.org and I'm unhappy about the lack of discussion and scrutiny here.

Lastly, I want to see a clear and honest evaluation of how services such as Parsoid have developed and progressed so far before we commit further projects to a services model/architecture.

GWicke updated the task description. Jan 27 2015, 5:37 AM
GWicke closed this task as Resolved. Feb 21 2015, 3:44 AM