
Services team goals January - March 2016 (Q3 2015/16)
Closed, Resolved (Public)

Description

1) Strengthen: API & community build-out

Continue API build-out & scale service development community resources.

Key results

  • Scale service development by refining guidelines and documentation.
  • Support apps and the API-driven frontend effort with cacheable high-traffic entry points and clean content interfaces.

Dependencies

  • Operations
  • Reading
  • Editing, especially the Parsing team

Tasks

2) Focus: Storage scaling and selective replication for hot data.

Key results

  • Prepare for cost-effective history storage by improving compression of HTML content to < 10% of raw content size.
  • Support selective replication of hot data to edge PoPs for low-latency API access.
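One way to read the "< 10% of raw content size" target is as compressed size divided by raw size. The following is only a minimal sketch of such a check: zlib is an assumed stand-in compressor (the task does not name one), and `compression_ratio`, `meets_target` and the sample data are hypothetical names for illustration.

```python
import zlib


def compression_ratio(raw: bytes, level: int = 9) -> float:
    """Return compressed size divided by raw size (lower is better)."""
    if not raw:
        raise ValueError("raw content must not be empty")
    return len(zlib.compress(raw, level)) / len(raw)


def meets_target(raw: bytes, target: float = 0.10) -> bool:
    """True when the compressed size is below `target` of the raw size."""
    return compression_ratio(raw) < target


# Hypothetical sample: many near-identical revisions stored back to back.
revisions = b"".join(
    b"<html><body><p>Revision %d of the same article text.</p></body></html>" % i
    for i in range(500)
)
print(f"ratio: {compression_ratio(revisions):.3f}")
```

Compressing many similar revisions together, as in the sample, is what makes low ratios plausible; a single small document compressed on its own would usually do much worse.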

Dependencies

  • Operations

3) Experiment: Reliable event production & change propagation

Key results

  • Implement reliable event production from MediaWiki to EventBus.
  • Migrate job queue use cases to EventBus consumers.
  • Provide a light-weight EventBus implementation for third party users.
  • Prototype fine-grained dependency tracking for change propagation.
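To illustrate what fine-grained dependency tracking could mean for change propagation, here is a minimal sketch under stated assumptions: dependencies are kept in an in-memory reverse index, and a change triggers a breadth-first walk over dependents. `DependencyTracker` and the resource names are hypothetical and not part of any existing service.

```python
from collections import defaultdict, deque


class DependencyTracker:
    """Tracks which resources must be refreshed when another one changes."""

    def __init__(self) -> None:
        # Reverse index: resource -> set of resources that depend on it.
        self._dependents = defaultdict(set)

    def add_dependency(self, resource: str, depends_on: str) -> None:
        """Record that `resource` must be refreshed when `depends_on` changes."""
        self._dependents[depends_on].add(resource)

    def affected_by(self, changed: str) -> list:
        """Return all direct and transitive dependents, nearest first."""
        seen = {changed}
        order = []
        queue = deque([changed])
        while queue:
            current = queue.popleft()
            # Sorted only to make the traversal order deterministic.
            for dependent in sorted(self._dependents.get(current, ())):
                if dependent not in seen:
                    seen.add(dependent)
                    order.append(dependent)
                    queue.append(dependent)
        return order


tracker = DependencyTracker()
tracker.add_dependency("html:Main_Page", "template:Infobox")
tracker.add_dependency("summary:Main_Page", "html:Main_Page")
print(tracker.affected_by("template:Infobox"))
```

The `seen` set keeps the walk finite even if dependencies form a cycle. A real implementation would presumably build the index from link tables rather than holding it in memory.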

Dependencies

  • Performance for reliable event production.
  • Analytics for ongoing EventBus development.

See also:


Event Timeline

GWicke raised the priority of this task from to High.
GWicke updated the task description.
GWicke renamed this task from Services roadmap January - March 2016 (Q3 2015/16) to Services team goals January - March 2016 (Q3 2015/16). Nov 17 2015, 5:25 PM

After a discussion we've set 4 groups of goal candidates. Here are my thoughts on them, ordered by priority.

Note: **Petr** marks tasks I'm most interested in (it doesn't mean I will manage to do all of them, just marks the pieces I'd like to work on).

Migrate several job queue use cases to the change propagation service

  • Reliable event production. The key goal for this group. We can't replace the job queue until the events are produced reliably.
  • Replace job queue use cases. After the previous point is done, this one becomes pretty easy - we could start by just porting the job queue handlers to change propagation handlers without any modification.
  • Light-weight Kafka replacement. We'd need this at least for testing anyway. Could be done independently of anything. **Petr**
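As a rough illustration of the last two points, a test-only stand-in for Kafka could be as small as an append-only log per topic with per-consumer offsets. This is a sketch under assumptions, not a proposal for the actual implementation: `InMemoryEventBus`, `refresh_links_job` and the `page_edit` topic are hypothetical names.

```python
from collections import defaultdict
from typing import Callable


class InMemoryEventBus:
    """Append-only topic logs with per-consumer offsets, for tests only."""

    def __init__(self) -> None:
        self._logs = defaultdict(list)    # topic -> list of events
        self._offsets = defaultdict(int)  # (topic, consumer) -> next index

    def produce(self, topic: str, event: dict) -> int:
        """Append an event and return its offset in the topic log."""
        self._logs[topic].append(event)
        return len(self._logs[topic]) - 1

    def consume(self, topic: str, consumer: str,
                handler: Callable[[dict], None]) -> int:
        """Deliver unread events to `handler`; return how many were handled.

        The offset advances only after the handler returns, so an event
        whose handler raises is delivered again on the next call
        (at-least-once delivery).
        """
        handled = 0
        log = self._logs[topic]
        key = (topic, consumer)
        while self._offsets[key] < len(log):
            handler(log[self._offsets[key]])
            self._offsets[key] += 1
            handled += 1
        return handled


def refresh_links_job(params: dict) -> None:
    """Hypothetical job queue handler, used without modification."""
    print("refreshing links for", params["title"])


bus = InMemoryEventBus()
bus.produce("page_edit", {"title": "Main_Page"})
# The job handler is subscribed as-is; the event payload doubles as job params.
bus.consume("page_edit", "refresh_links", refresh_links_job)
```

Because each consumer has its own offset, several handlers can read the same topic independently, which is the property that lets existing job handlers be attached one at a time.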

Community engagement

We haven't been paying any attention to this, so it's important to pay the debt in this area. Although I'm really interested, I'm not good enough at producing high-quality English texts, so in case I take this task, I'd need assistance in converting my writings to a readable form. Here's a list of actions I see as viable:

  • Finish RESTBase-The-Framework. A key point here is to provide rich documentation and a set of step-by-step guides on how to create applications on the framework. Writing those guides would allow us to apply the framework to different problems, allowing us to identify issues in generality and usability, thus improving the quality of our own software. **Petr**
  • Convert RESTBase to ES6. As ES6 is getting more and more adoption, stories about big project conversions, the results, key challenges and performance impact are getting a lot of attention. It's not clear whether we would win or lose on performance, but even if we figure out that we should wait on that, the story is still worth publishing. **Petr**
  • A showcase on converting an endpoint to RESTBase. We keep having discussions on whether something should be proxied in RESTBase or not. We could show some examples of how easy it is to improve latency with RESTBase. One example is the recent work on the summary endpoint - we've cut latency by a factor of 2 compared to the PHP endpoint - a pretty nice result without writing a single line of code.

Storage: Separate hot from warm content

The 3 subtasks of this goal are dependent on one another, so they can't be done in parallel.

  • Store current versions separately, so that they are cheaper to access & can be replicated to PoPs with modest resources. While PoPs look too ambitious for this quarter, just separating the content is a nice achievement. +1 from me on this. **Petr**
  • Improve compression ratios for old revisions, possibly at the cost of some access latency. A research task without clear achievement criteria, dependent on the previous one. But as the achievement criteria are not clear, we can still go for it, as the risk of failing this is low.

I've intentionally excluded the "Import full wikitext history" part from this group, as it's dependent on the results of compression ratio improvements, so it's too ambitious to count on good research outcomes by setting it as an explicit goal.
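To make the hot/warm split above a bit more concrete, here is a minimal in-memory sketch. It assumes zlib as a placeholder compressor and plain dicts as placeholder storage; `TieredRevisionStore` and its methods are hypothetical and say nothing about how RESTBase or Cassandra would actually lay this out.

```python
import zlib


class TieredRevisionStore:
    """Keeps the latest revision per title hot; older ones compressed."""

    def __init__(self) -> None:
        self._hot = {}   # title -> (revision id, raw HTML bytes)
        self._warm = {}  # (title, revision id) -> zlib-compressed bytes

    def put(self, title: str, rev_id: int, html: bytes) -> None:
        current = self._hot.get(title)
        if current is not None and rev_id < current[0]:
            # Older than the current revision: goes straight to warm storage.
            self._warm[(title, rev_id)] = zlib.compress(html, 9)
            return
        if current is not None and rev_id > current[0]:
            # Demote the previous current revision before replacing it.
            old_id, old_html = current
            self._warm[(title, old_id)] = zlib.compress(old_html, 9)
        self._hot[title] = (rev_id, html)

    def get_current(self, title: str) -> bytes:
        """Cheap path: no decompression needed."""
        return self._hot[title][1]

    def get_revision(self, title: str, rev_id: int) -> bytes:
        """Slower path for old revisions: pays for decompression."""
        current = self._hot.get(title)
        if current is not None and current[0] == rev_id:
            return current[1]
        return zlib.decompress(self._warm[(title, rev_id)])
```

Only the hot tier would need to be replicated to PoPs in this picture, which is why the separation has to come before the compression work.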

Continue REST API build-out and service development mentoring

This is an ongoing task we have every quarter, however it doesn't have any clear points on what we have to achieve.

  • Focus on high-traffic end points. (Cover one PHP endpoint with RESTBase) It's not clear enough which endpoint we will want to cover. Analysis made in the current quarter showed that it's not that easy to find the right candidate, so we need to find one before stating a goal like this.
  • Develop a longer term layout for the REST API. The outcome of this task is not clear or measurable, so I don't think we need to set it as a goal.

In general, I'm wondering if we should mark that as a goal at all. This is an ongoing task which we will work on anyway, so what's the point in setting it as a goal? We don't set "bug-fixing" or "making new features in RESTBase" as a goal, how is this different?

I'm very much interested in Services support for Reading Web's flagship Q3 goal T120341: [GOAL] Make Wikipedia more accessible to all connections with new fast API-driven web experience in mobile web beta. Part of our request is for implementation, part of it is for coaching, all of it is for the good of the world.

Eevans updated the task description.

I'm very much interested in Services support for Reading Web's flagship Q3 goal T120341: [GOAL] Make Wikipedia more accessible to all connections with new fast API-driven web experience in mobile web beta. Part of our request is for implementation, part of it is for coaching, all of it is for the good of the world.

This is a really noteworthy goal, and I agree that all of it is for the good of the world :) That said, I need to express my doubt about adopting this as a goal for the next quarter. The MobileWeb and MobileApps sub-teams seem to have diverged a bit in the way they plan to achieve it while still having (more-or-less) the same big-picture goals. I think it would be quite good to determine a joint strategy for reaching this goal. Primarily, I'm thinking here about laying out a plan for code reuse between the Web and the Apps.

From my side, I'd be willing to provide support and coaching with everything pertaining to server-side fall-back. I also think it'd be quite important to focus the coaching in the dev-ops department so that you are able to develop iteratively and independently in the long term.

Disclaimer: This is my view based on (limited) information I have about your goal, so feel free to tell me to mind my own business. However, judging by the task's description, it seems to me there is no consensus on how this will be achieved.

A couple of notes on the goals from my POV.

IMHO, we should focus on goals 1, 2 and 4:

  • Strengthen: Continue REST API build-out and service-development mentoring
  • Focus: Community engagement
  • Experiment: Migrate job-queue jobs to the event bus

From the task's description I can see that there are various aspects of each goal we are interested in. That is quite good, but I do think we should set goal owners for each goal, primarily because most of our goals are actually multi-quarter endeavours which we need to consistently track.

Of the three mentioned goals, the community-engagement one seems the most important, mostly because we have been actively neglecting it. Also, bringing RB closer to third-party users will help us in the long term, especially in our effort to bring the REST API closer to the MW API.

Migrating job-queue jobs is another long-term project worthy of our time. It ought to bring benefits not only to WMF-run wikis (getting rid of Redis eventually), but to MW-the-SW (have only one way to signal events and deliver them) and MW admins as well (easier set-up using a SOA approach).

Finally, for the first goal, in my view, the most important thing we'd need to focus on is defining a long-term plan for the development and maintenance of all existing and future services (not just Node.js-powered ones).

In general, I'm wondering if we should mark that as a goal at all. This is an ongoing task which we will work on anyway, so what's the point in setting it as a goal? We don't set "bug-fixing" or "making new features in RESTBase" as a goal, how is this different?

I couldn't agree more with the analogy, but the trick here is that a considerable amount of our time goes into this, so we need to account for it.

I would like to see a better testing environment for RESTBase + Cassandra here

I would like to see a better testing environment for RESTBase + Cassandra here

+1

Migrate several job queue use cases to the change propagation service

Sounds like a goal that will be immensely catastrophic for many users, cf. 1.22 and 1.23. https://www.mediawiki.org/wiki/Manual:Job_queue#History
Does the job queuing really need a revolution at every release? What plan will there be to address the fallout?

Per our team discussion yesterday, I updated the task description with our goals for Q3. There will be some more fine-tuning in the exact wording, but the areas are locked down at this point.

For reference, here is the task description as it stood before the team meeting:

1) Continue REST API build-out and service development mentoring

  1. Focus on high-traffic end points.
  2. Thumb API?
  3. Develop a longer term layout for the REST API. {@Eevans upvote}
  4. Figure out a long-term plan for supporting existing services {@mobrovac work-on}
  5. Service development mentoring {@mobrovac go-to-so-work-on}

2) Migrate several job queue use cases to the change propagation service; start dependency tracking

  1. Build on work done in Q2 to replace job queue use cases. {@mobrovac: this should be merged with point 3 below imho}
  2. Support dependency tracking (by integrating with link tables, tracking new dependencies), and start using this for change propagation. {@mobrovac upvote, work-on}
  3. Develop long-term plan for replacement of the job queue {@mobrovac upvote, work-on}
  4. Light-weight Kafka replacement for small-scale/dev deployments {@Pchelolo work-on}
  5. Atomic event production from MediaWiki {@Eevans work-on,upvote, @mobrovac upvote}

3) Storage: Separate hot from warm content; import full wikitext history

  1. Store current versions separately, so that they are cheaper to access & can be replicated to PoPs with modest resources. {@Pchelolo work-on}
  2. Improve compression ratios for old revisions, possibly at the cost of some access latency. {@Eevans work-on,upvote}
  3. Import the full wikitext history
  4. Evaluate RB as an option for a longer-term ExternalStore replacement. {@mobrovac upvote, possibly work-on}

4) Community engagement

  1. Finish the RESTBase-The-Framework work {@Pchelolo work-on, @mobrovac upvote}
  2. Consider using RB-the-FW as a replacement for service-template-node {@mobrovac upvote}
  3. Create a showcase on how to use RESTBase-The-Framework (step-by-step tutorial) {@Eevans && @mobrovac work-on,upvote}
  4. Convert RESTBase to ES6 {@Pchelolo work-on}
  5. Community engagement / marketing: tech talks, blog posts, updated docs & showcase {@Eevans && @mobrovac work-on,upvote}
  6. Push distribution for third party users forward (docker images, https://phabricator.wikimedia.org/T92826#1804775) {@mobrovac upvote, work-on}
  7. Release RESTBase 1.0 (signaling a level of readiness, maturity). {@Eevans work-on,upvote, @mobrovac upvote}

Legend

upvote: indicates an opinion that the item should be included in the goals
work-on: indicates a desire to work on the item

The diff is unreadable... please avoid doing markup changes when you change the text, make markup changes in a separate edit. (Or use a MediaWiki wiki page to make it easier to follow updates.)

We're in Q4 now, so resolving this one.