[Epic] Deploy Revscoring/ORES service in Prod
Closed, Resolved, Public

Description

This card is done when ORES is deployed in the production network.

Note that this card was originally a [Discussion] card and was later changed to an engineering task.

Aaron/Dario: Talk to Mark and Gabriel about scaling and the move towards Prod:

  • more sys than prod
  • Aaron, Dario, Yuvi to meet with Gabriel to discuss whether they will adopt this as a service; if yes, we would need to work out our process with them
  • already started conversation with Mark re: where services like Revscoring will live:
    • need non-Prod/non-Labs place for Revscoring to live = meso-level support

In parallel with T106860: Write down current process and ideal process for Revscoring (request from Wikimania 2015)

Related Objects

Status | Assigned
Resolved | Ladsgroup
Resolved | Halfak
Resolved | Ladsgroup
Invalid | Halfak
Resolved | Halfak
Declined | None
Resolved | yuvipanda
Resolved | awight
Resolved | yuvipanda
Resolved | yuvipanda
Resolved | awight
Resolved | awight
Resolved | awight
Resolved | awight
Resolved | awight
Declined | None
Resolved | Halfak
Resolved | Halfak
Resolved | Halfak
Declined | Halfak
Duplicate | None
Resolved | akosiaris
Resolved | akosiaris
Resolved | RobH
Resolved | Cmjohnson
Resolved | Cmjohnson
Resolved | MoritzMuehlenhoff
Resolved | akosiaris
Resolved | Ladsgroup
Resolved | Ladsgroup
Resolved | Ladsgroup
Resolved | Ladsgroup
Duplicate | None
Resolved | Ladsgroup
Resolved | Ladsgroup
Resolved | akosiaris
Resolved | akosiaris

Event Timeline

Meeting scheduled for 7/28 @ 10:30 PDT

DarTar updated the task description.
DarTar moved this task from Backlog to In Progress on the Research board.

Notes from the meeting: https://etherpad.wikimedia.org/p/revscoring_and_services

My summary:

  • to ensure security and allow experimentation, we'd ideally want to deploy this in a semi-prod vlan, accessible from prod but without access to the prod-internal network; however, this depends on scarce network engineering resources, which makes it unrealistic in the short term
  • as a result, the decision is to deploy the service on the prod network (for now)
    • the service is already puppetized
    • yuvi will take lead on python packaging
    • services will help with general deploy workflow, monitoring, logging, in collaboration with research & ops
    • hardware requirements are moderate (currently ~2 cores?), two hw boxes for redundancy should be sufficient with caching / storage
      • use SCB cluster / see T96017?
    • services will provide a public API and caching via RESTBase
GWicke renamed this task from "Talk to Mark and Gabriel about scaling and moving Revscoring towards Prod" to "Revscoring in Production". Jul 28 2015, 7:21 PM
GWicke added subscribers: mobrovac, yuvipanda.
Halfak renamed this task from "Revscoring in Production" to "[Discussion] Revscoring in Production". Jul 30 2015, 8:59 PM

Timeline from my perspective:

  1. Get packaging done, and puppet converted to use the packages, by the end of August. Helped by @awight and @madhuvishy
  2. Get Extension:ORES into a deployable state by the end of August. @Legoktm has been doing great on this
  3. Start the process for provisioning some hardware for this. I think one of the server spares can run the celery bits (so it can take the CPU load) and we can keep the uwsgi server in SCA.
  4. Get Extension:ORES out as a beta feature by end of next month!!!!!!1
  5. Everyone buys everyone else involved in this lots of alcohol or other drinks of choice.

Need to check if we need performance / security review of this.

> Need to check if we need performance / security review of this.

For the MW extension? We will need a security review at least, perf reviews are optional.

@Legoktm yeah, but also for the service itself.

@yuvipanda, yes, service will need its own review if it's running on production hardware or on a project domain. If someone can make separate Tasks for each and tag them with Security-Review, that would be best.

(also hahaa at optimistic schedules :P)

Just for the record, there is no such thing as a "semi prod vlan". Please wait for @csteipp and maybe Moritz to take a look at this.

So list of things that need to be done to actually get this deployed from an operational perspective:

  1. Security Review
  2. Performance Review(??)
  3. Figure out how we're going to expose this to the internet
  4. Figure out which hardware this will live on

Other things that can happen in parallel:

  1. Graphite metrics
  2. Centralized logging.

> Just for the record, there is no such thing as a "semi prod vlan".

Indeed, sadly. It would be great if we could partition off services that don't need access to any internal infrastructure from the regular production network. We want to be able to do requests *from* production to this service, but the service's network access should ideally be limited to public production APIs only.

I think there is a wider need for better network isolation, and a semi-prod vlan could be a stepping stone in that direction. Another option that was brought up for use cases like HTML dumps was bare metal in the labs network. This is a wider discussion, which I think is just starting to happen.
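As a rough illustration of the isolation described above (production may call the service, while the service may reach only public production APIs), here is a minimal sketch. The zone names `prod`, `semi_prod`, `public_api`, and `prod_internal` are hypothetical labels invented for this example; they do not correspond to any existing network configuration.

```python
# Hypothetical zone names, for illustration only; no such VLAN exists.
ALLOWED_FLOWS = {
    ("prod", "semi_prod"),        # production makes requests to the service
    ("semi_prod", "public_api"),  # the service reads public production APIs
}


def is_allowed(src: str, dst: str) -> bool:
    """Return True if traffic from zone `src` to zone `dst` is permitted."""
    return (src, dst) in ALLOWED_FLOWS
```

Anything not listed is denied, so a request from the service to the internal production network would be rejected under this sketch.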

@GWicke @Joe let's take that discussion to T95185? Suffice to say, it's irrelevant to ORES at this point.

This was deployed with all the blockers still open?

Can someone point to the production url where the service is running?

This is not deployed in a prod network. The service lives in wmflabs.

See ores.wmflabs.org and https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service

Not sure why this was closed...

This card is a [Discussion]. That discussion happened. We should either have a new card for revscoring actually making it into production, or rewrite this card's description.

Halfak renamed this task from "[Discussion] Revscoring in Production" to "Deploy Revscoring/ORES service in Prod". Nov 20 2015, 2:35 PM
Halfak updated the task description.

I've seen a few of these tags popping up in task titles. Where are they documented?

No documentation I know of. We just use them as a folksonomy within the revscoring project.

Things that still need to happen:

  1. Import and build debs into production repository
  2. Modify puppet to use debs instead of pip
  3. Set up Redis instances on oresdb hosts
  4. Set up ores on scb cluster
  5. Set up LVS for ORES
  6. Set up varnish endpoint
@akosiaris, I just updated the blocked-by tasks to include tasks for each of the notes that @yuvipanda left. I didn't fill in much for details. Please feel free to ping me if you need more.

Ladsgroup renamed this task from "Deploy Revscoring/ORES service in Prod" to "[Epic] Deploy Revscoring/ORES service in Prod". Mar 12 2016, 7:08 AM
Ladsgroup added a project: Epic.

@akosiaris, can this be assigned to you, since you have already started work?

Yeah. This was about getting ores.wikimedia.org online.

Our plan is to keep ores.wmflabs.org online for the foreseeable future. We'll have a deprecation announcement coming soon to encourage people to move over to ores.wikimedia.org. Eventually ores.wmflabs.org will be reserved for experimental modeling and processing strategies. So tools that rely on experimental or new models will likely keep using it, which will give us real usage patterns for testing performance improvements and similar changes.
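For tool authors moving over, a minimal sketch of a host rewrite might look like the following. It assumes, without confirmation from this task, that request paths and query strings are the same on both hosts; the helper name `migrate_ores_url` and the path in the usage example are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "ores.wmflabs.org"
NEW_HOST = "ores.wikimedia.org"


def migrate_ores_url(url: str) -> str:
    """Point a URL at the production host, leaving other URLs untouched.

    Assumption: paths and query strings are identical on both hosts.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url  # not a wmflabs ORES URL, nothing to do
    # Swap only the host; keep an explicit port if one was given.
    netloc = NEW_HOST if parts.port is None else f"{NEW_HOST}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

For example, `migrate_ores_url("https://ores.wmflabs.org/some/path?x=1")` yields the same path and query on ores.wikimedia.org, while URLs on any other host are returned unchanged.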