[Epic] Deploy Revscoring/ORES service in Prod
Closed, Resolved (Public)

Description

This card is done when ORES is deployed in the production network.

Note that this card was originally a [Discussion] card and was later changed to an engineering task.

Aaron/Dario: Talk to Mark and Gabriel about scaling and the move towards Prod:

more sys than prod
Aaron, Dario, Yuvi to meet w/ Gabriel to ask whether they will adopt this as a service; if yes, we would need to work out our process with them
already started conversation with Mark re: where services like Revscoring will live:
- need non-Prod/non-Labs place for Revscoring to live = meso-level support

In parallel with T106860: Write down current process and ideal process for Revscoring (request from Wikimania 2015)

Related Objects

Status	Assigned	Task
Resolved	Ladsgroup	T130210 [Epic] Deploy ORES extension as a Beta feature
Resolved	Halfak	T140002 [Epic] Deploy ORES review tool
Resolved	Ladsgroup	T130212 Deploy ORES review tool to wikidatawiki
Invalid	Halfak	T106398 Revscoring tasks from Wikimania discussions
Resolved	Halfak	T106867 [Epic] Deploy Revscoring/ORES service in Prod
Declined	None	T107493 Python packaging for getting ORES into production
Resolved	yuvipanda	T107964 Create a mediawiki-utilities debian package
Resolved	awight	T107970 Create a python package for 'stopit' module
Resolved	yuvipanda	T107971 Create debian package for yamlconf
Resolved	yuvipanda	T107972 Create a debian package for revscoring
Resolved	awight	T108451 Create a debian package for scikit-learn
Resolved	awight	T110658 Update debian package for python3-scipy to >= 0.14.1
Resolved	awight	T107974 Create a debian package for pylru
Resolved	awight	T107989 Create a debian package for python3-socketio-client 0.5.6
Resolved	awight	T107991 Create a debian package for python3-jsonpify
Declined	None	T108556 Build ORES dependencies and store objects in repo
Resolved	Halfak	T108421 Setup configurable logging support for ORES
Resolved	Halfak	T110072 Security Review of Revscoring
Resolved	Halfak	T115534 Set up backpressure for ORES (Limit queue sizes in Celery)
Declined	Halfak	T119435 Setup gerrit mirror of the repos from GitHub
Duplicate	None	T124199 Modify puppet to use <something> for storing dependencies/virtualenv
Resolved	akosiaris	T124200 Setup redises on oresdb hosts
Resolved	akosiaris	T125562 setup/deploy oresrdb1001-oresrdb1002
Resolved	RobH	T119598 eqiad: (2) servers request for ORES
Resolved	• Cmjohnson	T121578 Rack 8 new misc servers
Resolved	• Cmjohnson	T125565 Update Label for oresrdb1001 (WMF4577) & relocate and update label for oresrdb1002 (WMF4578)
Resolved	MoritzMuehlenhoff	T125256 jessie installer fails after partitioning stage- same recipe works on trusty and a it worked few weeks ago
Resolved	akosiaris	T124201 Setup ores on scb cluster
Resolved	Ladsgroup	T129109 Switch wmflabs ORES to deploy using python wheels
Resolved	Ladsgroup	T129110 Make requirements.txt in ores-wikimedia-config expand dependencies and pin versions
Resolved	Ladsgroup	T129458 Drop usage of mw library in wikiclass
Resolved	Ladsgroup	T129114 Modify puppet/fabric to make ores on labs deploy from wheels
Duplicate	None	T129113 Switch wikimedia ORES deploy to use bdist wheels
Resolved	Ladsgroup	T129112 Build a git repository with all the wheels required to deploy ORES
Resolved	Ladsgroup	T128670 Move to using scap3 for deployment for ORES service
Resolved	akosiaris	T124202 Setup LVS for ORES
Resolved	akosiaris	T124203 Setup varnish endpoint for ORES

Event Timeline


Meeting scheduled for 7/28 @ 10:30 PDT

Halfak updated the task description. Jul 24 2015, 7:25 PM

DarTar triaged this task as High priority. Jul 24 2015, 11:00 PM

DarTar updated the task description.

DarTar moved this task from Backlog to In Progress on the Research board.

Notes from the meeting: https://etherpad.wikimedia.org/p/revscoring_and_services

My summary:

To ensure security and allow experimentation, we'd ideally deploy this in a semi-prod VLAN that is accessible from prod but has no access to the prod-internal network. However, this depends on scarce network engineering resources, which makes it unrealistic in the short term.
As a result, the decision is to deploy the service on the prod network (for now):
- the service is already puppetized
- yuvi will take lead on python packaging
- services will help with general deploy workflow, monitoring, logging, in collaboration with research & ops
- hardware requirements are moderate (currently ~2 cores?), two hw boxes for redundancy should be sufficient with caching / storage
  - use SCB cluster / see T96017?
- services will provide a public API and caching via RESTBase

GWicke renamed this task from Talk to Mark and Gabriel about scaling and moving Revscoring towards Prod to Revscoring in Production. Jul 28 2015, 7:21 PM

GWicke added subscribers: mobrovac, yuvipanda.

GWicke added a subscriber: csteipp. Jul 28 2015, 7:23 PM

GWicke added a subscriber: akosiaris. Jul 28 2015, 7:48 PM

Ricordisamoa subscribed. Jul 30 2015, 6:18 AM

Halfak renamed this task from Revscoring in Production to [Discussion] Revscoring in Production. Jul 30 2015, 8:59 PM

He7d3r updated the task description. Jul 30 2015, 10:54 PM

DarTar moved this task from In Progress to Done (current quarter) on the Research board. Aug 6 2015, 10:32 PM

DarTar added a project: Research-and-Data-2016-Q1. Aug 7 2015, 9:12 PM

GWicke added a subtask: T107493: Python packaging for getting ORES into production. Aug 8 2015, 1:05 AM

Timeline from my perspective:

Get packaging / puppet conversion to use packages done by the end of August. Helped by @awight and @madhuvishy
Get Extension:ORES into a deployable state by the end of August. @Legoktm has been doing great work on this
Start the process for provisioning some hardware for this. I think one of the server spares can run the celery bits (so it can take the CPU load) and we can keep the uwsgi server in SCA.
Get Extension:ORES out as a beta feature by end of next month!!!!!!1
Everyone buys everyone else involved in this lots of alcohol or other drinks of choice.

Need to check if we need performance / security review of this.

In T106867#1520641, @yuvipanda wrote:

Need to check if we need performance / security review of this.

For the MW extension? We will need a security review at least; perf reviews are optional.

@Legoktm yeah, but also for the service itself.

@yuvipanda, yes, service will need its own review if it's running on production hardware or on a project domain. If someone can make separate Tasks for each and tag them with Security-Review, that would be best.

Halfak moved this task from Parked to Completed on the Machine-Learning-Team (Active Tasks) board. Aug 13 2015, 4:26 PM

(also hahaa at optimistic schedules :P)

Just for the record, there is no such thing as a "semi prod vlan". Please wait for @csteipp and maybe Moritz to take a look at this.

So, here is the list of things that need to be done to actually get this deployed from an operational perspective:

Security Review
Performance Review(??)
Figure out how we're going to expose this to the internet
Figure out which hardware this will live on

Other things that can happen in parallel:

Graphite metrics
Centralized logging.

Krenair subscribed. Aug 24 2015, 5:47 PM

Just for the record, there is no such thing as a "semi prod vlan".

Indeed, sadly. It would be great if we could partition off services that don't need access to any internal infrastructure from the regular production network. We want to be able to do requests *from* production to this service, but the service's network access should ideally be limited to public production APIs only.

I think there is a wider need for better network isolation, and a semi-prod vlan could be a stepping stone in that direction. Another option that was brought up for use cases like HTML dumps was bare metal in the labs network. This is a wider discussion, which I think is just starting to happen.

@GWicke @Joe let's take that discussion to T95185? Suffice to say, it's irrelevant to ORES at this point.

GWicke mentioned this in T95185: PoC bare-metal server allocation in labs -- bootstrap mode. Aug 24 2015, 8:58 PM

Halfak closed this task as Resolved. Sep 19 2015, 2:10 PM

This was deployed with all the blockers still open?

Can someone point to the production url where the service is running?

This is not deployed in a prod network. The service lives in wmflabs.

See ores.wmflabs.org and https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service

Halfak closed subtask T108421: Setup configurable logging support for ORES as Resolved. Nov 19 2015, 11:45 PM

Not sure why this was closed...

yuvipanda closed subtask T107196: Set up revscoring entry points in RESTBase as Declined. Nov 20 2015, 4:23 AM

GWicke reopened subtask T107196: Set up revscoring entry points in RESTBase as Open. Nov 20 2015, 6:29 AM

Joe closed subtask T107196: Set up revscoring entry points in RESTBase as Declined. Nov 20 2015, 8:38 AM

This card is a [Discussion]. That discussion happened. We should either have a new card for revscoring actually making it into production, or rewrite this card's description.

Halfak renamed this task from [Discussion] Revscoring in Production to Deploy Revscoring/ORES service in Prod. Nov 20 2015, 2:35 PM

Halfak updated the task description.

Halfak added a subtask: T115534: Set up backpressure for ORES (Limit queue sizes in Celery). Nov 20 2015, 2:38 PM

Halfak closed subtask T115534: Set up backpressure for ORES (Limit queue sizes in Celery) as Resolved.
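
For readers unfamiliar with the idea behind T115534, backpressure by limiting queue size can be sketched roughly as below. This is only an illustration using Python's standard library; it is not the ORES or Celery implementation, which this task does not describe. The names `BoundedScoringQueue`, `QueueFull`, `submit` and `take` are hypothetical.

```python
import queue


class QueueFull(Exception):
    """Raised when the bounded queue cannot accept more work (hypothetical name)."""


class BoundedScoringQueue:
    """Illustrative bounded work queue; not the ORES/Celery mechanism."""

    def __init__(self, max_size):
        # queue.Queue enforces the size limit for us.
        self._queue = queue.Queue(maxsize=max_size)

    def submit(self, rev_id):
        """Accept a revision ID, or fail fast if the queue is already full."""
        try:
            self._queue.put_nowait(rev_id)
        except queue.Full:
            # Backpressure: reject new work instead of letting the backlog grow.
            raise QueueFull(
                f"{self._queue.maxsize} requests already pending; retry later"
            ) from None

    def take(self):
        """Remove and return the oldest pending revision ID."""
        return self._queue.get_nowait()

    def pending(self):
        """Number of requests currently waiting."""
        return self._queue.qsize()
```

With a limit of 2, a third `submit` call raises `QueueFull` until a worker calls `take`, so callers learn about overload right away instead of waiting on an ever-growing backlog.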

GWicke removed a subtask: T107196: Set up revscoring entry points in RESTBase. Nov 20 2015, 3:23 PM

I've seen a few of these tags popping up in task titles. Where are they documented?

No documentation I know of. We just use them as a folksonomy within the revscoring project.

yuvipanda closed subtask T119435: Setup gerrit mirror of the repos from GitHub as Declined. Nov 23 2015, 9:17 PM

akosiaris added a subtask: T119598: eqiad: (2) servers request for ORES. Nov 25 2015, 8:49 AM

GWicke mentioned this in T121240: Network isolation for production and semi-production services. Dec 11 2015, 5:38 PM

Halfak moved this task from Completed to Backlog on the Machine-Learning-Team (Active Tasks) board. Dec 28 2015, 3:34 PM

Halfak moved this task from Backlog to Monitor (long term) on the Machine-Learning-Team (Active Tasks) board.

Anomie mentioned this in T120923: Ask for consensus to enable and deploy ORES extension to production. Jan 2 2016, 5:52 PM

Things that still need to happen:

Import and build debs into production repository
Modify puppet to use debs instead of pip
Setup redises on oresdb hosts
Setup ores on scb cluster
Setup LVS for ORES
Setup varnish endpoint

Halfak added a subtask: T108556: Build ORES dependencies and store objects in repo. Jan 20 2016, 6:44 PM

Halfak added a subtask: T124199: Modify puppet to use <something> for storing dependencies/virtualenv.

Halfak added a subtask: T124200: Setup redises on oresdb hosts.

Halfak added a subtask: T124201: Setup ores on scb cluster. Jan 20 2016, 6:48 PM

Halfak added a subtask: T124202: Setup LVS for ORES.

Halfak added a subtask: T124203: Setup varnish endpoint for ORES.

@akosiaris, I just updated the blocked-by tasks to include tasks for each of the notes that @yuvipanda left. I didn't fill in much for details. Please feel free to ping me if you need more.

Halfak closed subtask T110072: Security Review of Revscoring as Resolved. Jan 21 2016, 3:40 PM

Halfak moved this task from Done (current quarter) to Radar on the Research board. Jan 21 2016, 11:42 PM

RobH mentioned this in T125562: setup/deploy oresrdb1001-oresrdb1002. Feb 2 2016, 7:22 PM

RobH closed subtask T119598: eqiad: (2) servers request for ORES as Resolved.

hashar mentioned this in T127661: Deploy ORES extension to beta cluster. Feb 22 2016, 3:09 PM

yuvipanda closed subtask T107493: Python packaging for getting ORES into production as Declined. Mar 7 2016, 6:45 PM

yuvipanda closed subtask T108556: Build ORES dependencies and store objects in repo as Declined.

Lydia_Pintscher added a subscriber: JanZerebecki. Mar 8 2016, 3:22 PM

Sjoerddebruin subscribed. Mar 11 2016, 1:55 PM

Ladsgroup renamed this task from Deploy Revscoring/ORES service in Prod to [Epic] Deploy Revscoring/ORES service in Prod. Mar 12 2016, 7:08 AM

Ladsgroup added a project: Epic.

Halfak added a subtask: T128670: Move to using scap3 for deployment for ORES service. Mar 14 2016, 8:20 PM

Halfak added a subtask: T129109: Switch wmflabs ORES to deploy using python wheels.

Halfak moved this task from Monitor (long term) to Non-Epic on the Machine-Learning-Team (Active Tasks) board. Mar 16 2016, 2:04 PM

ggellerman edited projects, added Research-Freezer; removed Research. Mar 17 2016, 10:26 PM

ggellerman moved this task from Backlog to Radar on the Research-Freezer board. Mar 17 2016, 10:28 PM

@akosiaris, can this be assigned to you, since you have already started work?

The Epic one? Er, yeah, sure.

akosiaris closed subtask T124200: Setup redises on oresdb hosts as Resolved. Mar 23 2016, 7:11 PM

Halfak added a parent task: T130212: Deploy ORES review tool to wikidatawiki. Apr 3 2016, 6:41 AM

Halfak reopened subtask T128670: Move to using scap3 for deployment for ORES service as Open. Apr 4 2016, 4:50 PM

mark added a project: SRE. Apr 26 2016, 11:02 AM

Ladsgroup closed subtask T128670: Move to using scap3 for deployment for ORES service as Resolved. Apr 26 2016, 3:06 PM

Halfak mentioned this in T134651: NDA for Amir Sarabadani. May 7 2016, 3:31 PM

akosiaris closed subtask T124201: Setup ores on scb cluster as Resolved. Jun 2 2016, 6:02 PM

akosiaris closed subtask T124202: Setup LVS for ORES as Resolved. Jun 3 2016, 8:56 AM

akosiaris closed subtask T124203: Setup varnish endpoint for ORES as Resolved. Jun 6 2016, 9:38 PM

Krinkle subscribed. Jun 8 2016, 9:21 PM

Halfak moved this task from Non-Epic to Completed on the Machine-Learning-Team (Active Tasks) board. Jun 14 2016, 5:10 PM

Ladsgroup closed this task as Resolved. Jun 14 2016, 9:53 PM

Now we have https://ores.wmflabs.org/ and https://ores.wikimedia.org/. Was this task about implementing the latter one?

It was not about the first one.

Yeah. This was about getting ores.wikimedia.org online.

Our plan is to keep ores.wmflabs.org online for the foreseeable future. We'll have a deprecation announcement coming soon to encourage people to move over to ores.wikimedia.org. Eventually ores.wmflabs.org will be reserved for experimental modeling and processing strategies. So, we'll likely have tools that use experimental/new models on it, which will give us real usage patterns for testing performance improvements and that sort of thing.
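
For tool authors who want to move over, a client-side host switch might look something like the sketch below. It only rewrites the hostname and assumes that request paths and query strings are the same on both hosts, which this task does not confirm. The helper name `migrate_ores_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "ores.wmflabs.org"    # labs instance mentioned in this task
NEW_HOST = "ores.wikimedia.org"  # production instance mentioned in this task


def migrate_ores_url(url):
    """Point a labs ORES URL at the production host (hypothetical helper).

    Assumes path and query are valid on both hosts. URLs on any other
    host are returned unchanged. An explicit port on the old host, if
    present, is dropped.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    return urlunsplit(parts._replace(netloc=NEW_HOST))
```

Tools that deliberately use experimental models on ores.wmflabs.org would simply skip this rewrite.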

Phabricator_maintenance removed a subscriber: yuvipanda. Jun 7 2017, 6:48 PM
