
eqiad/codfw: (4)+(4) hardware access request for videoscalers
Closed, ResolvedPublic

Description

Request for at least 2 additional machines for the "video scaler" job queue runners. Software configuration already exists and is in production. (Budget only for 2 if need to purchase new ones, but we can use more if they're available for repurposing and not needed elsewhere.)

Labs Project Tested: production
Site/Location:EQIAD
Number of systems: 2 minimum, but could use 4 or 6 for transition period if available
Service: videoscalers
Networking Requirements: internal
Processor Requirements: 20 cores / 40 threads or more
Memory: recommend 1-2GB per thread (64GB is good for 20/40 config)
Disks: Local disks should have room for at least 10-20 gigabytes of temporary files. No need for super-huge disks.
NIC(s): enough to be able to push ~1-4 gigabyte files to Swift service relatively speedily.
Partitioning Scheme: default app server-style layout
Other Requirements:

We need additional CPU capacity for video scalers to migrate video transcodes from WebM's older VP8 codec to the newer VP9 codec: T63805. This will reduce bandwidth and storage requirements for video playback by about 40% due to VP9's better compression, but encoding is 2-4x slower and the additional CPU headroom would be very helpful -- especially during initial migration when we have to re-encode the backlog while still handling new incoming files.

Note that earlier issues with VP9 and ffmpeg packaging were resolved by our migration to Debian stretch with its newer ffmpeg -- no additional config/packaging should be required.

Event Timeline

brion created this task.Feb 23 2018, 8:23 AM
brion added a comment.Feb 23 2018, 9:00 AM

General capacity note: current version of libvpx can use a varying number of threads for VP9 encoding depending on the resolution. At our current resolutions, this means we can peg up to 14 cores simultaneously when processing a single HD input file:

  • 160p 1 thread
  • 240p 1 thread
  • 360p 2 threads
  • 480p 2 threads
  • 720p 4 threads
  • 1080p 4 threads

Each resolution is a distinct job, so these can spread over multiple active video scalers.

Having more servers will mean we can handle more than one or two files at the same time at all these resolutions, which is a must for re-encoding the back catalog or handling very long uploads like conference presentations while remaining responsive.
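As a minimal sketch of the arithmetic (hypothetical helper names, using the per-resolution thread counts listed above), the fan-out for one HD upload adds up like this:

```python
# Threads libvpx uses per output resolution at our current settings
# (values from the list above; libvpx 1.6.0 behavior).
THREADS_PER_RESOLUTION = {
    "160p": 1,
    "240p": 1,
    "360p": 2,
    "480p": 2,
    "720p": 4,
    "1080p": 4,
}

def peak_threads(resolutions):
    """Peak thread usage if every resolution job for one file runs at once."""
    return sum(THREADS_PER_RESOLUTION[r] for r in resolutions)

# One HD upload fans out to six jobs: up to 14 threads in parallel.
print(peak_threads(THREADS_PER_RESOLUTION))  # 14
```

Since each resolution is its own job, those 14 threads need not land on one machine -- the job queue can spread them across however many video scalers are pooled.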

And if we ever enable UHD resolutions they can use more threads:

  • 1440p 8 threads
  • 2160p 8 threads

The image scalers currently only serve requests for internal wikis and Gilles is in the process of also moving those to Thumbor. Once completed, we could repurpose those servers as video scalers.

brion added a comment.Feb 23 2018, 1:59 PM

@MoritzMuehlenhoff Great! What are the specs on those, for reference?

Our current six eqiad image scalers are Dell PowerEdge R430 with 40 cores (Xeon CPU E5-2650 v3 @ 2.30GHz) and 64G RAM.

brion added a comment.Feb 23 2018, 2:13 PM

Is that dual-socket for 20 cores/40 threads or quad-socket for 40 cores/80 threads? Hyperthreading makes everything confusing. ;) Either way those should work very well as video scalers.

brion added a comment.Feb 23 2018, 2:18 PM

(And would all 6 be available for video scaler use -- I'll happily take them! -- or would we share with a bigger pool?)

brion updated the task description.Feb 23 2018, 2:23 PM

Is that dual-socket for 20 cores/40 threads or quad-socket for 40 cores/80 threads? Hyperthreading makes everything confusing. ;) Either way those should work very well as video scalers.

It's 20 physical cores / 40 threads:

root@mw1293:~# lscpu
Architecture:          x86_64
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
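The lscpu figures above are internally consistent: logical CPUs are sockets × cores per socket × threads per core. A quick sanity check (illustrative only):

```python
# Values from the lscpu output above (mw1293).
sockets = 2
cores_per_socket = 10
threads_per_core = 2

physical_cores = sockets * cores_per_socket        # 20 physical cores
logical_cpus = physical_cores * threads_per_core   # 40, matching "CPU(s): 40"

print(physical_cores, logical_cpus)  # 20 40
```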

(And would all 6 be available for video scaler use -- I'll happily take them! -- or would we share with a bigger pool?)

That's up for Faidon/Mark to decide :-)

brion updated the task description.Feb 23 2018, 2:47 PM

Rough plan is to get two new r430s with roughly the same config as the old image scalers, and also repurpose as many of the old ones as are available.

If any questions on the budget, ping @dr0ptp4kt. :)

If the repurposed ones are needed for other tasks, but can be spared for a few months, then I can use them as video scalers during migration from VP8 to VP9 encoding and then free some or all of them back up.

Joe added a comment.Feb 26 2018, 2:36 PM

I think reusing the imagescalers (which are quite beefy machines) for this purpose is a good idea. I don't think the normal load of the videoscaler cluster merits so many additional machines, but for the duration of the migration this seems like a smart use of our resources.

I support the temporary lease of the imagescalers (all 4 of them in each DC) to the videoscaling infrastructure for the duration of the project.

faidon assigned this task to RobH.Feb 26 2018, 2:47 PM
faidon added a subscriber: faidon.

Sounds good. Note that eqiad has 6 imagescalers (mw1293-mw1298) and codfw has 4 now (mw2244-2245/mw2150-2151), but let's go with reassigning 4+4 for videoscaling for symmetry. (Note that this is blocked on T188062 right now to my knowledge)

faidon renamed this task from Site: (2) hardware access request for videoscalers to eqiad/codfw: (4)+(4) hardware access request for videoscalers.Feb 26 2018, 2:50 PM
faidon changed the task status from Open to Stalled.
faidon triaged this task as Normal priority.
RobH added a comment.Feb 26 2018, 10:39 PM

Sounds good. Note that eqiad has 6 imagescalers (mw1293-mw1298) and codfw has 4 now (mw2244-2245/mw2150-2151), but let's go with reassigning 4+4 for videoscaling for symmetry. (Note that this is blocked on T188062 right now to my knowledge)

I'm uncertain if this requires any kind of triage or work by me or procurement or DC-Ops? It seems that we're going to allocate existing systems to this, and there is nothing for dc-ops to do?

I'm happy to reimage, but as it's just running a script and otherwise all puppet and service-level changes, it seems like it would be handled by some other team within SRE? (Happy to help out if needed though.)

brion added a comment.Feb 28 2018, 3:13 PM

@RobH we'd still like to buy 2 new machines with this configuration, so if/when the ones taken from the image scaler pool are needed elsewhere, we still have enough added capacity to cover ongoing uploads with the increased CPU requirements for VP9 encoding.

RobH added a comment.Mar 5 2018, 7:55 PM

@RobH we'd still like to buy 2 new machines with this configuration, so if/when the ones taken from the image scaler pool are needed elsewhere, we still have enough added capacity to cover ongoing uploads with the increased CPU requirements for VP9 encoding.

This configuration being the mw system config? Also 2 for codfw and eqiad each or just one site at this point?

brion added a comment.Mar 5 2018, 8:03 PM

Yes, the R430 with 20/40 cores/threads and 64GB ram, roughly matching the existing ones from the old image scalers pool.

As long as they can all be used together in general usage, consider splitting the 2x purchase between 1 for eqiad and 1 for codfw. Keeps things more even in case of failover.

RobH mentioned this in Unknown Object (Task).Mar 5 2018, 8:03 PM
RobH mentioned this in Unknown Object (Task).Mar 5 2018, 8:06 PM
RobH added a comment.Mar 5 2018, 8:10 PM

Yes, the R430 with 20/40 cores/threads and 64GB ram, roughly matching the existing ones from the old image scalers pool.
As long as they can all be used together in general usage, consider splitting the 2x purchase between 1 for eqiad and 1 for codfw. Keeps things more even in case of failover.

Yep!

So I've created two subtasks, one for the order at each site. Those have pricing, so they have to remain in the private space, and no pricing discussion can take place here. I'm setting this to stalled pending the ordering of the two subtasks.

@RobH would you please grant me access on those tix?

Reedy removed a subscriber: Reedy.Mar 5 2018, 8:25 PM
RobH added a comment.EditedMar 5 2018, 8:27 PM

@RobH would you please grant me access on those tix?

Done, you can now view the contents of the S4 space. Access to the S4 procurement space is handled via #acl*procurement-review. Please do NOT disclose the contents of any tasks in that space, or copy their contents elsewhere. If you have any questions, let me know. Thanks!

Thanks @RobH, understood.

Joe added a comment.Mar 12 2018, 12:46 PM

Instead of buying more hardware (specifically, 1 server per DC), we should reshuffle things so that we have more videoscaling capacity (that is, temporarily reassign a server from another pool if needed).

Also, I think we're going to change a few things regarding how things work, in the short, middle and long term that could make such hardware requests useless:

  • Since we're switching to changeprop for videoscaling scheduling soon(TM), I plan to merge the videoscaler and jobrunner clusters into one larger cluster that can do both things at the same time. We might need some fine-tuning in terms of concurrency, but that's my current best guess at what we should do (see T188947 to follow the work on that front). This will be done in the short-to-mid term
  • Since jobrunners are now not much different from other appservers, modulo the "jobrunner/jobchron" scripts, I would like to devote a larger cluster to all async operations, including restbase calls for async jobs, ores batch jobs, etc, so that we can separate async processing from serving users properly. This is a medium-term plan
  • Once we move MediaWiki to kubernetes, we will be able to dynamically allocate resources to different functions easily and with flexibility, and individual hardware will become less and less important. This is of course very long term

So, all in all, I do see us merging the videoscalers cluster into the jobrunner one pretty soon; if not, we can always add 5 or even 6 servers per DC to the videoscaling pool.

I'd prefer an estimate of the concurrency we want to support for videoscaling (that is, how many concurrent transcodes we want to support), rather than a number of machines; that will let us correctly size the cluster and, when the time comes, correctly configure changeprop.

brion added a comment.Mar 13 2018, 9:28 PM

*nod* If there's general agreement not to add more dedicated hardware yet, we can just work with the reassigned image scaler servers for now, add more later if needed, and cancel the purchase for now.

On concurrency:

Each job runs one file at one target resolution, which can use from 1 to 4 threads in ffmpeg depending on the resolution:

  • 1x 160p, 240p
  • 2x 360p, 480p
  • 4x 720p, 1080p

Running all resolutions for an HD source file at once thus uses 1 + 1 + 2 + 2 + 4 + 4 = 14 threads at peak, but encoding time is dominated by the top couple of resolutions, so average usage will be more like 7-10 threads in parallel.

As for wall-clock time: encoding in VP9, which I hope to transition to, takes about 2-4x as long as the VP8 encoding we do now. It currently uses about the same number of threads, maxing out what libvpx 1.6.0 can do for threading.

General usage is "spiky": sometimes very low, with no or only a few small uploads, and sometimes a whole batch comes in, including public domain full-length movies or conference videos. I'd aim for a minimum capacity of at least 4 files in parallel at various resolutions, which lets both small files and a large file or two run at the same time without interfering too much.
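A rough sizing sketch for that concurrency target, using figures from this thread (14 peak threads per HD file, 40 threads per R430-class server); this is an upper bound, since average usage per file is closer to 7-10 threads than the 14-thread peak:

```python
import math

# Figures from this thread; the result is a conservative upper bound.
PEAK_THREADS_PER_HD_FILE = 14   # 1+1+2+2+4+4 across all six resolutions
THREADS_PER_SERVER = 40         # R430: 20 cores / 40 threads

def servers_for(parallel_files, threads_per_file=PEAK_THREADS_PER_HD_FILE):
    """Servers needed to run N HD files at peak thread usage simultaneously."""
    return math.ceil(parallel_files * threads_per_file / THREADS_PER_SERVER)

print(servers_for(4))  # 2 servers cover 4 HD files even at peak
```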

For a switchover from WebM VP8 to WebM VP9, we'd be re-running the existing set of all files -- currently about 114,341 video files on Commons, plus a handful on other sites, many of which are below full HD resolution. This could keep many threads busy for some weeks.
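A back-of-envelope for the backlog duration: the per-file encode time and job concurrency below are hypothetical placeholders (real times vary hugely with source length and resolution), only the file count comes from this task:

```python
# Back-of-envelope for the VP8 -> VP9 re-encode backlog.
TOTAL_FILES = 114_341      # current Commons video count (from this task)
AVG_HOURS_PER_FILE = 0.5   # ASSUMED average across all resolutions -- not measured
PARALLEL_JOBS = 40         # ASSUMED cluster-wide transcode concurrency

hours = TOTAL_FILES * AVG_HOURS_PER_FILE / PARALLEL_JOBS
print(f"~{hours / 24:.0f} days")  # on these assumptions, roughly 60 days
```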

Future considerations:

  • If we enable Ultra-HD resolution output, ffmpeg/libvpx can use 8 threads at 1440p and 2160p
  • In libvpx 1.7.0 with a suitable ffmpeg, VP9 encoding can scale wider to use more threads at any resolution, which will allow faster encoding in the general case when few files are in the queue. This would require backported packages for Debian Stretch.
Joe closed subtask Unknown Object (Task) as Declined.Apr 10 2018, 9:07 AM
Joe closed subtask Unknown Object (Task) as Declined.
RobH added a comment.May 1 2018, 6:02 PM

*nod* If there's general agreement not to add more dedicated hardware yet, we can just work with the reassigned image scaler servers for now, add more later if needed, and cancel the purchase for now.

Does this make this task resolved? (Just ensuring I'm not missing anything.)

RobH closed this task as Resolved.May 3 2018, 11:43 PM