eqiad: 1 hardware access request for labs on real hardware (mwoffliner)
Closed, ResolvedPublic

Description

Labs Project Tested: mwoffliner / not applicable
Site/Location: eqiad
Number of systems: 1
Service: labs on real hardware / mwoffliner
Networking Requirements: TBD
Processor Requirements: 8 Core Xeon
Memory: 32G
Disks: at least 500G
NIC(s): Doesn't matter
Partitioning Scheme: big /srv
Other Requirements:

Event Timeline

yuvipanda raised the priority of this task from to Needs Triage.
yuvipanda updated the task description. (Show Details)
yuvipanda added projects: hardware-requests, SRE.
yuvipanda subscribed.
chasemp triaged this task as Medium priority.
chasemp set Security to None.

FWIW, this would help offline server distributions like the XSCE and many others.

@Metanish, are you involved with XSCE? Do you know how they grab the generated files in question? Is it rsync from us or http://kiwix.org/ or none of the above?

Yes. I am one of the volunteers with the project. I believe we grab them off of the kiwix.org download page through a web interface.

More typically though, people keep local copies of ZIM files (again downloaded from kiwix.org) and roll a custom set of ZIM files as needed. To get a sense, please see:
https://docs.google.com/spreadsheets/d/13SA6Grzx70svt_8B4epxaDtbtElg0VlmdcXXg-fth5I/edit#gid=0
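
To make that workflow concrete, here is a minimal sketch (not anything XSCE actually runs) of pulling a hand-picked set of ZIM files onto an offline server. It assumes plain HTTPS downloads from the public Kiwix download site; the filenames and destination directory are purely illustrative.

```python
#!/usr/bin/env python3
"""Minimal sketch: mirror a hand-picked set of ZIM files for an offline server.

Assumptions: files are fetched over HTTPS from the public Kiwix download site;
the filenames and destination directory below are illustrative only.
"""
import pathlib
import urllib.request

BASE = "https://download.kiwix.org/zim/"            # public Kiwix mirror root
WANTED = [
    "wikipedia/wikipedia_en_all_2015-11.zim",       # hypothetical filenames
    "wikipedia/wikipedia_fr_all_2015-11.zim",
]
DEST = pathlib.Path("/srv/zim")                     # matches the "big /srv" layout above

def mirror(names):
    DEST.mkdir(parents=True, exist_ok=True)
    for name in names:
        target = DEST / pathlib.Path(name).name
        if target.exists():                         # naive skip, no checksum check
            continue
        print("fetching", name)
        urllib.request.urlretrieve(BASE + name, target)

if __name__ == "__main__":
    mirror(WANTED)
```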

@yuvipanda or @chasemp: Is labs in a state to support bare metal deployments in eqiad at this time? I wasn't under the impression it was yet the case. (I thought it was pending the codfw test labs deployment to test labs on bare metal.)

If it isn't ready, can we link in any blocking tasks so it's more evident? Right now this is sitting in the hardware-requests project without all the relevant information.

If it is, then I'll claim this task and start the process of allocations.

Thanks!

This box is to work out the alternative hardware case outside of OpenStack, in case Ironic doesn't pan out.

chasemp changed the task status from Open to Stalled. Nov 24 2015, 10:09 PM
RobH added a subscriber: mark.

I only seem to have a single spare that meets these requirements closely:

Dell PowerEdge R420, Dual Intel Xeon E5-2440, 32GB Memory, Dual 300GB SSD, Dual 500GB Nearline SAS

promethium (warranty expired in early 2015)

Then we have one other that is overkill on disk space:

Dell PowerEdge R420, dual Intel Xeon E5-2450 v2 2.50GHz, 64GB Memory, (4) 3TB Disks

wmf4541 (warranty expires on 2017-03-19)

So promethium seems to be the best choice for allocation. I'm assigning this to @mark for his approval, as it's allocating an out-of-warranty spare (though one of the newer out-of-warranty spares) into further use. (This isn't as bad as most of the eqiad spares, which are well out of warranty but have far less memory and are unsuitable for this request.)

@mark: Please comment relevant approvals/questions/clarifications and assign back to either me for implementation or to Yuvi for followup questions.

Thanks!

@yuvipanda

What does the implementation of this task look like?
This need/request is still really valid.

I just spoke to Kelson about this, and I'm willing to set this up if we can provide him with the hardware. Adding a second bare-metal server will be an 'interesting' test case, and Kelson understands the work and responsibilities that will be needed to maintain the box.

promethium has since been allocated elsewhere (to the wikitextexp project), right? Or was that for this purpose?

Correct, the parsing folks are using Promethium. So we would need them to release it, or to rack an additional misc server here.

Since promethium has now been allocated, I'm taking this task back to find another system to propose for approval for use on this task/project.

Hang on, I'm confused about why we are pursuing this. The last public statement was that we were not going to get entrenched in the hack of putting hardware in the instance subnet. I would like to have a meeting about what we are doing and why before we allocate resources. @RobH, can you table this until further comment?

Sounds fine by me. I'm going to keep this stalled and assign it to @chasemp for his comment following his meeting about this. (If something else should be done instead, please correct me or let me know!)

OK, sorry -- I may have overpromised here. Can we get a clear description of what this is for and what specifically is insufficient about labs VMs for the solution? Every time we go down this road we seem to conclude that 'bare metal' is a red herring and doesn't actually get us much.

@Andrew @RobH @chasemp @AlexMonk-WMF Thank you for taking time to answer to this ticket. I'm now back on this topic after a long summer pause.

Purpose:
Create ZIM files of all (big) Wikimedia projects. ZIM files are the only solution for an easily usable offline copy of our projects.

Current situation:
We have been creating these files for years on different systems (by now mostly on labs, except for the few biggest projects).

Problem:
We cannot really do monthly snapshots (which would be a good thing), and we have pretty serious difficulties creating ZIM files of the biggest Wikipedias (EN, DE, FR). The reason is the limitation in hardware resources: our quota blocks us from requesting one or two more large VMs of the XL/large flavor (and there is still a doubt that even the large VM will be able to create a ZIM of WPEN within a month, due to CPU and/or storage limitations).
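
A rough back-of-the-envelope, assuming English Wikipedia at about five million articles at the time, shows why fitting a full run into a month is demanding for a single VM:

```python
# Rough feasibility arithmetic (figures are assumptions, not measurements).
articles = 5_000_000                 # assumed size of English Wikipedia, late 2015
seconds_per_month = 30 * 24 * 3600   # 2,592,000 seconds
rate = articles / seconds_per_month
print(round(rate, 2))                # ~1.93 articles rendered per second, sustained
```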

So you just need a quota bump and perhaps a custom flavor?

@AlexMonk Yes, it is highly probable that we could fix the problem that way. A while ago I created a request, T91976, in an attempt to push things in this direction.

A pragmatic approach might be:

  • One additional XL VM for DE/FR.
  • One additional XL VM for EN (but with a bit more storage, something like 300GB).

By doing this, I think we will have enough hardware to generate ZIM files of all the projects once a month.

@Kelson we did a capacity audit and spoke today about what is possible. We are going to create a custom flavor for the mwoffliner project that is an XL VM with an additional 300G, and allocate that quota along with enough for another XL VM. Hopefully this gets you rolling? At some point down the line we may end up talking about breaking this out onto its own hardware (depending on load and capacity), but we can fulfill this now and so want to :) Thanks for your patience. We will try to get this done soon; the coming holiday (US) may delay it a bit.

I've adjusted quotas to allow for creation of two xlarge instances, and added a special flavor called xlarge-xtradisk for your EN instance.
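
For reference, here is a hedged sketch of what such a change could look like using the openstacksdk cloud layer. The project name is taken from this task, but the cloud name, RAM/vCPU/disk sizes, and quota numbers are illustrative assumptions rather than the values actually applied.

```python
"""Hedged sketch of the admin-side change described above: a custom
bigger-disk flavor plus a quota bump for one project. All sizes and the
cloud/project names are illustrative assumptions."""
import openstack

conn = openstack.connect(cloud="admin")   # assumed clouds.yaml entry

# A "bigger-disk xlarge" flavor, loosely mirroring xlarge-xtradisk.
conn.create_flavor(
    name="xlarge-xtradisk",
    ram=16384,     # MB, assumed xlarge sizing
    vcpus=8,
    disk=300,      # GB, the "a bit more storage, something like 300GB" ask
)

# Raise the project's compute quota enough for two extra xlarge instances.
conn.set_compute_quotas(
    "mwoffliner",  # project name from the task title
    instances=10,  # illustrative values only
    cores=32,
    ram=65536,
)
```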

@chasemp @Andrew You are awesome! It looks like you have just fixed a years-old hardware bottleneck problem! Thank you very much!

This comment was removed by Kelson.