
WDQS diskspace is low
Closed, ResolvedPublic

Description

On the WDQS cluster, we are at around 80% disk usage and need to expand the disk space fairly soon. The WDQS internal cluster is at about 40%, so it is fine for now.

/dev/mapper/wdqs1003--vg-data  686G  521G  131G  80% /srv
/dev/mapper/wdqs1004--vg-data  686G  519G  133G  80% /srv
/dev/mapper/wdqs1005--vg-data  686G  528G  124G  82% /srv
/dev/mapper/wdqs2001--vg-data  686G  506G  146G  78% /srv
/dev/mapper/wdqs2002--vg-data  686G  510G  141G  79% /srv
/dev/mapper/wdqs2003--vg-data  686G  508G  144G  79% /srv
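
For illustration, here is a minimal sketch of the kind of check that flags this condition. The 80% threshold and the `df` parsing are assumptions for the example, not the actual production alerting:

```python
import subprocess

THRESHOLD = 80  # percent; illustrative, not the real alerting threshold

def overfull_filesystems(threshold=THRESHOLD):
    """Return (device, use%, mountpoint) for filesystems at or above the threshold."""
    out = subprocess.run(
        ["df", "--output=source,pcent,target"],
        capture_output=True, text=True, check=True,
    ).stdout
    flagged = []
    for line in out.splitlines()[1:]:  # skip the header line
        device, pcent, target = line.split()
        use = int(pcent.rstrip("%"))
        if use >= threshold:
            flagged.append((device, use, target))
    return flagged

if __name__ == "__main__":
    for device, use, target in overfull_filesystems():
        print(f"WARNING: {device} mounted at {target} is {use}% full")
```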

Related Objects

Status     Assigned
Resolved   Gehel
Resolved   Gehel
Resolved   Cmjohnson
Resolved   Smalyshev
Resolved   Cmjohnson

Event Timeline

Restricted Application added a subscriber: Aklapper.
Smalyshev added a subscriber: Gehel.

We have a "sleeping" task to order new disks: T186526

Smalyshev added a subtask: Unknown Object (Task). · Jun 5 2018, 5:34 PM
RobH closed subtask Unknown Object (Task) as Resolved (edited). · Jul 2 2018, 9:13 PM
RobH added a subscriber: RobH.

> We have a "sleeping" task to order new disks: T186526

That task has been resolved; there are now 4 sub-tasks off it for ordering the SSDs for each of the hosts listed in this task. As they are procurement tasks, they are non-public (hence this update here).

faidon reopened subtask Unknown Object (Task) as Open. · Jul 11 2018, 1:24 PM
RobH closed subtask Unknown Object (Task) as Resolved. · Aug 6 2018, 6:06 PM

Please note all sub-tasks for additions to the wdqs cluster have been created and are linked off this task.

Next steps are coordination between @Gehel and the on-site engineers to add the disks. As the disks are hot-swap additions (not replacements), this shouldn't result in any downtime. However, it's always best to be logged in and watching the host when swapping hardware.

To avoid duplicating information on each of the child tasks, I'll add anything that is common to all of them here.

We'll take this occasion to reimage the systems, so that we can also validate that we have a working partman configuration with the new disks. Newer wdqs servers use the raid10-gpt-srv-lvm-ext4 recipe, and we should use the same. We will lose a bit of disk space, since the new disks are slightly larger than the old ones (960 GB vs 800 GB), but we are unlikely to need those extra gigabytes for a few years, and by then these systems will be out of warranty. So let's choose simplicity and coherence over maximizing disk space we're probably never going to need.
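
As a back-of-the-envelope check of that trade-off (the four-SSD mixed layout and the default near-2 mdadm arrangement are assumptions; the per-host drive count isn't stated in this task), RAID10 caps every member at the size of the smallest device, so the extra 160 GB on each larger disk simply goes unused:

```python
def raid10_usable_gb(disk_sizes_gb):
    """Usable capacity of an mdadm RAID10 (near-2) array: every member is
    capped at the smallest device, and half the members hold mirror copies."""
    if len(disk_sizes_gb) % 2:
        raise ValueError("RAID10 needs an even number of members")
    return min(disk_sizes_gb) * len(disk_sizes_gb) // 2

# Hypothetical mixed layout: two original 800 GB SSDs plus two new 960 GB SSDs.
print(raid10_usable_gb([800, 800, 960, 960]))  # 1600 GB; 2 x 160 GB is left unused
```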

Note that the data import after reimaging can be done by copying the data over from wdqs1010, which was reimported recently. The procedure is documented at https://wikitech.wikimedia.org/wiki/Wikidata_query_service#Data_transfer_procedure.
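
For a rough idea of the shape of that transfer, here is a hedged sketch of the copy step. The journal path, the service unit name, and the use of rsync are assumptions for the example; the wikitech page above is the authoritative procedure:

```python
import subprocess

SOURCE_HOST = "wdqs1010.eqiad.wmnet"  # recently reimported host, per the comment above
JOURNAL = "/srv/wdqs/wikidata.jnl"    # assumed Blazegraph journal path; check the wikitech page

def copy_journal_from_source():
    """Illustrative only: run on the freshly reimaged destination host.
    Blazegraph must not be writing to the journal on either side while copying."""
    subprocess.run(["systemctl", "stop", "wdqs-blazegraph"], check=True)  # assumed unit name
    subprocess.run(
        ["rsync", "-a", "--progress", f"{SOURCE_HOST}:{JOURNAL}", JOURNAL],
        check=True,
    )
    subprocess.run(["systemctl", "start", "wdqs-blazegraph"], check=True)
```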

Change 455563 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] configured wdqs to use RAID10

https://gerrit.wikimedia.org/r/455563

Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs2003.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201808271540_gehel_3833.log.

Change 455563 merged by Gehel:
[operations/puppet@production] configured wdqs to use RAID10

https://gerrit.wikimedia.org/r/455563

Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs2003.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201808271550_gehel_5518.log.

Completed auto-reimage of hosts:

['wdqs2003.codfw.wmnet']

and were ALL successful.

Gehel claimed this task.

New SSD in place, server reimaged and data reimported. We're all good!