
Deploy ceph osd processes to data-engineering cluster
Closed, ResolvedPublic5 Estimated Story Points

Description

We have now acquired the Ceph servers identified in T310195: Ceph Data Infrastructure Request, and they have been racked in T311869.

This is the 6th subtask in Epic T324660: Install Ceph Cluster for Data Platform Engineering

The purpose of this task is to enable puppet management of the OSDs in the new Ceph cluster.

An OSD is an object storage daemon: in essence, one software process that runs for every hard drive or solid-state drive in the cluster.

The new Ceph cluster in question will have:

  • 12 OSDs per node in the hdd storage class - each drive is 18 TB in size
  • 8 OSDs per node in the ssd storage class - each drive is 8 TB in size
  • 5 nodes in the cluster.

That makes (12 + 8) × 5 = 100 OSDs in the cluster to begin with.
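
As a quick sanity check, the topology above works out as follows (a sketch using only the numbers from the list; the raw-capacity figures are simple multiplication, before any replication or erasure-coding overhead):

```ruby
# Sanity check of the planned OSD topology described above.
hdd_osds_per_node = 12   # 18 TB HDDs per node
ssd_osds_per_node = 8    # 8 TB SSDs per node
nodes = 5

total_osds = (hdd_osds_per_node + ssd_osds_per_node) * nodes
raw_hdd_tb = hdd_osds_per_node * 18 * nodes
raw_ssd_tb = ssd_osds_per_node * 8 * nodes

puts total_osds   # 100
puts raw_hdd_tb   # 1080
puts raw_ssd_tb   # 320
```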

The puppet module must facilitate operational tasks such as identification and replacement of a failed device.

Details

Related Changes in Gerrit:
Repo | Branch | Lines +/-
operations/puppet | production | +10 -0
operations/puppet | production | +1 -4
operations/puppet | production | +1 -1
operations/puppet | production | +8 -1
operations/puppet | production | +5 -5
operations/puppet | production | +16 -7
operations/puppet | production | +4 -4
operations/puppet | production | +19 -17
operations/puppet | production | +1 -1
operations/puppet | production | +4 -0
operations/puppet | production | +2 -2
operations/puppet | production | +2 -1
operations/puppet | production | +2 -2
operations/puppet | production | +724 -26
operations/puppet | production | +59 -0
operations/puppet | production | +42 -22
operations/puppet | production | +1 -0
operations/puppet | production | +2 -2
operations/puppet | production | +38 -9
operations/puppet | production | +30 -0
operations/puppet | production | +325 -6

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 909707 merged by Btullis:

[operations/puppet@production] Add the perccli utility to the new Ceph servers

https://gerrit.wikimedia.org/r/909707

Change 910460 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Initial commit of a custom ceph_disks fact

https://gerrit.wikimedia.org/r/910460

Change 910460 merged by Btullis:

[operations/puppet@production] Add a custom ceph_disks fact

https://gerrit.wikimedia.org/r/910460

With sincere thanks to @jbond we now have a ceph_disks fact that allows us to map the storage slots reliably to their wwn values. For example, on cephosd1001:

btullis@cephosd1001:~$ sudo facter -p ceph_disks
{
   5f4ee0803cba8e00 => {
    status => "Success",
    model => "Dell HBA330 Mini",
    disks => {
      c0/e23/s0 => {
        controller => "0",
        enclosure => "23",
        slot => "0",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9BB2BB4",
        serial => "ZR5BQ14S"
      },
      c0/e23/s1 => {
        controller => "0",
        enclosure => "23",
        slot => "1",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9B553C4",
        serial => "ZR5BNHDF"
      },
      c0/e23/s2 => {
        controller => "0",
        enclosure => "23",
        slot => "2",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9BB85D8",
        serial => "ZR5BPZ8Q"
      },
      c0/e23/s3 => {
        controller => "0",
        enclosure => "23",
        slot => "3",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9BB1044",
        serial => "ZR5BNZ5V"
      },
      c0/e23/s4 => {
        controller => "0",
        enclosure => "23",
        slot => "4",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9BB4918",
        serial => "ZR5BQ0K4"
      },
      c0/e23/s5 => {
        controller => "0",
        enclosure => "23",
        slot => "5",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9BB1C20",
        serial => "ZR5BPR9R"
      },
      c0/e23/s6 => {
        controller => "0",
        enclosure => "23",
        slot => "6",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9BB3FDC",
        serial => "ZR5BP8E5"
      },
      c0/e23/s7 => {
        controller => "0",
        enclosure => "23",
        slot => "7",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9BB3140",
        serial => "ZR5BP8R4"
      },
      c0/e23/s8 => {
        controller => "0",
        enclosure => "23",
        slot => "8",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9BB3FA4",
        serial => "ZR5BP8E6"
      },
      c0/e23/s9 => {
        controller => "0",
        enclosure => "23",
        slot => "9",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9B58AF8",
        serial => "ZR5BNQQM"
      },
      c0/e23/s10 => {
        controller => "0",
        enclosure => "23",
        slot => "10",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9B560B8",
        serial => "ZR5BNH1N"
      },
      c0/e23/s11 => {
        controller => "0",
        enclosure => "23",
        slot => "11",
        medium => "HDD",
        interface => "SAS",
        wwn => "5000C500D9BB2F68",
        serial => "ZR5BQ13E"
      },
      c0/e23/s16 => {
        controller => "0",
        enclosure => "23",
        slot => "16",
        medium => "SSD",
        interface => "SAS",
        wwn => "58CE38EE21F3C50C",
        serial => "3250A0ATTPA8"
      },
      c0/e23/s17 => {
        controller => "0",
        enclosure => "23",
        slot => "17",
        medium => "SSD",
        interface => "SAS",
        wwn => "58CE38EE21F3C518",
        serial => "3250A0AVTPA8"
      },
      c0/e23/s18 => {
        controller => "0",
        enclosure => "23",
        slot => "18",
        medium => "SSD",
        interface => "SAS",
        wwn => "58CE38EE21EDCD8C",
        serial => "22K0A13ATPA8"
      },
      c0/e23/s19 => {
        controller => "0",
        enclosure => "23",
        slot => "19",
        medium => "SSD",
        interface => "SAS",
        wwn => "58CE38EE21F6A9B0",
        serial => "32B0A01XTPA8"
      },
      c0/e23/s20 => {
        controller => "0",
        enclosure => "23",
        slot => "20",
        medium => "SSD",
        interface => "SAS",
        wwn => "58CE38EE21F6A9DC",
        serial => "32B0A028TPA8"
      },
      c0/e23/s21 => {
        controller => "0",
        enclosure => "23",
        slot => "21",
        medium => "SSD",
        interface => "SAS",
        wwn => "58CE38EE21EDCD68",
        serial => "22K0A131TPA8"
      },
      c0/e23/s22 => {
        controller => "0",
        enclosure => "23",
        slot => "22",
        medium => "SSD",
        interface => "SAS",
        wwn => "58CE38EE21EDCD78",
        serial => "22K0A135TPA8"
      },
      c0/e23/s23 => {
        controller => "0",
        enclosure => "23",
        slot => "23",
        medium => "SSD",
        interface => "SAS",
        wwn => "58CE38EE21EDCD40",
        serial => "22K0A12RTPA8"
      },
      c0/e23/s24 => {
        controller => "0",
        enclosure => "23",
        slot => "24",
        medium => "SSD",
        interface => "SATA",
        wwn => "5ACE42E0254CA5B5",
        serial => "   ENB3N6461I2103U45"
      },
      c0/e23/s25 => {
        controller => "0",
        enclosure => "23",
        slot => "25",
        medium => "SSD",
        interface => "SATA",
        wwn => "5ACE42E0254CA5B9",
        serial => "   ENB3N6461I2103U49"
      }
    }
  }
}

I'll now work on updating the OSD patch so that we can set the desired topology in hiera and then translate this to commands that use the /dev/disk/by-id/wwn-* device names.
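
For illustration, the slot-to-wwn mapping from the fact can be turned into stable udev device names like so (`device_path` is a hypothetical helper for this sketch, not part of the fact itself):

```ruby
# Hypothetical helper: udev exposes each disk as /dev/disk/by-id/wwn-0x<wwn>
# (lower-case hex), so the wwn from the ceph_disks fact gives a stable
# device name regardless of /dev/sdX enumeration order.
def device_path(wwn)
  "/dev/disk/by-id/wwn-0x#{wwn.downcase}"
end

# Using the wwn reported for slot c0/e23/s0 in the fact output above:
puts device_path('5000C500D9BB2BB4')
# /dev/disk/by-id/wwn-0x5000c500d9bb2bb4
```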

That's interesting :)

After installing perccli on the WMCS hosts, it seems to fail at the ruby level :/

root@cloudcephosd1005:~# facter -p ceph_disks
2023-04-24 10:39:42.169470 ERROR puppetlabs.facter - error while resolving custom fact "ceph_disks": undefined method `each_pair' for nil:NilClass

Probably the output of perccli is not what the fact expects (so it gets nil at some step, and then tries to iterate over it):
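
A minimal reproduction of that failure mode, assuming the fact looks up a section that this controller's output simply doesn't have (the JSON here is a cut-down stand-in, not the real perccli output):

```ruby
require 'json'

# If the parsed perccli output lacks the section the fact expects, the hash
# lookup returns nil and the subsequent iteration raises NoMethodError,
# matching the facter error above.
data = JSON.parse('{"Controllers": [{"Command Status": {"Status": "Success"}}]}')
drives = data['Controllers'][0]['Response Data']   # => nil

begin
  drives.each_pair { |slot, info| puts slot }
rescue NoMethodError
  puts 'each_pair called on nil, as in the facter error above'
end

# A defensive fact would skip controllers without the expected section:
(drives || {}).each_pair { |slot, info| puts slot }  # iterates zero times
```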

{
  "Controllers" : [
    {
      "Command Status" : {
        "CLI Version" : "007.1910.0000.0000 Oct 08, 2021",
        "Operating system" : "Linux 5.10.0-21-amd64",
        "Controller" : 0,
        "Status" : "Success",
        "Description" : "None"
      },
      "Response Data" : {
        "Basics" : {
          "Controller" : 0,
          "Model" : "PERC H730P Adapter",
          "Serial Number" : "03H02SB",
          "Current Controller Date/Time" : "04/24/2023, 10:40:43",
          "Current System Date/time" : "04/24/2023, 10:40:50",
          "SAS Address" : "54cd98f0cd3ac000",
          "PCI Address" : "00:18:00:00",
          "Mfg Date" : "03/14/20",
          "Rework Date" : "03/14/20",
          "Revision No" : "A01"
        },
        "Version" : {
          "Firmware Package Build" : "25.5.6.0009",
          "Firmware Version" : "4.300.00-8352",
          "Bios Version" : "6.33.01.0_4.16.07.00_0x06120304",
          "Ctrl-R Version" : "5.18-0702",
          "NVDATA Version" : "3.1511.00-0028",
          "Boot Block Version" : "3.07.00.00-0003",
          "Driver Name" : "megaraid_sas",
          "Driver Version" : "07.714.04.00-rc1"
        },
        "Bus" : {
          "Vendor Id" : 4096,
          "Device Id" : 93,
          "SubVendor Id" : 4136,
          "SubDevice Id" : 8002,
          "Host Interface" : "PCI-E",
          "Device Interface" : "SAS-12G",
          "Bus Number" : 24,
          "Device Number" : 0,
          "Function Number" : 0,
          "Domain ID" : 0
        },
        "Pending Images in Flash" : {
          "Image name" : "No pending images"
        },
        "Status" : {
          "Controller Status" : "Optimal",
          "Memory Correctable Errors" : 0,
          "Memory Uncorrectable Errors" : 0,
          "ECC Bucket Count" : 0,
          "Any Offline VD Cache Preserved" : "No",
          "BBU Status" : 0,
          "PD Firmware Download in progress" : "No",
          "Support PD Firmware Download" : "Yes",
          "Lock Key Assigned" : "No",
          "Failed to get lock key on bootup" : "No",
          "Lock key has not been backed up" : "No",
          "Bios was not detected during boot" : "No",
          "Controller must be rebooted to complete security operation" : "No",
          "A rollback operation is in progress" : "No",
          "At least one PFK exists in NVRAM" : "No",
          "SSC Policy is WB" : "No",
          "Controller has booted into safe mode" : "No",
          "Controller shutdown required" : "No",
          "Controller has booted into certificate provision mode" : "No",
          "Current Personality" : "RAID-Mode "
        },
        "Supported Adapter Operations" : {
          "Rebuild Rate" : "Yes",
          "CC Rate" : "Yes",
          "BGI Rate " : "Yes",
          "Reconstruct Rate" : "Yes",
          "Patrol Read Rate" : "Yes",
          "Alarm Control" : "Yes",
          "Cluster Support" : "No",
          "BBU" : "Yes",
          "Spanning" : "Yes",
          "Dedicated Hot Spare" : "Yes",
          "Revertible Hot Spares" : "Yes",
          "Foreign Config Import" : "Yes",
          "Self Diagnostic" : "Yes",
          "Allow Mixed Redundancy on Array" : "No",
          "Global Hot Spares" : "Yes",
          "Deny SCSI Passthrough" : "No",
          "Deny SMP Passthrough" : "No",
          "Deny STP Passthrough" : "No",
          "Support more than 8 Phys" : "Yes",
          "FW and Event Time in GMT" : "No",
          "Support Enhanced Foreign Import" : "Yes",
          "Support Enclosure Enumeration" : "Yes",
          "Support Allowed Operations" : "Yes",
          "Abort CC on Error" : "Yes",
          "Support Multipath" : "Yes",
          "Support Odd & Even Drive count in RAID1E" : "No",
          "Support Security" : "Yes",
          "Support Config Page Model" : "Yes",
          "Support the OCE without adding drives" : "Yes",
          "Support EKM" : "No",
          "Snapshot Enabled" : "No",
          "Support PFK" : "No",
          "Support PI" : "No",
          "Support Ld BBM Info" : "No",
          "Support Shield State" : "Yes",
          "Block SSD Write Disk Cache Change" : "No",
          "Support Suspend Resume BG ops" : "Yes",
          "Support Emergency Spares" : "No",
          "Support Set Link Speed" : "Yes",
          "Support Boot Time PFK Change" : "No",
          "Support SystemPD" : "Yes",
          "Disable Online PFK Change" : "No",
          "Support Perf Tuning" : "Yes",
          "Support SSD PatrolRead" : "Yes",
          "Real Time Scheduler" : "No",
          "Support Reset Now" : "Yes",
          "Support Emulated Drives" : "Yes",
          "Headless Mode" : "Yes",
          "Dedicated HotSpares Limited" : "No",
          "Point In Time Progress" : "No",
          "Extended LD" : "Yes",
          "Support Uneven span " : "Yes",
          "Support Config Auto Balance" : "No",
          "Support Maintenance Mode" : "No",
          "Support Diagnostic results" : "Yes",
          "Support Ext Enclosure" : "No",
          "Support Sesmonitoring" : "No",
          "Support SecurityonJBOD" : "No",
          "Support ForceFlash" : "No",
          "Support DisableImmediateIO" : "No",
          "Support LargeIOSupport" : "Yes",
          "Support DrvActivityLEDSetting" : "No",
          "Support FlushWriteVerify" : "No",
          "Support CPLDUpdate" : "No",
          "Support ForceTo512e" : "No",
          "Support discardCacheDuringLDDelete" : "No",
          "Support JBOD Write cache" : "Yes",
          "Support Large QD Support" : "No",
          "Support Ctrl Info Extended" : "No",
          "Support IButton less" : "No",
          "Support AES Encryption Algorithm" : "No",
          "Support Encrypted MFC" : "No",
          "Support Snapdump" : "No",
          "Support Force Personality Change" : "No",
          "Support Dual Fw Image" : "No",
          "Support PSOC Update" : "No",
          "Support Secure Boot" : "No",
          "Support Debug Queue" : "No",
          "Support Least Latency Mode" : "Yes",
          "Support OnDemand Snapdump" : "No",
          "Support Clear Snapdump" : "No",
          "Support PHY current speed" : "No",
          "Support Lane current speed" : "No",
          "Support NVMe Width" : "No",
          "Support Lane DeviceType" : "No",
          "Support Extended Drive performance Monitoring" : "No",
          "Support NVMe Repair" : "No",
          "Support Platform Security" : "No",
          "Support None Mode Params" : "No",
          "Support Extended Controller Property" : "No",
          "Support Smart Poll Interval for DirectAttached" : "No",
          "Support Write Journal Pinning" : "No",
          "Support SMP Passthru with Port Number" : "No",
          "Support SnapDump Preboot Trace Buffer Toggle" : "No",
          "Support Parity Read Cache Bypass" : "No",
          "Support NVMe Init Error Device ConnectorIndex" : "No"
        },
        "Enterprise Key management" : {
          "Capability" : "Not Supported"
        },
        "Supported PD Operations" : {
          "Force Online" : "Yes",
          "Force Offline" : "Yes",
          "Force Rebuild" : "Yes",
          "Deny Force Failed" : "No",
          "Deny Force Good/Bad" : "No",
          "Deny Missing Replace" : "No",
          "Deny Clear" : "No",
          "Deny Locate" : "No",
          "Support Power State" : "Yes",
          "Set Power State For Cfg" : "No",
          "Support T10 Power State" : "Yes",
          "Support Temperature" : "Yes",
          "NCQ" : "No",
          "Support Max Rate SATA" : "No",
          "Support Degraded Media" : "No",
          "Support Parallel FW Update" : "No",
          "Support Drive Crypto Erase" : "Yes",
          "Support SSD Wear Gauge" : "No"
        },
        "Supported VD Operations" : {
          "Read Policy" : "Yes",
          "Write Policy" : "Yes",
          "IO Policy" : "Yes",
          "Access Policy" : "Yes",
          "Disk Cache Policy" : "Yes",
          "Reconstruction" : "Yes",
          "Deny Locate" : "No",
          "Deny CC" : "No",
          "Allow Ctrl Encryption" : "No",
          "Enable LDBBM" : "Yes",
          "Support FastPath" : "Yes",
          "Performance Metrics" : "Yes",
          "Power Savings" : "Yes",
          "Support Powersave Max With Cache" : "No",
          "Support Breakmirror" : "Yes",
          "Support SSC WriteBack" : "No",
          "Support SSC Association" : "No",
          "Support VD Hide" : "No",
          "Support VD Cachebypass" : "Yes",
          "Support VD discardCacheDuringLDDelete" : "No",
          "Support VD Scsi Unmap" : "No"
        },
        "HwCfg" : {
          "ChipRevision" : " C0",
          "BatteryFRU" : "N/A",
          "Front End Port Count" : 0,
          "Backend Port Count" : 8,
          "BBU" : "Present",
          "Alarm" : "Absent",
          "Serial Debugger" : "Present",
          "NVRAM Size" : "32KB",
          "Flash Size" : "16MB",
          "On Board Memory Size" : "2048MB",
          "CacheVault Flash Size" : "4.000 GB",
          "TPM" : "Absent",
          "Upgrade Key" : "Absent",
          "On Board Expander" : "Absent",
          "Temperature Sensor for ROC" : "Present",
          "Temperature Sensor for Controller" : "Present",
          "Upgradable CPLD" : "Absent",
          "Upgradable PSOC" : "Absent",
          "Current Size of CacheCade (GB)" : 0,
          "Current Size of FW Cache (MB)" : 0,
          "ROC temperature(Degree Celsius)" : 67,
          "Ctrl temperature(Degree Celsius)" : 67
        },
        "Policies" : {
          "Policies Table" : [
            {
              "Policy" : "Predictive Fail Poll Interval",
              "Current" : "300 sec",
              "Default" : ""
            },
            {
              "Policy" : "Interrupt Throttle Active Count",
              "Current" : "16",
              "Default" : ""
            },
            {
              "Policy" : "Interrupt Throttle Completion",
              "Current" : "50 us",
              "Default" : ""
            },
            {
              "Policy" : "Rebuild Rate",
              "Current" : "30 %",
              "Default" : "30%"
            },
            {
              "Policy" : "PR Rate",
              "Current" : "30 %",
              "Default" : "30%"
            },
            {
              "Policy" : "BGI Rate",
              "Current" : "30 %",
              "Default" : "30%"
            },
            {
              "Policy" : "Check Consistency Rate",
              "Current" : "30 %",
              "Default" : "30%"
            },
            {
              "Policy" : "Reconstruction Rate",
              "Current" : "30 %",
              "Default" : "30%"
            },
            {
              "Policy" : "Cache Flush Interval",
              "Current" : "4s",
              "Default" : ""
            }
          ],
          "Flush Time(Default)" : "4s",
          "Drive Coercion Mode" : "128MB",
          "Auto Rebuild" : "On",
          "Battery Warning" : "On",
          "ECC Bucket Size" : 255,
          "ECC Bucket Leak Rate (hrs)" : 4,
          "Restore Hot Spare on Insertion" : "Off",
          "Expose Enclosure Devices" : "Off",
          "Maintain PD Fail History" : "Off",
          "Reorder Host Requests" : "On",
          "Auto detect BackPlane" : "SGPIO/i2c SEP",
          "Load Balance Mode" : "Auto",
          "Security Key Assigned" : "Off",
          "Disable Online Controller Reset" : "Off",
          "Use drive activity for locate" : "Off"
        },
        "Boot" : {
          "BIOS Enumerate VDs" : 1,
          "Stop BIOS on Error" : "Off",
          "Delay during POST" : 0,
          "Spin Down Mode" : "None",
          "Enable Ctrl-R" : "Yes",
          "Enable Web BIOS" : "No",
          "Enable PreBoot CLI" : "No",
          "Enable BIOS" : "Yes",
          "Max Drives to Spinup at One Time" : 4,
          "Maximum number of direct attached drives to spin up in 1 min" : 0,
          "Delay Among Spinup Groups (sec)" : 12,
          "Allow Boot with Preserved Cache" : "Off"
        },
        "High Availability" : {
          "Topology Type" : "None",
          "Cluster Permitted" : "No",
          "Cluster Active" : "No"
        },
        "Defaults" : {
          "Phy Polarity" : 0,
          "Phy PolaritySplit" : 0,
          "Strip Size" : "64 KB",
          "Write Policy" : "WB",
          "Read Policy" : "Adaptive",
          "Cache When BBU Bad" : "Off",
          "Cached IO" : "Off",
          "VD PowerSave Policy" : "Controller Defined",
          "Default spin down time (mins)" : 30,
          "Coercion Mode" : "128 MB",
          "ZCR Config" : "Unknown",
          "Max Chained Enclosures" : 4,
          "Direct PD Mapping" : "Yes",
          "Restore Hot Spare on Insertion" : "No",
          "Expose Enclosure Devices" : "No",
          "Maintain PD Fail History" : "No",
          "Zero Based Enclosure Enumeration" : "Yes",
          "Disable Puncturing" : "No",
          "EnableLDBBM" : "Yes",
          "DisableHII" : "No",
          "Un-Certified Hard Disk Drives" : "Allow",
          "SMART Mode" : "Mode 6",
          "Enable LED Header" : "No",
          "LED Show Drive Activity" : "Yes",
          "Dirty LED Shows Drive Activity" : "No",
          "EnableCrashDump" : "No",
          "Disable Online Controller Reset" : "No",
          "Treat Single span R1E as R10" : "Yes",
          "Power Saving option" : "Enabled",
          "TTY Log In Flash" : "Yes",
          "Auto Enhanced Import" : "No",
          "BreakMirror RAID Support" : "single span R1",
          "Disable Join Mirror" : "Yes",
          "Enable Shield State" : "No",
          "Time taken to detect CME" : "60 sec"
        },
        "Capabilities" : {
          "Supported Drives" : "SAS, SATA",
          "RAID Level Supported" : "RAID0, RAID1, RAID5, RAID6, RAID10(2 or more drives per span), RAID50, RAID60",
          "Enable SystemPD" : "Yes",
          "Mix in Enclosure" : "Allowed",
          "Mix of SAS/SATA of HDD type in VD" : "Not Allowed",
          "Mix of SAS/SATA of SSD type in VD" : "Not Allowed",
          "Mix of SSD/HDD in VD" : "Not Allowed",
          "SAS Disable" : "No",
          "Max Arms Per VD" : 32,
          "Max Spans Per VD" : 8,
          "Max Arrays" : 128,
          "Max VD per array" : 16,
          "Max Number of VDs" : 64,
          "Max Parallel Commands" : 928,
          "Max SGE Count" : 60,
          "Max Data Transfer Size" : "8192 sectors",
          "Max Strips PerIO" : 128,
          "Max Configurable CacheCade Size(GB)" : 0,
          "Max Transportable DGs" : 0,
          "Enable Snapdump" : "No",
          "Enable SCSI Unmap" : "Yes",
          "Read cache bypass enabled for Parity RAID LDs" : "No",
          "FDE Drive Mix Support" : "No",
          "Min Strip Size" : "64 KB",
          "Max Strip Size" : "1.000 MB"
        },
        "Scheduled Tasks" : {
          "Consistency Check Reoccurrence" : "168 hrs",
          "Next Consistency check launch" : "NA",
          "Patrol Read Reoccurrence" : "168 hrs",
          "Next Patrol Read launch" : "04/29/2023, 03:00:00",
          "Battery learn Reoccurrence" : "2160 hrs",
          "OEMID" : "Dell"
        },
        "Security Protocol properties" : {
          "Security Protocol" : "None"
        },
        "JBOD Drives" : 10,
        "JBOD LIST" : [
          {
            "EID:Slt" : "32:0",
            "DID" : 0,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "223.570 GB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK240TCB",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:1",
            "DID" : 1,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "223.570 GB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK240TCB",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:2",
            "DID" : 2,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:3",
            "DID" : 3,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:4",
            "DID" : 4,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:5",
            "DID" : 5,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:6",
            "DID" : 6,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:7",
            "DID" : 7,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:8",
            "DID" : 8,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:9",
            "DID" : 9,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          }
        ],
        "Physical Drives" : 10,
        "PD LIST" : [
          {
            "EID:Slt" : "32:0",
            "DID" : 0,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "223.570 GB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK240TCB",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:1",
            "DID" : 1,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "223.570 GB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK240TCB",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:2",
            "DID" : 2,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:3",
            "DID" : 3,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:4",
            "DID" : 4,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:5",
            "DID" : 5,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:6",
            "DID" : 6,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:7",
            "DID" : 7,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:8",
            "DID" : 8,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          },
          {
            "EID:Slt" : "32:9",
            "DID" : 9,
            "State" : "JBOD",
            "DG" : "-",
            "Size" : "1.746 TB",
            "Intf" : "SATA",
            "Med" : "SSD",
            "SED" : "N",
            "PI" : "N",
            "SeSz" : "512B",
            "Model" : "MTFDDAK1T9TDN",
            "Sp" : "U",
            "Type" : "-"
          }
        ],
        "Enclosures" : 1,
        "Enclosure LIST" : [
          {
            "EID" : 32,
            "State" : "OK",
            "Slots" : 10,
            "PD" : 10,
            "PS" : 0,
            "Fans" : 0,
            "TSs" : 0,
            "Alms" : 0,
            "SIM" : 1,
            "Port#" : "00 & 00 x8",
            "ProdID" : "BP14G+EXP",
            "VendorSpecific" : " +"
          }
        ],
        "BBU_Info" : [
          {
            "Model" : "BBU",
            "State" : "Optimal",
            "RetentionTime" : "0 hour(s)",
            "Temp" : "31C",
            "Mode" : "-",
            "MfgDate" : "0/00/00"
          }
        ]
      }
    }
  ]
}

Ah, also interesting. Thanks @dcaro.
I wasn't aware that these cloudcephosd hosts were using cards compatible with perccli

I suspect that it could be made to work for these hosts as well without much modification. When I started looking at it, I was only thinking that it would be useful for the cards with host bus adapters and with multiple storage types.
I can see now that the cloudcephosd hosts have PERC H750 RAID controllers, but that they have JBOD mode enabled, so your disks are passed straight through to the O/S.

Perhaps you get a JBOD LIST instead of a PD LIST.

I'm happy to help with making it more applicable to your case as well, if it would help.

Change 911282 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] ceph_disks: ensure all confines are considered

https://gerrit.wikimedia.org/r/911282

I wasn't aware that these cloudcephosd hosts were using cards compatible with perccli

I'm not sure if they do; either way, they don't have the command installed:

cloudcephosd1005 ~ $ sudo which -a  perccli64                                                       
cloudcephosd1005 ~ $

There is currently an issue with the confine, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/911282

Change 911282 merged by Jbond:

[operations/puppet@production] ceph_disks: ensure all confines are considered

https://gerrit.wikimedia.org/r/911282

After installing perccli on the WMCS hosts, it seems to fail at the ruby level :/

Actually, I see it was removed. I'll take another look and send a quick fix.

There is currently an issue with the confine, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/911282

Ah, thanks for that. I wasn't aware that the && was required for multiple confine statements. Sorry about that.

Change 911284 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] ceph_disks: Skip if we dont have drive information:

https://gerrit.wikimedia.org/r/911284

Change 911284 merged by Jbond:

[operations/puppet@production] ceph_disks: Skip if we dont have drive information:

https://gerrit.wikimedia.org/r/911284

There is currently an issue with the confine, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/911282

Ah, thanks for that. I wasn't aware that the && was required for multiple confine statements. Sorry about that.

Yes, the confine is based on the return value of the statement. Most confines are just for a fact and take the form:

# confine fact => value
confine kernel: 'Linux'

When you use a block, it executes the entire block and uses its return value, i.e. the last executed statement. FTR it could also be written as multiple blocks:

confine do
  Facter::Core::Execution.which('perccli64')
end 
confine do
  Facter::Util::Resolution.which('dpkg-query')
end 
confine do
  Facter::Util::Resolution.exec("dpkg-query -W --showformat='${Status}' ceph-osd") == "install ok installed"
end
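
The return-value semantics can be seen with plain Ruby blocks; this is a standalone illustration of why the && chaining matters, not actual Facter code:

```ruby
# A block returns only its last expression, so two checks written as
# separate statements silently discard the first result. This is why a
# single confine block needs its checks chained with &&.
broken_confine = proc do
  false   # result of the first check is thrown away...
  true    # ...only this value is returned
end

fixed_confine = proc do
  false && true   # chained: the combined check is false
end

puts broken_confine.call  # true  (the fact would wrongly resolve)
puts fixed_confine.call   # false (the fact is correctly skipped)
```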

The latest patch fixes the cloudceph issue, or at least stops them from having an error. @dcaro I have left perccli installed on cloudcephosd1005 to test improvements.

Change 911287 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] ceph_disks: add more info based on pd list

https://gerrit.wikimedia.org/r/911287

The latest patch fixes the cloudceph issue, or at least stops them from having an error. @dcaro I have left perccli installed on cloudcephosd1005 to test improvements.

I have added some basic info for the cloudceph host based on the PD list which gives output like the following

{
  "54cd98f0cd3ac000": {
    "status": "Success",
    "model": "PERC H730P Adapter",
    "disks": {
      "32:0": {
        "enclosure": "32:0",
        "slot": 0,
        "medium": "SSD",
        "interface": "SATA"
      }
    }
  }
}

Unfortunately the output doesn't contain the WWN or serial number. It may be possible to get this from elsewhere, e.g. /dev/disk/by-id, however keep in mind that if the RAID controller has configured a vdisk (for each JBOD) then the WWN will represent the vdisk and not the physical disk.

Change 911287 merged by Jbond:

[operations/puppet@production] ceph_disks: add more info based on pd list

https://gerrit.wikimedia.org/r/911287

Unfortunately the output doesn't contain the WWN or serial number. It may be possible to get this from elsewhere, e.g. /dev/disk/by-id, however keep in mind that if the RAID controller has configured a vdisk (for each JBOD) then the WWN will represent the vdisk and not the physical disk.

Thanks!
Super quick fix :)

Hmm, yep, we might not be able to uniquely identify the hard drives with this. I will try to take a look whenever I have some time, though we might have to rely on knowledge of what should be there (e.g. that there's a RAID on the OS drives, or the size of the Ceph drives, etc.) as we currently do.

Unfortunately the output doesn't contain the WWN or serial number. It may be possible to get this from elsewhere.

On cloudcephosd1005 (with a PERC H730P) we can obtain the WWN and serial number for each drive by running perccli64 /call/eall/sall show all, but not perccli64 /call show all

On cephosd100* (with an HBA330 mini) this information is available with either perccli64 /call/eall/sall show all or perccli64 /call show all. We're currently using the second method because it gives us everything we need for these particular servers, but we could change that if we think it will help with the reusability of this code.

It's also worth considering that firmware updates to these storage adapters might potentially change the information that is reported to the perccli64 command.

In T330151#8801265, @dcaro wrote:

Hmm, yep, we might not be able to uniquely identify the hard drives with this, I will try to take a look whenever I have some time, though we might have to rely on knowledge of what should be there (ex. that there's a raid on the OS drives, or the size of the ceph drives, etc.) as we currently do.

I think that we should probably add the size of the drives to the ceph_disks fact too.

I had originally started trying to ascertain the disks to use for Ceph by including only those with SAS interfaces.
For the cephosd100* servers this is fine, because the only SATA disks are the pair used in the O/S software RAID mirror. However, now that I check cloudcephosd1005 I can see that all of the Ceph OSD disks are SATA as well.

btullis@cloudcephosd1005:~$ sudo perccli64 /call/eall/sall show
CLI Version = 007.1910.0000.0000 Oct 08, 2021
Operating system = Linux 5.10.0-21-amd64
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.


Drive Information :
=================

---------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model         Sp Type 
---------------------------------------------------------------------------
32:0      0 JBOD  -  223.570 GB SATA SSD N   N  512B MTFDDAK240TCB U  -    
32:1      1 JBOD  -  223.570 GB SATA SSD N   N  512B MTFDDAK240TCB U  -    
32:2      2 JBOD  -    1.746 TB SATA SSD N   N  512B MTFDDAK1T9TDN U  -    
32:3      3 JBOD  -    1.746 TB SATA SSD N   N  512B MTFDDAK1T9TDN U  -    
32:4      4 JBOD  -    1.746 TB SATA SSD N   N  512B MTFDDAK1T9TDN U  -    
32:5      5 JBOD  -    1.746 TB SATA SSD N   N  512B MTFDDAK1T9TDN U  -    
32:6      6 JBOD  -    1.746 TB SATA SSD N   N  512B MTFDDAK1T9TDN U  -    
32:7      7 JBOD  -    1.746 TB SATA SSD N   N  512B MTFDDAK1T9TDN U  -    
32:8      8 JBOD  -    1.746 TB SATA SSD N   N  512B MTFDDAK1T9TDN U  -    
32:9      9 JBOD  -    1.746 TB SATA SSD N   N  512B MTFDDAK1T9TDN U  -    
---------------------------------------------------------------------------

So I would say that the best thing to do is probably to add the disk size to the ceph_disks fact and then exclude any SSD disks smaller than 1 TB.
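A rough sketch of that size filter, assuming the table layout from the `perccli64 /call/eall/sall show` output above (fields: $1 is EID:Slt, $5 is size, $6 is the unit, $7 interface, $8 medium); this is an illustration, not the actual fact code:

```shell
#!/bin/sh
# Keep only drives of 1 TB or larger, which drops the small O/S SSDs.
filter_ceph_drives() {
    awk '$6 == "TB" && ($5 + 0) >= 1 { print $1, $5, $6, $7, $8 }'
}
```

Applied to the cloudcephosd1005 listing above, this keeps slots 32:2 through 32:9 and drops the two 223 GB boot drives.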

It's worth noting that although reusability across clusters would be great, there isn't any time pressure to make the new modules/ceph/osd* classes compatible with the WMCS (cloudceph) cluster.
However, there is time pressure to get them working on the new Ceph cluster, so I'm fine with leaving a few TODOs in place for cross-cluster compatibility. We don't have to solve all of these issues today; we just need to think about how to avoid tech debt in the future.

JArguello-WMF set the point value for this task to 5.

It feels like I've been working on this patch forever, but I'm almost there, I believe.
This is just a status update, while I continue to work on the patch to get it to the point of requesting a review.

  • The latest pcc run is successful.
  • It shows the patch adding all of the OSD processes to cephosd1001, each using a puppet define of ceph::osd
  • The drives to use are obtained from the new ceph_disks fact that has previously been created.
  • Operating system drives are excluded from this by the use of a hiera value that can be set per role or per host, as required. This identification is based on the storage bay IDs.
  • Similarly, any OSDs can be temporarily absented from the cluster with a hiera lookup, should they fail and need to be replaced.
  • A bluestore DB device can be specified as an option and it will be partitioned according to the number of OSDs (backed by hard drives) that share it.
  • The OSDs are created using their world-wide name (WWN), i.e. /dev/disk/by-id/wwn-*, instead of their /dev/sd* name, since the latter can change across reboots.
  • Both SAS and SATA drives are supported using this approach, despite the fact that the WWN is handled differently by the two architectures.
  • The names of the OSD resources are based on their controller, enclosure, and slot numbers (e.g. c0e23s16) which will make identification more practical later, when any maintenance is required.
  • Unit tests exist and are currently passing.
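A rough sketch of the DB-device partitioning arithmetic from the bullet above: the shared bluestore DB device is divided evenly among the hdd-backed OSDs that use it. The sizes below are purely illustrative; the real manifests derive them from the ceph_disks fact and drive parted accordingly.

```shell
#!/bin/sh
# Split a shared bluestore DB device evenly among the OSDs backed by
# hard drives (12 per node on this cluster).
db_partition_size_mib() {
    device_mib="$1"   # usable size of the shared DB device, in MiB
    num_osds="$2"     # number of hdd-backed OSDs sharing it
    echo $(( device_mib / num_osds ))
}
```

For example, a hypothetical 3,000,000 MiB NVMe device shared by 12 OSDs yields 250,000 MiB per DB partition.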

I'm going to carry on to address the existing comments on the patch.

I've now got this to the point where I think it is ready for review.

I've added @bking and @RKemper from Data-Platform-SRE as reviewers on the Ceph OSD patch for good measure, given that our teams are merging.

I'd be particularly grateful for any reviews from other Ceph admins, such as @dcaro, @aborrero, @MatthewVernon.

I have verified that it is a noop on existing cloudceph servers, so it only affects cephosd100[1-5]. That gives me confidence that it is safe to deploy, but I'd still value a second opinion before proceeding to see if I've missed anything.

I'm picking up my previous work on this.

One of the things that I need to do is to be able to get ceph-volume lvm list --format=json to work, so that I can update the ceph::osd defined type to parse its output.

In order to do that, I'm starting running some manual commands to add and remove the OSDs.

The first command that I've run manually is the one to prepare one of the SSDs for use as an LVM-based OSD. This has no separate bluestore device.

I'm recording the input and output here in case it is useful.

btullis@cephosd1001:~$ sudo ceph-volume lvm prepare --bluestore --data /dev/disk/by-id/wwn-0x58ce38ee21f3c50d --crush-device-class ssd
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new c45294e1-3301-458c-81a2-4e307947bd65
Running command: vgcreate --force --yes ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3 /dev/sdb
 stdout: Physical volume "/dev/sdb" successfully created.
 stdout: Volume group "ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3" successfully created
Running command: lvcreate --yes -l 915707 -n osd-block-c45294e1-3301-458c-81a2-4e307947bd65 ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3
 stdout: Logical volume "osd-block-c45294e1-3301-458c-81a2-4e307947bd65" created.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
--> Executable selinuxenabled not in PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3/osd-block-c45294e1-3301-458c-81a2-4e307947bd65
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-3
Running command: /usr/bin/ln -s /dev/ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3/osd-block-c45294e1-3301-458c-81a2-4e307947bd65 /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
 stderr: 2023-07-13T09:30:57.620+0000 7ff7a2fe2700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2023-07-13T09:30:57.620+0000 7ff7a2fe2700 -1 AuthRegistry(0x7ff79c0607d0) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
 stderr: got monmap epoch 3
--> Creating keyring file for osd.0
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid c45294e1-3301-458c-81a2-4e307947bd65 --setuser ceph --setgroup ceph
 stderr: 2023-07-13T09:30:58.048+0000 7f30affe4240 -1 bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
--> ceph-volume lvm prepare successful for: /dev/sdb

Next I ran the activation command:

btullis@cephosd1001:~$ fsid=$(sudo ceph-volume lvm list /dev/disk/by-id/wwn-0x58ce38ee21f3c50d --format=json | jq -r '.[]|.[]|.tags|."ceph.osd_fsid"')
btullis@cephosd1001:~$ echo $fsid
c45294e1-3301-458c-81a2-4e307947bd65
btullis@cephosd1001:~$ sudo ceph-volume lvm activate 0 $fsid
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3/osd-block-c45294e1-3301-458c-81a2-4e307947bd65 --path /var/lib/ceph/osd/ceph-0 --no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3/osd-block-c45294e1-3301-458c-81a2-4e307947bd65 /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-3
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/systemctl enable ceph-volume@lvm-0-c45294e1-3301-458c-81a2-4e307947bd65
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-c45294e1-3301-458c-81a2-4e307947bd65.service → /lib/systemd/system/ceph-volume@.service.
Running command: /usr/bin/systemctl enable --runtime ceph-osd@0
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@0.service → /lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@0
--> ceph-volume lvm activate successful for osd ID: 0

Looks good.

btullis@cephosd1001:~$ sudo ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-1         3.49309  root default                                   
-3         3.49309      host cephosd1001                           
 0    ssd  3.49309          osd.0             up   1.00000  1.00000
btullis@cephosd1001:~$ sudo ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP  META     AVAIL    %USE  VAR   PGS  STATUS
 0    ssd  3.49309   1.00000  3.5 TiB  7.4 MiB  112 KiB   0 B  7.3 MiB  3.5 TiB     0  1.00    0      up
                       TOTAL  3.5 TiB  7.4 MiB  112 KiB   0 B  7.3 MiB  3.5 TiB     0                   
MIN/MAX VAR: 1.00/1.00  STDDEV: 0

The sizes look good.

Beginning the decom checks for this disk.

btullis@cephosd1001:~$ sudo ceph osd ok-to-stop osd.0
{"ok_to_stop":true,"osds":[0],"num_ok_pgs":0,"num_not_ok_pgs":0}
btullis@cephosd1001:~$ echo $?
0
btullis@cephosd1001:~$ sudo ceph osd safe-to-destroy osd.0
OSD(s) 0 are safe to destroy without reducing data durability.
btullis@cephosd1001:~$ echo $?
0
btullis@cephosd1001:~$ sudo systemctl stop ceph-osd@0
btullis@cephosd1001:~$ sudo ceph osd crush remove osd.0
removed item id 0 name 'osd.0' from crush map
btullis@cephosd1001:~$ echo $?
0
btullis@cephosd1001:~$ sudo ceph auth del osd.0
updated
btullis@cephosd1001:~$ echo $?
0
purged osd.0
btullis@cephosd1001:~$ echo $?
0
btullis@cephosd1001:~$ sudo umount /var/lib/ceph/osd/ceph-0
btullis@cephosd1001:~$ echo $?
0
btullis@cephosd1001:~$ sudo rm -fr /var/lib/ceph/osd/ceph-0
btullis@cephosd1001:~$ echo $?
0
btullis@cephosd1001:~$ sudo ceph-volume lvm zap /dev/disk/by-id/wwn-0x58ce38ee21f3c50d
--> Zapping: /dev/sdb
--> Zapping lvm member /dev/sdb. lv_path is /dev/ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3/osd-block-c45294e1-3301-458c-81a2-4e307947bd65
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3/osd-block-c45294e1-3301-458c-81a2-4e307947bd65 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0261164 s, 402 MB/s
--> --destroy was not specified, but zapping a whole device will remove the partition table
 stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
--> failed to wipefs device, will try again to workaround probable race condition
 stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
--> failed to wipefs device, will try again to workaround probable race condition
 stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
--> failed to wipefs device, will try again to workaround probable race condition
 stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
--> failed to wipefs device, will try again to workaround probable race condition
 stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
--> failed to wipefs device, will try again to workaround probable race condition
 stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
--> failed to wipefs device, will try again to workaround probable race condition
 stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
--> failed to wipefs device, will try again to workaround probable race condition
 stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
--> failed to wipefs device, will try again to workaround probable race condition
-->  RuntimeError: could not complete wipefs on device: /dev/sdb
btullis@cephosd1001:~$ echo $?
1

So everything worked apart from the ceph-volume zap command. I'll look into that a bit more.

OK, I simply needed to add the --destroy option to ceph-volume lvm zap

btullis@cephosd1001:~$ sudo ceph-volume lvm zap /dev/disk/by-id/wwn-0x58ce38ee21f3c50d --destroy
--> Zapping: /dev/sdb
--> Zapping lvm member /dev/sdb. lv_path is /dev/ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3/osd-block-c45294e1-3301-458c-81a2-4e307947bd65
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3/osd-block-c45294e1-3301-458c-81a2-4e307947bd65 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0288098 s, 364 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3
Running command: vgremove -v -f ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3
 stderr: Removing ceph--8bbf2abe--4ef1--4d7a--b978--51cf4d9829f3-osd--block--c45294e1--3301--458c--81a2--4e307947bd65 (253:3)
 stderr: Archiving volume group "ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3" metadata (seqno 6).
 stderr: Releasing logical volume "osd-block-c45294e1-3301-458c-81a2-4e307947bd65"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3" (seqno 7).
 stdout: Logical volume "osd-block-c45294e1-3301-458c-81a2-4e307947bd65" successfully removed
 stdout: Volume group "ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3" successfully removed
 stderr: Removing physical volume "/dev/sdb" from volume group "ceph-8bbf2abe-4ef1-4d7a-b978-51cf4d9829f3"
Running command: pvremove -v -f -f /dev/sdb
 stdout: Labels on physical volume "/dev/sdb" successfully wiped.
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdb bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0236941 s, 443 MB/s
--> Zapping successful for: <Raw Device: /dev/sdb>
btullis@cephosd1001:~$ echo $?
0

Moving to deploy this change now. I expect that we might receive some alerts from Icinga re: disk space etc., so I have added a week's downtime on the servers in Icinga.
As they are still pre-production, I think this is fine. I will disable puppet on the five servers and then begin testing the rollout from 1001 to 1005.

Change 896116 merged by Btullis:

[operations/puppet@production] ceph: Add puppet management of OSDs on new ceph cluster

https://gerrit.wikimedia.org/r/896116

There are a few errors.
It's trying to disable the write cache on the NVMe drive, which I'm not sure was intentional.

Notice: /Stage[main]/Ceph::Osds/Exec[Disable write cache on device /dev/nvme0c0n1]/returns: /dev/nvme0c0n1: No such file or directory
Error: 'hdparm -W 0 /dev/nvme0c0n1' returned 2 instead of one of [0]
Error: /Stage[main]/Ceph::Osds/Exec[Disable write cache on device /dev/nvme0c0n1]/returns: change from 'notrun' to ['0'] failed: 'hdparm -W 0 /dev/nvme0c0n1' returned 2 instead of one of [0]
Notice: /Stage[main]/Ceph::Osds/Exec[Disable write cache on device /dev/nvme0n1]/returns:  HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
Notice: /Stage[main]/Ceph::Osds/Exec[Disable write cache on device /dev/nvme0n1]/returns:  HDIO_DRIVE_CMD(setcache) failed: Inappropriate ioctl for device
Notice: /Stage[main]/Ceph::Osds/Exec[Disable write cache on device /dev/nvme0n1]/returns:  HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
Notice: /Stage[main]/Ceph::Osds/Exec[Disable write cache on device /dev/nvme0n1]/returns: 
Notice: /Stage[main]/Ceph::Osds/Exec[Disable write cache on device /dev/nvme0n1]/returns: /dev/nvme0n1:
Notice: /Stage[main]/Ceph::Osds/Exec[Disable write cache on device /dev/nvme0n1]/returns:  setting drive write-caching to 0 (off)
Error: 'hdparm -W 0 /dev/nvme0n1' returned 25 instead of one of [0]
Error: /Stage[main]/Ceph::Osds/Exec[Disable write cache on device /dev/nvme0n1]/returns: change from 'notrun' to ['0'] failed: 'hdparm -W 0 /dev/nvme0n1' returned 25 instead of one of [0]

Setting the partition label on the NVMe drive failed with an incorrect device name:

Notice: /Stage[main]/Ceph::Osds/Exec[Create gpt label on /dev/nvme0n1]/returns: Error: Could not stat device /dev//dev/nvme0n1 - No such file or directory.
Error: 'parted -s -a optimal /dev//dev/nvme0n1 mklabel gpt' returned 1 instead of one of [0]
Error: /Stage[main]/Ceph::Osds/Exec[Create gpt label on /dev/nvme0n1]/returns: change from 'notrun' to ['0'] failed: 'parted -s -a optimal /dev//dev/nvme0n1 mklabel gpt' returned 1 instead of one of [0]

...however creating the subsequent partitions seemed to work anyway.

Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s0 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s1 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s2 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s3 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s4 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s5 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s6 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s7 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s8 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s9 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s10 on /dev/nvme0n1]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Exec[Create partition db.c0e23s11 on /dev/nvme0n1]/returns: executed successfully

Then it's one of these per OSD:

Error: /Stage[main]/Ceph::Osds/Ceph::Osd[c0e23s0]/Exec[ceph-osd-check-fsid-mismatch-c0e23s0]: Could not evaluate: Could not find command 'if'

This looks like a straight error on my part. Will rectify.
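For context on the "Could not find command 'if'" error: `if` is a shell keyword, not an executable, so a command using it only works when a shell parses it. Puppet's default posix exec provider execs the first word directly and therefore finds no command called `if`; wrapping the command in a shell (which is what `provider => shell` arranges, as a later change here does) fixes it. A quick command-line illustration:

```shell
#!/bin/sh
# Works because an actual shell interprets the keyword:
sh -c 'if true; then echo ok; fi'
# An exec of the bare word "if" (what the posix provider effectively
# attempts) fails, because no /usr/bin/if executable exists.
```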

Change 940109 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Fix the device name when running parted on cephosd servers

https://gerrit.wikimedia.org/r/940109

Change 940109 merged by Btullis:

[operations/puppet@production] Fix the device name when running parted on cephosd servers

https://gerrit.wikimedia.org/r/940109

Change 940116 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Do not attempt to use hdparm on nvme drives for cephosd servers

https://gerrit.wikimedia.org/r/940116

Change 940116 merged by Btullis:

[operations/puppet@production] Do not attempt to use hdparm on nvme drives for cephosd servers

https://gerrit.wikimedia.org/r/940116

Change 940178 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Stop repeatedly disabling the write cache on cephosd servers

https://gerrit.wikimedia.org/r/940178

Change 940178 merged by Btullis:

[operations/puppet@production] Stop repeatedly disabling the write cache on cephosd servers

https://gerrit.wikimedia.org/r/940178

Change 940192 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Run the ceph osd execs with the shell provider

https://gerrit.wikimedia.org/r/940192

Change 940192 merged by Btullis:

[operations/puppet@production] Run the ceph osd execs with the shell provider

https://gerrit.wikimedia.org/r/940192

Change 940326 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Fix the cephosd fsid mismatch check

https://gerrit.wikimedia.org/r/940326

Change 940326 merged by Btullis:

[operations/puppet@production] Fix the cephosd fsid mismatch check

https://gerrit.wikimedia.org/r/940326

Change 940332 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Disable the ceph fsid mismatch check

https://gerrit.wikimedia.org/r/940332

Change 940332 merged by Btullis:

[operations/puppet@production] Disable the ceph fsid mismatch check

https://gerrit.wikimedia.org/r/940332

Change 940368 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Fix the wwn of sas drives used for the cephosd servers

https://gerrit.wikimedia.org/r/940368

Change 940368 merged by Btullis:

[operations/puppet@production] Fix the wwn of sas drives used for the cephosd servers

https://gerrit.wikimedia.org/r/940368

Change 940388 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Use different WWN values for SAS HDDs and SSDs for cephosd servers

https://gerrit.wikimedia.org/r/940388

Change 940388 merged by Btullis:

[operations/puppet@production] Use different WWN values for SAS HDDs and SSDs for cephosd servers

https://gerrit.wikimedia.org/r/940388

Change 940882 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Fix the cephosd activate exec resources

https://gerrit.wikimedia.org/r/940882

Change 940882 merged by Btullis:

[operations/puppet@production] Fix the cephosd activate exec resources

https://gerrit.wikimedia.org/r/940882

The first of the cephosd hosts now has all OSDs active.
There are 20 OSDs, numbered 0 to 19.

btullis@cephosd1001:~$ sudo ceph osd tree
ID  CLASS  WEIGHT     TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-1         229.98621  root default                                   
-3         229.98621      host cephosd1001                           
 0    hdd   16.83679          osd.0             up   1.00000  1.00000
 1    hdd   16.83679          osd.1             up   1.00000  1.00000
 2    hdd   16.83679          osd.2             up   1.00000  1.00000
 3    hdd   16.83679          osd.3             up   1.00000  1.00000
 4    hdd   16.83679          osd.4             up   1.00000  1.00000
 5    hdd   16.83679          osd.5             up   1.00000  1.00000
 6    hdd   16.83679          osd.6             up   1.00000  1.00000
 7    hdd   16.83679          osd.7             up   1.00000  1.00000
 8    hdd   16.83679          osd.8             up   1.00000  1.00000
 9    hdd   16.83679          osd.9             up   1.00000  1.00000
10    hdd   16.83679          osd.10            up   1.00000  1.00000
11    hdd   16.83679          osd.11            up   1.00000  1.00000
12    ssd    3.49309          osd.12            up   1.00000  1.00000
13    ssd    3.49309          osd.13            up   1.00000  1.00000
14    ssd    3.49309          osd.14            up   1.00000  1.00000
15    ssd    3.49309          osd.15            up   1.00000  1.00000
16    ssd    3.49309          osd.16            up   1.00000  1.00000
17    ssd    3.49309          osd.17            up   1.00000  1.00000
18    ssd    3.49309          osd.18            up   1.00000  1.00000
19    ssd    3.49309          osd.19            up   1.00000  1.00000

The raw capacities at present are shown here:

btullis@cephosd1001:~$ sudo ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    202 TiB  196 TiB  5.6 TiB   5.6 TiB       2.77
ssd     28 TiB   28 TiB   79 MiB    79 MiB          0
TOTAL  230 TiB  224 TiB  5.6 TiB   5.6 TiB       2.43
 
--- POOLS ---
POOL  ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr   1    1  705 KiB        2  705 KiB      0     71 TiB

There are still a few rough edges on the puppet manifests, but I'm going to proceed to roll out the OSDs to the remaining four hosts.

All 100 OSD daemons are installed and running.

root@cephosd1002:~# ceph osd tree
ID   CLASS  WEIGHT      TYPE NAME             STATUS  REWEIGHT  PRI-AFF
 -1         1149.93103  root default                                   
 -3          229.98621      host cephosd1001                           
  0    hdd    16.83679          osd.0             up   1.00000  1.00000
  1    hdd    16.83679          osd.1             up   1.00000  1.00000
  2    hdd    16.83679          osd.2             up   1.00000  1.00000
  3    hdd    16.83679          osd.3             up   1.00000  1.00000
  4    hdd    16.83679          osd.4             up   1.00000  1.00000
  5    hdd    16.83679          osd.5             up   1.00000  1.00000
  6    hdd    16.83679          osd.6             up   1.00000  1.00000
  7    hdd    16.83679          osd.7             up   1.00000  1.00000
  8    hdd    16.83679          osd.8             up   1.00000  1.00000
  9    hdd    16.83679          osd.9             up   1.00000  1.00000
 10    hdd    16.83679          osd.10            up   1.00000  1.00000
 11    hdd    16.83679          osd.11            up   1.00000  1.00000
 12    ssd     3.49309          osd.12            up   1.00000  1.00000
 13    ssd     3.49309          osd.13            up   1.00000  1.00000
 14    ssd     3.49309          osd.14            up   1.00000  1.00000
 15    ssd     3.49309          osd.15            up   1.00000  1.00000
 16    ssd     3.49309          osd.16            up   1.00000  1.00000
 17    ssd     3.49309          osd.17            up   1.00000  1.00000
 18    ssd     3.49309          osd.18            up   1.00000  1.00000
 19    ssd     3.49309          osd.19            up   1.00000  1.00000
 -7          229.98621      host cephosd1002                           
 20    hdd    16.83679          osd.20            up   1.00000  1.00000
 21    hdd    16.83679          osd.21            up   1.00000  1.00000
 22    hdd    16.83679          osd.22            up   1.00000  1.00000
 23    hdd    16.83679          osd.23            up   1.00000  1.00000
 24    hdd    16.83679          osd.24            up   1.00000  1.00000
 25    hdd    16.83679          osd.25            up   1.00000  1.00000
 26    hdd    16.83679          osd.26            up   1.00000  1.00000
 27    hdd    16.83679          osd.27            up   1.00000  1.00000
 28    hdd    16.83679          osd.28            up   1.00000  1.00000
 29    hdd    16.83679          osd.29            up   1.00000  1.00000
 30    hdd    16.83679          osd.30            up   1.00000  1.00000
 31    hdd    16.83679          osd.31            up   1.00000  1.00000
 32    ssd     3.49309          osd.32            up   1.00000  1.00000
 33    ssd     3.49309          osd.33            up   1.00000  1.00000
 34    ssd     3.49309          osd.34            up   1.00000  1.00000
 35    ssd     3.49309          osd.35            up   1.00000  1.00000
 36    ssd     3.49309          osd.36            up   1.00000  1.00000
 37    ssd     3.49309          osd.37            up   1.00000  1.00000
 38    ssd     3.49309          osd.38            up   1.00000  1.00000
 39    ssd     3.49309          osd.39            up   1.00000  1.00000
-10          229.98621      host cephosd1003                           
 40    hdd    16.83679          osd.40            up   1.00000  1.00000
 41    hdd    16.83679          osd.41            up   1.00000  1.00000
 42    hdd    16.83679          osd.42            up   1.00000  1.00000
 43    hdd    16.83679          osd.43            up   1.00000  1.00000
 44    hdd    16.83679          osd.44            up   1.00000  1.00000
 45    hdd    16.83679          osd.45            up   1.00000  1.00000
 46    hdd    16.83679          osd.46            up   1.00000  1.00000
 47    hdd    16.83679          osd.47            up   1.00000  1.00000
 48    hdd    16.83679          osd.48            up   1.00000  1.00000
 49    hdd    16.83679          osd.49            up   1.00000  1.00000
 50    hdd    16.83679          osd.50            up   1.00000  1.00000
 51    hdd    16.83679          osd.51            up   1.00000  1.00000
 52    ssd     3.49309          osd.52            up   1.00000  1.00000
 53    ssd     3.49309          osd.53            up   1.00000  1.00000
 54    ssd     3.49309          osd.54            up   1.00000  1.00000
 55    ssd     3.49309          osd.55            up   1.00000  1.00000
 56    ssd     3.49309          osd.56            up   1.00000  1.00000
 57    ssd     3.49309          osd.57            up   1.00000  1.00000
 58    ssd     3.49309          osd.58            up   1.00000  1.00000
 59    ssd     3.49309          osd.59            up   1.00000  1.00000
-13          229.98621      host cephosd1004                           
 60    hdd    16.83679          osd.60            up   1.00000  1.00000
 61    hdd    16.83679          osd.61            up   1.00000  1.00000
 62    hdd    16.83679          osd.62            up   1.00000  1.00000
 63    hdd    16.83679          osd.63            up   1.00000  1.00000
 64    hdd    16.83679          osd.64            up   1.00000  1.00000
 65    hdd    16.83679          osd.65            up   1.00000  1.00000
 66    hdd    16.83679          osd.66            up   1.00000  1.00000
 67    hdd    16.83679          osd.67            up   1.00000  1.00000
 68    hdd    16.83679          osd.68            up   1.00000  1.00000
 69    hdd    16.83679          osd.69            up   1.00000  1.00000
 70    hdd    16.83679          osd.70            up   1.00000  1.00000
 71    hdd    16.83679          osd.71            up   1.00000  1.00000
 72    ssd     3.49309          osd.72            up   1.00000  1.00000
 73    ssd     3.49309          osd.73            up   1.00000  1.00000
 74    ssd     3.49309          osd.74            up   1.00000  1.00000
 75    ssd     3.49309          osd.75            up   1.00000  1.00000
 76    ssd     3.49309          osd.76            up   1.00000  1.00000
 77    ssd     3.49309          osd.77            up   1.00000  1.00000
 78    ssd     3.49309          osd.78            up   1.00000  1.00000
 79    ssd     3.49309          osd.79            up   1.00000  1.00000
-16          229.98621      host cephosd1005                           
 80    hdd    16.83679          osd.80            up   1.00000  1.00000
 81    hdd    16.83679          osd.81            up   1.00000  1.00000
 82    hdd    16.83679          osd.82            up   1.00000  1.00000
 83    hdd    16.83679          osd.83            up   1.00000  1.00000
 84    hdd    16.83679          osd.84            up   1.00000  1.00000
 85    hdd    16.83679          osd.85            up   1.00000  1.00000
 86    hdd    16.83679          osd.86            up   1.00000  1.00000
 87    hdd    16.83679          osd.87            up   1.00000  1.00000
 88    hdd    16.83679          osd.88            up   1.00000  1.00000
 89    hdd    16.83679          osd.89            up   1.00000  1.00000
 90    hdd    16.83679          osd.90            up   1.00000  1.00000
 91    hdd    16.83679          osd.91            up   1.00000  1.00000
 92    ssd     3.49309          osd.92            up   1.00000  1.00000
 93    ssd     3.49309          osd.93            up   1.00000  1.00000
 94    ssd     3.49309          osd.94            up   1.00000  1.00000
 95    ssd     3.49309          osd.95            up   1.00000  1.00000
 96    ssd     3.49309          osd.96            up   1.00000  1.00000
 97    ssd     3.49309          osd.97            up   1.00000  1.00000
 98    ssd     3.49309          osd.98            up   1.00000  1.00000
 99    ssd     3.49309          osd.99            up   1.00000  1.00000
root@cephosd1002:~#

Capacity of the hdd storage class is just over a petabyte.

root@cephosd1002:~# ceph df
--- RAW STORAGE ---
CLASS      SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    1010 TiB  982 TiB   28 TiB    28 TiB       2.77
ssd     140 TiB  140 TiB  926 MiB   926 MiB          0
TOTAL   1.1 PiB  1.1 PiB   28 TiB    28 TiB       2.43
 
--- POOLS ---
POOL  ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr   1    1  705 KiB        2  705 KiB      0    354 TiB
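
The raw totals that `ceph df` reports line up with the per-OSD weights shown in `ceph osd tree` above (weights are in TiB). A quick cross-check of the arithmetic:

```python
# Cross-check the RAW STORAGE totals from `ceph df` against the OSD weights
# in `ceph osd tree`: 5 hosts x 12 hdd OSDs and 5 hosts x 8 ssd OSDs.
hdd_osds, hdd_weight = 60, 16.83679   # weight per hdd OSD, TiB
ssd_osds, ssd_weight = 40, 3.49309    # weight per ssd OSD, TiB

hdd_total = hdd_osds * hdd_weight     # ~1010 TiB
ssd_total = ssd_osds * ssd_weight     # ~140 TiB
total_pib = (hdd_total + ssd_total) / 1024
print(round(hdd_total), round(ssd_total), round(total_pib, 1))
```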

Noting down some things to fix, while I think about it:

  • Need to install manually: ceph-osd, ceph-volume, hdparm
  • Need to take a copy of /var/lib/ceph/bootstrap-osd/ceph.keyring and store it at /etc/ceph/ceph.client.bootstrap-osd.keyring
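
The second bullet could be sketched as follows (this mirrors the approach of change 941011, which was later abandoned as not required; the function name and Python form are hypothetical, not the actual Puppet implementation):

```python
import os
import shutil

def mirror_bootstrap_keyring(
    src: str = "/var/lib/ceph/bootstrap-osd/ceph.keyring",
    dst: str = "/etc/ceph/ceph.client.bootstrap-osd.keyring",
) -> None:
    """Copy the bootstrap-osd keyring to the client keyring path.

    Paths are taken from the comment above; run as root on a cephosd host.
    """
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)   # preserve mtime, useful for change detection
    os.chmod(dst, 0o600)     # keyrings must not be world-readable
```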

I haven't tested the OSD removal mechanism yet.

Change 941010 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Install the ceph-volume and hdparm packages on cephosd servers

https://gerrit.wikimedia.org/r/941010

Change 941011 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add a second copy of the bootstrap-osd keyring to cephosd

https://gerrit.wikimedia.org/r/941011

I've made two small CRs to suggest fixes for the errors mentioned in T330151#9037871, but there is one more fix that will require a little more thought.
The ceph::osds class makes use of the ceph_volumes fact and will not compile without it.
However, that fact is constrained so that it is not generated unless the ceph-osd package is installed. This is going to cause a chicken-and-egg situation.

I'm tempted to remove these two lines:
https://github.com/wikimedia/operations-puppet/blob/production/modules/wmflib/lib/facter/ceph_disks.rb#L52-L53
...which will cause the ceph_disks fact to be created on any host with the perccli64 executable present.
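
The real fact lives in Ruby (ceph_disks.rb); this is a hypothetical Python sketch of the relaxed guard, where the perccli64 path is an assumption:

```python
import os

# Assumed install location of Dell's perccli utility; the real path may differ.
PERCCLI64 = "/opt/MegaRAID/perccli/perccli64"

def should_generate_ceph_disks_fact(perccli_path: str = PERCCLI64) -> bool:
    """Relaxed confine: generate the fact on any host where the RAID
    controller CLI is runnable, without also requiring the ceph-osd
    package to be installed (which caused the chicken-and-egg problem)."""
    return os.access(perccli_path, os.X_OK)
```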

In general though, this ticket is nearly done.

root@cephosd1005:~# ceph df
--- RAW STORAGE ---
CLASS      SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    1010 TiB  982 TiB   28 TiB    28 TiB       2.77
ssd     140 TiB  140 TiB  876 MiB   876 MiB          0
TOTAL   1.1 PiB  1.1 PiB   28 TiB    28 TiB       2.43
 
--- POOLS ---
POOL  ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr   2    1  1.3 MiB        2  1.3 MiB      0    354 TiB
root@cephosd1005:~# ceph health
HEALTH_OK

Change 941014 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Exclude nagios checks of tmpfs mounts on cephosd servers

https://gerrit.wikimedia.org/r/941014

Change 941014 merged by Btullis:

[operations/puppet@production] Exclude nagios checks of tmpfs mounts on cephosd servers

https://gerrit.wikimedia.org/r/941014

Change 941010 merged by Btullis:

[operations/puppet@production] Install the ceph-volume and hdparm packages on cephosd servers

https://gerrit.wikimedia.org/r/941010

Change 941380 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Stop repeatedly disabling write cache on cephosd servers

https://gerrit.wikimedia.org/r/941380

Change 941380 merged by Btullis:

[operations/puppet@production] Stop repeatedly disabling write cache on cephosd servers

https://gerrit.wikimedia.org/r/941380

Tentatively moving this task to Done. Puppet now runs cleanly and Icinga is clean for these servers.
There are still likely to be some changes to make regarding:

  • Initial install/reimage
  • OSD removal

I'd like to:

  1. rewrite the defined type ceph::osd as a custom provider, and thus make less use of puppet's exec resources
  2. create a custom fact for ceph_volumes based on the output from ceph-volume lvm list --format=json
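
A custom fact along the lines of item 2 might start from the JSON emitted by ceph-volume lvm list --format=json, whose top level maps OSD ids to lists of LV entries. A minimal sketch (the "devices" field name is assumed from typical ceph-volume output, and the real fact would be written in Ruby for Facter):

```python
import json

def ceph_volumes_fact(lvm_list_json: str) -> dict:
    """Reduce `ceph-volume lvm list --format=json` output to a small
    osd_id -> device-list mapping suitable for a structured fact."""
    data = json.loads(lvm_list_json)
    fact = {}
    for osd_id, entries in data.items():
        devices = []
        for entry in entries:
            devices.extend(entry.get("devices", []))
        fact[osd_id] = sorted(set(devices))
    return fact

# Example with a trimmed-down, hypothetical payload:
sample = '{"0": [{"devices": ["/dev/sda"]}], "1": [{"devices": ["/dev/sdb"]}]}'
print(ceph_volumes_fact(sample))
```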

I'll make follow-up tickets for these.

This comment was removed by BTullis.

Change #941011 abandoned by Btullis:

[operations/puppet@production] Add a second copy of the bootstrap-osd keyring to cephosd

Reason:

Not required

https://gerrit.wikimedia.org/r/941011