
colewhite (cwhite)
User


User Details

User Since
Aug 21 2018, 6:05 PM (30 w, 2 d)
Availability
Available
LDAP User
Cwhite
MediaWiki User
Unknown

Recent Activity

Yesterday

colewhite added a comment to T217932: Change log routing to ELK cluster to use rsyslog->kafka rather than talking directly to the ELK cluster.

As I understand it, journald is already wired up to copy to rsyslog. The only change needed to get these logs onto Kafka is to whitelist the application in the lookup_table_output.json.

Thu, Mar 21, 5:24 PM · Patch-For-Review, Striker
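For illustration, here is a minimal sketch of the whitelisting step. It assumes lookup_table_output.json follows rsyslog's documented version-1 lookup table layout, which is a `string` table of `index`/`value` pairs. The program name `striker` and the value `kafka` below are hypothetical. The real file's entries and the values it expects are not shown on this page.

```python
import json


def whitelist_program(table_json: str, program: str, value: str) -> str:
    """Map `program` to `value` in an rsyslog v1 string lookup table.

    Assumed layout (rsyslog lookup table, version 1):
      {"version": 1, "nomatch": "...", "type": "string",
       "table": [{"index": "<programname>", "value": "<output>"}, ...]}
    """
    table = json.loads(table_json)
    if table.get("version") != 1 or table.get("type") != "string":
        raise ValueError("this sketch only handles version-1 string tables")

    entries = table.setdefault("table", [])
    for entry in entries:
        if entry["index"] == program:
            entry["value"] = value  # already listed: update in place
            break
    else:
        entries.append({"index": program, "value": value})

    entries.sort(key=lambda e: e["index"])  # stable ordering for reviewable diffs
    return json.dumps(table, indent=2, sort_keys=True) + "\n"


# Hypothetical example: route a "striker" program name to a "kafka" output.
existing = '{"version": 1, "nomatch": "", "type": "string", "table": []}'
print(whitelist_program(existing, "striker", "kafka"))
```

The function is idempotent. Running it twice for the same program updates the existing entry instead of adding a duplicate.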

Wed, Mar 6

colewhite closed T214594: node-exporter collector.diskstats.ignored-devices underescaped as Resolved.
Wed, Mar 6, 6:34 PM · Patch-For-Review, monitoring

Mon, Mar 4

colewhite claimed T214594: node-exporter collector.diskstats.ignored-devices underescaped.
Mon, Mar 4, 4:10 PM · Patch-For-Review, monitoring

Mon, Feb 25

colewhite closed T216120: LDAP access to the wmf group for Delphine Ménard (dmenard) as Resolved.
Mon, Feb 25, 8:13 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T216120: LDAP access to the wmf group for Delphine Ménard (dmenard).

@Delphine_wmf is now in the wmf ldap group. Resolving task.

Mon, Feb 25, 8:13 PM · Patch-For-Review, LDAP-Access-Requests

Thu, Feb 21

colewhite created P8120 Smartmon Node Exporter comparison.
Thu, Feb 21, 10:19 PM
colewhite placed T215940: Mailing list migration for Arbitration Committee to Google Group up for grabs.
Thu, Feb 21, 6:23 PM · Operations, CommRel-Specialists-Support (Jan-Mar-2019), Office-IT, Wikimedia-Mailing-lists
colewhite updated the task description for T215940: Mailing list migration for Arbitration Committee to Google Group.
Thu, Feb 21, 6:23 PM · Operations, CommRel-Specialists-Support (Jan-Mar-2019), Office-IT, Wikimedia-Mailing-lists
colewhite updated subscribers of T215940: Mailing list migration for Arbitration Committee to Google Group.

Mbox files shared with @eross.

Thu, Feb 21, 6:23 PM · Operations, CommRel-Specialists-Support (Jan-Mar-2019), Office-IT, Wikimedia-Mailing-lists
colewhite closed T215576: Please add Runa Bhattacharjee to the `wmf` LDAP group as Resolved.
Thu, Feb 21, 5:59 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T215576: Please add Runa Bhattacharjee to the `wmf` LDAP group.

@Arrbee is now in the wmf ldap group. Resolving task.

Thu, Feb 21, 5:59 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T213708: Upgrade production prometheus-node-exporter to >= 0.16.

On further investigation, the log messages appear to be from the shebang of the ipmitool awk script.

Thu, Feb 21, 4:51 PM · Patch-For-Review, Goal, monitoring, Operations

Feb 15 2019

colewhite added a comment to T216120: LDAP access to the wmf group for Delphine Ménard (dmenard).

I was unable to find your account in LDAP. Have you had an account created for you by OIT or created one on wikitech?

Feb 15 2019, 9:41 PM · Patch-For-Review, LDAP-Access-Requests
colewhite triaged T216235: cleanup reprepro configuration for elasticsearch-curator as Normal priority.
Feb 15 2019, 7:36 PM · User-fgiunchedi, Discovery-Search, Elasticsearch, Operations
colewhite triaged T216226: GPU upgrade for stat1005 as Normal priority.
Feb 15 2019, 7:35 PM · Analytics, hardware-requests, Operations
colewhite triaged T216202: Disk failure on labsdb1005 as Normal priority.
Feb 15 2019, 7:34 PM · Operations, ops-eqiad
colewhite triaged T216243: cron spam for slow queries on mwmaint /usr/local/bin/foreachwiki initSiteStats.php --update > /dev/null as Normal priority.
Feb 15 2019, 7:33 PM · Operations, MediaWiki-Maintenance-scripts
colewhite triaged T216273: New cronspam from db clusters as Normal priority.
Feb 15 2019, 7:33 PM · Operations
colewhite added a subtask for T132324: Tracking and Reducing cron-spam to root@ : T216273: New cronspam from db clusters.
Feb 15 2019, 7:32 PM · Patch-For-Review, Operations
colewhite added a parent task for T216273: New cronspam from db clusters: T132324: Tracking and Reducing cron-spam to root@ .
Feb 15 2019, 7:32 PM · Operations
colewhite triaged T216223: Degraded RAID on labsdb1005 as Normal priority.
Feb 15 2019, 7:31 PM · cloud-services-team (Kanban), Toolforge, ops-eqiad, Operations
colewhite created T216273: New cronspam from db clusters.
Feb 15 2019, 7:22 PM · Operations
colewhite edited projects for T216223: Degraded RAID on labsdb1005, added: cloud-services-team (Kanban); removed cloud-services-team.
Feb 15 2019, 4:53 PM · cloud-services-team (Kanban), Toolforge, ops-eqiad, Operations

Feb 14 2019

colewhite triaged T216090: ensure httpd error logs from "misc apps" (krypton) end up in logstash as Normal priority.
Feb 14 2019, 11:12 PM · Wikimedia-Logstash, Operations, serviceops
colewhite updated subscribers of T216090: ensure httpd error logs from "misc apps" (krypton) end up in logstash.
Feb 14 2019, 11:12 PM · Wikimedia-Logstash, Operations, serviceops
colewhite triaged T216192: Update label and switch to rename labvirt1012 to cloudvirt1012 as Normal priority.
Feb 14 2019, 11:11 PM · ops-eqiad, Operations
colewhite claimed T215940: Mailing list migration for Arbitration Committee to Google Group.
Feb 14 2019, 10:51 PM · Operations, CommRel-Specialists-Support (Jan-Mar-2019), Office-IT, Wikimedia-Mailing-lists
colewhite claimed T216101: LDAP access to the WMF group for Angela Muigai.
Feb 14 2019, 8:56 PM · Patch-For-Review, LDAP-Access-Requests
colewhite claimed T216120: LDAP access to the wmf group for Delphine Ménard (dmenard).
Feb 14 2019, 8:55 PM · Patch-For-Review, LDAP-Access-Requests
colewhite closed T215830: Requesting access to analytics-privatedata for esanders as Resolved.
Feb 14 2019, 8:55 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite added a comment to T215830: Requesting access to analytics-privatedata for esanders.

The group membership change has been deployed.

Feb 14 2019, 8:54 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite closed T215938: Access request: Ladsgroup to analytics-wmde-users as Resolved.
Feb 14 2019, 8:54 PM · Patch-For-Review, SRE-Access-Requests, Operations
colewhite added a comment to T215938: Access request: Ladsgroup to analytics-wmde-users.

The group membership change has been deployed.

Feb 14 2019, 8:53 PM · Patch-For-Review, SRE-Access-Requests, Operations
colewhite triaged T216183: Special:ProtectedPages times out on enwiki for Module namespace as High priority.
Feb 14 2019, 8:33 PM · User-Marostegui, Wikimedia-production-error, MediaWiki-Database, MediaWiki-Special-pages
colewhite added a comment to T216183: Special:ProtectedPages times out on enwiki for Module namespace.

The logs indicate that the request is timing out fetching data from the database.

Feb 14 2019, 8:32 PM · User-Marostegui, Wikimedia-production-error, MediaWiki-Database, MediaWiki-Special-pages
CDanis awarded T216088: Mapping of servers to stakeholders a Like token.
Feb 14 2019, 1:07 AM · Operations

Feb 13 2019

colewhite claimed T213708: Upgrade production prometheus-node-exporter to >= 0.16.
Feb 13 2019, 11:36 PM · Patch-For-Review, Goal, monitoring, Operations
colewhite claimed T215830: Requesting access to analytics-privatedata for esanders.
Feb 13 2019, 11:31 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite triaged T216088: Mapping of servers to stakeholders as Normal priority.
Feb 13 2019, 11:28 PM · Operations
colewhite removed a project from T215938: Access request: Ladsgroup to analytics-wmde-users: LDAP-Access-Requests.
Feb 13 2019, 8:49 PM · Patch-For-Review, SRE-Access-Requests, Operations
colewhite claimed T215938: Access request: Ladsgroup to analytics-wmde-users.
Feb 13 2019, 8:48 PM · Patch-For-Review, SRE-Access-Requests, Operations
colewhite closed T216068: Degraded RAID on cloudvirt1024, a subtask of T215892: Degraded RAID on cloudvirt1024, as Resolved.
Feb 13 2019, 8:43 PM · cloud-services-team (Kanban), ops-eqiad, Operations
colewhite closed T216068: Degraded RAID on cloudvirt1024 as Resolved.
Feb 13 2019, 8:43 PM · ops-eqiad, Operations
colewhite added a comment to T216068: Degraded RAID on cloudvirt1024.

Resolving as duplicate of parent.

Feb 13 2019, 8:42 PM · ops-eqiad, Operations
colewhite added a parent task for T216068: Degraded RAID on cloudvirt1024: T215892: Degraded RAID on cloudvirt1024.
Feb 13 2019, 8:42 PM · ops-eqiad, Operations
colewhite added a subtask for T215892: Degraded RAID on cloudvirt1024: T216068: Degraded RAID on cloudvirt1024.
Feb 13 2019, 8:42 PM · cloud-services-team (Kanban), ops-eqiad, Operations
colewhite closed T215575: Please add Petar Petković to the `wmf` LDAP group as Resolved.
Feb 13 2019, 8:39 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T215575: Please add Petar Petković to the `wmf` LDAP group.

@Petar.petkovic is now in the wmf ldap group. Resolving task.

Feb 13 2019, 8:39 PM · Patch-For-Review, LDAP-Access-Requests
colewhite triaged T215892: Degraded RAID on cloudvirt1024 as Normal priority.
Feb 13 2019, 8:32 PM · cloud-services-team (Kanban), ops-eqiad, Operations
colewhite triaged T216004: Degraded RAID on cloudvirt1018 as Normal priority.
Feb 13 2019, 8:31 PM · cloud-services-team (Kanban), ops-eqiad, Operations
colewhite updated the task description for T213708: Upgrade production prometheus-node-exporter to >= 0.16.
Feb 13 2019, 3:04 AM · Patch-For-Review, Goal, monitoring, Operations
colewhite triaged T215848: icinga really needs to check puppet run success of passive icinga hosts as Normal priority.
Feb 13 2019, 2:38 AM · monitoring, Icinga, Operations

Feb 11 2019

colewhite closed T215574: Please add Natalia Harateh to the `wmf` LDAP group as Resolved.
Feb 11 2019, 10:55 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T215574: Please add Natalia Harateh to the `wmf` LDAP group.

@NHarateh_WMF is now in the wmf ldap group. Resolving task.

Feb 11 2019, 10:54 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T215576: Please add Runa Bhattacharjee to the `wmf` LDAP group.

@Arrbee your LDAP user does not have a WMF email address associated and this appears to be required for membership of the wmf group.

Feb 11 2019, 9:48 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T215575: Please add Petar Petković to the `wmf` LDAP group.

@Petar.petkovic your LDAP user does not have a WMF email address associated and this appears to be required for membership of the wmf group.

Feb 11 2019, 9:47 PM · Patch-For-Review, LDAP-Access-Requests
colewhite claimed T215576: Please add Runa Bhattacharjee to the `wmf` LDAP group.
Feb 11 2019, 8:08 PM · Patch-For-Review, LDAP-Access-Requests
colewhite claimed T215575: Please add Petar Petković to the `wmf` LDAP group.
Feb 11 2019, 8:07 PM · Patch-For-Review, LDAP-Access-Requests
colewhite claimed T215574: Please add Natalia Harateh to the `wmf` LDAP group.
Feb 11 2019, 7:32 PM · Patch-For-Review, LDAP-Access-Requests
colewhite closed T215573: Please add Joe Walsh to the `wmf` LDAP group as Resolved.
Feb 11 2019, 7:32 PM · LDAP-Access-Requests
colewhite added a comment to T215573: Please add Joe Walsh to the `wmf` LDAP group.

@JoeWalsh is now in the wmf ldap group. Resolving task.

Feb 11 2019, 7:32 PM · LDAP-Access-Requests
colewhite claimed T215573: Please add Joe Walsh to the `wmf` LDAP group.
Feb 11 2019, 7:31 PM · LDAP-Access-Requests
colewhite closed T215572: Please add Alex Ezell to the `wmf` LDAP group as Resolved.
Feb 11 2019, 7:30 PM · LDAP-Access-Requests
colewhite added a comment to T215572: Please add Alex Ezell to the `wmf` LDAP group.

@aezell is now in the wmf ldap group. Resolving task.

Feb 11 2019, 7:30 PM · LDAP-Access-Requests
colewhite claimed T215572: Please add Alex Ezell to the `wmf` LDAP group.
Feb 11 2019, 7:28 PM · LDAP-Access-Requests

Feb 8 2019

colewhite updated the task description for T213708: Upgrade production prometheus-node-exporter to >= 0.16.
Feb 8 2019, 8:16 PM · Patch-For-Review, Goal, monitoring, Operations

Feb 4 2019

colewhite added a subtask for T210108: icinga1001 mysterious reboots: T214760: icinga1001 crashed.
Feb 4 2019, 4:43 PM · ops-eqiad, DC-Ops, Operations
colewhite added a parent task for T214760: icinga1001 crashed: T210108: icinga1001 mysterious reboots.
Feb 4 2019, 4:43 PM · Patch-For-Review, ops-eqiad, monitoring, Operations

Jan 24 2019

colewhite placed T210486: Audit "misc" cluster hosts up for grabs.
Jan 24 2019, 10:37 PM · User-fgiunchedi, User-Marostegui, Patch-For-Review, Operations
colewhite updated the task description for T213708: Upgrade production prometheus-node-exporter to >= 0.16.
Jan 24 2019, 6:21 PM · Patch-For-Review, Goal, monitoring, Operations
colewhite added a comment to T214594: node-exporter collector.diskstats.ignored-devices underescaped.

@fgiunchedi found that systemd is the likely culprit: https://github.com/systemd/systemd/issues/10659 https://github.com/systemd/systemd/pull/11427

Jan 24 2019, 5:28 PM · Patch-For-Review, monitoring
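The following Python check shows what underescaping does to the ignored-devices pattern. It assumes the lost characters are backslashes, which makes the intended pattern `^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$`. The systemd issue linked above is the suspected place where they get stripped. node-exporter evaluates the pattern with Go's RE2 engine, but these particular patterns behave the same under Python's `re`.

```python
import re

# Intended pattern, assuming the characters lost from the flag are backslashes.
INTENDED = r"^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$"
# The same pattern with every backslash stripped (the underescaped form).
STRIPPED = INTENDED.replace("\\", "")


def ignored(pattern: str, device: str) -> bool:
    """True if node-exporter would skip `device` under this pattern."""
    return re.search(pattern, device) is not None


for dev in ["sda", "sda1", "nvme0n1", "nvme0n1p2", "ram0", "loop3", "md0"]:
    print(dev, "intended:", ignored(INTENDED, dev), "stripped:", ignored(STRIPPED, dev))
```

Without the backslashes, `d+$` requires literal `d` characters. As a result, ram, loop and partition devices such as `sda1` and `nvme0n1p2` are no longer ignored. Whole disks like `sda` are reported under either pattern.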

Jan 17 2019

colewhite added a comment to T213708: Upgrade production prometheus-node-exporter to >= 0.16.

Tested command-line flags for prometheus-node-exporter v0.17:

ARGS='--collector.diskstats.ignored-devices=^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$ --collector.filesystem.ignored-fs-types=^(overlay|autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|nsfs|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$ --collector.filesystem.ignored-mount-points=^/(sys|proc|dev|var/lib/docker|var/lib/kubelet)($|/) --collector.textfile.directory=/var/lib/prometheus/node.d --collector.buddyinfo --collector.conntrack --collector.diskstats --collector.edac --collector.entropy --collector.filefd --collector.filesystem --collector.hwmon --collector.loadavg --collector.mdadm --collector.meminfo --collector.netdev --collector.netstat --collector.netstat.fields="^(.*)" --collector.sockstat --collector.stat --collector.tcpstat --collector.textfile --collector.time --collector.uname --collector.vmstat --collector.vmstat.fields="^(.*)" --web.listen-address=:9100'
Jan 17 2019, 10:59 PM · Patch-For-Review, Goal, monitoring, Operations
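The filesystem filters in the flags above can be sanity-checked before deployment. This sketch re-implements the two exclusion checks in Python. node-exporter applies them with Go's regexp, which agrees with Python's `re` for these patterns. `filesystem_reported` is a hypothetical helper name.

```python
import re

# Patterns copied from the --collector.filesystem.* flags above.
IGNORED_MOUNTS = re.compile(r"^/(sys|proc|dev|var/lib/docker|var/lib/kubelet)($|/)")
IGNORED_FS_TYPES = re.compile(
    r"^(overlay|autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl"
    r"|hugetlbfs|mqueue|nsfs|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$"
)


def filesystem_reported(mount_point: str, fs_type: str) -> bool:
    """True if neither the mount point nor the filesystem type is excluded."""
    return not (IGNORED_MOUNTS.search(mount_point) or IGNORED_FS_TYPES.search(fs_type))


for mount, fstype in [("/", "ext4"), ("/system", "ext4"), ("/sys/fs/cgroup", "cgroup"),
                      ("/dev/shm", "tmpfs"), ("/srv/containers", "overlay")]:
    print(mount, fstype, filesystem_reported(mount, fstype))
```

The trailing `($|/)` keeps a mount such as `/system` from being matched by the `sys` alternative. `/dev/shm`, by contrast, is excluded by its mount point even though `tmpfs` is not in the type list.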
colewhite added a comment to T213708: Upgrade production prometheus-node-exporter to >= 0.16.

After deploying the rules and node-exporter v0.17 to deployment-prometheus02, it appears the rules are not for backwards compatibility, but for forwards compatibility. Dashboards will need to be updated before node-exporter 0.17 is deployed.

Jan 17 2019, 10:18 PM · Patch-For-Review, Goal, monitoring, Operations
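Because the rules turned out to provide forward rather than backward compatibility, the dashboards themselves need the new metric names. A hypothetical helper for a first pass over dashboard PromQL expressions could look like the sketch below. The rename map lists only a few of the node-exporter 0.16 renames and is not exhaustive, and label changes are not handled.

```python
import re

# Partial, illustrative map of metric renames from node-exporter 0.16.
RENAMES = {
    "node_cpu": "node_cpu_seconds_total",
    "node_memory_MemTotal": "node_memory_MemTotal_bytes",
    "node_memory_MemFree": "node_memory_MemFree_bytes",
    "node_filesystem_avail": "node_filesystem_avail_bytes",
    "node_filesystem_size": "node_filesystem_size_bytes",
    "node_network_receive_bytes": "node_network_receive_bytes_total",
    "node_network_transmit_bytes": "node_network_transmit_bytes_total",
    "node_boot_time": "node_boot_time_seconds",
}

# \b on both sides so "node_cpu" does not match inside "node_cpu_seconds_total".
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_expr(expr: str) -> str:
    """Rewrite old node-exporter metric names in a PromQL expression."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], expr)


print(rename_expr('rate(node_cpu{mode="idle"}[5m])'))
```

The `\b` anchors make the rewrite idempotent, so running it over an already-converted expression is harmless. Because it is a plain textual substitution, a label value that happens to equal an old metric name would also be rewritten.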
colewhite added a comment to T213708: Upgrade production prometheus-node-exporter to >= 0.16.

Changeset [1] contains a current snapshot of the converted rules files. These are needed for the Prometheus server to maintain its current behavior, and the changeset also adds the backwards-compatibility rules from [2].

Jan 17 2019, 5:09 PM · Patch-For-Review, Goal, monitoring, Operations

Jan 14 2019

colewhite claimed T210486: Audit "misc" cluster hosts.
Jan 14 2019, 5:36 PM · User-fgiunchedi, User-Marostegui, Patch-For-Review, Operations

Jan 11 2019

colewhite updated the task description for T205870: Provision >= 50% of statsd/Graphite-only metrics in Prometheus.
Jan 11 2019, 5:45 PM · Performance-Team (Radar), Patch-For-Review, monitoring, Operations

Jan 7 2019

colewhite committed rDEPLOYCHARTS1b9fd201b2ff: add statsd_exporter config to mathoid (authored by colewhite).
add statsd_exporter config to mathoid
Jan 7 2019, 10:45 PM

Jan 2 2019

colewhite closed T212525: Administrator password recovery for wmfaliens@lists.wikimedia.org as Resolved.
Jan 2 2019, 4:17 PM · Operations, Wikimedia-Mailing-lists

Dec 21 2018

colewhite closed T212334: Please give LDAP access to the wmf group for Joe Matazzoni as Resolved.
Dec 21 2018, 10:11 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T212334: Please give LDAP access to the wmf group for Joe Matazzoni.

Added you to the wmf LDAP group.

Dec 21 2018, 10:11 PM · Patch-For-Review, LDAP-Access-Requests
colewhite claimed T212334: Please give LDAP access to the wmf group for Joe Matazzoni.
Dec 21 2018, 10:09 PM · Patch-For-Review, LDAP-Access-Requests
colewhite closed T212266: Request to create mailing list for Wikimedians of Chicago User Group as Resolved.
Dec 21 2018, 9:44 PM · Operations, Wikimedia-Mailing-lists
colewhite added a comment to T212266: Request to create mailing list for Wikimedians of Chicago User Group.

The list has been created and the password emailed to you. At your convenience, please add the list to: https://meta.wikimedia.org/wiki/Mailing_lists/Overview

Dec 21 2018, 9:44 PM · Operations, Wikimedia-Mailing-lists
colewhite added a comment to T212525: Administrator password recovery for wmfaliens@lists.wikimedia.org.

Email sent to @Elena with the reset password.

Dec 21 2018, 9:37 PM · Operations, Wikimedia-Mailing-lists
colewhite claimed T212525: Administrator password recovery for wmfaliens@lists.wikimedia.org.
Dec 21 2018, 9:26 PM · Operations, Wikimedia-Mailing-lists
colewhite claimed T212266: Request to create mailing list for Wikimedians of Chicago User Group.
Dec 21 2018, 6:16 PM · Operations, Wikimedia-Mailing-lists
colewhite moved T212334: Please give LDAP access to the wmf group for Joe Matazzoni from Backlog to Awaiting User Input on the LDAP-Access-Requests board.
Dec 21 2018, 6:12 PM · Patch-For-Review, LDAP-Access-Requests

Dec 19 2018

colewhite added a comment to T212334: Please give LDAP access to the wmf group for Joe Matazzoni.

Hi Joe! Welcome!

Dec 19 2018, 8:19 PM · Patch-For-Review, LDAP-Access-Requests
colewhite triaged T212334: Please give LDAP access to the wmf group for Joe Matazzoni as Normal priority.
Dec 19 2018, 8:17 PM · Patch-For-Review, LDAP-Access-Requests

Dec 18 2018

colewhite updated the task description for T205870: Provision >= 50% of statsd/Graphite-only metrics in Prometheus.
Dec 18 2018, 10:55 PM · Performance-Team (Radar), Patch-For-Review, monitoring, Operations
colewhite updated the task description for T205870: Provision >= 50% of statsd/Graphite-only metrics in Prometheus.
Dec 18 2018, 10:55 PM · Performance-Team (Radar), Patch-For-Review, monitoring, Operations
colewhite triaged T212231: Remove Diamond from production as Normal priority.
Dec 18 2018, 9:48 PM · Patch-For-Review, monitoring, Operations
colewhite moved T211962: LDAP nda access request for Daimona from Backlog to Manager Approval Pending on the LDAP-Access-Requests board.
Dec 18 2018, 5:04 PM · Patch-For-Review, LDAP-Access-Requests

Dec 17 2018

colewhite triaged T212102: Add `supervised` option to redis configuration as Normal priority.
Dec 17 2018, 10:28 PM · User-jijiki, Operations, serviceops
colewhite triaged T206448: Decommission script race condition as Normal priority.
Dec 17 2018, 10:11 PM · Operations, Operations-Software-Development

Dec 12 2018

colewhite added a comment to T211750: Introduce Python code formatters usage.

I like Black, but any formatter, as long as the barrier to entry is low, is a good idea.

Dec 12 2018, 7:48 PM · Patch-For-Review, Operations, Operations-Software-Development
colewhite added a comment to T210723: Address recurrent service check time out for "HP RAID" on swift backend hosts.

There are a few options that occur to me right away:

  • A cron job generates Prometheus metrics, exposed via the node-exporter textfile collector
  • A script run from cron caches the output of hpssacli, and the NRPE check is updated to read the cached version (with an additional staleness check)
  • Passive Icinga checks
  • hpraid_exporter: https://github.com/chromium58/hpraid_exporter
Dec 12 2018, 6:04 PM · User-fgiunchedi, Operations, monitoring

Dec 11 2018

colewhite updated the task description for T183454: Deprovision Diamond collectors no longer in use.
Dec 11 2018, 7:33 PM · Patch-For-Review, User-fgiunchedi, monitoring, Operations
Dzahn awarded T208066: Push check latency and check execution time to Prometheus a Barnstar token.
Dec 11 2018, 7:22 PM · Patch-For-Review, Icinga, monitoring, Operations
colewhite closed T208066: Push check latency and check execution time to Prometheus as Resolved.
Dec 11 2018, 6:40 PM · Patch-For-Review, Icinga, monitoring, Operations