colewhite (cwhite)
User

User Details

User Since
Aug 21 2018, 6:05 PM (46 w, 6 d)
Availability
Available
LDAP User
Cwhite
MediaWiki User
Unknown

Recent Activity

Today

colewhite added a comment to T228089: Logstash down for MediaWiki.

We decided to drop logs from cpjobqueue and changeprop at the logstash layer with the following config:

Mon, Jul 15, 8:03 PM · Wikimedia-Incident, observability, Operations, Wikimedia-Logstash

Wed, Jul 10

colewhite added a comment to T225604: log spam from mtail 3.0.0~rc19 on wezen.

@jbond the -logfds flag appears to be missing in mtail rc24, which causes varnishmtail and varnishmtail-backend to fail to start. Downgrading back to 3.0.0~rc5-1~bpo9+1wmf1 restored the flag, and varnishmtail and varnishmtail-backend worked again.

Wed, Jul 10, 11:12 PM · observability

Fri, Jul 5

colewhite created T227360: wikibase: Request raises 500 on commons.
Fri, Jul 5, 8:49 PM · StructuredDataOnCommons, Wikidata-Campsite, Wikidata, Wikimedia-production-error

Mon, Jul 1

colewhite added a comment to T213708: Upgrade production prometheus-node-exporter to >= 0.16.

@CDanis thanks for the heads up. Should look better now.

Mon, Jul 1, 10:52 PM · Patch-For-Review, Goal, observability, Operations
colewhite added a comment to T226815: Gather metrics on request status codes, latencies from the MediaWiki appservers.

@Joe afaik, we're using mtail for this kind of metrics gathering. Some examples are the varnish programs here

Mon, Jul 1, 4:42 PM · Patch-For-Review, Operations, observability, serviceops

Wed, Jun 26

colewhite committed rODFRB52f28c095034: branched from tags/v2.0.0 and added debian directory (authored by colewhite).
Wed, Jun 26, 4:20 PM
colewhite committed rODFRB94055a771cfe: branched from tags/v2.0.0 and added debian directory (authored by colewhite).
Wed, Jun 26, 3:36 PM
colewhite committed rODFRBd15f7a2739d0: branched from tags/v2.0.0 and added debian directory (authored by colewhite).
Wed, Jun 26, 2:10 PM

Tue, Jun 25

colewhite committed rODFRBe96ce23c1129: branched from tags/v2.0.0 and added debian directory (authored by colewhite).
Tue, Jun 25, 9:01 PM
colewhite added a comment to T226449: Please create operations/debs/file-read-backwards gerrit repository.

@MarcoAurelio Repo works for me. Thanks for the quick turnaround!

Tue, Jun 25, 5:01 PM · User-MarcoAurelio, Repository-Admins
colewhite committed rODFRB0d62aaa86183: add debian folder (authored by colewhite).
Tue, Jun 25, 5:01 PM
colewhite committed rODFRBaca5baf2392f: Merge https://github.com/RobinNil/file_read_backwards (authored by colewhite).
Tue, Jun 25, 4:16 PM

Mon, Jun 24

colewhite created T226449: Please create operations/debs/file-read-backwards gerrit repository.
Mon, Jun 24, 9:01 PM · User-MarcoAurelio, Repository-Admins

Jun 6 2019

colewhite added a comment to T184942: Deprecate python varnish cachestats.

Latest dashboard audit:

Jun 6 2019, 10:28 PM · Patch-For-Review, Traffic, User-fgiunchedi, Goal, Operations

Jun 5 2019

bd808 awarded T216088: Mapping of servers to stakeholders a Love token.
Jun 5 2019, 7:14 PM · Operations

Jun 4 2019

colewhite added a comment to T219825: Update dashboards to node-exporter 0.16+ metric names.

Thanks for the report! I've gone through and updated dashboards where I've found the legacy metric names in dashboard variables. Please let me know if you find any additional instances.

Jun 4 2019, 12:03 AM · Patch-For-Review, observability

May 29 2019

colewhite closed T219825: Update dashboards to node-exporter 0.16+ metric names as Resolved.
May 29 2019, 6:01 PM · Patch-For-Review, observability
colewhite closed T219825: Update dashboards to node-exporter 0.16+ metric names, a subtask of T220104: TEC6: Metrics monitoring infrastructure (Q4 2018/19 goal), as Resolved.
May 29 2019, 6:01 PM · User-fgiunchedi, Operations, observability, Goal
colewhite updated the task description for T219825: Update dashboards to node-exporter 0.16+ metric names.
May 29 2019, 6:01 PM · Patch-For-Review, observability

May 23 2019

colewhite updated the task description for T219825: Update dashboards to node-exporter 0.16+ metric names.
May 23 2019, 5:18 PM · Patch-For-Review, observability

May 13 2019

colewhite updated the task description for T219825: Update dashboards to node-exporter 0.16+ metric names.
May 13 2019, 3:08 PM · Patch-For-Review, observability

May 9 2019

colewhite added a comment to T196066: Add prometheus metrics for varnishkafka instances running on caching hosts.

I agree with dropping the prefix in favor of "rdkafka".

May 9 2019, 10:02 PM · Patch-For-Review, Analytics-Kanban, Traffic, Operations, Analytics

May 8 2019

colewhite lowered the priority of T222826: Leverage Grafana annotations to show events in graphs from Normal to Low.
May 8 2019, 5:12 PM · observability, Operations
colewhite triaged T222826: Leverage Grafana annotations to show events in graphs as Normal priority.
May 8 2019, 5:12 PM · observability, Operations
colewhite added a subtask for T222826: Leverage Grafana annotations to show events in graphs: T174172: unused grafana-dashboard indices on elasticsearch / logstash.
May 8 2019, 5:12 PM · observability, Operations
colewhite added a parent task for T174172: unused grafana-dashboard indices on elasticsearch / logstash: T222826: Leverage Grafana annotations to show events in graphs.
May 8 2019, 5:12 PM · Graphite, Operations
colewhite created T222826: Leverage Grafana annotations to show events in graphs.
May 8 2019, 5:11 PM · observability, Operations

Apr 30 2019

colewhite added a comment to T219825: Update dashboards to node-exporter 0.16+ metric names.

@CDanis good catch!

Apr 30 2019, 5:13 PM · Patch-For-Review, observability

Apr 25 2019

colewhite updated the task description for T219825: Update dashboards to node-exporter 0.16+ metric names.
Apr 25 2019, 5:24 PM · Patch-For-Review, observability

Apr 24 2019

colewhite added a comment to T217142: [WIP] [Proposal] Use the Kafka-Logstash logging infrastructure to log client-side errors.

Copying thoughts to task:

Apr 24 2019, 3:48 PM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Patch-For-Review, User-herron, Reading-Infrastructure-Team-Backlog, Wikimedia-Logstash

Apr 19 2019

colewhite triaged T221481: Degraded RAID on db2047 as High priority.
Apr 19 2019, 10:34 PM · DBA, Operations, ops-codfw

Apr 18 2019

colewhite closed T220084: analytics-wmde group addition for Lucas Werkmeister as Resolved.
Apr 18 2019, 10:41 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite added a comment to T220084: analytics-wmde group addition for Lucas Werkmeister.

The group membership change has been deployed.

Apr 18 2019, 10:41 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite updated the task description for T220084: analytics-wmde group addition for Lucas Werkmeister.
Apr 18 2019, 10:40 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite triaged T221288: Phabricator SPF record contains internal addressing for phab[12]001 as Normal priority.
Apr 18 2019, 7:07 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
colewhite claimed T221290: wiki-mail DKIM failing.
Apr 18 2019, 7:07 PM · Patch-For-Review, Traffic, Operations, DNS, Mail

Apr 17 2019

colewhite triaged T221138: relocate/reimage cloudvirt1004 with 10G interfaces as Normal priority.
Apr 17 2019, 6:48 PM · Patch-For-Review, Operations, Epic, cloud-services-team (Kanban)
colewhite triaged T221139: relocate/reimage cloudvirt1003 with 10G interfaces as Normal priority.
Apr 17 2019, 6:48 PM · Patch-For-Review, Operations, Epic, cloud-services-team (Kanban)
colewhite triaged T221140: relocate/reimage cloudvirt1002 with 10G interfaces as Normal priority.
Apr 17 2019, 6:48 PM · Patch-For-Review, Operations, Epic, cloud-services-team (Kanban)
colewhite triaged T221141: relocate/reimage cloudvirt1001 with 10G interfaces as Normal priority.
Apr 17 2019, 6:47 PM · Patch-For-Review, Operations, Epic, cloud-services-team (Kanban)
colewhite triaged T221259: eqord - ulsfo Telia link down - IC-313592 as High priority.
Apr 17 2019, 6:47 PM · Operations, netops

Apr 16 2019

colewhite triaged T220860: access for foks to labweb (in one way or another) (or make changePassword.php work on mwmaint hosts) as Normal priority.
Apr 16 2019, 6:11 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite triaged T220844: remove RT mail aliases as Normal priority.
Apr 16 2019, 6:10 PM · Mail, Operations
colewhite triaged T221125: cumin aliases not matching any hosts as Normal priority.
Apr 16 2019, 6:09 PM · Patch-For-Review, cloud-services-team, Operations, Operations-Software-Development
colewhite triaged T221115: labpuppetmaster logs 'cannot collect exported resources without storeconfigs being set' as Normal priority.
Apr 16 2019, 6:08 PM · cloud-services-team, Operations
colewhite triaged T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory as Normal priority.
Apr 16 2019, 6:07 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)
colewhite triaged T220590: Decom ms-be101[345] as Normal priority.
Apr 16 2019, 6:06 PM · ops-eqiad, decommission, User-fgiunchedi, media-storage, Operations
colewhite triaged T200297: Introduce a new namespace for collaborative judgements about wiki entities as Normal priority.
Apr 16 2019, 6:05 PM · TechCom-RFC (TechCom-Approved), MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Patch-For-Review, Scoring-platform-team (Current), DBA, Operations, Jade
colewhite triaged T220787: Fix RAID handler alert and puppet facter to work with Gen10 hosts and ssacli tool as Normal priority.
Apr 16 2019, 6:04 PM · Patch-For-Review, Operations, Icinga, observability
colewhite triaged T220567: Wikitech page views sometimes default to MobileFrontend as Normal priority.
Apr 16 2019, 6:03 PM · Traffic, wikitech.wikimedia.org, Operations
colewhite lowered the priority of T220500: logstash1012 lock up caused central logging stuck from High to Normal.
Apr 16 2019, 6:02 PM · User-herron, Wikimedia-Logstash, Operations
colewhite triaged T220500: logstash1012 lock up caused central logging stuck as High priority.
Apr 16 2019, 6:02 PM · User-herron, Wikimedia-Logstash, Operations
colewhite closed T220880: Degraded RAID on analytics1039 as Resolved.
Apr 16 2019, 6:02 PM · ops-eqiad, Operations
colewhite added a comment to T220880: Degraded RAID on analytics1039.

We're pretty sure this is a false alarm.

Apr 16 2019, 6:02 PM · ops-eqiad, Operations
colewhite updated subscribers of T220880: Degraded RAID on analytics1039.
Apr 16 2019, 6:01 PM · ops-eqiad, Operations
colewhite triaged T220681: Set `enable_dl` to 0 in php.ini as Normal priority.
Apr 16 2019, 5:32 PM · Patch-For-Review, PHP 7.2 support, Performance-Team (Radar), Operations
colewhite triaged T220901: Elasticsearch nodes overloading in eqiad as High priority.
Apr 16 2019, 3:55 PM · Operations, Discovery-Search (Current work)
colewhite triaged T220907: Degraded RAID on ms-be1013 as High priority.
Apr 16 2019, 3:42 PM · ops-eqiad, Operations
colewhite triaged T220982: maps hosts have bad permissions under /srv/deployment as High priority.
Apr 16 2019, 3:41 PM · Operations
colewhite triaged T221047: relocate/reimage cloudvirt1007 with 10G interfaces as Normal priority.
Apr 16 2019, 3:40 PM · Patch-For-Review, ops-eqiad, DC-Ops, Operations, Epic, cloud-services-team (Kanban)
colewhite triaged T221048: relocate/reimage cloudvirt1006 with 10G interfaces as Normal priority.
Apr 16 2019, 3:40 PM · Patch-For-Review, Operations, Epic, cloud-services-team (Kanban)
colewhite triaged T221049: relocate/reimage cloudvirt1005 with 10G interfaces as Normal priority.
Apr 16 2019, 3:40 PM · Patch-For-Review, Operations, Epic, cloud-services-team (Kanban)
colewhite triaged T221052: config file change canarying for logstash as Normal priority.
Apr 16 2019, 3:39 PM · Operations, Wikimedia-Logstash
colewhite triaged T221068: decom ms-be201[345] as Normal priority.
Apr 16 2019, 3:39 PM · decommission, ops-codfw, media-storage, User-fgiunchedi, Operations
colewhite triaged T221083: puppet fact: migrate away from the uniqueid fact as Normal priority.
Apr 16 2019, 3:36 PM · Puppet, Operations

Apr 15 2019

colewhite moved T219825: Update dashboards to node-exporter 0.16+ metric names from Backlog to In progress on the observability board.
Apr 15 2019, 3:14 PM · Patch-For-Review, observability

Apr 3 2019

colewhite added a comment to T219825: Update dashboards to node-exporter 0.16+ metric names.

Fundraising dashboards cannot be updated at this time. It looks like the nodes may need upgrading or forwards-compatibility rules.

Apr 3 2019, 10:51 PM · Patch-For-Review, observability

Apr 1 2019

colewhite added a subtask for T213288: TEC6: Upgrade metrics monitoring infrastructure core components (Q3 2018/19 goal): T219825: Update dashboards to node-exporter 0.16+ metric names.
Apr 1 2019, 6:44 PM · User-fgiunchedi, Goal, observability, Operations
colewhite added a parent task for T219825: Update dashboards to node-exporter 0.16+ metric names: T213288: TEC6: Upgrade metrics monitoring infrastructure core components (Q3 2018/19 goal).
Apr 1 2019, 6:44 PM · Patch-For-Review, observability
colewhite closed T213708: Upgrade production prometheus-node-exporter to >= 0.16 as Resolved.
Apr 1 2019, 6:43 PM · Patch-For-Review, Goal, observability, Operations
colewhite closed T213708: Upgrade production prometheus-node-exporter to >= 0.16, a subtask of T213288: TEC6: Upgrade metrics monitoring infrastructure core components (Q3 2018/19 goal), as Resolved.
Apr 1 2019, 6:43 PM · User-fgiunchedi, Goal, observability, Operations
colewhite triaged T219825: Update dashboards to node-exporter 0.16+ metric names as Low priority.
Apr 1 2019, 6:42 PM · Patch-For-Review, observability
colewhite created T219825: Update dashboards to node-exporter 0.16+ metric names.
Apr 1 2019, 6:41 PM · Patch-For-Review, observability

Mar 28 2019

colewhite updated the task description for T213708: Upgrade production prometheus-node-exporter to >= 0.16.
Mar 28 2019, 11:13 PM · Patch-For-Review, Goal, observability, Operations

Mar 22 2019

colewhite closed T216101: LDAP access to the WMF group for Angela Muigai as Resolved.
Mar 22 2019, 6:18 PM · LDAP-Access-Requests
colewhite added a comment to T216101: LDAP access to the WMF group for Angela Muigai.

Thank you for following up!

Mar 22 2019, 6:09 PM · LDAP-Access-Requests

Mar 21 2019

colewhite added a comment to T217932: Change log routing to ELK cluster to use rsyslog->kafka rather than talking directly to the ELK cluster.

As I understand it, journald is already wired up to copy to rsyslog. The only change needed to get these logs onto Kafka is to whitelist the application in the lookup_table_output.json.

Mar 21 2019, 5:24 PM · cloud-services-team (Kanban), Patch-For-Review, Striker

Mar 6 2019

colewhite closed T214594: node-exporter collector.diskstats.ignored-devices underescaped as Resolved.
Mar 6 2019, 6:34 PM · Patch-For-Review, observability

Mar 4 2019

colewhite claimed T214594: node-exporter collector.diskstats.ignored-devices underescaped.
Mar 4 2019, 4:10 PM · Patch-For-Review, observability

Feb 25 2019

colewhite closed T216120: LDAP access to the wmf group for Delphine Ménard (dmenard) as Resolved.
Feb 25 2019, 8:13 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T216120: LDAP access to the wmf group for Delphine Ménard (dmenard).

@Delphine_wmf is now in the wmf ldap group. Resolving task.

Feb 25 2019, 8:13 PM · Patch-For-Review, LDAP-Access-Requests

Feb 21 2019

colewhite created P8120 Smartmon Node Exporter comparison.
Feb 21 2019, 10:19 PM
colewhite placed T215940: Mailing list migration for Arbitration Committee to Google Group up for grabs.
Feb 21 2019, 6:23 PM · Operations, Office-IT, Wikimedia-Mailing-lists
colewhite updated the task description for T215940: Mailing list migration for Arbitration Committee to Google Group.
Feb 21 2019, 6:23 PM · Operations, Office-IT, Wikimedia-Mailing-lists
colewhite updated subscribers of T215940: Mailing list migration for Arbitration Committee to Google Group.

Mbox files shared with @eross.

Feb 21 2019, 6:23 PM · Operations, Office-IT, Wikimedia-Mailing-lists
colewhite closed T215576: Please add Runa Bhattacharjee to the `wmf` LDAP group as Resolved.
Feb 21 2019, 5:59 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T215576: Please add Runa Bhattacharjee to the `wmf` LDAP group.

@Arrbee is now in the wmf ldap group. Resolving task.

Feb 21 2019, 5:59 PM · Patch-For-Review, LDAP-Access-Requests
colewhite added a comment to T213708: Upgrade production prometheus-node-exporter to >= 0.16.

On further investigation, the log messages appear to be from the shebang of the ipmitool awk script.

Feb 21 2019, 4:51 PM · Patch-For-Review, Goal, observability, Operations

Feb 15 2019

colewhite added a comment to T216120: LDAP access to the wmf group for Delphine Ménard (dmenard).

I was unable to find your account in LDAP. Have you had an account created for you by OIT or created one on wikitech?

Feb 15 2019, 9:41 PM · Patch-For-Review, LDAP-Access-Requests
colewhite triaged T216235: cleanup reprepro configuration for elasticsearch-curator as Normal priority.
Feb 15 2019, 7:36 PM · Discovery-Search (Current work), Patch-For-Review, User-fgiunchedi, Elasticsearch, Operations
colewhite triaged T216226: GPU upgrade for stat1005 as Normal priority.
Feb 15 2019, 7:35 PM · Analytics, hardware-requests, Operations
colewhite triaged T216202: Disk failure on labsdb1005 as Normal priority.
Feb 15 2019, 7:34 PM · Operations, ops-eqiad
colewhite triaged T216243: cron spam for slow queries on mwmaint /usr/local/bin/foreachwiki initSiteStats.php --update > /dev/null as Normal priority.
Feb 15 2019, 7:33 PM · Performance-Team (Radar), Patch-For-Review, Operations, MediaWiki-Maintenance-scripts
colewhite triaged T216273: New cronspam from db clusters as Normal priority.
Feb 15 2019, 7:33 PM · Operations
colewhite added a subtask for T132324: Tracking and Reducing cron-spam to root@ : T216273: New cronspam from db clusters.
Feb 15 2019, 7:32 PM · Patch-For-Review, Operations
colewhite added a parent task for T216273: New cronspam from db clusters: T132324: Tracking and Reducing cron-spam to root@ .
Feb 15 2019, 7:32 PM · Operations
colewhite triaged T216223: Degraded RAID on labsdb1005 as Normal priority.
Feb 15 2019, 7:31 PM · cloud-services-team (Kanban), Toolforge, ops-eqiad, Operations
colewhite created T216273: New cronspam from db clusters.
Feb 15 2019, 7:22 PM · Operations
colewhite edited projects for T216223: Degraded RAID on labsdb1005, added: cloud-services-team (Kanban); removed cloud-services-team.
Feb 15 2019, 4:53 PM · cloud-services-team (Kanban), Toolforge, ops-eqiad, Operations

Feb 14 2019

colewhite triaged T216090: ensure httpd error logs from "misc apps" (krypton) end up in logstash as Normal priority.
Feb 14 2019, 11:12 PM · Wikimedia-Logstash, Operations, serviceops