colewhite (cwhite)
User

User Details

User Since
Aug 21 2018, 6:05 PM (64 w, 10 h)
Availability
Available
LDAP User
Cwhite
MediaWiki User
Unknown

Recent Activity

Fri, Nov 8

colewhite created T237706: Phatality deployments invoke oom-killer on logstash::collector nodes..
Fri, Nov 8, 1:27 AM · Operations, observability

Thu, Nov 7

colewhite added a comment to T234565: Standardize the logging format.

@Krinkle T180051 IMHO implies a different solution. That task, as well as speeding up Kibana, would be accomplished with the work intended here. The last comment from @Eevans lines up with the intent of this task.

Thu, Nov 7, 11:39 PM · Wikimedia-Logstash, observability, Operations

Wed, Nov 6

colewhite renamed T205870: Fully migrate producers off statsd from Fully migrate >= 30% of producers off statsd to Fully migrate producers off statsd.
Wed, Nov 6, 4:37 PM · Performance-Team (Radar), Patch-For-Review, observability, Operations
colewhite added a subtask for T205870: Fully migrate producers off statsd: T233448: Review promethius ORES rules for completeness.
Wed, Nov 6, 12:15 AM · Performance-Team (Radar), Patch-For-Review, observability, Operations
colewhite added a parent task for T233448: Review promethius ORES rules for completeness: T205870: Fully migrate producers off statsd.
Wed, Nov 6, 12:15 AM · ORES, Scoring-platform-team
colewhite added a comment to T233448: Review promethius ORES rules for completeness.

If the statsd-exporter sidecar approach is appropriate for ORES, there are quite a few metrics with unclear type and meaning. I've constructed a tree to assist us in defining them.

Wed, Nov 6, 12:07 AM · ORES, Scoring-platform-team
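
One way such a tree could be assembled is sketched below. This is only an illustration, not the tree referred to in the comment: it assumes the statsd metric names are dot-separated, and the example names in the usage note are hypothetical rather than taken from ORES.

```python
from typing import Dict, Iterable, List

# A nested dict where each key is one dot-separated segment of a metric name.
MetricTree = Dict[str, "MetricTree"]


def build_metric_tree(names: Iterable[str]) -> MetricTree:
    """Group dot-separated metric names into a nested dictionary."""
    tree: MetricTree = {}
    for name in names:
        node = tree
        for part in name.split("."):
            # Descend into the segment, creating it on first sight.
            node = node.setdefault(part, {})
    return tree


def render_tree(tree: MetricTree, indent: int = 0) -> List[str]:
    """Return the tree as indented lines, one segment per line, sorted by name."""
    lines: List[str] = []
    for key in sorted(tree):
        lines.append("  " * indent + key)
        lines.extend(render_tree(tree[key], indent + 1))
    return lines
```

For example, `print("\n".join(render_tree(build_metric_tree(["svc.requests.count", "svc.requests.time", "svc.errors"]))))` prints the hypothetical `svc` metrics grouped under shared prefixes, which makes it easier to annotate each branch with a type and meaning.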

Wed, Oct 30

colewhite triaged T236954: Hieradata yaml style checking as Low priority.
Wed, Oct 30, 8:31 PM · Puppet, Operations, User-jbond
colewhite created T236954: Hieradata yaml style checking.
Wed, Oct 30, 8:31 PM · Puppet, Operations, User-jbond

Tue, Oct 29

colewhite closed T233666: Close wikimediameta-l mailing list as Resolved.
Tue, Oct 29, 8:59 PM · Wikimedia-Mailing-lists, Operations
colewhite added a comment to T233666: Close wikimediameta-l mailing list.

Done.

Tue, Oct 29, 8:59 PM · Wikimedia-Mailing-lists, Operations
colewhite claimed T233666: Close wikimediameta-l mailing list.
Tue, Oct 29, 8:59 PM · Wikimedia-Mailing-lists, Operations
colewhite added a comment to T194558: Enable CAPTCHA on mailman instances.

It looks like reCAPTCHA support was built in recently and is available in buster: https://bugs.launchpad.net/mailman/+bug/1774826

Tue, Oct 29, 8:46 PM · Operations, Wikimedia-Mailing-lists
colewhite moved T236829: Create mailing list for Wikidebate project from Backlog to List creation on the Wikimedia-Mailing-lists board.
Tue, Oct 29, 8:43 PM · Operations, Wikimedia-Mailing-lists

Mon, Oct 28

colewhite reassigned T236321: Requesting Access to Stat1004, Stat1006, Stat1007, notebook1003 and notebook1004 from colewhite to CGlenn.
Mon, Oct 28, 8:14 PM · Patch-For-Review, Operations, SRE-Access-Requests

Fri, Oct 25

colewhite added a comment to T236505: Monitor mailman outbound mail queue.

Historically, out queue monitoring has been noisy. One idea to have less noisy outbound monitoring is to take the queue depth and estimate how long it will take to send that queue based on the average send time.

Fri, Oct 25, 7:49 PM · observability, Operations
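
The drain-time idea in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the check that was deployed: the function names, the `workers` parameter, and the threshold are all hypothetical, and the average send time is assumed to be measured elsewhere.

```python
def estimate_drain_seconds(queue_depth: int, avg_send_seconds: float, workers: int = 1) -> float:
    """Estimate how long the current outbound queue would take to send.

    queue_depth: number of messages waiting in the out queue.
    avg_send_seconds: average time to send one message (assumed measured elsewhere).
    workers: number of parallel senders (hypothetical parameter, defaults to 1).
    """
    if queue_depth < 0 or avg_send_seconds < 0 or workers < 1:
        raise ValueError("queue_depth and avg_send_seconds must be >= 0, workers >= 1")
    return queue_depth * avg_send_seconds / workers


def should_alert(queue_depth: int, avg_send_seconds: float,
                 max_drain_seconds: float, workers: int = 1) -> bool:
    """Alert only when the estimated drain time exceeds the allowed delay."""
    return estimate_drain_seconds(queue_depth, avg_send_seconds, workers) > max_drain_seconds
```

Alerting on estimated drain time rather than raw queue depth means a deep queue that is sending quickly stays quiet, while a shallow queue that is sending slowly can still trigger.
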
colewhite claimed T236505: Monitor mailman outbound mail queue.
Fri, Oct 25, 5:48 PM · observability, Operations
colewhite created T236505: Monitor mailman outbound mail queue.
Fri, Oct 25, 5:48 PM · observability, Operations
colewhite closed T235983: Lengthy delays in emails being received from mailing lists in October 2019 as Resolved.
Fri, Oct 25, 4:10 PM · Mail, Operations, Wikimedia-Mailing-lists
colewhite closed T234999: Create wikimedia sustainability mailing list as Resolved.
Fri, Oct 25, 4:07 PM · Operations, Wikimedia-Mailing-lists
colewhite added a comment to T234999: Create wikimedia sustainability mailing list.

@mepps The list has been created and the password emailed to you. You may need to share it with your co-admin(s). The admin interface can be found here: https://lists.wikimedia.org/mailman/admin/sustainability/

Fri, Oct 25, 4:07 PM · Operations, Wikimedia-Mailing-lists
colewhite closed T234209: Grant LDAP groups and deployment shell access to Kevin Bazira as Resolved.
Fri, Oct 25, 3:58 PM · SRE-Access-Requests, Operations, LDAP-Access-Requests, Scoring-platform-team
colewhite closed T234209: Grant LDAP groups and deployment shell access to Kevin Bazira, a subtask of T234222: Onboarding Kevin Bazira -- Accounts and Access, as Resolved.
Fri, Oct 25, 3:58 PM · Scoring-platform-team (Current)
colewhite added a comment to T234209: Grant LDAP groups and deployment shell access to Kevin Bazira.

The necessary changes have been deployed. Please let me know if you encounter any related issue.

Fri, Oct 25, 3:58 PM · SRE-Access-Requests, Operations, LDAP-Access-Requests, Scoring-platform-team
colewhite updated the task description for T234209: Grant LDAP groups and deployment shell access to Kevin Bazira.
Fri, Oct 25, 3:52 PM · SRE-Access-Requests, Operations, LDAP-Access-Requests, Scoring-platform-team
colewhite added a comment to T236130: Elevated 502s observed in ulsfo.

And it's back!

Fri, Oct 25, 3:48 PM · Operations, Traffic

Thu, Oct 24

colewhite moved T236321: Requesting Access to Stat1004, Stat1006, Stat1007, notebook1003 and notebook1004 from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Thu, Oct 24, 11:30 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite updated subscribers of T236321: Requesting Access to Stat1004, Stat1006, Stat1007, notebook1003 and notebook1004.

Hi CherRaye!

Thu, Oct 24, 11:30 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite updated the task description for T236321: Requesting Access to Stat1004, Stat1006, Stat1007, notebook1003 and notebook1004.
Thu, Oct 24, 11:20 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite claimed T236321: Requesting Access to Stat1004, Stat1006, Stat1007, notebook1003 and notebook1004.
Thu, Oct 24, 11:15 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite updated the task description for T236321: Requesting Access to Stat1004, Stat1006, Stat1007, notebook1003 and notebook1004.
Thu, Oct 24, 11:15 PM · Patch-For-Review, Operations, SRE-Access-Requests
colewhite triaged T236367: Tune HTTP availability alerts as Normal priority.
Thu, Oct 24, 11:13 PM · Operations, observability
colewhite triaged T236401: Apache error log noise "Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1" on mwdebug1001 as Normal priority.
Thu, Oct 24, 11:13 PM · Operations, Wikimedia-production-error
colewhite triaged T236404: Prepare and check storage layer for ge.wikimedia.org as Normal priority.
Thu, Oct 24, 11:12 PM · Data-Services, DBA, Operations
colewhite added a comment to T235983: Lengthy delays in emails being received from mailing lists in October 2019.

This issue has been mitigated as of this morning (UTC), and I can confirm I am no longer seeing long delays in list email.

Thu, Oct 24, 6:49 PM · Mail, Operations, Wikimedia-Mailing-lists
colewhite claimed T235983: Lengthy delays in emails being received from mailing lists in October 2019.
Thu, Oct 24, 8:33 AM · Mail, Operations, Wikimedia-Mailing-lists

Wed, Oct 23

colewhite triaged T161822: Please add @jrobell and @spatton to WMF-NDA (access to private Phabricator tasks) as Normal priority.
Wed, Oct 23, 7:40 PM · Operations, SRE-Access-Requests, WMF-NDA-Requests
colewhite reassigned T161822: Please add @jrobell and @spatton to WMF-NDA (access to private Phabricator tasks) from spatton to Dzahn.
Wed, Oct 23, 7:40 PM · Operations, SRE-Access-Requests, WMF-NDA-Requests
colewhite closed T235260: Analytics Access for Grant (groups cn=wmf and analytics-privatedata-users) as Resolved.
Wed, Oct 23, 7:32 PM · LDAP-Access-Requests, Operations, SRE-Access-Requests, Analytics-Kanban, Analytics
colewhite added a comment to T235260: Analytics Access for Grant (groups cn=wmf and analytics-privatedata-users).

The account has been deployed, and I checked that Grant is in the wmf LDAP group. Please let me know if you encounter any related issue.

Wed, Oct 23, 7:32 PM · LDAP-Access-Requests, Operations, SRE-Access-Requests, Analytics-Kanban, Analytics
colewhite reassigned T234209: Grant LDAP groups and deployment shell access to Kevin Bazira from colewhite to kevinbazira.
Wed, Oct 23, 7:21 PM · SRE-Access-Requests, Operations, LDAP-Access-Requests, Scoring-platform-team
colewhite added a comment to T234209: Grant LDAP groups and deployment shell access to Kevin Bazira.

It looks like your email address in wikitech has not been updated to your staff address. Would you please correct this so that we can proceed?
The place to change it is here: https://wikitech.wikimedia.org/wiki/Special:Preferences

Wed, Oct 23, 7:21 PM · SRE-Access-Requests, Operations, LDAP-Access-Requests, Scoring-platform-team
colewhite triaged T236246: Error restoring file: "The file … is in an inconsistent state within the internal storage backends" as Normal priority.
Wed, Oct 23, 7:18 PM · Operations, Multimedia, MediaWiki-File-management, SRE-swift-storage, Wikimedia-production-error, Commons
colewhite closed T236244: Degraded RAID on maerlant as Invalid.
Wed, Oct 23, 7:17 PM · ops-esams, Operations
colewhite added a comment to T236244: Degraded RAID on maerlant.

Looks like a short disconnection. Icinga has shown all OK for 7 hours now.

Wed, Oct 23, 7:16 PM · ops-esams, Operations
colewhite triaged T236241: Yoruba Language Wikipedia not being indexed by search engines as Normal priority.
Wed, Oct 23, 7:15 PM · Readers-Web-Backlog, SEO, Operations, Wikimedia-General-or-Unknown
colewhite triaged T236240: Ghostscript outputs errors to stdout despite -q, preventing Thumbor from generating some thumbnails properly as Normal priority.
Wed, Oct 23, 7:14 PM · Traffic, Operations, Thumbor, MediaWiki-File-management, Commons, Multimedia
colewhite triaged T236292: php-fpm invalid opcode on mw1317 as Normal priority.
Wed, Oct 23, 7:14 PM · Operations, serviceops
colewhite added a comment to T236240: Ghostscript outputs errors to stdout despite -q, preventing Thumbor from generating some thumbnails properly.

This issue is easily replicated by requesting pages 7 and 14. It first throws a 500 and then a 429.

Wed, Oct 23, 7:13 PM · Traffic, Operations, Thumbor, MediaWiki-File-management, Commons, Multimedia
colewhite triaged T236181: setup/install an-airflow1001.eqiad.wmnet on ganeti as Normal priority.
Wed, Oct 23, 4:02 PM · vm-requests, Operations, Discovery-Search
colewhite triaged T236253: systemd-coredump can make a system unresponsive as Normal priority.
Wed, Oct 23, 4:01 PM · Patch-For-Review, serviceops, Operations
colewhite added a comment to T236130: Elevated 502s observed in ulsfo.

And now it's dropped off for a few hours.

Wed, Oct 23, 3:57 PM · Operations, Traffic
cmadeo awarded T234999: Create wikimedia sustainability mailing list a Baby Tequila token.
Wed, Oct 23, 3:04 PM · Operations, Wikimedia-Mailing-lists

Tue, Oct 22

colewhite closed T234429: Requesting access to view EventLogging data for Co_WMDE as Resolved.
Tue, Oct 22, 11:34 PM · WMF-Legal, Operations, SRE-Access-Requests
colewhite added a comment to T234429: Requesting access to view EventLogging data for Co_WMDE.

Hi Corinna!

Tue, Oct 22, 11:34 PM · WMF-Legal, Operations, SRE-Access-Requests
colewhite added a comment to T235983: Lengthy delays in emails being received from mailing lists in October 2019.

I've been monitoring this for the past couple of days. Since yesterday we've gone from over 20k messages in the queue to fewer than 6k. The backlog seems to be coming from a particular provider's rate limiting. Samples from my inbox indicate delays both when Google relays the message to the list and between the list server and the outbound mail relay. The combined effect makes the delay metric you're seeing.

Tue, Oct 22, 11:22 PM · Mail, Operations, Wikimedia-Mailing-lists
colewhite triaged T236209: WikimediaFoundation.org analytics access for CherRaye Glenn as Normal priority.
Tue, Oct 22, 10:34 PM · SRE-Access-Requests, wikimediafoundation.org, Operations, Analytics, LDAP-Access-Requests
colewhite claimed T236209: WikimediaFoundation.org analytics access for CherRaye Glenn.
Tue, Oct 22, 10:27 PM · SRE-Access-Requests, wikimediafoundation.org, Operations, Analytics, LDAP-Access-Requests
colewhite closed T235136: LDAP membership for new employee Nikki Nikkhoui as Resolved.
Tue, Oct 22, 10:23 PM · LDAP-Access-Requests, Operations
colewhite added a comment to T235136: LDAP membership for new employee Nikki Nikkhoui.

Hi Nikki!

Tue, Oct 22, 10:23 PM · LDAP-Access-Requests, Operations
colewhite updated the task description for T234429: Requesting access to view EventLogging data for Co_WMDE.
Tue, Oct 22, 10:22 PM · WMF-Legal, Operations, SRE-Access-Requests
colewhite updated the task description for T234429: Requesting access to view EventLogging data for Co_WMDE.
Tue, Oct 22, 10:08 PM · WMF-Legal, Operations, SRE-Access-Requests
colewhite triaged T236143: Editing in Gerrit isn't saved after the update/migration to gerrit1001 as Normal priority.
Tue, Oct 22, 8:09 PM · Operations, Gerrit
colewhite triaged T236145: processEchoEmailBatch.php failing for labtestwiki as Normal priority.
Tue, Oct 22, 8:07 PM · Operations, MediaWiki-Maintenance-scripts, cloud-services-team (Kanban)
colewhite triaged T236130: Elevated 502s observed in ulsfo as Normal priority.
Tue, Oct 22, 8:05 PM · Operations, Traffic
colewhite added a comment to T236130: Elevated 502s observed in ulsfo.

Of interest: all have user agent FortiGate (FortiOS 5.0) and have appeared near simultaneously from a number of sources globally starting 2019/10/21 at 0900 UTC.

Tue, Oct 22, 8:05 PM · Operations, Traffic

Mon, Oct 21

colewhite claimed T235136: LDAP membership for new employee Nikki Nikkhoui.
Mon, Oct 21, 11:13 PM · LDAP-Access-Requests, Operations
colewhite claimed T234999: Create wikimedia sustainability mailing list.
Mon, Oct 21, 11:12 PM · Operations, Wikimedia-Mailing-lists
colewhite triaged T236102: Can't load flame or coal graphs on performance.wikimedia.org (HTTP 502) as Normal priority.
Mon, Oct 21, 11:09 PM · Operations, Traffic, Performance-Team
colewhite triaged T235983: Lengthy delays in emails being received from mailing lists in October 2019 as High priority.
Mon, Oct 21, 10:00 PM · Mail, Operations, Wikimedia-Mailing-lists

Oct 3 2019

colewhite updated the task description for T234565: Standardize the logging format.
Oct 3 2019, 8:41 PM · Wikimedia-Logstash, observability, Operations
colewhite created T234565: Standardize the logging format.
Oct 3 2019, 8:24 PM · Wikimedia-Logstash, observability, Operations
colewhite moved T233662: Logstash pipeline crashes on non-UTF8 log messages. from Backlog to In Dev/Progress on the Wikimedia-Logstash board.
Oct 3 2019, 7:53 PM · Wikimedia-Incident, Patch-For-Review, Wikimedia-Logstash, Operations

Sep 23 2019

colewhite added a comment to T233662: Logstash pipeline crashes on non-UTF8 log messages..

There are a few options to consider.

Sep 23 2019, 8:29 PM · Wikimedia-Incident, Patch-For-Review, Wikimedia-Logstash, Operations
colewhite created T233662: Logstash pipeline crashes on non-UTF8 log messages..
Sep 23 2019, 8:21 PM · Wikimedia-Incident, Patch-For-Review, Wikimedia-Logstash, Operations

Sep 19 2019

colewhite updated the task description for T205870: Fully migrate producers off statsd.
Sep 19 2019, 10:41 PM · Performance-Team (Radar), Patch-For-Review, observability, Operations
colewhite updated the task description for T205870: Fully migrate producers off statsd.
Sep 19 2019, 10:41 PM · Performance-Team (Radar), Patch-For-Review, observability, Operations

Sep 18 2019

colewhite updated the task description for T205870: Fully migrate producers off statsd.
Sep 18 2019, 9:33 PM · Performance-Team (Radar), Patch-For-Review, observability, Operations

Sep 12 2019

colewhite updated the task description for T205870: Fully migrate producers off statsd.
Sep 12 2019, 10:42 PM · Performance-Team (Radar), Patch-For-Review, observability, Operations

Sep 10 2019

colewhite added a comment to T227360: wikibase: Request raises 500 on commons.

I cannot reproduce this anymore. Resolving.

Sep 10 2019, 5:51 PM · StructuredDataOnCommons, Wikidata-Campsite, Wikidata, Wikimedia-production-error
colewhite closed T227360: wikibase: Request raises 500 on commons as Resolved.
Sep 10 2019, 5:51 PM · StructuredDataOnCommons, Wikidata-Campsite, Wikidata, Wikimedia-production-error

Sep 3 2019

colewhite created T231953: Gerrit repositories mediawiki/services/service-runner and mediawiki/services/service-template-node appear abandoned.
Sep 3 2019, 11:36 PM · Repository-Admins, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Cleanup, Services, service-runner, Gerrit

Aug 30 2019

colewhite created T231696: Please create operations/debs/prometheus-swagger-exporter.
Aug 30 2019, 11:09 PM · Continuous-Integration-Config, GitHub-Mirrors, User-MarcoAurelio, Repository-Admins
colewhite updated the task description for T227540: b4-eqiad pdu refresh (Thursday 10/24 @11am UTC).
Aug 30 2019, 8:58 PM · DC-Ops, Operations, ops-eqiad
colewhite closed T229357: Remove logster from cp* hosts, a subtask of T220116: Migrate all metrics originated by PoPs from statsd to Prometheus, as Resolved.
Aug 30 2019, 8:49 PM · User-fgiunchedi, Operations, observability, Goal
colewhite closed T229357: Remove logster from cp* hosts as Resolved.
Aug 30 2019, 8:49 PM · Operations, observability

Aug 26 2019

colewhite archived P8978 service-checker.
Aug 26 2019, 7:49 PM
colewhite edited P8978 service-checker.
Aug 26 2019, 5:33 PM
colewhite updated the task description for T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC).
Aug 26 2019, 4:38 PM · DC-Ops, Operations, ops-eqiad
colewhite updated the task description for T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC).
Aug 26 2019, 4:38 PM · DC-Ops, Operations, ops-eqiad
colewhite created P8978 service-checker.
Aug 26 2019, 3:19 PM

Aug 13 2019

colewhite placed T230104: Requesting access to LogStash for Abijeet Patro up for grabs.
Aug 13 2019, 6:56 PM · SRE-Access-Requests, Operations

Aug 9 2019

colewhite closed T230242: Membership to 'wmf' LDAP group request for Connie Chen as Resolved.
Aug 9 2019, 9:15 PM · LDAP-Access-Requests, Operations
colewhite updated subscribers of T230242: Membership to 'wmf' LDAP group request for Connie Chen.

@cchen is now in the wmf ldap group. Resolving task.

Aug 9 2019, 9:15 PM · LDAP-Access-Requests, Operations
colewhite updated the task description for T230242: Membership to 'wmf' LDAP group request for Connie Chen.
Aug 9 2019, 9:13 PM · LDAP-Access-Requests, Operations
colewhite triaged T230242: Membership to 'wmf' LDAP group request for Connie Chen as Normal priority.
Aug 9 2019, 9:12 PM · LDAP-Access-Requests, Operations
colewhite closed T228447: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen as Resolved.
Aug 9 2019, 9:11 PM · SRE-Access-Requests, Operations
colewhite added a comment to T228447: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen.

Access to Superset and Turnilo is managed by the 'wmf' LDAP group. Since it is beyond the scope of this task, your new access request can be found here: T230242

Aug 9 2019, 9:11 PM · SRE-Access-Requests, Operations
colewhite created T230242: Membership to 'wmf' LDAP group request for Connie Chen.
Aug 9 2019, 9:06 PM · LDAP-Access-Requests, Operations
colewhite added a comment to T229963: Add Anne Tomasevich to ldap/wmf group.

Based on what I could find about your position, you may need more access than indicated here. @MarkTraceur could you help?

Aug 9 2019, 5:50 PM · LDAP-Access-Requests
colewhite claimed T229357: Remove logster from cp* hosts.
Aug 9 2019, 4:38 PM · Operations, observability