CDanis (Chris Danis)
SRE

User Details

User Since
Nov 5 2018, 2:54 PM (5 w, 6 d)
Availability
Available
IRC Nick
cdanis
LDAP User
CDanis
MediaWiki User
CDanis (WMF)

Recent Activity

Fri, Dec 14

fgiunchedi awarded T211979: 'provision' Grafana's datasources as YAML in puppet a Like token.
Fri, Dec 14, 4:40 PM · monitoring, User-CDanis
CDanis triaged T211982: Find links to grafana.wikimedia.org and change them to use the new URL format as Normal priority.
Fri, Dec 14, 2:55 PM · Operations, monitoring, User-CDanis
CDanis triaged T211979: 'provision' Grafana's datasources as YAML in puppet as Normal priority.
Fri, Dec 14, 2:51 PM · monitoring, User-CDanis

Thu, Dec 13

CDanis created T211880: Upgrade grafana-labs.wikimedia.org to Grafana 5.x.
Thu, Dec 13, 2:05 PM · Cloud-Services

Tue, Dec 11

CDanis added a comment to T211712: moving from krypton to grafana1001 broke fundraising dashboards.

When you're ready to make firewall changes, I'm happy to help :)

Tue, Dec 11, 8:07 PM · fundraising-tech-ops, Patch-For-Review, User-CDanis, monitoring
CDanis reassigned T211712: moving from krypton to grafana1001 broke fundraising dashboards from CDanis to cwdent.
Tue, Dec 11, 8:07 PM · fundraising-tech-ops, Patch-For-Review, User-CDanis, monitoring
CDanis updated the task description for T211712: moving from krypton to grafana1001 broke fundraising dashboards.
Tue, Dec 11, 7:20 PM · fundraising-tech-ops, Patch-For-Review, User-CDanis, monitoring
CDanis moved T211712: moving from krypton to grafana1001 broke fundraising dashboards from Backlog to Blocked on others on the User-CDanis board.
Tue, Dec 11, 7:20 PM · fundraising-tech-ops, Patch-For-Review, User-CDanis, monitoring
CDanis added a comment to T211712: moving from krypton to grafana1001 broke fundraising dashboards.

It's serving again: https://grafana-old.wikimedia.org/dashboard/db/fundraising-database?orgId=1

Tue, Dec 11, 7:19 PM · fundraising-tech-ops, Patch-For-Review, User-CDanis, monitoring
CDanis created T211712: moving from krypton to grafana1001 broke fundraising dashboards.
Tue, Dec 11, 7:01 PM · fundraising-tech-ops, Patch-For-Review, User-CDanis, monitoring
CDanis added a comment to T211596: mtail seems broken on syslog::centralserver installations.

Looks like this worked:

Tue, Dec 11, 5:58 PM · Patch-For-Review, User-CDanis, Operations
fgiunchedi awarded T211654: puppet-provisioned dashboards not found in Grafana 5 a Like token.
Tue, Dec 11, 5:18 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis closed T211654: puppet-provisioned dashboards not found in Grafana 5, a subtask of T210416: Upgrade grafana to 5.x, as Resolved.
Tue, Dec 11, 3:56 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis closed T211654: puppet-provisioned dashboards not found in Grafana 5 as Resolved.
Tue, Dec 11, 3:56 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T211654: puppet-provisioned dashboards not found in Grafana 5.

New config now puppetized; grafana has been restarted; the JSON consoles look good

Tue, Dec 11, 3:56 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T211654: puppet-provisioned dashboards not found in Grafana 5.

I believe I've fixed this:

Tue, Dec 11, 2:49 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis moved T211654: puppet-provisioned dashboards not found in Grafana 5 from Backlog to Doing on the User-CDanis board.
Tue, Dec 11, 2:48 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T211654: puppet-provisioned dashboards not found in Grafana 5.

Ah, I think I know what happened. Hopefully will be an easy fix...

Tue, Dec 11, 1:12 PM · Patch-For-Review, Operations, monitoring, User-CDanis

Mon, Dec 10

CDanis added a comment to T210416: Upgrade grafana to 5.x.

Need to create some other tasks to track work that should be done with new 5.x features but marking this as done :)

Mon, Dec 10, 11:07 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis closed T210416: Upgrade grafana to 5.x as Resolved.
Mon, Dec 10, 11:07 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T210416: Upgrade grafana to 5.x.

One part of the bug is fixed. The other (typing in the tag filter dropdown box) is not.

Mon, Dec 10, 8:02 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T210416: Upgrade grafana to 5.x.

This bug may have been fixed in 5.4.1. Going to grab that version into wikimedia-stretch.

Mon, Dec 10, 7:56 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T210416: Upgrade grafana to 5.x.

Thanks to @ema who found a bug in 5.4.0 -- the tag filter UI seems quite broken.
Reported upstream: https://github.com/grafana/grafana/issues/14437

Mon, Dec 10, 6:30 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis moved T211596: mtail seems broken on syslog::centralserver installations from Blocked on others to Radar on the User-CDanis board.
Mon, Dec 10, 6:25 PM · Patch-For-Review, User-CDanis, Operations
CDanis added a comment to T211596: mtail seems broken on syslog::centralserver installations.

Cool, looks like mtail is happy now as well:

Mon, Dec 10, 4:48 PM · Patch-For-Review, User-CDanis, Operations
CDanis added a subtask for T209863: graph server temperature metrics: T211596: mtail seems broken on syslog::centralserver installations.
Mon, Dec 10, 4:24 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a parent task for T211596: mtail seems broken on syslog::centralserver installations: T209863: graph server temperature metrics.
Mon, Dec 10, 4:24 PM · Patch-For-Review, User-CDanis, Operations
CDanis added a comment to T210416: Upgrade grafana to 5.x.

I've heard of no issues with Grafana 5, and will be upgrading today.

Mon, Dec 10, 4:15 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis moved T211596: mtail seems broken on syslog::centralserver installations from Backlog to Blocked on others on the User-CDanis board.
Mon, Dec 10, 3:10 PM · Patch-For-Review, User-CDanis, Operations
CDanis added a project to T211596: mtail seems broken on syslog::centralserver installations: User-CDanis.
Mon, Dec 10, 3:10 PM · Patch-For-Review, User-CDanis, Operations
CDanis created T211596: mtail seems broken on syslog::centralserver installations.
Mon, Dec 10, 3:08 PM · Patch-For-Review, User-CDanis, Operations

Sat, Dec 8

CDanis awarded T182028: DNS repo: add CI checks for obvious configuration errors a Love token.
Sat, Dec 8, 4:51 PM · Traffic, DNS, Patch-For-Review, Operations-Software-Development, Operations

Fri, Dec 7

CDanis claimed T210416: Upgrade grafana to 5.x.
Fri, Dec 7, 9:53 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis updated the task description for T210416: Upgrade grafana to 5.x.
Fri, Dec 7, 7:55 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis updated the task description for T210416: Upgrade grafana to 5.x.
Fri, Dec 7, 1:24 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis updated the task description for T210416: Upgrade grafana to 5.x.
Fri, Dec 7, 12:56 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
fgiunchedi awarded T210416: Upgrade grafana to 5.x a Like token.
Fri, Dec 7, 9:13 AM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis

Thu, Dec 6

CDanis added a project to T210416: Upgrade grafana to 5.x: Operations.
Thu, Dec 6, 11:38 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis updated the task description for T210416: Upgrade grafana to 5.x.
Thu, Dec 6, 10:44 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis updated the task description for T210416: Upgrade grafana to 5.x.
Thu, Dec 6, 10:39 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T210416: Upgrade grafana to 5.x.

It's alive!

Thu, Dec 6, 9:55 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T210416: Upgrade grafana to 5.x.

Okay, grafana1001.eqiad is up and running, with the database (and the plugins tree) copied over from krypton. At first glance it seems to work.

Thu, Dec 6, 8:10 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis

Wed, Dec 5

CDanis added a comment to T209863: graph server temperature metrics.

Haven't had too much time to look at this. FWIW I do see temperature sensors exported by the kernel on cp3007:

Wed, Dec 5, 8:37 PM · Patch-For-Review, Operations, monitoring, User-CDanis

Tue, Dec 4

CDanis added a comment to T210416: Upgrade grafana to 5.x.

The first Puppet run with the new configuration on grafana1001 failed. I invoked puppet again and it immediately succeeded.

Tue, Dec 4, 8:24 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis created T211121: Usual git mechanism for aborting commit does not work on the private puppet repo.
Tue, Dec 4, 3:12 PM · Operations

Mon, Dec 3

CDanis moved T210416: Upgrade grafana to 5.x from Backlog to In progress on the monitoring board.
Mon, Dec 3, 4:56 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T210416: Upgrade grafana to 5.x.

Please give this a good look! I've never done this before...

Mon, Dec 3, 1:40 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis

Fri, Nov 30

CDanis updated subscribers of T210416: Upgrade grafana to 5.x.

@fgiunchedi does the above sound good to you?

Fri, Nov 30, 11:24 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T210416: Upgrade grafana to 5.x.

Talked some with volans. Seems like the best thing to do is probably to make a VM in Ganeti running stretch, set it up with a new puppet role just for grafana, copy the DB over, verify that it works well, and then switch over grafana.wikimedia.org to point there.

Fri, Nov 30, 11:24 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis moved T210416: Upgrade grafana to 5.x from Backlog to Doing on the User-CDanis board.
Fri, Nov 30, 10:58 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T210416: Upgrade grafana to 5.x.

My plan from here had been to try naively copying grafana's database from krypton to my VM.

Fri, Nov 30, 7:52 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T210416: Upgrade grafana to 5.x.

Fixed above by manually adding a pile of hiera in Horizon.

Fri, Nov 30, 6:17 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T189641: Service for checking the Pwned Passwords database.

However, I do think we should consider a multi-DC-capable solution in this case. We don't want DC switches, outages, and other concerns to influence, compromise, or complicate the security aspects of this mechanism.

Fri, Nov 30, 1:41 PM · Services (watching), User-Tgr, WMF-Legal, Patch-For-Review, Security, MediaWiki-User-login-and-signup, MediaWiki-Authentication-and-authorization, Security-General

Thu, Nov 29

hashar awarded T209863: graph server temperature metrics a Burninate token.
Thu, Nov 29, 11:05 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T209863: graph server temperature metrics.

Things I have learned today:

Thu, Nov 29, 11:01 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T189641: Service for checking the Pwned Passwords database.

but I think we should aim for having the data in memory as much as possible to achieve good performance, no matter the choice of data storage

22GB of data in memory?

Thu, Nov 29, 8:18 PM · Services (watching), User-Tgr, WMF-Legal, Patch-For-Review, Security, MediaWiki-User-login-and-signup, MediaWiki-Authentication-and-authorization, Security-General

Wed, Nov 28

CDanis edited P7859 RAII objhits.cc.
Wed, Nov 28, 4:35 PM
CDanis created P7859 RAII objhits.cc.
Wed, Nov 28, 4:29 PM
CDanis added a comment to T210416: Upgrade grafana to 5.x.

So I thought it would be simple to create a VM, use Horizon to enable the Puppet role used for grafana on it (role::webserver_misc_apps), and then install the updated deb manually (since messing with reprepro seems both unnecessary and scary).

Wed, Nov 28, 2:13 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis

Tue, Nov 27

CDanis added a comment to T210416: Upgrade grafana to 5.x.

cdanis-test-grafana5-stretch1.monitoring.eqiad.wmflabs is alive!

Tue, Nov 27, 12:57 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T209863: graph server temperature metrics.

Late last week I figured out how to scrape aggregated data from Prometheus as CSV and fed it into Plotly:
https://plot.ly/~cdanis-wmf/1/#/

Tue, Nov 27, 12:56 PM · Patch-For-Review, Operations, monitoring, User-CDanis
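
For illustration only, here is a minimal sketch of how a Prometheus range-query result could be flattened into CSV. It is not the script used in the comment above. It assumes the JSON shape returned by the Prometheus HTTP API for a range query (result type "matrix"); the function name matrix_to_csv and the sample data are hypothetical.

```python
import csv
import io


def matrix_to_csv(response):
    """Flatten a Prometheus range-query ("matrix") response into CSV text.

    Hypothetical helper: one output row per (series, timestamp, value).
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["series", "timestamp", "value"])
    for series in response["data"]["result"]:
        # Join the label set into a single stable identifier for the series.
        labels = series["metric"]
        name = ";".join(f"{key}={val}" for key, val in sorted(labels.items()))
        # Each sample is a [unix_timestamp, "value-as-string"] pair.
        for timestamp, value in series["values"]:
            writer.writerow([name, timestamp, value])
    return out.getvalue()


# Made-up sample data in the API's response shape (not real measurements).
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"quantile": "0.5", "cluster": "example"},
                "values": [[1543000000, "41"], [1543000060, "42"]],
            }
        ],
    },
}

csv_text = matrix_to_csv(sample)
print(csv_text)
```

The resulting text can be saved to a file and uploaded to a plotting tool; fetching the JSON itself (for example from the API's query_range endpoint) is left out so the sketch stays self-contained.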

Mon, Nov 26

CDanis created T210416: Upgrade grafana to 5.x.
Mon, Nov 26, 4:54 PM · Performance-Team (Radar), Patch-For-Review, Operations, monitoring, User-CDanis
CDanis moved T209863: graph server temperature metrics from Backlog to Doing on the User-CDanis board.
Mon, Nov 26, 4:03 PM · Patch-For-Review, Operations, monitoring, User-CDanis

Mon, Nov 19

CDanis moved T209863: graph server temperature metrics from Backlog to In progress on the monitoring board.
Mon, Nov 19, 6:10 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a project to T209863: graph server temperature metrics: Operations.
Mon, Nov 19, 5:34 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis added a comment to T209863: graph server temperature metrics.

Added some basic graphs to the prometheus-cluster-breakdown console: percentiles of server temperature for a given cluster, and per-server plots of a single, arbitrarily selected temperature series (sensor="temp1",chip="platform_coretemp_0").

Mon, Nov 19, 5:29 PM · Patch-For-Review, Operations, monitoring, User-CDanis
CDanis created T209863: graph server temperature metrics.
Mon, Nov 19, 5:26 PM · Patch-For-Review, Operations, monitoring, User-CDanis

Nov 12 2018

CDanis added a comment to T149287: Heating alerts for mw servers in eqiad.

Many of these machines are always running hot -- ambient temps of 85C or more, even when only lightly loaded.

Nov 12 2018, 7:01 PM · Operations, ops-eqiad
CDanis created P7792 (An Untitled Masterwork).
Nov 12 2018, 6:32 PM
CDanis added a comment to T149287: Heating alerts for mw servers in eqiad.

We observed overheating symptoms on the following machines today:
mw[1221-1227,1229,1231-1235,1238,1240-1248,1250-1251,1253,1255].eqiad.wmnet

Nov 12 2018, 6:24 PM · Operations, ops-eqiad
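
The host list in the comment above uses a bracketed range notation. As a rough sketch of how such an expression expands into individual hostnames, assuming comma-separated items that are either single numbers or inclusive numeric ranges (the helper expand_hosts is hypothetical and not the tooling used to produce the list):

```python
import re


def expand_hosts(expr):
    """Expand e.g. 'mw[1221-1223,1229].eqiad.wmnet' into a list of hostnames.

    Hypothetical helper: supports a single bracket group containing
    comma-separated numbers or inclusive 'start-end' ranges.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        return [expr]  # no bracket group: treat as a single hostname
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        start, _, end = part.partition("-")
        end = end or start  # a lone number is a range of length one
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number}{suffix}")
    return hosts


hosts = expand_hosts(
    "mw[1221-1227,1229,1231-1235,1238,1240-1248,1250-1251,1253,1255].eqiad.wmnet"
)
print(len(hosts))  # 27
print(hosts[0])    # mw1221.eqiad.wmnet
```

Under these assumptions the expression covers 27 machines.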
CDanis created P7791 servers that were too hot in the hour of 1700, Nov 12 2018, summarized.
Nov 12 2018, 5:53 PM
CDanis created P7790 servers that were too hot in the hour of 1700, Nov 12 2018.
Nov 12 2018, 5:46 PM

Nov 9 2018

CDanis moved T201409: Harmonise the identification of requests across our stack from Backlog to Radar on the User-CDanis board.
Nov 9 2018, 7:32 PM · User-CDanis, TechCom-RFC (TechCom-Approved), Performance-Team (Radar), Operations, Services (designing), User-mobrovac, Traffic
CDanis moved T178690: Better organization for SRE grafana dashboards from Backlog to Radar on the User-CDanis board.
Nov 9 2018, 7:32 PM · User-CDanis, Patch-For-Review, User-fgiunchedi, monitoring, Operations
CDanis added a project to T201409: Harmonise the identification of requests across our stack: User-CDanis.
Nov 9 2018, 7:31 PM · User-CDanis, TechCom-RFC (TechCom-Approved), Performance-Team (Radar), Operations, Services (designing), User-mobrovac, Traffic
CDanis added a project to T178690: Better organization for SRE grafana dashboards: User-CDanis.
Nov 9 2018, 7:31 PM · User-CDanis, Patch-For-Review, User-fgiunchedi, monitoring, Operations
CDanis moved T177195: Reduce technical debt in metrics monitoring from Backlog to Radar on the User-CDanis board.
Nov 9 2018, 7:28 PM · User-CDanis, User-fgiunchedi, Technical-Debt, Goal, Operations
CDanis added a project to T177195: Reduce technical debt in metrics monitoring: User-CDanis.
Nov 9 2018, 7:28 PM · User-CDanis, User-fgiunchedi, Technical-Debt, Goal, Operations
CDanis closed T208729: Onboarding Chris Danis (CDanis) as Resolved.
Nov 9 2018, 4:35 PM · User-CDanis, Patch-For-Review, SRE-Access-Requests, Operations, User-herron, LDAP-Access-Requests
CDanis moved T208729: Onboarding Chris Danis (CDanis) from Backlog to Blocked on others on the User-CDanis board.
Nov 9 2018, 5:50 AM · User-CDanis, Patch-For-Review, SRE-Access-Requests, Operations, User-herron, LDAP-Access-Requests
CDanis moved T209134: pwstore access for cdanis from Backlog to Blocked on others on the User-CDanis board.
Nov 9 2018, 5:50 AM · User-CDanis, SRE-Access-Requests, Operations, User-herron
CDanis added a project to T208729: Onboarding Chris Danis (CDanis): User-CDanis.
Nov 9 2018, 5:15 AM · User-CDanis, Patch-For-Review, SRE-Access-Requests, Operations, User-herron, LDAP-Access-Requests
CDanis triaged T209134: pwstore access for cdanis as Normal priority.
Nov 9 2018, 4:45 AM · User-CDanis, SRE-Access-Requests, Operations, User-herron

Nov 7 2018

CDanis added a parent task for T208952: Requesting personal tag User-cdanis: T208729: Onboarding Chris Danis (CDanis).
Nov 7 2018, 3:22 PM · Project-Admins
CDanis added a subtask for T208729: Onboarding Chris Danis (CDanis): T208952: Requesting personal tag User-cdanis.
Nov 7 2018, 3:22 PM · User-CDanis, Patch-For-Review, SRE-Access-Requests, Operations, User-herron, LDAP-Access-Requests
CDanis added a parent task for T208952: Requesting personal tag User-cdanis: T555: Per-user projects for personal work in progress tracking.
Nov 7 2018, 3:12 PM · Project-Admins
CDanis added a subtask for T555: Per-user projects for personal work in progress tracking: T208952: Requesting personal tag User-cdanis.
Nov 7 2018, 3:12 PM · Upstream, Phabricator (Upstream)
CDanis created T208952: Requesting personal tag User-cdanis.
Nov 7 2018, 3:12 PM · Project-Admins
CDanis added a comment to T208729: Onboarding Chris Danis (CDanis).

Need one more signature on my GPG key before pwstore access can be granted

Nov 7 2018, 1:02 AM · User-CDanis, Patch-For-Review, SRE-Access-Requests, Operations, User-herron, LDAP-Access-Requests
CDanis updated the task description for T208729: Onboarding Chris Danis (CDanis).
Nov 7 2018, 1:02 AM · User-CDanis, Patch-For-Review, SRE-Access-Requests, Operations, User-herron, LDAP-Access-Requests

Nov 5 2018

CDanis updated the task description for T208729: Onboarding Chris Danis (CDanis).
Nov 5 2018, 10:01 PM · User-CDanis, Patch-For-Review, SRE-Access-Requests, Operations, User-herron, LDAP-Access-Requests
CDanis updated the task description for T208729: Onboarding Chris Danis (CDanis).
Nov 5 2018, 8:14 PM · User-CDanis, Patch-For-Review, SRE-Access-Requests, Operations, User-herron, LDAP-Access-Requests
CDanis updated the task description for T208729: Onboarding Chris Danis (CDanis).
Nov 5 2018, 3:33 PM · User-CDanis, Patch-For-Review, SRE-Access-Requests, Operations, User-herron, LDAP-Access-Requests
CDanis created T208729: Onboarding Chris Danis (CDanis).
Nov 5 2018, 2:55 PM · User-CDanis, Patch-For-Review, SRE-Access-Requests, Operations, User-herron, LDAP-Access-Requests