
razzi
User

Projects

User Details

User Since
Aug 26 2020, 8:28 PM (42 w, 5 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
RAbuissa (WMF) [ Global Accounts ]

Recent Activity

Yesterday

razzi closed T284934: Site: 2 VM request for an-airflow100{2,3} as Resolved.

The VMs are created; further actions will happen on https://phabricator.wikimedia.org/T284225

Mon, Jun 21, 6:43 PM · vm-requests, SRE
razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

an-master1002 is active; assuming nothing goes wrong, we'll keep it active for a couple days so we're confident it's safe to reimage 1001, then failover back and make a plan for the final reimage

Mon, Jun 21, 1:58 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters
razzi moved T278423: Upgrade the Hadoop masters to Debian Buster from Waiting for reply to Future on the User-razzi board.
Mon, Jun 21, 1:57 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters
razzi moved T278423: Upgrade the Hadoop masters to Debian Buster from Backlog to Waiting for reply on the User-razzi board.
Mon, Jun 21, 1:24 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters
razzi added a project to T278423: Upgrade the Hadoop masters to Debian Buster: User-razzi.
Mon, Jun 21, 1:24 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters
razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

Didn't end up doing the failover last week since it was all hands; I think this can be done whenever. @elukey, how do you feel about me doing the failover in the next couple of days and leaving it failed over for a couple of days?

Mon, Jun 21, 1:23 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Tue, Jun 15

razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

saveNamespace worked 🎉 and an-master1002 is running Buster!

Tue, Jun 15, 6:05 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Mon, Jun 14

razzi moved T284225: Create airflow instances for Platform Engineering and Research from Backlog to Ready for action on the User-razzi board.
Mon, Jun 14, 2:58 PM · User-razzi, Analytics-Kanban, Research, Platform Engineering, Analytics
razzi added a comment to T284225: Create airflow instances for Platform Engineering and Research.

Sounds good @Ottomata, creating VMs in https://phabricator.wikimedia.org/T284934

Mon, Jun 14, 2:48 PM · User-razzi, Analytics-Kanban, Research, Platform Engineering, Analytics
razzi claimed T284934: Site: 2 VM request for an-airflow100{2,3}.

Sounds good @MoritzMuehlenhoff, will do.

Mon, Jun 14, 2:48 PM · vm-requests, SRE
razzi created T284934: Site: 2 VM request for an-airflow100{2,3}.
Mon, Jun 14, 2:43 PM · vm-requests, SRE

Fri, Jun 11

razzi added a comment to T268219: Move Superset and Turnilo to an-tool1010.

Ok, this is done, with one last bit of cleanup: I'd like to rename role::analytics_cluster::ui::dashboards to role::analytics_cluster::ui::superset since it's only hosting superset, not turnilo. Currently there are comments like "will eventually host turnilo as well" etc. Then I'll close this.

Fri, Jun 11, 12:02 AM · Patch-For-Review, Analytics-Clusters

Thu, Jun 10

razzi closed T268784: Configure superset cache , a subtask of T268219: Move Superset and Turnilo to an-tool1010, as Resolved.
Thu, Jun 10, 11:20 PM · Patch-For-Review, Analytics-Clusters
razzi closed T268784: Configure superset cache as Resolved.
Thu, Jun 10, 11:20 PM · Analytics-Clusters, Product-Analytics
razzi added a comment to T268784: Configure superset cache .

Update here: we had to roll back the caching because our data access permissions weren't used by caching. Architecturally, this is a real problem: the permissions are checked when the query runs, and if the query doesn't run because it's cached, the permissions aren't checked. The solution would be to mirror our database permission structure with Superset roles, but we don't have plans to do this currently. The issue for that is here: https://phabricator.wikimedia.org/T273850 and I'm closing this one.

Thu, Jun 10, 11:20 PM · Analytics-Clusters, Product-Analytics
razzi added a project to T273064: Setup Analytics team in VO/splunk oncall: User-razzi.
Thu, Jun 10, 11:08 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters, User-fgiunchedi, observability
razzi moved T283733: hdfs dfsadmin saveNamespace fails on an-master1001 from Waiting for reply to Future on the User-razzi board.
Thu, Jun 10, 8:56 PM · User-razzi, Analytics

Wed, Jun 9

razzi moved T283733: hdfs dfsadmin saveNamespace fails on an-master1001 from Backlog to Waiting for reply on the User-razzi board.
Wed, Jun 9, 9:41 PM · User-razzi, Analytics
razzi added a project to T283733: hdfs dfsadmin saveNamespace fails on an-master1001: User-razzi.
Wed, Jun 9, 9:39 PM · User-razzi, Analytics
razzi created P16365 otto screen session.
Wed, Jun 9, 9:37 PM
razzi moved T278423: Upgrade the Hadoop masters to Debian Buster from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Wed, Jun 9, 6:41 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Tue, Jun 8

razzi moved T283125: dbstore1004 85% disk space used. from Ready to Deploy to Done on the Analytics-Kanban board.

@Marostegui we're ready to migrate over, so I'll mark this as done on our end and close it. Thanks for your help!

Tue, Jun 8, 7:50 PM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA
razzi added a comment to T284172: [SPIKE] analytics-airflow jobs development.

Following up from our airflow hang yesterday, here's a working plugin import (based off https://stackoverflow.com/a/66479399/1636613)

Tue, Jun 8, 7:46 PM · Analytics-Kanban, Analytics
razzi closed T279564: New Wikivoyages are only partially included in Stats as Resolved.
Tue, Jun 8, 4:45 PM · Analytics-Kanban, Analytics, Analytics-Wikistats

Mon, Jun 7

razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

Ok, new plan as we discussed at ops sync is to try the upgrade again next week - I'm picking Tuesday June 15. We'll see if the new memory + threads settings fix the snapshot issue, and if they do, we'll proceed with the upgrade. If not, we'll get the cluster back up and running and exit safe mode.

Mon, Jun 7, 5:26 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters
razzi added a comment to T283125: dbstore1004 85% disk space used..

@Marostegui new firewall rules are pushed, thanks for the update on your end.

Mon, Jun 7, 3:21 PM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA
razzi closed T284022: Requesting Kerberos password as Resolved.

Should be all set

Mon, Jun 7, 3:00 PM · Analytics
razzi moved T283125: dbstore1004 85% disk space used. from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Mon, Jun 7, 2:49 PM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA

Fri, Jun 4

razzi moved T279564: New Wikivoyages are only partially included in Stats from Next Up to Done on the Analytics-Kanban board.
Fri, Jun 4, 6:19 PM · Analytics-Kanban, Analytics, Analytics-Wikistats
razzi added a project to T279564: New Wikivoyages are only partially included in Stats: Analytics-Kanban.

@KuboF this is fixed now, there was an unrelated issue (for those interested, https://wikitech.wikimedia.org/wiki/Analytics/Systems/AQS#Deploy_new_History_snapshot_for_Wikistats_Backend is currently a manual process, but should be automated).

Fri, Jun 4, 6:18 PM · Analytics-Kanban, Analytics, Analytics-Wikistats

Thu, Jun 3

razzi closed T284081: Kerberos identity for jdl as Resolved.

Should be all set

Thu, Jun 3, 10:38 PM · Analytics
razzi closed T284096: Kerberos identity for phuedx as Resolved.

Should be all set

Thu, Jun 3, 10:38 PM · Analytics
razzi added a comment to T251376: Support right-to-left languages in Wikistats.

I'd like to refactor the CSS and get off of Semantic before we incorporate something like this, though. It's been EOL for a long time and it has lots of security issues that I don't want to carry forward.

Thu, Jun 3, 4:04 PM · I18n, RTL, Analytics
razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

As @Ottomata commented in https://phabricator.wikimedia.org/T283733#7121008, we're going to try putting the cluster in safe mode again and taking a snapshot to see if the new heap/gc settings make snapshotting work. If the snapshot works, we can proceed with the original plan, reimaging both nodes. If the snapshotting doesn't work, we can get the cluster back to fully operational, then do the upgrade without draining the cluster / safe mode at a later time.

Thu, Jun 3, 3:22 AM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters
razzi added a comment to T283733: hdfs dfsadmin saveNamespace fails on an-master1001.

Historical note: we followed what is indicated in https://community.cloudera.com/t5/Community-Articles/Scaling-the-HDFS-NameNode-part-2/ta-p/246681, basically using half of the log2(#datanodes) * 20 figure when setting up the current setting, 60.
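As a rough illustration of that arithmetic, the sketch below computes the log2(#datanodes) * 20 figure and its half. The function names are hypothetical, and the datanode count of 64 is only an example that happens to yield 60; the comment does not state the actual cluster size.

```python
import math


def full_figure(num_datanodes: int, factor: int = 20) -> int:
    """Return log2(#datanodes) * factor, truncated to an integer."""
    if num_datanodes < 1:
        raise ValueError("num_datanodes must be at least 1")
    return int(math.log2(num_datanodes) * factor)


def halved_figure(num_datanodes: int) -> int:
    """Return half of the full figure, as described in the comment above."""
    return full_figure(num_datanodes) // 2


# Illustrative only: 64 datanodes is an assumed value, not taken from the task.
print(full_figure(64), halved_figure(64))
```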

Thu, Jun 3, 2:54 AM · User-razzi, Analytics

Wed, Jun 2

razzi added a project to T284126: Relabel db1183 to be dbstore1007: DC-Ops.
Wed, Jun 2, 6:52 AM · DC-Ops
razzi created T284126: Relabel db1183 to be dbstore1007.
Wed, Jun 2, 6:51 AM · DC-Ops

Tue, Jun 1

razzi added a comment to T283125: dbstore1004 85% disk space used..

@Marostegui I'll reimage db1183 today, should be set for you to work on it tomorrow.

Tue, Jun 1, 7:25 PM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA
razzi created T284104: Site: 1 VM request for an-airflow1002.
Tue, Jun 1, 6:51 PM · vm-requests, SRE

Thu, May 27

razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

Ok, so this didn't go as planned, but there were no lasting issues or data loss. The full logs of the day are here, the relevant part is from 14:37 to 18:04.

Thu, May 27, 8:36 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Tue, May 25

razzi updated the task description for T283230: Move SRE-related IRC channels to Libera.
Tue, May 25, 11:24 PM · wikimedia-irc-libera, SRE
razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

I was able to fail over using sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1001-eqiad-wmnet an-master1002-eqiad-wmnet, and everything seemed to work OK. I then restarted hadoop-hdfs-namenode on an-master1001, waited a few minutes, and failed back. Everything seems to be working there.
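The round trip above can be sketched as a pair of command lines. This is a minimal sketch that only builds the argument lists and does not run them; the helper names are hypothetical, while the command itself and the service IDs are copied from the comment. The NameNode restart between the two steps is left out because the comment does not say how it was done.

```python
from typing import List

HDFS_BIN = "/usr/bin/hdfs"


def failover_cmd(from_node: str, to_node: str) -> List[str]:
    """Build the haadmin failover command, run as the hdfs user via sudo."""
    return ["sudo", "-u", "hdfs", HDFS_BIN, "haadmin", "-failover", from_node, to_node]


def round_trip(primary: str, secondary: str) -> List[List[str]]:
    """Fail over to the secondary, then (after the restart) fail back to the primary."""
    return [failover_cmd(primary, secondary), failover_cmd(secondary, primary)]


for cmd in round_trip("an-master1001-eqiad-wmnet", "an-master1002-eqiad-wmnet"):
    print(" ".join(cmd))
```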

Tue, May 25, 1:07 AM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Mon, May 24

razzi updated the task description for T283230: Move SRE-related IRC channels to Libera.
Mon, May 24, 7:38 PM · wikimedia-irc-libera, SRE
razzi added a comment to T251376: Support right-to-left languages in Wikistats.

I worked on this at the Wikimedia Hackathon, and got a prototype working. Some screenshots:

Mon, May 24, 6:32 PM · I18n, RTL, Analytics

May 21 2021

razzi updated subscribers of T283125: dbstore1004 85% disk space used..

@Marostegui glad you were able to figure that out and that it worked on a new reimage. My last attempt timed out, and I was troubleshooting some network issues that might not have been fully resolved when I started the reimage.

May 21 2021, 8:58 PM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA
razzi added a comment to T283125: dbstore1004 85% disk space used..

Hmm, the machine has been renamed and is almost operational, but it doesn't have ssh keys, so I can't log in. I'm not sure what to do at this point. I tried adding the mariadb::dbstore_multiinstance role in case the problem was that it didn't have a role, but that didn't do anything. If you know what to do from here @Marostegui, feel free to go for it; otherwise I'll ask around SRE tomorrow.

May 21 2021, 12:18 AM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA
razzi created T283300: Relabel db1125 to be dbstore1006.
May 21 2021, 12:18 AM · SRE, ops-eqiad, DC-Ops

May 20 2021

razzi moved T283125: dbstore1004 85% disk space used. from In Progress to In Code Review on the Analytics-Kanban board.
May 20 2021, 12:50 AM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA

May 19 2021

razzi moved T283125: dbstore1004 85% disk space used. from Next Up to In Progress on the Analytics-Kanban board.
May 19 2021, 5:40 PM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA
razzi closed T182804: Remove request for font.googleapis.com from analytics.wikimedia.org, a subtask of T253393: Revamp analytics.wikimedia.org data portal & landing page, as Resolved.
May 19 2021, 5:14 PM · Epic, Product-Analytics, Analytics
razzi closed T182804: Remove request for font.googleapis.com from analytics.wikimedia.org as Resolved.
May 19 2021, 5:14 PM · Analytics-Kanban, Analytics
razzi closed T280784: Aggregate table not working after superset upgrade as Resolved.
May 19 2021, 5:13 PM · Analytics-Kanban, Analytics, Product-Analytics
razzi added a project to T283125: dbstore1004 85% disk space used.: Analytics-Kanban.
May 19 2021, 4:58 PM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA
razzi claimed T283125: dbstore1004 85% disk space used..

Thanks for calling this out @Marostegui and offering db1125. I'll get started on the reimage of db1125.

May 19 2021, 4:55 PM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA
razzi moved T282710: Missing data in virtualpageview_hourly table since April 15, 2021 from Ready to Deploy to Done on the Analytics-Kanban board.
May 19 2021, 4:08 PM · Analytics-Kanban, Analytics

May 18 2021

razzi moved T280784: Aggregate table not working after superset upgrade from Next Up to Done on the Analytics-Kanban board.
May 18 2021, 11:07 PM · Analytics-Kanban, Analytics, Product-Analytics
razzi added a project to T280784: Aggregate table not working after superset upgrade: Analytics-Kanban.

I think we're all set here; let me know if anything else needs to be done.

May 18 2021, 11:07 PM · Analytics-Kanban, Analytics, Product-Analytics
razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

It should be ok to merge the yarn patch first; from the hadoop docs:

May 18 2021, 11:03 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters
razzi created T283113: Create personal board user-razzi.
May 18 2021, 9:12 PM · Project-Admins
razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

Thanks for the reviews @elukey and @JAllemandou!

May 18 2021, 6:03 AM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters

May 17 2021

razzi committed rAAWO42125e632e4a: Remove google font request for Lato font (authored by razzi).
Remove google font request for Lato font
May 17 2021, 5:35 PM

May 13 2021

razzi moved T182804: Remove request for font.googleapis.com from analytics.wikimedia.org from Next Up to In Code Review on the Analytics-Kanban board.
May 13 2021, 9:27 PM · Analytics-Kanban, Analytics
razzi closed T281427: Re-add disk to an-worker1100 as Resolved.

I checked and the disk is filling up; this can be closed.

May 13 2021, 4:06 PM · Analytics-Kanban, Analytics-Clusters
razzi closed T262660: Review and improve Oozie authorization permissions as Resolved.
May 13 2021, 2:36 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters

May 12 2021

razzi closed T280549: Consolidate labs / production sqoop lists to a single list as Resolved.
May 12 2021, 6:15 PM · Analytics-Kanban, Analytics

May 11 2021

razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

Ok, here's my new plan, including draining the cluster and using safemode to take a stable fsimage. If this looks good to you @elukey we can pick a day at least a week away so that we can communicate the maintenance. We could do this without doing maintenance but I'd appreciate the safety and the opportunity to learn about safemode.

May 11 2021, 9:00 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters

May 10 2021

razzi added a comment to T275212: Can't re-run failed Oozie workflows in Hue/Hue-Next (as non-admin).

@nshahquinn-wmf try setting mapreduce.cluster.acls.enable to true?

May 10 2021, 8:07 PM · Analytics-Clusters, Product-Analytics
razzi closed T282185: Add password reset to kerberos manage_principals.py as Resolved.
May 10 2021, 3:34 PM · Analytics-Kanban, Analytics
razzi added a comment to T282185: Add password reset to kerberos manage_principals.py.

Docs updated!

May 10 2021, 3:33 PM · Analytics-Kanban, Analytics

May 7 2021

razzi closed T282077: Reset Kerberos password for nahidunlimited as Resolved.

Great!

May 7 2021, 5:12 PM · Analytics

May 6 2021

razzi moved T282185: Add password reset to kerberos manage_principals.py from Next Up to Done on the Analytics-Kanban board.
May 6 2021, 9:29 PM · Analytics-Kanban, Analytics
razzi added a comment to T282077: Reset Kerberos password for nahidunlimited.

This should be all set; check your email for your reset password.

May 6 2021, 9:28 PM · Analytics
razzi added a comment to T261693: Ensure Puppet checks types as part of the build.

I vote to close this in favor of T166066 as @elukey suggested. Writing spec tests for modules doesn't scale as a solution, and we don't want to slow down CI by running more puppet compilation than we need to.

May 6 2021, 9:11 PM · Infrastructure-Foundations, Patch-For-Review, puppet-compiler, Puppet, SRE
razzi created T282185: Add password reset to kerberos manage_principals.py.
May 6 2021, 8:50 PM · Analytics-Kanban, Analytics

May 5 2021

razzi closed T281917: Could not find class ::profile::swap for an-test-client1001.eqiad.wmnet as Resolved.

Ok, sure enough, the alert has removed an-test-client from its erroring nodes.

May 5 2021, 6:22 PM · Analytics-Clusters
razzi closed T281809: Requesting a kerberos identity for user sihe as Resolved.

@Silvan_WMDE Read the user guide at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide and comment here or chat in #wikimedia-analytics on IRC if you run into any trouble. Have fun exploring the data!

May 5 2021, 6:17 PM · Analytics
razzi added a comment to T281809: Requesting a kerberos identity for user sihe.

Should be all set; email was sent to silvan.heintze@wikimedia.de.

May 5 2021, 5:47 PM · Analytics
razzi claimed T281809: Requesting a kerberos identity for user sihe.
May 5 2021, 5:37 PM · Analytics

May 4 2021

razzi updated subscribers of T281917: Could not find class ::profile::swap for an-test-client1001.eqiad.wmnet.

Hm, that patch fixed the underlying issue, and running the check manually produces the intended result:

May 4 2021, 9:15 PM · Analytics-Clusters
razzi created T281917: Could not find class ::profile::swap for an-test-client1001.eqiad.wmnet.
May 4 2021, 7:15 PM · Analytics-Clusters

May 1 2021

razzi created T281617: Wikistats shows 0 views for April when data isn't available yet.
May 1 2021, 1:07 AM · Analytics-Wikistats, Analytics
razzi claimed T279440: Data drifts between superset_production on an-coord1001 and db1108.

I want to work on this! Is it ok to drop superset_production on db1108 in order to do this? If so, I think I'll be able to figure it out with some trial and error.

May 1 2021, 12:49 AM · Analytics-Kanban, Analytics

Apr 30 2021

razzi moved T278423: Upgrade the Hadoop masters to Debian Buster from In Progress to In Code Review on the Analytics-Kanban board.
Apr 30 2021, 9:48 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters
razzi added a comment to T278423: Upgrade the Hadoop masters to Debian Buster.

Alright, here's my plan @elukey, perhaps we can discuss this next week and if it looks good we can plan the maintenance.

Apr 30 2021, 7:53 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Apr 29 2021

razzi added a comment to T280784: Aggregate table not working after superset upgrade.

@cchen I'm not sure what you mean, what do you expect to have happen?

Apr 29 2021, 8:32 PM · Analytics-Kanban, Analytics, Product-Analytics
razzi added a comment to T281427: Re-add disk to an-worker1100.

Ok, it looks like everything is working here, but disk usage is still at 0%:

NAME            FSTYPE            LABEL           UUID                                   FSAVAIL FSUSE% MOUNTPOINT
...
sdl1            ext4              hadoop-k        cb58c727-dec9-4abf-8b21-3d70a6443b6d      1.8T     0% /var/lib/hadoop/data/k
Apr 29 2021, 5:59 PM · Analytics-Kanban, Analytics-Clusters
razzi moved T281427: Re-add disk to an-worker1100 from In Progress to Done on the Analytics-Kanban board.
Apr 29 2021, 5:49 PM · Analytics-Kanban, Analytics-Clusters
razzi moved T280549: Consolidate labs / production sqoop lists to a single list from In Code Review to Done on the Analytics-Kanban board.
Apr 29 2021, 3:46 PM · Analytics-Kanban, Analytics
razzi moved T281427: Re-add disk to an-worker1100 from Next Up to In Progress on the Analytics-Kanban board.
Apr 29 2021, 3:46 PM · Analytics-Kanban, Analytics-Clusters

Apr 28 2021

razzi updated subscribers of T281427: Re-add disk to an-worker1100.

@elukey I was following the instructions at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#Swapping_broken_disk but I got a nonzero exit code.

Apr 28 2021, 11:05 PM · Analytics-Kanban, Analytics-Clusters
razzi created T281427: Re-add disk to an-worker1100.
Apr 28 2021, 10:58 PM · Analytics-Kanban, Analytics-Clusters
razzi reassigned T280132: Degraded RAID on an-worker1100 from razzi to Cmjohnson.

For simplicity, I'll create a new task, and this one can stay resolved. Thanks @Cmjohnson!

Apr 28 2021, 10:56 PM · SRE, ops-eqiad
razzi added a project to T280132: Degraded RAID on an-worker1100: Analytics-Kanban.
Apr 28 2021, 10:51 PM · SRE, ops-eqiad
razzi added a comment to T280784: Aggregate table not working after superset upgrade.

@cchen do you mean when switching data sources / creating new charts? We've already disabled the legacy druid connection for new charts, so any druid options you see are using the new druid tables connector.

Apr 28 2021, 9:47 PM · Analytics-Kanban, Analytics, Product-Analytics
razzi added a comment to T280784: Aggregate table not working after superset upgrade.

I believe this can be resolved by switching from the legacy druid connector to druid tables:

Apr 28 2021, 7:02 PM · Analytics-Kanban, Analytics, Product-Analytics
razzi added a comment to T280549: Consolidate labs / production sqoop lists to a single list.

This has been re-deployed: https://gerrit.wikimedia.org/r/c/operations/puppet/+/682791

Apr 28 2021, 4:20 PM · Analytics-Kanban, Analytics

Apr 21 2021

razzi moved T278421: Upgrade furud/flerovium to Debian Buster from Q4 2020/2021 to Done on the Analytics-Clusters board.
Apr 21 2021, 8:00 PM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters
razzi moved T278423: Upgrade the Hadoop masters to Debian Buster from Next Up to In Progress on the Analytics-Kanban board.
Apr 21 2021, 7:59 PM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters
razzi added a comment to T273064: Setup Analytics team in VO/splunk oncall.

In our ops sync we decided to add victorops alerting for critical alerts, and I've started adding them to puppet. In the case that the alert is conditionally critical, such as:

nrpe::monitor_service { 'kafka':
    description   => 'Kafka Broker Server',
    nrpe_command  => '/usr/lib/nagios/plugins/check_procs -c 1:1 -C java -a "Kafka /etc/kafka/server.properties"',
    critical      => $is_critical,
    contact_group => 'victorops-analytics',  # I added this locally
    notes_url     => 'https://wikitech.wikimedia.org/wiki/Kafka/Administration',
}
Apr 21 2021, 12:56 AM · User-razzi, Patch-For-Review, Analytics-Kanban, Analytics-Clusters, User-fgiunchedi, observability
razzi moved T280549: Consolidate labs / production sqoop lists to a single list from Next Up to In Code Review on the Analytics-Kanban board.
Apr 21 2021, 12:22 AM · Analytics-Kanban, Analytics