
razzi
User

User Details

User Since
Aug 26 2020, 8:28 PM (26 w, 9 h)
Availability
Available
LDAP User
Unknown
MediaWiki User
RAbuissa (WMF) [ Global Accounts ]

Recent Activity

Today

razzi created P14471 reportupdater error.
Thu, Feb 25, 1:29 AM

Tue, Feb 23

razzi created T275575: Add superset-next.wikimedia.org domain for superset staging.
Tue, Feb 23, 10:43 PM · Patch-For-Review, Analytics

Fri, Feb 19

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Fri, Feb 19, 6:08 PM · Analytics-Clusters
razzi added a comment to T272390: Upgrade to Superset 1.0.

ALTER TABLE row_level_security_filters ROW_FORMAT=DYNAMIC; fixed it, thanks! Here's the full procedure so the order is clear:

Fri, Feb 19, 4:47 PM · Analytics

Thu, Feb 18

razzi added a comment to T272390: Upgrade to Superset 1.0.

Ok, the problem was that I had upgraded the pip version in the docker container when building the wheels, which made the wheels incompatible with the staging server.

Thu, Feb 18, 11:20 PM · Analytics
razzi added a comment to T272390: Upgrade to Superset 1.0.

I tried to deploy superset to the staging box, but it failed with

Thu, Feb 18, 5:55 PM · Analytics
razzi added a comment to T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.

I'm thinking of writing up the steps for rebalancing partitions in a wiki article such as https://wikitech.wikimedia.org/wiki/Kafka/Administration, and I'm reminded of how I scp'd the topicmappr executable to kafka-jumbo1002 and how that's hacky. Should we make a plan to properly package topicmappr?

Thu, Feb 18, 5:58 AM · Analytics-Clusters

Wed, Feb 17

razzi added a comment to T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.

Ok! Now that we're on to the final and highest-traffic topics, webrequest_upload and webrequest_text, we're switching to migrating one partition at a time. Here are the full migration plans, in case they get modified in the process.
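
Migrating one partition at a time can be pictured as splitting a reassignment plan into single-partition plans. The snippet below is a minimal sketch, assuming the plan uses the JSON layout accepted by Kafka's partition reassignment tool (a `version` field plus a `partitions` list). The function name `split_plan`, the broker IDs, and the sample plan are hypothetical and are not taken from the task.

```python
import json


def split_plan(plan):
    """Yield one single-partition plan per entry in a reassignment plan."""
    for entry in plan["partitions"]:
        yield {"version": plan["version"], "partitions": [entry]}


# Hypothetical two-partition plan; broker IDs are made up for illustration.
sample_plan = {
    "version": 1,
    "partitions": [
        {"topic": "webrequest_text", "partition": 0, "replicas": [1001, 1007, 1008]},
        {"topic": "webrequest_text", "partition": 1, "replicas": [1002, 1008, 1009]},
    ],
}

single_plans = list(split_plan(sample_plan))
for single in single_plans:
    # Each of these could be written to its own file and applied in turn.
    print(json.dumps(single))
```

Keeping each step to one partition limits how much data is in flight at once, at the cost of more manual steps.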

Wed, Feb 17, 8:19 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Wed, Feb 17, 5:36 PM · Analytics-Clusters

Sat, Feb 13

razzi updated subscribers of T269211: Convert labsdb1012 from multi-source to multi-instance.

Since it's already mid-February and there's still preparation to do, we're going to wait until sqoop runs on March 1 to proceed with this.

Sat, Feb 13, 12:11 AM · DBA, Patch-For-Review, Analytics-Clusters
razzi created T274690: Update sqoop to work with multi-instance clouddb1021 mariadb host.
Sat, Feb 13, 12:03 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Fri, Feb 12

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Fri, Feb 12, 6:41 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Fri, Feb 12, 6:31 PM · Analytics-Clusters

Wed, Feb 10

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Wed, Feb 10, 11:04 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Wed, Feb 10, 5:55 PM · Analytics-Clusters

Tue, Feb 9

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Tue, Feb 9, 3:32 PM · Analytics-Clusters

Mon, Feb 8

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Mon, Feb 8, 4:24 PM · Analytics-Clusters

Fri, Feb 5

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Fri, Feb 5, 8:34 PM · Analytics-Clusters

Thu, Feb 4

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Thu, Feb 4, 7:33 PM · Analytics-Clusters

Wed, Feb 3

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Wed, Feb 3, 8:04 PM · Analytics-Clusters
razzi added a comment to T265126: Improve logging for HDFS Namenodes.

@Ottomata and I discussed next steps for this ticket, and came up with the following:

Wed, Feb 3, 3:39 PM · Patch-For-Review, Analytics-Clusters

Tue, Feb 2

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Tue, Feb 2, 10:13 PM · Analytics-Clusters
razzi moved T233336: Add urlshortener button to Turnilo from Next Up to Done on the Analytics-Kanban board.
Tue, Feb 2, 6:56 PM · Analytics-Kanban, Patch-For-Review, Analytics
razzi claimed T233336: Add urlshortener button to Turnilo.
Tue, Feb 2, 6:55 PM · Analytics-Kanban, Patch-For-Review, Analytics

Mon, Feb 1

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Mon, Feb 1, 10:21 PM · Analytics-Clusters
razzi added a comment to T273004: Presto should warn or prevent users from querying without Hive partition predicates.

@JAllemandou what do you mean by snapshot data?

Mon, Feb 1, 8:37 PM · Patch-For-Review, Analytics
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Mon, Feb 1, 7:04 PM · Analytics-Clusters
razzi added a comment to T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.

As we get into the higher-volume topics, we are seeing some alerts about replica max lag and under-replicated partitions. As I continue to run migrations, those alerts should be disabled for a few hours at a time and the metrics should be observed manually in Grafana.

Mon, Feb 1, 7:03 PM · Analytics-Clusters

Fri, Jan 29

razzi added a comment to T273004: Presto should warn or prevent users from querying without Hive partition predicates.

One way to go about this may be to use hive.max-partitions-per-scan. From the docs:

Fri, Jan 29, 11:25 PM · Patch-For-Review, Analytics
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Fri, Jan 29, 10:53 PM · Analytics-Clusters
razzi created T273337: #wikimedia-analytics irc logs stopped on 2021-01-27.
Fri, Jan 29, 10:52 PM · User-MacFan4000, WM-Bot
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Fri, Jan 29, 7:40 PM · Analytics-Clusters
razzi added a comment to T273311: Security Issue Access Request for razzi.

Thanks, all set.

Fri, Jan 29, 7:21 PM · SecTeam-Processed, Security-Team, Security
razzi created T273311: Security Issue Access Request for razzi.
Fri, Jan 29, 6:35 PM · SecTeam-Processed, Security-Team, Security

Thu, Jan 28

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Thu, Jan 28, 10:01 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Thu, Jan 28, 9:02 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Thu, Jan 28, 7:14 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Thu, Jan 28, 6:59 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Thu, Jan 28, 6:42 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Thu, Jan 28, 5:50 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Thu, Jan 28, 5:04 PM · Analytics-Clusters

Tue, Jan 26

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Tue, Jan 26, 9:32 PM · Analytics-Clusters
razzi added a comment to T268784: Configure superset cache.

@Jgreen I did get this working, and confirmed it was working by visiting it in the UI, where you can see whether a chart is cached in the overflow menu:

Tue, Jan 26, 8:45 PM · Analytics-Clusters, Product-Analytics
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Tue, Jan 26, 7:25 PM · Analytics-Clusters

Jan 25 2021

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Jan 25 2021, 9:35 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Jan 25 2021, 8:44 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Jan 25 2021, 8:39 PM · Analytics-Clusters
razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Jan 25 2021, 6:56 PM · Analytics-Clusters

Jan 21 2021

razzi updated the task description for T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
Jan 21 2021, 10:18 PM · Analytics-Clusters
razzi closed T271089: Check home/HDFS leftovers of kaldari as Resolved.

Removed /srv/home/kaldari.

Jan 21 2021, 7:27 PM · Analytics-Kanban, Analytics
razzi closed T271092: Check home/HDFS leftovers of dcipoletti as Resolved.

Dropped /srv/home/dcipoletti.

Jan 21 2021, 7:21 PM · Analytics-Kanban, Analytics
razzi moved T268809: AQS pageview default caching is one day from Next Up to Done on the Analytics-Kanban board.
Jan 21 2021, 5:52 PM · Analytics-Kanban, Analytics

Jan 20 2021

razzi added a comment to T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.

One more useful command: to change the throttle rate, run it on both the node the data is coming from and the node the data is going to. For example, if data is being copied from kafka-jumbo1003 to kafka-jumbo1009:

Jan 20 2021, 9:18 PM · Analytics-Clusters
razzi added a comment to T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.

Migrated the following topics on kafka-jumbo:

Jan 20 2021, 9:04 PM · Analytics-Clusters
razzi added a comment to T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.

With @Ottomata we came up with a way to roll back a partition migration.
Applying a migration prints the current state, which can be used to migrate the partitions back.
However, while a migration is running, trying to start another gives the error "There is an existing assignment running."
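
The rollback idea, saving the printed current state before the migration starts, could be sketched as follows. This is an illustration under stated assumptions, not the procedure used on the task: it assumes the tool prints the current assignment as a single-line JSON object before anything else that looks like JSON. The helper name and the sample output text are hypothetical, and the real wording may differ between Kafka versions.

```python
import json


def extract_current_assignment(tool_output):
    """Return the first JSON object found on its own line in the tool output."""
    for line in tool_output.splitlines():
        line = line.strip()
        if line.startswith("{"):
            return json.loads(line)
    raise ValueError("no assignment JSON found in output")


# Illustrative output only; topic name and broker IDs are made up.
sample_output = """Current partition replica assignment

{"version":1,"partitions":[{"topic":"example","partition":0,"replicas":[1003,1004]}]}

Save this to use as the --reassignment-json-file option during rollback
"""

rollback_plan = extract_current_assignment(sample_output)
# This string is what would be saved to a file and kept for a later rollback.
rollback_json = json.dumps(rollback_plan)
print(rollback_json)
```

Because a second reassignment cannot start while one is running, the saved plan is only useful once the in-flight migration has finished or been cleared.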

Jan 20 2021, 5:49 PM · Analytics-Clusters
razzi added a comment to T272344: Matomo database backup size doubled, we should check this is normal operation.

@jcrespo It looks like this is normal - traffic to wikimediafoundation.org has spiked since the 20th birthday last week, so the access logs have grown proportionally.

Jan 20 2021, 4:41 PM · Data-Persistence-Backup, Analytics

Jan 14 2021

razzi moved T272058: Address jackson version security vulnerabilities in refinery-source from Incoming to Security Maturity and Data Privacy on the Analytics board.
Jan 14 2021, 6:14 PM · Analytics
razzi raised the priority of T272052: Traffic anomalies: Factor out list of countries into a dedicated Hive table from Medium to High.
Jan 14 2021, 6:13 PM · Analytics-Kanban, SRE, Traffic, Analytics
razzi triaged T272052: Traffic anomalies: Factor out list of countries into a dedicated Hive table as Medium priority.
Jan 14 2021, 6:11 PM · Analytics-Kanban, SRE, Traffic, Analytics
razzi edited projects for T272009: an-test-worker1002 may need a DAC replace, added: Analytics-Radar; removed Analytics.
Jan 14 2021, 6:09 PM · Analytics-Radar, SRE, ops-eqiad
razzi moved T269925: Update Spicerack cookbooks to follow the new class API conventions from Incoming to Operational Excellence on the Analytics board.
Jan 14 2021, 6:08 PM · Analytics-Clusters, Analytics-Kanban, Patch-For-Review
razzi moved T271960: New anaconda-wmf release with updated packages from Incoming to Data Exploration Tools on the Analytics board.
Jan 14 2021, 6:06 PM · Analytics-Kanban, Discovery, Product-Analytics, Research, Analytics
razzi assigned T271953: Add client TCP source port to webrequest to JAllemandou.
Jan 14 2021, 6:06 PM · Patch-For-Review, Analytics-Kanban, Analytics
razzi assigned T271870: 404.php shows up in pageview API for 2017 to JAllemandou.
Jan 14 2021, 6:02 PM · Analytics-Kanban, Analytics, Pageviews-API
razzi moved T271163: TranslationRecommendation* Schemas Event Platform Migration from Incoming to Event Platform on the Analytics board.
Jan 14 2021, 5:59 PM · Patch-For-Review, Research, Analytics, Event-Platform
razzi edited projects for T270503: Presto error in Superest - only when grouping, added: Analytics-Radar; removed Analytics.
Jan 14 2021, 5:59 PM · Analytics-Radar
razzi assigned T269925: Update Spicerack cookbooks to follow the new class API conventions to elukey.
Jan 14 2021, 5:58 PM · Analytics-Clusters, Analytics-Kanban, Patch-For-Review
razzi closed T265952: Retain nonsensitive mediawiki_api_request logging data as Declined.

Closing since there has been no reply; feel free to reopen.

Jan 14 2021, 5:51 PM · Analytics
razzi edited projects for T270768: Degraded RAID on an-coord1002, added: Analytics-Radar; removed Analytics.
Jan 14 2021, 5:40 PM · Analytics-Radar, Patch-For-Review, ops-eqiad, SRE
razzi edited projects for T270112: mariadb on dbstore hosts, and specifically dbstore1004, possible memory leaking, added: Analytics-Radar; removed Analytics.
Jan 14 2021, 5:39 PM · Analytics-Radar, DBA
razzi removed a project from T240460: Clients need to generate an ISO 8601 formatted timestamp: Analytics-Kanban.
Jan 14 2021, 5:35 PM · MW-1.36-notes (1.36.0-wmf.22; 2020-12-15), Analytics, Event-Platform, MW-1.35-notes (1.35.0-wmf.37; 2020-06-16), Patch-For-Review, Better Use Of Data

Jan 12 2021

razzi closed T268202: Eq: 5 VM request for kafka-test-eqiad cluster as Resolved.

Cluster is up and running!

Jan 12 2021, 11:52 PM · Patch-For-Review, vm-requests, SRE
razzi closed T268074: Create kafka test cluster as Resolved.

Cluster is up and running!

Jan 12 2021, 11:52 PM · Patch-For-Review, Analytics-Clusters
razzi added a comment to T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.
  1. Migration plan for partition rebalancing
Jan 12 2021, 11:50 PM · Analytics-Clusters
razzi added a comment to T271467: Kerberos principal for kharlan.

@kostajh that was quick! Comment if you have any issues.

Jan 12 2021, 8:02 PM · Analytics
razzi added a comment to T271844: Requesting Kerberos password for janstee.

@JAnstee_WMF this should be all set, check your email :) and comment if you have any issues!

Jan 12 2021, 7:58 PM · Analytics
razzi added a comment to T271845: Request for Kerberos password.

@DNdubane_WMF this should be all set, check your email :) and comment if you have any issues!

Jan 12 2021, 7:56 PM · Analytics, SRE

Jan 8 2021

razzi moved T268219: Move Superset and Turnilo to an-tool1010 from Q3 2020/2021 to Done on the Analytics-Clusters board.
Jan 8 2021, 4:57 PM · Patch-For-Review, Analytics-Clusters
razzi added a comment to T268219: Move Superset and Turnilo to an-tool1010.

Here's the error from attempting to decommission analytics-tool1004:

Jan 8 2021, 4:41 PM · Patch-For-Review, Analytics-Clusters

Jan 4 2021

razzi added a comment to T268219: Move Superset and Turnilo to an-tool1010.

Spoke with @elukey and we're thinking of leaving turnilo on an-tool1007 for now, rather than co-locating it with superset, so that issues with either service won't affect the other. If we go that route, all that's left for this ticket is to decommission analytics-tool1004. @Ottomata what do you think?

Jan 4 2021, 6:28 PM · Patch-For-Review, Analytics-Clusters

Dec 22 2020

razzi added a comment to T268219: Move Superset and Turnilo to an-tool1010.

Superset is now running on an-tool1010, so analytics-tool1004 can be decommissioned.

Dec 22 2020, 9:57 PM · Patch-For-Review, Analytics-Clusters
razzi moved T268784: Configure superset cache from Q2 2020/2021 to Done on the Analytics-Clusters board.
Dec 22 2020, 5:19 PM · Analytics-Clusters, Product-Analytics

Dec 18 2020

razzi added a comment to T255973: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers.

@Ottomata when I tested topicmappr before, I uploaded the binary directly onto the host; when we do this in production, will it make sense to debianize https://github.com/DataDog/kafka-kit/?

Dec 18 2020, 6:17 PM · Analytics-Clusters
razzi added a comment to T268784: Configure superset cache.

Quick poll: what should the default caching timeout be? I'm thinking 12 hours, since it seems most charts have daily granularity, so viewing a chart one day and then the next day will show the latest data point. The timeout is also configurable on a per-table or per-chart level, but I expect most users won't discover this.
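
For illustration, a 12-hour default could be expressed in a `superset_config.py` fragment along these lines. This is a sketch, not the deployed configuration: it assumes Superset passes `CACHE_CONFIG` to Flask-Caching, and the cache type string, key prefix, and memcached address are placeholder assumptions.

```python
# Sketch of a superset_config.py fragment; all values are illustrative assumptions.
TWELVE_HOURS_IN_SECONDS = 12 * 60 * 60

CACHE_CONFIG = {
    # Backend name as understood by Flask-Caching (assumed; it varies by version).
    "CACHE_TYPE": "memcached",
    # Default lifetime of a cached chart result, in seconds.
    "CACHE_DEFAULT_TIMEOUT": TWELVE_HOURS_IN_SECONDS,
    # Hypothetical key prefix and server address.
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_MEMCACHED_SERVERS": ["localhost:11211"],
}
```

Per-table or per-chart timeouts set in the UI would take precedence over this default.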

Dec 18 2020, 6:04 PM · Analytics-Clusters, Product-Analytics
razzi added a comment to T268219: Move Superset and Turnilo to an-tool1010.

For superset, the following 3 patches should be all we need to move traffic over with a short window of downtime:

Dec 18 2020, 4:04 PM · Patch-For-Review, Analytics-Clusters

Dec 17 2020

razzi committed rLPRIa2c56b2d42e2: Add fake kerberos keytabs for an-tool1010 (authored by razzi).
Add fake kerberos keytabs for an-tool1010
Dec 17 2020, 10:55 PM

Dec 14 2020

razzi added a comment to T269616: Set yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds.

This has been deployed to the hadoop workers and master. To test, we can view a long-running job and see that its logs are aggregated at the 1-hour mark.

Dec 14 2020, 7:12 PM · Analytics-Kanban, Analytics-Clusters

Dec 11 2020

razzi added a comment to T268219: Move Superset and Turnilo to an-tool1010.

@elukey: I confirmed that memcached was working based on the presence of superset_result keys in memcached.

Dec 11 2020, 4:19 PM · Patch-For-Review, Analytics-Clusters

Dec 9 2020

razzi added a comment to T268784: Configure superset cache.

Current progress: configured staging superset to use memcached, but pylibmc was installed as an apt package while the superset process runs in a virtual environment. pylibmc therefore needs to be installed inside that virtual environment, via the https://gerrit.wikimedia.org/r/admin/repos/analytics/superset/deploy repository.
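
A quick way to see this kind of mismatch is to ask the interpreter itself whether it runs in a virtual environment and whether it can find the module. The snippet below is a minimal sketch using only the standard library. The helper names are hypothetical, and the `base_prefix` check assumes a `venv`-style environment.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def module_available(name):
    """True when the top-level module `name` is importable by this interpreter."""
    return importlib.util.find_spec(name) is not None


print("virtualenv:", in_virtualenv())
print("pylibmc importable:", module_available("pylibmc"))
```

Run with the virtual environment's own `python`, this reports what the service process would see, regardless of what is installed system-wide.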

Dec 9 2020, 9:50 PM · Analytics-Clusters, Product-Analytics

Dec 4 2020

razzi added a comment to T268219: Move Superset and Turnilo to an-tool1010.

My thought for next steps here is to install superset on an-tool1010, using the existing database at an-coord1001, and test that caching works as expected.

Dec 4 2020, 8:48 PM · Patch-For-Review, Analytics-Clusters
razzi added a comment to T268202: Eq: 5 VM request for kafka-test-eqiad cluster.

@Ottomata Yeah, I'll add an $is_critical parameter.

Dec 4 2020, 3:32 PM · Patch-For-Review, vm-requests, SRE

Dec 3 2020

razzi committed rLPRI8021e39fbd9b: Add dummy files for kafka_test-eqiad_broker (authored by razzi).
Add dummy files for kafka_test-eqiad_broker
Dec 3 2020, 9:14 PM

Dec 1 2020

razzi added a comment to T268202: Eq: 5 VM request for kafka-test-eqiad cluster.

Here's the cumin output for the kafka-test1001 decommission:

Dec 1 2020, 3:33 PM · Patch-For-Review, vm-requests, SRE

Nov 30 2020

razzi added a comment to T268202: Eq: 5 VM request for kafka-test-eqiad cluster.

I originally created these virtual machines in the analytics vlan, but they should be in the default private network instead, so I'm decommissioning the nodes that I created and remaking them.

Nov 30 2020, 9:16 PM · Patch-For-Review, vm-requests, SRE

Nov 19 2020

razzi added a comment to T268202: Eq: 5 VM request for kafka-test-eqiad cluster.

@Ottomata and I are planning to create a new small standalone node to be the zookeeper, requiring 2 GB RAM, 20 GB disk, and 2 vCPUs.

Nov 19 2020, 10:06 PM · Patch-For-Review, vm-requests, SRE
razzi claimed T268202: Eq: 5 VM request for kafka-test-eqiad cluster.

I plan to put these machines on the same ganeti host, since as a test use case we don't need high availability. Let me know if they should be distributed instead.

Nov 19 2020, 7:31 PM · Patch-For-Review, vm-requests, SRE
razzi updated subscribers of T268202: Eq: 5 VM request for kafka-test-eqiad cluster.

@akosiaris does this seem like a reasonable request?

Nov 19 2020, 3:57 AM · Patch-For-Review, vm-requests, SRE
razzi updated the task description for T268202: Eq: 5 VM request for kafka-test-eqiad cluster.
Nov 19 2020, 3:56 AM · Patch-For-Review, vm-requests, SRE
razzi created T268202: Eq: 5 VM request for kafka-test-eqiad cluster.
Nov 19 2020, 3:55 AM · Patch-For-Review, vm-requests, SRE

Nov 17 2020

razzi added a comment to T268074: Create kafka test cluster.
  • new zookeeper cluster or reuse a zookeeper cluster?
Nov 17 2020, 9:09 PM · Patch-For-Review, Analytics-Clusters