LSobanski (Lukasz Sobanski)
Woo$

User Details

User Since
Aug 31 2020, 5:40 PM (42 w, 3 d)
Availability
Available
LDAP User
LSobanski
MediaWiki User
LSobanski (WMF)

Recent Activity

Yesterday

LSobanski added a comment to T282484: (Need By: TBD) rack/setup/install pc1011-pc1014.

Hi. Do we have an idea for when these hosts could be available? We have ongoing issues with parsercache (see T282761) that we hope moving to the new HW will partially mitigate.

Thu, Jun 24, 2:04 PM · SRE, ops-eqiad, DC-Ops

Wed, Jun 23

LSobanski added a comment to T282761: purgeParserCache.php should not take over 24 hours for its daily run.

Notes from the meeting today with @Kormat and @Krinkle

  • Code changes to be made:
    • Recording number of iterations since the last percentage print (purge script feature to be added).
    • Running on multiple servers (adding a --server parameter) requires changes to both the maintenance function in SqlBagOStuff and the script.
  • Sleep interval deduction (making sleep variable instead of static) requires further data gathering.
  • Filtering deletes from replication and running on both DCs would result in spare PC not getting purged but this won't be a problem once the script is host-aware.
  • We can test the new hosts, they would have to be introduced as replicas first.
Wed, Jun 23, 2:49 PM · MW-1.37-notes (1.37.0-wmf.12; 2021-06-28), Patch-For-Review, Parsoid (Tracking), MediaWiki-Parser, DBA, Performance-Team
LSobanski moved T284483: migrate clouddb backups (openstack) from the old mysqldump system to the new wmfbackups (mydumper) from Triage to Backlog on the Data-Persistence-Backup board.
Wed, Jun 23, 12:46 PM · bacula, database-backups, Data-Persistence-Backup, Data-Services, cloud-services-team (Kanban)
LSobanski moved T274463: Backups for GitLab from Triage to Done on the Data-Persistence-Backup board.
Wed, Jun 23, 12:45 PM · serviceops, Data-Persistence-Backup, Patch-For-Review, User-brennen, GitLab (Initialization)

Tue, Jun 22

LSobanski added a comment to T282761: purgeParserCache.php should not take over 24 hours for its daily run.

>>>! In T282761#7169546, @LSobanski wrote:

Do you happen to know if there is a specific reason to do the deletes in the above way?

It's hard to be sure since the code is so old, but based on today's best practices I would retroactively say that for production code our policy is generally to only perform writes or deletes by primary key. This is for performance, but also to be deterministic and safe under various other conditions. I suppose it also helps with row-based replication. I understand parser caches could be exempt from this, as they are special in various ways, but from a code perspective we currently treat it like we treat anything else in production.

Does this mean it's not an option or can we consider an exception for parser cache? It looks like a promising way forward.

Tue, Jun 22, 7:45 PM · MW-1.37-notes (1.37.0-wmf.12; 2021-06-28), Patch-For-Review, Parsoid (Tracking), MediaWiki-Parser, DBA, Performance-Team
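
The "writes or deletes by primary key" policy quoted in the comment above can be illustrated with a minimal sketch. This is not the code of purgeParserCache.php or SqlBagOStuff; the table name `pc_demo`, its columns, and the function names are hypothetical, and SQLite stands in for the production database purely so the example runs on its own.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with ten rows (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pc_demo ("
        "keyname TEXT PRIMARY KEY, value BLOB, exptime INTEGER)"
    )
    conn.executemany(
        "INSERT INTO pc_demo VALUES (?, ?, ?)",
        [(f"key{i}", b"x", i) for i in range(10)],
    )
    conn.commit()
    return conn


def purge_expired(conn, now, batch_size=100):
    """Delete expired rows in bounded batches, addressing each row by primary key.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        # Step 1: read the primary keys of one bounded batch of expired rows.
        keys = [
            row[0]
            for row in conn.execute(
                "SELECT keyname FROM pc_demo WHERE exptime < ? LIMIT ?",
                (now, batch_size),
            )
        ]
        if not keys:
            return total
        # Step 2: delete exactly those rows, naming each by its primary key,
        # so the set of affected rows is fixed before the delete runs.
        placeholders = ",".join("?" for _ in keys)
        cur = conn.execute(
            f"DELETE FROM pc_demo WHERE keyname IN ({placeholders})", keys
        )
        conn.commit()
        total += cur.rowcount
```

The alternative raised in the comment, deleting directly on an expiry condition, would collapse the two steps into a single statement; the sketch only shows the shape of the primary-key approach, not its relative cost.
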
LSobanski updated subscribers of T282761: purgeParserCache.php should not take over 24 hours for its daily run.

@Krinkle @aaron do you happen to know if there is a specific reason to do the deletes in the above way?

Tue, Jun 22, 2:01 PM · MW-1.37-notes (1.37.0-wmf.12; 2021-06-28), Patch-For-Review, Parsoid (Tracking), MediaWiki-Parser, DBA, Performance-Team

Mon, Jun 21

LSobanski moved T282761: purgeParserCache.php should not take over 24 hours for its daily run from Refine to In progress on the DBA board.
Mon, Jun 21, 8:40 PM · MW-1.37-notes (1.37.0-wmf.12; 2021-06-28), Patch-For-Review, Parsoid (Tracking), MediaWiki-Parser, DBA, Performance-Team
LSobanski added a comment to T282761: purgeParserCache.php should not take over 24 hours for its daily run.

We're back to 67 hours today.

Mon, Jun 21, 2:06 PM · MW-1.37-notes (1.37.0-wmf.12; 2021-06-28), Patch-For-Review, Parsoid (Tracking), MediaWiki-Parser, DBA, Performance-Team
LSobanski moved T285082: Rebase pt-heartbeat-wikimedia on modern upstream version from Triage to Backlog on the DBA board.
Mon, Jun 21, 7:06 AM · DBA
LSobanski moved T285079: Investigate pt-heartbeat-wikimedia failure modes from Triage to In progress on the DBA board.
Mon, Jun 21, 7:06 AM · DBA

Wed, Jun 16

LSobanski triaged T284928: Prepare and check storage layer for shiwiki as Medium priority.

Thanks, let us know when the database is created, so we can sanitize it.

Wed, Jun 16, 8:32 PM · Data-Services, DBA

Fri, Jun 11

LSobanski triaged T284819: Deploy wmfmariadbpy 0.7.1 as Medium priority.
Fri, Jun 11, 1:11 PM · DBA

Thu, Jun 10

LSobanski edited projects for T284440: tegola-vector-tiles load testing and Swift throughput experiments, added: SRE-swift-storage; removed Data-Persistence.
Thu, Jun 10, 12:40 PM · SRE-swift-storage, Maps, SRE, Product-Infrastructure-Team-Backlog
LSobanski updated the task description for T263420: Clean up DB related pages on Wikitech.
Thu, Jun 10, 12:22 PM · Documentation, Data-Persistence-Misc
LSobanski triaged T284648: Switchover s3 from db1123 to db1157 as Medium priority.
Thu, Jun 10, 9:02 AM · Patch-For-Review, DBA

Wed, Jun 9

LSobanski triaged T284619: Schema change for renaming several indexes in change_tag table as Medium priority.
Wed, Jun 9, 8:46 AM · DBA, Blocked-on-schema-change

Tue, Jun 8

LSobanski updated the task description for T263420: Clean up DB related pages on Wikitech.
Tue, Jun 8, 11:20 AM · Documentation, Data-Persistence-Misc
LSobanski added a comment to T263420: Clean up DB related pages on Wikitech.

@Marostegui @Kormat @jcrespo I took the liberty of subscribing you as it seems like the easiest way to keep track of documentation changes in one place.

Tue, Jun 8, 11:20 AM · Documentation, Data-Persistence-Misc
LSobanski updated subscribers of T263420: Clean up DB related pages on Wikitech.
Tue, Jun 8, 11:19 AM · Documentation, Data-Persistence-Misc
LSobanski added a comment to T263420: Clean up DB related pages on Wikitech.

https://wikitech.wikimedia.org/wiki/SRE/Data_Persistence/Documentation_guidelines documents the approach I've been taking so far.

Tue, Jun 8, 11:17 AM · Documentation, Data-Persistence-Misc
LSobanski added a comment to T263420: Clean up DB related pages on Wikitech.

MariaDB start and stop section is now a separate page: https://wikitech.wikimedia.org/wiki/MariaDB/Start_and_stop

Tue, Jun 8, 11:16 AM · Documentation, Data-Persistence-Misc

Mon, Jun 7

LSobanski moved T270101: Grants not working with DB hosts with to ipv6 from Refine to Backlog on the DBA board.
Mon, Jun 7, 8:16 PM · Infrastructure-Foundations, netbox, DBA
LSobanski triaged T284375: Rename name_title index on page to page_name_title as Medium priority.
Mon, Jun 7, 3:39 PM · DBA, Blocked-on-schema-change
LSobanski triaged T284456: Prepare and check storage layer for dagwiki as Medium priority.

Thanks, let us know when the database is created, so we can sanitize it.

Mon, Jun 7, 3:38 PM · Data-Services, DBA
LSobanski added a comment to T275784: orchestrator: Upgrade to v3.2.5.

https://github.com/openark/orchestrator/releases/tag/v3.2.5

This version does include our patch.
I think we've never upgraded orchestrator since it was installed; it would be good practice to upgrade and document the upgrade process.

Mon, Jun 7, 9:49 AM · DBA, Orchestrator
LSobanski triaged T284390: Prepare and check storage layer for banwikisource as Medium priority.

Thanks, let us know when the database is created, so we can sanitize it.

Mon, Jun 7, 9:37 AM · Data-Services, DBA

Wed, Jun 2

LSobanski created media-backups.
Wed, Jun 2, 12:39 PM
LSobanski created database-backups.
Wed, Jun 2, 12:37 PM
LSobanski created bacula.
Wed, Jun 2, 12:36 PM
LSobanski removed a hashtag from Data-Persistence-Backup: #bacula.
Wed, Jun 2, 12:36 PM
LSobanski added a comment to T283607: Requesting access to production deployment for David Lynch.

I updated the Wikitech instructions page to clarify the requirements (diff here: https://wikitech.wikimedia.org/w/index.php?title=Creating_new_tables&type=revision&diff=1913898&oldid=1913356). @DLynch I would appreciate a comment on whether it's clear enough.

Wed, Jun 2, 12:32 PM · SRE, SRE-Access-Requests
LSobanski moved T270112: mariadb on dbstore hosts, and specifically dbstore1004, possible memory leaking from Refine to Backlog on the DBA board.
Wed, Jun 2, 11:21 AM · Analytics-Radar, DBA
LSobanski moved T266869: Investigate using orchestrator tags for different type of hosts. from Refine to Backlog on the DBA board.
Wed, Jun 2, 11:18 AM · Orchestrator, DBA
LSobanski edited Description on DBA.
Wed, Jun 2, 11:10 AM
LSobanski edited Description on DBA.
Wed, Jun 2, 11:09 AM

Tue, Jun 1

LSobanski added a comment to T252528: wmf-auto-reinstall fails on hosts that run pt-heartbeat.

A stub document capturing this is at https://wikitech.wikimedia.org/wiki/MariaDB/Rebooting_a_host.

Tue, Jun 1, 12:15 PM · DBA, SRE

Mon, May 31

LSobanski added a project to T281135: codfw: Relocate servers in 10G racks : Data-Persistence (Consultation).
Mon, May 31, 8:18 PM · Data-Persistence (Consultation), serviceops, SRE, ops-codfw

Sun, May 30

LSobanski added a comment to T283017: Backup alert proactive notification.

For the "metadata dashboard" usage, something like https://github.com/nocodb/nocodb could be interesting to evaluate.

Sun, May 30, 7:53 PM · Data-Persistence-Backup

Thu, May 27

LSobanski closed T276220: Internal APT repository backup as Resolved.

I think we're all ok with the current state, resolving.

Thu, May 27, 1:40 PM · Data-Persistence-Backup
LSobanski renamed T283017: Backup alert proactive notification from Backup alert email notification to Backup alert proactive notification.
Thu, May 27, 11:41 AM · Data-Persistence-Backup
LSobanski added a comment to T283017: Backup alert proactive notification.

This turned into a much bigger scope than my original intention but it's a good discussion to have. I am now thinking that my ask was not well defined and needs to be adjusted, especially given my dislike of using email for operational notification :)

Thu, May 27, 11:41 AM · Data-Persistence-Backup
LSobanski moved T263463: Update the DBA task tracking workflow from In Progress to Ready on the Data-Persistence-Misc board.
Thu, May 27, 10:16 AM · Data-Persistence-Misc, PM

Wed, May 26

LSobanski assigned T283499: Schema change for renaming page_timestamp index on revision table to rev_page_timestamp to Kormat.

Assigning to Stevie to confirm if this can go into Ready.

Wed, May 26, 6:24 PM · DBA, Blocked-on-schema-change

May 25 2021

LSobanski moved T283580: Data Persistence IRC channels updates from Triage to Refine on the Data-Persistence-Misc board.
May 25 2021, 3:47 PM · Data-Persistence-Misc
LSobanski added a comment to T264162: IRC bot adjustments for #wikimedia-databases.

The remaining work may be addressed by T283580

May 25 2021, 3:46 PM · Wikibugs, Data-Persistence-Misc
LSobanski edited projects for T283580: Data Persistence IRC channels updates, added: Data-Persistence-Misc; removed Data-Persistence.
May 25 2021, 3:45 PM · Data-Persistence-Misc
LSobanski edited Description on Data-Persistence-Misc.
May 25 2021, 3:44 PM
LSobanski renamed Data-Persistence-Misc from Data-Persistence-Admin to Data-Persistence-Misc.
May 25 2021, 3:42 PM
LSobanski moved T277994: Rename global_block_whitelist table from Refine to Backlog on the DBA board.
May 25 2021, 2:09 PM · Schema-change, DBA, GlobalBlocking, Voice & Tone
LSobanski moved T283239: db-replication-tree doesn't support circular replication from Pending comment to Ready on the DBA board.
May 25 2021, 2:02 PM · DBA
LSobanski removed a project from T112282: Multiple pages with no revisions: SRE.
May 25 2021, 11:54 AM · DBA
LSobanski removed a project from T119626: Eliminate SPOF at the main database infrastructure: SRE.
May 25 2021, 11:53 AM · Epic, DBA
LSobanski removed a project from T266338: orchestrator: Add service monitoring: SRE.
May 25 2021, 11:53 AM · Orchestrator, User-Kormat, DBA
LSobanski removed a project from T141968: Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master: SRE.
May 25 2021, 11:53 AM · observability, DBA
LSobanski removed a project from T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092: SRE.
May 25 2021, 11:53 AM · DBA
LSobanski removed a project from T157702: Followup for TLS MariaDB server roll-out: SRE.
May 25 2021, 11:53 AM · observability, Patch-For-Review, DBA
LSobanski removed a project from T141547: Setup automatic failover for misc database servers: SRE.
May 25 2021, 11:53 AM · DBA, Sustainability
LSobanski removed a project from T152427: Create a check/calendar alert for MariaDB TLS certs: SRE.
May 25 2021, 11:53 AM · observability, DBA
LSobanski removed a project from T164834: In some database hosts, performance schema loses digest statistics: SRE.
May 25 2021, 11:53 AM · DBA
LSobanski removed a project from T175672: Make apache/maintenance hosts TLS connections to mariadb work: SRE.
May 25 2021, 11:53 AM · Services (watching), Performance-Team (Radar), Sustainability (MediaWiki-MultiDC), DBA
LSobanski removed a project from T196378: Investigate solutions for MySQL connection pooling: SRE.
May 25 2021, 11:53 AM · DBA, Sustainability (MediaWiki-MultiDC), Performance-Team (Radar)
LSobanski removed a project from T255849: refactor mariadb puppet code to have single mapping of multiinstance section to port numbers: SRE.
May 25 2021, 11:53 AM · Patch-For-Review, DBA
LSobanski removed a project from T255758: Replace `role::prometheus::mysqld_exporter` with `profile::prometheus::mysqld_exporter_instance`: SRE.
May 25 2021, 11:53 AM · DBA
LSobanski removed a project from T256972: Refactor mariadb puppet code: SRE.
May 25 2021, 11:53 AM · Patch-For-Review, DBA, User-jbond, User-Kormat
LSobanski removed a project from T257814: Use zarcillo as an authoritative inventory of db instances/roles: SRE.
May 25 2021, 11:53 AM · DBA, User-Kormat, Epic
LSobanski removed a project from T257819: Add haproxy config for zarcillo: SRE.
May 25 2021, 11:53 AM · DBA, User-Kormat, Epic
LSobanski removed a project from T256845: Add monitoring to ensure that puppet/tendril/zarcillo all agree on the set of sections that exist: SRE.
May 25 2021, 11:53 AM · User-Kormat, DBA
LSobanski removed a project from T257821: Add monitoring to ensure consistency between puppet and zarcillo: SRE.
May 25 2021, 11:53 AM · DBA, User-Kormat
LSobanski removed a project from T263127: Remove groups from db configs: SRE.
May 25 2021, 11:53 AM · Performance-Team (Radar), Platform Engineering Roadmap Decision Making, User-Kormat, DBA
LSobanski removed a project from T265266: Create integration test env for wmfmariadbpy: SRE.
May 25 2021, 11:53 AM · Release-Engineering-Team (Radar), Continuous-Integration-Config, User-Kormat, DBA
LSobanski removed a project from T265901: Clean up role::mariadb::ferm and profile::mariadb::ferm: SRE.
May 25 2021, 11:53 AM · DBA, User-Kormat
LSobanski removed a project from T266119: mariadb::config: parameterize event_scheduler: SRE.
May 25 2021, 11:52 AM · DBA, User-Kormat
LSobanski added a project to T283580: Data Persistence IRC channels updates: Data-Persistence.
May 25 2021, 11:41 AM · Data-Persistence-Misc
LSobanski created T283580: Data Persistence IRC channels updates.
May 25 2021, 11:41 AM · Data-Persistence-Misc
LSobanski moved T250715: Drop (and archive?) aft_feedback from Blocked to Ready on the DBA board.
May 25 2021, 10:59 AM · SecTeam-Processed, Privacy Engineering, Security-Team, DBA

May 24 2021

LSobanski removed a project from T172492: Database alerting: User-LSobanski.
May 24 2021, 2:02 PM · Data-Persistence, Epic, observability
LSobanski added a project to T172492: Database alerting: User-LSobanski.
May 24 2021, 10:19 AM · Data-Persistence, Epic, observability
LSobanski created User-LSobanski.
May 24 2021, 10:15 AM

May 21 2021

LSobanski moved T283239: db-replication-tree doesn't support circular replication from Triage to Pending comment on the DBA board.

@Kormat How do you see the urgency of this (and the probability we'll work on it anytime soon)?

May 21 2021, 12:52 PM · DBA
LSobanski assigned T283093: Schema change for making cuc_id in cu_changes unsigned to Kormat.

Assigning to Stevie to confirm if this can go into Ready.

May 21 2021, 12:51 PM · DBA, Blocked-on-schema-change
LSobanski added a comment to T252528: wmf-auto-reinstall fails on hosts that run pt-heartbeat.

This does mean that pt-heartbeat-wikimedia needs to be started manually after a boot, however.

May 21 2021, 10:55 AM · DBA, SRE

May 20 2021

LSobanski moved T276220: Internal APT repository backup from Refine to Done on the Data-Persistence-Backup board.
May 20 2021, 5:27 PM · Data-Persistence-Backup
LSobanski triaged T277015: Evaluate possible solutions to backup Analytics Hadoop's HDFS data as Low priority.

Blocked at least until we get a clear picture from T283261.

May 20 2021, 5:25 PM · Analytics-Clusters, Data-Persistence-Backup
LSobanski moved T205627: Purge and monitor old metadata for the mariadb backups database from Triage to Backlog on the Data-Persistence-Backup board.
May 20 2021, 5:21 PM · Data-Persistence-Backup
LSobanski moved T200398: Document clearly the mariadb backup and recovery setup, specially how to recover a backup from Triage to Backlog on the Data-Persistence-Backup board.
May 20 2021, 5:20 PM · Data-Persistence-Backup, User-Marostegui
LSobanski moved T205628: Handle object metadata backups and compare it with stored database object inventory from Triage to Backlog on the Data-Persistence-Backup board.
May 20 2021, 5:20 PM · Data-Persistence-Backup
LSobanski moved T283017: Backup alert proactive notification from Triage to Backlog on the Data-Persistence-Backup board.
May 20 2021, 5:20 PM · Data-Persistence-Backup
LSobanski moved T244884: Implement logic to be able to perform incremental backups of ES hosts from Triage to Backlog on the Data-Persistence-Backup board.
May 20 2021, 5:20 PM · Data-Persistence-Backup, Patch-For-Review, SRE
LSobanski moved T274808: Implement production zookeeper backups from Triage to Backlog on the Data-Persistence-Backup board.
May 20 2021, 5:19 PM · Analytics-Clusters, Data-Persistence-Backup
LSobanski moved T282775: Revert workaround for cumin output verbosity on RemoteExecution (CuminExecution) abstraction from Triage to Backlog on the Data-Persistence-Backup board.
May 20 2021, 5:19 PM · Infrastructure-Foundations, User-Kormat, SRE-tools, Data-Persistence-Backup, DBA
LSobanski triaged T283261: Define priorities for HDFS data to be backed up as Medium priority.

My understanding based on recent conversations was that a decision is yet to be made about what approach to take with HDFS backup / redundancy. Is that still the case and is this a discovery task or do we have an answer and this is an implementation task? I'm trying to figure out what is the expectation of involvement from our side.

May 20 2021, 5:12 PM · Analytics
LSobanski closed T274513: Investigate intermittent replica lag alarms, a subtask of T172492: Database alerting, as Declined.
May 20 2021, 1:35 PM · Data-Persistence, Epic, observability
LSobanski closed T274513: Investigate intermittent replica lag alarms as Declined.

Haven't seen this in a while, and it's not super clear what the original problem was. Resolving.

May 20 2021, 1:35 PM · DBA
LSobanski renamed T262668: WMF media storage must be adequately backed up from WMF media storage must be adequately backed up in a remote location to WMF media storage must be adequately backed up.
May 20 2021, 1:26 PM · Data-Persistence-Backup, Epic, Goal, SRE, SRE-swift-storage

May 19 2021

LSobanski added a comment to T283125: dbstore1004 85% disk space used..

Timing-wise, this should happen before the DC switchover (likely the week of June 21st), as we'll have our hands full during that time. This makes things tricky, as the three weeks before that date happen to be the All Hands and the two Percona trainings. In other words, if the host can be ready for data transfer by mid next week, we significantly limit the risk of it having issues.

May 19 2021, 9:02 AM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA
LSobanski moved T283125: dbstore1004 85% disk space used. from Ready to Blocked on the DBA board.
May 19 2021, 8:34 AM · Patch-For-Review, Analytics-Clusters, Analytics-Kanban, DBA

May 17 2021

LSobanski added a comment to T274234: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002.

Bonus question, is there an option for some traffic shaping / QoS to remediate the above automatically?

May 17 2021, 3:29 PM · Infrastructure-Foundations, bacula, netops, SRE, Data-Persistence-Backup
LSobanski triaged T283017: Backup alert proactive notification as Low priority.
May 17 2021, 3:08 PM · Data-Persistence-Backup
LSobanski created T283017: Backup alert proactive notification.
May 17 2021, 3:08 PM · Data-Persistence-Backup
LSobanski closed T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) as Resolved.
May 17 2021, 11:45 AM · Epic, DBA, Tracking-Neverending