Phabricator

jcrespo (Jaime Crespo)
Sr Database Administrator

Today

  • Clear sailing ahead.

Tomorrow

  • Clear sailing ahead.

Thursday

  • Clear sailing ahead.

User Details

User Since
May 11 2015, 8:31 AM (201 w, 1 d)
Availability
Available
IRC Nick
jynus
LDAP User
Jcrespo
MediaWiki User
JCrespo (WMF)

Recent Activity

Fri, Mar 15

jcrespo added a comment to T210292: Implement a proof of concept of a snapshot cycle automation for a mediawiki section database.

@Marostegui better :-)?

Fri, Mar 15, 10:38 AM · Patch-For-Review, DBA
jcrespo added a comment to T217715: INCIDENT: k8s@codfw prometheus queries disabled -- very slow to execute some queries.

May I ask what actual (emergency) actions you took, in case it happens again and you are not around? Assuming they are safe to do in an all-down scenario.

Fri, Mar 15, 9:36 AM · Patch-For-Review, Wikimedia-Incident, monitoring, Operations
jcrespo committed rOSMDd91e920263a3: backups: Make retention policy configurable (authored by jcrespo).
backups: Make retention policy configurable
Fri, Mar 15, 7:49 AM
jcrespo committed rOSMDd614c7224a34: backups: Make retention policy configurable (authored by jcrespo).
backups: Make retention policy configurable
Fri, Mar 15, 7:38 AM
jcrespo committed rOSMD51964afdd4df: backups: Make retention policy configurable (authored by jcrespo).
backups: Make retention policy configurable
Fri, Mar 15, 7:38 AM
jcrespo committed rOSMDede1ade28aaa: backups: Make retention policy configurable (authored by jcrespo).
backups: Make retention policy configurable
Fri, Mar 15, 6:49 AM
jcrespo committed rOSMD1fce0f3c2eaa: backups: Make retention policy configurable (authored by jcrespo).
backups: Make retention policy configurable
Fri, Mar 15, 6:44 AM

Thu, Mar 14

jcrespo closed T186730: Change user u11106 to have max 1 open connection as Resolved.

Done, on wikireplicas only, as above:

Thu, Mar 14, 7:18 PM · DBA
jcrespo added a comment to T186730: Change user u11106 to have max 1 open connection.

Ok, doing.

Thu, Mar 14, 7:10 PM · DBA
Marostegui awarded T213404: Design the final architecture for the database binary backups a Cookie token.
Thu, Mar 14, 7:02 PM · DBA
jcrespo closed T213404: Design the final architecture for the database binary backups, a subtask of T206203: Implement database binary backups into the production infrastructure, as Resolved.
Thu, Mar 14, 7:01 PM · Patch-For-Review, Goal, DBA
jcrespo closed T213404: Design the final architecture for the database binary backups as Resolved.
Thu, Mar 14, 7:00 PM · DBA
jcrespo added a comment to T205626: Document clearly the mariadb backup and recovery setup.

https://wikitech.wikimedia.org/wiki/MariaDB/Backups is close to being a complete description of the architecture, only missing some review and the individual application documentation, and maybe some extra details on the recovery process.

Thu, Mar 14, 6:58 PM · Patch-For-Review, DBA
jcrespo added a comment to T213404: Design the final architecture for the database binary backups.

https://wikitech.wikimedia.org/wiki/MariaDB/Backups is close to being a complete description of the architecture, only missing some review and the individual application documentation.

Thu, Mar 14, 6:55 PM · DBA
jcrespo added a comment to T218336: rack/setup/deploy dedicated backup recovery/provisioning hosts.

@Papaul Please see my warnings for Chris at T216137#5002854, which apply here too. I had suggested using dbstore for these hosts, but @Marostegui didn't agree or disagree at the time, and I am no longer sure it is the right call, given that there are dbstore hosts identical to other db* hosts. Maybe we should call them dbprovision2XXX or something similar, as they won't hold live databases? @Marostegui any thoughts?

Thu, Mar 14, 6:47 PM · Operations, DBA, ops-codfw
jcrespo added a comment to T218188: Import issue (bug?) on Python 3.4/3.5 + multiprocessing affecting Cumin.

I think the problem may be on my side after all; see the documentation at https://docs.python.org/3.4/library/multiprocessing.html

Thu, Mar 14, 11:17 AM · Operations-Software-Development
jcrespo added a comment to T218188: Import issue (bug?) on Python 3.4/3.5 + multiprocessing affecting Cumin.

I am open to alternative suggestions that do not crash. Please suggest a different model that allows multiple threads to run at the same time while running Cumin, without a busy loop. :-)

Thu, Mar 14, 11:06 AM · Operations-Software-Development
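Regarding the request above for a model that runs several jobs at once without a busy loop: one option, offered only as a sketch, is a thread pool combined with `concurrent.futures.as_completed()`, which blocks until the next job finishes instead of polling. `run_job` and the example commands are hypothetical stand-ins for Cumin executions; whether Cumin itself is safe to call from several threads is not established here. Threads do not start worker processes, so multiprocessing import-time behaviour does not come into play.

```python
"""Run several commands concurrently and collect results without busy-waiting.

Sketch only: run_job is a hypothetical stand-in for a Cumin execution.
"""
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def run_job(name: str, command: List[str]) -> Tuple[str, int]:
    # Each worker thread blocks on its own child process; the GIL is released
    # while waiting, so the jobs genuinely overlap.
    completed = subprocess.run(command, capture_output=True, text=True)
    return name, completed.returncode


def run_all(jobs: Dict[str, List[str]], max_workers: int = 4) -> Dict[str, int]:
    results: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_job, name, cmd) for name, cmd in jobs.items()]
        # as_completed() sleeps until the next future is done: no polling loop.
        for future in as_completed(futures):
            name, returncode = future.result()
            results[name] = returncode
    return results


if __name__ == "__main__":
    example_jobs = {
        "ok": [sys.executable, "-c", "pass"],
        "fail": [sys.executable, "-c", "import sys; sys.exit(3)"],
    }
    print(run_all(example_jobs))
```

Results arrive in completion order, not submission order, which is usually what a progress-reporting orchestrator wants.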
jcrespo created P8198 auto_increment monitoring 2019-03-14.
Thu, Mar 14, 10:10 AM
jcrespo added a subtask for T212690: DBQueryTimeoutError on Wikidata's Special:Nuke: T188679: Nuke should batch the deletions.
Thu, Mar 14, 7:44 AM · Patch-For-Review, MediaWiki-extensions-Nuke, Wikimedia-production-error, Wikidata
jcrespo added a parent task for T188679: Nuke should batch the deletions: T212690: DBQueryTimeoutError on Wikidata's Special:Nuke.
Thu, Mar 14, 7:43 AM · MediaWiki-extensions-Nuke
jcrespo added a comment to T212690: DBQueryTimeoutError on Wikidata's Special:Nuke.

I am going to be bold and say that maybe the solution (or at least part of the solution) to timeouts like this one reported on enwiki:

Thu, Mar 14, 7:43 AM · Patch-For-Review, MediaWiki-extensions-Nuke, Wikimedia-production-error, Wikidata
jcrespo added a comment to T218079: CodeRevisionListView::getRevCount is creating slow queries on mediawiki.org.

Yes, sorry, your team tag was added when I wasn't aware of T116948.

Thu, Mar 14, 7:27 AM · MediaWiki-Database, MediaWiki-extensions-CodeReview

Wed, Mar 13

jcrespo added a comment to T210292: Implement a proof of concept of a snapshot cycle automation for a mediawiki section database.

While I agree with the interactive usage of backup_mariadb.py, I am not sure about daily_snapshot.py, as it is not supposed to be run interactively, only from cron. Maybe I can implement an interactive version and rename it to snapshot_mariadb.py, or create a separate executable just for that; one of the two.

Wed, Mar 13, 3:20 PM · Patch-For-Review, DBA
jcrespo added a comment to T210292: Implement a proof of concept of a snapshot cycle automation for a mediawiki section database.

Yeah, only_postprocess: True is ignored by daily_snapshot.py. I wonder whether I should implement it or error on it.

Wed, Mar 13, 3:18 PM · Patch-For-Review, DBA
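For the "implement it or error on it" question above, a sketch of the second option: validate the configuration up front and refuse a truthy `only_postprocess` instead of silently ignoring it. Only the `only_postprocess` name comes from the comment; the other option names, `ConfigError` and `validate_config` are hypothetical.

```python
"""Fail loudly on configuration options a script does not implement.

Sketch only: apart from only_postprocess, all names here are hypothetical.
"""
from typing import Any, Dict

# Hypothetical options this script actually honours.
SUPPORTED_OPTIONS = {"section", "type", "retention"}
# Options valid for other tools but not implemented by this one.
UNIMPLEMENTED_OPTIONS = {"only_postprocess"}


class ConfigError(ValueError):
    """Raised when the configuration asks for behaviour we cannot provide."""


def validate_config(config: Dict[str, Any]) -> Dict[str, Any]:
    for key, value in config.items():
        if key in UNIMPLEMENTED_OPTIONS:
            if value:
                # Refuse rather than silently ignoring the request.
                raise ConfigError("option %r=%r is not supported here" % (key, value))
            continue  # a false value matches what the script already does
        if key not in SUPPORTED_OPTIONS:
            raise ConfigError("unknown option %r" % key)
    return config
```

Failing early keeps a cron job from reporting success for work it never did; accepting a false value keeps shared configuration files usable.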
jcrespo added a comment to T172410: Replace the current multisource analytics-store setup.

Can we resolve this already? I am guessing there may be many follow-ups, but technically this has already been done, right? Subtasks can be left open, as they don't seem to be hard blockers.

Wed, Mar 13, 2:32 PM · Product-Analytics, Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
jcrespo added a comment to T190364: eqiad 10G ports needs.

I have better information on the database backups/provisioning service:

Wed, Mar 13, 2:03 PM · Operations, netops
jcrespo added a comment to T213404: Design the final architecture for the database binary backups.


Wed, Mar 13, 1:49 PM · DBA
jcrespo renamed T218188: Import issue (bug?) on Python 3.4/3.5 + multiprocessing affecting Cumin from cumin not thread/multiprocess-safe ? to Import issue (bug?) on Python 3.4/3.5 + multiprocessing affecting Cumin.
Wed, Mar 13, 12:29 PM · Operations-Software-Development
jcrespo added a comment to T218196: Optimize MySQL settings for MediaWiki CI / Quibble.

Another thing that would speed things up, among many other ideas, would be not running mysql_install_db every time, but instead having a prepared empty data dir that is copied each time. The downside is that it would have to be rebuilt on each mysql package upgrade.

Wed, Mar 13, 10:36 AM · DBA, Quibble
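The prepared-data-dir idea above can be sketched as follows, assuming the CI job knows the installed server version string: build an empty data directory once per version, then copy it for each run. Keying the cached template on the version handles the "rebuild on each package upgrade" downside automatically. The cache layout and function names are hypothetical, and `initialise` stands in for whatever wraps mysql_install_db.

```python
"""Reuse a pre-built empty MySQL/MariaDB data directory for each CI run.

Sketch only: directory layout and function names are hypothetical.
"""
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def template_path(cache_root: Path, server_version: str) -> Path:
    # A new server version means a new template path, which forces a rebuild.
    return cache_root / ("datadir-template-" + server_version)


def prepare_datadir(cache_root: Path, server_version: str, target: Path,
                    initialise: Callable[[Path], None]) -> bool:
    """Populate ``target`` from the cached template, building it if missing.

    ``initialise`` does the expensive one-off work (for example a wrapper
    around mysql_install_db). Returns True if a new template was built.
    """
    template = template_path(cache_root, server_version)
    built = False
    if not template.exists():
        staging = template.with_name(template.name + ".tmp")
        if staging.exists():
            shutil.rmtree(str(staging))  # leftover from an interrupted build
        staging.mkdir(parents=True)
        initialise(staging)
        staging.rename(template)  # publish only a complete template
        built = True
    if target.exists():
        shutil.rmtree(str(target))
    shutil.copytree(str(template), str(target))  # a copy instead of a fresh init
    return built


if __name__ == "__main__":
    def fake_install(path: Path) -> None:
        # Stand-in for mysql_install_db so the example runs anywhere.
        (path / "mysql").mkdir()

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(prepare_datadir(root / "cache", "10.1", root / "run1", fake_install))
        print(prepare_datadir(root / "cache", "10.1", root / "run2", fake_install))
```

Building into a `.tmp` directory and renaming it means a job interrupted mid-initialisation never leaves a half-built template behind for later runs.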
jcrespo added a parent task for T218189: wmfmariadbpy/CuminExecution must capture Exception cumin.transports.WorkerError: T206203: Implement database binary backups into the production infrastructure.
Wed, Mar 13, 9:16 AM · DBA
jcrespo added a subtask for T206203: Implement database binary backups into the production infrastructure: T218189: wmfmariadbpy/CuminExecution must capture Exception cumin.transports.WorkerError.
Wed, Mar 13, 9:16 AM · Patch-For-Review, Goal, DBA
jcrespo created T218189: wmfmariadbpy/CuminExecution must capture Exception cumin.transports.WorkerError.
Wed, Mar 13, 9:15 AM · DBA
jcrespo added a subtask for T206203: Implement database binary backups into the production infrastructure: T218188: Import issue (bug?) on Python 3.4/3.5 + multiprocessing affecting Cumin.
Wed, Mar 13, 9:01 AM · Patch-For-Review, Goal, DBA
jcrespo added a parent task for T218188: Import issue (bug?) on Python 3.4/3.5 + multiprocessing affecting Cumin: T206203: Implement database binary backups into the production infrastructure.
Wed, Mar 13, 9:01 AM · Operations-Software-Development
jcrespo updated subscribers of T218188: Import issue (bug?) on Python 3.4/3.5 + multiprocessing affecting Cumin.
Wed, Mar 13, 8:59 AM · Operations-Software-Development
jcrespo created T218188: Import issue (bug?) on Python 3.4/3.5 + multiprocessing affecting Cumin.
Wed, Mar 13, 8:50 AM · Operations-Software-Development
jcrespo created P8192 (An Untitled Masterwork).
Wed, Mar 13, 8:35 AM

Tue, Mar 12

jcrespo added a comment to T210292: Implement a proof of concept of a snapshot cycle automation for a mediawiki section database.

Snapshotting is now working consistently. There was a locking issue due to stdout piping + xtrabackup verbose output (for long-running backups).
Next: fix the port multiplexing issue to allow multiple simultaneous backups.

Tue, Mar 12, 11:36 PM · Patch-For-Review, DBA
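The comment above does not detail the locking mechanism. A common cause consistent with it (an assumption here) is the pipe-buffer deadlock: the parent waits for the child, or drains only one stream, while the child blocks writing verbose output into a full pipe. The Python subprocess documentation warns about this for `Popen.wait()` with pipes and recommends `communicate()`. A minimal sketch; `run_capturing` is a hypothetical helper and the noisy child merely imitates a verbose backup tool.

```python
"""Capture a long-running, chatty child process without deadlocking.

Sketch only: run_capturing is a hypothetical helper, not part of any backup tool.
"""
import subprocess
import sys
from typing import List, Tuple


def run_capturing(command: List[str]) -> Tuple[int, str, str]:
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() drains stdout and stderr concurrently until EOF, so the
    # child never blocks on a full pipe. Calling proc.wait() first, or reading
    # only stdout, can hang once the output exceeds the OS pipe buffer
    # (64 KiB by default on Linux).
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    # Imitates a verbose tool: roughly 900 KB of log lines on stderr.
    noisy = [sys.executable, "-c",
             "import sys; sys.stderr.write('log line\\n' * 100000); print('done')"]
    code, out, err = run_capturing(noisy)
    print(code, out.strip(), len(err))
```

`communicate()` holds the whole output in memory; for very long backups, redirecting stderr to a log file instead of a pipe avoids both the deadlock and the memory growth.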
jcrespo triaged T218118: prometheus-node-exporter makes puppet fail because requiring a version that no longer exists on buster as Low priority.
Tue, Mar 12, 4:03 PM · monitoring, Puppet, Operations
jcrespo renamed T213546: Prepare puppet infrastructure for Debian buster from Prepare puppet for Debian buster to Prepare puppet infrastructure for Debian buster.
Tue, Mar 12, 3:59 PM · Patch-For-Review, Packaging, Puppet, Operations
jcrespo added a comment to T213546: Prepare puppet infrastructure for Debian buster.

Oh, I think I used the wrong task for my changes. Sorry about that, I will rename the task to prevent further confusion.

Tue, Mar 12, 3:58 PM · Patch-For-Review, Packaging, Puppet, Operations
jcrespo added a comment to T210292: Implement a proof of concept of a snapshot cycle automation for a mediawiki section database.


Tue, Mar 12, 12:12 PM · Patch-For-Review, DBA
jcrespo committed rOSMDae2642a9cc17: mariadb: Refactor dump_section.py and rename to match functionality (authored by jcrespo).
mariadb: Refactor dump_section.py and rename to match functionality
Tue, Mar 12, 11:37 AM
jcrespo updated the task description for T218079: CodeRevisionListView::getRevCount is creating slow queries on mediawiki.org.
Tue, Mar 12, 9:53 AM · MediaWiki-Database, MediaWiki-extensions-CodeReview
jcrespo added a comment to T205482: CodeReview extension: Code stewardship review.

Causing issues with poorly performing queries on mediawiki.org (s3) databases: T218079

Tue, Mar 12, 9:52 AM · Release-Engineering-Team (Kanban), MediaWiki-extensions-CodeReview, Code-Stewardship-Reviews
jcrespo added a parent task for T116948: Undeploy CodeReview: T218079: CodeRevisionListView::getRevCount is creating slow queries on mediawiki.org.
Tue, Mar 12, 9:48 AM · Technical-Debt, MediaWiki-extensions-CodeReview, Wikimedia-Site-requests
jcrespo added a subtask for T218079: CodeRevisionListView::getRevCount is creating slow queries on mediawiki.org: T116948: Undeploy CodeReview.
Tue, Mar 12, 9:48 AM · MediaWiki-Database, MediaWiki-extensions-CodeReview
jcrespo created T218079: CodeRevisionListView::getRevCount is creating slow queries on mediawiki.org.
Tue, Mar 12, 9:35 AM · MediaWiki-Database, MediaWiki-extensions-CodeReview
jcrespo added a comment to T149077: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki.

What would you think of creating a #mediawiki-query-performance label to track these over time, so there are no more useless tickets like this one?

Tue, Mar 12, 8:40 AM · Core Platform Team Kanban (Waiting for Review), Core Platform Team Backlog (Watching / External), Wikimedia-production-error, Patch-For-Review, MediaWiki-API, DBA
jcrespo added a comment to T149077: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki.

I don't think that one is exactly the same, as that one joins with page and the above didn't. I think we need a higher-level overview of query performance, grouping by query digest, API start point and timing, to get a better understanding of the current state and to prioritize. Looking at a 24-hour cycle on enwiki, the worst types of queries are (from worst to "best"):

Tue, Mar 12, 8:22 AM · Core Platform Team Kanban (Waiting for Review), Core Platform Team Backlog (Watching / External), Wikimedia-production-error, Patch-For-Review, MediaWiki-API, DBA
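The grouping described above (by query digest, API start point and timing) could be prototyped like this: replace literals with placeholders so queries that differ only in constants share a digest, then total the time per (digest, start point). The regexes are a deliberate simplification of what dedicated tools such as pt-query-digest do, and the sample tuples in the test are invented for illustration.

```python
"""Group slow-query samples by a normalised digest and API start point.

Sketch only: a simplified fingerprint, not a full SQL normaliser.
"""
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

_STRING = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_IN_LIST = re.compile(r"\(\s*\?(?:\s*,\s*\?)*\s*\)")
_SPACES = re.compile(r"\s+")


def digest(sql: str) -> str:
    """Replace literals with '?' so queries differing only in constants match."""
    text = _STRING.sub("?", sql)       # quoted strings first
    text = _NUMBER.sub("?", text)      # then bare numbers
    text = _IN_LIST.sub("(?+)", text)  # IN (?, ?, ?) of any length
    return _SPACES.sub(" ", text).strip().lower()


def summarise(samples: Iterable[Tuple[str, str, float]]) -> List[Tuple[str, str, int, float]]:
    """samples: (start_point, sql, seconds).

    Returns (digest, start_point, count, total_seconds), worst total first.
    """
    timings: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for start_point, sql, seconds in samples:
        timings[(digest(sql), start_point)].append(seconds)
    rows = [(d, sp, len(t), sum(t)) for (d, sp), t in timings.items()]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

Sorting by total time rather than by the single slowest sample favours frequent moderately slow queries, which often cost more overall than rare outliers.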
jcrespo added a comment to T206203: Implement database binary backups into the production infrastructure.

Let's add those lines to https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/494899/13/modules/profile/files/mariadb/daily_snapshot.py

Tue, Mar 12, 7:49 AM · Patch-For-Review, Goal, DBA

Mon, Mar 11

jcrespo added a comment to T206203: Implement database binary backups into the production infrastructure.

And what I get is that I don't really know what happened, as there is no usage output or error trace.

Mon, Mar 11, 4:26 PM · Patch-For-Review, Goal, DBA
jcrespo added a comment to T210292: Implement a proof of concept of a snapshot cycle automation for a mediawiki section database.

Known bugs right now (CC @Marostegui):

Mon, Mar 11, 3:55 PM · Patch-For-Review, DBA
jcrespo added a comment to T218029: CloudVPS: evaluate convenience of having codfw openstack DBs in proper DB hosts.

@Marostegui I mentioned the same issues at T217891#5015199

Mon, Mar 11, 2:20 PM · cloud-services-team (Kanban)
jcrespo added a comment to T217891: CloudVPS: rework codfw deployments.

I answered @arturo that this is technically possible and we intend to support it, but we are not ready at the moment. Two main blockers:

Mon, Mar 11, 1:52 PM · cloud-services-team (Kanban)
jcrespo added a comment to T217853: tools.meta was creating high cpu load on wikireplicas, rate-limited.

I've removed the rate limit; now we will be able to see how the CPU behaves: https://grafana.wikimedia.org/d/000000342/node-exporter-server-metrics?panelId=13&fullscreen&orgId=1&var-node=labsdb1009:9100&from=now-12h&to=now

Mon, Mar 11, 9:04 AM · Tools

Fri, Mar 8

jcrespo added a comment to T217853: tools.meta was creating high cpu load on wikireplicas, rate-limited.

Just to be clear about why action was taken: normally we allow tools to use whatever resources they want, but we got a complaint on IRC about lag on the wikireplica web, and this was observed to be the #1 cause of CPU starvation.

Fri, Mar 8, 4:36 PM · Tools
jcrespo added a project to T217898: Geoshape service fails to deliver geoshapes from OSM: User-notice.

More context: this didn't affect the tile service, but some maps shown on wikis may fail to load if they use the geoshape service.

Fri, Mar 8, 4:21 PM · User-notice, Discovery, Maps
jcrespo added a comment to T217898: Geoshape service fails to deliver geoshapes from OSM.

The people in the know are working on this, trying to figure out the best way to fix the issue. They will update when they know more (they are working on a fix right now).

Fri, Mar 8, 4:07 PM · User-notice, Discovery, Maps
jcrespo added a comment to T217853: tools.meta was creating high cpu load on wikireplicas, rate-limited.
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ipblocks
         type: ALL
possible_keys: ipb_range
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1129010
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: comment
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: enwiki.ipblocks.ipb_reason_id
         rows: 1
        Extra:
Fri, Mar 8, 3:23 PM · Tools
jcrespo added a comment to T217853: tools.meta was creating high cpu load on wikireplicas, rate-limited.

Some of the potential queries causing the issue (there were multiple of these running in parallel):

Info: SELECT
                ipb_by_text,
                ipb_address,
                comment_text AS ipb_reason,
                DATE_FORMAT(ipb_timestamp, "%Y-%b-%d") AS timestamp,
                DATE_FORMAT(ipb_expiry, "%Y-%b-%d") AS expiry,
                ipb_anon_only
            FROM
                ipblocks
                LEFT JOIN comment ON ipb_reason_id = comment_id
            WHERE
                (ipb_range_start <= 'v6-260088003980255009B2521EDAE937EA' AND ipb_range_end >= 'v6-260088003980255009B2521EDAE937EA')
                OR (ipb_range_start >= 'v6-260088003980255009B2521EDAE937EA' AND ipb_range_end <= 'v6-260088003980255009B2521EDAE937EA')
         /*8b4924e8*/
Fri, Mar 8, 3:20 PM · Tools
jcrespo added a comment to T217453: Remove etp_user from echo_target_page in production.

Maybe, rather than stalling it (or in addition), you could block it on a (presumably new) task to purchase such a host?

Fri, Mar 8, 2:36 PM · Blocked-on-schema-change, DBA, Growth-Team, Schema-change, Notifications
jcrespo added a comment to T217453: Remove etp_user from echo_target_page in production.

+1

Fri, Mar 8, 2:32 PM · Blocked-on-schema-change, DBA, Growth-Team, Schema-change, Notifications
jcrespo added a comment to T217453: Remove etp_user from echo_target_page in production.

We can do the second at a period of low traffic, although we should do some testing to make sure it doesn't break things.

Fri, Mar 8, 2:21 PM · Blocked-on-schema-change, DBA, Growth-Team, Schema-change, Notifications
jcrespo renamed T217893: Traffic (text) instability due to misbehaving cache server (cp1077), causing a 1.5-2% requests failing from Traffic (text) instability due to unknown cause, causing a 1.5-2% requests failing to Traffic (text) instability due to misbehaving cache server (cp1077), causing a 1.5-2% requests failing.
Fri, Mar 8, 1:40 PM · Patch-For-Review, Operations, Traffic
jcrespo added a comment to T217893: Traffic (text) instability due to misbehaving cache server (cp1077), causing a 1.5-2% requests failing.

It does indeed look like purge requests.

Fri, Mar 8, 1:08 PM · Patch-For-Review, Operations, Traffic
jcrespo updated the task description for T217893: Traffic (text) instability due to misbehaving cache server (cp1077), causing a 1.5-2% requests failing.
Fri, Mar 8, 1:06 PM · Patch-For-Review, Operations, Traffic
jcrespo updated the task description for T217893: Traffic (text) instability due to misbehaving cache server (cp1077), causing a 1.5-2% requests failing.
Fri, Mar 8, 1:03 PM · Patch-For-Review, Operations, Traffic
jcrespo created T217893: Traffic (text) instability due to misbehaving cache server (cp1077), causing a 1.5-2% requests failing.
Fri, Mar 8, 12:55 PM · Patch-For-Review, Operations, Traffic

Thu, Mar 7

jcrespo updated subscribers of T217853: tools.meta was creating high cpu load on wikireplicas, rate-limited.
Thu, Mar 7, 6:01 PM · Tools
jcrespo added a subtask for T119601: Certain tools users create multiple long running queries that take all memory and/or CPU from labsdb hosts, slowing it down and potentially crashing (tracking): T217853: tools.meta was creating high cpu load on wikireplicas, rate-limited.
Thu, Mar 7, 5:53 PM · Data-Services, Tracking, Toolforge, DBA
jcrespo added a parent task for T217853: tools.meta was creating high cpu load on wikireplicas, rate-limited: T119601: Certain tools users create multiple long running queries that take all memory and/or CPU from labsdb hosts, slowing it down and potentially crashing (tracking).
Thu, Mar 7, 5:53 PM · Tools
jcrespo created T217853: tools.meta was creating high cpu load on wikireplicas, rate-limited.
Thu, Mar 7, 5:41 PM · Tools
jcrespo added a comment to T217755: Degraded RAID on db2044.
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Rebuilding)
Thu, Mar 7, 3:37 PM · Operations, ops-codfw
jcrespo added a comment to T208540: ApiContentTranslationSave fails with "Exception: Failed to update a translation section".

Sorry, my fault, I had not refreshed my browser.

Thu, Mar 7, 12:04 PM · ContentTranslation, Wikimedia-production-error
jcrespo added a comment to T208540: ApiContentTranslationSave fails with "Exception: Failed to update a translation section".

I see a relatively high number of exceptions (25 in the last 15 minutes) on /w/api.php?action=translationaids with:

[{exception_id}] {exception_url} InvalidArgumentException from line 333 of /srv/mediawiki/php-1.33.0-wmf.20/includes/Revision/RevisionStore.php: $pageId and $revId cannot both be 0 or null
Thu, Mar 7, 11:42 AM · ContentTranslation, Wikimedia-production-error

Wed, Mar 6

jcrespo added a comment to T100954: Wikitech: update Bacula article.

I made some updates to the databases section and pointed to the place where most of its updates are happening.

Wed, Mar 6, 6:41 PM · Operations, Documentation
jcrespo committed rOSMD9dfea1616560: mariadb: Refactor dump_section.py and rename to match functionality (authored by jcrespo).
mariadb: Refactor dump_section.py and rename to match functionality
Wed, Mar 6, 6:39 PM
jcrespo changed the status of T161296: Upgrade mysqld_exporter in production from Open to Stalled.

Fixed configuration for buster, but with no additional metrics (same metrics as before).

Wed, Mar 6, 4:38 PM · Patch-For-Review, DBA, User-fgiunchedi, Operations, Prometheus-metrics-monitoring
jcrespo changed the status of T161296: Upgrade mysqld_exporter in production, a subtask of T143896: MySQL metrics monitoring, from Open to Stalled.
Wed, Mar 6, 4:37 PM · monitoring, DBA, Patch-For-Review, Operations, Prometheus-metrics-monitoring
jcrespo added a comment to T213527: Prepare our base system layer for Debian buster.

After the buster upgrade, what appears to be the debmonitor hook fails on apt update/upgrade on db1114 with:

Wed, Mar 6, 4:35 PM · Patch-For-Review, Operations
jcrespo awarded T217756: Please delete phabricator Herald rule H298 a Like token.
Wed, Mar 6, 1:54 PM · Phabricator
jcrespo created T217756: Please delete phabricator Herald rule H298.
Wed, Mar 6, 12:16 PM · Phabricator
jcrespo updated the task description for T208323: Predictive failures on disk S.M.A.R.T. status.
Wed, Mar 6, 12:03 PM · Operations, DBA
jcrespo updated subscribers of T217755: Degraded RAID on db2044.

@Papaul please replace it under warranty or, otherwise, with a spare if one is available (600GB disk); probably the second.

Wed, Mar 6, 12:02 PM · Operations, ops-codfw
jcrespo committed rOSMDa9de41fadfce: mariadb: Refactor dump_section.py and rename to match functionality (authored by jcrespo).
mariadb: Refactor dump_section.py and rename to match functionality
Wed, Mar 6, 11:39 AM
jcrespo committed rOSMDcb60430adf14: mariadb: Refactor dump_section.py and rename to match functionality (authored by jcrespo).
mariadb: Refactor dump_section.py and rename to match functionality
Wed, Mar 6, 11:36 AM
jcrespo committed rOSMDac23ae815f75: mariadb: Refactor dump_section.py and rename to match functionality (authored by jcrespo).
mariadb: Refactor dump_section.py and rename to match functionality
Wed, Mar 6, 10:56 AM

Tue, Mar 5

jcrespo updated subscribers of T217457: Intermittent slowness on gerrit.

@Mathew.onipe maybe relevant to you ^

Tue, Mar 5, 1:07 PM · Operations, Gerrit
jcrespo added a comment to T214720: db1114 crashed (HW memory issues).

@Cmjohnson The remote IPMI password was out of sync. Just mentioning it so it gets added to the to-do list for motherboard changes (this and reviewing the boot order, which you did, thank you!). Not a huge issue, just a heads-up to prevent people from bothering you.

Tue, Mar 5, 1:03 PM · Patch-For-Review, DBA, Operations, ops-eqiad
jcrespo added a comment to T215445: comment and actor view challenges for Cloud Services.

Basically, I don't like any of the three, but that one is the only one that doesn't touch production, plus we already have the infrastructure in place to maintain it:

Tue, Mar 5, 8:59 AM · cloud-services-team (Kanban), Data-Services
jcrespo added a comment to T215445: comment and actor view challenges for Cloud Services.

Would that include the triggers?

Tue, Mar 5, 8:46 AM · cloud-services-team (Kanban), Data-Services
jcrespo added a comment to T215445: comment and actor view challenges for Cloud Services.

I would prefer option 3. I do not like triggers, but if we are going to have them, let's have them in the only place where they already exist (sanitarium). It is also the safest option: if we forget to deploy the changes, the view creation will fail and we will notice.

Tue, Mar 5, 8:26 AM · cloud-services-team (Kanban), Data-Services

Mon, Mar 4

jcrespo added a comment to T217073: Clean up orphaned echo_event rows again.

@Catrope One last thing, not sure if you are in charge of that, and obviously not a huge priority, but maybe there should be some conversations about changing the defaults for notifications, watchlists, etc. on places like wikidata, combined with bot users with high editing activity, where lots of emails or notifications may never be seen. Apologies if this has been brought up in the past or has already been handled. Apologies also if it has nothing to do with this purge and I am mixing things up.

Mon, Mar 4, 7:10 PM · Growth-Team (Current Sprint), DBA, WorkType-Maintenance, Notifications
jcrespo added a comment to T217073: Clean up orphaned echo_event rows again.

Thanks to both!

Mon, Mar 4, 6:49 PM · Growth-Team (Current Sprint), DBA, WorkType-Maintenance, Notifications
jcrespo added a comment to T215445: comment and actor view challenges for Cloud Services.

Sorry I didn't answer before; I was pinged through SOS. I personally do not have further comments or big issues with the proposed plan (I mentioned the disadvantages before, which you are aware of, but there is certainly no perfect solution, so it is the best we can do at the moment).

Mon, Mar 4, 5:57 PM · cloud-services-team (Kanban), Data-Services
jcrespo added a comment to T217385: EventBus mediawiki outage 2019-02-28.

May I ask for help completing the documentation at https://wikitech.wikimedia.org/wiki/Incident_documentation/20190228-logstash (when possible; it doesn't have to be now)? The logstash incident seems bad enough, but (please correct me if I am wrong) this one seems more user-facing and probably more interesting to end users.

Mon, Mar 4, 5:51 PM · Analytics-Kanban, Analytics, WMF-JobQueue, Discovery, Services, EventBus
jcrespo placed T215611: MediaWiki errors overloading logstash up for grabs.

I am not working on this.

Mon, Mar 4, 4:55 PM · Core Platform Team Kanban (Done with CPT), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team, Wikimedia-production-error, Wikimedia-Logstash, Operations, MediaWiki-Database, monitoring
jcrespo added a comment to T215611: MediaWiki errors overloading logstash.

Thanks @herron. I would like to know more about what caused the extra logging, but I didn't find it in the incident report. Do you know, or know someone who does?

Mon, Mar 4, 4:55 PM · Core Platform Team Kanban (Done with CPT), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team, Wikimedia-production-error, Wikimedia-Logstash, Operations, MediaWiki-Database, monitoring
jcrespo added a comment to T216441: Evaluate transferring the non-replicated tables to the new toolsdb server.

@jcrespo please restore the table s51290__dpl_p.dab_hof ; thanks in advance!

Mon, Mar 4, 4:41 PM · Data-Services, cloud-services-team (Kanban)