
jcrespo (Jaime Crespo)
Sr Database Administrator

User Details

User Since
May 11 2015, 8:31 AM (263 w, 6 d)
Availability
Available
IRC Nick
jynus
LDAP User
Jcrespo
MediaWiki User
JCrespo (WMF)

Recent Activity

Fri, May 29

jcrespo added a comment to T224589: Migrate dbmonitor hosts to Buster.

Please don't consider dbmonitor2001 as upgraded, as the application doesn't work after the OS upgrade.

Fri, May 29, 8:51 AM · Operations
jcrespo updated the task description for T224589: Migrate dbmonitor hosts to Buster.
Fri, May 29, 8:50 AM · Operations

Thu, May 28

jcrespo added a comment to T253808: db1138 (s4 master) crashed due to memory issues.

I talked to @Jclark-ctr on IRC, hw replacement will likely happen on Tuesday next week.

Thu, May 28, 3:22 PM · Wikimedia-Incident, ops-eqiad, Operations, DBA
jcrespo added a comment to T253808: db1138 (s4 master) crashed due to memory issues.

@Johan the plan continues as usual; @Jclark-ctr's information is unrelated to the user-impacting maintenance.

Thu, May 28, 3:21 PM · Wikimedia-Incident, ops-eqiad, Operations, DBA
jcrespo claimed T250602: db1140 (backup source) crashed .
Thu, May 28, 2:20 PM · DC-Ops, ops-eqiad, Operations, DBA
jcrespo added a comment to T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run.

One thing that may be relevant here is that the query may have worked only once or twice in the last 6 months due to this underlying issue. The last update that completed successfully was on 27 March. In effect, because of this bug, it wasn't officially disabled, but it was already (for the most part) not working (while still causing issues before the merge).

Thu, May 28, 11:13 AM · Wikimedia-Incident, Wikimedia-database-error, MediaWiki-Special-pages, Wikidata
jcrespo added a comment to T253736: Package transferpy framework under wmfmariadbpy.

Let me think about it. Things are getting more and more complex, with a lot of (mostly unrelated) stuff maintained in the same repo. How would you feel about splitting transferpy out into its own separate repo? Would that make CI (testing and doc generation) as well as packaging easier for you? We can talk about this in today's meeting, but please think about whether that would simplify development for you; we can totally ask for a repo if that helps you.

Thu, May 28, 10:53 AM · Patch-For-Review, DBA

Wed, May 27

jcrespo added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

host localhost

Wed, May 27, 9:30 AM · Patch-For-Review, Upstream, cloud-services-team (Kanban), DBA
jcrespo added a comment to T250602: db1140 (backup source) crashed .

Hi, @wiki_willy I just want to ping you so your team is aware that the maintenance here didn't complete correctly and that we need more onsite help (I don't need this fast, just making sure it doesn't fall under the radar).

Wed, May 27, 8:18 AM · DC-Ops, ops-eqiad, Operations, DBA

Tue, May 26

jcrespo added a comment to T252492: db2097 memory errors leading to crash.
$ ssh db2097.mgmt
User:root logged-in to ILOMXQ91304KD.(10.193.2.204 / FE80::8230:E0FF:FE3E:F9A2)
iLO Standard 1.40 at  Feb 05 2019
Server Name: 
Server Power: Off

host is down and ready for maintenance @Papaul.

Tue, May 26, 10:21 AM · ops-codfw, Operations, DBA
jcrespo added a comment to T252492: db2097 memory errors leading to crash.

I will be onsite tomorrow

Tue, May 26, 10:09 AM · ops-codfw, Operations, DBA

Mon, May 25

jcrespo added a comment to T248256: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN.

I have created a specific column for tasks related to the GSOC, on the DBA project- I think we should use that to classify it under the DBA tag.

Mon, May 25, 1:56 PM · DBA, Google-Summer-of-Code (2020)
jcrespo moved T248256: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN from Triage to GSOC2020 on the DBA board.
Mon, May 25, 1:49 PM · DBA, Google-Summer-of-Code (2020)
jcrespo edited projects for T248256: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN, added: DBA; removed Patch-For-Review.
Mon, May 25, 1:49 PM · DBA, Google-Summer-of-Code (2020)
jcrespo moved T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly from In progress to GSOC2020 on the DBA board.
Mon, May 25, 1:48 PM · Patch-For-Review, DBA
jcrespo moved T252802: Improve output message readabiliy of transfer.py from In progress to GSOC2020 on the DBA board.
Mon, May 25, 1:48 PM · DBA
jcrespo moved T253219: Add more information to --help option of transfer.py from In progress to GSOC2020 on the DBA board.
Mon, May 25, 1:48 PM · Patch-For-Review, DBA
jcrespo moved T252171: Automate the detection of netcat listen port in transfer.py from Next to GSOC2020 on the DBA board.
Mon, May 25, 1:48 PM · DBA
jcrespo moved T253560: Exception raised when setting trivial, but incorrect parameters to transfer.py from Triage to GSOC2020 on the DBA board.
Mon, May 25, 1:47 PM · Patch-For-Review, DBA
jcrespo added a subtask for T248256: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN: T253560: Exception raised when setting trivial, but incorrect parameters to transfer.py.
Mon, May 25, 1:45 PM · DBA, Google-Summer-of-Code (2020)
jcrespo added a parent task for T253560: Exception raised when setting trivial, but incorrect parameters to transfer.py: T248256: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN.
Mon, May 25, 1:45 PM · Patch-For-Review, DBA
jcrespo created T253560: Exception raised when setting trivial, but incorrect parameters to transfer.py.
Mon, May 25, 1:44 PM · Patch-For-Review, DBA
jcrespo added a comment to T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly.

fuser will actually be more elegant than netstat:

Mon, May 25, 1:19 PM · Patch-For-Review, DBA
jcrespo added a comment to T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly.

Sorry, when I said netcat before, I meant netstat. =:-D

Mon, May 25, 1:16 PM · Patch-For-Review, DBA
jcrespo added a comment to T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly.

Starting to work a bit on PIDs and ports for our library of methods would not be a waste of time, as we may want to reuse it later for concurrency handling and error states, not just the integration test. Of course, the immediate need is the error in the test.

Mon, May 25, 9:16 AM · Patch-For-Review, DBA
jcrespo added a comment to T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly.

I see- our command does some piping- this means that it generates a few subprocesses with a single command. I think the right way would be to use netcat to know which processes are listening on a specific port- something that could be a method within the Firewall class, and then send a kill to the pid obtained from netcat. netcat -tlpn should give us a numeric PID so we don't have to work with commands.
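
A minimal sketch of the idea above, assuming the tool meant is netstat (per the later correction) and that its `-tlpn` output has the usual seven-column layout. The function name and the sample output are hypothetical and are not part of transfer.py or its Firewall class:

```python
import re
from typing import Optional

# Illustrative sample of `netstat -tlpn` output; ports and PIDs are made up.
SAMPLE_OUTPUT = """\
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      812/sshd
tcp        0      0 0.0.0.0:4444            0.0.0.0:*               LISTEN      23817/nc
tcp6       0      0 :::22                   :::*                    LISTEN      812/sshd
"""


def find_listening_pid(netstat_output: str, port: int) -> Optional[int]:
    """Return the PID listening on `port`, or None if no row matches."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        local_address, pid_program = fields[3], fields[6]
        # The port is whatever follows the last colon (works for IPv4 and IPv6)
        if local_address.rsplit(":", 1)[-1] != str(port):
            continue
        # The PID column shows "-" when run without enough privileges
        match = re.match(r"(\d+)/", pid_program)
        if match:
            return int(match.group(1))
    return None


pid = find_listening_pid(SAMPLE_OUTPUT, 4444)
```

The numeric PID returned could then be passed to something like `os.kill(pid, signal.SIGTERM)` on the remote host, instead of matching on command strings.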

Mon, May 25, 9:04 AM · Patch-For-Review, DBA

Fri, May 22

jcrespo awarded T243051: A query builder for MediaWiki core a Love token.
Fri, May 22, 3:36 PM · MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Wikimedia-Rdbms, Core Platform Team Workboards (Clinic Duty Team)

Thu, May 21

jcrespo reassigned T250602: db1140 (backup source) crashed from jcrespo to Jclark-ctr.

I cannot reinstall the server because the remote IPMI interface doesn't work (and the ssh and https accesses, which are enabled, don't accept my password). It looks like the password wasn't set up correctly after the reset, but common default user/password combinations don't work either.

Thu, May 21, 3:46 PM · DC-Ops, ops-eqiad, Operations, DBA
jcrespo added a comment to T250602: db1140 (backup source) crashed .

Per my IRC chat with John

Thu, May 21, 1:43 PM · DC-Ops, ops-eqiad, Operations, DBA
jcrespo added a comment to T253219: Add more information to --help option of transfer.py.

How about writing our document with Sphinx?

Thu, May 21, 1:03 PM · Patch-For-Review, DBA
jcrespo added a subtask for T143896: MySQL metrics monitoring: T252761: Degraded performance on parsercache with buster/mariadb upgrade.
Thu, May 21, 10:54 AM · observability, DBA, Patch-For-Review, Operations, Prometheus-metrics-monitoring
jcrespo added a parent task for T252761: Degraded performance on parsercache with buster/mariadb upgrade: T143896: MySQL metrics monitoring.
Thu, May 21, 10:54 AM · DBA
jcrespo added a comment to T252761: Degraded performance on parsercache with buster/mariadb upgrade.

As a follow-up to T161296, we need to research the changes on buster to understand which new metrics make scraping slower, and whether we should enable or disable more metrics for buster.

Thu, May 21, 10:52 AM · DBA
jcrespo closed T161296: Upgrade mysqld_exporter in production, a subtask of T143896: MySQL metrics monitoring, as Resolved.
Thu, May 21, 10:51 AM · observability, DBA, Patch-For-Review, Operations, Prometheus-metrics-monitoring
jcrespo closed T161296: Upgrade mysqld_exporter in production as Resolved.
Thu, May 21, 10:51 AM · Patch-For-Review, DBA, User-fgiunchedi, Operations, Prometheus-metrics-monitoring

Wed, May 20

jcrespo added a comment to T252802: Improve output message readabiliy of transfer.py.

Let's document the --verbose flag before closing this ticket.

Wed, May 20, 4:29 PM · DBA
jcrespo updated subscribers of T252802: Improve output message readabiliy of transfer.py.

@Marostegui I think you'll love what transfer.py looks like now (not yet in production, but available on HEAD) thanks to @Privacybatm's work (no more garbage output):

$ ./transferpy/transfer.py --no-compress --no-encrypt --no-checksum cumin2001.codfw.wmnet:/home/jynus/test_file2 backup1002.eqiad.wmnet:/home/jynus/
ERROR: The final target path /home/jynus/test_file2 already exists on backup1002.eqiad.wmnet.
Wed, May 20, 4:29 PM · DBA
jcrespo added a comment to T252802: Improve output message readabiliy of transfer.py.

Cumin execution details are not useful to the user at any time

Wed, May 20, 4:17 PM · DBA
jcrespo added a comment to T253219: Add more information to --help option of transfer.py.

Very related: see comment T252171#6152787 about making sure that, before we close a ticket with new functionality, it is properly documented :-D on wiki and/or --help.

Wed, May 20, 4:10 PM · Patch-For-Review, DBA
jcrespo added a comment to T252171: Automate the detection of netcat listen port in transfer.py.

^updating the hierarchy to reflect that we need to solve T252950 before closing this :-D. In other words, T252171 depends on T252950.

Wed, May 20, 4:07 PM · DBA
jcrespo removed a subtask for T248256: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN: T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly.
Wed, May 20, 4:06 PM · DBA, Google-Summer-of-Code (2020)
jcrespo added a subtask for T252171: Automate the detection of netcat listen port in transfer.py: T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly.
Wed, May 20, 4:06 PM · DBA
jcrespo edited parent tasks for T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly, added: T252171: Automate the detection of netcat listen port in transfer.py; removed: T248256: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN.
Wed, May 20, 4:06 PM · Patch-For-Review, DBA
jcrespo added a comment to T252171: Automate the detection of netcat listen port in transfer.py.

Aside from solving the issues I mention on the patch, the other thing we should not forget to update is the documentation. This is what it says now:

Wed, May 20, 4:03 PM · DBA
jcrespo added a comment to T252802: Improve output message readabiliy of transfer.py.

I tested this and it can go as is. Question: should we give more information on stdout, perhaps only when using verbose mode, or do you think it is OK as it is?

Wed, May 20, 3:56 PM · DBA
jcrespo added a comment to T252492: db2097 memory errors leading to crash.

No rush on our side, just the day before you are going to the DC for this, let us know so I can stop the server 24h in advance.

Wed, May 20, 2:46 PM · ops-codfw, Operations, DBA
jcrespo updated the task description for T252512: Productionize db114[1-9].
Wed, May 20, 11:35 AM · DBA
jcrespo added a comment to T165348: Check long-running screen/tmux sessions.

I don't want to write more here because it is off topic. I agree with everything you say, but let me go in a different direction:

Wed, May 20, 11:26 AM · Patch-For-Review, observability, Operations
jcrespo created P11244 .my.cnf.
Wed, May 20, 8:41 AM

Tue, May 19

jcrespo added a comment to T164382: Evaluate the need for FORCE INDEX (ls_field_val) [now IGNORE INDEX (ls_log_id)], delete the index hint if not needed anymore.

jcrespo moved this task from Triage to Backlog on the DBA board.

Tue, May 19, 3:59 PM · MediaWiki-Logging, DBA
jcrespo added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

Grants for labsdbuser, which is the default role on both servers for cloud users, are also (almost) the same:

$ diff <(mysql.py -h labsdb1010 -e "show grants for labsdbuser" | sort) <(mysql.py -h labsdb1011 -e "show grants for labsdbuser" | sort)
290a291
> GRANT SELECT, SHOW VIEW ON `grwikiimedia\\_p`.* TO 'labsdbuser'
Tue, May 19, 6:35 AM · Patch-For-Review, Upstream, cloud-services-team (Kanban), DBA
jcrespo added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

One thing I can see is that labsdb1011 uses the new mysql authentication format, meaning:

Tue, May 19, 6:25 AM · Patch-For-Review, Upstream, cloud-services-team (Kanban), DBA
jcrespo added a comment to T250361: replace phabricator db passwords with longer passwords.

@mmodell could we schedule a specific date for this, so it is not forgotten? How much time do you need to prepare T146055? Work on our side is not too time consuming, but maybe yours may take more...

Tue, May 19, 5:29 AM · Phabricator, DBA, Operations

Mon, May 18

jcrespo added a comment to T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly.

I've checked, and both a manual "kill -9" and a "kill -15" should make the port available almost instantly, so it is probably not that. Maybe kill_job doesn't work properly; I will research it in my testing and report back.
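
A rough way to reproduce that check locally, assuming a Linux-like host with Python 3. The listener below is a stand-in child process, not the netcat pipeline that transfer.py launches, and `port_is_free` is a hypothetical helper:

```python
import signal
import socket
import subprocess
import sys

# Child process that listens on a free port, prints the port number, then idles.
LISTENER = (
    "import socket, time; s = socket.socket(); s.bind(('127.0.0.1', 0)); "
    "s.listen(); print(s.getsockname()[1], flush=True); time.sleep(60)"
)


def port_is_free(port: int) -> bool:
    """Try to bind the port; failure means something still holds it."""
    with socket.socket() as probe:
        try:
            probe.bind(("127.0.0.1", port))
            return True
        except OSError:
            return False


proc = subprocess.Popen([sys.executable, "-c", LISTENER],
                        stdout=subprocess.PIPE, text=True)
port = int(proc.stdout.readline())

busy_before = not port_is_free(port)   # the child is still listening
proc.send_signal(signal.SIGTERM)       # equivalent of "kill -15"
proc.wait()                            # returns once the child has exited
proc.stdout.close()
free_after = port_is_free(port)        # the port should be bindable again

print(f"port {port}: busy before kill={busy_before}, free after kill={free_after}")
```

If the port stays busy after the signal in a setup like this, the signal is probably reaching a different process than the one holding the socket, which is consistent with the piping explanation in the later comments.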

Mon, May 18, 6:13 PM · Patch-For-Review, DBA
jcrespo added a comment to T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly.

The reason behind that is that remote_executor.kill_job() does not close the port instantly (it takes more than 30s on my machine).

Mon, May 18, 6:04 PM · Patch-For-Review, DBA
jcrespo added a comment to T252950: kill_job function in remote execution module of transfer framework does not close the ports instantly.

The remote execution module of this framework has kill_job function and it does not kill/close the port used by the netcat instantly. This ticket is to enquire whether it is the expected behaviour or not? If yes, could you please explain a little bit about it?

Mon, May 18, 5:29 PM · Patch-For-Review, DBA
jcrespo added a comment to T252807: More structured cookbooks to reboot hosts.

Of course I'm also expecting some historical context to potentially raise its head here.

Mon, May 18, 1:46 PM · Patch-For-Review, SRE-tools, Operations
jcrespo added a comment to T165348: Check long-running screen/tmux sessions.

For context, I was opposed to this being on icinga (NOT the concept itself) because I was worried about icinga spam and pings from other users stressing SREs. I compromised because Daniel improved (in my opinion) the proposal with the added whitelist and the promise that people were going to be "cool" about them. The whitelist was implemented; the "coolness" factor was known years ago, but not documented for newer SREs.

Mon, May 18, 9:14 AM · Patch-For-Review, observability, Operations
jcrespo added a comment to T165348: Check long-running screen/tmux sessions.

I made an amendment to the policy:

Mon, May 18, 8:57 AM · Patch-For-Review, observability, Operations
jcrespo added a comment to T165348: Check long-running screen/tmux sessions.

@Dhzan I think documenting how one is supposed to use the WARNINGS (to adopt some of my feedback) and document the general idea of what not to worry about (e.g. screens running on databases) would be my criteria to resolve this. I think that is a reasonable request :-D.

Mon, May 18, 8:47 AM · Patch-For-Review, observability, Operations
jcrespo reopened T165348: Check long-running screen/tmux sessions as "Open".

I said that this was going to lead to people annoying other people for things that are non-impacting, and I agreed to the change because I was assured that this was only going to be a tool to detect bad patterns, and that SREs were never going to actively ping other people for just having things running for a few hours (it was considered an issue only if it was left like that for months).

Mon, May 18, 8:24 AM · Patch-For-Review, observability, Operations
jcrespo updated the task description for T162070: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases.
Mon, May 18, 7:58 AM · Patch-For-Review, Operations, DBA

Sat, May 16

jcrespo added a comment to T252952: Wikidata dispatching slow and maxlag high on Wikidata due to db1101 replication lag.

{P11212}

Sat, May 16, 9:46 PM · MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), User-Addshore, Operations, DBA, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata

Fri, May 15

jcrespo closed T251639: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 as Resolved.

Thanks, @Papaul

Fri, May 15, 2:10 PM · DBA, ops-codfw, DC-Ops, Operations
jcrespo added a comment to T248256: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN.

I've added a couple of things to https://wikitech.wikimedia.org/wiki/Transfer.py#Wishlist_and_know_issues as ideas for later work (we don't have to do everything, these are just ideas for improvement)

Fri, May 15, 1:46 PM · DBA, Google-Summer-of-Code (2020)
jcrespo added a comment to T248256: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN.

This doesn't have to be a task (or it can be, up to you), but I realized that there are very few comments in the main files. For example, there is not a single comment in https://github.com/wikimedia/operations-software-wmfmariadbpy/blob/master/transferpy/Firewall.py

Fri, May 15, 1:41 PM · DBA, Google-Summer-of-Code (2020)
jcrespo added a comment to T252172: Refactor transfer.py.

This can be resolved.

Fri, May 15, 1:34 PM · DBA
jcrespo reopened T251639: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 as "Open".

@Papaul see the FAILED above for db2140, as well as the

Fri, May 15, 10:43 AM · DBA, ops-codfw, DC-Ops, Operations
jcrespo added a comment to T162070: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases.

As per T162070#4942720.

Fri, May 15, 10:20 AM · Patch-For-Review, Operations, DBA
jcrespo updated the task description for T162070: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases.
Fri, May 15, 10:20 AM · Patch-For-Review, Operations, DBA
jcrespo awarded T100501: mysql user and group should be a system user/group a Party Time token.
Fri, May 15, 9:47 AM · Patch-For-Review, Operations, DBA
jcrespo closed T100501: mysql user and group should be a system user/group, a subtask of T168356: Prepare mysql hosts for stretch, as Resolved.
Fri, May 15, 9:47 AM · Patch-For-Review, Operations, DBA
jcrespo closed T100501: mysql user and group should be a system user/group as Resolved.

All mysql users are system users.

Fri, May 15, 9:47 AM · Patch-For-Review, Operations, DBA
jcrespo added a comment to T226840: Consistent HTTP 503 Error on some urls for some logged-in users (CentralAuth Set-Cookie storm).

Ping @BBlack to know whether you prefer to make the temporary workaround permanent or to revert it, as per the previous comment, so this can be closed.

Fri, May 15, 9:23 AM · Sustainability (Incident Prevention), Core Platform Team, Patch-For-Review, TimedMediaHandler, MW-1.34-notes (1.34.0-wmf.13; 2019-07-09), Performance-Team (Radar), Traffic, MediaWiki-extensions-CentralAuth, Operations
jcrespo added a comment to T252761: Degraded performance on parsercache with buster/mariadb upgrade.

There is a difference in replication "performance" (pc1010 is spikier and lower):



But the comparison is not fair for the 10.4 host, as it replicates from an intermediate master and thus it replicates serially due to using a conservative replication config.

Fri, May 15, 8:50 AM · DBA
jcrespo added a comment to T252761: Degraded performance on parsercache with buster/mariadb upgrade.

These are the database metrics during the tests (tests were not concurrent between hosts):

Fri, May 15, 7:31 AM · DBA
jcrespo added a comment to T252761: Degraded performance on parsercache with buster/mariadb upgrade.

These are my findings:

                      average latency (ms)  percentile 95 latency (ms)  read requests per second  write requests per second
VERSION               10.1       10.4       10.1          10.4          10.1         10.4         10.1         10.4
ro, low concurrency   5.23       5.83       5.65          6.38          21384.53     19195.52     0.00         0.00
ro                    15.78      15.99      17.06         17.90         56782.46     56027.81     0.00         0.00
mixed rw              19.50      19.34      21.14         21.48         45943.38     46316.48     16408.35     16541.59
rw, high concurrency  4.54       3.48       7.07          4.93          0.00         0.00         141059.31    183871.83
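
To make the 10.1 versus 10.4 comparison easier to read, the relative change in average latency can be computed from the figures in the table above. This is only an arithmetic sketch; the dictionary and function names are illustrative:

```python
# (10.1, 10.4) average latency in ms, copied from the table above.
avg_latency_ms = {
    "ro, low concurrency": (5.23, 5.83),
    "ro": (15.78, 15.99),
    "mixed rw": (19.50, 19.34),
    "rw, high concurrency": (4.54, 3.48),
}


def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new; positive means new is larger."""
    return (new - old) / old * 100.0


changes = {
    name: round(relative_change(old, new), 1)
    for name, (old, new) in avg_latency_ms.items()
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}% average latency on 10.4 vs 10.1")
```

A positive value means 10.4 was slower for that workload and a negative value means it was faster; the same function can be applied to the other column pairs.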
Fri, May 15, 7:15 AM · DBA
jcrespo added a comment to T252761: Degraded performance on parsercache with buster/mariadb upgrade.

For memory-only, read-only traffic, the regression seems to be only about 5%, which was around what we expected. Note pc1010 had a 13% average extra latency from the client, so it is within the margin of error (the test had to be done over the network to prevent software version differences).

Fri, May 15, 6:07 AM · DBA

Thu, May 14

jcrespo added a comment to T252807: More structured cookbooks to reboot hosts.

Quick stupid idea - 1) Insert hook after downtime for custom code. 2) Have a configured way to tell which hosts load which class in the hierarchy, be it an abort "this host should never be rebooted", or some other functionality "depool from pybal". 3) Start writing reboot modules for all hosts until complete coverage. 4) Profit!
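
The shape of that idea could be sketched roughly as follows. Every class name, host prefix, and step string here is hypothetical, chosen only for illustration; this is not the cookbook or Spicerack API:

```python
class RebootPolicy:
    """Default behaviour: no custom steps around the reboot."""

    def pre_reboot(self, host: str) -> list:
        return []

    def post_reboot(self, host: str) -> list:
        return []


class NeverReboot(RebootPolicy):
    """Abort: hosts in this class must not be rebooted automatically."""

    def pre_reboot(self, host: str) -> list:
        raise RuntimeError(f"{host} should never be rebooted")


class DepoolFromPybal(RebootPolicy):
    """Take the host out of rotation before the reboot and back in after."""

    def pre_reboot(self, host: str) -> list:
        return [f"depool {host} from pybal"]

    def post_reboot(self, host: str) -> list:
        return [f"repool {host} in pybal"]


# Hypothetical configuration: which hosts load which class (step 2).
POLICIES = {"fragile": NeverReboot, "web": DepoolFromPybal}


def policy_for(host: str) -> RebootPolicy:
    for prefix, policy_class in POLICIES.items():
        if host.startswith(prefix):
            return policy_class()
    return RebootPolicy()


def plan_reboot(host: str) -> list:
    policy = policy_for(host)
    steps = ["set downtime"]
    steps += policy.pre_reboot(host)   # hook after downtime (step 1)
    steps.append("reboot")
    steps += policy.post_reboot(host)
    return steps


print(plan_reboot("web1001"))
```

Writing one subclass per host type until every host is covered corresponds to step 3 of the comment.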

Thu, May 14, 7:11 PM · Patch-For-Review, SRE-tools, Operations
jcrespo added a comment to T252171: Automate the detection of netcat listen port in transfer.py.

Regarding the ss issue: I was able to reproduce this:

Thu, May 14, 4:13 PM · DBA
jcrespo added a comment to T252172: Refactor transfer.py.

Not directly related to refactoring, but I thought this would be very interesting for you in general:

Thu, May 14, 4:03 PM · DBA
jcrespo added a comment to T252761: Degraded performance on parsercache with buster/mariadb upgrade.

Maybe the sysbench result can give us a better picture of how this is affecting mysql query latency itself (if it is really doing so)

Thu, May 14, 3:25 PM · DBA
jcrespo added a comment to T252761: Degraded performance on parsercache with buster/mariadb upgrade.

From what I see we did io benchmarks. I would like to know if real sql queries are affected, maybe MariaDB, on memory-limited hosts with loose disk consistency (pc) now generate more io (but that is ok, if it means no extra client latency). I proposed to do some sysbench (sql) of write intensive queries and see if there is a difference between pc hosts with 10.1 and those with 10.4. If it is a metrics/db behaviour change but doesn't really impact queries, we can ignore it (resolve).

Thu, May 14, 2:07 PM · DBA
jcrespo added a comment to T252172: Refactor transfer.py.

I don't know if I understood this correctly.

Thu, May 14, 1:16 PM · DBA
jcrespo triaged T252172: Refactor transfer.py as Medium priority.

Great job here! I looked at every line of the change, and tested it on several runs and it worked nicely. This change, I think, will make further development much easier. You did a lot of work on refactoring- I liked the way you resolved the dependency inversion on the subclasses. I merged as is.

Thu, May 14, 9:50 AM · DBA

Wed, May 13

jcrespo placed T244884: Implement logic to be able to perform full and incremental backups of ES hosts up for grabs.
Wed, May 13, 5:38 PM · Patch-For-Review, Operations, DBA
jcrespo closed T252698: Lost password for wikimedia-vn mailling list as Resolved.

Happy to be helpful. Have a nice day!

Wed, May 13, 5:27 PM · Operations, Wikimedia-Mailing-lists
jcrespo added a comment to T252698: Lost password for wikimedia-vn mailling list.

I've reset it already, let me know when you receive it and change the password to something else (mail is not a very secure method of sending passwords).

Wed, May 13, 5:20 PM · Operations, Wikimedia-Mailing-lists
jcrespo claimed T252698: Lost password for wikimedia-vn mailling list.
Wed, May 13, 5:05 PM · Operations, Wikimedia-Mailing-lists
jcrespo added a comment to T252698: Lost password for wikimedia-vn mailling list.

no one willing to handle this list

Wed, May 13, 5:05 PM · Operations, Wikimedia-Mailing-lists
jcrespo added a comment to T252698: Lost password for wikimedia-vn mailling list.

Hi, @minhhuy you still have control of the email account associated with that list, right? I can force a password reset for you.

Wed, May 13, 4:58 PM · Operations, Wikimedia-Mailing-lists
jcrespo added a comment to T119173: RFC: Discourage use of MySQL's ENUM type.

Independently of the "strength", I think it could be misunderstood, the same way many people now think "all primary keys should be autoincremental integers" instead of "if there are no good options for a PK, just add a new autoinc".

Wed, May 13, 4:51 PM · Performance-Team (Radar), TechCom-RFC, MediaWiki-General, DBA
jcrespo awarded T247728: Events set to SLAVESIDE_DISABLED when upgrading from 10.1 to 10.4 an Orange Medal token.
Wed, May 13, 4:33 PM · Upstream, DBA
jcrespo added a comment to T250666: Upgrade WMF database-and-backup-related hosts to buster.

@Marostegui we had a replication breakage from m1 master (10.1) to db2078 (10.4). T251222#6133759

Wed, May 13, 4:32 PM · Epic, DBA
jcrespo added a comment to T251222: Upgrade LibreNMS to 1.63.
Error 'Row size too large. The maximum row size for the used table type, not counting BLOBs, is 8126. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs' on query. Default database: 'librenms'. Query: 'alter table `ports` add `ifSpeed_prev` bigint null after `ifSpeed`, add `ifHighSpeed_prev` int null after `ifHighSpeed`'
Wed, May 13, 3:05 PM · User-fgiunchedi, observability, Operations, netops
jcrespo reopened T251222: Upgrade LibreNMS to 1.63 as "Open".
[14:57] <icinga-wm> PROBLEM - MariaDB Slave SQL: m1 on db2078 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1118, Errmsg: Error Row size too large. The maximum row size for the used table type, not counting BLOBs, is 8126. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs on query. Default database: librenms. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshoo
[14:57] <icinga-wm> a_slave
Wed, May 13, 3:02 PM · User-fgiunchedi, observability, Operations, netops
jcrespo triaged T252679: Move paging from individual databases to database service "groups" as Medium priority.
Wed, May 13, 2:53 PM · Epic, observability, DBA
jcrespo created T252679: Move paging from individual databases to database service "groups".
Wed, May 13, 2:53 PM · Epic, observability, DBA
jcrespo closed T177778: Improve database application performance monitoring visibility, a subtask of T143896: MySQL metrics monitoring, as Resolved.
Wed, May 13, 2:46 PM · observability, DBA, Patch-For-Review, Operations, Prometheus-metrics-monitoring
jcrespo closed T177778: Improve database application performance monitoring visibility, a subtask of T172492: Improve database alerting (tracking), as Resolved.
Wed, May 13, 2:46 PM · Epic, observability, DBA