
zhuyifei1999
*Not* Serious business title.

User Details

User Since
Oct 13 2014, 10:19 AM (231 w, 2 d)
Availability
Available
IRC Nick
zhuyifei1999
LDAP User
Zhuyifei1999
MediaWiki User
Zhuyifei1999 [ Global Accounts ]


Recent Activity

Sat, Mar 16

zhuyifei1999 added a comment to T218468: Quarry is down with 502 Bad Gateway message.

Could it be related to T217280?

Sat, Mar 16, 1:38 PM · Quarry

Fri, Mar 15

zhuyifei1999 added a comment to T217908: Don't update pywikibot directly from master but from last published tag.

$(git tag | tail -1)
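
One caveat with the one-liner above: `git tag` lists tags in lexical order by default, so `tail -1` may not return the newest release once a version component reaches two digits. A minimal sketch of a version-aware pick follows, assuming tags are dotted numbers with an optional leading "v". The tag names and the `latest_tag` helper are made up for illustration.

```python
import re


def latest_tag(tags):
    """Return the tag with the highest dotted-number version, or None."""
    def key(tag):
        # Hypothetical tag format: optional leading "v", then dotted integers.
        return tuple(int(part) for part in re.findall(r"\d+", tag))

    versioned = [t for t in tags if re.fullmatch(r"v?\d+(\.\d+)*", t)]
    return max(versioned, key=key) if versioned else None


tags = ["3.0.9", "3.0.10", "3.0.2"]  # made-up tag names
print(sorted(tags)[-1])  # lexical pick, like `git tag | tail -1`
print(latest_tag(tags))  # version-aware pick
```

Staying in the shell, `git tag --sort=version:refname | tail -1` should give the same version-aware ordering.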

Fri, Mar 15, 8:33 PM · Pywikibot, PAWS

Fri, Mar 8

zhuyifei1999 added a comment to T216840: Could not move linter_counts_dump tool to a continuous job .

It showed the error Permission denied (publickey, hostbased).

Fri, Mar 8, 7:44 PM · Kubernetes, Toolforge
zhuyifei1999 added a comment to T217838: Toolforge Stretch - Increased LDAP utilization.

Now stracing journald to see if that is indeed asking for groups all the time.

Fri, Mar 8, 1:40 AM · LDAP, Toolforge
zhuyifei1999 added a comment to T217838: Toolforge Stretch - Increased LDAP utilization.
root@tools-sgeexec-0914:~# strace -s 1024 -p 543 -p 560 -p 561 -p 562 -p 563 -p 564 -p 568 -p 569 -p 570 -p 571 -p 22727 -p 14963 -t 2>&1 | grep nslcd | while read t; do TID=$(echo "$t" | awk '{print $2}' | tr -d ']'); kill -STOP $TID; lsof +E -aUp 543; kill -CONT $TID; break; done
COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
systemd       1 root   49u  unix 0xffff88c271f91000      0t0     1649 /run/systemd/journal/dev-log type=DGRAM ->INO=14432 543,nscd,5u
systemd-j   228 root    6u  unix 0xffff88c271f91000      0t0     1649 /run/systemd/journal/dev-log type=DGRAM ->INO=14432 543,nscd,5u
nscd        543 root    5u  unix 0xffff88c275681400      0t0    14432 type=DGRAM ->INO=1649 228,systemd-j,6u 1,systemd,49u
nscd        543 root   14u  unix 0xffff88c2735eb800      0t0    14453 /var/run/nscd/socket type=STREAM
nscd        543 root   16u  unix 0xffff88c244448800      0t0 14111659 /var/run/nscd/socket type=STREAM ->INO=14122005 17658,sudo,10u
nscd        543 root   17u  unix 0xffff88c1a902bc00      0t0 14120443 type=STREAM
sudo      17658 root   10u  unix 0xffff88c224571c00      0t0 14122005 type=STREAM ->INO=14111659 543,nscd,16u
^C
root@tools-sgeexec-0914:~# strace -s 1024 -p 543 -p 560 -p 561 -p 562 -p 563 -p 564 -p 568 -p 569 -p 570 -p 571 -p 22727 -p 14963 -t 2>&1 | grep nslcd | while read t; do TID=$(echo "$t" | awk '{print $2}' | tr -d ']'); kill -STOP $TID; lsof +E -aUp 543; kill -CONT $TID; break; done
COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
systemd     1 root   49u  unix 0xffff88c271f91000      0t0     1649 /run/systemd/journal/dev-log type=DGRAM ->INO=14432 543,nscd,5u
systemd-j 228 root    6u  unix 0xffff88c271f91000      0t0     1649 /run/systemd/journal/dev-log type=DGRAM ->INO=14432 543,nscd,5u
nscd      543 root    5u  unix 0xffff88c275681400      0t0    14432 type=DGRAM ->INO=1649 228,systemd-j,6u 1,systemd,49u
nscd      543 root   14u  unix 0xffff88c2735eb800      0t0    14453 /var/run/nscd/socket type=STREAM
nscd      543 root   16u  unix 0xffff88c27433ac00      0t0 14128549 type=STREAM
^C
Fri, Mar 8, 1:31 AM · LDAP, Toolforge
zhuyifei1999 added a comment to T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients.
root@tools-sgeexec-0914:~# strace -s 1024 -p 543 -p 560 -p 561 -p 562 -p 563 -p 564 -p 568 -p 569 -p 570 -p 571 -p 22727 -p 14963 -t 2>&1 | grep nslcd | while read t; do TID=$(echo "$t" | awk '{print $2}' | tr -d ']'); kill -STOP $TID; lsof +E -aUp 543; kill -CONT $TID; break; done
COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
systemd       1 root   49u  unix 0xffff88c271f91000      0t0     1649 /run/systemd/journal/dev-log type=DGRAM ->INO=14432 543,nscd,5u
systemd-j   228 root    6u  unix 0xffff88c271f91000      0t0     1649 /run/systemd/journal/dev-log type=DGRAM ->INO=14432 543,nscd,5u
nscd        543 root    5u  unix 0xffff88c275681400      0t0    14432 type=DGRAM ->INO=1649 228,systemd-j,6u 1,systemd,49u
nscd        543 root   14u  unix 0xffff88c2735eb800      0t0    14453 /var/run/nscd/socket type=STREAM
nscd        543 root   16u  unix 0xffff88c244448800      0t0 14111659 /var/run/nscd/socket type=STREAM ->INO=14122005 17658,sudo,10u
nscd        543 root   17u  unix 0xffff88c1a902bc00      0t0 14120443 type=STREAM
sudo      17658 root   10u  unix 0xffff88c224571c00      0t0 14122005 type=STREAM ->INO=14111659 543,nscd,16u
^C
root@tools-sgeexec-0914:~# strace -s 1024 -p 543 -p 560 -p 561 -p 562 -p 563 -p 564 -p 568 -p 569 -p 570 -p 571 -p 22727 -p 14963 -t 2>&1 | grep nslcd | while read t; do TID=$(echo "$t" | awk '{print $2}' | tr -d ']'); kill -STOP $TID; lsof +E -aUp 543; kill -CONT $TID; break; done
COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
systemd     1 root   49u  unix 0xffff88c271f91000      0t0     1649 /run/systemd/journal/dev-log type=DGRAM ->INO=14432 543,nscd,5u
systemd-j 228 root    6u  unix 0xffff88c271f91000      0t0     1649 /run/systemd/journal/dev-log type=DGRAM ->INO=14432 543,nscd,5u
nscd      543 root    5u  unix 0xffff88c275681400      0t0    14432 type=DGRAM ->INO=1649 228,systemd-j,6u 1,systemd,49u
nscd      543 root   14u  unix 0xffff88c2735eb800      0t0    14453 /var/run/nscd/socket type=STREAM
nscd      543 root   16u  unix 0xffff88c27433ac00      0t0 14128549 type=STREAM
^C
Fri, Mar 8, 1:29 AM · Patch-For-Review, Operations, Cloud-VPS, LDAP, Toolforge
zhuyifei1999 added a comment to T217838: Toolforge Stretch - Increased LDAP utilization.

These group requests to nscd account for a minority of requests to nscd.

Fri, Mar 8, 12:57 AM · LDAP, Toolforge
zhuyifei1999 added a comment to T217838: Toolforge Stretch - Increased LDAP utilization.

I'm checking tools-sgeexec-0914 (which according to T217280#5008363 is the worst host) with my hammer, so:

root@tools-sgeexec-0914:~# strace -p 694 -p 726 -p 727 -p 728 -p 729 -p 730 -s 1024 -e getsockopt -t
strace: Process 694 attached
[...]
strace: Process 730 attached
[pid   728] 00:45:04 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
[pid   726] 00:45:11 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=697, uid=600, gid=600}, [12]) = 0
[pid   729] 00:45:11 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=11136, uid=600, gid=600}, [12]) = 0
[pid   728] 00:45:16 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=11136, uid=600, gid=600}, [12]) = 0
[pid   729] 00:45:18 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
[pid   729] 00:45:18 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
[pid   729] 00:45:18 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
[pid   729] 00:45:18 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
[pid   729] 00:45:57 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
[pid   729] 00:46:01 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
[pid   728] 00:46:04 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
[pid   730] 00:46:12 getsockopt(10, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
[pid   729] 00:46:19 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
[pid   729] 00:47:04 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0
^Cstrace: Process 694 detached
[...]
strace: Process 730 detached
root@tools-sgeexec-0914:~# ps u 543
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       543  0.0  0.0 1536700 4340 ?        Ssl  Feb25   0:49 /usr/sbin/nscd

nscd accounts for the vast majority of requests to nslcd.
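
The tally behind that statement can be automated. Here is a rough sketch that counts SO_PEERCRED lookups per peer PID, assuming the strace line format shown above. The `tally_peers` helper is hypothetical, and the sample lines are copied from the trace.

```python
import re
from collections import Counter

# Matches the peer credentials strace prints for SO_PEERCRED.
PEER_RE = re.compile(r"SO_PEERCRED, \{pid=(\d+), uid=(\d+), gid=(\d+)\}")


def tally_peers(lines):
    """Count SO_PEERCRED lookups per peer PID in strace output lines."""
    counts = Counter()
    for line in lines:
        match = PEER_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "[pid   728] 00:45:04 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0",
    "[pid   726] 00:45:11 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=697, uid=600, gid=600}, [12]) = 0",
    "[pid   729] 00:45:18 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=543, uid=0, gid=0}, [12]) = 0",
]
for pid, count in tally_peers(sample).most_common():
    print(pid, count)
```

Of the 14 getsockopt lines shown in the trace, 11 have pid=543, which is nscd.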

Fri, Mar 8, 12:54 AM · LDAP, Toolforge
zhuyifei1999 created P8169 (An Untitled Masterwork).
Fri, Mar 8, 12:54 AM

Thu, Mar 7

zhuyifei1999 added a comment to T217406: Stretch grid problem: cannot migrate tomcat webservice.
tools.zhuyifei1999-test@tools-sgebastion-08:~$ setup-tomcat
Setting up your public_tomcat directory...
All done.
You can edit the configuration in /data/project/zhuyifei1999-test/public_tomcat/conf/server.xml as needed.
tools.zhuyifei1999-test@tools-sgebastion-08:~$ tree public_tomcat
public_tomcat
├── bin
│   ├── setenv.sh
│   ├── shutdown.sh
│   └── startup.sh
├── conf
│   ├── catalina.properties
│   ├── context.xml
│   ├── jaspic-providers.xml
│   ├── logging.properties
│   ├── server.xml
│   ├── tomcat-users.xml
│   └── web.xml
├── logs
├── temp
├── webapps
└── work
Thu, Mar 7, 12:44 AM · Patch-For-Review, cloud-services-team, Toolforge

Tue, Mar 5

zhuyifei1999 added a comment to T217639: Should novices patches run full tests?.

Historically, CI was insecure. The ability to run full tests means that you can run arbitrary code on it, and historically CI test runners were not as 'isolated' as they currently are, so you had all sorts of opportunities to hijack the CI servers... a whitelist was needed. This was worked on in the CI isolation project, and now we do have nice isolation. I don't know why the whitelist was not removed, considering it was part of the long-term plan. (Perhaps to prevent, say, DoS attacks?)

Tue, Mar 5, 7:05 PM · Pywikibot-RfCs, Pywikibot

Sun, Mar 3

zhuyifei1999 added a project to T217501: Page banner of Wikivoyage can not automatically convert the problem of tranditional or simplified Chinese: Wikidata-Page-Banner.
Sun, Mar 3, 6:02 AM · Language-Team, Readers-Web-Backlog (Tracking), Reading-Web-Local-Wiki-Issues, Wikidata-Page-Banner, Chinese-Sites

Sat, Mar 2

zhuyifei1999 closed T114734: [Spike] Title and TOC not converted for Wikidata page banner language variants as Resolved.

No longer reproducible. The TOC stays unconverted, but the tags are stripped.

Sat, Mar 2, 5:35 PM · Readers-Web-Backlog (Tracking), I18n, Need-volunteer, Spike, Chinese-Sites, MW-1.27-release (WMF-deploy-2015-10-13_(1.27.0-wmf.3)), MediaWiki-Language-converter, Wikidata-Page-Banner, Wikidata

Thu, Feb 28

zhuyifei1999 renamed T217297: Manual page of jsub is unclear regarding what -once means from Manual page of jsub has two defaults, but only one can be default to Manual page of jsub is unclear regarding what -once means.
Thu, Feb 28, 7:37 PM · Patch-For-Review, Toolforge, Documentation
zhuyifei1999 closed T217297: Manual page of jsub is unclear regarding what -once means as Resolved.
07:35:31 0 ✓ zhuyifei1999@tools-bastion-02: ~$ man jsub | grep once
       -once  Only start one job with that name, fail if another job with the same name is already started or queued (default  if  invoked
07:35:38 0 ✓ zhuyifei1999@tools-bastion-02: ~$ jsub --help | grep once
  -once         Only start one job, fail if another job with the same name is
Thu, Feb 28, 7:37 PM · Patch-For-Review, Toolforge, Documentation
zhuyifei1999 added a comment to T217297: Manual page of jsub is unclear regarding what -once means.

How was this building before? Now tests fail...

## ------------------------- ##
## toollabs 1.36 test suite. ##
## ------------------------- ##
  1: Normal call                                     FAILED (testsuite.at:64)
  2: Quiet call                                      FAILED (testsuite.at:68)
  3: -o points to a non-existing file                FAILED (testsuite.at:74)
  4: -o points to a existing file                    FAILED (testsuite.at:84)
  5: -o points to a non-existing file and -umask is used FAILED (testsuite.at:92)
  6: -o points to a existing file and -umask is used FAILED (testsuite.at:102)
  7: -o points to a existing directory               FAILED (testsuite.at:111)
  8: .jsubrc is honoured                             FAILED (testsuite.at:120)
  9: .jsubrc options are overwritten by command line arguments FAILED (testsuite.at:133)
 10: -l is exploded                                  FAILED (testsuite.at:144)
 11: -l h_vmem is processed                          FAILED (testsuite.at:148)
 12: -l largest wins (virtual_free)                  FAILED (testsuite.at:152)
 13: -l largest wins (h_vmem)                        FAILED (testsuite.at:156)
 14: -l largest wins (default)                       FAILED (testsuite.at:160)
Thu, Feb 28, 7:07 PM · Patch-For-Review, Toolforge, Documentation
zhuyifei1999 added a comment to T217297: Manual page of jsub is unclear regarding what -once means.

since continuous or not is surely mutually exclusive.

Thu, Feb 28, 2:42 PM · Patch-For-Review, Toolforge, Documentation
zhuyifei1999 added a comment to T217297: Manual page of jsub is unclear regarding what -once means.

I also failed to find the command that submits the job in tools.persondata's crontab.

Thu, Feb 28, 7:54 AM · Patch-For-Review, Toolforge, Documentation
zhuyifei1999 added a comment to T217297: Manual page of jsub is unclear regarding what -once means.

My job had "Task / Running".

Thu, Feb 28, 7:45 AM · Patch-For-Review, Toolforge, Documentation
zhuyifei1999 added a comment to T217297: Manual page of jsub is unclear regarding what -once means.

Could you explain your reasoning on why you doubt that? Maybe we can clarify the docs a bit.

Thu, Feb 28, 12:00 AM · Patch-For-Review, Toolforge, Documentation

Wed, Feb 27

zhuyifei1999 added a comment to T217297: Manual page of jsub is unclear regarding what -once means.

So, both -once and -continuous seem to be the default when I start a job with jstart.

Wed, Feb 27, 11:59 PM · Patch-For-Review, Toolforge, Documentation
zhuyifei1999 added a comment to T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients.
08:04:37 0 ✓ zhuyifei1999@tools-sgebastion-07: ~$ getent group tools.dexbot
tools.dexbot:*:51100:ladsgroup
08:04:41 0 ✓ zhuyifei1999@tools-sgebastion-07: ~$ sudo become dexbot
tools.dexbot@tools-sgebastion-07:~$
Wed, Feb 27, 8:05 PM · Patch-For-Review, Operations, Cloud-VPS, LDAP, Toolforge
zhuyifei1999 removed a project from T217246: cronjob error mails: Cloud-VPS.
Wed, Feb 27, 5:11 PM · Tools
zhuyifei1999 added a comment to T217246: cronjob error mails.

You have a crontab installed on a wrong host:

$ diff <(ssh tools-sgeexec-0940.tools.eqiad.wmflabs 'sudo cat /var/spool/cron/crontabs/tools.genedb') <(ssh tools-sgecron-01.tools.eqiad.wmflabs 'sudo cat /var/spool/cron/crontabs/tools.genedb')
2c2
< # (/data/project/genedb/cron.tab installed on Wed Feb 27 12:49:24 2019)
---
> # (- installed on Wed Feb 27 12:49:54 2019)
Wed, Feb 27, 5:11 PM · Tools
zhuyifei1999 added a comment to T176027: Tools with "_" in their name or names longer than 63 characters do not get Kubernetes namespaces created.
Feb 27 16:26:48 tools-k8s-master-01 maintain-kubeusers[16561]: Homedir already exists for /data/project/whichsub
Feb 27 16:26:48 tools-k8s-master-01 maintain-kubeusers[16561]: Wrote config in /data/project/whichsub/.kube/config
Feb 27 16:26:48 tools-k8s-master-01 maintain-kubeusers[16561]: (b'namespace "whichsub" created\n', b'')
Feb 27 16:26:48 tools-k8s-master-01 maintain-kubeusers[16561]: Provisioned creds for tool whichsub
Wed, Feb 27, 4:30 PM · Kubernetes, Toolforge
zhuyifei1999 added a comment to T176027: Tools with "_" in their name or names longer than 63 characters do not get Kubernetes namespaces created.
04:21:44 0 ✓ zhuyifei1999@tools-k8s-master-01: ~$ sudo rm ~tools.whichsub/.kube/config ~tools.permission-denied-test/.kube/config
rm: cannot remove ‘/data/project/permission-denied-test/.kube/config’: Operation not permitted
Wed, Feb 27, 4:24 PM · Kubernetes, Toolforge
zhuyifei1999 added a comment to T217152: Monitor and scale in the Trusty grid.

Nobody is using this.

Wed, Feb 27, 7:08 AM · Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban)

Tue, Feb 26

zhuyifei1999 added a comment to T134495: Create a "my first Pywikibot bot" tutorial for Toolforge.

both pretty good

Tue, Feb 26, 10:27 PM · User-srodlund, Pywikibot, Pywikibot-Documentation, Toolforge, Community-Tech-Tool-Labs, Documentation

Mon, Feb 25

zhuyifei1999 added a comment to T217086: Investigate why the new Son of Grid Engine grid landed in a worse state when NFS was filled than the old Sun Grid Engine grid did.

The console log of 0905 was unhelpful. From the last boot to my reboot, the only thing on there was this (just failing jobs, it looks like, which isn't abnormal):

Mon, Feb 25, 10:22 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
zhuyifei1999 added a comment to T217086: Investigate why the new Son of Grid Engine grid landed in a worse state when NFS was filled than the old Sun Grid Engine grid did.
Feb 25 10:46:57 tools-sgeexec-0905 puppet-agent[32403]: (/Stage[main]/Role::Labs::Nfsclient/Labstore::Nfs_mount[tools-home-on-labstore-secondary]/Exec[create-/mnt/nfs/labstore-secondary-tools-home]/returns) /bin/mkdir: cannot create directory ‘/mnt/nfs/labstore-secondary-tools-home’: File exists
Mon, Feb 25, 9:45 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
zhuyifei1999 added a comment to T217086: Investigate why the new Son of Grid Engine grid landed in a worse state when NFS was filled than the old Sun Grid Engine grid did.

The HBA doesn't affect user logins, just grid functioning. Also, it doesn't need to reconstruct it on every puppet run for it to continue working.

Mon, Feb 25, 9:35 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
zhuyifei1999 added a comment to T217086: Investigate why the new Son of Grid Engine grid landed in a worse state when NFS was filled than the old Sun Grid Engine grid did.
Feb 25 19:11:10 tools-sgeexec-0904 puppet-agent[29329]: (/Stage[main]/Profile::Toolforge::Grid::Hba/Exec[make-access]/onlyif) Check "/usr/bin/test -n \"$(/usr/bin/find /data/project/.system_sge/store -maxdepth 1 \\( -type d -or -type f -name submithost-\\* \\) -newer /etc/project.access)\" -o ! -s /etc/project.access" exceeded timeout
Feb 25 10:33:51 tools-sgeexec-0904 kernel: [3042432.250199] nfs: server nfs-tools-project.svc.eqiad.wmnet not responding, still trying
Feb 25 10:33:52 tools-sgeexec-0904 kernel: [3042433.498050] nfs: server nfs-tools-project.svc.eqiad.wmnet not responding, timed out
Feb 25 10:33:52 tools-sgeexec-0904 kernel: [3042433.498073] nfs: server nfs-tools-project.svc.eqiad.wmnet not responding, still trying
Mon, Feb 25, 9:21 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
zhuyifei1999 added a comment to T216988: labstore1004 - DISK CRITICAL - free space: /srv/tools 115904 MB (1% inode=79%):.

An older ticket on disk space: T206239: 2018-10-04: tools and NFS share cleanup (high usage)

Mon, Feb 25, 9:24 AM · cloud-services-team (Kanban), Data-Services

Sun, Feb 24

zhuyifei1999 awarded T216340: Raise memory limit for copyvios tool's k8s webservice a Like token.
Sun, Feb 24, 7:32 AM · Toolforge

Sat, Feb 23

zhuyifei1999 renamed T202825: Install a PHP profiling extension for k8s and stretch grid webservices in Toolforge from Install a PHP profiling extension for k8s webservices stretch grid in Toolforge to Install a PHP profiling extension for k8s and stretch grid webservices in Toolforge.
Sat, Feb 23, 6:35 AM · cloud-services-team (Kanban), Patch-For-Review, Toolforge

Thu, Feb 21

zhuyifei1999 added a comment to T216741: MySQL page generator throws error on sock.close() on toolforge.

Is https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_Flask_OAuth_tool a bad tutorial in your opinion, or just not relevant to your particular problem?

Thu, Feb 21, 7:35 PM · Pywikibot-pagegenerators.py, Pywikibot, Toolforge
zhuyifei1999 added a comment to T216268: Custom ruby interpreter compiled on Trusty can't find libssl.so.1.0.0 on Stretch.

A vanilla systemd tmp.mount uses Options=mode=1777,strictatime,nosuid,nodev. I agree that noexec is excessive here.

Thu, Feb 21, 7:13 PM · Tools, Toolforge
zhuyifei1999 added a comment to T216268: Custom ruby interpreter compiled on Trusty can't find libssl.so.1.0.0 on Stretch.

The second is that notice about /tmp and noexec. I saw that when testing some venv things a few days ago as well. I was just looking at tools-sgebastion-07 and that mount restriction does not seem to be active there? I can't remember what host I was on when I got slightly tripped up by it.

Thu, Feb 21, 7:07 PM · Tools, Toolforge
zhuyifei1999 added a comment to T216741: MySQL page generator throws error on sock.close() on toolforge.

The fix was https://github.com/PyMySQL/PyMySQL/commit/0e01158fb8a204144c5adddde983bea2b3e4ff93, part of v0.7.11 release.

Thu, Feb 21, 6:42 PM · Pywikibot-pagegenerators.py, Pywikibot, Toolforge
zhuyifei1999 added a comment to T216741: MySQL page generator throws error on sock.close() on toolforge.

If close happened due to 'MySQL server has gone away', we should have received an OperationalError.

Thu, Feb 21, 6:40 PM · Pywikibot-pagegenerators.py, Pywikibot, Toolforge
zhuyifei1999 added a comment to T216741: MySQL page generator throws error on sock.close() on toolforge.

If close happened due to 'MySQL server has gone away', we should have received an OperationalError.

Thu, Feb 21, 6:35 PM · Pywikibot-pagegenerators.py, Pywikibot, Toolforge
zhuyifei1999 added a comment to T216741: MySQL page generator throws error on sock.close() on toolforge.
def _write_bytes(self, data):
    self._sock.settimeout(self._write_timeout)
    try:
        self._sock.sendall(data)
    except IOError as e:
        self._force_close()
        raise err.OperationalError(
            CR.CR_SERVER_GONE_ERROR,
            "MySQL server has gone away (%r)" % (e,))
Thu, Feb 21, 6:34 PM · Pywikibot-pagegenerators.py, Pywikibot, Toolforge
zhuyifei1999 added a comment to T216741: MySQL page generator throws error on sock.close() on toolforge.
def close(self):
    """Send the quit message and close the socket"""
    if self._closed:
        raise err.Error("Already closed")
    self._closed = True
    if self._sock is None:
        return
    send_data = struct.pack('<iB', 1, COMMAND.COM_QUIT)
    try:
        self._write_bytes(send_data)
    except Exception:
        pass
    finally:
        sock = self._sock
        self._sock = None
        self._rfile = None
        sock.close()

sock.close() complains about sock being None. Since sock = self._sock, self._sock must have been None at that point. But self._sock was checked in if self._sock is None:, so it must not have been None at that time. So this state must have been changed either in another thread (unlikely) or in self._write_bytes.
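
The chain of events described above can be reproduced with a small stand-in. This is a sketch, not PyMySQL itself: `FakeSocket`, `Conn` and `reproduce` are hypothetical, and the assumption that `_force_close()` resets `self._sock` to `None` is not shown in the quoted code.

```python
class FakeSocket:
    """Hypothetical socket whose write always fails."""

    def sendall(self, data):
        raise IOError("broken pipe")

    def close(self):
        pass


class Conn:
    """Hypothetical stand-in mirroring the control flow of the quoted close()."""

    def __init__(self):
        self._sock = FakeSocket()
        self._closed = False

    def _force_close(self):
        # Assumption: a forced close drops the socket reference.
        if self._sock is not None:
            self._sock.close()
        self._sock = None

    def _write_bytes(self, data):
        try:
            self._sock.sendall(data)
        except IOError as e:
            self._force_close()
            raise RuntimeError("server has gone away (%r)" % (e,))

    def close(self):
        self._closed = True
        if self._sock is None:
            return
        try:
            self._write_bytes(b"quit")
        except Exception:
            pass  # the write error is swallowed here
        finally:
            sock = self._sock  # None by now: _force_close() already ran
            self._sock = None
            sock.close()  # fails: None has no close()


def reproduce():
    """Return the name of the exception raised by close(), or None."""
    try:
        Conn().close()
    except Exception as exc:
        return type(exc).__name__
    return None


print(reproduce())
```

Under these assumptions no second thread is needed: the failed write alone clears the socket between the `is None` check and the `finally` block.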

Thu, Feb 21, 6:32 PM · Pywikibot-pagegenerators.py, Pywikibot, Toolforge
zhuyifei1999 added a comment to T216741: MySQL page generator throws error on sock.close() on toolforge.

Which host (or via what means; bastion/grid/k8s; trusty/jessie/stretch) is the script executed on?

Thu, Feb 21, 5:44 PM · Pywikibot-pagegenerators.py, Pywikibot, Toolforge
zhuyifei1999 closed T190884: Replicate toolforge 'webservice' setup in toolsbeta as Resolved.

Toolsbeta has been changed so much since my last comment :)

Thu, Feb 21, 12:00 AM · Toolforge
zhuyifei1999 closed T190884: Replicate toolforge 'webservice' setup in toolsbeta, a subtask of T175768: Improvements for the Toolforge 'webservice' command, as Resolved.
Thu, Feb 21, 12:00 AM · Toolforge

Wed, Feb 20

zhuyifei1999 added a comment to T216581: Letter g cut off at bottom in #title in Chrome based browsers.

This seems relevant to Google Chrome (and possibly other Chromium-based browsers). Reproduced on Google Chrome 72.0.3626.109 (too lazy to compile Chromium just for this ticket) on Gentoo. Does not appear on Firefox 65.0.1 on Gentoo.

Wed, Feb 20, 4:19 PM · Browser-Support-Google-Chrome, Patch-For-Review, Quarry

Tue, Feb 19

zhuyifei1999 added a comment to T61793: Activate Extension:DeleteBatch and Extension:UndeleteBatch on Wikimedia Commons.

For emergency clean ups I would say it would make sense to override the
bot-flag requirement. You are checking every undeletion you make anyways.
It makes little difference, imo, whether the list of pages is given to a
client script (bot) or a mediawiki extension.

Tue, Feb 19, 6:15 PM · Commons, Wikimedia-extension-review-queue, Wikimedia-Extension-setup
zhuyifei1999 added a comment to T61793: Activate Extension:DeleteBatch and Extension:UndeleteBatch on Wikimedia Commons.

One way: be a robo-admin yourself. This is how I dealt with INC's mass
deleting spree before he left.

Tue, Feb 19, 5:03 PM · Commons, Wikimedia-extension-review-queue, Wikimedia-Extension-setup

Feb 17 2019

zhuyifei1999 added a project to T216370: IP address list for grid nodes / Freenode iline request: wikimedia-irc-freenode.
Feb 17 2019, 6:40 PM · cloud-services-team (Kanban), wikimedia-irc-freenode, Toolforge
zhuyifei1999 added a comment to T216340: Raise memory limit for copyvios tool's k8s webservice.

Another way would be to use some libraries to collect statistics on what objects are currently allocated. https://stackoverflow.com/q/1435415 has some examples of such libraries. I've personally used https://pypi.org/project/mem_top/ once or twice, but the last time I tried, it only worked on Python 2. Note, though, that the mem_top library implicitly invokes gc, so if the issue just disappears after using such libraries, it is possible that gc is running too infrequently for some obscure reason. (I encountered this issue once outside Toolforge. Solution? Manually invoke gc periodically *facepalm*. Honestly, even after reading Python's gc documentation twice I still don't understand when gc is invoked automatically.)
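
For illustration, here is a rough standard-library stand-in for that kind of statistic. It is not mem_top. The `count_live_types` helper and the `Leaky` class are hypothetical, and only objects tracked by the garbage collector are counted.

```python
import gc
from collections import Counter


def count_live_types():
    """Count gc-tracked objects by type name after an explicit collection."""
    gc.collect()  # explicit pass, so unreachable cycles are not counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Leaky:
    """Hypothetical class standing in for whatever is accumulating."""


leaked = [Leaky() for _ in range(1000)]  # kept alive on purpose
for name, count in count_live_types().most_common(5):
    print(name, count)
```

Like mem_top, this sketch triggers a collection itself, so the same caveat applies: if the problem vanishes while measuring, infrequent gc is a suspect.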

Feb 17 2019, 6:25 PM · Toolforge
zhuyifei1999 added a comment to T216340: Raise memory limit for copyvios tool's k8s webservice.

Hmm the dmesg is a bit confusing (redacted all kernel addresses because of kaslr):

[Feb17 17:22] uwsgi invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=968
[ +0.000008] uwsgi cpuset=2d430296e404040fdb8ff360d8c2f32cd507d491c83c1105bb4905b829b74e73 mems_allowed=0
[ +0.000015] CPU: 3 PID: 13407 Comm: uwsgi Not tainted 4.9.0-0.bpo.6-amd64 #1 Debian 4.9.88-1+deb9u1~bpo8+1
[ +0.000001] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Ubuntu-1.8.2-1ubuntu1~cloud0 04/01/2014
[ +0.000002] 0000000000000000 ffffffffREDACTED ffff----REDACTED ffff----REDACTED
[ +0.000003] ffffffffREDACTED 0000000000000000 00000000000003c8 ffff----REDACTED
[ +0.000002] ffff----REDACTED 0000000000000000 ffff----REDACTED ffffffffREDACTED
[ +0.000002] Call Trace:
[ +0.000034] [<ffffffffREDACTED>] ? dump_stack+0x5a/0x6f
[ +0.000013] [<ffffffffREDACTED>] ? dump_header+0x85/0x212
[ +0.000003] [<ffffffffREDACTED>] ? mem_cgroup_scan_tasks+0xc7/0xe0
[ +0.000011] [<ffffffffREDACTED>] ? oom_kill_process+0x228/0x3e0
[ +0.000002] [<ffffffffREDACTED>] ? out_of_memory+0x10c/0x4b0
[ +0.000004] [<ffffffffREDACTED>] ? mem_cgroup_out_of_memory+0x49/0x80
[ +0.000002] [<ffffffffREDACTED>] ? mem_cgroup_oom_synchronize+0x2f5/0x320
[ +0.000002] [<ffffffffREDACTED>] ? mem_cgroup_oom_unregister_event+0x80/0x80
[ +0.000002] [<ffffffffREDACTED>] ? pagefault_out_of_memory+0x2f/0x80
[ +0.000010] [<ffffffffREDACTED>] ? __do_page_fault+0x4a2/0x500
[ +0.000017] [<ffffffffREDACTED>] ? async_page_fault+0x28/0x30
[ +0.000001] Task in /docker/2d430296e404040fdb8ff360d8c2f32cd507d491c83c1105bb4905b829b74e73 killed as a result of limit of /docker/2d430296e404040fdb8ff360d8c2f32cd507d491c83c1105bb4905b829b74e73
[ +0.000005] memory: usage 2097152kB, limit 2097152kB, failcnt 33362622
[ +0.000001] memory+swap: usage 2097152kB, limit 9007199254740988kB, failcnt 0
[ +0.000001] kmem: usage 14028kB, limit 9007199254740988kB, failcnt 0
[ +0.000000] Memory cgroup stats for /docker/2d430296e404040fdb8ff360d8c2f32cd507d491c83c1105bb4905b829b74e73: cache:6124KB rss:2077000KB rss_huge:0KB mapped_file:3384KB dirty:0KB writeback:0KB swap:0KB inactive_anon:520404KB active_anon:1557796KB inactive_file:2560KB active_file:2364KB unevictable:0KB
[ +0.000011] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ +0.001012] [13407] 51330 13407 47201 13993 84 3 0 968 uwsgi
[ +0.000210] [15254] 51330 15254 6050 401 16 3 0 968 bash
[ +0.001017] [ 1406] 51330 1406 251811 84691 301 4 0 968 uwsgi
[ +0.000002] [ 7203] 51330 7203 234791 74714 279 4 0 968 uwsgi
[ +0.000001] [21119] 51330 21119 220137 48255 215 4 0 968 uwsgi
[ +0.000003] [21208] 51330 21208 219869 50895 253 4 0 968 uwsgi
[ +0.000002] [17220] 51330 17220 219712 54137 233 4 0 968 uwsgi
[ +0.000001] [13168] 51330 13168 257712 94628 323 4 0 968 uwsgi
[ +0.000009] [21822] 51330 21822 235716 67312 252 4 0 968 uwsgi
[ +0.000002] [ 7984] 51330 7984 228533 52367 219 4 0 968 uwsgi
[ +0.000012] [ 9625] 51330 9625 1461 21 8 3 0 968 tail
[ +0.000003] Memory cgroup out of memory: Kill process 13168 (uwsgi) score 1148 or sacrifice child
[ +0.010237] Killed process 13168 (uwsgi) total-vm:1030848kB, anon-rss:376280kB, file-rss:2108kB, shmem-rss:124kB

Feb 17 2019, 6:08 PM · Toolforge
zhuyifei1999 created P8101 (An Untitled Masterwork).
Feb 17 2019, 5:39 PM
zhuyifei1999 added a comment to T216340: Raise memory limit for copyvios tool's k8s webservice.

Is there any evidence that the tool is running out of memory? Grid counts memory by virtual size (which is IMO not sane). Although I couldn't find any relevant information online, I'm inclined to think k8s calculates memory by resident set size (which is IMO much saner), so you are much less likely to run out of it, even with a lower threshold. <rant>This is why people say 'I don't need so much memory if I run it at home but why do I have to specify such a large number in -mem'.</rant>
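
To see how far apart the two measures can be for a given process, one can compare the VmSize and VmRSS fields that Linux exposes in /proc/<pid>/status. This is a minimal sketch: the `parse_vm_fields` helper is hypothetical and the sample values are illustrative only.

```python
def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(value.split()[0])  # first token is the number
    return fields


# Illustrative sample, not taken from a real process.
sample = "Name:\tuwsgi\nVmSize:\t 1030848 kB\nVmRSS:\t  378512 kB\n"
print(parse_vm_fields(sample))
```

On a Linux host, `parse_vm_fields(open("/proc/self/status").read())` would report the figures for the running interpreter.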

Feb 17 2019, 3:33 AM · Toolforge

Feb 16 2019

zhuyifei1999 added a comment to T216320: Cannot connect to toolsdb.

This is an ongoing outage T216208: ToolsDB overload and cleanup

Feb 16 2019, 3:49 PM · Toolforge
zhuyifei1999 created P8097 (An Untitled Masterwork).
Feb 16 2019, 4:56 AM
zhuyifei1999 created P8096 (An Untitled Masterwork).
Feb 16 2019, 4:47 AM
zhuyifei1999 created P8095 (An Untitled Masterwork).
Feb 16 2019, 4:16 AM

Feb 13 2019

zhuyifei1999 added a comment to T216042: qstat from login-stretch.tools.wmflabs.org fails.

Done this in shadow also:

Feb 13 2019, 3:20 PM · Toolforge
zhuyifei1999 closed T216042: qstat from login-stretch.tools.wmflabs.org fails as Resolved.
03:14:29 1 ✗ zhuyifei1999@tools-sgegrid-master: ~$ sudo /usr/local/bin/grid-configurator --all-domains --observer-pass $(grep OS_PASSWORD /etc/novaobserver.yaml|awk '{gsub(/"/,"",$2);print $2}')
tools-sgebastion-07.tools.eqiad.wmflabs added to submit host list
root@tools-sgegrid-master.tools.eqiad.wmflabs modified "webgrid-lighttpd" in cluster queue list
root@tools-sgegrid-master.tools.eqiad.wmflabs modified "webgrid-generic" in cluster queue list
Feb 13 2019, 3:18 PM · Toolforge
zhuyifei1999 triaged T216042: qstat from login-stretch.tools.wmflabs.org fails as High priority.
Feb 13 2019, 2:56 PM · Toolforge

Feb 10 2019

zhuyifei1999 added a comment to T215712: Stretch grid problem: No entries in Grid status.

Apparently, it doesn't have any idea about the new grid, or even that it exists...

Feb 10 2019, 3:26 AM · cloud-services-team, Toolforge

Feb 8 2019

zhuyifei1999 added a comment to T215617: Toolforge: Re-evaluate root and user SSH access to nodes.

Toolforge users to login to all nodes (Grid & Kubernetes), besides just the bastions

Feb 8 2019, 3:42 PM · cloud-services-team (Kanban), Toolforge

Feb 5 2019

zhuyifei1999 removed a project from T215271: Padlocks don't display on mobile (Wikimedia Commons): MobileFrontend (MobileFrontend and MinervaNeue architecture).
Feb 5 2019, 4:32 PM · Reading-Web-Local-Wiki-Issues, Readers-Web-Backlog (Tracking), MinervaNeue (Tracking), Commons
zhuyifei1999 updated the task description for T215291: #MinervaNeue 404s despite seemingly not restricted.
Feb 5 2019, 4:28 PM · Phabricator
zhuyifei1999 created T215291: #MinervaNeue 404s despite seemingly not restricted.
Feb 5 2019, 4:28 PM · Phabricator
zhuyifei1999 added projects to T215271: Padlocks don't display on mobile (Wikimedia Commons): Commons, MinervaNeue, MobileFrontend (MobileFrontend and MinervaNeue architecture).

This is caused by https://commons.wikimedia.org/w/load.php?debug=true&lang=en&modules=skins.minerva.content.styles&only=styles&skin=minerva (matches .content a > img)

.content a > img,
.content a > .lazy-image-placeholder,
.content noscript > img {
  max-width: 100% !important;
}
Feb 5 2019, 4:24 PM · Reading-Web-Local-Wiki-Issues, Readers-Web-Backlog (Tracking), MinervaNeue (Tracking), Commons

Feb 4 2019

zhuyifei1999 added a comment to T215128: Video2Commons fails on a particular video: "ERROR: Signature extraction failed".

Yes, it works now! :-) I started it and when it said "uploading" I hit abort as I had already uploaded it by hand. My task overview actually says "Davina Michelle - What About Us (P!NK) An exception occurred: TaskAbort: The task has been aborted.", yet here it is. (already tagged it as duplicate)

Feb 4 2019, 2:41 PM · Internet-Archive, video2commons
zhuyifei1999 closed T215128: Video2Commons fails on a particular video: "ERROR: Signature extraction failed" as Resolved.

I upgraded youtube_dl on all hosts. Could you try again?

Feb 4 2019, 3:19 AM · Internet-Archive, video2commons

Feb 3 2019

zhuyifei1999 added a project to T215125: Wikimedia OTRS release generator incorrectly converts hyperlink code: Tools.

@FDMS Could you look at this?

Feb 3 2019, 5:15 PM · Tools, OTRS

Feb 1 2019

zhuyifei1999 edited projects for T215035: 502 Bad Gateway issue on Petscan, added: VPS-Projects; removed Cloud-Services.
Feb 1 2019, 2:09 PM · VPS-Projects
zhuyifei1999 closed T215035: 502 Bad Gateway issue on Petscan as Invalid.

As far as I know, if https://petscan.wmflabs.org/ loads, there is nothing going wrong with the WMCS networking & routing, which would normally cause 502s.

Feb 1 2019, 7:22 AM · VPS-Projects

Jan 31 2019

zhuyifei1999 added a comment to T215038: Pywikibot failing with non-jSON response errors due to FANDOM url change.

Or you can back up and delete that file and regenerate it.

Jan 31 2019, 9:40 PM · Patch-For-Review, Pywikibot
zhuyifei1999 added a comment to T215038: Pywikibot failing with non-jSON response errors due to FANDOM url change.

Update your mlp family file. The file should be in C:\Users\Home\Downloads\core\pywikibot\families

Jan 31 2019, 9:39 PM · Patch-For-Review, Pywikibot
zhuyifei1999 renamed T215038: Pywikibot failing with non-jSON response errors due to FANDOM url change from Bot problems to Pywikibot failing with non-jSON response errors due to FANDOM url change.
Jan 31 2019, 9:33 PM · Patch-For-Review, Pywikibot

Jan 30 2019

zhuyifei1999 closed T214875: Server side upload for Koavf as Resolved.

I'm working on that. Server side uploading is done so closing as resolved.

Jan 30 2019, 8:37 PM · video2commons, Commons, Wikimedia-Site-requests
zhuyifei1999 added a comment to T210827: Help to run Java8 web-app.

[12.912s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.

Jan 30 2019, 5:48 PM · Toolforge
zhuyifei1999 added a comment to T214966: Scp intermittently reports missing directory when accessing subdir of tool $HOME directly as maintainer.

Do the contents of files in the directory ever change without you doing it?

Jan 30 2019, 5:41 PM · Toolforge
zhuyifei1999 added a comment to T208052: Server side upload for Victorgrigas.

File gets deleted after two months :(

Jan 30 2019, 1:08 AM · video2commons, Commons, Wikimedia-Site-requests

Jan 29 2019

zhuyifei1999 added a comment to T214875: Server side upload for Koavf.

@Koavf Don't create the file description page manually. video2commons currently refuses to reupload if the page already exists, and this might apply even if the page has been deleted. Could you change the file name on v2c and run again?

Jan 29 2019, 7:06 PM · video2commons, Commons, Wikimedia-Site-requests
zhuyifei1999 added a comment to T210827: Help to run Java8 web-app.

For the moment, I have logged in the Stretch machine and tried (with no success):

Jan 29 2019, 4:59 PM · Toolforge
zhuyifei1999 added a comment to T210827: Help to run Java8 web-app.

Do you know if there will be a Stretch alternative for this machine?

Jan 29 2019, 2:36 PM · Toolforge

Jan 28 2019

zhuyifei1999 added a comment to T210827: Help to run Java8 web-app.

Sorry, my focus was on 'JVM-based projects that have one executable to start the application' part, not specifically Play Framework. When you execute the JAR, is it executed on the bastion or in a k8s instance? It must be executed on k8s for the routing to work.

Jan 28 2019, 9:09 PM · Toolforge

Jan 25 2019

zhuyifei1999 added a comment to T210827: Help to run Java8 web-app.

Is tomcat one of 'Play Framework projects (and other JVM-based projects that have one executable to start the application)'? If not, it cannot be run on k8s, only on the grid. (I don't really know how tomcat works):

Jan 25 2019, 8:27 PM · Toolforge
zhuyifei1999 added a comment to T210827: Help to run Java8 web-app.
tools.replacer@interactive:~$ webservice tomcat status
Traceback (most recent call last):
[...]
  File "/usr/lib/python2.7/dist-packages/toollabs/webservice/backends/gridenginebackend.py", line 61, in _get_job_xml
    output = subprocess.check_output(['qstat', '-xml'])
[...]
OSError: [Errno 2] No such file or directory
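
The `OSError: [Errno 2]` at the end of this traceback is what `subprocess` raises when the program it was asked to run (here `qstat`) cannot be found. A minimal sketch of how a caller might detect that case; `run_or_none` is a hypothetical helper for illustration, not part of the toollabs webservice code:

```python
import errno
import subprocess


def run_or_none(argv):
    """Return the command's output, or None if the executable is missing."""
    try:
        return subprocess.check_output(argv)
    except OSError as exc:
        # errno 2 (ENOENT) is what the traceback above reports:
        # the program could not be found.
        if exc.errno == errno.ENOENT:
            return None
        raise


if __name__ == "__main__":
    output = run_or_none(["qstat", "-xml"])
    if output is None:
        print("qstat is not available on this host")
```
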
Jan 25 2019, 7:42 PM · Toolforge
zhuyifei1999 added a comment to T213646: Stretch grid problem: Please install packages libmariadbclient-dev-compat and libssl-dev.

I think so. My next bigger bot task that uses MySQLdb is at 3:42 UTC so we'll see.

Jan 25 2019, 2:18 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
zhuyifei1999 added a comment to T213646: Stretch grid problem: Please install packages libmariadbclient-dev-compat and libssl-dev.

I think so. My next bigger bot task that uses MySQLdb is at 3:42 UTC so we'll see.

Jan 25 2019, 1:53 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge

Jan 22 2019

zhuyifei1999 added a comment to T214003: Merge the "extended-uploader" and "autopatrolled" user groups on Commons.

I can merge the UW change now and SWAT the config change on the same day the new branch gets deployed to Commons, that's ~2-4 hours of difference. Any concerns with that?

Jan 22 2019, 8:06 PM · Patch-For-Review, User-Zoranzoki21, Wikimedia-Site-requests, Commons
zhuyifei1999 added a comment to T214003: Merge the "extended-uploader" and "autopatrolled" user groups on Commons.

I was waiting for a reply on my comment.

Jan 22 2019, 1:36 AM · Patch-For-Review, User-Zoranzoki21, Wikimedia-Site-requests, Commons

Jan 21 2019

zhuyifei1999 added a comment to T214230: Temporary switch crosswiki uploads off across all wikis.

See also T137269

Jan 21 2019, 8:50 PM · Patch-For-Review, Africa-Wikimedia-Developers, Wikimedia-Site-requests, Crosswiki, Commons
zhuyifei1999 added a comment to T214315: tools.meta: notice of heavy use till the end of February.

With the php7.2 container you get 2GiB of memory (not sure whether virtual or resident). AFAICT, the default php-cgi has 4 'workers' that poll on a queue, so you can handle a maximum of 4 concurrent requests, and memory usage is bounded by those 4 requests. Is 2GiB not enough for 4 requests (like, if you are doing massive data processing in memory rather than in an external database server)?
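
As a rough back-of-the-envelope check of the numbers above: this sketch assumes the 2GiB limit is split evenly across the 4 workers and ignores any memory shared between them. Both are simplifying assumptions, not measured behaviour.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def per_request_budget(container_limit_bytes, workers):
    """Evenly divide the container memory limit across concurrent workers."""
    return container_limit_bytes // workers


budget = per_request_budget(2 * GIB, 4)
print(budget // MIB)  # 512 (MiB per concurrent request)
```
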

Jan 21 2019, 6:15 PM · cloud-services-team, Toolforge

Jan 14 2019

zhuyifei1999 added a comment to T213646: Stretch grid problem: Please install packages libmariadbclient-dev-compat and libssl-dev.

Did you rebuild your /data/project/yifeibot/.local/local/lib/python2.7 packages on a Stretch host yet?

Jan 14 2019, 5:59 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
zhuyifei1999 added a comment to T213646: Stretch grid problem: Please install packages libmariadbclient-dev-compat and libssl-dev.

I'm hitting this as well:

Traceback (most recent call last):
  File "/data/project/yifeibot/o/toolserver/bryan/flickr/bots/flickreviewr.py", line 35, in <module>
    from botbase import FlickrBotBase
  File "/mnt/nfs/labstore-secondary-tools-project/yifeibot/o/toolserver/bryan/flickr/bots/botbase.py", line 37, in <module>
    import database
  File "/data/project/yifeibot/o/toolserver/bryan/flickr/shared/database.py", line 1, in <module>
    import MySQLdb
  File "/data/project/yifeibot/.local/local/lib/python2.7/site-packages/MySQLdb/__init__.py", line 19, in <module>
    import _mysql
ImportError: libmariadb.so.2: cannot open shared object file: No such file or directory

This libmariadb.so.2 is provided by libmariadb2, which has a dependency chain default-libmysqlclient-dev -> libmariadbclient-dev-compat -> libmariadb-dev-compat -> libmariadb-dev -> libmariadb2. Yes, this error only happens on grid exec nodes; the bastion has the package installed, though I don't know why.
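
One way to check whether a given host can resolve the shared object, without importing MySQLdb at all, is to ask the dynamic loader directly. A minimal sketch using the standard `ctypes` module; `can_load` is a hypothetical helper name, and the result depends entirely on what is installed on the host where it runs:

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared object."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library cannot be found or opened,
        # the same condition behind the ImportError above.
        return False
    return True


if __name__ == "__main__":
    print(can_load("libmariadb.so.2"))
```

Running this on both a bastion and a grid exec node would show whether the library is resolvable on each.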

Jan 14 2019, 9:23 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge

Jan 9 2019

zhuyifei1999 added a comment to T213252: toolscheckerctl fails to stop/start checks.

The checkers that weren't started are those that don't seem to be managed by puppet. Looking at the list:

toolschecker_nfs_showmount stop/waiting

No idea what this is. I stopped it again.

toolschecker_labsdb_labsdb1003 stop/waiting
toolschecker_labsdb_labsdb1003rw stop/waiting
toolschecker_labsdb_labsdb1001 stop/waiting
toolschecker_labsdb_labsdb1001rw stop/waiting

These labsdb100[13] are some very old databases that were decommissioned in T184832.

toolschecker_grid_start_precise stop/waiting
toolschecker_continuous_job_precise stop/waiting

Precise is dead.
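
The "not managed by puppet" observation amounts to a set difference between the services defined on the host and the services puppet starts. A small sketch of that comparison; the service names are a subset taken from the log excerpts in this feed, and the two sets are filled in by hand purely for illustration:

```python
# Services that have an init job on the host (illustrative subset).
defined_on_host = {
    "toolschecker_nfs_showmount",
    "toolschecker_grid_start_precise",
    "toolschecker_redis",
    "toolschecker_ldap",
}

# Services that puppet reported starting (illustrative subset).
started_by_puppet = {
    "toolschecker_redis",
    "toolschecker_ldap",
}

# Whatever is defined but never started by puppet is a candidate leftover.
unmanaged = sorted(defined_on_host - started_by_puppet)
print(unmanaged)
```
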

Jan 9 2019, 2:26 PM · Toolforge, cloud-services-team (Kanban)
zhuyifei1999 added a comment to T213252: toolscheckerctl fails to stop/start checks.

Starting the services at boot seems to have failed for all of them:

Jan  9 09:58:43 tools-checker-01 kernel: [   15.743269] FS-Cache: Netfs 'nfs' registered for caching
Jan  9 09:58:43 tools-checker-01 kernel: [   15.760279] init: Failed to spawn toolschecker main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.784213] init: Failed to spawn toolschecker_labsdb_labsdb1003 main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.802854] init: Failed to spawn toolschecker_continuous_job_trusty main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.815255] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Jan  9 09:58:43 tools-checker-01 kernel: [   15.829076] init: Failed to spawn toolschecker_flannel_etcd main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.853230] init: Failed to spawn toolschecker_labsdb_labsdb1001 main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.855349] init: Failed to spawn toolschecker_puppet_catalog main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.888520] init: Failed to spawn toolschecker_nfs_secondary_cluster_showmount main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.891387] init: Failed to spawn toolschecker_grid_start_trusty main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.915945] init: Failed to spawn toolschecker_labsdb_labsdb1001rw main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.949428] init: Failed to spawn toolschecker_labsdb_labsdb1003rw main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.951630] init: Failed to spawn toolschecker_service_start main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   15.976144] init: Failed to spawn toolschecker_kubernetes_etcd main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.024132] init: Failed to spawn toolschecker_labsdb_labsdb1005 main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.035022] init: Failed to spawn toolschecker_redis main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.044197] init: Failed to spawn toolschecker_toolsdb main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.046291] init: Failed to spawn toolschecker_continuous_job_precise main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.052201] init: Failed to spawn toolschecker_ldap main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.059837] init: Failed to spawn toolschecker_dumps main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.069203] init: Failed to spawn toolschecker_nfs_home main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.071314] init: Failed to spawn toolschecker_labs_private main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.080757] init: Failed to spawn toolschecker_labsdb_labsdb1004rw main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.082921] init: Failed to spawn toolschecker_nfs_showmount main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.101206] init: Failed to spawn toolschecker_self main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.115024] init: Failed to spawn toolschecker_grid_start_precise main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.116600] init: Failed to spawn toolschecker_webservice_kubernetes main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.129750] init: Failed to spawn toolschecker_cron main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.140133] init: Failed to spawn toolschecker_kubernetes_nodes_ready main process: unable to getpwnam: No such file or directory
Jan  9 09:58:43 tools-checker-01 kernel: [   16.759673] init: idmapd main process (939) terminated with status 1
Jan  9 09:58:43 tools-checker-01 kernel: [   16.759688] init: idmapd main process ended, respawning
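
To pull the affected service names out of a log like the one above, a regular expression over the `init: Failed to spawn ...` lines is enough. A minimal sketch; the pattern is written against the excerpt shown here, is an assumption about the general line format, and `failed_services` is a hypothetical helper name:

```python
import re

# Matches e.g. "init: Failed to spawn toolschecker_redis main process: <reason>"
FAILED = re.compile(r"init: Failed to spawn (\S+) main process: (.+)$")


def failed_services(lines):
    """Return (service, reason) pairs for every failed-spawn line."""
    found = []
    for line in lines:
        match = FAILED.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


sample = [
    "Jan  9 09:58:43 tools-checker-01 kernel: [   15.760279] init: Failed to spawn toolschecker main process: unable to getpwnam: No such file or directory",
    "Jan  9 09:58:43 tools-checker-01 kernel: [   15.815255] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).",
]

for service, reason in failed_services(sample):
    print(service, "->", reason)
```
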

The current batch was started by puppet:

Jan  9 09:59:58 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/File[/run/toolschecker]/ensure) created
Jan  9 09:59:59 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[self]/Service[toolschecker_self]/ensure) ensure changed 'stopped' to 'running'
Jan  9 09:59:59 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[self]/Service[toolschecker_self]) Unscheduling refresh on Service[toolschecker_self]
Jan  9 09:59:59 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[puppet_catalog]/Service[toolschecker_puppet_catalog]/ensure) ensure changed 'stopped' to 'running'
Jan  9 09:59:59 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[puppet_catalog]/Service[toolschecker_puppet_catalog]) Unscheduling refresh on Service[toolschecker_puppet_catalog]
Jan  9 09:59:59 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[labs_private]/Service[toolschecker_labs_private]/ensure) ensure changed 'stopped' to 'running'
Jan  9 09:59:59 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[labs_private]/Service[toolschecker_labs_private]) Unscheduling refresh on Service[toolschecker_labs_private]
Jan  9 10:00:00 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[nfs_secondary_cluster_showmount]/Service[toolschecker_nfs_secondary_cluster_showmount]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:00 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[nfs_secondary_cluster_showmount]/Service[toolschecker_nfs_secondary_cluster_showmount]) Unscheduling refresh on Service[toolschecker_nfs_secondary_cluster_showmount]
Jan  9 10:00:00 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[ldap]/Service[toolschecker_ldap]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:00 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[ldap]/Service[toolschecker_ldap]) Unscheduling refresh on Service[toolschecker_ldap]
Jan  9 10:00:00 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[nfs_home]/Service[toolschecker_nfs_home]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:00 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[nfs_home]/Service[toolschecker_nfs_home]) Unscheduling refresh on Service[toolschecker_nfs_home]
Jan  9 10:00:00 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[redis]/Service[toolschecker_redis]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:00 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[redis]/Service[toolschecker_redis]) Unscheduling refresh on Service[toolschecker_redis]
Jan  9 10:00:01 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[labsdb_labsdb1005]/Service[toolschecker_labsdb_labsdb1005]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:01 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[labsdb_labsdb1005]/Service[toolschecker_labsdb_labsdb1005]) Unscheduling refresh on Service[toolschecker_labsdb_labsdb1005]
Jan  9 10:00:02 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[labsdb_labsdb1004rw]/Service[toolschecker_labsdb_labsdb1004rw]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:02 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[labsdb_labsdb1004rw]/Service[toolschecker_labsdb_labsdb1004rw]) Unscheduling refresh on Service[toolschecker_labsdb_labsdb1004rw]
Jan  9 10:00:03 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[toolsdb]/Service[toolschecker_toolsdb]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:03 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[toolsdb]/Service[toolschecker_toolsdb]) Unscheduling refresh on Service[toolschecker_toolsdb]
Jan  9 10:00:03 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[dumps]/Service[toolschecker_dumps]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:03 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[dumps]/Service[toolschecker_dumps]) Unscheduling refresh on Service[toolschecker_dumps]
Jan  9 10:00:03 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[continuous_job_trusty]/Service[toolschecker_continuous_job_trusty]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:03 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[continuous_job_trusty]/Service[toolschecker_continuous_job_trusty]) Unscheduling refresh on Service[toolschecker_continuous_job_trusty]
Jan  9 10:00:04 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[grid_start_trusty]/Service[toolschecker_grid_start_trusty]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:04 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[grid_start_trusty]/Service[toolschecker_grid_start_trusty]) Unscheduling refresh on Service[toolschecker_grid_start_trusty]
Jan  9 10:00:04 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[cron]/Service[toolschecker_cron]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:04 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[cron]/Service[toolschecker_cron]) Unscheduling refresh on Service[toolschecker_cron]
Jan  9 10:00:05 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[flannel_etcd]/Service[toolschecker_flannel_etcd]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:05 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[flannel_etcd]/Service[toolschecker_flannel_etcd]) Unscheduling refresh on Service[toolschecker_flannel_etcd]
Jan  9 10:00:05 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[kubernetes_etcd]/Service[toolschecker_kubernetes_etcd]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:05 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[kubernetes_etcd]/Service[toolschecker_kubernetes_etcd]) Unscheduling refresh on Service[toolschecker_kubernetes_etcd]
Jan  9 10:00:06 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[kubernetes_nodes_ready]/Service[toolschecker_kubernetes_nodes_ready]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:06 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[kubernetes_nodes_ready]/Service[toolschecker_kubernetes_nodes_ready]) Unscheduling refresh on Service[toolschecker_kubernetes_nodes_ready]
Jan  9 10:00:06 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[webservice_kubernetes]/Service[toolschecker_webservice_kubernetes]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:06 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[webservice_kubernetes]/Service[toolschecker_webservice_kubernetes]) Unscheduling refresh on Service[toolschecker_webservice_kubernetes]
Jan  9 10:00:07 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[service_start]/Service[toolschecker_service_start]/ensure) ensure changed 'stopped' to 'running'
Jan  9 10:00:07 tools-checker-01 puppet-agent[1717]: (/Stage[main]/Toollabs::Checker/Toollabs::Check[service_start]/Service[toolschecker_service_start]) Unscheduling refresh on Service[toolschecker_service_start]
Jan 9 2019, 2:17 PM · Toolforge, cloud-services-team (Kanban)
zhuyifei1999 added a comment to T213252: toolscheckerctl fails to stop/start checks.

Are they simply not enabled (start on boot)?

Jan 9 2019, 2:11 PM · Toolforge, cloud-services-team (Kanban)
zhuyifei1999 added a comment to T213252: toolscheckerctl fails to stop/start checks.
02:05:28 0 ✓ zhuyifei1999@tools-checker-01: ~$ sudo service toolschecker_nfs_showmount status
toolschecker_nfs_showmount stop/waiting
02:06:18 0 ✓ zhuyifei1999@tools-checker-01: ~$ sudo service toolschecker_nfs_showmount start
toolschecker_nfs_showmount start/running, process 14698
02:06:22 0 ✓ zhuyifei1999@tools-checker-01: ~$ sudo service toolschecker_nfs_showmount status
toolschecker_nfs_showmount start/running, process 14698

Starting the first one manually did seem to work. Are they simply not enabled (start on boot)?

Jan 9 2019, 2:08 PM · Toolforge, cloud-services-team (Kanban)

Jan 8 2019

zhuyifei1999 added a comment to T213152: How to STOP running a Query?.

What is the use case for stopping the query? A running query doesn’t prevent you from editing the query and submitting it again.

Jan 8 2019, 9:41 AM · Quarry

Jan 7 2019

zhuyifei1999 changed the status of T213041: Change Chinese Wikivoyage Logo from Resolved to Invalid.
Jan 7 2019, 3:15 AM · Wikimedia-Site-requests, Chinese-Sites
zhuyifei1999 added a comment to T213041: Change Chinese Wikivoyage Logo.

Just change it via CSS. Wiki configuration changes are for more permanent changes.

Jan 7 2019, 2:34 AM · Wikimedia-Site-requests, Chinese-Sites