XMLRCs is not functioning
Closed, Resolved (Public)

Description

XMLRCs is a service in the Huggle Cloud VPS project that provides the RecentChanges feeds consumed by wm-bot. It seems to have gone down this morning: wm-bot is failing to connect to it, which renders the RecentChanges module useless. Somebody with access to the Huggle project needs to restart XMLRCs.

Event Timeline

@Addshore and @Petrb are the admins for the huggle Cloud VPS project.

Mentioned in SAL (#wikimedia-cloud) [2021-11-13T01:54:16Z] <bd808> sudo su - xmlrcs; ./xmlrcsd -d after seeing no running xmlrcsd (T295487)

I found some sketchy docs at https://wikitech.wikimedia.org/wiki/XmlRcs#Maintainer_info which led me to try that command.

/opt/xmlrcs/nohup.out
Traceback (most recent call last):
  File "./es2r.py", line 16, in <module>
    rs.set("es2r.pid", int(os.getpid()))
  File "/usr/lib/python2.7/dist-packages/redis/client.py", line 1072, in set
    return self.execute_command('SET', *pieces)
  File "/usr/lib/python2.7/dist-packages/redis/client.py", line 573, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/usr/lib/python2.7/dist-packages/redis/client.py", line 585, in parse_response
    response = connection.read_response()
  File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 582, in read_response
    raise response
redis.exceptions.ResponseError: MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
/var/log/redis/redis-server.log
11418:M 13 Nov 03:02:37.048 * 1 changes in 900 seconds. Saving...
11418:M 13 Nov 03:02:37.048 # Can't save in background: fork: Cannot allocate memory
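The MISCONF error in the traceback is what Redis returns to writers when its last background RDB save failed, which matches the "Can't save in background" line in the Redis log. A minimal sketch for confirming that state, assuming the text comes from `redis-cli INFO persistence` (the helper names `parse_info` and `bgsave_failing` are hypothetical; `rdb_last_bgsave_status` is a standard field of that INFO section):

```python
def parse_info(text):
    """Parse `redis-cli INFO` style output ("key:value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Persistence".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def bgsave_failing(info_text):
    """Return True when the last RDB background save reported an error."""
    return parse_info(info_text).get("rdb_last_bgsave_status") == "err"
```

Feeding it the output of `redis-cli INFO persistence` on the instance would show whether Redis is still refusing writes before anyone restarts the consumers.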

Mentioned in SAL (#wikimedia-cloud) [2021-11-13T03:08:12Z] <bd808> Rebooting xmlrcs.huggle.eqiad1.wikimedia.cloud (T295487)

Perryprog claimed this task.

From @bd808: "[03:07:21] bd808: I think what happened is that the redis queue filled up because the consuming script was down. I'm going to reboot the instance and then check back to see what has to be manually started."

After the reboot, everything depending on xmlrcs was back up and running, so this looks to be resolved.

The services DO NOT start on boot, so after rebooting someone needs to do something like:

$ ssh xmlrcs.huggle.eqiad1.wikimedia.cloud
$ sudo su - xmlrcs
$ cd /opt/xmlrcs
$ ./xmlrcsd -d
$ nohup ./start &
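Since nothing starts on boot, a quick check after a reboot can tell whether the manual steps above are still needed. A rough sketch, assuming a Linux host with /proc and assuming the daemon's process name is `xmlrcsd` (taken from the SAL entry about "no running xmlrcsd"); the function name `is_running` is hypothetical:

```python
import glob


def is_running(name):
    """Return True if any process on this Linux host has the given command name.

    Reads /proc/<pid>/comm, which holds the command name truncated to
    15 characters, so this only suits short names.
    """
    for comm_path in glob.glob("/proc/[0-9]*/comm"):
        try:
            with open(comm_path) as fh:
                if fh.read().strip() == name:
                    return True
        except OSError:
            # The process exited between listing and reading; ignore it.
            continue
    return False
```

If `is_running("xmlrcsd")` comes back False, the daemon still has to be started by hand as shown in the commands above.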

It appears to be down again. @Petrb or @bd808, is either of you able to poke it awake again?