Clean ZppixBot's filesystem up
Closed, Resolved (Public)

Description

These files will be at least gzip'd and may be deleted:

  • *.log
  • .sopel/log/*
  • *.pyc
  • */__pycache__/*
  • stashbot/*
  • *.mndb
  • *.out
  • *.gz
  • *.err
  • www/*
  • *.swp
  • *.save
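
As a rough sketch of the "at least gzip'd" step, the snippet below compresses files matching a few of the patterns above and removes the originals. It is an illustration under assumptions, not the script that was actually run: the function name `gzip_matching` and the chosen subset of patterns are hypothetical.

```python
import gzip
import shutil
from pathlib import Path

# Hypothetical subset of the patterns listed above.
PATTERNS = ["*.log", "*.out", "*.err", "*.save"]


def gzip_matching(root, patterns=PATTERNS):
    """Gzip every file under root matching a pattern; return the new .gz paths."""
    compressed = []
    for pattern in patterns:
        # sorted() materialises the matches before we start changing the tree.
        for path in sorted(Path(root).rglob(pattern)):
            if not path.is_file() or path.suffix == ".gz":
                continue
            target = path.with_name(path.name + ".gz")
            with path.open("rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()  # remove the original only after the copy finished
            compressed.append(target)
    return compressed
```

Calling `gzip_matching("/some/dir")` leaves non-matching files untouched, so it could be pointed at a copy of the tool's home directory for a trial run first.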

@samuelguebo: do you need the logs from the AWMD meeting on our host?

I'm also going to take this slot to give the database a bit of a cleanup.

Event Timeline

We'll need to do a short downtime for this as I'll be touching the database.

Hopefully should take no longer than 30 mins.

kubectl scale --replicas=0 deployment.apps/sopel.bot
kubectl delete pods sopel.bot-<hash>

will stop the bot

kubectl scale --replicas=1 deployment.apps/sopel.bot

will reboot us
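
For anyone scripting the stop/start, here is a minimal sketch that builds the scale command and only executes it on request. The helper names `scale_command` and `set_replicas` are hypothetical; the `tool-zppixbot` namespace is taken from the deployment file in this task, and passing it explicitly is an assumption (on Toolforge it may already be the default).

```python
import subprocess

DEPLOYMENT = "deployment/sopel.bot"
NAMESPACE = "tool-zppixbot"  # assumption: passed explicitly rather than via kubeconfig


def scale_command(replicas):
    """Build the kubectl argv that scales the bot's deployment."""
    return [
        "kubectl", "scale", "--namespace", NAMESPACE,
        "--replicas={}".format(replicas), DEPLOYMENT,
    ]


def set_replicas(replicas, dry_run=True):
    """Print the command; only run it when dry_run is False."""
    cmd = scale_command(replicas)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

`set_replicas(0)` prints the stop command without touching the cluster; `set_replicas(1, dry_run=False)` would actually bring the bot back.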

I plan to do this on Monday 1st June at 10:00 am UTC+1. Once we restart, there should be NO user impact as we're only touching things that are not needed.

I will announce this tomorrow morning.

  • Things to run on default.db

DELETE FROM channel_values WHERE channel LIKE '%#wiki_dev_africa%';
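
One detail worth checking before running the statement above: in a SQL `LIKE` pattern, `_` matches any single character, so `%#wiki_dev_africa%` also matches `#wiki-dev-africa`. The sketch below tries this on a throwaway in-memory SQLite database. The three-column layout (`channel`, `key`, `value`), the sample rows, and the helper name `delete_channel_rows` are assumptions for illustration, not a dump of the real default.db.

```python
import sqlite3


def delete_channel_rows(conn, pattern):
    """Delete rows whose channel matches a LIKE pattern; return how many went."""
    cur = conn.execute(
        "DELETE FROM channel_values WHERE channel LIKE ?", (pattern,)
    )
    conn.commit()
    return cur.rowcount


# Throwaway database with made-up rows (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE channel_values (channel TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO channel_values VALUES (?, ?, ?)",
    [
        ("#wiki-dev-africa", "greeting", "hi"),
        ("#wiki_dev_africa", "greeting", "hi"),
        ("#other-channel", "greeting", "hi"),
    ],
)

deleted = delete_channel_rows(conn, "%#wiki_dev_africa%")
remaining = [row[0] for row in conn.execute("SELECT channel FROM channel_values")]
```

Here both the hyphenated and the underscored channel rows are removed and only `#other-channel` remains. Running the same pattern as a `SELECT` first is a cheap way to preview what a delete would touch.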

  • I'm also going to delete the backup user_known database and recreate it.
  • I'm also going to drop anything not needed from the user_known database.
RhinosF1 set Due Date to Jun 1 2020, 10:30 AM.

We apparently have a very old deployment file, so when I switch some config to tidy up how we install things with pip, I'll need to replace the file.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sopel.bot
  namespace: tool-zppixbot
  labels:
    name: sopel.bot
    # The toolforge=tool label will cause $HOME and other paths to be mounted from Toolforge
    toolforge: tool
spec:
  replicas: 1
  selector:
    matchLabels:
      name: sopel.bot
      toolforge: tool
  template:
    metadata:
      labels:
        name: sopel.bot
        toolforge: tool
    spec:
      containers:
        - name: sopel
          image: 'docker-registry.tools.wmflabs.org/toolforge-python35-sssd-base:latest'
          command: [ "/data/project/zppixbot/k8s/starter-new.sh", "bash" ]
          workingDir: /data/project/zppixbot
          env:
            - name: HOME
              value: /data/project/zppixbot
          volumeMounts:
            - mountPath: /data/project/zppixbot/
              name: home
              readOnly: false
      volumes:
        - name: home
          hostPath:
            path: /data/project/zppixbot/

Plan of action:

  1. Webservice stop
  2. kubectl delete deployment sopel.bot
  3. Drop #wiki-dev-Africa from config
  4. rm everything outside of sopel that’s being deleted
  5. tar+gzip sopel logs
  6. fix the user_known file & trash the mndb
  7. delete .pyc and __pycache__/*
  8. switch the starter{-type}.sh
  9. check permissions on the k8s file
  10. create the deployment & start webservice
  11. wait for everything to boot up
  12. run .ip and .commands to hopefully re-generate most of the caches.
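
Steps 5 and 7 above can be sketched as follows. This is a hedged illustration rather than the commands actually run: the helper names `archive_logs` and `remove_bytecode` are hypothetical, and the log directory and archive path are passed in rather than assumed. The archive should live outside the directory being packed.

```python
import shutil
import tarfile
from pathlib import Path


def archive_logs(log_dir, archive_path):
    """Pack log_dir into a gzip-compressed tarball, then delete the loose files."""
    log_dir = Path(log_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        tar.add(str(log_dir), arcname=log_dir.name)
    for entry in log_dir.iterdir():
        if entry.is_file():
            entry.unlink()


def remove_bytecode(root):
    """Delete __pycache__ directories and stray .pyc files under root."""
    root = Path(root)
    for cache in sorted(root.rglob("__pycache__")):
        if cache.is_dir():  # may already be gone if a parent cache was removed
            shutil.rmtree(str(cache))
    for pyc in sorted(root.rglob("*.pyc")):
        pyc.unlink()
```

Python regenerates bytecode on the next import, which is one reason the first commands after the restart can be slower than usual.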

@RhinosF1, sure. If it can help save space you can remove the logs as most of them are pretty old, unless @D3r1ck01 thinks otherwise?

There aren't many files (3?), I think. I can email them to you.

Change 601179 had a related patch set uploaded (by RhinosF1; owner: RhinosF1):
[labs/tools/ZppixBot@master] Add sitenotice for maint.

https://gerrit.wikimedia.org/r/601179

Change 601179 merged by RhinosF1:
[labs/tools/ZppixBot@master] Add sitenotice for maint.

https://gerrit.wikimedia.org/r/601179

Change 601180 had a related patch set uploaded (by RhinosF1; owner: RhinosF1):
[labs/tools/ZppixBot@master] Add sitenotice for maint.

https://gerrit.wikimedia.org/r/601180

Change 601180 merged by RhinosF1:
[labs/tools/ZppixBot@master] Add sitenotice for maint.

https://gerrit.wikimedia.org/r/601180

Mentioned in SAL (#wikimedia-cloud) [2020-06-01T07:46:14Z] <RF1dle> add notice for T254046 to wiki index about

Mentioned in SAL (#wikimedia-cloud) [2020-06-01T08:17:12Z] <RhinosF1> upload starter-new.sh and switched sopelbot.yaml for T254046

Mentioned in SAL (#wikimedia-cloud) [2020-06-01T09:01:31Z] <RhinosF1> deleted sopel.bot deployment and stopped webservice - START T254046

Mentioned in SAL (#wikimedia-cloud) [2020-06-01T09:55:46Z] <RhinosF1> revert sitenotice for T254046

Mentioned in SAL (#wikimedia-cloud) [2020-06-01T09:56:56Z] <RhinosF1> webservice --backend=kubernetes php7.2 start --canonical for T254046

Mentioned in SAL (#wikimedia-cloud) [2020-06-01T10:00:50Z] <RhinosF1> sopel.bot re-created for T254046

Sanity-checked and everything seems fine. I've reloaded the IP database and run .commands and .ping to hopefully repopulate a few caches.

The bot may be slow on first use.