Page MenuHomePhabricator

reFill stuck at "Submitting your task..."
Closed, ResolvedPublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

Alternatively use the /ng version
https://refill.toolforge.org/ng/result.php?page=Voting_age&defaults=y&wiki=en

What happens?:

reFill displays a message "Submitting your task..." but doesn't get any further.

What should have happened instead?:

reFill should progress to processing the bare references on the page.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Details

Other Assignee
TheresNoTime

Event Timeline

Curb_Safe_Charmer updated the task description. (Show Details)
Curb_Safe_Charmer updated Other Assignee, added: TheresNoTime.
Curb_Safe_Charmer changed the task status from Open to In Progress.Jun 4 2023, 3:46 PM
Curb_Safe_Charmer triaged this task as Medium priority.

refill-api/restart.sh didn't fix it

tools.refill-api@tools-sgebastion-11:~$ kubectl describe pod refill-api-797c676f c-27qcg
Name: refill-api-797c676fc-27qcg
Namespace: tool-refill-api
Priority: 0
Node: tools-k8s-worker-75/172.16.1.132
Start Time: Sun, 04 Jun 2023 17:12:48 +0000
Labels: app.kubernetes.io/component=web

app.kubernetes.io/managed-by=webservice
name=refill-api
pod-template-hash=797c676fc
toolforge=tool

Annotations: cni.projectcalico.org/containerID: 71f8893f883cf1e144f228ac679d28b 57f6fcff30d4ac1277bba6b80779f5746

cni.projectcalico.org/podIP: 192.168.78.233/32
cni.projectcalico.org/podIPs: 192.168.78.233/32
kubernetes.io/psp: tool-refill-api-psp
seccomp.security.alpha.kubernetes.io/pod: runtime/default

Status: Running
IP: 192.168.78.233
IPs:

IP:           192.168.78.233

Controlled By: ReplicaSet/refill-api-797c676fc
Containers:

webservice:
  Container ID:  docker://b510ff5eaf7afc2da874740c090202eae14537ea12501526cf22                                      d9a63550ebcd
  Image:         docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web                                      :latest
  Image ID:      docker-pullable://docker-registry.tools.wmflabs.org/toolforge                                      -python39-sssd-web@sha256:79cce2d82565a77f6cf06682fb244e9d9e3ed6667c4130ead1c40c                                      0cb19fe343
  Port:          8000/TCP
  Host Port:     0/TCP
  Command:
    /usr/bin/webservice-runner
    --type
    uwsgi-python
    --port
    8000
  State:          Waiting
    Reason:       CrashLoopBackOff
  Last State:     Terminated
    Reason:       Error
    Exit Code:    22
    Started:      Sun, 04 Jun 2023 17:28:45 +0000
    Finished:     Sun, 04 Jun 2023 17:28:45 +0000
  Ready:          False
  Restart Count:  8
  Limits:
    cpu:     500m
    memory:  512Mi
  Requests:
    cpu:     125m
    memory:  256Mi
  Environment:
    HOME:           /data/project/refill-api
    TOOL_DATA_DIR:  /data/project/refill-api
  Mounts:
    /data/project from home (rw)
    /data/scratch from scratch (rw)
    /etc/ldap.conf from etcldap-conf (ro)
    /etc/ldap.yaml from etcldap-yaml (ro)
    /etc/novaobserver.yaml from etcnovaobserver-yaml (ro)
    /etc/wmcs-project from wmcs-project (ro)
    /mnt/nfs/dumps-clouddumps1001.wikimedia.org from dumpsrc-clouddumps1001 (r                                      o)
    /mnt/nfs/dumps-clouddumps1002.wikimedia.org from dumpsrc-clouddumps1002 (r                                      o)
    /public/dumps from dumps (ro)
    /var/lib/sss/pipes from sssd-pipes (rw)
    /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hjwln (                                      ro)

Conditions:

Type              Status
Initialized       True
Ready             False
ContainersReady   False
PodScheduled      True

Volumes:

kube-api-access-hjwln:
  Type:                    Projected (a volume that contains injected data fro                                      m multiple sources)
  TokenExpirationSeconds:  3607
  ConfigMapName:           kube-root-ca.crt
  ConfigMapOptional:       <nil>
  DownwardAPI:             true
dumps:
  Type:          HostPath (bare host directory volume)
  Path:          /public/dumps
  HostPathType:  Directory
dumpsrc-clouddumps1001:
  Type:          HostPath (bare host directory volume)
  Path:          /mnt/nfs/dumps-clouddumps1001.wikimedia.org
  HostPathType:  Directory
dumpsrc-clouddumps1002:
  Type:          HostPath (bare host directory volume)
  Path:          /mnt/nfs/dumps-clouddumps1002.wikimedia.org
  HostPathType:  Directory
home:
  Type:          HostPath (bare host directory volume)
  Path:          /data/project
  HostPathType:  Directory
wmcs-project:
  Type:          HostPath (bare host directory volume)
  Path:          /etc/wmcs-project
  HostPathType:  File
scratch:
  Type:          HostPath (bare host directory volume)
  Path:          /data/scratch
  HostPathType:  Directory
etcldap-conf:
  Type:          HostPath (bare host directory volume)
  Path:          /etc/ldap.conf
  HostPathType:  File
etcldap-yaml:
  Type:          HostPath (bare host directory volume)
  Path:          /etc/ldap.yaml
  HostPathType:  File
etcnovaobserver-yaml:
  Type:          HostPath (bare host directory volume)
  Path:          /etc/novaobserver.yaml
  HostPathType:  File
sssd-pipes:
  Type:          HostPath (bare host directory volume)
  Path:          /var/lib/sss/pipes
  HostPathType:  Directory

QoS Class: Burstable
Node-Selectors: kubernetes.wmcloud.org/nfs-mounted=true
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s

node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

Events:

Type     Reason     Age                 From               Message
----     ------     ----                ----               -------
Normal   Scheduled  16m                 default-scheduler  Successfully assign                                      ed tool-refill-api/refill-api-797c676fc-27qcg to tools-k8s-worker-75
Normal   Pulled     16m                 kubelet            Successfully pulled                                       image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest" in                                       76.193536ms
Normal   Pulled     16m                 kubelet            Successfully pulled                                       image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest" in                                       93.438359ms
Normal   Pulled     15m                 kubelet            Successfully pulled                                       image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest" in                                       80.540017ms
Normal   Created    15m (x4 over 16m)   kubelet            Created container w                                      ebservice
Normal   Pulled     15m                 kubelet            Successfully pulled                                       image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest" in                                       87.109824ms
Normal   Started    15m (x4 over 16m)   kubelet            Started container w                                      ebservice
Normal   Pulling    14m (x5 over 16m)   kubelet            Pulling image "dock                                      er-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest"
Normal   Pulled     14m                 kubelet            Successfully pulled                                       image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest" in                                       100.484119ms
Warning  BackOff    63s (x72 over 16m)  kubelet            Back-off restarting                                       failed container

tools.refill-api@tools-sgebastion-11:~$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
31m Normal Scheduled pod/refill-api-797c676fc-27qcg Successfully assigned tool-refill-api/refill-api-797c676fc-27qcg to tools-k8s-worker-75
30m Normal Pulling pod/refill-api-797c676fc-27qcg Pulling image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest"
31m Normal Pulled pod/refill-api-797c676fc-27qcg Successfully pulled image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest" in 76.193536ms
30m Normal Created pod/refill-api-797c676fc-27qcg Created container webservice
30m Normal Started pod/refill-api-797c676fc-27qcg Started container webservice
31m Normal Pulled pod/refill-api-797c676fc-27qcg Successfully pulled image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest" in 93.438359ms
80s Warning BackOff pod/refill-api-797c676fc-27qcg Back-off restarting failed container
31m Normal Pulled pod/refill-api-797c676fc-27qcg Successfully pulled image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest" in 80.540017ms
30m Normal Pulled pod/refill-api-797c676fc-27qcg Successfully pulled image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest" in 87.109824ms
30m Normal Pulled pod/refill-api-797c676fc-27qcg Successfully pulled image "docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest" in 100.484119ms
31m Normal SuccessfulCreate replicaset/refill-api-797c676fc Created pod: refill-api-797c676fc-27qcg
21m Normal Killing pod/refill-api-85457fc487-4kt6d Stopping container webservice
4m47s Warning FailedKillPod pod/refill-api-85457fc487-4kt6d error killing pod: failed to "KillContainer" for "webservice" with KillContainerError: "rpc error: code = Unknown desc = Error response from daemon: cannot stop container: ee49aa029f05c6e8b57f2e4256a207768ffb616ac33c2ae0727bcc1c5f1f5081: tried to kill container, but did not receive an exit event"
30m Normal SuccessfulDelete replicaset/refill-api-85457fc487 Deleted pod: refill-api-85457fc487-4kt6d
35m Normal Scheduled pod/refill-api-scheduler-64b4d8d4c-4n8f4 Successfully assigned tool-refill-api/refill-api-scheduler-64b4d8d4c-4n8f4 to tools-k8s-worker-55
35m Normal Pulling pod/refill-api-scheduler-64b4d8d4c-4n8f4 Pulling image "docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest"
35m Normal Pulled pod/refill-api-scheduler-64b4d8d4c-4n8f4 Successfully pulled image "docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest" in 144.577708ms
35m Normal Created pod/refill-api-scheduler-64b4d8d4c-4n8f4 Created container celery
35m Normal Started pod/refill-api-scheduler-64b4d8d4c-4n8f4 Started container celery
36m Normal Killing pod/refill-api-scheduler-64b4d8d4c-dhcsh Stopping container celery
35m Normal SuccessfulCreate replicaset/refill-api-scheduler-64b4d8d4c Created pod: refill-api-scheduler-64b4d8d4c-4n8f4
35m Normal ScalingReplicaSet deployment/refill-api-scheduler Scaled up replica set refill-api-scheduler-64b4d8d4c to 1
35m Normal Scheduled pod/refill-api-worker-569f9f5698-4dmg4 Successfully assigned tool-refill-api/refill-api-worker-569f9f5698-4dmg4 to tools-k8s-worker-55
35m Normal Pulling pod/refill-api-worker-569f9f5698-4dmg4 Pulling image "docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest"
35m Normal Pulled pod/refill-api-worker-569f9f5698-4dmg4 Successfully pulled image "docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest" in 72.74477ms
35m Normal Created pod/refill-api-worker-569f9f5698-4dmg4 Created container celery
35m Normal Started pod/refill-api-worker-569f9f5698-4dmg4 Started container celery
36m Normal Killing pod/refill-api-worker-569f9f5698-f75mx Stopping container celery
35m Normal SuccessfulCreate replicaset/refill-api-worker-569f9f5698 Created pod: refill-api-worker-569f9f5698-4dmg4
35m Normal ScalingReplicaSet deployment/refill-api-worker Scaled up replica set refill-api-worker-569f9f5698 to 1
31m Normal ScalingReplicaSet deployment/refill-api Scaled up replica set refill-api-797c676fc to 1
30m Normal ScalingReplicaSet deployment/refill-api Scaled down replica set refill-api-85457fc487 to 0

The important part of the log is

State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       Error
  Exit Code:    22

The application is crashing, you can try to discover what is happening by viewing the log of the previous container to see the error it got before being terminated. You can do that with the command kubectl logs refill-api-797c676fc-27qcg -p, the "-p" is what make it reach the previous container.

uwsgi.log showed

Traceback (most recent call last):
File "/data/project/refill-api/www/python/src/app.py", line 6, in <module> from flask import Flask, Response, request, abort, url_for
ModuleNotFoundError: No module named 'flask'

danilo thought a virtualenv problem:
<danilo> check if it is activating the virtualenv before call the application
<danilo> or better, toolforge has some conventions for python webservices, check if it is following the conventions: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Python
<CurbSafeCharmer> The strange thing is, AFAIK, nothing has changed in the environment recently so not sure why this problem has appeared yesterday afternoon
<danilo> virtualenvs must be in $HOME/www/python/venv, if it is it will be automatically activate
<danilo> indeed it is strange, and the https://refill.toolforge.org/ home is working, so it is probably using the virtualenv with flask
<danilo> it seems like something in the broken application is different from the one that shows the homepage
<CurbSafeCharmer> probably broken in the backend api rather than the front end

It appears to be working now? Tried with multiple pages - it doesn't get stuck at submitting for me.

Running ~/www/python/venv/bin/pip list shows that Flask is installed alright.

It appears to be working now? Tried with multiple pages - it doesn't get stuck at submitting for me.

Yes, I saw on the reFill talk page that it had started working again. Did you change anything? I am not aware that Tox makes changes in production, so can't see what would have fixed it.

Over at https://en.wikipedia.org/wiki/Wikipedia_talk:ReFill, @TheresNoTime wrote:

Glad to see it's back — I very quickly took a look last night and restarted some pods/the webservice. The originally targeted Python version (in the service manifest file, refill-api) is no longer supported, so I bumped this to python3.9 (and I think the same applies for the targeted PHP version in refill...?) — TheresNoTime (talk • they/them) 12:13 pm, Today (UTC+1)