Missing Enterprise Dumps from 2022-06-20 run
Closed, ResolvedPublic

Description

It appears that something went wrong and the June 20th (2022) run didn't happen, as this index file is empty.

Flagging here to make sure we can get this fortnightly run live. Whatever the error might be, please let me (@RBrounley_WMF) and @HShaikh know if anything on the Wikimedia Enterprise side is causing this, or if there is anything we can do to help.

Also tagging @Mitar, who flagged this on our talk page.

Event Timeline

Jun 20 08:30:09 labstore1006 systemd[1]: Started Twice monthly download of Wikimedia Enterprise HTML dumps.
...
Jun 20 08:30:10 labstore1006 python3[9098]: 2022-06-20 08:30:10,780 - __main__ - ERROR - failed to get namespaces list with response code 500 (Internal Server Error)
Jun 20 08:30:10 labstore1006 python3[9098]: 2022-06-20 08:30:10,781 - __main__ - ERROR - error retrieving namespace_ids
Jun 20 08:30:10 labstore1006 systemd[1]: download_enterprise_htmldumps.service: Succeeded.

This was the problem: a 500 error on y'all's end, no idea why. I'll have a look and see how often we retry for that.

Thanks Ariel, will look into the logs from our side as well.

I've found the request in our logs:

{
    "response_time": "2022-06-20T08:30:10Z",
    "status": 500,
    "latency": 64861,
    "ip": "xxx.xx.xxx.x",
    "method": "GET",
    "path": "/v1/namespaces",
    "body_size": 58
}

Are you using a JWT (access token) to make this particular request? If so, can you please share the username that you are using?

We do, and the username is the one that was assigned to us: ops-dumps

Note that we do not retry getting the namespace id list, because if that fails we assume something is seriously wrong on the remote end. We could add a few retries though.
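For illustration only, here is a minimal sketch of what "a few retries" could look like. It is not the actual download script: the names `get_with_retries`, `NamespaceFetchError`, and the `fetch` callable (assumed to return a status code and a body) are hypothetical, and the attempt count and delays are placeholder values.

```python
import time
from typing import Callable, Optional, Tuple


class NamespaceFetchError(Exception):
    """Raised when the namespace list cannot be retrieved (hypothetical name)."""


def get_with_retries(
    fetch: Callable[[], Tuple[int, bytes]],
    attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it returns 200, retrying only on 5xx responses.

    fetch is an assumed callable returning (status_code, body); it stands in
    for whatever HTTP call the real script makes to the namespaces endpoint.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status: Optional[int] = None
    for attempt in range(1, attempts + 1):
        status, body = fetch()
        if status == 200:
            return body
        if status < 500:
            # Non-5xx failures (for example a rejected token) are not retried.
            break
        if attempt < attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise NamespaceFetchError(
        f"failed to get namespaces list with response code {status}"
    )
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays. Whether retrying is worthwhile at all depends on whether the 500s are transient.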

So if I understand correctly, those files have never been generated so that particular dump for that particular date will not be available?

Yes, that particular dump for that particular date will not be available. We'll download the latest as soon as we fix the pipeline.

I don't think we need to retry the namespace endpoint. You are correct in your assumption that if this endpoint does not work, something is clearly wrong.
I think I know what the problem is; I will let you know when it's fixed.

We have deployed a patch so that we'll get an email notification the next time something like this happens.

Just a note that we are still seeing this. The pull for the 1st of the month failed:

2022-07-01 08:30:01,522 - __main__ - ERROR - failed to get namespaces list with response code 500 (Internal Server Error)
2022-07-01 08:30:01,523 - __main__ - ERROR - error retrieving namespace_ids

We have fixed the problem but have not deployed the fix yet. We have a deployment scheduled for next week and will let you know when it's out.

This issue (two missing dumps) has been noted on our Meta talk page. Note to self to update that conversation when the issue is resolved.

Hey all, I've deployed code that should fix the problem.

@ArielGlenn Can you please trigger the run when you have time and let me know if it works? Thanks.

I have verified via a dry run that the namespaces and wiki lists are now retrievable. Based on this, the next scheduled run on the 20th should be OK. Thanks!