Refill stuck at 'Submitting your task...'
Closed, Resolved | Public | BUG REPORT

Description

Steps to Reproduce:

Browse to https://refill.toolforge.org/ and enter the title of any Wikipedia article. Click 'Fix Page'

Actual Results:

The status message 'Submitting your task...' appears, and remains indefinitely.

Expected Results:

The tool should begin processing the references in the article.

Event Timeline

I don't see any errors this time, which is interesting. It doesn't look stuck.

I can kick the API over.

Mentioned in SAL (#wikimedia-cloud) [2020-07-08T17:20:36Z] <wm-bot> <root> Deleted pod to restart the app T257471

I'm not seeing any traffic to celery so far.

Still not working, but I am digging around in the JavaScript and I can see this in the console:

Access to XMLHttpRequest at 'https://tools.wmflabs.org/refill-api/fixWikipage' from origin 'https://refill.toolforge.org' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: Redirect is not allowed for a preflight request.

This will need to be updated to https://refill-api.toolforge.org/fixWikipage
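For reference, a minimal sketch of why the redirect is fatal here: browsers do not follow redirects on a CORS preflight, so any non-2xx answer to the OPTIONS request fails the check before the real POST is ever sent. The function name and the 301 status in the usage line are assumptions for illustration (the thread does not say which redirect status was returned), and the check is simplified: it ignores allowed methods, allowed headers, and credentials.

```python
def preflight_allowed(status, headers, origin):
    """Simplified check: would this preflight response let the real request proceed?"""
    # A preflight must come back with a 2xx status; redirects (3xx) are never followed.
    if not 200 <= status < 300:
        return False
    # Header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    allowed = normalised.get("access-control-allow-origin")
    # The server must allow either every origin or exactly the requesting one.
    return allowed in ("*", origin)


ORIGIN = "https://refill.toolforge.org"

# Hypothetical redirect from the legacy host to the new one: rejected.
redirected = preflight_allowed(
    301, {"Location": "https://refill-api.toolforge.org/fixWikipage"}, ORIGIN
)
# A direct 2xx answer that names the origin: accepted.
direct = preflight_allowed(204, {"Access-Control-Allow-Origin": ORIGIN}, ORIGIN)
```

So pointing the frontend straight at the new host removes the redirect, but the new host still has to answer the preflight with the right header.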

I have checked https://github.com/zhaofengli/refill and it isn't hard-coded anywhere

I see it in the venv for pywikibot, in www/python/venv/lib/python3.4/site-packages/pywikibot/proofreadpage.py: https://tools.wmflabs.org/phetools/hocr_cgi.py
That can't help. I hope pywikibot doesn't have tools.wmflabs.org URLs hardcoded in.

It is. https://github.com/wikimedia/pywikibot/blob/374164444c5090f28a5c05430c94c71d6389a691/pywikibot/proofreadpage.py#L132

I don't know if that is causing this, though.

No, I don't think so, that code is to do with optical character recognition...

No other matches outside of binary cache things on a recursive grep in that tool (in refill-api that is). I haven't checked the refill tool yet.

In the refill tool it matches the minified js (which is loads of fun).

I am not sure where the webpack-minified JS is coming from so far.

It shows up in the app.<stuff>.js "return regeneratorRuntime.wrap(function(t){for(;;)switch(t.prev=t.next){case 0:return t.next=2,Me.getItem("userpref");case 2:return(i=t.sent)&&(n.preferences=i),t.abrupt("return",Ne.a.post("".concat("https://tools.wmflabs.org/refill-api","/").concat(e),n));"

So that is definitely a problem. Exactly where this vue (?) code is, I still don't know.

I also am not sure the paths are correct in the app after redirects. index.html shows "<script type="text/javascript" src="/refill/ng/app.dd418aea35e29ef5e4f8.js"></script>", but the redirect goes to /ng

Yeah, this is all about paths and domains. @bd808, I'm going to put this in the queue for these issues. I am not sure all of the building is done on Toolforge. It looks like webpack and the like may have been run locally by the dev. sed could fix things... and it might not.

This whole tool is very problematic in that it appears to be used by folks but has no maintainer. I spent nearly a whole day looking into the code base last week in the hopes of finding a way to set up a proper health check for it. That deeper dive left me feeling that the current tool is architecturally a poor fit for Toolforge. The redis + celery system is not ideal with a shared redis instance. Maybe if we had a reasonable understanding of how much data lives in the redis queue during normal operation it would be possible to make it more robust by introducing a redis sidecar in the tool itself. Having seen now how celery uses redis, and the very minor configuration that is allowed, I don't trust a shared redis instance at all for celery use. It is way too easy to imagine multiple tools all using the same database number (there are only 16 possible) for their queue.
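To put a rough number on the "only 16 possible" worry, here is a back-of-the-envelope sketch. It assumes each tool picks its redis database number uniformly at random, which is my simplification and not something measured on Toolforge. In practice tools tend to stick with the default database 0, which makes a clash more likely than this estimate.

```python
def collision_probability(tools, databases=16):
    """Chance that at least two tools share a database number (uniform random picks)."""
    if tools > databases:
        # Pigeonhole: more tools than database numbers guarantees a clash.
        return 1.0
    p_all_distinct = 1.0
    for already_taken in range(tools):
        # The next tool must land on one of the still-unused numbers.
        p_all_distinct *= (databases - already_taken) / databases
    return 1.0 - p_all_distinct


five_tools = collision_probability(5)
```

Under that assumption, five tools sharing one instance already have roughly a 50% chance that two of them use the same queue database.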

This tool will eventually be merged into InternetArchiveBot, but as I am a one-person dev team, don’t expect this to be accomplished anytime soon. Feel free to send some volunteers my way who are willing to help with development work.

I do, however, suspect this is all a problem in the frontend. I think the backend refill-api tool is totally fine. This would be old values in the javascript of the refill tool that talks to it. The redis and all that doesn't matter much here as far as I can tell so far.

The content at $HOME/public_html/ng/ is not tracked in git. It is just a pile of files generated using webpack. I live hacked the .js files there to use https://refill-api.toolforge.org instead of https://tools.wmflabs.org/refill-api. Now the next problem is that https://refill-api.toolforge.org needs to send a CORS header to allow the cross-origin JS polling.
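A sketch of that kind of rewrite, for anyone who has to repeat it on a fresh webpack build. The function name and the regular expression are my own illustration, not the exact edit made on the tool. It also assumes every legacy https://tools.wmflabs.org/<tool> URL maps to https://<tool>.toolforge.org, as it does for refill-api here.

```python
import re

# Legacy path-style URL: https://tools.wmflabs.org/<tool>[/rest]
LEGACY_URL = re.compile(r"https://tools\.wmflabs\.org/([a-z0-9][a-z0-9-]*)")


def rewrite_legacy_urls(text):
    """Replace legacy path-style tool URLs with subdomain-style ones."""
    # The tool name captured from the path becomes the subdomain; the rest of the URL is kept.
    return LEGACY_URL.sub(r"https://\1.toolforge.org", text)


minified = 'Ne.a.post("".concat("https://tools.wmflabs.org/refill-api","/").concat(e),n)'
fixed = rewrite_legacy_urls(minified)
```

Applied to the minified snippet quoted earlier in this task, only the host string changes; the surrounding code is left alone. The durable fix is still to rebuild from source with the new URL, since these files are not in git.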

I live hacked the Ingress object for refill-api to add an nginx.ingress.kubernetes.io/enable-cors: "true" annotation. The way to do that is via kubectl edit ingress refill-api-subdomain. The change I made will stick until someone deletes the existing Ingress. When that happens, the new Ingress will need the same fix.
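Since the fix has to be reapplied whenever the Ingress is recreated, here is a sketch of the same annotation as a JSON merge patch. The helper name is hypothetical, and this is an alternative to the interactive kubectl edit I actually used. I have not run it against this cluster.

```python
import json


def cors_annotation_patch():
    """Build a merge patch that adds the enable-cors annotation to an Ingress."""
    return {
        "metadata": {
            "annotations": {
                # Annotation values must be strings, hence "true" rather than True.
                "nginx.ingress.kubernetes.io/enable-cors": "true",
            }
        }
    }


patch_json = json.dumps(cors_annotation_patch())
```

The resulting string should be usable as the -p argument to kubectl patch ingress refill-api-subdomain --type merge, which makes the change scriptable instead of a manual edit.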

Thank you! files away for future reference