
Unable to download PDF files of articles
Closed, Resolved (Public)

Description

As reported by "The Voidwalker" on VillagePump: when using OCG, which provides the "Download as PDF" feature, rendering starts and finishes, but clicking the resulting download link returns "file not found".

I have confirmed this and an example link is:

https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=download&collection_id=be20b30664758b9ffc90d36d682cc6add1186e6e&writer=rdf2latex&return_to=Wikipedia%3AVillage+pump+%28technical%29

Event Timeline

Change 284392 had a related patch set uploaded (by Giuseppe Lavagetto):
Add "fake" codfw entries for ocg to hotfix an issue

https://gerrit.wikimedia.org/r/284392

Btw just to let others know my findings:

  1. pdf generation works fine
  2. we have a flawed mechanism by which OCG signals to MediaWiki where to fetch an article, for which I already opened at least one UBN! ticket that went completely and happily ignored
  3. This flawed mechanism uses "ocg1001:8000" strings to signal to MediaWiki where the PDF has been generated. In eqiad, that would resolve to ocg1001.eqiad.wmnet, which is indeed the host. In codfw, the appservers are not set up to search for addresses in eqiad.wmnet, and for a good reason. Thus, the file won't be found.
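To make point 3 concrete, here is a minimal sketch of the failure mode. It is a toy model, not OCG or MediaWiki code: the `SEARCH_DOMAIN` table, the `RECORDS` set and both function names are hypothetical, and it assumes each data center's appservers append exactly one search domain to a short host name.

```python
# Toy model of search-domain expansion; not OCG or MediaWiki code.
# Assumption: appservers in each data center use a single DNS search domain.
SEARCH_DOMAIN = {"eqiad": "eqiad.wmnet", "codfw": "codfw.wmnet"}

# Hypothetical record set before the hotfix: the OCG host exists only in eqiad.
RECORDS = {"ocg1001.eqiad.wmnet"}


def expand(short_host_port, datacenter):
    """Return the FQDN an appserver in `datacenter` would try for 'host:port'."""
    host, _, _port = short_host_port.partition(":")
    return f"{host}.{SEARCH_DOMAIN[datacenter]}"


def resolves(short_host_port, datacenter, records):
    """True if the expanded name is present in the given record set."""
    return expand(short_host_port, datacenter) in records
```

With these assumptions, `resolves("ocg1001:8000", "eqiad", RECORDS)` is true while the same string looked up from codfw is not, which matches the "file not found" symptom. Adding an `ocg1001.codfw.wmnet` entry to the record set makes the codfw lookup succeed without touching the string OCG emits.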

So the least lame hotfix I could think of is to add entries in the DNS for ocg1001.codfw.wmnet and so on... it's horrible, but the horror all lies within ocg in this case.

Change 284392 merged by Giuseppe Lavagetto:
Add "fake" codfw entries for ocg to hotfix an issue

https://gerrit.wikimedia.org/r/284392

So, after wiping the recursor caches for the negative record, I can download PDFs just fine.
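For anyone wondering why the cache wipe was needed: a caching resolver that has already answered "no such name" keeps returning that answer even after the record is added. The sketch below is a toy illustration only. `ToyRecursor` is a hypothetical class, it ignores TTLs (a real negative cache entry would eventually expire on its own), and the address used with it in the usage note is from the reserved documentation range, not a real host.

```python
class ToyRecursor:
    """Toy caching resolver; not a model of any specific DNS software."""

    def __init__(self, zone):
        self.zone = zone   # authoritative data: name -> address
        self.cache = {}    # name -> address, or None for a cached negative answer

    def lookup(self, name):
        # Answer from the cache if present, including cached negative answers.
        if name not in self.cache:
            self.cache[name] = self.zone.get(name)
        return self.cache[name]

    def wipe(self, name):
        # Drop any cached answer (positive or negative) for this name.
        self.cache.pop(name, None)
```

In this model, a lookup of `ocg1001.codfw.wmnet` made before the record exists caches `None`. Adding the record to `zone` (say as `192.0.2.10`) does not change the answer until `wipe` is called for that name.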

I am not resolving this issue as we seriously need to fix ocg.

For reference, here is the list of bugs I filed months ago that have seen no activity since:

T120077 OCG should not be contacted directly from the appservers but only via LVS
T120079 The OCG cleanup cache script doesn't work properly

So I am resolving this bug, and maybe adding another one to the pile of tickets I open on this that get ignored.

@Joe -- it seems like we should open another task for "document migration procedure from codfw to eqiad", right? Isn't that the root cause of the problems here?

How does this work for the other WMF services? It seems that starting in codfw after the migration with a cold cache is a bad idea. But the caching strategy of OCG doesn't really lend itself to master-slave relationships. Perhaps what we want is some sort of rsync command to prefill caches on the codfw side? Some design work needs to be done.