Fri, May 14
@Arlolra Thank you for the quick move. In an attempt to help you diagnose the problem (or verify that the patch works fine), here is a list of other URLs suffering from the same symptom:
Sun, May 9
@Arlolra Thank you so much for your effort. I'm not sure if this (kind of) problem is old or if this is a regression, because at the same time we tend to make MWoffliner more strict. What is sure is that it impacts maybe 40% of all wikis and that we can barely scrape a big Wikipedia fully anymore.
Sat, May 8
@Arlolra I allow myself to ping you on this as the impact is super high for us and I don't know who else to ping. That said, not sure if this is an error in Parsoid or in the API service itself.
Mon, May 3
I have moved from 6 slots to 10; hopefully this won't destabilise the server.
@ArielGlenn Still a problem?
The root problem is in the wiki code (free text in place of a size in pixels)... but Parsoid should probably not generate broken HTML from it.
Sat, May 1
@Aklapper This bug is a serious one for the Kiwix team as it impacts many (prominent) Wikimedia wikis and makes our whole scraping die because the backend does not deliver. Any chance someone could have a look at why such URLs simply fail in the backend?
Mobile API is impacted as well, see for example https://de.wikipedia.org/api/rest_v1/page/mobile-sections/Chronik_der_COVID-19-Pandemie_in_den_Vereinigten_Staaten_2020
Mon, Apr 26
Apr 6 2021
@ssastry thx a lot!
@Arlolra Great to see a patch here. Thx! A few users had reported the problem on our side over the years. What is the timeline for prod? https://ru.wikipedia.org/api/rest_v1/page/html/%D0%9D%D0%B0%D0%BC%D0%B8%D0%B1%D0%B8%D1%8F#mwAdg still seems to be buggy.
Apr 4 2021
@Arlolra Indeed and it seems to work fine in the ZIM as well http://library.kiwix.org/wikisource_fr_all_maxi/A/De_la_litt%C3%A9rature_des_n%C3%A8gres/4. Thx.
@Krinkle Thx, I'm still in touch with a developer of this custom version of the MathJax extension and he will try to load the MathJax JS code within the ResourceLoader.
Mar 28 2021
@Krinkle I have mailed someone at Proofwiki and I was given the following link (seems to be an older version of the MathJax extension) https://www.mediawiki.org/w/index.php?title=Extension:MathJax&oldid=1184913. Not sure what the next step would be. Would it simply work if they update the extension to the latest version?
Mar 27 2021
Might it be that this ticket has been invalidated by the deprecation of server-side Graphoid chart rendering?
Mar 26 2021
A few days ago a user opened a third ticket (https://github.com/openzim/mwoffliner/issues/1402) about that on MWoffliner. I don't really understand why this old bug, which seems easy to fix, has not been tackled so far.
Mar 25 2021
Mar 23 2021
Mar 21 2021
Mar 18 2021
@Arlolra Thx for the patch. Hopefully soon in prod!
Feb 18 2021
Feb 10 2021
Jan 26 2021
We came back to this ticket on the Kiwix side with https://github.com/kiwix/kiwix-android/pull/2562#issuecomment-767382951
Jan 21 2021
@Audiodude and I are the maintainers. I got an email informing me about this ticket.
Jan 5 2021
Another example: here is the Classic rendering:
Jan 3 2021
Someone has fixed the problem in the wiki source at https://li.wiktionary.org/w/index.php?title=Wiktionary%3AVeurblaad&type=revision&diff=645102&oldid=628893.
Jan 1 2021
Still a new case here: https://es.wikipedia.org/api/rest_v1/page/html/Anexo:Baloncesto_en_los_Juegos_Mediterr%C3%A1neos_de_1951, 15 days after the image was renamed on Commons (https://commons.wikimedia.org/w/index.php?title=Special:Log&page=File%3AFlag+of+Egypt+%281922%E2%80%931958%29.svg), the rendered HTML still points to the wrong/old/404 thumbnail.
Dec 26 2020
@ssastry Thank you very much for the analysis.
Dec 24 2020
@Aklapper I tend to think the problem is real but maybe has been wrongly reported.
Dec 9 2020
I have found the password for "WP 1.0 bot" (Thank you backup!)
Oct 8 2020
Sep 26 2020
@ArielGlenn Oh yes... sounds like a good candidate. Thx for linking both tickets!
Sep 23 2020
@ArielGlenn Seems you are right. Indeed Stripped from the link, see https://el.wikibooks.org/api/rest_v1/page/html/inux_%CE%B3%CE%B9%CE%B1_%CE%B1%CF%81%CF%87%CE%AC%CF%81%CE%B9%CE%BF%CF%85%CF%82%2F%CE%93%CE%B9%CE%B1%CF%84%CE%AF_Linux%3B
@ArielGlenn I have put the online link as a reference. It is not deleted, and if there is a typo, where is it (I cannot find it)? You can look at the problem differently: how to get the Parsoid output via the REST API for this very specific article?
Sep 22 2020
Jul 27 2020
@Andrew mwoffliner1 & mwoffliner3 have been re-created. Hope this solves your problem :)
Jul 21 2020
@ema Not really, this is a case which had to be handled in MWoffliner. That is all.
Jul 16 2020
@Andrew Then that is good for me. Would deleting the instances and recreating them be good enough to solve our problem? Or should we follow another procedure?
Jul 15 2020
@Andrew Hi Andrew. Which VMs are we talking about exactly? mwoffliner1, mwoffliner2 and mwoffliner3? It is possible for us to invest time to recreate them, but I would like to make sure with you that we won't get weaker hardware. It is a really critical point for us that they get really similar hardware (like mwoffliner5).
Jul 7 2020
Jun 24 2020
Jun 21 2020
Jun 17 2020
@CDanis We get many HTTP 429 errors from the rest(base) API if we scrape with nodes outside the VPS cluster. Really a hassle to deal with. It seems to me we are impacted... But maybe I get something wrong.
Jun 16 2020
Jun 15 2020
FYI: Because it seems there is a knowledge/communication gap about the openZIM/Kiwix dumping solution, a Tech talk is currently being planned (probably in August) https://phabricator.wikimedia.org/T255392. If you have questions/concerns/remarks, please comment on that ticket. I will make sure that the presentation addresses them.
Jun 8 2020
I can only emphasise that a ticket which does not transparently explain the problem it is trying to solve is going to be successful only by chance. Therefore, this is probably my last comment on this, as we are running a blind discussion here. One thing I heard is that the dumps might have to include the Parsoid semantic tags, which is not the case for the dumps issued by MWoffliner (MWoffliner removes them). If this is the case, a POC can be done within a few hours to avoid removing them; we can even just store the raw HTML issued from the API JSON.
Jun 5 2020
I believe I don't understand why additional HTML dumps are necessary, but as @ArielGlenn has written, we do all of this already on a monthly basis:
Jun 4 2020
@aborrero Thank you very much. Everything works like a charm now!
@Andrew I have been able to recreate mwoffliner2 properly. I believe 4 VCPUs and 8GB of RAM are missing in the quota.
@Andrew Thank you very much for this! I have been able to delete mwoffliner1 and recreate it successfully with an xlarge-xtradisk profile. The VM is up and running. I wanted to recreate mwoffliner3 the same way and deleted it, but failed to create a new xlarge-xtradisk instance. It seems the quota is not right (too low). Am I wrong somewhere?
May 28 2020
@Aklapper Thx for pointing me to this, I have updated the task with the expected information.
May 19 2020
May 9 2020
May 4 2020
@abi_ Thank you!