User Details
- User Since
- Oct 7 2014, 7:43 AM (397 w, 4 d)
- Availability
- Available
- IRC Nick
- kelson
- LDAP User
- Kelson
- MediaWiki User
- Kelson
Mon, May 16
@Andrew VM recreated.
Fri, May 13
@Andrew OK, shit happens from time to time. I will recreate the VM during the weekend.
Sun, May 8
@Andrew All our VMs have been migrated to Debian bullseye. Therefore this ticket can be closed.
Sat, May 7
@Reedy Thank you!
@Reedy Would you please be able to help here?
Sun, May 1
We have only one VM left, which should be destroyed within the next 2 weeks.
@vadim-kovalenko Thx for coming back with this insightful comment. Does that mean that the Wikipedians behind this template have chosen to do so, or is it not their choice (a feature/bug in MediaWiki) and they complain about it as well?
@Andrew Thank you, meanwhile, almost all VMs have been newly set up!
@abi_ Thank you very much!
Apr 21 2022
Apr 20 2022
mwoffliner1 -> 300GB
mwoffliner2 -> 300GB
mwoffliner3 -> 300GB
mwoffliner4 -> 300GB
mwcurator -> 200GB
Apr 18 2022
Apr 17 2022
@Andrew I have deleted the mwoffliner3 instance (it is stateless, and we need to recreate it with debian-11.0-bullseye anyway) and then recreated the same kind of VM (but based on debian-11.0-bullseye) to host the migrated instance of wp1. This new instance is called mwcurator (we have been talking for years about a better name than "WP1" anyway). The VM has been created, but we now have 0 free disk quota. Now that the volumes are handled separately, we need more (or at least some). Actually, considering that all old mwoffliner VMs (mwoffliner1, mwoffliner2 and mwoffliner3) will be migrated soon, we need 1.1 TB to be able to recreate 100% of our VMs with hardware similar to what we had before. Would you be able to change that, please?
Apr 2 2022
@Andrew I recreated mwoffliner4. I guess it should be OK now for you.
Mar 29 2022
Of the 4 Thumbor throttles, only 1 is per-IP address. The other three are based on the original file (failure or concurrency) or filetype. RFC 6585 explicitly does not define how users or requests should be counted. We've also used 429 with "too many" being 1 elsewhere in the Wikimedia infrastructure, though that's largely been replaced with 403s for media at least. Using 503 (with or without Retry-After) would be an option, but I don't really see it as necessarily better than 429.
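From a client's point of view, the 429-versus-503 question above matters mostly through the optional Retry-After header, which HTTP defines as either a number of seconds or an HTTP date. Below is a minimal sketch of how a scraper might turn it into a wait time. The function name `retry_delay_seconds` and the 5-second fallback are assumptions made for illustration; they are not taken from Thumbor or MWoffliner.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(status: int, retry_after: Optional[str],
                        now: Optional[datetime] = None,
                        default: float = 5.0) -> Optional[float]:
    """Return how long to wait before retrying, or None if no retry applies.

    Only 429 and 503 are treated as "come back later" responses here.
    """
    if status not in (429, 503):
        return None
    if retry_after is None:
        return default  # hypothetical fallback when the header is absent
    value = retry_after.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        target = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparsable header: fall back to the default
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never return a negative delay if the date is already in the past.
    return max(0.0, (target - now).total_seconds())
```

With this shape, a client handles 429 and 503 identically, which is one way to read the point that neither code is necessarily better than the other.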
Mar 28 2022
@ArielGlenn Sorry, I meant T286588
@ArielGlenn Hi, I am already coming back to you! T299993 is hidden from me and I have no visibility on it. Is that already implemented? If not, what timeline are we looking at?
Mar 24 2022
@Andrew Yes it would kill the process and I would have to restart everything.
@Andrew I don't know what a "special-purpose wdqs server" is; I suspect this is a mistake. This server has a "special" profile; I remember asking you specifically for this config.
Mar 18 2022
@Arlolra Indeed, sorry for the duplicate
Mar 4 2022
@Aklapper I wondered whether having no prioritisation and keeping the status "Open, needs triage", although a team/someone is working on a solution, might at some point mislead people or impair progress on the ticket.
@Aklapper I thought you were in charge of the triage. What would be the proper thing to do?
@Aklapper This still needs triage?
Mar 3 2022
Feb 3 2022
This ticket is important for the openZIM/Kiwix community and in particular its Chinese audience, see https://github.com/openzim/mwoffliner/issues/840
I have created a similar ticket at the MWoffliner level: https://github.com/openzim/mwoffliner/issues/1587
Jan 27 2022
@ArielGlenn Thank you for putting WMCS in the loop. On what timeline should this refresh happen? I guess nothing will be done as long as this is not done.
@ArielGlenn Thank you! It seems to me the problem has been fixed now. I see (again) recent ZIM files in https://dumps.wikimedia.org/kiwix/ and the Kiwix mirror manager too, see https://download.kiwix.org/zim/wikipedia/wikipedia_ab_all_nopic_2022-01.zim.mirrorlist.
Jan 25 2022
wikipedia_nan_all_maxi_2022-01.zim for example is not listed in https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/... really strange!
@ArielGlenn Thank you for your feedback. I have created another task here: https://phabricator.wikimedia.org/T299993
Dec 23 2021
@ArielGlenn Any chance this ticket could be implemented at some point? It also seems that Wikimedia has not been mirroring for at least a month, because all the Wikipedia ZIM files are older than a month.
Dec 19 2021
Dec 14 2021
thx
Dec 12 2021
Dec 5 2021
@TheDJ I have no stats, but the problem is common enough that we regularly get bug reports at Kiwix because of this sole bug. My guess is that it is both pretty old and affecting a pretty large number of pages.
@Arlolra @tstarling The situation on this front has improved *a lot* these last months. I'm not sure whom to thank for that, but please pass on my "thank you". That said, the situation is not fully fixed. I went through the past buggy examples and have added a few new ones which make recipes fail at https://farm.openzim.org:
- https://de.wikisource.org/api/rest_v1/page/html/Schwere%2C_Elektricit%C3%A4t_und_Magnetismus%2FErster_Theil
- https://fr.wikiversity.org/api/rest_v1/page/html/M%C3%A9canique_des_syst%C3%A8mes_de_points%2FCin%C3%A9tique_et_dynamique_d'un_syst%C3%A8me_discret_de_points_mat%C3%A9riels
- https://en.wikiversity.org/api/rest_v1/page/html/Quizbank%2FcalcPhyEMqAll%2Fc05
- https://pt.wikibooks.org/api/rest_v1/page/html/An%C3%A1lise_real%2FImprimir
Oct 10 2021
@Samtar Not that I'm aware of... and this is a pretty annoying one. We regularly have users (at Kiwix) complaining about getting an old revision of an article because the REST API cache is not refreshed properly.
Oct 5 2021
Oct 2 2021
Oct 1 2021
@Arlolra Thank you very much!
Sep 26 2021
Sep 15 2021
@tstarling Makes sense. Considering the age of this ticket, I will focus on T288889, which is more recent and has a clearer symptom. I monitor it and update it from time to time with new pages which don't render.
Sep 14 2021
Another case with WPFR:
$ time curl -sI "https://fr.wikipedia.org/api/rest_v1/page/mobile-sections/Liste_des_cantons_fran%C3%A7ais_depuis_2015" | grep 504
HTTP/2 504
Other case:
$ time curl -sI "https://id.wikipedia.org/api/rest_v1/page/mobile-sections/Daftar_tokoh_Wales" | grep 504
HTTP/2 504
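The probes above pipe `curl -sI` through `grep 504`, which would also match any other header line that happens to contain "504". A slightly stricter check, sketched here in Python, reads the status code from the status line only. The helper names `status_from_headers` and `is_gateway_timeout` are hypothetical.

```python
import re
from typing import Optional

# Matches status lines such as "HTTP/2 504" or "HTTP/1.1 200 OK".
_STATUS_LINE = re.compile(r"^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b")


def status_from_headers(raw_headers: str) -> Optional[int]:
    """Extract the status code from the first line of `curl -sI` output."""
    lines = raw_headers.splitlines()
    if not lines:
        return None
    match = _STATUS_LINE.match(lines[0].strip())
    return int(match.group(1)) if match else None


def is_gateway_timeout(raw_headers: str) -> bool:
    """True only when the response status itself is 504."""
    return status_from_headers(raw_headers) == 504
```

Fed the header text captured from one of these probes, it flags the 504 without being fooled by, say, a `content-length: 504` header.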
Sep 1 2021
@Arlolra Thank you for the explanation. WPFI scrape now works fine, see https://farm.openzim.org/pipeline/742bcb9b43e0eecc7177e216
Aug 31 2021
@Arlolra How can I know when this fix will be deployed to production? Here and in general?
Aug 29 2021
For reference, the problem was first reported at https://github.com/openzim/mwoffliner/issues/1520
@ssastry Thx, I should have seen it myself.
Aug 28 2021
@Aklapper The problem is clearly visible (from a user perspective) on the REST desktop HTML as well, look at:
https://ru.wikipedia.org/api/rest_v1/page/html/%D0%A2%D0%AD%D0%9C2#%D0%93%D0%B0%D0%BB%D0%B5%D1%80%D0%B5%D1%8F
A user recently reported to us missing images in the REST API HTML output because of this bug:
Aug 20 2021
Here is probably another case:
$ time curl -sI "https://zh.wikisource.org/api/rest_v1/page/html/%E6%98%8E%E6%9C%AC%E6%8E%92%E5%AD%97%E4%B9%9D%E7%B6%93%E7%9B%B4%E9%9F%B3_(%E5%9B%9B%E5%BA%AB%E5%85%A8%E6%9B%B8%E6%9C%AC)%2F%E5%8D%B7%E4%B8%8B" | grep 504
HTTP/2 504
Aug 19 2021
@Krinkle Thx
Other case:
$ time curl -sI "https://de.wikisource.org/api/rest_v1/page/html/Schwere%2C_Elektricit%C3%A4t_und_Magnetismus%2FErster_Theil" | grep 504
HTTP/2 504
Other case:
$ time curl -sI "https://zh.wikisource.org/api/rest_v1/page/html/%E6%98%8E%E6%9C%AC%E6%8E%92%E5%AD%97%E4%B9%9D%E7%B6%93%E7%9B%B4%E9%9F%B3_(%E5%9B%9B%E5%BA%AB%E5%85%A8%E6%9B%B8%E6%9C%AC)%2F%E5%8D%B7%E4%B8%8B" | grep 504
HTTP/2 504
@Arlolra Sounds good. The overall situation with wiki timeouts/errors has improved over the past months, but we still have problems fully scraping many of them, mostly the big ones. We will keep trying to attach our MWoffliner tickets to Phabricator tickets so that info keeps percolating up to you :)
@Arlolra Thank you for shedding some light on all these server errors and helping us get a differentiated view of the global problem.
Kiwix users are continuously reporting oddities regarding content and content formatting. It is challenging for us to identify whether the problem is in MWoffliner or upstream. To help identify where things go wrong, MWoffliner started earlier this year to be stricter about the MediaWiki backend HTTP responses. For example, if we get HTTP 5xx errors, MWoffliner now dies. If Parsoid does not deliver because of performance issue(s), this will probably end with an HTTP 504 or 502 error at the API backend level. This is why this ticket and all its siblings of that kind are really important to us. For the moment, a single such case among the 6M articles of WPEN makes MWoffliner stop the whole scrape.
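The fail-fast policy described above can be sketched roughly as follows. This is not MWoffliner's actual code; the names `ScrapeAborted`, `check_backend_status` and `scrape` are hypothetical, and the input is reduced to (url, status) pairs to keep the example self-contained.

```python
class ScrapeAborted(Exception):
    """Raised when the backend answers with a server-side (5xx) error."""


def check_backend_status(url: str, status: int) -> None:
    """Abort on any HTTP 5xx; let every other status through to the caller."""
    if 500 <= status <= 599:
        raise ScrapeAborted(f"{url} answered HTTP {status}; stopping the scrape")


def scrape(pages):
    """Process (url, status) pairs in order; a single 5xx stops everything."""
    done = []
    for url, status in pages:
        check_backend_status(url, status)
        done.append(url)
    return done
```

The trade-off is the one stated in the comment: strictness makes upstream failures visible instead of silently producing incomplete content, at the price of one 504 or 502 ending a run that covers millions of articles.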
Aug 18 2021
@Arlolra The problem with https://fi.wikipedia.org/api/rest_v1/page/mobile-sections/S%C3%A4teily%C2%ADpaine is still there, so if this is a cache problem, it does not disappear quickly :(
This impacts many wikis; here is another example with WPEL:
https://el.wikipedia.org/api/rest_v1/page/mobile-sections/%CE%A5%CF%80%CE%AC%CF%81%CF%87%CE%BF%CF%85%CE%BD_%CE%A7%CF%81%CF%85%CF%83%CF%8C%CF%88%CE%B1%CF%81%CE%B1_%CE%95%CE%B4%CF%8E%3B
Aug 15 2021
New case with https://github.com/openzim/mwoffliner/issues/1526
Might be the same root cause as T226931
Could we do something to prevent the API from serving months-old pages? I am a bit surprised that after two years this ticket is not even triaged.
Aug 14 2021
@Arlolra The HTTP error code is not the same, but otherwise it looks to me like a problem similar to https://phabricator.wikimedia.org/T280381