Thanks for the reminder. I just updated the page now. Let me know if I missed anything.
Jun 21 2017
Jun 18 2017
May 12 2017
@Jack_who_built_the_house Yeah, you're right; I misread your post. Thanks for the correction
May 9 2017
Wikimedia generally doesn't retain old dumps. I think the threshold is 10 or so dumps. For example, I usually wouldn't look at anything that's not listed on https://dumps.wikimedia.org/ruwiki/ where the oldest is currently 2017-01.
Mar 17 2017
@Nemo_bis Fair enough: I misunderstood your comment. It seems this task is about creating static .html files, such as those in the compressed archives at https://dumps.wikimedia.org/other/static_html_dumps/. I was more focused on a full HTML dump in an XML / SQLite / JSON format. In other words, something comparable to the current XML data dumps at https://dumps.wikimedia.org/backup-index.html, except with HTML content.
Mar 16 2017
@Nemo_bis Thanks for the clarification. I agree that there really isn't a clear use case for a directory of HTML files, particularly since the Wikimedia API allows users to get the HTML for any page. Full HTML dumps are much more useful, as there is no real bulk facility to get all this data (e.g., getting the HTML for every page in English Wikipedia).
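To illustrate the per-page route mentioned above, here is a minimal sketch that asks the MediaWiki action API (`action=parse`) for the rendered HTML of a single page. The function names, the default host, and the User-Agent string are my own placeholders, not anything from this task. Note that it costs one request per page, which is exactly why it doesn't substitute for a bulk dump.

```python
import json
import urllib.parse
import urllib.request


def build_parse_url(title: str, host: str = "en.wikipedia.org") -> str:
    """Build an action=parse URL asking for the rendered HTML of one page."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{host}/w/api.php?" + urllib.parse.urlencode(params)


def fetch_page_html(title: str, host: str = "en.wikipedia.org") -> str:
    """Fetch the rendered HTML for a single page (one network request per page)."""
    request = urllib.request.Request(
        build_parse_url(title, host),
        # Placeholder User-Agent; a real client should identify itself properly.
        headers={"User-Agent": "html-dump-example/0.1 (illustrative sketch)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # With formatversion=2, the HTML is a plain string under parse.text.
    return data["parse"]["text"]
```

Calling `fetch_page_html("Earth")` would return the HTML for that one article; repeating this for every page in a large wiki is the part that doesn't scale.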
Mar 15 2017
The use case for the HTML dumps would be similar to the use case for the current XML dumps: to provide Wikimedia content in a universal and easily accessible format (XML, JSON, SQLite). Wikimedia already provides a regular dumping service for the XML dumps. I think it's safe to say that most users of these XML dumps would want HTML dumps as well. Does it matter whether or not the use case is clarified to be development or research or something else? The demand still exists, and has been demonstrated in this thread by three commenters.
Mar 2 2017
I don't have anything valuable to add, but for what it's worth, I think the new look is fine. I think it is more visually appealing and modern. So, +1 from me.
Jan 18 2017
Hi Nemo. Thanks for adding me as a subscriber.
Nov 27 2016
Hi Alex. I think this is due to the recent changes to incorporate UCA / numeric sorting for many wikis.
Aug 17 2016
I'm saying that using static HTML pages seems to be zero-effort on virtually all platforms.
Aug 9 2016
I wasn't able to reproduce the issue. I ran wget for 6 big files between 9 AM and 11 PM EDT. No failures or hiccups. See below.
Aug 8 2016
Nope. Seems to work now. From above:
It would be very helpful if you could give me a date that you know downloads were working, and the earliest date that you know they were failing for large files.
Aug 5 2016
May 6 2016
Yup. I think we're good. Thanks again for looking into it. Let me know if there is anything else (should I be the one to mark it resolved?)
May 2 2016
The latest dump looks good. I'm running it through my XOWA parser now, and have not seen any issues. I'll post a quick message to the mailing list now. Thanks again for the follow-up!
Apr 29 2016
Thanks! I pulled down the latest dump and it looks good. The Module count is up to 3163 and individual spot-checking looks good.
Nov 24 2015
Cool. Thanks for the explanation on revision deletion.
Nov 11 2015
Ah! Thanks for the explanations. I don't have much to add, just some minor points.
Nov 10 2015
Well, here are some comments and questions. Hopefully they aren't too obtuse, and might be of some utility.
Sep 21 2015
Sep 14 2015
Sep 7 2015
Aug 31 2015
Aug 24 2015
Aug 17 2015
Aug 10 2015
Aug 3 2015
Jul 20 2015
Jul 13 2015
Jul 6 2015
As per https://lists.wikimedia.org/pipermail/xmldatadumps-l/2015-July/001149.html, I also observed the same behavior in multiple Swedish wiki dumps, as well as the Spanish Wikipedia dump. I excerpt the relevant portion below.
Jun 29 2015
Jun 22 2015
Jun 19 2015
@ArielGlenn Cool. Didn't know that it would just be 3 more lines. If only all changes could scale as nicely as that. :)
Jun 18 2015
@ArielGlenn Thanks. I think 5 rounds would be better, especially in comparison to the original 2 round proposal.
Jun 17 2015
Can the stub dumps be extended from step 29 [1] to either step 30 [2] or even 31 [3]? This would generate usable dumps early in the cycle and help mitigate the staleness factor that DCDuring brings up.
Jun 15 2015
Jun 14 2015
@GWicke I think this is great. I took a look at a few articles in http://dumps.wikimedia.org/htmldumps/dumps/simple.wikipedia.org.articles.ns0.sqlite3.xz and they looked fine. I particularly like how these dumps include other small details like category, indicator and alt text. Very nice work.
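For anyone else who wants to spot-check a dump like the one above, here is a rough sketch of how a downloaded `.sqlite3.xz` file could be unpacked and inspected with only the Python standard library. The function names and file paths are placeholders of mine, and I deliberately list the tables from `sqlite_master` rather than assume any particular schema for the dump.

```python
import lzma
import shutil
import sqlite3


def decompress_xz(xz_path: str, db_path: str) -> None:
    """Stream-decompress an .xz file to disk without loading it all into memory."""
    with lzma.open(xz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def list_tables(db_path: str) -> list[str]:
    """Return the table names in a SQLite file, sorted alphabetically."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

After `decompress_xz("dump.sqlite3.xz", "dump.sqlite3")`, calling `list_tables("dump.sqlite3")` shows what the file actually contains before any queries are written against it.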