
Jun 21 2017

gnosygnu added a comment to T168204: Archive Wikimedia repository for XOWA: https://phabricator.wikimedia.org/diffusion/GXOW/ (moved to GitHub).

Thanks for the reminder. I just updated the page now. Let me know if I missed anything.

Jun 21 2017, 3:01 AM · Diffusion-Repository-Administrators

Jun 18 2017

gnosygnu created T168204: Archive Wikimedia repository for XOWA: https://phabricator.wikimedia.org/diffusion/GXOW/ (moved to GitHub).
Jun 18 2017, 2:12 PM · Diffusion-Repository-Administrators

May 12 2017

gnosygnu added a comment to T164755: Dead links on dumps.wikimedia.org.

@Jack_who_built_the_house Yeah, you're right; I misread your post. Thanks for the correction.

May 12 2017, 1:37 AM · Russian-Sites, Dumps-Generation

May 9 2017

gnosygnu added a comment to T164755: Dead links on dumps.wikimedia.org.

Wikimedia generally doesn't retain old dumps. I think the threshold is 10 or so dumps. For example, I usually wouldn't look at anything that's not listed on https://dumps.wikimedia.org/ruwiki/ where the oldest is currently 2017-01.

May 9 2017, 6:57 AM · Russian-Sites, Dumps-Generation

Mar 17 2017

gnosygnu added a comment to T17017: Wikimedia static HTML dumps broken.

@Nemo_bis Fair enough: I misunderstood your comment. It seems this task is about creating static .html files, such as those in the compressed archives at https://dumps.wikimedia.org/other/static_html_dumps/. I was more focused on a full HTML dump in an XML / SQLite / JSON format. In other words, something comparable to the current XML data dumps at https://dumps.wikimedia.org/backup-index.html, except in an HTML format.

Mar 17 2017, 4:21 AM · Services (later), Dumps-Generation, Readers-Community-Engagement, Patch-For-Review, RESTBase, Datasets-General-or-Unknown

Mar 16 2017

gnosygnu added a comment to T17017: Wikimedia static HTML dumps broken.

@Nemo_bis Thanks for the clarification. I agree that there really isn't a clear use case for a directory of HTML files, particularly since the Wikimedia API allows users to get the HTML for any page. Full HTML dumps are much more useful, as there is no real bulk facility to get all this data (e.g., to get the HTML for every page in English Wikipedia).

Mar 16 2017, 4:13 AM · Services (later), Dumps-Generation, Readers-Community-Engagement, Patch-For-Review, RESTBase, Datasets-General-or-Unknown

Mar 15 2017

gnosygnu added a comment to T17017: Wikimedia static HTML dumps broken.

The use case for the HTML dumps would be similar to the use case for the current XML dumps: to provide Wikimedia content in a universal and easily accessible format (XML, JSON, SQLite). Wikimedia already provides a regular dumping service for the XML dumps. I think it's safe to say that most users of these XML dumps would want HTML dumps as well. Does it matter whether the use case is clarified to be development, research, or something else? The demand still exists, and has been demonstrated in this thread by three commenters.

Mar 15 2017, 3:04 AM · Services (later), Dumps-Generation, Readers-Community-Engagement, Patch-For-Review, RESTBase, Datasets-General-or-Unknown

Mar 2 2017

gnosygnu added a comment to T155697: HTML/CSS style changes to the dumps.wikimedia.org main pages.

I don't have anything valuable to add, but for what it's worth, I think the new look is fine. I think it is more visually appealing and modern. So, from me.

Mar 2 2017, 3:39 AM · Datasets-General-or-Unknown, Patch-For-Review, Developer-Wishlist (2017)

Jan 18 2017

gnosygnu added a comment to T99483: Divide XML dumps by page.page_namespace (and figure out what to do with the "pages-articles" dump).

Hi Nemo. Thanks for adding me as a subscriber.

Jan 18 2017, 12:26 PM · Dumps-Generation

Nov 27 2016

gnosygnu added a comment to T151133: Broken SQL dump of Russian Wiktionary.

Hi Alex. I think this is due to the recent changes to incorporate UCA / numeric sorting for many wikis.

Nov 27 2016, 12:38 AM · Dumps-Generation, All-and-every-Wiktionary

Aug 17 2016

gnosygnu added a comment to T17017: Wikimedia static HTML dumps broken.

I'm saying that using static HTML pages seems to be a zero-effort on virtually all platforms

Aug 17 2016, 10:59 PM · Services (later), Dumps-Generation, Readers-Community-Engagement, Patch-For-Review, RESTBase, Datasets-General-or-Unknown

Aug 9 2016

gnosygnu added a comment to T142367: https downloads of large files from dataset1001 stop in the middle.

I wasn't able to reproduce the issue. I ran wget for 6 big files between 9 AM and 11 PM EDT. No failures or hiccups. See below.

Aug 9 2016, 3:28 AM · Datasets-General-or-Unknown

Aug 8 2016

gnosygnu added a comment to T142367: https downloads of large files from dataset1001 stop in the middle.

Nope. Seems to work now. From above:

Aug 8 2016, 1:08 PM · Datasets-General-or-Unknown
gnosygnu added a comment to T142367: https downloads of large files from dataset1001 stop in the middle.

It would be very helpful if you could give me a date that you know downloads were working, and the earliest date that you know they were failing for large files.

Aug 8 2016, 12:52 PM · Datasets-General-or-Unknown

Aug 5 2016

gnosygnu added a watcher for Dumps-Generation: gnosygnu.
Aug 5 2016, 4:18 AM

May 6 2016

gnosygnu added a comment to T133416: En Wikipedia stub dumps short for April 2016.

Yup. I think we're good. Thanks again for looking into it. Let me know if there is anything else (should I be the one to mark it resolved?)

May 6 2016, 2:09 AM · Dumps-Generation

May 2 2016

gnosygnu added a comment to T133416: En Wikipedia stub dumps short for April 2016.

The latest dump looks good. I'm running it through my XOWA parser now, and have not seen any issues. I'll post a quick message to the mailing list now. Thanks again for the follow-up!

May 2 2016, 3:11 AM · Dumps-Generation

Apr 29 2016

gnosygnu added a comment to T133416: En Wikipedia stub dumps short for April 2016.

Thanks! I pulled down the latest dump and it looks good. The Module count is up to 3163 and individual spot-checking looks good.

Apr 29 2016, 12:18 PM · Dumps-Generation
gnosygnu awarded T133416: En Wikipedia stub dumps short for April 2016 a Like token.
Apr 29 2016, 12:14 PM · Dumps-Generation

Nov 24 2015

gnosygnu added a comment to T114019: Dumps 2.0 for realz (planning/architecture session).

Cool. Thanks for the explanation on revision deletion.

Nov 24 2015, 5:06 AM · Dumps-Rewrite, Wikidata, Wikimedia-Developer-Summit-2016

Nov 11 2015

gnosygnu added a comment to T114019: Dumps 2.0 for realz (planning/architecture session).

Ah! Thanks for the explanations. I don't have much to add, but just some minor points

Nov 11 2015, 4:40 AM · Dumps-Rewrite, Wikidata, Wikimedia-Developer-Summit-2016

Nov 10 2015

gnosygnu added a comment to T114019: Dumps 2.0 for realz (planning/architecture session).

Well, here are some comments and questions. Hopefully they aren't too obtuse, and might be of some utility.

Nov 10 2015, 4:18 AM · Dumps-Rewrite, Wikidata, Wikimedia-Developer-Summit-2016

Sep 21 2015

gnosygnu committed rGXOW6141ae6029b2: v2.9.3.1.
v2.9.3.1
Sep 21 2015, 3:48 AM

Sep 14 2015

gnosygnu committed rGXOW505127ff0240: v2.9.2.1.
v2.9.2.1
Sep 14 2015, 1:55 AM

Sep 7 2015

gnosygnu committed rGXOW0e07abc30fdd: v2.9.1.1.
v2.9.1.1
Sep 7 2015, 2:11 AM

Aug 31 2015

gnosygnu committed rGXOW64dd88dd3657: v2.8.5.1.
v2.8.5.1
Aug 31 2015, 3:11 AM

Aug 24 2015

gnosygnu committed rGXOW859f8b44675f: v2.8.4.1.
v2.8.4.1
Aug 24 2015, 4:09 AM

Aug 17 2015

gnosygnu committed rGXOWf6c663fe88dd: v2.8.3.1.
v2.8.3.1
Aug 17 2015, 6:39 AM

Aug 10 2015

gnosygnu committed rGXOWf2ae8e8cfc79: v2.8.2.1.
v2.8.2.1
Aug 10 2015, 2:29 AM

Aug 3 2015

gnosygnu committed rGXOW12aa89101975: v2.8.1.1.
v2.8.1.1
Aug 3 2015, 5:43 AM

Jul 20 2015

gnosygnu committed rGXOW0c58bdd57f62: v2.7.3.1.
v2.7.3.1
Jul 20 2015, 3:42 AM

Jul 13 2015

gnosygnu committed rGXOW7509296330da: v2.7.2.1.
v2.7.2.1
Jul 13 2015, 2:08 AM

Jul 6 2015

gnosygnu added a comment to T103670: Several "Duplicate entry for key 'PRIMARY'" errors in enwiki-latest-pages-articles.xml.bz2 (05-Jun-2015 23:45, 11984805689 bytes).

As per https://lists.wikimedia.org/pipermail/xmldatadumps-l/2015-July/001149.html, I also observed the same behavior in multiple Swedish wiki dumps, as well as the Spanish Wikipedia dump. I excerpt the relevant portion below.

Jul 6 2015, 11:48 PM · Datasets-General-or-Unknown, MediaWiki-libs-Rdbms
gnosygnu committed rGXOW59bdb4b6ea25: v2.7.1.1.
v2.7.1.1
Jul 6 2015, 2:43 AM

Jun 29 2015

gnosygnu committed rGXOW407d26ca4ae0: v2.6.5.1.
v2.6.5.1
Jun 29 2015, 3:54 AM

Jun 22 2015

gnosygnu committed rGXOW6e89ee21c168: v2.6.4.1.
v2.6.4.1
Jun 22 2015, 3:47 AM

Jun 19 2015

gnosygnu added a comment to T89273: Produce stub dumps for all wikis as soon as a new month starts, then generate all other dumps on second round-robin cycle.

@ArielGlenn Cool. Didn't know that it would just be 3 more lines. If only all changes could scale as nicely as that. :)

Jun 19 2015, 1:31 PM · Dumps-Generation

Jun 18 2015

gnosygnu added a comment to T89273: Produce stub dumps for all wikis as soon as a new month starts, then generate all other dumps on second round-robin cycle.

@ArielGlenn Thanks. I think 5 rounds would be better, especially in comparison to the original 2 round proposal.

Jun 18 2015, 11:28 PM · Dumps-Generation

Jun 17 2015

gnosygnu added a comment to T89273: Produce stub dumps for all wikis as soon as a new month starts, then generate all other dumps on second round-robin cycle.

Can the stub dumps be extended from step 29 [1] to either step 30 [2] or even 31 [3]? This would generate usable dumps early in the cycle and help mitigate the staleness factor that DCDuring brings up.

Jun 17 2015, 11:18 PM · Dumps-Generation

Jun 15 2015

gnosygnu committed rGXOW34146c13acf2: v2.5.4.1.
v2.5.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWcc2f08e2ec1b: v2.6.3.1.
v2.6.3.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWf285598eb390: v2.5.1.1.
v2.5.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW5dc93b518b09: v2.5.2.1.
v2.5.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW7f5e822ef0c4: v2.4.4.1.
v2.4.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW0dcc220a62e3: v2.4.1.1.
v2.4.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWe9b2ed16797f: v2.2.4.1.
v2.2.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW51cb1a840bc0: v2.3.1.1.
v2.3.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW12bc426a060b: v2.4.3.1.
v2.4.3.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW26d59059ae33: v2.3.2.1.
v2.3.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW18bf933f7b50: v2.2.1.1.
v2.2.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWaa7eebd4825f: v2.4.2.1.
v2.4.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW0fa0b7d96d70: v2.2.2.1.
v2.2.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWfba53782274b: v2.1.4.1.
v2.1.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWef347819f2bb: v1.11.2.1.
v1.11.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW6423959af707: v1.12.1.1.
v1.12.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW4eb4726e544f: v1.10.2.1.
v1.10.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW4e4391b00058: v1.12.2.1.
v1.12.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWb21d7167747d: v1.10.3.1.
v1.10.3.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW8f12b76c7a27: v1.11.3.1.
v1.11.3.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW9ee388fcbee1: v1.9.5.1.
v1.9.5.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWa1b660872587: v1.10.1.1.
v1.10.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWbb6fcd55a800: v1.10.4.1.
v1.10.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWf4a9a430f923: v1.9.4.1.
v1.9.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW8be39c927f9b: v1.9.2.1.
v1.9.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW16f2cf35472f: v1.9.1.1.
v1.9.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWee423f9ce545: v1.8.3.1.
v1.8.3.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW4555155577b9: v1.9.3.1.
v1.9.3.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWdcf12f764ee5: v1.8.4.1.
v1.8.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWa7b10ec82ac3: v1.8.2.1.
v1.8.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW89098c038b44: v1.7.4.1.
v1.7.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW2bc23871fb74: v1.7.3.1.
v1.7.3.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW191e2ab639d0: v1.8.1.1.
v1.8.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW02ef9ba51e1d: v1.6.2.1.
v1.6.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW5afba070e277: v1.7.2.1.
v1.7.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW2ff4a719b0b2: v1.6.5.1.
v1.6.5.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW29cc541b287e: v1.7.1.1.
v1.7.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWbfb87244343f: v1.6.4.1.
v1.6.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW14fe274e8766: v1.6.3.1.
v1.6.3.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW70f63f88f9bd: v1.6.1.1.
v1.6.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW9cee6aafa4c4: v1.5.4.1.
v1.5.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW8899a00c7789: v1.5.2.1.
v1.5.2.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOW7f7d437cc83f: v1.5.3.1.
v1.5.3.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWd46aaca1e004: v1.5.1.1.
v1.5.1.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWdde2c79a7f41: v1.4.4.1.
v1.4.4.1
Jun 15 2015, 11:38 PM
gnosygnu committed rGXOWe1c138375c8f: v1.4.3.1.
v1.4.3.1
Jun 15 2015, 11:38 PM

Jun 14 2015

gnosygnu added a comment to T93396: Decide on format options for HTML and possibly other dumps.

@GWicke I think this is great. I took a look at a few articles in http://dumps.wikimedia.org/htmldumps/dumps/simple.wikipedia.org.articles.ns0.sqlite3.xz and they looked fine. I particularly like how these dumps include other small details like category, indicator, and alt text. Very nice work.

Jun 14 2015, 3:00 AM · Services (blocked), Dumps-Generation, RESTBase

Mar 3 2015

gnosygnu committed rGXOW13125c64be07: v1.4.1.1.
v1.4.1.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOW5957bd0cb0de: v1.4.2.1.
v1.4.2.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOW8106ccd5bef5: v1.3.4.1.
v1.3.4.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOW23027b610401: v1.3.5.1.
v1.3.5.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOW05bcd7aa046b: v1.3.2.1.
v1.3.2.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOW5ae26691ee7e: v1.2.1.1.
v1.2.1.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOWccedbbd7fea1: v1.2.2.1.
v1.2.2.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOWdfda0ce3bb06: v1.1.4.1.
v1.1.4.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOW628612e70361: v1.3.1.1.
v1.3.1.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOW92c556b6c1ab: v1.2.3.1.
v1.2.3.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOW83d1f4ab98a7: v1.3.3.1.
v1.3.3.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOW22ad02e0de3f: v1.2.4.1.
v1.2.4.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOW0bf40d5a6f12: v1.1.3.1.
v1.1.3.1
Mar 3 2015, 9:03 PM
gnosygnu committed rGXOWb55a1a0e8852: v0.12.0.0.
v0.12.0.0
Mar 3 2015, 9:03 PM