gnosygnu (gnosygnu)
User

User Details

User Since
Feb 28 2015, 2:42 PM (571 w, 1 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Gnosygnu [ Global Accounts ]

Recent Activity

Jun 21 2017

gnosygnu added a comment to T168204: Archive Wikimedia repository for XOWA: https://phabricator.wikimedia.org/diffusion/GXOW/ (moved to GitHub).

Thanks for the reminder. I just updated the page now. Let me know if I missed anything.

Jun 21 2017, 3:01 AM · Diffusion-Repository-Administrators

Jun 18 2017

gnosygnu created T168204: Archive Wikimedia repository for XOWA: https://phabricator.wikimedia.org/diffusion/GXOW/ (moved to GitHub).
Jun 18 2017, 2:12 PM · Diffusion-Repository-Administrators

May 12 2017

gnosygnu added a comment to T164755: Dead links on dumps.wikimedia.org.

@Jack_who_built_the_house Yeah, you're right; I misread your post. Thanks for the correction.

May 12 2017, 1:37 AM · Russian-Sites, Dumps-Generation

May 9 2017

gnosygnu added a comment to T164755: Dead links on dumps.wikimedia.org.

Wikimedia generally doesn't retain old dumps. I think the threshold is 10 or so dumps. For example, I usually wouldn't look at anything that's not listed on https://dumps.wikimedia.org/ruwiki/ where the oldest is currently 2017-01.

May 9 2017, 6:57 AM · Russian-Sites, Dumps-Generation

Mar 17 2017

gnosygnu added a comment to T17017: Wikimedia static HTML dumps broken.

@Nemo_bis Fair enough: I misunderstood your comment. It seems this task is about creating static .html files, such as those in the compressed archives at https://dumps.wikimedia.org/other/static_html_dumps/. I was more focused on a full HTML dump with an XML / SQLite / JSON format. In other words, something comparable to the current xml datadumps at https://dumps.wikimedia.org/backup-index.html, except in an HTML format.

Mar 17 2017, 4:21 AM · Services (later), Dumps-Generation, Readers-Community-Engagement, Patch-For-Review, RESTBase, Datasets-General-or-Unknown

Mar 16 2017

gnosygnu added a comment to T17017: Wikimedia static HTML dumps broken.

@Nemo_bis Thanks for the clarification. I agree that there really isn't a clear use-case for a directory of HTML files, particularly since the Wikimedia API allows users to get the HTML for any page. Full HTML dumps are much more useful, as there is no real bulk facility to get all this data (e.g., to get all the HTML for every page in English Wikipedia).

Mar 16 2017, 4:13 AM · Services (later), Dumps-Generation, Readers-Community-Engagement, Patch-For-Review, RESTBase, Datasets-General-or-Unknown

Mar 15 2017

gnosygnu added a comment to T17017: Wikimedia static HTML dumps broken.

The use case for the HTML dumps would be similar to the use case for the current XML dumps: to provide Wikimedia content in a universal and easily accessible format (XML, JSON, SQLite). Wikimedia already provides a regular dumping service for the XML dumps. I think it's safe to say that most users of these XML dumps would want HTML dumps as well. Does it matter whether or not the use case is clarified to be development or research or something else? The demand still exists, and has been demonstrated in this thread by three commenters.

Mar 15 2017, 3:04 AM · Services (later), Dumps-Generation, Readers-Community-Engagement, Patch-For-Review, RESTBase, Datasets-General-or-Unknown

Mar 2 2017

gnosygnu added a comment to T155697: HTML/CSS style changes to the dumps.wikimedia.org main pages.

I don't have anything valuable to add, but for what it's worth, I think the new look is fine. I think it is more visually appealing and modern. So, from me.

Mar 2 2017, 3:39 AM · Datasets-General-or-Unknown, Patch-For-Review, Developer-Wishlist (2017)

Jan 18 2017

gnosygnu added a comment to T99483: Divide XML dumps by page.page_namespace (and figure out what to do with the "pages-articles" dump).

Hi Nemo. Thanks for adding me as a subscriber.

Jan 18 2017, 12:26 PM · Dumps-Generation

Nov 27 2016

gnosygnu added a comment to T151133: Broken SQL dump of Russian Wiktionary.

Hi Alex. I think this is due to the recent changes to incorporate UCA / numeric sorting for many wikis.

Nov 27 2016, 12:38 AM · Dumps-Generation, All-and-every-Wiktionary

Aug 17 2016

gnosygnu added a comment to T17017: Wikimedia static HTML dumps broken.

I'm saying that using static HTML pages seems to be zero-effort on virtually all platforms

Aug 17 2016, 10:59 PM · Services (later), Dumps-Generation, Readers-Community-Engagement, Patch-For-Review, RESTBase, Datasets-General-or-Unknown

Aug 9 2016

gnosygnu added a comment to T142367: https downloads of large files from dataset1001 stop in the middle.

I wasn't able to reproduce the issue. I ran wget for 6 big files between 9 AM and 11 PM EDT. No failures or hiccups. See below.

Aug 9 2016, 3:28 AM · Datasets-General-or-Unknown

Aug 8 2016

gnosygnu added a comment to T142367: https downloads of large files from dataset1001 stop in the middle.

Nope. Seems to work now. From above:

Aug 8 2016, 1:08 PM · Datasets-General-or-Unknown
gnosygnu added a comment to T142367: https downloads of large files from dataset1001 stop in the middle.

It would be very helpful if you could give me a date that you know downloads were working, and the earliest date that you know they were failing for large files.

Aug 8 2016, 12:52 PM · Datasets-General-or-Unknown

Aug 5 2016

gnosygnu added a watcher for Dumps-Generation: gnosygnu.
Aug 5 2016, 4:18 AM

May 6 2016

gnosygnu added a comment to T133416: En Wikipedia stub dumps short for April 2016.

Yup. I think we're good. Thanks again for looking into it. Let me know if there is anything else (should I be the one to mark it resolved?)

May 6 2016, 2:09 AM · Dumps-Generation

May 2 2016

gnosygnu added a comment to T133416: En Wikipedia stub dumps short for April 2016.

The latest dump looks good. I'm running it through my XOWA parser now, and have not seen any issues. I'll post a quick message to the mailing list now. Thanks again for the follow-up!

May 2 2016, 3:11 AM · Dumps-Generation

Apr 29 2016

gnosygnu added a comment to T133416: En Wikipedia stub dumps short for April 2016.

Thanks! I pulled down the latest dump and it looks good. The Module count is up to 3163 and individual spot-checking looks good.

Apr 29 2016, 12:18 PM · Dumps-Generation

Nov 24 2015

gnosygnu added a comment to T114019: Dumps 2.0 for realz (planning/architecture session).

Cool. Thanks for the explanation on revision deletion.

Nov 24 2015, 5:06 AM · Dumps-Rewrite, Wikidata, Wikimedia-Developer-Summit-2016

Nov 11 2015

gnosygnu added a comment to T114019: Dumps 2.0 for realz (planning/architecture session).

Ah! Thanks for the explanations. I don't have much to add, but just some minor points

Nov 11 2015, 4:40 AM · Dumps-Rewrite, Wikidata, Wikimedia-Developer-Summit-2016

Nov 10 2015

gnosygnu added a comment to T114019: Dumps 2.0 for realz (planning/architecture session).

Well, here are some comments and questions. Hopefully they aren't too obtuse, and might be of some utility.

Nov 10 2015, 4:18 AM · Dumps-Rewrite, Wikidata, Wikimedia-Developer-Summit-2016

Jul 6 2015

gnosygnu added a comment to T103670: Several "Duplicate entry for key 'PRIMARY'" errors in enwiki-latest-pages-articles.xml.bz2 (05-Jun-2015 23:45, 11984805689 bytes).

As per https://lists.wikimedia.org/pipermail/xmldatadumps-l/2015-July/001149.html, I also observed the same behavior in multiple Swedish wiki dumps, as well as the Spanish Wikipedia dump. I excerpt the relevant portion below.

Jul 6 2015, 11:48 PM · Datasets-General-or-Unknown, MediaWiki-libs-Rdbms

Jun 19 2015

gnosygnu added a comment to T89273: Produce stub dumps for all wikis as soon as a new month starts, then generate all other dumps on second round-robin cycle.

@ArielGlenn Cool. Didn't know that it would just be 3 more lines. If only all changes could scale as nicely as that. :)

Jun 19 2015, 1:31 PM · Dumps-Generation

Jun 18 2015

gnosygnu added a comment to T89273: Produce stub dumps for all wikis as soon as a new month starts, then generate all other dumps on second round-robin cycle.

@ArielGlenn Thanks. I think 5 rounds would be better, especially in comparison to the original 2-round proposal.

Jun 18 2015, 11:28 PM · Dumps-Generation

Jun 17 2015

gnosygnu added a comment to T89273: Produce stub dumps for all wikis as soon as a new month starts, then generate all other dumps on second round-robin cycle.

Can the stub dumps be extended from step 29 [1] to either step 30 [2] or even 31 [3]? This would generate usable dumps early in the cycle and help mitigate the staleness factor that DCDuring brings up.

Jun 17 2015, 11:18 PM · Dumps-Generation

Jun 14 2015

gnosygnu added a comment to T93396: Decide on format options for HTML and possibly other dumps.

@GWicke I think this is great. I took a look at a few articles in http://dumps.wikimedia.org/htmldumps/dumps/simple.wikipedia.org.articles.ns0.sqlite3.xz and they looked fine. I particularly like how these dumps include other small details like category, indicator and alt text. Very nice work.

Jun 14 2015, 3:00 AM · Services (blocked), Dumps-Generation, RESTBase