User Details
- User Since
- Feb 18 2019, 5:37 PM (368 w, 2 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Nicolastorzec [ Global Accounts ]
Jul 15 2020
Nicolastorzec added a comment to T257480: Sample HTML Dumps - Request for feedback.
RE redirects - We don't reuse the redirect tables that @ArielGlenn dumps every few weeks because we need more up-to-date data, but we do something similar to what he described. We typically parse article pages for redirect templates, and add the information we extract about the redirect pages to the pages they redirect to.
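The approach described in this comment can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual pipeline: it assumes redirect pages use the standard MediaWiki `#REDIRECT [[Target]]` markup, and the `pages` dict, `REDIRECT_RE`, and `collect_redirects` are hypothetical names introduced only for this example.

```python
import re
from collections import defaultdict

# Assumption: redirects use the standard MediaWiki form,
# e.g. "#REDIRECT [[Target#Section|label]]". Only the target title is captured.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)


def collect_redirects(pages):
    """Map each redirect target to the sorted titles that redirect to it.

    `pages` is a hypothetical dict of page title -> wikitext.
    """
    redirects = defaultdict(list)
    for title, text in pages.items():
        match = REDIRECT_RE.match(text)
        if match:
            target = match.group(1).strip()
            redirects[target].append(title)
    return {target: sorted(sources) for target, sources in redirects.items()}


# Hypothetical sample data: two redirect pages and one regular article.
pages = {
    "NYC": "#REDIRECT [[New York City]]",
    "Big Apple": "#redirect [[New York City#Nicknames]]",
    "New York City": "New York City is the most populous city in the United States.",
}
result = collect_redirects(pages)
```

Here `result` groups the redirect titles under the page they point to, so they can be attached to that page's record in a later step.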
Jul 10 2020
Nicolastorzec added a comment to T257480: Sample HTML Dumps - Request for feedback.
@RBrounley_WMF:
+1 on publishing the dataset as a small number of large files compressed with a splittable format. This helps with both downloading and distributed data processing.
Apr 1 2019
Nicolastorzec added a comment to T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday.
Thanks for the summary Ariel.
Feb 20 2019
Nicolastorzec added a comment to T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday.
I'm also interested in the specific reasons why the update frequency needs to be changed, i.e., besides streamlining the monthly workload on the Wikimedia machines.
Feb 18 2019
Nicolastorzec added a comment to T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday.
Hi Ariel et al.,
