0) Summary
I tried to build a mirror of enwikinews using mwxml2sql. This failed whenever mwxml2sql encountered a page from namespace 90 (Thread).
I tried again using maintenance/importDump.php. This worked better. However, it appears that importDump.php ignores namespace 90, because no such pages are later found in the enwikinews.page database table.
- Dataset
enwikinews-20140605-pages-meta-current.xml.bz2
- Error messages
WHINE: (155323) no end page tag
When I divide the XML data dump into smaller files of, say, 1000 pages each, I find many more such errors.
- Pages that cause errors
<page>
<title>Thread:Comments:Chip and PIN 'not fit for purpose', says Cambridge researcher/Those in positions of power shirking responsibility and lying?</title>
<ns>90</ns>
<id>155323</id>
<DiscussionThreading>
<ThreadSubject>Those in positions of power shirking responsibility and lying?</ThreadSubject>
<ThreadPage>Comments:Chip and PIN 'not fit for purpose', says Cambridge researcher</ThreadPage>
<ThreadID>92</ThreadID>
<ThreadAuthor>70.31.58.181</ThreadAuthor>
<ThreadEditStatus>has-reply</ThreadEditStatus>
<ThreadType>normal</ThreadType>
<ThreadSignature>[[Special:Contributions/70.31.58.181|70.31.58.181]] ([[User talk:70.31.58.181|talk]])</ThreadSignature>
</DiscussionThreading>
<revision>
<id>958267</id>
<timestamp>2010-02-15T04:04:56Z</timestamp>
<contributor>
<ip>70.31.58.181</ip>
</contributor>
<comment>New thread: Those in positions of power shirking responsibility and lying?</comment>
<text xml:space="preserve">"All the banks are lying. They are maliciously and wilfully deceiving the customer [...] The system is not fit for purpose." I'm so surprised that I've apparently transcended a serious remark and instead am being sarcastic. Incidentally, only part of that sentence was sarcastic.</text>
<sha1>rjidk12i4hv2mxia3a8qq620rlc7lok</sha1>
<model>wikitext</model>
<format>text/x-wiki</format>
</revision>
</page>
- Namespace of pages that cause errors
<namespace key="90" case="first-letter">Thread</namespace>
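To see how many pages are affected, the dump can be scanned for every page in namespace 90. The following is only a minimal sketch in Python, not part of mwxml2sql or importDump.php; the function names pages_in_namespace and local are made up for illustration, and it assumes that <title>, <ns> and <id> are direct children of <page>, as in the example page above.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def pages_in_namespace(stream, ns_key):
    """Yield (page_id, title) for each <page> whose <ns> equals ns_key.

    'stream' is any binary file-like object holding the dump XML,
    for example the object returned by bz2.open(path, "rb").
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        fields = {}
        # Look only at direct children, so the <id> inside <revision> is ignored.
        for child in elem:
            name = local(child.tag)
            if name in ("title", "ns", "id") and name not in fields:
                fields[name] = child.text
        if fields.get("ns") == str(ns_key):
            yield int(fields["id"]), fields["title"]
        # Drop the page's children to keep memory use low on large dumps.
        elem.clear()
```

It could be run against the dataset above along these lines: open enwikinews-20140605-pages-meta-current.xml.bz2 with bz2.open(path, "rb"), pass the resulting object to pages_in_namespace(dump, 90), and print each (page_id, title) pair. The resulting ids can then be compared with the contents of enwikinews.page.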
- Use of importDump.php
Apparently importDump.php ignores namespace 90.
mysql> select page_id,page_namespace,page_title from enwikinews.page where page_id=155323;
Empty set (0.00 sec)
mysql> select page_id,page_namespace,page_title from enwikinews.page where page_namespace=90;
Empty set (0.00 sec)
Sincerely Yours,
Kent
Version: unspecified
Severity: major
OS: Linux
Platform: PC