xmlreader.py fails a lot
Closed, Resolved | Public | Bug Report

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1245/
Reported by: emijrp
Created on: 2010-10-03 13:51:00
Subject: xmlreader.py fails a lot
Original description:
Hi all;

I think there is an error in xmlreader.py. When parsing a full-revision XML dump (in this case [1]) using this code [2] (look at the try/except: it writes to the console when it fails), I correctly get the username, timestamp and revision id, but sometimes the page title and the page id are None or an empty string.

The first error is:
['', None, 'QuartierLatin1968', '2004-10-10T04:24:14Z', '4267'] # note the empty string for the title and the None for the page id

But if we do:
7za e -bd -so kwwiki-20100926-pages-meta-history.xml.7z 2>/dev/null | egrep -i '2004-10-10T04:24:14Z' -C20

We get this [3], which is OK: the page title and the page id are present in the XML but are not parsed correctly. And this is not the only page title and page id that fails.
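
As an independent cross-check, here is a minimal sketch that extracts the same five fields with only the Python standard library, so its rows can be compared against what xmlreader.py returns. It is not the code from [2] and not how xmlreader.py works internally. It assumes the usual MediaWiki export layout (a `page` element holding `title`, `id` and one or more `revision` elements); `local`, `iter_revisions` and `SAMPLE` are hypothetical names, and the page title, page id and second revision in `SAMPLE` are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def iter_revisions(source):
    """Yield [title, pageid, username, timestamp, revisionid] per revision."""
    title = pageid = None
    path = []  # local names of the elements that are currently open
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        name = local(elem.tag)
        if event == 'start':
            path.append(name)
            if name == 'page':
                title = pageid = None  # reset for every new page
            continue
        path.pop()
        parent = path[-1] if path else None
        if name == 'title' and parent == 'page':
            title = elem.text
        elif name == 'id' and parent == 'page':
            # Only the page-level <id>; revision and contributor ids are skipped.
            pageid = elem.text
        elif name == 'revision':
            fields = {local(child.tag): child for child in elem}
            username = None
            contributor = fields.get('contributor')
            if contributor is not None:
                for child in contributor:
                    if local(child.tag) in ('username', 'ip'):
                        username = child.text
            timestamp = fields['timestamp'].text if 'timestamp' in fields else None
            revid = fields['id'].text if 'id' in fields else None
            elem.clear()  # drop the revision text; history dumps are large
            yield [title, pageid, username, timestamp, revid]
        elif name == 'page':
            elem.clear()


# Made-up sample in the export layout assumed above.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example page</title>
    <id>1</id>
    <revision>
      <id>4267</id>
      <timestamp>2004-10-10T04:24:14Z</timestamp>
      <contributor><username>QuartierLatin1968</username><id>2</id></contributor>
      <text>...</text>
    </revision>
    <revision>
      <id>4268</id>
      <timestamp>2004-10-11T00:00:00Z</timestamp>
      <contributor><ip>127.0.0.1</ip></contributor>
      <text>...</text>
    </revision>
  </page>
</mediawiki>"""

rows = list(iter_revisions(io.BytesIO(SAMPLE)))
# Rows whose title is empty/None or whose page id is None, as in the report.
missing = [row for row in rows if not row[0] or row[1] is None]
for row in rows:
    print(row)
print('rows with missing title or page id:', len(missing))
```

To run it against the real dump, the decompressed stream could be passed instead of `SAMPLE`, for example by piping the `7za e -bd -so ...` output into the script and calling `iter_revisions(sys.stdin.buffer)`. Any row that is complete here but comes back with an empty title or a None page id from xmlreader.py would point at the parser rather than at the dump.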

Perhaps I have missed something, because I'm still learning to parse XML. Sorry in that case.

Regards,
emijrp

[1] http://download.wikimedia.org/kwwiki/20100926/kwwiki-20100926-pages-meta-history.xml.7z
[2] http://pastebin.ca/1951930
[3] http://pastebin.ca/1951937


Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1245

Details

Reference
bz55259

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 2:30 AM
bzimport set Reference to bz55259.
bzimport added a subscriber: Unknown Object (????).

I don't think this bug is still valid

Aklapper triaged this task as Low priority. Feb 4 2022, 8:07 PM
Aklapper changed the subtype of this task from "Task" to "Bug Report".
Xqt claimed this task.
Xqt subscribed.

I don't think this bug is still valid

This bug is 12 years old and has possibly been solved. Feel free to reopen it if the bug is still valid.

Xqt removed Xqt as the assignee of this task. Feb 11 2022, 4:52 PM