
reflinks.py crashes on certain article in Ru.Wp
Closed, ResolvedPublic

Description

When I run reflinks.py, the bot always crashes after the same article:
https://ru.wikipedia.org/wiki/%D0%90%D0%BB%D0%BB%D0%B5%D1%8F_%D0%93%D0%B5%D1%80%D0%BE%D0%B5%D0%B2_(%D0%A1%D0%B0%D0%BD%D0%BA%D1%82-%D0%9F%D0%B5%D1%82%D0%B5%D1%80%D0%B1%D1%83%D1%80%D0%B3)

I don't see anything wrong with the article; it seems to be a script problem.

Traceback:

No changes were needed on [[Alleya polkovodcev (Yaroslavl')]] ***
No changes were needed on [[Alleya Geroev (Sankt-Peterburg)]] ***
Traceback (most recent call last):
  File "core/pwb.py", line 222, in <module>
    run_python_file(filename, argv, argvu, file_package)
  File "core/pwb.py", line 81, in run_python_file
    main_mod.__dict__)
  File "core/scripts/reflinks.py", line 846, in <module>
    main()
  File "core/scripts/reflinks.py", line 843, in main
    bot.run()
  File "core/scripts/reflinks.py", line 605, in run
    compressed = io.StringIO(f.read())
TypeError: initial_value must be unicode or None, not str
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort


Event Timeline

Rubin16 created this task. Jan 11 2015, 8:56 AM
Rubin16 raised the priority of this task to Normal.
Rubin16 updated the task description.
Rubin16 added a project: Pywikibot.
Rubin16 added a subscriber: Rubin16.
Xqt updated the task description. Jan 11 2015, 10:10 AM
Xqt set Security to None.
Mpaa added a subscriber: Mpaa. Jan 26 2015, 9:31 PM

I was not able to reproduce it.
Can you provide the full command you used?

If possible, I will send you a set of tracebacks for errors in other articles in a couple of weeks. I am on a business trip now with no access to Labs.
I'll copy the exact command, too.

Aklapper changed the task status from Open to Stalled. Jan 26 2015, 9:59 PM

[Please reset task status once provided]

XZise added a subscriber: XZise. Jan 26 2015, 10:10 PM

This is strange. If the data is compressed, why is it put into an io.StringIO object? Binary data (which is what compressed data is) should go into an io.BytesIO. There is also StringIO.StringIO, which accepts both, but it was removed in Python 3, so it should be avoided. Maybe someone (probably me) changed it from StringIO.StringIO to io.StringIO because the latter is available in both versions and they behave similarly. So I think there is actually a bug, although I'm not sure why @Mpaa hasn't run into it. Maybe that section of code was never reached.
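To illustrate the text-versus-binary distinction described above, here is a minimal sketch (not taken from reflinks.py) that wraps a few raw bytes, the start of a gzip stream, in both buffer types. It assumes Python 3, where `io.StringIO` only accepts `str` and `io.BytesIO` only accepts bytes-like data. Under Python 2 the same mistake produces the message seen in the traceback, because there `str` is a byte string and `io.StringIO` requires `unicode`.

```python
import io

# Raw bytes, as returned by f.read() on an HTTP response.
# These are the first bytes of a gzip stream (magic number + method).
raw = b"\x1f\x8b\x08\x00"

def try_wrap(factory, data):
    """Return the name of the exception raised when wrapping data, or None."""
    try:
        factory(data)
    except TypeError as exc:
        return type(exc).__name__
    return None

string_error = try_wrap(io.StringIO, raw)  # text buffer: rejects bytes
bytes_error = try_wrap(io.BytesIO, raw)    # binary buffer: accepts bytes

print(string_error)  # TypeError
print(bytes_error)   # None
```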

gerritbot added a subscriber: gerritbot.

Change 186950 had a related patch set uploaded (by XZise):
[FIX] reflinks: Use BytesIO for binary data

https://gerrit.wikimedia.org/r/186950

Patch-For-Review

XZise added a comment. Jan 27 2015, 1:05 PM

I've provided a patch because the code looks wrong to me, but I haven't tested it either before or after the change, so I'm not sure whether it solves the problem, given that @Mpaa couldn't reproduce it (although the bug cannot occur if no gzipped reflink is processed).
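Because the crash only happens when a gzipped response is processed, one way to exercise that path without waiting for a real gzipped reflink is to compress a body locally. The sketch below uses a hypothetical helper, `decode_body`, which follows the same idea as the fix (wrap the binary body in `io.BytesIO` before handing it to `gzip.GzipFile`). It is an illustration under that assumption, not the actual reflinks.py code, and the `content_encoding` parameter is invented for the example.

```python
import gzip
import io

def decode_body(body: bytes, content_encoding: str = "") -> bytes:
    """Hypothetical helper: return the decompressed body if it is gzipped.

    Illustrates the BytesIO approach only; this is not reflinks.py code.
    """
    if content_encoding.lower() == "gzip":
        # GzipFile needs a binary file-like object; io.StringIO would fail here.
        with gzip.GzipFile(fileobj=io.BytesIO(body)) as compressed:
            return compressed.read()
    return body  # not compressed: pass through unchanged

# Simulate a server response that was gzip-compressed.
page = "<title>Alleya Geroev</title>".encode("utf-8")
gzipped = gzip.compress(page)

result = decode_body(gzipped, "gzip")
print(result.decode("utf-8"))  # <title>Alleya Geroev</title>
```

Keeping the body as bytes until after decompression, and decoding to text only at the end, avoids mixing text and binary buffers in the first place.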

Change 186950 merged by jenkins-bot:
[FIX] reflinks: Use BytesIO for binary data

https://gerrit.wikimedia.org/r/186950

Rubin16 closed this task as Resolved. Feb 22 2015, 5:02 PM
Rubin16 claimed this task.

Seems to be fixed now.