Pywikibot UploadRobot should have a method to retrieve file name after change
Closed, Resolved | Public | Feature

Description

Feature summary:
UploadRobot gives you the chance to change the file name before uploading. I would like to easily retrieve that new filename somehow.

Use case(s):
When uploading a list of files, you sometimes need to change a filename mid-process. I should be able to keep track of that change in my code.

I ended up editing _upload.py, line 312 and adding

self.last_filename = filename

Benefits:
I think it's self-evident.

Event Timeline

Both UploadRobot.process_filename() and UploadRobot.upload() already give the filename back to the calling process (which is UploadRobot.run() by default). I have no clue what the last_filename attribute is good for. Could you please explain a bit more?

I'm no coder. last_filename was a crappy attempt to be able to get the final filename after .process_filename() finishes.

I'll try to explain once again what I'm trying to achieve. This is my test code:

import os

from pywikibot import Site
from pywikibot.specialbots import UploadRobot

# Target wiki: Wikimedia Commons
Commons = Site('commons', 'commons')

filename = 'Catálogo de libros escogidos - bdh0000197468.pdf'
# The file lives in the PDF subfolder of the current working directory
filepath = os.path.join(os.getcwd(), 'PDF', filename)

bot = UploadRobot(filepath, description='DESCRIPTION', use_filename=filename,
                  chunk_size=1000000, verify_description=True,
                  target_site=Commons)
x = bot.run()

# Dump every attribute of the bot to look for the final filename
lista = list(vars(bot).items())
print(lista)
  • I follow the guidelines and set up a new instance of UploadRobot with a filename I may or may not give it beforehand.
  • Then I call UploadRobot.run() and use the interactive command-line upload interface. Inside that interface, I change the filename and upload the file to Commons under the new filename.
  • Then I want to use the new filename in some other way, such as creating a list of files I've uploaded, or linking the file back to an article:
    • UploadRobot.run() returns None.
    • The UploadRobot dict of properties shows that the only filename is "stored" at .use_filename, but it's the "old" filename.

So maybe .run() should return the new filename if successful. I wanted to do that but couldn't. That's why I introduced that attribute, so as not to mess with the code.

Change #1015990 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [IMPR} Add a post_processing attribute to UploadRobot

https://gerrit.wikimedia.org/r/1015990

@Ninovolador: I made a proposal. Can you please review that patch?

I think it's a little convoluted but after a while I managed to get it working for my specific purpose.

So I ended up doing

def _(old, new):
    bot.filename = new

bot.post_processor = _
bot.run()

return bot.filename

inside my upload helper function. It was a little counter-intuitive and tricky because I needed to get the filename out of a callback whose return value I can't access directly. Global variables are ugly, so I ended up attaching it to the bot anyway.
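
One way to avoid both a global variable and an extra attribute on the bot is a closure with nonlocal. The following is only a minimal sketch and not part of Pywikibot: upload_single is a hypothetical helper name, and it assumes that bot behaves like an UploadRobot set up for a single file and that post_processor is called as post_processor(file_url, filename) after the upload attempt, with filename being None when nothing was uploaded.

```python
def upload_single(bot):
    """Run ``bot`` and return the final filename, or None if nothing was uploaded.

    Hypothetical helper: ``bot`` is assumed to behave like an UploadRobot
    configured for one file, calling ``bot.post_processor(file_url, filename)``
    after the upload attempt.
    """
    final_name = None

    def remember(file_url, filename):
        # Store the name reported by the bot in the enclosing function's scope.
        nonlocal final_name
        final_name = filename

    bot.post_processor = remember
    bot.run()
    return final_name
```

With more than one file this would return only the name reported last, so it fits the one-book-at-a-time case and nothing more.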

I'm not uploading anything today or in the near future, but I did a couple of tests, and it worked (I got the new filename after uploading it)

Thanks!

PS: I'm going to leave the code as it is, but while reading through it I was thinking: shouldn't the bot keep a dictionary of file_url: filename pairs, something like self.counter? That would be a more straightforward way of getting that info.

Why not do it this way:

def my_upload_function():
    def collector(old, new):
        replacements[old] = new

    replacements = {}
    bot = UploadRobot(['A.jpg', 'b.jpg'])
    bot.post_processor = collector
    bot.run()

    return replacements

or

def my_upload_function():
    def collector(old, new):
        replacements.append((old, new))

    replacements = []
    bot = UploadRobot(['A.jpg', 'b.jpg'])
    bot.post_processor = collector
    bot.run()

    return replacements

Or process the filename directly instead of returning the last value (note: your example gets only the last filename, but UploadRobot can upload multiple files). Instead of:

def my_upload_function():
    ...

def main():
    filename = my_upload_function()
    print(filename)

you can delegate the statements dealing with the filename to the post_processor function:

def collector(old, new):
    print(new)

def my_upload_function():
    ...

def main():
    my_upload_function()

I do not know the specific needs of your uploader function. Maybe you could share the code to clarify.

By the way, there is a sample for the imagetransfer.py script here.

The post_processor function is great and I use it now in my code. I see that this is way more flexible when uploading multiple files. In my case I work with one book file at a time, so no need to do that, and in my mind it looked cleaner to just grab something instead of defining functions and such.
But I insist: isn't it even cleaner to just be able to grab the filename using the file_url?

I'm thinking of adding an attribute to UploadRobot similar to self.counter, like self.filenames, that stores that information. In the same block where you defined post_processor:

self.filenames[file_url] = filename

The robot would have an internal dictionary that records 1) whether each file was uploaded correctly (None otherwise) and 2) the final filename of every file. You could then loop over or list that information however you like.

That way, getting the final filename would be one line:

bot.filenames[file_url] instead of various lines and a function in between

Having a dict attribute on UploadRobot is too specialized for a framework; it might lead to unnecessary memory exhaustion if you load images from a generator. I think an interface is an appropriate way to do this. Another way would be to use the lower-level method upload_file, where you give the file_url and get the possibly new filename back. The current implementation of imagetransfer uses this technique, and the new post_processor is used there in the patch given above.
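
A rough sketch of the upload_file approach, written as a generator so that nothing accumulates in memory. iter_uploads is a hypothetical helper name, and the sketch assumes only what is described above: bot.upload_file(file_url) returns the filename that was actually used, or None if the file was not uploaded.

```python
def iter_uploads(bot, file_urls):
    """Yield (file_url, filename) pairs one at a time.

    Assumes ``bot.upload_file(file_url)`` returns the filename actually used,
    or None if the file was not uploaded.
    """
    for file_url in file_urls:
        # Nothing is stored here, so memory use stays flat even when
        # ``file_urls`` is a long generator.
        yield file_url, bot.upload_file(file_url)
```

The caller decides what to keep, for example `for file_url, filename in iter_uploads(bot, urls): ...`. Calling upload_file directly bypasses whatever run() does around its loop, so this is not a drop-in replacement for run().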

Even so, a dictionary with 100,000 entries at 300 bytes per file_url/filename pair would use about 30 MB of memory. That is hardly an issue if you want to upload 100,000 files at once.
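
The estimate can be checked with a few lines of arithmetic. The 300 bytes per pair is the assumption made above; Python's own per-object overhead for strings and dict slots is not counted, so the real figure would be somewhat higher.

```python
entries = 100_000        # number of uploaded files
bytes_per_entry = 300    # assumed size of one file_url/filename pair

total_bytes = entries * bytes_per_entry
total_mb = total_bytes / 1_000_000  # decimal megabytes

print(f"{total_mb:.0f} MB")  # prints "30 MB"
```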

Change #1015990 merged by jenkins-bot:

[pywikibot/core@master] [IMPR} Add a post_processing attribute to UploadRobot

https://gerrit.wikimedia.org/r/1015990