
An -always for upload.py
Closed, Resolved, Public

Description

I am writing a script that uses Pywikibot to upload thousands of files, but upload.py does not have an -always option that would let it run without asking the operator for input.

Would it be possible to include it? I will not be able to carry out those uploads until this option is available.

Thanks in advance.

Event Timeline

abian set the priority of this task to Needs Triage.
abian updated the task description.
abian added a project: Pywikibot.
abian subscribed.
Mr.Ajedrez subscribed.

Adding -always to the script would make it possible to upload large collections of media to Commons, such as those released through GLAM initiatives.

This seems quite easy to do, considering its benefits. Also, I'm aware of abian's (good) work on his script, and this missing option is blocking real improvements and results in GLAM projects. I hope it can be solved soon.

Thanks for your support, @Mr.Ajedrez and @-jem-.

Developers: Please do not forget this task. We cannot upload those ~20000 files to Commons without this simple feature.

Mr.Ajedrez raised the priority of this task from High to Unbreak Now! (Aug 19 2015, 11:14 PM)

I don't think it's hard to do, but it's vital for continuing to upload all that media. The lack of this feature is delaying abian's great work and affects the development of GLAM projects.

It's funny… if it's so easy, go ahead, and we'll help you get it merged. Anyway, I looked through all the input and input_yn calls:

  • The first comes before anything starts, when the source file name is invalid. I guess -always will make that fail.
  • One is verifying the file name but -keep can already switch that off. Should -always imply -keep then?
  • When the file extension is invalid it asks you if you want to continue or change the name. What should -always do?
  • When the file already exists it asks you if you want to continue or change the name. What should -always do?
  • In addition to the two above, the file name may be invalid when you upload to a wiki other than Commons and a file with that name already exists on Commons. This may not apply to the project mentioned here, but unlike the two cases above, -always will skip these files, because this cannot be fixed without changing the file name.
  • There is a question whether the description should be changed. Currently, if the description is already defined beforehand and -noverify is used, it won't ask you. But obviously, without defining it beforehand, -always won't be possible.
  • The last one is a question asked when a warning occurs. There is already infrastructure to ignore warnings or abort on them, so -always could require that choice to be made up front, like the description.
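The last two prompts share a pattern: -always can only work if every question has an answer fixed before the run starts. The following is a minimal Python sketch of that idea, not Pywikibot's actual code. The names `confirm` and `always_problems` and the `on_warning` setting are hypothetical; the prompt function is injected so the logic can be exercised without a terminal.

```python
from typing import Callable, List, Optional


def always_problems(description: Optional[str],
                    on_warning: Optional[str]) -> List[str]:
    """List what is still undecided before an unattended (-always) run.

    `on_warning` is a hypothetical setting: "ignore", "abort", or None (ask).
    """
    problems = []
    if not description:
        problems.append("the description must be given beforehand")
    if on_warning not in ("ignore", "abort"):
        problems.append("a fixed reaction to upload warnings must be chosen")
    return problems


def confirm(question: str, default: bool, always: bool,
            ask: Callable[[str], str] = input) -> bool:
    """Return `default` without prompting when `always` is set; otherwise ask."""
    if always:
        return default
    hint = "[Y/n]" if default else "[y/N]"
    answer = ask(f"{question} {hint} ").strip().lower()
    if not answer:
        # An empty reply accepts the default, as interactive prompts usually do.
        return default
    return answer.startswith("y")
```

For example, `confirm("Keep the file name?", True, always=True)` returns `True` without reading from standard input, and a run for which `always_problems(...)` returns a non-empty list would be refused before any upload starts.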
abian lowered the priority of this task from Unbreak Now! to Medium (Aug 20 2015, 10:18 AM)

> It's funny… if it's so easy, go ahead, and we'll help you get it merged. Anyway, I looked through all the input and input_yn calls:

Thank you very much for taking this task on, XZise!

Actually, completing it would be hard for us, who have no idea about Pywikibot, and easier for you, but it does require an effort in any case. I am fully aware of it, and I acknowledge it.

> • The first comes before anything starts, when the source file name is invalid. I guess -always will make that fail.

Yes, I think this is what is expected.

> • One is verifying the file name but -keep can already switch that off. Should -always imply -keep then?

I think so.

> • When the file extension is invalid it asks you if you want to continue or change the name. What should -always do?

What does 'invalid' mean?

  • The source file extension does not match the content, but the destination file extension does → upload
  • The destination file extension does not match the content → do not upload (if implemented)
  • The destination file extension is not supported by the project → do not upload
  • The destination file name has no extension → do not upload
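
The last two cases in this list depend only on the destination name, so they can be decided without prompting; the first two would require comparing the extension with the file content, which this sketch does not attempt. The following Python sketch is illustrative only: `decide_extension`, `Decision`, and the `allowed` set (standing in for the extensions the target wiki accepts) are hypothetical names, not part of Pywikibot.

```python
from enum import Enum
from typing import AbstractSet


class Decision(Enum):
    UPLOAD = "upload"
    SKIP = "skip"


def decide_extension(dest_name: str, allowed: AbstractSet[str]) -> Decision:
    """Decide, without prompting, whether a destination name is uploadable.

    Only the name is examined; `allowed` holds lowercase extensions
    without the leading dot.
    """
    stem, dot, ext = dest_name.rpartition(".")
    if not dot or not stem:
        # No dot at all, or a name like ".jpg" with nothing before the dot.
        return Decision.SKIP
    if ext.lower() not in allowed:
        # Extension missing after the dot or not supported by the target wiki.
        return Decision.SKIP
    return Decision.UPLOAD
```

With `allowed = {"jpg", "png"}`, a destination name of `Photo.JPG` is uploaded, while `Photo`, `Photo.exe`, and `.jpg` are skipped.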
> • When the file already exists it asks you if you want to continue or change the name. What should -always do?

In my opinion, this should be configured by another option (for example, -ignorewarn, or a new one, -overwrite). Without these possible options, -always should not upload.

> • In addition to the two above, the file name may be invalid when you upload to a wiki other than Commons and a file with that name already exists on Commons. This may not apply to the project mentioned here, but unlike the two cases above, -always will skip these files, because this cannot be fixed without changing the file name.

Okay, I believe this is expected.

> • There is a question whether the description should be changed. Now, if I understand the code correctly, when the description is already defined beforehand it won't ask you. But obviously, without defining it beforehand, -always won't be possible.

Right.

> • The last one is a question asked when a warning occurs. There is already infrastructure to ignore warnings or abort on them, so -always could require that choice to be made up front, like the description.

Okay. Is there any case which has not been mentioned above?

Thanks again.

> • When the file extension is invalid it asks you if you want to continue or change the name. What should -always do?
>
> What does 'invalid' mean?
>
> • The source file extension does not match the content, but the destination file extension does → upload
> • The destination file extension does not match the content → do not upload (if implemented)
> • The destination file extension is not supported by the project → do not upload
> • The destination file name has no extension → do not upload

Invalid as in not recognized. As we don't compare the extension with the content, the first two options don't apply.

> • When the file already exists it asks you if you want to continue or change the name. What should -always do?
>
> In my opinion, this should be configured by another option (for example, -ignorewarn, or a new one, -overwrite). Without these possible options, -always should not upload.

Unfortunately, -ignorewarn already has a meaning, so I don't think we can use that. But yes, maybe -overwrite instead.
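The handling of existing files discussed so far can be summarised as a small decision function. This Python sketch assumes the proposed -overwrite option exists (modelled by a hypothetical `overwrite` flag) and treats a name clash with a shared repository such as Commons as unfixable without renaming, so an unattended run skips it. `decide_existing` and `Action` are illustrative names, not part of Pywikibot.

```python
from enum import Enum


class Action(Enum):
    UPLOAD = "upload"
    SKIP = "skip"
    ASK = "ask"  # prompt the operator, e.g. for a new name


def decide_existing(exists_locally: bool, exists_on_shared_repo: bool,
                    always: bool, overwrite: bool) -> Action:
    """Decide what to do when the destination file name is already taken.

    `overwrite` models the proposed (hypothetical) -overwrite option.
    """
    if exists_on_shared_repo:
        # Only a new name fixes this, so an unattended run has to skip it.
        return Action.SKIP if always else Action.ASK
    if exists_locally:
        if overwrite:
            return Action.UPLOAD
        # Without -overwrite, -always should not replace the existing file.
        return Action.SKIP if always else Action.ASK
    return Action.UPLOAD
```

Keeping -overwrite separate from -always means that a plain unattended run can never replace an existing file by accident, which is the behaviour requested above.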

> Okay. Is there any case which has not been mentioned above?

I don't think so. Unfortunately, testing the script is quite complicated, because once you have uploaded a file, the server behaves differently when you upload it again. I'd appreciate it if you could test it. I should have a patch available soonish.

Change 232708 had a related patch set uploaded (by XZise):
[FEAT] upload: Support -always option

https://gerrit.wikimedia.org/r/232708

Change 232708 merged by jenkins-bot:
[FEAT] upload: Support -always option

https://gerrit.wikimedia.org/r/232708