FilePage title should be checked for a valid file extension
Closed, Resolved | Public | BUG REPORT

Description

FilePage.__init__ should raise when the title does not have a valid file extension.

original report

When the title passed to a FilePage object contains a # character, the resulting FilePage's title silently loses the space character that preceded the #. This caused me to upload files under incorrect titles. See the test code:

import pywikibot

title = 'Front_Cover_Album_#3_-_DPLA_-_e0ff466e4c02b7ee526863936a6f1512_(page_2).jpg'

site = pywikibot.Site()
testpage = pywikibot.Page(site, title=title)
testfile = pywikibot.FilePage(site, title=title)

print('Title: ' + title)
print('Test: ' + testfile.title())

Output:

Title: Front_Cover_Album_#3_-_DPLA_-_e0ff466e4c02b7ee526863936a6f1512_(page_2).jpg
Test: File:Front Cover Album#3 - DPLA - e0ff466e4c02b7ee526863936a6f1512 (page 2).jpg

Event Timeline

Dominicbm updated the task description. (Show Details)
JJMC89 subscribed.

# is not permitted in titles. See mw:Manual:Page title#Invalid page titles.

Yes, this is surely part of the problem. Whether # is permitted in titles isn't in question. But pywikibot should either raise InvalidTitleError, as it does for other characters (like [), or normalize the character, as the MediaWiki API does when you use it in page titles. (For example, upload a file with the name I gave, and Wikimedia Commons normalizes the # to - rather than returning any error at all.) Instead, it just produces seemingly buggy behavior. As I said, it resulted in me uploading pages to the wrong title, since it neither raised an error I could handle nor normalized the character correctly.

# is permitted in Page object titles since it represents a link to a section. Spaces before it get removed since MediaWiki page titles don't have trailing whitespace.
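
The behavior described above can be illustrated with a small standalone sketch. This is not pywikibot's actual parsing code; `split_section` is a hypothetical helper that only mimics the two steps mentioned: treating everything after # as a section, and stripping trailing whitespace from the remaining page title.

```python
from typing import Optional, Tuple


def split_section(raw_title: str) -> Tuple[str, Optional[str]]:
    """Split a raw title into (page title, section).

    Hypothetical helper for illustration only; not part of pywikibot.
    """
    # Underscores and spaces are interchangeable in MediaWiki titles.
    text = raw_title.replace('_', ' ')
    # Everything after the first '#' is treated as a section anchor.
    base, sep, section = text.partition('#')
    # The page title itself must not end in whitespace, so the space
    # that preceded the '#' is dropped here.
    return base.rstrip(), (section if sep else None)


base, section = split_section('Front_Cover_Album_#3_-_DPLA_(page_2).jpg')
# base    == 'Front Cover Album'
# section == '3 - DPLA (page 2).jpg'
```

Joining the two parts back together with # gives a title with no space before the #, which matches the output in the original report.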

Dominicbm renamed this task from "#" character in Page object title eats leading whitespace to "#" character in FilePage object title eats leading whitespace.Sep 6 2023, 11:52 PM
Dominicbm updated the task description. (Show Details)

That makes sense for Page objects; I didn't realize that was how it works. I changed the title and am restricting this report to FilePages only, since it seems the behavior of Pages is being applied to files, in a context where it does not make sense. I think the solution is to normalize the character to - when creating a FilePage title, like the MediaWiki API does. A FilePage probably shouldn't be permitted to have the character in its title, unlike Page, but normalizing is a more graceful way to handle it. Can we reopen with that scope?
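
Until the library handles this itself, a caller could apply the proposed normalization before constructing the FilePage. The sketch below is an assumption-laden illustration: `sanitize_file_title` is a hypothetical helper, not a pywikibot function, and the choice of - as the replacement simply follows the normalization described above for Wikimedia Commons uploads.

```python
def sanitize_file_title(raw_title: str) -> str:
    """Replace '#' so the title is not parsed as 'page#section'.

    Hypothetical caller-side helper; not part of pywikibot.
    The '-' replacement is an assumption taken from the report above.
    """
    return raw_title.replace('#', '-')


safe_title = sanitize_file_title('Front_Cover_Album_#3_(page_2).jpg')
# safe_title == 'Front_Cover_Album_-3_(page_2).jpg'
```

The sanitized string can then be passed as the title argument in place of the raw one, so no part of the name is interpreted as a section.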

Change 955411 had a related patch set uploaded (by JJMC89; author: JJMC89):

[pywikibot/core@master] FilePage: raise ValueError when title doesn't have a valid file extension

https://gerrit.wikimedia.org/r/955411

FilePages can also have sections, the same as Page.

The title without the section does not have a file extension, which can be detected.
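
A minimal sketch of that detection, written as standalone Python rather than as the actual patch: `check_file_title` and `VALID_EXTENSIONS` are hypothetical names, and the extension set is a made-up sample (on a real wiki the allowed extensions come from site configuration). Raising ValueError follows the patch title above.

```python
# Sample set for illustration only; a real site defines its own list.
VALID_EXTENSIONS = {'jpg', 'jpeg', 'png', 'gif', 'svg', 'pdf'}


def check_file_title(raw_title: str) -> str:
    """Return the title without its section, or raise ValueError.

    Hypothetical helper; not the code from the Gerrit change.
    """
    # Drop the section part and the whitespace left before the '#'.
    base = raw_title.replace('_', ' ').partition('#')[0].rstrip()
    # Split off the text after the last dot as the candidate extension.
    name, dot, ext = base.rpartition('.')
    if not dot or not name or ext.lower() not in VALID_EXTENSIONS:
        raise ValueError(
            f'{base!r} does not have a valid file extension')
    return base


check_file_title('Example.jpg#Summary')  # returns 'Example.jpg'

try:
    check_file_title('Front_Cover_Album_#3_-_DPLA_(page_2).jpg')
except ValueError as error:
    print(error)  # the part before '#' has no extension
```

With this check, the title from the original report fails early with an error the caller can handle, while a file title followed by a genuine section link still passes.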

JJMC89 renamed this task from "#" character in FilePage object title eats leading whitespace to FilePage title should be checked for a valid file extension.Sep 7 2023, 12:46 AM
JJMC89 reopened this task as In Progress.
JJMC89 claimed this task.
JJMC89 triaged this task as Medium priority.
JJMC89 updated the task description. (Show Details)
JJMC89 updated Other Assignee, added: JJMC89.
JJMC89 updated Other Assignee, removed: JJMC89.
Xqt reopened this task as In Progress.Sep 7 2023, 6:37 AM

Change 955411 merged by jenkins-bot:

[pywikibot/core@master] FilePage: raise ValueError when title doesn't have a valid file extension

https://gerrit.wikimedia.org/r/955411

Change #1015994 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [fix] fil title must have a valid file extension

https://gerrit.wikimedia.org/r/1015994

Change #1015994 merged by jenkins-bot:

[pywikibot/core@master] [fix] file title must have a valid file extension

https://gerrit.wikimedia.org/r/1015994