
importTextFiles.php fails for text files with # (hash) in their filename
Closed, Resolved (Public)

Description

Not that one should go around creating files like #Test_page.txt, but if one does and then tries to import them with importTextFiles.php, this happens:

sam@memex:~/public_html/mediawiki/maintenance$ echo 'Test' > \#Test_page.txt 

sam@memex:~/public_html/mediawiki/maintenance$ php importTextFiles.php \#Test_page.txt 
Importing 1 pages...
Exception encountered, of type "InvalidArgumentException"
[25263c947f26094a28c2705c] [no req]   InvalidArgumentException from line 100 of /home/sam/public_html/mediawiki/includes/deferred/LinksUpdate.php: The Title object yields no ID. Perhaps the page doesn't exist?
Backtrace:
#0 /home/sam/public_html/mediawiki/includes/content/AbstractContent.php(234): LinksUpdate->__construct(Title, ParserOutput, boolean)
#1 /home/sam/public_html/mediawiki/includes/page/WikiPage.php(2184): AbstractContent->getSecondaryDataUpdates(Title, NULL, boolean, ParserOutput)
#2 /home/sam/public_html/mediawiki/includes/import/WikiRevision.php(554): WikiPage->doEditUpdates(Revision, User, array)
#3 /home/sam/public_html/mediawiki/maintenance/importTextFiles.php(141): WikiRevision->importOldRevision()
#4 /home/sam/public_html/mediawiki/maintenance/doMaintenance.php(103): ImportTextFiles->execute()
#5 /home/sam/public_html/mediawiki/maintenance/importTextFiles.php(201): require_once(string)
#6 {main}

The result is also wrong when the hash (or octothorpe) character appears later in the filename rather than at the start, but in that case the script at least uses the part of the filename before the hash as the page title.
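
Both symptoms are consistent with the filename being turned into a page title in which '#' acts as a fragment separator, so only the text before the first '#' survives. That cause is an assumption here, not something confirmed in this report. The sketch below only mimics that split in plain shell; `title_part` is a hypothetical helper, not part of MediaWiki or importTextFiles.php.

```shell
#!/bin/sh
# Hypothetical helper (NOT MediaWiki code): mimics cutting a title at the
# first '#', on the assumption that '#' is treated as a fragment separator.
title_part() {
    # Drop the .txt extension, then drop everything from the first '#' onward.
    name=${1%.txt}
    printf '%s\n' "${name%%#*}"
}

title_part '#Test_page.txt'     # prints an empty line: nothing precedes the '#'
title_part 'Test #2 file.txt'   # prints "Test " (with a trailing space)
```

Under that assumption, a leading '#' leaves an empty title, which would fit the "yields no ID" exception, while a '#' later in the name leaves a shortened but usable title.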

This is with MediaWiki 1.27.0.

Event Timeline

Change 304603 had a related patch set uploaded (by TTO):
Fix importation of weird file names in importTextFiles.php

https://gerrit.wikimedia.org/r/304603

Thanks @TTO! Does your fix work for filenames like Test #2 file.txt, though? I think it resolves to just the first part (i.e. Test) and imports with that.

Ah, that makes sense. I thought I'd have a crack at it, but you're right: actually checking for '#' is best. Thanks @TTO.

Change 304603 merged by jenkins-bot:
Fix importation of weird file names in importTextFiles.php

https://gerrit.wikimedia.org/r/304603

Thanks to the code review efforts of the legendary @Legoktm, this is now fixed :)

Legoktm assigned this task to TTO.

:) no problem