ImportTextFiles.php fails for text files with # (hash) in their filename
Closed, Resolved · Public

Description

Not that one should go around creating files like #Test_page.txt, but if one does and then tries to import them with importTextFiles.php, this happens:

sam@memex:~/public_html/mediawiki/maintenance$ echo 'Test' > \#Test_page.txt 

sam@memex:~/public_html/mediawiki/maintenance$ php importTextFiles.php \#Test_page.txt 
Importing 1 pages...
Exception encountered, of type "InvalidArgumentException"
[25263c947f26094a28c2705c] [no req]   InvalidArgumentException from line 100 of /home/sam/public_html/mediawiki/includes/deferred/LinksUpdate.php: The Title object yields no ID. Perhaps the page doesn't exist?
Backtrace:
#0 /home/sam/public_html/mediawiki/includes/content/AbstractContent.php(234): LinksUpdate->__construct(Title, ParserOutput, boolean)
#1 /home/sam/public_html/mediawiki/includes/page/WikiPage.php(2184): AbstractContent->getSecondaryDataUpdates(Title, NULL, boolean, ParserOutput)
#2 /home/sam/public_html/mediawiki/includes/import/WikiRevision.php(554): WikiPage->doEditUpdates(Revision, User, array)
#3 /home/sam/public_html/mediawiki/maintenance/importTextFiles.php(141): WikiRevision->importOldRevision()
#4 /home/sam/public_html/mediawiki/maintenance/doMaintenance.php(103): ImportTextFiles->execute()
#5 /home/sam/public_html/mediawiki/maintenance/importTextFiles.php(201): require_once(string)
#6 {main}

It also doesn't give the desired result when the hash (or octothorpe) character appears later in the filename rather than at the start, but in that situation it does at least import the page, using the part of the filename before the hash as the page title.
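The truncation is consistent with '#' being treated as the start of a section fragment when the filename is turned into a page title, so everything from the '#' onward is dropped. A minimal shell sketch of that splitting, assuming this is the cause; title_from_filename is a hypothetical helper for illustration only, not MediaWiki code:

```shell
# Hypothetical helper (not part of MediaWiki): derive a page title from a
# file name the way a parser that treats '#' as a fragment separator would.
title_from_filename() {
    tfn_title="${1##*/}"                # drop any leading directory
    tfn_title="${tfn_title%.txt}"       # drop the .txt extension
    tfn_title="${tfn_title%%#*}"        # drop everything from the first '#'
    # trim trailing spaces left behind by the cut
    tfn_title="${tfn_title%"${tfn_title##*[! ]}"}"
    printf '%s\n' "$tfn_title"
}

title_from_filename '#Test_page.txt'     # prints an empty line: no title left
title_from_filename 'Test #2 file.txt'   # prints "Test"
```

In the first case nothing is left to use as a title, which would fit the "yields no ID" exception above; in the second, only the part before the '#' survives.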

This is with MediaWiki 1.27.0.

Event Timeline

Restricted Application added subscribers: TTO, Aklapper. · Aug 11 2016, 5:18 AM
Samwilson updated the task description. · Aug 11 2016, 7:07 AM

Change 304603 had a related patch set uploaded (by TTO):
Fix importation of weird file names in importTextFiles.php

https://gerrit.wikimedia.org/r/304603

Thanks @TTO! Does your fix work for filenames like Test #2 file.txt, though? I think it resolves to just the first part (i.e. Test) and imports under that title.

Ah, that makes sense; I thought I'd have a crack at it, but you're right, actually checking for '#' is best. Thanks @TTO.
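For anyone on a version without the fix, the same idea of checking for '#' can be applied before running the import. This is only a rough sketch of a pre-flight check, not what the patch does; has_hash is a hypothetical helper and the file names are the examples from this task:

```shell
# Hypothetical pre-flight check (not part of importTextFiles.php):
# succeed if the file's base name contains a '#'.
has_hash() {
    case "${1##*/}" in
        *'#'*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Report which of the example files would need renaming before import.
for f in '#Test_page.txt' 'Test #2 file.txt' 'Test_page.txt'; do
    if has_hash "$f"; then
        printf 'skip: %s\n' "$f"
    else
        printf 'ok: %s\n' "$f"
    fi
done
```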

Change 304603 merged by jenkins-bot:
Fix importation of weird file names in importTextFiles.php

https://gerrit.wikimedia.org/r/304603

TTO added a subscriber: Legoktm. · Sep 21 2016, 5:05 AM

Thanks to the code review efforts of the legendary @Legoktm, this is now fixed :)

Legoktm closed this task as Resolved. · Sep 21 2016, 5:09 AM
Legoktm assigned this task to TTO.

:) no problem