Page MenuHomePhabricator

Custom content types whose getNativeData() method returns a non-string value can break XML dump generation
Closed, ResolvedPublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Define a custom Content and ContentHandler; make the custom Content subclass return something from getNativeData that's not a string.
  • Create a page using the custom content model.
  • Try to export the page or generate a wiki-wide XML dump via dumpBackup.php

What happens?:

TypeError: strlen(): Argument #1 ($string) must be of type string, array given
Backtrace:
from includes/export/XmlDumpWriter.php(569)
#0 includes/export/XmlDumpWriter.php(569): strlen(array)
#1 includes/export/XmlDumpWriter.php(503): XmlDumpWriter->writeText(Fandom\InteractiveMaps\Content\InteractiveMapContent, array, string)
#2 includes/export/XmlDumpWriter.php(389): XmlDumpWriter->writeSlot(MediaWiki\Revision\SlotRecord, integer)
#3 includes/export/WikiExporter.php(555): XmlDumpWriter->writeRevision(stdClass, array)
#4 includes/export/WikiExporter.php(493): WikiExporter->outputPageStreamBatch(Wikimedia\Rdbms\MysqliResultWrapper, stdClass)
#5 includes/export/WikiExporter.php(315): WikiExporter->dumpPages(string, boolean)
#6 includes/export/WikiExporter.php(175): WikiExporter->dumpFrom(string)
#7 maintenance/includes/BackupDumper.php(363): WikiExporter->allPages()
#8 maintenance/dumpBackup.php(84): BackupDumper->dump(integer, integer)
#9 maintenance/includes/MaintenanceRunner.php(309): DumpBackup->execute()
#10 maintenance/doMaintenance.php(85): MediaWiki\Maintenance\MaintenanceRunner->run()
#11 maintenance/dumpBackup.php(144): require_once(string)
#12 {main}

What should have happened instead?:
The export / dump should have generated successfully.

Software version : MW 1.37

Event Timeline

Change 885444 had a related patch set uploaded (by TK-999; author: TK-999):

[mediawiki/core@master] Fix XML dumps for content types with non-string getNativeData()

https://gerrit.wikimedia.org/r/885444

Are you running with modifications? Line 571 is the strlen() - https://github.com/wikimedia/mediawiki/blame/REL1_37/includes/export/XmlDumpWriter.php#L571

Obviously 1.37 is unsupported, but clearly can still happen on master as the code is basically identical...

Are you running with modifications? Line 571 is the strlen() - https://github.com/wikimedia/mediawiki/blame/REL1_37/includes/export/XmlDumpWriter.php#L571

Obviously 1.37 is unsupported, but clearly can still happen on master as the code is basically identical...

My bad, the stacktrace is from 1.39 - https://github.com/wikimedia/mediawiki/blame/REL1_39/includes/export/XmlDumpWriter.php#L569. The issue reproduces on both 1.37 and 1.39. We're in the middle of deploying 1.39.

As the code is from rMWfdc3e9f9524d: Add support for xml dump schema 0.11, I suspect it's applicable all the way back to 1.34 :)

Yup. It does require a custom content type though, which is probably why it went unnoticed.

getNativeData was deprecated in 1.33. It's still used in XmlDumpWriter though:

		if ( $content instanceof TextContent ) {
			// HACK: For text based models, bypass the serialization step. This allows extensions (like Flow)
			// that use incompatible combinations of serialization format and content model.
			$data = $content->getNativeData();
		} else {
			$data = $content->serialize( $contentFormat );
		}

It seems to me like this should be fixed in Flow. getNativeData is broken by design, nothing should use it. Import/export should rely on serialize().

Anyway, if you have a Content object that extends TextContent but does not return a string from getNativeData(), this will explode.

getNativeData was deprecated in 1.33. It's still used in XmlDumpWriter though:

		if ( $content instanceof TextContent ) {
			// HACK: For text based models, bypass the serialization step. This allows extensions (like Flow)
			// that use incompatible combinations of serialization format and content model.
			$data = $content->getNativeData();
		} else {
			$data = $content->serialize( $contentFormat );
		}

It seems to me like this should be fixed in Flow. getNativeData is broken by design, nothing should use it. Import/export should rely on serialize().

Anyway, if you have a Content object that extends TextContent but does not return a string from getNativeData(), this will explode.

Yeah, the fix just changes this to getText.

Change 885444 merged by jenkins-bot:

[mediawiki/core@master] Fix XML dumps for content types with non-string getNativeData()

https://gerrit.wikimedia.org/r/885444

Change 885775 had a related patch set uploaded (by Reedy; author: TK-999):

[mediawiki/core@REL1_39] Fix XML dumps for content types with non-string getNativeData()

https://gerrit.wikimedia.org/r/885775

Change 885776 had a related patch set uploaded (by Reedy; author: TK-999):

[mediawiki/core@REL1_38] Fix XML dumps for content types with non-string getNativeData()

https://gerrit.wikimedia.org/r/885776

Change 885777 had a related patch set uploaded (by Reedy; author: TK-999):

[mediawiki/core@REL1_35] Fix XML dumps for content types with non-string getNativeData()

https://gerrit.wikimedia.org/r/885777

Change 885776 merged by jenkins-bot:

[mediawiki/core@REL1_38] Fix XML dumps for content types with non-string getNativeData()

https://gerrit.wikimedia.org/r/885776

Change 885775 merged by jenkins-bot:

[mediawiki/core@REL1_39] Fix XML dumps for content types with non-string getNativeData()

https://gerrit.wikimedia.org/r/885775

Change 885777 merged by jenkins-bot:

[mediawiki/core@REL1_35] Fix XML dumps for content types with non-string getNativeData()

https://gerrit.wikimedia.org/r/885777

Reedy assigned this task to TK-999.