
mwdumper uses too much memory
Open, Medium, Public

Description

I tried to run the GUI version of the newest revision (r60229) of mwdumper under Java 6 update 17 on an Intel Core i7 with 3.25 GB of RAM and Windows XP SP3, and it gave this error:

Exception in thread "Thread-8" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.StringCoding.safeTrim(Unknown Source)
at java.lang.StringCoding.access$300(Unknown Source)
at java.lang.StringCoding$StringEncoder.encode(Unknown Source)
at java.lang.StringCoding.encode(Unknown Source)
at java.lang.String.getBytes(Unknown Source)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:493)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:603)
at com.mysql.jdbc.ByteArrayBuffer.writeStringNoNull(ByteArrayBuffer.java:544)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1638)
at com.mysql.jdbc.Connection.execSQL(Connection.java:2972)
at com.mysql.jdbc.Connection.execSQL(Connection.java:2902)
at com.mysql.jdbc.Statement.execute(Statement.java:529)
at org.mediawiki.importer.SqlServerStream.writeStatement(SqlServerStream.java:25)
at org.mediawiki.importer.SqlWriter.flushInsertBuffer(SqlWriter.java:195)
at org.mediawiki.importer.SqlWriter.bufferInsertRow(SqlWriter.java:184)
at org.mediawiki.importer.SqlWriter15.writeRevision(SqlWriter15.java:68)
at org.mediawiki.importer.PageFilter.writeRevision(PageFilter.java:67)
at org.mediawiki.dumper.ProgressFilter.writeRevision(ProgressFilter.java:56)
at org.mediawiki.importer.XmlDumpReader.closeRevision(XmlDumpReader.java:346)
at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:204)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)

According to the Java docs, the default max heap size is 1/4 of the physical memory, that is, around 800M. Since a single revision is at most 2M, there is no reason for mwdumper to require that much space. (It was run on the huwiki full history dump, writing directly to the database.)
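
The default limit depends on the JVM version and on which VM is selected, so it may be worth checking what the JVM actually grants rather than relying on the documented fraction. A minimal sketch using the standard `Runtime` API; the class name `HeapCheck` and the helper `toMiB` are made up for illustration and are not part of mwdumper:

```java
public class HeapCheck {
    // Convert a byte count to whole mebibytes (rounding down).
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most memory the JVM will attempt to use for the heap.
        System.out.println("max heap:  " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently reserved by the JVM.
        System.out.println("allocated: " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory(): the unused part of the currently reserved heap.
        System.out.println("free:      " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Running it once with no options and once with an explicit limit (for example `java -Xmx512m HeapCheck`, where 512m is an arbitrary value) shows whether the default really is in the 800M range on a given machine.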


Version: unspecified
Severity: enhancement
OS: Windows XP
Platform: PC

Details

Reference
bz21937

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 10:46 PM
bzimport set Reference to bz21937.
Tgr created this task. Dec 24 2009, 12:14 AM
Tgr added a comment. Dec 24 2009, 12:18 AM

After manually raising the max heap size, it ran smoothly, unlike the older versions available from download.wikimedia.org which didn't even start. Is there any reason to recommend the broken old versions instead of a current one? ([[mw:MWDumper]] points to a third version attached in a bug report, which also didn't seem to work.)

drdee added a comment. Jan 29 2011, 8:16 PM

The solution seems to be to increase the size of the heap as explained on http://www.mediawiki.org/wiki/Manual:MWDumper#Troubleshooting

I'll mark this bug as Resolved and Worksforme; if the bug reporter feels that this is still an issue, then please reopen the bug.

As a bigger question though - why does it need so much memory? Doesn't it interpret the dumps a little at a time, and thus shouldn't it need far less memory?

Tgr added a comment. Jan 29 2011, 9:49 PM

(In reply to comment #2)

The solution seems to be to increase the size of the heap as explained on
http://www.mediawiki.org/wiki/Manual:MWDumper#Troubleshooting

Yeah, I'm probably aware of that, since I was the one who added it there :)

The point, as Bawolff said, is that MWDumper should not need a default heap size of ~1GB when the largest revision is below 2MB. Either there is a memory leak, or something is done really inefficiently.
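
To put rough numbers on that argument: the stack trace in the description fails inside `String.getBytes` while the MySQL driver encodes the statement produced by `SqlWriter.flushInsertBuffer`, so at that moment the statement plausibly exists in several copies at once (the Java chars, an encoder scratch array, and the trimmed byte array). The sketch below is back-of-the-envelope arithmetic only. The class `EncodeFootprint`, the three-copies model, and the factor of 3 bytes per char for the scratch array are assumptions for illustration, not measurements of mwdumper or the driver, and the thread does not say how large the buffered statement actually was.

```java
import java.nio.charset.StandardCharsets;

public class EncodeFootprint {
    // Assumption: the encoder reserves up to 3 bytes per char before trimming.
    static final int ASSUMED_SCRATCH_BYTES_PER_CHAR = 3;

    // Rough peak bytes alive while a String of `chars` chars is encoded into
    // `encodedBytes` bytes: UTF-16 chars + scratch array + trimmed copy.
    public static long peakBytes(long chars, long encodedBytes) {
        long utf16 = 2L * chars;
        long scratch = (long) ASSUMED_SCRATCH_BYTES_PER_CHAR * chars;
        return utf16 + scratch + encodedBytes;
    }

    public static void main(String[] args) {
        // Hypothetical statement: 2,000,000 ASCII chars, roughly one maximal revision.
        StringBuilder sb = new StringBuilder(2000000);
        for (int i = 0; i < 2000000; i++) {
            sb.append('a');
        }
        String sql = sb.toString();
        long encoded = sql.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("chars:          " + sql.length());
        System.out.println("encoded bytes:  " + encoded);
        System.out.println("estimated peak: " + peakBytes(sql.length(), encoded));
    }
}
```

Under these assumptions a 2M-character ASCII statement accounts for about 12 million bytes at its peak, far below a heap limit in the hundreds of megabytes. If the model is anywhere near right, encoding overhead for a single revision does not by itself explain the error, which fits the suspicion that something else is being retained or buffered.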

brion removed brion as the assignee of this task. Feb 17 2015, 7:30 PM
brion set Security to None.