Right now, when the dump is generated, references are identified by content hash. This means that a reference to German Wikipedia always produces "ref:004ec6fbee857649acdbdbad4f97b2c8571df97". However, since there are many such references, the data for this reference is repeated over and over, potentially creating thousands of copies of the same information. We need to either remove the duplicates from the dump or change the way the hash is generated (how?).
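A minimal sketch of the dump-side deduplication, assuming references arrive as a stream of (hash, data) pairs; the function and parameter names here are hypothetical, not the actual dump generator's API. Since the hash is derived from the content, two references with the same hash carry identical data, so it is safe to emit only the first occurrence:

```python
from typing import Iterable, Iterator, Tuple

def deduplicate_references(
    references: Iterable[Tuple[str, dict]],
) -> Iterator[Tuple[str, dict]]:
    """Yield each reference only the first time its hash is seen."""
    seen: set[str] = set()
    for ref_hash, ref_data in references:
        if ref_hash in seen:
            # Same content hash means identical data; skip the duplicate copy.
            continue
        seen.add(ref_hash)
        yield ref_hash, ref_data
```

The `seen` set grows with the number of distinct references, which should be acceptable for a dump-generation pass; if memory becomes a concern, an on-disk key set could be swapped in.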
Additionally, we may encounter the same problem when importing updates, so the update procedure must account for this as well (see the sketch below).
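On the import side, the same idea would look roughly like the following; `store` and `import_reference` are hypothetical stand-ins for whatever storage layer the update procedure ends up using:

```python
def import_reference(store: dict, ref_hash: str, ref_data: dict) -> None:
    """Insert a reference only if its content hash is not already stored."""
    if ref_hash not in store:
        # An existing entry wins: by construction, data with the same
        # content hash is identical, so re-inserting would only duplicate it.
        store[ref_hash] = ref_data
```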