
bz2 dumps cannot be read with PHP
Closed, Resolved · Public

Description

The two most recent dumps, as of writing this, cannot be read with PHP. After 19 entities, bzread returns an empty string and feof returns true. This does not happen with the dump before those two. It does not happen when decompressing with the bzip2 Ubuntu package. It also does not happen when extracting 1k entities using the Ubuntu package, recompressing them, and having PHP read the result. It happens with PHP 5.6, 7.0.0RC6 and 7.0.0-dev.

I wrote a more detailed report, but hit the back button, and apparently Phabricator is nice enough to prevent my browser from remembering the text.

Event Timeline

JeroenDeDauw raised the priority of this task from to Needs Triage.
JeroenDeDauw updated the task description. (Show Details)
JeroenDeDauw added a subscriber: JeroenDeDauw.
Restricted Application added subscribers: StudiesWorld, Aklapper. Nov 11 2015, 11:08 AM
JeroenDeDauw set Security to None.
Lucie added a subscriber: hoo.
hoo added a subscriber: daniel. Nov 11 2015, 1:25 PM

Almost certainly related to us using pbzip2 now (instead of just bzip2, which we used initially). I doubt there's anything we can do about this.

If it is really important, we might need to go back to bzip2, possibly with a lower block size because the default already takes 4 hours for a single dump and those aren't getting smaller…

In case supporting PHP for this dump format is not something that will be done, it'd be good to add some note so people don't waste their time on this like I did.

Yeah, I ran into this last night.
Super annoying :/

hoo added a comment. Dec 7 2015, 9:38 AM

I looked at this a little bit and the problem here is that both Zend and HHVM use the zlib-like bzip2 functions (http://www.bzip.org/1.0.3/html/zlib-compat.html), which only handle a single bzip2 stream, but our pbzip2-compressed dumps consist of several streams (that's how pbzip2 manages to parallelize).
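To illustrate the single-stream limitation described above, here is a minimal sketch in Python (not PHP, and not the Zend or HHVM code). It assumes that a pbzip2-style file can be approximated by concatenating independently compressed bzip2 streams; the chunk contents are made-up stand-ins for dump entities.

```python
import bz2

# Hypothetical stand-ins for dump content. Each chunk is compressed as an
# independent bzip2 stream and the streams are concatenated, which is an
# approximation of what a parallel compressor such as pbzip2 produces.
chunks = [b'{"id":"Q1"}\n', b'{"id":"Q2"}\n', b'{"id":"Q3"}\n']
multi_stream = b"".join(bz2.compress(chunk) for chunk in chunks)

# BZ2Decompressor handles exactly one stream, much like the single-stream
# readers discussed in this task.
decoder = bz2.BZ2Decompressor()
first_only = decoder.decompress(multi_stream)

print(first_only)                # data from the first stream only
print(decoder.eof)               # the decoder reports end of stream
print(len(decoder.unused_data))  # bytes of the remaining streams, never decoded
```

A reader built on a single-stream decoder sees end of stream after the first chunk, which matches the symptom of reading a few entities and then hitting an apparent end of file.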

In order to fix this, both upstreams (Zend and HHVM) would need to change to using the "native" bz2 interface (http://www.bzip.org/1.0.3/html/hl-interface.html). Other hacks might be possible, I didn't look into that though.
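One possible shape for such a workaround, sketched in Python rather than PHP: start a fresh single-stream decoder whenever the previous one reaches the end of its stream, and feed it the leftover bytes. The function name `decompress_all_streams` is hypothetical, and the sketch holds all data in memory, which would not be practical for a full dump.

```python
import bz2


def decompress_all_streams(data: bytes) -> bytes:
    """Decode a concatenation of bzip2 streams, one stream at a time."""
    parts = []
    while data:
        # One decoder per stream: a BZ2Decompressor cannot be reused
        # once it has reached the end of its stream.
        decoder = bz2.BZ2Decompressor()
        parts.append(decoder.decompress(data))
        if not decoder.eof:
            raise ValueError("truncated bzip2 stream")
        # Whatever follows the finished stream is the start of the next one.
        data = decoder.unused_data
    return b"".join(parts)


# Usage with made-up data: three independently compressed streams.
chunks = [b"first\n", b"second\n", b"third\n"]
multi_stream = b"".join(bz2.compress(chunk) for chunk in chunks)
print(decompress_all_streams(multi_stream))
```

Python's own `bz2.decompress` and `bz2.open` have handled multi-stream input since Python 3.3, so the loop is only meant to show the idea of restarting at stream boundaries.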

hoo added a comment. Dec 7 2015, 10:59 AM

Alternatively, the zlib-like functions could be adapted… but I'm not sure how much work that would be or whether that's even desirable.

hoo added a comment. Jan 15 2016, 10:49 PM

Starting Monday, the bz2 dumps will be written with plain bzip2 again, so they should be readable with all sorts of readers again.

Shall we close this?

Lydia_Pintscher closed this task as Resolved. Jan 16 2016, 3:20 PM
Lydia_Pintscher claimed this task.