Mediawiki 1.34.2 tarball incompatible with 7zip on windows due to Pax format
Open, Needs Triage · Public

Description

There have been several complaints on the support desk that 7-Zip doesn't handle the most recent tarball (but is fine with older tarballs) and extracts it incorrectly, breaking MediaWiki.

I have not verified this myself.

Event Timeline

Bawolff created this task. Jul 4 2020, 9:03 AM
Restricted Application added a subscriber: Aklapper. Jul 4 2020, 9:03 AM
Bawolff added a subscriber: Reedy. Jul 4 2020, 9:03 AM

Can confirm. When unpacking the tar, I get prompted if I want to overwrite the @PaxHeader file, apparently for every file in the tarball.

Reedy added a comment. Jul 4 2020, 11:32 AM

Most obvious answer I can see is due to an upgrade to Ubuntu 20.04, and the subsequent upgrade of python... or other libraries?

I know I did it with python3, which I have done for previous releases

$ python3 --version
Python 3.8.2
$ pip3 install git_archive_all
Requirement already satisfied: git_archive_all in ./.local/lib/python3.8/site-packages (1.21.0)
Reedy added a comment. Jul 4 2020, 11:44 AM

Though, a bit of googling does make it look like more of a 7-zip (or possibly 7-zip and windows?) issue - https://sourceforge.net/p/sevenzip/bugs/2116/ and https://sourceforge.net/p/sevenzip/discussion/45797/thread/f58b3570/

Obviously that doesn't mean that there isn't anything we can do to help, but it seems 7-zip is more at fault than the tools used to create the tarballs; especially as it works fine with "normal" rar tools

Even winrar supports them ;)

https://superuser.com/questions/1557274/7zip-has-problems-extracting-posix-tar-archives-due-to-paxheader-files

https://github.com/Flood-UI/flood/issues/606

It does look like it's possibly an issue with long file names/paths etc... Was 1.34.1 ok? Is 1.33.4 ok? I don't think any new files were actually added to 1.34.2 that weren't in previous releases...

Reedy renamed this task from Mediawiki 1.34.2 tarball incompatibe with 7zip on windows due to Pax format to Mediawiki 1.34.2 tarball incompatible with 7zip on windows due to Pax format. Jul 4 2020, 11:45 AM

> [...] Was 1.34.1 ok? Is 1.33.4 ok? I don't think any new files were actually added to 1.34.2 that weren't in previous releases...

I tried 1.34.1; no issue there. 1.33.4 has the issue.

Discussions in 7zip:

From 2016, apparently tar can use different extensions to handle long filenames. @PaxHeader is one, @LongLink seems to be another
https://sourceforge.net/p/sevenzip/discussion/45797/thread/f58b3570/

From 2020, issue specific for MediaWiki 1.34.2
https://sourceforge.net/p/sevenzip/discussion/45797/thread/d95e3ad59b/

I'm not sure whether 7z supports @LongLink either. In the last message on this bug, from 2017, the maintainer says it will be fixed in the next version, but the bug is still open.

Legoktm added a subscriber: Legoktm. Aug 1 2020, 5:22 AM

> Most obvious answer I can see is due to an upgrade to Ubuntu 20.04, and the subsequent upgrade of python... or other libraries?
>
> I know I did it with python3, which I have done for previous releases
>
> $ python3 --version
> Python 3.8.2
> $ pip3 install git_archive_all
> Requirement already satisfied: git_archive_all in ./.local/lib/python3.8/site-packages (1.21.0)

Python 3.8 contains https://github.com/python/cpython/commit/e680c3db80efc4a1d637dd871af21276db45ae03 which switched the default tar format from GNU to PAX. The ticket says it should be compatible with "7-zip (Windows) at some point before 2011 (>8 years ago), with significant bug fixes up to 2011 (8 years ago)".

Is everyone using the latest 7-zip version? There's also a note that some older Windows versions won't work (see this comment) - what Windows versions are being used here?

Depending on those responses we could try generating a 1.34.2 tarball with GNU_FORMAT (will need some hacks to git-archive-all) to see if that makes a difference.
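
To make the GNU/PAX difference concrete, here is a minimal sketch using only Python's standard `tarfile` module. It writes one empty member with a path longer than 100 characters in each format, then walks the raw 512-byte header blocks and reports each block's typeflag and name field. The member path and the helper name `header_records` are hypothetical and chosen for illustration; nothing here is taken from git-archive-all or the release scripts.

```python
import io
import tarfile

BLOCK = 512

# Hypothetical member path, longer than the 100-byte name field of a tar header.
LONG_NAME = "mediawiki/includes/" + "VeryLongHypotheticalHookName" * 4 + ".php"


def header_records(fmt, member_name):
    """Write one empty member in tar format `fmt` and return a list of
    (typeflag, name field) pairs, one per header block in the archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(member_name))  # empty regular file
    data = buf.getvalue()

    records = []
    offset = 0
    while offset + BLOCK <= len(data):
        header = data[offset:offset + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block marks the end of the archive
            break
        name = header[0:100].split(b"\0", 1)[0].decode("ascii")
        size = int(header[124:136].strip(b"\0 ") or b"0", 8)  # octal size field
        typeflag = header[156:157]
        records.append((typeflag, name))
        # Skip this header plus its payload, rounded up to whole blocks.
        offset += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK
    return records


if __name__ == "__main__":
    for label, fmt in (("PAX", tarfile.PAX_FORMAT), ("GNU", tarfile.GNU_FORMAT)):
        print(label, header_records(fmt, LONG_NAME))
```

In the PAX archive the full path lives in an extended header record and the regular header only carries a shortened name, so an extractor that does not interpret the extended record would see a stray @PaxHeader entry and a truncated filename, which would be consistent with the symptoms reported here.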

> Is everyone using the latest 7-zip version? There's also a note that some older Windows versions won't work (see this comment) - what Windows versions are being used here?

My comments were based on 7zip 16.04, on Windows 10 1903 (build 18362). Same result when upgrading to 7zip 19.00.

@Mainframe98 Could you test out these GNU and PAX format tarballs and let me know which, if any, you're able to extract: https://people.wikimedia.org/~legoktm/T257102/ ?

File | Result
gnu_mediawiki-1.34.2.tar.gz | Extracts properly
gnu_mediawiki-core-1.34.2.tar.gz | Extracts properly
pax_mediawiki-1.34.2.tar.gz | Extracts, but with @PaxHeader prompt - the extracted files sometimes have their filename truncated
pax_mediawiki-core-1.34.2.tar.gz | Extracts, but with @PaxHeader prompt - the extracted files sometimes have their filename truncated

In short, the gnu variants work; the pax ones do not.

I filed https://github.com/Kentzo/git-archive-all/issues/83 upstream to allow us to specify tarfile.GNU_FORMAT.

In the meantime @Reedy could use a patch like

diff --git a/git_archive_all.py b/git_archive_all.py
index 9d21257..83c0288 100755
--- a/git_archive_all.py
+++ b/git_archive_all.py
@@ -234,14 +234,15 @@ class GitArchiver(object):
                 import tarfile
 
                 mode = self.TARFILE_FORMATS[output_format]
+                tarformat = tarfile.GNU_FORMAT
 
                 if compresslevel is not None:
                     try:
-                        archive = tarfile.open(path.abspath(output_path), mode, compresslevel=compresslevel)
+                        archive = tarfile.open(path.abspath(output_path), mode, format=tarformat, compresslevel=compresslevel)
                     except TypeError:
                         raise ValueError("{0} cannot be compressed".format(output_format))
                 else:
-                    archive = tarfile.open(path.abspath(output_path), mode)
+                    archive = tarfile.open(path.abspath(output_path), mode, format=tarformat)
 
                 def add_file(file_path, arcname):
                     archive.add(file_path, arcname)

...or just use Python 3.7? :p
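
The effect of passing `format=` to `tarfile.open`, as the patch above does, can be tried in isolation. The following is a rough sketch that assumes nothing about git-archive-all itself: the helper `build_gnu_targz` and the file name are made up for illustration. It builds a gzip-compressed tarball in memory with `tarfile.GNU_FORMAT`, so the result can be inspected for @PaxHeader entries.

```python
import gzip
import io
import tarfile


def build_gnu_targz(files, compresslevel=9):
    """Return the bytes of a .tar.gz written in GNU format.

    `files` maps archive member names to their byte contents
    (a hypothetical input shape, chosen to keep the sketch small).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz",
                      format=tarfile.GNU_FORMAT,
                      compresslevel=compresslevel) as archive:
        for arcname, payload in files.items():
            info = tarfile.TarInfo(arcname)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical long member path (more than 100 characters).
    long_name = "mediawiki/includes/" + "x" * 120 + ".php"
    blob = build_gnu_targz({long_name: b"<?php\n"})
    raw = gzip.decompress(blob)
    print("@PaxHeader present:", b"@PaxHeader" in raw)
    print("@LongLink present:", b"@LongLink" in raw)
```

With GNU format, long names are stored through @LongLink records instead of pax extended headers, which matches the gnu_ tarballs that extracted properly in the test above.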

Change 620647 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/tools/release@master] make-release: Build tarballs in GNU format

https://gerrit.wikimedia.org/r/620647

Change 620647 merged by jenkins-bot:
[mediawiki/tools/release@master] make-release: Build tarballs in GNU format

https://gerrit.wikimedia.org/r/620647

Legoktm claimed this task. Aug 24 2020, 8:28 AM

@Mainframe98 can you confirm that the new 1.35.0-rc.2 tarballs extract properly? If so, I think we can close this.

Mainframe98 added a comment. Edited Aug 24 2020, 8:41 AM

I followed the link in https://lists.wikimedia.org/pipermail/wikitech-l/2020-August/093737.html, but that file still has the @PaxHeader issue. Trying to run the web installer fails with Uncaught Error: Interface 'MediaWiki\Diff\Hook\DifferenceEngineShowDiffPageMaybeShowMissingRevisionHook'. When I look for the file, its filename appears to have been truncated.

So, unfortunately, it's still broken :(
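
One way to find every truncated file rather than the first one the installer trips over is to compare the tarball's member list with what ended up on disk. This is only a sketch: `missing_members` is a hypothetical helper, and it assumes Python's own `tarfile` reads the archive correctly and that the archive was extracted under `extract_root` without renaming the top-level directory.

```python
import os
import tarfile


def missing_members(tar_path, extract_root):
    """List archive members that have no matching path under `extract_root`."""
    with tarfile.open(tar_path) as archive:
        names = archive.getnames()
    missing = []
    for name in names:
        # Member names use "/" separators; rebuild the path for the local OS.
        local_path = os.path.join(extract_root, *name.split("/"))
        if not os.path.exists(local_path):
            missing.append(name)
    return missing
```

Any name it returns is a member that the extraction tool either skipped or wrote under a different (for example truncated) filename.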

dancy updated the task description. Sep 14 2020, 4:48 PM