
Mediawiki 1.34.2 tarball incompatible with 7zip on windows due to Pax format
Open, Low, Public

Description

There have been several complaints on the support desk that 7zip doesn't support the most recent tarball (but is fine with older tarballs) and extracts it incorrectly, breaking MediaWiki.

I have not verified this myself.

Event Timeline

Can confirm. When unpacking the tar, I get prompted if I want to overwrite the @PaxHeader file, apparently for every file in the tarball.

Most obvious answer I can see is due to an upgrade to Ubuntu 20.04, and the subsequent upgrade of python... or other libraries?

I know I did it with python3, which I have done for previous releases

$ python3 --version
Python 3.8.2
$ pip3 install git_archive_all
Requirement already satisfied: git_archive_all in ./.local/lib/python3.8/site-packages (1.21.0)

Though, a bit of googling does make it look like more of a 7-zip (or possibly 7-zip and windows?) issue - https://sourceforge.net/p/sevenzip/bugs/2116/ and https://sourceforge.net/p/sevenzip/discussion/45797/thread/f58b3570/

Obviously that doesn't mean that there isn't anything we can do to help, but it seems 7-zip is more at fault than the tools used to create the tarballs; especially as it works fine with "normal" rar tools

Even winrar supports them ;)

https://superuser.com/questions/1557274/7zip-has-problems-extracting-posix-tar-archives-due-to-paxheader-files

https://github.com/Flood-UI/flood/issues/606

It does look like it's possibly an issue with long file names/paths etc... Was 1.34.1 ok? Is 1.33.4 ok? I don't think any new files were actually added to 1.34.2 that weren't in previous releases...

Reedy renamed this task from Mediawiki 1.34.2 tarball incompatibe with 7zip on windows due to Pax format to Mediawiki 1.34.2 tarball incompatible with 7zip on windows due to Pax format.Jul 4 2020, 11:45 AM

[...] Was 1.34.1 ok? Is 1.33.4 ok? I don't think any new files were actually added to 1.34.2 that weren't in previous releases...

I tried 1.34.1; no issue there. 1.33.4 has the issue.

Discussions in 7zip:

From 2016, apparently tar can use different extensions to handle long filenames. @PaxHeader is one, @LongLink seems to be another
https://sourceforge.net/p/sevenzip/discussion/45797/thread/f58b3570/

From 2020, issue specific for MediaWiki 1.34.2
https://sourceforge.net/p/sevenzip/discussion/45797/thread/d95e3ad59b/

I'm not sure whether 7z supports @LongLink either. In the last message on this bug, from 2017, the maintainer says he will fix it in the next version, but the bug is still open.

Most obvious answer I can see is due to an upgrade to Ubuntu 20.04, and the subsequent upgrade of python... or other libraries?

I know I did it with python3, which I have done for previous releases

$ python3 --version
Python 3.8.2
$ pip3 install git_archive_all
Requirement already satisfied: git_archive_all in ./.local/lib/python3.8/site-packages (1.21.0)

Python 3.8 contains https://github.com/python/cpython/commit/e680c3db80efc4a1d637dd871af21276db45ae03 which switched the default tar format from GNU to PAX. The ticket says it should be compatible with "7-zip (Windows) at some point before 2011 (>8 years ago), with significant bug fixes up to 2011 (8 years ago)".
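To see which format a given interpreter picks when nothing is specified, a minimal sketch like the following may help. The helper names `archive_format_name` and `default_writer_format` are made up for illustration; only the `tarfile` constants and the `format` attribute come from the standard library.

```python
import io
import sys
import tarfile


def archive_format_name(fmt):
    """Map a tarfile format constant to a readable name."""
    return {
        tarfile.USTAR_FORMAT: "ustar",
        tarfile.GNU_FORMAT: "gnu",
        tarfile.PAX_FORMAT: "pax",
    }[fmt]


def default_writer_format():
    """Return the format a new archive gets when no format= is passed."""
    # Writing to an in-memory buffer keeps the check free of side effects.
    with tarfile.open(fileobj=io.BytesIO(), mode="w") as tar:
        return tar.format


# Python 3.7 and earlier should report "gnu"; 3.8 and later should report "pax".
print(sys.version_info[:2], archive_format_name(default_writer_format()))
```

Running this under both the old and the new release environment should show whether the interpreter upgrade alone accounts for the change.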

Is everyone using the latest 7-zip version? There's also a note that some older Windows versions won't work (see this comment) - what Windows versions are being used here?

Depending on those responses we could try generating a 1.34.2 tarball with GNU_FORMAT (will need some hacks to git-archive-all) to see if that makes a difference.

Is everyone using the latest 7-zip version? There's also a note that some older Windows versions won't work (see this comment) - what Windows versions are being used here?

My comments were based on 7zip 16.04, on Windows 10 1903 (build 18362). Same result when upgrading to 7zip 19.00.

@Mainframe98 Could you test out these GNU and PAX format tarballs and let me know which, if any, you're able to extract: https://people.wikimedia.org/~legoktm/T257102/ ?

File | Result
gnu_mediawiki-1.34.2.tar.gz | Extracts properly
gnu_mediawiki-core-1.34.2.tar.gz | Extracts properly
pax_mediawiki-1.34.2.tar.gz | Extracts, but with @PaxHeader prompt - the extracted files sometimes have their filename truncated
pax_mediawiki-core-1.34.2.tar.gz | Extracts, but with @PaxHeader prompt - the extracted files sometimes have their filename truncated

In short, the gnu variants work; the pax ones do not.

I filed https://github.com/Kentzo/git-archive-all/issues/83 upstream to allow us to specify tarfile.GNU_FORMAT.

In the meantime @Reedy could use a patch like

diff --git a/git_archive_all.py b/git_archive_all.py
index 9d21257..83c0288 100755
--- a/git_archive_all.py
+++ b/git_archive_all.py
@@ -234,14 +234,15 @@ class GitArchiver(object):
                 import tarfile
 
                 mode = self.TARFILE_FORMATS[output_format]
+                tarformat = tarfile.GNU_FORMAT
 
                 if compresslevel is not None:
                     try:
-                        archive = tarfile.open(path.abspath(output_path), mode, compresslevel=compresslevel)
+                        archive = tarfile.open(path.abspath(output_path), mode, format=tarformat, compresslevel=compresslevel)
                     except TypeError:
                         raise ValueError("{0} cannot be compressed".format(output_format))
                 else:
-                    archive = tarfile.open(path.abspath(output_path), mode)
+                    archive = tarfile.open(path.abspath(output_path), mode, format=tarformat)
 
                 def add_file(file_path, arcname):
                     archive.add(file_path, arcname)
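The same idea as the patch above, outside of git-archive-all, can be sketched as a standalone snippet: pass `format=tarfile.GNU_FORMAT` explicitly together with `compresslevel`. The function `build_gnu_targz` and the sample file name are hypothetical and are not part of the release tooling.

```python
import io
import tarfile


def build_gnu_targz(files, compresslevel=9):
    """Return gzip-compressed tar bytes written in GNU format.

    `files` maps archive member names to their contents as bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz",
                      format=tarfile.GNU_FORMAT,
                      compresslevel=compresslevel) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# A member name longer than 100 characters forces a long-name extension.
long_name = "mediawiki/includes/" + "x" * 120 + ".php"
blob = build_gnu_targz({long_name: b"<?php\n"})

# Read the archive back to confirm the long name survives the round trip.
with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
    names = tar.getnames()
print(names == [long_name])
```

Passing the format at the call site does not depend on the interpreter's default, so it should behave the same on Python 3.7 and 3.8.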

...or just use Python 3.7? :p

Change 620647 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/tools/release@master] make-release: Build tarballs in GNU format

https://gerrit.wikimedia.org/r/620647

Change 620647 merged by jenkins-bot:
[mediawiki/tools/release@master] make-release: Build tarballs in GNU format

https://gerrit.wikimedia.org/r/620647

@Mainframe98 can you confirm that the new 1.35.0-rc.2 tarballs extract properly? If so, I think we can close this.

I followed the link in https://lists.wikimedia.org/pipermail/wikitech-l/2020-August/093737.html, but that file still has the paxfile issue. Trying to run the web installer prompts me with Uncaught Error: Interface 'MediaWiki\Diff\Hook\DifferenceEngineShowDiffPageMaybeShowMissingRevisionHook'. Looking for the file, it appears to have its filename truncated.

So, unfortunately, it's still broken :(

Is there any specific reason for releasing MW only as a tarball? Why can we not release a zip file along with the tar file?

Recently I faced this issue, and it is so frustrating that I cannot even get past the very first step of the installation. It is a known issue, and rather than providing an alternative, it is being ignored. When I faced this issue, I extracted the tarball on a Linux machine and created a zip file, then used that zip to install. But I am sure not every Windows user has a Linux machine.

Is there any specific reason for releasing MW only as a tarball? Why can we not release a zip file along with the tar file?

Because no one has ever asked before. And AFAIK we've never had these sorts of issues before, there's never really been a reason to do so.

Recently I faced this issue, and it is so frustrating that I cannot even get past the very first step of the installation. It is a known issue, and rather than providing an alternative, it is being ignored. When I faced this issue, I extracted the tarball on a Linux machine and created a zip file, then used that zip to install. But I am sure not every Windows user has a Linux machine.

No, it's not being ignored. A patch was merged to try and fix the issue, but unfortunately it didn't.

There's plenty of other "archiving" tools on windows, in various states of free or trials (winrar, winzip etc). So you shouldn't have to use a Linux server

This issue was created on Jul 4 2020, 3:03 PM, so it has existed for more than 6 months. This is the very first step of the installation and it is known that it can fail. That is why I think it has been ignored.

Even if it worked, tar.gz was never a popular format on Windows. You have to download additional tools (7zip, winzip, winrar and so on), whereas Windows can extract zip files without any external tool, just as on my Ubuntu I can extract a tar without installing anything extra. I do not use OSX; there might be some archive format which is supported by default.

Can we release MW archives as ZIP from now on, along with the tar? Can it be done as soon as possible? I would also like other formats to be made available for OSX, so that no one needs to install anything extra just to extract the archive.

macOS will do multiple different formats fine out of the box, as per https://www.mediawiki.org/wiki/Download.

But the 3rd party tool The Unarchiver shouldn't be used

Can we make MW archives available in such formats where we will not need to have additional/ special tools just to extract that?

A tar and a zip for Linux and Windows. If OSX can extract either of these out of the box, then these two formats alone should cover almost everyone.

By that time, if the patches have worked or 7zip has fixed it, this issue will be closed anyway.

Do I have to create another task to get a zip format which can be extracted with the default tool? Or what is the process for requesting that?

I think the fix should be easy. Instead of telling *everyone* to not use some very popular extraction tools (because most of them don't seem to support this new format), revert the way *you* generate the tar archive so it works for everyone.

Because apparently it's not a limitation of the tar format, since Legoktm was able to create a tarball that works correctly (in T257102#6359051).

If the solution is to not use python for creating the tarball, but instead shell out to a standard gnu tar program, please do so.

I think the fix should be easy. Instead of telling *everyone* to not use some very popular extraction tools (because most of them don't seem to support this new format), revert the way *you* generate the tar archive so it works for everyone.

Not really. It's not so easy to revert the python version being used when it changed in an OS upgrade.

Hello, there is a new version "7zip 21.00".
--amended:--
I tested it with a 1.34.2 download (using BeyondCompare, a great piece of software for folder and file comparison), and the problem is still there. Fortunately, the files with truncated filenames were either identical to old ones or replacements for old ones, which gave me the correct name.

7zip shows warnings that it is going to replace files. If it extracts without any warning, then I assume 7zip has fixed the issue. I use 19.00 and it shows the following warnings:

Untitled.png (517×816 px, 26 KB)

Reedy triaged this task as Low priority.Jan 28 2021, 3:22 AM

There's now a zip format too

[…]
In short, the gnu variants work; the pax ones do not.

[…] It's not so easy to revert the python version being used when it changed in an OS upgrade.

Change 620647 merged by jenkins-bot:
[mediawiki/tools/release@master] make-release: Build tarballs in GNU format

https://gerrit.wikimedia.org/r/620647

OK, so as I understand it:

  • "7z for Windows" is or was unable to extract our Tar release file correctly.
  • It worked before; it broke because Python's default tar format changed from GNU to PAX, and it seems PAX doesn't work correctly in 7z for Windows.
  • Reedy has confirmed this issue and also confirmed that the GNU variant still works correctly.
  • The above patch changes it to explicitly use the GNU format again, like it was before effectively.

It seems however people are still experiencing the (same?) issue. What did I miss in the above summary? Do we know why it didn't work?

It looks like Lego's patch didn't necessarily have the desired effect; the tarballs seem to be still in the PAX format. I'm not sure what the difference between that patch and the tarballs he built for testing vs the ones I release are. Other than the underlying python version, and the upstream patch he linked (as I'm using 3.8)

Which if tarfile.DEFAULT_FORMAT = tarfile.GNU_FORMAT isn't erroring or working, is potentially a bug... Or we're doing something wrong

It looks like Lego's patch didn't necessarily have the desired effect; the tarballs seem to be still in the PAX format. I'm not sure what the difference between that patch and the tarballs he built for testing vs the ones I release are. Other than the underlying python version, and the upstream patch he linked (as I'm using 3.8)

Which if tarfile.DEFAULT_FORMAT = tarfile.GNU_FORMAT isn't erroring or working, is potentially a bug... Or we're doing something wrong

It's possible this monkeypatching doesn't work for subtle reasons I don't fully grasp. I think we need someone who 1) has access to Windows 2) comfortable/interested in debugging tar format issues 3) understands basic Python to dig in and see what's wrong.
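One possible explanation, offered as an assumption rather than a confirmed diagnosis of the release script: in CPython, `tarfile.TarFile` copies `DEFAULT_FORMAT` into its own class attribute `format` when the class is defined, so reassigning the module-level name afterwards neither raises an error nor changes what new archives use. A minimal sketch to check this on a given interpreter (the helper `format_used_by_new_archive` is made up for illustration):

```python
import io
import tarfile


def format_used_by_new_archive():
    """Return the format a new archive gets when no format= is passed."""
    with tarfile.open(fileobj=io.BytesIO(), mode="w") as tar:
        return tar.format


original_module_default = tarfile.DEFAULT_FORMAT
original_class_default = tarfile.TarFile.format
try:
    # The monkeypatch discussed above: rebinding the module-level name.
    tarfile.DEFAULT_FORMAT = tarfile.GNU_FORMAT
    after_module_patch = format_used_by_new_archive()

    # Alternative: patch the class attribute that TarFile actually reads.
    tarfile.TarFile.format = tarfile.GNU_FORMAT
    after_class_patch = format_used_by_new_archive()
finally:
    # Restore both values so the check has no lasting side effects.
    tarfile.DEFAULT_FORMAT = original_module_default
    tarfile.TarFile.format = original_class_default

print("module-level patch took effect:", after_module_patch == tarfile.GNU_FORMAT)
print("class-level patch took effect:", after_class_patch == tarfile.GNU_FORMAT)
```

If the first line prints False on Python 3.8, passing `format=tarfile.GNU_FORMAT` at the call site, or patching `tarfile.TarFile.format`, may be the more reliable route.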

My current thinking is that https://github.com/python/cpython/commit/e680c3db80efc4a1d637dd871af21276db45ae03 is buggy somehow. It surely didn't "...improves cross-platform portability with a consistent encoding" in our case as the release notes claimed it would. But we probably should go to the Python folks with an easier/smaller reproduction case.
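A smaller reproduction case could look something like the sketch below. It writes one long-named file in each format and lists the raw header blocks, which shows which extension records an extractor has to understand. The helper names and the sample path are hypothetical; the offsets (name at 0, size at 124, typeflag at 156) are the standard tar header layout.

```python
import io
import tarfile

BLOCK = 512  # tar archives are made of 512-byte blocks


def write_tar(name, fmt):
    """Write a single small file called `name` into an in-memory tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        data = b"<?php\n"
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def header_blocks(raw):
    """Yield (typeflag, stored_name) for every header block in `raw`."""
    pos = 0
    while pos + BLOCK <= len(raw):
        header = raw[pos:pos + BLOCK]
        if header == b"\0" * BLOCK:
            break  # an all-zero block marks the end of the archive
        size = int(header[124:136].strip(b"\0 ") or b"0", 8)
        name = header[:100].rstrip(b"\0").decode("ascii", "replace")
        yield header[156:157], name
        # Skip the header plus the payload, rounded up to whole blocks.
        pos += BLOCK + -(-size // BLOCK) * BLOCK


# Hypothetical path, chosen only because it exceeds 100 characters.
long_name = "includes/" + "Hook" * 30 + ".php"

for label, fmt in (("pax", tarfile.PAX_FORMAT), ("gnu", tarfile.GNU_FORMAT)):
    print(label, list(header_blocks(write_tar(long_name, fmt))))
```

Typeflag `x` marks a pax extended header and `L` a GNU long-name record; `0` is the regular file entry. Running the same listing against a released tarball (after decompressing it) should show which of the two it really contains.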