Tue, Jan 15
You can apply the patch and test it. ZipDirectoryReader has changed very little since I introduced it in February 2011 to fix T26230. Based on the quoted debug log snippets, the issue is that Microsoft has extended the end of central directory record, adding new fields to the end of it. When I wrote ZipDirectoryReader, I had it flag any deviation from the ZIP specification as an error, since that seemed to be the safest way to flag potentially malicious files. For example, it is possible to deviate from the spec in ways that make files visible to one ZIP library but invisible to another. But extending the EOCDR does not help an attacker in any obvious way, so this particular error is needlessly paranoid. That's why I propose removing it in my patch.
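To make the strict-versus-lenient distinction concrete, here is a minimal Python sketch. It is not the ZipDirectoryReader code. It assumes the extension shows up as extra bytes after the 22-byte fixed record and its declared comment, and the function name `read_eocdr` and the `strict` flag are made up for illustration.

```python
import struct

# Fixed part of the end of central directory record: signature, disk number,
# disk holding the central directory, entries on this disk, total entries,
# central directory size, central directory offset, comment length.
EOCDR_SIGNATURE = b"PK\x05\x06"
EOCDR_FORMAT = "<4sHHHHIIH"
EOCDR_SIZE = struct.calcsize(EOCDR_FORMAT)  # 22 bytes


def read_eocdr(data: bytes, strict: bool = False) -> dict:
    """Parse the last EOCDR in `data` (hypothetical helper).

    With strict=True, any bytes after the declared comment are an error.
    With strict=False, they are counted and otherwise ignored.
    """
    # Simplification: a comment that itself contains the signature bytes
    # would confuse this search.
    pos = data.rfind(EOCDR_SIGNATURE)
    if pos < 0 or pos + EOCDR_SIZE > len(data):
        raise ValueError("end of central directory record not found")

    (_sig, _disk, _cd_disk, _disk_entries, total_entries,
     cd_size, cd_offset, comment_len) = struct.unpack_from(EOCDR_FORMAT, data, pos)

    end = pos + EOCDR_SIZE + comment_len
    if end > len(data):
        raise ValueError("EOCDR comment runs past the end of the file")

    trailing = len(data) - end
    if strict and trailing:
        raise ValueError(f"{trailing} unexpected bytes after the EOCDR")

    return {
        "total_entries": total_entries,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "trailing_bytes": trailing,
    }
```

In this sketch the lenient mode still validates everything the spec defines and only stops treating the extra tail as fatal.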
I can't recall having seen this bug before. It is the simplest thing to fix. The patch is untested but probably works. Sorry @Pavel.petrovic, I don't know why it has taken so long.
Wed, Dec 19
Ran it again:
Dec 17 2018
It would be convenient if the script would report all malformed throttle rows, instead of stopping on the first one. Or maybe we could think of some way to automatically deal with the bad rows, since there seem to be quite a lot of them. For example, disabling the throttle and generating a report for admins to check later.
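A rough Python sketch of the "report everything" idea follows. The row shape (`id`, `count`, `seconds`) and both function names are hypothetical, since the script's real row format is not shown here; only the collect-instead-of-abort pattern is the point.

```python
def validate_throttle_row(row: dict) -> None:
    """Hypothetical validator: raise ValueError if the row is malformed."""
    count, seconds = row.get("count"), row.get("seconds")
    if not isinstance(count, int) or count < 0:
        raise ValueError(f"bad count: {count!r}")
    if not isinstance(seconds, int) or seconds <= 0:
        raise ValueError(f"bad period: {seconds!r}")


def collect_malformed(rows, validate):
    """Check every row and return a list of (row id, error message) pairs
    instead of stopping at the first failure."""
    report = []
    for row in rows:
        try:
            validate(row)
        except ValueError as exc:
            report.append((row.get("id"), str(exc)))
    return report
```

The returned list could then feed whatever follow-up is chosen, such as disabling the affected throttles and writing the report out for admins.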
Dec 14 2018
Dec 11 2018
@Krinkle, by corruption, do you just mean things like aborted LinksUpdate operations as in T201482? PHP timeouts are expected to break things; the main solution should be to make things fast enough or to increase the timeout. It seems to me that a missed LinksUpdate would not be a failure mode so severe as to be a blocker for PHP 7 deployment.
Dec 6 2018
Please install tideways, but it should only be enabled in php.ini on the debug servers, since it will cause a performance degradation even without being used. Also, please install php-mongodb, the PHP driver for MongoDB, since this is recommended for XHGui saving on PHP 7. I am working on the mediawiki-config patch which will use these extensions.
Dec 4 2018
Dec 3 2018
Dec 2 2018
Nov 30 2018
Also, the zend_interrupt_function hook is only called after the internal function returns to the VM, at which point it is not in the stack anymore. At best you would only see functions that re-enter the VM, like array_map(). So it would be misleading.
Nov 28 2018
@Joehoyle: internal functions were just skipped for simplicity. We configure xhprof to skip them, so we presumably don't need that feature for production. Finding the function name for an internal function requires a few more lines of code. So I guess it depends on how much you want them.
Nov 26 2018
I think we should remove unblockself on all Wikimedia wikis, and see if it helps with vandalism by compromised admin accounts like T210192. In the case of Killiondude, the compromised admin account was blocked after 2 minutes, but unblocked itself twice and thus was able to carry on vandalising for an additional 2 minutes prior to a steward-imposed global lock.
Nov 21 2018
Nov 20 2018
OK, but there would have been a notice as well, and probably broken output, regardless of PHP version.
Removing PHP 7 projects since it does not appear to be a PHP 7 migration issue; it's a RemexHtml issue which was observed on PHP 7.
Nov 19 2018
Installing the package gnupg1 and using
Nov 15 2018
Nov 14 2018
I've just been looking at the phpegjs code. Performance will be extremely sensitive to character class matching. Ideally, that should be inlined, instead of split out to a runtime library function peg_char_class_test(). If the class is purely ASCII, then it should be possible to consume text without doing UTF-8 parsing. For example, Parsoid has:
Nov 11 2018
Nov 8 2018
paravoid explained to me that librsvg 2.44 was uploaded to sid on November 3. There was some consternation about the ports which are still missing, but it looks like the change will not be reverted. This upload to sid was the thing that was previously blocked by lack of architecture support.
@Krinkle asked whether we can do function counts in excimer, to provide feature parity with xhprof. I think the answer is no, not without a PHP patch. The relevant hook (zend_execute_ex) is a true global, so it needs to be installed unconditionally, on MINIT. When this hook is overridden, the VM switches into a slow mode which uses the C stack for internal function calls, instead of the internal stack. For example, in the DO_FCALL handler:
Nov 7 2018
It looks like the bug started in 2a03980093b5168a834bbf65e820c5400e29b21b (October 2004), where Will changed the random number function to be "more random" by multiplying two random numbers together, producing a distribution heavily skewed towards zero. It was fixed in May 2005 in 133b12c9dc98e4f1abe5a1ccb3138612f3837176 and 4d556548eb8f2d020d99b813758fc50706a1821a.
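The skew is easy to reproduce. The sketch below is only an illustration in Python, not the code from those commits: it assumes both factors are independent and uniform on [0, 1). In that case P(U1·U2 < x) = x(1 − ln x), so about 33% of values fall below 0.1, compared with 10% for a single uniform number.

```python
import math
import random


def product_of_uniforms(rng: random.Random) -> float:
    """Multiply two independent uniform [0, 1) values (the 'more random' idea)."""
    return rng.random() * rng.random()


def fraction_below(samples, threshold: float) -> float:
    """Share of samples strictly below the threshold."""
    return sum(1 for s in samples if s < threshold) / len(samples)


def expected_fraction_below(threshold: float) -> float:
    """Closed form for the product of two uniforms: x * (1 - ln x)."""
    return threshold * (1.0 - math.log(threshold))


def compare(n: int = 100_000, threshold: float = 0.1, seed: int = 12345):
    """Return (uniform fraction, product fraction) below the threshold."""
    rng = random.Random(seed)
    uniform = [rng.random() for _ in range(n)]
    product = [product_of_uniforms(rng) for _ in range(n)]
    return fraction_below(uniform, threshold), fraction_below(product, threshold)
```

Calling `compare()` should give a product fraction close to `expected_fraction_below(0.1)`, which is the kind of pile-up near zero described above.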
The upper limit is around page_id 2200000, i.e. July 2005, so we should look for a fix in MW core at around that time.
cur_random values were reset with RAND() by Brion in May 2003, as described at the top of https://en.wikipedia.org/wiki/Wikipedia:Village_pump/Archive_E . But this is not the problem since page_id<1200000 does not show the skewed distribution, meaning that it started in approximately November 2004.
Nov 6 2018
Should be fixed now, feel free to undelete the page and try it.
The reason I'm not concerned about increasing this limit is that the effect on CPU time is O(N). It just limits the number of characters examined by PCRE, and PCRE takes a very small amount of time for each character. The reason it exists is that for certain regexes, short input strings could cause an exponential amount of backtracking. Setting the backtrack limit to some constant factor times the input size avoids this problem, bounding execution time to be approximately linear. Setting the backtrack limit to less than the input size is pointless: it implies that the goal is sublinear performance, i.e. better than O(N), which is not possible.
The usual way to fix exhaustion of pcre.backtrack_limit is to just increase the limit. I documented on line 1449 of Remex's Tokenizer.php that it needs to be at least twice the length of the input string. The current limit is 1MB, which I thought would be enough, but the input to RemexHtml for this test case is 1.4MB. I confirmed with eval.php that increasing pcre.backtrack_limit to 2MB fixes the issue for this test case. But let's make it 5MB to be on the safe side.
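The arithmetic behind the rule of thumb can be written down as a tiny Python sketch. The function names are hypothetical, the factor of 2 is the "at least twice the input length" guideline mentioned above, and "MB" is taken to mean 10^6 for simplicity, which is an assumption.

```python
def required_backtrack_limit(input_length: int, factor: int = 2) -> int:
    """Smallest limit satisfying the 'constant factor times input size' rule."""
    return factor * input_length


def limit_is_sufficient(limit: int, input_length: int, factor: int = 2) -> bool:
    """True if `limit` meets the guideline for an input of this length."""
    return limit >= required_backtrack_limit(input_length, factor)


MB = 1_000_000  # assumption: decimal megabytes

INPUT_LENGTH = 1_400_000  # roughly the 1.4MB test case
OLD_LIMIT = 1 * MB
NEW_LIMIT = 5 * MB
```

Under these assumptions the old limit fails the guideline for the 1.4MB input, while 5MB satisfies it for inputs up to 2.5MB.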
Nov 5 2018
Do you think "event count" is the right name? It's correct if you take "event" to mean a hardware event, like in perf record, but it's not correct if an event is an invocation of an event handler.
It's unclear to me how this code works, since neither normal command lines nor NMAKE files appear to support backslash-escaped quotes, but I downloaded a DLL and it seemed to have plausible things in it, in the "details" tab of the file properties. In Windows command lines, the quotes need to be at the start and end of each argument to work, which is probably why it's redirecting. So it's a bit of a stab in the dark, but @MarkAHershberger, could you please try...
The reason it's got angle brackets is because that's the format used by Pyrus: https://github.com/pyrus/Pyrus_Developer/blob/master/src/Pyrus/Developer/PackageFile/Commands/MakePEAR2.php#L384
Nov 4 2018
Oct 31 2018
Oct 26 2018
Oct 25 2018
- Has revisions
- Must it have a title?
- Actions: edit/view, rollback/undo, delete/undelete, move
Oct 24 2018
This should be fixed now
There's no index on ar_page_id, it needs to select by namespace and title
I logged a deletion on en.wikipedia.org using X-Wikimedia-Debug, you can see it in mwlog1001.eqiad.wmnet:/srv/mw-log/XWikimediaDebug.log . You can see that the row count query was indeed very slow. The query was:
Oct 23 2018
I see the patch does both things at the same time (index choice and batch size). I'd be interested to see a benchmark of the stub dump with different batch sizes but with the table scanning issue fixed. It should only take a couple of milliseconds to fetch 1000 rows. Having the benchmark will help us design similar software in future.
Oct 19 2018
Oct 17 2018
Oct 10 2018
Sep 27 2018
I think there should also be ExcimerProfiler::flush(), which detaches the log and returns it, similar to what happens on an implicit flush. The theory is that ExcimerProfiler::stop() will leave the log still attached to the profiler, so this:
Sep 26 2018
Sep 25 2018
Sep 24 2018
I'm planning the timer backend component. An interesting wrinkle is ZTS support. As in LuaSandbox, we can have an integer ID (sival_int) with our timer struct stored in a hashtable, with a lock protecting it from concurrent updates. Instead of setting a hook in a lua_State, we need to store &EG(vm_interrupt) in the timer struct, since in PHP 7.0+ it is declared with __thread, so taking the address of it is the only way to transport it to the handler thread. Then when the zend_interrupt_function() hook is called, the hook function will need to find all the ExcimerTimer/ExcimerProfiler instances associated with the local thread that have pending events -- this was not a problem with LuaSandbox which only had one "timer set" per lua_State.
I started the process of adding LuaSandbox to PECL: https://marc.info/?l=pecl-dev&m=153776610925078&w=2
The main reason to use a flush callback is for real-time analysis of overload events. The problem we've had in the past is that if profiling data is only logged at the end of the request, the requests that are timing out are invisible. If we log once every 10 seconds, we can get a realistic snapshot of what the cluster is doing.
Sep 22 2018
Sep 21 2018
Sep 19 2018
Sep 12 2018
For createaccount/autocreateaccount filtering, shouldn't the log performer always be anonymous? It doesn't make sense to use a non-existent, half-created user as the performer. That logic shouldn't depend on User::isSafeToLoad(), which is implemented in a hackish way; it should just depend on $action.
Sep 11 2018
It's very mysterious. My best guess is that "PHP fatal error: unlink(1): No such file or directory" was actually a suppressed warning, and was later misinterpreted as a fatal. There are definitely no entries in fatal.log with "unlink" in the message.
It was webVideoTranscodePrioritized. This is the JobExecutor log line for one of the affected jobs: