If you decided that they're ok to be ignored, that's fine by me. But given that initially hovered links at pageload never worked, one might wonder whether the instrumentation was correct, and whether the data that led to this decision was simply missing measurements for hovers that happened at pageload, or before the event listener was added.
We'll know for sure what the impact on SpeedIndex is when it hits production. I don't know how "aggressive" requestIdleCallback called at that point will be in practice.
I've discovered a pre-existing bug that this change makes worse: T190037: Page preview doesn't trigger when link already hovered at page load time
I haven't touched anything
Fri, Mar 16
Thu, Mar 15
Wed, Mar 14
Actually, from what I can see on Vagrant, Varnish doesn't pass X-Varnish as part of the request to backends.
This isn't an issue with thumbnailing, but a problem with the layout of the file page for PDFs whose page aspect ratio changes from page to page and differs from what MediaWiki considers to be the "default" aspect ratio for that PDF.
Something's still wrong with integration-slave-jessie-1001:
Actually, I've just noticed that despite the error it spews and the non-zero exit code, ImageMagick (at least locally) does render a valid output PNG. We currently throw it away due to the exit code, but we could keep it.
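Roughly what keeping it could look like, as a minimal sketch (the convert invocation and the validity check are illustrative, not our actual pipeline code):

```
import os
import subprocess

def convert_keeping_output(src, dest):
    # Run ImageMagick's convert; instead of discarding the result on a
    # non-zero exit code, keep it if the output at least starts with a
    # valid PNG signature.
    result = subprocess.run(["convert", src, dest], capture_output=True)
    valid_png = False
    if os.path.exists(dest):
        with open(dest, "rb") as f:
            valid_png = f.read(8) == b"\x89PNG\r\n\x1a\n"
    if not valid_png:
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return dest
```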
I don't think that's something that can be fixed generically across the board: logstash entries are emitted by various backends, and each backend would be responsible for adding the header it was given by Varnish to the log entries it records.
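To illustrate, each backend would need to do something along these lines wherever it logs (a hypothetical sketch; the request object and the field name are made up):

```
import logging

logger = logging.getLogger("thumbnail")

def log_with_varnish_id(request, message):
    # Hypothetical: copy the X-Varnish header this backend received into
    # the log entry, so the logstash record can be correlated with the
    # Varnish request upstream.
    logger.info(message, extra={"x_varnish": request.headers.get("X-Varnish")})
```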
Locally, even a fairly recent version of ImageMagick (Version: ImageMagick 7.0.5-10 Q16 x86_64 2017-06-04) fails to convert this file, with the same error message. OS X Preview.app does render it, albeit with the bottom of the image black. Combined with the ImageMagick error message, this suggests that the file is cut off somewhere before its end.
I need to look into why that particular entry doesn't have the url field, which would have made it easy to find.
Tue, Mar 13
Does the Pannellum install on labs actually generate tiles, or is that feature turned off? Looking at the example @Ainali provided on that wiki page, it seems to just stream the original file.
Those expire quickly, afaik
Just realized that this affects private wikis. We hadn't hit it yet, given how few videos we have stored there, but still: https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2018.03.13/logback?id=AWIenanRXDoVG7yN2CRu&_g=()
Mon, Mar 12
I've made a saved search in logstash: https://logstash.wikimedia.org/app/kibana#/discover/afd5a910-2629-11e8-abaf-f154b84b6b6c?_g=(refreshInterval%3A(display%3AOff%2Cpause%3A!f%2Cvalue%3A0)%2Ctime%3A(from%3Anow-24h%2Cmode%3Aquick%2Cto%3Anow)) which is what we'll need to watch after Thumbor deployments.
I've combed through logstash, and during the last incident (which lasted 2-3 days) only officewiki and ombudsmenwiki hit the error (i.e. tried to generate new thumbnails). I think it's sufficient to watch those two private wikis as canaries, particularly since officewiki generated a significant number of errors (84 over the breakage period, versus 3 for ombudsmenwiki).
I think the assumption behind picking a specific combination of factors for RUM would be that we oversample that particular set. We have to be mindful of privacy implications on small wikis when we do that, though. I think doing this would also let us pick a more advanced RUM metric that's only available on some browsers; while it wouldn't represent the experience on all browsers, it could be slightly closer to what users actually experience.
What's the next step? Asking the communities of these 3 wikis (cawiki, frwiki, enwikivoyage) whether they would be willing to let us run the study on their wiki?
It looks like this won't be necessary, as we have different proxy options that should offer a satisfactory workaround to the problem of feeding Thumbor instances the right amount of work.
Thu, Mar 8
HAProxy might actually work: https://stackoverflow.com/questions/8750518/difference-between-global-maxconn-and-server-maxconn-haproxy I'll try that first.
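Per that answer, the per-server maxconn is the part we'd care about. A minimal sketch of what I have in mind (ports, names, and timeouts are made up; this assumes one Thumbor process per port):

```
global
    maxconn 500               # overall cap on connections HAProxy accepts

defaults
    mode http
    timeout connect 5s
    timeout client  60s
    timeout server  60s

frontend thumbor_frontend
    bind *:8800
    default_backend thumbor_backends

backend thumbor_backends
    # maxconn 1 per server: each Thumbor process handles one request at a
    # time, and excess requests queue inside HAProxy instead.
    server thumbor-8801 127.0.0.1:8801 maxconn 1
    server thumbor-8802 127.0.0.1:8802 maxconn 1
```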
Mon, Mar 5
Interesting, worth testing that different tool for WebPageReplay
Sat, Mar 3
@Tgr searching for my filename would let you find the Thumbor error easily:
Fri, Mar 2
Automatic mesh simplification prior to rendering would also be an option, but the simplification itself would need to be streamed as well; otherwise it's likely to run into the same problem.
If I'm not mistaken, each Thumbor process is allowed to use up to 15% of the machine's memory in production. That's a lot: it means this file is using gigabytes of memory. It would be unreasonable to increase the limit, as that would come at the expense of other thumbnails being rendered concurrently. Gigantic files of other types hit the memory limit as well.
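For a sense of scale, a simple rlimit-style cap at 15% would look like this (a sketch; I'm not claiming this is the exact mechanism production uses):

```
import resource

def apply_memory_cap(total_bytes, fraction=0.15):
    # Cap this process's address space at `fraction` of the machine's
    # memory; allocations beyond the cap fail with MemoryError.
    cap = int(total_bytes * fraction)
    resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

# e.g. on a 64 GB machine this caps each process at ~9.6 GB:
apply_memory_cap(64 * 1024**3)
```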
Thu, Mar 1
A lot more information about the performance of the page, plus anonymous information about the browser, etc., which we already collect with the NavigationTiming extension. Essentially, the survey results need to be tied to the performance data (performance measured when we show the survey, or rather the survey shown on a subsample of pages where we measure performance). It's fine if they go to different EventLogging schemas (I presume QuickSurveys is set up with a default schema). I'll probably modify NavigationTiming rather than QuickSurveys to achieve that linking.
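To make the linking concrete, here's a hypothetical sketch of the idea: both events carry the same per-pageview token, so survey answers can later be joined to the performance measurements from the same page load (the field names are made up for illustration):

```
import uuid

# Hypothetical: a token generated once per pageview and attached to both
# the NavigationTiming event and the QuickSurveys response, so the two
# EventLogging records can be joined later.
pageview_token = uuid.uuid4().hex

navigation_timing_event = {"pageviewToken": pageview_token, "loadEventEnd": 1234}
survey_response_event = {"pageviewToken": pageview_token, "answer": "satisfied"}
```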
It would be desirable to have at least two Wikipedias, but it could also be interesting to include a non-Wikipedia wiki in addition to that.
Please add test files for both types. This will ensure that ffmpeg has the right codecs for those formats when the Debian package is built.