What would be the harm in switching to "defer" for both the startup module and the preloaded modules?
Right, it's unclear to what extent real users would be affected, but since we have no way of measuring visual completion with RUM, we can only assume it might have such a dramatic effect for some. I'd feel more comfortable if we could get back to the performance we used to have in synthetic tests. This stuff is all low priority, it should get out of the way during the critical path. Preload is still worthwhile if we can make it free of side effects.
Running via WPT? Or testing manually?
I assume that the pink bars seen in WPT are when the JS in the corresponding request is given execution/CPU time.
Last Chrome 61 unauthed desktop run: http://wpt.wmftest.org/result/171018_Z4_FE/1/details/
First Chrome 62 unauthed desktop run: http://wpt.wmftest.org/result/171018_PJ_G0/1/details/
Is it too late for us to run something on that machine before it's fully decommissioned? We've been trying something in labs for T176361 and we're at the point where we're wondering if bare metal would be better for what we're trying to achieve.
Ok, so the difference is clearly Chrome 61 vs Chrome 62 behavior. I think it's safe to assume that nothing changed in our code.
Are WPT and WebPageReplay at 10fps in the above breakdown? Given the ranges we're looking at, I think the rounding that happens at 10fps might make anything running at 10fps look more stable than it really is. A span of up to 200ms could be reduced to 100ms. Up to 300ms could end up being around 200ms, etc. And obviously it introduces a floor, where it's impossible to go below 100ms span at 10fps, even if the setup performs better.
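A toy model of that rounding (an assumption about how frame sampling works, with made-up spans, not real measurements): at 10fps each frame covers 100ms, so a span gets floored to whole-frame granularity with a minimum of one frame.

```shell
# Each frame at 10 fps covers 100 ms, so a measured span is
# snapped down to whole-frame granularity, with a floor of one frame.
fps=10
frame_ms=$(( 1000 / fps ))   # 100 ms per frame
for span_ms in 50 150 250 299; do
  frames=$(( span_ms / frame_ms ))
  if [ "$frames" -lt 1 ]; then frames=1; fi
  echo "real ${span_ms}ms -> measured $(( frames * frame_ms ))ms"
done
```

So a real 299ms span and a real 250ms span both report as 200ms, which is exactly the kind of artificial stability described above.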
Wed, Oct 18
Videoscalers don't run Thumbor
Tue, Oct 17
Mon, Oct 16
We could possibly take an image scaler out of rotation on codfw, since those serve very few requests since the Thumbor deployment (and none in codfw outside of switchovers). @fgiunchedi is that something that sounds reasonable? If initial testing is promising, we could subsequently request a bare metal machine properly. Or talk about repurposing an image scaler for good, I guess.
Do we have dashboard for these new metrics yet?
OK, I think the experiment of running browsertime alone has run its course, clearly webpagereplay is giving us greater stability on every metric. I'm going to destroy the browsertime machine and remove the corresponding labs dashboard.
Meeting scheduled with the Edge team this week
Cleanup should be complete for TIFFs, PDFs and DJVUs on all wikis.
Fri, Oct 13
Probably something in the Cairo libraries? That's where you located the issue last time.
I'm going to wait until there's a more compelling reason to push a new version. As it stands those noisy messages can easily be filtered out in Logstash.
throttle --up 330 --down 780 --rtt 200 is now running in a screen under my user on the browsertime machine, and the browsertime command on the cron job no longer has any connectivity parameter.
I will make the throttle permanent and see what happens
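One way to make it permanent rather than running in a screen session would be a systemd unit (a sketch; the install path of the sitespeed.io throttle binary is an assumption):

```ini
# /etc/systemd/system/throttle.service (illustrative)
[Unit]
Description=Persistent network throttling for browsertime runs
After=network.target

[Service]
ExecStart=/usr/local/bin/throttle --up 330 --down 780 --rtt 200
Restart=always

[Install]
WantedBy=multi-user.target
```

Enabled with `systemctl enable --now throttle`, it would also survive reboots, which the screen session won't.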
Just checked the latest har file, the slow runs are random.
We're talking about a 6.3% increase in SpeedIndex, that's nothing to sneeze at. With this line of argument, it only takes 3 teams who each consider their feature to be a special snowflake to get close to a 20% regression. The problem isn't the amount per se, but the fact that we can do something about it: logically, the bulk of that module doesn't need to be loaded and executed at that point in time.
I think the parse/eval should also be done lazily. Ideally we have the data in a "black box", we know it's the contents of the module (we should know that because we know which modules we've requested), and we do all the main thread work for it lazily. This has the advantage of avoiding the parse/eval pile-up in the local storage case.
The logs are flowing, but the host information is missing. It needs to be merged and deployed.
Thu, Oct 12
It's specifically resizing it takes issue with.
The upgrade hasn't fixed the problem. It seems like converting the file without options works, but not with the options we typically use:
Seems like just noise coming from the pyexiv2 library, writing to the error log. The actual exception is caught by the code.
Yes, they're coming from exiv2. I need to check if they're merely spewing into the log or if they're really causing requests to fail.
Did you bother to look at this link I provided in my last comment?
https://phabricator.wikimedia.org/T175916#3608031 How can you be "unaware of a regression related to the launch" after looking at this?
@godog it's logstash that needs to be restarted
Wed, Oct 11
--connectivity.engine throttle -c cable gave me an error the first time I used it:
--cacheClearRaw is undocumented on the front page of https://github.com/sitespeedio/browsertime btw
Without doing anything special, the browsertime metrics are more volatile. The min-max span for SpeedIndex is 5% of the min value on the webpagereplay server, 11% on the browsertime server.
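For reference, the volatility figure here is just the min-max span expressed as a percentage of the min (the SpeedIndex values below are hypothetical, only the arithmetic matters):

```shell
# Illustrative numbers: a 2000ms best run and a 2100ms worst run
# give a span equal to 5% of the minimum.
min=2000
max=2100
echo "span: $(( (max - min) * 100 / min ))%"   # -> span: 5%
```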
Tue, Oct 10
The browsertime machine (without any proxying) is set up and running, visible in Grafana. Since the runs are much faster, I have it running every 3 minutes to collect more data.
It's now putting the run data into a tmpfs partition. I couldn't put the chrome data in there as well, because doing so reuses the same profile and cache between runs inside a given browsertime call. This is what was happening in the latest batch on the mahimahi machine and could explain why it got better: it had (accidentally) started using the browser cache across runs.
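For the record, a tmpfs mount with an explicit size cap looks like this (mount point and size are illustrative assumptions, not the values actually used on the machine):

```shell
# One-off mount; "size=" caps how much RAM the filesystem may use.
mount -t tmpfs -o size=512m tmpfs /srv/browsertime/results

# Or the equivalent /etc/fstab line to make it persistent:
#   tmpfs /srv/browsertime/results tmpfs size=512m,mode=0755 0 0
```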
I think the results we've got from Cloud VPS are ok. AWS is a bit better, but it's a lot more expensive and not on our infrastructure. When we started we didn't know what kind of stability we were going to get, and it seems to be as good as it gets. Which is already a lot better than what WPT without a proxy is giving us right now. Neither AWS nor Cloud VPS can achieve commit-level stability yet, but they're already useful for better alerts. I don't think the difference in stability between the two justifies paying for AWS, it's still in the same ballpark.
Mon, Oct 9
I've tried pinning chromium to a different CPU than browsertime via a shell wrapper and it didn't like it, crashing randomly. I think that concludes my investigation of that idea. Pinning things to one CPU just makes stability and performance worse than letting the OS do its own scheduling.
I've set a CPU aside and given it exclusively to browsertime and its subprocesses on the mahimahi machines. It's not ideal because ffmpeg, Xvfb and chrome are still sharing a CPU, but h2o and OS processes are definitely running on other CPUs. I don't know how useful that'll be if the physical CPUs are shared between VMs for Cloud VPS. Might be more useful for AWS, but there I see only 2 cores.
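A minimal sketch of the pinning mechanism with taskset from util-linux (the CPU index and the stand-in command are illustrative; on the real machine the pinned command would be the browsertime invocation):

```shell
# taskset pins a command to a CPU set; everything it spawns
# inherits the affinity mask, so ffmpeg/Xvfb/chrome started by the
# pinned process stay on the same CPU. Stand-in command below:
taskset -c 0 sh -c 'echo "running pinned, pid $$"'
# Real usage would look like:
#   taskset -c 3 browsertime https://en.wikipedia.org/wiki/Sweden
```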
For the mahimahi machine it definitely hasn't made things more stable than before (tmpfs for run data and chrome user data).
I think tmpfs is safer because you set a size limit up front and it can't go over that limit
Yes, we run into the same problem occasionally in production: https://phabricator.wikimedia.org/T153169#3042185, but it's really painful on Labs Graphite/Grafana.
It's Graphite intermittently serving junk to Grafana. It's a PNG image instead of JSON being sent back. I've seen that before, but I can't remember if it was production Graphite or if I'm just rediscovering the same issue on Labs Graphite...
I've just noticed an interesting difference: on AWS the underlying disk is an SSD, on Cloud VPS it's a spinning disk. We might get better perf by having the run read and write its data to a RAM partition.
...and that's already what you're doing :) OK, it doesn't seem like there's a better option with ffmpeg. Which doesn't mean that we couldn't benefit from isolating things per CPU more. I think it's a matter of figuring out what consumes the most CPU during runs (eg. could be the web servers replaying stuff).
But ideally raw video would be best (no compression at all). I've just seen people sharing their screen recording ffmpeg code and they use this: "-vcodec libx264 -vpre lossless_ultrafast", which suggests that it will be contained in an H.264 stream but compressed as little as possible.
Reading your commit, it seems like you've decoupled recording and encoding. It's just a matter of changing the codec used during recording to be one with the lowest overhead possible.
Or is what I'm describing already what browsertime is doing?
You can get the best of both worlds and record the video uncompressed during the run, then once you're done measuring, encode it to x264 or whatever you like. This way the huge raw video file doesn't stick around, it's just kept for as long as you need to derive metrics from it. You could even encode before measuring metrics, it really doesn't matter as long as perf measurements aren't happening at the same time as expensive video encoding.
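A sketch of that two-step flow (flags are assumptions; during a real run the input would be the Xvfb display via `-f x11grab -i :99`, a synthetic test source stands in here):

```shell
# Step 1, during the measured run: lossless H.264 at the cheapest
# preset, so encoding steals as little CPU as possible.
ffmpeg -loglevel error -f lavfi -i testsrc=duration=1:size=320x240:rate=10 \
       -c:v libx264 -preset ultrafast -qp 0 -y raw_run.mkv

# Step 2, after metrics are derived: re-encode small for storage.
ffmpeg -loglevel error -i raw_run.mkv -c:v libx264 -preset medium -crf 23 -y run.mp4
```

`-qp 0` makes libx264 lossless, so step 2 loses nothing that the measurement needed; only the compact run.mp4 has to stick around.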
x264 is going to be expensive to encode regardless. Why are we compressing video at all? It's useful if you're going to expose the videos to be played on the web, but in terms of figuring out the SpeedIndex, etc. it's just overhead at the time the measurements are being made.
I think I'm done touring the dev tools. My biggest pet peeve is the inability to contrast the performance timeline with screenshots.
https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/ has Edge 16 VMs
Sigh, actually you need to nuke the entire machine, since the .ssh folder is gone, can't ssh into it anymore.
I've accidentally nuked the home directory of the mahimahi AWS machine, can you give me the IP address of the host to send graphite metrics to?
It's certainly an improvement compared to https://grafana.wikimedia.org/dashboard/db/webpagetest-drilldown?orgId=1&var-wiki=enwiki&var-users=anonymous&var-page=Barack_Obama&var-location=us-east-1&var-browser=Google_Chrome&var-view=firstView but I'm disappointed in the "best" results on AWS. That means we can't catch regressions of less than 5% reliably. Maybe we should experiment with reserving CPUs for certain tasks. For example, boot the kernel with N-1 CPUs available and run ffmpeg exclusively on the Nth CPU?
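The "N-1 CPUs" idea could be tried with the isolcpus kernel parameter rather than actually disabling a CPU (a sketch; the CPU index is an assumption):

```shell
# /etc/default/grub: keep CPU 3 out of the scheduler's reach
#   GRUB_CMDLINE_LINUX="isolcpus=3"
# then: update-grub && reboot

# After reboot, nothing runs on CPU 3 unless explicitly placed there:
#   taskset -c 3 ffmpeg -i raw_run.mkv -c:v libx264 run.mp4
```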
Fri, Oct 6
Cleanup of non-Commons wikis complete, started the cleanup of Commons. There are 1.2 million files to process there, at the current rate it's going to take approximately 9 days to complete.
Thu, Oct 5
Command to set up a fresh Debian 9 machine with mahimahi, with the custom script to send metrics to graphite:
Started running the cleanup script on Terbium, clearing the header for TIFFs, PDFs and DJVUs.
Wed, Oct 4
Geo.region doesn't contain the continent.