And the rollout will be gradual until August, with no details on how the gradual part will work.
Apr 23 2021
This started happening because Safari 14 is supposed to support WebP (and sends the corresponding Accept headers), but its WebP decoding is clearly buggy, as demonstrated here.
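If we end up having to work around it server-side, it could look roughly like this (a sketch; the user-agent sniffing heuristic is hypothetical, not code we actually run):

```python
import re

def should_serve_webp(accept_header: str, user_agent: str) -> bool:
    """Serve WebP only to clients that advertise it, minus Safari 14,
    whose WebP decoding is buggy despite its Accept header."""
    if "image/webp" not in accept_header:
        return False
    # Crude sniff: Safari's UA contains "Version/14" and "Safari" but
    # not "Chrome" (Chrome-based browsers also carry the Safari token).
    is_safari_14 = (
        "Safari" in user_agent
        and "Chrome" not in user_agent
        and re.search(r"Version/14[.\d]*", user_agent) is not None
    )
    return not is_safari_14
```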
Apr 19 2021
I don't want to derail the discussion with weird ideas I've just had - the current proposals are great - but since they're unusual and haven't been discussed before I wanted to bring them up. Feel free to spin that into a separate task to discuss those ideas separately.
Apr 14 2021
If there is no utterance for the given segment, a new one will be generated by Speechoid.
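In other words, a cache-or-generate pattern. A minimal sketch with hypothetical names (Speechoid's actual API differs):

```python
from typing import Callable, Dict

def get_utterance(
    segment_hash: str,
    cache: Dict[str, bytes],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Return the stored utterance for a segment, generating one on a miss."""
    utterance = cache.get(segment_hash)
    if utterance is None:
        # No utterance exists for this segment yet: have Speechoid
        # synthesize it, then keep it for subsequent requests.
        utterance = synthesize(segment_hash)
        cache[segment_hash] = utterance
    return utterance
```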
Apr 7 2021
As for evictions, I understand that you want to be able to evict for other reasons than expiry and need to keep track of where things are stored in Swift, but for the expiry part you would be reinventing the wheel and creating a lot of needless jobs for something that could be handled at the Swift level. Less chatter between MediaWiki and the file storage would be a win.
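For example, Swift can expire objects on its own: X-Delete-After (seconds from now) and X-Delete-At (absolute Unix timestamp) are standard Swift headers handled by its object expirer. A sketch with python-swiftclient, with placeholder connection details:

```python
from swiftclient.client import Connection

conn = Connection(
    authurl="https://swift.example.org/auth/v1.0",  # placeholder
    user="account:user",
    key="secret",
)

thumbnail_bytes = b"<webp bytes>"  # placeholder payload

# Swift's object expirer deletes this thumbnail by itself after
# 30 days, with no MediaWiki job involved for the expiry case.
conn.put_object(
    "thumbnails",                      # container
    "example/320px-Example.jpg.webp",  # object name
    contents=thumbnail_bytes,
    content_type="image/webp",
    headers={"X-Delete-After": str(30 * 24 * 3600)},
)
```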
Any URI length up to 2048 bytes is safe (IE being the bottleneck). I didn't see anything in the form POST data that got anywhere near that amount.
Yes, all of these are of the same nature, validating expectations based on the NavigationTiming standard specification. Filtering them out of the global error dashboards makes perfect sense, since they are expected to happen because browsers are buggy.
Apr 2 2021
Thank you for the extremely thorough work you've done analysing the performance of this new system.
Without being cached, Special:BannerLoader would need to be quite fast, as it will be the response-time bottleneck when the cookie is present, which (currently) includes latency to the active main datacenter. Could the calls to Special:BannerLoader made by the edge cache be cached for some time, varying by the set of parameters sent to it?
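Conceptually, something like this (a toy sketch of the caching idea, not actual CentralNotice or edge-cache configuration; the TTL is a guess):

```python
import hashlib
import time
from typing import Callable, Dict, Tuple
from urllib.parse import urlencode

TTL = 300  # seconds; a guess at an acceptable staleness for banners
_cache: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, body)

def banner_cache_key(params: Dict[str, str]) -> str:
    """Key on the full, normalized set of query parameters."""
    normalized = urlencode(sorted(params.items()))
    return hashlib.sha256(normalized.encode()).hexdigest()

def fetch_banner(params: Dict[str, str],
                 render: Callable[[Dict[str, str]], str]) -> str:
    """Return a cached rendering if still fresh, else call the slow backend."""
    key = banner_cache_key(params)
    entry = _cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]
    body = render(params)  # the expensive Special:BannerLoader call
    _cache[key] = (time.time() + TTL, body)
    return body
```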
Apr 1 2021
This is expected. Ever since we started collecting RUM data, browsers have been sending us invalid data that violates the NavigationTiming standard definition. We've reported relevant bugs to the affected browser engines, but some remain unresolved. Often the vendors just can't reproduce them.
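For context, this is roughly the kind of expectation these events violate; a sketch checking a subset of the marks in the order the spec mandates:

```python
# Marks that the NavigationTiming spec requires to be non-decreasing.
NAVTIMING_ORDER = [
    "navigationStart",
    "fetchStart",
    "domainLookupStart",
    "domainLookupEnd",
    "connectStart",
    "connectEnd",
    "requestStart",
    "responseStart",
    "responseEnd",
    "domInteractive",
    "domComplete",
    "loadEventStart",
    "loadEventEnd",
]

def is_valid_navtiming(event: dict) -> bool:
    """True if the marks present respect the spec ordering.

    Zero or missing marks are skipped, since 0 means "not available".
    """
    marks = [event[m] for m in NAVTIMING_ORDER if event.get(m)]
    return all(a <= b for a, b in zip(marks, marks[1:]))
```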
Mar 31 2021
T277769 needs to be completed for the dashboard to be restored. Essentially the host data has to go through a new pipeline. The EventGate collection part is ready to merge, next I'll write the changes to the navtiming daemon. Since the data overwhelmed Grafana when looking at a whole DC, I think I'll change it to be just broken down by host, with the ability to select multiple hosts.
Mar 26 2021
Looks good, it worked fine, thank you!
Mar 25 2021
Sure, just kick off a manual run of the asoranking systemd timer. The issue was only happening when running inside systemd.
Mar 19 2021
And for this article, since parsing takes 25 seconds or more, an earlier post-purge request might have triggered the parsing, in which case your request will wait on a PoolCounter lock until the parsing is done.
To be more precise, it's the propagation of the purge to the various frontend caches we have that can take time. The parser cache is probably nuked at the time you call action=purge, but it might take a while for Varnish/ATS to pick up on the purge and stop serving you their previously cached copy.
Purge requests aren't always instantaneous; they go through a queue, and at different times of day the purge queue is more or less busy. You also have to remember that if one of your synthetic test platforms purges the article, it will affect them all. So beware of test batches that might run concurrently on Kobiton, AWS, etc. for the same article when purging is involved.
A warmup request before the test will do the trick: you can keep purging and get the best of both worlds.
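For instance, something along these lines (a sketch; the wiki URL and title are just examples):

```python
import requests

WIKI = "https://en.wikipedia.org"
TITLE = "Barack Obama"
session = requests.Session()

# 1. Purge: invalidates the cached rendering of the article.
session.post(
    f"{WIKI}/w/api.php",
    data={"action": "purge", "titles": TITLE, "format": "json"},
)

# 2. Warmup: this request pays the parsing cost (waiting on the
#    PoolCounter lock if another request triggered the parse first),
#    so the timed test that follows hits a freshly cached copy.
session.get(f"{WIKI}/wiki/{TITLE.replace(' ', '_')}")

# 3. Now run the actual synthetic test.
```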
The Barack Obama article being notoriously large, it might indeed take 15-20 seconds to parse. Maybe it's being purged excessively? If we have a script somewhere that calls ?action=purge on it frequently, that would result in this slow experience for the next pageview.
hit-local just means that the object was found in a cache backend in the same datacenter.
Mar 18 2021
OK, then I guess the workaround would be to get the information from the client and pass it as schema fields. I'll file a task to that effect.
The explanation is simple: the recvFrom field isn't available anymore, at least for NavigationTiming EventGate events. @Ottomata is this something that's unsupported now? We were using this information to plot this data per DC/host.
This was around the time the navtiming daemon broke because of the EventGate migration. Maybe not all issues were fixed...
Mar 17 2021
This has been launched, but as far as I'm aware none of the issues I flagged in my review have corresponding tasks. I assume that none of them have been addressed either?
Mar 16 2021
@aaron will run the maintenance script on Beta and Production
Mar 15 2021
I looked at a few that got bigger and they were all WebP before, and PNG or JPEG after. I think that explains the size difference for all those images.
All images got bigger at once? Can you check the headers to see if the ones that changed were WebP before and aren't anymore?
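For example, something like this would do it (the URL is a placeholder): request the image with and without image/webp in the Accept header and compare the negotiated Content-Type and size.

```python
import requests

URL = "https://example.org/thumb/example.jpg"  # placeholder

for accept in ("image/webp,*/*;q=0.8", "image/jpeg,*/*;q=0.8"):
    resp = requests.head(URL, headers={"Accept": accept})
    print(
        accept,
        "->",
        resp.headers.get("Content-Type"),
        resp.headers.get("Content-Length"),
    )
```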
I can't see anything obvious comparing a run before and after. It looks like everything is a little slower network-wise.
Mar 11 2021
Sounds like a grant-worthy project, easy to scope. If someone is interested in getting paid to build this, let me know and I can figure out who to pitch it to at the Foundation.
Thanks!
Mar 10 2021
@JKatzWMF can you make that happen?
Mar 9 2021
@tstarling thank you for your in-depth response. Based on what you've just described, are the following items the minimum changes required to make CentralAuth work in an active-active setup?
Mar 8 2021
I guess my comment might have given a sufficient answer to your concerns? I'll close the task since it's been inactive for a long time now. Feel free to reopen if necessary.
@Nomsterio let me know if you have time to look into this one as well, thanks!
Mar 3 2021
Fixed by applying https://issues.apache.org/jira/browse/HIVE-19231
Mar 2 2021
Hasn't happened to me since, I think, but I rarely deploy MediaWiki things these days. Closing it on the assumption that it magically went away; we can always reopen if someone else runs into it.
In that case, given that users like @1234qwer1234qwer4 and @IKhitron essentially generate the same number of requests by using the script as a workaround, for valid reasons since they're active editors on dozens of wikis, I think we should increase the limit to 50.
I can't find a way to start the unit myself. Is there a way I could temporarily be allowed to do that? Or maybe I tried the wrong commands. Being able to start the unit myself would let me iterate on the script until I find a fix (latest attempt failed).
@elukey please try running it again with the new version of /home/gilles/T276121.py, which makes some encoding parameters explicit.
A workaround might be to make the encoding/decoding explicit in the Python script. Currently it might inherit it from the shell or systemd. I'll try to modify /home/gilles/T276121.py to that effect.
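Roughly the kind of change I have in mind (a sketch; the actual beeline invocation in the script is abridged here):

```python
import os
import subprocess

# Pin the locale and the stream encoding explicitly instead of
# inheriting whatever the shell or systemd environment provides.
env = dict(os.environ, LANG="en_US.UTF-8", LC_ALL="en_US.UTF-8")

result = subprocess.run(
    ["beeline", "--outputformat=csv2", "-e", "SELECT ..."],  # abridged
    capture_output=True,
    encoding="utf-8",   # decode stdout/stderr as UTF-8 explicitly
    errors="replace",   # don't crash on stray undecodable bytes
    env=env,
)
print(result.stdout)
```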
The output is very different, which explains the parsing failures:
Ah, yes, 100 is hardcoded, so I guess we'll see 100 countries at least. Thanks for that link; it let me find the drop-down menu I didn't know existed to override the default split limit. 100 countries is probably good enough for now, and it's going to let me see all the buckets for Load Event End.
Thanks for the info. I see that you have 100+ edits on 26 wikis (https://meta.wikimedia.org/wiki/Special:CentralAuth?target=1234qwer1234qwer4). I presume that's the bulk of the wikis you watch this way?
I would like to know how and if people are working around this limit. For instance, does sticking to the limit of 5 mean that a significant number of users would still use the old gadget (what limit did that have?)? Do they instead keep many browser tabs open on dozens of projects and refresh those manually?
We can put a throwaway script into a new systemd service/timer to serve as a repro case and output what beeline gives. You can use /home/gilles/T276121.py for that purpose. Just set up a systemd unit that runs it; its logs should then contain the beeline output when it's run the same way through Python (at INFO level).
The errors showed up in the log and suggest that beeline was returning unexpected data when it ran through the timer. They just made the script unable to process the data and do its job.
Mar 1 2021
It runs fine, without pandas CSV parsing issues, when run outside of the systemd timer, doing this:
Feb 25 2021
OK, I'll ask our legal department if Moritz could join the MathML WG under the Foundation's W3C membership. I'll let you know when I hear back from them.
Feb 22 2021
The fact that each character takes twice the storage space shouldn't affect parsing complexity and time, right? I'm not familiar with our parsing code, but I don't imagine it would do any sub-character processing.