I made a proof of concept so we can see what this looks like in real life. The share feature is disabled by default and will be visible only to beta users on the mobile site. There is also a config variable that allows us to enable/disable the feature.
Wed, Sep 19
I'd like to add one more acceptance criterion -> parse only the first 1000 images, then skip the rest. Anyone against?
To get the user group we always call
@mobrovac we're on 1.7.0; Puppeteer 1.8.0 was released about a week ago. Today I'll create a patch to update Puppeteer to the latest version (we will be able to drop the CHROME_BIN variable).
Tue, Sep 18
Yes, we're skipping small images, but somehow it doesn't work. We can add a simple check so that only the first X images are transformed; that should be pretty easy.
I checked the code and it's pretty simple, but most probably the performance is pretty bad (we're doing many DOM operations on a pretty big DOMDocument). I'll do some small research and try to do things differently (like cloning elements instead of creating a new one every time) and do some performance testing using https://en.wikipedia.org/wiki/User:West.andrew.g/2013_popular_pages
Maybe there is a faster/less resource-heavy image transformation -> that's the first thing to check. Second -> maybe we can just parse the first X images and then skip the rest. I agree, a page with 11k images is huge, and most probably we should skip transformation for such pages.
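A minimal sketch of the "first X images" idea (the selector, the limit constant, and the transform callback are placeholders for illustration, not MobileFrontend's actual code):

```javascript
// Hypothetical sketch: transform only the first MAX_IMAGES images on a page
// and skip the rest, so pages with 10k+ images stay cheap to process.
const MAX_IMAGES = 1000;

function transformImages(doc, transform) {
  const images = doc.querySelectorAll('img');
  const limit = Math.min(images.length, MAX_IMAGES);
  for (let i = 0; i < limit; i++) {
    transform(images[i]);
  }
  return limit; // number of images actually transformed
}

// Demo with a stub "document" holding 1200 fake images:
const stubDoc = {
  querySelectorAll: () => Array.from({ length: 1200 }, () => ({}))
};
console.log(transformImages(stubDoc, img => { img.done = true; })); // 1000
```

Everything past the limit is never touched, which caps the DOM-operation cost regardless of page size.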
Do we know what the average image count is?
Mon, Sep 17
Moved to workboard as I'm actively working on it
I'm thinking out loud here: -> There is only one case when we jump out of Puppeteer's scope: in renderer.closeBrowser() we ask the browser to close and then we listen to the native ChildProcess exit event - if it didn't happen, we kill the browser.
The scenario @Jdrewniak talks about happens only when something other than our code (something other than puppeteer & chromium-render) tries to manage the Chromium process. If we start listening to the exit event all the time (not only when we ask the browser to close), we could log & prevent such situations in a nice manner.
@mobrovac that's a good point. I was going to add the check anyway because that error kills the worker, which IMHO is pretty bad. The chromium-render process shouldn't die; all errors have to be handled properly. Also, if that error (callback called twice) is triggered during tests -> it's almost certain it will happen in production. I'll analyze the flow and try to prevent similar cases.
Wed, Sep 12
The documentation is available on https://www.mediawiki.org/wiki/Reading/Web/Team/Hosting_Prototypes
Tue, Sep 11
Prototype is available on https://people.wikimedia.org/~pmiazga/mobile_contributions
IMHO we should enable this only for users without gadgets (maybe anons only?). I'm afraid that we might get lots of errors and spend way too much time finding what's causing the issue.
Fri, Sep 7
The task is nearly ready; the remaining bits and pieces:
- we need to merge the last patch for T181623 - the new flow for closing the chromium browser - blocked on Services
- security review - blocked on the review
- fix the Icinga issues - that is most probably the configuration issue - blocked on services
- review the Grafana dashboard - @pmiazga will do it
Then we can stream traffic to the service to verify that it can withstand live traffic. The service will get all requests sent to the old Electron service, but the PDF output will be discarded. We want to verify how the service performs in the live environment.
Blocked, we're waiting for @mobrovac to come back from vacation.
I'm wondering what the best way is to store as much as possible without crossing the hard 1k limit for the URL. Adding some hard limit is a good way to go, but there are a few pitfalls:
- those strings will be passed through encodeURIComponent(), which means that most probably the output will be longer; for example, the userUrl can have non-Latin characters, and if that happens the encoded string will be much longer than the unencoded one
- some fields like errorUrl might be empty; we can use that space for longer stack traces
- error messages can be super short (like "Type error.") or pretty long (over 200 characters)
- the stack trace might contain repetitive data (e.g. most of the files share a similar path)
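To illustrate the first pitfall: encoding can only grow a string, so any hard limit has to be applied to the encoded form. Here's a rough sketch (`fitEncoded` is a hypothetical helper, not existing code) of trimming a value so its encoded form fits a budget:

```javascript
// encodeURIComponent can only grow a string: a single non-Latin character
// may expand to 6+ characters ("ł" -> "%C5%82").
console.log(encodeURIComponent('ł').length); // 6

// Hypothetical helper: drop characters from the end until the encoded
// form fits the remaining URL budget.
function fitEncoded(value, maxEncodedLength) {
  let result = value;
  while (encodeURIComponent(result).length > maxEncodedLength) {
    result = result.slice(0, -1);
  }
  return result;
}

console.log(fitEncoded('łłłł', 12)); // 'łł' - only 2 of 4 characters fit
```

Budgeting against the encoded length is what keeps the final URL under the hard 1k limit regardless of the input alphabet.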
Thu, Sep 6
OK, I'll do some quick research on where to put it and comment on the task.
Wed, Sep 5
Mon, Sep 3
Fri, Aug 31
I think (I didn't verify it yet) that it fails because of the introduced RESTBase checks:
Thu, Aug 30
Tested locally, works as expected. After a couple of changes, the mobile.startup.js script started to fail because the file size exceeded the max size.
Wed, Aug 29
Tue, Aug 28
Mon, Aug 27
Aug 13 2018
I also did a couple of tests before and found that:
- MobileFrontend uses a different DiffEngine (it's hardcoded); this is something you have to talk to Reading Web about
- I also remember there is work on a different engine that nicely shows when stuff is moved around: instead of just showing added/removed, it shows "moved" with an arrow that points to the new location (@Jdlrobson will know more)
- at least on my machine, populateContentTables throws strange MySQL errors:
Wikimedia\Rdbms\DBTransactionError: Transaction round stage must be 'cursory' (not 'within-rollback') in /vagrant/mediawiki/includes/libs/rdbms/lbfactory/LBFactory.php on line 707
Wikimedia\Rdbms\DBTransactionStateError from line 1298 of /vagrant/mediawiki/includes/libs/rdbms/database/Database.php: Cannot execute query from Wikimedia\Rdbms\Database::ping while transaction status is ERROR.
Aug 10 2018
Aug 9 2018
The NewsletterExtension also has a very similar construct.
Aug 8 2018
Queue -> the time a task waits before it gets picked up to render
Job -> the task got picked up and started rendering
The main difference is that we can have a pretty long queue while rendering only a couple of pages at once. We specify a different timeout/max size for the queue (as tasks in that state do not require any resources) and a different timeout for jobs (as a task that is processing requires a Chromium instance).
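The queue/job split above could look roughly like this as configuration (the names and values are illustrative assumptions, not the service's actual settings):

```javascript
// Sketch only: queued tasks hold no resources, so the queue can be long
// with a generous timeout; each running job holds a Chromium instance,
// so concurrency stays small and the timeout tighter.
const renderLimits = {
  queue: {
    maxSize: 200,          // how many tasks may wait to be picked up
    timeoutMs: 60 * 1000   // how long a task may sit in the queue
  },
  job: {
    concurrency: 4,        // how many pages render at once
    timeoutMs: 30 * 1000   // how long a single render may run
  }
};
```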
About the count.count -> I'm not sure, I need to check the code. Most probably I named the bucket something.count and then I take the count of that.
- I believe that you wanted to use rates vs. counts - counts are fairly meaningless because they only get reset on every flush, which is somewhat arbitrary, if I understand that correctly.
Thanks for the info, I'll check it
- Please add axis measurement unit legends to all the graphs and also select appropriate units for the axes (for example, the latency graph uses short now while it should use ms)
Will do, thanks for the tip
- All the services dashboards expose heap and GC graphs - please add them, you can find the examples on MCS dashboard https://grafana.wikimedia.org/dashboard/db/mobileapps?orgId=1
Will do, thanks for the tip
- I do not believe all the metrics are correct. The request rate on the last graph is 0.30, but the sums of request rates by type and by format are both 0.20 - where did the 0.10 requests/s go?
Those could be rejected requests (because of timeouts, failed jobs, etc.). I'll check the graphs once again.
Aug 7 2018
This task was solved by https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MobileFrontend/+/451045/ (reverting T173527).
@Esanders, the problem was with the skin initialization system. The MobileFrontend changes broke the Minerva tests because MobileFrontend stopped executing RequestContextCreateSkinMobile, so the Minerva skin wasn't initialized properly.
By "wasn't initialized properly" I mean that the skinOptionsweren't set properly, all values were set to default false instead of proper values.
After digging into the code I found what's wrong:
When refactoring, please keep in mind that in the Page Previews repo, when we create a pageInteractionToken -> this token has to be unique for each preview. We cannot use the same token if the user dwells over the same link twice.
Aug 6 2018
Aug 3 2018
That's a bug, most probably introduced in T199282; I'm on it.