Tue, Jun 28
Here you can find the report for the new per-attribute stripping benchmarks
Mon, Jun 27
Thu, Jun 23
Something like that can also work as a selector just to simplify things:
Tue, Jun 21
Just a quick correction on the numbers: the current production container size is ~40M objects, not ~12M (I was counting the wrong container).
Next steps to fail over to codfw:
Tue, Jun 14
Hey @fgiunchedi, the size of the current active deployment has been stable at ~12269804 objects for quite some time now (the last month). The theoretical upper limit can be much higher: if we assume that all planet tiles are pregenerated, the count could be as high as 1431655765 (all tiles from zoom level 0 to zoom level 15). The first estimate is more realistic; we never end up generating all planet tiles.
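For reference, that theoretical upper bound is just the total number of tiles across zoom levels 0–15:

$$\sum_{z=0}^{15} 4^{z} = \frac{4^{16} - 1}{3} = 1431655765$$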
Mon, Jun 13
@cooltey Just a clarification regarding the API.
When calling footer=true on the API endpoint, the output is a demo page with a hardcoded value, not what the apps consume. I think it's worth fixing that too so it displays the right last-edit timestamp, but for the actual scope of the problem at the apps level, what needs to be fixed is how we render the last-edit timestamp as a string for the specific locale.
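For illustration, locale-aware rendering of the timestamp could look something like the sketch below (the function name, option values, and example locales are assumptions, not the actual app/service code):

```typescript
// Example only: format a "last edited" ISO 8601 timestamp for the reader's locale
// instead of a hardcoded English string.
function formatLastEdited(isoTimestamp: string, locale: string): string {
  return new Intl.DateTimeFormat(locale, {
    dateStyle: 'long',
    timeStyle: 'short',
  }).format(new Date(isoTimestamp));
}

// e.g. formatLastEdited('2022-06-10T14:03:00Z', 'el-GR') and the same call with
// 'en-US' produce strings following each locale's own conventions.
```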
May 25 2022
For now we can use the test cases for each template, so there is no need to identify which template is used per test run.
May 24 2022
A quick solution (other than implementing a content handler in Parsoid) could be to fall back to the MW parser output for the supported content models.
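A rough sketch of that fallback idea; the endpoint paths, helper name, and supported-model list below are assumptions for illustration, not the actual mobileapps/Parsoid code:

```typescript
// Try Parsoid HTML first; fall back to the legacy MW parser output for content
// models Parsoid does not handle.
const PARSOID_SUPPORTED = new Set(['wikitext']);

async function getPageHtml(domain: string, title: string, contentModel: string): Promise<string> {
  if (PARSOID_SUPPORTED.has(contentModel)) {
    const parsoidRes = await fetch(`https://${domain}/api/rest_v1/page/html/${encodeURIComponent(title)}`);
    if (parsoidRes.ok) {
      return parsoidRes.text();
    }
  }
  // Fallback: ask the MediaWiki action API for the legacy parser output.
  const params = new URLSearchParams({ action: 'parse', page: title, format: 'json', formatversion: '2' });
  const mwRes = await fetch(`https://${domain}/w/api.php?${params}`);
  const body = await mwRes.json();
  return body.parse.text;
}
```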
May 23 2022
Added Parsoid since the mobileapps output is the same as the Parsoid output.
May 20 2022
Here is the report for the readability issues after testing all template testcase pages for en.wikipedia.org:
May 19 2022
May 18 2022
From a quick look at Parsoid and MW:
May 17 2022
Here are the results based on the last rt-testing run (as HTML export):
May 12 2022
May 11 2022
@vadim-kovalenko is there anything left to be done for this ticket, or should we close it?
This is already deployed in production
May 10 2022
May 9 2022
May 6 2022
nit: I think there is a typo in the category: it should be "infrastructure" instead of "infastructure".
I don't see any slow queries on the replicas after enabling the postgres config; also, kartotherian-level errors are back to their regular rates:
I think we should consider it done since we have both Vadim's implementation for readability and pa11y, which implements the W3C checks for a11y.
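For reference, a minimal example of running pa11y programmatically against a page (the option value shown is an assumption; the exact standard names are in the pa11y docs):

```typescript
// Run pa11y against a URL and print the accessibility issues it reports.
import pa11y from 'pa11y';

async function checkAccessibility(url: string): Promise<void> {
  const results = await pa11y(url, { standard: 'WCAG2AA' });
  for (const issue of results.issues) {
    console.log(`${issue.code}: ${issue.message}`);
  }
}
```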
Given that we need computed CSS values from a browser, I don't think the linter would be a good place to implement something at the moment. Going ahead and closing the ticket for now.
This view is very useful for debugging on the maps end:
Also, maps1006 was hit by the OOM killer at the same time.
2022-05-05 03:24:10 GMT LOG: server process (PID 42352) was terminated by signal 9: Killed
May 5 2022
Maybe it's worth depooling maps1007 temporarily, if it's the only node that had issues, to see if we still get errors.
Maybe the current issue with geoshapes is caused by a corrupted index? Still TBD based on the output of the slow query logs, but in the past reindexing has worked.
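If it does turn out to be a corrupted index, a minimal sketch of the reindex step could look like the following (the database and index names are hypothetical; in practice an operator would just run this from psql, node-postgres is used here only to keep the examples in one language):

```typescript
// Sketch only: rebuild a hypothetical geoshapes-related index without taking an
// exclusive lock on the table (REINDEX ... CONCURRENTLY needs Postgres 12+).
import { Client } from 'pg';

async function reindexGeoshapesIndex(): Promise<void> {
  const client = new Client({ host: 'localhost', database: 'gis' }); // assumed DB name
  await client.connect();
  try {
    await client.query('REINDEX INDEX CONCURRENTLY planet_osm_polygon_way_idx'); // hypothetical index name
  } finally {
    await client.end();
  }
}
```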
From the postgres logs around the time the errors started on maps1007:
2022-05-05 03:24:04 GMT LOG: server process (PID 24334) was terminated by signal 9: Killed
At the moment:
- DB connections look healthy
- The load/IO issue on maps nodes is resolved
- Tiles DB query latency went back down
- Tile request rate is back to our regular traffic
- No errors related to map tiles/snapshots
It looks like manually GETing the tile fixed the DB query latency:
Let's see if this fixes things. One potential action item could be to fine-tune the Envoy timeout threshold.
Another thing that looks a bit off: in the top 50 requests in the error logs, this tile shows up repeatedly:
The increase in HTTP traffic makes sense because of the way geoshapes retries the HTTP request on failure.
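To illustrate the amplification (this is not the actual geoshapes code, just a sketch of a naive retry-on-failure loop): every failed attempt immediately issues another request, so an unhealthy backend sees a multiple of the normal traffic.

```typescript
// Naive retry loop: up to `attempts` requests are sent per logical call when the
// backend keeps failing, which matches the observed increase in HTTP traffic.
async function fetchWithRetries(url: string, attempts = 3): Promise<Response> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(url);
      if (res.ok) {
        return res;
      }
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err;
    }
    // Each failure falls through and immediately retries.
  }
  throw lastError;
}
```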
The fact that available connections are almost zero correlates with the kartotherian service's high error count.