Do you have a copy of the JSON?
What the JobQueue requests is a subset of what CP requests to cache. Can we request everything from the JobQueue?
The models and wikis that have the ORES extension enabled. $wgOresModels and $wgOresUiEnabled have the data
Ok, how about:
/**
 * Called after a RevisionRecord ORES score was saved to the database.
 *
 * @param RevisionRecord $revision The RevisionRecord that has been scored.
 * @param array $scores Scores in the form returned by the ORES service.
 *   For the response schema see https://ores.wikimedia.org/v3/scores/enwiki/?model_info=score_schema
 */
ORESRevisionRecordScoreSaved( RevisionRecord $revision, array $scores )
Fri, Aug 17
One thing that's probably happening (and the new metric will allow us to prove it) is that when the DB becomes slow and jobs start executing slower, all the high-traffic jobs that run close to the concurrency limit hit max concurrency, and lag starts to accumulate in Kafka.
As a start, I propose to vastly decrease the default limit, fine-tune the jobs that don't fit, and see if anything changes - that's much easier than implementing the global limit, can be done within a day, and will give us additional data points.
Will deploy RESTBase first thing on Monday, no Friday deploys :)
I think I've found the correct configuration file now, at mediawiki/services/change-propagation/jobqueue-deploy/scap/vars.yaml .
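For illustration, the kind of per-job override proposed above might look roughly like this in a vars.yaml-style file (the key names here are assumptions for the sketch, not the actual file's contents):

```yaml
# Hypothetical sketch - not the real vars.yaml keys.
default_concurrency: 10      # vastly decreased default limit
jobs:
  refreshLinks:
    concurrency: 50          # fine-tuned for a high-traffic job that doesn't fit
```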
Just one very primary thought:
Thu, Aug 16
I believe it doesn't request the MW API directly; RESTBase does, however. Will fix it shortly.
Mon, Aug 13
Gotcha, is there an estimated date for deployment @Pchelolo?
The service is not yet deployed, so let's wait a little bit before closing it. More things might come up that would deserve to be a subtask of this ticket.
Thu, Aug 9
Let's get back on track.
Wed, Aug 8
And also, looking into the native ORES response, like https://ores.wikimedia.org/v3/scores/enwiki/854077897 the only issue I see is that the scores are wrapped in rev_id. Remove the rev_id wrapping and I don't see any obvious conflicts.
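For illustration, the unwrapping step could be sketched like this (the payload shape is abbreviated from the ORES response linked above; the function name is mine):

```python
# Sketch: strip the rev_id wrapping from a native ORES response,
# leaving a flat {rev_id: scores} mapping. Payload abbreviated from
# https://ores.wikimedia.org/v3/scores/enwiki/854077897
def unwrap_rev_ids(ores_response):
    result = {}
    for wiki_body in ores_response.values():
        # each wiki entry wraps the per-model scores in rev_id keys
        for rev_id, scores in wiki_body["scores"].items():
            result[rev_id] = scores
    return result

sample = {
    "enwiki": {
        "scores": {
            "854077897": {
                "damaging": {"score": {"prediction": False}},
            },
        },
    },
}
print(unwrap_rev_ids(sample))
# → {'854077897': {'damaging': {'score': {'prediction': False}}}}
```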
Ok, I revoke the idea of having this in ORES - @Halfak said that within an API version, regardless of the model version, they maintain backward compatibility for output formats - so if the model version changes, a client written for the previous model version won't break. This makes it easier to create a robust reformatter and put it into change-prop
After a quick h-o with @Ottomata and @JAllemandou we've understood that the /precache endpoint used to produce these is kinda private, used only by change-prop, so we're free to change the format it emits in whatever way we want.
Duh, thank you for pointing out the obvious. Somehow I originally read Petr's comment the other way round - that 73% of requests reach RESTBase, but now I realise that's not the case.
Implementing a new event in the current EventBus system made me think about a fairly random idea. We have a set of well-defined interfaces within MediaWiki, like RevisionRecord, User etc. Most of the events we will be "intaking" directly from MediaWiki will eventually have these MW interfaces recorded - some events will include info about the revision, some about the user, some about a particular revision slot etc.
Tue, Aug 7
By simply exporting one from beta cluster grafana and importing it into production grafana I've created this https://grafana.wikimedia.org/dashboard/db/proton?orgId=1
Does this failure happen on every job run or just some fraction of the attempts?
The proposed schema falls into the revision hierarchy, so it will inherit from the revision-create schema in order to follow our guidelines. Apart from that, it will contain the new list of tags and the prior_state object with an old list of tags, again, because that's how we do it in other *-change events.
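A hedged sketch of what such an event body might look like; the field names beyond tags and prior_state are illustrative assumptions, loosely modeled on revision-create:

```yaml
# Illustrative only - not the actual schema; names beyond
# tags / prior_state are assumptions.
meta:
  topic: mediawiki.revision-tags-change
  dt: 2018-08-08T00:00:00Z
database: enwiki
rev_id: 854077897
tags:            # the new list of tags
  - mw-rollback
prior_state:
  tags:          # the old list of tags
    - mw-undo
```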
revisions belonging to restored pages emit new revision-create events when the page is restored?!?
Mon, Aug 6
Although, these calculations are true for mobile-sections, which will likely be replaced with mobile-html very soon, and the latency calculations for the new endpoint will probably be different.
Given that the p95 latency for MCS generating mobile-sections from scratch is between 500 ms and 1 second, only 5% of the requests not served by Varnish will result in client-side latency - that's ~64500 reqs per day, giving us an SLA that 98% of the overall requests will be served well within 1 second. Given that for mobile clients the network delay is probably far more important in driving up the overall page-load latency, I believe it might be a fair tradeoff to stop pre-generating mobile content.
A little bit more data: according to webrequest logs, mobile-sections-lead was requested 4,808,026 times on 2018/08/03. According to RESTBase graphs, on the same day the average rate of requests reaching RESTBase was 15/s, which gives us 1,296,000 Varnish cache misses, i.e. a hit ratio of 0.73
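As a sanity check on the arithmetic in the two comments above (a quick back-of-the-envelope script, nothing more):

```python
# Back-of-the-envelope check of the cache numbers discussed above.
total_requests = 4_808_026      # mobile-sections-lead requests on 2018/08/03
restbase_rate = 15              # avg req/s reaching RESTBase (cache misses)
seconds_per_day = 86_400

cache_misses = restbase_rate * seconds_per_day
hit_ratio = 1 - cache_misses / total_requests
# 5% of misses exceed the ~1 s p95 latency (roughly the ~64500/day figure)
slow_requests = 0.05 * cache_misses
within_sla = 1 - slow_requests / total_requests

print(cache_misses, round(hit_ratio, 2), round(within_sla, 3))
# → 1296000 0.73 0.987
```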
Right, that model is clearly not tenable in the long run either. The next question then would be: what percentage of the objects stored in RESTBase ever get re-read before they expire or are regenerated?
Where did you find out that the reason is specifically that the width is not included in the final request URI? I can't find this info anywhere.
I'm not sure if it should have it. If you look at the Swagger UI, the row for that parameter doesn't have a name and it looks weird, so maybe it should?
Thu, Aug 2
Just started happening again recently with a fairly high rate: https://logstash.wikimedia.org/goto/b78e539d56b4286f3096b061d2ebbc51
One other reason for using the k-r-v pattern is that mobile content consists of a significant number of chunks - previously those were mobile-sections-lead and mobile-sections-remaining; now with PCS it's mobile-html, metadata, media, references etc. Since they are all fetched lazily at different times, in a perfect world the user would expect to get the version of references that corresponds exactly to the render of the content they've been reading, so we need the grace period for older renders here as well.
- As an engineer I want each schema/(schema revision) to have a unique ID in the form of a publicly accessible URI
- As an engineer I want to be able to reuse and reference schemas from one another using the aforementioned ID, in order to avoid copy-pasting code.
- As an engineer, I want to be able to guarantee production of the event and be able to retry until the event is indeed produced T120242
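The reuse story above maps naturally onto JSON Schema's $ref mechanism; a hedged sketch, where the URIs and field names are made-up placeholders:

```yaml
# Hypothetical schema fragment: a revision-scoring schema reusing the
# revision fields via $ref to a publicly accessible schema URI.
title: mediawiki/revision/score
$id: https://schema.example.org/mediawiki/revision/score/1.0.0
allOf:
  - $ref: https://schema.example.org/mediawiki/revision/create/1.0.0#
  - properties:
      scores:
        type: object
```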
Ye, @CCicalese_WMF was creating them for each goal the Core Platform team is a part of. I'll merge
I believe we could either merge this with the parent or use this one for our part of the work (it's still unclear what exactly that part will be)
Wed, Aug 1
Tue, Jul 31
The message timestamp should be set by the producer (is EventBus doing this? it should), and it should correspond to meta.dt
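A minimal sketch of that invariant, assuming meta.dt is ISO-8601 with an explicit UTC offset (the event shape here is an assumption modeled on EventBus events):

```python
from datetime import datetime

def message_timestamp_ms(event):
    """Producer-side Kafka timestamp (ms since epoch) derived from meta.dt,
    so the message timestamp and meta.dt always agree."""
    dt = datetime.fromisoformat(event["meta"]["dt"])
    return int(dt.timestamp() * 1000)

event = {"meta": {"dt": "2018-07-31T00:00:00+00:00"}}
print(message_timestamp_ms(event))  # → 1532995200000
```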
If a message arrives late, there are scenarios where a consumer will get a lot of extra messages
Looks like the JSON output in https://en.wikipedia.org/api/rest_v1/?spec doesn't include any of the definitions.
Ye, I just didn't really know what exactly that is and in the heat of the outage I was a bit confused.
What exactly is the benefit of providing the exact offset? The timestamp has millisecond resolution, and in practice our topics have low enough traffic for a millisecond timestamp to be a fairly definitive id for a particular message.
In production, the event-schemas are deployed automatically as soon as a patch is merged in gerrit in this repo. I don't see any reason why it shouldn't be done the same way in labs. @Ottomata was setting up this repo, so pinging him
Mon, Jul 30
Seems like this was done. Resolving.
And transferred to wikimedia group.
Hm.. I've looked into the Kafka topics that should've triggered the update, but the topics have already rolled over, so there's no way of knowing whether the message was missed or something else happened.
Everything is done here on the API side and on the Popups side. For example, on sr.wikipedia.org the content of the preview respects the user's variant settings.
RESTBase API now supports language variants, but the actual converters are enabled only on a subset of wikis. However, nothing else needs to be done on the API side.
The support was deployed. Resolving.
Jul 19 2018
Jul 18 2018
each of the two CP instances needs to have a distinct set of rules
Jul 13 2018
Now it does.
Is this HTML optimized for mobile reading use-cases? If so, why not /page/mobile-html? Or even /page/html/Bla?mobile=true?
Jul 12 2018
I can see that the header is set to sr on the Serbian wiki - bravo, that unblocks us from deploying the language variants for the summary endpoint, and if I change my preferences I can see sr-ec and sr-el. Thank you, I think this can be resolved.