Fri, Sep 22
You need to look at the max-age value.
PR merged, to be deployed on Monday, so resolving.
Thu, Sep 21
I've submitted a PR that somehow magically fixes the issue, but we have to reevaluate the problem in a month or two. The bug is clearly on the Travis side.
We already set the cache-control to 10 minutes, but currently we also cache the result for the current day in Cassandra. I want to completely remove current-day Cassandra caching from the endpoint. Caching makes sense for previous days because we can't really regenerate all the feed content for an arbitrary day (news, for example, is not accessible for an arbitrary day). Removing it for the current day will have a very limited effect on performance: the majority of the time is spent on response hydration, so the average response time at the RESTBase level is 300 ms with Cassandra caching and 600 ms without it. On the client this will be completely unnoticeable, since the Varnish cache hit ratio is fairly good right now. Will submit a patch soon.
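A minimal sketch of the proposed behaviour, written in Python purely for illustration (the actual service is not shown in this thread). The names `fetch_feed`, `store` and `generate` are hypothetical, a plain dict stands in for the Cassandra table, and attaching the same 10-minute `max-age` to every response is a simplifying assumption:

```python
from datetime import date
from typing import Callable, Dict, Tuple

# 10 minutes, as mentioned in the comment above.
MAX_AGE_SECONDS = 10 * 60


def fetch_feed(
    day: date,
    today: date,
    store: Dict[date, str],          # hypothetical stand-in for the Cassandra table
    generate: Callable[[date], str],  # hypothetical feed generator
) -> Tuple[str, Dict[str, str]]:
    """Return (body, headers) for the feed of `day`."""
    headers = {"cache-control": f"max-age={MAX_AGE_SECONDS}"}

    if day < today:
        # Previous days cannot be regenerated reliably, so keep the stored copy.
        if day not in store:
            store[day] = generate(day)
        return store[day], headers

    # Current day: always regenerate and never write to the store.
    # Short-term caching is left to the HTTP layer via cache-control.
    return generate(day), headers
```

With this shape, repeated requests for today's feed always hit the generator, while a past day is generated once and then served from the store.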
Wed, Sep 20
Tue, Sep 19
Some tests were added by the above patches - coverage is not great yet, but I think we can resolve this now.
As we're about to start getting the new storage into production, we should make a decision on this one.
Verified that the jobs are now small enough to fit in our new infrastructure. Resolving.
Mon, Sep 18
Thu, Sep 14
@Pchelolo actually, can you confirm how many entries there were in the "pages" parameter? With the latest patches deployed, there should be no more than 20. Perhaps this is an old job getting retried, because it failed earlier?
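One way to answer that question is to count the entries directly in a captured job event. The sketch below assumes the event is a JSON object with a `params` object containing `pages`; that layout, the helper names and the `MAX_PAGES` constant are assumptions made for illustration, and only the limit of 20 comes from the comment above:

```python
import json
from typing import Any, Mapping

# Upper bound expected after the latest patches, per the comment above.
MAX_PAGES = 20


def pages_count(event: Mapping[str, Any]) -> int:
    """Count entries in the "pages" parameter (assumed to live under "params")."""
    pages = event.get("params", {}).get("pages", ())
    # len() works whether "pages" is a list or an object keyed by page ID.
    return len(pages)


def exceeds_limit(raw_event: str) -> bool:
    """Return True if a serialized event carries more pages than expected."""
    return pages_count(json.loads(raw_event)) > MAX_PAGES
```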
After the patch was deployed, the situation improved a lot, but we've got a 5 MB event today: https://people.wikimedia.org/~ppchelko/event
Wed, Sep 13
Also, /etc/parsoid.localsettings.vagrant.js should be updated for Parsoid to know about other domains.
Here's an example of a very large event: https://people.wikimedia.org/~ppchelko/large_event
Currently ChangeProp indeed only supports concurrency per rule (per job type), and it's hard-coded in the config, so although you don't need a puppet patch to change it, you still need a full deploy of the service.
Tue, Sep 12
What's the expected request rate? SCB has a lot of capacity, so I don't think it would be a problem.
Wikidata is a very specific project as well, and we sometimes need to handle it differently, just like Commons, so we might consider separating it too.
Fri, Sep 8
Thank you @EBernhardson, updated the task with your info. Now we've got a complete list of jobs executed in production.
I wrote a little script to run through a sample of events in the job topics we have in prod right now, and here's a list of job types that had the releaseTimestamp set:
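The script itself is not included in this thread. A scan of that kind could look roughly like the following sketch, which assumes each sampled event is a dict with a `type` field naming the job type; only `releaseTimestamp` is taken from the comment above, and the function name is hypothetical:

```python
from typing import Any, Iterable, List, Mapping


def job_types_with_release_timestamp(
    events: Iterable[Mapping[str, Any]],
) -> List[str]:
    """Return the sorted job types for which some event sets releaseTimestamp."""
    types = set()
    for event in events:
        # Treat a missing or null field as "not set".
        if event.get("releaseTimestamp") is not None:
            # "type" as the job-type field is an assumption.
            types.add(event.get("type", "<unknown>"))
    return sorted(types)
```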
Thu, Sep 7
@Pchelolo is there much difference between going with 10 minutes or 5 minutes?
Wed, Sep 6
@Tgr http://localhost:8000/proton-staging.wmflabs.org/v3/page/html/Book/10 doesn't look like a RESTBase URI. We don't have /v3 in the API, so it seems you're talking directly to Parsoid, which makes this a Parsoid issue.
Thu, Aug 31
Step 0 - merge the patch and deploy MCS at least in beta so that I can run tests while making an RB PR
I think it's not applicable any more. Closing as declined.
Wed, Aug 30
We have Redis-based deduplication running in production now that exactly matches the algorithm used in the JobQueue. There's a parent task to generalize it even more, but this one can be closed now.
All the jobs are being posted on all wikis in production that support EventBus. Resolving.
Tue, Aug 29
This has been deployed, please verify that it works.
All done here indeed, resolving the ticket.
Aug 23 2017
Aug 17 2017
Aug 16 2017
That idea has been around for quite a while, but since there's no clear owner of OCG, it was just dragging. I think @GWicke can provide more details, as he was following it more closely than I was.