The fix to mw-config has been deployed, we should be good now.
The above patch to mediawiki-config will fix it. This is another instance of having code in multiple places biting us hard. Given that it affects production, this config change has to be deployed ASAP.
I do not have a solid theory of what's happening yet; I'm just throwing in some bits and pieces.
Thu, Jun 20
Apparently, change-prop is not prepared for the new event schemas.
Mon, Jun 17
So, the event is based on the LinksUpdate hook. LinksUpdate has a getTriggeredRevision method, which we use to assign a revision ID to the page. If it's null, we use the latest revId of the page. I believe (logically, needs to be verified) that the LinksUpdate object triggered by a template propagation will have a null triggering rev ID, and thus we'd be able to distinguish the events.
OK, I think this could be the list of remaining events coming from the EventBus extension, ordered from lowest to highest risk. The events coming from other sources (RB/ChangeProp) are not considered here, as they are the highest risk.
Fri, Jun 14
We should probably also recreate the VM on stretch, since it's currently running jessie.
Tue, Jun 11
OpenAPI 3 (which we want to be able to express our API specifications in) uses JSON Schema draft 5 with some extensions.
We do track image usage in change-prop and rerender pages when images get updated, but indeed it's only done for local usage. Having the same mechanism for global usage is possible, even though we would miss some usages, but I support @Mholloway in his concern regarding the scalability of this.
Mon, Jun 10
The PR was merged, thank you @holger.knust
I'm sure we've forgotten something, but that's OK; the rest will be replaced as we go.
Thu, Jun 6
One thought: currently the syntax of table_properties.yaml is something like JSON-path-style pointers to the configurations we need to replace in the original schema.cql. What if instead we allowed templating of schema.cql and provided the configurations in a YAML file similar to the values.yaml we use in scap?
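To make the templating idea concrete, here is a minimal sketch. Everything in it is hypothetical: the table definition, the `{{name}}` placeholder syntax, the `render` helper and the value names are made up for illustration, and a plain object stands in for the parsed YAML file.

```javascript
// Hypothetical template; the real schema.cql content is not shown in this thread.
const schemaTemplate = [
  'CREATE TABLE IF NOT EXISTS {{keyspace}}.data (',
  '  key text PRIMARY KEY,',
  '  value blob',
  ') WITH default_time_to_live = {{default_ttl}};',
].join('\n');

// Stand-in for the parsed YAML values file (names are assumptions).
const values = { keyspace: 'example_ks', default_ttl: 86400 };

// Replace every {{name}} placeholder with the matching value.
// Throws if the template asks for a value that was not provided.
function render(template, vals) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) => {
    if (!(name in vals)) {
      throw new Error(`No value provided for ${name}`);
    }
    return String(vals[name]);
  });
}

console.log(render(schemaTemplate, values));
```

Failing on a missing value, rather than leaving the placeholder in place, keeps a typo in the values file from silently producing broken CQL.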
Wed, Jun 5
I've removed the comment since the previous comment got me confused regarding the terminology of schema vs configuration here.
Wikitech should not be using the Kafka job queue at all, per T192361#4139799.
After upgrading to swagger-ui 3 everything is collapsed by default, so this is no longer a valid issue.
We have eliminated the timeline-based storage semantics altogether, so this is no longer a valid issue.
This has been solved by https://github.com/wikimedia/swagger-router/pull/62
We have switched to using openapi-schema-validator so I believe this issue is resolved. Thank you @Clarakosi
I believe our current approach has proved to work well for us, so I'm inclined to close this ticket.
The event destinations are configurable now, we can resolve this.
Merged, published and tested. Resolving. Thank you @holger.knust
Tue, Jun 4
The original issue report message was before the switch to the new PDF renderer, so it is 100% not related. The Electron renderer was experiencing some issues before we switched to proton, so it might be related to that.
Mon, Jun 3
Hm. On one hand, I would agree, given that allowing additional properties is basically a go for anything, so it can easily break refinery, for example. On the other hand, having additionalProperties: false makes it much harder to add new fields: we need to first add the field as optional, then start producing it, and then make it required if it is required. However, I think it's a minor problem.
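The three-step rollout described above can be sketched roughly as follows. This is not a real JSON Schema validator: `check` only understands `required` and `additionalProperties: false`, and the schema versions and field names (`title`, `rev_id`) are made up for illustration.

```javascript
// Minimal checker for just two JSON Schema keywords: `required` and
// `additionalProperties: false`. Not a real validator.
function check(schema, event) {
  const known = Object.keys(schema.properties);
  const missing = (schema.required || []).filter((k) => !(k in event));
  const unknown = schema.additionalProperties === false
    ? Object.keys(event).filter((k) => !known.includes(k))
    : [];
  return missing.length === 0 && unknown.length === 0;
}

// Hypothetical schema versions.
const v1 = { properties: { title: {} }, required: ['title'], additionalProperties: false };
// Step 1: the new field is added as optional.
const v2 = { properties: { title: {}, rev_id: {} }, required: ['title'], additionalProperties: false };
// Step 3: the field becomes required once all producers emit it.
const v3 = { ...v2, required: ['title', 'rev_id'] };

const oldEvent = { title: 'Foo' };
const newEvent = { title: 'Foo', rev_id: 42 };

// Producing the field before the schema knows about it is rejected.
console.log(check(v1, newEvent)); // false
// While the field is optional, both event shapes are accepted.
console.log(check(v2, oldEvent), check(v2, newEvent)); // true true
// Once it is required, events without it are rejected.
console.log(check(v3, oldEvent), check(v3, newEvent)); // false true
```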
Thu, May 30
Wed, May 29
Yes, but the paths are probably gonna be different as well..
Tue, May 28
In theory that logic could be amended to handle Parsoid REST URLs with domains in them, but it would be confusing, would add extra complexity for no good reason, and would not work for third parties.
May 24 2019
LGTM, but let's not execute it until next week?
May 23 2019
Couple of notes:
May 21 2019
May 18 2019
This is a result of how we're treating the storage of Parsoid data now. Older revisions are not stored anymore unless you provide the ?stash=true parameter, and a matching html/data-parsoid can only be fetched if you provide a TID.
May 16 2019
Since I have finished refactoring and reshuffling code in the parsoid.js module, this can now be picked up and worked on.
May 15 2019
Currently we are also supplying the original HTML and Data-Parsoid for wikitext/to/html transformations. This caused a bug in RESTBase: the transformation failed if the original was not present and the attempt to fetch it failed with a 404. Since the optimization is currently disabled in Parsoid, we will stop fetching the original to save some round trips to Cassandra. When (if) the optimization is reintroduced, we will need to start fetching the original again, but make sure we gracefully handle the case when it's not there.
The definitions docs had incorrect referencing, and apparently swagger 3 is not as chill about it as swagger 2. https://github.com/wikimedia/restbase/pull/1133 fixes it.
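A rough sketch of what gracefully handling the missing original could look like. The helper name, the `fetchFn` callback and the shape of the error (a `status` property) are assumptions for illustration, not RESTBase's actual code.

```javascript
// Hypothetical helper: `fetchFn` stands in for whatever performs the storage
// request and is assumed to reject with an error carrying a `status` field.
async function fetchOriginalOrNull(fetchFn) {
  try {
    return await fetchFn();
  } catch (err) {
    if (err && err.status === 404) {
      // The original is not stored: carry on without it.
      return null;
    }
    // Anything else is still a real failure.
    throw err;
  }
}

// Usage: a missing original resolves to null instead of failing the request.
fetchOriginalOrNull(() => Promise.reject({ status: 404 }))
  .then((original) => console.log(original)); // null
```

The caller can then decide whether to pass the original along or run the transformation without it.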
May 14 2019
Good point. So, we need to replace .length with Buffer.byteLength using the 'utf8' encoding and provide some tests.
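For illustration, a small sketch of why the two differ: a string's `.length` counts UTF-16 code units, while `Buffer.byteLength(str, 'utf8')` counts the bytes of the UTF-8 encoding. The `utf8Size` helper and the sample strings are made up; tests could assert on cases like these.

```javascript
// Size of a string in bytes when encoded as UTF-8.
function utf8Size(body) {
  return Buffer.byteLength(body, 'utf8');
}

const ascii = 'hello';
// "héllo €" written with escapes so the source file encoding does not matter.
const nonAscii = 'h\u00e9llo \u20ac';

// For pure ASCII the two numbers agree.
console.log(ascii.length, utf8Size(ascii)); // 5 5
// "é" takes 2 bytes and "€" takes 3, so the byte size is larger than .length.
console.log(nonAscii.length, utf8Size(nonAscii)); // 7 10
```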
I believe https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/503651 is the reason.
What is this mediawiki.api-request data ultimately for and why does it require a UUID?
May 13 2019
The caveat is that we would have to update all dashboards for all services residing on the same host cluster (scb being the problem here) in pretty much the same timeframe, mostly because the statsd configuration is the same for all services on the same host.
The patch has been SWATted, so now VE provides an appropriate query parameter and the responses are not cached. This particular one is done. Resolving.
Thank you for an impressive level of detail :) There's a bunch of other places where we abuse the timing metric within services exactly because we needed to have percentiles, so the decision we make here should probably be adopted elsewhere.
May 12 2019
We have been using UUID v1 in EventBus events for a while now with no problem, but I guess for other events it did not really matter since they're much lower volume and are created within already very heavy code paths.
May 9 2019
Clearly, the links are incorrect: the /api/rest_v1/ part is missing. I think it's related to how we're replacing paths in swaggerUI.js in hyperswitch. @Clarakosi was working on it a lot recently, so she should be able to have a look.
May 7 2019
It is still the case sometimes. It's not severe, but we actually have a hack to insert an artificial meta tag with an article TID into Parsoid HTML in order to work around this issue. It was added as a temporary workaround many years ago and is becoming a permanent workaround, which is not good.
May 6 2019
(a) this should actually be documented in the API docs (swagger?) along with the implications of what happens if a client doesn't comply
changeprop has not been moved to k8s, so no, it cannot be marked as resolved.
May 2 2019
Apr 30 2019
Apr 29 2019
After adding the warn logging, there were 15 requests recorded for getSections all in a span of a few minutes, from a browser, so I believe that was someone playing with it from the docs UI. I think we can safely remove the endpoints and the code.
I believe that this can be closed now that we are easily sustaining almost 7k events per second in prod.