mobrovac (Marko Obrovac)
Spy

Projects (44)

User Details

User Since
Dec 16 2014, 7:40 PM (152 w, 5 d)
Availability
Available
IRC Nick
mobrovac
LDAP User
Mobrovac
MediaWiki User
Mobrovac

Recent Activity

Sat, Nov 18

mobrovac edited projects for T180626: Firejail (and cpulimit, if feasable) headless chromium processes, added: Services (next); removed Services (watching).

Each service running in production is already properly sandboxed with firejail for security reasons; this is something we provide out of the box. However, given that Chromium might take up considerable resources, we should limit its resource usage as well. I would propose tackling this in a second step, after the initial prototype has been built but before it goes into production.
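As a rough illustration of capping a child process's resources, here is a Python sketch using POSIX rlimits through the standard resource module. This is a swapped-in technique, not firejail or cpulimit: RLIMIT_CPU caps total CPU seconds rather than throttling CPU share the way cpulimit does. The launch_limited helper and the limit values are hypothetical.

```python
import resource
import subprocess
import sys
from typing import List


def launch_limited(cmd: List[str], cpu_seconds: int) -> subprocess.CompletedProcess:
    """Run cmd with a hard cap on total CPU time (Unix only; hypothetical helper)."""

    def apply_limits() -> None:
        # Runs in the child after fork() and before exec(), so only the child is capped.
        # The child is sent SIGXCPU once it has used cpu_seconds of CPU time.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(cmd, preexec_fn=apply_limits, capture_output=True, text=True)


result = launch_limited([sys.executable, "-c", "print('rendered')"], cpu_seconds=30)
print(result.returncode, result.stdout.strip())
```

Memory or file-descriptor caps could be added the same way with other RLIMIT_* constants, but the right values for a headless browser would need measuring first.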

Sat, Nov 18, 11:35 AM · Services (next), Proton, Electron-PDFs, Readers-Web-Backlog
mobrovac edited projects for T179875: Update RESTBase to get summary content from MCS Summary 2.0 endpoint when development is complete, added: RESTBase-API, Services (next); removed Services.

In general, I support pushing this out to the beta cluster. I think this is mostly a question for the Services team. I'm not sure, though, that there would be a lot of testing done on the beta cluster.

Sat, Nov 18, 10:09 AM · Services (next), RESTBase-API, RESTBase, Page Content Service, Reading-Infrastructure-Team-Backlog (Kanban), Reading Epics (Platform JS CSS and HTML consolidation)

Fri, Nov 17

mobrovac committed rMSCR47cb5fa425fb: Update to service-template-node v0.5.3 (authored by mobrovac).
Update to service-template-node v0.5.3
Fri, Nov 17, 4:01 PM
mobrovac created T180800: Update to service-template-node v0.5.3.
Fri, Nov 17, 3:59 PM · Patch-For-Review, service-template-node, User-mobrovac, Services (doing), Readers-Web-Backlog, Proton, Electron-PDFs
mobrovac added a comment to T180604: Chromium render keeps tasks in the queue even browser connection is lost.

Oh right, right. In that case, the req object fires a close event which can be used in the service. So, something like:
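A minimal sketch of that idea, written in Python rather than the service's Node.js: each queued render job returns a callback that is wired to the request's close event and drops the job if it has not been picked up yet. RenderQueue and FakeRequest are hypothetical stand-ins, not the chromium-render code; a job that has already started would still need to be aborted separately.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Set


class RenderQueue:
    """Hypothetical FIFO of pending render jobs that can be cancelled before they start."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._cancelled: Set[str] = set()

    def enqueue(self, job_id: str) -> Callable[[], None]:
        """Queue a job and return a callback to attach to the request's 'close' event."""
        self._pending.append(job_id)

        def on_close() -> None:
            # Client went away: drop the job only if it is still waiting in the queue.
            if job_id in self._pending:
                self._pending.remove(job_id)
                self._cancelled.add(job_id)

        return on_close

    def next_job(self) -> Optional[str]:
        """Pop the next job that is still wanted, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None


class FakeRequest:
    """Stand-in for an incoming HTTP request that emits 'close' when the client disconnects."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable[[], None]]] = {}

    def on(self, event: str, fn: Callable[[], None]) -> None:
        self._listeners.setdefault(event, []).append(fn)

    def emit(self, event: str) -> None:
        for fn in self._listeners.get(event, []):
            fn()


queue = RenderQueue()
req_a, req_b = FakeRequest(), FakeRequest()
req_a.on("close", queue.enqueue("a"))
req_b.on("close", queue.enqueue("b"))
req_a.emit("close")      # client A disconnects while its job is still queued
print(queue.next_job())  # b
```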

Fri, Nov 17, 3:48 PM · Services (watching), Readers-Web-Kanban-Board, Proton
mobrovac added a project to T180604: Chromium render keeps tasks in the queue even browser connection is lost: Services (watching).

So this sounds like the browser instance hasn't been terminated when expected. I haven't yet had time to really look into this, but there seem to have been some changes in the browser.close() API that might help here, so updating to puppeteer v0.13.0 might help (and, judging by the list of changes since v0.11.0, would be advisable anyway).

Fri, Nov 17, 3:36 PM · Services (watching), Readers-Web-Kanban-Board, Proton
mobrovac added a comment to T180384: Turn off Trending Service.

@Pchelolo asked me a few questions

are you up for being a maintainer of it?

I am, although one of the biggest pain points in maintaining this so far has been the inability to get at live data during testing, and the fact that Vagrant has to be used and requires some hacking (https://gerrit.wikimedia.org/r/#/c/335555/). It hasn't been 100% clear who was responsible for updates, so I'd appreciate more clarity around that and around what level of support I could get from Services.

Fri, Nov 17, 8:25 AM · Operations, Services (designing), Reading-Infrastructure-Team-Backlog (Kanban), Trending-Service
mobrovac added a comment to T180366: All Reading Infrastructure engineers should have deploy rights for all services Readers engineering maintains.

So while this doesn't have a sudo permission listed in the admins module, I'm assuming the actual roles set permissions to allow 'deploy-service' users extra rights? Please advise, as all advanced rights require ops meeting review. (I'd like to get this on the meeting notes for next Monday if so, since @Tgr has already been waiting a bit for this access.)

Fri, Nov 17, 7:58 AM · Ops-Access-Requests, Operations, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban)

Thu, Nov 16

mobrovac added a comment to T179422: Reshape RESTBase Cassandra clusters.

I propose to convert these in pairs (4 sets of 2), with a bit of time for compaction to settle in between.
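The proposed rollout order can be sketched in Python: split the eight instances into batches of two and pause for compaction between batches, but not after the last one. The instance names and the convert and wait_for_compaction callables are hypothetical placeholders for the real operational steps.

```python
from typing import Callable, List, Sequence


def convert_in_pairs(
    instances: Sequence[str],
    convert: Callable[[str], None],
    wait_for_compaction: Callable[[], None],
    batch_size: int = 2,
) -> List[List[str]]:
    """Convert instances batch by batch, letting compaction settle between batches."""
    batches = [list(instances[i:i + batch_size]) for i in range(0, len(instances), batch_size)]
    for n, batch in enumerate(batches):
        for inst in batch:
            convert(inst)
        if n < len(batches) - 1:
            wait_for_compaction()  # pause only between batches, not after the last one
    return batches


done: List[str] = []
pauses: List[int] = []
names = [f"instance-{i}" for i in range(1, 9)]  # hypothetical instance names
plan = convert_in_pairs(names, done.append, lambda: pauses.append(len(done)))
print(len(plan), pauses)  # 4 [2, 4, 6]
```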

Thu, Nov 16, 5:04 PM · User-Eevans, Patch-For-Review, User-fgiunchedi, RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a comment to T180682: Investigate ChangeProp memory growth when a rule hits concurrency limit.

If that's the case, then we should also see the memory of CP follow the growing trend of backlogged messages, but that doesn't seem to be the case.

Thu, Nov 16, 3:16 PM · ChangeProp, Services (doing)

Wed, Nov 15

mobrovac added a watcher for TechCom: mobrovac.
Wed, Nov 15, 9:30 PM
mobrovac added a member for TechCom: mobrovac.
Wed, Nov 15, 9:13 PM
mobrovac added a subtask for T179421: Migrate revisions and restrictions from legacy to new storage: T180568: Aberrant load on instances involved in recent bootstrap.
Wed, Nov 15, 4:57 PM · RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a parent task for T180568: Aberrant load on instances involved in recent bootstrap: T179421: Migrate revisions and restrictions from legacy to new storage.
Wed, Nov 15, 4:57 PM · Services (doing), User-Eevans, Cassandra, Operations
mobrovac added a comment to T179421: Migrate revisions and restrictions from legacy to new storage.

Given T180568: Aberrant load on instances involved in recent bootstrap, and out of an abundance of caution, I would avoid deploying anything until we have a better understanding of what is going on there.

Wed, Nov 15, 4:57 PM · RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a project to T180568: Aberrant load on instances involved in recent bootstrap: Services (doing).
Wed, Nov 15, 3:54 PM · Services (doing), User-Eevans, Cassandra, Operations
mobrovac added a comment to T179421: Migrate revisions and restrictions from legacy to new storage.

PR #909 has been merged. The plan is to deploy the switch of both revisions and restrictions tomorrow, 2017-11-16.

Wed, Nov 15, 3:22 PM · RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac renamed T176126: Update node-rdkafka version to v2.x from Update node-rdkafka version to 2.0 to Update node-rdkafka version to v2.x.
Wed, Nov 15, 3:18 PM · Services (blocked), EventBus, Analytics, Trending-Service, Reading-Infrastructure-Team-Backlog, ChangeProp
mobrovac merged T180591: Update node-rdkafka to v2 into T176126: Update node-rdkafka version to v2.x.
Wed, Nov 15, 3:17 PM · Services (blocked), EventBus, Analytics, Trending-Service, Reading-Infrastructure-Team-Backlog, ChangeProp
mobrovac merged task T180591: Update node-rdkafka to v2 into T176126: Update node-rdkafka version to v2.x.
Wed, Nov 15, 3:17 PM · Analytics, Services (later)
mobrovac added a comment to T180017: Timeouts on event delivery to EventBus.

https://groups.google.com/forum/#!msg/python-tornado/qqF5JQdP6XU/ZB8bI4AIU5sJ seems relevant to the problem.

Wed, Nov 15, 3:06 PM · MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)), Patch-For-Review, Services (next), EventBus, Analytics
mobrovac added a comment to T179421: Migrate revisions and restrictions from legacy to new storage.

I have created the schemas according to the YAML above, except that I used the LZ4Compressor with a 64 KB chunk length, as per our discussion yesterday. No problems were spotted in the Cassandra logs during or after the creation process.

Wed, Nov 15, 2:49 PM · RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac renamed T180600: Set up Redis for ChangeProp in deployment-prep from Set up reds for ChangeProp in deployment-prep to Set up Redis for ChangeProp in deployment-prep.
Wed, Nov 15, 2:12 PM · Services (next), ChangeProp
mobrovac added a subtask for T179422: Reshape RESTBase Cassandra clusters: T180562: Degraded RAID on restbase2004.
Wed, Nov 15, 9:03 AM · User-Eevans, Patch-For-Review, User-fgiunchedi, RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a parent task for T180562: Degraded RAID on restbase2004: T179422: Reshape RESTBase Cassandra clusters.
Wed, Nov 15, 9:03 AM · Services (watching), Operations, ops-codfw
mobrovac added a subtask for T179422: Reshape RESTBase Cassandra clusters: T180568: Aberrant load on instances involved in recent bootstrap.
Wed, Nov 15, 9:03 AM · User-Eevans, Patch-For-Review, User-fgiunchedi, RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a parent task for T180568: Aberrant load on instances involved in recent bootstrap: T179422: Reshape RESTBase Cassandra clusters.
Wed, Nov 15, 9:03 AM · Services (doing), User-Eevans, Cassandra, Operations
mobrovac added a project to T180562: Degraded RAID on restbase2004: Services (watching).
Wed, Nov 15, 8:29 AM · Services (watching), Operations, ops-codfw

Tue, Nov 14

mobrovac added a comment to T180402: Remove custom ordering from ReadingLists extension.

@Tgr we've never needed sorting, so we've never settled on a convention. I personally like /lists/?sort=name more than the others.
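A minimal sketch of how a handler could interpret a sort query parameter in the /lists/?sort=name style, using Python's urllib.parse. The allowed keys, the default, and the leading "-" for descending order are assumptions for illustration, not a settled convention.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical set of sortable fields for a /lists/ endpoint.
ALLOWED_SORT_KEYS = {"name", "updated"}
DEFAULT_SORT = ("updated", False)


def parse_sort(url: str) -> Tuple[str, bool]:
    """Return (field, descending) from a URL such as /lists/?sort=name or /lists/?sort=-updated."""
    params = parse_qs(urlsplit(url).query)
    raw = params.get("sort", [""])[0]
    descending = raw.startswith("-")
    field = raw[1:] if descending else raw
    if field not in ALLOWED_SORT_KEYS:
        # Unknown or missing key: fall back to the default instead of failing the request.
        return DEFAULT_SORT
    return field, descending


print(parse_sort("/lists/?sort=name"))      # ('name', False)
print(parse_sort("/lists/?sort=-updated"))  # ('updated', True)
print(parse_sort("/lists/"))                # ('updated', False)
```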

Tue, Nov 14, 6:34 PM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, Reading List Service
mobrovac added a comment to T180416: Share configuration between ReadingLists extension and RESTBase.

I don't think there is a straightforward way of doing that, given that the Swagger spec is treated as a static asset. Is the point here just to be able to communicate the config in the docs or does the RB module need that info? For the former, you might want to just link to the extension's docs where the config is available for users to see and for the latter the module could request it during initialisation or on the first request (preferred). And for both cases, copy/pasta is always an option :)
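The "request it on the first request" option can be sketched in Python as a lazily fetched, cached config. fetch_config stands in for whatever call the RESTBase module would make to the extension, and the max_lists key is hypothetical; the lock makes concurrent first requests trigger only one fetch.

```python
import threading
from typing import Callable, Dict, List, Optional


class LazyConfig:
    """Fetch configuration once, on first use, then serve the cached copy."""

    def __init__(self, fetch: Callable[[], Dict[str, int]]) -> None:
        self._fetch = fetch
        self._value: Optional[Dict[str, int]] = None
        self._lock = threading.Lock()

    def get(self) -> Dict[str, int]:
        if self._value is None:
            with self._lock:
                # Re-check inside the lock so concurrent first requests fetch only once.
                if self._value is None:
                    self._value = self._fetch()
        return self._value


calls: List[int] = []


def fetch_config() -> Dict[str, int]:
    """Hypothetical stand-in for asking the ReadingLists extension for its limits."""
    calls.append(1)
    return {"max_lists": 100}


config = LazyConfig(fetch_config)
print(config.get()["max_lists"], config.get()["max_lists"], len(calls))  # 100 100 1
```

Fetching during initialisation instead would simply call fetch_config once at module start-up; the lazy variant avoids failing start-up if the extension is briefly unavailable.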

Tue, Nov 14, 3:18 PM · Reading List Service, Reading-Infrastructure-Team-Backlog
mobrovac added a comment to T180384: Turn off Trending Service.

Really the concept needs more testing for product viability. Unfortunately, we were unable to test in a non-production environment due to Kafka not being available outside of production.

Is it actually impossible to use that in Labs or is it just that whoever put it into production didn't properly mirror it in beta?

It is actually impossible because AIUI the edits stream via Kafka is not available in deployment-prep, and this is of course an issue that left no room for testing there. It's a very unfortunate place to be in, and in fact this is probably the whole reason why this was deployed in production in the first place.

Tue, Nov 14, 12:47 PM · Operations, Services (designing), Reading-Infrastructure-Team-Backlog (Kanban), Trending-Service
mobrovac edited projects for T175212: Services Q2 2017/18 goal: Migrate a subset of jobs to multi-DC enabled event processing infrastructure., added: Services (doing); removed Services (next).
Tue, Nov 14, 8:54 AM · Services (doing), Patch-For-Review, MediaWiki-JobQueue, ChangeProp, Analytics, EventBus, Goal
mobrovac added a comment to T180366: All Reading Infrastructure engineers should have deploy rights for all services Readers engineering maintains.

Thnx @Mholloway, +1 from me. This will also need approval from Gergo's direct manager. Could you add them and have them approve the request?

Tue, Nov 14, 8:34 AM · Ops-Access-Requests, Operations, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban)

Mon, Nov 13

mobrovac edited projects for T180384: Turn off Trending Service, added: Services (designing), Operations; removed Services.
Mon, Nov 13, 6:51 PM · Operations, Services (designing), Reading-Infrastructure-Team-Backlog (Kanban), Trending-Service
mobrovac added a comment to T180384: Turn off Trending Service.

Turning the service off in itself is trivial (a patch + deploy), but there is a larger question here. The results of the trending edits service are publicly available via the REST API. The end point is marked as experimental, so we are covered on that front. However, the task's description alludes to the service being put back in production at some point. Is that so? It is a bit confusing from the user perspective to have the end point available, then for it to disappear and resurface later.

Mon, Nov 13, 6:51 PM · Operations, Services (designing), Reading-Infrastructure-Team-Backlog (Kanban), Trending-Service
mobrovac added a project to T180366: All Reading Infrastructure engineers should have deploy rights for all services Readers engineering maintains: Ops-Access-Requests.

In order to deploy the services, one must be in the deploy-service group. From what I can see, @Mholloway and @bearND are already there, but @Tgr is not, so we might want to repurpose this task for that.

Mon, Nov 13, 5:38 PM · Ops-Access-Requests, Operations, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban)
mobrovac removed a project from T179422: Reshape RESTBase Cassandra clusters: Patch-For-Review.
Mon, Nov 13, 5:18 PM · User-Eevans, Patch-For-Review, User-fgiunchedi, RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a subtask for T179419: Migrate mathoid storage from legacy to new strategy: T172767: Prepare mathoid 0.7 release (tracking).
Mon, Nov 13, 4:50 PM · RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a parent task for T172767: Prepare mathoid 0.7 release (tracking): T179419: Migrate mathoid storage from legacy to new strategy.
Mon, Nov 13, 4:50 PM · Patch-For-Review, Release, Chemical-Markup, Mathoid
mobrovac closed T151396: Update Mathoid to service-template-node v0.5.3, a subtask of T172767: Prepare mathoid 0.7 release (tracking), as Resolved.
Mon, Nov 13, 4:48 PM · Patch-For-Review, Release, Chemical-Markup, Mathoid
mobrovac closed T151396: Update Mathoid to service-template-node v0.5.3 as Resolved.
Mon, Nov 13, 4:48 PM · Services (done), User-mobrovac, service-template-node, Mathoid
mobrovac edited projects for T151396: Update Mathoid to service-template-node v0.5.3, added: Services (done); removed Patch-For-Review, Services (doing).

Merged and deployed, resolving.

Mon, Nov 13, 4:48 PM · Services (done), User-mobrovac, service-template-node, Mathoid
mobrovac claimed T151396: Update Mathoid to service-template-node v0.5.3.
Mon, Nov 13, 11:43 AM · Services (done), User-mobrovac, service-template-node, Mathoid
mobrovac renamed T151396: Update Mathoid to service-template-node v0.5.3 from Update Mathoid to service-template-node v0.5.2 to Update Mathoid to service-template-node v0.5.3.
Mon, Nov 13, 10:56 AM · Services (done), User-mobrovac, service-template-node, Mathoid
mobrovac added a project to T175758: Update MCS to new service template version: Services (watching).

@Mholloway thank you for the PR and update! We have just released v0.5.3 which includes some improvements to the tests and support for the new mocha version, aside from your PR. Could you update the above patch to reflect that?

Mon, Nov 13, 10:54 AM · Reading-Infrastructure-Team-Backlog (Kanban), Services (watching), Patch-For-Review, service-template-node, Mobile-Content-Service

Sat, Nov 11

mobrovac added a comment to T179786: Update trending-edits' node-rdkafka to v1.x.

Thank you, @bearND, for looking into it.

Sat, Nov 11, 10:47 AM · Patch-For-Review, User-Joe, Operations, Wikimedia-Incident, Reading-Infrastructure-Team-Backlog (Kanban), User-Jdlrobson, Trending-Service, Services (watching)

Fri, Nov 10

mobrovac added a project to T179786: Update trending-edits' node-rdkafka to v1.x: Wikimedia-Incident.

We have now seen trending-edits swallowing memory on SCB nodes twice in one week, because the service's offsets disappear from Kafka. This is likely because this service is the only one using an older version of node-rdkafka, which was not compiled for the version present on SCB. The dependency has to be updated ASAP. @Jdlrobson could you please take care of this?

Fri, Nov 10, 11:53 AM · Patch-For-Review, User-Joe, Operations, Wikimedia-Incident, Reading-Infrastructure-Team-Backlog (Kanban), User-Jdlrobson, Trending-Service, Services (watching)

Thu, Nov 9

mobrovac added a comment to T180017: Timeouts on event delivery to EventBus.

+1 ^

There's also this old unmerged patch:
https://gerrit.wikimedia.org/r/#/c/302372/

However, when I tested at the time, it didn't show any performance improvements.

Thu, Nov 9, 8:57 AM · MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)), Patch-For-Review, Services (next), EventBus, Analytics
mobrovac added a comment to T179419: Migrate mathoid storage from legacy to new strategy.

Uh, completely missed the fact that all of the Mathoid keyspaces need to go into the globaldomain storage group, not others... I dropped the others_T_mathoid__ng_* keyspaces, we will have to recreate them for the correct group.

Thu, Nov 9, 8:50 AM · RESTBase-Cassandra, RESTBase, Services (doing)

Wed, Nov 8

mobrovac added a comment to T179417: Migrate Parsoid from legacy to new storage.

All of the non-WP tables have been truncated now, snapshots can be cleared.

Wed, Nov 8, 5:45 PM · RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a comment to T179421: Migrate revisions and restrictions from legacy to new storage.

The CQL statements LGTM.

Wed, Nov 8, 4:39 PM · RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a comment to T179419: Migrate mathoid storage from legacy to new strategy.

Things to change:

  • all headers must be of type text
  • value in others_T_mathoid__ng_input must be of type text
  • value in others_T_mathoid__ng_check must be of type text
  • value in others_T_mathoid__ng_mml must be of type text
  • value in others_T_mathoid__ng_svg must be of type text
Wed, Nov 8, 4:37 PM · RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac closed T179420: Migrate definitions storage from the legacy to new strategy, a subtask of T179416: Program 7 Outcome 2 Objective 1, Q2: Develop a scalable and cost-effective storage solution for backing the REST API, as Resolved.
Wed, Nov 8, 4:12 PM · Goal, Cassandra, Epic, RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac closed T179420: Migrate definitions storage from the legacy to new strategy as Resolved.

RESTBase is now using Cassandra 3 for definitions. Resolving.

Wed, Nov 8, 4:12 PM · Services (done), RESTBase-Cassandra, RESTBase
mobrovac added a comment to T180017: Timeouts on event delivery to EventBus.

We have 16 CPU cores with HyperThreading enabled on each node, but only 8 EventBus proxy service workers, so one immediate thing that we need to do is to increase the number of workers.
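A back-of-the-envelope sketch of sizing workers from the logical CPU count: 16 physical cores with HyperThreading expose 32 logical CPUs, four times the current 8 workers. The one-worker-per-logical-CPU policy, the optional reserve, and the worker_count helper are assumptions, not the EventBus service's actual configuration.

```python
import os
from typing import Optional


def worker_count(logical_cpus: Optional[int] = None, reserve: int = 0) -> int:
    """Suggest one worker process per logical CPU, optionally leaving some CPUs for other daemons."""
    cpus = logical_cpus if logical_cpus is not None else (os.cpu_count() or 1)
    return max(1, cpus - reserve)


# 16 physical cores with HyperThreading -> 32 logical CPUs.
print(worker_count(logical_cpus=16 * 2))         # 32
print(worker_count(logical_cpus=32, reserve=4))  # 28
```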

Wed, Nov 8, 11:22 AM · MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)), Patch-For-Review, Services (next), EventBus, Analytics
mobrovac closed T180005: RESTBASE startup error as Resolved.

https://github.com/wikimedia/template-expression-compiler/pull/3 fixes the issue in master, so resolving. @Slushpuppeh23 feel free to reopen if the problem continues.

Wed, Nov 8, 10:52 AM · Services (done), RESTBase

Tue, Nov 7

mobrovac created P6279 Cleanup job topics.
Tue, Nov 7, 5:17 PM · Services (next)
mobrovac added a comment to T179420: Migrate definitions storage from the legacy to new strategy.

Thank you @Eevans for the schemas. I created them, with the exception that value needs to be of type text because we are storing the stringified JSON object returned by MCS.

Tue, Nov 7, 2:46 PM · Services (done), RESTBase-Cassandra, RESTBase
mobrovac removed a project from T179417: Migrate Parsoid from legacy to new storage: Patch-For-Review.

All but the default group tables have been truncated. The default group still had activity due to the wikidata.org domain, which had not been switched. We switched that one too, so we should be able to truncate that group tomorrow as well.

Tue, Nov 7, 12:04 PM · RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a comment to T179420: Migrate definitions storage from the legacy to new strategy.

Mentioned in SAL (#wikimedia-operations) [2017-11-07T11:42:46Z] <mobrovac> restbase truncating cassandra 2 non-WP tables for T179420

Tue, Nov 7, 11:44 AM · Services (done), RESTBase-Cassandra, RESTBase
mobrovac added a project to T179876: Expose media endpoint in RESTBase when ready: Services (later).
Tue, Nov 7, 11:14 AM · Services (later), RESTBase-API, Reading-Infrastructure-Team-Backlog (Kanban), Page Content Service, Reading Epics (Platform JS CSS and HTML consolidation)
mobrovac edited projects for T179876: Expose media endpoint in RESTBase when ready, added: RESTBase-API; removed RESTBase.
Tue, Nov 7, 11:13 AM · Services (later), RESTBase-API, Reading-Infrastructure-Team-Backlog (Kanban), Page Content Service, Reading Epics (Platform JS CSS and HTML consolidation)
mobrovac added a comment to T178189: [spike] Temporarily allow pushing large objects.

While migrating the SCB nodes to stretch will need to happen, I don't think we will have the bandwidth to do so any time soon. Since the headless Chrome/puppeteer approach is experimental at this point, how about setting it up temporarily in Ganeti for evaluation purposes, and then migrating it to SCB once we move to stretch?

Tue, Nov 7, 8:37 AM · Spike, Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Patch-For-Review, Readers-Web-Backlog, Gerrit

Mon, Nov 6

mobrovac updated the task description for T175210: Select candidate jobs for transferring to the new infrastucture.
Mon, Nov 6, 3:38 PM · Patch-For-Review, Services (doing), MediaWiki-JobQueue, ChangeProp, EventBus, Analytics, Operations, User-Joe, User-Elukey

Sun, Nov 5

mobrovac created T179786: Update trending-edits' node-rdkafka to v1.x.
Sun, Nov 5, 10:03 AM · Patch-For-Review, User-Joe, Operations, Wikimedia-Incident, Reading-Infrastructure-Team-Backlog (Kanban), User-Jdlrobson, Trending-Service, Services (watching)

Sat, Nov 4

mobrovac added a project to T179688: mediawiki-config changes not deployed automatically to deployment-videoscaler01: Multimedia.
Sat, Nov 4, 12:09 PM · Release-Engineering-Team (Kanban), User-greg, Patch-For-Review, Multimedia, Beta-Cluster-Infrastructure, Services (watching)

Fri, Nov 3

mobrovac updated the task description for T179416: Program 7 Outcome 2 Objective 1, Q2: Develop a scalable and cost-effective storage solution for backing the REST API.
Fri, Nov 3, 9:51 AM · Goal, Cassandra, Epic, RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac closed T179418: Migrate page summary from legacy to new storage as Resolved.

The switch has been completed, summaries are now fetched from and stored in Cassandra 3 only. Resolving.

Fri, Nov 3, 9:20 AM · Services (done), RESTBase-Cassandra, RESTBase
mobrovac closed T179418: Migrate page summary from legacy to new storage, a subtask of T179416: Program 7 Outcome 2 Objective 1, Q2: Develop a scalable and cost-effective storage solution for backing the REST API, as Resolved.
Fri, Nov 3, 9:20 AM · Goal, Cassandra, Epic, RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a comment to T173710: Job queue is increasing non-stop.

https://gerrit.wikimedia.org/r/#/c/385248 should already be working for commons, but in mwlog1001's runJob.log I can only see entries like causeAction=unknown causeAgent=unknown (which probably only confirms that no authenticated user/bot is triggering these jobs iteratively).

Fri, Nov 3, 8:20 AM · User-Elukey, Patch-For-Review, Services (watching), Performance-Team (Radar), CirrusSearch, Discovery, Wikidata-Former-Sprint-Board, Wikidata, Operations, MediaWiki-JobQueue

Thu, Nov 2

mobrovac added a comment to T179579: Cannot read property 'substring' of null.

According to Logstash, it first appeared at the beginning of October, then appeared sporadically towards the middle of the month, only to disappear and reappear in higher volume at the end of the month.

Thu, Nov 2, 5:37 PM · Patch-For-Review, Services (watching), Parsoid
mobrovac added a comment to T179353: Scap: Standardize git version.

sca*?

Thu, Nov 2, 5:32 PM · Operations, Release-Engineering-Team (Watching / External), Scap
mobrovac added a comment to T179417: Migrate Parsoid from legacy to new storage.

All but WPs (and the global domain, technically) have been switched to use the new storage schema with Cassandra 3. We need to keep the old contents around for the next 24h, though, before we can get rid of the data in Cassandra 2. data tables in the following keyspaces can be truncated (not dropped!) after that period elapses:

Thu, Nov 2, 12:27 PM · RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac renamed T179416: Program 7 Outcome 2 Objective 1, Q2: Develop a scalable and cost-effective storage solution for backing the REST API from Program 7 Outcome 2 Objective 1: Develop a scalable and cost-effective storage solution for backing the REST API to Program 7 Outcome 2 Objective 1, Q2: Develop a scalable and cost-effective storage solution for backing the REST API.
Thu, Nov 2, 12:12 PM · Goal, Cassandra, Epic, RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac added a comment to T176627: Trial replacing Electron with headless Chromium in the render service.

Change 387871 merged by Bmansurov:
[mediawiki/services/chromium-render@master] Escape article title before sending it to RESTBase

https://gerrit.wikimedia.org/r/387871

Thu, Nov 2, 10:11 AM · Services (watching), Patch-For-Review, Readers-Web-Kanban-Board, Readers-Web-Backlog, Proton, Electron-PDFs
mobrovac triaged T179553: Cookies should not be forwarded to different domains as Low priority.

RESTBase's mediawiki auth filter is meant to be used only on routes that both need auth(n|z) and whose sub-requests need cookies to complete the action. In your case, that can be achieved by creating an internal end point (in the /sys/ hierarchy) that calls the MW action API for manipulating lists and then declare the auth filter on that route only.
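The effect of declaring the auth filter on a single internal route can be sketched in Python as a per-route table that decides whether the incoming cookie is copied onto sub-requests. The route paths, the ROUTE_FILTERS table, and the subrequest_headers helper are hypothetical; RESTBase declares filters in its route specs, and this sketch only models the resulting behaviour.

```python
from typing import Dict

# Hypothetical route table: only the internal /sys/ endpoint that calls the
# MW action API declares the auth filter.
ROUTE_FILTERS: Dict[str, Dict[str, bool]] = {
    "/sys/reading-lists/": {"mediawiki_auth": True},
    "/page/summary/": {"mediawiki_auth": False},
}


def subrequest_headers(route: str, incoming: Dict[str, str]) -> Dict[str, str]:
    """Build sub-request headers, forwarding the cookie only on auth-filtered routes."""
    headers = {k: v for k, v in incoming.items() if k.lower() != "cookie"}
    if ROUTE_FILTERS.get(route, {}).get("mediawiki_auth", False):
        # Header names are case-insensitive, so look the cookie up accordingly.
        cookie = next((v for k, v in incoming.items() if k.lower() == "cookie"), None)
        if cookie is not None:
            headers["cookie"] = cookie
    return headers


incoming = {"cookie": "session=abc", "user-agent": "app/1.0"}
print(subrequest_headers("/sys/reading-lists/", incoming))  # cookie kept
print(subrequest_headers("/page/summary/", incoming))       # cookie dropped
```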

Thu, Nov 2, 9:41 AM · Reading List Service, Reading-Infrastructure-Team-Backlog, Services (later), RESTBase

Wed, Nov 1

mobrovac closed T179494: restbase.svc.eqiad.wmnet directs requests to staging if the origin is staging too as Resolved.

Ok, after a round of apt-get remove --purge wikimedia-lvs-realserver && ip addr del 10.2.X.17/32 dev lo in both DCs, the LVS doesn't point to any of the staging hosts anymore. Thanks @mark and @BBlack for the swift help in diagnosing the issue!

Wed, Nov 1, 3:13 PM · Services (done), Traffic, Operations
mobrovac created T179494: restbase.svc.eqiad.wmnet directs requests to staging if the origin is staging too.
Wed, Nov 1, 2:23 PM · Services (done), Traffic, Operations
mobrovac added a comment to T178492: Create a more controlled WDQS cluster.
  • Since we have 2 active / active WDQS clusters (eqiad / codfw), we could use one of them to serve internal traffic and one as external endpoint. This defeats the purpose of having a backup datacenter, so that's not a long term solution.
Wed, Nov 1, 11:23 AM · Services (watching), Discovery, Wikidata, Structured-Data-Commons, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service

Tue, Oct 31

mobrovac added a project to T178168: Scap3ize the deploy repository: Services (watching).
Tue, Oct 31, 6:32 PM · Services (watching), Readers-Web-Backlog, Proton, Electron-PDFs
mobrovac added a comment to T178168: Scap3ize the deploy repository.

AFAIK, the docs wrt Scap3 config are up to date and should allow you to create the initial deployment config with ease. If you stumble upon problems, feel free to ping me and/or add me to the relevant Gerrit patches.

Tue, Oct 31, 6:31 PM · Services (watching), Readers-Web-Backlog, Proton, Electron-PDFs
mobrovac added a project to T179416: Program 7 Outcome 2 Objective 1, Q2: Develop a scalable and cost-effective storage solution for backing the REST API: Goal.
Tue, Oct 31, 6:28 PM · Goal, Cassandra, Epic, RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac renamed T179416: Program 7 Outcome 2 Objective 1, Q2: Develop a scalable and cost-effective storage solution for backing the REST API from Migrate RESTBase use-cases to new storage strategy to Program 7 Outcome 2 Objective 1: Develop a scalable and cost-effective storage solution for backing the REST API.
Tue, Oct 31, 6:28 PM · Goal, Cassandra, Epic, RESTBase-Cassandra, RESTBase, Services (doing)
mobrovac changed the status of T179374: Use one keyspace per storage group from Open to Stalled.

We discussed this in the team meeting and decided to put this one on the back-burner for now until we complete the migration.

Tue, Oct 31, 6:19 PM · Cassandra, RESTBase, Services (designing)
mobrovac added projects to T179412: Stop storing feeds in Cassandra: Services (next), RESTBase.
Tue, Oct 31, 4:41 PM · RESTBase, Services (next)
mobrovac added a comment to T179083: Cassandra 3.11.0 schema creation seems unreliable.

Given the relatively small size of this cluster, it seems curious that agreement would take so long.

Yes, it is. I wonder what will happen when we need to create new keyspaces when we have the full Cassandra 3 cluster up and running. This definitely looks like a regression compared to Cassandra 2.

Tue, Oct 31, 1:44 PM · User-Eevans, RESTBase-Cassandra, Services (next)
mobrovac closed T179280: PHP out of memory error trying to log big events as Resolved.
Tue, Oct 31, 1:19 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Services (done), Analytics, EventBus
mobrovac created T179374: Use one keyspace per storage group.
Tue, Oct 31, 12:22 PM · Cassandra, RESTBase, Services (designing)
mobrovac added a comment to T179083: Cassandra 3.11.0 schema creation seems unreliable.

One thing that stood out is a mismatch which occurred early (when I was creating the tables more aggressively) and continued to be logged long after the schema mutation that triggered it. The IDs in question do not seem to correspond to any existing schema.
There also seems to be a fair number of requests (~1/5) that return an OperationTimedOut (even though the action succeeded):

Tue, Oct 31, 8:36 AM · User-Eevans, RESTBase-Cassandra, Services (next)

Mon, Oct 30

mobrovac triaged T179280: PHP out of memory error trying to log big events as High priority.
Mon, Oct 30, 6:36 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Services (done), Analytics, EventBus
mobrovac added a comment to T178983: Malformed HTTP message in EventBus logs.

Obviously, we need to get rid of such huge jobs, so one temporary solution that ought to be safe enough is to have the EventBus service log to a file any message exceeding a reasonable size threshold (I'm thinking a couple of MBs at most), and then work on resolving individual jobs' problems as they arise.
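A sketch of that stop-gap in Python: serialize the event and, if it exceeds a size threshold, append it to a file for later inspection instead of forwarding it. The 2 MB limit, the file path, the accept_event name, and the choice not to forward oversized events are illustrative assumptions; the real EventBus service may handle this differently.

```python
import json
import os
import tempfile
from typing import Any, Dict

MAX_EVENT_BYTES = 2 * 1024 * 1024  # "a couple of MBs at most"; exact value is an assumption


def accept_event(event: Dict[str, Any], oversize_log: str, limit: int = MAX_EVENT_BYTES) -> bool:
    """Return True if the event may be forwarded; otherwise append it to oversize_log and return False."""
    payload = json.dumps(event).encode("utf-8")
    if len(payload) <= limit:
        return True
    with open(oversize_log, "ab") as fh:
        # One JSON document per line, so individual jobs can be inspected later.
        fh.write(payload + b"\n")
    return False


log_path = os.path.join(tempfile.gettempdir(), "oversized-events.log")
print(accept_event({"type": "small", "params": {}}, log_path))               # True
print(accept_event({"type": "huge", "blob": "x" * 100}, log_path, limit=64))  # False
```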

Mon, Oct 30, 3:36 PM · Analytics-Kanban, Services (next), EventBus
mobrovac added a comment to T179083: Cassandra 3.11.0 schema creation seems unreliable.

Based on @Eevans' script, I made another one to create the new Parsoid tables and ran it on the production cluster. There were still quite a number of schema disagreement errors, despite a single node being in charge of all of the creations.
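One common mitigation when scripting DDL is to issue each statement and then wait until every node reports the same schema version before issuing the next. This Python sketch keeps cluster access abstract: execute and schema_versions are hypothetical callables (with a real driver they might wrap the DDL call and a read of the schema_version columns in system.local and system.peers), and the timeout and poll interval are arbitrary.

```python
import time
from typing import Callable, Iterable, List


def run_ddl_with_agreement(
    statements: Iterable[str],
    execute: Callable[[str], None],
    schema_versions: Callable[[], List[str]],
    timeout_s: float = 60.0,
    poll_s: float = 0.5,
) -> None:
    """Execute DDL statements one at a time, waiting for schema agreement after each."""
    for stmt in statements:
        execute(stmt)
        deadline = time.monotonic() + timeout_s
        while len(set(schema_versions())) > 1:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no schema agreement after: {stmt}")
            time.sleep(poll_s)


class FakeCluster:
    """Toy stand-in: after each DDL, nodes disagree for two polls, then converge."""

    def __init__(self) -> None:
        self.executed: List[str] = []
        self._lagging_polls = 0

    def execute(self, stmt: str) -> None:
        self.executed.append(stmt)
        self._lagging_polls = 2

    def schema_versions(self) -> List[str]:
        if self._lagging_polls > 0:
            self._lagging_polls -= 1
            return ["v1", "v2", "v2"]
        return ["v2", "v2", "v2"]


cluster = FakeCluster()
ddl = ["CREATE TABLE ks.a (k int PRIMARY KEY)", "CREATE TABLE ks.b (k int PRIMARY KEY)"]
run_ddl_with_agreement(ddl, cluster.execute, cluster.schema_versions, poll_s=0.01)
print(cluster.executed)
```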

Mon, Oct 30, 9:25 AM · User-Eevans, RESTBase-Cassandra, Services (next)

Fri, Oct 27

mobrovac added a project to T162241: Deploy meddo as part of tilerator-deploy: Services (watching).

What we could do here is add a sub-field to the deploy field in package.json that lists extra Node module dependencies to be installed during the build-repo step. Something like:
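A hypothetical Python sketch of how a build step could read such a field. The field name extra_node_modules, its placement under deploy, and the npm install --no-save command are assumptions for illustration; they are not an existing service-template-node option.

```python
import json
from typing import List

# Hypothetical package.json excerpt; "extra_node_modules" is an invented field name.
PACKAGE_JSON = """
{
  "name": "tilerator",
  "deploy": {
    "extra_node_modules": ["meddo"]
  }
}
"""


def extra_install_command(package_json_text: str) -> List[str]:
    """Return the npm command a build-repo step could run for extra modules, or [] if none are listed."""
    deploy = json.loads(package_json_text).get("deploy", {})
    modules = deploy.get("extra_node_modules", [])
    if not modules:
        return []
    return ["npm", "install", "--no-save", *modules]


print(extra_install_command(PACKAGE_JSON))  # ['npm', 'install', '--no-save', 'meddo']
```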

Fri, Oct 27, 9:28 AM · Services (watching), Patch-For-Review, Maps-Sprint, Maps (Maps-data)

Thu, Oct 26

mobrovac edited projects for T179019: deployment-prep statsd hiera does not have port, added: Services (watching); removed Services.
Thu, Oct 26, 9:24 AM · Patch-For-Review, Services (watching), MediaWiki-JobQueue
mobrovac edited projects for T179057: Cleanup stale cassandra graphite metrics, added: Services (watching), Cassandra; removed Services.

Also, there are metrics that are no longer being emitted and can be cleaned up, cf T173436: Delete graphite metrics for old CFs.

Thu, Oct 26, 9:23 AM · Cassandra, Services (watching), Patch-For-Review, monitoring
mobrovac created T179058: RB and CP logs disappeared from Logstash.
Thu, Oct 26, 8:56 AM · Operations, Patch-For-Review, ChangeProp, RESTBase, Services (watching), Wikimedia-Logstash

Wed, Oct 25

mobrovac updated the task description for T178997: AddUsagesForPageJob doesn't really report execution status.
Wed, Oct 25, 1:01 PM · EventBus, Analytics, MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), Wikibase-Quality-Constraints, Need-volunteer, Services (watching), Wikidata
mobrovac closed T158100: Deprecate and remove the public title/{title} endpoint as Resolved.

The /title/ and /title/{title}/ end points and the /page/revision hierarchy have been dropped. Resolving.

Wed, Oct 25, 10:58 AM · Services (done), RESTBase-API, RESTBase
mobrovac closed T178881: Set up ChangeProp for JobQueue in beta as Resolved.

CP4JQ has been set up on deployment-cpjobqueue. Resolving.

Wed, Oct 25, 10:16 AM · Services (done), User-mobrovac, MediaWiki-JobQueue, EventBus, ChangeProp, Analytics
mobrovac closed T178881: Set up ChangeProp for JobQueue in beta, a subtask of T157088: [EPIC] Develop a JobQueue backend based on EventBus, as Resolved.
Wed, Oct 25, 10:16 AM · MediaWiki-JobQueue, Epic, Services (doing), User-mobrovac, Analytics, ChangeProp, EventBus