Pchelolo
User

Projects (6)

User Details

User Since
Jun 24 2015, 10:23 AM (151 w, 6 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Pchelolo

Recent Activity

Yesterday

Pchelolo added a subtask for T190327: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus: T188947: Create an LVS endpoint for jobrunners on videoscalers.
Tue, May 22, 10:48 AM · Patch-For-Review, MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Services (doing), Goal, EventBus, MediaWiki-JobQueue, Analytics
Pchelolo added a parent task for T188947: Create an LVS endpoint for jobrunners on videoscalers: T190327: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus.
Tue, May 22, 10:48 AM · Services (blocked), User-Joe, Operations, MediaWiki-JobQueue, User-mobrovac, Analytics, ChangeProp, EventBus
Pchelolo updated the task description for T175210: Select candidate jobs for transferring to the new infrastucture.
Tue, May 22, 10:19 AM · Services (doing), MediaWiki-JobQueue, ChangeProp, Analytics, EventBus, Operations, User-Joe, User-Elukey

Sat, May 19

Pchelolo created T195066: Split examples away from service-template-node.
Sat, May 19, 2:05 PM · service-template-node, Services (later)
Gerrit Code Review <gerrit@wikimedia.org> committed rMSCRcb1986cf52d5: Merge "Use metrics to provide some basic stats about the service" (authored by Pchelolo).
Merge "Use metrics to provide some basic stats about the service"
Sat, May 19, 12:16 PM

Fri, May 18

Pchelolo committed rMSCD070e5f3a96cb: [Config] Use the ores updater and emit the revision score event (authored by Pchelolo).
[Config] Use the ores updater and emit the revision score event
Fri, May 18, 4:42 PM
Pchelolo updated the task description for T194682: Run a enwiki dump to refresh content for summary and mobile-sections endpoints.
Fri, May 18, 4:02 PM · Services (done), Reading-Infrastructure-Team-Backlog, Mobile-Content-Service
Pchelolo closed T194682: Run a enwiki dump to refresh content for summary and mobile-sections endpoints as Resolved.
Fri, May 18, 4:01 PM · Services (done), Reading-Infrastructure-Team-Backlog, Mobile-Content-Service
Pchelolo closed T194682: Run a enwiki dump to refresh content for summary and mobile-sections endpoints, a subtask of T191869: Update mobile-sections and summary to source Wikidata descriptions from local wiki where available, as Resolved.
Fri, May 18, 4:01 PM · Patch-For-Review, Mobile-Content-Service, Reading-Infrastructure-Team-Backlog (Kanban)
Pchelolo added a comment to T194682: Run a enwiki dump to refresh content for summary and mobile-sections endpoints.

Done for summaries as well

Fri, May 18, 4:01 PM · Services (done), Reading-Infrastructure-Team-Backlog, Mobile-Content-Service

Tue, May 15

Pchelolo closed T189618: Investigate group.initial.rebalance.delay.ms Kafka setting as Resolved.

This was deployed to production and the number of rebalance log messages during consumer startups declined, so I'm resolving the ticket.

Tue, May 15, 11:36 PM · Services (done), User-Elukey, EventBus, Analytics
Pchelolo closed T189618: Investigate group.initial.rebalance.delay.ms Kafka setting, a subtask of T167039: Upgrade Kafka on main cluster with security features, as Resolved.
Tue, May 15, 11:36 PM · Patch-For-Review, EventBus, Services (watching), Analytics-Kanban, Analytics
Pchelolo closed T189618: Investigate group.initial.rebalance.delay.ms Kafka setting, a subtask of T179684: Kafka sometimes misses to rebalance topics properly, as Resolved.
Tue, May 15, 11:36 PM · User-Elukey, Services (doing), EventBus, Analytics
Pchelolo committed rMSCD181c3243364d: Increase ORES precaching concurrency. (authored by Pchelolo).
Increase ORES precaching concurrency.
Tue, May 15, 5:19 PM
Pchelolo added a comment to T194682: Run a enwiki dump to refresh content for summary and mobile-sections endpoints.

Mobile is done, running summaries.

Tue, May 15, 5:14 PM · Services (done), Reading-Infrastructure-Team-Backlog, Mobile-Content-Service

Mon, May 14

Pchelolo edited projects for T194682: Run a enwiki dump to refresh content for summary and mobile-sections endpoints, added: Services (doing); removed Services.

Started one for mobile-sections with concurrency 100 on restbase-dev1004 in a screen session. I will monitor for a little while to make sure the concurrency's fine.

Mon, May 14, 7:43 PM · Services (done), Reading-Infrastructure-Team-Backlog, Mobile-Content-Service
Pchelolo added a comment to T191785: Implement Content Service endpoint for availability of feed content by Wikipedia languages.

@Pchelolo do you have any other objections or were you just looking to reuse schemas for consistency?

Mon, May 14, 6:59 PM · Services (watching), Patch-For-Review, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly), Reading-Infrastructure-Team-Backlog (Kanban), Android-app-feature-Multilingual
Pchelolo added a comment to T191785: Implement Content Service endpoint for availability of feed content by Wikipedia languages.

Ye, it might be more complex to parse indeed. Just throwing out the ideas, feel free to discard it.

Mon, May 14, 6:48 PM · Services (watching), Patch-For-Review, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly), Reading-Infrastructure-Team-Backlog (Kanban), Android-app-feature-Multilingual
Pchelolo added a comment to T191785: Implement Content Service endpoint for availability of feed content by Wikipedia languages.

@bearND heh, I've just copy-pasted this from the config, obviously it should return this in json format.

Mon, May 14, 6:15 PM · Services (watching), Patch-For-Review, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly), Reading-Infrastructure-Team-Backlog (Kanban), Android-app-feature-Multilingual
Pchelolo added a comment to T194277: The page translation system repeatedly trying to remove already removed translations.

@Pchelolo TranslateDeleteJobs are not being run in parallel in two job queues, are they?

Mon, May 14, 6:08 PM · MediaWiki-extensions-Translate
Pchelolo added a comment to T191785: Implement Content Service endpoint for availability of feed content by Wikipedia languages.

@bearND basically just the schema we already have, just adjusted per-domain:

Mon, May 14, 5:55 PM · Services (watching), Patch-For-Review, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly), Reading-Infrastructure-Team-Backlog (Kanban), Android-app-feature-Multilingual

Fri, May 11

Pchelolo moved T189618: Investigate group.initial.rebalance.delay.ms Kafka setting from blocked to doing on the Services board.
Fri, May 11, 7:22 PM · Services (done), User-Elukey, EventBus, Analytics
Pchelolo closed T193230: EventBus HTTP Proxy service does not report errors to logstash as Resolved.
Fri, May 11, 7:05 PM · Services (done), Analytics-Kanban, Wikimedia-Logstash, EventBus, Analytics
Pchelolo added a comment to T193230: EventBus HTTP Proxy service does not report errors to logstash.

We've got the logs in logstash, thank you @Ottomata

Fri, May 11, 7:05 PM · Services (done), Analytics-Kanban, Wikimedia-Logstash, EventBus, Analytics
Pchelolo edited projects for T189358: Log API path for RESTBAse errors, added: Services (doing); removed Services (next).

Finally after a bunch of logging enhancements, it's now possible to do this with the following PR: https://github.com/wikimedia/hyperswitch/pull/90

Fri, May 11, 6:58 PM · Services (doing), Reading List Service V1, Reading List Service, Reading-Infrastructure-Team-Backlog, RESTBase
Pchelolo added a comment to T191785: Implement Content Service endpoint for availability of feed content by Wikipedia languages.

I'm wondering if it would be nicer to expose the actual JSON schema of the expected response and not invent a custom format?

Fri, May 11, 6:40 PM · Services (watching), Patch-For-Review, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly), Reading-Infrastructure-Team-Backlog (Kanban), Android-app-feature-Multilingual
Pchelolo edited projects for T194190: Infinite rerender loop in RESTBase, added: Services (later); removed Services (doing).
Fri, May 11, 4:49 PM · Services (later), RESTBase
Pchelolo added a comment to T194190: Infinite rerender loop in RESTBase.

After breaking the loop it seems to have stopped. Digging through the code for a whole day was not fruitful, and even an attempt to recreate a similar situation by manually inserting a non-existent revision into the storage didn't let me reproduce this.

Fri, May 11, 4:48 PM · Services (later), RESTBase

Thu, May 10

Pchelolo added a comment to T167180: Emit revision-score event to EventBus and expose in EventStreams.

@Ottomata, when we send the revision-create event to the ORES precache endpoint, we get the scores as a result, but in the config we do not have the capability to inject those results into the event and re-send it. What we could do is write a JS module to handle that.

Thu, May 10, 4:56 PM · Scoring-platform-team, Trending-Service, Reading-Infrastructure-Team-Backlog, Patch-For-Review, Analytics, EventBus, ORES

Wed, May 9

Pchelolo added a comment to T193230: EventBus HTTP Proxy service does not report errors to logstash.

HA! gelf as the solution? I've told you!!!

Wed, May 9, 8:01 PM · Services (done), Analytics-Kanban, Wikimedia-Logstash, EventBus, Analytics
Pchelolo updated subscribers of T194276: Review all VMs in the 'services' WMCS project.

pdfservice can go away for a well-deserved retirement
swproxy can go away for a well-deserved retirement
cc @bearND re appservice, do we have a real beta cluster instance already? RB is still going to the appservice for tests
cc @mobrovac @Mvolz re zotero-test citoif-test citoid-jessie-test sca1

Wed, May 9, 6:48 PM · Performance-Team (Radar), Services (attic)
Pchelolo updated subscribers of T189618: Investigate group.initial.rebalance.delay.ms Kafka setting.

@Ottomata @elukey now that we were successful in upgrading Kafka, I think we can try increasing this to 10 seconds. Do you think the number is reasonable?

Wed, May 9, 6:36 PM · Services (done), User-Elukey, EventBus, Analytics

Tue, May 8

Pchelolo closed T192107: Unable to mark pages for translation in Meta as Resolved.
Tue, May 8, 9:47 PM · EventBus, Services (done), Analytics, MediaWiki-JobQueue, MediaWiki-extensions-Translate
Pchelolo added a comment to T192107: Unable to mark pages for translation in Meta.

I think it's time to close this one. Please reopen if that breaks again during the transition process.

Tue, May 8, 9:46 PM · EventBus, Services (done), Analytics, MediaWiki-JobQueue, MediaWiki-extensions-Translate
Pchelolo added a comment to T194190: Infinite rerender loop in RESTBase.

Interesting, that revision 269290610 actually did exist for the page once, but it was somehow deleted, since that revision ID exists in MySQL archive table.

Tue, May 8, 8:01 PM · Services (later), RESTBase
Pchelolo triaged T194190: Infinite rerender loop in RESTBase as High priority.
Tue, May 8, 6:09 PM · Services (later), RESTBase
Pchelolo created T194190: Infinite rerender loop in RESTBase.
Tue, May 8, 6:09 PM · Services (later), RESTBase
Pchelolo updated the task description for T167039: Upgrade Kafka on main cluster with security features.
Tue, May 8, 3:04 PM · Patch-For-Review, EventBus, Services (watching), Analytics-Kanban, Analytics
Pchelolo added a comment to T179684: Kafka sometimes misses to rebalance topics properly.

Oh, sorry. It actually just happened again at 07:13 UTC:

Tue, May 8, 1:38 PM · User-Elukey, Services (doing), EventBus, Analytics

Mon, May 7

Pchelolo added a comment to T179684: Kafka sometimes misses to rebalance topics properly.

This happened again today with the on_transclusions_update group: it just stopped being consumed completely without a visible reason. There are some logs regarding the topic that this group was consuming and some messages regarding it being rebalanced, but no crazy multi-generation reassignment logs.

Mon, May 7, 10:55 PM · User-Elukey, Services (doing), EventBus, Analytics
Pchelolo updated the task description for T167039: Upgrade Kafka on main cluster with security features.
Mon, May 7, 9:45 PM · Patch-For-Review, EventBus, Services (watching), Analytics-Kanban, Analytics
Pchelolo added a comment to T189357: Improve logging of MW API exceptions in RESTBase.

@Tgr the link you provided doesn't work and I can't find instances of logs that look like the one you're talking about in restbase logs. Can you show one again please?

Mon, May 7, 9:19 PM · Services (next), Reading List Service V1, Reading List Service, RESTBase, Reading-Infrastructure-Team-Backlog
Pchelolo updated the task description for T167039: Upgrade Kafka on main cluster with security features.
Mon, May 7, 5:25 PM · Patch-For-Review, EventBus, Services (watching), Analytics-Kanban, Analytics
Pchelolo added a comment to T193790: Global renames get stuck at ty.wikipedia.

Doesn't seem related to the job queue either, as there are no job-related logs and it was propagating correctly.

Mon, May 7, 2:50 PM · Wikimedia-Site-requests, MediaWiki-extensions-CentralAuth, GlobalRename

Thu, May 3

Pchelolo created T193773: Improvements to the new storage deletion mechanism.
Thu, May 3, 5:44 PM · RESTBase-Cassandra, RESTBase, Services (later)
Pchelolo closed T193080: Enable snappy compression for eventbus Kafka producer as Resolved.

We've enabled it for change-prop instances as well, so I consider this task resolved.

Thu, May 3, 4:51 PM · Services (done), Analytics-Kanban, EventBus
Pchelolo updated subscribers of T193230: EventBus HTTP Proxy service does not report errors to logstash.

@Ottomata so we are still not getting proper logs, right? At least I can't find them ;(

Thu, May 3, 2:17 AM · Services (done), Analytics-Kanban, Wikimedia-Logstash, EventBus, Analytics

Wed, May 2

Pchelolo closed T189360: Consider logging 4xx action API errors in RESTBase as Resolved.

All the 4xx from MW Action API except 404 are logged with 1% probability now. Log entry example: https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2018.05.02/restbase?id=AWMiRbXNpesmgM3lqi_i&_g=h@b74aee6

Wed, May 2, 7:16 PM · Services (done), Reading List Service V1, Reading-Infrastructure-Team-Backlog, Reading List Service, RESTBase
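
The sampling rule described in the comment above (log 4xx responses from the Action API with 1% probability, never log 404s) could be sketched roughly as follows. This is only an illustration in Python, not the RESTBase implementation; the function name `should_log_4xx`, its parameters, and the injectable `rng` are hypothetical and exist only to make the idea testable.

```python
import random


def should_log_4xx(status, sample_rate=0.01, rng=random.random):
    """Decide whether a 4xx response should be written to the log.

    Hypothetical helper: 404s are never logged, other 4xx codes are
    logged with probability `sample_rate`. Non-4xx codes are outside
    the scope of this rule and return False here.
    """
    if not 400 <= status < 500:
        return False
    if status == 404:
        return False
    # rng() returns a float in [0.0, 1.0); log only the sampled fraction.
    return rng() < sample_rate
```

Passing a fixed `rng` (for example `lambda: 0.0`) makes the decision deterministic, which is handy when checking the rule in a unit test.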

Tue, May 1

Pchelolo committed rMSCDa68ec9a53142: Add snappy compression to messages produced directly to kafka. (authored by Pchelolo).
Add snappy compression to messages produced directly to kafka.
Tue, May 1, 9:40 PM
Pchelolo closed T193471: JobQueueGroup's singletons using the wrong wgJobTypeConf as Invalid.

Yes, wgJobTypeConf is intended to be set the same on all wikis to avoid having to shell/API out.

Tue, May 1, 8:33 PM · Services (done), MediaWiki-JobQueue
Pchelolo closed T193471: JobQueueGroup's singletons using the wrong wgJobTypeConf, a subtask of T190327: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus, as Invalid.
Tue, May 1, 8:33 PM · Patch-For-Review, MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Services (doing), Goal, EventBus, MediaWiki-JobQueue, Analytics
Pchelolo closed T181291: Separate retry and error topics between JobQueue and normal ChangeProp as Resolved.

The topics have been separated. They use the service name as prefix now.

Tue, May 1, 5:15 PM · Services (done), ChangeProp
Pchelolo closed T192363: The .meta.domain is incorrect in EventBus when other wiki is used as Resolved.

This has been deployed, the domain is now reported correctly.

Tue, May 1, 5:10 PM · Services (done), EventBus, Analytics, MediaWiki-JobQueue
Pchelolo closed T192363: The .meta.domain is incorrect in EventBus when other wiki is used, a subtask of T190327: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus, as Resolved.
Tue, May 1, 5:10 PM · Patch-For-Review, MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Services (doing), Goal, EventBus, MediaWiki-JobQueue, Analytics
Pchelolo added a comment to T192473: deployment-prep has jobqueue issues.

It seems this task got derailed completely from the original purpose.

Tue, May 1, 4:12 PM · Services, Release-Engineering-Team, MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Patch-For-Review, Puppet, Beta-Cluster-Infrastructure
Pchelolo added a comment to T189360: Consider logging 4xx action API errors in RESTBase.

PR here https://github.com/wikimedia/restbase/pull/990

Tue, May 1, 4:05 PM · Services (done), Reading List Service V1, Reading-Infrastructure-Team-Backlog, Reading List Service, RESTBase
Pchelolo added a project to T142313: Add global logging context: Services (watching).
Tue, May 1, 4:04 PM · Services (watching), User-Tgr, Developer-Wishlist (2017), MediaWiki-Debug-Logger
Pchelolo added a comment to T193417: ReadingLists performance degradation.

That suggests something times out and gets retried but I have no idea what that something might be.

Tue, May 1, 2:07 PM · Services (watching), Performance-Team (Radar), Reading-Infrastructure-Team-Backlog (Kanban), Reading List Service
Pchelolo added a comment to T191282: Wikimedia\Rdbms\LoadBalancer::{closure}: found writes pending.

Almost all seem to come from the job queue. Unfortunately, I don't think the job name is recorded.

Tue, May 1, 1:56 PM · Wikidata, MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Patch-For-Review, Services (watching), MediaWiki-Database, EventBus, JobRunner-Service, Wikimedia-log-errors, Analytics
Pchelolo added a comment to T193417: ReadingLists performance degradation.

@Tgr we don't have stats JUST for the Action API request itself, but we do have stats for calls to the action.js module. Since it's such a thin wrapper over a pure request to the Action API, and since for reading lists we only use rawquery, which is even thinner, I guess the stats for restbase.external.sys_action_rawquery.ALL.ALL.p95 would be a decent representation of the actual latencies we see in requests from RESTBase to Action API:

Tue, May 1, 1:34 PM · Services (watching), Performance-Team (Radar), Reading-Infrastructure-Team-Backlog (Kanban), Reading List Service
Pchelolo added a comment to T193417: ReadingLists performance degradation.

Also, playing with latencies dashboard on RESTBase level, we have the ability to separate latencies by response code and I can see the same degradation for 2xx as for 4xx:

Tue, May 1, 1:29 PM · Services (watching), Performance-Team (Radar), Reading-Infrastructure-Team-Backlog (Kanban), Reading List Service
Pchelolo added a project to T193417: ReadingLists performance degradation: Services (watching).
Tue, May 1, 1:21 PM · Services (watching), Performance-Team (Radar), Reading-Infrastructure-Team-Backlog (Kanban), Reading List Service
Pchelolo added a comment to T193417: ReadingLists performance degradation.

@mobrovac @Pchelolo does RESTBase log anything that I could use to cross-correlate MW API request durations?

Tue, May 1, 1:21 PM · Services (watching), Performance-Team (Radar), Reading-Infrastructure-Team-Backlog (Kanban), Reading List Service

Mon, Apr 30

Pchelolo added a comment to T193254: Global renames get stuck at metawiki.

Other instances of cross-wiki job scheduling that are yielded by a quick ack 'JobQueueGroup::singleton\( ': Cognate/LocalJobSubmitJob, MassMessage/MassMessageSubmitJob, GlobalUsage/GlobalUsageCachePurgeJob, GlobalUserPage/LocalJobSubmitJob, SecurePoll/PopulateVoterListJob.

Mon, Apr 30, 9:46 PM · Services (done), Analytics, EventBus, MediaWiki-JobQueue, Operations, Wikimedia-log-errors, GlobalRename, Wikimedia-Site-requests, MediaWiki-extensions-CentralAuth
Pchelolo added a comment to T193254: Global renames get stuck at metawiki.

Do we need to migrate CentralAuthRename too? If so, can it be done? Thanks.

Mon, Apr 30, 9:24 PM · Services (done), Analytics, EventBus, MediaWiki-JobQueue, Operations, Wikimedia-log-errors, GlobalRename, Wikimedia-Site-requests, MediaWiki-extensions-CentralAuth
Pchelolo moved T185233: Modern Event Platform (with EventLogging of the Future (EoF)) from Backlog to watching on the Services board.
Mon, Apr 30, 8:43 PM · Services (watching), Analytics-EventLogging, EventBus, Analytics, Analytics-Kanban
Gerrit Code Review <gerrit@wikimedia.org> committed rMSCD8cd45edf9b80: Merge "Don't filter bots from the ORES stream" (authored by Pchelolo).
Merge "Don't filter bots from the ORES stream"
Mon, Apr 30, 8:25 PM
Pchelolo added a comment to T193254: Global renames get stuck at metawiki.

As you know, we can't say it is resolved until we are sure, because there are many pending requests. If we told all global renamers that the issue is solved, there would be a large number of rename processes in the log!

Mon, Apr 30, 6:38 PM · Services (done), Analytics, EventBus, MediaWiki-JobQueue, Operations, Wikimedia-log-errors, GlobalRename, Wikimedia-Site-requests, MediaWiki-extensions-CentralAuth
Pchelolo added a comment to T193254: Global renames get stuck at metawiki.

The problem started within hours of Kafka being enabled on mediawikiwiki, and it affects the wiki that's after mediawikiwiki alphabetically (which is the order global renames go), so it seems pretty likely there is a connection. (Except Husseinzadeh02/Hüseynzadə which also got stuck on the next wiki, minwiki, and the job did not finish properly on mediawikiwiki either. No idea what's up with that one.)

Mon, Apr 30, 4:21 PM · Services (done), Analytics, EventBus, MediaWiki-JobQueue, Operations, Wikimedia-log-errors, GlobalRename, Wikimedia-Site-requests, MediaWiki-extensions-CentralAuth
Pchelolo added a comment to T193254: Global renames get stuck at metawiki.

@mobrovac do you know if LocalRenameUserJob jobs on meta (and only there) could somehow be affected by the Redis-Kafka migration? I'm probably grasping at straws here, but not sure where else to look.

Mon, Apr 30, 3:04 PM · Services (done), Analytics, EventBus, MediaWiki-JobQueue, Operations, Wikimedia-log-errors, GlobalRename, Wikimedia-Site-requests, MediaWiki-extensions-CentralAuth

Thu, Apr 26

Pchelolo committed rMSCDf2f7a847e40e: Update change-propagation to 4d885a7 (authored by Pchelolo).
Update change-propagation to 4d885a7
Thu, Apr 26, 3:24 PM

Wed, Apr 25

Pchelolo added a comment to T189137: Migrate CirrusSearch jobs to Kafka queue.

Consider troubleshooting some problem with kafkacat -C | jq .

Wed, Apr 25, 3:22 PM · Patch-For-Review, Services (doing), MediaWiki-JobQueue, ChangeProp, EventBus, Operations, User-Joe, Analytics, User-Elukey
Pchelolo added a comment to T189137: Migrate CirrusSearch jobs to Kafka queue.

I've run some analysis on the logs and indeed sometimes the cirrusSearchElasticaWrite job is too large. Here are the sizes in bytes for all the log entries I could find so far:

Wed, Apr 25, 2:58 PM · Patch-For-Review, Services (doing), MediaWiki-JobQueue, ChangeProp, EventBus, Operations, User-Joe, Analytics, User-Elukey
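
A size check of the kind mentioned in the comment above might look roughly like the sketch below. It assumes jobs are available as JSON-serializable dictionaries with a `type` key; that shape, the helper name `oversized_jobs`, and the `max_bytes` threshold are illustrative assumptions, since the feed states neither the actual limit nor the job format.

```python
import json


def oversized_jobs(jobs, max_bytes):
    """Return (job_type, size_in_bytes) for every job whose UTF-8 encoded
    JSON serialization is larger than max_bytes.

    Hypothetical helper: `jobs` is an iterable of dicts, `max_bytes` is
    whatever message size limit applies in the deployment.
    """
    result = []
    for job in jobs:
        size = len(json.dumps(job).encode("utf-8"))
        if size > max_bytes:
            result.append((job.get("type"), size))
    return result
```

For example, calling `oversized_jobs(jobs, 1000)` on a list containing one small job and one job carrying a multi-kilobyte document would return only the entry for the large job.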
Pchelolo created P7040 Too large cirrusSearchElasticaWrite jobs.
Wed, Apr 25, 2:55 PM
Pchelolo added a comment to T189137: Migrate CirrusSearch jobs to Kafka queue.

If there is a way to monitor such errors, I guess we can pick up known large pages and modify them while the writes are frozen?

Wed, Apr 25, 2:26 PM · Patch-For-Review, Services (doing), MediaWiki-JobQueue, ChangeProp, EventBus, Operations, User-Joe, Analytics, User-Elukey
Pchelolo added a comment to T189137: Migrate CirrusSearch jobs to Kafka queue.

When we freeze writes we start to push ElasticaWrite jobs that contain the full page doc which can be relatively large. We had to raise some limits in the past due to that (nginx request size when we added nginx in front of elastic).

Wed, Apr 25, 2:14 PM · Patch-For-Review, Services (doing), MediaWiki-JobQueue, ChangeProp, EventBus, Operations, User-Joe, Analytics, User-Elukey
Pchelolo added a comment to T189137: Migrate CirrusSearch jobs to Kafka queue.

The subtasks that were created to fix issues discovered during the first iteration of the switch were resolved, and I don't see any logs indicating there are problems, so it seems like nothing is blocking us from moving some more projects to the Kafka queue.

Wed, Apr 25, 2:09 PM · Patch-For-Review, Services (doing), MediaWiki-JobQueue, ChangeProp, EventBus, Operations, User-Joe, Analytics, User-Elukey
Pchelolo created P7039 cirrusSearchncomingLinksCount distribution.
Wed, Apr 25, 2:04 PM
Pchelolo closed T191024: Exception thrown while running DataSender::sendData in cluster codfw: Data should be a Document, a Script or an array containing Documents and/or Scripts as Resolved.

We might want to test more wikis or all of them perhaps?

Wed, Apr 25, 1:51 PM · Analytics, Services (done), MW-1.31-release-notes (WMF-deploy-2018-04-17 (1.31.0-wmf.30)), Discovery-Search (Current work), MediaWiki-JobQueue, EventBus, Discovery, CirrusSearch
Pchelolo closed T191024: Exception thrown while running DataSender::sendData in cluster codfw: Data should be a Document, a Script or an array containing Documents and/or Scripts, a subtask of T189137: Migrate CirrusSearch jobs to Kafka queue, as Resolved.
Wed, Apr 25, 1:51 PM · Patch-For-Review, Services (doing), MediaWiki-JobQueue, ChangeProp, EventBus, Operations, User-Joe, Analytics, User-Elukey
Pchelolo added a comment to T191024: Exception thrown while running DataSender::sendData in cluster codfw: Data should be a Document, a Script or an array containing Documents and/or Scripts.

I believe the fix for it has been deployed and we can try to proceed with switching cirrus search for some more wikis?

Wed, Apr 25, 1:44 PM · Analytics, Services (done), MW-1.31-release-notes (WMF-deploy-2018-04-17 (1.31.0-wmf.30)), Discovery-Search (Current work), MediaWiki-JobQueue, EventBus, Discovery, CirrusSearch
Pchelolo closed T192405: LocalGlobalUserPageCacheUpdateJob always fails as Resolved.

This has been resolved by enabling EventBus extension on loginwiki wiki with T191464

Wed, Apr 25, 12:56 PM · Analytics, Services (done), MediaWiki-JobQueue, EventBus
Pchelolo closed T191464: Enable CP4JQ support for private wikis as Resolved.

Support was enabled for all wikis except wikitech (see T192361 for reasoning). Resolving.

Wed, Apr 25, 12:51 PM · Services (done), MW-1.31-release-notes (WMF-deploy-2018-04-10 (1.31.0-wmf.29)), Analytics, ChangeProp, MediaWiki-JobQueue, EventBus
Pchelolo closed T191464: Enable CP4JQ support for private wikis, a subtask of T190327: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus, as Resolved.
Wed, Apr 25, 12:51 PM · Patch-For-Review, MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Services (doing), Goal, EventBus, MediaWiki-JobQueue, Analytics

Tue, Apr 24

Pchelolo added projects to T192946: Make gwtoolsetUploadMediafileJob JSON-serializable: MediaWiki-extensions-GWToolset, Multimedia.
Tue, Apr 24, 6:30 PM · Multimedia, MediaWiki-extensions-GWToolset, Commons, Services (blocked), EventBus, Analytics, MediaWiki-JobQueue
Pchelolo updated the task description for T192945: Make EchoNotification job JSON-serializable.
Tue, Apr 24, 5:51 PM · MW-1.32-release-notes (WMF-deploy-2018-05-22 (1.32.0-wmf.5)), Patch-For-Review, Collaboration-Team-Triage (Collab-Team-This-Quarter), Analytics, Notifications, Services (blocked), EventBus, MediaWiki-JobQueue
Pchelolo triaged T192946: Make gwtoolsetUploadMediafileJob JSON-serializable as Normal priority.
Tue, Apr 24, 5:50 PM · Multimedia, MediaWiki-extensions-GWToolset, Commons, Services (blocked), EventBus, Analytics, MediaWiki-JobQueue
Pchelolo triaged T192945: Make EchoNotification job JSON-serializable as Normal priority.
Tue, Apr 24, 5:48 PM · MW-1.32-release-notes (WMF-deploy-2018-05-22 (1.32.0-wmf.5)), Patch-For-Review, Collaboration-Team-Triage (Collab-Team-This-Quarter), Analytics, Notifications, Services (blocked), EventBus, MediaWiki-JobQueue

Apr 18 2018

Pchelolo added a comment to T192371: Consider stopping mobile regeneration for unreachable namespaces.

So, we have 2 options on how to implement this.

Apr 18 2018, 4:00 PM · Services (designing), Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Mobile-Content-Service, Reading-Infrastructure-Team-Backlog
Pchelolo committed rMSCDd83fad356b8b: Update change-propagation to 60dcce0 (authored by Pchelolo).
Update change-propagation to 60dcce0
Apr 18 2018, 3:59 PM
Pchelolo committed rMSCP2d75430ac1f9: Update dependencies (authored by Pchelolo).
Update dependencies
Apr 18 2018, 3:33 PM

Apr 17 2018

Pchelolo created T192405: LocalGlobalUserPageCacheUpdateJob always fails.
Apr 17 2018, 8:35 PM · Analytics, Services (done), MediaWiki-JobQueue, EventBus
Pchelolo added projects to T187102: Vagrant's /var/log/daemon.log filling up with kafka errors: Analytics, Services (watching).
Apr 17 2018, 5:03 PM · Services (watching), Analytics, MediaWiki-Vagrant
Pchelolo created T192371: Consider stopping mobile regeneration for unreachable namespaces.
Apr 17 2018, 3:15 PM · Services (designing), Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Mobile-Content-Service, Reading-Infrastructure-Team-Backlog
Pchelolo triaged T192363: The .meta.domain is incorrect in EventBus when other wiki is used as Normal priority.
Apr 17 2018, 2:00 PM · Services (done), EventBus, Analytics, MediaWiki-JobQueue
Pchelolo triaged T192361: Transfer wikitech jobs to Kafka queue as Normal priority.
Apr 17 2018, 1:44 PM · Services (done), wikitech.wikimedia.org, EventBus, MediaWiki-JobQueue, Analytics
Pchelolo closed T192287: [Bug] Beta cluster page summary endpoint sometimes reponds with 5xx as Resolved.

The deployment-mediawiki04.deployment-prep.eqiad.wmflabs host was removed per T192071 - that explains the issue. I think this can be resolved now, please reopen if it comes back.

Apr 17 2018, 12:52 PM · Services (done), Reading-Infrastructure-Team-Backlog (Kanban), Readers-Web-Backlog (Tracking), Operations, Beta-Cluster-Infrastructure, Mobile-Content-Service, Page-Previews

Apr 16 2018

Pchelolo updated the task description for T175210: Select candidate jobs for transferring to the new infrastucture.
Apr 16 2018, 9:37 PM · Services (doing), MediaWiki-JobQueue, ChangeProp, Analytics, EventBus, Operations, User-Joe, User-Elukey
Pchelolo committed rMSCD45385daa0716: Update change-propagation to 7bacc72 (authored by Pchelolo).
Update change-propagation to 7bacc72
Apr 16 2018, 3:43 PM

Apr 12 2018

Pchelolo triaged T192111: Make TranslationsUpdateJob JSON-serializable as High priority.
Apr 12 2018, 8:47 PM · Services (done), MW-1.32-release-notes (WMF-deploy-2018-05-01 (1.32.0-wmf.2)), Language-2018-Apr-June, MediaWiki-extensions-Translate