
Apr 2 2019

Pchelolo claimed T211125: Move service-runner to new logging infrastructure.
Apr 2 2019, 7:17 PM · observability, Platform Team Legacy (Watching / External), service-runner, Wikimedia-Logstash, SRE
Pchelolo added a comment to T211125: Move service-runner to new logging infrastructure.

I have deployed a new pipeline for RESTBase in production and it all looks great. Next step - convert other services. I will try it out on change-prop and create subtasks for individual services.

Apr 2 2019, 7:09 PM · observability, Platform Team Legacy (Watching / External), service-runner, Wikimedia-Logstash, SRE
Pchelolo added a comment to T219738: PHP Warning: Array key should be either a string or an integer.

If you could verify on Beta prior to deployment, however, that would be helpful.

Apr 2 2019, 7:05 PM · Analytics-Radar, Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), Platform Team Workboards (Done with CPT), MW-1.33-notes (1.33.0-wmf.25; 2019-04-09), Beta-Cluster-reproducible, Event-Platform, Wikimedia-production-error
Pchelolo added a comment to T218218: Make RESTBase spec standard compliant and switch to OpenAPI 3.0.

The new UI has been deployed. Next step here - explore the new features in OpenAPI 3.0, see what we can start using, and convert the specs to 3.0.

Apr 2 2019, 6:04 PM · Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), RESTBase, RESTBase-API, serviceops, SRE
Pchelolo added a comment to T219738: PHP Warning: Array key should be either a string or an integer.

https://gerrit.wikimedia.org/r/500363 fixes it. I don't want to self-merge my own patch, though.

Apr 2 2019, 5:50 PM · Analytics-Radar, Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), Platform Team Workboards (Done with CPT), MW-1.33-notes (1.33.0-wmf.25; 2019-04-09), Beta-Cluster-reproducible, Event-Platform, Wikimedia-production-error
Pchelolo created T219900: Evaluate url-template package for use in hyperswitch.
Apr 2 2019, 5:14 PM · Platform Team Initiatives (RESTBase Split (CDP2)), Platform Team Workboards (Clinic Duty Team), RESTBase
Pchelolo updated the task description for T218218: Make RESTBase spec standard compliant and switch to OpenAPI 3.0.
Apr 2 2019, 4:40 PM · Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), RESTBase, RESTBase-API, serviceops, SRE

Mar 31 2019

Pchelolo added a comment to T219737: Ability to create blocks broken.

Thank you for catching this early and sorry for this.

Mar 31 2019, 7:50 PM · Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), Platform Team Workboards (Done with CPT), MW-1.33-notes (1.33.0-wmf.25; 2019-04-09), Beta-Cluster-reproducible, Event-Platform, Analytics

Mar 29 2019

Pchelolo added a comment to T219148: Use PHP7 to run all async jobs.

If I understand correctly, in order to switch a particular job execution to PHP7, all we need to do is add the Cookie: PHP_ENGINE=php7 header to the request.

Mar 29 2019, 5:31 PM · User-WDoran, Platform Team Workboards (Clinic Duty Team), User-jijiki, Services (watching), SRE, serviceops
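A minimal sketch of the idea in the T219148 comment above, in Python for illustration only: attach the Cookie: PHP_ENGINE=php7 header when building a job-execution request. The URL, the function name build_job_request, and the use_php7 flag are hypothetical and are not taken from change-prop or the job runner.

```python
from urllib.request import Request


def build_job_request(url: str, body: bytes, use_php7: bool) -> Request:
    """Build (but do not send) a POST request for a single job execution.

    Hypothetical helper: only the Cookie header value comes from the comment.
    """
    request = Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    if use_php7:
        # Header named in the comment: routes this one execution to PHP7.
        request.add_header("Cookie", "PHP_ENGINE=php7")
    return request


# Hypothetical endpoint, used only to show the resulting headers.
req = build_job_request("http://jobrunner.example/run", b"{}", use_php7=True)
print(req.get_header("Cookie"))
```

Because the switch is a per-request header, it could in principle be enabled for one job type at a time rather than for all jobs at once.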
Pchelolo added a project to T219148: Use PHP7 to run all async jobs: Services (watching).
Mar 29 2019, 5:12 PM · User-WDoran, Platform Team Workboards (Clinic Duty Team), User-jijiki, Services (watching), SRE, serviceops
Pchelolo moved T219520: Replace online validator with swagger-cli from Inbox to Later on the Platform Team Legacy board.
Mar 29 2019, 4:29 PM · Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (RESTBase Split (CDP2)), RESTBase-API, RESTBase
Pchelolo added a comment to T219548: restbase-mod-table-* simplification and improvements.

I think I failed to explain the reasoning behind this in enough detail, leaving a lot of room for confusion. I'll try to fix this mistake.

Mar 29 2019, 1:41 PM · Platform Engineering (Needs Cleaning - Cassandra Operational), RESTBase
Pchelolo added a project to T219556: Create schema[12]00[12] (schema.svc.{eqiad,codfw}.wmnet): Platform Engineering (Modern Event Platform (TEC2)).
Mar 29 2019, 12:07 PM · Analytics-Kanban, Patch-For-Review, Platform Team Legacy (Watching / External), Platform Engineering (Modern Event Platform (TEC2)), Services (watching), SRE, vm-requests, Event-Platform, Analytics
Pchelolo moved T219556: Create schema[12]00[12] (schema.svc.{eqiad,codfw}.wmnet) from Backlog to watching on the Services board.
Mar 29 2019, 12:07 PM · Analytics-Kanban, Patch-For-Review, Platform Team Legacy (Watching / External), Platform Engineering (Modern Event Platform (TEC2)), Services (watching), SRE, vm-requests, Event-Platform, Analytics
Pchelolo moved T219552: Schema Registry HTTP Service from Backlog to watching on the Services board.
Mar 29 2019, 12:06 PM · Analytics-Kanban, Patch-For-Review, Platform Engineering (Modern Event Platform (TEC2)), Services (watching), Platform Team Legacy (Watching / External), Event-Platform, Analytics

Mar 28 2019

Pchelolo created T219548: restbase-mod-table-* simplification and improvements.
Mar 28 2019, 7:35 PM · Platform Engineering (Needs Cleaning - Cassandra Operational), RESTBase
Pchelolo closed T219159: Partition htmlCacheUpdate job topic as Resolved.

We have deployed the partitioner for the htmlCacheUpdate job and it's now running in production. We have created some lag in the process, but it should clear out soon.

Mar 28 2019, 2:48 PM · Analytics-Radar, Platform Team Workboards (Done with CPT), Services (done), WMF-JobQueue, Event-Platform

Mar 27 2019

Pchelolo committed rMSCP460c938357dd: Support templates in partitioned topics names.
Support templates in partitioned topics names
Mar 27 2019, 8:00 PM
Pchelolo committed rMSCPdbdce749932a: Support templates in partitioned topics names.
Support templates in partitioned topics names
Mar 27 2019, 6:58 PM
Pchelolo created T219427: Use service-runner provided test server in service-template.
Mar 27 2019, 5:44 PM · Platform Team Legacy (Later), Services (later), service-template-node
Pchelolo created T219425: preq emits unhandled rejection on socket timeout.
Mar 27 2019, 5:21 PM · Platform Engineering (Icebox), RESTBase
Pchelolo updated the task description for T219385: Remove SIG* listeners in service runner on stop.
Mar 27 2019, 2:05 PM · Platform Team Workboards (Done with CPT), Services (done), service-runner
Pchelolo created T219386: Use service-runner test service in change-prop tests.
Mar 27 2019, 2:03 PM · Platform Team Legacy (Later), ChangeProp, Services (later)
Pchelolo created T219385: Remove SIG* listeners in service runner on stop.
Mar 27 2019, 1:59 PM · Platform Team Workboards (Done with CPT), Services (done), service-runner

Mar 26 2019

Pchelolo added a comment to T216567: mediawiki/recentchange event should not use fields with polymorphic types.

After the patch was deployed, we do not have nulls in the recent change schema anymore; however, we still cannot declare victory and get rid of all of the polymorphic types in the schema. The log_params can be either an object or an array and, judging by the code, it can actually be a non-empty array in rare cases. Not sure what to do about that.

Mar 26 2019, 5:52 PM · Patch-For-Review, Platform Team Initiatives (Modern Event Platform (TEC2)), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Analytics, Services (next), Event-Platform
Pchelolo added a comment to T219159: Partition htmlCacheUpdate job topic.

Actually, the existing topic needs to be left alone, but 2 new topics with 8 partitions each need to be created:

Mar 26 2019, 1:45 PM · Analytics-Radar, Platform Team Workboards (Done with CPT), Services (done), WMF-JobQueue, Event-Platform

Mar 25 2019

Pchelolo closed T218260: Decrease timeout for EventBus extension for analytics events, a subtask of T218255: Enabling api-request eventgate to group1 caused minor service disruptions, as Resolved.
Mar 25 2019, 7:01 PM · Platform Team Workboards (Done with CPT), Services (done), Analytics, Event-Platform, Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), serviceops, Wikimedia-Incident, SRE
Pchelolo closed T218260: Decrease timeout for EventBus extension for analytics events as Resolved.

Merged and deployed as a part of SWAT. Resolving.

Mar 25 2019, 7:01 PM · Platform Team Workboards (Done with CPT), Services (done), Analytics-Kanban, Analytics, Event-Platform
Pchelolo reassigned T218218: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 from holger.knust to Clarakosi.

For step 2 we need to switch hyperswitch to upstream swagger.

Mar 25 2019, 4:06 PM · Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), RESTBase, RESTBase-API, serviceops, SRE
Pchelolo added a comment to T219159: Partition htmlCacheUpdate job topic.

@Ottomata yes, but not just yet, we still need to prepare the patches etc.

Mar 25 2019, 2:28 PM · Analytics-Radar, Platform Team Workboards (Done with CPT), Services (done), WMF-JobQueue, Event-Platform
Pchelolo updated the task description for T219159: Partition htmlCacheUpdate job topic.
Mar 25 2019, 2:08 PM · Analytics-Radar, Platform Team Workboards (Done with CPT), Services (done), WMF-JobQueue, Event-Platform
Pchelolo created T219159: Partition htmlCacheUpdate job topic.
Mar 25 2019, 2:07 PM · Analytics-Radar, Platform Team Workboards (Done with CPT), Services (done), WMF-JobQueue, Event-Platform

Mar 22 2019

Pchelolo moved T218812: RFC: Provide the ability to have time-delayed or time-offset jobs in the job queue from next to watching on the Services board.
Mar 22 2019, 11:28 AM · Data-Engineering-Icebox, Analytics-Radar, User-ArielGlenn, Platform Team Legacy (Watching / External), serviceops-radar, TechCom-RFC, ChangeProp, WMF-JobQueue, Community-Tech

Mar 21 2019

Pchelolo committed rMSCD7bdc068c5bb8: Update change-propagation to c3d6639.
Update change-propagation to c3d6639
Mar 21 2019, 2:53 PM
Pchelolo added a comment to T218812: RFC: Provide the ability to have time-delayed or time-offset jobs in the job queue.

There is already an ability to execute jobs after a delay or at a more-or-less specific time, but it's really not something we want to build on.

Mar 21 2019, 12:55 PM · Data-Engineering-Icebox, Analytics-Radar, User-ArielGlenn, Platform Team Legacy (Watching / External), serviceops-radar, TechCom-RFC, ChangeProp, WMF-JobQueue, Community-Tech
Pchelolo closed T218396: Make change-prop tests independent of Kafka and Redis as Resolved.

Now it's ready - CP tests are independent of both Kafka and Redis.

Mar 21 2019, 10:59 AM · Platform Team Workboards (Done with CPT), Services (done), Release Pipeline, serviceops, ChangeProp
Pchelolo closed T218396: Make change-prop tests independent of Kafka and Redis, a subtask of T213193: Migrate changeprop to kubernetes, as Resolved.
Mar 21 2019, 10:59 AM · Patch-For-Review, Release-Engineering-Team (Pipeline), Release-Engineering-Team-TODO, Services (watching), Release Pipeline, serviceops, ChangeProp
Pchelolo committed rMSCDd2ec6849cbfc: Change enable_blacklist to disable_blacklist..
Change enable_blacklist to disable_blacklist.
Mar 21 2019, 12:18 AM

Mar 20 2019

Pchelolo added a comment to T210651: Switch all PDF render traffic to new Proton service.

mediawiki-vagrant should also be updated to support the new proton role.

Mar 20 2019, 6:46 PM · User-notice-archive, Platform Team Workboards (Done with CPT), Services (done), Product-Infrastructure-Team-Backlog-Deprecated, Web-Team-Backlog (Tracking), Proton
Pchelolo reopened T218396: Make change-prop tests independent of Kafka and Redis as "Open".

Oh, no, not resolving yet. Next step - mock redis.

Mar 20 2019, 2:16 PM · Platform Team Workboards (Done with CPT), Services (done), Release Pipeline, serviceops, ChangeProp
Pchelolo reopened T218396: Make change-prop tests independent of Kafka and Redis, a subtask of T213193: Migrate changeprop to kubernetes, as Open.
Mar 20 2019, 2:16 PM · Patch-For-Review, Release-Engineering-Team (Pipeline), Release-Engineering-Team-TODO, Services (watching), Release Pipeline, serviceops, ChangeProp
Pchelolo closed T218396: Make change-prop tests independent of Kafka and Redis as Resolved.

The PR has been merged, resolving.

Mar 20 2019, 2:14 PM · Platform Team Workboards (Done with CPT), Services (done), Release Pipeline, serviceops, ChangeProp
Pchelolo closed T218396: Make change-prop tests independent of Kafka and Redis, a subtask of T213193: Migrate changeprop to kubernetes, as Resolved.
Mar 20 2019, 2:14 PM · Patch-For-Review, Release-Engineering-Team (Pipeline), Release-Engineering-Team-TODO, Services (watching), Release Pipeline, serviceops, ChangeProp

Mar 19 2019

Pchelolo added a comment to T218252: [Bug] Sporadic 503 errors when editing.

Can this be resolved?

Mar 19 2019, 2:27 PM · Services, Wikimedia-production-error, Web-Team-Backlog (Tracking), VisualEditor, SRE
Pchelolo closed T218274: `rev_parent_id` and `rev_content_changed` are missing in event.mediawiki_revision_tags_change as Resolved.

The rev_content_changed has been removed from the schema and after the train we will ensure rev_parent_id is present in all the events. Resolving.

Mar 19 2019, 2:25 PM · Analytics-Radar, Platform Team Workboards (Done with CPT), Services (done), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Product-Analytics

Mar 18 2019

Pchelolo added a comment to T218274: `rev_parent_id` and `rev_content_changed` are missing in event.mediawiki_revision_tags_change.

I think that the schema is incorrect here.

Mar 18 2019, 3:40 PM · Analytics-Radar, Platform Team Workboards (Done with CPT), Services (done), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Product-Analytics
Pchelolo added a comment to T211125: Move service-runner to new logging infrastructure.

After enabling logging over syslog for RESTBase in deployment-prep, we have identified a number of disparities between node services and, for example, mediawiki.

Mar 18 2019, 3:08 PM · observability, Platform Team Legacy (Watching / External), service-runner, Wikimedia-Logstash, SRE

Mar 15 2019

Pchelolo renamed T218396: Make change-prop tests independent of Kafka and Redis from Make change-prop tests undefended of Kafka and Redis to Make change-prop tests independent of Kafka and Redis.
Mar 15 2019, 12:48 PM · Platform Team Workboards (Done with CPT), Services (done), Release Pipeline, serviceops, ChangeProp
Pchelolo created T218396: Make change-prop tests independent of Kafka and Redis.
Mar 15 2019, 12:47 PM · Platform Team Workboards (Done with CPT), Services (done), Release Pipeline, serviceops, ChangeProp

Mar 14 2019

Pchelolo assigned T218218: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 to holger.knust.

Verified that we can work with swagger-ui 3+ once we make the spec standard-compliant. Let's begin with modifying the specs.

Mar 14 2019, 8:13 PM · Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), RESTBase, RESTBase-API, serviceops, SRE
Pchelolo added a comment to T218260: Decrease timeout for EventBus extension for analytics events.

I wonder if we should also use ?hasty=true mode for mediawiki 'analytics' events? This would use a non-ACKed producer and not ever block the MW waiting for a response.

Mar 14 2019, 1:42 PM · Platform Team Workboards (Done with CPT), Services (done), Analytics-Kanban, Analytics, Event-Platform
Pchelolo assigned T218275: Vagrant restbase can't launch: statsd.childClient is not a function to holger.knust.

Oh, we have made use of the hot-shots internal childClient method, but forgot there's a debugging LogStatsD. Need to fix this in service-runner and make sure RESTBase starts if we configure metrics.type to 'log'.

Mar 14 2019, 1:37 PM · Platform Team Workboards (Done with CPT), Services (done), service-runner, RESTBase, MediaWiki-Vagrant

Mar 13 2019

Pchelolo added a subtask for T218255: Enabling api-request eventgate to group1 caused minor service disruptions: T218260: Decrease timeout for EventBus extension for analytics events.
Mar 13 2019, 9:05 PM · Platform Team Workboards (Done with CPT), Services (done), Analytics, Event-Platform, Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), serviceops, Wikimedia-Incident, SRE
Pchelolo added a parent task for T218260: Decrease timeout for EventBus extension for analytics events: T218255: Enabling api-request eventgate to group1 caused minor service disruptions.
Mar 13 2019, 9:05 PM · Platform Team Workboards (Done with CPT), Services (done), Analytics-Kanban, Analytics, Event-Platform
Pchelolo added a parent task for T218254: EventBus extension should never log unserialized events: T218255: Enabling api-request eventgate to group1 caused minor service disruptions.
Mar 13 2019, 9:05 PM · Platform Team Workboards (Done with CPT), Services (done), Analytics-Kanban, MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), Analytics, Event-Platform
Pchelolo added a subtask for T218255: Enabling api-request eventgate to group1 caused minor service disruptions: T218254: EventBus extension should never log unserialized events.
Mar 13 2019, 9:05 PM · Platform Team Workboards (Done with CPT), Services (done), Analytics, Event-Platform, Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), serviceops, Wikimedia-Incident, SRE
Pchelolo created T218260: Decrease timeout for EventBus extension for analytics events.
Mar 13 2019, 9:04 PM · Platform Team Workboards (Done with CPT), Services (done), Analytics-Kanban, Analytics, Event-Platform
Pchelolo created T218254: EventBus extension should never log unserialized events.
Mar 13 2019, 8:36 PM · Platform Team Workboards (Done with CPT), Services (done), Analytics-Kanban, MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), Analytics, Event-Platform
Pchelolo closed T128040: Document and implement the REST API format versioning and negotiation policy as Resolved.

I think this is done and can be resolved now.

Mar 13 2019, 3:02 PM · Services (done), Documentation, Patch-For-Review, RESTBase
Pchelolo closed T128040: Document and implement the REST API format versioning and negotiation policy, a subtask of T124365: RFC: Define a policy for REST API result format versioning / negotiation, as Resolved.
Mar 13 2019, 3:02 PM · TechCom-RFC (TechCom-RFC-Closed), Proposal, Services-next, Services, discovery-system, Architecture, Parsing-Team--ARCHIVED, Parsoid-Web-API, RESTBase
Pchelolo closed T128040: Document and implement the REST API format versioning and negotiation policy, a subtask of T128392: Documentation improvements (tracking), as Resolved.
Mar 13 2019, 3:02 PM · Tracking-Neverending, Services
Pchelolo added a parent task for T174982: Sourcemap is incorrect in RESTBase help page: T218218: Make RESTBase spec standard compliant and switch to OpenAPI 3.0.
Mar 13 2019, 3:00 PM · Platform Team Workboards (Done with CPT), Services (done), RESTBase-API, RESTBase
Pchelolo added a subtask for T218218: Make RESTBase spec standard compliant and switch to OpenAPI 3.0: T174982: Sourcemap is incorrect in RESTBase help page.
Mar 13 2019, 3:00 PM · Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), RESTBase, RESTBase-API, serviceops, SRE
Pchelolo added a parent task for T188255: Upgrade swagger-ui version in mathoid: T218217: Make services swagger specs standard compliant.
Mar 13 2019, 2:59 PM · Math, Platform Team Workboards (Clinic Duty Team), Mathoid
Pchelolo added a subtask for T218217: Make services swagger specs standard compliant: T188255: Upgrade swagger-ui version in mathoid.
Mar 13 2019, 2:59 PM · Math, Platform Engineering, serviceops-radar, Product-Infrastructure-Team-Backlog-Deprecated, Proton, Graphoid, CX-cxserver, Citoid, Mathoid, Recommendation-API, Services (later), Mobile-Content-Service, RESTBase-API
Pchelolo closed T189494: Evaluate swagger 3 as Resolved.
Mar 13 2019, 2:58 PM · Platform Team Legacy (Designing), Services (designing)
Pchelolo closed T189494: Evaluate swagger 3, a subtask of T217881: Decide whether to keep violating OpenAPI/Swagger specification in our REST services, as Resolved.
Mar 13 2019, 2:58 PM · Platform Team Workboards (Done with CPT), Services (done), TechCom, RESTBase-API, serviceops, SRE
Pchelolo added a parent task for T189494: Evaluate swagger 3: T217881: Decide whether to keep violating OpenAPI/Swagger specification in our REST services.
Mar 13 2019, 2:58 PM · Platform Team Legacy (Designing), Services (designing)
Pchelolo added a subtask for T217881: Decide whether to keep violating OpenAPI/Swagger specification in our REST services: T189494: Evaluate swagger 3.
Mar 13 2019, 2:58 PM · Platform Team Workboards (Done with CPT), Services (done), TechCom, RESTBase-API, serviceops, SRE
Pchelolo added a parent task for T217725: Selected response type in REST BASE page does not match the info sent in request, resulting in 406 error: T218218: Make RESTBase spec standard compliant and switch to OpenAPI 3.0.
Mar 13 2019, 2:57 PM · Platform Team Workboards (Done with CPT), Services (done), RESTBase-API, RESTBase
Pchelolo added a subtask for T218218: Make RESTBase spec standard compliant and switch to OpenAPI 3.0: T217725: Selected response type in REST BASE page does not match the info sent in request, resulting in 406 error.
Mar 13 2019, 2:57 PM · Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), RESTBase, RESTBase-API, serviceops, SRE
Pchelolo added a comment to T217725: Selected response type in REST BASE page does not match the info sent in request, resulting in 406 error.

This seems to be a bug in swagger-ui. We're going to update to the latest upstream version of the package, so it should be resolved when done.

Mar 13 2019, 2:56 PM · Platform Team Workboards (Done with CPT), Services (done), RESTBase-API, RESTBase
Pchelolo closed T217881: Decide whether to keep violating OpenAPI/Swagger specification in our REST services as Resolved.

I think we have a consensus to go spec-compliant here. I've created a task for services and a special task for RESTBase, as it's a different beast. I'm resolving this ticket.

Mar 13 2019, 2:36 PM · Platform Team Workboards (Done with CPT), Services (done), TechCom, RESTBase-API, serviceops, SRE
Pchelolo closed T217881: Decide whether to keep violating OpenAPI/Swagger specification in our REST services, a subtask of T217747: cxserver's swagger spec fails to validate, as Resolved.
Mar 13 2019, 2:35 PM · Language-Team (Language-2021-October-December), Patch-For-Review, CX-cxserver
Pchelolo created T218218: Make RESTBase spec standard compliant and switch to OpenAPI 3.0.
Mar 13 2019, 2:34 PM · Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), RESTBase, RESTBase-API, serviceops, SRE
Pchelolo created T218217: Make services swagger specs standard compliant.
Mar 13 2019, 2:29 PM · Math, Platform Engineering, serviceops-radar, Product-Infrastructure-Team-Backlog-Deprecated, Proton, Graphoid, CX-cxserver, Citoid, Mathoid, Recommendation-API, Services (later), Mobile-Content-Service, RESTBase-API

Mar 12 2019

Pchelolo added a comment to T217881: Decide whether to keep violating OpenAPI/Swagger specification in our REST services.

Well instead of copy/pasta, there is that thing called YAML anchors/references that deal with data repetition in YAML files.

Mar 12 2019, 1:41 PM · Platform Team Workboards (Done with CPT), Services (done), TechCom, RESTBase-API, serviceops, SRE

Mar 11 2019

Pchelolo added a comment to T217881: Decide whether to keep violating OpenAPI/Swagger specification in our REST services.

I would say we should update our swagger to 3.0 and become standard-compatible.

Mar 11 2019, 10:00 PM · Platform Team Workboards (Done with CPT), Services (done), TechCom, RESTBase-API, serviceops, SRE
Pchelolo added a comment to T217683: Delete the mediawiki/services/cp-jobqueue repo.

Yes, it can be deleted. This is not used and has actually never been used.

Mar 11 2019, 6:37 PM · User-greg, Platform Team Legacy (Watching / External), Services (watching), WMF-JobQueue, ChangeProp, Release-Engineering-Team
Pchelolo closed T212335: EventBus or CirrusSearch: DomainException from line 353 of /srv/mediawiki/php-1.33.0-wmf.9/vendor/firebase/php-jwt/src/JWT.php: Unknown JSON error: 5 as Resolved.

The change has been deployed and EventBus doesn't fail with this anymore. Resolving.

Mar 11 2019, 6:37 PM · Analytics-Radar, Platform Team Workboards (Done with CPT), MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), Services (doing), Event-Platform, Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), WMF-JobQueue, CirrusSearch, Wikimedia-production-error
Pchelolo added a comment to T210342: Otrs-wiki lint errors are not being updated.

Exactly. We have to prioritize T171788 I guess.

Mar 11 2019, 5:57 PM · MediaWiki-extensions-Linter
Pchelolo merged T210342: Otrs-wiki lint errors are not being updated into T171788: On wikis without changeprop enabled, lint errors don't update after page edits.
Mar 11 2019, 5:57 PM · Parsoid (Tracking), Platform Team Legacy (Designing), Services (designing), wikitech.wikimedia.org, MediaWiki-extensions-Linter
Pchelolo merged task T210342: Otrs-wiki lint errors are not being updated into T171788: On wikis without changeprop enabled, lint errors don't update after page edits.
Mar 11 2019, 5:57 PM · MediaWiki-extensions-Linter
Pchelolo closed T215987: Verify that hit/miss stats in WebRequest are correct as Resolved.

Thank you, everyone! :)

Mar 11 2019, 3:22 PM · Analytics-Radar, Traffic, SRE, Platform Team Legacy (Later), Services (blocked), RESTBase
Pchelolo closed T215987: Verify that hit/miss stats in WebRequest are correct, a subtask of T215960: Simplify MCS storage model, as Resolved.
Mar 11 2019, 3:22 PM · Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (RESTBase Split (CDP2)), User-Eevans, Product-Infrastructure-Team-Backlog-Deprecated, RESTBase

Feb 26 2019

Pchelolo added a comment to T217146: ConfigException from line 339 of /srv/mediawiki/php-1.33.0-wmf.19/extensions/EventBus/includes/EventBus.php: EventBus::getInstance requires a configured $eventServiceName.

Taken care of by https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/493073/

Feb 26 2019, 5:17 PM · Analytics, Event-Platform, Wikimedia-production-error
Pchelolo added a comment to T217145: Catchable fatal error: Argument 1 passed to EventBusHooks::sendResourceChangedEvent() must be an instance of LinkTarget, Title given in /srv/mediawiki/php-1.33.0-wmf.19/extensions/EventBus/includes/EventBusHooks.php on line 324.

The above patch and backport should take care of this and the other one with the $eventServiceName issue. Sorry about that.

Feb 26 2019, 5:13 PM · MW-1.33-notes (1.33.0-wmf.20; 2019-03-05), Wikimedia-production-error, Analytics, Event-Platform

Feb 22 2019

Pchelolo updated the task description for T216191: Replace Istanbul with nyc.
Feb 22 2019, 7:16 PM · Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1))

Feb 21 2019

Pchelolo added a comment to T216069: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins.

Thanks a lot, @hashar. I guess this ticket can be closed; however, I have one last question: do you think we could somehow improve quibble to output this info without the need to place *cough* console.log()?

Feb 21 2019, 7:01 PM · Analytics-Radar, Platform Team Workboards (Done with CPT), Services (blocked), Event-Platform, MediaWiki-Core-Tests, Release-Engineering-Team, Jenkins, Quibble
Pchelolo added a comment to T216726: Edits to Flow pages result in a page-links-change event with no performer.

I'm not sure how Flow works internally, which hooks are called, or why it doesn't set a user. It's probably better to ask Community-Tech.

Feb 21 2019, 6:00 PM · Wikilink-Tool, Data-Engineering, Growth-Team-Filtering, Analytics-Radar, Growth-Team, Event-Platform
Pchelolo triaged T214099: Stress test Parsoid's HTTP API as High priority.

Parsoid was in a bit of trouble again today. At 02:44 XioNoX: depool eqsin, so all the mobile traffic for Chinese wiki started hitting RESTBase. Since Chinese has variants, RB started requesting Parsoid to transform HTML into correct variant. As the transformation request rate reached roughly 40 r/s, Parsoid started experiencing troubles, alerting and timing out.

Feb 21 2019, 4:07 AM · Patch-For-Review, Services (watching), Parsoid-Web-API, Parsoid

Feb 20 2019

Pchelolo created T216636: Consider deprecating section editing API in RESTBase.
Feb 20 2019, 5:44 PM · User-Ryasmeen, Platform Team Workboards (Done with CPT), Services (done), Platform Engineering (RESTBase Split (CDP2)), VisualEditor, RESTBase
Pchelolo added a comment to T216069: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins.

I have tried to resubmit the change: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/491014/ with no luck.

Feb 20 2019, 3:57 PM · Analytics-Radar, Platform Team Workboards (Done with CPT), Services (blocked), Event-Platform, MediaWiki-Core-Tests, Release-Engineering-Team, Jenkins, Quibble
Pchelolo claimed T212335: EventBus or CirrusSearch: DomainException from line 353 of /srv/mediawiki/php-1.33.0-wmf.9/vendor/firebase/php-jwt/src/JWT.php: Unknown JSON error: 5.

This is a bug in Event-Platform. Serializing the events for sending is protected with a try-catch: it logs an error and drops the job if it's not serializable. However, in this case it fails in a different place. Each job is signed via JWT so that it can be checked at a later point to verify that MediaWiki is the entity actually sending the job. Apparently, that signing procedure internally requires serialization as well, and it's not protected with a try-catch.

Feb 20 2019, 1:44 AM · Analytics-Radar, Platform Team Workboards (Done with CPT), MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), Services (doing), Event-Platform, Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), WMF-JobQueue, CirrusSearch, Wikimedia-production-error
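The failure mode described in the T212335 comment above can be sketched as follows. This is an illustrative Python stand-in, not the EventBus PHP code: sign_job, prepare_job, and the HS256-style token layout are assumptions made for the example. The point is that signing serializes the job a second time, so it has to sit inside the same guard as the first serialization.

```python
import base64
import hashlib
import hmac
import json
import logging
from typing import Optional

logger = logging.getLogger("eventbus-sketch")


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_job(job: dict, secret: bytes) -> str:
    """Produce an HS256-style token; raises if the job cannot be serialized."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(job).encode())  # second serialization step
    signing_input = f"{header}.{payload}".encode()
    signature = _b64(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"


def prepare_job(job: dict, secret: bytes) -> Optional[dict]:
    """Serialize and sign a job, dropping it if either step fails."""
    try:
        body = json.dumps(job)
        # Signing is inside the guard too, because it serializes again.
        signature = sign_job(job, secret)
    except (TypeError, ValueError) as err:
        logger.error("Dropping unserializable job: %s", err)
        return None
    return {"body": body, "signature": signature}
```

With both steps inside one guard, an unserializable job is logged and dropped instead of surfacing as an uncaught exception from the signing code.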
Pchelolo created T216584: Consider deprecating and removing public data-parsoid REST endpoint.
Feb 20 2019, 1:04 AM · Platform Team Initiatives (RESTBase Split (CDP2)), RESTBase

Feb 19 2019

Pchelolo created T216567: mediawiki/recentchange event should not use fields with polymorphic types.
Feb 19 2019, 10:33 PM · Patch-For-Review, Platform Team Initiatives (Modern Event Platform (TEC2)), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Analytics, Services (next), Event-Platform
Pchelolo added a comment to T214093: Modern Event Platform: Schema Guidelines and Conventions.

Rejecting events that haven't lowercased the headers is another option, albeit not as user-friendly.

Feb 19 2019, 9:39 PM · Product-Data-Infrastructure, Analytics-Kanban, Analytics, Better Use Of Data, Product-Analytics, Goal, Services (watching), MediaWiki-extensions-EventLogging, Event-Platform
Pchelolo added a comment to T214093: Modern Event Platform: Schema Guidelines and Conventions.

EventGate, so that all parts of the system have a standard representation of the headers.

Feb 19 2019, 9:33 PM · Product-Data-Infrastructure, Analytics-Kanban, Analytics, Better Use Of Data, Product-Analytics, Goal, Services (watching), MediaWiki-extensions-EventLogging, Event-Platform
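The two options mentioned in the T214093 comments above (normalizing header names versus rejecting events whose header names are not lowercase) can be sketched roughly like this. It is a Python illustration under assumptions: the function names and the plain dict representation of headers are hypothetical and do not reflect EventGate's actual code.

```python
from typing import Dict


def normalize_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Option 1: return a copy with every header name lowercased."""
    return {name.lower(): value for name, value in headers.items()}


def require_lowercase_headers(headers: Dict[str, str]) -> None:
    """Option 2: reject the event if any header name is not lowercase."""
    bad = sorted(name for name in headers if name != name.lower())
    if bad:
        raise ValueError(f"header names must be lowercase: {', '.join(bad)}")
```

Normalizing accepts whatever the producer sends, while rejecting pushes the fix back to the producer, which is the less user-friendly trade-off the comment points out.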
Pchelolo added a comment to T216184: Eventstreams build is broken.

It's still undecided what to do with package-lock (T179229), so maybe let's just freeze the version?

Feb 19 2019, 8:30 PM · Patch-For-Review, Analytics-Kanban, SRE, EventStreams, Analytics, Services (watching)
Pchelolo added a comment to T216184: Eventstreams build is broken.

KafkaSSE requires ^2.3.4.

Feb 19 2019, 7:58 PM · Patch-For-Review, Analytics-Kanban, SRE, EventStreams, Analytics, Services (watching)