
Jgiannelos (jgiannelos)
User

User Details

User Since
May 5 2020, 11:24 AM (254 w, 5 d)
Availability
Available
IRC Nick
nemo-yiannis
LDAP User
Jgiannelos
MediaWiki User
JGiannelos (WMF)

Recent Activity

Fri, Mar 21

Jgiannelos added a comment to T389462: OSM replication lag on maps1009.

Upstream patch merged: https://github.com/omniscale/imposm3/pull/300

Fri, Mar 21, 1:52 PM · Content-Transform-Team (Work In Progress), Infrastructure-Foundations, Maps
Jgiannelos added a comment to T389462: OSM replication lag on maps1009.

FWIW, we should also investigate how close the other sequences in our tables are to their max values.

Fri, Mar 21, 10:50 AM · Content-Transform-Team (Work In Progress), Infrastructure-Foundations, Maps
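A minimal sketch of that check, assuming each sequence's current and maximum value have already been fetched (in PostgreSQL 10+ the `pg_sequences` view exposes `last_value` and `max_value`). Apart from the int4 limit, which matches the error reported further down, the sequence names and values are made up for illustration:

```python
# Hypothetical input: sequence name -> (last_value, max_value).
# In PostgreSQL 10+ these could come from:
#   SELECT sequencename, last_value, max_value FROM pg_sequences;
INT4_MAX = 2**31 - 1  # 2147483647, max of an integer (SERIAL) sequence
INT8_MAX = 2**63 - 1  # max of a bigint (BIGSERIAL) sequence


def headroom(last_value, max_value):
    """Fraction of the sequence range still unused (1.0 = untouched)."""
    if last_value is None:  # pg_sequences reports NULL if the sequence was never used
        return 1.0
    return (max_value - last_value) / max_value


def sequences_at_risk(sequences, threshold=0.1):
    """Names of sequences with less than `threshold` headroom, tightest first."""
    scored = sorted(
        (headroom(last, maximum), name) for name, (last, maximum) in sequences.items()
    )
    return [name for h, name in scored if h < threshold]


example = {
    "wikidata_relation_members_id_seq": (INT4_MAX, INT4_MAX),  # exhausted
    "hypothetical_a_id_seq": (2_000_000_000, INT4_MAX),
    "hypothetical_b_id_seq": (5_000, INT4_MAX),
}
print(sequences_at_risk(example))
# ['wikidata_relation_members_id_seq', 'hypothetical_a_id_seq']
```

The 10% threshold is arbitrary; the point is to catch integer-backed sequences well before `nextval` starts failing.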
Jgiannelos added a comment to T389462: OSM replication lag on maps1009.

For future reference (for the bookworm upgrade), we might want to consider:

Fri, Mar 21, 10:15 AM · Content-Transform-Team (Work In Progress), Infrastructure-Foundations, Maps
Jgiannelos added a comment to T389462: OSM replication lag on maps1009.

Looks like ALTERing the sequence after ALTERing the table did the trick.

Fri, Mar 21, 10:14 AM · Content-Transform-Team (Work In Progress), Infrastructure-Foundations, Maps
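Why both statements are needed, as a sketch: widening the column does not touch its sequence, which keeps its own integer type and a MAXVALUE of 2147483647. This assumes PostgreSQL 10+ (for `ALTER SEQUENCE ... AS`) and the default `<table>_<column>_seq` naming; the exact statements run on the maps hosts are not recorded in this comment.

```python
def widen_serial_to_bigint(table, column="id", schema=None):
    """Return the statements that move a SERIAL column and its sequence to bigint.

    Assumes PostgreSQL 10+ and the default SERIAL sequence name
    <table>_<column>_seq. Identifiers are not quoted: use trusted names only.
    """
    qualified = f"{schema}.{table}" if schema else table
    sequence = f"{qualified}_{column}_seq"
    return [
        # Widen the column; int -> bigint rewrites the table and can take a while.
        f"ALTER TABLE {qualified} ALTER COLUMN {column} TYPE bigint;",
        # Widen the sequence; until this runs, nextval still stops at 2147483647.
        f"ALTER SEQUENCE {sequence} AS bigint;",
    ]


for stmt in widen_serial_to_bigint("wikidata_relation_members"):
    print(stmt)
```

Per the PostgreSQL documentation, changing a sequence's data type also moves its MAXVALUE to the new type's limit when the old MAXVALUE was the old type's limit, so no explicit `MAXVALUE` clause is needed here.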

Thu, Mar 20

Jgiannelos added a comment to T389462: OSM replication lag on maps1009.

We don't have an explicit id column in the imposm mapping, so it falls back to the default:
https://github.com/omniscale/imposm3/blob/5d32daabd0e75800a20261a486f33b20b948ad5b/database/postgis/spec.go#L57
And in Postgres, SERIAL is integer, not bigint.

Thu, Mar 20, 3:56 PM · Content-Transform-Team (Work In Progress), Infrastructure-Foundations, Maps
Jgiannelos added a comment to T389462: OSM replication lag on maps1009.

Going through the logs, the last error before the aborted transactions is:

nextval: reached maximum value of sequence "wikidata_relation_members_id_seq" (2147483647)
Thu, Mar 20, 3:21 PM · Content-Transform-Team (Work In Progress), Infrastructure-Foundations, Maps
Jgiannelos added a comment to T389462: OSM replication lag on maps1009.

It turns out this is a side effect of the main issue: the node is flooded with logs and / (the root filesystem) has run out of space.

Thu, Mar 20, 3:20 PM · Content-Transform-Team (Work In Progress), Infrastructure-Foundations, Maps
Jgiannelos added a comment to T389462: OSM replication lag on maps1009.

This error correlates with the start of the replication lag:
https://logstash.wikimedia.org/goto/070b5c8c4c7fdcaf42824de690ae09c3

Thu, Mar 20, 2:55 PM · Content-Transform-Team (Work In Progress), Infrastructure-Foundations, Maps
Jgiannelos added a comment to T376635: The apostrophe after the letter "Š".

Yeah, I just verified it on staging without caching:

Thu, Mar 20, 2:45 PM · Content-Transform-Team (Work In Progress), RESTBase Sunsetting, Page Content Service, Language and Product Localization, MW-Interfaces-Team, MediaWiki-REST-API
Jgiannelos added a project to T376635: The apostrophe after the letter "Š": Content-Transform-Team (Work In Progress).
Thu, Mar 20, 2:32 PM · Content-Transform-Team (Work In Progress), RESTBase Sunsetting, Page Content Service, Language and Product Localization, MW-Interfaces-Team, MediaWiki-REST-API
Jgiannelos added a comment to T376635: The apostrophe after the letter "Š".

I think the problem here might be the handling of language variants after we switched over srwiki PCS from RESTBase to REST-gateway.

Thu, Mar 20, 2:32 PM · Content-Transform-Team (Work In Progress), RESTBase Sunsetting, Page Content Service, Language and Product Localization, MW-Interfaces-Team, MediaWiki-REST-API
Jgiannelos added a project to T389410: restbase service crashing: serviceops.
Thu, Mar 20, 2:25 PM · serviceops, Content-Transform-Team, Cassandra
Jgiannelos added a comment to T389410: restbase service crashing.

I tried a GET on the healthcheck URL against every scap target, and some failed consistently:

Thu, Mar 20, 2:19 PM · serviceops, Content-Transform-Team, Cassandra
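A check like this can be scripted as a loop over the deployment targets. A hypothetical sketch: the host names, port and path are placeholders (the actual scap target list and healthcheck URL are not reproduced here), and the HTTP call is injected so it can be swapped for a real client.

```python
from typing import Callable, Dict, Iterable


def check_targets(
    hosts: Iterable[str],
    fetch_status: Callable[[str], int],
    path: str = "/healthz",  # placeholder path, not the real healthcheck URL
    port: int = 8080,  # placeholder port
    attempts: int = 3,
) -> Dict[str, int]:
    """GET the healthcheck on every host `attempts` times; count failures per host."""
    failures: Dict[str, int] = {}
    for host in hosts:
        url = f"http://{host}:{port}{path}"
        for _ in range(attempts):
            try:
                ok = fetch_status(url) == 200
            except OSError:  # refused, timed out, DNS failure, urllib HTTPError...
                ok = False
            if not ok:
                failures[host] = failures.get(host, 0) + 1
    return failures


def fake_fetch(url: str) -> int:
    """Stand-in for a real HTTP call: host-b always refuses connections."""
    if "host-b" in url:
        raise ConnectionRefusedError("simulated")
    return 200


failures = check_targets(["host-a.example", "host-b.example"], fake_fetch)
# Hosts that failed on every attempt are the "consistent" failures.
print([host for host, n in failures.items() if n == 3])
```

With the standard library, `fetch_status` could be `lambda u: urllib.request.urlopen(u, timeout=5).status`; its errors (`URLError`, `HTTPError`, socket timeouts) are all `OSError` subclasses, so they are counted as failures.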
Jgiannelos added a comment to T389462: OSM replication lag on maps1009.

On maps1006 I saw the logs flooded with:

2025-03-20 13:45:21 GMT LOG:  incomplete startup packet
Thu, Mar 20, 1:45 PM · Content-Transform-Team (Work In Progress), Infrastructure-Foundations, Maps
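To gauge how fast such a flood grows, the messages can be bucketed per minute. A sketch that assumes the log line prefix shown in the excerpt (it depends on Postgres' `log_line_prefix` setting); only the first sample line comes from maps1006, the rest are made up:

```python
import re
from collections import Counter

# Assumes the prefix seen above: "<date> <time> <tz> <LEVEL>:  <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} \S+ (\w+):\s+(.*)$")


def per_minute(lines, message="incomplete startup packet"):
    """Count occurrences of `message` per minute to see how fast the log grows."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(3) == message:
            counts[match.group(1)] += 1  # key is "YYYY-MM-DD HH:MM"
    return counts


sample = [
    "2025-03-20 13:45:21 GMT LOG:  incomplete startup packet",
    "2025-03-20 13:45:22 GMT LOG:  incomplete startup packet",
    "2025-03-20 13:46:01 GMT LOG:  incomplete startup packet",
    "2025-03-20 13:46:05 GMT LOG:  checkpoint starting: time",
]
print(per_minute(sample))
```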

Tue, Mar 18

Jgiannelos closed T388148: CTT tasks week of 2025-03-07 as Resolved.
Tue, Mar 18, 12:38 PM · MW-1.44-notes (1.44.0-wmf.20; 2025-03-11), Essential-Work, Content-Transform-Team (Work In Progress)
Jgiannelos updated the task description for T388148: CTT tasks week of 2025-03-07.
Tue, Mar 18, 12:38 PM · MW-1.44-notes (1.44.0-wmf.20; 2025-03-11), Essential-Work, Content-Transform-Team (Work In Progress)

Mon, Mar 17

Jgiannelos claimed T269003: Caching issue for title descriptions in German Wikipedia.
Mon, Mar 17, 3:25 PM · Content-Transform-Team (Work In Progress), Page Content Service, Wikidata data quality and trust, Wikidata Integration in Wikimedia projects, Wikidata, ChangeProp, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog
Jgiannelos moved T388214: Pregeneration rules don't pregenerate caches for the same cases restbase did from To Deploy to To Verify on the Content-Transform-Team (Work In Progress) board.
Mon, Mar 17, 3:07 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos moved T348996: Change changeprops rules to pre-generate/invalidate cache directly to PCS rather than in restbase from To Deploy to To Verify on the Content-Transform-Team (Work In Progress) board.
Mon, Mar 17, 3:07 PM · Content-Transform-Team (Work In Progress), Page Content Service, serviceops, RESTBase Sunsetting
Jgiannelos updated the task description for T388140: Rollout more wikis: week 2.
Mon, Mar 17, 1:40 PM · Chinese-Sites, serviceops-radar, Content-Transform-Team (Work In Progress), Code-Health-Objective, Page Content Service
Jgiannelos added a comment to T389042: Remove Kartotherian from bare metal hosts.

We should also remove or deprecate the scap config and update the deployment docs.

Mon, Mar 17, 12:14 PM · User-Elukey, Infrastructure-Foundations, Content-Transform-Team, Epic, Maps (Kartotherian)

Thu, Mar 13

Jgiannelos added a comment to T388214: Pregeneration rules don't pregenerate caches for the same cases restbase did.

So far I've tested:

Thu, Mar 13, 3:33 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service

Wed, Mar 12

Jgiannelos updated the task description for T388148: CTT tasks week of 2025-03-07.
Wed, Mar 12, 6:29 PM · MW-1.44-notes (1.44.0-wmf.20; 2025-03-11), Essential-Work, Content-Transform-Team (Work In Progress)
Jgiannelos added a comment to T388140: Rollout more wikis: week 2.

I updated the patch to enable changeprop traffic for all wikis except the top 9 by PCS traffic:

en|de|ru|ja|fr|es|zh|it|pt
Wed, Mar 12, 4:11 PM · Chinese-Sites, serviceops-radar, Content-Transform-Team (Work In Progress), Code-Health-Objective, Page Content Service
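Expressed as a domain filter, "all wikis except these 9" could look like the sketch below. It assumes the filter only has to cover `*.wikipedia.org` domains and matches on the domain name; it is not the syntax of the actual changeprop patch. The trailing dot in the lookahead matters: without it, `ja` would also exclude `jam.wikipedia.org`, and `zh` would exclude `zh-yue.wikipedia.org`.

```python
import re

# Top 9 wikis by PCS traffic, left out of this rollout step.
EXCLUDED = "en|de|ru|ja|fr|es|zh|it|pt"
# Hypothetical filter: any *.wikipedia.org domain whose language code is not
# exactly one of the excluded codes.
ENABLED_RE = re.compile(rf"^(?!(?:{EXCLUDED})\.)[a-z0-9-]+\.wikipedia\.org$")


def is_enabled(domain: str) -> bool:
    return ENABLED_RE.match(domain) is not None


for domain in ["sr.wikipedia.org", "en.wikipedia.org", "jam.wikipedia.org", "ja.wikipedia.org"]:
    print(domain, is_enabled(domain))
```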
Jgiannelos added a comment to T383765: Issues with Image Licensing, Placement, and Duplication in Image Recommendations.

Here is an example of the query the Android app sends to the Action API:

https://en.wikipedia.org/w/api.php?action=query&generator=search&gsrsearch=hasrecommendation%3Aimage&gsrnamespace=0&gsrsort=random&prop=growthimagesuggestiondata|revisions|pageimages&pilicense=any&rvprop=ids|timestamp|flags|comment|user|content&rvslots=main&rvsection=0
Wed, Mar 12, 3:44 PM · Growth-Team, Growth-Structured-Tasks, Structured-Data-Backlog, Wikipedia-Android-App-Backlog
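Written out as a parameter dict, the same request is easier to read and tweak. A sketch using only the standard library; it rebuilds the URL above but sends nothing:

```python
from urllib.parse import urlencode

# Parameters of the Android app's image-recommendation query, one per line.
params = {
    "action": "query",
    "generator": "search",
    "gsrsearch": "hasrecommendation:image",
    "gsrnamespace": "0",
    "gsrsort": "random",
    "prop": "growthimagesuggestiondata|revisions|pageimages",
    "pilicense": "any",
    "rvprop": "ids|timestamp|flags|comment|user|content",
    "rvslots": "main",
    "rvsection": "0",
}

# Keep "|" unescaped so the URL matches the one above character for character;
# ":" is still percent-encoded as %3A.
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params, safe="|")
print(url)
```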
Jgiannelos added a comment to T269003: Caching issue for title descriptions in German Wikipedia.

I manually purged the page summary, and it looks like the most-read output is fixed.

Wed, Mar 12, 11:23 AM · Content-Transform-Team (Work In Progress), Page Content Service, Wikidata data quality and trust, Wikidata Integration in Wikimedia projects, Wikidata, ChangeProp, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog
Jgiannelos added a comment to T269003: Caching issue for title descriptions in German Wikipedia.

The vandalized title comes from:

Wed, Mar 12, 11:17 AM · Content-Transform-Team (Work In Progress), Page Content Service, Wikidata data quality and trust, Wikidata Integration in Wikimedia projects, Wikidata, ChangeProp, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog
Jgiannelos added a comment to T269003: Caching issue for title descriptions in German Wikipedia.

Even after purging the local cache in the app, this still shows up.
The output of page/mobile-html looks like it's fixed.

Wed, Mar 12, 11:02 AM · Content-Transform-Team (Work In Progress), Page Content Service, Wikidata data quality and trust, Wikidata Integration in Wikimedia projects, Wikidata, ChangeProp, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog

Tue, Mar 11

Jgiannelos added a comment to T388140: Rollout more wikis: week 2.

For reference, from a quick look:

Tue, Mar 11, 5:33 PM · Chinese-Sites, serviceops-radar, Content-Transform-Team (Work In Progress), Code-Health-Objective, Page Content Service
Jgiannelos added a comment to T388140: Rollout more wikis: week 2.

We can do that, but it means we should keep track of what's supported and what's not at the PCS level.

Tue, Mar 11, 4:49 PM · Chinese-Sites, serviceops-radar, Content-Transform-Team (Work In Progress), Code-Health-Objective, Page Content Service
Jgiannelos added a comment to T388140: Rollout more wikis: week 2.

I think this round of rollout needs a bit more thinking.
Currently RESTBase maintains an explicit per-project list of wikis (wikipedias, wiktionaries, wikiquotes, wikivoyages):
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/restbase/deploy/+/refs/heads/master/scap/vars.yaml#49

Tue, Mar 11, 4:22 PM · Chinese-Sites, serviceops-radar, Content-Transform-Team (Work In Progress), Code-Health-Objective, Page Content Service
Jgiannelos moved T388140: Rollout more wikis: week 2 from Backlog to In Progress on the Content-Transform-Team (Work In Progress) board.
Tue, Mar 11, 2:27 PM · Chinese-Sites, serviceops-radar, Content-Transform-Team (Work In Progress), Code-Health-Objective, Page Content Service
Jgiannelos claimed T388140: Rollout more wikis: week 2.
Tue, Mar 11, 2:26 PM · Chinese-Sites, serviceops-radar, Content-Transform-Team (Work In Progress), Code-Health-Objective, Page Content Service

Mon, Mar 10

Jgiannelos moved T388148: CTT tasks week of 2025-03-07 from Current Deploy Target to In Progress on the Content-Transform-Team (Work In Progress) board.
Mon, Mar 10, 8:10 PM · MW-1.44-notes (1.44.0-wmf.20; 2025-03-11), Essential-Work, Content-Transform-Team (Work In Progress)
Jgiannelos closed T387277: Rollout more wikis after week 1 of testing with production traffic as Resolved.
Mon, Mar 10, 3:04 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos closed T387343: Enable native prometheus metrics in RESTBase as Resolved.
Mon, Mar 10, 3:04 PM · SRE Observability, Content-Transform-Team (Work In Progress), observability, RESTBase Sunsetting
Jgiannelos closed T387277: Rollout more wikis after week 1 of testing with production traffic, a subtask of T264670: Move PCS endpoints behind API Gateway, as Resolved.
Mon, Mar 10, 3:04 PM · Content-Transform-Team (Work In Progress), RESTBase Sunsetting, Epic, serviceops, Code-Health-Objective, Page Content Service
Jgiannelos closed T387472: Add time jitter on TTL when invalidating caches on PCS as Resolved.
Mon, Mar 10, 3:04 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos closed T387472: Add time jitter on TTL when invalidating caches on PCS, a subtask of T386919: Pregeneration performance optimizations for PCS, as Resolved.
Mon, Mar 10, 3:04 PM · serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos moved T348996: Change changeprops rules to pre-generate/invalidate cache directly to PCS rather than in restbase from Blocked to To Deploy on the Content-Transform-Team (Work In Progress) board.
Mon, Mar 10, 3:02 PM · Content-Transform-Team (Work In Progress), Page Content Service, serviceops, RESTBase Sunsetting
Jgiannelos moved T388214: Pregeneration rules don't pregenerate caches for the same cases restbase did from Code Review to To Deploy on the Content-Transform-Team (Work In Progress) board.
Mon, Mar 10, 3:01 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service

Fri, Mar 7

Jgiannelos added a comment to T388214: Pregeneration rules don't pregenerate caches for the same cases restbase did.

As a workaround, some orchestration is in place that reads the resource change topics and invalidates caches outside of change-prop until the proper change-prop solution is deployed.

Fri, Mar 7, 12:07 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
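The workaround can be pictured as a small consumer loop. A hypothetical sketch: the real orchestration reads from Kafka and calls whatever invalidation the storage layer exposes, both stubbed here as plain Python; the endpoint prefixes are an illustrative subset of PCS routes, and the event shape (a `meta.uri` field) follows resource_change events.

```python
import json
from typing import Callable, Iterable, List, Tuple

# Illustrative subset of PCS routes; the real list is not given here.
PCS_PREFIXES: Tuple[str, ...] = (
    "/api/rest_v1/page/mobile-html/",
    "/api/rest_v1/page/summary/",
)


def invalidate_from_events(
    raw_events: Iterable[str],
    purge: Callable[[str], None],
    prefixes: Tuple[str, ...] = PCS_PREFIXES,
) -> List[str]:
    """Purge each PCS URI found in resource_change events once; return purged URIs."""
    seen = set()
    purged: List[str] = []
    for raw in raw_events:
        uri = json.loads(raw).get("meta", {}).get("uri", "")
        if not any(p in uri for p in prefixes) or uri in seen:
            continue  # not a PCS resource, or already purged in this batch
        purge(uri)
        seen.add(uri)
        purged.append(uri)
    return purged


base = "http://de.wikipedia.org/api/rest_v1/page"
events = [
    json.dumps({"meta": {"uri": f"{base}/mobile-html/Heinrich_von_Othegraven"}}),
    json.dumps({"meta": {"uri": f"{base}/mobile-html/Heinrich_von_Othegraven"}}),
    json.dumps({"meta": {"uri": f"{base}/summary/Heinrich_von_Othegraven"}}),
    json.dumps({"meta": {"uri": f"{base}/html/Heinrich_von_Othegraven"}}),
]
purge_calls: List[str] = []  # stand-in for the real invalidation call
print(invalidate_from_events(events, purge_calls.append))
```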
Jgiannelos moved T388214: Pregeneration rules don't pregenerate caches for the same cases restbase did from Backlog to Code Review on the Content-Transform-Team (Work In Progress) board.
Fri, Mar 7, 10:00 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos triaged T388214: Pregeneration rules don't pregenerate caches for the same cases restbase did as High priority.
Fri, Mar 7, 10:00 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos updated the task description for T388214: Pregeneration rules don't pregenerate caches for the same cases restbase did.
Fri, Mar 7, 9:55 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos created T388214: Pregeneration rules don't pregenerate caches for the same cases restbase did.
Fri, Mar 7, 9:53 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service

Thu, Mar 6

Jgiannelos updated the task description for T388140: Rollout more wikis: week 2.
Thu, Mar 6, 3:41 PM · Chinese-Sites, serviceops-radar, Content-Transform-Team (Work In Progress), Code-Health-Objective, Page Content Service
Jgiannelos created T388140: Rollout more wikis: week 2.
Thu, Mar 6, 3:40 PM · Chinese-Sites, serviceops-radar, Content-Transform-Team (Work In Progress), Code-Health-Objective, Page Content Service
Jgiannelos moved T387277: Rollout more wikis after week 1 of testing with production traffic from To Deploy to To Verify on the Content-Transform-Team (Work In Progress) board.
Thu, Mar 6, 2:08 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos moved T387472: Add time jitter on TTL when invalidating caches on PCS from To Deploy to To Verify on the Content-Transform-Team (Work In Progress) board.
Thu, Mar 6, 2:08 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos moved T387343: Enable native prometheus metrics in RESTBase from To Deploy to To Verify on the Content-Transform-Team (Work In Progress) board.
Thu, Mar 6, 2:08 PM · SRE Observability, Content-Transform-Team (Work In Progress), observability, RESTBase Sunsetting

Tue, Mar 4

Jgiannelos added a comment to T386465: Add sylwiki to RESTBase.

Deployed to prod.

Tue, Mar 4, 5:15 PM · RESTBase
Jgiannelos added a comment to T386632: Add satwiktionary to RESTBase.

I just deployed the restbase change to enable the wiki in prod.

Tue, Mar 4, 5:15 PM · RESTBase
Jgiannelos moved T387472: Add time jitter on TTL when invalidating caches on PCS from Code Review to To Deploy on the Content-Transform-Team (Work In Progress) board.
Tue, Mar 4, 3:31 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos added a comment to T387277: Rollout more wikis after week 1 of testing with production traffic.

After talking with @Seddon, it's probably better if we swap ptwiki with srwiki+kowiki+idwiki.

Tue, Mar 4, 1:45 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos added a comment to T385033: Switchover plan from restbase to rest gateway for PCS endpoints that don't require cache.

@MSantos, we need to double-check that we set the same cache headers everywhere. For the other stored endpoints, we had to add them because they were missing.

Tue, Mar 4, 1:29 PM · Content-Transform-Team (Work In Progress), Page Content Service, RESTBase Sunsetting
Jgiannelos added a comment to T387472: Add time jitter on TTL when invalidating caches on PCS.

Merge request here: https://gitlab.wikimedia.org/repos/content-transform/nodejs-cassandra-storage/-/merge_requests/12

Tue, Mar 4, 11:10 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, RESTBase Sunsetting, Epic, Page Content Service
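Presumably the goal is to keep entries that are written or invalidated together from all expiring at the same instant and triggering a burst of regeneration. A sketch of symmetric jitter, assuming a uniform ±10% spread; the merge request above may use a different distribution or bound:

```python
import random
from typing import Optional


def jittered_ttl(
    base_ttl: int, jitter_fraction: float = 0.1, rng: Optional[random.Random] = None
) -> int:
    """Return base_ttl shifted by a uniform random offset of up to +/- jitter_fraction."""
    rng = rng or random.Random()
    spread = int(base_ttl * jitter_fraction)
    return base_ttl + rng.randint(-spread, spread)


rng = random.Random(0)  # seeded so the run is reproducible
ttls = [jittered_ttl(86_400, rng=rng) for _ in range(5)]  # one-day base TTL
print(ttls)
```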
Jgiannelos moved T387472: Add time jitter on TTL when invalidating caches on PCS from Backlog to Code Review on the Content-Transform-Team (Work In Progress) board.
Tue, Mar 4, 10:45 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos added a project to T387472: Add time jitter on TTL when invalidating caches on PCS: Content-Transform-Team (Work In Progress).
Tue, Mar 4, 10:45 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos claimed T387472: Add time jitter on TTL when invalidating caches on PCS.
Tue, Mar 4, 10:44 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, RESTBase Sunsetting, Epic, Page Content Service

Mon, Mar 3

Jgiannelos moved T387343: Enable native prometheus metrics in RESTBase from Code Review to To Deploy on the Content-Transform-Team (Work In Progress) board.
Mon, Mar 3, 4:10 PM · SRE Observability, Content-Transform-Team (Work In Progress), observability, RESTBase Sunsetting
Jgiannelos moved T278481: Parsoid support for the ProofreadPage extension from Backlog to In Progress on the Content-Transform-Team (Work In Progress) board.
Mon, Mar 3, 4:06 PM · Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Phase 2 - testwiki Main namespace support), ProofreadPage, Parsoid
Jgiannelos added a project to T278481: Parsoid support for the ProofreadPage extension: Content-Transform-Team (Work In Progress).
Mon, Mar 3, 4:06 PM · Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Phase 2 - testwiki Main namespace support), ProofreadPage, Parsoid
Jgiannelos moved T387277: Rollout more wikis after week 1 of testing with production traffic from Code Review to To Deploy on the Content-Transform-Team (Work In Progress) board.
Mon, Mar 3, 4:04 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos moved T387277: Rollout more wikis after week 1 of testing with production traffic from Backlog to Code Review on the Content-Transform-Team (Work In Progress) board.
Mon, Mar 3, 4:04 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
MSantos awarded T372746: hewiki: Route mobile-html to the backing node service instead of RESTBase a Love token.
Mon, Mar 3, 3:31 PM · Essential-Work, RESTBase Sunsetting, Content-Transform-Team-WIP, serviceops, Code-Health-Objective, Page Content Service

Thu, Feb 27

Jgiannelos claimed T387277: Rollout more wikis after week 1 of testing with production traffic.
Thu, Feb 27, 3:41 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos added a project to T387277: Rollout more wikis after week 1 of testing with production traffic: Content-Transform-Team (Work In Progress).
Thu, Feb 27, 3:40 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos created T387472: Add time jitter on TTL when invalidating caches on PCS.
Thu, Feb 27, 3:35 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos created T387438: Add cache purge support on nodejs cassandra storage middleware.
Thu, Feb 27, 8:48 AM · serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos created T387437: Adapt changeprop rules to purge content on resource changes for non main namespace events.
Thu, Feb 27, 8:47 AM · serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos created T387436: Adapt changeprop rules to only pregenerate content on main namespace .
Thu, Feb 27, 8:46 AM · serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos created T387435: Add page namespace information on resource change events.
Thu, Feb 27, 8:45 AM · Data-Engineering-Radar, Data-Engineering, Event-Platform, serviceops, RESTBase Sunsetting, Epic, Page Content Service

Wed, Feb 26

Jgiannelos moved T387343: Enable native prometheus metrics in RESTBase from Backlog to Code Review on the Content-Transform-Team (Work In Progress) board.
Wed, Feb 26, 3:09 PM · SRE Observability, Content-Transform-Team (Work In Progress), observability, RESTBase Sunsetting
Jgiannelos claimed T387343: Enable native prometheus metrics in RESTBase.
Wed, Feb 26, 3:09 PM · SRE Observability, Content-Transform-Team (Work In Progress), observability, RESTBase Sunsetting
Jgiannelos updated subscribers of T387277: Rollout more wikis after week 1 of testing with production traffic.

@Seddon, any objections to the list? I wasn't sure whether a language-specific experiment is running that could be an issue.

Wed, Feb 26, 9:23 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos created T387277: Rollout more wikis after week 1 of testing with production traffic.
Wed, Feb 26, 9:17 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos updated subscribers of T386919: Pregeneration performance optimizations for PCS.

After discussing with @Ottomata and @Joe, adding more fields sounds like a good idea if we need them. Using meta, though, is a bad idea: https://wikitech.wikimedia.org/wiki/Event_Platform/Flaws#meta_field

Wed, Feb 26, 8:51 AM · serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos added a comment to T372749: hewiki: Use backing node service instead of RESTBase on pregeneration changeprop rules.

Metrics look OK overall.
Using the client-side session data, I don't see any bump in session.page_load_latency, but I would like to see how it holds up over a week's worth of traffic.
More specifically:

Wed, Feb 26, 8:15 AM · RESTBase Sunsetting, Content-Transform-Team-WIP, serviceops, Code-Health-Objective, Page Content Service, Core Platform Team Initiatives (API Gateway), Platform Engineering Roadmap, Platform Engineering Roadmap Decision Making

Tue, Feb 25

Jgiannelos created P73585 (An Untitled Masterwork).
Tue, Feb 25, 4:18 PM
Jgiannelos added a comment to T386919: Pregeneration performance optimizations for PCS.

@Joe, I spent some time figuring out how EventBus works in order to create and emit events, but I don't think the current schema gives us the flexibility to add more information about the page namespace of an event.
Do you think the issue described in this ticket justifies a schema change? I was thinking the least invasive way is to just add a tag, like:

{
  "meta": {
    "uri": "http://de.wikipedia.org/api/rest_v1/page/html/Heinrich_von_Othegraven",
    "stream": "resource_change",
    "request_id": "da40ebc5-556f-4609-be09-b6c82b0661e2",
    "id": "9c996201-f391-11ef-9615-87603b9b8d19",
    "dt": "2025-02-25T16:00:19.488Z",
    "domain": "de.wikipedia.org"
  },
  "$schema": "/resource_change/1.0.0",
  "tags": [
    "restbase",
    "main_namespace"
  ],
  "triggered_by": "req:da40ebc5-556f-4609-be09-b6c82b0661e2,mediawiki.revision-create:https://de.wikipedia.org/wiki/Heinrich_von_Othegraven"
}
Tue, Feb 25, 4:05 PM · serviceops, RESTBase Sunsetting, Epic, Page Content Service
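With such a tag, the pregeneration side would only need a membership test to skip non-main-namespace pages. A sketch; `main_namespace` is only the tag proposed in this comment, not part of the current schema, and the event below is an abridged copy of the example:

```python
import json


def is_main_namespace(event: dict) -> bool:
    """True if the event carries the proposed "main_namespace" tag."""
    # Events without "tags" (or with other tags only) are treated as non-main.
    return "main_namespace" in event.get("tags", [])


event = json.loads("""
{
  "meta": {"uri": "http://de.wikipedia.org/api/rest_v1/page/html/Heinrich_von_Othegraven",
           "stream": "resource_change", "domain": "de.wikipedia.org"},
  "$schema": "/resource_change/1.0.0",
  "tags": ["restbase", "main_namespace"]
}
""")
print(is_main_namespace(event))
```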
Jgiannelos added a comment to T386926: Scale up Kartotherian on Wikikube and move live traffic to it.

I think so, yes: both the master nodes and the Postgres read replicas.

Tue, Feb 25, 2:30 PM · Infrastructure-Foundations, serviceops, Maps (Kartotherian)
Jgiannelos added a comment to T386926: Scale up Kartotherian on Wikikube and move live traffic to it.

Just a comment about usage on the bare-metal nodes: keep in mind that, besides the node service, each node also runs Postgres/PostGIS, and the master nodes also run the OSM import pipeline.

Tue, Feb 25, 2:24 PM · Infrastructure-Foundations, serviceops, Maps (Kartotherian)
Jgiannelos closed T372746: hewiki: Route mobile-html to the backing node service instead of RESTBase, a subtask of T264670: Move PCS endpoints behind API Gateway, as Resolved.
Tue, Feb 25, 12:35 PM · Content-Transform-Team (Work In Progress), RESTBase Sunsetting, Epic, serviceops, Code-Health-Objective, Page Content Service
Jgiannelos closed T372746: hewiki: Route mobile-html to the backing node service instead of RESTBase as Resolved.
Tue, Feb 25, 12:35 PM · Essential-Work, RESTBase Sunsetting, Content-Transform-Team-WIP, serviceops, Code-Health-Objective, Page Content Service
Jgiannelos assigned T372746: hewiki: Route mobile-html to the backing node service instead of RESTBase to hnowlan.
Tue, Feb 25, 12:35 PM · Essential-Work, RESTBase Sunsetting, Content-Transform-Team-WIP, serviceops, Code-Health-Objective, Page Content Service

Feb 20 2025

Jgiannelos added a comment to T386919: Pregeneration performance optimizations for PCS.

Using the following queries:

SELECT COUNT(*)
FROM
  (SELECT regexp_like(uri_path, '/api/rest_v1/page/mobile-html/(User_talk|User|Wikipedia|File_talk|File|Category_talk|Category|Draft|Template|Template_talk|Wikipedia_talk|Draft_talk|Portal|Module|Module_talk)\%3A.*') AS is_non_main_ns_mobile_html,
          uri_path
   FROM webrequest_sampled_128
   WHERE __time >= TIMESTAMP '2025-02-05 00:00:00'
     AND __time < TIMESTAMP '2025-02-06 00:00:00'
     AND uri_host = 'en.wikipedia.org' )
WHERE
  is_non_main_ns_mobile_html=true
Feb 20 2025, 3:34 PM · serviceops, RESTBase Sunsetting, Epic, Page Content Service
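To sanity-check the classification locally, the same pattern can be applied in Python. A sketch: `re.search` gives an unanchored match like the one the query relies on, the sample paths are made up, and scaling by 128 assumes `webrequest_sampled_128` keeps one request in 128, as its name suggests.

```python
import re

# Same namespace alternation as the Druid query above.
NON_MAIN_NS = (
    "User_talk|User|Wikipedia|File_talk|File|Category_talk|Category|Draft|"
    "Template|Template_talk|Wikipedia_talk|Draft_talk|Portal|Module|Module_talk"
)
# "%3A" is the URL-encoded ":" between namespace and title.
NON_MAIN_RE = re.compile(rf"/api/rest_v1/page/mobile-html/(?:{NON_MAIN_NS})%3A")


def count_non_main(paths):
    """Count mobile-html paths whose title is in one of the listed namespaces."""
    return sum(1 for path in paths if NON_MAIN_RE.search(path))


def estimate_total(sampled_count, sampling_rate=128):
    """Scale a count from a 1-in-N sampled table back up (assumes uniform sampling)."""
    return sampled_count * sampling_rate


paths = [
    "/api/rest_v1/page/mobile-html/Alabama",
    "/api/rest_v1/page/mobile-html/User%3AExample",
    "/api/rest_v1/page/mobile-html/Template_talk%3AInfobox",
    "/api/rest_v1/page/summary/User%3AExample",
]
n = count_non_main(paths)
print(n, estimate_total(n))
```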
Jgiannelos updated the task description for T386919: Pregeneration performance optimizations for PCS.
Feb 20 2025, 1:33 PM · serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos created T386919: Pregeneration performance optimizations for PCS.
Feb 20 2025, 12:25 PM · serviceops, RESTBase Sunsetting, Epic, Page Content Service
Jgiannelos closed T385718: PCS raises errors when trying to send requests to eventgate, a subtask of T314025: [EPIC] Migrate PCS service away from restbase, as Resolved.
Feb 20 2025, 12:03 PM · Content-Transform-Team-WIP, RESTBase Sunsetting, Epic, Platform Team Workboards (MW Expedition), Page Content Service
Jgiannelos closed T385718: PCS raises errors when trying to send requests to eventgate as Resolved.
Feb 20 2025, 12:03 PM · Content-Transform-Team-WIP, RESTBase Sunsetting, Page Content Service

Feb 19 2025

MSantos awarded T319365: PCS caching and pregeneration when restbase is decommissioned a Love token.
Feb 19 2025, 3:37 PM · Wikipedia-Android-App-Backlog, Data-Persistence, Patch-For-Review, Epic, User-jijiki, RESTBase Sunsetting, Content-Transform-Team-WIP, Traffic, Wikipedia-iOS-App-Backlog, iOS-app-feature-Performance, RESTBase, serviceops
Jgiannelos closed T385821: Dont propagate server error details to end users, a subtask of T264670: Move PCS endpoints behind API Gateway, as Resolved.
Feb 19 2025, 3:33 PM · Content-Transform-Team (Work In Progress), RESTBase Sunsetting, Epic, serviceops, Code-Health-Objective, Page Content Service
Jgiannelos closed T385821: Dont propagate server error details to end users as Resolved.
Feb 19 2025, 3:33 PM · Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos added a comment to T385821: Dont propagate server error details to end users.

Verified on staging

Feb 19 2025, 2:04 PM · Content-Transform-Team (Work In Progress), serviceops, Page Content Service
Jgiannelos added a comment to T386648: Review maps outage happened on Feb 17th 2025.

When I try this request in my local env, the following shows up in the trace logs, so it should indeed eventually request en.wikipedia.org:

[2025-02-19T05:00:20.053Z] TRACE: kartotherian/20 on 48bfdbb6c162: Outgoing request (request_id=6b03dd00-ee7e-11ef-b32e-cf29ca820246, levelPath=trace/req)
    out_request: {
      "method": "post",
      "uri": "https://en.wikipedia.org/w/api.php",
      "headers": {
        "user-agent": "kartotherian",
        "x-request-id": "6b03dd00-ee7e-11ef-b32e-cf29ca820246"
      },
      "body": {
        "format": "json",
        "formatversion": "2",
        "action": "query",
        "revids": "1268325471",
        "prop": "mapdata",
        "mpdlimit": "max",
        "mpdgroups": [
          "_1b46af921bb4e1f090a1b07748a50bd2e1f322fc"
        ]
      }
    }
    --
    request: {
      "url": "/img/osm-intl,a,a,a,300x200.png?lang=en&domain=en.wikipedia.org&title=Alabama&revid=1268325471&groups=_1b46af921bb4e1f090a1b07748a50bd2e1f322fc",
      "headers": {
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:135.0) Gecko/20100101 Firefox/135.0",
        "x-request-id": "6b03dd00-ee7e-11ef-b32e-cf29ca820246"
      },
      "method": "GET",
      "params": {
        "0": "/img/osm-intl,a,a,a,300x200.png"
      },
      "query": {
        "lang": "en",
        "domain": "en.wikipedia.org",
        "title": "Alabama",
        "revid": "1268325471",
        "groups": [
          "_1b46af921bb4e1f090a1b07748a50bd2e1f322fc"
        ]
      },
      "remoteAddress": "192.168.65.1",
      "remotePort": 48165
    }
Feb 19 2025, 5:02 AM · Content-Transform-Team, serviceops, Maps (Kartotherian)
Jgiannelos added a comment to T386648: Review maps outage happened on Feb 17th 2025.

For debugging purposes, this is a URL that requests a snapshot with an overlay map from en.wikipedia.org:

https://maps.wikimedia.org/img/osm-intl,a,a,a,300x200.png?lang=en&domain=en.wikipedia.org&title=Alabama&revid=1268325471&groups=_1b46af921bb4e1f090a1b07748a50bd2e1f322fc
Feb 19 2025, 4:48 AM · Content-Transform-Team, serviceops, Maps (Kartotherian)
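The trace above shows the snapshot request being turned into a POST to the wiki's Action API. The mapping can be reconstructed from the visible fields alone, as in the sketch below; it is not Kartotherian's code, and splitting `groups` on commas is an assumption:

```python
from urllib.parse import parse_qs, urlsplit


def mapdata_request(snapshot_url: str):
    """Map a snapshot URL to the Action API call seen in the trace (endpoint, POST body)."""
    query = parse_qs(urlsplit(snapshot_url).query)
    endpoint = f"https://{query['domain'][0]}/w/api.php"
    body = {
        "format": "json",
        "formatversion": "2",
        "action": "query",
        "revids": query["revid"][0],
        "prop": "mapdata",
        "mpdlimit": "max",
        # Assumption: several groups would arrive comma-separated.
        "mpdgroups": query["groups"][0].split(","),
    }
    return endpoint, body


url = ("https://maps.wikimedia.org/img/osm-intl,a,a,a,300x200.png?lang=en"
       "&domain=en.wikipedia.org&title=Alabama&revid=1268325471"
       "&groups=_1b46af921bb4e1f090a1b07748a50bd2e1f322fc")
endpoint, body = mapdata_request(url)
print(endpoint)
print(body["revids"], body["mpdgroups"])
```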

Feb 18 2025

Jgiannelos added a comment to T386648: Review maps outage happened on Feb 17th 2025.

I agree, but the only environment that could hang on en.wikipedia.org is k8s; the maps nodes can talk to that endpoint directly.

Feb 18 2025, 2:11 PM · Content-Transform-Team, serviceops, Maps (Kartotherian)
Jgiannelos added a comment to T386648: Review maps outage happened on Feb 17th 2025.

The errors might not show up on the k8s side because there was actually no error there: the ETIMEDOUT is raised on the client side, while the server side just hangs.

Feb 18 2025, 1:54 PM · Content-Transform-Team, serviceops, Maps (Kartotherian)
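This failure mode can be reproduced locally. A Python sketch (Kartotherian itself is Node.js, where the same situation surfaces as ETIMEDOUT): the server accepts the connection but never replies, so it has no error to log, and only the client sees a timeout.

```python
import socket
import threading


def start_hanging_server():
    """Listen on a free local port, accept one connection and never answer it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    held = []  # keep accepted sockets referenced so they stay open

    def accept_and_hang():
        try:
            conn, _ = server.accept()
            held.append(conn)
        except OSError:
            pass  # server closed before a client arrived

    threading.Thread(target=accept_and_hang, daemon=True).start()
    return server, held


def request_with_timeout(host: str, port: int, timeout: float) -> str:
    """Send a request and wait for a reply; only the client notices the timeout."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        try:
            sock.recv(1024)
            return "response"
        except socket.timeout:
            return "client-side timeout"


server, _held = start_hanging_server()
print(request_with_timeout("127.0.0.1", server.getsockname()[1], timeout=0.5))
server.close()
```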
Jgiannelos claimed T385821: Dont propagate server error details to end users.
Feb 18 2025, 8:21 AM · Content-Transform-Team (Work In Progress), serviceops, Page Content Service