hnowlan (Hugh Nowlan)
User

User Details

User Since
Jan 6 2020, 12:19 PM (237 w, 4 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
HNowlan (WMF) [ Global Accounts ]

Recent Activity

Wed, Jul 10

hnowlan closed T369655: Electrons missing when previewing SVG file in PNG format as Resolved.

This appears to have been an issue with a previous version of librsvg. I've purged the caches for each of the affected images in this case, and they now render correctly. If you notice any other similarly incorrect images, purging the cache will hopefully address it.

Wed, Jul 10, 9:51 AM · Thumbor, Commons

Tue, Jul 9

hnowlan added a comment to T365995: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad.

kubernetes* and mw* are ready

Tue, Jul 9, 2:33 PM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE

Mon, Jul 8

hnowlan added a comment to T369482: Specific SVG rendered as empty grey PNG thumbnail file.

Confirming that this is an issue: I see the same behaviour with a clean cache and when rendering locally with rsvg-convert. This is either an issue with the file or a new issue in rsvg-convert.

Mon, Jul 8, 2:47 PM · Thumbor
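
Reproducing a thumbnail locally, as described above, can be scripted so the same render is easy to repeat. A minimal sketch, assuming rsvg-convert is on PATH and using only its --width, --format and --output options; `Example.svg` and the output name are placeholders, and Thumbor's own invocation is not shown here and may pass different options.

```python
import shlex
import subprocess


def rsvg_convert_cmd(svg_path: str, png_path: str, width: int) -> list:
    """Build an rsvg-convert command rendering svg_path to a PNG `width` pixels wide."""
    if width <= 0:
        raise ValueError("width must be a positive number of pixels")
    return [
        "rsvg-convert",
        "--width", str(width),
        "--format", "png",
        "--output", png_path,
        svg_path,
    ]


def render(svg_path: str, png_path: str, width: int) -> None:
    """Run the render; raises CalledProcessError if rsvg-convert exits non-zero."""
    subprocess.run(rsvg_convert_cmd(svg_path, png_path, width), check=True)


if __name__ == "__main__":
    # Print the command for a 180px thumbnail instead of running it.
    print(shlex.join(rsvg_convert_cmd("Example.svg", "Example-180px.png", 180)))
```

Comparing that output with a freshly purged thumbnail helps separate a problem in the file from a problem in the renderer.
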
hnowlan changed the status of T350507: Update mobileapps k8s deployment chart for Cassandra credentials from Open to Stalled.

Are there plans on when and how we need to move this forward to production?

Mon, Jul 8, 11:43 AM · Content-Transform-Team, Patch-For-Review, Page Content Service, serviceops, RESTBase Sunsetting
hnowlan moved T350507: Update mobileapps k8s deployment chart for Cassandra credentials from Incoming 🐫 to 🛎 Services & Oids on the serviceops board.
Mon, Jul 8, 11:43 AM · Content-Transform-Team, Patch-For-Review, Page Content Service, serviceops, RESTBase Sunsetting
hnowlan changed the status of T350507: Update mobileapps k8s deployment chart for Cassandra credentials, a subtask of T348995: Introduce PCS cache management layer, from Open to Stalled.
Mon, Jul 8, 11:42 AM · Page Content Service, serviceops, RESTBase Sunsetting, Content-Transform-Team-WIP
hnowlan updated the task description for T350507: Update mobileapps k8s deployment chart for Cassandra credentials.
Mon, Jul 8, 11:41 AM · Content-Transform-Team, Patch-For-Review, Page Content Service, serviceops, RESTBase Sunsetting

Sat, Jul 6

hnowlan edited projects for T369388: Upload errors due to swift failures, 503s, added: Data-Persistence; removed serviceops.
Sat, Jul 6, 6:15 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
hnowlan added a comment to T369388: Upload errors due to swift failures, 503s.

It seems a bad frontend server was the source of these errors. A rolling restart appears to have addressed this, but we'll follow up during the week to see if there are any obvious causes.

Sat, Jul 6, 5:31 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage

Fri, Jun 28

hnowlan created T368743: Relabel codfw kubernetes nodes.
Fri, Jun 28, 3:29 PM · SRE, ops-codfw, Kubernetes, Prod-Kubernetes, DC-Ops, serviceops
hnowlan added a comment to T368301: STL 3D models broken: "Sorry, the file Undefined cannot be displayed since it is not present on the current page.".

The Thumbor-side issues were a side-effect of the upgrade that our tests didn't catch due to differences in config between prod and test. I see the supplied image rendering now (I've purged the cache for that image to be sure). However, I am still seeing the Undefined error, which I suspect is either a long-tail side-effect of the Thumbor issue or something unrelated.

Fri, Jun 28, 10:28 AM · serviceops, Regression, 3D, Commons

Thu, Jun 27

hnowlan created T368619: Log filename in shellbox-video httpd.
Thu, Jun 27, 1:38 PM · Video, TimedMediaHandler, MW-on-K8s, serviceops
kamila awarded T357309: Create a deployment for `shellbox-timedmedia` a Love token.
Thu, Jun 27, 9:59 AM · Patch-For-Review, Video, TimedMediaHandler, MW-on-K8s, serviceops
hnowlan closed T357309: Create a deployment for `shellbox-timedmedia` as Resolved.

I'm sure there'll be some tweaks further down the road, but this deployment has been created. Tracking further work in T356241.

Thu, Jun 27, 9:53 AM · Patch-For-Review, Video, TimedMediaHandler, MW-on-K8s, serviceops
hnowlan closed T357309: Create a deployment for `shellbox-timedmedia`, a subtask of T356241: Move video transcoding to use Shellbox, as Resolved.
Thu, Jun 27, 9:51 AM · Patch-For-Review, Video, TimedMediaHandler, MW-on-K8s, serviceops

Jun 26 2024

hnowlan updated the task description for T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm.
Jun 26 2024, 10:47 AM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
hnowlan added a comment to T368405: Special:Homepage is rendered much slower (<1 sec to 2+ sec).

In a different conversation, @kostajh pointed out that the requests to the AnalyticsQueryService, added in T235810 to show the site edits in the last day, might be a contributing factor. They are in fact still in use as a fallback in \GrowthExperiments\HomepageModules\SuggestedEdits::getMobileSummaryBody, and that else-branch also logs a warning if we do not get something useful back from the AnalyticsQueryService. That warning is seeing a suspicious uptick as well.

So I think a good next place to look would be to ask AQS folks if anything there changed on June 18.

I'm sure the developers would be best positioned to say whether anything has changed, but as far as the AQS services themselves are concerned, there don't seem to have been any significant increases in latency: service-level view, REST gateway-level view (easiest to read if you filter out the proton metrics).

Jun 26 2024, 9:44 AM · Data-Platform-SRE (2024.07.08 - 2024.07.28), Growth-Team (FY2024-25 Q1 Sprint 1), MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Data Products, User-Michael, Data-Platform, Performance Issue, GrowthExperiments-Homepage
hnowlan added a comment to T364799: Write access policy for AQS docs.

@VirginiaPoundstone I am not aware of any rate limits on the AQS 2.0 service; tagging @hnowlan for confirmation.

Jun 26 2024, 8:53 AM · Patch-For-Review, Data Products (Data Products Sprint 15), AQS2.0, Tech-Docs-Team, Documentation

Jun 24 2024

hnowlan added a comment to T368180: Thumbor high log volume and unstructured logging.

Log level reduced to stop the bleeding.

Jun 24 2024, 10:22 AM · Patch-For-Review, Thumbor, Observability-Logging

Jun 21 2024

Quiddity awarded T337139: Hyphenated langtags in Thumbor/7.3.2 and librsvg 2.44.10 do not show any text a Like token.
Jun 21 2024, 5:25 PM · Patch-For-Review, Platform Team Workboards (Platform Engineering Reliability), Wikimedia-SVG-rendering, Thumbor Migration, Thumbor

Jun 13 2024

hnowlan added a comment to T333120: Migrate internal traffic to k8s.

I believe the straggling traffic here is a misnomer, or rather a misreading of the graph: the API gateway's envoy config internally refers to traffic to the MediaWiki API as "mwapi_cluster", both before and after the hostname was changed. I believe these requests are normal and are already being routed to k8s. Drop mwapi_cluster from the graph and we're at zero! 🎉

Jun 13 2024, 3:57 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
hnowlan added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

@WDoranWMF Yep, it makes sense. I confirmed the API paths with @mforns and we agreed on metrics/commons-analytics. Regarding the prefix, all the AQS services use /api/rest_v1/metrics, so we could use that for consistency. But I would still like @VirginiaPoundstone to confirm.

Jun 13 2024, 1:41 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
hnowlan closed T254917: Implement API Gateway solution for deployment-prep as Declined.

I don't think this will be needed.

Jun 13 2024, 11:12 AM · Patch-Needs-Improvement, serviceops, Beta-Cluster-Infrastructure, Platform Team Workboards (Green), Core Platform Team Initiatives (API Gateway)
hnowlan closed T254917: Implement API Gateway solution for deployment-prep, a subtask of T255034: Wikimedia API Gateway Long-term Use, as Declined.
Jun 13 2024, 11:12 AM · serviceops, Platform Engineering Roadmap, Epic, Platform Team Workboards (Epics), Core Platform Team Initiatives (API Gateway)

Jun 12 2024

hnowlan closed T365439: Investigate why article-descriptions LiftWing API returns 404 when encoded colon is used in request URL, a subtask of T343123: Migrate Machine-generated Article Descriptions from toolforge to liftwing., as Resolved.
Jun 12 2024, 10:47 AM · Wikipedia-Android-App-Backlog (Android Release - FY2024-25), Machine-Learning-Team
hnowlan closed T365439: Investigate why article-descriptions LiftWing API returns 404 when encoded colon is used in request URL as Resolved.
Jun 12 2024, 10:47 AM · Machine-Learning-Team
hnowlan added a comment to T261192: Rendering multilingual (systemLanguage) SVG files fails locally after upgrading librsvg from 2.40.21 to 2.44.10.

This change now uses librsvg's accept-language flag to obey `lang`.

Jun 12 2024, 10:41 AM · Patch-For-Review, Upstream, Wikimedia-SVG-rendering, MediaWiki-File-management, I18n
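
For context on what obeying `lang` involves: multilingual SVGs put translations inside a `<switch>` and tag each child with `systemLanguage`. The sketch below follows the SVG 1.1 matching rule (a user language matches a listed tag if it equals it, or equals a prefix of it followed by "-"). It illustrates the spec only, not librsvg's or Thumbor's code, and the sample SVG is made up.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def lang_matches(user_langs: list[str], system_language: str) -> bool:
    """SVG 1.1 systemLanguage test (case-insensitive): true if a user language
    equals a listed tag, or equals a prefix of it followed by '-'."""
    listed = [t.strip().lower() for t in system_language.split(",") if t.strip()]
    for user in (u.strip().lower() for u in user_langs):
        for tag in listed:
            if tag == user or tag.startswith(user + "-"):
                return True
    return False


def pick_switch_child(switch: ET.Element, user_langs: list[str]) -> ET.Element | None:
    """Return the first direct child of <switch> that passes the test.
    A child without systemLanguage always passes, so it acts as the fallback."""
    for child in switch:
        attr = child.get("systemLanguage")
        if attr is None or lang_matches(user_langs, attr):
            return child
    return None


SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
  <switch>
    <text systemLanguage="de">Hallo</text>
    <text systemLanguage="en-US, en-GB">Hello</text>
    <text>Hi</text>
  </switch>
</svg>"""

if __name__ == "__main__":
    switch = ET.fromstring(SAMPLE).find(f"{SVG_NS}switch")
    for langs in (["en"], ["de"], ["de-CH"]):
        print(langs, "->", pick_switch_child(switch, langs).text)
```

Note the asymmetry in the spec rule: a preference of `en` matches a child tagged `en-US`, but `de-CH` does not match a child tagged `de`, so it falls through to the untagged fallback.
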

Jun 11 2024

Krinkle awarded T265549: Update librsvg to version > 2.44.10 (2.50.3) an Orange Medal token.
Jun 11 2024, 5:51 PM · User-notice-archive, Packaging, Wikimedia-SVG-rendering, Thumbor
hnowlan added a comment to T295007: Upload by URL should use the job queue, possibly chunked with range requests.

I’m not aware of any issues, but if this is working (great!), then we should probably enable it by default in MediaWiki, and everywhere in production? Right now AFAICT $wgEnableAsyncUploadsByURL still defaults to false in core and is only enabled in commonswiki, testwiki and the Beta Cluster in wmf-config.

Jun 11 2024, 10:19 AM · MW-1.42-notes (1.42.0-wmf.24; 2024-03-26), MediaWiki CodeJam Dec 2023, MediaWiki-Uploading

Jun 10 2024

1234qwer1234qwer4 awarded T265549: Update librsvg to version > 2.44.10 (2.50.3) a Barnstar token.
Jun 10 2024, 9:56 PM · User-notice-archive, Packaging, Wikimedia-SVG-rendering, Thumbor
Pcoombe awarded T265549: Update librsvg to version > 2.44.10 (2.50.3) a Love token.
Jun 10 2024, 8:39 PM · User-notice-archive, Packaging, Wikimedia-SVG-rendering, Thumbor
hnowlan closed T199618: Thin stroke-width=".02" not rendered until ~250% zoom level as Resolved.

Thin lines are now rendering in the test cases given.

Jun 10 2024, 5:35 PM · Upstream, Wikimedia-SVG-rendering, Multimedia, Commons
hnowlan closed T97233: Incorrect text positioning in SVG with tspan element and text-anchor attribute as Resolved.

Considering this resolved as part of T355020. Please reopen if that's incorrect.

Jun 10 2024, 5:24 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T97233: Incorrect text positioning in SVG with tspan element and text-anchor attribute, a subtask of T35245: Incorrect text positioning/kerning in SVG rendering (text/tspan x/y, dx/dy attribute; upstream), as Resolved.
Jun 10 2024, 5:23 PM · Thumbor, Wikimedia-SVG-rendering, Upstream
hnowlan closed T246001: SVG <use> element inside <clipPath> should only reference path, text, or basic shapes as Resolved.

Not seeing this behaviour onwiki with librsvg 2.54.7 in place; it appears fixed.

Jun 10 2024, 5:21 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan added a comment to T97233: Incorrect text positioning in SVG with tspan element and text-anchor attribute.

For existing images that have rendering issues, how does one kick-off rerendering existing images that had issues after this upgrade?

https://en.wikipedia.org/wiki/Help:Purge#Purge_request_to_server - tldr add ?action=purge when viewing the File:myfile.svg page.

Be sure to force refresh your browser after doing it to be sure changes have rendered.

Thank you. Gave that a shot and sadly do not see an improvement in the example I previously shared. (yes cleared browser cache and even tried another browser that hasn't previously pulled this page up before).
(Also does Wikipedia have a CDN? perhaps that's also caching it somehow/somewhere?)

I had actually already run a purge for your image, which improved some but not all of the text alignment. This must be another issue outstanding in our current version of librsvg, but I don't know enough to point to which specific issue unfortunately. In coming weeks we'll be upgrading to 2.54.7 which will offer further improvements.

Jun 10 2024, 5:05 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
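
The `?action=purge` route above is the manual one. A sketch of the same request through the MediaWiki Action API, assuming `action=purge` (which must be sent as a POST) and assuming that purging a File: page also refreshes its thumbnails. The User-Agent string is a placeholder to be replaced with real contact details, and the request is only built here, not sent.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request, urlopen


def purge_request(api_url: str, titles: list) -> Request:
    """Build (but do not send) a MediaWiki action=purge request.

    Passing a body via `data` makes urllib issue a POST, which action=purge requires.
    """
    body = urlencode({
        "action": "purge",
        "titles": "|".join(titles),  # the API separates multiple titles with "|"
        "format": "json",
    }).encode("utf-8")
    headers = {"User-Agent": "purge-sketch/0.1 (replace with contact info)"}
    return Request(api_url, data=body, headers=headers)


def send(req: Request) -> dict:
    """Send the request and decode the JSON reply (performs network I/O)."""
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = purge_request("https://commons.wikimedia.org/w/api.php", ["File:Example.svg"])
    print(req.get_method(), req.full_url)
    print(parse_qs(req.data.decode("utf-8")))
```

As the thread notes, a browser-side cache can still show the old image after a successful purge, so a forced refresh is worth doing either way.
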
hnowlan closed T355020: Upgrade Thumbor to Debian Bookworm as Resolved.
Jun 10 2024, 4:56 PM · Patch-For-Review, Internet-Archive, Thumbor
hnowlan closed T355020: Upgrade Thumbor to Debian Bookworm, a subtask of T246014: set default dpi to 96 for rsvg, as Resolved.
Jun 10 2024, 4:54 PM · Thumbor, Wikimedia-SVG-rendering
hnowlan added a comment to T246014: set default dpi to 96 for rsvg.

@hnowlan: But that task is resolved already (please only write T336881 instead of full URLs to get corresponding link rendering) so why would this one remain open? Not sure I can follow.

Jun 10 2024, 11:36 AM · Thumbor, Wikimedia-SVG-rendering
hnowlan added a comment to T246001: SVG <use> element inside <clipPath> should only reference path, text, or basic shapes.

Not seeing this behaviour onwiki with librsvg 2.54.7 in place; it appears fixed.

Jun 10 2024, 11:32 AM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan added a comment to T172312: Install SiyamRupali font (bengali) for svg.

Is this still an issue? As far as I can tell, we have installed all Bengali fonts currently available in Debian (along with the Noto fonts from T184664), but SiyamRupali is not one of them. If there's an example of a broken SVG, it'd be helpful for debugging this further.

Jun 10 2024, 11:25 AM · Wikimedia-SVG-rendering, Bengali-Sites
hnowlan added a comment to T364362: Some svg files are not rendering properly.

The generated image post-purge/post-upgrade appears to render correctly for me; perhaps it was fixed in the last upgrade?

Jun 10 2024, 9:55 AM · Thumbor, Wikimedia-SVG-rendering

Jun 7 2024

hnowlan added a comment to T246014: set default dpi to 96 for rsvg.

This will be done with T355020

Jun 7 2024, 4:14 PM · Thumbor, Wikimedia-SVG-rendering
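
Why the DPI default matters: SVG lengths in physical units (mm, cm, in, pt, pc) only become pixels through a DPI value, so the same file renders at a different pixel size under a different default. CSS fixes 1in = 96px, which is the value this task asks for. A small conversion sketch; the 90 DPI comparison is only for illustration.

```python
# Number of each absolute unit in one inch (CSS/SVG definitions).
PER_INCH = {"in": 1.0, "cm": 2.54, "mm": 25.4, "pt": 72.0, "pc": 6.0}


def to_px(value: float, unit: str, dpi: float = 96.0) -> float:
    """Convert an absolute SVG length to pixels at the given DPI."""
    if unit == "px":
        return value  # px is already the target unit
    if unit not in PER_INCH:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * dpi / PER_INCH[unit]


if __name__ == "__main__":
    # An A4-width (210mm) drawing at two DPI settings.
    for dpi in (90.0, 96.0):
        print(f"{dpi:.0f} dpi: {to_px(210, 'mm', dpi):.1f}px")
```
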
hnowlan closed T349673: Small PNG thumbnail of SVG can miss letters close to image border as Resolved.

I agree this isn't necessarily a software issue. That said, since the upgrades the 180px version of this image is no longer cut off.

Jun 7 2024, 4:12 PM · Thumbor, Wikimedia-SVG-rendering
hnowlan added a comment to T355020: Upgrade Thumbor to Debian Bookworm.

https://gerrit.wikimedia.org/r/1039778 is now ready for review. In future we might need to revisit how we do our SSIM comparisons against reference thumbnails. As our tools change, the distances etc. are diverging, and there will come a point where tweaking the test values becomes a problem.

Jun 7 2024, 4:07 PM · Patch-For-Review, Internet-Archive, Thumbor
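
For reference, the comparison being discussed boils down to a structural-similarity score between a reference thumbnail and a fresh render. Below is a deliberately simplified, single-window (global) SSIM over flattened grayscale pixel lists, using the usual constants C1 = (0.01·L)² and C2 = (0.03·L)². Real tooling typically averages SSIM over sliding windows, and the 0.95 threshold is a hypothetical value, not the one the Thumbor tests use.

```python
from statistics import fmean


def global_ssim(x: list, y: list, data_range: float = 255.0) -> float:
    """Single-window SSIM between two equally sized, flattened grayscale images."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("images must have the same size (at least 2 pixels)")
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    n = len(x)
    mx, my = fmean(x), fmean(y)
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (var_x + var_y + c2)
    )


def matches_reference(reference: list, candidate: list, threshold: float = 0.95) -> bool:
    """Hypothetical test helper: accept a render whose SSIM meets the threshold."""
    return global_ssim(reference, candidate) >= threshold


if __name__ == "__main__":
    reference = [0, 64, 128, 255] * 16
    slightly_off = [min(255, v + 2) for v in reference]
    inverted = [255 - v for v in reference]
    print(round(global_ssim(reference, slightly_off), 4))
    print(matches_reference(reference, slightly_off), matches_reference(reference, inverted))
```

A fixed threshold like this is exactly what drifts as renderer versions change, which is the concern raised in the comment.
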

Jun 6 2024

TheDJ awarded T336881: [XL] Upgrade Thumbor to bullseye a Stroopwafel token.
Jun 6 2024, 9:29 PM · Structured-Data-Backlog (Current Work), Platform Team Workboards (Platform Engineering Reliability), serviceops, Thumbor
hnowlan closed T236398: SVG text-decoration="overline" doesn't work as Resolved.

Appears resolved.

Jun 6 2024, 5:01 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T289765: SVG paint-order property doesn't works in SVG renderer as Resolved.

Appears resolved by the librsvg upgrade.

Jun 6 2024, 4:56 PM · Wikimedia-SVG-rendering, Thumbor
hnowlan closed T213139: Discrepancies in SVG Translate's pngs and Commons' versions in character and word spacings as Resolved.
Jun 6 2024, 4:54 PM · Wikimedia-SVG-rendering, Upstream, SVG Translate Tool
hnowlan closed T215815: librsvg uses class attribute order instead of document order during CSS cascade as Resolved.
Jun 6 2024, 4:52 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3) as Resolved.

Resolving for now, following up in related issues.

Jun 6 2024, 4:51 PM · User-notice-archive, Packaging, Wikimedia-SVG-rendering, Thumbor
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T7792: rsvg does not render baseline-shift correctly (<percentage> and <length>), as Resolved.
Jun 6 2024, 4:51 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T65236: Han characters in SVG files misplaced and clustered, as Resolved.
Jun 6 2024, 4:51 PM · Vertical-Writing, Upstream, Chinese-Sites, I18n, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T43422: rsvg cannot handle classes/ids with cyrillic alphabet when styling, as Resolved.
Jun 6 2024, 4:51 PM · Upstream, Thumbor, I18n, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T43423: CSS child selector not supported by rsvg, as Resolved.
Jun 6 2024, 4:51 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T43425: rsvg does not support the font shorthand style property, as Resolved.
Jun 6 2024, 4:51 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T64986: librsvg does not support fallback font set (more than one font family), as Resolved.
Jun 6 2024, 4:51 PM · Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T97233: Incorrect text positioning in SVG with tspan element and text-anchor attribute, as Resolved.
Jun 6 2024, 4:51 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T106240: Colorable SVG, as Resolved.
Jun 6 2024, 4:51 PM · Structured-Data-Backlog, Structured Data Engineering, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T271663: Offer to invert text-anchor for RTL languages, as Resolved.
Jun 6 2024, 4:50 PM · I18n, RTL, Community-Tech, SVG Translate Tool
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T200443: SVG text-anchor=end confused by tspan with following #text, as Resolved.
Jun 6 2024, 4:50 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T294843: BackgroundImage filter antialiasing pixel artifacts, as Resolved.
Jun 6 2024, 4:50 PM · Thumbor, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T316962: librsvg filter using FillPaint referencing a gradient fill does not work., as Resolved.
Jun 6 2024, 4:50 PM · Upstream, Wikimedia-SVG-rendering
hnowlan closed T265549: Update librsvg to version > 2.44.10 (2.50.3), a subtask of T336894: librsvg 2.44.10 causes a regression: <text> with text-anchor="middle" and multiple <tspan>s is misaligned, as Resolved.
Jun 6 2024, 4:50 PM · Commons, Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T304209: SVG `<style>` element ignored if put at the end as Resolved.

This appears fixed.

Jun 6 2024, 4:48 PM · Thumbor, Multimedia, Commons, Upstream, Wikimedia-SVG-rendering
hnowlan closed T294843: BackgroundImage filter antialiasing pixel artifacts as Resolved.

Appears fixed.

Jun 6 2024, 4:40 PM · Thumbor, Wikimedia-SVG-rendering
hnowlan added a comment to T271663: Offer to invert text-anchor for RTL languages.

Has this issue been fixed by the upgrade and the improved text-anchor behaviours?

Jun 6 2024, 4:39 PM · I18n, RTL, Community-Tech, SVG Translate Tool
hnowlan added a comment to T97233: Incorrect text positioning in SVG with tspan element and text-anchor attribute.

For existing images that have rendering issues, how does one kick-off rerendering existing images that had issues after this upgrade?

https://en.wikipedia.org/wiki/Help:Purge#Purge_request_to_server - tldr add ?action=purge when viewing the File:myfile.svg page.

Be sure to force refresh your browser after doing it to be sure changes have rendered.

Thank you. Gave that a shot and sadly do not see an improvement in the example I previously shared. (yes cleared browser cache and even tried another browser that hasn't previously pulled this page up before).
(Also does Wikipedia have a CDN? perhaps that's also caching it somehow/somewhere?)

Jun 6 2024, 4:34 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan added a comment to T64986: librsvg does not support fallback font set (more than one font family).

I believe that, after purging, SVGs now obey the font list. Not closing until this is confirmed.

Jun 6 2024, 4:32 PM · Wikimedia-SVG-rendering
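
What obeying the font list means in practice: a CSS `font-family` value is an ordered fallback list, and the renderer should use the first family that is actually available. A rough sketch; the installed-font set is hypothetical, the comma split ignores quoted names that contain commas, and real font matching (librsvg renders text through Pango, which uses fontconfig on Linux) is considerably more involved.

```python
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def parse_font_family(value: str) -> list:
    """Split a CSS font-family value into family names, dropping quotes."""
    families = []
    for part in value.split(","):
        name = part.strip().strip("'\"").strip()
        if name:
            families.append(name)
    return families


def choose_font(value: str, installed: set, default: str = "sans-serif") -> str:
    """Return the first listed family that is installed (case-insensitive),
    a generic family if one is reached first, otherwise the default."""
    by_lower = {f.lower(): f for f in installed}
    for family in parse_font_family(value):
        key = family.lower()
        if key in by_lower:
            return by_lower[key]
        if key in GENERIC_FAMILIES:
            return key
    return default


if __name__ == "__main__":
    installed = {"Noto Sans Bengali", "DejaVu Sans"}  # hypothetical font set
    print(choose_font("'SiyamRupali', 'Noto Sans Bengali', sans-serif", installed))
```
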
hnowlan closed T7792: rsvg does not render baseline-shift correctly (<percentage> and <length>) as Resolved.

This appears to be resolved.

Jun 6 2024, 4:23 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T200443: SVG text-anchor=end confused by tspan with following #text as Resolved.

This specific issue appears resolved - I see some other unresolved issues in SVG_Test_TextAlign.svg but not pertinent to this issue.

Jun 6 2024, 4:18 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan added a comment to T97233: Incorrect text positioning in SVG with tspan element and text-anchor attribute.

For existing images that have rendering issues, how does one kick-off rerendering existing images that had issues after this upgrade?

Jun 6 2024, 4:13 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T43422: rsvg cannot handle classes/ids with cyrillic alphabet when styling as Resolved.

This appears resolved by T265549.

Jun 6 2024, 4:11 PM · Upstream, Thumbor, I18n, Wikimedia-SVG-rendering
hnowlan closed T316962: librsvg filter using FillPaint referencing a gradient fill does not work. as Resolved.

This appears to be fixed by T265549.

Jun 6 2024, 3:59 PM · Upstream, Wikimedia-SVG-rendering
hnowlan closed T43423: CSS child selector not supported by rsvg as Resolved.

Solved as of T265549.

Jun 6 2024, 3:57 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan closed T43425: rsvg does not support the font shorthand style property as Resolved.

I believe this is solved by T265549.

Jun 6 2024, 3:54 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan added a comment to T97233: Incorrect text positioning in SVG with tspan element and text-anchor attribute.

Since upgrading to 2.50.3 and doing some purges, I am seeing *some* improvement in the errors in this ticket but not in all cases.

Jun 6 2024, 3:48 PM · Thumbor, Upstream, Wikimedia-SVG-rendering
hnowlan added a comment to T265549: Update librsvg to version > 2.44.10 (2.50.3).

We are using Thumbor on bullseye everywhere, which means that SVGs will be rendered by librsvg 2.50.3. Keeping this task open for tracking issues for the moment.

Jun 6 2024, 3:41 PM · User-notice-archive, Packaging, Wikimedia-SVG-rendering, Thumbor
hnowlan added a comment to T364921: Commons Impact Metrics: Data Gateway endpoints.

The last log line is:

{"@timestamp":"2024-06-05T22:49:28Z","message":"Connecting to Cassandra database: aqs2001-a.codfw.wmnet,aqs2001-b.codfw.wmnet,aqs2002-a.codfw.wmnet,aqs2002-b.codfw.wmnet,aqs2003-a.codfw.wmnet,aqs2003-b.codfw.wmnet,aqs2004-a.codfw.wmnet,aqs2004-b.codfw.wmnet,aqs2005-a.codfw.wmnet,aqs2005-b.codfw.wmnet,aqs2006-a.codfw.wmnet,aqs2006-b.codfw.wmnet,aqs2007-a.codfw.wmnet,aqs2007-b.codfw.wmnet,aqs2008-a.codfw.wmnet,aqs2008-b.codfw.wmnet,aqs2009-a.codfw.wmnet,aqs2009-b.codfw.wmnet,aqs2010-a.codfw.wmnet,aqs2010-b.codfw.wmnet,aqs2011-a.codfw.wmnet,aqs2011-b.codfw.wmnet,aqs2012-a.codfw.wmnet,aqs2012-b.codfw.wmnet (port 9042)","log":{"level":"INFO"},"service":{"name":"data-gateway"}}

This suggests Cassandra client session init is hanging for some reason.

Jun 6 2024, 11:15 AM · Data Products, Cassandra, serviceops, Service-deployment-requests, SRE
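
One way to narrow down a hang like this is to check, with a timeout, whether each Cassandra contact point accepts a TCP connection on the logged port. A sketch that pulls the hosts and port out of the log line (the sample below is shortened to two hosts) and can then probe them. It only tests reachability, not CQL authentication or the driver's own session setup, so a clean result would not rule out a client-side problem.

```python
import json
import re
import socket

LOG_LINE = ('{"@timestamp":"2024-06-05T22:49:28Z","message":"Connecting to Cassandra database: '
            'aqs2001-a.codfw.wmnet,aqs2001-b.codfw.wmnet (port 9042)","log":{"level":"INFO"},'
            '"service":{"name":"data-gateway"}}')

CONNECT_MSG = re.compile(r"Connecting to Cassandra database: (?P<hosts>\S+) \(port (?P<port>\d+)\)")


def parse_connect_line(line: str) -> tuple:
    """Return (hosts, port) from a data-gateway 'Connecting to Cassandra' log line."""
    message = json.loads(line).get("message", "")
    match = CONNECT_MSG.search(message)
    if match is None:
        raise ValueError("not a Cassandra connect line")
    return match.group("hosts").split(","), int(match.group("port"))


def probe(hosts: list, port: int, timeout: float = 3.0) -> dict:
    """TCP-connect to each host with a timeout, so a hang shows up as an error
    instead of blocking forever (performs network I/O)."""
    results = {}
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = "ok"
        except OSError as exc:  # covers DNS failures, refusals and timeouts
            results[host] = f"error: {exc}"
    return results


if __name__ == "__main__":
    hosts, port = parse_connect_line(LOG_LINE)
    print(f"{len(hosts)} contact points on port {port}")
    # Run from a pod in the same network as data-gateway: print(probe(hosts, port))
```
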

May 30 2024

hnowlan moved T365571: Rename wikikube worker nodes during OS reimage from Incoming 🐫 to Doing 😎 on the serviceops board.
May 30 2024, 2:10 PM · Kubernetes, Prod-Kubernetes, serviceops
hnowlan moved T366094: k8s master capacity issues from Incoming 🐫 to Doing 😎 on the serviceops board.
May 30 2024, 2:10 PM · serviceops, SRE
hnowlan moved T339863: Thumbor's use of poolcounter is rate limiting Kubernetes IPs from Doing 😎 to 🛎 Services & Oids on the serviceops board.
May 30 2024, 2:09 PM · Structured-Data-Backlog, serviceops, Thumbor
Dbrant awarded T365439: Investigate why article-descriptions LiftWing API returns 404 when encoded colon is used in request URL a Like token.
May 30 2024, 1:24 PM · Machine-Learning-Team
hnowlan added a comment to T365439: Investigate why article-descriptions LiftWing API returns 404 when encoded colon is used in request URL.

I am now seeing results when using queries with urlencoded characters. Unfortunately, we will need to add a manual hack if other non-alphanumeric chars appear in other parts of the URL in future, but for now I think this works:

May 30 2024, 11:38 AM · Machine-Learning-Team

May 28 2024

hnowlan created T366094: k8s master capacity issues.
May 28 2024, 4:58 PM · serviceops, SRE
hnowlan created T366085: Relabel kubernetes2032 to wikikube-worker2002.
May 28 2024, 3:54 PM · SRE, ops-codfw, DC-Ops
hnowlan created P63430 (An Untitled Masterwork).
May 28 2024, 12:21 PM
hnowlan added a comment to T365439: Investigate why article-descriptions LiftWing API returns 404 when encoded colon is used in request URL.

It seems Envoy only normalises a subset of urlencoded characters:

hnowlan@plunkett ~/Code/deployment-charts (hnowlan/T365439-apigw_normalise_path_urls *) $ curl -s localhost:8087/core/v1/wikisource/a/%3A| grep original-path
    "x-envoy-original-path": "/core/v1/wikisource/a/%3A"
hnowlan@plunkett ~/Code/deployment-charts (hnowlan/T365439-apigw_normalise_path_urls *) $ curl -s localhost:8087/core/v1/wikisource/a/%31| grep original-path
    "x-envoy-original-path": "/core/v1/wikisource/a/1"
May 28 2024, 10:51 AM · Machine-Learning-Team
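
The behaviour above (`%31` decoded to `1`, `%3A` left alone) matches RFC 3986 percent-encoding normalisation: only triplets that encode unreserved characters (letters, digits, `-._~`) are decoded, while reserved ones such as `:` stay encoded. A sketch of that rule; I haven't checked it against Envoy's implementation line by line, so treat it as a model of the observed output rather than of Envoy itself.

```python
import re

# RFC 3986 section 2.3: characters that never need percent-encoding.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(path: str) -> str:
    """Decode %XX only where it encodes an unreserved character; leave every
    other triplet encoded, with its hex digits uppercased (RFC 3986 6.2.2)."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()

    return PCT_TRIPLET.sub(replace, path)


if __name__ == "__main__":
    for p in ("/core/v1/wikisource/a/%3A", "/core/v1/wikisource/a/%31"):
        print(p, "->", normalize_percent_encoding(p))
```

Under this rule, a route that expects a literal `:` never sees one when the client sends `%3A`, which is consistent with the 404s in this task.
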

May 27 2024

hnowlan added a comment to T365439: Investigate why article-descriptions LiftWing API returns 404 when encoded colon is used in request URL.

The normalisation change has unfortunately not fixed this issue. The docs indicate that it should have, but I suspect this is something to do with the use of regex matching as opposed to static matching. I'll try to come up with a workaround for the short term.

May 27 2024, 12:36 PM · Machine-Learning-Team
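
A possible short-term workaround when routes are matched by regex is to let each literal `:` in the pattern also accept its encoded forms `%3A` and `%3a`. A sketch with a hypothetical path (the real gateway and LiftWing routes are not reproduced here); whether the generated pattern text can be dropped unchanged into the gateway's regex routes is an assumption I haven't verified.

```python
from __future__ import annotations

import re

# Alternatives a literal ":" may arrive as, since normalisation keeps it encoded.
COLON = "(?::|%3[Aa])"


def colon_tolerant(path: str) -> re.Pattern:
    """Compile an exact-match pattern for `path` in which every ":" also
    matches its percent-encoded form."""
    parts = (re.escape(p) for p in path.split(":"))
    return re.compile("^" + COLON.join(parts) + "$")


# Hypothetical route, for illustration only.
ROUTE = colon_tolerant("/service/example/v1/models/descriptions:predict")

if __name__ == "__main__":
    print(ROUTE.pattern)
    for p in ("/service/example/v1/models/descriptions:predict",
              "/service/example/v1/models/descriptions%3Apredict"):
        print(p, bool(ROUTE.match(p)))
```
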

May 23 2024

hnowlan added a comment to T363996: Sessionstore's discovery TLS cert will expire before end of May 2024.

Certs updated in all DCs, alerts resolved. I sincerely hope we will have the mesh migration done so we can avoid having to update echostore's certificates in October, but in case something prevents that, for reference the process was:

  • puppet cert revoke sessionstore.discovery.wmnet
  • In your local checkout of the puppet repo, run ./utils/create_ecdsa_cert sessionstore.discovery.wmnet sessionstore.svc.eqiad.wmnet sessionstore.svc.codfw.wmnet
  • On the puppetmaster, put the contents of /var/lib/puppet/server/ssl/ca/signed/sessionstore.discovery.wmnet.pem into certs.kask.cert in helmfile.d
  • Add the contents of the new private key from ./modules/secret/secrets/ssl/sessionstore.discovery.wmnet.key to hieradata/role/common/deployment_server/kubernetes.yaml
  • Validate the files and make sure everything looks okay using openssl ec / openssl x509, then git commit your changes in the private repo
  • Follow the Helm rollout process as normal, keeping an eye on the sessionstore graphs and the session loss graphs
May 23 2024, 3:50 PM · Patch-For-Review, serviceops, Data-Persistence
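
Before and after a rotation like this, it helps to check how much time a certificate has left. A sketch that takes the line printed by `openssl x509 -noout -enddate -in <cert.pem>` and converts it to days remaining with Python's `ssl.cert_time_to_seconds`; the sample date and the 30-day warning threshold are arbitrary examples.

```python
from __future__ import annotations

import ssl
import time

PREFIX = "notAfter="


def days_until_expiry(enddate_line: str, now: float | None = None) -> float:
    """Days remaining before the certificate's notAfter time.

    Accepts e.g. 'notAfter=May 31 12:00:00 2024 GMT' (the prefix is optional).
    """
    value = enddate_line.strip()
    if value.startswith(PREFIX):
        value = value[len(PREFIX):]
    expires = ssl.cert_time_to_seconds(value)  # seconds since the epoch, UTC
    current = time.time() if now is None else now
    return (expires - current) / 86400.0


if __name__ == "__main__":
    line = "notAfter=May 31 12:00:00 2024 GMT"  # example value, not the real cert
    left = days_until_expiry(line)
    print(f"{left:.1f} days left" + (" (under 30 days)" if left < 30 else ""))
```
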
hnowlan created T365712: Relabel codfw Kubernetes hosts.
May 23 2024, 1:30 PM · SRE, serviceops, ops-codfw, DC-Ops
hnowlan created T365711: Relabel eqiad Kubernetes hosts.
May 23 2024, 1:26 PM · SRE, serviceops, ops-eqiad, DC-Ops

May 22 2024

hnowlan added a comment to T363996: Sessionstore's discovery TLS cert will expire before end of May 2024.

Current situation - I have refreshed the .key file on the puppet master using a modified version of the create_ecdsa_cert script, and I have pushed the new key to the staging k8s secrets for sessionstore only. I've also updated the cert file in the helmfile configuration for sessionstore, but it hasn't picked it up because that part of the config wasn't checksummed to recreate the pods (fixed in this change). Tomorrow I will try to roll to codfw and, if successful, eqiad.

May 22 2024, 6:22 PM · Patch-For-Review, serviceops, Data-Persistence
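
The checksumming fix mentioned above relies on a common Helm pattern: hash each rendered config file into a pod-template annotation (in a chart, typically something like `checksum/config: {{ include ... | sha256sum }}`), so any change to the file changes the pod template and triggers a rollout. The Python below only models that idea; the annotation keys and file names are hypothetical, and this is not the sessionstore chart's actual change.

```python
import hashlib


def config_checksum(text: str) -> str:
    """sha256 hex digest of a rendered config file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pod_template_annotations(files: dict) -> dict:
    """One checksum annotation per config file (hypothetical key names).

    The annotations sit in the pod template, so changing any file's content
    changes the template and the Deployment rolls out new pods."""
    return {f"checksum/{name}": config_checksum(text) for name, text in sorted(files.items())}


if __name__ == "__main__":
    before = pod_template_annotations({"tls-cert": "old certificate"})
    after = pod_template_annotations({"tls-cert": "new certificate"})
    print("rollout needed:", before != after)
```

A file left out of the checksum, like the cert file here, can change without the pods noticing, which matches the symptom described.
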
hnowlan added a comment to T363996: Sessionstore's discovery TLS cert will expire before end of May 2024.

Steps that I see:

  • Renewing the existing cergen cert to give us breathing room just in case. We're looking at less than 2 weeks of headroom for a major change to one of our most critical services.
  • Add toggleable mesh support
  • Get a baseline for performance in staging by running siege for a few hours
  • Deploy mesh-enabled version to staging, compare performance before and after
  • Roll forward if everything looks okay

My only concern around testing in staging is having reasonably representative infrastructure, given that it uses cassandra-dev and reaches it in codfw from pods in eqiad. That said, we're just looking for a baseline and not identical numbers.

+1! Not sure how to renew the existing cert since IIUC cergen wasn't used, but we can try to check old tasks to see how it was done.

I'm out next week, and the cert is about to expire, so I'll try to reverse engineer this. If anyone has first-hand knowledge of how this was done, please let me know!

May 22 2024, 2:35 PM · Patch-For-Review, serviceops, Data-Persistence
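
For the baseline-then-compare steps, one simple way to summarise two siege runs is to compare latency percentiles. A sketch assuming per-request latencies have already been collected into lists (parsing siege's output is out of scope), using nearest-rank percentiles; the sample numbers are made up, and a ratio above 1.0 means the mesh-enabled run was slower at that percentile.

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile of `samples` for 0 < p <= 100."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def compare(baseline: list, candidate: list, ps=(50, 95, 99)) -> dict:
    """Candidate/baseline latency ratio at each percentile (>1.0 means slower)."""
    return {p: percentile(candidate, p) / percentile(baseline, p) for p in ps}


if __name__ == "__main__":
    baseline_ms = [12.0, 13.5, 11.8, 40.2, 12.9, 14.1, 13.0, 12.2]  # made-up sample
    mesh_ms = [12.6, 14.0, 12.1, 45.0, 13.3, 14.8, 13.4, 12.9]      # made-up sample
    for p, ratio in compare(baseline_ms, mesh_ms).items():
        print(f"p{p}: {ratio:.2f}x")
```

Looking at the tail percentiles as well as the median matters here, since an extra proxy hop tends to show up first in p99.
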
hnowlan added a comment to T353464: Migrate wikikube control planes to hardware nodes.

All codfw wikikube-ctrl nodes are operational.

May 22 2024, 11:06 AM · Patch-For-Review, serviceops, Prod-Kubernetes, Kubernetes

May 21 2024

hnowlan created P62772 (An Untitled Masterwork).
May 21 2024, 12:29 PM
hnowlan updated the task description for T353464: Migrate wikikube control planes to hardware nodes.
May 21 2024, 9:27 AM · Patch-For-Review, serviceops, Prod-Kubernetes, Kubernetes
hnowlan added a comment to T365439: Investigate why article-descriptions LiftWing API returns 404 when encoded colon is used in request URL.

I suspect the fix for this is a relatively small change on the API gateway, but the change is a global one, so I will need to take some time to test this, even if the impact is to make things standards-compliant. Hoping to get to it later this week.

May 21 2024, 9:20 AM · Machine-Learning-Team

May 20 2024

hnowlan updated the task description for T353464: Migrate wikikube control planes to hardware nodes.
May 20 2024, 4:34 PM · Patch-For-Review, serviceops, Prod-Kubernetes, Kubernetes
hnowlan updated the task description for T362323: Move 100% of external traffic to Kubernetes.
May 20 2024, 4:14 PM · Patch-For-Review, MoveComms-Support, SRE, Traffic, serviceops, MW-on-K8s