
[Bug] Page content service is deployed with localhost links to the CSS and JS, breaking all pages that have been edited recently
Closed, Resolved · Public

Description

Steps to Reproduce

  1. Open https://en.wikipedia.org/api/rest_v1/page/mobile-html/Politics
  2. Observe the page

Expected Results

  • CSS/JS loads properly
  • CSS/JS is properly linked
<link rel="stylesheet" href="https://meta.wikimedia.org/api/rest_v1/data/css/mobile/base">
[...]
<script src="https://meta.wikimedia.org/api/rest_v1/data/javascript/mobile/pcs"></script>

Actual Results

  • CSP issues:
Refused to load the stylesheet 'http://localhost:6011/meta.wikimedia.org/v1/data/css/mobile/base' because it violates the following Content Security Policy directive: "style-src app://meta.wikimedia.org https://meta.wikimedia.org app://*.wikipedia.org https://*.wikipedia.org 'self' 'unsafe-inline'". Note that 'style-src-elem' was not explicitly set, so 'style-src' is used as a fallback.

Politics:1 Refused to load the stylesheet 'http://localhost:6011/meta.wikimedia.org/v1/data/css/mobile/pcs' because it violates the following Content Security Policy directive: "style-src app://meta.wikimedia.org https://meta.wikimedia.org app://*.wikipedia.org https://*.wikipedia.org 'self' 'unsafe-inline'". Note that 'style-src-elem' was not explicitly set, so 'style-src' is used as a fallback.

Politics:1 Refused to load the script 'http://localhost:6011/meta.wikimedia.org/v1/data/javascript/mobile/pcs' because it violates the following Content Security Policy directive: "script-src app://meta.wikimedia.org https://meta.wikimedia.org 'unsafe-inline'". Note that 'script-src-elem' was not explicitly set, so 'script-src' is used as a fallback.
  • CSS/JS is linked to localhost
<link rel="stylesheet" href="http://localhost:6011/meta.wikimedia.org/v1/data/css/mobile/base">
[...]
<script src="http://localhost:6011/meta.wikimedia.org/v1/data/javascript/mobile/pcs"></script>
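
A quick way to confirm whether a rendered page is affected is to scan its `<link>` and `<script>` tags for localhost URLs. The following is a minimal sketch using only the Python standard library; the helper names (`AssetCollector`, `localhost_assets`) are made up for illustration, and the sample markup is copied from the expected and actual results above rather than fetched from the live service.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AssetCollector(HTMLParser):
    """Collect URLs referenced by <link href=...> and <script src=...> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("href"):
            self.urls.append(attrs["href"])
        elif tag == "script" and attrs.get("src"):
            self.urls.append(attrs["src"])


def localhost_assets(html):
    """Return the asset URLs in `html` whose host is localhost."""
    collector = AssetCollector()
    collector.feed(html)
    return [u for u in collector.urls
            if urlparse(u).hostname in ("localhost", "127.0.0.1")]


# Markup taken from the "Actual Results" section of this task.
broken_html = (
    '<link rel="stylesheet" href="http://localhost:6011/meta.wikimedia.org/v1/data/css/mobile/base">'
    '<script src="http://localhost:6011/meta.wikimedia.org/v1/data/javascript/mobile/pcs"></script>'
)
# Markup taken from the "Expected Results" section of this task.
good_html = (
    '<link rel="stylesheet" href="https://meta.wikimedia.org/api/rest_v1/data/css/mobile/base">'
    '<script src="https://meta.wikimedia.org/api/rest_v1/data/javascript/mobile/pcs"></script>'
)

print(localhost_assets(broken_html))
print(localhost_assets(good_html))
```

A non-empty result means the page was rendered with the broken configuration and still needs to be purged.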

Environments Observed

Production

Additional notes

The Varnish and RESTBase caches will need to be purged of the articles that were rendered incorrectly with the localhost links.

Event Timeline

JoeWalsh created this task. Wed, Sep 9, 4:05 PM
Restricted Application added a subscriber: Aklapper. Wed, Sep 9, 4:05 PM
JoeWalsh triaged this task as Unbreak Now! priority. Wed, Sep 9, 4:06 PM
JoeWalsh updated the task description.
MSantos added a subscriber: Joe. Wed, Sep 9, 4:18 PM
bearND updated the task description. Wed, Sep 9, 4:41 PM
Joe added a subscriber: Pchelolo. Wed, Sep 9, 4:45 PM

The broken configuration (which I deployed, sorry about that) has been fixed.

What remains is purging the broken pages from RESTBase.

We basically need to purge every page that RESTBase cached between 08:40 and 16:10 today and has not re-rendered since.
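
The selection rule in the previous paragraph can be sketched as a simple time-window filter. This is only an illustration under stated assumptions: the times are treated as UTC (the comment does not name a timezone), the date is taken from the SAL entries on this task (2020-09-09), and the `last_rendered_by_title` mapping with its example titles is invented sample data, not a real RESTBase or Cassandra query.

```python
from datetime import datetime, timezone

# Window from the comment above. Treating the times as UTC is an assumption.
WINDOW_START = datetime(2020, 9, 9, 8, 40, tzinfo=timezone.utc)
WINDOW_END = datetime(2020, 9, 9, 16, 10, tzinfo=timezone.utc)


def needs_purge(last_rendered):
    """True if the most recent render happened while the bad config was live."""
    return WINDOW_START <= last_rendered <= WINDOW_END


# Hypothetical sample data: title -> time of the most recent render.
last_rendered_by_title = {
    "Politics": datetime(2020, 9, 9, 12, 0, tzinfo=timezone.utc),
    "Example_A": datetime(2020, 9, 9, 7, 15, tzinfo=timezone.utc),   # rendered before the window
    "Example_B": datetime(2020, 9, 9, 16, 30, tzinfo=timezone.utc),  # re-rendered after the fix
}

titles_to_purge = sorted(
    title for title, ts in last_rendered_by_title.items() if needs_purge(ts)
)
print(titles_to_purge)
```

Because only the most recent render time is compared, a page that was rendered inside the window but refreshed afterwards is skipped, which matches the "never refreshed" condition.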

Sadly, there doesn't seem to be a way to do that in Cassandra, so @Pchelolo is looking for an alternative.

Change 626189 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[operations/deployment-charts@master] Update mobileapps to 2020-09-09-171242-production

https://gerrit.wikimedia.org/r/626189

Change 626189 merged by jenkins-bot:
[operations/deployment-charts@master] Update mobileapps to 2020-09-09-171242-production

https://gerrit.wikimedia.org/r/626189

Joe added a comment. Wed, Sep 9, 5:34 PM

Status update: we've decided to invalidate mobile-html content in RESTBase, so any page that is not cached at the edge will be re-rendered unless it has already been rendered since the new RESTBase version was deployed.
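
The deploy log entries on this task read "Require mobile-html 1.2.2", which suggests a version-gated invalidation: stored content older than the required version is treated as stale and re-rendered on the next request. The sketch below illustrates that idea only; the function names and the example stored versions are hypothetical, and it makes no claim about how RESTBase actually implements the check.

```python
REQUIRED_VERSION = "1.2.2"  # from the deploy message "Require mobile-html 1.2.2"


def parse_version(version):
    """Turn a dotted version string such as '1.2.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_stale(stored_version, required=REQUIRED_VERSION):
    """True if stored content predates the required version and must be re-rendered."""
    return parse_version(stored_version) < parse_version(required)


# Hypothetical stored versions, for illustration only.
for stored in ("1.2.1", "1.2.2", "1.10.0"):
    action = "re-render" if is_stale(stored) else "serve from storage"
    print(stored, "->", action)
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "1.10.0" as older than "1.2.2".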

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:35:56Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437

bearND updated the task description. Wed, Sep 9, 5:37 PM

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:41:56Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437 (duration: 06m 00s)

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:42:57Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2

Joe closed this task as Resolved. Wed, Sep 9, 5:43 PM
Joe claimed this task.

This bug should now be resolved. Please reopen if this behaviour persists.

bearND added a subscriber: bearND. Wed, Sep 9, 5:44 PM

This patch should be linked as the actual fix: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/626178/, reverting changes from https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/626102.

I guess nobody noticed the issue when this config was deployed to staging only.

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:52:35Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2 (duration: 09m 38s)

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:52:42Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:59:29Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 06m 47s)

Mentioned in SAL (#wikimedia-operations) [2020-09-09T17:59:56Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout

Mentioned in SAL (#wikimedia-operations) [2020-09-09T18:02:51Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 02m 55s)

Joe reopened this task as Open. Wed, Sep 9, 6:40 PM
Joe added a project: Traffic.
Restricted Application added a project: Operations. Wed, Sep 9, 6:40 PM
bearND added a comment. Wed, Sep 9, 6:45 PM

Is there a separate task for the mobile-html-offline-resources issue or are we combining that?

Joe added a comment. Wed, Sep 9, 6:45 PM

Sadly, we still have caching issues:

These latter URLs have a max-age of 1 day, so they all need to be purged (they're not computationally expensive, so it's OK to just ban them).

Sadly, I tried to do what wikitech suggests:

sudo cumin -b 1 A:cp-text "varnishadm -n frontend ban 'req.url ~ \"^/api/rest_v1/page/mobile-html-offline-resources/\"'"

While this did purge the Varnish frontends, it did nothing for ATS, and I can find no indication of how to purge content there. Any advice would be welcome.
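
To see which request paths the ban expression in the command above covers, the same regular expression can be tried locally. This is a small sketch using Python's `re` module as a stand-in for Varnish's regex matching; the helper name `would_be_banned` and the example paths are illustrative.

```python
import re

# The pattern from the varnishadm ban command above.
BAN_PATTERN = re.compile(r"^/api/rest_v1/page/mobile-html-offline-resources/")


def would_be_banned(url_path):
    """True if the request path matches the ban expression."""
    # Varnish's `~` operator is an unanchored regex match, so use search();
    # the leading ^ in the pattern pins the match to the start of the path.
    return BAN_PATTERN.search(url_path) is not None


print(would_be_banned("/api/rest_v1/page/mobile-html-offline-resources/Politics"))
print(would_be_banned("/api/rest_v1/page/mobile-html/Politics"))
```

The second path does not match, so this expression covers only the offline-resources URLs; the `mobile-html` pages themselves need their own ban expression.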

Change 626210 had a related patch set uploaded (by RLazarus; owner: RLazarus):
[operations/puppet@production] trafficserver: Cache-ban pages with localhost links from page content service

https://gerrit.wikimedia.org/r/626210

Change 626210 merged by RLazarus:
[operations/puppet@production] trafficserver: Cache-ban pages with localhost links from page content service

https://gerrit.wikimedia.org/r/626210

RLazarus closed this task as Resolved. Wed, Sep 9, 9:28 PM
RLazarus added a subscriber: RLazarus.

I purged /api/rest_v1/page/mobile-html/ and /api/rest_v1/page/mobile-html-offline-resources/ in ATS, then re-ran Joe's command from T262437#6448216 for both. This should now be fully expunged from cache, although it may persist in your browser cache for up to a day or until you refresh.

After waiting 24 hours, I'll revert the ATS patch.

Change 627328 had a related patch set uploaded (by RLazarus; owner: RLazarus):
[operations/puppet@production] Revert "trafficserver: Cache-ban pages with localhost links from page content service"

https://gerrit.wikimedia.org/r/627328

Change 627328 merged by RLazarus:
[operations/puppet@production] Revert "trafficserver: Cache-ban pages with localhost links from page content service"

https://gerrit.wikimedia.org/r/627328