
Prepare Phame to support heavy traffic for a Tech Department blog
Open, Stalled, Normal, Public

Description

Steps to Reproduce (tentative)
  1. Create a new blog post via https://phabricator.wikimedia.org/phame/blog/view/15/
  2. Browse to blog via https://techblog.wikimedia.org and view the post.
Current Results

Blog page is served directly from Phabricator.
Blog entry URLs are ugly.

Posts currently have URLs like https://phabricator.wikimedia.org/phame/live/15/post/{id_number}/, with the home page at https://phabricator.wikimedia.org/phame/live/15/.

Requested Results

Page is served from Wikimedia CDN cache if possible.
Blog URLs are prettier, with post URLs like https://techblog.wikimedia.org/post/{id_number}/ and the home page at https://techblog.wikimedia.org.
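
To make the requested URL scheme concrete, here is a minimal Python sketch of the correspondence between the pretty paths and the current Phame "live" paths. The blog ID 15 and both path shapes come from this description; the function name and the regular expression are illustrative assumptions, and Phame's own external-blog support may well do this mapping itself.

```python
import re
from typing import Optional

# Taken from the task description: blog 15 under /phame/live/.
PHAME_LIVE_PREFIX = "/phame/live/15"

# Hypothetical pattern for pretty post paths such as /post/123/.
PRETTY_POST_PATH = re.compile(r"^/post/(\d+)/?$")


def to_phame_path(pretty_path: str) -> Optional[str]:
    """Map a path on the pretty hostname to the matching Phame live path.

    Returns None for paths that are neither the home page nor a post.
    """
    if pretty_path in ("", "/"):
        return PHAME_LIVE_PREFIX + "/"
    match = PRETTY_POST_PATH.match(pretty_path)
    if match:
        return f"{PHAME_LIVE_PREFIX}/post/{match.group(1)}/"
    return None
```

Anything that is not the home page or a numeric post path returns None, so unrelated paths are not silently mapped onto Phabricator.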

Details

Brandon suggested that, with the creation of a specific URL (subdomain?) particular to Phame, it would be much easier to cache all of this data and reduce the risk that Phabricator is impaired by heavy traffic to a Phame blog post.

We've discussed re-using the old defunct (currently just a redirect) techblog.wikimedia.org as the pretty and cacheable entrypoint. There's some configuration work to do on the Phame side, as well as DNS and edge cache support. Probably this public-facing URI will not allow authentication at all (read-only), enforced at the cache layer by stripping Authorization/Cookie headers.
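
As a rough illustration of the read-only idea, the Python sketch below drops the Authorization and Cookie headers from an inbound request before it would be forwarded. It only models the logic; the real enforcement would live in the cache layer's own configuration, and the function name and dict-based header representation are assumptions made for this example.

```python
from typing import Dict

# Header names to drop on inbound requests (compared case-insensitively).
STRIPPED_REQUEST_HEADERS = {"authorization", "cookie"}


def strip_auth_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the request headers without credentials."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in STRIPPED_REQUEST_HEADERS
    }
```

With credentials removed on the way in, every request looks anonymous to Phabricator, which is what makes a shared cached copy safe to serve.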

(This task emerged from a discussion of Phame blogging at the SRE offsite.)

See also:

Event Timeline

JAufrecht triaged this task as Normal priority. Jun 18 2019, 6:57 PM
JAufrecht created this task.
Restricted Application added a subscriber: Aklapper.
BBlack updated the task description. Jun 19 2019, 1:25 PM
BBlack added a project: Traffic.
Restricted Application added a project: Operations. Jun 19 2019, 1:25 PM

If re-using techblog.wikimedia.org, please take care not to break existing URLs. The root path would be fine to change as it was always pointing to a mutable post overview. However, the rest of the domain used to be the blog now at blog.wikimedia.org. URLs such as https://techblog.wikimedia.org/2009/07/power-outage-in-wikimedias-european-servers/ currently work and have been used in publications elsewhere, and are expected to continue to work (with redirect).

This could be done in numerous ways of course. The simplest would be to move the redirect we currently have at /* and reduce its scope to only the 2007-2018 paths.

Implementing a blanket redirect to the legacy blog URI for ^/20(0[7-9]|1[0-8])/ should be feasible in VCL or Lua at the edge. Or alternatively, we could also just leave it alone and pick another hostname, too.

ema moved this task from Triage to Caching on the Traffic board. Jun 21 2019, 8:09 AM
mmodell added a comment. Edited Jun 21 2019, 12:37 PM

This seems like a good idea; however, the upstream documentation warns that there are some issues with an external blog / dedicated subdomain: https://secure.phabricator.com/book/phabricator/article/phame/#external-blogs

Nevertheless, I don't think that any issues will be insurmountable...

greg removed mmodell as the assignee of this task. Jul 6 2019, 5:21 AM
greg moved this task from INBOX to Later / Need volunteer on the Release-Engineering-Team-TODO board.
greg added a subscriber: mmodell.
jijiki added a subscriber: jijiki. Jul 8 2019, 4:59 PM
ayounsi added a subscriber: ayounsi. Jul 8 2019, 4:59 PM
CDanis added a subscriber: CDanis. Jul 11 2019, 4:44 PM
JAufrecht renamed this task from Set up a subdomain for Phame to enable caching to Prepare Phame to support heavy traffic for a Tech Department blog. Jul 17 2019, 7:01 PM
JAufrecht updated the task description.
herron added a subscriber: herron. Jul 31 2019, 4:31 PM

TODO list here from my POV, as best I understand things:

  1. Decide on a public pretty domain name for these more public/promotable/cacheable tech blog posts (and if it's the legacy techblog.wikimedia.org instead of a fresh name, decide how we're handling historical redirects or whatever. (who's deciding, or was it decided?))
  2. Configure a new phame blog (separate from the current internal team ones) for these higher-visibility posts, and configure it properly for the vanity domain in phame. (I don't think I have permissions to do any of these things, so I think this involves someone from releng?)
  3. Configure AuthDNS + VCL to make the hostname work and be cacheable (a new alias for text-lb in AuthDNS, and VCL changes to strip cookies/auth on inbound, route it to phab, and let the response be Varnish-cacheable) - probably not wise to deploy this ahead of (2) being in place.

Step 3 is where Traffic needs to do some work, but it's relatively light work. We just need 1 and 2 sorted out first so we know the details we're working with.

faidon assigned this task to JAufrecht. Jul 31 2019, 5:14 PM
greg added a subscriber: greg. Jul 31 2019, 5:27 PM
  2. Configure a new phame blog (separate from the current internal team ones) for these higher-visibility posts, and configure it properly for the vanity domain in phame. (I don't think I have permissions to do any of these things, so I think this involves someone from releng?)

Yeah, those steps (outlined at https://secure.phabricator.com/book/phabricator/article/phame/#external-blogs ) seem straightforward for us to do once we have the domain name and "parent site name/url" (see documentation) decided.

Decide on a public pretty domain name for these more public/promotable/cacheable tech blog posts (and if it's the legacy techblog.wikimedia.org instead of a fresh name, decide how we're handling historical redirects or whatever. (who's deciding, or was it decided?))

I think it should be techblog.wikimedia.org, because even if that introduces a complication around redirect based on time period, that's a much smaller potential complication than those arising from multiple domain names. I propose as the decision rule that if the people currently involved in this ticket can agree, it's decided. My decision rule for making that proposal is a) forgiveness over permission and b) not that hard to reverse.

JAufrecht reassigned this task from JAufrecht to greg. Jul 31 2019, 6:10 PM

passing to Greg for step 2, configuring the blog and vanity name.

Restricted Application added a project: User-greg. Jul 31 2019, 6:10 PM

I think it should be techblog.wikimedia.org, because even if that introduces a complication around redirect based on time period

Do we have any idea what the redirects' regexen should look like? Currently https://techblog.wikimedia.org/ root URI redirects to https://blog.wikimedia.org/c/technology/ , but I don't know examples of what its old URIs might look like to see if they have custom redirects we could base something on, etc...

I tested the "Move Post" feature in Phame today, anticipating that some people will likely try to move one or two handpicked posts from their existing Phame blogs to (the archive of) the new sub-blog (without them appearing as "new").

This mostly works fine and as expected (keeping timelines and existing URLs working, etc.), but for teams using the "Live" URL feature, those URLs don't currently survive a move. I've reported this bug upstream.

Replying to myself earlier: apparently they're datestamped URIs beginning with /yyyy/mm/, examples being:

https://techblog.wikimedia.org/2010/05/
https://techblog.wikimedia.org/2010/05/29/xml-dumps-resumed/

so perhaps a blanket redirect from the new techblog's ^/20[01][0-9] to a copy of the same URI on blog.wikimedia.org would be sufficient?

[..] apparently they're datestamped URIs beginning with /yyyy/mm/, examples being:
https://techblog.wikimedia.org/2010/05/29/xml-dumps-resumed/
so perhaps a blanket redirect from the new techblog's ^/20[01][0-9] to a copy of the same URI on blog.wikimedia.org would be sufficient?

Yes. The old blog was a WordPress instance with fairly standard "permalink" configuration for posts, using the format /{yyyy}/{mm}/{dd}/{slug}, such as https://techblog.wikimedia.org/2009/07/02/power-outage-in-wikimedias-european-servers/.

There are also "archive" URLs at /{yyyy}, /{yyyy}/, /{yyyy}/{mm}, and /{yyyy}/{mm}/, such as https://techblog.wikimedia.org/2009/07/.

This was moved to blog.wikimedia.org as-is with, as far as I know, all arbitrary URLs redirecting. The exception is the root /, which redirects to the technology category instead. This means that in theory posts published after the move could be accessed through this older domain as well, but this isn't an issue currently because this "new" blog is also "old" again (read-only as of 2018). So, if we capture anything from /2007/* to /2018/* we should be good. (See also T226044#5268590).
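
The /2007/* to /2018/* rule could be sketched roughly as follows in Python. This only models the matching; the actual redirect would be written in whatever the edge layer uses. The pattern is a variant of the expressions floated earlier in this task that also accepts a bare year archive such as /2009 without a trailing slash. The function and constant names are made up for the example, and the redirect target follows the "copy of the same URI on blog.wikimedia.org" suggestion above.

```python
import re
from typing import Optional

# Years 2007 through 2018, followed by either a slash or the end of the path,
# so /2009 and /2009/07/ both match but /20075 does not.
LEGACY_PATH = re.compile(r"^/20(0[7-9]|1[0-8])(/|$)")

# Redirect target suggested in this task: same path on the old blog host.
LEGACY_HOST = "https://blog.wikimedia.org"


def legacy_redirect(path: str) -> Optional[str]:
    """Return the redirect URL for a legacy dated path, or None otherwise."""
    if LEGACY_PATH.match(path):
        return LEGACY_HOST + path
    return None
```

Paths outside the dated range, such as /post/5/ or the root, return None and would be handled by the new blog instead.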

Heh, apparently I can't even remember things I read and said before even when they're right above me in the same ticket!

I tested the "Move Post" feature in Phame today [...] for teams using "Live" feature of urls, those don't survive a move currently.

That bug is now fixed in upstream by https://secure.phabricator.com/D20688 in https://secure.phabricator.com/T13353

greg reassigned this task from greg to mmodell. Aug 1 2019, 4:51 PM
  2. Configure a new phame blog (separate from the current internal team ones) for these higher-visibility posts, and configure it properly for the vanity domain in phame. (I don't think I have permissions to do any of these things, so I think this involves someone from releng?)

Yeah, those steps (outlined at https://secure.phabricator.com/book/phabricator/article/phame/#external-blogs ) seem straightforward for us to do once we have the domain name and "parent site name/url" (see documentation) decided.

passing to Greg for step 2, configuring the blog and vanity name.

Giving to @mmodell to do it as I trust his context better. And putting on the backlog for this month.

Ok I've created https://phabricator.wikimedia.org/phame/blog/view/15/ but I would appreciate creative input on the title, subtitle and description.

On a more technical level, what should we use for the "parent site name" (and url?). This is needed for the breadcrumb navigation on the "live" blog view.

title, subtitle and description

I think what you have now, "Wikimedia Tech Blog", "Selected blog posts from Wikimedia Technology", is good enough to proceed. The Description is displayed at the bottom of the live view and is basically a footer, AFAICT, so we probably don't need the existing text ("Select posts from the Wikimedia Foundation's Technology Department") and could go with pure boilerplate:

Except where otherwise noted, the content of this site is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported license. <a href="https://foundation.wikimedia.org/wiki/Cookie_statement">Cookie Statement</a>, <a href="https://foundation.wikimedia.org/wiki/Wikimedia:General_disclaimer">Disclaimers</a>, <a href="https://wikimediafoundation.org/privacy-policy/">Privacy Policy</a>, <a href="https://foundation.wikimedia.org/wiki/Terms_of_Use/en">Terms of Use</a>.

On a more technical level, what should we use for the "parent site name" (and url?). This is needed for the breadcrumb navigation on the "live" blog view.

Looking at the other blogs, it seems like the breadcrumb is

Phame > Blogs > Wikimedia Tech Blog

The ideal breadcrumb might be

Wikimedia Foundation > News > Technical Blog

with links to https://wikimediafoundation.org/ and https://wikimediafoundation.org/news/?

General reminder about naming things: WMF is part of the Wikimedia movement. Please do not call stuff "Wikimedia Tech" if "WMF Tech" is meant. Thanks.

The ideal breadcrumb might be
Wikimedia Foundation > News > Technical Blog
with links to https://wikimediafoundation.org/ and https://wikimediafoundation.org/news/?

Phame only gives me one link field, so how about:

Wikimedia Foundation (https://wikimediafoundation.org/) → Technical Blog

Seems fine.

Urbanecm removed the point value for this task. Aug 31 2019, 11:27 PM
Urbanecm added a subscriber: Urbanecm.

[bulk] Setting points to "", given it doesn't make any sense to have them as "0".

mmodell moved this task from To Triage to Infrastructure on the Phabricator board. Sep 13 2019, 7:13 PM

From the merged task:

Blog posts on phame cannot currently be cached by our CDN, and hence cannot be shared on link aggregators for the (unfounded?) fear of being successful and melting phab down.

Phame currently returns the following response headers that make it uncacheable:

[...]
< Cache-Control: no-store
[...]
< Set-Cookie: phsid=REDACTED; expires=Mon, 01-Apr-2024 09:18:05 GMT; Max-Age=157680000; path=/; domain=phabricator.wikimedia.org; secure; httponly

For posts with visibility set to Published, Cache-Control should be set by phame to some value that allows caching, such as Cache-Control: max-age=1440 (or whatever number of seconds is deemed appropriate).

I'm not sure what the purpose of the cookie is, but I can read phame posts just fine without sending it from my client, so I guess it's not critical functionality-wise and can be dealt with at the caching layer?
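
To illustrate why those two headers matter, here is a small Python sketch. The cacheability check is a deliberately simplified rule of thumb, not the actual logic of our CDN, and the rewrite step is just one hypothetical way the caching layer could deal with the cookie and Cache-Control for published posts. The 1440-second max-age is the example value from above; the function names are invented for this sketch.

```python
from typing import Dict


def is_cacheable(headers: Dict[str, str]) -> bool:
    """Simplified check: refuse no-store/private responses and any Set-Cookie."""
    lowered = {name.lower(): value for name, value in headers.items()}
    directives = {
        part.strip().lower()
        for part in lowered.get("cache-control", "").split(",")
    }
    if "no-store" in directives or "private" in directives:
        return False
    return "set-cookie" not in lowered


def make_cacheable(headers: Dict[str, str], max_age: int = 1440) -> Dict[str, str]:
    """Drop Set-Cookie and replace Cache-Control with a max-age directive."""
    cleaned = {
        name: value
        for name, value in headers.items()
        if name.lower() not in ("set-cookie", "cache-control")
    }
    cleaned["Cache-Control"] = f"max-age={max_age}"
    return cleaned
```

Under this rule the response shown above fails the check twice over (no-store and Set-Cookie), while the rewritten headers pass it.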

mmodell removed mmodell as the assignee of this task. Sep 13 2019, 7:21 PM

This is unblocked on my end, @ema feel free to proceed when you are back from vacation. I'll be glad to help however I can.

Also important, @epriestley's comment at T219978#5346100

Krinkle updated the task description. Sep 13 2019, 7:41 PM
greg added a comment. Sep 25 2019, 10:21 PM

@mmodell @ema After a discussion with TechEng and SRE and the rest of tech-mgt this week in Portland, we are going to hold off on this work (phame as techblog) and instead wait for guidance from TechEng as they have a plan in place and people dedicated to the work. Basically this is going to be rolled into TechEng work on our communication and social plans from a wikimedia tech perspective.

greg changed the task status from Open to Stalled. Sep 25 2019, 10:30 PM

@mmodell @ema After a discussion with TechEng and SRE and the rest of tech-mgt this week in Portland, we are going to hold off on this work (phame as techblog) and instead wait for guidance from TechEng as they have a plan in place and people dedicated to the work. Basically this is going to be rolled into TechEng work on our communication and social plans from a wikimedia tech perspective.

Just curious -- what's the expected timeframe on this?

greg added a subscriber: Bmueller. Sep 27 2019, 8:18 PM

Just curious -- what's the expected timeframe on this?

I'll let @Bmueller answer that :)

@Bmueller: Could you answer the last question, please?

Hey @CDanis, sorry that I missed your question! (thanks for the ping, @Aklapper :-)

@mmodell @ema After a discussion with TechEng and SRE and the rest of tech-mgt this week in Portland, we are going to hold off on this work (phame as techblog) and instead wait for guidance from TechEng as they have a plan in place and people dedicated to the work. Basically this is going to be rolled into TechEng work on our communication and social plans from a wikimedia tech perspective.

Just curious -- what's the expected timeframe on this?

So, there is a bunch of coordination and preparation work that needs to be done beforehand - we're currently hoping to be ready around mid/end of Nov, but it's hard to predict precisely, as we also depend on folks outside of our team. I - or @srodlund, who manages all things techblog - will let you know as soon as we know more :-)

Krinkle updated the task description. Oct 16 2019, 1:53 AM

Hey all -- I am currently seeking some answers to some basic infrastructure questions. Unfortunately, I don't have control over who will answer and when, but I am trying :-) Once I have those answered, I can proceed and provide you with a better timeline.