
Move old transparency report pages to historical URLs and setup redirect
Open, Normal, Public

Description

As part of preparing to launch a new transparency report website, the pages currently at https://transparency.wikimedia.org/ and its sub-pages need to be moved to a new path at https://transparency.wikimedia.org/historical to free up the old URLs for the new report website.

transparency.wikimedia.org should then redirect to https://wikimediafoundation.org/about/transparency/

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 16 2019, 10:38 PM
Varnent renamed this task from Move old transparency report pages to historical URLs to Move old transparency report pages to historical URLs and setup redirect. · Aug 17 2019, 12:39 AM
Varnent added projects: Operations, serviceops.
Varnent updated the task description.
Varnent removed subscribers: Operations, WMF-Legal.
Joe added a subscriber: Joe. · Aug 19 2019, 7:12 AM

Sorry, the instructions you give here contradict each other:

You seem to want a full redirect of transparency.wikimedia.org to the new URL, but also a redirect to https://transparency.wikimedia.org/historical. While I guess we could avoid doing a redirect for the old pages, that would result in a ton of broken links.

Also: all the links and the images/assets in the old transparency report site are without scope - who would be responsible for fixing them?

BBlack removed Dzahn as the assignee of this task. · Aug 19 2019, 10:50 AM
BBlack added subscribers: Dzahn, BBlack.

Unassign for now. The actual ask here is unclear in terms of technical details.

CDanis triaged this task as Normal priority. · Aug 19 2019, 8:31 PM
BBlack changed the task status from Open to Stalled. · Aug 20 2019, 3:30 PM

Just stalling this so that anyone following it doesn't try to pick this up or move with it yet. There's an ongoing email thread about clarifying this task, and we're waiting for at least one person to return from a vacation and provide guidance before we move forward here.

As I understand it - Legal would like the existing microsite located at transparency.wikimedia.org to be relocated to transparency.wikimedia.org/historical - and then for the top-level transparency.wikimedia.org domain to redirect to the report's new location at https://wikimediafoundation.org/about/transparency/

Summary:

I believe the annual report has a similar setup if that reference helps.

@Varnent: For the redirects: just the main https://transparency.wikimedia.org/ URL? Or also the sub-pages like https://transparency.wikimedia.org/content.html ? I haven't yet looked at the content for the move to /historical/, but I assume it's relatively-simple.

Why didn't we just move the whole of transparency.wikimedia.org over to Automattic and handle the redirects and/or content changes there? It's not like any of it's staying original anyways.

BBlack changed the task status from Stalled to Open. · Aug 27 2019, 9:30 PM

@Varnent: For the redirects: just the main https://transparency.wikimedia.org/ URL? Or also the sub-pages like https://transparency.wikimedia.org/content.html ? I haven't yet looked at the content for the move to /historical/, but I assume it's relatively-simple.

I think just the main URL and perhaps faq.html - but I do not think there are many external links. I defer to Legal on that one though.

Why didn't we just move the whole of transparency.wikimedia.org over to Automattic and handle the redirects and/or content changes there? It's not like any of it's staying original anyways.

There are costs with each additional site we add, and I believe the plan is to shut the historical site down completely once Design is done working on modules to import the rest of the historical content into the organization website.

Dzahn added a comment. · Aug 29 2019, 1:15 PM

One thing that you can already do is create https://transparency.wikimedia.org/historical/ since that is just inside the content repo that is under your control and is a requirement before we can add rewrite/redirect rules to it in the webserver config.

One thing that you can already do is create https://transparency.wikimedia.org/historical/ since that is just inside the content repo that is under your control and is a requirement before we can add rewrite/redirect rules to it in the webserver config.

I am unfamiliar with the functional technical setup of the existing microsite, and so am not sure how to change where it builds. It appears to build itself using a script I think we developed. Also, this microsite's content repo is on Foundation servers - so we all have access to it.

Dzahn added a comment. · Aug 30 2019, 7:15 AM

I am confused by this statement. The content repo is explicitly set up this way so that people working on this site can merge content changes without having to contact SRE, and when I look at the repo in Gerrit I see that you just merged something in it yourself. https://gerrit.wikimedia.org/r/q/project:wikimedia%252FTransparencyReport

Dzahn added a comment. · Aug 30 2019, 7:20 AM

All the changes you can see above, including your own, have been deployed to production servers automatically in the past.

The puppet code is:

    git::clone { 'wikimedia/TransparencyReport':
        ensure    => latest,

which means all that happens is that puppet pulls from the repo people have been using all this time. I don't see why that would be different now.

Varnent added a comment. · Edited · Aug 30 2019, 12:42 PM

All the changes you can see above, including your own, have been deployed to production servers automatically in the past.
The puppet code is:

    git::clone { 'wikimedia/TransparencyReport':
        ensure    => latest,

which means all that happens is that puppet pulls from the repo people have been using all this time. I don't see why that would be different now.

I am incredibly confused right now. I have made some additions to the code it builds from. However, that was basically adding a line of HTML. I am not familiar enough with the microsite setup to know how to tell the script to build it in a different place. I suppose I can take a look later, but if you all already know how to do it - is there a reason a comms person needs to do that coding? Again, this is not our project, we are offering some help and were asked to help clarify a technical question. My familiarity with the old site ends at knowing how to update some of the numbers and where to add code for a banner.

Varnent added a comment. · Edited · Aug 30 2019, 12:44 PM

I am confused by this statement. The content repo is explicitly set up this way so that people working on this site can merge content changes without having to contact SRE, and when I look at the repo in Gerrit I see that you just merged something in it yourself. https://gerrit.wikimedia.org/r/q/project:wikimedia%252FTransparencyReport

Right - except we are not talking about content changes - we are trying to change the technical setup of the site.

Some clarifying points:

  1. Code - means different things to different people in different contexts. From the SRE perspective, the whole repo is "content", even though there's a build script in there and various browser-related "languages" involved. From the server-side perspective it's all just static content to us.
  2. Historically SRE doesn't manage anything inside this repo. Our role in this is simply to provide a static content hosting service that auto-deploys whatever is committed to the repo. Whoever does the content repo editing runs a build script and commits the outputs back to the repo itself as static content.
  3. Historically, it seems the primary editors of this repo have been the Product Design team, specifically e.g. @Prtksxna and @Volker_E.

Okay - so basically SRE will not set up the requested configuration until we figure out how to set up the script to output the old site into a subdirectory - correct?

BBlack added a comment. · Edited · Aug 30 2019, 1:30 PM

There are two separate things to do here:

  1. The move of content to the new /historical/ sub-path: For this, nothing will actually change on the side of this that SRE actually manages (the deployment of the static content in the repo to a microsite static service that hosts the domainname). Somebody (SRE could do it perhaps, but we don't historically, and Product Design seems a more-natural fit since they've worked on it before) has to make changes to the build scripts and/or other contents in the repo to move things to /historical/ and keep the internal links working. If the repo is effectively-dead at this point it may not be worth updating the build system if that's problematic; it may be simpler to alter the existing outputs manually (e.g. git mv the pages into the /historical/ path and then edit any links that need editing with a sed script or whatever).
  2. The redirect of https://transparency.wikimedia.org/ to the new foundation blog URL(s): This is separate and doesn't happen inside the above repo, and is on the SRE side of things in terms of technical setup. Probably the above should happen first, but they should happen close in time to avoid user confusion. I'd like us to be explicit about the requirements, though. What I'm hearing/interpreting so far is that the following two single-URL redirects should be configured, but all other possible URLs should continue being served directly from the old transparency site (which means they'll be 404s if anyone has them saved or linked anywhere, since the content switched out to /historical/):
  • https://transparency.wikimedia.org/ -> https://wikimediafoundation.org/about/transparency/
  • https://transparency.wikimedia.org/faq.html -> https://wikimediafoundation.org/about/transparency/faq/

I would've expected you to perhaps add to that list the other primary pages, e.g.:

  • https://transparency.wikimedia.org/privacy.html -> https://wikimediafoundation.org/about/transparency/privacy/
  • https://transparency.wikimedia.org/content.html -> https://wikimediafoundation.org/about/transparency/content/
  • https://transparency.wikimedia.org/stories.html -> ??? (maybe just the main transparency about page, since there's no equivalent?)
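To make the per-URL option concrete, here is a minimal Python sketch of the lookup such a redirect configuration would perform. It is only an illustration of the mapping listed above, not the actual webserver config; the names `REDIRECTS` and `resolve_explicit` are hypothetical, and stories.html is deliberately left out because its target is still undecided.

```python
from typing import Optional
from urllib.parse import urlsplit

# Redirect targets copied from the lists above. stories.html is omitted
# because no target has been agreed on for it yet.
REDIRECTS = {
    "/": "https://wikimediafoundation.org/about/transparency/",
    "/faq.html": "https://wikimediafoundation.org/about/transparency/faq/",
    "/privacy.html": "https://wikimediafoundation.org/about/transparency/privacy/",
    "/content.html": "https://wikimediafoundation.org/about/transparency/content/",
}


def resolve_explicit(url: str) -> Optional[str]:
    """Return the redirect target for url, or None if the old site serves it."""
    # A bare hostname has an empty path; treat it as the site root.
    path = urlsplit(url).path or "/"
    return REDIRECTS.get(path)
```

Any path not in the table, including everything under /historical/, returns None here, which corresponds to "keep serving it from the old microsite" (and so a 404 for pages that have moved).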

Another option would be to simply redirect *all* URLs on the old site to https://wikimediafoundation.org/about/transparency/ by default, and carve out an exception just for https://transparency.wikimedia.org/historical/ so that it can be linked from the foundation site without creating a loop.
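The redirect-everything-with-one-exception option can be sketched the same way. This is a hedged Python illustration of the decision logic only; `NEW_HOME`, `EXEMPT_PREFIX`, and `resolve_catch_all` are hypothetical names, and the real rule would live in the webserver config.

```python
from typing import Optional
from urllib.parse import urlsplit

NEW_HOME = "https://wikimediafoundation.org/about/transparency/"
EXEMPT_PREFIX = "/historical/"


def resolve_catch_all(url: str) -> Optional[str]:
    """Redirect every path to NEW_HOME except the /historical/ subtree."""
    path = urlsplit(url).path or "/"
    # The exception keeps the archived pages reachable, so the foundation
    # site can link to them without being bounced back (no redirect loop).
    if path == "/historical" or path.startswith(EXEMPT_PREFIX):
        return None
    return NEW_HOME
```

Compared with a per-URL table, this avoids 404s for old bookmarked pages at the cost of sending every old deep link to the same landing page.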

(Also - a side note while I was digging in various things: https://wikimediafoundation.org/advocacy/ has a link to the old transparency site still, should probably be updated to the new one on the main blog (and any other such links that might exist around the site))

Awesome - thank you @BBlack - super helpful and I get now what you all need done. Apologies for the confusion on my part, I thought this was done via backend mapping that basically pointed the repo to a specific URL which could just be updated. I think I now understand the setup you are describing.

I will talk with Legal about the redirect setups, I suspect an all-with-exception setup makes the most sense.

I think all links to the report point to transparency.wikimedia.org - which will essentially continue to be the URL used (it also appears on the printed report which has already been released). So I think we are okay to leave them assuming we are able to make this change reasonably soon.

On the broader meta-topics: Long-lived canonical URLs are important, and I think that transparency.wikimedia.org seems like a more-natural fit for that (and to continue printing and publishing it). IMHO, the ideal end-game here* would be to move transparency.wikimedia.org to Automattic hosting completely and have it serve the new content directly, as well as the historical parts, and have the blog's links link into it. The currently-outlined (interim?) setup sends confusing social and technical signals (e.g. to search engines) about which of https://transparency.wikimedia.org/ or https://wikimediafoundation.org/about/transparency/ is the canonical location of the content.

  * (within the scope of what's already been decided and ignoring the known pain-points we've communicated before, re: how the privacy and transparency policies of our main sites, documented on these sites, are different from Automattic's privacy and transparency policies in hosting the content about our policies, etc...)

Change 533537 had a related patch set uploaded (by Varnent; owner: Varnent):
[wikimedia/TransparencyReport@master] Testing if copying files to subdirectory will produce desired build - related to T230638

https://gerrit.wikimedia.org/r/533537

On the broader meta-topics: Long-lived canonical URLs are important, and I think that transparency.wikimedia.org seems like a more-natural fit for that (and to continue printing and publishing it). IMHO, the ideal end-game here* would be to move transparency.wikimedia.org to Automattic hosting completely and have it serve the new content directly, as well as the historical parts, and have the blog's links link into it. The currently-outlined (interim?) setup sends confusing social and technical signals (e.g. to search engines) about which of https://transparency.wikimedia.org/ or https://wikimediafoundation.org/about/transparency/ is the canonical location of the content.

  * (within the scope of what's already been decided and ignoring the known pain-points we've communicated before, re: how the privacy and transparency policies of our main sites, documented on these sites, are different from Automattic's privacy and transparency policies in hosting the content about our policies, etc...)

I agree that is a better long-term setup and is something I can bring up with Automattic. Is it safe to say this is something that could be done relatively easily if the site were hosted internally?

Let me know if you think https://gerrit.wikimedia.org/r/533537 will work - thanks! :)

I agree that is a better long-term setup and is something I can bring up with Automattic. Is it safe to say this is something that could be done relatively easily if the site were hosted internally?

I'm not sure exactly what you mean by "if the site were hosted internally" here, there's a lot of possible interpretations! My best guess would be that you mean transparency.wm.o continues to be a static-content microsite hosted by SRE and driven by the git repo like today, and you want to take the Automattic-developed content and stuff it into the git repo so that it appears through our hosting of the microsite (wouldn't support any "live" WP features like comments, analytics would be our standard production analytics, but on the plus side it would comply with our normal privacy and transparency policies), and then have the foundation site link into it? That could work, if Automattic can output the static data in a workable form (possibly as gerrit changes like the one you link below, for someone here to +2).

Let me know if you think https://gerrit.wikimedia.org/r/533537 will work - thanks! :)

Probably not, as a quick glance at various HTML files shows absolute site-local links that need the historical prefix added, e.g. line 12 of /build/privacy.html has near the beginning of that long line, the snippet: <link href="/stylesheets/bootstrap.min.css" rel=stylesheet />, which would need editing to reference /historical/stylesheets/bootstrap.min.css, or to be a relative link (no leading slash, in this case). There are probably lots of such examples throughout.
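As a rough illustration of the link edit described above, the following Python sketch adds the /historical prefix to root-relative href and src attributes. It is an assumption-laden sketch, not the repo's build script: `PREFIX`, `ROOT_RELATIVE`, and `add_prefix` are hypothetical names, and a regex like this will not catch every case (for example URLs inside CSS or JavaScript).

```python
import re

PREFIX = "/historical"

# Matches href= or src= whose value starts with a single "/", quoted or not.
# The lookahead skips protocol-relative URLs ("//host/...") and values that
# already start with /historical/, so running it twice changes nothing.
ROOT_RELATIVE = re.compile(r"""\b(href|src)=(["']?)/(?!/|historical/)""")


def add_prefix(html: str) -> str:
    """Rewrite root-relative href/src values to live under PREFIX."""
    return ROOT_RELATIVE.sub(
        lambda m: f"{m.group(1)}={m.group(2)}{PREFIX}/", html
    )
```

Applied to the snippet quoted above, this turns href="/stylesheets/bootstrap.min.css" into href="/historical/stylesheets/bootstrap.min.css" and leaves absolute https:// links untouched.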

Sorry - should have clarified. I meant if in theory down the road the wikimediafoundation.org site was moved to our servers - the setup you are describing would be easy to do? Just getting some context and verifying assumptions before going in to talk with Automattic. :)

Hi all, just wanted to see if there was any further info or clarification needed from Legal. We really appreciate everyone thinking through how to get this done. It sounds like the next step is seeing if this patch works?