
Import blog posts from old https://blog.wikimedia.org/c/technology/ into archive of the new Tech blog
Open, MediumPublic

Description

If this is possible and useful.

Note: In my understanding, this does not mean "Redirect URLs" but is about functionality in the blog software itself.

Also note that we must check that the license is the same as the default license of the tech blog. If it is not, explicitly state the license of imported posts.

Event Timeline

Aklapper created this task.Jan 22 2020, 1:19 PM
Aklapper triaged this task as Lowest priority.Jan 22 2020, 1:39 PM

There are also a number of people who have published on Phab or in their own blogs in lieu of a central place to publish posts. We should determine whether to offer to link to, import (if we can), or republish recent posts that have been published elsewhere.

TJones added a subscriber: TJones.Feb 18 2020, 10:46 PM

There are also posts at https://wikimediafoundation.org/news/category/technology/ that came after blog.wikimedia.org stopped publishing new posts.

bd808 added subscribers: bd808, Bmueller, Krinkle, BBlack.

At T226044, it was planned to self-host with Phabricator. The domain itself is to be rerouted at the DNS layer instead, and, with WordPress in use, this means we need to preserve URL functionality by different means. Is there a plan for this in place? One option could be to import all its posts, taking care to use the same slugs, dates, and permalink structure.

[..] apparently they're datestamped URIs beginning with /yyyy/mm/, examples being:
https://techblog.wikimedia.org/2010/05/29/xml-dumps-resumed/
so perhaps a blanket redirect from the new techblog's ^/20[01][0-9] to a copy of the same URI on blog.wikimedia.org would be sufficient?
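
A minimal sketch of the blanket redirect suggested above, assuming it were implemented in application code rather than in web server configuration. The function name `legacy_redirect` and the `TARGET_ORIGIN` constant are illustrative only and do not describe any deployed setup.

```python
import re
from typing import Optional

# Pattern quoted in the comment above: paths starting with /2000 through /2019.
LEGACY_PATH = re.compile(r"^/20[01][0-9]")

# Assumed redirect target, taken from the suggestion above.
TARGET_ORIGIN = "https://blog.wikimedia.org"


def legacy_redirect(path: str) -> Optional[str]:
    """Return the redirect URL for a legacy datestamped path, else None."""
    if LEGACY_PATH.match(path):
        return TARGET_ORIGIN + path
    return None


if __name__ == "__main__":
    print(legacy_redirect("/2010/05/29/xml-dumps-resumed/"))
    print(legacy_redirect("/about/"))
```

Note that the pattern only anchors the start of the path, so it would also match something like /2010abc; a stricter pattern may be preferable if such paths can exist.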

Yes. The old blog was a WordPress instance with fairly standard "permalink" configuration for posts, using the format /{yyyy}/{mm}/{dd}/{slug}, such as https://techblog.wikimedia.org/2009/07/02/power-outage-in-wikimedias-european-servers/.
There are also "archive" urls at /{yyyy}, /{yyyy}/, /{yyyy}/{mm}, and /{yyyy}/{mm}/, such as https://techblog.wikimedia.org/2009/07/.
This was moved to blog.wikimedia.org as-is, with, as far as I know, all arbitrary URLs redirecting. The exception is the root /, which redirects to the technology category instead. This means that, in theory, posts published after the move could be accessed through this older domain as well, but this isn't an issue currently because this "new" blog is also "old" again (read-only as of 2018). So, if we capture anything from /2007/* to /2018/*, we should be good. (See also T226044#5268590.)
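
As a rough illustration of "capture anything from /2007/* to /2018/*", the sketch below classifies a request path as a post permalink or an archive URL and checks the year range. The post pattern follows the example URLs, which include a day segment. The names `classify`, `POST_RE`, and `ARCHIVE_RE` are made up for this example.

```python
import re
from typing import Optional

# Post permalinks as seen in the example URLs: /yyyy/mm/dd/slug/
POST_RE = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/([^/]+)/?$")
# Archive URLs: /yyyy, /yyyy/, /yyyy/mm, /yyyy/mm/
ARCHIVE_RE = re.compile(r"^/(\d{4})(?:/(\d{2}))?/?$")

# Year range mentioned in the comment above.
FIRST_YEAR, LAST_YEAR = 2007, 2018


def classify(path: str) -> Optional[str]:
    """Return "post" or "archive" for legacy paths in range, else None."""
    for kind, pattern in (("post", POST_RE), ("archive", ARCHIVE_RE)):
        match = pattern.match(path)
        if match and FIRST_YEAR <= int(match.group(1)) <= LAST_YEAR:
            return kind
    return None


if __name__ == "__main__":
    for example in ("/2009/07/02/power-outage-in-wikimedias-european-servers/",
                    "/2009/07/", "/2019/01/", "/"):
        print(example, "->", classify(example))
```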

We are using the same permalink structure for the new blog. I think the right thing to do is to recreate the posts there, without comments. In theory we could have done this with a database dump + import cycle from the legacy blog, but the logistics of that were more than we could handle while staying on the launch timeline. The legacy posts will 404 for a bit until we get them backfilled, but we should be able to get it done not too long after launch.
The articles we would want to preserve are the ones under https://blog.wikimedia.org/c/technology/
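
One way to check a backfill, sketched under several assumptions: that the content dump arrives as a WordPress export (WXR) file, that it uses the 1.2 export namespace, and that the category slug is `technology`. The sample data below is invented for illustration (only the slug and date of the first item come from the example URL earlier in this task), and `technology_permalinks` is a hypothetical helper, not part of any existing tooling.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List

# Namespace used by WordPress export (WXR) files; version 1.2 is assumed.
NS = {"wp": "http://wordpress.org/export/1.2/"}

# Invented sample data, trimmed to the fields this sketch reads.
SAMPLE_WXR = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>XML dumps resumed</title>
      <wp:post_date>2010-05-29 12:00:00</wp:post_date>
      <wp:post_name>xml-dumps-resumed</wp:post_name>
      <wp:post_type>post</wp:post_type>
      <category domain="category" nicename="technology">Technology</category>
    </item>
    <item>
      <title>Hypothetical non-technology post</title>
      <wp:post_date>2011-01-15 09:30:00</wp:post_date>
      <wp:post_name>hypothetical-other-post</wp:post_name>
      <wp:post_type>post</wp:post_type>
      <category domain="category" nicename="community">Community</category>
    </item>
  </channel>
</rss>
"""


def technology_permalinks(wxr_text: str, category: str = "technology") -> List[str]:
    """List the /yyyy/mm/dd/slug/ paths of posts in the given category."""
    root = ET.fromstring(wxr_text)
    paths = []
    for item in root.iter("item"):
        # Skip pages, attachments, and other non-post items.
        if item.findtext("wp:post_type", namespaces=NS) != "post":
            continue
        slugs = {c.get("nicename") for c in item.findall("category")
                 if c.get("domain") == "category"}
        if category not in slugs:
            continue
        date = datetime.strptime(
            item.findtext("wp:post_date", namespaces=NS), "%Y-%m-%d %H:%M:%S")
        slug = item.findtext("wp:post_name", namespaces=NS)
        paths.append(f"/{date:%Y/%m/%d}/{slug}/")
    return paths


if __name__ == "__main__":
    print(technology_permalinks(SAMPLE_WXR))
```

The resulting paths could then be compared against the new blog to confirm that each imported post keeps its original slug and date.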

bd808 raised the priority of this task from Lowest to Medium.Fri, Mar 20, 11:28 PM
bd808 added a subscriber: Varnent.Fri, Mar 20, 11:37 PM

@Varnent can you help us out with this by either giving me access to the admin console for blog.wikimedia.org so I can make a content dump for the technology category there, or by making the dump yourself and getting it to me so I can do some testing and then upload it into the new blog?

Aklapper updated the task description.Tue, Mar 24, 6:07 PM
bd808 added a comment.Wed, Mar 25, 5:11 PM

I chatted with @Varnent on IRC yesterday (2020-03-24). He should be able to help with this in the next week or so.

Varnent added a comment.EditedWed, Mar 25, 11:03 PM

I am working on getting access to this site's main database dump. Basically, it is the only one of our VIP sites I did not yet have full access to as it has been inactive. I imagine a direct database dump is the easiest solution - but I can also look into getting you access if you prefer.

Full disclosure: because this site has been dormant, it is not as operational as most of our VIP sites and is using an older version of VIP's configuration. However, we are starting it back up and getting it aligned with our other VIP sites, as we are repurposing the instance for the new community blog that is replacing the Space blog. Put another way, in the worst case, whatever barrier happens to be in place today should be removed in the coming weeks.