
Make an HTML dump of the output of the CodeReview extension on MediaWiki.org
Open, High · Public

Description

As demanded by @Legoktm.

Details

Related Gerrit Patches:
mediawiki/tools/codereview-archiver : master · Initial commit
mediawiki/extensions/CodeReview : master · CodeRevisionView: Fix one case of viewvc not being optional

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 24 2018, 11:53 PM
Legoktm claimed this task. · Sep 24 2018, 11:57 PM

I'll give it a shot then.

I think the diffs themselves are not that important to dump (we have them at https://phabricator.wikimedia.org/diffusion/SVN/), but the review comments are often very useful.

Soooo... @Legoktm were you able to give it a try, and if so, what happened?

Dzahn added a subscriber: Dzahn. · Mar 12 2019, 9:49 AM

T218079 brings T116948 up again and this task here is a blocker for both.

Change 500884 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/CodeReview@master] CodeRevisionView: Fix one case of viewvc not being optional

https://gerrit.wikimedia.org/r/500884

Change 500884 merged by jenkins-bot:
[mediawiki/extensions/CodeReview@master] CodeRevisionView: Fix one case of viewvc not being optional

https://gerrit.wikimedia.org/r/500884

I guess this task is stalled? I'll check with @Legoktm to see if we can move it forward.

Also checking in with @CCicalese_WMF to see if we can rebalance some resources.

WDoranWMF triaged this task as Low priority. · Jun 25 2019, 1:51 PM
WDoranWMF added a subscriber: WDoranWMF.

It appears this extension's data is accessible here:
https://www.mediawiki.org/wiki/Special:Code

So we need to scrape all the data under these pages. However, the volume of data is large enough that the output would have to be paginated in some manner, and the result would then need to be ported to MW.org.

I'm moving this to the Core Platform Team Inbox so that it can be triaged and planned into a future sprint for one of our subteams.

Krinkle added a subscriber: Krinkle. · Edited · Jun 26 2019, 8:33 PM

We did a similar archiving in the past when we shut down our Bugzilla installation.
Don't know if you want to go this route, but it's an idea I explored last year at T205482:

  • Maybe a MW maintenance script or Python scraper to render the pages without the skin (like action=render, but for Special:Code), and upload them somewhere (a Puppet microsite can be used).
  • Redirect the Special:Code URLs for mediawiki.org to this static site, using an Apache rewrite rule.
  • Then, undeploy the extension (reduced to only log entries, as for EducationProgram, could be in WikimediaMessages repo).
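
The redirect step above depends on a stable mapping from each old Special:Code URL to a file in the static archive. Here is a minimal Python sketch of such a mapping, which a scraper could also use to decide where to write each page. The URL shape `/wiki/Special:Code/<repo>/<rev>`, the `<repo>/rev/<rev>.html` layout, the `base` prefix, and the function name are all assumptions made for illustration; none of them is confirmed by this comment.

```python
import re

# Assumed shape of a revision page URL on the wiki side (hypothetical).
SPECIAL_CODE_REV = re.compile(
    r"^/wiki/Special:Code/(?P<repo>[^/]+)/(?P<rev>\d+)$"
)


def static_target(path, base="/archive/CodeReview"):
    """Map a Special:Code revision path to a static archive path.

    Returns None for any path that is not a revision page, so the
    caller can leave such requests alone. `base` is a placeholder
    for wherever the static site ends up being hosted.
    """
    match = SPECIAL_CODE_REV.match(path)
    if match is None:
        return None
    return f"{base}/{match['repo']}/rev/{match['rev']}.html"


if __name__ == "__main__":
    print(static_target("/wiki/Special:Code/MediaWiki/2"))
    print(static_target("/wiki/Main_Page"))
```

Keeping the mapping in one pure function means the same rule can be checked against a list of known URLs before it is translated into an Apache rewrite rule.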
WDoranWMF removed Legoktm as the assignee of this task. · Jul 2 2019, 7:06 PM
WDoranWMF edited projects, added Core Platform Team Legacy; removed Core Platform Team.
awight added a subscriber: awight. · Jul 4 2019, 7:44 AM
WDoranWMF raised the priority of this task from Low to High. · Dec 10 2019, 3:20 PM
Legoktm claimed this task. · Wed, Jan 15, 10:41 AM

I have everything dumped locally; it's about 4 GB. I'll rsync it to people.wm.o so people can review it before we place it in its final location. I mostly did what @Krinkle suggested, but with a few regex tweaks to fix URLs.

Looks good to me

Mentioned in SAL (#wikimedia-operations) [2020-01-16T02:35:18Z] <Krinkle> krinkle@mwmaint1002 Change code_repo.repo_viewvc from 'https://svn.wikimedia.org/viewvc/mediawiki' to '' for 'MediaWiki' repo_name. Ref 2162cf2fc46cfe, T205361.

Krinkle added a comment. · Edited · Thu, Jan 16, 2:44 AM

@Legoktm Awesome. I do have a few small nitpicks:

  • Per T205361#5080437, I've applied the repo_viewvc change. As a result, some of the broken interface links are now omitted. If you re-run the script now, those links will be gone, and the paths will remain as plain text.
  • The archive pages could do with a basic <h1> heading. Maybe make them a simple copy of the <title> that you have already?
  • I noticed that links within the archive are currently absolute, e.g. <a href="/~legoktm/CodeReview/MediaWiki/rev/2.html">. If these were relative, like ./2.html, the archive would be more portable (without needing string replacements or regeneration).
  • I have a few minor CSS tweaks (e.g. hide the no-op "purge" link), but I'll stuff that in a patch later.
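
Two of the points above (relative links and an <h1> copied from the <title>) can be sketched as a small post-processing step in Python. This is an illustration only, not the archiver's actual code: the function names are hypothetical, the `root` prefix is taken from the example link above, and `add_heading` assumes each page has a bare `<body>` tag.

```python
import posixpath
import re

TITLE = re.compile(r"<title>(.*?)</title>", re.S)


def relativize(href, page_path, root="/~legoktm/CodeReview/"):
    """Rewrite an absolute archive link relative to the current page.

    Links outside `root` (external sites, other paths) are returned
    unchanged. Note that relpath normalizes the result, so a trailing
    slash on a directory link would be dropped.
    """
    if not href.startswith(root):
        return href
    return posixpath.relpath(href, posixpath.dirname(page_path))


def add_heading(html):
    """Insert an <h1> that copies the page's <title>, if none exists."""
    match = TITLE.search(html)
    if match is None or "<h1" in html:
        return html
    heading = f"<body>\n<h1>{match.group(1)}</h1>"
    return html.replace("<body>", heading, 1)


if __name__ == "__main__":
    page = "/~legoktm/CodeReview/MediaWiki/rev/1.html"
    print(relativize("/~legoktm/CodeReview/MediaWiki/rev/2.html", page))
    print(relativize("/~legoktm/CodeReview/MediaWiki/index.html", page))
```

The relative form produced here is `2.html` rather than `./2.html`; browsers resolve both to the same target.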

Change 565805 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/tools/codereview-archiver@master] Initial commit

https://gerrit.wikimedia.org/r/565805

Change 565805 merged by Legoktm:
[mediawiki/tools/codereview-archiver@master] Initial commit

https://gerrit.wikimedia.org/r/565805

I updated the dump last night with fixes for @Krinkle's feedback.