
Make an HTML dump of the output of the CodeReview extension on MediaWiki.org
Open, High · Public

Description

As demanded by @Legoktm.

Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 24 2018, 11:53 PM
Legoktm claimed this task. Sep 24 2018, 11:57 PM

I'll give it a shot then.

I think the diffs themselves are not that important to dump (we have them at https://phabricator.wikimedia.org/diffusion/SVN/), but the review comments are often very useful.

Soooo... @Legoktm were you able to give it a try, and if so, what happened?

Dzahn added a subscriber: Dzahn. Mar 12 2019, 9:49 AM

T218079 brings T116948 up again and this task here is a blocker for both.

Change 500884 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/CodeReview@master] CodeRevisionView: Fix one case of viewvc not being optional

https://gerrit.wikimedia.org/r/500884

Change 500884 merged by jenkins-bot:
[mediawiki/extensions/CodeReview@master] CodeRevisionView: Fix one case of viewvc not being optional

https://gerrit.wikimedia.org/r/500884

I guess this task is stalled? I'll check with @Legoktm to see if we can move it forward.

Also checking in with @CCicalese_WMF to see if we can rebalance some resources.

WDoranWMF triaged this task as Low priority. Jun 25 2019, 1:51 PM
WDoranWMF added a subscriber: WDoranWMF.

It appears this extension's data is accessible here
https://www.mediawiki.org/wiki/Special:Code

So we need to scrape all the data under these pages. However, the scale of the data is such that it would have to be paginated in some manner, which would then need to be ported to MW.org.

I'm moving this to the Core Platform Team Inbox so that it can be triaged and planned into a future sprint for one of our subteams.

Krinkle added a subscriber: Krinkle. Edited · Jun 26 2019, 8:33 PM

We've done similar archiving in the past when we shut down our Bugzilla installation.
Don't know if you want to go this route, but it's an idea I explored last year at T205482:

  • Maybe a MW maintenance script or Python scraper to render the pages without the skin (like action=render, but for Special:Code), and upload them somewhere (puppet microsite can be used).
  • Redirect the Special:Code URLs for mediawiki.org to this static site, using an Apache rewrite rule.
  • Then, undeploy the extension (reduced to only log entries, as for EducationProgram, could be in WikimediaMessages repo).
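
For illustration only, here is a minimal Python sketch of the URL mapping that such a redirect would need to express. It is not the actual rewrite rule. The `<repo>/<rev>.html` layout, the placeholder host `static.example.org`, and the function name `static_url_for` are all assumptions made for this example.

```python
import re
from typing import Optional

# Matches revision pages such as /wiki/Special:Code/MediaWiki/75446.
# Other Special:Code views (lists, comments, tags) are deliberately not handled.
_REV_PATH = re.compile(r"^/wiki/Special:Code/(?P<repo>[^/]+)/(?P<rev>\d+)$")


def static_url_for(path: str, base: str = "https://static.example.org") -> Optional[str]:
    """Map a Special:Code revision path to its (assumed) static archive URL.

    Returns None when the path is not a revision page, so the caller can
    fall through to normal wiki handling.
    """
    match = _REV_PATH.match(path)
    if match is None:
        return None
    # Assumed layout: one HTML file per revision, grouped by repository name.
    return f"{base}/{match.group('repo')}/{match.group('rev')}.html"
```

The same pattern could be expressed as a single rewrite rule once the final host and directory layout are decided. Keeping the revision number as the file name makes the mapping purely mechanical.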
WDoranWMF removed Legoktm as the assignee of this task. Jul 2 2019, 7:06 PM
WDoranWMF edited projects, added Platform Team Legacy; removed Platform Engineering.
awight added a subscriber: awight. Jul 4 2019, 7:44 AM
WDoranWMF raised the priority of this task from Low to High. Dec 10 2019, 3:20 PM
Legoktm claimed this task. Jan 15 2020, 10:41 AM

I have everything dumped locally, it's about 4GB. I'll rsync it to people.wm.o so people can review it before we place it in its final location. I mostly did what @Krinkle suggested but with a few regex tweaks to fix URLs.

Looks good to me

Mentioned in SAL (#wikimedia-operations) [2020-01-16T02:35:18Z] <Krinkle> krinkle@mwmaint1002 Change code_repo.repo_viewvc from 'https://svn.wikimedia.org/viewvc/mediawiki' to '' for 'MediaWiki' repo_name. Ref 2162cf2fc46cfe, T205361.

Krinkle added a comment. Edited · Jan 16 2020, 2:44 AM

@Legoktm Awesome. I do have a few small nitpicks:

  • Per T205361#5080437, I've applied the repo_viewvc change. This results in some of the broken interface links being omitted. If you re-run the script now, those links will be gone, and the paths will remain as plain text.
  • The archive pages could do with a basic <h1> heading. Maybe make them a simple copy of the <title> that you have already?
  • I noticed the internal links are currently absolute, e.g. <a href="/~legoktm/CodeReview/MediaWiki/rev/2.html">. If these were relative, like ./2.html, the archive would be more portable (without needing string replacements or regeneration).
  • I have a few minor CSS tweaks (e.g. hide the no-op "purge" link), but I'll stuff that in a patch later.
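
To make the relative-link suggestion concrete, here is a small Python sketch that derives a relative href from the absolute path of the link target and the absolute path of the page containing the link. The helper name `relative_href` is hypothetical, and this is not taken from the archiver script.

```python
import posixpath


def relative_href(target: str, current_page: str) -> str:
    """Return a link to `target` that is relative to `current_page`.

    Both arguments are absolute URL paths, e.g.
    /~legoktm/CodeReview/MediaWiki/rev/2.html
    """
    # Links resolve against the directory that contains the current page.
    start = posixpath.dirname(current_page)
    rel = posixpath.relpath(target, start)
    # Prefix same-directory links with "./" to match the style suggested above.
    if rel.startswith(("./", "../")):
        return rel
    return "./" + rel
```

With links written this way, the whole directory tree can be moved to a different host or path prefix without regenerating the pages.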

Change 565805 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/tools/codereview-archiver@master] Initial commit

https://gerrit.wikimedia.org/r/565805

Change 565805 merged by Legoktm:
[mediawiki/tools/codereview-archiver@master] Initial commit

https://gerrit.wikimedia.org/r/565805

I updated the dump last night with fixes for @Krinkle's feedback.

CCicalese_WMF closed this task as Resolved. Apr 28 2020, 8:48 PM

Marking as Resolved as it is in the Done column. Feel free to reopen if there is remaining work.

Kizule added a subscriber: Kizule. Edited · Apr 29 2020, 12:55 AM

Where is the HTML dump located?

Thanks @Dzahn. I looked at https://people.wikimedia.org/~legoktm/CodeReview/MediaWiki/rev/35.html (as an example) and found that some URLs still point to MediaWiki.org, such as history, "purge"...

CCicalese_WMF reopened this task as Open. Apr 29 2020, 2:00 PM

@Legoktm Is the dump available somewhere more public or documented somewhere? Could you please add a link to the final location somewhere and re-resolve this task?

Dzahn added a comment. Apr 29 2020, 2:14 PM

Maybe talk to @ArielGlenn about getting it on the official dumps servers (dumps.wikimedia.org) under "misc". That would be more stable than the people VM.

This task depends on T243056: Set up static-codereview.wikimedia.org to host static HTML dump of CodeReview, which is about setting up a domain to host the dump.

Maybe talk to @ArielGlenn about getting it on the official dumps servers (dumps.wikimedia.org) under "misc". That would be more stable than the people VM.

We could host a tarball of the HTML pages, but that's different from a static copy that people can browse online.

Dzahn added a comment. Apr 30 2020, 4:02 PM

Since the SQL dumps for codereview are also on the dumps servers (T243055), doesn't it fit to have the HTML together with them?

The HTML dump can be in a tarball for download, sure. But that is separate from what was requested in T243056 i.e. actually serving a static copy for browsing. I don't think the labstore boxes should be doing that.

Dzahn added a comment. May 1 2020, 10:23 AM

In that case, I think this should have a dedicated Ganeti VM just for this.

Sounds good to me, though we probably want that discussion on the other task.

Pppery added a subscriber: Pppery. May 15 2020, 1:27 AM
Dzahn added a comment. Edited · May 15 2020, 9:07 AM

This has happened in T243056 (sites have been added to the miscweb* VMs shared with other static sites)

https://static-codereview.wikimedia.org/MediaWiki/1.html

I think it's (basically?) done.

Krinkle added a comment. Edited · May 20 2020, 2:31 AM

Some HTML corruption occurred in the post-processing step. This has caused "follow-up" links to become broken:

https://static-codereview.wikimedia.org/MediaWiki/75446.html?

<a ./75429.html" title="Special:Code/MediaWiki/75429">r75429</a>
<a ./75446.html" title="Special:Code/MediaWiki/75446">r75446</a>

Original from https://www.mediawiki.org/wiki/Special:Code/MediaWiki/75446

<a href="/wiki/Special:Code/MediaWiki/75466" title="Special:Code/MediaWiki/75466">r75466</a>
<a href="/wiki/Special:Code/MediaWiki/75467" title="Special:Code/MediaWiki/75467">r75467</a>
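
As a rough sketch of a rewrite that keeps the `href="` prefix intact, the Python below replaces only the attribute value and leaves the rest of the tag alone. It is not the post-processing code that was actually used. The function name `rewrite_rev_links` is hypothetical, and it assumes links only need rewriting within a single repository.

```python
import re


def rewrite_rev_links(html: str, repo: str = "MediaWiki") -> str:
    """Rewrite Special:Code revision links for `repo` to relative .html links.

    The pattern consumes the whole attribute, including href=" and the
    closing quote, and the replacement writes both back. That way the
    attribute name cannot be dropped by accident.
    """
    pattern = re.compile(
        r'href="/wiki/Special:Code/' + re.escape(repo) + r'/(\d+)"'
    )
    return pattern.sub(r'href="./\1.html"', html)
```

Applied to the original markup quoted above, this would be expected to produce `<a href="./75466.html" title="Special:Code/MediaWiki/75466">r75466</a>`, with the title attribute untouched. For anything beyond simple anchors, an HTML parser would be a safer choice than regular expressions.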

Also, is it MediaWiki/1.html or MediaWiki/rev/1.html? I've seen both versions. It seems we're back to the former?

Dzahn added a comment. May 20 2020, 6:36 AM

Also, is it MediaWiki/1.html or MediaWiki/rev/1.html? I've seen both versions. It seems we're back to the former?

It's https://static-codereview.wikimedia.org/MediaWiki/1.html (the other version was MediaWiki/r1.html, not /rev/1.html).

So, when is the Apache rewrite being put in place? That's blocking undeploying the extension.

greg added a subscriber: greg. Sep 17 2020, 11:16 PM

So, when is the Apache rewrite being put in place? That's blocking undeploying the extension.

Ping! :)

(I remembered this task after this comment on the GitLab consultation: https://www.mediawiki.org/wiki/Topic:Vu63x95by4od74uc )