Create a static HTML version of Bugzilla
Closed, Resolved, Public

Description

Bugzilla is no longer used by people and its content is already here in Phabricator. To get rid of it and reduce the upkeep ops have for the service, create a static HTML dump of the Bugzilla bug pages and their history pages to retain certain historical data.

Split from T1198

Event Timeline

example:

if an old URL had this:

show_bug.cgi?id=61294

then we would like to have something like:

bug61294.html

Might be easier to set up redirects if we keep the naming similar to the original (e.g. /show_bug-61294). It doesn't have to end in .html, either. There's at least one other action we'd want to keep, the change log (show_activity.cgi?id=1, show_activity-1). There may be a few more endpoints.

the trickier part would be how to handle CSS and images, fix paths, etc.

That should be the easiest part. As far as I know none of those are generated in Bugzilla. So we just fetch the source code as-is and serve it from this new static server.

Another tricky part is maintaining relative urls from bugs to other bugs. A few ideas:

  • Modify scraped output and change relative urls to new urls (hardcoded).
  • Modify scraped output and change relative urls to absolute urls to the old server (natural).
  • Add the url rewrite/redirect logic that we'll use on the old server, on the new server as well (/show_bug.cgi?id=:id -> /show_bug-{id}).
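The first idea (rewriting relative URLs in the scraped output) could be sketched roughly like this. It is only an illustration: the bug<id>.html name follows the example earlier in this thread, while activity<id>.html and the function name rewrite_links are assumptions made for the sketch.

```python
import re

# Relative links such as href="show_bug.cgi?id=123" or
# href="show_activity.cgi?id=123#c4" (an optional #fragment is preserved).
_LINK = re.compile(r'href="(show_bug|show_activity)\.cgi\?id=(\d+)(#[^"]*)?"')

# Assumed static file prefixes; adjust to whatever naming is finally chosen.
_PREFIX = {"show_bug": "bug", "show_activity": "activity"}


def rewrite_links(html):
    """Point relative Bugzilla CGI links at their static equivalents."""
    def repl(match):
        fragment = match.group(3) or ""
        return 'href="{}{}.html{}"'.format(
            _PREFIX[match.group(1)], match.group(2), fragment)
    return _LINK.sub(repl, html)
```

Links to other endpoints (buglist.cgi and so on) are left untouched, so they would still need one of the other approaches.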

Alternatively, and I think this would be the neatest solution, use transparent rewriting instead of redirecting on the new server. That would transparently serve show_bug-1.html when accessing /show_bug.cgi?id=1. This would also remove the need for complex redirects from the old server. Other addresses just point to the same path on the new server and it works as expected.

  • Scrape
    • show_bug
    • show_activity
  • Serve
    • Scraped files (via rewrite?)
    • Static front-end resources
  • Set up redirects
    • old-bugzilla/{number}
    • old-bugzilla/show_bug.cgi
    • old-bugzilla/show_activity.cgi
In T85140#946725, @Qgil wrote:

Dump done.

Where?

Stored locally. I've not had chance to speak to Daniel yet about what the next steps are with the idea.

Good, but then I think the task should remain open until this static version of Bugzilla can be accessed and tested publicly.

PS: thank you very much for working on this!

I'm not aware of having a dump. How large is it?

Stored locally. I've not had chance to speak to Daniel yet about what the next steps are with the idea.

No need to wait for me or IRC, let's just speak via this ticket. The next steps would be:

  • create a new puppet role that sets up an Apache site, f.e. static-bugzilla.wikimedia.org
  • add config snippet to varnish misc-web, backend zirconium
  • add DNS entry to point to misc-web
  • apply role on backend server zirconium
  • figure out how we want to "deploy" the static HTML files into the docroot of that Apache site

Change 183758 had a related patch set uploaded (by Dzahn):
bugzilla: add Apache site for static BZ version

https://gerrit.wikimedia.org/r/183758

Patch-For-Review

Change 183759 had a related patch set uploaded (by Dzahn):
bugzilla: add varnish config for static-bugzilla

https://gerrit.wikimedia.org/r/183759

Patch-For-Review

Stored locally. I've not had chance to speak to Daniel yet about what the next steps are with the idea.

No need to wait for me or IRC, let's just speak via this ticket. The next steps would be:

  • figure out how we want to "deploy" the static HTML files into the docroot of that Apache site

That is the only reason I wanted to speak to you actually :)

I don't know either. The options seem to be:

  • just upload them manually once, saying it's not worth any additional effort since it's a one-time thing
  • make a .deb that does nothing but install the static HTML files
  • add them all to puppet and use "recurse" to let puppet manage that directory and deploy all files in it
  • ?

I don't know either. The options seem to be:

  • just upload them manually once, saying it's not worth any additional effort since it's a one-time thing

Probably the best thing to do really.

  • make a .deb that does nothing but install the static HTML files

Could work?

  • add them all to puppet and use "recurse" to let puppet manage that directory and deploy all files in it

148 thousand files though?

Change 183758 merged by Dzahn:
bugzilla: add Apache site for static BZ version

https://gerrit.wikimedia.org/r/183758

Change 183759 merged by Dzahn:
bugzilla: add varnish config for static-bugzilla

https://gerrit.wikimedia.org/r/183759

prepared the Apache/varnish/DNS config so we now have

http://static-bugzilla.wikimedia.org/

and an empty dir to put the files in

uploaded some of the bugs and the needed skins directory.

example: http://static-bugzilla.wikimedia.org/bug1.html

issues here:

  • not enough disk space on zirconium to upload them all
  • too many files in a single directory, if we allow indexing it almost crashes my browser

maybe these:

< MatmaRex> (no more searching bugzilla? :( )
< bd808> Probably should do something to strip out the bugzilla chrome (links to login etc) or setup a 404 page that explains its a static dump

Dzahn mentioned this in Unknown Object (Diffusion Commit). Jan 14 2015, 4:52 PM

Currently, attachments are missing from static-bugzilla.wikimedia.org. Sorry if this is already known.

As the obsolete attachments were not migrated to Phab, I hope that old-bugzilla won't be pulled offline until attachments are on static-bugzilla.

Also the bug activity is not included in the static version of the comments, and the activity page hasn't been included. I believe this is a Bugzilla viewing preference.

This is very important, as it is not possible to see when a status changed for a bug imported in phab. e.g. when it was CLOSED, especially when comments are made after the migration to phab.

@jayvdb activity pages are included in the dump @Dzahn has - although it is not fully imported due to technical reasons.

Status changes are not shown in comments, as the public view of Bugzilla doesn't allow viewing them.

Dzahn raised the priority of this task from Lowest to Low. Feb 9 2015, 7:09 PM

Change 190118 had a related patch set uploaded (by Dzahn):
add index.html for static Bugzilla

https://gerrit.wikimedia.org/r/190118

Patch-For-Review

Now that the blocking task is closed i could upload all the files. All (public) bugs should be there now and also one activity page for each of them.

ex: http://static-bugzilla.wikimedia.org/activity1.html

fwiw: 73681 bug* and also 73681 activity*. Adding an index.html next. And we need a rewrite rule.
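A quick way to double-check that every bug page has a matching activity page might look like the sketch below. It assumes all files sit in one docroot directory and are named bug<id>.html and activity<id>.html as in the examples above; the function names are made up for illustration.

```python
import re
from pathlib import Path


def page_ids(docroot, prefix):
    """Collect the numeric IDs of files named <prefix><id>.html in docroot."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.html")
    ids = set()
    for page in Path(docroot).glob(prefix + "*.html"):
        match = pattern.fullmatch(page.name)
        if match:  # skips names like buglist.html that only share the prefix
            ids.add(int(match.group(1)))
    return ids


def missing_activity(docroot):
    """Return the sorted bug IDs that have no activity page."""
    return sorted(page_ids(docroot, "bug") - page_ids(docroot, "activity"))
```

An empty list from missing_activity() would mean the two sets of files line up.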

Change 190118 merged by Dzahn:
add index.html for static Bugzilla

https://gerrit.wikimedia.org/r/190118

Also the bug activity is not included in the static version
This is very important

Activities are now available. see http://static-bugzilla.wikimedia.org/ and http://static-bugzilla.wikimedia.org/activity1.html

Change 190126 had a related patch set uploaded (by Dzahn):
load mod_rewrite on static Bugzilla

https://gerrit.wikimedia.org/r/190126

Patch-For-Review

Change 190126 merged by Dzahn:
load mod_rewrite on static Bugzilla

https://gerrit.wikimedia.org/r/190126

Change 190132 had a related patch set uploaded (by Dzahn):
static-bz: rewrite /show_bug.cgi to static HTML

https://gerrit.wikimedia.org/r/190132

Patch-For-Review

Change 190132 merged by Dzahn:
static-bz: rewrite /show_bug.cgi to static HTML

https://gerrit.wikimedia.org/r/190132

these URLs with "show_bug.cgi" work now on static-bugzilla. so users can just insert the "static" part into original URLs

example: http://static-bugzilla.wikimedia.org/show_bug.cgi?id=123

gets rewritten internally to bug123.html. There is no actual .cgi script involved.

fixes links on activity pages as well http://static-bugzilla.wikimedia.org/activity1.html

RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
RewriteRule ^/show_bug.cgi$ /bug%1.html? [PT]
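To sanity-check which URLs this rule accepts, the same mapping can be mimicked outside Apache. The following Python sketch is only an approximation: the function name static_path is made up, and it compares the path literally, whereas the unescaped dot in the rule's pattern would match any character.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def static_path(url: str) -> Optional[str]:
    """Return the static file path for a show_bug.cgi URL, or None."""
    parts = urlsplit(url)
    # Counterpart of: RewriteRule ^/show_bug.cgi$
    if parts.path != "/show_bug.cgi":
        return None
    # Counterpart of: RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
    match = re.fullmatch(r"id=([0-9]+)", parts.query)
    if match is None:
        return None
    # The trailing "?" in the rule's target drops the query string.
    return "/bug{}.html".format(match.group(1))
```

Because the condition is anchored at both ends, a URL with extra parameters (for example id=123&foo=1) is not rewritten.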

Change 190613 had a related patch set uploaded (by Dzahn):
static-bz: also rewrite activity URLs

https://gerrit.wikimedia.org/r/190613

Patch-For-Review

No doubt you are aware, but just noting:

  • the header on static-bugzilla pages is / will be wrong:
      In order to access the Phabricator task corresponding to a Bugzilla report, just remove "old-" from its URL.
      You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
  • need rewrite rule for /show_activity.cgi?id=1 -> activity1.html
  • attachments yet to be done, and "Show Obsolete" JavaScript still broken (reported earlier: T85140#987797)
  • also not done, and probably going to be lost: votes, dependency graph and tree, buglists (e.g. the link after duplicates)

Change 190613 merged by Dzahn:
static-bz: also rewrite activity URLs

https://gerrit.wikimedia.org/r/190613

  • need rewrite rule for /show_activity.cgi?id=1 -> activity1.html

done just now.

https://static-bugzilla.wikimedia.org/show_activity.cgi?id=23

  • the header on static-bugzilla pages is / will be wrong

sed 's/remove \"old-\"/remove \"static-\"/' bug1.html
in a loop on ALL the files i guess
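For reference, the "loop on all the files" could also be done in Python. This is a sketch only: it assumes the pages are named bug*.html and activity*.html in a single docroot directory, the function name fix_headers is made up, and unlike the sed expression above it replaces every occurrence in a file rather than the first one per line. Working on raw bytes avoids guessing the pages' encoding.

```python
from pathlib import Path

OLD = b'remove "old-"'
NEW = b'remove "static-"'


def fix_headers(docroot, patterns=("bug*.html", "activity*.html")):
    """Rewrite the header hint in place; return how many files changed."""
    changed = 0
    for pattern in patterns:
        for page in Path(docroot).glob(pattern):
            data = page.read_bytes()
            if OLD in data:
                page.write_bytes(data.replace(OLD, NEW))
                changed += 1
    return changed
```

Running it a second time should report 0 changed files, which makes it easy to confirm nothing was missed.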

attachments yet to be done, and "Show Obsolete" JavaScript still broken

I don't know about that part yet. Do we have any idea how large that is? @JohnLewis, ideas?

  • attachments yet to be done, and "Show Obsolete" JavaScript still broken (reported earlier: T85140#987797)

After looking at it, there are only two options regarding obsolete attachments.

  1. Import them into Phabricator (I'm staying out of this option, @Aklapper, @chasemp and @Qgil are the ones who have the say on this)
  2. Wait for T85141 to be resolved.

The space (as @Dzahn poked me about earlier) is not really the issue; having attachments on static-bz is just technically not feasible. Bugzilla stores the attachments in the database, as opposed to files on disk like most software would usually use. As such, to allow attachments on static, the database attachments would need to be converted to text, image, XML etc. files according to their format. Considering the little gain and other methods which would be easier, I [personally] don't feel it is worth devoting time to creating a script or a process to do this. Though the attachment links could possibly be redirected to Phabricator attachments.
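To give a sense of what such a conversion would involve, here is a rough Python sketch. It assumes each attachment has already been fetched from the database as an ID, a MIME type and the raw bytes; nothing here reflects the actual Bugzilla schema, and the extension table, file naming and function names are all made up for illustration.

```python
from pathlib import Path

# Illustrative mapping only; unknown types fall back to ".bin".
EXTENSIONS = {
    "text/plain": ".txt",
    "text/html": ".html",
    "text/xml": ".xml",
    "application/xml": ".xml",
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
}


def attachment_filename(attach_id, mimetype):
    """Build a file name such as attachment42.png from an ID and MIME type."""
    # Drop parameters like "; charset=utf-8" before the lookup.
    base_type = mimetype.split(";", 1)[0].strip().lower()
    return "attachment{}{}".format(attach_id, EXTENSIONS.get(base_type, ".bin"))


def export_attachment(out_dir, attach_id, mimetype, data):
    """Write one attachment's bytes to out_dir and return the new path."""
    target = Path(out_dir) / attachment_filename(attach_id, mimetype)
    target.write_bytes(data)
    return target
```

The per-file part is small; access restrictions for private attachments would still need handling separately.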

  • the header on static-bugzilla pages is / will be wrong: In order to access the Phabricator task corresponding to a Bugzilla report, just remove "old-" from its URL.

i just fixed that and replaced "old-" with "static-" on all bug pages and also the activity pages.

After looking at it, there are only two options regarding obsolete attachments.

  1. Import them into Phabricator

We won't do that - at least I don't see anybody investigating how complicated linking and updating those related comments would be, plus fiddling with proper access restrictions again for those marked obsolete && private.

  2. Wait for T85141 to be resolved.

How is that related to obsolete (but not: private) attachments?

  2. Wait for T85141 to be resolved.

How is that related to obsolete (but not: private) attachments?

All attachments are stored in the database from the understanding we have. Creating a db dump will allow people to recover obsolete attachments if they want.

What does "recovering obsolete attachments" mean? All obsolete attachments are accessible from current old-bugzilla.wikimedia.org without any login required, so I don't get the "recover" part.

The plan is to remove old-bugzilla as soon as possible: maintaining an old, abandoned project (abandoned in the sense that we don't use it) is not going to happen, which opens it up to security exploits and breakage.

static-bugzilla aims to replace the bug and activity pages, and a database dump would allow users on Labs to create a basic tool for querying votes etc., unless another solution is found.

I know that. It's what this task is about. :) I also understand that "Show Obsolete" does nothing currently.

But I don't understand how "recovering obsolete attachments" is related to T85141.

Attachments are stored in the database. Providing a database dump that keeps the attachments, including obsolete ones, would allow people to recover them in whatever way they want. The only alternative is really just to import them into Phabricator.

What John said. We want to be able to close this task so we can remove old-bugzilla and only keep static-bugzilla, and making those attachments available seems a blocker for that since after that we won't be able to tell people to get them from old-bz and if we don't provide the database dump either we're going to be stuck with old-bz.

fwiw, getting the database sanitized (without even an existing schema from Mozilla) and getting somebody to review that it is properly sanitized to release it seems like it might be harder than importing the missing attachments.

what i don't want is being stuck with this ticket forever and support _2_ old BZ services, the whole point of static- is to kill old-

What is the use case for obsoleted attachments?

We'd need better search in phabricator before old- can be turned off (or a BZ DB dump). Also we should keep static copies of the hidden bugs so that they can be added to static- when the relevant Phabricator ticket is opened up.

well.. can we add JohnLewis to the BZ security group then so he can also dump the static copies of the hidden bugs ?:)) (--> T89781)

If this ticket turns out to have a dependency on "better search in phab" (which i'm not convinced of yet), i would politely give it back to the pool "up for grabs" at this point.

To get rid of Bugzilla

This is not going to happen in this decade anyway.

FYI, static HTML dumps made for archival purposes should be in WARC format.

To get rid of Bugzilla

This is not going to happen in this decade anyway.

It is very much going to happen. Please stay constructive.

re: WARC format. sounds like all that is needed for that is running (a modern version of) wget with the --warc-file option.

http://archiveteam.org/index.php?title=Wget_with_WARC_output

Change 192865 had a related patch set uploaded (by Dzahn):
static Bugzilla: explain cgi links are redirects

https://gerrit.wikimedia.org/r/192865

Change 192865 merged by Dzahn:
static Bugzilla: explain cgi links are redirects

https://gerrit.wikimedia.org/r/192865

Change 192964 had a related patch set uploaded (by Dzahn):
static bugzilla: add links to all bugs/activities

https://gerrit.wikimedia.org/r/192964

well.. can we add JohnLewis to the BZ security group (--> T89781)

See my reply in T89781#1058600 - happy to do so if csteipp is fine with that

Change 193858 had a related patch set uploaded (by Dzahn):
static bugzilla: add links to all bugs/activities

https://gerrit.wikimedia.org/r/193858

Change 193858 merged by Dzahn:
static bugzilla: add links to all bugs/activities

https://gerrit.wikimedia.org/r/193858

Change 192964 abandoned by Dzahn:
static bugzilla: add links to all bugs/activities

https://gerrit.wikimedia.org/r/192964

The bulk of this is done now, as https://static-bugzilla.wikimedia.org/ exists, serving static versions of the bugs on Bugzilla, with a list of all bugs and, separately, the activity pages.

The homepage basically sums up its purpose, and it is not dependent on Bugzilla whatsoever, so this is resolved.

If anything was missed that is a blocker before BZ can go as a service, open a new bug and add it as a blocker of T95184, which is going to be used for tracking the removal of Bugzilla from production.