
Bugzilla HTML static version and database dump
Closed, Resolved (Public)

Description

In T366#20014, @Dzahn wrote:

I would strongly advise to NOT keep Bugzilla up in the current form longer than needed. Saying "expect years" means "for years the ops team will have to patch it for every security release and version upgrade" even though almost nobody will use it. It already did not get much love when it was the main bugtracker, it will only become worse if it's "old-bugzilla" and has no users, but it would still need support. Already while we speak there is a complicated pending patch that deals with our custom modifications as well. What i would suggest instead:

  • keep static HTML of the bug pages. let's run a script that goes through all bug numbers, starting from 1 and simply saves the raw HTML. Then we can keep those static HTML files somewhere for years and nobody would mind. But we could delete the entire application, all the Perl files, anything dynamic and not have to deal with it ever again.
  • and/or provide a sanitized database dump to the general public. remove the user table and the security related bugs from a database dump, then put it on dumps.wikimedia.org or something for the community to download. people generally like that when we just release dumps and they are free to work with them

this, combined with the bugzilla.wikimedia.org URLs still working and being handled by phab itself, sounds good to me without having to care about actual BZ application

Attachments should be stored as well -- including obsolete versions. See T78747: Obsolete attachments were not migrated from bugzilla

Event Timeline

Qgil raised the priority of this task to Low.
Qgil updated the task description.
Qgil added a project: Phabricator.
Qgil changed Security from none to None.
Qgil subscribed.

i'm trying to do the "static HTML" part at least. made some attempts with curl and httrack, but they don't yet save the CSS and keep the nice-looking design the right way. maybe i'll just do it in Firefox instead.

that doesn't mean i know how exactly to sanitize the database dump the correct way

maybe better to split this up. making static HTML is kind of separate job from figuring out how to sanitize the database. what exactly needs to be removed from the user table? how do we remove the security related bugs (just the open ones?).

for the script to make static HTML it doesn't really matter, just solving that problem by _not_ logging in, that should mean whatever i save is public anyways.
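A minimal sketch of such a script, under some loud assumptions: the `show_bug.cgi?id=N` URL pattern on old-bugzilla.wikimedia.org, the `bug-<id>.html` file naming, and the function names are all illustrative and not taken from any actual tooling. It sends no cookies, so it only ever sees what an anonymous visitor sees. It saves the raw HTML only and does nothing about CSS or attachments.

```python
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable, Optional

# Assumption: standard Bugzilla bug URL on the host named in this task.
BASE_URL = "https://old-bugzilla.wikimedia.org/show_bug.cgi?id={bug_id}"


def fetch_bug_html(bug_id: int) -> Optional[bytes]:
    """Fetch one bug page anonymously (no cookies, no login)."""
    url = BASE_URL.format(bug_id=bug_id)
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read()
    except urllib.error.URLError:
        # Covers HTTP error statuses and connection failures: skip this bug.
        return None


def save_static_bugs(
    first: int,
    last: int,
    out_dir: Path,
    fetch: Callable[[int], Optional[bytes]] = fetch_bug_html,
) -> list[int]:
    """Save bug pages first..last as <out_dir>/bug-<id>.html.

    Returns the ids that were actually written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for bug_id in range(first, last + 1):
        html = fetch(bug_id)
        if html is None:
            continue
        (out_dir / f"bug-{bug_id}.html").write_bytes(html)
        saved.append(bug_id)
    return saved
```

Something like `save_static_bugs(1, 100, Path("static-bugzilla"))` would walk the first hundred bug numbers. Bugzilla may answer a request for a restricted or missing bug with an ordinary error page rather than an HTTP error status; this sketch would save that page as-is.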

In T1198#21299, @Dzahn wrote:

maybe better to split this up. making static HTML is kind of separate job from figuring out how to sanitize the database. what exactly needs to be removed from the user table?

Not sure if helpful but bugzilla.mozilla.org has a non-upstreamed sanitizeme.pl script against their customized Bugzilla running 4.2 (we have 4.4) to create MySQL DB dumps for researchers.
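To make the sanitizing idea from the task description concrete (remove the user table and the security-related bugs), here is a toy sketch against an in-memory SQLite database. Every table and column name below is invented for the illustration; this is not Bugzilla's real schema and not what sanitizeme.pl does, and a real dump would need a much more careful review.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny, hypothetical bug database for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, summary TEXT);
        CREATE TABLE comments (bug_id INTEGER, user_id INTEGER, body TEXT);
        -- Hypothetical marker table: one row per security-restricted bug.
        CREATE TABLE restricted_bugs (bug_id INTEGER);

        INSERT INTO users VALUES (1, 'alice@example.org');
        INSERT INTO bugs VALUES (1, 'Typo on main page');
        INSERT INTO bugs VALUES (2, 'Security issue');
        INSERT INTO comments VALUES (1, 1, 'Public comment');
        INSERT INTO comments VALUES (2, 1, 'Private details');
        INSERT INTO restricted_bugs VALUES (2);
        """
    )
    return conn


def sanitize(conn: sqlite3.Connection) -> None:
    """Drop restricted bugs, their comments, and the user table in place."""
    restricted = "SELECT bug_id FROM restricted_bugs"
    with conn:  # commit on success, roll back on error
        conn.execute(f"DELETE FROM comments WHERE bug_id IN ({restricted})")
        conn.execute(f"DELETE FROM bugs WHERE bug_id IN ({restricted})")
        conn.execute("DELETE FROM restricted_bugs")
        conn.execute("DROP TABLE users")
```

Note that the remaining comments still carry a `user_id` that no longer resolves to anyone. Whether that is enough, and which bugs count as security related, are exactly the open questions raised above.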

I still think this task should be split in 2. Creating static HTML pages from old-bugzilla.wikimedia.org could be "needs volunteer" and done by anyone, while sanitizing the database can't.

JohnLewis claimed this task.

Split into T85140 and T85141

Not sure about the status so, invalid?

disagree. it's not invalid; it's a tracking or master ticket, which should have the 2 new tickets as subtasks/blockers

Minor: Seeing "WMF Static Bugzilla" results in Google, could that be changed to "Wikimedia Deprecated Bugzilla" or something?

Change 195651 had a related patch set uploaded (by John F. Lewis):
bz: change index.html title

https://gerrit.wikimedia.org/r/195651

Change 195651 merged by Dzahn:
bz: change index.html title

https://gerrit.wikimedia.org/r/195651

Minor: Seeing "WMF Static Bugzilla" results in Google, could that be changed to "Wikimedia Deprecated Bugzilla" or something?

It changed now to "Deprecated Bugzilla - Wikimedia"

both blocker tasks are resolved, so this should be resolved too