
HTML/CSS style changes to the dumps.wikimedia.org main pages
Closed, Resolved, Public

Description

I know the dumps are being rewritten, but that is taking several years, so in the meantime it would be great to have some small enhancements (low-hanging fruit) to the current system:

  • UI: it looks very 1990s. Please make it a little prettier.

Deferred:

  • Make it faster and more nimble:
    • Remove the Yahoo! summaries (abstracts): for Wikidata alone they are 38 GB, and they slow down the whole process of generating dumps for all wikis. Yahoo! itself is about to die, and I don't know why we keep producing them.
    • Keep one compression method and abandon the other. Why do we build two dumps, one for 7z and one for bz2?
  • Notifications: I would love to receive an email once the monthly Wikidata dump is done, so I can run some checks and fix things on-wiki. (Note: this is possible and now documented.)

Details

Related Gerrit Patches:
operations/puppet (production): dumps: Redesign progress report page
operations/puppet (production): dumps: Centralize CSS in one file, make it wider and apply to more files
operations/puppet (production): dumps: More UI cleanup
operations/puppet (production): dumps: Modernize design of the index page

Event Timeline

> Notifications: I would love to receive an email once the monthly dump of Wikidata is done so I can run some checks and fix it on-wiki.

Are you aware that there are RSS feeds for all dumps (see https://dumps.wikimedia.org/wikidatawiki/latest/ etc., all the -rss.xml files)? With services like https://blogtrottr.com/ you can turn them into email notifications. I use this for the dewiki dump and it works great.
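For anyone who would rather script this than rely on a third-party service, here is a minimal Python sketch of polling one of those feeds. It assumes the feed is plain RSS 2.0 with `<item>`, `<link>` and `<pubDate>` elements and that the newest item comes first; the helper names (`fetch_feed`, `parse_dump_feed`, `has_new_dump`) are hypothetical, and the exact feed URL should be taken from the -rss.xml files listed in the directory above. Sending the actual email is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.request import urlopen


def fetch_feed(url: str) -> str:
    """Download an RSS feed (e.g. one of the *-rss.xml files) as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def parse_dump_feed(rss_xml: str) -> List[Tuple[str, str]]:
    """Return (link, pubDate) pairs for every <item>, in document order.

    Assumption: plain RSS 2.0 with no XML namespaces on these elements.
    """
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("link", default=""), item.findtext("pubDate", default=""))
        for item in root.iter("item")
    ]


def has_new_dump(rss_xml: str, last_seen_pubdate: str) -> bool:
    """True if the first (assumed newest) item differs from the one seen last time."""
    items = parse_dump_feed(rss_xml)
    return bool(items) and items[0][1] != last_seen_pubdate
```

Run it from cron, store the last `pubDate` you saw, and send yourself a mail (for example with the standard `smtplib` module) whenever `has_new_dump` returns true.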

Bz2 dumps are provided because they are block-oriented, unlike 7z: not only can we recover from the middle of them, but various sites use this feature for their work (cf. DBpedia). 7z dumps are provided because they are much smaller, for downloaders without a lot of storage or bandwidth. When the dumps are rewritten we'll revisit the output and compression formats, as well as the possibility of producing files on demand composed of smaller chunks of dump content.
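What makes bz2 useful here is that the compressed data is split into units that can be decoded without reading everything before them, whereas a 7z (LZMA) file is typically one continuous stream that must be decoded from the start. The Python sketch below illustrates the idea in a simplified form: each chunk is compressed as its own bz2 stream, the streams are concatenated, and a reader jumps straight to one chunk by byte offset. This is a simplification, not how the dumps are actually produced: bzip2's internal blocks inside a single stream are bit-aligned, so recovering from them (as `bzip2recover` does) takes more work. The helper names are hypothetical.

```python
import bz2
from typing import List, Tuple


def compress_chunks(chunks: List[bytes]) -> Tuple[bytes, List[int]]:
    """Compress each chunk as an independent bz2 stream.

    Returns the concatenated streams plus the byte offset where each one starts.
    """
    out = bytearray()
    offsets = []
    for chunk in chunks:
        offsets.append(len(out))
        out += bz2.compress(chunk)
    return bytes(out), offsets


def read_chunk_at(data: bytes, offset: int) -> bytes:
    """Decompress only the stream that begins at `offset`.

    BZ2Decompressor stops at the end of the first stream it sees; any bytes
    after it are left in `unused_data` and are not decoded.
    """
    decomp = bz2.BZ2Decompressor()
    return decomp.decompress(data[offset:])
```

An index of chunk offsets stored alongside the file is what lets a reader seek directly to the part it needs instead of decompressing everything from the beginning.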

Note that if you don't need the full revision history, everything else is dumped twice a month.

I'd like to lose the abstracts too, but people download them (not just mirrors of all dumps; I see downloaders that fetch only those files).

I'd be happy to see updates to the HTML templates used for output; they are in our puppet repo here: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/snapshot/files/dumps/templates/ ({download-index,progress,report}.html). Changesets welcome!

Thanks for the explanations:

> I'd like to lose the abstracts too but people download them (not just mirrors of all dumps; I see downloaders that pick up only those files).

Can we start a discussion about deprecating them and wait a while to see whether people come back and explain why they download these dumps? Maybe they are broken crawler bots or something like that. Maybe the page summaries should be reformatted using Parsoid (for services such as MCS).

> I'm happy to see updates to the html templates used for output; they are in our puppet repo here: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/snapshot/files/dumps/templates/ ({download-index,progress,report}.html), changesets welcome!

Thanks. I'll make some patches there soon.

For the abstract dumps, I've tried polling the mailing list before with no response, which is not surprising. When I looked at the access logs, though, I found a significant number of IPs that downloaded only abstract dumps and nothing else. What format those might take in the future should of course be part of the dumps rewrite project.

In April we'll be trying to put together a survey of dumps users, casting as wide a net as possible to reach everyone, and this is one of the things we can ask about. See the comments on T147177 about this; I need to open a task for that soon so we can collect ideas there.

Change 334856 had a related patch set uploaded (by Ladsgroup):
dumps: Modernize design of the index page

https://gerrit.wikimedia.org/r/334856

@Ladsgroup Would you be willing to post a screenshot here of what it will look like? That way I can run it by people on the mailing lists.

ArielGlenn moved this task from Backlog to Active on the Dumps-Generation board. Jan 29 2017, 11:39 PM
ArielGlenn triaged this task as Medium priority.
srishakatux awarded a token.

Change 334856 merged by ArielGlenn:
dumps: Modernize design of the index page

https://gerrit.wikimedia.org/r/334856

It's live. Let's see who notices :-)

Change 335684 had a related patch set uploaded (by Ladsgroup):
dumps: More UI cleanup

https://gerrit.wikimedia.org/r/335684

This proposal has been selected for the Developer-Wishlist voting round and will be added to a MediaWiki page very soon. To the subscribers or proposer of this task: please help modify the task description by adding a brief summary (10-12 lines) of the problem this proposal raises, the topics discussed in the comments, and a proposed solution (if there is one yet). Remember to add a header titled "Description" to your content. Please do so before February 5th, 12:00 pm UTC.

Screenshots for the new UI changes:

[Before/after screenshots for each page: dumps.wikimedia.org, Mirrors, Legal, DVD]
Ladsgroup updated the task description. Feb 10 2017, 2:53 PM

Change 335684 merged by ArielGlenn:
dumps: More UI cleanup

https://gerrit.wikimedia.org/r/335684

Change 337264 had a related patch set uploaded (by Ladsgroup):
dumps: Centralize CSS in one file, make it wider and apply to more files

https://gerrit.wikimedia.org/r/337264

Change 337264 merged by ArielGlenn:
dumps: Centralize CSS in one file, make it wider and apply to more files

https://gerrit.wikimedia.org/r/337264

A bit delayed this time but now live.

@Ladsgroup Do you want to fix up report.html? I think that's the last thing left.

@ArielGlenn I will get that done this week.

Change 339332 had a related patch set uploaded (by Ladsgroup):
dumps: Redesign progress report page

https://gerrit.wikimedia.org/r/339332

It will look like this:

Old version to record in history:

Adding a screenshot to show what the change looks like when viewing some of the shorter jobs. Is it too cluttered?

I can increase the space between them if you think that would help.

I don't have anything valuable to add, but for what it's worth, I think the new look is fine. I think it is more visually appealing and modern. So, from me.

ArielGlenn added a comment. Edited Mar 2 2017, 11:25 AM

> I can increase the space between them if you think that would help.

I don't know what to suggest here (which is why I didn't suggest anything :-)). More space means more scrolling and fewer items that fit on the screen. In an ideal world we would redo this list completely so that you could see the status of the currently running job (if any), the failed jobs as a group, and the successful jobs as a group, all nicely on one screen. But that's work for the Dumps-Rewrite project, not for the current architecture.

>Punt<

Dzahn added a subscriber: Dzahn. Mar 3 2017, 7:00 PM

I also think the new design is alright.

Change 339332 merged by ArielGlenn:
[operations/puppet] dumps: Redesign progress report page

https://gerrit.wikimedia.org/r/339332

That's it for the UI changes. Is anything left on this ticket? Bz2/7z production and the abstracts dumps will be kept for the current architecture, as per comments T155697#2961002 and T155697#2965114.

I think we can close this for now. Maybe we should wait for further input from anyone who thinks these changes are not enough (in the UI or the backend).

Nemo_bis renamed this task from "Small enhancements to current system of dumps" to "HTML/CSS style changes to the dumps.wikimedia.org main pages". Mar 19 2017, 12:32 PM
Nemo_bis closed this task as Resolved.
Nemo_bis edited projects, added Datasets-General-or-Unknown; removed Dumps-Generation.
Nemo_bis updated the task description.
Nemo_bis added a subscriber: Nemo_bis.

Done. Thanks.