NEW BUG REPORT Wikipedia clickstream datasets link on Dumps "Other" page should point to HTML readme
Closed, Resolved | Public | Bug Report

Description

Data Platform Engineering Bug Report or Data Problem Form.

Please fill out the following:
Please ensure you set priority

What kind of problem are you reporting?

  • Access-related problem
  • Service-related problem
  • Data-related problem
For a data-related problem:
  • Is this a data quality issue?

No, it's a data documentation/findability issue.

  • What datasets and/or dashboards are affected?

https://dumps.wikimedia.org/other/ and Wikipedia Clickstream

  • What are the observed vs expected results? Please include information such as location of data, any initial assessments, sql statements, screenshots.

Observed:
The Wikipedia clickstream datasets link on https://dumps.wikimedia.org/other/ currently links directly to the list of data dumps.

Problem: this makes it difficult, if not impossible, for a user to realize that documentation for that data exists at https://dumps.wikimedia.org/other/clickstream/readme.html.

Proposed fix:
The link on https://dumps.wikimedia.org/other/ should point to https://dumps.wikimedia.org/other/clickstream/readme.html. This is the link already used on other dumps pages, such as https://dumps.wikimedia.org/other/analytics/. (I don't know how to update this HTML and haven't been able to figure out what the process for doing so might be, although I did read https://wikitech.wikimedia.org/wiki/Dumps and related pages.)

For the DE Team to fill out
Which systems does this affect?
  • Hive
  • Druid
  • Superset
  • Turnilo
  • WikiDumps
  • Wikistats
  • Airflow
  • HDFS
  • Gobblin
  • Sqoop
  • Dashiki
  • DataHub
  • Spark
  • Jupyter
  • Modern Event Platform
  • Event Logging
  • Other
Impact Assessment:

Does this problem qualify as an incident?

  • Yes
  • No

Does this violate an SLO?

  • Yes
  • No
Value Calculator | Rank
Will this improve the efficiency of a team's workflow? | 1-3
Does this have an effect on our Core Metrics? | 1-3
Does this align with our strategic goals? | 1-3
Is this a blocker for another team? | 1-3

Event Timeline

TBurmeister created this task.

Change #1013576 had a related patch set uploaded (by Milimetric; author: Dan Andreescu):

[operations/puppet@production] dumps.wikimedia.org/other: point clickstream link to readme

https://gerrit.wikimedia.org/r/1013576

I made the Puppet change, but I need an SRE to merge it. This is indeed not well documented; we should talk about a better way to maintain this interface that so many people use.

Change #1013576 merged by Btullis:

[operations/puppet@production] dumps.wikimedia.org/other: point clickstream link to readme

https://gerrit.wikimedia.org/r/1013576

I've taken the liberty of merging and deploying your patch, @Milimetric. It looked good to me.

@TBurmeister I had a look and it seems good. Would you like to do a final review and confirm that this request has been fulfilled?

The fix looks good! If there's a way that I or other contributors could help improve this interface, I'd be happy to make further changes that make it easier to navigate and discover all this juicy data.