
Decommission obsolete Discovery Dashboards
Open, Low | Public | 0 Estimated Story Points

Description

With key Search Platform metrics moving to Superset/Turnilo (T227781), we should evaluate the various Discovery Dashboards to reduce maintenance cost & effort if they are not being actively used to drive decision-making.

Decommission

  • Wikipedia.org Portal
  • Search Metrics (when ready)
  • Wikimedia Maps (when ready)
  • Wikidata Query Service
  • External Referrers

Event Timeline

mpopov created this task.
mpopov moved this task from Triage to Backlog on the Product-Analytics board.
mpopov moved this task from needs triage to Tests & Analysis on the Discovery-Search board.

Okay, we've got 3 dashboards to decide fates for:

WDQS

@Smalyshev: Do you use https://discovery.wmflabs.org/wdqs/ for monitoring usage or decision-making?

Maps

@MSantos: You're the closest person to a https://discovery.wmflabs.org/maps/ stakeholder I could find who recently referred to the tile usage dashboard.

  • Will you need that data in FY19-20?
  • For trends and performance analysis, would it be helpful to have a tile usage breakdown by language+project vs non-Wikimedia usage?
    • Once I show you how to query the web logs to find which external websites and apps use our tile servers, will a dashboard like that be relevant?
  • Do you ever care about tile usage broken down by styles and zoom levels?

External traffic

@JKatzWMF / @kzimmerman: This dashboard made sense before we had Superset/Turnilo, but now it's mostly obsolete. The only information we get on this dashboard that we don't have anywhere else is the breakdown of traffic by search engine (https://discovery.wmflabs.org/external/#traffic_by_engine), since the pageview data cube in Druid lumps all of it under the "external (search engine)" referer class.

But since almost all of the search engine traffic is just Google anyway and individual search engine traffic patterns aren't interesting, do we need this additional granularity?

@mpopov

  • Will you need that data in FY19-20?

Yes, this data is important to us in order to push forward some decision making.

  • For trends and performance analysis, would it be helpful to have a tile usage breakdown by language+project vs non-Wikimedia usage?

Yes, it would. This can help us scale hardware and even restrict some access like we had to do with the Pokemon GO issue a few years ago.

  • Once I show you how to query the web logs to find which external websites and apps use our tile servers, will a dashboard like that be relevant?

Yes

  • Do you ever care about tile usage broken down by styles and zoom levels?

No.

kzimmerman added a subscriber: debt.

@debt I'm moving this to our team's icebox, and think it should be revisited when a PM for Search comes onboard.

mpopov renamed this task from Retire obsolete Discovery Dashboards to Decommission obsolete Discovery Dashboards. Aug 11 2021, 6:44 PM
mpopov updated the task description.
mpopov removed subscribers: debt, JKatzWMF, Smalyshev, MSantos.

The data outage since April (T287381) provided a great opportunity to re-evaluate whether the final two dashboards (External Traffic and WDQS) should be decommissioned at last.

Isaac (Research):

it would make me sad to see it go but i don't know that i can argue for maintenance being prioritized. well at least data is there. it's always sad to see interesting data no longer be there but i also understand that maintaining something like this causes headaches. the search referral dataset i put out at least means there's some public info on search out there.

More on the search referrals by country dataset & dashboard/exploration tool: https://techblog.wikimedia.org/2021/06/07/searching-for-wikipedia/

Mike (Search Platform):

I’m not sure if anybody actively uses this regularly, but I was looking at it just yesterday. I’m planning on linking to it in upcoming WDQS community comms on the status of WDQS (to show the growing number of SPARQL requests). If we decommission this, it’d be helpful to have a public facing data viz or something that shows the same information, so that at the least the community can have visibility into what went on with WDQS

The team has been using this Grafana dashboard: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=now-6M&to=now&refresh=1d

Note: there are two streams in the Event Platform, wdqs-internal.sparql-query and wdqs-external.sparql-query (both using the sparql/query schema), which should make aggregation relatively straightforward once Data Engineering makes Airflow available (T282033). We're waiting for Airflow to avoid the additional tech debt that creating an ETL job with Oozie would incur.
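For illustration only, here is a minimal Python sketch of the kind of daily aggregation described above. It is not the planned Airflow job. It assumes each event is a dict carrying `meta.stream` and an ISO 8601 `meta.dt` timestamp; the function name `daily_query_counts` and the sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime

# The two stream names mentioned in this task.
STREAMS = {"wdqs-internal.sparql-query", "wdqs-external.sparql-query"}


def daily_query_counts(events):
    """Count SPARQL query events per (UTC date, stream).

    Assumption: each event looks like
    {"meta": {"stream": "...", "dt": "2021-08-11T18:44:00Z"}, ...}.
    Events from other streams are ignored.
    """
    counts = Counter()
    for event in events:
        meta = event.get("meta", {})
        stream = meta.get("stream")
        if stream not in STREAMS:
            continue
        # Normalize a trailing "Z" so datetime.fromisoformat accepts it
        # on Python versions older than 3.11.
        timestamp = datetime.fromisoformat(meta["dt"].replace("Z", "+00:00"))
        counts[(timestamp.date().isoformat(), stream)] += 1
    return dict(counts)


# Hypothetical sample events, not real data.
sample_events = [
    {"meta": {"stream": "wdqs-external.sparql-query", "dt": "2021-08-11T18:44:00Z"}},
    {"meta": {"stream": "wdqs-external.sparql-query", "dt": "2021-08-11T23:59:59Z"}},
    {"meta": {"stream": "wdqs-internal.sparql-query", "dt": "2021-08-12T00:00:01Z"}},
    {"meta": {"stream": "some-other-stream", "dt": "2021-08-12T00:00:01Z"}},
]
sample_counts = daily_query_counts(sample_events)
```

A real job would read from the refined event tables rather than from in-memory dicts, but the grouping by date and stream would be the same.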

First step will be to stop the broken ETL job that hasn't yielded any data since April. Second step will be to add a very visible notice to both dashboards that they will be decommissioned (say, Fall 2021) and link to the underlying data that will be available in perpetuity for historical/archival reasons.

Mike has requested that an alternative visualization be made available, so I'm thinking of uploading the WDQS counts to Commons under the Data namespace and opening it up to be visualized on-wiki with the Graph extension.
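As a rough sketch of what such an upload could contain, the Python below builds the JSON body for a tabular data page (a `Data:….tab` page on Commons). The top-level keys (`license`, `description`, `sources`, `schema`, `data`) reflect my understanding of the Commons tabular data format and should be checked against its documentation; the function name `to_tabular_data`, the column names, and the counts are hypothetical.

```python
import json


def to_tabular_data(daily_counts, source_note):
    """Build a dict shaped like a Commons tabular data page.

    daily_counts maps an ISO date string to a query count.
    Assumption: the page layout follows the Commons tabular data
    format as I understand it; verify before uploading.
    """
    rows = [[day, count] for day, count in sorted(daily_counts.items())]
    return {
        "license": "CC0-1.0",
        "description": {"en": "Daily WDQS SPARQL query counts"},
        "sources": source_note,
        "schema": {
            "fields": [
                {"name": "date", "type": "string", "title": {"en": "Date"}},
                {"name": "queries", "type": "number", "title": {"en": "SPARQL queries"}},
            ]
        },
        "data": rows,
    }


# Hypothetical counts, not real WDQS data.
page = to_tabular_data(
    {"2021-08-12": 5, "2021-08-11": 3},
    "Aggregated from WDQS query events",
)
page_json = json.dumps(page, indent=2)
```

Sorting the rows by date keeps the page diff-friendly when new days are appended and gives a graph a ready-made time axis.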

Change 712422 had a related patch set uploaded (by Bearloga; author: Bearloga):

[operations/puppet@production] statistics::discovery: Stop metric calculation

https://gerrit.wikimedia.org/r/712422

Change 712422 merged by Elukey:

[operations/puppet@production] statistics::discovery: Stop metric calculation

https://gerrit.wikimedia.org/r/712422

Mike has requested that an alternative visualization be made available, so I'm thinking of uploading the WDQS counts to Commons under the Data namespace and opening it up to be visualized on-wiki with the Graph extension.

https://www.mediawiki.org/wiki/User:MPopov_(WMF)/Wikimania_2021_Hackathon 😄