
[Scraper] Implement CSV endpoint to get relevant data for reporting
Closed, Resolved, Public

Description

Maybe one that gets a GQL request via HTTP GET and flattens the result? This way we could have a URL to download CSVs every month. We also need auth. Could we add the auth token to the URL?
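A minimal sketch of the monthly download flow this describes: an HTTP GET against the CSV URL, with the token sent as an Authorization header rather than embedded in the URL (query strings tend to leak into server logs and browser history). The endpoint URL, token, and `build_request` helper are hypothetical placeholders, not the actual implementation.

```python
import urllib.request

def build_request(csv_url: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the CSV endpoint."""
    # Token goes in a header, not the URL, so it stays out of access logs
    return urllib.request.Request(
        csv_url, headers={"Authorization": f"Bearer {token}"}
    )

# Hypothetical usage: download this month's report
req = build_request("https://example.org/csv/metrics", "secret-token")
# with urllib.request.urlopen(req) as resp:
#     open("metrics.csv", "wb").write(resp.read())
```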

Event Timeline

Implemented the most basic version:

  • GET endpoint at /csv/metrics; requires the auth header
  • Runs a pre-configured SQL query, dumps the result into a CSV, and returns the file

One row per wiki, with the following columns:

  • wikibase_id
  • wikibase_type
  • base_url

From the latest successful Quantity observation:

  • quantity_observation_date
  • total_items
  • total_lexemes
  • total_properties
  • total_triples
  • total_ei_properties
  • total_ei_statements
  • total_url_properties
  • total_url_statements

From the latest successful Recent Changes observation:

  • recent_changes_observation_date
  • first_change_date
  • last_change_date
  • human_change_count
  • human_change_user_count
  • bot_change_count
  • bot_change_user_count

From the latest successful Software Version observation:

  • software_version_observation_date
  • software_name -- ONLY MediaWiki
  • version -- MediaWiki version
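A stdlib-only sketch of what the handler described above does: verify the auth header, run the pre-configured SQL query, and flatten the rows into CSV text with a header row. The token, table schema, and query are hypothetical stand-ins, not the real ones.

```python
import csv
import io
import sqlite3

API_TOKEN = "secret-token"  # hypothetical; the real token lives in config

# Stand-in for the pre-configured reporting query (one row per wiki)
METRICS_QUERY = "SELECT wikibase_id, wikibase_type, base_url FROM wikibases"

def metrics_csv(auth_header: str, db: sqlite3.Connection) -> str:
    """Return the metrics CSV, or raise PermissionError on a bad header."""
    if auth_header != f"Bearer {API_TOKEN}":
        raise PermissionError("missing or invalid auth header")
    cursor = db.execute(METRICS_QUERY)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)  # data rows, one per wiki
    return buf.getvalue()

# In-memory stand-in database so the sketch runs on its own
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE wikibases (wikibase_id, wikibase_type, base_url)")
db.execute("INSERT INTO wikibases VALUES ('Q1', 'cloud', 'https://example.org')")

print(metrics_csv("Bearer secret-token", db))
```

In a real web framework the same logic would return the buffer with a `text/csv` content type and a `Content-Disposition: attachment` header so browsers download it as a file.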

@RickiJay-WMDE: This open task was moved to "Done" on an archived workboard and has no other non-archived project tags. If this task is done, please update the task status to "Resolved" so this task does not show up on workboard or in search results as unresolved. Thanks.

@RickiJay-WMDE: No reply, thus boldly resolving. Please set the status accordingly so such tasks do not show up in the search results for open, unresolved tasks - thanks.