Context:
Wikibase.world is the community-run, cloud-hosted instance that serves as a directory of Wikibases. When I first manually collated our table of known instances, I ran its query "Only Wikibases that are currently online". I am not convinced this list is accurate, and I would like to check that there aren't any self-hosted (i.e. non-cloud) instances within Wikibase.world that are live but not returned by this query.
Goal:
Catch any instances missing from my first round of manual querying.
Acceptance Criteria:
- Import all missing self-hosted Wikibases from Wikibase.world into our metrics DB
- Mark which instances are no longer online
- Determine the API URLs of live instances and update the metrics DB
- Filter out key exceptions such as Wikidata, Wikimedia Commons, etc. (either by marking them within the database or by never importing them, whichever is simpler to build around)
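
A minimal sketch of the "never import them" option for the exception filter, assuming each candidate is identified by its URL. The `EXCLUDED_HOSTS` set and both function names are hypothetical; the host list would need to be reviewed before use.

```python
from urllib.parse import urlparse

# Assumption: hosts we treat as "key exceptions". Extend as needed.
EXCLUDED_HOSTS = {
    "wikidata.org",
    "www.wikidata.org",
    "commons.wikimedia.org",
}


def hostname(url: str) -> str:
    """Return the lowercased hostname of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "https://" + url
    return (urlparse(url).hostname or "").lower()


def is_key_exception(url: str) -> bool:
    """True if the URL belongs to an instance we never want to import."""
    return hostname(url) in EXCLUDED_HOSTS
```

Matching on hostname rather than on the full URL keeps the check stable if the directory lists a wiki page or an entity URL instead of the site root.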
Possible supporting mechanisms (optional):
Note: the following is not necessarily in the correct order of operations.
- Create a mechanism that uses Cloud's API or raw data (found in the Cloud metrics Google Sheet) to filter out any known cloud instances
- Create a way to automatically record whether an instance is considered 'live' (e.g. its API responds to a ping). This doesn't need to be perfect for the MVP
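
The two supporting mechanisms above could be sketched roughly as follows. This is an illustration under stated assumptions, not a settled design: `known_cloud_hosts` stands in for whatever list we extract from Cloud's API or the sheet, the `.wikibase.cloud` suffix check will miss cloud instances on custom domains, and `CANDIDATE_API_PATHS` is a guess at common MediaWiki script paths. The liveness probe uses the standard MediaWiki Action API `siteinfo` query; all function names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional, Set
from urllib.parse import urlparse

# Assumption: common locations of api.php; an instance may use a different path.
CANDIDATE_API_PATHS = ("/w/api.php", "/api.php")
SITEINFO_QUERY = "?action=query&meta=siteinfo&format=json"


def is_cloud_instance(url: str, known_cloud_hosts: Set[str]) -> bool:
    """True if the host is in our known cloud list or uses the cloud domain suffix."""
    if "://" not in url:
        url = "https://" + url
    host = (urlparse(url).hostname or "").lower()
    return host in known_cloud_hosts or host.endswith(".wikibase.cloud")


def http_get(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the response body as text."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikibase-metrics-check/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def find_live_api(
    base_url: str, fetch: Callable[[str], str] = http_get
) -> Optional[str]:
    """Return the first candidate API URL that answers a siteinfo query, else None."""
    base = base_url.rstrip("/")
    for path in CANDIDATE_API_PATHS:
        api_url = base + path
        try:
            data = json.loads(fetch(api_url + SITEINFO_QUERY))
        except Exception:
            # Any network or parse failure counts as "not live" for this candidate.
            continue
        query = data.get("query") if isinstance(data, dict) else None
        if isinstance(query, dict) and "general" in query:
            return api_url
    return None
```

A returned URL can be stored as the instance's API URL in the metrics DB, and `None` recorded as "not live". Passing `fetch` as a parameter lets the check be exercised without network access.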