We would like to have some tracking of Toolhub usage. By the magic of the Wikimedia CDN, the webrequest Hadoop table already contains information on Toolhub traffic.
$ ssh stat1007
hive (wmf)> select count(distinct ip) as hits
          > from webrequest
          > where year = 2021 and month = 9 and day = 29
          >   and uri_host = "toolhub.wikimedia.org";
... lots of hive progress reporting ...
MapReduce Total cumulative CPU time: 1 days 5 hours 57 minutes 45 seconds 740 msec
Ended Job = job_1632476005296_21353
MapReduce Jobs Launched:
Stage-Stage-1: Map: 5206  Reduce: 1  Cumulative CPU: 107865.74 sec  HDFS Read: 176232805069  HDFS Write: 102  SUCCESS
Total MapReduce CPU Time Spent: 1 days 5 hours 57 minutes 45 seconds 740 msec
OK
hits
14
Time taken: 106.498 seconds, Fetched: 1 row(s)
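If we end up doing this regularly, a per-day rollup would be more useful than a single-day count. A sketch of what that could look like, assuming the standard wmf.webrequest schema (in particular that agent_type is populated and 'user' excludes self-identified crawlers):

-- Sketch: daily distinct client IPs and request counts for Toolhub.
-- Assumes wmf.webrequest fields: year/month/day partitions, uri_host, ip, agent_type.
SELECT year, month, day,
       COUNT(DISTINCT ip) AS unique_ips,
       COUNT(1) AS requests
FROM wmf.webrequest
WHERE year = 2021 AND month = 9
  AND uri_host = 'toolhub.wikimedia.org'
  AND agent_type = 'user'   -- drop traffic tagged as spider/bot
GROUP BY year, month, day
ORDER BY year, month, day;

Caveat: distinct IPs are only a rough proxy for users; NAT undercounts and mobile IP churn overcounts.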
Figure out what we want to mine from this data and whether it is worth setting up a dashboard somewhere.