In T295775, we wrote an Elixir command-line script, [[ https://gitlab.com/wmde/search-mapframe-insource/ | search-mapframe-insource ]], which searches all wikis using the [[ https://www.mediawiki.org/wiki/Help:CirrusSearch#Insource | insource ]] search keyword in order to count mapframe usages. We want to add two more columns to the per-wiki statistics emitted by [[ https://gitlab.com/wmde/search-mapframe-insource/-/blob/main/lib/mapframe_search.ex | MapframeSearchInsource ]] (`search_insource --maps`):
* How many pages include a mapframe with a "geopoint" external data source that uses "ids".
* How many pages include a mapframe with a "geopoint" external data source that uses a "query".
Implementation steps:
* [x] Write an insource regex for geopoint maps with ids. `/[^|]geopoint[^|]*ids/` [[ https://de.wikipedia.org/w/index.php?search=insource%3Amapframe+insource%3A%2F%5B%5E%7C%5Dgeopoint%5B%5E%7C%5D%2Aids%2F&title=Spezial:Suche&profile=advanced&fulltext=1&ns0=1 | example in dewiki ]]
* [x] Write an insource regex for geopoint maps which make a query. `/[^|]geopoint[^|]*query/` [[ https://de.wikipedia.org/w/index.php?search=insource%3Amapframe+insource%3A%2F%5B%5E%7C%5Dgeopoint%5B%5E%7C%5D%2Aquery%2F&title=Spezial:Suche&profile=advanced&fulltext=1&ns0=1 | example in dewiki ]]
* [x] Integrate these regexes into the search-insource script as a new function.
* [x] Document the new metrics in our [[ https://docs.google.com/spreadsheets/d/1L_9YsDbKkbUJMJeJcHOeBtHs8mrttmdOAHtKewsmP5Q/edit#gid=0 | internal analytics catalog ]].
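To make the two regexes above easier to review, here is a rough Python sketch of how they behave on sample wikitext. It is not taken from the Elixir tool: the sample strings and the `classify` helper are made up for illustration, and Python's `re` is only an approximation of the regex engine CirrusSearch uses for `insource:/…/`.

```python
import re

# The two patterns from the checklist, without the surrounding slashes.
GEOPOINT_IDS = re.compile(r"[^|]geopoint[^|]*ids")
GEOPOINT_QUERY = re.compile(r"[^|]geopoint[^|]*query")

# Hypothetical wikitext snippets, invented for this illustration.
SAMPLES = {
    "ids_map": (
        '<mapframe width="300" height="200">'
        '{"type": "ExternalData", "service": "geopoint", "ids": "Q64"}'
        "</mapframe>"
    ),
    "query_map": (
        '<mapframe width="300" height="200">'
        '{"type": "ExternalData", "service": "geopoint", '
        '"query": "SELECT ?q WHERE { ?q wdt:P31 wd:Q44782 }"}'
        "</mapframe>"
    ),
    # A pipe sits between "geopoint" and "ids", so [^|]* cannot bridge them.
    "template_params": "{{Maplink|type=geopoint|ids=Q64}}",
}


def classify(wikitext):
    """Report which of the two patterns match anywhere in the wikitext."""
    return {
        "ids": bool(GEOPOINT_IDS.search(wikitext)),
        "query": bool(GEOPOINT_QUERY.search(wikitext)),
    }


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(name, classify(text))
```

In this sketch the `[^|]*` part stops the match at the first pipe, so a `geopoint` value and an `ids` parameter separated by `|` are not counted together.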
Note that work has already been merged which generalizes this tool to accept arbitrary search terms. Feel free to run the queries as one-off script runs, or to integrate them as a specialized module.
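For the one-off route, a minimal Python sketch of the request such a run might make against a single wiki is shown below. It only builds the URL and does no network access. The `build_count_url` helper is hypothetical and not part of the tool; the parameters are the standard MediaWiki `list=search` ones, and the search string mirrors the dewiki example links (`insource:mapframe` plus the regex, main namespace only).

```python
from urllib.parse import urlencode


def build_count_url(api_base, regex):
    """Build a MediaWiki search API URL whose response reports the hit count.

    api_base: a wiki's api.php endpoint, e.g. https://de.wikipedia.org/w/api.php
    regex:    the insource regex without the surrounding slashes
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"insource:mapframe insource:/{regex}/",
        "srnamespace": "0",      # main namespace, as in the example links
        "srinfo": "totalhits",   # ask for the total number of matches
        "srlimit": "1",          # the hit list itself is not needed
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_count_url("https://de.wikipedia.org/w/api.php",
                          "[^|]geopoint[^|]*ids"))
```

The count would then be read from `query.searchinfo.totalhits` in the JSON response. Looping over all wikis and writing the per-wiki columns is left to the existing tool.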
Nice to have:
* [x] Rename the project "search-all-wikis" and update documentation, to reflect that the tool has been generalized. --> https://gitlab.com/wmde/search-all-wikis
Review
https://gitlab.com/wmde/search-mapframe-insource/-/merge_requests/5