Paste P5201

Bash script to check for local overrides of "MediaWiki:Publishpage"

Authored by Aklapper on Apr 4 2017, 2:22 PM.
#!/bin/bash
# Local bash script to check if "NAMESPACE:PAGENAME" has ever been changed on any of the WMF sites.
# Requires jq: https://stedolan.github.io/jq/ (prepackaged in Fedora/Ubuntu)
# This code is licensed under CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/legalcode
# You must set these two variables accordingly (default: "MediaWiki:Publishpage"):
NAMESPACE="MediaWiki"
PAGENAME="Publishpage"
#### As a script user, you can ignore everything else below this line. ####
#Temporary files (too lazy to pass wget to jq via stdin):
TMPFILE_RAWSITEMATRIX=tmp_wmf_rawsitematrix.json
TMPFILE_DOMAINSLIST=tmp_wmf_domainmatrix.list
TMPFILE_SINGLESITE=tmp_wmf_output.json
# Output is appended to these files, so remove any leftovers from an earlier
# interrupted run to avoid duplicated entries:
rm -f -- "$TMPFILE_RAWSITEMATRIX" "$TMPFILE_DOMAINSLIST" "$TMPFILE_SINGLESITE"
##### BEGIN: Create file TMPFILE_DOMAINSLIST that lists the domains of all our sites
# Get domain info for all sites from https://en.wikipedia.org/w/api.php?action=sitematrix&smsiteprop=url and dump into TMPFILE_RAWSITEMATRIX
# Docs: https://en.wikipedia.org/w/api.php?action=help&modules=sitematrix
wget -q "https://en.wikipedia.org/w/api.php?action=sitematrix&smsiteprop=url&format=json" -O "$TMPFILE_RAWSITEMATRIX"
# Dump all key names (such as "47" or "specials") in our sitematrix into tmpkeyarray
mapfile -t tmpkeyarray < <(jq '.sitematrix | keys[]' "$TMPFILE_RAWSITEMATRIX")
for i in "${tmpkeyarray[@]}"
do
    # Check only items like "123" (which are actually languages). Handle the "specials" key later as it has a different format
    if [[ $i =~ ^\"[0-9]+\"$ ]]; then
        # Output all domains and append them to another file ($i still carries its JSON quotes, so pass it as a JSON string)
        jq -r --argjson key "$i" '.sitematrix[$key].site[].url' "$TMPFILE_RAWSITEMATRIX" >> "$TMPFILE_DOMAINSLIST"
    fi
done
# Now also add the non-language sites listed under the "specials" key.
# "specials" is a plain array, so let jq iterate over all of it, starting at index 0:
jq -r '.sitematrix.specials[].url' "$TMPFILE_RAWSITEMATRIX" >> "$TMPFILE_DOMAINSLIST"
rm "$TMPFILE_RAWSITEMATRIX"
##### END: Create file TMPFILE_DOMAINSLIST which is a complete list of all our domains
# Dump all the domains from the file into an array:
mapfile -t sitesArray < "$TMPFILE_DOMAINSLIST"
# Now query each site for that specific string:
for j in "${sitesArray[@]}"; do
    # Docs: https://www.mediawiki.org/wiki/API:Revisions
    # Get a single revision of the page that we're checking (rvdir=newer lists the oldest first):
    wget -q "$j/w/api.php?action=query&rvdir=newer&rvlimit=1&prop=revisions&titles=$NAMESPACE:$PAGENAME&format=json" -O "$TMPFILE_SINGLESITE"
    # jq prints nothing here if the page has no revisions or the download failed:
    revid=$(jq -r '.query.pages[].revisions[0].revid // empty' "$TMPFILE_SINGLESITE" 2>/dev/null)
    # If a revision (well, an ID for a revision) exists there has been a local override at some point. Hence print output:
    if [[ -n $revid ]]; then
        echo "$j/wiki/$NAMESPACE:$PAGENAME CHANGED"
    # else
    #     echo "$j not changed"
    fi
    sleep 2 # wait for 2 seconds as a courtesy to servers
done
# clean up:
rm "$TMPFILE_SINGLESITE"
rm "$TMPFILE_DOMAINSLIST"
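
The temporary files for the site matrix are only needed because wget is not piped into jq. A minimal sketch of the stdin variant for building the domain list, assuming the sitematrix layout the script relies on (language entries that are objects with a "site" array, plus a "specials" array); the function name list_domains is hypothetical and not part of the paste:

```shell
# Hypothetical helper: read sitematrix JSON on stdin, print one site URL per line.
# Language entries are objects with a "site" array; "specials" is a plain array;
# the numeric "count" entry is skipped by the type check.
list_domains() {
    jq -r '.sitematrix
           | (to_entries[] | select(.value | type == "object") | .value.site[].url),
             (.specials[].url)'
}

# Usage (needs network access):
# wget -q -O - "https://en.wikipedia.org/w/api.php?action=sitematrix&smsiteprop=url&format=json" | list_domains
```

Piping this way avoids the raw-sitematrix file entirely; the output could still be redirected into a list file or read straight into an array with mapfile.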

Event Timeline


If you have cluster access, I think this query does the same thing:

curl search.svc.eqiad.wmnet:9200/_all/_search -d '{"query":{"bool":{"filter":[{"match": {"namespace": 8}}], "must":[{"match":{"title.keyword":"Publishpage"}}]}}}' | jq '.hits.hits | map(._source.wiki)'

The result is:

[
  "jawiki",
  "ltwiki",
  "bewiki",
  "barwiki",
  "dewikibooks",
  "nvwiki"
]
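
To check a different message, the query body can be parameterized. A minimal sketch, assuming the same fields (namespace, title.keyword) as in the command above; the function name build_override_query is hypothetical, and namespace 8 is the MediaWiki namespace:

```shell
# Hypothetical helper: print the search body for a namespace ID ($1) and a page title ($2).
# The title is inserted verbatim, so it must not contain double quotes or backslashes.
build_override_query() {
    printf '{"query":{"bool":{"filter":[{"match":{"namespace":%d}}],"must":[{"match":{"title.keyword":"%s"}}]}}}\n' "$1" "$2"
}

# Usage (needs cluster access, as noted above):
# curl search.svc.eqiad.wmnet:9200/_all/_search -d "$(build_override_query 8 Publishpage)" | jq '.hits.hits | map(._source.wiki)'
```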

Thanks Eric! I've been thinking about that and also about using Quarry, but being lazy and not a dev, this was my poor man's attempt (which I knew could be done in far fewer lines). :)

@Aklapper with the newest version, I'm getting these 2 error lines at the start:

n@kotatsu ~/wikimedia $ ./mediawikichangesALL.sh
cat: test.json: No such file or directory
./mediawikichangesALL.sh: line 48: ((: i<: syntax error: operand expected (error token is "<")
https://ar.wikipedia.org CHANGED

(and then results start appearing as expected)
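
The second message is what bash prints when a variable inside (( )) expands to nothing, which would fit the cat failure on the line before it (no JSON file, so presumably no length). A minimal sketch of a guard, assuming the loop bound comes from a command substitution as in the version that produced this error; the helper name is_uint is hypothetical:

```shell
# Hypothetical helper: succeed only if $1 is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

specialslength=""   # what an empty command substitution leaves behind
if is_uint "$specialslength"; then
    echo "looping over $specialslength entries"
else
    echo "sitematrix download failed or is empty; skipping the specials loop" >&2
fi
```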


@EBernhardson yup, this is to solve the problem for us non-devs. I've got extensive notes and phab links in a sandbox, which I'll move to mw.o once I've had a chance to clean up some of the duplicate tasks and confusing bits. ( https://office.wikimedia.org/wiki/User:Quiddity_(WMF)/Localmediawikistringchanges )

(For the record, I've moved this to https://gitlab.com/aklapper/wikimedia-wikis-local-string-overrides as code should be in a code repo and not in a Pastebin.)