maxsem@tools-bastion-03:~$ sql meta_p
MariaDB [meta_p]> select dbname from wiki where url is null;
+---------+
| dbname  |
+---------+
| adywiki |
| jamwiki |
+---------+
2 rows in set (0.00 sec)
See also:
Status | Subtype | Assigned | Task
---|---|---|---
Declined | | jcrespo | T150767 Wikireplica service for tools and labs - issues and missing available views (tracking)
Resolved | | jcrespo | T142759 New entries in meta_p.wiki are missing a URL
@AlexMonk-WMF does your rewrite of the script take care of these too? If so, I'm tempted to just hand-fix them.
To be honest with you, @yuvipanda, at this stage I don't even know whether that table was generated with the old maintain-replicas.pl or with the maintain-meta_p.py that I wrote last year. I'll test my script later and see if it reproduces this issue.
It seems my script broke a while ago. Probably when we moved dblist files into the 'dblists' folder in mediawiki-config, or changed InitialiseSettings to use short array syntax. Or maybe it was always broken.
I'm about to upload a new version of mine, now with related fixes for your bug, as it'd give you this:
krenair@tools-bastion-03:/tmp/krenair-operations-software/maintain-replicas ((368e5f3...))$ python3 maintain-meta_p.py | grep -E "(ady|jam)"
Already up-to-date.
Running insertion_query using {'has_flaggedrevs': 0, 'name': 'Википедие', 'dbname': 'adywiki', 'is_closed': 0, 'size': 1, 'has_wikidata': 1, 'is_sensitive': 0, 'url': 'https://ady.wikipedia.org', 'family': 'wikipedia', 'has_visualeditor': 1, 'slice': 's3.labsdb', 'lang': 'ady'}
Running insertion_query using {'has_flaggedrevs': 0, 'name': 'Wikipidia', 'dbname': 'jamwiki', 'is_closed': 0, 'size': 1, 'has_wikidata': 1, 'is_sensitive': 0, 'url': 'https://jam.wikipedia.org', 'family': 'wikipedia', 'has_visualeditor': 1, 'slice': 's3.labsdb', 'lang': 'jam'}
BTW: I ran my script under my own user (with a couple of naming changes). To see the result, log into tools, run sql meta_p, then use u2170__meta_p;
This is still an issue. It causes tools like guc to be unable to format results for these two wikis when iterating over all wikis. url is still NULL (or an empty string) for adywiki.
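The failure mode here is that a tool iterating over every meta_p.wiki row assumes url is populated. A minimal sketch of the kind of defensive formatting a tool could use in the meantime (format_wiki_link is a hypothetical helper, not guc's actual code):

```python
def format_wiki_link(row):
    """Render a link for one meta_p.wiki row (a dict of column -> value).
    Rows with a NULL or empty url, as adywiki and jamwiki had, would
    otherwise break naive string formatting when iterating all wikis."""
    url = row.get("url")
    if not url:  # NULL comes back as None; an empty string has also been seen
        return row["dbname"]  # degrade gracefully to the bare dbname
    return f'<a href="{url}">{row["dbname"]}</a>'
```

This only papers over the symptom; the fix is still to get correct url values into meta_p.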
@yuvipanda @Krinkle I have run the following update based on @AlexMonk-WMF's comments:
UPDATE wiki
SET has_flaggedrevs = 0, name = 'Википедие', dbname = 'adywiki', is_closed = 0, size = 1,
    has_wikidata = 1, is_sensitive = 0, url = 'https://ady.wikipedia.org', family = 'wikipedia',
    has_visualeditor = 1, slice = 's3.labsdb', lang = 'ady'
WHERE dbname = 'adywiki';

UPDATE wiki
SET has_flaggedrevs = 0, name = 'Wikipidia', dbname = 'jamwiki', is_closed = 0, size = 1,
    has_wikidata = 1, is_sensitive = 0, url = 'https://jam.wikipedia.org', family = 'wikipedia',
    has_visualeditor = 1, slice = 's3.labsdb', lang = 'jam'
WHERE dbname = 'jamwiki';
The underlying issue, however, has not been corrected: the maintenance script is not updating this database correctly (maybe it lacks permissions?).
I do not think we should guarantee that this service is reliable: people can just query the API, and that will be more reliable than having outdated information here.
Which API? For discovering which wiki databases exist and are queryable in labs, tools rely on meta_p. From there, the only thing I have to go on is the dbname, which is not enough to form a MediaWiki API URL.
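This is why the url column matters: once a tool has it, the API endpoint follows mechanically. A sketch, assuming the standard Wikimedia script path /w/api.php (a convention that holds for production Wikimedia wikis, not a general guarantee; api_endpoint is a hypothetical helper):

```python
def api_endpoint(meta_p_url):
    """Derive a MediaWiki API endpoint from meta_p.wiki.url.
    Assumes the wiki uses the /w/api.php script path, as Wikimedia
    production wikis do; this cannot be derived from dbname alone."""
    return meta_p_url.rstrip("/") + "/w/api.php"
```

With url NULL for adywiki and jamwiki, there is nothing to feed into this step, which is the gap being described.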
The main three things we need to bootstrap tools are:
> Which API?
This is a service that we should provide to all users, not only labs users. Having a Wikimedia API with just one call, "which wikis do you host?", integrated into MediaWiki would help not only labs users but all users. Do not worry, I do not intend to delete this, but there are microservices here and there that are basically unmaintained, probably undocumented, and not puppetized/version-controlled. If we fix this, I would prefer a proper public API (which could be literally one cached PHP file, or a static file integrated into scap).
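The "static file" variant could be as small as regenerating a JSON payload from a dblist on each deploy. A minimal sketch, assuming the usual dblist shape of one dbname per line with optional '#' comments (dblist_to_json is a hypothetical name, not an existing tool):

```python
import json

def dblist_to_json(dblist_text):
    """Turn the contents of a *.dblist file (one dbname per line,
    '#' comment lines allowed) into the JSON payload a hypothetical
    'which wikis do you host?' endpoint could serve statically."""
    dbs = [line.strip() for line in dblist_text.splitlines()
           if line.strip() and not line.lstrip().startswith("#")]
    return json.dumps({"wikis": dbs})
```

Serving such a file from production would give all users, not just labs, an authoritative wiki list.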
> db slice
If you want to be efficient, query all wikis on the same server; they are there, and at this point they will always be there, because many tools rely on that evil, horrible functionality.
I'm aware of this implementation detail, but so far I believed that, aside from natural load balancing, it also gives better query performance when consistently using the same (set of) slave(s) for a given wiki. Since "query all wikis" tools tend to do 800+ queries repeatedly (as part of a web request), I'd like all the performance I can get. I haven't tried it, but I imagine it might outweigh the cost of setting up a handful of connections (up to 7).
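The batching idea above can be sketched as grouping meta_p rows by their slice column, so an all-wikis tool opens one connection per slice (a handful) rather than one per wiki (800+). plan_connections is a hypothetical helper; the rows are dicts shaped like meta_p.wiki rows:

```python
from collections import defaultdict

def plan_connections(rows):
    """Group wiki dbnames by their meta_p slice (e.g. 's3.labsdb') so a
    tool can open one connection per slice and run all of that slice's
    per-wiki queries over it. Illustrative sketch, not a real client."""
    by_slice = defaultdict(list)
    for row in rows:
        by_slice[row["slice"]].append(row["dbname"])
    return dict(by_slice)
```

Each key then maps to one database connection, bounding the connection count by the number of slices.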
./sql.py -h labsdb1001.eqiad.wmnet meta_p -e "select dbname from wiki where url is null" --no-dry-run
Results for labsdb1001.eqiad.wmnet:3306/meta_p:
0 rows in set (0.00 sec)
I am going to close this as resolved, as the initial request was fulfilled. We should, however, revisit the meta_p script and maybe implement this in production, rather than as an ill-maintained table on labs only.