
New entries in meta_p.wiki are missing a URL
Closed, Resolved, Public

Description

maxsem@tools-bastion-03:~$ sql meta_p
MariaDB [meta_p]> select dbname from wiki where url is null;
+---------+
| dbname  |
+---------+
| adywiki |
| jamwiki |
+---------+
2 rows in set (0.00 sec)


Event Timeline

Restricted Application added a subscriber: Aklapper.

Also seems to have the wrong language. They both get lang=en. And name IS NULL.
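For anyone wanting to check all three symptoms at once (missing url, missing name, suspicious lang), here is a minimal sketch. It assumes a DB-API connection to a database with a wiki table that has dbname, lang, name and url columns; the function name find_incomplete_wikis is made up for illustration, and sqlite3 is only a stand-in for the real MariaDB connection. The language check is a rough heuristic (it expects "xxwiki" to have lang "xx") and may flag multilingual projects whose dbname prefix is not a language code.

```python
import sqlite3  # stand-in driver; the real table lives in MariaDB


def find_incomplete_wikis(conn):
    """Return (dbname, problems) pairs for rows that look half-populated.

    `problems` is a list containing any of "url", "name", "lang".
    """
    rows = conn.execute(
        "SELECT dbname, lang, name, url FROM wiki ORDER BY dbname"
    ).fetchall()
    report = []
    for dbname, lang, name, url in rows:
        problems = []
        if not url:  # NULL or empty string
            problems.append("url")
        if not name:
            problems.append("name")
        # Heuristic (assumption): for "<code>wiki" expect lang == "<code>",
        # with underscores in the dbname mapped to hyphens.
        if dbname.endswith("wiki"):
            expected = dbname[: -len("wiki")].replace("_", "-")
            if lang != expected:
                problems.append("lang")
        if problems:
            report.append((dbname, problems))
    return report
```

Running this against the table as described here would be expected to list adywiki and jamwiki, since both have a NULL url, a NULL name and lang=en.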

@AlexMonk-WMF does your rewrite of the script take care of these too? If so, I'm tempted to just hand-fix them.

To be honest with you @yuvipanda at this stage I don't even know if that table has been generated with the old maintain-replicas.pl or my maintain-meta_p.py that I wrote last year. I'll test my script later and see if it produces this issue.

It seems my script broke a while ago. Probably when we moved dblist files into the 'dblists' folder in mediawiki-config, or changed InitialiseSettings to use short array syntax. Or maybe it was always broken.

I'm about to upload a new version of mine. It has no fixes specific to your bug, because it would already give you this:

krenair@tools-bastion-03:/tmp/krenair-operations-software/maintain-replicas ((368e5f3...))$ python3 maintain-meta_p.py | grep -E "(ady|jam)"
Already up-to-date.
Running insertion_query using {'has_flaggedrevs': 0, 'name': 'Википедие', 'dbname': 'adywiki', 'is_closed': 0, 'size': 1, 'has_wikidata': 1, 'is_sensitive': 0, 'url': 'https://ady.wikipedia.org', 'family': 'wikipedia', 'has_visualeditor': 1, 'slice': 's3.labsdb', 'lang': 'ady'}
Running insertion_query using {'has_flaggedrevs': 0, 'name': 'Wikipidia', 'dbname': 'jamwiki', 'is_closed': 0, 'size': 1, 'has_wikidata': 1, 'is_sensitive': 0, 'url': 'https://jam.wikipedia.org', 'family': 'wikipedia', 'has_visualeditor': 1, 'slice': 's3.labsdb', 'lang': 'jam'}

BTW: I ran my script under my own user (with a couple of naming changes) - log into tools, sql meta_p and use u2170__meta_p;

Krinkle triaged this task as High priority. Edited Nov 24 2016, 11:46 PM
Krinkle awarded a token.
Krinkle subscribed.

This is still an issue, causing tools like guc to be unable to format results for these two wikis when iterating over all wikis. url is still null/empty string for adywiki.

jcrespo lowered the priority of this task from High to Low. Nov 25 2016, 3:52 PM
jcrespo subscribed.

@yuvipanda @Krinkle I have run the following update based on @AlexMonk-WMF's comments:

UPDATE wiki SET has_flaggedrevs = 0, name = 'Википедие', dbname = 'adywiki', is_closed = 0, size = 1, has_wikidata = 1, is_sensitive = 0, url = 'https://ady.wikipedia.org', family = 'wikipedia', has_visualeditor = 1, slice = 's3.labsdb', lang = 'ady' WHERE dbname='adywiki';

UPDATE wiki SET has_flaggedrevs = 0, name = 'Wikipidia', dbname = 'jamwiki', is_closed = 0, size = 1, has_wikidata = 1, is_sensitive = 0, url = 'https://jam.wikipedia.org', family = 'wikipedia', has_visualeditor = 1, slice = 's3.labsdb', lang = 'jam'
WHERE dbname='jamwiki';
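The two statements above were written by hand from the dictionaries the script printed. As a sketch of how the same update could be generated instead, the helper below builds a parameterized UPDATE from one such dictionary. The name build_update is hypothetical and is not taken from maintain-meta_p.py; sqlite3 named placeholders are used here only because they are easy to try locally, and placeholder syntax varies by driver. Column names are interpolated into the SQL text, so this assumes they come from the script itself and never from user input. Unlike the hand-written statements, it leaves the key column out of the SET list.

```python
import sqlite3  # stand-in driver; placeholder syntax differs for MariaDB drivers


def build_update(row, key="dbname", table="wiki"):
    """Build a parameterized UPDATE for one row dict, matched on `key`.

    Values are passed as bound parameters; only column names (assumed
    trusted) end up in the SQL text. Columns are sorted for stable output.
    """
    columns = sorted(c for c in row if c != key)
    assignments = ", ".join(f"{c} = :{c}" for c in columns)
    return f"UPDATE {table} SET {assignments} WHERE {key} = :{key}"


if __name__ == "__main__":
    row = {"dbname": "adywiki", "url": "https://ady.wikipedia.org", "lang": "ady"}
    # Print the statement that would be executed with `row` as parameters.
    print(build_update(row))
```

With a DB-API connection that supports named parameters, the result can be run as conn.execute(build_update(row), row).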

The underlying issue, however, has not been corrected: the tool is not updating this database correctly (maybe it lacks permissions?).

I do not think we should guarantee that this service is reliable. People can just query the API, which will be more reliable than having outdated information here.

The underlying issue, however, has not been corrected: the tool is not updating this database correctly (maybe it lacks permissions?).

As far as I know, it's just not being run.

people can just query the API and it will be more reliable than having outdated information here.

Which API? For discovery of wiki databases existing and being queryable in labs, tools rely on meta_p. From there the only thing I have to go on is the dbname - which is not enough to form a MediaWiki API url.

The three main things we need to bootstrap tools are:

  • dbname
  • canonical server or url (to find the API, site config, namespace config, localisation etc.)
  • db slice (for efficient re-use of connections when iterating over all wikis)
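To illustrate how a tool might use those three fields together, here is a rough sketch. It assumes rows have already been fetched from meta_p.wiki as dictionaries with dbname, url and slice keys; group_by_slice and api_url are illustrative names, not part of any existing tool. The /w/api.php suffix is the usual location of the action API on Wikimedia wikis, but treat it as an assumption here.

```python
from collections import defaultdict


def group_by_slice(rows):
    """Group wiki rows by db slice so one connection can serve each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["slice"]].append(row)
    return dict(groups)


def api_url(row):
    """Derive the MediaWiki API endpoint from the canonical url.

    Returns None when url is missing, which is the failure mode
    described in this task.
    """
    if not row.get("url"):
        return None
    # Assumption: the action API is served at /w/api.php.
    return row["url"].rstrip("/") + "/w/api.php"


if __name__ == "__main__":
    rows = [
        {"dbname": "adywiki", "url": None, "slice": "s3.labsdb"},
        {"dbname": "jamwiki", "url": "https://jam.wikipedia.org", "slice": "s3.labsdb"},
    ]
    for slice_name, wikis in group_by_slice(rows).items():
        # One connection per slice would be opened here in a real tool.
        for wiki in wikis:
            print(slice_name, wiki["dbname"], api_url(wiki))
```

A row with a NULL url yields None from api_url, which is why tools iterating over all wikis cannot build API requests for the affected entries.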

Which API?

This is a service that we should provide to all users, not only labs users. Having a Wikimedia API with just one call, "which wikis do you host?", integrated into MediaWiki will help not only labs users, but all users. Do not worry, I do not intend to delete this, but there are microservices here and there that are basically unmaintained, probably undocumented, and not puppetized or version controlled. If we fix this, I would prefer a proper public API (which would be literally one cached PHP file or a static file integrated into scap).

db slice

If you want to be efficient, query all wikis on the same server, they are there, and at this point they will always be there because many tools rely on that evil, horrible functionality.

db slice

If you want to be efficient, query all wikis on the same server

I'm aware of this implementation detail, but so far I believed that, aside from natural load balancing, it also gives better query performance when consistently using the same (set of) slave(s) for a given wiki. Since "query all wikis"-type tools tend to repeatedly do 800+ queries (as part of a web request), I'd like to get all the performance I can. I haven't tried, but I imagine it might outweigh the cost of setting up a handful of connections (up to 7).

jcrespo claimed this task.
./sql.py -h labsdb1001.eqiad.wmnet meta_p -e "select dbname from wiki where url is null" --no-dry-run

Results for labsdb1001.eqiad.wmnet:3306/meta_p:
0 rows in set (0.00 sec)

I am going to close this as resolved, as the initial request was fulfilled. We should, however, revisit the meta_p script and maybe implement this in production, rather than as an ill-maintained table on labs only.