New entries in meta_p.wiki are missing a URL
Closed, ResolvedPublic
Assigned To

Authored By

	MaxSem
	Aug 11 2016, 9:37 PM

Description

maxsem@tools-bastion-03:~$ sql meta_p
MariaDB [meta_p]> select dbname from wiki where url is null;
+---------+
| dbname  |
+---------+
| adywiki |
| jamwiki |
+---------+
2 rows in set (0.00 sec)

Related Objects

		Status	Subtype	Assigned	Task
		Declined		jcrespo	T150767 Wikireplica service for tools and labs - issues and missing available views (tracking)
		Resolved		jcrespo	T142759 New entries in meta_p.wiki are missing a URL

Event Timeline

MaxSem created this task. Aug 11 2016, 9:37 PM

Restricted Application added a project: Cloud-Services. Aug 11 2016, 9:37 PM

Restricted Application added a subscriber: Aklapper.

These rows also seem to have the wrong language: both get lang=en, and name IS NULL.

@AlexMonk-WMF does your rewrite of the script take care of these too? If so, I'm tempted to just hand-fix them.

To be honest with you, @yuvipanda, at this stage I don't even know whether that table was generated with the old maintain-replicas.pl or with the maintain-meta_p.py that I wrote last year. I'll test my script later and see if it produces this issue.

It seems my script broke a while ago. Probably when we moved dblist files into the 'dblists' folder in mediawiki-config, or changed InitialiseSettings to use short array syntax. Or maybe it was always broken.

I'm about to upload a new version of my script. It has no fixes related to this bug, because it would already give you this:

krenair@tools-bastion-03:/tmp/krenair-operations-software/maintain-replicas ((368e5f3...))$ python3 maintain-meta_p.py | grep -E "(ady|jam)"
Already up-to-date.
Running insertion_query using {'has_flaggedrevs': 0, 'name': 'Википедие', 'dbname': 'adywiki', 'is_closed': 0, 'size': 1, 'has_wikidata': 1, 'is_sensitive': 0, 'url': 'https://ady.wikipedia.org', 'family': 'wikipedia', 'has_visualeditor': 1, 'slice': 's3.labsdb', 'lang': 'ady'}
Running insertion_query using {'has_flaggedrevs': 0, 'name': 'Wikipidia', 'dbname': 'jamwiki', 'is_closed': 0, 'size': 1, 'has_wikidata': 1, 'is_sensitive': 0, 'url': 'https://jam.wikipedia.org', 'family': 'wikipedia', 'has_visualeditor': 1, 'slice': 's3.labsdb', 'lang': 'jam'}

AlexMonk-WMF removed a project: Toolforge. Aug 12 2016, 1:37 AM

BTW: I ran my script under my own user (with a couple of naming changes) - log into tools, sql meta_p and use u2170__meta_p;

This is still an issue, and it leaves tools like guc unable to format results for these two wikis when iterating over all wikis. url is still null/empty string for adywiki.
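Since the value may be either NULL or an empty string, a check for affected rows probably needs to cover both cases. A minimal sketch, assuming a table with only the dbname and url columns; the in-memory SQLite database and the sample rows are stand-ins for illustration and are not the real meta_p contents:

```python
import sqlite3


def wikis_missing_url(conn):
    """Return dbnames whose url is NULL or an empty string."""
    cur = conn.execute(
        "SELECT dbname FROM wiki WHERE url IS NULL OR url = '' ORDER BY dbname"
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical sample data; SQLite stands in for the MariaDB replica here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki (dbname TEXT PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO wiki (dbname, url) VALUES (?, ?)",
    [
        ("adywiki", None),
        ("jamwiki", ""),
        ("enwiki", "https://en.wikipedia.org"),
    ],
)

print(wikis_missing_url(conn))
```

The same WHERE clause should work unchanged in the sql meta_p shell.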

Krinkle updated the task description. Nov 25 2016, 12:16 AM

Krinkle updated the task description. Nov 25 2016, 12:56 AM

@yuvipanda @Krinkle I have run the following update based on @AlexMonk-WMF's comments:

UPDATE wiki SET has_flaggedrevs = 0, name = 'Википедие', dbname = 'adywiki', is_closed = 0, size = 1, has_wikidata = 1, is_sensitive = 0, url = 'https://ady.wikipedia.org', family = 'wikipedia', has_visualeditor = 1, slice = 's3.labsdb', lang = 'ady' WHERE dbname='adywiki';

UPDATE wiki SET has_flaggedrevs = 0, name = 'Wikipidia', dbname = 'jamwiki', is_closed = 0, size = 1, has_wikidata = 1, is_sensitive = 0, url = 'https://jam.wikipedia.org', family = 'wikipedia', has_visualeditor = 1, slice = 's3.labsdb', lang = 'jam'
WHERE dbname='jamwiki';

The issue, however, has not been corrected: the tool is not updating this database correctly (maybe it lacks permissions?).

I do not think we should guarantee that this service is reliable. People can just query the API, and it will be more reliable than having outdated information here.

jcrespo added a parent task: T150767: Wikireplica service for tools and labs - issues and missing available views (tracking). Nov 25 2016, 4:00 PM

In T142759#2823498, @jcrespo wrote:

The issue, however, has not been corrected: the tool is not updating this database correctly (maybe it lacks permissions?).

As far as I know, it's just not being run.

In T142759#2823498, @jcrespo wrote:

people can just query the API and it will be more reliable than having outdated information here.

Which API? To discover which wiki databases exist and are queryable in labs, tools rely on meta_p. From there, the only thing I have to go on is the dbname, which is not enough to form a MediaWiki API URL.

The three main things we need to bootstrap tools are:

dbname
canonical server or url (to find the API, site config, namespace config, localisation etc.)
db slice (for efficient re-use of connections when iterating over all wikis)
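To illustrate why the url column matters for bootstrapping, here is a minimal sketch of how a tool might derive an API endpoint from a meta_p row. The WikiInfo type and api_endpoint helper are hypothetical names, and the /w/api.php path is an assumption about the wiki's script path rather than something stored in meta_p:

```python
from typing import NamedTuple, Optional


class WikiInfo(NamedTuple):
    """The three fields a tool needs from meta_p.wiki (hypothetical type)."""
    dbname: str
    url: Optional[str]
    slice: str


def api_endpoint(wiki: WikiInfo) -> Optional[str]:
    """Return the MediaWiki API URL, or None when meta_p has no usable url."""
    if not wiki.url:
        # NULL or empty url: the dbname alone is not enough to build a URL.
        return None
    # Assumption: the API lives under /w/api.php on the canonical server.
    return wiki.url.rstrip("/") + "/w/api.php"


wikis = [
    WikiInfo("adywiki", None, "s3.labsdb"),
    WikiInfo("jamwiki", "https://jam.wikipedia.org", "s3.labsdb"),
]
for wiki in wikis:
    print(wiki.dbname, api_endpoint(wiki))
```

A row with a missing url yields None here, which is the case a tool iterating over all wikis has to handle explicitly.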

Which API?

This is a service that we should provide to all users, not only labs users. Having a Wikimedia API with just one call, "which wikis do you host?", integrated into MediaWiki will help not only labs users but all users. Do not worry, I do not intend to delete this, but there are microservices here and there that are basically unmaintained, probably undocumented, and not puppetized or version controlled. If we fix this, I would prefer a proper public API (which would be literally one cached PHP file or a static file integrated into scap).

db slice

If you want to be efficient, query all wikis on the same server. They are all there, and at this point they will always be there because many tools rely on that evil, horrible functionality.

chasemp subscribed. Dec 9 2016, 1:18 PM

In T142759#2824239, @jcrespo wrote:

db slice

If you want to be efficient, query all wikis on the same server

I'm aware of this implementation detail, but so far I believed that, aside from natural load balancing, it also serves for better query performance when consistently using the same (set of) slave(s) for a given wiki? Since "query all wiki"-type tools tend to repeatedly do 800+ queries (as part of a web request), I'd like to get all the performance I can get. I haven't tried, but I imagine it might outweigh the cost of setting up a handful of connections (up to 7).
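The connection reuse described here could be sketched as a small cache keyed by slice, so that iterating over all wikis opens at most one connection per slice. The SliceConnections class and the fake_connect factory are hypothetical; a real tool would pass in its own database connect function:

```python
from typing import Callable, Dict


class SliceConnections:
    """Open at most one connection per db slice and reuse it afterwards."""

    def __init__(self, connect: Callable[[str], object]):
        self._connect = connect
        self._open: Dict[str, object] = {}

    def get(self, slice_name: str) -> object:
        # Connect lazily the first time a slice is seen, then reuse.
        if slice_name not in self._open:
            self._open[slice_name] = self._connect(slice_name)
        return self._open[slice_name]

    def __len__(self) -> int:
        return len(self._open)


opened = []


def fake_connect(slice_name: str) -> object:
    """Stand-in for a real connect call; records which slices were opened."""
    opened.append(slice_name)
    return object()


pool = SliceConnections(fake_connect)
# Illustrative (dbname, slice) pairs, as a tool would read them from meta_p.wiki.
for dbname, slice_name in [
    ("adywiki", "s3.labsdb"),
    ("jamwiki", "s3.labsdb"),
    ("enwiki", "s1.labsdb"),
]:
    pool.get(slice_name)

print(opened)
```

With this shape, the number of connections is bounded by the number of distinct slices rather than by the number of wikis queried.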

Krinkle mentioned this in T153987: [Regression] Many url values have a malformed subdomain in wiki meta_p database table. Dec 22 2016, 11:36 PM

Base subscribed. Feb 19 2017, 2:08 PM

./sql.py -h labsdb1001.eqiad.wmnet meta_p -e "select dbname from wiki where url is null" --no-dry-run

Results for labsdb1001.eqiad.wmnet:3306/meta_p:
0 rows in set (0.00 sec)

I am going to close this as resolved, as the initial request was fulfilled. We should, however, revisit the meta_p script and maybe implement this in production, rather than as an ill-maintained table on labs only.

Phabricator_maintenance removed a subscriber: yuvipanda. Jun 7 2017, 6:40 PM
