Hello,
I am opening this thread at the suggestion of the Wikimedia Labs IRC channel, where I described the problem I ran into.
I told them:
"I was just running a script I wrote in March-April which queries several Wikipedia language databases, both separately and at the same time using "UNION ALL".
I am surprised because I notice it is much slower than before. What has happened? And could you speed it up somehow by changing some user parameters? Otherwise it is impossible to run the tests..."
In fact, a test that used to run in less than 10 minutes took three hours. I used gnwiki (only 700 articles), which is my small test wiki. At this rate it would be impossible to test medium or large wikis...
My code goes through several wikis for a list of articles, checks whether each article exists in the other language editions, and looks at how editors edit them. I study multilingual behavior.
Some example queries:
'SELECT DISTINCT rev_user_text, count(*) FROM '+lang+'_p.revision WHERE rev_page = (SELECT page_id FROM '+lang+'_p.page WHERE page_title = %s AND page_namespace=0 AND page_is_redirect=0) GROUP BY rev_user_text'
'SELECT rev_user_text, COUNT(*) FROM revision INNER JOIN page ON rev_page=page_id WHERE page_namespace=0 AND page_is_redirect=0 GROUP BY rev_user_text ORDER BY 2 DESC'
Checking number of langlinks...
'SELECT ll_lang, COUNT(*) FROM page INNER JOIN langlinks ON ll_from=page_id WHERE page_id IN (SELECT page_id FROM u3532__.'+originarylang+table+') AND page_is_redirect=0 AND page_namespace=0 GROUP by 1 ORDER BY 2 DESC'
Checking the names of an article in several languages through langlinks...
'SELECT ll_lang,ll_title FROM langlinks WHERE ll_from = %s'
Checking "edit_count" in multiple language editions for a user with the same name.
query = query + 'SELECT "'+language+'",user_editcount FROM '+language+'_p.user WHERE user_name LIKE %s '
if count < len(languagelist): query = query + 'UNION ALL '
query = query + 'ORDER BY user_editcount DESC'
This last one, which does a UNION ALL across 10 wiki databases, is where it struggles most.
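For reference, here is a minimal sketch of how a UNION ALL query of this shape can be assembled. The function names build_editcount_query and build_params and the two-wiki example list are illustrative only, and the real script may differ. It assumes the language codes come from a trusted list, since they are concatenated directly into the SQL, and that the user name is bound through the %s placeholders by a DB-API driver.

```python
def build_editcount_query(languagelist):
    """Build one UNION ALL query over the user table of each wiki database."""
    parts = []
    for language in languagelist:
        # One SELECT per wiki; the wiki name is returned as a literal column.
        parts.append(
            'SELECT "' + language + '", user_editcount '
            'FROM ' + language + '_p.user WHERE user_name LIKE %s'
        )
    # Join the per-wiki SELECTs and sort the combined result.
    return ' UNION ALL '.join(parts) + ' ORDER BY user_editcount DESC'


def build_params(username, languagelist):
    # The same user name is bound once per %s placeholder.
    return (username,) * len(languagelist)


# Hypothetical example with two wikis instead of ten.
languagelist = ['gnwiki', 'cawiki']
query = build_editcount_query(languagelist)
params = build_params('ExampleUser', languagelist)
print(query)
```

The query and the parameter tuple would then be passed together to the cursor, for example cursor.execute(query, params), so that only the user name goes through the driver's escaping.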
What could you do to improve the system, or at least go back to the configuration that was working in March-April?
Thank you very much.
Marc Miquel