When blocking tasks are done, run the LQT conversion script on Project:Support desk.
Support desk script run started at 2015-07-30 23:05:45 UTC:
[2015-07-30 23:05:45] Starting LQT conversion of page Project:Support_desk
A back-of-the-envelope calculation suggests we're about half done. I tried comparing to previous conversions to estimate time per topic, but the resulting time estimate was too low (not sure why).
This is instead based on looking at the progress of completed topics (since they are turned into redirects):
mysql:research@s3-analytics-slave [mediawikiwiki]> SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect;
+----------+
| COUNT(*) |
+----------+
|     3333 |
+----------+
1 row in set (0.51 sec)

mysql:research@s3-analytics-slave [mediawikiwiki]> SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%';
+----------+
| COUNT(*) |
+----------+
|     6835 |
+----------+
1 row in set (0.02 sec)
3333/6835 = 48.8%
TL;DR: it seems to be getting slower. For now we can probably still just leave it, but it might take on the order of 10 days. If it gets even slower, we will probably have to stop it and fix the problem (it is resumable):
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/% |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                                                                                             0.5324 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
It does seem to be slowing down on the topic iteration:
[2015-07-30 22:12:46] Imported 2 revisions for header
[2015-07-30 22:12:47] Importing new topic
--
[2015-07-30 22:12:48] Finished importing post with 1 revisions
[2015-07-30 22:13:25] Importing new topic
--
[2015-07-30 22:13:26] Finished importing post with 1 revisions
[2015-07-30 22:13:31] Importing new topic
--
[2015-07-30 22:13:32] Finished importing post with 1 revisions
[2015-07-30 22:13:37] Importing new topic
--
[2015-07-30 22:13:38] Finished importing post with 1 revisions
[2015-07-30 22:13:49] Importing new topic
--
[2015-07-30 22:13:51] Finished importing post with 1 revisions
[2015-07-30 22:13:58] Importing new topic
--
[2015-07-30 22:13:58] Finished importing post with 1 revisions
[2015-07-30 22:14:02] Importing new topic
--
[2015-07-30 22:14:03] Finished importing post with 1 revisions
[2015-07-30 22:14:07] Importing new topic
--
[2015-07-30 22:14:09] Finished importing post with 1 revisions
[2015-07-30 22:14:15] Importing new topic
--
[2015-07-30 22:14:20] Finished importing post with 2 revisions
[2015-07-30 22:14:28] Importing new topic
--
[2015-08-05 00:26:26] Finished importing post with 1 revisions
[2015-08-05 00:44:02] Importing new topic
--
[2015-08-05 00:44:19] Finished importing post with 1 revisions
[2015-08-05 00:46:07] Importing new topic
--
[2015-08-05 00:46:34] Finished importing post with 1 revisions
[2015-08-05 00:49:18] Importing new topic
--
[2015-08-05 00:50:19] Finished importing post with 1 revisions
[2015-08-05 00:55:46] Importing new topic
--
[2015-08-05 00:56:25] Finished importing post with 1 revisions
[2015-08-05 01:00:01] Importing new topic
--
[2015-08-05 01:01:23] Finished importing post with 1 revisions
[2015-08-05 01:06:41] Importing new topic
--
[2015-08-05 01:10:03] Finished importing post with 1 revisions
[2015-08-05 01:29:08] Importing new topic
--
[2015-08-05 01:30:09] Finished importing post with 1 revisions
[2015-08-05 01:36:03] Importing new topic
--
[2015-08-05 01:41:10] Finished importing post with 3 revisions
[2015-08-05 02:05:49] Importing new topic
--
[2015-08-05 02:06:15] Finished importing post with 1 revisions
[2015-08-05 02:08:51] Importing new topic
Something is slowing down the iteration of topics; at the beginning there were no such gaps, but now they're long.
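For reference, one way to quantify those gaps is to pull the interval between successive "Importing new topic" lines out of the debug log. A rough sketch, assuming GNU date and the log format shown above (the path is the --logfile from the script invocation):

# print seconds elapsed between consecutive topic starts
grep -o '^\[[0-9: -]*\] Importing new topic' /tmp/Project:Support_desk.log \
  | tr -d '[]' \
  | while read d t _; do date -d "$d $t" +%s; done \
  | awk 'NR > 1 { print $1 - prev " s since previous topic" } { prev = $1 }'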
I think the query is roughly:
SELECT thread_id,thread_id,thread_subject,thread_article_namespace,thread_article_title,thread_parent,thread_ancestor,thread_created,thread_modified,thread_author_id,thread_author_name,thread_summary_page,thread_root,thread_type,thread_signature FROM `thread` WHERE thread_article_namespace = '0' AND thread_article_title = 'Test_LQT' AND (thread_id>='674') AND (thread_type != '2') ORDER BY thread_id LIMIT 501
(that's taken from my test wiki debugging, but can be changed to match prod).
Even doing it without the LIMIT doesn't take forever:
SELECT thread_id,thread_id,thread_subject,thread_article_namespace,thread_article_title,thread_parent,thread_ancestor,thread_created,thread_modified,thread_author_id,thread_author_name,thread_summary_page,thread_root,thread_type,thread_signature FROM `thread` WHERE thread_article_namespace = '4' AND thread_article_title = 'Support_desk/LQT_Archive_1' AND (thread_id>='674') AND (thread_type != '2') ORDER BY thread_id;
27456 rows in set (0.42 sec)
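If a bad query plan were the problem, EXPLAIN on the slave should show it. A sketch using the same query shape (host taken from the prompt shown above, credentials elided, and only thread_id selected for brevity); the key and rows columns are what to look at:

mysql -h s3-analytics-slave mediawikiwiki -e \
  "EXPLAIN SELECT thread_id FROM thread
   WHERE thread_article_namespace = 4
     AND thread_article_title = 'Support_desk/LQT_Archive_1'
     AND thread_id >= 674
     AND thread_type != 2
   ORDER BY thread_id LIMIT 501;"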
mysql:research@s3-analytics-slave [mediawikiwiki]> SELECT (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%');
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/% |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                                                                                             0.5568 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.93 sec)
The slowdown may be partly due to writing larger and larger source store logs. Currently at:
-rw-r--r-- 1 www-data www-data 2.0M Aug 5 23:03 Project:Support_desk.log
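If the log writes are contributing, the growth is easy to watch alongside the run. A trivial sketch (same --logfile path as the command line above):

watch -n 300 'ls -lh /tmp/Project:Support_desk.log'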
mysql:research@s3-analytics-slave [mediawikiwiki]> SELECT (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%');
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/% |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                                                                                             0.6119 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.80 sec)
So it turned out to have a memory leak (@hoo noticed it last night). It was about 16 GB when I had to kill it:
www-data 7846 92.4 49.8 17474100 16370300 pts/47 R<+ Jul30 13483:52 php /srv/mediawiki-staging/multiversion/MWScript.php extensions/Flow/maintenance/convertLqtPageOnLocalWiki.php --wiki=mediawikiwiki --debug --srcpage Project:Support_desk --logfile /tmp/Project:Support_desk.log
I should have checked this before (it's almost certainly the cause of the slowdown, via swap usage rather than RAM), but it didn't occur to me since I was thinking more about the possibility of repeated iterations or inefficient database queries.
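For next time, periodically sampling the script's RSS would make this kind of leak obvious early. A minimal sketch, using the PID from the ps output above as a stand-in:

# append a timestamped RSS sample (KiB) every minute until the process exits
while ps -p 7846 > /dev/null; do
  echo "$(date -u '+%F %T') $(ps -o rss= -p 7846)" >> /tmp/convert_rss.log
  sleep 60
done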
Log of the initial run is at http://people.wikimedia.org/~mattflaschen/2015_07_30_LQT_Flow_VE_Feedback_and_Support_desk.log (the VE feedback conversion is also in that log, before this one).
Good news is that we're about two-thirds done:
mysql:research@s3-analytics-slave [mediawikiwiki]> SELECT (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%');
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/% |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                                                                                             0.6748 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.73 sec)
Bad news is that when we resume, it will have to loop through the resume log and detect all the completed topics, though it won't have to do any Parsoid requests or inserts for them.
Good news is that once the memory leak is fixed, the remaining third should go faster.
Change 230723 had a related patch set uploaded (by Mattflaschen):
WIP: Eliminate some memory leaks affecting LQT->Flow:
Change 230723 merged by jenkins-bot:
Eliminate some memory leaks affecting LQT->Flow:
Change 230929 had a related patch set uploaded (by Mattflaschen):
Eliminate some memory leaks affecting LQT->Flow:
Change 230929 merged by jenkins-bot:
Eliminate some memory leaks affecting LQT->Flow:
It's still leaking. I'm wondering whether we have anything set up for memory profiling (xhprof?), and whether I would have to kill the script to enable it.
www-data 32477 53.6 18.3 6536248 6025732 pts/47 R<+ 00:01 85:16 php /srv/mediawiki-staging/multiversion/MWScript.php extensions/Flow/maintenance/convertLqtPageOnLocalWiki.php --wiki=mediawikiwiki --debug --srcpage Project:Support_desk/LQT_Archive_1 --logfile /tmp/Project:Support_desk.log
@ori installed XDebug on terbium for me. It looks like the relevant parameters we would need to set are:
xdebug.trace_format = 1
xdebug.show_mem_delta = 1
xdebug.auto_trace = 1
So the following should work:
php -d "xdebug.trace_format=1" -d "xdebug.show_mem_delta=1" -d "xdebug.auto_trace=1" /srv/mediawiki-staging/multiversion/MWScript.php extensions/Flow/maintenance/convertLqtPageOnLocalWiki.php --wiki=mediawikiwiki --debug --srcpage Project:Support_desk/LQT_Archive_1 --logfile /tmp/Project:Support_desk.log
The output should go to /tmp. Holding off on trying this, since I want to see how far it will get before the memory becomes intolerable. Currently it's at:
www-data 32477 53.2 25.7 8999080 8447788 pts/47 R<+ 00:01 141:34 php /srv/mediawiki-staging/multiversion/MWScript.php extensions/Flow/maintenance/convertLqtPageOnLocalWiki.php --wiki=mediawikiwiki --debug --srcpage Project:Support_desk/LQT_Archive_1 --logfile /tmp/Project:Support_desk.log
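With auto_trace on, Xdebug should write one trace file per invocation; assuming xdebug.trace_output_dir was left at its default of /tmp, the traces ought to show up as:

ls -lh /tmp/trace.*.xt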
mysql:research@s3-analytics-slave [mediawikiwiki]> SELECT CURRENT_TIMESTAMP(), (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%');
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| CURRENT_TIMESTAMP() | (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/% |
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2015-08-12 04:29:53 |                                                                                                                                                                                                                                                           0.7331 |
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.08 sec)
It's definitely moving faster now, but it's not clear how much of that is due to having only done ~5 percentage points since the restart vs. the memory leak improvements. It has done 5.83 (73.31 - 67.48) percentage points since ~00:01 UTC (and that calculation penalizes it for the "check off what I already did" phase). It was 04:29 in the query above, so that's 4 hours, 28 minutes: about 1.31 percentage points an hour.
Whereas before, at one point it was doing less than 5 percentage points in over a day.
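Sanity-checking that rate arithmetic:

echo 'scale=4; (73.31 - 67.48) / (4 + 28/60)' | bc

prints 1.3052, which rounds to the 1.31 percentage points an hour quoted above.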
The RAM reached 10 GB, and I don't really feel comfortable letting it run overnight, so I decided to kill it.
Plus, we still really ought to figure out the memory leak.
So I'll re-run it with the profiling parameters from above tomorrow. Final status for tonight:
0.7494 (fraction converted)
and:
www-data 32477 54.5 31.1 10812664 10223660 pts/47 S<+ 00:01 189:59 php /srv/mediawiki-staging/multiversion/MWScript.php extensions/Flow/maintenance/convertLqtPageOnLocalWiki.php --wiki=mediawikiwiki --debug --srcpage Project:Support_desk/LQT_Archive_1 --logfile /tmp/Project:Support_desk.log
It's still leaking, but there was a decent amount of progress (12 percentage points) on the HHVM run (see T108601):
0.8758
I'll restart it tomorrow (again on HHVM if possible, since I think it's faster).
The script run is done. :)
mysql:research@s3-analytics-slave [mediawikiwiki]> SELECT CURRENT_TIMESTAMP(), (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%');
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| CURRENT_TIMESTAMP() | (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND page_is_redirect) / (SELECT COUNT(*) FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/% |
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2015-08-14 07:12:14 |                                                                                                                                                                                                                                                           0.9999 |
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.69 sec)
...
mysql:research@s3-analytics-slave [mediawikiwiki]> SELECT * FROM page WHERE page_namespace = 90 AND page_title LIKE 'Project:Support_desk/%' AND page_title NOT LIKE '%/reply%%%%' AND NOT page_is_redirect;
+---------+----------------+-------------------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+-----------------------+--------------------+
| page_id | page_namespace | page_title                                            | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_no_title_convert | page_content_model |
+---------+----------------+-------------------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+-----------------------+--------------------+
|   56726 |             90 | Project:Support_desk/How_can_I_clear_my_New_Messages? |                   |            0 |                0 |           1 | 0.693698862857 | 20120906235530 | NULL               |      387699 |      655 |                     0 | wikitext           |
+---------+----------------+-------------------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+-----------------------+--------------------+
1 row in set (0.02 sec)
This thread is in an inconsistent state. Compare:
vs.
(It is a post about the server being slow when there are too many LiquidThreads messages: Really).
Apparently that's because it's not in the thread table, which unfortunately we can't really do anything about:
mysql:research@s3-analytics-slave [mediawikiwiki]> SELECT thread_id, page_id, rev_id, page_namespace, page_title FROM revision JOIN page on rev_page = page_id LEFT OUTER JOIN thread ON thread_root = page_id WHERE page_id = 56726;
+-----------+---------+--------+----------------+-------------------------------------------------------+
| thread_id | page_id | rev_id | page_namespace | page_title                                            |
+-----------+---------+--------+----------------+-------------------------------------------------------+
|      NULL |   56726 | 387699 |             90 | Project:Support_desk/How_can_I_clear_my_New_Messages? |
+-----------+---------+--------+----------------+-------------------------------------------------------+
1 row in set (0.03 sec)
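The same join makes it easy to confirm this is the only such orphan, i.e. that no other unconverted topic page is missing its thread row. A sketch, reusing the filters from the progress queries above:

mysql -h s3-analytics-slave mediawikiwiki -e \
  "SELECT page_id, page_title FROM page
   LEFT OUTER JOIN thread ON thread_root = page_id
   WHERE page_namespace = 90
     AND page_title LIKE 'Project:Support_desk/%'
     AND page_title NOT LIKE '%/reply%%%%'
     AND NOT page_is_redirect
     AND thread_id IS NULL;"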
So I'm prepared to call it done. Logs coming next.
First log: http://people.wikimedia.org/~mattflaschen/2015_07_30_LQT_Flow_VE_Feedback_and_Support_desk.log
Second log (resume): http://people.wikimedia.org/~mattflaschen/2015_08_11_LQT_Flow_Support_desk_resume.log
Third log (resume): http://people.wikimedia.org/~mattflaschen/2015_08_12_LQT_Flow_Support_desk_resume.log
Fourth log (resume on HHVM): http://people.wikimedia.org/~mattflaschen/2015_08_12_LQT_Flow_Support_desk_hhvm_resume.log
Fifth and final log (resume on HHVM): http://people.wikimedia.org/~mattflaschen/2015_08_13_LQT_Flow_Support_desk_hhvm_resume.log
Just one error (see T108972), which is already fixed.
mattflaschen@terbium:~$ grep -iE 'warning|exception|error|fatal' ./public_html/2015_07_30_LQT_Flow_VE_Feedback_and_Support_desk.log ./public_html/2015_08_13_LQT_Flow_Support_desk_hhvm_resume.log ./public_html/2015_08_12_LQT_Flow_Support_desk_hhvm_resume.log ./public_html/2015_08_12_LQT_Flow_Support_desk_resume.log ./public_html/2015_08_11_LQT_Flow_Support_desk_resume.log
./public_html/2015_08_12_LQT_Flow_Support_desk_hhvm_resume.log:[2015-08-13 05:14:50] exception 'Flow\Exception\NoParsoidException' with message 'Failed contacting Parsoid for title "Topic:S52bxzn9rgi3j9e0": There was a problem during the HTTP request: 503 Service Unavailable' in /srv/mediawiki/php-1.26wmf18/extensions/Flow/includes/Parsoid/Utils.php:124
Final summary, for posterity:
[2015-08-14 06:47:54] Imported 6955 items, failed 0
[2015-08-14 06:47:56] Completed import to Project:Support desk from Project:Support desk/LQT Archive 1
[2015-08-14 06:47:56] Finished LQT conversion of page Project:Support_desk/LQT_Archive_1
In betalabs, an additional check was done by interrupting the script and then verifying the completeness of the topic conversion.