We need to ensure storing references in page props is not going to degrade other services. To do this we must run some tests.
Duration: 8hrs
@kaldari can you help me flesh this out with the API requests you are concerned about?
Remember we are using setExtensionData so as not to store anything in the database, so I struggle to see how this could have a negative impact.
That's not the direction I7b106254 took; the data is actually being added to the page_props table.
After the upcoming database infrastructure changes are made (to make sure we are comparing apples to apples), we should test some of the following Special page queries:
And some of the following API queries:
Could you remind me what date this will happen? I'll bump this card to a future sprint.
@Jdlrobson: I don't think testing these on the beta cluster is going to be effective. The page_props table there is virtually empty, and will still be even after importing 10 articles with references. The performance times are going to be more dependent on random network latency than the tiny change in the size of the page_props table. These tests need to be done on English Wikipedia before and after the 5 million new rows are added. Also, I think it would be more effective to test against the database directly (command-line queries from terbium), rather than doing curl tests over the internet.
Could you remind me what date this will happen? I'll bump this card to a future sprint.
I don't know. You'll have to ask @jcrespo.
Surely a smaller wiki would be sufficient. Maybe Portuguese?
Sure. It looks like pt.wiki has about 2 million rows in page_props (compared to 20 million for en.wiki), but it's definitely a better test case than beta labs.
Here is a version of the Special:Random query:
SELECT page_title,page_namespace FROM `page` LEFT JOIN `page_props` ON ((page_id = pp_page) AND pp_propname = 'disambiguation') WHERE page_namespace = '0' AND page_is_redirect = '0' AND (page_random >= 0.694440558979) AND pp_page IS NULL ORDER BY page_random LIMIT 1;
Here is a version of the Special:DisambiguationPages query:
SELECT pp_page AS value,page_namespace AS namespace,page_title AS title FROM `page`,`page_props` WHERE (page_id = pp_page) AND pp_propname = 'disambiguation' ORDER BY value LIMIT 100;
Running these against the ptwiki database on the db2035 slave in production currently gives:
I expect the results may be different after the database infrastructure changes go into place, though.
That is not proper performance testing. Allow me to do it for you, at least sampling a whole day of data on a non-idle host.
Change 272535 had a related patch set uploaded (by Jdlrobson):
Include Brasil (pt wiki) in webpagetest runs
@jcrespo you do rock indeed :) I can take care of the deployment side of things and measuring the impact on the client (right now it looks like we could halve the time for entire documents to download).
What does your schedule look like? Would you be able to do this analysis this week or next week, for example, if I enabled it?
I find it amusing that I specifically banned s2 wikis from receiving any deployment that increases content, and then you specifically chose an s2 wiki for testing. :-)
Not an issue anymore, unless that would create 500GB of new content.
If you are finally going with ptwiki, please allow me one day to gather data before and after the feature is enabled, as close as possible to the deploy, as the previous link I sent you is a bit outdated.
@jcrespo it's hard to know for sure, but based on https://phabricator.wikimedia.org/T125329#2004919, where the worst-case reference blob was 77 KB, and since pt wiki has fewer than 1 million pages, the worst case we'd be looking at is 77 GB (but I suspect it's going to be considerably lower than that!).
If you'd feel more comfortable with a non-s2 wiki it's not too late for me to look into it if you can suggest a wiki of similar size in terms of articles.
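The 77 GB upper bound quoted above can be reproduced with a short calculation. This is only a sketch of that arithmetic: the function names are made up for illustration, decimal units (1 KB = 1000 bytes) are assumed because that is what makes 77 KB times 1 million pages come out at 77 GB, and it assumes every page stores a blob as large as the worst case observed.

```python
# Sketch of the worst-case storage estimate from the comment above.
# Assumption: decimal units (1 KB = 1000 bytes, 1 GB = 1000**3 bytes).
# Function names are hypothetical, chosen only for this example.

BYTES_PER_KB = 1000
BYTES_PER_GB = 1000 ** 3


def worst_case_bytes(page_count: int, worst_blob_kb: int) -> int:
    """Upper bound: every page stores a blob as big as the worst one seen."""
    return page_count * worst_blob_kb * BYTES_PER_KB


def to_gb(size_bytes: int) -> float:
    """Convert bytes to decimal gigabytes."""
    return size_bytes / BYTES_PER_GB


# Figures from the thread: under 1 million pages, 77 KB worst-case blob.
upper_bound = worst_case_bytes(page_count=1_000_000, worst_blob_kb=77)
print(f"worst case: {to_gb(upper_bound):.0f} GB")
```

Since most pages carry far smaller reference blobs than the worst case, the real figure should sit well below this bound, as the comment already suspects.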
We had 36GB free one week ago on s2-master. We are more comfortable now, but not totally until new hardware arrives.
Preference for s6 and s7:
frwiki jawiki ruwiki eswiki huwiki hewiki ukwiki frwiktionary arwiki cawiki viwiki fawiki rowiki kowiki
ja has a few more articles, but it is around the same size, and s6 has potentially less impact and more resources available (and plenty of references/footnotes to work with).
Change 273492 had a related patch set uploaded (by Jdlrobson):
Capture Japanese wiki article in tests
*************************** 56. row ***************************
           Name: page_props
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 3129755
 Avg_row_length: 74
    Data_length: 232636416
Max_data_length: 0
   Index_length: 269287424
      Data_free: 7340032
 Auto_increment: NULL
    Create_time: 2015-01-05 14:18:21
    Update_time: NULL
     Check_time: NULL
      Collation: binary
       Checksum: NULL
 Create_options:
        Comment:
*************************** 56. row ***************************
           Name: page_props
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 3008878
 Avg_row_length: 86
    Data_length: 260866048
Max_data_length: 0
   Index_length: 276611072
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2014-04-28 10:14:42
    Update_time: NULL
     Check_time: NULL
      Collation: binary
       Checksum: NULL
 Create_options:
        Comment:
(Note that the row counts do not match because they were taken at different times; also, they are not updated in real time and are only approximations.)
Proper count:
$ date; mysql -BN information_schema -e "SELECT count(*) FROM jawiki.page_props"
Wed Mar 2 11:48:22 UTC 2016
2866974
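As a rough sanity check on the two table-status outputs above, the sketch below recomputes the average row length from Data_length and Rows and adds data and index sizes into a total footprint. The thread does not say which host each output came from, so they are labelled only "first" and "second"; the class and method names are invented for this example, and integer division is an assumption about how the reported Avg_row_length is derived, which happens to match both outputs.

```python
# Sketch: derive per-row and total sizes from the table-status figures above.
# Class and method names are hypothetical; the numbers are copied from the thread.
from dataclasses import dataclass


@dataclass
class TableStatus:
    rows: int          # approximate row count reported by the server
    data_length: int   # bytes of row data
    index_length: int  # bytes of index data

    def avg_row_length(self) -> int:
        # Assumption: average row length is data bytes / estimated rows, truncated.
        return self.data_length // self.rows

    def total_mib(self) -> float:
        # Combined data + index footprint in MiB (1 MiB = 2**20 bytes).
        return (self.data_length + self.index_length) / 2 ** 20


first = TableStatus(rows=3129755, data_length=232636416, index_length=269287424)
second = TableStatus(rows=3008878, data_length=260866048, index_length=276611072)

for label, status in (("first", first), ("second", second)):
    print(f"{label}: avg row {status.avg_row_length()} bytes, "
          f"total {status.total_mib():.1f} MiB")
```

Because the Rows value is itself an estimate, these derived figures are only approximate; the count(*) result is the number to trust for the actual row count.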
Change 274470 had a related patch set uploaded (by Jdlrobson):
Enable reference storage on Japanese Wiki
SWAT arranged for Monday.
TODO:
Change 274470 abandoned by Jdlrobson:
Enable reference storage on Japanese Wiki
Reason:
pending further discussion