What is impact of storing references
Open, Normal, Public

Description

We need to ensure storing references in page props is not going to degrade other services. To do this we must run some tests.

Duration: 8hrs

Jdlrobson updated the task description.
Jdlrobson raised the priority of this task from to Normal.
Jdlrobson renamed this task from Spike: What is impact of storing references in ParserOutput on beta cluster? to Spike: What is impact of storing references.
Jdlrobson set Security to None.
Jdlrobson updated the task description. Feb 12 2016, 7:38 PM

@kaldari can you help me flesh this out with the API requests you are concerned about?
Remember, we are using setExtensionData, so we are not storing anything in the database; I struggle to see how this could have a negative impact.

Anomie added a subscriber: Anomie. Feb 12 2016, 8:28 PM

Remember, we are using setExtensionData, so we are not storing anything in the database; I struggle to see how this could have a negative impact.

That's not the direction I7b106254 went with it; the data is actually being added to the page_props table.

After the upcoming database infrastructure changes are made (to make sure we are comparing apples to apples), we should test some of the following Special page queries:

  • Special:Random (top priority)
  • Special:DisambiguationPages

And some of the following API queries:

  • action=query&prop=pageprops
  • action=query&prop=info&inprop=displaytitle
  • action=query&prop=pageimages

After the upcoming database infrastructure changes are made (to make sure we are comparing apples to

Could you remind me what date this will happen? I'll bump this card to a future sprint.

@Jdlrobson: I don't think testing these on the beta cluster is going to be effective. The page_props table there is virtually empty, and will still be even after importing 10 articles with references. The performance times are going to be more dependent on random network latency than the tiny change in the size of the page_props table. These tests need to be done on English Wikipedia before and after the 5 million new rows are added. Also, I think it would be more effective to test against the database directly (command-line queries from terbium), rather than doing curl tests over the internet.

Could you remind me what date this will happen? I'll bump this card to a future sprint.

I don't know. You'll have to ask @jcrespo.

Surely a smaller wiki would be sufficient. Maybe Portuguese?

Sure. It looks like pt.wiki has about 2 million rows in page_props (compared to 20 million for en.wiki), but it's definitely a better test case than beta labs.

Here is a version of the Special:Random query:

SELECT page_title,page_namespace FROM `page` LEFT JOIN `page_props` ON ((page_id = pp_page) AND pp_propname = 'disambiguation') WHERE page_namespace = '0' AND page_is_redirect = '0' AND (page_random >= 0.694440558979) AND pp_page IS NULL ORDER BY page_random LIMIT 1;

Here is a version of the Special:DisambiguationPages query:

SELECT pp_page AS value,page_namespace AS namespace,page_title AS title FROM `page`,`page_props` WHERE (page_id = pp_page) AND pp_propname = 'disambiguation' ORDER BY value LIMIT 100;

Running these against the ptwiki database on the db2035 slave in production currently gives:

  • Special:Random query: 0.29 sec first query, ~0.04 sec subsequent queries
  • Special:DisambiguationPages query: 0.24 sec first query, ~0.07 sec subsequent queries

I expect the results may be different after the database infrastructure changes go into place, though.
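The first-run versus subsequent-run timing above can be reproduced with a small harness. What follows is a minimal sketch, not the method used in this task: it assumes an in-memory SQLite database standing in for the production replica, a heavily reduced page/page_props schema, and hypothetical helpers named build_demo_db and time_query. Only the shape of the Special:Random query is taken from the comment above.

```python
import sqlite3
import time

# Query shape taken from the Special:Random example; backticks dropped for SQLite.
RANDOM_QUERY = """
SELECT page_title, page_namespace
FROM page
LEFT JOIN page_props
  ON page_id = pp_page AND pp_propname = 'disambiguation'
WHERE page_namespace = 0
  AND page_is_redirect = 0
  AND page_random >= ?
  AND pp_page IS NULL
ORDER BY page_random
LIMIT 1
"""


def build_demo_db():
    """Create a toy stand-in for the page and page_props tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT,
            page_is_redirect INTEGER,
            page_random REAL
        );
        CREATE TABLE page_props (
            pp_page INTEGER,
            pp_propname TEXT,
            pp_value TEXT,
            PRIMARY KEY (pp_page, pp_propname)
        );
    """)
    pages = [(i, 0, f"Page_{i}", 0, i / 1000.0) for i in range(1, 1000)]
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?, ?)", pages)
    # Mark every tenth page as a disambiguation page so the join excludes some rows.
    props = [(i, "disambiguation", "") for i in range(10, 1000, 10)]
    conn.executemany("INSERT INTO page_props VALUES (?, ?, ?)", props)
    return conn


def time_query(conn, sql, params, runs=5):
    """Run the query several times; return the last result and per-run seconds."""
    timings = []
    rows = []
    for _ in range(runs):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    return rows, timings


conn = build_demo_db()
rows, timings = time_query(conn, RANDOM_QUERY, (0.694440558979,))
first, rest = timings[0], timings[1:]
print("result:", rows)
print(f"first run: {first:.6f}s, mean of later runs: {sum(rest) / len(rest):.6f}s")
```

Timings from a toy database say nothing about production; the point is only to separate the first run from the warm runs when recording numbers.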

That is not proper performance testing. Allow me to do it for you, at least sampling a whole day of data on a non-idle host.

Jdlrobson moved this task from Backlog to Tasks on the MobileFrontend board. Feb 18 2016, 6:25 PM
Jdlrobson updated the task description. Feb 22 2016, 5:05 PM
Jdlrobson renamed this task from Spike: What is impact of storing references to What is impact of storing references. Feb 22 2016, 5:26 PM
Jdlrobson set the point value for this task to 8. Feb 22 2016, 5:30 PM

Change 272535 had a related patch set uploaded (by Jdlrobson):
Include Brasil (pt wiki) in webpagetest runs

https://gerrit.wikimedia.org/r/272535

Jdlrobson updated the task description. Feb 22 2016, 8:29 PM
Jdlrobson removed the point value for this task. Feb 23 2016, 12:10 AM
Jdlrobson updated the task description.

@jcrespo you do rock indeed :) I can take care of the deployment side of things and of measuring the impact on the client (right now it looks like we could halve the time it takes for entire documents to download).

What does your schedule look like? Would you be able to do this analysis this week or next week, for example, if I enabled it?

Change 272535 merged by jenkins-bot:
Include Brasil (pt wiki) in webpagetest runs

https://gerrit.wikimedia.org/r/272535

I find it amusing that I specifically banned s2 wikis from receiving deployments that increase content size, and then you specifically chose an s2 wiki for testing. :-)

Not an issue anymore, unless that would create 500GB of new content.

If you are finally going with ptwiki, please allow me one day to gather data before and after the feature is enabled, as close as possible to the deploy, as the previous link I sent you is a bit outdated.

@jcrespo it's hard to know for sure, but based on https://phabricator.wikimedia.org/T125329#2004919, where the worst-case reference blob was 77 KB, and given that pt wiki has fewer than 1 million pages, the worst case would be 77 GB (though I suspect it will be considerably lower than that!).
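The upper bound in the comment above is a single multiplication. As a sketch of that arithmetic, assuming decimal units (1 GB = 1,000,000 KB) and rounding the page count up to exactly 1 million (the variable names are illustrative only):

```python
# Worst-case storage estimate for reference blobs.
WORST_CASE_BLOB_KB = 77             # largest blob reported in T125329#2004919
PAGE_COUNT_UPPER_BOUND = 1_000_000  # pt wiki has fewer pages than this

KB_PER_GB = 1_000_000  # decimal units, an assumption made for this estimate

# Every page carrying the largest observed blob gives the upper bound.
worst_case_gb = WORST_CASE_BLOB_KB * PAGE_COUNT_UPPER_BOUND / KB_PER_GB
print(f"worst case: {worst_case_gb:.0f} GB")
```

This bound assumes every page carries a blob as large as the largest one observed, which is why the real figure should be far lower.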

If you'd feel more comfortable with a non-s2 wiki it's not too late for me to look into it if you can suggest a wiki of similar size in terms of articles.

We had 36GB free one week ago on s2-master. We are more comfortable now, but will not be totally comfortable until new hardware arrives.

Preference for s6 and s7:

frwiki
jawiki
ruwiki

eswiki
huwiki
hewiki
ukwiki
frwiktionary
arwiki
cawiki
viwiki
fawiki
rowiki
kowiki

ja has slightly more articles but is around the same size, and s6 has potentially less impact and more resources available (and plenty of references/footnotes to work with).

Let's go with Japanese wiki then!

Change 273492 had a related patch set uploaded (by Jdlrobson):
Capture Japanese wiki article in tests

https://gerrit.wikimedia.org/r/273492

jcrespo added a comment. Edited Mar 2 2016, 11:34 AM

Performing the 24-hour profiling of the s6 master and an s6 slave.

db1061
*************************** 56. row ***************************
           Name: page_props
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 3129755
 Avg_row_length: 74
    Data_length: 232636416
Max_data_length: 0
   Index_length: 269287424
      Data_free: 7340032
 Auto_increment: NULL
    Create_time: 2015-01-05 14:18:21
    Update_time: NULL
     Check_time: NULL
      Collation: binary
       Checksum: NULL
 Create_options: 
        Comment:
db1023
*************************** 56. row ***************************
           Name: page_props
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 3008878
 Avg_row_length: 86
    Data_length: 260866048
Max_data_length: 0
   Index_length: 276611072
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2014-04-28 10:14:42
    Update_time: NULL
     Check_time: NULL
      Collation: binary
       Checksum: NULL
 Create_options: 
        Comment:
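
To put the table status output above into more readable units, the data and index sizes can be summed per host. This is only a sketch: the numbers are copied from the output above, the helper name footprint_mib is made up for illustration, and InnoDB reports these figures as estimates rather than exact sizes.

```python
# Values copied from the table status output for jawiki.page_props.
TABLE_STATUS = {
    "db1061": {"rows": 3129755, "avg_row_length": 74,
               "data_length": 232636416, "index_length": 269287424},
    "db1023": {"rows": 3008878, "avg_row_length": 86,
               "data_length": 260866048, "index_length": 276611072},
}

MIB = 1024 * 1024


def footprint_mib(status):
    """Data plus index size in MiB for one host's copy of the table."""
    return (status["data_length"] + status["index_length"]) / MIB


footprints = {host: footprint_mib(s) for host, s in TABLE_STATUS.items()}
for host, mib in footprints.items():
    print(f"{host}: {mib:.1f} MiB (data + indexes)")
```

On both hosts the indexes take up slightly more space than the row data, so any estimate of growth in page_props should count index growth as well.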

db1023
TABLE_SCHEMA TABLE_NAME ROWS_READ ROWS_CHANGED ROWS_CHANGED_X_INDEXES
jawiki uploadstash 2084 1659 6636
jawiki delete_archive_old 77938 0 0
jawiki #sql-738b_3816357c 0 405086 3240688
jawiki securepoll_questions 152 0 0
jawiki objectcache 52477 75479 150958
jawiki delete_user_old 47045 0 0
jawiki abuse_filter_log 728399580 407631 3261048
jawiki _geo_tags_new 0 83167 249501
jawiki querycache_info 4732845 2281121 2281121
jawiki page 648076406 157802455 789012275
jawiki geo_updates 5518 5497 5497
jawiki text 53732051 8348521 16697042
jawiki pagelinks 10409326874 89516004 254752401
jawiki user 207404904 19187587 76750348
jawiki log_search 1215613 355497 710994
jawiki checksums 474 158 316
jawiki page_props 167200493 1745278 4947767
jawiki blob_orphans 1319489 0 0
jawiki titlekey 2655874 156906 313812
jawiki delete_image_old 56304 0 0
jawiki user_properties 648545646 4220561 8441122
jawiki imagelinks 338355604 4788055 13601259
jawiki delete_watchlist_old 1017461 0 0
jawiki geo_killlist 1086311 1487379 2974758
jawiki category 6314731 4873612 14620836
jawiki _revision_new 159 51025617 306153702
jawiki geo_tags 8054921 6392997 19178991
jawiki categorylinks 488988841 4961082 19844328
jawiki iwlinks 190421383 1837037 5511111
jawiki securepoll_elections 1564 3 6
jawiki hidden 14 0 0
jawiki logging 5870089 976722 10743942
jawiki user_former_groups 368 179 179
jawiki _templatelinks_new 3 17138795 51416385
jawiki _filearchive_new 0 37977 227862
jawiki redirect 992464 145130 290260
jawiki filearchive 155378 3882 23292
jawiki externallinks 651800869 4949966 21962210
jawiki site_stats 8612168 8546364 8546364
jawiki prefswitch_survey 28080 0 0
jawiki msg_resource_links 4255 8171 8171
jawiki user_daily_contribs 12990855 4849984 4849984
jawiki optin_survey_old 246270 0 0
jawiki searchindex 4 0 0
jawiki transcode 180 72 288
jawiki wikilove_log 1565 829 4145
jawiki #sql-738b_bc52d558 0 13659 54636
jawiki site_identifiers 28176 16754 50262
jawiki securepoll_options 1468 0 0
jawiki protected_titles 27249494 1032 2064
jawiki user_newtalk 256008 377028 754056
jawiki watchlist 927714808 42169553 110234712
jawiki pif_edits 68713 0 0
jawiki abuse_filter 293428 187430 562290
jawiki math 1439331 170288 170288
jawiki cu_log 116627 12873 77238
jawiki recentchanges 5200224704 34246658 342466580
jawiki msg_resource 10658025 1817693 1817693
jawiki bv2015_edits 7275 971567 971567
jawiki change_tag 7745725 1074161 8593288
jawiki cur 256272 0 0
jawiki ipblocks 678392065 632713 4428991
jawiki page_restrictions 72262463 16155 96930
jawiki querycache 16254932 32487077 32487077
jawiki securepoll_properties 13882 15 15
jawiki oldimage 442259 2280 9120
jawiki wikilove_image_log 312 78 234
jawiki abuse_filter_history 582 464 2320
jawiki wbc_entity_usage 555200643 93469150 280407450
jawiki click_tracking_events 129 0 0
jawiki securepoll_cookie_match 7 0 0
jawiki _watchlist_new 19 12810741 38432223
jawiki delete_querycache_old 9537 0 0
jawiki spoofuser 1281721 263680 527360
jawiki updates 11037573 8393016 25179048
jawiki user_groups 3685083 253 506
jawiki cu_changes 14854007 19698454 118190724
jawiki abuse_filter_action 16865 740 1480
jawiki langlinks 722209772 3692178 7384356
jawiki mathoid 13642 59202 59202
jawiki _user_new 11 811062 3244248
jawiki _page_props_new 0 2491891 7475673
jawiki tag_summary 1084544 1460426 8762556
jawiki archive 6526919 491796 1967184
jawiki delete_oldimage_old 2447 0 0
jawiki bv2011_edits 520312 0 0
jawiki global_block_whitelist 3 0 0
jawiki accountaudit_login 1564128 1388355 2776710
jawiki bv2013_edits 739444 0 0
jawiki templatelinks 2124760424 25498080 71417701
jawiki filejournal 13257 0 0
jawiki betafeatures_user_counts 121552 15480 15480
jawiki blob_tracking 18595918 0 0
jawiki bv2009_edits 329230 0 0
jawiki _page_new 333 2594022 12970110
jawiki sites 40604 2701 24309
jawiki securepoll_entity 62 3 3
jawiki querycachetwo 10152453 2914640 8743920
jawiki delete_logging_old 25553 0 0
jawiki prefstats 265966 0 0
jawiki securepoll_lists 20166 0 0
jawiki _imagelinks_new 0 4173347 12520041
jawiki securepoll_msgs 3688 3 3
jawiki revision 387363245 9462366 56774196
jawiki _externallinks_new 1 4716670 23583350
jawiki _pagelinks_new 201 91915168 275745504
jawiki delete_recentchanges_old 379151 0 0
jawiki image 1623001 10313 61878
jawiki updatelog 33 7 7
jawiki _archive_new 0 2063939 8255756
jawiki _ipblocks_new 0 154553 1081871
jawiki module_deps 12500777 60907 60907
jawiki _image_new 0 164227 985362
jawiki securepoll_voters 1526 0 0

db1061
TABLE_SCHEMA TABLE_NAME ROWS_READ ROWS_CHANGED ROWS_CHANGED_X_INDEXES
jawiki user 7575206233 7365504 29462016
jawiki mathoid 261203 47087 47087
jawiki page 63470902602 93886987 469434935
jawiki objectcache 21091370 75479 150958
jawiki text 193078823 4793898 9587796
jawiki cu_changes 7113669 11836050 71016300
jawiki site_stats 137220996 4893391 4893391
jawiki abuse_filter_log 1390532530 233965 1871720
jawiki recentchanges 15213609849 19104162 152863304
jawiki pagelinks 2211914629 41236053 123708159
jawiki _geo_tags_new 0 83167 249501
jawiki updates 10987888 4871808 14615424
jawiki querycache_info 2402283 2271396 2271396
jawiki templatelinks 445387843 12626600 37879800
jawiki watchlist 421762430 19670943 59012829
jawiki categorylinks 2700612436 2664479 10657916
jawiki abuse_filter_action 512820 661 1322
jawiki checksums 316 158 316
jawiki page_props 2570967668 854117 2562351
jawiki imagelinks 336565430 1941445 5824335
jawiki category 203540047 2615807 7847421
jawiki wikilove_log 0 439 2195
jawiki spoofuser 57455 187088 374176
jawiki image 59131051 6320 37920
jawiki tag_summary 38187487 1021623 6129738
jawiki user_daily_contribs 38448484 2294423 2294423
jawiki module_deps 2769705458 29809 29809
jawiki _image_new 0 82647 495882
jawiki _ipblocks_new 0 154553 1081871
jawiki _externallinks_new 0 4713818 23569090
jawiki externallinks 271527580 2926160 13870490
jawiki filearchive 6521 2489 14934
jawiki logging 292072161 641010 7051110
jawiki geo_tags 6352909 2147036 6441108
jawiki redirect 330029881 81758 163516
jawiki msg_resource_links 246 3116 3116
jawiki _pagelinks_new 687 193346150 580038450
jawiki msg_resource 2221841423 1078187 1078187
jawiki user_properties 589599223 1064279 2128558
jawiki ipblocks 1205748595 325768 2280376
jawiki betafeatures_user_counts 860321 9342 9342
jawiki abuse_filter 57681997 111928 335784
jawiki transcode 11040 72 288
jawiki cu_log 13137166 7737 46422
jawiki user_groups 8010807 67 134
jawiki querycache 28857090 17688320 17688320
jawiki _recentchanges_new 0 635085 5080680
jawiki _imagelinks_new 0 8818204 26454612
jawiki math 11116609 46386 46386
jawiki archive 21062482 294793 1179172
jawiki change_tag 4826050810 712727 5701816
jawiki protected_titles 16654757 596 1192
jawiki user_newtalk 10298809 254441 508882
jawiki bv2015_edits 1935451 971567 971567
jawiki _templatelinks_new 73 38055873 114167619
jawiki page_restrictions 217487005 10479 62874
jawiki uploadstash 1096 960 3840
jawiki user_former_groups 0 10 10
jawiki iwlinks 20324904 1117475 3352425
jawiki securepoll_elections 14575 0 0
jawiki global_block_whitelist 6 0 0
jawiki oldimage 9949842 1252 5008
jawiki revision 43110322170 5620562 33723372
jawiki site_identifiers 397166516 12160 36480
jawiki abuse_filter_history 246460 392 1960
jawiki wbc_entity_usage 165604390 93469758 280409274
jawiki log_search 1184489715 235293 470586
jawiki langlinks 837561525 2207687 4415374
jawiki updatelog 0 6 6
jawiki querycachetwo 14573233 1564382 4693146
jawiki accountaudit_login 527951 577418 1154836
jawiki sites 1129090126 2686 24174
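
One way to read the page_props rows in the two listings above is as a rough read-to-write ratio per host. The sketch below copies the four counters for page_props from each listing; the helper name reads_per_change is illustrative, and because the counters are cumulative over a period this task does not state, the ratios are only indicative.

```python
# page_props rows from the per-table statistics above:
# (host, rows_read, rows_changed, rows_changed_x_indexes)
PAGE_PROPS_STATS = [
    ("db1023", 167200493, 1745278, 4947767),
    ("db1061", 2570967668, 854117, 2562351),
]


def reads_per_change(rows_read, rows_changed):
    """Rows read for every row changed; higher means more read-dominated."""
    return rows_read / rows_changed


ratios = {host: reads_per_change(read, changed)
          for host, read, changed, _ in PAGE_PROPS_STATS}
for host, ratio in ratios.items():
    print(f"{host}: about {ratio:.0f} rows read per row changed")
```

Comparing the same ratio before and after the feature is enabled is one possible way to spot a shift in the workload on this table, alongside the raw counters.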

(Note that the row counts do not match because they were taken at different times; also, these values are not updated in real time and are only approximations.)
Proper count:

$ date; mysql -BN information_schema -e "SELECT count(*) FROM jawiki.page_props"
Wed Mar  2 11:48:22 UTC 2016
2866974
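
To show how far the approximate Rows values in the table status output sit from this exact count, the difference can be expressed as a percentage. This is a sketch using only the numbers quoted in this comment; the helper name overestimate_pct is illustrative, and the exact count applies to whichever host the command was run on, which the comment does not state.

```python
EXACT_COUNT = 2866974  # SELECT count(*) result, Wed Mar 2 11:48:22 UTC 2016

# Approximate "Rows" values from the table status output for each host.
ESTIMATED_ROWS = {"db1061": 3129755, "db1023": 3008878}


def overestimate_pct(estimate, exact):
    """Percentage by which the estimate exceeds the exact count."""
    return (estimate - exact) / exact * 100


drift = {host: overestimate_pct(est, EXACT_COUNT)
         for host, est in ESTIMATED_ROWS.items()}
for host, pct in drift.items():
    print(f"{host}: estimate is {pct:.1f}% above the exact count")
```

Because the estimates differ from the exact count by several percent, before-and-after comparisons of table size are safer when based on count(*) than on the Rows column.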

Change 274470 had a related patch set uploaded (by Jdlrobson):
Enable reference storage on Japanese Wiki

https://gerrit.wikimedia.org/r/274470

SWAT arranged for Monday.
TODO:

  • Confirm with @jcrespo he can make that.
  • Me to e-mail ops list.

Change 273492 merged by jenkins-bot:
Capture Japanese wiki article in tests

https://gerrit.wikimedia.org/r/273492

Change 274470 abandoned by Jdlrobson:
Enable reference storage on Japanese Wiki

Reason:
pending further discussion

https://gerrit.wikimedia.org/r/274470