What is impact of storing references
Open, Normal, Public

Description

We need to ensure storing references in page props is not going to degrade other services. To do this we must run some tests.

Duration: 8hrs

Jdlrobson updated the task description.
Jdlrobson raised the priority of this task from to Normal.
Jdlrobson renamed this task from Spike: What is impact of storing references in ParserOutput on beta cluster? to Spike: What is impact of storing references.
Jdlrobson set Security to None.
Jdlrobson updated the task description. Feb 12 2016, 7:38 PM

@kaldari can you help me flesh this out with the API requests you are concerned about?
Remember we are using setExtensionData so that we are not storing anything in the database, so I struggle to see how this could have a negative impact.

Anomie added a subscriber: Anomie. Feb 12 2016, 8:28 PM

Remember we are using setExtensionData so that we are not storing anything in the database, so I struggle to see how this could have a negative impact.

That's not the direction I7b106254 went with it; it's actually being added to the page_props table.

After the upcoming database infrastructure changes are made (to make sure we are comparing apples to apples), we should test some of the following Special page queries:

  • Special:Random (top priority)
  • Special:DisambiguationPages

And some of the following API queries:

  • action=query&prop=pageprops
  • action=query&prop=info&inprop=displaytitle
  • action=query&prop=pageimages
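For anyone wanting to reproduce the API side of these checks, the three queries above can be turned into full request URLs. This is only a sketch: the endpoint `https://example.org/w/api.php` and the title `Example` are placeholders rather than anything named in this task, and `titles` and `format=json` are added so that the requests are complete. Nothing is fetched; the URLs are only built.

```python
from urllib.parse import urlencode

# Placeholder endpoint and page title (assumptions, not taken from this task).
API_ENDPOINT = "https://example.org/w/api.php"
SAMPLE_TITLE = "Example"

# The three API queries listed above, as parameter dictionaries.
QUERIES = [
    {"action": "query", "prop": "pageprops"},
    {"action": "query", "prop": "info", "inprop": "displaytitle"},
    {"action": "query", "prop": "pageimages"},
]


def build_url(params, title=SAMPLE_TITLE):
    """Return a full API URL for one query, asking for JSON output."""
    full = dict(params, titles=title, format="json")
    return API_ENDPOINT + "?" + urlencode(full)


urls = [build_url(q) for q in QUERIES]
for url in urls:
    print(url)
```

Swapping `API_ENDPOINT` for the wiki under test gives the URLs to time before and after the new rows are added.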

After the upcoming database infrastructure changes are made (to make sure we are comparing apples to apples)

Could you remind me what date this will happen? I'll bump this card to a future sprint.

@Jdlrobson: I don't think testing these on the beta cluster is going to be effective. The page_props table there is virtually empty, and will still be even after importing 10 articles with references. The performance times are going to be more dependent on random network latency than the tiny change in the size of the page_props table. These tests need to be done on English Wikipedia before and after the 5 million new rows are added. Also, I think it would be more effective to test against the database directly (command-line queries from terbium), rather than doing curl tests over the internet.

Could you remind me what date this will happen? I'll bump this card to a future sprint.

I don't know. You'll have to ask @jcrespo.

@Jdlrobson: I don't think testing these on the beta cluster is going to be effective. The page_props table there is virtually empty, and will still be even after importing 10 articles with references. The performance times are going to be more dependent on random network latency than the tiny change in the size of the page_props table. These tests need to be done on English Wikipedia before and after the 5 million new rows are added. Also, I think it would be more effective to test against the database directly (command-line queries from terbium), rather than doing curl tests over the internet.

Surely a smaller wiki would be sufficient. Maybe Portuguese?

Surely a smaller wiki would be sufficient. Maybe Portuguese?

Sure. It looks like pt.wiki has about 2 million rows in page_props (compared to 20 million for en.wiki), but it's definitely a better test case than beta labs.

Here is a version of the Special:Random query:

SELECT page_title,page_namespace FROM `page` LEFT JOIN `page_props` ON ((page_id = pp_page) AND pp_propname = 'disambiguation') WHERE page_namespace = '0' AND page_is_redirect = '0' AND (page_random >= 0.694440558979) AND pp_page IS NULL ORDER BY page_random LIMIT 1;
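To make clear what this query does with page_props, here is a small sketch that runs the same shape of query against an in-memory SQLite database (not MySQL, so it says nothing about production timings). The table definitions are cut down to the columns the query touches, the sample rows are invented, and the `references` property name is hypothetical. The point is only that rows with other property names do not affect which page is returned, because the join condition filters on `pp_propname = 'disambiguation'`.

```python
import sqlite3

# In-memory stand-in for the `page` and `page_props` tables (simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_title TEXT,
        page_is_redirect INTEGER,
        page_random REAL
    );
    CREATE TABLE page_props (
        pp_page INTEGER,
        pp_propname TEXT,
        pp_value TEXT
    );
""")
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0, "Mercury", 0, 0.70),           # disambiguation page
        (2, 0, "Mercury_(planet)", 0, 0.75),  # ordinary article
        (3, 0, "Hg", 1, 0.72),                # redirect
        (4, 0, "Venus", 0, 0.10),             # below the random threshold
    ],
)
conn.executemany(
    "INSERT INTO page_props VALUES (?, ?, ?)",
    [
        (1, "disambiguation", ""),
        (2, "references", "{...}"),  # hypothetical new property row
    ],
)

# Same structure as the Special:Random query: the LEFT JOIN plus
# "pp_page IS NULL" keeps only pages without a 'disambiguation' property.
RANDOM_QUERY = """
    SELECT page_title, page_namespace
    FROM page
    LEFT JOIN page_props
      ON page_id = pp_page AND pp_propname = 'disambiguation'
    WHERE page_namespace = 0
      AND page_is_redirect = 0
      AND page_random >= ?
      AND pp_page IS NULL
    ORDER BY page_random
    LIMIT 1
"""

row = conn.execute(RANDOM_QUERY, (0.694440558979,)).fetchone()
print(row)  # ('Mercury_(planet)', 0)
```

The concern in this task is therefore about the cost of the join as page_props grows, not about the result changing.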

Here is a version of the Special:DisambiguationPages query:

SELECT pp_page AS value,page_namespace AS namespace,page_title AS title FROM `page`,`page_props` WHERE (page_id = pp_page) AND pp_propname = 'disambiguation' ORDER BY value LIMIT 100;

Running these against the ptwiki database on the db2035 slave in production currently gives:

  • Special:Random query: 0.29 sec first query, ~0.04 sec subsequent queries
  • Special:DisambiguationPages query: 0.24 sec first query, ~0.07 sec subsequent queries
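The first-run versus subsequent-run split above can be captured with a small timing helper. This is a rough sketch: `run_query` is a hypothetical zero-argument callable standing in for whatever executes the SQL, and a handful of runs from one session is only a quick check, not a substitute for sampling a host under real load.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_cold_and_warm(run_query: Callable[[], object], repeats: int = 5) -> Tuple[float, float]:
    """Run the query `repeats` times.

    Returns (seconds for the first run, median seconds of the later runs),
    so a cold first execution is reported separately from warm ones.
    """
    if repeats < 2:
        raise ValueError("need at least two runs to separate cold from warm")
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings[0], statistics.median(timings[1:])


# Example with a stand-in workload instead of a real database call.
cold, warm = time_cold_and_warm(lambda: sum(range(100_000)), repeats=5)
print(f"first run: {cold:.4f} s, later runs (median): {warm:.4f} s")
```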

I expect the results may be different after the database infrastructure changes go into place, though.

That is not proper performance testing. Allow me to do it for you, at least sampling a whole day of data on a non-idle host.

Jdlrobson moved this task from Backlog to Tasks on the MobileFrontend board. Feb 18 2016, 6:25 PM
Jdlrobson updated the task description. Feb 22 2016, 5:05 PM
Jdlrobson renamed this task from Spike: What is impact of storing references to What is impact of storing references. Feb 22 2016, 5:26 PM
Jdlrobson set the point value for this task to 8. Feb 22 2016, 5:30 PM

Change 272535 had a related patch set uploaded (by Jdlrobson):
Include Brasil (pt wiki) in webpagetest runs

https://gerrit.wikimedia.org/r/272535

Jdlrobson updated the task description. Feb 22 2016, 8:29 PM
Jdlrobson removed the point value for this task. Feb 23 2016, 12:10 AM
Jdlrobson updated the task description.

@jcrespo you do rock indeed :) I can take care of the deployment side of things and of measuring the impact on the client (right now it looks like we could halve the time it takes for entire documents to download).

What does your schedule look like? Would you be able to do this analysis this week or next week, for example, if I enabled it?

Change 272535 merged by jenkins-bot:
Include Brasil (pt wiki) in webpagetest runs

https://gerrit.wikimedia.org/r/272535

I find it amusing that I specifically banned s2 wikis from receiving any deployment that increases content, and then you specifically chose an s2 wiki for testing. :-)

Not an issue anymore, unless that would create 500GB of new content.

If you are going with ptwiki after all, please allow me one day to gather data before and after the feature is enabled, as close as possible to the deploy, as the previous link I sent you is a bit outdated.

@jcrespo it's hard to know for sure, but based on https://phabricator.wikimedia.org/T125329#2004919, where the worst-case reference blob was 77 KB, and given that pt wiki has fewer than 1 million pages, the worst case would be about 77 GB (though I suspect it will be considerably lower than that!).
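The upper bound quoted above is simply the worst-case blob size multiplied by the page-count ceiling. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1,000,000 KB) and assuming every page carries a worst-case blob, which is deliberately pessimistic:

```python
# Figures from the comment above; decimal units (1 GB = 1,000,000 KB) assumed.
WORST_CASE_BLOB_KB = 77
PAGE_UPPER_BOUND = 1_000_000  # "below 1 million pages"


def worst_case_storage_gb(blob_kb: float, pages: int) -> float:
    """Upper bound on added storage if every page had a worst-case blob."""
    return blob_kb * pages / 1_000_000


print(worst_case_storage_gb(WORST_CASE_BLOB_KB, PAGE_UPPER_BOUND))  # 77.0
```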

If you'd feel more comfortable with a non-s2 wiki it's not too late for me to look into it if you can suggest a wiki of similar size in terms of articles.

We had 36GB free one week ago on s2-master. We are more comfortable now, but will not be totally comfortable until new hardware arrives.

Preference for s6 and s7:

frwiki
jawiki
ruwiki

eswiki
huwiki
hewiki
ukwiki
frwiktionary
arwiki
cawiki
viwiki
fawiki
rowiki
kowiki

ja has slightly more articles but is around the same size, and s6 has potentially less impact and more resources available (and plenty of references/footnotes to work with).

Let's go with Japanese wiki then!

Change 273492 had a related patch set uploaded (by Jdlrobson):
Capture Japanese wiki article in tests

https://gerrit.wikimedia.org/r/273492

jcrespo added a comment. Edited Mar 2 2016, 11:34 AM

Performing the 24-hour profiling of the s6 master and an s6 slave.

db1061
*************************** 56. row ***************************
           Name: page_props
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 3129755
 Avg_row_length: 74
    Data_length: 232636416
Max_data_length: 0
   Index_length: 269287424
      Data_free: 7340032
 Auto_increment: NULL
    Create_time: 2015-01-05 14:18:21
    Update_time: NULL
     Check_time: NULL
      Collation: binary
       Checksum: NULL
 Create_options: 
        Comment:
db1023
*************************** 56. row ***************************
           Name: page_props
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 3008878
 Avg_row_length: 86
    Data_length: 260866048
Max_data_length: 0
   Index_length: 276611072
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2014-04-28 10:14:42
    Update_time: NULL
     Check_time: NULL
      Collation: binary
       Checksum: NULL
 Create_options: 
        Comment:
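As a reading aid for the two status listings above, the on-disk footprint of page_props on each host is Data_length plus Index_length. The sketch below only copies the numbers shown and adds them up; the dictionary layout and helper names are illustrative. For InnoDB, the Rows value in this output is an estimate, so it is carried along but not used in the size calculation.

```python
# Values copied from the table status output above (bytes unless noted).
TABLE_STATUS = {
    "db1061": {"rows": 3129755, "avg_row_length": 74,
               "data_length": 232636416, "index_length": 269287424},
    "db1023": {"rows": 3008878, "avg_row_length": 86,
               "data_length": 260866048, "index_length": 276611072},
}


def total_bytes(status: dict) -> int:
    """Data plus index size for one table on one host."""
    return status["data_length"] + status["index_length"]


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / 2**20


for host, status in TABLE_STATUS.items():
    print(f"{host}: {to_mib(total_bytes(status)):.1f} MiB (data + indexes)")
```

Both hosts come out at roughly half a gigabyte for the table today, which gives a baseline to compare against once the new rows are written.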

db1023
TABLE_SCHEMA TABLE_NAME ROWS_READ ROWS_CHANGED ROWS_CHANGED_X_INDEXES
jawiki uploadstash 2084 1659 6636
jawiki delete_archive_old 77938 0 0
jawiki #sql-738b_3816357c 0 405086 3240688
jawiki securepoll_questions 152 0 0
jawiki objectcache 52477 75479 150958
jawiki delete_user_old 47045 0 0
jawiki abuse_filter_log 728399580 407631 3261048
jawiki _geo_tags_new 0 83167 249501
jawiki querycache_info 4732845 2281121 2281121
jawiki page 648076406 157802455 789012275
jawiki geo_updates 5518 5497 5497
jawiki text 53732051 8348521 16697042
jawiki pagelinks 10409326874 89516004 254752401
jawiki user 207404904 19187587 76750348
jawiki log_search 1215613 355497 710994
jawiki checksums 474 158 316
jawiki page_props 167200493 1745278 4947767
jawiki blob_orphans 1319489 0 0
jawiki titlekey 2655874 156906 313812
jawiki delete_image_old 56304 0 0
jawiki user_properties 648545646 4220561 8441122
jawiki imagelinks 338355604 4788055 13601259
jawiki delete_watchlist_old 1017461 0 0
jawiki geo_killlist 1086311 1487379 2974758
jawiki category 6314731 4873612 14620836
jawiki _revision_new 159 51025617 306153702
jawiki geo_tags 8054921 6392997 19178991
jawiki categorylinks 488988841 4961082 19844328
jawiki iwlinks 190421383 1837037 5511111
jawiki securepoll_elections 1564 3 6
jawiki hidden 14 0 0
jawiki logging 5870089 976722 10743942
jawiki user_former_groups 368 179 179
jawiki _templatelinks_new 3 17138795 51416385
jawiki _filearchive_new 0 37977 227862
jawiki redirect 992464 145130 290260
jawiki filearchive 155378 3882 23292
jawiki externallinks 651800869 4949966 21962210
jawiki site_stats 8612168 8546364 8546364
jawiki prefswitch_survey 28080 0 0
jawiki msg_resource_links 4255 8171 8171
jawiki user_daily_contribs 12990855 4849984 4849984
jawiki optin_survey_old 246270 0 0
jawiki searchindex 4 0 0
jawiki transcode 180 72 288
jawiki wikilove_log 1565 829 4145
jawiki #sql-738b_bc52d558 0 13659 54636
jawiki site_identifiers 28176 16754 50262
jawiki securepoll_options 1468 0 0
jawiki protected_titles 27249494 1032 2064
jawiki user_newtalk 256008 377028 754056
jawiki watchlist 927714808 42169553 110234712
jawiki pif_edits 68713 0 0
jawiki abuse_filter 293428 187430 562290
jawiki math 1439331 170288 170288
jawiki cu_log 116627 12873 77238
jawiki recentchanges 5200224704 34246658 342466580
jawiki msg_resource 10658025 1817693 1817693
jawiki bv2015_edits 7275 971567 971567
jawiki change_tag 7745725 1074161 8593288
jawiki cur 256272 0 0
jawiki ipblocks 678392065 632713 4428991
jawiki page_restrictions 72262463 16155 96930
jawiki querycache 16254932 32487077 32487077
jawiki securepoll_properties 13882 15 15
jawiki oldimage 442259 2280 9120
jawiki wikilove_image_log 312 78 234
jawiki abuse_filter_history 582 464 2320
jawiki wbc_entity_usage 555200643 93469150 280407450
jawiki click_tracking_events 129 0 0
jawiki securepoll_cookie_match 7 0 0
jawiki _watchlist_new 19 12810741 38432223
jawiki delete_querycache_old 9537 0 0
jawiki spoofuser 1281721 263680 527360
jawiki updates 11037573 8393016 25179048
jawiki user_groups 3685083 253 506
jawiki cu_changes 14854007 19698454 118190724
jawiki abuse_filter_action 16865 740 1480
jawiki langlinks 722209772 3692178 7384356
jawiki mathoid 13642 59202 59202
jawiki _user_new 11 811062 3244248
jawiki _page_props_new 0 2491891 7475673
jawiki tag_summary 1084544 1460426 8762556
jawiki archive 6526919 491796 1967184
jawiki delete_oldimage_old 2447 0 0
jawiki bv2011_edits 520312 0 0
jawiki global_block_whitelist 3 0 0
jawiki accountaudit_login 1564128 1388355 2776710
jawiki bv2013_edits 739444 0 0
jawiki templatelinks 2124760424 25498080 71417701
jawiki filejournal 13257 0 0
jawiki betafeatures_user_counts 121552 15480 15480
jawiki blob_tracking 18595918 0 0
jawiki bv2009_edits 329230 0 0
jawiki _page_new 333 2594022 12970110
jawiki sites 40604 2701 24309
jawiki securepoll_entity 62 3 3
jawiki querycachetwo 10152453 2914640 8743920
jawiki delete_logging_old 25553 0 0
jawiki prefstats 265966 0 0
jawiki securepoll_lists 20166 0 0
jawiki _imagelinks_new 0 4173347 12520041
jawiki securepoll_msgs 3688 3 3
jawiki revision 387363245 9462366 56774196
jawiki _externallinks_new 1 4716670 23583350
jawiki _pagelinks_new 201 91915168 275745504
jawiki delete_recentchanges_old 379151 0 0
jawiki image 1623001 10313 61878
jawiki updatelog 33 7 7
jawiki _archive_new 0 2063939 8255756
jawiki _ipblocks_new 0 154553 1081871
jawiki module_deps 12500777 60907 60907
jawiki _image_new 0 164227 985362
jawiki securepoll_voters 1526 0 0

db1061
TABLE_SCHEMA TABLE_NAME ROWS_READ ROWS_CHANGED ROWS_CHANGED_X_INDEXES
jawiki user 7575206233 7365504 29462016
jawiki mathoid 261203 47087 47087
jawiki page 63470902602 93886987 469434935
jawiki objectcache 21091370 75479 150958
jawiki text 193078823 4793898 9587796
jawiki cu_changes 7113669 11836050 71016300
jawiki site_stats 137220996 4893391 4893391
jawiki abuse_filter_log 1390532530 233965 1871720
jawiki recentchanges 15213609849 19104162 152863304
jawiki pagelinks 2211914629 41236053 123708159
jawiki _geo_tags_new 0 83167 249501
jawiki updates 10987888 4871808 14615424
jawiki querycache_info 2402283 2271396 2271396
jawiki templatelinks 445387843 12626600 37879800
jawiki watchlist 421762430 19670943 59012829
jawiki categorylinks 2700612436 2664479 10657916
jawiki abuse_filter_action 512820 661 1322
jawiki checksums 316 158 316
jawiki page_props 2570967668 854117 2562351
jawiki imagelinks 336565430 1941445 5824335
jawiki category 203540047 2615807 7847421
jawiki wikilove_log 0 439 2195
jawiki spoofuser 57455 187088 374176
jawiki image 59131051 6320 37920
jawiki tag_summary 38187487 1021623 6129738
jawiki user_daily_contribs 38448484 2294423 2294423
jawiki module_deps 2769705458 29809 29809
jawiki _image_new 0 82647 495882
jawiki _ipblocks_new 0 154553 1081871
jawiki _externallinks_new 0 4713818 23569090
jawiki externallinks 271527580 2926160 13870490
jawiki filearchive 6521 2489 14934
jawiki logging 292072161 641010 7051110
jawiki geo_tags 6352909 2147036 6441108
jawiki redirect 330029881 81758 163516
jawiki msg_resource_links 246 3116 3116
jawiki _pagelinks_new 687 193346150 580038450
jawiki msg_resource 2221841423 1078187 1078187
jawiki user_properties 589599223 1064279 2128558
jawiki ipblocks 1205748595 325768 2280376
jawiki betafeatures_user_counts 860321 9342 9342
jawiki abuse_filter 57681997 111928 335784
jawiki transcode 11040 72 288
jawiki cu_log 13137166 7737 46422
jawiki user_groups 8010807 67 134
jawiki querycache 28857090 17688320 17688320
jawiki _recentchanges_new 0 635085 5080680
jawiki _imagelinks_new 0 8818204 26454612
jawiki math 11116609 46386 46386
jawiki archive 21062482 294793 1179172
jawiki change_tag 4826050810 712727 5701816
jawiki protected_titles 16654757 596 1192
jawiki user_newtalk 10298809 254441 508882
jawiki bv2015_edits 1935451 971567 971567
jawiki _templatelinks_new 73 38055873 114167619
jawiki page_restrictions 217487005 10479 62874
jawiki uploadstash 1096 960 3840
jawiki user_former_groups 0 10 10
jawiki iwlinks 20324904 1117475 3352425
jawiki securepoll_elections 14575 0 0
jawiki global_block_whitelist 6 0 0
jawiki oldimage 9949842 1252 5008
jawiki revision 43110322170 5620562 33723372
jawiki site_identifiers 397166516 12160 36480
jawiki abuse_filter_history 246460 392 1960
jawiki wbc_entity_usage 165604390 93469758 280409274
jawiki log_search 1184489715 235293 470586
jawiki langlinks 837561525 2207687 4415374
jawiki updatelog 0 6 6
jawiki querycachetwo 14573233 1564382 4693146
jawiki accountaudit_login 527951 577418 1154836
jawiki sites 1129090126 2686 24174
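One way to read the two listings above is to compare rows read with rows changed for page_props on each host. The sketch below parses the two page_props lines (copied from the listings) and computes that ratio. The helper names are illustrative, and the ratio is only a rough indicator of how read-heavy the table is over whatever window the counters cover.

```python
from typing import NamedTuple


class TableStats(NamedTuple):
    schema: str
    table: str
    rows_read: int
    rows_changed: int
    rows_changed_x_indexes: int


def parse_stats_line(line: str) -> TableStats:
    """Parse one whitespace-separated row of the table statistics output."""
    schema, table, read, changed, changed_x = line.split()
    return TableStats(schema, table, int(read), int(changed), int(changed_x))


def reads_per_change(stats: TableStats) -> float:
    """Rows read for every row changed (infinite if nothing changed)."""
    if stats.rows_changed == 0:
        return float("inf")
    return stats.rows_read / stats.rows_changed


# The page_props rows as they appear in the two listings above.
PAGE_PROPS_LINES = {
    "db1023": "jawiki page_props 167200493 1745278 4947767",
    "db1061": "jawiki page_props 2570967668 854117 2562351",
}

ratios = {host: reads_per_change(parse_stats_line(line))
          for host, line in PAGE_PROPS_LINES.items()}
for host, ratio in ratios.items():
    print(f"{host}: about {ratio:.0f} rows read per row changed")
```

On these numbers, page_props is read far more often than it is written on both hosts, with db1061 showing a much higher read-to-change ratio than db1023.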

(Note that the row counts do not match because they were taken at different times; in addition, those figures are not updated in real time and are only approximations.)
Proper count:

$ date; mysql -BN information_schema -e "SELECT count(*) FROM jawiki.page_props"
Wed Mar  2 11:48:22 UTC 2016
2866974

Change 274470 had a related patch set uploaded (by Jdlrobson):
Enable reference storage on Japanese Wiki

https://gerrit.wikimedia.org/r/274470

SWAT arranged for Monday.
TODO:

  • Confirm with @jcrespo that he can make that.
  • Me to e-mail ops list.

Change 273492 merged by jenkins-bot:
Capture Japanese wiki article in tests

https://gerrit.wikimedia.org/r/273492

Change 274470 abandoned by Jdlrobson:
Enable reference storage on Japanese Wiki

Reason:
pending further discussion

https://gerrit.wikimedia.org/r/274470