Investigation: Measure load times when the complete lexeme data is loaded to display lexeme in the statement
Closed, Resolved, Public


Experiment plan:

  • create 500 lexemes (with random lemmas etc.)
    • each lexeme should contain 3-5 forms (with random representations)
    • each lexeme should have between 5 and 100 statements, each referencing some of those 500 lexemes (self-references should probably be limited)
  • for each lexeme, do the following:
    • do action=purge on its page
    • start the counter
    • load the page
    • stop the counter
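
A minimal sketch of the per-lexeme measurement loop above, using only the Python standard library. The `API_URL` and `PAGE_URL` values are placeholders for a local test wiki (assumptions, not taken from this task), and `purge_page`, `load_page` and `timed_load` are hypothetical helper names. The purge is sent as a POST to the MediaWiki Action API with `action=purge`, and only the HTML fetch is timed.

```python
import time
import urllib.parse
import urllib.request

# Assumptions: placeholder URLs for a local test wiki, adjust as needed.
API_URL = "http://localhost/w/api.php"
PAGE_URL = "http://localhost/wiki/"


def purge_page(title):
    """Send action=purge for one page (the Action API expects a POST)."""
    data = urllib.parse.urlencode(
        {"action": "purge", "titles": title, "format": "json"}
    ).encode()
    with urllib.request.urlopen(API_URL, data=data) as resp:
        resp.read()


def load_page(title):
    """Fetch the page HTML only; JS and other assets are not requested."""
    url = PAGE_URL + urllib.parse.quote(title, safe=":/")
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def timed_load(title, purge=purge_page, load=load_page, clock=time.perf_counter):
    """Purge the page, then return the seconds taken by a single page load."""
    purge(title)          # do action=purge on its page
    start = clock()       # start the counter
    load(title)           # load the page
    return clock() - start  # stop the counter
```

The `purge`, `load` and `clock` parameters exist so the timing logic can be exercised without a running wiki, e.g. `timed_load("Lexeme:L1")` against a real instance or with stub functions in a dry run.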

The test result would be the page load time per lexeme statement count class (e.g. when referencing 1-3 lexemes per page the load takes x secs, when referencing 10-15 lexemes, y secs).
If it seems to make the test more "reliable", the experiment could be repeated and the average of the results reported.
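
As an illustration of the reporting described here, the following sketch groups measurements into statement count classes and reports the average, minimum and maximum load time per class. The function name `summarize_by_class`, the class boundaries and the sample values are all made up for the example; repeated runs can simply be appended to the same sample list before summarizing.

```python
from statistics import mean


def summarize_by_class(samples, classes):
    """Group (statement_count, load_time_secs) samples into count classes.

    samples: list of (statement_count, load_time_secs) tuples
    classes: list of inclusive (low, high) statement count ranges
    Returns a dict keyed by (low, high); empty classes are left out.
    """
    summary = {}
    for low, high in classes:
        times = [secs for count, secs in samples if low <= count <= high]
        if times:
            summary[(low, high)] = {
                "count": len(times),
                "avg": mean(times),
                "min": min(times),
                "max": max(times),
            }
    return summary


# Illustrative values only, not measured data.
example = summarize_by_class(
    [(1, 0.5), (3, 1.5), (12, 2.0)],
    [(1, 3), (10, 15), (20, 28)],
)
```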

WMDE-leszek triaged this task as Normal priority.
WMDE-leszek claimed this task. Edited Feb 28 2018, 1:17 PM
WMDE-leszek moved this task from Backlog to Review on the Wikidata-Sprint-2018-02-28 board.

The actual test went like this (it diverged slightly from the description above, but the test set is still fine IMO):

  • 473 lexemes
  • each of them having between 1 and 28 statements referencing lexemes
  • each of them having between 3 and 5 random forms

Bonus: created one lexeme with 1000 statements as an edge case.

Measured page load times for each of the above test lexemes. Results are collected in the table below.
Only the time to get the HTML was measured, i.e. the time needed to load JS was not measured (it does not seem related to the issue at hand, nor does it seem to have a significant performance impact).

1-28 statements on Lexeme
  AVG load time: 0.747
  MIN load time: 0.548
  MAX load time: 1.925
1000 statements on Lexeme
  load time: 4.947

For the record, the full data (load time for each lexeme, including its size etc.) is published as P6755.

Pinging @Jonas, @thiemowmde, @Lydia_Pintscher, @daniel to have a look at the results. The current load times seem acceptable to me. Of course, once the load on the system is higher (more users and more data), we would have to implement a more performant way of displaying lexemes (see T187323). For now, the existing approach seems good enough.

WMDE-leszek updated the task description. Feb 28 2018, 1:17 PM

Looks good, thanks! Since this was brought up, here are some numbers from the Wikidata Item "Germany" for comparison:

  • 622 statements
  • 684 qualifiers
  • 579 non-empty references, containing 1789 snaks (that's an average of 3 snaks per reference)
  • 3095 value snaks, including qualifiers and references snaks
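
The numbers above can be cross-checked with a few lines of arithmetic. This sketch assumes that each statement contributes exactly one main snak and each qualifier counts as one snak, which is how the total of 3095 appears to be composed.

```python
statements = 622        # assumption: one main snak per statement
qualifiers = 684        # assumption: each qualifier is one snak
references = 579        # non-empty references
reference_snaks = 1789  # snaks contained in those references

# Total value snaks, including qualifier and reference snaks.
total_snaks = statements + qualifiers + reference_snaks

# Average number of snaks per non-empty reference (roughly 3).
snaks_per_reference = reference_snaks / references
```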
WMDE-leszek closed this task as Resolved. Mar 2 2018, 9:01 AM