Aspect values are poorly documented in API help pages
Closed, Resolved | Public | 3 Estimated Story Points

Description

The automatic docs pages are:
https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bwbentityusage and https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bwblistentityusage

wbeuaspect

    Only return entity IDs that used this aspect. 
    Values (separate with | or alternative): C, D, L, O, S, T, X
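
For orientation, here is a minimal sketch of how such a filtered request could be assembled. The page title "Berlin" and the helper name build_entity_usage_url are illustrative assumptions; only the module name, the wbeuaspect parameter, and the pipe-separated value list come from the help text above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_entity_usage_url(titles, aspects):
    """Build a prop=wbentityusage query URL filtered by aspect codes.

    Hypothetical helper for illustration; multiple values are joined
    with "|", as the help text describes.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "wbentityusage",
        "titles": "|".join(titles),
        "wbeuaspect": "|".join(aspects),
    }
    return API_ENDPOINT + "?" + urlencode(params)


# "Berlin" is an assumed example title, not taken from this task.
url = build_entity_usage_url(["Berlin"], ["C", "L"])
print(url)
# Decoding the query string again shows the pipe-separated aspect list.
print(parse_qs(urlsplit(url).query)["wbeuaspect"])
```

The request itself is easy to build; the hard part is knowing what C and L stand for, which is the point of this task.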

It's unclear what these letters mean unless you already know, or dig into the code to find out:

	/**
	 * Usage flag indicating that the entity's sitelinks were used as links.
	 * This would be the case when generating language links or sister links from
	 * an entity's sitelinks, for display in the sidebar.
	 *
	 * @note: This does NOT cover sitelinks used in wikitext (e.g. via Lua).
	 *        Use OTHER_USAGE for that.
	 */
	public const SITELINK_USAGE = 'S';
	/**
	 * Usage flag indicating that one of the entity's labels were used.
	 * This would be the case when showing the label of a referenced entity. Note that
	 * label usage is typically tracked with a modifier specifying the label's language code.
	 */
	public const LABEL_USAGE = 'L';
	/**
	 * Usage flag indicating that one of the entity's descriptions were used.
	 * This would be the case when showing the descriptions of a referenced entity. Note that
	 * descriptions usage is typically tracked with a modifier specifying the language code.
	 */
	public const DESCRIPTION_USAGE = 'D';
	/**
	 * Usage flag indicating that the entity's local page name was used,
	 * i.e. the title of the local (client) page linked to the entity.
	 * This would be the case when linking a referenced entity to the
	 * corresponding local wiki page.
	 * This can be thought of as a special kind of sitelink usage,
	 * specifically for the sitelink for the local wiki.
	 */
	public const TITLE_USAGE = 'T';
	/**
	 * Usage flag indicating that certain statements (identified by their property id)
	 * from the entity were used.
	 * This currently implies that we also have an OTHER_USAGE or an ALL_USAGE
	 * for the same entity (STATEMENT_USAGE is never used alone).
	 */
	public const STATEMENT_USAGE = 'C';
	/**
	 * Usage flag indicating that any and all aspects of the entity
	 * were (or may have been) used.
	 */
	public const ALL_USAGE = 'X';
	/**
	 * Usage flag indicating that some aspect of the entity was changed
	 * which is not covered by any other usage flag (except "all"). That is,
	 * the specific usage flags together with the "other" flag are equivalent
	 * to the "all" flag ( S + T + L + O = X or rather O = X - S - T - L ).
	 *
	 * Due to migration efforts, this is currently used redundantly with
	 * STATEMENT_USAGE or DESCRIPTION_USAGE, as they were only added later.
	 */
	public const OTHER_USAGE = 'O';

Docs of these aspects also exist at https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_usagetracking.html

Some sort of better inline documentation in the API help output would be useful.

Acceptance criteria:

  • API docs are improved to explain what each available aspect means
  • documentation has been double-checked and improved if necessary to ensure it is up-to-date
    • in particular, most of the documentation for the "Other" aspect seems out of date as it seems to actually include Alias-usage and existence checks, but not Description usage
  • cross-checked this documentation with https://grafana-next.wikimedia.org/d/000000160/wikidata-entity-usage?orgId=1&refresh=5m
  • Backported to appropriate/supported release branches of Wikibase

Event Timeline

Restricted Application added a subscriber: Aklapper. May 17 2021, 10:19 PM
Addshore subscribed.

They are also documented in our docs https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_usagetracking.html

Perhaps we should just link here?
Or have some sort of special page / help page embedded in wikibase.get that will appear on all wikibases?

Something that appears in, or is linked from, the API help pages, and is therefore by design also on Special:ApiSandbox, feels more streamlined than something separate and disjointed, or something that requires other discovery to find.

While it's extra docs to keep up to date (though I imagine they don't change often), copy-pasting the bulk of this section (minor tweaks may be needed) into the description (seemingly apihelp-query+wbentityusage-param-aspect and apihelp-query+wblistentityusage-param-aspect, as they both use the "Only return entity IDs that used this aspect." string) would probably work best:

Entity usage on client pages is tracked using the following codes (each representing one aspect):

    sitelinks (S) - The entity's sitelinks are used.
    label (L.xx) - The entity's label in language xx is used.
    description (D.xx) - The entity's description in language xx is used.
    title (T) - The title of the local page corresponding to the entity is used.
    statements (C) - Certain statements (identified by their property id) from the entity are used.
    other (O) - Something else about the entity is used. This currently implicates statement and description usage.
    all (X) - All aspects of an entity are or may be used.
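
As a rough illustration of the code scheme listed above, the following sketch splits a stored usage key such as L.en or C.P793 into its aspect letter and optional modifier. The names ASPECT_NAMES, parse_aspect_key and describe are hypothetical, and the mapping simply mirrors the list; it is not Wikibase code.

```python
# Aspect letters and their names, copied from the list above.
ASPECT_NAMES = {
    "S": "sitelinks",
    "L": "label",
    "D": "description",
    "T": "title",
    "C": "statements",
    "O": "other",
    "X": "all",
}


def parse_aspect_key(key):
    """Split 'L.en' into ('L', 'en'); keys without a modifier yield None."""
    aspect, _, modifier = key.partition(".")
    if aspect not in ASPECT_NAMES:
        raise ValueError(f"unknown aspect code: {aspect!r}")
    return aspect, modifier or None


def describe(key):
    """Return a readable name for a usage key, e.g. 'label (en)'."""
    aspect, modifier = parse_aspect_key(key)
    name = ASPECT_NAMES[aspect]
    return f"{name} ({modifier})" if modifier else name


for key in ["S", "L.en", "C.P793", "O"]:
    print(key, "->", describe(key))
```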

While it would be nice if the API allowed separate documentation of each value in such a list, that's obviously out of scope for this request, and this task would become dependent on it. Given the engineering effort, it's likely to end up low on the list of things to be done, but I will file a bug.

The first similar example I can find in the API is for "errorformat":

	"apihelp-main-param-errorformat": "Format to use for warning and error text output.\n; plaintext: Wikitext with HTML tags removed and entities replaced.\n; wikitext: Unparsed wikitext.\n; html: HTML.\n; raw: Message key and parameters.\n; none: No text output, only the error codes.\n; bc: Format used prior to MediaWiki 1.29. <var>errorlang</var> and <var>errorsuselocal</var> are ignored.",

Which results in something like:

[Screenshot 2021-06-07 at 14.38.14.png]

Which obviously also makes it translatable, whereas a link to https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_usagetracking.html is not.
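
To make the suggestion concrete, here is a sketch of how a message in the same "; value: description" definition-list style could be put together for the aspect parameter. The value texts are shortened from the usage tracking docs quoted earlier and are assumptions, not the wording that was eventually merged; build_param_message is a hypothetical helper.

```python
import json

INTRO = "Only return entity IDs that used this aspect."

# Assumed wording, adapted from the usage tracking docs quoted in this task.
ASPECT_DESCRIPTIONS = [
    ("S", "The entity's sitelinks are used."),
    ("L", "The entity's label is used."),
    ("D", "The entity's description is used."),
    ("T", "The title of the local page corresponding to the entity is used."),
    ("C", "Certain statements from the entity are used."),
    ("O", "Something else about the entity is used."),
    ("X", "All aspects of an entity are or may be used."),
]


def build_param_message(intro, values):
    """Append one '; code: text' definition-list line per value."""
    return intro + "".join(f"\n; {code}: {text}" for code, text in values)


message = build_param_message(INTRO, ASPECT_DESCRIPTIONS)
# Serialise the way it would sit in an i18n JSON file, newlines escaped.
print(json.dumps({"apihelp-query+wbentityusage-param-aspect": message}, indent=1))
```

Keeping the descriptions in the message itself means they go through the normal translation workflow, unlike an external link.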

Reedy renamed this task from prop=wbentityusage wbeuaspect values are poorly documented to Aspect values are poorly documented in API help pages.Jun 8 2021, 5:34 PM
Reedy updated the task description. (Show Details)
Addshore set the point value for this task to 3.

Change 700505 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/Wikibase@master] Add documentation for API param values

https://gerrit.wikimedia.org/r/700505

Change 700505 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Add documentation for API param values

https://gerrit.wikimedia.org/r/700505

Change 701417 had a related patch set uploaded (by Reedy; author: Michael Große):

[mediawiki/extensions/Wikibase@REL1_35] Add documentation for API param values

https://gerrit.wikimedia.org/r/701417

Change 701418 had a related patch set uploaded (by Reedy; author: Michael Große):

[mediawiki/extensions/Wikibase@REL1_36] Add documentation for API param values

https://gerrit.wikimedia.org/r/701418

Change 701417 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@REL1_35] Add documentation for API param values

https://gerrit.wikimedia.org/r/701417

Change 701418 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@REL1_36] Add documentation for API param values

https://gerrit.wikimedia.org/r/701418

Michael removed Michael as the assignee of this task.
Michael subscribed.

Maybe, this shouldn't have gone into "Verification" just yet, because the other ACs are genuinely still open, given that the meaning of the "Other" aspect has changed at some point in the past and all documentation except the API docs is (now) incorrect.

(Unassigning myself as I'm not working on it right now, but happy to pick it up again later.)

in particular, most of the documentation for the "Other" aspect seems out of date as it seems to actually include Alias-usage and existence checks, but not Description usage

Description usage is not "Other" usage AFAIK. Am I missing something very obvious?

I'm sorry, this confusion is probably due to my bad English skills. What I wanted to say:
Based on the code, "Other" usage includes Alias-usage, Statement-usage and entity existence checks.
However, most documentation, like the one quoted in the description of this task, is apparently outdated/wrong, because it wrongly claims that Description usage were part of the Other usage.

Thanks for clarifying. I'm not native either, that definitely contributed to the confusion. Now I have more questions!

Based on the code, "Other" usage includes Alias-usage, Statement-usage and entity existence checks.

Alias usage and entity existence make sense, but statement usage has its own aspect (C). Maybe we are still subscribing and it hasn't been properly cleaned up in the code? If that's the case, removing it would be a huge performance gain.

However, most documentation, like the one quoted in the description of this task, is apparently outdated/wrong, because it wrongly claims that Description usage were part of the Other usage.

I think I know where this is coming from, Description used to be part of O until it got its own aspect.

It looks like we really still track “other” usage for statement usage:

EntityAccessor::getEntityStatements()
		$propertyId = new PropertyId( $propertyIdSerialization );
		$this->usageAccumulator->addStatementUsage( $entityId, $propertyId );
		$this->usageAccumulator->addOtherUsage( $entityId );
StatementTransclusionInteractor::render()
		// Currently statement usage implies other usage.
		$this->usageAccumulator->addOtherUsage( $entityId );

		// If the entity doesn't exist, we just want to resolve the property id
		// for usage tracking purposes, so don't let the exception bubble up.
		$shouldThrow = $entity !== null;

		$propertyId = $this->resolvePropertyId( $propertyLabelOrId, $shouldThrow );

		if ( $propertyId ) {
			// XXX: This means we only track a statement usage if the property id /label
			// can be resolved. This requires the property to exist!
			$this->usageAccumulator->addStatementUsage( $entityId, $propertyId );
		}

We probably want to stop doing that, yeah.
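
The redundancy can be shown with a toy model. This is only a Python sketch of the behaviour in the PHP snippets above, not the Wikibase implementation; the class, the function, and the also_track_other switch are hypothetical.

```python
class UsageAccumulator:
    """Toy stand-in that records (entity id, aspect key) pairs."""

    def __init__(self):
        self.usages = set()

    def add_statement_usage(self, entity_id, property_id):
        self.usages.add((entity_id, f"C.{property_id}"))

    def add_other_usage(self, entity_id):
        self.usages.add((entity_id, "O"))


def get_entity_statements(accumulator, entity_id, property_id, also_track_other=True):
    """Mimic the tracking side effect of a statement access.

    also_track_other=True models the behaviour quoted above;
    False models what stopping it would look like.
    """
    accumulator.add_statement_usage(entity_id, property_id)
    if also_track_other:
        accumulator.add_other_usage(entity_id)


current = UsageAccumulator()
get_entity_statements(current, "Q1513315", "P793")
print(sorted(current.usages))

proposed = UsageAccumulator()
get_entity_statements(proposed, "Q1513315", "P793", also_track_other=False)
print(sorted(proposed.usages))
```

With the extra "O" row, a page tracked this way is subscribed to more changes of the entity than just the one property it actually reads.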

Apart from that, alias usage (mw.wikibase.entity::maskEntityTables()) and entity existence (EntityAccessor::entityExists()) seem to be the only places where we use “other” usage, as Michael says.

Let me double check with Marius but after that, let's remove this.

Marius said we can remove it, let me make a patch on this.

Brief check for usages on enwiki:

MariaDB [enwiki_p]> SELECT eu_aspect, eu_page_id FROM wbc_entity_usage WHERE eu_entity_id = 'Q1513315' ORDER BY eu_page_id DESC LIMIT 13;
+-----------+------------+
| eu_aspect | eu_page_id |
+-----------+------------+
| C.P793    |   67023127 |
| O         |   67023127 |
| C.P793    |   66824389 |
| O         |   66824389 |
| C.P18     |   60532910 |
| C.P2670   |   60101418 |
| L.en      |   60101418 |
| O         |   60101418 |
| C.P793    |   60101418 |
| L.en      |   57316060 |
| C.P2670   |   57316060 |
| C.P793    |   57316060 |
| O         |   57316060 |
+-----------+------------+
13 rows in set (0.00 sec)

This supports the idea that we’re adding lots of “other” usages with “statement” usages.

I then did a more thorough check on cawiki, and there are ~12k distinct pages that have some C% usage but no O usage:

SELECT DISTINCT u1.eu_page_id FROM wbc_entity_usage AS u1 LEFT JOIN wbc_entity_usage AS u2 ON u1.eu_page_id = u2.eu_page_id AND u2.eu_aspect = 'O' WHERE u1.eu_aspect LIKE 'C%' AND u2.eu_page_id IS NULL;

I guess those must be coming from accesses to the claims in Lua (which, unlike the getEntityStatements function, doesn’t add an “other” usage, if I read the code correctly). But that’s a fairly small proportion compared to the ~614k total distinct pages with C% usage on cawiki.
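
The anti-join above can be tried locally on a small sample. The sketch below loads the 13 enwiki rows shown earlier into an in-memory SQLite table reduced to two columns (an assumption; the real wbc_entity_usage table has more columns and runs on MariaDB) and executes the same query. The function name is hypothetical.

```python
import sqlite3

# The 13 rows from the enwiki output above (eu_aspect, eu_page_id).
ROWS = [
    ("C.P793", 67023127), ("O", 67023127),
    ("C.P793", 66824389), ("O", 66824389),
    ("C.P18", 60532910),
    ("C.P2670", 60101418), ("L.en", 60101418),
    ("O", 60101418), ("C.P793", 60101418),
    ("L.en", 57316060), ("C.P2670", 57316060),
    ("C.P793", 57316060), ("O", 57316060),
]

# Same query as above: pages with some C% usage but no matching O row.
QUERY = """
SELECT DISTINCT u1.eu_page_id
FROM wbc_entity_usage AS u1
LEFT JOIN wbc_entity_usage AS u2
  ON u1.eu_page_id = u2.eu_page_id AND u2.eu_aspect = 'O'
WHERE u1.eu_aspect LIKE 'C%' AND u2.eu_page_id IS NULL
"""


def pages_with_statement_but_no_other(rows):
    """Run the anti-join against a throwaway in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wbc_entity_usage (eu_aspect TEXT, eu_page_id INTEGER)")
    conn.executemany("INSERT INTO wbc_entity_usage VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(pages_with_statement_but_no_other(ROWS))  # [60532910]
```

In this sample only page 60532910 (the C.P18 row) has statement usage without an accompanying "O" row.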

Change 703192 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/extensions/Wikibase@master] Remove subscribing to other aspect for entity usage

https://gerrit.wikimedia.org/r/703192

👆 This patch was moved to another task 👆

(I'll be finishing my work on this, tackling the remaining ACs)

Change 703445 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/Wikibase@master] Update documentation of "Other" usage aspect

https://gerrit.wikimedia.org/r/703445

Change 703445 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Update documentation of "Other" usage aspect

https://gerrit.wikimedia.org/r/703445

Change 703414 had a related patch set uploaded (by Reedy; author: Michael Große):

[mediawiki/extensions/Wikibase@REL1_35] Update documentation of "Other" usage aspect

https://gerrit.wikimedia.org/r/703414

Change 703415 had a related patch set uploaded (by Reedy; author: Michael Große):

[mediawiki/extensions/Wikibase@REL1_36] Update documentation of "Other" usage aspect

https://gerrit.wikimedia.org/r/703415

Most of the work was done by Michael; I did the last pushes.

Change 703415 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@REL1_36] Update documentation of "Other" usage aspect

https://gerrit.wikimedia.org/r/703415

Change 703414 merged by Ladsgroup:

[mediawiki/extensions/Wikibase@REL1_35] Update documentation of "Other" usage aspect

https://gerrit.wikimedia.org/r/703414