
Consider addition of Wikidata tracking
Open, Low, Public, Spike

Description

Currently, Wikidata is explicitly excluded from hashtag monitoring in the tool because the volume of data is orders of magnitude higher than from all other projects combined. The overall rate for all other projects is on the order of dozens of hashtag uses per hour - for Wikidata this is more like thousands. If we want to avoid the data overload issues we had with the old hashtags tool we need to consider how best to proceed.

Tracking Wikidata edits has value - many campaigns involve Wikidata contributions. However, editors have many automated or semi-automated ways to make edits whose summaries include a hashtag. From a cursory glance, QuickStatements appears to be the biggest source. We could exclude it specifically, but that might be confusing.
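To make the "exclude it specifically" option concrete, here is a minimal sketch of filtering hashtags by an exclusion list when parsing an edit summary. It is not the tool's actual code: `EXCLUDED_HASHTAGS`, `extract_hashtags`, and `should_record` are hypothetical names, and the regular expression is deliberately naive (it would also match things like section anchors containing `#`).

```python
import re

# Hypothetical exclusion list; "quickstatements" is the only tag named in this task.
EXCLUDED_HASHTAGS = {"quickstatements"}

# Naive pattern: a '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(summary, excluded=EXCLUDED_HASHTAGS):
    """Return the lowercased hashtags in an edit summary, minus excluded ones."""
    tags = []
    for match in HASHTAG_RE.finditer(summary):
        tag = match.group(1).lower()
        # Skip excluded tags and avoid recording the same tag twice.
        if tag not in excluded and tag not in tags:
            tags.append(tag)
    return tags


def should_record(summary, excluded=EXCLUDED_HASHTAGS):
    """An edit is recorded only if at least one non-excluded hashtag remains."""
    return bool(extract_hashtags(summary, excluded))
```

Under this sketch, an edit tagged only with #quickstatements would be skipped, while an edit carrying both #quickstatements and a campaign hashtag would still be recorded under the campaign hashtag. Whether that behaviour is less confusing than a blanket exclusion is exactly the open question here.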

Event Timeline

Samwalton9-WMF moved this task from Incoming to Features on the Hashtags board.

For more specific data, I found a range of ~20-150 hashtags per hour for all other projects, and around 2000 during the hour I tested for Wikidata.

From running some Quarry queries it looks like ~30% of all Wikidata edits are tagged with #quickstatements - nearly 7 million in the past month!
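For a rough sense of scale, the monthly figure can be turned into an hourly average. This is a back-of-the-envelope sketch only: it assumes a 30-day month and takes the ~7 million and ~30% figures above at face value.

```python
# Figures quoted in the comment above (approximate).
QUICKSTATEMENTS_EDITS_PER_MONTH = 7_000_000
QUICKSTATEMENTS_SHARE = 0.30

# Assumption: a 30-day month.
DAYS_IN_MONTH = 30


def hourly_average(monthly_total, days=DAYS_IN_MONTH):
    """Average number of edits per hour over the month."""
    return monthly_total / (days * 24)


# Average #quickstatements edits per hour.
qs_per_hour = hourly_average(QUICKSTATEMENTS_EDITS_PER_MONTH)

# Total monthly Wikidata edits implied by the ~30% share.
implied_total_edits = QUICKSTATEMENTS_EDITS_PER_MONTH / QUICKSTATEMENTS_SHARE

print(round(qs_per_hour))          # about 9,700 per hour
print(round(implied_total_edits))  # about 23 million per month
```

This is an average across the whole month, so the rate in any single hour may differ considerably.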

Measuring impact of tools may, besides tracking 'human' campaigns, be an interesting and valid use case for the Hashtags tool, so I'd be hesitant to drop QuickStatements edits (or any tool or bot edit) from the results by default.

See also T207370: Statistics of number of Wikidata edits with Magnus Manske's tools

> Measuring impact of tools may, besides tracking 'human' campaigns, be an interesting and valid use case for the Hashtags tool

Yeah that's absolutely the case - I've restricted bot edits because you can easily track those anyway by looking at a single account's contributions, but tracking tool use is something I wouldn't want to remove support for.

It might be the case that with additional resources the tool could happily support Wikidata edits too, but I can see that it's currently already sluggish with the flickr2commons results, which have hit 110,000 entries.

Restricted Application changed the subtype of this task from "Task" to "Spike". · Jul 19 2019, 9:57 AM