
Track all Wikidata metrics currently gathered in Graphite rather than SQL and TSVs
Closed, Resolved · Public

Description

Using Graphite and statsd would be much simpler.
As far as I understand, it will work for every use case we have so far.

See T117732 regarding an analytics specific instance.
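
To make the proposal concrete, here is a rough sketch of what pushing a single metric value could look like, either straight to Graphite's plaintext port or through a statsd client. The host names, port numbers, and metric path are placeholders for illustration, not the actual production setup.

```python
# Illustrative only: hosts, ports, and metric paths are placeholders.
import socket
import time

def send_to_graphite(path, value, timestamp=None,
                     host="graphite.example.org", port=2003):
    """Send one data point using Graphite's plaintext line protocol."""
    ts = int(timestamp if timestamp is not None else time.time())
    line = "%s %s %d\n" % (path, value, ts)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(line.encode("ascii"))

send_to_graphite("wikidata.site_stats.total_items", 15000000)

# The statsd route (https://pypi.org/project/statsd/) would instead be
# something like:
#   import statsd
#   client = statsd.StatsClient("statsd.example.org", 8125, prefix="wikidata")
#   client.gauge("site_stats.total_items", 15000000)
```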

Event Timeline

Addshore claimed this task.
Addshore raised the priority of this task from to Needs Triage.
Addshore updated the task description.
Addshore subscribed.

Expanding on the use cases for a metrics storage backend seems appropriate here.

I think that Wikidata content metrics favor long-term retention (i.e. forever), because their purpose is to evaluate dynamics over both short and long intervals. Since the content is always changing, a past state cannot be recreated from live data, so the value of these historical measurement "snapshots" is quite high. These old data points are never archived away either, and they must remain retrievable without loading a dump or running some offline process.

In contrast, ops metrics are much more focused on the present and/or recent state.

Thus, two different use cases exist here. If the proposal to use Graphite can provide a long-term (non-decaying) storage method, then it should work for both. If not, then something else (like OpenTSDB/HBase) should be implemented.

> If the proposal to use Graphite can provide a long-term (non-decaying) storage method, then it should work for both.

Retention and resolution changes/decay are both configurable.

Simply setting the retention to 1d:100y would/should keep daily metrics for a period of 100 years; a sketch of such a rule follows.
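
For illustration, such a rule could be expressed in Graphite's storage-schemas.conf roughly like this (the section name and metric pattern below are made up, not the real configuration):

```
# Hypothetical storage-schemas.conf entry: one point per day, kept for
# 100 years, with no downsampling to a coarser archive.
[wikidata_daily]
pattern = ^wikidata\.
retentions = 1d:100y
```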

Thanks for expanding on that. Here's my opinion, as the person who's been looking after our graphite stack:

  • Graphite isn't really a data warehouse, so I wouldn't recommend it as the primary storage for the verbatim/authoritative data.
    • Though saving data in Graphite for graphing etc. while also archiving it elsewhere would, I think, cater for this case.
  • It is possible, as @Addshore suggests, to not downsample daily data for a really long time; e.g. keeping a daily metric for 100y takes 438028 bytes on disk per metric (see the sketch after this comment).
  • An analytics Graphite instance could help, though of course it means maintaining that instance too.
  • If the volume of metrics isn't very high (I have no idea of the order of magnitude, though), then using the main Graphite instance is certainly less overhead. To give an example, 10k distinct metrics would be no problem; 100k would be a problem at the moment.

hope that helps!
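
As a sanity check on that 438028-byte figure, here is a small sketch (an addition for illustration) that reproduces it from the standard whisper on-disk layout: a 16-byte metadata header, 12 bytes of archive info per archive, and 12 bytes per stored point, with 100 years of daily points counted as 100 × 365 = 36,500.

```python
# Reproduce the quoted whisper file size for a single 1d:100y metric.
# Assumes the standard whisper layout; leap days are ignored, which is
# what makes the result land exactly on 438028.
METADATA_SIZE = 16      # aggregation method, max retention, xFilesFactor, archive count
ARCHIVE_INFO_SIZE = 12  # offset, seconds per point, point count
POINT_SIZE = 12         # 4-byte timestamp + 8-byte double value

def whisper_file_size(points_per_archive):
    """Bytes on disk for a whisper file with the given archive sizes."""
    return (METADATA_SIZE
            + ARCHIVE_INFO_SIZE * len(points_per_archive)
            + POINT_SIZE * sum(points_per_archive))

print(whisper_file_size([100 * 365]))  # -> 438028
```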

For reference, and regarding our worries about Graphite losing data / data being removed, please see this crude script:

https://github.com/addshore/graphite-backup
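
That repository aside, the general idea of snapshotting a metric's history out of Graphite can be sketched against the render API. This is not the linked script, and the host and metric name are placeholders:

```python
# Sketch: export one metric's full history via the Graphite render API
# as a JSON backup. Host and metric name are placeholders.
import json
import urllib.request

GRAPHITE_URL = "https://graphite.example.org"   # placeholder
METRIC = "wikidata.site_stats.total_items"      # placeholder

url = "%s/render?target=%s&format=json&from=-100y" % (GRAPHITE_URL, METRIC)
with urllib.request.urlopen(url) as resp:
    series = json.loads(resp.read())

with open("backup.json", "w") as out:
    json.dump(series, out)
```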

Change 253571 had a related patch set uploaded (by Addshore):
Social metrics to graphite

https://gerrit.wikimedia.org/r/253571

Change 253572 had a related patch set uploaded (by Addshore):
Convert site_stats to graphite

https://gerrit.wikimedia.org/r/253572

Change 253573 had a related patch set uploaded (by Addshore):
Convert getclaims stats to graphite

https://gerrit.wikimedia.org/r/253573

Change 253571 merged by jenkins-bot:
Social metrics to graphite

https://gerrit.wikimedia.org/r/253571

Change 253572 merged by Addshore:
Convert site_stats to graphite

https://gerrit.wikimedia.org/r/253572

Change 253573 merged by Addshore:
Convert getclaims stats to graphite

https://gerrit.wikimedia.org/r/253573

Resolved using the patch sets linked above.

Now to import the old data into graphite.
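
Backfilling old data is possible because Graphite's plaintext protocol accepts an explicit timestamp per point (unlike statsd, which timestamps on arrival). A rough sketch of what replaying old TSV rows could look like; the file layout, metric path, and host are assumptions for illustration only:

```python
# Hypothetical backfill sketch: replay old (date, value) TSV rows into
# Graphite with their historical timestamps. File format, metric path,
# and host are illustrative assumptions.
import csv
import socket
import time

GRAPHITE_HOST = "graphite.example.org"            # placeholder
GRAPHITE_PORT = 2003
METRIC_PATH = "wikidata.site_stats.total_items"   # placeholder

def backfill(tsv_path):
    lines = []
    with open(tsv_path, newline="") as f:
        for date_str, value in csv.reader(f, delimiter="\t"):
            ts = int(time.mktime(time.strptime(date_str, "%Y-%m-%d")))
            lines.append("%s %s %d\n" % (METRIC_PATH, value, ts))
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT)) as sock:
        sock.sendall("".join(lines).encode("ascii"))

backfill("old_metrics.tsv")
```

Note that the historical points are only stored if the metric's whisper retention window actually covers those dates, so the storage schema needs to be in place before importing.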