
Create robots.txt policy for datasets
Closed, Resolved · Public · 1 Estimated Story Point

Description

Our datasets get crawled all the time, and some of them are a few MB in size. We could disallow all crawling of datasets to help reduce bandwidth usage. But is there any good reason to have them crawled? We can link to specific folders from the wikis if we want them to be searchable on the web, right?
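
For reference, a minimal robots.txt along these lines could look like the following sketch (the /datasets/ path is an assumption for illustration; the actual directory layout on analytics.wikimedia.org may differ):

    # Block all crawlers from the datasets directory (path assumed for illustration)
    User-agent: *
    Disallow: /datasets/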

Event Timeline

Is there any reason we are actually concerned about bandwidth usage?

@Peachey88 not particularly; this is low priority, but it just seems like a bad idea to waste bandwidth for no reason, especially on larger files like datasets. A crawler downloads the whole file just to conclude: oh, not HTML, moving on.

Nuria set the point value for this task to 1.

Change 345634 had a related patch set uploaded (by Milimetric):
[analytics/analytics.wikimedia.org@master] Prevent datasets from being crawled

https://gerrit.wikimedia.org/r/345634

Change 345634 merged by Milimetric:
[analytics/analytics.wikimedia.org@master] Prevent datasets from being crawled

https://gerrit.wikimedia.org/r/345634