Install a robots.txt file for maps.wikimedia.org
Closed, Resolved, Public

Description

It's my first bug report, so my apologies if I'm doing something wrong...

I saw maps.wikimedia.org map tiles showing up in Google Image Search results for my website, which means that Google's crawler has crawled and indexed your server. I checked for a robots.txt file, but there doesn't seem to be one. I think you could reduce your server load significantly by installing a robots.txt file that disallows all crawlers.
Just like OSM does: https://a.tile.openstreetmap.org/robots.txt
Or OpenTopoMap: https://a.tile.opentopomap.org/robots.txt
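For reference, a disallow-all robots.txt needs only two directives, `User-agent: *` and `Disallow: /`. The contents of the OSM and OpenTopoMap files linked above are not reproduced here. As a minimal sketch, Python's standard-library `urllib.robotparser` can serve as the checker. The sketch confirms that such a file blocks every crawler, image crawlers included, from every URL on the host. The tile path is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Disallow-all rules: every user agent, every path under "/".
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # Hypothetical tile URL, for illustration only.
    tile = "https://maps.wikimedia.org/osm-intl/10/512/340.png"
    for agent in ("Googlebot", "Googlebot-Image", "SomeOtherBot"):
        print(agent, is_allowed(DISALLOW_ALL, agent, tile))
```

The trailing `/` matters. A `Disallow:` line with no path is an empty rule and allows everything.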

Event Timeline

Change 619124 had a related patch set uploaded (by TheDJ; owner: TheDJ):
[mediawiki/services/kartotherian@master] Add a robots.txt to maps.wikimedia.org

https://gerrit.wikimedia.org/r/619124

Change 619124 merged by jenkins-bot:
[mediawiki/services/kartotherian@master] Add a robots.txt to maps.wikimedia.org

https://gerrit.wikimedia.org/r/619124

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:12:57Z] <mbsantos@deploy1001> Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:24:33Z] <mbsantos@deploy1001> Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932) (duration: 11m 36s)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:25:13Z] <mbsantos@deploy1001> Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:26:22Z] <mbsantos@deploy1001> Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932) (duration: 01m 09s)

Jgiannelos claimed this task.
TheDJ subscribed.

curl -v "https://maps.wikimedia.org/robots.txt"

< HTTP/2 200
< date: Thu, 07 Apr 2022 09:34:32 GMT
< access-control-allow-origin: *
< access-control-allow-headers: accept, x-requested-with, content-type
< access-control-expose-headers: etag
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< x-frame-options: SAMEORIGIN
< content-security-policy: default-src 'self'; object-src 'none'; media-src 'none'; style-src 'self'; script-src 'self'; frame-ancestors 'self'
< x-content-security-policy: default-src 'self'; object-src 'none'; media-src 'none'; style-src 'self'; script-src 'self'; frame-ancestors 'self'
< x-webkit-csp: default-src 'self'; object-src 'none'; media-src 'none'; style-src 'self'; script-src 'self'; frame-ancestors 'self'
< user-agent: *
< disallow: /
< server: ATS/8.0.8
< age: 30729
< x-cache: cp3061 miss, cp3057 hit/243
< x-cache-status: hit-front
< server-timing: cache;desc="hit-front", host;desc="cp3057"
< strict-transport-security: max-age=106384710; includeSubDomains; preload
< report-to: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
< nel: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
< accept-ch: Sec-CH-UA-Arch,Sec-CH-UA-Bitness,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-UA-Platform-Version
< permissions-policy: interest-cohort=(),ch-ua-arch=(self "intake-analytics.wikimedia.org"),ch-ua-bitness=(self "intake-analytics.wikimedia.org"),ch-ua-full-version-list=(self "intake-analytics.wikimedia.org"),ch-ua-model=(self "intake-analytics.wikimedia.org"),ch-ua-platform-version=(self "intake-analytics.wikimedia.org")
< x-client-ip: xxxxxxxxxxx
< accept-ranges: bytes
< content-length: 0
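This response shows three problems:

- `user-agent: *` and `disallow: /` arrive as HTTP response headers.
- No `content-type` header is present.
- `content-length: 0` means the body is empty.

Crawlers read robots.txt rules from the body, so they would see an empty file, which restricts nothing. The helper below flags those three symptoms, given a response's headers and body. It is a hypothetical sketch, not an existing Wikimedia tool, and the `diagnose_robots` and `fetch_and_diagnose` names are illustrative.

```python
from __future__ import annotations

from urllib.request import urlopen

# Directive names that belong in the robots.txt body, never in HTTP headers.
ROBOTS_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap"}

def diagnose_robots(headers: dict[str, str], body: str) -> list[str]:
    """Return a list of problems found in a robots.txt response."""
    problems = []
    lowered = {k.lower(): v for k, v in headers.items()}
    leaked = sorted(ROBOTS_DIRECTIVES & set(lowered))
    if leaked:
        problems.append("directives sent as headers: " + ", ".join(leaked))
    if not body.strip():
        problems.append("empty body: crawlers see no rules")
    content_type = lowered.get("content-type", "")
    if not content_type.startswith("text/plain"):
        problems.append("unexpected content-type: " + (content_type or "(missing)"))
    return problems

def fetch_and_diagnose(url: str) -> list[str]:
    """Fetch url over the network and diagnose the response."""
    with urlopen(url) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return diagnose_robots(dict(resp.headers.items()), body)
```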

Change 778327 had a related patch set uploaded (by MSantos; author: MSantos):

[mediawiki/services/kartotherian@master] Fix robots.txt content-type

https://gerrit.wikimedia.org/r/778327

Change 778327 merged by jenkins-bot:

[mediawiki/services/kartotherian@master] Fix robots.txt content-type

https://gerrit.wikimedia.org/r/778327
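Judging only by its title, the merged patch concerned the response's content type. Its actual diff lives in the kartotherian repository and is not reproduced here. As an illustration only, here is a minimal Python standard-library server that returns the rules in the body with an explicit `Content-Type: text/plain` header. The `RobotsHandler`, `start_in_background`, and `fetch` names are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical rules: the disallow-all file requested in this task.
ROBOTS_BODY = b"User-agent: *\nDisallow: /\n"

class RobotsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/robots.txt":
            self.send_error(404)
            return
        # Directives go in the body; headers only describe that body.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(ROBOTS_BODY)))
        self.end_headers()
        self.wfile.write(ROBOTS_BODY)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example

def start_in_background(port: int = 0):
    """Start the server on a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), RobotsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, real_port = server.server_address
    return server, f"http://{host}:{real_port}"

def fetch(url: str):
    """Return (status, content-type, body) for url, ignoring proxy settings."""
    opener = build_opener(ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.status, resp.headers.get("Content-Type"), resp.read()

if __name__ == "__main__":
    server, base = start_in_background()
    print(fetch(base + "/robots.txt"))
    server.shutdown()
    server.server_close()
```

Setting `Content-Length` from the actual body keeps it consistent with what is sent. This avoids the `content-length: 0` seen in the earlier curl output.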

@Jgiannelos: Hi! This task was assigned to you a while ago. Could you maybe share an update?
If this task has been resolved in the meantime: Please update the task status (via Add Action...Change Status in the dropdown menu).
If this task is not resolved and only if you do not plan to work on this task anymore: Please consider removing yourself as assignee (via Add Action...Assign / Claim in the dropdown menu). That would allow others to work on this (in theory), since they would no longer assume that someone is already working on it. Thanks!