Install a robots.txt file for maps.wikimedia.org
Open, Needs Triage, Public

Description

This is my first bug report, so my apologies if I'm doing something wrong...

I saw maps.wikimedia.org map tiles showing up in the Google Image search results for my website, which means that the Google crawler has crawled and indexed your server. I looked for a robots.txt file, but you do not seem to have one. I think you could reduce your server load significantly by installing a robots.txt file that disallows all crawlers.
Just like OSM does: https://a.tile.openstreetmap.org/robots.txt
Or OpenTopoMap: https://a.tile.opentopomap.org/robots.txt
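
To illustrate the effect of such a file, here is a minimal sketch that uses Python's standard `urllib.robotparser` to evaluate a blanket disallow rule. The rule text is an assumption about what a "disallow all crawlers" file would contain, and the tile URL and the `is_allowed` helper are only illustrative.

```python
from urllib.robotparser import RobotFileParser

# Assumed content of a "disallow all crawlers" robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative tile URL; the exact path layout is an assumption.
    tile_url = "https://maps.wikimedia.org/osm-intl/0/0/0.png"
    print(is_allowed("Googlebot-Image", tile_url))
```

A crawler that honors robots.txt would skip every path on the host under this rule, while an empty file leaves everything crawlable.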

Event Timeline

Change 619124 had a related patch set uploaded (by TheDJ; owner: TheDJ):
[mediawiki/services/kartotherian@master] Add a robots.txt to maps.wikimedia.org

https://gerrit.wikimedia.org/r/619124

Change 619124 merged by jenkins-bot:
[mediawiki/services/kartotherian@master] Add a robots.txt to maps.wikimedia.org

https://gerrit.wikimedia.org/r/619124

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:12:57Z] <mbsantos@deploy1001> Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:24:33Z] <mbsantos@deploy1001> Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932) (duration: 11m 36s)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:25:13Z] <mbsantos@deploy1001> Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:26:22Z] <mbsantos@deploy1001> Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932) (duration: 01m 09s)

Jgiannelos claimed this task.
TheDJ added a subscriber: TheDJ.

curl -v "https://maps.wikimedia.org/robots.txt"

< HTTP/2 200
< date: Thu, 07 Apr 2022 09:34:32 GMT
< access-control-allow-origin: *
< access-control-allow-headers: accept, x-requested-with, content-type
< access-control-expose-headers: etag
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< x-frame-options: SAMEORIGIN
< content-security-policy: default-src 'self'; object-src 'none'; media-src 'none'; style-src 'self'; script-src 'self'; frame-ancestors 'self'
< x-content-security-policy: default-src 'self'; object-src 'none'; media-src 'none'; style-src 'self'; script-src 'self'; frame-ancestors 'self'
< x-webkit-csp: default-src 'self'; object-src 'none'; media-src 'none'; style-src 'self'; script-src 'self'; frame-ancestors 'self'
< user-agent: *
< disallow: /
< server: ATS/8.0.8
< age: 30729
< x-cache: cp3061 miss, cp3057 hit/243
< x-cache-status: hit-front
< server-timing: cache;desc="hit-front", host;desc="cp3057"
< strict-transport-security: max-age=106384710; includeSubDomains; preload
< report-to: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
< nel: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
< accept-ch: Sec-CH-UA-Arch,Sec-CH-UA-Bitness,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-UA-Platform-Version
< permissions-policy: interest-cohort=(),ch-ua-arch=(self "intake-analytics.wikimedia.org"),ch-ua-bitness=(self "intake-analytics.wikimedia.org"),ch-ua-full-version-list=(self "intake-analytics.wikimedia.org"),ch-ua-model=(self "intake-analytics.wikimedia.org"),ch-ua-platform-version=(self "intake-analytics.wikimedia.org")
< x-client-ip: xxxxxxxxxxx
< accept-ranges: bytes
< content-length: 0
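
The trace above appears to show the robots.txt directives arriving as response headers (`user-agent: *`, `disallow: /`) with `content-length: 0` and no `content-type`, so a crawler would receive an empty file. The following is a minimal sketch of a check for those symptoms; the `diagnose_robots_response` helper and its messages are hypothetical and not part of kartotherian.

```python
from typing import Dict, List

# Directive names that belong in the robots.txt body, not in HTTP headers.
ROBOTS_DIRECTIVES = ("user-agent", "disallow")


def diagnose_robots_response(headers: Dict[str, str], body: str) -> List[str]:
    """Return a list of problems found in a robots.txt HTTP response."""
    names = {name.lower(): value for name, value in headers.items()}
    problems = []

    # Directives that leaked into the header section.
    leaked = [d for d in ROBOTS_DIRECTIVES if d in names]
    if leaked:
        problems.append("directives sent as headers: " + ", ".join(leaked))

    # Crawlers expect a plain-text document.
    if not names.get("content-type", "").lower().startswith("text/plain"):
        problems.append("content-type is not text/plain")

    # An empty body carries no rules at all.
    if not body.strip():
        problems.append("empty body")

    return problems


if __name__ == "__main__":
    # Subset of the headers from the trace above, with the empty body.
    observed = {"user-agent": "*", "disallow": "/", "content-length": "0"}
    for problem in diagnose_robots_response(observed, ""):
        print(problem)
```

A response with a `text/plain` content type and the directives in the body would pass this check with no problems reported.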

Change 778327 had a related patch set uploaded (by MSantos; author: MSantos):

[mediawiki/services/kartotherian@master] Fix robots.txt content-type

https://gerrit.wikimedia.org/r/778327

Change 778327 merged by jenkins-bot:

[mediawiki/services/kartotherian@master] Fix robots.txt content-type

https://gerrit.wikimedia.org/r/778327