doc.wikimedia.org displays "403 Forbidden" for coverage sub directories
Closed, Resolved · Public

Description

  1. https://doc.wikimedia.org/cover/
  2. Select "mediawiki-core" (https://doc.wikimedia.org/cover/mediawiki-core/)

This used to load dir.php via htaccess handler (/cover/.htaccess) which then redirects to /master/php/. Similar auto-discover redirects exist for many other paths here (every library, every branch, and from there to programming languages - or browser names for js coverage).

Instead, this now renders a "403 Forbidden" error.

Krinkle created this task. Nov 15 2016, 4:32 AM
Restricted Application added a subscriber: Aklapper. Nov 15 2016, 4:32 AM
hashar added a subscriber: hashar. Nov 15 2016, 1:10 PM

Trying to access https://doc.wikimedia.org/cover/mediawiki-core/ yields the following in the Apache error log:

[autoindex:error] AH01276: Cannot serve directory /srv/org/wikimedia/doc/cover/mediawiki-core/:
No matching DirectoryIndex (index.html,index.cgi,index.pl,index.php,index.xhtml,index.htm) found,
and server-generated directory index forbidden by Options directive

Maybe a consequence of the switch from Apache 2.2 to 2.4. Digging a bit in Apache, the cause is bdaa8aab50d4422b73bf8f4bba40764919b94f06 for the doc.wikimedia.org virtualhost: there is no longer any AllowOverride All on /, which is a good thing, but there should be one for /srv/org/wikimedia/doc.
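
As a rough sketch of what such an override could look like (the directory path is taken from the error log above; allowing All rather than a narrower set such as FileInfo is an assumption, not the content of the actual patch):

```apache
# Hypothetical sketch, not the merged configuration.
# Apache 2.4 defaults to AllowOverride None, so .htaccess files are
# ignored unless a <Directory> block opts back in.
<Directory /srv/org/wikimedia/doc>
    AllowOverride All
</Directory>
```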

Change 321650 had a related patch set uploaded (by Hashar):
contint: remove Apache 2.2 compatibility config

https://gerrit.wikimedia.org/r/321650

Change 321651 had a related patch set uploaded (by Hashar):
contint: allow .htaccess on doc.wikimedia.org

https://gerrit.wikimedia.org/r/321651

hashar claimed this task. Nov 15 2016, 1:20 PM

https://gerrit.wikimedia.org/r/321650 cleans up the Apache 2.2 backward-compatibility config.

https://gerrit.wikimedia.org/r/321651 makes Apache honor .htaccess files for doc.wikimedia.org.

hashar triaged this task as High priority. Nov 15 2016, 1:21 PM
hashar moved this task from To Triage to In-progress on the Release-Engineering-Team board.

https://gerrit.wikimedia.org/r/321651 makes Apache honor .htaccess files for doc.wikimedia.org.

Daniel Zahn pointed out it would probably be better to have the rewrite rules directly in the Apache conf and managed by puppet.

Change 321650 merged by Dzahn:
contint: remove Apache 2.2 compatibility config

https://gerrit.wikimedia.org/r/321650

Change 322019 had a related patch set uploaded (by Dzahn):
contint: move .htaccess content for doc/integration to puppet

https://gerrit.wikimedia.org/r/322019

Change 322020 had a related patch set uploaded (by Dzahn):
delete .htaccess files for doc/integration

https://gerrit.wikimedia.org/r/322020

Change 321651 abandoned by Hashar:
contint: allow .htaccess on doc.wikimedia.org

Reason:
See instead:

https://gerrit.wikimedia.org/r/#/c/322020/
https://gerrit.wikimedia.org/r/#/c/322019/

https://gerrit.wikimedia.org/r/321651

Change 322019 merged by Dzahn:
contint: move .htaccess content for doc/integration to puppet

https://gerrit.wikimedia.org/r/322019

Change 322020 merged by jenkins-bot:
delete .htaccess files for doc/integration

https://gerrit.wikimedia.org/r/322020

Status

  • The rewrite rules have been moved from integration/docroot.git to operations/puppet.git
  • AllowOverride is still None
  • Configuration for Apache 2.2 has been removed
  • DirectoryIndex has been set to use index.php / index.html and some rewrite rules have been dropped.

/cover/mediawiki-core/ does not work since it has neither index.html nor index.php and the configuration no longer rewrites to the fallback dir.php. Looks like we might be able to use FallbackResource (https://httpd.apache.org/docs/2.4/en/rewrite/remapping.html#fallback-resource) instead of the rewrite rules.

Will have to reproduce/test it first.

Status [..]

/cover/mediawiki-core/ does not work since it has neither index.html nor index.php and the configuration no longer rewrites to the fallback dir.php. Looks like we might be able to use FallbackResource (https://httpd.apache.org/docs/2.4/en/rewrite/remapping.html#fallback-resource) instead.

The puppet patch does set dir.php as the resource to use when accessing a directory-like URL that doesn't exist on doc.wikimedia.org.

# DirectoryIndex would be neater, but doesn't work properly under Apache 2.2
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_FILENAME}/index.php !-f
RewriteCond %{REQUEST_FILENAME}/index.html !-f
RewriteRule .* dir.php

I'm curious why that doesn't work. Maybe it's in the error logs?

Also since we no longer support Apache 2.2, per my above inline comment, DirectoryIndex might work.
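
For illustration, a DirectoryIndex-based variant might look roughly like this. The /dir.php URL path is a placeholder assumption, since the thread does not state where dir.php is served from, and this sketch has not been tested against this host:

```apache
# Hypothetical sketch: try index.php, then index.html, then fall back to
# a directory-listing script. DirectoryIndex accepts a local URL path
# for the last entry; /dir.php here is an assumed location.
<Directory /srv/org/wikimedia/doc>
    DirectoryIndex index.php index.html /dir.php
</Directory>
```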

Bump. Links still broken.

hashar added a comment. Jan 5 2017, 6:23 PM

Indeed, and I forgot about this task, sorry. The last status stands: gotta reproduce and figure out the proper Apache config :-/

The really fun one is https://doc.wikimedia.org/cover/visualeditor/ – you need to magically know to go to https://doc.wikimedia.org/cover/visualeditor/Chrome%2055.0.2883%20(Linux%200.0.0)/ (which changes from time to time as Chrome versions upgrade).

greg added a subscriber: greg. Jan 5 2017, 7:40 PM

Aside from automatic redirects (which is only one of dir.php's features), the other feature (and its primary purpose) was listing the available subdirectories in a clean way. This is naturally broken now as well.

Which means going to https://doc.wikimedia.org/mediawiki-core/ (as opposed to /cover/) also generates a 403 Forbidden - instead of listing the available subdirectories ("master", "REL1_28" etc.)

This means that our versioned documentation is now completely undiscoverable.

Krinkle added a comment. Edited Jan 10 2017, 10:53 PM

The change that caused this breakage was the move from htaccess configuration into puppet (which coincided with an upgrade from Apache 2.2 to Apache 2.4).

Change 331558 had a related patch set uploaded (by Krinkle):
contint: Re-add dir.php to doc.wm.org DirectoryIndex

https://gerrit.wikimedia.org/r/331558

Change 331559 had a related patch set uploaded (by Krinkle):
doc: Remove unused htaccess file

https://gerrit.wikimedia.org/r/331559

Change 331559 merged by jenkins-bot:
doc: Remove unused htaccess file

https://gerrit.wikimedia.org/r/331559

Mentioned in SAL (#wikimedia-operations) [2017-01-12T08:59:30Z] <hashar> disabling puppet on contint1001 to live hack apache conf ( T150727 )

hashar added a comment. Edited Jan 16 2017, 12:36 PM

Finally went back to this task. I have set a test environment back up on my local machine and reproduced the issue locally.

I had some old conf left behind which was working and had:

<Directory />
    AllowOverride All
</Directory>

And I note we still have a .htaccess in the docroot:

./org/wikimedia/doc/cover/.htaccess
# DirectoryIndex would be neater, but doesn't work properly under Apache 2.2
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_FILENAME}/index.php !-f
RewriteCond %{REQUEST_FILENAME}/index.html !-f
RewriteRule .* dir.php

In a VirtualHost, %{REQUEST_FILENAME} is not yet mapped to a file path; it still holds the request URI. From https://httpd.apache.org/docs/current/en/mod/mod_rewrite.html#rewritecond:

If used in per-server context (i.e., before the request is mapped to the filesystem) SCRIPT_FILENAME and REQUEST_FILENAME cannot contain the full local filesystem path since the path is unknown at this stage of processing. Both variables will initially contain the value of REQUEST_URI in that case.

In order to obtain the full local filesystem path of the request in per-server context, use an URL-based look-ahead %{LA-U:REQUEST_FILENAME} to determine the final value of REQUEST_FILENAME.
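
Applied to the rules above, the look-ahead form the documentation describes might look like the following in per-server (VirtualHost) context. This is an untested sketch; the /dir.php target and the [PT] flag are assumptions for illustration:

```apache
# Hypothetical sketch for VirtualHost context, where REQUEST_FILENAME
# would otherwise still hold the request URI.
RewriteEngine On
# LA-U resolves the URL via an internal subrequest to obtain the
# filesystem path before the conditions are tested.
RewriteCond %{LA-U:REQUEST_FILENAME} -d
RewriteCond %{LA-U:REQUEST_FILENAME}/index.php !-f
RewriteCond %{LA-U:REQUEST_FILENAME}/index.html !-f
# /dir.php is an assumed URL path; [PT] hands the result back to URL mapping.
RewriteRule ^ /dir.php [PT]
```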

The reason for the change is that we previously had a .htaccess, which is evaluated in Directory context and so gives us the file path. The rewrite rules need to be wrapped inside Directory statements.

Change 332385 had a related patch set uploaded (by Hashar):
contint: import rewrite rule from integration/docroot

https://gerrit.wikimedia.org/r/332385

Change 332386 had a related patch set uploaded (by Hashar):
doc: migrate .htaccess to puppet

https://gerrit.wikimedia.org/r/332386

Change 332386 abandoned by Hashar:
doc: migrate .htaccess to puppet

Reason:
Bah dupe of af45ba0e4cf49c61c57ac3136e373a0a3c8384ae

https://gerrit.wikimedia.org/r/332386

Change 332385 merged by Alexandros Kosiaris:
contint: import rewrite rule from integration/docroot

https://gerrit.wikimedia.org/r/332385

Change 332432 had a related patch set uploaded (by Hashar):
contint: properly set REQUEST_FILENAME in vhost

https://gerrit.wikimedia.org/r/332432

Change 332432 merged by Alexandros Kosiaris:
contint: properly set REQUEST_FILENAME in vhost

https://gerrit.wikimedia.org/r/332432

hashar closed this task as Resolved. Jan 17 2017, 8:45 AM

So we had two issues:

  • We forgot to move the rewrite rules from /cover/.htaccess
  • REQUEST_FILENAME in a vhost initially contains REQUEST_URI instead of the file path. That is fixed by enclosing the rewrite rules in a <Directory /> statement so Apache resolves the path.
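
The second point can be sketched as follows. The directory path is taken from the error log earlier in this task; the exact block used in the merged puppet change is not reproduced here, so treat this as an illustration only:

```apache
# Hypothetical sketch: the same conditions as the old /cover/.htaccess,
# placed in per-directory context so REQUEST_FILENAME is a real path.
# mod_rewrite in <Directory> context also needs FollowSymLinks (or
# SymLinksIfOwnerMatch) enabled for that directory.
<Directory /srv/org/wikimedia/doc>
    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteCond %{REQUEST_FILENAME}/index.php !-f
    RewriteCond %{REQUEST_FILENAME}/index.html !-f
    # A relative target is resolved against the <Directory> path.
    RewriteRule .* dir.php
</Directory>
```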

Deployed and it looks fine to me now?

Worth noting, Timo had a puppet patch on https://gerrit.wikimedia.org/r/#/c/331558/ as well

Change 331558 abandoned by Krinkle:
contint: Re-add dir.php to doc.wm.org DirectoryIndex

Reason:
Yep. https://gerrit.wikimedia.org/r/#/c/332432/ fixed it 3 days ago.

https://gerrit.wikimedia.org/r/331558

  • We forgot to move the rewrite rules from /cover/.htaccess

Yeah. Though it wasn't really forgotten. Daniel and you moved it correctly last year in rOPUP3a2be, but that was removed later that day in favour of DirectoryIndex (rOPUP057bc7c), which didn't work (my bad for recommending that in the first place).

Commit 3d30700 re-imported it with a minor fixup in e009f5df19. Thanks!
