Make stats.wikimedia.org point to wikistats2 by default
Closed, ResolvedPublic

Description

This is a bit tricky because we want all of Erik's URLs to be preserved (except the homepage), and we also want to give users a path to the older data. That means we need to add a link to the legacy Wikistats site from the Wikistats 2 site.

Event Timeline

Ottomata triaged this task as Medium priority.
Ottomata added a project: Analytics-Kanban.
Ottomata moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
Ottomata moved this task from Smart Tools for Better Data to Wikistats on the Analytics board.

Pinging @elukey on this task, as we need to go over the couple of options we have for this.

First of all, inside the Virtual host there are Directory entries that are probably super stale:

# Allow CGI scripts for this site
<Directory "/srv/stats.wikimedia.org/cgi-bin">
    Require all granted
    AddHandler cgi-script .pl
</Directory>

ScriptAlias /cgi-bin/ /srv/stats.wikimedia.org/cgi-bin/

<Directory "/srv/stats.wikimedia.org/htdocs/reportcard/staff">
    AllowOverride None
    AuthName "Password protected area"
    AuthType Basic
    AuthUserFile /etc/apache2/htpasswd.stats
    Require user wmf
</Directory>
<Directory "/srv/stats.wikimedia.org/htdocs/reportcard/extended">
    AllowOverride None
    AuthName "Password protected area"
    AuthType Basic
    AuthUserFile /etc/apache2/htpasswd.stats
    Require user internal
</Directory>
<Directory "/srv/stats.wikimedia.org/htdocs/reportcard/pediapress">
    AllowOverride None
    AuthName "Password protected area"
    AuthType Basic
    AuthUserFile /etc/apache2/htpasswd.stats
    Require user pediapress
</Directory>


# Force https and use http auth for geowiki's private data
<Directory "/srv/stats.wikimedia.org/htdocs/geowiki-private">
    RewriteEngine On
    RewriteCond %{HTTP:X-Forwarded-Proto} !https
    RewriteRule ^/(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,E=ProtoRedirect]

    AllowOverride None
    AuthName "Geowiki's 'foundation only' files"
    AuthType Basic
    AuthUserFile "/etc/apache2/htpasswd.stats-geowiki"
    Require valid-user
</Directory>

Can we drop these configs? It would simplify the vhost a lot.

The other idea that Fran and I had for moving forward with the stats.w.o -> v2 transition is the following:

  1. Add something like the following to the httpd config:
RewriteEngine On
RewriteRule ^v2 - [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+) v2/$1 [QSA,L]
  2. Symlink the htdocs index.html (Wikistats 1) to the v2 version

This way (it needs more testing to confirm), httpd should behave as follows:

  • When stats.wikimedia.org is hit, httpd will look for the index.html file in the Wikistats 1 htdocs, follow the symlink to v2 and load that file. The v2 index.html requires some static assets, which mod_rewrite will look for first in the v1 directory and, if they are not found there, in v2. This keeps the current stats.wikimedia.org links working while pointing index.html to the new v2 version.
  • When stats.wikimedia.org/v2 is hit everything will work as expected.
  • When any stats.wikimedia.org non-index.html link is requested, everything should keep working as expected.
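
For illustration, the lookup order described above can be modeled in a few lines of Python. This is only a sketch of the intended behavior, not of how mod_rewrite works internally: the `resolve` helper, the file sets and the `main.js` asset name are all hypothetical, and the symlinked index.html is modeled by mapping `/` straight to the v2 index.

```python
def resolve(path, docroot_files, v2_files):
    """Return the file that would be served for a request path, or None (404).

    docroot_files: files that exist in the Wikistats 1 htdocs (relative paths).
    v2_files: files that exist inside the v2 directory (relative paths).
    """
    rel = path.lstrip("/")
    # The htdocs index.html is assumed to be a symlink to v2's index.html.
    if rel in ("", "index.html"):
        return "v2/index.html"
    # RewriteRule ^v2 - [L]: requests under v2 are left untouched.
    if rel == "v2" or rel.startswith("v2/"):
        inner = rel[3:] or "index.html"
        return "v2/" + inner if inner in v2_files else None
    # RewriteCond !-d / !-f: an existing Wikistats 1 file is served as is.
    if rel in docroot_files:
        return rel
    # RewriteRule ^(.+) v2/$1: otherwise retry the same path under v2.
    return "v2/" + rel if rel in v2_files else None


# Hypothetical file sets; "main.js" stands in for any v2 static asset.
docroot = {"wikispecial/EN/TablesWikipediaCOMMONS.htm"}
v2 = {"index.html", "main.js"}

for url in ("/", "/main.js", "/wikispecial/EN/TablesWikipediaCOMMONS.htm", "/v2/main.js"):
    print(url, "->", resolve(url, docroot, v2))
```

The old Wikistats 1 path wins whenever the file exists, so only paths that are missing from htdocs fall through to v2.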

+1 to dropping those configs. Geowiki data was archived a while back: https://wikitech.wikimedia.org/wiki/Analytics/Archive/Geowiki#On_the_hadoop_cluster_(foundation_internal_only)

And we deprecated anything to do with reportcard a while back too.

Can we also drop the content of the related directories after reviewing the files?

Yes, I think all that functionality is 4+ years old

@Milimetric I'd ask for a quick review of the data contained on thorium's password-protected directories if you have time :)

We have archived all the old geowiki data (geowiki is the old name for the geoeditors data) to the archive Hive database. The tables are:

  • geowiki_archive_active_editors_world
  • geowiki_archive_country
  • geowiki_archive_edit_fraction_city
  • geowiki_archive_edits_country
  • geowiki_archive_monthly_country
  • geowiki_archive_monthly_edits_country

To be very cautious, @Ijon maybe you could let us know if the private data in the geowiki and reportcard folders mentioned above in T237752#5667542 are still used in any way. Otherwise, dropping is ok as far as I know.

Hi, I don't understand this example from https://wikitech.wikimedia.org/w/index.php?title=Analytics/Wikistats/Deprecation_of_Wikistats_1&oldid=1844348:

Example: https://stats.wikimedia.org/v2/#/mediawiki would be moved to https://stats.wikimedia.org/#/mediawiki, but the previous version should also work and redirect to the new one

Why not keep the Wikistats 2 under "/v2" and save ourselves the trouble when "/v3" comes in?

@elukey - we can talk more tomorrow but this solution hides Wikistats 1's index.html, which is how most people navigated the old site. I think we should preserve it. My idea seemed simple to me, let's see what I'm missing:

  • Copy all of / to /v1
  • Deploy wikistats 2 to /
  • redirect /v2 to /

This way, all bookmarks keep working. Relative links from the old index.html keep working. Absolute links work. And links to / just render the new site, where we can link to the old site (probably from the all-metrics page?)

@saper - keeping a project that is no longer updated as the root and leaving the maintained version in a subfolder means nobody will find it.

Yep seems good, the only drawback that I can see is that wikistats 1 links like https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm will need to get the "/v1/" in the URL path right?

That’s the ugly/cute part, since we’re copying and not moving / to /v1, all the urls will work, relative or absolute. This is totally fine with v2 because that’s a client-side single page app that won’t overwrite anything except index.html
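
A small Python sketch of the copy-not-move idea, run against a temporary directory. The file contents are made up, and the copy goes through a staging directory only to avoid copying a tree into itself; it is meant to show that old URLs keep resolving, not how the deployment would actually be done.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "htdocs"
    old_page = Path("wikispecial") / "EN" / "TablesWikipediaCOMMONS.htm"

    # Hypothetical Wikistats 1 content.
    (root / old_page).parent.mkdir(parents=True)
    (root / "index.html").write_text("old index")
    (root / old_page).write_text("old table")

    # Copy (not move) all of / to /v1.
    staging = Path(tmp) / "v1-staging"
    shutil.copytree(root, staging)
    shutil.move(str(staging), str(root / "v1"))

    # Deploy Wikistats 2 to /: the only file it overwrites is index.html.
    (root / "index.html").write_text("new index")

    new_index = (root / "index.html").read_text()
    old_index = (root / "v1" / "index.html").read_text()
    old_url_still_works = (root / old_page).read_text() == "old table"
    v1_url_works = (root / "v1" / old_page).read_text() == "old table"

print(new_index, old_index, old_url_still_works, v1_url_works)
```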

Ahhhh ok ok, then I have another proposal: what if we reuse the mod_rewrite trick above for v1? httpd will look into /v1/ if a file or directory is not found, and then bail out. This would allow us to move stuff to v1 (as opposed to copying it) and deploy v2 into the main dir.

We're targeting January to roll this out

You'll be surprised how many links are there to the old statistics in PDFs, old presentations slides, etc. - please do not break them.

I understand we are going to redirect them to /v1 now?

Please keep /v2 URL. You'll save us the trouble when going for /v3.

@saper the idea is to have the /v1 directory, but httpd will also have a rule to look into it to find assets (html, images, etc..) that will not be reflected in the URL structure. We are aware of the importance of the old URLs, the solution shouldn't break any of them :)

New version proposal:

# Do not try to rewrite anything starting with /v1
# and skip any further check
RewriteRule ^v1 - [L]
# Strip /v2 from any URI and skip any further check
RewriteRule ^v2(.+) $1 [QSA,L]

# If the file/dir is reachable by httpd, serve it
# and skip any further check
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f

# If not, do one last attempt in the /v1 directory
# as it might be related to an old URL
RewriteCond %{DOCUMENT_ROOT}/v1%{REQUEST_URI} -d [or]
RewriteCond %{DOCUMENT_ROOT}/v1%{REQUEST_URI} -f
RewriteRule ^(.+) v1/$1 [QSA,L]
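
To make the intended order of checks easier to follow, here is a rough Python model of the rule chain above. It assumes the rules run in per-directory context (where patterns see the path without the leading slash) and uses plain sets of file names in place of the -d / -f filesystem tests; the `rewrite` helper and the sample paths are hypothetical.

```python
import re


def rewrite(path, docroot_files, v1_files):
    """Return the path httpd would try to serve after the rule chain.

    path is relative to the document root, without the leading slash,
    which is how patterns see it in per-directory context.
    """
    # RewriteRule ^v1 - [L]: leave /v1 requests alone.
    if re.match(r"^v1", path):
        return path
    # RewriteRule ^v2(.+) $1 [QSA,L]: strip the /v2 prefix.
    stripped = re.match(r"^v2(.+)", path)
    if stripped:
        return stripped.group(1).lstrip("/")
    # All four RewriteCond lines guard the final rule together:
    # not found in the docroot AND found under /v1.
    if path not in docroot_files and path in v1_files:
        # RewriteRule ^(.+) v1/$1 [QSA,L]
        return "v1/" + path
    return path


# Hypothetical contents: v2 deployed in the docroot, old site moved to /v1.
docroot = {"index.html"}
v1 = {"index.html", "wikispecial/EN/TablesWikipediaCOMMONS.htm"}

for p in ("v2/index.html", "wikispecial/EN/TablesWikipediaCOMMONS.htm", "index.html"):
    print(p, "->", rewrite(p, docroot, v1))
```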

I was about to prep a puppet change for the above, but then I realized that there is another detail that I didn't take into account, namely that the v2 directory is now a symlink to /srv/src/wikistats-v2/dist. If we check out Wikistats v2 in httpd's root (after moving the current content to v1), we'd end up with something that doesn't really work, because we'd need only the content of the dist directory, not the rest.

I have another idea to propose, that would easily solve the situation:

  • leave the /v2 dir
  • create a /v1 dir with the old html/etc.. content
  • instruct httpd to look into v2 first, then v1 if still needed

The above works (already tested), but it would need an index.html symlink in the Apache root directory pointing to v2's index.html.

This probably needs more discussion, I'll wait for @Milimetric's suggestions/ideas.

Change 563508 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] wikistats: serve the v2 version of the website by default

https://gerrit.wikimedia.org/r/563508

An alternative approach:

  • Let's have wikistats1 and wikistats2 in the same directory
  • Let's create a wikistats deployment git repo that only contains the deployment build and have puppet deploy that one. This implies changing slightly the current wikistats build and docs.

What about something like this:

DocumentRoot /srv/stats.wikimedia.org/htdocs
<Directory "/srv/stats.wikimedia.org/htdocs">
  Options Indexes MultiViews
  AllowOverride None
  Require all granted
</Directory>
Redirect permanent "/index.html" "/v2/index.html"
Alias "/v2"  "/srv/src/wikistats-v2/dist"
<Directory   "/srv/src/wikistats-v2/dist">
  <Files "index.html">
    Header set Cache-Control "max-age=10"
  </Files>
  AllowOverride None
  Require all granted
</Directory>
<Directory "/srv/src/wikistats-v2">
  AllowOverride None
  Require all denied
</Directory>

I love the power mod_rewrite gives me, but after 6 months I always scratch my head again...

Here is the test layout I have used:

/srv/stats.wikimedia.org/htdocs/old-index.html
/srv/stats.wikimedia.org/htdocs/index.html
/srv/stats.wikimedia.org/htdocs/old-stuff.html
/srv/src/wikistats-v2/dist/index.html
/usr/local/etc/apache24/Includes/stats.conf
# find /srv /usr/local/etc/apache24/Includes/stats.conf -type f -exec head -30 {} /dev/null \; | grep -v /dev/null
==> /srv/stats.wikimedia.org/htdocs/old-index.html <==
<html><head><title>Old index</title></head>
<body>
  <h1>Old index</h1>
  <p>Some <a href="old-stuff.html">Old stuff</a> still works at <a href="old-stuff.html">/old-stuff.html</a>.</p>
  <p>You really want to check out or new stuff at <a href="/v2/">/v2/</a>.
</body></html>


==> /srv/stats.wikimedia.org/htdocs/index.html <==
Unreachable

==> /srv/stats.wikimedia.org/htdocs/old-stuff.html <==
<html><head><title>Old stuff</title></head>
<body>
  <p>Old stuff works as before</p>
</body>
</html>

==> /srv/src/wikistats-v2/dist/index.html <==
<html><head><title>New index</title></head>
<body>
   <h1>New index</h1>

   <p>New cool stuff!</p>

   <p>Points to an old <a href="/old-index.html">/old-index.html</a>.</p>
</body></html>

==> /usr/local/etc/apache24/Includes/stats.conf <==
DocumentRoot /srv/stats.wikimedia.org/htdocs
<Directory "/srv/stats.wikimedia.org/htdocs">
  Options Indexes MultiViews
  AllowOverride None
  Require all granted
</Directory>
Redirect permanent "/index.html" "/v2/index.html"
Alias "/v2"  "/srv/src/wikistats-v2/dist"
<Directory   "/srv/src/wikistats-v2/dist">
  <Files "index.html">
    Header set Cache-Control "max-age=10"
  </Files>
  AllowOverride None
  Require all granted
</Directory>
<Directory "/srv/src/wikistats-v2">
  AllowOverride None
  Require all denied
</Directory>

The only change to the old site would be to rename index.html to old-index.html in /srv/stats.wikimedia.org/htdocs, and then link to the old stats from the new app (towards "/old-index.html").

I also assume that symbolic links are no longer needed on Erik's site; if they are, just add FollowSymLinks to the first Options directive.

This change could be made even now: leave out the "Redirect permanent" line, remove the symlink, and everything should work as it does now.

When we switch live, we just add the following line:

Redirect permanent "/index.html" "/v2/index.html"

You might also want to add some cache control parameters for the JS and CSS assets, all inside the /srv/src/wikistats-v2/dist Directory block.
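
As a rough illustration of how this config maps requests, here is a small Python model. It only mirrors the Redirect and Alias lines with an exact match on /index.html, does not check whether files exist, and the `handle` helper is hypothetical.

```python
DOCROOT = "/srv/stats.wikimedia.org/htdocs"
V2_DIST = "/srv/src/wikistats-v2/dist"


def handle(url_path):
    """Return (status, target): a redirect location or the mapped file path.

    A 200 here only means "mapped to this path"; existence is not checked.
    """
    # Redirect permanent "/index.html" "/v2/index.html"
    if url_path == "/index.html":
        return 301, "/v2/index.html"
    # Alias "/v2" "/srv/src/wikistats-v2/dist"
    if url_path == "/v2" or url_path.startswith("/v2/"):
        return 200, V2_DIST + url_path[len("/v2"):]
    # Everything else is served from the Wikistats 1 document root.
    return 200, DOCROOT + url_path


for p in ("/index.html", "/v2/index.html", "/old-index.html", "/old-stuff.html"):
    print(p, "->", handle(p))
```

Dropping the first branch gives the pre-go-live behavior, where /index.html is still served from htdocs.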

Change 564739 had a related patch set uploaded (by saper; owner: saper):
[operations/puppet@production] Wikistats v2 need no symbolic link

https://gerrit.wikimedia.org/r/564739

Change 564745 had a related patch set uploaded (by saper; owner: saper):
[operations/puppet@production] Wikistats v2 go live

https://gerrit.wikimedia.org/r/564745

The solution works, but the complexity does bring a side effect: adding a simple Redirect may cause loops because of the URL rewrite. So we are now thinking about something different. Originally we didn't want to have the "/v2" URL path, so the idea is to just deploy v2 into /htdocs, since there is no file collision (except index.html).

Thanks a lot for the time spent in testing! Really appreciated :)

If we decide to go with this route I'll surely use your code change but need to triple check with my team first :)

Thank you. I think that getting rid of "/v2" is not a very worthy goal in itself. It could be also changed to something else. Unfortunately, I don't have a team to consult :)

Change 570667 had a related patch set uploaded (by Fdans; owner: Fdans):
[analytics/wikistats2@master] Moves all dist assets to ./assets-v2 in the production build

https://gerrit.wikimedia.org/r/570667

For the record this is the approach being followed by the team, as a result of which I've made the above change to the way Wikistats bundles its files:

from @Milimetric :

Ok, here's what we came up with after some more brainstorming:

  • Change webpack build to clean up the /dist directory so it only includes two things:

index.html
assets-v2 (or whatever name)

  • Rename current index.html to index.old.html and add a link to it from new index.
  • Make 4 symlinks in the existing wikistats 1 htdocs:

/index.html -> git/clone/latest/dist/index.html
/assets-v2 -> git/clone/latest/dist/index.html

/v2/index.html -> git/clone/latest/dist/index.html
/v2/assets-v2 -> git/clone/latest/dist/index.html

Change 570667 moves all dist assets to ./assets-v2 in the production build, following this plan.

Hmm, shouldn't /assets-v2 point to git/clone/latest/dist/assets-v2?
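
A minimal Python sketch of the four symlinks, run against a temporary directory and assuming /assets-v2 is meant to point at dist/assets-v2. `git/clone/latest` is the placeholder path from the plan, and `app.js` is a made-up asset name.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp, "git", "clone", "latest", "dist")
    htdocs = Path(tmp, "htdocs")

    # Hypothetical build output: index.html plus one asset directory.
    (dist / "assets-v2").mkdir(parents=True)
    (dist / "index.html").write_text("new index")
    (dist / "assets-v2" / "app.js").write_text("// bundle")

    # Existing Wikistats 1 htdocs, with the old index already renamed.
    (htdocs / "v2").mkdir(parents=True)
    (htdocs / "index.old.html").write_text("old index")

    # Two symlinks at the root and two under /v2.
    for base in (htdocs, htdocs / "v2"):
        os.symlink(dist / "index.html", base / "index.html")
        os.symlink(dist / "assets-v2", base / "assets-v2", target_is_directory=True)

    served = {
        url: (htdocs / url.lstrip("/")).read_text()
        for url in ("/index.html", "/assets-v2/app.js",
                    "/v2/index.html", "/v2/assets-v2/app.js",
                    "/index.old.html")
    }

print(served)
```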

Rename current index.html to index.old.html and add a link to it from new index.

Is this the Wikistats 1 index? For this to work we also need additional Apache configuration. Can we also push the Apache changes so we can code-review them?

I'm not sure if we do? Whatever is needed I'll figure it out, just let me know when we are ready to go and I'll work with fdans (or whoever) to do it!

What is the canonical URL for the new stats after go live? For example, will this be

https://stats.wikimedia.org/v2/#/pl.wikipedia.org/contributing/active-editors/normal|line|2-year|~total|monthly

the canonical URL or something else? I don't think we should alias /v2/index.html to be the same as /index.html, we should alias one and all non-canonical ones should be redirected.

What happens if v3 gets launched? How do we keep the URLs like the above valid?

The url: https://stats.wikimedia.org/v2/#/pl.wikipedia.org/contributing/active-editors/normal|line|2-year|~total|monthly

would keep on working as is.

What happens if v3 gets launched? How do we keep the URLs like the above valid?

I do not think this is a likely scenario in the next few years.

Thanks. I just realized there is something I don't like about those links, and filed T244618 (Canonical wikistats v2 URLs should be permalinks to the period the graph is referring to) for it... possibly a duplicate, though.

I think it's worth thinking through a migration to a hypothetical v3, just to be careful. So I did that today and I reason that a v2 -> v3 migration wouldn't be too bad. Mostly because the v2 urls respect a single hierarchical rule (project[/area[/metric]]). So a new version could change or extend that rule in a controlled way without breaking the old URLs. The output that v2 renders could even be materialized as static HTML and served at the old urls without any performance hit on the new site.
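
To illustrate the hierarchical rule, here is a hedged Python sketch that splits a v2 URL fragment into project, area and metric. The `Route` type and `parse_v2_url` helper are hypothetical and not part of Wikistats; any trailing display-options segment is simply ignored.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Route(NamedTuple):
    project: str
    area: Optional[str] = None
    metric: Optional[str] = None


def parse_v2_url(url):
    """Parse project[/area[/metric]] out of the URL fragment, or return None."""
    segments = [s for s in urlsplit(url).fragment.split("/") if s]
    if not segments:
        return None
    # Keep at most the three hierarchical levels; later segments are ignored.
    return Route(*segments[:3])


print(parse_v2_url("https://stats.wikimedia.org/v2/#/mediawiki"))
print(parse_v2_url(
    "https://stats.wikimedia.org/v2/#/pl.wikipedia.org/contributing/"
    "active-editors/normal|line|2-year|~total|monthly"
))
```

A new version that extends the rule would only need to keep accepting these three levels for old links to stay valid.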

This was an interesting thought exercise. I think we followed the users' wishes for a more dynamic interactive way to explore stats, but that does make it harder to persist this access over time while all the infrastructure is changing behind the UI. Perhaps Druid stops being maintained or we run out of money to upkeep a Cassandra cluster, etc. How would we make sure these URLs don't break? We could freeze them as I suggest above, like in the case of a migration, but we should talk more about the trade-off between dynamic and stable. I hadn't seen it as clearly before, thank you @saper.

FWIW, I think we should just keep /v2 as the canonical URLs. I don't think anyone will care. We can just redirect /index.html to /v2/index.html and be done with it :)

But, I don't care so much, I know folks like the clean versionless URLs too.

Change 570667 merged by jenkins-bot:
[analytics/wikistats2@master] Moves all dist assets to ./assets-v2 in the production build

https://gerrit.wikimedia.org/r/570667

Change 571496 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Symlink wikistats v2 at stats.wikimedia.org/index.html

https://gerrit.wikimedia.org/r/571496

Change 571499 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Redirect stats.wikimedia.org/v2 urls to the docroot

https://gerrit.wikimedia.org/r/571499

Change 571496 merged by Ottomata:
[operations/puppet@production] Symlink wikistats v2 at stats.wikimedia.org/index.html

https://gerrit.wikimedia.org/r/571496

I think we should just keep /v2 as the canonical URLs. I don't think anyone will care.

Agreed.

Change 571499 merged by Ottomata:
[operations/puppet@production] Redirect stats.wikimedia.org/v2 urls to the docroot

https://gerrit.wikimedia.org/r/571499

Heh, Dan doesn't and we had already agreed not to, sooooo TOO LATE! :)

The output that v2 renders could even be materialized as static HTML and served at the old urls without any performance hit on the new site.

Current v2 URLs specify everything after the # (the URL fragment). I think this cannot be dumped and served statically, because browsers do not send the URL fragment to the server...
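
A quick way to see this with Python's standard library: splitting one of the v2 URLs shows that the path the server receives is just /v2/, while everything that identifies the graph lives in the fragment. The `request_target` variable is only an illustration of what ends up in the HTTP request line.

```python
from urllib.parse import urlsplit

url = ("https://stats.wikimedia.org/v2/#/pl.wikipedia.org/contributing/"
       "active-editors/normal|line|2-year|~total|monthly")
parts = urlsplit(url)

# Only the path and query are sent to the server; the fragment stays in the browser.
request_target = parts.path + ("?" + parts.query if parts.query else "")

print("sent to server:", request_target)
print("kept client-side:", parts.fragment)
```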

This was an interesting thought exercise.

Yes, it is good to think about the current project going away at some point...

Change 571726 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix stats.wikimedia.org/v2 redirect

https://gerrit.wikimedia.org/r/571726

Change 571726 merged by Ottomata:
[operations/puppet@production] Fix stats.wikimedia.org/v2 redirect

https://gerrit.wikimedia.org/r/571726

Change 563508 abandoned by Elukey:
wikistats: serve the v2 version of the website by default

https://gerrit.wikimedia.org/r/563508

Change 564745 abandoned by saper:
Wikistats v2: go live

Reason:
Another solution seems to be chosen

https://gerrit.wikimedia.org/r/564745

Change 564739 abandoned by saper:
Wikistats v2 need no symbolic link

Reason:
Another solution seems to be chosen

https://gerrit.wikimedia.org/r/564739