
Acquire enwp.org
Open, In Progress, Low, Public, Feature

Description

Feature summary (what you would like to be able to do and where):
Acquire the enwp.org URL shortener to ensure its continued functionality considering its widespread use. I talked to Thomas Wang, the owner of the domain, who expressed interest in donating it to Wikipedia. This will not vary significantly in functionality from en-wp.com and en-wp.org, which WMF owns, and should be a simple low-cost rewrite.

The redirect rules are:

enwp.org → https://en.wikipedia.org/wiki$uri
c.enwp.org → https://commons.wikimedia.org/wiki$uri
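As a minimal sketch of the two rules above, assuming `$uri` is the request path including its leading slash (as in nginx). The table and function names are illustrative and are not taken from Thomas's actual configuration:

```python
from typing import Optional

# Illustrative only: mirrors the two redirect rules listed above.
# Assumes the request path always starts with "/" (like nginx's $uri).
REDIRECT_BASES = {
    "enwp.org": "https://en.wikipedia.org/wiki",
    "c.enwp.org": "https://commons.wikimedia.org/wiki",
}


def redirect_target(host: str, uri: str) -> Optional[str]:
    """Return the redirect URL for a request, or None for an unknown host."""
    base = REDIRECT_BASES.get(host.lower())
    if base is None:
        return None
    # The path is appended unchanged; no lookup table of pages is involved.
    return base + uri


print(redirect_target("enwp.org", "/URL_shortening"))
print(redirect_target("c.enwp.org", "/File:Example.jpg"))
```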

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

  • enwp.org is used widely in IRC and on 3rd party platforms to link to Wikipedia
  • It is hosted by a non-WMF third party
  • Wikipedia links are long

Benefits (why should this be implemented?):

  • One of the largest Wikipedia URL shorteners continues to function
  • Wikipedia users can rely on it not being hijacked in the future
  • Thomas is willing to give it to WMF for free

Event Timeline

BCornwall moved this task from Backlog to Scheduled incidental work on the Traffic board.
BCornwall subscribed.
>>> len("enwp.org")
8
>>> len("w.wiki")
6

Not to mention that in most cases w.wiki will generate shorter URLs than enwp.org.

I think we'd be better off focusing on improving the w.wiki service and getting people to swap over to it instead of also taking on enwp.org as a one-off special case.

+1 to what Legoktm said, it was quite the effort to introduce our very own https://w.wiki which is already official and shorter. Introducing a second "2nd tier" redirector also operated by us seems like future tech debt and URLs will have to be maintained forever if we don't make the effort now to replace them.

Also agreed irt phasing out enwp.org in favour of w.wiki (be nice if you could do w.wiki/en/A_page or something, but that's for another time..) however that desire isn't mutually exclusive to having the domain donated to us — I'd personally much rather we decide to break a load of enwp.org links now, than the domain expire one day and be used for something malicious..

@Legoktm The viability of this as a URL shortener is not relevant to the discussion. I am not proposing that we create a URL shortener, I am proposing that we take one on that already exists so that we can keep Wikipedia users safe. Additionally, enwp.org and w.wiki serve objectively different purposes. enwp is not just about making short links, it's about making shorter links than normal enwiki without having to go to a website to do so -- you can just type it out.

@Dzahn We already have (at least) two 2nd tier redirectors in the form of en-wp.org and en-wp.com, which to my knowledge were bought exclusively because WMF didn't have enwp.org. The technical effort is beyond minimal to keep this running, and it's in the interest of the general public. People are going to use enwp.org either way, in my opinion it is better for them to use it with the safety of a WMF implementation behind it.

@TheresNoTime Speaks my point eloquently. It's a user safety issue. Whether or not you want enwp.org to be used, it should be under the WMF umbrella because Wikipedia users do use it and will continue to.

I'd personally much rather we decide to break a load of enwp.org links now, than the domain expire one day and be used for something malicious..

While that's a valid point, I would predict that we would then never want to break them, and the end result will be something like "write a bunch of complex rewrite rules to rewrite old URL shortener URLs to new URL shortener URLs and maintain them forever".

Yea, get the domain just to hold on to it, but don't underestimate the effort needed to actually maintain and keep the URLs and make it clear early if that's really the goal or just to prevent hijacking. Given what kind of a project w.wiki was I somehow have doubts that this is just "a simple low-cost rewrite" without ongoing maintenance cost.

better for them to use it with the safety of a WMF implementation behind it.

Sure, that's true. Not arguing with that.

The technical effort is beyond minimal to keep this running,

But at the same time I don't think this is going to be the case.

"write a bunch of complex rewrite rules to rewrite old URL shortener URLs to new URL shortener URLs and maintain them forever".

The rewrite rules in question are:

  • enwp.org/$1 -> en.wikipedia.org/wiki/$1
  • c.enwp.org/$1 -> commons.wikimedia.org/$1

Please do not exaggerate the complexity of this task to make a point

Please do not exaggerate the complexity of this task to make a point

That is if you want to keep the old URLs around forever and not migrate them. You will also need to have that added as a virtual host either on the cluster or on the ncredir service and get it added to TLS certs, to name just one thing.

I don't want you to think I am trying to block you to make a point when it was meant just as a light warning to not underestimate it.

P.S.

So, no worries, I am not blocking anything here in any way.

To get the domain you'll have to talk to Legal regardless and I don't think anyone argued we should not accept the domain.

It does make sense though to talk about this stuff early before we start using it, in my humble opinion. And with that.. I respectfully step back.

P.S.P.S.

I feel like I have never seen a "one off"-thing that actually stayed a one-off thing for 10 years.

That is if you want to keep the old URLs around forever and not migrate them.

enwp.org has never been and ideally will never be responsible for things such as Wikipedia redirects, naming scheme changes, etc. The only guarantee of permanence it makes is that it'll redirect to the corresponding wiki page on enwiki, or commons page on c.enwp.org.

You will also need to have that added as a virtual host either on the cluster or on the ncredir service and get it added to TLS certs, to name just one thing.

Great point -- but I think the cost here is still well worth the reward. In my opinion, doing this (mostly one off) work to guarantee this lives as long as WMF sees fit for security is worth it so that 10 years from now someone linking to enwp.org doesn't suddenly get faced with some spam/malicious website.

This is not a declaration that this is a good idea, but if I understand the current enwp.org behavior, I think this would replace it:

diff --git i/modules/ncredir/files/nc_redirects.dat w/modules/ncredir/files/nc_redirects.dat
index dc52c09013..bbcc31d1d3 100644
--- i/modules/ncredir/files/nc_redirects.dat
+++ w/modules/ncredir/files/nc_redirects.dat
@@ -115,6 +115,7 @@ rewrite *wikimediaenterprise.com    https://enterprise.wikimedia.com

 rewrite *en-wp.com  https://en.wikipedia.org
 rewrite *en-wp.org  https://en.wikipedia.org
+rewrite enwp.org    https://en.wikipedia.org/wiki

 rewrite wikipedia.com   https://www.wikipedia.org
 rewrite wikipedia.net   https://www.wikipedia.org

...however that desire isn't mutually exclusive to having the domain donated to us — I'd personally much rather we decide to break a load of enwp.org links now, than the domain expire one day and be used for something malicious..

...I am proposing that we take one on that already exists so that we can keep Wikipedia users safe.

Thomas has run this service for over a decade now, has he said he no longer wants to?

More philosophically, if we have reached the point where individual users can no longer run independent services and they must be taken over by the WMF, we're in a really bad place. I can think of dozens of off-wiki hosted tools that are important and in theory would stop working if the operator decided to stop supporting them. Consolidation of literally everything under WMF control is not healthy.

Please do not exaggerate the complexity of this task to make a point

To be clear, the technical work is more than rewrites. It's DNS, registration, TLS, HSTS, monitoring, logging, etc., etc. Of course, all of these are trivial individually because we have a decent framework, but it quickly adds up over the years to keep things updated with the latest stuff.

This is not a declaration that this is a good idea, but if I understand the current enwp.org behavior, I think this would replace it:

Almost. I am editing this diff manually so don't expect it to be a functional diff.

diff --git i/modules/ncredir/files/nc_redirects.dat w/modules/ncredir/files/nc_redirects.dat
index dc52c09013..bbcc31d1d3 100644
--- i/modules/ncredir/files/nc_redirects.dat
+++ w/modules/ncredir/files/nc_redirects.dat
@@ -115,6 +115,8 @@ rewrite *wikimediaenterprise.com    https://enterprise.wikimedia.com

 rewrite *en-wp.com  https://en.wikipedia.org
 rewrite *en-wp.org  https://en.wikipedia.org
+rewrite enwp.org    https://en.wikipedia.org/wiki
+rewrite c.enwp.org  https://commons.wikimedia.org

 rewrite wikipedia.com   https://www.wikipedia.org
 rewrite wikipedia.net   https://www.wikipedia.org

Thomas has run this service for over a decade now, has he said he no longer wants to?

No. I reached out to Thomas because I believed this was in the interest of user security. Thomas is willing to donate it to the WMF and said the reason he didn't before was because WMF became unresponsive about the idea.

More philosophically, if we have reached the point where individual users can no longer run independent services and they must be taken over by the WMF, we're in a really bad place. I can think of dozens of off-wiki hosted tools that are important and in theory would stop working if the operator decided to stop supporting them. Consolidation of literally everything under WMF control is not healthy.

I think this spans a bit outside of "literally everything" -- enwp.org is widely used by Wikipedia editors in Wikipedia-adjacent channels to refer to Wikipedia. I do not have exact statistics for you, but if you're on IRC long enough you'll see it quite a few times. If you take most third-party projects, like the tools users create, they are already hosted on WMF infrastructure because there's a security advantage. That does not mean WMF controls it, ideally the usage of the domain still comes down to the democratic processes we are all familiar with. However, given wide usage, I think this is a stand-out case.

To be clear, the technical work is more than rewrites. It's DNS, registration, TLS, HSTS, monitoring, logging, etc., etc. Of course, all of these are trivial individually because we have a decent framework, but it quickly adds up over the years to keep things updated with the latest stuff.

Understood, but as you mentioned, these are trivial in the scope of WMF, and the functionality can entirely be implemented within ncredir without any actual code change. It's not like we're spinning up a new service. We're adding 2 lines of config (see above).

I think this spans a bit outside of "literally everything" -- enwp.org is widely used by Wikipedia editors in Wikipedia-adjacent channels to refer to Wikipedia. I do not have exact statistics for you, but if you're on IRC long enough you'll see it quite a few times.

I'm in plenty of IRC channels :-) I suspect it's used in *some* channels to refer to the *English* Wikipedia, but people could just...stop doing that?

If you take most third-party projects, like the tools users create, they are already hosted on WMF infrastructure because there's a security advantage.

{{cn}}

That does not mean WMF controls it, ideally the usage of the domain still comes down to the democratic processes we are all familiar with. However, given wide usage, I think this is a stand-out case.

Transferring the domain to the WMF means the WMF literally owns and controls it, there's no other way about it. Nor is Wikimedia Tech a democracy.

To be clear, the technical work is more than rewrites. It's DNS, registration, TLS, HSTS, monitoring, logging, etc., etc. Of course, all of these are trivial individually because we have a decent framework, but it quickly adds up over the years to keep things updated with the latest stuff.

Understood, but as you mentioned, these are trivial in the scope of WMF, and the functionality can entirely be implemented within ncredir without any actual code change. It's not like we're spinning up a new service. We're adding 2 lines of config (see above).

Well no, I said that each one on its own is trivial, together they're not. Certainly not in 2 lines of config. Best of luck!

I suspect it's used in *some* channels to refer to the *English* Wikipedia, but people could just...stop doing that?

I don't see a good reason to potentially end up with lots of LANGUAGECODEwp.TLD style domains to be owned and maintained by WMF...

Proposing to decline.

@Aklapper, I would agree to decline this but for the line mentioning that enwp.org is in widespread use. If it is (it'd be good to see some stats, @violetwtf!) then it might be worth accepting the donation. Not wanting to open any floodgates to all LANGwp.tld is perfectly reasonable.

the line mentioning that enwp.org is in widespread use. If it is (it'd be good to see some stats

If nobody provides such stats I again propose to decline this task. Folks are welcome to use https://w.wiki/ instead.

One more domain/line in an existing list like https://gerrit.wikimedia.org/r/c/operations/puppet/+/1069643/12/modules/ncredir/files/nc_redirects.dat won't make a big difference either way.

But we also shouldn't introduce a lot of new short URL domains.

If it's already in widespread use though... it still seems better to have WMF own it rather than a 3rd party.

If nobody provides such stats I again propose to decline this task. Folks are welcome to use https://w.wiki/ instead.

Not really comprehensive, but just scanning these google results I see a decent amount of usage of it across a range of applications, not just ephemeral usage on IRC/twitter/etc. In particular, I see a number of academic papers using it.

I'm in favor of accepting it if Thomas wants to donate it. It really does seem to be a trivial amount of technical effort, and the long-term stability to support existing links seems worth the trade-off.

(Further, I don't think it's a sign that we're in "a bad place" to acknowledge that domain registrations / hosting have a very short lifespan if Thomas won the lottery and retired immediately to a tropical island, and a permalink service is in a very different category than other off-WMF tools...)

Popping in to mention that I haven't spoken to Thomas since March 2023 when I first opened this thread. Happy to reach back out if WMF reaches a decision to take the domain though. As of then, he said he would "gladly transfer it to WMF", but it has been a year and a half, so I'm unsure if his view has changed.

One more domain/line .. won't make a big difference

I have to add an important part here. Redirecting a domain (for example the various typo domains) to the right project URL is cheap.

But using it as an active URL shortener AND not breaking existing URLs that are already in use is a whole project that isn't that cheap. When it comes to that I tend to agree with Andre that the existing w.wiki should be used instead.

Not really comprehensive, but just scanning these google results I see a decent amount of usage of it across a range of applications, not just ephemeral usage on IRC/twitter/etc. In particular, I see a number of academic papers using it.

FWIW, enwp.org returns 3,860 results while w.wiki returns 353,000.

I'm in favor of accepting it if Thomas wants to donate it. It really does seem to be a trivial amount of technical effort, and the long-term stability to support existing links seems worth the trade-off.

It's not difficult but it does promote a secondary shortening service. If it weren't for the domain name's similarity to an official wiki link I'd say it's not worth nabbing at all.

I see a number of academic papers using it.

This could be seen as unfortunate but to me it's a very good pro argument to take it over and ensure it keeps working / redirecting.

https://www.w3.org/Provider/Style/URI is important.

It really does seem to be a trivial amount of technical effort,

I am not sure that is true. I would expect it to be bigger than it may seem.

Folks are welcome to use https://w.wiki/ instead.

But using it as an active URL shortener AND not breaking existing URLs that are already in use is a whole project that isn't that cheap. When it comes to that I tend to agree with Andre that the existing w.wiki should be used instead.

To clarify here, enwp.org does not operate like w.wiki and covers a separate use-case. It does not have codes generated for pages, they are simply the same as they are on enwiki.

For example, https://enwp.org/URL_shortening redirects to https://en.wikipedia.org/wiki/URL_shortening

I use this daily to quickly navigate to enwiki articles and link my friends to them without having to take any action to create and then copy a URL (or type out the full enwiki URL), and it allows for a less opaque URL shortener (people know where they're going before even clicking the link). With this shortener, I don't even need to access Wikipedia to shortlink to an enwiki article - it's super helpful! I don't really care who owns it as long as Thomas keeps it up, and if the only case in which WMF acquires this domain is one where they subsequently deprecate it and do not reimplement this helpful functionality, I would actually prefer it remains privately owned.

enwp.org does not operate like w.wiki and covers a separate use-case.

Thanks! That's an important distinction.

Indeed, if it's possible to rewrite everything with a simple rewrite rule from A to B then we could add this to our Apache cluster config and transfer the domain to avoid breaking any existing URLs.

It wouldn't have to be added to the URL shortener and technical effort could be low.

Yeah, calling it a "shortener service" is very misleading really. In practice it's literally just a way to not have to type out "en.wikipedia.org/wiki" because you can replace it with "enwp.org". There's no id-generation or other persistent state.
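To illustrate the distinction, a rough sketch follows. The `StatefulShortener` class is a hypothetical stand-in for an ID-generating shortener and does not describe how w.wiki is actually implemented; `prefix_rewrite` mirrors the enwp.org behaviour described in this thread.

```python
import itertools


class StatefulShortener:
    """Hypothetical ID-generating shortener: must store every mapping it issues."""

    def __init__(self) -> None:
        self._codes = {}                    # code -> long URL (persistent state)
        self._counter = itertools.count(1)

    def shorten(self, long_url: str) -> str:
        code = format(next(self._counter), "x")  # made-up code scheme
        self._codes[code] = long_url
        return code

    def resolve(self, code: str) -> str:
        # Raises KeyError if the stored mapping is ever lost.
        return self._codes[code]


def prefix_rewrite(path: str) -> str:
    """enwp.org-style redirect: a pure function of the request path, no storage."""
    return "https://en.wikipedia.org/wiki" + path
```

The second function needs no database, backups, or abuse handling for user-submitted targets, which is why the maintenance estimates for the two kinds of service differ.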

It's not difficult but it does promote a secondary shortening service. If it weren't for the domain name's similarity to an official wiki link I'd say it's not worth nabbing at all.

Perhaps worth noting that we're not deciding whether enwp.org will be shut down (except in a long-term sense where Thomas will stop hosting it someday). So long as we don't actually do anything ourselves to promote it, it'll be no more of a secondary shortening tool than our existing ownership+redirect of domains like en-wp.com. (Plus, it's already promoted via community-maintained resources, e.g. https://en.wikipedia.org/wiki/Help:URL.)

Just keep in mind that, as far as I can tell, we wouldn't want the combination where WMF owns the domain while it points to Thomas' servers.

So I think it's either Thomas keeps running the service and the domain or WMF takes the domain and also gets the traffic and adds rewrite rules to not break URLs.

But that being said, it's not actually as complicated, or as much technical effort, as it may have sounded at first.

Sorry for the confusion, and thanks for pointing out that the domain is a simple redirection and not a shortener. What a detail to miss! Indeed, this should be simple enough to fit into our infrastructure. It would be useful to see Thomas' webserver configuration so we can confirm a smooth transition for redirect rules. @violetwtf would you be willing/able to contact Thomas for that information?

Thanks for your patience and persistence!
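One way to confirm a smooth transition would be to compare the `Location` header each host returns against the expected target for a handful of sample paths. The sketch below assumes the rules stated in the task description; `check_redirects` and the injected `fetch_location` callable are hypothetical names, and no real request is made here.

```python
from typing import Callable, List, Optional, Tuple

# Expected targets, taken from the redirect rules in the task description.
EXPECTED_BASES = {
    "enwp.org": "https://en.wikipedia.org/wiki",
    "c.enwp.org": "https://commons.wikimedia.org/wiki",
}


def check_redirects(
    fetch_location: Callable[[str, str], Optional[str]],
    sample_paths: List[str],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (host, path, observed) for every sample that does not match."""
    mismatches = []
    for host, base in EXPECTED_BASES.items():
        for path in sample_paths:
            observed = fetch_location(host, path)
            if observed != base + path:
                mismatches.append((host, path, observed))
    return mismatches
```

In practice `fetch_location` could issue a HEAD request and read the `Location` header; it is injected here so the comparison logic can be exercised offline, before and after the domain moves.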

BCornwall changed the task status from Open to In Progress. Oct 2 2024, 7:34 PM

Change #1077466 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] ncredir: Add enwp.org redirection

https://gerrit.wikimedia.org/r/1077466

I've reached out to Thomas and will notify here if/when I get a reply. I've also made a WMF developer account to comment on Gerrit to ensure we support c.enwp.org, which was mentioned in the original conversation, but seems to have been lost to the year and a half context switch.

I've updated the CR to include c.enwp.org.

Okay, the patch has approval and will be merged if/when we get the domain into our MarkMonitor account - otherwise automation (ncmonitor) will be unhappy and try to remove it from ncredir.

Hi, @violetwtf, has Thomas responded? Thanks for getting on this. :)