
Stop using rel=nofollow on all external links
Closed, Declined (Public)

Description

$wgNoFollowLinks currently defaults to true as an anti-spam measure. This works against the goal of many wikis in adding legitimate external links, viz. to promote and draw attention to those linked-to sites. The wikis that are overrun by spam generally won't have many incoming links because people prefer to link to useful content, so their influence on pagerank won't be all that great. On the other hand, wikis that lack an effective CAPTCHA will be spammed regardless of nofollow because the cost of spamming them is so low and there is the potential to get at least some pagerank boost by means of the sites that mirror the wiki and don't use nofollow.

In short, setting $wgNoFollowLinks to true probably does more harm than good to most wikis' missions. Some system administrators probably don't think to change it, since it doesn't show up in LocalSettings.php. In most cases, we'd be doing them a favor by having it default to false.
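
For reference, overriding the default is a one-line change. A minimal sketch, assuming a stock install where the setting has not been touched elsewhere:

```php
# LocalSettings.php
# Stop adding rel="nofollow" to external links in wiki text.
# (The default discussed in this report is true.)
$wgNoFollowLinks = false;
```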


Version: 1.21.x
Severity: enhancement
URL: http://lists.wikimedia.org/pipermail/mediawiki-l/2013-November/042038.html

Details

Reference
bz42594

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:09 AM
bzimport set Reference to bz42594.

Do we have any report/measure/guesstimate on how many wikis changed this to false and what effects they experienced?

It's a good question. There are some other variables (most notably, CAPTCHAs, spam blacklists, and other anti-spam tools) involved in how much spam a wiki gets, so I'm not sure how to get reliable data on that. I've administered wikis that had $wgNoFollowLinks set to true, and others that had it set to false. In both cases, spam tended to get out of hand until we installed Asirra and then it went down to almost zero.

There's a "good citizenship" aspect of this too. There was some debate over whether wikis act as good citizens by using nofollow, and thereby discouraging spammers from targeting wikis in general; or whether they act as good citizens by not using it, so that good external links have a chance to prosper in the pageranks relative to the spam sites that are all over the Internet. I suppose it partly depends on how effective the search engines' algorithms and reporting methods are at identifying and "punishing" linkspam.

Google encourages websites to consider nofollow as a potentially useful tool in their spam control arsenal, but to also try to find ways to avoid using nofollow on links it would be reasonable to trust. http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569&ctx=cb&src=cb&cbid=-1ebbpwhoov99v&cbrank=2 I wonder what other methods might allow for a more nuanced approach than what is currently available with $wgNoFollowDomainExceptions, $wgNoFollowNsExceptions, etc. E.g., maybe links added in unpatrolled revisions would be nofollow, but then the nofollow would be removed after someone patrolled the page.
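
To illustrate what the existing exception settings already allow, here is a rough sketch that keeps nofollow on by default but exempts a trusted domain and one namespace. The domain example.org and the choice of the project namespace are placeholders for illustration only, not recommendations:

```php
# LocalSettings.php
# Keep rel="nofollow" on external links in general...
$wgNoFollowLinks = true;

# ...except for links to these domains (and their subdomains).
# example.org is a placeholder.
$wgNoFollowDomainExceptions = array( 'example.org' );

# ...and except for links on pages in these namespaces.
# NS_PROJECT is used here purely as an example.
$wgNoFollowNsExceptions = array( NS_PROJECT );
```

Neither setting looks at who added a link or whether the revision was patrolled, which is the gap the patrolling idea above is aimed at.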

Also, if some search engines accept spam reports, perhaps code could be written to automatically report the contents of pages that sysops delete with the "Spam" reason.

(In reply to comment #2)

I wonder what other methods might allow for a more nuanced approach than what
is currently available with $wgNoFollowDomainExceptions,
$wgNoFollowNsExceptions, etc.

Simple! https://www.mediawiki.org/wiki/Extension:Interwiki allows the wiki administrators to easily set up their own interwiki links. For any domain linked more than once on the wiki, it's a more efficient way to whitelist/authorise links to it.
There's already [[mw:Suggestions_for_extensions_to_be_integrated#Interwiki]]. Would this supersede your proposal?
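
A sketch of what enabling that extension might look like, assuming a MediaWiki version recent enough to support wfLoadExtension() (older releases, including the 1.21.x this report was filed against, loaded extensions with require_once instead):

```php
# LocalSettings.php
# Load Extension:Interwiki (assumes it is installed under extensions/Interwiki).
wfLoadExtension( 'Interwiki' );

# Let sysops edit the interwiki table from Special:Interwiki.
$wgGroupPermissions['sysop']['interwiki'] = true;
```

Links written with an interwiki prefix are then handled as interwiki links rather than as plain external links, which is what makes this usable as a per-domain whitelist.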

(In reply to comment #1)

Do we have any report/measure/guesstimate on how many wikis changed this to
false and what effects they experienced?

Self-answer: I saw some, on a website selling tools to spam wikis, which I opened coming from a recent mediawiki-l discussion. Wikis without nofollow are definitely targeted more, at least by some spammers; the default should be changed only by installations with an interest in it and which are conscious of the decision.

(In reply to comment #4)

(In reply to comment #1)

Do we have any report/measure/guesstimate on how many wikis changed this to
false and what effects they experienced?

Self-answer: I saw some, on a website selling tools to spam wikis which I opened coming from a recent mediawiki-l discussion. Wikis without nofollow are definitely targeted more at least by some spammers, the default should be changed only by installations with an interest in it and which are conscious of the decision.

Can you go into detail about how you determined that those bots take nofollow into account, and that the bots that do so add spam not trivially combated by other anti-spam tools?


I'm re-opening this since it doesn't look like it was closed based on a real discussion.

(In reply to comment #5)

Can you go into detail how you determined that those bots take nofollow into
account

They make lists of such wikis and target them for spam; such lists are even sold by some. I'm not eager to link one such "SEO consultant" site here, which would add to their SEO; where should I put the link in your opinion?

and that the bots that do so add spam not trivially combated by other
anti-spam tools.

Does this matter, if such tools are not on the wikis in question? We're speaking of defaults here, so ideally you should consider only core tools (or at most bundled extensions' defaults, but upgrading MediaWiki doesn't automatically bring all bundled extensions).

(In reply to comment #6)

They make lists of such wikis and target them for spam; such lists are even
sold by some. I'm not eager of linking one such "SEO consultant" site here,
which would add to their SEO; where should I put the link in your opinion?

Post the SEO consultant link to a wiki such as MediaWiki.org that has nofollow=true; that way it won't boost the SEO consultant's pagerank.

(In reply to comment #6)

(In reply to comment #5)

and that the bots that do so add spam not trivially combated by other
anti-spam tools.

Does this matter, if such tools are not on the wikis in question? We're
speaking of defaults here, so ideally you should consider only core tools (or
at most bundled extensions' defaults, but upgrading MediaWiki doesn't
automatically bring all bundled extensions).

Sure it matters. Unconfigured core is not rated to fend off any spam at all. If it were then SimpleAntiSpam's method would be part of core.

Frankly, given how easily we see MediaWiki installs that haven't been configured to deal with spam get filled up, whether or not one core default changes the amount of spam on an already spam-filled wiki is irrelevant to the default value.

If a wiki that is rated to fend off a reasonable amount of spam trivially fends off the bots that $wgNoFollowLinks affects, then $wgNoFollowLinks is irrelevant to the spam issue.

Nemo: http://www.wikirobot.net/wikibase_breakdown.aspx shows that one wiki spamming site keeps track of the nofollow; but who knows how big of a player that site and others that check nofollow are in the big scheme of things.

I don't really worry about boosting their pagerank through a link, by the way. If that would be enough to significantly help us lose the war on spammers, then that fight is already hopeless. Boosting the pagerank helps customers find them but it also helps people looking to combat spam find them and learn about their adversaries.

I find it interesting to catch a glimpse of the face of evil up close and see what they say, so as to try to understand their thinking. They analyze us; we might benefit from analyzing them. Both sides are looking for weaknesses in the other.

I don't know why this is listed as UNCONFIRMED; I was able to confirm what the current default value is.

Change 94401 had a related patch set uploaded by leucosticte:
Bug: 42594 Set $wgNoFollowLinks = false by default

https://gerrit.wikimedia.org/r/94401

Change 94401 had a related patch set uploaded by Dereckson:
Bug: 42594 Set $wgNoFollowLinks = false by default

https://gerrit.wikimedia.org/r/94401

Some arguments against the tag's deployment on Wikipedia, which are also applicable to its deployment elsewhere: "Critics pointed out that this deprives many presumably useful sites of the benefits in search engine rankings from having a link on Wikipedia. . . . In a different take on the issue, Philipp Lenssen expressed disappointment with the move, arguing that it was actually poor etiquette for Wikipedia because outside links are part of why Wikipedia ranks so well in search engines, so that it now 'takes from the communities but doesn’t give back'" https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2007-01-22/Nofollow

I concur with Daniel Friesen's analysis and point of view in comment 9.

No opinion about comment 13, especially as this bug is related to MediaWiki, not to the default configuration of Wikimedia projects or to en.wikipedia's MediaWiki configuration.

The patch now works, but I still think this is a bad idea. We're trying to make MediaWiki easier to install and configure for the average use case of MediaWiki, not harder. Intentionally making your wiki a target for even "just one spambot more" (per the uncertainty in comment 10) is something that most wiki sysadmins won't want. I suppose the next step is a notification/discussion on mediawiki-l by the proposer(s).

The SEO people seem to think that a mix of dofollow and nofollow is best. The implication, then, is that it might not bring in that much more spam if the nofollow ratio goes down, since they are going to try to spam either way: http://trafficplanet.com/topic/3266-do-follow-vs-no-follow-wikis/

*"Mix of dofollows and nofollows should look natural to the eyes of google.."
*"His point is right,by mixing both types of links we send strong message to google that the links are natural and whether it is do-follow or no-follow it will surely add up some value .No-follow links dont bring much pr value but it can at least bring some traffic to the blog."
*"I think combination of both Do follow and No follow is a better option. There is less risk if we send the signal that we are trying to make natural things by doing both Do follow and No follow."
*"I think the 'link is a link is a link' line of thought it was the right one when it comes to wiki's. It's not as though they're really difficult to obtain or anything like that, so go for them all. DoFollow are obviously going to be more valuable than NoFollow, but that's not to say there is no benefit of getting NoFollow ones."
*"You will want a mixture of both to give the look of a natural backlink profile"
*"I've used a mixture of do-follow and no-follows.. always work well.. but more anchortext variation is more important in this area..."
*"Search engine give more priority to do-follow back-links it doesn't mean they don't recognize no-follow link. No-follow back-links from relevant and high PR website can also help you to rank higher on Google and also on other search engines."
*"So, having backlinks from different sources, both of dofollow and nofollow type would indicate no explicit effort from your end to force your way to the top of search engines..."
*"Because normal website owners do not pay attention to whether or not a link is no-follow or not. The only people who pay attention are the seo crowd trying to game the system. This means that a normal non-seo website owner will naturally end up with a mix of both types of links. Therefor not having any no-follow links at all pointing to your site is not a natural pattern."
*"I cant remember how much of the links on the web are nofollow, think it was like 5-10%. Could be off. If you want to look 'natural' you want to have a similar ratio."

(In reply to comment #16)

The SEO people seem to think that a mix of dofollow and nofollow is best. The
implication, then, is that it might not bring in that much more spam if the
nofollow ratio goes down, since they are going to try to spam either way: [...]
*"I cant remember how much of the links on the web are nofollow, think it was
like 5-10%. Could be off. If you want to look 'natural' you want to have a
similar ratio."

This would still mean that 90-95 % of their spam efforts would go towards not-nofollow'ed links. :) So this seems to prove rather than disprove the usefulness of this setting.

The consensus among MediaWiki developers is clearly against this.

It would seem too that the community of wiki system administrators on mediawiki-l either didn't care about the default or was content to let the developers debate and decide the matter for them.

(In reply to comment #18)

The consensus among MediaWiki developers is clearly against this.

@MaxSem, I do not see a clear consensus against this.

https://en.wikipedia.org/wiki/Consensus_decision-making

There still seem to be some devs who are not against it.

The mediawiki-l discussion has not finished; it's still going on. In fact the last message on the topic in mediawiki-l was under an hour ago.

There has been no discussion on wikitech-l yet. And frankly some of us devs aren't always subscribed to mediawiki-l because the subscription can get too heavy to manage.

This is also not a black and white or boolean topic. "Accept the bug as-is" and "WONTFIX the bug" are NOT the only options here. There are other possible outcomes such as modifying the exact subject or realigning the goal of the bug or making it dependent on some other bugs that would have to be solved before the bug could be "fixed".

So it's too early to just flat out close the bug.

Loosening up the summary per Daniel Friesen's suggestion. Making this something other than a boolean "Put rel=nofollow on everything" vs. "Don't put rel=nofollow on anything" debate could help open up the discussion to those who do not like the way we nofollow everything but don't think that flatly dropping it everywhere is the correct decision. Thanks for the idea.

Bug 57053 proposes adding more spam protection to the default configuration. The specifics are still uncertain.

Can I offer an alternative suggestion?

We could add a checkbox to the second part of the installer, asking the person who installs MediaWiki which behavior they prefer.

Comment 24's suggestion is probably the best solution, as long as people don't mind adding another question to the installer. There are lots of config settings and therefore any number of questions the installer could ask people; how do we decide what should be included? Has the development community set out some guidelines on that?

This is why I recommended putting it in the second part: that part can be skipped if the person installing is tired of answering questions.

A sensible guideline could be "Settings a good part of users want to configure".

All right, if there is no objection I will go ahead and write the patch and submit it. Any suggested wording for what specifically to ask the person who is doing the installing?

(In reply to comment #26)

A sensible guideline could be "Settings a good part of users want to
configure".

Then can someone explain on what basis $wgNoFollowLinks is one such setting?

Nemo, I suspect if we made that a checkbox in the installer, whatever we set the default to (checked or unchecked), people would tend to leave it that way, trusting the developers' judgment to set it to the optimal setting for most wikis. However, I do think more people would change it than currently do, if the option were presented in the installer.

With reference to criteria for putting options in the installer -- one could also ask, Why do we have some of the stuff we do have in the installer, such as checkboxes pertaining to upload and email settings? Anything could conceivably be a setting people would be more likely to change from the default if we put it in the installer. Maybe the solution is to implement the proposed configuration database and web interface, and take system administrators there after they finish the installation. That configuration interface could have all the settings, so the nofollow setting would not be out of place there.

In other news, going through my old bugs I see that bug 42599 is now related to this bug, now that its scope has been expanded.

(In reply to comment #29)

one could also ask, Why do we have some of the stuff we do have in the installer, such as checkboxes pertaining to upload and email settings?

I think the rationale so far has been something like "settings which define the fundamentals of what the wiki *is* and what access is available after the install apart from the initial WikiSysop user". Whether this is a valid rationale and whether it was applied correctly, I'd discuss in another bug. :)

(In reply to comment #30)

I think the rationale so far has been something like "settings which define the fundamentals of what the wiki *is* and what access is available after the install apart from the initial WikiSysop user". Whether this is a valid rationale and whether it was applied correctly, I'd discuss in another bug. :)

If that's broadly enough construed, nofollow could fall under that, because $wgNoFollowLinks controls whether users have access to be able to add external links that won't have rel=nofollow. Actually, almost anything could fall under that. E.g., $wgLogo controls what logo users have access to see by default in the top left corner; $wgMaxArticleSize controls the size of a revision users have access to be able to save; etc.

It's kinda arbitrary to include stuff like $wgEnotifUserTalk in the installer and not some of this other stuff. I'd file a bug to remove those settings from the installer, but I don't want to violate the rule against disrupting MediaWiki to prove a point.

Change 94401 abandoned by Tim Starling:
Set $wgNoFollowLinks to false by default

Reason:
The author's comments on the bug (e.g. comment 22, 25) imply that even he is no longer seriously pushing this patch as a solution for the bug.

https://gerrit.wikimedia.org/r/94401

TTO set Security to None.
Nemo_bis changed the task status from Open to Stalled. May 16 2015, 5:01 PM
Aklapper subscribed.

I'm going to boldly decline this stalled task: Four people expressed that they don't support this proposal, plus tasks shouldn't be stalled forever.
If there were sudden new arguments / analyses (and/or community discussions), this could potentially be reopened (or a new task could be created).