
Publication of the Flow Satisfaction Survey results including an analysis and raw data
Closed, ResolvedPublic

Description

Oct-Dec-2016 goal.

New goal is Jan-Mar-2017, according to the projects/tags.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Trizek-WMF triaged this task as Medium priority. Sep 5 2016, 3:50 PM
Qgil renamed this task from "Work on Flow satisfaction survey's results and publish them" to "Publication of the Flow Satisfaction Survey results including an analysis and raw data". Sep 24 2016, 9:19 AM

The analysis needs to clearly note that invitation and participation were heavily biased towards Flow enthusiasts. (Invitations were selectively posted on the user_talk pages of the small minority who converted their user_talk to Flow.)

This should be particularly noted in relation to the question comparing wikitext to Flow. The percentages on that question will be heavily skewed towards Flow.

What's the status of this task? It looks like the survey was closed in September 2016 (cf. https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Flow&diff=740570570&oldid=739850424). Where are the survey results?

The survey results were supposed to be published in December according to https://meta.wikimedia.org/wiki/Collaboration/Flow_satisfaction_survey#Results, but I guess it's behind schedule. Any updates, Benoît?

Benoît is on vacation this week, so let me share a quick update. We could not complete this task in the past quarter but it is one of the top priorities of the Technical Collaboration team. Our aim is to complete it as soon as possible, so it can inform the Wikimedia Foundation Annual Plan FY2017-18.

Trizek-WMF raised the priority of this task from Medium to High. Jan 23 2017, 2:15 PM

Four months ago I commented here: The analysis needs to clearly note that invitation and participation were heavily biased towards Flow enthusiasts. (Invitations were selectively posted on the user_talk pages of the small minority who converted their user_talk to Flow.) This should be particularly noted in relation to the question comparing wikitext to Flow.

Can someone on this task indicate whether they consider that relevant and useful context, to be noted in the final product?

The context that has been noted is that the invitation targeted Flow Beta users and active Flow boards, because that was the only way to reach people who can compare the two systems.

@Trizek-WMF thanx for the reply, although I think I failed to properly identify my concern. I fear that you may be unaware of a potential incoming trainwreck.

If the data is only used to compare various Flow features with each other, fine.

I believe you would agree that it would be illegitimate to ban Flow based on talk page invitations sent only to Flow's most vocal critics. If so, then I believe you can see how it would be equally illegitimate to roll out Flow based on talk page invitations selectively sent to Flow's biggest fans.

If anyone at the WMF suggests the prefer-Flow vs prefer-Wikitext numbers represent what percentage of editors want Flow, if anyone suggests that it supports Flow deployment (for Annual Plan FY2017-18 or anywhere else), then the WMF's reputation will go down the toilet. We have a policy declaring that sort of thing to be illegitimate. It would be viewed as gross error at best, and incompetence or fraud at worst.

I understand that you wanted to get input from people familiar with both systems. However, people who tried Flow and didn't want it on their talk page (or anywhere else) were denied an equal opportunity to respond. The community will consider that to be a critical point. Any "data" on support for Flow is 100% garbage. If any staff member reads the survey and interprets that result as meaningful, if they act on that result or cite that result as supporting their actions, the WMF may get hit with ugly blowback.

As I've already explained, because we were interested in hearing from people who can compare Flow to Wikitext talk pages, the survey was distributed to all public wikis using Flow (as a Beta feature, manually activated on user talk pages, or used as the default talk page system). Sending the survey invite to wikis where no one had heard about Flow would have been a waste of resources and would have created a bias, just as sending it only to Flow fans would have.

I ran a query to get a complete list of all users from those wikis who use or have used Flow. Users who had disabled Flow were identified by their still having an active Flow board on an archived subpage (as created by the Flow manager). They all received the invitation.
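To illustrate the selection logic, here is a hypothetical sketch in Python; the real query ran against the Flow extension's database, and the page names and the archive-detection rule below are made up for illustration:

```
# Hypothetical sketch of the invitee selection described above.
# `rows` stands in for the query output: one (username, flow_board_page)
# pair per Flow board; real table and column names are not shown here.
rows = [
    ("Alice", "User talk:Alice"),           # Flow active on talk page
    ("Bob", "User talk:Bob/Flow archive"),  # Flow disabled; board archived
]

invitees = {}
for user, board_page in rows:
    # Users who disabled Flow are still identifiable: their board lives
    # on an archived subpage created by the Flow manager.
    flow_status = "archived" if "/Flow archive" in board_page else "active"
    invitees[user] = flow_status

# Everyone on the list, active or archived, received the invitation.
print(invitees)  # {'Alice': 'active', 'Bob': 'archived'}
```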

The survey was open to anyone, so a message was also sent to the main community hubs on wikis where Flow is used on a few pages as a trial. The invitation was also sent to some public Wikimedia mailing lists and other public hubs (village pumps, tech boards...). Any user who saw the survey's link could participate, even a user who had never used Flow. Some people (like you, if I remember correctly) also posted the invitation in places beyond where I had planned to post it, spreading the word to more users.

Most of what I've said above is quoted from the report I'm finishing. I think the distribution was fair enough for everyone's voice to be heard, don't you?

I'll illustrate the problem:

Let's say we post the survey invitation to general community pages of wikis with Flow. We get 200 responses. It's an unbiased sample; for simplicity, let's call it an even split: 100 out of 200 prefer Flow. Result: 50%. Let's say I don't like that result. Now I selectively post invitations to 1200 talk pages of people who opted in to Flow. This draws 300 more survey responses. Let's conservatively say 90% of them prefer Flow (they opted in). Now we have 270 pro-Flow responses plus the original 100 pro-Flow responses. That's 370 pro-Flow out of 500 total responses, giving a false result of 74%. The result is biased due to the selective invitations. I made the example blatant by splitting it into two separate invitation phases; the issue and the result are identical if you post all of the invitations at the same time.

If I may make the example even more vivid, let's hypothesize community support for Flow is only 10%. Invitations on general community pages draw 200 responses, 20 are pro-Flow. Talk page invitations get 300 more responses from opt-in Flow fans. Let's assume a simple 100% of them respond pro-Flow. Now you have 320 pro-Flow responses out of 500. That is 64%. Votestacking changed the results from 10% to 64%. Posting selective and biased invitations... even if it was done innocently and with good intentions... yields grossly biased garbage.
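For the record, here is the arithmetic from both scenarios as a short Python sketch (the numbers are the hypothetical figures above, not survey data):

```
def pooled_support(unbiased_n, unbiased_rate, optin_n, optin_rate):
    """Blend an unbiased sample with an opt-in oversample and return
    the resulting headline "prefer Flow" percentage."""
    pro_flow = unbiased_n * unbiased_rate + optin_n * optin_rate
    return 100 * pro_flow / (unbiased_n + optin_n)

# Scenario 1: true support 50%, opt-in respondents 90% pro-Flow.
print(pooled_support(200, 0.50, 300, 0.90))  # 74.0

# Scenario 2: true support 10%, opt-in respondents 100% pro-Flow.
print(pooled_support(200, 0.10, 300, 1.00))  # 64.0
```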

We have a dedicated policy shortcut for this issue: WP:VOTESTACK. Votestacking is a threat to our consensus governance process. It's an issue that we are very conscious of.

If people read the survey summary and see a percentage for "prefer Flow" vs "prefer wikitext", that number will stick in their memory. It's a factoid. They won't be focusing on the invitation details mentioned elsewhere in the summary; they'll just know "Flow is popular". So they'll deploy Flow someplace "because it's popular", casually citing that factoid percentage from memory. The community will start ranting about the WMF pushing out software based on bogus data.

I think I should mention my motivation here. I'll admit I'm not a Flow fan. I'll admit I don't want Flow promoted based on biased data. But there's a bigger issue. I've seen what happened in the past when the community spotted the WMF using incorrect survey numbers to push out software. That wasn't a votestacking case, but staff were relying on and citing numbers that were clearly wrong. The community completely lost trust in the WMF. The community lost trust in any data cited by the WMF. We need to be able to work together. We need to be able to trust each other. I don't want community trust in the WMF to tank. I don't want anyone making a good-faith mistake relying on, or citing, a junk number and losing the community's trust. I'd like to see a warning note on this specific number, the "prefer Flow" vs "prefer wikitext" percent. At a minimum it should be clearly labeled as unreliable. I think it really should be labeled as biased in favor of Flow.

Let us publish the analysis and results of the survey, so we can discuss based on that instead of on assumptions.

Rather than "assumptions", I'd call it a concern. Ensuring a good initial publication is much better than (possibly) trying to rewrite history after publication.

Ok. Thanx. You're aware of the concern. I'll wait to see how it turns out. I'm extremely interested to read all of the results.

Results are scheduled to be published next Monday.