
Spike [2 hours]: How to ensure that every session only can submit a survey result once (quick survey)
Closed, Resolved, Public

Description

How do we avoid a malicious user sending multiple answers and skewing the survey results?

This applies only to the quick survey; the external surveys will handle this themselves.


From the EventLogging results, it seems we can group by a combination of fields (user agent + IP hash + others) to get a fairly unique identifier for each user. We would then be able to spot malicious users and investigate whether to remove their answers from the results.
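A minimal sketch of the grouping idea, assuming each logged event is a dict and using `userAgent` and `clientIp` (hashed) as the fingerprint fields; the exact field list and the one-submission threshold are assumptions, not the real schema. It only flags fingerprints for manual review, since different people behind the same IP and browser can share a fingerprint.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical fingerprint fields; the real EventLogging schema may differ.
FINGERPRINT_FIELDS = ("userAgent", "clientIp")

Fingerprint = Tuple[str, ...]


def fingerprint(event: Mapping[str, str]) -> Fingerprint:
    """Combine several fields into a fairly (not fully) unique key per user."""
    return tuple(event.get(field, "") for field in FINGERPRINT_FIELDS)


def suspicious_fingerprints(
    events: Iterable[Mapping[str, str]], max_submissions: int = 1
) -> Dict[Fingerprint, int]:
    """Return fingerprints that submitted more than max_submissions answers."""
    counts = Counter(fingerprint(e) for e in events)
    return {fp: n for fp, n in counts.items() if n > max_submissions}


if __name__ == "__main__":
    sample = [
        {"userAgent": "UA-1", "clientIp": "hashA", "answer": "yes"},
        {"userAgent": "UA-1", "clientIp": "hashA", "answer": "yes"},
        {"userAgent": "UA-2", "clientIp": "hashB", "answer": "no"},
    ]
    print(suspicious_fingerprints(sample))  # {('UA-1', 'hashA'): 2}
```

Flagged groups would then be inspected by hand before deciding whether to drop them from the results.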


  • How do we identify malicious users in the survey results?
  • Any other issues we should be thinking about?

Event Timeline

Jhernandez raised the priority of this task from to Medium.
Jhernandez updated the task description.
Jhernandez subscribed.

Not easily... It really comes down to where we store the results and whether we store session IDs. EventLogging? Given that the surveys are sampled, it would be hard for an anon to cheat the survey and impossible for a logged-in user.

@Jdlrobson I asked @dr0ptp4kt about this, and he told me it should be fairly easy to group the survey results from the EventLogging data by a combination of user agent and a few other fields, see whether there are any offenders, and remove them from the results. So this may be a non-issue.

I'd still like to clarify which fields from the EL data we would group by, and surface any other issues we may find; that's what the spike is for.

KLans_WMF renamed this task from Spike: How to ensure that every session only can submit a survey result once (quick survey) to Spike [2 hours]: How to ensure that every session only can submit a survey result once (quick survey). Aug 3 2015, 4:24 PM

We could use a combination of the following fields from the database: clientIp and userAgent.
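A minimal sketch of filtering on those two fields: keep only the earliest response per (clientIp, userAgent) pair. It assumes each event carries a sortable `timestamp` string (a hypothetical field here, e.g. in `YYYYMMDDHHMMSS` form). The trade-off is that distinct people sharing an IP and browser collapse into a single response.

```python
from typing import Iterable, List, Mapping, Set, Tuple


def first_answer_per_client(
    events: Iterable[Mapping[str, str]],
) -> List[Mapping[str, str]]:
    """Keep only the earliest response for each (clientIp, userAgent) pair.

    Assumes every event has "timestamp", "clientIp" and "userAgent" keys;
    "timestamp" is a hypothetical, lexicographically sortable string.
    """
    seen: Set[Tuple[str, str]] = set()
    kept: List[Mapping[str, str]] = []
    # Sort by time so the first response from each client wins.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = (event["clientIp"], event["userAgent"])
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept
```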

We could also send and store a unique random token per browser and use it for filtering. It can be generated using mw.user.id(), which returns the username for logged-in users.
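Because that token equals the username for logged-in users, one option is to pseudonymize it with a keyed hash before storing it next to the answers, then filter on the hashed value. This is a sketch under assumptions: the secret key, the function names, and the idea of hashing server-side are all hypothetical, not part of the existing setup. Note that HMAC only pseudonymizes: anyone holding the key can still test candidate usernames.

```python
import hashlib
import hmac
from typing import Dict, Iterable, Tuple

# Hypothetical secret; it would need to be kept out of the logged data.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonymize(token: str, key: bytes = SECRET_KEY) -> str:
    """Map a per-browser token (or username) to a stable, non-reversible ID."""
    return hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()


def dedupe_by_token(submissions: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep the first answer seen for each pseudonymized token.

    submissions: (token, answer) pairs in arrival order.
    """
    answers: Dict[str, str] = {}
    for token, answer in submissions:
        # setdefault ignores later answers from the same token.
        answers.setdefault(pseudonymize(token), answer)
    return answers
```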

I think for now we should be okay without the additional field, and we can deal with abuse later should it surface.

Also, I don't think we want to log username + answers in an identifiable way.

Let's keep clientIp and userAgent in mind, then. Thanks, @bmansurov.