Fri, Aug 10
We will pick this task up again after August 15.
This is a repeat of task T189792. I'll decline this one.
Update: this is now almost done. I have to run a few queries for Swati and we're working on a blog post. We expect to conclude this project within a couple of weeks.
Thu, Aug 9
Great. Thanks @Johan.
Sounds good. Thank you!
@Trizek-WMF just responded on that task. Sorry about the delay. I somehow missed the ping there.
@Trizek-WMF As you've seen, we've started the first experiment (T190776). Communications so far have been manageable, though from time to time we will run into questions that I bet you can help us with. It would be great if we could reach out to you with those questions, or count on you if the flow of communications becomes unmanageable for us; otherwise, we won't be taking much of your time for this one. Would it work for you to support this project in this lighter mode?
Mon, Aug 6
I sent an update to enwiki VP about this experiment: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(miscellaneous)#Experiment%3A_Eliciting_New_Editor_Interest
The experiment started today.
Wed, Jul 18
We discussed this today: it's best not to add a new language yet, as we will run into similar issues with rare language pairs when we introduce another one. For example, it will most likely be hard to collect labels for Korean and Arabic. Diego is going to do one pass over the data to see how much we can infer from it. If we can't learn from it, we will push for more data again. @Trizek-WMF I think for now we can consider this task done, and we can open a new one if there is a need in the future. Thanks for all your help! :)
Tue, Jul 17
@Nuria a few more thoughts:
- You should decide whether to label human activity that is bot-like as bot or not. For example, if I'm playing a Wikipedia game such as https://thewikigame.com/, where the goal is to get from article A to article B as fast as possible, are the webrequests associated with my activity bot requests or human requests? :) The answer is perhaps context dependent. If you want to report human pageviews, you will likely want to keep these requests out. If you're reporting human consumption vs. machine consumption, or if you're considering offering different levels of service to users, you may want to count these as human requests, because there is human cognition and learning involved.
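To make the context dependence concrete, here is a minimal sketch of a labeling rule that takes the reporting context as an input. Everything in it is hypothetical: the `ReportContext` values, the `label_request` function, and the two boolean flags are illustrative names, not part of any existing pipeline, and how a request gets flagged as bot-like in the first place is left out.

```python
from enum import Enum


class ReportContext(Enum):
    """Hypothetical reporting contexts; the names are illustrative only."""
    HUMAN_PAGEVIEWS = "human_pageviews"
    HUMAN_VS_MACHINE = "human_vs_machine"


def label_request(is_automated: bool, is_bot_like_human: bool,
                  context: ReportContext) -> str:
    """Return "bot" or "human" for a single webrequest.

    is_automated: the request comes from an actual program (assumed known).
    is_bot_like_human: a person is behind the request, but the browsing
        pattern looks automated (e.g. someone racing through links in a game).
    """
    if is_automated:
        return "bot"
    # Bot-like human traffic is only excluded when reporting human pageviews.
    if is_bot_like_human and context is ReportContext.HUMAN_PAGEVIEWS:
        return "bot"
    return "human"
```

With this rule, the same game-playing request is labeled "bot" under `HUMAN_PAGEVIEWS` and "human" under `HUMAN_VS_MACHINE`, which is the decision the team would need to make explicitly.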
Jul 15 2018
Jul 12 2018
Jul 10 2018
Jul 6 2018
@Trizek-WMF thanks for the update.
Jul 5 2018
Jul 2 2018
@Miriam Please link to the task that captures the NDA request. Once all steps are done, please add Operations tag to this task and someone from SRE will pick it up to process it. If you have questions, just ping.
@debt Research is a focus area and we will be capturing our content at https://wikimania2018.wikimedia.org/wiki/Hackathon/Research which I understand is different than https://wikimania2018.wikimedia.org/wiki/Hackathon/Technology . You may want to link from the latter link to the former, if you choose to. :)
Jun 29 2018
@MMiller_WMF Miriam and I sat with Andrew and Dan yesterday and discussed T185223. Would the input we provided to them suffice or shall we sit with you as well to take care of this ticket? I understand that your efforts can be broader than what's captured at T185233#4326260 . Please let me know if more input is helpful or I will close this task.
Jun 26 2018
@Bawolff @EBjune for context on why Security is asked to provide feedback: for data releases, we usually ask for privacy and security feedback if the data may contain private information (either within itself or in combination with other possible datasets that we, WMF, or others may release in the future). Sometimes we don't have the capacity or expertise to do this in-house, in which case we reach out to external privacy experts (check this example); sometimes we ask internally. Some level of such feedback is needed to understand the risks from the expert perspective before these releases.
Jun 18 2018
@Pginer-WMF excellent, and waiting until 2018-06-25 is fine at this stage given that it's unavoidable. :)
Jun 13 2018
Jun 12 2018
Update: I'm going to close this task as:
- We had a few discussions over this.
- The decision was to not fully document this process.
The umbrella topic of the RFP was outside of our current focus and we didn't have enough time to prepare a good set of question(s) and dataset(s).
Prepared the form for the privacy statement request. The form is with Ramtin for his review.
Request for a privacy statement submitted to Legal.
Jun 11 2018
@Pirroh no data at all from editors.
Context on my end: bmansurov and I discussed a while back that differences between skins may have an impact on citation usage characteristics. That is why we included skin as part of the data collection. Of course, if certain skins have very few users, we can drop them, as the information will not be very useful anyway. Two potentially related points:
- Initially, we intend to purge the data at 90-day intervals until we get a better sense of what kind of signal we can get from this kind of data. In that regard, it can be fine to collect skin information (but again, I don't insist on it for skins that have very few users).
- We won't collect data from logged-in users.
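A minimal sketch of the two points above, assuming each event is a dict with hypothetical `logged_in` and `timestamp` fields and reading "90-day intervals" as a 90-day retention window (both are assumptions, not the actual schema or purge job):

```python
from datetime import datetime, timedelta

# Assumed retention window, taken from the 90-day figure above.
RETENTION = timedelta(days=90)


def should_collect(event: dict) -> bool:
    """Collect only events from users who are not logged in."""
    return not event["logged_in"]


def purge_expired(events: list, now: datetime) -> list:
    """Keep only events that are at most RETENTION old relative to `now`."""
    cutoff = now - RETENTION
    return [e for e in events if e["timestamp"] >= cutoff]
```

For example, with `now` set to 2018-06-11, an event stamped 2018-01-01 would be purged while one stamped 2018-05-01 would be kept.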
@elukey I reviewed what you say below and I confirm that it's in line with our earlier discussions on and off this Phabricator task. No concerns from the Research end. (I do expect us to need to do some joins, but the use cases we could spot were rare enough in Research applications that we decided to handle those on a case-by-case basis and at the analysis level.)
Jun 8 2018