Wed, Apr 18
Fri, Apr 13
Update: In T190776#4130825 I describe the choice of language for the specific experiment and I also give a sense of the timelines based on the latest discussions.
Update: Bob, Ramtin, and I talked about what the experiment can look like, and we need around a week or two to sort out details. I'm off the week of May 1st, so the experiment itself can realistically start in 3 weeks. In the meantime, of course, there is work to do:
- Choice of language: it makes most sense to run this test in English. Running it in another language right now requires updating the code base to handle text in other languages, and this needs some days of coding and quality checks. We should not do this for this very first experiment. So, let's fix the language for this to en.
@bmansurov I am nobody around here. ;) Do you want to check with Quiddity to see what's the best way to handle it?
Thu, Apr 12
Assigned to Dario per discussions in a meeting now.
Wed, Apr 11
@bmansurov the text at https://etherpad.wikimedia.org/p/InstructionsForSectionMapping is ready to go out. As we discussed, let's send a test message to Bob, Diego and myself on our enwiki talk pages.
Thanks, @bmansurov. I'll start an email thread to collect feedback on this and will post the compiled feedback here.
@DarTar all details are documented at https://docs.google.com/document/d/14_Ptu8YaqQwNuWzwptuFl6tdabb2I1AAVBwCbmNBlMY/edit#
Tue, Apr 10
- Is sectionID the ID of an HTML element? For example, for a link appearing on https://en.wikipedia.org/wiki/Book#History, is the ID "History"?
Right. I think we don't have a perfect answer here. Basically, what we're interested in is some notion of "how far" into the page the user got, without having to worry about capturing scrolls, screen size, etc. Ideally, we would measure the word count (excluding templates) from the beginning of the article; together with revisionID, that would give us enough information to see where the user was. sectionID can be another proxy for how far into the page the user was.
Can we use data-mw-section-id, instead? (Check https://www.mediawiki.org/wiki/Parsing/Notes/Section_Wrapping#Examples for some examples.) (FYI: I had some early chats with Subbu and that direction can be worth exploring.)
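A minimal sketch of the two proxies discussed above (enclosing section ID and word offset from the start of the page), assuming the page HTML wraps sections in `<section data-mw-section-id="...">` elements and that external links carry the `external` class. The class name `LinkPositionParser`, the sample HTML, and the `example.org` URL are hypothetical, and this version does not exclude template text from the word count.

```python
from html.parser import HTMLParser


class LinkPositionParser(HTMLParser):
    """Record, for each external link, its enclosing section ID and word offset."""

    def __init__(self):
        super().__init__()
        self.section_stack = []  # data-mw-section-id values of currently open <section> elements
        self.word_count = 0      # words seen so far in text nodes
        self.links = []          # (href, section_id, words_before_link)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "section":
            self.section_stack.append(attrs.get("data-mw-section-id"))
        elif tag == "a" and "external" in (attrs.get("class") or "").split():
            # Assumption: external links are marked with the "external" class.
            section_id = self.section_stack[-1] if self.section_stack else None
            self.links.append((attrs.get("href"), section_id, self.word_count))

    def handle_endtag(self, tag):
        if tag == "section" and self.section_stack:
            self.section_stack.pop()

    def handle_data(self, data):
        # Whitespace-separated tokens are a rough stand-in for "words".
        self.word_count += len(data.split())


# Hypothetical sample input, loosely modeled on section-wrapped output.
SAMPLE = """
<section data-mw-section-id="0"><p>A book is a medium for recording information.</p></section>
<section data-mw-section-id="1"><h2 id="History">History</h2>
<p>Early books were scrolls <a class="external text" href="https://example.org/scrolls">source</a>.</p></section>
"""

parser = LinkPositionParser()
parser.feed(SAMPLE)
links = parser.links
print(links)
```

Stored next to revisionID, either value would let the position of a clicked link be recomputed later without capturing scroll or screen-size data.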
- elinkText — can you give an example of "space normalized string within anchor element"?
Please check T191086#4119312.
- freelyAccessible — can you share an example link for this?
Please check T191086#4118691.
- internalClick — is it a boolean?
- upClicks — is it a boolean?
- wikiProjects — can you share a page with (multiple, if possible) WikiProjects? I wonder if I can easily query this information while on some article page. If this information is not readily available on the page itself, then we probably can query this information after collecting data using pageId and some API to query from.
Please check T191086#4119312. In general, we want to do post-processing only if collecting data at the time of interaction becomes expensive. Technically, given that we will have revisionID of the page, we can always post-process which wikiprojects the page belonged to when such a notion exists. We rely on your judgement whether it's easier to collect this data at the time of registering the event or not.
- pageQualities — ditto.
Same as above. We can post-process, and the researchers may have to use ORES scores anyway since the quality scores assigned on wiki may not be up to date. Sometimes they live on discussion pages (https://en.wikipedia.org/wiki/Talk:Cardiovascular_disease), sometimes in the article itself. This can actually be a hairy problem to solve at event registration time. Let's rely on post-processing unless you insist it's easy. ;)
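As a rough illustration of the post-processing route, the sketch below only builds a batch request URL for ORES quality scores from logged revision IDs; it does not send the request. The helper name `ores_scores_url`, the default model name `articlequality`, and the sample revision IDs are assumptions, so check the ORES documentation for the models actually available on a given wiki.

```python
from urllib.parse import urlencode

# ORES v3 scores endpoint; the wiki database name (e.g. "enwiki") is the context.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def ores_scores_url(wiki, rev_ids, model="articlequality"):
    """Build a URL requesting `model` scores for several revision IDs at once."""
    query = urlencode({
        "models": model,
        "revids": "|".join(str(rev_id) for rev_id in rev_ids),  # pipe-separated batch
    })
    return f"{ORES_BASE}/{wiki}/?{query}"


# Hypothetical revision IDs taken from logged events.
url = ores_scores_url("enwiki", [123, 456])
print(url)
```

Because each logged event carries revisionID, the scores can be fetched in batches after data collection rather than at event registration time.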
- elinkPosition and citationNumber — are these numbers zero-based or one-based?
- timeBeforeClick — what is the start time for this? And to answer your question from the wiki page, it's easy to collect this timestamp while at data collection time.
- elinksClicked — To comment on your comment, I'm not sure if we can save userID for privacy reasons.
Actually, I had confused something; please scratch this item. We do need the IP address and userAgent, which should be collected as part of the event capsule; if they are not, we should collect them. Re userID: I had confused it with what we collected at https://meta.wikimedia.org/wiki/Schema:TranslationRecommendationUIRequests, which was an editor-facing item.
Fri, Apr 6
@bmansurov thanks for this. Diego and I spent some time on this and we have a way of listing the top n usernames we can reach out to first.
Per my current understanding, I listed all the data points that I can see we need for the first study by the two teams who will work with this data. They're at https://meta.wikimedia.org/wiki/Research_talk:Characterizing_Wikipedia_Citation_Usage#What_data_we_need. I've asked Bob et al. and Lauren et al. to leave comments there.
Thu, Apr 5
Wed, Apr 4
Mon, Apr 2
Thu, Mar 29
@DarTar I'm going to assign this to you. Can you go over the survey that ran in Q2 and prepare a summary based on the results? That can serve as a document we can go to in the future to learn from.
@bmansurov We can call this task for the purposes of Q3 done. If you agree, please move it to Done.
I'm moving this task to Backlog as it's not actively worked on. It would be great if @DarTar processes it at some point and defines the next step.
I'm going to decline this task until we hear back from Ma Commune folks on how they imagine the experiment being done on their end. This task is less essential now that we know the quality of the recommendations is good, /and/ that the community has already found the idea helpful and aligned with some of the tools/systems they have developed.
@srodlund Dario mentioned that he will write this report. It would be great if you can help him. Specifically, it would be great if before April 24 (workshop day) you talk with him to see what depth of report he wants to have, what kind of information he wants to include, and put a structure around it before we go to the workshop. After the workshop, we can provide a summary of what happened and help you fill in the structure.
@diego I will move this task to Done given that you have built the first version of the classifier and we're collecting ground truth data to improve it now.
@mkroetzsch can you let us know if you still need fresh data for WDQS research? If not, we will stop the script that extracts the data.
Wed, Mar 28
Thanks both for the super fast turnaround.
Can that information go to public pages, @jrbs? I /think/ there was some sensitivity around it?
@Halfak @Milimetric As you may know, we have been asked to provide a description of what happened in this session, plus anything we want affiliates to know, for a presentation at WMCON about Dev Summit. I agreed to write a first draft for the two of you to review. Here it is: https://etherpad.wikimedia.org/p/ZWn4q5ceHA
Tue, Mar 27
Thanks a lot, @bmansurov. It's fine to skip TP11 for now. Miriam has already worked on some of the tasks and I'll add 1-2 more.
Makes sense, @bmansurov. Go for it.
In case it's useful, my feedback notes are at: https://meta.wikimedia.org/wiki/User_talk:Markus_Kr%C3%B6tzsch/Wikidata_queries#Feedback
@Smalyshev I would say you need your team's manager sign-off, plus Security's and Legal's. Given that you're deeply familiar with this data and how it's processed, you're perhaps in the best position to have these conversations with the three people/entities.
@bmansurov Diego and I did one pass over this task and, as we discussed today, we want to start with en-* given that we have the en instructions ready. As we were finalizing the instructions, we figured out that it's best to narrow down the editors we reach out to based on their edit counts in en and *. Can you give us this number for all the pairs that start with en?
Mon, Mar 26
Halfak says this is unlikely to happen. I'll decline it. Feel free to bring it back to life, you two! :)
@Smalyshev can you check my comment at https://meta.wikimedia.org/wiki/User_talk:Markus_Kr%C3%B6tzsch/Wikidata_queries and let me know if this is something your team is willing to pick up?
Goals are moved to https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2017-18_Q4 in their corresponding sections.
@bmansurov shall we leave this task open?