Sep 5 2019
I just deleted some files and I'm compressing others. I didn't realize space was so tight ... my apologies.
Mar 22 2019
Mar 13 2019
Hi @leila. I don't think we ever collectively defined what an external link is in our schema. Relying on the external class is, in my opinion, a serious problem that weakens our research. I'm not sure how to account for it in analysis, since the extClick event data would miss a considerable number of links that editors clearly intend to be external reference links.
Mar 8 2019
My team discussed this today and reached consensus that comparing links with the document's hostname is preferred. This new definition of external seems cleaner and well worth the wait. Of course I'm happy to hear from others if there are objections to this change. Thank you!
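A minimal sketch of the hostname comparison described above, written in Python for illustration rather than taken from the instrumentation code. The function name `is_external` and the example URLs are assumptions, not part of the schema.

```python
from urllib.parse import urljoin, urlsplit


def is_external(href: str, document_url: str) -> bool:
    """Return True when href resolves to a different hostname than the page.

    Hypothetical helper: illustrates the proposed definition only.
    """
    # Resolve relative and protocol-relative hrefs against the page URL.
    target = urlsplit(urljoin(document_url, href))
    # Non-HTTP(S) schemes such as mailto: have no hostname to compare.
    if target.scheme not in ("http", "https"):
        return False
    return target.hostname != urlsplit(document_url).hostname


page = "https://en.wikipedia.org/wiki/Diamantane"
print(is_external("/wiki/Diamond", page))                  # same host
print(is_external("//doi.org/10.1000/example", page))      # different host
```

Under this definition the CSS class on the link plays no part, so interwiki links to other hosts would be counted the same way as links coded with class external.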
Mar 7 2019
The instrumentation code only reports extClick events on links explicitly coded with class external. It's simple to exclude internal links that were miscoded as external, but what about the reverse? Links that are coded as internal but are really external won't be represented in the click data at all. Interwiki links look like a potential problem here. For example, interwiki doi links get the class extiw, not external, so they would be missed. See ref 5 on Diamantane for an example. Interwiki doi alone represents a good number of links that my team would surely think of as external: see first 500. And reviewing the interwikimap, I see other base interwiki hostnames that seem "external": merriam-webster.com, handle.net, google.com, etc.
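To make the gap concrete, here is a rough Python sketch that compares the class-based rule with a hostname-based one. The sample links, the DOI, and the helper names are made up for illustration; only the class names external and extiw come from the discussion above.

```python
from urllib.parse import urlsplit

PAGE_HOST = "en.wikipedia.org"  # assumed hostname of the page being viewed

# Hypothetical sample of (href, CSS classes on the anchor element).
links = [
    ("https://www.example.org/paper", {"external", "text"}),
    ("https://doi.org/10.1000/example", {"extiw"}),  # interwiki doi link
    ("https://en.wikipedia.org/wiki/Diamond", set()),
]


def logged_by_class(classes: set) -> bool:
    """Class-based rule: only links coded with class external are reported."""
    return "external" in classes


def external_by_host(href: str) -> bool:
    """Hostname-based rule: any link leaving the page's host is external."""
    return urlsplit(href).hostname != PAGE_HOST


# Links that leave the site but would never produce an extClick event.
missed = [
    href
    for href, classes in links
    if external_by_host(href) and not logged_by_class(classes)
]
print(missed)
```

In this sample the only missed link is the interwiki doi one, which is the case that cannot be recovered after the fact because no event is ever recorded for it.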
Mar 6 2019
Sorry guys, it's hard for me to find time for this project during the week.
New review: https://github.com/ryanmax/wiki-citation-usage/blob/master/data-regression-2019-03-05.ipynb
Mar 1 2019
Hi @bmansurov. I reviewed the data today, specifically looking at the section_id and freely_accessible elements.
Feb 25 2019
I'm reopening this so someone can take a look at @toddleroux's access issue. See above. His ssh-key entry appears to be missing a key type (ssh-rsa) at the beginning of this line: https://github.com/wikimedia/puppet/blob/production/modules/admin/data/data.yaml#L3123
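For anyone checking similar entries, a small Python sketch of the test I have in mind: does the entry begin with a recognized key type? The helper name `has_key_type` and the sample strings are hypothetical, and the sketch assumes entries carry no leading options, which holds for plain "type key comment" lines.

```python
# Public key type prefixes used by OpenSSH.
KNOWN_KEY_TYPES = (
    "ssh-rsa",
    "ssh-dss",
    "ssh-ed25519",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
)


def has_key_type(entry: str) -> bool:
    """Return True when the entry starts with a known key type and a key blob."""
    fields = entry.split()
    return len(fields) >= 2 and fields[0] in KNOWN_KEY_TYPES


# Placeholder key material, not a real key.
print(has_key_type("ssh-rsa AAAAB3NzaC1yc2E user@example"))  # type present
print(has_key_type("AAAAB3NzaC1yc2E user@example"))          # type missing
```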
Feb 12 2019
Hi @Miriam. This sounds good to me. I should have some time to look at the sampled data this Friday. Thank you!
Feb 6 2019
Excellent! Thank you, @Dzahn. I think this ticket can be closed.
Jan 25 2019
@bmansurov I don't think I have access to deployment-eventlog05.deployment-prep.eqiad.wmflabs or any of the wmflabs machines.
Jan 24 2019
@Lauren.maggio and I just discussed citation_identifier_label. Using a more comprehensive list of identifiers from Citation Style 1 would improve the data quality of this element. That said, we don't think we'll be able to make meaningful use of it, and we recommend dropping the element altogether. @bmansurov, can you remove citation_identifier_label? Thank you for your patience on this one.
Thank you for the testing instructions @bmansurov. I will plan to review beta cluster pages and test data tomorrow. I'm still waiting to hear back from my team on T212937#4893106 and whether or not citation_identifier_label data is useful. In the meantime, I will update that task with a more comprehensive list of identifiers that I think should be used if citation_identifier_label remains. Sorry this has taken so long.
Jan 18 2019
Thank you for the fix to *freely_accessible*
Jan 15 2019
Jan 14 2019
Nov 20 2018
wikitech info for @RyanSteinberg
Nov 14 2018
Nov 13 2018
I don't seem to have access to Office Wiki and I don't see an option to create an account. Should I share my public SSH key here or wait for Office Wiki access?