Hey @Aklapper: yes, the project is accepted. We didn't require the Phabricator submission when Outreachy applications were due because some of the students were unable to create Phabricator accounts at the time. Instead, we had Doris create this when she was accepted, for tracking going forward. Thanks for checking!
Mon, May 20
Fri, May 17
Thu, May 16
Tue, May 14
Mon, May 13
Wed, May 8
Could I ask for any kind of feedback on my analysis? It would be very useful to know what I need to pay attention to next time.
Hey @Cherrywins -- yes, I can do that. I'll email you by the end of the week using the email you provided on your application.
Tue, May 7
Mon, May 6
Fri, May 3
There seems to be less STEM-related language switching on wiki. My guess is that those articles are not available in the local languages.
Yeah, I'd agree and also expect that this is somewhat Google's bias in what signals they use to choose articles to translate.
Thu, May 2
This is awesome @chelsyx !
Tue, Apr 30
Mon, Apr 29
This looks awesome @Miriam -- just adding more particulars to what I mentioned today:
Apr 19 2019
- QuickSurveys: added functionality complete but waiting on status of two possibly related bugs (T218243 and T220627)
- Potential for some preliminary insights: the reader demographics surveys (T203042) should provide some early insight into editor gender
Apr 16 2019
- Found supporting evidence that men do indeed read Wikipedia more frequently than women in the United States: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2617021
  - Based on a survey of 1000 AMT workers from the US: "Second, men use Wikipedia more often — they are twice as likely than women to use Wikipedia daily"
- While younger respondents were consistently more likely to read Wikipedia frequently, there is mixed evidence from the Global Insights phone surveys on gender:
  - India: women more likely to be frequent readers of Wikipedia
  - Mexico: men more likely to be frequent readers of Wikipedia
  - Nigeria: men slightly more likely to be frequent readers of Wikipedia
  - Iraq: ~equal likelihood by gender of being frequent readers of Wikipedia
- Though we're missing EventLogging for ~10% of our responses, it is actually more likely to be missing for our younger, male readers (T220627#5113109). So if there are issues with QuickSurveys loading, they would, if anything, lead to even greater skew in the demographics results.
Apr 15 2019
My other current theory is that the missing 10% possibly comes from browsers that don't support sendBeacon.
From the pilot survey, the only question-related feedback we received was that the language dropdown was not displaying correctly for some, so I'll be adding the English names of each language as a fallback -- e.g., "Чӑвашла (Chuvash)"
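A tiny sketch of that fallback in Python (the language table and function are hypothetical; only the "Чӑвашла (Chuvash)" example comes from the survey feedback above):

```python
# Hypothetical (autonym, English name) table -- only the Chuvash entry
# is from the actual survey feedback above.
LANGUAGES = {
    "cv": ("Чӑвашла", "Chuvash"),
    "en": ("English", "English"),
}

def dropdown_label(code):
    """Show the autonym, with the English name appended as a fallback."""
    native, english = LANGUAGES[code]
    if native == english:
        return native
    return f"{native} ({english})"
```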
Apr 12 2019
i.e. it's unlikely but not impossible that QuickSurveys could be loaded and executed before EventLogging.
Is there any documentation I can read on the flow of the surveys? Does the user click a link on-wiki that opens a Google/Qualtrics form?
Apr 10 2019
Is it possible that the link to the survey is being shared outside a QuickSurvey (e.g. social media)?
Apr 9 2019
Excellent - thanks @srodlund ! I will try to make sure we don't have to create any more of these tasks in the future too :)
Apr 8 2019
Looks good to me! Thanks team!
Apr 4 2019
Not sure if this is the same issue as in this thread or should be separated into a new task, but...
Apr 2 2019
Apr 1 2019
Am I correct in understanding that the talk page basically records the changes made to the original article in English?
Mar 30 2019
Mar 29 2019
Are registeredBefore and registeredAfter stored in ISO 8601 format OK for you? Today's date (Fri Mar 29 2019) would be stored as 2019-03-29.
Yes, that'd be perfect. Thanks!
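For reference, a minimal Python sketch of that ISO 8601 formatting (field names are from the exchange above):

```python
from datetime import date

# registeredBefore / registeredAfter stored as ISO 8601 dates (YYYY-MM-DD)
d = date(2019, 3, 29)       # Fri Mar 29 2019
iso = d.isoformat()         # "2019-03-29"
```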
Mar 28 2019
@NuKira that is entirely up to you whether you feel you can complete the application. Glad to hear you are still interested.
Mar 27 2019
Yes, regarding public links for PAWS notebooks: in general if you want to check what public notebooks exist for you, you can go to this URL (with your username substituted in) to see the list:
Mar 26 2019
@Cherrywins : categories are not a straightforward concept on Wikipedia. I believe you can get the categories that are listed for a page (https://www.mediawiki.org/wiki/API:Categories), but this is far from a perfect solution. I would not worry about getting this perfect on a submission - if you find an approach that works, great, but I'd say more important is to discuss how you might approach this given more time.
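As a rough sketch of that API (standard Action API endpoint; the helper names and the idea of skipping hidden tracking categories are illustrative choices, not a required approach):

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

def build_category_params(title):
    # Query parameters per https://www.mediawiki.org/wiki/API:Categories
    return {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "clshow": "!hidden",  # skip hidden maintenance/tracking categories
        "cllimit": "max",
        "format": "json",
    }

def get_categories(title):
    """Return the non-hidden categories listed on a page.
    A sketch only: no retries and no continuation handling."""
    url = API_URL + "?" + urllib.parse.urlencode(build_category_params(title))
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    categories = []
    for page in data["query"]["pages"].values():
        for cat in page.get("categories", []):
            categories.append(cat["title"])
    return categories
```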
@Supida_h the latter is sufficient but do your best to keep it to an amount of work that you could reasonably complete during the program.
Mar 25 2019
Always worth saying: thanks, all, for answering each other's questions and being supportive.
@Trishla08 if you have specific questions, I or others can try to provide some assistance. General feedback is not feasible at this stage though. I am not great at troubleshooting IRC but Phabricator has been the more effective channel for discussion on this project.
Mar 22 2019
An issue raised by @Muraran : even with the removal of duplicate commas in the .text.json.gz file, there can be a trailing comma at the very end that interferes with proper loading. Here's how you can figure out what's going on when you get these errors and how to fix it:
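Not necessarily the original instructions, but one way to sketch the debugging and fix in Python (the regex assumes the stray comma sits right before a closing bracket/brace and that no string values contain that pattern):

```python
import gzip
import json
import re

def load_lenient(path):
    """Load a .json.gz dump, reporting and working around a trailing comma."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        raw = f.read()
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        # This tells you where parsing broke so you can inspect the file
        print(f"Parse error at line {err.lineno}, column {err.colno}: {err.msg}")
        # Strip a comma that sits just before a closing bracket/brace and retry
        cleaned = re.sub(r",\s*([\]}])", r"\1", raw)
        return json.loads(cleaned)
```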
Hey @Muraran: glad you're being proactive but in the future, reach out to me first or raise questions/bugs on the thread (T217699) or its related subtasks. I'm going to close this task and move the discussion over there.
Mar 21 2019
This looks good to me. A few notes:
Looks good to me. Documentation of each potential criterion:
- Target anonymous users (wgEditCount === null) and logged-in users without edits (wgEditCount === 0)
  - minEdits undefined and maxEdits set to 0
- Target a non-editor (wgEditCount === 0)
  - minEdits and maxEdits set to 0. Alternatively, setting maxEdits to 0 and anons to false (T186737) would also lead to targeting just logged-in users without edits.
- Target an editor
  - minEdits set to 1, which would sample all users with at least one edit. This is based on the definition used in this task that a non-editor has zero edits and therefore an editor has at least one edit.
- Target a user with an edit count that falls into a given range -- e.g., 5-20 edits
  - For this example, minEdits set to 5 and maxEdits set to 20. Notably, the configuration allows flexible ranges (as opposed to limiting the edit ranges to pre-defined buckets as recorded in the EventLogging schema)
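To make the criteria above concrete, here is a hypothetical sketch of the corresponding settings as Python dicts. minEdits, maxEdits, and anons are the knobs discussed in this task; the variable names, dict form, and use of None for "undefined" are illustrative, not the actual extension configuration:

```python
# Hypothetical representations of the audience criteria documented above.
# None stands in for "undefined".
non_editors_incl_anons = {"minEdits": None, "maxEdits": 0}   # anons + logged-in, zero edits
non_editors_logged_in  = {"minEdits": 0, "maxEdits": 0}      # or maxEdits=0 with anons=False
editors                = {"minEdits": 1, "maxEdits": None}   # at least one edit
edit_range_5_to_20     = {"minEdits": 5, "maxEdits": 20}     # flexible range
```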
Mar 20 2019
Thanks @Jdlrobson for making that update. All looks good to me (I'll update the task description and resolve)! Documentation for each acceptance criterion:
@Jdlrobson : before I sign off on this, I think the Developer Notes in the task description are the opposite of what the functionality actually does. Before I update them, I wanted to make sure I wasn't misinterpreting:
Mar 19 2019
Hey all - considering that PAWS was unreachable for a while and this project was posted later in the cycle, I am going to extend the deadline for working on this until April 2nd. That gives you another two weeks to explore the data and begin to generate questions / analyses that you could build on in a summer project. I'll update Outreachy's website as well.
Is there a way to get the length of each article, i.e. the number of bytes, or do I have to scrape pages to get that information?
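No scraping should be needed: the Action API's prop=info reports a page's length in bytes. A minimal sketch (the endpoint and helper names are illustrative):

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

def build_info_params(title):
    # Query parameters per https://www.mediawiki.org/wiki/API:Info
    return {
        "action": "query",
        "prop": "info",
        "titles": title,
        "format": "json",
    }

def page_length_bytes(title):
    """Return the wikitext length of a page in bytes (sketch; one page,
    no error handling)."""
    url = API_URL + "?" + urllib.parse.urlencode(build_info_params(title))
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    (page,) = data["query"]["pages"].values()
    return page["length"]  # bytes of wikitext
```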
Mar 18 2019
The dumps exclude sections which have no user translation, because that is not useful information in a comparable corpus. It seems the API does not do this filtering.
@Israashahin : thanks for letting me know that you needed an answer to that as the original comment has been deleted. The example notebook that I provided to you ( https://paws-public.wmflabs.org/paws-public/User:Isaac_(WMF)/Content%20Translation%20Example.ipynb ) has a link to examples of how to do that under the Quantitative Analyses section. If you have more specific questions, let me know.
I am trying to access Page History for the translated page, but it doesn't work. It works as planned with the source page though.
Mar 17 2019
@Supida_h yes - while I'd prefer that you upload to PAWS and submit that link, if the service is not responding, a Github link that is open would be acceptable as well.
Mar 16 2019
Thanks @XinyueWang1 for offering assistance!
@Mansi29ag and @Supida_h: Glad you're looking into this -- I believe those statistics are for the initial translation (not what happens afterwards, which is one reason this is an important research project) and indicate what proportion of content is translated over and whether it was created by humans or came from the machine translation. Because it's based on word count, a translated article with more words than the source article results in a number over 1. For example, if the source article had 1000 words and the translated article had 1200 words, then any would be 1.2. If half of those 1200 words were suggested by the machine translation and half were added by the editor, then mt and human would each be 0.6.
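The arithmetic in that example, spelled out (variable names are illustrative; the ratios are relative to the source article's word count):

```python
# Worked version of the example above
source_words     = 1000
translated_words = 1200
mt_words         = 600   # words suggested by the machine translation
human_words      = 600   # words added/changed by the editor

any_ratio   = translated_words / source_words   # 1.2
mt_ratio    = mt_words / source_words           # 0.6
human_ratio = human_words / source_words        # 0.6
```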
Mar 15 2019
@NuKira at this point I understand that not everyone will have experience with qualitative methods, so do not worry if you're not certain of the right approach. What you should focus on is whether you can generate some questions or hypotheses around the content translation tool. These can be aimed at the types of content that are / are not translated, or at what happens after an article is translated. So go through some of the articles that have been translated and look for patterns. For example: do you see that overview content is translated but that more detailed specifics of an article are often left behind? If so, maybe give some examples of sections that correspond to each. Do you find that new content that is more culturally specific is added to the translated article after it has been created?
Welcome everyone who has joined in the past few days! As you may see from the others, feel free to ask questions and let me know if you're running into challenges with getting started on this research. It's an open-ended task so don't be discouraged!
Hey @Mansi29ag : is this the notebook you're trying this out in: https://paws-public.wmflabs.org/paws-public/57510755/Untitled.ipynb
Mar 14 2019
@Jdlrobson Revising a bit from what I said on IRC: for our recent survey (March 4/5) on English Wikipedia, I looked at the distribution of webhost (en vs. en.m) and browser family (Chrome, Chrome Mobile etc.) for those who received the survey (QuickSurveyInitiation) vs all of the webrequests to en.wikipedia for that same time period. It seems less about mobile vs. desktop and more about specific browsers or OSes. See below (and I'm happy to talk more):
Update: see T215670#5024817 for current blocking issues with pilot that prevent full surveys from moving forward.
Current status and findings:
- We are concerned about how much selection bias we may be seeing in the pilot results (i.e. is the survey reaching a representative sample of readers or not).
- We are seeing a much higher proportion of younger users and users who identify as men than we expected. This could be because these users truly read Wikipedia more frequently (and so are more likely to be included in the survey) or it could be due to higher rates of self-selection into the survey. We will evaluate whether these trends are consistent by country and other demographics.
- We are concerned about whether certain bugs are affecting the QuickSurvey sampling and logging and would like to address these (or at least better understand them) before moving on:
  - T218243, which would mean mobile is being undersampled. This matches what we see in our survey: while e.g. ~20% of English Wikipedia readers use Chrome, 34% of the devices that saw the survey used Chrome
  - Approximately 12% of our survey responses cannot be matched to QuickSurveyInitiation EventLogging. The reason for this is unclear: almost all of the survey codes from these responses look normal and the timestamps are relatively evenly distributed across the survey deployment.
  - Approximately 18% of our survey responses cannot be matched to QuickSurveysResponses EventLogging: this higher percentage is possibly due to QuickSurveysResponses EL not being captured if the user explicitly right-clicks and opens the survey in a new tab (verified by me; see T131315#2311065 for a related issue)
The first question is why I don't have a translation in the parallel corpus: is it because the Arabic translation covers only the general description of the articles I chose (not the whole article), or is there a problem with the dump file download?
Mar 13 2019
Hey @atgo: can we close this task out or are there still questions around the short-term analysis that you're waiting on? Thanks!
Volunteers / discussion being tracked here: https://meta.wikimedia.org/wiki/Research_talk:Characterizing_Wikipedia_Reader_Behaviour/Demographics_and_Wikipedia_use_cases