Computer Science undergrad in love with Wikimedia's vision.
Imagine a world where every human being can share freely in the sum of all knowledge.
@RonnieV we are importing the main() function in an executable file called utility and running that using
./utility /mnt/data/xmldatadumps/public/nlwikimedia/latest/nlwikimedia-latest-pages-meta-history.xml.bz2 --processes=8 --output=../output.json
so the __name__ == "__main__" is not an issue.
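A minimal sketch of why the guard is not a concern in this setup. The function body and the argparse-based parsing are assumptions for illustration only, not the actual utility's code; only the argument names mirror the command above.

```python
import argparse


def main(argv=None):
    """Hypothetical entry point that accepts the same arguments as the command above."""
    parser = argparse.ArgumentParser(prog="utility")
    parser.add_argument("dump_path", help="path to the XML dump file")
    parser.add_argument("--processes", type=int, default=1)
    parser.add_argument("--output", default="output.json")
    args = parser.parse_args(argv)
    # A real utility would process the dump here; this sketch only returns
    # the parsed values so the call path can be checked.
    return args


# The executable wrapper imports main() and calls it directly, so nothing
# depends on an `if __name__ == "__main__":` guard inside the module.
args = main(["dump.xml.bz2", "--processes=8", "--output=../output.json"])
print(args.dump_path, args.processes, args.output)
```

Because the wrapper calls main() explicitly, the value of __name__ inside the imported module never decides whether the code runs.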
So I have been trying to run the script @Halfak has linked in his previous comment on ores-misc-1. The output is slightly confusing.
Hi everyone, I'm an Outreachy intern and looking forward to contributing to this project. Just to clarify, in order to complete the micro tasks, I would need to compare all existing NSFW classifiers/datasets for Wikimedia Commons?
@calbon This task is still ongoing work. Is there a reason we have marked it as resolved? Has it been moved off our to-do list?
@calbon Any reason we are removing the Machine Learning Platform (Research) tag from this task?
Just saw your comment on the discussion @Aklapper, I realize now that I was speaking on behalf of the foundation. Thank you for correcting me.
User Rar also asked if it is possible to list the levels in order, either descending or ascending. Since IV=stub, the current order is a bit confusing.
New thread continuing the discussion with the Mozilla folks: https://github.com/mozilla-frontend-infra/codetribute/issues/401
Definitely considering making it a GSoC project for next year, but I would love to get started with the planning and groundwork now to make sure we are on top of things.
Codetribute code: https://github.com/mozilla-frontend-infra/codetribute
Thank you so much Andre! Just excited to be of help to the foundation.
Right, but I'm not sure if that's a good argument to add extra layers, as I could also argue that new contributors have no prior experience with wiki pages? :P
Thank you @Aklapper
I see, that is unfortunate. Nevertheless we can continue and maybe you can tell me if my answers have helped move the task along a little further?
I meant someone from the Service deployment team, so that it can be on their radar. T214201 is waiting on this task.
@Aklapper anyone I can reach out to for this task?
@Aklapper Let's continue our discussion on this task. It could be relevant to GCI this year, and I think we should try to wrap up the discussion before then. What do you say?
Great! Feel free to share any feedback here or in IRC (chtnnh on Freenode)
Also, we have a date for the production deployment, sometime later next week.
I see no way of judging whether a template is an infobox just by its name; there has to be a comparison with the list of template names believed to be infoboxes.
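A rough sketch of that comparison, assuming a hand-maintained set of names. The set contents and both helper functions are hypothetical and only illustrate the lookup; they are not taken from the extractor.

```python
# Hypothetical list of template names believed to be infoboxes (assumption).
KNOWN_INFOBOXES = {"infobox person", "infobox settlement", "taxobox"}


def normalize(name):
    """Lowercase, trim, and treat underscores as spaces."""
    return " ".join(name.replace("_", " ").split()).lower()


def is_infobox(template_name, known=KNOWN_INFOBOXES):
    """Return True only if the normalized name appears in the known list."""
    return normalize(template_name) in known


print(is_infobox("Infobox_person"))
print(is_infobox("Citation needed"))
```

The lookup deliberately avoids guessing from a prefix such as "infobox", since a template can be an infobox without carrying that word in its name.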
Hey @Aklapper ! Sorry for the late reply, was stuck with some work on other tasks.
Great, thanks @Ata
We are going to move ahead with these numbers to build an initial iteration of the model and then get your feedback on that.
@Halfak The PR has been merged, let me know when you want me to mark this task as resolved.
We haven't received any misclassification reports from the ptwiki community and are planning to go ahead and deploy the current version of the model.
Correct me if I'm wrong, but ВС is the featured article class, right? If so, then the numbers may be worth trusting and we can go ahead with building the feature lists for the model.
When we run the extractor and count the number of instances of each class, we get the following output:
Thank you so much for your support @Ata. We have constructed an initial version of the extractor and are going to have a run on it hopefully by the end of today.
Hello @Ata! So the approach I am thinking of here is to solve this task in three steps:
@Halfak Can you please check if the utility is functioning as required?
Task has been completed and model has shown improvement in fitness.
The table has been updated under the summary section of the misclassification reports. This task can be closed as resolved.
Adding Outreach-Programs-Projects to the tags as the work is currently stalled on this task and it would be better suited as a GCI/GSoC/Outreachy project!
@Darwinius Hey Darwin! Can we find another way to collaborate and discuss the task? Something like IRC, where you can find me in the #wikimedia-ai channel under the nick chtnnh. I am also open to anything else that works for you.
@Salgo60 I am sorry, I do not understand how your comment relates to the task at hand. Can you please elaborate a little on your previous comment?
Thank you @Fuzheado!
Thank you @srishakatux 😄
@Pavithraes Thank you! 😄
This is the difference between model performance before and after adding words_to_watch to the feature_lists/ptwiki.py on selected articles that were misclassified by the old model
Here is the articlequality code for review.
After adding words_to_watch to draftquality we did not achieve any significant fitness improvement. This is evident in the tuning_report diff in this PR: https://github.com/wikimedia/draftquality/pull/39
Thanks so much @Lazy-restless 😄
@He7d3r has updated the message. What do you think about it now? I think we should see some input from the community on the model (articlequality) soon. In the meantime what can we achieve for draftquality?
@GoEThe Added my message on Esplanada. Do check it out and correct anything that seems wrong.
The script is all credit to @Halfak 😄
Whichever is easiest. I would personally prefer that someone could translate it to Portuguese first though. Maybe you can add a Portuguese translation at the top of the same pad?
The model seems to be working with an accuracy of 80% from the numbers @He7d3r has reported. I think this review could benefit from extra pairs of eyes on it. Can @GoEThe and @He7d3r bring in more community members to check out https://etherpad.wikimedia.org/p/jsForPtwikiOres ?
That seems to make the model's prediction right, is that correct @GoEThe ?
That could be one way to think of it, yes. The prediction is simply the label with the highest probability, whereas the weighted sum gives a clearer picture of how the model actually assesses the article.
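To make the distinction concrete, here is a small sketch. The class labels, their ordering, the index-based weights, and the probabilities are all made up for illustration; they are not output from the real model, and the actual weighting scheme may differ.

```python
# Hypothetical class labels, ordered from lowest to highest quality (assumption).
LABELS = ["1", "2", "3", "4", "5", "6"]


def prediction(probabilities):
    """Return the label with the highest probability."""
    return max(probabilities, key=probabilities.get)


def weighted_sum(probabilities, labels=LABELS):
    """Sum each class's probability weighted by its position in the ordering."""
    return sum(probabilities[label] * index for index, label in enumerate(labels))


# Made-up probabilities for illustration only.
probs = {"1": 0.05, "2": 0.40, "3": 0.35, "4": 0.15, "5": 0.04, "6": 0.01}
print(prediction(probs))
print(round(weighted_sum(probs), 2))
```

In this made-up case the prediction is class 2, while the weighted sum of about 1.76 sits closer to index 2, which is class 3. The two can disagree whenever the probability mass is spread across neighbouring classes.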
That would be very handy, it would make the review process much faster for the ptwikipedians.
I think @GoEThe 's suggestion makes a lot of sense. That would cover a reasonable amount of articles and would still be relevant to be considered for draftquality.
Model has been built 😄
The new service request has been filed (T250110) and if you have any input on it please feel free to comment on the task.
Thank you so much for your review @Pchelolo,
I think you could do it in a deferred update so that it doesn't block giving a response to the user.