Computer Science undergrad in love with Wikimedia's vision.
Imagine a world where every human being can share freely in the sum of all knowledge.
I see no way of judging whether a template is an infobox just by its name; there has to be a comparison against a list of template names believed to be infoboxes.
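The idea above can be sketched as a simple membership check against a curated list. This is a minimal illustration, not the actual extractor code, and the template names in `KNOWN_INFOBOXES` are hypothetical placeholders:

```python
# Hypothetical curated list of infobox template names (illustrative only).
KNOWN_INFOBOXES = {"Info/Biografia", "Info/Município", "Info/Taxonomia"}

def normalize(name):
    """Normalize a template name for comparison (trim, underscores to
    spaces, first letter capitalized as MediaWiki does for titles)."""
    name = name.strip().replace("_", " ")
    return name[:1].upper() + name[1:] if name else name

def is_infobox(template_name):
    """Return True only if the template appears in the curated list --
    the name alone cannot tell us it is an infobox."""
    return normalize(template_name) in KNOWN_INFOBOXES

print(is_infobox("info/Biografia"))  # True
print(is_infobox("Citar web"))       # False
```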
Hey @Aklapper ! Sorry for the late reply, was stuck with some work on other tasks.
Great, thanks @Ata
We are going to move ahead with these numbers to build an initial iteration of the model and then get your feedback on that.
@Halfak The PR has been merged, let me know when you want me to mark this task as resolved.
We haven't received any misclassification reports from the ptwiki community and are planning to go ahead and deploy the current version of the model.
Correct me if I'm wrong, but ВС is the featured article class, right? If so, then the numbers may be worth trusting and we can go ahead with building the feature lists for the model.
When we run the extractor and count the number of instances of each class, we get the following output:
Thank you so much for your support @Ata. We have constructed an initial version of the extractor and plan to do a run with it, hopefully by the end of today.
Hello @Ata! So the approach I am thinking of here is to solve this task in three steps:
@Halfak Can you please check if the utility is functioning as required?
The task has been completed and the model has shown improved fitness.
The table has been updated under the summary section of the misclassification reports. This task can be closed as resolved.
Adding Outreach-Programs-Projects to the tags as the work is currently stalled on this task and it would be better suited as a GCI/GSoC/Outreachy project!
@Darwinius Hey Darwin! Can we find another way to collaborate and discuss the task? Something like irc where you can find me hanging in the #wikimedia-ai channel by the nick chtnnh. I am open to anything that works for you also.
@Salgo60 I am sorry, I do not understand how your comment relates to the task at hand. Could you please elaborate a little on your previous comment?
Thank you @Fuzheado!
Thank you @srishakatux 😄
@Pavithraes Thank you! 😄
This is the difference in model performance before and after adding words_to_watch to feature_lists/ptwiki.py, measured on selected articles that were misclassified by the old model.
Here is the articlequality code for review.
After adding words_to_watch to draftquality we did not achieve any significant fitness improvement. This is evident in the tuning_report diff in this PR: https://github.com/wikimedia/draftquality/pull/39
Thanks so much @Lazy-restless 😄
@He7d3r has updated the message. What do you think about it now? I think we should see some input from the community on the model (articlequality) soon. In the meantime what can we achieve for draftquality?
@GoEThe Added my message on Esplanada; do check it out and correct anything that seems wrong about it.
The script is all credit to @Halfak 😄
Whichever is easiest. I would personally prefer that someone could translate it to Portuguese first though. Maybe you can add a Portuguese translation at the top of the same pad?
The model seems to be working with an accuracy of 80% from the numbers @He7d3r has reported. I think this review could benefit from extra pairs of eyes on it. Can @GoEThe and @He7d3r bring in more community members to check out https://etherpad.wikimedia.org/p/jsForPtwikiOres ?
That seems to make the model's prediction right, is that correct @GoEThe ?
That could be one way to think of it, yes. This is because the prediction is simply the label with the highest probability, whereas the weighted sum sheds more light on the model's actual understanding of the article.
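To make the distinction concrete, here is a small sketch. The class labels, probabilities, and weights below are made up for illustration and are not the real model's output:

```python
# Hypothetical per-class probabilities from a quality model (illustrative).
probabilities = {"stub": 0.05, "start": 0.30, "c": 0.35, "b": 0.20, "ga": 0.07, "fa": 0.03}
# Hypothetical ordinal weight for each quality class, low to high.
weights = {"stub": 1, "start": 2, "c": 3, "b": 4, "ga": 5, "fa": 6}

# The prediction is just the single label with the highest probability...
prediction = max(probabilities, key=probabilities.get)

# ...while the weighted sum blends all class probabilities into one score,
# capturing where the article sits on the whole quality scale.
weighted_sum = sum(weights[label] * p for label, p in probabilities.items())

print(prediction)     # "c"
print(weighted_sum)   # 3.03 -- between "c" (3) and "b" (4)
```

An article whose probability mass is split between adjacent classes can thus get a weighted sum noticeably different from the weight of its argmax label.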
That would be very handy; it would make the review process much faster for the ptwikipedians.
I think @GoEThe 's suggestion makes a lot of sense. That would cover a reasonable amount of articles and would still be relevant to be considered for draftquality.
Model has been built 😄
The new service request has been filed (T250110) and if you have any input on it please feel free to comment on the task.
Thank you so much for your review @Pchelolo,
I think you could do it in a deferred update so that it doesn't block giving a response to the user.
After speaking with @Daimona I have been informed that we require a MediaWiki extension if we wish to communicate with AbuseFilter. Is it possible to use the MachineVision extension for this or should I write a custom extension for this task?
Also, the ElasticSearch data is only updated once a week. That's usually a minor annoyance, but for recent changes it seems particularly ill-suited.
@GoEThe Apart from the Infobox and Citation needed templates, can we assume the other templates are the same as on enwiki? If they differ, we will also need those templates to build the model.
@MusikAnimal Some follow up questions after the preliminary work I have done on the task:
@Halfak Is there a way to loop in someone who knows about ElasticSearch so that we can explore that possibility? The table size for ores_classifications is quite significant, as you mentioned.
@Halfak The second link seems to be broken, could you help me find that resource please?
@kostajh Thank you for the pointers. I will treat them as a step-by-step on the task.
Claiming the task to clear any confusion regarding work. I will be working on this as soon as possible.
T240558 is not yet done, so how do you suggest I go about this task? Also, could you shed some light on storing data in the ORES extension, as I have almost no experience with the Wikimedia DB?
Thank you @He7d3r! That clarifies the template situation completely.
@Halfak Why is the sum of all articles in all classes so low? Doesn't the ptwiki have more than a million articles?
Thank you so much for your support @Lazy-restless 😄☮
Hello Darwin! @Halfak and I would like to submit this proposal to the coming Google Summer of Code program and require a second mentor from the Portuguese wiki community. @GoEThe suggested your name to us. What we would need from you is about 4 hours a week to answer some questions about the ptwiki and check any assumptions we may be making while developing this model. We would also require your assistance in testing the models once we're ready. Do you think it would be possible for you to commit your time to this?
Fetching scores for images should work in the same way.
@Darwinius Hello! Do you think you would be able to help us out with this proposal?
Brion Vibber certainly does, and he designed the schema and implementation plan at https://www.mediawiki.org/wiki/Wikimedia_Product/NSFW_image_classifier/Storage_notes.
Understood. That will be our training and testing data and then upon satisfactory performance we can try deploying the model, am I right?
Some follow up questions:
@MusikAnimal I found the link describing the image table! Do check it out.