User Details
- User Since
- Aug 8 2017, 10:56 AM (293 w, 5 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Diego (WMF) [ Global Accounts ]
Wed, Mar 22
Sat, Mar 11
- New features have slightly improved the accuracy (now 75%); I'm still working on improving the model.
- @Trokhymovych has finished the next version of this model, which covers 47 languages. @MunizaA reviewed and adapted the code, and now we are coordinating with @achou to update the model on Lift Wing.
- We are discussing the schema for this (and other) ML-generated events (T331401)
Wed, Mar 8
The problem I see with 1) is that we are already (and rightfully so) filtering out a lot of events, whereas researchers may want the whole stream scored.
@elukey, do we know which data is being filtered out?
Tue, Mar 7
And for the revert-risk model:
score:
  model_name: revertrisk
  model_version: 1.0.1
  prediction:
    - true
  probability:
    true: 0.9
    false: 0.1
This works for me.
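A minimal Python sketch of how a producer might fill in the score structure above from a single model output. It assumes the classifier returns the probability of the "true" (revert) class and that the predicted label is picked with a 0.5 threshold; the helper name `build_revertrisk_score`, the threshold, and the use of string keys "true"/"false" are assumptions for illustration, not part of the agreed schema.

```python
def build_revertrisk_score(p_revert: float, model_version: str = "1.0.1",
                           threshold: float = 0.5) -> dict:
    """Build a dict shaped like the proposed revert-risk `score` field.

    Hypothetical helper: the 0.5 decision threshold is an assumption.
    """
    if not 0.0 <= p_revert <= 1.0:
        raise ValueError("p_revert must be a probability in [0, 1]")
    # The "false" probability is the complement; rounding hides float noise.
    p_keep = round(1.0 - p_revert, 6)
    label = "true" if p_revert >= threshold else "false"
    return {
        "model_name": "revertrisk",
        "model_version": model_version,
        "prediction": [label],
        "probability": {"true": p_revert, "false": p_keep},
    }


score = {"score": build_revertrisk_score(0.9)}
print(score)
```

Keeping both class probabilities, as in the proposed structure, lets consumers apply their own threshold instead of relying on the producer's `prediction`.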
Mon, Mar 6
- We are working on increasing the number of languages covered by this model. The model currently hosted on Lift Wing covers 6 languages; the next version will cover 47. (Keep in mind that the language-agnostic model, T314385, covers all wikis.)
Sun, Mar 5
- We are discussing how to integrate this model into the Recent Changes page on MediaWiki (T329071)
- Currently I'm working on feature engineering. The current model has around 72% accuracy on balanced data.
Fri, Mar 3
Thu, Mar 2
We should all sync up and agree on some major, standardized modeling design decisions and ideas. It would be great if we could share intentions and strategies for the future so we can prioritize work between ML and Event Platform especially.
I would add research on this ;)
Mon, Feb 27
Feb 24 2023
@achou, the main requester will be the aforementioned API, for evaluating the model. I don't expect high traffic; let's say a couple of thousand requests per week.
Feb 18 2023
- The paper was officially accepted at WWW'23. We made some final updates to the text.
- Discussing the integration of Revert Risk on MediaWiki: T329071
- Still working on the data evaluation. Currently I'm studying the use of tags and user groups and their relation to reverts.
Feb 16 2023
Apparently this is currently done by: https://www.mediawiki.org/wiki/Extension:ORES
Feb 7 2023
Feb 3 2023
- We are working on manually evaluating reverts to identify the right data to train the model.
- We have submitted the Camera Ready version of this paper.
- We have started working on evaluating references in other languages.
- We are coordinating with the ML-team to create a public stream with this model's score.
Jan 27 2023
- We are working on a paper to share the results of this model.
- Our paper was conditionally accepted at TheWebConf'23 (a.k.a WWW'23)
Jan 25 2023
@Miriam, what should we do with sections that have multiple images? Do you want
<page id>,<page title>,<section title>,<img_title>,<n_recommendations>
?
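One way to handle sections with several images is a "long" layout with one row per image. A minimal Python sketch with the standard `csv` module, using the column order proposed above; it assumes `n_recommendations` is the number of images recommended for the section, repeated on each of its rows, and the helper names and example data are hypothetical.

```python
import csv
import io

# Column order taken from the proposed format.
FIELDS = ["page_id", "page_title", "section_title", "img_title", "n_recommendations"]


def section_rows(page_id, page_title, section_title, images):
    """Yield one row per recommended image in a section (long format).

    Assumption: n_recommendations = number of images for this section.
    """
    n = len(images)
    for img in images:
        yield [page_id, page_title, section_title, img, n]


def to_csv(rows):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # csv quotes commas in titles
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example data, not from the real dataset.
out = to_csv(section_rows(12345, "Example page", "History", ["A.jpg", "B.jpg"]))
print(out)
```

The alternative, one row per section with image titles joined by a separator, keeps one row per section but pushes the splitting work onto every consumer.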
Jan 19 2023
The model is already available, check here how to use it: T314385#8496547
Jan 13 2023
- We have presented the results of this project at the WMF's "Monthly Tech All Meeting"
- The final decision should be out this week.
Jan 9 2023
Hi all, we have a use case here: T326179. These models are already hosted on Lift Wing. The suggested endpoint could be mediawiki-revision-score-revert-risk-la
For the record, here is a snippet (by @achou) to try the models from the WMF cluster:
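As a hedged stand-in (this is not @achou's snippet), here is a minimal Python sketch of building a scoring request for one revision with only the standard library. The internal URL, the `Host` header, the model name `revertrisk-language-agnostic` and the `rev_id`/`lang` payload fields are all assumptions for illustration; check T314385 for the actual values before using it.

```python
import json
import urllib.request

# ASSUMPTIONS: placeholder endpoint, Host header and payload fields.
LIFTWING_URL = ("https://inference.svc.eqiad.wmnet:30443"
                "/v1/models/revertrisk-language-agnostic:predict")
HOST_HEADER = "revertrisk-language-agnostic.revertrisk.wikimedia.org"


def build_revertrisk_request(rev_id: int, lang: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that scores one revision."""
    payload = json.dumps({"rev_id": rev_id, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        LIFTWING_URL,
        data=payload,
        headers={"Content-Type": "application/json", "Host": HOST_HEADER},
        method="POST",
    )


req = build_revertrisk_request(12345, "en")
# From a host inside the WMF cluster, it could then be sent with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Separating request construction from sending makes the payload easy to inspect without network access.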
Jan 3 2023
For the record, here is a snippet (by @achou) to try this model:
Dec 23 2022
Update
- The Airflow DAG is ready to be deployed.
- Multilingual and language-agnostic models have been deployed to production. Check the details in the related tasks.
- We are now onboarding @Sheilakaruku to develop a user interface for working with these models (T318634)
- We have received the reviews from WWW and submitted the rebuttal. Now we need to wait for the final decision.
Dec 22 2022
I'm trying to implement a link-prediction task on Wikidata, to be used as a proxy for claim coverage. I'm building on top of Goyal & Ferrara's work. The existing libraries might require some tweaks to work on the full Wikidata graph, but before addressing the scalability issues I want to test this approach on a small sample to assess its suitability.
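To make the link-prediction task concrete on a small sample, here is a minimal Python sketch that swaps in a much simpler common-neighbours heuristic (not the approach from Goyal & Ferrara's work): every unlinked pair of items is scored by how many neighbours they share, and high-scoring pairs are candidate missing links. It treats the graph as undirected and ignores property types, both simplifying assumptions, and the toy edges are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (item, item) pairs; properties ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def common_neighbour_scores(adj):
    """Score each unlinked node pair by its number of shared neighbours."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked: only score missing edges
        shared = len(adj[u] & adj[v])
        if shared:
            scores[(u, v)] = shared
    # Highest-scoring candidate links first (ties keep pair order).
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Toy sample with made-up item IDs, not real Wikidata data.
sample = [("Q1", "Q2"), ("Q1", "Q3"), ("Q2", "Q4"), ("Q3", "Q4"), ("Q4", "Q5")]
print(common_neighbour_scores(build_adjacency(sample)))
```

The pairwise loop is quadratic in the number of items, which is fine for a small sample but is exactly the scalability issue the full graph would raise.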
Dec 13 2022
Regarding article quality, you can find the scores for all revisions in all languages from 2020-01-01 until 2022-09-30 here: /user/dsaez/paramita_article_quality/scores_all_v3_from_2020-01-01.parquet (HDFS)
Dec 12 2022
As mentioned before in our meetings, the main problem we have is the confusing use of "Blue Links" as a synonym for "topics". In NLP, topics are either categories or clusters of documents. The second important problem is the lack of an evaluation task or guidelines. If we are using links as tags and we want to evaluate the importance/relevance of such tags, we need a task, because relevance depends on the context.
Nov 21 2022
Based on your work at T314384, we would love to incorporate new fields like:
- vandalism_count
- vandalism_ratio
- vandalism_reverts_ratio
- seconds_to_revert_vandalism_avg
Your feedback would be highly appreciated, so thanks in advance for your interest and happy to brainstorm together on this :)
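A minimal Python sketch of how fields like these could be aggregated from per-revision revert-risk scores. The definitions are assumptions, not agreed anywhere: a revision counts as "flagged" when its score reaches a hypothetical 0.5 threshold, and since the model outputs a revert probability, "flagged" must not be read as confirmed vandalism. The `Revision` type and sample values are made up.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Revision:
    revert_risk: float                  # model score in [0, 1]
    seconds_to_revert: Optional[float]  # None if never reverted


def revert_risk_stats(revs, threshold=0.5):
    """Aggregate scores into the proposed fields (ASSUMED definitions).

    - vandalism_count: revisions with revert_risk >= threshold
    - vandalism_ratio: flagged / all revisions
    - vandalism_reverts_ratio: flagged revisions actually reverted / flagged
    - seconds_to_revert_vandalism_avg: mean revert delay of flagged, reverted revisions
    """
    flagged = [r for r in revs if r.revert_risk >= threshold]
    delays = [r.seconds_to_revert for r in flagged if r.seconds_to_revert is not None]
    return {
        "vandalism_count": len(flagged),
        "vandalism_ratio": len(flagged) / len(revs) if revs else 0.0,
        "vandalism_reverts_ratio": len(delays) / len(flagged) if flagged else 0.0,
        "seconds_to_revert_vandalism_avg": mean(delays) if delays else None,
    }


sample = [Revision(0.9, 120.0), Revision(0.7, None),
          Revision(0.1, None), Revision(0.8, 60.0)]
print(revert_risk_stats(sample))
```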
Just to clarify: we have a "revert probability"; we can't claim this is "vandalism". Unlike the previous model, we have just one single score.
You might be interested in collecting abuse filter information. I have some code to do that, and from there you could compute something like "abuse filter hits".
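As a hedged illustration (not the code mentioned above), abuse filter hits for a page can be pulled from the MediaWiki Action API module that the AbuseFilter extension provides, `list=abuselog`. This minimal Python sketch only builds the query URL and counts entries per filter from a parsed JSON response; the English Wikipedia endpoint is an example, and grouping by the `filter_id` field is an assumption to verify against a real response.

```python
from collections import Counter
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # example wiki


def abuselog_url(title: str, limit: int = 50) -> str:
    """Query URL for AbuseFilter log entries on one page."""
    params = {
        "action": "query",
        "list": "abuselog",
        "afltitle": title,
        "afllimit": limit,
        "aflprop": "ids|filter|user|title|action|result|timestamp",
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def filter_hits(response: dict, key: str = "filter_id") -> Counter:
    """Count abuse-log entries per filter (field name is an assumption)."""
    entries = response.get("query", {}).get("abuselog", [])
    return Counter(e.get(key) for e in entries)
```

Results are paginated, so a real collector would follow the `continue` values returned by the API and send a descriptive User-Agent.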
Nov 4 2022
Please remember to record your contributions on the Outreachy website! The deadline is today, Nov 4th!
Nov 1 2022
Oct 31 2022
Yeah! Thanks @achou! Could you please write an example here of how to hit the endpoint?
Hello everybody,
Oct 24 2022
I understand your concerns, but I'll start by considering "Instance Of" as the main "category" for the item. We could later try to cluster instances based on statement similarities, but I would leave that for later.
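Wikidata's "instance of" property is P31, and entity JSON (as in the JSON dumps or Special:EntityData) keeps statements under `claims`. A minimal Python sketch that extracts an item's P31 values as its "categories"; skipping deprecated statements and "no value"/"unknown value" snaks is a design choice of this sketch, and the example entity is trimmed down by hand.

```python
def instance_of(entity: dict) -> list[str]:
    """Return the QIDs in an item's P31 ("instance of") statements."""
    qids = []
    for claim in entity.get("claims", {}).get("P31", []):
        if claim.get("rank") == "deprecated":
            continue  # deprecated statements are not considered valid
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "no value" / "unknown value" snaks
        qids.append(snak["datavalue"]["value"]["id"])
    return qids


# Trimmed-down entity in the Wikidata JSON layout (real items have more fields).
item = {
    "id": "Q42",
    "claims": {
        "P31": [
            {"mainsnak": {"snaktype": "value", "property": "P31",
                          "datavalue": {"type": "wikibase-entityid",
                                        "value": {"entity-type": "item", "id": "Q5"}}},
             "rank": "normal"}
        ]
    },
}
print(instance_of(item))  # ['Q5']
```

Items with several P31 values return all of them, so a later step still has to pick one or keep a multi-label representation.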
- The code is being refactored by @MunizaA and reviewed by @achou. They are trying to find the optimal architecture in order to make the code easier to maintain and update.
- We have found that the model performs poorly on anonymous edits. I'm working on updating the model to improve this.
Oct 19 2022
Hi @Caseyy0000. Please check the documentation for mwedittypes here. The library may fail in some specific cases, but that should be very exceptional.
Oct 17 2022
Looks good to me!
Oct 12 2022
Oct 7 2022
Oct 6 2022
Oct 5 2022
Oct 3 2022
Sorry, I've read the documentation here: https://meta.wikimedia.org/wiki/User-Agent_policy and everything is clear. I'm going to close this ticket.
I see that the error says:
Scripted requests from your IP have been blocked.
However, the error persists from different IPs.
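Following the User-Agent policy linked above, scripted requests should send a descriptive User-Agent that identifies the tool and includes contact information. A minimal Python sketch with the standard library; the tool name, user page and email address are hypothetical placeholders to replace with your own.

```python
import urllib.request

# Hypothetical tool name and contact details: replace with your own.
USER_AGENT = ("RevertRiskResearchScript/0.1 "
              "(https://meta.wikimedia.org/wiki/User:Example; example@example.org) "
              "python-urllib/3")


def wiki_request(url: str) -> urllib.request.Request:
    """Build a request that identifies the script, as the policy asks."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = wiki_request("https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json")
# with urllib.request.urlopen(req) as resp:
#     data = resp.read()
```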
Sep 27 2022
Sep 26 2022
Sep 10 2022
- We are currently working on writing the results
Aug 31 2022
Aug 26 2022
Aug 17 2022
Thanks @Ottomata, the contract ends on December 15th.
@cmooney we need to give @Trokhymovych access to the stat machines and the Spark cluster.
Aug 15 2022
Aug 10 2022
Aug 9 2022
Hi @BCornwall, just to say this is a high priority for us. We have already lost 4 days of work with @MunizaA being locked out of the servers.
Aug 5 2022
@Muehlenhoff I just re-confirmed via a call with @MunizaA that the SSH key is correct.
@Dzahn, there is a -ctr email: maslam-ctr@wikimedia.org , would that solve the problem?
Aug 4 2022
Thanks @RhinosF1 !