- We have shared the results in several places, such as the talk mentioned before and the research showcase.
- This work was published at ICWSM'21; the paper is publicly available.
- The paper was published at SIGIR and is available here: https://arxiv.org/pdf/2105.04117.pdf
- The work has been published; the poster and presentation have been submitted.
Tue, May 11
Thanks @Majavah. I've added the new rules, and the problem is solved :)
Fri, May 7
- The CR version was submitted.
- We have uploaded the paper to arXiv; it should be available next week.
Thu, May 6
Fri, Apr 30
- I gave an online talk, presenting this and other disinformation related work: https://youtu.be/jTF4gdertUA
- Preparing the ICWSM presentation.
- We are preparing the camera ready version for SIGIR.
Apr 3 2021
- No updates
- The paper "Tracking Knowledge Propagation Across Wikipedia Languages" (ICWSM'21) has been published!
Mar 30 2021
I see. I was asking because we wrote these addresses in published papers, and those are immutable. But if it's not possible, it's not possible.
Would it be possible to add redirects from the old URLs to the new ones?
Mar 26 2021
- No updates.
- Finishing the camera ready version for ICWSM'21 paper.
- We have trained a new model using content information to predict the likelihood of an item propagating to other projects.
- We are working on documenting our new results.
Mar 19 2021
- Information updated on betterworks.
- The paper with the dataset and first model has been accepted at ICWSM'21.
- We are working on the camera ready version.
Mar 13 2021
- We are making improvements to the model that predicts the next language of propagation.
- We are working on modeling changes of content propagation behavior depending on the article reliability.
Mar 8 2021
- "Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia" has been published on Figshare.
- The documentation about this dataset can be found here.
- We also wrote a paper and submitted it to a peer-reviewed venue.
- The Outreachy internship has been successfully completed.
Feb 26 2021
- Adding pageviews improved content-propagation prediction by over 5%.
- We are currently working on adding content-related information (article metadata) to the model. This will allow us to study the effects of content quality on spread patterns.
- The datasets will be published next week.
Feb 19 2021
- We have announced the datasets in a presentation to the NLP group at the University of Cambridge.
- Currently working on documenting the datasets.
- New experiments using pageviews as a feature to predict content propagation.
Feb 13 2021
Feb 11 2021
Feb 10 2021
Good idea! I'll do the same.
Hi @bd808, I get your point. I can take responsibility for keeping track of all these instances and be the point of contact with you.
Feb 9 2021
Jan 28 2021
From our previous meeting:
Jan 23 2021
- New metadata has been added to the dataset: We are differentiating templates at article, section, and inline level.
- The dataset to model content propagation has been published on Zenodo.
Jan 18 2021
Thanks everybody. Especially @Nuria for putting all this together.
Jan 15 2021
- We submitted the propagation dataset to ICWSM.
- We are building a new model considering content popularity (pageviews).
- We have analyzed the impact of reverts on negative examples (cases where the reliability issue was solved).
- We have already created a heuristic to find negative examples.
- We have created an initial dataset with 80 templates.
- We are currently identifying relevant metadata (i.e. pre-computed features) to add to the dataset.
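The negative-example heuristic described above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual code: the revision fields (`rev_id`, `has_template`, `was_reverted`) and the exact rule are assumptions based on the description of "cases where the problem has been solved".

```python
# Sketch of a heuristic for mining negative examples: a template was
# removed from an article and the removal was NOT reverted, suggesting
# the reliability issue was genuinely fixed.
# Field names are hypothetical, not the project's real schema.

def find_negative_examples(revisions):
    """Return (add_rev, remove_rev) pairs where a maintenance template
    was removed and the removing revision was not itself reverted."""
    negatives = []
    added_at = None
    for rev in revisions:
        if rev["has_template"] and added_at is None:
            added_at = rev["rev_id"]            # template appears here
        elif added_at is not None and not rev["has_template"]:
            if not rev.get("was_reverted", False):
                negatives.append((added_at, rev["rev_id"]))
            added_at = None                     # reset for the next cycle

    return negatives

# Toy revision history: one reverted removal, one clean removal.
history = [
    {"rev_id": 1, "has_template": False},
    {"rev_id": 2, "has_template": True},                         # template added
    {"rev_id": 3, "has_template": False, "was_reverted": True},  # removal reverted
    {"rev_id": 4, "has_template": True},                         # template back
    {"rev_id": 5, "has_template": False},                        # clean removal
]
```

Running `find_negative_examples(history)` keeps only the unreverted removal, yielding `[(4, 5)]`.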
Jan 6 2021
Dec 23 2020
Dec 11 2020
- Depending on the results of the paper submission (which received a borderline evaluation), we are planning to publish the dataset separately from the model. In the case of publishing the dataset separately, this will be done during Q3.
- Kay (outreachy intern) has started her work based on the templates listed in this WikiProject.
- We are exploring techniques to get negative examples (cases where the problem has been solved) for these templates.
- We have submitted one paper about self-contradictory content in Wikipedia articles.
Nov 16 2020
- We have selected one Outreachy intern, who will start in December. The intern will help with developing the machine-readable dataset.
- The datasets are ready. We are waiting for the paper to be published before sharing the link publicly. Currently, the datasets are available upon request via email.
- We started a preliminary analysis of the propagation of sources across wikis.
Nov 2 2020
- We are extending the list to other languages: es, pt, ca.
- Reviewing Outreachy applications; the intern will help with creating the machine-readable dataset.
- We are exploring a follow-up to this project that, based on our results, will focus on how to model the spread of disinformation.
Oct 29 2020
For more details on the timeline recommendations please check Isaac's comment here: T263874#6589856
Got you. Yes, looks good, please add it to the Outreachy application.
@KemmieKemy thanks for submitting. You are making great progress.
Oct 28 2020
@Rvvalentim, can you please double-check whether you need any of those files?
Oct 26 2020
- Paper was submitted last week.
Oct 10 2020
Oct 8 2020
Oct 2 2020
- We are currently working on the paper, adding new analyses and improvements to the model published in the first round of analysis.
- The data can be found in HDFS: /user/dsaez/topicsForAllWikipediaPages2020-08-24AllProps.csv , or
- You can also download the dataset from: https://analytics.wikimedia.org/published/datasets/topics/
- The dataset follows the same format described here: https://figshare.com/articles/Topics_for_each_Wikipedia_Article_across_Languages/12127434
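For a quick sense of how such a CSV could be consumed, here is a minimal sketch using only the standard library. The column names (`qid`, `topics`) and the pipe-separated topic encoding are assumptions for illustration; the Figshare page linked above documents the authoritative schema.

```python
import csv
import io

# Hypothetical sample mimicking the published topics dataset.
# Real column names and topic encoding may differ -- check the
# Figshare documentation before relying on this layout.
sample = """qid,topics
Q42,Culture.Media.Books|STEM.Technology
Q64,Geography.Regions.Europe
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Split the pipe-separated topic field into Python lists, keyed by item.
topics_by_item = {r["qid"]: r["topics"].split("|") for r in rows}
```

Replacing the in-memory `sample` with an open file handle for the downloaded CSV gives the same dictionary over the full dataset.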
Sep 30 2020
Sep 29 2020
Hi @RBrounley_WMF, thanks for sharing this and for the great work you are doing. A few comments from my side:
Sep 24 2020
@leila I see some overlap, although this task seems to be broader than the one I'm working on. Given that I don't see much documentation or code about this task, I prefer not to take responsibility for it.
Sep 4 2020
- We are currently working on preparing a paper to be submitted at the end of October.
- The two datasets have been prepared:
- One dataset with items that propagate across Wikipedias, removing bot activity.
- Another dataset about external references (links) across projects.
- A recent dump (with all the articles existing until Aug 31st) has been created. In the following days I will upload it to a public repository.
Aug 31 2020
- We have published the first round of analysis.
- Some important highlights:
- The size of the project (i.e. number of articles) is not correlated with the likelihood of propagating content to other projects.
- Initial results show a correlation between cultural similarity and the likelihood of two or more projects sharing similar content.
- For long cascades (i.e. articles that exist in several languages), we are able to predict, with reasonable accuracy, the new languages that will create articles about the same topic.
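As a rough intuition for the last point, next-language prediction can be sketched with a simple co-occurrence baseline. This is an illustrative toy only, not the project's actual model: the corpus and the "most frequently co-occurring language" rule are assumptions.

```python
from collections import Counter

# Toy baseline: given the languages an article already exists in,
# predict the next language as the one that most often co-occurs with
# that language set in a reference corpus of existing articles.
# Entirely illustrative -- not the project's real model or data.

corpus = [  # hypothetical language sets of existing articles
    {"en", "es", "fr"},
    {"en", "es", "fr"},
    {"en", "es", "pt"},
    {"en", "de"},
]

def predict_next_language(current_langs, corpus):
    counts = Counter()
    for langs in corpus:
        if current_langs <= langs:          # articles covering all current langs
            counts.update(langs - current_langs)
    # Most frequent not-yet-covered language, or None if no evidence.
    return counts.most_common(1)[0][0] if counts else None
```

With this toy corpus, an article existing in `{"en", "es"}` is predicted to appear next in `"fr"`, since `fr` co-occurs with that pair more often than `pt`.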