
Evaluate list-building tools for ad-hoc topic modeling
Closed, ResolvedPublic

Description

Perform a quantitative evaluation of the list-building tools (see API [1]) developed in T266768. The tools aim to define custom (ad-hoc) topics, in contrast to the pre-defined ORES topics. The question is how well these approaches work. One relevant task is to automatically generate lists of articles belonging to a given topic, such as climate change. We use wikiproject labels as a ground-truth dataset for different (arbitrary) topics. Starting from suitable input article(s) of a given wikiproject, we compare the output of the list-building tools with the articles contained in the corresponding wikiproject.

  • Generate a curated dataset of wikiprojects and contained articles (overlaps with T238437)
  • Identify input articles characterizing the corresponding wikiproject
  • Query the different list-building tools and quantify the overlap with the ground truth (see the sketch after the reference below)

[1] https://list-building.toolforge.org/
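
A minimal sketch of how one of the list-building tools could be queried for a seed article; the endpoint path, parameter names, and response shape below are assumptions for illustration, not the tool's documented API (the actual tool lives at https://list-building.toolforge.org/ [1]).

```
import requests

# Hypothetical endpoint path; the real API is served from https://list-building.toolforge.org/
LIST_BUILDING_URL = "https://list-building.toolforge.org/api/v1/related"

def get_related_articles(seed_title, lang="en", limit=100):
    """Query a list-building tool for articles related to a seed article."""
    params = {"title": seed_title, "lang": lang, "limit": limit}  # assumed parameter names
    response = requests.get(LIST_BUILDING_URL, params=params, timeout=30)
    response.raise_for_status()
    # assumed response shape: {"results": [{"title": ...}, ...]}
    return [item["title"] for item in response.json()["results"]]

if __name__ == "__main__":
    print(get_related_articles("Climate change")[:10])
```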

Event Timeline

Update week 2021-02-01:

  • None

Update week 2021-02-08:

  • started to explore Isaac's dataset containing the wikiproject labels for articles in enwiki, together with importance/quality ratings

Update week 2021-02-15:

  • continued exploratory analysis of the data
  • this is mostly to help decide on the parameters for the evaluation data, specifically:
    • which wikiprojects should we include? (minimum number of articles, minimum level of activity)
    • which articles should we include? (only high priority/quality)
    • how to find a single seed article (or wikidata item) needed as input for the list-building tool for a given wikiproject (e.g. search from title; see the sketch below)
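
A sketch of one way to pick a seed article for a wikiproject: search enwiki for the project's topic string and read the page's Wikidata item via the standard MediaWiki Action API. Taking the top search hit as the seed is an assumption for illustration, not the final selection rule.

```
import requests

API = "https://en.wikipedia.org/w/api.php"

def find_seed_article(topic):
    """Return (title, wikidata_qid) for the top enwiki search result for `topic`."""
    search = requests.get(API, params={
        "action": "query", "list": "search", "srsearch": topic,
        "srlimit": 1, "format": "json",
    }, timeout=30).json()
    title = search["query"]["search"][0]["title"]
    # look up the page's Wikidata item via its page properties
    pages = requests.get(API, params={
        "action": "query", "prop": "pageprops", "ppprop": "wikibase_item",
        "titles": title, "format": "json",
    }, timeout=30).json()["query"]["pages"]
    qid = next(iter(pages.values()))["pageprops"]["wikibase_item"]
    return title, qid

# e.g. find_seed_article("Climate change") -> the seed title and its Wikidata QID
```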

Update week 2021-02-22:

  • started to set up the pipeline for the list-building evaluation
    • based on Isaac's wikiprojects dataset, I derived a subset of wikiprojects to use for the evaluation of list-building; we have 1486 different wikiprojects with at least 100 articles that have an importance rating (top, high, mid, low); articles without importance ratings are discarded to make sure that the articles are relevant for the wikiprojects
    • identified a seed article for each wikiproject by randomly sampling one of the articles with top-importance in that wikiproject
    • using the seed article, we query the list-building tools to generate lists of related articles; we compare the overlap with the articles in the wikiproject via standard precision and recall metrics
  • implemented and ran a baseline from cirrussearch (via morelike) to have a lower bound on precision and recall (see the sketch below)
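
Below is a sketch of the morelike baseline and of the overlap metrics used in the evaluation: query CirrusSearch for articles "morelike" the seed, then compute precision and recall of the returned list against the set of articles labelled with the wikiproject. The helper names are illustrative; the actual pipeline differs in details.

```
import requests

API = "https://en.wikipedia.org/w/api.php"

def morelike_baseline(seed_title, limit=50):
    """CirrusSearch 'morelike:' query: articles similar to the seed article.
    Note: srlimit is capped at 50 for most clients; longer lists need paging via sroffset."""
    params = {
        "action": "query", "list": "search",
        "srsearch": f"morelike:{seed_title}",
        "srlimit": limit, "format": "json",
    }
    hits = requests.get(API, params=params, timeout=30).json()["query"]["search"]
    return [h["title"] for h in hits]

def precision_recall(generated, ground_truth):
    """Overlap between a generated list and the wikiproject's article set."""
    generated, ground_truth = set(generated), set(ground_truth)
    n_hits = len(generated & ground_truth)
    precision = n_hits / len(generated) if generated else 0.0
    recall = n_hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```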

Update week 2021-03-01:

  • none (focus week)

Update week 2021-03-08:

  • prepared a ground-truth dataset
  • integrated the 3 different tools into the evaluation pipeline
  • currently running the list-building for all tools (+ morelike baseline) on ~1400 different wikiprojects
  • aim for next week is to aggregate performance metrics across the wikiprojects and write up the results

Update week 2021-03-15:

  • started comparison of different list-building tools with morelike-baseline
  • for approximately half of the projects, morelike actually yields the best coverage of the articles in the wikiprojects
  • for the other half, the new list-building tools better capture the articles contained in the wikiprojects, with reader-based list-building often yielding the best performance (though in some cases the content-based and wikidata-based tools yield better results)
  • aims for next week are to check: i) whether there is a pattern in which list-building tools work best for which wikiprojects (e.g. whether reader-based methods work well for wikiprojects related to, say, geography; a sketch of this per-wikiproject comparison is included below); ii) whether pooling the results from different list-building tools actually yields better coverage of a wikiproject than any single method. The latter most closely captures the use-case of the list-building tool to support campaign organizers, where users can pick whichever tool yields the most useful results for their specific case and interest.
  • I also started to discuss with Alex Stinson, who is interested in testing these tools with event organizers in practice; this would be interesting because it constitutes a more realistic evaluation, but also because we could hopefully evaluate in a non-English language (so far, we have the wikiprojects ground-truth dataset only for English) and thus take advantage of the fact that the list-building tools are language-agnostic and can be readily applied to any language
  • the plan is to write up these results next week
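
A sketch of the per-wikiproject comparison described above: given a recall (coverage) score for each method on each wikiproject, count how often each method gives the best coverage. The data layout and method names are illustrative assumptions, not the pipeline's actual structures.

```
from collections import Counter

def best_method_counts(recall_by_project):
    """recall_by_project: {project: {"morelike": 0.12, "reader": 0.31, ...}} (assumed layout)."""
    winners = Counter()
    for scores in recall_by_project.values():
        winners[max(scores, key=scores.get)] += 1
    return winners

example = {
    "WikiProject A": {"morelike": 0.20, "reader": 0.35, "content": 0.15, "wikidata": 0.10},
    "WikiProject B": {"morelike": 0.30, "reader": 0.25, "content": 0.22, "wikidata": 0.18},
}
print(best_method_counts(example))  # Counter({'reader': 1, 'morelike': 1})
```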

Update week 2021-03-22:

  • dug a little deeper into this analysis. the picture that emerges is that the different methods are very complementary in how they capture the articles contained in each wikiproject
  • the overlap among the different lists is very low (among the 100 items from each list, there are very few items in common); on average (over different wikiprojects) the Jaccard index is ~0.05 to 0.1
  • the improvement of one list with respect to another is not marginal: often one list-building method provides very poor coverage while another provides very good coverage; for example, there are hundreds of wikiprojects for which the "reader-based" list yields coverage that is at least twice as good as the baseline (that is, an improvement of 100% or more in the number of articles that match the articles contained in the wikiproject)
  • there seems to be no consistent pattern in terms of whether a specific method works best when aggregating different wikiprojects into topics (e.g. the different wikiprojects related to "Biography")
  • this suggests that a good strategy for the tool is to pool the results from the different lists (see the sketch below)
  • discussing with Isaac, we realized that it would be good to check how these results hold up for at least one other non-English wiki; Isaac has already prepared the data and I should be able to repeat this analysis quickly in the next week (together with writing this up)
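
A sketch of the two checks described above: the pairwise Jaccard index between the lists produced by the different methods, and the recall of the pooled (union) list against the wikiproject's article set. The input format ({method_name: [titles]} per wikiproject) is an assumption for illustration.

```
from itertools import combinations

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def pairwise_jaccard(lists_by_method):
    """Jaccard index for every pair of methods' lists for one wikiproject."""
    return {
        (m1, m2): jaccard(lists_by_method[m1], lists_by_method[m2])
        for m1, m2 in combinations(sorted(lists_by_method), 2)
    }

def pooled_recall(lists_by_method, ground_truth):
    """Recall of the union of all methods' lists against the wikiproject article set."""
    pooled = set().union(*(set(lst) for lst in lists_by_method.values()))
    truth = set(ground_truth)
    return len(pooled & truth) / len(truth) if truth else 0.0
```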

Update week 2021-03-29:

Update week 2021-06-28:

Closing this task as all todos have been completed.