
Improve features for wikibase vandalism detection model
Open, Normal, Public

Description

The Wikidata vandalism detection model already has a good number of features, but the feature set can be improved.

Event Timeline

Restricted Application added a project: User-Ladsgroup. May 15 2018, 10:34 AM
Restricted Application added a subscriber: Aklapper.

We are working on this with @Lea_Lacroix_WMDE to gather feedback and improve the features.

Halfak added a subscriber: Halfak. Jun 25 2018, 9:43 PM

Any updates here?

Restricted Application added a project: artificial-intelligence. Jun 25 2018, 9:43 PM


I just had a meeting with Wikidata's communication manager. She is starting the feedback process, which will take some time.

Aaand now I made the landing pages for the feedback: https://www.wikidata.org/wiki/Wikidata:ORES

And the call for feedback will be announced on Monday, July 2nd :)

Vvjjkkii renamed this task from Improve features for wikibase vandalism detection model to yxcaaaaaaa. Jul 1 2018, 1:09 AM
Vvjjkkii removed Ladsgroup as the assignee of this task.
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from yxcaaaaaaa to Improve features for wikibase vandalism detection model. Jul 2 2018, 4:10 PM
CommunityTechBot assigned this task to Ladsgroup.
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Halfak added a comment. Aug 3 2018, 4:15 PM

{{merged}}

Restricted Application added a project: Scoring-platform-team. Jan 25 2019, 1:20 PM

@Ladsgroup let's sit down together and flesh this out :)

@Ladsgroup Can you add the list of existing features as we discussed?

Sure:

is_client_move,
is_client_delete,
is_merge_into,
is_merge_from,
is_revert,
is_restore,
is_item_creation,
sex_or_gender_changed,
country_of_citizenship_changed,
member_of_sports_team_changed,
date_of_birth_changed,
image_changed,
signature_changed,
commons_category_changed,
official_website_changed,
en_label_changed,
is_human,
is_blp,
comment_longest_repeated_char,
comment_uppercase_ratio,
comment_numbers_ratio,
comment_whitespace_ratio,
comment_english_bad_words,
comment_english_informals,
comment_longest_repeated_uppercase_char,
comment_has_url,
comment_has_first_person_pronouns_en,
comment_has_second_person_pronouns_en,
comment_has_do_or_dont_en,
log(wikibase.revision.parent.claims + 1),
log(wikibase.revision.parent.properties + 1),
log(wikibase.revision.parent.aliases + 1),
log(wikibase.revision.parent.sources + 1),
log(wikibase.revision.parent.qualifiers + 1),
log(wikibase.revision.parent.badges + 1),
log(wikibase.revision.parent.labels + 1),
log(wikibase.revision.parent.sitelinks + 1),
log(wikibase.revision.parent.descriptions + 1),
wikibase.revision.diff.sitelinks_added,
wikibase.revision.diff.sitelinks_removed,
wikibase.revision.diff.sitelinks_changed,
wikibase.revision.diff.labels_added,
wikibase.revision.diff.labels_removed,
wikibase.revision.diff.labels_changed,
wikibase.revision.diff.descriptions_added,
wikibase.revision.diff.descriptions_removed,
wikibase.revision.diff.descriptions_changed,
wikibase.revision.diff.aliases_added,
wikibase.revision.diff.aliases_removed,
wikibase.revision.diff.properties_added,
wikibase.revision.diff.properties_removed,
wikibase.revision.diff.properties_changed,
wikibase.revision.diff.claims_added,
wikibase.revision.diff.claims_removed,
wikibase.revision.diff.claims_changed,
wikibase.revision.diff.identifiers_changed,
wikibase.revision.diff.sources_added,
wikibase.revision.diff.sources_removed,
wikibase.revision.diff.qualifiers_added,
wikibase.revision.diff.qualifiers_removed,
wikibase.revision.diff.badges_added,
wikibase.revision.diff.badges_removed,
wikibase.revision.diff.proportion_of_qid_added,
wikibase.revision.diff.proportion_of_language_added,
wikibase.revision.diff.proportion_of_links_added,
revision.comment.suggests_section_edit,
revision.comment.has_link,
revision.user.is_bot,
revision.user.has_advanced_rights,
revision.user.is_admin,
revision.user.is_trusted,
revision.user.is_patroller,
revision.user.is_curator,
revision_oriented.revision.user.is_anon,
log(temporal.revision.user.seconds_since_registration + 1)
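For readers unfamiliar with the names, here is a minimal sketch of how a few of the comment, count, and diff features above could be computed. The function names and exact definitions (for example, taking ratios over all characters of the comment) are assumptions for illustration only; the production definitions in revscoring may differ.

```python
import math
import re
from typing import Callable, Dict, Tuple

# Hypothetical re-implementations of a few features from the list above.
# The production model's definitions (in revscoring) may differ.

URL_RE = re.compile(r"https?://\S+")


def longest_repeated_char(text: str) -> int:
    """Length of the longest run of one character, ignoring case."""
    longest, run, prev = 0, 0, None
    for ch in text.lower():
        run = run + 1 if ch == prev else 1
        prev = ch
        longest = max(longest, run)
    return longest


def char_ratio(text: str, predicate: Callable[[str], bool]) -> float:
    """Share of characters matching `predicate`; 0.0 for an empty comment."""
    if not text:
        return 0.0
    return sum(1 for ch in text if predicate(ch)) / len(text)


def comment_features(comment: str) -> Dict[str, float]:
    return {
        "comment_longest_repeated_char": longest_repeated_char(comment),
        "comment_uppercase_ratio": char_ratio(comment, str.isupper),
        "comment_numbers_ratio": char_ratio(comment, str.isdigit),
        "comment_whitespace_ratio": char_ratio(comment, str.isspace),
        "comment_has_url": float(bool(URL_RE.search(comment))),
    }


def log_count(n: int) -> float:
    """log(n + 1): compresses large counts such as parent.claims, keeps 0 at 0."""
    return math.log1p(n)


def diff_counts(parent: Dict[str, str], current: Dict[str, str]) -> Tuple[int, int, int]:
    """(added, removed, changed) keys, e.g. for labels keyed by language code."""
    added = len(current.keys() - parent.keys())
    removed = len(parent.keys() - current.keys())
    changed = sum(1 for k in parent.keys() & current.keys() if parent[k] != current[k])
    return added, removed, changed


print(comment_features("HELLO!!! 123"))
print(diff_counts({"en": "Douglas Adams", "de": "Douglas Adams"},
                  {"en": "DOUGLAS ADAMS", "fr": "Douglas Adams"}))
```

The `log(x + 1)` transform is what lets an item with thousands of statements and an item with a handful sit on a comparable scale, while an empty parent revision still maps to 0.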

These are all of the features. Tell me if any of them is not clear enough.

Halfak triaged this task as Normal priority. Feb 19 2019, 10:21 PM
Halfak moved this task from Untriaged to New development on the Scoring-platform-team board.
Addshore added a subscriber: Addshore.

I don't know how much work this is.
@Lydia_Pintscher should this still be on the campsite?
Is this ready to be done?


It's probably good to do embedding or clustering on a set of one-hot encodings of properties changed, languages changed, number of statements per property, etc. That would make the model much more accurate.
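A minimal sketch of the clustering half of that suggestion, assuming each edit is represented only by the set of property IDs it changed. The multi-hot encoding, plain k-means, and the toy edits are illustrative assumptions, not part of the existing model; the resulting cluster ID could then be added to the feature list as a categorical feature.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def build_vocabulary(edits: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Give every property ID seen in the training edits a column index."""
    vocab: Dict[str, int] = {}
    for props in edits:
        for pid in props:
            vocab.setdefault(pid, len(vocab))
    return vocab


def multi_hot(props: Sequence[str], vocab: Dict[str, int]) -> Vector:
    """Element-wise OR of the one-hot vectors of the changed properties.
    Property IDs not in the vocabulary are dropped."""
    vec = [0.0] * len(vocab)
    for pid in props:
        if pid in vocab:
            vec[vocab[pid]] = 1.0
    return vec


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec: Vector, centroids: List[Vector]) -> int:
    """Index of the closest centroid; usable as a categorical feature."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))


def kmeans(vectors: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means with random initial centroids; returns the centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical training edits, each the set of properties it changed
# (P21 sex or gender, P27 country of citizenship, P569 date of birth,
#  P18 image, P373 Commons category).
edits = [["P21", "P27"], ["P21", "P27", "P569"], ["P18"], ["P18", "P373"]]
vocab = build_vocabulary(edits)
vectors = [multi_hot(e, vocab) for e in edits]
centroids = kmeans(vectors, k=2)
cluster_ids = [nearest(v, centroids) for v in vectors]
print(cluster_ids)
```

The same encoding extends to the other signals mentioned: one column per language for labels or descriptions changed, or the number of statements per property in place of 1.0. An embedding (a learned low-dimensional projection of these sparse vectors) is the other option the comment names and is not shown here.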