Maniphest T199851

Data set review for the Wiktionary Cognate Dashboard
Closed, Resolved · Public
Description

We need a review and approval before making the data sets that support the Wiktionary Cognate Dashboard available publicly from /srv/published-datasets on stat1005.

The data sets that need a review are currently found in: /home/goransm/RScripts/Wiktionary/Wiktionary_CognateDashboard/

NOTE: Only the .csv files described in the README.txt document (found in the same directory) will be made public and thus need to be reviewed.
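The selection rule in the note above could be checked mechanically before copying anything to the published location. The following is only a minimal sketch under stated assumptions: the function name `csv_files_named_in_readme` is hypothetical, and it assumes the README mentions each public file by its exact file name, which is not confirmed in this task.

```python
from pathlib import Path


def csv_files_named_in_readme(directory: Path, readme_name: str = "README.txt") -> list[Path]:
    """Return the .csv files in `directory` whose file names appear in the README text.

    Assumption: the README refers to each public data set by its exact file name.
    This is a plain substring match, so a name that is a suffix of another
    documented name (e.g. "a.csv" vs. "data.csv") would also match.
    """
    readme_text = (directory / readme_name).read_text(encoding="utf-8")
    return sorted(
        path
        for path in directory.glob("*.csv")
        if path.name in readme_text
    )
```

Called with `Path("/home/goransm/RScripts/Wiktionary/Wiktionary_CognateDashboard/")`, it would list only the candidates for review; any .csv file in the directory that the README does not name is left out.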

NOTE: None of the files contain any private data: merely aggregate statistics and results of statistical modeling of Wiktionary projects and their Cognate extension database.

Thank you.

Related Objects

Mentioned In: T200197: Wiktionary Cognate Dashboard Interface Translations
T200196: Put Wiktionary Cognate Dashboard on Daily Updates
T166487: Provide statistics about Cognate on Wiktionary

Event Timeline

GoranSMilovanovic created this task. Jul 18 2018, 12:46 AM

Restricted Application added a subscriber: Aklapper. Jul 18 2018, 12:46 AM

GoranSMilovanovic moved this task from Technical Wishlist to Prioritized on the User-GoranSMilovanovic board. Jul 18 2018, 12:46 AM

GoranSMilovanovic moved this task from Prioritized to Current/Deprioritized on the User-GoranSMilovanovic board.

GoranSMilovanovic mentioned this in T166487: Provide statistics about Cognate on Wiktionary.

GoranSMilovanovic mentioned this in T200196: Put Wiktionary Cognate Dashboard on Daily Updates. Jul 23 2018, 1:34 PM

Pamputt subscribed. Jul 23 2018, 4:08 PM

Ping, Analytics: can anyone please take a quick look at this? It is really a simple public dataset review that will take no more than a moment to complete, and we need to put the machinery that will use these data online. Thanks a lot!

GoranSMilovanovic moved this task from Current/Deprioritized to Prioritized on the User-GoranSMilovanovic board. Jul 28 2018, 2:54 PM

Ottomata added a project: Analytics. Jul 31 2018, 1:06 PM

Milimetric claimed this task. Aug 2 2018, 3:25 PM

Milimetric triaged this task as Medium priority.

Milimetric added a project: Analytics-Kanban.

Milimetric moved this task from Incoming to Data Quality on the Analytics board.

GoranSMilovanovic moved this task from Prioritized to Current/Deprioritized on the User-GoranSMilovanovic board. Aug 2 2018, 11:19 PM

I reviewed the files. They look ok. In general if you're just analyzing content, in this case what articles are available in different wiktionaries, then it doesn't need to be reviewed before publishing. Only if you start mixing in any data that's not otherwise public. But all of the data in your analysis could be obtained from public databases, right?

@Milimetric Thank you very much.

> In general if you're just analyzing content, in this case what articles are available in different wiktionaries, then it doesn't need to be reviewed before publishing.

In most cases, I would say 99%, my work consists of analyzing content in the sense described above, or something similar.

> Only if you start mixing in any data that's not otherwise public.

I guess things like user analytics where some fields need to be anonymized or reported only upon aggregation... I perform these types of analytics too, e.g. campaign evaluations for WMDE, but typically I do not need public datasets generated in production for such cases.

> But all of the data in your analysis could be obtained from public databases, right?

I'd say yes.

Thanks again, Dan.

GoranSMilovanovic mentioned this in T200197: Wiktionary Cognate Dashboard Interface Translations. Aug 6 2018, 1:16 AM

GoranSMilovanovic closed this task as Resolved. Aug 7 2018, 10:17 AM
