
Instructions to update user data in korma
Closed, Resolved (Public)

Description

Pending T60585, we need instructions to update user data in korma (real name, affiliation, associations with accounts). Something like pull requests to update files in GitHub.

Event Timeline

Qgil raised the priority of this task to Medium.
Qgil updated the task description.
Qgil added a project: wikimedia.biterg.io.
Qgil added subscribers: Qgil, Dicortazar.

An initial configuration file will be provided with information about unique identities, affiliations and countries.

Initially, this will be private given that this may have sensitive information.

I'd like to close T92953 (Migrate Korma identities database to SortingHat) before publishing this data.

SortingHat already exports data in a specific format that we can work on.

That task hopefully finishes before the end of the month.

We have migrated the affiliations information from Wikimedia to the new SortingHat database schema.

However, we're missing nationalities information so far. Only unique identities and affiliations are provided.

Now we have a JSON file hosted in a private project in Bitbucket containing identities and affiliations, to which I got access.

I did a test to fix jforrester appearing as Unknown in the table at https://phabricator.wikimedia.org/T59038#1062054

There were too many new elements for me (including Bitbucket's UI) and no documentation describing the steps I'm supposed to do, so I opted to try a commit directly to master. @Dicortazar, let me know whether this is what you expected. If I got it right, the next time I will try a pull request instead, because I don't like to touch master directly.

jforrester is an example of someone who has different identities but all of them affiliated to the Wikimedia Foundation. I need an example of how to fix someone like legoktm, who was Independent until last year, and now is a WMF employee.
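One way to think about a case like this is as a list of dated enrollments per contributor rather than a single affiliation. The sketch below is only an illustration of that idea: the `Enrollment` class, the `affiliation_on` helper, and the dates are all hypothetical placeholders, and none of it is taken from the actual identities file or from SortingHat's schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Enrollment:
    """One affiliation period (hypothetical structure, not SortingHat's)."""
    organization: str
    start: date  # inclusive
    end: date    # exclusive


def affiliation_on(enrollments: List[Enrollment], day: date) -> str:
    """Return the organization covering `day`, or "Unknown" if none does."""
    for enrollment in enrollments:
        if enrollment.start <= day < enrollment.end:
            return enrollment.organization
    return "Unknown"


# Placeholder cutover date, chosen only for illustration.
CUTOVER = date(2014, 6, 1)

# A contributor who was Independent and later joined the WMF.
history = [
    Enrollment("Independent", date(1900, 1, 1), CUTOVER),
    Enrollment("Wikimedia Foundation", CUTOVER, date(2100, 1, 1)),
]
```

With a structure like this, a contribution dated before the cutover would count as Independent and a later one as Wikimedia Foundation, which is the behaviour the legoktm case seems to need.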

Thanks a lot for the changes. Those were merged into the database through the SortingHat command line tool.

However, I have some minor comments. A new entry for an identity indicates that it is a new unique identity. The changes you made are correct but, checking the file, I realized that there is probably a better approach: jforrester already existed in a bigger set of identities under the hash '0848fcd3d184007080330da369b363292812c126'.

So what I did was merge the list of identities that you detected into the previously specified hash. If you check the file, that identity already has the affiliation, so it was not necessary to update the affiliation in every case.

Given that SortingHat already provides a command line option to merge identities, I simply merge them in the following way (as an example):

$ sortinghat -u <user> -p <password> -d <database> merge 4a6c14640286fc0d597d8087d68ab4df45ce1491 0848fcd3d184007080330da369b363292812c126

And I got as a response:

Unique identity 4a6c14640286fc0d597d8087d68ab4df45ce1491 merged on 0848fcd3d184007080330da369b363292812c126
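When several hashes need to be merged into the same unique identity, the command above could be repeated from a small script. This is a minimal sketch that assumes only the invocation shown above (`-u`, `-p`, `-d`, then `merge <from> <to>`); the helper names `build_merge_command` and `merge_all` are made up for illustration and are not part of SortingHat.

```python
import subprocess


def build_merge_command(user, password, database, from_uuid, to_uuid):
    """Return the argument list for merging from_uuid into to_uuid."""
    return [
        "sortinghat",
        "-u", user,
        "-p", password,
        "-d", database,
        "merge", from_uuid, to_uuid,
    ]


def merge_all(user, password, database, from_uuids, to_uuid,
              run=subprocess.run):
    """Merge each hash in from_uuids into to_uuid, one call per hash.

    `run` is injectable so the commands can be inspected without
    touching a real database.
    """
    results = []
    for from_uuid in from_uuids:
        if from_uuid == to_uuid:
            continue  # skip merging an identity into itself
        cmd = build_merge_command(user, password, database,
                                  from_uuid, to_uuid)
        # check=True raises CalledProcessError if the command fails
        results.append(run(cmd, check=True))
    return results
```

Calling `merge_all(user, password, database, [hash_a, hash_b], target_hash)` would then run one merge per hash and stop at the first failure.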

Then what I'll do is to accept your changes in the private repository and upload a new version with these changes.

With respect to how to proceed, given that we're not automatically merging your changes, I'm considering the option of opening 'bug' tickets referring to the file. Each ticket would be closed with an update to the identities/affiliations file. What do you think @Qgil?

This will help in two ways: you can forget about changing the file directly, which may be tedious, and simply open bug reports. And I simply upload new versions of the file, so I avoid merging your changes and then updating that file with new ones.

Finally, if after the summer we have an awesome tool to manage identities through a web front-end, this semi-manual process can hopefully be dropped :).

With respect to how to proceed, given that we're not automatically merging your changes, I'm considering the option of opening 'bug' tickets referring to the file. Each ticket would be closed with an update to the identities/affiliations file. What do you think @Qgil?

Works for me. If I get help from other contributors, then we can add them so I'm not defining the bus factor here alone.

This will help in two ways: you can forget about changing the file directly, which may be tedious, and simply open bug reports.

Editing JSON manually is very tedious indeed. :)

Looks like a good compromise between now and (hopefully) the completion of a GSoC/Outreachy project in the current round.

This task can be closed as soon as the instructions to update user data in korma are documented in https://www.mediawiki.org/wiki/Community_metrics

Ok, I'll add this to ECT-April and update the wiki with instructions on how to update the user data.

I've added information about SortingHat in the Contributions section [1]. I may add extra details if needed. Hope this is useful!

Some old text was removed, but there's still some there. Specifically the paragraphs starting with "The user pages for Top contributors..." in the same section.

Should we remove them? (this is probably work for another ticket, to update that page).

[1] https://www.mediawiki.org/wiki/Community_metrics#Contributors

Thanks for the technical documentation. I have added instructions for users to fix their data. Please review, and close this task when you are happy with the documentation.

(At least for now, we don't need a task to update wiki page. Anybody can do it.)

That's a good point, thanks for the addition!

I'd say that the documentation on managing user identities and affiliations is good enough :).