pywikibot external pycolorname used by catimages.py
Closed, Resolved, Public

Description

Compat depends on some external packages which do not appear to be hosted elsewhere, and are only used by compat script catimages.py.

http://git.wikimedia.org/log/pywikibot%2Fpycolorname.git
http://git.wikimedia.org/log/pywikibot%2Fopencv.git

Ideally these packages should be published as externally maintained packages, on https://pypi.python.org/ , as this would help with the porting of the catimages.py script.

Event Timeline

jayvdb claimed this task.
jayvdb raised the priority of this task from to Medium.
jayvdb updated the task description.
jayvdb added a project: Pywikibot-compat.
jayvdb changed Security from none to None.
jayvdb added a subscriber: valhallasw.
jayvdb renamed this task from "send upstream any improvements in pywikibot externals forks of opencv and pycolorname" to "pywikibot externals opencv and pycolorname used by catimages.py". Nov 29 2014, 2:02 AM
jayvdb updated the task description.
jayvdb added a project: Pywikibot.

@jayvdb I'm assigning this to myself as you had mentioned that this is a good microtask for the catimages.py project.

I wanted to check under whose account the pypi package should be created. Is there a mediawiki account for things like this, or do I create my own account for now?

Also, as it becomes a generic pypi package, do we want to move it out of the wikimedia repositories and into github or similar, or do we keep it in the wikimedia repositories?

@AbdealiJK , import the wikimedia repository into a github repository, under your own github account, and add a setup.py and ideally also some basic travis tests.

Once you've done that, ping me / @DrTrigon , and we'll do a basic review. Once that is done, you can release it into pypi using your own account unless @DrTrigon would prefer to be the pypi package owner.
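
As a rough illustration of the setup.py step mentioned above, a minimal sketch could look like the following. The version, author and maintainer strings, the license value and the dependency list are placeholders assumed for illustration; they are not the values the project actually shipped.

```python
# setup.py: minimal packaging sketch. Every metadata value is a placeholder.

METADATA = dict(
    name="pycolorname",
    version="0.0.1",                    # placeholder version
    description="Look up color names from published color charts",
    author="Original Author",           # placeholder
    maintainer="Current Maintainer",    # placeholder
    license="MIT",                      # assumed for illustration only
    packages=["pycolorname"],           # assumed package layout
    install_requires=["requests"],      # assumed dependency list
)

if __name__ == "__main__":
    # Imported here so the metadata can be inspected without setuptools.
    from setuptools import setup

    setup(**METADATA)
```

Keeping the metadata in a plain dict makes it easy to check from a test without triggering a build.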

I don't insist on being mentioned as owner, but rather as Author or in the Contacts as (one of the) main/initial developers or so.

What license do you plan to use?

What do you suggest ?
And is there a current maintainer for it? Or do I mention it's unmaintained for now ?

@jayvdb I'd suggest splitting this task into 2 tasks (1 for each repo) so that other potential GSoC candidates may also solve a microtask if they wish to participate in GSoC.
Is that alright ?

I need a bit of help here - I notice there is a line "import wikipedia as pywikibot" in pycolorname.
Which wikipedia library is this exactly ?

EDIT:
I found wikipedia.py and realized it is what is being referred to. I see that currently we're using pywikibot.comms.http for http communications. Is this needed ? Can we just use requests or something similar (as I'd like the pypi package to be generic) ?

I notice that pywikibot.comms.http in pywikibot-core uses requests internally, but adds a layer on top for nicer error handling.

What do you suggest ?

I would say at least MIT: as free as possible, but not public domain. It would be nice to use one that only allows commercial use when paying license fees to us. Did I mention something in catimages.py?
Don't forget to mention that it uses libs that might have other licenses...

And is there a current maintainer for it? Or do I mention it's unmaintained for now ?

You can put me as a maintainer, even though I would not have time to spend. It's always good to have a second (fall-back) name and email there in case something unexpected happens, right?

@jayvdb I'd suggest splitting this task into 2 tasks (1 for each repo) so that other potential GSoC candidates may also solve a microtask if they wish to participate in GSoC.
Is that alright ?

Yea. You can do only one if you like. Once you finish one, we can create a new task for the other one.

I need a bit of help here - I notice there is a line "import wikipedia as pywikibot" in pycolorname.
Which wikipedia library is this exactly ?

EDIT:
I found wikipedia.py and realized it is what is being referred to. I see that currently we're using pywikibot.comms.http for http communications. Is this needed ? Can we just use requests or something similar (as I'd like the pypi package to be generic) ?

I notice that pywikibot.comms.http in pywikibot-core uses requests internally, but adds a layer on top for nicer error handling.

These external libraries should not depend on pywikibot, if it can be avoided. Yes, use requests directly.
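
A minimal sketch of what "use requests directly" could mean for the page-fetching code is shown below. The function name fetch_html and its get parameter are hypothetical and only serve the illustration; the requests calls used (requests.get with a timeout, raise_for_status, and the text attribute) are part of the public requests API.

```python
def fetch_html(url, get=None, timeout=30):
    """Download a page and return its text, raising on HTTP errors.

    `get` is a hypothetical injection point so the function can be
    exercised without network access; by default requests.get is used.
    """
    if get is None:
        import requests  # third-party dependency, imported lazily

        get = requests.get
    response = get(url, timeout=timeout)
    # Turn 4xx/5xx responses into exceptions instead of parsing error pages.
    response.raise_for_status()
    return response.text
```

This keeps the error handling that the pywikibot layer provided down to one explicit call, without making the package depend on pywikibot.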

@jayvdb @DrTrigon I've created the github repository at https://github.com/AbdealiJK/pycolorname
I'm working on a Pull Request https://github.com/AbdealiJK/pycolorname/pull/7

I noticed that some of the functions are invalid now - because the web page has been restructured and the parsing methods need to be changed. I'm revamping the code so that this now uses classes (makes it neater and structured).

I began with cal-print.com and have completed the work on that.

I am next going to work on pantonepaint.co.kr as this is the one used in catimages.py right now.
Note: the website has changed drastically, so I'm not sure if the names are even the same anymore. I have to check it out.

Note 2: I haven't gone about caching the data yet. I'm still thinking about how to do it in a nice way. I'm considering either using the local dir ~/.local or the temp dir /tmp.

If I understand you right, you are talking about the color name "cache" file. A few important things to point out here:

  • these files never change since the color name definition should not change
  • these files are the main content of the package; the python code just returns the data stored in them
  • therefore these files should be considered a "database", not a "cache", and have to be contained in and packaged with the python code
  • the functions doing web page parsing are meant to be run once by a developer, just to create a specific database file (of course updates of the database as part of a new release, or manually by an interested user, should be possible but are not a main focus); if a web page goes off-line that's fine, as we have the database file
  • thus the web pages are mentioned for transparency and documentation only
This comment was removed by DrTrigon.
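
To make the "database, not cache" idea concrete, here is a minimal sketch of reading a color table that ships with the package instead of scraping it at run time. The function name load_color_table, the one-file-per-source layout and the JSON structure (color name mapped to an RGB list) are assumptions for illustration, not the package's confirmed layout.

```python
import json
import os


def load_color_table(data_dir, source):
    """Return the color table stored in <data_dir>/<source>.json.

    The file is assumed to map color names to [r, g, b] lists and to be
    shipped with the package, so no network access is needed.
    """
    path = os.path.join(data_dir, source + ".json")
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With this split, the web-parsing functions only need to run when a developer regenerates a data file for a new release.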

@DrTrigon Thanks for the input !
I was under the impression that it would be better to generate it when installing or when being used the first time. Normally pypi packages don't hold generated content (afaik).

Hence, what I planned to do was to set a 1-2 month period for the cache where every few days the data would be refreshed. But your point about using a database does make sense - as the user will not be affected when the website changes.

So, I've decided to create a database folder which can be used as earlier.

I've completed porting the package on to a github account.
It can be seen here - https://github.com/AbdealiJK/pycolorname

Can you all review it and let me know if there's anything else to change ?
If nothing needs to be changed, I think the next step would be to publish it to pypi.

More docstrings, especially module level docstrings, would be especially helpful, but structurally it looks very good to me. Give @DrTrigon a day or two to look at it before publishing.

Sure - and thanks for the feedback :)

jayvdb renamed this task from "pywikibot externals opencv and pycolorname used by catimages.py" to "pywikibot external pycolorname used by catimages.py". Mar 5 2016, 12:46 PM

Hi,

Just an FYI here. I've made a development release of pycolorname as @jayvdb was ok with it.
You can see that at https://pypi.python.org/pypi/pycolorname

Also, I've set it up so we have "nightly builds" whenever any Pull Request is merged using rultor.
When @DrTrigon completes his review I will release a stable version.

More docstrings, especially module level docstrings, would be especially helpful, but structurally it looks very good to me. Give @DrTrigon a day or two to look at it before publishing.

I agree about the docstrings; they should give nice documentation with sphinx and/or doxygen.

I would be very happy if I could get more than 2 days, more like 1 week or so, as I am not in wiki anymore, partly due to this rush all the time. Thanks.

So first I would like to thank you for your work! I downloaded pycolorname-0.0.1.dev20160305125219.tar.gz and examined it as well as the pypi page at https://pypi.python.org/pypi/pycolorname. I think we still have some work to do in order to make the first major release:

  1. I totally miss credits for other developers involved as well as proper licensing for http://www.pantonepaint.co.kr, http://www.logodesignteam.com, http://www.cal-print.com as well as http://www.ralcolor.com (it's questionable to me whether we should mention them at all; actually we would have to pay license fees to pantone to get such a table, and these web pages have been kind or careless enough that we were able to get the data, at least for once ;) ) - however you can mention Pantone and RAL - https://en.wikipedia.org of course. I don't like insisting on that stuff but I think we have to do this properly.
  2. There are formatting issues on the pypi page.
  3. I cannot compare databases under pycolorname/data/ easily due to the changed format. I would have to sit down and write code for comparison. Is JSON faster than pickle for that size of data?
  4. Extending on jayvdb's comment, I have to ask for documentation containing examples that explain how to use the code (I currently don't understand how, sorry). Therefore I cannot comment on the code yet.

To extend on 4. a bit more, basically we should implement something like what is done in assignColorNames:

  • a method that allows searching for a color's name given the RGB value of that color
  • this method should return the closest match out of the set of names we have in the database, and the distance (colormath)
  • these assignments for every RGB value could be pre-calculated and stored in a DB in advance as well - not sure whether I tried this once but the DB was insanely big... - we should give it a try, it would speed the code up and remove the colormath dependency for end-users
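
The lookup described in the list above can be sketched as follows. This is a simplified illustration: it uses plain Euclidean distance in RGB space rather than the colormath-based distance mentioned above, and the function name closest_color_name and the table format (name mapped to an RGB tuple) are assumptions.

```python
import math


def closest_color_name(rgb, table):
    """Return (name, distance) for the table entry nearest to `rgb`.

    Distance is plain Euclidean distance in RGB space, a stand-in for
    a perceptual delta E computed in Lab space.
    """
    def distance(value):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(rgb, value)))

    # Pick the name whose stored color is nearest to the query color.
    name = min(table, key=lambda n: distance(table[n]))
    return name, distance(table[name])
```

For example, with a table holding only pure red and pure green, closest_color_name((250, 0, 0), table) picks the red entry at a distance of 5.0. Pre-computing this for every RGB value would need about 16.7 million entries (256 cubed), which fits the remark that such a DB gets very big.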

So long, and thanks for all the fish!

@DrTrigon Thanks for the comments

  1. Could you elaborate on how to do this ? I am not familiar with how to credit appropriately.

The questions I have on this point are:

  • Where else should I credit developers ? (Currently I've written your name as author in pypi and my name as maintainer - each of these can only be 1 name according to PEPs)
  • Who are the other developers ? (The git log showed only your name)
  • How do we mention the sources ? (Should I remove them from the readme?)

I've created an issue on the repo too : https://github.com/AbdealiJK/pycolorname/issues/21

  2. Yes. There's a bug at https://github.com/AbdealiJK/pycolorname/issues/20
  3. In general json is supposed to be faster. The reason I personally prefer json is because it's human readable and easy to parse by anything.

Note:

  • The colors from the pickle should be the same for most of them (I will check this myself and probably add a test for it too).
  • The colors for the cal-print source are different because it seems colormath changed its implementation of delta_e. They have neither docs nor code of the older releases, so I'm unable to verify which method of computing delta_e they used.

  4. Yes, docs are important. Will make them :)

Created https://github.com/AbdealiJK/pycolorname/issues/23 to handle this

I have planned assignColorNames-like functionality, but I think it's best left for a second release, as I'd like to read up on color theory a bit more before doing that. So I was thinking of doing it after the gsoc proposal deadline.

Hello again,

I did a fairly elaborate search for contributors and found out about the history of pycolorname. Here are my findings:

The following changes are noteworthy in pywikibot-compat related to pycolorname:

Now I tracked all the commits in the places pycolorname was used from earlier:

As you can see I have followed all repos that pywikibot-compat has used for the pycolorname package and also all renames the package has had. I haven't found any other contributors other than @DrTrigon

Please let me know if I have missed something. I think I've tracked it completely and can't find any gaps in its history.

@DrTrigon do you remember anyone else that has contributed ? If I had a name/email maybe I could do this in a more targeted way and find something I've missed ?

Nice. Based on your analysis, and the following addition by me, I believe it is safe to assume that @DrTrigon was the only prior contributor to pycolorname.

From https://sourceforge.net/p/toolserver/code/3409/ we can see that the catimages.py code before pycolorname was a hard coded list, so there was no prior code from anyone else in that branch (even if multiple people helped build the static list of colors, that list isn't code, and doesn't need crediting).

  1. Could you elaborate on how to do this ? I am not familiar with how to credit appropriately.

The questions I have on this point are:

  • Where else should I credit developers ? (Currently I've written your name as author in pypi and my name as maintainer - each of these can only be 1 name according to PEPs)
  • Who are the other developers ? (The git log showed only your name)

I only know of me as contributor - so if nothing else happened meanwhile, it's you and me.

  • How do we mention the sources ? (Should I remove them from the readme?)

I would remove the sources from the readme, yes, but you can keep them in the code and comments of course. But maybe not at the most prominent place... ;)

  3. In general json is supposed to be faster. The reason I personally prefer json is because it's human readable and easy to parse by anything.

I'm not so sure: http://stackoverflow.com/questions/18517949/what-is-faster-loading-a-pickled-dictionary-object-or-loading-a-json-file-to (3 years old)
Arguments for pickle are that you can store the data in its final format already (dict of tuples) and elegantly just load the file and return the dict. Arguments for JSON are human readability and interoperability. You could profile the code to get a clue about what takes the most time.
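
The "final data format" point can be seen in a small round trip: pickle restores a dict of tuples unchanged, while JSON has no tuple type and hands back lists, so one conversion step is needed after loading. The color entry below is made up for illustration.

```python
import json
import pickle

colors = {"Example Red": (200, 16, 46)}  # made-up entry, dict of tuples

# pickle restores the original Python types.
from_pickle = pickle.loads(pickle.dumps(colors))

# JSON serializes tuples as arrays, which come back as lists.
from_json = json.loads(json.dumps(colors))

# One dict comprehension restores the tuple form after a JSON load.
as_tuples = {name: tuple(rgb) for name, rgb in from_json.items()}
```

The extra step is cheap, so the choice mostly comes down to load time versus readability and portability.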

Note:

  • The colors from the pickle should be the same for most of them (I will check this myself and probably add a test for it too).

Where is the test script located? (you mentioned something in a mail)

  • The colors for the cal-print source are different because it seems colormath changed its implementation of delta_e. They have neither docs nor code of the older releases, so I'm unable to verify which method of computing delta_e they used.

Indeed this is a good point. So we should include our own calculation method according to the international standards. Did you try to get in contact with the colormath developer?

  4. Yes, docs are important. Will make them :)

Created https://github.com/AbdealiJK/pycolorname/issues/23 to handle this

Write a very brief first example into the readme/pypi page; that will already allow others to get started. (What you sent in IRC would already be ok as a starter.)

I have planned assignColorNames-like functionality, but I think it's best left for a second release, as I'd like to read up on color theory a bit more before doing that. So I was thinking of doing it after the gsoc proposal deadline.

...nice - so again we need colormath functionality included - good to see that you are working on it. Please include good literature in the documentation as well; that's important as a reference for later.

Greetings

Hey,

I believe everything has been already addressed except the following:

JSON vs pickle

Even if speed is worse in json, I'd prefer it because it is human readable. We do not have a complicated class that requires pickling.
Other languages can also easily wrap this if it's json :)

delta_e for color-math

There are multiple algorithms: 76, 94, 2000, and cmc. I believe 2000 and cmc are popularly used, although there is no official standard. 2000 is the one color-math used to use by default, and now it asks users to decide which algorithm to use. I was using cmc, but now I changed to 2000.
Using 2000 I still get some differences, but 656 out of 992 have the same color as earlier. (They've re-factored how delta_e works completely, and that seems to have caused some change ?)
Well, either way - I think this is not something that is high priority, as we're just using the pantonepaint dataset in pycolorname.
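
For orientation, the simplest of the formulas named above, CIE76, is just the Euclidean distance between two colors in CIE L*a*b* space; the 94, 2000 and cmc variants add weighting terms on top. The sketch below only covers CIE76 and assumes the inputs are already Lab triples (converting from RGB is not shown). The function name is hypothetical and this is not colormath's implementation.

```python
import math


def delta_e_cie76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two Lab triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))
```

For instance, delta_e_cie76((50, 0, 0), (50, 3, 4)) is 5.0. Because each variant weights lightness, chroma and hue differently, the nearest named color can change when the formula changes, which is one plausible source of the differences noted above.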

Test with older datasets

This was complicated, because the websites have changed and the old data was sometimes quite different. But I've gotten something up and running.
I've added a deprecationTest which compares the data with the data from last time. You can find it here
The email was just a quick way for you to convert pickle to json - you could then easily use a difftool to compare them :) (But this test has explanations and is more detailed. You can ignore the email in favour of this.)
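
A comparison of an old and a new dataset, in the spirit of the test described above, could be sketched like this. It is not the project's actual deprecation test; the function name compare_tables and the table format (name mapped to an RGB sequence) are assumptions.

```python
def compare_tables(old, new):
    """Classify the names in `old` by how they appear in `new`.

    Values are compared as tuples so that a pickle-loaded table (tuples)
    and a JSON-loaded table (lists) can be compared directly.
    """
    result = {"same": [], "changed": [], "missing": []}
    for name in sorted(old):
        if name not in new:
            result["missing"].append(name)
        elif tuple(old[name]) == tuple(new[name]):
            result["same"].append(name)
        else:
            result["changed"].append(name)
    return result
```

Counting len(result["same"]) against len(old) gives the kind of "N out of M have the same color" figure quoted in this thread.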

Just for completeness sake I tried out the profiler mentioned in the SO post for json vs pickle:

$ python profiler.py 
pickle:
5.83598685265
json:
0.551915884018

I had to change this to 1000 runs, as the default (1000000) took too long.
These were the times I got for logodesignteam's source. I used their source as the pkl and json versions are exactly identical (the others are not, as can be seen in deprecation_test).
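
A comparable measurement can be reproduced with the standard library alone. The sketch below is not the profiler.py from the Stack Overflow post; it times pickle.loads against json.loads on a synthetic table of 1000 made-up entries. Absolute numbers and even the ranking depend on the Python version and the pickle protocol, so no expected output is given.

```python
import json
import pickle
import timeit

# Synthetic stand-in for a color table: 1000 names mapped to RGB lists.
table = {
    "color %d" % i: [i % 256, (i * 7) % 256, (i * 13) % 256]
    for i in range(1000)
}
pickled = pickle.dumps(table)
jsoned = json.dumps(table)

runs = 1000
pickle_time = timeit.timeit(lambda: pickle.loads(pickled), number=runs)
json_time = timeit.timeit(lambda: json.loads(jsoned), number=runs)

print("pickle:", pickle_time)
print("json:", json_time)
```

Timing the in-memory loads keeps disk caching out of the comparison; timing real file reads would be the next step if load time ever mattered.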

@DrTrigon The reason I prefer json : http://abdealijk.github.io/pycolorname/#/chart/pycolorname.pantone.pantonepaint.json

:)

PS : You can see all the color charts using the navbar on top

Pro-actively closing this as it has achieved the main objective. If there is future work to be done, I think it can happen on the github repo (https://github.com/AbdealiJK/pycolorname) and/or as new Phabricator tasks against the Pywikibot project if there is any relationship to pywikibot.

Thanks for profiling. Looks good!

Had a look at the pypi page and the color charts. I like it - good work!! (though I had no time to test the code yet)

Thanks for your work!