diff-checker should verify that copyright comments of edited files are still year-updated
Open, Low, Public

Description

Pywikibot is a Python-based framework to write bots for MediaWiki.

scripts/maintenance/diff-checker.py is a script that runs in the tests for every commit and performs common checks on the edited files and code lines.

Each file in the pywikibot repository contains a comment such as # (C) Pywikibot team, 2008-2017. These comments are not updated automatically every 1st of January, but only when an edit is made to the code, so they are a way to know when the code was last updated.

diff-checker should verify that the edited files contain this up-to-date comment. You can use a regex to verify that no outdated comment is still present in the edited file.
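A minimal sketch of such a regex check, assuming the comment always has the form # (C) Pywikibot team, YYYY or # (C) Pywikibot team, YYYY-YYYY. The names COPYRIGHT_RE and copyright_is_current are hypothetical and not part of diff-checker.py; how this would be wired into the script is left open.

```python
import re

# Hypothetical pattern, assuming lines such as
# "# (C) Pywikibot team, 2008-2017" or "# (C) Pywikibot team, 2018".
COPYRIGHT_RE = re.compile(
    r'^#\s*\(C\)\s*Pywikibot team,\s*(?P<start>\d{4})(?:-(?P<end>\d{4}))?',
    re.MULTILINE)


def copyright_is_current(source, current_year):
    """Return True if the Pywikibot team copyright line ends in current_year.

    Returns False when the line is missing or its last year is different.
    """
    match = COPYRIGHT_RE.search(source)
    if match is None:
        return False
    # A single year ("2018") has no end group; fall back to the start year.
    last_year = int(match.group('end') or match.group('start'))
    return last_year == current_year
```

For example, copyright_is_current(text, 2018) would be called with the full text of each edited file. Only the first Pywikibot team line is inspected; copyright lines of individual authors are ignored in this sketch.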

You are expected to provide a patch in Wikimedia Gerrit. See https://www.mediawiki.org/wiki/Gerrit/Tutorial for how to set up Git and Gerrit.


https://codein.withgoogle.com/dashboard/tasks/4560248935284736/

Framawiki created this task. Jan 1 2018, 4:58 PM
Framawiki triaged this task as Low priority.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Jan 1 2018, 4:58 PM
Dvorapa added a subscriber: Dvorapa.

TBH, I think we should remove all year mentions altogether. We should not maintain such stuff.

Same thoughts here. If it's not some legal requirement to have a year, this sounds like a waste of time.

jayvdb added a comment. Jan 3 2018, 7:36 PM

Agree, it is a waste of time to be updating this, as git log has better information if it is needed.

jayvdb awarded a token. Jan 3 2018, 7:36 PM

What next? Create a new task to remove years from scripts and close this one in favor of that? Or change the subject of this task in order not to dissipate this discussion?

First, removing the GCI tag, as not everybody seems to agree with this task.

The creation of this task was done after seeing the comment of @Xqt in https://gerrit.wikimedia.org/r/#/c/210654/8/scripts/noreferences.py@37.

Change 402762 had a related patch set uploaded (by Dalba; owner: Dalba):
[pywikibot/core@master] Remove all copyright comments

https://gerrit.wikimedia.org/r/402762

Legoktm added a subscriber: Legoktm.

I -2'd the patch; while it is well intentioned, I don't think it's a good idea. IANAL (but I think I have a decent understanding of software copyright laws and problems).

The year is usually included due to https://www.copyright.gov/title17/92chap4.html - which mentions that a copyright notice should have the year. At least I live in the US, and probably some other contributors do.

On the importance of per-file copyright headers, I'll point to what Luis wrote a few years back - http://lu.is/blog/2012/03/17/on-the-importance-of-per-file-license-information/

I live in the EU, where most countries grant copyright by law to every work automatically, so it doesn't have to be stated in a copyright notice anywhere. But we want anyone to be able to use our code, and therefore we should follow the more restrictive law (the US one) and leave the notice as is.

But is the year in which the piece was last modified really needed there? Per my understanding of US copyright law, I would preserve only the year in which the script was originally written.

Xqt added a comment. Jan 8 2018, 7:41 AM

There is one minor point in favor of keeping the copyright year (though it has nothing to do with copyright itself): there are various methods to download the framework files and keep them up to date. One can use the svn or git repository, a nightly dump (which creates a version file), or fetch the files directly from published places. In any case, the copyright string is the last chance to verify whether a file is outdated without checking it line by line.

Wasn't the __version__ string (at this moment removed from scripts in scripts/ directory, but still preserved in circa 1/3 of scripts) originally intended for this purpose?

Xqt added a comment. Jan 8 2018, 2:38 PM

@Dvorapa: yes, back when we had the svn repository a long time ago. With git, the version string gives a hash but no sequential number anymore, and therefore that string isn't useful.

@Xqt Then we should get rid of __version__ strings completely? Git doesn't work with it like svn did and these strings are above imports => PEP8 violation? (I think I saw some discussion about this topic somewhere)

(sorry for being off-topic to the original copyright discussion)

But is the year in which the piece was last modified really needed there? Per my understanding of US copyright law, I would preserve only the year in which the script was originally written.

My understanding is that when the script is modified, it becomes a new derivative work, which has its own copyright. (But depending on the change, for example, you wouldn't consider a simple typo fix to be original enough to deserve copyright protection.)

Dalba added a subscriber: Dalba. May 26 2018, 4:19 PM

There is one minor point in favor of keeping the copyright year (though it has nothing to do with copyright itself): there are various methods to download the framework files and keep them up to date. One can use the svn or git repository, a nightly dump (which creates a version file), or fetch the files directly from published places. In any case, the copyright string is the last chance to verify whether a file is outdated without checking it line by line.

It's a crude measure and I'm not sure if users can rely on it in practice.

For example in db5ea9a074c0306b2accd, we have updated the copyrights of several modules after a stylistic change. The updated copyright year does not necessarily mean that those modules are up-to-date/actively maintained/under heavy use or anything like that.

I'd say having a look at the date of latest changes in HISTORY.rst (to get a grasp of the status of the repository) or git/svn log (for individual files) would better satisfy such use cases.
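As a rough sketch of the per-file git log check, assuming a git checkout and the git executable on PATH: the helper last_commit_date below is hypothetical and not part of Pywikibot, and its run parameter exists only so the subprocess call can be swapped out (for instance in tests).

```python
import subprocess


def last_commit_date(path, run=subprocess.check_output):
    """Return the date (YYYY-MM-DD) of the last commit touching path.

    Returns an empty string if git reports no commit for that path.
    """
    # -1 limits the log to the most recent commit; %cd is the committer
    # date, printed as YYYY-MM-DD because of --date=short.
    output = run(
        ['git', 'log', '-1', '--format=%cd', '--date=short', '--', path])
    return output.decode('utf-8').strip()
```

Calling last_commit_date('scripts/noreferences.py') from the repository root would then give the date of the latest change to that file, independent of what its copyright comment says.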

Change 402762 abandoned by Dalba:
Remove all copyright comments

Reason:
-2 for a long time, no progress; hopefully someday there will be a fork that gets rid of these.

https://gerrit.wikimedia.org/r/402762