
changedetection-io tool not working as expected
Closed, Invalid; Public

Description

Project Name: cd.io-teeW9or8

Developer account usernames of requestors: otcenas11

Purpose: Wasn't able to successfully deploy changedetection.io as a Tool with data persistence and chromium browser support.

Brief description: Persistent /datastore, public url. https://github.com/dgtlmoon/changedetection.io

How soon you are hoping this can be fulfilled: This week.

Event Timeline

Hi @Otcenas11, thanks for taking the time to report this. There is another Phabricator account @Otcenas111 linked to a developer account which is a member of Toolforge groups like tools.changedetection-io. I assume this is also your account; why not use a single Phabricator account?

The purpose described on https://phabricator.wikimedia.org/project/view/8463/ sounds a bit broad.

How is this project specifically related to the Wikimedia community, and what is the involvement with/of the Wikimedia community which justifies running this on Wikimedia VPS?
For example, "price drops, restock alerts" is irrelevant to us.

Also, who else is involved in this project, or are you the sole maintainer?

Hi @Aklapper. Thank you for the questions. I could not log in to my previous account via MediaWiki, so I created a second account via LDAP.

Let me clarify the Wikimedia-specific use case:
changedetection.io is a website change detection and monitoring tool. For the Wikimedia community, the relevant use cases include:

Monitoring external sources for Wikipedia articles: Editors can track changes to official sources, government websites, or reference materials that are cited in Wikipedia articles, enabling timely article updates when source material changes.
Tracking vandalism patterns: Monitoring specific external sites that are frequently used for vandalism or misinformation campaigns.
Citation verification: Automated monitoring of referenced URLs to detect when cited sources change or disappear (link rot prevention).
Wikidata maintenance: Tracking external databases and official sources to keep Wikidata properties current.

You're absolutely right that features like "price drops, restock alerts" are not relevant to Wikimedia. The core functionality—detecting when web pages change—is what's useful for our community.
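As a rough illustration of that core idea, here is a minimal sketch of change detection by comparing content fingerprints. It is not changedetection.io's actual implementation; the function names and the sample snapshots are hypothetical, and a real monitor would fetch the page over HTTP and store the fingerprint between runs.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 hex digest of the text, ignoring whitespace-only differences."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: str, current_text: str) -> bool:
    """Compare a stored fingerprint with the fingerprint of freshly fetched text."""
    return fingerprint(current_text) != previous_fingerprint


# Hypothetical snapshots of a cited source, taken at two different times.
old_snapshot = "Population: 1,204,000 (2020 census)"
new_snapshot = "Population: 1,251,000 (2024 estimate)"

stored = fingerprint(old_snapshot)
print(has_changed(stored, old_snapshot))
print(has_changed(stored, new_snapshot))
```

Storing only the digest keeps the state small, but it can only say that something changed, not what changed; showing a diff would require keeping the previous text as well.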

I am currently the sole maintainer of this VPS instance. The upstream project is maintained by dgtlmoon on GitHub.

Hello @Otcenas11 -- I have a few questions about your request.

  1. The license terms listed for changedetection.io look a bit inconsistent to me. On the one hand, GitHub shows it as 'Apache 2.0 licensed,' which would fit with our terms of use. But further down in the README there is a note about commercial licensing, which suggests that the project owner means to have a non-commercial restriction on the license. If that's correct, then we would not permit running this software in Toolforge or Cloud VPS.
  2. Can you tell me more about what trouble you ran into trying to deploy this on Toolforge?
  3. What's with the weird randomly-generated project name? Typically a project name would be human-readable and clearly associated with the purpose of the project.

Hello @Andrew

  1. The commercial licensing must be related to the cloud-hosted version, which they offer for a fee when hosted on their own hardware.
  2. I have been trying to set it up as a tool but haven't been successful so far. The URL returns "Internal Server Error", and I have not even gotten as far as setting up Chromium/Selenium parsing.
  3. This is to prevent bots stumbling onto the project url randomly.
> This is to prevent bots stumbling onto the project url randomly.

The web proxy hostname for anything exposed by the project is separate from the OpenStack project name. There is no reason at all to obfuscate OpenStack project names. There really is no reason to obfuscate webservice hostnames either. Bots will find you via links in other crawled pages or similar published origins. Readable URLs are not the source of unwanted bot traffic.

@Otcenas11 Can you let us know the name of the Toolforge tool you tried deploying this on, and maybe give more details about how to reproduce the issue you are facing? I can try to check what the problem is. We can proceed with this (pending approval from others) if it's confirmed that this can't be deployed on Toolforge.

@Raymond_Ndibe See here: https://changedetection-io.toolforge.org/
Will this help?
tools.changedetection-io@tools-bastion-15:~$ kubectl logs -l name=changedetection-io --tail=100

Checking container contents

/workspace
total 68
drwxrwxrwx 3 heroku heroku 4096 Jan 1 1980 .
drwxr-xr-x 1 root root 4096 Feb 15 13:02 ..
drwxrwxrwx 8 heroku heroku 4096 Jan 1 1980 .git
-rw-rw-rw- 1 heroku heroku 3235 Jan 1 1980 DEPLOYMENT_SUMMARY.md
-rw-rw-rw- 1 heroku heroku 1205 Jan 1 1980 Dockerfile
-rw-rw-rw- 1 heroku heroku 4583 Jan 1 1980 README.md
-rw-rw-rw- 1 heroku heroku 2667 Jan 1 1980 deploy.sh
-rw-rw-rw- 1 heroku heroku 1868 Jan 1 1980 kubernetes.yml
-rw-rw-rw- 1 heroku heroku 5119 Jan 1 1980 requirements.txt
-rw-rw-rw- 1 heroku heroku 2267 Jan 1 1980 setup_autoupdate.sh
-rw-rw-rw- 1 heroku heroku 911 Jan 1 1980 startup.sh
-rw-rw-rw- 1 heroku heroku 992 Jan 1 1980 tool.yml
-rw-rw-rw- 1 heroku heroku 691 Jan 1 1980 toollabs-config.sh
-rw-rw-rw- 1 heroku heroku 1533 Jan 1 1980 update-cron.yml
-rw-rw-rw- 1 heroku heroku 490 Jan 1 1980 update_source.sh

Checking /workspace

total 68
drwxrwxrwx 3 heroku heroku 4096 Jan 1 1980 .
drwxr-xr-x 1 root root 4096 Feb 15 13:02 ..
drwxrwxrwx 8 heroku heroku 4096 Jan 1 1980 .git
-rw-rw-rw- 1 heroku heroku 3235 Jan 1 1980 DEPLOYMENT_SUMMARY.md
-rw-rw-rw- 1 heroku heroku 1205 Jan 1 1980 Dockerfile
-rw-rw-rw- 1 heroku heroku 4583 Jan 1 1980 README.md
-rw-rw-rw- 1 heroku heroku 2667 Jan 1 1980 deploy.sh
-rw-rw-rw- 1 heroku heroku 1868 Jan 1 1980 kubernetes.yml
-rw-rw-rw- 1 heroku heroku 5119 Jan 1 1980 requirements.txt
-rw-rw-rw- 1 heroku heroku 2267 Jan 1 1980 setup_autoupdate.sh
-rw-rw-rw- 1 heroku heroku 911 Jan 1 1980 startup.sh
-rw-rw-rw- 1 heroku heroku 992 Jan 1 1980 tool.yml
-rw-rw-rw- 1 heroku heroku 691 Jan 1 1980 toollabs-config.sh
-rw-rw-rw- 1 heroku heroku 1533 Jan 1 1980 update-cron.yml
-rw-rw-rw- 1 heroku heroku 490 Jan 1 1980 update_source.sh

Checking /app

lrwxrwxrwx 1 root root 10 Feb 19 2024 /app -> /workspace

Checking /layers

total 20
drwxrwxrwx 1 heroku heroku 4096 Jan 1 1980 .
drwxr-xr-x 1 root root 4096 Feb 15 13:02 ..
drwxrwxrwx 2 heroku heroku 4096 Jan 1 1980 config
drwxrwxrwx 1 heroku heroku 4096 Jan 1 1980 heroku_python
drwxrwxrwx 3 heroku heroku 4096 Jan 1 1980 sbom

Finding changedetection.py

tools.changedetection-io@tools-bastion-15:~$ toolforge webservice status
Your webservice is not running

@Otcenas11 Which command did you use to try to start the webservice?

@Otcenas11 I looked at your tool. Your command history shows you are running toolforge webservice --backend=kubernetes --mount=all buildservice start, which fails with the error below because you have a manually created deployment with the same name that webservice is trying to use:

tools.changedetection-io@tools-bastion-15:~$ toolforge webservice --backend=kubernetes --mount=all buildservice start
Traceback (most recent call last):
  File "/usr/bin/toolforge-webservice", line 33, in <module>
    sys.exit(load_entry_point('toolforge-webservice==0.103.18', 'console_scripts', 'toolforge-webservice')())
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 561, in main
    start(job, "Starting webservice")
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolsws/cli/webservice.py", line 88, in start
    job.request_start()
    ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py", line 647, in request_start
    self.api.create_object(
    ~~~~~~~~~~~~~~~~~~~~~~^
        "deployments", self._get_deployment(started_at)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/kubernetes.py", line 249, in create_object
    return self.post(
           ~~~~~~~~~^
        kind,
        ^^^^^
    ...<2 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 190, in post
    response = self._make_request("POST", url, **kwargs).json()
               ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 157, in _make_request
    raise e
  File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 136, in _make_request
    response.raise_for_status()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3/dist-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://k8s.tools.eqiad1.wikimedia.cloud:6443/apis/apps/v1/namespaces/tool-changedetection-io/deployments
tools.changedetection-io@tools-bastion-15:~$ toolforge jobs list
tools.changedetection-io@tools-bastion-15:~$ kubectl get deployments
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
changedetection-io   1/1     1            1           7h15m
tools.changedetection-io@tools-bastion-15:~$

If you delete the manually created deployment (or at least change its name), this will likely resolve your issue with Toolforge.
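To make the conflict concrete, here is a small sketch that scans `kubectl get deployments` table output for a deployment whose name matches the one webservice wants. It assumes webservice names its deployment after the tool, which matches what the output above shows but is not verified here; the helper names are hypothetical.

```python
def deployment_names(kubectl_output: str) -> list[str]:
    """Extract the NAME column from `kubectl get deployments` table output."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    # The first non-empty line is the header row (NAME READY UP-TO-DATE ...).
    return [line.split()[0] for line in lines[1:]]


def conflicts_with_webservice(kubectl_output: str, tool_name: str) -> bool:
    """True if a deployment already uses the name webservice is assumed to want."""
    return tool_name in deployment_names(kubectl_output)


# Table copied from the session above.
output = """\
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
changedetection-io   1/1     1            1           7h15m
"""
print(conflicts_with_webservice(output, "changedetection-io"))
```

A 409 Conflict from the Kubernetes API on creating a deployment is consistent with this kind of name collision, so checking the existing names first is a cheap way to confirm it.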

What I've done:
 1. Deleted the conflicting manual deployment that was blocking the toolforge webservice
 2. Configured the service to use python3.13 runtime
 3. Installed core Flask dependencies (flask, werkzeug, click, etc.)
 4. Installed babel and flask-babel packages
 5. Updated app.py to include system package paths

The url still gives 404.

fgiunchedi renamed this task from "Request creation of cd.io-teeW9or8 VPS project" to "changedetection-io tool not working as expected". Mon, Feb 23, 10:37 AM
fgiunchedi edited projects, added Toolforge; removed Cloud-VPS (Project-requests).

The task is more about debugging the changedetection-io tool than creating a new project, so I am renaming and retagging accordingly. cc @Raymond_Ndibe @fnegri @dcaro

@Otcenas11 Hi!

I get Internal Server Error currently when trying:

dcaro@acme$ curl https://changedetection-io.toolforge.org/
Internal Server Error

Looking at the logs, it seems there are some missing dependencies:

tools.changedetection-io@tools-bastion-15:~$ tail -n 100 /data/project/changedetection-io/uwsgi.log
...
*** Operational MODE: preforking ***
Traceback (most recent call last):
  File "app.py", line 18, in <module>
  File "/data/project/changedetection-io/changedetection.io/changedetectionio/__init__.py", line 19, in <module>
    from changedetectionio import store
  File "/data/project/changedetection-io/changedetection.io/changedetectionio/store.py", line 11, in <module>
    from flask_babel import gettext
  File "/data/project/changedetection-io/.local/lib/python3.13/site-packages/flask_babel/__init__.py", line 20, in <module>
    from pytz import timezone, UTC
ModuleNotFoundError: No module named 'pytz'
unable to load app 0 (mountpoint='') (callable not found or import error)
...
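One quick way to see which imports would fail before restarting the service is to ask Python whether each top-level module can be found. This is only a sketch: the module list is illustrative (flask_babel and pytz come from the traceback above, flask from earlier in this task), the tool's real requirements may differ, and it must be run with the same interpreter and environment the webservice uses to be meaningful.

```python
import importlib.util


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Illustrative list only; not the tool's actual dependency set.
wanted = ["flask", "flask_babel", "pytz"]
print(missing_modules(wanted))
```

An empty list means every listed module is importable; it does not prove that transitive dependencies are installed, which is why installing from a requirements.txt file is the more reliable route.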

It does not look like you have a venv set up, so I don't know how you installed things. A lot of the commands that were run in the environment, and the comments alongside them, seem AI-generated.

I suggest following the tutorial instead:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python

Or use a buildpack image, which might be easier if you have your code in a git repo with a standard requirements.txt file:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Building_container_images/My_first_Buildpack_Python_tool

taavi moved this task from Inbox to Watching on the cloud-services-team board.
taavi removed a project: Toolforge.
taavi subscribed.

This does not seem to be an issue with Toolforge infrastructure, thus declining. Please see the links mentioned above and refer to our support channels if you need more help: https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_communication.