
using built-in http.client in python implementations does not work
Open, Medium, Public, BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • the requests module is not available in Python implementations
  • using the built-in http.client gives the error below

See https://www.wikifunctions.org/view/en/Z18320 for a failing test.
Python implementation: https://www.wikifunctions.org/view/en/Z18327

Relevant code:

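import http.client


# Hypothetical wrapper name; the real implementation is Z18327.
def fetch_entity_data(entity):
    # Fetch data from Wikidata
    # Build the URL
    url_path = f"/wiki/Special:EntityData/{entity}.json"
    try:
        # Establish a connection
        conn = http.client.HTTPSConnection("www.wikidata.org")
        try:
            conn.request("GET", url_path)
            response = conn.getresponse()

            # Check the status
            if response.status >= 400:
                return "error: Got >= 400"

            # Read the response data
            data = response.read()
        finally:
            # Close the connection on every path, including the early return
            conn.close()
    except Exception as e:
        return f"error: during request: {e}"
    # The rest of the implementation (not shown) processes `data`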

What happens?:

WF outputs this error:

Error type: Error in evaluation
Expected result: { "Z1K1": "Z40", "Z40K1": "Z41" }
Actual result: { "Z1K1": "Z5", "Z5K1": "Z507", "Z5K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z885", "Z885K1": "Z507" }, "Z507K1": "No module named '_socket'" } }
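The message that matters is nested inside the Z5 error object. As a minimal sketch, it can be pulled out as follows; the helper name `extract_error_message` is hypothetical, and the key path is assumed from this single result rather than from any Wikifunctions specification:

```python
import json

# The "Actual result" from the failing test, copied verbatim.
ACTUAL_RESULT = """
{ "Z1K1": "Z5", "Z5K1": "Z507", "Z5K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z885", "Z885K1": "Z507" }, "Z507K1": "No module named '_socket'" } }
"""


def extract_error_message(result_text):
    """Return the Z507K1 message of a Z5 error object, or None otherwise.

    Hypothetical helper: the key path is assumed from the one result above.
    """
    result = json.loads(result_text)
    if result.get("Z1K1") != "Z5":
        return None  # not an error object
    return result.get("Z5K2", {}).get("Z507K1")


print(extract_error_message(ACTUAL_RESULT))
```

For the expected result (a Z40 object) the same helper returns None, since that object is not a Z5.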

What should have happened instead?:
Given that _socket is not listed as missing at https://rustpython.github.io/pages/whats-left, the import should have succeeded and the test should have passed instead of returning an error.


Event Timeline

So9q renamed this task from "built-in http.client in python implementations does not work" to "using built-in http.client in python implementations does not work". Jul 31 2024, 11:53 AM
So9q updated the task description.

Here is a chat with ChatGPT in which we try to diagnose and work around the issue:
https://chatgpt.com/share/89369c64-2531-485f-bc39-5cae75160cd8
ChatGPT suggested the following:

The error you're encountering, No module named '_socket', indicates that the http.client module, which relies on the _socket module for networking operations, isn't available in the restricted environment of RustPython used by WikiFunctions. RustPython has limitations in terms of available standard library modules, especially those requiring native extensions or low-level system access, like networking libraries.

Given these constraints, we can approach the problem differently. Since we can't use external libraries or networking modules, we can focus on the core logic that would normally handle the JSON response if it were available. This function can't make an HTTP request directly, so it will need to accept the data in a different way, possibly through function parameters or preloaded data.
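A minimal sketch of that idea, assuming the caller supplies the entity JSON as a string: the function name `get_label`, its parameters, and the choice to extract a label are all assumptions for illustration (the task does not show what the implementation does with the data), and the sample below is a hand-trimmed document in the shape that Special:EntityData returns.

```python
import json


def get_label(entity_json_text, entity, language="en"):
    """Return the label of `entity` in `language`, or an error string.

    Hypothetical helper: the JSON arrives as a parameter, so no network
    access (and therefore no _socket) is needed.
    """
    try:
        document = json.loads(entity_json_text)
    except ValueError as e:
        return f"error: invalid JSON: {e}"
    try:
        return document["entities"][entity]["labels"][language]["value"]
    except (KeyError, TypeError):
        return f"error: no {language} label for {entity}"


# Hand-trimmed sample, not a full Wikidata response.
SAMPLE = '{"entities": {"Q42": {"labels": {"en": {"language": "en", "value": "Douglas Adams"}}}}}'

print(get_label(SAMPLE, "Q42"))
```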

If it is accurate that WF Python implementations currently cannot make HTTP requests at all, I have the following suggestions:

For this particular test case and implementation, completing T282926 might serve as a workaround.

Thanks to @Mahir256 for linking to https://t.me/Wikifunctions/9592, where @DVrandecic says: "Right now you shouldn't be able to make any outside requests. If you are, please let us know immediately."

I suggest closing this as wont-fix and reopening once the dev team is ready to handle all implications of outside requests.

I'd like to add a little more context here.

The unavailability of _socket is not due to RustPython, but due to WebAssembly. The system interface between Python's sockets and the required syscalls is intentionally not wired up, which manifests in Python as an error when trying to import socket or to call library methods that use it under the hood. Due to our security model, I think we are very unlikely to revisit this decision.
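To see where the failure surfaces, a small probe along these lines can be run in any Python environment. It is only a sketch: `probe_import` is a hypothetical helper, and the module chain it checks (http.client imports socket, which imports _socket) is the one in the CPython standard library.

```python
import importlib


def probe_import(module_name):
    """Return None if the module imports cleanly, else the error message."""
    try:
        importlib.import_module(module_name)
    except ImportError as e:
        return str(e)
    return None


# http.client imports socket, which in turn imports _socket.
for name in ("_socket", "socket", "http.client"):
    print(name, "->", probe_import(name) or "ok")
```

On a regular CPython install all three lines should report ok. In an environment where _socket is not provided, the report above suggests each probe would come back with the same "No module named '_socket'" message.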

For a possible path forward, I'll copy and expand on what I said on IRC: there is a not-quite-implemented feature (https://phabricator.wikimedia.org/T359566) that will let us make callbacks to the function orchestrator. The orchestrator has far fewer security restrictions, and, at some point, we will probably allow at least some HTTP requests from the orchestrator. Indeed, we already make some requests--but only to MediaWiki, so it's heavily controlled.

But with these two features, Python implementations could (in a roundabout way) make HTTP requests. It's extremely unlikely that we'd ever allow totally unrestricted access, though: one of the fears about this system is that arbitrary HTTP requests could invite abuse.
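As an illustration only of what "restricted" could mean here, the sketch below checks a URL against a host allowlist before any request would be made. Everything in it is an assumption: the allowlist contents, the name `is_request_allowed`, and the HTTPS-only rule are invented for the example and do not describe how the orchestrator works or will work.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real policy, if any, is not described in this task.
ALLOWED_HOSTS = {"www.wikidata.org", "www.wikifunctions.org"}


def is_request_allowed(url):
    """Allow only HTTPS URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


print(is_request_allowed("https://www.wikidata.org/wiki/Special:EntityData/Q42.json"))  # True
print(is_request_allowed("https://example.com/"))  # False
```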

I would be happy to continue this discussion in some venue in order to determine what the need is; if that venue is here, I'm fine with this task remaining open. To be clear, though, this isn't a bug: the system is working as intended.