
UnicodeDecodeError on poetry run mkdocs --verbose build
Closed, Resolved · Public · Bug Report

Description

On running poetry run mkdocs --verbose build, I receive the error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 63: character maps to <undefined>

The traceback points to the line with document_file.open() as f: (macros/__init__.py#45).

Explicitly setting the encoding to UTF-8 resolves the issue; patch on the way 🙂

diff --git a/macros/__init__.py b/macros/__init__.py
index 0c58434..d38504a 100644
--- a/macros/__init__.py
+++ b/macros/__init__.py
@@ -42,7 +42,7 @@ def define_env(env):
     # Collect all document descriptions from filesystem and associate with
     # categories
     for document_file in documents_dir.glob("**/*.yaml"):
-        with document_file.open() as f:
+        with document_file.open(encoding='utf8') as f:
             for document in yaml.safe_load_all(f):
                 doc_tree[document_file.stem] = document
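
For anyone trying to reproduce this, here is a minimal sketch of the likely failure mode. It assumes the reporter's default locale encoding was Windows-1252 (cp1252), which the report does not state. The 'charmap' codec named in the error fits a single-byte code page like cp1252, where byte 0x9d is undefined. In UTF-8 that byte appears as the last byte of characters such as U+201D (the closing curly quote), so a YAML file containing one would trigger the error. The file name example.yaml and the helper decode_error are hypothetical, not part of the repository.

```python
import tempfile
from pathlib import Path
from typing import Optional

# U+201D (RIGHT DOUBLE QUOTATION MARK) is encoded in UTF-8 as E2 80 9D,
# so a UTF-8 YAML file containing it includes the byte 0x9d.
SAMPLE = "summary: \u201cquoted\u201d text\n"


def decode_error(path: Path, encoding: str) -> Optional[UnicodeDecodeError]:
    """Read path as text with the given encoding; return the error, if any."""
    try:
        with path.open(encoding=encoding) as f:
            f.read()
    except UnicodeDecodeError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "example.yaml"  # hypothetical document file
    doc.write_bytes(SAMPLE.encode("utf-8"))

    # cp1252 stands in for the reporter's (assumed) locale default encoding.
    err = decode_error(doc, "cp1252")
    print(err)

    # Passing encoding explicitly, as the patch does, decodes cleanly.
    assert decode_error(doc, "utf-8") is None
```

Passing encoding explicitly makes that one read independent of the platform default; other file reads in the build would still use the locale encoding.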

Event Timeline

Change 805857 had a related patch set uploaded (by Samtar; author: Samtar):

[wikimedia/developer-portal@main] macros/__init__.py: Implicitly set encoding

https://gerrit.wikimedia.org/r/805857

bd808 changed the subtype of this task from "Task" to "Bug Report".

I cannot reproduce the UnicodeDecodeError in my local environment. I don't mean to imply that it does not happen for others, just that this makes it harder for me to evaluate solutions. I'm wondering whether setting PYTHONUTF8=1 in the container's environment would be a more robust solution than patching just one of the many file reads that happen during the build. This envvar should unconditionally enforce UTF-8 encoding in our Python 3.7 interpreter per https://peps.python.org/pep-0540/.
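
A quick way to see what PYTHONUTF8=1 changes is the following sketch. It is not taken from change 808969, whose contents aren't shown here. It starts a fresh interpreter with the variable set to 1 or 0 and prints sys.flags.utf8_mode together with the preferred encoding that open() falls back to when no encoding argument is given. The helper name probe is hypothetical.

```python
import os
import subprocess
import sys

# Child program: report whether UTF-8 mode is on and the preferred encoding
# open() falls back to; codecs.lookup() normalizes the name (e.g. "UTF-8").
CHILD = (
    "import codecs, locale, sys; "
    "print(sys.flags.utf8_mode, "
    "codecs.lookup(locale.getpreferredencoding(False)).name)"
)


def probe(utf8: bool) -> str:
    """Run a fresh interpreter with PYTHONUTF8 set to 1 or 0 and report."""
    env = dict(os.environ, PYTHONUTF8="1" if utf8 else "0")
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print("PYTHONUTF8=1 ->", probe(True))
print("PYTHONUTF8=0 ->", probe(False))
```

With UTF-8 mode on, the first line should read "1 utf-8" on any platform. Because the setting applies process-wide, it covers every file read during the build rather than only the one in macros/__init__.py.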

Change 808969 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[wikimedia/developer-portal@main] build: force utf-8 encoding for python

https://gerrit.wikimedia.org/r/808969

+1'd, but just noting here too that it resolves the issue 🙂

Change 805857 abandoned by BryanDavis:

[wikimedia/developer-portal@main] macros/__init__.py: Implicitly set encoding

Reason:

Obsoleted by Ida0769ab058d126d7c1934ca0b80629b0f4fcfa4

https://gerrit.wikimedia.org/r/805857

Change 808969 merged by jenkins-bot:

[wikimedia/developer-portal@main] build: force utf-8 encoding for python

https://gerrit.wikimedia.org/r/808969

Sorry to steal this from you in the end @TheresNoTime, but thank you for the bug report and the first stab at a fix.