[Error fix] Running MATCHA requires downloading the Arial.TTF font online, but huggingface cannot be reached
1. Error details
requests.exceptions.ConnectTimeout:(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443):
Max retries exceeded with url: /ybelkada/fonts/resolve/main/Arial.TTF (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f5295722ce0>,
'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: a5b5b41d-c258-46b6-8e40-0200bc4cb62b)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/MATCHA/workdir/matcha_test.py", line 11, in <module>
    inputs = processor(images=image, text="Is the sum of all 4 places greater than Laos?", return_tensors="pt")
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/processing_pix2struct.py", line 109, in __call__
    encoding_image_processor = self.image_processor(
  File "/miniconda3/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 552, in __call__
    return self.preprocess(images, **kwargs)
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 437, in preprocess
    images = [
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 438, in <listcomp>
    render_header(image, header_text[i], font_bytes=font_bytes, font_path=font_path)
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 169, in render_header
    header_image = render_text(header, **kwargs)
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 128, in render_text
    font = hf_hub_download(DEFAULT_FONT_PATH, "Arial.TTF")
  File "/miniconda3/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "/miniconda3/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/miniconda3/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/miniconda3/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1347, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/miniconda3/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1857, in _raise_on_head_call_error
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
2. Error analysis
At runtime the code needs to download "/ybelkada/fonts/resolve/main/Arial.TTF" from huggingface. I am running the project on a server that cannot connect to huggingface, so the request times out and the error is raised.
The code that actually triggers the error is:
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 128, in render_text
    font = hf_hub_download(DEFAULT_FONT_PATH, "Arial.TTF")
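3. Solution
Opening the file where the error is raised (image_processing_pix2struct.py) shows the following logic:
    if font_bytes is not None and font_path is None:
        font = io.BytesIO(font_bytes)
    elif font_path is not None:
        font = font_path
    else:
        font = hf_hub_download(DEFAULT_FONT_PATH, "Arial.TTF")
    font = ImageFont.truetype(font, encoding="UTF-8", size=text_size)
The download branch is reached only when font_path is None (and font_bytes is None as well), so the root cause is that no font_path is being passed in.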
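The branch order above can be reproduced in isolation to see which inputs end up in the download branch. This is only a sketch: pick_font_source is a hypothetical helper written for illustration, not part of transformers, and the font path used in the calls is a placeholder.

```python
import io


def pick_font_source(font_bytes=None, font_path=None):
    """Hypothetical helper that mirrors the branch order quoted above.

    Returns a (label, value) pair describing which font source would be used.
    """
    if font_bytes is not None and font_path is None:
        return "bytes", io.BytesIO(font_bytes)
    elif font_path is not None:
        return "path", font_path
    else:
        # In the library code this is where hf_hub_download is called,
        # which needs network access to huggingface.
        return "download", None


print(pick_font_source()[0])                                   # download
print(pick_font_source(font_path="/data/fonts/Arial.ttf")[0])  # path
print(pick_font_source(font_bytes=b"\x00\x01\x00\x00")[0])     # bytes
```

Supplying either a font_path or font_bytes is therefore enough to keep the code away from the network.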
Tracing back up the call stack layer by layer, I found where font_path is assigned:
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 438, in <listcomp>
    render_header(image, header_text[i], font_bytes=font_bytes, font_path=font_path)
    font_path = kwargs.pop("font_path", None)
    if isinstance(header_text, str):
        header_text = [header_text] * len(images)
    images = [
        render_header(image, header_text[i], font_bytes=font_bytes, font_path=font_path)
        for i, image in enumerate(images)
    ]
However, printing kwargs shows that it is an empty dict, so editing config.json does not get a font_path argument passed in. In the end I modified the code in place. Arial.ttf has to be downloaded from huggingface on a machine that can reach it and then uploaded to the server.
    font_path = kwargs.pop("font_path", None)
    if font_path is None:
        # Fall back to the local copy of Arial.ttf uploaded to the server
        font_path = "YOUR_Arial.ttf_PATH"
    if isinstance(header_text, str):
        header_text = [header_text] * len(images)
    images = [
        render_header(image, header_text[i], font_bytes=font_bytes, font_path=font_path)
        for i, image in enumerate(images)
    ]
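If editing files under site-packages is undesirable, the same fallback could in principle be applied from the calling script by wrapping the render function. The sketch below is an assumption-laden illustration, not a tested fix: with_local_font and fake_render_header are hypothetical names, and the wrapper is demonstrated on a stand-in function rather than on the real library. It assumes render_header is looked up as a module-level name in image_processing_pix2struct, which matches the traceback but has not been verified against any particular transformers version.

```python
import functools
import os


def with_local_font(render_fn, local_font_path):
    """Wrap render_fn so that a missing font falls back to a local file.

    Hypothetical helper: it fails early if the font file is absent, so a bad
    path is reported here rather than deep inside the image processor.
    """
    if not os.path.isfile(local_font_path):
        raise FileNotFoundError(f"Font file not found: {local_font_path}")

    @functools.wraps(render_fn)
    def wrapper(*args, **kwargs):
        # Only inject the path when the caller supplied neither font source.
        if kwargs.get("font_path") is None and kwargs.get("font_bytes") is None:
            kwargs["font_path"] = local_font_path
        return render_fn(*args, **kwargs)

    return wrapper


def fake_render_header(image, header, font_bytes=None, font_path=None):
    """Stand-in for the library's render_header; it just reports the font path."""
    return font_path


# Assumed usage against the real module (not executed here):
# import transformers.models.pix2struct.image_processing_pix2struct as p2s
# p2s.render_header = with_local_font(p2s.render_header, "YOUR_Arial.ttf_PATH")
```

The wrapper leaves explicitly passed font_path or font_bytes values untouched, so it changes behavior only in the case that previously triggered the download.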