CUDA Errors with YOLOv5 Object Detection on CPAI 2.5.x – Seeking Insights
Posted: Sun Feb 18, 2024 10:32 am
I'm reaching out to share a perplexing issue I've encountered with the integration CPAI with my BI setup, hoping to find if anyone else has experienced something similar or could offer any insights. The problem first manifested around 01:45 am on 14/02/2024, and despite troubleshooting efforts, it recurred this morning, indicating a persistent underlying issue.
Initially, the system logs from 14th February showed an error related to CUDA, specifically mentioning "an illegal memory access was encountered". This issue caused a loop of errors until a system reboot was performed at 9:06 am.
Here is the exact log entry for reference:
I upgraded yesterday from CPAI v2.5.1 to v2.5.4, hoping the update would resolve the issue which took place on the 14th. However, this morning, the same CUDA error reappeared, this time indicating "an illegal instruction was encountered". The error persisted until a reboot was done just after 9 am. Below is the log excerpt from today's occurrence:
The repeating nature of this error is particularly concerning as it undermines the reliability of object detection capabilities, which are crucial for the functionality I rely on. It's disconcerting to see the system fail in such a manner, especially considering the otherwise commendable performance improvements in the AI aspects of the software.
Has anyone else encountered similar issues, particularly with CUDA errors causing system instability? Any advice on troubleshooting or resolving this would be immensely appreciated. I'm also posting this query on the CodeProject site to cast a wider net for potential solutions.
Thank you in advance for your time and assistance.
Initially, the system logs from 14th February showed an error related to CUDA, specifically mentioning "an illegal memory access was encountered". This issue caused a loop of errors until a system reboot was performed at 9:06 am.
Here is the exact log entry for reference:
Code: Select all
2024-02-14 01:39:57: Object Detection (YOLOv5 6.2): Retrieved objectdetection_queue command 'custom' in Object Detection (YOLOv5 6.2)
2024-02-14 01:39:57: Object Detection (YOLOv5 6.2): Detecting using ipcam-combined in Object Detection (YOLOv5 6.2)
2024-02-14 01:39:57: Response received (#reqid 85bde494-89d3-429d-a21b-c10b9430c5a8 for command custom)
2024-02-14 01:39:57: Object Detection (YOLOv5 6.2): [RuntimeError] : Traceback (most recent call last):
File "C:\Program Files\CodeProject\AI\modules\ObjectDetectionYOLOv5-6.2\detect.py", line 141, in do_detection
det = detector(img, size=640)
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\models\common.py", line 669, in forward
with dt[0]:
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\utils\general.py", line 158, in __enter__
self.start = self.time()
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\utils\general.py", line 167, in time
torch.cuda.synchronize()
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\torch\cuda\__init__.py", line 566, in synchronize
return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
in Object Detection (YOLOv5 6.2)
I upgraded yesterday from CPAI v2.5.1 to v2.5.4, hoping the update would resolve the issue which took place on the 14th. However, this morning, the same CUDA error reappeared, this time indicating "an illegal instruction was encountered". The error persisted until a reboot was done just after 9 am. Below is the log excerpt from today's occurrence:
Code: Select all
2024-02-18 04:52:40: Object Detection (YOLOv5 6.2): Detecting using ipcam-combined in Object Detection (YOLOv5 6.2)
2024-02-18 04:52:40: Response rec'd from Object Detection (YOLOv5 6.2) command 'custom' (#reqid ad29496b-5caf-4d8b-b02f-7fcc6c7ab605) ['No objects found'] took 22ms
2024-02-18 04:52:40: Client request 'custom' in queue 'objectdetection_queue' (#reqid 7e1b52dd-b880-4c35-b7c6-0f076127faab)
2024-02-18 04:52:40: Request 'custom' dequeued from 'objectdetection_queue' (#reqid 7e1b52dd-b880-4c35-b7c6-0f076127faab)
2024-02-18 04:52:40: Object Detection (YOLOv5 6.2): Retrieved objectdetection_queue command 'custom' in Object Detection (YOLOv5 6.2)
2024-02-18 04:52:40: Object Detection (YOLOv5 6.2): Detecting using ipcam-combined in Object Detection (YOLOv5 6.2)
2024-02-18 04:52:40: Response rec'd from Object Detection (YOLOv5 6.2) command 'custom' (#reqid 7e1b52dd-b880-4c35-b7c6-0f076127faab)
2024-02-18 04:52:40: Object Detection (YOLOv5 6.2): [RuntimeError] : Traceback (most recent call last):
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\models\common.py", line 715, in forward
max_det=self.max_det) # NMS
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\utils\general.py", line 920, in non_max_suppression
x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\CodeProject\AI\modules\ObjectDetectionYOLOv5-6.2\detect.py", line 141, in do_detection
det = detector(img, size=640)
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\models\common.py", line 717, in forward
scale_boxes(shape1, y[i][:, :4], shape0[i])
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\utils\general.py", line 162, in __exit__
self.dt = self.time() - self.start # delta-time
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\utils\general.py", line 167, in time
torch.cuda.synchronize()
File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\torch\cuda\__init__.py", line 566, in synchronize
return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
in Object Detection (YOLOv5 6.2)
Has anyone else encountered similar issues, particularly with CUDA errors causing system instability? Any advice on troubleshooting or resolving this would be immensely appreciated. I'm also posting this query on the CodeProject site to cast a wider net for potential solutions.
Thank you in advance for your time and assistance.