Deep Learning-Based Photoacoustic Visual Servoing: Using Outputs from Raw Sensor Data as Inputs to a Robot Controller
Tool tip visualization is an essential component of multiple robotic surgical and interventional procedures. In this paper, we introduce a real-time photoacoustic visual servoing system that processes information directly from raw acoustic sensor data, without requiring image formation or segmentation, to make robot path planning decisions that track and maintain visualization of tool tips. The performance of this novel deep learning-based visual servoing system is compared to that of a visual servoing system that relies on image formation followed by segmentation to make and execute robot path planning decisions. Experiments were conducted with a plastisol phantom and ex vivo tissue, using a needle as the interventional tool. Needle tip tracking with the deep learning-based approach outperformed the image-based segmentation approach by 67.7% and 55.3% in phantom and ex vivo tissue, respectively. In addition, the deep learning-based system operated within the frame rate limit imposed by the 10 Hz laser pulse repetition frequency (i.e., within the 100 ms between pulses), with mean execution times of 75.2 ms and 73.9 ms per acquisition frame with phantom and ex vivo tissue, respectively. These results highlight the benefits of our new approach to integrate deep learning with robotic systems for improved automation and visual servoing of tool tips.
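The real-time claim above rests on simple arithmetic: a 10 Hz pulse repetition frequency leaves 100 ms per acquisition frame, and the reported mean execution times fit inside that window. The following is a minimal sketch of that check, using only the figures quoted in the abstract. The helper name `headroom_ms` and the dictionary layout are illustrative assumptions, not part of the described system, and because the reported values are means, the sketch says nothing about worst-case frames.

```python
# Frame budget implied by the laser pulse repetition frequency (from the abstract).
PULSE_REPETITION_HZ = 10.0
frame_budget_ms = 1000.0 / PULSE_REPETITION_HZ  # time between pulses, in ms

# Mean per-frame execution times reported in the abstract.
mean_execution_ms = {"phantom": 75.2, "ex vivo tissue": 73.9}


def headroom_ms(execution_ms, budget_ms=frame_budget_ms):
    """Time left in the frame budget after processing (hypothetical helper)."""
    return budget_ms - execution_ms


for medium, t in mean_execution_ms.items():
    share = 100.0 * t / frame_budget_ms
    print(f"{medium}: {t:.1f} ms used, "
          f"{headroom_ms(t):.1f} ms headroom, {share:.1f}% of budget")
```

Under these figures, roughly a quarter of each frame period remains on average for the remaining per-frame work, so the laser, rather than the processing, sets the frame rate.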