Endovascular procedures are among the most challenging manual tasks in medicine. A surgeon must guide a thin wire through a tortuous vascular network using only 2D X-ray feedback, with millimetre precision and sub-second reaction times. Could a machine learn to do this?

The challenge

Reinforcement learning has produced strong results in games, locomotion, and manipulation. But endovascular navigation presents unique constraints:

Safety — a single wrong move can perforate a vessel wall
Partial observability — 2D projections lose depth information
Sparse rewards — the goal is reached only after navigating the entire tree
Continuous control — guidewire motion is smooth, not discrete

Our approach

We framed the problem as a continuous-control reinforcement learning task with shaped rewards based on proximity to the target vessel. The agent observes:

Current and past X-ray frames
Guidewire tip position (segmented automatically)
Target vessel centreline (extracted from preoperative CT)

The action space has three degrees of freedom (3-DOF): tip advancement, rotation, and articulation.
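To make the setup concrete, here is a minimal Python sketch of a 3-DOF action and a proximity-based shaped reward. It is an illustration under stated assumptions, not the CathSim API: the names `Action` and `shaped_reward`, the normalised action range, the goal radius, and the bonus value are all hypothetical, and the target is simplified to a single point rather than a full centreline.

```python
import math
from dataclasses import dataclass


def _clamp(value: float, limit: float) -> float:
    """Restrict value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))


@dataclass(frozen=True)
class Action:
    """Hypothetical 3-DOF command; units and ranges are assumptions."""
    advance: float     # tip advancement
    rotate: float      # rotation about the wire axis
    articulate: float  # tip articulation

    def clipped(self, limit: float = 1.0) -> "Action":
        """Return a copy with every component clamped to [-limit, limit]."""
        return Action(
            _clamp(self.advance, limit),
            _clamp(self.rotate, limit),
            _clamp(self.articulate, limit),
        )


def shaped_reward(prev_tip, tip, target, goal_radius=2.0, goal_bonus=10.0):
    """Dense reward: progress toward the target since the previous step,
    plus a one-off bonus once the tip is within goal_radius of the target.

    Positions are coordinate tuples in the same (assumed) units.
    """
    d_prev = math.dist(prev_tip, target)
    d_now = math.dist(tip, target)
    reward = d_prev - d_now  # positive when the tip moved closer
    if d_now <= goal_radius:
        reward += goal_bonus
    return reward
```

Rewarding the change in distance rather than the raw distance gives the agent a signal at every step while keeping the total reward tied to actual progress: moving 3 units closer yields +3, and retreating by the same amount yields -3.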

Architecture choices

We found that recurrent policies outperformed feed-forward baselines by a significant margin. A single 2D projection carries no depth information, but the temporal context of past frames helps the agent infer depth from motion parallax, a cue human surgeons use implicitly.
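The difference between a recurrent and a feed-forward policy can be sketched in a few lines of plain Python. The class below is a toy Elman-style cell with random weights, written only to show how a hidden state carries information across frames. It is not the architecture used in this work; the name `TinyRecurrentPolicy`, the layer sizes, and the weight initialisation are assumptions for illustration.

```python
import math
import random


class TinyRecurrentPolicy:
    """Toy Elman-style policy: h' = tanh(W_x x + W_h h), a = tanh(W_a h')."""

    def __init__(self, obs_dim, hidden_dim, action_dim=3, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a trained policy would learn these.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_x = init(hidden_dim, obs_dim)     # observation -> hidden
        self.w_h = init(hidden_dim, hidden_dim)  # hidden -> hidden (memory)
        self.w_a = init(action_dim, hidden_dim)  # hidden -> action
        self.hidden = [0.0] * hidden_dim

    def reset(self):
        """Clear the memory at the start of an episode."""
        self.hidden = [0.0] * len(self.hidden)

    def step(self, obs):
        """Update the hidden state from obs and return an action in [-1, 1]^3."""
        pre = [
            sum(w * x for w, x in zip(wx_row, obs))
            + sum(w * h for w, h in zip(wh_row, self.hidden))
            for wx_row, wh_row in zip(self.w_x, self.w_h)
        ]
        self.hidden = [math.tanh(v) for v in pre]
        return [
            math.tanh(sum(w * h for w, h in zip(row, self.hidden)))
            for row in self.w_a
        ]


policy = TinyRecurrentPolicy(obs_dim=2, hidden_dim=4)
first = policy.step([1.0, 0.0])
second = policy.step([1.0, 0.0])  # same observation, different history
```

Because `second` is computed from a non-zero hidden state, it generally differs from `first` even though the observation is identical. A feed-forward policy would return the same action both times, which is why it cannot exploit motion across frames.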

Lessons learned

Reward shaping matters more than architecture — a well-designed dense reward can compensate for a simpler policy
Evaluation is harder than training — we spent more time building the evaluation framework than the agent itself
Clinical relevance requires clinical input — every design decision was validated with interventional radiologists

The full system is open-sourced as CathSim. We welcome contributions and feedback from the community.