Learning to Navigate Inside the Human Body

Endovascular procedures are among the most challenging manual tasks in medicine. A surgeon must guide a thin wire through a tortuous vascular network using only 2D X-ray feedback, with millimetre precision and sub-second reaction times. Could a machine learn to do this?

The challenge

Reinforcement learning has achieved strong results in games, locomotion, and robotic manipulation. Endovascular navigation, however, combines several constraints that rarely occur together:

  • Safety — a single wrong move can perforate a vessel wall
  • Partial observability — 2D projections lose depth information
  • Sparse rewards — the target is reached only at the end of a long path through the vascular tree, so success signals are rare
  • Continuous control — guidewire motion is smooth, not discrete

Our approach

We framed the problem as a continuous-control reinforcement learning task with shaped rewards based on proximity to the target vessel. The agent observes:

  • Current and past X-ray frames
  • Guidewire tip position (segmented automatically)
  • Target vessel centreline (extracted from preoperative CT)
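To make the observation list concrete, here is a minimal sketch of how the three inputs might be bundled, with a fixed-length buffer holding the "current and past" frames. The `Observation` and `FrameHistory` names, the history length `k`, and the array shapes are hypothetical illustrations, not CathSim's actual interface.

```python
from collections import deque
from dataclasses import dataclass

import numpy as np


@dataclass
class Observation:
    """Hypothetical container for one observation; field names are illustrative."""
    frames: np.ndarray      # k most recent X-ray frames, shape (k, H, W), oldest first
    tip_xy: np.ndarray      # segmented guidewire tip in image pixels, shape (2,)
    centreline: np.ndarray  # target vessel centreline points from CT, shape (n, 3)


class FrameHistory:
    """Keeps only the k most recent frames; appending beyond k drops the oldest."""

    def __init__(self, k: int, height: int, width: int):
        # Start with blank frames so the stacked output always has shape (k, H, W).
        blank = np.zeros((height, width), dtype=np.float32)
        self._buf = deque([blank.copy() for _ in range(k)], maxlen=k)

    def push(self, frame: np.ndarray) -> None:
        self._buf.append(np.asarray(frame, dtype=np.float32))

    def stacked(self) -> np.ndarray:
        return np.stack(list(self._buf), axis=0)


# Usage: push six synthetic frames into a 4-frame history.
history = FrameHistory(k=4, height=64, width=64)
for t in range(6):
    history.push(np.full((64, 64), t))
obs = Observation(
    frames=history.stacked(),
    tip_xy=np.array([32.0, 10.0]),
    centreline=np.zeros((100, 3)),
)
print(obs.frames.shape)     # (4, 64, 64)
print(obs.frames[:, 0, 0])  # [2. 3. 4. 5.]
```

A fixed-length buffer gives a feed-forward network a short, fixed window of history; a recurrent policy can instead carry history in its hidden state.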

The action space is 3-DOF: tip advancement, rotation, and articulation.
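One common way to realise a continuous 3-DOF action space is to let the policy output values in [-1, 1] and map them linearly to bounded per-step commands. The limits `ACTION_LOW`/`ACTION_HIGH`, their units, and the function name `scale_action` below are illustrative assumptions; real bounds would depend on the device and the simulator.

```python
import numpy as np

# Hypothetical per-step limits for (advance [mm], rotation [rad], articulation [rad]).
# These values are illustrative only.
ACTION_LOW = np.array([-2.0, -0.2, -0.3])
ACTION_HIGH = np.array([2.0, 0.2, 0.3])


def scale_action(raw) -> np.ndarray:
    """Map a policy output in [-1, 1]^3 to physical per-step commands.

    Values outside [-1, 1] are clipped first, so no single step can exceed
    the configured limits.
    """
    raw = np.clip(np.asarray(raw, dtype=float), -1.0, 1.0)
    # Linear map from [-1, 1] to [low, high], independently on each axis.
    return ACTION_LOW + (raw + 1.0) * 0.5 * (ACTION_HIGH - ACTION_LOW)


# The out-of-range rotation request (-3.0) is clipped to the lower limit.
cmd = scale_action([0.5, -3.0, 0.0])
print(cmd)
```

Clipping before scaling is a simple way to cap how far the wire can move in one step, which ties the action space to the safety constraint listed earlier.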

Architecture choices

We found that recurrent policies outperformed feed-forward baselines by a significant margin. The temporal context of past frames helps the agent infer depth from motion parallax — a cue human surgeons use implicitly.
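The sketch below shows what "recurrent" means in practice: a GRU cell carries a hidden state across steps, so the action at time t depends on earlier frames as well as the current one. It assumes each frame has already been encoded into a feature vector (the encoder is omitted), leaves out bias terms, and uses random, untrained weights. `GRUCell` and `RecurrentPolicy` are hypothetical names, not the architecture used in CathSim.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Bias-free GRU cell with random (untrained) weights."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        shape = (hidden_dim, input_dim + hidden_dim)
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W_z = rng.uniform(-s, s, shape)  # update gate
        self.W_r = rng.uniform(-s, s, shape)  # reset gate
        self.W_h = rng.uniform(-s, s, shape)  # candidate state
        self.hidden_dim = hidden_dim

    def step(self, x: np.ndarray, h: np.ndarray) -> np.ndarray:
        xh = np.concatenate([x, h])
        z = sigmoid(self.W_z @ xh)
        r = sigmoid(self.W_r @ xh)
        # The reset gate controls how much of the previous state feeds the candidate.
        h_cand = np.tanh(self.W_h @ np.concatenate([x, r * h]))
        # Interpolate between the old state and the candidate.
        return (1.0 - z) * h + z * h_cand


class RecurrentPolicy:
    """Per-frame features -> GRU -> 3-DOF action in [-1, 1]."""

    def __init__(self, feat_dim: int, hidden_dim: int = 32, seed: int = 0):
        self.gru = GRUCell(feat_dim, hidden_dim, seed)
        rng = np.random.default_rng(seed + 1)
        self.W_out = rng.normal(0.0, 0.1, (3, hidden_dim))
        self.reset()

    def reset(self) -> None:
        # Clear the memory at the start of each episode.
        self.h = np.zeros(self.gru.hidden_dim)

    def act(self, features: np.ndarray) -> np.ndarray:
        self.h = self.gru.step(features, self.h)
        return np.tanh(self.W_out @ self.h)


# Feed five frames' features, then compare with seeing only the last frame.
rng = np.random.default_rng(42)
seq = rng.normal(size=(5, 8))
policy = RecurrentPolicy(feat_dim=8)
for f in seq:
    a_with_history = policy.act(f)
policy.reset()
a_without_history = policy.act(seq[-1])
# Same final input, different actions: the hidden state encodes the earlier frames.
print(a_with_history, a_without_history)
```

A feed-forward policy shown only `seq[-1]` would produce the same action every time it saw that frame; the recurrent policy's output also reflects what came before, which is the kind of temporal context that motion-parallax cues require.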

Lessons learned

  1. Reward shaping matters more than architecture — a well-designed dense reward can compensate for a simpler policy
  2. Evaluation is harder than training — we spent more time building the evaluation framework than the agent itself
  3. Clinical relevance requires clinical input — every design decision was validated with interventional radiologists
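The first lesson can be made concrete with a dense reward built from proximity to the target. The sketch below uses potential-based shaping, a standard technique that adds gamma * Phi(s') - Phi(s) to a sparse reward for some potential Phi. Here Phi is the negative remaining arc length along the target centreline, because straight-line distance can be misleading in tortuous vessels. This is an assumed illustration, not the reward used in CathSim: it further assumes a 3D tip position from the simulator (not recoverable from a single X-ray), a centreline ordered from insertion point to target, and hypothetical values for `goal_radius` and `goal_bonus`.

```python
import numpy as np


def remaining_path_length(tip: np.ndarray, centreline: np.ndarray) -> float:
    """Arc length from the centreline point nearest the tip to the target.

    Assumes `centreline` is an (n, 3) array ordered from the insertion point
    to the target, which is its last point.
    """
    idx = int(np.argmin(np.linalg.norm(centreline - tip, axis=1)))
    segments = np.diff(centreline[idx:], axis=0)
    return float(np.linalg.norm(segments, axis=1).sum())


def shaped_reward(tip_prev, tip_curr, centreline, gamma=0.99,
                  goal_radius=2.0, goal_bonus=10.0):
    """Sparse goal bonus plus the potential-based term gamma * Phi(s') - Phi(s).

    The potential is Phi(s) = -remaining path length, so the shaping term is
    positive when the tip advances towards the target. Returns (reward, done).
    """
    d_prev = remaining_path_length(tip_prev, centreline)
    d_curr = remaining_path_length(tip_curr, centreline)
    shaping = gamma * (-d_curr) - (-d_prev)
    done = d_curr <= goal_radius
    return (goal_bonus if done else 0.0) + shaping, done


# A straight 100 mm vessel along x, sampled every 1 mm.
xs = np.linspace(0.0, 100.0, 101)
centreline = np.stack([xs, np.zeros_like(xs), np.zeros_like(xs)], axis=1)

r_closer, done_closer = shaped_reward(np.array([10.0, 0.5, 0.0]), np.array([12.0, 0.5, 0.0]), centreline)
r_away, _ = shaped_reward(np.array([12.0, 0.5, 0.0]), np.array([10.0, 0.5, 0.0]), centreline)
r_goal, done_goal = shaped_reward(np.array([95.0, 0.0, 0.0]), np.array([100.0, 0.0, 0.0]), centreline)
print(r_closer, r_away, r_goal, done_goal)
```

With gamma < 1, holding still far from the target earns a small positive term (1 - gamma) * d; when gamma matches the agent's discount factor, potential-based shaping leaves the optimal policy unchanged in theory while giving the agent feedback at every step.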

The full system is open-sourced as CathSim. We welcome contributions and feedback from the community.

© 2026 Tudor Jianu