Robotic manipulation requires both rich multimodal
perception and effective learning frameworks to handle complex
real-world tasks. See-Through-Skin (STS) sensors, which
combine tactile and visual perception, offer promising sensing
capabilities, while modern imitation learning provides powerful
tools for policy acquisition. However, existing STS designs lack
simultaneous multimodal perception and suffer from unreliable
tactile tracking. Furthermore, integrating these rich multimodal
signals into learning-based manipulation pipelines remains an
open challenge. We introduce TacThru, an STS sensor enabling
simultaneous visual perception and robust tactile signal extraction,
and TacThru-UMI, an imitation learning framework that
leverages these multimodal signals for manipulation. Our sensor
features a fully transparent elastomer, persistent illumination,
novel keyline markers, and efficient tracking, while our learning
system integrates these signals through a Transformer-based
Diffusion Policy. Experiments on five challenging real-world tasks
show that TacThru-UMI achieves an average success rate of
85.5%, significantly outperforming alternating
tactile-visual (66.3%) and vision-only (55.4%) baselines. The system excels
in critical scenarios, including contact detection with thin and
soft objects and precision manipulation requiring multimodal
coordination. This work demonstrates that combining simultaneous
multimodal perception with modern learning frameworks
enables more precise and adaptable robotic manipulation.
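
To make the abstract's description of the learning system concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of how fused visual and tactile observation tokens could condition a Transformer-based diffusion policy. All module names, dimensions, and the fusion scheme are illustrative assumptions.

```python
# Hypothetical sketch: a Transformer-based diffusion-policy denoiser conditioned on
# visual and tactile observation tokens. Names and dimensions are assumptions,
# not the paper's architecture.
import torch
import torch.nn as nn

class MultimodalDiffusionPolicy(nn.Module):
    def __init__(self, obs_dim=512, act_dim=7, n_layers=4, n_heads=8):
        super().__init__()
        # Separate projections map visual and tactile features into a shared token space.
        self.visual_proj = nn.Linear(obs_dim, obs_dim)
        self.tactile_proj = nn.Linear(obs_dim, obs_dim)
        # Diffusion timestep embedding conditions the denoiser on the noise level.
        self.time_embed = nn.Sequential(
            nn.Linear(1, obs_dim), nn.SiLU(), nn.Linear(obs_dim, obs_dim)
        )
        # Noisy action tokens are embedded and denoised by a Transformer decoder
        # that cross-attends to the concatenated multimodal observation tokens.
        self.action_embed = nn.Linear(act_dim, obs_dim)
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=obs_dim, nhead=n_heads, batch_first=True
        )
        self.denoiser = nn.TransformerDecoder(decoder_layer, num_layers=n_layers)
        self.action_head = nn.Linear(obs_dim, act_dim)

    def forward(self, noisy_actions, visual_feat, tactile_feat, timestep):
        # noisy_actions: (B, horizon, act_dim); visual/tactile_feat: (B, T_obs, obs_dim)
        obs_tokens = torch.cat(
            [self.visual_proj(visual_feat), self.tactile_proj(tactile_feat)], dim=1
        )
        t_tok = self.time_embed(timestep.float().view(-1, 1, 1))  # (B, 1, obs_dim)
        act_tokens = self.action_embed(noisy_actions) + t_tok
        denoised = self.denoiser(tgt=act_tokens, memory=obs_tokens)
        return self.action_head(denoised)  # predicted noise for this denoising step

# Usage: one denoising step over a batch of 2 action trajectories.
policy = MultimodalDiffusionPolicy()
noise_pred = policy(
    torch.randn(2, 16, 7),    # noisy 16-step action sequences
    torch.randn(2, 4, 512),   # visual observation tokens
    torch.randn(2, 4, 512),   # tactile observation tokens
    torch.tensor([10, 10]),   # diffusion timesteps
)
print(noise_pred.shape)  # torch.Size([2, 16, 7])
```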