My research interest lies in robot learning and my long-term goal is
to build a generalist robot that can accomplish long-horizon and complex
tasks. To be more specific, I work on vision-language-action models,
world models, representation learning, and legged locomotion.
Email: hanchen.cui147[at]gmail.com
I am actively seeking internship opportunities for Summer 2025.
If you're interested in collaborating, feel free to reach out!
We propose a data-driven framework for fine-tuning locomotion policies, targeting these hard-to-simulate objectives. Our framework leverages real-world data to model these objectives and incorporates the learned model into simulation for policy improvement.
We propose a generalizable world model pre-trained on a large-scale and diverse video dataset, which is then fine-tuned to obtain an accurate dynamics function in a sample-efficient manner.
We present Semantic-Geometric Representation (SGR), a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning.
Research projects
A high-speed vision-language intelligent manipulation system
(1) Build a high-speed vision-language manipulation deployment system that accomplishes a variety of tasks on a Franka Panda robot.
(2) Enable robust remote control with low latency.
(3) Fine-tune the vision-language manipulation model (CLIPort), achieving a 90% success rate in real-world experiments.
Legged robots learning from the physical world
(1) Establish a sim2real locomotion deployment codebase for legged robots.
(2) Build a real-world learning framework that collects data in the real world and trains the policy
model on a remote server.
(3) Design and deploy a simple yet effective reward function, where all the reward terms are acquired
from the robot itself without extra sensors.
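A proprioception-only reward of this kind can be sketched as follows; this is a minimal illustration, not the actual reward used in the project, and all term names and weights are hypothetical:

```python
import numpy as np

def proprioceptive_reward(base_lin_vel, cmd_vel, joint_torques,
                          joint_vel, prev_joint_vel, dt):
    """Reward computed purely from onboard proprioception (IMU + joint
    encoders), with no external sensors such as motion capture.

    All terms and weights below are hypothetical illustrations.
    """
    # Track the commanded velocity, estimated from the onboard IMU.
    tracking = np.exp(-np.sum((cmd_vel - base_lin_vel) ** 2))
    # Penalize energy use, measured via joint torques from motor current.
    energy = -1e-4 * np.sum(joint_torques ** 2)
    # Penalize jerky motion via joint accelerations from encoder velocities.
    smoothness = -1e-5 * np.sum(((joint_vel - prev_joint_vel) / dt) ** 2)
    return tracking + energy + smoothness
```

Because every term comes from the robot's own IMU and joint encoders, the same reward can be evaluated identically in simulation and on hardware, which is what makes real-world fine-tuning practical.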
Visual planning for legged robots
(1) Leverage the strong image understanding and high-level planning abilities of large multimodal
models (such as GPT-4 Vision), enabling legged robots to accomplish multi-step, complex tasks,
such as delivering packages and going out to find an object.
Foundation policy model for robotic manipulation (ongoing)
(1) Build a foundation policy model from large-scale expert datasets such as Open X-Embodiment.
(2) Extract and label action sequences using large vision-language models.
(3) Train and run inference on action-aware sequence data instead of the standard MDP formulation.