My research interest lies in robot learning, and my long-term goal is
to build a generalist robot that can accomplish long-horizon, complex
tasks. More specifically, I work on vision-language-action models,
world models, representation learning, and legged locomotion.
I am interning at Meta FAIR in the summer of 2025, working on world models and vision-language-action models.
We propose a data-driven framework for fine-tuning locomotion policies that targets objectives which are hard to simulate. Our framework leverages real-world data to model these objectives and incorporates the learned model into simulation for policy improvement.
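A minimal sketch of the idea, assuming the hard-to-simulate objective (e.g., motor energy or foot slippage) is measured on the real robot and regressed from state-action pairs; the class names, shapes, and training loop below are illustrative, not the framework's actual implementation:

```python
import torch
import torch.nn as nn

class ObjectiveModel(nn.Module):
    """MLP regressor for a hard-to-simulate objective measured on the real robot."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

def fit_objective_model(model, obs, act, target, epochs=200, lr=1e-3):
    """Supervised regression on real-world logs: (obs, act) -> measured objective."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(model(obs, act), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# During simulated fine-tuning, the learned model supplies a reward term that
# the simulator itself cannot compute:
#   reward = task_reward(obs, act) - w * objective_model(obs, act)
```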
We propose a generalizable world model pre-trained on a large-scale, diverse video dataset using latent actions extracted by a VQ-VAE, and then fine-tuned on robot data to obtain an accurate dynamics function.
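To make the latent-action idea concrete, here is a hedged PyTorch sketch of VQ-VAE-style quantization over consecutive frame features; the encoder, feature dimensions, and codebook size are assumptions for illustration:

```python
import torch
import torch.nn as nn

class LatentActionVQ(nn.Module):
    """Sketch: encode a (frame_t, frame_t+1) feature pair into a discrete
    latent action via nearest-neighbor codebook lookup (VQ-VAE style)."""
    def __init__(self, feat_dim: int = 512, code_dim: int = 32, num_codes: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(  # stand-in for a real video encoder
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, code_dim))
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, feat_t, feat_t1):
        z = self.encoder(torch.cat([feat_t, feat_t1], dim=-1))  # continuous latent
        dist = torch.cdist(z, self.codebook.weight)             # distance to each code
        idx = dist.argmin(dim=-1)                               # discrete latent action
        z_q = self.codebook(idx)
        z_q = z + (z_q - z).detach()                            # straight-through gradient
        return z_q, idx

# The world model is then trained to predict frame_t+1 from (frame_t, z_q);
# fine-tuning on robot data grounds the latent actions in real control.
```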
We present Semantic-Geometric Representation (SGR), a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning.
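The core fusion step can be sketched as follows, assuming dense per-pixel features from a frozen pre-trained 2D backbone and pixel coordinates for each back-projected depth point; the function and shapes are illustrative rather than SGR's exact pipeline:

```python
import torch

def fuse_semantic_geometric(points_xyz, pixel_feats, uv):
    """Attach 2D semantic features to 3D points back-projected from depth.

    points_xyz:  (N, 3)  point cloud from the depth image
    pixel_feats: (H, W, C) dense features from a pre-trained 2D model
    uv:          (N, 2)  long tensor of pixel coordinates per point
    """
    sem = pixel_feats[uv[:, 1], uv[:, 0]]        # (N, C) per-point semantics
    return torch.cat([points_xyz, sem], dim=-1)  # (N, 3 + C) fused representation

# The fused per-point features are then consumed by a 3D point-cloud encoder,
# giving downstream manipulation policies both semantics and geometry.
```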
Research projects
A real-time vision-language manipulation system
(1) Built a vision-language manipulation deployment system that accomplishes a variety of real-world tasks on a Franka Panda robot.
(2) Employed a robust client-server architecture to enable low-latency remote control (sketched after this list).
(3) Fine-tuned a vision-language manipulation model (CLIPort), achieving a 90% success rate in real-world experiments.
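A minimal sketch of the client-server control loop, assuming ZeroMQ for transport; the port, hostname, and the `policy`/`robot` interfaces are placeholders, not the system's actual API:

```python
import pickle
import zmq

# --- Server (GPU workstation): runs policy inference ---
ctx = zmq.Context()
server = ctx.socket(zmq.REP)
server.bind("tcp://*:5555")                   # port is illustrative

while True:
    obs = pickle.loads(server.recv())         # image + proprioception from the robot
    action = policy.predict(obs)              # `policy` is a placeholder for the model
    server.send(pickle.dumps(action))

# --- Client (robot side): streams observations at control rate ---
client = ctx.socket(zmq.REQ)
client.connect("tcp://gpu-server:5555")       # hostname is illustrative

while True:
    client.send(pickle.dumps(robot.get_observation()))  # `robot` is a placeholder
    robot.apply_action(pickle.loads(client.recv()))
```

The request-reply pattern keeps the robot's control loop simple: each observation blocks on exactly one action, so stale commands are never executed.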
Legged robots learn from the physical world
(1) Established a sim2real locomotion deployment codebase for legged robots.
(2) Developed a real-world learning framework that acquires data from real-world interactions and performs policy training on a remote server (see the sketch below).
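A hedged sketch of the collect-on-robot / train-on-server cycle; `robot`, `policy`, the batch size, and the trainer address are all illustrative placeholders:

```python
import pickle
import zmq

def real_world_learning_loop(robot, policy, trainer_addr="tcp://trainer:6000"):
    """Alternate between on-robot data collection and remote policy updates."""
    sock = zmq.Context().socket(zmq.REQ)
    sock.connect(trainer_addr)
    while True:
        # 1. Roll out the current policy on the real robot to gather transitions.
        batch = [robot.step(policy.act(robot.observe())) for _ in range(1000)]
        # 2. Ship the data to the remote trainer; block until it returns
        #    updated policy weights, then load them for the next rollout.
        sock.send(pickle.dumps(batch))
        policy.load_state_dict(pickle.loads(sock.recv()))
```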
Visual planning for legged robots
(1) Leveraged the visual understanding and high-level planning abilities of large multimodal models (e.g., GPT-4o), enabling legged robots to accomplish multi-step, complex tasks such as delivering packages and retrieving objects (see the sketch below).
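One way such a planner can be wired up, as a sketch: prompt a multimodal model with the robot's camera view and a fixed skill vocabulary, then parse its reply into skill calls. The skill names and prompt format are assumptions for illustration, not the project's actual interface:

```python
import base64
from openai import OpenAI

SKILLS = {"walk_to", "pick_up", "place"}   # low-level skills are placeholders

def plan_from_image(image_path: str, instruction: str) -> list[str]:
    """Ask a multimodal model to decompose an instruction into skill calls."""
    b64 = base64.b64encode(open(image_path, "rb").read()).decode()
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Task: {instruction}\n"
                         f"Available skills: {sorted(SKILLS)}\n"
                         "Reply with one skill call per line, e.g. walk_to(door)."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip().splitlines()

# Each returned line is dispatched to the corresponding low-level locomotion
# or manipulation skill, which handles execution on the robot.
```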