About Me
🔥 News
2026.3: 🎉🎉 Intern-S1-Pro technical report is released.
2026.2: 🎉🎉 InternAgent-1.5 technical report is released.
2025.12: 🎉🎉 SciEvalKit (An Open-source Evaluation Toolkit for Scientific General Intelligence) is released.
2025.10: 🎉🎉 Codes of InternAgent 1.0 is released on github.
2025.7: 🎉🎉 SPOT is accepted by IEEE T-PAMI 2025.
2025.6: 🎉🎉 Two papers (Lumina Image 2.0 and Chimera) are accepted by ICCV 2025. One is about text-to-image generation, the other is about multimodal reasoning.
2025.5: 🎉🎉 Two papers (SurveyForge and Dolphin) are accepted by ACL 2025. Both are introduced for accelerate scientific research.
2025.5: 🎉🎉 We release InternAgent (NovelSeek), a unified closed-loop multi-agent framework for Automatic Scientific Research.
2025.2: 🎉🎉 One paper (CST-Stereo) is accepted by CVPR 2025.
2024.12: 🎉🎉 One paper (GeoX) is accepted by ICLR 2025.
2024.12: 🎉🎉 One paper (AIOStereo) is accepted by AAAI 2025.
2024.10: 🎉🎉 I receive the national scholarship.
2024.09: 🎉🎉 Two papers (AdaptiveDiffusion and 3DET-Mamba) are accepted by NeurIPS 2024.
2024.07: 🎉🎉 One paper (Reg-TTA3D) is accepted by ECCV 2024.
2024.01: 🎉🎉 One paper (ReSimAD) is accepted by ICLR 2024.
2023.09: 🎉🎉 One Paper (AD-PT) is accepted by NeurIPS 2023.
2023.02: 🎉🎉 Two Papers (Bi3D and Uni3D) are accepted by CVPR 2023.
2022.07: 🎉🎉 One Paper (HelixFormer) is accepted by ACM'MM 2022.
📝 Publications · Preprints

Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng*, Mingsheng Li*, Jiakang Yuan, Hongbin Zhou, Renqiu Xia, Renrui Zhang, Lei Bai, Song Mao, Bin Wang, Aojun Zhou, Botian Shi, Tao Chen, Bo Zhang, Xiangyu Yue
- a scalable and low-cost multi-modal pipeline designed to boost the ability of existing LMMs with domain-specific experts.

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao
- State-of-the-art text-to-image generation model.

Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback
Jiakang Yuan, Xiangchao Yan, Shiyang Feng, Bo Zhang, Tao Chen, Botian Shi, Wanli Ouyang, Yu Qiao, Lei Bai, Bowen Zhou
- Propose Dolphin, a closed-loop auto-research framework.

Xiangchao Yan*, Shiyang Feng*, Jiakang Yuan, Renqiu Xia, Bin Wang, Bo Zhang, Lei Bai
- Propose SurveyForge which can automatically generate and refine the content of survey.

Consistency-aware Self-Training for Iterative-based Stereo Matching
Jingyi Zhou*, Peng Ye*, Haoyu Zhang, Jiakang Yuan (Project Leader), Qiang Rao, YangChenXu Liu, Cailin Wu, Feng Xu, Tao Chen
- Propose CST-Stereo, which achieves impressive results in various scenarios, including in domain, domain adaptive and domain generalization,

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia*, Mingsheng Li*, Hancheng Ye, Wenjie Wu, Hongbin Zhou, Jiakang Yuan, Tianshuo Peng, Xinyu Cai, Xiangchao Yan, Bin Wang, Conghui He, Botian Shi, Tao Chen, Junchi Yan, Bo Zhang
- Propose GeoX, a multi-modal large model focusing on geometric understanding and reasoning tasks which reveals the large potential of formalized visual-language pre-training in enhancing geometric problem-solving abilities.

All-in-One: Transferring Vision Foundation Models into Stereo Matching
Jingyi Zhou*, Haoyu Zhang*, Jiakang Yuan*, Peng Ye, Tao Chen, Hao Jiang, Meiya Chen, Yangyang Zhang
- Propose AIOStereo to flexibly select and transfer knowledge from multiple heterogeneous VFMs to a single stereo matching model. (Rank 1st on Middlebury Stereo Evaluation)

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
Hancheng Ye*, Jiakang Yuan*, Renqiu Xia, Xiangchao Yan, Tao Chen, Junchi Yan, Botian Shi, Bo Zhang
- Propose AdaptiveDiffusion to adaptively reduce the noise prediction steps during the denoising proces guided by the third-order latent difference.

3DET-Mamba: State Space Model for End-to-End 3D Object Detection
Mingsheng Li*, Jiakang Yuan*, Sijin Chen, Lin Zhang, Anyu Zhu, Xin Chen, Tao Chen
- Exploit the potential of Mamba architecture on 3D scene-level perception for the first time.

Reg-TTA3D: Better Regression Makes Better Test-time Adaptive 3D Object Detection
Jiakang Yuan, Bo Zhang, Kaixiong Gong, Xiangyu Yue, Botian Shi, Yu Qiao, Tao Chen
- Explore a new task named test-time domain adaptive 3D object detection and propose a pseudo-label-based test-time adaptative 3D object detection method.

Bo Zhang*, Xinyu Cai*, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao
- Provide a new perspective and approach of alleviating the domain shifts, by proposing a Reconstruction-Simulation-Perception scheme.

AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset
Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao
- Build a large-scale pre-training point-cloud dataset with diverse data distribution, and meanwhile learn generalizable representations.

Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection
Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao
- Propose a Bi-domain active learning approach which select samples from both source and target domain to solve the cross-domain 3D object detection task.

Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection
Bo Zhang, Jiakang Yuan, Botian Shi, Tao Chen, Yikang Li, Yu Qiao
- Present a Uni3D which tackle multi-dataset 3D object detection from data-level and semantic-level.

Bo Zhang*, Jiakang Yuan*, Baopu Li, Tao Chen, Jiayuan Fan, Botian Shi
- Propose a Transformer-based double-helix model to achieve the cross-image object semantic relation mining in a bidirectional and symmetrical manner.
📖 Educations
-
2022.09 — Present
Ph.D. Candidate · School of Information Science and Technology
-
2018.09 — 2022.06
Bachelor of Engineering · Electronic Engineering
💬 Invited Talks
-
2023.09
Efficient Pre-training of Autonomous Driving
-
2023.07
Towards 3D General Representation · Techbeat
-
2023.03
Transferability of Autonomous Driving
💻 Internships
-
2024.10 — 2026.01
China
-
2022.08 — 2024.02
China



