WeChat Channels ID: sph0RgSyDYV47z6
Kuaishou ID: 4874645212
Douyin ID: dy0so323fq2w
Xiaohongshu ID: 95619019828
Bilibili 1: UID: 3546863642871878
Bilibili 2: UID: 3546955410049087
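In recent years, foundation models such as LLMs, VLMs, and VLAs have played an increasingly important role in autonomous driving decision-making, drawing growing attention from both academia and industry. Many readers have asked whether a systematic, categorized summary exists. This article groups foundation models for decision-making by model type. We will keep surveying related algorithms and post updates to 『自动驾驶之心知识星球』 as soon as they are ready. Everyone is welcome to join us to learn and discuss.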
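LLM-based methods
LLM-based methods mainly use the reasoning ability of large models to describe driving scenes. They belong to the early stage of combining autonomous driving with large models, but they are still worth studying.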
Distilling Multi-modal Large Language Models for Autonomous Driving
- Paper: https://arxiv.org/abs/2501.09757
- Venue: arXiv
LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models
- Paper: https://arxiv.org/pdf/2501.05057
- Venue: arXiv
CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting
- Paper: https://arxiv.org/abs/2503.07234
- Venue: arXiv
PADriver: Towards Personalized Autonomous Driving
- Paper: https://arxiv.org/pdf/2505.05240
- Venue: arXiv
Towards Human-Centric Autonomous Driving: A Fast-Slow Architecture Integrating Large Language Model Guidance with Reinforcement Learning
- Paper: https://arxiv.org/pdf/2505.06875
- Project page: https://drive.google.com/drive/folders/1K0WgRw1SdJL-JufvJNaTO1ES5SOuSj6p
- Venue: arXiv
Driving with Regulation: Interpretable Decision-Making for Autonomous Vehicles with Retrieval-Augmented Reasoning via LLM
- Paper: https://arxiv.org/abs/2410.04759
- Venue: arXiv
Empowering autonomous driving with large language models: A safety perspective
- Paper: https://arxiv.org/abs/2312.00812
- Venue: ICLR 2024
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
- Paper: https://arxiv.org/pdf/2307.07162.pdf
- Code: https://github.com/PJLab-ADG/DriveLikeAHuman
- Venue: arXiv
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
- Paper: https://arxiv.org/abs/2310.01957
- Code: https://github.com/wayveai/Driving-with-LLMs
- Venue: ICRA 2024
A Language Agent for Autonomous Driving
- Paper: https://arxiv.org/abs/2311.10813
- Project page: https://usc-gvl.github.io/Agent-Driver/
- Venue: arXiv
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
- Paper: https://arxiv.org/abs/2310.03026
- Project page: https://sites.google.com/view/llm-mpc
- Venue: arXiv
Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles
- Paper: https://arxiv.org/abs/2310.08034v1
- Venue: MITS 2024
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
- Paper: https://arxiv.org/abs/2309.16292
- Project page: https://pjlab-adg.github.io/DiLu/
- Code: https://github.com/PJLab-ADG/DiLu
- Venue: ICLR 2024
DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning
- Paper: https://arxiv.org/pdf/2505.05360
- Venue: arXiv
TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning
- Paper: https://arxiv.org/abs/2502.01387
- Project page: https://perfectxu88.github.io/TeLL-Drive.github.io/
- Venue: arXiv
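VLM-based methods
VLM- and VLA-based algorithms are the current mainstream paradigm, because vision is the sensing modality that autonomous driving relies on most. This part collects the latest work for reference and study.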
Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning
- Paper: https://arxiv.org/abs/2506.18234
- Venue: arXiv
FutureSightDrive: Visualizing Trajectory Planning with Spatio-Temporal CoT for Autonomous Driving
- Paper: https://arxiv.org/abs/2505.17685
- Code: https://github.com/MIV-XJTU/FSDrive
- Venue: arXiv
Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving
- Paper: https://arxiv.org/abs/2501.08861
- Venue: arXiv
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
- Paper: https://arxiv.org/abs/2503.19755
- Code: https://github.com/xiaomi-mlab/Orion
- Venue: arXiv
Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts
- Paper: https://arxiv.org/abs/2410.05963
- Venue: NeurIPS 2024
LingoQA: Visual Question Answering for Autonomous Driving
- Paper: https://arxiv.org/abs/2312.14115
- Code: https://github.com/wayveai/LingoQA/
- Venue: ECCV 2024
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
- Paper: https://arxiv.org/abs/2402.12289
- Project page: https://tsinghua-mars-lab.github.io/DriveVLM/
- Venue: arXiv
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
- Paper: https://arxiv.org/abs/2405.15324
- Code: https://github.com/PJLab-ADG/LeapAD
- Venue: NeurIPS 2024
ADAPT: Action-aware Driving Caption Transformer
- Paper: https://arxiv.org/abs/2302.00673
- Code: https://github.com/jxbbb/ADAPT
- Venue: ICRA 2023
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
- Paper: https://arxiv.org/abs/2310.01412
- Project page: https://tonyxuqaq.github.io/projects/DriveGPT4/
- Venue: RAL 2024
LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving
- Paper: https://arxiv.org/abs/2505.00284
- Code: https://github.com/michigan-traffic-lab/LightEMMA
- Venue: arXiv
TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
- Paper: https://arxiv.org/abs/2505.12670
- Venue: arXiv
VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision
- Paper: https://arxiv.org/pdf/2412.14446
- Venue: arXiv
OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
- Paper: https://arxiv.org/pdf/2412.15208
- Code: https://github.com/taco-group/OpenEMMA
- Venue: WACV 2025
CALMM-Drive: Confidence-Aware Autonomous Driving with Large Multimodal Model
- Paper: https://arxiv.org/pdf/2412.04209
- Venue: arXiv
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model
- Paper: https://arxiv.org/abs/2412.09951
- Project page: https://wyddmw.github.io/WiseAD_demo/
- Code: https://github.com/wyddmw/WiseAD
- Venue: arXiv
VLM-Assisted Continual learning for Visual Question Answering in Self-Driving
- Paper: https://arxiv.org/abs/2502.00843
- Venue: arXiv
VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion
- Paper: https://arxiv.org/abs/2502.18042
- Venue: arXiv
VLM-MPC: Vision Language Foundation Model (VLM)-Guided Model Predictive Controller (MPC) for Autonomous Driving
- Paper: https://arxiv.org/abs/2408.04821
- Venue: ICML 2025
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning
- Paper: https://arxiv.org/abs/2502.14917
- Venue: arXiv
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
- Paper: https://arxiv.org/pdf/2503.07608
- Code: https://github.com/hustvl/AlphaDrive
- Venue: arXiv
X-Driver: Explainable Autonomous Driving with Vision-Language Models
- Paper: https://arxiv.org/pdf/2505.05098
- Venue: arXiv
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
- Paper: https://arxiv.org/pdf/2505.08725
- Code: https://arxiv.org/pdf/2505.08725
- Venue: arXiv
VLA-based methods
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
- Paper: https://arxiv.org/abs/2506.13757
- Project page: https://autovla.github.io/
- Code: https://github.com/ucla-mobility/AutoVLA
- Venue: arXiv
DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving
- Paper: https://arxiv.org/abs/2505.19381
- Venue: arXiv
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
- Paper: https://arxiv.org/abs/2505.23757
- Project page: http://impromptu-vla.c7w.tech/
- Code: https://github.com/ahydchh/Impromptu-VLA
- Venue: arXiv
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
- Paper: https://arxiv.org/abs/2505.16278
- Project page: https://thinklab-sjtu.github.io/DriveMoE/
- Venue: arXiv
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
- Paper: https://arxiv.org/pdf/2503.23463
- Code: https://github.com/DriveVLA/OpenDriveVLA
- Venue: arXiv
Reference link
VLM or VLA? Development trends of multimodal large models for autonomous driving, as seen from existing work