
VLM or VLA? Development Trends of Multimodal Large Models for Autonomous Driving, as Seen from Existing Work

WeChat Channels: sph0RgSyDYV47z6
Kuaishou: 4874645212
Douyin: dy0so323fq2w
Xiaohongshu: 95619019828
Bilibili 1: UID 3546863642871878
Bilibili 2: UID 3546955410049087
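In recent years, foundation models such as LLMs, VLMs, and VLAs have played an increasingly important role in autonomous driving decision-making and have drawn growing attention from both academia and industry. Many readers have asked whether a systematic, categorized summary exists. This article lists foundation models for decision-making by model category. Related algorithms will be organized further and added to the 自动驾驶之心 Knowledge Planet (『自动驾驶之心知识星球』) as soon as they are available. Everyone is welcome to join the discussion.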
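LLM-based methods
LLM-based methods mainly use the reasoning ability of large models to describe autonomous driving scenarios. They belong to the early stage of combining autonomous driving with large models, but they are still worth studying.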
Distilling Multi-modal Large Language Models for Autonomous Driving
  • Paper: https://arxiv.org/abs/2501.09757
  • Venue: arXiv
LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models
  • Paper: https://arxiv.org/pdf/2501.05057
  • Venue: arXiv
CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting
  • Paper: https://arxiv.org/abs/2503.07234
  • Venue: arXiv
PADriver: Towards Personalized Autonomous Driving
  • Paper: https://arxiv.org/pdf/2505.05240
  • Venue: arXiv
Towards Human-Centric Autonomous Driving: A Fast-Slow Architecture Integrating Large Language Model Guidance with Reinforcement Learning
  • Paper: https://arxiv.org/pdf/2505.06875
  • Project page: https://drive.google.com/drive/folders/1K0WgRw1SdJL-JufvJNaTO1ES5SOuSj6p
  • Venue: arXiv
Driving with Regulation: Interpretable Decision-Making for Autonomous Vehicles with Retrieval-Augmented Reasoning via LLM
  • Paper: https://arxiv.org/abs/2410.04759
  • Venue: arXiv
Empowering autonomous driving with large language models: A safety perspective
  • Paper: https://arxiv.org/abs/2312.00812
  • Venue: ICLR 2024
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
  • Paper: https://arxiv.org/pdf/2307.07162.pdf
  • Code: https://github.com/PJLab-ADG/DriveLikeAHuman
  • Venue: arXiv
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
  • Paper: https://arxiv.org/abs/2310.01957
  • Code: https://github.com/wayveai/Driving-with-LLMs
  • Venue: ICRA 2024
A Language Agent for Autonomous Driving
  • Paper: https://arxiv.org/abs/2311.10813
  • Project page: https://usc-gvl.github.io/Agent-Driver/
  • Venue: arXiv
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
  • Paper: https://arxiv.org/abs/2310.03026
  • Project page: https://sites.google.com/view/llm-mpc
  • Venue: arXiv
Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles
  • Paper: https://arxiv.org/abs/2310.08034v1
  • Venue: MITS 2024
DiLu: A knowledge-driven approach to autonomous driving with large language models
  • Paper: https://arxiv.org/abs/2309.16292
  • Project page: https://pjlab-adg.github.io/DiLu/
  • Code: https://github.com/PJLab-ADG/DiLu
  • Venue: ICLR 2024
DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning
  • Paper: https://arxiv.org/pdf/2505.05360
  • Venue: arXiv
TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning
  • Paper: https://arxiv.org/abs/2502.01387
  • Project page: https://perfectxu88.github.io/TeLL-Drive.github.io/
  • Venue: arXiv
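VLM-based methods
VLM- and VLA-based algorithms are the current mainstream paradigm, because vision is the sensing modality that autonomous driving relies on most. This section collects the latest work for reference and study.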
Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning
  • Paper: https://arxiv.org/abs/2506.18234
  • Venue: arXiv
FutureSightDrive: Visualizing Trajectory Planning with Spatio-Temporal CoT for Autonomous Driving
  • Paper: https://arxiv.org/abs/2505.17685
  • Code: https://github.com/MIV-XJTU/FSDrive
  • Venue: arXiv
Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving
  • Paper: https://arxiv.org/abs/2501.08861
  • Venue: arXiv
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
  • Paper: https://arxiv.org/abs/2503.19755
  • Code: https://github.com/xiaomi-mlab/Orion
  • Venue: arXiv
Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts
  • Paper: https://arxiv.org/abs/2410.05963
  • Venue: NeurIPS 2024
LingoQA: Visual Question Answering for Autonomous Driving
  • Paper: https://arxiv.org/abs/2312.14115
  • Code: https://github.com/wayveai/LingoQA/
  • Venue: ECCV 2024
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
  • Paper: https://arxiv.org/abs/2402.12289
  • Project page: https://tsinghua-mars-lab.github.io/DriveVLM/
  • Venue: arXiv
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
  • Paper: https://arxiv.org/abs/2405.15324
  • Code: https://github.com/PJLab-ADG/LeapAD
  • Venue: NeurIPS 2024
ADAPT: Action-aware Driving Caption Transformer
  • Paper: https://arxiv.org/abs/2302.00673
  • Code: https://github.com/jxbbb/ADAPT
  • Venue: ICRA 2023
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
  • Paper: https://arxiv.org/abs/2310.01412
  • Project page: https://tonyxuqaq.github.io/projects/DriveGPT4/
  • Venue: RAL 2024
LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving
  • Paper: https://arxiv.org/abs/2505.00284
  • Code: https://github.com/michigan-traffic-lab/LightEMMA
  • Venue: arXiv
TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
  • Paper: https://arxiv.org/abs/2505.12670
  • Venue: arXiv
VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision
  • Paper: https://arxiv.org/pdf/2412.14446
  • Venue: arXiv
OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
  • Paper: https://arxiv.org/pdf/2412.15208
  • Code: https://github.com/taco-group/OpenEMMA
  • Venue: WACV 2025
CALMM-Drive: Confidence-Aware Autonomous Driving with Large Multimodal Model
  • Paper: https://arxiv.org/pdf/2412.04209
  • Venue: arXiv
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model
  • Paper: https://arxiv.org/abs/2412.09951
  • Project page: https://wyddmw.github.io/WiseAD_demo/
  • Code: https://github.com/wyddmw/WiseAD
  • Venue: arXiv
VLM-Assisted Continual learning for Visual Question Answering in Self-Driving
  • Paper: https://arxiv.org/abs/2502.00843
  • Venue: arXiv
VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion
  • Paper: https://arxiv.org/abs/2502.18042
  • Venue: arXiv
VLM-MPC: Vision Language Foundation Model (VLM)-Guided Model Predictive Controller (MPC) for Autonomous Driving
  • Paper: https://arxiv.org/abs/2408.04821
  • Venue: ICML 2025
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning
  • Paper: https://arxiv.org/abs/2502.14917
  • Venue: arXiv
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
  • Paper: https://arxiv.org/pdf/2503.07608
  • Code: https://github.com/hustvl/AlphaDrive
  • Venue: arXiv
X-Driver: Explainable Autonomous Driving with Vision-Language Models
  • Paper: https://arxiv.org/pdf/2505.05098
  • Venue: arXiv
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
  • Paper: https://arxiv.org/pdf/2505.08725
  • Code: https://arxiv.org/pdf/2505.08725
  • Venue: arXiv
VLA-based methods
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
  • Paper: https://arxiv.org/abs/2506.13757
  • Project page: https://autovla.github.io/
  • Code: https://github.com/ucla-mobility/AutoVLA
  • Venue: arXiv
DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving
  • Paper: https://arxiv.org/abs/2505.19381
  • Venue: arXiv
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
  • Paper: https://arxiv.org/abs/2505.23757
  • Project page: http://impromptu-vla.c7w.tech/
  • Code: https://github.com/ahydchh/Impromptu-VLA
  • Venue: arXiv
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
  • Paper: https://arxiv.org/abs/2505.16278
  • Project page: https://thinklab-sjtu.github.io/DriveMoE/
  • Venue: arXiv
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
  • Paper: https://arxiv.org/pdf/2503.23463
  • Code: https://github.com/DriveVLA/OpenDriveVLA
  • Venue: arXiv
 
 
