
Released

Conference Paper

PlanT: Explainable Planning Transformers via Object-Level Representations

MPS-Authors

Akata, Zeynep
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Citation

Renz, K., Chitta, K., Mercea, O.-B., Koepke, A. S., Akata, Z., & Geiger, A. (2022). PlanT: Explainable Planning Transformers via Object-Level Representations. In K. Liu, D. Kulic, & J. Ichnowski (Eds.), Proceedings of the 6th Annual Conference on Robot Learning (pp. 459-470). ML Research Press. Retrieved from https://proceedings.mlr.press/v205/renz23a.html.


Cite as: https://hdl.handle.net/21.11116/0000-000C-1B42-C
Abstract
Planning an optimal route in a complex environment requires efficient
reasoning about the surrounding scene. While human drivers prioritize important
objects and ignore details not relevant to the decision, learning-based
planners typically extract features from dense, high-dimensional grid
representations containing all vehicle and road context information. In this
paper, we propose PlanT, a novel approach for planning in the context of
self-driving that uses a standard transformer architecture. PlanT is based on
imitation learning with a compact object-level input representation. On the
Longest6 benchmark for CARLA, PlanT outperforms all prior methods (matching the
driving score of the expert) while being 5.3x faster than equivalent
pixel-based planning baselines during inference. Combining PlanT with an
off-the-shelf perception module provides a sensor-based driving system that is
more than 10 points better in terms of driving score than the existing state of
the art. Furthermore, we propose an evaluation protocol to quantify the ability
of planners to identify relevant objects, providing insights regarding their
decision-making. Our results indicate that PlanT can focus on the most relevant
object in the scene, even when this object is geometrically distant.
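The core idea of the abstract — encoding each vehicle and route segment as a compact object-level token and letting transformer attention score which object matters for the driving decision — can be illustrated with a minimal sketch. This is not PlanT's actual implementation; the feature layout, dimensions, and weights below are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_weights(ego_query, object_tokens, w_q, w_k):
    """Scaled dot-product attention of an ego query over object tokens.

    Returns one weight per object; a larger weight means the planner
    attends more to that object when predicting the route.
    """
    q = ego_query @ w_q                    # (d,)
    k = object_tokens @ w_k                # (n, d)
    scores = k @ q / np.sqrt(q.shape[0])   # (n,)
    return softmax(scores)

# Hypothetical object-level scene: each row is one token with
# attributes [x, y, yaw, speed, length] (illustrative choice).
objects = np.array([
    [ 2.0, 0.0, 0.0, 5.0,  4.5],   # lead vehicle directly ahead
    [-8.0, 3.0, 3.1, 6.0,  4.5],   # oncoming vehicle, farther away
    [ 0.0, 0.0, 0.0, 0.0, 10.0],   # route segment under the ego vehicle
])
ego = np.array([0.0, 0.0, 0.0, 4.0, 4.5])

rng = np.random.default_rng(0)
w_q = rng.standard_normal((5, 8)) * 0.1   # toy projection matrices
w_k = rng.standard_normal((5, 8)) * 0.1

w = attention_weights(ego, objects, w_q, w_k)
print(w)   # one weight per object, summing to 1
```

Inspecting such per-object attention weights is one plausible way to realize the evaluation protocol the abstract mentions: the relevance of an object can be read off directly, rather than being buried in a dense grid representation.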