Research on Autonomous Driving Decision Based on Improved Deep Deterministic Policy Algorithm
- Delivery method
- Download link provided by the publisher
- Format
- Price
- Regular price (tax incl.): ¥6,600 / Member price (tax incl.): ¥5,280
- Document type
- SAE Paper
No.2022-01-0161
- Pages
- 1-14 (14 pages total)
- Publication date
- March 2022
- Publisher
- SAE International
- Language
- English
- Event
- WCX SAE World Congress Experience 2022
Bibliographic information
| Authors (EN) | 1) Shi YK, 2) Jian Wu, 3) Shiping Song |
|---|---|
| Affiliations (EN) | 1) Jilin University, 2) Jilin University, 3) Jilin University |

Abstract (EN): Autonomous driving technology, a product of the fifth stage of the information technology revolution, is of great significance for improving urban traffic and for environmentally friendly, sustainable development. Autonomous driving can be divided into three main modules: the decision module takes perception information from the perception module as input and outputs a control strategy to the control module. Deep reinforcement learning offers an end-to-end design for such a decision-making system. This paper adopts the Deep Deterministic Policy Gradient (DDPG) algorithm combined with Prioritized Experience Replay (PER). The algorithm is built on an actor-critic network structure: the model takes continuously acquired perception information as input and produces continuous vehicle controls as output. In the CARLA simulation environment, the state space is designed around a CNN that takes the vehicle's front-view camera image as input, and the action space design accounts for the fact that throttle and brake are applied at different times. Two reward functions are designed: one based on the vehicle's state information and one based on the artificial potential field (APF) method. The DDPG algorithm and the PER-DDPG algorithm with the different reward functions were then simulated, verified, and tested in various scenarios in the CARLA virtual urban driving environment. The final results show that the APF-PER-DDPG algorithm performs best: compared with plain DDPG, its average reward is about 27.7% higher, and its proportion of dangerous actions, the lowest of all variants, is about 24.8% lower. The tests show that improving the sampling method and basing the reward function on the artificial potential field both improve the algorithm's performance.
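The abstract describes a reward function based on the artificial potential field (APF) method but gives no formula. The sketch below shows one common APF formulation (an attractive quadratic potential toward the goal plus a repulsive potential near obstacles, with the reward taken as the negative total potential); the function name and the coefficients `k_att`, `k_rep`, and the influence radius `d0` are illustrative assumptions, not values from the paper.

```python
import math


def apf_reward(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=5.0):
    """Sketch of an APF-shaped reward (assumed formulation, not the paper's).

    pos, goal  -- 2D points (x, y)
    obstacles  -- iterable of 2D obstacle points
    Returns the negative total potential: higher reward nearer the goal,
    lower reward inside an obstacle's influence radius d0.
    """
    # Attractive potential: grows quadratically with distance to the goal.
    d_goal = math.dist(pos, goal)
    u_att = 0.5 * k_att * d_goal ** 2

    # Repulsive potential: nonzero only within distance d0 of an obstacle.
    u_rep = 0.0
    for obs in obstacles:
        d = math.dist(pos, obs)
        if 0.0 < d < d0:
            u_rep += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2

    return -(u_att + u_rep)
```

Under this shaping, moving toward the goal raises the reward while approaching an obstacle lowers it, which matches the abstract's observation that the APF-based reward reduced the proportion of dangerous actions.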