Paper / Information search system




Scene-Aware Interaction Technology based on Multimodal Sensing Information

マルチモーダルセンシング情報に基づくScene-Aware Interaction 技術

Detailed Information



Category(E) Look at, Innovating Automotive Electronics
Author(J) 1) 堀 智織, 2) スホン・チェン, 3) アヌープ・チェリアン, 4) 堀 貴明, 5) ブレット・ハーシャム, 6) ティム・K・マークス, 7) ジョナトン・ルルー, 8) アラン・サリヴァン, 9) アンソニー・ヴェトロ, 10) 土屋 政人
Author(E) 1) Chiori Hori, 2) Masato Tsuchiya, 3) Siheng Chen, 4) Anoop Cherian, 5) Takaaki Hori, 6) Bret Harsham, 7) Tim K. Marks, 8) Jonathan Le Roux, 9) Alan Sullivan, 10) Anthony Vetro
Affiliation(J) 1) Mitsubishi Electric Research Laboratories, 2) Mitsubishi Electric Research Laboratories, 3) Mitsubishi Electric Research Laboratories, 4) Mitsubishi Electric Research Laboratories, 5) Mitsubishi Electric Research Laboratories, 6) Mitsubishi Electric Research Laboratories, 7) Mitsubishi Electric Research Laboratories, 8) Mitsubishi Electric Research Laboratories, 9) Mitsubishi Electric Research Laboratories, 10) 三菱電機
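Abstract(J) Using end-to-end deep learning, we have developed "Scene-Aware Interaction technology," which lets a machine understand its surroundings in natural language from information collected by multiple sensors (multimodal sensing information) and communicate more smoothly with people. From multimodal sensing information such as images captured by cameras, audio picked up by microphones, and position information obtained with LiDAR or radar, the technology lets a machine understand the surrounding situation in natural language (what is where and in what state, who is doing what and where) and generates responses that also take the context of the conversation with the human into account. Scene-Aware Interaction is a groundbreaking technology expected to find application in a variety of systems that need to interact with humans based on situational understanding, such as robots and monitoring systems. As an example application, this paper introduces a system that understands the situation from multiple sensor inputs obtained from various in-vehicle sensors and provides route guidance.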


Abstract(E) We have developed a novel "scene-aware interaction" technology capable of highly natural and intuitive interaction with humans using natural language. The system uses end-to-end deep learning both to understand scenes and to interact with users about them. The technology analyzes multimodal sensing information such as images and video captured with cameras, audio recorded with microphones, and localization information measured with LiDAR. It understands scenes at the level of "what is located where," "who is doing what," and so on, and interacts with humans through context-dependent generation of natural language. The technology can be applied to a variety of applications, such as robots and monitoring systems, that require interaction based on scene understanding. This paper introduces a car navigation system that uses the proposed scene-aware interaction technology, with the multimodal sensor information obtained from car-mounted sensors.

About search


How to use the search box

You can enter up to five search conditions. Use the "+" and "-" buttons on the right to add or remove search boxes.
If you enter multiple words separated by spaces in one search box, only data that contains all of the entered words is returned (AND search).
Example) X (space) Y → data containing both X and Y

How to use the "AND" and "OR" pull-down

If you specify "AND", the search returns data that contains the terms entered in both the preceding and the following search boxes. If you specify "OR", it returns data that contains the terms entered in either of those boxes.
Example) X AND Y → data containing both X and Y;  X OR Z → data containing X or Z
If AND and OR are mixed, OR is evaluated first.
Example) X AND Y OR Z → X AND (Y OR Z)
The same applies when one AND is mixed with several ORs.
Example) W AND X OR Y OR Z → W AND (X OR Y OR Z)
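
The rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the search system's actual implementation: the function names box_matches and query_matches are hypothetical, and matching by case-insensitive substring is an assumption.

```python
def box_matches(box: str, text: str) -> bool:
    """One search box: every space-separated word must appear (AND search).

    Assumption: words are matched as case-insensitive substrings.
    """
    lowered = text.lower()
    return all(word.lower() in lowered for word in box.split())


def query_matches(boxes: list[str], operators: list[str], text: str) -> bool:
    """Combine search boxes with pull-down operators, giving OR priority over AND.

    boxes     -- contents of the search boxes, in order
    operators -- "AND" or "OR" between consecutive boxes (len(boxes) - 1 items)
    """
    if len(operators) != len(boxes) - 1:
        raise ValueError("need exactly one operator between each pair of boxes")

    # Split the query at every AND; each group is a run of boxes joined by OR.
    groups = [[boxes[0]]]
    for op, box in zip(operators, boxes[1:]):
        if op == "OR":
            groups[-1].append(box)
        elif op == "AND":
            groups.append([box])
        else:
            raise ValueError(f"unknown operator: {op!r}")

    # Every OR-group must be satisfied by at least one of its boxes.
    return all(any(box_matches(b, text) for b in group) for group in groups)
```

For example, query_matches(["lidar", "robot", "navigation"], ["AND", "OR"], "car navigation system") returns False, because the query is read as lidar AND (robot OR navigation) and the text does not contain "lidar". If AND had priority instead, the same text would match through "navigation" alone.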

How to use the search filters

Use the search filters to narrow down the results, for example when a search returns too many hits. Checking an item restricts the results to the data that includes that item.
The number in parentheses after each item is the number of data entries that include that item.

Search tips

When searching by author name, enter the first and last name separated by a space, such as "Taro Jidosha".