Cloud Robotics for Building Conversational Robots
Komei Sugiura
National Institute of Information and Communications Technology (NICT), Japan
Beyond the Language Barrier:
NICT’s free software and cloud services
1. Speech to speech translation system: VoiceTra (2010)
>1M downloads.
High performance in translation to/from Asian languages
2. MCML Speech interaction SDK (2013)
The SDK enables users to build WFST-based multilingual dialogue systems.
3. Smartphone dialogue apps (2011)
Spoken dialogues and recommendation in tourist
guidance domains
4. Cloud robotics platform rospeex (2013)
40K unique users. Top-level quality for dialogue-oriented TTS in Japanese.
[New] Automatic captioning SDK for developers
http://www2.nict.go.jp/astrec-ast/mcml-sdk/index_en.html
Free of charge, but authentication required
[Video]
Motivation:
How can we build communicative robots to help people?
Smartphones and other consumer devices
Speech interfaces benefit consumers
• E.g. “Show me today’s schedule”, “Sushi restaurants around here”
• Benefit for QA/search (context: GPS, contacts, other context info.)
cf. Market size of speech recognition: ¥88B (2013) → ¥170B (2018) (€1.5B)*
* Estimate by NEDO, TSC Foresight Vol.8, 2015
Current communication with robots
Insufficient benefit to consumers
• E.g. “Throw them away.”, “Is there any milk in the fridge?” (robot: “??”)
• Bad recognition accuracy
• User needs to specify [what, where, how] as well as start/end conditions
ROSPEEX:
A CLOUD ROBOTICS PLATFORM FOR
MULTILINGUAL SPOKEN DIALOGUES
Background: Speech recognition/synthesis is a bottleneck for reducing cost in human-robot interactions
• Synthesized speech sounds monotonous and unfriendly
• Speech recognition does not work as well as expected
XIMERA 3 (text-reading)
Voice talent
Target = Interactions with service robots
Rospeex:
A cloud robotics platform for multilingual spoken dialogues
• >40,000 unique users have used rospeex
• WER = 7.9% (accuracy = 92.1%) on IWSLT tst2011 (1st place in IWSLT 2012, 2013, and 2014)
• Top-level quality dialogue-oriented TTS
• Python & C++ samples are available (search for “rospeex”)
* Free of charge for research
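The WER figure above can be made concrete with a small sketch. This is a generic word error rate computation (word-level edit distance divided by the number of reference words), not the IWSLT scoring tool; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("is there any milk in the fridge",
                          "is there any milk in fridge")
    print(f"WER = {wer:.1%}, accuracy = {1 - wer:.1%}")
```

The slide reports accuracy as 100% minus WER. Because insertions count as errors, WER computed this way can exceed 100% on very poor hypotheses, so that shorthand is only meaningful when the error rate is low.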
Rospeex’s positioning in robot dialogue quadrants
[Figure: quadrant chart. Axes: cloud-based vs. stand-alone; robot middleware-compatible vs. incompatible]
• Cloud APIs (Google, Microsoft, IBM, NTT docomo, Wit.ai, …)
• Free software: OpenHRI, PocketSphinx, Festival
• Commercial software
• Drawbacks marked in the figure: does not work with very low-spec PCs; robotics-specific logs are lost; authentication; low quality; expensive
Distribution of rospeex users
rospeex applications (40k unique users):
conversational agents in elderly care facilities, service robots, humanoids, dialogue agents, speech interfaces for car navigation systems or smart home devices, …
Analysis: TTS requests depend heavily on individuals
• Question: Do developers use the same sentences for TTS? If so, we can speed things up by introducing a local cache.
• Analysis on top 88 users
– New requests = 50.4% on average
– An individual uses max. 200 unique sentences
[Figure legend: Cache hit / Cache miss]
Without a cloud platform, we cannot conduct large-scale analysis of robot developers
Introducing a cache will reduce communication time
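A minimal sketch of the local-cache idea follows. It is not rospeex code: `TTSCache` and `fake_cloud_tts` are hypothetical names, and the cloud call is replaced by a stand-in function. The default capacity of 200 entries is taken from the "max. 200 unique sentences" observation above; the eviction policy (least recently used) is an assumption.

```python
from collections import OrderedDict
from typing import Callable


class TTSCache:
    """Local LRU cache placed in front of a (hypothetical) cloud TTS call."""

    def __init__(self, synthesize: Callable[[str], bytes], max_entries: int = 200):
        self._synthesize = synthesize
        self._max_entries = max_entries
        self._store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def speak(self, sentence: str) -> bytes:
        key = sentence.strip()
        if key in self._store:
            # Cache hit: no network round trip is needed.
            self.hits += 1
            self._store.move_to_end(key)
            return self._store[key]
        # Cache miss: a "new request" that must go to the cloud.
        self.misses += 1
        audio = self._synthesize(key)
        self._store[key] = audio
        if len(self._store) > self._max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return audio

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_cloud_tts(sentence: str) -> bytes:
    """Stand-in for the remote synthesis request; returns dummy audio bytes."""
    return sentence.encode("utf-8")


if __name__ == "__main__":
    cache = TTSCache(fake_cloud_tts)
    for s in ["Hello", "Is there any milk in the fridge?", "Hello", "Hello"]:
        cache.speak(s)
    print(f"hit rate: {cache.hit_rate:.0%}")
```

With new requests at 50.4% on average, a cache of this kind would serve roughly half of the requests locally, which is where the reduction in communication time comes from.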
MULTIMODAL SPOKEN DIALOGUES
WITH ROBOTS
Multimodal language understanding
Kollar+ 2010 (HRI 2010 Best Paper)
• Input: Text, LRF, Image
• Output: path planning
• E.g. “Go down the hallway”
Iwahashi & Sugiura+ 2010
• Input: Image and speech
• Output: object manipulation
• E.g. “Place-on Elmo”
Visual QA [2015-]
• Input: Image and question
• Output: Answer
• E.g. “How many elephants are there?” -> “2”
[Video]
LCore: Multimodal Robot Language Acquisition
[Iwahashi, Sugiura, et al 2010]
Key features
• Fully grounded vocabulary
• Imitation learning
• Incremental & interactive learning
• Language independent
• Learning when to ask questions
Imitation learning for spoken language understanding:
Re-ranking hypotheses using planned trajectories’ likelihood
• Transformation of reference-point-dependent HMMs*
– Input: verb ID, object ID(s), e.g. <place-on, Object 1, Object 3>
– Transforms HMM from intrinsic coordinate system into world coordinate system
[Figure labels: HMM “Place-on”, Place X on Y, World CS, Situation]
* Sugiura et al., IROS 2011 RoboCup Best Paper
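The coordinate transformation can be illustrated with a simplified sketch. It is not the method of the cited paper: it assumes 2-D Gaussian output densities over position only, and it assumes the intrinsic coordinate system differs from the world one by a rotation `theta` and a translation to the reference object's position. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]
State = Tuple[Vec2, Mat2]  # (mean, covariance) of one HMM state


def rotation(theta: float) -> Mat2:
    """2-D rotation matrix for an angle in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def transform_mean(mean: Vec2, rot: Mat2, origin: Vec2) -> Vec2:
    """mean_world = R * mean_intrinsic + origin"""
    x = rot[0][0] * mean[0] + rot[0][1] * mean[1] + origin[0]
    y = rot[1][0] * mean[0] + rot[1][1] * mean[1] + origin[1]
    return (x, y)


def transform_cov(cov: Mat2, rot: Mat2) -> Mat2:
    """cov_world = R * cov_intrinsic * R^T"""
    tmp = [[sum(rot[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    out = [[sum(tmp[i][k] * rot[j][k] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return ((out[0][0], out[0][1]), (out[1][0], out[1][1]))


def hmm_to_world(states: List[State], reference_point: Vec2, theta: float) -> List[State]:
    """Map every state's Gaussian from the intrinsic frame into the world frame."""
    rot = rotation(theta)
    return [(transform_mean(m, rot, reference_point), transform_cov(c, rot))
            for m, c in states]


if __name__ == "__main__":
    # One state of a hypothetical "place-on" HMM, expressed relative to Object 3.
    place_on = [((0.0, 0.3), ((0.01, 0.0), (0.0, 0.04)))]
    print(hmm_to_world(place_on, reference_point=(1.0, 2.0), theta=math.pi / 2))
```

Once the HMM is expressed in world coordinates, a planned trajectory can be scored against it, which is what makes the likelihood usable for re-ranking the recognition hypotheses.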
HMM-based trajectory generation using dynamic features*
Maximum likelihood trajectory, defined in terms of:
• the state sequence
• the HMM parameters
• the time series of (position, velocity, acceleration)
• the vector of mean vectors
• the matrix of covariance matrices of each output probability density function (OPDF)
• the matrix of coefficients in the difference approximation
• the time series of position
* Tokuda, K. et al., “Speech parameter generation algorithms for HMM-based speech synthesis”, 2000
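The quantities listed on this slide fit the standard maximum-likelihood parameter generation formulation of Tokuda et al. A sketch of that derivation follows. The symbols are assumptions chosen here for illustration: $q$ for the state sequence, $\lambda$ for the HMM parameters, $\xi$ for the time series of (position, velocity, acceleration), $x$ for the time series of position, $W$ for the matrix of difference coefficients, and $\mu$, $\Sigma$ for the stacked means and covariances along $q$.

```latex
% Dynamic features are a linear function of the position sequence:
\xi = W x

% Maximum likelihood trajectory for a given state sequence:
\hat{x} = \arg\max_{x} \; P(W x \mid q, \lambda)

% With Gaussian output densities along q:
\log P(W x \mid q, \lambda)
  = -\tfrac{1}{2} (W x - \mu)^{\top} \Sigma^{-1} (W x - \mu) + \text{const}

% Setting the gradient with respect to x to zero gives a linear system:
W^{\top} \Sigma^{-1} W \, \hat{x} = W^{\top} \Sigma^{-1} \mu
```

Because velocity and acceleration are tied to position through $W$, the solution is a smooth trajectory rather than a piecewise-constant sequence of state means.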
ROBOCUP@HOME
BUILDING DOMESTIC SERVICE ROBOTS
RoboCup@Home: Benchmark tests for domestic robots
• RoboCup@Home: The largest competition for domestic robots
– One of the major RoboCup leagues
– Focuses on human-robot interaction and mobile manipulation
– Robots are evaluated on 8 standardized and 3 demonstration tasks
• Scientific challenges
– Navigation in unknown environments (e.g. a real shop), handling everyday objects, spoken dialogues in very noisy environments, …
RoboCup@Home Standard Platform Leagues start in 2017
• Many teams need low-cost standardized platforms
• Companies know of NAO’s success after it was selected as the soccer Standard Platform (Softbank bought Aldebaran for 100M USD)
Toyota HSR
• Main use case = partner robot for those who need care
• Lease-based
Softbank Pepper
• Already deployed in restaurants and shops
• Very low price
Both are compatible with ROS
CFPs for HSR/Pepper users will open soon
Summary
• Data-driven approaches
• Multimodal spoken dialogue with robots
• RoboCup and domestic service robots
• …and we’re hiring!

20161014IROS_WS
