Encoding Robotic Sensor States for Q-Learning PowerPoint Presentation

About This Presentation

Slide 1 - Encoding Robotic Sensor States for Q-Learning using the Self-Organizing Map
  Gabriel J. Ferrer, Department of Computer Science, Hendrix College
Slide 2 - Outline
  - Statement of Problem
  - Q-Learning
  - Self-Organizing Maps
  - Experiments
  - Discussion
Slide 3 - Statement of Problem
  - Goal
    - Make robots do what we want
    - Minimize/eliminate programming
  - Proposed Solution: Reinforcement Learning
    - Specify desired behavior using rewards
    - Express rewards in terms of sensor states
    - Use machine learning to induce desired actions
  - Target Platform
    - Lego Mindstorms NXT
Slide 4 - Robotic Platform
Slide 5 - Experimental Task
  - Drive forward
  - Avoid hitting things
Slide 6 - Q-Learning
  - Table of expected rewards ("Q-values")
    - Indexed by state and action
  - Algorithm steps
    - Calculate state index from sensor values
    - Calculate the reward
    - Update previous Q-value
    - Select and perform an action
  - Update rule, where s' is the new state and the max is taken over the actions a' available in s':
    Q(s,a) = (1 - α) Q(s,a) + α (r + γ max_a' Q(s',a'))
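The update rule on this slide can be sketched in a few lines. The slides describe a Java (LeJOS) implementation; the Python below is only an illustration, and the function name `q_update` and the list-of-lists table layout are assumptions rather than the author's code.

```python
def q_update(q, s, a, r, s_next, alpha, gamma):
    """Apply one Q-learning update to q, a list of per-state action-value lists."""
    best_next = max(q[s_next])  # max over actions a' of Q(s', a')
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (r + gamma * best_next)
    return q[s][a]

# Toy table: 36 states by 3 actions, all Q-values starting at 0 (an assumption).
q = [[0.0] * 3 for _ in range(36)]
q[5][1] = 2.0
new_value = q_update(q, s=4, a=0, r=1.0, s_next=5, alpha=0.5, gamma=0.5)
```

With these toy numbers the updated entry is 0.5 * 0 + 0.5 * (1.0 + 0.5 * 2.0) = 1.0.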
Slide 7 - Q-Learning and Robots
  - Certain sensors provide continuous values
    - Sonar
    - Motor encoders
  - Q-Learning requires discrete inputs
    - Group continuous values into discrete "buckets" [Mahadevan and Connell, 1992]
  - Q-Learning produces discrete actions
    - Forward
    - Back-left/Back-right
Slide 8 - Creating Discrete Inputs
  - Basic approach
    - Discretize continuous values into sets
    - Combine each discretized tuple into a single index
  - Another approach
    - Self-Organizing Map
    - Induces a discretization of continuous values [Touzet 1997] [Smith 2002]
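One common way to combine a discretized tuple into a single index is a mixed-radix encoding, sketched below. The slide does not say which scheme was used, so the helper `combine_index` and the example layout (two bumpers and nine sonar buckets) are assumptions for illustration.

```python
def combine_index(buckets, sizes):
    """Fold a tuple of bucket numbers into one index (mixed-radix encoding)."""
    index = 0
    for b, n in zip(buckets, sizes):
        if not 0 <= b < n:
            raise ValueError("bucket out of range")
        index = index * n + b
    return index

# Hypothetical layout: left bumper (2 values), right bumper (2), sonar bucket (9).
sizes = (2, 2, 9)
first = combine_index((0, 0, 0), sizes)
last = combine_index((1, 1, 8), sizes)
```

Every distinct tuple maps to a distinct index between 0 and 35, so the result can address a Q-table row directly.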
Slide 9 - Self-Organizing Map (SOM)
  - 2D Grid of Output Nodes
    - Each output corresponds to an ideal input value
    - Inputs can be anything with a distance function
  - Activating an Output
    - Present input to the network
    - Output with the closest ideal input is the "winner"
Slide 10 - Applying the SOM
  - Each input is a vector of sensor values
    - Sonar
    - Left/Right Bump Sensors
    - Left/Right Motor Speeds
  - Distance function is sum-of-squared-differences
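Finding the winner with a sum-of-squared-differences distance might look like the sketch below. The nested-list grid, the function names, and the toy 2x2 map with (sonar, left bump, right bump) inputs are assumptions made for the example.

```python
def squared_distance(u, v):
    """Sum of squared differences between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def find_winner(weights, x):
    """Return (row, col) of the node whose ideal input is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = squared_distance(w, x)
            if d < best_d:
                best, best_d = (r, c), d
    return best

# Toy 2x2 map over (sonar, left bump, right bump), each scaled to 0-100.
weights = [
    [[90.0, 0.0, 0.0], [40.0, 0.0, 0.0]],
    [[10.0, 100.0, 0.0], [10.0, 0.0, 100.0]],
]
winner = find_winner(weights, [50.0, 0.0, 0.0])
```

Here the input is closest to the node at row 0, column 1, and that node's position would then serve as the discrete state for Q-learning.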
Slide 11 - SOM Unsupervised Learning
  - Present an input to the network
  - Find the winning output node
  - Update ideal input for winner and neighbors
    - weight_ij = weight_ij + (α * (input_ij - weight_ij))
    - Neighborhood function
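A minimal sketch of this update step follows. The slide gives the weight formula and mentions a neighborhood function without defining how the two combine; scaling the learning rate by an influence value is an assumption here, and `adjacent_influence` is a hypothetical neighborhood chosen only to keep the example small.

```python
def som_update(weights, x, winner, alpha, influence):
    """Move each node's ideal input toward x, scaled by alpha and its influence."""
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            h = influence((r, c), winner)  # 1.0 at the winner, smaller farther away
            if h == 0.0:
                continue
            for j in range(len(w)):
                w[j] += alpha * h * (x[j] - w[j])

def adjacent_influence(node, winner):
    """Hypothetical neighborhood: winner 1.0, 4-connected neighbors 0.5, others 0."""
    dist = abs(node[0] - winner[0]) + abs(node[1] - winner[1])
    return {0: 1.0, 1: 0.5}.get(dist, 0.0)

weights = [[[0.0], [0.0], [0.0]]]  # a 1x3 map over one-dimensional inputs
som_update(weights, x=[100.0], winner=(0, 0), alpha=0.5, influence=adjacent_influence)
```

After the call the winner has moved halfway to the input (50.0), its neighbor a quarter of the way (25.0), and the far node is unchanged.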
Slide 12 - Experiments
  - Implemented in Java (LeJOS 0.85)
  - Each experiment
    - 240 seconds (800 Q-Learning iterations)
    - 36 States
  - Three actions
    - Both motors forward
    - Left motor backward, right motor stopped
    - Left motor stopped, right motor backward
Slide 13 - Rewards
  - Either bump sensor pressed: 0.0
  - Base reward:
    - 1.0 if both motors are going forward
    - 0.5 otherwise
  - Multiplier:
    - Sonar value greater than 20 cm: 1
    - Otherwise, (sonar value) / 20
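Read literally, these rules suggest a reward of base times multiplier whenever no bumper is pressed. That product is an interpretation of the slide, not something it states outright, and the function signature below is hypothetical.

```python
def reward(left_bump, right_bump, both_forward, sonar_cm):
    """Reward for one time step, following the rules on the Rewards slide."""
    if left_bump or right_bump:
        return 0.0                       # any collision earns nothing
    base = 1.0 if both_forward else 0.5
    multiplier = 1.0 if sonar_cm > 20 else sonar_cm / 20
    return base * multiplier             # assumed combination: base times multiplier
```

Under this reading, 800 iterations of a non-forward action with clear sonar would accumulate 800 * 0.5 = 400.0, which agrees with the baseline "not-forward" reward quoted on Slide 23.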
Slide 14 - Parameters
  - Discount (γ): 0.5
  - Learning rate (α): 1/(1 + (t/100)), where t is the current iteration (time step)
    - Used for both SOM and Q-Learning [Smith 2002]
  - Exploration/Exploitation
    - Epsilon = α/4
      - Probability of random action
    - Selected using weighted distribution
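The decay schedule and the exploration probability can be written out as below. The slide says the random action is drawn from a weighted distribution but does not give the weights, so this sketch substitutes a uniform draw; that substitution and all function names are assumptions.

```python
import random

def learning_rate(t):
    """Alpha at iteration t: 1 / (1 + t/100)."""
    return 1.0 / (1.0 + t / 100.0)

def epsilon(t):
    """Probability of taking a random action at iteration t: alpha / 4."""
    return learning_rate(t) / 4.0

def choose_action(q_row, t, rng=random):
    """Epsilon-greedy choice over one state's Q-values."""
    if rng.random() < epsilon(t):
        # Uniform draw used here for simplicity; the slide mentions a weighted one.
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

The schedule starts at α = 1.0 (ε = 0.25), falls to α = 0.5 at iteration 100, and reaches α = 1/9 by iteration 800, the end of an experiment.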
Slide 15 - Experimental Controls
  - Q-Learning without SOM
  - Qa States
    - Current action (1-3)
    - Current bumper states
    - Quantized sonar values (0-19 cm; 20-39; 40+)
  - Qb States
    - Current bumper states
    - Quantized sonar values (9) (0-11 cm…; 84-95; 96+)
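Both control encodings work out to 36 states (3 actions x 4 bumper combinations x 3 sonar buckets for Qa, and 4 bumper combinations x 9 sonar buckets for Qb). The sketch below assumes the elided Qb buckets are all 12 cm wide, numbers actions 0-2 rather than 1-3, and picks an arbitrary field ordering; none of that is confirmed by the slide.

```python
def qa_sonar_bucket(cm):
    """Three buckets: 0-19 cm, 20-39 cm, 40+ cm."""
    return min(int(cm) // 20, 2)

def qb_sonar_bucket(cm):
    """Nine buckets, assumed 12 cm wide: 0-11, ..., 84-95, then 96+."""
    return min(int(cm) // 12, 8)

def qa_state(action, left_bump, right_bump, cm):
    """Qa index from current action (0-2 here), bumper states, and sonar bucket."""
    bumpers = int(left_bump) * 2 + int(right_bump)
    return (action * 4 + bumpers) * 3 + qa_sonar_bucket(cm)

def qb_state(left_bump, right_bump, cm):
    """Qb index from bumper states and the finer sonar bucket."""
    bumpers = int(left_bump) * 2 + int(right_bump)
    return bumpers * 9 + qb_sonar_bucket(cm)
```

Qa spends its states on remembering the current action, while Qb spends them on finer sonar resolution.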
Slide 16 - SOM Formulations
  - 36 Output Nodes
  - Category "a": Length-5 input vectors
    - Motor speeds, bumper values, sonar value
  - Category "b": Length-3 input vectors
    - Bumper values, sonar value
  - All sensor values normalized to [0-100]
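Building the two kinds of input vector could look like the sketch below. The slide gives only the target range; the raw ranges (`SONAR_MAX_CM`, `MOTOR_MAX`), the treatment of backward motor speeds as negative values, and the field order are all assumptions made for illustration.

```python
SONAR_MAX_CM = 255   # assumed raw sonar ceiling; not stated on the slide
MOTOR_MAX = 100      # hypothetical motor speed magnitude; not stated on the slide

def normalize(value, raw_min, raw_max):
    """Linearly rescale value from [raw_min, raw_max] to [0, 100], clamping outliers."""
    clamped = max(raw_min, min(raw_max, value))
    return 100.0 * (clamped - raw_min) / (raw_max - raw_min)

def input_b(left_bump, right_bump, sonar_cm):
    """Category "b" vector: two bumper values and one sonar value."""
    return [100.0 * left_bump, 100.0 * right_bump,
            normalize(sonar_cm, 0, SONAR_MAX_CM)]

def input_a(left_speed, right_speed, left_bump, right_bump, sonar_cm):
    """Category "a" vector: two motor speeds followed by the category "b" fields."""
    return [normalize(left_speed, -MOTOR_MAX, MOTOR_MAX),
            normalize(right_speed, -MOTOR_MAX, MOTOR_MAX)] + input_b(
                left_bump, right_bump, sonar_cm)
```

Putting every field on the same 0-100 scale keeps any single sensor from dominating the sum-of-squared-differences distance purely because of its units.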
Slide 17 - SOM Formulations
  - QSOM
    - Based on [Smith 2002]
    - Gaussian Neighborhood
    - Neighborhood size is one-half SOM width
  - QT
    - Based on [Touzet 1997]
    - Learning rate is fixed at 0.9
    - Neighborhood is immediate Manhattan neighbors
    - Neighbor learning rate is 0.4
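The two neighborhood schemes can be contrasted in a short sketch. Treating "neighborhood size" as the Gaussian's standard deviation is an assumption, as is the example width of 6 (a 6x6 layout for the 36 nodes); the function names are hypothetical.

```python
import math

def qsom_influence(node, winner, som_width):
    """Gaussian falloff around the winner; sigma assumed to be half the SOM width."""
    sigma = som_width / 2.0
    d2 = (node[0] - winner[0]) ** 2 + (node[1] - winner[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def qt_rate(node, winner):
    """Fixed rates as on the slide: 0.9 at the winner, 0.4 for Manhattan neighbors."""
    dist = abs(node[0] - winner[0]) + abs(node[1] - winner[1])
    if dist == 0:
        return 0.9
    if dist == 1:
        return 0.4
    return 0.0

near = qsom_influence((1, 2), (2, 2), som_width=6)
far = qsom_influence((0, 0), (2, 2), som_width=6)
```

In the Gaussian scheme every node moves a little on every update, whereas the fixed-rate scheme touches only the winner and its four immediate neighbors.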
Slide 18 - Quantitative Results
Slide 19 - Qualitative Results
  - QSOMa
    - Motor speeds ranged from 2% to 50%
    - Sonar values stuck between 90% and 94%
  - QSOMb
    - Sonar values range from 40% to 95%
    - Best two runs arguably the best of the bunch
  - Very smooth SOM values in both cases
Slide 20 - Qualitative Results
  - QTa
    - Sonar values ranged from 10% to 100%
    - Still a weak performer on average
    - Best performer similar to QTb
  - QTb
    - Developed bump-sensor oriented behavior
    - Made little use of sonar
  - Highly uneven SOM values in both cases
Slide 21 - Experimental Area
Slide 22 - First Movie
  - QSOMb
  - Strong performer (Reward: 661.89)
  - Minimum sonar value: 43.35% (110 cm)
Slide 23 - Second Movie
  - Also QSOMb
  - Typical bad performer (Reward: 451.6)
    - Learns to avoid by always driving backwards
    - Baseline "not-forward" reward: 400.0
  - Minimum sonar value: 57.51% (146 cm)
  - Hindered by small filming area
Slide 24 - Discussion
  - Use of SOM on NXT can be effective
  - More research needed to address shortcomings
  - Heterogeneity of sensors is a problem
    - Need to try NXT experiments with multiple sonars
    - Previous work involved homogeneous sensors
  - Approachable by undergraduate students
    - Technique taught in junior/senior AI course