Encoding Robotic Sensor States for Q-Learning PowerPoint Presentation

By: worldwideweb

On : Aug 07, 2014

In : Science & Technology

  • Slide 1 - Encoding Robotic Sensor States for Q-Learning using the Self-Organizing Map. Gabriel J. Ferrer, Department of Computer Science, Hendrix College
  • Slide 2 - Outline: Statement of Problem; Q-Learning; Self-Organizing Maps; Experiments; Discussion
  • Slide 3 - Statement of Problem. Goal: make robots do what we want; minimize/eliminate programming. Proposed solution: reinforcement learning. Specify desired behavior using rewards; express rewards in terms of sensor states; use machine learning to induce desired actions. Target platform: Lego Mindstorms NXT
  • Slide 4 - Robotic Platform
  • Slide 5 - Experimental Task: drive forward; avoid hitting things
  • Slide 6 - Q-Learning. Table of expected rewards (“Q-values”), indexed by state and action. Algorithm steps: (1) calculate state index from sensor values; (2) calculate the reward; (3) update previous Q-value; (4) select and perform an action. Update rule: $Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\,\bigl(r + \gamma \max_{a'} Q(s',a')\bigr)$
  • Slide 7 - Q-Learning and Robots. Certain sensors provide continuous values: sonar, motor encoders. Q-Learning requires discrete inputs: group continuous values into discrete “buckets” [Mahadevan and Connell, 1992]. Q-Learning produces discrete actions: forward, back-left/back-right
  • Slide 8 - Creating Discrete Inputs. Basic approach: discretize continuous values into sets; combine each discretized tuple into a single index. Another approach: the Self-Organizing Map, which induces a discretization of continuous values [Touzet 1997] [Smith 2002]
  • Slide 9 - Self-Organizing Map (SOM). 2D grid of output nodes: each output corresponds to an ideal input value; inputs can be anything with a distance function. Activating an output: present input to the network; the output with the closest ideal input is the “winner”
  • Slide 10 - Applying the SOM. Each input is a vector of sensor values: sonar, left/right bump sensors, left/right motor speeds. Distance function is sum-of-squared-differences
  • Slide 11 - SOM Unsupervised Learning. Present an input to the network; find the winning output node; update the ideal input for the winner and its neighbors: $\mathrm{weight}_{ij} \leftarrow \mathrm{weight}_{ij} + \alpha\,(\mathrm{input}_{ij} - \mathrm{weight}_{ij})$. Neighborhood function
  • Slide 12 - Experiments. Implemented in Java (LeJOS 0.85). Each experiment: 240 seconds (800 Q-Learning iterations); 36 states; three actions: both motors forward; left motor backward, right motor stopped; left motor stopped, right motor backward
  • Slide 13 - Rewards. Either bump sensor pressed: 0.0. Base reward: 1.0 if both motors are going forward, 0.5 otherwise. Multiplier: 1 if the sonar value is greater than 20 cm; otherwise, (sonar value) / 20
  • Slide 14 - Parameters. Discount (γ): 0.5. Learning rate (α): 1/(1 + (t/100)), where t is the current iteration (time step); used for both SOM and Q-Learning [Smith 2002]. Exploration/exploitation: epsilon = α/4 is the probability of a random action; selected using weighted distribution
  • Slide 15 - Experimental Controls: Q-Learning without SOM. Qa states: current action (1-3); current bumper states; quantized sonar values (0-19 cm; 20-39; 40+). Qb states: current bumper states; quantized sonar values (9) (0-11 cm…; 84-95; 96+)
  • Slide 16 - SOM Formulations. 36 output nodes. Category “a”: length-5 input vectors (motor speeds, bumper values, sonar value). Category “b”: length-3 input vectors (bumper values, sonar value). All sensor values normalized to [0, 100]
  • Slide 17 - SOM Formulations. QSOM: based on [Smith 2002]; Gaussian neighborhood; neighborhood size is one-half SOM width. QT: based on [Touzet 1997]; learning rate is fixed at 0.9; neighborhood is immediate Manhattan neighbors; neighbor learning rate is 0.4
  • Slide 18 - Quantitative Results
  • Slide 19 - Qualitative Results. QSOMa: motor speeds ranged from 2% to 50%; sonar values stuck between 90% and 94%. QSOMb: sonar values ranged from 40% to 95%; best two runs arguably the best of the bunch. Very smooth SOM values in both cases
  • Slide 20 - Qualitative Results. QTa: sonar values ranged from 10% to 100%; still a weak performer on average; best performer similar to QTb. QTb: developed bump-sensor oriented behavior; made little use of sonar. Highly uneven SOM values in both cases
  • Slide 21 - Experimental Area
  • Slide 22 - First Movie: QSOMb. Strong performer (reward: 661.89). Minimum sonar value: 43.35% (110 cm)
  • Slide 23 - Second Movie: also QSOMb. Typical bad performer (reward: 451.6); learns to avoid by always driving backwards. Baseline “not-forward” reward: 400.0. Minimum sonar value: 57.51% (146 cm). Hindered by small filming area
  • Slide 24 - Discussion. Use of SOM on NXT can be effective; more research needed to address shortcomings. Heterogeneity of sensors is a problem: need to try NXT experiments with multiple sonars; previous work involved homogeneous sensors. Approachable by undergraduate students: technique taught in junior/senior AI course
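
The Q-Learning update on Slide 6, the reward on Slide 13, and the parameters on Slide 14 can be sketched together as follows. This is a minimal Python illustration, not the author's LeJOS code: the function names are hypothetical, the reward is assumed to be base reward times sonar multiplier (which matches the 400.0 "not-forward" baseline over 800 iterations on Slide 23), and the random action is drawn uniformly because the slides do not specify the "weighted distribution".

```python
import random

GAMMA = 0.5        # discount from Slide 14
NUM_STATES = 36    # Slide 12
NUM_ACTIONS = 3    # forward, back-left, back-right (Slide 12)


def learning_rate(t):
    """alpha = 1 / (1 + t/100), where t is the current iteration (Slide 14)."""
    return 1.0 / (1.0 + t / 100.0)


def reward(bumped, both_forward, sonar_cm):
    """Slide 13 reward, ASSUMING base reward and multiplier are multiplied."""
    if bumped:
        return 0.0
    base = 1.0 if both_forward else 0.5
    multiplier = 1.0 if sonar_cm > 20 else sonar_cm / 20.0
    return base * multiplier


def q_update(q, s, a, r, s_next, alpha, gamma=GAMMA):
    """Q(s,a) <- (1 - alpha) Q(s,a) + alpha (r + gamma max_a' Q(s',a'))."""
    best_next = max(q[s_next])
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (r + gamma * best_next)


def select_action(q, s, alpha, rng=random):
    """Explore with probability epsilon = alpha / 4, otherwise act greedily.

    ASSUMPTION: the exploratory action is uniform; the slides mention a
    weighted distribution without defining it.
    """
    epsilon = alpha / 4
    if rng.random() < epsilon:
        return rng.randrange(len(q[s]))
    return max(range(len(q[s])), key=lambda a: q[s][a])


# One illustrative step on an all-zero table (state indices are made up).
q_table = [[0.0] * NUM_ACTIONS for _ in range(NUM_STATES)]
alpha = learning_rate(t=0)
r = reward(bumped=False, both_forward=True, sonar_cm=35)
q_update(q_table, s=4, a=0, r=r, s_next=5, alpha=alpha)
next_action = select_action(q_table, 5, alpha)
```

Because α decays with the iteration count, both the step size of the update and the exploration probability shrink over the 800 iterations of a run.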

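The SOM steps on Slides 9 to 11 (present an input, find the winner by sum-of-squared-differences, pull the winner and its neighbors toward the input) can be sketched in Python as below. The sketch follows the QT formulation on Slide 17 (fixed rate 0.9 for the winner, 0.4 for immediate Manhattan neighbors) because it needs no neighborhood formula beyond what the slides state. All function names and the dictionary-based grid layout are assumptions for illustration, as is the row-major mapping from a winning node to a Q-table state index.

```python
def squared_distance(u, v):
    """Sum-of-squared-differences distance from Slide 10."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def find_winner(weights, x):
    """Return the (row, col) of the node whose ideal input is closest to x.

    `weights` maps (row, col) grid coordinates to weight vectors (lists).
    """
    return min(weights, key=lambda node: squared_distance(weights[node], x))


def manhattan_neighbors(node, rows, cols):
    """Grid cells at Manhattan distance 1 from `node`, clipped to the grid."""
    r, c = node
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols]


def som_update(weights, x, rows, cols, winner_rate=0.9, neighbor_rate=0.4):
    """Apply weight <- weight + rate * (input - weight) to winner and neighbors."""
    winner = find_winner(weights, x)
    targets = [(winner, winner_rate)]
    targets += [(n, neighbor_rate) for n in manhattan_neighbors(winner, rows, cols)]
    for node, rate in targets:
        w = weights[node]
        for j in range(len(w)):
            w[j] += rate * (x[j] - w[j])
    return winner


def state_index(node, cols):
    """ASSUMED row-major mapping from a winning node to a Q-table state index."""
    return node[0] * cols + node[1]


# Tiny 2x2 example with length-2 inputs normalized to [0, 100].
grid = {(0, 0): [0.0, 0.0], (0, 1): [10.0, 10.0],
        (1, 0): [50.0, 50.0], (1, 1): [100.0, 100.0]}
winner = som_update(grid, [90.0, 90.0], rows=2, cols=2)
state = state_index(winner, cols=2)
```

For the experiments described in the slides, the grid would hold 36 nodes (a 6 by 6 layout is a natural but unconfirmed reading) with length-5 or length-3 input vectors, and the index of the winning node would serve as the Q-Learning state.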