Recolección de objetos con un brazo robótico y técnicas de aprendizaje por refuerzo (Object picking with a robotic arm and reinforcement learning techniques)

Iglesia Fernández-Tresguerres, Pablo

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/51527

Title:	Recolección de objetos con un brazo robótico y técnicas de aprendizaje por refuerzo (Object picking with a robotic arm and reinforcement learning techniques)
Authors:	Juhel, Philippe; Iglesia Fernández-Tresguerres, Pablo; Universidad Pontificia Comillas, Escuela Técnica Superior de Ingeniería (ICAI)
Publication date:	2021
Abstract:	The fourth industrial revolution is here, and it will change the way goods are produced, raising efficiency by increasing the number of automated processes. This will lead to faster production and fewer errors, as machines can decide and act in fractions of a second without making mistakes. Furthermore, machines can work 24 hours a day, stopping only for maintenance checks, which would help to increase productivity without increasing spending on human resources. Industry 4.0 has been part of our conversations since 2011, but the truth is that it is not yet a reality. We are only at the beginning, and such a big change in industry will take decades. Several factors must be kept in mind when analysing how industry will evolve in the coming years: the improvement in telecommunications brought by 5G networks, the moral dilemma of replacing workers with machines and the impact this could have on society, and the improvement and deployment of AI technologies are just some of them. This project aims to give a small boost to this last group by implementing, from scratch, an Artificial Intelligence algorithm that performs a Pick and Place task with a robotic arm. The goal is to pick up, one at a time, items placed randomly in a box and deposit them in a second box. This task may not seem very complex, since we are used to seeing robots perform all kinds of tricks, but the point of this project is that the robot itself must learn to perform the task on its own, with help only from certain sensors. How is this going to be implemented?
Using a Reinforcement Learning algorithm. Reinforcement Learning is an area of Machine Learning whose objective is precisely for an agent (in this case, the robot) to interact with its environment, performing a series of actions that allow it to reach a goal. It is inspired by the way animals learn: just as dogs learn by receiving treats when they do something well or punishments when they do something wrong, a Reinforcement Learning agent learns from receiving positive or negative rewards. The agent has no knowledge of what the final goal is, and it does not need to, because its only objective is to maximize the reward it obtains. Reinforcement Learning training is divided into steps. In each step, the agent takes an action and receives a reward that can be positive, negative or neutral, depending on whether it is moving towards or away from the final goal. The rewards are a hyperparameter of the algorithm, and defining them correctly, aligned with the final objective of the algorithm, is one of the most critical aspects of Reinforcement Learning, since the agent's objective is to maximize them. Applying Reinforcement Learning to real-world problems is very complicated, because the agent must receive information from its environment before deciding which action best maximizes the reward. The following images show the architecture of the implemented system, to help understand the needs of the algorithm.
Fig. 0.7: Image of the architecture I
Fig. 0.8: Image of the architecture I
The figures show the robot used, a UR3 from Universal Robots, the smallest of its family. They also show the start and end boxes, marked in green; all the required computational nodes, marked in blue; and the sensors and actuators, marked in orange.
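The step-by-step learning process described above can be sketched as a short loop. This is a minimal, hypothetical illustration, not the thesis code. The environment is reduced to a `step(action)` function returning `(reward, done)`, and the policy is any callable that returns an action index. The discount factor `gamma` is an assumption: the abstract only says that the agent maximizes the reward it obtains.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the thesis):
# a policy returns an action index, and step(action) returns (reward, done).
Policy = Callable[[], int]
StepFn = Callable[[int], Tuple[float, bool]]


def run_episode(policy: Policy, step: StepFn, max_steps: int) -> List[float]:
    """Run one episode: at each step the agent acts and receives a reward."""
    rewards: List[float] = []
    for _ in range(max_steps):
        action = policy()
        reward, done = step(action)
        rewards.append(reward)
        if done:  # e.g. the source box has been emptied
            break
    return rewards


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Sum of r_t * gamma**t, the return the agent tries to maximize in expectation."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


if __name__ == "__main__":
    # Toy scripted episode: two moves (-1 each), then a successful pick (+10).
    script = iter([0, 1, 4])

    def scripted_policy() -> int:
        return next(script)

    def toy_step(action: int) -> Tuple[float, bool]:
        # In this toy, action 4 stands for "Pick" and ends the episode.
        return (10.0, True) if action == 4 else (-1.0, False)

    rewards = run_episode(scripted_policy, toy_step, max_steps=10)
    print(rewards, round(discounted_return(rewards, 0.9), 3))  # [-1.0, -1.0, 10.0] 6.2
```

With `gamma = 0.9`, the late +10 is worth 8.1 at the start of the episode. This is why an agent that maximizes the discounted return is pushed to reach a successful pick in fewer steps.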
We will discuss the computers and the logical architecture later, but it is worth pausing on the actuators, since they are ultimately what interacts with the environment. The robot has been fitted with a gripper developed by us at ICAM, which picks up the tokens by vacuum. The system is very simple: the tip of the robot is a suction cup that compresses when it comes into contact with a piece. When the suction cup is compressed, an air suction pump is switched on, creating a vacuum between the suction cup and the piece. Finally, the piece is carried to its final destination and the pump is switched off, letting the piece drop. The gripper also contains two sensors that are essential for implementing the algorithm: one detects when the suction cup is in contact with a token, and the other detects, after the gripper is switched on, whether it has managed to grasp the token.
Finally, the image shows two cameras, whose images are the information the agent uses to decide which action to take at each step. One camera is mounted on the gripper itself, giving the agent a partial view of the environment. The other is mounted on a structure that provides a global view of the environment. Figure 0.9 shows the view from both cameras.
Fig. 0.9: View from the onboard and upper cameras
All these components are reflected in a complex but necessary logical architecture. The architecture is shown in Figure 0.10 and has been implemented using ROS (Robot Operating System). This framework, which is becoming an industry standard, allows us to control the robotic arm remotely and to build a distributed architecture that provides enormous flexibility when developing solutions.
Fig. 0.10: Logical architecture of the system
In our case, the distributed architecture has allowed us to:
- use different versions of Python and of libraries in each node;
- use computers with different computing power depending on the needs, from a Raspberry Pi to a computer optimized for AI with one of the most powerful GPUs on the market;
- split the code into logical units, so that developing one node does not affect all the others.
Figure 0.10 shows the following nodes, which communicate with each other:
- Universal Robots Driver: drivers to control the robot remotely.
- Robot Controller: the central node of the system. All the actions are implemented here, and it repeatedly executes the actions indicated by the AI Manager.
- Gripper: a node implemented on an Arduino board that is in continuous contact with the Robot Controller to execute the actions.
- Upper Camera: a node that constantly publishes the images captured by the upper camera on a topic.
- Onboard Camera: a node that constantly publishes the images captured by the onboard camera on a topic.
- AI Manager: the intelligence of the system, where the Reinforcement Learning algorithm is implemented.
The Robot Controller represents the agent: it takes an action and asks the AI Manager to compute the next action that maximizes the reward. The AI Manager collects the state image from the Onboard Camera node and computes the action. With this picture of the architecture, we can return to the Reinforcement Learning algorithm. Five basic actions have been defined: North, South, East, West and Pick. The agent must be able to empty the box of tokens using a combination of these five actions.
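The five actions and the two gripper sensors described above can be tied together in a small sketch. It is hypothetical throughout. The grid-cell coordinates, the choice of +y as North, and the names `Action`, `PickOutcome`, `next_position` and `classify_pick` are illustrative assumptions. They are not the interfaces of the thesis's Robot Controller or Gripper nodes.

```python
from enum import Enum
from typing import Dict, Tuple

Position = Tuple[int, int]


class Action(Enum):
    """The five basic actions available to the agent."""
    NORTH = 0
    SOUTH = 1
    EAST = 2
    WEST = 3
    PICK = 4


class PickOutcome(Enum):
    SUCCESSFUL_PICK = "successful_pick"
    FALSE_PICK = "false_pick"


# Assumed convention: the gripper moves one grid cell per action, North is +y.
MOVES: Dict[Action, Position] = {
    Action.NORTH: (0, 1),
    Action.SOUTH: (0, -1),
    Action.EAST: (1, 0),
    Action.WEST: (-1, 0),
}


def next_position(pos: Position, action: Action) -> Position:
    """Gripper position after an action; PICK does not move it in the plane."""
    dx, dy = MOVES.get(action, (0, 0))
    return (pos[0] + dx, pos[1] + dy)


def classify_pick(contact_detected: bool, object_gripped: bool) -> PickOutcome:
    """Label a PICK attempt from the gripper's two sensors (hypothetical logic).

    contact_detected: the suction cup was compressed against a token,
                      so the vacuum pump was switched on.
    object_gripped:   after switching on the pump, the token is held.
    """
    if contact_detected and object_gripped:
        return PickOutcome.SUCCESSFUL_PICK
    return PickOutcome.FALSE_PICK
```

For example, `next_position((2, 3), Action.WEST)` gives `(1, 3)`, and `classify_pick(True, False)` gives `PickOutcome.FALSE_PICK`. In this sketch, both sensors are needed to tell a successful pick from a false one.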
To guide the agent, the following rewards have been defined:
- -10 for going outside the limits of the box
- -10 for a false pick
- +10 for a successful pick
- -1 for any other action
To process all the information in the images and cope with the real world, this project uses an advanced version of Reinforcement Learning: Deep Reinforcement Learning. This variant combines Reinforcement Learning with Convolutional Neural Networks, which are neural networks optimized to process and classify images. In Deep Reinforcement Learning, a first convolutional layer extracts the features (the relevant information) from the image. These features are fed into a second, dense neural network that computes a Q-value (the expected cumulative reward) for each action. This algorithm is explained in depth in the state-of-the-art section of this document, and its implementation for this particular project in the "Developed System" section.
The project, including the system and the algorithm used, is very complex, and it was not feasible to reach a final version of the algorithm in just 5 months. However, very positive results have been obtained, meeting all the objectives set at the beginning of the project. As an example, the following graphs show the evolution of the agent in two metrics: actions per episode and picks per episode.
Fig. 0.11: Evolution of the steps per episode
Fig. 0.12: Evolution of the picks per episode
These samples are only a small preview of the project results; the full results report is in the Results section of this document. In any case, these results show that the Reinforcement Learning algorithm has been implemented successfully, greatly improving on the results of the random algorithm.
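The reward scheme and the Q-value computation described above can be combined into a short sketch of how a DQN-style agent chooses actions and builds its learning target. It is a hedged illustration under stated assumptions:
- the convolutional and dense network is replaced by plain lists of five Q-values (one per action, in the order North, South, East, West, Pick);
- the event names, `gamma` and `epsilon` are hypothetical;
- details the abstract does not mention, such as replay buffers or target networks, are left out.

```python
import random
from typing import Sequence

# Rewards listed in the abstract; any other action costs -1.
# The dictionary keys are hypothetical event names.
REWARDS = {
    "out_of_box": -10.0,
    "false_pick": -10.0,
    "successful_pick": 10.0,
}


def reward(event: str) -> float:
    """Reward for the outcome of one step."""
    return REWARDS.get(event, -1.0)


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the action with the highest estimated Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise exploit the Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def q_learning_target(r: float, next_q_values: Sequence[float], gamma: float, done: bool) -> float:
    """Bellman target y = r + gamma * max_a' Q(s', a'), or just r at the end of an episode."""
    if done:
        return r
    return r + gamma * max(next_q_values)


if __name__ == "__main__":
    rng = random.Random(0)
    q_next = [0.5, -1.0, 2.0, 0.0, 4.0]  # made-up network output for the next image
    print(greedy_action(q_next))  # 4, i.e. Pick
    print(q_learning_target(reward("move"), q_next, gamma=0.9, done=False))
    print(epsilon_greedy(q_next, epsilon=0.1, rng=rng))
```

In a DQN, the network's Q-value for the action actually taken would be trained towards `q_learning_target(...)`. The epsilon-greedy rule keeps some random exploration, which is how a learning agent departs from the purely random baseline.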
Description:	Máster Universitario en Ingeniería de Telecomunicación (Master's Degree in Telecommunication Engineering)
URI:	http://hdl.handle.net/11531/51527
Appears in collections:	H67-Trabajos Fin de Máster (Master's theses)

Files in this item:

File	Description	Size	Format
Autoria.pdf	Authorization	291.34 kB	Adobe PDF
Master thesis report - Pablo Iglesia.pdf	Master's thesis	7.19 MB	Adobe PDF
