Final steps q-learning 1D
Index
- Integrating RLidar enhancement into SMARTS
- Adding linear velocity as a state
- Exponential drop in training speed
- Random starting positions
- Example and data analysis
Integrating RLidar enhancement into SMARTS
To achieve this, the custom class ParkingAgent
has been modified by filtering and modifying the dictionary key obs[‘lidar_point_cloud’][‘point_cloud’]
. This function has been used for this purpose:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def filtrate_lidar(self, lidar_data: np.ndarray, car_pose: np.ndarray, heading: float) -> np.ndarray:
"""
Transforma los puntos LIDAR para que sean relativos al vehículo, con el índice 0 a 90° a la izquierda del agente.
Args:
lidar_data (np.ndarray): Datos del LIDAR en coordenadas absolutas.
car_pose (np.ndarray): Posición actual del vehículo en coordenadas absolutas.
heading (float): Ángulo de orientación del vehículo en radianes.
Returns:
np.ndarray: Datos LIDAR transformados en coordenadas relativas.
"""
# Asignar 'inf' a los puntos nulos (donde todo es [0, 0, 0])
lidar_data[np.all(lidar_data == [0, 0, 0], axis=1)] = float('inf')
# Calcular puntos relativos
relative_points = lidar_data - car_pose
heading_deg = np.degrees(heading)
num_points = len(lidar_data)
lidar_resolution = 360 / num_points
shift = int(round((heading_deg-90) / lidar_resolution))
# Aplicar el desplazamiento circular
rotated_lidar = np.roll(relative_points, shift=shift, axis=0)
return rotated_lidar
It works in a simple way, first all null measurements are set to infinite so that they do not disturb the process of moving to relative coordinates. Once we have all the data in relative coordinates, we have to remember that the data is always taken from the absolute orientation 0 radians. We calculate the current orientation of the vehicle and we pass it to angles to calculate the number of positions we have to roll our array (subtract 90º to ensure that we start the sweep from the left of the vehicle). The python function np.roll
makes things much simpler for us as it shifts the array in a circular way.
Adding linear velocity as a state
To add the linal velocity, it has been relatively simple, in fact as it was very modularised we only had to modify the function get_state
, which now returns a tuple with the discretised distance difference and the current velocity.
Usage example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def get_state(self, observation):
"""Extrae y discretiza el estado basado en el LiDAR y la velocidad del vehículo."""
lidar_data = observation["lidar_point_cloud"]["point_cloud"]
distances = np.linalg.norm(lidar_data, axis=1)
lidar_resolution = 360 / len(distances)
index_90 = int(round(90 / lidar_resolution))
index_270 = int(round(270 / lidar_resolution))
distance_90 = self.discretize(distances[index_90])
# print(f"Distancia delante: {distance_90}")
distance_270 = self.discretize(distances[index_270])
# print(f"Distancia detras: {distance_270}")
distance_difference = self.discretize(distance_270 - distance_90)
velocity = observation['ego_vehicle_state']['speed']
discretized_velocity = self.discretize(velocity, step=1, max_value=20)
return (distance_difference, discretized_velocity)
Exponential drop in training speed
This has been found not to be the case in 100% of cases and in fact the increase has dropped significantly. So far I have managed to complete more than 800 episodes, which was unthinkable before.
Random starting positions
It has not been possible to make much progress on this part for a number of reasons. In the scenario it is not possible to set multiple routes, you only have the possibility to set one route for the agent. On the other hand, there is a class called RandomRoute which generates random routes in each episode. SMARTS has created this class with the purpose of generating random routes as long as there are more than 1 road in the scenario (which is not the case). I have tried to modify this behaviour, but there are several nested classes that need several roads to generate the xml file with the routes. In fact, SMARTS has a comment that they are working on it to be able to generate random routes on a single road so that it starts on a random lane/position.
Example and data analysis
As it has always been trained from the same position, another simulation has been made with the q-table already formed, i.e. with the agent trained but from another position (one never registered as a state). The initial behaviour seems quite reasonable, as it takes a while for it to learn what to do with these new states, but once it has learnt I don’t understand what happens…