Final steps q-learning 1D

Posted Jan 6, 2025 Updated Jan 8, 2025

By David Duro 3 min read

Index

Integrating RLidar enhancement into SMARTS
Adding linear velocity as a state
Exponential drop in training speed
Random starting positions
Example and data analysis

Integrating RLidar enhancement into SMARTS

To achieve this, the custom ParkingAgent class has been modified to filter and transform the data stored under the dictionary key obs['lidar_point_cloud']['point_cloud']. The following function is used for this purpose:

  
def filtrate_lidar(self, lidar_data: np.ndarray, car_pose: np.ndarray, heading: float) -> np.ndarray:
    """
    Transforms the LiDAR points so they are relative to the vehicle, with index 0 at 90° to the left of the agent.

    Args:
        lidar_data (np.ndarray): LiDAR data in absolute coordinates.
        car_pose (np.ndarray): Current vehicle position in absolute coordinates.
        heading (float): Vehicle heading angle in radians.

    Returns:
        np.ndarray: LiDAR data transformed into relative coordinates.
    """
    # Set null points (where everything is [0, 0, 0]) to 'inf'.
    # Note: this modifies the input array in place.
    lidar_data[np.all(lidar_data == [0, 0, 0], axis=1)] = float('inf')

    # Compute the points relative to the vehicle
    relative_points = lidar_data - car_pose

    heading_deg = np.degrees(heading)
    num_points = len(lidar_data)
    lidar_resolution = 360 / num_points

    shift = int(round((heading_deg-90) / lidar_resolution))
    # Apply the circular shift
    rotated_lidar = np.roll(relative_points, shift=shift, axis=0)

    return rotated_lidar

It works in a simple way. First, all null measurements are set to infinity so that they do not disturb the conversion to relative coordinates. Once all the data is in relative coordinates, we have to remember that the sweep always starts at the absolute orientation of 0 radians. We therefore take the current heading of the vehicle and convert it to degrees to calculate how many positions the array has to be rolled (subtracting 90° so that the sweep starts from the left of the vehicle). The NumPy function np.roll makes this much simpler, as it shifts the array circularly.
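As a small illustration of the circular shift described above, here is a minimal sketch using a hypothetical 8-beam scan (45° per sample). The helper name roll_shift and the toy data are assumptions for this example only; the shift formula is the one used in filtrate_lidar.

```python
import numpy as np

def roll_shift(heading_rad: float, num_points: int) -> int:
    """Number of positions to roll, using the same formula as filtrate_lidar."""
    lidar_resolution = 360 / num_points  # degrees covered by each sample
    return int(round((np.degrees(heading_rad) - 90) / lidar_resolution))

# Hypothetical 8-beam scan; the values are just beam labels 0..7.
scan = np.arange(8)

# Heading of 180 degrees: (180 - 90) / 45 = 2 positions.
shift = roll_shift(np.pi, len(scan))
rolled = np.roll(scan, shift=shift, axis=0)

print(shift)   # 2
print(rolled)  # [6 7 0 1 2 3 4 5]
```

With a heading of 0 radians the shift is negative (-2 in this toy case), and np.roll then rotates the array in the opposite direction.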

Adding linear velocity as a state

Adding the linear velocity was relatively simple. Because the code was already quite modular, we only had to modify the get_state function, which now returns a tuple with the discretised distance difference and the discretised current velocity.

Usage example:

  
def get_state(self, observation):
    """Extracts and discretises the state from the LiDAR and the vehicle speed."""
    lidar_data = observation["lidar_point_cloud"]["point_cloud"]

    distances = np.linalg.norm(lidar_data, axis=1)
    lidar_resolution = 360 / len(distances)
    index_90 = int(round(90 / lidar_resolution))
    index_270 = int(round(270 / lidar_resolution))
    distance_90 = self.discretize(distances[index_90])
    # print(f"Front distance: {distance_90}")
    distance_270 = self.discretize(distances[index_270])
    # print(f"Rear distance: {distance_270}")
    distance_difference = self.discretize(distance_270 - distance_90)

    velocity = observation['ego_vehicle_state']['speed']
    discretized_velocity = self.discretize(velocity, step=1, max_value=20)

    return (distance_difference, discretized_velocity)
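The discretize method itself is not shown in this post. As a rough sketch of what such a helper might look like, the version below snaps a value to a grid of width step and clamps it to ±max_value. The default values (step=0.5, max_value=10.0) and the clamping behaviour are assumptions; only the step and max_value parameter names come from the call in get_state.

```python
def discretize(value: float, step: float = 0.5, max_value: float = 10.0) -> float:
    """Hypothetical helper: clamp to [-max_value, max_value] and snap to a grid of width `step`."""
    # Clamping also turns the 'inf' of empty LiDAR returns into a finite value.
    clipped = max(-max_value, min(max_value, value))
    return round(clipped / step) * step

# Toy readings (assumed values, not taken from the simulation).
front = discretize(2.2)                       # snaps to 2.0
rear = discretize(3.7)                        # snaps to 3.5
speed = discretize(3.7, step=1, max_value=20)  # snaps to 4.0

# Same structure as the tuple returned by get_state.
state = (discretize(rear - front), speed)
print(state)  # (1.5, 4.0)
```

Because the result is a plain tuple of numbers, it is hashable and can be used directly as a key in a dictionary-based Q-table.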

Exponential drop in training speed

The exponential drop in training speed has turned out not to occur in every run, and where it does appear the slowdown is now much smaller. So far I have managed to complete more than 800 episodes, which was unthinkable before.

Random starting positions

It has not been possible to make much progress on this part, for several reasons. A scenario cannot define multiple routes for the agent; only a single route can be set. There is a class called RandomRoute that generates a random route in each episode, but SMARTS designed it to work only when the scenario contains more than one road, which is not the case here. I have tried to modify this behaviour, but several nested classes need multiple roads in order to generate the XML file with the routes. In fact, the SMARTS code includes a comment saying that they are working on generating random routes on a single road, so that the agent starts in a random lane/position.

Example and data analysis

example video

Because the agent has always been trained from the same position, I ran another simulation with the already-trained Q-table but from a different starting position (one whose states had never been visited). The initial behaviour seems quite reasonable, since it takes a while for the agent to learn what to do in these new states, but once it has learnt, I don't understand what happens…
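To make the "never registered as a state" situation concrete, here is a minimal sketch of a dictionary-based Q-table in which an unseen state starts with all-zero values. The action set, the hyperparameters (epsilon, alpha, gamma) and the use of defaultdict are assumptions for illustration; the agent's actual Q-table code is not shown in this post.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1, 2)  # hypothetical action ids, e.g. reverse / hold / forward

# A state that has never been visited gets a row of zeros on first lookup.
q_table = defaultdict(lambda: [0.0] * len(ACTIONS))

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy choice; ties between equal Q-values are broken at random."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    row = q_table[state]
    best = max(row)
    return random.choice([a for a in ACTIONS if row[a] == best])

def update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])

# One update on a previously unseen state (toy reward and states).
update(state=(1.5, 4.0), action=2, reward=1.0, next_state=(1.0, 4.0))
print(q_table[(1.5, 4.0)])
```

In a new state every action has the same value, so the greedy choice is effectively random until a few updates have been applied. Under these assumptions, that would be consistent with the initial exploratory behaviour described above.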

phase 2

This post is licensed under CC BY 4.0 by the author.