
Fixing RLidar and final steps of 1D Q-learning


Fixing RLidar

For RLidar to work, SMARTS generates a fixed set of rays once and then translates them on every iteration according to the current origin, which is computationally much lighter than regenerating all the rays each time. For some reason, when the origin does not change between iterations, the origin itself is detected as an obstacle, which produced erratic and unexpected behaviour.

Once the error was located, the simplest way to deal with it was to add a small amount of noise to the origin, so that it is never exactly the same in two consecutive iterations.

import numpy as np

def add_origin_perturbation(origin, magnitude=0.001):
    """Add a small random noise to the lidar origin."""
    # Uniform noise on the x and y coordinates only; z is left untouched
    perturbation = np.random.uniform(-magnitude, magnitude, size=2)
    return origin + np.array([perturbation[0], perturbation[1], 0])
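
As a rough usage sketch (the variable names below, such as sensor_origin, are illustrative and not the actual SMARTS internals), the perturbation is simply applied to the stored origin before the rays are translated on each step:

# Illustrative only: perturb the origin every iteration so that two
# consecutive origins are never identical (names are hypothetical).
perturbed_origin = add_origin_perturbation(sensor_origin)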

Creating a test for the inference phase

Several programs have been created for this phase. One of them saves the Q-table generated in the previous section to a file, q_table.py, which is used as the basis for the test. Once the table has been obtained after training, a test episode is run from the desired starting position, with every decision taken according to the learned table.
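
A minimal sketch of the idea, assuming the table is stored as a NumPy array indexed by discretised state and action; the helper names below (save_q_table, greedy_action) are illustrative, not the actual project code:

import numpy as np

def save_q_table(q_table, path="q_table.py"):
    # Write the trained table as a Python literal so the test script can import it
    with open(path, "w") as f:
        f.write("import numpy as np\n")
        f.write(f"q_table = np.array({q_table.tolist()})\n")

def greedy_action(q_table, state_index):
    # During the test episode there is no exploration: always pick the
    # action with the highest value for the current state
    return int(np.argmax(q_table[state_index]))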

Random starting positions

This part was quite complicated, since the scenario itself could not be modified to start in random positions. Instead, a small routine was added that draws a random number within a range, in this case [-3, 3], and adds it to the current pose to produce a target (see the sketch below). With a simple algorithm the agent moves towards that position, and it stops once it gets there or once it has used up the maximum number of steps allotted to this phase.

In this way we spend a few steps of every episode on this alignment, but we make sure that the agent always starts from a random position.
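
A minimal sketch of the target generation under these assumptions (variable and constant names are illustrative):

import random

OFFSET_RANGE = 3.0  # offset drawn uniformly from [-3, 3], as described above

def random_target(current_position):
    # Add a random offset to the current pose to obtain the target position
    return current_position + random.uniform(-OFFSET_RANGE, OFFSET_RANGE)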

def move_to_random_position(self, current_position, target_position, accelerate, steps, first_act):
    """Move the vehicle towards a target position."""

    distance = target_position - current_position
    action = 0

    # Decide whether to move forward or backward
    if accelerate:
        action = 8 if distance > 0 else -8

    # Stop when we are close enough or when the maximum number of steps is reached
    if abs(distance) < 0.25 or steps == MAX_ALIGN_STEPS:
        # print(f"finished, current pose: {current_position}")
        action = -first_act

    return np.array([action, 0.0])

##################................###########################
        while not terminated:
            # Move to the random starting position
            if not moved:
                # Index 0 is the one used in our scenario
                action = agent.move_to_random_position(observation["ego_vehicle_state"]["position"][0], target, accelerate, n_steps, first_action[0])
                accelerate = False

                if action[0] + first_action[0] == 0:
                    moved = True

                if n_steps == 0:
                    first_action = action

                observation, _, _, _, _ = env.step((action[0], action[1]))
                n_steps = n_steps + 1
                # print(observation['ego_vehicle_state']['speed'])

            # We have to make sure we ALWAYS spend MAX_ALIGN_STEPS steps, so the training is not affected
            elif n_steps <= MAX_ALIGN_STEPS:
                # print(observation['ego_vehicle_state']['speed'])
                observation, _, _, _, _ = env.step((0.0, 0.0))
                n_steps = n_steps + 1
##################................###########################           

Example and data analysis

Here is the graph of the distance difference when the agent starts centred:
