Starting Q-Learning 2D

Finding the optimal parking spot

For now, a simple solution is in place. It relies on knowing which axes the obstacles lie along: once we know whether depth runs along the x or the y axis, we split the point cloud into two subsets, one per obstacle, and in each subset find the point closest to the vehicle along the depth axis. This yields the corners in a relatively simple way.
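The idea above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name `find_corners`, the `(N, 2)` point layout relative to the vehicle, and the rule of splitting the cloud at the widest lateral gap are all assumptions made for the example.

```python
import numpy as np

def find_corners(points, depth_axis):
    """Return one corner per obstacle (hypothetical helper).

    points: (N, 2) array of obstacle points relative to the vehicle (assumed layout).
    depth_axis: 0 if depth runs along x, 1 if it runs along y.
    """
    points = np.asarray(points, dtype=float)
    lateral_axis = 1 - depth_axis

    # Sort along the lateral axis and split at the widest gap.
    # Assumption: that gap is the free space between the two obstacles.
    order = np.argsort(points[:, lateral_axis])
    sorted_pts = points[order]
    gaps = np.diff(sorted_pts[:, lateral_axis])
    split = int(np.argmax(gaps)) + 1
    subsets = (sorted_pts[:split], sorted_pts[split:])

    # In each subset, keep the point closest to the vehicle along the depth axis.
    corners = [s[np.argmin(np.abs(s[:, depth_axis]))] for s in subsets]
    return np.array(corners)

# Two obstacles separated along x, with depth measured along y.
cloud = [[0.0, 3.0], [1.0, 2.0], [0.5, 2.5], [4.0, 2.5], [5.0, 3.0]]
corners = find_corners(cloud, depth_axis=1)
```

The sketch needs at least two points and assumes exactly two obstacles; a real point cloud would probably need noise filtering before the split.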

Modifying states

return (
    self.discretize(distance_to_target), 
    self.discretize(heading_error, step=0.1, max_value=np.pi), 
    discretized_speed, 
    distance_difference,
    discretized_min_distance
)
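The post does not show `discretize` itself, so here is a minimal sketch of what such a helper might look like. The default `step` and `max_value`, the symmetric clipping, and the rounding rule are assumptions for illustration; only the `step` and `max_value` keyword names come from the call above.

```python
import numpy as np

def discretize(value, step=0.5, max_value=10.0):
    """Map a continuous value to an integer bin index (hypothetical version).

    The value is clipped to [-max_value, max_value] and then rounded to the
    nearest multiple of `step`, so the Q-table only sees a finite set of keys.
    """
    clipped = float(np.clip(value, -max_value, max_value))
    return int(round(clipped / step))

# Heading error binned in 0.1 rad steps, capped at pi, as in the state tuple.
heading_bin = discretize(np.pi / 2, step=0.1, max_value=np.pi)
```

Smaller steps give a finer state but a larger Q-table, so the step for each component is a trade-off between precision and how quickly the table fills during training.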

Modifying rewards

# 1. Distance to the target (the target is already in relative coordinates)
dist_to_target = np.linalg.norm(target_pose)
distance_reward = max(0, 1 - dist_to_target) * 100  # Max: 100, Min: 0

# 2. Orientation reward (only when close to the parking spot)
if dist_to_target < 0.5:
    orient_diff = np.abs(np.arctan2(np.sin(car_orient - target_orient), np.cos(car_orient - target_orient)))
    orientation_reward = max(0, 1 - orient_diff / np.pi) * 50  # Max: 50, Min: 0
else:
    orientation_reward = 0

# 3. Speed penalty (when close and moving fast)
if dist_to_target < 1 and speed > 2:
    speed_penalty = -10
else:
    speed_penalty = 0

# 4. Bonus for stopping correctly
if dist_to_target < 0.5 and abs(speed) < 0.1:
    stopping_bonus = 50
else:
    stopping_bonus = 0

# 5. Collision penalty (using the smallest LiDAR distance)
min_lidar_dist = np.min(np.linalg.norm(lidar_data, axis=1)) if len(lidar_data) > 0 else np.inf
if min_lidar_dist < 0.3:
    collision_penalty = -50
else:
    collision_penalty = 0

# 6. Total reward
reward = (
    distance_reward
    + orientation_reward
    + speed_penalty
    + stopping_bonus
    + collision_penalty
)
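The orientation term depends on wrapping the angle difference with `arctan2(sin, cos)`, which keeps it in the range from -pi to pi. The small sketch below isolates that step; the function name `wrapped_angle_diff` is made up for the example, and the sample angles are arbitrary.

```python
import numpy as np

def wrapped_angle_diff(a, b):
    """Absolute difference between two angles in radians, in [0, pi]."""
    return np.abs(np.arctan2(np.sin(a - b), np.cos(a - b)))

# Headings of 3.1 rad and -3.1 rad point in almost the same direction,
# even though their raw difference is 6.2 rad.
naive = abs(3.1 - (-3.1))
wrapped = wrapped_angle_diff(3.1, -3.1)
```

Without the wrap, a car that is almost aligned with the target but sits on the other side of the pi boundary would get close to no orientation reward.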

Coordinate and orientation problem

In SMARTS, all scenarios are parallel to one axis, i.e. there are no ‘diagonal’ roads. This keeps the calculations fairly easy, because the position only ever varies along one axis or the other, toward either negative or positive coordinates. The current approach relies on this. It has been tested in scenarios with different orientations (always parallel to an axis) and still works, but it would not work on diagonal roads.
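To make the axis-aligned assumption concrete, here is a rough sketch of how the travel axis and its sign could be read from a road heading. The function name `travel_axis`, the tolerance, and the heading convention (radians, measured counter-clockwise from the +x axis) are assumptions for this example and may not match the convention SMARTS uses.

```python
import numpy as np

def travel_axis(heading, tol=1e-2):
    """Return (axis, sign) for an axis-aligned heading (hypothetical helper).

    axis is 0 for x and 1 for y; sign is +1 or -1.
    Assumes heading is in radians, counter-clockwise from the +x axis.
    """
    dx, dy = np.cos(heading), np.sin(heading)
    if abs(dx) > 1 - tol:
        return 0, int(np.sign(dx))
    if abs(dy) > 1 - tol:
        return 1, int(np.sign(dy))
    # Diagonal roads fall through here: the single-axis logic no longer applies.
    raise ValueError("road is not parallel to an axis")

east = travel_axis(0.0)
south = travel_axis(-np.pi / 2)
```

A heading such as `np.pi / 4` raises the error, which is the diagonal case that the current approach does not handle.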

This post is licensed under CC BY 4.0 by the author.