Starting DQN 2D
Finding the optimal parking spot
To make the detection more robust, the points are now divided into two sets separated by at least a minimum distance (the distance a gap needs in order to count as a parking spot). Once the points are split into two subsets, the corners of each are extracted and the midpoint between the two corner points is calculated. Depth is not implemented yet.
To calculate the target dynamically, the position of the parking spot relative to the initial position is saved. As the vehicle moves, the position of the parking spot relative to the vehicle's current position is recalculated.
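The splitting step described above can be sketched roughly as follows. This is only an illustration under some assumptions that are not in the original: the detected points are 2D tuples, the row of parked vehicles runs roughly along the x axis (so sorting by x puts neighbouring points next to each other), and the "corners" are the two points that face each other across the gap. The names `split_by_gap` and `gap_midpoint` are hypothetical.

```python
import math


def split_by_gap(points, min_gap):
    """Split points into two subsets at the first gap of at least min_gap.

    Assumes the points lie roughly along the x axis, so they are sorted by x.
    Returns (first_subset, second_subset), or None if no gap is wide enough.
    """
    pts = sorted(points)
    for i in range(len(pts) - 1):
        if math.dist(pts[i], pts[i + 1]) >= min_gap:
            return pts[:i + 1], pts[i + 1:]
    return None


def gap_midpoint(points, min_gap):
    """Return the midpoint between the two corners that border the gap."""
    split = split_by_gap(points, min_gap)
    if split is None:
        return None
    first, second = split
    corner_a = first[-1]   # last point of the first subset
    corner_b = second[0]   # first point of the second subset
    return ((corner_a[0] + corner_b[0]) / 2, (corner_a[1] + corner_b[1]) / 2)


# Hypothetical detections: two parked vehicles with a free space between them.
detections = [(0, 0), (1, 0), (2, 0), (7, 0), (8, 0)]
midpoint = gap_midpoint(detections, min_gap=4)
```

Here the gap is between `(2, 0)` and `(7, 0)`, so the midpoint lies halfway between those two corners. Depth and the vehicle's width are ignored, as in the text.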
- The corners of the vehicles are shown.
- The midpoint (black) is shown. This is the target; half the vehicle's width still has to be added.
- A line from the initial position to the target shows that the target is expressed relative to the start.
- The current position of the vehicle is shown (blue).
- A line (also blue) marks the position of the target relative to the current position.
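The dynamic target calculation can be illustrated with a small frame transformation. This is a sketch, not the project's actual code: it assumes the vehicle's pose is available as `(x, y, yaw)` in the frame of the initial position, with `yaw` in radians, and that the target was saved in that same initial frame. The function name `target_in_vehicle_frame` is hypothetical.

```python
import math


def target_in_vehicle_frame(target_xy, pose):
    """Express a target saved in the initial frame relative to the current pose.

    target_xy: (x, y) of the parking spot in the initial frame.
    pose: (x, y, yaw) of the vehicle in the initial frame, yaw in radians.
    """
    x, y, yaw = pose
    # Offset from the vehicle to the target, still in the initial frame.
    dx = target_xy[0] - x
    dy = target_xy[1] - y
    # Rotate the offset by -yaw to express it in the vehicle's own axes.
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    rel_x = cos_y * dx + sin_y * dy
    rel_y = -sin_y * dx + cos_y * dy
    return rel_x, rel_y


# Target saved at the start, then the vehicle moves and turns 90 degrees left.
saved_target = (5.0, 2.0)
relative = target_in_vehicle_frame(saved_target, (1.0, 0.0, math.pi / 2))
```

With the vehicle at `(1, 0)` facing along the y axis, the target ends up about 2 units ahead and 4 units to the right, which is what the blue line in the visualisation would represent.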
Starting DQN
The problem with Q-learning was that the Q-table (which is a dictionary) had grown to more than 10,000 states, which slowed down execution and made learning impossible. For this reason, it was decided to move to DQN.
For the first tests we will use a common architecture: a fully connected neural network with two hidden layers and an output layer that returns the Q-values for the possible actions. We will use ReLU as the activation function, Adam as the optimiser, and mean squared error as the loss function.
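import torch
import torch.nn as nn


class DQN(nn.Module):
    """Fully connected network that maps a state to one Q-value per action."""

    def __init__(self, state_size, action_size):
        super().__init__()
        self.fc1 = nn.Linear(state_size, 64)   # first hidden layer
        self.fc2 = nn.Linear(64, 64)           # second hidden layer
        self.fc3 = nn.Linear(64, action_size)  # output layer: one Q-value per action

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values, no activation on the output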
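To show how the network, Adam and the mean squared error loss fit together, here is a minimal sketch of a single training step. Several things are assumptions for illustration and do not come from the project: the sizes `state_size = 4` and `action_size = 3`, the discount factor `gamma`, the learning rate, and the `train_step` helper. For brevity the same network computes the bootstrap targets; a full DQN would normally also use a replay buffer and a separate target network.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class DQN(nn.Module):
    def __init__(self, state_size, action_size):
        super().__init__()
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, action_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)


# Hypothetical sizes and hyperparameters, chosen only for this example.
state_size, action_size = 4, 3
gamma = 0.99

policy_net = DQN(state_size, action_size)
optimizer = optim.Adam(policy_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()


def train_step(states, actions, rewards, next_states, dones):
    """Run one gradient step on a batch of transitions and return the loss."""
    # Q-values of the actions that were actually taken.
    q_values = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bootstrap targets: r + gamma * max_a' Q(s', a'), zeroed for terminal states.
    with torch.no_grad():
        next_q = policy_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)
    loss = loss_fn(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage with a random batch of 8 transitions (placeholder data).
batch = 8
loss = train_step(
    torch.randn(batch, state_size),
    torch.randint(0, action_size, (batch,)),
    torch.randn(batch),
    torch.randn(batch, state_size),
    torch.zeros(batch),
)
```

In a real run the batch would come from the simulator's transitions rather than random tensors.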