Scenarios & Q-Learning
Building scenario
While modifying the scenario I have not been able to figure out how to add static agents directly; the only ways I have managed to add agents are through flows and trips.
Along the way I have encountered several problems:
- Neither of them allows me to create two agents within 7 units of each other.
- Neither of them lets me create flows/trips individually; they have to be built as a list with a for loop.
- A trip does not work if no route can be found to its destination, i.e. if a road has a transit speed of 0 the trip fails; that is why I set a near-zero speed instead of 0, as in the lanes example below.
- To create static agents, the speed of the lanes has been set to 0.
- For some reason it does not allow me to add vehicles other than cars.
Lanes example:
<lane id="gneE3_0" index="0" speed="0.00000000000000000000000000000000001" length="200.00" shape="0.00,96.80 200.00,96.80"/>
<lane id="gneE3_1" index="1" speed="0" length="200.00" shape="0.00,100.00 200.00,100.00"/>
<lane id="gneE3_2" index="2" speed="0" length="200.00" shape="0.00,103.20 200.00,103.20"/>
Flows example:
flows=[
    Flow(
        route=Route(
            begin=("gneE3", 1, (i + 2) * 10),
            end=("gneE3", 1, "max"),
        ),
        rate=60,
        begin=0,
        end=40,
        actors={normal: 1},
    )
    for i in range(2)
],
Trips example:
trips=[
    Trip(
        vehicle_name=f"car_{i+1}",
        route=Route(
            begin=("gneE3", 0, (i + 1) * 8),
            end=("gneE3", 0, "max"),
        ),
        depart=0,
        actor=normal,
    )
    for i in range(x_cars)
]
Getting started with Q-Learning
The documentation available on this topic is very sparse, so I had to look for examples using the Python gym library.
Everything I have done from this point on is purely exploratory, as I am not entirely sure yet how it works.
To begin with, SMARTS uses an environment class whose step method returns the observation, the reward, extra information, and whether the simulation has finished:
for episode in episodes(n=num_episodes):
    agent = KeepLaneAgent()
    observation, _ = env.reset()
    episode.record_scenario(env.unwrapped.scenario_log)

    terminated = False
    while not terminated:
        action = agent.act(observation)
        # step() returns the new observation, the reward, the two done flags and extra info
        observation, reward, terminated, truncated, info = env.step(action)
        episode.record_step(observation, reward, terminated, truncated, info)
The first thing I realised is that this default behaviour is of no use to us, since the standard SMARTS reward is based on the distance to the destination. For this reason, I created a new class in which the reward depends on the distance to the target (the centre of the parking spot) and on the minimum distance to the obstacles: the closer an obstacle, the higher the penalty (always keeping in mind that, to park, the agent necessarily has to get close to the other cars). A rough sketch of this idea is shown below.
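The following is only an illustrative sketch of that reward shaping, assuming the observation exposes the ego position through ego_vehicle_state.position and the surrounding vehicles through neighborhood_vehicle_states (as in the SMARTS observation format); the parking-spot coordinates, the safety threshold and the penalty weight are made-up values.
import numpy as np

PARKING_CENTRE = np.array([100.0, 98.4])  # assumed centre of the parking spot
SAFE_DISTANCE = 2.0                       # assumed threshold below which obstacles are penalised
PENALTY_WEIGHT = 10.0                     # assumed weight of the proximity penalty


def parking_reward(obs) -> float:
    """Reward = progress towards the parking spot minus an obstacle-proximity penalty."""
    ego_pos = np.asarray(obs.ego_vehicle_state.position[:2])

    # Closer to the target -> higher (less negative) reward.
    reward = -np.linalg.norm(PARKING_CENTRE - ego_pos)

    # Penalise only when the ego gets closer than the safety threshold to any
    # other vehicle, since parking requires approaching the parked cars anyway.
    neighbours = obs.neighborhood_vehicle_states or []
    if neighbours:
        min_obstacle_dist = min(
            np.linalg.norm(np.asarray(v.position[:2]) - ego_pos) for v in neighbours
        )
        if min_obstacle_dist < SAFE_DISTANCE:
            reward -= PENALTY_WEIGHT * (SAFE_DISTANCE - min_obstacle_dist)

    return reward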
On the other hand, what I have investigated so far but understand least is how the actions work. In the SMARTS examples they are chosen randomly. As a starting point I have taken a Python example that uses Q-learning: https://github.com/Faust-Wang/DQN-ObstacleAvoidance-FixedWingUAV.
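To make the action side more concrete, the sketch below shows the standard tabular Q-learning pieces I am aiming for: an epsilon-greedy choice over a discrete action set and the usual update rule Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The action list, the discretised states and the hyperparameters are assumptions for illustration; they are not how SMARTS defines its action spaces.
import random
from collections import defaultdict

# Assumed discrete action set and hyperparameters, for illustration only.
ACTIONS = ["keep_lane", "slow_down", "change_lane_left", "change_lane_right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

# Q-table: maps a (discretised, hashable) state to one value per action.
q_table = defaultdict(lambda: [0.0] * len(ACTIONS))


def choose_action(state) -> int:
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    values = q_table[state]
    return values.index(max(values))


def update(state, action: int, reward: float, next_state, terminated: bool) -> None:
    """Standard one-step Q-learning update."""
    target = reward
    if not terminated:
        target += GAMMA * max(q_table[next_state])
    q_table[state][action] += ALPHA * (target - q_table[state][action])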