Fleet environment
Subpackages
Submodules
Episode module
Environment module
- class fleetrl.fleet_env.fleet_environment.FleetEnv(env_config)[source]
Bases:
Env
FleetRL: Reinforcement Learning environment for commercial vehicle fleets. Author: Enzo Alexander Cording - https://github.com/EnzoCording Master’s thesis project, M.Sc. Sustainable Energy Engineering @ KTH Copyright (c) 2023, Enzo Cording
This framework is built on the gymnasium core API and inherits from it. __init__, reset, and step are implemented, calling other modules and functions where needed. Base-derived class architecture is implemented, and the code is structured in a modular manner to enable improvements or changes in the model.
Only publicly available data or self-generated data has been used in this implementation.
The agent only sees information coming from the chargers: SOC, how long the vehicle is still plugged in, etc. However, this framework matches the number of chargers with the number of cars to reduce complexity. If more cars than chargers should be modelled, an allocation algorithm is necessary. What is more, battery degradation is modelled in this environment. In this case, the information of the car is required (instead of the charger). Modelling is facilitated by matching cars and chargers one-to-one. Therefore, throughout the code, “car” and “ev_charger” might be used interchangeably as indices.
Note that this is not a simplification from the agent’s perspective: the agent only handles the SOC and the time left at the charger, regardless of whether vehicles and chargers are matched one-to-one.
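As a minimal sketch of the one-to-one matching described above (all names here are hypothetical illustrations, not the actual FleetRL API), the per-charger information the agent sees could be assembled like this:

```python
# Hypothetical sketch: with one-to-one car/charger matching, index i refers
# to both car i and charger i, so the agent only needs charger-level info.

def build_observation(socs, hours_left):
    """Flatten per-charger state (SOC, time left plugged in) into one vector."""
    assert len(socs) == len(hours_left), "one charger per car"
    obs = []
    for soc, t_left in zip(socs, hours_left):
        obs.extend([soc, t_left])
    return obs

# Two cars/chargers: SOCs of 0.4 and 0.9, plugged in for 3 h and 0 h (departed).
obs = build_observation([0.4, 0.9], [3.0, 0.0])
print(obs)  # [0.4, 3.0, 0.9, 0.0]
```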
- adjust_caretaker_lunch_soc()[source]
The caretaker target SOC can be set lower during the lunch break to avoid unfair penalties, because the break is not long enough to charge up to the 0.85 target SOC. :return: None -> sets the target SOC during lunch break hours to 0.65 by default
- auto_gen()[source]
This function automatically generates schedules as specified, using the ScheduleGenerator module. Note: this can take 1-3 hours, depending on the number of vehicles.
- Returns:
None -> The schedule is generated and placed in the input folder
- choose_observer()[source]
This function chooses the right observer, depending on whether price, building load, PV, etc. are included. :return: obs (Observer) -> the chosen observer module
- choose_time_picker(time_picker)[source]
Chooses the right time picker based on the specified input string. Static: always picks the same start time for an episode. Random: starts an episode at a random time from the training set. Eval: starts an episode at a random time from the validation set. :type time_picker: str :param time_picker: specifies which time picker to choose: “static”, “eval”, “random” :return: tp (TimePicker) -> time picker object
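The dispatch described above can be sketched as a simple string-to-class mapping (class names here are illustrative stand-ins, not FleetRL’s actual time picker classes):

```python
# Illustrative sketch of mapping the input string to a time picker object.
class StaticTimePicker:
    """Always starts episodes at the same time."""

class RandomTimePicker:
    """Starts episodes at a random time from the training set."""

class EvalTimePicker:
    """Starts episodes at a random time from the validation set."""

def choose_time_picker(time_picker: str):
    pickers = {
        "static": StaticTimePicker,
        "random": RandomTimePicker,
        "eval": EvalTimePicker,
    }
    try:
        return pickers[time_picker]()
    except KeyError:
        raise ValueError(f"unknown time picker: {time_picker!r}")

tp = choose_time_picker("random")
print(type(tp).__name__)  # RandomTimePicker
```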
- close()[source]
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections.
- detect_dim_and_bounds()[source]
This function chooses the right dimension of the observation space based on the chosen configuration; each increase of dim is explained in the code. The low_obs and high_obs bounds are built in the normalizer object, using the dim value calculated in this function. It sets the boundaries of the observation space and detects whether observations are normalized. If the aux flag is true, additional information enlarges the observation space. The code goes through all possible environment setups; depending on the setup, the dimensions differ and each case is handled separately.
- Returns:
low_obs and high_obs: tuple[float, float] | tuple[np.ndarray, np.ndarray] -> used for gym.Spaces
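A hedged sketch of how such bounds could be produced (the dimension value and the defaults below are made up for illustration, not FleetRL’s actual logic): normalized setups can use flat [0, 1] scalar bounds, while unnormalized setups need per-feature arrays suitable for a gym.spaces.Box.

```python
def detect_bounds(dim: int, normalized: bool):
    """Return (low_obs, high_obs) usable for building a gym.spaces.Box.

    Sketch only: normalized observations lie in [0, 1]; unnormalized
    observations get permissive per-feature -inf/inf bounds.
    """
    if normalized:
        return 0.0, 1.0
    low = [-float("inf")] * dim
    high = [float("inf")] * dim
    return low, high

low, high = detect_bounds(dim=14, normalized=False)
print(len(low), len(high))  # 14 14
```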
- get_dist_factor()[source]
This function returns the distribution/laxity factor: how much charging time is needed vs. how much time is left at the charger. If the factor is 0.1, the dist agent would charge with only 10% of the EVSE capacity. Call via env_method(“get_dist_factor”)[0] if using an SB3 vectorized environment. :return: dist/laxity factor, float
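The laxity factor described above can be sketched as follows (a hypothetical helper with made-up edge-case handling, not the actual FleetRL implementation):

```python
def dist_factor(hours_needed: float, hours_left: float) -> float:
    """Ratio of charging time needed to time left at the charger.

    A factor of 0.1 means the distribution agent would charge with only
    10% of the EVSE capacity; a factor of 1 means there is no slack.
    """
    if hours_left <= 0:
        return 1.0  # no time left: charge at full power
    return min(hours_needed / hours_left, 1.0)

# 0.5 h of charging needed, 5 h left plugged in -> factor 0.1
factor = dist_factor(hours_needed=0.5, hours_left=5.0)
print(factor)  # 0.1
```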
- get_log()[source]
This function can be called through SB3 vectorized environments via VecEnv.env_method(“get_log”)[0]. The zero index is required so that only the first element, the DataFrame, is extracted.
- Returns:
Log dataframe
- get_next_dt()[source]
Calculates the time delta between the current time step and the next one. This allows for csv input files that have irregular time intervals. Energy calculations automatically adjust for the dynamic time differences through kWh = kW * dt.
- Returns:
next time delta in hours
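The relation kWh = kW * dt from the docstring above can be sketched for irregular timestamps (function and variable names are illustrative):

```python
from datetime import datetime

def next_dt_hours(current: datetime, nxt: datetime) -> float:
    """Time delta to the next step, in hours, tolerating irregular intervals."""
    return (nxt - current).total_seconds() / 3600.0

t0 = datetime(2023, 6, 1, 10, 0)
t1 = datetime(2023, 6, 1, 10, 15)  # irregular 15-minute step

dt = next_dt_hours(t0, t1)  # 0.25 h
energy_kwh = 11.0 * dt      # kWh = kW * dt, for an 11 kW charger
print(dt, energy_kwh)  # 0.25 2.75
```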
- get_next_minutes()[source]
Calculates the integer number of minutes until the next time step. This limits the framework’s current maximum resolution to discrete time steps of 1 min. This will be improved soon, as will the dependency on knowing the future value beforehand.
- Returns:
Integer of minutes until next timestep
- print(action)[source]
The print function can provide useful information about the environment dynamics and the agent’s actions. It can slow down training throughput (FPS) due to printing at each timestep.
- Parameters:
action – Action of the agent
- Returns:
None -> Just prints information if specified
- render()[source]
Compute the render frames as specified by render_mode during the initialization of the environment. The environment’s metadata render modes (env.metadata[“render_modes”]) should contain the possible ways to implement the render modes. In addition, list versions for most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
- Note:
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
None (default): no render is computed.
“human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.
“rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
“ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
“rgb_array_list” and “ansi_list”: List-based versions of render modes are possible (except “human”) through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() or reset() is called.
- Note:
Make sure that your class’s metadata "render_modes" key includes the list of supported modes.
Changed in version 0.25.0: The render function was changed to no longer accept parameters; these parameters should instead be specified during environment initialisation, i.e., gymnasium.make("CartPole-v1", render_mode="human")
- reset(**kwargs)[source]
- Parameters:
kwargs – Necessary for gym inheritance
- Return type:
tuple[array, dict]
- Returns:
First observation (either normalized or not) and an info dict
- set_start_time(start_time)[source]
Sets the start time of the episode. When using an SB3 vectorized environment, call via VecEnv.env_method(“set_start_time”, [f”{start_time}”]); the function name and argument must be parsed. :type start_time: str :param start_time: string of a pd.Timestamp / date :return: None
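A sketch of what such a setter might do (the class and parsing below are stand-ins; FleetRL uses pd.Timestamp, while this dependency-free sketch uses datetime):

```python
from datetime import datetime

class EnvSketch:
    """Minimal stand-in illustrating the setter; not the real FleetEnv."""

    def __init__(self):
        self.start_time = None

    def set_start_time(self, start_time: str) -> None:
        # Parse the date string into a timestamp.
        self.start_time = datetime.fromisoformat(start_time)

env = EnvSketch()
# With an SB3 VecEnv, the equivalent call would be:
# vec_env.env_method("set_start_time", [f"{start_time}"])
env.set_start_time("2023-06-01 08:00:00")
print(env.start_time)  # 2023-06-01 08:00:00
```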
- step(actions)[source]
The main logic of the EV charging problem is orchestrated in the step function. Input: Action on charging power for each EV Output: Next state, reward
Intermediate processes: EV charging model, battery degradation, cost calculation, building load, penalties, etc.
The step function runs as long as the done flag is False. Different functions and modules are called in this function to reduce the complexity and to distribute the tasks of the model.
- Parameters:
actions (array) – Actions parsed by the agent, in the range -1 to 1, representing a percentage of the EVSE’s kW capacity
- Return type:
tuple[array, float, bool, bool, dict]
- Returns:
Tuple containing next observation, reward, done, truncated and info dictionary
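A heavily simplified sketch of the step semantics described above, for a single charger with fictional numbers (the real step function also handles battery degradation, building load, electricity prices, and penalties):

```python
def step_sketch(action: float, soc: float, evse_kw: float = 11.0,
                battery_kwh: float = 60.0, dt: float = 0.25):
    """One toy step: action in [-1, 1] maps to a percentage of EVSE power."""
    assert -1.0 <= action <= 1.0
    power_kw = action * evse_kw                   # charging (+) or discharging (-)
    new_soc = soc + power_kw * dt / battery_kwh   # kWh = kW * dt
    new_soc = min(max(new_soc, 0.0), 1.0)         # clamp SOC to [0, 1]
    reward = -abs(power_kw) * dt * 0.30           # toy cost: 0.30 currency/kWh
    done = False                                  # episode-end logic omitted
    return new_soc, reward, done

# Full-power charging for one 15-minute step from 50% SOC.
soc, reward, done = step_sketch(action=1.0, soc=0.5)
print(round(soc, 4), round(reward, 4))  # 0.5458 -0.825
```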