Fleet environment
Subpackages
Submodules
Episode module
Environment module
- class fleetrl.fleet_env.fleet_environment.FleetEnv(env_config)[source]
Bases:
Env
FleetRL: Reinforcement Learning environment for commercial vehicle fleets. Author: Enzo Alexander Cording - https://github.com/EnzoCording Master’s thesis project, M.Sc. Sustainable Energy Engineering @ KTH Copyright (c) 2023, Enzo Cording
This framework is built on the gymnasium core API and inherits from it. __init__, reset, and step are implemented, calling other modules and functions where needed. Base-derived class architecture is implemented, and the code is structured in a modular manner to enable improvements or changes in the model.
Only publicly available data or self-generated data has been used in this implementation.
The agent only sees information coming from the chargers: SOC, how long the vehicle is still plugged in, etc. However, this framework matches the number of chargers with the number of cars to reduce complexity. If more cars than chargers should be modelled, an allocation algorithm is necessary. What is more, battery degradation is modelled in this environment. In this case, the information of the car is required (instead of the charger). Modelling is facilitated by matching cars and chargers one-to-one. Therefore, throughout the code, “car” and “ev_charger” might be used interchangeably as indices.
Note that this is not a simplification from the agent’s perspective: the agent only handles the SOC and the time left at the charger, regardless of whether vehicles and chargers are matched one-to-one.
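As a minimal sketch of the one-to-one matching described above (all names here are hypothetical illustrations, not the actual FleetRL API), the per-charger information the agent sees could be assembled like this:

```python
# Hypothetical sketch: with one-to-one car/charger matching, index i refers
# to both car i and charger i, so the agent only needs charger-level info.

def build_observation(socs, hours_left):
    """Flatten per-charger state (SOC, time left plugged in) into one vector."""
    assert len(socs) == len(hours_left), "one charger per car"
    obs = []
    for soc, t_left in zip(socs, hours_left):
        obs.extend([soc, t_left])
    return obs

# Two cars/chargers: SOCs of 0.4 and 0.9, plugged in for 3 h and 0 h (departed).
obs = build_observation([0.4, 0.9], [3.0, 0.0])
print(obs)  # [0.4, 3.0, 0.9, 0.0]
```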
- adjust_caretaker_lunch_soc()[source]
The caretaker target SOC can be set lower during the lunch break to avoid unfair penalties, because the break is not long enough to charge up to the 0.85 target SOC. :return: None -> sets the target SOC during lunch break hours to 0.65 by default
- auto_gen()[source]
This function automatically generates schedules as specified, using the ScheduleGenerator module. Note: this can take 1-3 hours, depending on the number of vehicles.
- Returns:
None -> The schedule is generated and placed in the input folder
- choose_observer()[source]
This function chooses the right observer, depending on whether price, building load, PV, etc. are included. :return: obs (Observer) -> the chosen observer module
- choose_time_picker(time_picker)[source]
Chooses the right time picker based on the specified input string. Static: always picks the same start time for an episode. Random: starts an episode at a random time from the training set. Eval: starts an episode at a random time from the validation set. :type time_picker: str :param time_picker: specifies which time picker to choose: “static”, “eval”, “random” :return: tp (TimePicker) -> time picker object
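The dispatch described above can be sketched as a simple string-to-class mapping (class names here are illustrative stand-ins, not FleetRL’s actual time picker classes):

```python
# Illustrative sketch of mapping the input string to a time picker object.
class StaticTimePicker:
    """Always starts episodes at the same time."""

class RandomTimePicker:
    """Starts episodes at a random time from the training set."""

class EvalTimePicker:
    """Starts episodes at a random time from the validation set."""

def choose_time_picker(time_picker: str):
    pickers = {
        "static": StaticTimePicker,
        "random": RandomTimePicker,
        "eval": EvalTimePicker,
    }
    try:
        return pickers[time_picker]()
    except KeyError:
        raise ValueError(f"unknown time picker: {time_picker!r}")

tp = choose_time_picker("random")
print(type(tp).__name__)  # RandomTimePicker
```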
- close()[source]
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections.
- detect_dim_and_bounds()[source]
This function chooses the right dimension of the observation space based on the chosen configuration; each increase of dim is explained in the code. The low_obs and high_obs bounds are built in the normalizer object, using the dim value calculated in this function. It sets the boundaries of the observation space and detects whether observations are normalized. If the aux flag is true, additional information enlarges the observation space. The code goes through all possible environment setups; depending on the setup, the dimensions differ and each case is handled separately.
- Returns:
low_obs and high_obs: tuple[float, float] | tuple[np.ndarray, np.ndarray] -> used for gym.Spaces
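A hedged sketch of how such bounds could be produced (the dimension value and the defaults below are made up for illustration, not FleetRL’s actual logic): normalized setups can use flat [0, 1] scalar bounds, while unnormalized setups need per-feature arrays suitable for a gym.spaces.Box.

```python
def detect_bounds(dim: int, normalized: bool):
    """Return (low_obs, high_obs) usable for building a gym.spaces.Box.

    Sketch only: normalized observations lie in [0, 1]; unnormalized
    observations get permissive per-feature -inf/inf bounds.
    """
    if normalized:
        return 0.0, 1.0
    low = [-float("inf")] * dim
    high = [float("inf")] * dim
    return low, high

low, high = detect_bounds(dim=14, normalized=False)
print(len(low), len(high))  # 14 14
```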
- get_dist_factor()[source]
This function returns the distribution/laxity factor: how much charging time is needed vs. how much time is left at the charger. If the factor is 0.1, the dist agent would charge with only 10% of the EVSE capacity. Call via env_method(“get_dist_factor”)[0] if using an SB3 vectorized environment. :return: dist/laxity factor, float
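The laxity factor described above can be sketched as follows (a hypothetical helper with made-up edge-case handling, not the actual FleetRL implementation):

```python
def dist_factor(hours_needed: float, hours_left: float) -> float:
    """Ratio of charging time needed to time left at the charger.

    A factor of 0.1 means the distribution agent would charge with only
    10% of the EVSE capacity; a factor of 1 means there is no slack.
    """
    if hours_left <= 0:
        return 1.0  # no time left: charge at full power
    return min(hours_needed / hours_left, 1.0)

# 0.5 h of charging needed, 5 h left plugged in -> factor 0.1
factor = dist_factor(hours_needed=0.5, hours_left=5.0)
print(factor)  # 0.1
```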
- get_log()[source]
This function can be called through SB3 vectorized environments via VecEnv.env_method(“get_log”)[0]. The zero index is required so that only the first element, the DataFrame, is extracted.
- Returns:
Log dataframe
- get_next_dt()[source]
Calculates the time delta between the current time step and the next one. This allows for csv input files that have irregular time intervals. Energy calculations automatically adjust for the dynamic time differences through kWh = kW * dt.
- Returns:
next time delta in hours
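The relation kWh = kW * dt from the docstring above can be sketched for irregular timestamps (function and variable names are illustrative):

```python
from datetime import datetime

def next_dt_hours(current: datetime, nxt: datetime) -> float:
    """Time delta to the next step, in hours, tolerating irregular intervals."""
    return (nxt - current).total_seconds() / 3600.0

t0 = datetime(2023, 6, 1, 10, 0)
t1 = datetime(2023, 6, 1, 10, 15)  # irregular 15-minute step

dt = next_dt_hours(t0, t1)  # 0.25 h
energy_kwh = 11.0 * dt      # kWh = kW * dt, for an 11 kW charger
print(dt, energy_kwh)  # 0.25 2.75
```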
- get_next_minutes()[source]
Calculates the integer number of minutes until the next time step. This limits the framework’s current maximum resolution to discrete time steps of 1 min. This will be improved soon, as will the dependency on knowing the future value beforehand.
- Returns:
Integer of minutes until next timestep
- print(action)[source]
The print function can provide useful information about the environment dynamics and the agent’s actions. It can slow down training throughput (FPS) due to printing at each timestep.
- Parameters:
action – Action of the agent
- Returns:
None -> Just prints information if specified
- render()[source]
Compute the render frames as specified by render_mode during the initialization of the environment. The environment’s metadata render modes (env.metadata[“render_modes”]) should contain the possible ways to implement the render modes. In addition, list versions for most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
- Note:
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
None (default): no render is computed.
“human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.
“rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
“ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
“rgb_array_list” and “ansi_list”: List-based versions of render modes are possible (except “human”) through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() or reset() is called.
- Note:
Make sure that your class’s metadata "render_modes" key includes the list of supported modes.
Changed in version 0.25.0: The render function was changed to no longer accept parameters; these parameters should instead be specified during environment initialisation, i.e., gymnasium.make("CartPole-v1", render_mode="human")
- reset(**kwargs)[source]
- Parameters:
kwargs – Necessary for gym inheritance
- Return type:
tuple[array, dict]
- Returns:
First observation (either normalized or not) and an info dict
- set_start_time(start_time)[source]
Sets the start time of the episode. When using an SB3 vectorized environment, call via VecEnv.env_method(“set_start_time”, [f”{start_time}”]); the function name and argument must be parsed. :type start_time: str :param start_time: string of a pd.Timestamp / date :return: None
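A sketch of what such a setter might do (the class and parsing below are stand-ins; FleetRL uses pd.Timestamp, while this dependency-free sketch uses datetime):

```python
from datetime import datetime

class EnvSketch:
    """Minimal stand-in illustrating the setter; not the real FleetEnv."""

    def __init__(self):
        self.start_time = None

    def set_start_time(self, start_time: str) -> None:
        # Parse the date string into a timestamp.
        self.start_time = datetime.fromisoformat(start_time)

env = EnvSketch()
# With an SB3 VecEnv, the equivalent call would be:
# vec_env.env_method("set_start_time", [f"{start_time}"])
env.set_start_time("2023-06-01 08:00:00")
print(env.start_time)  # 2023-06-01 08:00:00
```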
- step(actions)[source]
The main logic of the EV charging problem is orchestrated in the step function. Input: Action on charging power for each EV Output: Next state, reward
Intermediate processes: EV charging model, battery degradation, cost calculation, building load, penalties, etc.
The step function runs as long as the done flag is False. Different functions and modules are called in this function to reduce the complexity and to distribute the tasks of the model.
- Parameters:
actions (array) – Actions parsed by the agent, in the range -1 to 1, representing a percentage of the EVSE’s kW capacity
- Return type:
tuple[array, float, bool, bool, dict]
- Returns:
Tuple containing next observation, reward, done, truncated and info dictionary
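A heavily simplified sketch of the step semantics described above, for a single charger with fictional numbers (the real step function also handles battery degradation, building load, electricity prices, and penalties):

```python
def step_sketch(action: float, soc: float, evse_kw: float = 11.0,
                battery_kwh: float = 60.0, dt: float = 0.25):
    """One toy step: action in [-1, 1] maps to a percentage of EVSE power."""
    assert -1.0 <= action <= 1.0
    power_kw = action * evse_kw                   # charging (+) or discharging (-)
    new_soc = soc + power_kw * dt / battery_kwh   # kWh = kW * dt
    new_soc = min(max(new_soc, 0.0), 1.0)         # clamp SOC to [0, 1]
    reward = -abs(power_kw) * dt * 0.30           # toy cost: 0.30 currency/kWh
    done = False                                  # episode-end logic omitted
    return new_soc, reward, done

# Full-power charging for one 15-minute step from 50% SOC.
soc, reward, done = step_sketch(action=1.0, soc=0.5)
print(round(soc, 4), round(reward, 4))  # 0.5458 -0.825
```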