Implementing an Advantage Actor-Critic (A2C) Reinforcement Learning System with FinRL (Python)
“- Give a man a fish and he’ll eat for a day.
- Teach a fish to eat men and you have a Spielberg blockbuster for a lifetime.” (Uncalled-for B/O Trading Blog BS)
FinRL is an impressive project in many ways. Apart from the fact that a computer program teaches itself how to place profitable trades, it is also a shooting-star financial DRL library with 8.9k GitHub stars, over 100 contributors, and a whopping 46k downloads. Whoo-hoo!
In this tutorial we are going to build a Reinforcement Learning system with FinRL, using the Advantage Actor-Critic (A2C) method to trade NASDAQ 100 stocks. Now, if that doesn’t make your trading heart beat faster, I don’t know what will…
This story is solely for general information purposes and should not be relied upon for trading recommendations or financial advice. Source code and information are provided for educational purposes only and should not be relied upon to make an investment decision. Please review my full cautionary guidance before continuing.
What is FinRL?
FinRL is an open-source framework developed by the AI4Finance community for financial reinforcement learning. It is specifically designed for automating trading using deep reinforcement learning (DRL), balancing exploration and exploitation to solve dynamic decision-making problems.
FinRL supports various data sources, markets, state-of-the-art DRL algorithms, benchmarks for numerous quant finance tasks, and live trading.
FinRL offers a truckload of features right out of the box (a short sketch follows this list), such as:
Built-in data loaders
Built-in data preprocessors
An out-of-the-box set of common technical indicators
A ton of supported DRL algorithms, such as DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C, and TD3
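To make the data loaders, preprocessors, and indicators concrete, here is a minimal sketch of pulling daily bars and adding indicator features with FinRL's built-in helpers. The module paths match recent FinRL releases (older versions used finrl.marketdata and finrl.preprocessing instead), and the ticker list, dates, and indicator set are placeholders chosen for illustration.

```python
# Minimal sketch, assuming a recent FinRL release; module paths have
# moved between versions (older releases used finrl.marketdata etc.).
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer

# Download daily OHLCV bars for a few NASDAQ 100 tickers (placeholder list).
df = YahooDownloader(
    start_date="2019-01-01",
    end_date="2021-01-01",
    ticker_list=["AAPL", "MSFT", "NVDA"],
).fetch_data()

# Add a set of common technical indicators as state features.
fe = FeatureEngineer(
    use_technical_indicator=True,
    tech_indicator_list=["macd", "rsi_30", "cci_30", "dx_30"],
    use_turbulence=False,
    user_defined_feature=False,
)
processed = fe.preprocess_data(df)
print(processed.head())
```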
Useful links: the FinRL repository on GitHub (https://github.com/AI4Finance-Foundation/FinRL) and the FinRL documentation (https://finrl.readthedocs.io).
FinRL Architecture
The diagram below shows the software architecture of the FinRL library. The bottom layer is the interface layer to the market environments. It provides historical data for backtesting, for example from the Yahoo! Finance API, as well as interfaces for retrieving live price data from providers such as Alpaca, CCXT, or QuantConnect.
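As a sketch of that bottom layer in code, the preprocessed DataFrame from above can be wrapped in FinRL's Gym-style stock-trading environment. The constructor arguments below follow recent FinRL versions (where transaction costs and initial holdings are per-stock lists; older releases differ), and the cost and cash values are illustrative defaults, not tuned settings.

```python
# Minimal sketch, assuming a recent FinRL release; StockTradingEnv's
# constructor arguments have changed across versions.
from finrl.meta.preprocessor.preprocessors import data_split
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv

# Split off a training window; data_split re-indexes the frame by trading
# day, which the environment relies on when stepping through time.
train = data_split(processed, "2019-01-01", "2020-07-01")

tech_indicator_list = ["macd", "rsi_30", "cci_30", "dx_30"]
stock_dim = len(train["tic"].unique())
# FinRL's state convention: cash + (price, holding) per stock + indicators per stock.
state_space = 1 + 2 * stock_dim + len(tech_indicator_list) * stock_dim

env_kwargs = {
    "hmax": 100,                          # max shares traded per action
    "initial_amount": 1_000_000,          # starting cash (placeholder)
    "num_stock_shares": [0] * stock_dim,  # start with no positions
    "buy_cost_pct": [0.001] * stock_dim,  # 0.1% transaction cost per stock
    "sell_cost_pct": [0.001] * stock_dim,
    "stock_dim": stock_dim,
    "state_space": state_space,
    "action_space": stock_dim,
    "tech_indicator_list": tech_indicator_list,
    "reward_scaling": 1e-4,
}
e_train_gym = StockTradingEnv(df=train, **env_kwargs)
```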
The middle layer is where the DRL implementations reside. This is where the different DRL algorithms, such as DQN, DDPG, TD3, and A2C, are implemented. DRL agents learn from the market environment by applying actions to the market state and receiving feedback in the form of rewards, which lets them account for the impact of their actions.
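A sketch of that middle layer: FinRL delegates the algorithms to Stable-Baselines3, so an A2C agent can be built and trained against the environment above via FinRL's DRLAgent helper. The timestep budget below is an arbitrary placeholder.

```python
# Sketch of training an A2C agent with FinRL's Stable-Baselines3 wrapper.
from finrl.agents.stablebaselines3.models import DRLAgent

# Wrap the FinRL env for Stable-Baselines3 and build an A2C model.
env_train, _ = e_train_gym.get_sb_env()
agent = DRLAgent(env=env_train)
model_a2c = agent.get_model("a2c")

# Learn a policy: act on the market state, receive rewards,
# and update actor and critic networks.
trained_a2c = agent.train_model(
    model=model_a2c,
    tb_log_name="a2c",
    total_timesteps=50_000,  # placeholder training budget
)
```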
The top layer describes the implementations for actually using the trained DRL models, e.g. for live stock or crypto trading or for custom user-defined tasks.
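And a sketch of that top layer: the trained model can be backtested on held-out data with FinRL's prediction helper. Here e_trade_gym is assumed to be a second StockTradingEnv built like the training one but over a later, out-of-sample date range, mirroring the naming in FinRL's tutorials.

```python
# Backtest the trained policy on an out-of-sample trade environment
# (e_trade_gym: a StockTradingEnv over a later date range, built as above).
df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=trained_a2c,
    environment=e_trade_gym,
)
print(df_account_value.tail())  # daily portfolio value over the backtest
```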