Reuters Dataset 1 Sentimetre Model 2 Long-Short BackTest Top 5 Long, Top 5 Short Predictions
**Backtest:
We assume that we are able to buy at market open and liquidate at market close. Our backtest does not incorporate:
Use of options/derivatives
Self-financing portfolios
Leverage
Optimal sizing of trades: all positions are the same size
Transaction costs
Slippage
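The assumptions above can be sketched as a one-day return calculation: each position is entered at the open and closed at the close, all positions carry equal weight, and no costs, slippage, or leverage are applied. The `Position` class, its field names, and the tickers are hypothetical and serve only as illustration; this is a minimal sketch, not the backtest code itself.

```python
from dataclasses import dataclass


@dataclass
class Position:
    ticker: str         # hypothetical identifier
    side: int           # +1 for long, -1 for short
    open_price: float   # entry price at market open
    close_price: float  # exit price at market close


def position_return(p: Position) -> float:
    """Open-to-close return, signed by side; no costs, slippage, or leverage."""
    return p.side * (p.close_price - p.open_price) / p.open_price


def daily_return(positions: list) -> float:
    """All positions are the same size, so the day's return is a simple average."""
    if not positions:
        return 0.0
    return sum(position_return(p) for p in positions) / len(positions)


positions = [
    Position("AAA", +1, 100.0, 102.0),  # long: price rises 2%
    Position("BBB", -1, 50.0, 49.0),    # short: price falls 2%
]
print(daily_return(positions))
```

Because the portfolio is not self-financing, one reasonable reading is that daily returns are summed over the test period rather than compounded; that reading is an assumption here.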
**Top 5 position selection
For each day, we select the top 5 long positions and the top 5 short positions based on features and discard the remaining positions.
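A minimal sketch of this daily selection step, assuming each candidate has a single signed score where a higher value is more bullish and a lower value is more bearish. The score itself, the sign filter, and the function name `select_positions` are assumptions for illustration; the ranking criterion actually used may differ.

```python
def select_positions(scores: dict, k: int = 5):
    """Return (longs, shorts): the k highest positive and k lowest negative scores.

    `scores` maps ticker -> signed score (hypothetical input format).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Highest scores first; keep only candidates with a bullish (positive) score.
    longs = [ticker for ticker, s in ranked[:k] if s > 0]
    # Lowest scores first; keep only candidates with a bearish (negative) score.
    shorts = [ticker for ticker, s in ranked[::-1][:k] if s < 0]
    return longs, shorts


# Thirteen hypothetical tickers with scores from -6 to 6.
scores = {f"T{i}": i - 6 for i in range(13)}
longs, shorts = select_positions(scores)
print(longs)
print(shorts)
```

Everything outside the two returned lists is discarded for that day.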
**Proof-of-Concept 1: Reuters dataset 1 2017-2020
Data is segmented into training data (2017-2018) and test data (2019-2020). The text data is preprocessed with text normalization, stemming, lemmatization, and stop-word removal.
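The chronological split described above can be sketched as follows, assuming each record is a `(date, headline)` pair. The record layout, the sample headlines, and the function name `split_by_year` are hypothetical.

```python
from datetime import date


def split_by_year(records, train_years=(2017, 2018), test_years=(2019, 2020)):
    """Split (date, text) records into training and test sets by calendar year."""
    train = [r for r in records if r[0].year in train_years]
    test = [r for r in records if r[0].year in test_years]
    return train, test


# Hypothetical sample records.
records = [
    (date(2017, 3, 1), "Company A beats earnings estimates"),
    (date(2018, 11, 20), "Company B cuts full-year guidance"),
    (date(2019, 6, 5), "Company C announces share buyback"),
    (date(2020, 2, 14), "Company D faces regulatory probe"),
]
train, test = split_by_year(records)
print(len(train), len(test))
```

Splitting by date rather than at random keeps every test record later than every training record, so the model is never evaluated on news that precedes its training data.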
*Model accuracy on the validation dataset:
NLTK VADER Sentiment Analyzer - N/A
Linear Classifier - 53%
Sentimetre Model 1 - 53%
Sentimetre Model 2 - 57%
*Prediction accuracy on the test dataset:
NLTK VADER Sentiment Analyzer - 50%
Linear Classifier - 52%
Sentimetre Model 1 - 51%
Sentimetre Model 2 - 55%
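The accuracy figures above are presumably the share of predictions that match the realized label. A minimal sketch of that calculation, assuming binary up/down labels encoded as +1 and -1 (the encoding and the function name `accuracy` are hypothetical):

```python
def accuracy(predicted, actual):
    """Fraction of predictions equal to the realized label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(predicted)


# Hypothetical labels: +1 = up, -1 = down.
predicted = [1, -1, 1, 1]
actual = [1, 1, 1, -1]
print(accuracy(predicted, actual))
```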