Reuters Dataset 1 Sentimetre Model 2 Long-Short BackTest
**Backtest:
We use a long-short, equally weighted portfolio backtest for all our models. Other papers have tended to build a portfolio from the top 10 long predictions and the top 10 short predictions, but we prefer to include all predictions in our portfolio. We assume that we are able to open positions at market open and liquidate them at market close. Our backtest does not incorporate:
Use of options/derivatives
Self-financing portfolios
Leverage
Optimal sizing of trades: all positions are the same size
Transaction costs
Slippage
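The backtest described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the `Prediction` record, its field names and the two helper functions are hypothetical. It assumes each prediction is a +1 (long) or -1 (short) signal, that every position is entered at the open and closed at the close, and that daily returns are summed rather than compounded because the portfolio is not self-financing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    """Hypothetical record for one model prediction on one trading day."""
    ticker: str
    signal: int          # +1 = long, -1 = short
    open_price: float
    close_price: float


def daily_long_short_return(predictions: List[Prediction]) -> float:
    """Equal-weighted open-to-close return over all predictions for one day."""
    if not predictions:
        return 0.0
    weight = 1.0 / len(predictions)  # all positions are the same size
    total = 0.0
    for p in predictions:
        intraday = p.close_price / p.open_price - 1.0
        total += weight * p.signal * intraday  # shorts gain when the price falls
    return total


def total_return(daily_returns: List[float]) -> float:
    """Sum of daily returns: fixed capital each day, no reinvestment,
    no transaction costs, no slippage (assumption for this sketch)."""
    return sum(daily_returns)


day1 = [Prediction("AAA", +1, 100.0, 102.0), Prediction("BBB", -1, 50.0, 49.0)]
day2 = [Prediction("AAA", +1, 100.0, 99.0), Prediction("BBB", -1, 50.0, 50.5)]
returns = [daily_long_short_return(day1), daily_long_short_return(day2)]
print(returns, total_return(returns))
```

Because every prediction is included with the same weight, a day with many weak signals dilutes a few strong ones. That is the trade-off against the top-10 approach used elsewhere.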
**Proof-of-Concept 1: Reuters dataset 1 2017-2020
Data is segmented into training data (2017-2018) and test data (2019-2020). The text data is preprocessed with text normalization, stemming, lemmatization and removal of stop words.
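As a rough illustration of the split and the simpler preprocessing steps, the sketch below uses only the Python standard library. The tiny `STOP_WORDS` set, the `(date, headline)` tuple layout and the function names are assumptions made for this example. Stemming and lemmatization are left out here; in practice they would typically come from a library such as NLTK (`PorterStemmer`, `WordNetLemmatizer`).

```python
import re
from datetime import date
from typing import List, Tuple

# Illustrative stop-word list only; a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "on", "for", "is"}

Article = Tuple[date, str]  # hypothetical layout: (publication date, headline)


def normalize(text: str) -> List[str]:
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def split_by_year(articles: List[Article]) -> Tuple[List[Article], List[Article]]:
    """Training data: 2017-2018. Test data: 2019-2020."""
    train = [a for a in articles if 2017 <= a[0].year <= 2018]
    test = [a for a in articles if 2019 <= a[0].year <= 2020]
    return train, test


articles = [
    (date(2017, 3, 1), "The Fed raises rates in 2017!"),
    (date(2019, 6, 5), "Oil prices fall on demand fears"),
]
train, test = split_by_year(articles)
print([normalize(text) for _, text in train])
```

Splitting by date rather than at random keeps the test period strictly after the training period, so the models never see future news during training.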
*Model accuracy on the validation dataset:
NLTK VADER Sentiment Analyzer - N/A
Linear Classifier - 53%
Sentimetre Model 1 - 53%
Sentimetre Model 2 - 57%
*Prediction accuracy on the test dataset:
NLTK VADER Sentiment Analyzer - 50%
Linear Classifier - 52%
Sentimetre Model 1 - 51%
Sentimetre Model 2 - 55%
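For reference, an accuracy figure of this kind can be computed as in the sketch below. It assumes accuracy means the fraction of predictions whose direction (+1 up, -1 down) matches the realized direction. The function name and the label encoding are hypothetical.

```python
from typing import Sequence


def directional_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the realized direction."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(predicted)


# Two of the four hypothetical predictions match the realized direction.
print(directional_accuracy([1, -1, 1, 1], [1, 1, 1, -1]))
```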