We use an equally weighted long-short portfolio backtest for all our models. Other papers have tended to build a portfolio from only the top 10 long predictions and top 10 short predictions, but we prefer to include all predictions in our portfolio. We assume that we are able to buy at market open and liquidate at market close. Our backtest does not incorporate:

  1. Use of options/derivatives

  2. Self-financing portfolios

  3. Leverage

  4. Optimal sizing of trades: all positions are the same size

  5. Transaction costs

  6. Slippage
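As a rough illustration of the setup described above, the following is a minimal sketch of one day's return for an equally weighted long-short portfolio that enters at the open and exits at the close. The `Prediction` class, its field names, and the function `daily_long_short_return` are hypothetical and chosen for illustration only. They are not taken from our actual backtest code. In line with the list above, the sketch ignores transaction costs, slippage, and leverage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    """One model prediction for one stock on one day (hypothetical structure)."""
    ticker: str
    direction: int      # +1 for a long prediction, -1 for a short prediction
    open_price: float   # assumed entry price at market open
    close_price: float  # assumed exit price at market close


def daily_long_short_return(predictions: List[Prediction]) -> float:
    """Return the one-day portfolio return with every position the same size.

    Each position is opened at the open and liquidated at the close.
    Costs and slippage are assumed to be zero.
    """
    if not predictions:
        return 0.0  # no predictions means no positions that day

    weight = 1.0 / len(predictions)  # equal weight across all predictions
    total = 0.0
    for p in predictions:
        intraday = (p.close_price - p.open_price) / p.open_price
        # A short position gains when the price falls, hence the sign flip.
        total += weight * p.direction * intraday
    return total


example_day = [
    Prediction("AAA", +1, 100.0, 102.0),  # long, price rises 2%
    Prediction("BBB", -1, 50.0, 49.0),    # short, price falls 2%
    Prediction("CCC", +1, 20.0, 19.8),    # long, price falls 1%
    Prediction("DDD", -1, 10.0, 10.1),    # short, price rises 1%
]
example_return = daily_long_short_return(example_day)
```

Because every prediction is included with the same weight, a single weak prediction affects the day's return exactly as much as a strong one, which is the trade-off against the top-10 approach used elsewhere.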

**Proof-of-Concept 1: Reuters dataset 1, 2017-2020**

The data is split into training data (2017-2018) and test data (2019-2020). The text data is preprocessed with text normalization, stemming, lemmatization and stop-word removal.
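The chronological split can be sketched as follows. This is a minimal illustration that assumes each article is a dictionary with a `date` field. That structure and the function name `split_by_year` are hypothetical and not part of our pipeline. Splitting by calendar year rather than at random keeps every test article strictly later than the training articles.

```python
from datetime import date
from typing import Dict, List, Sequence, Tuple


def split_by_year(
    articles: List[Dict],
    train_years: Sequence[int] = (2017, 2018),
    test_years: Sequence[int] = (2019, 2020),
) -> Tuple[List[Dict], List[Dict]]:
    """Split articles into training and test sets by publication year."""
    train = [a for a in articles if a["date"].year in train_years]
    test = [a for a in articles if a["date"].year in test_years]
    return train, test


# Made-up sample records, used only to show the call.
sample_articles = [
    {"date": date(2017, 3, 1), "text": "headline a"},
    {"date": date(2018, 11, 15), "text": "headline b"},
    {"date": date(2019, 6, 30), "text": "headline c"},
    {"date": date(2020, 1, 2), "text": "headline d"},
]
train_set, test_set = split_by_year(sample_articles)
```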

*Model accuracy on the validation dataset:*

NLTK VADER Sentiment Analyzer - N/A

Linear Classifier - 53%

Sentimetre Model 1 - 53%

Sentimetre Model 2 - 57%

*Prediction accuracy on the test dataset:*

NLTK VADER Sentiment Analyzer - 50%

Linear Classifier - 52%

Sentimetre Model 1 - 51%

Sentimetre Model 2 - 55%