Sentimetre is building a multilingual sentiment analysis engine, using big data and predictive analytics to optimize investment research and portfolio construction. Our specialty is developing models for aggregating trade signals from unstructured data sets. Our models are targeted towards generating alpha from financial statements, regulatory announcements, news articles and Twitter.

Equity and derivative traders are increasingly seeking out ‘alternative data’ and novel trading signals to gain an edge in an automated market where participants have increasingly symmetrical access to structured data. Investors are looking for unique alpha that is uncorrelated with traditional stock selection factors and resilient enough to build a trading strategy around. Current NLP offerings in this market rely on VADER and linear classifier models that were tuned on English text and English lexicons and do not extend readily to low-resource languages. Sentimetre has developed a language-agnostic deep learning model that can be applied across multiple languages.

Our primary asset class is equities; we use the predictions on the underlying equities to predict stock indexes. Our asset universe is currently limited by our access to data, but the basic concepts are easily extended to other assets.

Applying artificial intelligence to finance faces the following problems:

  1. Labelled data are unavailable.
  2. Clean, up-to-date reference financial data is expensive.
  3. Data sets are available in high-resource languages like English but not in low-resource languages (e.g. the CoFiF corpus in French).

Sentimetre’s proof-of-concept was built on the following datasets:

  1. Reuters dataset 1, 2017-2020: 86,297 news articles
  2. Reuters dataset 2, 2007-2016: 150,802 news articles
  3. 10-Q quarterly financial statements, 1993-2020 (EDGAR): proved to be too noisy
  4. 8-K financial statements, 1993-2020 (EDGAR): proved to be too noisy

Models tested:

  1. NLTK VADER Sentiment Analyzer
  2. Linear Classifier
  3. Sentimetre Model 1
  4. Sentimetre Model 2

Model variations:

  1. Tokenizers: spaCy, SentencePiece, BPE
  2. Headlines vs headlines plus articles text vs articles text


We use a long-short, equally-weighted portfolio backtest for all our models. Other papers tend to build a portfolio from only the top 10 long predictions and top 10 short predictions, but we prefer to include all predictions in our portfolio, which may consequently lead to underperformance. We assume that we are able to buy at market open and liquidate at market close. We do not take transaction costs and slippage into account, as we did not have adequate resources. The backtest does not account for:

  1. Use of options/derivatives
  2. Self-financing portfolios
  3. Leverage
  4. Optimal sizing of trades: all positions are the same size
  5. Transaction costs
  6. Slippage
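
The backtest described above can be sketched roughly as follows. This is a minimal illustration, not Sentimetre's implementation: the record layout, the function names and the sample prices are all hypothetical, and compounding the daily returns is an assumption of this sketch. Every prediction gets the same weight, positions are opened at the market open and closed at the market close, and costs and slippage are ignored.

```python
from collections import defaultdict

def long_short_daily_returns(predictions):
    """Equal-weighted long-short return per day.

    predictions: iterable of (date, ticker, signal, open_price, close_price),
    where signal is +1 for a long prediction and -1 for a short prediction.
    (Hypothetical record layout, chosen for this sketch.)
    """
    by_day = defaultdict(list)
    for day, _ticker, signal, open_px, close_px in predictions:
        intraday = close_px / open_px - 1.0     # buy at open, liquidate at close
        by_day[day].append(signal * intraday)   # a short gains when the price falls
    # Equal weighting: each prediction on a given day carries the same weight.
    return {day: sum(r) / len(r) for day, r in sorted(by_day.items())}

def cumulative_return(daily_returns):
    """Compound the daily portfolio returns (an assumption of this sketch)."""
    total = 1.0
    for r in daily_returns.values():
        total *= 1.0 + r
    return total - 1.0

# Made-up prices, for illustration only.
sample = [
    ("2019-01-02", "AAA", +1, 100.0, 102.0),
    ("2019-01-02", "BBB", -1, 50.0, 49.0),
    ("2019-01-03", "AAA", -1, 102.0, 103.02),
]
daily = long_short_daily_returns(sample)
total = cumulative_return(daily)
```

Because all predictions are included rather than only the strongest ones, weak signals dilute the daily average, which is the source of the possible underperformance mentioned above.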

Proof-of-Concept 1: Reuters dataset 1 2017-2020

The data is split into training data (2017-2018) and test data (2019-2020). The text is preprocessed with text normalization, stemming, lemmatization and stop-word removal.
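
A minimal sketch of the chronological split and of the simplest preprocessing steps is given below. The article layout, the function names, the headlines and the tiny stop-word list are hypothetical; stemming and lemmatization are left out because they would normally come from an NLP library.

```python
import re
from datetime import date

TRAIN_YEARS = {2017, 2018}
TEST_YEARS = {2019, 2020}
# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}

def split_by_year(articles):
    """Split chronologically so the test period lies strictly after training."""
    train = [a for a in articles if a["published"].year in TRAIN_YEARS]
    test = [a for a in articles if a["published"].year in TEST_YEARS]
    return train, test

def normalize(text):
    """Lowercase, keep alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Made-up articles, for illustration only.
articles = [
    {"published": date(2017, 5, 2), "headline": "Hypothetical headline A"},
    {"published": date(2018, 11, 20), "headline": "Hypothetical headline B"},
    {"published": date(2019, 1, 7), "headline": "Hypothetical headline C"},
    {"published": date(2020, 6, 30), "headline": "Hypothetical headline D"},
]
train, test = split_by_year(articles)
tokens = normalize("The Bank of Example Raises Rates in March")
```

Splitting by date rather than at random keeps articles from the test period out of training, which matters when the target is a future price move.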

Model accuracy on the validation dataset:

  1. NLTK VADER Sentiment Analyzer - N/A
  2. Linear Classifier - 53%
  3. Sentimetre Model 1 - 53%
  4. Sentimetre Model 2 - 57%

Prediction accuracy on the test dataset:

  1. NLTK VADER Sentiment Analyzer - 50%
  2. Linear Classifier - 52%
  3. Sentimetre Model 1 - 51%
  4. Sentimetre Model 2 - 55%

Proof-of-Concept 2: Reuters dataset 2 2007-2016; 150,802 articles

Because this dataset was collected by web scraping and is not clean, it presented problems when fed into the models and was therefore held out as a second test dataset. The best-performing model, Sentimetre Model 2, was used to predict on this dataset, and the charts are provided below.

Meta-model Analysis - Model Stacking

Meta-model analysis was carried out to see whether the different models predict better on different regions of the feature space of the test dataset. The outputs of the models were used as input features to train tree-ensemble meta-models (CatBoost, LightGBM and ExtraTreesClassifier).

The accuracy of the CatBoost meta-model (56%) improved on the best single-model accuracy score (55%), and the meta-model went on to outperform Sentimetre Model 2 when ranked by return.
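
The stacking idea can be illustrated with a deliberately simple stand-in. In the sketch below the meta-model is a lookup table that maps each combination of base-model outputs to the label most often seen with it; this is not the CatBoost, LightGBM or ExtraTrees model used above, only a dependency-free placeholder for it. All names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def fit_lookup_meta_model(base_preds, labels):
    """Fit a lookup-table meta-model on base-model outputs.

    base_preds: one tuple per article holding the output of each base model
    (e.g. VADER, linear classifier, Model 1, Model 2), encoded as 0 or 1.
    labels: the true direction for each article, 0 or 1.
    """
    votes = defaultdict(Counter)
    for pattern, label in zip(base_preds, labels):
        votes[tuple(pattern)][label] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}
    fallback = Counter(labels).most_common(1)[0][0]  # for unseen patterns
    return table, fallback

def predict_meta(model, base_preds):
    table, fallback = model
    return [table.get(tuple(p), fallback) for p in base_preds]

def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Toy data, for illustration only.
meta_train_X = [(1, 1, 0, 1), (1, 1, 0, 1), (0, 0, 1, 0),
                (0, 1, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0)]
meta_train_y = [1, 1, 0, 1, 1, 1]
meta_test_X = [(0, 1, 1, 0), (0, 0, 1, 0), (1, 1, 1, 1)]
meta_test_y = [1, 0, 0]

model = fit_lookup_meta_model(meta_train_X, meta_train_y)
meta_pred = predict_meta(model, meta_test_X)
meta_acc = accuracy(meta_pred, meta_test_y)
```

In a stacking setup the meta-model is usually fitted on base-model predictions for data the base models did not train on; otherwise it learns from overconfident inputs. A tree-ensemble classifier would take the place of the lookup table without changing the surrounding steps.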

Model Sanity Check

A sanity check was carried out across the different models to compare returns. Because the models pick different news articles, a lower-accuracy model could in principle pick articles that lead to higher average returns and so outperform a higher-accuracy model. The check was to make sure this was not happening.
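
One way such a check could be written is sketched below, under assumptions: each model's picks are given as pairs of predicted direction and realized return, and the model names and numbers are made up. The sketch reports accuracy and average return per model and flags any pair where the less accurate model earned the higher average return.

```python
def summarize(picks):
    """picks: list of (direction, realized_return) with direction +1 or -1.

    Returns (accuracy, average signed return) over the model's own picks.
    """
    hits = sum(1 for direction, ret in picks if direction * ret > 0)
    avg_return = sum(direction * ret for direction, ret in picks) / len(picks)
    return hits / len(picks), avg_return

def find_inversions(model_picks):
    """Return per-model stats and (lower_accuracy, higher_accuracy) pairs
    where the lower-accuracy model has the higher average return."""
    stats = {name: summarize(picks) for name, picks in model_picks.items()}
    inversions = []
    for low in sorted(stats):
        for high in sorted(stats):
            less_accurate = stats[low][0] < stats[high][0]
            more_profitable = stats[low][1] > stats[high][1]
            if less_accurate and more_profitable:
                inversions.append((low, high))
    return stats, inversions

# Made-up picks, for illustration only.
model_picks = {
    "model_a": [(+1, 0.01), (+1, -0.002), (-1, -0.01), (-1, 0.001)],
    "model_b": [(+1, 0.001), (+1, 0.001), (-1, -0.001), (-1, 0.02)],
}
stats, inversions = find_inversions(model_picks)
```

In this toy data model_b is right more often, but one large loss leaves it with a lower average return than model_a, which is exactly the situation the check is meant to surface.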