Pseudo-labeling trains a model on a small set of labeled data together with a large amount of unlabeled data in order to improve the model's performance.
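Our implementation differs from the usual approach, in which the model's own predictions on the unlabeled data serve as the labels: we generate the labels with a separate VADER model instead.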
Our technique has 5 basic steps:
- We train the model on our labeled data and run a backtest on our test data to establish a baseline.
- For pseudo-labeling, we use a VADER model to generate labels for our unlabeled dataset.
- We then add the VADER-labeled set to our training set.
- We then train the model on our new training data and produce a second backtest.
- We compare the backtest from the pure training data with the one from the new training dataset.
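
The labeling and merging steps above can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: the function names (`compound_to_label`, `pseudo_label`, `augment`), the three-class label scheme (-1, 0, 1), and the ±0.05 cutoff on VADER's compound score are all assumptions made for the example. The scorer is passed in as a plain function so the sketch runs without any third-party package.

```python
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, int]  # (article text, label); hypothetical data layout


def compound_to_label(compound: float, threshold: float = 0.05) -> int:
    """Map a compound sentiment score in [-1, 1] to -1, 0, or 1.

    The 0.05 threshold is the cutoff conventionally used with VADER;
    the label scheme itself is an assumption for this sketch.
    """
    if compound >= threshold:
        return 1
    if compound <= -threshold:
        return -1
    return 0


def pseudo_label(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.05,
) -> List[Example]:
    """Attach a pseudo-label to each unlabeled text using score_fn."""
    return [(text, compound_to_label(score_fn(text), threshold)) for text in texts]


def augment(
    train: Iterable[Example],
    unlabeled_texts: Iterable[str],
    score_fn: Callable[[str], float],
) -> List[Example]:
    """Return the pure training set followed by the pseudo-labeled examples."""
    return list(train) + pseudo_label(unlabeled_texts, score_fn)


# With the vaderSentiment package installed, score_fn could be built like this:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   analyzer = SentimentIntensityAnalyzer()
#   score_fn = lambda text: analyzer.polarity_scores(text)["compound"]
```

The model would then be trained once on the pure set and once on the output of `augment`, with a backtest run after each training pass.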
In our implementation, we generate labels for an unlabeled sample of 12,264 articles and add them to our pure training dataset of 60,844 news articles.
- Pure data: Validation accuracy: 56.18% / Test accuracy: 54.36%
- New data: Validation accuracy: 54.89% / Test accuracy: 54.52%
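
To make the comparison concrete, the short snippet below works out the size of the combined training set and the change in each accuracy figure, using only the numbers reported above. The variable names are illustrative, and the accuracies are assumed to be percentages.

```python
# Dataset sizes reported above
pure_n = 60844    # pure training articles
pseudo_n = 12264  # VADER-labeled articles

total_n = pure_n + pseudo_n          # size of the new training set
pseudo_share = pseudo_n / total_n    # fraction of the new set that is pseudo-labeled

# Accuracies reported above (assumed to be percentages)
results = {
    "pure": {"val": 56.18, "test": 54.36},
    "new": {"val": 54.89, "test": 54.52},
}

# Change in percentage points when moving from pure to new data
delta = {
    split: round(results["new"][split] - results["pure"][split], 2)
    for split in ("val", "test")
}

print(total_n)                 # 73108
print(round(pseudo_share, 3))  # 0.168
print(delta)                   # {'val': -1.29, 'test': 0.16}
```

Under these numbers, pseudo-labeled articles make up about 16.8% of the new training set; validation accuracy drops by 1.29 points while test accuracy rises by 0.16 points.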