The sentiment analysis model is the same as the LSTM language model, except that the final output dimension is the number of sentiment classes instead of the vocabulary size. Because the two architectures otherwise match, the sentiment analysis model can be initialized from a pretrained language model. You can train the sentiment analysis task either from scratch or starting from the pretrained language model.
In this model, each source sentence is run through the LSTM cells. The last hidden state at the end of the sequence is then passed into the output projection layer before softmax is performed to get the predicted sentiment. If the parameter
use_cell_state is set to True, the last cell state at the end of the sequence is concatenated to the last hidden state.
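The classification step described above can be sketched in plain Python. This is a minimal illustration for a single sentence, not the project's implementation: the function names, the weight layout (one row of weights per sentiment class), and the example values are all assumptions made for this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sentiment_head(last_hidden, last_cell, weights, bias, use_cell_state=False):
    """Project the final LSTM state to class probabilities.

    weights holds one row per sentiment class (hypothetical layout);
    each row must be as long as the feature vector.
    """
    features = list(last_hidden)
    if use_cell_state:
        # Concatenate the last cell state to the last hidden state.
        features = features + list(last_cell)
    logits = [
        sum(f * w for f, w in zip(features, row)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy example: hidden size 2, two sentiment classes.
last_hidden = [0.5, -1.0]
last_cell = [0.2, 0.1]
weights = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
bias = [0.0, 0.0]
probs = sentiment_head(last_hidden, last_cell, weights, bias, use_cell_state=True)
```

Note that enabling use_cell_state doubles the input width of the projection layer, so the projection weights differ in shape between the two settings.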
The datasets we currently support include SST (Stanford Sentiment Treebank) and IMDB reviews.
|Model description||Config file||Checkpoint|
The model specification and training parameters can be found in the corresponding config file.
The SST (Stanford Sentiment Treebank) dataset consists of 10,662 sentences, half of them positive and half negative. These sentences are fairly short, with a median length of 19 tokens. You can download the pre-processed version of the dataset here <https://github.com/NVIDIA/sentiment-discovery/tree/master/data/binary_sst>. The pre-processed dataset contains the files train.csv, valid.csv, and test.csv. The data layer used to process this dataset is called SSTDataLayer.
The IMDB dataset contains 50,000 labeled samples that are much longer, with a median length of 205 tokens. Half of them are labeled positive and the other half negative. The original train set of 25,000 samples is split into a train set of 24,000 samples and a validation set of 1,000 samples. The data layer used to process this dataset is called SSTDataLayer. The dataset can be downloaded here <http://ai.stanford.edu/~amaas/data/sentiment/>.
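As a rough illustration of the 24,000/1,000 split, the sketch below holds out 1,000 samples at random. The text does not say how the validation samples are chosen, so the seeded random shuffle, the function name, and the placeholder samples are assumptions, not the data layer's actual behavior.

```python
import random


def split_train_valid(samples, num_valid=1000, seed=0):
    """Hold out num_valid samples for validation (hypothetical helper)."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    valid_indices = set(indices[:num_valid])
    train = [s for i, s in enumerate(samples) if i not in valid_indices]
    valid = [s for i, s in enumerate(samples) if i in valid_indices]
    return train, valid


# Placeholder data standing in for the 25,000 labeled training reviews.
samples = [("review %d" % i, i % 2) for i in range(25000)]
train, valid = split_train_valid(samples)
```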
If you want to use a trained language model for this task, make sure that your dataset is processed in the same way as the dataset used to train the language model.
Next let’s create a simple LSTM language model by defining a config file for it or using one of the config files defined in
- if you want to use a pretrained language model, specify the location of the pretrained language model using the parameter
data_root to point to the directory containing the raw dataset used to train your language model, for example, the IMDB dataset downloaded above.
processed_data_folder to point to the location where you want to store the processed dataset. If the dataset has been pre-processed before, the data layer can just load the data from this location.
- update other hyperparameters such as number of layers, number of hidden units, cell type, loss function, learning rate, optimizer, etc. to meet your needs.
"mixed" if you want to use mixed-precision training, or
tf.float32 to train only in FP32.
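The parameters named above could be gathered as in the following sketch. It is a flat illustration only: the real config files may nest these keys differently, the paths are placeholders, and the dictionary name and the small completeness check are hypothetical additions for this example.

```python
# Flat sketch of the settings discussed above (not the actual config layout).
sentiment_config = {
    "use_cell_state": True,                          # concatenate last cell state
    "data_root": "/path/to/raw/dataset",             # placeholder path
    "processed_data_folder": "/path/to/processed",   # placeholder path
    "use_horovod": False,                            # single-process training
    "num_gpus": 1,
}

# Hypothetical sanity check: both dataset locations must be set.
required_keys = ("data_root", "processed_data_folder")
missing = [key for key in required_keys if not sentiment_config.get(key)]
if missing:
    raise ValueError("Missing config entries: %s" % ", ".join(missing))
```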
For example, your config file is
lstm-wkt103-mixed.py. To train without Horovod, update
use_horovod to False in the config file and run:
python run.py --config_file=example_configs/transfer/imdb-wkt2.py --mode=train_eval --enable_logs
When training with Horovod, use the following command:
mpiexec --allow-run-as-root -np <num_gpus> python run.py --config_file=example_configs/transfer/imdb-wkt2.py --mode=train_eval --enable_logs
Some things to keep in mind:
- Don’t forget to update
num_gpus to the number of GPUs you want to use.
- If your GPUs run out of memory, reduce the
Running in the mode
eval will evaluate your model on the evaluation set:
python run.py --config_file=example_configs/transfer/imdb-wkt2.py --mode=eval --enable_logs
Running in the mode
infer will evaluate your model on the test set:
python run.py --config_file=example_configs/transfer/imdb-wkt2.py --mode=infer --enable_logs
Model performance is reported as accuracy and F1 score.
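For reference, the two metrics can be computed as in the sketch below. It assumes binary labels with 1 as the positive class and computes F1 for the positive class only; the text does not specify which F1 variant is reported, and the function name and sample labels are made up for illustration.

```python
def accuracy_and_f1(labels, predictions, positive=1):
    """Return (accuracy, F1 of the positive class) for two equal-length lists."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == positive and p == positive)
    false_pos = sum(1 for y, p in pairs if y != positive and p == positive)
    false_neg = sum(1 for y, p in pairs if y == positive and p != positive)
    accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)
    denominator = 2 * true_pos + false_pos + false_neg
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    f1 = 2 * true_pos / denominator if denominator else 0.0
    return accuracy, f1


labels = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 0, 1, 1, 0]
accuracy, f1 = accuracy_and_f1(labels, predictions)
```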