Table 1.
Outcomes distribution on our dataset.
Fig 1.
Yearly distributions of results of appeals against rulings from Federal Small Claims Courts within the jurisdiction of the 5th Regional Federal Court.
The data for 2020 covers only appeals tried during the first quarter of the year.
Fig 2.
Comparison between traditional holdout method and our time-sensitive approach.
Grey arrows show how, in the traditional holdout method, data leaks from the training set into the validation and test sets, which leads to overly optimistic results, as the model has access to information from the future.
Table 2.
Hyperparameters for the ULMFiT classifiers.
Fig 3.
Overview of the BERT + LSTM model.
The text of a court ruling is split into chunks of 512 tokens each. Each chunk is passed to a Portuguese BERT model, from which we collect the embedding of the [CLS] token. An LSTM condenses the sequence of chunk embeddings into a single vector, which is passed to a classifier head to produce the final classification.
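The chunk-then-LSTM pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Portuguese BERT encoder is assumed to run upstream and is omitted here, and the embedding size (768), hidden size (256), and number of classes (2) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChunkLSTMClassifier(nn.Module):
    """Condenses per-chunk [CLS] embeddings into one vector with an LSTM,
    then classifies. Assumes the BERT [CLS] embeddings (one per 512-token
    chunk) are computed upstream and passed in as a tensor."""

    def __init__(self, emb_dim=768, hidden_dim=256, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, cls_embeddings):
        # cls_embeddings: (batch, n_chunks, emb_dim)
        _, (h_n, _) = self.lstm(cls_embeddings)
        # h_n[-1] is the last hidden state: one vector per document
        return self.head(h_n[-1])  # (batch, n_classes)

# Usage: a ruling split into 5 chunks of 512 tokens -> 5 [CLS] embeddings
model = ChunkLSTMClassifier()
logits = model(torch.randn(1, 5, 768))
print(tuple(logits.shape))
```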
Table 3.
Hyperparameters for the BERT + LSTM classifier.
Table 4.
Hyperparameters for the Big Bird classifier.
Table 5.
Size and label distribution for the validation, test and human experts’ datasets.
Table 6.
Experience and education of the experts who labelled our dataset.
Table 7.
Performance results on the human experts dataset and test dataset.
We computed 99% confidence intervals using the Fisher r-to-z transformation.
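The Fisher r-to-z interval for a correlation coefficient (the MCC is one) can be sketched as below. The sample size and MCC value in the usage line are illustrative assumptions, not figures from the paper; 2.576 is the two-sided 99% normal quantile.

```python
import math

def fisher_ci(r, n, z_crit=2.576):
    """99% confidence interval for a correlation coefficient r
    via the Fisher r-to-z transformation; n is the number of
    paired observations."""
    z = math.atanh(r)             # r -> z space
    se = 1.0 / math.sqrt(n - 3)   # standard error in z space
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to r space

# Hypothetical example: MCC of 0.80 measured on 200 cases
lo, hi = fisher_ci(0.80, 200)
print(round(lo, 3), round(hi, 3))
```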
Fig 4.
Matthews Correlation Coefficient for each architecture on the test dataset.
We calculated the MCC over all appeals tried in each month. The monthly MCC for all models remains stable throughout the timeframe of the test dataset, showing no evidence of data drift.
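Grouping predictions by month and computing the MCC per group can be sketched as below. This is a self-contained illustration with made-up labels; the record layout `(month, true_label, predicted_label)` is an assumption, not the paper's data format.

```python
import math
from collections import defaultdict

def mcc(y_true, y_pred):
    """Binary Matthews Correlation Coefficient from the confusion matrix."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def monthly_mcc(records):
    """records: iterable of (month, true_label, predicted_label)."""
    by_month = defaultdict(lambda: ([], []))
    for month, t, p in records:
        by_month[month][0].append(t)
        by_month[month][1].append(p)
    return {m: mcc(ts, ps) for m, (ts, ps) in sorted(by_month.items())}

# Hypothetical predictions for two months
data = [("2020-01", 1, 1), ("2020-01", 0, 0), ("2020-01", 1, 0), ("2020-01", 0, 1),
        ("2020-02", 1, 1), ("2020-02", 0, 0), ("2020-02", 1, 1), ("2020-02", 0, 0)]
print(monthly_mcc(data))  # {'2020-01': 0.0, '2020-02': 1.0}
```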