Table 1.
Outcomes distribution on our dataset.
Fig 1.
Yearly distributions of results of appeals against rulings from Federal Small Claims Courts within the jurisdiction of the 5th Regional Federal Court.
The data for 2020 covers only appeals tried during the first quarter of the year.
Fig 2.
Comparison between traditional holdout method and our time-sensitive approach.
Grey arrows show how, in the traditional holdout method, data leaks from the training set into the validation and test sets, which leads to overly optimistic results, as the model has access to information from the future.
Table 2.
Hyperparameters for the ULMFiT classifiers.
Fig 3.
Overview of the BERT + LSTM model.
The text of a court ruling is split into chunks of 512 tokens each. Each chunk is passed to a Portuguese BERT model, from which we collect the embedding of the [CLS] token. An LSTM condenses the sequence of chunk embeddings into a single vector, which is passed to a classifier head to produce the final classification.
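The chunk-then-LSTM pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Portuguese BERT encoder is assumed to run upstream and is omitted here, and the embedding size (768), hidden size (256), and number of classes (2) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChunkLSTMClassifier(nn.Module):
    """Condenses per-chunk [CLS] embeddings into one vector with an LSTM,
    then classifies. Assumes the BERT [CLS] embeddings (one per 512-token
    chunk) are computed upstream and passed in as a tensor."""

    def __init__(self, emb_dim=768, hidden_dim=256, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, cls_embeddings):
        # cls_embeddings: (batch, n_chunks, emb_dim)
        _, (h_n, _) = self.lstm(cls_embeddings)
        # h_n[-1] is the last hidden state: one vector per document
        return self.head(h_n[-1])  # (batch, n_classes)

# Usage: a ruling split into 5 chunks of 512 tokens -> 5 [CLS] embeddings
model = ChunkLSTMClassifier()
logits = model(torch.randn(1, 5, 768))
print(tuple(logits.shape))
```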
Table 3.
Hyperparameters for the BERT + LSTM classifier.
Table 4.
Hyperparameters for the Big Bird classifier.
Table 5.
Size and label distribution for the validation, test and human experts’ datasets.
Table 6.
Experience and education of the experts who labelled our dataset.
Table 7.
Performance results on the human experts dataset and test dataset.
We computed 99% confidence intervals using the Fisher r-to-z transformation.
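The Fisher r-to-z interval for a correlation coefficient (the MCC is one) can be sketched as below. The sample size and MCC value in the usage line are illustrative assumptions, not figures from the paper; 2.576 is the two-sided 99% normal quantile.

```python
import math

def fisher_ci(r, n, z_crit=2.576):
    """99% confidence interval for a correlation coefficient r
    via the Fisher r-to-z transformation; n is the number of
    paired observations."""
    z = math.atanh(r)             # r -> z space
    se = 1.0 / math.sqrt(n - 3)   # standard error in z space
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to r space

# Hypothetical example: MCC of 0.80 measured on 200 cases
lo, hi = fisher_ci(0.80, 200)
print(round(lo, 3), round(hi, 3))
```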
Fig 4.
Matthews Correlation Coefficient for each architecture on the test dataset.
We calculated the MCC over all appeals tried in each month. The monthly MCC for all models remains stable throughout the timeframe of the test dataset, showing no evidence of data drift.
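Grouping predictions by month and computing the MCC per group can be sketched as below. This is a self-contained illustration with made-up labels; the record layout `(month, true_label, predicted_label)` is an assumption, not the paper's data format.

```python
import math
from collections import defaultdict

def mcc(y_true, y_pred):
    """Binary Matthews Correlation Coefficient from the confusion matrix."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def monthly_mcc(records):
    """records: iterable of (month, true_label, predicted_label)."""
    by_month = defaultdict(lambda: ([], []))
    for month, t, p in records:
        by_month[month][0].append(t)
        by_month[month][1].append(p)
    return {m: mcc(ts, ps) for m, (ts, ps) in sorted(by_month.items())}

# Hypothetical predictions for two months
data = [("2020-01", 1, 1), ("2020-01", 0, 0), ("2020-01", 1, 0), ("2020-01", 0, 1),
        ("2020-02", 1, 1), ("2020-02", 0, 0), ("2020-02", 1, 1), ("2020-02", 0, 0)]
print(monthly_mcc(data))  # {'2020-01': 0.0, '2020-02': 1.0}
```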