Learning to diagnose common thorax diseases on chest radiographs from radiology reports in Vietnamese

doi:10.1371/journal.pone.0276545

Fig 1.

Overview of the process for collecting and building the medical image dataset.

The process consists of five steps: data collection from PACS and HIS, PA-view filtering, XML parsing, data matching and data annotation.

Fig 2.

The description in a typical radiology report in Vietnam.

The description is divided into four main categories: chest wall, pleura, lungs (parenchyma) and cardiac.

Fig 3.

Radiology report extraction process for CXR examinations collected from HIS [38].

The original Vietnamese text is enclosed in square brackets.

Fig 4.

Algorithm for matching a DICOM file obtained from PACS with a radiology report collected from HIS.
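
A minimal sketch of how such a matching step might look, assuming that both the DICOM header and the report carry a patient identifier and an examination date. The field names (`patient_id`, `study_date`, `file`, `report_id`), the one-day tolerance and the function `match_reports` are hypothetical and are not taken from the algorithm in Fig 4.

```python
from datetime import date

def match_reports(dicom_headers, reports, max_days=1):
    """Pair each DICOM header with the closest-dated report of the same patient.

    Hypothetical sketch: field names and the date tolerance are assumptions.
    """
    # Index reports by patient so each lookup only scans that patient's reports.
    by_patient = {}
    for report in reports:
        by_patient.setdefault(report["patient_id"], []).append(report)

    matches = []
    for header in dicom_headers:
        candidates = by_patient.get(header["patient_id"], [])
        # Pick the report whose date is closest to the imaging date, if any.
        best = min(
            candidates,
            key=lambda r: abs((r["study_date"] - header["study_date"]).days),
            default=None,
        )
        if best is None:
            continue
        gap = abs((best["study_date"] - header["study_date"]).days)
        if gap <= max_days:
            matches.append((header["file"], best["report_id"]))
    return matches
```

Images without a report inside the tolerance window are simply left unmatched in this sketch, so they would drop out of the dataset rather than receive a doubtful label.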

Fig 5.

Semi-automated data annotation pipeline.

The system consists of four steps: the first three are automatic and the last is carried out manually.

Table 1.

Examples of Vietnamese keywords that indicate abnormalities in the chest wall, pleura, parenchyma and cardiac classes, and abnormalities outside these four groups.

English translations are enclosed in square brackets.
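
Keyword-based labeling of this kind can be sketched as follows. The keywords below are illustrative assumptions, not the entries of Table 1, and the sentence-level negation check is a deliberately crude stand-in for whatever rules the actual tool applies. The names `KEYWORDS`, `NEGATION_CUE` and `label_report` are hypothetical, and the fifth class (abnormalities outside the four groups) is omitted.

```python
import unicodedata

# Illustrative keywords only; assumptions, not the contents of Table 1.
KEYWORDS = {
    "chest_wall": ["gãy xương"],         # [bone fracture]
    "pleura": ["tràn dịch màng phổi"],   # [pleural effusion]
    "parenchyma": ["đám mờ"],            # [opacity]
    "cardiac": ["bóng tim to"],          # [enlarged cardiac shadow]
}
NEGATION_CUE = "không"  # [no / not]; assumed cue for a negated finding

def normalize(text):
    # Vietnamese diacritics can be encoded in composed or decomposed form,
    # so normalize before comparing strings.
    return unicodedata.normalize("NFC", text).lower()

def label_report(description):
    """Return a 0/1 label per class for one report description."""
    labels = {name: 0 for name in KEYWORDS}
    for sentence in normalize(description).split("."):
        if normalize(NEGATION_CUE) in sentence:
            continue  # skip sentences that negate a finding
        for name, keywords in KEYWORDS.items():
            if any(normalize(k) in sentence for k in keywords):
                labels[name] = 1
    return labels
```

Skipping a whole sentence on a single negation cue will miss mixed sentences that contain both a negated and a positive finding, which is one reason a manual review step remains useful.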

Table 2.

Number of instances for each of the five labeled observations in the training set, the validation set and the whole dataset.

Table 3.

Evaluation results of the proposed labeling tool.

Evaluation was performed on 3001 samples of the validation set.

Table 4.

Experimental results with different pre-training datasets and loss functions.

The model pre-trained on the CheXpert dataset and trained with the asymmetric loss function yields the best performance.
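
For orientation, here is a minimal sketch of an asymmetric loss for multi-label classification, assuming the caption refers to the formulation that uses separate focusing parameters for positive and negative labels plus a probability margin on negatives. The default values of `gamma_pos`, `gamma_neg` and `margin` are illustrative assumptions, not the settings used in the experiments.

```python
import math

def asymmetric_loss(probs, targets, gamma_pos=1.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Sum of per-label asymmetric losses for one sample.

    probs   : predicted probabilities, one per class
    targets : 1 for a positive label, 0 for a negative label
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive term: focal-style weighting with a mild exponent.
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative term: shift the probability down by the margin so that
            # very easy negatives contribute exactly zero loss.
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total
```

With both exponents and the margin set to zero the expression reduces to ordinary binary cross-entropy, which makes the effect of each parameter easy to check in isolation.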

Table 5.

Experimental results with different backbones and input sizes.

The model with the EfficientNet-B2 architecture and an input size of 768 delivers the best performance.

Table 6.

Performance of EfficientNet-B2 on five classes.

Fig 6.

Area under the ROC curve.

The pleura class achieved the highest AUC, 0.96 (95% CI 0.94, 0.97), whereas the chest wall class had the lowest, 0.81 (95% CI 0.75, 0.85).
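
The caption does not say how the confidence intervals were obtained. One common choice is a percentile bootstrap over the evaluation samples, sketched below under that assumption. The function names, the number of resamples and the seed are hypothetical.

```python
import random

def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Pairwise comparison is quadratic; fine for a sketch, slow for large sets.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # redraw when a resample contains only one class
        aucs.append(roc_auc(sample_labels, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lower, upper
```

A rarer class yields fewer positives per resample, so its interval tends to be wider, which is consistent with the wider interval reported for the chest wall class.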

Table 7.

The mappings between CheXpert data labels (14 classes) and the proposed set of labels (5 classes).

P and N refer to positive and negative, respectively.
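
A coarse-from-fine label mapping of this kind could be applied as in the sketch below. The grouping in `ASSUMED_MAPPING` is a partial, illustrative guess based on anatomy and is not the mapping defined in Table 7. The rule that a coarse class is positive when any of its fine labels is positive is likewise an assumption.

```python
# Partial, hypothetical grouping of CheXpert labels; Table 7 defines the real one.
ASSUMED_MAPPING = {
    "chest_wall": ["Fracture"],
    "pleura": ["Pleural Effusion", "Pleural Other", "Pneumothorax"],
    "parenchyma": ["Lung Opacity", "Consolidation", "Pneumonia", "Atelectasis"],
    "cardiac": ["Cardiomegaly"],
}

def to_coarse(fine_labels):
    """Map CheXpert labels (name -> 1 for positive, 0 for negative) to coarse classes.

    Missing or uncertain labels are treated as not positive in this sketch.
    """
    return {
        coarse: int(any(fine_labels.get(name, 0) == 1 for name in names))
        for coarse, names in ASSUMED_MAPPING.items()
    }
```

For example, `to_coarse({"Pleural Effusion": 1, "Fracture": 0})` marks only the pleura class as positive.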

Table 8.

Comparison of coarse and fine classification on CheXpert.

Fig 7.

Original images and respective Grad-CAMs.

The first two images show a nondisplaced collarbone fracture, while the last two show pleural effusion. Both pathologies were correctly highlighted.
