Neuro-symbolic procedural semantics for explainable visual dialogue

doi:10.1371/journal.pone.0323098

Fig 1.

Schematic representation of a typical visual dialogue task.

In this task, an artificial agent needs to answer a sequence of follow-up questions about an image.


Fig 2.

Example of a procedural semantic representation for the question ‘Are there more squares than circles?’, executed on the image in Fig 1.

The answer to the question given this image is no.
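As a rough illustration of what executing such a procedural semantic representation could look like, the sketch below chains three primitive operations over a symbolic scene. The scene contents and the names `filter_by`, `count`, and `greater_than` are assumptions made for illustration; they are not taken from the paper, and the actual image in Fig 1 is not reproduced here.

```python
from typing import Dict, List

Scene = List[Dict[str, str]]

# Hypothetical symbolic scene (an assumption, not the real content of Fig 1).
SCENE: Scene = [
    {"shape": "square", "colour": "red"},
    {"shape": "circle", "colour": "blue"},
    {"shape": "circle", "colour": "green"},
    {"shape": "triangle", "colour": "red"},
]


def filter_by(objects: Scene, attribute: str, value: str) -> Scene:
    """Keep only the objects whose attribute equals the given value."""
    return [obj for obj in objects if obj.get(attribute) == value]


def count(objects: Scene) -> int:
    """Return the number of objects in a set."""
    return len(objects)


def greater_than(left: int, right: int) -> bool:
    """Compare two counts."""
    return left > right


def more_squares_than_circles(scene: Scene) -> str:
    """Execute the network for 'Are there more squares than circles?'."""
    squares = filter_by(scene, "shape", "square")
    circles = filter_by(scene, "shape", "circle")
    return "yes" if greater_than(count(squares), count(circles)) else "no"


answer = more_squares_than_circles(SCENE)
print(answer)
```

Each primitive consumes the output of the previous one, so every intermediate result (the filtered sets and the two counts) can be inspected separately.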


Fig 3.

Schematic representation of the conversation memory after the fourth turn of the dialogue sketched in Fig 1.

The conversation memory is incrementally updated after each dialogue turn as new information becomes available.
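One way to picture an incrementally updated conversation memory is as an append-only list of turns, each recording the question, the answer, and the objects that the turn was about. The sketch below is only a minimal illustration under that assumption; the class names, the `topic_ids` field, and the example dialogue are hypothetical and do not reflect the paper's actual data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    question: str
    answer: str
    topic_ids: List[int]  # hypothetical: ids of the objects this turn refers to


@dataclass
class ConversationMemory:
    turns: List[Turn] = field(default_factory=list)

    def update(self, question: str, answer: str, topic_ids: List[int]) -> None:
        """Append the information that became available in this turn."""
        self.turns.append(Turn(question, answer, list(topic_ids)))

    def last_topic(self) -> Optional[List[int]]:
        """Return the objects of the most recent turn that had a topic."""
        for turn in reversed(self.turns):
            if turn.topic_ids:
                return turn.topic_ids
        return None


memory = ConversationMemory()
memory.update("How many squares are there?", "1", [0])
# A follow-up such as 'What is its colour?' can resolve 'its' via the memory.
referent = memory.last_topic() or []
memory.update("What is its colour?", "red", referent)
```

Under this design, resolving a pronoun reduces to a lookup over earlier turns, which also means that a wrong entry stored in one turn can propagate to later answers.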


Fig 4.

Schematic representation of the implementation of the primitive operations.


Table 1.

Overview of primitive operations categorised by their symbolic or subsymbolic implementation.


Table 2.

Overview of the shared inventory of neural modules on top of which the subsymbolic primitive operations are built.

All modules are implemented as binary classifiers adopting the SqueezeNet architecture [73].


Fig 5.

Example dialogue from the MNIST Dialog dataset.


Fig 6.

Schematic representation of the execution of the semantic representation for the utterance ‘What is its colour?’ following the utterance ‘How many 3’s are there?’ on a scene from the MNIST Dialog dataset.


Table 3.

Overview of results for MNIST Dialog, CLEVR-Dialog and CLEVR VQA.


Fig 7.

Example dialogue from the CLEVR-Dialog dataset.


Fig 8.

Schematic representation of the execution of the semantic representation for the utterance ‘What is its colour?’ following the caption ‘There is a large sphere.’ on a scene from the CLEVR-Dialog dataset.


Fig 9.

Schematic representation of the execution of the semantic network underlying the utterance ‘How many brown objects are there?’ on a scene from the CLEVR-Dialog dataset, illustrating the transparency of the approach.

The filter operation wrongly classifies the leftmost object as brown. As a consequence, two brown objects are counted instead of one.
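The kind of error localisation described here can be sketched by recording the intermediate result of every primitive operation. In the example below, the scene, the `predicted_colour` and `true_colour` fields, and the `trace` list are all hypothetical; in particular, the true colour of the misclassified object is an assumption chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical scene: object 0 (leftmost) is misclassified as brown.
scene: List[Dict] = [
    {"id": 0, "true_colour": "gray", "predicted_colour": "brown"},
    {"id": 1, "true_colour": "brown", "predicted_colour": "brown"},
    {"id": 2, "true_colour": "blue", "predicted_colour": "blue"},
]

# Intermediate results of every primitive, in execution order.
trace: List[Tuple[str, object]] = []


def filter_colour(objects: List[Dict], colour: str) -> List[Dict]:
    """Keep objects whose predicted colour matches, and log their ids."""
    result = [obj for obj in objects if obj["predicted_colour"] == colour]
    trace.append(("filter", [obj["id"] for obj in result]))
    return result


def count(objects: List[Dict]) -> int:
    """Count the objects in a set and log the result."""
    n = len(objects)
    trace.append(("count", n))
    return n


answer = count(filter_colour(scene, "brown"))

# Comparing predictions with the ground truth pinpoints the faulty step.
misclassified = [
    obj["id"] for obj in scene if obj["predicted_colour"] != obj["true_colour"]
]
print(trace, answer, misclassified)
```

Because the trace shows that the filter step already returned two objects, the wrong count can be attributed to that step rather than to the counting operation.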


Fig 10.

Schematic representation of the execution of the semantic network underlying the utterance ‘How many other objects are there?’ on a scene from the CLEVR-Dialog dataset, illustrating the transparency of the approach.

The figure shows that the semantic network and its execution are correct. Consequently, the erroneous answer, three, must stem from an error introduced into the conversation memory in a previous dialogue turn.
