Retrieval-augmented Chinese text-to-SQL generation for conversational bibliographic search

doi:10.1371/journal.pone.0334965

Fig 1.

A multi-hop example from the BibSQL dataset.

Table 1.

Examples of question templates and their corresponding query paths in the BibSQL dataset.

Fig 2.

Distribution of questions and queries in the BibSQL dataset.

These histograms show the length distributions for (a) natural language questions and (b) SQL queries.

Table 2.

Detailed statistics of the BibSQL dataset.

Fig 3.

End-to-end workflow of the proposed retrieval-augmented Text-to-SQL framework.

Fig 4.

The overall process of the SoftSimMatch framework.

Table 3.

Evaluation of different example selections on BibSQL-test.

Table 4.

Comparative evaluation of LLMs on BibSQL-test.

Performance is reported under several retrieval-augmented generation (RAG) settings, which supply as context either the top 1, 2, or 5 most similar questions (RAG-top1/2/5) or a random dataset sample (RAG-random1). Best results are in bold and second-best are underlined.

Table 5.

Comparative evaluation of LLMs on BibSQL-test with small training set.

Table 6.

Detailed accuracy of SQL query components on BibSQL-test with small training set.

Table 7.

Error analysis of the proposed system on BibSQL-test with small training set.
