Contrastive learning enhanced retrieval-augmented few-shot framework for multi-label patent classification

  • Wenlong Zheng,

    Roles Data curation, Formal analysis, Resources, Software, Supervision, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Ningbo University of Finance and Economics, School of Finance and Information, Ningbo, Zhejiang, China

  • Xin Li ,

    Roles Data curation, Methodology, Project administration, Validation

    LiXin_DXYD@hotmail.com

    Affiliation The First Topographic Surveying Brigade of Ministry of Natural Resource of P.R.C., Xi’an, Shaanxi, China

  • Guoqing Cui,

    Roles Investigation, Methodology, Resources, Writing – review & editing

    Affiliations The First Topographic Surveying Brigade of Ministry of Natural Resource of P.R.C., Xi’an, Shaanxi, China, Northwest Land and Resources Research Center, Shaanxi Normal University, Xi’an, Shaanxi, China

  • Shikun Chen

    Roles Funding acquisition, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Ningbo University of Finance and Economics, School of Finance and Information, Ningbo, Zhejiang, China

Abstract

The rapid expansion of patent databases poses increasing challenges for multi-label patent classification, particularly for inventions spanning multiple technological domains. Conventional approaches are hindered by high annotation costs and limited scalability, while often neglecting the semantic structure of patent documents. Here, we present a retrieval-enhanced few-shot learning framework that combines patent-specific contrastive pre-training with semantic retrieval to enable scalable multi-label classification. Drone technologies are selected as the evaluation domain due to their multidisciplinary characteristics encompassing mechanical, electronic, and software aspects. The proposed method learns domain-adapted embeddings that capture multi-label co-occurrence patterns and leverages retrieval-augmented few-shot learning with structured reasoning to reduce reliance on extensive annotations. Experiments on a curated dataset of 15,000 annotated drone patents across ten categories demonstrate that the framework achieves Macro-F1 and Micro-F1 scores of 0.847 and 0.892, corresponding to improvements of 30% and 23% over few-shot baselines. Furthermore, contrastive pre-training yields notable benefits for underrepresented categories, with performance improvements reaching 16% over transformer-based approaches. These results indicate that the proposed approach offers an effective and resource-efficient solution for multi-label patent classification, with potential to improve the scalability and accessibility of intellectual property analysis.

Introduction

The global intellectual property landscape operates through advanced classification systems that organize technological innovations across diverse fields. International patent offices, including the United States Patent and Trademark Office, the European Patent Office, and the World Intellectual Property Organization, collectively manage large repositories containing tens of millions of patent documents [1]. These repositories rely on hierarchical classification frameworks such as the International Patent Classification and Cooperative Patent Classification systems, which structure technological knowledge into detailed categories that enable efficient prior art searches and strategic innovation analysis [2].

Contemporary patent filing rates have reached unprecedented levels. As a result, conventional classification methods face increasing difficulty, since the volume of new applications exceeds the capacity of manual review by several orders of magnitude [3]. This exponential growth intersects with increasing technological complexity, as modern patents frequently span multiple domains simultaneously. Rather than fitting neatly into single categories, today’s patents often encompass interdisciplinary concepts that challenge conventional classification boundaries [4].

The technical challenges inherent in patent classification extend beyond mere scale considerations. Patents exhibit an inherently multi-label nature, with individual documents simultaneously belonging to multiple technological categories [5]. Consider autonomous vehicle patents: they encompass sensor technology, artificial intelligence algorithms, mechanical engineering principles, and telecommunications components within unified systems. This complexity is compounded by the specialized vocabulary and technical precision required in patent language, where subtle terminological differences can indicate entirely different technological approaches [6].

The annotation bottleneck represents a particularly acute challenge in this domain. Unlike general text classification where crowd-sourcing can provide adequate labels, patent classification requires deep technical expertise across multiple disciplines. This makes large-scale annotation efforts economically prohibitive [7]. Traditional supervised approaches demand thousands of labeled examples per category, while the specialized knowledge required for accurate patent labeling severely limits the pool of qualified annotators [8]. A fundamental tension thus exists between the need for adequate training data and the practical constraints of obtaining expert-level annotations.

Few-shot learning paradigms offer promising solutions for addressing these annotation constraints. Their application to specialized technical domains, however, presents distinct challenges due to complex semantic relationships and domain-specific terminology [9]. Rather than relying on general-purpose language representations, technical documents demand approaches that capture domain-specific semantic nuances. Recent advances in contrastive learning have shown promise for addressing representation challenges in specialized domains by learning embeddings that capture semantic relationships between similar and dissimilar samples [10].

Existing contrastive frameworks typically focus on single-label scenarios and fail to account for the complex co-occurrence patterns that characterize multi-label technical documents [11]. Retrieval-augmented approaches have simultaneously gained attention as powerful tools for enhancing few-shot performance by leveraging semantic similarity to identify relevant demonstration examples [12]. The combination of these techniques presents untapped potential for technical document classification, particularly in scenarios where label relationships exhibit hierarchical or co-occurrence structures.

Contrastive pre-training provides a natural complement to retrieval-based approaches by learning representations that explicitly encode semantic relationships between documents [13]. In the patent domain, contrastive objectives can be designed to respect multi-label co-occurrence patterns, ensuring that patents sharing technological categories are embedded in similar representation spaces while maintaining separation between distinct domains [14]. Such an approach enables retrieval mechanisms to identify demonstration examples that are not only semantically similar but also relevant for multi-label prediction tasks.

Unmanned aerial vehicle (UAV) technologies exemplify the multi-label classification challenges inherent in modern patent analysis. UAV innovations typically integrate mechanical components such as propulsion systems and structural elements, electronic systems including sensors and control circuits, software algorithms for navigation and autonomous control, and communication technologies for data transmission and remote control [15]. This technological convergence creates patent documents that naturally span multiple classification categories, providing an ideal testbed for multi-label few-shot learning approaches.

The framework presented in this study integrates contrastive pre-training with retrieval-enhanced few-shot learning to address multi-label patent classification under minimal annotation constraints. Our approach employs patent-specific contrastive objectives that capture multi-label co-occurrence patterns while leveraging semantic retrieval to identify demonstration examples that guide few-shot classification decisions. Through chain-of-thought reasoning, the framework methodically evaluates each potential label by considering retrieved examples and learned representations, enabling structured multi-label predictions without extensive training data.

Our work makes several contributions to the intersection of few-shot learning and technical document classification. We introduce a contrastive pre-training strategy specifically designed for multi-label patent classification that respects technological co-occurrence patterns while learning discriminative representations. We develop a retrieval-enhanced few-shot framework that leverages semantic similarity to identify informative demonstration examples for multi-label prediction tasks. Through comprehensive evaluation on UAV patent data encompassing ten technological categories, we demonstrate that our approach achieves meaningful improvements over conventional few-shot baselines while requiring minimal labeled training data.

Background and related work

Patent classification with domain-specific language models

Patent classification constitutes a cornerstone of intellectual property management, yet the exponential growth of patent filings has rendered traditional classification approaches increasingly inadequate. The complexity of patent documents – characterized by specialized technical vocabulary, legal terminology, and interdisciplinary content – poses formidable challenges for automated classification systems [16]. Modern patents frequently span multiple technological domains simultaneously, necessitating multi-label classification frameworks rather than single-label approaches [17].

The landscape of patent classification has witnessed significant evolution through the adoption of deep learning architectures. Transformer-based models, particularly those leveraging pre-trained language representations, have achieved notable success in capturing the nuanced semantics of patent text [18]. PatentNet employs fine-tuned variants of BERT, XLNet, and RoBERTa for multi-label patent classification, establishing new benchmarks on the USPTO-2M dataset [5]. These advances notwithstanding, such approaches demand extensive labeled training data – a requirement that proves particularly onerous given the specialized expertise necessary for accurate patent annotation [19].

Domain-specific language models have emerged as a promising direction for addressing these challenges. PatentGPT and PatentSBERTa represent significant advances in patent-specific pre-training, with PatentGPT trained on over 240 billion tokens of patent-related text demonstrating superior performance on intellectual property benchmarks [18,20,21]. Continual pre-training strategies offer cost-effective domain adaptation by leveraging existing general-purpose models as initialization points [22,23]. The construction of domain-specific training corpora requires careful curation, incorporating patent specifications, office actions, prior art citations, and technical literature [24–26]. Vocabulary adaptation to accommodate technical terms, chemical formulas, and specialized notation further improves both compression rates and semantic representation quality [27,28].

The multi-label nature of patent classification introduces additional complexities beyond traditional document categorization. Patents describing autonomous systems may simultaneously encompass mechanical engineering, artificial intelligence, telecommunications, and control systems – each warranting distinct classification labels [29]. Hierarchical approaches have been proposed to capture structured relationships in patent classification systems, with graph convolutional networks effectively modeling label dependencies in the International Patent Classification hierarchy [30]. UAV patents exemplify these multi-disciplinary challenges, spanning mechanical, electronic, software, and communication technologies [31,32]. This technological convergence makes UAV patents an ideal testbed for evaluating multi-label classification methods designed to handle complex, interdisciplinary innovations.

Contrastive learning for document classification

Contrastive learning has revolutionized representation learning across various domains by explicitly optimizing for discriminative embeddings that cluster similar samples while separating dissimilar ones [33]. In the context of document classification, this paradigm shift has yielded notable improvements, particularly in scenarios with limited labeled data. The fundamental principle–learning representations through the lens of similarity and dissimilarity–aligns naturally with the objectives of classification tasks [34].

Recent advances have extended contrastive frameworks to accommodate the complexities of multi-label scenarios. Traditional contrastive objectives, designed primarily for single-label settings, fail to capture the nuanced relationships present when documents can simultaneously belong to multiple categories [35]. CAROL (Class-Aware Contrastive Loss) addresses this limitation by incorporating class separation objectives specifically tailored for imbalanced multi-label text classification [36]. Similarly, label-aware contrastive approaches have demonstrated superior performance by explicitly modeling inter-label relationships during the representation learning phase [37].

The application of contrastive learning to specialized technical domains presents unique opportunities and challenges. Technical documents exhibit domain-specific semantic structures that differ markedly from general text corpora [38]. SimCSE (Simple contrastive learning of sentence embeddings), while successful in general sentence embedding tasks, requires careful adaptation when applied to patent text, where subtle terminological distinctions carry significant legal and technical implications [39]. Recent work has shown that domain-adapted contrastive objectives, which incorporate technical term relationships and hierarchical concept structures, outperform generic contrastive frameworks [40].

The synergy between contrastive pre-training and downstream classification tasks has proven particularly effective. CICA (Content-Injected Contrastive Alignment) demonstrates that incorporating document-specific content modules during contrastive training enhances zero-shot classification capabilities [35]. This approach suggests that contrastive objectives can be designed not merely to learn general representations, but to actively prepare models for specific downstream tasks. For patent classification, this implies the possibility of designing contrastive objectives that respect the multi-label co-occurrence patterns inherent in technological innovation [14].

Decoupled supervised contrastive learning represents another promising direction, separating the representation learning phase from the classification objective [41]. This decoupling allows for more stable training dynamics and improved convergence, particularly beneficial when dealing with the high-dimensional, sparse label spaces characteristic of patent classification [42]. The approach has shown remarkable success in domains requiring fine-grained discrimination between semantically similar categories – a common requirement in patent analysis [43].

Retrieval-augmented few-shot learning

The integration of retrieval mechanisms with few-shot learning paradigms has emerged as a powerful approach for addressing data scarcity challenges. Atlas, a landmark contribution in this area, demonstrates that retrieval-augmented language models can achieve competitive performance on knowledge-intensive tasks with minimal training examples [44]. By leveraging external knowledge sources during both training and inference, these models effectively augment their limited parametric knowledge with non-parametric retrieval [45–47].

The application of retrieval-augmented approaches to classification tasks requires careful consideration of demonstration selection strategies. Rather than relying on random sampling from limited training sets, retrieval-based methods identify semantically relevant examples that provide maximal information for classification decisions [48]. This targeted selection proves particularly valuable in technical domains where examples must capture domain-specific nuances [49]. Recent work has shown that task-specific retrieval metrics, learned jointly with classification objectives, outperform generic similarity measures [50,51]. In-context learning approaches have demonstrated effectiveness for text classification with many labels, where demonstration selection proves crucial for handling large label spaces [52]. Few-shot learning has also achieved notable advances in computer vision tasks, particularly multimodal approaches for 3D point cloud segmentation [53,54]. These cross-domain developments may inform future methodological innovations in patent classification.

Meta-learning frameworks have been successfully combined with retrieval mechanisms to enhance few-shot performance. By meta-training on diverse tasks while incorporating retrieval-based demonstration selection, models develop the ability to rapidly adapt to new domains with minimal examples [55]. This approach proves especially relevant for patent classification, where new technological categories emerge continuously and obtaining extensive labeled data for each category remains impractical [55].

The quality and relevance of retrieved demonstrations influence few-shot classification performance. RePrompt introduces visual prompt learning enhanced by retrieval mechanisms, demonstrating that carefully curated retrieval databases can bridge domain gaps in few-shot scenarios [56]. For patent classification, this suggests that building domain-specific retrieval corpora – containing technically relevant demonstrations – could significantly enhance classification accuracy even with limited labeled examples [57].

Retrieval-augmented approaches also address the challenge of handling rare or emerging categories. In patent classification, where technological innovations continuously create new subcategories, the ability to leverage similar historical patents as demonstrations becomes invaluable [58]. QZero demonstrates that query reformulation through retrieval can improve zero-shot classification by enriching input representations with relevant contextual information [56].

Chain-of-thought reasoning for multi-label prediction

Chain-of-thought (CoT) prompting has fundamentally transformed how language models approach complex reasoning tasks. By decomposing multi-step problems into intermediate reasoning steps, CoT enables models to tackle challenges that would otherwise exceed their capabilities [59]. This structured reasoning approach proves particularly relevant for multi-label classification, where decisions about individual labels may depend on complex inter-label relationships [60].

The application of CoT to classification tasks extends beyond simple prompting strategies. Recent work demonstrates that explicitly modeling the reasoning process for each label decision improves both accuracy and interpretability [61]. In multi-label scenarios, this translates to systematic evaluation of each potential label, considering both positive and negative evidence before making classification decisions [11]. The transparency afforded by chain-of-thought reasoning also facilitates error analysis and model improvement [62].

Zero-shot chain-of-thought approaches have shown effectiveness when combined with retrieval mechanisms. By generating reasoning chains based on retrieved examples, models can perform complex classification without task-specific training [63]. This capability proves especially valuable in patent classification, where the reasoning behind label assignments often involves technical comparisons and legal considerations that benefit from explicit articulation [8].

The emergence of chain-of-thought reasoning as an inherent capability of sufficiently large models suggests interesting implications for domain-specific applications [64]. While general-purpose models demonstrate this capability on arithmetic and commonsense reasoning tasks, adapting CoT for specialized domains like patent classification requires careful consideration of domain-specific reasoning patterns [65]. The technical nature of patent documents demands reasoning chains that incorporate technological relationships, innovation boundaries, and classification hierarchy constraints [29].

Materials and methods

Dataset

Our experimental foundation rests upon a large patent corpus sourced from the National Intellectual Property Administration’s IP Search and Consultation Center China. This repository contains one million patents. Its temporal scope captures the evolution of contemporary technologies, while its volume provides sufficient diversity for robust few-shot learning evaluation.

From this extensive collection, we extracted and curated approximately 100,000 patents related to unmanned aerial vehicle technologies, with 15,000 receiving expert annotations across ten technological categories. This curated UAV patent dataset represents a contribution of this work and is available at https://github.com/redcican/Contrastive-learning-latent-multi-label-classification. The extraction methodology employed a multi-stage filtering approach combining domain-specific keyword identification with controlled vocabulary validation. Our filtering strategy incorporated 47 carefully selected terms organized across five categorical domains: general UAV terminology, configuration-specific descriptors, functional classifications, application-specific identifiers, and technical component designators.

The validation process consisted of three sequential stages designed to ensure extraction precision while minimizing noise. Initial keyword-based filtering identified candidate patents using Boolean search operators to capture documents containing multiple relevant concepts. Cross-validation against Cooperative Patent Classification codes, particularly B64C, B64D, and B64U categories, provided systematic verification of technological relevance. Manual expert review of a stratified sample comprising 2,000 patents assessed both precision and recall, achieving 94.2% and 89.6% respectively.

Quality control measures addressed potential domain leakage and terminological ambiguity through several mechanisms. Automated removal eliminated patents with conflicting or unclear abstracts, while expert review resolved boundary cases where multiple technological domains intersected. Negative filtering terms excluded non-technical applications such as recreational models and simulation software. This systematic approach resulted in a curated dataset with verified technological relevance and minimal noise contamination.

Patent abstracts constitute the primary textual input for classification models due to their comprehensive yet concise summaries of technological developments. The preprocessing pipeline transforms these abstracts through sequential cleaning, translation, and normalization steps. Since many patents originated in Chinese, neural machine translation via Google Translate API generated English versions for linguistic consistency. Text cleaning removed patent-specific boilerplate phrases, special characters, and residual markup. Tokenization employed whitespace delimiters followed by lemmatization to preserve semantic meaning while reducing vocabulary complexity.
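As an illustration, the cleaning and tokenization steps might be sketched as follows. This is a minimal sketch, not the authors' pipeline: the boilerplate phrase list is hypothetical, and the translation and lemmatization steps described above are omitted for brevity.

```python
import re

def preprocess_abstract(text: str) -> list[str]:
    """Sketch of abstract cleaning and whitespace tokenization.
    The boilerplate phrases below are illustrative stand-ins; a full
    pipeline would also translate Chinese text and lemmatize tokens."""
    # Remove patent-specific boilerplate (hypothetical phrase list).
    boilerplate = [r"the present invention relates to",
                   r"the utility model discloses"]
    for pattern in boilerplate:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Strip residual markup and special characters.
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^a-zA-Z0-9\s-]", " ", text)
    # Whitespace tokenization (lemmatization would follow here).
    return text.lower().split()

tokens = preprocess_abstract(
    "The present invention relates to <b>a UAV</b> rotor assembly.")
```

Running the sketch on the sample abstract yields the cleaned token list `["a", "uav", "rotor", "assembly"]`.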

The classification framework targets ten distinct technological categories that capture the multidisciplinary nature of UAV innovations. These categories emerged through systematic thematic analysis combining inductive corpus exploration with expert domain knowledge. Initial topic modeling using Latent Dirichlet Allocation on 10,000 patent abstracts revealed 15 preliminary themes. Expert consolidation by aerospace engineering specialists merged semantically related topics while eliminating overly granular categories. Hierarchical clustering analysis validated topic coherence and ensured adequate categorical separation. The multi-label co-occurrence patterns inherent in this schema provide rich training signals for contrastive pre-training objectives, enabling the model to learn representations that capture technological relationships and domain-specific semantic structures.

Table 1 presents representative examples demonstrating the technological diversity within our dataset. Each entry illustrates the multidisciplinary nature of UAV innovations, from bionic flight mechanisms to modular deployment systems.

Table 1. Representative patent examples from the UAV technology dataset.

https://doi.org/10.1371/journal.pone.0341118.t001

The target classification schema includes the following technological categories:

  • VTOL and hybrid flight systems focusing on vertical take-off capabilities and transition mechanisms;
  • bionic and flapping wing designs mimicking biological flight patterns;
  • modular and deployable architectures emphasizing rapid assembly and compact storage;
  • endurance and power optimization systems incorporating advanced energy solutions;
  • structural integrity and material developments addressing durability and performance;
  • surveillance, inspection, and mapping applications covering data collection and monitoring;
  • logistics and cargo transport capabilities;
  • multi-environment operations including amphibious and submersible functionalities;
  • flight control and stability systems featuring advanced algorithms and mechanisms;
  • specialized applications addressing niche requirements such as agricultural or emergency operations.

A binary relevance approach transforms the multi-label classification problem into independent binary decisions for each technological category. This formulation maintains compatibility with contrastive learning objectives while facilitating efficient computation during few-shot learning scenarios. The dataset structure supports few-shot learning through carefully balanced class representations and diverse technological examples that enable effective demonstration retrieval across all categories. Ground truth labels were established through structured annotation by three independent domain experts, each possessing over five years of specialized experience in aerospace engineering and patent analysis.
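The binary relevance formulation can be sketched in a few lines; the category identifiers below are illustrative shorthand for the ten categories in the schema, not the paper's exact label strings.

```python
# Binary relevance: one independent 0/1 decision per technological category.
# The identifiers below are illustrative shorthand, not the paper's labels.
CATEGORIES = [
    "vtol_hybrid", "bionic_flapping", "modular_deployable", "endurance_power",
    "structural_material", "surveillance_mapping", "logistics_cargo",
    "multi_environment", "flight_control", "specialized_applications",
]

def to_binary_targets(labels: set[str]) -> list[int]:
    """Convert a patent's label set into K independent binary targets."""
    return [1 if c in labels else 0 for c in CATEGORIES]

y = to_binary_targets({"vtol_hybrid", "flight_control"})
```

Each position in `y` then feeds an independent binary decision, which is what keeps the formulation compatible with per-label few-shot evaluation.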

Inter-annotator agreement analysis using Fleiss’ Kappa achieved a score of 0.82, indicating substantial consensus among experts and validating annotation reliability. When disagreements occurred in approximately 12% of cases, a senior patent analyst with over ten years of experience facilitated consensus discussions to establish definitive classifications. This structured annotation process ensures the validity and reliability of our experimental evaluation framework. The curated dataset serves as both training data for contrastive pre-training and a retrieval corpus for identifying semantically similar demonstration examples during few-shot classification tasks.
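Fleiss' Kappa is computed from an item-by-category count table; the following is a generic textbook implementation for reference, not the authors' tooling.

```python
def fleiss_kappa(ratings: list[list[int]]) -> float:
    """Fleiss' kappa for N items rated by n annotators into k categories.
    ratings[i][j] = number of annotators assigning item i to category j."""
    n = sum(ratings[0])   # annotators per item (assumed constant)
    N = len(ratings)      # number of items
    k = len(ratings[0])   # number of categories
    # Overall proportion of assignments to each category.
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # Mean per-item agreement.
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in ratings) / N
    # Chance agreement.
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)
```

With perfect agreement among annotators the statistic is exactly 1; values around 0.8, as reported here, indicate substantial consensus.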

Table 2 summarizes key characteristics of our curated dataset, including class distributions and textual properties.

Table 2. Dataset characteristics and technological category distributions.

https://doi.org/10.1371/journal.pone.0341118.t002

Contrastive pre-training for multi-label patent representations

Given a patent corpus D = {(x_i, y_i)}, where x_i represents patent abstract text and y_i ∈ {0, 1}^K denotes the multi-label assignment across K technological categories, the objective is to learn an encoder f_θ that maps patent text to d-dimensional embeddings z_i = f_θ(x_i). The encoder consists of a RoBERTa-large backbone [66] followed by projection layers that transform contextualized representations into the contrastive embedding space.

The multi-label contrastive framework extends standard contrastive objectives by incorporating label co-occurrence relationships. For a given anchor patent x_i with labels y_i, we define positive examples as patents sharing at least one technological category: P(i) = {j ≠ i : y_i ∩ y_j ≠ ∅}. Rather than treating all other patents as negatives, we introduce a similarity-weighted approach that accounts for partial label overlap.

The multi-label contrastive loss combines instance-level and label-aware objectives:

L_total = L_instance + λ · L_label (1)

where λ controls the balance between instance and label-level learning. The instance-level loss employs InfoNCE with multi-label positive sampling:

L_instance = −(1/N) Σ_i (1/|P(i)|) Σ_{p ∈ P(i)} log [ exp(sim(z_i, z_p)/τ) / Σ_{a ≠ i} exp(sim(z_i, z_a)/τ) ] (2)

where z_i represents the embedding for patent i, sim(·, ·) denotes cosine similarity, and τ is the temperature parameter.
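A numpy sketch of this instance-level InfoNCE objective with multi-label positive sets follows; it assumes L2-normalized embeddings (so dot products equal cosine similarities), and the variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def instance_infonce(Z: np.ndarray,
                     positives: list[set[int]],
                     tau: float = 0.1) -> float:
    """InfoNCE with multi-label positive sampling (sketch).
    Z: (N, d) L2-normalized embeddings; positives[i] = P(i), the indices
    of patents sharing at least one category with anchor i."""
    logits = Z @ Z.T / tau
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity
    # Row-wise log-softmax over all other patents (the denominator).
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    losses = []
    for i, P in enumerate(positives):
        if P:  # average the loss over the anchor's positive set
            losses.append(-np.mean([log_prob[i, p] for p in P]))
    return float(np.mean(losses))

Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
loss = instance_infonce(Z, [{1}, {0}, set()])
```

In this toy example the two positives are identical embeddings, so the loss is close to zero; pulling positives apart would increase it.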

The label-aware component explicitly models technological co-occurrence patterns through a weighted similarity measure. We compute label similarity between patents as:

s(y_i, y_j) = |y_i ∩ y_j| / |y_i ∪ y_j| − α · |y_i Δ y_j| (3)

where Δ represents the symmetric difference and α penalizes large label disparities. The label-aware loss becomes:

L_label = −(1/N) Σ_i Σ_{j ≠ i} s(y_i, y_j) · log [ exp(sim(z_i, z_j)/τ) / Σ_{a ≠ i} exp(sim(z_i, z_a)/τ) ] (4)
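The overlap-based label similarity can be sketched as a Jaccard term minus a symmetric-difference penalty, as described above; the default α value here is illustrative.

```python
def label_similarity(y_i: set[str], y_j: set[str],
                     alpha: float = 0.1) -> float:
    """Label-overlap similarity sketch: Jaccard overlap of the two label
    sets minus a penalty proportional to their symmetric difference.
    The alpha default is an illustrative choice, not the paper's value."""
    union = y_i | y_j
    if not union:
        return 0.0
    jaccard = len(y_i & y_j) / len(union)
    return jaccard - alpha * len(y_i ^ y_j)
```

Identical label sets score 1.0, while fully disjoint sets are pushed below zero, which is what lets the label-aware loss weight partial-overlap pairs between these extremes.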

Patent-specific adaptations address domain characteristics including technical terminology density and hierarchical category relationships. We incorporate domain-adaptive temperature scaling with base temperature τ_0 and scaling coefficient β:

τ_i = τ_0 · (1 + β · complexity(x_i)) (5)

where complexity is measured by technical term frequency. For hierarchical categories, we introduce category-aware negative sampling that reduces the probability of selecting patents from semantically related categories as hard negatives.

The training procedure employs momentum-based updates to maintain stable positive pairs across mini-batches. For each training step, we construct balanced mini-batches ensuring adequate representation from all technological categories. The momentum encoder, with slowly updated parameters θ_m:

θ_m ← m · θ_m + (1 − m) · θ (6)

provides consistent positive examples across iterations, where m is the momentum coefficient and θ denotes the online encoder parameters. Training involves three phases: (1) initialization using RoBERTa-large weights, (2) contrastive pre-training on the patent corpus with frozen RoBERTa parameters, and (3) joint fine-tuning of both RoBERTa and projection layers. This progressive training strategy balances domain adaptation with preservation of general linguistic knowledge. Fig 1 illustrates the contrastive pre-training architecture and multi-label similarity computation process.
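The momentum update itself is a single interpolation step per parameter tensor; a minimal numpy sketch over a flattened parameter vector (names illustrative):

```python
import numpy as np

def momentum_update(theta_m: np.ndarray, theta: np.ndarray,
                    m: float = 0.999) -> np.ndarray:
    """Momentum-encoder update sketch: the key encoder's parameters
    drift slowly toward the online encoder's, keeping positive pairs
    stable across mini-batches. m close to 1 means slow drift."""
    return m * theta_m + (1.0 - m) * theta
```

With m near 1 the momentum encoder changes only slightly each step, which is what provides consistent positive examples across iterations.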

Fig 1. Contrastive pre-training architecture for multi-label patent representations.

The system processes patent abstracts through a shared encoder to generate embeddings that are optimized using multi-label contrastive objectives. The similarity computation considers both instance-level relationships and label co-occurrence patterns, enabling the model to learn representations that capture technological relationships and domain-specific semantic structures.

https://doi.org/10.1371/journal.pone.0341118.g001

Retrieval-augmented demonstration selection

The retrieval-augmented demonstration selection mechanism leverages the contrastive embeddings learned during pre-training to identify semantically relevant patent examples that guide few-shot classification decisions. Given a query patent xq requiring multi-label classification, the retrieval system searches through a corpus of labeled patents to identify demonstrations that maximize both semantic similarity and label informativeness.

The retrieval process operates in the embedding space created by the pre-trained encoder f_θ. For a query patent x_q, we compute its embedding z_q = f_θ(x_q) and retrieve the k most similar patents based on a multi-faceted similarity metric:

score(x_q, x_i) = α_sem · sim_sem(z_q, z_i) + α_tech · sim_tech(x_q, x_i) + α_div · div(x_i) (7)

where sim_sem represents semantic similarity in the contrastive embedding space, sim_tech captures technical domain alignment through specialized features, and div promotes diversity among retrieved examples to avoid redundancy, with α_sem, α_tech, and α_div weighting the three components.
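As a simplified illustration of the semantic component alone, nearest-neighbour retrieval in the embedding space can be sketched as follows (ignoring the technical-alignment and diversity terms, and assuming unnormalized embeddings as input):

```python
import numpy as np

def retrieve_topk(z_q: np.ndarray, Z: np.ndarray, k: int = 3) -> list[int]:
    """Rank corpus patents by cosine similarity to the query embedding
    and return the indices of the k nearest neighbours (sketch)."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    qn = z_q / np.linalg.norm(z_q)
    sims = Zn @ qn
    return [int(i) for i in np.argsort(-sims)[:k]]
```

In the full scoring function these cosine similarities would be one term among three, combined with the technical-alignment and diversity signals.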

The semantic similarity component leverages the contrastive embeddings directly:

(8)

where an edit-distance penalty down-weights near-duplicate patents, preventing the retrieval of trivially similar examples that provide limited discriminative information.
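A sketch of a penalized semantic score: cosine similarity reduced by a penalty that grows as two abstracts approach being duplicates. The penalty's functional form and weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def semantic_score(z_q, z_i, text_q, text_i, lam=0.5):
    # penalty is largest when texts are near-duplicates (small edit distance)
    norm_dist = edit_distance(text_q, text_i) / max(len(text_q), len(text_i), 1)
    penalty = lam * (1.0 - norm_dist)
    return cosine(z_q, z_i) - penalty
```

An identical abstract thus scores lower than a paraphrase with the same embedding, discouraging trivially similar retrievals.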

Technical domain alignment incorporates patent-specific features including International Patent Classification (IPC) codes, technical term overlap, and citation relationships:

(9)

where the component weights are learned during validation to optimize retrieval quality for the specific patent domain.

To ensure retrieved demonstrations provide comprehensive coverage of the label space, we introduce a diversity-promoting mechanism that penalizes redundant retrievals:

(10)

where the diversity term is computed over the set of already retrieved examples, with Jaccard similarity measuring label overlap to prevent retrieval of examples with redundant label combinations.

The retrieval strategy adapts dynamically based on query characteristics. For patents with high technical complexity (measured by specialized term density), we increase the weight of technical similarity:

(11)

This adaptation ensures that highly technical patents receive demonstrations from similar technological domains, improving the relevance of retrieved examples.

To handle the multi-label nature of patent classification, we implement a label-aware retrieval strategy that considers the co-occurrence patterns learned during contrastive pre-training. The retrieval process prioritizes examples that exhibit similar label complexity:

(12)

where the predicted number of labels for the query is estimated from embedding characteristics, and p(l|zq) denotes the probability of label l given the query embedding, computed using a lightweight classifier head.

The final retrieval set consists of the top-k patents according to the combined scoring function. These demonstrations are ordered by relevance and formatted as input-output pairs that illustrate the multi-label classification task. The ordering preserves semantic coherence while ensuring label diversity:

(13)

This ordering strategy balances individual relevance with collective diversity, creating a demonstration set that provides comprehensive guidance for multi-label prediction. Fig 2 illustrates the retrieval-augmented demonstration selection process and its integration with the few-shot prediction module.
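The combined selection-and-ordering logic resembles maximal-marginal-relevance retrieval. The sketch below (with an assumed penalty weight `lam` and hypothetical candidate scores) greedily trades off relevance against Jaccard label overlap with already-chosen demonstrations:

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def select_demonstrations(candidates, k=3, lam=0.5):
    """Greedy pick: relevance minus a penalty for label overlap
    with demonstrations already chosen (MMR-style).
    `candidates` maps id -> (relevance_score, label_set)."""
    chosen = []
    pool = set(candidates)
    while pool and len(chosen) < k:
        def mmr(cid):
            rel, labels = candidates[cid]
            overlap = max((jaccard(labels, candidates[c][1]) for c in chosen), default=0.0)
            return rel - lam * overlap
        best = max(pool, key=mmr)
        chosen.append(best)
        pool.remove(best)
    return chosen

cands = {
    "p1": (0.95, {"VTOL & Hybrid Flight", "Flight Control & Stability"}),
    "p2": (0.93, {"VTOL & Hybrid Flight", "Flight Control & Stability"}),  # duplicates p1's labels
    "p3": (0.80, {"Logistics & Cargo"}),  # less similar, but adds new labels
}
print(select_demonstrations(cands, k=2))  # -> ['p1', 'p3']
```

Note how the second slot goes to the lower-scoring but label-diverse candidate, broadening label-space coverage exactly as the diversity constraint intends.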

Fig 2. Retrieval-augmented demonstration selection.

The system leverages contrastive embeddings to identify relevant patent demonstrations through multi-faceted similarity scoring. The retrieval process considers semantic similarity, technical domain alignment, and diversity constraints to select informative examples that guide multi-label classification decisions. Retrieved demonstrations are ordered to balance relevance and diversity, providing comprehensive coverage of the label space.

https://doi.org/10.1371/journal.pone.0341118.g002

Few-shot multi-label prediction

The few-shot prediction module integrates the contrastive embeddings and retrieved demonstrations to classify patents across multiple categories using minimal labeled examples. Given a query patent xq and its retrieved demonstration set, the system constructs a structured prompt that enables systematic multi-label prediction through in-context learning.

The prompt construction follows a task-specific template that explicitly frames the multi-label classification problem:

(14)

where the prompt concatenates a task instruction emphasizing the multi-label nature of the problem, each demonstration formatted as an input-output pair, and the target patent presented for classification.

Each demonstration is formatted to highlight the multi-label assignment pattern:

(15)

This explicit formatting helps the model recognize that multiple categories can be simultaneously assigned, distinguishing the task from single-label classification scenarios.
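A minimal prompt-construction sketch; the template wording and helper names are hypothetical and only illustrate the explicit multi-label formatting described above:

```python
def format_demo(abstract, labels):
    # each demonstration is an input-output pair listing ALL assigned labels
    return f"Patent: {abstract}\nCategories: {', '.join(sorted(labels))}\n"

def build_prompt(instruction, demos, query_abstract):
    body = "".join(format_demo(a, ls) for a, ls in demos)
    return f"{instruction}\n\n{body}\nPatent: {query_abstract}\nCategories:"

prompt = build_prompt(
    "Assign ALL applicable categories (a patent may belong to several).",
    [("A tilt-rotor UAV with hybrid propulsion.",
      {"VTOL & Hybrid Flight", "Endurance & Power Systems"})],
    "A quadrotor with adaptive PID stabilization.",
)
```

Listing multiple comma-separated categories in each demonstration is what signals to the model that labels are not mutually exclusive.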

To leverage the learned representations, we introduce an embedding-guided attention mechanism that modulates the influence of each demonstration based on its relevance to the query:

(16)

where zq and zi are contrastive embeddings, the temperature parameter controls the sharpness of the attention distribution, and bi represents a bias term that accounts for demonstration position and label informativeness:

(17)

The position-dependent term gives higher weight to earlier demonstrations (which are more relevant according to our retrieval ordering), while the label overlap term prioritizes demonstrations sharing predicted labels with the query.
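The attention computation can be sketched as a softmax over similarity logits with a position-dependent bias. The temperature and decay values here are illustrative assumptions:

```python
import numpy as np

def attention_weights(z_q, Z, tau=0.1, pos_decay=0.1):
    """Softmax over cosine-similarity logits with a position bias
    that favors earlier (more relevant) demonstrations."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    qn = z_q / np.linalg.norm(z_q)
    logits = (Zn @ qn) / tau - pos_decay * np.arange(len(Z))
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

w = attention_weights(np.array([1.0, 0.0]),
                      np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

Given two equally similar demonstrations, the earlier one receives the larger weight, matching the retrieval ordering described above.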

For multi-label prediction, we employ a decomposed inference strategy that evaluates each category independently while considering inter-label dependencies:

(18)

where fLM represents the language model’s output for category lj, σ is the sigmoid function, and a pairwise term captures learned dependencies between categories j and m.

The language model scoring function fLM combines multiple sources of evidence:

(19)

where the first term is the language model’s raw score for category lj given the prompt, the second measures category-specific similarity using learned category prototypes, and the third provides a prior probability based on the contrastive embedding.

To handle the varying complexity of multi-label assignments, we implement an adaptive threshold mechanism:

(20)

where the threshold depends on the category frequency in the retrieval corpus relative to the mean frequency, together with a term that quantifies prediction uncertainty based on demonstration consistency.
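The adaptive threshold and its uncertainty term can be sketched as follows. The disagreement-based uncertainty and the linear form with coefficients `a` and `b` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def label_uncertainty(demo_labels, label):
    """Disagreement among retrieved demos: 0 if all agree, 1 at a 50/50 split."""
    votes = np.array([label in ls for ls in demo_labels], dtype=float)
    q = votes.mean()
    return 1.0 - abs(2.0 * q - 1.0)

def adaptive_threshold(base, freq, mean_freq, uncertainty, a=0.1, b=0.2):
    """Rarer categories get lower thresholds; high uncertainty raises them."""
    return base + a * (freq - mean_freq) + b * uncertainty
```

When demonstrations disagree about a label, the threshold rises and the prediction becomes more conservative, which is the behavior described in the following paragraphs.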

The uncertainty measure evaluates the consistency of label assignments across retrieved demonstrations:

(21)

High uncertainty (inconsistent demonstrations) leads to more conservative thresholds, reducing false positive predictions. For categories with sparse representation in the demonstration set, we employ a prototype-based fallback mechanism:

(22)

where the prototype for category lj is computed over the support set of patents containing that category. When demonstration coverage for category j is insufficient, the prediction incorporates prototype similarity:

(23)

The final prediction combines demonstration-based and prototype-based predictions:

(24)

where a coverage-dependent weight balances the two predictions according to how well the demonstration set represents each category.

This approach ensures robust multi-label predictions even with limited demonstrations, leveraging both the semantic understanding from contrastive pre-training and the pattern recognition from retrieved examples. The adaptive mechanisms handle the inherent challenges of multi-label classification, including label imbalance, sparse categories, and varying label co-occurrence patterns.
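The prototype fallback and the coverage-weighted blend can be sketched as below. The mean-embedding prototype and the `min_coverage` cutoff are assumptions for illustration:

```python
import numpy as np

def prototype(support_embeddings):
    """Mean embedding of patents containing the category (an assumed choice)."""
    return np.mean(support_embeddings, axis=0)

def blended_probability(p_demo, p_proto, coverage, min_coverage=3):
    """With few supporting demonstrations, lean on the prototype score."""
    lam = min(coverage / min_coverage, 1.0)
    return lam * p_demo + (1.0 - lam) * p_proto
```

With zero supporting demonstrations the prediction is purely prototype-based; with adequate coverage the demonstration-based score dominates.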

Chain-of-thought multi-label reasoning

The chain-of-thought (CoT) reasoning mechanism enhances the few-shot prediction process by introducing structured reasoning paths for multi-label decisions. Rather than directly predicting labels from demonstrations, the system employs GPT-4o to generate explicit reasoning chains that evaluate each technological category through systematic analysis of patent characteristics.

We implement GPT-4o integration through OpenAI’s API with carefully tuned parameters for patent classification. The API calls utilize the GPT-4o model with temperature set to 0.3 for consistent reasoning while maintaining creativity, max_tokens of 2048 to accommodate detailed reasoning chains, top_p of 0.9 for controlled diversity in technical terminology, and frequency_penalty of 0.2 to reduce repetitive patterns in multi-label evaluations. The system prompt explicitly defines the patent classification task and multi-label nature, while maintaining conversation context across category evaluations to preserve inter-label dependencies. Response format is set to JSON mode to ensure structured output parsing for downstream processing.
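The request payload below mirrors the reported API parameters; the helper name and prompt contents are hypothetical, and the dict simply collects the keyword arguments one would pass to OpenAI's chat-completions endpoint:

```python
def build_gpt4o_request(system_prompt, user_prompt):
    return {
        "model": "gpt-4o",
        "temperature": 0.3,        # consistent reasoning
        "max_tokens": 2048,        # room for detailed reasoning chains
        "top_p": 0.9,              # controlled diversity in terminology
        "frequency_penalty": 0.2,  # reduce repetitive patterns
        "response_format": {"type": "json_object"},  # structured output
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

request = build_gpt4o_request(
    "You are a patent classifier; a patent may belong to several categories.",
    "Classify the following patent abstract and explain your reasoning.",
)
```

JSON mode (`response_format`) requires that the prompt itself instruct the model to emit JSON; the system prompt would carry that instruction in a full implementation.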

Given the prompt constructed in the previous step, we augment it with reasoning instructions that decompose the multi-label classification into sequential evaluation steps:

(25)

where the reasoning instruction directs the model to: (1) identify key technological features in the query patent, (2) compare these features against each demonstration’s label assignments, (3) evaluate evidence for each category independently, and (4) synthesize final multi-label predictions.

The reasoning process for each category lj follows a structured template:

(26)

This decomposition enables the model to articulate why specific labels apply, improving both accuracy and interpretability.

To handle inter-label dependencies systematically, the reasoning incorporates conditional evaluation:

(27)

where the conditioning set represents previously assigned labels, allowing the model to consider technological relationships. For instance, if “VTOL” is assigned, the reasoning for “Flight Control” explicitly considers this context.

The CoT mechanism integrates with the embedding-guided attention by using attention weights wi to emphasize reasoning about highly relevant demonstrations:

(28)

This ensures that reasoning focuses on the most informative examples identified through retrieval. For sparse categories where demonstrations are limited, the reasoning explicitly incorporates prototype comparisons:

(29)

This transparency helps identify when predictions rely more on prototypes than demonstrations. The final CoT-enhanced prediction combines reasoning confidence with the base framework’s scores:

(30)

where β weights the contribution of reasoning versus direct prediction, typically set to 0.7 to prioritize structured reasoning while maintaining robustness.
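The final combination with β = 0.7 is a one-liner; the function name is hypothetical:

```python
def cot_enhanced_probability(p_reasoning, p_base, beta=0.7):
    """beta = 0.7 prioritizes structured reasoning over the direct score."""
    return beta * p_reasoning + (1.0 - beta) * p_base
```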

The CoT reasoning provides three key advantages for multi-label patent classification: (1) explicit handling of label co-occurrences through conditional reasoning, (2) interpretable decision paths that facilitate error analysis and model improvement, and (3) improved performance on complex patents requiring nuanced technological understanding. By combining the semantic understanding from contrastive embeddings, pattern recognition from retrieved demonstrations, and structured reasoning from GPT-4o, this approach achieves robust multi-label classification with minimal labeled examples while maintaining interpretability. Fig 3 illustrates the whole framework.

Fig 3. The system combines retrieved demonstrations with embedding-guided attention to perform multi-label patent classification.

The prediction module employs decomposed inference for each category while considering inter-label dependencies, adaptive thresholding based on uncertainty, and prototype-based fallback for sparse categories. The integration of contrastive embeddings, demonstration patterns, and language model reasoning enables robust classification with minimal labeled examples.

https://doi.org/10.1371/journal.pone.0341118.g003

Proposed framework

We now present our complete framework that brings together contrastive learning, retrieval-based demonstration selection, and chain-of-thought reasoning to classify patents across multiple categories. This unified approach requires only a small number of labeled examples while maintaining high accuracy. Algorithm 1 details the step-by-step process from raw patent text to final multi-label predictions.

Algorithm 1 Retrieval-augmented contrastive learning for multi-label patent classification.

1: Input: Query patent xq, Patent corpus , Labeled set , Categories

2: Output: Multi-label prediction

3: // Phase 1: Contrastive Pre-training

4: Initialize encoder with RoBERTa-large weights

5: for epoch  = 1 to Epretrain do

6:   Sample mini-batch

7:   Compute embeddings

8:   Calculate multi-label contrastive loss:

9:   Update momentum encoder: θm ← m·θm + (1 − m)·θ

10:   Update θ via gradient descent on

11: end for

12: // Phase 2: Retrieval-Augmented Demonstration Selection

13: Compute query embedding:

14: Initialize retrieval set

15: for j = 1 to k do

16:   Compute scores for all :

17:   

18:   Select:

19:   Update:

20: end for

21: Order demonstrations:

22: // Phase 3: Few-shot Classification with CoT Reasoning

23: Construct prompt:

24: Compute attention weights:

25: Augment with CoT template:

26: // Phase 4: Multi-label Prediction

27: for each category do

28:   // GPT-4o reasoning with API parameters:

29:    temperature=0.3, max_tokens=2048, top_p=0.9, frequency_penalty=0.2

30:   Generate reasoning:

31:   Compute CoT probability: from reasoning output

32:   // Compute base probability with inter-label dependencies:

33:  

34:   // Check for sparse category fallback:

35:   if demonstration coverage for lj is insufficient then

36:    Compute prototype:

37:   

38:   

39:   else

40:    pfallback = 0,

41:   end if

42:   // Adaptive thresholding:

43:  

44:   // Final prediction:

45:  

46:  

47: end for

48: return

The framework operates in four integrated phases. First, contrastive pre-training learns domain-specific embeddings that capture multi-label relationships in patent text. Second, retrieval-augmented demonstration selection identifies the most informative examples through multi-faceted similarity scoring. Third, few-shot classification constructs prompts with embedding-guided attention weights. Finally, chain-of-thought reasoning via GPT-4o generates interpretable predictions for each category while handling inter-label dependencies, sparse categories through prototypes, and uncertainty through adaptive thresholding.

This unified approach addresses the key challenges of multi-label patent classification: the annotation bottleneck through few-shot learning, the complexity of technical language through domain-specific pre-training, the multi-label nature through explicit modeling of label co-occurrences, and the need for interpretability through structured reasoning chains. The framework’s modular design allows for component improvements while maintaining the overall architecture’s effectiveness.

Results

Experimental design

Our experimental evaluation adopts a stratified few-shot protocol that respects the temporal structure of patent data while maintaining representative label distribution across training and testing phases. The complete UAV patent dataset of 15,000 annotated patents undergoes temporal partitioning where patents from 2000-2020 constitute the training corpus (12,000 patents) and patents from 2021-2023 form the test set (3,000 patents). This temporal division reflects realistic deployment scenarios where models trained on historical patents must classify emerging technologies.

Within the training corpus, we establish a few-shot learning protocol following the N-way-K-shot paradigm adapted for multi-label scenarios. For each evaluation episode, we randomly sample N = 5 technological categories and select K demonstration examples per category from the training set, ensuring each selected patent contains at least one target label. The query set consists of 100 patents containing various combinations of the selected categories, with an average of 2.3 labels per patent, matching our dataset characteristics.

Categories are grouped into three tiers based on occurrence frequency: frequent categories include VTOL & Hybrid Flight, Surveillance & Mapping, and Flight Control & Stability; moderate categories encompass Modular & Deployable, Endurance & Power Systems, and Structural & Materials; sparse categories comprise Logistics & Cargo, Bionic & Flapping Wing, Specialized Applications, and Multi-Environment. Each evaluation episode samples at least one category from each tier to prevent bias toward frequent categories.
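The tier-constrained episode sampling can be sketched as follows; the helper and seed handling are illustrative, with category names taken from the tier description above:

```python
import random

TIERS = {
    "frequent": ["VTOL & Hybrid Flight", "Surveillance & Mapping",
                 "Flight Control & Stability"],
    "moderate": ["Modular & Deployable", "Endurance & Power Systems",
                 "Structural & Materials"],
    "sparse":   ["Logistics & Cargo", "Bionic & Flapping Wing",
                 "Specialized Applications", "Multi-Environment"],
}

def sample_episode(n_way=5, seed=None):
    """Pick one category per tier first, then fill remaining slots at random."""
    rng = random.Random(seed)
    chosen = [rng.choice(cats) for cats in TIERS.values()]
    remaining = [c for cats in TIERS.values() for c in cats if c not in chosen]
    chosen += rng.sample(remaining, n_way - len(chosen))
    return chosen
```

Seeding each episode makes the 50-episode evaluation protocol reproducible while still covering varied category mixes.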

We compare our approach against carefully selected baselines to isolate the contribution of each framework component. Recent patent-specific methods include LLM-AL [29], which combines iterative large language model inference with active learning for scalable multi-label classification, and PatentSBERTa [18], a hybrid approach integrating SBERT sentence embeddings with K-nearest neighbors classification and patent-specific domain adaptation. Traditional multi-label approaches include fine-tuned RoBERTa-Large [5] and XLNet-Large [67], representing current transformer-based methods for patent classification. Few-shot learning baselines comprise Prototypical Networks [68] adapted for multi-label scenarios and META-LSTM [69] with multi-label extensions. Retrieval-augmented approaches include standard RAG using dense passage retrieval with BERT embeddings and RePrompt [70] adapted for multi-label classification.

Our evaluation employs multiple metrics addressing different aspects of multi-label few-shot performance. Macro-F1 score provides equal weighting to all categories:

Macro-F1 = (1/K) Σj 2·Pj·Rj / (Pj + Rj)    (31)

where Pj and Rj represent precision and recall for category j, and K denotes the number of categories. Micro-F1 score aggregates predictions across all categories:

Micro-F1 = 2·Pmicro·Rmicro / (Pmicro + Rmicro)    (32)

where Pmicro = Σj TPj / Σj (TPj + FPj) and Rmicro = Σj TPj / Σj (TPj + FNj), with TPj, FPj, and FNj denoting the true positives, false positives, and false negatives for category j.

Label Ranking Average Precision (LRAP) evaluates the quality of label ranking:

LRAP = (1/n) Σi (1/|Yi|) Σj∈Yi |{k ∈ Yi : rankik ≤ rankij}| / rankij    (33)

where Yi represents the set of true labels for sample i, |Yi| its cardinality, and rankij denotes the rank of label j for sample i. Coverage Error measures the average number of top-ranked labels needed to cover all true labels:

Coverage = (1/n) Σi maxj∈Yi rankij    (34)
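The two F1 variants can be computed directly from binary label-indicator matrices. This numpy sketch (hypothetical function name) uses the equivalent form F1 = 2TP / (2TP + FP + FN):

```python
import numpy as np

def f1_scores(y_true, y_pred):
    """Macro- and micro-averaged F1 over boolean label-indicator matrices
    of shape (n_samples, n_categories)."""
    tp = (y_true & y_pred).sum(axis=0)
    fp = (~y_true & y_pred).sum(axis=0)
    fn = (y_true & ~y_pred).sum(axis=0)
    denom = 2 * tp + fp + fn
    # per-category F1, with empty categories scored as 0
    per_cat = np.divide(2 * tp, denom, out=np.zeros(len(denom)), where=denom > 0)
    macro = per_cat.mean()
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    return macro, micro

y_true = np.array([[1, 0, 1], [0, 1, 0]], dtype=bool)
y_pred = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
macro, micro = f1_scores(y_true, y_pred)
```

Macro-F1 averages per-category scores (so rare categories count equally), while Micro-F1 pools counts across categories (so frequent categories dominate), which is why the paper reports both.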

Few-shot adaptation efficiency is measured through learning curves across different shot numbers and category-wise performance analysis for frequent, moderate, and sparse categories. Performance variance across 50 episodes per configuration provides robustness assessment. Demonstration retrieval effectiveness uses Precision@K and Recall@K metrics that measure whether retrieved patents contain relevant technological categories. Chain-of-thought reasoning quality undergoes human expert assessment where three domain experts rate 500 randomly selected reasoning chains on 5-point Likert scales across coherence, factual accuracy, and decision support utility dimensions. Statistical significance is assessed through paired t-tests with Bonferroni correction for multiple comparisons. All experiments are conducted on a single NVIDIA GeForce RTX 4090 GPU with 32GB memory.

Overall performance comparison

Our proposed framework achieves meaningful improvements across all evaluation metrics compared to existing approaches. Table 3 presents the comprehensive performance comparison under the 5-shot setting, which represents a challenging few-shot scenario with limited demonstration examples per category.

Table 3. Overall performance comparison under 5-shot setting.

https://doi.org/10.1371/journal.pone.0341118.t003

The results demonstrate consistent and statistically significant improvements across all metrics. Our framework achieves a Macro-F1 score of 0.847 (±0.021), which represents improvements of 16.2% over RoBERTa-Large and 14.3% over XLNet-Large. Compared to recent patent-specific methods, our framework achieves 6.1% improvement over LLM-AL and 11.2% improvement over PatentSBERTa in Macro-F1 performance. The Micro-F1 performance reaches 0.892 (±0.018), with improvements of 11.4% and 9.4% over RoBERTa and XLNet respectively. These improvements are particularly notable given the challenging nature of multi-label patent classification and the limited annotation requirements of the few-shot setting.

Label ranking performance, measured by LRAP, shows our framework achieves 0.878 (±0.019). This outperforms transformer baselines by margins that exceed 14%. Our framework also outperforms LLM-AL by 5.3% and PatentSBERTa by 12.3% on LRAP. Coverage Error, where lower values indicate better performance, demonstrates our framework’s superior label ranking capabilities with a score of 1.23 (±0.087), compared to 1.87 for RoBERTa-Large and 1.41 for LLM-AL.

Statistical significance was evaluated through paired t-tests comparing our framework against each baseline method across all four metrics. Pairwise comparisons over all eight baseline methods yielded 32 statistical tests (8 baselines × 4 metrics). To control the family-wise error rate under multiple comparisons, we applied Bonferroni correction with significance threshold α = 0.05/32 ≈ 0.0016. All 32 comparisons yielded p-values below 0.001, well under the corrected threshold, so the observed improvements are highly unlikely to have occurred by chance. Effect sizes, measured by Cohen’s d, range from 1.2 to 2.3 across comparisons, indicating large practical significance. The combined statistical and effect-size analyses confirm that our framework provides meaningful and robust improvements over existing approaches, including recent patent-specific methods.
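Assuming the conventional family-wise α = 0.05, the Bonferroni-corrected threshold works out as:

```python
n_tests = 8 * 4              # 8 baselines x 4 metrics
alpha_corrected = 0.05 / n_tests
print(alpha_corrected)       # 0.0015625
# every reported p-value is below 0.001, hence below the corrected bar
assert 0.001 < alpha_corrected
```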

Fig 4 illustrates the comprehensive performance comparison with error bars representing standard deviations across 50 experimental episodes. The visualization clearly demonstrates our framework’s consistent superiority across diverse evaluation dimensions.

Fig 4. Overall performance comparison across evaluation metrics.

The proposed framework consistently outperforms baseline methods across Macro-F1, Micro-F1, LRAP, and Coverage Error metrics. Error bars represent standard deviations across 50 experimental episodes. Statistical significance indicators (***p < 0.001, **p < 0.01, *p < 0.05) show comparisons against our framework.

https://doi.org/10.1371/journal.pone.0341118.g004

Few-shot learning effectiveness

The few-shot learning curves in Fig 5 reveal the framework’s superior data efficiency across varying numbers of demonstration examples. Performance consistently improves as the number of shots increases from 1 to 10, with our framework maintaining substantial advantages at all shot levels.

Fig 5. Few-shot learning curves showing performance vs. number of shots.

Our framework demonstrates superior few-shot learning capabilities across all shot settings. Left: Macro-F1 performance prioritizes rare category detection. Right: Micro-F1 performance emphasizes overall classification accuracy. Error bars represent standard deviations across 50 episodes.

https://doi.org/10.1371/journal.pone.0341118.g005

In the challenging 1-shot scenario, our framework achieves Macro-F1 and Micro-F1 scores of 0.723 and 0.789 respectively, representing improvements of 23.2% and 21.2% over RoBERTa-Large, 18.5% and 15.7% over LLM-AL, and 25.8% and 22.3% over PatentSBERTa. While absolute performance increases with additional shots, the relative improvements narrow as baseline methods also benefit from more demonstrations, reaching 15.3% and 10.0% improvements over RoBERTa-Large in the 10-shot setting. The learning curves demonstrate effective knowledge transfer from contrastive pre-training, enabling rapid adaptation with minimal supervision.

The error bars reveal that our framework maintains more stable performance across episodes, with standard deviations consistently lower than baseline methods. This stability indicates robust generalization capabilities, crucial for practical deployment scenarios where training data is limited and variable.

Computational efficiency

Table 4 compares computational efficiency between our framework and baseline methods across training and inference phases. Our framework adds 2M parameters (a 0.6% increase) over RoBERTa-Large through projection layers and retrieval components. This represents minimal model size overhead. The contrastive pre-training phase demands substantially fewer computational resources than full supervised fine-tuning employed by transformer baselines (8.2 hours versus 24-26.5 hours) and achieves training efficiency comparable to patent-specific methods (8.2 hours versus 18-22 hours for LLM-AL and PatentSBERTa) on a single RTX 4090 GPU. This advantage arises because our framework leverages unlabeled patent corpora during pre-training while requiring only minimal labeled demonstrations during inference.

Inference latency consists of two components: local model processing (48-52ms per patent) and GPT-4o API calls (165-195ms per patent with typical network latency). The GPT-4o integration introduces additional inference time. However, this overhead proves justified given the substantial performance improvements and interpretability benefits. For deployment scenarios that demand lower latency, the chain-of-thought reasoning module can be disabled with graceful performance degradation (5.4% Macro-F1 reduction as shown in ablation analysis). This modification reduces total inference time to baseline levels while preserving advantages from contrastive pre-training and semantic retrieval.

The memory footprint during inference remains comparable to baseline methods at 3.3GB GPU memory, since GPT-4o operates through API calls without local parameter storage. This efficient resource utilization enables deployment on standard research-grade GPUs. The framework achieves superior accuracy-efficiency trade-offs: 16.2% higher Macro-F1 than RoBERTa-Large with 66% training time reduction, 6.1% improvement over LLM-AL with 54% less training time, and 11.2% improvement over PatentSBERTa with 63% training time reduction. Despite longer inference times than lightweight baselines due to GPT-4o integration (228ms for our framework, versus 245ms for LLM-AL and 65ms for PatentSBERTa), the substantial accuracy gains justify this overhead. These characteristics render our approach particularly suitable for resource-constrained settings where both annotation budgets and computational resources remain limited.

Ablation study

The ablation study in Table 5 evaluates each framework component’s contribution to overall performance. Removing individual components reveals their relative importance and validates the integrated design approach.

Table 5. Ablation study: component contribution analysis.

https://doi.org/10.1371/journal.pone.0341118.t005

Contrastive pre-training contributes most significantly to performance, with its removal causing a 6.8% drop in Macro-F1 (from 0.847 to 0.789). This substantial decrease demonstrates the importance of domain-adapted representations that capture multi-label co-occurrence patterns in patent text. The semantic retrieval mechanism provides the second-largest contribution, with random demonstration selection causing a 9.7% performance drop (Macro-F1: 0.765). This validates our multi-faceted similarity scoring approach for identifying informative demonstrations.

Chain-of-thought reasoning contributes a 5.4% improvement (Macro-F1 drop from 0.847 to 0.801 without CoT), which demonstrates the value of structured reasoning for complex multi-label decisions. The inter-label dependency modeling provides a 2.8% improvement. This shows benefits from explicitly modeling technological co-occurrence patterns rather than treating labels independently.

Coverage error improvements follow similar patterns, with contrastive pre-training and semantic retrieval showing the largest contributions. The cumulative effect of all components results in the full framework’s superior performance, which validates the integrated design approach.

Discussion

Our experimental findings reveal that the integration of contrastive learning with retrieval-augmented few-shot learning offers an effective pathway for multi-label patent classification under annotation constraints. Rather than relying on extensive expert annotation, our approach demonstrates meaningful improvements across evaluation metrics, particularly those addressing the complexities of scaling patent classification to emerging technological domains.

We observe that contrastive pre-training effectiveness in the patent domain stems from its capacity to capture intricate co-occurrence patterns within technological innovations. Indeed, patents describing integrated systems naturally span multiple categories, and our multi-label contrastive objective learns representations that respect these technological relationships. This domain adaptation proves especially valuable for underrepresented categories, whereas traditional fine-tuning approaches encounter difficulties due to limited examples.

Our findings suggest that retrieval-augmented demonstration selection plays a pivotal role in few-shot performance. The multi-faceted similarity scoring mechanism, which combines semantic embeddings with technical domain features, enables identification of relevant demonstrations that guide classification decisions. Instead of naive similarity-based approaches, the diversity constraints prevent redundant retrievals while ensuring comprehensive label space coverage.

The chain-of-thought reasoning integration offers both performance enhancements and interpretability benefits. The structured reasoning paths provide insights into classification decisions, facilitating error analysis and model debugging. However, we note that the computational overhead of GPT-4o inference presents practical considerations for large-scale deployment scenarios.

We acknowledge several limitations that warrant consideration. Our evaluation focuses specifically on UAV patent classification, and generalization to other technological domains or patent categories requires additional empirical validation. While the UAV domain provides an excellent testbed due to its multidisciplinary nature and representative multi-label complexity, demonstrating broader applicability across diverse technical fields remains a direction for future work. The framework’s reliance on GPT-4o for chain-of-thought reasoning introduces computational costs and API dependencies that may limit deployment in resource-constrained settings. Furthermore, the contrastive pre-training phase requires domain-specific patent corpora, which may necessitate additional data collection efforts when extending to new technical domains. The reliance on English translations may introduce noise for patents originally filed in other languages. Additionally, although the temporal evaluation split reflects realistic scenarios, it may not fully capture challenges in classifying patents describing genuinely novel technological concepts absent from historical data.

The framework’s modular architecture provides promising directions for future enhancement. Alternative contrastive objectives, particularly those incorporating hierarchical contrastive learning that explicitly models patent classification taxonomies, could improve representation quality. Furthermore, the retrieval mechanism could benefit from learned similarity metrics optimized specifically for patent demonstration selection. In principle, integration with domain-specific language models beyond GPT-4o may provide performance gains while reducing computational costs. Future research will address these limitations by evaluating the framework across multiple patent categories and other specialized document classification domains, exploring lightweight alternatives to large language models, and investigating transfer learning strategies that reduce domain-specific data requirements.

Conclusion

We address the challenge of multi-label patent classification under minimal annotation constraints through an integrated approach combining contrastive learning, retrieval-augmented demonstration selection, and chain-of-thought reasoning. Our framework demonstrates meaningful improvements over conventional approaches, achieving Macro-F1 and Micro-F1 scores of 0.847 and 0.892 respectively on UAV patent classification tasks.

Our contributions encompass a multi-label contrastive pre-training strategy that captures technological co-occurrence patterns, a retrieval mechanism that identifies informative demonstrations through multi-faceted similarity scoring, and a structured reasoning approach that handles inter-label dependencies while maintaining interpretability. The systematic ablation analysis validates each component’s importance, with contrastive pre-training and semantic retrieval providing the most pronounced performance contributions.

We demonstrate the framework’s effectiveness across varying numbers of demonstration examples, from challenging 1-shot to more supportive 10-shot scenarios, which highlights its practical utility for real-world patent classification tasks. The consistent performance advantages over established baselines, including recent retrieval-augmented approaches, suggest that our integrated design addresses the unique challenges of multi-label few-shot learning in technical domains.

Beyond patent classification, our approach offers broader implications for few-shot learning in specialized domains where obtaining expert annotations remains prohibitively expensive. The combination of domain-adapted representations, intelligent demonstration selection, and structured reasoning provides a general framework applicable to other technical document classification tasks, particularly those that require minimal supervision while maintaining accuracy and interpretability.

References

  1. Stim R. Patent, copyright & trademark: an intellectual property desk reference. Nolo; 2024.
  2. Sofi IA, Sofi TA, Mir AA, Bhat A. Analysing the current status of open access patent repositories: a global perspective. IDD. 2024;53(2):181–91.
  3. Paul J. A progressive methodology for automating patent filing processes. 2024.
  4. Pelaez S. Value expressions in patents: their relationship to patent valuation and technological orientation. Georgia Institute of Technology; 2025.
  5. Haghighian Roudsari A, Afshar J, Lee W, Lee S. PatentNet: multi-label classification of patent documents using deep learning based language understanding. Scientometrics. 2021;127(1):207–31.
  6. Chen G. Legal, technical, and linguistic nuances in patent translation in English-Chinese contexts: based on an invalidated patent. IJTIS. 2024;4(4):55–63.
  7. Karim MM, Khan S, Van DH, Liu X, Wang C, Qu Q. Transforming data annotation with AI agents: a review of architectures, reasoning, applications, and impact. Future Internet. 2025;17(8):353.
  8. Bonnici K. Automated patent landscaping: a comparative study of machine learning approaches. Transfer. 2025;2(2).
  9. Song Y, Wang T, Cai P, Mondal SK, Sahoo JP. A comprehensive survey of few-shot learning: evolution, applications, challenges, and opportunities. ACM Comput Surv. 2023;55(13s):1–40.
  10. Xu L, Xie H, Li Z, Wang FL, Wang W, Li Q. Contrastive learning models for sentence representations. ACM Trans Intell Syst Technol. 2023;14(4):1–34.
  11. Hu W, Fan Q, Yan H, Xu X, Huang S, Zhang K. A survey of multi-label text classification under few-shot scenarios. Applied Sciences. 2025;15(16):8872.
  12. Zhao P, Zhang H, Yu Q, Wang Z, Geng Y, Fu F. Retrieval-augmented generation for AI-generated content: a survey. arXiv preprint 2024.
  13. Zhao WX, Liu J, Ren R, Wen J-R. Dense text retrieval based on pretrained language models: a survey. ACM Trans Inf Syst. 2024;42(4):1–60.
  14. Qiang Y, Sun G, Liu H. PatentALL: multi-label patent classification using adaptive label learning. In: 2024 IEEE 36th International Conference on Tools with Artificial Intelligence (ICTAI). 2024. p. 108–15. https://doi.org/10.1109/ictai62512.2024.00024
  15. Telli K, Kraa O, Himeur Y, Ouamane A, Boumehraz M, Atalla S, et al. A comprehensive review of recent research trends on Unmanned Aerial Vehicles (UAVs). Systems. 2023;11(8):400.
  16. Li Y. Tech-mining on Chinese patents: syntax and terminology. Université de la Sorbonne nouvelle-Paris III; 2024.
  17. Lee J-S. Evaluating generative patent language models. World Patent Information. 2023;72:102173.
  18. Bekamiri H, Hain DS, Jurowetzki R. PatentSBERTa: a deep NLP based hybrid model for patent distance and classification using augmented SBERT. Technological Forecasting and Social Change. 2024;206:123536.
  19. Li M, Wang L. Leveraging patent classification based on deep learning: the case study on smart cities and industrial Internet of Things. Journal of Informetrics. 2025;19(1):101616.
  20. Bai Z, Zhang R, Chen L, Cai Q, Zhong Y, Wang C. PatentGPT: a large language model for intellectual property. arXiv preprint 2024. https://arxiv.org/abs/2404.18255
  21. Zhao X, Lu J, Deng C, Zheng C, Wang J, Chowdhury T. Beyond one-model-fits-all: a survey of domain specialization for large language models. arXiv preprint 2023.
  22. Xie Y, Aggarwal K, Ahmad A. Efficient continual pre-training for building domain specific large language models. In: Findings of the Association for Computational Linguistics: ACL 2024. 2024. p. 10184–201. https://doi.org/10.18653/v1/2024.findings-acl.606
  23. Lee J-S. InstructPatentGPT: training patent language models to follow instructions with human feedback. Artif Intell Law. 2024;33(3):739–82.
  24. Ren R, Ma J, Luo J. Large language model for patent concept generation. Advanced Engineering Informatics. 2025;65:103301.
  25. Ali A, Tufail A, De Silva LC, Abas PE. Innovating patent retrieval: a comprehensive review of techniques, trends, and challenges in prior art searches. ASI. 2024;7(5):91.
  26. Wu H, Zhang L, Zhu H, Liu Q, Chen E, Xiong H. Examination process modeling for intelligent patent management: a multi-aspect neural sequential approach. ACM Trans Manage Inf Syst. 2025;16(3):1–23.
  27. Christofidellis D. Accelerating scientific discovery using domain adaptive language modelling. Queen’s University Belfast; 2023.
  28. Leema AA, Balakrishnan P, Sangaiah AK. VAE-based compression of token embeddings for domain-specific adaptation of pre-trained language models. In: International Symposium on Pervasive Systems, Algorithms and Networks. 2025. p. 228–48.
  29. Xiong S, Chen S, He J, Liu Y, Mao J, Liu C. Scalable multi-label patent classification via iterative large language model-assisted active learning. World Patent Information. 2025;82:102380.
  30. Liu Y, Xu F, Zhao Y, Ma Z, Wang T, Zhang S, et al. Hierarchical multi-instance multi-label learning for Chinese patent text classification. Connection Science. 2024;36(1).
  31. Trappey AJC, Chou S-C, Li G-KJ. Patent litigation mining using a large language model—Taking unmanned aerial vehicle development as the case domain. World Patent Information. 2025;80:102332.
  32. Alghamdi Y, Munir A, La HM. Architecture, classification, and applications of contemporary unmanned aerial vehicles. IEEE Consumer Electron Mag. 2021;10(6):9–20.
  33. Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P. Supervised contrastive learning. Advances in Neural Information Processing Systems. 2020;33:18661–73.
  34. Gao T, Yao X, Chen D. SimCSE: simple contrastive learning of sentence embeddings. arXiv preprint 2021. arXiv:2104.08821
  35. Sinha S, Khan MSU, Sheikh TU, Stricker D, Afzal MZ. CICA: content-injected contrastive alignment for zero-shot document image classification. In: International Conference on Document Analysis and Recognition. Springer; 2024. p. 124–41.
  36. Khvatskii G, Moniz N, Doan KD, Chawla NV. Class-aware contrastive optimization for imbalanced text classification. Discov Data. 2025;3(1):124.
  37. Su S, Shao D, Ma L, Yi S, Yang Z. ADCL: an attention feature enhancement network based on adversarial contrastive learning for short text classification. Advanced Engineering Informatics. 2025;65:103202.
  38. Zhang Y, Shen Z, Wu C-H, Xie B, Hao J, Wang Y-Y, et al. Metadata-induced contrastive learning for zero-shot multi-label text classification. In: Proceedings of the ACM Web Conference 2022. 2022. p. 3162–73. https://doi.org/10.1145/3485447.3512174
  39. Lin N, Qin G, Wang J, Yang A, Zhou D. An effective deployment of contrastive learning in multi-label text classification. arXiv preprint 2022. https://arxiv.org/abs/2212.00552
  40. Kang S. Efficient AI systems for domain adaptation: LLM-guided weighted contrastive learning with reduced computational requirements. 2025.
  41. Le-Khac PH, Healy G, Smeaton AF. Contrastive representation learning: a framework and review. IEEE Access. 2020;8:193907–34.
  42. Hu J, Li S, Hu J, Yang G. A hierarchical feature extraction model for multi-label mechanical patent classification. Sustainability. 2018;10(1):219.
  43. Ding J, Wang A, Huang KG-L, Zhang Q, Yang S. Improving large-scale classification in technology management: making full use of label information for professional technical documents. IEEE Trans Eng Manage. 2024;71:15188–208.
  44. Izacard G, Lewis P, Lomeli M, Hosseini L, Petroni F, Schick T. Atlas: few-shot learning with retrieval augmented language models. Journal of Machine Learning Research. 2023;24(251):1–43.
  45. Lo HC, Chu JM, Hsiang J, Cho CC. Large language model informed patent image retrieval. arXiv preprint 2024. https://arxiv.org/abs/2404.19360
  46. Ren R, Ma J, Luo J. Retrieval-augmented generation systems for intellectual property via synthetic multi-angle fine-tuning. arXiv preprint 2025. https://arxiv.org/abs/2506.00527
  47. Deng Y, Li H, Fu X. Few-shot learning named entity recognition of pressure sensor patent text based on MLM. In: 2021 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS). 2021. p. 540–6. https://doi.org/10.1109/tocs53301.2021.9688929
  48. Chen H, Zhao Y, Chen Z, Wang M, Li L, Zhang M, et al. Retrieval-style in-context learning for few-shot hierarchical text classification. Transactions of the Association for Computational Linguistics. 2024;12:1214–31.
  49. Yu G, Liu L, Jiang H, Shi S, Ao X. Retrieval-augmented few-shot text classification. In: Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. p. 6721–35. https://doi.org/10.18653/v1/2023.findings-emnlp.447
  50. Braga M, Kasela P, Raganato A, Pasi G. Investigating task arithmetic for zero-shot information retrieval. In: Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2025. p. 2738–43. https://doi.org/10.1145/3726302.3730216
  51. Zhang P, Liu Z, Xiao S, Dou Z, Nie J-Y. A multi-task embedder for retrieval augmented LLMs. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. p. 3537–53. https://doi.org/10.18653/v1/2024.acl-long.194
  52. Milios A, Reddy S, Bahdanau D. In-context learning for text classification with many labels. arXiv preprint 2023. https://arxiv.org/abs/2309.10954
  53. An Z, Sun G, Liu Y, Li R, Wu M, Cheng MM, et al. Multimodality helps few-shot 3D point cloud semantic segmentation. arXiv preprint 2024. arXiv:2410.22489
  54. An Z, Sun G, Liu Y, Li R, Han J, Konukoglu E, et al. Generalized few-shot 3D point cloud segmentation with vision-language model. In: Proceedings of the Computer Vision and Pattern Recognition Conference. 2025. p. 16997–7007.
  55. Mueller A, Narang K, Mathias L, Wang Q, Firooz H. Meta-training with demonstration retrieval for efficient few-shot learning. arXiv preprint 2023. https://arxiv.org/abs/2307.00119
  56. Abdullahi T, Singh R, Eickhoff C. Retrieval augmented zero-shot text classification. In: Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval. 2024. p. 195–203. https://doi.org/10.1145/3664190.3672514
  57. Althammer S, Hofstätter S, Hanbury A. Cross-domain retrieval in the legal and patent domains: a reproducibility study. In: European Conference on Information Retrieval. Springer; 2021. p. 3–17.
  58. Xie X, Wu J, Xiang M, Tang J, Sheng Y. Enhancing the efficiency of patent classification: a multimodal classification approach for design patents. Journal of King Saud University Computer and Information Sciences. 2025;37(7):1–17.
  59. Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems. 2022;35:24824–37.
  60. Zheng G, Yang B, Tang J, Zhou HY, Yang S. DDCoT: duty-distinct chain-of-thought prompting for multimodal reasoning in language models. Advances in Neural Information Processing Systems. 2023;36:5168–91.
  61. Guan R, Shen C, Wang W, Wang Y, Xie L, Yan S, et al. Instance-adaptive zero-shot chain-of-thought prompting. In: Advances in Neural Information Processing Systems 37. 2024. p. 125469–86. https://doi.org/10.52202/079017-3986
  62. Miao J, Thongprayoon C, Suppadungsuk S, Krisanapan P, Radhakrishnan Y, Cheungpasitporn W. Chain of thought utilization in large language models and application in nephrology. Medicina (Kaunas). 2024;60(1):148. pmid:38256408
  63. Hebenstreit K. Zero-shot chain-of-thought reasoning across datasets and models. 2023.
  64. Che W, Chen Q, Qin L, Wang J, Zhou J. Unlocking the capabilities of thought: a reasoning boundary framework to quantify and optimize chain-of-thought. In: Advances in Neural Information Processing Systems 37. 2024. p. 54872–904. https://doi.org/10.52202/079017-1740
  65. Sun J, Zheng C, Xie E, Liu Z, Chu R, Qiu J, et al. A survey of reasoning with foundation models: concepts, methodologies, and outlook. ACM Comput Surv. 2025.
  66. Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D. RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint 2019. https://arxiv.org/abs/1907.11692
  67. Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV. XLNet: generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems. 2019;32.
  68. Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems. 2017;30.
  69. Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: International Conference on Learning Representations. 2017.
  70. Wang Y, Shen S, Lim BY. RePrompt: automatic prompt editing to refine AI-generative art towards precise expressions. In: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 2023. p. 1–29. https://doi.org/10.1145/3544548.3581402