
Exploring novel semi-inner product reproducing Kernels in Banach space for robust Kernel methods

  • Yi Ding,

    Roles Investigation, Methodology, Writing – original draft

    Affiliation Graduate School of Computer Science and Engineering, University of Aizu, Aizu-wakamatsu, Fukushima, Japan

  • Ying Zhao,

    Roles Software, Visualization

    Affiliation Graduate School of Computer Science and Engineering, University of Aizu, Aizu-wakamatsu, Fukushima, Japan

  • Yan Pei

    Roles Conceptualization, Supervision

    peiyan@u-aizu.ac.jp

    Current address: Computer Science Division, University of Aizu, Aizu-wakamatsu, Fukushima, Japan

    Affiliation Computer Science Division, University of Aizu, Aizu-wakamatsu, Fukushima, Japan

Abstract

Kernel methods are widely applied across various domains; however, structural limitations of reproducing kernels in Hilbert spaces pose significant challenges. Many challenges inherent to Hilbert spaces can be effectively addressed within the framework of Banach spaces. In this work, we define the semi-inner product reproducing kernel Banach space and its reproducing kernels using semi-inner product and bilinear mapping, supported by rigorous proofs. Specific forms of semi-inner product reproducing kernels are derived within the theoretical framework of the semi-inner product reproducing kernel Banach space. This constitutes the core originality of our work and represents its primary contribution. Through illustrative experiments, we validate the effectiveness of semi-inner product reproducing kernels and demonstrate their superior performance compared to polynomial reproducing kernels.

1 Introduction

The reproducing kernel theory originated from integral theory. During its initial development, the kernel was regarded as the continuous kernel of a positive definite integral operator. This notion was introduced by Mercer and termed the positive definite kernel [1]. The theoretical framework of reproducing kernel Hilbert space (RKHS) was proposed in the 1930s. Bergman [2] investigated the boundary value problem of differential equations shown in Eq (1), in which a(x,y), b(x,y), and c(x,y) are bivariate square-integrable functions on the given interval. The concept and formulation of the reproducing kernel were given there for the first time. Aronszajn summarized the previous work, developed the reproducing kernel theory incorporating the Bergman kernel, and streamlined the proof process [3].

(1)

Reproducing kernel functions are widely utilized in mathematics and computer science. In mathematical fields such as functional analysis, numerical analysis, and partial differential equations, kernel functions are prevalent. In computer science, reproducing kernel functions are primarily applied in kernel methods, which play a vital role in machine learning [4], pattern recognition, and statistics. Several works have applied reproducing kernel functions within kernel methods. In support vector machines (SVM), kernel functions are utilized to transform data from a lower-dimensional feature space into a higher-dimensional feature space in order to find better linear segmentation surfaces, i.e., linear decision boundaries [5–9]. Schölkopf introduced kernel methods into principal component analysis (PCA), performing PCA in the high-dimensional space mapped by kernel functions to reduce data dimensionality [10]. He also extended reproducing kernels to the k-means clustering algorithm, enabling the algorithm to handle clustering problems with more complex shapes [11]. Parzen utilized kernel functions to estimate probability density functions, which are extensively employed in data analysis and pattern recognition [12]. Mika employed kernel functions to enhance the linear discriminant analysis (LDA) algorithm, allowing it to manage nonlinear classification problems [13].

It is well known that a Hilbert space and its dual space are isometrically isomorphic, meaning they are essentially equivalent in this context. Therefore, the inner product defined on a Hilbert space can also be interpreted as a pairing between the space and its dual. This property enables the classical reproducing properties to be described in terms of dual spaces and dual bilinear forms, and it serves as a foundation for extending the reproducing properties from Hilbert spaces to Banach spaces.

Compared with Hilbert spaces, Banach spaces offer distinct advantages when solving certain classes of problems. First, any two Hilbert spaces of the same dimension are isometrically isomorphic, meaning they can essentially be considered the same space. In contrast, two Banach spaces of the same dimension, for example Lp[0,1] and Lq[0,1] with p ≠ q, are not isomorphic, i.e., they are two different spaces [14]. Consequently, for a given dimension, Banach spaces exhibit a richer geometric structure than Hilbert spaces. This structural advantage can potentially improve the performance of kernel-based machine learning algorithms. Second, Hilbert spaces lack the flexibility to accommodate the intrinsic geometric structure present in many real-world datasets [15]. Consequently, traditional machine learning algorithms based on RKHS may fail to process such data. In contrast, machine learning algorithms based on Banach spaces can address this issue, as data with intrinsic structure can be embedded into Banach spaces. Third, during data processing, algorithms frequently apply a large number of inner product operations. Certain dimensions of the data may be lost during reading and processing. In real-world scenarios, the number of dimensions does not always match among the datasets, preventing inner product operations with other data. Currently, missing data dimensions are commonly filled with zeros [16] to ensure consistent dimensionality, thereby enabling kernel methods based on inner product operations. As shown in Fig 1, when vector A loses one dimension of data and becomes A′, the inner product operation cannot be performed. The missing entry is filled with zero to obtain vector A″, enabling the inner product operation to be performed again. However, the zero-filling method assumes that the missing data component in a given dimension is zero, making the corresponding values of the other data meaningless in the inner product operation. In other words, if a dimension is missing from one piece of data, it is equivalent to losing that dimension across all data during inner product operations. A number of other padding methods have been adopted to make the inner product computable, such as mean imputation [17] and other statistical approaches [18]. Although these methods facilitate the calculation of the inner product, they can result in significant deviations. If an alternative operation can effectively substitute the inner product in data processing, more accurate and reasonable results may be achieved for the designed machine learning algorithms. In light of the above considerations, we aim to identify a function that can replace the inner product operation when generating reproducing kernel functions. This objective forms the core motivation for our proposal and the present work.

Fig 1. A and B are two vectors of the same dimension, so their inner product can be computed; but when A loses the data of one dimension and becomes A′, the inner product of A′ and B cannot be computed, since A′ and B are vectors of different dimensions.

To solve this problem, the common approach is to fill the location of the lost data with zero, yielding the vector A″ and making the inner product operation possible again. However, this zero-filling method effectively treats the missing dimension as contributing nothing, rendering the corresponding component of vector B meaningless during the inner product operation.

https://doi.org/10.1371/journal.pone.0340686.g001
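The zero-filling and mean-imputation strategies discussed above can be illustrated with a minimal NumPy sketch; the vectors and the helper below are hypothetical, chosen only to make the effect of each strategy on the inner product visible.

```python
import numpy as np

# Complete vector B and an incomplete observation A' (one entry missing, marked NaN).
B = np.array([2.0, 1.0, 3.0, 0.5])
A_prime = np.array([1.0, np.nan, 2.0, 4.0])

def impute(v, strategy="zero"):
    """Fill missing entries so that inner products with complete vectors are defined."""
    filled = v.copy()
    mask = np.isnan(filled)
    if strategy == "zero":
        filled[mask] = 0.0                 # zero-filling: the missing dimension contributes nothing
    elif strategy == "mean":
        filled[mask] = np.nanmean(filled)  # mean imputation: use the mean of the observed entries
    return filled

A_zero = impute(A_prime, "zero")
A_mean = impute(A_prime, "mean")

# With zero-filling, B's second component is effectively ignored in the inner product.
print(np.dot(A_zero, B))   # contribution of the missing dimension is 0
print(np.dot(A_mean, B))   # mean imputation gives a different, possibly biased, value
```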

In this paper, we start from the theoretical foundation and provide a complete proof of the reproducing kernel Banach space theory. Based on the relationship between the reproducing kernel Hilbert space and the reproducing kernel, we present the theoretical form of the reproducing kernel in Banach spaces. We introduce a bilinear mapping analogous to the inner product, together with a semi-inner product, in the Banach space. Owing to certain structural issues within the space, we impose additional properties on the Banach space, thereby establishing the theorem of the semi-inner product reproducing kernel Banach space and its reproducing kernel. Subsequently, we explore practical applications by providing several specific forms of the semi-inner product reproducing kernel. Through motivating examples, we verify the effectiveness of the semi-inner product reproducing kernel in algorithmic applications.

Following this introduction, Sect 2 briefly discusses the relationship between the reproducing kernel Hilbert space and the reproducing kernel, as well as the process of constructing the reproducing kernel. In Sect 3, we provide a complete proof of the necessary and sufficient conditions for the existence of a reproducing kernel Banach space and its reproducing kernel. We also justify the introduction of a semi-inner product in Banach spaces and discuss its additional properties. Finally, we derive the semi-inner product reproducing kernel. In Sect 4, specific forms of semi-inner product reproducing kernels are presented. Subsequently, we validate the effectiveness of these kernels through motivational examples and compare them with polynomial reproducing kernels in Sect 5. Sect 6 summarizes the contributions of this study and discusses key issues to outline directions for further research.

2 Review of the reproducing Kernel Hilbert space

Before presenting the theory and proof of the reproducing kernel Banach space, we review the reproducing kernel theory in Hilbert space. This review addresses the following three key points.

  • The reproducing kernel Hilbert space (RKHS) is defined by the evaluation function.
  • The necessary conditions for the existence of reproducing kernels.
  • The positive definiteness property that enables kernel construction: a function must be positive definite to serve as a reproducing kernel, which ensures that it defines an inner product and generates the corresponding RKHS via the construction provided by the Moore-Aronszajn theorem.

2.1 Reproducing Kernel Hilbert space

Definition: Let X be a set. A Hilbert space $\mathcal{H}$ of functions on X is a reproducing kernel Hilbert space (RKHS) if, for every $x \in X$, the evaluation functional $\delta_x : f \mapsto f(x)$ is a continuous linear functional on $\mathcal{H}$.

Let K be a reproducing kernel on X. Then there exists a unique RKHS $\mathcal{H}_K$ satisfying:

  1. The Membership Property: $K(x,\cdot) \in \mathcal{H}_K$, for each $x \in X$;
  2. The Reproducing Property: for any $f \in \mathcal{H}_K$ and $x \in X$, $f(x) = \langle f, K(x,\cdot) \rangle_{\mathcal{H}_K}$, (2)
     where $\langle \cdot, \cdot \rangle_{\mathcal{H}_K}$ denotes the inner product on $\mathcal{H}_K$;
  3. The Inner Product Between Kernel Functions Property: let $K_x = K(x,\cdot)$ for $x \in X$; then $\langle K_x, K_y \rangle_{\mathcal{H}_K} = K(x,y)$. (3)

An RKHS is a Hilbert space of functions in which the evaluation of a function at any point can be represented as an inner product. Specifically, given a positive definite kernel function K(x,y), the RKHS associated with K consists of functions f such that, for all $x \in X$, the reproducing property of Eq (2) holds. This reproducing property guarantees that pointwise evaluation is a continuous linear functional, a critical feature that distinguishes an RKHS from a general Hilbert space. Moreover, each RKHS corresponds uniquely to a kernel function, and the inner product structure it inherits from Hilbert spaces facilitates rigorous mathematical analysis and optimization.

2.2 Moore-Aronszajn theorem

Theorem 1 (Moore-Aronszajn Theorem). Let $K : X \times X \to \mathbb{R}$ be positive definite. There exists a unique RKHS associated with the reproducing kernel K. Furthermore, if the space $\mathcal{H}_0 = \operatorname{span}\{ K(x,\cdot) : x \in X \}$ is equipped with the inner product

$\langle f, g \rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j K(x_i, y_j),$

where $f = \sum_{i=1}^{n} a_i K(x_i,\cdot)$ and $g = \sum_{j=1}^{m} b_j K(y_j,\cdot)$, then the completion of $\mathcal{H}_0$ with respect to this inner product constitutes a valid RKHS.
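As an illustration of the Moore-Aronszajn construction, the following sketch (assuming a Gaussian kernel purely for concreteness) builds the inner product on the span of kernel sections and checks the reproducing property of Eq (2) numerically.

```python
import numpy as np

def K(x, y, gamma=0.5):
    """A positive definite (Gaussian) kernel, used purely for concreteness."""
    return np.exp(-gamma * (x - y) ** 2)

# Centers x_i and coefficients a_i defining f = sum_i a_i K(x_i, .) in the pre-Hilbert span.
centers = np.array([0.0, 1.0, 2.5])
coeffs = np.array([1.0, -0.5, 2.0])

def f(t):
    return np.sum(coeffs * K(centers, t))

# Moore-Aronszajn inner product on the span: <f, g> = sum_ij a_i b_j K(x_i, y_j).
def inner(a, xs, b, ys):
    return a @ K(xs[:, None], ys[None, :]) @ b

# Reproducing property (Eq (2)): <f, K(x, .)> equals f(x).
x = 1.7
print(inner(coeffs, centers, np.array([1.0]), np.array([x])), f(x))
```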

We can derive the following conclusions from the proof for RKHS. First, in the process of proving the reproducing kernel, since a Hilbert space is self-conjugate, we can introduce an isometric isomorphism between the space and its dual. This allows us to replace the elements of the dual space with elements of the space itself to complete the proof. In the case of a reproducing kernel Banach space (RKBS) $\mathcal{B}$, we retain certain properties of the Hilbert case; in particular, we assume that the RKBS is reflexive, meaning that $\mathcal{B}^{**} = \mathcal{B}$. Second, in an RKHS, since the Riesz representation theorem holds, f(x) can be expressed as an inner product. The inner product can be viewed as a bilinear map between the space and its dual. It is called a bilinear map because it is linear in each argument, that is,

$\langle \alpha f + \beta g, h \rangle = \alpha \langle f, h \rangle + \beta \langle g, h \rangle$

and

$\langle f, \alpha g + \beta h \rangle = \alpha \langle f, g \rangle + \beta \langle f, h \rangle.$
Third, the reproducing kernel is intrinsically a bivariate function that becomes a univariate function when one variable is fixed. The Moore-Aronszajn theorem utilizes the space spanned by this univariate function to define the RKHS, thereby providing a method for constructing RKHS. This inspires the construction of RKBS, and we will adopt a similar strategy to design algorithms within the RKBS framework.

3 Theorem and proof of the reproducing Kernel Banach space

When generalizing the reproducing kernel to a Banach space, a bilinear form analogous to Eq (2) must be introduced in the Banach space. Specifically, we aim to extend the Riesz representation theorem to the Banach space. To achieve this, we define the bilinear form between the normed vector space $\mathcal{B}$ and its dual $\mathcal{B}^*$ by

$\langle f, f^* \rangle_{\mathcal{B}} := f^*(f), \quad f \in \mathcal{B},\ f^* \in \mathcal{B}^*.$ (4)

If the Riesz representation theorem holds on the RKBS $\mathcal{B}$, then, because the evaluation functional $\delta_x$ is a bounded linear functional on $\mathcal{B}$, there exists for every $x \in X$ an element $K(x,\cdot) \in \mathcal{B}^*$ such that

$f(x) = \langle f, K(x,\cdot) \rangle_{\mathcal{B}}, \quad \forall f \in \mathcal{B}.$ (5)

Based on the definitions and properties of RKHS, we formally introduce the corresponding definitions and fundamental properties for RKBS.

Definition: A Reproducing Kernel Banach space (RKBS) is a Banach space of functions defined on X in which every evaluation functional is a continuous linear functional [19].

Property: Let K be a reproducing kernel on X. Then there exists a unique RKBS $\mathcal{B}$ with dual space $\mathcal{B}^*$ satisfying the following conditions:

  1. The Function Property: $K(x,\cdot) \in \mathcal{B}^*$ and $K(\cdot,y) \in \mathcal{B}$ for all $x, y \in X$;
  2. The Reproducing Property: for any $f \in \mathcal{B}$ and $x \in X$, $f(x) = \langle f, K(x,\cdot) \rangle_{\mathcal{B}}$,
     where $\langle \cdot, \cdot \rangle_{\mathcal{B}}$ denotes the dual bilinear pairing between $\mathcal{B}$ and $\mathcal{B}^*$;
  3. The Inner Product Between Kernel Functions Property: for any $x, y \in X$, letting $K_x = K(x,\cdot)$ and $K_y = K(\cdot,y)$, $\langle K_y, K_x \rangle_{\mathcal{B}} = K(x,y)$.

If the Riesz representation theorem holds on a Banach space, Theorem 2 can be rigorously derived.

3.1 Density theorem of the reproducing Kernel Banach space

The density theorem in RKBS theory establishes that the linear span of the reproducing kernels forms a dense subset of the entire space. Here, density implies that every element in the RKBS admits arbitrarily close approximation (with respect to the space’s norm) by finite linear combinations of reproducing kernels. This fundamental result holds significant importance in both functional analysis and kernel-based machine learning, extending the classical density property of RKHS to the more general and structurally complex setting of Banach spaces.

3.1.1 Theorem and proof summary.

Theorem 2. Let be an RKBS on X, defined by

and let its conjugate space , given by

where and are reflexive Banach spaces with being the dual of , and there exist two maps and such that

(6)

The bilinear form between and is defined by

(7)

A function is the reproducing kernel of an RKBS on X if and only if the reproducing kernel K is expressed by Eq (8)

(8)

and satisfies the conditions of Eq (6).

The proof of the theorem is divided into two parts, i.e., sufficiency and necessity. The proof of sufficiency consists of three steps. First, we prove that the constructed space is a Banach space; since its construction is analogous to that of a known Banach space, we begin by verifying the density condition. Next, we demonstrate that both the space and its dual are RKBSs over X. Specifically, we prove that the evaluation functional is continuous on both spaces, which is established using the Cauchy–Schwarz inequality. Finally, we use a constructive approach to verify that the reproducing kernel expression given in the theorem satisfies the criteria for being a reproducing kernel.

The proof of necessity also consists of three steps. First, we prove the uniqueness of the reproducing kernel by contradiction. Then, using a constructive method, we derive the explicit expression of the kernel given in the theorem. Finally, we prove the density condition using a corollary of the Hahn-Banach theorem together with a proof by contradiction: assuming the density condition does not hold, we derive a contradiction, so the assumption is invalid and the density condition is confirmed. Taken together, these steps complete the proof of the theorem.

3.1.2 Proof of sufficiency and necessity in the theorem.

Proof of sufficiency. Let and assume that , . Because is dense in , we have . Therefore, , , implying that u = 0. Conversely, if u = 0, then , , which is obvious. This demonstrates that the representation of the function in is unique. In other words, we can use u to represent , i.e., each function in can be uniquely represented by the element .

Based on this one-to-one mapping relationship, we define the norm on the space as follows: . Since is a Banach space, endowed with this norm consequently forms a Banach space. Similarly, is a Banach space. Define the bilinear form on as Clearly, that

Consequently, every function in is a bounded linear functional on . This is because the linear mapping is isometric from to . Therefore, contains all bounded linear functionals on , implying that . Since is reflexive, is the conjugate space of , implying that both and are reflexive. In addition, ,

Similarly, it can be proved that the evaluation functional is bounded on . Therefore, is the RKBS over X. Next, we prove the existence of the reproducing kernel: , and .

This demonstrates that K is the reproducing kernel of . Therefore, the sufficiency part is proved.

Proof of necessity. Since is an RKBS over X and K is its reproducing kernel, is bounded. In other words, such that

Let , assume that , such that , . Then,

(9)

Suppose instead there exists another function satisfying , and

Then , . Since , , so, for all , we have

Therefore, the function is exactly the same as the function , in other words, . Similarly, there exists a unique function such that and .

Let , and , then

(10)

and

(11)

We obtain that , therefore, , .

Let , , then . Let , and suppose that . By the Hahn-Banach theorem, ,

However, the above formula implies that , , that is, . This contradicts the above formula. Therefore, . Similarly, , and the proof is complete.

3.2 Construction of the reproducing Kernel Banach space with semi-inner product, Fréchet differentiability, and convexity properties

There are three primary properties of the RKBS that should be defined. These properties indicate that we construct a Banach space which not only possesses a reproducing kernel structure but also incorporates important mathematical properties. First is the semi-inner product, which allows a certain notion of direction and magnitude to be defined in non-Hilbert spaces. Second is Fréchet differentiability, which enables derivatives or gradients to be defined in Banach spaces and is beneficial for functional analysis and optimization problems in machine learning. Third is convexity, which ensures the existence of certain minimizers and the uniqueness of the reproducing kernel. The combination of these properties provides a solid foundation for both the theoretical development and practical applications of RKBS, such as in support vector machines and functional approximation.

3.2.1 Semi-inner product.

From Theorem 2, it follows that an RKBS corresponds to a unique reproducing kernel. However, a single reproducing kernel may correspond to multiple RKBSs. For example, consider the kernel function , where . Two distinct feature mappings exist for this kernel:

  • First feature mapping: , with the associated Banach space .
  • Second feature mapping: , with the associated Banach space .

This demonstrates how a single reproducing kernel k can correspond to two different RKBSs through distinct feature embeddings. Additionally, if X is a finite set, every non-zero function K defined on $X \times X$ corresponds to the reproducing kernel of a certain RKBS on X.

In other words, the reproducing kernel of a general RKBS may be neither positive definite nor symmetric. This is due to the distinction between Banach spaces and Hilbert spaces: there is no inner product structure in a Banach space. To ensure that the reproducing kernel of an RKBS possesses the same properties as that of an RKHS, we introduce the semi-inner product, as defined by Lumer [20], into the RKBS. A semi-inner product on a vector space $V$ is a function, denoted by $[\cdot, \cdot]$, that maps $V \times V$ to the scalar field and satisfies the following conditions for all $f, g, h \in V$ and all scalars $\lambda$:

  1. $[f, f] \geq 0$, and $[f, f] = 0$ if and only if f = 0,
  2. $[f + g, h] = [f, h] + [g, h]$ and $[\lambda f, g] = \lambda [f, g]$ (linearity in the first argument),
  3. $|[f, g]|^2 \leq [f, f]\,[g, g]$ (the Cauchy-Schwarz inequality).

The semi-inner product differs from the inner product in that it does not satisfy conjugate symmetry; in general, $[f, g] \neq \overline{[g, f]}$. This results in the semi-inner product not being additive in its second variable, in other words

$[f, g + h] \neq [f, g] + [f, h].$ (12)

We can easily prove that a semi-inner product is an inner product if and only if it is additive in the second variable, that is

$[f, g + h] = [f, g] + [f, h].$ (13)

In Lumer [20], it was demonstrated that a vector space $V$ endowed with a semi-inner product is a normed space, where the norm is defined by

$\|f\| = [f, f]^{1/2}.$ (14)

If a vector space $V$ has a semi-inner product and the norm on $V$ is induced by Eq (14), we refer to $V$ as a semi-inner product space.

By the Cauchy-Schwarz inequality, if $V$ is a semi-inner product space, then for each $g \in V$ the map $f \mapsto [f, g]$ is a continuous linear functional on $V$; we denote this linear functional by $g^*$. From this definition, we obtain

$\langle f, g^* \rangle = [f, g], \quad f, g \in V.$ (15)
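The text does not display a concrete semi-inner product at this point; as a hedged illustration, the sketch below uses the classical Giles semi-inner product on $\ell^p$, $[x, y] = \|y\|_p^{2-p} \sum_i x_i\, \mathrm{sgn}(y_i)\, |y_i|^{p-1}$, which is an assumption here rather than the construction used later in the paper. It numerically checks that this semi-inner product induces the $\ell^p$ norm as in Eq (14), is linear in the first argument, fails additivity in the second argument (Eq (12)), and satisfies the Cauchy-Schwarz inequality.

```python
import numpy as np

def sip(x, y, p=3.0):
    """Classical Giles semi-inner product on l^p (p > 1): [x, y] = ||y||^(2-p) * sum x_i sgn(y_i) |y_i|^(p-1)."""
    norm_y = np.linalg.norm(y, ord=p)
    if norm_y == 0:
        return 0.0
    return norm_y ** (2.0 - p) * np.sum(x * np.sign(y) * np.abs(y) ** (p - 1.0))

x = np.array([1.0, -2.0, 0.5])
y = np.array([0.3, 1.0, -1.5])
z = np.array([2.0, 0.1, 0.7])
p = 3.0

# Eq (14): the semi-inner product induces the l^p norm, [x, x] = ||x||_p^2.
print(sip(x, x, p), np.linalg.norm(x, ord=p) ** 2)

# Linearity and homogeneity hold in the first argument ...
print(sip(x + z, y, p), sip(x, y, p) + sip(z, y, p))

# ... but, unlike an inner product, additivity generally fails in the second argument (Eq (12)).
print(sip(x, y + z, p), sip(x, y, p) + sip(x, z, p))

# Cauchy-Schwarz: |[x, y]|^2 <= [x, x][y, y].
print(sip(x, y, p) ** 2 <= sip(x, x, p) * sip(y, y, p))
```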

3.2.2 Differentiable and convex characteristics.

In general, semi-inner products on normed vector spaces may not be unique; differentiability of the norm ensures uniqueness. Let $S(V) = \{ f \in V : \|f\| = 1 \}$ denote the unit sphere of the normed vector space $V$. Suppose that for all $f, g \in S(V)$ the limit

$\lim_{t \to 0} \frac{\|f + t g\| - \|f\|}{t}$ (16)

exists. Moreover, if the limit converges uniformly on $S(V) \times S(V)$, then the normed vector space is called uniformly Fréchet differentiable [21].

Furthermore, we introduce the property of uniform convexity to the vector space, which ensures that the Riesz representation theorem is valid on the semi-inner product space. The normed vector space $V$ is uniformly convex if, for every $\epsilon > 0$, there exists $\delta > 0$ such that

$\|f + g\| \leq 2 - \delta \quad \text{whenever } f, g \in S(V) \text{ and } \|f - g\| \geq \epsilon.$ (17)
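A quick numerical illustration of these two definitions, as a sketch under the assumption that the $\ell^p$ norm with $p > 1$ is used (a standard example of a uniformly convex and uniformly Fréchet differentiable space): the difference quotient of Eq (16) stabilises as $t \to 0$, and the midpoint norm in Eq (17) drops below 1 for unit vectors that are a fixed distance apart.

```python
import numpy as np

p = 3.0
f = np.array([1.0, -0.5, 2.0])
g = np.array([0.3, 1.2, -0.7])
f = f / np.linalg.norm(f, ord=p)        # restrict to the unit sphere, as in Eq (16)
g = g / np.linalg.norm(g, ord=p)

# The difference quotient (||f + t g|| - ||f||) / t stabilises as t -> 0,
# illustrating differentiability of the l^p norm for p > 1.
for t in [1e-1, 1e-3, 1e-5, 1e-7]:
    quotient = (np.linalg.norm(f + t * g, ord=p) - np.linalg.norm(f, ord=p)) / t
    print(t, quotient)

# Uniform convexity (Eq (17)): for unit vectors a fixed distance apart, the midpoint norm drops below 1.
eps = np.linalg.norm(f - g, ord=p)
midpoint = np.linalg.norm((f + g) / 2.0, ord=p)
print(eps, midpoint)   # midpoint < 1 whenever f != g, with a gap depending only on eps
```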

3.2.3 Uniformly Fréchet differentiable Banach Space and uniformly convex.

According to [22], a uniformly convex Banach space is reflexive. In addition, a normed vector space is uniformly Fréchet differentiable if and only if its dual space is uniformly convex [23]. From these two properties, we obtain the following result. If $\mathcal{B}$ is a uniformly convex and uniformly Fréchet differentiable Banach space, then the space is reflexive, and $\mathcal{B}^*$ is also a uniformly Fréchet differentiable and uniformly convex Banach space. Indeed, because $\mathcal{B}$ is uniformly Fréchet differentiable, its dual space $\mathcal{B}^*$ is uniformly convex. Moreover, since $\mathcal{B}$ is uniformly convex and, by reflexivity, $\mathcal{B}$ is the dual space of $\mathcal{B}^*$, it follows that $\mathcal{B}^*$ is uniformly Fréchet differentiable. In summary, $\mathcal{B}$ and $\mathcal{B}^*$ are uniformly convex and uniformly Fréchet differentiable Banach spaces.

Let be a uniformly convex and uniformly Fréchet differentiable Banach space. Then, for each there exists a unique so that , i.e., [21]. Moreover, . Let X be an input space, and let be a uniformly Fréchet differentiable and uniformly convex Banach space, with as the conjugate space of . We define mapping to , to , and . By combining the two conditions added above, from Theorem 2, we can obtain the following conclusions.

3.3 Dual mapping theorem of the reproducing Kernel Banach space

The Dual Mapping Theorem is a fundamental result in the theory of RKBS. It states that, under appropriate conditions, the dual space of an RKBS also forms an RKBS equipped with a corresponding dual reproducing kernel. This implies that the reproducing property is preserved not only in the original space but also in its dual, allowing any continuous linear functional to be represented via pairing with the dual kernel function. The theorem plays a crucial role in functional analysis and kernel-based machine learning, providing a theoretical foundation for extending kernel methods to more general Banach spaces, such as those based on Lp norms, for tasks like regression and classification.

Theorem 3. Let be a function that maps X to , and is a function that maps X to , so that

(18)

Then is uniformly Fréchet differentiable and uniformly convex Banach space, and with norm . The semi-inner product on the space can be expressed as

(19)

and is a uniformly Fréchet differentiable and uniformly convex Banach space, and with norm . The semi-inner product on the space can be expressed as

(20)

And is the conjugate space of . and have the following bilinear form

(21)

A mapping G on $X \times X$ is a semi-inner-product reproducing kernel if and only if it satisfies Eq (22),

(22)

where is a function that maps X to a uniformly Fréchet differentiable and uniformly convex Banach space , which satisfies Eq (18).

Proof. Given that is a uniformly convex and uniformly Fréchet differentiable Banach space, we utilize the semi-inner product to explicitly characterize the dual bilinear mapping on . This construction satisfies all prerequisite assumptions for Theorem 2, particularly establishing the validity of the Riesz representation theorem in . Through the replacement of the bilinear mapping in Theorem 2 with the semi-inner product formulation (as specified in Eq (15)), we thereby complete the proof of the theorem.

So far, by incorporating uniform Fréchet differentiability and uniform convexity into the semi-inner product space, we have derived the theoretical form of the semi-inner product RKBS and obtained the corresponding expression of the reproducing kernel on the space. We now summarize the mapping relationships between the different spaces appearing in Theorem 3. Fig 2 provides a schematic diagram illustrating the generation of the semi-inner product RKBS and the corresponding reproducing kernel.

Fig 2. Schematic diagram of the reproducing kernel generation.

The feature maps send each point of X to bounded linear functionals, and the spaces spanned by their images are the two function spaces of interest. Using the relationship between the semi-inner product and the bilinear form given by Eq (15), the two mappings are applied to these spaces, from which the RKBS and its dual are generated. The bilinear form between the two spaces provides the expression of the semi-inner product reproducing kernel.

https://doi.org/10.1371/journal.pone.0340686.g002

The theoretical foundation of RKBS has achieved substantial progress. The work by Xu [24] introduced a generalized RKBS framework by constructing left-sided and right-sided RKBS, extending the theory to asymmetric spaces. Song and Zhang [25] systematically investigated the theoretical mechanism of l1 norm based RKBS for improving algorithmic learning rates and provided rigorous theoretical proofs. However, existing studies remain confined to theoretical exploration, lacking explicit analytical forms of reproducing kernel functions and their integration into practical algorithms to address real-world problems.

To bridge the gap between theory and practical applications, this study aims to advance the practical implementation of RKBS in machine learning. Based on the algebraic properties of core computational operations in algorithms, we systematically analogize the structure of conventional inner products and adopt semi-inner products to replace inner products, thereby constructing a semi-inner-product reproducing kernel and its corresponding semi-inner-product RKBS theoretical framework. This framework not only preserves rigorous theoretical foundations but also enhances the applicability of reproducing kernel methods to real-world problems, offering a novel theoretical tool and practical pathway for complex data modeling. In the subsequent sections, we will first rigorously derive the mathematical construction of the semi-inner product reproducing kernel, then apply it to typical machine learning tasks through multiple empirical experiments, and systematically compare its performance with traditional kernel methods to validate the superiority and effectiveness of the proposed framework in practical applications.

4 Construction of semi-inner product reproducing Kernel

The construction of a semi-inner product reproducing kernel refers to defining a kernel function in a Banach space using a semi-inner product, which generalizes the concept of an inner product to spaces lacking symmetry or bilinearity. This construction allows the reproducing property, central in kernel methods, to extend beyond Hilbert spaces, enabling learning and approximation in more general Banach spaces.

Below, we present a reproducing kernel Banach space and a reproducing kernel on this space. Let p and q be conjugate exponents, that is, $1 < p, q < \infty$ and $1/p + 1/q = 1$. Define the two feature mappings by

For any , the Fourier transform and its inverse are defined as the following two functions,

We construct two function spaces, and , generated by mapping and , respectively:

For all and , the semi-inner product is given by The reproducing kernel K on the space is defined by

(23)

Following this construction, we introduce several discrete forms of semi-inner product reproducing kernels as shown in Eq (24), demonstrating the originality of this work.

Let x = {xi} and y = {yi} be discrete data vectors; then we have

(24)
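Since Eq (24) is not reproduced in explicit form here, the following sketch only shows how a discrete semi-inner-product-style kernel could be wrapped as a scikit-learn kernel callable; the function sip_kernel is a hypothetical placeholder built from the $\ell^p$ semi-inner product, not the paper's Eq (24), and it is neither symmetric nor guaranteed positive definite.

```python
import numpy as np
from sklearn.svm import SVC

def sip_kernel(X, Y, p=3.0):
    """Placeholder discrete kernel built from l^p semi-inner products between rows of X and Y.

    Illustrative stand-in only; the paper's exact Eq (24) is not reproduced here.
    """
    norms = np.linalg.norm(Y, ord=p, axis=1)
    norms = np.where(norms == 0, 1.0, norms)          # guard against zero rows
    weighted_Y = np.sign(Y) * np.abs(Y) ** (p - 1.0) / norms[:, None] ** (p - 2.0)
    return X @ weighted_Y.T                           # entry (i, j) = [X_i, Y_j]_p

# scikit-learn accepts any callable returning the Gram matrix between two sample sets.
clf = SVC(kernel=sip_kernel)
```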

5 Experiment to verify the effectiveness of the semi-inner product Kernel function

5.1 Experimental setting

We validate the effectiveness of the proposed semi-inner product kernel function through four sets of experiments using different datasets. All datasets employed in our experiments were obtained from scikit-learn, an open-source Python machine learning library [26]. This library provides efficient tools for data mining and analysis and is widely adopted in both academic research and industry.

In the first experiment, we generated concentric circular test data comprising 800 sample points using the make_circles method from the scikit-learn library. This synthetic dataset was designed to evaluate the algorithm’s capability to correctly identify the optimal hyperplane separating the two distinct classes [26]. In the second experiment, we evaluated the algorithm’s classification performance using the classic Iris dataset [27]. In the third experiment, we employed the make_moons method from the scikit-learn library to generate a double semi-circle dataset, thereby further evaluating the algorithm’s classification capability on non-linearly separable data [26]. Finally, to systematically evaluate the comprehensive performance of the algorithm in handling multi-class classification problems, this study employs the standard wine dataset [28] for benchmark testing.

In all four experiments, we incorporated the proposed semi-inner product kernel, the classical polynomial kernel, the linear kernel, the sigmoid kernel, and the highly effective radial basis function (RBF) kernel as reproducing kernels in the support vector machine algorithm. Comparative experiments were conducted to validate the effectiveness of the semi-inner product kernel proposed in this study.
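A sketch of the comparison protocol described above, using scikit-learn's built-in kernels: hyperparameters are library defaults and the train/test split is an assumption, since the paper does not specify them, and the full multi-class Iris dataset is used here for brevity. The semi-inner-product kernel would be passed as a callable such as the sip_kernel sketched in Sect 4.

```python
from sklearn.datasets import make_circles, make_moons, load_iris, load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

datasets = {
    "circles": make_circles(n_samples=800, noise=0.05, factor=0.5, random_state=0),
    "moons": make_moons(n_samples=800, noise=0.1, random_state=0),
    "iris": load_iris(return_X_y=True),
    "wine": load_wine(return_X_y=True),
}

# Built-in kernels compared in the paper; the semi-inner-product kernel would be
# supplied as a callable (see the sip_kernel sketch above).
kernels = ["linear", "poly", "rbf", "sigmoid"]

for name, (X, y) in datasets.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    for kernel in kernels:
        clf = SVC(kernel=kernel).fit(X_tr, y_tr)
        print(name, kernel, round(clf.score(X_te, y_te), 3))
```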

5.2 Experiment 1: Concentric circles

In the concentric circle experiment, each circle represents one class of data, totaling two classes. We aim to find a hyperplane that can separate the points of the two circles. Since the SVM algorithm seeks a hyperplane that separates the two classes of data while maximizing the margin between them, the optimal separating hyperplane in this experiment is a circular hyperplane located midway between the two concentric rings. We expect the algorithm to produce a separating hyperplane that lies midway between the two concentric circles.

As shown in Fig 3, the points in the blue region belong to one class of data, while those in the yellow region belong to the other class. The interface formed between the two colored regions is the separating hyperplane obtained by the algorithm. We can clearly observe that the KSVM algorithm based on the semi-inner product reproducing kernel and the RBF kernel successfully classifies the two classes of data points. Both algorithms identified a circular separating surface located midway between the two classes, achieving nearly identical results. In contrast, KSVM algorithms employing both the polynomial kernel and the linear kernel yielded similar results, failing to achieve correct classification between the two data categories. Likewise, the KSVM model based on the sigmoid kernel also demonstrated an inability to effectively separate the two classes.

Fig 3. Results on the concentric circles dataset.

Algorithms utilizing linear, polynomial, and sigmoid kernels failed to achieve correct classification due to their inability to capture the radial distribution characteristics of the data. In contrast, both the semi-inner-product kernel and RBF kernel successfully generated accurate decision boundaries, achieving perfect separation of the concentric structures.

https://doi.org/10.1371/journal.pone.0340686.g003

5.3 Experiment 2: Iris dataset

In the classification experiment conducted on the Iris dataset, we classified two types of flowers. As shown in Fig 4, the blue points represent one class of data, while the yellow points represent the other class. The goal of the algorithm is to find the maximum-margin hyperplane that separates the two classes of data. The algorithms based on the five reproducing kernels each divided the data into two classes. The points in the blue region belong to one class, and those in the yellow region belong to the other class. The interface formed between the two colored regions is the separating hyperplane obtained by the algorithm.

Fig 4. Experimental results obtained on the iris dataset.

The polynomial kernel failed to yield satisfactory classification results. Although the semi-inner-product kernel, RBF kernel, sigmoid kernel, and linear kernel all achieved accurate data separation, the RBF kernel exhibited signs of overfitting.

https://doi.org/10.1371/journal.pone.0340686.g004

From the figure, we can observe that:

  • The KSVM algorithms based on the semi-inner-product reproducing kernel, RBF kernel, linear kernel, and sigmoid kernel all achieved perfect classification, successfully separating the two classes of data without any misclassified samples.
  • In contrast, the KSVM algorithm employing the polynomial reproducing kernel exhibited noticeable misclassification, resulting in lower classification accuracy compared to the other methods.

5.4 Experiment 3: Double semi-circle

In the experiment on the double semi-circle dataset, each semicircle represents one class of data, totaling two classes. As shown in Fig 5, the blue points belong to one class, while the yellow points belong to the other class. We aim to find a hyperplane that can separate the two classes of data. According to SVM theory, the optimal separating hyperplane in this case is a cubic curve located between the two classes of data points, and we expect the algorithm to produce this separating hyperplane. The algorithms based on the five reproducing kernels each divided the data into two classes. The points in the blue region belong to one class, and those in the yellow region belong to the other class. The interface formed between the two colored regions is the separating hyperplane obtained by the algorithm.

Fig 5. Experimental results obtained on the double semicircle dataset.

The polynomial kernel did not obtain good classification results, while the semi-inner product kernel and RBF kernel both obtained accurate results.

https://doi.org/10.1371/journal.pone.0340686.g005

We can observe that:

  • Polynomial and Linear Kernels: The separation hyperplanes generated by these kernels exhibit highly linear characteristics, resulting in insufficient model flexibility to capture nonlinear patterns in the data. Consequently, the classification performance is suboptimal, and the algorithm fails to effectively accomplish the given classification task.
  • Sigmoid Kernel: Although this kernel introduces nonlinear mapping, the constructed hyperplane still fails to form an effective inter-class separation boundary, leading to significant misclassification phenomena.
  • RBF and Semi-Inner-Product Reproducing Kernels: Despite minor localized classification errors, the hyperplanes constructed by these kernels effectively capture the essential distributional differences between the two classes, achieving high overall separation accuracy and successfully accomplishing the classification task.
  • The separation hyperplane generated by the semi-inner-product reproducing kernel algorithm exhibits a cubic curve morphology, which most closely approximates the theoretically optimal classification interface among all compared models, without exhibiting overfitting. The geometric characteristics of this hyperplane reveal two key insights: First, the cubic curve structure effectively balances fitting accuracy and generalization capability; Second, its curvature variations align closely with the data distribution density, demonstrating the adaptive advantages of this kernel function in feature space mapping.

5.5 Experiment 4: Wine dataset

In this experiment, the wine dataset was selected as the benchmark platform [28]. This dataset comprises 178 wine samples, each represented by a 13-dimensional vector of continuous chemical features, including key physicochemical indicators such as alcohol content, malic acid concentration, alkalinity of ash, and total phenolic compounds. All samples originate from three distinct grape cultivars, forming a standard three-class supervised learning problem, thereby providing an ideal data foundation for validating the discriminative performance of complex classification algorithms. We trained and evaluated the algorithm by calculating the Accuracy, Precision, Recall, and F1-score for each kernel function on the test set, with the results summarized in Table 1.

Table 1. Comparison of classification performance among different kernel functions on the wine dataset (red indicates optimal values).

https://doi.org/10.1371/journal.pone.0340686.t001
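The four metrics reported in Table 1 can be computed as in the sketch below; macro averaging and the 70/30 stratified split are assumptions, since the paper does not state the averaging scheme or the split used.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

y_pred = SVC(kernel="rbf").fit(X_tr, y_tr).predict(X_te)

# Macro averaging is assumed here; the paper does not specify the averaging scheme.
print("Accuracy :", accuracy_score(y_te, y_pred))
print("Precision:", precision_score(y_te, y_pred, average="macro"))
print("Recall   :", recall_score(y_te, y_pred, average="macro"))
print("F1-score :", f1_score(y_te, y_pred, average="macro"))
```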

We can observe that:

  • The RBF kernel achieved perfect scores of 1.00 across all four evaluation metrics, demonstrating its exceptional capability in modeling nonlinear feature mappings.
  • The semi-inner-product reproducing kernel also attained optimal results of 1.00 on all metrics, matching the performance of the RBF kernel exactly.
  • The polynomial and linear kernels exhibited comparable performance, both obtaining scores of 0.97 across all four metrics.
  • The sigmoid kernel showed marginally better overall performance (0.98), though it still fell short of the optimal level.

Overall, the experimental results confirm the critical role of kernel function selection in determining classification performance. It is particularly noteworthy that the semi-inner-product reproducing kernel achieved classification performance fully comparable to the extensively validated RBF kernel, demonstrating significant potential for further generalization and optimization.

The integrated results from the four experiments demonstrate that the separating hyperplanes constructed by KSVM algorithms using polynomial, linear, and sigmoid kernels exhibit significant limitations, resulting in generally suboptimal classification performance. In contrast, the semi-inner-product reproducing kernel and the RBF kernel achieve comparable classification accuracy in KSVM, with the semi-inner-product kernel demonstrating superior overfitting suppression on certain datasets. Notably, the semi-inner-product kernel effectively enhances nonlinear feature mapping capabilities through its reproducing properties, exhibiting strong generalization performance in complex boundary classification tasks. This study represents the first systematic experimental validation of the efficacy and application potential of the semi-inner-product reproducing kernel for nonlinear classification problems.

6 Conclusion and future work

In this study, we commence with a review of the conventional RKHS and its associated reproducing kernels. We define the necessary and sufficient conditions for the existence of a reproducing kernel and examine the construction methodologies of RKHS. Finally, we consolidate the core theorems and conditions relevant to the proof of RKHS, introduce the analogous definition of RKBS, and present the necessary and sufficient conditions for the existence of a reproducing kernel in RKBS, supported by a comprehensive proof process.

We address the existing issues in RKBS by integrating a semi-inner product into the space and augmenting it with two additional properties, ultimately establishing a semi-inner product RKBS that satisfies the required conditions. This demonstrates the originality of this study. Subsequently, we provide an example demonstrating the generation of a reproducing kernel and the construction process of each space. To facilitate the application of the semi-inner product reproducing kernel in machine learning algorithms, we introduce a discrete form of the reproducing kernel based on the example. Through systematic comparative experiments, we have not only validated the effectiveness of the semi-inner-product reproducing kernel in algorithmic design, but also demonstrated its significantly superior classification performance over Sigmoid, polynomial, and linear kernels, while achieving results comparable to those of the RBF kernel. These empirical findings collectively indicate that the proposed kernel function maintains theoretical innovativeness while possessing substantial value for practical application and dissemination.

However, there are still several unresolved issues in this work that require further improvement in future research. First, we need to provide theoretical proof based on RKBS for various algorithms. Second, this work did not systematically perform a large number of evaluations but only validated its effectiveness. Additionally, RKHS-based algorithms have a broad range of application scenarios. We need to identify application scenarios where RKBS-based algorithms offer advantages and explain why they outperform in these scenarios. These research issues will be addressed in future work.

We intend to leverage the semi-inner product reproducing kernel to enhance machine learning algorithms, such as kernel method-based algorithms, support vector machines, and principal component analysis, to develop a more comprehensive theory of machine learning algorithms grounded in the framework of reproducing kernel Banach space. Through a motivating example, we not only verified the effectiveness of the semi-inner-product reproducing kernel in the algorithm but also achieved superior results compared to the polynomial reproducing kernel. The results obtained from the KSVM using the semi-inner-product reproducing kernel are identical to those obtained from the KSVM with the RBF kernel, demonstrating the practical utility of the semi-inner-product reproducing kernel.

The kernel method originated in the early 20th century after Mercer’s theorem was proposed and Hilbert space was established. Since then, primary research in kernel methods has focused on algorithm development and application practices within the framework of RKHS. We hope this paper will attract machine learning researchers to the theoretical and algorithmic development in RKBS, enriching the study of kernel methods and marking a milestone in the history of kernel method research.

References

  1. Mercer J. Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society A. 1909;209:415–46.
  2. Bergman S. The kernel function and conformal mapping. Providence: American Mathematical Society; 1950.
  3. Aronszajn N. Theory of reproducing kernels. Trans Amer Math Soc. 1950;68(3):337–404.
  4. Cucker F, Smale S. On the mathematical foundations of learning. Bull Amer Math Soc. 2001;39(1):1–49.
  5. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97.
  6. Evgeniou T, Pontil M, Poggio T. Regularization networks and support vector machines. Advances in Computational Mathematics. 2000;13(1):1–50.
  7. Schölkopf B, Smola AJ. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press; 2001.
  8. Shawe-Taylor J, Cristianini N. Kernel methods for pattern analysis. Cambridge University Press; 2004.
  9. Qu L, Pei Y, Li J. A data analysis method using orthogonal transformation in a reproducing kernel Hilbert space. In: 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC). 2023. p. 887–92. https://doi.org/10.1109/smc53992.2023.10394417
  10. Schölkopf B, Smola A, Müller K-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation. 1998;10(5):1299–319.
  11. Schölkopf B, Smola A, Müller K-R. Kernel principal component analysis. In: International Conference on Artificial Neural Networks. Springer; 1997. p. 583–8.
  12. Parzen E. On estimation of a probability density function and mode. Ann Math Statist. 1962;33(3):1065–76.
  13. Mika S, Rätsch G, Weston J, Schölkopf B, Müller K-R. Fisher discriminant analysis with kernels. In: Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop. IEEE; 1999. p. 41–8.
  14. Fabian M, Habala P, Pelant J. Functional analysis and infinite-dimensional geometry. Springer; 2001.
  15. Tropp JA. Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans Inform Theory. 2006;52(3):1030–51.
  16. Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–92.
  17. Little RJA, Rubin DB. Statistical analysis with missing data. John Wiley & Sons; 1986.
  18. Hartley HO, Hocking RR. The analysis of incomplete data. Biometrics. 1971;27(4):783.
  19. Zhang H, Xu Y, Zhang J. Reproducing kernel Banach spaces for machine learning. Journal of Machine Learning Research. 2009;10:2741–75.
  20. Lumer G. Semi-inner-product spaces. Trans Amer Math Soc. 1961;100(1):29–43.
  21. Giles JR. Classes of semi-inner-product spaces. Trans Amer Math Soc. 1967;129(3):436–46.
  22. Conway JB. A course in functional analysis. New York: Springer-Verlag; 1990.
  23. Cudia DF. On the localization and directionalization of uniform convexity. Bull Amer Math Soc. 1963;69(2):265–7.
  24. Xu Y, Ye Q. Generalized Mercer kernels and reproducing kernel Banach spaces. American Mathematical Society; 2019.
  25. Song G, Zhang H. Reproducing kernel Banach spaces with the ℓ1 norm II: error analysis for regularized least square regression. Neural Computation. 2011;23(10):2713–29.
  26. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–30.
  27. Fisher RA. Iris. UCI Machine Learning Repository. 1936. https://archive.ics.uci.edu/ml/datasets/iris
  28. Aeberhard S, Forina M. Wine. UCI Machine Learning Repository. 1992.