
Investigating the dynamics and uncertainties in portfolio optimization using the Fourier–Mellin transform

  • Muhammad Hilal Alkhudaydi ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    mh.ayedh@tu.edu.sa; amharthi@tu.edu.sa

    Affiliation Department of Mathematics and Statistics, College of Science, Taif University, Taif City, Saudi Arabia

  • Aiedh Mrisi Alharthi

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Mathematics, Turabah University College, Taif University, Taif City, Saudi Arabia

Abstract

Many investors and financial managers view portfolio optimisation as a critical step in the management and selection processes. This is due to the fact that a portfolio fundamentally comprises a collection of uncertain securities, such as equities. For this reason, having a solid understanding of the elements responsible for these uncertainties is absolutely necessary. Investors will always look for a portfolio that can handle the required amount of risk while still producing the desired level of expected returns. This article uses feature-based models to investigate the primary elements that contribute to the optimal composition of a specific portfolio. These models make use of physical analyses, such as the Fourier transform, wavelet transforms and the Fourier–Mellin transform. Motivated by their use in medical analysis and detection, the purpose of this research was to analyse the efficacy of these methods in establishing the primary factors that go into optimising a particular portfolio. These geometric features are input into artificial neural networks, including convolutional and recurrent networks. These are then compared with other algorithms, such as vector autoregression, in portfolio optimisation tests. By testing these models on real-world data obtained from the US stock market, we were able to obtain preliminary findings on their utility.

Introduction

The overarching goal of this research is to identify the most effective strategy for portfolio management. To achieve this, we employed various methods for extracting information to identify the most effective components for portfolio optimisation.

Portfolio optimisation is a key component of contemporary finance, as it helps investors maximise profits while lowering risk. Due to rapid improvements in technology and the increasing complexity of financial markets, there is a growing need for sophisticated strategies to efficiently optimise portfolios [1]. This study seeks to provide a thorough understanding of cutting-edge approaches and their applications to portfolio optimisation using MATLAB. An outline of the study is first presented. Then, a section on the theory explores the guiding concepts of each of the seven topics covered.

The study goes on to compare the outcomes of three portfolio optimisation strategies: vector autoregression (VAR) (as a baseline), continuous wavelet transform (CWT) combined with a convolutional neural network (CNN), and the Fourier–Mellin transform (FMT) combined with a recurrent neural network (RNN). The purpose of the paper is to determine the best method for optimising portfolios under various market scenarios by contrasting the performance of several feature extraction methods based on deep learning.

The application of machine learning methods, including automated machine learning (AutoML), CNNs, and long short-term memory (LSTM) networks, has garnered considerable interest within the dynamic field of investment strategies. The use of these methodologies, which are frequently combined with traditional statistical approaches such as VAR, CWT, and the FMT, presents opportunities for improving investment decision-making and portfolio management.

Portfolio optimisation is the process of choosing the best feasible set of assets to accomplish a particular investment goal, while taking the investor’s risk tolerance into account [2]. Modern portfolio theory (MPT), which established the concepts of diversification and the efficient frontier, serves as the theoretical underpinning of portfolio optimisation [3]. To reduce risk, diversification includes spreading investments across a variety of assets, while the efficient frontier is the set of ideal portfolios that provide the best expected return for a given level of risk. The Black–Litterman model, downside risk measures, and resilient optimisation techniques are only a few of the new optimisation techniques and risk measures that have been introduced as a result of the continued development and extension of MPT by numerous studies [4].
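To make the mean-variance machinery concrete, the sketch below computes the closed-form minimum-variance portfolio under a full-investment constraint. It is a minimal illustration in Python/NumPy; the three assets, their expected returns, and the covariance matrix are invented for the example and are not taken from this study's data.

```python
import numpy as np

# Hypothetical inputs for three assets (illustrative values only).
mu = np.array([0.08, 0.12, 0.10])            # expected annual returns
Sigma = np.array([[0.04, 0.01, 0.00],        # covariance of returns
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])

# Minimum-variance weights subject to sum(w) = 1:
#   w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
ones = np.ones(len(mu))
w = np.linalg.solve(Sigma, ones)
w /= w @ ones

port_return = w @ mu                         # expected portfolio return
port_vol = np.sqrt(w @ Sigma @ w)            # portfolio standard deviation
print(w.round(3), round(port_vol, 4))
```

Any other point on the efficient frontier can be traced by adding a target-return constraint and re-solving.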

A study by Feng et al. conducted an in-depth examination of the contributions of different frequency components to image characteristics using weighted standard deviation (WSD) and weighted column standard deviation-based filtering to investigate image features [5]. The suggested methodology employs a weighted evaluation, which highlights specific aspects based on their importance and provides a numerical measure for quantitative analysis. Although WSD is computationally complex and requires careful interpretation, its ability to detect both high- and mid-frequency components allows for a comprehensive understanding of image quality [5]. The authors use wavelet spectral density to build a strong statistical model that accurately shows how different frequency components affect each other. This could make the model useful for processing and analysing images [5].

Although the work introduces advanced approaches for image registration and analysis using WSD methodologies, it also directly addresses the influence of noise as a key limitation, especially in short-exposure images. For rotational and scaling shifts, the authors agree that noise may reduce the accuracy of relative shift calculations. In this regard, the proposed FMT approach may not be as effective. Robust noise reduction methods must be used to address this issue and achieve improved outcomes from image processing.

The goal of portfolio optimisation is to provide investors with guidance for controlling their portfolios by determining the optimal allocations in different markets. Ali et al. analysed different portfolios by experimenting with changes in optimal weights for two eras – pre-COVID and during the COVID pandemic – in Asia Pacific [6]. They implemented various models as a core feature of their study, including dynamic conditional correlation (DCC), which is an extension of constant correlation estimation; multivariate and bivariate DCC-generalised autoregressive conditional heteroskedasticity (GARCH) models; the standard GARCH (1,1) model; and portfolio optimisation methods such as mean-variance (MV) and safe-haven dynamics, to provide protection during market conditions (a hedge) [6].

Additionally, the use of DCC-GARCH models was investigated to study green and non-green cryptocurrencies, diversification, risk management, and the impact of green assets on various equity portfolios by analysing risk–return dynamics [7]. This study evaluated the effectiveness of several portfolio optimisation techniques and weight constraints on assets to gain a deeper understanding of their practical implementations. By testing the safe-haven dynamics of specific cryptocurrencies, the study offers valuable insights into cross-market hedging and safe-haven opportunities for investors during periods of extremely low returns [7]. The assumptions used in the study, including GARCH modelling, which fails to capture certain market dynamics (such as jumps), may impact its robustness. Methods used to determine the optimal weights may also be significantly affected by the addition of factors such as liquidity limitations, investor preferences, and transaction costs, among others. Therefore, combining these studies with deep learning methods is important to investigate the key factors behind portfolio optimisation dynamics. Deep learning approaches have the power to handle various complexities, such as non-linear patterns in data, which is missing in traditional statistical models such as GARCH. Another advantage to deep learning methods is that they learn from data automatically, in contrast to the GARCH model, which depends more on manual feature engineering [8].

Artificial neural networks with many layers are the focus of the machine learning field known as deep learning. These networks can recognise subtle correlations and patterns in data [4], making them appropriate for a range of financial applications including portfolio optimisation [9]. Large datasets can be used to train deep learning models, enabling them to recognise complex patterns in financial data that may be challenging to find using more conventional techniques [10]. This may result in more accurate portfolio allocation decisions and better asset return estimates. Feedforward neural networks, CNNs, RNNs, and autoencoders are a few of the common deep learning architectures used in finance.

Specialised neural network architectures such as CNNs and RNNs are designed for particular kinds of data. CNNs have been used to evaluate financial time series data for a variety of tasks, including predicting stock prices and spotting market trends [11]. CNNs are particularly well suited to processing grid-like data, such as images or time series. The temporal dependencies in financial time series data have been modelled using RNNs, which are built to process sequential data. Due to their capacity to recognise intricate patterns and correlations in financial data, both CNNs and RNNs have demonstrated promise in financial applications including portfolio optimisation. Popular RNN variants have been developed to address the vanishing gradient problem (which can occur in conventional RNNs during training), including LSTM and gated recurrent unit (GRU) networks.

By projecting high-dimensional data onto a lower-dimensional space while retaining as much of the data’s variance as possible, dimensionality reduction techniques seek to reduce the complexity of huge datasets. One common linear transformation method used to reduce dimensionality is principal component analysis (PCA) [12]. PCA can help uncover important characteristics influencing asset returns in the context of portfolio optimisation and help minimise the dimensionality of the optimisation issue [13]. This can in turn reduce the danger of overfitting and increase the computational efficiency of portfolio optimisation algorithms. Other dimensionality reduction methods have also been investigated for a variety of applications within finance, including t-distributed stochastic neighbour embedding and independent component analysis.

Multivariate extensions of autoregressive models known as VAR(p) models are capable of capturing the dynamic interactions between several financial variables. Important insights for portfolio management have been made by using these models to simulate the coupled dynamics of asset returns, interest rates, and other macroeconomic variables [14]. VAR(p) models can assist in increasing the accuracy of portfolio optimisation algorithms and improve investment decisions by incorporating data on many financial factors [15]. To address key difficulties in financial time series analysis, such as cointegration and model uncertainty, extensions of VAR models have been developed, such as the vector error correction model and the Bayesian vector autoregressive model.

A time series can be divided into multiple frequency components using a wavelet transform, which enables researchers to examine the data at various time scales [10]. This method has been used to capture both short- and long-term patterns in asset returns in financial time series data [15]. Wavelet transforms can facilitate increased precision in portfolio optimisation algorithms and result in better investment decisions by combining data from various time scales [16]. Numerous financial applications, including volatility forecasting, risk management, and market efficiency analysis, have used wavelet-based techniques.

The FMT combines the Fourier transform and polar coordinates to examine a signal’s frequency and scale content [17]. By using this combination, the FMT can examine the frequency and scale content of financial data, resulting in a more thorough understanding of the data [18]. The FMT has been used by researchers in a number of image processing applications, including classification problems, to showcase its ability to process and analyse data using deep learning methods [5]. The FMT can assist in increasing the accuracy of portfolio optimisation algorithms, resulting in better investment decisions by using data from both the frequency and scale domains.

This study introduces feature-based models that contribute to the field of portfolio optimisation in the following ways:

  1. Feature extraction is considered a vital component of data analysis, with financial data being no exception. Investigating the underlying uncertainty of portfolio dynamics is a difficult task when relying only on historical prices or returns of assets, necessitating the use of a model able to learn from the data, such as a deep learning model.
  2. The FMT has many applications in image processing, including registration and rotation. This works by extracting the important geometric features using the Fourier transform along with polar coordinates. In addition, the wavelet transform is a beneficial tool in financial time series analysis, because it can handle non-stationary financial data, whereas the Fourier transform only deals with stationary cases.
  3. Integrating deep learning with the wavelet transform and FMT can lead to state-of-the-art feature-based deep neural network models capable of capturing the important elements of a financial portfolio. Our goal is not only to assess our models’ predictability but also to evaluate their ability to investigate the core data behind the portfolio dynamics.

This paper is divided into four sections. Section Mathematical explanation of tools used in this paper explains the mathematical methods necessary to build feature-based model blocks. These include VAR, PCA, CWT, RNNs, CNNs and the FMT. Section System design focuses on the design of the three main feature-based models, from data selection to their structure: the VAR automated machine learning investment strategy model (VAR(1)-AutoML), the CWT investment strategy model (CWT-CNN), and the FMT recurrent neural network model (FM-LSTM). Section Results evaluates the models’ performance, beginning with statistical analysis of the probability measures. Section Conclusions and future work discusses the conclusions of the study, highlighting its strengths and limitations.

Mathematical explanation of tools used in this paper

The goal of this section is to describe the key mathematical concepts employed in this study.

Vector autoregression

VAR is a statistical model widely used in econometrics and applied statistics. It is employed to analyse the linear relationships between several time series variables [19]. The VAR technique enables the modelling of each variable in a multivariate time series by considering its past values as well as the past values of other variables [19, 20].

Consider a multivariate time series $\{y_t\}$, where each $y_t$ is a $k \times 1$ vector of $k$ endogenous variables at time $t$:

$$y_t = (y_{1,t}, y_{2,t}, \ldots, y_{k,t})' \tag{1}$$

A VAR model of order $p$ (VAR($p$)) can be represented as:

$$y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + \varepsilon_t \tag{2}$$

where:

  • $c$ is a $k \times 1$ vector of constants (intercepts),
  • $A_i$ (for $i = 1, \ldots, p$) are $k \times k$ coefficient matrices, and
  • $\varepsilon_t$ is a $k \times 1$ vector of error terms.

The error terms are considered to have the following additional characteristics:

  • The expected value of $\varepsilon_t$ is zero ($\mathrm{E}[\varepsilon_t] = 0$), indicating it has a mean of zero.
  • The expected value of $\varepsilon_t \varepsilon_t'$ is $\Omega$ ($\mathrm{E}[\varepsilon_t \varepsilon_t'] = \Omega$), where $\Omega$ is a positive definite matrix of size $k \times k$ that represents the covariance matrix of the error terms.
  • The expected value of $\varepsilon_t \varepsilon_{t-s}'$ is zero ($\mathrm{E}[\varepsilon_t \varepsilon_{t-s}'] = 0$) for $s \neq 0$, ensuring that the errors are not correlated with each other over time [19–21].

VAR models are used for predicting the behaviour of a set of variables, performing impulse response analysis to investigate the impact of shocks, and conducting variance decomposition to analyse the individual impact of each variable on the forecast errors [22].

The VAR(1) model takes the form:

$$y_t = c + A_1 y_{t-1} + \varepsilon_t \tag{3}$$

where $c$ is a vector of constants, $A_1$ is a matrix of coefficients, and $\varepsilon_t$ is a vector of error terms [19–21].

Based on the assumptions made by the model, the error terms are assumed to meet certain criteria:

  • Zero mean: $\mathrm{E}[\varepsilon_t] = 0$.
  • Constant covariance: $\mathrm{E}[\varepsilon_t \varepsilon_t'] = \Omega$.
  • No serial correlation: $\mathrm{E}[\varepsilon_t \varepsilon_{t-s}'] = 0$ for all $s \neq 0$.

Ordinary least squares (OLS) regression can be used to estimate the parameters $c$ and $A_1$ [23]. To obtain the estimates, we regress each variable on the lagged values of all the variables.

For the VAR(1) model to be stable, each eigenvalue of $A_1$ must have a modulus smaller than one. The way the system reacts to shocks is governed by the characteristic equation obtained from $\det(A_1 - \lambda I) = 0$.

One way to observe the long-term effects of a shock to a single variable in a VAR(1) model is through the impulse response functions. The responses are obtained by raising $A_1$ to the appropriate power.
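Both the stability condition and the impulse responses can be checked numerically. The sketch below uses a hypothetical 2×2 coefficient matrix (not estimated from the paper's data) to verify that all eigenvalues lie inside the unit circle and to compute the response a few steps after a shock:

```python
import numpy as np

# Hypothetical VAR(1) coefficient matrix for two series (illustrative).
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])

# Stability: every eigenvalue of A1 must have modulus < 1.
eigvals = np.linalg.eigvals(A1)
stable = bool(np.all(np.abs(eigvals) < 1))

def impulse_response(A, horizon):
    """Response h steps after a unit shock: the columns of A**h."""
    return np.linalg.matrix_power(A, horizon)

print(stable)                     # True for this A1
print(impulse_response(A1, 3))    # responses three steps after the shock
```

For a stable matrix the responses decay towards zero as the horizon grows, which is the long-run behaviour described above.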

VAR models are widely employed in econometrics and financial econometrics to examine the dynamic connections among multiple time series. In financial markets, they are commonly used to analyse the log returns of assets, with the aim of understanding their relationships and predicting future movements [19–21].

Logarithmic returns are commonly employed in finance because of their advantageous characteristics, including time additivity and improved management of the multiplicative nature of asset values. The log return of asset i from time t − 1 to t for a given collection of assets is calculated using the formula:

$$r_{i,t} = \ln\!\left(\frac{P_{i,t}}{P_{i,t-1}}\right) \tag{4}$$

Here, $P_{i,t}$ denotes the price of asset $i$ at time $t$.
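Equation (4) and its time-additivity property can be verified in a few lines; the prices below are made-up daily closes, not data from the study:

```python
import numpy as np

# Illustrative daily closing prices for a single asset.
P = np.array([100.0, 101.5, 100.8, 102.3])

# Equation (4): r_t = ln(P_t / P_{t-1}).
r = np.log(P[1:] / P[:-1])

# Time additivity: daily log returns sum to the full-period log return.
print(np.isclose(r.sum(), np.log(P[-1] / P[0])))  # True
```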

Given a set of n assets, the VAR model represents the log returns of these assets as:

$$r_t = c + A_1 r_{t-1} + A_2 r_{t-2} + \cdots + A_p r_{t-p} + \varepsilon_t \tag{5}$$

where $r_t$ is the vector of log returns at time $t$, $c$ is the vector of intercepts, $A_1, \ldots, A_p$ are the coefficient matrices, and $\varepsilon_t$ is the vector of error terms, assumed to be white noise with covariance matrix $\Omega$.

The OLS method is commonly used to estimate the parameters of the VAR model. The model also enables the examination of impulse response functions. Forecast error variance decomposition is a method used to analyse how shocks to individual assets contribute to the forecast error variances of all assets [19–21].
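The OLS estimation step can be sketched directly with NumPy: stacking a constant and the lagged returns into a regressor matrix recovers $c$ and $A_1$ equation by equation. The example below fits a VAR(1) to simulated (not real) bivariate returns, so the estimates can be checked against the known generating matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a bivariate VAR(1): r_t = c + A1 @ r_{t-1} + eps_t.
c_true = np.array([0.001, 0.002])
A_true = np.array([[0.4, 0.1],
                   [0.0, 0.3]])
T = 5000
r = np.zeros((T, 2))
for t in range(1, T):
    r[t] = c_true + A_true @ r[t - 1] + 0.01 * rng.standard_normal(2)

# OLS: regress r_t on [1, r_{t-1}]; each column of B holds one equation.
Y = r[1:]                                      # (T-1, 2) responses
X = np.hstack([np.ones((T - 1, 1)), r[:-1]])   # (T-1, 3) regressors
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
c_hat, A_hat = B[0], B[1:].T                   # intercepts and A1 estimate
print(np.round(A_hat, 2))
```

With enough observations the estimated matrix is close to the one used to generate the data, which is the consistency property the OLS approach relies on.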

Principal component analysis

PCA is a statistical method commonly used for dimensionality reduction in a variety of domains, including financial mathematics. In the context of portfolio optimisation, PCA can be of great assistance in the process of extracting dominating financial ratios from a wide variety of economic data. These ratios can then be used to design an optimal investment portfolio [24]. The power of PCA in the field of financial mathematics stems from its capacity to transform complex multidimensional data into information easier to manage without compromising essential information. This, in turn, enables better informed investment decisions. It is interesting to note that although PCA has traditionally been linked to the maximisation of variance, greater variance does not always signal more important information for portfolio optimisation [24].

PCA is a prominent statistical methodology that employs an orthogonal transformation to transform a set of correlated observations of multiple variables into a set of principal components, which are linearly uncorrelated values. This procedure is frequently used to decrease the number of dimensions in datasets, improving comprehensibility while minimising the loss of information [25].

Prior to PCA, a dataset is commonly denoted as an $n \times p$ matrix $\mathbf{X}$. Each row of this matrix represents a distinct replication of the experiment, while each column represents a specific type of feature, such as data collected from a particular sensor [26]. To achieve precise outcomes, the dataset undergoes preprocessing to ensure that the empirical mean of each column is zero, meaning that the sample mean of each feature has been shifted to zero.

The transformation in PCA is determined by a collection of $l$ $p$-dimensional weight vectors (or coefficients) $\mathbf{w}_{(k)} = (w_1, \ldots, w_p)_{(k)}$. The weight vectors map each row vector $\mathbf{x}_{(i)}$ of $\mathbf{X}$ to a new vector of principal component scores $\mathbf{t}_{(i)} = (t_1, \ldots, t_l)_{(i)}$, which are calculated as follows:

$$t_{k(i)} = \mathbf{x}_{(i)} \cdot \mathbf{w}_{(k)}, \qquad i = 1, \ldots, n, \quad k = 1, \ldots, l \tag{6}$$

The scores are calculated in a way that maximises the variance they inherit from $\mathbf{X}$, while ensuring that each weight vector $\mathbf{w}_{(k)}$ is a unit vector.

To maximise variance, the first weight vector $\mathbf{w}_{(1)}$ is obtained by solving the following optimisation problem:

$$\mathbf{w}_{(1)} = \arg\max_{\|\mathbf{w}\| = 1} \sum_i \left(\mathbf{x}_{(i)} \cdot \mathbf{w}\right)^2 \tag{7}$$

Alternatively, this can be represented in matrix form as:

$$\mathbf{w}_{(1)} = \arg\max_{\|\mathbf{w}\| = 1} \|\mathbf{X}\mathbf{w}\|^2 = \arg\max_{\|\mathbf{w}\| = 1} \mathbf{w}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{X} \mathbf{w} \tag{8}$$

The first principal component score for a data vector $\mathbf{x}_{(i)}$ is determined by the dot product of $\mathbf{x}_{(i)}$ and $\mathbf{w}_{(1)}$, resulting in $t_{1(i)} = \mathbf{x}_{(i)} \cdot \mathbf{w}_{(1)}$.

The $k$-th principal component can be obtained by first removing the influence of the previous $k-1$ components. This modified data matrix, denoted as $\hat{\mathbf{X}}_k$, is written as:

$$\hat{\mathbf{X}}_k = \mathbf{X} - \sum_{s=1}^{k-1} \mathbf{X} \mathbf{w}_{(s)} \mathbf{w}_{(s)}^{\mathsf{T}} \tag{9}$$

Here, the $\mathbf{w}_{(s)}$ are the weight vectors of the first $k-1$ principal components.

The weight vector for the $k$-th principal component, $\mathbf{w}_{(k)}$, is then determined by solving the following optimisation problem:

$$\mathbf{w}_{(k)} = \arg\max_{\|\mathbf{w}\| = 1} \|\hat{\mathbf{X}}_k \mathbf{w}\|^2 = \arg\max \frac{\mathbf{w}^{\mathsf{T}} \hat{\mathbf{X}}_k^{\mathsf{T}} \hat{\mathbf{X}}_k \mathbf{w}}{\mathbf{w}^{\mathsf{T}} \mathbf{w}} \tag{10}$$

Solving this maximisation problem yields the eigenvector of $\hat{\mathbf{X}}_k^{\mathsf{T}} \hat{\mathbf{X}}_k$ associated with its largest eigenvalue, which gives the direction of maximum remaining variance.

The $k$-th principal component score for a data vector $\mathbf{x}_{(i)}$ can be evaluated as:

$$t_{k(i)} = \mathbf{x}_{(i)} \cdot \mathbf{w}_{(k)} \tag{11}$$

Here, $\mathbf{w}_{(k)}$ represents the $k$-th eigenvector of $\mathbf{X}^{\mathsf{T}} \mathbf{X}$.

The complete principal component decomposition of $\mathbf{X}$ can be written as:

$$\mathbf{T} = \mathbf{X} \mathbf{W} \tag{12}$$

In this context, the matrix of weights, denoted as $\mathbf{W}$, consists of columns that are the eigenvectors of $\mathbf{X}^{\mathsf{T}} \mathbf{X}$. These columns, when scaled by the square roots of the corresponding eigenvalues, are referred to as loadings.

The covariance matrix, denoted by $\mathbf{Q}$, is proportional to $\mathbf{X}^{\mathsf{T}} \mathbf{X}$; specifically, $\mathbf{Q} \propto \mathbf{X}^{\mathsf{T}} \mathbf{X} = \mathbf{W} \mathbf{\Lambda} \mathbf{W}^{\mathsf{T}}$. The covariance between different principal components can be expressed as follows [26]:

$$\mathbf{W}^{\mathsf{T}} \mathbf{Q} \mathbf{W} \propto \mathbf{W}^{\mathsf{T}} \mathbf{W} \mathbf{\Lambda} \mathbf{W}^{\mathsf{T}} \mathbf{W} = \mathbf{\Lambda} \tag{13}$$

The eigenvalues are represented by the diagonal matrix $\mathbf{\Lambda}$. The eigenvectors corresponding to distinct eigenvalues are orthogonal, which ensures there is no covariance between different principal components.

The matrix transformation that diagonalises the empirical sample covariance matrix is often referred to as the whitening or sphering transformation. In formal terms, it is represented as follows:

$$\mathbf{T} = \mathbf{X} \mathbf{W} \mathbf{\Lambda}^{-1/2} \tag{14}$$

Here, $\mathbf{\Lambda}$ represents the diagonal matrix of eigenvalues.

PCA is an effective method for transforming original variables into a new coordinate system. In this new system, the axes are the directions of maximum variance, which are orthogonal to each other and ranked according to the variance they explain. This transformation is essential for applications in data reduction, noise reduction, and exploratory data analysis.

PCA transforms a data vector from the original space defined by $p$ variables into a new space of $p$ uncorrelated variables. This transformation is defined by the matrix equation:

$$\mathbf{T} = \mathbf{X} \mathbf{W} \tag{15}$$

This equation represents the relationship between the original data matrix $\mathbf{X}$ and the matrix of eigenvectors $\mathbf{W}$, where the columns of $\mathbf{W}$ are the eigenvectors of the covariance matrix of $\mathbf{X}$.

However, it is not mandatory to preserve all principal components. Preserving only the first $L$ principal components, which correspond to the first $L$ eigenvectors, leads to a truncated transformation:

$$\mathbf{T}_L = \mathbf{X} \mathbf{W}_L \tag{16}$$

where $\mathbf{T}_L$ contains $n$ rows but only $L$ columns, effectively reducing the dimensionality of the data. The rank-$L$ reconstruction yielded by PCA then takes the form:

$$\hat{\mathbf{X}} = \mathbf{T}_L \mathbf{W}_L^{\mathsf{T}} = \mathbf{X} \mathbf{W}_L \mathbf{W}_L^{\mathsf{T}} \tag{17}$$

Here, $\mathbf{W}_L$ is a $p \times L$ matrix whose columns form an orthogonal basis for the $L$ retained features [27].

The purpose of this transformation is to choose the truncated score matrix $\mathbf{T}_L$ so that it maximises the amount of variance from the original data that is preserved and minimises the total squared reconstruction error. This can be quantified by the following expression:

$$\left\| \mathbf{T} \mathbf{W}^{\mathsf{T}} - \mathbf{T}_L \mathbf{W}_L^{\mathsf{T}} \right\|_2^2 \tag{18}$$

This error measure is the difference between the data reconstructed from the full transformation and that reconstructed from the truncated transformation, and its small size highlights the effectiveness of PCA in identifying the most important aspects of the data.

By using PCA, key features of the dataset can be effectively preserved in a smaller number of dimensions. This makes analysis and visualisation less complex, while preserving essential information on the variability of the data.
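The whole pipeline above (centring, eigendecomposition of $\mathbf{X}^{\mathsf{T}}\mathbf{X}$, truncation to $L$ components) fits in a short NumPy sketch. The data here are synthetic, generated only for illustration, and the final check uses the identity that the squared reconstruction error equals the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: n = 200 observations of p = 4 correlated features.
M = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.0],
              [0.0, 0.3, 0.5, 0.0],
              [0.0, 0.0, 0.2, 0.1]])
X = rng.standard_normal((200, 4)) @ M
X = X - X.mean(axis=0)              # zero empirical mean per column

# Eigendecomposition of X'X gives the weight vectors W (Eq. 12).
eigvals, W = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]   # sort by descending variance
eigvals, W = eigvals[order], W[:, order]

L = 2
T_L = X @ W[:, :L]                  # truncated scores (Eq. 16)
X_rec = T_L @ W[:, :L].T            # rank-L reconstruction (Eq. 17)
err = np.linalg.norm(X - X_rec) ** 2
explained = eigvals[:L].sum() / eigvals.sum()
print(round(float(explained), 3))   # fraction of variance retained
```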

Continuous wavelet transform

The CWT can be used to analyse non-stationary signals. This is accomplished by splitting the signals into wavelets, which are functions that are both time- and frequency-localised. Unlike Fourier transforms, the CWT provides a representation of time and frequency, making it particularly well suited for signals whose frequency content is subject to changes over time.

The CWT of a function $x(t)$ is represented as [28]:

$$X_w(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \overline{\psi}\!\left(\frac{t - b}{a}\right) dt \tag{19}$$

The function $\psi$ is the mother wavelet, where $a$ and $b$ represent the scale and translation parameters, respectively, and $\overline{\psi}$ signifies the complex conjugate of $\psi$ [28, 29].

To meet the admissibility requirement, the mother wavelet must integrate to zero (zero mean) and have finite energy, which makes it act as a band-pass filter [28, 29].

The signal processing community uses CWT for a variety of tasks, including feature extraction, noise reduction, and time-frequency analysis.
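A discretised version of Eq. (19) is straightforward to implement: each row of the coefficient matrix is the correlation of the signal with a scaled, shifted copy of the mother wavelet. The sketch below uses a complex Morlet wavelet and a synthetic signal whose frequency jumps halfway through; all parameter values are illustrative.

```python
import numpy as np

def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (approximately admissible for w0 >= 5)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t - t ** 2 / 2)

def cwt(x, scales, dt=1.0):
    """Discretised Eq. (19): correlate x with scaled, shifted wavelets."""
    n = len(x)
    t = (np.arange(n) - n // 2) * dt           # grid centred at zero
    out = np.empty((len(scales), n), dtype=complex)
    for i, a in enumerate(scales):
        psi = morlet(t / a).conj() / np.sqrt(abs(a))
        out[i] = np.convolve(x, psi[::-1], mode="same") * dt
    return out

# Non-stationary test signal: 2 Hz for the first half, 8 Hz for the second.
dt = 0.01
t = np.arange(1024) * dt
x = np.where(t < 5.12, np.sin(2 * np.pi * 2 * t), np.sin(2 * np.pi * 8 * t))
scales = np.linspace(0.05, 2.0, 60)
C = cwt(x, scales, dt)
print(C.shape)  # (60, 1024)
```

The low-frequency half of the signal concentrates its energy at larger scales than the high-frequency half, which is exactly the localised, scale-dependent structure the feature-based models draw on.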

Recurrent neural networks

An RNN is an artificial neural network in which the connections between nodes follow a temporal sequence, allowing the network to display temporally dynamic behaviour. RNNs differ from feedforward neural networks in that they can process input sequences using their internal state (memory). This opens up a wide range of applications, such as speech recognition and unsegmented, connected handwriting recognition.

The fundamental building blocks of RNNs are nodes, also known as neurons. At each time step, nodes modify their activation and transmit this activation to the subsequent time step [30, 31].

According to [32], this can be expressed mathematically as follows. The input at each time step $t$ is $x_t$, an element of the input sequence. The RNN computes the hidden vector sequence $(h_1, \ldots, h_T)$ and, if necessary, the output vector sequence $(y_1, \ldots, y_T)$ from $t = 1$ to $T$:

$$h_t = \mathcal{H}\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right) \tag{20}$$
$$y_t = W_{hy} h_t + b_y \tag{21}$$

Here, the hidden state at time $t$ is represented by $h_t$. The activation function $\mathcal{H}$ is usually a non-linear function like tanh or ReLU, and the weight matrices (parameters) for the input-to-hidden, hidden-to-hidden, and hidden-to-output connections are $W_{xh}$, $W_{hh}$, and $W_{hy}$, respectively.

The bias vectors are denoted as $b_h$ and $b_y$, whereas the output at time $t$ is represented by $y_t$.
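Equations (20) and (21) amount to a short loop over time. The sketch below is a forward pass only (no training), with small random weights as stand-in parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

def rnn_forward(x_seq, Wxh, Whh, Why, bh, by):
    """Vanilla RNN, Eqs. (20)-(21): h_t = tanh(Wxh x_t + Whh h_{t-1} + bh),
    y_t = Why h_t + by."""
    h = np.zeros(Whh.shape[0])
    hs, ys = [], []
    for x_t in x_seq:
        h = np.tanh(Wxh @ x_t + Whh @ h + bh)
        hs.append(h)
        ys.append(Why @ h + by)
    return np.array(hs), np.array(ys)

n_in, n_hid, n_out, T = 3, 5, 2, 10
params = [rng.standard_normal(s) * 0.1 for s in
          [(n_hid, n_in), (n_hid, n_hid), (n_out, n_hid), (n_hid,), (n_out,)]]
x_seq = rng.standard_normal((T, n_in))
hs, ys = rnn_forward(x_seq, *params)
print(hs.shape, ys.shape)  # (10, 5) (10, 2)
```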

When the gradients propagated back through many time steps become vanishingly small relative to the parameters, the vanishing gradient problem arises, making the network incapable of learning long-range correlations in the input data. This is one of the main obstacles to training RNNs. According to [30], two common RNN variations that aim to address this problem are LSTM units and GRUs.

LSTM networks employ gating mechanisms to control the flow of information, and they maintain a separate cell state in addition to the hidden state. Each unit contains a set of gates: a forget gate determines which pieces of information to remove from the cell state, an input gate determines which cell state values should be updated, and an output gate decides at each step which components of the cell state should be output (see Fig 1).

Fig 1. An LSTM model diagram by J. Leon, Beerware.

The diagram shows the architecture of an LSTM model.

https://doi.org/10.1371/journal.pone.0321204.g001

The mathematical expressions for these gates are as follows:

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{22}$$
$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{23}$$
$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{24}$$
$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \tag{25}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{26}$$
$$h_t = o_t \odot \tanh(c_t) \tag{27}$$

Here, $\sigma$ is the logistic sigmoid, $\odot$ denotes element-wise multiplication, $f_t$, $i_t$ and $o_t$ are the forget, input and output gates, $\tilde{c}_t$ is the candidate cell state, $c_t$ is the cell state, and $h_t$ is the hidden state.
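The gate equations translate directly into code. The sketch below implements a single LSTM step in NumPy with randomly initialised stand-in parameters (forward pass only, no training), then runs it over a short sequence:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step, Eqs. (22)-(27). W, U, b hold the parameters of the
    forget (f), input (i), output (o) and candidate (c) transformations."""
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])        # forget gate
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])        # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])        # output gate
    c_tilde = np.tanh(W["c"] @ x + U["c"] @ h_prev + b["c"])  # candidate
    c = f * c_prev + i * c_tilde                              # cell state
    h = o * np.tanh(c)                                        # hidden state
    return h, c

rng = np.random.default_rng(3)
n_in, n_hid = 4, 6
W = {k: 0.1 * rng.standard_normal((n_hid, n_in)) for k in "fioc"}
U = {k: 0.1 * rng.standard_normal((n_hid, n_hid)) for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.standard_normal((8, n_in)):   # run an 8-step sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)  # (6,) (6,)
```

Because the cell state is updated additively (Eq. 26), gradients can flow through many steps, which is how the LSTM mitigates the vanishing gradient problem.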

Convolutional neural networks

Combining concepts from deep learning with signal processing allows for a description of the architecture and functioning of CNNs using wavelet transforms. This is especially important when dealing with financial time series data, such as log returns. When analysing non-stationary financial time series data, in which features such as volatility might fluctuate over time, wavelet transforms offer a highly beneficial time-frequency analysis tool (see Fig 2).

Fig 2. A CNN architecture illustration [36].

The diagram shows the architecture of a CNN model.

https://doi.org/10.1371/journal.pone.0321204.g002

This article uses CNNs in conjunction with the CWT to analyse financial time series data, with a particular emphasis on log returns. Improved CNN feature extraction for dynamic financial data is achieved using the fine-grained time-frequency localisation features of CWTs.

Financial instruments’ log returns are notoriously difficult to analyse due to their non-linear behaviour and volatility. The CWT is one effective method for breaking down these non-stationary signals into time-frequency space and improving the capacity to detect localised scale-dependent patterns.

A time series can be transformed into a two-dimensional signal representation in the time and frequency domains using the CWT [33, 34]. The definition of the transform is:

$$X_w(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \overline{\psi}\!\left(\frac{t - b}{a}\right) dt \tag{28}$$

where $a$ and $b$ are the scale and translation parameters, respectively, and $\psi$ is the mother wavelet function.

The CNN receives its input from the two-dimensional CWT coefficient matrix, which encodes signal information across both time and frequency.

To extract features from the CWT coefficients, the convolutional layers apply filters whose activations are then downsampled by pooling layers via the equation:

$$a_i^{(l+1)} = \max_{j \in R_i} a_j^{(l)} \tag{29}$$

Each filter picks up on unique patterns, such as spikes in frequency around certain dates, which may be indicative of major financial events [35]. Here, $R_i$ represents the region (or set of indices) over which the pooling is performed for the $i$-th unit of layer $l + 1$.

The deep neural network is completed by fully connected layers that combine the learned features to make predictions or classify patterns.

During training, the goal is to minimise a loss function suitable for the task at hand, such as mean squared error for regression or cross-entropy for classification. This is achieved by employing methods such as stochastic gradient descent.

Combining CWTs with CNNs provides an effective method for analysing complex and dynamic financial time series. This strategy improves the accuracy of predictions and allows for improved extraction of data by collecting specific local characteristics at different levels of detail [34].
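The convolution and pooling operations described above can be sketched in plain NumPy, treating a (here randomly generated, stand-in) CWT magnitude matrix as the two-dimensional input. The 2×2 filter below is an illustrative choice that responds to abrupt changes along the time axis:

```python
import numpy as np

def conv2d_valid(A, K):
    """2-D 'valid' convolution (cross-correlation, as used in CNNs)."""
    kh, kw = K.shape
    H, W = A.shape[0] - kh + 1, A.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(A[i:i + kh, j:j + kw] * K)
    return out

def maxpool2d(A, size=2):
    """Eq. (29)-style pooling: each output is the max over its region R_i."""
    H, W = A.shape[0] // size, A.shape[1] // size
    return A[:H * size, :W * size].reshape(H, size, W, size).max(axis=(1, 3))

# Treat a CWT coefficient magnitude matrix as the 2-D input "image".
rng = np.random.default_rng(4)
cwt_mag = rng.random((16, 32))
edge_filter = np.array([[1.0, -1.0],
                        [1.0, -1.0]])   # responds to changes along time
features = maxpool2d(np.maximum(conv2d_valid(cwt_mag, edge_filter), 0.0))
print(features.shape)  # (7, 15)
```

In a trained CNN the filter weights are learned rather than fixed; the sketch only shows how the layer operations compose.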

Fourier–Mellin transform

This section provides an examination of the image-matching methodology (in this context, by image, we mean the wavelet transform of the lagged log-returns time series of the asset), showing its limitations and offering the symmetric phase-only FMT (SPOFM) as a potential alternative approach. Two phases are involved in SPOFM: calculating the Fourier–Mellin invariant (FMI) descriptors and matching the FMI descriptors of two-dimensional images. Some of the notable advantages of SPOFM are its robustness against noise, accuracy in numerical computations, and effectiveness in data selection. Nevertheless, there are several limitations associated with its use in scenarios involving image rotation and scaling. A proposed alternative approach involves employing a circular harmonic expansion technique, which introduces the additional task of identifying the common centre of rotation. The combination of the FMI and SPOFM methodologies capitalises on their respective strengths and is experiencing growing popularity. This section also examines the role of the FMT in the comparison of image sizes, emphasising its wide peak as a notable limitation in determining the location and identification of object elements.

The primary aim of image comparison is to recognise the presence of an image within a noisy background, modelled in the form:

s(x, y) = r(x − x0, y − y0) + n(x, y)  (30)

The term on the left-hand side is the noisy scene function s(x, y). The first function on the right-hand side, r(x − x0, y − y0), is the image function translated by the offset (x0, y0), while the remaining function n(x, y) is zero-mean noise, which is uncorrelated with the image signal r(x, y). The following equation can be used to determine the geometric translation measures between the two images:

(31)

Here, the and components, as well as the geometric components , represent the computational functions associated with the geometric translation regarding the variable p. For illustrative purposes, consider a two-dimensional scenario where the translation offset is denoted as .

(32)

Here, is the Fourier transform function, and:

(33)

The transfer function H(u, v) is chosen to maximise detection of the signal while simultaneously minimising the noise. Mathematically, it can be expressed as:

H(u, v) = R*(u, v) / Φn(u, v)  (34)

Here, R*(u, v) is the complex conjugate of the Fourier spectrum R(u, v), and Φn(u, v) denotes the noise power-spectral density.

If the noise spectrum exhibits a uniform (white) distribution, with intensity denoted as nw, then the transfer function reduces to:

H(u, v) = R*(u, v) / nw  (35)

The output of the filter is given by the convolution of the scene s(x, y) with the filter impulse response h(x, y), the inverse Fourier transform of H(u, v):

e(x, y) = s(x, y) * h(x, y)  (36)

The filter output achieves its highest value at the translation point (x0, y0), according to [37]. This point determines the values of the translation parameters, while the intensity of the white noise, denoted as nw, is calculated from the squared absolute value of N(u, v).

This becomes an obstacle in image matching approaches when there is limited shape differentiation but where the images have the same size and energy content [37].

This also presents a difficulty in identifying the maximum value in the presence of noise, a problem that can be resolved by the use of a phase-only matching filter.

The transfer function is defined as follows:

H(u, v) = R*(u, v) / |R(u, v)| = exp(−jφR(u, v))  (37)

In this context, j² = −1. The spectral phase φR(u, v) preserves only the spatial information of the object, while rejecting the image's energy content.

The use of the phase-only matched filter yields a more distinguishable picture in comparison to the conventional matched filtering technique, hence increasing the ability to discriminate between objects.

The approach can be enhanced by acquiring the correlation phase of both the image and the noise function [37]. This is achieved by including a non-linear filter that produces output according to the following equation:

Q(u, v) = exp(jφs(u, v)) · exp(−jφr(u, v))  (38)

The spectral phases φr(u, v) and φs(u, v) correspond to the image signal r(x, y) and the observed scene s(x, y), respectively. When the level of noise is insignificant, the above equation can be rewritten as:

Q(u, v) ≈ exp(−j2π(ux0 + vy0))  (39)

The inverse Fourier transform of the above function yields a Dirac delta function centred at the coordinates (x0, y0). This approach exhibits superior performance in comparison to phase-only matched filtering (POMF).
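The phase-correlation mechanism behind Eqs (38) and (39) can be sketched in NumPy on synthetic data (the image, offsets, and array sizes below are illustrative assumptions, not the paper's data): retaining only the two spectral phases and inverting the result produces a sharp peak at the translation offset.

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.standard_normal((64, 64))            # reference image r(x, y)
x0, y0 = 5, 12                               # ground-truth translation offsets
s = np.roll(r, shift=(x0, y0), axis=(0, 1))  # observed scene: circular shift, no noise

R = np.fft.fft2(r)
S = np.fft.fft2(s)
# Keep only the spectral phases of both images and take their difference.
Q = np.exp(1j * (np.angle(S) - np.angle(R)))
q = np.fft.ifft2(Q).real                     # approximates a delta at (x0, y0)

peak = np.unravel_index(np.argmax(q), q.shape)
```

With a pure circular shift the peak is exact; with noise or non-circular shifts it remains sharp but is no longer a perfect delta.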

The technique discussed here offers an alternative approach to POMF known as the symmetric phase-only matched filter (SPOMF) [37].

The image transformation method consists of three sequential processes: rotation, scaling, and translation. These steps are applied to an image denoted as r(x, y). The technique involves first collecting the phases of the image, followed by performing POMF [37].

This approach can be achieved by the implementation of pre-processing techniques in the spectral phase.

In the given scenario, an object indicated as s(x, y) performs a series of operations resulting in the generation of a comparable image denoted as r(x, y):

(40)

In the provided context, represents the rotation angle, and denotes the uniform scale factor.

The variables x0 and y0 represent the offsets of the transformation. The mathematical representation of the Fourier transform is given by:

(41)

The function characterises the spectral phase of the input image, which is influenced by rotation, scaling, and translation. On the other hand, the spectral magnitude remains unaffected by changes in spatial size:

(42)

The scaling is constant under spectral operations for , as it is proportional to . The two primary processes can be identified by employing polar coordinates to specify the magnitudes of s and r.

(43)

It is well established that the spectral magnitude exhibits periodic behaviour with respect to the angle. Thus, it is possible to estimate the transfer function of a filter using only half of the magnitude field, provided the original image is real-valued. Therefore,

(44)

The function performs angular rotation, where rp represents the spectral magnitudes. Conversely, the process of scaling involves the reduction of coordinates while simultaneously amplifying the constant factor denoted as .

The scaling can be reduced to a translation by applying a logarithmic scale to the radial coordinate, as proposed by [37]. These logarithmic transformations are defined as:

(45)

Here stands for polar-logarithmic representation, and

(46)

Here, , and .

In the given equation, which is presented in polar logarithmic form, the operations of scaling and rotation have been simplified to translation. The resulting equation can be expressed as follows:

(47)

Additionally, the computations for rotation and scaling are performed independently. Therefore, this numerical approach can be considered dependable. The natural visual system has significant parallels to the log-polar mapping technique.
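The reduction of scaling to translation under logarithmic coordinates can be illustrated with a one-dimensional NumPy sketch (the test function, grid, and scale factor are hypothetical choices, not the paper's data): sampling a function and its scaled copy on a logarithmic grid turns the scale factor into a shift, recoverable by cross-correlation.

```python
import numpy as np

def on_log_grid(func, n=512, tau_min=-3.0, tau_max=3.0):
    """Sample func(t) on a logarithmic grid t = exp(tau)."""
    tau = np.linspace(tau_min, tau_max, n)
    return tau, func(np.exp(tau))

f = lambda t: np.exp(-(np.log(t) - 0.5) ** 2)  # smooth bump, easy to localise
sigma = 1.8                                    # "unknown" scale factor to recover
g = lambda t: f(sigma * t)                     # scaled copy of f

tau, fs = on_log_grid(f)
_, gs = on_log_grid(g)

# In tau = log t, scaling by sigma is a pure shift: g(exp(tau)) = f(exp(tau + log sigma)).
shift_samples = np.argmax(np.correlate(gs, fs, mode="full")) - (len(fs) - 1)
dtau = tau[1] - tau[0]
sigma_hat = np.exp(-shift_samples * dtau)      # recovered scale factor
```

The same idea applied to both angular and log-radial coordinates of the spectral magnitude yields the log-polar step of the FMT.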

The strategies discussed above can be integrated to obtain an optimal outcome [37]. The FMI descriptor of an image may be compared using a variety of methods, including cross-correlation (CC), matched filtering (MF), POMF, and SPOMF. SPOMF exhibits a pronounced correlation peak, as demonstrated in [37].

Let s(x, y) and r(x, y) denote the images obtained from the application of the SPOMF algorithm. Then:

(48)

The fundamental procedure for the SPOMF of the FMI involves the following steps:

Algorithm 1 The core mechanism of the FMI-SPOMF algorithm.

1: The procedure begins by applying the Fourier transform to the FMI descriptor of the reference image, which is represented as .

2: After that, it proceeds to extract the phase from .

3: Then, the Fourier transform of the FMI descriptor of the observed image s(x, y) is computed.

4: This is followed by the extraction of the phase:

5: The output of the SPOMF can be determined using the following equation:

6: The process for determining the inverse of the Fourier transform is obtained via

7: The final procedure is to locate the point where the function is at its maximum.

In the initial phase of the procedure, it is assumed that all the images have identical characteristics, with the fundamental aim being to identify the geometric transformation that establishes correlation between these images.

The execution of the core FMI-SPOMF algorithm is performed as follows:

  1. The output of the maximum filter is denoted as and is subsequently identified at the precise coordinates and .
  2. The value of the rotational angle is denoted as and is equal to the maximum angle of rotation, represented by .
  3. The scaling factor, denoted as , is mathematically defined as the exponential function of the maximum eigenvalue .

The descriptor is generated using half of the spectrum, resulting in an estimated rotation angle ranging from 0 to 180 degrees; the actual angle of rotation is one of two candidate values differing by 180 degrees. The second stage involves the use of the image registration method to determine the translation offset, followed by the determination of the rotation angle. A scaling process is performed on the observed image, resulting in two re-scaled images, which are then rotated by each candidate angle. When the correct angle is used, the spectrum of the rotated image exhibits a phase discrepancy with R(u, v) that results solely from the translation of the image. When the incorrect angle is used, there is no relationship between the spectra. This implies that the appropriate angle of rotation can be identified. This approach can be effectively employed in pattern recognition problems.

We can now go on to clarify the structure of image registration. Algorithm 2 below outlines the stages involved in image registration using the FMI-SPOMF.

Algorithm 2 Image registration mechanism.

1: Initiate the core FMI-SPOMF algorithm.

2: Identify the location coordinates that correspond to the maximum of .

3: Resize the image s(x, y) by a factor of and repeat this process.

4: Generate two resized versions of the image and rotate them by and , respectively.

5: Assess the SPOMF between the reference image r(x, y) and the resized, rotated versions of the observed image.

6: Among the two manipulated images, determine the one with the highest filter maximum.

7: Identify the coordinates of this maximum.

8: Use the geometric features obtained from this transformation process.

Having discussed the importance of every element within the image registration framework, we can go on to outline an application that demonstrates how to take advantage of these geometric characteristics to optimise a portfolio.

To obtain these geometric characteristics, we employ the CWT on Rpca in a process similar to that described in Algorithm 4. However, it is necessary to provide a reference dataset to implement Algorithm 2.

System design

This section explains the system's architecture and capabilities, offering a comprehensive understanding of the interactions among its components. This is crucial for developing an in-depth knowledge of the operational dynamics of the system and of the impact of design choices on its performance.

Within the scope of our research, the system being analysed comprises three separate investing strategies: VAR(1)-AutoML, CWT-CNN, and FM-LSTM. These methods reflect the system developed for forecasting financial markets and maintaining investment portfolios.

The first element of our system, known as VAR(1)-AutoML, consists of the following steps. First, let P denote the matrix of stock prices, where Pij represents the price of the i-th stock at the j-th time point. Then, we find the log returns matrix R, which can be calculated as Rij = log(Pi,j+1) − log(Pij). Before applying PCA, the data is standardised: each stock's log returns are centred by their mean and divided by their standard deviation.

After standardisation, PCA can be applied to obtain the matrix Rpca of retained principal components. The rule used to decide how many components to retain is Kaiser's rule: only components with eigenvalues exceeding 1 are kept, since these provide a greater quantity of information than a single standardised variable.
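A minimal NumPy sketch of standardisation, PCA, and Kaiser's rule, on synthetic returns rather than the paper's data (the matrix sizes and seeded generator are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.standard_normal((500, 8))                    # toy log-returns matrix (days x stocks)
R[:, 1] = R[:, 0] + 0.1 * rng.standard_normal(500)   # make two stocks co-move

# Standardise each stock's returns, then diagonalise the correlation matrix.
Z = (R - R.mean(axis=0)) / R.std(axis=0)
corr = np.corrcoef(R, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]                    # sort eigenvalues descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Kaiser's rule: keep only components whose eigenvalue exceeds 1.
keep = eigvals > 1.0
R_pca = Z @ eigvecs[:, keep]                         # scores of the retained components
```

The co-moving pair guarantees at least one eigenvalue well above 1, so at least one component survives the rule.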

To effectively capture the temporal dynamics of stock prices and predict future returns, we use a rolling window method for VAR modelling. The VAR(1) model, a multivariate extension of the standard autoregressive model, is chosen for its capacity to represent the linear relationship present across numerous time series.

The method we developed is designed to calculate a series of VAR(1) models. Each model is built using a rolling window that includes the most recent L = 22 trading days. This choice of L reflects normal trading practice, with each model updated using a dataset that covers approximately 30 calendar days of recent data. The rolling window allows our system to respond effectively to fluctuations in market dynamics, as the estimated model parameters are able to vary over time.

The VAR(1) operates as follows:

  1. The system is initialised by specifying an empty VAR(1) model structure V with p variables, where p is the number of stocks in the dataset.
  2. The following steps are implemented for each trade day i (from the day after the initial L days to the final day in the dataset):
    • The VAR(1) model Pi is estimated using the logarithmic returns of the p stocks, spanning from day i to i + L−1. The above method produces a set of VAR(1) coefficients of the model, which successfully describe the linear relationships between the stocks, using the L most recent days of data.
    • A vector Ei is initialised to include the predicted returns for each day i.
    • For every day j within the range of i − L + 1 (or day 1, if i − L + 1 is less than 1) to i, a one-step-ahead forecast is calculated using the VAR(1) model Pi and the corresponding log returns. The forecasts are collected and stored in the variable Ei.
    • Eventually, the predictions collected in Ei are averaged across the number of days used in the estimation, which is the minimum between i and L. This process results in an average forecast of returns for day i, specifically, for a one-step-ahead prediction.

Once this procedure is completed, a time series of expected returns E is obtained. Each Ei is calculated from a VAR(1) model, which is estimated using the most recent L days of data. The suggested approach combines VAR modelling, with its advantages for capturing linear connection among shares, with a rolling window method to efficiently react to changing market dynamics.
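The rolling-window VAR(1) procedure can be sketched in NumPy (the authors' scripts are in MATLAB; the simulated process, least-squares fit, and forecasting loop below are an illustrative reconstruction under stated assumptions, not their code):

```python
import numpy as np

def fit_var1(Y):
    """Least-squares fit of Y_t = c + A @ Y_{t-1} + e_t on a (T, p) window."""
    X = np.hstack([np.ones((len(Y) - 1, 1)), Y[:-1]])  # regressors [1, Y_{t-1}]
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)      # stacked [c; A.T]
    return B[0], B[1:].T                               # intercept c, coefficient matrix A

rng = np.random.default_rng(2)
p, T, L = 3, 120, 22                        # stocks, days, rolling-window length
A_true = 0.3 * np.eye(p)
Y = np.zeros((T, p))
for t in range(1, T):                       # simulate a stable VAR(1) process
    Y[t] = A_true @ Y[t - 1] + 0.1 * rng.standard_normal(p)

forecasts = []
for i in range(L, T):                       # re-fit on the most recent L days
    c, A = fit_var1(Y[i - L:i])
    forecasts.append(c + A @ Y[i - 1])      # one-step-ahead forecast for day i
forecasts = np.array(forecasts)
```

Each forecast uses only data available before day i, so the coefficients adapt as the window rolls forward.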

The division of data into training and testing sets is a critical stage in the application of deep learning or machine learning techniques. In this process, the data is partitioned, with one portion allocated for training and the remainder reserved for testing. The training dataset is used to fit the model, while the testing dataset is used to assess and evaluate its performance. Algorithm 3 below illustrates the model's procedure:

Algorithm 3 VAR(1)-AutoML investment strategy.

1: procedure StockPricePrediction (P, n, m)

2:   Initialise

3:   for i = 1 to n do

4:    for j = 1 to m–1 do

5:    

6:    

7:    end for

8:   end for

9:   Apply PCA using Kaiser’s Rule

10:   Split Rpca into and

11:   Train the model

12:   Test the model on

13:   Determine important feature order in and , relying on the model outcomes by masking Rpca.

14:   Calculate cumulative log returns as CLtr and CLte for and , respectively

15:   return CLtr, CLte

16: end procedure

Important note: the model is trained with fitrauto, a tool that uses automated procedures to select between Bayesian optimisation and the Asynchronous Successive Halving Algorithm. These algorithms are applied to a range of regression model types with varying hyperparameter values. The output of fitrauto is the model anticipated to yield the most accurate predictions for new data.

The second suggested approach – CWT-CNN – involves combining a wavelet transform with a CNN. The wavelet transform is used as a first step to divide the investment data into distinct frequency components. Subsequently, a CNN is employed to identify the important features of the data for portfolio optimisation. This model includes an important phase that involves the modification and extraction of features from a dataset with several dimensions. To achieve this, a CWT is performed on specific components of the data collection.

The technique begins by creating new transformed data points, denoted as Q, whose dimensions are determined by the span of the dataset dates (which is decreased by L elements from the end) and the parameter p. The matrix Q is arranged to enable the storage of the results of the repeated wavelet transformations.

After the creation of Q, the algorithm proceeds through a two-level iteration loop. The outer loop sequentially traverses each element in the dataset P, which is a vector that stores the indices of the dataset. For every iteration i in the set P, a nested loop is initiated, which operates p times. Within the nested loops, the algorithm conducts a CWT on a section of the j-th column of the data matrix Rpca, covering the i-th row to row i + L−1.

The proposed approach involves the application of a CWT to different sections of the multi-dimensional dataset. The results of the transformation outputs are important for capturing vital information from the data. The feature extraction technique plays a crucial role in the data analysis pipeline of the system as a whole.
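A sliding-window CWT of this kind can be sketched with NumPy alone, using a Ricker (Mexican-hat) wavelet as a stand-in for whichever mother wavelet the authors used (the synthetic series, window length, and scale range are illustrative assumptions):

```python
import numpy as np

def ricker(points, a):
    """Ricker (Mexican-hat) wavelet with width parameter a."""
    t = np.arange(points) - (points - 1) / 2.0
    return (1 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt_ricker(signal, widths):
    """CWT as convolution of the signal with Ricker wavelets at the given widths."""
    out = np.empty((len(widths), len(signal)))
    for k, a in enumerate(widths):
        n = min(10 * int(a) + 1, len(signal))   # cap kernel length at the window size
        out[k] = np.convolve(signal, ricker(n, a), mode="same")
    return out

rng = np.random.default_rng(3)
returns = rng.standard_normal(256)          # stand-in for one PCA return series
L = 22                                      # window length, as in the rolling VAR
widths = np.arange(1, 9)

# One scalogram per window position, usable as a CNN input "image".
scalograms = [cwt_ricker(returns[i:i + L], widths) for i in range(len(returns) - L + 1)]
```

Each scalogram is a scales-by-time array, which is the two-dimensional "image" the CNN (or, later, the FMT stage) consumes.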

Algorithm 4 CWT-CNN investment strategy.

1: procedure StockPricePrediction (P, n, m)

2:   Initialise

3:   for i = 1 to n do

4:    for j = 1 to m–1 do

5:    

6:    

7:    end for

8:   end for

9:   Apply PCA using Kaiser’s rule.

10:   Apply CWT on Rpca.

11:   Split Rpca into and .

12:   Train the model.

13:   Test the model on

14:   Determine important feature order in and , relying on the model outcomes by masking Rpca.

15:   Calculate cumulative log returns as CLtr and CLte for and , respectively.

16:   return CLtr, CLte.

17: end procedure

The third component, the FM-LSTM, implements the FMT in conjunction with an LSTM network. The FMT is employed to transform the time-series investment data into a representation that is invariant to changes in scale and rotation. Following this, an LSTM network is used to find the important features to control and optimise the portfolio.

Here is the outline of the FM-LSTM model:

Algorithm 5 FMT-LSTM investment strategy.

1: procedure StockPricePrediction (P, n, m)

2:   Initialise

3:   for i = 1 to n do

4:    for j = 1 to m–1 do

5:    

6:    

7:    end for

8:   end for

9:   Apply PCA using Kaiser’s rule.

10:   Apply CWT on Rpca.

11:   Split Rpca into , and .

12:   Obtain geometric features from using Algorithm 2 on and , relying on .

13:   Assign FMtr and FMte as the matrix geometric features from training and testing data, respectively.

14:   Train the model.

15:   Test the model on

16:   Determine important feature order in and , relying on the model outcomes by masking Rpca.

17:   Calculate cumulative log returns as CLtr and CLte for and , respectively.

18:   return CLtr, CLte.

19: end procedure

Here, is the log return of the PCA reference dataset.

The computational procedures, including data preprocessing, model implementation, and statistical analysis, were performed using custom MATLAB scripts. These scripts are included in the Supporting Information section to ensure reproducibility.

Results

The portfolio consisted of 1,421 stocks, taken from the US stock market between 1 April 2013 and 1 April 2023. The data was obtained from a provider reliable for its financial reports and details using the MATLAB connection function, which is available at https://www.mathworks.com/help/datafeed/moneynet.html. The principal purpose of this initial research effort is to experiment with feature-based optimisation approaches. Therefore, the selection was random to ensure a diverse and unbiased sample. This randomness guarantees that the assessment of the models avoids the influence of bias from specific sectors or companies.

Additionally, the random selection allowed for testing of the robustness and generalisation capability of the optimisation models across a broad range of stocks.

We conducted a statistical analysis of the daily prices of the stocks in the portfolio. The aim of the analysis was to identify important features, including the stocks’ central tendency, dispersion, and correlation, to capture important factors for investment strategies and portfolio management.

The investigated statistical measures included the mean, median, standard deviation, skewness, and kurtosis (summary statistics). As mentioned, these describe the central tendency and dispersion of returns. Correlation coefficients were computed to examine the relationships between different stocks.

To obtain an in-depth view of the data, visualisations, including time-series plots, histograms, box plots, and correlation heatmaps, were created to provide a graphical representation of these statistics. We began with a visualisation of a time-series plot of the stock prices, as presented in (Fig 3) below.

Fig 3. Time series plot of stock prices.

The graph shows the performance of the portfolio optimization method using historical stock data.

https://doi.org/10.1371/journal.pone.0321204.g003

The daily stock prices for the firms under study are summarised in Table 1 below, along with descriptive information. The mean and median price, standard deviation, skewness, and kurtosis are displayed in separate rows.

The average rate of price changes can be calculated using the mean. The median price represents the midpoint of the price distribution and is less sensitive than the mean to extreme values. We found that the mean prices ranged from 2.108 to 2,870.409, and the median prices from 1.171 to 2,704.705.

Using the standard deviation, the volatility of the returns can be assessed. Our research shows that the daily prices of these stocks vary widely, with standard deviations ranging from 0.281 to 2,300.347.

The skewness index quantifies the asymmetry of a price distribution. In the case of positive skewness, the tail of the prices extends far to the right, whereas in the case of negative skewness, it extends far to the left. Our results exhibit skewness in the range of –3.686 to 4.866.

The tailedness of a price distribution can be quantified by its kurtosis. A greater kurtosis suggests a distribution with fatter tails, which may portend more dramatic price swings. Our price kurtosis values fell between 1.186 and 26.017.

These numbers are useful since they shed light on the performance of these equities. There appears to be a wide range of stock performance, as indicated by the wide range of mean and median prices. There also appears to be a wide range of stock risk, as indicated by the large differences in standard deviation. Differences in the distribution of stock prices are further highlighted by the skewness and kurtosis, which exhibit substantial diversity across these equities.

We performed hierarchical clustering based on the correlation matrix of the stock prices and then derived measures from these samples. To obtain these measures through clustering, we applied the so-called linkage method. The figure was truncated to show only a few merges, as the 1,421 samples would be difficult to present using a heatmap.

To implement this approach, we used the following steps based on the daily stock prices:

  1. A correlation matrix among stock prices was computed.
  2. A transformation was applied from the correlation matrix to the dissimilarity matrix.
  3. The average linkage approach was used to apply hierarchical clustering to the dissimilarity matrix.
  4. The resultant dendrogram was truncated so only the most recent N mergers are shown.
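Steps 1–4 can be sketched with SciPy's hierarchical-clustering utilities on synthetic prices (assuming SciPy is available; the toy data, seed, and cluster count are illustrative, not the paper's):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(4)
prices = np.cumsum(rng.standard_normal((250, 12)), axis=0) + 100  # 12 toy price series

corr = np.corrcoef(prices, rowvar=False)        # step 1: correlation matrix
dissim = 1.0 - corr                             # step 2: dissimilarity matrix
np.fill_diagonal(dissim, 0.0)                   # enforce exact zeros on the diagonal
condensed = squareform(dissim, checks=False)    # condensed form required by linkage
Z = linkage(condensed, method="average")        # step 3: average-linkage clustering
labels = fcluster(Z, t=3, criterion="maxclust") # cut the tree into at most 3 clusters
```

Step 4's truncation corresponds to `scipy.cluster.hierarchy.dendrogram` with `truncate_mode="lastp"`, which shows only the most recent merges.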

The stock returns are displayed in a dendrogram in (Fig 4), which shows their hierarchical grouping. The dendrogram is a tree-like diagram in which each node represents a stock, and the distance between nodes represents the degree of dissimilarity between groups of stocks. To simplify interpretation of the larger clusters, we trimmed the dendrogram to display only the most recent N mergers.

To group sample sets, we used an approach that relied on the average correlation of each sample with all other samples. The aim of using this approach was to capture the correlation structure in our large dataset of stock prices. The steps for applying this method were as follows:

  1. We began by computing the correlation matrix of the dataset using the Pearson correlation coefficient, which measures the linear relationship between two variables. The correlation coefficient ranges from –1 to 1, where –1 indicates a strong negative linear relationship, 1 a strong positive linear relationship, and 0 no linear relationship.
  2. We then computed the average correlation for each sample by evaluating the mean of each row in the correlation matrix.
  3. We divided the samples into three groups – low, medium, and high – depending on their average correlations. The thresholds for these groups were chosen arbitrarily and can be adjusted as needed.
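The grouping steps above can be sketched in NumPy (the synthetic prices and the two thresholds are illustrative assumptions; as noted, the thresholds are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
prices = np.cumsum(rng.standard_normal((250, 10)), axis=0) + 100  # 10 toy price series

corr = np.corrcoef(prices, rowvar=False)     # step 1: Pearson correlation matrix
avg_corr = corr.mean(axis=1)                 # step 2: row-wise average correlation

# Step 3: threshold into low / medium / high groups.
low_t, high_t = 0.0, 0.2
groups = np.where(avg_corr < low_t, "low",
                  np.where(avg_corr < high_t, "medium", "high"))
```

Changing `low_t` and `high_t` reproduces the sensitivity to thresholds discussed in the text.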

Figs 5 and 6 present bar charts showing the results of the average correlation of samples grouped by correlation level. The x-axis is the sample index, and the y-axis represents the average correlation values. The three groups are designated as follows: the low-correlation group is coloured red, the medium group green, and the high group blue. The motivation for including this step was to obtain a clear visual representation of the groupings and allow for clear identification of patterns in the data. It should also be borne in mind that the results of averaging and grouping the correlation samples are sensitive to the specific thresholds used for the groupings.

Fig 5. Here, –1, 0, and 1 are the thresholds used to group the correlations.

https://doi.org/10.1371/journal.pone.0321204.g005

Fig 6. Here, –0.8, 0, and 0.2 are the thresholds used to group the correlations.

https://doi.org/10.1371/journal.pone.0321204.g006

We also calculated the log returns of the stock prices and applied PCA to them. The Kaiser rule was employed to select the optimal principal components of the data, and the first 10 were kept as representative of the stock returns of the portfolio. The aim of this strategy was to capture the most variance in the data using the fewest principal components. PCA also supports noise reduction by removing outliers and unusual patterns, as these have strong negative effects on identifying the best features for optimising the portfolio. Below is a plot of the first 10 principal components of the data (see Fig 7).

Fig 7. First 10 principal components of the stock log returns.

https://doi.org/10.1371/journal.pone.0321204.g007

We plotted a heatmap of the correlation matrix for the 10 principal components obtained from the stock log returns (see Fig 8). On the heatmap, the diagonal values are equal to 1, meaning that each principal component is perfectly correlated with itself. The off-diagonal values, which show the relationships between the principal components, are near zero. This illustrates that the principal components are orthogonal and supports the use of PCA. These small off-diagonal values are attributable to numerical error or background noise.

We also plotted the cumulative log returns of the principal components over time, considering PCA1 to PCA10. (Fig 9) shows that the initial investment was initialised at zero for all cumulative log returns. There are increasing and decreasing log returns over time from 0 to 3,000 days (plotted on the x-axis), which is indicative of the fluctuating nature of returns over time.

Fig 9. Cumulative returns of the principal components of the log returns.

https://doi.org/10.1371/journal.pone.0321204.g009

The cumulative log returns plotted along the y-axis fall between –1 and 9. Note that a value above zero indicates a positive return, while one below zero indicates a negative return, or a loss. These results demonstrate the inherent volatility of the market, with notable fluctuations in cumulative returns for all principal components (see Fig 9).
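Cumulative log returns of this kind are running sums of the daily log returns, initialised at zero; a NumPy sketch on synthetic data (the sizes and scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
log_returns = 0.01 * rng.standard_normal((3000, 10))   # toy series for 10 components

# Cumulative log return per component: a running sum, starting from zero
# (the initial investment point on the plots).
cum = np.vstack([np.zeros((1, 10)), np.cumsum(log_returns, axis=0)])
```

Values above zero then correspond to a net gain over the initial investment, and values below zero to a net loss.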

We also plotted the cumulative log returns of a portfolio containing these 10 principal components, as presented in Fig 10 below:

Fig 10. Cumulative returns of a portfolio containing the principal components of the log returns.

https://doi.org/10.1371/journal.pone.0321204.g010

Fig 10 shows that the portfolio’s cumulative log return decreases over time, with brief intervals of stability marked by less sharp decreases. Portfolio assets are dynamic and variable, which explains these changes. The steeper downward slopes suggest larger negative returns, whereas the flatter areas indicate relative stability with no substantial positive log returns.

A wavelet transform divides a signal into frequency components and analyses each component with a resolution matched to its scale. In time-series analysis, wavelet transforms are used to investigate non-stationary signals, such as financial data (see Fig 11).

Fig 11. Wavelet transform of some of the principal components of log returns.

https://doi.org/10.1371/journal.pone.0321204.g011

To evaluate our models, VAR(1)-AutoML, CWT-CNN, and FM-LSTM were used to analyse the data. We compared the results of these investment approaches by creating profitability plots using cumulative log returns.

Fig 12 visualises the paths of the three investment strategies for the 10-year period from April 2013 to April 2023. Cumulative log return was employed for comparative analysis, serving as an indicator of the total investment yield and taking into account the compounding factor.

Fig 12. Performance evaluation: the profitability of VAR(1)-AutoML, CWT-CNN, and FM-LSTM.

https://doi.org/10.1371/journal.pone.0321204.g012

The x-axis of the graph represents the 10-year time span from April 2013 to April 2023. The y-axis represents the cumulative log returns, ranging from –15 to 10. This range signifies the potential for substantial profits, as indicated by positive logarithmic returns, as well as losses, as indicated by negative logarithmic returns, over the 10-year period.

The trajectory of each approach on the chart represents its overall performance. The overall patterns indicate varying degrees of effectiveness. The FM-LSTM investment strategy has an upward trajectory on the graph, demonstrating its tendency to generate positive outcomes. In contrast, the VAR(1)-AutoML strategy exhibits periods of decline.

Mean absolute error (MAE) is a statistical metric that quantifies the differences between two observations that represent the same phenomenon. In this context, examples of observations (Y versus X) include expected and observed values, subsequent time and starting time, or a measuring technique and an alternate measurement approach [38]. We computed the MAE by adding the absolute errors, expressed as the Manhattan distance, and then dividing the sum by the sample size:

MAE = (1/n) Σi |yi − xi|  (49)

Therefore, the MAE is the arithmetic mean of the absolute errors |ei| = |yi − xi|, where yi is the forecast, and xi is the actual value. Relative frequencies can be used as weight factors in other formulations. The MAE is expressed on the same scale as the observed data. As the MAE is scale-dependent, it cannot be used to compare predicted values measured on different scales [39]. In time series analysis, the MAE is frequently used to measure forecast error and is often confused with the more common mean absolute deviation. The same confusion is also present more broadly [40].

In statistics, the mean squared error (MSE) or mean squared deviation of an estimator quantifies the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual value [41]. MSE is a risk function that represents the expected value of the squared error loss. The fact that the MSE is almost always strictly positive, rather than zero, arises from randomness or from the estimator's failure to incorporate information that could yield a more precise estimate [42, 43]. In machine learning, especially empirical risk minimisation, MSE can be used to represent the empirical risk, which is the average loss on a given dataset [44]. It is a close approximation of the true MSE, the expected loss on the actual population distribution.

Given a vector of n predictions derived from a sample of n data points across all variables, where yi represents the vector of observed values for the predicted variable, and denotes the predicted values (e.g. obtained from a least-squares fit), the within-sample MSE of the predictor is calculated as [44]:

\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{50}
\]
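A minimal sketch of Eq (50) follows; the second example shows how, unlike the MAE, the MSE penalises one large error more than several small ones of the same total magnitude (all values are synthetic):

```python
import numpy as np

def mse(y, y_hat):
    """Average squared difference between observed and predicted values (Eq 50)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

y = np.zeros(2)
print(mse(y, [0.1, 0.1]))   # two small errors -> 0.01
print(mse(y, [0.0, 0.2]))   # same MAE (0.1), one large error -> 0.02
```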

This study assessed the efficacy of three different investment methods – VAR(1)-AutoML, CWT-CNN, and FM-LSTM – using MAE and MSE measures across several principal components. The findings shown in Table 2 demonstrate that CWT-CNN consistently achieved the lowest error rates across the majority of components, indicating enhanced prediction accuracy relative to the alternative approaches. Notably, for principal component 9 (PC9), CWT-CNN exhibited remarkable accuracy, with an MAE of 0.014 and an MSE of 0.001. Conversely, FM-LSTM demonstrated elevated error rates, especially for PC6 and PC8, indicating possible paths for model enhancement. These findings underscore the efficacy of CWT-CNN for time series prediction tasks in this setting.

Table 2. MAE and MSE for the training set across the three methods.

https://doi.org/10.1371/journal.pone.0321204.t002

The findings for the testing dataset presented in Table 3 demonstrate that CWT-CNN consistently outperformed the other approaches, yielding reduced error rates for the majority of PCs. CWT-CNN attained the minimum MAE and MSE for PC1 to PC8, indicating its exceptional prediction proficiency. The FM-LSTM model, although competitive in certain situations (e.g. PC2), typically exhibited elevated error rates, especially for PC6 and PC8. In particular, PC9 was a challenge for all models, exhibiting markedly elevated MSE values. The findings show that the CWT-CNN methodology was highly effective in identifying the fundamental patterns within this dataset, indicating it could improve forecasting accuracy in analogous time series applications.

Table 3. MAE and MSE for the testing dataset across the three methods.

https://doi.org/10.1371/journal.pone.0321204.t003

Fig 12 presents a comparison of the VAR(1)-AutoML, FM-LSTM, and CWT-CNN models in terms of their efficacy in optimising portfolio returns. Under particular circumstances, the FM-LSTM model exhibited exceptional returns, indicating its successful representation of certain market dynamics. Nevertheless, it is important to emphasise that this improved performance does not indicate an innate capacity to effectively forecast future market trends.

The performance outcomes may have been affected by the random split of the data into testing and training sets. The variability introduced by this sampling procedure could compromise the model results. Although FM-LSTM produced better cumulative returns, its success more likely reflects adaptation to historical trends than genuine forecasting ability.

By contrast, although it did not produce the same degree of returns, the CWT-CNN model provided an insightful analysis of temporal patterns and volatility. We also acknowledge the complexity of financial markets and the limitations of modelling techniques in predicting their dynamics.

Further study is required to investigate the effects of different data-splitting schemes and to improve model resilience. This will help clarify the actual predictive power of these models.
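One way to probe this sensitivity is to re-evaluate a model over many train/test boundaries. The sketch below uses a naive last-value forecast as a hypothetical stand-in for the three models, on a synthetic random-walk series, purely to illustrate split-to-split variability:

```python
import numpy as np

rng = np.random.default_rng(42)
series = np.cumsum(rng.standard_normal(300))   # toy price-like random walk

def naive_mae(train, test):
    """MAE of a naive last-value forecast -- a stand-in for any real model."""
    return np.mean(np.abs(test - train[-1]))

maes = []
for _ in range(20):
    split = rng.integers(200, 280)             # vary the train/test boundary
    maes.append(naive_mae(series[:split], series[split:split + 20]))

spread = np.std(maes)                          # variability across splits
```

A non-trivial `spread` indicates that a single random split is an unreliable basis for comparing models.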

Conclusions and future work

This paper investigated three computational portfolio optimisation strategies employing a PCA of stock log returns. The approaches included VAR(1)-AutoML, CWT-CNN, and FM-LSTM. Each strategy obtained crucial stock return data to enhance the performance of financial portfolio prediction models.

The VAR(1)-AutoML technique made use of VAR(1)’s statistical power for time series forecasting and AutoML’s scalability for model selection and hyperparameter tuning. This combination accurately captured linear connections and changes in the data over time, providing a solid platform for more complex models.
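The statistical core of this step can be sketched as an ordinary least-squares VAR(1) fit on simulated data; the coefficient matrix and noise level below are arbitrary, and the AutoML layer for model selection is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a bivariate VAR(1) process z_t = A z_{t-1} + eps_t.
A_true = np.array([[0.5, 0.1],
                   [0.0, 0.3]])
z = np.zeros((200, 2))
for t in range(1, 200):
    z[t] = A_true @ z[t - 1] + 0.1 * rng.standard_normal(2)

# Fit VAR(1) by least squares: regress z_t on z_{t-1}.
X, Y = z[:-1], z[1:]
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = B.T                     # estimated coefficient matrix

forecast = A_hat @ z[-1]        # one-step-ahead forecast
```

With enough observations, `A_hat` recovers the simulated coefficients, and iterating the forecast step produces multi-step projections.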

In contrast, the CWT-CNN approach used the CWT to extract features and decompose financial data series into time-frequency components. Combining these multi-resolution analytical capabilities with the pattern recognition strength of a CNN facilitated the discovery of complex non-linear patterns in data that standard approaches ignore.
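The feature-extraction stage can be illustrated with a bare-bones Morlet CWT in NumPy — a simplified stand-in for the generalised Morse wavelets used in the full pipeline. The toy signal switches frequency halfway, mimicking a regime change, and the resulting scalogram is the kind of 2-D image a CNN would consume:

```python
import numpy as np

def morlet_cwt(signal, scales, w0=5.0):
    """Continuous wavelet transform with an L2-normalised Morlet wavelet.

    Returns a (len(scales) x len(signal)) magnitude scalogram. A sketch
    only: no admissibility correction or boundary handling."""
    n = len(signal)
    out = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1)
        wavelet = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2)
        wavelet /= np.sqrt(s)
        out[i] = np.convolve(signal, np.conj(wavelet)[::-1], mode="same")
    return np.abs(out)

# Toy series: fast oscillation in the first half, slow in the second.
t = np.arange(512)
x = np.sin(2 * np.pi * t / 16) * (t < 256) + np.sin(2 * np.pi * t / 64) * (t >= 256)
scalogram = morlet_cwt(x, scales=np.arange(2, 33))
```

The ridge of high magnitude sits at small scales in the first half and at larger scales in the second, which is the time-frequency localisation the CNN exploits.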

After conducting a thorough investigation in the frequency domain using the FMT, the FM-LSTM approach used LSTM networks to model the sequences. This technique helped capture long-term dependencies and cyclic patterns in the data, which are essential for identifying trends and producing more accurate stock performance projections.
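The frequency-domain stage can be sketched as the classic Fourier-Mellin pipeline: FFT magnitude, log-polar resampling, then a second FFT. The nearest-neighbour resampling and grid sizes below are simplifying assumptions, and the LSTM stage is omitted:

```python
import numpy as np

def fourier_mellin_features(img, n_r=32, n_theta=32):
    """Fourier-Mellin-style features: |FFT2| -> log-polar grid -> |FFT2|.

    A simplified sketch of the transform; real implementations use
    interpolated log-polar warping and high-pass prefiltering."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = f.shape
    cy, cx = h // 2, w // 2
    r_max = min(cy, cx)
    rho = np.exp(np.linspace(0, np.log(r_max), n_r))     # log-spaced radii
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    ys = (cy + np.outer(rho, np.sin(theta))).astype(int) % h
    xs = (cx + np.outer(rho, np.cos(theta))).astype(int) % w
    logpolar = f[ys, xs]                                 # sample the spectrum
    return np.abs(np.fft.fft2(logpolar))

img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0          # toy "image" patch standing in for stacked data
feats = fourier_mellin_features(img)
```

Because the first FFT magnitude discards phase, the features are unchanged when the input is translated; rotation and scaling of the input become shifts on the log-polar grid, which the second FFT magnitude likewise absorbs.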

While each model has its own benefits, the FM-LSTM model demonstrated promising cumulative log return values, a critical performance indicator in this study. However, the random selection of testing and training data may have influenced the results. For this reason, the results indicate the model’s ability to adapt to the portfolio data rather than its predictive capability. Even given this limitation, our model can be considered a potentially beneficial tool for portfolio managers investigating uncertain events in volatile financial markets.

The use of PCA as a preprocessing step reduced dimensionality and highlighted the most important characteristics, improving model performance and portfolio optimisation.
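The dimensionality-reduction step can be sketched with an SVD-based PCA on simulated log returns driven by a single common factor; all numbers below are synthetic:

```python
import numpy as np

def pca_reduce(returns, k):
    """Project a (T x N) return matrix onto its first k principal components."""
    centred = returns - returns.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ Vt[:k].T          # (T x k) component scores

rng = np.random.default_rng(1)
common = rng.standard_normal((500, 1))               # shared "market" factor
loadings = rng.standard_normal((1, 10))              # exposure of 10 assets
returns = common @ loadings + 0.1 * rng.standard_normal((500, 10))
scores = pca_reduce(returns, k=3)
```

Because the simulated assets share one factor, almost all variance lands in the first component — the same concentration effect that lets the models work on a handful of PCs instead of the full asset panel.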

Our work demonstrates how improved data processing and machine learning models can affect financial market forecasts. The efficiency of the FM-LSTM model indicates its potential application in examining and predicting long-term financial trends. A future study could combine these approaches or use additional data sources to improve the stability and adaptability of portfolio optimisation systems.

While our model exhibits some predictive power, it is important to consider certain constraints. First, the FMT can be sensitive to noise, limiting the model’s performance on unclean data. When the FMT and LSTM are combined, the data dimensionality may also be reduced to a small number of features, complicating the prediction process. Particularly in the case of sparse data, the complexity of the integrated model could lead to overfitting: a lack of data points in high-dimensional domains prevents algorithms from identifying significant trends without overfitting, which can significantly impact a model’s generalisability and performance. This dimensionality issue represents a major obstacle in this study, since it lowers the capacity of predictive models, including LSTM networks, to forecast financial events appropriately. Addressing it is necessary to increase the model’s dependability and effectiveness; feature selection approaches or other dimensionality reduction strategies are essential for this purpose.

To make FM-LSTM models function effectively, it is important to find reference data points that can be used to obtain geometric features from both the training and testing datasets. The number of reference points directly influences the richness of the features obtained, but increasing it may result in higher processing costs and complexity. This requirement imposes a significant limitation, since an excessive number of features may cause overfitting, while too few may result in underfitting and suboptimal model performance. Achieving a balance in the quantity of reference points is critical for improving model precision and efficacy, and careful analysis is required in the design of FM-LSTM applications.

Although our methodology has yielded encouraging results, numerous questions remain unanswered. To enhance the model’s generalisability, it would be beneficial to include more diverse asset classes, such as cryptocurrencies. Additionally, exploring other architectures or incorporating different methods, such as mean-variance optimisation, could improve the FM-LSTM model. Combining the GARCH and DCC-GARCH models with FM-LSTM could also help develop a deeper understanding of volatility dynamics. Applying the model to real-time data would further allow its performance to be tested in dynamic circumstances. Finally, to improve decision-making, it may be useful to check whether the model’s predictions are easy to interpret. The proposed models could also be applied in other fields, such as classification problems in medical diagnosis and epidemics.

Supporting information

S1 Code. main.m

This script executes the primary analysis, running the three models and applying profitability, error, training, and testing analyses.

https://doi.org/10.1371/journal.pone.0321204.s001

(PDF)

S2 Code. stack.m

Helper function to stack data into a 2D vector using feature extraction for CNN and LSTM.

https://doi.org/10.1371/journal.pone.0321204.s002

(PDF)

S3 Code. FM.m

Applies the Fourier-Mellin transform to extract geometric features.

https://doi.org/10.1371/journal.pone.0321204.s003

(PDF)

S4 Code. hipass_filter.m

High-pass filter function used within FM.m.

https://doi.org/10.1371/journal.pone.0321204.s004

(PDF)

S5 Code. error_analysis.m

Computes MAE and RMSE and plots them as time series.

https://doi.org/10.1371/journal.pone.0321204.s005

(PDF)

S6 Code. calculateMetrics_csv_1.m

Applies MAE and RMSE on VAR(1)-AutoML and CWT-CNN, storing results as CSV files.

https://doi.org/10.1371/journal.pone.0321204.s006

(PDF)

S7 Code. calculateMetrics_csv_OO.m

Applies MAE and RMSE on FM-LSTM.

https://doi.org/10.1371/journal.pone.0321204.s007

(PDF)

S8 Code. transform_Image.m

Transforms 2D images to obtain geometric features.

https://doi.org/10.1371/journal.pone.0321204.s008

(PDF)

S9 Code. statistical_analysis.m

Implements statistical analysis.

https://doi.org/10.1371/journal.pone.0321204.s009

(PDF)

References

  1. Kremmel T, Kubalík J, Biffl S. Software project portfolio optimization with advanced multiobjective evolutionary algorithms. Appl Soft Comput. 2011;11(1):1416–26.
  2. Ta VD, Liu CM, Tadesse DA. Portfolio optimization-based stock prediction using long-short term memory network in quantitative trading. Appl Sci. 2020;10(2):437.
  3. Qu H. Risk and diversification of nonprofit revenue portfolios: Applying modern portfolio theory to nonprofit revenue management. Nonprofit Manage Leadership. 2019;30(2):193–212.
  4. Platanakis E, Urquhart A. Portfolio management with cryptocurrencies: The role of estimation risk. Econ Lett. 2019;177:76–80.
  5. Feng Q, Tao S, Liu C, Qu H. An improved Fourier-Mellin transform-based registration used in TDI-CMOS. IEEE Access. 2021;9:64165–78.
  6. Ali F, Sensoy A, Goodell JW. Identifying diversifiers, hedges, and safe havens among Asia Pacific equity markets during COVID-19: New results for ongoing portfolio allocation. Int Rev Econ Fin. 2023;85:744–92.
  7. Ali F, Khurram M, Sensoy A, Vo X. Green cryptocurrencies and portfolio diversification in the era of greener paths. Renew Sustain Energy Rev. 2024;191:114137.
  8. Ozbayoglu AM, Gudelek MU, Sezer OB. Deep learning for financial applications: A survey. Appl Soft Comput. 2020;93:106384.
  9. Zhang Z, Zohren S, Roberts S. Deep learning for portfolio optimization. arXiv preprint. 2020.
  10. Sang Y-F. A review on the applications of wavelet transform in hydrology time series analysis. Atmos Res. 2013;122:8–15.
  11. Çınar A, Tuncer SA. Classification of normal sinus rhythm, abnormal arrhythmia and congestive heart failure ECG signals using LSTM and hybrid CNN-SVM deep neural networks. Comput Methods Biomech Biomed Eng. 2021;24(2):203–14. pmid:32955928
  12. Reddy G, Reddy M, Lakshmanna K, Kaluri R, Rajput D, Srivastava G. Analysis of dimensionality reduction techniques on big data. IEEE Access. 2020;8:54776–88.
  13. Hasan B, Abdulazeez A. A review of principal component analysis algorithm for dimensionality reduction. J Soft Comput Data Mining. 2021;2(1):20–30.
  14. Antonakakis N, Chatziantoniou I, Gabauer D. Refined measures of dynamic connectedness based on time-varying parameter vector autoregressions. J Risk Fin Manage. 2020;13(4):84.
  15. Warsono W, Russel E, Wamiliana W, Widiarti W, Usman M. Vector autoregressive with exogenous variable model and its application in modeling and forecasting energy data: Case study of PTBA and HRUM energy. Int J Energy Econ Policy. 2019;9(2):390–8.
  16. Mangalathu S, Jeon J. Ground motion-dependent rapid damage assessment of structures based on wavelet transform and image analysis techniques. J Struct Eng. 2020;146(11):04020230.
  17. Dolas P, Sharma V. Generalized offset Fourier-Mellin transform & its analytical structure. Int J Innov Sci Eng Technol. 2020;7(10).
  18. Xu Q, Kuang H, Kneip L, Schwertfeger S. Rethinking the Fourier-Mellin transform: Multiple depths in the camera’s view. Remote Sens. 2021;13(5):1000.
  19. Sims CA. Macroeconomics and reality. Econometrica. 1980;48(1):1–48.
  20. Hamilton JD. Time series analysis. Princeton University Press; 2020.
  21. Lütkepohl H. New introduction to multiple time series analysis. Springer Science & Business Media; 2005.
  22. Kilian L. Structural vector autoregressions. In: Handbook of research methods and applications in empirical macroeconomics. Edward Elgar Publishing; 2013. p. 515–54.
  23. Phillips PC. Fully modified least squares and vector autoregression. Econometrica. 1995;63(5):1023–78.
  24. Dhingra V, Sharma A, Gupta SK. Sectoral portfolio optimization by judicious selection of financial ratios via PCA. Optim Eng. 2023:1–38.
  25. Jolliffe IT, Cadima J. Principal component analysis: A review and recent developments. Philos Trans A Math Phys Eng Sci. 2016;374(2065):20150202. pmid:26953178
  26. Jolliffe IT. Principal component analysis for special types of data. Springer; 2002.
  27. Bengio Y, Courville A, Vincent P. Representation learning: A review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013;35(8):1798–828. pmid:23787338
  28. Daubechies I. Ten lectures on wavelets. SIAM; 1992.
  29. Mallat S. A wavelet tour of signal processing. Academic Press; 1999.
  30. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80. pmid:9377276
  31. Yu Y, Si X, Hu C, Zhang J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019;31(7):1235–70. pmid:31113301
  32. Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press; 2016.
  33. Lilly J, Olhede S. Generalized Morse wavelets as a superfamily of analytic wavelets. IEEE Trans Signal Process. 2012;60(11):6036–41.
  34. Yoo Y, Baek JG. A novel image feature for the remaining useful lifetime prediction of bearings based on continuous wavelet transform and convolutional neural network. Appl Sci. 2018;8(7):1102.
  35. Sadouk L. CNN approaches for time series classification. In: Time series analysis – data, methods, and applications. 2019.
  36. StackExchange. Drawing a CNN with Tikz; 2019. Available from: https://tex.stackexchange.com/questions/439170/drawing-a-cnn-with-tikz.
  37. Chen QS, Defrise M, Deconinck F. Symmetric phase-only matched filtering of Fourier-Mellin transforms for image registration and recognition. IEEE Trans Pattern Anal Mach Intell. 1994;16(12):1156–68.
  38. Willmott C, Matsuura K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim Res. 2005;30(1):79–82.
  39. Hyndman R. Forecasting: Principles and practice. OTexts; 2018.
  40. Pontius R, Thontteh O, Chen H. Components of information for multiple resolution comparison between maps that share a real variable. Environ Ecol Stat. 2008;15:111–42.
  41. Pishro-Nik H. Introduction to probability, statistics, and random processes. Blue Bell, PA: Kappa Research LLC; 2014.
  42. Bickel PJ, Doksum KA. Mathematical statistics: Basic ideas and selected topics, volumes I–II package. Chapman and Hall/CRC; 2015.
  43. Lehmann EL, Casella G. Theory of point estimation. Springer Science & Business Media; 2006.
  44. James G, Witten D, Hastie T, Tibshirani R, Taylor J. An introduction to statistical learning: With applications in Python. Springer Nature; 2023.