Table 1.
Overview of the risk measures.
Fig 1.
LLM investment risk model.
Table 2.
Overview of risk measures.
Fig 2.
Example of ChatGPT investment advice for a 30-year-old investor with a high risk-taking tendency.
Fig 3.
Overview of methodological approach.
We query ChatGPT, Copilot, and Gemini for financial advice, parse the investment portfolio recommendations, and augment them with data from Yahoo Finance, Refinitiv Eikon, and FRED (“Federal Reserve Economic Data” repository) to assess the investment risks, financial performance, and language style of the received financial advice.
Fig 4.
LLM financial advice increases financial investment portfolio risks (Study 1).
We find above-benchmark (A) geographical cluster risk (N = 269), (B) sector cluster risk (N = 269), (C) trend-chasing risk (N = 270), and (D) active investment risk (N = 270). The violin plots and boxplots show the distribution of the respective investment risk by LLM. The black dot represents the mean, and the colored dots represent the values of individual portfolio samples. The dashed line marks the benchmark value.
Fig 5.
LLMs show higher cluster risks compared to the benchmark.
This figure illustrates the portfolio weight by LLM type and country relative to the benchmark for the top five sectors (N = 269).
Fig 6.
Risk-adjusted returns relative to benchmark.
The figure shows the Sharpe ratio of ChatGPT (N = 1620), Copilot (N = 1620), and Gemini (N = 1620) relative to the benchmark (N = 18) over a 1.5-year period following the initial data collection.
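As a reading aid for the Sharpe ratio comparison, the following is a minimal sketch of one common way to compute an annualized Sharpe ratio from periodic returns. It is not the authors' implementation: the function name `sharpe_ratio`, the default of 252 trading periods per year, and the use of the sample standard deviation are assumptions made for illustration.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a list of periodic simple returns.

    Assumptions (illustrative, not taken from the paper):
    - risk_free_rate is an annual rate, spread evenly across periods
    - volatility is the sample standard deviation of excess returns
    """
    if len(returns) < 2:
        raise ValueError("need at least two return observations")
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    volatility = statistics.stdev(excess)
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    # Scale the per-period ratio to an annual figure.
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


# Hypothetical return series, for illustration only.
portfolio = [0.01, -0.01, 0.02, 0.0]
benchmark = [0.005, 0.004, 0.006, 0.005]
difference = sharpe_ratio(portfolio) - sharpe_ratio(benchmark)
```

A negative `difference` would indicate that the portfolio earned less return per unit of volatility than the benchmark over the same period.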
Fig 7.
More current LLMs show similar investment portfolio risks.
For current versions of LLMs, we also find above-benchmark (A) geographical cluster risk (N = 359), (B) sector cluster risk (N = 357), (C) trend-chasing risk (N = 360), and (D) active investment risk (N = 270). The violin plots and boxplots show the distribution of the respective investment risk by LLM. The black dot represents the mean, and the colored dots represent the values of individual portfolios. The dashed line marks the benchmark value.
Fig 8.
Broad debiasing reduces the overall financial investment portfolio risks in LLM financial advice (Study 2b).
With a broad debiasing intervention, we find above-benchmark (A) geographical cluster risk (N = 180), (B) sector cluster risk (N = 180), and (D) active investment risk (N = 180), whereas (C) trend-chasing risk (N = 180) does not differ from the benchmark. The violin plots and boxplots show the distribution of the respective characteristic by condition. The black dot represents the mean, and the colored dots represent the values of individual portfolios. The dashed line marks the benchmark value.
Fig 9.
Incorporating a social responsibility goal in the prompt causes overleverage in utilities (A) and decreases the risk of a low ESG rating (B) (Study 3).
(A) Portfolio weight by condition and country relative to the benchmark for the top five sectors (N = 180). (B) The violin plots and boxplots show the distribution of the ESG score by condition (N = 180). The black dot represents the mean, and the colored dots represent the values of individual portfolios. The dashed line marks the ESG score of the benchmark.