Abstract
Research on the access of small and medium-sized enterprises (SMEs) to bank finance is vital for the euro area economy. SMEs dominate the European business sector, employing around 100 million people and accounting for more than half of Gross Domestic Product. Studies in the field often rely on the ECB/EC Survey on the Access to Finance of Enterprises (SAFE), and many employ probit or logit models with categorical dependent variables derived from SAFE. A review of this literature shows that hardly any study employs the simpler linear probability model (LPM), and that studies rarely provide evidence justifying the model selection process or the suitability of the chosen model. However, it is well known that different econometric models frequently yield different results, and the literature has no consensus on the best econometric approach. In addition, the literature lacks robustness tests to ensure model validity, underlining the need for a comprehensive review of the methodological framework that dominates SAFE data use. This paper addresses the identified research gap by introducing a robust methodological framework that helps researchers identify and choose an appropriate categorical model when using SAFE data. The study adds value to the extant literature by identifying four criteria that need to be considered when selecting among three common binary dependent variable models: the LPM, probit and logit. The findings show that the probit model was appropriate in all cases but that the LPM should not be disregarded, as it can be used in two cases: when considering the interaction between monetary policy and debt to assets, and between monetary policy and innovation. The use of the LPM is justified as a less complex econometric model, allowing for clearer communication of the results.
This robust approach to choosing the appropriate categorical dependent variable model when employing SAFE data helps research to support policy more effectively.
Citation: Finnegan M, Morales L (2024) A methodological framework for exploring SME finance with SAFE data. PLoS ONE 19(8): e0307361. https://doi.org/10.1371/journal.pone.0307361
Editor: Daphne Nicolitsas, University of Crete, GREECE
Received: March 15, 2024; Accepted: July 3, 2024; Published: August 29, 2024
Copyright: © 2024 Finnegan, Morales. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: A description of the data set and the third-party source: The survey on the access to finance of enterprises (SAFE) provides information on the latest developments in the financial situation of enterprises, documents trends in the need for and availability of external financing, and measures firms’ expectations about their selling prices, costs, and employment, as well as about euro area inflation. The survey results are broken down by firm size, sector, country, firm age, financial autonomy and ownership. The survey is conducted every quarter as of 2024 Q1 (formerly every six months), three quarters by the ECB covering euro area countries and once in cooperation with the European Commission covering all EU countries plus some neighbouring countries. If you would like to access the anonymised SAFE microdata, please fill in and return the confidentiality declaration form available at https://www.ecb.europa.eu/stats/ecb_surveys/safe/html/index.en.html#dd to the survey access team via email: survey.accesstofinance@ecb.europa.eu. Other data are publicly available from the ECB Statistical Data Warehouse, Eurostat and the International Monetary Fund. All relevant data are within the paper and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Small and medium-sized enterprises (SMEs), comprising 99.8% of firms in the euro area economy, play a crucial role in economic growth and employment. In 2022, they contributed 52% of the total value added (€3.95 billion) and represented 64% of total employment in the European Union’s non-financial business sector [2]. Euro area SMEs are especially bank-dependent as they find it difficult to borrow in the corporate bond market or raise capital in the stock market due to their opacity and associated risk [3, 4]. Therefore, euro area banks have an important intermediation role to play in supporting macroeconomic stability, given that bank credit is among the crucial determinants of SME survival and growth [5, 6]. Given this economic relevance and bank dependence, SMEs’ access to bank finance has been a major focus for researchers.
Much of the literature investigating SMEs’ access to bank finance constructs binary dependent variables using the ECB/EC Survey on the Access to Finance of Enterprises (SAFE). SAFE provides information on the latest developments in the financial situation of enterprises and documents trends in SMEs’ demand for and access to bank loans. It is a cross-sectional dataset with only a subset of the respondents to a given wave interviewed in another wave. In general, the literature constructs a categorical dependent variable from the survey’s Q7a and Q7b, which ask whether SMEs sought a loan and whether their request was granted [1, 7–15]. Much of the literature employs nonlinear models such as probit and, to a lesser extent, logit to estimate the categorical dependent variable. However, the literature lacks consensus and clarity regarding the optimal econometric framework that researchers should consider when using categorical dependent variables constructed from SAFE, even though it is well understood that different models can yield different results. Further, the literature does not employ a diagnostic framework to ensure that models are robust.
This paper contributes to the existing literature by introducing a methodological framework that helps researchers determine the preferable categorical dependent variable model when using SAFE data. Such an approach is necessary as more robust models inform policy more effectively. This paper considers the three most common models used by researchers: the linear probability model (LPM) and the nonlinear probit and logit models. It extends the recent work by Finnegan and Kapoor [1] by comparing their probit model to an LPM and a logit model. It suggests that researchers should consider using the LPM if four criteria are met, given its superior ease of interpretation. In the absence of these criteria, it suggests using nonlinear models such as logit and probit, with the choice between them based on which model performs better under a proposed comprehensive diagnostic framework. The remainder of this paper is structured as follows. Section 2 explores the literature surrounding SMEs’ access to finance and the methodological and diagnostic approach present in the SAFE literature. Section 3 provides insights into the SAFE dataset. Section 4 presents the methodological framework for choosing one model over another. Section 5 applies this framework to the empirical model and sample used by Finnegan and Kapoor [1]. Section 6 offers some concluding remarks.
2. Literature
The literature review examines SMEs’ access to bank finance using the SAFE dataset and discrete choice models. SAFE’s categorical survey responses, particularly from Q7a and Q7b, are often used to construct dependent variables for assessing whether SMEs sought a loan and whether it was granted. Table 1 shows that nonlinear models, especially probit (57%) and, to a lesser extent, logit (28%), have been commonly employed in the SAFE literature since the first article to use SAFE, by Artola and Genre [16]; the survey itself started in 2009. The linear probability model is rarely used (7.4%), and studies do not justify neglecting it.
Moreover, the literature offers no clarity regarding the methodological framework that leads towards more consistent and reliable estimations. Yet, in the econometric field, it is well documented that the choice of econometric framework can lead to different outcomes, and that some models perform better than others. No evidence is offered for why the probit model is the dominant methodology, with the literature justifying its use solely by the binary nature of the dependent variable [7–9, 11, 12, 14, 15].
Further, logit and probit models are very alike in that they generally yield similar results and have the same asymptotic properties [17, 18]. Therefore, there is no compelling reason to choose one over another [19], and it is often a matter of personal choice for the researcher for binary dependent variable models [17, 18]. In addition, there may be good reasons to employ an LPM given its ease of computation, interpretation and the fact that its estimated effects are often reasonable and in alignment with practice [20–22].
Table 1 below shows that the SAFE literature reports few diagnostic statistics, while Table 2 lists common measures of goodness of fit for binary dependent variables proposed in the econometrics literature and documents their scant use in the SAFE literature.
The core findings highlight that 50% of SAFE studies use a pseudo R2 as a goodness of fit measure, as the coefficient of determination R2 cannot be applied to nonlinear categorical dependent variable models [40–42]. However, Hemmert et al. [43] and Williams [44, 45] argue that reporting an unspecified pseudo R2 is meaningless given the plethora of existing measures and their definitional differences. Only Mac an Bhaird [25] and Guercio et al. [35] acknowledge that they employ McFadden’s pseudo R2, and none of the literature comments on these measures as indicators of goodness of fit. Some 18% of the SAFE literature uses the percentage of correct predictions (PCP), which applies a cut-off of 0.5 to the predicted probabilities: if the predicted probability is ≥ 0.5, the prediction is set to 1; otherwise it is set to 0 [40–42, 46].
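The cut-off rule behind the PCP can be illustrated with a minimal Python sketch. The function name and the toy outcomes and probabilities below are hypothetical and are not taken from SAFE or from any of the cited studies.

```python
def percent_correctly_predicted(y_true, p_hat, cutoff=0.5):
    """Share of observations whose thresholded prediction matches the outcome.

    y_true: list of observed 0/1 outcomes.
    p_hat:  list of predicted probabilities from any binary model.
    cutoff: predictions at or above this value are classified as 1.
    """
    if len(y_true) != len(p_hat) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(int(p >= cutoff) == y for y, p in zip(y_true, p_hat))
    return hits / len(y_true)


# Toy data (hypothetical): six firms, outcome 1 = credit constrained.
y = [1, 0, 1, 1, 0, 0]
p = [0.80, 0.30, 0.45, 0.60, 0.55, 0.10]
pcp = percent_correctly_predicted(y, p)  # four of the six firms are classified correctly
```

The third and fifth firms are misclassified only narrowly (0.45 and 0.55), which the PCP cannot distinguish from a badly wrong prediction.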
However, according to Herron [47], Menard [48] and Gelman and Hill [49], reporting PCP on its own is uninformative, and the estimated model needs to be compared to the null model. None of the SAFE literature reports the percentage reduction in error (PRE), which compares the predictive success of the estimated model (PCP) to that of a null model, that is, the proportion of observations in the modal category of the dependent variable (PMC) [48, 50, 51]. Further, the econometrics literature suggests reporting the expected PCP (ePCP) and expected PRE (ePRE) proposed by Herron [47], which deal with the arbitrary choice of the 0.5 cut-off used in the PCP and PRE [47, 52, 53]. These measures never appear in the SAFE literature. The receiver operating characteristic (ROC) graph, a technique for visualising, organising and selecting classifiers based on their performance, is another goodness of fit measure proposed in the econometrics literature [54–56], but it does not feature in the SAFE literature.
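A minimal Python sketch of these comparisons follows. The function names and toy data are hypothetical. The ePRE baseline used here, the ePCP of a null model that predicts the sample mean for every observation, is one common convention and is an assumption rather than a definition taken from the studies cited above.

```python
def pcp(y, p, cutoff=0.5):
    """Percentage of correct predictions at a fixed cut-off."""
    return sum(int(pi >= cutoff) == yi for yi, pi in zip(y, p)) / len(y)


def pmc(y):
    """Proportion of observations in the modal category (null-model accuracy)."""
    share_ones = sum(y) / len(y)
    return max(share_ones, 1.0 - share_ones)


def pre(y, p, cutoff=0.5):
    """Percentage reduction in error relative to always guessing the mode.

    Undefined (division by zero) if all outcomes fall in one category.
    """
    base = pmc(y)
    return (pcp(y, p, cutoff) - base) / (1.0 - base)


def epcp(y, p):
    """Expected PCP: average probability assigned to the observed outcome."""
    return sum(pi if yi == 1 else 1.0 - pi for yi, pi in zip(y, p)) / len(y)


def epre(y, p):
    """Expected PRE, using the ePCP of a mean-only null model as baseline (assumed convention)."""
    y_bar = sum(y) / len(y)
    base = epcp(y, [y_bar] * len(y))
    return (epcp(y, p) - base) / (1.0 - base)


# Toy data (hypothetical): outcome 1 = credit constrained.
y = [1, 0, 1, 1, 0, 0]
p = [0.80, 0.30, 0.45, 0.60, 0.55, 0.10]
results = {"PCP": pcp(y, p), "PMC": pmc(y), "PRE": pre(y, p),
           "ePCP": epcp(y, p), "ePRE": epre(y, p)}
```

Because ePCP and ePRE use the predicted probabilities directly, they do not depend on the 0.5 cut-off.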
Finally, the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), which penalise models for adding variables and can be used to assess both the LPM and nonlinear models, have become increasingly popular in the broader literature as measures of goodness of fit to distinguish among models [44, 57, 58]. However, AIC and BIC appear in only 7.1% of the SAFE literature.
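For reference, the standard definitions of the two criteria can be written as below. The symbols are chosen here for illustration: k is the number of estimated parameters, n the number of observations and L-hat the maximised likelihood of the model.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Lower values indicate the preferred model. Since ln n exceeds 2 once n is at least 8, BIC penalises additional parameters more heavily than AIC in any realistically sized sample.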
Table 2 also shows common inference tests, used to assess whether the model is significant, proposed in the econometrics literature for nonlinear binary dependent variable models, such as the Wald test [45, 48, 59] and the likelihood ratio test [43, 48, 60]. However, the SAFE literature is scant on diagnostic testing for the joint significance of variables: the Wald test is present in 25% of studies and the likelihood ratio test in just 14.3%, with no discussion of the results of these tests. A common test for the joint significance of variables in the LPM is the F test; however, the literature does not report the F test when LPMs are employed [14, 15].
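As a reminder of what these two tests compute, their textbook forms for a joint null hypothesis that q coefficients are zero are sketched below. The notation is an assumption made for illustration: the subscripts U and R denote the unrestricted and restricted models, and beta-hat_S and V-hat_S denote the estimates of the tested coefficients and their estimated covariance matrix.

```latex
\mathrm{LR} = 2\left(\ln\hat{L}_{U} - \ln\hat{L}_{R}\right) \;\overset{a}{\sim}\; \chi^{2}_{q},
\qquad
W = \hat{\beta}_{S}^{\top}\,\hat{V}_{S}^{-1}\,\hat{\beta}_{S} \;\overset{a}{\sim}\; \chi^{2}_{q}
```

Both statistics are asymptotically chi-squared with q degrees of freedom under the null. The likelihood ratio test requires estimating both models, whereas the Wald test needs only the unrestricted one.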
The reviewed literature underlines the importance of identifying a robust methodological framework that researchers investigating SMEs’ access to finance with SAFE data could use to choose one binary dependent variable model over another, given the lack of clarity in the SAFE literature on this choice. The probit and logit models are the focus of this study, given their relative dominance in the literature. While probit or logit may be preferred over the LPM given the well-documented problems of linear models estimating binary variables, there is a need to offer evidence on model performance that enables a research-informed process and supports researchers in assessing which model is the best fit for their study. Two initial considerations for the LPM are that its estimates are not constrained to the unit interval and that Ordinary Least Squares (OLS) estimation implies heteroskedastic errors in the case of a binary response variable [21, 53, 61].
Further, the LPM is identified as problematic because it assumes that P_i = E(Y | X) = P(Y = 1 | X) increases linearly with X; that is, the marginal or incremental effect of X is constant throughout, which may not be the case with a binary model [20, 62]. This paper explores whether these reasons suffice to elevate probit and logit models over the LPM in the context of SAFE data. Even though there are a number of problems associated with the LPM, and it is employed in only 7.4% of studies, this paper considers this model for two reasons. First, the results from OLS applied to the LPM are often similar to those from maximum likelihood applied to a probit or logit model when sample sizes are large, despite the unboundedness problem inherent in an LPM [21, 53, 63]. Indeed, estimated effects and predictions with the LPM are often reasonably good in practice [20–22]. Second, while probit and logit both capture the nonlinear nature of the population regression function better than the LPM, they are harder to interpret [22, 64]. This justifies the investigation of the LPM as an alternative to nonlinear models such as probit or logit, given its superior simplicity in the interpretation of results [22, 63, 65]. The next section explores the SAFE dataset.
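The contrast in interpretation can be made concrete with the standard textbook expressions for the three models and their marginal effects. The notation is assumed for illustration: x is the covariate vector, beta the coefficient vector, and Phi and phi the standard normal distribution and density functions.

```latex
\begin{aligned}
\text{LPM:}\quad    & P(Y=1\mid x) = x^{\top}\beta,
                    & \frac{\partial P}{\partial x_j} &= \beta_j \\
\text{Probit:}\quad & P(Y=1\mid x) = \Phi(x^{\top}\beta),
                    & \frac{\partial P}{\partial x_j} &= \phi(x^{\top}\beta)\,\beta_j \\
\text{Logit:}\quad  & P(Y=1\mid x) = \Lambda(x^{\top}\beta) = \frac{e^{x^{\top}\beta}}{1+e^{x^{\top}\beta}},
                    & \frac{\partial P}{\partial x_j} &= \Lambda(x^{\top}\beta)\bigl(1-\Lambda(x^{\top}\beta)\bigr)\beta_j
\end{aligned}
```

In the LPM the coefficient is itself the marginal effect, which is the source of its ease of interpretation, whereas in probit and logit the effect varies with x and has to be evaluated at chosen covariate values or averaged over the sample. The heteroskedasticity of the LPM follows from the binary outcome: the conditional variance of Y is x'beta (1 - x'beta), which changes with x.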
3. Data
SAFE provides information on the latest developments in the financial situation of enterprises and documents trends in SMEs’ demand for and access to bank loans; it is published every six months. There have been 29 SAFE waves, starting in 2009 after the financial crisis affected the euro area. The firm-level SAFE also includes firms’ responses to questions regarding their characteristics in terms of age, size, sector, turnover, ownership status and legal form. In addition, it includes SMEs’ own assessment of their credit risk. The SAFE sample includes only non-financial firms, and companies are selected randomly from the Dun & Bradstreet business register [66]. All survey-based percentages are weighted statistics that restore the proportions of the economic weight (in terms of employees) of each class size, economic activity and country [66].
The limitations of the database need to be considered when using SAFE. First, SAFE is a cross-sectional dataset with only a subset of the respondents to a given wave interviewed in another wave. This restricts the use of firm-fixed effects that would help to address omitted variable bias related to firm-specific heterogeneity [9]. Further, the publicly available SAFE is anonymised and does not identify firms or match them to their banks, unlike other data sources such as the credit registers used to study SMEs’ access to finance under unconventional monetary policy (UMP) [67–71]. Finally, the dataset provides mainly qualitative information consisting of subjective responses to survey questions, which may not be supported by balance sheet information [15].
Even so, it is argued that SAFE is a useful dataset for studying SMEs’ access to bank finance. First, SMEs are generally bank-dependent [13, 72, 73] and SAFE is a rich data source on SMEs’ access to bank finance. Second, it includes discouraged borrowers, giving a broader view of the credit markets than other data sources such as credit registers [9–11, 74]. Third, it is used extensively in the literature that studies SMEs’ access to bank finance. Finally, it is a very reliable data set given that it is conducted by the ECB/EC and is used by the ECB to evaluate the effects of its monetary policy interventions on SMEs [31, 75]. Indeed, the ECB conducts validity checks to ensure that survey answers are accurate [11]. The next section outlines a methodological framework for choosing between discrete choice models when using SAFE and applies it to an empirical model.
4. Methodology
This section expands on the broad literature that uses SAFE to explore SMEs’ access to bank finance, as outlined in Table 1. In particular, it is an extension of the study by Finnegan and Kapoor [1], which studies the impact of unconventional monetary policy (UMP) on SMEs’ access to bank finance, in terms of assessing the methodological framework when using binary dependent variables and SAFE. That study is chosen given its focus on the post-crisis period from 2014–2019, whereas the other literature examining UMP and SMEs tends to focus on the financial and/or sovereign debt crisis from 2008 [9, 11, 13, 31, 76].
Further, it expands on this work given that Finnegan and Kapoor [1] uniquely use measures of risk from the firms’ point of view to evaluate whether recently leveraged firms and risky firms are more credit-constrained in times of expansive UMP, as a counterpart to the risk-taking channel of monetary policy. The risk-taking channel describes how UMP can lead to excessive risk-taking [71, 77–79]. They find that UMP may trickle down to SMEs unevenly due to their location, even in a post-crisis environment for recently leveraged SMEs [1]. Section 4.1 summarises this empirical model, and Section 4.2 outlines the methodological framework for choosing one model over another.
4.1 Empirical model
Finnegan and Kapoor [1] use SAFE data to construct a binary dependent variable, credit constrained, and employ a probit model. This dependent variable equals 1 if the firm reported that it (i) applied for bank loans in the previous six months but was rejected (Credit Denied), (ii) applied but received less than 75% of the amount requested (Rationed), (iii) refused credit because it was offered at too high a cost (Refused due to high cost), or (iv) did not apply because of possible rejection (Discouraged). The variable equals 0 if the firm reported having applied for bank loans in the previous six months and received everything or 75% and above. Given that the indicator equals 1 if the firm is credit-constrained, a negative coefficient indicates that SMEs are less likely to be credit-constrained.
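The coding rule for the dependent variable can be sketched in Python as follows. The category names are hypothetical labels for the outcomes described above and do not reproduce the actual SAFE response codes for Q7a and Q7b; firms falling outside both groups (for example, those that did not apply for other reasons) are assumed to be excluded from the sample.

```python
from enum import Enum
from typing import Optional


class LoanOutcome(Enum):
    """Hypothetical labels for a firm's bank loan outcome over the previous six months."""
    RECEIVED_ALL = "applied and received everything"
    RECEIVED_MOST = "applied and received 75% or more"
    RATIONED = "applied and received less than 75%"
    REFUSED_HIGH_COST = "applied but refused the offer because the cost was too high"
    DENIED = "applied but was rejected"
    DISCOURAGED = "did not apply because of possible rejection"
    OTHER = "any other response"


CONSTRAINED = {LoanOutcome.DENIED, LoanOutcome.RATIONED,
               LoanOutcome.REFUSED_HIGH_COST, LoanOutcome.DISCOURAGED}
UNCONSTRAINED = {LoanOutcome.RECEIVED_ALL, LoanOutcome.RECEIVED_MOST}


def credit_constrained(outcome: LoanOutcome) -> Optional[int]:
    """Return 1 if constrained, 0 if unconstrained, None if outside the sample."""
    if outcome in CONSTRAINED:
        return 1
    if outcome in UNCONSTRAINED:
        return 0
    return None


# Example: code a handful of hypothetical responses and drop out-of-sample firms.
responses = [LoanOutcome.DENIED, LoanOutcome.RECEIVED_ALL,
             LoanOutcome.OTHER, LoanOutcome.DISCOURAGED]
coded = [v for v in map(credit_constrained, responses) if v is not None]
```

Keeping the two category sets explicit makes it easy to check that every response is assigned to exactly one of the three cases.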
They propose two hypotheses using the following empirical specifications:
Hypothesis 1 (H1): UMP decreases the probability of firms with increased debt-to-assets being credit-constrained.
To test H1, they model Eq 1 as follows: (1)
MP_{t-2} is the one-year lag (equivalent to two survey waves in SAFE) of the logarithm of the assets of individual central bank balance sheets, minus autonomous factors, for stressed countries. Debt-to-assets increased is a categorical variable, which equals 1 if the firm’s debt-to-assets ratio increased and 0 if it remained the same or decreased in the previous six months. For H1, γ′ is the coefficient on the interaction term and is the main coefficient of interest: it captures whether firms with increased debt-to-assets were less likely to be credit-constrained during UMP. It is expected that UMP should make accessing bank finance easier for leveraged firms via their improved balance sheets and collateral, and that this should translate into a reduction in credit constraints [80, 81]. A negative relationship is, therefore, expected between the probability that a firm is credit-constrained and the interaction between an increased debt-to-assets ratio and UMP. S1 Table outlines the main variables employed in the regressions for H1 and H2, with their definitions and data sources.
Hypothesis 2 (H2): UMP reduces the probability of risky firms being credit-constrained.
To test H2, they model Eq 2 as follows: (2)
For H2, the coefficient on the interaction between monetary policy and the firm risk variables, γ′, is the main coefficient of interest, as it captures the probability of a risky firm being credit-constrained during periods of UMP. FirmRisk_{i,c,t} is modelled categorically. The measures of firm risk include a forward-looking predictor of risk (profit decreased in the previous six months); a selection of subjective measures of risk (the firm’s own view of whether its credit history, own economic outlook and own capital deteriorated in the previous six months); and an activity-based measure of risk (innovative activity, given that such activity is more uncertain and therefore riskier). Given the low interest rate environment generated by UMP, banks are expected to chase higher yields. This will manifest in lending to riskier firms via the monetary policy risk-taking channel [76, 82]. A negative relationship is, therefore, expected between the probability that a firm is credit-constrained and the interaction between increased firm risk and UMP.
The model controls for confounding factors that might influence loan supply and loan demand, such as firm-level heterogeneity, the stage of the economic cycle and bank characteristics. X_{i,c,t} is a set of firm-level covariates to control for firm heterogeneity, with subscripts i, c and t indicating firm, country and time respectively. Macro_{c,t-2} is a vector of macroeconomic variables to control for the economic cycle. BankCh_{c,t-2} measures banks’ balance sheet health indicators at the country level, which affect credit supply and demand. Time-fixed effects are added when monetary policy is measured at the country level to exclude unobserved variables that evolve over time but are constant across firms. Further, sector-country fixed effects (τ_{c,s}) are included to eliminate any shocks common to all firms in the same sector and in the same country. The next section outlines the methodological framework for choosing among models.
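For orientation, a probit specification consistent with the variable definitions above can be sketched as follows for H1. This is an illustrative reconstruction under stated assumptions and not the equation as published in Finnegan and Kapoor [1]: only γ′ is named in the text, so the intercept α, the coefficient symbols β, δ, θ, φ and ψ, and the time effect λ_t are hypothetical notation.

```latex
\Pr\bigl(\text{CreditConstrained}_{i,c,t} = 1\bigr)
  = \Phi\Bigl(
      \alpha
      + \beta\, MP_{t-2}
      + \delta\, \text{DebtToAssetsIncreased}_{i,c,t}
      + \gamma'\,\bigl(MP_{t-2} \times \text{DebtToAssetsIncreased}_{i,c,t}\bigr)
      + \boldsymbol{\theta}^{\top} X_{i,c,t}
      + \boldsymbol{\phi}^{\top} \text{Macro}_{c,t-2}
      + \boldsymbol{\psi}^{\top} \text{BankCh}_{c,t-2}
      + \tau_{c,s}
      + \lambda_{t}
    \Bigr)
```

Under the same assumptions, the H2 counterpart replaces DebtToAssetsIncreased with FirmRisk_{i,c,t}, and the LPM and logit versions replace the standard normal distribution function with the identity and the logistic function respectively.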
4.2 Framework for choosing appropriate model
The framework proposed to guide researchers in their choice of a binary dependent variable model using SAFE draws on the econometrics literature regarding the LPM and non-linear models such as probit and logit. Fig 1 outlines the framework proposed. Put simply, it suggests that researchers should employ the LPM—given its ease of interpretation and computation and that its estimates are often reliable in practice—rather than probit or logit if four criteria are met. First, 100% of LPM probabilities should be bounded by 0–1 [21, 63, 83].
Second, strong correlations should exist between the LPM and nonlinear models such as probit and logit [21]. Third, the LPM estimates should be similar to those from nonlinear probability models [20–22, 63]. Fourth, the LPM should perform relatively well compared to the probit and logit across a comprehensive range of goodness of fit statistics and test statistics. If the LPM does not meet these four criteria, the researcher should choose the best-performing model from the probit and logit models using the diagnostic framework outlined in Table 2.
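The four criteria can be expressed as a simple decision rule. The Python sketch below is illustrative only: the function names, the numerical thresholds and the toy inputs are assumptions, the correlation is taken between predicted probabilities, and the fourth criterion is passed in as a judgement made by the researcher after comparing the diagnostics in Table 2.

```python
from math import sqrt


def share_in_unit_interval(p_lpm):
    """Criterion 1: share of LPM predicted probabilities lying in [0, 1]."""
    return sum(0.0 <= p <= 1.0 for p in p_lpm) / len(p_lpm)


def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lpm_is_candidate(p_lpm, p_probit, p_logit,
                     effect_lpm, effect_probit, effect_logit,
                     fit_comparable,
                     min_share=1.0, min_corr=0.99, max_gap=0.01):
    """Return True if the LPM meets all four criteria (thresholds are hypothetical).

    effect_lpm is the LPM coefficient of interest; effect_probit and effect_logit
    are the corresponding marginal effects from the nonlinear models.
    """
    bounded = share_in_unit_interval(p_lpm) >= min_share
    correlated = min(pearson(p_lpm, p_probit), pearson(p_lpm, p_logit)) >= min_corr
    similar = max(abs(effect_lpm - effect_probit),
                  abs(effect_lpm - effect_logit)) <= max_gap
    return bounded and correlated and similar and fit_comparable


# Toy inputs (hypothetical predicted probabilities and effects).
p_lpm = [0.05, 0.20, 0.41, 0.62, 0.88]
p_probit = [0.06, 0.19, 0.40, 0.63, 0.87]
p_logit = [0.07, 0.19, 0.40, 0.62, 0.86]
use_lpm = lpm_is_candidate(p_lpm, p_probit, p_logit, -0.021, -0.020, -0.022, True)
```

Setting `min_share` below 1.0 relaxes the first criterion to tolerate a small share of predictions outside the unit interval; how much slack is acceptable is a choice for the researcher.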
The goodness of fit statistics proposed are outlined in Table 2 and include McFadden’s pseudo R2, McFadden’s adjusted pseudo R2, the percentage of correct predictions (PCP), the percentage reduction in error (PRE), the expected PCP and PRE, the BIC and AIC, and the area under the ROC curve. McFadden’s pseudo R2 is chosen as this measure satisfies almost all of Kvålseth’s [41] eight criteria for a good R2 [40–42]. The inference tests proposed are the Wald test and the likelihood ratio test. The Lagrange multiplier test is also used for the joint significance of variables in the literature [44, 45]. The next section uses this proposed framework to choose among binary dependent variable models and applies it to the model specification used by Finnegan and Kapoor [1].
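Three of these statistics can be computed directly from the observed outcomes and predicted probabilities, as in the following Python sketch. The function names and toy data are hypothetical, the adjusted measure uses the common form that subtracts the number of estimated parameters from the model log-likelihood, and the area under the ROC curve is obtained by comparing every constrained firm with every unconstrained firm.

```python
from math import log


def bernoulli_loglik(y, p):
    """Log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(log(pi) if yi == 1 else log(1.0 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R2: 1 - lnL(model) / lnL(intercept-only model)."""
    return 1.0 - ll_model / ll_null


def mcfadden_adj_r2(ll_model, ll_null, k):
    """Adjusted version penalising the k estimated parameters."""
    return 1.0 - (ll_model - k) / ll_null


def auc(y, p):
    """Area under the ROC curve: share of (1, 0) pairs ranked correctly, ties count 0.5."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))


# Toy data (hypothetical): outcome 1 = credit constrained.
y = [1, 0, 1, 1, 0, 0]
p = [0.80, 0.30, 0.45, 0.60, 0.55, 0.10]
y_bar = sum(y) / len(y)

ll_model = bernoulli_loglik(y, p)
ll_null = bernoulli_loglik(y, [y_bar] * len(y))  # intercept-only benchmark
r2 = mcfadden_r2(ll_model, ll_null)
r2_adj = mcfadden_adj_r2(ll_model, ll_null, k=2)  # k = 2 is an arbitrary illustration
area = auc(y, p)
```

An area of 0.5 corresponds to a model that ranks firms no better than chance, and 1.0 to perfect separation of constrained and unconstrained firms.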
5. Findings
This section reports the results for the LPM, probit and logit models using the most saturated empirical model (with macro, bank and firm controls, country-sector and time fixed effects, and heteroskedasticity-robust standard errors) and the 2014–2019 SAFE sample used by Finnegan and Kapoor [1]. It aims to assess which model (LPM, probit or logit) is the most appropriate based on the methodological framework proposed in this paper. Section 5.1 presents the findings for H1: MP_{t-2} x debt to assets increased. Section 5.2 presents the findings for H2: MP_{t-2} x firm risk.
5.1. H1: MP_{t-2} x Debt to assets increased
Fig 2 shows the quantile-quantile plot of the predicted probabilities for the Debt to assets increased regression, probit versus LPM. At first glance, the unboundedness so common in the LPM is evident for this regression. On further investigation, Table 3 shows the range of predicted probabilities for H1 and the percentage of probabilities that fall within the range of 0 and 1.
Table 3. Range and percentage of probabilities 0–1 for LPM.
It can be seen that the predicted probabilities of 99.34% of observations fall inside the unit interval for this regression. If no (or very few) predicted probabilities lie outside the unit interval, then the LPM is expected to be unbiased and consistent (or largely so) [21, 63]. Table 4 shows the correlation coefficients for H1: MP_{t-2} x Debt to assets increased for logit, probit and LPM, and it can be seen that the correlations are over 0.995 and significant at the 1% level.
Table 5 shows the estimates for H1: MP_{t-2} x Debt to assets increased using the three econometric methods. It can be seen that the estimates do not change for the main variable of interest. Further, the estimates do not change for any of the control variables when estimated by the three different techniques; these more comprehensive results are reported in the S2–S6 Tables. The LPM is comparable in terms of PCP and PRE, and it is associated with a higher likelihood statistic. This regression meets the four criteria for choosing an LPM, which may be preferred given its easier interpretation.
5.2. H2: MP_{t-2} x firm risk
Firm risk is proxied using SAFE categorical variables: profit decreased, credit history deteriorated, own outlook deteriorated, own capital deteriorated and innovation (if the firm innovated in the previous six months). Fig 2 shows the quantile-quantile plots for probit versus LPM for all measures of risk, and it can be seen that in each case the LPM predictions are not confined to the (0,1) range. However, Table 3 shows that the predicted probabilities fall outside the unit interval only to a small degree in each case. Further, Table 4 shows the correlation coefficients for LPM, logit and probit, which are above 0.99 and significant at the 1% level. Tables 6 to 10 show the regressions for each interaction with the various measures of risk: profit decreased (Table 6), credit history deteriorated (Table 7), own outlook deteriorated (Table 8), own capital deteriorated (Table 9) and innovation (Table 10). The regressions for profit decreased, credit history deteriorated, outlook deteriorated and capital deteriorated do not meet all the criteria for an LPM: in general, the criterion of consistent estimates across the LPM and the nonlinear models is not met. In these cases, a nonlinear model is preferred. In each of these cases, the diagnostics indicate that logit and probit perform equally well across all goodness of fit statistics, but the probit model displays a higher Wald statistic and, therefore, is the most appropriate model for all regressions. MP_{t-2} x Innovation does satisfy the criteria for choosing an LPM: 99.6% of its predicted probabilities lie within the unit interval, the correlation coefficients between LPM, logit and probit are high and significant, the estimates are the same for the three methodologies, the goodness of fit statistics are comparable, and the likelihood ratio is higher for the LPM.
These findings illustrate a framework for choosing one binary dependent variable model over another. It is a valuable addition to the researcher’s toolkit, supporting a robust approach to deciding which model to employ when analysing SMEs’ credit constraints using SAFE data. The LPM is suitable for MP_{t-2} x Debt to assets and MP_{t-2} x Innovation when applied to the SAFE sample and empirical model presented in Finnegan and Kapoor [1], as it satisfies the four criteria of the diagnostic framework outlined in Fig 1. However, the probit model is more appropriate for MP_{t-2} x profit decreased, MP_{t-2} x credit history deteriorated, MP_{t-2} x outlook deteriorated and MP_{t-2} x capital deteriorated. Even though the LPM is suitable for MP_{t-2} x Debt to assets and MP_{t-2} x Innovation, a researcher who wishes to use a single nonlinear model for consistency in their reporting should choose the probit model, given its higher Wald statistic indicating a better model fit. Further, there may be other reasons to choose a probit model. Logit estimators are on the log odds scale, whereas the probit models provide probabilities that are easier to interpret [1, 61]. In addition, probit uses the normality assumption, allowing for easier analysis [20].
5.3 Multicollinearity and outliers
In the SAFE literature, there is barely any discussion of checks for multicollinearity and no discussion of outliers. In terms of multicollinearity, Sclip [11], Calabrese et al. [12] and Mac an Bhaird [25] (11.6% of studies) use the Variance Inflation Factor (VIF), while Mol-Gómez-Vázquez et al. [37] use correlation matrices to identify correlation between the independent variables. Outliers are not discussed, even though SAFE is a cross-sectional survey conducted across EU countries, each experiencing different economic cycles over time. It is well known that either phenomenon can distort regression results, and this study checks that neither multicollinearity nor outliers influence the results. In this study, correlation tables are used to identify pairwise correlations. The mean Variance Inflation Factor (VIF) is used to identify the extent to which a given explanatory variable can be explained by all the other explanatory variables in the equation.
The correlation tables in S7 Table indicate that, in general, pairwise correlation is not present to any great degree, with no correlation in excess of 0.8, which is considered acceptable [19, 84, 85]. However, the correlation tables show perfect multicollinearity between the interaction term and its constituent parts. This is because there is structural multicollinearity in the model, as the key variable of interest for both hypotheses is an interaction term: MP_{t-2} x Debt to assets increased for H1 and MP_{t-2} x FirmRisk for H2. The Variance Inflation Factor (VIF) also reflects this structural multicollinearity. Tables 5–10 show that when the interaction term is present the VIF is between 22 and 28. However, when the interaction term is excluded, the VIF falls below 5, which is considered to indicate an acceptable level of multicollinearity [53, 86, 87].
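For reference, the standard definition of the VIF for explanatory variable j is given below, where R_j^2 (notation assumed here) is the coefficient of determination from regressing that variable on all the other explanatory variables.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
```

A VIF of 5 therefore corresponds to R_j^2 = 0.8, and a VIF of 25 to R_j^2 = 0.96. An interaction term is by construction closely related to its constituent variables, which is why its presence pushes the VIF up sharply.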
One solution to interaction terms and structural multicollinearity cited in the literature is to mean centre a constituent variable (standardise the variable by subtracting the mean) [27, 53, 87]. However, this is not possible with a binary variable like DebttoAssets or FirmRisk. Another solution is to increase the sample size as logit and probit regression uses maximum likelihood estimation (MLE), which relies on large-sample asymptotic normality. This means that the reliability of estimates increases when the sample size is large enough [17, 18, 88]. However, the sample size (11,319, which drops to 8,777 on account of the monetary policy lag employed) cannot be increased as it is determined by the research focus on stressed countries and credit-constrained SMEs from 2014–2019.
This study therefore retains the interaction term and acknowledges the multicollinearity it necessarily introduces, given its importance in addressing the hypotheses. This is justified because standard errors are relatively small and the estimates do not change much when more variables are added to the model (S2–S6 Tables), indicating that multicollinearity does not compromise the model [19, 61, 89].
In terms of outliers, there is no discussion in the SAFE literature, despite SAFE being a cross-sectional survey across EU countries in which different countries experience different economic cycles over time. A single observation that is substantially different from all other observations can make a significant difference to the regression results [19, 53, 90]. Studies using SAFE need to consider outliers explicitly. This paper identifies outliers for each regression and re-runs the regressions excluding them to identify whether the estimates change. In particular, this study takes the following three steps to identify whether outliers distort the estimates:
- Identifies the number of outliers (observations whose deviance residual exceeds 3 in absolute value) and their average [91, 92]. Residuals greater than 3 in absolute value are in the tails of a standard normal distribution, which usually indicates strain in the model [91]. Deviance residuals can be roughly approximated with a standard normal distribution when the model holds [92].
- Plots the deviance residuals against the predicted values for credit constrained to identify any extreme values.
- Runs the regressions without the observations whose deviance residual exceeds 3 in absolute value.
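The first and third steps above can be sketched as follows. This is a minimal illustration, assuming the fitted probabilities from a probit or logit model are already available and lie strictly between 0 and 1. The outcomes, fitted probabilities and function names are hypothetical, and reading "their average" as the mean absolute residual of the flagged observations is an assumption. The plot in the second step and the re-estimation itself are left to the estimation software.

```python
import math


def deviance_residual(y, p):
    """Signed deviance residual for a binary outcome y (0 or 1) and fitted probability p.

    d = sign(y - p) * sqrt(-2 * log-likelihood contribution of the observation)
    """
    log_lik = math.log(p) if y == 1 else math.log(1.0 - p)
    sign = 1.0 if y == 1 else -1.0
    return sign * math.sqrt(-2.0 * log_lik)


def flag_outliers(outcomes, fitted, threshold=3.0):
    """Return all deviance residuals and the indices where |residual| > threshold."""
    residuals = [deviance_residual(y, p) for y, p in zip(outcomes, fitted)]
    flagged = [i for i, r in enumerate(residuals) if abs(r) > threshold]
    return residuals, flagged


# Hypothetical outcomes (1 = credit constrained) and fitted probabilities.
outcomes = [1, 0, 1, 0, 1, 0]
fitted = [0.80, 0.10, 0.005, 0.30, 0.60, 0.995]

residuals, flagged = flag_outliers(outcomes, fitted)

# Step 1: number of outliers and their average size (assumed: mean absolute residual).
mean_abs = sum(abs(residuals[i]) for i in flagged) / len(flagged) if flagged else 0.0
print("outliers:", len(flagged), "mean |deviance residual|:", round(mean_abs, 3))

# Step 3: indices of the observations to keep when re-running the regression.
kept = [i for i in range(len(outcomes)) if i not in flagged]
print("observations kept for re-estimation:", kept)
```

For a binary outcome, a deviance residual above 3 in absolute value corresponds to an observed outcome to which the model assigned a probability below roughly 1.1%, which is why such observations are treated as signs of strain in the model.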
Tables 5–10 show the estimates for the interaction terms for both H1 and H2 when observations with a deviance residual greater than 3 in absolute value are excluded; the main results are very similar, which gives confidence in the results [91]. The next section concludes the research study.
6. Conclusion
SMEs’ access to bank finance has been a significant focus for researchers, given that SMEs are critical for the euro area economy and are primarily bank dependent. SAFE data, especially responses to Q7a and Q7b, which document the outcomes of SMEs’ applications for bank loans, have been employed extensively in the literature to study SME access to bank finance. Most of these studies construct a categorical dependent variable from SAFE data, with probit emerging as the dominant model (57%), followed by logit (28%) and then the LPM (7.4%). However, the literature fails to justify its methodological approach, simply asserting that a categorical dependent variable requires a nonlinear model without considering the greater simplicity of interpreting the LPM. Further, there is a dearth of goodness-of-fit and inference test statistics in the literature to determine whether the models fit well or are statistically significant. There is thus a need to provide insights into analysis using SAFE data and binary dependent models. This research offers an initial contribution to understanding an appropriate methodological framework in the context of SAFE data and categorical dependent variables. It adds to the literature by providing a framework for researchers to choose one model over another, identifying four criteria for the LPM to be considered an appropriate model. Further, it provides a diagnostic framework that researchers can deploy to ensure that their models perform robustly. The study applies this framework to the sample and empirical model presented in Finnegan and Kapoor [1] and finds that their probit model was appropriate in all cases. However, the analysis shows that Finnegan and Kapoor [1] could have used an LPM in two cases: when considering the interaction between monetary policy and debt to assets, and between monetary policy and innovation.
The use of the LPM is justified as a less complex econometric model, allowing for clearer communication of the results. By proposing a robust methodological framework for using SAFE to investigate SMEs’ access to bank finance, this paper adds to the researcher’s toolkit and supports further robust research on SMEs’ access to finance within Europe.
Some limitations of this study require further research to keep building a solid framework for using SAFE data in policy making. This study employs robust standard errors but does not investigate the rationale for this standard error treatment. Other standard error treatments, such as clustering at the country level, may be more appropriate, as these regressions combine the aggregate effect of monetary policy on micro units by merging aggregate data with micro-observations from SAFE [93]. Further, this study does not employ firm fixed effects, in line with the majority of the literature that uses SAFE, given the cross-sectional nature of the SAFE firm data. Using firm fixed effects would further isolate the impact of UMP on SME bank liquidity limitations by absorbing any firm-specific credit demand shocks. The appropriate use of standard errors and firm fixed effects is worthy of further investigation.
Supporting information
S1 Table. Variables, definition and data source.
https://doi.org/10.1371/journal.pone.0307361.s002
(DOCX)
References
- 1. Finnegan M, Kapoor S. ECB unconventional monetary policy and SME access to finance. Small Business Economics. 2023. pmid:38625295
- 2. EC. European Union SME Fact Sheet. 2023.
- 3. Langfield S, Pagano M. Bank bias in Europe: effects on systemic risk and growth. Economic Policy. 2016;31(85):51–106.
- 4. Hoffmann M, Maslov E, Sørensen B. Small Firms and Domestic Bank Dependence in Europe’s Great Recession. Journal of International Economics. 2022;137.
- 5. Gerlach-Kristen P, O’Connell B, O’Toole C. Do credit constraints affect investment and employment? The Economic and Social Review. 2015;46(1).
- 6. Kallandranis C, Kalantonis P. A New Perspective in Credit Rationing for SMEs. In: Kostyuk A, Guedes M, Govorun D, editors. Corporate Governance: Examining Key Challenges and Perspectives. Sumy, Ukraine: Virtus Interpress; 2020. p. 222–4.
- 7. Ferrando A, Popov A, Udell GF. Sovereign stress and SMEs’ access to finance: Evidence from the ECB’s SAFE survey. Journal of Banking & Finance. 2017;81:65–80.
- 8. Kaya O, Masetti O. Small and medium sized enterprise financing and securitization: Firm-level evidence from the euro area: SME financing. Economic Inquiry. 2018;57(8).
- 9. Ferrando A, Popov A, Udell GF. Do SMEs benefit from unconventional monetary policy and how? Microevidence from the eurozone. Journal of Money, Credit and Banking. 2019;51(4):895–928.
- 10. McQuinn J. SME access to finance in Europe: structural change and the legacy of the crisis, Central Bank of Ireland: Research Technical Paper, No. 10. 2019.
- 11. Sclip A. Do SMEs benefit from the corporate sector purchase program? Evidence from the eurozone. The European Journal of Finance. 2022;28(12):1212–36.
- 12. Calabrese R, Girardone C, Sclip A. Financial fragmentation and SMEs’ access to finance. Small Business Economics. 2021;57:2041–65.
- 13. Betz F, De Santis R. ECB corporate QE and the loan supply to bank dependent firms. International Journal of Central Banking. 2022;18:107–48.
- 14. Corbisiero G, Faccia D. Firm or bank weakness? Access to finance since the European sovereign debt crisis. European Central Bank, Working Paper Series, No. 2361. 2020.
- 15. Ferrando A, Mulier K. The real effects of credit constraints: Evidence from discouraged borrowers. Journal of Corporate Finance. 2022;73:102171.
- 16. Artola C, Genre V. Euro Area SMEs under financial constraints: Belief or Reality? CESifo: Working Paper Series, No. 3650. 2011.
- 17. Hahn ED, Soyer R. Probit and logit models: Differences in the multivariate realm. The Journal of the Royal Statistical Society, Series B. 2005;67:1–12.
- 18. Borooah V. Logit and Probit. Thousand Oaks, California: Sage; 2002. Available from: https://methods.sagepub.com/book/logit-and-probit.
- 19. Gujarati D, Porter D. Basic Econometrics. 5th ed. London: McGraw-Hill Irwin; 2009.
- 20. Wooldridge J. Introductory Econometrics: A Modern Approach. United Kingdom: South Western College Publishing, Thomson Learning; 2000.
- 21. Friedman J. Whether to probit or to probe it: in defense of the Linear Probability Model 2012 [cited 2022 01.02.22]. Available from: https://blogs.worldbank.org/impactevaluations/whether-to-probit-or-to-probe-it-in-defense-of-the-linear-probability-model.
- 22. Von Hippel P. When can you fit a Linear Probability Model? More often than you think. Statistical Horizons 2017 [cited 2022 20.02.22]. Available from: https://statisticalhorizons.com/when-can-you-fit/#:~:text=As%20a%20rule%20of%20thumb,log%20odds%20is%20almost%20linear.
- 23. Holton S, Lawless M, McCann F. Firm credit in the euro area: a tale of three crises. Applied Economics. 2014;46(2):190–211.
- 24. Ferrando A, Mulier K. Firms’ Financing Constraints: Do Perceptions Match the Actual Situation? Economic and Social Review. 2015;46.
- 25. Mac an Bhaird C, Vidal JS, Lucey B. Discouraged borrowers: Evidence for Eurozone SMEs. Journal of International Financial Markets, Institutions and Money. 2016;44:46–55.
- 26. Demoussis M, Drakos K, Giannakopoulos N. The impact of sovereign ratings on euro zone SMEs’ credit rationing. Journal of Economic Studies. 2017;44(5):745–64.
- 27. Galli E, Mascia DV, Rossi SPS. Does Corruption Influence the Self-Restraint Attitude of Women-led SMEs towards Bank Lending? CESifo Economic Studies. 2018;64(3):426–55.
- 28. García-Posada Gómez M. Credit constraints, firm investment and employment: Evidence from survey data. Journal of Banking & Finance. 2019;99:121–41.
- 29. Guercio M, Martinez L, Bariviera A. SME steeplechase: When obtaining money is harder than innovating. International Journal of Financial Studies. 2019;7(2):25.
- 30. Mc Namara A, O’Donohoe S, Murro P. Lending infrastructure and credit rationing of European SMEs. The European Journal of Finance. 2020;26(7–8):728–45.
- 31. Ertan A, Kleymenova A, Tuijn M. Financial Intermediation Through Financial Disintermediation: Evidence from the ECB Corporate Sector Purchase Programme. Fama-Miller Working Paper, Chicago Booth Research Paper, No. 18–06. 2020.
- 32. Beyhaghi M, Firoozi F, Jalilvand A, Samarbakhsh L. Components of credit rationing. Journal of Financial Stability. 2020;50:100762.
- 33. Moro A, Maresch D, Fink M, Ferrando A, Piga C. Spillover effects of government initiatives fostering entrepreneurship on the access to bank credit for entrepreneurial firms in Europe. Journal of Corporate Finance. 2020;62:101603.
- 34. Galli E, Mascia D, Rossi S. Bank credit constraints for women‐led SMEs: Self‐restraint or lender bias. European Financial Management. 2020;26(4):1147–88.
- 35. Guercio B, Martinez L, Bariviera F, Scherger V. Credit Crunch or Loan Demand Shortage: What Is the Problem with the SMEs’ Financing? Czech Journal of Economics and Finance. 2020;70(6):521–40. Available from: https://journal.fsv.cuni.cz/mag/article/show/id/1475.
- 36. Ferrando A, Ganoulis I, Preuss C. What were they thinking? Firms’ expectations since the financial crisis. Review of Behavioral Finance. 2021;13(4):370–85.
- 37. Mol-Gómez-Vázquez A, Hernández-Cánovas G, Koëter-Kant J. Banking stability and borrower discouragement: a multilevel analysis for SMEs in the EU-28. Small Business Economics. 2022;58:1579–93.
- 38. Kallandranis C, Drakos K. Self-Rationing in European Businesses: Evidence from Survey Analysis. Finance Research Letters. 2021;41:101807.
- 39. Kallandranis C, Anastasiou D, Drakos K. Credit rationing prevalence for Eurozone firms. Journal of Business Research. 2023;158:113640.
- 40. Menard S. Coefficients of Determination for Multiple Logistic Regression Analysis. The American Statistician. 2000;54(1):17–24.
- 41. Kvålseth TO. Cautionary Note about R2. The American Statistician. 1985;39(4):279–85.
- 42. Allison PD. Measures of fit for logistic regression. Cary, NC: SAS Institute Inc; 2014.
- 43. Hemmert GAJ, Schons LM, Wieseke J, Schimmelpfennig H. Log-likelihood-based Pseudo-R2 in Logistic Regression: Deriving Sample-sensitive Benchmarks. Sociological Methods & Research. 2018;47(3):507–31.
- 44. Williams R. Scalar Measures of Fit, Pseudo R2 and Information Measures (AIC and BIC). 2020 [cited 2023 03.04.23]. Available from: Measures of Fit (nd.edu).
- 45. Williams R. Logistic Regression, Part III: Hypothesis Testing, Comparison to OLS. 2021 [cited 2022 03.03.22]. Available from: https://www3.nd.edu/~rwilliam/stats3/Logit03.pdf.
- 46. Long JS, Freese J. Regression models for categorical dependent variables using Stata. 3rd ed. Stata Press; 2014.
- 47. Herron MC. Postestimation Uncertainty in Limited Dependent Variable Models. Political Analysis. 1999;8(1):83–98.
- 48. Menard S. Applied Logistic Regression Analysis. Thousand Oaks, California: Sage; 2002. Available from: https://methods.sagepub.com/book/applied-logistic-regression-analysis.
- 49. Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press; 2006.
- 50. Hagle T, Mitchell G II. Goodness of Fit Measures for Probit and Logit. American Journal of Political Science. 1992;36(3).
- 51. Sweeny K, Bartels B. Post estimation techniques in statistical analysis: Introduction to Clarify and S-Post in STATA 2004. Available from: https://polisci.osu.edu/sites/polisci.osu.edu/files/SPostPresentation2.pdf.
- 52. Greenhill B, Ward M, Sacks A. The separation plot: A new visual method for evaluating the fit of binary models. American Journal of Political Science. 2011;55(4):991–1002.
- 53. Studenmund A. A practical guide to using econometrics. 7th ed. London: Pearson; 2017.
- 54. Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters. 2006;27(8):861–74.
- 55. Brown CD, Davis HT. Receiver operating characteristics curves and related decision measures: A tutorial. Chemometrics and Intelligent Laboratory Systems. 2006;80(1):24–38.
- 56. Nahm FS. Receiver operating characteristic curve: overview and practical use for clinicians. Korean J Anesthesiol. 2022;75(1):25–36. pmid:35124947. PMCID: PMC8831439.
- 57. Kuha J. AIC and BIC: Comparisons of Assumptions and Performance. Sociological Methods & Research. 2004;33(2):188–229.
- 58. Mohammed EA, Naugler C, Far BH. Chapter 32 - Emerging Business Intelligence Framework for a Clinical Laboratory Through Big Data Analytics. In: Tran QN, Arabnia H, editors. Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology. Boston: Morgan Kaufmann; 2015. p. 577–602.
- 59. Aldrich JH, Nelson FD. Linear probability, logit, and probit models. Sage; 1984.
- 60. Hosmer D, Lemeshow S. Applied Logistic Regression. 2nd ed. Wiley; 2000.
- 61. Stock J, Watson M. Introduction to Econometrics. 4th ed. London: Pearson; 2020.
- 62. Gujarati D. Basic Econometrics. 4th ed. London: McGraw-Hill; 2003.
- 63. Wooldridge J. Econometric Analysis of Cross Section and Panel Data. London: The MIT Press; 2002.
- 64. Hanck C, Arnold M, Gerber A, Schmelzer M. Introduction to Econometrics with R: Open Review; 2021.
- 65. Gomila R. Logistic or linear? Estimating causal effects of experimental treatments on binary outcomes using regression analysis. Journal of Experimental Psychology: General. 2021;150(4):700. pmid:32969684
- 66. ECB. Survey on the access to finance of enterprises. Methodological information on the survey and user guide for the anonymised micro dataset. 2023. Available from: https://www.ecb.europa.eu/stats/pdf/surveys/sme/ecb.safemi.en.pdf.
- 67. Albertazzi U, Marchetti D. Credit Supply, Flight to Quality and Evergreening: An Analysis of Bank-Firm Relationships after Lehman. Bank of Italy Economic Working Paper (Temi di discussione), No. 756. 2010.
- 68. Iyer R, Peydró J-L, da-Rocha-Lopes S, Schoar A. Interbank Liquidity Crunch and the Firm Credit Crunch: Evidence from the 2007–2009 Crisis. The Review of Financial Studies. 2014;27(1):347–72.
- 69. Arce Ó, Mayordomo S, Gimeno R. Making Room for the Needy: The Credit-Reallocation Effects of the ECB’s Corporate QE. Review of Finance. 2021;25(1):43–84.
- 70. Benetton M, Fantino D. Targeted monetary policy and bank lending behavior. Journal of Financial Economics. 2021;142(1):404–29.
- 71. Jimenez G, Ongena S, Peydro J-L, Saurina J. Hazardous times for monetary policy: What do twenty‐three million bank loans say about the effects of monetary Policy on credit risk‐taking? Econometrica. 2014;82(2):563–05.
- 72. Hoffmann M, Maslov E, Sørensen BE. Small firms and domestic bank dependence in Europe’s great recession. Journal of International Economics. 2022;137:103623.
- 73. Mkhaiber A, Werner R. The relationship between bank size and the propensity to lend to small firms: New empirical evidence from a large sample. Journal of International Money and Finance. 2021;110:102281.
- 74. Popov A. Monetary policy, bank capital and credit supply: A role for discouraged and informally rejected firms. International Journal of Central Banking. 2016;43.
- 75. Bańkowska K, Ferrando A, García J. Access to finance for small and medium-sized enterprises since the financial crisis: evidence from survey data. ECB Economic Bulletin, No. 4. 2020.
- 76. Ferrando A, Popov A, Udell GF. Unconventional monetary policy, funding expectations, and firm decisions. European Economic Review. 2022 2022/10/01/;149:104268.
- 77. Borio C, Zhu H. Capital regulation, risk-taking and monetary policy: A missing link in the transmission mechanism. Journal of Financial Stability. 2012;8:236–51.
- 78. Bruno V, Shin HS. Capital flows and the risk-taking channel of monetary policy. Journal of Monetary Economics. 2015;71:119–32.
- 79. Coimbra N, Rey H. Financial Cycles with Heterogeneous Intermediaries. National Bureau of Economic Research, Working Paper No. 23245. 2017.
- 80. Caglio C, Darst M, Kalemli-Ozcan S. Risk-Taking and Monetary Policy Transmission: Evidence from Loans to SMEs and Large Firms. CEPR Discussion Paper, No. DP17174. 2022.
- 81. Durante E, Ferrando A, Vermeulen P. Monetary policy, investment and firm heterogeneity, European Central Bank: Working Paper Series, No.2390. 2020.
- 82. Albertazzi U, Becker B, Boucinha M. Portfolio rebalancing and the transmission of large-scale asset purchase programs: Evidence from the Euro area. Journal of Financial Intermediation. 2021;48:100896.
- 83. Horrace W, Oaxaca R. Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters. 2006;90(3):321–7.
- 84. Dougherty C. Introduction to econometrics. 5th ed. Oxford, United Kingdom: Oxford University Press; 2016.
- 85. Verbeek M. A Guide to Modern Econometrics. Wiley; 2017.
- 86. Kim JH. Multicollinearity and misleading statistical results. Korean J Anesthesiol. 2019;72(6):558–69. pmid:31304696. PMCID: PMC6900425.
- 87. Frost J. Regression Analysis: An intuitive guide for using and interpreting linear models. 1st ed. Statistics By Jim Publishing; 2019.
- 88. Midi H, Sarkar SK, Rana S. Collinearity diagnostics of binary logistic regression model. Journal of Interdisciplinary Mathematics. 2010;13(3):253–67.
- 89. Kennedy P. A Guide to Econometrics. 6th ed. London: Wiley-Blackwell; 2008.
- 90. Pedace R. Econometrics for Dummies. London: Wiley; 2013.
- 91. Ford C. Understanding Deviance Residuals. Virginia, USA; 2022 [cited 2022 20.04.22]. Available from: https://library.virginia.edu/data/articles/understanding-deviance-residuals#:~:text=Deviance%20residuals%20measure%20how%20much,the%20observed%20proportions%20of%20successes.
- 92. Agresti A. Categorical Data Analysis. 3rd ed. John Wiley & Sons; 2013.
- 93. Moulton BR. An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units. The Review of Economics and Statistics. 1990;72(2):334–8.