Is publicly-reported firm-level trade data reliable? Evidence from the UK

In this paper we compare firms’ self-reported overseas sales, as reported in a commonly used UK financial reporting dataset, with their actual exports, as reported by Her Majesty’s Revenue and Customs (HMRC). Finding that these flows differ substantially along several dimensions, we then explore the implications of these differences more formally. Since several studies within the international trade literature report findings based on the self-reported export values in financial datasets, we discuss these findings in light of the departure of financial dataset-based exports from “true” (HMRC) export values.


Introduction
Firm-level datasets have been increasingly used to explore questions related to international trade. The most common sources for these data are customs records and surveys by national government statistics agencies, while the next most commonly used data come from the self-reports of firms in their end-of-fiscal-year financial reports. These reports are made available to researchers and others by several private data providers, and are available for many countries and regions including the U.S. (CompuStat), the U.K. and Ireland (FAME), Germany (dafne), India (Prowess), the Americas and Asia (Orbis), Russia, Ukraine and Kazakhstan (Ruslana), and Europe (Amadeus), to name just a few. Here we explore the reliability of these financial datasets for international trade research, focusing on the UK and the Bureau van Dijk dataset FAME. In particular, we are motivated by the possibility that these self-reported data may be systematically misreported, which may be very important to the extent that policy decisions are informed by estimates derived from these datasets.
While financial datasets are used as a source of export information for several countries, the UK's FAME database is one of the most widely used. This is in part because UK customs data have only recently become accessible to researchers and the UK's main firm-level production survey (the ABI/ABS) did not contain any information about goods exports until 2011 (and only a binary indicator for export status since then). Researchers have used FAME to explore a range of questions, many of which address fundamental issues within the international trade literature, highlighting the need to understand the extent to which the data accurately reflect the UK economy. Recent work has explored the impact of exporting on R&D [1]; the impact of the financial crisis on exporting [2]; the relationship between the financial health of a firm, exporting, and firm survival [3,4]; the relationship between exporting and agglomeration economies [5]; the role of exchange rate uncertainty in the export decision [6]; the magnitude of learning-by-exporting [7]; the contribution of exporting to UK productivity growth [8]; the relationship between exporting and firm exit [9]; and firm heterogeneity in barriers to exporting [10]. FAME has also been used extensively for the evaluation of export promotion policies [11,12].

Data
We compare patterns of overseas sales across two data sources, restricting our analysis to the manufacturing sector in accordance with most of the literature. The first source is the UK's FAME data, a financial reporting dataset produced by Bureau van Dijk Electronic Publishing, which includes balance sheet information for nearly all UK firms. In addition to reporting a long list of variables related to firm performance and firm finance, FAME also reports "overseas turnover", a variable that primarily captures export sales but also includes the local (overseas) sales associated with the foreign affiliate of a UK firm. This is the variable used as a proxy for export status in the studies listed above and we will refer to it as either "overseas turnover" or simply "exports" throughout this note.
We compare and contrast these FAME-reported values with those reported by another source of export information: the universe of UK transaction-level exports, collected and housed by HMRC. These data are derived from customs declaration forms associated with the physical shipment of goods across borders and should provide a more accurate picture than self-reported exports. In addition, they are not contaminated by the inclusion of local affiliate sales, which are conceptually different from exports.
We merge monthly HMRC transaction-level exports covering the period 2007 to 2010 with FAME, using a common firm identifier. The merged dataset contains two export variables: overseas sales from FAME and data on actual exports from HMRC. Throughout the analysis we also exploit additional firm-level variables such as assets, employment or sales reported by FAME. One issue is that HMRC exports are associated with a trader identification number, which in 26 percent of cases is associated with more than one FAME identifier (HMRC trade flows need to be aggregated to the enterprise group level to be matched to FAME, and these groups often encompass several enterprises). We therefore perform our analysis with a sample that aggregates FAME variables up to the level of each unique trader identification number. We also performed the analysis on the sample of unique FAME-to-HMRC matches (the 74 percent of cases), with very similar results.
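The aggregation and merge described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run on the confidential data: the field names (`fame_id`, `trader_id`, `overseas_turnover`, `assets`, `value`), the `trader_of` lookup table, and the choice to drop FAME firms without a trader link are all hypothetical, and the sample records are invented.

```python
from collections import defaultdict

def aggregate_fame_to_trader(fame_rows, trader_of):
    """Sum FAME variables over all FAME firms sharing one trader ID and year."""
    totals = defaultdict(lambda: {"overseas_turnover": 0.0, "assets": 0.0})
    for row in fame_rows:
        trader = trader_of.get(row["fame_id"])
        if trader is None:
            continue  # assumption: firms without a trader link are dropped
        key = (trader, row["year"])
        totals[key]["overseas_turnover"] += row["overseas_turnover"]
        totals[key]["assets"] += row["assets"]
    return dict(totals)

def annual_hmrc_exports(transactions):
    """Collapse monthly transaction-level exports to trader-year totals."""
    totals = defaultdict(float)
    for t in transactions:
        totals[(t["trader_id"], t["year"])] += t["value"]
    return dict(totals)

def merge_sources(fame_by_trader, hmrc_by_trader):
    """Attach HMRC exports (0.0 if none recorded) to each FAME trader-year."""
    return {
        key: dict(fame_vars, hmrc_exports=hmrc_by_trader.get(key, 0.0))
        for key, fame_vars in fame_by_trader.items()
    }

# Invented records: FAME firms F1 and F2 share trader T1; F3 maps to T2.
fame_rows = [
    {"fame_id": "F1", "year": 2008, "overseas_turnover": 100.0, "assets": 500.0},
    {"fame_id": "F2", "year": 2008, "overseas_turnover": 50.0, "assets": 200.0},
    {"fame_id": "F3", "year": 2008, "overseas_turnover": 0.0, "assets": 80.0},
]
trader_of = {"F1": "T1", "F2": "T1", "F3": "T2"}
transactions = [
    {"trader_id": "T1", "year": 2008, "month": 1, "value": 40.0},
    {"trader_id": "T1", "year": 2008, "month": 2, "value": 30.0},
]
merged = merge_sources(
    aggregate_fame_to_trader(fame_rows, trader_of),
    annual_hmrc_exports(transactions),
)
```

Each entry of `merged` then carries both export measures for one trader-year, which is the unit of observation used in the comparisons that follow.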

Comparing the FAME and HMRC data: Export status and export values
We begin by asking how well FAME captures some basic facts about export activity. Tables 1 and 2 present an initial comparison of the mean differences in firm activity between exporters and non-exporters in the HMRC data (Table 1) and FAME data (Table 2). We see that firms that report positive exports in FAME are on average larger than the set of exporters in the HMRC data. Figs 1 through 4 provide a more detailed look across the firm size distribution. These figures illustrate the extent to which FAME self-reported exports deviate from the true distribution of exports by comparing the value of exports and number of exporters reported in both FAME and HMRC, by quartile of firm total assets. Total assets is the only variable available for the universe of firms in FAME and is used as a proxy for size throughout. Fig 1 shows that there are more HMRC-reporting (actual) exporters than FAME-reporting exporters; this holds for each year in the sample and across the firm size distribution. FAME-reporting exporters appear to be especially scarce among the smallest firms. Fig 2 then narrows the focus to the top percentiles, where the largest disparities are again among the smallest of the large firms. This suggests that the FAME data vastly under-represent the number of exporters in all categories of firm size except for the largest 1 percent. Note that exporters are only required to report intra-EU exports to HMRC if they exceed an annual threshold (£250,000 in 2016). This implies that HMRC might also underestimate the number of actual exporters, suggesting that the disparity between FAME and the true figure may be even greater than reported here.
Across most of the distribution of export volumes there is little difference between FAME-reported values and true HMRC values. However, for the top quartile of firms as measured by assets, FAME-reported export sales vastly overstate both total UK exports and the importance of large firms in total exports (Fig 3). Furthermore, Fig 4 shows that the overstatement of exports among large firms in FAME is entirely driven by the concentration of export value among the very largest firms (the top 1 percent). Given the well-documented concentration of large multinational enterprises among the largest firms, the most likely explanation for this pattern is that the inclusion of local affiliate sales in FAME leads to a substantial overestimate of export values at the top of the firm size distribution.
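The quartile comparison underlying Figs 1 and 3 can be illustrated with a short Python sketch. It assumes a merged firm-level list with hypothetical keys `assets`, `fame_exports` and `hmrc_exports`, and it assigns quartiles by rank of total assets, which is one simple convention and not necessarily the one used for the figures. The eight firms are made-up numbers, chosen only to mimic the qualitative pattern described above.

```python
from collections import defaultdict

def compare_by_size_quartile(firms):
    """Count exporters and sum export values by asset quartile, per source."""
    ranked = sorted(firms, key=lambda f: f["assets"])
    n = len(ranked)
    summary = defaultdict(lambda: {
        "fame_exporters": 0, "hmrc_exporters": 0,
        "fame_value": 0.0, "hmrc_value": 0.0,
    })
    for rank, firm in enumerate(ranked):
        quartile = 4 * rank // n + 1  # 1 = smallest firms, 4 = largest
        cell = summary[quartile]
        cell["fame_exporters"] += int(firm["fame_exports"] > 0)
        cell["hmrc_exporters"] += int(firm["hmrc_exports"] > 0)
        cell["fame_value"] += firm["fame_exports"]
        cell["hmrc_value"] += firm["hmrc_exports"]
    return dict(summary)

# Made-up firms: (assets, FAME overseas turnover, HMRC exports).
raw = [(10, 0, 5), (20, 0, 8), (30, 0, 12), (40, 15, 15),
       (50, 20, 22), (60, 30, 28), (70, 60, 55), (80, 900, 300)]
firms = [{"assets": a, "fame_exports": f, "hmrc_exports": h} for a, f, h in raw]
summary = compare_by_size_quartile(firms)
```

In this toy sample the bottom quartile contains two HMRC exporters and no FAME exporters, while the top quartile shows a FAME export value well above the HMRC value, the two discrepancies discussed in this section.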

Implications of mismeasurement: Determinants of export status and exporter premia
While export status and export values are likely to be severely mismeasured in FAME, this does not necessarily invalidate the key results from the studies mentioned earlier. The two principal goals of these and other studies of export behavior are i) to understand the determinants of export status; and ii) to establish whether exporting has a positive and (possibly) causal association with firm performance indicators ("export premia"). To see whether and how the measurement error introduced by misreporting of exports in FAME changes existing insights, we replicate standard export status and premia regressions for our FAME and HMRC datasets and compare the results. Table 3 reports OLS regression results in which export status (1,0) is regressed on several firm variables. Columns (5)-(8) include year and industry fixed effects while columns (3), (4) and (7), (8) add lagged export status. First, in our preferred

PLOS ONE
specifications, columns (7) and (8), both assets and turnover are positive and highly significant when applied to the HMRC export data, a result that is consistent with the literature. In contrast, these firm size proxies are near zero and not significant when applied to the FAME self-reported exports. And second, the HMRC data show a strong positive relationship between labor productivity and exporting, also consistent with the literature, which is not found in FAME. To summarize, the regressions that adopt the FAME export status variable suggest that firm size and labor productivity play no role in determining whether a firm exports or not. However, the regressions that adopt the HMRC export status variable indicate that larger firms, and more productive firms, are much more likely to export.
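For concreteness, the fullest export status specification described above can be written as a linear probability model. The notation is ours and partly assumed: the regressor list is inferred from the discussion of assets, turnover and labor productivity, the use of logs is an assumption, and the specifications in Table 3 may include further covariates.

```latex
% s indexes the data source; i firms, j industries, t years.
% A = total assets, T = turnover, LP = labor productivity (logs assumed).
\begin{align}
X^{s}_{it} &= \mathbf{1}\left[\text{exports}^{s}_{it} > 0\right],
  \qquad s \in \{\text{FAME}, \text{HMRC}\}, \\
X^{s}_{it} &= \alpha + \beta_{1} \ln A_{it} + \beta_{2} \ln T_{it}
  + \beta_{3} \ln LP_{it} + \rho\, X^{s}_{i,t-1}
  + \delta_{t} + \gamma_{j} + \varepsilon_{it}.
\end{align}
```

Under this notation, the columns without fixed effects set $\delta_{t} = \gamma_{j} = 0$, and the columns without lagged export status set $\rho = 0$. The comparison in the text amounts to asking whether the estimates of $\beta_{1}$, $\beta_{2}$ and $\beta_{3}$ differ between $s = \text{FAME}$ and $s = \text{HMRC}$.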
In Tables 4 through 6 we estimate export premia by regressing firm capital investment, turnover, and wages on export status. For each case we estimate OLS specifications with and without controls for assets, a proxy for firm size. There is a consistent pattern throughout, namely that the estimates based on FAME self-reported exports and on HMRC exports are not very different from one another. Finally, following the literature we estimate the productivity premia associated with export starters, export stoppers and continuing exporters, for each measure of export status (HMRC versus FAME). Formally, we regress the change in each firm's labor productivity between periods t and t+1 on a set of indicators for whether the firm started exporting, stopped exporting, or continued to export between t and t+1. Table 7 presents the results, where columns (2) and (3) control for firm assets (size) and column (3) also adds industry and year fixed effects. We see that the (negative) premium associated with export stopping is nearly identical in both cases, a result that has also been identified throughout the literature [13]. On the other hand, the true HMRC results suggest no statistically discernible impact of starting or continuing to export, while the FAME-based results show a positive and significant effect of export starting, and a negative and significant effect of continuing to export. The HMRC results are more consistent with the literature (and of course reflect the true behavior of UK firms), which has typically found that firms self-select into exporting, such that the act of beginning to export has little causal impact on productivity levels. With respect to continuing exporters, the evidence from the literature is mixed as to whether there is so-called "learning-by-exporting", i.e., rising productivity over the export tenure. However, to our knowledge there is no evidence in the literature suggesting that there is a decrease in productivity over the export tenure, as is indicated by the FAME-based result in Column (3).
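The classification of firms into starters, stoppers and continuers can be sketched as follows. This is a simplified illustration rather than the estimation behind Table 7: it omits the asset control and fixed effects and computes group means relative to firms that export in neither period, which is what OLS on the three indicators plus a constant returns when there are no other controls. The key names (`lp_t`, `lp_t1`, `hmrc_t`, `fame_t`, and so on) and the four firms are invented for the example.

```python
from collections import defaultdict

def export_transition(exports_t, exports_t1):
    """Classify a firm by its export status in periods t and t+1."""
    if exports_t <= 0 and exports_t1 > 0:
        return "starter"
    if exports_t > 0 and exports_t1 <= 0:
        return "stopper"
    if exports_t > 0 and exports_t1 > 0:
        return "continuer"
    return "non_exporter"

def transition_premia(firms, source):
    """Mean productivity change by group, relative to non-exporters."""
    changes = defaultdict(list)
    for f in firms:
        group = export_transition(f[source + "_t"], f[source + "_t1"])
        changes[group].append(f["lp_t1"] - f["lp_t"])
    means = {g: sum(v) / len(v) for g, v in changes.items()}
    baseline = means["non_exporter"]  # sketch assumes at least one such firm
    return {g: m - baseline for g, m in means.items() if g != "non_exporter"}

# Invented firms; the two sources disagree on export status for some of them.
firms = [
    {"lp_t": 10.0, "lp_t1": 10.0, "hmrc_t": 0, "hmrc_t1": 0, "fame_t": 0, "fame_t1": 0},
    {"lp_t": 12.0, "lp_t1": 13.0, "hmrc_t": 0, "hmrc_t1": 5, "fame_t": 0, "fame_t1": 0},
    {"lp_t": 15.0, "lp_t1": 13.0, "hmrc_t": 8, "hmrc_t1": 0, "fame_t": 9, "fame_t1": 0},
    {"lp_t": 20.0, "lp_t1": 21.0, "hmrc_t": 30, "hmrc_t1": 32, "fame_t": 0, "fame_t1": 40},
]
hmrc_premia = transition_premia(firms, "hmrc")
fame_premia = transition_premia(firms, "fame")
```

Because the same firm can be a continuer in one source and a starter in the other, misclassification of export status moves firms between groups and can change the estimated premia even though the underlying productivity data are identical.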

Concluding remarks
In this note we have explored the extent to which the export values reported in a widely used U.K. financial dataset, FAME, reflect the true export behavior of those firms. Financial datasets are a commonly used source of export information, and our results should therefore be informative in interpreting existing studies as well as in directing future work that utilizes these data.
Our analysis centers around a comparison of the export values reported in FAME with the true export values collected by HMRC. We conclude with a summary of our findings and some comments on their implications:
• Small (and, possibly, medium-sized) firms often report no exports in FAME when, in fact, they have exported. As a consequence, FAME is unreliable for estimating the total number of exporting firms.
• Export values derived from FAME substantially overstate exports for the largest firms. As a consequence, total exports reported by FAME across industries or economy-wide are not reliable.
• The determinants of export status are not very well captured by FAME. In particular, the relationships between size and exporting, and productivity and exporting, are inconsistent with the HMRC data as well as the existing literature.
• The premia associated with export status are captured fairly well by FAME. One exception is that FAME overestimates the productivity effects associated with starting to export while overstating the losses associated with continuing to export.