Reader Comments

An agenda without data

Posted by sawers on 20 Dec 2007 at 06:32 GMT

The controversy over John Talbott’s assertion regarding male circumcision has so far generated most of the heat in the discussion of his article, but that obscures what we believe is an even more serious problem: Talbott cannot get the results he claims with the data he cites. The principal conclusion of the article is not about circumcision; rather, it is that female commercial sex work accounts for intercountry differences in HIV prevalence and, in particular, explains the extraordinarily high prevalence of AIDS in Africa. He asserts that “it is the number of infected prostitutes in a country that . . .” explains “. . . why Africa has been hit the hardest by the AIDS pandemic . . .” (abstract, page 1). Calculating the number of sex workers, however, is problematic, since commercial sex work is often criminalized or subject to social opprobrium, and transactional sex is ill defined across cultures. Hence, skepticism is warranted about the validity of Talbott’s measurement of his key variable.

Talbott’s article cites only Vandepitte et al. [1] as a source of data on female commercial sex workers (FSWs) (footnotes 12 and 13 on page 2 and Table 2 on page 4). That paper has three tables presenting data by country, but Talbott never explains from which table(s) or other sources he draws his data. Because Talbott’s study is a cross-country analysis, the only data he can legitimately use are averages for each country as a whole. The only country-level data in Vandepitte et al. are found in Table 2, but that table lists no country-wide average for any African country.

Table 1 in Vandepitte et al. presents estimates of FSWs as a percent of the adult female population for specific locations within countries. For example, the data for Zambia are for a few truck stops and one town; the data for Madagascar are for a single provincial town. Clearly, data from truck stops, market towns, and capital cities are not representative of countries as a whole. Using those data in cross-country regressions would be entirely inappropriate.

Table 3 in Vandepitte et al. gives data from women in seven African countries who were asked if they had “paid sex in the past 12 months or had received money, gifts, or other favors for sex in the past 12 months.” Talbott says (page 2) that commercial sex work does not include “infrequent and informal exchanges of sex for goods and services.” That would appear to rule out the use of data from Table 3, but he also says that FSWs include “indirect (sex work on the side) commercial sex work.” The ambiguity of his definition of commercial sex work raises the possibility that he used the data from Vandepitte et al.’s Table 3 in his analysis. The question asked of the African women was not only about commercial sex work, as Vandepitte et al. recognize explicitly in their discussion on page iii24 and in their comment on Talbott’s paper (“The fatal attraction of ecologic studies”). Halperin, in his comment on Talbott, points out, “Many longer-term, regular relationships in Africa involve an important ‘transactional’ element (exchange of gifts, etc.) which is often construed as ‘prostitution,’ although it is fundamentally different.” Halperin could have also said that many long-term, regular relationships everywhere in the world involve a transactional element, though possibly some Africans are more comfortable saying so. If Talbott used data from Table 3 of Vandepitte et al. for his regressions, he did so inappropriately.

Since the only source of data on FSWs that Talbott cites in his article has no data appropriate for his study, we contacted Talbott and asked if there was a misprint or if he had other sources of information on commercial sex work in Africa that were missing from the published article. Talbott says in his article that the Vandepitte et al. paper is his “primary source” of data on FSWs and that he uses “multiple sources,” so we asked him what his other sources were. Talbott sent us a list of other sources of information for his key independent variable. He did not clarify whether there were additional sources of data not included in the list he sent us and did not respond to repeated requests for a fuller explanation of exactly what data he used from which source and how he combined them into a single measure of FSWs. Only one source on the list he sent us, a study published by USAID et al. [2], has any data on the number of FSWs in Africa. Table 6 in that report’s Annex gives estimates of the number of FSWs by country, including 19 sub-Saharan African countries.

Since Talbott’s dataset includes high-income countries (but USAID et al. include only low- and middle-income countries), and since Vandepitte et al. have national data on countries at all income levels (but not on Africa), it appears that Talbott merged the data from the two studies. If that is what he did, he should explain how he merged the datasets, since many countries appear in both samples.

Moreover, Talbott’s assertion that “Africa has more than four times as many CSWs as the rest of the world (as a percent of the population)” (page 6) needs to be examined carefully. In the USAID et al. study of low- and middle-income countries, Africa’s average prevalence of FSWs is only 2.65 times the average of the other countries in the study. Note, however, that only two African countries in the study (Zimbabwe and Angola) have very high ratios of FSWs to the total population. Among the other 17 African countries in USAID et al., FSW prevalence is less than half the average in the other countries in the dataset. In addition, only four African countries (Mauritius and Senegal in addition to Zimbabwe and Angola) have FSW prevalence greater than the average for non-African low- and middle-income countries. In other words, using the data that Talbott says he uses, we can see that a high prevalence of FSWs in the population is not generally characteristic of Africa and thus cannot explain the higher prevalence of HIV there.

We can also compare data from both studies that Talbott says he uses for his article. According to Vandepitte et al. (Table 2), FSWs average 0.79 percent of adult females outside Africa. According to USAID et al., FSWs average 1.22 percent of adult females within Africa. That is a ratio of roughly 1.5 to 1, far less than the four-to-one ratio asserted by Talbott. Moreover, excluding Zimbabwe and Angola, FSWs are only 0.23 percent of adult females in the other 17 African countries in USAID et al., about a third of what Vandepitte et al. find in the rest of the world (including high-income countries). Talbott’s data do not support his assertion that Africa has a substantially higher prevalence of FSWs than elsewhere. Talbott makes sweeping claims about the large number of FSWs in Africa as a whole, but most African countries (including two in southern Africa, Swaziland and Zambia, with high HIV prevalence) have a far smaller prevalence of FSWs than is common elsewhere.
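The comparison in the preceding paragraph is simple arithmetic, and a short sketch makes it easy to check. The three percentages are the ones quoted above; the variable names are ours and purely illustrative.

```python
# Percent of adult females who are FSWs, as quoted in the comment above.
outside_africa = 0.79        # Vandepitte et al., Table 2 (countries outside Africa)
africa_all = 1.22            # USAID et al., African countries in the report
africa_excl_two = 0.23       # USAID et al., excluding Zimbabwe and Angola

# Ratio of African FSW prevalence to prevalence elsewhere.
ratio_all = africa_all / outside_africa
ratio_excl_two = africa_excl_two / outside_africa

print(f"Africa vs. elsewhere: {ratio_all:.2f} to 1")
print(f"Excluding Zimbabwe and Angola: {ratio_excl_two:.2f} to 1")
```

Both ratios fall well short of the four-to-one figure, and the second is below one.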

USAID et al. has the only dataset with national estimates of FSWs in Africa that Talbott told us he used (although he did not cite it in the article), so we tried to confirm Talbott’s results with those data. We calculated FSWs as a percent of each country’s 2003 adult female population. For the 51 low- and middle-income countries in the USAID et al. report, the percentage of FSWs correlates positively at the 99 percent confidence level with the log of adult HIV prevalence in 2006 in a bivariate regression, apparently corroborating Talbott’s results.[Endnote 1] Talbott adds measures of percent Muslim, the Gini coefficient, per capita income, and literacy to his regression and finds that the statistical significance of his measure of FSWs persists. We also added measures of these four variables to our regression and the coefficient on FSW is still significant at the 99 percent level.

An explanation of the forces driving an epidemic, however, should be valid for the regions that are most affected. Although HIV prevalence in the nine countries of southern Africa is far greater than in the rest of Africa, USAID et al. include data from only three southern African countries, only one of which (Zimbabwe) has both high HIV prevalence and a high percentage of FSWs in its population. Even if the data for Zimbabwe are correct, an argument that is trying to explain global epidemic disease patterns but hinges on a single observation cannot be persuasive. We reestimated the regression without Zimbabwe and the coefficient on FSWs is no longer significant. Eliminating Zimbabwe pushes the t statistic on FSW down to only 0.03.[Endnote 2] These regressions do not provide convincing support for Talbott’s argument that the prevalence of FSWs explains variation in HIV prevalence across countries since his results apparently rest on a single observation (Zimbabwe).
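The kind of sensitivity check described above can be sketched as a leave-one-out bivariate regression. The numbers below are invented solely to illustrate the mechanics (one high-leverage point among otherwise unrelated observations); they are not the USAID et al. data, and the function names are hypothetical. The sketch uses conventional standard errors rather than the heteroskedasticity-robust ones used in our regressions.

```python
import math

def ols_slope_t(x, y):
    """Slope and t statistic for a bivariate OLS regression of y on x
    (conventional, non-robust standard error)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

def leave_one_out(x, y):
    """Re-estimate the regression once per observation, dropping it each time."""
    results = []
    for i in range(len(x)):
        x_kept = x[:i] + x[i + 1:]
        y_kept = y[:i] + y[i + 1:]
        results.append(ols_slope_t(x_kept, y_kept))
    return results

# Invented illustrative data: the last point is a single high-leverage observation.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 5.0]
y = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0, 0.5, -0.5, 4.0]

slope_full, t_full = ols_slope_t(x, y)
print(f"all observations: slope={slope_full:.2f}, t={t_full:.2f}")
for i, (slope, t) in enumerate(leave_one_out(x, y)):
    print(f"dropping observation {i}: slope={slope:.2f}, t={t:.2f}")
```

If the t statistic collapses when one particular observation is dropped but is stable otherwise, the result rests on that observation.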

The crucial weakness in Talbott’s article is his failure to explain the sources of his data in a way that gives the reader confidence in his measure of female commercial sex workers. He should publish his data on FSWs, explain his methodology in cobbling together different data sources (if that is what he did), and tell us if he utilized other data sources. He needs to reestimate his regressions to see if his results are robust when a single outlier is omitted. Without those clarifications, one cannot give any weight to his claim that Africa has an unusually high prevalence of FSWs or that variations in the presence of FSWs allow us to understand the global pattern of HIV prevalence and its extremely high prevalence in Africa.

And why is puerile innuendo (“Size Matters”) considered an acceptable title for a scientific article?

Larry Sawers (Department of Economics, American University, Washington, DC, USA)

Eileen Stillwaggon (Department of Economics, Gettysburg College, Gettysburg, PA, USA)

[1] Vandepitte et al., “Estimates of the number of female sex workers in different regions of the world,” Sexually Transmitted Infections, 82. iii18–iii25.
[2] USAID, UNAIDS, WHO, UNICEF, and the Policy Project, “Coverage of selected services for HIV/AIDS prevention, care and support in low and middle income countries in 2003,” June 2004.

1. Talbott’s study includes high-income countries, but our regressions, which are based on the USAID et al. report and its sample of low- and middle-income countries only, do not. We used the log of HIV prevalence for 2006 (Talbott did not use the log of HIV prevalence), as is almost universal practice among cross-national studies of HIV prevalence. Since the variable is highly skewed, a regression using the log form of the variable yields more robust and efficient estimates and reduces the influence of outliers. All of the regressions reported in this comment were also estimated with HIV prevalence (not in its log form) and the results were far less favorable to Talbott’s argument than when the log form was used. All regressions reported in this discussion are robust to heteroskedasticity.

2. We also added four more control variables similar to those used in most other cross-country studies of HIV prevalence: age of the epidemic, contraceptive use, urbanization, and a binary (one or zero) variable for southern Africa. Using all eight control variables makes the coefficient on FSW statistically significant at only the 94 percent level—slightly below what is generally considered to be statistically convincing—and dropping Zimbabwe from the equation again makes the FSW variable insignificant.

Author Talbott Responds to Sawers

johntalbs replied to sawers on 23 Dec 2007 at 09:15 GMT

While pleased that Sawers was able to confirm my results when restricting the data sample to just developing countries, I do object to his implication that somehow my data methods were flawed. I report throughout the paper that the data available on the number of prostitutes by country was not perfect. But, regression analysis does not require perfect data in order to uncover statistically significant correlations. It is the nature of regression analysis that random error in data can be eliminated in searching for a true underlying correlation. Did he naively think surveys of people asking their degree of infection and their involvement in prostitution would ever be perfect? Does he think we should tell the 25 million infected in Africa that we should wait for “perfect” data before proceeding? Does he believe we should cease all research utilizing UNAIDS HIV prevalence data just because it has recently been shown to be less than ideal?

Sawers erred when he incorporated an additional four independent variables to the analysis without controlling for multi-collinearity between the variables. Specifically, by introducing a circumcision variable alongside the Muslim percentage variable in the analysis he made any reported results highly suspect as these two variables are very highly correlated (p<.0001). It is also completely natural for reported t-stats to decline as you add additional variables without controlling for correlation among the independent variables. In effect, the significance of any effect is divided across many variables partly measuring the same thing.
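A concern about collinearity between two regressors can be checked directly with a variance inflation factor (VIF). The sketch below covers only the two-regressor case, where VIF = 1 / (1 - r²) and r is the Pearson correlation between the two variables. The data and function names are invented for illustration and do not come from either party's dataset.

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return sab / (saa * sbb) ** 0.5

def vif_two_regressors(a, b):
    """Variance inflation factor when a model has exactly two regressors."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r ** 2)

# Invented illustrative data.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]    # nearly a copy of a
c = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # only weakly related to a

vif_ab = vif_two_regressors(a, b)
vif_ac = vif_two_regressors(a, c)
print(f"VIF for the nearly collinear pair: {vif_ab:.1f}")
print(f"VIF for the weakly related pair: {vif_ac:.2f}")
```

A VIF near 1 indicates little inflation of a coefficient's variance; values above roughly 10 are a common rule-of-thumb warning sign.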

Sawers’s removal of Zimbabwe from the analysis after the fact as an outlier is also suspicious. Zimbabwe is not an outlier. It is highly representative of many countries in Southern Africa that have high degrees of HIV prevalence and large prostitute communities. Be wary of any study that selectively removes data from a sample, especially after the analysis has been completed. This is an ex post form of data mining in that you determine those data points that are most important to the analysis and then find lots of subjective reasons to exclude them from the analysis while their primary offense is solely that they support the initial thesis.

I would encourage greater efforts at better and more accurate data collection regarding the number of prostitutes by country. One idea would be to conduct cell phone surveys across the countries of southern Africa because, even in Africa, prostitutes carry cell phones. While such a cell phone survey is inherently biased, it is hard to imagine relative results reported across countries being any more biased than in-person surveys that ask such personal questions regarding sex, disease and prostitution. It would be very cost-effective and should provide the necessary additional data to confirm my original thesis.

RE: Author Talbott Responds to Sawers

sawers replied to johntalbs on 13 Jan 2008 at 06:40 GMT

John Talbott responded to our comment on his article, but he did not address our principal criticism, which was his failure to explain the sources of his data. In his response, he makes a number of statements that reveal misunderstandings of our criticisms.

1) Talbott starts by saying that he is pleased that we are “able to confirm my results,” but we do not confirm his results. Our regression analysis shows that his results are not credible because they rest on a single observation, Zimbabwe, the only country in Africa and in the world in which reported HIV prevalence and FSW prevalence are both very high.

2) Talbott says that our “removal of Zimbabwe from the analysis after the fact as an outlier is also suspicious. Zimbabwe is not an outlier. It is highly representative of many countries in Southern Africa that have high degrees of HIV prevalence and large prostitute communities.” The data that Talbott says he used (in citations in his article and in correspondence with us) show that only Zimbabwe among African countries has very high reported prevalence of both FSWs and HIV. Two other countries from southern Africa in those datasets, Swaziland and Zambia, have high HIV prevalence, but both have FSW prevalence well below average in the dataset. Some analysts include Angola in the southern African region, but Angola has relatively low HIV prevalence (though high reported FSW prevalence). Using Talbott’s own data, it is clear that Zimbabwe is not “highly representative of many countries in southern Africa.”

3) Talbott implies that our criticism of his article is about the poor quality of his data. We do not criticize the quality of the data in the two sources that he tells us about. The only source of data cited in his article has no country-level data on Africa even though his methodology is a cross-country analysis. In correspondence, he told us of only one other source of data on FSWs in Africa, but those data contradict numerous specific statements in his article and in his rejoinder to our comment. It is not the quality of the data that he uses that we criticize. Instead, we have pointed out that the data he cites do not corroborate his argument about commercial sex workers and HIV in Africa. His rejoinder to our comment ignores that point.

4) Talbott says that we introduced “a circumcision variable alongside the Muslim percentage variable in the analysis.” On the contrary, we have no measure of circumcision in our analysis. (We do use a variable that measures contraceptive use that he apparently confuses with circumcision.)

Larry Sawers, Department of Economics, American University
Eileen Stillwaggon, Department of Economics, Gettysburg College