Confidence Interval Based Parameter Estimation—A New SOCR Applet and Activity

Many scientific investigations depend on obtaining data-driven, accurate, robust and computationally-tractable parameter estimates. In the face of unavoidable intrinsic variability, there are different algorithmic approaches, prior assumptions and fundamental principles for computing point and interval estimates. Efficient and reliable parameter estimation is critical in making inference about observable experiments, summarizing process characteristics and prediction of experimental behaviors. In this manuscript, we demonstrate simulation, construction, validation and interpretation of confidence intervals, under various assumptions, using the interactive web-based tools provided by the Statistics Online Computational Resource (http://www.SOCR.ucla.edu). Specifically, we present confidence interval examples for population means, with known or unknown population standard deviation; population variance; population proportion (exact and approximate), as well as confidence intervals based on bootstrapping or the asymptotic properties of the maximum likelihood estimates. Like all SOCR resources, these confidence interval resources may be openly accessed via an Internet-connected Java-enabled browser. The SOCR confidence interval applet enables the user to empirically explore and investigate the effects of the confidence-level, the sample-size and parameter of interest on the corresponding confidence interval. Two applications of the new interval estimation computational library are presented. The first one is a simulation of confidence interval estimating the US unemployment rate and the second application demonstrates the computations of point and interval estimates of hippocampal surface complexity for Alzheimers disease patients, mild cognitive impairment subjects and asymptomatic controls.


Variability and Estimation in Quantitative Studies
Solutions to many biological, engineering, social, environmental or health related challenges depend on obtaining accurate, robust and computationally-tractable parameter estimates. All natural processes, observable phenomena and designed experiments are affected by intrinsically or extrinsically induced variation [1]. Our understanding of such processes frequently revolves around estimating various population parameters of interest based on observed (acquired) data. Commonly used parameters of interest include measures of centrality (e.g., mean, modes), measures of variability (e.g., mean absolute deviation, range), measures of shape (e.g., skewness, kurtosis), proportions, quantiles, and many, many others. There are two types of parameter estimates -pointbased and interval-based estimates. The former refer to unique quantitative estimates, and the latter represent ranges of plausible values for the parameters of interest. There are different algorithmic approaches, prior assumptions and principals for computing data-driven parameter estimates. These depend on the distribution of the process of interest, the available computational resources and other criteria that may be desirable [2], e.g., biasness and robustness of the estimates. Accurate, robust and efficient parameter estimation is critical in making inference about observable experiments, summarizing process characteristics and prediction of experimental behaviors.
For example, a 2005 study proposing a new computational brain atlas for Alzheimer's disease [3] investigated the mean volumetric characteristics and the spectra of shapes and sizes of different cortical and subcortical brain regions for Alzheimer's patients, individuals with minor cognitive impairment and asymptomatic subjects. This study estimated several centrality and variability parameters for these populations. Based on these point-and interval-estimates, the study analyzed a number of digital scans to derive criteria for imaging-based classification of subjects based on the intensities of their 3D brain scans. Their results enabled a number of subsequent inference studies that quantified the effects of subject demographics (e.g., education level, familial history, APOE allele, etc.), stage of the disease and the efficacy of new drug treatments targeting Alzheimer's disease. Figure 1 illustrates the shape, center and distribution parameters for the 3D geometric structure of the right hippocampus in the AlzheimerÕ s disease brain atlas. New imaging data can then be co-registered and quantitatively compared relative to the amount of anatomical variability encoded in this atlas. This enables automated, efficient and quantitative inference on large number of brain volumes. Examples of point and interval estimates computed in this atlas framework include the mean-intensity and mean shape location, and the standard deviation of intensities and the mean deviation of shape, respectively.

The Statistics Online Computational Resource (SOCR)
The Statistics Online Computational Resource (SOCR) (http:// www.socr.ucla.edu) is an NSF-funded project that designs, implements, validates and integrates various interactive tools for statistics and probability education and computing. SOCR resource tools attempt to bridge between the learning and practice of introductory and more advanced computational and applied probability and statistics concepts. The SOCR resource comprises of a hierarchy of portable online interactive aids for motivating, modernizing and improving the teaching format in college-level probability and statistics courses. These tools include a number of applets, user interfaces and demonstrations, which are fully accessible over the Internet [4][5][6]. The SOCR resources allow instructors to supplement methodological course material with hands-on demonstrations, simulations and interactive graphical displays illustrating in a problem-driven manner the presented theoretical and data-analytic concepts. SOCR consists of seven major categories of resources: interactive distribution modeler, virtual experiments, statistical analyses, computer generated games, a data modeler, a data-graphing tool and a newly added java applet on confidence intervals. In this paper, we will demonstrate simulation, construction, validation and interpretation of confidence intervals, under various assumptions, using the interactive web-based tools and materials provided by the Statistics Computational Resource. Specific confidence interval demonstration examples will include intervals for: the population mean, with known or unknown population standard deviation; population variance; population proportion (exact and approximate), as well as confidence intervals based on bootstrapping and using the asymptotic properties of the maximum likelihood estimates. Like all SOCR resources, these confidence interval applets and activities are openly accessible via an Internet-connected computer with a Java-enabled browser [7]. The SOCR confidence interval applet enables the user to empirically explore and investigate the effect of the confidence-level, the sample-size and parameter of interest on the size and location of the corresponding confidence interval. This confidence interval applet also allows random sampling from a large number of discrete and continuous distributions and interactive user-selection of the distribution parameters. These materials are tested and validated in various settings and can be directly integrated in high-school and college curricula.

Background
Most scientific investigations rely on observable data (quantitative and/or qualitative), which is typically used to develop models, validate diverse research hypotheses, statistically analyze the power of different studies and interpret the intrinsic characteristics of the process of interest. Each observed dataset is assumed to be a representative sample of the population. Throughout this manuscript we are using the common statistics notation denoting by capital letters (e.g., X ,Y ,Z) and lower case letters (e.g., x,y,z) random variables and observed sample values, respectively. Random samples of n observations may be represented by fX i g n i~1f X 1 ,X 2 , Á Á Á ,X n g. For instance, if fX i g n i~1 are independent and identically distributed random variables from N(m,s) distribution, we can compute interval estimates (ranges) for the population mean using a sample (specific sequence of observations) fx i g n i~1~f x 1 ,x 2 , Á Á Á ,x n g. There are several different situations: 1. Depending upon our knowledge of the population variance (s 2 ), there are two approaches for constructing the confidence interval for the population mean, m.
If the population standard deviation s is known the confidence interval for the mean m is: where za 2 is the 1{ a 2 percentile of the N(0,1) distribution.
If the population standard deviation s is unknown: percentile of the student's t distribution with n{1 degrees of freedom. Note that this interval is generally wider than the first one. 2. The confidence interval for the population variance is obtained by 3. Confidence intervals for the population proportion may also be constructed in several different ways. Suppose X 1 ,X 2 , Á Á Á ,X n are binary (dichotomous) observations (e.g., ''yes'' or ''no'' responses). A simple confidence interval for the population proportion (p) of yes responses is most commonly constructed using the Wald method:p wherep p is the sample proportion. However, the Wald confidence interval has poor coverage of the true parameter (p) when the sample size is small or when the sample proportionp p is near zero or near 1. A better estimation of the confidence interval for p is obtained by:p And the best (exact) confidence interval for p, also called the ''Clopper-Pearson'' interval [8] is computed by: where

Results
The SOCR confidence interval applet is unique in a way that it allows the user to interactively sample from any of the 70z distributions of SOCR (http://www.socr.ucla.edu/htmls/SOCR_ Distributions.html) (Figure 2), set the specific parameters of the distribution, select the appropriate confidence interval parameter (m,s,p, etc.) and then choose the a. Sample size, b. Confidence level, c. Number of intervals to construct. The SOCR confidence interval applet can be accessed directly at (http://socr.ucla.edu/htmls/exp/Confidence_Interval_Experi ment_General.html) or via the main SOCR Experiments (http://www.socr.ucla.edu/htmls/SOCR_Experiments.html), select ''Confidence Interval Experiment General'' from the dropdown menu). The output window of the applet is shown on Figure 3.
The selection of the various parameters of the experiment can be done through the ''Confidence Interval'' tab of the main window. Figure 4 shows the default settings of the input window.
In this input window the user can choose: a. The type of confidence interval to construct (m,p,s 2 ). b. Choose the distribution from where the samples will be taken and enter values for the appropriate parameters of this distribution.
c. Select the sample size. d. Choose the number of intervals to construct. e. Set the confidence level (1{a). f. Multiple sample simulation or bootstrapping resampling approach.
Clicking on the ''Step'' tab on the main applet window will show the results of a single run of the confidence interval (CI) experiment, using the user specified parameters, Figure 5.
We observe that all the input parameters are recorded on the right margin of the applet's main window ( Figure 6). There are two displays on this window. The top display shows the distribution mean and the sample values (n~20 in our example) selected for each interval (we have constructed 20 intervals). The second display depicts the actual 20 intervals which are constructed from the 20 random samples (red segments). When a confidence interval misses the true mean (here m~2) a green dot is shown to indicate this discrepancy. Observe that in this example because we have assumed that the population standard deviation is known (s~1) all the intervals have the same length. The confidence level was chosen to be 1{a~0:95 and therefore it is not surprising that among the 20 intervals only one missed the target parameter (m~2). If we choose to run the experiment  multiple times (for example 10 times), we simply select the ''Number of Experiments = 10'' and then click on the ''Run'' button. The results from each run of these 10 experiments are recorded on the right margin of the applet and are shown on Figure 7. In general, as these are random simulations, repeats of the experiment using the same parameter settings would generate different outcomes (sample instances and corresponding confidence intervals).
We can now access again the input window and observe how changes of the parameters may affect the appearance (size and location) of the confidence intervals. For example, if we increase the sample size, we will observe narrower confidence intervals. On the other hand, increasing the confidence level will generate higher interval coverage (more intervals will include the actual population mean parameter), however the width of the CIs will increase.
Confidence intervals for the mean with unknown standard deviation can be constructed analogously. These confidence intervals are based on the t distribution and the length of different intervals will vary because the specific sample standard deviation is used in their construction. Consider the exponential distribution with parameter l~5 (mean of 0.2), sample size 60, confidence level 0.95, and number of intervals 50. Results of this experiment are shown on Figure 8 and Figure 9.
We now examine the confidence interval for the population variance s 2 . Suppose we first use the confidence intervals applet to sample from a normal distribution with mean 5 and standard deviation 2, specifying sample size 30, confidence intervals 50, and confidence level 0.95. We observe that the coverage is indeed about 0:95 ( Figure 10 -see results on the right margin of the applet). Again, this is not surprising since when sampling from normal distribution the confidence interval is based on the chisquare distribution.
However, if the population is not normal the interval coverage is poor as can be seen in the following SOCR example. Consider the exponential distribution with l~2 (variance is s 2~0 :25). If we use the confidence interval based on the x 2 distribution we obtain the following results (first with sample size 30 and then sample size 300), Figure 11. Specifically, we observe that in both cases (regardless of the sample size -small or large) the coverage is poor, significantly less than 0:95. For example, on Figure 11, a metaexperiment, each consisting of generating 50 intervals of sample size n~30, yields an overall average number of confidence intervals that fail to cover the true population variance equals to 16. On Figure 12 we see the same poor coverage even when the sample size increases to (n~300). In these situations (sampling from non-normal populations), an asymptotic distribution-free confidence interval for the variance can be obtained using the following large sample theory result [9]: 4 is the fourth moment of the distribution. Of course, m 4 is unknown and will be estimated by the fourth sample The confidence interval for the population variance is then computed as follows: Using the SOCR confidence intervals applet (exponential distribution with l~2, sample size 300, number of intervals 50, confidence level 0.95), we observe an approximate interval coverage of 0:95, Figure 13.
We will examine now the confidence interval for the population parameter p. The SOCR confidence interval applet provides calculation for the three types of confidence interval for p mentioned in the Background Section. The applet's unique feature is that the user can again sample from any of the available SOCR distributions and define what the meaning of a success is. Success is defined as a randomly chosen observation falling within a userspecified interval. Success intervals are selected by drawing left and right interval limit on the distribution curve using the mouse, or by entering appropriate numerical limits in the distribution left and right text fields (see Figure 14). For example, suppose we choose to sample n~100 observations from a normal distribution with mean m~5, and standard deviation s~2. The user may define as In the next two figures we are presenting the case of poor coverage of the Wald confidence interval (Figure 15 whenp p is large (close to one), and the much better coverage for the same case using the Clopper-Pearson (exact) confidence interval ( Figure 16). In both cases we sample n~100 observations from N(5,2), a 0:95 confidence level is used, and success is defined by an observation falling between 0 and 11.
In addition, the SOCR confidence interval applet provides interval estimation for population parameters of a distribution  based on the asymptotic properties of maximum likelihood estimates. This is based on the large sample theory result of maximum likelihood estimates.
As the sample size n increases it can be shown [10] that the maximum likelihood estimate (MLE)ĥ h of a parameter h follows approximately normal distribution with mean h and variance equal to the lower bound of the Cramer-Rao inequality [10].
is the lower bound of the Cramer{Rao inequality : Because I(h) (Fisher's information) is a function of the unknown parameter h, the parameter is replaced by its maximum likelihood estimateĥ h to get I(ĥ h). Since, we can write Therefore, we are 1{a confident that h falls in the interval This result is used in the example below to construct a confidence interval for Poisson distribution with parameter l. Let X 1 ,X 2 , Á Á Á ,X n be independent and identically distributed random variables from a Poisson distribution with parameter l. We know that the maximum likelihood estimate of l isl l~ x x. We need to find the lower bound of the Cramer-Rao inequality: Let's find the first and second derivativeswith respect to l.
x l 2 :  Therefore, When n is large,l l follows approximatelŷ l l*N l,  In other words, 4:15ƒlƒ5:34. Using the same method, a confidence interval for the parameter l of the exponential distribution may be obtained. It can be shown that the confidence interval obtained by this method is given as follows:  Currently the SOCR confidence interval applet provides these two intervals using the asymptotic properties of maximum likelihood estimates. The following SOCR simulations ( Figure 17 and Figure 18) refer to a. Poisson distribution, l~5, sample size 40, number of intervals 50, confidence level 0.95, Figure 17.
The following two applications of the new SOCR Confidence Interval applet demonstrate the practical usage of these new statistical computing resources.
The first application shows a study of US unemployment. One question is how can we find the 99% confidence interval for the   Figure 19.
Next, we fit in a generalized Beta distribution model to the frequency distribution of the unemployment data. Figure  Then, we can run a simulation and estimate the confidence interval for the proportion of unemployment population to be  within the desirable range of 2%{5% (note that 5:3% is considered ''full employment''). Figure 21 shows the 0:99 confidence interval simulation settings panel, and Figure 22 illustrates the results of the simulation. The 100 simulations of 0:99 confidence intervals for the population proportion use the exact method and provide effective coverage of 0:95. In other words, 5 of the 100 simulations miss the real population proportion (p~0:47 representing the shaded area below the Beta density function on Figure 22, which indicates a healthy unemployment rate between 2%{5%). Note that each of the 100 samples contains N~2,000 random observations from Beta distribution. The discrepancy between the expected 0:99 coverage and the observed 0:95 coverage rates for the simulated confidence intervals of the population proportion can be explained by random sampling variation or the small number of intervals (100). A larger number of simulations using 1,000 0:99 confidence intervals cover the population proportion of interest (p~0:47) 97:5% of the time (or 9751000).
Finally, we can use the analysis portion of the SOCR confidence interval analysis applet (http://socr.ucla.edu/htmls/ana/Confi denceInterval_Analysis.html) to estimate the 0:99 confidence intervals for the expected (mean) unemployment rate in the US using the available sample data of 609 monthly unemployment measurement for 1959-2009: [5.732, 6.038], with a 50-year unemployment median of 5.885.
The second application of the SOCR confidence interval computational library involves a large neuroimaging study where automated volumetric data processing [11] is used to obtain different shape and volume measures of local brain anatomy. The subject population is derived from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://ADNI.loni. ucla.edu/) [12] and includes 27 Alzheimer's disease (AD) subjects, 35 normal controls (NC), and 42 mild cognitive impairment subjects (MCI). The broad goal of this study is to identify associations and relationships between neuroimaging biomarkers and various subject demographics and traits. For instance, we use  the Bootstrapping method to construct 0:99 confidence intervals for the mean curvedness (measure of shape) for the left and right hippocampus for each of the three cohorts. Given the two principal curvatures k 1 and k 2 [13], where k 1 ƒk 2 , the curvedness (CV) is defined by: The CV shape measure is computed at each vertex on the shape and we used the global curvedness, which is the overall average of local curvedness measured at each hippocampal surface vertex. Figure 23 shows an example of the local curvedness map on the left hemisphere of the cortical surface for one subject.
This neuroimaging dataset is available online (http://wiki.stat. ucla.edu/socr/index.php/021211). It contains (deidentified) subject index, group indicator (AD, NC, MCI), two cognitive assessment measures: MMSE (Mini-Mental State Exam) score and CDR (Clinical Dementia Rating) scale, subject's gender, age, TBV (Total Brain Volume) measure, total GMV (Gray Matter Volume), total WMV (White Matter Volume), total CSFV (Cerebrospinal Fluid Volume), and a numerical value for four shape-based measures (Surface Area, Shape Index, Curvedness and Fractal Dimension) for each of 56 brain Regions of Interest (ROIs) [14]. All of these measures are extracted using the Global Shape Analysis protocol via the LONI Pipeline environment [11]. Here we only demonstrate the Bootstrap based construction of the confidence intervals for the average left and right hippocampal curvedness measure for each of the three cohorts, Table 1. These results clearly show a monotonic increase of the curvedness shape measure between the 3 cohorts. This trend indicating group differences of the centers (medians) and dispersion (width) of estimated confidence intervals supports prior studies indicating progressive hippocampal anatomical atrophy reported in dementia subjects as they progress from NC to MCI and AD [15][16][17][18].
Similarly, for the same dataset we may obtain point and interval estimates for other parameters of interest (e.g., variance, various types of proportions) for specific ROI and shape measure based on any of the available confidence interval methods included in the SOCR CI computational library.

Discussion
Interval estimation of population parameters is an important component of many quantitative scientific investigations. Algorithmic constructions of interval estimates depend on a number of different factors, e.g., the characteristics of the natural distribution of the process, parameter of interest, computational efficiency and stability of the estimates, and sample size. In addition, there are different approaches for obtaining parameter interval estimates. For instance, there are completely automated and semi-automated techniques for constructing confidence intervals facilitating quantitative statistical analysis. Automatic protocols for obtaining confidence intervals (CIs) represent software programs which compute the intervals directly (using the data and the process density function). Some of the examples above show that in certain situations the interpretations of these intervals may significantly deviate from our common understanding of confidence intervals.
Variable transformations (e.g., ln, x k , tanh {1 ) are known to help aligning the theoretical approaches, algorithmic implementations and the practical interpretations of confidence intervals. However, the choice of an appropriate transformation is mostly subjective and requires input from investigators. There are also completely automated approaches to produce approximate confidence intervals. Confidence intervals bootstrapping is one notable example [19,20], which requires no special expert intervention. Drawbacks of such techniques include significant dependence of the interval on small variations in the data and the large amount of computing required to obtain the interval estimates (often 10,000Õ s of iterations of bootstrap CI calculation are required) [21]. Completely automated nonparametric CI estimates are always approximate since exact estimates do not exist for most parameters [22].
There are a number of pedagogical challenges in motivation, application and interpretation of confidence intervals for different population parameters [23]. Some of these challenges relate to the underlying assumptions, applications or limitations, accuracy and validity of the confidence intervals, as well as the interplays between sample size, confidence level and process density function. Graphical renderings and interactive simulations of data sampling and CI calculations may require some technical expertise and programming skills, but provide powerful instructional aides for training novice researchers and junior investigators. A direct example of these pedagogical challenges is the interrelation between the desirable narrow-width confidence interval and the expectation for high level of confidence.
Theory, construction and interpretation of CIÕ s may all present problems for learners and practitioners of quantitative statistical methodologies. Frequent refreshers and reinforcement of these ideas using interactive graphical tools improves the knowledge retention and understanding the role of the sample in the construction of dynamic confidence intervals with each repetition of the experiment.
Computational challenges present another barrier for many confidence intervals learners, applied scientists and clinical investigators [24,25]. There are two specific types of CI computational challenges. The first one is identifying the  distribution of the parameter of interest and developing a computationally-tractable algorithmic approach for estimating the parameter (and its moments). The second one is the implementation of (accessible) software tools that provide efficient and robust numerical CI estimates for datasets with varying characteristics (e.g., scale, heterogeneous format, sample-size).
The SOCR confidence intervals applet allows simulation and validation of various protocols for interval estimation using random simulations. In practice, most investigators and learners need to construct interval estimates for various parameters of interest using data obtained via research or observational protocols. The open-source SOCR library enables the integration of these calculations as part of any web-based or stand-alone computational tool. Complete Java documentations of this library is available here (http://www.socr.ucla.edu/docs). An instance demonstrating the usage of this CI computational library is included in the SOCR Analysis package [5]. The interactive SOCR confidence intervals analysis applet (http://www.socr.ucla. edu/htmls/ana/ConfidenceInterval_Analysis.html) enables the user to enter any (numerical) data, specify a parameter of interest, select an interval generation protocol, and compute the corresponding data-driven interval estimate. The applet provides several default datasets, however researchers can load or paste in external tabular data in the applet's data tab. The source code for this CI analysis applet demonstrates the externals invocation protocol of the SOCR CI calculations and is also freely available online.
This manuscript presents a unified, open-source, portable and extensible computational framework for computing, simulating and visualizing confidence intervals estimates in a broad spectrum of conditions. These resources address many of the common instructional, computational and application challenges described above. We showed two applications of the new interval estimation computational library. The first one is a simulation of confidence interval estimates for the proportion of years when a healthy 2%{5% unemployment rate may be expected and an estimation of the confidence interval for the US unemployment rate. The second application demonstrates the computations of point and interval estimates of hippocampal surface complexity for three cohorts. The source code, web-applet and interactive learning activity are all freely and anonymously accessible online (http:// wiki.stat.ucla.edu/socr/index.php/092110).