A quantitative method for measuring the relationship between an objective endpoint and patient reported outcome measures

  • Chul Ahn ,

    Contributed equally to this work with: Chul Ahn, Xin Fang

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland, United States of America

  • Xin Fang ,

    Contributed equally to this work with: Chul Ahn, Xin Fang

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing

    xin.fang@fda.hhs.gov

    Affiliation Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland, United States of America

  • Phyllis Silverman ,

    Roles Methodology, Writing – review & editing

    ‡ These authors also contributed equally to this work.

    Affiliation Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, Maryland, United States of America

  • Zhiwei Zhang

    Roles Methodology, Writing – review & editing

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Statistics, University of California, Riverside, CA, United States of America

Abstract

Patient reported outcome measures (PROMs) have become increasingly important for assessing the effectiveness of a drug or medical device. In order for a PROM to be claimed in labeling, the PROM has to be valid, reliable and able to detect a change if the targeted disease status changes. One approach to assess the quality of a PROM is to investigate the association between the PROM and an objective clinical endpoint measuring the status of a disease/condition. However, methods assessing the association between continuous and discrete variables are limited, especially for correlated measurements. In this paper, we propose a method to assess such association for any type of sample, with or without correlation. The method involves estimating the probability of revealing the status of a subject’s disease/condition (called truth hereafter) through the subject’s reported outcomes. The probability is a conditional probability of revealing truth given the relative location of the subject’s objective outcome compared to the subject-specific latent threshold in the objective endpoint. A consistent estimator for the probability is derived. The operating characteristics of the consistent estimator are illustrated using simulation. Our method is applied to hypothetical clinical trial data generated for an ophthalmic device as an illustration.

1. Introduction

Patient reported outcome measures (PROMs) have become increasingly important in measuring the effectiveness of a drug or medical device. Between 1997 and 2002, about 30% of new drug labels were found to have included patient reported outcomes (PROs) [1]. Later, between 2006 and 2010, about 24% of new molecular entities and biologic license applications were granted patient reported outcome (PRO) claims [2]. The authors of this paper also noticed that the PROM claims in approved medical devices had been steadily increasing since 2012. In the meantime, many efforts have been made to advance the use of PROMs in drug or medical device development and regulatory decision making. Recent major challenges were reported from the Food and Drug Administration’s perspective [3]. The National Institutes of Health (NIH) also funded the establishment of a PROM Information System (PROMIS) [4, 5, 6]. Some recent literature focuses on the interpretation of PRO analysis results [7, 8].

In order for a PROM to be claimed in labeling of a drug or medical device, the PROM has to be valid, reliable and able to detect a change if the status of the targeted disease or condition changes [9]. The most frequently and broadly used statistics in a PROM validation, such as the Pearson and intra-class correlation coefficients (ICC) [10], assess the association among PROM items or between a PROM and other established measurement(s). These correlation coefficients have been used to examine various validities (e.g. construct, convergent/divergent, criterion) of PROMs [10–30]. The correlation coefficients were also used to investigate the PROM’s ability to detect a change [31]. Some authors also used these correlation coefficients to explore the relationship of a PROM with other measurements [32–35].

However, these correlation coefficients (1) may not be appropriate in correlated samples such as repeated measures, (2) may not be reliable for endpoints with different scales (e.g. categorical scale vs. continuous scale), and (3) do not have an intuitive clinical meaning because these coefficients or their changes don’t directly carry a clinical meaning. It is difficult to draw a line for an acceptable association based on these popular coefficients most likely due to the lack of clinical meaning of these correlation coefficients.

The challenge here is to develop a meaningful, reliable methodology to measure the relationship between an objective continuous endpoint (X) and the dichotomized endpoint (G) of an ordinal PROM, and if the association index is strong enough, to use only the PROM to make inference about the effectiveness of the therapy or to use the PROM to support the primary inference in a clinical trial setup. This paper provides such a new meaningful quantitative statistic measuring the conditional association (denoted as Q hereafter) between paired endpoints (X, G), and a method to translate the ordinal PROM scales to the continuous objective measurement. The use of conditional association is due to the fact that the outcome of G is conditional on the outcome of X, because the PROM is always administered after the treatment takes effect. The dichotomized endpoint (G) may represent mixed Bernoulli random variables with the same parameter but opposite meaning, which is explained in the method section of this paper.

Section 2 describes the definition of the conditional association parameter Q, the data structure used in this paper and how to estimate Q. The derivation of the estimator of Q is also presented in this section. Section 3 shows simulation results of the estimator (Q̂) of Q and an application of this new methodology to hypothetical clinical trial data. The discussion and conclusion are presented in Section 4.

2. Methods

This section shows how the parameter Q works in assessing the quality of a PROM using repeated measures from a single subject. It starts with minimum notations and theoretical construct of Q, followed by the characteristic and estimation procedure of Q, and the derivation of the consistent estimator of Q. The derivation of the estimator is specifically arranged after introducing the estimation process so that the derivation is more accessible to readers. The section ends with how to obtain the inference for the PROM in multiple subjects.

In general, a single italic lower-case letter represents a nonrandom variable and a single italic upper-case letter represents a random variable unless stated otherwise (such as parameter Q). The non-italic PROMz (not a random variable) represents the scale z of the unidimensional PROM. Qiz is the probability of the PROMz revealing the disease status of Subject i according to his/her latent minimum objective threshold aiz given the subject’s objective outcome xi ≥ aiz or xi < aiz. Note: Qiz is not defined as a random variable and is a parameter to be estimated. The italic PROM is the random variable for the subjective PRO measurement, and the italic PRO is the realization of the PROM. The italic PROz represents the patient reported outcome equal to the scale z of the PROM. Other notations are defined in Appendix A.

2.1 Theoretical construct of parameter Qiz

As illustrated in Fig 1 below, the theoretical construct of Qiz is that there is a latent minimum threshold aiz of a disease status in terms of the objective disease measurements of Subject i which triggers PROz (z = 1, …, 7 in Fig 1) upon the PROM question according to the association parameter Qiz given the subject’s objective outcome xi ≥ aiz. Although the PROM question and scales don’t change with subject, sub-index i is used to indicate that the PROMi is the PROM random variable for Subject i, hereafter for clarity and without loss of generality. Subject i will give his/her PROMi ≥ z with probability Qiz when his/her xi ≥ aiz, and will give his/her PROMi < z with probability Qiz when xi < aiz. Note here, the PROi is always dependent on where the Xi is realized relative to the minimum latent threshold aiz.

Fig 1. Conditional associations between a PROM and a continuous objective efficacy endpoint X for subject i.

https://doi.org/10.1371/journal.pone.0205845.g001

Fig 1 illustrates the relationship between the continuous objective endpoint Xi (such as increase in hemoglobin count (HC)) and a unidimensional 7-scale PROMi (such as fatigue improvement). The upper divided rectangular block illustrates a 7-scale unidimensional PROM, and the lower line X illustrates the continuous objective measurement with letter O indicating the baseline location of a subject. Each scale of the PROM (such as 5 = improved) for Subject i has its own minimum latent objective threshold (such as ai5) indicated by a connecting arrow between the two measurements. The PROM will be realized to the PROz with probability Qiz by Subject i upon the PROM question if xi ≥ aiz, which determines the conditional association of the PROMi with the continuous objective endpoint Xi for Subject i at PROMz.

Note here, the event of “PROMz” revealing the disease status of Subject i includes two true events: (1) PROMi ≥ z if xi ≥ aiz (as true positive), and (2) PROMi < z if xi < aiz (as true negative). We realize that if there is no conditional association between PROMi and Xi, both Pr(PROMi ≥ z | Xi ≥ aiz) and Pr(PROMi < z | Xi < aiz) are equal to the pure chance rate: 50%. Therefore, we are searching for the minimum threshold aiz in this paper such that Subject i will give his/her PROMi ≥ z with probability Qiz when xi ≥ aiz; and likewise Subject i will give his/her PROMi < z also with probability Qiz when xi < aiz. If the probabilities of the two possible “truths” are not equal, their estimation requires many more assumptions (see derivation section for details) and is not considered in this paper. It is also necessary to point out that the two probabilities are not complementary to each other.

2.2 Characteristics of parameter Qiz

Parameter Qiz varies with PROMz and subject based on its definition. Therefore, there is no linear relationship between the PROM and the objective endpoint X for any subject. For example, Pr(PROMi < 5 | Xi < ai5) may be different from Pr(PROMi < 6 | Xi < ai6); and Pr(PROMi < 5 | Xi < ai5) for Subject i may be different from Pr(PROMh < 5 | Xh < ah5) for Subject h.

It is obvious that the clinical meaning of Qiz is inherited from its definition; i.e. the rate of revealing the truth, conditional on disease status (the actual disease status of Subject i relative to his/her minimum latent objective threshold for PROMz). A 50% rate revealing truth is equivalent to the subject flipping a fair coin to determine his/her PROz upon the PROM question; thus, this rate of 50% revealing truth indicates that the PROMz is not able to reveal the subject’s disease status. In general, the higher the rate revealing truth is, the better the quality of the PROMz is. This is because the higher rate indicates a higher probability of the PROMz to reveal a subject’s disease status upon the PROM question.

The use of Qiz to reveal the actual status of a subject’s disease has not been discussed in the literature. Rasch promoted a probability model for a true positive response [36]. However, because a negative agreement was not considered, the Rasch positive probability did not measure the probability of revealing truth from a PROM. Our approach is related to latent variable models for similar problems [37, 38] in the sense that aiz can be regarded as a latent variable. On the other hand, we do not assume a particular distribution for aiz, which makes our approach different from most latent variable models. It is also noteworthy that Qiz measures an indirect agreement between a continuous endpoint and a dichotomized version of an ordinal endpoint. Most traditional methodologies for measuring agreement as described in [39] are developed for two measures of the same type: both categorical or both continuous endpoints. In the case of different types of endpoints, ranks within each endpoint will replace the original values to make the two endpoints the same type (such as Spearman CC). In addition, the estimation of Qiz (1) can be applied to correlated data, and (2) takes into consideration the uncertainty of the “gold standard” and involves a series of 2-by-2 tables in order to select one for the estimate (see the toy example below). Therefore, Qiz can also be viewed as a new agreement statistic between a continuous endpoint and a binary endpoint with or without correlation among samples.

2.3 Data and corresponding random variables

The data considered in this paper consist of pairs of observations (xik, gik) for Subject i at clinical visit k, where k = 1, …, t. This xik is a continuous outcome representing disease status and could be the value at visit k or the change from baseline to visit k, such as the change in hemoglobin count from baseline. The outcome gik is the dichotomized version of the collected PROs at visit k, such as gik = 1 if the PROMi ≥ 5 and gik = 0 otherwise. The change from baseline in the PROMi is not considered here, because (1) each latent threshold of a PROMz corresponds to the PROMz itself instead of its change, and (2) a change in PROs from baseline does not carry the same clinical meaning, which depends on the baseline PROs. For example, in a 7-point scale PROMi shown in Fig 1, a change in one PROM unit from “much worse” to “worse” may not be meaningful to a subject, while a change in one PROM unit from “neither” to “improved” carries clinical meaning to the subject.

The corresponding random variables are denoted as Xik and Gik, where Gik takes one of two forms. When xik ≥ aiz, Gik is the Bernoulli random variable (B1(1, Qiz)) that equals 1 with probability Qiz; when xik < aiz, Gik is the Bernoulli random variable (B0(1, Qiz)) that equals 0 with probability Qiz. In other words, upon the PROM question, Subject i will give his/her gik = 1 (positive) with probability Qiz when his/her xik ≥ aiz, and will give his/her gik = 0 (negative) with probability Qiz when his/her xik < aiz as illustrated in Fig 2 below.
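The two-sided response model described above can be restated compactly. The display below only rewrites the definitions already given; the symbol G_{ik} for the random binary response is a shorthand adopted here for readability.

```latex
\Pr\left(G_{ik} = 1 \mid X_{ik} = x_{ik}\right) =
\begin{cases}
Q_{iz},     & x_{ik} \ge a_{iz}, \\
1 - Q_{iz}, & x_{ik} < a_{iz}.
\end{cases}
```

Equivalently, the response agrees with the truth (positive when x_{ik} \ge a_{iz}, negative otherwise) with probability Q_{iz} on either side of the threshold.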

Fig 2. The gik is from two Bernoulli random variables with the same parameter but opposite meaning, depending on where the Xik is realized: xik < aiz or xik ≥ aiz.

https://doi.org/10.1371/journal.pone.0205845.g002

2.4 Estimation of Qiz

This subsection shows how to estimate Qiz using a toy example. The derivation of the estimator of Qiz can be found in the next subsection. In order to estimate Qiz, it is necessary to first search for aiz. Because the aiz is the minimum latent threshold in the objective measurement for the PROMz, the search for aiz can be done using a pre-selected set of values {aj, j = 1, …, m} between the possible minimum objective measurement and the maximum objective measurement based on the current medical knowledge for the entire target population (such as the normal range of human hemoglobin count). The pre-selected value aj is not meant to be random, but rather fixed and ideally pre-determined before the realization of Xik. For example, the range of human blood hemoglobin concentration can be set from 5 g/dL to 20 g/dL so that aiz is believed to be included in the range for any subject; if the increasing step is 1 g/dL between aj and aj+1, then the number of searching points, m, is equal to 16 in this case. The magnitude of the increasing step is determined by how precise the aiz is expected to be. Again, this searching set is not considered random because it doesn’t change with study or subject and may not be changed for decades, such as the normal range of human blood pressures.
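As a quick check of the grid arithmetic in the hemoglobin example, the following minimal Python sketch builds a fixed search set from a lower bound, an upper bound and a step. The function name `search_grid` is hypothetical and is introduced only for illustration.

```python
def search_grid(lo, hi, step):
    """Return the fixed search points a_1, ..., a_m from lo to hi inclusive."""
    m = round((hi - lo) / step) + 1
    # Build each point from its index so floating-point error does not accumulate.
    return [round(lo + j * step, 10) for j in range(m)]


hemoglobin_grid = search_grid(5, 20, 1)  # 5 g/dL to 20 g/dL in steps of 1 g/dL
print(len(hemoglobin_grid))
```

The same helper gives m = 71 for the simulation grid (-2 to 5 in steps of 0.1) and m = 81 for the case-study grid (-20 to 60 letters in steps of 1) used later in the paper.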

Table 1 shows a toy example of how to estimate Qiz. Note here, the number of searching points m need not necessarily be equal to the number (t) of clinical visits although we do so for illustration purpose. At each aj, the outcome xik (k = 1, …, t) is compared to aj one at a time. Then the number of potential true positive (TP) and the number of potential true negative (TN) responses can be summarized per Table 2. For example, in the 1st data row of Table 1 there are 9 xi ≥ 5.0 (positive) and only 6 gi equal to one (PRO positive), therefore the TP is equal to 6 (see next paragraph for more details). The total number of such 2-by-2 tables is equal to m, as the total number of distinct aj is m. The derivation in next subsection shows that the maximum of Rij = (TP+TN)ij/t is a consistent estimator of Qiz.

Table 1. Estimate of Qiz based on 9 pairs of repeated outcomes (xij, gij) from subject i.

https://doi.org/10.1371/journal.pone.0205845.t001

Table 2. Number of cell count at aj (j = 1, …, m) for subject i and PROMz.

https://doi.org/10.1371/journal.pone.0205845.t002

Table 1 shows how to use the pre-determined set of aj (j = 1, …, m) to calculate Rij at each aj based on two sets of 9 pairs of observations (xi1, gi1) … (xi9, gi9) from Subject i. The only difference between the two sets of samples is the different values in the 2nd binary outcome gi2 (0 vs. 1). If the PROi is positive, gik = 1; otherwise gik = 0. The pre-determined set of aj (j = 1, …, 9) is listed in the 2nd column of Table 1. At each aj, one can compare the 9 objective outcomes (xi1, …, xi9) to aj one at a time, and obtain the numbers of potential TP, FP, TN, FN per Table 2 above. Thus, each data row of Table 1 displays the four statistics TP, FN, FP, and TN, corresponding to aj. The estimate of Qiz for Subject i at the PROMz is the maximum of Rij. In this paper, if there are multiple tied maximums of Rij the median of the corresponding aj is used as an estimate of aiz. This is because at each maximum of Rij, the corresponding aj could be an estimate of aiz.
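The search procedure described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `estimate_q` and the nine data pairs are hypothetical and are not the values shown in Table 1.

```python
from statistics import median


def estimate_q(x, g, grid):
    """Return (q_hat, a_hat) for one subject.

    x    : objective outcomes x_i1, ..., x_it
    g    : dichotomized PROs g_i1, ..., g_it (each 0 or 1)
    grid : pre-selected search points a_1, ..., a_m
    """
    t = len(x)
    rates = []
    for a in grid:
        # Potential true positives: objective outcome at or above a_j and PRO positive.
        tp = sum(1 for xk, gk in zip(x, g) if xk >= a and gk == 1)
        # Potential true negatives: objective outcome below a_j and PRO negative.
        tn = sum(1 for xk, gk in zip(x, g) if xk < a and gk == 0)
        rates.append((tp + tn) / t)  # R_ij
    q_hat = max(rates)
    # With tied maxima, take the median of the corresponding a_j as the threshold estimate.
    tied = [a for a, r in zip(grid, rates) if r == q_hat]
    return q_hat, median(tied)


# Hypothetical data for one subject (t = 9 visits), not taken from Table 1.
x = [4.0, 5.5, 6.0, 7.5, 8.0, 9.0, 10.5, 11.0, 12.0]
g = [0, 0, 0, 1, 0, 1, 1, 1, 1]
grid = list(range(5, 14))  # a_j = 5, 6, ..., 13
q_hat, a_hat = estimate_q(x, g, grid)
print(q_hat, a_hat)
```

For these hypothetical data the maximum Rij is 8/9, reached at aj = 7 and aj = 9, so the threshold estimate is their median, 8.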

2.5 Derivation of the estimator of Qiz

As illustrated in Fig 1 above, the Qiz doesn’t change its magnitude as long as xi ≥ aiz or xi < aiz, although Qiz changes its meaning from conditional true positive rate (when xi ≥ aiz) to conditional true negative rate (when xi < aiz). This is a reasonable setup because the event of PROMi ≥ z is a composite event including PROiz, PROiz+1, etc. For example, the event PROMi ≥ 5 includes PROi = 5, 6, or 7. When xi is far above aiz, Subject i may just give a higher PROi (say 7) and this event counts as one event of PROMi ≥ 5. This illustrates the fact that Qiz can be independent of the distance between xi and aiz. Because we search for aiz such that Pr(PROMi ≥ z | Xi ≥ aiz) = Pr(PROMi < z | Xi < aiz) and Qiz doesn’t change its magnitude as long as xi ≥ aiz or xi < aiz, we define Qiz = Pr(PROMi ≥ z | Xi ≥ a, ∀ a ≥ aiz) = Pr(PROMi < z | Xi < b, ∀ b < aiz) (see Fig 2 for the illustration), where a and b are two arbitrary values in the objective measurement. Note here, although the clinical meaning of Qiz changes from conditional positive rate to conditional negative rate according to xi ≥ aiz or xi < aiz, the magnitude of Qiz doesn’t change. This implies that the magnitude of Qiz doesn’t change with any subset of Xi ≥ aiz or Xi < aiz. In order to reflect the setup and the meaning of Qiz, we use a and b here to indicate that Qiz does not change its magnitude with any subset in Xi ≥ aiz or Xi < aiz.

Also, the derivation of the Qiz estimator doesn’t assume independence among Xi1, …, Xit. The cumulative distribution function of Xi1 is denoted as Fi1. Because the xi1 is obtained in the 1st clinical visit before the realization of Xi2, …, Xit, the cumulative distribution function of Xik (denoted as Fik, k > 1) is the marginal cumulative distribution function, which can be obtained by integrating out Xi1, …, Xik-1 from the joint distribution FXi1, …, Xik for Subject i. The use of the general form of Fik in the derivation takes into consideration the correlated samples. The joint distribution FXi1, …, Xik applies to random variables with or without correlation. Therefore, the Xik (k = 1, …, t) are not assumed independent of each other and each Xik has a different marginal distribution.

The derivation of the estimator of Qiz starts with the probability of getting TN and TP at Visit k, which are presented in Expressions (1)–(4) below:

  • When aj < aiz:
(1)

With same argument, one can have the following: (2)

  • When aj ≥ aiz:
(3) (4) , where .

Consequently, the expectation of TP+TN can be shown in Expressions (5) and (6), where E is the expectation operator.

  • When aj ≤ aiz:
(5)
  • When aj > aiz:
(6)

If aj is equal to aiz, both Expressions (5) and (6) reduce to tQiz. Therefore, Rij = (TP+TN)/t is an unbiased estimator of Qiz only if aj = aiz, and TP+TN follows the binomial distribution when aj = aiz because its expectation follows the expectation of the binomial random variable (tQiz). We further notice that Eij (TP+TN) attains its maximum at aiz when Qiz > 0.5, based on the sign of the derivative of Eij (TP+TN) with respect to aj. When Qiz > 0.5, the derivative of Eij (TP+TN) is positive to the left of aiz (see Expression 5), and becomes negative to the right of aiz (see Expression 6). Therefore, Eij (TP+TN) not only reaches its maximum at aiz, but also equals tQiz there. This is why the unbiased estimate of Qiz is chosen as the maximum of Rij. Similarly, Eij (TP+TN) attains its minimum at aiz when Qiz < 0.5.
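The argument above can be traced with a short worked derivation. What follows is a sketch obtained directly from the definitions of TP, TN, Qiz and Fik given in this section, assuming Xik is continuous so that Pr(Xik < a) = Fik(a); it may differ in notation from the numbered expressions. TP_k and TN_k denote the indicators that visit k is counted as a potential true positive or true negative at aj.

```latex
% Case a_j \le a_{iz}
\Pr(TN_k) = Q_{iz}\,F_{ik}(a_j), \qquad
\Pr(TP_k) = (1-Q_{iz})\,[F_{ik}(a_{iz}) - F_{ik}(a_j)] + Q_{iz}\,[1 - F_{ik}(a_{iz})],
\\[4pt]
E_{ij}(TP+TN) = t\,Q_{iz} - (2Q_{iz}-1)\sum_{k=1}^{t}\,[F_{ik}(a_{iz}) - F_{ik}(a_j)].
\\[10pt]
% Case a_j \ge a_{iz}
\Pr(TP_k) = Q_{iz}\,[1 - F_{ik}(a_j)], \qquad
\Pr(TN_k) = Q_{iz}\,F_{ik}(a_{iz}) + (1-Q_{iz})\,[F_{ik}(a_j) - F_{ik}(a_{iz})],
\\[4pt]
E_{ij}(TP+TN) = t\,Q_{iz} - (2Q_{iz}-1)\sum_{k=1}^{t}\,[F_{ik}(a_j) - F_{ik}(a_{iz})].
```

In both cases the sum is non-negative and vanishes at aj = aiz, so for Qiz > 0.5 the expectation is largest, and equal to tQiz, at the true threshold. Only the marginal distributions Fik enter because expectation is linear, which is why correlation among visits does not affect the result.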

In practice, it is reasonable to assume that a PROM has a non-negative association with the objective endpoint because the potential direction of the PROM is usually evident. If a negative association is expected, one can transform the objective outcome in order to have a non-negative association. As an example of a negative association, if the PROM is a price satisfaction survey and the continuous objective endpoint is the cost of medical expenses, then one can transform the cost by multiplying by −1 so that a smaller cost lies in the positive direction. Therefore, Qiz can be assumed to be ≥ 0.5. If Qiz = 0.5 (pure chance), this indicates that the PROMz may not be able to reveal the truth; consequently, there is no conditional association between the PROM and the objective measurement X at PROMz. This is because Qiz is defined as the probability of revealing truth at PROMz; Qiz = 0.5 is equivalent to Subject i flipping a fair coin to get the PROz by pure chance.

As discussed above, based on Expressions (5) and (6), the unbiased estimator of Qiz is Q̂iz = max(Rij) over j = 1, …, m if aiz is in the searching set {aj, j = 1, …, m}. In practice, many tied maximums of Rij may occur, especially when t is small and m is large. In this case, the median of the aj at the tied maximums will be taken as the estimate of aiz. Because of this, Q̂iz becomes a consistent estimator. The variance of Q̂iz is a nuisance quantity because the validation of a PROM is usually drawn from multiple subjects instead of Subject i alone. Nonetheless, the variance estimate of Q̂iz for Subject i can be obtained by Q̂iz(1 − Q̂iz)/t, because TP+TN follows a binomial distribution with parameters t and Qiz when aj = aiz. Further, because t is usually small, the exact binomial confidence interval for Qiz is used for Q̂iz in the simulation study.
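The subject-level variance estimate and an exact binomial interval can be sketched as follows. This sketch assumes that "exact binomial confidence interval" refers to the Clopper-Pearson construction, which the text does not state explicitly; the function names are hypothetical, and the limits are found by bisection using only the standard library.

```python
from math import comb


def variance_estimate(q_hat, t):
    """Binomial variance estimate of a proportion based on t visits."""
    return q_hat * (1 - q_hat) / t


def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing):
    """Bisection on [0, 1] for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(successes, t, level=0.95):
    """Clopper-Pearson interval for a proportion (assumed construction)."""
    alpha = 1 - level
    if successes == 0:
        lower = 0.0
    else:
        # P(Y >= successes | p) = alpha / 2; this tail probability increases with p.
        lower = _solve(lambda p: 1 - binom_cdf(successes - 1, t, p), alpha / 2, True)
    if successes == t:
        upper = 1.0
    else:
        # P(Y <= successes | p) = alpha / 2; this tail probability decreases with p.
        upper = _solve(lambda p: binom_cdf(successes, t, p), alpha / 2, False)
    return lower, upper


# Hypothetical subject with TP + TN = 8 at the selected a_j out of t = 9 visits.
print(variance_estimate(8 / 9, 9))
print(exact_binomial_ci(8, 9))
```

Because the binomial distribution of TP+TN holds at aj = aiz, the interval should be read as approximate when the selected aj differs from the true threshold.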

It is necessary to point out that if the two probabilities (say Q−iz for negative truth and Q+iz for positive truth) are not equal, many more assumptions are needed to estimate Q−iz and Q+iz. Using our method, when Q−iz and Q+iz are both greater than 0.5, the expectation of the maximum of Rij can be written in terms of Q−iz, r, and Fik (k = 1, …, t), where r = Q+iz/Q−iz. We can estimate aiz using the aj at which the maximum of (TP + TN) is reached, but r and the many Fik (k = 1, …, t) remain unknown. Even if we further assume r is known, we still cannot estimate Q+iz because we don’t know these Fik. Only if we further assume the distribution function of Xik at each clinical visit k can we have a consistent estimate of Q−iz and Q+iz. But we feel that these further assumptions on knowing r and Fik (k = 1, …, t) are not practical, especially in medical device clinical trials. Therefore, we only search for the threshold such that the two probabilities are equal in this paper.

2.6 Inference of Qiz in multiple subjects

So far, Qiz is estimated based on t repeated pairs of measurements from Subject i for the PROMz. If one wants to know the population Qz for the PROM and the objective measurement X at PROMz in a target patient population, the Qz can be confirmed by the mean of Q̂iz across subjects with its 95% CI. For example, the lower bound of the 95% confidence interval of Qz must be greater than a desired probability of revealing truth in order for one to believe that the PROMz is able to reveal disease status for the majority of subjects in the patient population.
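The population-level check can be sketched as below. The text does not say how the 95% CI of the mean is constructed, so this sketch assumes a normal-approximation interval based on the standard error of the subject-level estimates; the function names and the example values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def mean_with_ci(q_hats, level=0.95):
    """Mean of subject-level estimates with a normal-approximation CI (assumed)."""
    n = len(q_hats)
    q_bar = mean(q_hats)
    se = stdev(q_hats) / n**0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return q_bar, q_bar - z * se, q_bar + z * se


def supports_prom(q_hats, minimum_acceptable, level=0.95):
    """True if the lower confidence limit exceeds the desired probability."""
    _, lower, _ = mean_with_ci(q_hats, level)
    return lower > minimum_acceptable


# Hypothetical subject-level estimates from six subjects.
q_hats = [0.8, 0.9, 1.0, 0.9, 0.8, 1.0]
print(mean_with_ci(q_hats))
print(supports_prom(q_hats, 0.5))
```

With few subjects, a t-based or bootstrap interval may be a more cautious choice than the normal approximation used here.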

The ability of the PROMi to detect a change in the objective endpoint Xi could be confirmed by the statistically significant change of aiz to aiz’ obtained by different dichotomizations of the PROi. Note, the magnitude of aiz will be changed when the PROi is dichotomized differently. For example, the PROi can be dichotomized at scale 7 by “at least very much improvement or otherwise" or at scale 6 by “at least much improvement or otherwise". This change of dichotomization represents one unit change of the PROi from scale 6 to 7, and thus the change of ai6 to ai7 measures the ability of the PROMi to detect the change in the objective endpoint Xi. The aiz is expected to be larger when the PROi is dichotomized by “at least very much improvement or otherwise" compared to that by “at least much improvement or otherwise". This is because “at least very much improvement” is more difficult to be reached and thus its minimum threshold is expected to be higher than that for “at least much improvement”. One can obtain the estimate of the change of aiz to aiz’ from each of n different subjects, and perform the test of the mean change > 0.

3 Simulation and illustration

3.1 Simulation

Simulation data from Subject i are used to illustrate the characteristics of Q̂iz, especially to show that Q̂iz is a consistent estimator of Qiz. The simulation is not meant to align with a real clinical trial; however, the use of Q̂iz in a clinical trial is presented after the simulation using hypothetical clinical data. Because Qiz is defined at subject level, the simulation uses one treatment for a disease in one subject only. The primary endpoint is an objective endpoint measuring the change of the disease status from baseline to 3 months. The PROM is the 7-scaled disease-related satisfaction PROM such as illustrated in Fig 1. In order to include different means and variances, the simulation uses 5 different means [μ = (0, 0.5, 1, 1.5, 2)] and 5 associated variances [σ² = (1, 1.3, 1.6, 1.9, 2.2)] as two building blocks to construct various multivariate normal distributions for the objective endpoint. For example, if t = 10 then Xik (k = 1, …, 10) will follow the multivariate normal distribution with stacked mean vector (μ, μ) and the variance-covariance matrix with the elements of σ² repeated similarly on the diagonal and off-diagonal elements of ρσlσs. Other setups are described as follows:

  1. The correlation coefficient (ρ) between Xik and Xik’ takes the values 0.3, 0.5, and 0.8.
  2. ai7 (the minimum objective threshold for “at least very much improved”) is equal to 1.2, ai3 is equal to -0.3 and ai5 is equal to 0.4.
  3. The underlying probability of revealing the truth, Qiz (z = 3, 5, or 7) has values of 0.5, 0.6, 0.7, 0.8, and 0.9.
  4. Number of repeated measurements for the subject is t = 5, 10, 20, 40.
  5. Pre-selected aj ranges from -2 to 5.0 with increasing step of 0.1, therefore m = 71. Because the minimum two standard deviations below the five means is -2 and the maximum two standard deviations above the five means is 5, this range is wide enough to include all underlying true values of ai3, ai5, and ai7.
  6. Number of simulations is 10,000.

For each combination of ρ (0.3, 0.5, 0.8), aiz (-0.3, 0.4, 1.2), and Qiz (0.5, 0.6, 0.7, 0.8, 0.9), the t (5, 10, 20, 40) pairs of outcomes (xik, gik) (k = 1, …, t) are sampled as follows. First xik (k = 1, …, t) is drawn from the corresponding multivariate normal distribution. If xik ≥ aiz, gik is drawn from Bernoulli (Qiz); otherwise gik is drawn from Bernoulli (1-Qiz). Then an estimate of Qiz is calculated using the method described above based on the t pairs of outcomes, and its 95% CI is calculated using the exact binomial confidence interval due to small samples. These steps are repeated 10,000 times for each underlying value of Qiz and t; and then the mean of these 10,000 Q̂iz and the coverage probability of the 95% CIs for the Qiz are obtained.
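The sampling steps above can be sketched with the Python standard library. This is a minimal illustration rather than the authors' simulation code: it assumes the five means and variances are cycled across visits, and it generates the equicorrelated normal outcomes through a shared standard normal factor, which yields variance σ² on the diagonal and covariance ρσlσs off the diagonal. All names are hypothetical, and the number of replications is kept small for speed.

```python
import random
from statistics import mean

MEANS = [0.0, 0.5, 1.0, 1.5, 2.0]
VARIANCES = [1.0, 1.3, 1.6, 1.9, 2.2]
GRID = [round(-2 + 0.1 * j, 1) for j in range(71)]  # a_j from -2 to 5, m = 71


def estimate_q(x, g, grid):
    """Maximum over the grid of R_ij = (TP + TN) / t."""
    t = len(x)
    best = 0.0
    for a in grid:
        tp = sum(1 for xk, gk in zip(x, g) if xk >= a and gk == 1)
        tn = sum(1 for xk, gk in zip(x, g) if xk < a and gk == 0)
        best = max(best, (tp + tn) / t)
    return best


def draw_subject(t, rho, a_iz, q, rng):
    """Draw t correlated pairs (x_ik, g_ik) for one subject."""
    shared = rng.gauss(0, 1)  # common factor inducing correlation rho between visits
    x, g = [], []
    for k in range(t):
        mu = MEANS[k % 5]
        sd = VARIANCES[k % 5] ** 0.5
        z = rho**0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1)
        xk = mu + sd * z
        # g = 1 with probability Q above the threshold, and with 1 - Q below it.
        p_one = q if xk >= a_iz else 1 - q
        x.append(xk)
        g.append(1 if rng.random() < p_one else 0)
    return x, g


def simulate(t, rho, a_iz, q, n_sim, seed=0):
    """Mean of the estimates over n_sim simulated subjects."""
    rng = random.Random(seed)
    return mean(
        estimate_q(*draw_subject(t, rho, a_iz, q, rng), GRID) for _ in range(n_sim)
    )


for t in (5, 10, 20, 40):
    print(t, simulate(t, rho=0.5, a_iz=0.4, q=0.9, n_sim=200))
```

Coverage of the exact binomial interval could be added by computing the interval inside the loop and counting how often it contains the true Qiz.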

Figs 3–5 show three examples in which the mean of these 10,000 Q̂iz converges to Qiz regardless of the values of ρ and aiz. As the number of clinical visits increases for Subject i, the mean of Q̂iz approaches its underlying true value of Qiz. The converging pattern exists for every value of Qiz (0.6, 0.7, 0.8, 0.9) except for Qiz = 0.5. This is not a surprise because when Qiz = 0.5 there is no association between PROMi and Xi at PROMz. As shown in Expressions (5) and (6), when Qiz = 0.5 every Rij (j = 1, …, m) is an unbiased estimator of Qiz. A separate simulation using the median of Rij as Q̂iz is performed when Qiz = 0.5. The mean Q̂iz ranges from 0.50 to 0.52 (converging to 0.5) for different combinations of ρ, aiz, Qiz, and t. In practice, the simulation results for Qiz = 0.5 in Figs 3–5 can be used as a reference to set a minimum acceptable Qiz value. Table 3 shows that the mean Q̂i7 is a fairly close estimate of Qi7 under different values of t (5, 10, 20, 40). It is found that the probability of the 95% CI including the true value of Qi7 (coverage probability) is at least 95% due to the use of the exact binomial confidence interval.

Fig 3. The mean Q̂iz converges to its underlying value of Qiz as sample size increases (ρ = 0.3, aiz = 1.2).

https://doi.org/10.1371/journal.pone.0205845.g003

Fig 4. The mean Q̂iz converges to its underlying value of Qiz as sample size increases (ρ = 0.5, aiz = -0.3).

https://doi.org/10.1371/journal.pone.0205845.g004

Fig 5. The mean Q̂iz converges to its underlying value of Qiz as sample size increases (ρ = 0.8, aiz = 0.4).

https://doi.org/10.1371/journal.pone.0205845.g005

Table 3. Mean estimate and coverage probability of Qi7 (ρ = 0.8, ai7 = 1.2).

https://doi.org/10.1371/journal.pone.0205845.t003

3.2 Case study: Hypothetical clinical trial data

The probability Qiz of revealing truth for Subject i at PROMz has been applied to hypothetical clinical trial data in order to assess the conditional association parameter in multiple subjects. The purpose of the trial is to improve near vision with a medical device. Each subject had a test device implanted and was followed up at Months 3, 6, 12, 18, 24, and 30 post procedure. At each follow-up visit, a subject had his/her uncorrected near visual acuity (UCNVA) measured using an ETDRS Chart at 40 cm/16 in, and answered a unidimensional PROM question with 7 possible outcomes as shown in Fig 1. The question in the PROM was “How satisfied are you with your near vision without reading glasses after the treatment?” The change from baseline in UCNVA is considered as the continuous objective clinical endpoint with a larger change indicating better near vision. The outcome of the satisfaction question is the PRO which can be dichotomized in 3 ways for every subject: ≥5 or otherwise, ≥6 or otherwise, ≥7 or otherwise. The mean Q̂iz (z = 5, 6, or 7) is used to assess the probability of the PROMz to reveal the status of the visual acuity in the targeted population.

The pre-determined threshold searching set {aj, j = 1, …, m} ranges from -20 to 60 letters with an increasing step of 1. This set contains m = 81 searching points for the minimum threshold aiz (z = 5, 6, or 7). It is believed that the threshold-searching set is large enough to contain the true value of aiz for PROMz for every subject in the target population.
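As a minimal sketch, the searching set can be written out and scanned for one hypothetical subject. The agreement proportion used here is an illustrative stand-in chosen for this example; the paper’s own statistic Rij is defined in expressions outside this section and may differ. The data values and the function name agreement_by_threshold are assumptions.

```python
# Threshold searching set: -20, -19, ..., 60 letters (m = 81 points).
search_set = list(range(-20, 61))


def agreement_by_threshold(x, g, thresholds):
    """For each candidate threshold a_j, the proportion of visits at which
    the dichotomized PRO g_k agrees with the indicator x_k >= a_j."""
    return [
        sum(int((xk >= a) == bool(gk)) for xk, gk in zip(x, g)) / len(x)
        for a in thresholds
    ]


# Hypothetical subject: change in UCNVA (letters) and PRO >= 7 indicator
# at the six follow-up visits.
x = [2, 8, 15, 22, 25, 30]
g = [0, 0, 0, 1, 1, 1]

props = agreement_by_threshold(x, g, search_set)
best = max(props)
smallest_best_threshold = search_set[props.index(best)]
```

With only six visits, many neighbouring search points give the same agreement, which is one way to see why a small number of visits limits the precision of the threshold search.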

Table 4 shows the mean of the estimated Qiz (the probability of revealing truth) and the mean threshold in the change of UCNVA. As expected, the highest satisfaction level has the lowest mean probability of revealing the true uncorrected near visual acuity and the largest threshold in the change of UCNVA: 21 more letters correctly identified than at baseline. The associated 95% CIs for Qiz exclude 0.5 by a wide margin, indicating that Qiz is greater than 0.5 for the majority of subjects; consequently, the ability of the PROMz to reveal subjects’ uncorrected near visual acuity is established. Since the PROMz has a > 83% probability (based on the lower limits) of revealing the status of UCNVA, it may be used as a binary endpoint for the primary inference for uncorrected near visual acuity.

Table 5 shows the median change in the estimated minimum threshold when the satisfaction level changes. The estimated change is found to have a highly skewed distribution; therefore, p-values are reported from a non-parametric signed rank test, and the median rather than the mean is used as the summary statistic. One can observe that

  1. When the PRO increases from ≥5 to ≥6, the majority of subjects have no change (median = 0) in their uncorrected near visual acuity; this means that a PRO change from scale 5 to scale 6 may not represent a change in the uncorrected near visual acuity of the majority of subjects.
  2. When the PRO increases from ≥6 to ≥7 or from ≥5 to ≥7, the majority of subjects have a positive change (median = 9 or 21, respectively) in their uncorrected near visual acuity; this means that a PRO change from a lower score to 7 represents a change in the uncorrected near visual acuity of the majority of subjects.

These results indicate that a change of one PROM unit might not be adequate to translate into a change in uncorrected near visual acuity in this case. An increase of at least two PROM units indicates that the majority of subjects have a positive increase in their uncorrected near visual acuity. Consequently, in this clinical trial the ability of this PROM to detect a change in uncorrected near vision function is supported at two PROM units rather than one, or when the majority of subjects have their PRO scores changed to 7. Note that the number of samples from each subject is ≤ 6 in this trial, which limits the capability of this method to search for aiz.

4 Concluding remarks

The conditional probability Qiz of revealing the true status of Subject i’s disease at PROMz is a new quantitative statistic for assessing the conditional association between a unidimensional PROMi and a continuous objective endpoint Xi that measures the disease status. The probability Qiz of revealing truth is estimated for each subject using paired observations (xik, gik) measured repeatedly at different clinical visits (such as Months 3, 6, 12, etc.). The Qiz reveals truth with respect to the latent minimum objective threshold aiz (i.e. xik ≥ aiz, or xik < aiz). When the PROMi is not associated with the objective endpoint Xi, the Qiz is equal to the pure chance of 0.5. Because Qiz is a probability measure, this situation is as if the subject had flipped a fair coin to obtain his/her PRO regardless of the status of his/her disease. When a PROM is used as a measure of a disease/condition in a clinical trial setting, the probability of revealing truth must be at least statistically greater than the pure chance of 0.5.

The threshold searching set {aj: j = 1, …, m} can be pre-determined using the current clinical standard of the possible minimum and maximum objective measurements in the target population. For example, the human hemoglobin concentration ranges from 5 g/dL to 20 g/dL. The number m can be determined based on how precise aiz is expected to be.

In practice, a clinical trial has n subjects and thus n estimates of Qiz (i = 1, …, n). For the PROMz to be used in a target population, the majority of the Qiz (i = 1, …, n) have to be greater than the pure chance of 0.5; equivalently, the mean/median of the Qiz (i = 1, …, n) should be greater than 0.5. Although a mean/median of the Qiz > 0.5 would indicate an association between the PROM and the objective endpoint X greater than chance in the target population, a higher quality PROM should have a larger mean/median of the Qiz (i = 1, …, n). Let δ denote the minimum value of the mean/median of the Qiz (i = 1, …, n) that is an acceptable probability for PROMz to reveal the status of the majority of subjects’ disease. To confirm that the majority of subjects have their Qiz (i = 1, …, n) greater than δ, one can simply test that the mean/median of the Qiz (i = 1, …, n) among the n different subjects is > δ.
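One simple way to test that the median exceeds δ is a one-sided exact sign test. The paper does not name a specific test for this step, so the choice of the sign test is an assumption made for illustration, as are the per-subject estimates, the value of δ, and the function name sign_test_greater.

```python
from math import comb


def sign_test_greater(values, delta):
    """One-sided exact sign test of H0: median <= delta vs H1: median > delta.

    Returns (number above delta, number of non-tied values, p-value).
    """
    diffs = [v - delta for v in values if v != delta]  # ties are dropped
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    # Under H0 each value is above delta with probability 1/2.
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return k, n, p_value


# Hypothetical per-subject estimates of Qiz and an illustrative delta.
q_hat = [0.83, 0.91, 0.78, 0.88, 0.95, 0.67, 0.86, 0.90, 0.81, 0.93]
delta = 0.7

k, n, p_value = sign_test_greater(q_hat, delta)
```

A small p-value supports the claim that more than half of the subjects have a probability of revealing truth above δ; a test on the mean would need a different procedure.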

When the PRO is dichotomized differently, with one PROM unit increased at a time, one can obtain the associated estimate of the change in the minimum threshold in the objective measurement for each subject, such as the estimate of ai(z+1) − aiz (i = 1, …, n). If the mean of these estimates across subjects is statistically significantly greater than 0, then the PROM has the ability to detect a change in the objective endpoint. If the estimates (i = 1, …, n) have a skewed distribution, one should use their median instead, so that the test implies that the majority of the differences ai(z+1) − aiz (i = 1, …, n) are greater than 0.
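For a skewed distribution of threshold changes, the case study reports p-values from a non-parametric signed rank test. The sketch below assumes this refers to the Wilcoxon signed-rank test and computes a one-sided exact p-value by enumerating all sign assignments, which is only practical for a small number of subjects. The threshold changes and the function name signed_rank_greater are hypothetical.

```python
from itertools import product


def signed_rank_greater(diffs):
    """One-sided exact Wilcoxon signed-rank test of H0: centre <= 0.

    Returns (W+, p-value), where W+ is the sum of ranks of positive values.
    """
    d = [x for x in diffs if x != 0]  # zero differences are dropped
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a group of tied absolute values.
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average rank within the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    # Under H0 each rank carries a positive sign with probability 1/2.
    count = sum(
        1
        for signs in product((0, 1), repeat=len(d))
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, count / 2 ** len(d)


# Hypothetical per-subject changes in the estimated threshold (letters).
changes = [9, 12, 0, 5, 21, -3, 8, 14]
w_plus, p_value = signed_rank_greater(changes)
```

The enumeration visits 2^n sign patterns, so a normal approximation or a statistics library would be the practical choice for larger trials.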

The limitations of using Qiz include (1) it is applicable to a unidimensional PROM or a PROM item of interest in a multi-dimensional PROM instrument when a valid continuous objective measure of the disease status exists, and (2) if the number of repeated measurements is small, the estimator of Qiz is more biased. In this case, one can adjust the minimum acceptable probability of revealing truth in order to have confidence for the PROMz to reveal truth. Further research may focus on a quantitative method for measuring the conditional association between a multi-dimensional PROM and a pertinent objective measurement.

Appendix A: Notations

  • Sub-indexes i and j represent Subject i and threshold searching point j within a clinical visit k (i = 1, …, n, j = 1, …, m, and k = 1, …, t). The letter z denotes the zth scale of the PROM (PROMz).
  • The aiz is a fixed parameter which is defined as the minimum latent threshold in terms of the objective measurement for Subject i at PROMz. The aiz is defined for the zth scale and Subject i. For example, if the PROM has 5 different scales, then we will have five different values of aiz for the subject.
  • The aj is the jth searching point for aiz, and the aj belongs to a fixed pre-selected threshold searching set {aj: j = 1, …, m} (such as the normal range of hemoglobin count with an increasing step of 0.5). The aj is a nonrandom variable and does not change with subject. The set is selected based on the current clinical standard of normal range.
  • The X is the random variable for the continuous objective measurement of the status of a subject’s disease/condition, and lower case x is an outcome/realization of X.
  • The first component of Gik is the Bernoulli random variable that equals 1 with probability Qiz when xik ≥ aiz.
  • The second component of Gik is the Bernoulli random variable that equals 0, also with probability Qiz, when xik < aiz.
  • Gik represents a mixture of two Bernoulli random variables with the same parameter Qiz but opposite meaning: one applies if xik ≥ aiz and the other if xik < aiz.
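Read together, the last three bullets amount to the following compact statement. It is a restatement of the definitions above, not an additional result, and it assumes that Gik = 1 denotes a PRO at or above scale z at visit k.

```latex
Q_{iz} \;=\; P\!\left(G_{ik} = 1 \mid x_{ik} \ge a_{iz}\right)
       \;=\; P\!\left(G_{ik} = 0 \mid x_{ik} < a_{iz}\right),
\qquad
Q_{iz} = 0.5 \ \text{ when } \mathrm{PROM}_i \text{ is not associated with } X_i .
```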

Acknowledgments

The authors acknowledge the administrative support for this research from the Division of Biostatistics, Office of Surveillance and Biometrics, CDRH, FDA.

References

  1. 1. Willke RJ, Burke LB, Erickson P. Measuring treatment impact: A review of patient reported outcomes and other efficacy variables in approved product labels. Controlled Clinical Trials 2004; 25: 535–552. pmid:15588741
  2. 2. Gnanasakthy A., Mordin M, Clark M., DeMuro C., Fehnel S., Copley-Merriman C. A Review of Patient-Reported Outcome Labels in the United States: 2006 to 2010. Value in Health 2012;,15(3): 437–442. pmid:22583453
  3. 3. Patrick DL, Burke LB, Powers JH, et al. Patient-reported outcomes to support medical product labeling claims: FDA perspective. Value Health 2007; 10 Supp. 2: S125–137.
  4. 4. Ader DN. Developing the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care 2007; 45(Suppl. 1): S1–S2.
  5. 5. Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap Cooperative Group during its first two years. Medical Care 2007; 45(Suppl. 1): S3–S11.
  6. 6. Reeve BB, Hays RD, Bjorner JB, et al. Psychometric Evaluation and Calibration of Health-Related Quality of Life Item Banks Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care 2007; 45: S22–S31. pmid:17443115
  7. 7. Cappelleri JC, Bushmakin AG. Interpretation of patient reported outcomes. Statistical Methods in Medical Research 2014; 23(5): 460–483 pmid:23427226
  8. 8. Massof RW. A general theoretical framework for interpreting patient-reported outcomes estimated from ordinally scaled item responses. Statistical Methods in Medical Research 2014; 23(5) 409–429. pmid:23427227
  9. 9. FDA Guidance for Industry Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. 2009. https://www.fda.gov/downloads/drugs/guidances/ucm193282.pdf
  10. 10. Fayers PM, Machin D. Quality of Life: The assessment, analysis and interpretation of patient-reported outcomes. 2007, 2nd, John Wiley & Sons Ltd.
  11. 11. Bassim CW, Fassil H, et al (16 other authors). Validation of the National Institutes of Health chronic GVHD Oral Mucosal Score using component-specific measures. Bone Marrow Transplantation 2014, 49: 116–121. pmid:23995099
  12. 12. Bredart A, et al (with other 7 authors). Quality of care in the oncology outpatient setting from patients' perspective: a systematic review of questionnaires' content and psychometric performance. Psycho-Oncology 2015, 24(4): 382–394. pmid:25196048
  13. 13. Brod M, Hojbjerre L, Bushnell DM, Hansen CT. Assessing the impact of non-severe hypoglycemic events and treatment in adults: development of the Treatment-Related Impact Measure—Non-severe Hypoglycemic Events (TRIM-HYPO). Quality of Life Research 2015, 24: 2971–2984. pmid:26094008
  14. 14. Cavaletti G. et al (with 27 authors). The chemotherapy-induced peripheral neuropathy outcome measures standardization study: from consensus to the first validity and reliability findings. Annal of Oncology 2013, 24 (2): 454–462.
  15. 15. Cella D, Jensen SE, Webster K, Du H, Lai JS, Rosen S, Tallman MS, Yount S. Measuring Health-Related Quality of Life in Leukemia: The Functional Assessment of Cancer Therapy–Leukemia (FACT-Leu) Questionnaire. Value in Health 2012, 15(8): 1051–1058. pmid:23244807
  16. 16. Coyne KS, Margolis MK, Murphy J, Spies J. Validation of the UFS-QOL-Hysterectomy Questionnaire: Modifying an Existing Measure for Comparative Effectiveness Research. Value in Health 2012, 15(5): 674–679. pmid:22867776
  17. 17. Gewandtera JS, et al (with other 10 authors). Associations between a patient-reported outcome (PRO) measure of sarcopenia and falls, functional status, and physical performance in older patients with cancer. Journal of Geriatric Oncology 2015, 6(6): 433–441. pmid:26365897
  18. 18. Gunaydin G, et al (with other 7 authors). Cross-cultural adaptation, reliability and validity of the Turkish version of the Japanese Orthopaedic Association Back Pain Evaluation Questionnaire. Journal of Orthopaedic Science 2016, 21(3):295–8. pmid:26898339
  19. 19. Harris-Hayes M, McDonough CM, Leunig M, Lee CB Callaghan JJ, Roos Em. Clinical outcomes assessment in clinical trials to assess treatment of femoroacetabular impingement: use of patient-reported outcome measures. Journal of the American Academy of Orthopaedic Surgeons 2013, 21(01): S39–S46.
  20. 20. Kimball AB, et al (with other 9 authors). Assessing the validity, responsiveness and eaningfulness of the Hidradenitis Suppurativa Clinical Response (HiSCR) as the clinical endpoint for idradenitis suppurativa treatment. British Journal of Dermatology 2014, 171(6): 1434–1442. pmid:25040429
  21. 21. Kopjar B, Tetreault L, Kalsi-Ryan S, Fehlings M. Psychometric properties of the modified Japanese Orthopaedic Association scale in patients with cervical spondylotic myelopathy. Spine 2015, 40(1): E23–E28 pmid:25341993
  22. 22. Pinheiro LC, Callahan LF, Cleveland RJ, Edwards LJ, Reeve BB. The Performance and Association Between Patient-reported and Performance-based Measures of Physical Functioning in Research on Individuals with Arthritis. The Journal of Rheumatology 2016, 43(1): 131–137. pmid:26628600
  23. 23. Motl RW, Cadavid D, Sandroff BM, Pilutti LA, Pula JH, Benedict RHB. Cognitive processing speed has minimal influence on the construct validity of Multiple Sclerosis Walking Scale-12 scores. Journal of the Neurological Sciences 2013, 335(1–2): 169–173. pmid:24104065
  24. 24. Petrillo J, Cano SJ, McLeod LD, Coon CD. Using Classical Test Theory, Item Response Theory, and Rasch Measurement Theory to Evaluate Patient-Reported Outcome Measures: A Comparison of Worked Examples. Value in Health, Vol 2015, 18(1): 25–34 pmid:25595231
  25. 25. Selman L, et al (with other 13 authors). 'Peace'and 'life worthwhile'as measures of spiritual well-being in African palliative care: a mixed-methods study. Health and Quality of Life Outcomes 2013,11: 94. pmid:23758738
  26. 26. Strober BE, Nyirady J, Mallya UG, Guettner A, Papavassilis C, Gottlieb A, Elewski BE, Turner-Bowker DM, et al. Item-Level Psychometric Properties for a New Patient-Reported Psoriasis Symptom Diary. Value in Health 2013, 16(6): 1014–1022 pmid:24041351
  27. 27. Tubergen AV, Black P, McKenna SP, Coteur G. FRI0523 Validity of ankylosing spondylitis patient-reported outcome instruments in the broad axial spondyloarthritis population. Annals of the Rheumatic Diseases, 2013, 72: A551
  28. 28. Turk DC, Dworkin RH, Trudeau JJ, Benson C, Biondi DM, Katz NP, Kim M. Validation of the Hospital Anxiety and Depression Scale in Patients With Acute Low Back Pain 2015, 16(10): 1012–1021. pmid:26208762
  29. 29. Vascellari A, Schiavetti S, Rebuzzi E, Coletti N. Translation, cross-cultural adaptation and validation of the Italian version of the Nottingham Clavicle Score (NCS). Archives of Orthopaedic and Trauma Surgery 2015, 135:1561–1566 pmid:26254581
  30. 30. Zeneli A, Fabbri E, Donati G, Tierney G, Pasa S, Berardi MA, Maltoni M. Translation of Supportive Care Needs Survey Short Form 34 (SCNS-SF34) into Italian and cultural validation study. Support Care Cancer 2016, 24:843–848. pmid:26166001
  31. 31. Rejas J, Ruiz M, Pardo A, Soto J. Detecting Changes in Patient Treatment Satisfaction with Medicines: The SATMED-Q. Value in Health 2013, 16(1): 88–96. pmid:23337219
  32. 32. Giesinger JM, Kuster MS, Behrend H, Giesinger K. Association of psychological status and patient-reported physical outcome measures in joint arthroplasty: a lack of divergent validity. Health and Quality of Life Outcomes 2013,11: 64. pmid:23601140
  33. 33. Manor L, et al (with other 6 authors). Age-related variables in childhood epilepsy: How do they relate to each other and to quality of life? Epilepsy & Behavior 2013, 26(1): 71–74.
  34. 34. Persson E, Eklund M, Lexell J, Rivano-Fischer M. Psychosocial coping profiles after pain rehabilitation: associations with occupational performance and patient characteristics. Disability and Rehabilitation 2016, pmid:26883399
  35. 35. Tate FR. Correlation between a Discrete and a Continuous Variable. Point-Biserial Correlation. The Annals of Mathematical Statistics 1954, 25(3): 603–607.
  36. 36. Rasch G. Probability models for some intelligence and attainment tests. Copenhagen, Danish Institute for Educational Research, 1960.
  37. 37. Kim M, Wall MM, Li G. Risk stratification for major postoperative complications in patients undergoing intra-abdominal general surgery using latent class analysis. Anesthesia & Analgesia 2018 Mar,126(3):848–857. pmid:28806210
  38. 38. Lombardi S, Santini G, Marchetti GM, Focardi S. Generalized structural equations improve sexual-selection analyses. PLOS ONE 2017, https://doi.org/10.1371/journal.pone.0181305
  39. 39. Lin L., Hedayat AS., Wu W. Statistical Tools for Measuring Agreement. Springer New York Dordrecht Heidelberg London, 2012.