Skip to main content
Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Development of a methodology to make individual estimates of the precision of liquid chromatography-tandem mass spectrometry drug assay results for use in population pharmacokinetic modeling and the optimization of dosage regimens

  • Gellért Balázs Karvaly ,

    Roles Conceptualization, Data curation, Investigation, Methodology, Project administration, Software, Visualization, Writing – original draft

    Affiliation Department of Laboratory Medicine, Semmelweis University, Budapest, Hungary

  • Michael N. Neely,

    Roles Conceptualization, Supervision, Writing – original draft

    Affiliation Laboratory of Applied Pharmacokinetics and Bioinformatics, Children’s Hospital of Los Angeles, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America

  • Krisztián Kovács,

    Roles Data curation, Project administration, Visualization

    Affiliation Department of Laboratory Medicine, Semmelweis University, Budapest, Hungary

  • István Vincze,

    Roles Data curation, Project administration, Validation

    Affiliation Department of Laboratory Medicine, Semmelweis University, Budapest, Hungary

  • Barna Vásárhelyi,

    Roles Funding acquisition, Supervision, Writing – original draft

    Affiliation Department of Laboratory Medicine, Semmelweis University, Budapest, Hungary

  • Roger W. Jelliffe

    Roles Conceptualization, Supervision, Writing – original draft

    Affiliation Laboratory of Applied Pharmacokinetics and Bioinformatics, Children’s Hospital of Los Angeles, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America



The clinical value of therapeutic drug monitoring can be increased most significantly by integrating assay results into clinical pharmacokinetic models for optimal dosing. The correct weighting in the modeling process is 1/variance, therefore, knowledge of the standard deviations (SD) of each measured concentration is important. Because bioanalytical methods are heteroscedastic, the concentration-SD relationship must be modeled using assay error equations (AEE). We describe a methodology of establishing AEE’s for liquid chromatography-tandem mass spectrometry (LC-MS/MS) drug assays using carbamazepine, fluconazole, lamotrigine and levetiracetam as model analytes.


Following method validation, three independent experiments were conducted to develop AEE’s using various least squares linear or nonlinear, and median-based linear regression techniques. SD’s were determined from zero concentration to the high end of the assayed range. In each experiment, precision profiles of 6 (“small” sample sets) or 20 (“large” sample sets) out of 24 independent, spiked specimens were evaluated. Combinatorial calculations were performed to attain the most suitable regression approach. The final AEE’s were developed by combining the SD’s of the assay results, established in 24 specimens/spiking level and using all spiking levels, into a single precision profile. The effects of gross hyperbilirubinemia, hemolysis and lipemia as laboratory interferences were investigated.


Precision profiles were best characterized by linear regression when 20 spiking levels, each having 24 specimens and obtained by performing 3 independent experiments, were combined. Theil’s regression with the Siegel estimator was the most consistent and robust in providing acceptable agreement between measured and predicted SD’s, including SD’s below the lower limit of quantification.


In the framework of precision pharmacotherapy, establishing the AEE of assayed drugs is the responsibility of the therapeutic drug monitoring service. This permits optimal dosages by providing the correct weighting factor of assay results in the development of population and individual pharmacokinetic models.


It is increasingly accepted that the utility of therapeutic drug monitoring (TDM) can be improved significantly by integrating the assayed drug concentrations into population and individualized clinical pharmacokinetic models. This can be optimized by using correct weighting of the data in making such models [14], enabling the maximally precise calculation of individual patient dosages using precision pharmacotherapy software. Because the correct weighting factor is proportional to 1/variance, all such models require reporting the standard deviation (SD) values [not the coefficients of variation (CV%)] corresponding to the measured drug levels. (Fig 1) [5,6].

Fig 1. An overview of how therapeutic drug monitoring (TDM) results can be integrated into clinical population and individual pharmacokinetic models on which individualized drug therapy is based.

After the drug is administered (A), blood is drawn at specific times (B) and submitted for analysis to obtain time-concentration data with the aim of determining the subject’s individual pharmacokinetic parameters. This can be done using, for instance, either parametric or nonparametric Bayesian analysis (C). An illustrative input pharmacokinetic model file of Pmetrics, a nonparametric population modeling software suite [12], is shown (D). The coefficients of the assay error equation are included in the computer script under “#Err”. Ka, Ke, V and Tlag are notations for pharmacokinetic parameters. Line art images are uncopyrighted and were made available by the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, United States at

Increasing evidence suggests that most bioanalytical assays are heteroscedastic, i.e. the SD varies in some way with the measured concentration [7,8]. The relationship between SD and analyte concentration can be estimated quantitatively for each level measured by the analytical method. This approach was first proposed by Ekins et al for an assay of aldosterone, revisited by Sadler, and proposed as the experimentally quantified error term in both population pharmacokinetic (PK) modeling and Bayesian posterior models in individual patients by Jelliffe and Tahani [911]. The clinical value of quantifying the assay error as a function of the drug concentration has resulted in its incorporation into precision pharmacotherapy software suites employing nonparametric pharmacokinetic modeling, such as Pmetrics and BestDose [12,13].

Based on studies conducted using automated clinical analyzers in the 1990’s, unweighted second degree polynomial regression was proposed to characterize the assay precision profile, established by analyzing several replicates of real patient specimens [6,11]. Recent advances show that the future of TDM is strongly linked to liquid chromatography-tandem mass spectrometry (LC-MS/MS) [14]. As shown in practice on a voriconazole assay and on theoretical grounds, bioanalytical LC-MS/MS methods can be characterized by linear error patterns [6,8].

In an effort to increase performance and decrease laboratory turnaround times and operational costs, automated clinical assays, including drug analysis, are typically performed without repeats. Therefore, the SD of the drug assay result, reported for further PK calculations, needs to be estimated. For this purpose, assay error equations (AEE’s) can be easily used. AEE’s are developed systematically, from zero concentration throughout the entire working range of the assay, by performing regression on the series of concentration-SD data pairs (i.e. the precision profile) with multiple measurements done over the widest measurable concentration range and enrolling several independent serum samples fortified with the analyte(s). Furthermore, an appropriate number of independent experiments is performed to include the inter-day variations of laboratory operation.

Although it has been advocated that using AEE’s may extend the clinical utility of bioanalytical methods, to our knowledge, no systematic analysis has been published on the methodology of AEE development [15].

An appropriate AEE will significantly optimize the precision of parameter estimates in population and individual patient pharmacokinetic models. Currently, little attention is given to this issue, and CV% is still regarded generally as the primary measure of assay precision. Our aim is to present an approach to develop AEE’s which allows the most precise calculation of the SD’s for each result obtained in routine clinical LC-MS/MS drug assays. For this purpose we studied precision profiles using an in-house LC-MS/MS method developed for the analysis of carbamazepine (CBZ), fluconazole (FLU), lamotrigine (LAM) and levetiracetam (LEV) [2,16,17].

Materials and methods

Chemicals and solutions

CBZ, FLU, LAM, LEV (all >99.5% pure), and their isotopically labeled analogs fluconazole-2H4 98.0%, lamotrigine-13C7,15N 99.0%, and levetiracetam-2H6 99.5% were purchased from Alsachim SAS (Ilkirch-Graffenstaden, France). Carbamazepine-2H10 0.1 mg/mL in methanol solution (CBZ-ISS) was a certified reference material bought from Sigma-Aldrich Hungary Kft (Budapest, Hungary). LC-MS grade acetonitrile, methanol and water were obtained from Molar Chemicals Hungary Kft (Halásztelek, Hungary).

Stock solutions of the analytes were prepared in methanol and stored at -75 °C until use. The first series of stock solutions contained CBZ (CBZ-SS1), FLU (FLU-SS1), LAM (LAM-SS1) or LEV (LEV-SS1) at 4.006, 4.948, 4.080 or 3.974 mg/mL, respectively. The second series of stock solutions contained the analytes at 2.017 (CBZ-SS2), 2.014 (FLU-SS2), 1.987 (LAM-SS2) and 2.015 (LEV-SS2) mg/mL, respectively. The stock solutions of the internal standards fluconazole-2H4 (FLC-ISS), lamotrigine 13C7 (LAM-ISS) and levetiracetam 2H6 (LEV-ISS), prepared in methanol, contained the substances at 1.00 mg/mL. The internal standard solution (IS) contained carbamazepine-2H10 at 0.2 μg/mL as well as fluconazole-2H4, lamotrigine-13C7,15N, and levetiracetam-2H6 at 2 μg/mL in acetonitrile.

The analyte solution used in the carry-over experiment was prepared by spiking CBZ-SS1, FLU-SS1, LAM-SS1 and LEV-SS1, 5 μL each, to 980 μL methanol, and by adding 10 μL of the resulting mixture to 990 μL methanol-water (3:1, v/v).

Further solutions were used in the matrix factor study. The high-level matrix factor spiking solution (MF-H) was prepared by diluting CBZ-SS1, FLC-SS1, LAM-SS1 and LEV-SS1, 25 μL each, to 500 μL with methanol. The low-level matrix factor spiking solution (MF-L) was prepared by diluting 10 μL MF-H with 990 μL methanol. The internal standard spiking solution (MF-IS) was prepared by diluting CBZ-ISS, FLC-ISS, LAM-ISS and LEV-ISS, 90 μL each, to 1000 μL with methanol.

Serum specimens

All of our procedures were in agreement with the Declaration of Helsinki. There was no interaction or intervention with human subjects in the framework of our study, and only leftover patient-related physical material was used. All employed serum specimens were irreversibly de-identified prior to making any manipulation or measurements related to our study on these specimens. Identifiable private information was not accessed by the authors. Due to these considerations and in line with effective regulations, no approval was sought from the Institutional Ethics Committee of Semmelweis University.

In this work, the word ‘specimen’ refers to unmanipulated blood or serum separated therefrom. ‘Sample’ is used for serum prepared, or manipulated by spiking the analytes, and then prepared for analysis. ‘Sample set’ is a batch of samples prepared from serum specimens, each of which had been obtained from a different individual. Human serum was obtained after blood had been drawn into 6-mL red-top native, serum-separator tubes in a standard phlebotomy process from in- and outpatients at various Departments of Semmelweis University (Budapest, Hungary), sent to the Central Laboratory (Pest), Department of Laboratory Medicine, Semmelweis University for routine diagnostic tests unrelated to our study, and by separating the serum by centrifugation (5366 x g, 10 min, cooled to +4 °C). The specimens employed in this study were leftover material which had undergone all ordered tests and were stored separately at 2–8 °C, allocated for disposal. All information displayed on the tubes and allowing patient identification were removed irreversibly at the Central Laboratory (Pest), and the tubes were given a numerical identifier for use in our study.

Candidate specimens were tested for hyperbilirubinemia, hyperlipidemia and hemolysis by establishing the lipemia-icterus-hemolysis index (LIH index, lipemic index) using a Beckman-Coulter clinical analyzer (Beckman Coulter Hungary, Budapest, Hungary) at Central Laboratory (Pest). Specimens with a positive LIH index were inspected visually for making judgement on which type of interference was dominant. Subsequently, specimens were frozen at -18 °C and transported in a frozen state to the Laboratory of Mass Spectrometry and Separation Technology, Department of Laboratory Medicine, Semmelweis University, for processing in the framework of this study. The absence of the analytes from the specimens was verified in preliminary experiments using the methodology described below.

Experimental design

This study consisted of 3 experiments conducted on separate sets of human serum specimens of at least 2.0 mL in volume, all collected from different patients. Spiked specimens (i.e. samples) were stored at -75 °C until analysis. The specimens were not pooled before spiking. In both experiments 1 and 2, specimens with a positive LIH index (hyperbilirubinemic, hemolytic and lipemic, 6 of each) were selected, in addition to 6 specimens without these conditions (termed as “normal”). The total of 24 specimens were spiked with the analytes at 4 and 6 concentration levels in each (i.e. spiking levels) in experiments 1 and 2, respectively. Experiment 3 was carried out by spiking 24”normal” serum specimens to reach 10 different concentration levels for each analyte (Fig 2).

Fig 2. Presentation of the experimental design.

3 experiments were performed. The impact of sample size, the number of spiking levels and common interferences was assessed on “small” and”large” sample sets containing 6 and 20 samples, respectively, using a combinatorial approach. The final assay error equations were established by using all of the results obtained in the 3 experiments (N = 24 specimens/spiking level, M = 20 spiking levels+blank).

In each experiment, various regression equations were applied to characterize the precision profiles obtained on sets of 6 (“small” sample sets), as well as of 20 spiked serum specimens (“large” sample sets). To evaluate the results obtained in all possible combinations of the samples, the combination formula “N choose R”: , was employed, where N is the total number of samples included in an experimental set (N = 24 at each spiking level) and R is the number of samples in which the drug concentrations were evaluated to obtain the precision profiles (R = 6 and R = 20 for “small” and for “large” sample sets, respectively, at each spiking level). Consequently, in each experiment and for each type of regression, 134596 (when R = 6) and 10626 (when R = 20) sets of coefficients were obtained. The calculations were performed using the computer scripts provided in S3 File.

The final AEE’s, intended for use in developing pharmacokinetic models, were established by performing regression on the precision profiles established using each and every assay result obtained in the 3 independent experiments (N = 24 specimens at each spiking level, M = 20 spiking levels, Fig 2).

Preparation of serum calibrator and spiked independent samples

Separate calibrator series, consisting of 6 levels each, were prepared freshly for each experiment. Pooled serum previously screened for the absence of the investigated substances was used. S1 and S2 Tables present details of the preparation and the analyte content of the calibrators and the spiked independent samples, respectively.

Sample preparation

Spiking of the analytes was performed by adding a small volume of the methanol solution of the analytes (i.e. the spiking volume) to the specimens, corresponding to ≤9.5% of the specimen volume.

Sample preparation was identical for all 4 analytes. 25 μL serum and 200 μL IS were pipetted onto a Phenomenex Impact 96-well protein precipitation plate (Gen-Lab Kft, Budapest, Hungary), which was shaken (1100 rpm, 10 min) on an Allsheng TMS-200 thermo shaker (Labex Kft, Budapest, Hungary). The supernatant was forced into a Brand® 96-well deep well collection plate (Merck Kft, Budapest, Hungary) using a Phenomenex® Presston 100 positive pressure manifold (Gen-Lab Kft, Budapest, Hungary) and nitrogen gas (5.0 purity, Messer Hungarogáz Kft, Budapest, Hungary). Then, a 25 μL-aliquot of the supernatant was transferred into another collection plate and 475 μL water-methanol (3:1, v/v) mixture was added. The entire procedure was performed at room temperature.

Analytical method

Analysis was performed using a Shimadzu Nexera X2 ultra-performance liquid chromatograph consisting of two LC-30 pumps, a SIL-30AC autosampling unit and a CTO-20AC column thermostat coupled to a Shimadzu LCMS-8060 mass spectrometer equipped with an electrospray ionization source and operated with positive polarity. Integrated system control and data acquisition were conducted by the Shimadzu LabSolutions v5.89 software (Simkon Kft, Budapest, Hungary).

Chromatography was performed using a Phenomenex Kinetex XB-C18 50x2.1 mm (1.7 μm) analytical column kept at 40 °C and protected by a Phenomenex® SecurityGuard Ultra C18 2.1x5 mm cartridge (Gen-Lab Kft, Budapest, Hungary). The mobile phases were water (A) and methanol (B), both containing 0.1% formic acid. The gradient program was 0 min: 30% B, 0.50 min: 30% B, 2.00 min: 90% B, 3.00 min: 90% B, 3.01 min: 30% B. The run time was 4.00 min. The sampler tray was cooled to 15 °C. 2-μL aliquots were injected.

The general mass spectrometer settings were as follows. Nebulizer gas: 3 L/min, heating gas: 12 L/min, drying gas: 6 L/min, interface temperature: 300 °C, desolvation line temperature: 250 °C, heat block temperature: 400 °C. The dwell time of each ion transition was 50 ms. Analyte-specific detection settings are shown in Table 1. Each spiked serum specimen was tested in a single measurement.

Table 1. Mass spectrometry settings applied for the quantitation of carbamazepine, fluconazole, lamotrigine and levetiracetam.

Evaluation of analytical results

The compounds of interest were identified by their ion transitions and retention times. Peak area ratios of the target ion transitions of analytes and internal standards were used for quantitative analysis. The 6-point calibration model was linear with 1/x2 weights, performed at the beginning of each batch run. In experiments 2 and 3, concentrations lower than the calibrated range were evaluated by single point calibration using the lowest level calibrator (0.403, 0.403, 0.397 and 0.403 μg/mL for CBZ, FLU, LAM and LEV, respectively).

Method validation

Method validation was carried out according to the respective guideline on bioanalytical method validation issued by the European Medicines Agency [18]. Sample carry-over with respect to the signals of the analytes, the performance of the calibration curves, matrix factors, as well as accuracy and precision (the latter as the CV%) were evaluated. Since the aim of this work was to establish precision profiles using specimens spiked immediately before conducting the experiments, storage or freeze-thaw stability studies were not performed.

Sample carry-over was tested by injecting 2 μL of water-methanol (3:1, v/v) solutions containing CBZ, FLU, LAM and LEV at 200, 247, 204 and 198 ng/mL, respectively, 5 times, and by evaluating the areas of analyte and internal standard peaks potentially appearing in the ion chromatograms of blank solvents run in between.

Matrix effect was tested by evaluating matrix factors and the internal-standard corrected matrix factors in “normal”, hyperbilirubinemic, hemolytic and lipemic specimens, 6 of each. Deproteinization was accomplished by diluting 25 μL blank specimen with 200 μL acetonitrile as described in the ‘Sample preparation’ section. 190 μL supernatant was transferred to a 2-mL Eppendorf tube and was spiked with 5 μL MF-H or MF-L, as well as with 5 μL MF-IS. 25 μL of the spiked supernatant was diluted with 475 μL water-methanol (3:1, v/v). Reference solutions were prepared by adding 5 μL MF-H or MF-L, as well as 5 μL MF-IS, to 200 μL acetonitrile-water (8:1, v/v) and diluting a 25-μL aliquot of this mixture with 475 μL water-methanol (3:1, v/v). The analyte content of the spiked supernatants corresponded to CBZ, FLU, LAM and LEV serum concentrations of 0.451, 0.557, 0.459 and 0.447 μg/mL in the low, as well as 45.1, 55.7, 45.9 and 44.7 μg/mL in the high level samples, respectively.

In this paper, the term „lower limit of quantification (LLOQ)” refers to the lowest concentration level of each calibration model, as set forth in [18] and not to the lowest level judged as quantifiable based on signal-to-noise ratios.

Development of AEE’s by applying linear and nonlinear regression to the precision profiles

The performance of the AEE’s was compared in terms of the differences between the experimentally determined SD’s and the SD’s estimated using (a) unweighted (ordinary) linear, (b) second- and (c) third-order least squares, (d) 1/x2-weighted linear least squares, and Theil’s regression, with (e) and without (f) using the Siegel estimator [19,20]. Theil’s regression is a robust nonparametric median-based linear regression algorithm which uses Eqs (1) and (2) [21]: (1) (2) with M spiking levels. The estimated slope was obtained by first calculating the slopes of the lines connecting each i-th SD to all of the j-th SD’s, where j>i, i = 1…(M-1), and j = 2…M, and, second, by taking their median. Subsequently, the intercepts were calculated [again, i = 1…(M-1)], and their median was determined to obtain the estimate.

The Siegel estimator is a modification of Theil’s regression which uses repeated medians as follows: (3) (4) where â is the estimated slope and ĉ is the estimated intercept. One of the differences from Theil’s original formula here is that the estimate of the slope is obtained by calculating the individual slopes of the lines connecting each i-th SD to all others (j<i as well as j>i, i = 1…M), recording the median of each i-th slope [median(i≠j)], and, after obtaining all individual medians, determining the overall median, i.e. the overall estimate of the slope. The calculation of the intercept is similarly performed in Eq (4). Theil’s algorithm is frequently employed with modifications of Sen (referred to as Theil-Sen regression [22]). However, in this study Theil’s original approach was applied.

The performance of the various regression approaches was evaluated by determining the sums of squared residuals normalized to those predicted by the regression algorithm (NSSR). NSSR is an unbiased estimator which retains the widely employed concept of judging the goodness of fit by calculating the residuals [Eq (5)]: (5)

Computer scripts were written by the authors in the R environment (version 3.5.1) to perform the calculation of SD’s, to generate and plot precision profiles, and to perform, as well as to evaluate and plot the results of the regression using algorithms (a)-(f) [23]. Theil’s regression [algorithms (e) and (f)] was performed using the mblm() function of the ‘mblm’ package. Weighted and unweighted linear and nonlinear least squares were fitted using the lm() function of the ‘stats’ package. In each experiment and for each analyte, the medians and ranges of the regression coefficients were calculated. Plots were created using the ‘ggplot2’ package. The input files, created in comma-separated value (.csv) format using Microsoft Excel, contained the analytical results (nominal and measured concentrations obtained for each specimen at each spiking level, S1 and S2 Files). An interactive website is under construction to make the procedure presented here available to interested professionals.


Method validation

The analytical method described here allowed the selective and specific detection of the analytes in human serum specimens with conventional detection limits of 0.144 pg, 0.216 pg, 0.213 pg and 0.216 pg for CBZ, FLU, LAM and LEV, respectively. The retention times were 2.6, 1.9, 1.4 and 0.9 min, respectively. No sample carry-over was observed. Typical ion chromatograms and representative calibration plots are shown in Fig 3.

Fig 3. Representative analytical data.

Ion chromatograms of the analytes (left) and the internal standards (middle), and representative calibration curves (right) obtained in the experiments.

The performance of the calibration models used in the 3 independent experiments are shown in Table 2. The intercepts were negligible and the differences of their means from zero was not significant, as evaluated by performing a one-sample Welch test (p = 0.4099, p = 0.3454, p = 0.2912 and p = 0.4188, respectively). All of the back-calculated concentrations of the calibration standards were accepted by applying the recommendations of [18].

Table 2. Performance of the calibration curves employed in this study.

The accuracy of CBZ, FLU, LAM and LEV concentrations measured in the spiked serum samples ranged between 89.2–133%, 93.1–107%, 80.3–109% and 89.8–109% with CV% ranges of 2.51–17.7%, 2.96–16.6%, 2.19–17.0% and 1.79–11.4%, respectively (Table 3).

Table 3. Within-run accuracy and relative precision (expressed as the coefficient of variation) obtained for the analytes in spiked independent serum samples in the conducted experiments.

The results of the matrix factor experiments did not show the suppression or enhancement of the ionization of the analytes or the internal standards in the “normal”, hyperbilirubinemic, hemolytic or lipemic samples. The matrix factors ranged between 98.1–111%, and the internal standard-corrected matrix factors were 98.9–114% (S3 Table).

Combinatorial evaluation of precision profiles in experiments 1–3

The medians and ranges of the linear coefficients of the AEE’s established by performing regression on SD’s obtained in sets of R = 6 or R = 20 serum samples are shown in Table 4. These medians (but not the ranges) showed good agreement regardless of the regression algorithm or the experimental setup applied. In addition, sample size (R) caused negligible differences between the medians of the linear coefficients obtained using the same regression algorithm. The medians of the SD’s obtained in experiment 3 conducted with „normal” serum specimens were lower than those obtained in experiments 1 and 2.

Table 4. Medians and ranges of linear and constant coefficients of carbamazepine, fluconazole, lamotrigine and levetiracetam assay error equations obtained by developing precision profiles for all combinations of 6 or 20 samples in the three experiments conducted.

(a) unweighted linear least squares regression, (b) unweighted 2nd-order least squares polynomial regression, (c) unweighted 3rd-order least squares polynomial regression, (d) 1/x2-weighted least squares linear regression, (e) Theil’s regression, (f) Theil’s regression using the Siegel estimator. 3rd-order least squares regression was not applied in experiment 1 due to the small number of spiking levels (M = 4). (A) Linear coefficients. High/low, ratio of the highest and the lowest linear coefficients obtained in the combinatorial calculation of SD’s and, subsequently, assay error equations in each experimental setup. ND, not determined. (B) Constant coefficients (intercepts). ND, not determined. NNI, percentage proportions of non-negative intercepts obtained in the combinatorial calculation of SD’s and, subsequently, assay error equations in each experimental setup.

When linear regression [algorithms (a), (d), (e) or (f)] was applied, the ratios of the highest/lowest linear coefficients (Table 4A, ‘high/low’ values) obtained for CBZ, FLU, LAM and LEV were <18.0, <18.4, <25.2 and <13.6, respectively, in the “small”, and <1.96, <1.95, <10.8 and <1.81, respectively, in the “large” sample sets. These data show that the variability of the linear coefficients (slopes) of the linear assay error equations dropped by approximately tenfold when the number of independent samples assayed at each spiking level was increased from 6 to 20. In one experimental setup (experiment 3, M = 10, R = 6), 1/x2-weighted linear least squares regression [algorithm (d)] performed on LAM yielded negative linear coefficients in 24.4% of the sample combinations, and, for the “large” sample set of the same experiment (M = 10, R = 20), a highest/lowest linear coefficient ratio of 10.8, which exceeded dramatically those obtained in all other experimental setups with “large” sample sets (1.16–1.59).

The ranges of the calculated constant coefficients (intercepts) of the assay error equations, which correspond to the estimated SD’s of the blanks, are presented in Table 4B. The proportion of negative intercepts observed within “small” and “large” sample sets were 19.4–99.5% and 0–100% [algorithm (a)], 0.1–75.3% and 0–98.8% [algorithm (b)], 16.7–48.6% and 0–73.2% [algorithm (c)], 0–17.9% and 0–2.0% [algorithm (e)], further, <0.1–42.8% and 0–20.3% [algorithm(f)]. Increasing the size of the sample set typically, but not always, improved the proportion of non-negative intercepts, regardless of the regression algorithm applied. Unweighted linear least squares regression [algorithm (a)] gave rise to unacceptably large proportions of negative intercepts, except for a single experimental setup (M = 6, R = 20). On the contrary, all intercepts were non-negative when 1/x2-weighted linear least squares regression[algorithm (d)] was performed, regardless of the number of samples and spiking levels [except for a single case with LAM (M = 6, R = 6)], or when Theil’s regression [algorithms (e) and (f)] was applied to “large” sample sets, using 6 or 10 spiking levels (with a single negative intercept recorded for CBZ, M = 6).

Considering nonlinear regression, proportions of non-negative intercepts exceeding 99.9% were obtained by applying 2nd-order polynomial regression [algorithm (b)] only for FLU and LEV in the “large” sample sets, and when 6 or 10 spiking levels were used. Further cases with acceptable proportions of non-negative intercept were recorded with FLU (94.1%) and LEV (99.9%) in “small” sample sets when 10 spiking levels were employed. These proportions were unacceptably large in all other cases.

The results obtained with 3rd-order polynomial regression were inconsistent, with an unacceptably large proportion of negative intercepts recorded for all (M = 6 or 10, R = 6) or most (M = 10, R = 20) assayed drugs. In a single setup (M = 6, R = 20), 99.6–100% of the experiments with 3 of the 4 drugs gave rise to non-negative intercepts.

Evaluation of the final precision profiles

The final precision profiles and the fitted AEE’s are shown in Figs 4 and 5. The summarizing data of the AEE’s and the NSSR values indicating the goodness of fit are presented in Table 5 (individual predicted SD’s are provided in S4S7 Tables). None of the linear regression algorithms yielded negative linear coefficients. Negative constant coefficients (intercepts), however, were obtained for CBZ, LAM and LEV using unweighted linear [algorithm (a)], and, for CBZ, FLU and LAM using unweighted 2nd-order polynomial least squares regression [algorithm (b), Table 5]. Based on the NSSR values, superior linear fits were obtained when the AEE’s were established using Theil’s regression with or without the Siegel estimator (2.492–25.66 and 2.197–24.28, respectively). 1/x2-weighted linear least squares and unweighted 3rd-order nonlinear regression demonstrated similar performance (NSSR = 9.021–23.11 and 4.234–10.86, respectively). Unweighted linear and 2nd-order least squares regression, on the other hand, were far less suitable, as their NSSR’s ranged between 3.288–3782 and 15.88–49848, respectively.

Fig 4. Regression plots obtained by applying various regression algorithms to the precision profiles of carbamazepine, fluconazole, lamotrigine and levetiracetam and by using all of the serum specimens (N = 24) and spiking levels (M = 20+blank) of the 3 experiments.

Each data point displays the standard deviation of the assay results. Green line: Theil’s regression, black line: unweighted linear least squares, blue line: 1/x2-weighted linear least squares, red curve: unweighted 2nd-order least squares, orange curve: unweighted 3rd-order least squares regression. See Table 5 for a detailed evaluation.

Fig 5. Accuracy of the prediction of SD’s by applying various regression algorithms to the final precision profiles of (A) carbamazepine, (B) fluconazole, (C) lamotrigine and (D) levetiracetam (M = 20 spiking levels+blank, N = 24 specimens/spiking level).

Red lines indicate perfect (100%) agreement between predicted and experimentally determined SD’s. Insets show results for concentration ranges in which the inaccuracy of the predictions was on the same scale. The applied regression methods are (a) unweighted linear least squares, (b) unweighted 2nd-order least squares, (c) unweighted 3rd-order least squares, (d) 1/x2-weighted linear least squares, (e) Theil’s regression, (f) Theil’s regression with the Siegel estimator.

Table 5. Regression equations and the normalized sums of squared residuals obtained applying various regression algorithms to the combined results of the 3 experiments performed on carbamazepine, fluconazole, lamotrigine and levetiracetam (N = 24 specimens/spiking level, M = 20 spiking levels+blank).


The present work was conducted to provide a new methodology for the estimation of numeric SD values for each concentration level obtained in clinical LC-MS/MS drug assays. This methodology is of importance as maximally useful clinical pharmacokinetic models employed for accomplishing optimal patient care require the use of the SD reasonably associated with each assay result instead of the coefficient of variation.

As a major difference from earlier works where assay error polynomials were evaluated by running patient specimens submitted for TDM in replicates [11,15], in this study, multiple independent blank human serum specimens were spiked with the analytes at known levels and evaluated in a single run. Because patient samples are typically assayed in a single measurement in routine diagnostics, along with the technical difficulties of presenting samples to cover the entire assayed concentration range, and with the lack of knowledge of nominal analyte concentrations in the real samples, we propose that the presented approach is more feasible and correct for the development of AEE’s. An additional benefit is that this approach can be applied in parallel with experiments performed to establish method accuracy and precision, allowing AEE’s to become a tool of method performance evaluation and part of the validation process.

Measures were taken to simulate special clinical situations. In experiments 1 and 2, 75% of the serum specimens employed had a positive lipemia-icterus-hemolysis (LIH) index, a condition frequently encountered and likely to interfere with assay performance. In experiment 1, a proportion of the spiking levels would be considered as supratherapeutic according to the widely considered therapeutic concentration ranges. In experiment 2 and 3, some of the spiking levels were lower than the lowest level calibrator and would therefore not be reported by most clinical laboratories.

The number of samples in the “small” sample sets was selected to make it equal to that recommended as a minimum by the bioanalytical method validation guideline of the European Medicines Agency [18]. The number of samples in the “large” sample sets was chosen to allow a low fractional error of the estimate of the SD while still providing us with a sufficiently large number of combinations of samples. This was based on the work of Ahn and Fessler who described the relationship between the error of the estimated assay SD and the number of specimens as , where σRS/σ is the fractional error of the estimate of σ, the true value of the standard deviation [24]. The error of the SD is 32% when R = 6 and 16% when R = 20.

The evaluation of the performance of the AEE’s obtained using various regression algorithms needs to be based on specific considerations. First, because SD’s can only take positive values, and a key goal of developing AEE’s is to have a tool for the estimation of assay precision all the way down to zero concentration, the primary aspect of judging the applicability of a regression algorithm is whether negative constant coefficients (intercepts) occur or not. Our results demonstrate that, regardless of the order of the fitted polynomial, the size of the sample set or the number of spiking levels, unweighted least squares regression yields negative intercepts in many sample sets and is, therefore, not suitable when LC-MS/MS is employed. Second, the characterization of the goodness of fit employing correlation coefficients is not helpful for this task as the relationship between concentrations and experimentally determined SD’s is unlikely to be very strong, correlation coefficients are subject to the presence of even a single outlier SD, and provide no information on the potential bias of the regression. We therefore propose the use of the sum of squared residuals normalized to the predicted value (NSSR), an unbiased indicator for comparing the goodness-of-fit of the AEE’s obtained using various types of regression or in various experiments.

The NSSR value is not the only indicator of the performance of the applied regression algorithms. The key advantage of applying algorithms (d)-(f) becomes apparent by comparing the estimates of SD obtained at low and at high concentration levels. The percentage errors of the estimates of SD obtained at low levels using unweighted least squares regression [algorithms (a)-(c)] were unacceptably large (Fig 5). In contrast, algorithms (d)-(f) yielded SD estimates on the same scale throughout the entire concentration range, down to zero concentration. Since making acceptable estimates at low concentrations (i.e. those outside the validated concentration range) and at the blank are of exceptional importance for avoiding the loss of pharmacokinetic information, the consideration of algorithms (d)-(f) is recommended (S4S7 Tables).

The selection of the various regression equations was based on earlier work of Jelliffe and Tahani who demonstrated that precision profiles obtained on clinical analyzers could often be characterized by unweighted second degree polynomials [11]. As shown above, the LC-MS/MS method described herein was characterized by linear precision profiles. Based on theoretical considerations and empirical findings, Gu et al proposed that SD is directly proportional to the analyte concentration in all bioanalytical assays performed using LC-MS/MS, and 1/concentration2 is the correct weight for calibration when LC-MS/MS is used [8]. Although the reasoning Gu et al provide is short of comprehensive theoretical evidence, 1/x2 weighting of quantitative relationships has been proposed in several publications on bioanalytical methodology, rendering reasonable the hypothesis that it can also be used for establishing precision profiles. Our results confirm that 1/x2-weighted linear least squares regression consistently provides better estimates of the SD than unweighted linear or nonlinear least squares regression algorithms do. The disadvantage of this approach, in addition to the lack of comprehensive theoretical evidence to substantiate its use for developing AEE’s, is its sensitivity to outliers in the precision profile which can be overcome by assessing an adequate number of samples and spiking levels.

Theil’s regression [algorithm (e)] is a very robust, nonparametric median-based, unbiased linear regression approach without any statistical assumptions and easy to calculate using computer software. When a sufficient amount of data pairs is available, Theil’s regression gives rise to the true quantitative relationship between the drug concentration and the SD as long as this relationship is linear. Its application afforded acceptable estimates of SD’s throughout the assayed concentration ranges, was theoretically sound, and allowed the tolerance of the outliers encountered (which does not mean that it makes good estimates for outliers). While Theil’s regression method is relatively insensitive to the presence of outlier SD’s in the precision profile, it eventually becomes biased when ≥29.4% of the data points are outliers. This limitation is overcome by the use of the Siegel estimator which is maximally robust [19,20,22]. Earlier, the utility of Theil’s regression has been demonstrated for a variety of bioanalytical tasks [25,26]. In our experiments, Theil’s regression was found to yield superior estimates of assay precision all the way down to zero concentration (Fig 5), with positive intercepts for almost all combinations of specimens when precision profiles established using “large” sample sets were evaluated (Table 4B). We therefore recommend using Theil’s regression with the Siegel estimator as the first choice for establishing AEE’s of TDM assays when the relationship between drug concentrations and SD’s is linear, while 1/x2-weighted linear least-squares can be an efficient, though not universal, alternative.

Our results confirm experimentally that the precise estimation of the coefficients of the AEE’s requires the use of large serum sample sets, while the differences observed in the ranges of the linear coefficients obtained with “small” and “large” sample sets in this study can be well explained by the findings of Ahn and Fessler [24]. Our results also demonstrate that the quantitative relationship depends on the number of the spiking levels used. The evaluation of accuracy and precision results obtained in 5 or 6 independent samples at 4 spiking levels, as proposed by the widely employed bioanalytical method validation guidelines, is clearly suboptimal in this respect [18,27].

In experiments 1 and 2, where a substantial proportion of the sample sets consisted of hemolytic, hyperbilirubinemic and lipemic serum specimens, the median linear coefficients obtained were consistently higher than those obtained in experiment 3 where”normal” serum was used. While these results do not imply that serum specimens containing these interferences should be rejected, the impact of the above interferences on the analysis is apparent, even when prepared samples are diluted substantially before analysis. It is therefore crucial to consider the inclusion of an adequate number of specimens with these interferences in serum sample sets employed for developing AEE’s to avoid obtaining falsely low estimates of SD’s.

The custom of not reporting the correct mathematical credibility of each measurement causes clinical and bioanalytical laboratories to censor many clinically important “below-LLOQ” results. The general recommendation that establishing concentration-dependent estimates of SD allows the laboratory to correctly report analytical results lower than the LLOQ for PK modeling, made by Jelliffe et al earlier, is confirmed by our experiments. The reported concentration range can be extended all the way down to zero concentration with acceptable accuracy and precision for pharmacokinetic modeling when the employed bioanalytical method relies on the use of LC-MS/MS [15].

The approach presented here can be incorporated flawlessly into the workflow of bioanalytical method validation according to the guidelines of the European Medicines Agency or the US Food and Drug Administration [18,27]. In terms of controlling precision, current guidelines focus only on the prevention of the coefficient of variation (CV%) becoming “unacceptably” great at the low end of the assay range by applying the arbitrary concept of LLOQ. Employing such arbitrary limits of “acceptability” during the evaluation of method accuracy and precision is a debatable concept in general, and is clearly not useful for incorporating analytical results correctly in pharmacokinetic models and practical patient care [2830]. Moreover, current tools proposed by the guidelines for monitoring the performance of bioanalytical methods, such as running internal quality controls, incurred sample reanalysis, and participation in external quality assessment schemes, focus only on accuracy but not on precision.

One may argue that the development of an assay error equation leads to the ignorance of the between-run performance of the analytical method. Between-run experimens are usually conducted by assaying the same specimen spiked to various levels in 3–6 batches, often on separate days. These specimens are usually not retained after method validation has been completed, instead, during routine clinical use, it is the internal controls measured in each batch which provide similar information, monitored using process control charts. Therefore it has been proposed that, in view of the diversity of random events contributing to the assay error, the information provided by between-run experiments should be incorporated into the precision profile established for developing the assay error equation [15]. The methodology we describe in the present work does not imply that validation assays should not be peformed on separate days. Our results in fact demonstrate that performing multiple independent experiments using different serum specimens each time increases the credibility of the AEE’s. Nevertheless, the integrated evaluation of the validation measurements renders the separation of within-run and between-run validation tests unnecessary, while providing quantitative data on precision across the assayed concentration range, considerably exceeding the quality of the information delivered by within-run and between-run validation tests performed as recommended by [18] and [27].

The experimental setup described here can be extended to studying the impact of further pre-analytical factors, such as storage conditions. At the same time, the laboratory is free to incorporate further data into the developed AEE’s, as well as to develop new ones for comparisons, whenever it is desirable to check the assay for drift or other departure from previous behavior.


Software-guided adaptive control of drug therapy for the maximally precise achievement of clinically selected targets can now be carried out for each individual patient. Consequently, integrating correct knowledge about the performance of the analytical method offers important opportunities for improvement for clinical laboratories. An essential element of this process is the correct estimation of analytical precision and weighting of each measured concentration to obtain proper drug dosage adjustments.

The proposed paradigm of evaluating clinical and bioanalytical drug assays for this purpose consists of the following.

  1. Reporting the measurement along with its estimated SD so the result can be incorporated into population pharmacokinetic models to obtain correct parameter estimates without which maximally precise initial dosage regimens cannot be developed, and into the subsequent individual models used for maximally precise dosage adjustment for each individual patient.
  2. Establishing assay error equations specifically for the method employed in multiple experiments by using several spiking levels and by spiking a large number of (preferably, at least 20) independent serum specimens at each level. When the precision profile is linear, as in most LC-MS/MS assays, it can be estimated efficiently using Theil’s regression with the Siegel estimator all the way down to zero concentration, eliminating the LLOQ and any need to censor low results. Full knowledge about the results obtained over the entire assay range is now available to laboratory personnel, to clinicians and to pharmacotherapists.

Supporting information

S1 Table. Preparation and analyte content of the employed calibrator samples.

CBZ, carbamazepine. FLU, fluconazole. LAM, lamotrigine. LEV, levetiracetam. SS1, Stock solution 1.


S2 Table. Preparation and analyte content of spiked independent serum samples.

CBZ-SS1, carbamazepine stock solution 1. FLU-SS1, fluconazole stock solution 1. LAM-SS1, lamotrigine stock solution 1. LEV-SS1, levetiracetam stock solution 1. CBZ-SS2, carbamazepine stock solution 1. FLU-SS2, fluconazole stock solution 1. LAM-SS2, lamotrigine stock solution 1. LEV-SS2, levetiracetam stock solution 1.


S3 Table. Matrix factors and internal standard-corrected matrix factors.


S4 Table. Observed versus predicted standard deviations of carbamazepine.

OLS, unweighted linear least squares. WLS, 1/x2-weighted linear least squares.


S5 Table. Observed versus predicted standard deviations of fluconazole.

OLS, unweighted linear least squares. WLS, 1/x2-weighted linear least squares.


S6 Table. Observed versus predicted standard deviations of lamotrigine.

OLS, unweighted linear least squares. WLS, 1/x2-weighted linear least squares.


S7 Table. Observed versus predicted standard deviations of levetiracetam.

OLS, unweighted linear least squares. WLS, 1/x2-weighted linear least squares.


S1 File. Zipped data files employed for performing the combinatorial calculations.


S2 File. Zipped data files employed for creating the final precision profiles.


S3 File. Scripts, written in R, employed for performing the calculations.



The authors gratefully acknowledge the careful and thoughtful help of Dr. David Bayard in the preparation of this manuscript. Tünde Heiger-Holczer is acknowledged for managing serum specimens. Katalin Rischák and Róbert Farkas are acknowledged for their valuable advice on the manuscript.


  1. 1. Diaz FJ, Yeh HW, de Leon J. Role of statistical random-effects linear models in personalized medicine. Curr Pharmacogenom Pers Med. 2012;10: 22–32.
  2. 2. Bondareva IB, Jelliffe RW, Andreeva OV, Bondareva KI. Predictability of individualized dosage regimens of carbamazepine and valproate mono- and combination therapy. J Clin Pharm Ther. 2011;36: 625–36. pmid:21062333
  3. 3. Roberts JA, Abdul-Aziz MHA, Lipman J, Mouton JW, Vinks AA et al. Individualized antibiotic dosing for patients who are critically ill: challenges and potential solutions. Lancet Infect Dis. 2014;14: 498–509. pmid:24768475
  4. 4. Hope WW, van Guilder M, Donnelly JP, Blijlevens NM, Brüggemann RJ, Jelliffe RW, Neely MN. Software for dosage individualization of voriconazole for immunocompromised patients. Antimicrob Agents Chemother. 2013;57: 1888–94. pmid:23380734
  5. 5. Burr T, Croft S, Reed BC. Least-squares fitting with errors in the response and predictors. Int J Metrol Qual Eng. 2012;3: 117–23.
  6. 6. Jelliffe RW. Optimizing laboratory assay methods for individualized therapy. In: Jelliffe RW, Neely MN, editors. Individualized Drug Therapy for Patients 1st ed. Elsevier, 2017. pp. 33–46.
  7. 7. Almeida AM, Castel-Branco MM, Falcao AC. Linear regression for calibration lines revisited: Weighting schemes for bioanalytical methods. J Chromatogr B Anal Technol Biomed Life Sci. 2002;774: 215–22.
  8. 8. Gu H, Liu G, Wang J, Aubry AF, Arnold ME. Selecting the correct weighting factors for linear and quadratic calibration curves with least-squares regression algorithm in bioanalytical LC-MS/MS assays and impacts of using incorrect weighting factors on curve stability, data quality, and assay performance. Anal Chem. 2014;86: 8959–66. pmid:25157966
  9. 9. Ekins RP, Newman GB, Oiyasena R, Banks P, Slater DH. The radioimmunoassay of aldosterone in serum and urine: theoretical and practical aspects. J Steroid Biochem. 1972;2: 289–304.
  10. 10. Sadler WA, Smith MH, Leggs HM. A method for direct estimation of imprecision profiles, with reference to immunoassay data. Clin Chem. 1988;34: 1058–61. pmid:3378320
  11. 11. Jelliffe RW, Tahani B. Pharmacoinformatics: equations for serum drug assay patterns; implications of therapeutic drug monitoring and dosage. Proc Annu Symp Comput Appl Med Care. 1993: 517–21. pmid:8130527
  12. 12. Neely MN, van Guilder MG, Yamada WM, Schumitzky A, Jelliffe RW. Accurate Detection of Outliers and Subpopulations With Pmetrics, a Nonparametric and Parametric Pharmacometric Modeling and Simulation Package for R. Ther Drug Monit. 2012;34: 467–76. pmid:22722776
  13. 13. Jelliffe RW, Neely MN, editors. Individualized Drug Therapy for Patients 1st ed. Elsevier, 2017. ISBN: 978-0-12-803348-7.
  14. 14. Shipkova M, Sviranov D. LC-MS/MS as a tool for TDM services: Where are we? Clin Biochem. 2016;49: 1009–1023. pmid:27163969
  15. 15. Jelliffe RW, Shumitzky A, Bayard D, Fu X, Neely M. Describing assay precision-reciprocal of variance is correct, not CV percent: its use should significantly improve laboratory performance. Ther Drug Monit. 2015;37: 389–94. pmid:25970509
  16. 16. Aldaz A, Alzueta N, Viteri C. Influence of comedication on levetiracetam pharmacokinetics. Ther Drug Monit. 2018;40: 130–4. pmid:29210977
  17. 17. Han S, Kim J, Yim H, Hur J, Song W, Lee J, et al. Population pharmacokinetic analysis of fluconazole to predict therapeutic outcome in burn patients with Candida infection. Antimicrob Agents Chemother. 2013;57: 1006–11. pmid:23254425
  18. 18. Guideline on bioanalytical method validation (EMEA/CHMP/EWP/192217/2009 Rev. 1 Corr. 1**). European Medicine Agency, 2012. Accessed: 6 Feb 2019.
  19. 19. Theil H. A rank-invariant method of linear and polynomial regression analysis. Nederl Akad Wetensch Proc. 1950;53: 386–92.
  20. 20. Siegel AF. Robust regression using repeated medians. Biometrika. 1982;69: 242–244.
  21. 21. Györfi L, Kohler M, Krzyzak A, Walk H. A distribution-free theory of nonparametric regression. Springer, 2002.
  22. 22. Sen PK. Estimates of the regression coefficient based on Kendall’s tau. J Am Stat Assoc. 1968;63: 1379–89.
  23. 23. R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Accessed: 6 Feb 2019.
  24. 24. Ahn S, Fessler JA. Standard errors of mean, variance, and standard deviation estimators. Technical report. The University of Michigan, July 24, 2003.
  25. 25. Lavagnini I, Badocco D, Pastore P, Magno F. Theil-Sen nonparametric regression technique on univariate calibration, inverse regression and detection limits. Talanta. 2011;87: 180–8. pmid:22099666
  26. 26. Zabell APR, Lytle FE, Julian RK. A proposal to improve calibration and outlier detection in high-throughput mass spectrometry. Clin Mass Spectrom. 2016;2: 25–33.
  27. 27. Bioanalytical method validation. Guidance for industry. US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research, Center for Veterinary Medicine, 2018. Accessed: 6 Feb 2019.
  28. 28. Byon W, Fletcher CV, Brundage RC. Impact of censoring data below an arbitrary quantification limit on structural model misspecification. J Pharmacokinet Pharmacodyn. 2008;35: 101–16. pmid:17963024
  29. 29. Xu SX, Dunne A, Kimko H, Nandy P, Vermeulen A. Impact of low percentage of data below the quantification limit on parameter estimates of pharmacokinetic models. J Pharmacokinet Pharmacodyn. 2011;38: 423–32. pmid:21626437
  30. 30. Yang S, Roger J. Evaluations of Bayesian and maximum likelihood methods in PK models with below-quantification-limit data. Pharm Stat. 2010;9: 313–30. pmid:20043295