Abstract
Sedimentation velocity analytical ultracentrifugation (SV-AUC) is an indispensable tool for the study of particle size distributions in the biopharmaceutical industry, for example, to characterize protein therapeutics and vaccine products. In particular, the diffusion-deconvoluted sedimentation coefficient distribution analysis in the software SEDFIT has found widespread application due to its relatively high resolution and sensitivity. However, a lack of suitable software compatible with Good Manufacturing Practices (GMP) has hampered the use of SV-AUC in this regulatory environment. To address this, we have created an interface for SEDFIT so that it can serve as an automatically spawned module with controlled data input through command line parameters and output of key results in files. The interface can be integrated into custom GMP-compatible software, and into scripts that provide documentation and meta-analyses for replicate or related samples, for example, to streamline analysis of large families of experimental data, such as binding isotherm analyses in the study of protein interactions. To test and demonstrate this approach we provide a MATLAB script, mlSEDFIT.
Author summary
Sedimentation velocity analytical ultracentrifugation (SV-AUC), a classical first-principles based method to study size-distributions of macromolecules and particles in solution, has become a popular technique in a variety of disciplines, such as physical biochemistry, structural biology, supramolecular chemistry, and nanoparticle research, due to its high resolution, wide size-range, and label-free detection. It has also assumed an important role in pharmaceutical development of therapeutics and vaccines. One factor complicating the implementation of SV-AUC in biotechnology is the lack of compatibility of the most widely used software, SEDFIT, with regulatory requirements in the good manufacturing practices (GMP) environment. In this paper we present a solution to this problem by introducing a command line interface that permits automated configuration of SEDFIT, data loading, and retrieval of results. In this way, SEDFIT may be used as a computational module integrated in GMP-compliant software. We demonstrate the validity of the results from this approach. In addition, this automation facilitates meta-analyses of families of SV-AUC experiments, for example, in binding isotherm analysis of interacting macromolecules.
Citation: Schuck P, To SC, Zhao H (2023) An automated interface for sedimentation velocity analysis in SEDFIT. PLoS Comput Biol 19(9): e1011454. https://doi.org/10.1371/journal.pcbi.1011454
Editor: Susumu Uchiyama, Graduate School of Engineering, JAPAN
Received: May 15, 2023; Accepted: August 22, 2023; Published: September 5, 2023
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: SEDFIT version 16.50 equipped with the command line interface, as well as the source code of an accompanying family of MATLAB scripts mlSEDFIT can be freely downloaded from sedfitsedphat.nibib.nih.gov/software. Example input and output files can be downloaded from sedfitsedphat.nibib.nih.gov/tools. All data can be downloaded from the Harvard Dataverse https://doi.org/10.7910/DVN/CZDWRZ.
Funding: This work was funded through the National Institutes of Health (ZIA EB000051-12 to PS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Sedimentation velocity analytical ultracentrifugation (SV-AUC) is a classical first-principles based method to study particles that sediment or float as a result of a gravitational field generated in a centrifuge [1–3]. In the last two decades, this technique has been used in a growing number and variety of applications, coinciding with significant advances in computational data analysis [4]. Specifically, the combination of efficient Lamm equation modeling [5], novel size-distribution techniques for direct data fitting with and without diffusional deconvolution [6–8], and systematic noise analysis [9,10], as combined in the ls-g*(s) and the high-resolution c(s) approaches implemented in the software SEDFIT, has found widespread application. These applications include analyses of size-distributions, hydrodynamic properties, and interactions of particles across the entire size-range accessible to SV-AUC from below 1 kDa to above 10 GDa, ranging from small carbohydrates and peptides [11–13] to proteins and protein complexes [14–17], carbohydrates [18], synthetic polymers [19], small and large nanoparticles [20,21], multi-protein complexes and interacting systems [15,22–25], lipid vesicles and emulsions [26,27], viral particles [28–34], and entire cellular organisms [35], at concentrations from picomolar to millimolar [3,36,37]. Due to the high resolution and sensitivity, and the measurement of particle sedimentation free in solution in the absence of surfaces or labels, this approach has also proven to be advantageous in biotechnology for the characterization of therapeutic and vaccine products, including, for example, antibody and other protein therapeutics [38–47], protein/polymer conjugates [48,49], and AAV products [28–32].
One of the challenges of applying SV-AUC in the context of biopharmaceutical production is the lack of software compliant with current good laboratory practices (cGLP) and current good manufacturing practices (cGMP) that satisfies part 11 of Title 21 of the Code of Federal Regulations; Electronic Records; Electronic Signatures (21CFR11) by FDA [50]. Other existing limitations are the required expertise and the laboriousness of serial analyses, for example, of replicate sets or of titration series in the study of interacting systems [43,44,51]. Both would greatly benefit from some degree of automation and streamlining of the data analysis. For this purpose, the present work presents an extension of the SEDFIT software for partially automated operation through the use of a command line interface. The interface allows starting SEDFIT in a specific state, as well as communicating data, analysis parameters, and results. Any secondary software utilizing this interface can spawn SEDFIT, automatically load specific data, execute a specific model for analysis, and retrieve the best-fit results of SEDFIT for documentation, further aggregation, and to carry out meta-analyses of multiple sedimentation coefficient distributions. While this allows streamlined SV-AUC analysis of replicates and titration series, importantly, this approach will be particularly suitable for preserving custody of data and associated analysis results and generating audit trails consistent with 21CFR11.
As illustrated in the present work, the results from such command-line controlled analyses are equivalent to the standard entirely manual use of SEDFIT. Due to the universality of SV-AUC, this approach can be applied to the analysis of therapeutic proteins, carbohydrates, nucleic acids, protein/polymer conjugates, any viral vectors (recombinant or wildtype) including adeno-associated virus (AAV), adenovirus, and lentivirus, lipid vesicles and lipid-based nanoparticles, metal- and other nanoparticles with and without protein conjugates or associated macromolecules such as loaded nucleic acids or proteins.
Method
The general strategy for command line operated SEDFIT is for it to be spawned by a secondary software, initiating a controlled input of data and providing a controlled return of the analysis results (Fig 1). This allows all activities to be logged and results to be recorded safely for further processing. SEDFIT may also be executed manually from the command line, for example, from within the DOS command prompt in Windows, although this would not take advantage of much of the graphical user interface for loading and saving data that is provided in the conventional stand-alone operation of SEDFIT. As a roadmap for the envisioned incorporation of command line operated SEDFIT into secondary software, we have created a set of MATLAB scripts, mlSEDFIT, which can be freely downloaded from sedfitsedphat.nibib.nih.gov/software.
The secondary software organizes access control and preprocesses data. After SEDFIT is spawned by the secondary program, it reads a specifically formatted input file, and provides a graphical user interface with options controlled by the secondary program. Upon termination of SEDFIT analysis, its output files are read, and quality control, postprocessing and documentation by the secondary program can take place. This flow allows a single or multiple copies of SEDFIT to be utilized solely as a computational module within a framework of the secondary program, which may enforce GMP compatibility, incorporate results into meta-analyses, and/or provide an expert system or AI for automated analysis and quality control.
The command line-controlled run of SEDFIT requires three command line parameters. The first is a number that identifies the SEDFIT mode of operation, at present with possible values of “111” and “112”. The mode “111” will invoke SEDFIT with a standard menu allowing the full range of SEDFIT analysis models, and “112” will present SEDFIT with a reduced set of menu functions more narrowly tailored to elementary size-distribution analyses. In this restricted mode, non-standard settings are inaccessible to the operator. Future implementations may accept further values of this parameter with alternate behavior. The second command line parameter is an arbitrary string that serves to identify the SEDFIT call. It will be repeated by SEDFIT after completion of the analysis, and thereby provides a layer of assurance that the reported results belong to the requested analysis. The third parameter is the path of the input file. For example, the command line could be “sedfit.exe 111 handshakestring c:\datafolder\TestInput.xml” (without the quotes) where ‘handshakestring’ can be replaced by any string, and ‘c:\datafolder\TestInput.xml’ should be replaced by the appropriate directory and filename to point to a valid xml input file in the correct format.
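As a minimal sketch of how a secondary program might assemble this call, the following Python fragment builds the three-parameter argument list. The helper name build_sedfit_command, the default executable name, and the whitespace restriction on the handshake string are illustrative assumptions and not part of SEDFIT; only the order and meaning of the three parameters are taken from the description above.

```python
from pathlib import PureWindowsPath

# Modes described in the text: "111" = full menu, "112" = reduced menu.
KNOWN_MODES = ("111", "112")


def build_sedfit_command(mode, handshake, input_xml, exe="sedfit.exe"):
    """Return the argument list for spawning SEDFIT (hypothetical helper).

    mode       -- first parameter, the SEDFIT mode of operation
    handshake  -- second parameter, arbitrary string echoed back by SEDFIT
    input_xml  -- third parameter, path of the xml input file
    """
    mode = str(mode)
    if mode not in KNOWN_MODES:
        raise ValueError(f"unsupported SEDFIT mode: {mode!r}")
    if not handshake or any(ch.isspace() for ch in handshake):
        # Assumption: keep the handshake free of whitespace so that it is
        # passed as a single command line token.
        raise ValueError("handshake must be non-empty and without whitespace")
    return [exe, mode, handshake, str(PureWindowsPath(input_xml))]


cmd = build_sedfit_command("111", "handshakestring",
                           r"c:\datafolder\TestInput.xml")
print(" ".join(cmd))
```

On Windows, an argument list of this form could be handed to subprocess.Popen; using a list rather than a single string avoids quoting problems when the input path contains spaces.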
An example of an input file is provided in S1 File. For convenience in generating and parsing the input file, it is in xml format version 1.0, as indicated in the first line by the prolog string <?xml version="1.0" encoding="UTF-8"?>. This is followed by the root element ‘cGMPSedfitCall’, written as an opening <cGMPSedfitCall> tag, which is paired with a closing </cGMPSedfitCall> tag at the end of the file. In between these opening and closing root identifiers are the analysis parameters. They can be in any order, but must adhere to the xml syntax of <ParameterName>value string</ParameterName>, where ‘ParameterName’ is any of the identifiers described below, and ‘value string’ can be a number, a file path, the string TRUE, FALSE, or other string, as appropriate for the parameter. A complete list of all parameters, their purpose and possible values can be found in Table 1. Unless the menu of SEDFIT is restricted (in mode “112”, see above), all parameters can also be set later as usual in the SEDFIT user interface by the operator.
Input parameters will be read from the file named in the third command line parameter. It is in xml format and parameters are case sensitive. An example can be found in S1 File.
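The file layout described above can be produced with a standard xml library. The following Python sketch writes a dictionary of parameters as children of the cGMPSedfitCall root element. The function name write_sedfit_input and all parameter values are illustrative assumptions; the parameter names are those described in the text, and the authoritative list is the one in Table 1 and S1 File.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_sedfit_input(path, params):
    """Write parameters as children of a <cGMPSedfitCall> root element."""
    root = ET.Element("cGMPSedfitCall")
    for name, value in params.items():
        child = ET.SubElement(root, name)
        # Booleans are written as the strings TRUE / FALSE used in the text.
        if isinstance(value, bool):
            child.text = "TRUE" if value else "FALSE"
        else:
            child.text = str(value)
    ET.indent(root)  # pretty-printing only; requires Python 3.9 or later
    body = ET.tostring(root, encoding="unicode")
    with open(path, "w", encoding="utf-8") as fh:
        # Prolog written literally, matching the prolog string quoted above.
        fh.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fh.write(body + "\n")


# Parameter names follow the text; the values are examples only.
example = {
    "AllDoneFlagFile": r"c:\datafolder\finished.txt",
    "OutputResultsDirectory": r"c:\datafolder\analysisresults\results1",
    "DataDirectory": r"c:\datafolder\Run158",
    "Channel": "IP2",
    "FirstScan": 1,
    "LastScan": 101,
    "ScanInterval": 10,
    "Model": "cofs",
    "AutoRun": True,
    "AutoFit": True,
}

with tempfile.TemporaryDirectory() as tmp:
    xml_path = Path(tmp) / "TestInput.xml"
    write_sedfit_input(xml_path, example)
    written = xml_path.read_text(encoding="utf-8")
    # Read the file back to confirm that it is well-formed xml.
    parsed_root = ET.parse(str(xml_path)).getroot()
    round_trip = {child.tag: child.text for child in parsed_root}

print(written)
```

Because element names are case sensitive, it may be safer to keep the parameter names in one place (as in the dictionary above) than to type them repeatedly throughout a script.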
While most of the input parameters are strictly related to the data analysis, some are related to the SEDFIT interface and analysis flow. Most importantly, the parameter AllDoneFlagFile will establish the path and name of the file that will be created upon completion of the SEDFIT analysis. For example, it might be called ‘c:\datafolder\finished.txt’. Importantly, this file must not exist prior to the SEDFIT call, such that its creation can be used as a convenient flag to the secondary software that SEDFIT has created the results files and was exited. The only content of this file will be the handshake string (second command line parameter, see above), which may be checked by the secondary software for consistency with the calling parameter. The parameter OutputResultsDirectory is a path that points to the directory where SEDFIT will create several output files describing the results after conclusion of the analysis, as detailed below. For example, the string ‘c:\datafolder\analysisresults\results1’ will cause SEDFIT to create the subfolder ‘results1’ within the existing ‘c:\datafolder\analysisresults’ directory, and all results files will be located in this ‘results1’ subfolder. Finally, it is possible to use a parameter PassThrough to declare any string that, for convenience and clarity, should be repeated in the output xml file. For example, this could be a sample name, user notes, or some additional control strings to be recorded alongside the results. It may also be itself an xml-formatted string, but in this case it should not contain any regular parameter identifiers used in the SEDFIT interface to avoid conflicts.
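The flag-file handshake lends itself to a simple polling loop in the secondary software. The Python sketch below first checks that the flag file does not yet exist, then waits for it and compares its content with the handshake string. The function names, the timeout, and the polling interval are assumptions chosen for illustration; in the demonstration at the end, the flag file is written by the script itself to stand in for SEDFIT.

```python
import tempfile
import time
from pathlib import Path


def ensure_flag_absent(flag_file):
    """The flag file must not exist before SEDFIT is spawned."""
    if Path(flag_file).exists():
        raise FileExistsError(f"stale completion flag: {flag_file}")


def wait_for_flag(flag_file, handshake, timeout_s=3600.0, poll_s=1.0):
    """Poll until the flag file exists, then verify the echoed handshake."""
    flag = Path(flag_file)
    deadline = time.monotonic() + timeout_s
    while True:
        if flag.exists():
            echoed = flag.read_text().strip()
            if echoed:  # the file may briefly exist before it is written
                break
        if time.monotonic() > deadline:
            raise TimeoutError("no completion flag within the timeout")
        time.sleep(poll_s)
    if echoed != handshake:
        raise RuntimeError("handshake mismatch: results belong to another call")
    return echoed


with tempfile.TemporaryDirectory() as tmp:
    flag_path = Path(tmp) / "finished.txt"
    ensure_flag_absent(flag_path)
    # Stand-in for SEDFIT: write the handshake string into the flag file.
    flag_path.write_text("run158-cell2")
    echoed = wait_for_flag(flag_path, "run158-cell2", poll_s=0.01)
    try:
        wait_for_flag(flag_path, "some-other-call", timeout_s=1.0, poll_s=0.01)
        mismatch_detected = False
    except RuntimeError:
        mismatch_detected = True
```

Generating a fresh handshake string for every call, for example from a timestamp or a random identifier, makes it unlikely that a leftover flag file from an earlier analysis is mistaken for the current one.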
To define the data to be analyzed, the input file must specify the folder containing the SV-AUC scan files in the parameter DataDirectory, the parameter Channel that identifies the scan file extension, and the parameters FirstScan, LastScan, and ScanInterval that specify the set of scans to be loaded. It is assumed that the scan files are in the customary SV-AUC numeric format with leading zeros. This information substitutes for manually invoking the Load New Files function in the Data menu of SEDFIT. For example, if DataDirectory is ‘c:\datafolder\Run158’, Channel is ‘IP2’, FirstScan is ‘1’, LastScan is ‘101’, and ScanInterval is ‘10’, SEDFIT will load 11 files named ‘c:\datafolder\Run158\00001.ip2’, ‘c:\datafolder\Run158\00011.ip2’, ‘c:\datafolder\Run158\00021.ip2’, …, ‘c:\datafolder\Run158\00101.ip2’. Scan data not conforming to the assumed filename convention should be renamed accordingly by the secondary software prior to spawning SEDFIT. The parameters FilterDataSpikes and DataSpikeThreshold will control whether to ignore isolated spikes in the scan data and the threshold for defining a spike, analogous to the corresponding Loading Options and Tools function of the Options menu of SEDFIT. If the parameters relating to spike filtering are not set then the default filtering with a threshold of 0.4 will be applied.
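The file selection rule in the example above can be reproduced in a few lines, which may help a secondary program verify that all expected scans exist before spawning SEDFIT. In this Python sketch the function name is hypothetical, and the five-digit zero padding and lower-case extension are inferred from the example file names rather than from a formal specification.

```python
from pathlib import PureWindowsPath


def expected_scan_files(data_directory, channel, first_scan, last_scan,
                        scan_interval):
    """List the scan files implied by the DataDirectory, Channel,
    FirstScan, LastScan, and ScanInterval parameters."""
    folder = PureWindowsPath(data_directory)
    extension = channel.lower()  # 'IP2' -> '.ip2', as in the example
    return [str(folder / f"{number:05d}.{extension}")
            for number in range(first_scan, last_scan + 1, scan_interval)]


files = expected_scan_files(r"c:\datafolder\Run158", "IP2", 1, 101, 10)
print(len(files))
print(files[0])
print(files[-1])
```

With the values from the text this yields 11 names, from 00001.ip2 to 00101.ip2 in steps of 10.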
The input parameters related to the SEDFIT workflow include AutoRun and AutoFit, both of which take Boolean values TRUE or FALSE. If TRUE they will automatically initiate the SEDFIT Run or Fit command, respectively, directly after loading the data. This will be equivalent to manually invoking these commands from the SEDFIT menu. (It should be noted that when using AutoFit the analysis is carried out before the SEDFIT window appears on the screen.) To enhance the performance of SEDFIT, a parameter NumberComputationThreads can be set according to the computational cores that SEDFIT should employ for multithreaded computations. In c(s) distribution analyses in SEDFIT, key computational steps will be approximately n-fold faster if n > 1 threads are specified and available [52]. While the default is 2, the optimal value will depend on the computer hardware and on concurrently running software, which may be other instances of SEDFIT. The best choice for n is usually the total number of processing cores of the computer used. Finally, parameters related to the graphical data presentation are ShowResidualsHistogram and AutoSubtractSystematicNoise, which will take Boolean values TRUE or FALSE and control the visual appearance of the SEDFIT analysis window as in their corresponding SEDFIT menu functions [9,10,53]. In addition to changing the graphical display of the SEDFIT window during data analysis, an image of this window will be saved for documentation at the end of the analysis. Scan data, best-fit values, and residuals will also be saved in an output file, so that the graph showing the quality of fit can be recreated (see below).
Ancillary parameters needed for the analysis are the estimated locations of the meniscus and bottom of the solution column, and the fitting limits. Meniscus and Bottom, as well as LeftFitLimit and RightFitLimit, are specified as numerical values in cm from the center of rotation, and will function equivalently to their customary graphic input in SEDFIT. During the SEDFIT analysis, they can still be readjusted by the operator or by the fitting routine. If the parameter MeniscusFitted is set TRUE, the meniscus value will be optimized during the fit, remaining within the bounds specified in the MeniscusLowerLimit and MeniscusUpperLimit parameters (which should bracket the Meniscus value and be smaller than LeftFitLimit). Fitting for the meniscus is usually recommended, and therefore coarse initial estimates may suffice, for example, from a table of expected solution column heights dependent on filling volume in standard double-sector cells provided in [3], from a simple identification of the meniscus artifact in scan files, or from a preliminary analysis. Analogous parameters are available for fitting the bottom of the solution column (Table 1); however, unless significant back-diffusion is affecting the sedimentation process, the bottom position typically does not need to be fitted in standard analysis and the corresponding value for BottomFitted would be FALSE.
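Since these radial values replace graphical input, a secondary program may want to check their mutual consistency before writing the input file. The following Python sketch encodes the constraints stated above (meniscus limits bracketing Meniscus and lying below LeftFitLimit), plus two ordering checks that follow from the geometry of the solution column. The function name and the numerical values are illustrative assumptions.

```python
def check_column_geometry(p):
    """Return a list of problems; an empty list means the radial
    parameters (all in cm from the center of rotation) are consistent."""
    problems = []
    if not p["Meniscus"] < p["Bottom"]:
        problems.append("Meniscus must be at a smaller radius than Bottom")
    if not p["LeftFitLimit"] < p["RightFitLimit"]:
        problems.append("LeftFitLimit must be smaller than RightFitLimit")
    if p.get("MeniscusFitted", False):
        lower, upper = p["MeniscusLowerLimit"], p["MeniscusUpperLimit"]
        if not lower <= p["Meniscus"] <= upper:
            problems.append("meniscus limits must bracket Meniscus")
        if not upper < p["LeftFitLimit"]:
            problems.append("meniscus limits must be below LeftFitLimit")
    return problems


# Example values only; they do not come from a real experiment.
good = {
    "Meniscus": 6.17, "Bottom": 7.20,
    "LeftFitLimit": 6.30, "RightFitLimit": 7.10,
    "MeniscusFitted": True,
    "MeniscusLowerLimit": 6.10, "MeniscusUpperLimit": 6.25,
}
bad = dict(good, MeniscusUpperLimit=6.40)  # overlaps the fitting range

print(check_column_geometry(good))
print(check_column_geometry(bad))
```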
Most importantly, the parameter Model specifies the data analysis model. It currently can take string values of ‘lsgofs’ for the ls-g*(s) model [7,8] and ‘cofs’ for the c(s) model [6,7]. Specifying these models is equivalent to their selection in the Model menu of SEDFIT. Both are sedimentation coefficient distribution models that require definition of the discrete grid of s-values through Smin, Smax, and Resolution. Analogous to the manual entry of these parameters in the model parameter boxes of SEDFIT, they take numerical values describing the range of the distribution and the number of grid points. Alternatively, if the parameter GridfromFile is set TRUE, the grid of s-values can be read from a file ‘sdist’ with the same extension as the scan files (such as ‘sdist.ip2’) that must be located in the same folder as the scan data. This text file will automatically be created after each analysis and contains a single column of s-values of the distribution grid, but it can also be edited and augmented and serve as a template for new distribution analyses with a custom-spaced grid, for example, to efficiently describe very small or very large sedimenting species outside the range of the majority of particles of interest. Another option to modify the grid is UseLogSpaceSgrid, which when TRUE abandons equidistant grids in favor of logarithmically increasing s-value intervals. When using the command line interface these grid functions operate identically to their use in stand-alone SEDFIT.
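To illustrate the grid options, the Python sketch below generates an equidistant and a logarithmically spaced grid from Smin, Smax, and Resolution, and writes a custom grid as a single-column text file of the kind described for ‘sdist’. This is a sketch under stated assumptions: whether SEDFIT counts both end points in Resolution, and the number format it expects in the file, are not specified in the text, so the conventions used here (end points included, six significant digits) are guesses that should be checked against a file created by SEDFIT itself.

```python
import math
import tempfile
from pathlib import Path


def make_s_grid(smin, smax, resolution, log_spaced=False):
    """Return `resolution` s-values from smin to smax, end points included."""
    if resolution < 2 or not smin < smax:
        raise ValueError("need resolution >= 2 and smin < smax")
    if log_spaced:
        if smin <= 0:
            raise ValueError("logarithmic spacing requires smin > 0")
        ratio = (smax / smin) ** (1.0 / (resolution - 1))
        return [smin * ratio ** i for i in range(resolution)]
    step = (smax - smin) / (resolution - 1)
    return [smin + i * step for i in range(resolution)]


def write_s_grid(path, grid):
    """Write one s-value per line (assumed layout of the 'sdist' file)."""
    with open(path, "w") as fh:
        for s in grid:
            fh.write(f"{s:.6g}\n")


linear = make_s_grid(0.0, 20.0, 201)                 # 0.1 S steps
logarithmic = make_s_grid(0.5, 100.0, 50, log_spaced=True)
# Custom grid: the linear grid augmented by two large s-values.
custom = sorted(set(linear + [150.0, 300.0]))

with tempfile.TemporaryDirectory() as tmp:
    sdist_path = Path(tmp) / "sdist.ip2"
    write_s_grid(sdist_path, custom)
    reread = [float(line) for line in sdist_path.read_text().split()]
```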
The c(s) analysis applies diffusional deconvolution to achieve sedimentation coefficient distributions with high hydrodynamic resolution. When using the SEDFIT command line interface, the most commonly used standard c(s) variant is applied where the diffusion coefficients associated with each sedimentation coefficient are based on a hydrodynamic scaling law via a constant hydrodynamic frictional ratio. Thus, the parameter StartingFrictionalRatio must be set. The numerical values of the predicted diffusion coefficients also depend on the partial specific volume Vbar (to be specified in mL/g) and the buffer density BufferDensity (in g/mL). During the analysis, non-linear regression will be used to optimize the frictional ratio during the fit when FrictionalRatioFitted is TRUE. It should be noted that, unless the final frictional ratio is to be interpreted quantitatively, the partial specific volume values can be rough estimates. The distribution analysis also requires regularization to avoid spurious peaks and error amplification [6,52], which is set through the parameter RegularizationType. It currently can take values ‘maxent’ for maximum entropy regularization or ‘Tikhonov’ for Tikhonov-Phillips regularization [6]. As is standard in SEDFIT, the tolerated increase in the root-mean-square deviation (rmsd) of the fit allowed for regularization is scaled by F-statistics and a p-value specified in the parameter RegularizationPvalue [7]. Since sedimentation patterns of very small particles are very similar to baseline offsets, a correlation between baselines and distribution values at small s-values can exist. As shown previously, this correlation can be suppressed through a Bayesian prior at the smallest s-value [54], and this is specified by setting the parameter SupressBaselineCorrelation to TRUE.
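The role of the frictional ratio, partial specific volume, and buffer density can be made concrete with the scaling relationship that follows from combining the Svedberg equation with the Stokes-Einstein relation for a particle of constant frictional ratio f/f0: D(s) = sqrt(2)/(18 pi) kT s^(-1/2) (eta f/f0)^(-3/2) ((1 - vbar rho)/vbar)^(1/2). The Python sketch below evaluates this expression. The function name, the default viscosity and temperature (water at 20 °C), and the example values are assumptions for illustration; how SEDFIT handles viscosity and temperature internally is not described in this section.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def diffusion_from_s(s_svedberg, frictional_ratio, vbar_ml_g, density_g_ml,
                     viscosity_pa_s=1.002e-3, temperature_k=293.15):
    """Diffusion coefficient (cm^2/s) for a given s-value under a constant
    frictional ratio, from the Svedberg and Stokes-Einstein relations."""
    s = s_svedberg * 1e-13      # Svedberg -> seconds
    vbar = vbar_ml_g * 1e-3     # mL/g -> m^3/kg
    rho = density_g_ml * 1e3    # g/mL -> kg/m^3
    buoyancy = 1.0 - vbar * rho
    if s <= 0 or buoyancy <= 0:
        raise ValueError("sketch covers sedimenting particles only")
    prefactor = math.sqrt(2.0) / (18.0 * math.pi)
    d_si = (prefactor * BOLTZMANN * temperature_k
            * s ** -0.5
            * (viscosity_pa_s * frictional_ratio) ** -1.5
            * math.sqrt(buoyancy / vbar))
    return d_si * 1e4           # m^2/s -> cm^2/s


# Example values only: a 6.5 S species with f/f0 = 1.4 and vbar = 0.73 mL/g.
d_monomer = diffusion_from_s(6.5, 1.4, 0.73, 1.0)
d_four_fold_s = diffusion_from_s(26.0, 1.4, 0.73, 1.0)
d_extended = diffusion_from_s(6.5, 1.8, 0.73, 1.0)
print(f"{d_monomer:.3e} cm^2/s")
```

The inverse square-root dependence on s means that a four-fold larger s-value halves the predicted diffusion coefficient at fixed frictional ratio, and a larger frictional ratio lowers D for every s-value.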
Lastly, details of the fit can be specified through the command line interface. The fitting algorithm is chosen by setting the parameter FittingAlgorithm to either ‘Simplex’ or ‘Levenberg-Marquardt’. In addition to the above-mentioned fitting of meniscus and bottom of the solution column and the frictional ratio, the treatment of baselines can be specified in the parameters BaselineFitted, RINoiseFitted, and TINoiseFitted. If set to TRUE this will cause a spatio-temporally uniform baseline, a time-dependent baseline, and/or a radial-dependent baseline to be fitted during the nonlinear regression, respectively [9,10]. It should be noted that even when the SEDFIT operation is set to automatically execute a Run and Fit command, these procedures can be interrupted and/or manually executed by the SEDFIT operator as usual. Even in the reduced SEDFIT menu it is possible to readjust solution column parameters and other fitting parameters to achieve the best fit prior to concluding the analysis. The usual side effect of the SEDFIT analysis is the creation of the above-mentioned ‘sdist’ file containing the s-value grid of the distribution analysis, as well as a file named ~tmppars that allows the last best-fit analysis to be restored when the scan files are manually reloaded. Both files are located in the data directory and have the same file extension as the scan files loaded.
When the operator invokes the Exit function of SEDFIT, several files are created prior to the termination of the SEDFIT process and placed in the designated output folder defined in the OutputResultsDirectory parameter. The first is a bitmap image of the SEDFIT window, saved as ‘screenshot.bmp’. Besides the graphical display of the scan file overlay and fit, residuals overlay, distribution, and optional residuals bitmap, it shows the customary informational text that provides information about the data files and fitting parameters including the overall rmsd of the fit. Further, SEDFIT creates text files ‘RInoise.dat’ and ‘TInoise.dat’ which contain two columns with the best-fit time- and radial-dependent baseline values. A file ‘ScanRMSD.dat’ provides information about the rmsd of the fit to each scan file separately. This may help to recognize trends or outliers. The file ‘distribution.dat’ contains two columns with the distribution in the form of grid s-values vs. c(s) values. This distribution file can later be integrated or used for further analysis in the secondary software. A file ‘dfr.dat’ saves the fitted boundary data for recreation of the data and fit overlay plot, and a residuals plot or bitmap. It is an ASCII text file in the form of a matrix with columns: radius (TI noise), TI noise, RI noise, radius (scan 1), raw data (scan 1), fit value (scan 1), radius (scan 2), raw data (scan 2), fit value (scan 2), etc., and rows corresponding to consecutive radii or scan times, respectively. Finally, SEDFIT creates an output file ‘ResultParameters.xml’ in the same xml format as the input file and containing the same parameters, some of which may have changed due to adjustments during the fit or by operator actions. Such changes may be registered in the spawning program. In addition, it reports the SEDFIT version, file paths of all input SV-AUC scan data files and all output files, as well as the PassThrough parameter.
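As an example of post-processing in the secondary software, the two-column ‘distribution.dat’ file can be read and integrated between chosen limits. The Python sketch below uses trapezoidal integration and also returns the signal-weighted average s-value of the integrated range. The whitespace-delimited layout, the function names, and the choice of trapezoidal integration are assumptions made for illustration; they are not necessarily how mlSEDFIT or GUSSI perform the integration. The demonstration uses a small synthetic distribution rather than real data.

```python
import tempfile
from pathlib import Path


def read_distribution(path):
    """Read two numeric columns (s-value, c(s)); skip non-numeric lines."""
    s_values, c_values = [], []
    with open(path) as fh:
        for line in fh:
            parts = line.replace(",", " ").split()
            if len(parts) < 2:
                continue
            try:
                s, c = float(parts[0]), float(parts[1])
            except ValueError:
                continue  # for example, a header line
            s_values.append(s)
            c_values.append(c)
    return s_values, c_values


def integrate_peak(s_values, c_values, s_low, s_high):
    """Trapezoidal area and signal-weighted average s within the limits."""
    points = [(s, c) for s, c in zip(s_values, c_values)
              if s_low <= s <= s_high]
    area = moment = 0.0
    for (s0, c0), (s1, c1) in zip(points, points[1:]):
        ds = s1 - s0
        area += 0.5 * (c0 + c1) * ds
        moment += 0.5 * (s0 * c0 + s1 * c1) * ds
    s_weighted = moment / area if area > 0 else float("nan")
    return area, s_weighted


# Synthetic test distribution: c(s) = 1 between 2 S and 4 S, else 0.
grid = [0.5 * i for i in range(13)]                      # 0 ... 6 S
synthetic = [1.0 if 2.0 <= s <= 4.0 else 0.0 for s in grid]

with tempfile.TemporaryDirectory() as tmp:
    dist_path = Path(tmp) / "distribution.dat"
    dist_path.write_text(
        "\n".join(f"{s}\t{c}" for s, c in zip(grid, synthetic)) + "\n")
    s_values, c_values = read_distribution(dist_path)

total_area, _ = integrate_peak(s_values, c_values, 0.0, 6.0)
peak_area, peak_sw = integrate_peak(s_values, c_values, 2.0, 4.0)
print(peak_area / total_area, peak_sw)
```

Dividing a peak area by the total area gives the fraction of signal in that peak, which is the form in which populations are usually reported.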
As statistical measures of the quality of fit the output file reports the overall rmsd (RMSD), the number of data points fitted (RMSD-points), the sum of squared residuals (RMSD-SSR), the runs test Z-value (RunsTestZ), and the histogram H (HistogramH) [7,53]. Additional information about the data includes whether scan file time stamps could be accessed for correction (CheckTimeStamps) [55], the rotor speed (RotorSpeed), the time and accumulated ω²t value of the last scan (tLastScan and w2tLastScan), the rotor temperature at the time of the first and last scan (TemperatureStart and TemperatureEnd), as well as the temperature average and largest temperature difference (TemperatureAverage and TemperatureDiffMax-Min). These parameters may be used for experimental quality control to flag the possible presence of convection artifacts [56,57]. A full list of the output parameters can be found in Table 2.
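A secondary program can read these output parameters with a standard xml parser and apply its own acceptance criteria. The Python sketch below collects the child elements of the output file into a dictionary and flags results that exceed given limits. The parameter names are those listed above, but the root element name of the output file, the example values other than the rmsd, and both thresholds are assumptions for illustration; suitable limits depend on the assay and are not specified in the text.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def read_result_parameters(path):
    """Return the top-level elements of the output xml as a dict of strings."""
    root = ET.parse(str(path)).getroot()
    return {child.tag: (child.text or "").strip() for child in root}


def quality_flags(results, max_rmsd, max_temperature_diff):
    """Return the names of quality criteria that are violated."""
    flags = []
    if float(results["RMSD"]) > max_rmsd:
        flags.append("RMSD")
    if float(results["TemperatureDiffMax-Min"]) > max_temperature_diff:
        flags.append("TemperatureDiffMax-Min")
    return flags


# Synthetic stand-in for ResultParameters.xml (assumed root element name).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<cGMPSedfitCall>
  <Model>cofs</Model>
  <RMSD>0.006743</RMSD>
  <RunsTestZ>3.2</RunsTestZ>
  <TemperatureStart>20.0</TemperatureStart>
  <TemperatureEnd>20.1</TemperatureEnd>
  <TemperatureDiffMax-Min>0.1</TemperatureDiffMax-Min>
</cGMPSedfitCall>
"""

with tempfile.TemporaryDirectory() as tmp:
    result_path = Path(tmp) / "ResultParameters.xml"
    result_path.write_bytes(SAMPLE)
    results = read_result_parameters(result_path)

# Hypothetical acceptance limits, chosen only for this demonstration.
passed = quality_flags(results, max_rmsd=0.01, max_temperature_diff=0.5)
failed = quality_flags(results, max_rmsd=0.005, max_temperature_diff=0.5)
print(passed, failed)
```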
Output will be written in an xml formatted file in the designated output folder. Output parameters include the same parameters regarding data, model, and solution conditions as the input parameters, but also include the additional parameters in this table.
To indicate termination of the SEDFIT analysis and to allow hand-over to the secondary software, the AllDoneFlagFile is created. As mentioned above, this ASCII text file contains as sole entry the handshake string from the command line starting SEDFIT. Creation of this file also indicates that the other results files have been created, which will not be the case if SEDFIT is prematurely exited.
In summary, to access the SEDFIT command line interface any secondary software must carry out the following tasks (Fig 1): 1) organize data such as scan files and starting analysis parameters; 2) generate an input xml file that contains the desired SEDFIT completion flag file (which must not exist yet) and designate the directory for results files; 3) execute SEDFIT with the command line parameters, including a handshake string; 4) wait for completion of the analysis by periodically checking for the creation of the specified completion flag file containing the handshake string; 5) read the results from the xml and other output files created by SEDFIT; 6) perform optional quality checks, optional integration of the distribution results, carry out secondary analyses, and/or write automated reports. These tasks can be wrapped in an access-controlled environment, as necessary in the GMP setting. To allow efficient analyses of a large number of samples, the SEDFIT interface can be run in multiple instances side-by-side; if care is taken to create unique AllDoneFlagFile files and OutputResultsDirectory locations, different SEDFIT instances will operate completely independently of each other.
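Steps 2 to 5 of this list can be sketched in Python as a single function. This is an illustration under stated assumptions and not the mlSEDFIT implementation: the function names are hypothetical, a random identifier serves as the handshake string and makes the flag file and results directory unique for each call, and the launcher is passed in as an argument so that the sketch can be exercised with a stand-in for SEDFIT. The stand-in below only mimics the file hand-over; its output values are made up.

```python
import subprocess
import tempfile
import time
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path


def as_text(value):
    """Booleans become TRUE / FALSE; everything else uses str()."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    return str(value)


def run_sedfit(params, work_dir, mode="111", exe="sedfit.exe",
               launcher=subprocess.Popen, timeout_s=3600.0, poll_s=1.0):
    """Write the input file, spawn SEDFIT, wait, and read the results."""
    work_dir = Path(work_dir)
    handshake = uuid.uuid4().hex          # unique for every call
    flag = work_dir / f"done_{handshake}.txt"
    out_dir = work_dir / f"results_{handshake}"
    input_xml = work_dir / f"input_{handshake}.xml"

    # Step 2: input xml including flag file and results directory.
    root = ET.Element("cGMPSedfitCall")
    full = dict(params, AllDoneFlagFile=str(flag),
                OutputResultsDirectory=str(out_dir))
    for name, value in full.items():
        ET.SubElement(root, name).text = as_text(value)
    input_xml.write_text('<?xml version="1.0" encoding="UTF-8"?>\n'
                         + ET.tostring(root, encoding="unicode"),
                         encoding="utf-8")

    # Step 3: spawn SEDFIT with mode, handshake string, and input path.
    launcher([exe, mode, handshake, str(input_xml)])

    # Step 4: poll for the completion flag containing the handshake.
    deadline = time.monotonic() + timeout_s
    while not (flag.exists() and flag.read_text().strip() == handshake):
        if time.monotonic() > deadline:
            raise TimeoutError("SEDFIT did not signal completion")
        time.sleep(poll_s)

    # Step 5: read the output parameters.
    result_root = ET.parse(str(out_dir / "ResultParameters.xml")).getroot()
    return {child.tag: (child.text or "").strip() for child in result_root}


def fake_sedfit(argv):
    """Stand-in for SEDFIT, used only to exercise the sketch."""
    _exe, _mode, handshake, xml_path = argv
    call = {c.tag: c.text for c in ET.parse(xml_path).getroot()}
    out_dir = Path(call["OutputResultsDirectory"])
    out_dir.mkdir(parents=True)
    (out_dir / "ResultParameters.xml").write_text(
        "<cGMPSedfitCall><Model>" + call["Model"] + "</Model>"
        "<RMSD>0.0067</RMSD></cGMPSedfitCall>")
    Path(call["AllDoneFlagFile"]).write_text(handshake)


with tempfile.TemporaryDirectory() as tmp:
    demo = run_sedfit({"Model": "cofs", "AutoFit": True}, tmp,
                      launcher=fake_sedfit, poll_s=0.01)
print(demo)
```

Because every call creates its own flag file and results directory, several such calls could run side by side, for example from a thread pool, without interfering with each other.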
Results
To test the command line interface we wrote a family of MATLAB scripts for data input and retrieval of results, termed ‘mlSEDFIT’. The script may be taken as a template for further modification. We applied it to the analysis of stressed NISTmAb monoclonal antibody [58] that is partially denatured and presents a series of oligomeric populations and a multimodal sedimentation boundary (Fig 2). Using the AutoRun and AutoFit option, the non-linear regression converges at an rmsd of 0.006743 OD (Fig 2, solid lines) with a best-fit meniscus at 6.1657 cm, a best-fit frictional ratio of 1.37, and the c(s) distribution shown in Fig 3.
Top: Scan files and best fit (for clarity, showing black dots only for every 2nd data point of every 2nd scan) with a c(s) model automatically converged to a final rmsd of 0.006743 OD (colored lines). Progression of scan time is indicated by color from purple to red. Middle and Bottom: Residuals bitmap and residuals overlay. Plot was made using the software GUSSI [59], which is spawned from the script mlSEDFIT.
The distribution from command line operation (corresponding to the fit in Fig 2) exhibits a monomer peak at 6.477 S with 29.20% of signal, a trace degradation product at 4.199 S with 0.95% of signal, a dimer peak at 9.473 S with 12.51% of signal, and higher aggregates with collective sw of 16.799 S and 51.77% of signal. The analogous manually operated analysis produces a monomer peak at 6.481 S with 29.27% of signal, a degradation product at 4.178 S with 0.92% of signal, a dimer peak at 9.488 S with 12.51% of signal, and higher aggregates with collective sw of 16.81 S and 51.73% of signal. Integration and plot were made using the software GUSSI [59], which can be spawned from the script mlSEDFIT.
It is possible to manually reload the command line generated analysis and inspect the fit further. However, as an independent control we loaded the same data separately in standard operation of SEDFIT and performed an analysis with the same model. It converged to an rmsd of 0.006738 OD, with a best-fit meniscus at 6.1655 cm, a best-fit frictional ratio of 1.35, and a c(s) distribution that is virtually identical in all aspects to the distribution from command line operated SEDFIT analysis (Fig 3). Differences of about 0.02 S or less in sedimentation coefficients and < 0.1% in population are observed for all peaks, which is better than the typical accuracy and statistical precision of SV-AUC analysis.
Generally, one practical limitation in comparing analysis results can be the required graphical input of the fitting limits, which is replaced by numerical control in the command line parameters. Similarly, detailed loading option preferences may differ in the two operation modes. Furthermore, small and insignificant numerical differences in repeat analyses should be expected with the Simplex algorithm for fitting, since this involves initial randomization of fitting parameters. Small differences may also be found when adopting different paths in the error surface during non-linear regression. In the present case, remaining insignificant differences in the fit are a result of a locally very flat error surface for the precise numerical value of the frictional ratio parameter, which ultimately reflects the limit of diffusion information content of the broad sedimentation boundaries of the sample dataset used.
The script mlSEDFIT for testing and demonstrating the new interface is equipped with functions for automatically pre-determining an estimate for the meniscus position for absorbance data prior to spawning SEDFIT, and for creating high-quality illustrations by spawning the software GUSSI [59] utilizing the data, fit, and residuals values retrieved from SEDFIT after the data analysis. In addition, it can integrate the distributions alternatively through a graphical process or through pre-determined integration limits (Fig 4). Finally, it will save the results alongside the SEDFIT output parameters. mlSEDFIT can be easily customized and extended, and may be compiled to prevent further modification.
The output generated through the command line interface can be read in the mlSEDFIT script. For example, integration of distribution peaks can be carried out in this script after mouse clicks on the peaks in the distribution plot, as shown.
Discussion
SV-AUC has become an indispensable tool to study particle size distributions in science and in the biopharmaceutical industry [38–40,56,60,61]. Therefore, implementation of SV-AUC analyses in the GMP environment would be desirable. The lack of SV-AUC analysis compatible with the GMP environment was previously discussed by Savelyev and colleagues [62]. Their software ULTRASCAN GMP provides data access and analysis workflow control, but unfortunately the SV-AUC data analysis in ULTRASCAN includes ad hoc algorithms that are mathematically uncertain in important aspects, and computationally excessively wasteful, requiring supercomputers [52]; therefore it has only a minor share of SV-AUC applications in the biopharmaceutical field, the vast majority of which are carried out with the c(s) and ls-g*(s) methods implemented in SEDFIT [7,41]. Furthermore, ULTRASCAN GMP presents a closed system with pre-conceived workflow strategies and analyses that may not be suitable or adaptable to different objectives. Moreover, it is linked to a particular version of the analytical ultracentrifuge instrument, of which deficiencies for certain applications have been reported [56], and which is currently incapable of providing independent operating system time-stamps of the scan files to the analyst to verify time accuracy [55].
The computational interface for SEDFIT provides a computational core for flexible state-of-the-art SV-AUC analysis, in the form of a module that can be easily embedded into scripts and software satisfying GMP requirements, including auditable analysis trails and custody of data and results. For demonstration and customization, a generic MATLAB script for spawning SEDFIT was developed; similar access could conceivably be incorporated into user-friendly software such as GUSSI [59], into ULTRASCAN GMP [62], or into other custom-written GMP software. The extent to which 21 CFR Part 11 requirements are met will depend on the spawning software and on the SEDFIT spawning mode chosen.
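To make the idea of an auditable analysis trail concrete, the following is a minimal Python sketch of what a spawning program might record for each analysis. It is an assumption-laden illustration, not part of SEDFIT or mlSEDFIT. The function names (`sha256_of`, `audit_record`, `append_record`), the record fields, and the JSON-lines log format are all hypothetical, and whether such a record satisfies any particular regulatory requirement depends on the surrounding software and procedures.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_record(inputs, outputs, command, analyst):
    """Build one audit entry tying input and output files to a command.

    `command` is the argument list used to spawn the analysis program;
    it is stored verbatim and not interpreted here.
    """
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "analyst": analyst,
        "command": list(command),
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "outputs": {str(p): sha256_of(p) for p in outputs},
    }


def append_record(log_path, record):
    """Append the record as one JSON line to a log file."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Storing content hashes of the scan files and of the result files alongside the exact command allows a later reviewer to check that neither the data nor the results were changed after the analysis was run.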
At present, the analysis still needs to be supervised, since manual adjustments to the fitting parameters and model may be required to arrive at the best-fit analysis. This allows adventitious scan files or other possible artifacts from experimental imperfections to be recognized and their effect to be alleviated. Detailed protocols and instructions can be found in the literature [3,7,41,47,63–65], and for reliable results this guidance is equally valid when using the command line interface. Nonetheless, the interface described here can provide a platform for future improvements that conceivably may allow fully unsupervised analyses by expert systems or AIs with automated meniscus recognition, adjustment of fitting limits, and judgment of fit quality, which may be applied, for example, to replicate experiments. For series of equivalent experiments, the present version already allows the result (for example, meniscus position or frictional ratio) of an initial analysis to be automatically entered as starting parameters for a following data set, thereby improving the efficiency of the analysis. Besides the GMP environment, this may be useful for analyzing large families of experimental data sets designated for meta-analyses, such as collective integration of distributions for binding isotherms and their analysis [23,51,65]. This ties in with developments of higher throughput experimental techniques, such as pseudo-absorbance data acquisition without need of a reference sector [66] and 3D-printed multi-sector centerpieces [67,68].
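The carry-over of results between equivalent experiments can be sketched as a simple loop. The Python example below is a hedged illustration under stated assumptions: `analyze_series` is a hypothetical helper, and `analyze` stands for any user-supplied function that runs one analysis (for example by spawning the analysis program and reading back its output files) and returns a dictionary of best-fit values. The parameter names `meniscus` and `frictional_ratio` are illustrative keys, not SEDFIT command line options.

```python
def analyze_series(datasets, analyze, initial_guess,
                   carry=("meniscus", "frictional_ratio")):
    """Analyze data sets in order, seeding each run with the previous result.

    datasets      : iterable of data set identifiers (e.g. folder names)
    analyze       : callable(name, guess) -> dict of best-fit values
    initial_guess : dict of starting parameters for the first data set
    carry         : keys copied from each result into the next starting guess
    """
    guess = dict(initial_guess)
    results = []
    for name in datasets:
        # Pass a copy so that `analyze` cannot modify the running guess.
        result = analyze(name, dict(guess))
        results.append((name, result))
        for key in carry:
            if key in result:
                guess[key] = result[key]
    return results
```

Because the analysis step is passed in as a function, the loop itself stays independent of how the analysis program is invoked, and a supervising analyst can still inspect each returned result before accepting it.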
Another area for future expansion of the SEDFIT interface is the pre-selection of more sophisticated analysis models. For example, biopharmaceutical samples often contain small co-solutes that sediment and create dynamic density and signal gradients, both of which can be taken into account when analyzing macromolecular sedimentation [69–71]. Similarly, different sedimentation configurations, such as analytical zone or band sedimentation, can be highly desirable for certain applications [31,34,72,73], and advanced regularization methods may be advantageous [7,74,75]. While the corresponding analysis methods are currently available if SEDFIT is spawned in the unrestricted mode, future releases may allow their relevant parameters to be passed directly through the command line interface.
Importantly, since none of the computational functions of SEDFIT have been altered, the results remain the same as in the equivalent standard operation of SEDFIT. The command line interface solely modifies the data input and output, replacing manual startup and loading of analysis files with an automated, pre-loaded SEDFIT session. For this reason, the command line mode of SEDFIT will be applicable to the same range of current and future applications. With regard to the biopharmaceutical industry, this includes studies of therapeutic peptides and proteins, polymer conjugates, nucleic acids, carbohydrates, vectors for therapeutics or vaccines based on metal nanoparticles, lipid nanoparticles, viral vectors such as adenovirus, AAV, or lentivirus, and others. More generally, due to the universal nature of buoyant mass-based separation in SV-AUC and the high sensitivity and hydrodynamic resolution of c(s) analysis in SEDFIT, it will be applicable to the study of mass and size distributions of macromolecules and particles whose density differs from that of the formulation buffer, across a mass range from 1 kDa to >10 GDa, or a sedimentation coefficient range between 0.1 and 100,000 S [76,77].
References
- 1. Svedberg T, Pedersen KO (1940) The Ultracentrifuge. London: Oxford University Press.
- 2. Schachman HK (1959) Ultracentrifugation in Biochemistry. New York: Academic Press.
- 3. Schuck P, Zhao H, Brautigam CA, Ghirlando R (2015) Basic Principles of Analytical Ultracentrifugation. Boca Raton, FL: CRC Press. 302 p.
- 4. Schuck P (2013) Analytical ultracentrifugation as a tool for studying protein interactions. Biophys Rev 5: 159–171. pmid:23682298
- 5. Brown PH, Schuck P (2008) A new adaptive grid-size algorithm for the simulation of sedimentation velocity profiles in analytical ultracentrifugation. Comput Phys Commun 178: 105–120. pmid:18196178
- 6. Schuck P (2000) Size-distribution analysis of macromolecules by sedimentation velocity ultracentrifugation and Lamm equation modeling. Biophys J 78: 1606–1619. pmid:10692345
- 7. Schuck P (2016) Sedimentation Velocity Analytical Ultracentrifugation: Discrete Species and Size-Distributions of Macromolecules and Particles. Boca Raton, FL: CRC Press. 244 p.
- 8. Schuck P, Rossmanith P (2000) Determination of the sedimentation coefficient distribution by least-squares boundary modeling. Biopolymers 54: 328–341. pmid:10935973
- 9. Schuck P, Demeler B (1999) Direct sedimentation analysis of interference optical data in analytical ultracentrifugation. Biophys J 76: 2288–2296. pmid:10096923
- 10. Schuck P (2010) Some statistical properties of differencing schemes for baseline correction of sedimentation velocity data. Anal Biochem 401: 280–287. pmid:20206114
- 11. Pavlov GM, Korneeva EV, Smolina NA, Schubert US (2010) Hydrodynamic properties of cyclodextrin molecules in dilute solutions. Eur Biophys J 39: 371–379. pmid:19159925
- 12. Pechar M, Pola R, Laga R, Braunová A, Filippov SK, et al. (2014) Coiled coil peptides and polymer-peptide conjugates: Synthesis, self-assembly, characterization and potential in drug delivery systems. Biomacromolecules 15: 2590–2599. pmid:24857680
- 13. Zhao H, Wu D, Hassan SA, Nguyen A, Chen J, et al. (2023) A conserved oligomerization domain in the disordered linker of coronavirus nucleocapsid proteins. Sci Adv 9. pmid:37018390
- 14. Naue N, Curth U (2012) Investigation of protein-protein interactions of single-stranded DNA-binding proteins by analytical ultracentrifugation. Methods Mol Biol 922: 133–149. pmid:22976181
- 15. Manna A, Zhao H, Wada J, Balagopalan L, Tagad HD, et al. (2018) Cooperative assembly of a four-molecule signaling complex formed upon T cell antigen receptor activation. Proc Natl Acad Sci 115: 201817142. pmid:30510001
- 16. Ebel C (2011) Sedimentation velocity to characterize surfactants and solubilized membrane proteins. Methods 54: 56–66. pmid:21112401
- 17. Padrick SB, Brautigam CA (2011) Evaluating the stoichiometry of macromolecular complexes using multisignal sedimentation velocity. Methods 54: 39–55. pmid:21256217
- 18. Grube M, Dinu V, Lindemann H, Pielenz F, Festag G, et al. (2020) Polysaccharide valproates: Structure—property relationships in solution. Carbohydr Polym 246: 116652. pmid:32747284
- 19. Pavlov GM, Knop K, Okatova OV, Schubert US (2013) Star-brush-shaped macromolecules: Peculiar properties in dilute solution. Macromolecules 46: 8671–8679.
- 20. Bekdemir A, Stellacci F (2016) A centrifugation-based physicochemical characterization method for the interaction between proteins and nanoparticles. Nat Commun 7: 13121. pmid:27762263
- 21. Sousa AA, Schuck P, Hassan SA (2021) Biomolecular interactions of ultrasmall metallic nanoparticles and nanoclusters. Nanoscale Adv 3: 2995–3027. pmid:34124577
- 22. Schuck P, Zhao H (2013) Biophysical methods for the study of protein interactions. Methods 59. pmid:23522094
- 23. Schuck P (2010) Sedimentation patterns of rapidly reversible protein interactions. Biophys J 98: 2005–2013. pmid:20441765
- 24. Chaton CT, Herr AB (2015) Elucidating complicated assembling systems in biology using size-and-shape analysis of sedimentation velocity data. Methods Enzymol 562: 187–204. pmid:26412652
- 25. Ebel C, Birck C (2021) Sedimentation Velocity Methods for the Characterization of Protein Heterogeneity and Protein Affinity Interactions. Methods Mol Biol 2247: 155–171. pmid:33301117
- 26. Perugini MA, Schuck P, Howlett GJ (2002) Differences in the binding capacity of human apolipoprotein E3 and E4 to size-fractionated lipid emulsions. Eur J Biochem 269: 5939–5949. pmid:12444983
- 27. Mehn D, Iavicoli P, Cabaleiro N, Borgos SE, Caputo F, et al. (2017) Analytical ultracentrifugation for analysis of doxorubicin loaded liposomes. Int J Pharm 523: 320–326. pmid:28342788
- 28. Maruno T, Usami K, Ishii K, Torisu T, Uchiyama S (2021) Comprehensive Size Distribution and Composition Analysis of Adeno-Associated Virus Vector by Multiwavelength Sedimentation Velocity Analytical Ultracentrifugation. J Pharm Sci 110: 3375–3384. pmid:34186069
- 29. Saleun S, Mas C, Le Roy A, Penaud-Budloo M, Adjali O, et al. (2023) Analytical ultracentrifugation sedimentation velocity for the characterization of recombinant adeno-associated virus vectors sub-populations. Eur Biophys J. pmid:37106255
- 30. Burnham B, Nass S, Kong E, Mattingly M, Woodcock D, et al. (2015) Analytical Ultracentrifugation as an Approach to Characterize Recombinant AAV Vectors. Hum Gene Ther Methods: 1–48. pmid:26414997
- 31. Khasa H, Kilby G, Chen X, Wang C (2021) Analytical band centrifugation for the separation and quantification of empty and full AAV particles. Mol Ther—Methods Clin Dev 21: 585–591. pmid:34095342
- 32. Yarawsky AE, Zai-Rose V, Cunningham HM, Burgner JW, DeLion MT, et al. (2023) AAV analysis by sedimentation velocity analytical ultracentrifugation: beyond empty and full capsids. Eur Biophys J. pmid:37037926
- 33. Wawra S, Kessler S, Egel A, Solzin J, Burkert O, et al. (2023) Hydrodynamic characterization of a vesicular stomatitis virus—based oncolytic virus using analytical ultracentrifugation. Eur Biophys J. pmid:37133524
- 34. Maruno T, Ishii K, Torisu T, Uchiyama S (2023) Size Distribution Analysis of the Adeno-Associated Virus Vector by the c(s) Analysis of Band Sedimentation Analytical Ultracentrifugation with Multiwavelength Detection. J Pharm Sci 112: 937–946. pmid:36374763
- 35. Trachtenberg S, Schuck P, Phillips TM, Andrews SB, Leapman RD (2014) A structural framework for a near-minimal form of life: Mass and compositional analysis of the helical mollicute Spiroplasma melliferum BC3. PLoS One 9: e87921. pmid:24586297
- 36. Zhao H, Mayer ML, Schuck P (2014) Analysis of protein interactions with picomolar binding affinity by fluorescence-detected sedimentation velocity. Anal Chem 86: 3181–3187. pmid:24552356
- 37. Chaturvedi SK, Ma J, Brown PH, Zhao H, Schuck P (2018) Measuring macromolecular size distributions and interactions at high concentrations by sedimentation velocity. Nat Commun 9: 4415. pmid:30356043
- 38. Liu J, Yadav S, Andya J, Demeule B, Shire SJ (2015) Analytical ultracentrifugation and its role in development and research of therapeutical proteins. Methods Enzymol 562: 441–476. pmid:26412663
- 39. Berkowitz SA, Engen JR, Mazzeo JR, Jones GB (2012) Analytical tools for characterizing biopharmaceuticals and the implications for biosimilars. Nat Rev Drug Discov 11: 527–540. pmid:22743980
- 40. Berkowitz SA, Philo JS (2015) Characterizing biopharmaceuticals using analytical ultracentrifugation. In: Houde DJ, Berkowitz SA, editors. Biophysical Characterization of Proteins in Developing Biopharmaceuticals. Amsterdam: Elsevier. pp. 211–260. https://doi.org/10.1016/B978-0-444-59573-7.00009-9
- 41. Bou-Assaf GM, Budyak IL, Brenowitz M, Day ES, Hayes D, et al. (2022) Best Practices for Aggregate Quantitation of Antibody Therapeutics by Sedimentation Velocity Analytical Ultracentrifugation. J Pharm Sci. pmid:34986360
- 42. Gabrielson JP, Brader ML, Pekar AH, Mathis KB, Winter G, et al. (2007) Quantitation of aggregate levels in a recombinant humanized monoclonal antibody formulation by size exclusion chromatography, asymmetrical flow field flow fractionation, and sedimentation velocity. J Pharm Sci 96: 268–279. pmid:17080424
- 43. Chaturvedi SK, Parupudi A, Juul-Madsen K, Nguyen A, Vorup-Jensen T, et al. (2020) Measuring aggregates, self-association, and weak interactions in concentrated therapeutic antibody solutions. MAbs 12: 1810488. pmid:32887536
- 44. Parupudi A, Chaturvedi SK, Adão R, Harkness RW, Dragulin-Otto S, et al. (2021) Global multi-method analysis of interaction parameters for reversibly self-associating macromolecules at high concentrations. Sci Rep 11: 5741. pmid:33707571
- 45. Hopkins MM, Parupudi A, Bee JS, Bain DL (2021) Energetic Dissection of Mab-Specific Reversible Self-Association Reveals Unique Thermodynamic Signatures. Pharm Res. pmid:33604786
- 46. Philo JS (2009) A critical review of methods for size characterization of non-particulate protein aggregates. Curr Pharm Biotechnol 10: 359–372. pmid:19519411
- 47. Arthur KK, Kendrick BS, Gabrielson JP (2015) Guidance to Achieve Accurate Aggregate Quantitation in Biopharmaceuticals by SV-AUC. 1st ed. Elsevier Inc. 1–24 p. https://doi.org/10.1016/bs.mie.2015.06.011 pmid:26412664
- 48. Lu Y, Harding SE, Turner A, Smith B, Athwal DS, et al. (2008) Effect of PEGylation on the solution conformation of antibody fragments. J Pharm Sci 97: 2062–2079. pmid:17828753
- 49. Clardy SM, Lee DH, Schuck P (2021) Determining the Stoichiometry of a Protein–Polymer Conjugate Using Multisignal Sedimentation Velocity Analytical Ultracentrifugation. Bioconjug Chem 32: 942–949. pmid:33848127
- 50. FDA (n.d.) Part 11, Electronic Records; Electronic Signatures—Scope and Application. Available: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/part-11-electronic-records-electronic-signatures-scope-and-application. Accessed 5 May 2023.
- 51. Schuck P, Zhao H (2017) Sedimentation Velocity Analytical Ultracentrifugation: Interacting Systems. Boca Raton, FL: CRC Press. 271 p.
- 52. Schuck P (2010) On computational approaches for size-and-shape distributions from sedimentation velocity analytical ultracentrifugation. Eur Biophys J 39: 1261–1275. pmid:19806353
- 53. Ma J, Zhao H, Schuck P (2015) A histogram approach to the quality of fit in sedimentation velocity analyses. Anal Biochem 483: 1–3. pmid:25959995
- 54. Brown PH, Balbo A, Schuck P (2007) Using prior knowledge in the determination of macromolecular size-distributions by analytical ultracentrifugation. Biomacromolecules 8: 2011–2024. pmid:17521163
- 55. Zhao H, Ghirlando R, Piszczek G, Curth U, Brautigam CA, et al. (2013) Recorded scan times can limit the accuracy of sedimentation coefficients in analytical ultracentrifugation. Anal Biochem 437: 104–108. pmid:23458356
- 56. Berkowitz SA, Laue TM (2021) Boundary convection during velocity sedimentation in the Optima analytical ultracentrifuge. Anal Biochem: 114306. pmid:34274312
- 57. Zhao H, Balbo A, Metger H, Clary R, Ghirlando R, et al. (2014) Improved measurement of the rotor temperature in analytical ultracentrifugation. Anal Biochem 451: 69–75. pmid:24530285
- 58. Schiel JE, Turner A, Mouchahoir T, Yandrofski K, Telikepalli S, et al. (2018) The NISTmAb Reference Material 8671 value assignment, homogeneity, and stability. Anal Bioanal Chem 410: 2127–2139. pmid:29411089
- 59. Brautigam CA (2015) Calculations and publication-quality illustrations for analytical ultracentrifugation data. Methods Enzymol 562: 109–133. pmid:26412649
- 60. Caputo F, Clogston J, Calzolai L, Rösslein M, Prina-Mello A (2019) Measuring particle size distribution of nanoparticle enabled medicinal products, the joint view of EUNCL and NCI-NCL. A step by step approach combining orthogonal measurements with increasing complexity. J Control Release 299: 31–43. pmid:30797868
- 61. Kirchhoff CF, Wang XZM, Conlon HD, Anderson S, Ryan AM, et al. (2017) Biosimilars: Key regulatory considerations and similarity assessment tools. Biotechnol Bioeng 114: 2696–2705. pmid:28842986
- 62. Savelyev A, Gorbet GE, Henrickson A, Demeler B (2020) Moving analytical ultracentrifugation software to a good manufacturing practices (GMP) environment. PLOS Comput Biol 16: e1007942. pmid:32559250
- 63. Zhao H, Brautigam CA, Ghirlando R, Schuck P (2013) Overview of current methods in sedimentation velocity and sedimentation equilibrium analytical ultracentrifugation. Curr Protoc Protein Sci 7: 20.12.1. pmid:23377850
- 64. Salvay AG, Communie G, Ebel C (2012) Sedimentation velocity analytical ultracentrifugation for intrinsically disordered proteins. Methods Mol Biol 896: 91–105. pmid:22821519
- 65. Zhao H, Li W, Chu W, Bollard M, Adão R, et al. (2020) Quantitative Analysis of Protein Self-Association by Sedimentation Velocity. Curr Protoc Protein Sci 101: 1–15. pmid:32614509
- 66. Kar SR, Kingsbury JS, Lewis MS, Laue TM, Schuck P (2000) Analysis of transport experiments using pseudo-absorbance data. Anal Biochem 285: 135–142. pmid:10998273
- 67. Juul-Madsen K, Zhao H, Vorup-Jensen T, Schuck P (2019) Efficient data acquisition with three-channel centerpieces in sedimentation velocity. Anal Biochem 586. pmid:31493371
- 68. To SC, Brautigam CA, Chaturvedi SK, Bollard MT, Krynitsky J, et al. (2019) Enhanced Sample Handling for Analytical Ultracentrifugation With 3D-Printed Centerpieces. Anal Chem 91: 5866–5873. pmid:30933465
- 69. Schuck P (2004) A model for sedimentation in inhomogeneous media. I. Dynamic density gradients from sedimenting co-solutes. Biophys Chem 108: 187–200. pmid:15043929
- 70. Zhao H, Brown PH, Balbo A, Fernandez Alonso MC, Polishchuck N, et al. (2010) Accounting for solvent signal offsets in the analysis of interferometric sedimentation velocity data. Macromol Biosci 10: 736–745. pmid:20480511
- 71. Gabrielson JP, Arthur KK, Kendrick BS, Randolph TW, Stoner MR (2009) Common excipients impair detection of protein aggregates during sedimentation velocity analytical ultracentrifugation. J Pharm Sci 98: 50–62. pmid:18425806
- 72. Vinograd J, Bruner R, Kent R, Weigle J (1963) Band-centrifugation of macromolecules and viruses in self-generating density gradients. Proc Natl Acad Sci USA 49: 902–910. pmid:13997382
- 73. Lebowitz J, Teale M, Schuck P (1998) Analytical band centrifugation of proteins and protein complexes. Biochem Soc Trans 26: 745–749. pmid:10047819
- 74. Wafer L, Kloczewiak M, Luo Y (2016) Quantifying trace amounts of aggregates in biopharmaceuticals using analytical ultracentrifugation sedimentation velocity: Bayesian analyses and F statistics. AAPS J. pmid:27184576
- 75. Brown PH, Balbo A, Schuck P (2008) A Bayesian approach for quantifying trace amounts of antibody aggregates by sedimentation velocity analytical ultracentrifugation. AAPS J 10: 481–493. pmid:18814037
- 76. Ma J, Zhao H, Sandmaier J, Liddle JA, Schuck P (2016) Variable-field analytical ultracentrifugation: II. Gravitational sweep sedimentation. Biophys J 110: 103–112. pmid:26745414
- 77. Mehn D, Rio-Echevarria IM, Gilliland D, Kaiser M, Vilsmeier K, et al. (2018) Identification of nanomaterials: A validation report of two laboratories using analytical ultracentrifugation with fixed and ramped speed options. NanoImpact 10: 87–96.