A software package for efficient patient trajectory analysis applied to analyzing bladder cancer development

We present the Patient Trajectory Analysis Library (PTRA), a software package for exploratory analysis of patient development. PTRA provides tools for extracting statistically relevant trajectories from the medical event histories of a patient population. These trajectories can additionally be clustered for visual inspection and for identifying key events in patient progression. The algorithms of PTRA are based on a statistical method developed previously by Jensen et al., but we contribute several modifications and extensions to enable the implementation of a practical tool. These include a new clustering strategy, filter mechanisms for restricting the analysis to specific cohorts and for controlling trajectory output, and a parallel implementation that executes on a single server rather than a high-performance computing (HPC) cluster. PTRA is furthermore open source, and the code is organized as a framework so that researchers can reuse it to analyze new data sets. We illustrate our tool by discussing trajectories extracted from the TriNetX Dataworks database for analyzing bladder cancer development. We show that this experiment uncovers medically sound trajectories for bladder cancer.

func MaleFilter(p *Patient) bool {
	return p.Sex != Male
}
Listing 2. API for implementing a patient filter.
Patient filters are called by the function ApplyPatientFilters, which is called during input parsing (e.g. ParseTriNetxData in app/parseData.go).
In order to implement a new patient filter, one has to implement the interface in Listing 1. The CLI also has to be extended to be able to pass the new filter: extend getPatientFilter in ptra/main.go.
To implement meaningful patient filters, one also has to have an understanding of the structure that implements patients, cf. trajectory.Patient and the functions that operate on this type. The most important fields of the struct are highlighted in Listing 3. Please consult the code documentation on GitHub for details.
In addition to patient filters, it is also possible to implement trajectory filters, to control which trajectories are written to output. The standard API to implement a trajectory filter is shown in Listing 4. Analogous to patient filters, a trajectory filter is a boolean function that takes a trajectory as input and returns true if the intent is to keep the trajectory for output, or false if it is to be removed. Trajectory filters are called by the function BuildTrajectories in ptra/trajectory/trajectory.go.
In addition to implementing the interface in Listing 4, implementing a new trajectory filter also requires extending the CLI so that the filter can be passed: extend getTrajectoryFilter in ptra/main.go.
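As a minimal sketch of the trajectory filter convention described above (true keeps a trajectory, false removes it), the following self-contained program combines two filters. The Trajectory type here is a local stand-in that models only a Diagnoses field, assumed to hold integer analysis IDs. MinLengthFilter, ContainsFilter and applyFilters are hypothetical names introduced for illustration; they are not part of PTRA, and the way BuildTrajectories applies its filters may differ.

```go
package main

import "fmt"

// Trajectory is a local stand-in for trajectory.Trajectory. Only the
// Diagnoses field is modelled, assumed to hold integer analysis IDs.
type Trajectory struct {
	Diagnoses []int
}

// TrajectoryFilter returns true to keep a trajectory and false to drop it.
type TrajectoryFilter func(t *Trajectory) bool

// MinLengthFilter (hypothetical) keeps trajectories with at least n diagnoses.
func MinLengthFilter(n int) TrajectoryFilter {
	return func(t *Trajectory) bool {
		return len(t.Diagnoses) >= n
	}
}

// ContainsFilter (hypothetical) keeps trajectories that contain diagnosis id.
func ContainsFilter(id int) TrajectoryFilter {
	return func(t *Trajectory) bool {
		for _, did := range t.Diagnoses {
			if did == id {
				return true
			}
		}
		return false
	}
}

// applyFilters (hypothetical) keeps a trajectory only if every filter accepts it.
func applyFilters(ts []*Trajectory, filters ...TrajectoryFilter) []*Trajectory {
	kept := []*Trajectory{}
	for _, t := range ts {
		keep := true
		for _, f := range filters {
			if !f(t) {
				keep = false
				break
			}
		}
		if keep {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	ts := []*Trajectory{
		{Diagnoses: []int{0, 2}},
		{Diagnoses: []int{0, 1, 2}},
		{Diagnoses: []int{3, 4, 5}},
	}
	kept := applyFilters(ts, MinLengthFilter(3), ContainsFilter(2))
	fmt.Println(len(kept), kept[0].Diagnoses) // prints: 1 [0 1 2]
}
```

Returning a closure from a constructor function, as both hypothetical filters do, is the same pattern that lets a filter capture extra context such as the experiment object.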

Implementing a new use case
In order to implement trajectory analysis for a new data set, a five-step protocol can be followed:
1. Parse the input data into a trajectory.Experiment structure.
2. Initialize the experiment's relative risk ratios.
3. Build the experiment's trajectories.
4. Output the found trajectories to disk.
5. Cluster the found trajectories and write the output to disk.
These steps are explained in the next sections.

Parse the data inputs
The core data structure for implementing trajectory analysis is the trajectory.Experiment structure shown in Listing 7. Some of these slots are initialized by ptra functions in subsequent steps of the protocol, but the following need to be initialized explicitly:
• Name: a name for the experiment. This name is used for creating file names when writing output to disk.
• NofAgeGroups: the number of age groups for the experiment. This can be a parameter passed via the CLI, as in the TriNetX case, cf. ptra/main.go.
• Level: an optional integer slot to hold a 'level'. It is only used for logging. For the TriNetX case, this refers to the level in the ICD10 hierarchy the analysis operates on.
• NofDiagnosisCodes: the number of diagnosis codes used in the input data. This number is used to size the data structures that are initialized for calculating RR scores. The idea is that each diagnosis code/medical event is mapped onto a unique analysis ID, counting from 0 to NofDiagnosisCodes - 1. We can then initialize, for example, an array of size NofDiagnosisCodes and use the analysis ID of a diagnosis as an index into this array.
For example, assume the following medical events occur in the diagnosis histories:
Medical event          Analysis ID
Cough                  0
Dyspnea                1
COPD                   2
BMI>30                 3
High Blood Pressure    4
Diabetes               5
There are 6 medical events, so NofDiagnosisCodes = 6. We could assign them analysis IDs as shown by using a simple counter.
• DxDRR: a matrix that stores, for each diagnosis pair, the relative risk score for that pair. This matrix must be initialized when creating a trajectory.Experiment via the function trajectory.MakeDxDRR.
• DxDPatients: a matrix that stores, for each diagnosis pair, the patients diagnosed with that pair. The matrix must be initialized when creating a trajectory.Experiment via the function trajectory.MakeDxDPatients. This function takes a single size parameter, cf. NofDiagnosisCodes.
• IdMap: maps the analysis IDs for diagnoses onto the diagnosis IDs that occur in the input. For example, in the TriNetX case, this would be the ICD10 code of a diagnosis. See initializeIcd10AnalysisMapsFromXML as an example.
• Cohorts: an array with all cohorts in the experiment. The statistical model behind the RR calculation divides the patients into cohorts according to their sex and age. This ensures that the statistical sampling experiments compare similar patients (to avoid Simpson's paradox).
Concretely, patients are split into male and female patients. Both sexes are subsequently split up into age groups.
- The nofAgegroups argument is the number of age groups to use when defining cohorts. It can be a CLI parameter.
- The nofDiagnosisCodes argument is the number of different diagnosis codes used in the input.
• DPatients: holds, for each diagnosis, the list of patients that are diagnosed with that diagnosis. This can be obtained from the patient lists of the cohorts collected by trajectory.InitializeCohorts (see Cohort.DPatients).
• Pairs and Trajectories do not need to be initialized, as they are filled in at later steps.
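The analysis ID scheme and the sizing of the per-diagnosis structures described above can be sketched as follows. This is an illustration under stated assumptions, not PTRA's code: assignAnalysisIDs and makeSquareMatrix are hypothetical helpers, the inverse table only plays the role that IdMap plays in the experiment, and the float64 matrix stands in for a DxDRR-like structure whose actual element type is not shown here.

```go
package main

import "fmt"

// assignAnalysisIDs (hypothetical helper) gives each distinct diagnosis code
// the next value of a counter, in order of first appearance. It returns the
// code -> ID lookup and the inverse ID -> code table.
func assignAnalysisIDs(codes []string) (map[string]int, []string) {
	toID := map[string]int{}
	idMap := []string{}
	for _, c := range codes {
		if _, seen := toID[c]; !seen {
			toID[c] = len(idMap)
			idMap = append(idMap, c)
		}
	}
	return toID, idMap
}

// makeSquareMatrix (hypothetical helper) allocates an n x n matrix, so that
// any pair of analysis IDs can be used directly as row and column index.
func makeSquareMatrix(n int) [][]float64 {
	m := make([][]float64, n)
	for i := range m {
		m[i] = make([]float64, n)
	}
	return m
}

func main() {
	// The six medical events from the example; COPD occurs twice in the input.
	events := []string{"Cough", "Dyspnea", "COPD", "BMI>30",
		"High Blood Pressure", "Diabetes", "COPD"}

	toID, idMap := assignAnalysisIDs(events)
	n := len(idMap) // corresponds to NofDiagnosisCodes

	// An array of size n, indexed by analysis ID, counting occurrences.
	counts := make([]int, n)
	for _, e := range events {
		counts[toID[e]]++
	}

	pairs := makeSquareMatrix(n)
	fmt.Println(n, toID["COPD"], counts[toID["COPD"]], idMap[5])
	fmt.Println(len(pairs), len(pairs[0]))
}
```

Because the IDs are dense and start at 0, lookups by diagnosis become plain slice indexing rather than map access, which matters when the structures are visited repeatedly during sampling.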

Initialize the experiment relative risk ratios
The relative risk ratios (RR) are initialized by calling the function trajectory.InitializeExperimentRelativeRiskRatios. The signature of this function is shown in Listing 9:
func InitializeExperimentRelativeRiskRatios(exp *Experiment, minTime, maxTime float64, iter int)
Listing 9. Signature of the function for initializing relative risk ratios.
The parameters are:
• the trajectory.Experiment object exp created in step 1.
• the minTime and maxTime parameters, respectively the minimum and maximum allowed time between diagnoses for a pair to be considered for RR calculation. These are parameters passed via the CLI.
• the iter parameter, which determines the number of sampling iterations for calculating the RR. This is a parameter passed via the CLI.
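To make the role of these parameters concrete, the sketch below shows one way a time window could gate which diagnosis pairs are counted, and how a count could be compared against sampled comparison groups. Everything here is an assumption for illustration: the Diagnosis and Patient types are local stand-ins, the window bounds are treated as inclusive, and relativeRisk uses a simple "observed count divided by mean sampled count" form with one sampled count per iteration. PTRA's actual computation is not shown in this section and may differ.

```go
package main

import "fmt"

// Diagnosis and Patient are local stand-ins, not PTRA's types.
type Diagnosis struct {
	ID   int     // analysis ID of the diagnosis
	Time float64 // time of diagnosis, in the same unit as minTime/maxTime
}

type Patient struct {
	Diagnoses []Diagnosis
}

// hasPairInWindow reports whether the patient has diagnosis d1 and diagnosis
// d2 such that the gap (time of d2 minus time of d1) lies in
// [minTime, maxTime]. Inclusive bounds are an assumption.
func hasPairInWindow(p *Patient, d1, d2 int, minTime, maxTime float64) bool {
	for _, a := range p.Diagnoses {
		if a.ID != d1 {
			continue
		}
		for _, b := range p.Diagnoses {
			if b.ID != d2 {
				continue
			}
			gap := b.Time - a.Time
			if gap >= minTime && gap <= maxTime {
				return true
			}
		}
	}
	return false
}

// countPair counts the patients that have the pair d1 -> d2 inside the window.
func countPair(ps []*Patient, d1, d2 int, minTime, maxTime float64) int {
	n := 0
	for _, p := range ps {
		if hasPairInWindow(p, d1, d2, minTime, maxTime) {
			n++
		}
	}
	return n
}

// relativeRisk (assumed form) divides the observed count by the mean of the
// counts found in the sampled comparison groups, one count per iteration.
// It returns 0 when no comparison count is available.
func relativeRisk(observed int, sampled []int) float64 {
	sum := 0
	for _, c := range sampled {
		sum += c
	}
	if sum == 0 {
		return 0
	}
	return float64(observed) * float64(len(sampled)) / float64(sum)
}

func main() {
	ps := []*Patient{
		{Diagnoses: []Diagnosis{{ID: 0, Time: 0}, {ID: 1, Time: 1}}},  // gap 1: inside the window
		{Diagnoses: []Diagnosis{{ID: 0, Time: 0}, {ID: 1, Time: 10}}}, // gap 10: too long
		{Diagnoses: []Diagnosis{{ID: 1, Time: 0}, {ID: 0, Time: 2}}},  // wrong order
	}
	fmt.Println(countPair(ps, 0, 1, 0.5, 5))
	// Made-up counts, purely to exercise the function.
	fmt.Println(relativeRisk(4, []int{2, 2, 2, 2}))
}
```

In this toy input only the first patient contributes to the pair count, since the second exceeds maxTime and the third has the diagnoses in the opposite order.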

Build the experiment's trajectories
The trajectories are built by calling the function trajectory.BuildTrajectories. The signature of this function is shown in Listing 10:
December 21, 2022 5/6
type TrajectoryFilter func(t *Trajectory) bool
Listing 4. API for implementing a trajectory filter.
An example trajectory filter is shown in Listing 5. It implements a filter that makes sure only trajectories containing a bladder cancer-related diagnosis are kept for output.
func BCFilter(exp *trajectory.Experiment) trajectory.TrajectoryFilter {
	return func(t *trajectory.Trajectory) bool {
		for _, did := range t.Diagnoses {
			// Map the analysis ID back to the ICD10 code from the input.
			icdCode := exp.IdMap[did]
			if len(icdCode) >= 3 {
				subCode := icdCode[0:3]
				// C67 is the ICD10 category for malignant neoplasm of the bladder.
				if subCode == "C67" {
					return true
				}
			}
		}
		return false
	}
}
Listing 5. Example trajectory filter to remove trajectories without a bladder cancer-related diagnosis.
Trajectory filters operate on trajectory objects implemented by the struct trajectory.Trajectory. It is crucial to look at the fields of that type and the functions that operate on it (Listing 6). We refer to the source code documentation for details.
The ps argument represents the patient object parsed from the input. The patients should be passed as a trajectory.PatientMap object: