Global optimization of default phases for parallel transmit coils for ultra-high-field cardiac MRI

The development of novel multiple-element transmit-receive arrays is an essential factor for improving B1+ field homogeneity in cardiac MRI at ultra-high magnetic field strength (B0 > = 7.0T). One of the key steps in the design and fine-tuning of such arrays during the development process is finding the default driving phases for individual coil elements providing the best possible homogeneity of the combined B1+-field that is achievable without (or before) subject-specific B1+-adjustment in the scanner. This task is often solved by time-consuming (brute-force) or by limited efficiency optimization methods. In this work, we propose a robust technique to find phase vectors providing optimization of the B1-homogeneity in the default setup of multiple-element transceiver arrays. The key point of the described method is the pre-selection of starting vectors for the iterative solver-based search to maximize the probability of finding a global extremum for a cost function optimizing the homogeneity of a shaped B1+-field. This strategy allows for (i) drastic reduction of the computation time in comparison to a brute-force method and (ii) finding phase vectors providing a combined B1+-field with homogeneity characteristics superior to the one provided by the random-multi-start optimization approach. The method was efficiently used for optimizing the default phase settings in the in-house-built 8Tx/16Rx arrays designed for cMRI in pigs at 7T.


Introduction
The implementation of MRI-scanners, operating at ultra-high field static magnetic fields (UHF of B 0 �7T, promises a significant increase of both SNR and the spatial resolution (up tõ 200μm in-plane) [1] of clinical MR-images. Therefore, despite technical challenges related to both B 0 and B 1 + fields' inhomogeneities, the interest in the application of UHF scanners for cardiovascular MRI (cMRI) grows [2][3][4]. At the Larmor frequency of protons � 300MHz (B 0 �7T) and electrical permittivity of muscle tissue (ε�60), the wavelength of B þ 1 À field (10-12cm) is smaller than the dimensions of a human thorax. This leads to the establishment of a standing wave regime and creates interferences of a B 1 + -field across an imaged field-of-view (FOV). Additionally, the homogeneity of the B 1 + -field suffers from strong differences in the permittivity of intra-thoracic structures, in particular, of the lung parenchyma and cardiac muscle tissue. Thus, the spatial distribution of contrast in MR images becomes essentially heterogeneous rising demand for methods capable to neutralize these negative factors. During the last decades, multiple elements transmit-receive arrays (mTx-arrays) combined with parallel transmit (pTX) technology became an emerging tool for the improvement of the B 1 + -field homogeneity in UHF cMRI. Starting with standard birdcage coils with individual control of the phases of the driving ports, the technology development has led to customized arrays with up to 32 transceiver elements [5]. The type of mTX-arrays elements also evolved from classical magnetic loops to electrical dipoles, microstrips, and sophisticated loop-dipole hybrids [5][6][7][8][9][10][11]. The novel hardware allows for shaping the optimal spatial distribution of the B 1 + -field to excite a targeted transversal magnetization with the best possible smoothness of both magnitude and phase within a region-of-interest (ROI). This procedure is usually called "B 1 + -shimming". Additionally, using the RF-power amplifier (RFPA) operating in pTX mode allows for driving individual TX-arrays elements dynamically, i.e. with varying magnitude and phase of the RF-pulse waveforms to shape excitation profiles (including 2D and 3D) using different optimization criteria [12][13][14][15][16]. Last but not least, both techniques allow for including safety margins regarding specific adsorption rate (SAR) of electromagnetic energy.
Despite the significant progress in the field of subject-specific static and dynamic B 1 + -shimming, it remains to be a quite complex research tool. It requires deep expertise in MR-physics, pulse sequence programming, and optimization methods, to build a pipeline including (i) B 1 + -mapping, (ii) B 1 + -shimming parameters computation taking into account SAR margins, and (iii) integration on the scanner system. For cardiac MRI additional significant efforts for reliable absolute B 1 + -mapping are required to overcome limitations related to breathing and cardiac motion [17]. An important step in the development of mTx-arrays for UHF applications is the optimization and fine-tuning of the coil ensuring reasonable image quality and SAR safety for an average subject without additional adjustment procedures. In particular, this includes finding the default set of phase shifts required for the driving voltages of the individual TX-elements such that a uniform B 1 + distribution can be achieved for the largest possible Field-of-View (FOV). This set, usually called a "phase vector", can be integrated into the hardware as permanent or adjustable phase shifters implemented e.g in the form of coaxial cables of defined length. Alternatively, for the last generation of MR-scanners with support of pTX-mode (e.g. Magnetom™ "Terra", Siemens Healthineers), the default phase vector can be fixed in the coil configuration file to set the phases of driving voltages generated by RFPA. This provides operation of the mTX-array in the so-called "pTX Compatibility Mode". The default phase vector should shape a reasonable combined B 1 + -profile for an average subject making possible a straightforward application of the array in the single-Tx mode without RF-shimming as well as simplifying the B 1 -shimming process for pTX mode. For the birdcage volume resonators or dipole antenna arrays with cylindrical symmetry of elements arrangement, this usually corresponds to the "circularly polarized" ("CP") phases distribution based on the geometry of elements allocation [18]. However, the straightforward geometry-based approach for setting up default phases is not possible for the surface cardiac mTX-arrays having sophisticated shapes of individual elements and their allocation on a thorax. Therefore, for this type of mTX-arrays, the problem of default phasing is formulated as an optimization problem for a cost function maximizing homogeneity of the B 1 + -field with certain constraints or regularization on SAR and efficiency of using an available RF-power [5,8,11,19,20]. For typical numbers of Tx-elements (N = 8. . .16), the high-dimensional cost function has numerous local extremes. Many computational methods were applied in the context of both static and dynamic B 1 + manipulations which include linear and non-linear solvers approaches. However, most of these methods are targeted for the application of the B 1 + -shimming during the scan session, to deliver practical solutions under a strong time limitation (~1 min computation time). Therefore in most cases, the result represents a local optimum dependent on starting point used for the search process initialization. Moreover, these methods are mostly developed for the B 1 + -shimming using pTX-RFPA and manipulating with both phases and magnitudes of the driving voltages. Due to this reason, the optimization techniques developed for the subject-specific B 1 + -shimming providing quick but locally optimal solution within limited FOV is suboptimal for the optimization of default array elements phases as a part of the coil development and final fine-tuning. In this case, the computation time could be extended to hours and the main goal is to achieve the globally optimal characteristics of a shaped B 1 + in a frame of the formulated optimization problem.
For the small (N = 4-6) number of TX-elements, the most straightforward and universal solution of the discussed "phase only" optimization problem is the global brute force search over the complete phase space raster defined with reasonable discretization over each vector coordinate [21]. However, because the number of values in the raster grows exponentially with the number of elements this requires an extremely long computation time even for a moderate number of phase elements N�8 and discretization step (e.g 5˚). Therefore, often a prior experience and pragmatic restrictions on the searched phase vector components range are used to accelerate these computations. As an example, in the work [8] the brute force approach was used in a highly restricted phase space raster for 12 elements array (36 3 �40000 vectors was used). This procedure was considered to be sufficient for a symmetrical rectilinear array geometry and suggesting that phases should not vary along the z-direction. Alternatively, in the work [6], the search of optimal phases was performed using a non-linear solver (NLS) approach with multiple random starting points (referred to further as "random multi-start"). However, for keeping computation time reasonably short, the number of starting vectors for the NLS-optimization was deliberately limited to 1000. For the mTX array with 16 elements, this number corresponds to �10 −24 parts of the whole phase vector space gridded with 10s teps. As it'll be shown further in this paper, in an arbitrary array elements configuration, the straightforward usage of random-multi-start strategy may not allow us to detect a sufficiently good approximation of a global optimum of the typical optimization cost functions. This paper aims to propose a flexible pragmatic approach for searching for an optimal phase setting to shape a targeted default B 1 + -field of mTX arrays. The proposed numerical optimization strategy allows for a flexible trade-off between (i) sufficiently short computation time and (ii) probability of reaching the global optimum of the targeted B 1 + characteristics. It does not involve a priori constraints on the searched phase space (e.g. originating from the symmetry of the coil's geometry). In the current work, the proposed technique was validated using an inhouse developed 8TX/16RX dual part array for 7T cardiac MRI in pigs described in [22].

Theory
The combined field created by an mTX array with the constant driving amplitudes is expressed as: Here, F k ¼ e iφ k are complex phasors of the driving current for channels and b þ 1k ðrÞ are spatial B 1 + -maps of the individual coil elements. The phasing of an array is performed by control of the phasor vector {F} = {F 1 � � �F N } to achieve the targeted spatial homogeneity of combined field B 1c + (r). Cost functions and optimization problem. In general, the goal of B 1 + -shimming is to achieve a uniform excitation pulse flip-angle (FA) distribution within а whole imaged FOV.
However, using static B þ 1 À shimming and manipulating only with phases of individual elements this task would be either non-accomplishable or put very strong limitations on the range of achievable FA. Therefore, a pragmatic approach demands a certain degree of B þ 1c homogeneity characterized by statistical metrics of a distribution uniformity to be reached in the limited region-of-interest (ROI) Δr = (Δr x ,Δr y, Δr z ).
One of the widely used uniformity metrics used for the B 1 + -field shimming is the coefficient of variation: Here mean() and std() denote the mean value and the standard deviation calculated over the voxels within Δr. Similar to the work [23] we introduce a regularization factor Δ m to enhance the demand on homogeneity and compose the spatial uniformity cost function F u as: Among an achievable B 1 + -homogeneity, the essential characteristic of an mTX-array is the efficient usage of the available radiofrequency power. This can be controlled in the optimization procedure via an array transmit efficiency ratio of "sum-of-magnitudes" (SOM) and "magnitude-of-sum" (MOS) determined as [18,21,23]: Taking values in the range [0. . . 1] it characterizes the efficiency of power usage at the specific combination of phases F k producing B þ 1c . Finally, the optimization problem for the combined cost function F c to be solved for the determining phasor vector can be formulated as follows: fF opt g ¼ argmaxðF c ðDr; fΦgÞÞ F c ðDr; fΦgÞ ¼ F u ðDr; fΦgÞ � TX e ðDr; fΦgÞ ð5Þ

Method
Computation over sub-sampled phase space. The essential difficulty of solving the problem (5) is the periodic influence of each component of the phasor vector F k on the cost function F c leading to multiple local extremes (S1 Fig). In practice, this means that an optimization search initiated at a specific point {F 0 } ends up in the nearest local extremum {F opt }. This often makes a straightforward application of both derivative-based and other types of local NLS inefficient.
The most straightforward and universal solution for the problem is using a global exhaustive search varying all possible combinations of phasor vector components with a discrete step. However, for the given step δF and N Tx-elements, the global full-exhaustive search requires computing the cost function (360/δF) N times which for the δF�10˚and N = 16 would lead to more than 10 25 multiplications of complex matrices of size (Δr x �Δr y �Δr z ). The estimated computation time (up to several months on a high-performance workstation with multicores CPU) makes such an approach technically non-practical. In this work, we propose the computation strategy with an intelligent fusion of the brute-force approach and usage of local solvers to find a sufficiently good approximation of the global maximum of the cost function and, thus, globally optimal combined B 1 The complete flowchart scheme of the proposed computation method is shown in Fig 1. The goal of the first stage (Stage I) is to find a reasonable finite discrete representation of F c (Δr , {F}) using the N-dimensional discrete phase space {F}. For this purpose, the discrete grid of phase vector with N components should be formed as providing the "full-grid" discrete form of the cost function F c ðG N d� Þ, where L denotes the total number of nodes in the grid.
The further approach relies on two observations: (i) the cost function F c ({F}) is periodic by each phase vector component and (ii) the set of values {F top } providing values above 90% of its top (F c {F top } >0.9�max(F c )) is formed by relatively tiny (<10 −6 ) part of all vectors in full grid G N d� (see S1 Fig). Therefore, as the second step, the G N d� is randomly sub-sampled by the sampling operator R s with the uniform probability density function (PDF). This operation selects L s = L N /s vectors from G N d� , where s denotes the scaling factor of the effective nodes number reduction. In this work, the sub-sampling was performed using the uniform PDF. The usage of alternative sampling schemes (e.g. with Poisson disk PDF) for a further improvement of the computation efficiency of the proposed method can be also considered. As the result, this creates a "sub-sampled" grid: The phasor vector sub-space with the reasonably chosen subsampling by factor s�10 3 for N = 8 can be used to find the great majority of the summit points of the cost function  solver iterations to find the local optimum of the cost function for a single starting point in {F top }. This procedure is repeated iteratively (see Fig 1) to reduce the set of phasors {F opt } to a sufficiently small amount (N<10) which can be considered as a good approximation of the global optimum. The B 1c + field shaped by these vectors {F fin } can be analyzed based on various preference criteria for (e.g. minimal B 1c + gradients, maximal peak FA, best possible slice profile, etc) to select the variant used for the array default setup. In this work Stage II was performed using "fminsearch" Matlab function implementing the Nelder-Mead algorithm of optimization and known as less dependent from the starting point in comparison to other local solvers [24]. The performed in the Stage I preselection of the starting points allows for starting NLS-search close to the summits of the cost function and, thus, essentially increases the probability that the found solution vector will be globally optimal for the problem (5). B 1 + -profiles of the transmit elements. The described B 1 + optimization strategy relies on the knowledge of the complex B 1 + -maps of each Tx-element b 1k (r). However, to acquire the experimental B 1 + -maps a fully functional prototype of the array is required. Moreover, measuring these maps experimentally with sufficient reconstruction quality is often a non-trivial problem. The destructive interferences lead to attenuation of the MR-signal and producing artifacts of reconstruction or voids in the B 1 + -maps. If this takes place in the region where B 1 + shimming is targeted, the reliable optimization becomes problematic or requires significant efforts for data curation.
The alternative efficient approach is to use numeric electromagnetic simulations to compute the 3D distribution of the electromagnetic fields created by individual coil elements. After validating the simulated data by the experimental MR-measurements it can be further employed for the computation of the combined B 1c + in any arbitrarily placed region of interest.
Moreover, this allows to test the capabilities of the specific array design in terms of tailoring desired B 1 + profiles "in-silico" and perform necessary optimizations of both physical elements arrangement and electrical circuits [25]. In this work, the electromagnetic fields of the mTX-arrays were calculated using CST Studio Suite (3DS, Dassault Systems). The time-domain solver was used for the EM-simulation of the array's structures and the whole phantom volume within the coil using 4�10 7 mesh cells. The average computation time using a dual-core Intel Xeon (TM) E2650 CPU and two NVIDIA Tesla (TM) K80 GPUs was 48 hours for each simulated structure. The simulated data have been exported as 3D complex H-field values with the isotropic spatial resolution voxels of 4x4x4mm. Further computations were performed using in-house developed Matlab scripts (Matlab 2017a/b, Mathworks, USA).
Testing and validation of the phase space under-sampling strategy. To test and validate the efficiency of Stage I strategy in terms of a sub-sampling factor the optimization of phase vector for the six paired elements (Tx1-Tx4, Tx6, Tx7) was performed. These elements providẽ 90% of the total contribution in the combined B 1 + field in the center of the ROI targeted for the testing. Reducing dimension made it possible to perform computation over a full phase space grid and grid undersampled by low factors s within a reasonable time. The full phase space grid G L was built according to [6] using L = 17 nodes. With the span of {ϕ k } components within [-180˚..180˚] it provides the discretization step δϕ � 20 0 . The full grid was sampled according to [7] to create the sub-sampled grids U s with densities varying in a range of s = [1.. 10 4 ]. The grids with the different sub-sampling factors were used to test the fidelity of approximation of the cost function F c by the sub-sampled version Usage of CPU and GPU. In general, the efficiency of GPU computations depends on the GPU performance index and dimensionality of the problem, and the size of processed arrays. For N = 8 elements no advantage of GPU usage was found. However, the situation changes significantly for a higher-dimensional problem with N = 16 elements optimization. For the CPU the computation time grows roughly proportional to the number of voxels in the optimized ROI. At the same time for the GPU, the computation rate for both Stage I and Stage II calculations remains practically independent of the size of computed arrays up to 2.5�10 4 voxels in the optimized ROI (corresponds to 12x12x12cm volume). S2 Fig shows that computation time with a modern GPU may become drastically shorter compared to the CPU already starting from a targeted volume of~1300 voxels. Therefore, in practice, the decision of usage of GPUs within the computation pipeline should be based on benchmarking of small portions of the actual phase grid U s (e.g~0.5%) for the particular dimensionality of the problem and optimization volume.
Comparison to a traditional optimal phase computation method. To analyze the improvement of the homogeneity and mean value of B 1 + which can be obtained with our optimization method we compared it to the random multi-start optimization [6]. For N = 8 optimization a randomly selected set of 1000 phasor seed values with uniform distribution of components φ k in the range of [-180˚..180˚] was used as a starting point. The NLS-search was performed using the function (5). To provide sufficient statistics for the NLS-search comparison, the number of initial phasors {F 0 } for the pre-optimized start search was set to be 250. This corresponded to a cost function cut-off level of �70% from the top (the cut-off at 90% typically provides~20-30 phasors for Stage II). The comparison of the optimization methods was performed for an ROI within a spherical phantom P 1 (described below). The ROI with dimensions Δr = 50x50x50mm was placed as shown in Fig 5. Because the major intrinsic B 1 + inhomogeneity for the surface array is the gradient in the anterior-posterior ("Y-axis") direction, the optimized B 1c + profiles were compared using the relative standard deviation of the mean intensity projection computed as W r ¼ stdðb þ 1m Þ= < b þ 1m > and its relative absolute gradient jrb þ 1m j r ¼ jrb þ 1m j= < b þ 1m >. These projections were computed in z-direction for the slab Δz = 50mm.
As the final step, the comparison of an optimized and random multi-start was done for a high dimensional problem (N = 16 elements). The optimal phase vector search was done using the simulated elements B 1 + -maps of the array loaded by the numerical phantom P 2 (described below). The dependence of the achieved optimization result from the number of starting phasors N start was analyzed using 3 repetitions of NLS-optimization with N start = 7500. The cumulative maximum of the cost function was computed as max (F opt c [1..N start ]), where N start increments from 1 to 7500.
Coils, phantoms and MRI validation of electromagnetic simulation data. The mTXarray used for the validation consists of two physically independent parts each comprised of eight loop elements. The array is connected to the scanner via a dedicated RF-interface (Rapid Biomedical, Rimpar, Germany) which allows for using connected coils both in sTx and pTX modes. A detailed description of the array design and hardware specifications can be found in [22]. The driving voltage phases of elements can be adjusted in two ways: 1. Individually for each of 16 elements using phase-shifting coaxial cables connected to the dedicated sockets of the RF-interface.
2. Pairwise for every two elements, using 8 RFPA channels available on the scanner in pTX regime. Fig 2 shows a sketch of the array elements' configuration and location in both anterior and posterior parts along with the elements pairing scheme for driving by 8TX channels in pTX regime. Two phantoms we used for testing the proposed method both in the numerical simulations and in the experimental validation. Phantom P 1 comprised six rectangular cross-section PE bottles loading the posterior part, and a plexiglass sphere (diameter, 160mm) positioned inside the anterior part (Fig 2 left panel). The bottles and the sphere P 1 were filled with a solution of sugar and NaCl mixed in an experimentally defined proportion providing permittivity close to the typical muscle tissue (ε�58). The second phantom P 2 represents an acryl glass sphere with a diameter of 200mm filled with PVP solution prepared as described in [22] and [26].
The spherical phantom was chosen based on practical experience to minimize the number of destructive RF-interferences within the ROIs corresponding to the typical position of an animal's heart relative to the array. Simultaneously, this mimics real animal studies where some of the anterior array elements are more distant from the loading tissue than others.
All MRI measurements were performed using a 7T Magnetom (TM) Terra whole body 7T scanner (Siemens Healthineers, Erlangen, Germany) equipped with an 8TX channels RFPA. The FA measurement for the B 1 + -map reconstruction was done with the double angle mapping (DAM) technique based on a GRE sequence [27]. Additionally, the B 1 + -maps were crossvalidated with the vendor's B 1 + -mapping pulse sequence (turbo-FLASH with magnetization preparation [28]) available on the scanner. For the GRE-DAM measurements, the parameters were: TE/TR = 1.8/4000ms, pixel resolution 2mm 3 , slice thickness 4mm. The sequence parameters for the Turbo-FLASH B 1 + -mapping were TE/TR = 1.7/9000ms, FA = 10 0 for the same spatial resolution. The coronal slice providing the best overview of the Tx-profiles of the anterior array elements was chosen for comparison with the CST-simulated maps. The FA-map reconstruction was performed using an in-house developed Matlab script. The agreement of simulated and measured data was checked numerically by linear regression between normalized 1D-profiles computed along the lines crossing the most prominent features of simulated and measured 2D FA-maps in each channel. Testing the efficiency of the computed phases for B 1 + optimization was done using the same GRE-sequence as for DAM measurements. The initial "zero" phasеs of the array were set by manual adjustment geometrically guessed phase distribution [22]. To visualize the effect of B 1 + optimization the volumes inside the iso-surface SNR>30 were compared using 3D stack of GRE images. SNR was computed on pixel bases as SNR(x,y,z) = S i (x,y,z)/σ m . The S i denotes signal intensity in pixel and σ m is the mean standard deviation of the intensity of the pixels in the "noise pool" represented by 20x20 pixel regions in each corner of the FOV for all acquired slices.    Fig 7B shows the dependence of the cost function progression ratio in the optimization process, defined as F c ({F opt })/F c ({F 0 }), from the norm of the initial and the final phasor vector difference δF opt = kF opt −F 0 k. One can see that the length of the "search trajectory" δF opt is nearly constant (�2 rad) for the pre-optimized starting phasors. At the same time, for the random starting phasors one observes a widely spread distribution of δF opt in the range of 2 to 10 rad. The median value of the cost function grow-up rate performed with the pre-optimized starting phasors is �25% higher than that of random starting points (marked by dashed lines of corresponding colors). -field in the central sagittal and coronal slices of the optimized region for zero phase vector (left) and phase vectors found by random (middle) and proposed optimized multi-start optimization (right) are shown. The value of the normalized cost function F opt c =F 0 c , coefficient of variation and mean gradient demonstrate an improvement of B 1c + homogeneity achieved by the proposed method. Fig 10A and 10B show the GRE MR-images acquired using phase vector settings optimized for N = 8 channels in P 1 phantom. Phases vector were set using the pTX adjustment platform of the scanner simulating usage of the array in the "pTX Compatibility Mode". The mean SNR value in the optimized coronal region is improved by �60%. For the labeled on the panel (a) optimization region the �40% increase of the volume with SNR>30 was achieved as shown on panel (b). The average local gradient of SNR in the anterior-posterior direction was reduced by �50%. Fig 11 shows the experimental result of the optimization of 16 components phase vector for the phantom P 2 . An optimized phase vector for driving voltages of array elements was set using phase shifters (coaxial cables) connected to the dedicated sockets of the RF-interface.  + . An essential inhomogeneity of b 1m. + before the optimization is observed.

Discussion
In this paper, we demonstrated a relatively simple way to balance the computation time with the depth of optimization search targeted to reach the best possible homogeneity of the combined B 1 + -field achievable by the fine-tuning of the array by the developer before (or without) subject-specific adjustments in the scanner. The efficiency of the approaching to a globally optimal for the formulated optimization problem phase vector is expected to be close to the full phase space exhaustive search, however, achieved with a realistic computation time even for high-dimensional problems (N>16). To the best of our knowledge, the groups reporting the development of in-house designed transceiver arrays for UHF MRI usually use implicit or explicit pragmatic limitations on the depth of exploring of possible phase combinations in their approaches to the adjustment of the default phases. The results of this work demonstrate that the arbitrary set limitations on the number of starting vectors for the plain random multi-

PLOS ONE
start optimization may lead to achieving essentially lower B 1 + -field homogeneity than using the proposed optimized multi-start strategy with adequately defined phase space raster for Stage I. In particular, with N start = 1000 starting vectors and N = 8 optimized phase components the random multi-start optimization using the cost function based on the coefficient-ofvariation may lead to the relative gradient which is by factor 10 higher compared to the one achieved in the optimized multi-start. The analysis of the high dimensional problem (N = 16) confirms that arbitrary set limitations of the starting vectors set may limit the potentially achievable B 1 + -homogeneity. For example, as shown by Fig 8, for N start = 1000, the maximal cost function value achieved by the random-multi-start would be 20% lower than for the described strategy. The increased depth of the random multi-start search (up to the N start =  The problems of higher dimensions (N = 24..32 elements) should be preferably solved by stepwise iterative optimization. On each step, the dominant array's elements for specific ROI can be selected, optimized, and combined using found phase vector. This combined section will be considered as a single element for the optimization of the remaining (or next part of) elements as it was proposed in the work [29]. The experimental validation of the optimized phasing by MRI measurements confirms increased uniformity of the B 1 + leading to the improved homogeneity of the SNR in the optimized region. The results of testing the arrays for pig cardiac MRI with default B 1 -profiles optimized using the described technique were demonstrated both ex-vivo and in-vivo for the dedicated pigs arrays [22,26] and as well as for the human arrays [30,31]. The limitation of the current approach is that found globally optimal phase settings are valid in the mathematical context of the specific optimization problems i.e. "global maximum/minimum of the specific cost function(s) used in the optimization". The achieved B 1 is not necessarily optimal across the "global population" (animals or humans). Moreover, as we demonstrate in the work [30] the human model-based fixed phases optimization is most probably limited to a specific cohort (specific patient weight range and sex). Solution globally optimal to the population of human or animal subjects for the static B 1shimming would require using a machine or deep learning approach with a neuronal network trained to a large number of B 1 -maps representing the population [32]. Another approach providing B 1 + optimization which is global regarding a population is a dynamic pTX-based B 1 + shimming with using universal 3D kT pTX-pulses designed based on the multiple B 1 -maps acquired from the population [33]. Both these approaches, however, require the preliminary acquisition of patient-specific B1-maps using a fully operational array validated for SAR safety, whereas the described method is primarily targeted to the optimization at the hardware development and construction stage.

Conclusion
In this paper, we propose a time-efficient technique for the calculation of globally optimized hardware phases to be used for the construction and the subsequent optimization of transceiver arrays for UHF MRI. The proposed methodology combines the advantages of full coverage of phase space by an exhaustive search with high computational time efficiency of the solver-based optimization. Besides a final hardware adjustment of phases, the proposed approach allows for a rapid evaluating of the B 1 + -shimming capability of different array architectures "in silico". The proposed technique has demonstrated its efficiency in the In both stages using of GPU brings acceleration of the computations starting from the number of voxels in the optimization region exceeding~1300. One can notice, that the speed of CPU computation remains practically constant by a 6-fold increase of the optimization volume whereas for CPU computation the time increases linearly up to the number of voxels~2200 and even faster by larger arrays size. The CPU type used was AMD Ryzen 9 3950x/16 cores. GPU type: GeForce RTX 2080 Titan, 68 multiprocessors, 1.545GHz, Matlab compute capability index = 7.5. (TIF)