Figure 1.
Image acquisition and tomographic reconstruction.
(a) Single-tilt axis data acquisition geometry. The specimen is imaged in the microscope by tilting it over a limited range of tilt angles in small tilt increments. The specimen can be considered as being composed of slices perpendicular to the tilt axis, as sketched. Hence, every projection image holds information about all the slices. (b) Three-dimensional reconstruction from projections with backprojection. The projection images are projected back into the volume to be reconstructed.
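As a rough illustration of panel (b), the sketch below backprojects the 1D projections of a single slice into a square grid. It is a minimal, unweighted version with nearest-neighbour lookup; the function name `backproject` and its parameters are hypothetical, and the weighted variant (WBP) would additionally filter each projection before this step.

```python
import math

def backproject(sinogram, angles_deg, size):
    """Unweighted backprojection of 1D projections into a size x size slice.

    sinogram[i] is the 1D projection recorded at tilt angle angles_deg[i].
    All names here are illustrative assumptions, not the paper's code.
    """
    n_bins = len(sinogram[0])
    centre = (size - 1) / 2.0          # centre of the slice grid
    bin_centre = (n_bins - 1) / 2.0    # centre of the 1D detector
    slice_ = [[0.0] * size for _ in range(size)]
    for projection, angle in zip(sinogram, angles_deg):
        c = math.cos(math.radians(angle))
        s = math.sin(math.radians(angle))
        for z in range(size):
            for x in range(size):
                # Detector coordinate onto which pixel (x, z) projects at this tilt.
                t = (x - centre) * c + (z - centre) * s + bin_centre
                k = int(round(t))      # nearest-neighbour lookup
                if 0 <= k < n_bins:
                    slice_[z][x] += projection[k]
    return slice_
```

Each pixel simply accumulates the projection value it maps to at every tilt angle, which is why every slice can be reconstructed independently of the others.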
Figure 2.
Three-dimensional reconstruction with SIRT.
The reconstruction is progressively refined by minimizing the average error between the experimental and the calculated projections. (a) Calculation of projections from the volume at the current iteration. (b) Computation of the error with respect to the experimental projections. (c) Refinement of the volume by backprojection of the error. The volume reconstructed in the previous iteration is taken into account to build the volume of the current iteration. In general, the volume is initialized to 0 before the first iteration.
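The three steps of the legend can be sketched on a toy problem. The example below uses one common row- and column-normalized formulation of SIRT on a small dense system; the matrix `A` (rays by voxels), the vector `b` (experimental projections) and the function name `sirt` are assumptions made for illustration and are not taken from the paper.

```python
def sirt(A, b, iterations):
    """Toy SIRT: A[r][v] is the weight of voxel v along ray r, b[r] the measured ray sum."""
    n_rays, n_voxels = len(A), len(A[0])
    row_sums = [sum(row) for row in A]
    col_sums = [sum(A[r][v] for r in range(n_rays)) for v in range(n_voxels)]
    volume = [0.0] * n_voxels  # the volume is initialized to 0
    for _ in range(iterations):
        # (a) Calculate projections from the current volume.
        calc = [sum(A[r][v] * volume[v] for v in range(n_voxels))
                for r in range(n_rays)]
        # (b) Error with respect to the experimental projections,
        #     normalized by the total weight of each ray.
        err = [(b[r] - calc[r]) / row_sums[r] for r in range(n_rays)]
        # (c) Backproject the error and add it to the previous volume.
        volume = [volume[v]
                  + sum(A[r][v] * err[r] for r in range(n_rays)) / col_sums[v]
                  for v in range(n_voxels)]
    return volume
```

In this sketch the previous volume enters the update additively in step (c), so each iteration only corrects what the earlier iterations left unexplained.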
Figure 3.
Three-dimensional reconstruction of Vaccinia virus.
Tomogram obtained with WBP (left) and with 30 iterations of SIRT (right). A 1.64 nm thick XY plane of the 3D reconstruction is shown. The tilt series contained images in the range ±60 degrees at an interval of 2 degrees.
Figure 4.
All projection images are stacked and the 1D projections (or, simply, projections) that belong to the same slice (those between the vertical dotted lines) are grouped into a sinogram. This process is repeated for every slice. Therefore, there will be as many sinograms as slices.
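The grouping described above amounts to a transposition of the two outer indices of the image stack. The sketch below assumes, for illustration only, that each image row corresponds to one slice and that the stack is held as nested lists; `build_sinograms` is a hypothetical name.

```python
def build_sinograms(tilt_series):
    """Group 1D projections by slice.

    tilt_series[t][y] is the 1D projection of slice y in the image taken at
    tilt index t. The result is indexed as sinograms[y][t].
    """
    n_slices = len(tilt_series[0])
    return [[image[y] for image in tilt_series] for y in range(n_slices)]
```

The number of sinograms returned equals the number of slices, and each sinogram holds one projection per tilt angle.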
Figure 5.
Source 1 and Source 2 are vector registers. Each one contains several data elements, which can be integer or real numbers. The same operation (op) is carried out between the two registers as indicated and the result is stored in another one (Destination).
Figure 6.
Arrangement of data for vector processing.
Four different slices are sketched (coloured in grey, blue, red and green), and also the four sinograms associated with them. Every sinogram is composed of two projections. (a) Native data layout. (b) Data arrangement to take advantage of vector units. (c) Vector processing between data elements.
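The rearrangement in panels (a) to (c) can be imitated with plain lists. The sketch below interleaves four sinograms so that the same element of each one sits in the same group of four, and then applies one operation lane by lane, in the way a four-wide vector register would. It only models the data layout; the names `interleave` and `vector_add` are hypothetical, and real code would rely on SSE registers rather than Python lists.

```python
VECTOR_WIDTH = 4  # number of single-precision lanes in an SSE register

def interleave(sinograms):
    """Pack four sinograms so that element j of each one is stored contiguously."""
    if len(sinograms) != VECTOR_WIDTH:
        raise ValueError("expected exactly four sinograms")
    # Flatten every sinogram (a list of projections) into one list of values.
    flat = [[value for projection in sino for value in projection]
            for sino in sinograms]
    # Group the j-th value of all four sinograms into one "vector".
    return [list(lanes) for lanes in zip(*flat)]

def vector_add(src1, src2):
    """Apply the same operation to every lane, as a vector instruction would."""
    return [a + b for a, b in zip(src1, src2)]
```

With this layout a single vector operation advances the reconstruction of four slices at once, which is where the theoretical 4x gain comes from.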
Figure 7.
Each panel represents four (working) threads as black parallel horizontal lines. ‘I’ stands for input (i.e. disk read), while ‘O’ means output (i.e. disk write). (a) The static scheme. Note that every thread fills its input buffer before starting to reconstruct. (b) The dynamic scheme. The manager is in charge of I/O operations. (c) The dynamic scheme with asynchronous I/O. Here the manager is replaced by the so-called I/O threads (the reader and the writer). The reader needs to start shortly before the working threads in order to fill the input buffer. It finishes when there are no more sinograms to read. The writer begins when some slices have been reconstructed and finishes when it writes the last pack of reconstructed slices to disk.
Figure 8.
(a) Static scheme. All threads (T0, T1, T2, T3) are allowed to perform disk I/O. Here I/O buffers are private and every thread has an identical pre-assigned number of slices to reconstruct. (b) Dynamic schemes. Disk I/O is no longer carried out by the working threads. There is a shared input buffer where threads look for work. Once reconstructed, the slices are put in the shared output buffer. In (a) and (b) sinograms and slices are allotted to threads in slabs of four. Slabs in grey have already been processed, while the coloured ones (green, red, blue, orange) are being processed. The white ones have not been used yet. (c) An I/O buffer. If the buffer holds sinograms, it is called ‘input buffer’. If it holds slices, it is called ‘output buffer’. The number of entries in an input buffer does not need to match the number of entries in an output buffer.
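The dynamic scheme of panel (b) can be sketched with a shared queue from which working threads take slabs on demand. This is a simplified model under stated assumptions: the input buffer is filled up front instead of by a manager reading from disk, `reconstruct_slab` is a hypothetical stand-in for the actual reconstruction, and all names are illustrative.

```python
import queue
import threading

def reconstruct_slab(slab):
    # Hypothetical stand-in for reconstructing slices from a slab of sinograms.
    return [2 * s for s in slab]

def dynamic_reconstruction(sinograms, n_threads=4, slab_size=4):
    """Working threads pull slabs from a shared input buffer until it is empty."""
    input_buffer = queue.Queue()   # shared input buffer (thread-safe)
    output_buffer = {}             # shared output buffer, keyed by slab start
    lock = threading.Lock()
    for start in range(0, len(sinograms), slab_size):
        input_buffer.put((start, sinograms[start:start + slab_size]))

    def worker():
        while True:
            try:
                start, slab = input_buffer.get_nowait()
            except queue.Empty:
                return             # no work left for this thread
            result = reconstruct_slab(slab)
            with lock:
                output_buffer[start] = result

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Reassemble the slices in their original order.
    return [s for start in sorted(output_buffer) for s in output_buffer[start]]
```

Because threads fetch a new slab as soon as they finish the previous one, faster threads naturally process more slabs, in contrast with the fixed pre-assignment of the static scheme.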
Table 1.
Reconstruction times.
Figure 9.
Both WBP and SIRT have been taken into account. (a) Speedup provided by individual optimizations. The basic optimizations yield a speedup slightly higher than 6x, SSE instructions come close to the theoretical 4x, and with two and four threads the speedup is linear with the number of cores. With eight threads it decreases a little. (b) Accumulated speedup. With basic optimizations and SSE instructions combined, the speedup is around 20x. When using two, four and eight threads, it rises above 40x, 80x and 160x, respectively.
Figure 10.
The larger the I/O buffers, the lower the ratio. The asynchronous I/O scheme has the lowest ratio, particularly when using two hard disks. In this case, it is very close to 1 with buffers of size 128 or 256, which means that almost all the I/O is being overlapped. Note that the ratio of the dynamic scheme is similar to that obtained by the static one, except for buffers of size 256. Though we have seen that the dynamic approach is better, here we are including smaller volumes (4 GB versus 8 GB) and the differences between these two strategies are mitigated.
Figure 11.
The asynchronous I/O is the best of the three schemes, but in SIRT there is not a significant difference between using one or two hard disks. It can also be observed that buffers as big as in WBP are not needed to obtain good ratios. The dynamic scheme behaves the worst since it has the highest ratios.
Table 2.
Load balancing in WBP.
Table 3.
Load balancing in SIRT.
Table 4.
Comparison CPU vs. GPU (backprojection).
Table 5.
Comparison CPU vs. GPU (SIRT, 1 iteration).
Table 6.
Comparison with IMOD.