Class Averaging in Cryo-EM Single
Particle Reconstruction
Zhizhen Zhao
A Dissertation
Presented to the Faculty
of Princeton University
in Candidacy for the Degree
of Doctor of Philosophy
Recommended for Acceptance
by the Department of
Physics
Advisers: Professor Amit Singer and Professor Michael Aizenman
September 2013
© Copyright by Zhizhen Zhao, 2013.
All rights reserved.
Abstract
Single-particle reconstruction of vitrified samples (cryogenic electron microscopy, or
cryo-EM) provides structural information for a large variety of biological molecules,
ranging from small proteins to large macromolecular assemblies in their native state,
without the need to produce crystals. However, for common-lines based ab initio
modeling, it is a great challenge to produce density maps because of the low signal-to-noise ratio (SNR) of the projection images. A crucial step is the alignment and averaging of 2D projection images taken from similar viewing directions, a procedure known as "class averaging". Images from the same projection angles should be identified, aligned, and averaged to achieve a higher signal-to-noise ratio.
In Chapter 2, we introduce a rotationally invariant adaptive basis to compress and denoise the projection images. Global alignment followed by clustering has been a common practice for generating class averages for the past 20 years. In Chapter 3, we show that due to the geometry and topology of the image data set, it is impossible to globally align all images, and that the commonly used reference-free alignment procedure unavoidably introduces errors. We then introduce a fast in-plane rotationally invariant viewing angle classification method for identifying, among a large number of projection images, similar views without prior knowledge of the molecule, to alleviate the computational burden of pairwise alignment and classification. To improve the initial classification results, we introduce the class averaging matrix in Chapter 4. The spectral properties of the class averaging matrix allow us to define new metrics between projection images that are more robust to noise. The class averaging operator is a special case of vector diffusion maps (VDM), which take into account the linear transformation between data points. In Chapter 5, we discuss the application of VDM to class averaging. Using the class averaging procedure described in this thesis and the common-lines reconstruction algorithm, we are able to reconstruct ab initio 3D electron density maps at high resolution for experimental data sets.
Acknowledgements
I am greatly indebted to my excellent adviser Amit Singer for being a wonderful
mentor. He has been extremely helpful in scientific discussions and in giving me
advice. The work in this thesis has benefited greatly from his input and knowledge.
I would also like to thank Professor Michael Aizenman for the support, guidance and
freedom he has been offering me.
I am deeply grateful to all my teachers, friends, and colleagues. Without their
generous help at all stages ever since I started research, this would have been a
hopeless endeavor. I do not nearly have enough space here to give credit where credit
is due, but I would especially like to thank Professor Fred Sigworth and Professor
Joachim Frank for teaching me experimental details of cryo-EM and structural biology
in general; Professor Ronny Hadani and Professor Shamgar Gurevich for showing
me the magic of representation theory; Professor Yoel Shkolnisky, Professor Andrei
Osipov, Dr. Joan Bruna, Lanhui Wang, and Colin Ponce for sharing their codes. I
would also like to thank Professor Charles Fefferman, Professor Peter Jones, and Dr.
Hau-tieng Wu for all their help and time spent discussing with me. Other people who
have brightened my days at Princeton are my lovely friends, roommates, and others.
I would like to acknowledge my examining committee, Professor Joshua Shaevitz,
Professor David Huse and my advisor Amit for reviewing my work, and my second
reader, Professor Michael Aizenman for reading this dissertation.
I have to acknowledge Professor Thomas Duke, my undergraduate Director of
Studies, who unfortunately passed away last summer. He was an excellent teacher
and mentor and it is hard to exaggerate how much our discussions about (bio)physics,
inquiry and understanding meant to me.
Finally, I owe a great deal of gratitude to my parents, Pingyu Zhao and Liangshen
Jiang, for their love and support throughout my life.
To my parents.
Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
  1.1 Macromolecular machines and cryo-EM single particle reconstruction
  1.2 Algorithmic procedure for single particle reconstruction
  1.3 Review of current class averaging methods
    1.3.1 MSA/MRA classification
    1.3.2 Iterative reference-free alignment
    1.3.3 Use of invariants
  1.4 Thesis outline

2 Fourier-Bessel Rotational Invariant Eigenimages
  2.1 Introduction
  2.2 Sampling criterion
  2.3 Fourier-Bessel expansion of images sampled on a Cartesian grid
  2.4 The sample covariance matrix
  2.5 Algorithm and computational complexity
  2.6 Results
  2.7 Summary and discussion

3 Rotationally Invariant Image Representation
  3.1 Introduction
  3.2 Global alignment
  3.3 Bispectrum-like rotationally invariant features
  3.4 Nearest neighbor search and rotational alignment
  3.5 Shift alignment
  3.6 Experimental results
    3.6.1 Simulated clean data
    3.6.2 Simulated noisy data
  3.7 Summary and discussion

4 Class Averaging Matrix
  4.1 Small-world graph on S^2
  4.2 Hairy ball theorem
  4.3 Spectral properties of the class averaging matrix
  4.4 Algorithm
  4.5 Probabilistic model and random matrix theory
  4.6 Numerical experiments
    4.6.1 Experiments with the probabilistic model
    4.6.2 Experiments with noisy simulated images
    4.6.3 Numerical comparison with diffusion maps
    4.6.4 Using more than three eigenvectors
  4.7 Summary and discussion

5 Vector Diffusion Maps and Class Averages
  5.1 Vector diffusion maps
  5.2 Graph construction
  5.3 Alignment using eigenvectors
  5.4 Experimental results
    5.4.1 Simulated data
    5.4.2 Experimental data
  5.5 Summary and discussion

6 Conclusions and Future Work

A PCA and Wiener filtering
  A.1 PCA: The Basics
  A.2 Linear Wiener Filtering
  A.3 PCA in High Dimensions
  A.4 Linear filtering in high dimension

B Parallel transport on the sphere and related operators
  B.1 Common lines and orthographic lines
  B.2 Common lines, orthographic lines, and parallel transport kernels
  B.3 Global and local integral operators
  B.4 Spectral analysis of the local operators
  B.5 Complex structure of the parallel transport kernel
  B.6 Spectral analysis of the complex local parallel transport operator

Bibliography
List of Tables

2.1 FBsPCA: Comparison of denoising effects.
3.1 Timing: RANN vs brute force nearest neighbor search I.
3.2 Simulated defocus values.
3.3 Timing: RANN vs brute force nearest neighbor search II.
List of Figures

1.1 Cryo-EM imaging process.
1.2 Experimental cryo-EM images.
1.3 Illustration: class averaging.
1.4 Our class averaging procedure.
2.1 Eigenvalues of Fourier-Bessel transform matrix.
2.2 Block diagonal structure of the covariance matrix.
2.3 Simulated clean and noisy projection images.
2.4 Eigenimages.
2.5 FBsPCA: Marčenko-Pastur distribution.
2.6 Eigenvalues of Fourier-Bessel steerable PCA on simulated images.
2.7 Comparison of image denoising effects.
3.1 Illustration: geometric structure of a projection image.
3.2 Alignment errors with iterative reference-free alignment.
3.3 Illustration: geometric structure of rotationally invariant features.
3.4 Simulated 70S ribosome projection images.
3.5 Results for 10^4 simulated clean and centered projection images.
3.6 Results for 10^5 simulated clean and centered projection images.
3.7 Classification for 10^4 simulated noisy projection images, SNR = 1/50.
3.8 Alignment errors for 10^4 noisy projection images, SNR = 1/50.
3.9 SPIDER classification cluster size, SNR = 1/50.
3.10 Classification for 10^5 noisy images: accuracy of RANN.
4.1 Ring graph and small-world graph.
4.2 Eigenvalues of the class averaging matrix.
4.3 Results with the probabilistic model, n = 1000, β = 45.6°.
4.4 Results with the probabilistic model, n = 40,000, β = 18.2°.
4.5 Simulated images at different SNRs.
4.6 d_ij with 40 nearest neighbors.
4.7 Bar plots of {λ_i}_{i=1}^{10} of H̃.
4.8 Scatter plots of G_ij.
4.9 Bar plots of {λ_i}_{i=1}^{10} of Ã.
4.10 Scatter plots of G'_ij.
4.11 SNR = 1/40: H̃ vs Ã.
4.12 SNR = 1/64: comparing results with H̃, Ã, and d.
5.1 Illustration of the Vector Diffusion Map (VDM) affinity.
5.2 VDM angular alignment.
5.3 Contrast transfer functions.
5.4 Results for 10^4 simulated centered noisy images.
5.5 Simulated images with colored noise.
5.6 Noise background estimation.
5.7 Results for 10^4 prewhitened simulated images with colored noise.
5.8 Results for 10^4 noisy non-centered images.
5.9 Reconstruction from class averages with non-centered images.
5.10 70S ribosome: raw images and class averages.
5.11 70S ribosome: reconstructed volumes.
5.12 70S ribosome: Fourier shell correlation.
5.13 50S ribosomal subunit: raw images and class averages.
5.14 50S ribosomal subunit: reconstructed volumes.
5.15 50S ribosomal subunit: Fourier shell correlation.
Chapter 1
Introduction
1.1 Macromolecular machines and cryo-EM single particle reconstruction
Many essential processes that occur in the cell, including transcription, translation, protein folding, and protein degradation, are carried out by molecular machines. Bruce Alberts popularized the concept of molecular "machines" through a programmatic essay in the journal Cell in 1998 [2]. "Machine" is useful as a concept, and it stands in sharp contrast to the long-held view of the cell as a sack, or compendium of sacks, in which molecules engage and disengage one another more or less randomly. In invoking mechanistic imagery, it is also, finally, a view that invites physicists to employ their tools and creative minds in developing models that take into account the reality of the nanoworld in a quantitative way. Thus Bruce Alberts's is, as invitations go, the third in the past century, with Erwin Schrödinger and Richard Feynman having been the first and second to usher their colleagues into tackling the grand challenge posed by biology.

To approach the subject, we must foremost know the structure of the static molecular machine at the atomic level, as a precondition for making sense of its behavior
and going beyond mere phenomenological description. Therefore, determining the 3D structures of large biological molecules remains vitally important, as witnessed, for example, by the 2003 Nobel Prize in Chemistry, co-awarded to MacKinnon for resolving the 3D structure of the Shaker K+ channel proteins [20, 55], and by the 2009 Nobel Prize in Chemistry, awarded to Ramakrishnan, Steitz, and Yonath for studies of the structure and function of the ribosome. X-ray crystallography, by now a well-matured technique, has provided us with an ever-increasing inventory of structures: either complete structures with and without their functional ligands or, if unavailable, structures of isolated components. The solution of the bacterial ribosome structure has given us unparalleled insights into the architecture of one of the most complex molecular machines in nature. However, the challenge in X-ray crystallography often lies more in the crystallization itself than in the interpretation of the X-ray results, since many large proteins have so far withstood all attempts to crystallize them. Cryogenic electron microscopy (cryo-EM) single particle reconstruction seems the most promising alternative for molecules that defy crystallization [24].
In cryo-EM, functionally active macromolecular complexes are prepared in vitro. They are stalled by chemical means (antibiotics, non-hydrolyzable GTP analogs, etc.), placed on a grid, and rapidly frozen by immersion in liquid ethane at liquid-nitrogen temperature (77 K). Images are recorded either on film (to be subsequently scanned) or electronically by means of a CCD camera. Affordable CCD cameras currently have a size of at most 4,000 × 4,000 pixels. The voltages used are normally in the range of 200 to 300 kV; voltages lower than 200 kV are avoided because of the increase in radiation damage, whereas voltages higher than 300 kV lead to diminishing contrast and inefficient electron recording.
Figure 1.1: Schematic drawing of the imaging process: every projection image corresponds to some unknown 3D rotation of the unknown molecule.

The cryo-EM imaging process produces a large collection of tomographic projections of the same molecule, corresponding to different and unknown projection orientations. The goal is to reconstruct the 3D structure of the molecule from such unlabeled projection images, where data sets typically range from 10^4 to 10^5 projection images whose size is roughly 100 × 100 pixels. The intensity of the pixels in a given projection image is proportional to the line integrals of the electric potential induced by the molecule along the path of the imaging electrons (see Figure 1.1).
The highly intense electron beam destroys the frozen molecule, and it is therefore impractical to image the same molecule at known different directions, as in the case of classical CT. In other words, a single molecule can be imaged only once, rendering an extremely low signal-to-noise ratio (SNR) for the images (see Figure 1.2 for a sample of real microscope images), mostly due to shot noise induced by the maximal allowed electron dose (other sources of noise include the varying thickness of the ice layer and partial knowledge of the contrast transfer function of the microscope). In the basic homogeneity setting considered hereafter, all imaged molecules are assumed to have the exact same structure; they differ only by their spatial orientation. Every image is a projection of the same molecule from an unknown random direction, and the cryo-EM problem is to find the 3D structure of the molecule from a collection of noisy projection images.
Figure 1.2: A collection of four real electron microscope images of E. coli 70S ribosome.
1.2 Algorithmic procedure for single particle reconstruction
The algorithmic procedure for single particle reconstruction can be summarized as follows:

1. Particle selection: single particle images of size about 100 × 100 pixels are boxed out from micrographs, either manually or automatically.

2. Class averaging: images of similar views are aligned and averaged to achieve a higher signal-to-noise ratio.

3. Orientation determination: an ab initio estimate of the pose parameters is obtained using the random conical tilt technique [80] or common-lines based approaches, such as angular reconstitution with common lines [106].

4. 3D tomographic inversion: a 3D volume is generated by a 3D tomographic inversion algorithm.

5. Iterative refinement: each image is matched with the images re-projected from the volume reconstructed in the last iteration, to determine new pose parameters for the reconstruction in step 4, until convergence.
Numerous technical and computational advances have contributed to the different processing steps in SPR. Instead of manually picking tens or hundreds of thousands of single macromolecules from the micrographs, the particles can be automatically selected, for example, using the method described in [82]. Recent advances in particle selection algorithms, based on template matching, computer vision, neural networks, and other approaches, are reviewed in [124].
Various methods are used to generate class averages [109, 105, 89, 70, 72], of which the two most commonly used are multivariate statistical analysis (MSA)¹ [109, 105] with multi-reference alignment (MRA) [21], and iterative reference-free alignment [70] with K-means clustering [72].
For orientation estimation, although the orientation information is easily determined in random conical tilt, its 3D reconstructions suffer from missing information resulting from a restricted range of orientations. Common-lines based reconstruction [106, 96, 97] includes information from projection images at all orientations. However, since single particle images have very low signal-to-noise ratio, it is extremely difficult to detect common lines. Therefore, images from nearby views need to be identified, aligned, and averaged to improve the image quality.
The 3D tomographic inversion is computed either by a weighted adjoint operator [37, 79, 77, 71] or by an iterative inversion approach [30, 28, 59]. The initial model is then iteratively refined [69] in order to obtain a higher-resolution 3D reconstruction. In each iteration of the refinement process, the images are aligned with references that are re-projected from the current 3D model to find new estimates of the pose parameters. The new pose parameters are then used to produce a refined 3D model using the 3D tomographic inversion. This process is repeated until convergence.

¹ This is the direct application of PCA or correspondence analysis. The term "multivariate statistical analysis", often used in cryo-EM, is too broad in its current usage in statistics.
1.3 Review of current class averaging methods
To each projection image I there corresponds a 3 × 3 unknown rotation matrix R describing its orientation (see Figure 1.1). Excluding the contribution of noise, the intensity I(x, y) of the pixel located at (x, y) in the image plane corresponds to the line integral of the electric potential induced by the molecule along the path of the imaging electrons; that is,

    I(x, y) = \int_{-\infty}^{\infty} \phi(x R^1 + y R^2 + z R^3) \, dz,    (1.1)

where \phi : \mathbb{R}^3 \to \mathbb{R} is the electric potential of the molecule in some fixed "laboratory" coordinate system. The projection operator (1.1) is also known as the X-ray transform [64].
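To make the projection operator concrete, the following Python sketch computes (1.1) numerically from a sampled volume. The grid conventions and the use of SciPy's affine_transform are illustrative assumptions of this sketch, not part of the thesis code:

import numpy as np
from scipy.ndimage import affine_transform

# Minimal sketch of the X-ray transform (1.1). `phi` is an (N, N, N) array
# sampling the potential and R is a 3 x 3 rotation matrix; the axis-ordering
# convention is an assumption of this sketch.
def project(phi, R):
    center = (np.array(phi.shape) - 1) / 2.0
    # affine_transform sends output coordinate o to input coordinate
    # R @ (o - center) + center, so each output voxel samples phi at R * x.
    rotated = affine_transform(phi, R, offset=center - R @ center, order=1)
    # Summing along the last axis approximates the line integral over z.
    return rotated.sum(axis=2)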
We therefore identify the third column R^3 of R as the imaging direction, also known as the viewing angle of the molecule. We will often refer to the viewing angle of R as v; that is, v : SO(3) \to \mathbb{R}^3 is given by v = v(R) = R^3. The viewing angle v can be realized as a point on S^2 (the unit sphere in \mathbb{R}^3) and can therefore be described using two parameters.

The first two columns R^1 and R^2 form an orthonormal basis for the plane in \mathbb{R}^3 perpendicular to the viewing angle v. All clean projection images of the molecule that share the same viewing angle v look the same up to some in-plane rotation. That is, if R_i and R_j are two rotations with the same viewing angle v(R_i) = v(R_j), then R_i^1, R_i^2 and R_j^1, R_j^2 are two orthonormal bases for the same plane, and the rotation matrix R_i^{-1} R_j has the form
    R_i^{-1} R_j = \begin{pmatrix} \cos\alpha_{ij} & -\sin\alpha_{ij} & 0 \\ \sin\alpha_{ij} & \cos\alpha_{ij} & 0 \\ 0 & 0 & 1 \end{pmatrix}.    (1.2)

Figure 1.3: The class averaging problem is to find, align, and average images with similar viewing angles: (a) a clean simulated projection image of the ribosomal subunit generated from its known density map; (b) a noisy instance of (a), denoted I_i, obtained by the addition of white Gaussian noise (for the simulated images we chose the SNR to be higher than that of experimental images in order for image features to be clearly visible); (c) a noisy projection, denoted I_j, of the ribosome taken at the same viewing angle but with a different in-plane rotation; the in-plane rotation angle is α_ij = π/2, because image I_j needs to be rotated by π/2 in the clockwise direction in order to be aligned with I_i (see also the text following (1.3)); (d) the average of the noisy images (b) and (c) after in-plane rotational alignment. The class average of the two images has a higher SNR than that of the noisy images (b) and (c), and it has better similarity to the clean image (a).
Class averaging is the grouping of a large data set of n noisy raw projection images I_1, \dots, I_n into clusters, such that images within a single cluster have similar viewing angles (it is possible to artificially double the number of projection images by including all mirrored images). Averaging aligned noisy images within each cluster results in "class averages"; these images enjoy a higher signal-to-noise ratio and are used in later cryo-EM procedures, such as the angular reconstitution procedure [106], that require better quality images. Finding consistent class averages is challenging due to the high level of noise in the raw images as well as the large size of the image data set. A sketch of the class averaging procedure is shown in Figure 1.3.
Penczek, Zhu, and Frank [72] introduced the rotationally invariant K-means clustering procedure to identify images that have similar viewing angles. Their invariant distance d_{ij} between image I_i and image I_j is defined as the Euclidean distance between the images when they are optimally aligned with respect to in-plane rotations (assuming the images are centered):

    d_{ij} = \min_{\alpha \in [0, 2\pi)} \| I_i - R(\alpha) I_j \|,    (1.3)

where R(\alpha) is the operator that rotates an image by an angle \alpha in the counterclockwise direction. Prior to computing the invariant distance (1.3), a common practice is to center all images by correlating them with their total average \frac{1}{n} \sum_{i=1}^{n} I_i, which is approximately radial (i.e., has little angular variation) due to the randomness in the rotations. The resulting centers usually miss the true centers by only a few pixels (as can be validated in simulations during the refinement procedure). Therefore, as in [72], we also choose to focus mainly on the more challenging problem of rotational alignment by assuming that the images are properly centered. Other methods for class averaging are reviewed in [110].
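For illustration, the invariant distance (1.3) can be approximated by discretizing α; this minimal Python sketch (the angular step n_theta is an assumed parameter) returns both the distance and the optimal rotation angle:

import numpy as np
from scipy.ndimage import rotate

# Minimal sketch of the invariant distance (1.3) for centered images,
# discretizing alpha into n_theta steps.
def invariant_distance(Ii, Ij, n_theta=72):
    best_d, best_alpha = np.inf, 0.0
    for t in range(n_theta):
        alpha = 360.0 * t / n_theta          # degrees, counterclockwise
        d = np.linalg.norm(Ii - rotate(Ij, alpha, reshape=False, order=1))
        if d < best_d:
            best_d, best_alpha = d, np.deg2rad(alpha)
    return best_d, best_alpha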
Historically, 2D images of molecules presenting the same view were aligned by cross-correlation and averaged [23, 88]. Averaging presents a fast way of gauging the quality of the data and improves the signal-to-noise ratio of the raw images. It is usually very hard to come up with good references for the class averaging procedure, and aligning a set of images with respect to a reference image using a correlation procedure biases the data set towards the properties of the reference image used [7, 108, 107, 31]. Thus, "reference-free" alignment procedures are preferred for obtaining the various typical views present in the data. The most commonly used method, implemented in IMAGIC [111], is the reference-free alignment by classification proposed by Dube et al. [21]. The other most-cited method, implemented in SPIDER [25, 93], was proposed by Penczek et al. [70] and was termed iterative reference-free alignment.
In the following three subsections, I review the currently popular methods for generating class averages. The input is a stack of crudely centered images. The number of classes is related to the desired average number of images per class; usually it is a parameter that can be tuned by observing the quality of the classes. Once classification has finished, the images in each class are averaged to generate class means.
1.3.1 MSA/MRA classification
MSA/MRA classification first uses multivariate statistical analysis (MSA) to compress and denoise the image data set, typically with 24-69 eigenimages. The full mathematics of the MSA procedure is described in [8]. The method is implemented in IMAGIC [111]. After the MSA data compression, an automatic hierarchical ascendant classification (HAC), in combination with a moving-elements post-processor, is performed. The HAC algorithm aims at minimizing the internal variance of the classes (the "intra-class variance") while at the same time maximizing the inter-class variance between the means of the class averages.

By observing the gallery of class means, one hopefully is able to select a number of different views. The number of references is kept small at this stage of the analysis. Those references are used to perform multi-reference alignment (MRA) [113]. The (tens of thousands of) individual projection images are aligned and correlated with the references using correlation coefficients (or some other similarity measure), and the best-aligned reference for each image is selected. After alignment in each class has been completed, a new round of MSA compression and classification on the aligned data set is performed to generate new class averages. The MRA alignment, MSA compression, and hierarchical classification procedure are iterated until the classification results are stable.
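As a rough illustration of one MSA-plus-HAC round (a scikit-learn based stand-in, not the IMAGIC implementation), consider the following Python sketch; the eigenimage count and the number of classes are assumed parameters:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

# Minimal sketch of one MSA + HAC round. `images` is an (n, n_pixels) array
# of vectorized, crudely centered images; parameter values are illustrative.
def msa_hac_round(images, n_eigen=50, n_classes=100):
    compressed = PCA(n_components=n_eigen).fit_transform(images)   # MSA step
    labels = AgglomerativeClustering(n_clusters=n_classes,
                                     linkage='ward').fit_predict(compressed)
    class_means = np.stack([images[labels == c].mean(axis=0)
                            for c in range(n_classes)])
    return labels, class_means

Ward linkage is used here because, like the HAC described above, it greedily minimizes the intra-class variance at each merge.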
1.3.2 Iterative reference-free alignment
Penczek et al. introduced iterative reference-free alignment, which avoids choosing a "reference". Generalizing the alignment of two images to a set of n images, Frank et al. [26] proposed the following definition: a set of n images is aligned if all images are pairwise aligned. Alignment of such a set I = \{I_i; i = 1, \dots, n\} is achieved if the functional

    L(I, S) = \int \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left[ I_i(\mathbf{r}; s_\alpha^i, s_x^i, s_y^i) - I_j(\mathbf{r}; s_\alpha^j, s_x^j, s_y^j) \right]^2 d\mathbf{r}    (1.4)

is minimized by an appropriate choice of the set of 3n parameters S = [s_\alpha^i, s_x^i, s_y^i; i = 1, \dots, n].
Penczek and coworkers showed that minimization of equation (1.4) is equivalent to minimization of another functional:

    \tilde{L}(I, S) = \int \sum_{i=1}^{n} \left[ I_i(\mathbf{r}; s_\alpha^i, s_x^i, s_y^i) - \bar{I}_i(\mathbf{r}) \right]^2 d\mathbf{r},    (1.5)

where

    \bar{I}_i(\mathbf{r}) = \frac{1}{n-1} \sum_{j=1; j \neq i}^{n} I_j(\mathbf{r}; s_\alpha^j, s_x^j, s_y^j).    (1.6)
In this reformulated expression, each image i is aligned to the average of all images except itself. An iterative algorithm based on this reformulated expression proceeds in two parts. The first part, a random approximation of the global average, proceeds as follows:

1. Pick two images I_i and I_j at random from the set I of images and align them.

2. Initialize the global average² a_2 = I_i \oplus I_j and set a counter m to 3.

3. Pick the next image I_k from the set of n - m + 1 remaining images. Align I_k and a_{m-1}.

4. Update the global average to give a_m = [I_k \oplus (m-1) a_{m-1}]/m.

5. Increase the counter m by 1. If m = n, then stop; else go to step 3.

The second part of the algorithm performs an iterative refinement of the average A = a_n:

1. Set the counter m = 1.

2. Create the modified average A' by subtracting the current image I_j in its current position, A' = (nA - I_j)/(n-1).

3. Align I_j with A'.

4. Update the global average A = [I_j \oplus (n-1) A']/n.

5. Increase m by 1. If m < n, then go to step 2.

6. If in step 3 any of the images changed its position significantly, then go back to step 1; else stop.
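A minimal Python sketch of the two-part procedure above, restricted to in-plane rotations for brevity (the original also estimates the translations s_x, s_y), might read:

import numpy as np
from scipy.ndimage import rotate

def align(img, ref, n_theta=72):
    # Return the rotation of `img` (over a discrete set of angles) that
    # best matches `ref` in the least squares sense.
    angles = 360.0 * np.arange(n_theta) / n_theta
    rots = [rotate(img, a, reshape=False, order=1) for a in angles]
    errs = [np.linalg.norm(ref - r) for r in rots]
    return rots[int(np.argmin(errs))]

def reference_free_average(images, n_rounds=3):
    images = [img.astype(float).copy() for img in images]
    n = len(images)
    order = np.random.default_rng(0).permutation(n)
    # Part 1: random approximation of the global average.
    avg = images[order[0]]
    for m, idx in enumerate(order[1:], start=2):
        avg = (align(images[idx], avg) + (m - 1) * avg) / m
    # Part 2: iterative refinement against leave-one-out averages.
    for _ in range(n_rounds):
        for j in range(n):
            loo = (n * avg - images[j]) / (n - 1)   # average without image j
            images[j] = align(images[j], loo)
            avg = (images[j] + (n - 1) * loo) / n
    return avg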
Therefore, the algorithm iteratively estimates the global average and the alignment parameters s_\alpha^i, s_x^i, s_y^i for i = 1, \dots, n. The aligned images are then used with K-means clustering to classify the data into K clusters [72]. In Chapter 3, we note that global alignment methods, like iterative reference-free alignment, introduce errors because of the geometric and topological structure of the data sets.
² The general term is denoted by a_k, where k is the number of images that are averaged over, and we use \oplus to denote adding the aligned images.
1.3.3 Use of invariants
Several invariant features have been proposed for viewing angle classification, for example, the autocorrelation function (ACF) and the double autocorrelation function (DACF) [89]. Following Schatz and van Heel's procedure, the ACF of an image on a Cartesian grid is first re-sampled on a polar grid. Then the rotational ACF is obtained by computing 1D fast Fourier transforms and conjugate scalar Fourier products along concentric rings. Later, Schatz and van Heel [90, 114] introduced other functions, termed the self-correlation function (SCF) and the double self-correlation function (DSCF), that avoid the problem caused by the twofold squaring of intensities in the computation of the ACF and DACF. Through the use of either the DACF or the DSCF, classification of the aligned original images can be replaced by a classification of their invariants, which are formed from the images without prior alignment; alignment then needs to be applied only after classification of the invariants, separately to the images within each homogeneous subset. However, those invariants have been criticized for losing phase information and for not being discriminative enough for accurate classification.
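To make the flavor of such invariants concrete, here is a small Python sketch of a rotation-invariant feature in the ACF spirit (a simplified stand-in, not Schatz and van Heel's exact procedure): Fourier magnitudes along concentric rings. Discarding the Fourier phases along each ring is what makes the feature invariant to in-plane rotation, and it is also exactly the loss of phase information criticized above:

import numpy as np

# Minimal sketch: resample an image on concentric rings and keep only the
# Fourier magnitudes along each ring; these are invariant to in-plane
# rotation. Ring and angle counts are illustrative parameters.
def ring_fourier_magnitudes(img, n_rings=32, n_theta=128):
    c = (img.shape[0] - 1) / 2.0
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    feats = []
    for r in np.linspace(1.0, c, n_rings):
        x = np.clip(np.round(c + r * np.cos(thetas)).astype(int), 0, img.shape[1] - 1)
        y = np.clip(np.round(c + r * np.sin(thetas)).astype(int), 0, img.shape[0] - 1)
        ring = img[y, x]                        # nearest-neighbor resampling
        feats.append(np.abs(np.fft.fft(ring)))  # rotation cyclically shifts the
                                                # ring; magnitudes are unchanged
    return np.concatenate(feats)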
1.4 Thesis outline
In this thesis, we introduce new algorithms for identifying noisy cryo-EM images of nearby viewing angles. The main advantages of our class averaging method are its extreme robustness to noise and its efficiency in terms of running time and memory requirements. It is worth noting that the specific choice of metric to measure proximity between images can make a big difference in class averaging. The cross-correlation and Euclidean distance are by no means optimal measures of proximity. In practice, it is common to denoise the images prior to computing their pairwise distances. A popular smoothing scheme is to convolve the images with a Gaussian kernel; other linear and nonlinear filters are also used. Filtering can have a dramatic effect on finding meaningful class averages. In Chapter 2, we present an efficient and accurate algorithm that builds an adaptive basis through principal component analysis for a large set of two-dimensional images together with, for each image, the set of its uniform in-plane rotations and its reflection. The algorithm starts by expanding each image, originally given on a Cartesian grid, in the Fourier-Bessel basis for the disk. Because the images are essentially bandlimited in the Fourier domain, it is important to use a sampling criterion to truncate the Fourier-Bessel expansion such that the maximum amount of information is preserved without the effect of aliasing. The constructed covariance matrix is invariant to rotation and reflection and has a special block diagonal structure, so PCA is efficiently done for each block separately. This Fourier-Bessel based PCA detects more meaningful eigenimages and has improved denoising capability compared to traditional PCA for a finite number of noisy images.
The invariant distance (1.3) is invariant to in-plane rotations, and as such it induces a metric on the viewing angle space S^2. In Chapter 3, we describe methods to reduce the computational burden of computing the rotationally invariant distance and alignment. The invariant distance between noisy images that share the same viewing angle (with perhaps a different in-plane rotation) is expected to be small. Ideally, all neighboring images of some reference image I_i in a small invariant-distance ball centered at I_i should have similar viewing angles, and averaging such neighboring images (after proper rotational alignment) would amplify the signal and diminish the noise.
Unfortunately, due to the low SNR, it often happens that two images of completely different viewing angles have a small invariant distance. This can happen when the realizations of the noise in the two images match well for some random in-plane rotation angle, leading to spurious neighbor identification. Therefore, averaging the nearest neighbor images can sometimes yield a poor estimate of the true signal in the reference image. Clustering algorithms, such as the K-means algorithm, perform much better than this naïve nearest-neighbor averaging, because they take into account all pairwise distances, not just distances to the reference image. Such clustering procedures are based on the philosophy that images that share a similar viewing angle with the reference image are expected to have a small invariant distance not only to the reference image but also to all other images with similar viewing angles. This observation was utilized in the rotationally invariant K-means clustering algorithm [72]. Such clustering algorithms make it harder for spurious neighbors to sneak their way into the neighborhood. Still, the rotationally invariant K-means clustering algorithm may suffer from misidentifications at the low SNR values present in experimental data.
Is it possible to further improve the detection of neighboring images at even lower SNR values? In Chapter 4, we provide a positive answer to this question. First, we note that the rotationally invariant distance neglects an important piece of information, namely, the optimal angle that realizes the best rotational alignment. Second, we observe that these optimal rotation angles must satisfy a global system of consistency relations: if three projection images correspond to similar viewing angles, then the three optimal rotation angles should add up to 0 modulo 2π. Based on these observations, we use the optimal in-plane rotation angles to construct a sparse n × n Hermitian matrix, which we call "the class averaging matrix," and show how to identify nearby viewing directions from the top three eigenvectors of the matrix. In [95], an algorithm was presented for estimating angles from noisy measurements of their offsets modulo 2π using the top eigenvector of a Hermitian matrix constructed in exactly the same way the class averaging matrix is constructed. The reason why three eigenvectors (instead of one) are needed here is the special topology of the sphere S^2, as rendered in the "hairy ball" theorem, which says that a continuous tangent vector field on the sphere must vanish at some point. We also conducted numerical experiments that demonstrate how the identification of nearby viewing directions can be improved by using more than three eigenvectors. Finally, we show that the class averaging matrix is a discretization of a local version of the parallel transport operator on the sphere. This interpretation leads to a complete understanding of the spectrum of the class averaging matrix and to a rigorous proof of the admissibility (correctness) of the algorithm presented in Chapter 4. The complete spectral analysis is presented in [33].
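To fix ideas, the following Python sketch builds such a sparse Hermitian matrix from a precomputed neighbor list and extracts its top three eigenvectors; the neighbor-list format and the sparse eigensolver are assumptions for illustration, not the thesis implementation:

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh

# Minimal sketch: build the n x n class averaging matrix from a neighbor
# list, where neighbors[i] holds pairs (j, alpha_ij) of suspected neighbors
# of image i and their optimal in-plane rotation angles.
def class_averaging_matrix(neighbors, n):
    rows, cols, vals = [], [], []
    for i, nbrs in enumerate(neighbors):
        for j, alpha_ij in nbrs:
            rows.append(i)
            cols.append(j)
            vals.append(np.exp(1j * alpha_ij))   # H_ij = e^{i alpha_ij}
    H = csr_matrix((vals, (rows, cols)), shape=(n, n), dtype=complex)
    H = (H + H.conj().T) / 2           # symmetrize, since alpha_ji = -alpha_ij
    lam, V = eigsh(H, k=3, which='LA') # top three eigenvectors (hairy ball)
    return lam, V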
In Chapter 5, we extend the class averaging matrix and use vector diffusion maps [98] to analyze images without the assumption that the viewing angles are uniformly distributed over the sphere. This new framework is based on the heat kernel of the connection Laplacian over a given manifold, which provides a new notion of "affinity between data points" and helps us to better understand how to spatially reorganize noisy data. We obtained common-lines based ab initio reconstructions at high resolution from the class averages of experimental data sets. Figure 1.4 shows the pipeline for the class averaging procedure developed in this thesis.
Figure 1.4: Schematic diagram of our class averaging procedure for single particle
reconstruction.
Chapter 2
Fourier-Bessel Rotational Invariant Eigenimages
2.1 Introduction
Principal component analysis (PCA) is a classical method for dimensionality reduction, compression, and denoising. The principal components are the eigenvectors of the sample covariance matrix. In image analysis, the principal components are often referred to as "eigenimages", and they form a basis adaptive to the image set. In some applications, including all planar rotations of the input images in the PCA is advantageous. For example, in single particle reconstruction (SPR) using cryo-electron microscopy [24], the 3D structure of a molecule needs to be determined from many noisy 2D projection images taken at unknown viewing directions. PCA, known in this field as multivariate statistical analysis (MSA), is often a first step in SPR [109]. Inclusion of the rotated images in the PCA is desirable, because such images are just as likely to be obtained in the experiment, by in-plane rotation of either the specimen or the detector. When all rotated images are included in the PCA, the eigenimages have a special separation-of-variables form in polar coordinates, in terms of radial functions and angular Fourier modes [38, 73, 103]. It is easy to steer the eigenimages by a simple phase shift, hence the name "steerable PCA". Computing the steerable PCA efficiently and accurately is, however, challenging.
The first challenge is to mitigate the increased computational cost associated with replicating each image multiple times. Efficient algorithms for steerable PCA were introduced in [41, 76], with computational complexity close to that of traditional PCA on the original images without their rotations. Current efficient algorithms for steerable PCA first map the images from a Cartesian grid to a polar grid, using, e.g., interpolation. Because the transformation from Cartesian to polar is not unitary, the eigenimages corresponding to images mapped to the polar grid are not equivalent to the original eigenimages transformed from Cartesian to polar.

The second challenge is associated with noise. The non-unitary transformation from Cartesian to polar changes the noise statistics. Interpolation transforms additive white noise into correlated (i.e., colored) noise. As a consequence, spurious eigenimages and eigenvalues corresponding to colored noise may arise in the eigen-analysis. The naïve algorithm for steerable PCA that replicates each image multiple times also mistreats the noise: while the realizations of noise in the original images are independent, the realizations of noise among duplicated images are dependent. This in turn can lead to noise-induced spurious eigenimages and to artifacts in the leading eigenimages.
We present a new efficient and accurate algorithm for computing the steerable PCA that meets these challenges. This is achieved by combining into the steerable PCA framework a sampling criterion similar to the criterion proposed by Klug and Crowther [48] in the different context of reconstructing a 2D image from its 1D projections. We represent the images in a truncated Fourier-Bessel basis, in which the number of radial components decreases with the angular frequency. Our sampling criterion implies that the covariance matrix of the images has a block diagonal structure where the block size decreases as a function of the angular frequency. The block diagonal structure of the covariance matrix was observed and utilized in previous works on steerable PCA. However, while in all existing methods for steerable PCA the block size is constant, here the block size shrinks with the angular frequency. The incorporation of the sampling criterion into the steerable PCA framework is the main contribution of this chapter.
2.2 Sampling criterion
We assume that the set of images corresponds to spatially limited objects. By appropriate scaling of the pixel size, we can assume that the images vanish outside a disk of radius 1. The eigenfunctions of the Laplacian in the unit disk with vanishing Dirichlet boundary condition are the Fourier-Bessel functions. Hence, they form an orthogonal basis for the space of square-integrable functions over the unit disk, and it is natural to expand the images in that basis. The Fourier-Bessel functions are given by

    \psi^{kq}(r, \theta) = \begin{cases} N_{kq} J_k(R_{kq} r) e^{\imath k \theta}, & r \le 1, \\ 0, & r > 1, \end{cases}    (2.1)

where N_{kq} is a normalization factor, J_k is the Bessel function of integer order k, and R_{kq} is the q-th root of the equation

    J_k(R_{kq}) = 0.    (2.2)

The functions \psi^{kq} are normalized to unity, that is,

    \int_0^{2\pi} \int_0^1 \psi^{kq} \overline{\psi^{kq}} \, r \, dr \, d\theta = 1.    (2.3)

The normalization factors are given by

    N_{kq} = \frac{1}{\sqrt{\pi} \, |J_{k+1}(R_{kq})|}.    (2.4)
We use here the following convention for the 2D Fourier transform of a function f in polar coordinates:

    \mathcal{F}(f)(k_0, \phi_0) = \int_0^{2\pi} \int_0^\infty f(r, \theta) \, e^{-2\pi\imath k_0 r \cos(\theta - \phi_0)} \, r \, dr \, d\theta.    (2.5)

The 2D Fourier transform of the Fourier-Bessel functions, denoted \mathcal{F}(\psi^{kq}), is given in polar coordinates as

    \mathcal{F}(\psi^{kq})(k_0, \phi_0) = 2\sqrt{\pi} \, (-1)^q (-\imath)^k R_{kq} \, \frac{J_k(2\pi k_0)}{(2\pi k_0)^2 - R_{kq}^2} \, e^{\imath k \phi_0}.    (2.6)

This result is typically derived using the Jacobi-Anger identity

    e^{\imath z \cos\theta} = \sum_{n=-\infty}^{\infty} \imath^n J_n(z) e^{\imath n \theta}.    (2.7)
Notice that the Fourier transform \mathcal{F}(\psi^{kq})(k_0, \phi_0) vanishes on concentric rings of radii k_0 = \frac{R_{kq'}}{2\pi} with q' \neq q. The maximum of |\mathcal{F}(\psi^{kq})(k_0, \phi_0)| is obtained near the ring k_0 = \frac{R_{kq}}{2\pi}, as can be verified from the asymptotic behavior of the Bessel functions (cf. eq. (2.9)). For images that are sampled on a square Cartesian grid of size 2L × 2L pixels, with grid spacing 1/L (corresponding to the square [-1, 1] × [-1, 1]), the sampling rate is L and the corresponding Nyquist frequency (the bandlimit) is \frac{L}{2}. Due to the Nyquist criterion, the Fourier-Bessel expansion requires only components for which

    \frac{R_{kq}}{2\pi} \le \frac{L}{2},    (2.8)

because other components represent features beyond the resolution, and their inclusion would result in aliasing. The asymptotic behavior of the Bessel functions,

    J_k(R_{kq} r) \sim \sqrt{\frac{2}{\pi R_{kq} r}} \cos\!\left(R_{kq} r - \frac{k\pi}{2} - \frac{\pi}{4}\right)    (2.9)

for R_{kq} r \gg |k^2 - \tfrac{1}{4}|, suggests that the roots are asymptotically

    R_{kq} \sim \frac{\pi}{2}\left(k + 2q - \frac{1}{2}\right),    (2.10)

which is indeed the first term of the asymptotic expansion for the roots (see [56]). Eqs. (2.8) and (2.10) lead to the sampling criterion

    k + 2q \le 2L + \frac{1}{2}.    (2.11)
In practice, we do not rely on the asymptotic formula (2.10). Instead, we find the roots of the Bessel functions numerically and check directly which components satisfy the criterion R_{kq} \le \pi L. The number of components satisfying |k| + 2q \le 2L + \frac{1}{2} is approximately 2L^2, which is smaller than \pi L^2 (the number of pixels inside the unit disk) by a factor of \frac{2}{\pi}. We remark that the cut-off criterion (2.11) is similar to the criterion in [48], but is not identical to it. Specifically, the criterion in [48] has a different cut-off value for k + 2q. The difference stems from the fact that [48] studies a different problem, namely, the 2D reconstruction of an image from its 1D line projections.
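In code, this numerical check is straightforward. The following Python sketch (relying on SciPy's jn_zeros for the Bessel roots) enumerates the admissible (k, q) pairs for k ≥ 0 and verifies the approximate 2L² count once negative k are included:

import numpy as np
from scipy.special import jn_zeros

# Minimal sketch: enumerate (k, q) with R_kq <= pi * L, for k >= 0.
# Counting each k > 0 twice (for +k and -k) gives roughly 2 * L**2 components.
def admissible_components(L):
    components = []
    k = 0
    while True:
        roots = jn_zeros(k, L + 1)           # more roots than ever admissible
        p_k = int(np.sum(roots <= np.pi * L))
        if p_k == 0:
            break
        components.extend((k, q + 1) for q in range(p_k))
        k += 1
    return components

comps = admissible_components(L=55)
total = sum(2 if k > 0 else 1 for k, _ in comps)
print(total, "components vs 2*L^2 =", 2 * 55**2)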
2.3 Fourier-Bessel expansion of images sampled on a Cartesian grid
Suppose I_1, \dots, I_n are n images sampled on a Cartesian grid. We denote by \tilde{I}_i the continuous approximation of the i-th image in terms of a truncated Fourier-Bessel expansion including only components satisfying the sampling criterion, namely,

    \tilde{I}_i(r, \theta) = \sum_{k=-2L}^{2L} \sum_{q=1}^{p_k} a^i_{k,q} \, \psi^{kq}(r, \theta),    (2.12)

where for each k, p_k denotes the number of components satisfying R_{kq} \le \pi L. We choose the expansion coefficients a^i_{k,q} that minimize the squared distance between I_i and \tilde{I}_i, with the latter restricted to the grid points; that is, a^i_{k,q} are the least squares solution to the overdetermined linear system obtained from eq. (2.12) by evaluating the images and the Fourier-Bessel functions at the grid points. More specifically, let \Psi be the matrix whose entries are evaluations of the Fourier-Bessel functions at the grid points, with rows indexed by the grid points and columns indexed by angular and radial frequencies. Then, the coefficient vector a_i of the i-th image is the solution of \min_{a_i} \|\Psi a_i - I_i\|^2, given by

    a_i = (\Psi^\dagger \Psi)^{-1} \Psi^\dagger I_i.    (2.13)
The orthogonality of the Fourier-Bessel functions on the disk does not necessarily imply that their discretized counterparts are orthogonal; that is, the matrix \Psi^\dagger \Psi may differ from the identity matrix. The columns of the matrix \Psi are guaranteed to approach orthogonality only as the grid is indefinitely refined (e.g., as L \to \infty). In practice, we have numerically observed that the columns of \Psi are approximately orthogonal already for moderate values of L. In particular, white noise remains approximately white after the transformation, since the eigenvalues of (\Psi^\dagger \Psi)^{-1} are close to 1 (see Figure 2.1). Indeed, for white noise images, the mean and covariance satisfy E[I_i] = 0 and E[I_i I_i^\dagger] = \sigma^2 I, where \sigma^2 is the noise variance and I is the identity matrix. Therefore, eq. (2.13) and the linearity of expectation yield E[a_i] = 0 and E[a_i a_i^\dagger] = \sigma^2 (\Psi^\dagger \Psi)^{-1}.

Figure 2.1: Eigenvalues of (\Psi^\dagger \Psi)^{-1}, where \Psi is the truncated Fourier-Bessel transform, for (a) L = 6 and (b) L = 60 pixels. This is also the spectrum of transformed white noise images (see text). The eigenvalues are close to 1, indicating that the discretized Fourier-Bessel transform is approximately unitary and that white noise remains approximately white. For L = 60, only 18 of the eigenvalues are above 1.05 and 34 are below 0.95. For L = 6, there are 6 eigenvalues below 0.95.
In previous work [76] on fast computation of steerable principal components for large data sets, the images are re-sampled on a polar grid by the polar Fourier transform. The polar Fourier transform is non-unitary and therefore causes artifacts in the eigenimages. Suppose the linear transformation is described by a matrix A. The sample covariance matrix C_A of the transformed images is related to the sample covariance matrix C of the original images by

    C_A = A C A^\dagger.    (2.14)

Unless A is unitary, there is no simple way to transform the eigenvectors of A C A^\dagger into the eigenvectors of C. The advantage of using the Fourier-Bessel transform with the sampling criterion that adapts to the band limit of the images is that this transform is approximately unitary (Fig. 2.1).
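As an illustration, the coefficient computation (2.13) reduces to one least squares solve per image once Ψ has been evaluated at the grid points; a minimal NumPy sketch (the array layouts are assumptions) is:

import numpy as np

# Minimal sketch of eq. (2.13): a_i = (Psi^H Psi)^{-1} Psi^H I_i, computed
# stably with a least squares solver. `psi` has one row per grid point and
# one column per (k, q) component; `images` stacks vectorized images columnwise.
def expansion_coefficients(psi, images):
    coeffs, *_ = np.linalg.lstsq(psi, images, rcond=None)
    return coeffs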
2.4 The sample covariance matrix
It is easy to "steer" the continuous approximation \tilde{I}_i of I_i (eq. (2.12)). If \tilde{I}_i^\alpha denotes the rotation of the i-th image by an angle \alpha, then

    \tilde{I}_i^\alpha(r, \theta) = \tilde{I}_i(r, \theta - \alpha) = \sum_{k,q} a^i_{k,q} \, e^{-\imath k \alpha} \, \psi^{kq}(r, \theta).    (2.15)

Because J_{-k}(x) = (-1)^k J_k(x), for real-valued images a_{-k,q} = (-1)^k \overline{a_{k,q}}. Also, under reflection a^i_{k,q} changes to a^i_{-k,q}, and the reflected image, denoted \tilde{I}_i^r, is given by

    \tilde{I}_i^r(r, \theta) = \tilde{I}_i(r, \pi - \theta) = \sum_{k,q} a^i_{k,q} \, \psi^{kq}(r, \pi - \theta)    (2.16)
                             = \sum_{k,q} a^i_{-k,q} \, \psi^{kq}(r, \theta).    (2.17)

The sample mean, denoted \tilde{I}_{\mathrm{mean}}, is the continuous image obtained by averaging the continuous images and all their possible rotations and reflections:

    \tilde{I}_{\mathrm{mean}}(r, \theta) = \frac{1}{2n} \sum_{i=1}^{n} \frac{1}{2\pi} \int_0^{2\pi} \left[ \tilde{I}_i^\alpha(r, \theta) + \tilde{I}_i^{\alpha,r}(r, \theta) \right] d\alpha.    (2.18)

Substituting eqs. (2.15) and (2.17) into (2.18), we obtain

    \tilde{I}_{\mathrm{mean}}(r, \theta) = \sum_{q=1}^{p_0} \left( \frac{1}{n} \sum_{i=1}^{n} a^i_{0,q} \right) \psi^{0q}(r, \theta).    (2.19)

As expected, the sample mean image is radially symmetric, because \psi^{0q} is a function of r only, not of \theta.

The sample covariance matrix built from the continuous images and all their possible rotations and reflections is formally

    C = \frac{1}{2n} \sum_{i=1}^{n} \frac{1}{2\pi} \int_0^{2\pi} \left[ (\tilde{I}_i^\alpha - \tilde{I}_{\mathrm{mean}})(\tilde{I}_i^\alpha - \tilde{I}_{\mathrm{mean}})^\dagger + (\tilde{I}_i^{\alpha,r} - \tilde{I}_{\mathrm{mean}})(\tilde{I}_i^{\alpha,r} - \tilde{I}_{\mathrm{mean}})^\dagger \right] d\alpha.    (2.20)

When written in terms of the Fourier-Bessel basis, the covariance matrix C can be computed directly from the expansion coefficients a^i_{k,q}. Subtracting the mean image is equivalent to subtracting \frac{1}{n} \sum_{i=1}^{n} a^i_{0,q} from the coefficients a^i_{0,q}, while keeping the other coefficients unchanged. That is, we update

    a^i_{0,q} \leftarrow a^i_{0,q} - \frac{1}{n} \sum_{i=1}^{n} a^i_{0,q}.    (2.21)

In the Fourier-Bessel basis the covariance matrix C is given by

    C_{(k,q),(k',q')} = \frac{1}{4\pi n} \sum_{i=1}^{n} \int_0^{2\pi} \left[ a^i_{k,q} \overline{a^i_{k',q'}} + a^i_{-k,q} \overline{a^i_{-k',q'}} \right] e^{-\imath(k-k')\alpha} \, d\alpha = \delta_{k,k'} \, \frac{1}{n} \sum_{i=1}^{n} \mathrm{Re}\{ a^i_{k,q} \overline{a^i_{k,q'}} \}.    (2.22)

The non-zero entries of C correspond to k = k', rendering its block diagonal structure (see Fig. 2.2). Also, it suffices to consider k \ge 0, because C_{(k,q),(k,q')} = C_{(-k,q),(-k,q')}. Thus, the covariance matrix can be written as the direct sum

    C = \bigoplus_{k=0}^{2L} C^{(k)},    (2.23)

where C^{(k)}_{q,q'} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Re}\{ a^i_{k,q} \overline{a^i_{k,q'}} \} is by itself a sample covariance matrix of size p_k \times p_k.
Figure 2.2: Illustration of the block diagonal structure of the rotationally invariant covariance matrix. The block size shrinks as the angular frequency k increases.

The block size p_k decreases as the angular frequency k increases (see Figure 2.2). We remark that the block structure of the covariance matrix in steerable PCA is well known; however, in previous works the block size is constant (i.e., independent of the angular frequency k). Our main observation is that the block size must shrink as the angular frequency increases in order to avoid aliasing. Moreover, if the images are corrupted by independent additive white (uncorrelated) noise, then each block C^{(k)} is also affected by independent additive white noise, because the Fourier-Bessel transform is unitary (up to grid discretization).
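In the coefficient domain, rotation and reflection are one-line operations; a small Python sketch of eqs. (2.15)-(2.17), assuming the coefficients are stored in a dictionary keyed by the angular frequency k (including negative k), is:

import numpy as np

# Minimal sketch of eqs. (2.15)-(2.17). `coeffs` maps each angular frequency
# k (negative and positive) to a length-p_k complex array over q.
def steer(coeffs, alpha):
    # Rotation by alpha multiplies a_{k,q} by exp(-i * k * alpha).
    return {k: a * np.exp(-1j * k * alpha) for k, a in coeffs.items()}

def reflect(coeffs):
    # Reflection maps a_{k,q} to a_{-k,q}.
    return {k: coeffs[-k] for k in coeffs}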
2.5 Algorithm and computational complexity
We refer to the resulting algorithm as Fourier-Bessel steerable PCA (FBsPCA). The steps of FBsPCA are summarized in Algorithm 1.

Algorithm 1 Fourier-Bessel steerable PCA (FBsPCA)
Require: n prewhitened images I_1, \dots, I_n sampled on a Cartesian grid of size 2L \times 2L.
1: (Precomputation) Compute \psi^{kq} for all R_{kq} \le \pi L.
2: (Precomputation) Compute the Moore-Penrose pseudoinverse of \Psi.
3: For each I_i compute a^i_{k,q} using eq. (2.13).
4: Estimate \tilde{I}_{\mathrm{mean}} using eq. (2.19). Subtract the mean image by changing a^i_{0,q} to a^i_{0,q} - \frac{1}{n} \sum_i a^i_{0,q}.
5: For each k = 0, 1, \dots, 2L compute the covariance matrix C^{(k)} of size p_k \times p_k.
6: For each k compute the eigenvalues of C^{(k)}, in descending order \lambda_k^1 \ge \lambda_k^2 \ge \dots \ge \lambda_k^{p_k}, and the associated eigenvectors h_k^1, h_k^2, \dots, h_k^{p_k}.
7: The eigenimages V^{kl} (0 \le k \le 2L, 1 \le l \le p_k) are linear combinations of the Fourier-Bessel functions, i.e., V^{kl} = \Psi^{(k)} h_k^l, where \Psi^{(k)} = [\psi^{k1}, \dots, \psi^{kp_k}].
8: (Optional) Use an algorithm such as [50, 51] to estimate the number of components to choose for each C^{(k)}.

The computational complexity of FBsPCA (excluding precomputation) is O(nL^4 + L^5), whereas the computational complexity of traditional PCA (applied to the original images without their rotational copies) is O(nL^4 + L^6). The different steps of FBsPCA have the following computational costs. The cost of precomputing the Fourier-Bessel functions on the discrete Cartesian grid is O(L^4), because the number of basis functions satisfying the sampling criterion is O(L^2) and the number of grid points is also O(L^2). The complexity of computing the pseudoinverse of \Psi^\dagger \Psi is dominated by computing the singular value decomposition (SVD) of \Psi, in O(L^6). Computing the pseudoinverse of \Psi^\dagger \Psi is a precomputation that does not depend on the images; alternatively, since the eigenvalues of \Psi^\dagger \Psi are almost equal to one, it can be approximated by the identity matrix. It takes O(nL^4) to compute the expansion coefficients a^i_{k,q}. The computational complexity of constructing the block diagonal covariance matrix is O(nL^3). Since the covariance matrix has a special block diagonal structure, its eigen-decomposition takes O(L^4). Constructing the steerable basis takes O(L^5). Therefore, the total computational complexity of FBsPCA without the precomputation is O(nL^4 + L^5). For traditional PCA, computing the covariance matrix takes O(nL^4) and its eigen-decomposition takes O(L^6). Overall, the computational cost of FBsPCA is lower than that of the traditional rotation-variant PCA.
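Steps 5-7 of Algorithm 1 amount to one small eigen-decomposition per angular frequency. A minimal NumPy sketch, assuming the mean-subtracted coefficients are grouped by k, is:

import numpy as np

# Minimal sketch of steps 5-7 of Algorithm 1. `coeffs[k]` is a p_k x n
# complex array holding a^i_{k,q} for all images i (mean-subtracted for k = 0).
def fbspca_blocks(coeffs):
    eig = {}
    for k, a_k in coeffs.items():
        n = a_k.shape[1]
        C_k = np.real(a_k @ a_k.conj().T) / n     # C^(k), eq. (2.22)
        lam, h = np.linalg.eigh(C_k)              # ascending order
        eig[k] = (lam[::-1], h[:, ::-1])          # descending eigenvalues
    return eig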
Figure 2.3: Simulated 70S ribosome projection images with different signal-to-noise ratios: (a) clean, (b) SNR = 1/10, (c) SNR = 1/50.

2.6 Results
In the first experiment, we simulated n = 10^4 clean projection images of the E. coli 70S ribosome. The images are of size 129 × 129 pixels, but the molecule is confined to a disk of radius L = 55 pixels. We corrupted the clean images with additive white Gaussian noise at different levels of signal-to-noise ratio (SNR) (Fig. 2.3). The running time of traditional PCA was 695 seconds, compared to 85 seconds (including precomputation) for FBsPCA, both implemented in MATLAB on a machine with 2 Intel(R) Xeon(R) X5570 CPUs, each with 4 cores, running at 2.93 GHz. The top 5 eigenimages for noisy images agree with the eigenimages from clean projection images (Fig. 2.4(a) and Fig. 2.4(d)). The eigenimages generated by FBsPCA are much cleaner than the eigenimages from traditional PCA (Fig. 2.4(d) and Fig. 2.4(e)). We also see that the eigenimages generated by steerable PCA with the polar transform (PTsPCA) are consistent with the eigenimages from traditional PCA (see Fig. 2.4(c) and Fig. 2.4(f)).
In another experiment, we simulated images consisting entirely of white Gaussian noise with mean 0 and variance 1 and computed their Fourier-Bessel expansion coefficients. The distribution of the eigenvalues of the rotationally invariant covariance matrix at different angular frequencies k is well predicted by the Marčenko-Pastur distribution (see Fig. 2.5). This property allows us to use the method described in [50, 51] to estimate the noise variance and the number of principal components to choose.

Figure 2.4: Eigenimages for 10^4 simulated 70S ribosome projection images. Clean images: (a) FBsPCA, (b) traditional PCA, and (c) PTsPCA. Noisy images with SNR = 1/50: (d) FBsPCA, (e) traditional PCA, and (f) PTsPCA. Image size is 129 × 129 pixels and L = 55 pixels.
We compared the eigenvalues of the ribosome projection images from the previous experiment at different SNRs (Fig. 2.6). As the noise variance $\sigma^2$ increases, the number of signal components that we are able to discriminate decreases. We use the method in [50, 51] to automatically estimate the noise variance and the number of
Figure 2.5: Histogram of eigenvalues of $C^{(k)}$ (with k = 0, 1, 20) for images consisting of white Gaussian noise with mean 0 and variance 1. The dashed lines correspond to the Marčenko-Pastur distribution. $n = 10^4$, L = 55 pixels and $\sigma^2 = 1$.
components beyond the noise level. With the estimated noise variance $\hat{\sigma}^2$, components with eigenvalues $\lambda_k^i > \hat{\sigma}^2 (1 + \sqrt{\gamma_k})^2$, where $\gamma_0 = \frac{p_0}{n}$ and $\gamma_k = \frac{p_k}{2n}$ for $k > 0$, are chosen (the case k = 0 is special because the expansion coefficients $a_{0,q}$ are purely real, since $a_{k,q} = (-1)^k \overline{a_{-k,q}}$).
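As a minimal sketch of this selection rule (Python/NumPy; the function and its inputs are illustrative assumptions, with sigma2_hat standing for the noise variance estimated as in [50, 51]):

```python
import numpy as np

def num_signal_components(eigvals, sigma2_hat, p_k, n, k):
    """Count eigenvalues of C^(k) above the Marcenko-Pastur upper edge
    sigma^2 (1 + sqrt(gamma_k))^2, with gamma_0 = p_0/n and
    gamma_k = p_k/(2n) for k > 0."""
    gamma_k = p_k / n if k == 0 else p_k / (2.0 * n)
    edge = sigma2_hat * (1.0 + np.sqrt(gamma_k)) ** 2
    return int(np.sum(np.asarray(eigvals) > edge))
```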
We denote the expansion coefficients of the images on the Fourier-Bessel steerable PCA basis as $b_{k,q}$, that is,
$$\tilde{I}(r, \theta) = \tilde{I}_{\mathrm{mean}}(r, \theta) + \sum_{k,q} b_{k,q} V^{k,q}(r, \theta). \qquad (2.24)$$
We adapted the derivation in [100] to design an adaptive asymptotically optimal linear filter for the projection images (see Appendix A). The filtering coefficients are
$$h_{k,q} = \frac{1}{1 + \frac{1}{\mathrm{SNR}_{\gamma_k,k,q}}}, \qquad (2.25)$$
where
$$\mathrm{SNR}_{\gamma_k,k,q} = c_{k,q}^2 \, \mathrm{SNR}_{k,q} = \begin{cases} 0, & \text{if } \mathrm{SNR}_{k,q} < \sqrt{\gamma_k}, \\[4pt] \dfrac{\mathrm{SNR}_{k,q} - \gamma_k}{\mathrm{SNR}_{k,q} + \gamma_k}, & \text{if } \mathrm{SNR}_{k,q} \ge \sqrt{\gamma_k}, \end{cases} \qquad (2.26)$$
Figure 2.6: Eigenvalues of $C^{(0)}$ (a) and $C^{(10)}$ (b) for simulated projection images with different signal-to-noise ratios (clean, SNR = 1/10, SNR = 1/50).
and
$$\mathrm{SNR}_{k,q} = \frac{\lambda_k^q - (\gamma_k + 1)\hat{\sigma}^2 + \sqrt{(\gamma_k - 1)^2 \hat{\sigma}^4 + (\lambda_k^q)^2 - 2\lambda_k^q (\gamma_k + 1)\hat{\sigma}^2}}{2\hat{\sigma}^2}. \qquad (2.27)$$
Notice that when we fix $p_k$ and let the sample size n go to infinity, (2.27) becomes $\frac{\lambda_k^q - \hat{\sigma}^2}{\hat{\sigma}^2}$.
The denoised image using the linear filtering coefficients described above is
$$\hat{I}(r, \theta) = \tilde{I}_{\mathrm{mean}}(r, \theta) + \sum_{k=-k_{\max}}^{k_{\max}} \sum_{q=1}^{p_k} h_{k,q} b_{k,q} V^{k,q}(r, \theta)$$
$$= \tilde{I}_{\mathrm{mean}}(r, \theta) + \sum_{q=1}^{p_0} \frac{1}{1 + \frac{1}{\mathrm{SNR}_{\gamma_0,0,q}}} b_{0,q} V^{0,q}(r, \theta) + 2\,\mathrm{Re}\left( \sum_{k=1}^{k_{\max}} \sum_{q=1}^{p_k} \frac{1}{1 + \frac{1}{\mathrm{SNR}_{\gamma_k,k,q}}} b_{k,q} V^{k,q}(r, \theta) \right). \qquad (2.28)$$
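A compact sketch of the filter (2.25)-(2.27), written in Python/NumPy under the same notation; it computes the shrinkage factor for one eigenvalue and is an illustration of the formulas above rather than the code used for the reported experiments:

```python
import numpy as np

def wiener_coefficient(lam, sigma2_hat, gamma_k):
    """Filter value h_{k,q} of (2.25) for one eigenvalue lam of C^(k)."""
    # Estimated SNR of the component, eq. (2.27).
    disc = ((gamma_k - 1.0) ** 2 * sigma2_hat ** 2
            + lam ** 2 - 2.0 * lam * (gamma_k + 1.0) * sigma2_hat)
    snr = (lam - (gamma_k + 1.0) * sigma2_hat
           + np.sqrt(max(disc, 0.0))) / (2.0 * sigma2_hat)
    # Effective SNR of eq. (2.26); components below the detection
    # threshold sqrt(gamma_k) are suppressed entirely.
    if snr < np.sqrt(gamma_k):
        return 0.0
    snr_eff = (snr - gamma_k) / (snr + gamma_k)
    if snr_eff <= 0.0:
        return 0.0
    return 1.0 / (1.0 + 1.0 / snr_eff)       # eq. (2.25)
```

The denoised image (2.28) is then obtained by multiplying each coefficient $b_{k,q}$ by its $h_{k,q}$ before summing the eigenimage expansion.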
We compared the effects of denoising with FBsPCA, PTsPCA, and traditional
PCA. The asymptotically optimal Wiener filters (2.25) designed with the eigenvalues
and eigenimages from FBsPCA (PTsPCA, PCA, resp.) are applied to the noisy
images. The number of signal components determined with PCA is 62, whereas the total number of eigenimages for FBsPCA is 430 for $10^4$ noisy images with SNR = 1/20 and L = 55 pixels.

           MSE ($10^{-5}$)   PSNR (dB)   1-SSIM ($10^{-6}$)
  FBsPCA        6.6             20.2            4.4
  PTsPCA        7.7             19.5            5.9
  PCA          10.1             18.3            5.9

Table 2.1: Denoising effect: FBsPCA, PTsPCA and PCA.

For PTsPCA, due to the non-unitary nature of the interpolation
from Cartesian to polar, we do not have a simple rule for choosing the number of
eigenimages. Instead, we applied denoising multiple times with different numbers of eigenimages, and present here the denoising result with the smallest mean squared error (MSE). The optimal number of eigenimages for PTsPCA was 348. Of
course, in practice the clean image is not available and such a procedure cannot be
used. Still, the optimal denoising result by PTsPCA is inferior to that obtained
by FBsPCA, where the number of components is chosen automatically. Even if we
allow PTsPCA to use more components than FBsPCA, the denoising result does not
improve. Fig. 2.7 shows that FBsPCA gives the best denoised image. We computed the MSE, the peak SNR (PSNR), and the structural similarity index (SSIM) [117] to quantify the denoising performance of FBsPCA compared with the traditional rotationally variant PCA and PTsPCA (see Table 2.1).
2.7 Summary and discussion
In this chapter we adapted a sampling criterion that was originally proposed in [48] into the framework of steerable PCA. The Fourier-Bessel transform with the modified cut-off criterion $R_{k,q} \le \pi L$ is approximately unitary and keeps the statistics of white noise approximately unchanged. This sampling criterion obtains maximum information from a set of images and their rotated and reflected copies while preventing aliasing. Instead of constructing the invariant covariance matrix from the original
Figure 2.7: Denoising by FBsPCA, PTsPCA and PCA. $n = 10^4$ and L = 55 pixels. (a) Clean image. (b) Noisy image (SNR = 1/20). (c) FBsPCA denoised. (d) PTsPCA denoised. (e) PCA denoised.
images and their rotated and reflected copies, we compute the covariance matrix
from the truncated Fourier-Bessel expansion coefficients. The covariance matrix has
a special block diagonal structure that allows us to perform PCA on different angular
frequencies separately. The block diagonal structure was also observed and utilized
in previous algorithms for steerable PCA. However, we show here that the block size
must shrink as the angular frequency increases in order to avoid aliasing and spurious
eigenimages.
While steerable PCA has found applications in computer vision and pattern recognition, this work has been mostly motivated by its application to cryo-EM. Besides
compression and denoising of the experimental images, it is worth mentioning that
under the assumption that the viewing directions of the images are uniformly distributed over the sphere, Kam has previously demonstrated [45] that the covariance
matrix of the images is related to the expansion coefficients of the molecule in spherical
harmonics. Our Fourier-Bessel steerable PCA can therefore be applied in conjunction with Kam's approach.
When applying PCA to cryo-EM images, one has to take into account the fact
that cryo-EM images are not perfectly centered. It is well known that the unknown
translational shifts can be estimated experimentally using the double exposure technique. Specifically, the images that are analyzed by PCA are obtained from an initial
low-dose exposure, while the centers of the images are determined from a second exposure with higher dose. Furthermore, image acquisition is typically followed by an
iterative global translational alignment procedure in which images are translationally
aligned with their sample mean, which was noted earlier to be radially symmetric. It
has been observed that such translational alignment measures produce images whose
centers are off by only a few pixels from their true centers. Such small translational
misalignments mainly impact the high frequencies of the eigenimages. The general
problem of performing PCA for a set of images and their rotations and translations,
or other non-compact groups of transformations is beyond the scope of this chapter.
Another important consideration when applying PCA to cryo-EM images is the
contrast transfer function (CTF) of the microscope. The images are not simply projections of the molecule, but are rather convolved with a point spread function. CTF
is real valued and oscillates between positive and negative values. The frequencies at
which the CTF vanishes are known as “zero crossings”. The images do not carry any
information about the molecule at zero crossing frequencies. The CTF depends on the
defocus value, and changing the defocus value changes the location of the zero crossings. Therefore, projection images from several different defocus values are acquired.
When performing PCA for the images, one must take into account the fact that they
belong to several defocus groups characterized by different CTFs. One possible way
of circumventing this issue is to apply CTF correction to the images prior to PCA. A
popular CTF correction procedure is phase-flipping which has the advantage of not
altering the noise statistics. Alternatively, one can estimate the sample covariance
matrix corresponding to each defocus group separately, and then combine these estimators using a least squares procedure in order to estimate the covariance matrix of
projection images that are unaffected by CTF. The optimal procedure for performing
PCA for images that belong to different defocus groups is a topic for future research.
Finally, we remark that the Fourier-Bessel basis can be replaced in our framework with other suitable bases. For example, the 2D prolate spheroidal wave functions (PSWF) on a disk [101] enjoy many properties that make them attractive for steerable PCA. In particular, among all bandlimited functions of a given bandlimit, they are the most spatially concentrated in the disk. They also have a separation-of-variables form, which makes them convenient for steerable PCA. However, an accurate numerical evaluation of the PSWF requires more sophistication than that of the Bessel functions, which is the reason why we have not applied PSWFs here.
Chapter 3

Rotationally Invariant Image Representation

3.1 Introduction
Penczek et al. [70] introduced reference-free alignment, which first globally aligns all the images; the rotationally invariant distance is then the Euclidean distance between the pre-aligned images. We notice that this method produces errors in aligning images from similar viewing directions when the clean images have many different views. In fact, there does not exist a global alignment for a cryo-EM image data set, because of its geometric structure. While global alignment is impossible, one can always determine the rotationally invariant distances between all pairs of images by optimally aligning each pair of them. In this way, we have to perform $\binom{n}{2}$ alignments
for n images. This is computationally intensive and unnecessary, because most of the
time is spent on aligning images from very different views. It would be more efficient
to use a rotationally invariant representation for the images, then find neighboring
images, and finally align and average only neighboring images. Several invariant
features were proposed for viewing angle classification, for example, autocorrelation
functions (ACF), double autocorrelation function (DACF) [89] and self correlation function (SCF) [112]. However, those invariants have been criticized for losing phase information. A wide variety of invariants were devised for pattern recognition [60].
However, a common feature of most invariants is that they are lossy, in the sense
that they do not uniquely specify the original signal. Among invariant features, the
bispectrum and the triple-correlation provide a lossless shift-invariant representation,
and various algorithms have been devised to retrieve a signal (up to translation)
from its (possibly noisy) bispectrum [86]. We introduce a new rotationally invariant
representation for computing the rotationally invariant distance between all pairs of
cryo-EM images in this chapter. Our invariant representation is based on expanding
the images in a steerable basis and deriving a bispectrum for this expansion [76, 121].
Unlike ACF, DACF and SCF, the new rotationally invariant representation maintains
phase information and is complete, in the sense of uniquely specifying the original
image up to an arbitrary rotation. We therefore find this representation useful in
determining the rotationally invariant distances between any pair of images. Bispectrum and triple-correlation have been considered before for generating translationally or rotationally invariant features for cryo-EM images [89, 44, 58]. However, because the number of such features is extremely large, it was regarded as impractical for computations. We reduce the number of bispectrum features in two steps. In the first step, images are expanded in a steerable basis [122] and compressed (and denoised) using the top M expansion coefficients (M is much smaller than the number of pixels in an image). Different triplets of these expansion coefficients are multiplied together to produce the invariant image representation. The resulting invariant representation is still high-dimensional, consisting of $O(M^3/k_{\max})$ features, where M is the number of expansion coefficients and $k_{\max}$ is the maximum angular frequency determined by
the size of the image (see Chapter 2). Marabini and Carazo [58] suggested projecting
the bispectrum onto a lower dimensional subspace as a pattern classification method.
However, their method consists of using a predetermined subset of bispectrum coefficients and does not preserve the information content well enough to discriminate
images of many different views. Therefore, in the second step, we reduce the dimensionality of the invariant feature vectors by principal component analysis (PCA). We
use a randomized algorithm for low rank matrix approximation [83, 35, 34] to efficiently compute the principal components, overcoming the difficulties imposed by the
large number of images and the high dimensionality of the input feature vectors. The
top principal components provide the reduced invariant image representation. We
then efficiently compute the rotationally invariant distance between images as the
Euclidean distance between their reduced invariant representations without performing any in-plane alignment. A set number of nearest neighbors for each image are
identified as those images with the smallest invariant distances. For a large number
of input images, a randomized nearest neighbor algorithm [43] can avoid computing
the distances between all pairs of images and effectively find the nearest neighbors in
time nearly linear with the number of images. Either ordinary or randomized nearest
neighbor search with reduced invariant image representation gives the initial classification result. The rotational alignment angles are then computed only for nearest
neighbor pairs. With the techniques we propose here, a substantial gain in computing
time is obtained by reversing the order of the alignment and the classification.
3.2 Global alignment
The rotation group SO(3) is the group of all orientation preserving orthogonal transformations about the origin of the 3D Euclidean space $\mathbb{R}^3$ under the operation of composition. Any 3D rotation can be expressed using a 3 × 3 orthogonal matrix
$$R = \begin{pmatrix} | & | & | \\ R^1 & R^2 & R^3 \\ | & | & | \end{pmatrix}$$
satisfying
$$RR^T = R^T R = I, \quad \det R = 1,$$
where I is a 3 × 3 identity matrix. The column vectors $R^1, R^2, R^3$ of R form an orthonormal basis of $\mathbb{R}^3$. To each projection image I there corresponds a 3 × 3
orthonormal basis to R3 . To each projection image I there corresponds a 3 × 3
unknown rotation matrix R describing its orientation.
The projection image lies on a tangent plane to the two dimensional unit sphere $S^2$ at the viewing angle $v = v(R) = R^3$. The first two columns of R, namely, $R^1$ and $R^2$, are vectors in $\mathbb{R}^3$ that form an orthonormal basis for the tangent plane and are identified with the axis coordinates of the image (see Figure 3.1). Together with the imaging direction v they make an orthonormal basis of $\mathbb{R}^3$. An in-plane rotation of the projection image can thus be viewed as changing the basis vectors $R^1$ and $R^2$ while keeping v fixed.
Figure 3.1: The image I is identified with the tangent plane to the sphere at the viewing direction $R^3$, which is the third column of the rotation matrix R.
When two images $I_i$ and $I_j$ are of the same viewing angle ($v_i = v_j$), the matrix $R_i^{-1} R_j$ is of the form
$$R_i^{-1} R_j = \begin{pmatrix} \cos\alpha_{ij} & -\sin\alpha_{ij} & 0 \\ \sin\alpha_{ij} & \cos\alpha_{ij} & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
that is, $\cos\alpha_{ij} = (R_i^{-1} R_j)_{11}$ and $\sin\alpha_{ij} = (R_i^{-1} R_j)_{21}$.
In practice, however, we cannot expect two projection images to have exactly the same viewing angle. Still, we may assume that the molecule is "nice" enough^1 such that projection images that correspond to nearby viewing angles would look similar (up to an in-plane rotation). In such cases, it is reasonable to assume that the optimal in-plane rotation angle $\alpha_{ij}$ computed in (3.5) provides a good approximation to the angle $\tilde\alpha_{ij}$ that "aligns" the orthonormal bases for the planes^2 $v_i^\perp$, $v_j^\perp$, given by the vectors $R_i^1, R_i^2$ and $R_j^1, R_j^2$, respectively. In other words, for clean images, it is expected that a small distance between $v_i$ and $v_j$ would imply that $\alpha_{ij}$ approximates the angle $\tilde\alpha_{ij}$ given by
$$\tilde\alpha_{ij} = \operatorname*{argmin}_{\alpha \in [0,2\pi)} \|\rho(\alpha) - R_i^{-1} R_j\|_F^2, \qquad (3.1)$$
where
$$\rho(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

^1 If $\phi \in L^2(\mathbb{R}^3)$ and has compact support, then the X-ray transform (1.1) is a continuous function from SO(3) to $L^2(\mathbb{R}^2)$. A similar statement for the Radon transform is given as an exercise in Epstein's book on the mathematics of medical imaging [22, Exercise 6.6.1, p.215]. The proof is based on the fact that continuous functions are dense in $L^2$ and on the fact that for functions with compact support, the $L^2$ norm of the projection image is bounded by the $L^2$ norm of $\phi$.
^2 $v^\perp$ is a shorthand notation for the plane perpendicular to v and passing through the origin.
and $\|A\|_F^2 = \mathrm{Tr}(AA^T)$ for any real valued m × n matrix A (i.e., it is the squared Frobenius norm). It can be verified that $\tilde\alpha_{ij}$ satisfies
$$\cos\tilde\alpha_{ij} = \frac{(R_i^{-1}R_j)_{11} + (R_i^{-1}R_j)_{22}}{\sqrt{[(R_i^{-1}R_j)_{11} + (R_i^{-1}R_j)_{22}]^2 + [(R_i^{-1}R_j)_{21} - (R_i^{-1}R_j)_{12}]^2}}, \qquad (3.2)$$
$$\sin\tilde\alpha_{ij} = \frac{(R_i^{-1}R_j)_{21} - (R_i^{-1}R_j)_{12}}{\sqrt{[(R_i^{-1}R_j)_{11} + (R_i^{-1}R_j)_{22}]^2 + [(R_i^{-1}R_j)_{21} - (R_i^{-1}R_j)_{12}]^2}}, \qquad (3.3)$$
in accordance with (3.1) even when $v_i$ differs from $v_j$. Thus, computing $\alpha_{ij}$ provides indispensable information about the unknown rotations $R_1, \dots, R_n$.
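In simulations where the rotations are known, (3.2)-(3.3) give the reference angle directly; a small Python/NumPy sketch (function and variable names are assumptions, not part of the thesis code):

```python
import numpy as np

def reference_inplane_angle(Ri, Rj):
    """Angle alpha~_ij of (3.2)-(3.3) from two known rotation matrices."""
    U = Ri.T @ Rj                 # R_i^{-1} R_j, since rotations are orthogonal
    c = U[0, 0] + U[1, 1]         # numerator of cos in (3.2)
    s = U[1, 0] - U[0, 1]         # numerator of sin in (3.3)
    # The common normalization in (3.2)-(3.3) cancels inside arctan2.
    return np.arctan2(s, c) % (2.0 * np.pi)
```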
The similarity of the images can be measured by the Euclidean distance between the images when they are optimally aligned with respect to in-plane rotations (assuming the images are centered):
$$d_{ij} = \min_{\alpha \in [0,2\pi)} \|I_i - R(\alpha) I_j\|, \quad i, j = 1, \dots, n, \qquad (3.4)$$
where $R(\alpha)$ stands for rotating image $I_j$ counter-clockwise by α. The optimal alignment angle is
$$\alpha_{ij} = \operatorname*{argmin}_{\alpha \in [0,2\pi)} \|I_i - R(\alpha) I_j\|, \quad i, j = 1, \dots, n. \qquad (3.5)$$
Penczek et al. [70] introduced reference-free alignment, which first globally aligns all the images; the rotationally invariant distance is then the Euclidean distance between the pre-aligned images. We are about to elucidate that such a global alignment does not exist when there is a great variety of viewing angles. In such cases, the estimation of the in-plane rotations between images from similar views by the reference-free alignment method is not accurate.

We used a data set composed of clean images corresponding to many different views in order to numerically test the performance of the reference-free alignment algorithm [70] for viewing angle classification and for rotational alignment of in-class images. Specifically, $10^4$ centered clean projection images were simulated from
the 3D model of the E. coli 70S ribosome, with the viewing directions sampled from the uniform distribution over the sphere. We used the SPIDER AP RA program to run reference-free alignment on different subsets of the simulated data to test the rotational alignment results. Since we know the underlying rotations, we can compute $\tilde\alpha_{ij}$ for pairs of images that satisfy $\langle v_i, v_j \rangle \ge \cos(5°)$, that is, for viewing angles that are less than 5° apart. This list of true in-plane rotational angles is compared with the estimation from the reference-free alignment. First we ran the reference-free alignment on the whole data set, whose viewing directions are uniformly distributed over the sphere. The algorithm produces large errors when all views are included (see Figure 3.2a). As we decrease the size of the spherical cap to 60° and 20°, the errors
in in-plane rotational alignment become smaller (see Figure 3.2).
Figure 3.2: Error in degrees of in-plane rotational alignment between images with similar viewing angles that are less than five degrees apart, for simulated clean projection images of the 70S ribosome, with viewing angles belonging to spherical caps of various opening angles: (a) whole sphere, (b) 60 degrees, (c) 20 degrees. The y axis is in log scale, because the number of outliers is small. The fraction of pairs for which the error is larger than 2 degrees is $p_a = 0.13$, $p_b = 0.09$, and $p_c = 0$.
The (perhaps surprising) failure of the reference-free alignment algorithm to globally align all images is a consequence of the topology of $S^2$ and, more specifically, of a mathematical theorem known as the hairy ball theorem [61]. The theorem says that a continuous tangent vector field to the two dimensional sphere $S^2$ must vanish at some point on the sphere. In other words, if f is a continuous function that assigns a vector in $\mathbb{R}^3$ to every point v on the sphere such that f(v) is tangent to the sphere at v, then there is at least one $v \in S^2$ such that $f(v) = \vec{0}$. The theorem attests to the
fact that it is impossible to comb a hairy (spherical) cat without creating a cowlick.
For example, consider the tangent vector field f given by
$$f(v) = v \times \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} y \\ -x \\ 0 \end{pmatrix} \quad \text{for } v = \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \qquad (3.6)$$
The tangent vector field f is continuous but vanishes at the north and south poles ±(0, 0, 1). The normalized tangent vector field
$$\frac{f}{\|f\|} = \frac{1}{\sqrt{x^2 + y^2}} f = \frac{1}{\sqrt{1 - z^2}} f \qquad (3.7)$$
is discontinuous at the poles z = ±1. The hairy ball theorem implies that any attempt
to find a non-vanishing continuous tangent vector field to the sphere would ultimately
fail. An elementary proof of the theorem can be found in [61].
We now explain the way the hairy ball theorem relates to the rotational alignment problem of the projection images. As shown in Figure 3.1, a successful rotational alignment of all projection images means that we can choose orthogonal bases for all tangent planes such that the basis vectors vary smoothly from one tangent plane to the other. However, this contradicts the hairy ball theorem. We conclude that the global rotational alignment of all images is not feasible, because there does not exist a set of angles $\alpha_1, \dots, \alpha_n$ such that $\alpha_{ij} = \alpha_i - \alpha_j$, not even approximately^3. This implies that any classification algorithm that first attempts to globally align the images, such as K-means clustering after reference-free alignment, would ultimately fail whenever there are many different views that cover the sphere.

^3 For arguments for a finite number of images, see Section 4.2.
Since we cannot align images from different views all at once, the distance computed between images after global alignment is not a truly rotationally invariant distance. In Section 3.3, we introduce a rotationally invariant image representation $\tilde{b}$ and replace the rotationally invariant distance (1.3) by
$$d_{ij} = \|\tilde{b}^i - \tilde{b}^j\|. \qquad (3.8)$$
The new rotationally invariant feature vector $\tilde{b}$ needs to be lower dimensional (so that (3.8) can be computed efficiently), and to retain the information in the image (so that (3.8) is meaningful). Using the rotationally invariant feature vectors, we are able to find images with similar views without performing all pairwise rotational alignments.
3.3 Bispectrum-like rotationally invariant features
Suppose we have a 1D periodic discrete signal f(x), x = 1, ..., n. The discrete Fourier transform of f is defined as
$$\hat{f}(k) = \sum_{x=1}^{n} f(x)\, e^{-\imath \frac{2\pi}{n} kx}. \qquad (3.9)$$
The power spectrum $|\hat{f}|^2$ is the Fourier transform of the autocorrelation function
$$\mathrm{ACF}(x) = \sum_{y=1}^{n} f(y) f(y + x). \qquad (3.10)$$
Both the power spectrum and the autocorrelation function are shift-invariant. However, the ACF loses the phase information in $\hat{f}$ and maintains only its amplitude.
The idea behind bispectral invariants is to move from the autocorrelation function to the triple-correlation function
$$T(x_1, x_2) = \sum_{y=1}^{n} f(y) f(y + x_1) f(y + x_2). \qquad (3.11)$$
Again by the convolution theorem, the Fourier transform of the triple-correlation function is
$$b(k_1, k_2) = \hat{f}(k_1)\, \hat{f}(k_2)\, \overline{\hat{f}(k_1 + k_2)}, \qquad (3.12)$$
and is called the bispectrum of f. Under shift by z, the Fourier transform of $f^z(x) = f(x - z)$ becomes
$$\hat{f}^z(k) = \sum_{x=1}^{n} f(x - z)\, e^{-\imath \frac{2\pi}{n} kx} = e^{-\imath \frac{2\pi}{n} kz} \sum_{x'=1}^{n} f(x')\, e^{-\imath \frac{2\pi}{n} kx'} = e^{-\imath \frac{2\pi}{n} kz}\, \hat{f}(k). \qquad (3.13)$$
Therefore, under translation by z, the bispectrum becomes
$$b^z(k_1, k_2) = e^{-\imath 2\pi z k_1/n} \hat{f}(k_1)\, e^{-\imath 2\pi z k_2/n} \hat{f}(k_2)\, e^{\imath 2\pi z (k_1 + k_2)/n}\, \overline{\hat{f}(k_1 + k_2)} = b(k_1, k_2), \qquad (3.14)$$
which shows that the bispectrum is shift-invariant. Unlike the power spectrum, the
bispectrum does not lose the phase information and the original signal can be reconstructed from its bispectrum (up to translation). The bispectrum is widely used in
signal processing as a lossless shift-invariant representation, and various algorithms
have been devised to reconstruct f from b [86]. From the definition of the bispectrum
(3.12), it follows that
$$b(k_1, k_2) = b(k_2, k_1) = b(-k_2, -k_1) = b(-k_1, -k_2)$$
$$= b(-k_1 - k_2, k_2) = b(k_1, -k_1 - k_2) = b(-k_1 - k_2, k_1) = b(k_2, -k_1 - k_2). \qquad (3.15)$$
The knowledge of the bispectrum in the triangular region $k_1 \ge 0$, $k_2 \le k_1$, $k_1 + k_2 \le k_{\max}$ is therefore sufficient for a complete description of the bispectrum. For 1D periodic signals of length n, there are $O(n^2)$ bispectrum coefficients. Therefore, the
bispectrum is of very high dimensionality. The possibility of using the bispectrum as
shift or rotational invariant image representation for classification of cryo-EM images
has been previously mentioned in [89, 58]. Due to its high dimensionality, the full
bispectrum has never been used for analyzing large cryo-EM data sets to generate
class averages.
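The shift invariance (3.14) is easy to verify numerically; a short Python/NumPy sketch (the signal and indices are arbitrary choices made for illustration):

```python
import numpy as np

def bispectrum(f, k1, k2):
    """One bispectrum coefficient b(k1, k2) of (3.12)."""
    F = np.fft.fft(f)
    return F[k1] * F[k2] * np.conj(F[(k1 + k2) % len(f)])

rng = np.random.default_rng(1)
f = rng.standard_normal(64)
shifted = np.roll(f, 17)                      # f(x - z) with z = 17
assert np.allclose(bispectrum(f, 5, 9), bispectrum(shifted, 5, 9))  # eq. (3.14)
```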
We proceed to describe our generalization of the bispectrum of 1-D periodic signals
to 2-D images. Our approach relies on the expansion of images in a steerable basis,
where the basis images are orthonormal and take the form
$$u^{k,q}(r, \theta) = f^{k,q}(r)\, e^{\imath k\theta}, \qquad (3.16)$$
where k and q in basis image $u^{k,q}$ are indices for angular frequency and radial frequency, respectively. There are many choices of radial functions $f^{k,q}(r)$; for example, the radial functions $f^{k,q}$ can be computed from the Fourier-Bessel steerable PCA [122], which provides an optimal basis in the least-squares sense, or they can be chosen independently of the data as Bessel functions or as delta functions. The ith image $I_i$ can
be expanded in the Fourier-Bessel steerable basis with expansion coefficients $a^i_{k,q}$. When image $I_i$ is rotated counter-clockwise by angle α, the expansion coefficients of $I_i(r, \theta - \alpha)$ are given by $a^{i,\alpha}_{k,q} = a^i_{k,q}\, e^{-\imath k\alpha}$. Therefore, rotating the image is equivalent to phase shifting its expansion coefficients, which is similar to adding a phase shift to
the Fourier coefficients when we translate the 1D periodic signal as in (3.13).
Most of the clean images' energy is concentrated in M components with low angular frequencies ($-k_{\max} \le k \le k_{\max}$), whereas the additive white Gaussian noise spreads over all components. Representing the images using only the top M basis elements can compress and denoise the images. Therefore, we use the truncated expansion coefficients with M terms instead of the total number of pixels in the original image. We define the bispectrum for the steerable basis expansion coefficients as
$$b_{k_1,k_2,q_1,q_2,q_3} = a_{k_1,q_1}\, a_{k_2,q_2}\, \overline{a_{k_1+k_2,q_3}}, \qquad (3.17)$$
where $k_1$ and $k_2$ are the angular indices and $q_1$, $q_2$ and $q_3$ are the radial indices. It is worth noting that due to the symmetry of the bispectrum (3.15), it suffices to compute bispectrum coefficients for $0 \le k_1 \le k_{\max}$ and $0 \le k_2 \le \min(k_1, k_{\max} - k_1)$. The rotationally invariant bispectrum of a mirrored image is the complex conjugate of the original bispectrum.
A modification to the bispectrum is needed when treating noisy signals. Suppose the observed signal y is the true signal x contaminated with additive white Gaussian noise n, satisfying $n \sim \mathcal{N}(0, \sigma^2 I)$:
$$y = x + n. \qquad (3.18)$$
Then the expansion coefficients satisfy
$$a^y_{k,q} = a^x_{k,q} + a^n_{k,q}, \qquad (3.19)$$
where $a^n_{k,q}$ satisfies $\mathbb{E}\, a^n_{k,q} = 0$ and $\mathbb{E}\, a^n_{k_i,q_l} \overline{a^n_{k_j,q_m}} = \sigma^2 \delta_{ij} \delta_{lm}$. Then the bispectrum of y satisfies
$$\mathbb{E}\, b^y_{k_1,k_2,q_1,q_2,q_3} = \mathbb{E}\left[ (a^x_{k_1,q_1} + a^n_{k_1,q_1})(a^x_{k_2,q_2} + a^n_{k_2,q_2})\overline{(a^x_{k_1+k_2,q_3} + a^n_{k_1+k_2,q_3})} \right]$$
$$= a^x_{k_1,q_1} a^x_{k_2,q_2} \overline{a^x_{k_1+k_2,q_3}} + \mathbb{E}\left[ a^n_{k_1,q_1} a^n_{k_2,q_2} \overline{a^n_{k_1+k_2,q_3}} \right] + a^x_{k_2,q_2} \overline{a^x_{k_1+k_2,q_3}}\, \mathbb{E}\left[ a^n_{k_1,q_1} \right] + a^x_{k_1,q_1} \overline{a^x_{k_1+k_2,q_3}}\, \mathbb{E}\left[ a^n_{k_2,q_2} \right]$$
$$+ a^x_{k_1,q_1} a^x_{k_2,q_2}\, \mathbb{E}\left[ \overline{a^n_{k_1+k_2,q_3}} \right] + a^x_{k_1,q_1}\, \mathbb{E}\left[ a^n_{k_2,q_2} \overline{a^n_{k_1+k_2,q_3}} \right] + a^x_{k_2,q_2}\, \mathbb{E}\left[ a^n_{k_1,q_1} \overline{a^n_{k_1+k_2,q_3}} \right] + \overline{a^x_{k_1+k_2,q_3}}\, \mathbb{E}\left[ a^n_{k_1,q_1} a^n_{k_2,q_2} \right]. \qquad (3.20)$$
As a result,
$$\mathbb{E}\, b^y_{k_1,k_2,q_1,q_2,q_3} = b^x_{k_1,k_2,q_1,q_2,q_3} + \sigma^2 (\delta_{q_2,q_3} a^x_{0,q_1} + \delta_{q_1,q_3} a^x_{0,q_2} + \delta_{q_1+q_2,0} a^x_{0,q_3}). \qquad (3.21)$$
Therefore, if $a^x_{0,q} = 0$ for all q, then the bispectrum is unbiased, i.e., $\mathbb{E}\, b^y = b^x$. As a result, removing the zero-frequency part of the modified bispectrum makes it less sensitive to contamination by additive white Gaussian noise.
Van Heel et al. [89, 112] have previously noted that the ACF overweighs the already strong frequency components in the image, due to the squaring of the Fourier components, and therefore they defined a self correlation function (SCF) which underemphasizes all amplitudes by replacing them with their square roots. The SCF was shown to perform better than the ACF. A similar situation occurs for the bispectrum, due to the multiplication of three frequency components. We therefore modify the expansion coefficients such that the amplitude is the cubic root of the original:
$$\tilde{a}^i_{k,q} = \frac{a^i_{k,q}}{|a^i_{k,q}|^{2/3}}. \qquad (3.22)$$
Notice that the phase information of the bispectrum is unaltered, as only the amplitudes are modified. It is natural to take the cubic root since in this way the bispectrum scales linearly with the intensity of the image (that is, multiplying an image I by a constant c results in multiplication of $\tilde{b}$ by c, instead of $c^3$ for b). However, we note that the bispectral invariant features are not Lipschitz continuous. Figure 3.3 illustrates that after building the invariant features, the underlying geometry of the data is that of $S^2$.
Figure 3.3: Illustration of the geometric structure of rotationally invariant features
for centered clean cryo-EM images.
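To make the construction of (3.17) and (3.22) concrete, here is a minimal Python/NumPy sketch; the input layout (a dictionary of per-frequency coefficient vectors) is an assumption for illustration, and the dense radial triplets shown here are, in the actual pipeline, compressed by PCA as described below.

```python
import numpy as np

def invariant_features(a, k_max):
    """Bispectrum-like invariants (3.17) from coefficients a[k] (1D complex
    arrays over the radial index q, for k = 0..k_max), after the cube-root
    amplitude normalization (3.22)."""
    t = {k: v / np.maximum(np.abs(v), 1e-12) ** (2.0 / 3.0) for k, v in a.items()}
    feats = []
    for k1 in range(k_max + 1):
        for k2 in range(min(k1, k_max - k1) + 1):   # symmetry region of (3.15)
            # all radial triplets (q1, q2, q3) for this pair (k1, k2)
            b = np.einsum('i,j,k->ijk', t[k1], t[k2], np.conj(t[k1 + k2]))
            feats.append(b.ravel())
    return np.concatenate(feats)
```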
The rotationally invariant image representation derived in (3.17) is of very high dimensionality. Suppose that the truncated expansion coefficients have M components and that the corresponding maximum angular frequency is $k_{\max}$; then the resulting invariant feature vector is of length $O\left(\frac{M^3}{k_{\max}}\right)$. Computing the inner product of vectors of such length can be quite expensive. It is therefore required to reduce the dimen-
sionality of the invariant feature vectors. While this reduction can be achieved by
PCA, the typically large number of images and the high dimensionality of the feature
vectors make the computational cost of classical PCA quite demanding. Instead, we
use the recently proposed randomized PCA algorithm [83, 35, 34]. The randomized
PCA algorithm is a very efficient method for determining the low rank approximation
of a large matrix, and detailed performance analysis and computational complexity
are described in [83, 35, 34]. The algorithm seeks a rank M ′ approximation matrix B
for a data matrix A of size m × n. It tries to find the range of A by first generating
a real l × m matrix G (l ≪ n) whose entries are i.i.d. Gaussian random variables
of zero mean and unit variance, and then applies G to A to form R = GA. Using a pivoted QR decomposition of $R^T$, we form a matrix Q whose columns are orthonormal, and project the original data matrix A onto this orthonormal basis to form T = AQ. A singular value decomposition is then performed on the much smaller matrix T of size m × l, $T = U\Sigma W^T$. The low rank approximation of A is $B = U_{M'} \Sigma_{M'} V_{M'}^T$, where V = QW. In this way, we reduce the dimensionality of the bispectrum-like rotationally invariant data matrix.
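A sketch of the range-finding idea (Python/NumPy, using a plain QR factorization rather than the pivoted variant, and with an added oversampling parameter; it illustrates the scheme of [83, 35, 34] rather than reproducing it exactly):

```python
import numpy as np

def randomized_low_rank(A, M_prime, oversample=10, seed=0):
    """Rank-M' approximation A ~ U S V^T via a Gaussian sketch R = G A."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = min(M_prime + oversample, min(m, n))
    G = rng.standard_normal((l, m))              # i.i.d. Gaussian test matrix
    Q, _ = np.linalg.qr((G @ A).T)               # orthonormal basis for rows of A
    T = A @ Q                                    # small m x l projection
    U, S, Wt = np.linalg.svd(T, full_matrices=False)
    V = Q @ Wt.T                                 # approximate right singular vectors
    return U[:, :M_prime], S[:M_prime], V[:, :M_prime]
```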
3.4 Nearest neighbor search and rotational alignment
We define the rotationally invariant affinity between image $I_i$ and image $I_j$ as the normalized cross-correlation between their corresponding low dimensional feature vectors $\tilde{b}^i$ and $\tilde{b}^j$ (of length M′):
$$C_{ij} = \mathrm{Re}\left\{ \frac{\sum_{k=1}^{M'} \tilde{b}^i(k)\, \overline{\tilde{b}^j(k)}}{\|\tilde{b}^i\| \|\tilde{b}^j\|} \right\}. \qquad (3.23)$$
A fixed number of nearest neighbors with the largest normalized cross-correlation $C_{ij}$ with image i are determined, with computational complexity $O(n^2 M')$. For large data sets, consisting of $10^5$ images or more, the randomized approximate nearest neighbor (RANN) search algorithm [43] is an efficient way of finding the nearest neighbors without computing $C_{ij}$ for all pairs of i and j. RANN is an iterative algorithm. It first randomly rotates the data points (in our case, complex valued vectors of length M′) and subdivides them into smaller boxes by looking at 1, 2, 3, 4, ... coordinates, until each box contains about κ points. Then the suspected nearest neighbors are determined locally as those in the same boxes. The process is repeated through independent iterations, and the list of suspected neighbors is refined. In practice only a small number of iterations is needed (see Table 3.1 and Table 3.3) in order to find the true nearest neighbors with very high probability. The computational complexity of this randomized algorithm is $O(Tn(M' \log M' + \kappa \log\kappa \log n) + n\kappa^2 (M' + \log\kappa))$, where T is the number of iterations and κ is the number of nearest neighbors.
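For moderate n, the affinity (3.23) and the neighbor lists can be computed by brute force in a few lines (a Python/NumPy sketch with an assumed input layout, shown here as a baseline for the randomized search):

```python
import numpy as np

def nearest_neighbors(B, kappa):
    """kappa nearest neighbors per image from reduced invariant features.
    B is an (n x M') complex array whose rows are the vectors b~^i."""
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    C = np.real(Bn @ Bn.conj().T)       # all affinities C_ij of (3.23), O(n^2 M')
    np.fill_diagonal(C, -np.inf)        # exclude trivial self matches
    return np.argsort(-C, axis=1)[:, :kappa]
```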
After classifying images of similar views, we rotationally align images with their nearest neighbors. The in-plane rotation angle $\alpha^*_{ij}$ for a pair of neighboring images $I_i$ and $I_j$ is determined using the top M steerable basis expansion coefficients with highest angular frequency $k_{\max}$ as
$$\alpha^*_{ij} = \operatorname*{argmax}_{\theta} \sum_{k=-k_{\max}}^{k_{\max}} \sum_{q=1}^{p_k} a^i_{k,q}\, \overline{a^j_{k,q}}\, e^{\imath k\theta}, \qquad (3.24)$$
where $p_k$ is the highest radial frequency associated with the angular frequency k.
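The maximization in (3.24) reduces to a one dimensional search over θ; a Python/NumPy sketch (grid search for clarity; the coefficient layout is assumed, and negative frequencies are folded in using the conjugate symmetry of real images):

```python
import numpy as np

def inplane_alignment(a_i, a_j, k_max, n_theta=360):
    """Approximate alpha*_ij of (3.24); a_i[k], a_j[k] are 1D arrays of
    coefficients over the radial index q, for k = 0..k_max."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    corr = np.zeros(n_theta)
    for k in range(k_max + 1):
        c_k = np.sum(a_i[k] * np.conj(a_j[k]))   # sum over q for frequency k
        weight = 1.0 if k == 0 else 2.0          # the -k term is the conjugate
        corr += weight * np.real(c_k * np.exp(1j * k * theta))
    return theta[np.argmax(corr)]
```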
3.5 Shift alignment
The input images are the result of the particle picking procedure and are therefore not perfectly centered. To find images from similar viewing directions and to align them well with their nearest neighbors, we also need to search over shifts. We now explain why it is hard, at the class averaging stage, to center all projection images.
There are three degrees of freedom in defining the centers of the images. The three degrees of freedom correspond to the definition of the center of the three-dimensional molecule. We can fix the three degrees of freedom by choosing the center of mass of the volume as the origin. Then the center of mass of the clean projection images should also be at the origin. Therefore, for clean images with the same CTF, we can center the images by finding the center of mass of the projection images. However, this method performs poorly at low SNR and when images are pooled together from different defocus groups. The common practice in the field is to shift-align the images iteratively by correlating them with the mean of the data set or with a circular reference image. The estimation error of this procedure is typically of the order of 5 pixels in each direction.
To align images, we have to perform a brute force shift search for the rudimentarily shift-aligned images. For image i and image j of the same view, with alignment angle $\alpha^*_{ij}$ and relative shift $(s_{ij,x}, s_{ij,y})$, the following equation holds:
$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} - \begin{pmatrix} \cos\alpha^*_{ij} & -\sin\alpha^*_{ij} \\ \sin\alpha^*_{ij} & \cos\alpha^*_{ij} \end{pmatrix} \begin{pmatrix} x_j \\ y_j \end{pmatrix} = \begin{pmatrix} s_{ij,x} \\ s_{ij,y} \end{pmatrix}, \qquad (3.25)$$
where $(x_i, y_i)$ and $(x_j, y_j)$ are the locations of the centers of the projection images.
Equation (3.25) is exact only when i and j share exactly the same viewing direction. When they are slightly different, this equation is not exact anymore. Therefore,
the least squares solution to (3.25) does not produce the true global shifts (xi , yi ).
The least squares solution would perform well in aligning neighboring images, but it
is not expected to find the shifts between different classes.
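Collecting (3.25) over all nearest neighbor pairs gives an overdetermined linear system for the centers; a Python/NumPy sketch (the input format is assumed; as noted above, the solution aligns neighbors well but need not recover the true global shifts):

```python
import numpy as np

def estimate_centers(pairs, angles, shifts, n):
    """Least-squares centers (x_i, y_i) from pairwise relations (3.25).
    pairs: list of (i, j); angles: alpha*_ij; shifts: (s_ij,x, s_ij,y)."""
    A = np.zeros((2 * len(pairs), 2 * n))
    b = np.zeros(2 * len(pairs))
    for r, ((i, j), al, (sx, sy)) in enumerate(zip(pairs, angles, shifts)):
        c, s = np.cos(al), np.sin(al)
        # (x_i, y_i) - R(alpha*_ij)(x_j, y_j) = (s_x, s_y), one row per coordinate
        A[2 * r, 2 * i] = 1.0
        A[2 * r, 2 * j:2 * j + 2] = (-c, s)
        A[2 * r + 1, 2 * i + 1] = 1.0
        A[2 * r + 1, 2 * j:2 * j + 2] = (-s, -c)
        b[2 * r], b[2 * r + 1] = sx, sy
    centers, *_ = np.linalg.lstsq(A, b, rcond=None)
    return centers.reshape(n, 2)
```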
The rotationally invariant features described in previous sections are not shift
invariant. Therefore, we would like the images to be centered. However, as we
have shown above, centering the images at the stage of class averaging is hard to
achieve. As a result, in practice we produce rotationally invariant features for many
shifted copies of each image, and when comparing images we exhaustively search for
the best match. However, for the classification by VDM, the consistency of shifts is
ignored and we only use the consistency of the rotations. Once we identify the nearest
neighbors and rotational alignment, we use the least-squares solution of (3.25) to
improve the estimation to get more accurate relative shifts between nearest neighbor
pairs. Although we cannot globally center the projection images in this way, the
centers can be estimated later on using common-lines [14].
3.6 Experimental results
We performed numerical experiments to test the speed and accuracy of our algorithm on a machine with 2 Intel(R) Xeon(R) CPUs X5570, each with 4 cores, running at 2.93 GHz with 200GB RAM in total. These experiments were performed in MATLAB on a single core in a UNIX environment. We compared our algorithms against the programs in SPIDER [93]. The volume of Escherichia coli 70S ribosome-elongation factor G (EF-G) [93] is used to simulate projection images. The image size is 129×129 pixels with 2.82 Å/pixel. For experiments in SPIDER, all noisy images are filtered with a low-pass Butterworth filter, with the pass band and stop band at 0.08 and 0.12 respectively, given in reciprocal pixels, as described in [93]. To convert these values to Angstroms, divide the pixel size by the spatial frequency, i.e., in our case, 2.82/0.12 Å = 23.5 Å.
We used the Fourier-Bessel steerable PCA basis [122] (see Chapter 2) to compute the expansion coefficients. The top M components with non-negative angular frequency indices were chosen to compute the rotationally invariant representation. The invariant features were projected to an M′ dimensional space using the randomized low rank matrix approximation method [83].
In our simulation we know the original viewing angles, so for each image we compute the angles (in degrees) between the viewing direction of the image and the
viewing angles of its 50 nearest neighbors. Small angles indicate successful identification of true neighbors that belong to a small spherical cap, while large angles
correspond to outliers. We can also compute the in-plane rotational alignment angle
α̃ij defined in (3.2) and (3.3). The error of alignment is the difference between the
best alignment angle αij and α̃ij .
Figure 3.4: Top row: simulated noise-free projections of the 70S ribosome. Second
row: clean projections with Gaussian envelope function and different Contrast Transfer Functions (CTF). Third row: projection images with CTF contaminated with
white Gaussian noise (SNR=1/50). Bottom row: noisy images that have been phase
flipped.
3.6.1 Simulated clean data
We first applied the algorithm to $10^4$ centered clean 70S ribosome projection images whose viewing directions were sampled from the uniform distribution over the sphere (see the top row of Figure 3.4). For comparison, we used the SPIDER reference-free rotational alignment program AP RA to align all the images, followed by K-means clustering to cluster the aligned images into 200 classes. Our rotationally invariant classification method produced satisfactory results for the simulated clean images (see Figure 3.5a). SPIDER clusters images on a much larger spherical cap (see Figure 3.5b). This is largely due to the errors in the estimation of in-plane rotational angles by the global alignment of the images (see Figure 3.2a). K-means clustering helps reduce the rotational alignment error within a cluster by excluding pairs of similar views that are not properly aligned. However, the in-plane rotational alignment within a cluster still has some large errors (see Figure 3.5d). These results are a manifestation of the fact that it is impossible to globally align all images (see Section 3.2). The Fourier-Bessel steerable PCA based rotationally invariant classification took only 12 minutes to complete classification and alignment, whereas the SPIDER program took over an hour to finish the alignment and classification.
To test the performance of our classification method for larger data sets, we simulated $10^5$ clean centered projection images of the 70S ribosome and searched for the 50 nearest neighbors. The pre-processing for preparing the rotationally invariant features scales linearly with the number of images; however, the all-pairs correlation scales quadratically with the number of images. The classification and alignment results using the Fourier-Bessel steerable basis are shown in Figure 3.6. It is expected that the nearest neighbor pairs are concentrated on smaller spherical caps and the in-plane rotational alignments are more accurate (compared to the previous experiment with fewer images). The regular nearest neighbor search with rotationally invariant features of dimension M′ = 200, implemented in MATLAB, takes 1 hour on the large data set with $10^5$ images (see Table 3.1), and the total running time including basis expansion, invariant feature construction and alignment is about 2 hours. The randomized approximate nearest neighbor search algorithm [43] reduces the time for nearest neighbor search considerably (see Table 3.1). For $10^5$ images with M′ = 100, the running time for getting 99% correct nearest neighbors is almost 40 times faster than the naïve nearest neighbor search.
Figure 3.5: Classification and alignment results for $10^4$ simulated clean and centered projection images of the 70S ribosome. M = 100, M′ = 100 and κ = 50. (a) Histogram of angular distances between nearest neighbor pairs found by normalized cross-correlation of the rotationally invariant features. (b) Histogram of angular distances between nearest neighbor pairs found by SPIDER. (c) Histogram of errors in in-plane rotational alignment for nearest neighbor pairs using steerable PCA basis expansion coefficients. (d) Histogram of errors in in-plane rotational alignment estimated from SPIDER.
                        1 iter (sec)   % correct   4 iter (sec)   % correct   naïve (sec)
  $n = 10^4$, M′ = 100       1.1            67           4.5            99          103
  $n = 10^4$, M′ = 200       2.0            70           8.1            99          107
  $n = 10^5$, M′ = 100        17            72            68            99         2817
  $n = 10^5$, M′ = 200        31            69           123            99         3676

Table 3.1: Running time of finding κ = 50 nearest neighbors with the randomized nearest neighbor algorithm. The last column is the timing for naïve nearest neighbor search. The data sets are all simulated centered clean images.
Figure 3.6: Classification and alignment results for $10^5$ simulated clean and centered projection images of the 70S ribosome. M = 100, M′ = 100 and κ = 50. (a) Histogram of angular distances of nearest neighbor pairs found by normalized cross-correlation of rotationally invariant features. (b) Histogram of angular distances of nearest neighbor pairs found by 4 iterations of the randomized approximate nearest neighbor search algorithm. (c) Histogram of errors in in-plane rotational alignment for the nearest neighbor pairs in (a). (d) Histogram of errors in in-plane rotational alignment for the nearest neighbor pairs in (b).
3.6.2 Simulated noisy data
Images observed by electron microscope are not true projections of the specimen.
Imaging modifications include the effects of the contrast transfer function (CTF),
which is introduced through electron lens aberrations and defocusing [123], and also
the envelope function of the microscope, which contains contributions from a number
of effects, such as spatial and temporal coherence, specimen motion, etc. [36]. In
addition, background noise is present from a variety of sources. Therefore, in the
57
next experiment, we attempted to closely emulate the image formation process in the electron microscope, including the effects of the CTF, the envelope function and noise. We projected $10^4$ images at directions sampled from the uniform distribution over the sphere (see the top row of Figure 3.4). Then a Gaussian low-pass filter with half-width 1/10 Å⁻¹ was applied to simulate the effect of the envelope function. In addition, to make the test realistic, CTFs with different defocus values were applied to the images (see the third row of Figure 3.4). The contrast transfer functions in the experiment are generated according to the formula from [36], with imaging parameters taken from the simulated data in the SPIDER protocol [93]: electron beam energy E = 200 keV with wavelength λ = 0.025 Å, and spherical aberration Cs = 2 mm. The images were divided into 7 different defocus groups, with minimum defocus 2.158 µm and maximum defocus 3.459 µm (see Table 3.2). The images are then contaminated with
  defocus group      1      2      3      4      5      6      7
  defocus (µm)    2.158  2.483  2.645  2.832  3.099  3.315  3.459

Table 3.2: The 7 different defocus values in µm.
additive Gaussian white noise with SNR = 1/50 (see the third row of Figure 3.4). The SNR in all our experiments is defined by
$$\mathrm{SNR} = \frac{\mathrm{Var}(\mathrm{Signal})}{\mathrm{Var}(\mathrm{Noise})}. \qquad (3.26)$$
The images were CTF corrected by phase-flipping, which is a typical image restoration procedure at the alignment stage [67] (see Figure 3.4, bottom row). Then we applied our rotationally invariant viewing angle classification to the phase-flipped images. For SPIDER, we used AP C, which rotationally aligns all images and tries to classify them into K groups. It was used for this data set because it generated better classification and rotational alignment results than the SPIDER algorithm that was used for the clean data set. Our rotationally invariant classification achieves better results in finding particles of similar views than SPIDER (see Figure 3.7). It took only 745 seconds for our algorithm to find nearest neighbors and their in-plane alignment angles, about 5 times faster than the SPIDER program. The iterative reference-free alignment algorithm implemented in SPIDER needed 63 iterations to rotationally align the centered images, and it took 11 iterations of K-means clustering to classify them into 200 groups of particles of similar views. The distribution of angles between pairs of images in the same group (Figure 3.7b) shows that SPIDER misclassified some images of different views into the same group. In addition, Figure 3.9 shows that some clusters are much more populated than others. The maximum cluster size is 121, whereas the minimum cluster size is 3. On the contrary, in the nearest neighbor classification method, we are able to find a fixed number of nearest neighbors for each
image to average over.
Figure 3.7: Histogram of the angles between the viewing angles of images and their nearest neighbors for different classification methods. Each image is identified with its 50 nearest neighbors. (a) Normalized cross-correlation of the rotationally invariant image representation. (b) Iterative reference-free alignment and clustering implemented in SPIDER. The $10^4$ images belong to 7 different defocus groups with white noise (SNR = 1/50). All images are CTF corrected through phase-flipping.
We also simulated $10^5$ projection images with CTF and additive Gaussian white noise to test our classification and alignment method on a large noisy data set. Increasing the number of projection images achieves better classification and alignment
Figure 3.8: Histogram of errors in in-plane rotational angles between images and their nearest neighbors found by different classification methods. Each image is averaged with its 50 nearest neighbors. (a) Direct normalized cross-correlation of the rotationally invariant image representation with expansion coefficients from the Fourier-Bessel steerable PCA basis. (b) Iterative reference-free alignment and clustering implemented in SPIDER. The $10^4$ images belong to 7 different defocus groups with white noise (SNR = 1/50). All images are CTF corrected through phase-flipping.
Figure 3.9: Number of particles in each cluster. The data set has $10^4$ simulated projection images. Those images belong to 7 different defocus groups and are contaminated by additive Gaussian white noise (SNR = 1/50). All images were phase flipped.
results (see Figure 3.10) compared to using just $10^4$ images. Without the Vector Diffusion Maps refinement described in Chapter 5, the initial nearest neighbor search managed to find nearest neighbors within a 14° cap (see Figure 3.10a). The randomized approximate nearest neighbor search algorithm reduced the computational time by about 10 times and achieved a satisfactory classification (see Figure 3.10b). Table 3.3 shows the timing for $10^4$ and $10^5$ images with different numbers of rotationally invariant features; notice that the number of iterations needed to achieve the same percentage of correct nearest neighbors is larger for noisy data sets.
Figure 3.10: Classification results for $10^5$ simulated centered images from 7 different defocus groups contaminated by additive Gaussian white noise (SNR = 1/50). All images were phase flipped. M = 200, M′ = 200 and κ = 50. (a) Histogram of angular distances for nearest neighbors estimated by direct normalized cross-correlation of rotationally invariant features. (b) Histogram of angular distances for nearest neighbors found by the randomized approximate nearest neighbor search algorithm with 10 iterations.
                        1 iter (sec)   % correct   10 iter (sec)   % correct   naïve (sec)
  $n = 10^4$, M′ = 100       1.1           43.2            11            99           103
  $n = 10^4$, M′ = 200       2.0           35.5            20           97.7          107
  $n = 10^5$, M′ = 100        17           32.5           175           97.1         2822
  $n = 10^5$, M′ = 200        31           26.8           316           94.6         3619

Table 3.3: Running time of finding 50 nearest neighbors with the randomized approximate nearest neighbor search algorithm for noisy simulated projection images that belong to 7 different defocus groups. The last column is the timing for naïve nearest neighbor search in MATLAB.
3.7 Summary and discussion
Vitreous-ice-embedded biological macromolecules show a great randomness in orientation. This randomness is exactly what is desired for obtaining high quality 3D reconstructions. However, the variety of viewing angles poses a problem for the iterative reference-free alignment method, since it is mathematically impossible to bring all images into global alignment. This means that in practice, the distance computed from allegedly globally aligned images is not a rotationally invariant distance. In this chapter, we introduced a new rotationally invariant image representation for cryo-EM projection images. This invariant representation is based on the bispectrum of the expansion coefficients of the images on the Fourier-Bessel steerable PCA basis. Although the resulting invariant feature vectors are of very high dimensionality, we are able to efficiently project them onto a lower dimensional space that captures the most variability. This allows faster and more accurate nearest neighbor classification. For simulated clean and noisy projection images, the new rotationally invariant viewing angle classification and the subsequent fast rotational alignment among neighbors prove to be more accurate and efficient than the iterative reference-free alignment and K-means clustering method used in the field.
Symmetry and invariants play a major role in physics [27]. The information content of images is typically not affected under the action of groups such as translations and rotations. The framework described in this chapter can be extended to the construction of features that are both translationally and rotationally invariant. This is very useful since cryo-EM images are usually not centered. However, if we are to construct a kernel that is invariant to both translation and rotation, then, due to the intricate way in which translation and rotation interact, we need to take a slightly more abstract viewpoint and re-examine the problem from the point of view of group theory. In [49], Kondor generalized the concept of the bispectrum to SO(3) and proposed to project the 2D images onto the sphere. In this way, the non-commutative bispectrum is invariant to both rotation and translation. However, projecting images onto the sphere introduces some artifacts. Moreover, cryo-EM images are only slightly non-centered, and introducing full invariance to translation would make the features non-discriminative. Another way of building group invariant features is through group invariant scattering [57]. Such a scattering operator can build Lipschitz continuous invariants to actions of compact Lie groups G, which has found great application in texture classification [9, 10].
Chapter 4

Class Averaging Matrix

4.1 Small-world graph on $S^2$
The purpose of this section is to motivate the construction of the class averaging matrix. We provide here the intuition that leads to the construction of the class averaging
matrix, whose mathematical meaning will become apparent only in section 4.3.
As mentioned earlier, the information in the optimal in-plane rotation angles
has yet to be utilized in existing class averaging algorithms. We incorporate this
additional information as follows. When computing the optimal alignment of images
$I_i$ and $I_j$ and their invariant distance $d_{ij}$ in (3.4), we also record the rotation angle $\alpha_{ij}$ in (3.5) that brings the distance between the two images to a minimum. Note
that
$$\alpha_{ij} = -\alpha_{ji} \mod 2\pi, \qquad (4.1)$$
as the optimal rotation from $I_j$ to $I_i$ is in the opposite direction to that from $I_i$ to $I_j$.
We assume that there is a unique optimal in-plane rotation angle αij for which
the minimum in (3.5) is attained. Note that this assumption excludes, for example,
projection images of symmetric molecules when the viewing direction coincides with
the symmetry axis, and perhaps other projection images are also excluded. Note that
if Ii and Ij are clean images having the same viewing angle vi = vj , then the optimal
in-plane rotation angle αij computed in the optimization procedure (3.5) agrees with
the angle αij introduced in (1.2). In that sense, the computed angles αij provide
additional information about the unknown rotation matrices R1 , . . . , Rn .
By making a histogram of all distances $d_{ij}$, one can choose some threshold value ε, such that $d_{ij} \le \varepsilon$ is indicative that perhaps $I_i$ and $I_j$ have nearby viewing angles. The threshold ε defines an undirected graph G = (V, E) with n vertices corresponding to the projection images, with an edge between nodes i and j iff their invariant distance is smaller than ε (the ε-neighborhood graph):
$$\{i, j\} \in E \iff d_{ij} \le \varepsilon. \qquad (4.2)$$
Alternatively, an undirected graph may be constructed from the identification of
nearest neighbors, such that {i, j} ∈ E iff i is one of the N nearest neighbors of j
or j is one of the N nearest neighbors of i, where N ≪ n is a fixed parameter. Yet
another possibility is to take the intersection instead of the union; i.e., {i, j} ∈ E iff
i is one of the N nearest neighbors of j and j is one of the N nearest neighbors of i.
Either way, in an ideal noiseless world, the topology of the graph is that of S 2 ; for
each vertex there corresponds a viewing angle, which is a unit vector in three-space
realized as a point on the sphere. If all invariant distances were trustworthy such
that small distances imply similar viewing angles, then the edges of G would link
neighboring points on S 2 . The drawing of such a graph in 3D space would show
scattered points (vertices) on the sphere connected by short chords (edges). The
experimental world, however, is far from ideal and is ruled by noise, giving rise to
false edges that shortcut the sphere by long chords. Such graphs are known as “smallworld” graphs [118], a popular model to describe social network phenomena such as
the six degrees of separation: our social network consists of people living in our own
town (neighboring edges), but also some other family and friends who live across
the world (shortcut edges). Planar drawings of a ring graph and its corresponding
small-world graph are given in Figure 4.1.
Figure 4.1: (a) A ring graph with 30 vertices, each connected to its eight nearest neighbors with short edges; (b) a small-world graph obtained by randomly rewiring the edges with probability 0.2, leading to about 20% shortcut edges.
Can we tell the good edges (short chords) from the bad edges (long chords)? It is
possible to denoise small-world graphs based on the fact that they have many more
“triangles” than random graphs: two images Ii and Ij that have nearby viewing angles
should have common neighboring images Ik whose viewing angles are close to theirs.
All three edges {i, j}, {j, k}, and {k, i} are in E forming a triangle (i, j, k). On the
other hand, shortcut edges are not expected to be sides of as many triangles. This
“cliquishness” property of small-world graphs was used by Goldberg and Roth [29] to
denoise protein-protein interaction maps by thresholding edges that appear in only a
few triangles.
In the class averaging problem of cryo-EM, we can further test for the consistency
of the triangles. Indeed, if the three images Ii , Ij , and Ik share the same viewing
angle, then the three corresponding rotation angles αij , αjk , and αki must satisfy
αij + αjk + αki = 0 mod 2π,    (4.3)
because rotating first from Ii to Ij , followed by a rotation from Ij to Ik , and finally a
rotation from Ik to Ii together complete a full circle. Equation (4.3) is a consistency
relation that enables us to detect image triplets with similar viewing angles and to
identify good triangles. Similarly, we may write consistency relations that involve
four or more images.
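To make the consistency test concrete, the following sketch (an illustration only; the function name and the dense storage of the angles are assumptions, not part of the method as implemented) counts, for every edge, the triangles that satisfy (4.3) up to a tolerance:

    import numpy as np

    def consistent_triangle_counts(adj, alpha, tol=0.1):
        # adj   : (n, n) boolean adjacency matrix of the graph G = (V, E)
        # alpha : (n, n) array of optimal in-plane angles, alpha[i, j] = -alpha[j, i]
        # For each edge {i, j}, count triangles (i, j, k) whose angles satisfy
        # alpha_ij + alpha_jk + alpha_ki = 0 (mod 2*pi) up to tol.
        n = adj.shape[0]
        counts = {}
        for i in range(n):
            for j in range(i + 1, n):
                if not adj[i, j]:
                    continue
                common = np.flatnonzero(adj[i] & adj[j])  # vertices k closing a triangle
                resid = (alpha[i, j] + alpha[j, common] + alpha[common, i]) % (2 * np.pi)
                resid = np.minimum(resid, 2 * np.pi - resid)  # distance to 0 mod 2*pi
                counts[(i, j)] = int(np.sum(resid < tol))
        return counts

Edges that participate in only a few consistent triangles are then candidates for removal, in the spirit of Goldberg and Roth [29].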
At first (but incorrect; see Sections 3.2 and 4.2) inspection, the triplet consistency relation seems to be a
byproduct of an underlying angular synchronization problem [95]. If all projection
images can be initially rotated such that they are optimally rotationally aligned, then
we can let αi be the rotation angle of image Ii that brings it in sync with all other
images. The mutual rotation angles αij should satisfy the difference equations
αi − αj = αij mod 2π for (i, j) ∈ E,    (4.4)
from which the consistency relation (4.3) immediately follows:
αij + αjk + αki = αi − αj + αj − αk + αk − αi = 0 mod 2π.    (4.5)
In [95], a robust and efficient synchronization algorithm was introduced for estimating angles α1 , . . . , αn from noisy offset measurements of the form (4.4). The first
step of the synchronization algorithm is to construct an n × n Hermitian matrix H as

Hij = eιαij for {i, j} ∈ E, and Hij = 0 for {i, j} ∉ E,    (4.6)

where ι = √−1. The matrix H is Hermitian, i.e., Hij = H̄ji, because the offsets are
skew-symmetric; i.e., αij = −αji mod 2π. As H is Hermitian, its eigenvalues are real.
The second step of the synchronization algorithm is to compute the top eigenvector
v1 of H with maximal eigenvalue, and to derive an estimator α̂1, . . . , α̂n for the angles in
terms of the top eigenvector as

eια̂i = v1(i) / |v1(i)|,  i = 1, . . . , n.    (4.7)
The motivation and analysis of this eigenvector-based synchronization algorithm are
detailed in [95].
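In code, the two steps take only a few lines. The following NumPy sketch (an illustration under the stated conventions, using a dense eigensolver for simplicity) constructs H as in (4.6) and recovers the angle estimates (4.7):

    import numpy as np

    def synchronize_angles(n, edges, offsets):
        # edges   : list of index pairs (i, j); offsets : measured angles alpha_ij
        H = np.zeros((n, n), dtype=complex)
        for (i, j), a in zip(edges, offsets):
            H[i, j] = np.exp(1j * a)    # H_ij = e^{i alpha_ij}, as in (4.6)
            H[j, i] = np.exp(-1j * a)   # Hermitian, since alpha_ji = -alpha_ij
        vals, vecs = np.linalg.eigh(H)  # eigenvalues returned in ascending order
        v1 = vecs[:, -1]                # top eigenvector (maximal eigenvalue)
        return np.angle(v1)             # estimator (4.7): e^{i alpha_hat_i} = v1(i)/|v1(i)|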
4.2 Hairy ball theorem
Unfortunately, the top eigenvector of H constructed with the offset angles given
in (3.5) does not provide rotation angles that would bring all the projection images in
sync due to the hairy ball theorem (see Section 3.2). One may argue that we should
not care too much about the hairy ball theorem, because the number of images n
is finite, so it is always possible to remove an arbitrarily small spherical cap from
the sphere that does not include any of the n viewing angles, and that the resulting
chopped sphere can be combed, using, e.g., the tangent vector field f given in (3.6).
We move on to explain that the hairy ball theorem affects the eigenvectors of the
matrix H even when the number of images is finite, and that the above argument for
chopping off the sphere fails already for moderate values of n.
To that end, recall that in the noise-free case, the underlying graph used to construct H is a neighborhood graph on S². It may be further assumed that an edge
between Ii and Ij exists if and only if the angle between their viewing angles is less
than or equal to β, that is, {i, j} ∈ E ⇔ ⟨vi, vj⟩ > cos β. In other words, tilting the
molecule by less than β results in a projection image that is similar enough to be
recognized as a neighbor of the original (non-tilted) image. Consider now the union
of n spherical caps of opening angle β centered at the viewing angles v1 , . . . , vn . If
this union of n spherical caps entirely covers the sphere, then it would be impossible
to chop an arbitrarily small cap off the sphere without affecting the underlying graph
and the matrix H. Since the surface area of a spherical cap with an opening angle
β is 4π sin²(β/2), the total surface area covered by all n caps cannot exceed 4πn sin²(β/2)
(union bound). As the total surface area of the sphere is 4π, we see that a complete
coverage of the sphere with random spherical caps is impossible for n ≤ 1/sin²(β/2). For
such small values of n we do not “see” the topology of the sphere through the connectivity graph, and the hairy ball theorem would not be effective. However, we are
much more interested in large values of n corresponding to large image data sets that
we want to class average in order to increase the SNR.
For large values of n, the probability that the sphere is not fully covered is exponentially small. The coverage problem of the sphere by random spherical caps is
considered in [102, Chapter 4, pp. 85-96]. Let Pr(n, β) be the probability that n
random spherical caps of opening angle β cover the sphere. Though we are not familiar with the exact value of Pr(n, β) except for special values of β and n, it is easy to
derive useful upper and lower bounds for Pr(n, β) as well as its asymptotic behavior
in the limit n → ∞. For example, the probability that the north pole is not covered
by any cap is (1 − sin²(β/2))^n. If the sphere is covered, then in particular the north pole
is covered, and thus a simple upper bound for Pr(n, β) is given by

Pr(n, β) ≤ 1 − (1 − sin²(β/2))^n.    (4.8)
A slightly more involved geometric argument [102] shows that a lower bound is given
by
Pr(n, β) ≥ 1 − (4/3) n(n − 1) sin²(β/2) (1 − sin²(β/2))^(n−1).    (4.9)
Combining the lower and upper bounds yields
lim_{n→∞} log(1 − Pr(n, β)) / n = log(1 − sin²(β/2)).    (4.10)

In particular, Pr(n, β) goes to 1 exponentially quickly as n → ∞; that is,

1 − Pr(n, β) ≈ exp(n log(1 − sin²(β/2))).    (4.11)
In other words, the hairy ball theorem affects the eigenstructure of H as soon as the
number of images n is of the order of 1/(− log(1 − sin²(β/2))) (∼ 1/sin²(β/2) for small β).
4.3 Spectral properties of the class averaging matrix
We note that (4.3) can hold (approximately) without (4.4). So, instead of getting
too pessimistic from the hairy ball theorem, we observe that the matrix H is a well-defined Hermitian matrix, and nothing prevents us from computing its eigenvectors
and eigenvalues. In fact, we expect the eigenvectors of H to somehow capture the
consistency relation (4.3) and all higher-order consistency constraints that correspond
to cycles longer than three. The question is whether or not the eigenvectors of H are
meaningful in the sense that they will allow us to identify images with nearby viewing
angles, and if so, how?
We show that there is indeed a relatively simple algorithm that successfully identifies images with nearby viewing directions using the top three (or more) eigenvectors
of H. The underpinning of the algorithm is the connection between the matrix H and
the parallel transport operator on the sphere. To that end, consider the case in which
the underlying graph is a neighborhood graph on S², without any shortcut edges (recall that the molecule is assumed to be generic, having a trivial point group symmetry, i.e., no special symmetry, which ensures that projection images taken at different imaging directions cannot be perfectly rotationally aligned).
More specifically, we assume that the neighborhood of every vertex corresponds to a
small spherical cap with an opening angle β; that is, if v1, v2 ∈ S² are two different
viewing angles, then there is an edge between v1 and v2 iff ⟨v1, v2⟩ > cos β. We denote
the resulting matrix by H clean to emphasize that there are no shortcut edges, which
will be the case if the matrix is constructed from clean projection images that contain
no noise.
As mentioned earlier, we may think of a projection image I with corresponding
rotation matrix R as living on the tangent plane to the sphere at the viewing angle
v = R3 . The tangent plane at the point v can be identified with the standard
Euclidean plane R² using the first two columns R1, R2 of R. Let us denote this
copy of R² by TR. (Our notation is somewhat nonstandard: the reader should not confuse TR with the tangent plane to SO(3) at the point R; for us, TR denotes the tangent plane to S² at the point v(R), where the subscript R conveys that we equip this tangent space with the orthonormal basis given by the columns R1, R2 of the matrix R.) In addition, we note that TR can be further identified with C: any
vector (a, b)^T ∈ TR is identified with the complex number z = a + ιb ∈ C. This complex
structure is the one induced from the orientation defined by the viewing angle v.
Between any two non-antipodal points vi , vj ∈ S 2 there is a unique geodesic line
(a great circle) that connects them. We can slide any vector tangent to the sphere at
vj along this geodesic line in such a way that the sliding vector remains tangent to the
sphere until it reaches the point vi , where it becomes a tangent vector to the sphere at
vi . During this transportation, we make sure that the angle that the tangent vector
makes with the geodesic remains constant. The transportation that takes vectors in
TRj to vectors in TRi is a linear transformation denoted by TRi,Rj : TRj → TRi and is
known in differential geometry as the parallel transport operator on the sphere [19,
Chapter 4].
The main observation is that whenever two projection images Ii and Ij have nearby
viewing angles vi and vj satisfying ⟨vi, vj⟩ > cos β, the matrix element (H clean)ij = eιαij
is an approximation to the parallel transport operator TRi,Rj : TRj → TRi, viewed
as an operator from C to C. That is, if zj ∈ TRj, then eιαij zj ≈ TRi,Rj zj ∈ TRi
(in Appendix B.5 we give the explicit formula of the parallel transport operator on
the sphere and show that it coincides with the rotation implied by the optimization
procedure (3.1)).
This implies that the limiting class averaging matrix H = lim_{n→∞} (1/n) H clean in the
limit of an infinite number of clean images whose viewing angles are independently
drawn from the uniform distribution on S² is a local version of the parallel transport
operator T on the sphere. The parallel transport operator takes tangent vector fields
to the sphere to tangent vector fields on the sphere. The (global) parallel transport
operator T can be written in terms of the following integral over SO(3):
(T f)(R) = ∫_{SO(3)} TR,U f(U) dU,    (4.12)
where dU is the uniform (invariant Haar) measure over SO(3), and f is any complex-valued function on SO(3) satisfying f(R) ∈ TR for all R ∈ SO(3). We define the
local parallel transport operator Th by the integral
(Th f)(R) = ∫_{U∈SO(3) : ⟨v(U), v(R)⟩ > 1−h} TR,U f(U) dU,    (4.13)
where h = 1 − cos β ∈ [0, 2] and β is the opening angle of the spherical cap. To
conclude, we obtain
H = Th.    (4.14)
In this regard, the matrix H clean should be considered as a discretization of Th . Note
that from the experimental data obtained from rotationally aligning the images we
are only able to extract (a discretization of) the local parallel transport operator.
This is due to the fact that we do not know at this point how to transport vectors between projection images (tangent planes) whose viewing angles are far apart,
because the angle αij obtained from the rotational alignment (3.5) of dissimilar
images is meaningless in such cases in the sense that it has nothing to do with parallel
transportation.
It follows that the spectral properties of the matrix H clean are governed by the
spectral properties of the operator Th. We now state without proof (for complete proofs we refer readers to [33], which builds on the analysis presented in [32]) a few results
concerning the spectrum of Th .
The first result states that the multiplicities of the eigenvalues λn (h) of Th are
3, 5, 7, 9, . . . , that is, the following:
The multiplicity of λn (h) is 2n + 1, n = 1, 2, . . . .
Note that the multiplicity 1 that appears in the angular synchronization problem
disappears from the spectrum of Th , a fact that may be attributed to the hairy ball
theorem. The precise formula for the first four eigenvalues is
λ1(h) = (1/2)h − (1/8)h²,
λ2(h) = (1/2)h − (5/8)h² + (1/6)h³,
λ3(h) = (1/2)h − (11/8)h² + (25/24)h³ − (15/64)h⁴,
λ4(h) = (1/2)h − (19/8)h² + (27/8)h³ − (119/64)h⁴ + (7/20)h⁵.

These formulas are valid for all values of h ∈ [0, 2]. The graphs of λn(h), for n =
1, 2, 3, 4, are given in Figure 4.2.
The second result states that the eigenspace of multiplicity 3 (n = 1) corresponds
to the largest eigenvalue of Th . That is, λ1 (h) > λn (h) for n ≥ 2 and for all h ∈ [0, 2].
For example, Figure 4.2 demonstrates that λ1 (h) dominates λ2 (h), λ3 (h), and λ4 (h).
The spectral gap is evident from Figure 4.2, which shows that for small values of
h the second largest eigenvalue among the first four eigenvalues is λ2 (h). In fact,
λ2 (h) > λn (h) for all h ≤ 1/2 and n ≥ 3 (note that λ2 (h) has its maximum at
h = 1/2). Hence, the spectral gap ∆(h) is
∆(h) = λ1(h) − λ2(h) = (5/8)h² − (1/8)h² − (1/6)h³ ∼ (1/2)h²  for h ≪ 1.    (4.15)

Figure 4.2: The eigenvalues λn(h) of the local parallel transport operator Th for n = 1, 2, 3, 4.
The third result states that if Ri , Rj ∈ SO(3) are two rotations, and ψ1 , ψ2 , ψ3
form an orthonormal basis for the top eigenspace of Th of multiplicity 3, that is,
ψ1 , ψ2 , ψ3 are the top three eigenfunctions of Th satisfying
Th ψm = λ1(h) ψm,  m = 1, 2, 3,    (4.16)
then
⟨v(Ri), v(Rj)⟩ = 2 |⟨Ψ(Ri), Ψ(Rj)⟩| / (‖Ψ(Ri)‖ ‖Ψ(Rj)‖) − 1,    (4.17)
where for every rotation R ∈ SO(3) we define the vector Ψ(R) ∈ C3 as
Ψ(R) = (ψ1(R), ψ2(R), ψ3(R)).    (4.18)
Note that the dot product on the left-hand side of (4.17) is between vectors in R³,
while the dot product and norms on the right-hand side are of vectors in C³. The
result (4.17) means that the top three eigenvectors of H allow us to express the dot
product ⟨v(Ri), v(Rj)⟩ in terms of the dot product between the corresponding unit
vectors Ψ(Ri)/‖Ψ(Ri)‖ and Ψ(Rj)/‖Ψ(Rj)‖ in C³. The third
result is of extreme importance, as it lies at the heart of our algorithm. In Appendix B
we give an elementary proof of this result. That proof requires neither knowledge of
representation theory nor familiarity with the tools developed in [32, 33].
4.4 Algorithm
Taking these three spectral properties into account leads to a simple algorithm for
finding images that correspond to similar viewing angles. The input to the algorithm
consists of n projection images I1 , . . . , In (n is often twice as large as the number of
raw images, because for every raw image we artificially add its mirrored image). We
now detail the various steps of the algorithm:
1. Compute the rotationally invariant distances dij and the optimal alignment angles αij as in (3.4) and (3.5) between all pairs of images. (The specific metric used to measure distances between images need not be the Euclidean distance; we leave its choice to the user. It is possible, of course, to first denoise the images, e.g., by applying a radial mask and filtering, and to normalize them so that all have the same Euclidean norm, in order to make the distances statistically more significant. Also, there is no need to store all O(n²) distances and rotation angles; to save storage we keep only those corresponding to edges in the graph defined in step 2.)
2. Construct the sparse n × n Hermitian matrix H as defined in (4.6), with the
edge set E defined in (4.2) or, preferably, by taking the N nearest neighbors of
every image (where N ≪ n is a fixed parameter).
3. Since some vertices may have more edges than others, we define H̃ = D−1H,
where D is an n × n diagonal matrix with Dii = Σ_{j=1}^n |Hij|. Notice that H̃ is
similar to the Hermitian matrix D−1/2 H D−1/2.
4. Compute the top three normalized eigenvectors of H̃, denoted ψ1 , ψ2 , ψ3 ∈ Cn .
Since H̃ is a sparse matrix, its top three eigenvectors are most efficiently computed using an iterative method, such as the MATLAB eigs function.
5. Use ψ1 , ψ2 , ψ3 to define n vectors Ψ1 , . . . , Ψn ∈ C3 by
Ψi = (ψ1(i), ψ2(i), ψ3(i)),  i = 1, . . . , n,    (4.19)
where ψj (i) is the ith entry of the vector ψj (j = 1, 2, 3).
6. Define a measure of affinity Gij between images Ii and Ij as
Gij = 2 |⟨Ψi, Ψj⟩| / (‖Ψi‖ ‖Ψj‖) − 1,  i, j = 1, . . . , n.    (4.20)
7. Declare neighbors of Ii as
neighbors of Ii = {j : Gij > 1 − γ},    (4.21)
where 0 < γ ≪ 1 controls the size of the neighborhood, or, alternatively, it is
also possible to choose some fixed number of the largest Gij values to define
neighbors.
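For illustration, here is a minimal NumPy/SciPy sketch of steps 2-7 (not the code used for the experiments reported below; it assumes the distances dij and angles αij from step 1 are available as dense n × n arrays, and it forms the dense affinity matrix, so it is only practical for moderate n):

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.linalg import eigsh

    def classify_viewing_angles(d, alpha, N=40, gamma=0.1):
        n = d.shape[0]
        rows, cols, vals = [], [], []
        for i in range(n):
            nbrs = np.argsort(d[i])[1:N + 1]          # N nearest neighbors, skip self
            rows.extend([i] * N)
            cols.extend(nbrs)
            vals.extend(np.exp(1j * alpha[i, nbrs]))  # entries e^{i alpha_ij} of (4.6)
        H = csr_matrix((vals, (rows, cols)), shape=(n, n))
        H = (H + H.conj().T) / 2                      # symmetrize; keeps H Hermitian
        deg = np.asarray(abs(H).sum(axis=1)).ravel()
        dhalf = 1.0 / np.sqrt(deg)
        Hn = H.multiply(np.outer(dhalf, dhalf))       # D^{-1/2} H D^{-1/2}, similar to D^{-1} H
        _, psi = eigsh(Hn.tocsc(), k=3, which='LA')   # top three eigenvectors (step 4)
        Psi = psi / np.linalg.norm(psi, axis=1, keepdims=True)  # unit vectors in C^3 (step 5)
        G = 2 * np.abs(Psi @ Psi.conj().T) - 1        # affinity (4.20) (step 6)
        return [np.flatnonzero(G[i] > 1 - gamma) for i in range(n)]  # step 7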
In section 4.5 we consider a specific model for which the algorithm is shown to
successfully detect the true neighbors, while in section 4.6 we detail the results of
numerous experiments that demonstrate its usefulness in practice. Moreover, in subsection 4.6.4 we describe a generalization of the algorithm that uses more than three
eigenvectors and discuss the reason for which it succeeds even if the viewing directions
are not uniformly distributed.
4.5 Probabilistic model and random matrix theory
We now explain why the algorithm succeeds in identifying the true neighbors. To
that end, we will assume a specific probabilistic model for the entries of H. Our
simplified model tries to capture and approximate the main features of the matrix
H when it is constructed by computing rotationally invariant distances and optimal
angles between noisy images.
We start by randomly generating n rotations R1 , . . . , Rn uniformly (according to
the Haar measure) on SO(3). Our model assumes that if Ii and Ij are two projection
images whose viewing angles both belong to a small spherical cap of size β, then with
probability p the rotationally invariant distance dij will be small enough such that
there would be an edge between them in the graph (i.e., {i, j} ∈ E) and that the
optimal angle αij would be accurate in the sense that it would be given by (3.2)-(3.3).
With probability 1−p the distance dij is not small enough for creating an edge between
i and j, and instead there would be a link between i and some random vertex, drawn
uniformly at random from the remaining vertices (not already connected to i). We
assume that if the link between i and j is a random link (such that the corresponding
viewing angles are far away from each other), then the optimal in-plane rotation angle
αij is uniformly distributed in [0, 2π). In our model, the only links existing between
projection images Ii and Ij whose viewing angles do not belong to a small spherical
cap of size β are these shortcut edges obtained by rewiring other good links, and
there are no other links between them. In other words, our model assumes that the
underlying graph of links between noisy images is a small-world graph on the sphere,
with edges being randomly rewired with probability 1 − p. The angles take their
correct values for true links and random values for shortcut edges.
The matrix H is a random matrix under this model. Since the expected value of
the random variable eια vanishes for α ∼ Uniform[0, 2π), that is, Eeια = 0, we have
that the expected value of the matrix H is
EH = p H clean,    (4.22)
where H clean is the class averaging matrix that corresponds to p = 1 obtained in the
case that all links and angles are inferred correctly. Here we take full advantage of
our construction of H that sets the elements of H to be complex-valued numbers that
average to 0 (instead of nonnegative entries like 0 and 1 of the adjacency matrix that
cannot have zero mean). We conclude that the matrix H can be decomposed as
H = p H clean + R,    (4.23)
where R is a random matrix whose elements are independent and identically distributed (i.i.d.) zero mean random variables with finite moments (the elements of
R are all bounded). The decomposition (4.23) is extremely useful, as it means that
the top eigenvectors of H approximate the top eigenvectors of H clean as long as the
2-norm of R is not too large. Bounds on the spectral norm of the random sparse
matrices are proved in [46, 47]. Adapting [47, Theorem 2.1, p.126] to our case shows
that ‖R‖₂ = O(β√n) with high probability, since the average degree of the graph
is n sin²(β/2) = O(β²n). In the next section we provide the results of numerical experiments from which it seems that the distribution of the eigenvalues of R follows
Wigner’s semicircle law [119, 120]. Moreover, we note that the spectral norm of
H clean is O(nβ²), since (1/n) H clean converges to the operator H, whose spectral norm is
O(h) = O(β²). Similarly, the spectral gap of H clean is O(nh²) = O(nβ⁴), since the
spectral gap of H is O(h²) = O(β⁴). The decomposition (4.23) and the bound on ‖R‖₂
imply that the top three eigenvectors of H approximate the top three eigenvectors of
H clean as long as ‖R‖₂ is smaller than the spectral gap of p H clean or, equivalently, for
p > pc, where pc = pc(β, n) = O(1/(β³√n)) is the threshold probability that depends on
the size of the cap and the number of images.
As discussed earlier, the matrix (1/n) H clean converges almost surely to the operator
H in the limit n → ∞. The convergence rate is at least as fast as 1/√n due to the law
of large numbers. Altogether, it follows that for large enough n, the linear span of
the top three eigenvectors of H is a discrete approximation of the top eigenspace of
H, because the top eigenvectors of H approximate the top eigenvectors of H clean , but
these approximate the top eigenfunctions of H. Finally, the spectral properties of H
(Section 4.3) ensure the success of the algorithm under this probabilistic model.
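The model itself is straightforward to simulate. The sketch below (an illustration, not the code behind the figures; the closed-form 2D Procrustes angle is used as a stand-in for the alignment formula (3.2)-(3.3)) draws uniform rotations, builds H with a fraction 1 − p of rewired edges, and returns its spectrum:

    import numpy as np
    from scipy.stats import special_ortho_group

    def simulate_model(n=1000, cos_beta=0.7, p=0.7, seed=0):
        rng = np.random.default_rng(seed)
        R = special_ortho_group.rvs(3, size=n, random_state=seed)  # uniform rotations
        v = R[:, :, 2]                       # viewing angles = third columns of the R_i
        H = np.zeros((n, n), dtype=complex)
        for i in range(n):
            for j in range(i + 1, n):
                if v[i] @ v[j] <= cos_beta:  # only pairs within the spherical cap
                    continue
                if rng.random() < p:         # good link: exact in-plane angle
                    B = R[i].T @ R[j]        # Procrustes angle of the upper 2x2 block
                    a = np.arctan2(B[1, 0] - B[0, 1], B[0, 0] + B[1, 1])
                else:                        # rewired link: random vertex, random angle
                    j = int(rng.integers(n))
                    if j == i:
                        continue
                    a = rng.uniform(0.0, 2 * np.pi)
                H[i, j] = np.exp(1j * a)
                H[j, i] = np.conj(H[i, j])
        return np.linalg.eigvalsh(H), v      # full spectrum (semicircle plus top multiplets)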
4.6 Numerical experiments
We conducted two types of numerical experiments. The first type involves simulations
of the probabilistic model of section 4.5. The second type mimics the experimental
setup by applying the algorithm to noisy simulated projection images of a given
3D volume. We point out that there is no direct way to compare the performance
of classification algorithms on real microscope images, since their viewing angles are
unknown. The only way to compare classification algorithms on real data is indirectly,
by evaluating the resulting 3D reconstructions. Here we conduct only numerical
experiments from which conclusions can be drawn directly. All experiments in this
section were executed on a Linux machine with eight Xeon 2.93GHz cores and 48GB
of RAM. Unless otherwise specified, the executed code was not parallelized and so
only one core was active.
4.6.1 Experiments with the probabilistic model
The purpose of the first experiment of this type is to illustrate the emergence of
Wigner’s semicircle law and the performance of the algorithm for n = 1000, cos β =
0.7 (β = 45.6°), and different values of p. We chose this relatively small value for n
(n = 1000), because showing the semicircle requires the computation of all eigenvalues
of H, not just the few top ones that are required by the algorithm. Also, the value
of β = 45.6◦ is too large to be considered realistic, as it implies that projection
images whose viewing angles differ by as much as 45◦ are similar enough to have
a small rotationally invariant distance. We chose this high value of β so that the
underlying graph would have enough edges and the spectral gap would be noticeable.
The number of graph edges in this experiment is m ≈ 37, 500. We emphasize once
again that the purpose of this experiment is of an illustrational nature.
Figure 4.3 summarizes the results of the first experiment. Evident from the histogram of the eigenvalues of H (left column) is the emergence of the semicircle that
gets slightly wider as p decreases (the right edges of the semicircles never exceed 25).
The multiplicities 3, 5, 7, . . . of the top eigenvalues are clearly demonstrated in the
bar plots of the eigenvalues (middle column). The top three eigenvalues of H are
separated from the semicircle as long as p is not too small. Note that the decrease of
these three eigenvalues as p is decreased stands in full agreement with (4.23), which
shows that they scale linearly with p as they correspond to the eigenvectors of H clean .
The right column shows scatter plots of the Gij values that were estimated from the
algorithm (y-axis) against the dot products ⟨vi, vj⟩ of the simulated (true) rotations
R1, . . . , Rn with vi = Ri3, i = 1, . . . , n. Figure 4.3c shows that the dot products between
the normalized unit vectors in C3 that are computed from the top three eigenvectors
of H are equal (up to numerical errors) to the dot products of the corresponding
viewing angles. In particular, large Gij values imply large ⟨vi, vj⟩ values; that is, they
indicate the correct neighbors. The scattered points get more spread as the value
for p decreases. The numerical results show that the spreading is wider for large
distances (small values of the dot product, bottom-left corner) and is narrower for
small distances. This narrowing near the top-right corner is very appealing, because
it allows us to better distinguish the true neighbors.
In the second experiment, we used parameter values that are more realistic: n =
40, 000, cos β = 0.95 (β = 18.2◦ ), and different values of p. The number of edges was
m ≈ 10⁷. The running time for each value of p was about 2 minutes, which includes
the construction of the graph and the computation of the top 20 eigenvalues of H using
the MATLAB eigs function. Note that this running time is negligible compared to the
time required to compute all n(n−1)/2 rotationally invariant distances. The results of
this experiment are summarized in Figure 4.4. The spectral gap between the top three
eigenvalues of H and the remaining spectrum, though small, is noticeable even for
p = 0.1 (90% false links), and even the multiplicities 5 and 7 are easily distinguishable
(see Figure 4.4d). The estimated Gij ’s provide a very good approximation to the dot
products ⟨vi, vj⟩ between the true viewing angles, and the approximation is especially
good for neighboring pairs, even for p = 0.2 (80% of shortcut edges), as can be seen
in Figure 4.4h. Even for p = 0.1 (Figure 4.4i), the identification of the neighbors
based on the large Gij values is quite good, and there are only a few images that the
algorithm would confusingly consider as neighbors.
4.6.2 Experiments with noisy simulated images
In the second series of experiments, we tested the eigenvector method on different sets
of 20, 000 simulated noisy projection images of the 50S ribosomal subunit, each set
corresponding to a different level of noise. Including the mirrored images, the total
number of images was n = 40,000. The simulated images were generated as noise-free centered projections of the macromolecule, whose corresponding rotations were
uniformly distributed on SO(3). Each projection was of size 129 × 129 pixels. Next,
we fixed a signal-to-noise ratio (SNR), and added to each clean projection additive
Gaussian white noise of the prescribed SNR. The SNR in all our experiments is defined
by

SNR = Var(Signal) / Var(Noise),    (4.24)
where Var is the variance (energy), Signal is the clean projection image, and Noise
is the noise realization of that image. Figure 4.5 shows one of the projections at
different SNR levels.
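Adding noise at a prescribed SNR in the sense of (4.24) only requires scaling the noise variance to Var(Signal)/SNR; a minimal sketch:

    import numpy as np

    def add_noise(clean, snr, rng=None):
        # White Gaussian noise such that Var(Signal)/Var(Noise) = snr, as in (4.24).
        rng = rng or np.random.default_rng()
        sigma = np.sqrt(clean.var() / snr)
        return clean + sigma * rng.standard_normal(clean.shape)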
Figure 4.3: n = 1000, β = 45.6°, and different values of p (rows: p = 1, 0.7, 0.4, 0.2, 0.1). Left column: histogram of the eigenvalues of H. Middle column: bar plot of the top 50 eigenvalues of H. Right column: scatter plot of Gij as estimated in step 6 of the algorithm (y-axis) against the dot product of the true simulated viewing angles ⟨vi, vj⟩ (x-axis).

Figure 4.4: n = 40,000, β = 18.2°, and different values of p. Top two rows: bar plot of the top 20 eigenvalues of H. Bottom two rows: scatter plot of Gij as estimated in step 6 of the algorithm (y-axis) against the dot product of the true simulated viewing angles ⟨vi, vj⟩ (x-axis).

Figure 4.5: Simulated projection with various levels of additive Gaussian white noise: (a) clean, (b) SNR = 1, (c) 1/2, (d) 1/4, (e) 1/8, (f) 1/16, (g) 1/32, (h) 1/64.
For each set of n = 40,000 simulated images with a fixed SNR, we need to compute
all n(n − 1)/2 rotationally invariant distances (3.4) and optimal alignment angles (3.5). This
computation can be quite time consuming. Also, we need to pay special attention to
making the computation accurate, as it involves rotating images that are specified on a
Cartesian grid rather than on a polar grid. There are quite a few papers that deal with
the problem of fast and accurate rotational alignment of images; see, e.g., [78, 44, 17].
In the experiments reported here we rotationally align the images using a method
that uses a steerable basis of eigenimages [76]. (The recently developed Fourier-Bessel steerable basis [122] can improve the initial nearest neighbor search through more accurate estimation of the steerable basis, but in this chapter we are concerned with how the class averaging matrix can further improve the initial classification.) Using this method, we computed
the 40,000 · 39,999/2 rotational alignments in about 30 minutes using all eight cores (here the
computation ran in parallel). The images are first radially masked, since pixels near
the image boundaries (also known as the “ears”) correspond to noise rather than
signal. Then, the masked images are linearly projected onto the subspace spanned
by a specified number M of eigenimages that correspond to the largest eigenvalues of
the images’ covariance matrix. After this linear projection, the images are represented
by their M expansion coefficients instead of their original pixel-wise representation,
thus achieving significant compression for M ≪ 129². This compression not only
facilitates a faster computation for rotational alignment but also amounts to filtering.
The particular choice of M can be assisted by inspecting the numerical spectrum
of the covariance matrix, but we do not go into the details of this procedure. For
SNR = 1/64 we used M = 55 eigenimages, for SNR = 1/32 we used M = 90, for
SNR = 1/16 we used M = 120, and for higher values of the SNR we used M = 200.
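In outline, this projection step is ordinary principal component analysis of the masked images; the following sketch illustrates it with a plain (non-steerable) basis of eigenimages, which ignores the steerable structure exploited in [76]:

    import numpy as np

    def compress_images(images, M):
        # images : (n, L, L) array of radially masked projection images.
        # Returns the M expansion coefficients of each image in the basis of
        # eigenimages (top principal components of the image covariance).
        n, L, _ = images.shape
        X = images.reshape(n, L * L)
        mean = X.mean(axis=0)
        Xc = X - mean
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # right singular vectors
        eigenimages = Vt[:M]          # (M, L*L) top eigenimages
        coeffs = Xc @ eigenimages.T   # (n, M) expansion coefficients
        return coeffs, eigenimages, mean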
The histograms of Figure 4.6 demonstrate the ability of small rotationally invariant
distances to indicate images with similar viewing directions. For each image we use
the rotationally invariant distances to find its 40 nearest neighbors among the entire
set of 40, 000 images. In our simulation we know the original viewing directions, so for
each image we compute the angles (in degrees) between the viewing direction of the
image and the viewing directions of its 40 neighbors. Small angles indicate successful
identification of true neighbors that belong to a small spherical cap, while large angles
correspond to outliers that later lead to shortcut edges in the graph. We see that
for SNR = 1/2 there are no outliers, and all the viewing directions of the neighbors
belong to a spherical cap whose opening angle is about 8◦ . However, for lower values
of the SNR, there are outliers, indicated by arbitrarily large angles (all the way to
180◦ ).
After computing the rotationally invariant distances and optimal alignment angles,
we use the N = 40 nearest neighbors of each image to construct the sparse matrix H
using the “intersection” rule (i.e., {i, j} ∈ E iff i is one of the 40 nearest neighbors of
j and j is one of the 40 nearest neighbors of i). Note that, following the application
of the intersection rule, it may happen (especially for low values of the SNR) that
some vertices are of degree 0 (i.e., vertices that have no links with other vertices). We
delete from the matrix H the rows and columns corresponding to degree 0 vertices.
After this removal, it is still possible for different vertices to have different degrees
(between 1 and 40), rendering the importance of the normalization step H̃ = D−1 H.
We then compute the eigenvectors and eigenvalues of H̃. Figure 4.7 shows the top
10 eigenvalues of H̃ for different values of the SNR. The multiplicity 3, and even
the multiplicity 5, are evident for high values of the SNR, such as SNR = 1/2 and
SNR = 1/16. Figure 4.8 shows scatter plots of the dot products between the viewing
directions ⟨vi, vj⟩ and the Gij’s that are deduced from the computed eigenvectors. For
SNR = 1/2, SNR = 1/16, and SNR = 1/32 we get the desired linear correspondence
between hvi , vj i and Gij from which we can infer the true neighbors. For SNR = 1/64
we do not get the desired spectrum, and from the values of Gij we cannot infer the true
neighbors. Still, a closer examination of the scatter plot for SNR = 1/64 reveals that
points are not scattered randomly but rather have some structure. In section 4.6.4
we discuss how to improve the identification of true neighbors, but already at this
point we have established that our method works well even for SNR as low as 1/32.
Figure 4.6: Histograms of the angle (in degrees, x-axis) between the viewing directions of 40,000 images and the viewing directions of their 40 nearest neighboring images, as found by computing the rotationally invariant distances; (a) SNR = 1/2, (b) SNR = 1/16, (c) SNR = 1/32, (d) SNR = 1/64.
Figure 4.7: Bar plots of the 10 largest eigenvalues {λi}, i = 1, . . . , 10, of H̃ for SNR = 1/2, 1/16, 1/32, 1/64. Since the eigenvalues are very close to 1, the y-axis corresponds to 1 − λ.

Figure 4.8: Scatter plots of Gij (computed from the top three eigenvectors of H̃) against ⟨vi, vj⟩, for SNR = 1/2, 1/16, 1/32, 1/64.
4.6.3 Numerical comparison with diffusion maps
We compared our algorithm against what we refer to as the “diffusion maps” approach. Diffusion maps were introduced in [13, 52], but here we do not assume the
reader to be familiar with that method in its generality. We are not aware of any earlier attempts to apply diffusion maps for the solution of the class averaging problem
in cryo-EM. We note, however, that the diffusion maps approach presented below is
quite similar to the solution to the problem of two-dimensional (2D) random tomography [100], where the goal is to find the unknown viewing angles of one-dimensional
(1D) tomographic projections of an unknown 2D object; see also [4, 5]. In that regard, we emphasize that the main difference between the 3D cryo-EM class averaging
problem and its 2D counterpart is perhaps the extra angular information that we
encode in the class averaging matrix H. This angular information does not exist in
the 2D problem, where 1D projection signals (instead of 2D images) are compared.
In the diffusion maps approach, the matrix H is replaced by the adjacency matrix
A of the graph G = (V, E). Specifically,
Aij = 1 if {i, j} ∈ E, and Aij = 0 otherwise.    (4.25)
Notice that Aij = |Hij |. We normalize the matrix A the same way we normalized
H. That is, we define à as à = D−1 A. Under the clean model of section 4.5,
the eigenvectors of A approximate the eigenfunctions of an integral operator over
the sphere whose kernel function is the characteristic function of a small spherical
cap. As such, this operator commutes with rotations, and from the Funk-Hecke
theorem [64, p. 195] it follows that its eigenfunctions are the spherical harmonics,
whose multiplicities are 1, 3, 5, . . . (see, e.g., the discussion in [16, section 4.4]). In
particular, the second, third, and fourth eigenvectors φ2 , φ3 , and φ4 of à correspond
to the linear spherical harmonics x, y, and z (multiplicity 3), where v = (x, y, z) is
the viewing direction. The first eigenvector φ1 corresponds to the constant function
and is therefore not required by this method. This motivates us to define
Φi = (φ2(i), φ3(i), φ4(i))    (4.26)
and
G′ij = ⟨Φi, Φj⟩ / (‖Φi‖ ‖Φj‖).    (4.27)
The above discussion implies that under the “clean” model of section 4.5 we have
that G′ij = ⟨vi, vj⟩.
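A sketch of this variant, reusing the neighbor graph from before (illustrative only; the adjacency matrix adj is assumed boolean and symmetric):

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.linalg import eigsh

    def diffusion_maps_affinity(adj):
        A = csr_matrix(adj, dtype=float)
        deg = np.asarray(A.sum(axis=1)).ravel()
        dhalf = 1.0 / np.sqrt(deg)
        An = A.multiply(np.outer(dhalf, dhalf))      # similar to D^{-1} A
        vals, phi = eigsh(An.tocsc(), k=4, which='LA')
        order = np.argsort(vals)[::-1]               # sort eigenpairs descending
        Phi = phi[:, order[1:4]]                     # phi_2, phi_3, phi_4; drop the top
        Phi = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)
        return Phi @ Phi.T                           # G'_ij of (4.26)-(4.27)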
Figure 4.9 depicts bar plots of the largest 10 eigenvalues of the matrix à for
different values of the SNR. Notice that λ1 = 1 and that the multiplicity 3 of λ2 , λ3 ,
λ4 and the multiplicity 5 of λ5 , . . . , λ9 are evident for SNR = 1/2 and SNR = 1/16.
Figure 4.9: Bar plots of the 10 largest eigenvalues {λi}, i = 1, . . . , 10, of à for SNR = 1/2, 1/16, 1/32, 1/64. Since the eigenvalues are very close to 1, the y-axis corresponds to 1 − λ.

Figure 4.10: Scatter plots of G′ij (computed from the second, third, and fourth eigenvectors of Ã) against ⟨vi, vj⟩, for SNR = 1/2, 1/16, 1/32, 1/64.

Figure 4.11: SNR = 1/40: scatter plots using the eigenvectors of H̃ (left) and the eigenvectors of à (right).
Figure 4.10 shows scatter plots of G′ij against ⟨vi, vj⟩. The diffusion maps approach
works well even for SNR = 1/32, but also breaks down for SNR = 1/64. From
the numerical experiments shown thus far we cannot conclude that the eigenvectors
of H̃ outperform the eigenvectors of à in terms of improving the identification of
the true neighbors. We therefore conducted another experiment with SNR = 1/40.
From Figure 4.11 we draw the conclusion that the method based on the matrix H̃
outperforms the diffusion maps approach.
4.6.4 Using more than three eigenvectors
Thus far we have seen the usefulness of the H matrix method for SNRs as low as
1/40, but it was not successful for SNR = 1/64. Also, the reader may question the
validity of the assumption that the viewing directions are uniformly distributed over
the sphere, an assumption that seems essential in order to get the linear relation
between Gij and hvi , vj i. We now address these important issues using both theory
and numerical simulations.
The first point that we would like to make here is that in the class averaging
problem, we are not required to find the viewing directions of the images. We just
need to find images with similar viewing directions. Once the neighboring images
are identified, they can be rotationally aligned and averaged to produce images of
higher SNR, and later procedures that are based on common lines and refinements
can be used to deduce the viewing directions of the class averages. Asking for the
viewing directions is perhaps too much to require at this preliminary stage of the data
processing. Instead, we can settle for improving our identification of neighbors.
The second point that we raise is that the eigenvectors of the matrix H̃ approximate the eigenfunctions of an integral operator over SO(3), even if the viewing directions are not uniformly distributed. Since the eigenfunctions are continuous (they
become more oscillatory for smaller eigenvalues of H̃), it follows that the eigenvectors
of H̃ do not change by much (as tangent vector fields to the sphere) for images whose
viewing directions are restricted to the same spherical cap. However, we do not need
to restrict ourselves to just the top three eigenvectors as we have done up to this
point. We can use more than just three eigenvectors in order to identify images with
similar viewing directions. Let K ≥ 3 be the number of eigenvectors ψ1 , ψ2 , . . . , ψK
that we use. For the ith image (i = 1, . . . , n) we define Ψ^K_i ∈ C^K as

Ψ^K_i = (ψ1(i), ψ2(i), . . . , ψK(i)).    (4.28)

Moreover, we define the similarity measure G^K_ij between image i and image j as

G^K_ij = |⟨Ψ^K_i, Ψ^K_j⟩| / (‖Ψ^K_i‖ ‖Ψ^K_j‖),    (4.29)

and we post-identify an image j as a neighbor of image i if G^K_ij is large enough.
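In code, the generalization changes only the number of eigenvector columns; a sketch (H_norm standing for the Hermitian normalization D^{-1/2} H D^{-1/2} used above):

    import numpy as np
    from scipy.sparse.linalg import eigsh

    def affinity_top_K(H_norm, K=80):
        # Similarity measure (4.29) from the top K eigenvectors.
        _, psi = eigsh(H_norm, k=K, which='LA')
        Psi = psi / np.linalg.norm(psi, axis=1, keepdims=True)  # rows Psi_i^K
        return np.abs(Psi @ Psi.conj().T)                       # G^K_ij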
Figure 4.12: SNR = 1/64. Histogram of the angles (x-axis, in degrees) between the viewing directions of each image (out of 40,000) and its 40 neighboring images. (a) Neighbors are post-identified using K = 80 eigenvectors of H̃. (b) Neighbors are post-identified using 80 eigenvectors of Ã. (c) Neighbors are identified using the original rotationally invariant distances.
This classification method proves to be quite powerful in practice. We applied
it with K = 80 to the set of noisy images with SNR = 1/64, for which we observed
already that K = 3 failed. For every image we find its 40 nearest neighbors based
on G^K_ij. In the simulation we know the viewing directions of the images, and we
compute for each pair of suspected neighboring images the angle (in degrees) between
their viewing directions. The histogram of these angles is shown in Figure 4.12a.
About 92% of the identified images belong to a small spherical cap of opening angle
20◦ , whereas this percentage is only about 65% when neighbors are identified by the
rotationally invariant distances (Figure 4.12c). For the diffusion maps approach, it is
also possible to use K ≥ 3 eigenvectors. The result of the diffusion maps approach
with K = 80 is shown in Figure 4.12b. Also for this modified diffusion maps approach
about 92% of the identified images are in a spherical cap of opening angle 20◦ . We
remark that for SNR = 1/50 this percentage goes up to about 98%. We did not
experiment with SNRs below 1/64.
4.7 Summary and discussion
In this chapter we introduced a new algorithm for improving the initial viewing angle classification based on rotationally invariant distances. This identification is an
important step in 3D structure determination of macromolecules from cryo-EM, because once identified, these images can be rotationally aligned and averaged to produce “class averages” of better quality. The main advantage of the algorithm is
its extreme robustness to noise. The algorithm is also very efficient in terms of running
time and memory requirements, as it is based on the computation of the top few
eigenvectors of a sparse Hermitian matrix. These advantages were demonstrated in
our numerical experiments.
At the core of this algorithm is a special construction of a sparse Hermitian
matrix H. The nonzero entries of H correspond to pairs of images whose rotationally
invariant distances are relatively small. The nonzero entries are set to be complex-valued numbers of unit magnitude, with phases equal to the optimal in-plane rotational alignment angles between the images. We show that images with
nearby viewing angles are identified from the top three eigenvectors of that
matrix. The construction of the matrix H here is similar to the construction in our
solution to the angular synchronization problem [95]. The main difference is that
while angular synchronization required only the top eigenvector, here we need the
top three eigenvectors in order to identify images with nearby viewing directions. We
explain this difference using the topology of the sphere and, more specifically, the
hairy ball theorem.
The robustness of the algorithm is guaranteed by random matrix theory, whose
application here is made possible by the special construction of H, since the average
of points on the unit circle in the complex plane is 0. The admissibility (correctness)
of the algorithm follows from the spectral properties of H, which result from the fact
that H is a discretization of a local version of the parallel transport operator on S².
The exact analysis of the spectral properties of this local parallel transport operator
is reported in [33] using representation theory.
The assumption that the viewing angles are uniformly distributed over S² is important to our spectral analysis, though it may not hold for some experimental data sets.
We emphasize that our algorithm identifies the correct neighbors also for nonuniform
distributions, because the eigenvectors of H (and H̃) are continuous as a function of
the viewing directions, regardless of the underlying distribution. In other words, a
large value for Gij indicates that Ii and Ij have similar viewing angles also in the case
of a nonuniform distribution. We also showed numerical results in which using more
than three eigenvectors improves the identification of the true neighbors.
We assumed the homogeneity case, that is, that all images correspond to the exact
same 3D structure of the molecule. In the heterogeneity case, the molecule may have
several different conformations, so that projection images also need to be classified
accordingly. This classification is often extremely difficult, due to the low SNR of the
images and the similarity between the 3D structures of different conformations. We
believe that the algorithm presented here may assist in solving the conformational
classification problem.
Chapter 5
Vector Diffusion Maps and Class Averages
Diffusion maps and other non-linear dimensionality reduction methods are either
directly or indirectly related to the heat kernel for functions over the data; the vector
diffusion maps (VDM) framework, in contrast, is based on the heat kernel for vector fields. The
construction of the kernel involves a weighted graph and the pairwise orthogonal
transformations. Through the spectral decomposition of this kernel, VDM defines an
embedding of the data in a Hilbert space. In this chapter, we extend the construction
of the class averaging matrix and show the application of VDM to the cryo-EM class averaging
problem. The metric defined by VDM is more meaningful than currently used metrics,
such as the rotationally invariant Euclidean distance and the diffusion mapping distance, since
it takes the linear transformations into account.
5.1 Vector diffusion maps
In many applications, the representation of the data set can be vastly improved by
attaching to every edge of the graph a linear transformation. In particular, for the
class averaging problem, the graph edge weight is denoted as wij and eιαij character94
izes the local orthogonal transformation between data points i and j. In the VDM
framework, we define the affinity between i and j by considering all paths of length
2t connecting them (t is a parameter called diffusion time). Instead of just summing
the weights of all paths, we sum the transformations. A path of length t from j to i
is some sequence of vertices j0 , j1 , ..., jt with j0 = j and jt = i and its corresponding
orthogonal transformation is obtained by multiplying the orthogonal transformations
along the path in the following order:
Ojt,jt−1 · · · Oj2,j1 Oj1,j0.    (5.1)
Every path from j to i therefore results in a different transformation. This is analogous
to the parallel transport operator. We would like to define the affinity between i and
j as the consistency between these transformations, with higher affinity expressing
more agreement along the transformations that are being averaged. We note that such
a mathematical and algorithmic framework is an extension of diffusion maps [13]. It
was proved that for diffusion maps [52, 13, 94, 63], the discrete random walk over the
data points converges to a continuous diffusion process over the manifold in the limit
that the number of data samples n → ∞ and the kernel parameter ǫ → 0. The graph
Laplacian on the manifold converges pointwise to the Fokker-Planck operator [94, 63].
Singer and Wu [98, 99] recently proved convergence theorems illuminating the relation
between VDM and the connection-Laplacian operator for vector fields over a manifold.
The results for spectral convergence also hold in the case where the data points
are sampled from a non-uniform distribution, and for manifolds with and without
boundary.
For the application in class averaging, VDM takes into account the consistency of
in-plane rotational transformations (see Figure 5.1). The affinity between images Ii
and Ij (shown as nodes i and j) is defined as the consistency of the transformations
summed over all different paths of a fixed length connecting i and j. To quantify
this, we build a sparse n × n Hermitian matrix H (5.2) using the graph construction
which will be discussed in section 5.2.
Hij = wij eιαij for {i, j} ∈ E, and Hij = 0 for {i, j} ∉ E,    (5.2)
where E denotes the edge set of neighboring pairs, wij is the edge weight, and αij
is the optimal alignment angle. The fact that H is Hermitian follows from αij = −αji mod 2π.

Figure 5.1: Illustration of the Vector Diffusion Map (VDM) affinity. Pick an arbitrary planar vector for node i (realized as a complex number eιφ). Consider two different paths from i to l of length 2: i → j → l and i → k → l. The arrow is rotated according to the edges i → j and i → k, respectively (by multiplying it by the phase factors eιθij and eιθik, respectively), and then rotated according to the edges j → l and k → l, respectively (by multiplying it by the phase factors eιθjl and eιθkl, respectively). Different paths may be consistent as in (a) or inconsistent as in (b). When vectors from different paths are added together, the amplitude of the resulting vector can be as large as the number of paths if they are all consistent (a), or much smaller due to inconsistencies (b). Nodes i and l have higher affinity in (a) than in (b).

Moreover, since only neighboring images contribute non-zero entries in H,
it follows that H is a sparse matrix whose storage requires only O(nk) space. Each
row of H is divided by the degree of the corresponding image, yielding the matrix S
that is given by
S = D−1H,    (5.3)
where D is an n × n diagonal matrix with
D(i, i) = deg(i) = #{j : {i, j} ∈ E}.    (5.4)
The matrix S in equation (5.3) is similar to the Hermitian matrix
S̃ = D−1/2 H D−1/2    (5.5)
through S = D−1/2 S̃D1/2 . We can define the affinity between i and j as |S̃ 2t (i, j)|2 ,
that is, as the squared absolute value of S̃ 2t (i, j), which takes into account all paths
of length 2t, where t is a positive integer. In a sense, |S̃ 2t (i, j)|2 measures not only the
number of paths of length 2t connecting i and j but also the amount of agreement
between their transformations. That is, for a fixed number of paths, |S̃ 2t (i, j)|2 is
larger when the path transformations are in agreement, and is smaller when they
differ. We define the normalized affinity between i and j as
|S̃^{2t}(i, j)|² / √(|S̃^{2t}(i, i)|² |S̃^{2t}(j, j)|²).    (5.6)
Since S̃ is Hermitian, it has a complete set of eigenvectors v1 , v2 , . . . , vn and real
eigenvalues λ1 , λ2 , ..., λn . We order the eigenvalues in decreasing order of magnitude.
The spectral decomposition of S̃ and S̃ 2t are given by
S̃(i, j) = Σ_{l=1}^n λ_l v_l(i) v̄_l(j),  and  S̃^{2t}(i, j) = Σ_{l=1}^n λ_l^{2t} v_l(i) v̄_l(j).    (5.7)
It follows that the affinity |S̃ 2t (i, j)|2 is an inner product for the finite dimensional
Hilbert space C^{n²} via the mapping Vt:

Vt : i ↦ ((λl λr)^t ⟨vl(i), vr(i)⟩)_{l,r=1}^n.    (5.8)
That is,
|S̃^{2t}(i, j)|² = ⟨Vt(i), Vt(j)⟩.    (5.9)
Note that in the manifold learning setup, the embedding i ↦ Vt(i) is invariant to the
choice of basis of Txi M because the dot products ⟨vl(i), vr(i)⟩ are invariant to unitary
transformations. Vt is called the vector diffusion mapping. From the symmetry of the
dot products ⟨vl(i), vr(i)⟩ = ⟨vr(i), vl(i)⟩, it is clear that under the following mapping,

Ṽt : i ↦ (c_{lr} (λl λr)^t ⟨vl(i), vr(i)⟩)_{1≤l≤r≤n},    (5.10)

where c_{lr} = √2 for l < r and c_{lr} = 1 for l = r, the affinity between i and j can be written as

|S̃^{2t}(i, j)|² = Re(⟨Ṽt(i), Ṽt(j)⟩).    (5.11)
The symmetric vector diffusion distance dVDM,t (i, j) between nodes i and j is
defined as
d²_{VDM,t}(i, j) = |S^{2t}(i, i)|² + |S^{2t}(j, j)|² − 2|S^{2t}(i, j)|²
                = ⟨Vt(i), Vt(i)⟩ + ⟨Vt(j), Vt(j)⟩ − 2⟨Vt(i), Vt(j)⟩,    (5.12)
and the normalized affinity (5.6) can be expressed using the mapping Vt as
|S̃^{2t}(i, j)|² / √(|S̃^{2t}(i, i)|² |S̃^{2t}(j, j)|²) = ⟨Vt(i)/|Vt(i)|, Vt(j)/|Vt(j)|⟩.    (5.13)
The matrix S̃^{2t} may be too dense to be computed efficiently. Instead, we can
approximate the normalized affinity (5.13) by truncating the mapping Vt to its leading
m² coordinates (instead of n²) as

V_t^m : i ↦ ((λl λr)^t ⟨vl(i), vr(i)⟩)_{l,r=1}^m,    (5.14)

where m is the largest integer satisfying λ_m^{2t} > δ for some δ much smaller than 1. The
approximate normalized affinity becomes

⟨V_t^m(i)/|V_t^m(i)|, V_t^m(j)/|V_t^m(j)|⟩.    (5.15)
We use (5.6) as the measure of closeness between two images to improve our
estimation of the k nearest neighbors for each image. This measure of affinity can be
approximated using the eigenvectors of the matrix S̃ as shown in (5.14) and (5.15).
The algorithm is very efficient in terms of running time and memory requirements,
because it is based on the computation of the top eigenvectors of a sparse Hermitian
matrix. The application of VDM in class averaging is summarized in Algorithm 2.
5.2 Graph construction
The graph construction problem is universal in manifold learning; for example,
in [6] the Gaussian kernel is used to determine the edge weights. In our algorithm
sparsity is important. There are two typical ways to build a neighborhood graph:
the ǫ-neighborhood graph connecting samples within a distance of ǫ (4.2), and the
k nearest neighbor (kNN) graph connecting the k closest samples. Usually, the kNN
graph is the more common approach, since it is more adaptive to scale and data
density. Each node merely connects to its k nearest neighbors using some similarity
function and instantiates k undirected edges between itself and the neighbors.
Algorithm 2 Classification using Vector Diffusion Map
Require: n rectangular cryo-EM projection images, I1, ..., In.
Compute the rotationally invariant distances dij and the optimal alignment angles αij as in (3.4) and (3.5) between all pairs of images.
Construct the sparse n × n matrix H as in (5.2), with the edge weights wij determined in (5.20).
Define S̃ = D−1/2 H D−1/2, where D is an n × n diagonal matrix with Dii = Σ_{j=1}^n wij.
Compute the orthonormal eigenvectors of S̃, denoted v1, ..., vn ∈ C^n, with corresponding eigenvalues λ1, ..., λn.
Do the truncated vector diffusion map embedding: for fixed m and δ, choose t such that λ_m^{2t} > δ > λ_{m+1}^{2t}, and set

V_t^m : i ↦ ((λl λr)^t ⟨vl(i), vr(i)⟩)_{l,r=1}^m,    (5.16)

where vl(i) is the ith entry of the vector vl (l = 1, ..., m).
Define a measure of affinity Gij between images Ii and Ij as

Gij = ⟨Vt(i), Vt(j)⟩ / (‖Vt(i)‖ ‖Vt(j)‖),  i, j = 1, ..., n.    (5.17)

Declare the neighbors of Ii as

neighbors of Ii = {j : Gij > 1 − γ},    (5.18)

where 0 < γ ≪ 1 controls the size of the neighborhood.
However, connecting vertices using the union rule (i.e., i and j are neighbors if i is among j's k nearest neighbors or j is among i's k nearest neighbors) usually results in an imbalance between the degrees of vertices and uneven graph connectivity. We observed this in numerical experiments, and the degree imbalance gets worse as the signal-to-noise ratio becomes smaller. Radovanović et al. showed in [81] that such hubness, or the emergence of popular nearest neighbors in high dimensional data, is another aspect of the curse of dimensionality and is related to the concentration of distances in high dimensional space.
To mitigate the effects of the imbalance in graph connectivity, Jebara et al. proposed a b-matching algorithm [39] to construct a balanced graph structure, and showed the importance of graph construction in semi-supervised learning. Their algorithm solves the following optimization problem:

    \begin{aligned}
    \underset{w_{ij} \in \{0,1\}}{\text{minimize}} \quad & \sum_{ij} w_{ij}\, d(x_i, x_j) \\
    \text{subject to} \quad & \sum_{j} w_{ij} = b, \quad w_{ii} = 0, \quad w_{ij} = w_{ji}, \quad \forall i, j \in 1, \ldots, n,
    \end{aligned}    (5.19)

where d(·,·) is a distance function derived from a similarity function k(·,·): d(x_i, x_j) = \sqrt{k(x_i, x_i) + k(x_j, x_j) - 2k(x_i, x_j)}. This optimization problem achieves symmetry directly, without post-processing such as the union rule in the kNN graph, and each node has degree b.
Instead of constraining w_ij to be binary, we relax the condition to w_ij ∈ [0, 1]. The problem can then be solved using linear programming:

    \begin{aligned}
    \underset{w_{ij} \in \mathbb{R}}{\text{minimize}} \quad & \sum_{ij} w_{ij}\, d(x_i, x_j) \\
    \text{subject to} \quad & \sum_{j} w_{ij} = k, \quad w_{ii} = 0, \quad 0 \le w_{ij} \le 1, \quad w_{ij} = w_{ji}, \quad \forall i, j \in 1, \ldots, n.
    \end{aligned}    (5.20)
This graph construction method solves the problem of the imbalanced degree distribution that we encountered for very noisy experimental data sets. The graph construction problem was also noted by Spielman et al. [18], who introduced a measure of how well a combinatorial graph fits a collection of vectors; the optimal graphs under this measure can be computed by solving convex quadratic programs.
5.3 Alignment using eigenvectors
In Section 5.1, the eigen-decomposition of the Hermitian matrix S̃ gives a complete set of eigenvectors v_1, v_2, ..., v_n. These eigenvectors encode the information for in-plane rotational alignment between neighboring images. For clean images, if i and j have the same viewing direction and their in-plane alignment angle is α_ij, the following holds:

    v_l(i) = e^{\iota \alpha_{ij}} v_l(j), \quad \forall l = 1, \ldots, n.    (5.21)
Figure 5.2: When i and j have the same viewing angle, the tangent plane at point i coincides with the tangent plane at point j, and the eigenvectors satisfy equation (5.21).

This is illustrated in Figure 5.2. When the viewing directions are close (though not identical), (5.21) holds approximately. The level of approximation deteriorates as the eigenvalues become smaller. Therefore, we estimate the rotational angle using the top m eigenvectors:
    \begin{aligned}
    \alpha_{ij}^{*} &= \underset{\alpha_{ij}}{\operatorname{argmin}} \sum_{l=1}^{m} \lambda_l^{2t} \left\| v_l(i) - e^{\iota \alpha_{ij}} v_l(j) \right\|_2^2 & (5.22)\\
    &= \underset{\alpha_{ij}}{\operatorname{argmin}} \sum_{l=1}^{m} \lambda_l^{2t} \left( \|v_l(i)\|_2^2 + \|v_l(j)\|_2^2 \right) - 2\operatorname{Re}\left\{ e^{\iota \alpha_{ij}} \sum_{l=1}^{m} \lambda_l^{2t}\, \overline{v_l(i)}\, v_l(j) \right\} & (5.23)\\
    &= \underset{\alpha_{ij}}{\operatorname{argmax}}\, \operatorname{Re}\left\{ e^{\iota \alpha_{ij}} \sum_{l=1}^{m} \lambda_l^{2t}\, \overline{v_l(i)}\, v_l(j) \right\}. & (5.24)
    \end{aligned}

Therefore,

    e^{\iota \alpha_{ij}^{*}} = \frac{\sum_{l=1}^{m} \lambda_l^{2t}\, v_l(i)\, \overline{v_l(j)}}{\left| \sum_{l=1}^{m} \lambda_l^{2t}\, v_l(i)\, \overline{v_l(j)} \right|}.    (5.25)
In this way, we improve the initial estimation of the in-plane rotational alignment angles. To summarize the above, denote the images by I = {I_i}_{i=1}^{n}. We have the following algorithm to obtain better in-plane rotation alignment estimates from the noisy cryo-EM images:

1. Build the n × n affinity matrix H from I. The (i, j)-th entry of H is w_ij e^{ια_ij}, where w_ij is determined using the graph construction discussed in Section 5.2 and α_ij is the estimated in-plane rotation.

2. Denote the eigenvectors of D^{-1/2} H D^{-1/2} by the n × 1 column vectors v_1, v_2, ..., v_n, where D is the diagonal degree matrix.

3. For each i, construct the 1 × p complex row vector E_i = (λ_1^t v_1(i), ..., λ_p^t v_p(i)), where p < n depends on the noise level. Note that v_k(i) ∈ C for k = 1, ..., n and i = 1, ..., n.

4. If I_i and I_j are determined to be of the same view, for example by VDM, then a better in-plane rotation estimate is obtained from E_i E_j† after proper normalization, as sketched below.
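A minimal numerical sketch of steps 3-4 (names of our own choosing; the eigenvectors are assumed stored as columns of a complex array):

```python
import numpy as np

def refine_rotation(v, lam, i, j, p, t=1):
    """Estimate the in-plane rotation angle between images i and j
    from the top p eigenvectors of D^{-1/2} H D^{-1/2}, as in (5.25).

    v   : (n, n) complex array; column l is the eigenvector v_l.
    lam : (n,) array of the corresponding eigenvalues.
    """
    E_i = (lam[:p] ** t) * v[i, :p]      # row vector E_i
    E_j = (lam[:p] ** t) * v[j, :p]      # row vector E_j
    z = np.sum(E_i * np.conj(E_j))       # E_i E_j^dagger = sum lam_l^{2t} v_l(i) conj(v_l(j))
    return np.angle(z)                   # angle of z / |z|, i.e. alpha*_ij
```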
Why does this work? The main observation is that I is sampled from a smooth manifold diffeomorphic to SO(3), and SO(3) is the frame bundle over S² with fiber SO(2). We represent an image I_i by (x_i, R_i), where x_i ∈ S² and R_i ∈ SO(2). This yields the following interpretation of the above algorithm.

Fix I_i = (x_i, R_i). Denote by {i_{x_i,1}, ..., i_{x_i,N}} the set of indices of the images with the same projection direction x_i ∈ S². For a fixed 1 ≤ j ≤ p, the set of vectors {v_j(i_{x_i,1}), v_j(i_{x_i,2}), ..., v_j(i_{x_i,N})} is actually a set of different coordinates of the same vector in T_{x_i}S² with respect to different bases. On the other hand, E_i can be viewed as a set of coordinates of different vectors with respect to the same basis. When x_i = x_j, E_i E_j† is a better estimate of the in-plane rotation from R_j to R_i, represented by a complex number. Moreover, when x_i ≠ x_j but the two are close, E_i E_j† is a better estimate of the parallel transport from x_j to x_i with respect to the respective bases.
We can also generalize this to the general manifold setting: we consider a d-dimensional smooth closed manifold M and the frame bundle L(M) over M, which is a principal bundle over M with fiber O(d). Suppose we have a set of noisy samples from L(M), where the noise is tangential to the fiber. Then we can run the following generalized algorithm to get a better estimation of the parallel transport. Given the data set X := {x_i}_{i=1}^{n}:

1. Build the nd × nd matrix H,

    H_{ij} = \begin{cases} w_{ij} O_{ij}, & \{i,j\} \in E, \\ 0, & \{i,j\} \notin E. \end{cases}    (5.26)

The matrix H is a block matrix with n × n blocks. Each block is either a d × d orthogonal transformation O_ij multiplied by a scalar weight w_ij or a zero d × d matrix. The edge set does not include self loops, therefore w_ii = 0. The matrix H is symmetric since O_ij^T = O_ji. The degree matrix D is defined as

    D(i, i) = \deg(i)\, I_{d \times d},    (5.27)

where deg(i) = Σ_{j=1}^{n} w_ij.
2. Denote the eigenvectors of D^{-1/2} H D^{-1/2} by the nd × 1 column vectors v_1, v_2, ..., v_n. Denote by ṽ_k(i) the d × 1 column vector (v_k(d(i−1)+1), ..., v_k(di))^T.

3. For each i, construct the d × p matrix E_i = (λ_1^t ṽ_1(i), ..., λ_p^t ṽ_p(i)), where p depends on the noise level.

4. If x_i and x_j are determined to be close, then a better estimate of the parallel transport from x_j to x_i, with respect to the bases of T_{x_i}M and T_{x_j}M, is obtained from E_i E_j^T after proper normalization.

When x_i = x_j, E_i E_j^T is a better estimate of the in-plane rotation from one basis to the other.
However, notice that ṽ_l(i) is not always non-zero. Due to noise, ṽ_l(i) might be unstable, so it is better to use a thresholding scheme to avoid this instability and improve the final result. Indeed, we can set δ > 0 in the algorithm and ignore the index k whenever |ṽ_k(i)| < δ or |ṽ_k(j)| < δ. We update step 3 accordingly:

3a. Fix i, j with i ≠ j and δ > 0. Find the set of indices k_1, ..., k_q, q ≤ p, such that |ṽ_{k_l}(i)| > δ and |ṽ_{k_l}(j)| > δ for all l = 1, ..., q. Here p is chosen in the same way as before.

3b. Construct the d × q matrix E_i = (λ_{k_1}^t ṽ_{k_1}(i), ..., λ_{k_q}^t ṽ_{k_q}(i)).
This scheme can be used to improve the estimation of the parallel transport over a manifold from noisy data points and, in practice, allows us to get a more consistent estimation of the tangent plane.

Section 5.4 presents numerical results showing that VDM greatly improves the classification and alignment. The reason for the noise tolerance is similar to that discussed in Section 4.5. Neighboring images are aligned together to generate class means. When the signal-to-noise ratio is extremely low, some outliers are unavoidable. In such cases, it is helpful to use the Euclidean median instead of the mean [11], as sketched below.
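The Euclidean (geometric) median of a set of aligned images can be computed by Weiszfeld-type iterations; the following sketch is our own minimal implementation, not the code of [11], and treats each aligned image as a flattened vector:

```python
import numpy as np

def euclidean_median(images, n_iter=50, eps=1e-8):
    """Geometric median of aligned images via Weiszfeld iterations.

    images : (N, L*L) array, one flattened aligned image per row.
    Returns the (L*L,) median image, which is less sensitive to
    outliers (wrong-view neighbors) than the plain average.
    """
    med = images.mean(axis=0)                  # initialize at the mean
    for _ in range(n_iter):
        dist = np.linalg.norm(images - med, axis=1)
        w = 1.0 / np.maximum(dist, eps)        # guard against zero distances
        med = (w[:, None] * images).sum(axis=0) / w.sum()
    return med
```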
5.4 Experimental results
We performed numerical experiments to test the accuracy of our algorithm on a machine with two Intel Xeon X5570 CPUs, each with 4 cores, running at 2.93 GHz with 200GB RAM in total. We compare the viewing angle classification and alignment results from normalized cross-correlation of the rotationally invariant features with the vector diffusion maps classification on three different simulated data sets. The simulated images follow the image formation model
    I = \mathcal{F}^{-1}(\mathrm{CTF}_{\Delta z} \times \mathcal{F} I_{\mathrm{clean}}) + n,    (5.28)

where I_clean is the clean projection image of the 70S ribosome (see top row of Figure 3.4), n represents additive incoherent background noise, F is the Fourier transform operator, and F^{-1} is the inverse Fourier transform. The contrast transfer function (CTF) [24, Chapter 2, Section 3] that modulates the clean image in Fourier space changes with the defocus value ∆z. Assuming there is no astigmatism¹, the contrast transfer function has the expression

    \mathrm{CTF}_{\Delta z}(k) = \sin\!\left( -\pi \lambda \Delta z\, k^2 + \frac{\pi}{2} C_s \lambda^3 k^4 - \alpha \right) \exp(-B k^2),    (5.29)
where λ is the electron wavelength, C_s is the spherical aberration, and the B-factor [85] controls the Gaussian envelope decay rate. Three examples of the contrast transfer function with different defocus values ∆z are depicted in Figure 5.3. In the simulation, we used 20 different defocus groups with defocus values between 1.5µm and 4µm. The number of images is 10⁴, with 500 images in each defocus group. CTF correction during the alignment procedure is done by phase-flipping [67], that is,

    I' = \mathcal{F}^{-1}\!\left( \operatorname{sign}(\mathrm{CTF}_{\Delta z}) \times \mathcal{F} I \right).    (5.30)
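A small sketch of the radial CTF model (5.29) and the phase-flipping correction (5.30); parameter names are our own, and the default values are those listed in the Figure 5.3 caption (converted to Angstrom units):

```python
import numpy as np

def ctf(k, defocus, lam=0.0251, cs=2.26e7, alpha=0.07, B=100.0):
    """Radial CTF (5.29). k in 1/Angstrom; defocus and cs in Angstrom
    (e.g. 1.5 um = 1.5e4 A, 2.26 mm = 2.26e7 A); lam in Angstrom."""
    chi = -np.pi * lam * defocus * k**2 + 0.5 * np.pi * cs * lam**3 * k**4 - alpha
    return np.sin(chi) * np.exp(-B * k**2)

def phase_flip(image, defocus, pixel_size):
    """CTF correction by phase flipping, as in (5.30)."""
    L = image.shape[0]
    freqs = np.fft.fftfreq(L, d=pixel_size)        # 1/Angstrom
    kx, ky = np.meshgrid(freqs, freqs)
    k = np.hypot(kx, ky)
    return np.real(np.fft.ifft2(np.sign(ctf(k, defocus)) * np.fft.fft2(image)))
```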
For the reconstructed volumes, for both the simulated data set and the real experimental data sets (see top row of Figure 5.10 and Figure 5.13), we used the Fourier shell correlation (FSC) [87, 37][24, Chapter 8] to evaluate the accuracy, or the resolution, of the reconstructions. FSC measures the normalized cross-correlation between two 3D volumes over corresponding spherical shells in Fourier space, i.e.,

    \mathrm{FSC}(r) = \frac{\sum_{r_i \in r} \mathcal{F}V_1(r_i)\,\overline{\mathcal{F}V_2(r_i)}}{\sqrt{\sum_{r_i \in r} |\mathcal{F}V_1(r_i)|^2 \sum_{r_i \in r} |\mathcal{F}V_2(r_i)|^2}},    (5.31)

where FV_1 and FV_2 are the Fourier transforms of volume V_1 and volume V_2, respectively, the overhead bar denotes complex conjugation, and r_i runs over the individual voxel elements of the shell of radius r in Fourier space. The reconstructed volumes shown in Figures 5.9, 5.11, and 5.14 are visualized using the software Chimera [74]. We use the 0.143 cutoff criterion [84, 91, 3] to determine the resolution of the ab initio models; in other words, the spatial frequency at which the Fourier shell correlation drops below 0.143 is taken as the resolution.

¹It is worth pointing out that typical cryo-EM images used for reconstruction have a negligible level of astigmatism (Thon ring eccentricity < 0.01). In fact, single-particle cryo-EM studies have been able to obtain near-atomic-resolution (∼4Å) 3D reconstructions without the need to consider astigmatism in the images [40, 53].

Figure 5.3: Examples of contrast transfer functions with different defocus values. Contrast transfer function parameters: V = 200kV, λ = 0.0251Å, B-factor = 100Å², α = 0.07 (amplitude contrast component), C_s = 2.26mm (spherical aberration); the resolution is 2.82Å/pixel and the image size is 129 × 129 pixels. Defocus values are 1.5µm, 2.68µm, and 4µm.
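For reference, a direct implementation of (5.31) (our own sketch, not the thesis code) is:

```python
import numpy as np

def fsc(v1, v2, n_shells=None):
    """Fourier shell correlation (5.31) between two cubic volumes."""
    L = v1.shape[0]
    n_shells = n_shells or L // 2
    F1, F2 = np.fft.fftn(v1), np.fft.fftn(v2)

    # Radius of every Fourier voxel, rounded to integer shell indices.
    freqs = np.fft.fftfreq(L) * L
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    curve = np.zeros(n_shells)
    for r in range(n_shells):
        mask = shell == r
        num = np.sum(F1[mask] * np.conj(F2[mask]))
        den = np.sqrt(np.sum(np.abs(F1[mask])**2) * np.sum(np.abs(F2[mask])**2))
        curve[r] = np.real(num) / den
    return curve
```

The resolution is then read off as the first shell at which the curve drops below 0.143 (or below 0.7 for the FCR comparison against a clean reference volume).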
5.4.1 Simulated data

Simulated images with additive white Gaussian noise
Figure 5.4: Classification and alignment results for 10⁴ simulated centered images from 20 different defocus groups contaminated by additive white Gaussian noise (SNR = 1/50). All images were phase flipped; k = 200. (a) Histogram of angular distances (in degrees) for nearest neighbors estimated by direct normalized cross-correlation of rotationally invariant features. (b) Histogram of angular distances (in degrees) for nearest neighbors from VDM classification. (c) Error in rotational alignment (in degrees) for the nearest neighbor pairs found in (a). (d) Error in rotational alignment (in degrees) estimated using the VDM eigenvectors.

In the first experiment, the images were contaminated by white Gaussian noise with pixel signal-to-noise ratio equal to 1/50. The weights w_ij were determined through
the linear programming (5.20), with each degree equal to 10. In the nearest neighbor search, we selected 200 nearest neighbors and also included the reflected images, in effect increasing the original data set to 20,000 images. The initial classification and alignment have many outliers. However, vector diffusion maps managed to find nearest neighbors within a 30° cap (see Figure 5.4b), and the in-plane rotational alignment error is within ±10° (see Figure 5.4d). This illustrates how robust vector diffusion maps are to noise and outliers. In practice, we do not need to find so many nearest neighbors to average over, because when images from a large spherical cap neighborhood are averaged, the resulting class average will be over-smoothed. Therefore, in the latter two simulated experiments, we only look for 50 nearest neighbors.
Simulated images with colored noise
In the second experiment, the simulated images were contaminated by colored noise with noise response function f(k) = \frac{1}{\sqrt{1+k^2}}, where k is the radial frequency, and pSNR = 1/20 (see the second row of Figure 5.5). We first estimated the power spectrum of the background noise by computing the autocorrelation function from the boundary pixels of the images, because the boundary pixels contain only noise:

    \tau(r) = \frac{ \int_{|\mathbf{r}|=r} \int_{x,\, x+\mathbf{r} \in [-L,L]^2 \setminus D} I(x)\, I(x+\mathbf{r})\, dx\, d\mathbf{r} }{ \int_{|\mathbf{r}|=r} \int_{x,\, x+\mathbf{r} \in [-L,L]^2 \setminus D} dx\, d\mathbf{r} },    (5.32)
where the image is of size 2L × 2L pixels and we assume the particle is compactly supported in the disk D. We assumed that the noise power spectrum is radially symmetric, so we estimated the 1D autocorrelation function τ(r). The Fourier transform of this autocorrelation function gave us an estimate of the background noise power spectrum. The estimated noise response is plotted as the red line in Figure 5.6a; the difference between the estimate and the true power spectrum is shown in Figure 5.6b. Note that since the image size 2L is bounded, we do not have an accurate estimate of long-range correlations and we cannot determine the DC component. Therefore, the estimated power spectrum at low frequencies has some discrepancy (see Figure 5.6b), but this is tolerable in the experiments.

Figure 5.5: Simulated images with colored noise. Top row: examples of clean projection images of the 70S ribosome. Second row: images modified by the contrast transfer function and contaminated with colored noise; the pixel signal-to-noise ratio is 1/20. Bottom row: images pre-whitened by the estimated background power spectrum (red curve in Figure 5.6a).

The images were prewhitened using the estimated background power spectrum P_1,
    I_{\mathrm{prewhiten}} = \mathcal{F}^{-1}\!\left( \frac{\mathcal{F}I}{\sqrt{P_1}} \right).    (5.33)
To avoid blow-up at frequencies where P_1 is close to zero, we introduced a small regularization term ε:

    I_{\mathrm{prewhiten}} = \mathcal{F}^{-1}\!\left( \frac{\mathcal{F}I}{\max\{\sqrt{P_1},\, \varepsilon\}} \right).    (5.34)
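A compact sketch of the noise estimation and the prewhitening step (5.33)-(5.34). As a simplifying assumption (our own shortcut; the thesis estimates the spectrum via the autocorrelation (5.32)), the background power spectrum is estimated here by averaging the periodogram of the masked boundary pixels:

```python
import numpy as np

def prewhiten(images, particle_radius, eps=1e-3):
    """Prewhiten images by an estimated background power spectrum,
    as in (5.33)-(5.34).

    images : (N, L, L) array; the particle is assumed to lie inside
             a centered disk of radius `particle_radius` (pixels).
    """
    N, L, _ = images.shape
    y, x = np.indices((L, L)) - L // 2
    background = x**2 + y**2 > particle_radius**2    # boundary pixels only

    # Average periodogram of the masked images as a simple estimate of P1.
    masked = images * background
    P1 = np.mean(np.abs(np.fft.fft2(masked)) ** 2, axis=0)
    P1 /= background.sum()                           # normalize by mask area

    # Regularized division (5.34) to avoid blow-up where P1 ~ 0.
    denom = np.maximum(np.sqrt(P1), eps * np.sqrt(P1).max())
    return np.real(np.fft.ifft2(np.fft.fft2(images) / denom))
```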
Figure 5.6: Noise background estimation. (a) P_0: true radial power spectrum; P_1: estimated radial power spectrum. (b) Difference between the estimated and true power spectra.
Figure 5.5 shows the clean projection images, the images with colored noise, and the prewhitened images. After prewhitening, we applied the same rotationally invariant viewing angle classification and nearest neighbor alignment described in Chapter 3. The initial classification has 1,721 pairs of nearest neighbors whose viewing angles are more than 16° apart, whereas after VDM, all pairs of nearest neighbors are within 16° (see Figure 5.7b). Note that in this experiment, we also included the reflected images.

Figure 5.7: Classification and alignment results for simulated centered projection images modified by the contrast transfer function and contaminated with colored noise. All images were phase flipped; k = 50. (a) and (b): histograms of angular distances for nearest neighbors estimated by direct normalized cross-correlation of rotationally invariant features (initial) and by VDM classification. (c) and (d): errors in in-plane rotational alignment estimated with normalized cross-correlation and with the eigenvectors of VDM.
Simulated images with shifts
In experimental data, the particles are shifted off-center by a few pixels due to particle picking and crude centering procedures. To mimic this effect in simulation, we randomly shifted the clean CTF-modulated projection images with maximum shifts of ±4 pixels and then added white Gaussian noise with SNR = 1/50. To search for relative shifts between two images, we computed the expansion coefficients for all shifted copies of the original images and then computed the rotationally invariant features. By finding the highest normalized cross-correlation of
one image with all the shifted copies of the other, we determined the best shift alignment. In the experiment, the shift search range was ±12 pixels in the x and y directions, and the search step was 2 pixels. The shift estimation was then improved by solving the least squares problem (3.25) involving all relative shifts between neighboring images.

The initial classification and alignment based on the normalized cross-correlation were significantly improved by VDM. VDM managed to identify nearest neighbors within a 20° spherical cap. The in-plane rotational alignment error is within ±6° and the
shift alignment error is within 2.5 pixels (see Figure 5.8).

Figure 5.8: Classification and alignment results for 10⁴ noisy non-centered images. The images are from 20 different defocus groups and contaminated with additive white Gaussian noise (SNR = 1/50). All images were phase flipped; k = 50. (a) Histogram of angular distances between nearest neighbor pairs (in degrees). (b) Histogram of errors in in-plane rotation (in degrees). (c) Initial shift alignment error. (d) Histogram of angular distances (in degrees) between nearest neighbors after VDM classification. (e) Histogram of errors in in-plane rotational alignment (in degrees) estimated using the eigenvectors of VDM. (f) Histogram of errors in shift alignment (in pixels) for nearest neighbors determined by VDM after least squares.

We applied the common-lines based algorithm [97, 116] to determine the orientations of the class averages and used FIRM [115] to reconstruct the initial model. The volumes were CTF corrected during the 3D reconstruction process. The reconstructed volume from 1000 class averages is shown in Figure 5.9a. It is compared with the reference volume (Figure 5.9b), and the volumes are consistent with each other up to 16Å (at this frequency the FCR² (Figure 5.9c) between the two volumes drops below 0.7).

²We use the FCR 0.7 cutoff criterion [68] here because we are comparing the reconstruction with the clean volume. For the real data experiments, the resolution is still decided by the 0.143 cutoff criterion.
Figure 5.9: Reconstructions of the 70S ribosome from non-centered simulated noisy projection images (SNR = 1/50) with 20 different defocus groups. (a) Reconstruction from the 1000 class averages; the volume is filtered by a Gaussian filter with bandwidth 1.0 in Chimera. (b) The reference volume that produced the simulated noisy images. (c) Fourier cross resolution of the reconstructed volume with the reference volume.
5.4.2 Experimental data

Experimental data: 70S ribosome
We applied the pipeline of image denoising, classification, and alignment to an experimental data set provided by Dr. Joachim Frank's group [1]. This data set comes from a larger heterogeneous data set with 216,517 particles. ML3D [92] was used to separate the data into 6 more homogeneous subsets. The data used here is class number 6 and contains 40,778 projection images of the 70S ribosome (see top row of Figure 5.10). The images are of size 125 × 125 pixels with 3.0Å/pixel and electron beam wavelength λ = 0.0197Å. They were pooled together from 77 different defocus groups and were CTF corrected by phase-flipping using (5.30). We split the data set randomly into two equally sized groups, each containing 20,389 images. 20 nearest neighbors and the corresponding rotational and shift alignments were identified for
each image. The second row of Figure 5.10 shows the averaged images. To estimate the class averages more accurately, even in the presence of outliers (images from totally different viewing directions), we used the Euclidean median [11] instead of the mean. 1000 class averages were used to build the ab initio model for each group, with the common-lines based method [97, 116] for orientation determination and the Fourier iterative reconstruction method (FIRM) [115] for reconstruction. The two volumes (see Figure 5.11) are consistent with each other up to 14.88Å (at this frequency, the Fourier shell correlation (Figure 5.12) between the two volumes drops below 0.143). We were not able to reconstruct an initial model from the class averages estimated by the iterative reference-free alignment and K-means clustering methods implemented in SPIDER.

Figure 5.10: Top row: examples of experimental images of the 70S ribosome. Bottom row: class averages obtained by averaging each image with its 20 aligned nearest neighbors. Courtesy of Dr. Joachim Frank.

Figure 5.11: 70S ribosome reconstructed volumes. Left column: snapshots of volume 1. Right column: snapshots of volume 2.

Figure 5.12: Fourier shell correlation between the two independently reconstructed volumes. With the 0.143 cutoff criterion, the resolution is 14.88Å. The blue line is the Fourier shell correlation and the red line indicates the cutoff.
Experimental data: 50S ribosomal subunit

A set of micrographs of the E. coli 50S ribosomal subunit was provided by Dr. M. van Heel. We applied our algorithms to this data set, which contains 27,121 projection images of the 50S ribosomal subunit. The micrographs were acquired by a Philips CM20 electron microscope at 9 different defocus values between 1.37 and 2.06µm. Each image (see top row of Figure 5.13) is of size 90 × 90 pixels with 3.36Å/pixel. The particles were picked using the automated particle picking algorithm in EMAN Boxer [54]. Then, using the IMAGIC software package [111], the images were phase-flipped to remove the phase reversals in the CTF, bandpass filtered between 1/150 and 1/8.4 Å⁻¹, and normalized by their variance. The images were initially crudely centered by correlating them with a fixed circularly-symmetric reference (the rotationally averaged total sum of the data).
We split the data set randomly into two groups of size 13,560 to generate class averages and reconstructions separately. Each image was identified with its 50 nearest neighbors (including reflections) and aligned to produce class averaged images. We randomly chose 200 class averages in each group to build the ab initio models, with the common-lines based method [97, 116] for orientation determination and the Fourier iterative reconstruction method (FIRM) [115] for reconstruction. Figure 5.13 shows 5 examples of class averaged images from our algorithm. The two volumes (see Figure 5.14) are consistent with each other up to 11.4Å (at this frequency, the Fourier shell correlation (Figure 5.15) between the two volumes drops below 0.143). We used the iterative reference-free alignment and K-means clustering algorithms implemented in the software SPIDER to generate 270 class means from the 13,560 images in each group. However, we were not able to reconstruct a 3D ab initio model from those class averages.

Figure 5.13: Class averages. Top row: examples of experimental images of the 50S ribosomal subunit. Bottom row: class averages obtained by averaging each image with its 50 aligned nearest neighbors (including reflected images). Courtesy of Dr. Fred Sigworth.

Figure 5.14: Reconstructed volumes of the 50S ribosomal subunit. Left column: snapshots of volume 1. Right column: snapshots of volume 2.

Figure 5.15: Fourier shell correlation between the two independently reconstructed volumes of the 50S ribosomal subunit. With the 0.143 cutoff criterion, the resolution is 11.4Å. The blue line is the Fourier shell correlation and the red line indicates the cutoff.
5.5 Summary and discussion

This chapter discussed the application of vector diffusion maps to the cryo-EM class averaging problem. At extremely low signal-to-noise ratios, the initial classification method that uses direct normalized cross-correlation of the rotationally invariant feature vectors breaks down. VDM, an algorithmic and mathematical framework for analyzing data sets, takes into account the consistency of in-plane rotational transformations between image pairs. This affinity is equivalent to an inner product, giving rise to an embedding of the data points in a Hilbert space and to the definition of distances between data points, which we refer to as vector diffusion distances. The eigenvectors of the matrix S̃ contain the information about the linear transformations between nearest neighbor pairs and lead to a much more accurate estimation of the rotational alignments. The methods presented in this chapter are also applicable to molecules with non-trivial point-group symmetries. The pipeline for class averaging was tested on both simulated and experimental data. For the two experimental data sets, we were able to reconstruct ab initio models and achieved resolutions of 15Å and 11Å, respectively, based on the 0.143 cutoff criterion for the Fourier shell correlation of two independently reconstructed volumes.
Chapter 6
Conclusions and Future Work
Cryo-EM, which makes it possible to visualize molecules under native conditions, is one of the most important tools for the investigation of molecular machines. In this thesis, we have developed new methods to solve the class averaging problem for single particle reconstruction (SPR) in cryo-EM. Because of the constraints on the electron dose on macromolecules, the signal-to-noise ratio of the images is very low. Class averaging is a crucial step before common-lines based ab initio reconstruction. We noticed that iterative reference-free alignment, a standard method used in this field for class averaging, assumes that a global alignment of the image data set exists. However, we proved using the Hairy Ball Theorem that such a global alignment does not exist. Therefore, the method leads to misclassification and is unable to provide good reconstructions.
In Chapter 2, I developed an efficient and accurate algorithm for computing the steerable PCA of a large set of two-dimensional images. The algorithm solves the two major challenges in computing the steerable PCA basis. The first challenge is to mitigate the increased computational cost associated with replicating each image multiple times. The second challenge is associated with noise: the non-unitary transformation from Cartesian to polar coordinates changes the noise statistics, and if we build the sample covariance matrix by simply enlarging the data set by the number of rotations and reflections, the realizations of noise are no longer independent. The algorithm starts by expanding each image, originally given on a Cartesian grid, in the Fourier-Bessel basis for the disk. Because the images are band-limited in the Fourier domain, we derived a sampling criterion to truncate the Fourier-Bessel expansion such that the maximum amount of information is preserved without aliasing. The covariance matrix constructed from the expansion coefficients is invariant to rotation and reflection and has a special block diagonal structure. PCA is efficiently performed for each block (the subspace of a given angular Fourier mode) separately. The computational complexity is smaller than that of traditional PCA without the inclusion of uniformly rotated copies. I am able to determine the number of informative components in each angular Fourier mode automatically. The Fourier-Bessel steerable PCA detects more meaningful eigenimages and has improved denoising capability compared to traditional PCA for a finite number of noisy images. The derivation of the 2D case can be extended to 3D, and it will be interesting to find applications in volume data, such as functional MRI volumes.
In Chapter 3, I developed rotationally invariant features for fast nearest neighbor search. Each image is first expanded in the steerable PCA basis to generate expansion coefficients, and the meaningful components are chosen automatically. Rotating an image is equivalent to phase shifting its expansion coefficients. Thus I am able to extend the theory of the bispectrum of 1D periodic signals to 2D images. Unlike the power spectrum, the bispectrum-like invariant features maintain phase information; they are shift invariant, complete, and unbiased. However, because the number of such features is extremely large, they were previously regarded as impractical to compute. I reduce the dimensionality of the invariant feature vectors by principal component analysis (PCA), using a randomized algorithm for low rank matrix approximation [83] to efficiently compute the principal components, thereby overcoming the difficulties imposed by the large number of images and the high dimensionality of the input feature vectors. The top M′ principal components provide the reduced invariant image representation. The rotationally invariant distance between images is then the Euclidean distance between their reduced invariant representations. Images with the smallest invariant distances are identified as nearest neighbors. For a large number of input images, a randomized approximate nearest neighbor algorithm [43] is used to avoid computing the distances between all pairs of images and to effectively find the nearest neighbors in time nearly linear in the number of images. This method reduces the computational complexity of finding the rotationally invariant k nearest neighbor pairs to O(kM′n log n), where k ≪ n and M′ is much smaller than the number of pixels in the region of interest. For nearest neighbor image pairs, the alignment angles α_ij are then computed. Because cryo-EM images are not well centered, we would like, in the future, to develop new features that are invariant to all rotations and to small shifts.
In Chapters 4 and 5, I showed that the initial classification and alignment can be further improved by taking into account the underlying manifold structure of the data set and the consistency of group transformations among nearest neighbor pairs. The algorithm successfully identifies images with nearby viewing directions even in the presence of high levels of noise. The spectral analysis of the class averaging matrix assumes a uniform distribution of the viewing directions; however, this assumption is not needed for defining the affinity between any two points. We are able to reconstruct electron density maps with high resolution from experimental data sets.
Because cryo-EM images are modulated by point spread functions that depend on the defocus values, comparing images from different defocus groups is challenging. Phase flipping is not an optimal solution, although it achieves relatively good results in the experiments with simulated data. It would be better to use the information in the covariance matrix of the projection images without any contrast transfer modulation. One way to estimate it is to solve a least squares problem. Solving this least squares problem is difficult because the size of the problem is enormous: we are trying to estimate a covariance matrix of size L² × L², where L is the length of one side of the image and usually L ≈ 100 pixels. However, with the help of the special block diagonal structure of the Fourier-Bessel steerable PCA, we might be able to decompose the least squares problem into sub-blocks and solve the smaller problems in parallel. In this way, we can estimate the true sample covariance matrix more accurately and efficiently. A second question is how to define metrics between images from different defocus groups.
The methods described in this thesis can also be applied to X-ray free-electron laser (XFEL) single particle reconstruction [65]. Because of the low photon counts (2.5 photons per frame), single particle imaging experiments of macromolecules at XFEL require processing hundreds of thousands (or more) of images. Recently, an expectation-maximization (EM) algorithm [75] was proposed to process the low-flux data in aggregate, without any prior knowledge; it is similar to the expectation-maximization algorithms used in cryo-EM single particle reconstruction. However, EM algorithms are not guaranteed to converge. It will therefore be interesting to use the class averaging method on such data to achieve higher photon counts, and then use the knowledge of common arcs among images to build the reconstructed volume. The main differences between XFEL and cryo-EM single particle reconstruction are the noise model (Poissonian) and the absence of modulation by the microscope point spread function and of shifts in XFEL data. The naïve cross-correlation between images will almost always be zero, but cross-correlation after projection onto the Fourier-Bessel steerable PCA basis may improve the score.
Appendix A
PCA and Wiener filtering
It is desirable to compress and denoise the signals before defining similarities between them. A good denoising procedure will retain most characteristic features of the true signal while diminishing the contribution of noise. For example, in [15], denoising the projections using wavelet spin-cycling [12] significantly improved the noise tolerance of their algorithm. A possible limitation of the wavelet denoising approach is that the pre-chosen wavelet basis is not adaptive to the data, and it is reasonable to believe that an adaptive basis will lead to improved denoising. One way of constructing such an adaptive basis for images is the Fourier-Bessel steerable PCA described in Chapter 2. We detail the derivations for the linear filtering procedure (2.28), which is asymptotically optimal in the mean squared error sense in the limit n, p_k → ∞ with p_k/n = γ fixed. Because of the special block diagonal structure of the covariance matrix (2.23), the linear filtering can be done in parallel for each separate block of size p_k × p_k. These derivations are adapted from the analysis in [100].
A.1 PCA: The Basics.
PCA is a linear dimensionality reduction method dating back to 1901 [66] and is one of the most useful techniques in data analysis; it is usually the first step in the analysis of most types of high-dimensional data. PCA finds an orthogonal transformation which maps the data to a new coordinate system such that the greatest variance of any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. Given a p-dimensional random variable x with mean Ex = µ ∈ R^p and p × p covariance matrix E[(x − µ)(x − µ)^T] = Σ, the solution to the maximization problem

    \max_{\|u\|=1} \operatorname{Var}(u^T x) = \max_{\|u\|=1} \mathbb{E}\left[ u^T (x-\mu)(x-\mu)^T u \right] = \max_{\|u\|=1} u^T \Sigma u    (A.1)
is given by the top eigenvector u_1 of Σ, satisfying Σu_1 = λ_1 u_1. The eigenvalues of Σ, denoted λ_1 ≥ λ_2 ≥ · · · ≥ λ_p ≥ 0, are known as the population eigenvalues. Similarly, the first d eigenvectors u_1, ..., u_d corresponding to λ_1, ..., λ_d are the solution to

    \max_{u_1, \ldots, u_d :\; u_i^T u_j = \delta_{ij}} \sum_{i=1}^{d} u_i^T \Sigma u_i.    (A.2)
The d-dimensional subspace spanned by the first d eigenvectors captures the most variability of the data. The first principal component u_1 is the axis that minimizes the sum of squared distances from the mean-shifted data points to their orthogonal projections on that axis:

    \min_{\|u\|=1} \mathbb{E}\, \|(I - uu^T)(x - \mu)\|^2.    (A.3)
A.2 Linear Wiener Filtering.
We assume that the clean signal x is contaminated by additive white Gaussian noise ξ,

    y = x + \xi.    (A.4)

We assume that Ex = µ and E[(x − µ)(x − µ)^T] = Σ, while Eξ = 0 (white noise with variance σ²). We would like to estimate the true signal; the estimator is derived by solving the minimization problem

    \min_{\hat{x}} \mathbb{E}\left[ \| x - \hat{x} \|^2 \mid y \right].    (A.5)

The well-known minimizer of (A.5) is the conditional expectation

    \hat{x} = \mathbb{E}[x \mid y].    (A.6)
However, computation of the conditional expectation (A.6) requires knowledge of the probability distribution of x, which is not available to us: only the second order statistics of x are assumed known. One standard alternative approach consists of two modifications: first, replacing the conditional mean squared error in (A.5) by the mean squared error, and second, restricting the minimizer of (A.5) to the smaller class of linear estimators. That is, the estimator x̂ is restricted to be of the form

    \hat{x} = \mu + H(y - \mu),    (A.7)

where H is a p × p matrix which solves the minimization problem

    \min_{H \in \mathbb{R}^{p \times p}} \mathbb{E}\, \| x - (\mu + H(y - \mu)) \|^2.    (A.8)

The solution to (A.8) is given by

    H = \Sigma (\Sigma + \sigma^2 I)^{-1}.    (A.9)
Derivation:

    \mathbb{E}\|x - (\mu + H(y-\mu))\|^2 = \mathbb{E}\|x - \mu - H(x - \mu + \xi)\|^2
    = \mathbb{E}\,\mathrm{Tr}\!\left[ (I-H)(x-\mu)(x-\mu)^{\dagger}(I-H)^{\dagger} + H\xi\xi^{\dagger}H^{\dagger} \right]
    = \mathrm{Tr}\!\left[ (I-H)\Sigma(I-H)^{\dagger} + \sigma^2 H H^{\dagger} \right].    (A.10)

Setting the derivative of (A.10) with respect to H equal to zero, we get

    -2(I - H)\Sigma + 2\sigma^2 H = 0,    (A.11)

whose solution is given in (A.9). Plugging (A.9) into (A.7) provides the linear filter

    \hat{x} = \mu + \Sigma(\Sigma + \sigma^2 I)^{-1}(y - \mu).    (A.12)
The estimator x̂ has a simple form when written in the basis of eigenvectors of Σ:

    \hat{x} = \mu + \sum_{k=1}^{p} \frac{1}{1 + \frac{1}{\mathrm{SNR}_k}} \langle y - \mu, u_k \rangle\, u_k,    (A.13)

where u_k is the k'th eigenvector of Σ, that is,

    \Sigma = \sum_{k=1}^{p} \lambda_k u_k u_k^{\dagger},    (A.14)

and

    \mathrm{SNR}_k = \frac{\lambda_k}{\sigma^2}    (A.15)

is the SNR of the k'th component.
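A direct implementation of the oracle filter (A.13) — a sketch assuming Σ and σ² are known exactly; in practice they must be estimated, as discussed below:

```python
import numpy as np

def wiener_filter(y, mu, Sigma, sigma2):
    """Linear Wiener filter (A.13): shrink each principal component
    of y - mu by lambda_k / (lambda_k + sigma^2) = 1 / (1 + 1/SNR_k)."""
    lam, U = np.linalg.eigh(Sigma)       # eigenpairs of the covariance
    coeffs = U.T @ (y - mu)              # coordinates in the eigenbasis
    shrink = lam / (lam + sigma2)        # the per-component filter gains
    return mu + U @ (shrink * coeffs)
```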
A.3 PCA in High Dimensions.
The filtering formula (A.13) requires knowledge of the first and second order moments µ, Σ, and σ². In practice, however, these are unknown and need to be estimated from a finite collection of noisy observations. Once they are estimated, the naïve approach to filtering would be to replace µ, Σ, and σ² in (A.13) by their estimated counterparts. However, as we show below, the optimal filter turns out to be different from this naïve procedure. Before constructing the optimal filter, it is useful to quickly review a few recent results regarding PCA in high dimensions; the reader is referred to [42] for an extensive review of this topic.

Let y_1, ..., y_n ∈ R^p be n noisy observations from the additive noise model (A.4). The sample mean estimator µ̂_n is defined as

    \hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} y_i,    (A.16)

and by the law of large numbers µ̂_n → µ (almost surely) as n → ∞. The sample covariance matrix is

    S_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\mu})(y_i - \hat{\mu})^{\dagger}.    (A.17)

The sample covariance matrix has an eigen-decomposition with eigenvalues l_1 ≥ l_2 ≥ · · · ≥ l_p:

    S_n = \sum_{k=1}^{p} l_k \hat{u}_k \hat{u}_k^{\dagger}.    (A.18)

The sample covariance matrix converges to Σ + σ² I_{p×p} as n → ∞ while p is fixed.
When the number of samples n is not exceedingly large compared to p, the estimation of the signal eigenspace requires more thought. Much attention has been given in recent years to the analysis of PCA in the regime p, n → ∞ with p/n = γ fixed (0 < γ < ∞) [42]. This is also the relevant regime of parameters for the cryo-EM problem at hand, since a typical value for p is 10⁴ and n ranges between 10⁴ and 10⁵. In such cases, the largest eigenvalue due to noise can be significantly larger than σ². It is therefore possible for the smaller eigenvalues of Σ to be "buried" inside the limiting Marčenko-Pastur (MP) distribution of the eigenvalues of the noise covariance matrix. As a result, the principal components that correspond to such small eigenvalues cannot be identified. In Chapter 2, we managed to decompose the covariance matrix into a block diagonal structure, which in effect reduces p to √p and therefore mitigates the effect of large p.

The key result is the presence of a phase transition: in the joint limit p, n → ∞, p/n → γ, only components of Σ whose eigenvalues are larger than the critical value

    \lambda_{\mathrm{crit}} = \sigma^2 \sqrt{\gamma}    (A.19)

can be identified (almost surely), in the sense that their corresponding sample covariance eigenvalues "pop" outside the MP distribution. Formally (see, e.g., [62]),

    l_k = \begin{cases} \sigma^2 (1 + \sqrt{\gamma})^2, & \text{if } \lambda_k < \sigma^2\sqrt{\gamma}, \\ (\lambda_k + \sigma^2)\left(1 + \gamma \dfrac{\sigma^2}{\lambda_k}\right), & \text{if } \lambda_k \ge \sigma^2\sqrt{\gamma}. \end{cases}    (A.20)

Moreover, the dot product between the population eigenvector u_k and the eigenvector û_k computed by PCA also undergoes a phase transition almost surely:

    |\langle u_k, \hat{u}_k \rangle|^2 = \begin{cases} 0, & \text{if } \lambda_k < \sigma^2\sqrt{\gamma}, \\ \dfrac{1 - \gamma\sigma^4/\lambda_k^2}{1 + \gamma\sigma^2/\lambda_k}, & \text{if } \lambda_k \ge \sigma^2\sqrt{\gamma}. \end{cases}    (A.21)
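A sketch of the forward formulas (A.20)-(A.21), assuming the spiked covariance model with known σ² and γ (our own helper, useful for sanity-checking the phase transition numerically):

```python
import numpy as np

def spike_forward(lam, sigma2, gamma):
    """Predicted sample eigenvalue l_k (A.20) and squared cosine
    |<u_k, u_hat_k>|^2 (A.21) for a population eigenvalue lam."""
    crit = sigma2 * np.sqrt(gamma)
    if lam < crit:                       # component buried in the MP bulk
        l = sigma2 * (1.0 + np.sqrt(gamma)) ** 2
        c2 = 0.0
    else:                                # component pops outside the bulk
        l = (lam + sigma2) * (1.0 + gamma * sigma2 / lam)
        c2 = (1.0 - gamma * sigma2**2 / lam**2) / (1.0 + gamma * sigma2 / lam)
    return l, c2
```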
A.4 Linear filtering in high dimensions
Motivated by the form of the optimal linear filter (A.13), the following linear filter was proposed:

    \hat{x} = \hat{\mu}_n + \sum_{k=1}^{p} h_k \langle y - \hat{\mu}_n, \hat{u}_k \rangle\, \hat{u}_k,    (A.22)

where the scalar filter coefficients h_1, ..., h_p are determined from the minimum MSE criterion (A.8). The solution is given by

    h_k = \frac{1}{1 + \dfrac{\sigma^2}{\hat{u}_k^T \Sigma \hat{u}_k + |\langle \mu - \hat{\mu}_n, \hat{u}_k \rangle|^2}}.    (A.23)
In practice, since σ², Σ, and µ are unknown, it is hard to determine the filtering coefficients written above. However, in the limit n → ∞, µ̂_n converges to µ almost surely, so |⟨µ − µ̂_n, û_k⟩|² converges to zero for all k. Furthermore,

    \hat{u}_k^T \Sigma \hat{u}_k = \sum_{l=1}^{p} \lambda_l |\langle \hat{u}_k, u_l \rangle|^2 = \lambda_k |\langle \hat{u}_k, u_k \rangle|^2 + \sum_{l=1,\, l \ne k}^{p} \lambda_l |\langle \hat{u}_k, u_l \rangle|^2.    (A.24)

Under the spike model [42], where Σ is assumed to have only a finite number K of non-zero eigenvalues, the sum \sum_{l=1,\, l \ne k}^{p} \lambda_l |\langle \hat{u}_k, u_l \rangle|^2 converges to zero. Therefore û_k^T Σ û_k converges to λ_k c_k² almost surely for all k, where c_k² = |⟨u_k, û_k⟩|². The optimal filter coefficients (A.23) converge almost surely to

    h_k = \frac{1}{1 + \frac{\sigma^2}{\lambda_k c_k^2}} = \frac{1}{1 + \frac{1}{c_k^2\, \mathrm{SNR}_k}},    (A.25)

where SNR_k is given in (A.15). Using (A.21), we rewrite the filter coefficients as
    h_k = \frac{1}{1 + \frac{1}{\mathrm{SNR}_{\gamma,k}}},    (A.26)

where

    \mathrm{SNR}_{\gamma,k} = c_k^2\, \mathrm{SNR}_k = \begin{cases} 0, & \text{if } \mathrm{SNR}_k < \sqrt{\gamma}, \\ \dfrac{\mathrm{SNR}_k^2 - \gamma}{\mathrm{SNR}_k + \gamma}, & \text{if } \mathrm{SNR}_k \ge \sqrt{\gamma}. \end{cases}    (A.27)
In the limit n, p → ∞ with p/n = γ fixed, the asymptotically optimal linear Wiener filter is given by

    \hat{x} = \mu + \sum_{k=1}^{\infty} \frac{1}{1 + \frac{1}{\mathrm{SNR}_{\gamma,k}}} \langle y - \mu, \hat{u}_k \rangle\, \hat{u}_k.    (A.28)

In practice, where only a finite number of samples is available, we propose to use

    \hat{x} = \hat{\mu}_n + \sum_{k=1}^{\hat{K}} \frac{1}{1 + \frac{1}{\mathrm{SNR}_{\gamma,k}}} \langle y - \hat{\mu}_n, \hat{u}_k \rangle\, \hat{u}_k,    (A.29)

where K̂ is the number of components estimated to satisfy λ_k > σ²√γ.
In order to use the filter (A.29) in practice, we need to know the noise variance σ² and the eigenvalues λ_1, ..., λ_p. These are unknown and need to be estimated from the data. We mentioned earlier that σ² can be estimated using the boundary pixels, which correspond to pure noise. We can also adapt the estimation method suggested by Kritchman and Nadler [50]; their method provides an estimator σ̂² of the noise variance and an estimator K̂ of the number of components satisfying (A.19) under the spike model assumption. The top K̂ population eigenvalues λ_1, ..., λ_K̂ are then estimated as the positive solutions of the decoupled quadratic equations

    l_k = (\hat{\lambda}_k + \hat{\sigma}^2)\left(1 + \gamma \frac{\hat{\sigma}^2}{\hat{\lambda}_k}\right), \quad k = 1, \ldots, \hat{K},

as implied by (A.20).
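Each of these quadratics can be solved in closed form; a small sketch (our own, obtained by simply expanding the equation above):

```python
import numpy as np

def debias_eigenvalue(l_k, sigma2, gamma):
    """Solve l_k = (lam + sigma2) * (1 + gamma * sigma2 / lam) for lam > 0.
    Expanding gives lam^2 + (sigma2*(1+gamma) - l_k)*lam + gamma*sigma2^2 = 0."""
    b = sigma2 * (1.0 + gamma) - l_k
    disc = b * b - 4.0 * gamma * sigma2**2
    if disc < 0:
        raise ValueError("l_k lies inside the Marchenko-Pastur bulk")
    return (-b + np.sqrt(disc)) / 2.0    # take the larger (positive) root
```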
Appendix B
Parallel transport on the sphere and related operators
Our algorithm is heavily based on (4.17), which provides a way to express the dot product between unknown viewing angles using the Hermitian dot product between vectors in C³ obtained from an orthonormal basis of the maximal eigenspace of multiplicity 3 of the local parallel transport operator T_h. We calculate an explicit orthonormal basis for this eigenspace and give an elementary verification of identity (4.17) without using representation theory. In passing, we make a few important observations about the parallel transport operator that go beyond the mere purpose of identifying its maximal eigenspace. In particular, we relate the parallel transport operator to two other integral operators, which we call the common lines operator and the orthographic lines operator, first introduced in [97] and [32], respectively, as well as their localized versions. At first we disregard the complex structure, thinking of T_{R_i,R_j} as a map from R² to R².
B.1 Common lines and orthographic lines
One of the fundamental concepts in cryo-EM is that of the common line [104, 106]: if I_i and I_j are two projection images with corresponding rotations R_i and R_j, then the two planes v(R_i)^⊥ and v(R_j)^⊥ must have a common line of intersection whose normalized direction in R³ is the unit vector

    \frac{v(R_i) \times v(R_j)}{\| v(R_i) \times v(R_j) \|} = \frac{R_i^3 \times R_j^3}{\| R_i^3 \times R_j^3 \|}.    (B.1)
Using the orthonormal bases R_i^1, R_i^2 for v(R_i)^⊥ and R_j^1, R_j^2 for v(R_j)^⊥, the unit vector (B.1) is identified with the unit vectors c_ij ∈ T_{R_i} = R² and c_ji ∈ T_{R_j} = R², given by

    c_{ij} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} R_i^{-1}\, \frac{R_i^3 \times R_j^3}{\| R_i^3 \times R_j^3 \|}    (B.2)

and

    c_{ji} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} R_j^{-1}\, \frac{R_i^3 \times R_j^3}{\| R_i^3 \times R_j^3 \|}.    (B.3)

Note that the cross product satisfies

    (Rw_1) \times (Rw_2) = R(w_1 \times w_2) \quad \text{for all } R \in SO(3) \text{ and } w_1, w_2 \in \mathbb{R}^3.    (B.4)
The common lines (B.2)-(B.3) can therefore be simplified using

    R_i^{-1}(R_i^3 \times R_j^3) = (R_i^{-1} R_i^3) \times (R_i^{-1} R_j^3) = I^3 \times U^3 = \begin{pmatrix} -U_{23} \\ U_{13} \\ 0 \end{pmatrix}    (B.5)

and

    R_j^{-1}(R_i^3 \times R_j^3) = (R_j^{-1} R_i^3) \times (R_j^{-1} R_j^3) = (U^{-1})^3 \times I^3 = \begin{pmatrix} U_{32} \\ -U_{31} \\ 0 \end{pmatrix},    (B.6)

where U = R_i^{-1} R_j, U³ is its third column, and I³ = (0, 0, 1)^T is the third column of the 3 × 3 identity matrix I. As rotations are isometries, we also have that

    \| R_i^3 \times R_j^3 \|^2 = \| R_i (I^3 \times U^3) \|^2 = \| I^3 \times U^3 \|^2 = (U_{13})^2 + (U_{23})^2 = 1 - (U_{33})^2.    (B.7)
It follows from (B.5)-(B.7) that the common lines (B.2)-(B.3) are given by

    c_{ij} = \frac{1}{\sqrt{1 - (U_{33})^2}} \begin{pmatrix} -U_{23} \\ U_{13} \end{pmatrix}    (B.8)

and

    c_{ji} = \frac{1}{\sqrt{1 - (U_{33})^2}} \begin{pmatrix} U_{32} \\ -U_{31} \end{pmatrix}.    (B.9)
We proceed to define the orthographic lines o_ij and o_ji. Let õ_ij ∈ R³ be the normalized projection of v(R_j) onto v(R_i)^⊥, given by

    \tilde{o}_{ij} = \frac{\langle R_i^1, R_j^3 \rangle R_i^1 + \langle R_i^2, R_j^3 \rangle R_i^2}{\sqrt{1 - \langle R_i^3, R_j^3 \rangle^2}},    (B.10)

and let õ_ji ∈ R³ be the normalized projection of v(R_i) onto v(R_j)^⊥, given by

    \tilde{o}_{ji} = \frac{\langle R_j^1, R_i^3 \rangle R_j^1 + \langle R_j^2, R_i^3 \rangle R_j^2}{\sqrt{1 - \langle R_i^3, R_j^3 \rangle^2}}.    (B.11)
We identify õ_ij and õ_ji with the vectors o_ij ∈ T_{R_i} and o_ji ∈ T_{R_j}, respectively, written explicitly as

    o_{ij} = \frac{1}{\sqrt{1 - \langle R_i^3, R_j^3 \rangle^2}} \begin{pmatrix} \langle R_i^1, R_j^3 \rangle \\ \langle R_i^2, R_j^3 \rangle \end{pmatrix} = \frac{1}{\sqrt{1 - (U_{33})^2}} \begin{pmatrix} U_{13} \\ U_{23} \end{pmatrix}    (B.12)

and

    o_{ji} = \frac{1}{\sqrt{1 - \langle R_i^3, R_j^3 \rangle^2}} \begin{pmatrix} \langle R_j^1, R_i^3 \rangle \\ \langle R_j^2, R_i^3 \rangle \end{pmatrix} = \frac{1}{\sqrt{1 - (U_{33})^2}} \begin{pmatrix} U_{31} \\ U_{32} \end{pmatrix}.    (B.13)
The common lines (B.8)-(B.9) and the orthographic lines (B.12)-(B.13) are related through

    c_{ij} = J o_{ij}, \qquad c_{ji} = -J o_{ji},    (B.14)

where

    J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}    (B.15)

is the 2 × 2 rotation matrix by 90°. Note that J^T = J^{-1} = -J and that multiplication by J is equivalent to multiplication by ι in the complex structure. From (B.14) it follows that the orthographic lines are perpendicular to the common lines, that is, c_{ij}^T o_{ij} = c_{ji}^T o_{ji} = 0, because x^T J x = 0 for every x ∈ R².
B.2 Common lines, orthographic lines, and parallel transport kernels.
We define the orthographic lines kernel as the 2 × 2 rank-one matrix O_{R_i,R_j} that maps the orthographic line o_ji to the orthographic line o_ij:

    O_{R_i,R_j} = o_{ij} o_{ji}^T = \frac{1}{1 - (U_{33})^2} \begin{pmatrix} U_{13} \\ U_{23} \end{pmatrix} \begin{pmatrix} U_{31} & U_{32} \end{pmatrix}.    (B.16)

Similarly, we define the common lines kernel as the 2 × 2 rank-one matrix C_{R_i,R_j} that maps the common line c_ji to the common line c_ij:

    C_{R_i,R_j} = c_{ij} c_{ji}^T = J o_{ij} (-J o_{ji})^T = -J O_{R_i,R_j} J^{-1}    (B.17)
              = \frac{1}{1 - (U_{33})^2} \begin{pmatrix} -U_{23} \\ U_{13} \end{pmatrix} \begin{pmatrix} U_{32} & -U_{31} \end{pmatrix}.    (B.18)

Note that unlike the parallel transport kernel T_{R_i,R_j}, the orthographic lines kernel O_{R_i,R_j} and the common lines kernel C_{R_i,R_j} do not respect the complex structure: they can be considered as maps from R² to R², but not as maps from C to C.
Lemma B.2.1. The parallel transport kernel T_{R_i,R_j} admits the decomposition

    T_{R_i,R_j} = C_{R_i,R_j} - O_{R_i,R_j}.    (B.19)

Proof. The geometric definition of the orthographic line implies that õ_ji is the normalized velocity vector at v(R_j) of the geodesic path from v(R_j) to v(R_i), and similarly õ_ij is the normalized velocity vector at v(R_i) of the geodesic path from v(R_i) to v(R_j). The differential geometric definition of parallel transport implies that T_{R_i,R_j} maps o_ji to −o_ij. To show that T_{R_i,R_j} is indeed given by (B.19), we need to check that it satisfies

    T_{R_i,R_j} o_{ji} = -o_{ij}    (B.20)

and

    T_{R_i,R_j} J o_{ji} = -J o_{ij},    (B.21)

but these follow immediately from (B.16), (B.17), and the orthogonality of the common lines and the orthographic lines:

    (C_{R_i,R_j} - O_{R_i,R_j})\, o_{ji} = (c_{ij} c_{ji}^T - o_{ij} o_{ji}^T)\, o_{ji} = -o_{ij},
    (C_{R_i,R_j} - O_{R_i,R_j})\, J o_{ji} = (-J o_{ij} o_{ji}^T J^{-1} - o_{ij} o_{ji}^T)\, J o_{ji} = -J o_{ij}.
B.3 Global and local integral operators
Using the kernels C_{R_i,R_j}, O_{R_i,R_j}, and T_{R_i,R_j} we define the integral operators C, O, and T as follows:

    (Cf)(R_i) = \int_{SO(3)} C_{R_i,R_j}\, f(R_j)\, dR_j,
    (Of)(R_i) = \int_{SO(3)} O_{R_i,R_j}\, f(R_j)\, dR_j,
    (Tf)(R_i) = \int_{SO(3)} T_{R_i,R_j}\, f(R_j)\, dR_j,

where f is a function from SO(3) to R². We refer to C, O, and T as the global common lines operator, the global orthographic lines operator, and the global parallel transport operator, respectively.
We also define the local integral operators C_h, O_h, and T_h, for h ∈ [0, 2], as

    (C_h f)(R_i) = \int_{\{R_j \in SO(3)\, :\, \langle v(R_i), v(R_j) \rangle > 1-h\}} C_{R_i,R_j}\, f(R_j)\, dR_j,    (B.22)
    (O_h f)(R_i) = \int_{\{R_j \in SO(3)\, :\, \langle v(R_i), v(R_j) \rangle > 1-h\}} O_{R_i,R_j}\, f(R_j)\, dR_j,    (B.23)
    (T_h f)(R_i) = \int_{\{R_j \in SO(3)\, :\, \langle v(R_i), v(R_j) \rangle > 1-h\}} T_{R_i,R_j}\, f(R_j)\, dR_j.    (B.24)

We refer to C_h, O_h, and T_h as the local common lines operator, the local orthographic lines operator, and the local parallel transport operator, respectively.
B.4 Spectral analysis of the local operators
The following lemma is key to the characterization of the maximal eigenspace of T_h.

Lemma B.4.1. Each of the three columns of

    F(R) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} R^{-1} = \begin{pmatrix} R_{11} & R_{21} & R_{31} \\ R_{12} & R_{22} & R_{32} \end{pmatrix}    (B.25)

is an eigenfunction of each of C_h, O_h, and T_h, with

    (C_h F)(R) = \frac{1}{4} h\, F(R),    (B.26)
    (O_h F)(R) = -\left(\frac{1}{4} h - \frac{1}{8} h^2\right) F(R),    (B.27)
    (T_h F)(R) = \left(\frac{1}{2} h - \frac{1}{8} h^2\right) F(R).    (B.28)
Proof. Recalling that U = R_i^{-1} R_j gives

    F(R_j) = F(R_i U) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} U^{-1} R_i^{-1} = \begin{pmatrix} U_{11} & U_{21} & U_{31} \\ U_{12} & U_{22} & U_{32} \end{pmatrix} R_i^{-1}.    (B.29)

Together with (B.17) we obtain

    C_{R_i,R_j} F(R_j) = \frac{1}{1-(U_{33})^2} \begin{pmatrix} -U_{23} \\ U_{13} \end{pmatrix} \begin{pmatrix} U_{32} & -U_{31} \end{pmatrix} \begin{pmatrix} U_{11} & U_{21} & U_{31} \\ U_{12} & U_{22} & U_{32} \end{pmatrix} R_i^{-1}.    (B.30)

We note that

    \begin{pmatrix} -U_{32} & U_{31} & 0 \end{pmatrix} U^{-1} = (I^3 \times (U^{-1})^3)^T U^{-1} = [U(I^3 \times (U^{-1})^3)]^T = (U^3 \times I^3)^T = \begin{pmatrix} U_{23} & -U_{13} & 0 \end{pmatrix},

which, substituting into (B.30), results in

    C_{R_i,R_j} F(R_j) = \frac{1}{1-(U_{33})^2} \begin{pmatrix} U_{23} \\ -U_{13} \end{pmatrix} \begin{pmatrix} U_{23} & -U_{13} & 0 \end{pmatrix} R_i^{-1} = \frac{1}{1-z^2} \begin{pmatrix} y^2 & -xy & 0 \\ -xy & x^2 & 0 \end{pmatrix} R_i^{-1},    (B.31)

where U³ = (x, y, z)^T. Similarly,

    O_{R_i,R_j} F(R_j) = \frac{1}{1-(U_{33})^2} \begin{pmatrix} U_{13} \\ U_{23} \end{pmatrix} \begin{pmatrix} U_{31} & U_{32} \end{pmatrix} \begin{pmatrix} U_{11} & U_{21} & U_{31} \\ U_{12} & U_{22} & U_{32} \end{pmatrix} R_i^{-1}
      = \frac{1}{1-(U_{33})^2} \begin{pmatrix} U_{13} \\ U_{23} \end{pmatrix} [(U^{-1})^3 - U_{33} I^3]^T U^{-1} R_i^{-1}
      = \frac{1}{1-(U_{33})^2} \begin{pmatrix} U_{13} \\ U_{23} \end{pmatrix} (I^3 - U_{33} U^3)^T R_i^{-1}
      = \frac{1}{1-z^2} \begin{pmatrix} -zx^2 & -zxy & x(1-z^2) \\ -zxy & -zy^2 & y(1-z^2) \end{pmatrix} R_i^{-1}.    (B.32)

From (B.31) and (B.32) it follows that the integrals over SO(3) defining C_h F and O_h F in (B.22)-(B.23) collapse to the following integrals over S²:

    (C_h F)(R_i) = \left[ \int_{\mathrm{Cap}(h)} \frac{1}{1-z^2} \begin{pmatrix} y^2 & -xy & 0 \\ -xy & x^2 & 0 \end{pmatrix} dS \right] R_i^{-1},    (B.33)

    (O_h F)(R_i) = \left[ \int_{\mathrm{Cap}(h)} \frac{1}{1-z^2} \begin{pmatrix} -zx^2 & -zxy & x(1-z^2) \\ -zxy & -zy^2 & y(1-z^2) \end{pmatrix} dS \right] R_i^{-1},    (B.34)

where Cap(h) = {(x, y, z) ∈ S² : z > 1 − h} and dS is the uniform measure over S² satisfying \int_{S^2} dS = 1. In spherical coordinates,

    x = \sin\theta\cos\phi, \quad y = \sin\theta\sin\phi, \quad z = \cos\theta,

the cap is given by 0 ≤ φ < 2π and 0 ≤ θ < α, where cos α = 1 − h and dS = \frac{1}{4\pi}\sin\theta\, d\theta\, d\phi. In order to simplify (B.33)-(B.34) we calculate the following integrals
using spherical coordinates and symmetry whenever possible:

    \int_{\mathrm{Cap}(h)} \frac{xy}{1-z^2}\, dS = \int_{\mathrm{Cap}(h)} \frac{zxy}{1-z^2}\, dS = \int_{\mathrm{Cap}(h)} x\, dS = \int_{\mathrm{Cap}(h)} y\, dS = 0,

    \int_{\mathrm{Cap}(h)} \frac{x^2}{1-z^2}\, dS = \frac{1}{2}\int_{\mathrm{Cap}(h)} \frac{x^2+y^2}{1-z^2}\, dS = \frac{1}{2}\int_{\mathrm{Cap}(h)} dS = \frac{1}{8\pi}\int_0^{2\pi}\! d\phi \int_0^{\alpha}\! \sin\theta\, d\theta = \frac{1}{4}(1-\cos\alpha) = \frac{1}{4}h,

    \int_{\mathrm{Cap}(h)} \frac{zx^2}{1-z^2}\, dS = \frac{1}{2}\int_{\mathrm{Cap}(h)} \frac{z(x^2+y^2)}{1-z^2}\, dS = \frac{1}{2}\int_{\mathrm{Cap}(h)} z\, dS = \frac{1}{8\pi}\int_0^{2\pi}\! d\phi \int_0^{\alpha}\! \cos\theta\sin\theta\, d\theta = \frac{1}{8}(1-\cos^2\alpha) = \frac{1}{4}h - \frac{1}{8}h^2,

and the same values hold with x² replaced by y². Substituting the values of the integrals above into (B.33)-(B.34) gives

    (C_h F)(R_i) = \frac{1}{4}h \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} R_i^{-1} = \frac{1}{4}h\, F(R_i),

    (O_h F)(R_i) = -\left(\frac{1}{4}h - \frac{1}{8}h^2\right) \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} R_i^{-1} = -\left(\frac{1}{4}h - \frac{1}{8}h^2\right) F(R_i).

Together with Lemma B.2.1 we have

    (T_h F)(R_i) = (C_h F)(R_i) - (O_h F)(R_i) = \frac{1}{4}h\, F(R_i) + \left(\frac{1}{4}h - \frac{1}{8}h^2\right) F(R_i) = \left(\frac{1}{2}h - \frac{1}{8}h^2\right) F(R_i),

and the proof of the lemma is completed.
Note that from (B.17) it follows that f : SO(3) → R² is an eigenfunction of the operator O_h satisfying

    (O_h f)(R) = \lambda f(R) \quad \text{for all } R \in SO(3)

if and only if Jf is an eigenfunction of the operator C_h satisfying

    (C_h Jf)(R) = -\lambda Jf(R) \quad \text{for all } R \in SO(3).

Indeed, suppose (O_h f)(R) = λf(R) for all R ∈ SO(3); then

    (C_h Jf)(R_i) = \int_{\langle v(R_i), v(R_j) \rangle > 1-h} C_{R_i,R_j}\, J f(R_j)\, dR_j
                 = \int_{\langle v(R_i), v(R_j) \rangle > 1-h} -J O_{R_i,R_j} J^{-1} J f(R_j)\, dR_j
                 = -J \int_{\langle v(R_i), v(R_j) \rangle > 1-h} O_{R_i,R_j}\, f(R_j)\, dR_j = -\lambda J f(R_i).

The other direction of the "iff" statement is verified in a similar manner. We conclude that the three columns of JF(R) are also eigenfunctions of T_h with the same eigenvalue λ₁(h) = \frac{1}{2}h - \frac{1}{8}h². Overall, the multiplicity of the eigenvalue λ₁(h) of T_h is 6.
B.5 Complex structure of the parallel transport kernel.
We return to the complex structure and regard the parallel transport kernel T_{R_i,R_j} as a mapping¹ from C to C. Since parallel transport maps o_ji to −o_ij (see (B.12)-(B.13)), the complex structure implies that

    T_{R_i,R_j} (U_{31} + \iota U_{32}) = -(U_{13} + \iota U_{23}),    (B.35)

and we conclude that the single matrix element of T_{R_i,R_j} is

    T_{R_i,R_j} = -\frac{U_{13} + \iota U_{23}}{U_{31} + \iota U_{32}}.    (B.36)

¹The parallel transport kernel T_{R_i,R_j} is either a 2 × 2 real-valued (rotation) matrix or a 1 × 1 complex-valued matrix. Though our notation is ambiguous, the exact meaning of T_{R_i,R_j} should be clear from the context.
We proceed to prove that the rotation angle α̃_ij of (3.2)-(3.3), obtained by optimal alignment of the bases, is the same as the rotation angle of the parallel transport kernel. Establishing this fact justifies that the class averaging matrix H, which is computed from the image data, is indeed a discretized version of the local parallel transport operator. Comparing the expressions in (B.36) and (3.2)-(3.3), we conclude that the proof consists of verifying that the identity

    -\frac{U_{13} + \iota U_{23}}{U_{31} + \iota U_{32}} = \frac{(U_{11} + U_{22}) + \iota(U_{21} - U_{12})}{\sqrt{(U_{11} + U_{22})^2 + (U_{21} - U_{12})^2}}    (B.37)

holds for all U ∈ SO(3) with v(U) ≠ ±(0, 0, 1)^T (the parallel transport operator is not defined for v(R_i) = ±v(R_j)). The sphere S³ ⊂ R⁴ is a double cover of SO(3), and by Euler's formula, to any U ∈ SO(3) there corresponds a unit quaternion (a, b, c, d) ∈ S³ (a² + b² + c² + d² = 1), unique up to its antipodal point (−a, −b, −c, −d), such that

    U = \begin{pmatrix} a^2+b^2-c^2-d^2 & 2bc-2ad & 2ac+2bd \\ 2bc+2ad & a^2-b^2+c^2-d^2 & -2ab+2cd \\ -2ac+2bd & 2ab+2cd & a^2-b^2-c^2+d^2 \end{pmatrix}.    (B.38)
Using Euler's formula (B.38), the right-hand side of (B.37) is given by
\[
\frac{(U_{11}+U_{22}) + \iota(U_{21}-U_{12})}{\sqrt{(U_{11}+U_{22})^2 + (U_{21}-U_{12})^2}} = \frac{(a^2-d^2) + 2\iota ad}{a^2+d^2},
\]
while the left-hand side is
\[
-\frac{U_{13} + \iota U_{23}}{U_{31} + \iota U_{32}} = -\frac{(U_{13}U_{31} + U_{23}U_{32}) + \iota(U_{23}U_{31} - U_{13}U_{32})}{U_{31}^2 + U_{32}^2}
= -\frac{-4(a^2-d^2)(b^2+c^2) - 8\iota ad(b^2+c^2)}{4(a^2+d^2)(b^2+c^2)} = \frac{(a^2-d^2) + 2\iota ad}{a^2+d^2},
\]
which verifies that the identity (B.37) holds for all $U \in SO(3)$ with $v(U) \neq \pm(0,0,1)^T$, and the proof is completed.
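The identity (B.37) is also easy to check numerically. The following minimal NumPy sketch (an illustration added here, not part of the original proof) builds $U$ from a random unit quaternion via Euler's formula (B.38) and compares the two sides; the helper name `quat_to_rot` is ours:

```python
import numpy as np

def quat_to_rot(a, b, c, d):
    """Euler's formula (B.38): unit quaternion (a, b, c, d) -> U in SO(3)."""
    return np.array([
        [a*a + b*b - c*c - d*d, 2*(b*c - a*d),         2*(a*c + b*d)],
        [2*(b*c + a*d),         a*a - b*b + c*c - d*d, 2*(c*d - a*b)],
        [2*(b*d - a*c),         2*(a*b + c*d),         a*a - b*b - c*c + d*d],
    ])

rng = np.random.default_rng(1)
for _ in range(10):
    q = rng.standard_normal(4)
    a, b, c, d = q / np.linalg.norm(q)   # uniform on S^3, i.e. Haar on SO(3)
    U = quat_to_rot(a, b, c, d)
    lhs = -(U[0, 2] + 1j * U[1, 2]) / (U[2, 0] + 1j * U[2, 1])
    rhs = ((U[0, 0] + U[1, 1]) + 1j * (U[1, 0] - U[0, 1])) / np.hypot(
        U[0, 0] + U[1, 1], U[1, 0] - U[0, 1])
    # (B.37) holds, and the kernel element has unit modulus (it is a rotation).
    assert np.allclose(lhs, rhs) and np.isclose(abs(lhs), 1.0)
```

The excluded set $v(U) = \pm(0,0,1)^T$ has measure zero, so the random quaternions never hit it in practice.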
B.6 Spectral analysis of the complex local parallel transport operator
In the complex structure, multiplication by $J$ is equivalent to multiplication by $\iota$, so the columns of $JF(R)$ are linear combinations of the columns of $F(R)$, and the multiplicity of the eigenvalue $\lambda_1(h)$ of $T_h$ is only 3 (instead of 6 over the reals). We have the following pair of lemmas.
Lemma B.6.1. The three functions $\psi_1, \psi_2, \psi_3 : SO(3) \to \mathbb{C}$ given by
\begin{align}
\psi_1(R) &= R_{11} + \iota R_{12}, \tag{B.39}\\
\psi_2(R) &= R_{21} + \iota R_{22}, \tag{B.40}\\
\psi_3(R) &= R_{31} + \iota R_{32} \tag{B.41}
\end{align}
are orthogonal eigenfunctions of $T_h$ satisfying
\[
T_h \psi_m = \lambda_1(h)\,\psi_m, \quad m = 1,2,3, \tag{B.42}
\]
where
\[
\lambda_1(h) = \frac{1}{2}h - \frac{1}{8}h^2, \tag{B.43}
\]
and
\[
\|\psi_1\|^2 = \|\psi_2\|^2 = \|\psi_3\|^2 = \frac{2}{3}. \tag{B.44}
\]
Proof. The fact that $\psi_1, \psi_2, \psi_3$ are eigenfunctions of $T_h$ with eigenvalue $\lambda_1(h) = \frac{1}{2}h - \frac{1}{8}h^2$ follows immediately from (B.28) (Lemma B.4.1) and the complex structure. The orthogonality of the eigenfunctions and their normalization are verified through
\[
\int_{SO(3)} F(R)^T F(R)\,dR
= \int_{SO(3)} R \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} R^{-1}\,dR
= \int_{SO(3)} R\,\big(I - I^3 (I^3)^T\big)\,R^{-1}\,dR
= \int_{SO(3)} \big(I - R^3 (R^3)^T\big)\,dR
= I - \int_{SO(3)} \begin{pmatrix} x^2 & xy & xz \\ xy & y^2 & yz \\ xz & yz & z^2 \end{pmatrix} dR
= I - \frac{1}{3}I = \frac{2}{3}I,
\]
where $R^3 = (x\; y\; z)^T$ denotes the third column of $R$ (and $I^3$ the third column of the identity matrix), and we used symmetry to calculate
\[
\int_{SO(3)} x^2\,dR = \frac{1}{3}\int_{SO(3)} (x^2+y^2+z^2)\,dR = \frac{1}{3}\int_{SO(3)} dR = \frac{1}{3}.
\]
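Both the normalization (B.44) and the orthogonality can be spot-checked numerically. The sketch below (an illustration, not the thesis's code) assumes NumPy and uses SciPy's `special_ortho_group` as a Haar sampler on $SO(3)$; the sampler choice is ours:

```python
import numpy as np
from scipy.stats import special_ortho_group

# Haar-distributed rotations; psi_m(R) = R_{m1} + i R_{m2}, m = 1, 2, 3,
# i.e. the first two columns of R combined into one complex 3-vector.
Rs = special_ortho_group.rvs(3, size=200_000, random_state=2)  # (N, 3, 3)
Psi = Rs[:, :, 0] + 1j * Rs[:, :, 1]                           # (N, 3)

# Empirical Gram matrix of the inner products <psi_m, psi_n> over SO(3).
G = Psi.conj().T @ Psi / len(Psi)
print(np.round(G, 3))   # ~ (2/3) * I, cf. (B.44) and the orthogonality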
Lemma B.6.2. Suppose $\psi_1, \psi_2, \psi_3$ are the three eigenfunctions of $T_h$ given in (B.39)-(B.41), and define $\Psi : SO(3) \to \mathbb{C}^3$ as
\[
\Psi(R) = (\psi_1(R), \psi_2(R), \psi_3(R)) \quad \text{for all } R \in SO(3).
\]
Then,
\[
\langle R_i^3, R_j^3 \rangle = 2\,\frac{|\langle \Psi(R_i), \Psi(R_j) \rangle|}{\|\Psi(R_i)\|\,\|\Psi(R_j)\|} - 1 \tag{B.45}
\]
for all $R_i, R_j \in SO(3)$.
Proof. We start the proof with the explicit computation of the absolute value of the Hermitian dot product between $\Psi(R_i)$ and $\Psi(R_j)$:
\[
|\langle \Psi(R_i), \Psi(R_j) \rangle|^2 = |\langle R_i^1 + \iota R_i^2,\, R_j^1 + \iota R_j^2 \rangle|^2
= |(U_{11}+U_{22}) + \iota(U_{21}-U_{12})|^2
= (U_{11}+U_{22})^2 + (U_{12}-U_{21})^2, \tag{B.46}
\]
where $U = R_i^{-1} R_j$. The right-hand side of (B.46) can be further simplified using Euler's formula (B.38):
\[
(U_{11}+U_{22})^2 + (U_{12}-U_{21})^2 = 4(a^2+d^2)^2 = (1+U_{33})^2
\]
for all $U \in SO(3)$. Therefore,
\[
|\langle \Psi(R_i), \Psi(R_j) \rangle|^2 = (1+U_{33})^2.
\]
In particular,
\[
\|\Psi(R_i)\| = \|\Psi(R_j)\| = \sqrt{2}, \tag{B.47}
\]
because $\|\Psi(R_i)\|^2 = \langle \Psi(R_i), \Psi(R_i) \rangle = 1 + I_{33} = 2$. We conclude that
\[
\frac{|\langle \Psi(R_i), \Psi(R_j) \rangle|}{\|\Psi(R_i)\|\,\|\Psi(R_j)\|} = \frac{1+U_{33}}{2},
\]
or, equivalently,
\[
\langle R_i^3, R_j^3 \rangle = U_{33} = 2\,\frac{|\langle \Psi(R_i), \Psi(R_j) \rangle|}{\|\Psi(R_i)\|\,\|\Psi(R_j)\|} - 1,
\]
and the proof is completed.
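The identity (B.45), which lets $\langle R_i^3, R_j^3 \rangle$ be read off from the normalized correlation of the $\Psi$ values, can likewise be verified on random rotation pairs. This is a minimal sketch under the same NumPy/SciPy assumptions as above:

```python
import numpy as np
from scipy.stats import special_ortho_group

def Psi(R):
    # Psi(R) = (psi_1(R), psi_2(R), psi_3(R)) with psi_m(R) = R_{m1} + i R_{m2}
    return R[:, 0] + 1j * R[:, 1]

Ri, Rj = special_ortho_group.rvs(3, size=2, random_state=3)
lhs = Ri[:, 2] @ Rj[:, 2]                       # <R_i^3, R_j^3>
num = abs(np.vdot(Psi(Ri), Psi(Rj)))            # |<Psi(R_i), Psi(R_j)>|
rhs = 2 * num / (np.linalg.norm(Psi(Ri)) * np.linalg.norm(Psi(Rj))) - 1
print(lhs, rhs)   # the two sides agree up to floating-point error
```

This is the computational content behind the new, noise-robust metrics between projection images: the third columns $R_i^3$ (the viewing directions) are compared through quantities computable from the images alone.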