Pages 469 Page size 336 x 516 pts Year 2008
ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 151
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
HONORARY ASSOCIATE EDITORS
TOM MULVEY BENJAMIN KAZAN
Advances in
Imaging and Electron Physics
EDITED BY
PETER W. HAWKES CEMES-CNRS Toulouse, France
VOLUME 151
AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Academic Press is an imprint of Elsevier
Academic Press is an imprint of Elsevier 525 B Street, Suite 1900, San Diego, California 92101-4495, USA 84 Theobald’s Road, London WC1X 8RR, UK ∞ This book is printed on acid-free paper.
Copyright © 2008, Elsevier Inc. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (www.copyright.com), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2008 chapters are as shown on the title pages. If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2008 $35.00 Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.” For information on all Elsevier Academic Press publications visit our Web site at www.books.elsevier.com. ISBN-13: 978-0-12-374218-6
PRINTED IN THE UNITED STATES OF AMERICA 08 09 10 11 9 8 7 6 5 4 3 2 1
CONTENTS
Contributors . . . vii
Preface . . . ix
Future Contributions . . . xi

Reconstruction Algorithms for Computed Tomography
Claas Bontus and Thomas Köhler
I. Introduction . . . 2
II. Principles of Computed Tomography . . . 4
III. CT Reconstruction . . . 15
IV. Outlook . . . 59
References . . . 61

Color Spaces and Image Segmentation
Laurent Busin, Nicolas Vandenbroucke and Ludovic Macaire
I. Introduction . . . 66
II. Color Spaces . . . 66
III. Color Image Segmentation . . . 110
IV. Relationships between Segmentation and Color Spaces . . . 140
V. Conclusion . . . 161
References . . . 162

Generalized Discrete Radon Transforms and Applications to Image Processing
Glenn R. Easley and Flavia Colonna
I. Introduction . . . 170
II. Background on Wavelets . . . 179
III. Beyond Wavelets . . . 190
IV. The Discrete p-Adic Radon Transform . . . 197
V. Generalized Discrete Radon Transform . . . 204
VI. Noise Removal Experiments . . . 217
VII. Applications to Image Recognition . . . 221
VIII. Recognition Experiments . . . 230
IX. Conclusion . . . 231
References . . . 235

Lie Algebraic Methods in Charged Particle Optics
Tomáš Radlička
I. Introduction . . . 242
II. Trajectory Equations . . . 245
III. The Field Computation . . . 251
IV. Trajectory Equations: Solution Methods . . . 257
V. The Analytic Perturbation Method . . . 273
VI. The Symplectic Classification of Geometric Aberrations . . . 310
VII. Axial Symmetric Aberrations of the Fifth Order . . . 338
References . . . 360

Recent Developments in Electron Backscatter Diffraction
Valerie Randle
I. Introduction . . . 363
II. Fundamental Aspects of EBSD . . . 364
III. The Orientation Map and Data Processing . . . 374
IV. Established Applications of EBSD . . . 380
V. Recent Advances in EBSD . . . 382
VI. Advances in EBSD Technology . . . 388
VII. Trends in EBSD Usage . . . 410
References . . . 411

Index . . . 417
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
CLAAS BONTUS (1), Philips Research Europe–Hamburg, Sector Medical Imaging Systems, Röntgenstrasse 24–26, D-22 335 Hamburg, Germany
LAURENT BUSIN (65), Laboratoire LAGIS UMR CNRS 8146 – Bâtiment P2, Université des Sciences et Technologies de Lille, 59655 Villeneuve d’Ascq, France
FLAVIA COLONNA (169), Department of Mathematical Sciences, George Mason University, Fairfax, VA 22030, USA
GLENN R. EASLEY (169), System Planning Corporation, Arlington, VA 22209, USA
THOMAS KÖHLER (1), Philips Research Europe–Hamburg, Sector Medical Imaging Systems, Röntgenstrasse 24–26, D-22 335 Hamburg, Germany
LUDOVIC MACAIRE (65), Laboratoire LAGIS UMR CNRS 8146 – Bâtiment P2, Université des Sciences et Technologies de Lille, 59655 Villeneuve d’Ascq, France
TOMÁŠ RADLIČKA (241), Institute of Scientific Instruments AS CR, Královopolská 147, 612 64 Brno, Czech Republic
VALERIE RANDLE (363), Materials Research Centre, School of Engineering, University of Wales Swansea, Swansea, UK
NICOLAS VANDENBROUCKE (65), Laboratoire LAGIS UMR CNRS 8146 – Bâtiment P2, Université des Sciences et Technologies de Lille, 59655 Villeneuve d’Ascq, France; Ecole d’Ingénieurs du Pas-de-Calais (EIPC), Campus de la Malasisse – BP39, 62967 Longuenesse Cedex, France
PREFACE
The five chapters that make up this latest volume range over tomography, color image processing, charged particle optics and electron microscopy. We begin with a full and scholarly account of many of the algorithms used for computed tomography, by C. Bontus and T. Köhler. They take the reader systematically through the mathematical foundations and, in particular, devote considerable space to the reconstruction algorithm developed by A. Katsevich for helical computed tomography and the various developments engendered by that work. In a second contribution from the same area, G.R. Easley and F. Colonna provide a very learned account of the generalized discrete Radon transform. After a reminder of the classical Radon transform, they introduce wavelets and their descendants (ridgelets, curvelets and shearlets). All these tools are then applied to the Radon transform and various applications are included. The introduction of color is a major complication of image processing, and few of the traditional tools can be transferred immediately from black-and-white images to color images. L. Busin, N. Vandenbroucke and L. Macaire describe the notion of color spaces and then examine in considerable detail the problem of segmentation of color images. Charged particle optics is still very much a developing subject, as the chapter by T. Radlička shows. He reconsiders the various methods of calculating aberration coefficients, notably using differential algebra, a relatively new addition to the armoury, and examines their relative merits. A very interesting section is devoted to the symplectic classification. Finally, V. Randle provides an account of recent developments in electron backscatter diffraction, a subject on which she is an acknowledged expert. Although the technique is well established, there have been significant advances in the past five years and it is these that are presented here, with a wealth of illustrations.
All the authors have taken great pains to ensure that their subjects can be understood by readers from other specialities, which is a feature of these Advances that I like to emphasize. I thank them here for all the trouble they have taken and conclude as always with a list of forthcoming contributions. Finally, just a reminder that these Advances are now available via ScienceDirect; the whole series can be consulted, Advances in Electronics and Electron Physics as well as the present title.
Peter Hawkes
FUTURE CONTRIBUTIONS
S. Ando
Gradient operators and edge and corner detection

P. Batson (special volume on aberration-corrected electron microscopy)
Some applications of aberration-corrected electron microscopy

C. Beeli
Structure and microscopy of quasicrystals

A.B. Bleloch (special volume on aberration-corrected electron microscopy)
Aberration correction and the SuperSTEM project

C. Bobisch and R. Möller
Ballistic electron microscopy

G. Borgefors
Distance transforms

Z. Bouchal
Non-diffracting optical beams

F. Brackx, N. de Schepper and F. Sommen
The Fourier transform in Clifford analysis

A. Buchau
Boundary element or integral equation methods for static and time-dependent problems

B. Buchberger
Gröbner bases

T. Cremer
Neutron microscopy

N. de Jonge and E.C. Heeres
Electron emission from carbon nanotubes

A.X. Falcão
The image foresting transform

R.G. Forbes
Liquid metal ion sources

C. Fredembach
Eigenregions for image classification

A. Gölzhäuser
Recent advances in electron holography with point sources

D. Greenfield and M. Monastyrskii
Selected problems of computational charged particle optics

M. Haider (special volume on aberration-corrected electron microscopy)
Aberration correction in electron microscopy

M.I. Herrera
The development of electron microscopy in Spain

N.S.T. Hirata (vol. 152)
Stack filter design

M. Hÿtch, E. Snoeck and F. Houdellier (special volume on aberration-corrected electron microscopy)
Aberration correction in practice

J. Isenberg
Imaging IR-techniques for the characterization of solar cells

K. Ishizuka
Contrast transfer and crystal images

A. Jacobo
Intracavity type II second-harmonic generation for image processing

B. Kabius (special volume on aberration-corrected electron microscopy)
Aberration-corrected electron microscopes and the TEAM project

L. Kipp
Photon sieves

A. Kirkland and P.D. Nellist (special volume on aberration-corrected electron microscopy)
Aberration-corrected electron microscopy

G. Kögel
Positron microscopy

T. Kohashi
Spin-polarized scanning electron microscopy

O.L. Krivanek (special volume on aberration-corrected electron microscopy)
Aberration correction and STEM

R. Leitgeb
Fourier domain and time domain optical coherence tomography

B. Lencová
Modern developments in electron optical calculations

H. Lichte
New developments in electron holography

M. Matsuya
Calculation of aberration coefficients using Lie algebra

S. McVitie
Microscopy of magnetic specimens

P.G. Merli
Scanning electron microscopy of thin films

S. Morfu, P. Marquié (vol. 152)
Nonlinear systems for image processing

T. Nitta (vol. 152)
Back-propagation and complex-valued neurons

M.A. O’Keefe
Electron image simulation

D. Oulton and H. Owens
Colorimetric imaging

N. Papamarkos and A. Kesidis
The inverse Hough transform

K.S. Pedersen, A. Lee and M. Nielsen
The scale-space properties of natural images

S.J. Pennycook (special volume on aberration-corrected electron microscopy)
Some applications of aberration-corrected electron microscopy

E. Rau
Energy analysers for electron microscopes

E. Recami
Superluminal solutions to wave equations

H. Rose (special volume on aberration-corrected electron microscopy)
The history of aberration correction in electron microscopy

G. Schmahl
X-ray microscopy

R. Shimizu, T. Ikuta and Y. Takai
Defocus image modulation processing in real time

S. Shirai
CRT gun design methods

T. Soma
Focus-deflection systems and their applications

J.-L. Starck (vol. 152)
Independent component analysis: the sparsity revolution

I. Talmon
Study of complex fluids by transmission electron microscopy

N. Tanaka (special volume on aberration-corrected electron microscopy)
Aberration-corrected microscopy in Japan

M.E. Testorf and M. Fiddy
Imaging from scattered electromagnetic fields, investigations into an unsolved problem

N.M. Towghi
ℓp norm optimal filters

E. Twerdowski
Defocused acoustic transmission microscopy

Y. Uchikawa
Electron gun optics

K. Urban and J. Mayer (special volume on aberration-corrected electron microscopy)
Aberration correction in practice

K. Vaeth and G. Rajeswaran
Organic light-emitting arrays

M. van Droogenbroeck and M. Buckley
Anchors in mathematical morphology

R. Withers (vol. 152)
Disorder, structured diffuse scattering and local crystal chemistry

M. Yavor
Optics of charged particle analysers

Y. Zhu (special volume on aberration-corrected electron microscopy)
Some applications of aberration-corrected electron microscopy
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 151
Reconstruction Algorithms for Computed Tomography
CLAAS BONTUS AND THOMAS KÖHLER
Philips Research Europe–Hamburg, Sector Medical Imaging Systems, Röntgenstrasse 24–26, D-22 335 Hamburg, Germany
I. Introduction . . . 2
II. Principles of Computed Tomography . . . 4
  A. X-Ray Attenuation . . . 4
  B. Parameterization of the Measurements . . . 5
    1. Field of View and Region of Interest . . . 6
  C. Detector Shapes . . . 7
    1. Basis Vectors . . . 10
  D. Trajectories . . . 10
    1. Circular Trajectory . . . 10
    2. Helical Trajectory . . . 12
III. CT Reconstruction . . . 15
  A. Radon Transform . . . 15
    1. Inverse Radon Transform . . . 16
    2. Fourier Slice Theorem . . . 18
  B. Exact Filtered Backprojection . . . 19
    1. Derivative of the Projection Data . . . 19
    2. Reconstruction Formula . . . 20
    3. Mathematically Complete Trajectories . . . 23
    4. Reconstruction Algorithm . . . 24
    5. Filtering . . . 25
    6. Efficient Filtering . . . 27
    7. Backprojection Using Wedge Detector Geometry . . . 31
  C. Mathematics on the Planar Detector . . . 32
  D. Circle Plus Line . . . 34
    1. Geometrical Properties . . . 35
    2. Filter Lines . . . 36
    3. Proof of Exactness . . . 38
    4. Reconstruction Results . . . 40
  E. The Katsevich Algorithm for Helical Pi Acquisition . . . 42
    1. Geometrical Properties . . . 42
    2. Filter Lines . . . 43
    3. Proof of Exactness . . . 46
  F. EnPiT: Helical n-Pi Reconstruction . . . 49
    1. Reconstruction Algorithm . . . 50
    2. Filter Lines . . . 51
  G. CEnPiT: Helical Cardiac CT . . . 52
    1. Cardiac CT . . . 52
    2. Filter Lines . . . 56
    3. Gated and Ungated Contributions . . . 58
IV. Outlook . . . 59
References . . . 61

ISSN 1076-5670
DOI: 10.1016/S1076-5670(07)00401-6
Copyright 2008, Elsevier Inc. All rights reserved.
I. INTRODUCTION
Computed tomography (CT) is of vital importance for medical diagnosis, intervention planning, and treatment evaluation. CT yields three-dimensional (3D) tomographic images of large parts of the human body within a few seconds. The comparatively wide bore allows CT scans of obese patients, and claustrophobia is rarely an issue. At the same time, the current pace of technological progress in CT is unprecedented. Two major advances are important. First, X-ray detectors are becoming larger, so that larger volumes can be scanned in shorter times. Second, the rotation speeds of the tube-detector system are steadily increasing. Although this progress is of great benefit to patients and medical staff, the engineering challenges are considerable. X-ray tubes must withstand centrifugal forces that increase with the rotation speed, and they must illuminate ever larger detectors. Integration times of the detector elements become shorter. The amount of data measured per unit time grows with both rotation speed and detector size, and all of these data must be transmitted and processed within an acceptable time. Data processing is separated into three steps: preprocessing, reconstruction, and image processing. Preprocessing covers correction methods that treat disturbing effects and system imperfections, such as beam hardening, cross-talk, and afterglow. For example, beam hardening is a nonlinear effect resulting from the fact that X-rays are absorbed less strongly as the photon energy increases. Image-processing methods cover all algorithms that are applied to the image data to assist radiologists in image analysis. Examples of image-processing methods are tools that automatically segment different organs, as well as computer-aided diagnosis (CAD) tools. This chapter covers the field of CT reconstruction, i.e., the mathematics for obtaining the image data from the measurements.
In other words, reconstruction deals with an inverse problem. Reconstruction algorithms must fulfill certain criteria to be useful in practice. They must be numerically stable and yield images of sufficient quality with respect to artefacts, resolution, and signal-to-noise ratio (SNR). Furthermore, reconstruction algorithms should be fast enough that images are available within an acceptable time. Historically, most CT scanners were so-called single-row scanners, which yield two-dimensional (2D) images of cross-sectional slices. The mathematics for single-row scanners is well understood. Modern cone-beam CT scanners, in which the detector consists of many rows, require completely new reconstruction algorithms. The first reconstruction algorithms proposed for cone-beam scanners were extensions of 2D methods. These algorithms, which are approximate in nature, yield good image quality as long as the number of detector rows is not too large. Once the number of rows becomes large, more effort is necessary, and the underlying mathematics must be analyzed to develop more sophisticated algorithms. In 2002 Alexander Katsevich (2002) published the first reconstruction algorithm for helical CT that is mathematically exact, of the filtered backprojection (FBP) type, and in which the filtering is shift invariant. In particular, exactness guarantees that any remaining artefacts originate from the discrete reformulation of the inversion formulas. This discretization is required because the projection data are sampled discretely. FBP, in combination with the shift invariance of the underlying filter, ensures that the algorithm can be implemented efficiently. Katsevich’s work resulted in many publications analyzing the underlying mathematics or extending the algorithm to different settings. This chapter is devoted to reconstruction algorithms of the FBP type that are derived from an exact method. The underlying mathematics is discussed on a broad basis. Nevertheless, more emphasis is placed on descriptive discussion than on rigorous mathematical analysis. Section II provides a short summary of the principles; the measurement process as well as different kinds of detectors and trajectories are discussed. Section III covers the field of reconstruction in several subsections, beginning with the Radon transform in Section III.A. As it turns out, the mathematics associated with the discussed algorithms can be separated into two parts.
While Sections III.B and III.C deal with the formulas that are independent of the particular system settings, the subsequent sections specify the parameters for different types of acquisitions. The mathematics of the circle and line trajectory (CLT) is readily understood. This trajectory, therefore, serves as the first example, covered in Section III.D. The Katsevich algorithm for a so-called Pi acquisition is covered in Section III.E, while Section III.F deals with the so-called EnPiT algorithm. The EnPiT algorithm can be considered an extension of the Katsevich method that allows the patient table feed to be chosen more freely. Cardiac CT is one of the most important applications of modern CT scanners. The CEnPiT algorithm, which was developed for cardiac cone-beam CT based on a helical acquisition, is briefly summarized in Section III.G.
II. PRINCIPLES OF COMPUTED TOMOGRAPHY
A. X-Ray Attenuation
Figure 1 shows a modern CT scanner. Within such a CT system a tube emits X-rays in the direction of a detector. The X-rays penetrate a certain part of the patient’s body before they enter the detector. During the scan the tube-detector system (termed the gantry) rotates around the patient (see Figure 2), so that projections from a large number of directions are taken. The patient lies on a table that can be moved during the scan. In this manner, a relative motion of the patient with respect to the rotation plane can be realized. In practice, this is used, for example, to obtain a relative trajectory in which the X-ray tube moves on a helix around the patient.
Consider the X-ray tube itself. Let us denote by $I_0(E)$ the flux density of X-ray photons with energy $E$ emitted in a certain direction. Typical units for the photon flux density are $[I] = 1/(\mathrm{s\,sr\,keV})$; that is, $I_0(E)$ is the number of photons emitted per second, per unit solid angle, and per unit photon energy. Some X-ray photons are scattered or absorbed when penetrating the patient. The number of photons that pass through the patient (neither scattered nor absorbed) is given by the Beer–Lambert law:
$$I(E) = I_0(E)\,\exp\left(-\int d\ell\;\mu\big(\mathbf{x}(\ell), E\big)\right). \tag{1}$$
Figure 1. The Philips Brilliance 64 is a modern CT scanner with 64 detector rows.
Figure 2. (a) The tube-detector system rotates around the patient during a CT scan. (b) The patient table can be moved so that a helical trajectory can be realized.
Here, $\mathbf{x}(\ell)$ parameterizes the path along which the X-rays travel. The values of the scalar function $\mu(\mathbf{x})$ are the so-called absorption coefficients; $\mu(\mathbf{x})$ is also known as the object function. Naturally, $\mu$ takes different values for different kinds of tissue. The aim of reconstruction is to recover the $\mu$-values from the measurements. From Eq. (1) we readily obtain
$$\int d\ell\;\mu\big(\mathbf{x}(\ell), E\big) = \ln\frac{I_0(E)}{I(E)}. \tag{2}$$
In other words, a monochromatic CT system measures line integrals. Unfortunately, X-rays emitted from today’s tubes have a wide spectrum, and the detectors average over the photon energies. Therefore, some preprocessing steps are necessary before the measurements can approximately be interpreted as line integrals.
B. Parameterization of the Measurements
The following sections assume that the preprocessing steps yield data that can be interpreted as line integrals in the sense of Eq. (2) with $E = E_0$, where $E_0$ is some mean energy. We choose a coordinate system that is fixed to the patient. At a particular time the X-ray source is at a point $\mathbf{y}$ with respect to this coordinate system. The line integrals associated with the measurements at this point in time all originate from $\mathbf{y}$. The terms source position and focal spot are used synonymously throughout this chapter and correspond to vectors $\mathbf{y}$.
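The measurement model of Eqs. (1) and (2), with every ray starting at the source position $\mathbf{y}$ and running along a unit direction (formalized as Eq. (3) below), can be illustrated numerically. The following is a minimal sketch under hypothetical assumptions, not a scanner simulation: the ball phantom, its attenuation value, the source position, the finite integration length, and the two-bin "spectrum" are all made-up illustration values, and the function names are ours. The first part recovers a line integral from a monochromatic intensity ratio; the second shows why averaging over energies breaks the linear relation of Eq. (2).

```python
import numpy as np

def line_integral(mu, y, theta, length, n_steps=20000):
    """Midpoint-rule approximation of the integral of mu along y + l * theta,
    0 <= l <= length (the upper limit must exceed the object extent).
    `mu` maps an (N, 3) array of points to N attenuation values."""
    theta = np.asarray(theta, dtype=float)
    theta = theta / np.linalg.norm(theta)
    dl = length / n_steps
    ell = (np.arange(n_steps) + 0.5) * dl
    points = y[None, :] + ell[:, None] * theta[None, :]
    return np.sum(mu(points)) * dl

# Hypothetical phantom: homogeneous ball (radius 50 mm, mu = 0.02/mm) at the origin.
RADIUS, MU0 = 50.0, 0.02

def ball(points):
    return np.where(np.sum(points**2, axis=1) <= RADIUS**2, MU0, 0.0)

# Hypothetical source position and a ray through the ball centre.
y = np.array([570.0, 0.0, 0.0])
d_value = line_integral(ball, y, -y, length=1140.0)
chord = 2 * RADIUS * MU0                 # exact value for this ray: 2.0

# Monochromatic measurement, Eq. (1), and its log transform, Eq. (2).
i0 = 1.0e6
i = i0 * np.exp(-d_value)
recovered = np.log(i0 / i)               # equals d_value up to rounding

def polychromatic_log_ratio(length_mm, mu_low=0.03, mu_high=0.015, w_low=0.5):
    """Log ratio seen by an energy-integrating detector behind a homogeneous
    path, for a hypothetical two-bin spectrum in which the low-energy bin
    is attenuated more strongly (incident intensity normalized to 1)."""
    transmitted = (w_low * np.exp(-mu_low * length_mm)
                   + (1.0 - w_low) * np.exp(-mu_high * length_mm))
    return np.log(1.0 / transmitted)

# Doubling the path would exactly double a monochromatic log ratio;
# with the two-bin spectrum the ratio stays below 2 (beam hardening).
ratio = polychromatic_log_ratio(200.0) / polychromatic_log_ratio(100.0)
print(d_value, recovered, chord, ratio)
```

The midpoint rule is used only because it is short; its error here is bounded by a few step lengths times $\mu_0$ at the two boundary crossings of the ball.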
Each line integral can be parameterized as
$$D(\mathbf{y}, \boldsymbol{\theta}) = \int_0^\infty d\ell\;\mu(\mathbf{y} + \ell\boldsymbol{\theta}), \tag{3}$$
where $\boldsymbol{\theta}$ is a unit vector pointing in the direction of the line integral. The different values of $\boldsymbol{\theta}$ are determined by the locations of the different detector elements. If the patient table is fixed during the scan (see Figure 2), the source moves on a circular trajectory with respect to the coordinate system defined above. This trajectory can be parameterized as
$$\mathbf{y}^{\circ}(s) = \begin{pmatrix} R\cos s \\ R\sin s \\ z_0 \end{pmatrix}, \tag{4}$$
where $R$ is the distance from the source to the rotation axis. We use the angular variable $s$ to parameterize points on the trajectory. Each measurement belongs to a certain $s$. The set of measured values (i.e., all detector values) belonging to a certain $s$ is called a projection. For a helical trajectory, the positions of the source can be described by the vector
$$\mathbf{y}^{H}(s) = \begin{pmatrix} R\cos s \\ R\sin s \\ s\bar{h} \end{pmatrix}, \tag{5}$$
where $h = 2\pi\bar{h}$ is the pitch, i.e., the table feed per rotation. Finally, a line parallel to the $z$-axis is given by
$$\mathbf{y}^{|}(z) = \mathbf{y}^{|}(hs) = \begin{pmatrix} R \\ 0 \\ z \end{pmatrix}, \tag{6}$$
where $h$ is an arbitrary constant, introduced to make $\mathbf{y}^{|}$ depend on the angular variable $s$. A linear trajectory according to Eq. (6) can be realized if the tube-detector system does not rotate while the patient table moves (compare with Figure 2).
1. Field of View and Region of Interest
As discussed above, the aim of CT reconstruction is to recover the absorption coefficients $\mu$ from the measurements. Of particular interest is whether a certain object point $\mathbf{x}$ is reconstructible, that is, whether its $\mu$ value can be determined. This question is related to the set of directions from which $\mathbf{x}$ is illuminated by X-rays. The set of all reconstructible object points is called the field of view (FOV). The region of interest (ROI) is the set of all object points the user wants to be reconstructed. Some approximate algorithms yield absorption coefficients even for object points that do not belong to the FOV; these algorithms provide good approximations for object points that are not reconstructible in a strict sense. In other words, the ROI can be a subset or a superset of the FOV.
C. Detector Shapes
Detectors in conventional CT scanners are so-called focus-centered detectors. These detectors have a cylindrical shape such that the symmetry axis of the cylinder is parallel to the $z$-axis and contains the focal spot. The focal spot is the point within the X-ray tube from which the X-rays originate. Points on the focus-centered detector are parameterized using the fan-angle variable $\varphi$ and the coordinate $v_F$, as shown in Figure 3. Our conventions are such that the point $(\varphi = 0, v_F = 0)$ corresponds to the center of the detector. Physical detectors consist of detector elements, so that the detector coordinates used to describe the measurements take only discrete values. In particular, these values are
$$\varphi_k = \varphi_0 + k\,\Delta\varphi, \qquad k = 0, \ldots, \#\text{columns} - 1, \tag{7}$$
$$v_{F,p} = v_{F,0} + p\,\Delta v, \qquad p = 0, \ldots, \#\text{rows} - 1, \tag{8}$$
which defines the notion of detector columns and rows. For convenience we introduce two virtual detectors: the planar detector and the parallel (wedge) detector. Data on these virtual detectors can be obtained from the focus-detector data by rebinning. In our convention, the planar detector contains part of the rotation axis. Points on the planar detector are parameterized using the coordinates $u_P$ and $v_P$, as shown in Figure 4.
Figure 3. Points on the focus-centered detector are parameterized by the fan-angle variable $\varphi$ and coordinate $v_F$.
Figure 4. Points on the planar detector are parameterized by the variables $u_P$ and $v_P$.
Focus- and planar-detector coordinates are related by
$$u_P = R\tan\varphi \quad\text{and}\quad v_P = \sqrt{R^2 + u_P^2}\,\tan\lambda = \frac{R\tan\lambda}{\cos\varphi} = \frac{R}{\cos\varphi}\,\frac{v_F}{D}, \tag{9}$$
$$\tan\lambda = \frac{v_F}{D}, \tag{10}$$
where $D$ is the distance from the source to the center of the focus-centered detector and $\lambda$ is the cone angle. The wedge detector is illustrated in Figure 5. It has an extended source: from each source position along the circle or helix, only data from one virtual focus-detector column enter a given wedge-detector projection. A virtual column is a column located somewhere between physical columns; data along it can be obtained by interpolating data from physical columns. For a helical trajectory the wedge detector is bent along the $z$-direction; that is, its lower and upper boundaries have the slope of the helix. Details are provided in the following text. The wedge-detector coordinates are designated as $u$ and $v$, as shown in Figure 6. Furthermore, we parameterize different wedge-detector projections by the angular variable $t$, similar to $s$, which parameterizes different focus-detector projections [compare with Eqs. (4) and (5)]. In particular, $t$ is the angle enclosed by the parallel fans shown in Figure 5 and the $x$-axis. The angle $\varphi$ is measured in a plane parallel to the $xy$-plane. Therefore, the line (intersecting with the rotation axis) used to measure that angle in Figure 6 does not intersect the center of the wedge detector.
Figure 5. For the parallel (wedge) detector, data from different source positions must be combined. Data belonging to a particular column of the wedge detector originate from one column of the focus detector at a specific source position. The right image shows the wedge detector of the left image seen from above.
Figure 6. Points on the wedge detector are parameterized by the variables $u$ and $v$.
The relationships between focus- and wedge-detector coordinates are
$$s = t + \varphi, \tag{11}$$
$$u = R\sin\varphi, \tag{12}$$
and
$$\tan\lambda = \frac{v_F}{D} = \frac{v}{R}. \tag{13}$$
These equations should be interpreted as follows. For given coordinates $(t, u, v)$, compute $\varphi$ according to Eq. (12). Eq. (11) then indicates from which source position the data must be taken. Finally, the required row on the focus detector can be computed using Eq. (13). Note that our conventions are such that constant $v$ means constant $v_F$. In other words, data belonging to a row of the wedge detector are taken from the same row, but from different projections, of the focus detector. Therefore, when rebinning from focus to wedge data, interpolations are necessary in $\varphi$ and $s$ but not in $v_F$. This explains the bending of the wedge detector in the helical case mentioned above.
1. Basis Vectors
For the subsequent discussion it is useful to introduce an orthonormal basis that is stationary on the planar detector. We define basis vectors $\mathbf{u}(s)$, $\mathbf{v}(s)$, and $\mathbf{w}(s)$ such that
$$\mathbf{x} = u_P\,\mathbf{u} + \big(v_P + y_z(s)\big)\,\mathbf{v} \tag{14}$$
gives the world coordinates of point $(u_P, v_P)$. The term proportional to $y_z(s)$ (the $z$-component of vector $\mathbf{y}$) in Eq. (14) is necessary because the center of the planar detector moves with the source along $z$. A point $\mathbf{x}$ is located on the planar detector (at least if we consider the detector to have infinite extent) if $\mathbf{x}\cdot\mathbf{w} = 0$. In this case, $u_P = \mathbf{x}\cdot\mathbf{u}$ and $v_P = \mathbf{x}\cdot\mathbf{v} - y_z(s)$ give the planar-detector coordinates of $\mathbf{x}$. For the cylindrical trajectories given by Eqs. (4) and (5), the basis vectors take the explicit form
$$\mathbf{u}(s) = \begin{pmatrix} -\sin s \\ \cos s \\ 0 \end{pmatrix}, \quad \mathbf{v}(s) = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \quad \mathbf{w}(s) = \begin{pmatrix} \cos s \\ \sin s \\ 0 \end{pmatrix}. \tag{15}$$
D. Trajectories
1. Circular Trajectory
Let us have a closer look at the circular trajectory, Eq. (4). For later reference we consider the stereographic projection of the circle onto the planar detector, seen from a source at $\mathbf{y}^{|}(z_1)$, which is not necessarily in the circle plane [compare with Eq. (6)]. The following derivation aims at obtaining Eq. (22). We describe the line containing the focal spot and the point $\mathbf{y}^{\circ}(s)$ via
$$\boldsymbol{\ell}(s, \lambda) = (1 - \lambda)\,\mathbf{y}^{|}(z_1) + \lambda\,\mathbf{y}^{\circ}(s) = (1 - \lambda)\begin{pmatrix} R \\ 0 \\ z_1 \end{pmatrix} + \lambda\begin{pmatrix} R\cos s \\ R\sin s \\ z_0 \end{pmatrix}, \tag{16}$$
where $\lambda$ parameterizes points on the line. See Figure 7 for an illustration. This line intersects the planar detector if
$$0 = \boldsymbol{\ell}\cdot\mathbf{w} = (1 - \lambda_0)R + \lambda_0 R\cos s, \tag{17}$$
RECONSTRUCTION ALGORITHMS
FIGURE 7. The line connecting the focal spot at y and a point on the circle intersects with the planar detector.
where λ_0 parameterizes the point of intersection. In the latter equation, w is the basis vector defined in Section II.C, normal to the planar detector. The source is located at y^|(z_1) = (R, 0, z_1), which corresponds to s = 0 in Eq. (15). Therefore, w = (1, 0, 0) in our case. Solving for λ_0, we obtain
$$ \lambda_0 = \frac{1}{1-\cos s} = \frac{1}{2\sin^2(s/2)}. \tag{18} $$
The planar-detector coordinates of the point of intersection are obtained using the basis vectors u = (0, 1, 0) and v = (0, 0, 1):
$$ u_P(s) = \boldsymbol{\ell}(s,\lambda_0)\cdot\mathbf{u} = \lambda_0 R\sin s = R\,\frac{\sin s}{2\sin^2(s/2)} = R\cot\frac{s}{2}, \tag{19} $$
$$ v_P(s) = \boldsymbol{\ell}(s,\lambda_0)\cdot\mathbf{v} - z_1 = \lambda_0(z_0 - z_1) = \frac{z_0 - z_1}{2\sin^2(s/2)}. \tag{20} $$
These equations are still parameterized by s. We solve Eq. (19) for s. Inserting the result into Eq. (20) and using
$$ \sin^2\!\left(\arctan\frac{R}{u_P}\right) = \frac{R^2}{u_P^2 + R^2} \tag{21} $$
provides
$$ v_P(u_P) = \frac{z_0 - z_1}{2}\left(1 + \frac{u_P^2}{R^2}\right). \tag{22} $$
The latter equation parameterizes the projection of the circle at z = z0 onto the planar detector as a function of uP . For symmetry reasons this equation is valid for all source positions at z = z1 at the surface of the cylinder with radius R.
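Eqs. (16) to (22) can be checked numerically. The following is a minimal sketch that assumes illustrative values for R, z_0, and z_1; these values are not taken from the text, and the helper names `project_circle_point` and `circle_projection` are hypothetical. The sketch intersects each source-to-circle line with the planar detector as in Eqs. (17) to (20). It then compares the result with the closed form of Eq. (22).

```python
import numpy as np

def project_circle_point(s, R, z0, z1):
    """Planar-detector coordinates of the circle point (R cos s, R sin s, z0)
    seen from the source (R, 0, z1), following Eqs. (16)-(20) with
    w = (1, 0, 0), u = (0, 1, 0), v = (0, 0, 1)."""
    lam0 = 1.0 / (1.0 - np.cos(s))           # Eq. (18): line meets x . w = 0
    uP = lam0 * R * np.sin(s)                # Eq. (19): source has zero y-component
    vP = (1.0 - lam0) * z1 + lam0 * z0 - z1  # Eq. (20): z-component minus z1
    return uP, vP

def circle_projection(uP, R, z0, z1):
    """Closed form of Eq. (22)."""
    return 0.5 * (z0 - z1) * (1.0 + uP**2 / R**2)

# Illustrative geometry (assumed values, lengths in mm).
R, z0, z1 = 570.0, 40.0, -10.0
s = np.linspace(0.3, 2 * np.pi - 0.3, 50)    # s = 0 excluded: lambda_0 diverges there
uP, vP = project_circle_point(s, R, z0, z1)
max_dev = np.max(np.abs(vP - circle_projection(uP, R, z0, z1)))
print(max_dev)
```

The deviation stays at the level of floating-point round-off. The same sampled u_P also reproduces R cot(s/2) from Eq. (19).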
2. Helical Trajectory

The key to exact reconstruction of helical CT data is the Pi window, or Tam–Danielsson window. It is related to the minimum amount of data that must be acquired to perform an exact reconstruction (Danielsson et al., 1997; Tam, 1995). The generalization of the Pi window is the so-called n-Pi window, where n is an odd integer (Proksa et al., 2000). Figure 8 depicts the surfaces enclosing the points belonging to the Pi and 3-Pi windows associated with a certain source position. These surfaces contain the focal spot and points on the helix. In particular, for the n-Pi window the points on the helix defining the lower surface fulfill the relation
$$ s_0 - (n+1)\pi < s < s_0 - (n-1)\pi, \tag{23} $$
where s_0 parameterizes the current source position [see Eq. (5)]. Similarly, for the upper surface
$$ s_0 + (n-1)\pi < s < s_0 + (n+1)\pi. \tag{24} $$
We will now compute the projection of the n-Pi window boundaries onto the planar detector. The derivation is very similar to the treatment of the circular trajectory above and results in Eq. (32). The line containing the current source position at y^H(s_0) and another point on the helix at y^H(s) is parameterized by
$$ \boldsymbol{\ell}(s,\lambda) = (1-\lambda)\begin{pmatrix} R\cos s_0 \\ R\sin s_0 \\ \bar h s_0 \end{pmatrix} + \lambda\begin{pmatrix} R\cos s \\ R\sin s \\ \bar h s \end{pmatrix}. \tag{25} $$
FIGURE 8. Left: The Pi window contains object points enclosed by the shown surfaces. These surfaces contain the focal spot and parts of the helical trajectory. Right: For the 3-Pi window the next-to-next turns of the helix determine the enclosing surfaces.
This line intersects with the planar detector if
$$ 0 = \boldsymbol{\ell}(s,\lambda_0)\cdot\mathbf{w}(s_0) = R + \lambda_0 R(\cos\Delta s - 1), \qquad \Delta s = s - s_0, \tag{26} $$
where we used the expression for w in Eq. (15). Solving for λ_0, we obtain
$$ \lambda_0 = \frac{1}{1-\cos\Delta s} = \frac{1}{2\sin^2(\Delta s/2)}. \tag{27} $$
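Coordinates u_P and v_P of the point of intersection are obtained via
$$ u_P(s) = \boldsymbol{\ell}(s,\lambda_0)\cdot\mathbf{u}(s_0) = R\,\frac{\sin\Delta s}{2\sin^2(\Delta s/2)} = R\cot\frac{\Delta s}{2}, \tag{28} $$
$$ v_P(s) = \boldsymbol{\ell}(s,\lambda_0)\cdot\mathbf{v}(s_0) - \bar h s_0 = \frac{\bar h\,\Delta s}{2\sin^2(\Delta s/2)}. \tag{29} $$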
Now, we introduce the angle Δs̄, defined as Δs̄ = Δs + (n + 1)π for the lower n-Pi window boundary and Δs̄ = Δs − (n − 1)π for the upper boundary (compare with Figure 8). Comparing with Eqs. (23) and (24), we realize that 0 < Δs̄ < 2π. We use Δs̄ in Eq. (28) together with the π-periodicity of the cotangent. Because n is odd, (n ± 1)/2 is an integer. We obtain
$$ \frac{\Delta\bar s}{2} = \operatorname{arccot}\frac{u_P}{R}. \tag{30} $$
From Eq. (29) we determine
$$ v_P = \frac{\bar h}{2}\left(1 + \frac{u_P^2}{R^2}\right)\times\begin{cases} \Delta\bar s - (n+1)\pi, & \text{lower boundary}, \\ \Delta\bar s + (n-1)\pi, & \text{upper boundary}, \end{cases} \tag{31} $$
where we inserted Eq. (30) into the denominator of Eq. (29) and used Eq. (21). Now, using Eq. (30) again together with arccot(u_P/R) = π/2 − arctan(u_P/R), we derive
$$ v^{\mathrm{up,low}}_{P,n}(u_P) = \pm\,\bar h\left(1 + \frac{u_P^2}{R^2}\right)\left(\frac{n\pi}{2} \mp \arctan\frac{u_P}{R}\right). \tag{32} $$
The latter equation gives the upper and lower boundaries of the n-Pi window on the planar detector. Figure 9 shows the corresponding curves for n = 1, 3, and 5. For later reference, the lower and upper boundaries of the focus detector project onto the planar detector as
$$ v^{F}_{P\pm}(u_P) = \pm\frac{H}{2D}\sqrt{u_P^2 + R^2}, \tag{33} $$
where H is the height of the focus detector.
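A similar sketch for the helix compares the closed-form boundaries of Eq. (32) with the direct projection of helix points via Eqs. (28) and (29). It samples the ranges of Eqs. (23) and (24). Radius, pitch, and n are assumed illustrative values, and the function names are hypothetical. The sketch also evaluates Eq. (32) at u_P = 0, where the window reaches ±nπh̄/2. Assuming h̄ = h/(2π), with h the table feed per turn, the Pi window (n = 1) therefore extends to ±h/4 at the detector center.

```python
import numpy as np

def npi_window(uP, n, R, hbar):
    """Upper and lower n-Pi window boundaries on the planar detector, Eq. (32)."""
    factor = hbar * (1.0 + (uP / R) ** 2)
    upper = factor * (n * np.pi / 2 - np.arctan(uP / R))
    lower = -factor * (n * np.pi / 2 + np.arctan(uP / R))
    return upper, lower

def project_helix_point(ds, R, hbar):
    """Projection of y_H(s0 + ds) seen from y_H(s0), Eqs. (28) and (29)."""
    uP = R / np.tan(ds / 2)
    vP = hbar * ds / (2 * np.sin(ds / 2) ** 2)
    return uP, vP

# Illustrative geometry (assumed): R = 570 mm, hbar = h / (2 pi) with h = 40 mm.
R, hbar, n = 570.0, 40.0 / (2 * np.pi), 3
eps = 1e-3
ds_up = np.linspace((n - 1) * np.pi + eps, (n + 1) * np.pi - eps, 200)  # Eq. (24)
ds_low = -ds_up                                                          # Eq. (23)

u_up, v_up = project_helix_point(ds_up, R, hbar)
u_low, v_low = project_helix_point(ds_low, R, hbar)
upper_ok = np.allclose(v_up, npi_window(u_up, n, R, hbar)[0])
lower_ok = np.allclose(v_low, npi_window(u_low, n, R, hbar)[1])
center_height = npi_window(0.0, 1, R, hbar)[0]   # Pi window (n = 1) at u_P = 0
print(upper_ok, lower_ok, center_height)
```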
FIGURE 9. The Pi window (solid), 3-Pi window (dashed), and 5-Pi window (dash-dotted) boundaries on the planar detector.
The introduction of the Pi and n-Pi windows gives rise to the definition of the Pi and n-Pi segments. These segments are parts of the helix associated with each object point. We start with the Pi segment and consider the projection of a certain object point x onto the planar detector. From each source position y^H(s), x is projected either inside or outside the Pi window. The Pi segment is the union of all points on the helix from which x is projected into the Pi window. In other words, y^H(s) belongs to the Pi segment of x if
$$ v^{\mathrm{low}}_{P,1}(u_P) \le v_P \le v^{\mathrm{up}}_{P,1}(u_P). $$
Here, (u_P, v_P) are the coordinates of the projected object point x, and v^low_{P,1} and v^up_{P,1} are the Pi-window boundaries according to Eq. (32). For each object point, two angles s_1 and s_2 can be found such that y^H(s) belongs to the Pi segment of x if and only if s_1 ≤ s ≤ s_2 (Defrise, Noo and Kudo, 2000). Points y^H(s_1) and y^H(s_2) are sometimes called sunrise and sunset. The three points x, y^H(s_1), and y^H(s_2) lie on a straight line. The reason is that, from the points of sunrise and sunset, x is projected onto the Pi-window boundary, i.e., onto the projection of the helix (Figure 10). Interval [s_1, s_2] is called the Pi interval of object point x. The line containing y^H(s_1) and y^H(s_2) is called the Pi-line of x. Obviously, all points located on that line share the same Pi-line and Pi segment.
The situation is similar but slightly more complicated for the n-Pi window. A point y^H(s) belongs to the n-Pi segment of x if x is projected between the lower and upper n-Pi window boundaries. For some object points, however, the n-Pi segment is not a single connected segment. In fact, it can consist of up to n subsegments. Therefore, some object points experience a sunrise and a sunset up to n times along the helix, a phenomenon known as interrupted illumination. See Figure 11 for an example. Similar to the definition of the
FIGURE 10. The Pi segment of the exemplified object point x along the helix is printed bold. From each point y^H(s) within the Pi segment, x is located between the Pi-window surfaces shown in Figure 8.
FIGURE 11. Left: The bold points denote the 3-Pi segment of the exemplified object point. Right: Projection of the object point onto the planar detector from different source positions results in the shown trace on the detector. Sunrise or sunset occurs when the object point is projected onto the boundaries of the 3-Pi window.
Pi-line, an n-Pi line contains a point of sunrise, a point of sunset, and x. The number of n-Pi lines can reach the value of n if interrupted illumination occurs.
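The sunrise and sunset of an object point can be located numerically from the Pi-window test of Eq. (32). The sketch below uses hypothetical helper names and assumed helix parameters. It proceeds in four steps:

1. Project x from each source position y^H(s).
2. Scan s for membership in the Pi window.
3. Refine both edges by bisection.
4. Check that x, y^H(s_1), and y^H(s_2) are collinear, as the Pi-line property requires.

The scan window [x_z/h̄ − 2π, x_z/h̄ + 2π] rests on two assumptions. First, x lies between y^H(s_1) and y^H(s_2), so the interval contains the source position at the height of x. Second, the Pi interval is shorter than 2π; this is a known property of Pi-lines that the text above does not state.

```python
import numpy as np

# Illustrative helix (assumed values): radius R in mm, hbar = h / (2 pi).
R, hbar = 570.0, 40.0 / (2 * np.pi)

def helix(s):
    return np.array([R * np.cos(s), R * np.sin(s), hbar * s])

def project(x, s0):
    """(u_P, v_P) of object point x on the planar detector seen from y_H(s0)."""
    y = helix(s0)
    u = np.array([-np.sin(s0), np.cos(s0), 0.0])   # Eq. (15)
    w = np.array([np.cos(s0), np.sin(s0), 0.0])
    d = x - y
    lam0 = -R / (d @ w)                  # ray y + lam0 * d meets the plane x . w = 0
    return lam0 * (d @ u), lam0 * d[2]   # uses y . u = 0 and y . v = y_z

def in_pi_window(x, s0):
    """Pi-window test with the n = 1 boundaries of Eq. (32)."""
    uP, vP = project(x, s0)
    factor = hbar * (1.0 + (uP / R) ** 2)
    return bool(-factor * (np.pi / 2 + np.arctan(uP / R)) <= vP
                <= factor * (np.pi / 2 - np.arctan(uP / R)))

def pi_interval(x, n_grid=4001, n_bisect=60):
    """Sunrise s1 and sunset s2: coarse scan, then bisection on both edges."""
    sc = x[2] / hbar                     # assumed to lie inside the Pi interval
    s = np.linspace(sc - 2 * np.pi, sc + 2 * np.pi, n_grid)
    idx = np.flatnonzero([in_pi_window(x, si) for si in s])

    def refine(outside, inside):
        for _ in range(n_bisect):
            mid = 0.5 * (outside + inside)
            if in_pi_window(x, mid):
                inside = mid
            else:
                outside = mid
        return inside

    return refine(s[idx[0] - 1], s[idx[0]]), refine(s[idx[-1] + 1], s[idx[-1]])

x = np.array([120.0, -80.0, 15.0])       # object point inside the helix cylinder
s1, s2 = pi_interval(x)
a, b = helix(s1) - x, helix(s2) - x
collinearity = np.linalg.norm(np.cross(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))
print(s2 - s1, collinearity)
```

A vanishing normalized cross product confirms that the two sunrise/sunset positions and x lie on one line, the Pi-line of x.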
III. CT RECONSTRUCTION

A. Radon Transform

The Radon¹ transform is an important tool for the analysis and development of reconstruction algorithms. The 3D Radon transform is the set of all plane integrals through a given object function. We parameterize Radon planes by normal vectors ω and scalars ρ, where ρ is the shortest distance of the plane to the origin (Figure 12). With these conventions the Radon transform of object

¹ Johann Radon, Austrian mathematician, 1887–1956.
FIGURE 12. An exemplary Radon plane with normal vector ω. The dashed line has length ρ, which is the shortest distance from the origin to the plane. Points x on the plane fulfill the relation ω · x = ρ.
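function μ(x) is given by
$$ \mathcal{R}\mu(\rho,\boldsymbol{\omega}) = \int_{-\infty}^{\infty} d^3x\,\mu(\mathbf{x})\,\delta(\rho - \boldsymbol{\omega}\cdot\mathbf{x}). \tag{34} $$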
1. Inverse Radon Transform

The inverse of the Radon transform can be obtained via the following formula (Natterer, 1986):
$$ \mu(\mathbf{x}) = \frac{-1}{8\pi^2}\int_0^{2\pi} d\alpha\int_0^{\pi} d\vartheta\,\sin\vartheta\;\mathcal{R}''\mu(\boldsymbol{\omega}\cdot\mathbf{x},\boldsymbol{\omega}), \tag{35} $$
where
$$ \boldsymbol{\omega} = (\cos\alpha\sin\vartheta,\ \sin\alpha\sin\vartheta,\ \cos\vartheta) \tag{36} $$
and
$$ \mathcal{R}''\mu(\rho,\boldsymbol{\omega}) = \frac{\partial^2\mathcal{R}\mu(\rho,\boldsymbol{\omega})}{\partial\rho^2} \tag{37} $$
is the second derivative of the Radon transform with respect to ρ. Eq. (35) is interpreted as follows. In the standard convention, the (scalar) value associated with one particular Radon plane enters Radon space at point ρω. The evaluation of Eq. (35) takes into account all Radon values with ρ = ω · x, i.e., all Radon planes that contain x.
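Eq. (35) can be checked on an object whose Radon transform is known in closed form. The check covers the constant −1/(8π²) and the solid-angle element sin ϑ dϑ dα matching Eq. (36). Take a unit-density ball of radius a centred at the origin. Every plane at distance |ρ| < a cuts a disk of area π(a² − ρ²), so R″μ = −2π there, and Eq. (35) must return 1 at points inside the ball. The sketch below assumes the radius and grid sizes and uses hypothetical function names. It approximates Eq. (37) by a central finite difference and Eq. (35) by a midpoint rule.

```python
import numpy as np

a = 1.0  # radius of a unit-density ball centred at the origin (illustrative)

def radon_ball(rho):
    """Eq. (34) for the ball: each plane integral is the area of a disk."""
    return np.where(np.abs(rho) < a, np.pi * (a**2 - rho**2), 0.0)

def radon_second_derivative(rho, h=1e-3):
    """Central finite-difference approximation of Eq. (37)."""
    return (radon_ball(rho + h) - 2 * radon_ball(rho) + radon_ball(rho - h)) / h**2

def inverse_radon_at(x, n_theta=200, n_alpha=400):
    """Midpoint-rule evaluation of Eq. (35) over all unit normals omega."""
    dtheta, dalpha = np.pi / n_theta, 2 * np.pi / n_alpha
    theta = (np.arange(n_theta) + 0.5) * dtheta
    alpha = (np.arange(n_alpha) + 0.5) * dalpha
    T, A = np.meshgrid(theta, alpha, indexing="ij")
    omega = np.stack([np.cos(A) * np.sin(T),      # Eq. (36)
                      np.sin(A) * np.sin(T),
                      np.cos(T)], axis=-1)
    rho = omega @ x                                # only planes containing x
    integrand = radon_second_derivative(rho) * np.sin(T)
    return -integrand.sum() * dtheta * dalpha / (8 * np.pi**2)

mu_inside = inverse_radon_at(np.array([0.3, -0.2, 0.1]))
print(mu_inside)
```

The sketch deliberately restricts itself to a point inside the ball. There R″μ is smooth, whereas points outside would pick up δ-like contributions at |ρ| = a that a plain finite difference does not resolve.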
FIGURE 13. Two-dimensional cut parallel to ω and containing x and the origin O. The Radon plane R is a straight line in this view, since ω is normal to R.
We now show that all Radon values that must be taken into account are located on the surface of a sphere in Radon space (Figure 13). This figure shows a 2D cut parallel to ω containing x and the origin O. The Radon plane R defined by ρ and ω is perpendicular to this view, since ω is orthogonal to R. The Radon plane contains x, since ρ = ω · x for the evaluation of Eq. (35). Obviously,
$$ |\mathbf{x}|\cos\varepsilon = \boldsymbol{\omega}\cdot\mathbf{x} = \rho. \tag{38} $$
We compute the distance r from x/2 to ρω with the help of the law of cosines:
$$ r^2 = \left(\frac{|\mathbf{x}|}{2}\right)^2 + \rho^2 - 2\,\frac{|\mathbf{x}|}{2}\,\rho\cos\varepsilon. \tag{39} $$
Replacing |x| cos ε by ρ yields r = |x|/2, which is independent of ρ and ω. In other words, the Radon values required in Eq. (35) are located on the surface of a sphere with radius r. The center of this Radon sphere is located at x/2, as illustrated in Figure 14.
As discussed previously, a CT scanner measures line integrals, such that Eq. (35) cannot be applied directly. Grangeat (1991) found a relation between the first derivative of the Radon transform and cone-beam projections. Based on this relation, different authors proposed reconstruction algorithms for the helical trajectory. These algorithms can also be applied to CT data with projections truncated along z (Kudo, Noo and Defrise, 1998; Schaller et al., 2000; Tam et al., 1999). Nevertheless, reconstruction algorithms based on direct Radon inversion tend to be inefficient and numerically unstable. The theory of Radon inversion will be used in the sequel to derive filtered backprojection (FBP) algorithms. These FBP algorithms are numerically very efficient, and the only artefacts in their results stem from the discrete implementation.
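The Radon-sphere argument is easy to verify numerically. For random unit normals ω, the Radon-space points ρω with ρ = ω · x all lie at distance |x|/2 from x/2. This minimal sketch uses an arbitrary (assumed) object point.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.4, -1.3, 0.7])                           # arbitrary object point
omega = rng.normal(size=(1000, 3))
omega /= np.linalg.norm(omega, axis=1, keepdims=True)    # random unit normals
rho = omega @ x                                          # planes through x: rho = omega . x
radon_points = rho[:, None] * omega                      # location rho * omega in Radon space
r = np.linalg.norm(radon_points - x / 2, axis=1)         # distance from the centre x / 2
print(r.min(), r.max(), np.linalg.norm(x) / 2)
```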
FIGURE 14. Radon values located on the surface of a sphere are required for the evaluation of the inverse Radon transform at x according to Eq. (35). The center of the Radon sphere is located at x/2 and the radius of the sphere is equal to |x/2|.
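2. Fourier Slice Theorem

The Fourier slice theorem gives an important relation between the Radon transform and the object function in the Fourier domain. For its derivation we consider Eq. (34) and perform the Fourier transform along ρ:
$$ \begin{aligned} \mathcal{F}\mathcal{R}\mu(\tilde\rho,\boldsymbol{\omega}) &= \int_{-\infty}^{\infty} d\rho\,\exp(-2\pi i\rho\tilde\rho)\,\mathcal{R}\mu(\rho,\boldsymbol{\omega}) \\ &= \int_{-\infty}^{\infty} d\rho\,\exp(-2\pi i\rho\tilde\rho)\int_{-\infty}^{\infty} d^3x\,\mu(\mathbf{x})\,\delta(\rho - \boldsymbol{\omega}\cdot\mathbf{x}) \\ &= \int_{-\infty}^{\infty} d^3x\,\mu(\mathbf{x})\,\exp(-2\pi i\tilde\rho\,\boldsymbol{\omega}\cdot\mathbf{x}), \end{aligned} \tag{40} $$
where in the last step the δ-function was evaluated for the integration over ρ. Now, introducing the 3D Fourier transform of the object function,
$$ \mathcal{F}\mu(\tilde x,\tilde y,\tilde z) = \int_{-\infty}^{\infty} d^3x\,\exp\bigl[-2\pi i(x\tilde x + y\tilde y + z\tilde z)\bigr]\,\mu(x,y,z), \tag{41} $$
and using Eq. (36) yields
$$ \mathcal{F}\mathcal{R}\mu(\tilde\rho,\boldsymbol{\omega}) = \mathcal{F}\mu(\tilde x,\tilde y,\tilde z)\Big|_{\tilde x = \tilde\rho\cos\alpha\sin\vartheta,\ \tilde y = \tilde\rho\sin\alpha\sin\vartheta,\ \tilde z = \tilde\rho\cos\vartheta}. \tag{42} $$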
The latter equation is known as the Fourier slice theorem. It establishes a relationship between the Fourier transform of the object function and the Radon transform. Consider a Radon plane that is perpendicular to the z-axis, i.e., ϑ = 0. According to Eq. (42), we have to set x̃ = ỹ = 0 for these planes. In other words, planes perpendicular to the z-axis contribute only to vanishing frequency components along xy. Next, consider a Radon plane that is nearly perpendicular to the z-axis, i.e., ϑ is small. The maximum value of ρ̃ is of the order of the largest frequency necessary to describe the object function (or the Nyquist frequency). Therefore, ρ̃ sin ϑ is much smaller than this frequency, and x̃ and ỹ are small. In conclusion, Radon planes that are nearly perpendicular to the z-axis contribute only to low-frequency components of trans-axial slices.
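A discrete analogue of Eq. (42) for the case ϑ = 0 can be checked with NumPy's FFT. The sketch uses a random toy volume with array axes assumed to be ordered (x, y, z). The integrals over the planes z = const are sums over x and y. Their 1D DFT equals the 3D DFT of the volume on the line x̃ = ỹ = 0. For the DFT this identity is exact, not an approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = rng.random((32, 32, 24))            # toy object sampled on an (x, y, z) grid

# Radon planes with theta = 0 are the planes z = const (normal along z):
# their integrals are sums over x and y, a function of rho = z.
plane_integrals = mu.sum(axis=(0, 1))

projection_spectrum = np.fft.fft(plane_integrals)   # 1D DFT along rho
central_line = np.fft.fftn(mu)[0, 0, :]              # 3D DFT on the line x~ = y~ = 0
print(np.allclose(projection_spectrum, central_line))
```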
B. Exact Filtered Backprojection

The reconstruction algorithms discussed in this chapter share the same basic principles. This section presents the formulas describing these principles. Subsequent sections discuss how these equations can be applied to different trajectories.

1. Derivative of the Projection Data

The derivative of the projection data is of particular interest. Specifically, we consider the derivative of the line integral data [compare with Eq. (3)] with respect to s:
$$ D'\bigl(\mathbf{y}(s),\boldsymbol{\theta}\bigr) = \frac{\partial D\bigl(\mathbf{y}(q),\boldsymbol{\theta}\bigr)}{\partial q}\bigg|_{q=s}. \tag{43} $$
Notice that, since θ is held constant, evaluating this derivative involves parallel rays from different source positions. Therefore, the derivative can be computed by considering data along the rows of the wedge detector. Using Fourier filtering techniques, differentiated data along a complete wedge-detector row can be obtained in a single processing step. Furthermore, Fourier techniques allow a modification of the filter in Fourier space. This can be used, for example, to suppress the high frequencies and emphasize the low frequencies. On the other hand, the Fourier filtering approach requires two rebinning steps. Data must be rebinned from focus to wedge geometry before filtering
and back to focus geometry afterward. Determining the derivative via discrete differences is computationally more efficient. A smart way to compute the derivative using discrete differences was published by Noo, Pack and Heuscher (2003).

2. Reconstruction Formula

We assume that the trajectory is described by a vector y(s) similar to Eqs. (4), (5), or (6). Let us introduce the unit vector b(s, x) pointing from y(s) to the object point at x:
$$ \mathbf{b}(s,\mathbf{x}) = \frac{\mathbf{x} - \mathbf{y}(s)}{|\mathbf{x} - \mathbf{y}(s)|}. \tag{44} $$
Furthermore, we consider a set of unit vectors e_ν(s, x) and a set of corresponding constants (weights) μ_ν, ν = 1, . . . , N_e. Vectors e_ν and weights μ_ν are trajectory-specific (or algorithm-specific) reconstruction parameters. They must be chosen properly, as will become evident below. Generally, the number of e-vectors, N_e, can depend on s. Each e-vector is required to be orthogonal to b(s, x), such that each pair b(s, x) and e_ν(s, x) spans a plane containing x and y(s). We call these planes κ-planes. See Figure 15 for an illustration.
We introduce the backprojection interval I_BP(x) and the backprojection segment C_BP(x). For instance, I_BP(x) can correspond to the Pi interval introduced in Section II.D. The backprojection interval defines the projections, i.e., the s-values, that are taken into account for the reconstruction of μ(x). Similarly, the backprojection segment is the set of all points along the trajectory associated with I_BP(x). Therefore,
$$ C_{BP}(\mathbf{x}) = \bigl\{\mathbf{y}(s)\,\big|\,s\in I_{BP}(\mathbf{x})\bigr\}. \tag{45} $$
FIGURE 15. Vectors b and e span a plane containing the object point x and the focal spot y. We call these planes κ-planes. κ-Planes must not be confused with Radon planes.
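The reconstruction formula derived in this section is
$$ \tilde\mu(\mathbf{x}) = \frac{-1}{2\pi^2}\int_{I_{BP}(\mathbf{x})} ds\,\frac{I(s,\mathbf{x})}{|\mathbf{x} - \mathbf{y}(s)|}, \tag{46} $$
where
$$ I(s,\mathbf{x}) = \sum_{\nu=1}^{N_e}\mu_\nu\int_{-\pi}^{\pi}\frac{d\gamma}{\sin\gamma}\,D'\bigl(\mathbf{y}(s),\cos\gamma\,\mathbf{b} + \sin\gamma\,\mathbf{e}_\nu\bigr), \tag{47} $$
and D′ is given by Eq. (43). Eq. (46) corresponds to an exact reconstruction algorithm if the reconstruction parameters e_ν(s, x) and μ_ν have been chosen properly. For the derivation of Eq. (46), we evaluate Eq. (3) at
$$ \boldsymbol{\theta}(\gamma) = \cos\gamma\,\mathbf{b}(s,\mathbf{x}) + \sin\gamma\,\mathbf{e}_\nu(s,\mathbf{x}), \tag{48} $$
where −π ≤ γ ≤ π. Inserting Eq. (35), we obtain
$$ D\bigl(\mathbf{y},\cos\gamma\,\mathbf{b} + \sin\gamma\,\mathbf{e}_\nu\bigr) = \frac{-1}{8\pi^2}\int d\ell\int d\Omega\;\mathcal{R}''\mu\Bigl(\boldsymbol{\omega}\cdot\bigl[\mathbf{y} + \ell(\cos\gamma\,\mathbf{b} + \sin\gamma\,\mathbf{e}_\nu)\bigr],\boldsymbol{\omega}\Bigr), \tag{49} $$
where dΩ = sin ϑ dϑ dα. By insertion of Eq. (49), Eq. (47) becomes
$$ I(s,\mathbf{x}) = \frac{-1}{8\pi^2}\sum_{\nu=1}^{N_e}\mu_\nu\int\frac{d\ell\,d\gamma}{\sin\gamma}\int d\Omega\;\frac{\partial}{\partial q}\mathcal{R}''\mu\Bigl(\boldsymbol{\omega}\cdot\bigl[\mathbf{y}(q) + \ell(\cos\gamma\,\mathbf{b} + \sin\gamma\,\mathbf{e}_\nu)\bigr],\boldsymbol{\omega}\Bigr)\bigg|_{q=s}. \tag{50} $$
Inserting a δ-function,
$$ \delta\Bigl(\boldsymbol{\omega}\cdot\bigl[\mathbf{y}(q) + \ell(\cos\gamma\,\mathbf{b} + \sin\gamma\,\mathbf{e}_\nu)\bigr] - \xi\Bigr) = \int_{-\infty}^{\infty} d\rho\;e^{2\pi i\rho(\boldsymbol{\omega}\cdot[\mathbf{y}(q) + \ell(\cos\gamma\,\mathbf{b} + \sin\gamma\,\mathbf{e}_\nu)] - \xi)}, \tag{51} $$
we make R″μ depend on ξ:
$$ I(s,\mathbf{x}) = \frac{-1}{8\pi^2}\sum_{\nu=1}^{N_e}\mu_\nu\int\frac{d\ell\,d\gamma}{\sin\gamma}\int d\Omega\,d\rho\,d\xi\;\frac{\partial}{\partial q}e^{2\pi i\rho(\boldsymbol{\omega}\cdot[\mathbf{y}(q) + \ell(\cos\gamma\,\mathbf{b} + \sin\gamma\,\mathbf{e}_\nu)] - \xi)}\bigg|_{q=s}\,\mathcal{R}''\mu(\xi,\boldsymbol{\omega}). \tag{52} $$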
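Next, we make a change of variables u_1 = ℓ cos γ, u_2 = ℓ sin γ, such that
$$ \frac{d\ell\,d\gamma}{\sin\gamma} = \frac{\ell\,d\ell\,d\gamma}{u_2} = \frac{du_1\,du_2}{u_2}. \tag{53} $$
Using this, the integrations over ℓ and γ yield (Jeffrey, Gradshteyn and Ryzhik, 1994)
$$ \begin{aligned} \int du_2\int du_1\,\frac{\exp\bigl(2\pi i\rho\,\boldsymbol{\omega}\cdot(u_1\mathbf{b} + u_2\mathbf{e}_\nu)\bigr)}{u_2} &= i\pi\,\mathrm{sgn}(\rho\,\boldsymbol{\omega}\cdot\mathbf{e}_\nu)\,\delta(\rho\,\boldsymbol{\omega}\cdot\mathbf{b}) = i\pi\,\mathrm{sgn}(\boldsymbol{\omega}\cdot\mathbf{e}_\nu)\,\delta(\boldsymbol{\omega}\cdot\mathbf{b})/\rho \\ &= i\pi\,|\mathbf{x} - \mathbf{y}(s)|\,\mathrm{sgn}(\boldsymbol{\omega}\cdot\mathbf{e}_\nu)\,\delta\bigl(\boldsymbol{\omega}\cdot[\mathbf{x} - \mathbf{y}(s)]\bigr)/\rho, \end{aligned} \tag{54} $$
where the last step follows directly from Eq. (44). The derivative with respect to q in Eq. (52) yields a factor 2πiρ ω · ẏ(s), and together with Eq. (54) we obtain
$$ \begin{aligned} I(s,\mathbf{x}) &= \frac{-1}{8\pi^2}\sum_{\nu=1}^{N_e}\mu_\nu\int d\Omega\,d\rho\,d\xi\;2\pi i\rho\,\boldsymbol{\omega}\cdot\dot{\mathbf{y}}(s)\;i\pi\,|\mathbf{x} - \mathbf{y}(s)|\,\mathrm{sgn}(\boldsymbol{\omega}\cdot\mathbf{e}_\nu)\,\frac{\delta\bigl(\boldsymbol{\omega}\cdot[\mathbf{x} - \mathbf{y}(s)]\bigr)}{\rho}\,e^{2\pi i\rho(\boldsymbol{\omega}\cdot\mathbf{y}(s) - \xi)}\,\mathcal{R}''\mu(\xi,\boldsymbol{\omega}) \\ &= \frac{1}{4}\,|\mathbf{x} - \mathbf{y}(s)|\sum_{\nu=1}^{N_e}\mu_\nu\int d\Omega\;\boldsymbol{\omega}\cdot\dot{\mathbf{y}}(s)\,\mathrm{sgn}(\boldsymbol{\omega}\cdot\mathbf{e}_\nu)\,\delta\bigl(\boldsymbol{\omega}\cdot[\mathbf{x} - \mathbf{y}(s)]\bigr)\,\mathcal{R}''\mu(\boldsymbol{\omega}\cdot\mathbf{x},\boldsymbol{\omega}), \end{aligned} \tag{55} $$
where in the last step the integrations over ρ and ξ were performed. We made R″μ depend on ω · x using the δ-function under the integral. For the evaluation of the δ-function, remember that ω is the normal vector of a Radon plane that contains the object point x. Vector x − y(s) points from the focal spot to x. Therefore, the argument of the δ-function vanishes exactly at those points y(s_k̃) at which the Radon plane intersects with the trajectory. Using the scaling property of the δ-function, we obtain
$$ \delta\bigl(\boldsymbol{\omega}\cdot[\mathbf{x} - \mathbf{y}(s)]\bigr) = \sum_{\tilde k}\frac{\delta(s - s_{\tilde k})}{|\boldsymbol{\omega}\cdot\dot{\mathbf{y}}(s_{\tilde k})|}, \tag{56} $$
where the sum runs over all intersection points s_k̃ of the trajectory with the Radon plane defined by x and normal vector ω. The main result of this section is obtained by integration of I(s, x) over s as in Eq. (46). It yields
$$ \tilde\mu(\mathbf{x}) = \frac{-1}{8\pi^2}\int d\Omega\;\mathcal{R}''\mu(\boldsymbol{\omega}\cdot\mathbf{x},\boldsymbol{\omega})\,w(\mathbf{x},\boldsymbol{\omega}), \tag{57} $$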
FIGURE 16. The backprojection segment C_BP(x) associated with I_BP(x) for the object point x is drawn in bold along the helix. The Radon plane defined by x and ω has three intersection points with the helix: y(s_1), y(s_2), and y(s_3). It has only one intersection point, y(s_2), with C_BP(x).
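where
$$ w(\mathbf{x},\boldsymbol{\omega}) = \sum_{k=1}^{N_I}\sum_{\nu=1}^{N_e(k)}\mu_\nu\,\mathrm{sgn}\bigl(\boldsymbol{\omega}\cdot\dot{\mathbf{y}}(s_k)\bigr)\,\mathrm{sgn}(\boldsymbol{\omega}\cdot\mathbf{e}_\nu). \tag{58} $$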
The sum over k is different from the sum over k̃ in Eq. (56). The reason is that the integration over s was performed over I_BP(x). Therefore, all s_k fulfill s_k ∈ I_BP(x). In other words, k counts all intersection points (IPs) of the Radon plane (associated with ω) with the backprojection segment C_BP(x) along the trajectory, whereas k̃ counts all IPs of the Radon plane with the complete trajectory (Figure 16). As mentioned, the number of e_ν-vectors can depend on s. Therefore, N_e depends on k in Eq. (58). In the following, we call a Radon plane an m-plane if it has m IPs with C_BP(x). Comparing Eqs. (35) and (57), we realize that Eq. (57) is the inverse Radon transform if and only if w(x, ω) equals one for all x and ω. Therefore, if we define all e_ν and μ_ν such that this requirement is fulfilled, we obtain a reconstruction algorithm that is mathematically exact. Subsequent sections are devoted to adequate definitions of these variables for different cases. First, we describe the general scheme of the reconstruction algorithm and show how it can be implemented efficiently. Section III.C shows how Eq. (58) can be evaluated using projections onto the planar detector.

3. Mathematically Complete Trajectories

The foregoing discussion showed that we want to define the parameters of our reconstruction algorithms in such a way that Eq. (58) results in a constant value of 1. Here we elaborate on the requirements that the trajectory must fulfill. The result is known as the Tuy condition (Tuy, 1983).
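The IPs counted in Eq. (58) can be illustrated by counting how often a Radon plane through x crosses a sampled trajectory. The same count bears on the completeness requirement discussed in the paragraphs that follow. The sketch uses illustrative geometry and hypothetical helper names, and it counts sign changes of ω · (y(s) − x). Consider an object point above the circle plane: the horizontal plane through x never meets the circle. By contrast, each of several hundred random planes through x meets a six-turn helix segment at least once. Because only a finite helix segment is sampled, this is an illustration rather than a proof.

```python
import numpy as np

R, hbar = 570.0, 40.0 / (2 * np.pi)   # illustrative geometry (assumed)

def circle(s, z0=0.0):
    return np.stack([R * np.cos(s), R * np.sin(s), np.full_like(s, z0)], axis=-1)

def helix(s):
    return np.stack([R * np.cos(s), R * np.sin(s), hbar * s], axis=-1)

def count_ips(trajectory, s, x, omega):
    """Count intersection points of the Radon plane through x with normal omega
    and a densely sampled trajectory, via sign changes of omega . (y(s) - x)."""
    f = (trajectory(s) - x) @ omega
    return int(np.count_nonzero(np.sign(f[:-1]) != np.sign(f[1:])))

x = np.array([50.0, 20.0, 30.0])          # object point above the circle plane z = 0
z_axis = np.array([0.0, 0.0, 1.0])        # Radon plane z = 30 contains x

s_circle = np.linspace(0.0, 2 * np.pi, 20001)
ips_circle = count_ips(circle, s_circle, x, z_axis)

rng = np.random.default_rng(2)
omegas = rng.normal(size=(500, 3))
omegas /= np.linalg.norm(omegas, axis=1, keepdims=True)
s_helix = np.linspace(-6 * np.pi, 6 * np.pi, 60001)   # six turns around x
min_ips_helix = min(count_ips(helix, s_helix, x, om) for om in omegas)
print(ips_circle, min_ips_helix)
```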
FIGURE 17. Every Radon plane intersects at least once with the helix (left). The circular trajectory is incomplete for object points located outside of the circle plane (right).
Remember that k in Eq. (58) counts the number of IPs of the Radon plane with the backprojection segment C_BP(x). Therefore, if there are Radon planes that do not intersect with C_BP(x), Eq. (58) will always give a vanishing result for these planes. In other words, an object point x is reconstructible only if every Radon plane containing x intersects with C_BP(x) at least once. Trajectories fulfilling this requirement for all x in the ROI are called complete. Figure 17 illustrates completeness and incompleteness using the helical and circular trajectories as examples.
In practice, an even stronger restriction applies. The argument so far neglects the finite size of the detector. Taking the detector dimensions into account, a trajectory is complete if every Radon plane that contains x has at least one IP from which x is projected onto the detector. In the sequel, we tacitly assume that I_BP(x) is defined such that x is projected onto the detector from every point within C_BP(x). Under this assumption, a trajectory is complete if every Radon plane that contains x intersects with C_BP(x).

4. Reconstruction Algorithm

In the following, we assume that vectors e_ν and weights μ_ν have been defined such that Eq. (58) results in a constant value of 1. With this assumption, Eqs. (47) and (46) define a reconstruction algorithm that recovers the μ-values in a mathematically exact way. The reconstruction steps are as follows:
1. Compute the derivative in the sense of Eq. (43).
2. Filter the differentiated data, i.e., filter D′(y(s), θ).
3. Perform the backprojection along I_BP(x), i.e., evaluate Eq. (46).
Steps 1 and 2 follow from Eq. (47): the filtering is performed along γ with kernel 1/sin γ. The filtering depends on x only via b(s, x), and all object points on one particular line from the focal spot to the detector share the same b. Consider two vectors b_1 and b_2, where b_2 is located in the κ-plane of b_1. In many cases, b_1 is also located in the κ-plane of b_2. In these cases the filtering step
is shift invariant. Therefore, using Fourier techniques, filtered values for all object points in the κ-plane spanned by b and e_ν (see Figure 15) can be obtained in a single processing step. Independently of this observation, this processing step must be repeated for each ν, i.e., the sum over ν must be calculated. The filtering step is discussed further in Section III.B.5. The backprojection in step 3 corresponds to the integration over s in Eq. (46). It incorporates the sum over ν in Eq. (47) using the weights μ_ν. Section III.B.7 shows how the backprojection can be implemented most efficiently using wedge-detector geometry.

5. Filtering

The filtering, i.e., the integration over γ in Eq. (47), is one of the central steps of the FBP algorithms discussed here. The vectors e_ν define which data must be involved in the filtering step. In practice, however, vectors e_ν are not the most useful quantities for parameterizing the filtering. Instead, we consider the intersection of the κ-planes with the planar detector (Figure 18). The line of intersection defines the filter line, which is sometimes called a κ-line. Any κ-line can be parameterized by two constants, which we denote as v_0 and σ. With these, points on the κ-line are given by
$$ v_P(u_P) = v_0 + \sigma u_P \tag{59} $$
in planar-detector coordinates. In the following we derive some relationships between the quantities of a κ-plane and the corresponding κ-line. Consider the line that contains the focal spot at y(s) and is parallel to θ(γ) defined in Eq. (48). Points on this line are given by
$$ \boldsymbol{\ell}(\lambda,\gamma) = \mathbf{y}(s) + \lambda\,\boldsymbol{\theta}(\gamma), \tag{60} $$
where λ parameterizes the points. Obviously, this line lies in the κ-plane. It intersects with the planar detector if
$$ 0 = \boldsymbol{\ell}(\lambda_0,\gamma)\cdot\mathbf{w}(s) = R + \lambda_0\bigl[\cos\gamma\,\mathbf{b}\cdot\mathbf{w} + \sin\gamma\,\mathbf{e}\cdot\mathbf{w}\bigr], \tag{61} $$
FIGURE 18. The intersection of the κ-plane with the planar detector defines the κ-line.
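where w(s) corresponds to the basis vector defined in Section II.C [see Eq. (15)]. Here, we have used the fact that y · w = R for the trajectories considered in this chapter. The point of intersection is parameterized by λ_0. From the latter equation we derive
$$ \lambda_0 = \frac{-R}{\cos\gamma\,\mathbf{b}\cdot\mathbf{w} + \sin\gamma\,\mathbf{e}\cdot\mathbf{w}}. \tag{62} $$
With these results, points on the κ-line can be parameterized by the angle γ as
$$ u_P(\gamma) = \boldsymbol{\ell}(\lambda_0,\gamma)\cdot\mathbf{u}(s) = -R\,\frac{\cos\gamma\,\mathbf{b}\cdot\mathbf{u} + \sin\gamma\,\mathbf{e}\cdot\mathbf{u}}{\cos\gamma\,\mathbf{b}\cdot\mathbf{w} + \sin\gamma\,\mathbf{e}\cdot\mathbf{w}}, \tag{63} $$
$$ v_P(\gamma) = \boldsymbol{\ell}(\lambda_0,\gamma)\cdot\mathbf{v}(s) - y_z(s) = -R\,\frac{\cos\gamma\,\mathbf{b}\cdot\mathbf{v} + \sin\gamma\,\mathbf{e}\cdot\mathbf{v}}{\cos\gamma\,\mathbf{b}\cdot\mathbf{w} + \sin\gamma\,\mathbf{e}\cdot\mathbf{w}}, \tag{64} $$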
where we have used y · u = 0 and y · v − y_z = 0. Finally, we compute the parameters of Eq. (59). Since σ corresponds to the slope of the filter line, it can be computed from the last two equations by
$$ \sigma = \frac{v_P(\gamma_2) - v_P(\gamma_1)}{u_P(\gamma_2) - u_P(\gamma_1)} \tag{65} $$
for arbitrary but different γ_1 and γ_2. Parameter v_0 corresponds to the v_P coordinate of the κ-line at u_P = 0. It is given by
$$ v_0 = v_P(\gamma_0), \qquad \tan\gamma_0 = -\frac{\mathbf{b}\cdot\mathbf{u}}{\mathbf{e}\cdot\mathbf{u}}, \tag{66} $$
where γ_0 is defined by the condition u_P(γ_0) = 0. So far we have considered the case in which vectors b and e define the filtering plane, and from these we have computed the parameters in Eq. (59). For completeness, we now follow the opposite approach: we assume that the κ-line is given by Eq. (59) and compute vectors b and e. Since Eq. (14) gives points in world coordinates, vector b associated with coordinate u_P and the filter line under consideration can be computed via
$$ \mathbf{b}(u_P) = \frac{u_P\,\mathbf{u}(s) + \bigl(v_P(u_P) + y_z(s)\bigr)\mathbf{v}(s) - \mathbf{y}(s)}{\bigl|u_P\,\mathbf{u}(s) + \bigl(v_P(u_P) + y_z(s)\bigr)\mathbf{v}(s) - \mathbf{y}(s)\bigr|}. \tag{67} $$
For convenience, we introduce, with an increment Δu, the vector
$$ \mathbf{w}_1(u_P) = \mathbf{b}(u_P + \Delta u) - \mathbf{b}(u_P), \tag{68} $$
which lies in the κ-plane. Since vector e is parallel to the κ-plane, it is a linear combination of b(u_P) and w_1(u_P). Furthermore, e is orthogonal to b. A vector that fulfills both criteria is given by
$$ \mathbf{w}_T(u_P) = \mathbf{w}_1(u_P) - (\mathbf{w}_1\cdot\mathbf{b})\,\mathbf{b}(u_P). \tag{69} $$
Normalization gives
$$ \mathbf{e}(u_P) = \frac{\mathbf{w}_T}{|\mathbf{w}_T|}. \tag{70} $$
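The relations between a κ-plane and its κ-line can be exercised as a round trip. The sketch below assumes s = 0, an illustrative κ-line (v_0, σ), and an illustrative focal-spot height y_z. It builds b and e with Eqs. (67) to (70) and recovers σ and v_0 with Eqs. (63) to (66). The function names are hypothetical. Because b(u_P + Δu) and b(u_P) both lie in the κ-plane, the increment Δu need not be small.

```python
import numpy as np

# Illustrative geometry at s = 0 (assumed values, lengths in mm).
R, s = 570.0, 0.0
u_vec = np.array([-np.sin(s), np.cos(s), 0.0])      # basis vectors, Eq. (15)
v_vec = np.array([0.0, 0.0, 1.0])
w_vec = np.array([np.cos(s), np.sin(s), 0.0])
y_z = 12.0
y = R * w_vec + y_z * v_vec                          # focal spot: y.w = R, y.u = 0, y.v = y_z

def b_of(uP, v0, sigma):
    """Eq. (67): unit vector from y(s) to the kappa-line point at u_P."""
    p = uP * u_vec + (v0 + sigma * uP + y_z) * v_vec - y
    return p / np.linalg.norm(p)

def e_of(uP, v0, sigma, du=1.0):
    """Eqs. (68)-(70): unit vector in the kappa-plane orthogonal to b."""
    b = b_of(uP, v0, sigma)
    w1 = b_of(uP + du, v0, sigma) - b
    wT = w1 - (w1 @ b) * b
    return wT / np.linalg.norm(wT)

def detector_point(gamma, b, e):
    """Eqs. (63)-(64): planar-detector point hit along cos(g) b + sin(g) e."""
    theta = np.cos(gamma) * b + np.sin(gamma) * e
    return -R * (theta @ u_vec) / (theta @ w_vec), -R * (theta @ v_vec) / (theta @ w_vec)

v0, sigma, uP = 8.0, 0.05, -60.0                     # an illustrative kappa-line, Eq. (59)
b, e = b_of(uP, v0, sigma), e_of(uP, v0, sigma)
(u1, v1), (u2, v2) = detector_point(0.1, b, e), detector_point(0.3, b, e)
sigma_rec = (v2 - v1) / (u2 - u1)                    # Eq. (65)
gamma0 = np.arctan(-(b @ u_vec) / (e @ u_vec))       # Eq. (66)
v0_rec = detector_point(gamma0, b, e)[1]
print(sigma_rec, v0_rec)
```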
6. Efficient Filtering

So far we have considered reconstruction as computing the μ-value of one particular object point. In practice, reconstruction aims at computing the absorption coefficients in the ROI. Therefore, we now change our paradigm from an object-point–driven reconstruction to a projection-driven reconstruction. For this, we assume that we have a set of filter lines distributed over the detector. Figure 19 shows an example. Each of the filter lines shown corresponds to a κ-line in the sense of Figure 18. Each filter line intersects the v_P-axis (u_P = 0) at a certain v_0. The slope in the sense of Eq. (59) is given by
$$ \sigma = \frac{v_0}{R}\cot\frac{v_0}{\bar h} \tag{71} $$
for each of the filter lines in our example. The number of filter lines shown in Figure 19 must not be confused with the sum over ν in Eq. (47). The example in Figure 19 has only one set of filter lines, so the sum contains only the term ν = 1. As discussed in further detail in Section III.E, the described filter lines suffice for exact reconstruction using the Pi segment. Within the Pi segment,
FIGURE 19. The Pi window on the planar detector and filter lines for exact reconstruction along the Pi segment.
any object point will be projected into the Pi window. From each source position it will be projected onto a certain filter line. The filter lines are defined such that the filtered value at the point of projection is the one to be used for the backprojection. With this, a projection-driven reconstruction can be phrased as follows:
1. Transform each projection into one or more filtered projections.
2. During the backprojection, use the filtered values associated with the points onto which the object point is projected from each source position.
Step 1 indicates that we perform the filtering along each of the filter lines, which yields filtered values along those lines. For the backprojection in step 2, we might need to take different sets of filter lines into account and sum these using the correct weights. In other words, we must perform the sum over ν in Eq. (47) and use the weights μ_ν. Using the results of Section III.B.5, points on the filter lines can be parameterized by the angle γ. With this, the filtering along each line can be performed based on Eq. (47); that is, we perform the integration over γ, utilizing the shift invariance of the filtering kernel. In practice, the measured data are associated with the focus detector, whereas the filter lines have been defined on the planar detector. The relationship between focus- and planar-detector variables is given by the formulas in Section II.C. We can proceed from one geometry to the other using interpolations. Nevertheless, it is most convenient to sample the data on the filter lines equidistantly in the focus-detector variable ϕ. Sampling the data equidistantly in ϕ requires fewer interpolations and therefore results in better resolution. Angle γ is different from angle ϕ (Figure 20). Therefore, sampling points equidistantly in γ on the filter lines requires interpolations along v_F and along ϕ. In the following we show how the formula for filtering can be
FIGURE 20. Angles γ and ϕ are measured in different planes.
FIGURE 21. The two points P = (u_P, v_P) and P′ = (u′_P, v′_P) on the κ-line are separated by angle γ. The dashed line is orthogonal to the κ-line and has length ℓ. The three solid lines originating from the focal spot have lengths r, r′, and r″.
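transformed so that the points on the filter lines can be sampled equidistantly in ϕ. With the subsequent derivation we follow Noo, Pack and Heuscher (2003) but aim for a slightly different result. Consider Figure 21. It shows the planar detector and one particular κ-line. Two specific points on the κ-line are denoted as P = (u_P, v_P) and P′ = (u′_P, v′_P). They are separated by an angle γ seen from the focal spot. The dashed line is orthogonal to the κ-line and has length ℓ. Let L denote the distance between P and P′. Let L′ denote the length of a short segment of the κ-line that starts at P′, ends at a third point at distance r″ from the focal spot, and is seen under the angle Δγ from the focal spot. Using the formula for the area of a triangle, we observe that Lℓ = r r′ sin γ and L′ℓ = r′ r″ sin Δγ. Hence,
$$ \frac{L'}{L} = \frac{r''\sin\Delta\gamma}{r\sin\gamma}. \tag{72} $$
At the same time, L′/L = Δu_P/(u′_P − u_P), where Δu_P denotes the u_P-extent of the short segment, such that
$$ \frac{\Delta u_P}{u'_P - u_P} = \frac{r''\sin\Delta\gamma}{r\sin\gamma}. \tag{73} $$
Since we are interested in infinitesimal quantities (Δγ → dγ, r″ → r′), this yields
$$ \frac{du'_P}{u'_P - u_P} = \frac{r'\,d\gamma}{r\sin\gamma}. \tag{74} $$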
For the transition to focus-detector coordinates, we realize by comparison with Eq. (9) that
$$ u'_P - u_P = R(\tan\phi' - \tan\phi) = R\,\frac{\sin(\phi' - \phi)}{\cos\phi\cos\phi'}, \tag{75} $$
such that
$$ \frac{du'_P}{d\phi'} = \frac{R}{\cos^2\phi'}. \tag{76} $$
Using this in Eq. (74), we obtain
$$ \frac{d\gamma}{\sin\gamma} = \frac{r\cos\phi\,d\phi'}{r'\cos\phi'\,\sin(\phi' - \phi)}. \tag{77} $$
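The latter equation still contains the quantities r and r′. From Figure 21 we can establish
$$ r'^2\cos^2\phi' = \bigl(R^2 + {u'_P}^2 + {v'_P}^2\bigr)\cos^2\phi' = R^2\left(1 + \tan^2\phi' + \frac{\tan^2\lambda'}{\cos^2\phi'}\right)\cos^2\phi' = \frac{R^2}{\cos^2\lambda'}, \tag{78} $$
where we have used Eqs. (9) and (10). Now the integration measure can be written as
$$ \frac{d\gamma}{\sin\gamma} = \frac{\cos\lambda'\,d\phi'}{\cos\lambda\,\sin(\phi' - \phi)}. \tag{79} $$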
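Eq. (79) can be verified numerically along a single κ-line. The sketch below uses illustrative values for R, v_0, and σ and hypothetical helper names. The focal spot sits at distance R from the planar detector, and u_P = R tan ϕ as in Eq. (9). The cone angle follows from v_P = R tan λ/cos ϕ, which is the relation used in Eq. (78). The left-hand side differentiates the angle γ between the rays to P and P′ by a central difference. The right-hand side is the closed form of Eq. (79).

```python
import numpy as np

R = 570.0
v0, sigma = 8.0, 0.05        # illustrative kappa-line v_P = v0 + sigma * u_P, Eq. (59)

def ray(phi):
    """Direction from the focal spot to the kappa-line point with fan angle phi,
    in (w, u, v) components; u_P = R tan(phi) as in Eq. (9)."""
    uP = R * np.tan(phi)
    return np.array([-R, uP, v0 + sigma * uP])

def cone_angle(phi):
    """lambda from v_P = R tan(lambda) / cos(phi), the relation used in Eq. (78)."""
    uP = R * np.tan(phi)
    return np.arctan((v0 + sigma * uP) * np.cos(phi) / R)

def gamma(phi_p, phi):
    """Angle between the rays to P (fan angle phi) and P' (fan angle phi_p)."""
    a, b = ray(phi), ray(phi_p)
    return np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

phi, phi_p, h = 0.05, 0.30, 1e-6
lhs = (gamma(phi_p + h, phi) - gamma(phi_p - h, phi)) / (2 * h) / np.sin(gamma(phi_p, phi))
rhs = np.cos(cone_angle(phi_p)) / (np.cos(cone_angle(phi)) * np.sin(phi_p - phi))
print(lhs, rhs)
```

For σ = v_0 = 0 the κ-line is the central row, γ reduces to ϕ′ − ϕ, and both sides collapse to 1/sin(ϕ′ − ϕ).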
Comparing this result with Eq. (47) allows us to write the filtering step as
$$ P_\nu(s, u_P, v_P) = \int_{-\pi}^{\pi} d\phi'\,\frac{\cos\lambda'}{\sin(\phi' - \phi)}\,D'\bigl(\mathbf{y}(s), u'_P(\phi'), v'_P(\phi')\bigr), \tag{80} $$
where
$$ v'_P(\phi') = v_P\bigl(u'_P(\phi')\bigr), \tag{81} $$
v_P(u_P) is defined in Eq. (59), and u′_P(ϕ′) is given by Eq. (9). The factor 1/cos λ in Eq. (79) was not used in Eq. (80); it will be taken into account in the backprojection formula below. Variable ϕ in Eq. (80) is associated with u_P via Eq. (9). In other words, Eq. (80) parameterizes the filtered values in coordinates of the planar detector. The relationship with focus-detector coordinates can easily be established with the equations in Section II.C. The index ν indicates that Eq. (80) gives filtered values associated with a specific set of filter lines. Within this set, the κ-line containing point (u_P, v_P) must be used to obtain the desired filtered value. The integration, i.e., the filtering, is now performed over the angular variable ϕ′. The filtering kernel is identical to the one in Eq. (47). Therefore
the shift invariance is preserved. Notice the extra factor cos λ' in Eq. (80). According to Eq. (10), this factor depends only on the focus-detector row. Therefore, it can easily be taken into account by a simple modification of the projection data, either before or after the derivative D is computed. With these modifications, the backprojection [compare with Eq. (46)] can now be written as
$$\mu(\mathbf{x}) = -\frac{1}{2\pi^2}\int_{I_{BP}(\mathbf{x})} \frac{ds}{|\mathbf{x}-\mathbf{y}(s)|\,\cos\lambda}\sum_{\nu=1}^{N_e}\mu_\nu P_\nu\big(s, u_P(\mathbf{x},s), v_P(\mathbf{x},s)\big). \qquad (82)$$
Here the factor 1/cos λ stems from Eq. (79), since it was not absorbed into Eq. (80). This factor corresponds to a postprocessing step after the filtering. Like the factor cos λ' in Eq. (80), it depends only on the detector position. It can therefore be taken into account by an object-point-independent modification of the filtered projection data. The functions u_P(x, s) and v_P(x, s) are the coordinates of the object point x projected onto the planar detector from y(s). These coordinates are given by Eqs. (63) and (64) for γ = 0.
7. Backprojection Using Wedge Detector Geometry
The backprojection formula [Eq. (82)] contains a factor 1/|x − y(s)|, which depends on both the object point x and the focal-spot position y(s). Because of this factor, the backprojection is rather inefficient. Here we show how this factor can be eliminated by a transition to parallel (wedge) detector coordinates. To this end, consider Figure 22. It shows a plane containing x and parallel to the x- and y-axes, into which the source path y(s) is projected. Variable ϕ is the fan angle under which the object point appears from the projected y(s). While s corresponds to the angular position of the source, variable t is the angle under which the source is seen from x. From Figure 22 we realize that
$$\frac{L\,dt}{R\,ds} = \cos\phi. \qquad (83)$$
Here, L is the distance from the projected object point to the projected focal-spot position. This distance is related to λ via
$$\cos\lambda = \frac{L}{|\mathbf{x}-\mathbf{y}(s)|}, \qquad (84)$$
such that we derive from Eq. (83)
$$\frac{ds}{|\mathbf{x}-\mathbf{y}(s)|} = \frac{\cos\lambda}{R\cos\phi}\,dt. \qquad (85)$$
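Eqs. (83) to (85) can likewise be verified by finite differences. The sketch below assumes a source path y(s) = (R cos s, R sin s, h̄s) (h̄ = 0 gives a circle), an object point inside the cylinder of radius R, and takes t as the polar angle of the projected source seen from the projected object point; the function name is hypothetical.

```python
import math

def wedge_jacobian(R, x, s, hbar=0.0, h=1e-6):
    """Return both sides of Eq. (83) and of Eq. (85) (divided by ds) at source angle s.

    Assumed source path: y(s) = (R cos s, R sin s, hbar * s). The point x must lie
    inside the cylinder of radius R, so that t(s) increases monotonically with s.
    """
    def t_of(sv):
        # t: angle under which the projected source is seen from the projected x
        return math.atan2(R * math.sin(sv) - x[1], R * math.cos(sv) - x[0])

    dt = t_of(s + h) - t_of(s - h)
    dt = (dt + math.pi) % (2.0 * math.pi) - math.pi  # remove a possible 2*pi jump of atan2
    dt_ds = dt / (2.0 * h)

    y = (R * math.cos(s), R * math.sin(s), hbar * s)
    dx, dy, dz = x[0] - y[0], x[1] - y[1], x[2] - y[2]
    L = math.hypot(dx, dy)                # in-plane distance used in Eq. (84)
    dist = math.sqrt(L * L + dz * dz)     # |x - y(s)|
    cos_lam = L / dist                    # Eq. (84)
    # fan angle phi: angle at the projected source between the rays to x and to the axis
    cos_phi = (dx * (-y[0]) + dy * (-y[1])) / (L * R)

    eq83 = (L * dt_ds, R * cos_phi)                          # L dt/ds vs R cos(phi)
    eq85 = (1.0 / dist, cos_lam * dt_ds / (R * cos_phi))     # 1/|x - y| vs cos(lam) (dt/ds) / (R cos(phi))
    return eq83, eq85

eq83, eq85 = wedge_jacobian(570.0, (100.0, -50.0, 20.0), 0.7, hbar=40.0 / (2 * math.pi))
print(eq83, eq85)
```

Both returned pairs should agree to within the finite-difference error. Note that the cos λ in Eq. (85) is exactly what cancels the 1/cos λ of Eq. (82).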
FIGURE 22. Projection of the object point x into the xy-plane. While angle s is measured from the rotation axis, angle t is measured from the object point.
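Using this result, the backprojection formula Eq. (82) becomes
$$\mu(\mathbf{x}) = -\frac{1}{2\pi^2}\int_{\tilde I_{BP}(\mathbf{x})} \frac{dt}{R\cos\phi}\sum_{\nu=1}^{N_e}\mu_\nu \tilde P_\nu\big(t, u(\mathbf{x},t), v(\mathbf{x},t)\big). \qquad (86)$$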
Here, P̃_ν(t, u, v) corresponds to the filtered projection data rebinned into wedge geometry. The functions u(x, t) and v(x, t) yield the coordinates of the object point projected onto the wedge detector, and Ĩ_BP(x) is the backprojection interval of x in parallel geometry. The factor 1/(R cos ϕ) in Eq. (86) is independent of the object point x; it varies only with the position on the detector. Again, this factor is taken into account by a modification of the projection data right after filtering.
C. Mathematics on the Planar Detector
The preceding sections describe a certain type of filtered backprojection algorithm. So far, the algorithm has been presented in its most general form. The subsequent sections are devoted to specific trajectories and acquisition types. The concept of projecting the trajectory and the vectors involved onto
FIGURE 23. The intersection of the κ-plane defined by e and the planar detector results in the κ-line shown. The Radon plane defined by ω intersects with the detector, resulting in line L_ω. Projecting e onto the detector gives vector ê, which is parallel to the κ-line. The projection of ω, denoted as ω̂, is orthogonal to L_ω.
the planar detector proves very useful (Katsevich, 2004b). Some general formulas that are helpful for the following mathematical discussions are presented here. The projections of different trajectories were discussed in Section II.D. Now we consider the vectors e, ω, and ẏ. Remember that vectors e and ω define κ- and Radon planes, respectively. Notice that ẏ(s), i.e., the tangent on the trajectory, is parallel to the planar detector for the trajectories considered. Figure 23 shows exemplary vectors e and ω. The projections of these vectors onto the planar detector are denoted as ê and ω̂. In particular, the κ-line is parallel to ê, while ω̂ is orthogonal to L_ω. Here, L_ω is the line that results from the intersection of the Radon plane with the detector. Explicit expressions for ê and ω̂ can be obtained via
$$\hat{\mathbf{e}} = (\mathbf{e}\times\mathbf{b})\times\mathbf{w} \qquad (87)$$
and
$$\hat{\boldsymbol\omega} = (\mathbf{w}\times\boldsymbol\omega)\times\mathbf{w} = \boldsymbol\omega - \mathbf{w}(\boldsymbol\omega\cdot\mathbf{w}). \qquad (88)$$
Here, w is the basis vector defined in Section II.C, and b is defined in Eq. (44). Furthermore, we used the relation a × (b × c) = b(a · c) − c(a · b) above. Now, using the general formula
$$(\mathbf{a}\times\mathbf{b})\cdot(\mathbf{c}\times\mathbf{d}) = (\mathbf{a}\cdot\mathbf{c})(\mathbf{b}\cdot\mathbf{d}) - (\mathbf{b}\cdot\mathbf{c})(\mathbf{a}\cdot\mathbf{d}) \qquad (89)$$
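twice and applying b · ω = 0, we derive
$$\hat{\mathbf{e}}\cdot\hat{\boldsymbol\omega} = (\mathbf{w}\times\boldsymbol\omega)\cdot(\mathbf{e}\times\mathbf{b}) = -(\mathbf{b}\cdot\mathbf{w})(\mathbf{e}\cdot\boldsymbol\omega). \qquad (90)$$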
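The vector identities of Eqs. (87) to (91) can be checked with random vectors that satisfy the stated constraints: unit w, b · w < 0, b · ω = 0, and w · ẏ = 0. This is a minimal plain-Python sketch; the helper names are hypothetical and no particular trajectory is assumed.

```python
import math
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(c / n for c in a)

def rand_vec(rng):
    return tuple(rng.uniform(-1.0, 1.0) for _ in range(3))

def check_projection_identities(seed=0):
    """Residuals of Eqs. (88), (90), (91) and the first sign relation of Eq. (92)."""
    rng = random.Random(seed)
    w = unit(rand_vec(rng))                        # detector basis vector, unit length
    b = unit(rand_vec(rng))
    if dot(b, w) > 0:                              # the text uses b . w < 0
        b = tuple(-c for c in b)
    e = rand_vec(rng)
    omega = unit(cross(b, rand_vec(rng)))          # Radon-plane normal with b . omega = 0
    ydot = cross(w, rand_vec(rng))                 # tangent parallel to the detector: w . ydot = 0

    e_hat = cross(cross(e, b), w)                  # Eq. (87)
    omega_hat = cross(cross(w, omega), w)          # Eq. (88), cross-product form
    omega_proj = tuple(o - wc * dot(omega, w) for o, wc in zip(omega, w))  # Eq. (88), projection form

    r88 = max(abs(p - q) for p, q in zip(omega_hat, omega_proj))
    r90 = abs(dot(e_hat, omega_hat) + dot(b, w) * dot(e, omega))
    r91 = abs(dot(omega_hat, ydot) - dot(omega, ydot))
    same_sign = (dot(e_hat, omega_hat) > 0) == (dot(e, omega) > 0)
    return r88, r90, r91, same_sign

for seed in range(3):
    print(check_projection_identities(seed))
```

All three residuals should be at the level of floating-point round-off, and the sign flag should be true because −(b · w) > 0.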
FIGURE 24. The planar detector seen from y(s_2) in Figure 16. The intersection of the Radon plane results in line L_ω. Object point x is projected onto point x̂. The backprojection segment is drawn bold.
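Similarly, using w · ẏ = 0, we obtain from Eq. (88)
$$\hat{\boldsymbol\omega}\cdot\dot{\mathbf{y}}(s) = \boldsymbol\omega\cdot\dot{\mathbf{y}}(s). \qquad (91)$$
Considering Eqs. (90) and (91) and keeping Eq. (58) in mind yields
$$\operatorname{sgn}(\boldsymbol\omega\cdot\mathbf{e}) = \operatorname{sgn}(\hat{\boldsymbol\omega}\cdot\hat{\mathbf{e}}) \quad\text{and}\quad \operatorname{sgn}(\boldsymbol\omega\cdot\dot{\mathbf{y}}) = \operatorname{sgn}(\hat{\boldsymbol\omega}\cdot\dot{\mathbf{y}}), \qquad (92)$$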
where we have used b · w < 0. Comparing Eqs. (58) and (92) shows that, for the evaluation of Eq. (58), it is sufficient to consider the projections of the involved vectors onto the planar detector. Projections onto the planar detector are also useful for answering other questions, such as how often a certain Radon plane intersects with the trajectory. For an illustration, consider Figure 24. It shows the planar detector seen from y(s_2) in Figure 16. The backprojection segment along the helix is drawn bold. As seen, the other two IPs of the Radon plane (besides y(s_2)) do not lie on the backprojection segment. Eq. (58) is invariant under ω → −ω. We will use this by ensuring that sgn(ω · ẏ) = 1 in the examples given throughout this chapter.
D. Circle Plus Line
This section is devoted to a trajectory consisting of a circle and a line, where the line is parallel to the z-axis. The circle and line trajectory (CLT) is best suited to demonstrate the methods used in this chapter: it is relatively easy to understand, yet it still requires the typical case distinctions. The resulting reconstruction algorithm was first published by Katsevich (2004a).
1. Geometrical Properties
The trajectory under consideration is parameterized by Eqs. (4) and (6). The circle and the line intersect at one point, denoted as P_c. For every object point x a line can be found that contains x and intersects with both the circle and the line. Figure 25 illustrates this concept. In analogy to helical CT, such a line is called a Pi-line. Furthermore, the IPs of the Pi-line with the circle and the line are denoted as P_πc and P_π, respectively. Now, we choose the backprojection segment C_BP(x) for the CLT to extend from P_πc to P_c along the circle and from P_c to P_π along the line. This gives two possibilities along the circle (Figure 25). Using either of these backprojection segments, the combination of the Pi-line and C_BP(x) gives a closed contour. Therefore, every Radon plane containing x intersects with C_BP(x), and the trajectory is complete. Moreover, every Radon plane has either one or three IPs with C_BP(x). We neglect planes that are tangential on the circle or contain the line, since these have Lebesgue measure zero. As mentioned in Section III.B.2, Radon planes with one IP are denoted as 1-planes, while Radon planes with three IPs are called 3-planes. Radon planes with three IPs intersect twice with the circle and once with the line, whereas 1-planes have either one IP with the line or one IP with the backprojection segment along the circle. Figure 26 illustrates the three different cases. In the sequel we define filter lines separately for the circle and for the line part of C_BP(x). Based on Eq. (58), we show that the filter lines are defined such that every 1-plane receives a weight of 1 from its single IP. Every 3-plane gets a weight of 1 from each of the two IPs along the circle and a weight of −1 from the IP on the line, such that the sum over all three weights results in +1, as desired.
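For illustration, the Pi-line of the CLT can be computed in closed form once a concrete parameterization is fixed. Since Eqs. (4) and (6) are not reproduced here, the sketch below assumes (hypothetically) a circle of radius R in the plane z = 0, centered on the z-axis, and a line parallel to the z-axis through P_c = (R, 0, 0). In the xy-plane the projected Pi-line must pass through P_c and the projection of x, which fixes P_πc; the height of P_π then follows from collinearity.

```python
import math

def clt_pi_line(x, R):
    """Pi-line of object point x for a circle-plus-line trajectory (hypothetical parameterization).

    Assumed geometry: circle of radius R in the plane z = 0, centered on the z-axis;
    line parallel to the z-axis through P_c = (R, 0, 0). Requires x1^2 + x2^2 < R^2.
    Returns (theta_c, z_line): circle angle of P_pic and z-coordinate of P_pi.
    """
    dx, dy = x[0] - R, x[1]              # direction from projected P_c to projected x
    d2 = dx * dx + dy * dy
    tau_c = -2.0 * R * dx / d2           # second intersection of this ray with the circle
    cx, cy = R + tau_c * dx, tau_c * dy  # projected P_pic (tau_c > 1 for interior x)
    mu = (tau_c - 1.0) / tau_c           # position of x between P_pic (mu = 0) and P_pi (mu = 1)
    z_line = x[2] / mu                   # P_pic lies at z = 0, so z grows linearly with mu
    return math.atan2(cy, cx), z_line

theta_c, z_line = clt_pi_line((100.0, 50.0, 30.0), 570.0)
print(theta_c, z_line)
```

Since 0 < μ < 1, the point P_π on the line is always farther from the circle plane than x itself.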
FIGURE 25. The circle and the line intersect at point P_c. For each object point x a unique Pi-line can be found, which contains x and intersects with the circle and the line. The points of intersection are denoted as P_πc and P_π, respectively. The two possibilities for the backprojection segment, which is drawn in bold, are shown in the left and right images.
FIGURE 26. Left: The two Radon planes each have one IP with the backprojection segment. Right: The Radon plane has three IPs. For the other possibility of C_BP(x), this plane corresponds to a 1-plane.
2. Filter Lines
The methods introduced in Section III.B can be used to parameterize each filter line using planar-detector coordinates. In particular, we specify pairs of constants v_0 and σ, which define a filter line in the sense of Eq. (59). For the circle part of C_BP(x), the filter lines used are all parallel to the u_P-axis, such that σ = 0. In other words, within a complete set of filter lines for the circle part, parameter v_0 distinguishes the individual lines of this set. We need to define only one set of filter lines for the circle, such that N_e = 1 [compare with Eqs. (47), (58), and (86)]. Moreover, the weights belonging to this filter-line set must be set to μ_ν = 1. Figure 27 depicts the filter lines. We also need one set of filter lines for the line part of the CLT, i.e., N_e = 1. These filter lines are tangents on the projected circle. The projection of the circle can be computed with Eq. (22), where |z_0 − z_1| is the distance between the point on the line and the circle plane. Since Eq. (22) depends on z, the sets of filter lines vary for different positions on the line.
FIGURE 27. The filter lines for the circle part of the CLT are all parallel to the u_P-axis.
FIGURE 28. Projection of the circle onto the planar detector from a point on the line for the two cases shown in Figure 25. The backprojection segment is drawn bold. The dashed line is the projection of the Pi-line. This projection is parallel to the v_P-axis.
FIGURE 29. Sets of filter lines for a particular point on the line for the two cases shown in Figure 25.
Figure 28 considers the two different cases for the choice of C_BP(x). The backprojection segment is drawn bold, and the projection of the Pi-line is dashed. The definition of the filter lines requires that each filter line be tangential on the projected backprojection segment, i.e., on the segments drawn bold in Figure 28. This results in the filter-line sets depicted in Figure 29 for the two cases. These filter lines are tangential on the projected circle, and the point of tangency lies either to the right (first case) or to the left (second case) of the point onto which x is projected. The weights must be set such that μ_ν = 1 for the case shown in the left image of Figure 29, while μ_ν = −1 for the case shown in the right image of Figure 29.
a. Circular Full Scan. For the two possibilities of C_BP(x) shown in Figure 25, the two segments on the circle together cover the complete circle. This provides an easy "recipe" for defining a full-scan algorithm, which uses data from the complete circle. We could simply apply the algorithm defined above for both cases and average the results. More efficiently, we obtain the same output by performing the backprojection over the complete circle using the filter lines defined above. For the line, we filter the data using both sets of filter lines and add the data before backprojection. In other words, we need to
set N_e = 2 in Eqs. (82) and (86). Weights μ_ν must be multiplied by 1/2 for the line part.
3. Proof of Exactness
We now prove that the reconstruction algorithm is mathematically exact. As discussed in Section III.B, we need to show that Eq. (58) results in a constant value of 1 using the defined filter lines and weights. The result of Eq. (58) must be equal to 1 for every object point in the FOV and for every Radon plane containing the object point. We begin by analyzing the filter lines of the circle part. Remember that every Radon plane has zero, one, or two IPs with the circle part of C_BP(x). If it has two IPs, the Radon plane is a 3-plane, i.e., it has one additional IP with the line. According to Section III.C, it is sufficient to consider the projections of vectors ω and e onto the planar detector. The projected vectors are denoted as ω̂ and ê, respectively. Moreover, the tangent on the trajectory ẏ(s) is parallel to the planar detector. We defined the filter lines for the circle part to be parallel to the u_P-axis. Therefore, ê is also parallel to this axis for each filter line. Realizing that ẏ_◦(s) is also parallel to the u_P-axis, we conclude that the two sgn-functions in Eq. (58) result in sgn(ω̂ · ẏ) sgn(ω̂ · ê_ν) = sgn²(ω̂ · ê_ν) = 1. Now, μ_ν = 1, N_e = 1, and the sum over k counts the number of IPs. Therefore, evaluating Eq. (58) only for the circle part results in a value of 0, 1, or 2. In other words, for a 1-plane with its IP along the circle, Eq. (58) gives the desired result of 1. For a 3-plane, the two IPs along the circle contribute a combined weight of 2. In the following, we therefore must show that the line contributes a weight of −1 for 3-planes, while 1-planes with their IP on the line get a weight of 1. For the discussion of the filter lines used along the line, consider Figure 30. It corresponds to the left image in Figure 28. We assume that the object point under consideration is projected onto point x̂. Remember that x is located
on the Pi-line, which is drawn as a dashed line. The corresponding filter line, which is parallel to vector ê, is tangential on the projected circle, with the point of tangency located right of x̂. The tangent on the trajectory, ẏ_|, is parallel to the v_P-axis, i.e., it is parallel to the z-axis. The two dotted lines correspond to exemplary Radon planes with normal vectors ω_1 and ω_2. The Radon plane associated with ω_1 is a 1-plane; its sole IP is the current source position. The Radon plane associated with ω_2 is a 3-plane, for which only one IP on the circle is shown in Figure 30. The second IP is located farther to the right. As discussed in Section III.C, Eq. (58) is invariant under ω → −ω. For convenience, we ensure that sgn(ω · ẏ) = 1 in our examples. The angle between ê and ω̂_1 is less than π/2. Therefore, sgn(ω̂_1 · ê) = 1. Evaluating
FIGURE 30. The planar detector seen from one point on the line. The two dotted lines correspond to Radon planes with normal vectors ω_1 and ω_2. The object point is projected onto point x̂. The filter line, which is parallel to ê, is tangential on the projected circle with the point of tangency located to the right of x̂.
Eq. (58) with N_e = N_I = 1 and μ_ν = 1, we realize that the Radon plane associated with ω_1 receives a weight of 1, as desired for a 1-plane. The angle between ê and ω̂_2 is larger than π/2, such that sgn(ω̂_2 · ê) = −1. Now, with N_I = 3 we can evaluate Eq. (58) and obtain 1 + 1 − 1 = 1, where the first two summands stem from the two IPs on the circle. In other words, the 3-plane associated with ω_2 receives the desired weight of 1 from the complete CLT. Figure 31 shows the situation of Figure 30 for the second possibility of C_BP(x). Both Radon planes now correspond to 1-planes. The filter line is tangential on the circle with the point of tangency located left of x̂. Obviously,
FIGURE 31. The same scenario as in Figure 30 but now for the second case shown in Figure 25. The filter line, which is parallel to ê, is tangential on the projected circle with the point of tangency located left of x̂.
sgn(ω̂_1 · ê) = sgn(ω̂_2 · ê) = −1, and with μ_ν = −1, Eq. (58) results in a value of 1 for the two Radon planes shown. All Radon planes that can be covered from the current focal-spot position y_|(z) must contain y_|(z) and x. Therefore, all these Radon planes can be considered by rotating one of the dotted lines in Figure 30 around x̂. Starting with the line indicated by ω̂_1 and rotating counterclockwise, the associated Radon plane becomes a 3-plane as soon as the line intersects with the circle. Since the filter line is tangential on the circle, ω̂ · ê changes sign exactly at the transition from a 1-plane to a 3-plane. The next sign change occurs when the rotated line becomes parallel to the v_P-axis; remember that we want sgn(ω̂ · ẏ) = 1. Therefore, we must change ω̂ from pointing to the left to pointing to the right once the rotated line becomes parallel to v_P. Again, the sign change occurs exactly at the transition from a 3-plane to a 1-plane, since the projected Pi-line is parallel to the v_P-axis. Continuing the rotation until the line becomes parallel to the line with which we started, no further sign change occurs. In other words, for the current source position and object point x, all Radon planes can be treated in the same way as the exemplary Radon planes with normal vectors ω_1 and ω_2 above. The discussion so far has not been specific in the choice of the focal-spot position or the object point. Choosing different x and/or y_|(z), all arguments can be repeated. In other words, 1- and 3-planes get the desired weights from all source positions, which completes the proof.
4. Reconstruction Results
For the evaluation of the CLT reconstruction algorithm, we simulated a CT scanner with 256 detector rows. Table 1 summarizes the simulation parameters of the circle part.
TABLE 1
SIMULATION PARAMETERS
Projections per turn               1160
Detector columns                   512
Detector rows                      256
Fan angle                          26.5 degrees
Detector height                    291.9 mm
Focal-spot size                    0.91 × 1.37 mm²
Distance source–rotation axis      570 mm
Distance source–detector center    1040 mm
Detector oversampling              3 × 3
Focal-spot oversampling            3 × 3
Angular oversampling               3
Reconstructed voxel size           (0.5)³ mm³

FIGURE 32. Reconstruction results for the CLT. The three images show (a) the result of the complete CLT, (b) the contribution of the circle, and (c) the contribution of the line. L/W: 40 HU/200 HU for the first two images; L/W: −950 HU/200 HU for the line image. L, level; W, window.

The circle plane was located at a distance z = 40 mm from the origin of the phantom. For the line part, we simulated 1025 projections over a distance of 320 mm along z. We used the Forbild head phantom (www.imp.uni-erlangen.de/forbild/english/results/head/head.html), which contains some high-frequency components as well as some low-contrast parts. The head phantom is therefore a challenge for every reconstruction algorithm. Throughout this chapter we use Hounsfield² units (HU) for the presentation of reconstruction results. Hounsfield units correspond to a shift and scaling of the absorption coefficients, such that
$$\mathrm{HU}(\mu) = 1000\,\frac{\mu - \mu_{\mathrm{H_2O}}}{\mu_{\mathrm{H_2O}}}. \qquad (93)$$
Here, μ_H2O corresponds to the absorption coefficient of water. We have set μ_H2O = 0.0183/mm. Notice that the shift used in the transition from μ-values to HU must be taken into account when adding images. Figure 32 shows some reconstruction results. The level and window specification (L/W) 40 HU/200 HU means that the available gray-level range is assigned to object points between −60 HU and 140 HU. Object points below −60 HU are drawn black, while object points above 140 HU are drawn white. Figure 32 shows the result for the complete CLT, as well as the contributions of the circle and the line. The (transaxial) slice shown is perpendicular to the z-axis and lies at a distance of 45 mm from the circle plane. Obviously, the addition of the line significantly reduces the artefacts visible in the image of the pure circle.
² Sir Godfrey Newbold Hounsfield, English electrical engineer, 1919–2004.
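Eq. (93) and the level/window convention can be written down directly. The sketch below also shows why the HU shift matters when the circle and line contributions are added: HU(μ_a + μ_b) = HU(μ_a) + HU(μ_b) + 1000. The 8-bit gray range and the function names are assumptions for illustration; μ_H2O = 0.0183/mm is the value quoted above.

```python
MU_WATER = 0.0183  # absorption coefficient of water in 1/mm, as set in the text

def to_hu(mu, mu_water=MU_WATER):
    """Eq. (93): convert an absorption coefficient (1/mm) to Hounsfield units."""
    return 1000.0 * (mu - mu_water) / mu_water

def add_hu_images(hu_a, hu_b):
    """HU value of mu_a + mu_b computed from the HU values of two partial images.

    Eq. (93) contains a shift, so HU(mu_a + mu_b) = HU(mu_a) + HU(mu_b) + 1000.
    """
    return hu_a + hu_b + 1000.0

def to_gray(hu, level, window, gray_max=255):
    """Linear level/window mapping to gray values (8-bit output is an assumption)."""
    lower = level - window / 2.0     # e.g. L/W = 40/200 gives lower = -60 HU
    frac = (hu - lower) / window     # upper limit is lower + window, e.g. 140 HU
    return round(min(max(frac, 0.0), 1.0) * gray_max)

print(to_hu(0.0183), to_hu(0.0))
print(to_gray(-100.0, 40.0, 200.0), to_gray(200.0, 40.0, 200.0))
```

Values below the lower limit are clipped to black and values above the upper limit to white, matching the description of Figure 32.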
E. The Katsevich Algorithm for Helical Pi Acquisition
The reconstruction algorithms described next deal with helical CT, beginning with the Pi mode. An exact shift-invariant FBP algorithm for a helical Pi acquisition was first found by Katsevich (2002). The following sections describe the improved version of the algorithm, published in Katsevich (2004b).
1. Geometrical Properties
The helical trajectory is parameterized by Eq. (5). As discussed in Section II.D, a Pi-line can be found for every object point x in the FOV. A Pi-line contains x and two points y_H(s_π1), y_H(s_π2) on the helix with s_π2 − s_π1 < 2π. All points y_H(s) on the helix with s_π1 ≤ s ≤ s_π2 belong to the Pi segment of x. The lower and upper Pi-window boundaries are given by Eq. (32) with n = 1. From every point y_H(s) within the Pi segment, x is projected onto the planar detector between these boundaries. For the reconstruction algorithm discussed here, the Pi segment corresponds to the backprojection segment C_BP(x). Since the combination of the Pi-line and the Pi segment gives a closed contour, every Radon plane containing x intersects with the Pi segment, and the trajectory is complete. In particular, every Radon plane has either one or three IPs with C_BP(x) (again we neglect planes tangential on the helix, since these have measure zero). Figure 33 depicts one particular object point x and its Pi-line. The Radon plane shown intersects the Pi segment three times, and the IPs are y_H(s_1), y_H(s_2), and y_H(s_3). Figure 34 shows the planar detector seen from the three points y_H(s_1) through y_H(s_3). The object point is projected onto different positions indicated by x̂. The dotted lines in Figure 34 correspond to the intersections of the Radon plane with the detector.
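The Pi window and the Pi segment can be computed numerically. The sketch assumes the helix y_H(s) = (R cos s, R sin s, h̄s), which is the form used later in Eq. (94), and a planar detector that contains the rotation axis. Projecting the helix onto this detector gives the upper boundary v_P = h̄(1 + u_P²/R²)(π/2 − arctan(u_P/R)) and the lower boundary v_P = −h̄(1 + u_P²/R²)(π/2 + arctan(u_P/R)); we assume these coincide with Eq. (32) for n = 1, which is consistent with the asymptote gradient h̄/R quoted below. The Pi segment is then found by scanning s and bisecting where x leaves the window. Function names, example values, and the scan strategy are our own choices.

```python
import math

def project(x, s, R, hbar):
    """Planar-detector coordinates (u_P, v_P) of x seen from y_H(s) = (R cos s, R sin s, hbar s)."""
    c, sn = math.cos(s), math.sin(s)
    x_r = x[0] * c + x[1] * sn       # component of x towards the source
    x_t = -x[0] * sn + x[1] * c      # component along the u_P-axis
    scale = R / (R - x_r)            # the detector plane contains the rotation axis
    return scale * x_t, scale * (x[2] - hbar * s)

def pi_window(u, R, hbar):
    """Lower and upper Pi-window boundaries at u_P = u (assumed form of Eq. (32), n = 1)."""
    g = hbar * (1.0 + (u / R) ** 2)
    a = math.atan(u / R)
    return -g * (math.pi / 2 + a), g * (math.pi / 2 - a)

def margin(x, s, R, hbar):
    """Nonnegative if and only if x is projected into the Pi window from y_H(s)."""
    u, v = project(x, s, R, hbar)
    lower, upper = pi_window(u, R, hbar)
    return min(v - lower, upper - v)

def pi_segment(x, R, hbar, steps=4000, iters=60):
    """Return (s_pi1, s_pi2) for an object point x inside the cylinder of radius R."""
    s_mid = x[2] / hbar              # source at the height of x: v_P = 0, inside the window
    grid = [s_mid - 2 * math.pi + 4 * math.pi * k / steps for k in range(steps + 1)]
    inside = [margin(x, s, R, hbar) >= 0 for s in grid]
    k1 = inside.index(True)
    k2 = len(inside) - 1 - inside[::-1].index(True)

    def refine(a, b):
        # bisection between neighbouring grid points a and b of opposite in/out status
        a_in = margin(x, a, R, hbar) >= 0
        for _ in range(iters):
            m = 0.5 * (a + b)
            if (margin(x, m, R, hbar) >= 0) == a_in:
                a = m
            else:
                b = m
        return 0.5 * (a + b)

    return refine(grid[k1 - 1], grid[k1]), refine(grid[k2], grid[k2 + 1])

R, hbar = 570.0, 40.0 / (2 * math.pi)  # example values; hbar is the table feed per radian
s1, s2 = pi_segment((120.0, -80.0, 15.0), R, hbar)
print(s1, s2, s2 - s1)
```

At both ends of the returned interval, x lies on the line through y_H(s_π1) and y_H(s_π2), i.e., on its Pi-line, because from y_H(s_π1) the point x is projected onto the image of y_H(s_π2) on the upper boundary.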
FIGURE 33. The three IPs of the Radon plane are located within the Pi segment of object point x.
FIGURE 34. The planar detector seen from the three IPs y_H(s_1) through y_H(s_3) in Figure 33. The object point is projected onto points x̂. The dotted lines indicated by R correspond to the intersections of the Radon plane. Projections of the Pi-line result in the solid lines indicated by π. The other solid line L_1 corresponds to the asymptote on the helix.
The asymptote on the projected helix plays a crucial role. The asymptote is indicated by the solid line L_1 in Figure 34. It is parallel to ẏ_H(s). Line L_1 contains the origin of the planar detector, and its gradient is equal to h̄/R. The latter follows from Eq. (32) using arctan x → π/2 − 1/x for x → ∞. The solid lines indicated by π in Figure 34 are the projections of the Pi-line. Remember that the points on the Pi-window boundaries are associated with points on the helix. When these boundaries are traversed from right to left, the corresponding s-value increases. In particular, two points on the lower and upper Pi-window boundaries at the same u_P-coordinate are separated by an angular distance of 2π [compare with Eq. (28)]. Since the points of sunrise and sunset are now separated by less than 2π, the Pi-line projected onto the planar detector must have a positive slope. Point y_H(s_1) is the first IP of the exemplary Radon plane in Figure 33 that is reached when traversing the helix. Seen from that point (the first image in Figure 34), the projection of the Radon plane (dotted line) has two IPs with the upper Pi-window boundary. Moreover, these two IPs belong to the Pi segment, since the intersection of the Pi-line with the Pi-window boundary is farther to the left. In summary, from the first image in Figure 34 it can be deduced that the indicated Radon plane is a 3-plane with two further IPs in the future. Similarly, in the second image of Figure 34 the dotted line has a gradient larger than the gradient of L_1. Therefore, the dotted line has one IP with the lower Pi-window boundary and one IP with the upper one. This indicates that the current source position corresponds to the central IP of the 3-plane.
2. Filter Lines
As discussed in Section III.B, the filtering can be defined using κ-planes or κ-lines, where the latter are the intersections of the κ-planes with the
FIGURE 35. The κ-planes used for Pi reconstruction contain the object point x, the current source position y(s), and two further points on the helix: y(s_κ1) and y(s_κ2). The angular distance s_κ2 − s is twice as large as the distance s_κ1 − s.
planar detector. For the reconstruction algorithm described here, the κ-planes correspond to planes that contain the object point, the current source position, and two additional points on the helix, denoted as y_H(s_κ1) and y_H(s_κ2) in the following. Both points y_H(s_κ1) and y_H(s_κ2) must belong to the Pi segment of x, and they must fulfill s_κ2 − s_κ1 = s_κ1 − s, where y_H(s) corresponds to the current source position (Figure 35). If x is projected above line L_1 (compare with Figure 34), s < s_κ1 < s_κ2, while s_κ2 < s_κ1 < s if x is projected below L_1. If x is projected onto the asymptote L_1, the κ-line is identical to L_1, i.e., the κ-plane is parallel to the tangent on the helix. For the method specified here, the weights must be set to μ_ν = 1. In the following we show that this definition of the κ-planes results in filter lines with gradients given by Eq. (71). For symmetry reasons we can choose the source to be located at y_H(0). We consider the κ-plane that contains the focal spot and the points y_H(s) and y_H(2s). A parameterization of this plane is given by
$$\mathbf{p}(\alpha,\beta) = \mathbf{y}_H(0) + \alpha\big(\mathbf{y}_H(s) - \mathbf{y}_H(0)\big) + \beta\big(\mathbf{y}_H(2s) - \mathbf{y}_H(0)\big) = \begin{pmatrix} R\\ 0\\ 0\end{pmatrix} + \alpha\begin{pmatrix} R(\cos s - 1)\\ R\sin s\\ \bar h s\end{pmatrix} + \beta\begin{pmatrix} R(\cos 2s - 1)\\ R\sin 2s\\ 2\bar h s\end{pmatrix}. \qquad (94)$$
Due to our choice of parameters, the plane intersects with the planar detector at x = 0, such that
$$0 = 1 + \alpha(\cos s - 1) + \beta(\cos 2s - 1) \qquad (95)$$
gives the condition for the points of intersection. Solving the latter equation for β results in
$$\beta = \frac{1 + \alpha(\cos s - 1)}{1 - \cos 2s} = \frac{1 + \alpha(\cos s - 1)}{2\sin^2 s}. \qquad (96)$$
Inserting this equation into Eq. (94), the y-component corresponds to the u_P-coordinate of the filter line:
$$u_P(\alpha) = R\,\frac{\sin 2s}{2\sin^2 s} + \alpha R\left(\sin s + \frac{\sin 2s\,(\cos s - 1)}{2\sin^2 s}\right) = R\cot s + \alpha R\,\frac{1-\cos s}{\sin s}, \qquad (97)$$
or
$$\alpha(u_P) = \frac{(u_P - R\cot s)\sin s}{R(1-\cos s)} = \frac{u_P\sin s - R\cos s}{R(1-\cos s)}. \qquad (98)$$
Similarly, for the parameterization of the v_P-coordinate of the κ-line, we obtain
$$v_P(\alpha) = \frac{\bar h s}{\sin^2 s} + \alpha\,\bar h s\left(1 + \frac{\cos s - 1}{\sin^2 s}\right) = \frac{\bar h s}{\sin^2 s} + \alpha\,\bar h s\,\frac{\cos s\,(1-\cos s)}{\sin^2 s}. \qquad (99)$$
Inserting Eq. (98) gives
$$v_P(u_P) = \frac{\bar h s}{\sin^2 s} - \frac{\bar h s\cos^2 s}{\sin^2 s} + u_P\,\frac{\bar h s\cos s}{R\sin s} = \bar h s + u_P\,\frac{\bar h s}{R}\cot s = v_0 + u_P\,\frac{v_0}{R}\cot\frac{v_0}{\bar h}, \qquad (100)$$
which is the parameterization of the filter line in the sense of Eqs. (59) and (71) with v_0 = h̄s. Considering Figure 19, we observe that the defined filter lines cover the complete Pi window, such that a filter line can be found for every object point and for every source position within C_BP(x). Nevertheless, the filter lines have different gradients, so they intersect each other. For illustration, consider Figure 36. The two κ-lines intersect at point P_1. The definition given above specifies that κ-planes must have three IPs with the Pi segment. For object points projected onto P_1, the κ-plane of line κ_1 has two IPs outside the Pi segment. Remember that the projection of the Pi-line has a positive slope on the planar detector (compare with Figure 34). Therefore, we must use filtered values associated
FIGURE 36. The two filter lines κ_1 and κ_2 intersect at point P_1. Only for line κ_2 does the corresponding κ-plane have its intersection points within the Pi segment of object points projected onto P_1.
with line κ2 for points projected onto P1 . Line κ1 serves as a filter line for object points projected, for example, onto point P2 . 3. Proof of Exactness We now prove that the defined filter lines specify an exact reconstruction algorithm by showing that Eq. (58) results in a constant value of 1. Consider Figure 37. An exemplary object point is projected onto point xˆ of the planar detector. The corresponding Pi-line is indicated by π . The line indicated by T contains xˆ and is tangential on the upper Pi-window boundary. The Pi-line and line T divide the detector plane into areas A and B, such that the dotted line belongs to area B. The Radon planes with lines of intersection in area A are 1planes. These planes either have only one IP with the helix (the current source position) or the remaining IPs do not belong to the Pi segment, i.e., to CBP (x). Radon planes with intersection lines in area B are 3-planes. The proof given in the sequel consists of different steps. First, we show that 1-planes receive a weight of 1 from the single IP. Second, we argue that for 3-planes, the central IP always contributes with a weight of 1. Finally, we prove that for 3-planes the contributions of the first and third IP cancel. We begin with some general observations. Remember that line L1 is the asymptote on the helix. Line L1 is parallel to y˙ H (s). We denote the line of intersection of the Radon plane and the planar detector as R. Now, we argue that projecting x into the Pi window from the first point of a 3-plane, the point of projection must be above L1 . Line R must intersect twice with the CBP (x)part of the upper Pi-window boundary seen from the first IP (compare with Figure 34). If x is projected below L1 , line R must be steeper than line L1 in order to have an intersection with the upper Pi-window boundary. Now,
FIGURE 37. The object point is projected onto x̂. The Pi-line (indicated by π) and the tangent on the upper Pi-window boundary (indicated by T) separate the detector plane into areas A and B. The dotted line corresponds to a Radon plane with three IPs with C_BP(x).
line R can have at most one IP with the C_BP(x)-part of the upper Pi-window boundary, such that the source position cannot be the first IP if x is projected below L_1. Similarly, x must be projected below L_1 from the third IP of a 3-plane. With these observations in mind, we realize that the tangential line T in Figure 37 must have a gradient smaller than the gradient of line L_1. The arrow in Figure 38 originating from x̂ is parallel to ẏ_H(s), such that the line that contains x̂ and is parallel to the arrow is parallel to L_1. This line divides area B into areas B_1 and B_2. Obviously, for Radon planes with intersection lines R in B_1, the current source position corresponds to the first IP, whereas for 3-planes with R in B_2 the current source position is the central IP. We argue that the filter line that must be used for point x̂ is located in area B_1. Remember that the corresponding κ-plane has three IPs with C_BP(x). If x̂ is located above L_1, the κ-plane has two IPs in the future. In other words, the κ-line must have two IPs with the upper Pi-window boundary. This is only possible if the filter line is located in area B_1. Consider the exemplary line R in Figure 38 located in area B_2. Once again, we draw ω̂, i.e., the projection of the normal vector of the Radon plane, such that sgn(ω̂ · ẏ_H) = 1. Since the filter line is located in B_1, we realize that also sgn(ω̂ · ê_ν) = 1. This remains true if we rotate the dotted line R counterclockwise into area A (remember that the gradient of line T is smaller than the gradient of L_1). In other words, the two sgn-functions in Eq. (58) contribute a factor of 1, both for 1-planes (area A) and for the central IP of 3-planes (area B_2). Therefore, all that remains to finish the proof is to show that the contributions of the first and third IPs of a 3-plane cancel when summing over k in Eq. (58).
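Because the argument above relies on comparing the gradients of κ-lines with that of L_1, a numerical check of Eq. (100) may be helpful. The sketch intersects the plane of Eq. (94) with the detector plane x = 0 via Eq. (96) and compares the resulting points with Eq. (100). As s → 0, the gradient (v_0/R) cot(v_0/h̄) tends to h̄/R, i.e., the κ-line approaches the asymptote L_1. Example values and function names are illustrative assumptions.

```python
import math

def kappa_line_points(s, R, hbar, alphas=(-1.0, 0.5, 2.0)):
    """Points (u_P, v_P) where the kappa-plane through y_H(0), y_H(s), y_H(2s) meets x = 0."""
    points = []
    for a in alphas:
        beta = (1.0 + a * (math.cos(s) - 1.0)) / (2.0 * math.sin(s) ** 2)  # Eq. (96)
        u = a * R * math.sin(s) + beta * R * math.sin(2.0 * s)             # y-component of Eq. (94)
        v = a * hbar * s + beta * 2.0 * hbar * s                            # z-component of Eq. (94)
        points.append((u, v))
    return points

def kappa_line_eq100(u, s, R, hbar):
    """Eq. (100): v_P(u_P) = v0 + u_P (v0 / R) cot(v0 / hbar) with v0 = hbar s."""
    v0 = hbar * s
    return v0 + u * (v0 / R) / math.tan(v0 / hbar)

R, hbar = 570.0, 40.0 / (2 * math.pi)
for s in (0.3, 1.2, -0.8):
    for u, v in kappa_line_points(s, R, hbar):
        print(f"s={s:+.1f}  u_P={u:9.3f}  v_P={v:8.4f}  Eq.(100)={kappa_line_eq100(u, s, R, hbar):8.4f}")
```

Negative s corresponds to object points projected below L_1, where s_κ2 < s_κ1 < s; Eq. (100) covers both cases.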
FIGURE 38. The arrow originating from x̂ is parallel to ẏ_H(s). The line containing x̂ and the arrow is parallel to the asymptote on the helix. This line divides area B defined in Figure 37 into areas B_1 and B_2. The dotted line R is located in area B_2. For the 3-plane associated with R, the current focal-spot position is the central IP.
We consider a 3-plane and denote the three IPs as y_H(s_1), y_H(s_2), and y_H(s_3), with s_1 < s_2 < s_3. Seen from y_H(s_1), the line R associated with the 3-plane must be located in area B_1. Moreover, the filter line to be used from y_H(s_1) is also located in area B_1. We denote the remaining two IPs of the corresponding κ-plane as y_H(s_κ1) and y_H(s_κ2). To be precise, we write s_κ1(s_1) and s_κ2(s_1) for the specification of the filter line to be used from y_H(s_1). There are two possibilities for the ordering of the IPs, illustrated in Figure 39. The solid lines denoted by κ indicate the filter lines, while the dotted lines correspond to the Radon plane. The two possibilities are treated separately in the following two cases.
Case 1. s_1 < s_κ1(s_1) < s_2 < s_3 < s_κ2(s_1). This case corresponds to the first image in Figure 39. Obviously, the angles between ω̂ and ẏ_H and between ω̂ and ê_ν are less than 90 degrees. Therefore, the two sgn-functions in Eq. (58) contribute a factor of 1 for the first IP of the Radon plane in this case. With the definition of the κ-planes given above, the central IP of the κ-plane must be located exactly in the (angular) middle of the other two IPs. Therefore,
$$s_{\kappa1}(s_1) = \frac{s_{\kappa2}(s_1) + s_1}{2} < s_2 < s_3 < s_{\kappa2}(s_1). \qquad (101)$$
We now argue that, seen from y_H(s_3), the situation shown in Figure 40 follows, such that s_1 < s_κ2(s_3) < s_κ1(s_3) < s_2 < s_3. Otherwise, s_κ2(s_3) < s_1 < s_2 < s_κ1(s_3) = (s_κ2(s_3) + s_3)/2 must be fulfilled. Together with Eq. (101), this leads to (s_κ2(s_1) + s_1)/2 < s_2 < (s_κ2(s_3) + s_3)/2 and s_κ2(s_3) < s_1 < s_3 < s_κ2(s_1). Adding the inequalities s_κ2(s_3) < s_1 and s_3 < s_κ2(s_1) gives s_κ2(s_3) + s_3 < s_κ2(s_1) + s_1, which clearly contradicts the first relation.
RECONSTRUCTION ALGORITHMS
FIGURE 39. Upper half of the planar detector. Considering 3-planes with two IPs in the future, two orderings of the IPs are possible: either $s_1 < s_{\kappa 1} < s_2 < s_3 < s_{\kappa 2}$ or $s_1 < s_2 < s_{\kappa 1} < s_{\kappa 2} < s_3$. For illustration purposes, the filter lines κ shown deviate from the filter lines of the specified algorithm. Nevertheless, the given arguments remain valid.
FIGURE 40. The lower half of the planar detector, analogous to the first image in Figure 39 but seen from the third IP of the 3-plane. Here $s_1 < s_{\kappa 2} < s_{\kappa 1} < s_2 < s_3$.
The angle between $\hat{\omega}$ and $\hat{e}_\nu$ in Figure 40 is larger than 90 degrees, so the third IP contributes with a weight of −1 in Eq. (58). In other words, the contributions of the first and the third IPs cancel, as desired.

Case 2. $s_1 < s_2 < s_{\kappa 1}(s_1) < s_{\kappa 2}(s_1) < s_3$. This case corresponds to the second image in Figure 39. Now the first IP contributes with a weight of −1 in Eq. (58). The proof that the third IP now contributes with a positive weight is identical to that of Case 1.

F. EnPiT: Helical n-Pi Reconstruction

The Pi-method discussed in the previous section suffices to demonstrate exact reconstruction from helical data. Nevertheless, data obtained with a Pi acquisition contain only a few redundancies. Therefore, Pi-methods are sensitive to patient motion and might yield noisy images if the tube current is not sufficiently high. Moreover, for a given gantry rotation speed and detector height, the Pi-mode determines the table feed, which can become very large for modern cone-beam scanners.

The n-Pi mode introduced in Section II.D reduces some of these problems. For an n-Pi acquisition, the table feed is much smaller than for the Pi-mode, and more redundant data are acquired. Redundancies are reflected in the types of Radon planes that occur. In the n-Pi case, Radon planes can have $1, 3, \ldots, n + 2$ IPs with $C_{BP}(x)$. The backprojection segment $C_{BP}(x)$ contains all points along the helix from which x is projected into the n-Pi window. For typical system parameters, the vast majority of Radon planes have n IPs.

Here we summarize a reconstruction algorithm denoted EnPiT, which was published by Bontus et al. (2003, 2005). EnPiT is a quasi-exact reconstruction algorithm in the sense that some n- and (n + 2)-planes receive incorrect weighting. Nevertheless, reconstruction results show that the approximation is rather good. An exact algorithm for the 3-Pi case was published by Katsevich (2006).

1. Reconstruction Algorithm

The filter-line sets defined below are denoted as $L_p^{(L)}$ (p odd, $1 \le p < n$) and $L_p^{(R)}$ (p odd, $1 \le p \le n$). As shown in Bontus et al. (2005), these sets provide the following features, at least in cases in which no interrupted illumination occurs:
1. A 1-plane gets a positive contribution from each filter-line set.
2. An n-plane gets a positive contribution from set $L_n^{(R)}$ for each of its n IPs (intersections with $C_{BP}(x)$). The contributions of $L_p^{(L)}$ and $L_p^{(R)}$ with p < n cancel.
3. An m-plane, $3 \le m < n$, gets a positive contribution from $L_n^{(R)}$ at each of its m IPs, while the contributions of $L_p^{(L)}$ and $L_p^{(R)}$ with p < m cancel. For $m \le p < n$, $L_p^{(L)}$ and $L_p^{(R)}$ contribute with a positive weight only to the first and last IPs, while the contributions of these sets cancel for the inner IPs.

Filter lines of each set $L_p^{(L)}$ or $L_p^{(R)}$ are located completely in the p-Pi window. In order to consider all filter lines for an n-Pi reconstruction, we must use all sets $L_1^{(L)}, L_3^{(L)}, \ldots, L_{n-2}^{(L)}$ and $L_1^{(R)}, L_3^{(R)}, \ldots, L_n^{(R)}$. From every position within the backprojection interval, the object point under consideration is projected onto the detector. For the reconstruction, the values of all filter lines that contain the projection point must be added using the correct weighting factors $\mu_\nu$. This means that up to n values must be added, depending on the position of the projection point. We denote the weights for sets $L_m^{(R)}$ and $L_m^{(L)}$ as $\mu_m^{(R)}$ and $\mu_m^{(L)}$, respectively. For n = 3 all weights are equal to 1. For n > 3 the corresponding values are given by

$$
n\mu_m^{(R)} = \begin{cases} \dfrac{n+1}{4}, & m = 1,\\[4pt] \dfrac{1}{2}, & 3 \le m \le n-2,\\[4pt] 1, & m = n, \end{cases}
\qquad
-n\mu_m^{(L)} = \begin{cases} \dfrac{n+1}{4}, & m = 1,\\[4pt] \dfrac{1}{2}, & 3 \le m \le n-2. \end{cases}
\tag{102}
$$
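The weights of Eq. (102) can be checked against the three features listed above with a few lines of exact arithmetic. The Python sketch below is bookkeeping only, not part of a reconstruction: the function names are ours, and it assumes, as the features state, that every contribution that does not cancel enters with a positive sign, so it simply adds magnitudes |μ|. It covers m-planes with m ≤ n under the no-interrupted-illumination condition and does not model the (n + 2)-planes that EnPiT weights only approximately.

```python
from fractions import Fraction


def enpit_weights(n):
    """Weights mu_m^(R) and mu_m^(L) of Eq. (102) for odd n >= 3, as exact fractions."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    mu_r, mu_l = {}, {}
    for m in range(1, n + 1, 2):
        if m == 1:
            n_mu = Fraction(n + 1, 4)
        elif m <= n - 2:
            n_mu = Fraction(1, 2)
        else:  # m == n: only the right set L_n^(R) exists
            n_mu = Fraction(1)
        mu_r[m] = n_mu / n
        if m < n:  # left sets L_m^(L) exist only for m < n
            mu_l[m] = -n_mu / n
    return mu_r, mu_l


def total_weight(n, m):
    """Total |mu| collected by an m-plane (odd m <= n), following features 1 to 3."""
    mu_r, mu_l = enpit_weights(n)
    if m == 1:
        # Feature 1: every filter-line set contributes once, with positive sign.
        return sum(abs(w) for w in mu_r.values()) + sum(abs(w) for w in mu_l.values())
    # Features 2 and 3: L_n^(R) contributes at each of the m IPs ...
    total = m * mu_r[n]
    # ... sets with m <= p < n contribute at the first and last IP only,
    # and sets with p < m cancel, so they add nothing.
    for p in range(m, n, 2):
        total += 2 * (abs(mu_r[p]) + abs(mu_l[p]))
    return total


for n in (3, 5, 9, 11):
    print(n, [total_weight(n, m) for m in range(1, n + 1, 2)])
```

For every n tried, each m-plane total comes out as exactly 1, which is the sense in which these weights give the majority of Radon planes the correct weighting.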
These weights have been chosen such that (keeping the features enumerated above in mind) the majority of Radon planes receive the correct weighting. For instance, the second point in the enumeration above tells us that n-planes receive positive contributions only from set $L_n^{(R)}$. From each of the n IPs, the n-plane gets a weight of 1/n.

2. Filter Lines

To define the different filter lines that are necessary to perform the n-Pi reconstruction, we first introduce certain lines that separate the planar detector into different regions. As above, $L_1$ is the asymptote of the Pi window. In the same manner, we define the lines $L_m$ and $L_{-m}$ as the lines parallel to $L_1$ but tangential to the upper and lower m-Pi window boundaries, respectively. Finally, we define the set of lines $L^m_{-p}$ as the lines that have a negative gradient and are tangential both to the upper m-Pi window boundary and to the lower p-Pi window boundary. Figure 41 shows some of these lines within the 5-Pi window.

We define different sets of filter lines and denote these sets as $L_m^{(L)}$ and $L_m^{(R)}$ for different values of m. In particular, the filter line within $L_m^{(L)}$ corresponding to a point $(u_P, v_P)$ within the m-Pi window fulfills the following conditions:

IF $(u_P, v_P)$ is above $L^m_{-1}$, the filter line is tangential to the upper m-Pi window boundary
ELSE IF $(u_P, v_P)$ is above $L^1_{-1}$, the filter line is tangential to the lower Pi-window boundary
ELSE IF $(u_P, v_P)$ is above $L^1_{-m}$, the filter line is tangential to the upper Pi-window boundary
ELSE the filter line is tangential to the lower m-Pi window boundary.
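The IF/ELSE cascade for $L_m^{(L)}$ maps directly onto a chain of conditionals. In the sketch below, the tuple labelling of the helper lines and the caller-supplied predicate `is_above` are hypothetical: deciding whether a point lies above $L^m_{-1}$ requires the window-boundary geometry, which is not reproduced here. The side of the tangency point follows the convention this section states for $L_m^{(L)}$ (LHS for upper boundaries, RHS for lower ones).

```python
from typing import Callable, Tuple

# Hypothetical labelling: helper line L^a_{-b} is the tuple (a, -b).
Line = Tuple[int, int]


def left_set_filter_line(m: int, is_above: Callable[[Line], bool]) -> Tuple[str, int, str]:
    """Filter line of set L_m^(L) for a projection point inside the m-Pi window.

    `is_above(line)` must report whether the point lies above the given helper
    line. Returns (boundary, window order p, side of the tangency point).
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    if is_above((m, -1)):        # above L^m_{-1}
        boundary, p = "upper", m
    elif is_above((1, -1)):      # above L^1_{-1}
        boundary, p = "lower", 1
    elif is_above((1, -m)):      # above L^1_{-m}
        boundary, p = "upper", 1
    else:
        boundary, p = "lower", m
    # L_m^(L): tangency point on the LHS for upper boundaries, RHS for lower ones.
    side = "LHS" if boundary == "upper" else "RHS"
    return boundary, p, side


# A point below L^5_{-1} and L^1_{-1} but above L^1_{-5}:
flags = {(5, -1): False, (1, -1): False, (1, -5): True}
print(left_set_filter_line(5, flags.__getitem__))  # ('upper', 1, 'LHS')
```

Keeping the geometry behind a predicate separates the purely logical region test from the detector-specific computation of the helper lines.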
The filter line within $L_m^{(R)}$ corresponding to a point $(u_P, v_P)$ within the m-Pi window fulfills the following conditions:

IF $(u_P, v_P)$ is above $L_m$, the filter line is tangential to the upper m-Pi window boundary
ELSE IF $(u_P, v_P)$ is above $L_{-m}$, the filter line is parallel to $L_1$
ELSE the filter line is tangential to the lower m-Pi window boundary.

FIGURE 41. Different lines of the sets $L_m$ and $L^m_{-p}$ within the 5-Pi window.

Hence, any filter line is either tangential to one of the p-Pi window boundaries or parallel to $\dot{y}_H(s)$. For the definitions of the tangents, we still have to specify whether the point of tangency is on the left-hand side (LHS) or the right-hand side (RHS) of $(u_P, v_P)$. In general, for filter lines within $L_m^{(L)}$, the point of tangency is on the LHS of $(u_P, v_P)$ if the line is tangential to one of the upper p-Pi window boundaries, and on the RHS if the line is tangential to one of the lower p-Pi window boundaries. For those lines within $L_m^{(R)}$ that are tangential to the m-Pi window boundary, we must consider the point at which $L_m$ is tangential to the m-Pi window boundary. If $(u_P, v_P)$ is located to the left of this point, the point of tangency is on the RHS of $(u_P, v_P)$; if $(u_P, v_P)$ is located to the right of this point, the point of tangency is on the LHS of $(u_P, v_P)$. Figures 42 and 43 show the filter lines for the 5-Pi case.

FIGURE 42. 5-Pi filter lines; from top to bottom: $L_1^{(L)}$ and $L_3^{(L)}$.

G. CEnPiT: Helical Cardiac CT

1. Cardiac CT

Cardiac CT is an important application of CT. Because of the beating heart, cardiac CT is challenging with respect to image quality and temporal resolution. We restrict this discussion to cardiac CT based on a helical acquisition. In particular, we assume that the data were obtained with a low-pitch helix, which can be associated with an n-Pi acquisition. Typical values of n are 9 or 11.

Cardiac CT data are usually acquired simultaneously with the electrocardiogram (ECG). The aim of reconstruction is to obtain images associated with a certain motion state of the heart, and the ECG data are related to the motion states. R peaks are the canonical markers for describing the cardiac cycle based on the ECG. The cardiac cycle is considered to extend from one R peak to the next, and states of the cardiac cycle are described relative to these intervals. For instance, a phase point of 70% specifies all time points located at $t = t_0 + 0.7\,\Delta t$, where $\Delta t$ is the time period between the R peak at $t_0$ and its successor (see Figure 44). The situation becomes more difficult because the phase point at which the heart is in its state of least motion varies from patient to patient. Motion maps (Manzke et al., 2004) yield good results for finding the phase points of least motion. In the following discussion, we assume that the phase point at which the reconstruction is to be performed is known.
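The phase-point definition $t = t_0 + 0.7\,\Delta t$ translates directly into code. The sketch below assumes R-peak times in seconds, and the example R-peak times are made up. `helix_angle` is a hypothetical helper that maps a time point to a helix angle under the additional assumption of constant gantry rotation with angle 0 at t = 0; it is one simple way to read the statement in Figure 44 that each phase point is associated with a point on the helix.

```python
import math
from typing import List, Sequence


def phase_point_times(r_peaks: Sequence[float], phase: float) -> List[float]:
    """Times t = t0 + phase * dt, one per R-R interval (dt = time to the next R peak)."""
    if not 0.0 <= phase < 1.0:
        raise ValueError("phase must lie in [0, 1)")
    return [t0 + phase * (t1 - t0) for t0, t1 in zip(r_peaks, r_peaks[1:])]


def helix_angle(t: float, rotation_time: float) -> float:
    """Hypothetical time-to-angle mapping (radians): constant rotation, angle 0 at t = 0."""
    return 2.0 * math.pi * t / rotation_time


r_peaks = [0.0, 0.8, 1.65, 2.45]          # made-up R-peak times in seconds
times = phase_point_times(r_peaks, 0.70)  # 70% phase point in each cycle
angles = [helix_angle(t, 0.4) for t in times]
```

Because each phase point is computed per R-R interval, irregular heart rates (such as the 0.85 s interval above) shift the phase points accordingly rather than using a fixed period.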
FIGURE 43. 5-Pi filter lines; from top to bottom: $L_1^{(R)}$, $L_3^{(R)}$, and $L_5^{(R)}$.
FIGURE 44. The highlighted points on the ECG correspond to the phase points. The phase points are specified relative to the R peaks. Each phase point is associated with a certain point on the helix. The amount of data used for reconstruction around each phase point defines the width of the gating window.
A simple EnPiT reconstruction cannot yield good cardiac images. Cardiac CT requires the use of data acquired at time points close to the phase points, while data acquired at a large temporal distance from the phase points should be neglected. A good reconstruction algorithm therefore must exploit the given redundancies. For the kinds of reconstruction algorithms discussed here, redundancies are reflected in the different types of Radon planes. Certainly, there is no redundancy for 1-planes. Fortunately, the majority of Radon planes have n IPs, and Radon planes with fewer IPs contribute mainly to the low-frequency components of the images (compare with Section III.A.2). The CEnPiT algorithm, first published in Bontus et al. (2006), exploits these facts. Moreover, CEnPiT uses data from the complete focus detector; it is not restricted to the n-Pi window. The latter fact is important for obtaining the optimal temporal resolution and the best SNR in all cases. Reconstruction algorithms are called gated algorithms if they assign weights to the projections and if these weights depend on the distance of each projection from the phase point. The CEnPiT approach separates the data into two contributions, $\mu_{ug}$ and $\mu_g$, which are added to yield the final image. In particular, $\mu_{ug}$ contains the contributions of Radon planes with up to n IPs, while $\mu_g$ contains the contributions of Radon planes with more than n IPs. With this discussion in mind, we realize that $\mu_{ug}$ yields mainly low-frequency components. Together with the fact that $\mu_{ug}$ contains Radon planes with only a few redundancies, this motivates an ungated backprojection when computing $\mu_{ug}$. Gating is applied only to obtain $\mu_g$. For the possibilities of choosing an adequate gating function, we refer to Bontus et al. (2006). Here we summarize the filter-line sets that can be used to separate the data into $\mu_{ug}$ and $\mu_g$.
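The defining property of gating above is only that a projection's weight depends on its temporal distance to the nearest phase point. The sketch below illustrates that property with a cos² window of half-width `half_width`; this window is a hypothetical, purely illustrative choice and not the gating function of CEnPiT, for which the text refers to Bontus et al. (2006). In CEnPiT, such weights would enter only the backprojection for $\mu_g$, while $\mu_{ug}$ is backprojected without gating.

```python
import math
from typing import List, Sequence


def gating_weights(proj_times: Sequence[float], phase_times: Sequence[float],
                   half_width: float) -> List[float]:
    """Hypothetical cos^2 gating: 1 at a phase point, 0 at and beyond half_width."""
    if half_width <= 0.0:
        raise ValueError("half_width must be positive")
    weights = []
    for t in proj_times:
        # Temporal distance of this projection to its closest phase point.
        dist = min(abs(t - tp) for tp in phase_times)
        x = min(dist / half_width, 1.0)
        weights.append(math.cos(0.5 * math.pi * x) ** 2)
    return weights


# Projection times (s) against made-up phase points at 0.56 s and 1.395 s:
w = gating_weights([0.56, 0.66, 0.76, 1.20], [0.56, 1.395], half_width=0.2)
```

A smooth window avoids abrupt weight jumps between neighboring projections; the width trades temporal resolution against the amount of data, and hence noise, that enters $\mu_g$.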
2. Filter Lines

The filter lines summarized here were first published by Köhler, Bontus and Koken (2006). The topic was discussed on an even wider basis in Köhler, Bontus and Koken (2006), where the results were used to incorporate overscan data. Overscan data correspond to measured data outside of the n-Pi window. The different kinds of filter-line sets are denoted as $L_m^{(R)}$, $P_m^{(L)}$, $P_m^{(R)}$, and $T^{(R)}$, where m can be any positive odd integer. The definition of sets $L_m^{(R)}$ has been given in Section III.F. For the definition of the remaining sets, we introduce helper lines $L^m_p$ and $L^F_\pm$ in addition to lines $L_m$, $L_{-m}$, and $L^m_{-p}$ introduced in Section III.F (compare with Figure 41). In particular, line $L^m_p$ has a positive gradient and is tangential to the upper m-Pi window boundary and to the lower p-Pi window boundary. Lines $L^F_\pm$ are parallel to $L_1$ and tangential to the projected boundaries of the focus detector [compare with Eq. (33)]. Figure 45 shows some of these helper lines. The definition of filter-line set $T^{(R)}$ is similar to that of sets $L_m^{(R)}$. Filter lines within $T^{(R)}$ are parallel to $L_1$ if $(u_P, v_P)$ is located between $L^F_-$ and $L^F_+$. Otherwise, the filter lines are tangential to the projected focus-detector borders [cf. Eq. (33)]. Figure 46 shows the filter-line set $T^{(R)}$. The filter lines $P_m^{(L)}$ and $P_m^{(R)}$ are completely contained within the m-Pi window. In particular, the filter line within $P_m^{(R)}$ corresponding to a point $(u_P, v_P)$ within the m-Pi window fulfills the following conditions:
FIGURE 45. Different lines of the set $L^m_p$ and lines $L^F_\pm$ within the 7-Pi window. The dash-dotted lines correspond to the projections of the focus-detector boundaries [Eq. (33)].
FIGURE 46. Filter-line set $T^{(R)}$.
IF $(u_P, v_P)$ is above $L^m_3$, the filter line is tangential to the upper m-Pi window boundary
ELSE IF $(u_P, v_P)$ is above $L^{m-2}_3$, the filter line is tangential to the lower 3-Pi window boundary
ELSE IF $(u_P, v_P)$ is above $L^{m-2}_5$, the filter line is tangential to the upper (m − 2)-Pi window boundary
ELSE IF $(u_P, v_P)$ is above $L^{m-4}_5$, the filter line is tangential to the lower 5-Pi window boundary
···
ELSE IF $(u_P, v_P)$ is above $L^3_m$, the filter line is tangential to the upper 3-Pi window boundary
ELSE the filter line is tangential to the lower m-Pi window boundary.

The filter line within $P_m^{(L)}$ corresponding to a point $(u_P, v_P)$ within the m-Pi window fulfills the following conditions:
IF $(u_P, v_P)$ is above $L^m_{-1}$, the filter line is tangential to the upper m-Pi window boundary
ELSE IF $(u_P, v_P)$ is above $L^{m-2}_{-1}$, the filter line is tangential to the lower Pi-window boundary
ELSE IF $(u_P, v_P)$ is above $L^{m-2}_{-3}$, the filter line is tangential to the upper (m − 2)-Pi window boundary
ELSE IF $(u_P, v_P)$ is above $L^{m-4}_{-3}$, the filter line is tangential to the lower 3-Pi window boundary
···
ELSE IF $(u_P, v_P)$ is above $L^1_{-m}$, the filter line is tangential to the upper Pi-window boundary
ELSE the filter line is tangential to the lower m-Pi window boundary.
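The "···" in both cascades hides a regular pattern: for $P_m^{(R)}$ the k-th step tests $L^{m-2k}_{2k+3}$ (upper (m − 2k)-Pi boundary) and then $L^{m-2k-2}_{2k+3}$ (lower (2k + 3)-Pi boundary), and for $P_m^{(L)}$ the k-th step tests $L^{m-2k}_{-(2k+1)}$ and then $L^{m-2k-2}_{-(2k+1)}$. The Python sketch below generates the full chains from that reading of the pattern and evaluates them; the tuple labelling of helper lines and the caller-supplied `is_above` predicate are hypothetical, and the tangency side uses the LHS/RHS convention the section states for the P sets.

```python
from typing import Callable, List, Tuple

# Hypothetical labelling: helper line L^a_b is (a, b); b < 0 stands for L^a_{-|b|}.
Line = Tuple[int, int]
Rule = Tuple[Line, str, int]  # (helper line, "upper" or "lower", window order)


def p_right_rules(m: int) -> List[Rule]:
    """ELSE-IF chain of P_m^(R) for odd m >= 3; the final ELSE is the lower m-Pi boundary."""
    rules: List[Rule] = []
    for k in range((m - 3) // 2 + 1):
        rules.append(((m - 2 * k, 2 * k + 3), "upper", m - 2 * k))
        if m - 2 * k > 3:  # after the last "upper" step only the final ELSE remains
            rules.append(((m - 2 * k - 2, 2 * k + 3), "lower", 2 * k + 3))
    return rules


def p_left_rules(m: int) -> List[Rule]:
    """ELSE-IF chain of P_m^(L) for odd m >= 1; the final ELSE is the lower m-Pi boundary."""
    rules: List[Rule] = []
    for k in range((m - 1) // 2 + 1):
        rules.append(((m - 2 * k, -(2 * k + 1)), "upper", m - 2 * k))
        if m - 2 * k > 1:  # after the last "upper" step only the final ELSE remains
            rules.append(((m - 2 * k - 2, -(2 * k + 1)), "lower", 2 * k + 1))
    return rules


def select_filter_line(rules: List[Rule], m: int, is_above: Callable[[Line], bool],
                       right_set: bool) -> Tuple[str, int, str]:
    """First matching rule wins; returns (boundary, window order, tangency side)."""
    boundary, p = "lower", m  # final ELSE branch
    for line, b, order in rules:
        if is_above(line):
            boundary, p = b, order
            break
    if right_set:  # P_m^(R): RHS for upper boundaries, LHS for lower ones
        side = "RHS" if boundary == "upper" else "LHS"
    else:          # P_m^(L): LHS for upper boundaries, RHS for lower ones
        side = "LHS" if boundary == "upper" else "RHS"
    return boundary, p, side


print(p_right_rules(5))
# [((5, 3), 'upper', 5), ((3, 3), 'lower', 3), ((3, 5), 'upper', 3)]
```

For m = 5 the generated chain reproduces the cascade above line by line ($L^5_3$, $L^3_3$, $L^3_5$, then the final ELSE), which is a quick way to check the pattern before relying on it for larger m.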
FIGURE 47. Filter-line sets $P_3^{(R)}$ and $P_7^{(R)}$.
We still have to specify whether the point of tangency is on the RHS or the LHS of $(u_P, v_P)$. In general, for filter lines within $P_m^{(L)}$, the point of tangency is on the LHS of $(u_P, v_P)$ if the line is tangential to one of the upper p-Pi window boundaries, and on the RHS if the line is tangential to one of the lower p-Pi window boundaries. For filter lines within $P_m^{(R)}$, the point of tangency is on the RHS of $(u_P, v_P)$ if the line is tangential to one of the upper p-Pi window boundaries, and on the LHS if the line is tangential to one of the lower p-Pi window boundaries. Figure 47 exemplifies sets $P_3^{(R)}$ and $P_7^{(R)}$, while Figure 48 illustrates sets $P_1^{(L)}$ and $P_5^{(L)}$.

FIGURE 48. Filter-line sets $P_1^{(L)}$ and $P_5^{(L)}$.

3. Gated and Ungated Contributions

With the definitions of the filter lines, we can now specify how the gated and ungated contributions $\mu_g$ and $\mu_{ug}$ are obtained. We assume that n specifies the maximum Pi-mode; that is, the n-Pi window can be projected completely onto the focus detector, while the (n + 2)-Pi window cannot. With this convention, the filter-line sets and weights $\mu_\nu$ for $\mu_{ug}$ are summarized in Table 2. The contributions of all these filter-line sets must be added. For instance, line 2 in Table 2 shows that sets $L_3^{(R)}, L_5^{(R)}, \ldots, L_{n-2}^{(R)}$ must be incorporated using the specified weights. Table 3 summarizes the filter-line sets and weights for $\mu_g$.
TABLE 2 FILTER-LINE SETS AND THE CORRESPONDING WEIGHTS FOR $\mu_{ug}$
TABLE 3 FILTER-LINE SETS AND THE CORRESPONDING WEIGHTS FOR $\mu_g$
nμν
Filter-line sets
(R)
n 3
Ln
(R)
2n p(p+2)
Pn
1
(R) Pn T (R)
Filter-line sets L1
Lp , 3 p < n (R) Ln (L) P1 (L) Pp , 3 p < n (L) Pn (R) Pp , 3 p < n (R) Pn
− n3
nμν
(R)
n − n+2
(L)
n 2(n+2) n 2(n+2) n n+2
−n p(p+2) − 12 −n p(p+2) 1 −2

IV. OUTLOOK

The publication of Katsevich’s Pi algorithm initiated many studies. It even resulted in new insights into fan-beam CT (Noo, Clackdoyle and Pack, 2004). We have summarized the principles and shown some applications of the methods. A more detailed analysis of the underlying mathematics was given by Chen (2003) and Katsevich (2003).

In practice, it is desirable to use all measured data. A restriction to the n-Pi window in the case of helical CT is not acceptable for clinical applications. Köhler, Bontus and Koken (2006) showed how overscan data can be incorporated into the reconstruction. An elegant FBP algorithm for the saddle trajectory was presented by Yang et al. (2006). Future applications could require algorithms for helical CT with a tilted gantry (Noo, Defrise and Kudo, 2004) or for acquisitions in which the table feed (pitch) is not constant (Katsevich, Basu and Hsieh, 2004; Ye, Zhu and Wang, 2004).

Circular CT becomes increasingly attractive with greater numbers of detector rows. This is especially true if the organ under examination is completely covered by the X-ray cone. As discussed previously, a circular cone-beam acquisition yields an incomplete data set. Combining the circle with a line is one way to restore completeness, as shown in Section III.D. Alternatively, the circle can be combined with a helical segment (Bontus et al., 2007a, 2007b). The advantage of this approach is that the gantry need not be decelerated or accelerated, while the helical segment contributes only to low-frequency components in the same sense as the line does in the CLT.

A completely new class of reconstruction algorithms emerged from a formula derived by Zou and Pan (2004a). The authors derived two reconstruction algorithms for which the data must be filtered either in the image domain (Zou and Pan, 2004a) (backprojection filtering, BPF) or along projected Pi-lines (Zou and Pan, 2004b) (filtered backprojection, FBP). Another BPF formula, which can be applied to different kinds of source trajectories, was derived by Zhuang et al. (2004). BPF algorithms backproject data that have only been differentiated along the tangent of the trajectory [as in Eq. (43)]. The result corresponds to image data modified by a Hilbert transform. Therefore, an inverse Hilbert transform must be applied as one of the final processing steps. Pack, Noo and Clackdoyle (2005) presented a BPF method that can be used for different source trajectories and allows an arbitrary amount of overscan data to be used during the backprojection. Nevertheless, with this method a large number of processing steps can become necessary to obtain an adequate SNR.

CT reconstruction is an interesting and agile field of research. The future will show which of the proposed algorithms fulfill the requirements imposed by clinical practice.
REFERENCES
Bontus, C., Köhler, T., Proksa, R. (2003). A quasiexact reconstruction algorithm for helical CT using a 3-Pi acquisition. Med. Phys. 30 (9), 2493–2502.
Bontus, C., Köhler, T., Proksa, R. (2005). EnPiT: Filtered back-projection algorithm for helical CT using an n-Pi acquisition. IEEE Trans. Med. Imaging 24 (8), 977–986.
Bontus, C., Koken, P., Köhler, T., Grass, M. (2006). CEnPiT: Helical cardiac CT reconstruction. Med. Phys. 33 (8), 2792–2799.
Bontus, C., Grass, M., Koken, P., Köhler, T. (2007a). Exact reconstruction algorithm for circular short-scan CT combined with a helical segment. In: Proceedings of the Fully 3D Meeting. Lindau, Germany, pp. 88–91.
Bontus, C., Koken, P., Köhler, T., Proksa, R. (2007b). Circular CT in combination with a helical segment. Phys. Med. Biol. 52, 107–120.
Chen, G.-H. (2003). An alternative derivation of Katsevich’s cone-beam reconstruction formula. Med. Phys. 30 (12), 3217–3225.
Danielsson, P.E., Edholm, P., Eriksson, J., Magnusson-Seger, M. (1997). Towards exact 3D-reconstruction for helical cone-beam scanning of long objects. In: Proceedings of the 3D’97 Conference. Nemacolin, Pennsylvania, pp. 141–144.
Defrise, M., Noo, F., Kudo, H. (2000). A solution to the long object problem in helical cone-beam tomography. Phys. Med. Biol. 45, 623–643.
Grangeat, P. (1991). Mathematical Framework of Cone-Beam 3D-Reconstruction via the First Derivative of the Radon Transformation. Mathematical Methods in Tomography. Springer, Berlin, Germany.
Jeffrey, A., Gradshteyn, I.S., Ryzhik, I.M. (Eds.) (1994). Table of Integrals, Series, and Products, fifth edition. Academic Press, San Diego.
Katsevich, A. (2002). Theoretically exact FBP-type inversion algorithm for spiral CT. SIAM J. Appl. Math. 62, 2012–2026.
Katsevich, A. (2003). A general scheme for constructing inversion algorithms for cone beam CT. Int. J. Math. Math. Sci. 21, 1305–1321.
Katsevich, A. (2004a). Image reconstruction for the circle and line trajectory. Phys. Med. Biol. 49, 5059–5072.
Katsevich, A. (2004b). An improved exact FBP algorithm for spiral CT. Adv. Appl. Math. 32 (4), 681–697.
Katsevich, A. (2006). 3PI algorithms for helical computed tomography. Adv. Appl. Math. 36, 213–250.
Katsevich, A., Basu, S., Hsieh, J. (2004). Exact filtered backprojection reconstruction for dynamic pitch helical cone beam computed tomography. Phys. Med. Biol. 49, 3089–3103.
Köhler, T., Bontus, C., Koken, P. (2006). The Radon-split method for helical cone-beam CT and its application to nongated reconstruction. IEEE Trans. Med. Imaging 25 (7), 882–897.
Kudo, H., Noo, F., Defrise, M. (1998). Cone-beam filtered-backprojection algorithm for truncated helical data. Phys. Med. Biol. 43, 2885–2909.
Manzke, R., Köhler, T., Nielsen, T., Hawkes, D., Grass, M. (2004). Automatic phase determination for retrospectively gated cardiac CT. Med. Phys. 31 (12), 3345–3362.
Natterer, F. (1986). The Mathematics of Computerized Tomography. Wiley, New York.
Noo, F., Clackdoyle, R., Pack, J. (2004). A two-step Hilbert transform method for 2D image reconstruction. Phys. Med. Biol. 49, 3903–3923.
Noo, F., Defrise, M., Kudo, H. (2004). General reconstruction theory for multi-slice x-ray computed tomography with a gantry tilt. IEEE Trans. Med. Imaging 23 (9), 1109–1116.
Noo, F., Pack, J., Heuscher, D. (2003). Exact helical reconstruction using native cone-beam geometries. Phys. Med. Biol. 48, 3787–3818.
Pack, J.D., Noo, F., Clackdoyle, R. (2005). Cone-beam reconstruction using the backprojection of locally filtered projections. IEEE Trans. Med. Imaging 24 (1), 70–85.
Proksa, R., Köhler, T., Grass, M., Timmer, J. (2000). The n-PI-method for helical cone-beam CT. IEEE Trans. Med. Imaging 19 (9), 848–863.
Schaller, S., Noo, F., Sauer, F., Tam, K.C., Lauritsch, G., Flohr, T. (2000). Exact Radon rebinning algorithm for the long object problem in helical cone-beam CT. IEEE Trans. Med. Imaging 19 (5), 361–375.
Tam, K.C. (1995). Three-dimensional computerized tomography scanning method and system for large objects with smaller area detectors. US Patent 5,390,112.
Tam, K.C., Sauer, F., Lauritsch, G., Steinmetz, A. (1999). Backprojection spiral scan region-of-interest cone beam CT. Proc. SPIE Med. Imaging Conf. 3661, 433–441.
Tuy, H.K. (1983). An inversion formula for cone-beam reconstructions. SIAM J. Appl. Math. 43 (3), 546–552.
Yang, H., Li, M., Koizumi, K., Kudo, H. (2006). View-independent reconstruction algorithms for cone beam CT with general saddle trajectory. Phys. Med. Biol. 51, 3865–3884.
Ye, Y., Zhu, J., Wang, G. (2004). Minimum detection windows, PI-line existence and uniqueness for helical cone-beam scanning of variable pitch. Med. Phys. 31 (3), 566–572.
Zhuang, T., Leng, S., Nett, B.E., Chen, G. (2004). Fan-beam and cone-beam image reconstruction via filtering the backprojection image of differentiated projection data. Phys. Med. Biol. 49, 5489–5503.
Zou, Y., Pan, X. (2004a). Exact image reconstruction on PI-lines from minimum data in helical cone-beam CT. Phys. Med. Biol. 49, 941–959.
Zou, Y., Pan, X. (2004b). Image reconstruction on PI-lines by use of filtered backprojection in helical cone-beam CT. Phys. Med. Biol. 49, 2717–2731.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 151
Color Spaces and Image Segmentation
LAURENT BUSIN¹, NICOLAS VANDENBROUCKE¹,², AND LUDOVIC MACAIRE¹
¹ Laboratoire LAGIS UMR CNRS 8146 – Bâtiment P2, Université des Sciences et Technologies de Lille, 59655 Villeneuve d’Ascq, France
² Ecole d’Ingénieurs du Pas-de-Calais (EIPC), Campus de la Malasisse – BP39, 62967 Longuenesse Cedex, France
I. Introduction 66
II. Color Spaces 66
   A. Digital Color Image 67
      1. Human Color Vision 67
      2. Color Image Acquisition 71
      3. Color Image Visualization 74
   B. Color Spaces 75
      1. Primary Spaces 76
      2. Luminance-Chrominance Spaces 84
      3. Perceptual Spaces 92
      4. Independent Axis Spaces 100
      5. Hybrid Color Space 102
   C. Digital Color Images and Color Spaces 104
      1. Color Space Coding for Image Analysis 104
      2. Application to Color Spaces 106
   D. Summary 109
III. Color Image Segmentation 110
   A. Introduction 110
   B. Edge Detection 111
      1. Overview 111
      2. Edge Detection by the Analysis of a Color Gradient Vector 114
   C. Region Construction Based on a Spatial Analysis 115
      1. Segmentation by Region Growing 115
      2. Segmentation by Region Merging 117
   D. Region Construction Based on a Color Space Analysis 119
      1. Introduction 119
      2. Analysis of One-Dimensional Histograms 123
      3. Analysis of the Three-Dimensional Color Histogram 129
      4. Clustering-Based Segmentation 133
      5. Spatial-Color Classification 136
   E. Summary 139
IV. Relationships between Segmentation and Color Spaces 140
   A. Introduction 140
   B. Edge Detection 141
ISSN 1076-5670 DOI: 10.1016/S1076-5670(07)00402-8
Copyright 2008, Elsevier Inc. All rights reserved.
BUSIN , VANDENBROUCKE AND MACAIRE
      1. Quantitative Evaluation Methods with Ground Truth 141
      2. Evaluation Methods for Color Image Segmentation by Edge Detection 143
   C. Region Construction 143
      1. Quantitative Evaluation Methods with Ground Truth 143
      2. Quantitative Evaluation Methods without Ground Truth 146
      3. Evaluation Methods for Color Image Segmentation by Region Construction 147
   D. Selection of the Most Discriminating Color Space 147
      1. Pixel Classification in an Adapted Hybrid Color Space 149
      2. Unsupervised Selection of the Most Discriminating Color Space 151
   E. Conclusion 161
V. Conclusion 161
References 162
I. INTRODUCTION

One of the most important problems in color image analysis is segmentation. This chapter considers color uniformity as a relevant criterion for partitioning an image into significant regions. For this purpose, the color of each pixel is initially represented in the (R, G, B) color space, whose components are the red (R), green (G), and blue (B) levels of the pixel. Other color spaces can be used, and the performance of an image segmentation procedure depends on the choice of the color space. Our goal is to describe the relationships between color spaces and color image segmentation. The different color spaces used to characterize the colors in digital images are detailed in Section II. Section III is devoted to the segmentation methods designed for color image analysis. In Section IV, we present several studies of the impact of the choice of color space on the quality of the results provided by segmentation methods.
II. COLOR SPACES

This section details how the colors of the pixels of digital color images can be represented in different color spaces for color image analysis applications. The first part of this section shows how color is quantized to produce a digital color image. The basics of human color vision are summarized first to facilitate understanding of this scheme. Then the problem of color image acquisition is discussed. In the second part, the most classical color spaces are grouped into different families. Finally, because these color spaces are used by color image analysis applications, the last part highlights the need to code their components efficiently so that their intrinsic properties are preserved.
COLOR SPACES AND IMAGE SEGMENTATION
FIGURE 1. Digital color image acquisition.
A. Digital Color Image

A digital color image acquisition device is composed of different elements (Figure 1). First, the characteristics of the light and of the material that constitutes the observed object determine the physical properties of its color. Then, color images are acquired by an image sensor with an optical device and finally digitized by a digital processor (Sharma and Trussell, 1997; Trussell, Saber and Vrhel, 2005; Vrhel, Saber and Trussell, 2005) before being displayed on a monitor or printed by a printer. To understand how color is represented in digital images, it is first necessary to explain how it is perceived by a human observer. Thus, the basics of human color vision are presented first. Then, the color image acquisition device is briefly described, and finally, the color image display device is presented.

1. Human Color Vision

The human perception of color is the response of the human receptor (the eye) and the human interpretation system (the brain) to a color stimulus, that is, the light of a source as reflected or transmitted by a material. In addition to the light and the material, which are characterized by their physical properties, the eye and the brain, which are specific to each observer, contribute physiological and psychological features to the color sensation (Wyszecki and Stiles, 1982). This section presents these different properties of color, which are very complex notions.

a. Physical Properties. First, there is no color without light. Light, the origin of the color sensation, is defined as electromagnetic radiation: a set of electromagnetic waves produced by the propagation of luminous particles, the photons. Electromagnetic radiation is characterized by its wavelength λ, expressed in meters (m). Visible light is the region of the electromagnetic spectrum to which our eyes are sensitive (approximately 380 to 780 nm). This light can be dispersed into a spectrum of luminous rays by a prism. Each ray consists of radiation of a single wavelength, that is, a monochromatic wave. Thus, each light source is characterized by its spectral power distribution (SPD), the quantity of energy per unit of wavelength (usually 1 nm). Certain sources corresponding to classical observation conditions have been standardized by the Commission Internationale de l’Éclairage (CIE) as illuminants (Commission Internationale de l’Éclairage, 1986). The CIE is an international standardization organization devoted to establishing recommendations for all matters relating to the science and art of lighting. Another characteristic of an illuminant is its correlated color temperature: the temperature to which a black-body radiator would have to be heated to produce the visual impression closest to that produced by the light source. Illuminants A (light emitted by a black-body radiator at 2856 K, representing an incandescent lamp), C (average daylight), D (daylight), F (fluorescent lamps), and E (equal-energy light) are the most widely used illuminants.

When a light source illuminates a material, the luminous rays emitted by this source are reflected or transmitted by the material. Thus, the material transforms the properties of the light to produce a color stimulus. When light comes into contact with a material, two phenomena occur: a surface reflection of the incident radiation and a penetration of the incident light into the material. When the light penetrates into the material, it meets the pigments. Pigments are particles that determine the color of a material by absorbing, diffusing, or transmitting the light that reaches them.
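The black-body definition of illuminant A can be made concrete with Planck's law. The Python sketch below computes the relative SPD of a black body at 2856 K, scaled to 100 at 560 nm (the normalization commonly used for tabulated illuminant A data). It assumes the second radiation constant c2 = 1.4388 × 10⁻² m·K; the CIE's own formula uses slightly different rounded constants, so the values may differ marginally from the official table. The function name is ours.

```python
import math

C2 = 1.4388e-2  # second radiation constant c2 = hc/k, in m*K


def blackbody_relative_spd(wavelength_nm: float, temperature_k: float = 2856.0,
                           ref_nm: float = 560.0) -> float:
    """Planck spectral radiance relative to its value at ref_nm, scaled to 100."""
    def planck_shape(lam_nm: float) -> float:
        lam = lam_nm * 1e-9  # nm -> m
        # The constant prefactor 2hc^2 cancels in the ratio, so it is omitted.
        return lam ** -5 / math.expm1(C2 / (lam * temperature_k))
    return 100.0 * planck_shape(wavelength_nm) / planck_shape(ref_nm)


# Relative SPD across the visible range (approximately 380 to 780 nm):
for nm in (380, 460, 560, 660, 780):
    print(nm, round(blackbody_relative_spd(nm), 1))
```

The values rise steadily from the blue to the red end of the visible range, which is why illuminant A (incandescent light) looks yellowish compared with daylight illuminants.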
The pigments modify the SPD of the light by selectively absorbing part of the corresponding electromagnetic waves. The light that is not absorbed by the pigments is diffused or transmitted outside the material and carries the color information of this material. Depending on its nature, a material can be characterized by its capacity to reflect (reflectance) or to transmit (transmittance) incident light energy. Conversely, it can also be characterized by its capacity to absorb (absorptance) the incident light energy. Finally, the color of a material depends not only on its characteristics but also on the lighting and viewing conditions (position, orientation, distance, and so on).

b. Physiologic and Psychological Properties. After passing through several elements of the eye, the color stimulus reaches a photosensitive zone at the back of the eye, the retina, onto which the images of the observed scene are projected. The retina contains two kinds of photosensitive cells: cones and rods. The rods provide night vision (scotopic vision), whereas the cones provide daylight vision (photopic vision). There are three kinds of cones: the S cones, which are
COLOR SPACES AND IMAGE SEGMENTATION
sensitive to short wavelengths close to blue; the M cones, which are sensitive to medium wavelengths close to green; and the L cones, which are sensitive to long wavelengths close to red. The eye converts a color stimulus into a color signal at the entry of the optic nerve. The optic nerve transmits this electrical signal to the lateral (or external) geniculate body, a relay station that connects with fibers going to the brain. A first analysis of the data is performed there. According to the opponent-colors theory of Hering (1875), the color signal appears to be coded antagonistically: by an achromatic signal (black-white opposition) and by red-green and blue-yellow opposition signals. These signals are finally transmitted to another area of the brain, the visual cortex, where color interpretation is performed.

FIGURE 2. Color mixture. (a) Additive color mixture. (b) Subtractive color mixture. (See Color Insert.)

c. Trichromacy. Human color perception is three-dimensional (3D). The work of Young at the beginning of the nineteenth century and of Helmholtz in 1866 showed that any color stimulus can be reproduced by mixing three other stimuli: the red, green, and blue stimuli, termed primaries or reference stimuli (Helmholtz, 1866; Young, 1807). This principle is known as trichromacy, trichromatic theory, or color mixture. Three primaries are therefore necessary and sufficient to match any color by mixture; colorimetry, the science of color measurement, is based on this theory. The amounts of each of the primaries needed to match a color are called the tristimulus values. There are two kinds of color mixture: additive and subtractive (Figure 2). The
additive color mixture results from the juxtaposition of colored lights corresponding to each of the three primaries. The additive mixture of equal amounts of the three primaries creates white. Additive color mixture is used to form the image of a television or a monitor, or the image acquired by a color video or still camera. The subtractive color mixture results from the selective absorption of light by a material at different wavelengths. It is used in printing and painting. The primaries differ according to the type of color mixture: additive color mixture uses red, green, and blue, whereas subtractive color mixture uses magenta, cyan, and yellow (black is often added as a primary in printing applications). The primaries of these two types of color mixture are complementary: the additive mixture of two complementary colors creates white. On the basis of the trichromatic theory established by Young, Grassman proposed in 1853 the laws stating the fundamental properties of color mixtures, which were supplemented by Abney in 1913 (Abney, 1913; Grassman, 1853). These laws make it possible to apply the additive, associative, multiplicative, and transitive properties of algebraic equalities to colorimetric equalities. Today, Grassman's laws are the mathematical basis of colorimetry.

d. Perceptual Attributes. The human perception of color is a subjective reaction to the stimulation of the eye, and it seems better suited to characterize a color in terms of brightness, hue, and saturation. Brightness is the attribute of a visual sensation according to which an area appears to emit more or less light. Thus, it corresponds to a sensation of light (dark or luminous) and characterizes the luminous level of a color stimulus.
The terms brightness, lightness, luminance, luminosity, radiant intensity, luminous intensity, illumination, luma, and so on are often used in the literature to refer to the concept of luminosity; Poynton (1993) highlights the confusion between these terms. Hue corresponds to the main colors: red, green, blue, and yellow. It is the attribute of a visual sensation according to which an area appears to be similar to one of these perceived colors. It corresponds to the dominant wavelength of an SPD, that is, the wavelength with the highest energy. Hue can be defined by an angle termed the hue angle. Since black, white, and gray have no hue, they are called neutral or achromatic colors. Saturation is an attribute that allows the colorfulness of an area to be estimated with respect to its brightness. Chroma is a brightness-dependent attribute that also estimates colorfulness. Saturation represents the purity of a perceived color (bright, pale, dusty, and so on). It
corresponds to the ratio between the energy of the dominant wavelength of an SPD and the sum of the energies of the other wavelengths.

2. Color Image Acquisition

The previous section explained that, in the framework of human vision, the eye and the brain are the receptor and the interpretation system of color stimuli, respectively. In the framework of computer vision, the receptor is a color camera and the interpretation system is a computer (see Figure 1). This section describes how a color image is acquired by a color camera and digitized by an image acquisition device.

a. Color Camera. The main element of a camera is the image sensor (Trussell, Saber and Vrhel, 2005; Vrhel, Saber and Trussell, 2005). The image sensor is composed of a set of photosensitive elements that convert the luminous flux (photons) into electrical charge (electrons) to provide one or several video signals to the digitization system. Each photosensitive receptor produces a voltage that increases with the received light intensity. The receptors can be arranged either as a row (line scan camera) or as an array (area array camera). The resulting image is made up of a set of points called pixels,¹ each corresponding to a photosensitive receptor. Two technologies exist to make a sensor array: charge-coupled device (CCD) technology and complementary metal-oxide-semiconductor (CMOS) technology. CCD sensors provide higher signal-to-noise ratio (SNR) levels, whereas CMOS sensors have a lower SNR but allow other components to be integrated into each photosensitive receptor of the sensor. By analogy with the human visual system, color information is created by means of three color filters, sensitive to the short (blue), medium (green), and long (red) wavelengths, respectively. Two kinds of color camera exist (Figure 3):
• One-chip color cameras are designed with one single sensor array and three interlaced color filters, termed a color filter array.
A Bayer color filter array is the most widely used device. Neighboring photosensitive receptors of the sensor are thus fitted with different color filters: in the Bayer arrangement, half of the receptors carry green filters laid out in a checkerboard, and the remaining receptors carry red and blue filters on alternate rows. The color information of a pixel is therefore obtained from receptors located at different places. This technology generates a loss of resolution and color artifacts (false colors). It therefore requires interpolation techniques (termed demosaicking) so that the color of each pixel of the image is defined by three components (Gunturk et al., 2005). Other technologies, based on only one sensor array, resolve these problems (Lyon and Hubel, 2002).
• Three-chip color cameras are fitted with three sensor arrays associated with an optical system based on prisms. Each of the three sensors simultaneously receives either the red, the green, or the blue stimulus via dichroic filters fixed on the prisms. The color of a pixel is provided by the responses of three receptors, so there is no loss of resolution. Demosaicking is not necessary, and the quality of the image is better than that of an image acquired by a one-chip color camera. However, three-chip technology can produce shading, a color gradient that appears in the image of a white background when the luminous rays reaching the prisms are not perfectly parallel.

¹ PICture ELement = pixel.

FIGURE 3. Color camera. (a) One-chip color camera. (b) Three-chip color camera. (See Color Insert.)
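Demosaicking can be illustrated with its simplest form, bilinear interpolation, which estimates each missing component as the weighted mean of the nearest receptors carrying that filter. The sketch below assumes an RGGB Bayer layout (red at even row and even column, blue at odd row and odd column) and a raw mosaic stored as a 2D NumPy array of at least 2 × 2 receptors; the function names are hypothetical, and this baseline is not one of the methods surveyed by Gunturk et al. (2005).

```python
import numpy as np


def bayer_masks(height: int, width: int):
    """Boolean masks of the red, green and blue sites of an RGGB Bayer pattern."""
    rows, cols = np.mgrid[0:height, 0:width]
    red = (rows % 2 == 0) & (cols % 2 == 0)
    blue = (rows % 2 == 1) & (cols % 2 == 1)
    green = ~(red | blue)
    return red, green, blue


def filter3x3(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """3x3 weighted sum with zero padding (the kernels used below are symmetric)."""
    padded = np.pad(image, 1, mode="constant")
    height, width = image.shape
    out = np.zeros((height, width))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + height, dx:dx + width]
    return out


def demosaic_bilinear(raw: np.ndarray) -> np.ndarray:
    """Bilinear demosaicking of an H x W RGGB mosaic into an H x W x 3 image."""
    k_red_blue = np.array([[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]])
    k_green = np.array([[0.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 0.0]])
    masks = bayer_masks(*raw.shape)
    kernels = (k_red_blue, k_green, k_red_blue)
    rgb = np.empty(raw.shape + (3,))
    for channel, (mask, kernel) in enumerate(zip(masks, kernels)):
        samples = np.where(mask, raw, 0.0)
        # Normalized convolution: weighted mean of the receptors carrying this filter.
        rgb[..., channel] = filter3x3(samples, kernel) / filter3x3(mask.astype(float), kernel)
    return rgb


if __name__ == "__main__":
    raw = np.random.default_rng(0).random((4, 6))  # synthetic 4 x 6 mosaic
    print(demosaic_bilinear(raw).shape)
```

Because each output value is a normalized weighted average of the available samples, the receptors' own measurements are kept unchanged and a scene of uniform color is reconstructed exactly; the false colors mentioned above appear near edges, where neighboring receptors see different colors.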
Color cameras are fitted with infrared filters because the sensors are also sensitive to wavelengths beyond the visible range. Moreover, the spectral response of an image sensor differs from the spectral response of the human eye.

b. Color Image Digitization. A digitization system converts (amplification, sampling, and so on) one or more video signals derived from an image sensor into a digital signal in which each element of a color image is represented by three digital values that can be stored in computer memory for further processing and display. This stage of acquisition is called color quantization. Color quantization consists of associating with each pixel of a color image three digital values (a red level R, a green level G, and a blue level B) that correspond to the tristimulus values of its color. Thus, a digital color image is an array of pixels that are located by their spatial coordinates in the image plane and whose color is defined by the three components R, G, and B. Generally, each of these three components is coded on 8 bits, that is, quantized to 256 unsigned integer values in the interval [0, 255]. A color is thus coded on 3 × 8 = 24 bits, so that $2^{24}$ (16,777,216) colors can be represented by additive mixture, whereas the human visual system can distinguish approximately 350,000 colors. However, even though this quantization represents more colors than a human observer can distinguish, the human visual system is not uniform: for specific intervals, the color components must be coded on 12 bits to discriminate all the colors that the human visual system can perceive in that interval. The digital values provided by the digitization system depend on the characteristics of the device used, the choice of lighting system, and the setup of the acquisition system.
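The arithmetic of 8-bit coding can be checked with a short sketch. It assumes that each analog component has already been normalized to [0, 1] before quantization and packs the three 8-bit levels into one 24-bit code with R in the most significant byte; both conventions, and the function names, are illustrative choices rather than the behavior of any particular acquisition device.

```python
def quantize(value: float, bits: int = 8) -> int:
    """Map a normalized component in [0, 1] to one of 2**bits unsigned integer levels."""
    max_level = (1 << bits) - 1          # 255 for 8 bits, 4095 for 12 bits
    clipped = min(max(value, 0.0), 1.0)  # saturate out-of-range signals
    return round(clipped * max_level)


def pack_rgb24(r: int, g: int, b: int) -> int:
    """Pack three 8-bit levels into one 24-bit code (R in the most significant byte)."""
    for level in (r, g, b):
        if not 0 <= level <= 255:
            raise ValueError("each component must lie in [0, 255]")
    return (r << 16) | (g << 8) | b


def unpack_rgb24(code: int) -> tuple[int, int, int]:
    """Recover the R, G and B levels from a 24-bit code."""
    return (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF


NUM_24BIT_COLORS = 2 ** (3 * 8)  # number of distinct 24-bit codes


if __name__ == "__main__":
    levels = [quantize(v) for v in (1.0, 0.5, 0.0)]
    code = pack_rgb24(*levels)
    print(levels, hex(code), unpack_rgb24(code), NUM_24BIT_COLORS)
```

Coding a component on 12 bits instead of 8 only changes `bits`, raising the number of levels per component from 256 to 4096.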
The SPD of a light depends on the lighting system, and the spectral responses of an image sensor depend on the chosen color camera. Thus, the R, G, and B components of a given color differ from one piece of equipment or parameter setup to another; the color components of an acquired image are therefore device dependent (or rendered) (Trussell, Saber and Vrhel, 2005). To make the R, G, and B color components invariant to the acquisition system, a calibration step is necessary. Calibration uses a chart (Macbeth Color Checker, IT8 chart, white target) containing colors whose theoretical tristimulus values are known a priori. The image of this chart is acquired by the acquisition system, and the parameters of the system are adjusted so that the R, G, and B components of each color of the chart in the image equal their theoretical values. This adjustment can be achieved by a simple white balance or by more precise numerical methods (Bala and Sharma, 2005; Ramanath et al., 2005).
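The two kinds of adjustment mentioned above can be sketched with NumPy. White balance is modeled as one gain per channel computed from a white target; the more precise alternative shown here is a 3 × 3 color correction matrix fitted by least squares to the chart patches (for instance, the 24 patches of a Macbeth chart). The sketch assumes linear (not gamma-corrected) RGB values and a device that a linear model describes adequately; the function names are hypothetical, the chart data in the example are synthetic, and this is one common approach rather than the specific procedures of Bala and Sharma (2005) or Ramanath et al. (2005).

```python
import numpy as np


def white_balance_gains(measured_white: np.ndarray, reference_white: np.ndarray) -> np.ndarray:
    """One gain per channel so that the white target is rendered with its reference values."""
    return reference_white / measured_white


def fit_correction_matrix(measured: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Least-squares 3x3 matrix M such that reference ≈ measured @ M.T.

    measured, reference: (N, 3) arrays holding the RGB values of N chart patches, N >= 3.
    """
    # lstsq solves measured @ X ≈ reference column by column; M is the transpose of X.
    x, *_ = np.linalg.lstsq(measured, reference, rcond=None)
    return x.T


def apply_matrix(rgb: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Apply a 3x3 matrix to every RGB triple of an (..., 3) array."""
    return rgb @ matrix.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.uniform(0.05, 0.95, size=(24, 3))  # synthetic 24-patch chart
    camera = np.array([[0.90, 0.10, 0.00],              # hypothetical device distortion
                       [0.05, 0.80, 0.10],
                       [0.00, 0.10, 1.10]])
    measured = reference @ camera.T
    m = fit_correction_matrix(measured, reference)
    print(np.abs(apply_matrix(measured, m) - reference).max())
```

The diagonal white balance only corrects the overall color cast; the full matrix can also undo cross-talk between channels, at the price of needing several well-measured patches.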
3. Color Image Visualization

In most applications, color images are intended to be displayed on a screen or printed on paper. For color image analysis applications, displaying the colors is not always necessary when the result of the analysis is data rather than an image. However, many color spaces were developed to display color images on televisions or monitors and depend on their technologies. To understand these color spaces, which are presented later, we describe how colors are displayed.

a. Display. An acquired color image is usually displayed on a cathode ray tube (CRT) monitor or on a liquid crystal display (LCD) flat panel monitor (Vrhel, Saber and Trussell, 2005). The screen of a CRT monitor consists of a thin layer of a mosaic of red, green, and blue phosphor cells. The CRT bombards the screen with electron beams that stimulate each of the three types of phosphor with varying intensities. Three colored lights (whose spectral characteristics depend on the chemical nature of the phosphors) are produced and display a color on the screen according to the additive mixture principle. In an LCD flat panel monitor, colors are produced by a mosaic of red, green, and blue color filters backlit by a fluorescent light source. The backlight and color filters are designed to produce chromaticities identical to those of the CRT phosphors to ensure compatibility between the two devices. To display a color image on a screen, the digital values of the color components of each pixel of the image are loaded into the memory of the video card. They are then converted to analog signals that specify the intensity of the beam (the amount of light that reaches the screen). This intensity is a nonlinear function, denoted Γ, of the voltage x generated by the video card. For a CRT monitor, it follows a power law $\Gamma(x) = x^{\gamma}$, where the value of γ generally ranges between 2 and 3.
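The power law and the inverse law used to compensate for it (gamma correction) can be illustrated as follows. The sketch assumes normalized signals in [0, 1] and a single exponent γ = 2.2, a value within the range given above; it ignores the sigmoidal response of LCD cells and the linear segment near black that some standards (such as sRGB) add to the curve. The function names are hypothetical.

```python
import numpy as np


def display_response(x: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """CRT-like power law Gamma(x) = x**gamma: light produced for a normalized drive x."""
    return np.power(np.clip(x, 0.0, 1.0), gamma)


def gamma_correct(linear: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Inverse law x**(1/gamma), applied to linear components before display."""
    return np.power(np.clip(linear, 0.0, 1.0), 1.0 / gamma)


if __name__ == "__main__":
    linear = np.linspace(0.0, 1.0, 5)
    encoded = gamma_correct(linear)
    # Displaying the corrected signal restores the original linear intensities.
    print(np.round(encoded, 3), np.allclose(display_response(encoded), linear))
```

Without the correction, a linear mid-level signal of 0.5 would be displayed at about 0.5^2.2 ≈ 0.22 of full intensity, that is, much too dark.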
For an LCD monitor, the response of a pixel cell is quite different from that of a CRT and tends to follow a sigmoidal law. To correct this nonlinearity, an inverse law must be applied when the image is displayed. This operation is known as gamma correction.

b. Gamma Correction. Generally, the acquisition system performs the gamma correction, which compensates for the nonlinearity of display devices by transforming the acquired video signals so that the R, G, and B trichromatic components of the pixels of the acquired image are corrected. For example, a television channel transmits gamma-corrected signals that can be displayed directly on a CRT TV screen. This correction depends on
the technology used, which defines the primaries needed to achieve the additive mixture of the colors of the image. More precisely, this technology depends on the country. North American television receivers accept both the old National Television System Committee (NTSC) standard, which uses the primaries defined by the Federal Communications Commission (FCC), and the new Society of Motion Picture and Television Engineers (SMPTE-C) standard. For the NTSC standard, γ is set to 2.2. European television receivers comply either with the German phase alternating line (PAL) standard, which uses the primaries defined by the European Broadcasting Union (EBU), or with the French SÉquentiel Couleur À Mémoire (SECAM) standard. For the PAL standard, γ is set to 2.8. With most color cameras, the gamma correction can be adjusted directly on the red, green, and blue acquired signals. This setting only compensates for the display nonlinearity of monitors, so it is not needed for color image analysis applications. On the other hand, for color rendering or color measurement applications on a screen, gamma correction must be applied to the displayed color image. When color images whose acquisition conditions are not specified (e.g., web images) are considered, it is very difficult to know whether they are gamma corrected. Poynton (1996) noticed a remarkable coincidence between the gamma correction law and the transfer function of the human eye: the response of the human eye to a stimulus is nonlinear and follows a logarithmic law very close to the gamma correction law. Thus, applying gamma correction to the acquired color components produces a color representation close to human color perception. According to Poynton (1993), even when gamma correction is not necessary for technological reasons, it is useful for perceptual reasons.

B. Color Spaces

A color space is a geometrical representation of colors in a space that allows colors to be specified by means of (generally) three components whose numerical values define a specific color. The first color spaces proposed to specify colors were based on human color perception experiments designed to develop classifications of colors. The ordered color systems are collections of color samples (color charts, color atlases). These spaces are generally 3D spaces in which the units of the components are merely conventional and help to specify colors within this representation. These spaces (e.g., the Munsell system, the Natural Color System [NCS], the Optical Society of America [OSA] system, or the Deutsches Institut für Normung [DIN] system) are rarely used for color image analysis applications.
The most frequently used spaces provide a metric to measure the distance between colors. This section presents the classical color spaces most frequently used in color image analysis. According to their characteristics, they are classified into four families:
1. The primary spaces are based on the trichromatic theory, which assumes that any color can be matched by mixing appropriate amounts of three primary colors.
2. The luminance-chrominance spaces have one component that represents luminosity and two that represent chromaticity.
3. The perceptual spaces attempt to quantify subjective human color perception by means of intensity, hue, and saturation components.
4. The independent axis spaces result from various statistical methods that provide components that are as uncorrelated as possible.
Since many color spaces exist for different applications, and several definitions exist for the same color space, it is difficult to select only one for a color image analysis application; it is therefore important to provide an overview of their specificities and differences.

1. Primary Spaces

The primary spaces are based on the trichromatic theory, which assumes that we are able to match any color by mixing appropriate amounts of three primary colors. By analogy with the physiology of the human eye, the primary colors are close to red (R), green (G), and blue (B). Therefore, a primary space depends on the choice of a set of primary colors, denoted here [R], [G], and [B]. The ratios of the primaries are fixed so that a reference white (or white point), denoted [W], is reproduced by the additive mixing of the same amount of each primary. The choice of the primaries depends on the device used, and the reference white is generally a CIE illuminant. The following sections present the most useful primary spaces.
They can be separated into real primary spaces, whose primary colors are physically feasible, and imaginary primary spaces, whose primaries do not physically exist.

a. Real Primary Spaces. The CIE (R, G, B) color space is derived from the color-matching experiments carried out by Wright and Guild and adopted by the CIE in 1931 (Commission Internationale de l'Éclairage, 1986), which used the primaries denoted here $[R_C]$, $[G_C]$, and $[B_C]$ (where C indicates CIE) to match all the available colors of the visible spectrum. These primaries are red, green, and blue monochromatic color stimuli with wavelengths of 700.0, 546.1, and
435.8 nm, respectively, and the reference white is the equal-energy illuminant E (Commission Internationale de l'Éclairage, 1986). The CIE (R, G, B) color space can be considered the reference (R, G, B) color space because it defines a standard observer whose eye spectral response approximates the average eye spectral response of a representative set of human observers. To the three primaries $[R_C]$, $[G_C]$, and $[B_C]$ correspond three normalized vectors $\overrightarrow{R_C}$, $\overrightarrow{G_C}$, and $\overrightarrow{B_C}$, respectively, which constitute the basis of a vector space whose origin is denoted O. In this space, each color stimulus [C] is represented by a point C that defines the color vector $\overrightarrow{OC}$. The coordinates of this vector are the tristimulus values $R_C$, $G_C$, and $B_C$. Some color vectors have negative coordinates because they correspond to color stimuli that cannot be reproduced (matched) by additive mixture. Points that correspond to color stimuli with positive tristimulus values lie inside a cube, named the RGB color cube (Figure 4). The origin O corresponds to black ($R_C = G_C = B_C = 0$), whereas the reference white is defined by the additive mixture of equal quantities of the three primaries ($R_C = G_C = B_C = 1$).

FIGURE 4. RGB color cube. (See Color Insert.)
The straight line joining the points Black and White in Figure 4 is called the gray axis, the neutral color axis, or the achromatic axis; its points represent shades of gray from black to white. The tristimulus values of a color stimulus depend on its luminance. Two different color stimuli can have the same chromatic characteristics (here called chrominance) but different tristimulus values because of their luminance. To obtain color components that do not depend on the luminance, their values must be normalized. For this purpose, each color component value can be divided by the sum of the three. The three resulting color components, called chromaticity coordinates or normalized coordinates, are denoted $r_C$, $g_C$, and $b_C$, respectively, and are defined by:

$$
\begin{cases}
r_C = \dfrac{R_C}{R_C + G_C + B_C},\\
g_C = \dfrac{G_C}{R_C + G_C + B_C},\\
b_C = \dfrac{B_C}{R_C + G_C + B_C}.
\end{cases}
\tag{1}
$$
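Eq. (1) translates directly into code. The small function below (its name is illustrative) also exhibits the property behind the projection onto the Maxwell triangle: multiplying a color vector by a positive factor, that is, changing only its luminance, leaves its chromaticity coordinates unchanged.

```python
def chromaticity(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Chromaticity coordinates of Eq. (1): each tristimulus value over R + G + B."""
    total = r + g + b
    if total == 0:
        raise ValueError("chromaticity is undefined when R + G + B = 0 (e.g., black)")
    return r / total, g / total, b / total


if __name__ == "__main__":
    print(chromaticity(2.0, 1.0, 1.0))  # a reddish stimulus
    print(chromaticity(8.0, 4.0, 4.0))  # same chromaticity, four times more luminous
```

Since the three coordinates always sum to 1, only two of them need to be stored, which is what the two-dimensional chromaticity diagram exploits.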
The transformation defined by Eq. (1) corresponds to the central projection, from the origin O, of the point C onto the plane that is normal to the achromatic axis and defined by the equation $R_C + G_C + B_C = 1$. The intersection between this plane and the RGB color cube is an equilateral triangle whose vertices are the three primaries $[R_C]$, $[G_C]$, and $[B_C]$. This triangle, termed the Maxwell triangle, is represented by a dotted line in Figure 4. The color space associated with the chromaticity coordinates is called the normalized $(R_C, G_C, B_C)$ color space and is denoted $(r_C, g_C, b_C)$. As $r_C + g_C + b_C = 1$, two components are sufficient to represent the chrominance of a color stimulus. Wright and Guild therefore proposed a diagram called the $(r_C, g_C)$ chromaticity diagram. Figure 5 shows this diagram, which contains a curve (the spectrum locus) joining the points corresponding to monochromatic color stimuli whose wavelengths range between 380 and 780 nm. The two ends of this curve are connected by a straight line known as the nonspectral line of purples. All the colors of the visible spectrum are contained in the area bounded by the spectrum locus and the purple line. In this figure, the Maxwell triangle does not contain all the colors, because several colors cannot be matched by additive mixture and have negative chromaticity coordinates.

FIGURE 5. CIE chromaticity diagram $(r_C, g_C)$.

As mentioned previously, the CIE (R, G, B) color space can be considered the reference (R, G, B) color space because it defines the standard observer. In many application fields, however, this color space cannot be used, and other primary colors must be used. These other (R, G, B) color spaces differ by the primaries used to match colors by additive mixture and by the reference white. Thus, they are device dependent
color spaces (Süsstrunk, Buckley and Swen, 1999). The most widely known (R, G, B) color spaces are used in the following application fields:
• Video and image acquisition. In color image analysis applications, images can be acquired by an analog or digital video camera, a digital still camera, a scanner, and so on. The color of each pixel of an image is defined by three numerical components that depend on the acquisition device and on its setup for obtaining a reference white. Thus, an image acquired under the same lighting and observation conditions by two different cameras produces different colors if the primaries associated with the two cameras are not the same. In this chapter, the (R, G, B) color space used for acquiring images is called the image acquisition space.
• Television image display. For analog TV signal transmission, it is assumed that the chromaticity coordinates of the primaries are different from those of
the CIE primaries because they correspond to phosphors or color filters with different wavelengths. The different TV signal transmission standards use different (R, G, B) color spaces. For example, the NTSC standard uses the $(R_F, G_F, B_F)$ color space based on the FCC primaries; the PAL and SECAM standards use the $(R_E, G_E, B_E)$ color space based on the EBU primaries; and the SMPTE-C standard uses the $(R_S, G_S, B_S)$ color space based on the SMPTE primaries. Likewise, the reference white to be reproduced depends on the display technology and its setup. For example, the reference white of the NTSC standard is generally assumed to be illuminant C, whereas illuminant D65 is used by the PAL/SECAM and SMPTE-C standards.
• Computer graphics. The primaries and reference white used by computer monitors are defined by their manufacturers and differ from those used in television or defined by the CIE. Likewise, when an image is generated by computer, the color of a pixel depends on the (R, G, B) color space that the software uses for image coding.
• Image printing. Printing systems usually use color spaces based on subtractive color mixture. The primaries are cyan, magenta, and yellow, and the primary color space is the (C, M, Y) color space, which can be derived from an (R, G, B) color space by the relation:

$$
\begin{pmatrix} C \\ M \\ Y \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
- \begin{pmatrix} R \\ G \\ B \end{pmatrix}.
\tag{2}
$$

The reference white used in printing systems is often illuminant D50.

This multitude of primary color spaces leads to many errors in color management. An (R, G, B) color space suited to CRT and LCD monitors, TVs, scanners, cameras, and printers is needed to ensure compatibility between these devices, so that colors are rendered identically (Bala and Sharma, 2005; Ramanath et al., 2005). This is why the sRGB color space has been proposed as a standard default (R, G, B) color space that characterizes an average monitor (IEC 61966-2-1/FDIS, 1999).
This color space can be considered a device-independent (or unrendered) color space for color management applications. Its reference white and primaries are defined by the International Telecommunication Union (ITU) in Recommendation ITU-R BT.709 (ITU-R BT.709-5, 2002). Thus, the sRGB color space is denoted here $(R_I, G_I, B_I)$. Regardless of which primary space is used, a conversion between primary spaces is a simple linear transformation of the component values, expressed by a matrix equation. Thus, there exists a transformation matrix P that allows conversions between (R, G, B) color spaces (Pratt, 1978, 1991).
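Such a conversion can be sketched by passing through the (X, Y, Z) space: if $P_{src}$ and $P_{dst}$ are the (R, G, B)-to-(X, Y, Z) matrices of the source and destination spaces, then $RGB_{dst} = P_{dst}^{-1} P_{src}\, RGB_{src}$. The sketch uses the rounded EBU and ITU matrices listed in Table 3 and assumes linear (not gamma-corrected) components and two spaces that share the same reference white (D65 for both here); converting between spaces with different whites would additionally require a chromatic adaptation step, which is not shown. The function name is hypothetical.

```python
import numpy as np

# Rounded (R, G, B) -> (X, Y, Z) matrices taken from Table 3 (both with reference white D65).
P_EBU = np.array([[0.431, 0.342, 0.178],
                  [0.222, 0.707, 0.071],
                  [0.020, 0.130, 0.939]])
P_ITU = np.array([[0.412, 0.358, 0.180],
                  [0.213, 0.715, 0.072],
                  [0.019, 0.119, 0.950]])


def rgb_to_rgb_matrix(p_src: np.ndarray, p_dst: np.ndarray) -> np.ndarray:
    """Matrix mapping linear source RGB to linear destination RGB through (X, Y, Z).

    Solves p_dst @ M = p_src, i.e. M = inv(p_dst) @ p_src, without forming the inverse.
    """
    return np.linalg.solve(p_dst, p_src)


if __name__ == "__main__":
    ebu_to_itu = rgb_to_rgb_matrix(P_EBU, P_ITU)
    print(np.round(ebu_to_itu, 3))
    # The shared D65 white (R = G = B = 1) maps to approximately (1, 1, 1).
    print(np.round(ebu_to_itu @ np.ones(3), 3))
```

Because the tabulated coefficients are rounded to three decimals, the white is only approximately preserved; exact matrices would map it exactly.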
For each primary space, an RGB color cube and chromaticity coordinates can be defined.

b. Imaginary Primary Space. The (R, G, B) color spaces present some major drawbacks:
• Because it is not possible to match all colors by additive mixture with a real primary space, the tristimulus values and chromaticity coordinates can be negative.
• The tristimulus values depend on the luminance, which is a linear transformation of the R, G, B color components.
• Because (R, G, B) color spaces are device dependent, there is a multitude of (R, G, B) color spaces with different characteristics.
The CIE therefore defined an imaginary primary space, named the CIE (X, Y, Z) color space, whose primary colors are imaginary (virtual or artificial), in order to overcome these problems of the primary spaces. In this space, the [X], [Y], and [Z] primaries are not physically realizable, but they have been defined so that all color stimuli are expressed by positive tristimulus values and so that one of these primaries, the [Y] primary, represents the luminance component. Because all the (R, G, B) color spaces can be converted to the (X, Y, Z) color space by linear transforms, this space is a device-independent color space. It defines an ideal observer, the CIE 1931 standard colorimetric observer, on which all colorimetry applications are based. Chromaticity coordinates can be deduced from the (X, Y, Z) color space to obtain a normalized (X, Y, Z) color space denoted (x, y, z). The chromaticity coordinates x, y, and z are derived from the X, Y, and Z tristimulus values by:

$$
\begin{cases}
x = \dfrac{X}{X + Y + Z},\\
y = \dfrac{Y}{X + Y + Z},\\
z = \dfrac{Z}{X + Y + Z}.
\end{cases}
\tag{3}
$$
As x + y + z = 1, z can be deduced from x and y. Thus, the colors can be represented in a plane called the (x, y) chromaticity diagram (Figure 6). In this diagram, all the colors are inside the area delimited by the spectrum locus and the purple line. Since the chromaticity coordinates of these colors are positive, the CIE primaries [X], [Y ], and [Z] allow matching of any color by additive color mixture.
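Because the chromaticity coordinates discard the absolute level of the stimulus, the tristimulus values can be recovered from x and y only if the luminance Y is kept as well. The sketch below implements Eq. (3) and this inverse, $X = xY/y$ and $Z = (1 - x - y)Y/y$, which follows directly from Eq. (3) since $X + Y + Z = Y/y$. The function names are illustrative, and the capitalized parameters simply follow colorimetric notation.

```python
def xyz_to_xyy(X: float, Y: float, Z: float) -> tuple[float, float, float]:
    """Chromaticity coordinates x, y of Eq. (3), returned together with the luminance Y."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("chromaticity is undefined when X + Y + Z = 0")
    return X / total, Y / total, Y


def xyy_to_xyz(x: float, y: float, Y: float) -> tuple[float, float, float]:
    """Inverse transform: since X + Y + Z = Y / y, X = x Y / y and Z = (1 - x - y) Y / y."""
    if y <= 0:
        raise ValueError("y must be positive")
    return x * Y / y, Y, (1.0 - x - y) * Y / y


if __name__ == "__main__":
    # D65 white point (x = 0.313, y = 0.329 in Table 2) with its luminance set to Y = 1.
    X, Y, Z = xyy_to_xyz(0.313, 0.329, 1.0)
    print(round(X, 3), Y, round(Z, 3))
```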
FIGURE 6. (x, y) chromaticity diagram.
The conversion from an (R, G, B) color space to the (X, Y, Z) color space is defined by the following equation:

$$
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
= P \begin{pmatrix} R \\ G \\ B \end{pmatrix},
\qquad
P = \begin{pmatrix} X_R & X_G & X_B \\ Y_R & Y_G & Y_B \\ Z_R & Z_G & Z_B \end{pmatrix}.
\tag{4}
$$

The coefficients of the P matrix are determined by the [R], [G], and [B] primaries and by the reference white, which is reproduced by an equal mixture of the three primaries. Generally, the primaries and the reference white are characterized by their x and y chromaticity coordinates. Table 1 lists the x, y, and z chromaticity coordinates of the primaries used by different (R, G, B) color spaces, and Table 2 lists those of the A, C, D65, D50, F2, and E illuminants.
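The matrix P can be computed from chromaticity coordinates alone. A standard construction, sketched below under the assumption that the reference white is given unit luminance ($Y_W = 1$), converts each primary and the white into tristimulus values with Y = 1 and then scales the three primary columns so that R = G = B = 1 reproduces the white. With the ITU primaries of Table 1 and the D65 white of Table 2, it reproduces the ITU matrix of Table 3 to within rounding. The function name is illustrative, and this is not necessarily the procedure used by Pascale (2003).

```python
import numpy as np


def primaries_to_xyz_matrix(primaries_xy, white_xy) -> np.ndarray:
    """Return the (R, G, B) -> (X, Y, Z) matrix P of Eq. (4).

    primaries_xy: ((xR, yR), (xG, yG), (xB, yB)), chromaticities of [R], [G], [B].
    white_xy: (xW, yW), chromaticity of the reference white, normalized to Y = 1.
    """
    def unit_luminance_xyz(x: float, y: float) -> np.ndarray:
        # Tristimulus values of chromaticity (x, y) scaled so that Y = 1.
        return np.array([x / y, 1.0, (1.0 - x - y) / y])

    columns = np.column_stack([unit_luminance_xyz(x, y) for x, y in primaries_xy])
    white = unit_luminance_xyz(*white_xy)
    # Luminance of each primary such that their sum reproduces the white point.
    scale = np.linalg.solve(columns, white)
    return columns * scale  # broadcasting scales column j by scale[j]


if __name__ == "__main__":
    itu_primaries = ((0.640, 0.330), (0.300, 0.600), (0.150, 0.060))  # Table 1
    d65 = (0.313, 0.329)                                             # Table 2
    print(np.round(primaries_to_xyz_matrix(itu_primaries, d65), 3))
```

The second row of P holds the luminance contributions of the three primaries; it sums to 1 because the white was normalized to Y = 1.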
TABLE 1
x, y, AND z CHROMATICITY COORDINATES OF DIFFERENT PRIMARIES

Standard   Primaries   x       y       z
CIE        $[R_C]$     0.735   0.265   0.000
           $[G_C]$     0.274   0.717   0.009
           $[B_C]$     0.167   0.009   0.824
FCC        $[R_F]$     0.670   0.330   0.000
           $[G_F]$     0.210   0.710   0.080
           $[B_F]$     0.140   0.080   0.780
SMPTE      $[R_S]$     0.630   0.340   0.030
           $[G_S]$     0.310   0.595   0.095
           $[B_S]$     0.155   0.070   0.775
EBU        $[R_E]$     0.640   0.330   0.030
           $[G_E]$     0.290   0.600   0.110
           $[B_E]$     0.150   0.060   0.790
ITU        $[R_I]$     0.640   0.330   0.030
           $[G_I]$     0.300   0.600   0.100
           $[B_I]$     0.150   0.060   0.790
TABLE 2
x, y, AND z CHROMATICITY COORDINATES OF DIFFERENT ILLUMINANTS AND THEIR CORRELATED COLOR TEMPERATURES (Tp)

Illuminant   Tp (K)   x       y       z
A            2856     0.448   0.407   0.145
C            6774     0.310   0.316   0.374
D50          5000     0.346   0.358   0.296
D65          6504     0.313   0.329   0.358
E            5400     0.333   0.333   0.333
F2           4200     0.372   0.375   0.253
A set of primaries defines a triangle in the (x, y) chromaticity diagram whose vertices are the chromaticity coordinates of these primaries. Examples of such triangles and of white points are shown in the (x, y) chromaticity diagram in Figure 7. This figure shows that none of these triangles contains all the visible colors; that is, no (R, G, B) color space can match all the colors by additive mixture. The set of colors that an (R, G, B) color space can match is called its gamut.
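Deciding whether a chromaticity lies in the gamut of an (R, G, B) color space therefore amounts to a point-in-triangle test in the (x, y) diagram. A minimal Python sketch follows, using the signs of 2-D cross products (our choice of method; the names are illustrative), with primaries taken from Table 1.

```python
def _cross(o, a, b):
    """z-component of (a - o) x (b - o) in the (x, y) plane."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_gamut(xy, primaries):
    """True if chromaticity xy lies inside (or on) the triangle of the primaries."""
    r, g, b = primaries
    d1 = _cross(r, g, xy)
    d2 = _cross(g, b, xy)
    d3 = _cross(b, r, xy)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)   # all signs agree => inside


ITU = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
FCC = [(0.670, 0.330), (0.210, 0.710), (0.140, 0.080)]
```

For example, the D65 white point (0.313, 0.329) lies inside the ITU triangle, whereas a saturated green such as (0.21, 0.65) lies inside the FCC triangle but outside the ITU one, illustrating that gamuts differ between (R, G, B) color spaces.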
FIGURE 7. Illuminants and gamuts in the (x, y) chromaticity diagram.
Table 3 provides examples of P transformation matrices used to convert (R, G, B) color spaces to the (X, Y, Z) color space (Pascale, 2003).

2. Luminance-Chrominance Spaces

Because a color can be expressed by its luminosity and chromaticity, several color spaces separate the luminance from the chrominance information. These color spaces can be grouped into the family of luminance-chrominance spaces. The components of a luminance-chrominance space are derived from the components of an (R, G, B) color space by linear or nonlinear transformations. The type of transformation depends on the type of luminance-chrominance space; these spaces can be classified as follows:
TABLE 3. (R, G, B) TO (X, Y, Z) COLOR SPACE CONVERSION (PASCALE, 2003)

Primary system   Color space      Reference white   P transformation matrix
CIE              (RC, GC, BC)     E                 [0.489  0.311  0.200]
                                                    [0.176  0.813  0.011]
                                                    [0.000  0.010  0.990]
FCC              (RF, GF, BF)     C                 [0.607  0.174  0.200]
                                                    [0.299  0.587  0.114]
                                                    [0.000  0.066  1.116]
SMPTE            (RS, GS, BS)     D65               [0.394  0.365  0.192]
                                                    [0.212  0.701  0.087]
                                                    [0.019  0.112  0.958]
EBU              (RE, GE, BE)     D65               [0.431  0.342  0.178]
                                                    [0.222  0.707  0.071]
                                                    [0.020  0.130  0.939]
ITU              (RI, GI, BI)     D65               [0.412  0.358  0.180]
                                                    [0.213  0.715  0.072]
                                                    [0.019  0.119  0.950]
• The perceptually uniform spaces, which provide a metric establishing a correspondence between a color difference perceived by a human observer and a distance measured in the color space.
• The television spaces, which separate the luminance signal from the chrominance signals for television signal transmission.
• The antagonist (or opponent color) spaces, which aim to reproduce the opponent color theory proposed by Hering.

Let L be the luminance component of a luminance-chrominance space. Depending on the luminance-chrominance space, the luminance component can represent the lightness, brightness, luminous intensity, luma, or luminance. In this chapter, the term luminance is used generically for any of these magnitudes. Let Chr1 and Chr2 be the two chrominance components of a luminance-chrominance space. A luminance-chrominance space is thus denoted (L, Chr1, Chr2).

a. Perceptually Uniform Spaces. Nonuniformity is one drawback of the (X, Y, Z) color space. Indeed, in the (x, y) chromaticity diagram, equal distances between color points do not correspond to equal color differences perceived by a human observer; the correspondence varies with the area of the diagram considered. Thus there are regions in the (x, y) chromaticity diagram within which color differences are not perceptible by a human observer. The size and orientation of these regions, known as the MacAdam ellipses, depend on their positions in the diagram (MacAdam, 1985). This nonuniformity is a problem for applications in which color differences must be measured: colors that are perceptually close can be separated by long distances, whereas colors that are perceptually different can be close in the (x, y) chromaticity diagram.

Because the Euclidean distances evaluated in the (R, G, B) or (X, Y, Z) color spaces do not correspond to the color differences actually perceived by a human observer, the CIE recommends two perceptually uniform spaces, the (L*, u*, v*) and (L*, a*, b*) color spaces, where L* represents the lightness (luminance component) and u*, v* and a*, b* are chromaticity coordinates (chrominance components) (Commission Internationale de l'Éclairage, 1986).

The first perceptually uniform color space proposed by the CIE, in 1960, was the (U, V, W) color space, which is derived from the (X, Y, Z) color space. In this color space, V is a luminance component, and it is possible to define a uniform chromaticity diagram (called the 1960 Uniform Chromaticity Scale [UCS] diagram or (u, v) chromaticity diagram). Because the (U, V, W) color space does not solve the nonuniformity problem, the CIE proposed the (U*, V*, W*) color space in 1964 (Pratt, 1978, 1991). This color space finally underwent significant modification in 1976 to become the (L*, u*, v*) color space (also named the CIELUV color space). The luminance component of the (L*, u*, v*) color space is expressed as:

$$ L^* = \begin{cases} 116 \times \sqrt[3]{\dfrac{Y}{Y_W}} - 16 & \text{if } \dfrac{Y}{Y_W} > 0.008856, \\ 903.3 \times \dfrac{Y}{Y_W} & \text{if } \dfrac{Y}{Y_W} \le 0.008856, \end{cases} \tag{5} $$
where X_W, Y_W, and Z_W are the tristimulus values of the reference white. The chrominance components are

$$ u^* = 13 \times L^* \times (u' - u'_W), \tag{6} $$
$$ v^* = 13 \times L^* \times (v' - v'_W), \tag{7} $$

with

$$ u' = \frac{4X}{X + 15Y + 3Z}, \tag{8} $$
$$ v' = \frac{9Y}{X + 15Y + 3Z}, \tag{9} $$

where u'_W and v'_W are the values of u' and v' for the reference white. Figure 8 shows the CIE 1976 (u', v') chromaticity diagram derived from the (L*, u*, v*) color space.
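A minimal Python transcription of Eqs. (5)–(9), assuming tristimulus values expressed on the same scale as the reference white. The function names, and the choice to return u* = v* = 0 for black (where u' and v' are undefined), are ours.

```python
def cie_lightness(Y, Yw):
    """L* of Eq. (5)."""
    t = Y / Yw
    return 116.0 * t ** (1.0 / 3.0) - 16.0 if t > 0.008856 else 903.3 * t


def uv_prime(X, Y, Z):
    """u' and v' of Eqs. (8) and (9)."""
    d = X + 15.0 * Y + 3.0 * Z
    return 4.0 * X / d, 9.0 * Y / d


def xyz_to_luv(X, Y, Z, white):
    """(L*, u*, v*) of Eqs. (5)-(9); white = (Xw, Yw, Zw)."""
    L = cie_lightness(Y, white[1])
    if X + 15.0 * Y + 3.0 * Z == 0:      # black: u', v' undefined, set u* = v* = 0
        return L, 0.0, 0.0
    u, v = uv_prime(X, Y, Z)
    uw, vw = uv_prime(*white)
    return L, 13.0 * L * (u - uw), 13.0 * L * (v - vw)
```

The two branches of Eq. (5) meet at Y/Y_W = 0.008856, where both give L* ≈ 8, so the lightness scale is continuous; the reference white itself maps to L* = 100 and u* = v* = 0.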
FIGURE 8. (u', v') Chromaticity diagram.
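The (L*, a*, b*) color space proposed by the CIE (also named the CIELAB color space) is derived from the (X, Y, Z) color space by nonlinear relations. The luminance component is defined by Eq. (5), and the chrominance components are expressed as

$$ a^* = 500 \times \left[ f\!\left(\frac{X}{X_W}\right) - f\!\left(\frac{Y}{Y_W}\right) \right], \tag{10} $$
$$ b^* = 200 \times \left[ f\!\left(\frac{Y}{Y_W}\right) - f\!\left(\frac{Z}{Z_W}\right) \right], \tag{11} $$

with

$$ f(x) = \begin{cases} \sqrt[3]{x} & \text{if } x > 0.008856, \\ 7.787\,x + \dfrac{16}{116} & \text{if } x \le 0.008856. \end{cases} \tag{12} $$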
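Eqs. (5) and (10)–(12) translate just as directly. The sketch below (illustrative names, same assumptions as for CIELUV) also computes the Euclidean distance between two (L*, a*, b*) triples, which is the CIE 1976 color difference ΔE*ab that motivates working in a perceptually uniform space.

```python
def lab_f(t):
    """The function f of Eq. (12)."""
    return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0


def xyz_to_lab(X, Y, Z, white):
    """(L*, a*, b*) of Eqs. (5), (10), and (11); white = (Xw, Yw, Zw)."""
    Xw, Yw, Zw = white
    t = Y / Yw
    L = 116.0 * t ** (1.0 / 3.0) - 16.0 if t > 0.008856 else 903.3 * t  # Eq. (5)
    fx, fy, fz = lab_f(X / Xw), lab_f(t), lab_f(Z / Zw)
    return L, 500.0 * (fx - fy), 200.0 * (fy - fz)


def delta_e_ab(lab1, lab2):
    """Euclidean distance between two (L*, a*, b*) triples (CIE 1976 Delta E*ab)."""
    return sum((p - q) ** 2 for p, q in zip(lab1, lab2)) ** 0.5
```

Any stimulus proportional to the reference white (X/X_W = Y/Y_W = Z/Z_W) yields a* = b* = 0, so the achromatic colors lie on the L* axis.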
In these last two uniform color spaces, the luminance component L* corresponds to the lightness (or brightness) and represents the response of the human eye to a level of luminance. The CIE models this nonlinear response by a cube-root relation. The first chrominance component (a* or u*) of these two color spaces corresponds to a green-red color opposition, whereas the second chrominance component (b* or v*) corresponds to a blue-yellow color opposition.

b. Television Spaces. The transmission of TV signals requires the separation of the luminance and chrominance components. This separation can be achieved by a linear transformation of the component values of an (R, G, B) color space. The luminance component corresponds to the Y color component of the (X, Y, Z) color space. The Chr1 and Chr2 chrominance components are as follows:
$$ \begin{aligned} Chr_1 &= a_1 (R - Y) + b_1 (B - Y), \\ Chr_2 &= a_2 (R - Y) + b_2 (B - Y). \end{aligned} \tag{13} $$

The coefficients a1, b1, a2, and b2 are specific to the norms, standards, or commissions used (e.g., NTSC, PAL, SECAM, SMPTE). Because the Y color component is evaluated by means of a linear transformation of the R, G, and B components, the conversion of an (R, G, B) color space [or of the (X, Y, Z) color space] to a (Y, Chr1, Chr2) color space can be expressed by a transformation matrix Q:

$$ \begin{bmatrix} Y \\ Chr_1 \\ Chr_2 \end{bmatrix} = Q \times \begin{bmatrix} R \\ G \\ B \end{bmatrix} = Q \times P^{-1} \times \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}, \tag{14} $$

where P is the matrix of Eq. (4). Notice that the signals transmitted by TV channels are gamma corrected for several reasons (display, bandwidth, human eye response, and so forth). Therefore, the conversion of an (R, G, B) color space to a television space is applied to gamma-corrected R, G, and B components, denoted R', G', and B'. The prime (') notation is extended to all gamma-corrected color components. The main television spaces are as follows:
• (Y', I', Q') color space, used by the NTSC television standard, where Y' is the luminance component and I' and Q' are the chrominance components. The Q transformation matrix (listed in Table 4) defines the Y' color component as:

$$ Y' = 0.299 \times R' + 0.587 \times G' + 0.114 \times B'. \tag{15} $$

The chrominance components are defined by Eq. (13) with a1 = 0.74, b1 = −0.27, a2 = 0.48, and b2 = 0.41.
• (Y', U', V') color space, used by the EBU broadcasting standard, where the chrominance components are U' and V'. To ensure compatibility among the different TV standards, the luminance component of the EBU standard is also defined by Eq. (15). The chrominance components are defined by Eq. (13) with a1 = 0, b1 = 0.493, a2 = 0.877, and b2 = 0.
• (Y', Dr', Db') color space, used by the SECAM color TV broadcasting standard, where the chrominance components are Dr' and Db'. The luminance component is also defined by Eq. (15), and the chrominance components are defined by Eq. (13) with a1 = −1.9, b1 = 0, a2 = 0, and b2 = 1.5.
• (Y', Cb', Cr') color space, an international standard for digital image and video coding. The ITU proposes two recommendations:
  ◦ The ITU-R BT.601 recommendation for the digital coding of standard-definition television (SDTV) signals. According to this recommendation, the (Y', Cb', Cr') color space is independent of the primaries and the reference white (ITU-R BT.601-7, 2007); it is also used by image- and video-compression schemes such as JPEG and MPEG. The luminance component is also defined by Eq. (15), and the chrominance components are defined by Eq. (13) with a1 = 0, b1 = 0.564, a2 = 0.713, and b2 = 0.
  ◦ The ITU-R BT.709 recommendation for the digital coding of high-definition television (HDTV) signals (ITU-R BT.709-5, 2002). Its luminance component uses the weights 0.2126, 0.7152, and 0.0722 (see Table 4) instead of those of Eq. (15), and its chrominance components are defined by Eq. (13) with a1 = 0, b1 = 0.539, a2 = 0.635, and b2 = 0.

Table 4 lists some transformation matrices to convert a primary space to one of the main TV spaces.

c. Antagonist Spaces. The antagonist spaces are based on Hering's opponent color theory, in order to model the human visual system. According to this theory, the color information acquired by the eye is transmitted to the brain in three components: an achromatic component A and two chrominance components C1 and C2.
The A color component integrates the signals derived from the three types of cones of the human retina and represents a black-white opposition signal, whereas the C1 and C2 components integrate only the signals provided by different types of cones and correspond to green-red and yellow-blue oppositions, respectively. The (L*, u*, v*) and (L*, a*, b*) color spaces presented previously can also be considered antagonist spaces because they share these properties. Different antagonist spaces have been proposed for color image analysis. In 1976, Faugeras proposed a human visual system model in which the A, C1, and C2 color components are evaluated from three primaries, denoted [L], [M], and [S], which correspond to the three types of cones of the human retina (Faugeras, 1979). He proposed transformation matrices to convert an (R, G, B) color space to the (L, M, S) color space. For example,
TABLE 4. PRIMARY SPACE TO TELEVISION SPACE CONVERSION

Norm–Standard   Television space   Primary space   Reference white   Q transformation matrix
NTSC            (Y', I', Q')       (RF, GF, BF)    C                 [ 0.299   0.587   0.114]
                                                                     [ 0.596  −0.274  −0.322]
                                                                     [ 0.212  −0.523   0.311]
PAL             (Y', U', V')       (RE, GE, BE)    D65               [ 0.299   0.587   0.114]
                                                                     [−0.148  −0.289   0.437]
                                                                     [ 0.615  −0.515  −0.100]
SECAM           (Y', Dr', Db')     (RE, GE, BE)    D65               [ 0.299   0.587   0.114]
                                                                     [−1.333   1.116   0.217]
                                                                     [−0.450  −0.883   1.333]
ITU-R BT.601    (Y', Cb', Cr')     None            None              [ 0.299   0.587   0.114]
                                                                     [−0.169  −0.331   0.500]
                                                                     [ 0.500  −0.419  −0.081]
ITU-R BT.709    (Y', Cb', Cr')     (RI, GI, BI)    D65               [ 0.213   0.715   0.072]
                                                                     [−0.115  −0.386   0.500]
                                                                     [ 0.501  −0.455  −0.046]
the conversion of an (R, G, B) color space used by a CRT monitor to the (L, M, S) color space, with the C illuminant as reference white, can be achieved by the following transformation matrix:

$$ P = \begin{bmatrix} 0.3634 & 0.6102 & 0.0264 \\ 0.1246 & 0.8138 & 0.0616 \\ 0.0009 & 0.0602 & 0.9389 \end{bmatrix}. \tag{16} $$

The A, C1, and C2 color components are defined by:

$$ A = a \left( \alpha \log(L) + \beta \log(M) + \gamma \log(S) \right), \tag{17} $$
$$ C_1 = u_1 \left( \log(L) - \log(M) \right), \tag{18} $$
$$ C_2 = u_2 \left( \log(L) - \log(S) \right). \tag{19} $$
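A sketch of Faugeras's model in Python, combining the matrix of Eq. (16) with Eqs. (17)–(19). The default parameter values are the coefficients Faugeras uses for color image analysis, as quoted in the next paragraph. The text does not state the logarithm base, so the natural logarithm used here is an assumption, as is the requirement that R, G, and B be strictly positive (the logarithm is undefined at 0). The function name is illustrative.

```python
import math

# Eq. (16): (R, G, B) of a CRT monitor -> (L, M, S), reference white C.
P_LMS = [[0.3634, 0.6102, 0.0264],
         [0.1246, 0.8138, 0.0616],
         [0.0009, 0.0602, 0.9389]]


def faugeras_opponent(rgb, a=22.6, alpha=0.612, beta=0.369, gamma=0.019,
                      u1=64.0, u2=10.0):
    """A, C1, C2 of Eqs. (17)-(19); components must be strictly positive."""
    L, M, S = (sum(P_LMS[i][j] * rgb[j] for j in range(3)) for i in range(3))
    logL, logM, logS = math.log(L), math.log(M), math.log(S)
    A = a * (alpha * logL + beta * logM + gamma * logS)   # Eq. (17)
    C1 = u1 * (logL - logM)                               # Eq. (18)
    C2 = u2 * (logL - logS)                               # Eq. (19)
    return A, C1, C2
```

Each row of the matrix of Eq. (16) sums to 1, so any gray input R = G = B gives L = M = S and hence C1 = C2 = 0: the two chrominance components vanish on the achromatic axis, as an opponent model requires.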
These equations express the nonlinearity of the cone response to a color stimulus: Faugeras models this nonlinearity with the logarithmic function, whereas the CIE uses the cube-root function [see Eq. (5)]. By adjusting the a, α, β, γ, u1, and u2 parameters, Faugeras proposes different applications of the model. For example, he uses the following coefficients for color image analysis: a = 22.6, α = 0.612, β = 0.369, γ = 0.019, u1 = 64, and u2 = 10.

In the framework of artificial vision, Garbay proposed applying this model directly to the R, G, and B color components of an image acquisition system (Chassery and Garbay, 1984; Garbay, Brugal and Choquet, 1981). The A, C1, and C2 components are then defined by:

$$ A = \frac{1}{3} \left( \log(R) + \log(G) + \log(B) \right), \tag{20} $$
$$ C_1 = \frac{\sqrt{3}}{2} \left( \log(R) - \log(G) \right), \tag{21} $$
$$ C_2 = \log(B) - \frac{\log(R) + \log(G)}{2}. \tag{22} $$

Ballard and colleagues proposed an antagonist space that does not take into account the nonlinearity of the human eye response (Ballard and Brown, 1982; Swain and Ballard, 1991). The equation of this space, denoted (wb, rg, by), can thus be written with a transformation matrix:

$$ \begin{bmatrix} wb \\ rg \\ by \end{bmatrix} = P \times \begin{bmatrix} R \\ G \\ B \end{bmatrix}, \qquad \text{with} \quad P = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{\sqrt{3}}{2} & -\frac{\sqrt{3}}{2} & 0 \\ -\frac{1}{2} & -\frac{1}{2} & 1 \end{bmatrix}. \tag{23} $$
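Comparing Eqs. (20)–(22) with Eq. (23) shows that Garbay's components are Ballard's matrix applied to the logarithms of R, G, and B. The Python sketch below makes that relationship explicit; the names are illustrative, the natural logarithm is an assumption (the base is not stated), and strictly positive components are required for the logarithmic variant.

```python
import math

SQ3_2 = math.sqrt(3.0) / 2.0

# Eq. (23): Ballard's linear opponent transform.
P_BALLARD = [[1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0],
             [SQ3_2, -SQ3_2, 0.0],
             [-0.5, -0.5, 1.0]]


def apply3(m, vec):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * vec[j] for j in range(3)) for i in range(3))


def ballard_wb_rg_by(rgb):
    """(wb, rg, by) of Eq. (23)."""
    return apply3(P_BALLARD, rgb)


def garbay_opponent(rgb):
    """A, C1, C2 of Eqs. (20)-(22): the matrix of Eq. (23) applied to the logs."""
    return apply3(P_BALLARD, [math.log(c) for c in rgb])
```

Both variants send gray levels (R = G = B) to zero chrominance; they differ only in whether the cone nonlinearity is modeled before the opponent combination.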
Other human visual system models have been applied in the color image analysis field, such as the (G1, G2, G3), (G*1, G*2, G*3), or (H1, H2, H3) color spaces (Braquelaire and Brun, 1997; Robinson, 1977; Pratt, 1978).

d. Other Luminance-Chrominance Spaces. Other luminance-chrominance spaces, which cannot be directly classified in the previous subfamilies, are applied in color image analysis. By studying the properties of different luminance-chrominance spaces, Lambert and Carron (1999) proposed a transformation matrix to convert an (R, G, B) color space to a luminance-chrominance space denoted (Y, Ch1, Ch2), in which the luminance component is defined by the wb color component of Eq. (23), whereas the chrominance components are defined as follows:

$$ \begin{bmatrix} Y \\ Ch_1 \\ Ch_2 \end{bmatrix} = P \times \begin{bmatrix} R \\ G \\ B \end{bmatrix}, \qquad \text{with} \quad P = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ 1 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & -\frac{\sqrt{3}}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}. \tag{24} $$

The Ch1 and Ch2 color components correspond to cyan-red and green-blue color oppositions, respectively. By associating the r and g chromaticity coordinates with the luminance component defined by Eq. (23), denoted here I, the (I, r, g) luminance-chrominance space is defined. This space is often used in color image analysis (Nevatia, 1977; Ohta, Kanade and Sakai, 1980; Sarabi and Aggarwal, 1981). The CIE proposed the (Y, x, y) color space, where Y, the luminance tristimulus value of the (X, Y, Z) color space, represents the luminance component, and x and y, the chromaticity coordinates of the (X, Y, Z) color space [see Eq. (3)], represent the chrominance.

3. Perceptual Spaces

Humans do not directly perceive color as a combination of tristimulus values related to primary colors but according to more subjective entities related to luminosity, hue, and saturation. Therefore, many color spaces quantify color according to these perceptual data; they are grouped here into the perceptual space family. This perceptual approach allows adequate communication between humans and machines for describing colors. Many such color spaces exist, presented with different notations such as ISH, HSB, HSL, HSV, LCH, LSH, and so on. These notations correspond to the same color components, but the equations used to determine them differ. Two kinds of perceptual spaces can be distinguished:
• The polar (or cylindrical) coordinate spaces, which correspond to expressions in polar coordinates of the luminance-chrominance components²
• The perceptual coordinate spaces, which are directly evaluated from primary spaces.

Perceptual spaces can also be considered luminance-chrominance spaces because they consist of a luminance component and two chrominance components.

a. Polar Coordinate Spaces. This family of color spaces is derived from the color spaces that separate color information into a luminance axis and a chrominance plane, by transposing Cartesian coordinates into polar coordinates. Let P be a point that represents a color in an (L, Chr1, Chr2) luminance-chrominance space. This point is defined by the coordinates of the color vector OP in the reference frame (O, L, Chr1, Chr2).
Let P' be the projection of P on the (O, Chr1, Chr2) plane along the L axis, and let O' be the orthogonal projection of P on the L axis. By construction, the vectors O'P and OP' have equal norms and orientations, as do the vectors OO' and P'P. Thus it is possible to locate the point P via the norm of the vector OP', the angle between the OP' and Chr1 vectors, and its coordinate along the L axis, which is equal to the norm of the vector OO'. The three components so obtained are denoted L, C, and H and constitute the (L, C, H) polar coordinate space (see Figure 9).

² These color spaces are also named spherical color transforms (SCT).
FIGURE 9. Polar coordinate space.
The first component of a polar coordinate space represents the luminance component, L, which is identical to the first color component of the corresponding luminance-chrominance space. The norm of the vector OP' represents the chroma, C, which is defined by

$$ C = \lVert \overrightarrow{OP'} \rVert = \sqrt{Chr_1^2 + Chr_2^2}. \tag{25} $$

The chroma corresponds to the distance between the point P and the luminance axis. The angle H between the Chr1 vector and the vector OP' represents the hue. In order to obtain hue values ranging between 0 and 2π, the evaluation of H must satisfy these conditions:

$$ \begin{cases} \text{if } Chr_1 > 0 \text{ and } Chr_2 \ge 0 & \text{then } 0 \le H < \frac{\pi}{2}, \\ \text{if } Chr_1 \le 0 \text{ and } Chr_2 > 0 & \text{then } \frac{\pi}{2} \le H < \pi, \\ \text{if } Chr_1 < 0 \text{ and } Chr_2 \le 0 & \text{then } \pi \le H < \frac{3\pi}{2}, \\ \text{if } Chr_1 \ge 0 \text{ and } Chr_2 < 0 & \text{then } \frac{3\pi}{2} \le H < 2\pi. \end{cases} \tag{26} $$
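The conditions of Eq. (26) are exactly what a quadrant-aware arctangent provides. In the Python sketch below (names are illustrative), math.atan2 performs in one call the quadrant corrections that the text applies after Eq. (27); a second function follows the arctan-plus-corrections procedure so that the two can be compared. Returning H = 0 when C = 0, where the hue is undefined, is our convention.

```python
import math


def to_polar(L, chr1, chr2):
    """(L, C, H) of Eqs. (25)-(26); H in [0, 2*pi), and 0 when C = 0 (undefined)."""
    C = math.hypot(chr1, chr2)                 # Eq. (25)
    H = math.atan2(chr2, chr1)                 # quadrant-aware, in [-pi, pi]
    if H < 0:
        H += 2.0 * math.pi
    if H >= 2.0 * math.pi:                     # guard against rounding up to 2*pi
        H = 0.0
    return L, C, H


def hue_arctan(chr1, chr2):
    """Eq. (27) followed by the two corrections described in the text."""
    if chr1 == 0:                              # Eq. (27) undefined: read Eq. (26) directly
        if chr2 > 0:
            return math.pi / 2.0
        return 1.5 * math.pi if chr2 < 0 else 0.0
    H = math.atan(chr2 / chr1)                 # in (-pi/2, pi/2)
    if chr1 < 0:
        H = H + math.pi if chr2 >= 0 else H - math.pi
    if H < 0:
        H += 2.0 * math.pi
    return H
```

Applied to (a*, b*) or (u*, v*), to_polar yields the CIE chroma and hue angle used for the (L*, C*, h) spaces discussed below.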
Trigonometric functions are used to evaluate H. For example, H can be defined by:

$$ H = \arctan\left(\frac{Chr_2}{Chr_1}\right). \tag{27} $$

This equation gives an angle in the interval [−π/2, π/2] (and is undefined when Chr1 = 0, in which case H = π/2 or 3π/2 according to the sign of Chr2). In order to obtain an angle in the interval [0, 2π) that satisfies Eq. (26), a first correction is applied when Chr1 < 0: if Chr2 ≥ 0 then H = H + π, else H = H − π. This first operation gives an angle in the interval [−π, π]. A second operation is then applied: H = H + 2π if H < 0.

By definition, it is possible to construct an (L, C, H) color space from any of the luminance-chrominance spaces. For example, by using Eqs. (25) and (27), the CIE defines the components of two (L, C, H) polar coordinate spaces from the perceptually uniform spaces (L*, u*, v*) and (L*, a*, b*). These spaces are denoted (L*uv, C*uv, huv) and (L*ab, C*ab, hab), respectively (Commission Internationale de l'Éclairage, 1986). The components of these color spaces are used to evaluate color differences (Commission Internationale de l'Éclairage, 1986, 1995).

b. Perceptual Coordinate Spaces. The perceptual coordinate spaces are evaluated directly from a primary space and represent the subjective entities of human color perception in terms of intensity (I), saturation (S), and hue (T). The intensity I corresponds to a luminance component, and the saturation S is related to the chroma component by the relation S = C/L. The perceptual coordinate spaces are denoted here (I, S, T) to differentiate them from the polar coordinate spaces. This part presents the most widely used perceptual coordinate spaces (Shih, 1995):
• The triangle model
• The hexcone model
• The double hexcone model.

To propose a perceptual coordinate space that takes into account most of these models, Levkowitz and Herman (1993) proposed a generalized lightness, hue, and saturation (GLHS) color model.

i. Triangle Model.
The triangle model corresponds to the expression of the I, S, and T color components in the RGB color cube (Kender, 1976). In the representation of the RGB color cube (see Figure 4), the achromatic axis corresponds to the intensity axis of the (I, S, T) color space. A point P, whose coordinates are the values of the R, G, and B color components, is located on a plane perpendicular to the achromatic axis. The intersections of this plane with the red, green, and blue axes are the vertices of a triangle (α, β, γ), which is proportional to the Maxwell triangle (Figure 10). On the (α, β, γ) triangle, it is possible to evaluate the saturation and hue components. Different methods have been proposed, and many formulations of the (I, S, T) color spaces exist. Some of them are presented here.

FIGURE 10. Triangle model. (See Color Insert.)

Let O' be the orthogonal projection of P on the achromatic axis. The I color component corresponds to the norm of the vector OO'. To normalize its value to unity when R = G = B = 1, the I color component is evaluated by the equation:

$$ I = \frac{1}{\sqrt{3}} \lVert \overrightarrow{OO'} \rVert = \frac{R + G + B}{3}. \tag{28} $$

To simplify this calculation, a nonweighted formulation of the previous equation is often used:

$$ I = R + G + B. \tag{29} $$
The saturation corresponds to the ratio between the norm of the vector O'P (the distance between the point P and the achromatic axis) and the norm of the vector OO', which represents the intensity component. The S color component is calculated by

$$ S = \frac{\lVert \overrightarrow{O'P} \rVert}{\lVert \overrightarrow{OO'} \rVert} = \sqrt{2} \times \frac{\sqrt{R^2 + G^2 + B^2 - RG - GB - RB}}{R + G + B}. \tag{30} $$
In this formulation, colors with equal saturation values are located on a cone with a circular base, centered on the achromatic axis, with O as its apex. The saturation reaches its maximal value only for the three primary colors. In order to maximize the saturation value for all points on the boundaries of the (α, β, γ) triangle (such as those corresponding to complementary colors), the saturation component is evaluated from the minimum distance between the point P and the boundaries of this triangle:

$$ S = 1 - \frac{3 \times \min(R, G, B)}{R + G + B}. \tag{31} $$

Different formulations of this equation exist, and it is important to show how they relate. Thus, another, equivalent equation can be used to define the S color component:

$$ S = 1 - 3 \times \min(r, g, b). \tag{32} $$
This equation uses the r, g, and b chromaticity coordinates and corresponds to the evaluation of the saturation component directly in the Maxwell triangle. The saturation component can also be expressed as a function of the intensity component of Eq. (29):

$$ S = 1 - \frac{3 \times \min(R, G, B)}{I}. \tag{33} $$

The hue component corresponds to the orientation of the vector O'P. Let M be the intersection point between the achromatic axis and the Maxwell triangle, and let P' be the projection of the point P on this triangle along the achromatic axis. Generally, the axis defined by the point M and the Red point [coordinates (1, 0, 0)] of the RGB color cube is the reference axis used to define the orientation of the vector O'P and to specify the value of the hue component. So

$$ T = \widehat{\left(\overrightarrow{M\,\mathrm{Red}}, \overrightarrow{MP'}\right)}, $$

and the hue of red is set to 0. The hue component can be evaluated via trigonometric relations. For example, the following equation defines the T color component:

$$ T = \arctan\left( \frac{\sqrt{3}\,(G - B)}{2R - G - B} \right). \tag{34} $$
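To obtain hue values between 0 and 2π, it is necessary to test the signs of the numerator and the denominator. Another, analogous trigonometric equation can be used to define the T color component:

$$ T = \begin{cases} \arccos\left( \dfrac{\frac{1}{2}\left[(R-G) + (R-B)\right]}{\sqrt{(R-G)^2 + (R-B)(G-B)}} \right) & \text{if } B \le G, \\[3ex] 2\pi - \arccos\left( \dfrac{\frac{1}{2}\left[(R-G) + (R-B)\right]}{\sqrt{(R-G)^2 + (R-B)(G-B)}} \right) & \text{if } B > G. \end{cases} \tag{35} $$

With this formulation, it is necessary to test whether B > G in order to obtain the angles between π and 2π. By using the chromaticity coordinates, Eq. (35) can be written as

$$ T = \begin{cases} \arccos\left( \dfrac{2r - g - b}{\sqrt{6 \times \left[(r - \frac{1}{3})^2 + (g - \frac{1}{3})^2 + (b - \frac{1}{3})^2\right]}} \right) & \text{if } b \le g, \\[3ex] 2\pi - \arccos\left( \dfrac{2r - g - b}{\sqrt{6 \times \left[(r - \frac{1}{3})^2 + (g - \frac{1}{3})^2 + (b - \frac{1}{3})^2\right]}} \right) & \text{if } b > g. \end{cases} \tag{36} $$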
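A compact Python sketch of one (I, S, T) combination from the triangle model, using Eq. (28) for I, Eq. (31) for S, and Eq. (35) for T. Returning None for an undefined hue and clamping the arccos argument against rounding errors are our implementation choices; the function name is illustrative.

```python
import math


def triangle_ist(R, G, B):
    """(I, S, T) of the triangle model: Eqs. (28), (31), and (35).

    Returns T = None when the hue is undefined (R = G = B).
    """
    total = R + G + B
    I = total / 3.0                                   # Eq. (28)
    if total == 0:
        return 0.0, 0.0, None                         # black: S and T undefined
    S = 1.0 - 3.0 * min(R, G, B) / total              # Eq. (31)
    den = math.sqrt((R - G) ** 2 + (R - B) * (G - B))
    if den == 0:
        return I, S, None                             # achromatic: hue undefined
    cos_t = 0.5 * ((R - G) + (R - B)) / den
    theta = math.acos(max(-1.0, min(1.0, cos_t)))     # clamp rounding errors
    T = theta if B <= G else 2.0 * math.pi - theta    # Eq. (35)
    return I, S, T
```

Because Eq. (35) and Eq. (34) with the quadrant test describe the same angle, the result can be cross-checked against math.atan2(√3(G − B), 2R − G − B) taken modulo 2π; red, green, and blue map to T = 0, 2π/3, and 4π/3.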
All these formulas correspond to the triangle model and can be combined to constitute the (I, S, T) color space.

ii. Hexcone Model. Each point P, whose coordinates are the values of the R, G, and B color components, belongs to a face of a color subcube whose vertex O' corresponds to the maximum level of the R, G, and B components. The projection of the points of this subcube along the achromatic axis onto the plane perpendicular to this axis and passing through O' constitutes a closed hexagonal area whose vertices are the projections of the vertices of the subcube so defined and whose center is the point O'. In this model, which is represented in Figure 11, it is thus possible to define the I, S, and T color components. The intensity component, known as the value and denoted V, is represented by this achromatic point and is expressed as

$$ I = V = \frac{1}{\sqrt{3}} \lVert \overrightarrow{OO'} \rVert = \max(R, G, B). \tag{37} $$

The projection onto the plane perpendicular to the achromatic axis and passing through the vertex O' of the subcube so constructed defines a hexagon in which lies the point P', the projection of P. The saturation component corresponds to the length of the segment O'P' divided by the maximal length for the same hue. In a form analogous to Eq. (33), it is expressed as:

$$ S = \frac{V - \min(R, G, B)}{V}. \tag{38} $$
FIGURE 11. Hexcone model. (See Color Insert.)
When V = 0, the saturation component cannot be defined and is set to S = 0. By construction, the White point is also the projection of the Black point, so the projection of the vector from Black to Red is the vector from White to Red. Let M be the projection of P on the plane perpendicular to the achromatic axis and passing through the White point. The hue component is defined as the angle between the vector from White to M and the vector from White to Red:

$$ T = \widehat{\left(\overrightarrow{\mathrm{White\,Red}}, \overrightarrow{\mathrm{White}\,M}\right)}, $$

and the T color component is evaluated, for all S ≠ 0, by

$$ T = \begin{cases} \dfrac{G - B}{V - \min(R,G,B)} & \text{if } V = R, \\[2ex] 2 + \dfrac{B - R}{V - \min(R,G,B)} & \text{if } V = G, \\[2ex] 4 + \dfrac{R - G}{V - \min(R,G,B)} & \text{if } V = B. \end{cases} \tag{39} $$

If S = 0 (when R = G = B), the hue component is not defined. Moreover, if V = R and min(R, G, B) = G, then T is negative. The hexcone model is defined by Eqs. (37), (38), and (39) (Foley, Dam and Feiner, 1990; Marcu and Abe, 1995; Shih, 1995).
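A direct Python transcription of Eqs. (37)–(39), assuming R, G, and B are normalized to [0, 1]. T is expressed in sixths of a full turn (multiply by 60 for degrees). Adding 6 to negative values, which arise in the V = R, min = G case noted above, and returning None for an undefined hue are our conventions; the function name is illustrative.

```python
def hexcone_vst(R, G, B):
    """(V, S, T) of Eqs. (37)-(39), with R, G, B in [0, 1].

    T is in sextants [0, 6); T = None when the hue is undefined (S = 0).
    """
    V = max(R, G, B)                                   # Eq. (37)
    mn = min(R, G, B)
    if V == 0:
        return 0.0, 0.0, None                          # black: S set to 0
    S = (V - mn) / V                                   # Eq. (38)
    if S == 0:
        return V, 0.0, None                            # R = G = B
    d = V - mn
    if V == R:
        T = (G - B) / d
    elif V == G:
        T = 2.0 + (B - R) / d
    else:
        T = 4.0 + (R - G) / d                          # Eq. (39)
    if T < 0:
        T += 6.0                                       # wrap negative hues
    return V, S, T
```

For comparison, Python's standard colorsys.rgb_to_hsv implements the same hexcone model and returns the hue as T/6, wrapped into [0, 1).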
FIGURE 12. Double hexcone model. (See Color Insert.)
iii. Double Hexcone Model. This model is based on the same principle as the previous model, except that the projections of the subcube are made on each side of the plane perpendicular to the achromatic axis and passing through the middle of this axis (Foley, Dam and Feiner, 1990; Marcu and Abe, 1995; Shih, 1995). Figure 12 shows a representation of the double hexcone model. If min = min(R, G, B) and max = max(R, G, B), the intensity component is expressed as

$$ I = \frac{\max + \min}{2}. \tag{40} $$

Let Imax be the maximum value of the I color component. The saturation component is defined, for all I ≠ 0, by

$$ S = \begin{cases} \dfrac{\max - \min}{\max + \min} & \text{if } I \le \dfrac{I_{max}}{2}, \\[2ex] \dfrac{\max - \min}{2 \times I_{max} - \max - \min} & \text{if } I > \dfrac{I_{max}}{2}. \end{cases} \tag{41} $$
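A Python sketch of the double hexcone model, assuming components normalized to [0, 1] so that Imax = 1 by default. The hue reuses the hexcone rule of Eq. (39), and wrapping negative values by adding 6 is our convention, as is returning None for an undefined hue; the function name is illustrative.

```python
def double_hexcone_ist(R, G, B, i_max=1.0):
    """(I, S, T) of Eqs. (40), (41), and (39); T in sextants, None if undefined."""
    mx, mn = max(R, G, B), min(R, G, B)
    I = (mx + mn) / 2.0                               # Eq. (40)
    if mx == mn:                                      # includes I = 0: S = 0, hue undefined
        return I, 0.0, None
    if I <= i_max / 2.0:
        S = (mx - mn) / (mx + mn)                     # Eq. (41), lower cone
    else:
        S = (mx - mn) / (2.0 * i_max - mx - mn)       # Eq. (41), upper cone
    d = mx - mn
    if mx == R:
        T = (G - B) / d
    elif mx == G:
        T = 2.0 + (B - R) / d
    else:
        T = 4.0 + (R - G) / d                         # same rule as Eq. (39)
    if T < 0:
        T += 6.0
    return I, S, T
```

Python's standard colorsys.rgb_to_hls follows the same double hexcone model (with Imax = 1), returning hue, lightness, and saturation in that order, with the hue divided by 6.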
The saturation component is equal to 0 (S = 0) if I = 0, and the hue component is not defined if S = 0 (when R = G = B). Otherwise, the relations used to compute the T color component are the same as those of the hexcone color model above [see Eq. (39)].

iv. (L*uv, S*uv, huv) Color Space. This space was defined by the CIE at the same time as the (L*uv, C*uv, huv) and (L*ab, C*ab, hab) color spaces. In the (L*, u*, v*) color space, the CIE defines the saturation component as the ratio

$$ S^*_{uv} = \frac{C^*_{uv}}{L^*_{uv}}. \tag{42} $$
The components L*uv and huv are defined by Eqs. (5) and (27), respectively (Commission Internationale de l'Éclairage, 1986).

4. Independent Axis Spaces

Because the color components of an (R, G, B) color space are strongly correlated, they share common luminance information (Lee, Chang and Kim, 1994; Ohta, Kanade and Sakai, 1980). Therefore, many authors have attempted to determine color spaces whose components are independent (i.e., components that carry different, uncorrelated, and nonredundant information). A general solution consists of applying a Karhunen–Loeve transformation or a principal component analysis (PCA) to the components of a color space.

a. Principal Component Analysis. Principal component analysis is a data analysis method. Its goal is to analyze a set of quantitative data represented in a multidimensional space in order to obtain a representation space, possibly of reduced dimension, whose components, named principal components, are uncorrelated and carry different information. When the data are the values of the R, G, and B color components of the pixels of an image, principal component analysis provides a color space whose color components are as uncorrelated as possible and can be considered independently. For this purpose, the set of color vectors within a color image is characterized by its covariance matrix, which is diagonalized. The eigenvectors w_i of this matrix are determined, and the principal components X_i are computed by the relation X_i = w_i^T [R G B]^T. The Karhunen–Loeve transformation applies this relation to each of the X_i components; this linear transformation thus consists of projecting the data into another space. The largest eigenvalue λ_1 corresponds to the first principal component X_1, the component that mainly represents the image. The components are thus ordered by decreasing discriminant power (their eigenvalues λ_i). This transformation, which also allows the dimension of the color space to be reduced, is mainly applied to an (R, G, B) color space but can be derived from any color space (Savoji and Burge, 1985; Tominaga, 1992).

The drawback of PCA-based methods is that they depend on the statistical properties of a data set. In color image analysis, applying PCA to each analyzed color image is time consuming. To reduce computation time, several authors determined independent axis spaces that approximate the Karhunen–Loeve transformation by applying PCA to several different sets of images, such as the (I1, I2, I3), (P1, P2, P3), (I, J, K), or (i1new, i2new, i3new) color spaces (Ohta, Kanade and Sakai, 1980; Philipp and Rath, 2002; Robinson, 1977; Verikas, Malmqvist and Bergman, 1997).

In the framework of natural color image analysis, an experiment conducted by Ohta et al. in 1980 on eight different images led to a color space based on the Karhunen–Loeve transformation (Ohta, Kanade and Sakai, 1980). These authors proposed segmenting color images into regions by a recursive thresholding method, applying a PCA at each iteration step of the algorithm. They showed that a single transformation efficiently converts the (R, G, B) color space to a space denoted (I1, I2, I3). This transformation can be defined by the following matrix:

$$ \begin{bmatrix} I1 \\ I2 \\ I3 \end{bmatrix} = P \times \begin{bmatrix} R \\ G \\ B \end{bmatrix}, \qquad \text{with} \quad P = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{2} & 0 & -\frac{1}{2} \\ -\frac{1}{4} & \frac{1}{2} & -\frac{1}{4} \end{bmatrix}. \tag{43} $$
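To make the procedure concrete, here is a dependency-free Python sketch that computes the covariance matrix of a list of (R, G, B) pixels, diagonalizes it with the cyclic Jacobi method (our choice of eigen-solver; a numerical library would normally be used instead), and projects each pixel as X_i = w_i^T [R G B]^T without mean-centering, following the relation in the text. Ohta's fixed transform of Eq. (43) is included for comparison. All names are illustrative.

```python
import math


def covariance(pixels):
    """Sample covariance matrix (3x3) of a list of (R, G, B) tuples."""
    n = len(pixels)
    mean = [sum(p[k] for p in pixels) / n for k in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in pixels:
        d = [p[k] - mean[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov


def jacobi_eigen(a, sweeps=50):
    """Eigenpairs of a symmetric 3x3 matrix by cyclic Jacobi rotations,
    returned as (eigenvalue, unit eigenvector) sorted by decreasing eigenvalue."""
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j) < 1e-24:
            break
        for p in range(2):
            for q in range(p + 1, 3):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(3):  # A <- A J and V <- V J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(3):  # A <- J^T A (rows p and q); zeroes a[p][q]
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    pairs = [(a[i][i], [v[k][i] for k in range(3)]) for i in range(3)]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


def principal_components(pixels):
    """Eigen-decompose the covariance and project each pixel: X_i = w_i^T [R G B]^T."""
    eig = jacobi_eigen(covariance(pixels))
    ws = [w for _, w in eig]
    projected = [tuple(sum(w[k] * p[k] for k in range(3)) for w in ws) for p in pixels]
    return projected, eig


def ohta_i1i2i3(R, G, B):
    """Ohta's fixed approximation of the transform, Eq. (43)."""
    return (R + G + B) / 3.0, (R - B) / 2.0, (2.0 * G - R - B) / 4.0
```

On strongly correlated pixel data, the first eigenvector returned by this sketch is typically close to the achromatic direction (1, 1, 1)/√3, which is consistent with I1 in Eq. (43) behaving as a luminance component; the projected components have (numerically) zero cross-covariance.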
The first component of the (I1, I2, I3) color space is the most discriminating one and represents a luminance component because it satisfies Eq. (28). The two other color components represent the blue-red and magenta-green color oppositions, respectively. These two components are less discriminating than the first one because their corresponding eigenvalues are low, and the third component is not always used in color image analysis. Thus, the (I1, I2, I3) color space can also be considered a luminance-chrominance space.

b. Independent Component Analysis. Another way to obtain an independent axis space consists of transforming the R, G, and B color components of the pixels into independent components via an independent component analysis method. This method yields color spaces whose components are statistically independent but whose information is dispersed over all components, whereas the first component obtained by PCA contains the maximum information (Lee and Lewicki, 2002).
BUSIN, VANDENBROUCKE AND MACAIRE
5. Hybrid Color Space The analysis of the colors of the pixels in a color space is not restricted to the (R, G, B) color space. Indeed, the previous text shows that a large number of color spaces can be used to represent the colors of the pixels (Sharma and Trussell, 1997). They are supported by specific physical, physiologic, and psychovisual properties (Sangwine and Horne, 1998). The multitude and diversity of available color spaces require classifying them into categories according to their definitions (Poynton, 1995; Tkalcic and Tasic, 2003). We propose grouping the most classical color spaces into four principal families that are further divided into subfamilies. Figure 13 distinguishes these four main families by four main rectangles:
• Primary spaces
• Luminance-chrominance spaces
• Perceptual spaces
• Independent axis spaces.
The dotted rectangles within the main rectangles correspond to subfamilies of color spaces. In general, color images are acquired through the (R, G, B) color image acquisition space. Therefore, all color spaces are defined by an equation whose inputs are the R, G, and B color components. Figure 13 shows how a color space is determined by following the arrows starting from the (R, G, B) color space. We have compiled a nonexhaustive list of the color spaces most widely used by image analysis applications. Authors sometimes propose color spaces that are specific to their applications. For example, the CIE presently proposes a color appearance model (CIECAM) that attempts to take into account all the visual phenomena that affect human color perception (Commission Internationale de l’Éclairage, 2004). This empirical model defines the color of a surface by taking into account its neighborhood. In the color image analysis framework, using such a color model means considering the neighborhood of the pixels to evaluate their color components. On the one hand, given the multitude of available color spaces, it is difficult to select a single color space adapted to all color image analysis applications. On the other hand, the performance of an image-processing procedure depends on the choice of the color space. Many authors have tried to determine the color spaces that are well suited to their specific color image segmentation problems. Since they reach contradictory conclusions about the relevance of the available color spaces in the context of color image analysis, we can conclude that no classical color space currently provides satisfying results for the analysis of all types of color images.
COLOR SPACES AND IMAGE SEGMENTATION
FIGURE 13. Color space families. (See Color Insert.)
Instead of searching for the best classical color space for color image analysis, Vandenbroucke, Macaire, and Postaire proposed an original approach to improve the results of image processing. They defined a new kind of color space by selecting a set of color components that can belong to any of the different classical color spaces listed in Figure 13 (Vandenbroucke, Macaire and Postaire, 1998). These spaces, which have neither psychovisual nor physical color significance, are called hybrid color spaces and can be automatically selected by means of an iterative feature selection procedure operating with a supervised learning scheme (Vandenbroucke, Macaire and Postaire, 2003). C. Digital Color Images and Color Spaces In the previous section, the most classical color spaces were presented and classified into different families. However, the use of these color spaces in color image analysis requires some precautions. Indeed, when a color image is converted into an image whose colors are represented in another color space, the color component values belong to different value domains depending on the applied transformation. When the color distribution of an image is analyzed in a color space (e.g., Euclidean distance comparison, histogram evaluation, memory storage, image coding, color display), the values of the transformed color components often must belong to the same value domain as those of the original color space. To satisfy this constraint, each color space must be coded without changing its intrinsic properties (Vandenbroucke, Macaire and Postaire, 2000a). The rest of this section proposes a color space-coding scheme applied to the different families of color spaces presented in Figure 13. 1. Color Space Coding for Image Analysis The coding of the color spaces consists of shifting, normalizing (scaling), and rounding the values of their components so that the processed values range between the unsigned integer values 0 and 255.
We propose a color space-coding scheme that preserves the properties of each color space. a. Independent and Dependent Coding. The components of all color spaces are defined by one or several successive color transformations of the R, G, and B components. Generally, each of these three components is coded on 8 bits and quantized with 256 different unsigned integer values belonging to the interval [0, 255]. Let us denote by T1, T2, and T3 the transformed components of any color space obtained from the R, G, and B components by a set of color transformations T. In most cases, the transformed color components take signed floating-point values that do not range between 0 and 255. For several color image analysis applications, the transformed values of the color components must be comparable to those represented in the original color space, which means that a specific coding scheme is necessary. Let T'1, T'2, and T'3 be the coded components provided by a transformation denoted C (Figure 14).
FIGURE 14. Color space coding.
The coding scheme of a transformed color space (T1, T2, T3) is divided into the following successive steps:
• Shifting the color component values so that they are unsigned. This operation requires knowing the minimal value of each component of the considered color space.
• Normalizing the shifted component values so that they range between 0 and 255 (for 8-bit coding). This operation requires knowing the range of each component of the considered color space (i.e., its minimal and maximal values).
• Rounding the normalized shifted component values to the nearest integer.
In order to code the transformed color components T1, T2, and T3, it is necessary to determine the extreme values of their ranges. Let us denote by mk and Mk (k ∈ {1, 2, 3}) the minimum and maximum values of the transformed color component Tk, respectively, so that Δk = Mk − mk represents the range of Tk. In order to adjust the width of this range to 255, the transformed color component Tk can be coded independently of the two other components as
$$T'_k = \frac{255 \times (T_k - m_k)}{\Delta_k}. \qquad (44)$$
Depending on the properties of a color space, it is sometimes necessary to use a dependent coding scheme, so that the range of at least one color component is equal to 255. Let Δmax denote the largest range of the three components of a color space, defined as
$$\Delta_{\max} = \max_{k} (\Delta_k). \qquad (45)$$
The dependent coding of the components of a color space is expressed as
$$T'_k = \frac{255 \times (T_k - m_k)}{\Delta_{\max}}. \qquad (46)$$
The dependent coding scheme applies the same scaling to every component of a color space. Therefore, the relative positions of colors in the space defined by this coding scheme are not modified. Furthermore, all Euclidean distances between colors are multiplied by the same factor 255/Δmax, so they are preserved up to this common factor.
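The three coding steps and Eqs. (44)–(46) can be sketched in Python with NumPy as follows. This is an illustration under assumptions, not the authors' implementation: the function name `code_components` is hypothetical, the extreme values mk and Mk are assumed to be known in advance (for instance, those listed in Table 5 for the (X, Y, Z) space), and ties are rounded half up.

```python
import numpy as np


def code_components(t, mins, maxs, dependent):
    """Code transformed components T1, T2, T3 as unsigned 8-bit values.

    t: array of shape (..., 3) holding transformed color components.
    mins, maxs: the extreme values m_k and M_k of the three components.
    dependent: False applies Eq. (44) (one scale factor per component);
               True applies Eqs. (45)-(46) (one common scale factor).
    """
    mins = np.asarray(mins, dtype=float)
    ranges = np.asarray(maxs, dtype=float) - mins   # Delta_k = M_k - m_k
    if dependent:
        scale = 255.0 / ranges.max()                # 255 / Delta_max
    else:
        scale = 255.0 / ranges                      # 255 / Delta_k, per component
    shifted = np.asarray(t, dtype=float) - mins     # step 1: make values unsigned
    normalized = shifted * scale                    # step 2: normalize to [0, 255]
    return np.floor(normalized + 0.5).astype(np.uint8)  # step 3: round half up
```

With the (X, Y, Z) extremes, the color (250.16, 255.00, 301.41) is coded as (255, 255, 255) independently but as (212, 216, 255) dependently: only the widest component reaches 255, and the other two keep their proportions relative to it.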
2. Application to Color Spaces In order to preserve their properties, we apply one of the two coding schemes defined above to the color spaces of the four families described in Figure 13. a. Primary Spaces. The transformation of an (R, G, B) color space into any other primary space is a linear transformation that depends on the choice of the primaries and of the reference white. For instance, the transformation from the NTSC (RF, GF, BF) color space to the CIE (X, Y, Z) color space under illuminant C is defined by a transformation matrix T such that [X Y Z]^T = T [RF GF BF]^T (see Table 3), where
$$T = \begin{bmatrix} 0.607 & 0.174 & 0.200 \\ 0.299 & 0.587 & 0.114 \\ 0.000 & 0.066 & 1.116 \end{bmatrix}. \qquad (47)$$
The chromaticity coordinates of illuminant C are xn = 0.310 and yn = 0.316, and those of the primaries RF, GF, and BF are xr = 0.670, yr = 0.330, xg = 0.210, yg = 0.710, xb = 0.140, and yb = 0.080, respectively. Table 5 lists the coding parameters of the (X, Y, Z) color space. Ohta, Kanade and Sakai (1980) proposed coding this color space by means of an independent coding scheme associated with a matrix Cind, so that [T'1 T'2 T'3]^T = Cind T [RF GF BF]^T, with
$$C_{\mathrm{ind}}\,T = \begin{bmatrix} 0.618 & 0.177 & 0.205 \\ 0.299 & 0.587 & 0.114 \\ 0.000 & 0.056 & 0.944 \end{bmatrix}. \qquad (48)$$
TABLE 5. Extreme values of the X, Y, and Z components
          X         Y         Z
mk        0         0         0
Mk      250.16    255.00    301.41
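The chromaticity argument that follows can be checked numerically. This Python/NumPy sketch (variable names are ours) rebuilds Cind and Cdep from the ranges of Table 5 instead of copying the rounded entries of Eq. (48), then compares the chromaticity of the reference white (RF = GF = BF) under T, Cind T, and Cdep T. Because T has no negative coefficient, we assume mk = 0 and Mk = 255 × (sum of row k of T), which reproduces the values of Table 5.

```python
import numpy as np

# Matrix T of Eq. (47): NTSC (RF, GF, BF) -> CIE (X, Y, Z), illuminant C.
T = np.array([[0.607, 0.174, 0.200],
              [0.299, 0.587, 0.114],
              [0.000, 0.066, 1.116]])

# T has no negative coefficient, so for RF, GF, BF in [0, 255] each
# component has m_k = 0 and M_k = 255 * (sum of its row of T).
M = 255.0 * T.sum(axis=1)                  # about 250.16, 255.00, 301.41
C_ind = np.diag(255.0 / M)                 # Eq. (44): one factor per component
C_dep = (255.0 / M.max()) * np.eye(3)      # Eq. (46): one common factor


def chromaticity(xyz):
    """(x, y) chromaticity coordinates of an (X, Y, Z) triple."""
    s = xyz.sum()
    return xyz[0] / s, xyz[1] / s


white = np.ones(3)  # reference white: RF = GF = BF
for name, A in (("T", T), ("C_ind T", C_ind @ T), ("C_dep T", C_dep @ T)):
    x, y = chromaticity(A @ white)
    print(f"{name}: x = {x:.3f}, y = {y:.3f}")
```

The loop prints x = 0.310, y = 0.316 for T and for Cdep T, but x = 0.333, y = 0.333 for Cind T: the independent coding scales each row of T to a unit sum, which moves the white point to the equal-energy position, whereas the common factor of the dependent coding cancels in the ratios x and y.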
With such a coding scheme, the evaluated chromaticity coordinates are xn = 0.333, yn = 0.333, xr = 0.674, yr = 0.326, xg = 0.224, yg = 0.743, xb = 0.162, and yb = 0.090. They do not correspond to the coordinates computed with the matrix T. Since the position of illuminant C in the chromaticity diagram is changed by the independent coding scheme, we can conclude that this coding scheme modifies the definition of the (X, Y, Z) color space from the NTSC primary space. Extending this observation from the illuminant to all the available colors, we conclude that the spectrum locus in the (x, y) chromaticity diagram is distorted by the independent coding scheme. Conversely, the dependent coding scheme provides a matrix Cdep such that
$$C_{\mathrm{dep}}\,T = \begin{bmatrix} 0.514 & 0.147 & 0.169 \\ 0.253 & 0.497 & 0.096 \\ 0.000 & 0.056 & 0.944 \end{bmatrix}. \qquad (49)$$
The dependent coding scheme does not modify the actual chromaticity coordinates. This example can be generalized to the other primary spaces. Thus, in order to preserve the colorimetric properties of a primary space, we propose applying a dependent coding to all of its components. b. Luminance-Chrominance Spaces. The color spaces of this family are defined by linear or nonlinear transformations. The color spaces obtained through a linear transformation can be considered primary spaces and must be coded with a dependent coding scheme for the same reasons as the primary spaces. The color components of the (L*, u*, v*) color space are defined by nonlinear transformations [see Eqs. (5)–(7)]. If an independent coding is applied to the components of this space, the MacAdam ellipses are distorted, so that the color distances evaluated in this color space no longer correspond to the color differences perceived by a human observer.
To preserve the shape of the MacAdam ellipses and the agreement between Euclidean distance and visual perception, we propose applying a dependent coding scheme to the perceptually uniform spaces. To illustrate this phenomenon, an ellipse is represented in the (u*, v*) chromaticity diagram of Figure 15a before any coding scheme is applied. In this figure, the two distances denoted D1 and D2 are equal. According to MacAdam, the corresponding colors show no perceptible difference for the human eye. Applying an independent coding scheme to the (L*, u*, v*) color space (Figure 15b) modifies these properties, because D1 ≠ D2, whereas a dependent coding scheme preserves them, because D1 = D2 (Figure 15c). We extend this dependent coding scheme to all the luminance-chrominance spaces obtained through nonlinear transformations.
FIGURE 15. Application of the independent and dependent codings to an ellipse represented in the (u*, v*) chromaticity diagram. (a) Ellipse in the (u*, v*) chromaticity diagram. (b) Independent coding. (c) Dependent coding.
c. Perceptual Spaces. Since the components of the perceptual spaces represent three subjective attributes used to qualify a color, they can be considered independently. Furthermore, the Euclidean distance is meaningless for comparing two colors, since the hue component is periodic. Therefore, we propose applying an independent coding scheme to the perceptual spaces. d. Independent Axis Spaces. Because the components of the independent axis spaces are determined by linear transformations, they can be considered primary spaces and must therefore be coded with a dependent coding scheme. e. Hybrid Color Spaces. The components of a hybrid color space are selected among all the components of several color spaces. Thus, such a color space has neither psychovisual nor physical color significance, and dependent coding is not justified. That means that hybrid color spaces must be coded by means of an independent color-coding scheme. D. Summary This first section, which dealt with the representation of pixel colors in a color space, underscored two important points:
• In order to use a color space, it is important to know the acquisition conditions (i.e., to specify the reference white, the lighting system, and the camera or scanner parameters, such as gain, offset, and gamma corrections).
• A color calibration of the color image acquisition device is recommended so that the colors of the objects observed by an image sensor are represented correctly in the acquired color images.
The second part of this section presented the color spaces most commonly used for color image analysis. Each color space is characterized by its physical, physiologic, or psychological properties, but most of these color spaces were not initially developed for color image analysis applications. Therefore, their use with a digital color image acquisition device requires that some conditions be satisfied.
The last part of this section showed that a specific coding scheme must be applied to these color spaces so that they can be used correctly for color image analysis without modifying their intrinsic properties. The next section of this chapter shows how the color spaces can be exploited by color image segmentation algorithms, and the last section studies the impact of the choice of a color space on the results provided by these algorithms.
III. COLOR IMAGE SEGMENTATION A. Introduction Generally, we assume that the different colors present in an image correspond mainly to different properties of the surfaces of the observed objects. Segmentation procedures analyze the colors of pixels in order to distinguish the different objects that constitute the scene observed by a color sensor or camera. Segmentation is a process of partitioning an image into disjoint regions (i.e., into subsets of connected pixels that share similar color properties). The segmentation of the image denoted I groups connected pixels with similar colors into NR regions Ri, i = 1, ..., NR (Zucker, 1976). The pixels of each region must respect homogeneity and connectedness conditions. The homogeneity of a region Ri is defined by a uniformity-based predicate, denoted Pred(Ri), which is true if the colors of Ri are homogeneous and false otherwise. The regions must respect the following conditions:
• $I = \bigcup_{i=1}^{N_R} R_i$: each pixel must be assigned to one single region, and the set of all the regions must correspond to the image.
• $R_i$ contains only connected pixels, $\forall i = 1, \ldots, N_R$: a region is defined as a subset of connected pixels.
• $\mathrm{Pred}(R_i)$ is true, $\forall i = 1, \ldots, N_R$: each region must respect the uniformity-based predicate.
• $\mathrm{Pred}(R_i \cup R_j)$ is false, $\forall i \neq j$, with $R_i$ and $R_j$ adjacent in I: two adjacent regions taken together do not respect the predicate.
The result of the segmentation is an image in which each pixel is associated with a label corresponding to a region. Segmentation schemes can be divided into two primary approaches with respect to the predicate used (Cheng et al., 2001). The first approach assumes that adjacent regions representing different objects present local discontinuities of colors at their boundaries. (Section III.B describes the edge detection methods deduced from this assumption.) The second approach assumes that a region is a subset of connected pixels that share similar color properties.
The methods associated with this assumption are called region construction methods and look for subsets of connected pixels whose colors are homogeneous. These techniques can be divided into two main classes, depending on whether the distribution of the pixel colors is analyzed in the image plane or in the color space. The spatial analysis described in Section III.C is based on a region growing or merging process.
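The segmentation conditions stated at the start of this section can be checked mechanically on a label map. The Python/NumPy sketch below is illustrative only: all names are hypothetical, connectedness is taken as 4-connectivity, and `pred` is a stand-in predicate (true when every color component of a region spans at most `tol` levels), since the chapter does not prescribe a particular uniformity predicate.

```python
import numpy as np
from collections import deque


def is_connected(mask):
    """True if the True pixels of a 2D boolean mask form one 4-connected set."""
    coords = np.argwhere(mask)
    if len(coords) == 0:
        return False
    start = tuple(coords[0])
    seen = np.zeros(mask.shape, dtype=bool)
    seen[start] = True
    queue = deque([start])
    count = 0
    while queue:
        y, x = queue.popleft()
        count += 1
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                    and mask[ny, nx] and not seen[ny, nx]):
                seen[ny, nx] = True
                queue.append((ny, nx))
    return count == len(coords)


def pred(colors, tol=20.0):
    """Hypothetical uniformity predicate: each component spans at most tol."""
    return bool(np.all(colors.max(axis=0) - colors.min(axis=0) <= tol))


def is_valid_segmentation(image, labels, tol=20.0):
    """Check the segmentation conditions on a label map.

    A label map assigns exactly one region to each pixel, so the first
    condition (partition of I) holds by construction.
    """
    for r in np.unique(labels):
        mask = labels == r
        if not is_connected(mask) or not pred(image[mask], tol):
            return False
    # Adjacent pairs: 4-neighbors carrying different labels.
    pairs = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        diff = a != b
        pairs.update(zip(a[diff].tolist(), b[diff].tolist()))
    for ri, rj in pairs:
        if pred(image[(labels == ri) | (labels == rj)], tol):
            return False   # Pred(Ri U Rj) must be false for adjacent regions
    return True
```

A label map that splits a uniform area into two regions fails the last condition, and a label that covers two separate patches fails the connectedness condition.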
The analysis in the color space takes advantage of the characterization of each pixel P of an image I by its color point I(P) in this space. Since the pixels are represented by points in a 3D color space, this approach assumes that homogeneous regions in the image plane give rise to clusters of color points in the color space. Each cluster corresponds to a class of pixels that share similar color properties. These clusters are generally identified by means of an analysis of the color histogram or a cluster analysis procedure, and they are mapped back to the original image plane to produce the segmentation (see Section III.D). In this chapter, we do not consider the physically based segmentation techniques that use explicit assumptions about the physics of image formation. Klinker, Shafer and Kanade (1990) use a model of color formation, called the dichromatic reflection model, in order to segment a color image. This model relies on several rigid assumptions, such as those on the illumination conditions and the type of observed materials. For most real scenes, these assumptions cannot be justified. Therefore, the proposed model can be used only for a restricted number of images, such as scenes observed within a controlled environment. For illustration purposes, we apply segmentation schemes to the synthetic image of Figure 16a, which contains six regions with different colors and shapes (a brown background, a small yellow square, a large orange square, a purple patch, and two concentric green disks). This image, whose colors are coded in the (R, G, B) color space, is corrupted by an uncorrelated Gaussian noise with a standard deviation σ = 5, which is independently added to each of the three color components.
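A test image of this kind is easy to reproduce. The Python/NumPy sketch below builds a hypothetical stand-in for Figure 16a: the region colors, sizes, and positions are our own choices, not those of the chapter; only the structure (six uniform regions, plus Gaussian noise with σ = 5 added independently to each component) follows the text.

```python
import numpy as np


def noisy_synthetic_image(sigma=5.0, seed=0):
    """Six uniform regions plus uncorrelated Gaussian noise, as 8-bit RGB.

    Colors and geometry are illustrative stand-ins for Figure 16a.
    """
    h, w = 128, 128
    img = np.empty((h, w, 3), dtype=float)
    img[:] = (120, 80, 40)                     # brown background
    img[10:30, 10:30] = (230, 220, 40)         # small yellow square
    img[40:100, 60:120] = (240, 140, 30)       # large orange square
    img[90:120, 10:50] = (130, 40, 140)        # purple patch
    yy, xx = np.mgrid[0:h, 0:w]
    r2 = (yy - 60) ** 2 + (xx - 30) ** 2
    img[r2 < 20 ** 2] = (40, 150, 60)          # outer green disk
    img[r2 < 10 ** 2] = (60, 180, 80)          # inner green disk
    rng = np.random.default_rng(seed)
    img += rng.normal(0.0, sigma, img.shape)   # independent noise per component
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)
```

The regions do not overlap, so each pixel keeps exactly one of the six nominal colors before noise is added.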
To show the influence of the chosen color space on the segmentation results, the colors of the pixels are represented in some of the most widely known color spaces: the (R, G, B) color space (Figure 16a), the (Y, U, V) color space defined in Section II.B.2.b (Figure 16b), and the triangular (I, S, T) color space defined in Section II.B.3.b (Figure 16c). B. Edge Detection 1. Overview Edge detection consists of detecting local discontinuities of colors. The result of the edge pixel detection procedure is a binary image composed of edge and nonedge pixels. Cumani proposed detecting the zero-crossings of second-order derivative filters (Cumani, 1991). However, this approach is very sensitive to the presence of noise in the images, which is why several authors apply first-order derivative filters (gradient) to detect edges. Locations of gradient maxima that are higher than a threshold generate edge pixels.
FIGURE 16. Synthetic image whose colors are coded in different color spaces. (a) Color synthetic (R, G, B) image. (b) Image of (a) coded in the (Y, U, V) color space. (c) Image of (a) coded in the (I, S, T) color space. (See Color Insert.)
The color edge detection methods based on gradient analysis can be divided into three families according to how they analyze the color component images, denoted I^k, k ∈ {R, G, B} (i.e., the monochromatic images where pixels are characterized by the levels of one single color component):
1. The methods that fuse binary edge images obtained by the analyses of the different component images (Rosenfeld and Kak, 1981) (Figure 17a).
2. The methods that independently analyze each color component image in order to compute three marginal gradient vectors. These three vectors are combined to provide the norm of a gradient (Lambert and Carron, 1999) (Figure 17b).
3. The methods that compute a color gradient vector from the three gradient vectors computed in each of the three color component images. The edge pixels are detected by an analysis of this color gradient vector (Di Zenzo, 1986; Lee and Cok, 1991) (Figure 17c).
FIGURE 17. Different approaches for color edge detection. (a) Analysis of edge binary images. (b) Analysis of the norm of a color gradient. (c) Analysis of a color gradient vector. (See Color Insert.)
Since the methods of the third family provide the best results in terms of edge detection (Cheng et al., 2001), we describe this approach, which is also the most widely used. 2. Edge Detection by the Analysis of a Color Gradient Vector In order to detect color edge pixels, the color gradient magnitude and direction can be determined by Di Zenzo’s algorithm, which computes first-order differential operators (Di Zenzo, 1986). Color gradient detection consists of determining, at each pixel, the spatial direction, denoted θmax, along which the local variations of the color components are the highest. The absolute value of this maximum variation corresponds to the gradient magnitude, which is evaluated from the horizontal and vertical first derivatives of each color component. Let us denote by I^R(x, y), I^G(x, y), and I^B(x, y) the three (R, G, B) color components of a pixel P(x, y) with spatial coordinates (x, y) in the color image I, and by θ the gradient direction. Let F be a variational function expressed as
$$F(\theta) = p\cos^2\theta + q\sin^2\theta + 2t\sin\theta\cos\theta, \qquad (50)$$
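where p is the sum of the squared horizontal first-order partial derivatives of the three components, q is the sum of the squared vertical first-order partial derivatives, and t is the sum of the products of the horizontal and vertical first-order partial derivatives:
$$p = \left(\frac{\partial I^R(x,y)}{\partial x}\right)^2 + \left(\frac{\partial I^G(x,y)}{\partial x}\right)^2 + \left(\frac{\partial I^B(x,y)}{\partial x}\right)^2, \qquad (51)$$
$$q = \left(\frac{\partial I^R(x,y)}{\partial y}\right)^2 + \left(\frac{\partial I^G(x,y)}{\partial y}\right)^2 + \left(\frac{\partial I^B(x,y)}{\partial y}\right)^2, \qquad (52)$$
and
$$t = \frac{\partial I^R(x,y)}{\partial x}\frac{\partial I^R(x,y)}{\partial y} + \frac{\partial I^G(x,y)}{\partial x}\frac{\partial I^G(x,y)}{\partial y} + \frac{\partial I^B(x,y)}{\partial x}\frac{\partial I^B(x,y)}{\partial y}. \qquad (53)$$
The direction of the color gradient that maximizes the function F(θ) is given by
$$\theta_{\max} = \frac{1}{2}\arctan\frac{2t}{p-q}, \qquad (54)$$
and the gradient magnitude is therefore equal to $\sqrt{F(\theta_{\max})}$. Since the arctangent only determines θmax up to π/2, the angle among θmax and θmax + π/2 that gives the larger value of F is retained. This maximum variation is evaluated from the horizontal and vertical first derivatives of each color component [see Eqs. (51)–(53)]. For this purpose, Deriche proposes applying a recursive filter as a differential operator (Deriche, 1990).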
This optimal filter is well suited for edge detection in noisy images (Stoclin, Duvieubourg and Cabestaing, 1997). The behavior of this filter is governed by a parameter α, which is adjusted according to the expected filter performance, evaluated in terms of detection and localization. In order to extract well-connected edge pixels, detecting all the spatial color variations is more important than localizing them accurately. The analyst therefore adjusts α so that the trade-off between detection and localization favors detection. To extract thin edges from the gradient image obtained by this process, only the edge pixels whose magnitudes are local maxima along the gradient direction are retained. Then, a hysteresis thresholding scheme, with a low threshold Thl and a high threshold Thh, provides a binary edge image. These thresholds are adjusted by the analyst so that a maximum of relevant edge pixels are detected. To illustrate the behavior of Di Zenzo’s method, we apply it to the synthetic images of Figure 16. In the images of Figure 18, the detected edge pixels are represented by a white overlay. Different tests show that, on these images, this approach is not very sensitive to the adjustment of the thresholds used by the hysteresis scheme. The three images of Figure 18 show that edge detection with the same parameter values provides different results depending on the color space used. Furthermore, one limitation of such an approach is the connectivity between the detected edge pixels (Ultré, Macaire and Postaire, 1996). Indeed, regions are extracted by a chaining scheme of edge pixels, which assumes that the detected edge pixels that represent the boundary of a region are well connected (Rosenfeld and Kak, 1981).
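The gradient computation of Eqs. (50)–(54) and the hysteresis step can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions: the derivatives come from `np.gradient` (central differences) rather than Deriche's recursive filter, the thinning step along θmax is omitted, and the function names are hypothetical. The default thresholds reuse the values Thh = 10 and Thl = 5 quoted for Figure 18, although with a different derivative filter the magnitudes are not on the same scale.

```python
import numpy as np
from collections import deque


def di_zenzo_gradient(image):
    """Color gradient magnitude and direction, Eqs. (50)-(54).

    image: array (H, W, 3). Derivatives are central differences from
    np.gradient, a stand-in for Deriche's recursive filter.
    """
    img = np.asarray(image, dtype=float)
    d_dy, d_dx = np.gradient(img, axis=(0, 1))   # per-component derivatives
    p = np.sum(d_dx * d_dx, axis=2)              # Eq. (51)
    q = np.sum(d_dy * d_dy, axis=2)              # Eq. (52)
    t = np.sum(d_dx * d_dy, axis=2)              # Eq. (53)
    # arctan2 returns the maximizing angle directly, resolving the
    # theta versus theta + pi/2 ambiguity of the plain arctan in Eq. (54).
    theta = 0.5 * np.arctan2(2.0 * t, p - q)
    f = (p * np.cos(theta) ** 2 + q * np.sin(theta) ** 2
         + 2.0 * t * np.sin(theta) * np.cos(theta))   # Eq. (50) at theta_max
    return np.sqrt(np.maximum(f, 0.0)), theta


def hysteresis(magnitude, th_low=5.0, th_high=10.0):
    """Keep pixels above th_high, plus pixels above th_low that are
    8-connected to them through pixels above th_low."""
    weak = magnitude > th_low
    edges = magnitude > th_high
    queue = deque(zip(*np.nonzero(edges)))
    h, w = magnitude.shape
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True
                    queue.append((ny, nx))
    return edges


def color_edges(image, th_low=5.0, th_high=10.0):
    """Binary edge map; the thinning step along theta_max is omitted."""
    magnitude, _ = di_zenzo_gradient(image)
    return hysteresis(magnitude, th_low, th_high)
```

On a vertical step in the red component alone, θmax is 0 and the magnitude equals the horizontal derivative; on a horizontal step it is π/2. The hysteresis pass keeps a weak pixel only if a chain of weak pixels links it to a strong one, which is what favors connected edges.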
A postprocessing edge-closing step must be performed when the edge pixels are not connected, which is why many authors prefer approaches based on region construction to segment color images. C. Region Construction Based on a Spatial Analysis These methods, which analyze the image plane to construct the regions in the image, can be divided into two approaches: the region growing method and the region merging method. 1. Segmentation by Region Growing A region growing method consists of sequentially merging neighboring pixels of similar colors, starting from initial seeds (Trémeau and Borel, 1997). Regions are expanded as far as possible, as long as the aggregating conditions are respected.
FIGURE 18. Edge pixels of the images in Figure 16 detected by Di Zenzo’s method. The parameters α, Thh, and Thl are set to 1.0, 10, and 5, respectively. (a) (R, G, B) color space. (b) (Y, U, V) color space. (c) Triangular (I, S, T) color space. (See Color Insert.)
The region growing method scans the image in some predetermined order, such as left to right and top to bottom. The first analyzed pixel constitutes an initial region, called a seed, and the next neighboring pixel is examined. The color of this pixel is compared with the colors of the neighboring regions, which have already been created but are not necessarily completed. If its color and the color of a neighboring region in progress are close enough (i.e., the Euclidean distance separating the colors is lower than a threshold), then the pixel is added to this region and the color of this region is updated. If several regions in progress have colors that are close enough, the pixel is added to the region in progress whose color is the closest. Moreover, when the colors of two adjacent regions in progress are close enough, the two regions are merged and the pixel is added to the merged region. When the color of the examined pixel is not close enough to that of any neighboring region, a new region in progress is created (Moghaddamzadeh and Bourbakis, 1997).
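The single-pass procedure just described can be sketched in Python with NumPy. It is an illustrative reading of the text, not the implementation of Moghaddamzadeh and Bourbakis: we assume that the already visited neighbors are the left and upper pixels, that the color of a region is the running mean of its pixels, and that merged regions are tracked with a union-find structure. The names `region_growing` and `threshold` are hypothetical.

```python
import numpy as np


def region_growing(image, threshold):
    """Raster-scan region growing with region merging (illustrative sketch).

    Returns an (H, W) array of region labels numbered 0..K-1.
    """
    img = np.asarray(image, dtype=float)
    h, w, _ = img.shape
    labels = np.full((h, w), -1, dtype=int)
    parent, sums, counts = [], [], []   # union-find forest and color statistics

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]   # path halving
            r = parent[r]
        return r

    def dist(color, r):
        return np.linalg.norm(color - sums[r] / counts[r])

    for y in range(h):                      # left to right, top to bottom
        for x in range(w):
            c = img[y, x]
            nbrs = set()
            if x > 0:
                nbrs.add(find(labels[y, x - 1]))
            if y > 0:
                nbrs.add(find(labels[y - 1, x]))
            close = [r for r in nbrs if dist(c, r) < threshold]
            if not close:                   # no close region: new seed
                r = len(parent)
                parent.append(r)
                sums.append(np.zeros(3))
                counts.append(0)
            elif len(close) == 2 and np.linalg.norm(
                    sums[close[0]] / counts[close[0]]
                    - sums[close[1]] / counts[close[1]]) < threshold:
                a, b = close                # two close regions: merge them
                parent[b] = a
                sums[a] = sums[a] + sums[b]
                counts[a] += counts[b]
                r = a
            else:                           # join the closest region
                r = min(close, key=lambda s: dist(c, s))
            labels[y, x] = r
            sums[r] = sums[r] + c
            counts[r] += 1
    roots = np.array([find(r) for r in range(len(parent))])
    _, compact = np.unique(roots[labels], return_inverse=True)
    return compact.reshape(h, w)
```

The merge branch matters for shapes such as a "U": the two arms start as separate regions during the scan and are fused when a pixel touches both of them.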
Trémeau and Borel proposed three criteria to compare the color of a pixel with that of the region in progress (Trémeau and Borel, 1997). Let us denote by Ri the subset of pixels that constitutes the region in progress, P the analyzed pixel, N(P) the set of neighboring pixels of P, and I(P) the color of the pixel P. Likewise, the color of a pixel P with spatial coordinates (x, y) is denoted either I(P) or I(x, y). The pixel P is merged with one of its neighboring pixels Q that also belongs to Ri (Q ∈ N(P) and Q ∈ Ri) if the three following aggregating conditions are respected:
• $\|I(P) - I(Q)\| < T_1$
• $\|I(P) - \mu_{N(P)}\| < T_2$
• $\|I(P) - \mu_{R_i}\| < T_3$,
where the thresholds Tj are adjusted by the analyst, ‖·‖ is the Euclidean norm, μN(P) represents the mean color of the pixels Q that belong to both N(P) and Ri, and μRi represents the mean color of the region Ri in progress. For the images of Figure 16, the region in progress grows until the maximum of the differences between one color component of the region and that of the examined pixel is higher than 10 (Figure 19). The first analyzed pixel is located in the top-left corner of the image, and the image is scanned from left to right and top to bottom. In the images of Figure 19, the constructed regions are labeled with false colors. To show the influence of the color space on the quality of the region construction, the colors of the pixels are coded in the (R, G, B) color space (Figure 19a), in the (Y, U, V) color space (Figure 19b), and in the triangular (I, S, T) color space (Figure 19c). These three images show that the growing scheme with the same parameter values provides different results depending on the color space used. This scheme suffers from the difficulty of adjusting relevant thresholds for all the color spaces.
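The three aggregating conditions translate directly into a test function. In this Python/NumPy sketch, the function name and argument layout are hypothetical; the caller is assumed to supply the colors of the pixels of N(P) that already belong to Ri, and the current mean color of Ri.

```python
import numpy as np


def can_aggregate(color_p, color_q, neighbor_colors, region_mean, t1, t2, t3):
    """Trémeau and Borel's three aggregating conditions for a pixel P.

    color_p, color_q: colors I(P) and I(Q), with Q in N(P) and in Ri.
    neighbor_colors: colors of the pixels of N(P) that belong to Ri.
    region_mean: mean color of the region Ri in progress.
    t1, t2, t3: thresholds T1, T2, T3 set by the analyst.
    """
    p = np.asarray(color_p, dtype=float)
    mu_n = np.mean(np.asarray(neighbor_colors, dtype=float), axis=0)
    return bool(
        np.linalg.norm(p - np.asarray(color_q, dtype=float)) < t1         # pixel vs Q
        and np.linalg.norm(p - mu_n) < t2                                  # vs N(P) mean
        and np.linalg.norm(p - np.asarray(region_mean, dtype=float)) < t3  # vs Ri mean
    )
```

The third condition is what stops slow drifts of color: a pixel close to its immediate neighbors is still rejected if the region as a whole has a different mean color.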
Furthermore, one of the main drawbacks of these neighborhood-based segmentation methods is their sequential nature: the resulting regions depend on the order in which pixels are merged and on the selection of the initial seeds. 2. Segmentation by Region Merging In order to correct the oversegmentation problem a posteriori, Trémeau and Colantoni proposed performing the segmentation in two successive steps: a low-level step provides an oversegmented image by means of a region growing scheme (see Section III.C.1), and a high-level step merges adjacent regions with similar colors (Trémeau and Colantoni, 2000). They propose applying the high-level step to a region adjacency graph that models the segmented image. A node of this graph represents a region, and an edge between two nodes corresponds to a pair of adjacent regions in the image.
FIGURE 19. Segmentation of the images in Figure 16 by region growing. The region grows until the maximum of the differences between one color component of the region and that of the examined pixel is higher than 10. (a) (R, G, B) color space. (b) (Y, U, V) color space. (c) Triangular (I, S, T) color space. (See Color Insert.)
Figure 20b shows the adjacency graph obtained from the presegmented image of Figure 20a. The iterative analysis of the graph consists of merging, at each iteration step, two nodes linked by an edge when they respect a fusion criterion. More precisely, this procedure merges adjacent regions whose colors are close enough (Lozano, Colantoni and Laget, 1996). In addition to color homogeneity, Schettini proposes another criterion that considers the perimeters of two adjacent regions that are candidates for merging (Schettini, 1993). These criteria are suited to uniform color regions but not to textured regions. This led Panjwani and Healey to model the regions by color Markov random fields, which are designed to characterize color textures (Panjwani and Healey, 1995).
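A minimal sketch of merging on a region adjacency graph, in Python with NumPy, assuming the simplest fusion criterion mentioned above (Euclidean distance between mean region colors below a threshold) and a greedy order that always fuses the closest adjacent pair first. The perimeter criterion of Schettini and texture models are not covered, and `merge_regions` is a hypothetical name.

```python
import numpy as np


def merge_regions(image, labels, threshold):
    """Greedy merging on a region adjacency graph (RAG).

    Nodes are regions (with their mean colors); edges join 4-adjacent
    regions. At each step the adjacent pair with the closest mean colors
    is merged, as long as their Euclidean distance is below threshold.
    The graph is rebuilt at every step for clarity, not speed.
    """
    img = np.asarray(image, dtype=float)
    labels = np.array(labels, copy=True)
    while True:
        means = {r: img[labels == r].mean(axis=0) for r in np.unique(labels).tolist()}
        edges = set()
        for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
            diff = a != b
            for u, v in zip(a[diff].tolist(), b[diff].tolist()):
                edges.add((min(u, v), max(u, v)))
        if not edges:
            return labels
        dist, ri, rj = min((float(np.linalg.norm(means[u] - means[v])), u, v)
                           for u, v in edges)
        if dist >= threshold:
            return labels
        labels[labels == rj] = ri   # fuse node rj into node ri
```

Because mean colors are recomputed after each fusion, the order of the merges can influence the final partition; always taking the closest pair first is one common way to make that order deterministic.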
FIGURE 20. Adjacency graph of regions. (a) Presegmented image. (b) Schematic adjacency graph of the image in (a). (See Color Insert.)
These methods favor the spatial interactions between pixels and analyze the colors of the pixels only in order to grow or merge regions. Since these methods require a priori knowledge of the images in order to adjust their parameters, many authors prefer to perform a global analysis of the color distribution in the color space. D. Region Construction Based on a Color Space Analysis 1. Introduction Because the color of each pixel can be represented in a color space, it is also possible to analyze the distribution of pixel colors rather than examining the image plane. In the (R, G, B) color space, a color point is defined by the color component levels of the corresponding pixel, namely, red (R), green (G), and blue (B). It is generally assumed that homogeneous regions in the image plane create clusters of color points in the color space, each cluster corresponding to a class of pixels that share similar color properties. Let us consider the synthetic image of Figure 21a, which is composed of six regions, denoted Ri, i = 1, ..., 6, and is identical to that of Figure 16a. The color points representing the pixels in the (R, G, B) color space are shown in Figure 21b. To estimate the separability of the clusters, let us also examine Figures 21c–e, which show the clusters of color points of the image of Figure 21a projected onto the chromatic planes (R, G), (G, B), and (R, B), respectively. The regions R1, R2, R3, and R4 give rise to well-separated clusters of color points, whereas the regions R5 and R6 form two overlapping clusters. It is difficult to identify these two clusters by analyzing the color point distribution in the (R, G, B) color space.
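The color-space approach (identify clusters, assign each pixel to a class, then map the classes back to the image plane) can be sketched in Python with NumPy. This sketch makes assumptions the chapter does not: k-means, initialized with user-given centers, stands in for the cluster analysis procedure; the decision rule is the minimum Euclidean distance to a class center; and regions are extracted as 4-connected sets of pixels of the same class. All names are hypothetical.

```python
import numpy as np
from collections import deque


def kmeans(points, init_centers, n_iter=20):
    """Plain k-means on color points, starting from given centers."""
    points = np.asarray(points, dtype=float)
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(assign == k):          # leave empty clusters unchanged
                centers[k] = points[assign == k].mean(axis=0)
    return centers


def classify_and_label(image, centers):
    """Minimum-distance decision rule, then 4-connected region labeling."""
    h, w, _ = image.shape
    pts = np.asarray(image, dtype=float).reshape(-1, 3)
    dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
    classes = dist.argmin(axis=1).reshape(h, w)   # class of each pixel
    regions = np.full((h, w), -1, dtype=int)
    n_regions = 0
    for y0 in range(h):
        for x0 in range(w):
            if regions[y0, x0] >= 0:
                continue
            regions[y0, x0] = n_regions              # seed of a new region
            queue = deque([(y0, x0)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and regions[ny, nx] < 0
                            and classes[ny, nx] == classes[y0, x0]):
                        regions[ny, nx] = n_regions
                        queue.append((ny, nx))
            n_regions += 1
    return classes, regions
```

One class can yield several regions: two separate patches with the same color fall into the same cluster but become two distinct connected regions once the classes are mapped back to the image plane.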
BUSIN , VANDENBROUCKE AND MACAIRE
FIGURE 21. Clusters of color points of the image in (a) in the (R, G, B) color space. (a) Color synthetic image. (b) Clusters of color points in the (R, G, B) color space. (c) Clusters of color points in the (R, G) chromatic plane. (d) Clusters of color points in the (G, B) chromatic plane. (e) Clusters of color points in the (R, B) chromatic plane. (See Color Insert.)
FIGURE 22. Clusters of color points of the image of Figure 21a in different color spaces. (a) Clusters of color points in the (Y, U, V) color space. (b) Clusters of color points in the (I, S, T) color space. (See Color Insert.)
The classes of pixels are constructed by means of a cluster identification scheme, performed either by an analysis of the color histogram or by a cluster analysis procedure. Once the classes are constructed, each pixel is assigned to one of them by means of a decision rule, and the pixels are mapped back to the image plane to produce the segmentation. The regions of the segmented image are composed of connected pixels assigned to the same class. Because the distribution of color points is analyzed in the color space without regard to the pixel positions, these procedures generally lead to a noisy segmentation with small regions scattered throughout the image. Usually, a spatial postprocessing step is performed to reconstruct the actual regions of the image (Cheng, Jiang and Wang, 2002; Nikolaev and Nikolayev, 2004). Several color spaces can be used to represent the colors of the pixels. Figure 22 shows the color points of the image in Figure 21a represented in the (Y, U, V) and (I, S, T) color spaces. By visually comparing this figure with Figure 21e, we see that the color points of the regions R5 and R6 form two overlapping clusters in the (R, G, B) color space, a single cluster in the (Y, U, V) color space, and well-separated clusters in the (I, S, T) color space. These two figures show that the color distribution and the cluster separability depend on the choice of the color space. To illustrate this problem with a real example (soccer image segmentation), let us examine Figure 23a, extracted from Vandenbroucke, Macaire and Postaire (2000b), in which each image contains one soccer player. The players are grouped into four classes ωj, j = 1, ..., 4. Figure 23b shows
FIGURE 23. Clusters corresponding to different classes projected onto different color spaces. (a) Classes of pixels to be considered. (b) Color points in the (R, G, B) color space. (c) Color points in Carron's (Y, Ch1, Ch2) color space. (d) Color points in the hybrid (x, Ch2, I3) color space. (See Color Insert.)
the colors of the pixels of each class projected onto the (R, G, B) color space. The clusters corresponding to the classes ω2, ω3, and ω4 are not well separated in the (R, G, B) color space, whereas they are more compact in Carron's (Y, Ch1, Ch2) color space (defined in Section II.B.2.d; see Figure 23c) and well separated in the hybrid (x, Ch2, I3) color space (defined in Section II.B.2.d; see Figure 23d). This example shows that the selection of the color space is crucial for the construction of pixel classes. The key to segmentation based on color space analysis is therefore the construction of the pixel classes. Color image segmentation methods based on pixel classification can be divided into four groups: (1) the analysis of one-dimensional histograms (see Section III.D.2), (2) the analysis of the 3D color histogram (see Section III.D.3), (3) the clustering approaches, which take into account only the color distribution (see Section III.D.4), and (4) the spatial-color classification methods, which simultaneously consider the spatial interactions between the pixels and the color distribution (see Section III.D.5).

2. Analysis of One-Dimensional Histograms

Many statistical procedures seek to construct the classes by detecting the modes of the color space. The modes are characterized by a high local concentration of color points, separated by valleys with a low local concentration of color points. For this purpose, these procedures estimate the probability density function (pdf) underlying the distribution of the color points (Devijver and Kittler, 1982). Assuming that each mode of the color space corresponds to the definition domain of a pixel class, color image segmentation can be viewed as a pdf mode detection problem. The most widely used tools for approximating the pdf are the 1D histograms, denoted H^k[I], of the color component images Ik, k = R, G, B,
FIGURE 24. One-dimensional histograms of the image in Figure 16a. (a) One-dimensional histogram of the color component image IR. (b) One-dimensional histogram of the color component image IG. (c) One-dimensional histogram of the color component image IB. (See Color Insert.)

FIGURE 25. Mode detection in the chromatic plane (G, B) by the analysis of the 1D histograms H^G[I] and H^B[I] of the image in Figure 21a. (See Color Insert.)
when the (R, G, B) color space is used. The bin H^k[I](n) contains the number of pixels P whose color component level Ik(P) is equal to n. By examining Figure 24, which contains the three 1D histograms H^k[I], k = R, G, B, of the image of Figure 21a, we see that the color component distributions of the different regions create peaks in the 1D histograms. Furthermore, the histograms show that the pixel classes may not be equiprobable. The peaks of each 1D histogram are delimited by thresholds obtained by a threshold search (Schettini, 1993). Once the thresholds are determined, the color space is partitioned into parallelepipedic boxes by means of the Cartesian product of the intervals delimited by these thresholds. An analysis of the populations of pixels whose color points fall into these boxes allows us to identify the modes of the color space (Figure 25).
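Both steps, building the 1D histograms and counting the pixels in the boxes formed by the Cartesian product of threshold intervals, are sketched below. This is a minimal Python sketch assuming NumPy and 8-bit component levels. The thresholds are supplied by the caller, since the threshold search itself (e.g., Schettini's) is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def histograms_1d(image, levels=256):
    """Return H^k[I] for each color component k (one row per component)."""
    return np.stack([np.bincount(image[..., k].ravel(), minlength=levels)
                     for k in range(image.shape[-1])])


def box_populations(image, thresholds):
    """Count the pixels whose color points fall into each parallelepipedic box.

    thresholds[k] is an increasing list of levels that splits component k
    into len(thresholds[k]) + 1 intervals; the boxes are the Cartesian
    product of these intervals.
    """
    # Index of the interval containing each pixel level, for every component.
    idx = [np.digitize(image[..., k].ravel(), thresholds[k])
           for k in range(image.shape[-1])]
    shape = tuple(len(t) + 1 for t in thresholds)
    counts = np.zeros(shape, dtype=int)
    np.add.at(counts, tuple(idx), 1)   # one vote per pixel in its box
    return counts
```

Boxes whose population is high relative to their neighbours are candidate modes of the color space, as in Figure 25.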
A peak of a 1D histogram may contain the color component levels of pixels that belong to different regions of the image (see Figure 24a). The pixels of each region must be grouped into a specific class of pixels defined by a mode of the color space. Mode detection methods must therefore handle the case where one peak of a 1D histogram corresponds to several modes of the color space (Ohlander, Price and Reddy, 1978). For this purpose, an iterative analysis of the 1D histograms is performed (Busin et al., 2005). At each iteration step, this procedure constructs one class of pixels by looking for the most significant mode according to different criteria, such as the population size. The pixels that are assigned to this class and that are connected in the image constitute one of the reconstructed regions of the segmented image. The pixels assigned to the class so constructed are removed from the color image so that they are not taken into account in the next iteration steps. The iterative procedure stops when only a few pixels remain that are not assigned to any constructed class. These remaining pixels can then be assigned to one of the constructed classes by means of a specific decision rule. Figure 26 illustrates the iterative segmentation of the Hand image in Figure 26a (Busin et al., 2004). The image in Figure 26b shows the pixels assigned to the first class built by the analysis of the 1D color histograms. The white-labeled pixels in Figure 26c are not assigned to this class and are therefore considered by the second iteration step in order to build the second class. Figures 26d, f, h, and j show the successively built classes, whereas Figures 26e, g, i, and k contain the pixels to be analyzed by the successive iteration steps.
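The iterative scheme just described can be sketched in a much simplified form, assuming NumPy and 8-bit components. This is not the published algorithm. Here the "most significant mode" is simply the box formed by the highest peak of each component histogram of the unassigned pixels. A peak is delimited by descending from its maximum to the nearest valleys, which assumes fairly smooth histograms. Leftover pixels go to the class with the nearest mean color, which is a hypothetical decision rule. All names are illustrative.

```python
import numpy as np


def peak_interval(hist):
    """Delimit the highest peak by descending from the maximum to the valleys."""
    top = int(np.argmax(hist))
    lo = hi = top
    while lo > 0 and hist[lo - 1] <= hist[lo]:
        lo -= 1
    while hi < len(hist) - 1 and hist[hi + 1] <= hist[hi]:
        hi += 1
    return lo, hi


def iterative_classes(image, min_remaining=10, levels=256):
    """Build one pixel class per iteration from the 1D histograms of unassigned pixels."""
    pixels = image.reshape(-1, image.shape[-1])
    labels = np.full(len(pixels), -1)
    n_classes = 0
    while (labels == -1).sum() > min_remaining:
        free = labels == -1
        inside = free.copy()
        for k in range(pixels.shape[1]):
            # Histogram of component k, restricted to the pixels not yet assigned.
            hist = np.bincount(pixels[free, k], minlength=levels)
            lo, hi = peak_interval(hist)
            inside &= (pixels[:, k] >= lo) & (pixels[:, k] <= hi)
        if not inside.any():        # the three peaks do not define a populated box
            break
        labels[inside] = n_classes  # these pixels are excluded from later iterations
        n_classes += 1
    rest = labels == -1
    if n_classes and rest.any():
        # Hypothetical decision rule: nearest class mean color.
        means = np.array([pixels[labels == c].mean(axis=0) for c in range(n_classes)])
        d = np.linalg.norm(pixels[rest][:, None, :].astype(float) - means[None], axis=2)
        labels[rest] = d.argmin(axis=1)
    return labels.reshape(image.shape[:-1])
```

Connected pixels sharing a label would then form the reconstructed regions. This could be done, for instance, with a connected component labeling of each class mask.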
The final segmented image of Figure 26l, in which the pixels are labeled according to the five built classes, shows that this scheme provides satisfactory results. However, the performance of the mode extraction scheme is strongly affected by the choice of color space, which implicitly indicates that the performance depends on the color distribution of the analyzed image (Lim and Lee, 1990). This finding led Tominaga to apply the Karhunen–Loeve transformation (i.e., principal component analysis, PCA) to the (R, G, B) color components in order to construct the classes (Tominaga, 1992). This procedure transforms the (R, G, B) color space into the (X1, X2, X3) principal component space. The histogram H^{X1}[I] of the most discriminating component X1 is analyzed first. If this histogram is multimodal, its most significant peak is detected, and a new Karhunen–Loeve transformation is then applied to the pixels that do not belong to the built class. If the histogram of the first component contains only a single peak, Tominaga proposes analyzing the second component X2. This iterative scheme stops when the histograms
FIGURE 26. Iterative segmentation of the Hand image. (a) Hand image. (b) Pixels assigned to the class built at the first iteration. (c) Pixels to be analyzed after the first iteration. (d) Pixels assigned to the class built at the second iteration. (e) Pixels to be analyzed after the second iteration. (f) Pixels assigned to the class built at the third iteration. (g) Pixels to be analyzed after the third iteration. (h) Pixels assigned to the class built at the fourth iteration. (i) Pixels to be analyzed after the fourth iteration. (j) Pixels assigned to the class built at the fifth iteration. (k) Pixels to be analyzed after the fifth iteration. (l) Segmented image. (See Color Insert.)
FIGURE 27. Color component images resulting from the Karhunen–Loeve transformation of the image in Figure 21a. (a) Color component image IX1. (b) Color component image IX2. (c) Color component image IX3.
are empty or when the histogram of the third component X3 contains a single peak. Figure 27 shows the three color component images IX1, IX2, and IX3 resulting from the Karhunen–Loeve transformation of the image in Figure 21a. The IX1 color component image contains the greatest amount of information, since X1 is the component with the largest variance. Tominaga proposes first extracting the most significant peak from the histogram H^{X1}[I] (Figure 28a). The pixels of the image IX1 whose X1 levels fall into the interval delimiting this peak are shown with a false color in Figure 29a, where the unassigned pixels are labeled in black. This image shows that the pixels of the background constitute the first built class. The scheme is iterated to yield the final, satisfactory segmentation shown in Figure 29b.
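The transformation step of Tominaga's scheme can be sketched as follows, assuming NumPy. The Karhunen–Loeve transform is computed here as an eigen-decomposition of the covariance matrix of the (R, G, B) color points. The helper that rescales a component to 256 levels before its 1D histogram is built is a hypothetical addition, since the text does not specify how X1 is quantized. Peak selection and iteration are not shown.

```python
import numpy as np


def karhunen_loeve(image):
    """Karhunen-Loeve (PCA) transform of the (R, G, B) components.

    Returns the (H, W, 3) array of components (X1, X2, X3), ordered by
    decreasing variance, and the corresponding eigenvalues (variances).
    """
    pixels = image.reshape(-1, 3).astype(float)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # X1 = axis of largest variance
    components = centered @ eigvecs[:, order]
    return components.reshape(image.shape), eigvals[order]


def to_levels(component, levels=256):
    """Hypothetical helper: rescale one component image to integer levels."""
    lo, hi = component.min(), component.max()
    scaled = (component - lo) / (hi - lo if hi > lo else 1.0)
    return np.round(scaled * (levels - 1)).astype(int)
```

A histogram of the first component could then be obtained with `np.bincount(to_levels(x[..., 0]).ravel(), minlength=256)`. The sign of each principal axis is arbitrary, which mirrors the histogram but does not change its peaks.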
FIGURE 28. One-dimensional histograms of the IX1, IX2, and IX3 color component images in Figure 27. (a) One-dimensional histogram of the image in Figure 27a. (b) One-dimensional histogram of the image in Figure 27b. (c) One-dimensional histogram of the image in Figure 27c. (See Color Insert.)
3. Analysis of the Three-Dimensional Color Histogram

Figure 24 shows that a peak of each 1D histogram of a color image can reflect several classes. Even when the mode detection is performed separately on the three 1D histograms, the appropriate number of classes may not be detectable because of this superposition. Moreover, since the thresholds obtained from the analysis of the 1D histograms are used to partition the color space into parallelepipedic boxes, the boundaries between classes cannot be adequately determined, which sometimes yields false segmentation results. Because color is 3D information, several authors use the 3D color histogram to approximate the pdf. The color histogram is composed of bins whose coordinates are the three color components. Each bin H[I](C) indicates the number of pixels P whose color I(P) is equal to the color C.
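A minimal sketch of the 3D color histogram H[I](C) follows, assuming NumPy and an 8-bit (R, G, B) image stored as `uint8`. The optional `bits` parameter keeps only the most significant bits of each component, which corresponds to the quantization discussed in the next paragraph. The function name is hypothetical.

```python
import numpy as np


def color_histogram_3d(image, bits=8):
    """3D color histogram H[I](C) of an 8-bit (R, G, B) image.

    Only the `bits` most significant bits of each component are kept, so the
    histogram has (2**bits)**3 bins (16,777,216 bins when bits=8).
    """
    shift = 8 - bits
    q = (image.reshape(-1, 3) >> shift).astype(np.intp)   # quantized components
    n = 1 << bits
    flat = (q[:, 0] * n + q[:, 1]) * n + q[:, 2]           # linear bin index
    return np.bincount(flat, minlength=n ** 3).reshape(n, n, n)
```

With `bits=8` and 64-bit counts, the histogram alone occupies about 128 MiB. With `bits=3` it has only 512 bins, but colors as different as (250, 1, 2) and (255, 0, 0) fall into the same bin. This illustrates the memory-versus-resolution trade-off.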
FIGURE 29. Image of Figure 21a segmented by Tominaga's scheme. (a) Pixels assigned to the class built at the first iteration step. (b) Image segmented after the last iteration step. (See Color Insert.)
Since pixel colors are usually quantized with 256 levels per component, a 3D histogram contains 256^3 (about 16.8 million) bins and requires a considerable amount of computer memory. This explains why only a few authors have analyzed the 3D histogram for color image segmentation (Lambert and Macaire, 2000). A simple solution consists of quantizing the colors by keeping only the most significant bits of each component in order to reduce the computational complexity. However, this solution yields poor results, because the quantization leads to a very crude segmentation. Another approach consists of detecting the modes by thresholding the bins. Nevertheless, determining the threshold is very delicate, which is why several mode detection methods use specific morphological transformations of the color histogram to enhance the modes (Park, Yun and Lee, 1998; Shaffarenko, Petrou and Kittler, 1998). Let us consider the image of Figure 30a, which is made of two concentric disks lying on a distinct background. This image is corrupted by uncorrelated Gaussian noise with a standard deviation of 9, added independently to each color component. We can visually distinguish three regions, each characterized by a specific shade of green. Therefore, the pixels of this image must be grouped into three classes. To simplify the illustration, the blue component has been set to 0 throughout the image. The color histogram of this image is plotted in Figure 30b over the restricted domain of the chromatic plane (R, G) where its values are nonzero. Since the color distributions overlap, the three color modes are hardly detectable by automatic processing of this histogram. Gillet et al. define a mode as a parallelepipedic box of the color space where the histogram function is locally concave while reaching values high
FIGURE 30. Mode detection by convexity analysis of the color histogram. (a) Image. (b) Histogram. (c) Modes of the histogram detected by the convexity test; their identification is illustrated by false colors. (d) Prototypes of the three built pixel classes. (e) Pixels assigned to the three classes. (See Color Insert.)
enough to be relevant (Gillet et al., 2002). In order to assign a "concave" or "convex" label to each color C of the color space, a local convexity test is performed on the histogram. Let D(C, l) be a cubic box, centered at the color C = [C^R, C^G, C^B]^T in the (R, G, B) color space, whose edges of length l (an odd integer) are parallel to the color component axes. The box D(C, l) therefore includes all the color points c = [c^R, c^G, c^B]^T such that

$$C^k - \frac{l-1}{2} \le c^k \le C^k + \frac{l-1}{2}, \qquad k = R, G, B.$$

The mode detection is based on the property that, at each color point C where a trivariate function f is locally concave, the average value of f computed over a box D(C, l) is a decreasing function of l (Postaire and Vasseur, 1980). The local convexity of the histogram is therefore evaluated at each color point C by examining how the average histogram value, denoted μ, varies with the size of the cubic box centered at C. First, we average the histogram over the box D(C, l1):

$$\mu_1(C) = \frac{\sum_{c \in D(C, l_1)} H[I](c)}{(l_1)^3}. \tag{55}$$

We then compute the second average value μ2 using a slightly larger box D(C, l2):

$$\mu_2(C) = \frac{\sum_{c \in D(C, l_2)} H[I](c)}{(l_2)^3}, \qquad \text{with } l_2 > l_1. \tag{56}$$

Empirically, length l1 is set to 5, while length l2 is set to l1 + 2. Under these conditions, if μ1(C) > μ2(C), the histogram is considered locally concave. If this condition is satisfied and if H[I](C) is high enough (i.e., higher than a threshold adjusted by the analyst), we decide that the color point C belongs to a mode. This mode detection algorithm is applied to the 2D histogram of the synthetic image (see Figure 30b). Because small modes may be detected by the convexity analysis, a binary morphological opening is applied to the modes to preserve only the significant ones, namely those whose volume is larger than that of the structuring element used. Figure 30c shows the three detected modes in the chromatic plane (R, G).
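The convexity test of Eqs. (55) and (56), followed by the opening and the connected component analysis, can be sketched as below. This assumes NumPy and SciPy: `scipy.ndimage.uniform_filter` computes the mean of the histogram over a box of side l centered on each bin, which is μ1 or μ2 up to border handling. Bins outside the histogram domain are treated as empty, and the 3 × 3 (× 3) structuring element is an illustrative choice. The function name and parameter defaults are hypothetical, apart from l1 = 5 and l2 = l1 + 2 given in the text. The same code works for the 2D histogram of Figure 30b, where the averages are taken over l² bins instead of l³.

```python
import numpy as np
from scipy import ndimage


def detect_modes(hist, min_count, l1=5, open_size=3):
    """Label the modes of a color histogram with the local convexity test.

    hist may be 2D (chromatic plane) or 3D (color space).  uniform_filter
    returns the average of hist over a box of side l centered on each bin,
    i.e., mu1 and mu2 of Eqs. (55)-(56); bins outside the histogram count as 0.
    """
    h = hist.astype(float)
    l2 = l1 + 2
    mu1 = ndimage.uniform_filter(h, size=l1, mode="constant", cval=0.0)
    mu2 = ndimage.uniform_filter(h, size=l2, mode="constant", cval=0.0)
    modes = (mu1 > mu2) & (h > min_count)   # locally concave and populated enough
    # Binary opening removes modes too small to contain the structuring element.
    structure = np.ones((open_size,) * h.ndim, dtype=bool)
    modes = ndimage.binary_opening(modes, structure=structure)
    # Connected component analysis: one label per mode (0 = no mode).
    labels, n_modes = ndimage.label(modes)
    return labels, n_modes
```

On a synthetic 2D histogram made of two well-separated Gaussian bumps, this returns two labeled modes centered on the bumps. The tails between them are locally convex and too sparse, so they remain unlabeled.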
The detected modes are then identified by a connected component analysis in the color space. Each mode identified in this manner is the definition domain of a pixel class. All the pixels whose colors lie in an identified mode define the prototypes of the corresponding class and are assigned to it. Figure 30d shows, in false colors, the prototypes of the three pixel classes constructed by the analysis of the histogram in Figure 30b. The black pixels are those that are not yet assigned. This image shows that the prototypes correctly represent the background and
the two concentric shapes. This example illustrates that the convexity analysis of the histogram efficiently constructs the pixel classes even when the color distributions of the different regions strongly overlap. The next step of the classification procedure assigns the pixels that are not yet labeled by means of a specific decision rule, producing the final segmented image in Figure 30e (Vannoorenberghe and Flouzat, 2006).

4. Clustering-Based Segmentation

The multithresholding methods based on histogram analysis require the computation of the histogram, which is both time and memory consuming. For this reason, many authors have proposed looking directly for clusters of color points in the color space. Among the clustering techniques based on the least sum of squares criterion, the c-means algorithm is one of the most widely used methods for clustering multidimensional data (Ismail and Kamel, 1989). This iterative approach requires the analyst to set the desired number Nω of pixel classes. Let Cj denote the gravity center of the cluster associated with the class ωj. This approach seeks to minimize the global within-class dispersion, defined by

$$S_W = \sum_{j=1}^{N_\omega} \sum_{P \in \omega_j} \left\| I(P) - C_j \right\|^2. \tag{57}$$
At the initial iteration step, the locations of the Nω gravity centers are randomly fixed in the color space. At each iteration step t of the process, the Nω gravity centers Cj(t) are updated via the following scheme:

• Each pixel P is assigned to the class ωj0 whose gravity center Cj0(t) is the nearest to the color I(P) in the color space:

$$\left\| C_{j_0}(t) - I(P) \right\| = \min_{j=1,\dots,N_\omega} \left\| C_j(t) - I(P) \right\|.$$

• The gravity center Cj(t) of each class ωj is updated by taking into account the pixels assigned to ωj:

$$C_j(t) = \frac{1}{|\omega_j|} \sum_{P \in \omega_j} I(P),$$

where |ωj| denotes the number of pixels assigned to ωj.

• The variation ε of the gravity centers between the preceding (t − 1) and current (t) steps is determined by:

$$\varepsilon = \sum_{j=1}^{N_\omega} \left\| C_j(t) - C_j(t-1) \right\|.$$
FIGURE 31. Clustering-based segmentation of the image of Figure 21a. (a) Image segmented by the c-means scheme with a random initialization of the gravity centers. (b) Image segmented by the c-means scheme with an interactive initialization of the gravity centers. (c) Image segmented by the competitive learning scheme. (See Color Insert.)
• If ε is higher than a threshold, the process performs a new iteration step.

Whatever the initialization of the gravity centers, the scheme converges to a local minimum of SW. The algorithm may therefore stop at a local optimum that is far from the global optimum, especially when large numbers of pixels and clusters are involved. Since both the convergence and the classification results depend on the initialization step, different approaches have been proposed:

• The initial locations of the gravity centers are randomly selected (see Figure 31a).
• During supervised learning, the initial locations of the gravity centers are interactively selected so that they are uniformly scattered in the color space (see Figure 31b).

The isodata method adds two rules to determine the number Nω of classes: if the within-class dispersion of a class is greater than a threshold, the class is split into two different classes; if the distance between two gravity centers is too small, the two corresponding classes are merged into one class (Takahashi, Nakatani and Abe, 1995). In order to minimize the within-class dispersion SW, Uchiyama and Arbib construct the Nω classes by means of an iterative process based on competitive learning (Uchiyama and Arbib, 1994). This process, which does not require an initialization step, converges to an approximation of the optimal solution. We apply the competitive learning scheme to the image in Figure 21a (see Figure 31c). By examining the segmented image, we notice that the regions R5 and R6 are merged into one class because their color component distributions overlap. The regions R4 and R2 are represented by one class because the region with the lowest population (in our case R4) is not taken into account by the competitive learning scheme. Liew et al. apply the fuzzy c-means, a fuzzy classification scheme in which each pixel P belongs to each class ωj with a membership degree Uj(P) ranging between 0 and 1 (Liew, Leung and Lau, 2000). This method constructs the classes by minimizing the m-fuzzy within-class dispersion J(m):

$$J(m) = \sum_{j=1}^{N_\omega} \sum_{P} \left( U_j(P) \right)^m \left\| C_j - I(P) \right\|^2, \tag{58}$$

where

$$U_j(P) = \left[ \sum_{i=1}^{N_\omega} \left( \frac{\left\| I(P) - C_j \right\|}{\left\| I(P) - C_i \right\|} \right)^{2/(m-1)} \right]^{-1}. \tag{59}$$
The color components of the gravity center Cj associated with the class ωj are defined by

$$C_j^k = \frac{\sum_P \left( U_j(P) \right)^m I^k(P)}{\sum_P \left( U_j(P) \right)^m}, \qquad k = R, G, B. \tag{60}$$

Once the gravity centers of the classes are estimated, the membership degrees of each pixel to the classes are determined. The pixel P is assigned to the class for which its membership degree is the highest (Scheunders, 1997).
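Both clustering schemes can be sketched in a few lines. This is a minimal Python sketch assuming NumPy, with one row per pixel color. `c_means` follows the assignment, update, and ε steps listed above. `fuzzy_c_means` alternates Eq. (59) and Eq. (60) for a fixed number of iterations, which is a simplification of its stopping rule. The optional `init` argument stands for interactively chosen gravity centers (as in Figure 31b). Without it, randomly drawn image colors are used, which is an assumption, since the text only says the centers are placed randomly in the color space. All names are hypothetical.

```python
import numpy as np


def _initial_centers(x, n_classes, init, seed):
    if init is not None:                     # e.g., interactively chosen centers
        return np.asarray(init, dtype=float)
    rng = np.random.default_rng(seed)        # random color points of the image
    return x[rng.choice(len(x), n_classes, replace=False)]


def c_means(pixels, n_classes, init=None, eps=1e-3, max_iter=100, seed=0):
    """Hard c-means on color points (one row per pixel)."""
    x = pixels.astype(float)
    centers = _initial_centers(x, n_classes, init, seed)
    for _ in range(max_iter):
        # Assign each pixel to the nearest gravity center.
        dist = np.linalg.norm(x[:, None, :] - centers[None], axis=2)
        labels = dist.argmin(axis=1)
        # Update each gravity center as the mean color of its pixels.
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(n_classes)])
        variation = np.linalg.norm(new - centers, axis=1).sum()   # epsilon
        centers = new
        if variation <= eps:
            break
    return labels, centers


def fuzzy_c_means(pixels, n_classes, m=2.0, init=None, n_iter=100, seed=0):
    """Fuzzy c-means: memberships from Eq. (59), centers from Eq. (60)."""
    x = pixels.astype(float)
    centers = _initial_centers(x, n_classes, init, seed)
    for _ in range(n_iter):
        d = np.linalg.norm(x[:, None, :] - centers[None], axis=2)
        d = np.maximum(d, 1e-12)   # guard against a pixel lying exactly on a center
        # ratio[p, j, i] = (d_j / d_i) ** (2 / (m - 1)); U_j(P) = 1 / sum_i ratio
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=2)
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
    # Each pixel goes to the class with the highest membership degree.
    return u.argmax(axis=1), u, centers
```

For an image `img` of shape (H, W, 3), one would call, for example, `labels, _ = c_means(img.reshape(-1, 3), 6)` and reshape `labels` to (H, W). The dependence on initialization discussed above shows up directly as a dependence on `init` and `seed`.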
FIGURE 32. Relationship between the regions in the image and the clusters of color points in a color space. (a) Original synthetic image. (b) Clusters of color points projected onto the (R, G) chromatic plane. (See Color Insert.)
5. Spatial-Color Classification

Let us consider the synthetic image of Figure 32a, which is composed of six regions, denoted Ri, i = 1, ..., 6, with different sizes and shapes. The color distribution in this synthetic image differs from that of Figure 16a. To simplify the illustration, the blue level of the pixels is set to 0. The projections of the color points representing the regions onto the (R, G) chromatic plane are displayed in Figure 32b. The regions R1 and R2 give rise to well-separated clusters of color points, whereas those of the regions R3 and R4 form two overlapping clusters. Note that since the color points of the regions R5 and R6 constitute a single cluster, these regions cannot be discriminated by analyzing the color point distribution in the (R, G) chromatic plane. This image shows that there is not always a one-to-one correspondence between the regions in the image and the clusters of color points in the color space. Classical classification schemes cannot split a single cluster of color points that corresponds to several different regions of the image. A class construction procedure that simultaneously considers the color properties of the pixels and their spatial arrangement in the image is therefore appealing for identifying pixel classes that correspond to the actual regions (Balasubramanian, Allebach and Bouman, 1994; Orchard and Bouman, 1991). The JSEG algorithm proposed by Deng and Manjunath (2001) separates the segmentation procedure into two successive steps. In the first step, pixels are classified in the color space without considering their spatial distribution. Then the colors of the pixels are replaced by their corresponding color class
labels, thus forming a class-map of the image. At each pixel, a criterion denoted J, which depends on the dispersion of the labels associated with the neighboring pixels, is measured. In the second step, applying this criterion to the class-map yields an image in which high and low values correspond to possible region boundaries and region interiors, respectively. A region growing method (see Section III.C.1) using this criterion provides the final segmentation. Cheng, Jiang and Wang (2002) propose a fuzzy homogeneity approach that simultaneously takes into account the color and spatial properties of the pixels. Their segmentation scheme is also divided into two steps. In the first step, each of the three color component images Ik, k = R, G, B, is analyzed to aggregate its pixels into classes. For this purpose, the authors introduce the homogram, which is defined for each color component level as the mean of the homogeneity degrees (described in Cheng et al., 1998) between the pixels with this level and their neighbors. A fuzzy analysis detects the peaks of each homogram in order to identify the prominent classes of pixels of each color component image. The prominent classes built by the analyses of the three color component images are combined to form the classes of pixels of the color image. When some of these classes contain too few pixels, the procedure may lead to oversegmentation. This problem is partially addressed in the second step by merging neighboring classes. This interesting approach is very sensitive to the adjustment of the parameters required for determining the peaks of each homogram in the first step and for merging classes in the second step. Another solution consists of estimating the gradient of the probability density of the colors jointly in the spatial and color domains of the image (Comaniciu and Meer, 2002).
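The joint spatial-color mean-shift estimation just cited can be illustrated with a deliberately naive sketch, assuming NumPy. It is not the EDISON implementation. It uses flat kernels in which a pixel falls in the window of a point when its spatial Chebyshev distance is at most `hs` and its color Euclidean distance is at most `hr`, so `hs` and `hr` act as half-widths and radii here, which is an assumption about their meaning. Converged color modes closer than `hr / 2` are grouped into one class, which is a hypothetical grouping rule. Memory grows as O(N²), so only tiny images are practical.

```python
import numpy as np


def mean_shift_modes(image, hs, hr, n_iter=20, tol=1e-3):
    """Flat-kernel mean shift in the joint (row, col, color) domain."""
    h, w, c = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    pts = np.column_stack([rows.ravel(), cols.ravel(),
                           image.reshape(-1, c)]).astype(float)
    modes = pts.copy()
    for _ in range(n_iter):
        ds = np.abs(modes[:, None, :2] - pts[None, :, :2]).max(axis=2)
        dr = np.linalg.norm(modes[:, None, 2:] - pts[None, :, 2:], axis=2)
        win = ((ds <= hs) & (dr <= hr)).astype(float)
        count = win.sum(axis=1, keepdims=True)
        # Mean of the points inside each window (a point with an empty window stays put).
        new = np.where(count > 0, (win @ pts) / np.maximum(count, 1.0), modes)
        shift = np.abs(new - modes).max()
        modes = new
        if shift < tol:
            break
    return modes


def mean_shift_labels(image, hs, hr):
    """Group pixels whose converged color modes lie within hr / 2 of each other."""
    color_modes = mean_shift_modes(image, hs, hr)[:, 2:]
    labels = np.empty(len(color_modes), dtype=int)
    reps = []                                # one representative mode per class
    for p, mode in enumerate(color_modes):
        for k, rep in enumerate(reps):
            if np.linalg.norm(mode - rep) < hr / 2:
                labels[p] = k
                break
        else:
            reps.append(mode)
            labels[p] = len(reps) - 1
    return labels.reshape(image.shape[:2])
```

As the following discussion of Figure 33 suggests, the color bandwidth governs the outcome. In this sketch, two flat color patches 80 levels apart stay separate when `hr` is smaller than that gap, while a large `hr` lets their windows overlap and their modes drift together.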
Each pixel is associated with the closest local mode of the density distribution, determined by means of the nonparametric "mean-shift" estimation. The quality of the segmentation depends on the precise adjustment of two parameters that control the resolution in the spatial and color domains. We propose to segment the image of Figure 32a with the EDISON software developed by Comaniciu and Meer (available at the web address: http://www.caip.rutgers.edu/riul/research/code.html). For this purpose, the parameters hs and hr, which are the side lengths of the windows used to compute the gradient in the spatial and color domains, respectively, are set to the values suggested by the authors for segmenting most color images. The parameter M, which corresponds to the minimal population size of the classes, is set to 1 so that all the classes, even those with small populations, are selected. Figures 33a and b show the best results obtained by the "mean-shift" analysis. Figure 33a shows that when the side length of the color domains is
FIGURE 33. Image in Figure 32a segmented by the "mean-shift" analysis. The pixel labels are displayed in false colors. (a) hs = 8, hr = 4, M = 1. (b) hs = 8, hr = 8, M = 1. (See Color Insert.)
small (hr = 4), the mean-shift procedure succeeds in separating the two clusters of color points representing the two green concentric disks. However, it also splits the cluster of color points representing the image background, which is segmented into several different regions. When the side length of the color domains increases (hr = 8), the procedure fails to separate the two clusters of color points corresponding to the two green concentric disks (see Figure 33b). It is difficult, or even impossible, to adjust the parameters of this procedure so that it identifies the two classes of pixels representing the two concentric green disks without splitting the clusters corresponding to the other regions of the image. Note, however, that the mean-shift method does not require the number of pixel classes to be set. When this number is known, adding a postprocessing step to the mean-shift method could lead to good segmentation results. Macaire, Vandenbroucke and Postaire (2006) assume that each region can be considered as a subset of strongly connected pixels with homogeneous colors. Hence, they propose selecting parallelepipedic boxes in the color space that define subsets (named color subsets) of strongly connected pixels in the image with colors as homogeneous as possible. In other words, the pixels whose color points fall into such a box (named a color domain) constitute a color subset, so that each color domain is associated with a color subset. The color and connectedness properties of the color subsets associated with the considered color domains are simultaneously taken into account by the pixel class construction scheme. To measure these properties, Fontaine, Macaire and Postaire (2000) introduced the concept of spatial-color compactness degree (SCD) of a color subset, which measures
FIGURE 34. Image in Figure 32a segmented by the SCD analysis. (a) Color domains of the image in Figure 32a automatically selected by the proposed procedure. (b) Segmented color image resulting from the analysis of the image of Figure 32a. The boundaries of the reconstructed regions are displayed in black. (See Color Insert.)
the spatial arrangement of its pixels in the image plane and the dispersion of the color points representing its pixels in the color space. In order to select the color domains that define color subsets corresponding to the actual regions of the image, the pixel class construction procedure looks for maxima of the SCD in the color space. The labeled image in Figure 34b shows how the regions of the image in Figure 32a are reconstructed, according to specific rules, from the color domains of the identified classes displayed in Figure 34a (Macaire, Vandenbroucke and Postaire, 2006). In Figure 34b, the boundaries between the reconstructed regions are overlaid in black. This image shows that the six regions are well reconstructed by the SCD analysis. The overlapping of the color point distributions of the regions R3 and R4 explains why a few pixels of the region R3 are misclassified. Note that despite the strong overlapping of the color point distributions of the two regions R5 and R6, most pixels of the regions corresponding to the two green concentric disks are correctly classified. This result shows that this method is able to handle non-equiprobable and overlapping classes of pixels.

E. Summary

In this section, we have presented classical color segmentation methods. Each method is suited to a particular type of color image, so no single method provides a satisfactory segmentation of all images. Several
approaches combine the analysis of the spatial interactions between pixels with that of the color distribution to improve segmentation results. Moreover, we have shown with different examples that the color distribution strongly depends on the selection of the color space. This explains why the results provided by segmentation procedures are affected by the chosen color space. The next section describes studies of the relationships between color spaces and image segmentation.
IV. RELATIONSHIPS BETWEEN SEGMENTATION AND COLOR SPACES
A. Introduction

The first part of this chapter has shown that color information can be encoded in several color spaces, each with its own specific properties. The second part of the chapter has shown that color image segmentation methods can be divided into two main approaches, depending on whether the pixels are analyzed in the image plane or in a color space. If the pixels are analyzed in the image plane, two dual types of methods are available: on the one hand, the segmentation can be produced by an edge detection scheme; on the other hand, it can be produced via region construction. Several authors have studied the influence of color spaces on color image segmentation in order to determine whether there exists a color space that improves the quality reached by color image segmentation methods. To determine whether this "best" color space exists, they assessed the results produced by color image segmentation methods by means of evaluation methods. First, Ohlander, Price and Reddy (1978) and Ohta, Kanade and Sakai (1980) evaluated the quality of color segmentation visually. Because it is very difficult to evaluate the segmentation of a natural scene, several authors adopted the qualitative evaluation method as the most reliable one. This part of the chapter is organized as follows. Section IV.B presents the primary evaluation methods used for color image segmentation by edge detection, and Section IV.C presents the main evaluation methods used for color image segmentation by region construction. In both sections, the subsections present quantitative evaluation methods with ground truth, which correspond to supervised evaluations, and quantitative evaluation methods without ground truth, which correspond to unsupervised evaluations.
To illustrate the relationships between the color spaces and segmentation, Section IV.D details two pixel classification methods that determine the most discriminating color space (hybrid color space or classic color spaces).
COLOR SPACES AND IMAGE SEGMENTATION
B. Edge Detection

To evaluate the result of an edge detection segmentation, the most common method in the literature compares the segmented image $I$ with the ground truth $I_{ref}$ (also called the reference segmentation or gold standard). These comparison methods are presented in Section IV.B.1. Because very few evaluation methods without ground truth are available and used in the literature for color edge detection, none of them is presented here. Finally, Section IV.B.2 presents a table summarizing the color spaces that these criteria select as providing the best segmentation results.

1. Quantitative Evaluation Methods with Ground Truth

The most common way to evaluate a segmented image computed by an edge detection algorithm consists of comparing the edges extracted in the segmented image $I$ with the edges of the ground truth image $I_{ref}$. Two primary approaches are available for this type of evaluation: a probabilistic error approach and a spatial approach. Both measure the discrepancy between the binary edge image produced by the segmentation algorithm and the edges of the ground truth (Román-Roldán et al., 2001). To detail both approaches, we define the following:

• Mistake: the discrepancy between the detected edge and the real edge arises from individual pixel-to-pixel discrepancies, which are referred to as mistakes. They may be of two kinds:
  1. Bit, or overdetected edge pixel: a mistake due to excess, when a pixel is erroneously defined as an edge pixel.
  2. Hole, or subdetected edge pixel: a mistake due to the failure of the method to identify an edge pixel as such.
• $N_b$: the number of bits in the segmented image.
• $N_h$: the number of holes in the segmented image.
• $N_e$: the number of edge pixels in the ground truth.
• $N$: the number of pixels in the segmented image.
a. Probabilistic Error Approach. The first discrepancy measure, established by Peli and Malah (1982), is called the error probability and is denoted $P_e(I, I_{ref})$ hereafter. This rate is based on a statistical comparison between the number $N_b$ of bits and the number $N_e$ of edge pixels in the ground truth image $I_{ref}$
BUSIN , VANDENBROUCKE AND MACAIRE
and can be computed as:

$$P_e(I, I_{ref}) = \frac{N_b}{N_e}. \qquad (61)$$
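As an illustration only, the following minimal Python sketch counts bits, holes, ground-truth edge pixels, and pixels from two binary edge maps and then evaluates Eq. (61). It assumes both maps are same-sized 2D lists of booleans (True marks an edge pixel); the function names `edge_counts` and `peli_malah_error` are hypothetical.

```python
def edge_counts(detected, reference):
    """Return (N_b, N_h, N_e, N) for two same-sized binary edge maps.

    detected, reference: 2D lists of booleans, True marking an edge pixel.
    """
    n_bits = n_holes = n_edges = n_pixels = 0
    for det_row, ref_row in zip(detected, reference):
        for det, ref in zip(det_row, ref_row):
            n_pixels += 1
            if ref:
                n_edges += 1
                if not det:
                    n_holes += 1   # hole: a true edge pixel that was missed
            elif det:
                n_bits += 1        # bit: a pixel wrongly marked as edge
    return n_bits, n_holes, n_edges, n_pixels


def peli_malah_error(detected, reference):
    """Error probability P_e = N_b / N_e of Eq. (61)."""
    n_bits, _, n_edges, _ = edge_counts(detected, reference)
    if n_edges == 0:
        raise ValueError("the ground truth contains no edge pixel")
    return n_bits / n_edges


# Toy 3 x 3 example: a vertical ground-truth edge, one bit and one hole.
reference = [[False, True, False],
             [False, True, False],
             [False, True, False]]
detected = [[False, True, True],
            [False, False, False],
            [False, True, False]]
print(edge_counts(detected, reference))       # (1, 1, 3, 9)
print(peli_malah_error(detected, reference))  # 0.3333333333333333
```

Note that the hole in the toy example does not change $P_e$ at all, which is the weakness the next measure addresses.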
Another probabilistic discrepancy measure, introduced by Lee, Chung and Park (1990) and denoted $P_E(I, I_{ref})$ hereafter, is based on a two-class problem (classes $\omega_1$ and $\omega_2$). In edge detection segmentation, the class $\omega_1$ contains the pixels assigned as edge pixels, whereas the class $\omega_2$ contains the pixels assigned as nonedge pixels. This measure improves on that of Peli and Malah (1982) because it takes both types of errors (holes and bits) into account. Based on the classical classification error, $P_E(I, I_{ref})$ can be expressed as:

$$P_E(I, I_{ref}) = P(\omega_1) \times P(\omega_2 \mid \omega_1) + P(\omega_2) \times P(\omega_1 \mid \omega_2) = \frac{N_h + N_b}{N}, \qquad (62)$$

where:
• $P(\omega_1) = N_e / N$,
• $P(\omega_2) = (N - N_e) / N$,
• $P(\omega_2 \mid \omega_1) = N_h / N_e$,
• $P(\omega_1 \mid \omega_2) = N_b / (N - N_e)$.

Substituting these probabilities, the factors $N_e$ and $N - N_e$ cancel, which yields the right-hand side of Eq. (62).
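To make the cancellation in Eq. (62) concrete, here is a small Python check that builds $P_E$ from its four probabilities with exact rational arithmetic. It assumes the counts have already been obtained; `lee_error` is a hypothetical name, not code from the cited work.

```python
from fractions import Fraction


def lee_error(n_bits, n_holes, n_edges, n_pixels):
    """Two-class discrepancy P_E of Eq. (62), built from its four probabilities."""
    if not 0 < n_edges < n_pixels:
        raise ValueError("need both edge and nonedge pixels in the ground truth")
    p_w1 = Fraction(n_edges, n_pixels)                      # P(w1)
    p_w2 = Fraction(n_pixels - n_edges, n_pixels)           # P(w2)
    p_w2_given_w1 = Fraction(n_holes, n_edges)              # P(w2 | w1): holes
    p_w1_given_w2 = Fraction(n_bits, n_pixels - n_edges)    # P(w1 | w2): bits
    return p_w1 * p_w2_given_w1 + p_w2 * p_w1_given_w2


# The probabilistic form collapses to (N_h + N_b) / N, as Eq. (62) states.
print(lee_error(n_bits=1, n_holes=1, n_edges=3, n_pixels=9))   # 2/9
```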
b. Spatial Approach. Because the discrepancy measures of Peli and Malah and of Lee et al. do not consider the spatial distance between the edges detected in the segmented image and the edges of the ground truth, most authors prefer Pratt's discrepancy measure, based on the mean-square distance and called the figure of merit (FOM) (Pratt, 1978):

$$\mathrm{FOM}(I, I_{ref}) = \frac{1}{M} \sum_{i=1}^{N_e - N_h + N_b} \frac{1}{1 + \alpha\, d(i)^2}, \qquad (63)$$

where $\alpha$ is a scaling parameter, $M = \max(N_e, N_e - N_h + N_b)$, and $d(i)$ is the distance from the $i$th edge pixel in the segmented image to its exact location in the ground truth. Note that $N_e - N_h + N_b$ is the number of edge pixels detected in the segmented image. The measure is normalized so that FOM = 1 indicates a perfect segmentation. Pratt's measure is the most frequently used discrepancy measure, although Strasters and Gerbrands (1991) modified it to improve its accuracy with respect to bits and holes. Another interesting discrepancy measure, proposed by Román-Roldán et al. (2001), requires a training step to set its parameters.
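A minimal Python sketch of Eq. (63) follows. It assumes edges are given as sets of (row, col) coordinates and takes $d(i)$ as the Euclidean distance to the nearest ground-truth edge pixel, found by brute force; the default $\alpha = 1/9$ is a commonly quoted value, used here as an assumption rather than a prescription. `pratt_fom` is a hypothetical name.

```python
def pratt_fom(detected, reference, alpha=1.0 / 9.0):
    """Pratt's figure of merit of Eq. (63).

    detected, reference: sets of (row, col) coordinates of edge pixels.
    d(i) is taken as the Euclidean distance from each detected edge pixel to
    the nearest ground-truth edge pixel (brute-force search, for clarity).
    """
    if not reference:
        raise ValueError("the ground truth contains no edge pixel")
    n_detected = len(detected)            # equals N_e - N_h + N_b
    m = max(len(reference), n_detected)   # M = max(N_e, N_e - N_h + N_b)
    total = 0.0
    for r, c in detected:
        d2 = min((r - rr) ** 2 + (c - cc) ** 2 for rr, cc in reference)
        total += 1.0 / (1.0 + alpha * d2)
    return total / m


reference = {(0, 0), (1, 0), (2, 0)}            # vertical ground-truth edge
shifted = {(0, 1), (1, 1), (2, 1)}              # same edge, one pixel to the right
print(pratt_fom(reference, reference))          # 1.0
print(round(pratt_fom(shifted, reference), 6))  # 0.9
```

Unlike $P_e$ and $P_E$, which would count the shifted edge as three bits and three holes, the FOM only mildly penalizes a one-pixel displacement.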
TABLE 6
METHODS USED TO EVALUATE COLOR IMAGE SEGMENTATION BY EDGE DETECTION

| Reference | Evaluation method | Candidate color spaces | Selected color spaces |
|---|---|---|---|
| Rakotomalala et al. (1998) | Pratt's discrepancy measure and probability error | (R, G, B), (X, Y, Z), (I1, I2, I3), (A, C1, C2), (I, S, T), (L*, a*, b*), (L*, u*, v*) | (L*, a*, b*), (L*, u*, v*) |
| Wesolkowski, Jernigan and Dony (2000) | Pratt's discrepancy measure | (R, G, B), (X, Y, Z), (L*, a*, b*), (L*, u*, v*), (r, g, b), (l1, l2, l3), (h1, h2, h3) | (h1, h2, h3) |
2. Evaluation Methods for Color Image Segmentation by Edge Detection

Several authors have applied these methods to evaluate the accuracy of color image segmentation by edge detection in several color spaces. For the sake of simplicity, their results are summarized in Table 6. Table 6 indicates that no color space is consistently best suited to edge detection. Moreover, the conclusions of the cited authors are contradictory: Rakotomalala et al. (1998) select the (L*, a*, b*) or (L*, u*, v*) spaces, whereas Wesolkowski, Jernigan and Dony (2000) select neither of them as a color space in which edge detection is efficient. The next section details the primary evaluation methods for region construction segmentation, to determine whether an efficient color space exists for such methods.

C. Region Construction

Evaluation methods for region construction segmentation can be divided into two main approaches according to whether or not a ground truth is available. The primary evaluation methods based on a discrepancy measure with respect to a ground truth are presented in Section IV.C.1. Without ground truth, the segmented image cannot be compared with a reference; the unsupervised evaluation methods, which attempt to mimic human sensitivity, are therefore presented in Section IV.C.2. Finally, Section IV.C.3 summarizes in a table the color spaces that these criteria select as providing the best segmentation results.

1. Quantitative Evaluation Methods with Ground Truth

Supervised evaluation methods of color image segmentation by region construction can be divided into two families. The first family is based on the
discrepancy area of the regions between the segmented image $I$ and the ground truth $I_{ref}$. For this family, the correct classification rate is the most often used measure (Sonka, Hlavac and Boyle, 1994). The second family evaluates the color discrepancy between the pixels of the segmented image $I$ and those of the ground truth $I_{ref}$. Two approaches are available for this family. The first analyzes the color discrepancies between $I$ and $I_{ref}$; for these evaluation methods, the color of each pixel in both $I$ and $I_{ref}$ must be the mean color of the pixels that belong to the same region. The most common evaluation methods for this approach are the mean square error (MSE) and the peak signal-to-noise ratio (PSNR). The second approach evaluates the probability of mislabeling pixels between $I$ and $I_{ref}$. Receiver operating characteristic (ROC) curves and the probabilistic error approach are often used for such an analysis.

a. Correct Classification Rate. Generally, a similarity measure based on the correct classification rate between a segmented image $I$ and a ground truth $I_{ref}$ is used (Sonka, Hlavac and Boyle, 1994). Following the notation of Chabrier et al. (2006), the first step of the method computes the superposition table $T(I, I_{ref})$:

$$T(I, I_{ref}) = \left\{ \mathrm{card}\left(R_i \cap R_j^{ref}\right) \right\}, \quad i = 1, \ldots, N_R, \quad j = 1, \ldots, N_R^{ref}, \qquad (64)$$

where $\mathrm{card}(R_i \cap R_j^{ref})$ is the number of pixels that belong both to the region $R_i$ in the segmented image $I$ and to the region $R_j^{ref}$ in the ground truth $I_{ref}$, while $N_R$ and $N_R^{ref}$ are the numbers of regions in $I$ and $I_{ref}$, respectively. From this superposition table, only the $C$ couples that maximize $\mathrm{card}(R_i \cap R_j^{ref})$ are selected to compute the correct classification rate. The resulting dissimilarity measure is normalized, so that a value close to 0 means that $I$ and $I_{ref}$ share strongly similar regions, whereas a value close to 1 means that they are very dissimilar. This measure, denoted $\mathrm{CCR}(I, I_{ref})$, is computed as:

$$\mathrm{CCR}(I, I_{ref}) = \frac{\mathrm{card}(I) - \sum_{C} \mathrm{card}\left(R_i \cap R_j^{ref}\right)}{\mathrm{card}(I)}. \qquad (65)$$
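The following Python sketch builds the superposition table of Eq. (64) and evaluates Eq. (65) on label images stored as 2D lists. The text does not specify how the $C$ couples are chosen, so this sketch assumes that each segmented region forms one couple with the reference region it overlaps most; a one-to-one matching would be a stricter alternative. Function names are hypothetical.

```python
from collections import Counter


def superposition_table(segmented, reference):
    """T(I, Iref) of Eq. (64): pixel counts of each (R_i, R_j^ref) overlap."""
    table = Counter()
    for seg_row, ref_row in zip(segmented, reference):
        for seg_label, ref_label in zip(seg_row, ref_row):
            table[(seg_label, ref_label)] += 1
    return table


def ccr(segmented, reference):
    """Dissimilarity of Eq. (65).

    Assumption: each segmented region R_i forms one couple with the
    reference region it overlaps most.
    """
    table = superposition_table(segmented, reference)
    best_overlap = {}
    for (seg_label, _), count in table.items():
        best_overlap[seg_label] = max(best_overlap.get(seg_label, 0), count)
    n_pixels = sum(table.values())        # card(I)
    return (n_pixels - sum(best_overlap.values())) / n_pixels


reference = [[1, 1, 2, 2],
             [1, 1, 2, 2]]
segmented = [[1, 1, 1, 2],
             [1, 1, 2, 2]]
print(ccr(segmented, reference))   # 0.125
print(ccr(reference, reference))   # 0.0
```

Under this pairing assumption an oversegmented image can still score 0, which is one reason the choice of couples matters in practice.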
b. Mean Square Error. The mean square error, denoted $\mathrm{MSE}(I, I_{ref})$, expresses the delineation accuracy and region homogeneity of the final partition of the segmented image $I$ with respect to the ground truth $I_{ref}$. The lower $\mathrm{MSE}(I, I_{ref})$, the better the segmentation result. It can be computed as:

$$\mathrm{MSE}(I, I_{ref}) = \sum_{P \in I} \left\| I_{ref}(P) - I(P) \right\|^2, \qquad (66)$$

where $I_{ref}(P)$ is the mean color of the ground-truth region to which the pixel $P$ belongs, and $I(P)$ is the mean color of the segmented region to which $P$ belongs.

c. Peak SNR. The peak signal-to-noise ratio, denoted $\mathrm{PSNR}(I, I_{ref})$, is closely related to $\mathrm{MSE}(I, I_{ref})$ and expresses the same concept. The main difference lies in the interpretation of its value: the higher $\mathrm{PSNR}(I, I_{ref})$, the better the segmentation result. The PSNR of an $X \times Y$ image, measured in decibels, can be computed as:

$$\mathrm{PSNR}(I, I_{ref}) = 10 \times \log_{10} \frac{255^2 \times X \times Y \times \mathrm{card}(k)}{\sum_{P \in I} \left\| I_{ref}(P) - I(P) \right\|^2}, \qquad (67)$$

where $\mathrm{card}(k)$ is the number of color components.

d. Probabilistic Error Approach. The discrepancy measure introduced in Section IV.B.1 has been extended to problems with more than two classes. In such a problem, each class $\omega_n$ corresponds to a specific region, usually determined by pixel classification, as in Lim and Lee (1990) and Park, Yun and Lee (1998). The probabilistic discrepancy measure $P_E(I, I_{ref})$ for the multiclass problem can be calculated as:

$$P_E(I, I_{ref}) = \sum_{j=1}^{R_n} \sum_{\substack{i=1 \\ i \neq j}}^{R_n} P(R_i \mid R_j) \times P(R_j). \qquad (68)$$
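The MSE and PSNR criteria of Eqs. (66) and (67) can be sketched in Python as follows. The sketch assumes an image stored as a 2D list of color tuples and two label maps of the same size; it builds both region-mean images as the text requires and takes $\mathrm{card}(k)$ as the number of channels. All function names are hypothetical.

```python
import math


def region_mean_image(image, labels):
    """Replace every pixel color by the mean color of its region."""
    sums, counts = {}, {}
    for img_row, lab_row in zip(image, labels):
        for color, lab in zip(img_row, lab_row):
            acc = sums.setdefault(lab, [0.0] * len(color))
            for c, value in enumerate(color):
                acc[c] += value
            counts[lab] = counts.get(lab, 0) + 1
    means = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
    return [[means[lab] for lab in lab_row] for lab_row in labels]


def sum_squared_error(image, segmented, reference):
    """MSE(I, Iref) as printed in Eq. (66): sum of ||Iref(P) - I(P)||^2."""
    seg_img = region_mean_image(image, segmented)
    ref_img = region_mean_image(image, reference)
    return sum(
        sum((a - b) ** 2 for a, b in zip(p_ref, p_seg))
        for row_ref, row_seg in zip(ref_img, seg_img)
        for p_ref, p_seg in zip(row_ref, row_seg)
    )


def psnr(image, segmented, reference, peak=255.0):
    """PSNR(I, Iref) of Eq. (67) in decibels (card(k) = number of channels)."""
    error = sum_squared_error(image, segmented, reference)
    if error == 0:
        return math.inf                   # identical region-mean images
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return 10.0 * math.log10(peak ** 2 * width * height * channels / error)


image = [[(0, 0, 0), (60, 60, 60)]]      # a 1 x 2 RGB image
reference = [[1, 2]]                     # two ground-truth regions
segmented = [[1, 1]]                     # the segmentation merges them
print(sum_squared_error(image, segmented, reference))   # 5400.0
print(round(psnr(image, segmented, reference), 2))      # 18.59
```

Dividing the sum of Eq. (66) by $X \times Y \times \mathrm{card}(k)$ gives the per-sample mean, which is exactly what the numerator of Eq. (67) compensates for.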
e. ROC Curves. ROC curves are commonly used to present results for binary decision problems. Comparing the class assigned to each pixel by a classification method with its class in the ground truth gives four possible cases, summarized in the confusion matrix of Table 7.

TABLE 7
CONFUSION MATRIX

| Class in $I$ (rows) / class in $I_{ref}$ (columns) | $\omega_1$ | $\omega_2$ |
|---|---|---|
| $\omega_1$ | True positive (TP) | False positive (FP) |
| $\omega_2$ | False negative (FN) | True negative (TN) |

ROC curves are plotted in a 2D space whose x-axis is the fraction of false positives (false-positive rate, or FPR) and whose y-axis is the fraction of true positives (true-positive rate, or TPR) (Liu and Shriberg, 2007). The TPR and FPR are expressed as:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad (69)$$

$$\mathrm{FPR} = \frac{FP}{FP + TN}. \qquad (70)$$

Precision-recall (PR) curves are an analogous tool. Davis and Goadrich (2006) have shown that a deep relationship exists between ROC space and PR space: a curve dominates in ROC space if and only if it dominates in PR space. Precision and recall are expressed as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad (71)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}. \qquad (72)$$

Martin, Fowlkes and Malik (2004) emphasize that PR curves are better suited than ROC curves to color image segmentation by edge detection.

2. Quantitative Evaluation Methods without Ground Truth

The mislabeling rate is often used to evaluate the quality of a segmentation. However, a segmented image $I_a$ with a lower mislabeling rate than a segmented image $I_b$ may not look like a better segmentation when estimated visually, as illustrated by Liu and Yang (1994). They explain this phenomenon by the variable sensitivity of human perception according to the inspected scene. Moreover, constructing ground-truth segmentations of real scene images is laborious; Martin et al. (2001) have built a benchmark image database that contains human segmentations of real scene images. Global measures of segmentation quality were first proposed by Liu and Yang (1994) and improved by Borsotti, Campadelli and Schettini (1998). Neither criterion requires any evaluation parameter.

a. Criterion of Liu and Yang. To evaluate segmentation results both locally and globally, Liu and Yang (1994) define the evaluation function $F(I)$ as:

$$F(I) = \sqrt{N_R} \times \sum_{i=1}^{N_R} \frac{e_i^2}{\sqrt{A_i}}, \qquad (73)$$

where $N_R$ is the number of regions in the segmented image, $A_i$ is the area of the $i$th region, and $e_i$ is its mean color error. In this equation, the term $\sqrt{N_R}$ penalizes oversegmentation, while the local term $e_i^2 / \sqrt{A_i}$ penalizes small regions or regions with a large color error. The lower the value of $F(I)$, the better the segmentation result.
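A minimal Python sketch of Liu and Yang's criterion, Eq. (73), is given below. The text calls $e_i$ the mean color error without defining it precisely, so the sketch assumes that $e_i^2$ is the sum, over the pixels of region $i$, of the squared Euclidean distance between each pixel color and the region's mean color. `liu_yang_f` is a hypothetical name.

```python
import math


def liu_yang_f(image, labels):
    """Liu and Yang's F(I) of Eq. (73).

    Assumption: e_i^2 is the sum, over the pixels of region i, of the squared
    Euclidean distance between the pixel color and the region's mean color.
    """
    regions = {}
    for img_row, lab_row in zip(image, labels):
        for color, lab in zip(img_row, lab_row):
            regions.setdefault(lab, []).append(color)
    total = 0.0
    for colors in regions.values():
        area = len(colors)                                   # A_i
        mean = [sum(channel) / area for channel in zip(*colors)]
        e2 = sum(sum((v - m) ** 2 for v, m in zip(color, mean)) for color in colors)
        total += e2 / math.sqrt(area)
    return math.sqrt(len(regions)) * total                   # sqrt(N_R) * sum


image = [[(0, 0, 0), (60, 60, 60)]]
print(liu_yang_f(image, [[1, 2]]))             # 0.0 (two uniform regions)
print(round(liu_yang_f(image, [[1, 1]]), 1))   # 3818.4 (one inhomogeneous region)
```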
b. Criterion of Borsotti, Campadelli and Schettini. Both Liu and Yang's criterion and that of Borsotti et al. are empirically defined, as Zhang (1996) points out. Borsotti et al. improve Liu and Yang's criterion by penalizing more heavily segmented images with too many small regions and with inhomogeneous color regions. For this purpose, the sum of Eq. (73) is split into two terms rather than one. The criterion $Q(I)$ of Borsotti et al. for an $X \times Y$ segmented image $I$ can be computed as:

$$Q(I) = \frac{\sqrt{N_R}}{X \times Y} \times \sum_{i=1}^{N_R} \left[ \frac{e_i^2}{1 + \log A_i} + \left( \frac{N_R(A_i)}{A_i} \right)^2 \right], \qquad (74)$$

where $N_R(A_i)$ is the number of regions whose area is equal to $A_i$.

3. Evaluation Methods for Color Image Segmentation by Region Construction

This section provides an overview of several works that used the previous evaluation methods to determine whether there exists a single color space that provides the best results for color image segmentation by region construction. As in Section IV.B.2, these studies are summarized in a table (Table 8). As shown in Tables 6 and 8, no single color space is recommended by all studies. We can therefore conclude that no single color space provides the best results for color image segmentation; in fact, the choice of the color space varies according to the type of image to segment and the segmentation method used.

D. Selection of the Most Discriminating Color Space

Sections IV.B and IV.C have shown contradictory conclusions about the pertinence of the available color spaces in the context of image segmentation. Instead of searching for the best classical color space to improve the results of image segmentation by pixel classification, Vandenbroucke, Macaire and Postaire (2003) defined a new type of color space by selecting a set of color components that can belong to any of several color spaces. Such spaces, which have neither psychovisual nor physical color significance, are called hybrid color spaces (HCS) by the authors; they have been used to extract meaningful regions representing soccer players and to recognize their teams. More details about this approach are presented in Section IV.D.1. Another interesting approach, proposed by Busin et al. (2005), consists of selecting a specific color space for the construction of each extracted class of pixels by means of an iterative procedure. We explain this method in Section IV.D.2.
TABLE 8
METHODS USED TO EVALUATE COLOR IMAGE SEGMENTATION BY REGION CONSTRUCTION

| Reference | Evaluation method | Candidate color spaces | Selected color spaces |
|---|---|---|---|
| Meas-Yedid et al. (2004) | Liu's criterion | (R, G, B), (r, g, b), (H1, H2, H3), (I1, I2, I3), (X, Y, Z), (Y′, I′, Q′), (L*, a*, b*), (L*, u*, v*) | (I1, I2, I3) |
| | Borsotti's criterion | (same as above) | (Y′, I′, Q′) |
| | Specific criterion | (same as above) | (H1, H2, H3) |
| Makrogiannis, Economou and Fotopoulos (2005) | PSNR | (R, G, B), (Y, Cb, Cr), (Y, U, V), (L*, a*, b*), (L*, u*, v*) | (L*, a*, b*), (L*, u*, v*) |
| Liu and Yang (1994) | Liu's criterion | (R, G, B), (I, S, T), (I1, I2, I3), (L*, a*, b*), (X, Y, Z), (Y′, I′, Q′) | Varies according to the image type |
| Littmann and Ritter (1997) | ROC curves | (R, G, B), (Y, u, v), (Y, Q_RG, Q_RB) | (R, G, B) |
| Phung, Bouzerdoum and Chai (2005) | ROC curves | (R, G, B), (H, S, V), (Y, Cb, Cr), (L*, a*, b*), (r, g), (H, S), (Cb, Cr), (a*, b*) | (Cb, Cr) |
| Lezoray (2003) | MSE | (R, G, B), (X, Y, Z), (H, S, L), (I1, I2, I3), (Y′, I′, Q′), (Y′, U′, V′), (Y, Ch1, Ch2), (Y, Cb, Cr), (L*, u*, v*), (L*, a*, b*) | Color spaces ordered |
| Lim and Lee (1990) | Probability error | (R, G, B), (X, Y, Z), (Y′, I′, Q′), (U*, V*, W*), (I1, I2, I3) | (I1, I2, I3) |
| Park, Yun and Lee (1998) | Probability error | (R, G, B), (X, Y, Z), (Y′, I′, Q′), (U*, V*, W*), (I1, I2, I3) | Segmented images are unaffected by the choice of the color space |
1. Pixel Classification in an Adapted Hybrid Color Space

Instead of searching for the best classical color space to improve the results of image segmentation by pixel classification, Vandenbroucke, Macaire and Postaire (2003) defined an HCS by selecting a set of color components that can belong to any of the color spaces listed in Figure 13. HCSs have been used to extract meaningful regions representing soccer players and to recognize their teams. The goal of the analysis of these images is to extract the players and the referee (Figure 35). In this context, there are three pixel classes of interest, namely the pixels representing the players of the two opposing teams and those representing the referee, which are identified by the colors of their uniforms. This section is organized as follows. First, we detail the construction of an HCS. Second, we present the color pixel classification algorithm. Finally, we show several results obtained with this method.

a. Adapted HCS Construction. For this study, the color components of the HCS are chosen among the set of color components

$$\Pi^K = (R, G, B, r, g, b, I, H, S, X, Y, Z, x, y, z, L^*, a^*, b^*, u^*, v^*, I1, I2, I3, A, C1, C2, Y', I', Q', U', V', C^*_{uv}, h^{\circ}_{uv}, S^*_{uv}, C^*_{ab}, h^{\circ}_{ab}),$$

where $K$ is the number of available color components. Each player pixel $P(x, y)$ in Figure 35 is represented by a point in the space $\Pi^K$ whose $k$th coordinate is denoted $\pi^k(x, y)$. Let $\omega_j$ be a class of player pixels ($j = 1, \ldots, N_\omega$, where $N_\omega$ is the number of classes), and let $N_{\omega_j}$ denote the number of player pixels in the class $\omega_j$. For each color feature of this space, the mean $m_j^k$ of the player pixel values in the class $\omega_j$ is computed as:

$$m_j^k = \frac{1}{N_{\omega_j}} \sum_{P(x,y) \in \omega_j} \pi^k(x, y). \qquad (75)$$
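As a small illustration of Eq. (75), the Python sketch below computes only a handful of the $K$ candidate components (R, G, B, the normalized coordinates r, g, b, and the intensity I = (R + G + B)/3) and averages them over one class. Restricting to these seven components, and the convention used for black pixels, are assumptions made for the example; the function names are hypothetical.

```python
def candidate_features(rgb):
    """A small subset of the candidate components: R, G, B, r, g, b, I."""
    R, G, B = (float(v) for v in rgb)
    s = R + G + B
    if s == 0:
        r = g = b = 1.0 / 3.0   # convention for black pixels (assumption)
    else:
        r, g, b = R / s, G / s, B / s
    return [R, G, B, r, g, b, s / 3.0]


def class_mean(feature_vectors):
    """m_j^k of Eq. (75) for every component k of one class of pixels."""
    n = len(feature_vectors)
    return [sum(column) / n for column in zip(*feature_vectors)]


pixels_of_class = [(255, 0, 0), (0, 255, 0)]
features = [candidate_features(p) for p in pixels_of_class]
print(class_mean(features))   # [127.5, 127.5, 0.0, 0.5, 0.5, 0.0, 85.0]
```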
FIGURE 35. Example of three classes of pixels representing the players of the two opposing teams and the referee. (See Color Insert.)
They thus define a $K$-dimensional color feature vector $M_j = [m_j^1, \ldots, m_j^k, \ldots, m_j^K]^T$ for each class of player pixels $\omega_j$. To select the most discriminating color features among the $K$ available ones, they propose a specific informational criterion. They assume that the better separated and the more compact the classes are in the HCS, the higher the discriminating power of the selected features; this assumption leads them to use measures of separability and compactness. The compactness of each class $\omega_j$ ($j = 1, \ldots, N_\omega$) is measured by the within-class dispersion matrix $\Sigma_j$:

$$\Sigma_j = \frac{1}{N_{\omega_j}} \sum_{P(x,y) \in \omega_j} (X_j - M_j)(X_j - M_j)^T, \qquad (76)$$

where $X_j = [\pi^1(x, y), \ldots, \pi^k(x, y), \ldots, \pi^K(x, y)]^T$ is the color point of the pixel $P(x, y)$ that belongs to the class $\omega_j$. They define the total within-class dispersion matrix $\Sigma_C$ as:

$$\Sigma_C = \frac{1}{N_\omega} \sum_{j=1}^{N_\omega} \Sigma_j. \qquad (77)$$

The class separability is measured by the between-class dispersion matrix:

$$\Sigma_S = \frac{1}{N_\omega} \sum_{j=1}^{N_\omega} (M_j - M)(M_j - M)^T, \qquad (78)$$

where $M = [m^1, \ldots, m^k, \ldots, m^K]^T$ is the mean vector of all the classes:

$$M = \frac{1}{N_\omega} \sum_{j=1}^{N_\omega} M_j. \qquad (79)$$

The most discriminating set of color features maximizes the criterion $J = \mathrm{trace}(\Sigma_C^{-1} \Sigma_S)$. The HCS is constructed by means of the "knock-out" algorithm (Firmin et al., 1996), and its dimension $d$ is determined by a correlation measure: the dimensionality of the HCS increases as long as this correlation measure remains lower than a given threshold. With this procedure, they select the most discriminating color features among the $K$ available ones, and the player pixels are classified in this most discriminating HCS.
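The criterion $J = \mathrm{trace}(\Sigma_C^{-1} \Sigma_S)$ of Eqs. (76) to (79) can be sketched in pure Python as below. The sketch assumes each class is given as a list of feature vectors already restricted to the candidate subset of components being scored; it computes $\Sigma_C^{-1} \Sigma_S$ with a small Gauss-Jordan solver instead of an explicit inverse. The knock-out selection loop itself is not reproduced, and all names are hypothetical.

```python
def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def outer_mean(diffs):
    """(1/n) * sum of d d^T over the difference vectors d."""
    n, dim = len(diffs), len(diffs[0])
    return [[sum(d[a] * d[b] for d in diffs) / n for b in range(dim)]
            for a in range(dim)]


def solve(A, B):
    """Return A^{-1} B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("within-class matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]


def discriminating_criterion(classes):
    """J = trace(Sigma_C^{-1} Sigma_S) for a list of classes of feature vectors."""
    means = [mean_vector(c) for c in classes]                     # M_j, Eq. (75)
    dim, n_classes = len(means[0]), len(classes)
    sigma_c = [[0.0] * dim for _ in range(dim)]
    for c, m in zip(classes, means):
        sigma_j = outer_mean([[x - mu for x, mu in zip(v, m)] for v in c])  # (76)
        sigma_c = [[a + b / n_classes for a, b in zip(ra, rb)]
                   for ra, rb in zip(sigma_c, sigma_j)]                     # (77)
    overall = mean_vector(means)                                          # (79)
    sigma_s = outer_mean([[mu - o for mu, o in zip(m, overall)] for m in means])  # (78)
    product = solve(sigma_c, sigma_s)
    return sum(product[i][i] for i in range(dim))


one_feature = [[[0.0], [2.0]], [[10.0], [12.0]]]
two_features = [[[0.0, 0.0], [2.0, 5.0]], [[10.0, 1.0], [12.0, 4.0]]]
print(discriminating_criterion(one_feature))    # 25.0
print(discriminating_criterion(two_features))   # 425.0
```

A feature-selection procedure would evaluate this criterion on candidate subsets of the $K$ components and retain the subsets with the largest $J$.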
b. Color Pixel Classification Algorithm. Before the player pixels of a color image are classified, they are extracted. Then the R, G, and B features of each player pixel $P(x, y)$ are transformed into HCS features. To classify a player pixel $P(x, y)$, Vandenbroucke, Macaire and Postaire (2003) consider the set of player pixels falling into a neighborhood of $P(x, y)$, whose size depends on the mean player size in the image. For each player pixel, they evaluate the mean vector $M_P = [m_P^1, \ldots, m_P^d]^T$ of the HCS features of the player pixels belonging to this neighborhood. For each class $\omega_j$, they evaluate the Euclidean distance $D_j(x, y)$ in the HCS between the mean vector $M_j$ of the class $\omega_j$ and the mean vector $M_P$:

$$D_j(x, y) = \left\| M_j - M_P \right\| = \sqrt{\sum_{k=1}^{d} \left( m_j^k - m_P^k \right)^2}. \qquad (80)$$
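The neighborhood averaging and the distance of Eq. (80), combined with a minimum-distance assignment, can be sketched in Python as follows. The sketch assumes a square neighborhood of half-width `half_size` (the text only says its size depends on the mean player size), HCS feature vectors stored per pixel in a 2D list, and a boolean player mask; `classify_pixel` is a hypothetical name.

```python
import math


def classify_pixel(features, player_mask, x, y, half_size, class_means):
    """Index j of the class whose mean M_j is closest to M_P, Eq. (80).

    features[y][x]: HCS feature vector (length d) of pixel (x, y).
    player_mask[y][x]: True for player pixels; (x, y) must be one of them.
    half_size: half-width of the square neighborhood (an assumption).
    class_means: list of class mean vectors M_j restricted to the HCS.
    """
    height, width = len(features), len(features[0])
    neighbors = [
        features[v][u]
        for v in range(max(0, y - half_size), min(height, y + half_size + 1))
        for u in range(max(0, x - half_size), min(width, x + half_size + 1))
        if player_mask[v][u]
    ]
    n = len(neighbors)
    m_p = [sum(col) / n for col in zip(*neighbors)]          # M_P
    distances = [math.dist(m_j, m_p) for m_j in class_means]  # D_j(x, y)
    return min(range(len(class_means)), key=distances.__getitem__)


features = [[[0.0], [0.2], [5.0]]]      # a 1 x 3 image with a 1D HCS
mask = [[True, True, True]]
means = [[0.0], [5.0]]
print(classify_pixel(features, mask, 0, 0, 0, means))   # 0
print(classify_pixel(features, mask, 2, 0, 0, means))   # 1
```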
Finally, a minimum-distance decision rule assigns $P(x, y)$ to the class $\omega_j$ ($j = 1, \ldots, N_\omega$) for which $D_j(x, y)$ is minimum.

c. Results. Vandenbroucke, Macaire and Postaire (2003) illustrate the effectiveness of the HCS for color image segmentation by pixel classification, by extracting meaningful regions representing soccer players and recognizing their teams. The images in Figure 36 constitute a test sample extracted from the same sequence. These images are not easy to segment because each of them contains at least two adjacent players. The player pixels extracted from the images of Figure 36 by means of the multithresholding scheme of Ohlander, Price and Reddy (1978) are shown in Figure 37. The images in Figure 38 show how the extracted player pixels have been classified in the adapted HCS $(a^*, C_{UV})$. The player pixels assigned to the same class are labeled with the same color, and adjacent player pixels with the same label constitute regions that correspond to the different soccer players. Vandenbroucke, Macaire and Postaire (2003) compare these results (Figure 38) with the ground truth of Figure 39 by means of the classification error rate (Table 9). Because the mean classification error rate associated with the adapted HCS $(a^*, C_{UV})$ is the lowest, they conclude that classification in the HCS provides better results than classification in the classical color spaces.

2. Unsupervised Selection of the Most Discriminating Color Space

For several applications, such as image retrieval, the color space must be selected without any prior learning. Busin et al. (2005) proposed a selection scheme based on an unsupervised analysis of color histograms.
FIGURE 36. Color soccer images (125 × 125 pixels). (See Color Insert.)

FIGURE 37. Player pixels extracted from the images in Figure 36. (See Color Insert.)

FIGURE 38. Player pixels of Figure 37 classified in the hybrid color space. (See Color Insert.)

FIGURE 39. Ground truth of the player pixels of Figure 37. (See Color Insert.)
TABLE 9
CLASSIFICATION ERROR RATES OF PLAYER PIXELS IN FIGURE 37

| Color space | Image (a) | Image (b) | Image (c) | Image (d) | Mean error rate |
|---|---|---|---|---|---|
| (r, g, b) | 32.63 | 10.01 | 5.19 | 0.65 | 10.24 |
| (H, S, I) | 8.63 | 27.77 | 11.63 | 1.81 | 14.39 |
| (Y′, C_UV, h_UV) | 12.28 | 17.15 | 23.49 | 6.14 | 15.08 |
| (a*, C_UV) | 11.5 | 7.35 | 1.86 | 0.29 | 4.98 |
| (R, G, B) | 24.12 | 29.25 | 44.57 | 48.23 | 36.73 |
Because the classes constructed by a classification procedure depend on the color space used, it is prudent to select the color space that is most relevant for detecting the modes that correspond to regions. For this purpose, Busin et al. (2005) assume that the higher the discriminating power of a 1D histogram, the more probable it is that the detected modes correspond to regions with the same colors. Thus, among several color spaces, the proposed procedure selects the one whose 1D histograms have the highest discriminating power. The discriminating power of a 1D histogram depends on the number of detected modes and on the connectedness of the pixels whose color component levels fall into these modes: the higher the number of detected modes and the more connected in the image plane the pixels belonging to them, the more discriminating the 1D histogram. At each iteration step, the proposed unsupervised procedure selects the most relevant color space for constructing one class of pixels; the selection takes into account both the color and the spatial connectedness properties of the pixels in the image. The stages of this iterative method are shown in Figure 40 and detailed in the following subsections.

a. One-Dimensional Histogram Determination. At each iteration step, the color vectors of the pixels submitted to the analysis are represented in $N_S = 11$ color spaces: (R, G, B), (r, g, b), (X, Y, Z), (x, y, z), (Y′, I′, Q′), (Y′, U′, V′), (bw, rg, by), (Y, Ch1, Ch2), (L*, a*, b*), (L*, u*, v*), and (I1, I2, I3). In the $i$th color space, the 1D histogram $H^{i,j}(x)$ of each color component $j$ ($j = 1, 2, 3$) is determined, where $x$ is the color component level. The 1D histograms of the red, green, and blue components of the House image in Figure 41a are shown in Figures 41b, c, and d, respectively.
FIGURE 40. Color image segmentation flowchart.
b. One-Dimensional Histogram Smoothing. Because the 1D histograms are corrupted by noise, their modes are difficult to detect. Busin et al. (2005) therefore smooth them by adaptive filtering. A smoothed histogram $H_\sigma^{i,j}(x)$ is computed by convolving the 1D histogram $H^{i,j}(x)$ with a Gaussian kernel $g_\sigma(x)$ of standard deviation $\sigma$:

$$H_\sigma^{i,j}(x) = H^{i,j}(x) * g_\sigma(x), \qquad (81)$$

where $*$ denotes the convolution operator and

$$g_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{x^2}{2\sigma^2} \right).$$

The effect of the smoothing depends on the standard deviation $\sigma$ of the Gaussian kernel. For each 1D histogram, $\sigma$ is determined automatically by the procedure proposed by Lin, Wang and Yang (1996), so that the smoothed 1D histogram reveals its modes. Figure 42 shows the smoothed 1D histograms of the House image of Figure 42a; these smoothed histograms can easily be analyzed for mode detection.

c. Most Relevant Color Space Selection. The thresholds that delimit the modes of each smoothed histogram $H_\sigma^{i,j}(x)$ are determined by analyzing the zero-crossings of its first derivative. A threshold is detected at a zero-crossing of the first derivative of $H_\sigma^{i,j}(x)$ whose sign changes from minus to plus (local minimum), and a mode is detected at a zero-crossing whose sign changes from plus to minus (local maximum). The number of detected modes is denoted $N^{i,j}$.
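The smoothing of Eq. (81) and the zero-crossing analysis can be sketched in Python as below. This is a simplified stand-in: $\sigma$ is passed by hand instead of being chosen by the procedure of Lin, Wang and Yang (1996), the sampled kernel is truncated at $3\sigma$ and renormalized to sum to 1, values outside the histogram are treated as zero, and the first derivative is approximated by first differences. Function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Sampled g_sigma on [-3 sigma, 3 sigma], renormalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(t * t) / (2 * sigma * sigma))
               for t in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_histogram(hist, sigma):
    """H_sigma = H * g_sigma (Eq. 81), with zeros outside the histogram range."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(hist)
    return [
        sum(kernel[t + radius] * hist[x - t]
            for t in range(-radius, radius + 1) if 0 <= x - t < n)
        for x in range(n)
    ]


def detect_extrema(smoothed):
    """Thresholds (first difference changes from minus to plus) and
    mode peaks (first difference changes from plus to minus)."""
    diffs = [smoothed[x + 1] - smoothed[x] for x in range(len(smoothed) - 1)]
    thresholds, peaks = [], []
    for x in range(1, len(diffs)):
        if diffs[x - 1] < 0 <= diffs[x]:
            thresholds.append(x)          # local minimum
        elif diffs[x - 1] > 0 >= diffs[x]:
            peaks.append(x)               # local maximum
    return thresholds, peaks


histogram = [0] * 20
histogram[5] = histogram[14] = 10          # two isolated spikes
smoothed = smooth_histogram(histogram, sigma=1.5)
thresholds, peaks = detect_extrema(smoothed)
print(thresholds, peaks)                   # [9] [5, 14]
```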
FIGURE 41. One-dimensional histograms $H^{i,j}$ of the House image in the (R, G, B) color space ($i = 1$). (a) Original image. (b) Red one-dimensional histogram $H^{1,1}$. (c) Green one-dimensional histogram $H^{1,2}$. (d) Blue one-dimensional histogram $H^{1,3}$. (See Color Insert.)
Three features characterize the $k$th ($k = 1, \ldots, N^{i,j}$) detected mode of the 1D histogram $H_\sigma^{i,j}(x)$:
• The left and right detected thresholds $T_{left}^{i,j,k}$ and $T_{right}^{i,j,k}$;
• The amplitude $A^{i,j,k} = \max_{T_{left}^{i,j,k} \le l \le T_{right}^{i,j,k}} H_\sigma^{i,j}(l)$.

For each 1D histogram $H_\sigma^{i,j}(x)$, $K(i, j)$ denotes the rank order of the detected mode with the highest amplitude.
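Given a smoothed histogram and its detected thresholds, the three features of each mode and the rank $K(i, j)$ can be sketched in Python as follows. The text does not say exactly how the outer bounds of the first and last modes are set, so this sketch assumes that modes are the intervals between consecutive thresholds, closed by the ends of the histogram, and numbers them from left to right. `mode_features` is a hypothetical name.

```python
def mode_features(smoothed, thresholds):
    """(T_left, T_right, amplitude) of each mode, plus the rank K of the
    highest-amplitude mode (1-based, modes numbered from left to right).

    Assumption: modes are the intervals between consecutive thresholds,
    with the ends of the histogram as outer bounds.
    """
    cuts = [0] + sorted(thresholds) + [len(smoothed)]
    modes = []
    for left, nxt in zip(cuts, cuts[1:]):
        right = nxt - 1
        if right < left:
            continue                                  # empty interval
        amplitude = max(smoothed[left:right + 1])     # A^{i,j,k}
        modes.append((left, right, amplitude))
    k_best = max(range(len(modes)), key=lambda k: modes[k][2]) + 1
    return modes, k_best


smoothed = [1, 3, 5, 3, 1, 2, 8, 2]
modes, k_best = mode_features(smoothed, thresholds=[4])
print(modes)    # [(0, 3, 5), (4, 7, 8)]
print(k_best)   # 2
```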
FIGURE 42. Smoothed one-dimensional histograms $H_\sigma^{i,j}$ obtained by Lin et al.'s method from the one-dimensional histograms of Figure 41. (a) Original image. (b) Smoothed one-dimensional histogram $H_\sigma^{1,1}$. (c) Smoothed one-dimensional histogram $H_\sigma^{1,2}$. (d) Smoothed one-dimensional histogram $H_\sigma^{1,3}$. (See Color Insert.)
To illustrate the procedure, Table 10 shows the features of the modes detected in the smoothed histograms of Figures 42b–d, for the (R, G, B) color space only ($i = 1$). Figure 43 shows the features of the modes detected in the smoothed histogram $H_\sigma^{1,3}(x)$ of Figure 42d. The main drawback of schemes based on 1D histogram analysis is that they consider only the similarities between the colors of the pixels and ignore their spatial arrangement in the image. However, a region is a subset of pixels that shares similar color properties and is strongly connected in the image plane. To select the most relevant color space, Busin et al. (2005) therefore measure the connectedness of subsets of pixels $S$ by means of the connectedness degree
TABLE 10
FEATURES OF THE DETECTED MODES IN THE 1D HISTOGRAMS OF FIGURES 42B–D IN THE (R, G, B) COLOR SPACE

| $j$ | $N^{1,j}$ | $k$ | $T_{left}^{1,j,k}$ | $T_{right}^{1,j,k}$ | $A^{1,j,k}$ | $CD(S)$ | $R^{1,j}$ |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 135 | 205 | 1556 | 0.93 | 1.89 |
| | | 2 | 20 | 134 | 373 | 0.96 | |
| 2 | 3 | 1 | 145 | 225 | 1051 | 0.94 | 2.80 |
| | | 2 | 75 | 144 | 894 | 0.94 | |
| | | 3 | 0 | 74 | 273 | 0.92 | |
| 3 | 2 | 1 | 140 | 255 | 1077 | 0.96 | 1.93 |
| | | 2 | 30 | 139 | 929 | 0.97 | |

$j$ is the number of the color component in the considered color space ($j \in \{1, 2, 3\}$).

FIGURE 43. Features of the detected modes from the smoothed 1D histogram $H_\sigma^{1,3}$ of the blue component. (See Color Insert.)
CD(S) introduced by Macaire, Vandenbroucke and Postaire (2006). The connectedness degree CD(S) is a normalized measure, so that a connectedness degree close to 1 means that the pixels belonging to the subset S are strongly connected in the image, whereas a connectedness degree close to 0 means that the pixels are sparsely scattered throughout the image. Table 10 shows the connectedness degrees CD(S) of the detected modes of the smoothed histograms shown in Figures 42b–d. Among the three smoothed 1D histograms of the ith color space, i = 1, . . . , NS , the most discriminating
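1D histogram is determined. For this purpose, the discriminating power, denoted $R^{i,j}$, is evaluated for each smoothed 1D histogram $H_\sigma^{i,j}(x)$. $R^{i,j}$ is the sum of the connectedness degrees of the subsets $S[T_{left}^{i,j,k}, T_{right}^{i,j,k}]$ associated with the detected modes of the histogram, where $S[T_{left}, T_{right}]$ is the subset of pixels whose level of the $j$th color component lies between the two thresholds:

$$R^{i,j} = \sum_{k=1}^{N^{i,j}} CD\left( S\left[ T_{left}^{i,j,k}, T_{right}^{i,j,k} \right] \right). \qquad (82)$$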
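The summation of Eq. (82) is easy to sketch, but the definition of the connectedness degree $CD(S)$ of Macaire, Vandenbroucke and Postaire (2006) is not given here. The Python sketch below therefore plugs in a clearly hypothetical stand-in, `stand_in_cd`, which is NOT their measure: the fraction of the in-image 4-neighbors of the pixels of $S$ that also belong to $S$. It shares only the stated properties of being normalized, near 1 for compact subsets, and near 0 for scattered ones. Any genuine $CD$ implementation can be passed through the `cd` parameter.

```python
def stand_in_cd(levels, t_left, t_right):
    """Hypothetical connectedness score of S[t_left, t_right], in [0, 1].

    levels: 2D list of one color component's levels. S holds the pixels
    whose level lies in [t_left, t_right]. The score is the fraction of the
    in-image 4-neighbors of S's pixels that also belong to S. This is an
    illustrative substitute, not the CD(S) of Macaire et al.
    """
    def in_s(value):
        return t_left <= value <= t_right

    height, width = len(levels), len(levels[0])
    same = total = 0
    for y in range(height):
        for x in range(width):
            if not in_s(levels[y][x]):
                continue
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                v, u = y + dy, x + dx
                if 0 <= v < height and 0 <= u < width:
                    total += 1
                    same += in_s(levels[v][u])
    return same / total if total else 0.0


def discriminating_power(levels, modes, cd=stand_in_cd):
    """R^{i,j} of Eq. (82): sum of cd(S) over the detected modes."""
    return sum(cd(levels, t_left, t_right) for t_left, t_right in modes)


blocks = [[10, 10, 200, 200],
          [10, 10, 200, 200]]
checker = [[10, 200],
           [200, 10]]
print(stand_in_cd(blocks, 0, 99))                            # 0.8
print(stand_in_cd(checker, 0, 99))                           # 0.0
print(discriminating_power(blocks, [(0, 99), (100, 255)]))   # 1.6
```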
Because the Gaussian smoothing eliminates the nonsignificant modes, a 1D histogram discriminates the classes well if the number of detected modes is high and if the sum of their connectedness degrees is high. Thus, the higher the discriminating power $R^{i,j}$, the more probable it is that the modes correspond to regions with the same colors in the image. The most discriminating 1D histogram of the $i$th color space is therefore the one with the highest discriminating power $R^{i,j}$; the authors denote by $J(i)$ the rank order of the corresponding color component. Table 10 shows that, in our example, the most discriminating of the 1D histograms of Figure 42 in the (R, G, B) color space ($i = 1$) is that of the G component, so $J(1)$ is set to 2. The most relevant color space is selected among the $N_S$ color spaces as the one with the highest discriminating power, that is, the one whose most discriminating 1D histogram has the highest value $R^{i,J(i)}$. If several color spaces have equal discriminating powers, the most relevant among them is the one containing the 1D histogram with the second highest discriminating power $R^{i,j}$. The rank order of the color space selected as the most relevant is denoted $I$. For the House image in Figure 41a, the most relevant color space selected at the first step is $(Y', U', V')$ ($I = 6$), and its most discriminating 1D histogram is that of the $V'$ component ($J(6) = 3$).

d. One Class Construction. One class of pixels is constructed by analyzing the most relevant color space, with rank order $I$, determined at the current iteration step. This class of pixels is defined by a single parallelepipedic box in the most relevant color space.
This box is delimited by two thresholds along each color component of the most relevant color space. Along the color component with rank order $J(I)$, which corresponds to the most discriminating 1D histogram, the two thresholds $T^{I,J(I),K(I,J(I))}_{\mathrm{left}}$ and $T^{I,J(I),K(I,J(I))}_{\mathrm{right}}$ are those that delimit the mode with the highest amplitude. The thresholds along the two other color components are selected among the thresholds $T^{I,j,k}_{\mathrm{left}}$ and $T^{I,j,k}_{\mathrm{right}}$, $j \neq J(I)$, determined by the mode detection stage. The selected thresholds delimit the box into which the color vectors of the largest population of pixels fall. The pixels whose color vectors fall into this box constitute the class of pixels constructed at the current iteration step.
e. Pixels of the Class Extraction. The pixels assigned to the constructed class are extracted from the color image so that they are not taken into account at the subsequent iteration steps of the procedure. The pixels that are assigned to this class and that are connected in the image constitute one of the reconstructed regions in the segmented image.
f. Stopping Criterion. The iterative procedure stops when the pixels that have not been assigned to any of the previously constructed classes represent a percentage p of the image. The parameter p, adjusted by the analyst, allows the desired coarseness of the segmentation to be tuned. When the iterative procedure stops, the pixels that have not been assigned to any class could be assigned to one of the constructed classes by means of a specific decision rule.
g. Results. This class construction scheme is based on the analysis of both the connectedness and the color properties of the subsets of pixels. To demonstrate the interest of this approach, Busin et al. (2005) segment the benchmark image House (see Figure 44a) by means of the presented procedure (see Figure 44b). The pixels extracted as a constructed class at each iteration step of the algorithm are labeled with a false color in the segmented image of Figure 44b; these labels are shown in the first column of Table 11. Because these false colors are ordered, the iteration step at which each class of pixels was constructed can be determined by examining the segmented image. Table 11 indicates the discriminating powers of the color spaces selected at each step of the procedure applied to the image in Figure 44a.
It shows that no single color space is the most relevant at all iteration steps of the procedure. Figure 44b shows that this procedure provides satisfying segmentation results in terms of pixel classification: the regions representing the different objects in the image are well reconstructed. The result obtained with the proposed method is compared with the one obtained when only the $(R, G, B)$ color space is used at each iteration step (see Figure 44c), and with the one obtained when the most relevant color space selected at the first iteration step is used at all subsequent iteration steps (see Figure 44d). By examining the segmented images, Busin et al. (2005) conclude that the selection of the most relevant
Figure 44. Segmentation of the House image (255 × 255 pixels) by the proposed method. (a) Original House image. (b) House image segmented by the proposed method. (c) House image segmented when the (R, G, B) color space is selected at each iteration. (d) House image segmented when the (Y, U, V) color space is selected at each iteration. (See Color Insert.)
Table 11. Number of modes $N^{I,J(I)}$ and discriminating power $R^{I,J(I)}$ of the most discriminating 1D histogram $J(I)$ of the color space $I$ selected at each step of the procedure applied to the House image. (See Color Insert.)
| Iteration step | Selected color space I | J(I) | $N^{I,J(I)}$ | $R^{I,J(I)}$ |
|---|---|---|---|---|
| 1 | (Y, U, V) | 3 | 4 | 3.42 |
| 2 | (Y, U, V) | 3 | 3 | 2.54 |
| 3 | (R, G, B) | 2 | 5 | 3.48 |
| 4 | (R, G, B) | 2 | 4 | 2.52 |
| 5 | (bw, rg, by) | 1 | 3 | 1.75 |
| 6 | (I1, I2, I3) | 1 | 3 | 1.94 |
| 7 | (r, g, b) | 1 | 2 | 1.20 |
| 8 | (Y, U, V) | 1 | 2 | 1.49 |
color space provides more acceptable results in terms of segmentation quality than class construction performed in one single color space. These results show that selecting, at each iteration step of the procedure, the color space that best discriminates the considered pixel classes is a relevant solution for the construction of the regions.
E. Conclusion
This section has presented several methods for evaluating color image segmentation, in order to highlight the relationships between segmentation and color spaces. These evaluation methods have been used to compare segmentation results according to the color space chosen to represent the colors of the pixels. We have shown that the choice of the color space depends on the kind of image to be segmented and on the segmentation method. Because no single “best” color space exists for color image segmentation, several studies select the color space by means of decision rules, and segmentation methods that rely on such an adaptive selection yield improved results.
V. CONCLUSION
In this chapter we have first described the most widely used color spaces for digital color image analysis. We have pointed out that most transformations from (R, G, B) to another color space require prior knowledge of the acquisition conditions (reference white, illumination properties). Moreover, the coding of the color components must be adapted to the color space used. In the second section we presented classical segmentation schemes designed for exploiting colors, which can be divided into two families. Since edge detection requires postprocessing steps to yield closed boundaries, most schemes construct the regions in the image via either image plane analysis or color space analysis. Recent approaches tend to combine spatial and colorimetric analysis to improve the quality of the segmentation. The third section dealt with the relationships between color spaces and segmentation. Regardless of the criteria used to evaluate the quality of a segmentation, there is no color space in which segmentation schemes provide efficient results for all images, which prompted our development of schemes that determine the color space best adapted to a specific set of images. These schemes are based on multidimensional statistical criteria, not on psychovisual criteria. One improvement to our approaches could be the integration of psychovisual criteria into the selection of the color space for image segmentation.
REFERENCES
Abney, W.W. (1913). Researches in Color Vision. Longmans, Green, London.
Bala, R., Sharma, G. (2005). System optimization in digital color imaging. IEEE Signal Processing Magazine 22 (1), 55–63.
Balasubramanian, R., Allebach, J.P., Bouman, C.A. (1994). Color-image quantization with use of a fast binary splitting technique. J. Opt. Soc. Am. 11 (11), 2777–2786.
Ballard, D.H., Brown, C.M. (1982). Computer Vision. Prentice-Hall, New Jersey.
Borsotti, M., Campadelli, P., Schettini, R. (1998). Quantitative evaluation of color image segmentation results. Pattern Recogn. Lett. 19 (8), 741–747.
Braquelaire, J.-P., Brun, L. (1997). Comparison and optimization of methods of color image quantization. IEEE Trans. Image Proc. 6 (7), 1048–1052.
Busin, L., Vandenbroucke, N., Macaire, L., Postaire, J.-G. (2004). Color space selection for unsupervised color image segmentation by histogram multithresholding. In: IEEE International Conference on Image Processing. Singapore, pp. 203–206.
Busin, L., Vandenbroucke, N., Macaire, L., Postaire, J.-G. (2005). Color space selection for unsupervised color image segmentation by analysis of connectedness properties. Int. J. Robot. Autom. 20 (2), 70–77.
Chabrier, S., Emile, B., Rosenberger, C., Laurent, H. (2006). Unsupervised performance evaluation of image segmentation. In: Special Issue on Performance Evaluation in Image Processing. EURASIP J. Appl. Signal Proc. 5, 1–12.
Chassery, J.M., Garbay, C. (1984). An iterative method based on a contextual color and shape criterion. IEEE Trans. Pattern Anal. Mach. Intell. 6 (6), 794–800.
Cheng, H., Jiang, X., Wang, J. (2002). Color image segmentation based on homogram thresholding and region merging. Patt. Recogn. 35 (2), 373–393.
Cheng, H.D., Chen, C.H., Chiu, H.H., Xu, H.J. (1998). Fuzzy homogeneity approach to multilevel thresholding. IEEE Trans. Image Proc. 7 (7), 1084–1088.
Cheng, H.D., Jiang, X.H., Sun, Y., Wang, J. (2001). Color image segmentation: advances and prospects. Pattern Recogn. 34 (12), 2259–2281.
Comaniciu, D., Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24 (5), 603–619.
Commission Internationale de l’Éclairage (1986). Colorimetry. Technical report 15.2. Bureau central de la CIE, Vienna.
Commission Internationale de l’Éclairage (1995). Industrial color-difference evaluation. Technical report 116. Bureau central de la CIE.
Commission Internationale de l’Éclairage (2004). A color appearance model for color management systems: CIECAM02. Technical report 159. Bureau central de la CIE, Vienna.
Cumani, A. (1991). Edge detection in multispectral images. Comput. Vis. Graph. Image Proc. 53, 40–51.
Davis, J., Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In: International Conference on Machine Learning, pp. 233–240.
Deng, Y., Manjunath, B. (2001). Unsupervised segmentation of color-texture regions in images and video. IEEE Trans. Pattern Anal. Mach. Intell. 23 (8), 800–810.
Deriche, R. (1990). Fast algorithms for low level vision. IEEE Trans. Pattern Anal. Mach. Intell. 12 (1), 78–87.
Devijver, P., Kittler, J. (1982). Pattern Recognition: A Statistical Approach. Prentice-Hall, New York.
Di Zenzo, S. (1986). A note on the gradient of a multi-image. Comput. Vis. Graph. Image Proc. 33, 116–125.
Faugeras, O.D. (1979). Digital color image processing within the framework of a human visual model. IEEE Trans. Acoust. Speech Signal Process. 27 (4), 380–393.
Firmin, C., Hamad, D., Postaire, J.-G., Zhang, R.D. (1996). Feature extraction and selection for fault detection in production of glass bottles. Machine Graphics & Vision 5 (1), 77–86.
Foley, J.D., van Dam, A., Feiner, S.K. (1990). Computer Graphics, Principles and Practice, 2nd ed. Addison-Wesley, New York.
Fontaine, M., Macaire, L., Postaire, J.-G. (2000). Image segmentation based on an original multiscale analysis of the pixel connectivity properties. In: IEEE International Conference on Image Processing, vol. 1. Vancouver, BC, pp. 804–807.
Garbay, C., Brugal, G., Choquet, C. (1981). Application of colored image analysis to bone marrow cell recognition. Anal. Quant. Cytol. 3 (4), 272–280.
Gillet, A., Macaire, L., Botte-Lecocq, C., Postaire, J.-G. (2002). Color image segmentation by analysis of 3D histogram with fuzzy morphological filters. In: Nachtegael, M., Van der Weken, D., Van De Ville, D., Kerre, E.E. (Eds.), Fuzzy Filters for Image Processing. Studies in Fuzziness and Soft Computing. Springer, pp. 154–177.
Grassmann, H. (1853). On the theory of compound colors. Phil. Mag. Series 4 (7), 254–264.
Gunturk, B.K., Glotzbach, J., Altunbasak, Y., Schafer, R.W., Mersereau, R.M. (2005). Demosaicking: Color filter array interpolation. IEEE Signal Processing Magazine 22 (1), 44–54.
Helmholtz, H. (1866). Handbuch der physiologischen Optik. Voss, Hamburg, Leipzig.
Hering, E. (1875). Zur Lehre vom Lichtsinne. Wien. Math. Nat. Kl. 70, 169–204.
IEC 61966-2-1/FDIS (1999). Multimedia systems and equipment - Color measurement and management - Part 2-1: Color management - Default RGB color space - sRGB. Technical report. International Electrotechnical Commission.
Ismail, M.A., Kamel, M.S. (1989). Multidimensional data clustering utilizing hybrid search strategies. Patt. Recogn. 22 (1), 75–89.
ITU-R BT.601-7 (2007). Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9 aspect ratios. Technical report. International Telecommunication Union.
ITU-R BT.709-5 (2002). Parameter values for the HDTV standards for production and international programme exchange. Technical report. International Telecommunication Union.
Kender, J.R. (1976). Saturation, hue, and normalized color: Calculation, digitization effects, and use. Technical report. Department of Computer Science, Carnegie-Mellon University, Pittsburgh.
Klinker, G.J., Shafer, S.A., Kanade, T. (1990). A physical approach to color image understanding. Int. J. Comput. Vis. 4 (1), 7–30.
Lambert, P., Carron, T. (1999). Symbolic fusion of luminance-hue-chroma features for region segmentation. Patt. Recogn. 32 (11), 1857–1872.
Lambert, P., Macaire, L. (2000). Filtering and segmentation: The specificity of color images. In: International Conference on Color in Graphics and Image Processing. Saint-Etienne, France, pp. 57–71.
Lee, H.C., Cok, D. (1991). Detecting boundaries in a vector field. IEEE Trans. Signal Proc. 39 (5), 1181–1194.
Lee, J.H., Chang, B.H., Kim, S.D. (1994). Comparison of color transformations for image segmentation. Electron. Lett. 30 (20), 1660–1661.
Lee, S.U., Chung, S.Y., Park, R.-H. (1990). A comparative performance study of several global thresholding techniques for segmentation. Comput. Vis. Graphics Image Proc. 52 (2), 171–190.
Lee, T., Lewicki, M.S. (2002). Unsupervised image classification, segmentation, and enhancement using ICA mixture model. IEEE Trans. Image Proc. 11 (3), 271–279.
Levkowitz, H., Herman, G.T. (1993). GLHS: A generalized lightness, hue, and saturation color model. Graph. Model Image Proc. 55 (4), 271–285.
Lezoray, O. (2003). Supervised automatic histogram clustering and watershed segmentation. Application to microscopic medical images. Image Anal. Stereol. 22 (2), 113–120.
Liew, A., Leung, S., Lau, W. (2000). Fuzzy image clustering incorporating spatial continuity. IEEE Proc. Vis. Image Sign. Proc. 147 (2), 185–192.
Lim, Y.W., Lee, S.U. (1990). On the color image segmentation algorithm based on the thresholding and the fuzzy c-means techniques. Patt. Recogn. 23 (9), 935–952.
Lin, H.-C., Wang, L.-L., Yang, S.-N. (1996). Automatic determination of the spread parameter in Gaussian smoothing. Patt. Recogn. Lett. 17, 1247–1252.
Littmann, E., Ritter, H. (1997). Adaptive color segmentation: A comparison of neural and statistical methods. IEEE Trans. Neural Netw. 8 (1), 175–185.
Liu, J., Yang, Y.-H. (1994). Multiresolution color image segmentation. IEEE Trans. Patt. Anal. Mach. Intell. 16 (7), 689–700.
Liu, Y., Shriberg, E. (2007). Comparing evaluation metrics for sentence boundary detection. IEEE Acoust. Speech Signal Proc. 4, IV-185–IV-188.
Lozano, V., Colantoni, P., Laget, B. (1996). Color objects detection in pyramidal adjacency graphs. IEEE Int. Conf. Image Proc. 3, 1007–1010.
Lyon, R., Hubel, P. (2002). Eyeing the camera: Into the next century. In: 10th IS&T/SID Color Imaging Conference, vol. 10, pp. 349–355.
MacAdam, D.L. (1985). Color Measurement, Theme and Variation, 2nd rev. ed. Optical Sciences. Springer-Verlag, Berlin, New York.
Macaire, L., Vandenbroucke, N., Postaire, J.-G. (2006). Color image segmentation by analysis of subset connectedness and color homogeneity properties. Comput. Vis. Image Under. 102 (1), 105–116.
Makrogiannis, S., Economou, G., Fotopoulos, S. (2005). A region dissimilarity relation that combines feature-space and spatial information for color image segmentation. IEEE Trans. Syst. Man Cyber. B 35 (1), 44–53.
Marcu, G., Abe, S. (1995). Three-dimensional histogram visualization in different color spaces and applications. J. Electron. Imaging 4 (4), 232–243.
Martin, D., Fowlkes, C., Tal, D., Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. IEEE Int. Conf. Comput. Vis. 2, 416–423.
Martin, D.R., Fowlkes, C.C., Malik, J. (2004). Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Patt. Anal. Mach. Intell. 26 (5), 530–549.
Meas-Yedid, V., Glory, E., Morelon, E., Pinset, C., Stamon, G., Olivo-Marin, J.-C. (2004). Automatic color space selection for biological image segmentation. Int. Conf. Patt. Recogn. 3, 514–517.
Moghaddamzadeh, A., Bourbakis, N. (1997). A fuzzy region growing approach for segmentation of color images. Patt. Recogn. 30 (6), 867–881.
Nevatia, R. (1977). A color edge detector and its use in scene segmentation. IEEE Trans. Syst. Man Cyber. 7 (11), 820–826.
Nikolaev, D., Nikolayev, P. (2004). Linear color segmentation and its implementation. Comput. Vis. Image Understand. 94 (1–3), 115–139.
Ohlander, R., Price, K., Reddy, D.R. (1978). Picture segmentation using a recursive region splitting method. Comput. Graph. Image Proc. 8, 313–333.
Ohta, Y.I., Kanade, T., Sakai, T. (1980). Color information for region segmentation. Comput. Graph. Image Proc. 13, 222–241.
Orchard, M.T., Bouman, C.A. (1991). Color quantization of images. IEEE Trans. Signal Proc. 39 (12), 2677–2690.
Panjwani, D.K., Healey, G. (1995). Markov random field models for unsupervised segmentation of textured color images. IEEE Trans. Patt. Anal. Mach. Intell. 17 (10), 939–954.
Park, S.H., Yun, I.D., Lee, S.U. (1998). Color image segmentation based on 3D clustering: Morphological approach. Patt. Recogn. 31 (8), 1061–1076.
Pascale, D. (2003). A review of RGB color spaces. Technical report. The Babel Color Company, Montreal, Quebec.
Peli, T., Malah, D. (1982). A study of edge detection algorithms. Comput. Graph. Image Process. 20, 1–21.
Philipp, I., Rath, T. (2002). Improving plant discrimination in image processing by use of different color space transformations. Comput. Electr. Agr. 35 (1), 1–15.
Phung, S.L., Bouzerdoum, A., Chai, D. (2005). Skin segmentation using color pixel classification: Analysis and comparison. IEEE Trans. Patt. Anal. Mach. Intell. 27 (1), 148–154.
Postaire, J.-G., Vasseur, C. (1980). A convexity testing method for cluster analysis. IEEE Trans. Syst. Man Cybern. 10, 145–149.
Poynton, C.A. (1993). Gamma and its disguises: The nonlinear mappings of intensity in perception, CRTs, film and video. SMPTE J. 102 (12), 1099–1108.
Poynton, C.A. (1995). A guided tour of color space. In: SMPTE Advanced Television and Electronic Imaging Conference. San Francisco, CA, pp. 167–180.
Poynton, C.A. (1996). A Technical Introduction to Digital Video. John Wiley & Sons, New York.
Pratt, W.K. (1978). Digital Image Processing. John Wiley and Sons, New York. 2nd edition, 1991.
Rakotomalala, V., Macaire, L., Valette, M., Labalette, P., Mouton, Y., Postaire, J.-G. (1998). Bidimensional retinal blood vessel reconstruction by a new color edge tracking procedure. In: IEEE Symp. Image Anal. Interp. IEEE, pp. 232–237.
Ramanath, R., Snyder, W.E., Yoo, Y., Drew, M.S. (2005). Color image processing pipeline. IEEE Sign. Process. Mag. 22 (1), 34–43.
Robinson, G.S. (1977). Color edge detection. Opt. Eng. 16 (5), 479–484.
Román-Roldán, R., Gómez-Lopera, J.F., Atae-Allah, C., Martínez-Aroza, J., Luque-Escamilla, P.L. (2001). A measure of quality for evaluating methods of segmentation and edge detection. Patt. Recogn. 34, 969–980.
Rosenfeld, A., Kak, A. (1981). Digital Image Processing. Computer Science and Applied Mathematics. Academic Press, New York.
Sangwine, S.J., Horne, R.E.N. (1998). The Color Image Processing Handbook. Chapman & Hall, London, New York.
Sarabi, A., Aggarwal, J.K. (1981). Segmentation of chromatic images. Patt. Recogn. 13 (6), 417–427.
Savoji, M.H., Burge, R.E. (1985). Note on different methods based on the Karhunen–Loeve expansion and used in image analysis. Comput. Vis. Graph. Image Proc. 29, 259–269.
Schettini, R. (1993). A segmentation algorithm for color images. Patt. Recogn. Lett. 14, 499–506.
Scheunders, P. (1997). A genetic c-means clustering algorithm applied to color image quantization. Patt. Recogn. 30 (6), 859–866.
Shafarenko, L., Petrou, M., Kittler, J. (1998). Histogram-based segmentation in a perceptually uniform color space. IEEE Trans. Image Proc. 7 (9), 1354–1358.
Sharma, G., Trussell, H.J. (1997). Digital color imaging. IEEE Trans. Image Proc. 6 (7), 901–932.
Shih, T.-Y. (1995). The reversibility of six geometric color spaces. Photogram. Eng. Remote Sens. 61 (10), 1223–1232.
Sonka, M., Hlavac, V., Boyle, R. (1994). Image Processing, Analysis and Machine Vision. Chapman & Hall, London, New York.
Stoclin, V., Duvieubourg, L., Cabestaing, F. (1997). Extension and generalization of recursive digital filters for edge detection. IEEE Int. Conf. Syst. Man Cyber. 4, 3666–3669.
Strasters, K.C., Gerbrands, J.J. (1991). Three-dimensional image segmentation using a split, merge and group approach. Patt. Recogn. Lett. 12, 307–325.
Süsstrunk, S., Buckley, R., Swen, S. (1999). Standard RGB color spaces. In: 7th IS&T/SID Color Imaging Conference, vol. 7, pp. 127–134.
Swain, M.J., Ballard, D.H. (1991). Color indexing. Int. J. Comput. Vis. 7 (1), 11–32.
Takahashi, K., Nakatani, H., Abe, K. (1995). Color image segmentation using ISODATA clustering method. In: Second Asian Conference on Computer Vision, vol. 1. Singapore, pp. 523–527.
Tkalcic, M., Tasic, J. (2003). Color spaces: perceptual, historical and applicational background. In: EUROCON 2003. Computer as a Tool. The IEEE Region 8, vol. 1, pp. 304–308.
Tominaga, S. (1992). Color classification of natural color images. Color Res. Appl. 17 (4), 230–239.
Trémeau, A., Borel, N. (1997). A region growing and merging algorithm to color segmentation. Patt. Recogn. 30 (7), 1191–1203.
Trémeau, A., Colantoni, P. (2000). Regions adjacency graph applied to color image segmentation. IEEE Trans. Image Proc. 9 (4), 735–744.
Trussell, H.J., Saber, E., Vrhel, M. (2005). Color image processing. IEEE Signal Proc. Magazine 22 (1), 14–22.
Uchiyama, T., Arbib, M.A. (1994). Color image segmentation using competitive learning. IEEE Trans. Patt. Anal. Mach. Intell. 16 (12), 1197–1206.
Ultré, V., Macaire, L., Postaire, J.-G. (1996). Determination of compatibility coefficients for color edge detection by relaxation. IEEE Int. Conf. Image Proc. 3, 1045–1048. Lausanne.
Vandenbroucke, N., Macaire, L., Postaire, J.-G. (1998). Color pixels classification in an hybrid color space. IEEE Int. Conf. Image Proc. 1, 176–180. Chicago, IL.
Vandenbroucke, N., Macaire, L., Postaire, J.-G. (2000a). Color systems coding for color image processing. Int. Conf. Color Graph. Image Proc. 1, 180–185. Saint-Etienne.
Vandenbroucke, N., Macaire, L., Postaire, J.-G. (2000b). Unsupervised color texture features extraction and selection for soccer images segmentation. IEEE Int. Conf. Image Proc. 2, 800–803. Vancouver, BC.
Vandenbroucke, N., Macaire, L., Postaire, J.-G. (2003). Color image segmentation by pixel classification in an adapted hybrid color space. Application to soccer image analysis. Comput. Vis. Image Unders. 90 (2), 190–216.
Vannoorenberghe, P., Flouzat, G. (2006). A belief-based pixel labelling strategy for medical and satellite image segmentation. In: IEEE Int. Conf. Fuzzy Systems. IEEE, pp. 1093–1098. Vancouver, BC.
Verikas, A., Malmqvist, K., Bergman, L. (1997). Colour image segmentation by modular neural network. Patt. Recogn. Lett. 18 (2), 173–185.
Vrhel, M., Saber, E., Trussell, H.J. (2005). Color image generation and display technologies. IEEE Signal Proc. Mag. 22 (1), 23–33.
Wesolkowski, S., Jernigan, M., Dony, R. (2000). Comparison of color image edge detectors in multiple color spaces. IEEE Int. Conf. Image Proc. 2, 796–799.
Wyszecki, G., Stiles, W.S. (1982). Color Science: Concepts and Methods, Quantitative Data and Formulae. John Wiley and Sons, New York.
Young, T. (1807). Lectures on Natural Philosophy, vol. II. Johnson, London.
Zhang, Y. (1996). A survey of evaluation methods for image segmentation. Patt. Recogn. 29 (8), 1335–1346.
Zucker, S. (1976). Region growing: Childhood and adolescence. Comput. Vis. Graph. Image Proc. 5, 382–399.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 151
Generalized Discrete Radon Transforms and Applications to Image Processing
GLENN R. EASLEY¹ AND FLAVIA COLONNA²
¹ System Planning Corporation, Arlington, VA 22209, USA
² Department of Mathematical Sciences, George Mason University, Fairfax, VA 22030, USA
I. Introduction
  A. Definition and Properties of the Classical Radon Transform
  B. Definition and Properties of the Discrete Radon Transform
  C. Discrete versus Continuous Environments
  D. Generalized Discrete Radon Transforms for Estimation and Invariant Object Recognition
  E. Topical Outline
II. Background on Wavelets
  A. Discrete Implementation
  B. Nonlinear Approximation
III. Beyond Wavelets
  A. Ridgelets
  B. Local Multiscale Ridgelets
  C. Curvelets and Shearlets
IV. The Discrete p-Adic Radon Transform
  A. Improved Stability of the Discrete p-Adic Radon Transform
  B. Discrete p-Adic Ridgelet Transform
V. Generalized Discrete Radon Transform
  A. Special Cases
  B. Generalized Discrete Ridgelet Transform
  C. Examples of Frame Elements
  D. A Direct Radon Matrix Approach
  E. Generalized Discrete Local Ridgelet Transform and Discrete Curvelet Transform
VI. Noise Removal Experiments
VII. Applications to Image Recognition
  A. Translation Invariant Feature Vectors
  B. Construction of a Rotation Invariant Feature Vector
  C. Classifiers: Feedforward Artificial Neural Network and Nearest-Neighbor Network
VIII. Recognition Experiments
IX. Conclusion
References
ISSN 1076-5670 DOI: 10.1016/S1076-5670(07)00403-X
Copyright 2008, Elsevier Inc. All rights reserved.
EASLEY AND COLONNA
I. INTRODUCTION
The concept of the Radon transform, originally presented in 1917 by Johann Radon (Radon, 1917), is the underlying basis for the development of many of today’s imaging techniques. Its most famous application is computerized tomography, whose theoretical foundation was laid out by Nobel Prize winner Allan Cormack in the early 1960s (Cormack, 1963, 1964), although the first applications of the Radon transform appeared in radio astronomy and electron microscopy (Deans, 1983). In a general sense, the Radon transform projects data onto an appropriate number of lower-dimensional “slices.” In many image-sensing applications, the data collected are in the form of these slices. The inverse Radon transform thus allows reconstruction of the original data from the slices.
In its classical formulation, the Radon transform can be described abstractly as follows. Let $\mathcal{L}$ be a collection of lines in $\mathbb{R}^2$ whose union is the entire plane. For a function $f$ belonging to a suitable class, such as
$$L^2(\mathbb{R}^2) = \left\{ f : \mathbb{R}^2 \to \mathbb{R} : \int_{\mathbb{R}^2} |f|^2 < \infty \right\},$$
and for $\ell \in \mathcal{L}$, define $(\mathcal{R}f)(\ell)$ as the integral of $f$ along $\ell$. Specifically, if we describe a line in the plane as $y = mx + t$, where the parameters $m$ and $t$ are the slope and intercept of the line, we may define
$$(\mathcal{R}f)(m, t) = \int_{-\infty}^{\infty} f(x, mx + t)\, dx.$$
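As a quick sanity check of the slope-intercept definition, the Python sketch below approximates $(\mathcal{R}f)(m, t)$ with the trapezoidal rule and compares it with a closed form for the Gaussian $f(x, y) = e^{-(x^2 + y^2)}$. Completing the square in $x^2 + (mx + t)^2 = (1 + m^2)\big(x + \tfrac{mt}{1+m^2}\big)^2 + \tfrac{t^2}{1+m^2}$ gives $(\mathcal{R}f)(m, t) = \sqrt{\pi/(1 + m^2)}\, e^{-t^2/(1 + m^2)}$. The function names, the truncation interval and the grid size are illustrative choices, not part of the chapter.

```python
import numpy as np

def radon_slope_intercept(f, m, t, half_width=20.0, n=400_001):
    """Approximate (Rf)(m, t) = integral of f(x, m*x + t) dx over the real line.

    The integral is truncated to [-half_width, half_width] and evaluated with
    the composite trapezoidal rule on n equally spaced nodes."""
    x = np.linspace(-half_width, half_width, n)
    y = f(x, m * x + t)
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

def gaussian(x, y):
    return np.exp(-(x**2 + y**2))

def gaussian_radon_exact(m, t):
    # Closed form obtained by completing the square in x.
    return np.sqrt(np.pi / (1.0 + m**2)) * np.exp(-t**2 / (1.0 + m**2))

for m, t in [(0.0, 0.0), (0.5, 0.3), (-2.0, 1.0)]:
    approx = radon_slope_intercept(gaussian, m, t)
    print(f"m={m:+.1f} t={t:+.1f} numeric={approx:.10f} exact={gaussian_radon_exact(m, t):.10f}")
```

Because the Gaussian decays rapidly, the trapezoidal rule on a fine uniform grid is far more accurate here than its usual second-order error bound suggests.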
The inversion formula that allows reconstruction of the function $f$ from $g = \mathcal{R}f$ can be written as $f = \mathcal{R}^* K g$, where $\mathcal{R}^*$ is the dual transform (also known as the backprojection operator) given by
$$(\mathcal{R}^* g)(b, x) = \int_{-\infty}^{\infty} g(m, b - mx)\, dm,$$
and the operator $K$ is defined by
$$(Kg)(m, t) = \int_{-\infty}^{\infty} |s|\, \hat{g}(m, s)\, e^{2\pi i s t}\, ds,$$
with
$$\hat{g}(m, s) = \int_{-\infty}^{\infty} g(m, t)\, e^{-2\pi i s t}\, dt.$$
GENERALIZED DISCRETE RADON TRANSFORMS
In operator form, the Radon inversion formula is thus $\mathcal{R}^* K \mathcal{R} = I$, where $I$ is the identity operator. In dimension two, the Radon transform has been studied on the sphere $S^2$, on the hyperbolic plane $\mathbb{H}^2$, and on the projective plane $\mathbb{P}^2\mathbb{R}$ by Helgason and several others. In addition to lines, other classes of curves have been used in these spaces. For example, in the case of the hyperbolic plane, the families of curves that have been considered are the geodesics and the horocycles. For an in-depth study of the classical Radon transform, see Helgason (1980).
In the past two decades, several discretizations of the Radon transform have been considered. In the infinite discrete setting of homogeneous trees (i.e., trees all of whose vertices have the same number of neighbors), which are viewed as discrete analogues of $\mathbb{H}^2$, two formulations of the Radon transform have been introduced: the geodesic Radon transform (also known as the X-ray transform) and the horocyclic Radon transform. References to these settings include articles by Berenstein et al. (1991), Betori and Pagliacci (1986), Betori, Faraut and Pagliacci (1989), Casadio-Tarabusi, Cohen and Colonna (2000), and Cohen and Colonna (1993).
We now outline the background on the two environments for the Radon transform: the continuous or classical setting, and the discrete setting. We also address the issues associated with balancing the needs of the two settings in the application phase of the theory.
A. Definition and Properties of the Classical Radon Transform
The classical Radon transform can be described in the following manner.
Definition I.1. The Radon transform is an operator that associates with each suitable function $f$ defined on $\mathbb{R}^2$ and each pair $(\theta, t) \in [0, 2\pi) \times \mathbb{R}$ the value
$$\mathcal{R}f(\theta, t) = \int_{\mathbb{R}^2} f(x, y)\, \delta(x\cos\theta + y\sin\theta - t)\, dx\, dy,$$
where $\delta$ is the Dirac distribution at the origin.
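The slope-intercept form $(\mathcal{R}f)(m, t)$ and the normal form of Definition I.1 describe the same line integrals up to a Jacobian factor. The short derivation below writes $\rho$ for the offset in Definition I.1, to avoid a clash with the intercept $t$, and assumes $\sin\theta \neq 0$ (non-vertical lines); it uses only the scaling rule $\delta(ay - b) = |a|^{-1}\delta(y - b/a)$.

```latex
\begin{aligned}
\mathcal{R}f(\theta,\rho)
  &= \int_{\mathbb{R}}\int_{\mathbb{R}} f(x,y)\,
     \delta\big(y\sin\theta - (\rho - x\cos\theta)\big)\,dy\,dx \\
  &= \frac{1}{|\sin\theta|}\int_{\mathbb{R}}
     f\Big(x,\; -x\cot\theta + \frac{\rho}{\sin\theta}\Big)\,dx \\
  &= \sqrt{1+m^2}\,(\mathcal{R}f)(m,t),
  \qquad m = -\cot\theta,\quad t = \frac{\rho}{\sin\theta},
\end{aligned}
```

since $1/\sin^2\theta = 1 + \cot^2\theta = 1 + m^2$. The factor $\sqrt{1+m^2}$ is the arc-length element of the line $y = mx + t$: Definition I.1 integrates with respect to arc length, whereas $(\mathcal{R}f)(m, t)$ integrates with respect to $x$. For the Gaussian $e^{-(x^2+y^2)}$, for instance, one obtains $\mathcal{R}f(\theta, \rho) = \sqrt{\pi}\, e^{-\rho^2}$ for every $\theta$, as rotational symmetry requires.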
The convolution of two functions $f, g \in L^1(\mathbb{R}^k)$, $k \geq 1$, is defined as
$$(f * g)(\mathbf{t}) = \int_{\mathbb{R}^k} f(\mathbf{t} - \mathbf{s})\, g(\mathbf{s})\, d\mathbf{s},$$
where bold notation is used for vectors. The convolution operator is commutative, since the above integral is invariant under the change of variables $\mathbf{s} \mapsto \mathbf{t} - \mathbf{s}$. We now recall two useful properties of the Radon transform (Natterer, 2001).
Figure 1. Illustration of the Fourier slice theorem.
Theorem I.1 (convolution property). Let $f, g \in L^1(\mathbb{R}^2)$. Then $\mathcal{R}(f * g) = \mathcal{R}f * \mathcal{R}g$, where the convolution on the left (respectively, right) is in $\mathbb{R}^2$ (respectively, in $\mathbb{R}$, with respect to $t$ for each fixed $\theta$).
The Fourier transform of a function $f \in L^1(\mathbb{R}^k)$ is defined as
$$\hat{f}(\boldsymbol{\omega}) = \int_{\mathbb{R}^k} f(\mathbf{t})\, e^{-2\pi i \langle \mathbf{t}, \boldsymbol{\omega} \rangle}\, d\mathbf{t},$$
where $\langle \mathbf{t}, \boldsymbol{\omega} \rangle$ denotes the standard inner product in $\mathbb{R}^k$.
Theorem I.2 (Fourier slice theorem). Given $f \in L^1(\mathbb{R}^2)$, let $\hat{f}(\omega_1, \omega_2)$ be the two-dimensional (2D) Fourier transform of $f$ and let $\widehat{\mathcal{R}f}(\theta, \omega)$ be the one-dimensional (1D) Fourier transform of $\mathcal{R}f(\theta, t)$ with respect to the parameter $t$. Then
$$\hat{f}(\omega\cos\theta, \omega\sin\theta) = \widehat{\mathcal{R}f}(\theta, \omega).$$
An illustration of the Fourier slice theorem is shown in Figure 1.
B. Definition and Properties of the Discrete Radon Transform
Many discretizations of the Radon transform have been proposed. In one of the most notable formulations, credited to Bolker (1987), the finite Radon transform of a real function $f$ defined on a finite set $S$ with respect to a collection $\mathcal{B}$ of subsets of $S$ is the function $\mathcal{R}f$ on $\mathcal{B}$ obtained by summing $f$ over the subsets. Not much can be said about this transform under these general assumptions. When the set $S$ has some algebraic or combinatorial structure, such as a finite or infinite group structure or a tree structure, it is possible to obtain more specific information on the transform (Berenstein et al., 1991; Betori, Faraut and Pagliacci, 1989; Casadio-Tarabusi, Cohen and Colonna, 2000; Kung, 1987).
Let $G$ be a finite group, that is, a finite set endowed with an associative binary operation (usually denoted multiplicatively) with an identity and such that every element has an inverse. Let $F(G)$ be the $|G|$-dimensional space of complex-valued functions on $G$. Given $f_1, f_2 \in F(G)$, the associated convolution product is defined by
$$(f_1 * f_2)(h) = |G|^{-1} \sum_{g \in G} f_1(g)\, f_2(g^{-1}h).$$
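To make the normalized convolution product concrete, here is a small Python sketch for a finite group supplied through its list of elements and its multiplication and inversion maps. The helper names and the dictionary representation of functions on $G$ are illustrative assumptions. The example uses the additive cyclic group $\mathbb{Z}_6$, where the product $g^{-1}h$ becomes $h - g \bmod 6$; convolving with the indicator of the identity returns $f_2 / |G|$, which shows the effect of the $|G|^{-1}$ factor.

```python
from typing import Callable, Dict, Hashable, Iterable

def group_convolution(
    elements: Iterable[Hashable],
    mul: Callable[[Hashable, Hashable], Hashable],
    inv: Callable[[Hashable], Hashable],
    f1: Dict[Hashable, complex],
    f2: Dict[Hashable, complex],
) -> Dict[Hashable, complex]:
    """(f1 * f2)(h) = |G|^{-1} * sum over g in G of f1(g) f2(g^{-1} h)."""
    elems = list(elements)
    size = len(elems)
    return {h: sum(f1[g] * f2[mul(inv(g), h)] for g in elems) / size for h in elems}

# Example: Z_6 written additively, so the product g^{-1} h becomes (h - g) mod 6.
def add_mod6(a: int, b: int) -> int:
    return (a + b) % 6

def neg_mod6(a: int) -> int:
    return (-a) % 6

Z6 = range(6)
delta0 = {g: 1.0 if g == 0 else 0.0 for g in Z6}  # indicator of the identity element
f = {g: 6.0 * g for g in Z6}
print(group_convolution(Z6, add_mod6, neg_mod6, delta0, f))
# {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}, i.e. f / |G|
```

The same function accepts a non-abelian group, provided `mul` and `inv` implement its multiplication table; in that case the product is generally not commutative.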
Define the Radon projection of a function $f \in F(G)$ along a normal subgroup $H$ of $G$ to be the function $\mathcal{R}_H f$ on the quotient group $G/H$ given by
$$\mathcal{R}_H f(gH) = |H|^{-1} \sum_{x \in gH} f(x)$$
for $gH \in G/H$. The values of the Radon projection are thus the averages of the function $f$ over the cosets $gH = \{gh : h \in H\}$. Given a positive integer $N$, let $\mathbb{Z}_N$ be the cyclic group of order $N$, that is, the quotient group $\mathbb{Z}/N\mathbb{Z}$, whose elements are the classes of integers that differ by an integer multiple of $N$. For convenience of notation we shall often identify $\mathbb{Z}_N$ with $\{0, 1, \ldots, N-1\}$. We shall also use the notation $\mathbb{Z}_N^k$ ($k \ge 2$) for the Cartesian product of $k$ copies of $\mathbb{Z}_N$. In the spirit of the continuous theory, we may view the set of vectors defined on $\mathbb{Z}_N$ as the finite discrete analog of $L^2(\mathbb{R}) = \{f : \mathbb{R} \to \mathbb{R} : \int |f|^2 < \infty\}$. We shall denote this set by $\ell^2(\mathbb{Z}_N)$, with inner product
$$\langle x, y \rangle = \sum_{j=0}^{N-1} x(j)\, y(j).$$
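For the cyclic group $\mathbb{Z}_N$ (written additively, so every subgroup is normal), the Radon projection reduces to averaging a vector over cosets. The sketch below assumes that $H$ is the subgroup of multiples of a divisor $d$ of $N$. The function names are hypothetical, and `inner` implements the real inner product on $\ell^2(\mathbb{Z}_N)$ given above.

```python
def radon_projection_cyclic(f, d):
    """Radon projection of f on Z_N along the subgroup H = {0, d, 2d, ...}, d dividing N.

    In additive notation the cosets are g + H for g = 0, ..., d - 1, so the result
    lists the averages of f over those cosets, indexed by the quotient Z_N / H.
    """
    N = len(f)
    if N % d != 0:
        raise ValueError("d must divide N")
    size_h = N // d                       # |H|
    return [sum(f[(g + k * d) % N] for k in range(size_h)) / size_h for g in range(d)]


def inner(x, y):
    """<x, y> = sum_j x(j) y(j) on l^2(Z_N) for real-valued vectors."""
    return sum(a * b for a, b in zip(x, y))


if __name__ == "__main__":
    f = [1, 2, 3, 4, 5, 6]
    print(radon_projection_cyclic(f, 3))   # cosets {0,3}, {1,4}, {2,5}: [2.5, 3.5, 4.5]
    print(inner(f, f))                     # 91
```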
The following are definitions of discrete Radon transforms for the groups $\mathbb{Z}_p^2$, where $p$ is a prime, and $\mathbb{Z}_{2^n}^2$, where $n$ is any positive integer.

Definition I.2 (see Hsung, Lun and Siu, 1996). Let $p$ be a prime. The discrete periodic Radon transform over $\ell^2(\mathbb{Z}_p^2)$ is defined as follows. Given
$f \in \ell^2(\mathbb{Z}_p^2)$ and $l, m \in \mathbb{Z}_p$, let
$$\mathcal{R}f^1_m(l) = \sum_{x=0}^{p-1} f\big(x, [l + mx]_p\big) \quad\text{and}\quad \mathcal{R}f^2_0(l) = \sum_{y=0}^{p-1} f(l, y),$$
where $[k]_p$ is the least nonnegative residue of $k$ modulo $p$, that is, the unique integer $r \in \{0, \ldots, p-1\}$ such that $p$ divides $r - k$ (in symbols, $r \equiv k \pmod{p}$).

Definition I.3 (see Hsung, Lun and Siu, 1996). Let $n$ be a positive integer and set $N = 2^n$. The discrete dyadic Radon transform over $\ell^2(\mathbb{Z}_N^2)$ is defined as follows. Given $f(x, y) \in \ell^2(\mathbb{Z}_N^2)$, $l, m \in \mathbb{Z}_N$, and $s \in \mathbb{Z}_{N/2}$, let
$$\mathcal{R}f^1_m(l) = \sum_{x=0}^{N-1} f\big(x, [l + mx]_N\big) \quad\text{and}\quad \mathcal{R}f^2_s(l) = \sum_{y=0}^{N-1} f\big([l + 2sy]_N, y\big).$$
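Definitions I.2 and I.3 translate almost literally into array code. The sketch below stores $f(x, y)$ as `f[x][y]` in plain Python lists, which is a convention of our own. It uses Python's `%` operator, which already returns the least nonnegative residue $[k]_N$. Each family of discrete lines partitions the grid, so every projection sums to the total of $f$. The sketch uses this as a quick sanity check.

```python
def periodic_drt(f, p):
    """Discrete periodic Radon transform (Definition I.2) of a p x p array, p prime.

    f[x][y] holds f(x, y). Returns p + 1 projections: entries 0..p-1 are R f^1_m for
    m = 0..p-1, and the last entry is R f^2_0.
    """
    projections = []
    for m in range(p):
        # R f^1_m(l) = sum over x of f(x, [l + m x]_p)
        projections.append([sum(f[x][(l + m * x) % p] for x in range(p)) for l in range(p)])
    # R f^2_0(l) = sum over y of f(l, y)
    projections.append([sum(f[l][y] for y in range(p)) for l in range(p)])
    return projections


def dyadic_drt(f):
    """Discrete dyadic Radon transform (Definition I.3) of an N x N array, N = 2**n."""
    N = len(f)
    assert N > 1 and (N & (N - 1)) == 0, "N must be a power of 2"
    first = [[sum(f[x][(l + m * x) % N] for x in range(N)) for l in range(N)]
             for m in range(N)]                       # R f^1_m, m in Z_N
    second = [[sum(f[(l + 2 * s * y) % N][y] for y in range(N)) for l in range(N)]
              for s in range(N // 2)]                 # R f^2_s, s in Z_{N/2}
    return first, second


if __name__ == "__main__":
    f = [[x + 3 * y for y in range(5)] for x in range(5)]
    proj = periodic_drt(f, 5)
    print(proj[0])    # [10, 25, 40, 55, 70]
    print(proj[-1])   # [30, 35, 40, 45, 50]
    print([sum(row) for row in proj])   # every projection sums to the total, 200
```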
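The 1D periodic convolution of two functions on $\mathbb{Z}_N$ is defined as
$$(f \circledast g)(t) = \sum_{k=0}^{N-1} f\big([t - k]_N\big)\, g(k) \quad\text{for } t \in \mathbb{Z}_N.$$
The 2D periodic convolution of two functions $f, g$ on $\mathbb{Z}_N^2$ is defined as
$$(f \circledast_2 g)(x, y) = \sum_{k=0}^{N-1}\sum_{m=0}^{N-1} f\big([x - k]_N, [y - m]_N\big)\, g(k, m) \quad\text{for } x, y \in \mathbb{Z}_N.$$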
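Both periodic convolutions can be written directly from the formulas above. The function names and the `f[x][y]` storage convention in this sketch are our own. Convolving with a unit impulse at $k = 1$ (or at $(1, 0)$ in 2D) simply shifts the input cyclically, which is an easy way to check the index arithmetic.

```python
def periodic_conv1d(f, g):
    """1D periodic convolution: (f (*) g)(t) = sum_k f([t - k]_N) g(k), t in Z_N."""
    N = len(f)
    # Python's % already returns the least nonnegative residue [t - k]_N.
    return [sum(f[(t - k) % N] * g[k] for k in range(N)) for t in range(N)]


def periodic_conv2d(f, g):
    """2D periodic convolution of N x N arrays (f[x][y] holds f(x, y))."""
    N = len(f)
    return [[sum(f[(x - k) % N][(y - m) % N] * g[k][m] for k in range(N) for m in range(N))
             for y in range(N)] for x in range(N)]


if __name__ == "__main__":
    print(periodic_conv1d([1, 2, 3, 4], [0, 1, 0, 0]))   # [4, 1, 2, 3]: a cyclic shift by one
    shift = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]           # unit impulse at (1, 0)
    print(periodic_conv2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], shift))
    # [[7, 8, 9], [1, 2, 3], [4, 5, 6]]
```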
Discrete analogs of Theorems I.1 and I.2 hold for both the periodic and the dyadic cases. We state these results only for the dyadic case.

Theorem I.3 (discrete Radon convolution property; see Hsung, Lun and Siu, 1996). Let $n$ be a positive integer, $N = 2^n$, and let $f, h \in \ell^2(\mathbb{Z}_N^2)$. Define
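$g = f \circledast_2 h$. Then
$$\mathcal{R}g^1_m = \mathcal{R}f^1_m \circledast \mathcal{R}h^1_m \quad\text{and}\quad \mathcal{R}g^2_s = \mathcal{R}f^2_s \circledast \mathcal{R}h^2_s$$
for $m \in \mathbb{Z}_N$ and $s \in \mathbb{Z}_{N/2}$.

Theorem I.4 (discrete Fourier slice theorem; see Lun, Hsung and Shen, 2003). Let $N = 2^n$ with $n \in \mathbb{N}$, let $f$ be a function in $\ell^2(\mathbb{Z}_N^2)$, and let $\hat{f}$ denote the 2D discrete Fourier transform (DFT) of $f$, defined by
$$\hat{f}(u, v) = \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x, y)\, e^{-2\pi i (xu + yv)/N},$$
where $u, v \in \mathbb{Z}_N$. Then, for $m, u, v \in \mathbb{Z}_N$ and $s \in \mathbb{Z}_{N/2}$,
$$\hat{f}\big(u, [-2su]_N\big) = \sum_{l=0}^{N-1} \mathcal{R}f^2_s(l)\, e^{-2\pi i lu/N}, \qquad \hat{f}\big([-mv]_N, v\big) = \sum_{l=0}^{N-1} \mathcal{R}f^1_m(l)\, e^{-2\pi i lv/N}.$$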
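Theorems I.3 and I.4 can be checked numerically on a small grid. The sketch below uses $N = 8$ with deterministic integer test arrays of our own choosing. It assumes the negative-exponent DFT convention, which is the sign under which the two slice identities hold as stated. The convolution property is checked exactly in integer arithmetic. The slice theorem is checked up to floating-point tolerance.

```python
import cmath


def dyadic_drt(f):
    """Dyadic DRT (Definition I.3) of an N x N array; returns (R f^1_m list, R f^2_s list)."""
    N = len(f)
    first = [[sum(f[x][(l + m * x) % N] for x in range(N)) for l in range(N)]
             for m in range(N)]
    second = [[sum(f[(l + 2 * s * y) % N][y] for y in range(N)) for l in range(N)]
              for s in range(N // 2)]
    return first, second


def dft1(v):
    """1D DFT: V(u) = sum_l v(l) exp(-2 pi i l u / N)."""
    N = len(v)
    return [sum(v[l] * cmath.exp(-2j * cmath.pi * l * u / N) for l in range(N))
            for u in range(N)]


def dft2(f):
    """2D DFT as in Theorem I.4: F[u][v] = sum_{x,y} f(x, y) exp(-2 pi i (x u + y v) / N)."""
    N = len(f)
    return [[sum(f[x][y] * cmath.exp(-2j * cmath.pi * (x * u + y * v) / N)
                 for x in range(N) for y in range(N)) for v in range(N)] for u in range(N)]


def conv1(a, b):
    N = len(a)
    return [sum(a[(t - k) % N] * b[k] for k in range(N)) for t in range(N)]


def conv2(f, h):
    N = len(f)
    return [[sum(f[(x - k) % N][(y - m) % N] * h[k][m] for k in range(N) for m in range(N))
             for y in range(N)] for x in range(N)]


def check_slice_theorem(f, tol=1e-8):
    """Compare the 2D DFT along the lines of Theorem I.4 with 1D DFTs of the projections."""
    N = len(f)
    F = dft2(f)
    first, second = dyadic_drt(f)
    first_hat = [dft1(p) for p in first]
    second_hat = [dft1(p) for p in second]
    ok1 = all(abs(F[(-m * v) % N][v] - first_hat[m][v]) < tol
              for m in range(N) for v in range(N))
    ok2 = all(abs(F[u][(-2 * s * u) % N] - second_hat[s][u]) < tol
              for s in range(N // 2) for u in range(N))
    return ok1 and ok2


def check_convolution_property(f, h):
    """Theorem I.3: projections of f (*)_2 h are 1D periodic convolutions of projections."""
    N = len(f)
    (g1, g2), (f1, f2), (h1, h2) = dyadic_drt(conv2(f, h)), dyadic_drt(f), dyadic_drt(h)
    return (all(g1[m] == conv1(f1[m], h1[m]) for m in range(N))
            and all(g2[s] == conv1(f2[s], h2[s]) for s in range(N // 2)))


if __name__ == "__main__":
    N = 8
    f = [[(3 * x + 5 * y * y + 1) % 7 for y in range(N)] for x in range(N)]
    h = [[(x * y + 2 * x + 1) % 5 for y in range(N)] for x in range(N)]
    print(check_slice_theorem(f), check_convolution_property(f, h))   # True True
```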
C. Discrete versus Continuous Environments

As noted previously, different versions of the Radon transform are used depending on whether the problem is viewed as discrete or continuous. When developing applications for image processing there is often a mix of the two approaches. For example, consider the case in which the data are collected from a computer-assisted tomography (CAT) scanner. The data are then often interpreted as samples of the continuous Radon transform, and the desired image is formed by discretizing the continuous inversion formulas. There are several approaches to this inversion, depending on whether the data are manipulated in the frequency or the time domain and on the desired efficiency and/or accuracy. Ultimately, the data are manipulated and presented in a finite discrete setting. The discrete Radon transform can be used to turn a finite discrete dataset (a matrix representing an image) into a form that allows for a more efficient and useful arrangement, as will be seen later. For instance, the periodic Radon transform was proposed by Gertner (1988) to implement a more efficient 2D fast Fourier transform.
Although this mixing of continuous derivations and discrete computations is generally understood, a more subtle aspect is not widely acknowledged: the translation of continuous concepts into a completely discrete setting. An example of such an endeavor is the discrete reformulation of the Heisenberg–Gabor uncertainty principle (Donoho and Stark, 1989). The two versions of this principle are stated in Theorem I.5 (Flandrin, 1999) and Theorem I.6 (Donoho and Stark, 1989).

Definition I.4.
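The moments of inertia are defined as
$$\Delta t = \frac{1}{\|x\|_2}\left(\int_{-\infty}^{\infty} t^2 |x(t)|^2\, dt\right)^{1/2} \quad\text{and}\quad \Delta\omega = \frac{1}{\|x\|_2}\left(\int_{-\infty}^{\infty} \omega^2 |\hat{x}(\omega)|^2\, d\omega\right)^{1/2},$$
where $\|x\|_2 = \left(\int_{\mathbb{R}} |x(t)|^2\, dt\right)^{1/2} < \infty$.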
Theorem I.5 (Heisenberg–Gabor uncertainty principle). $\Delta t\, \Delta\omega \ge \dfrac{1}{4\pi}$.
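The Gaussian $x(t) = e^{-\pi t^2}$ is its own Fourier transform under the convention $\hat{x}(\omega) = \int x(t)\, e^{-2\pi i t\omega}\, dt$, and it attains equality in Theorem I.5. The sketch below approximates both moments of Definition I.4 and the Fourier transform by Riemann sums on a truncated grid. The half-width 4 and the step 0.02 are our own choices, adequate because the test functions are negligible outside $[-4, 4]$. As a second check, $t\,e^{-\pi t^2}$ gives the larger product $3/(4\pi)$.

```python
import cmath
import math


def spread(values, grid, step):
    """(1 / ||x||_2) * (integral s^2 |x(s)|^2 ds)^(1/2), with integrals replaced by Riemann sums."""
    energy = sum(abs(v) ** 2 for v in values) * step
    moment = sum(s * s * abs(v) ** 2 for s, v in zip(grid, values)) * step
    return math.sqrt(moment / energy)


def fourier_transform(values, grid, step, freqs):
    """x_hat(w) = integral x(t) exp(-2 pi i t w) dt, approximated by a Riemann sum."""
    return [step * sum(v * cmath.exp(-2j * math.pi * t * w) for t, v in zip(grid, values))
            for w in freqs]


def time_frequency_product(x, half_width=4.0, step=0.02):
    """Approximate Delta_t * Delta_omega for x negligible outside [-half_width, half_width]."""
    n = int(round(2 * half_width / step))
    grid = [-half_width + k * step for k in range(n + 1)]
    values = [x(t) for t in grid]
    x_hat = fourier_transform(values, grid, step, grid)   # same grid used for frequencies
    return spread(values, grid, step) * spread(x_hat, grid, step)


def gaussian(t):
    return math.exp(-math.pi * t * t)


def hermite_1(t):
    return t * math.exp(-math.pi * t * t)


if __name__ == "__main__":
    print(time_frequency_product(gaussian), 1 / (4 * math.pi))    # both about 0.0796
    print(time_frequency_product(hermite_1), 3 / (4 * math.pi))   # both about 0.2387
```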
Theorem I.6 (discrete uncertainty principle). Given $x : \mathbb{Z}_N \to \mathbb{C}$ and its DFT
$$\hat{x}[k] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i nk/N}$$
for $k = 0, \ldots, N-1$, let $N_t$ and $N_\omega$ be the numbers of nonzero entries in $x$ and $\hat{x}$, respectively. Then $N_t N_\omega \ge N$.

This principle underlies a new field of study known as compressive sensing (Candès, Romberg and Tao, 2006). Another example of this dichotomy between continuous and discrete environments is the formulation of the concepts of frame and sampling in terms of matrices (Gröchenig, 1993). A notion of smoothness in the discrete setting has been given by Easley, Healy and Berenstein (2005b). In addition to providing an appropriate description of what happens in the digital setting, new approaches to discretizing continuous concepts have led to error-free solutions in several
applications, including the multichannel deconvolution problem (Colonna and Easley, 2004).

D. Generalized Discrete Radon Transforms for Estimation and Invariant Object Recognition

The concept of a general discrete Radon transform that we shall define is based on the realization that many physical phenomena require nonradial decompositions of the 2D frequency domain. For instance, magnetic resonance imaging (MRI) can presently be done by collecting samples in the Fourier domain in a spiral-like fashion (Macovski and Meyer, 1986). Much research has addressed whether such sampling collections provide enough data to form images uniquely (Benedetto and Ferreira, 2001), and many algorithms have been devised based on resampling the data (Sha, Guo and Song, 2003). The formulations we devised offer new tools for approaching these problems more effectively in the digital setting. Another motivation for introducing the concept of a generalized discrete Radon transform is that this transform can be useful when dealing with large multidimensional datasets and trying to detect or to extract useful patterns contained in them. In particular, it can be used to reduce the dataset to be analyzed to a lower-dimensional one. For estimation purposes, the transform has the property that the Radon projection slices tend to be smoother than similar slices obtained directly from the data. Indeed, on average, the transform rearranges the frequency data so that high-frequency values are turned into lower-frequency values in a slice. Thus, the transform can also be interpreted as a vehicle for providing averages over various frame elements, and, as an averaging operator, it tends to smooth data. This property makes the generalized discrete Radon transform well suited for estimation.
We develop new techniques based on the new discrete Radon and ridgelet transforms for the recognition of objects in images that arise in various settings, such as facial and target identification and inspection of carry-on luggage. Medical imaging applications include chromosome classification and tumor identification. For image recognition, the objective is to extract features that are independent of the position, orientation, and size of the object to be identified. Thus, for object recognition the needed features must be invariant under translation, rotation, and scaling. Techniques aimed at providing rotation invariant features include the standard Radon transform, the Zernike moments, and many complex artificial neural networks (see AlShaykh and Doherty, 1996; Fukumi, Omatu and Nishikawa, 1997; Khotanzad and Hong, 1990; Pun and Lee, 2003; and You and Ford, 1992). As will be
seen, the success of the generalized discrete Radon transform in a ridgelet decomposition for denoising purposes suggests that the rotational invariance of the Radon transform, combined with the wavelet transform along the projections, could provide feature vectors robust to rotation and noise. These new Radon transforms are well suited for this task because they are algebraically exact (no interpolation or approximation is required), geometrically faithful, and computationally efficient.

E. Topical Outline

Section II provides background information on wavelets, explaining how their transforms are computed and why they are useful. In addition, we recall the definition of a frame. Section III describes alternative 2D transforms, such as the ridgelet and the curvelet transforms, that were developed to obtain sparser representations of a certain class of images. For this purpose, a discrete Radon transform based on projecting the data onto a pseudopolar grid (see Figure 11b) was developed by Stark, Candès and Donoho (2002). Do and Vetterli (2000a, 2000b) proposed an alternative discrete ridgelet transform (finite ridgelet transform [FRIT]) based on the discrete periodic Radon transform. The existence of two effective yet completely different discrete Radon transforms for computing the discrete ridgelet transform provided the impetus for developing a generalized discrete Radon transform that incorporates both versions. The new transforms presented here offer broad opportunities for applications by considering a wide class of frequency decompositions. In Section IV we develop the discrete p-adic ridgelet transform, which is the application of the wavelet transform to the p-adic Radon transform projections. After establishing the necessary framework for the p-adic Radon transform, we derive an inversion formula and generalize some theorems provided by Do and Vetterli (2001) for the prime case and by Hsung, Lun and Siu (1996) for the dyadic case.
We then formulate the p-adic Radon transform $\mathcal{R}$ as a matrix in terms of the DFT matrix and use it to find the eigenvalues of $\mathcal{R}^T\mathcal{R}$. We show that the discrete p-adic Radon transform on a square grid of size $p^n$ yields a frame in $\ell^2(\mathbb{Z}_{p^n}^2)$ and establish its tightest frame bounds. Next we discuss how to improve the stability of the discrete p-adic Radon transform. Furthermore, this transform yields a new approach for carrying out the local ridgelet transform and the curvelet transform in a discrete setting, as envisioned by Stark, Candès and Donoho (2002). Section V presents the generalized discrete Radon transform to be used for the ridgelet transform. This transform is based on replacing the DFT matrix with an orthogonal matrix $Q$ and on considering in place of the
standard projection slices a larger class of selection slices (described by selection matrices). For any choice of orthogonal matrix and selection matrix, a scaled version of the generalized discrete Radon transform yields a tight frame. When Q is the normalized DFT matrix, the discrete convolution property holds for any choice of selection matrix. We outline a direct Radon matrix approach and indicate a way of improving performance by means of regularization. Furthermore, we outline the procedure for the implementation of the discrete local ridgelet transform and the discrete curvelet transform. The numerical demonstrations in Section VI show that for certain choices of the orthogonal matrix Q, the generalized discrete ridgelet transform outperforms the standard ridgelet transform. We tested the performances of the algorithms based on the discrete local ridgelet and the curvelet transforms and compared our methods with other denoising routines. In Section VII we investigate new techniques based on the generalized discrete Radon and ridgelet transforms for image recognition. We present results on the use of several versions of the discrete Radon transform to derive rotationally invariant feature vectors. Section VIII details recognition experiments using a database consisting of objects presented at all angles between 0 and 360 degrees. The results of the classification show that the methods based on the new discrete Radon transforms outperform the standard Radon transform techniques. The conclusions are summarized in Section IX.
II. BACKGROUND ON WAVELETS

A basic problem in signal and image processing is the construction of simple, easily generated countable collections of functions $\{\psi_i : i \in I\}$ so that any signal $f$ in a certain class has a representation of the form
$$f = \sum_{i \in I} \left(\int_{-\infty}^{\infty} f(t)\,\psi_i(t)\, dt\right)\psi_i = \sum_{i \in I} \langle f, \psi_i \rangle\, \psi_i,$$
with some control on the coefficients $\langle f, \psi_i \rangle$. A representation of $f$ is sparse if most of the energy of $f$ can be recovered using only a "few" terms in the representation:
$$\|f\|^2 \approx \sum_{\text{few terms}} |\langle f, \psi_i \rangle|^2.$$
If $f$ has a sparse representation with respect to a basis $\{\psi_i : i \in I\}$, then "most" representation coefficients are small, that is, smaller than some threshold $T$. Thus, we can set to 0 the coefficients with $|\langle f, \psi_i \rangle| < T$ and still preserve most of the information.
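Hard thresholding can be illustrated with any orthonormal basis. In the sketch below, the unitary DFT plays that role purely for convenience. The signal (two cosines), the noise level $\sigma = 0.05$, the seed, and the threshold $T = 1$ are all our own choices. Only four coefficients survive, yet they carry almost all of the energy. As a side effect, reconstructing from them also removes most of the added noise.

```python
import cmath
import math
import random


def unitary_dft(x):
    """DFT scaled by 1/sqrt(N): the coefficients of x in an orthonormal basis of C^N."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]


def inverse_unitary_dft(c):
    N = len(c)
    return [sum(c[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(N)) / math.sqrt(N)
            for n in range(N)]


def hard_threshold(coeffs, T):
    """Set to 0 every coefficient whose magnitude is below the threshold T."""
    return [c if abs(c) >= T else 0 for c in coeffs]


def energy(v):
    return sum(abs(c) ** 2 for c in v)


def demo(N=64, sigma=0.05, T=1.0, seed=0):
    """Two cosines plus small Gaussian noise: a signal that is sparse in the DFT basis."""
    clean = [3 * math.cos(2 * math.pi * 5 * n / N) + math.cos(2 * math.pi * 12 * n / N)
             for n in range(N)]
    rng = random.Random(seed)
    noisy = [s + rng.gauss(0.0, sigma) for s in clean]
    coeffs = unitary_dft(noisy)
    kept = hard_threshold(coeffs, T)
    denoised = [z.real for z in inverse_unitary_dft(kept)]
    return {
        "num_kept": sum(1 for c in kept if c != 0),
        "energy_retained": energy(kept) / energy(coeffs),
        "error_before": math.sqrt(sum((a - b) ** 2 for a, b in zip(noisy, clean))),
        "error_after": math.sqrt(sum((a - b) ** 2 for a, b in zip(denoised, clean))),
    }


if __name__ == "__main__":
    print(demo())   # 4 coefficients kept, more than 99% of the energy retained
```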
If $u = f + \varepsilon$, where $\varepsilon$ is considered to be noise, a sparse representation of $f$ compresses most of the signal into the largest representation coefficients, while the noise is spread over many small coefficients. As a result, thresholding the small coefficients removes most of the noise. A continuous function $f$ on an interval $[0, L]$ can be represented by the Fourier series (Fourier, 1807, published in 1822)
$$f(x) = \sum_{n \in \mathbb{Z}} \langle f, e_n \rangle\, e_n(x),$$
where $e_n(x) = L^{-1/2} e^{-2\pi i n x/L}$. Wavelets provide representations of functions in $L^2(\mathbb{R})$ as
$$f(x) = \sum_{j,k \in \mathbb{Z}} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(x),$$
where the basis functions are obtained as dilations and shifts of a wavelet function $\psi$: $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$. Fourier series are not localized; that is, the basis functions are uniform waves, since $e^{2\pi i n x/L} = \cos(2\pi n x/L) + i\sin(2\pi n x/L)$. By contrast,
FIGURE 2. Illustration of a possible tiling of the time-frequency plane. The Heisenberg–Gabor uncertainty principle dictates that each rectangle must have an area greater than or equal to $1/(4\pi)$.
the wavelet $\psi$ is typically a function with fast decay in both $\mathbb{R}$ (the time domain) and $\hat{\mathbb{R}}$ (the frequency domain). Figure 2 illustrates an example of the time-frequency tiling of a wavelet transform. Formally, a wavelet is defined in terms of the following admissibility condition.

Definition II.1. A real-valued function $\psi \in L^2(\mathbb{R})$ is called a wavelet if
$$\int_{\mathbb{R}} \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\, d\omega < \infty, \quad\text{where } \hat{\psi}(\omega) = \int_{\mathbb{R}} \psi(x)\, e^{-2\pi i \omega x}\, dx.$$

A. Discrete Implementation
An important orthonormal system in wavelet analysis is known as the Haar system (Haar, 1910). This system is one of the simplest and earliest examples of an orthonormal wavelet basis (Walnut, 2002). Given a set $X$ and $A \subset X$, the characteristic function of $A$ is the function $1_A$ on $X$ defined by $1_A(x) = 1$ if $x \in A$ and $1_A(x) = 0$ if $x \notin A$.

Definition II.2. Let $p(t) = 1_{[0,1)}(t)$ and $h(t) = 1_{[0,1/2)}(t) - 1_{[1/2,1)}(t)$. Define
$$p_{j,k}(t) = 2^{j/2}\, p(2^j t - k) \quad\text{and}\quad h_{j,k}(t) = 2^{j/2}\, h(2^j t - k)$$
for each $j, k \in \mathbb{Z}$. The collection $\{h_{j,k}(t)\}$ is a complete orthonormal system on $\mathbb{R}$ called the Haar system.

Thus, given a nonnegative integer $J$, we may describe a function $f$ defined on $[0, 1]$ by its coefficients in the Haar basis representation as
$$f(t) = \sum_{j=J}^{\infty}\sum_{k=0}^{2^j - 1} \langle f, h_{j,k} \rangle\, h_{j,k}(t) + \sum_{k=0}^{2^J - 1} \langle f, p_{J,k} \rangle\, p_{J,k}(t)$$
in $L^2([0, 1])$. In this decomposition, $J$ is the number of resolutions used, the $\langle f, h_{j,k} \rangle$ are the detail (wavelet) coefficients at scale $j$, and the $\langle f, p_{J,k} \rangle$ are the coefficients of a smooth, or coarse, version of the original signal $f$. In general, a scaling function $\phi$ and an orthogonal wavelet function $\psi$ satisfy the relations
$$\phi_{j,k}(t) = \sum_n h(n)\, \phi_{j+1,n+2k}(t),$$
$$\psi_{j,k}(t) = \sum_n g(n)\, \phi_{j+1,n+2k}(t),$$
for some filter $h$, where $g(k) = (-1)^k h(1 - k)$. In the case of the Haar system, $\phi(t) = p(t)$ and $\psi(t) = h(t)$. Since
$$p_{j,k}(t) = \tfrac{1}{\sqrt{2}}\big(p_{j+1,2k}(t) + p_{j+1,2k+1}(t)\big), \qquad h_{j,k}(t) = \tfrac{1}{\sqrt{2}}\big(p_{j+1,2k}(t) - p_{j+1,2k+1}(t)\big),$$
we deduce that the filters are
$$h(n) = \begin{cases} 1/\sqrt{2}, & n = 0, 1,\\ 0, & \text{otherwise,} \end{cases} \qquad g(n) = \begin{cases} 1/\sqrt{2}, & n = 0,\\ -1/\sqrt{2}, & n = 1,\\ 0, & \text{otherwise.} \end{cases}$$
The scaling filter $h(n)$ and the wavelet filter $g(n)$ satisfy the following properties:
• $\sum_n h(n) = \sqrt{2}$
• $\sum_n g(n) = 0$
• $\sum_k h(k)h(k - 2n) = \sum_k g(k)g(k - 2n) = \delta(n)$
• $\sum_k g(k)h(k - 2n) = 0$ for all $n \in \mathbb{Z}$
• $\sum_k h(m - 2k)h(n - 2k) + \sum_k g(m - 2k)g(n - 2k) = \delta(n - m)$.
Assuming $c_0 = f$ is considered as a vector of length $N \in \mathbb{N}$, the wavelet decomposition can be computed recursively as
$$c_{j+1} = \downarrow(c_j * \bar{h}), \qquad d_{j+1} = \downarrow(c_j * \bar{g}),$$
where $\bar{h}(\ell) = h(-\ell)$, $\bar{g}(\ell) = g(-\ell)$, and $\downarrow$ denotes downsampling, so that only even-numbered indices are kept. The convolution is usually computed assuming either a reflexive boundary condition ($c_{j,k+N} = c_{j,N-k}$) or a periodic boundary condition ($c_{j,k+N} = c_{j,k}$). The upsampling operator $\uparrow$ inserts a zero between adjacent entries of $c(n)$. Using this notation, $(H^* c)(n) = (\uparrow c) * h(n)$ and $(G^* c)(n) = (\uparrow c) * g(n)$. Let $d_j(k) = d_{N-j,k}$ and $c_j(k) = c_{N-j,k}$. The inverse transform is computed by the formula $c_j(k) = (H^* c_{j+1})(k) + (G^* d_{j+1})(k)$, or
$$c_j = h * \uparrow(c_{j+1}) + g * \uparrow(d_{j+1}).$$
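The recursion above can be sketched for the Haar filters with the periodic boundary condition. The analysis step convolves with the time-reversed filters and keeps even indices. The synthesis step upsamples and convolves with $h$ and $g$. Storing filters as dictionaries of nonzero taps and the helper names are our own choices. The sketch also checks the first listed filter properties and confirms perfect reconstruction on a short signal.

```python
import math

S = 1 / math.sqrt(2)
h = {0: S, 1: S}      # Haar scaling filter h(n); taps not listed are 0
g = {0: S, 1: -S}     # Haar wavelet filter g(n) = (-1)^n h(1 - n)


def filter_properties_hold(tol=1e-12):
    """Check the tap sums and the double-shift orthogonality relations listed above."""
    tap = lambda filt, n: filt.get(n, 0.0)
    sums_ok = abs(sum(h.values()) - math.sqrt(2)) < tol and abs(sum(g.values())) < tol
    ortho_ok = all(
        abs(sum(tap(h, k) * tap(h, k - 2 * n) for k in range(-4, 6)) - (n == 0)) < tol
        and abs(sum(tap(g, k) * tap(g, k - 2 * n) for k in range(-4, 6)) - (n == 0)) < tol
        and abs(sum(tap(g, k) * tap(h, k - 2 * n) for k in range(-4, 6))) < tol
        for n in range(-2, 3))
    return sums_ok and ortho_ok


def analysis_step(c):
    """c_{j+1} = down(c_j * h_bar), d_{j+1} = down(c_j * g_bar), periodic boundary.

    Since h_bar(l) = h(-l), (c * h_bar)(n) = sum_k h(k) c(n + k); down keeps even n.
    """
    N = len(c)
    approx = [sum(hk * c[(n + k) % N] for k, hk in h.items()) for n in range(0, N, 2)]
    detail = [sum(gk * c[(n + k) % N] for k, gk in g.items()) for n in range(0, N, 2)]
    return approx, detail


def synthesis_step(approx, detail):
    """c_j = h * up(c_{j+1}) + g * up(d_{j+1}), where up puts zeros at odd indices."""
    N = 2 * len(approx)
    up_a, up_d = [0.0] * N, [0.0] * N
    up_a[0::2], up_d[0::2] = approx, detail
    return [sum(h[k] * up_a[(n - k) % N] + g[k] * up_d[(n - k) % N] for k in h)
            for n in range(N)]


def dwt(signal, levels):
    """Return (c_J, [d_1, ..., d_J]) after `levels` analysis steps."""
    c, details = list(signal), []
    for _ in range(levels):
        c, d = analysis_step(c)
        details.append(d)
    return c, details


def idwt(c, details):
    for d in reversed(details):
        c = synthesis_step(c, d)
    return c


if __name__ == "__main__":
    x = [4, 6, 10, 12, 8, 6, 5, 5]
    cJ, ds = dwt(x, 3)
    print(cJ, ds)                 # cJ = [sum(x) / 2**1.5]
    print(idwt(cJ, ds))           # recovers x up to rounding
    print(filter_properties_hold())
```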
In the finite discrete setting, we have the following.

Definition II.3. Let $h$ be a scaling filter and let $g(k) = (-1)^k h(1 - k)$. For $N = 2^J$ with $J \in \mathbb{N}$, the discrete wavelet transform (DWT) of a signal $c_0 : \mathbb{Z}_N \to \mathbb{C}$ is the collection of sequences
$$\{d_j(k) : 1 \le j \le J;\ k \in \mathbb{Z}_N\} \cup \{c_J(k) : k \in \mathbb{Z}_N\},$$
where $c_{j+1} = \downarrow(c_j * h)$ and $d_{j+1} = \downarrow(c_j * g)$.
The inverse DWT is defined by the formula $c_j = h * \uparrow(c_{j+1}) + g * \uparrow(d_{j+1})$. Although this wavelet transform algorithm has led to many successful applications, such as compression, it is not very effective for other types of analysis. To improve the performance of the transform for data analysis, we implement the wavelet transform via the à trous algorithm. To describe this "with holes" algorithm, we define $h_j(\ell)$ to be equal to $h(\ell/2^j)$ if $\ell/2^j \in \mathbb{Z}$ and 0 otherwise, and define $g_j$ similarly. The coefficients are computed recursively as $c_{j+1} = c_j * h_j$ and $d_{j+1} = c_j * g_j$. The inverse is computed using a similar scheme. Another version of the wavelet transform that is powerful for denoising purposes is an extension of the orthogonal wavelets known as biorthogonal wavelets. Given a linearly independent set $\{x_i\}_{i=1}^m$, a collection of vectors $\{\tilde{x}_i\}_{i=1}^m$ is said to be biorthogonal to $\{x_i\}_{i=1}^m$ if $\langle \tilde{x}_i, x_j \rangle = \delta_{i,j}$, where $\delta$ denotes the Kronecker delta. The collection $\{\tilde{x}_i\}_{i=1}^m$ is known as a dual basis of $\{x_i\}_{i=1}^m$. Similarly, from a continuous perspective, given a linearly independent sequence of functions $\{f_n\}_{n \in \mathbb{N}}$ in $L^2(I)$ ($I \subset \mathbb{R}$ an interval), we say that a sequence of functions $\{\tilde{f}_n\}_{n \in \mathbb{N}}$ in $L^2(I)$ is biorthogonal to $\{f_n\}$ if $\langle f_n, \tilde{f}_m \rangle = \int_I f_n(x)\,\tilde{f}_m(x)\, dx = \delta(n - m)$, where $\delta(n - m)$ is again the Kronecker delta (equal to 1 if $n = m$ and to 0 otherwise).
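The à trous recursion can be sketched with the Haar filters and a periodic boundary. No downsampling is done, so every level keeps $N$ coefficients. The dilated filter $h_j$ is obtained by spreading the taps of $h$ by $2^j$. The text says only that the inverse uses "a similar scheme." The synthesis step below (correlating with $h_j$ and $g_j$ and halving) is therefore our own assumed choice. It is exact here because the Haar filters satisfy $|\hat{h}|^2 + |\hat{g}|^2 = 2$ at every frequency.

```python
import math

S = 1 / math.sqrt(2)
H = {0: S, 1: S}      # Haar scaling filter h
G = {0: S, 1: -S}     # Haar wavelet filter g


def with_holes(filt, j):
    """h_j(l) = h(l / 2^j) if 2^j divides l, and 0 otherwise (2^j - 1 zeros between taps)."""
    return {k * 2 ** j: v for k, v in filt.items()}


def atrous(signal, levels):
    """Undecimated transform: c_{j+1} = c_j * h_j, d_{j+1} = c_j * g_j (periodic boundary)."""
    N = len(signal)
    c, details = list(signal), []
    for j in range(levels):
        hj, gj = with_holes(H, j), with_holes(G, j)
        details.append([sum(v * c[(n - k) % N] for k, v in gj.items()) for n in range(N)])
        c = [sum(v * c[(n - k) % N] for k, v in hj.items()) for n in range(N)]
    return c, details


def inverse_atrous(c, details):
    """Assumed synthesis: c_j(n) = (1/2) sum_k [h_j(k) c_{j+1}(n + k) + g_j(k) d_{j+1}(n + k)]."""
    N = len(c)
    for j in reversed(range(len(details))):
        hj, gj, d = with_holes(H, j), with_holes(G, j), details[j]
        c = [0.5 * sum(hj[k] * c[(n + k) % N] + gj[k] * d[(n + k) % N] for k in hj)
             for n in range(N)]
    return c


if __name__ == "__main__":
    x = [4, 6, 10, 12, 8, 6, 5, 5]
    cJ, ds = atrous(x, 3)
    print([len(d) for d in ds])      # [8, 8, 8]: no downsampling at any level
    print(inverse_atrous(cJ, ds))    # recovers x up to rounding
```

Unlike the decimated DWT above, this transform is redundant (translation invariant). That redundancy is why it tends to behave better for analysis and denoising.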
Definition II.4. A sequence of functions $\{f_n\}$ in $L^2(I)$ ($I \subset \mathbb{R}$ an interval) is a Riesz basis if
1. $\{f_n\}$ is linearly independent, and
2. there exist positive constants $A$ and $B$ with $A \le B$ such that, for all continuous compactly supported functions $f$ on $I$,
$$A\|f\|_2^2 \le \sum_{n=1}^{\infty} |\langle f, f_n \rangle|^2 \le B\|f\|_2^2. \qquad (1)$$
A sequence $\{f_n\}$ in $L^2(I)$ that satisfies condition 2 is called a frame. The constants $A$ and $B$ are called the frame bounds. The representation provided by a frame is stably invertible. The following example, given in Walnut (2002), illustrates the need for stability.

Example II.1. Let $\{e_n\}_{n \in \mathbb{N}}$ be an orthonormal basis in $L^2([0, 1])$ and set $f_n = \frac{1}{n} e_n$. Then $\{f_n\}$ is linearly independent but is not a frame: if Eq. (1) held, then taking as test function $f = e_m$ ($m \in \mathbb{N}$), we would deduce that $A \le 1/m^2$ for every $m$, and hence $A = 0$. Since $f = \sum_{n=1}^{\infty} \langle f, e_n \rangle e_n$, we obtain
$$f = \sum_{n=1}^{\infty} \langle f, f_n \rangle\, n\, e_n.$$
If, at some index $m \in \mathbb{N}$, the value of the coefficient $\langle f, f_m \rangle$ has an added error $\varepsilon$ due to noise, then the estimate of $f$ provided by the reconstruction formula is
$$\tilde{f} = \sum_{n=1}^{\infty} \langle f, f_n \rangle\, n\, e_n + \varepsilon\, m\, e_m.$$
Thus, $\|f - \tilde{f}\|_2^2 = |\varepsilon|^2 m^2$, so the same small coefficient error can produce an arbitrarily large reconstruction error as $m$ grows. From a signal and image-processing perspective, frames provide a stable representation suitable for analysis and synthesis. In the finite discrete setting, the issue of stability can be addressed by considering a discrete version of frames.

Definition II.5. For positive integers $k$ and $m$, a discrete frame of $\mathbb{R}^k$ is a sequence $\{\varphi_n\}_{n=1}^m$ in $\mathbb{R}^k$ satisfying the following condition: there exist
positive constants $A$ and $B$ with $A \le B$ such that for all $x \in \mathbb{R}^k$
$$A\|x\|^2 \le \sum_{n=1}^{m} |\langle x, \varphi_n \rangle|^2 \le B\|x\|^2. \qquad (2)$$
A discrete frame is said to be tight if $A = B$. The ratio $B/A$ of the frame bounds measures the numerical stability of the reconstruction of $x$ from the coefficients $\langle x, \varphi_n \rangle$, since it is an upper bound for the condition number of the matrix operator associated with the frame. Indeed, if $\{\varphi_n\}_{n=1}^m$ is a discrete frame and $F$ is the $m \times k$ matrix whose $n$th row is $\varphi_n$, then Eq. (2) may be written as
$$A\, x^T x \le x^T F^T F x \le B\, x^T x. \qquad (3)$$
Hence the eigenvalues of $F^T F$ lie between the frame bounds, and the tightest bounds are $A = \lambda_{\min}$ and $B = \lambda_{\max}$, the smallest and the largest eigenvalues of $F^T F$, respectively. The condition number $\kappa$ of $F$ with respect to the $\ell^2$-norm satisfies
$$\kappa = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}} \le \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{B}{A}.$$
Thus, the stability of the representation in terms of a tight discrete frame ($\kappa = 1$) is the best possible.

To construct biorthogonal wavelets, the goal is to find scaling functions $\varphi$ and $\tilde{\varphi}$ such that
$$\varphi(x) = \sum_k h(k)\, 2^{1/2} \varphi(2x - k) \quad\text{and}\quad \tilde{\varphi}(x) = \sum_k \tilde{h}(k)\, 2^{1/2} \tilde{\varphi}(2x - k),$$
where $h$ and $\tilde{h}$ must satisfy conditions similar to those stated for the orthogonal wavelet filter. If the dual filters $g$ and $\tilde{g}$ are defined by $g(k) = (-1)^k \tilde{h}(1 - k)$ and $\tilde{g}(k) = (-1)^k h(1 - k)$, then $\{\psi_{j,k}\}$ and $\{\tilde{\psi}_{j,k}\}$ are Riesz bases on $\mathbb{R}$, where
$$\psi(x) = \sum_n g(n)\, 2^{1/2} \varphi(2x - n) \quad\text{and}\quad \tilde{\psi}(x) = \sum_n \tilde{g}(n)\, 2^{1/2} \tilde{\varphi}(2x - n).$$
This fact indicates the uniqueness and stability of the representation provided by biorthogonal wavelets. The algorithm for computing the biorthogonal wavelet transform is shown below.
Definition II.6. Let $h$ and $\tilde{h}$ be scaling filters, and let $N = 2^J$ for some $J \in \mathbb{N}$. Define $g(k) = (-1)^k \tilde{h}(1 - k)$ and $\tilde{g}(k) = (-1)^k h(1 - k)$. The biorthogonal discrete wavelet transform of a signal $c_0 : \mathbb{Z}_N \to \mathbb{C}$ is the collection of sequences
$$\{d_j(k) : 1 \le j \le J;\ k \in \mathbb{Z}_N\} \cup \{c_J(k) : k \in \mathbb{Z}_N\},$$
where $c_{j+1} = \downarrow(c_j * h)$ and $d_{j+1} = \downarrow(c_j * g)$.
The inverse transform is defined by the formula $c_j = \tilde{h} * \uparrow(c_{j+1}) + \tilde{g} * \uparrow(d_{j+1})$. In higher dimensions, the wavelet transform is computed by taking products of 1D wavelets. In dimension two, given a 1D scaling function $\phi$ and a wavelet function $\psi$, the three functions $\psi^1(x, y) = \phi(x)\psi(y)$, $\psi^2(x, y) = \psi(x)\phi(y)$, and $\psi^3(x, y) = \psi(x)\psi(y)$ generate a biorthogonal basis for $L^2(\mathbb{R}^2)$ by translation and dilation. An illustration of the 2D DWT is shown in Figure 3. An isotropic à trous wavelet transform algorithm can be implemented by obtaining the wavelet coefficients as the difference between resolutions: $w_{j+1,l,k} = c_{j,l,k} - c_{j+1,l,k}$. In this case, $c_{j+1,l,k} = (h_j h_j * c_j)_{l,k}$, where $h_j h_j$ denotes the separable 2D filter $h_j(l)\,h_j(k)$, and $c_{0,l,k} = c_{J,l,k} + \sum_{j=1}^{J} w_{j,l,k}$. An illustration of the 2D isotropic discrete wavelet transform implemented by an à trous algorithm is shown in Figure 4.

B. Nonlinear Approximation

Because they are well localized, wavelets are particularly effective in representing functions that have discontinuities. Indeed, if $f$ has a discontinuity at $x_0$, only a few wavelet coefficients $\langle f, \psi_{j,k} \rangle$ are needed to represent $f$ near $x_0$ accurately. This is not true for Fourier series. The nonlinear approximation of $f$ is defined by
$$\tilde{f}_N = \sum_{i \in I_N} \langle f, \psi_i \rangle\, \psi_i,$$
where $I_N$ is the set of the indices corresponding to the $N$ largest coefficients $|\langle f, \psi_i \rangle|$. We shall use the notation $\tilde{f}_N^{(F)}$ (respectively, $\tilde{f}_N^{(W)}$, $\tilde{f}_N^{(R)}$, $\tilde{f}_N^{(LR)}$,
FIGURE 3. The 2D discrete wavelet transform of the image of a house.
FIGURE 4. The coefficients of an isotropic à trous algorithm applied to an image. Clockwise, the images are from the first to the fourth decomposition level.
$\tilde{f}_N^{(C)}$, and $\tilde{f}_N^{(S)}$) when referring to the Fourier (respectively, wavelet, ridgelet, local ridgelet, curvelet, and shearlet) nonlinear approximation of $f$. To illustrate the effectiveness of a wavelet representation, consider the following example. Let $s > 0$ and let $f(t) = 1_{\{t > 0\}}\, h(t)$, where $h$ is a smooth function of compact support in the Sobolev space $W_2^s$, that is, $h$ satisfies the condition
$$\int_{-\infty}^{\infty} |\omega|^{2s}\, |\hat{h}(\omega)|^2\, d\omega < \infty.$$
The nonlinear approximation error for $f$ has the following decay rates (Candès, 1999b):
• Fourier approximation error: $\|f - \tilde{f}_N^{(F)}\|_2 \le CN^{-1/2}$, for $N$ large.
• Wavelet approximation error: $\|f - \tilde{f}_N^{(W)}\|_2 \le CN^{-s}$, for $N$ large.
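The contrast between the two rates already shows up on a sampled jump. In an orthonormal basis, the error of the $N$-term nonlinear approximation equals, by Parseval, the $\ell^2$ norm of the discarded coefficients, so no inverse transform is needed. The sketch below compares a full orthonormal Haar DWT with the unitary DFT on a 256-sample step. The signal, the jump position, and the numbers of terms are our own choices. Because the jump does not sit on a dyadic boundary, the Haar transform has at most one nonzero detail coefficient per level. Sixteen terms therefore already reproduce the step exactly, while the Fourier error remains well away from zero.

```python
import cmath
import math


def haar_coefficients(v):
    """All coefficients of the full orthonormal Haar DWT of v (len(v) a power of 2)."""
    c, coeffs = list(v), []
    while len(c) > 1:
        half = len(c) // 2
        coeffs += [(c[2 * k] - c[2 * k + 1]) / math.sqrt(2) for k in range(half)]  # details
        c = [(c[2 * k] + c[2 * k + 1]) / math.sqrt(2) for k in range(half)]        # approximations
    return coeffs + c


def unitary_dft(v):
    """DFT normalized by 1/sqrt(N), so that it is an orthonormal change of basis."""
    N = len(v)
    return [sum(v[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]


def nonlinear_error(coeffs, num_terms):
    """||f - f_N||_2 when f_N keeps the num_terms largest coefficients in an orthonormal basis.

    By Parseval, this equals the l2 norm of the discarded coefficients.
    """
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    return math.sqrt(sum(energies[num_terms:]))


if __name__ == "__main__":
    step = [1.0 if n >= 77 else 0.0 for n in range(256)]   # a jump between samples 76 and 77
    for terms in (8, 16, 32):
        print(terms, nonlinear_error(haar_coefficients(step), terms),
              nonlinear_error(unitary_dft(step), terms))
```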
It is known that wavelets provide the optimal approximation error rate in dimension one for signals that are smooth away from isolated discontinuities. An example is provided in Figure 5. In dimensions greater than one, however, wavelets are generally not optimal. Indeed, there are other kinds of discontinuities, such as discontinuities along curves and surfaces. Since wavelets interact extensively with such distributed discontinuities, many coefficients are needed to represent these features accurately. Figure 3 provides a clear example of this interaction: many diagonal elements are shared among the three wavelet regions. In an optimal representation, such commonality among regions does not occur. Let $f$ be a $C^2$ function away from a $C^2$ edge, that is, the sum of a $C^2$ function and an indicator function of a set whose boundary is $C^2$. An illustration is shown in Figure 6. Theoretically, the nonlinear approximation error for this type of function has the following decay rates (Candès and Donoho, 2002):
• Fourier approximation error: $\|f - \tilde{f}_N^{(F)}\|_2^2 \le CN^{-1/2}$, for $N$ large.
• Wavelet approximation error: $\|f - \tilde{f}_N^{(W)}\|_2^2 \le CN^{-1}$, for $N$ large.
FIGURE 5. Top: the given signal. Middle: the nonlinear Fourier basis approximation consisting of 60 terms; the relative error is 0.1553. Bottom: the nonlinear wavelet basis approximation consisting of 60 terms; the relative error is 0.0200.
FIGURE 6. An example of an $f \in C^2(\mathbb{R}^2)$ away from a $C^2$ edge.
• Optimal approximation error: $\|f - \tilde{f}_N\|_2^2 \le CN^{-2}$, for $N$ large.
The optimal approximation rate can be obtained by adaptive triangulations (Donoho, 2001). To accomplish this, one considers a dictionary of indicator functions of triangles with arbitrary shapes and locations. A natural question that arises is: Is it possible to obtain such a rate with a new kind of basis that is nonadaptive? The answer is affirmative, as will be seen in the next section.
III. BEYOND WAVELETS

To motivate the origins of a ridgelet frame, let us consider the following simple model. Given constants $\theta_0$ and $\tau$, assume the image $f$ is a smooth function except for a linear edge, described by
$$f(x, y) = 1_{\{x\cos\theta_0 + y\sin\theta_0 - \tau\}}\, h(x, y) \quad\text{for } (x, y) \in [0, 1]^2,$$
with $h \in W_2^s$, $s$ being the degree of smoothness. Then we have the following approximation rate, independent of $s$ (Candès, 1999b):
• Wavelet approximation error: $\|f - \tilde{f}_N^{(W)}\|_2 \le CN^{-1/2}$, for $N$ large.
To improve the approximation error rate, more geometrically oriented transforms are needed. One such transform is the ridgelet transform.

A. Ridgelets

The ridgelet transform of an image $f$ can be simply stated as the result of applying the wavelet transform to the Radon transform of $f$.

Definition III.1. Let $\psi$ be a 1D wavelet, that is, a function on $\mathbb{R}$ satisfying the admissibility condition $\int \frac{|\hat{\psi}(\xi)|^2}{|\xi|^2}\, d\xi < \infty$. The ridgelet coefficients of $f$ are defined by
$$\mathcal{R}_f(a, b, \theta) = \int \mathcal{R}f(\theta, t)\, a^{-1/2}\, \psi\big((t - b)/a\big)\, dt.$$
A ridgelet may be viewed as
$$\psi_{a,b,\theta}(x, y) = a^{-1/2}\, \psi\big((x\cos\theta + y\sin\theta - b)/a\big).$$
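Definition III.1 applies a wavelet transform along the Radon variable $t$. In a fully discrete setting, a simple sketch of the same idea runs a 1D wavelet transform along every discrete Radon projection. This is in the spirit of the finite ridgelet transform mentioned in the introduction, but not identical to it. The sketch below makes two assumptions: the dyadic discrete Radon transform of Definition I.3, and an orthonormal Haar DWT. The function names are hypothetical, and this is not the generalized transform developed later in the chapter. A discrete line concentrates into a single spike in one projection. A constant image produces no detail coefficients at all.

```python
import math


def dyadic_drt(f):
    """Dyadic DRT (Definition I.3): the N + N/2 projections of an N x N array, N = 2**n."""
    N = len(f)
    first = [[sum(f[x][(l + m * x) % N] for x in range(N)) for l in range(N)]
             for m in range(N)]
    second = [[sum(f[(l + 2 * s * y) % N][y] for y in range(N)) for l in range(N)]
              for s in range(N // 2)]
    return first + second


def haar_dwt(v):
    """Full orthonormal Haar DWT, returned coarse to fine: [c_J] + d_J + ... + d_1."""
    c, details = list(v), []
    while len(c) > 1:
        half = len(c) // 2
        details = [(c[2 * k] - c[2 * k + 1]) / math.sqrt(2) for k in range(half)] + details
        c = [(c[2 * k] + c[2 * k + 1]) / math.sqrt(2) for k in range(half)]
    return c + details


def discrete_ridgelet(f):
    """Sketch of a discrete ridgelet transform: a 1D wavelet transform of every projection."""
    return [haar_dwt(projection) for projection in dyadic_drt(f)]


if __name__ == "__main__":
    N = 8
    line = [[0] * N for _ in range(N)]
    for x in range(N):
        line[x][(1 + 3 * x) % N] = 1          # the discrete line y = [1 + 3x]_8
    print(dyadic_drt(line)[3])                 # [0, 8, 0, 0, 0, 0, 0, 0]: one spike at l = 1
    print(len(discrete_ridgelet(line)))        # 12 = N + N/2 coefficient vectors
```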
FIGURE 7. Illustration of a ridgelet $\psi_{a,b,\theta}(x, y)$.
Figure 7 is an illustration of such a function. Fixing a value $\theta_0$, the parameters of this transform can be discretized as follows:
$$\psi_{j,l,k}(x, y) = 2^{j/2}\, \psi\big(2^j(x\cos\theta_{j,l} + y\sin\theta_{j,l}) - k\big),$$
where $\theta_{j,l} = 2\pi\theta_0\, l\, 2^{-j}$, $j, k \in \mathbb{Z}$, $l = 0, \ldots, 2^{i-1} - 1$, $i \ge i_0$ (Candès, 1999a). To explain the usefulness of the ridgelet transform for estimation, we note that the Radon transform changes the 2D problem of representing a 1D singularity (an edge) into several 1D problems of representing point singularities. In practice, applying the Radon transform $\mathcal{R}$ to an image $f$ increases its smoothness: roughly, $\mathcal{R}f$ has one-half derivative more smoothness than $f$. The following example illustrates what is meant by an increase in smoothness.

Example III.1. Let $f$ be the characteristic function of the disk of radius $R$ centered at the origin. Then the Radon transform of $f$ is
$$\mathcal{R}f(\theta, t) = \begin{cases} 2\sqrt{R^2 - t^2} & \text{if } |t| \le R,\\ 0 & \text{otherwise.} \end{cases}$$
For a fixed $\theta$, $\mathcal{R}f(\theta, t)$ is a continuous function of $t$ that fails to be smooth only at the two points $t = \pm R$. This is in contrast to a direct slice through $f$, which would yield the characteristic function $1_{\{|t| \le R\}}$. Thus, for a fixed $\theta$, $\mathcal{R}f(\theta, t)$ can be represented more efficiently in a wavelet basis than $1_{\{|t| \le R\}}$. Figure 8 shows how these slices can be approximated in a Fourier and a wavelet-based representation. Furthermore, the wavelet-based approximation of the continuous slice is better than the wavelet-based approximation of the (discontinuous) characteristic function.

Theorem III.1 (see Candès, 1999b). Given $h \in W_2^s(\mathbb{R}^2)$ and constants $\theta_0$ and $\tau$, let $f(x, y) = 1_{\{x\cos\theta_0 + y\sin\theta_0 - \tau\}}\, h(x, y)$. Then the nonlinear
FIGURE 8. (a) Plots of the given signals. (b) Plots of the nonlinear Fourier basis approximations consisting of 24 terms; their relative errors are 0.0959 and 0.0344. (c) Plots of the nonlinear wavelet basis approximations consisting of 24 terms; their relative errors are 0.0530 and 0.0098.
orthonormal ridgelet approximation error has decay rate
$$\|f - \tilde{f}_N^{(R)}\|_2 \le CN^{-s/2}\, \|h\|_{W_2^s}.$$
The rate of approximation in Theorem III.1 is the same as for a function without a singularity. When dealing with images that have more than just straight edges, an effective technique is to apply the ridgelet transform locally and at multiple scales.

B. Local Multiscale Ridgelets

To define the local multiscale ridgelet transform, partition the unit square into dyadic squares of side length $1/2^s$ for some scale parameter $s > 0$. Specifically, for integers $k_1, k_2$ between 0 and $2^s - 1$, the dyadic square at scale $s$ parametrized by $k_1$ and $k_2$ is defined by $[k_1/2^s, (k_1 + 1)/2^s) \times [k_2/2^s, (k_2 + 1)/2^s)$. Denote by $\mathcal{Q}_s$ the collection of all dyadic squares at scale $s$. For each dyadic square $Q$, choose a window function $w_Q$ such that
$$\sum_{Q \in \mathcal{Q}_s} w_Q^2 = 1.$$
Define the rescaling operator $T_Q$ by
$$T_Q(f)(x, y) = 2^s f\big(2^s x - k_1,\ 2^s y - k_2\big).$$
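Windows satisfying $\sum_Q w_Q^2 = 1$ are easy to build separably. Between two neighboring interval centers, one window follows $\cos$ and the other $\sin$ of the same angle. The sketch below treats the unit interval as periodic and uses merely continuous (not smooth) cosine windows of support length $2/2^s$. It assumes $s \ge 1$. All of these are our own choices for illustration, not the windows used by Stark, Candès and Donoho. The sketch also writes $T_Q$ as a closure: if $f$ lives on the unit square, then $T_Q(f)$ lives on $Q$.

```python
import math


def window_1d(x, k, K):
    """Continuous window for the k-th interval [k/K, (k+1)/K) of the periodic unit interval.

    It is centered at the interval midpoint, has support of length 2/K (overlapping
    its two neighbors), and the squares of all K windows sum to 1 (assumes K >= 2).
    """
    center = (k + 0.5) / K
    d = abs((x - center + 0.5) % 1.0 - 0.5)          # circular distance from x to the center
    return math.cos(0.5 * math.pi * K * d) if d < 1.0 / K else 0.0


def window(x, y, k1, k2, s):
    """Separable window w_Q for the dyadic square Q at scale s indexed by (k1, k2)."""
    K = 2 ** s
    return window_1d(x, k1, K) * window_1d(y, k2, K)


def rescale(f, k1, k2, s):
    """T_Q(f)(x, y) = 2^s f(2^s x - k1, 2^s y - k2); the factor 2^s preserves the L2 norm."""
    return lambda x, y: 2 ** s * f(2 ** s * x - k1, 2 ** s * y - k2)


if __name__ == "__main__":
    s = 2
    K = 2 ** s
    for (x, y) in [(0.0, 0.3), (0.37, 0.9), (0.5, 0.5)]:
        total = sum(window(x, y, k1, k2, s) ** 2 for k1 in range(K) for k2 in range(K))
        print(x, y, total)                            # total is 1 up to rounding
```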
Definition III.2. The local multiscale ridgelets are defined by $\psi_{Q,j,l,k} = w_Q\, T_Q(\psi_{j,l,k})$, for $j, k \in \mathbb{Z}$, $l = 0, \ldots, 2^{i-1} - 1$, $i \ge i_0$, and $Q \in \mathcal{Q}_s$. This is just a formal way of saying that, for a given scale parameter $s$, the image space is divided into dyadic squares and a windowed, rescaled ridgelet transform is applied to each square. It can be shown that for a special class of images (see Candès, 1999a), thresholding the expansion of the function in a local ridgelet dictionary gives approximation bounds as if there were no singularity.

Theorem III.2 (see Candès, 1999a). Let $h$ be a function in $C^2$ with support in the unit disk $\{x^2 + y^2 \le 1\}$ and let $f(x, y) = 1_{\{(x - R)^2 + y^2 \le R^2\}}\, h(x, y)$. Then the nonlinear local ridgelet approximation error has decay rate $\|f - \tilde{f}_N^{(LR)}\|_2^2 \le CN^{-2}$, for some $C, R \in \mathbb{R}^+$ and $R$ N.
C. Curvelets and Shearlets

To gain even better estimation rates for the class of images that are regarded as composed of $C^2$ functions with a $C^2$ edge, we need the concept of a curvelet transform. The original curvelet transform is defined below (Starck, Candès and Donoho, 2002).

Definition III.3. Given a function $f$, a low-pass filter $P_0$, and, for each $s = 1, \ldots, n$, a passband filter $\Delta_s$ concentrated near the frequencies $[2^{2s}, 2^{2s+2}]$, the curvelet transform is computed as follows:
• $f$ is filtered into subbands: $f \mapsto (P_0 f, \Delta_1 f, \Delta_2 f, \ldots, \Delta_n f)$.
• Each subband is smoothly windowed into dyadic squares $Q \in \mathcal{Q}_s$: $\Delta_s f \mapsto (w_Q \Delta_s f)_{Q \in \mathcal{Q}_s}$.
• Each windowed square is renormalized to unit scale: $h_Q = (T_Q)^{-1}(w_Q \Delta_s f)$.
• Each renormalized square is analyzed in an orthonormal ridgelet system: $\langle h_Q, \psi_{j,l,k} \rangle$.

We can thus regard the curvelet coefficients as the inner products $\langle f, \Delta_s \psi_{Q,j,l,k} \rangle$. The nonnegligible curvelet frame elements are supported in regions whose length $L_s$ is approximately $2^{-s}$ and whose width $W_s$ is approximately $2^{-2s}$. Thus, the frame elements obey the scaling relation $W_s \approx L_s^2$. Furthermore, the number of orientations at a given scale grows as the scale becomes finer (roughly in proportion to $1/L_s \approx 2^s$). As a consequence, curvelets yield an almost optimal approximation decay rate.

After the curvelet transform was first proposed, an alternative definition and implementation appeared that derives the desired spatial-frequency tiling by manipulating the data directly (Candès et al., 2005). This spatial-frequency tiling is shown in Figure 9a. The shaded regions are an example of the frequency support of a particular frame element. Each of the angular segments shown obeys the scaling relation $W_s \approx L_s^2$. By contrast, Figure 9b displays the spatial-frequency tiling in the finite discrete setting for images that do not have circular boundaries.
To formalize what we mean by $C^2$ functions with a $C^2$ edge, we first need to define the class $\mathrm{STAR}^2(A)$ of characteristic functions of sets $B$ with $C^2$ boundaries $\partial B$ satisfying constraints that depend on the parameter $A$. In polar coordinates, let $\rho(\theta): [0, 2\pi) \to [0, 1]$ be a radius function and define $B$ as the set of all $x$ such that $|x| \le \rho(\theta)$, whose boundary $\partial B$ is the curve $\beta(\theta) = (\rho(\theta)\cos\theta, \rho(\theta)\sin\theta)$. The function $\rho$ satisfies the conditions
$$\sup_\theta |\rho''(\theta)| \le A, \qquad \rho \le \rho_0,$$
for some constant $\rho_0 < 1$. A set $B \subset [0, 1]^2$ is in $\mathrm{STAR}^2(A)$ if $B$ is a translate of a set obeying the above constraints. We now define the set $\mathcal{E}^2(A)$ of functions that are $C^2$ away from a $C^2$ edge as the collection of functions of the form $f = f_0 + f_1 1_B$, where $f_0, f_1 \in C_0^2([0, 1]^2)$, $B \in \mathrm{STAR}^2(A)$, and
$$\|f_i\|_{C^2} = \sum_{|\alpha| \le 2} \|D^\alpha f_i\|_\infty \le 1, \qquad i = 0, 1.$$

FIGURE 9. Illustration of the spatial-frequency tiling of (a) the curvelet transform and (b) its discrete version.
The following result describes the approximation rate of curvelets.

Theorem III.3. (See Candès and Donoho, 2002.) Given $f \in \mathcal{E}^2(A)$, the nonlinear curvelet approximation error has decay rate
$$\|f - \tilde f_N^{(C)}\|_2^2 \le C N^{-2} (\log N)^3.$$

The rate of approximation is nearly optimal.

We also draw attention to another geometrically oriented transform, known as the shearlet transform, whose spatial-frequency tiling is similar to that of the curvelet transform, as shown in Figure 10. A key difference from the curvelet transform is that the spatial-frequency tiling of the discrete shearlet transform is the same as that of its continuous definition.

FIGURE 10. (a) The tiling of the frequency plane $\hat{\mathbb{R}}^2$ induced by the shearlets. (b) The frequency support of a shearlet $\psi_{j,\ell,k}$ satisfies parabolic scaling. The figure shows only the support for $\omega_1 > 0$; the other half of the support, for $\omega_1 < 0$, is symmetrical.

Definition III.4. Consider the 2D affine system
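$$\Big\{ \psi_{ast}(x) = |\det M_{as}|^{-1/2}\, \psi\big(M_{as}^{-1}(x - t)\big) : t \in \mathbb{R}^2 \Big\},$$
where
$$M_{as} = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & 0 \\ 0 & \sqrt{a} \end{pmatrix}$$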
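is the product of a shearing matrix and an anisotropic dilation matrix, for $(a, s) \in \mathbb{R}^+ \times \mathbb{R}$. The generating function $\psi$ is such that
$$\hat\psi(\xi) = \hat\psi(\xi_1, \xi_2) = \hat\psi_1(\xi_1)\, \hat\psi_2\Big(\frac{\xi_2}{\xi_1}\Big),$$
where $\psi_1$ is a continuous wavelet for which $\hat\psi_1 \in C^\infty(\mathbb{R})$ with $\operatorname{supp} \hat\psi_1 \subset [-2, -1/2] \cup [1/2, 2]$, and $\psi_2$ is chosen so that $\|\psi_2\| = 1$, $\hat\psi_2 \in C^\infty(\mathbb{R})$, $\operatorname{supp} \hat\psi_2 \subset [-1, 1]$, and $\hat\psi_2 > 0$ on $(-1, 1)$. Then any function $f \in L^2(\mathbb{R}^2)$ has the representation
$$f(x) = \int_{\mathbb{R}^2} \int_{-\infty}^{\infty} \int_0^{\infty} \langle f, \psi_{ast} \rangle\, \psi_{ast}(x)\, \frac{da}{a^3}\, ds\, dt$$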
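for $a \in \mathbb{R}^+$, $s \in \mathbb{R}$, and $t \in \mathbb{R}^2$. The operator $\mathcal{S}$ defined by $\mathcal{S}f(a, s, t) = \langle f, \psi_{ast} \rangle$ is called the continuous shearlet transform of $f \in L^2(\mathbb{R}^2)$. It depends on the scale $a$, the shear $s$, and the location $t$. The collection of discrete shearlets is defined by
$$\Big\{ \psi_{j,\ell,k} = |\det A|^{j/2}\, \psi\big(B^\ell A^j x - k\big) : j, \ell \in \mathbb{Z},\ k \in \mathbb{Z}^2 \Big\},$$
where
$$B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad A = \begin{pmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{pmatrix}.$$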
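As a small check of the parabolic scaling built into these matrices, the sketch below (NumPy, with hypothetical helper names) forms $B^\ell A^j = \begin{pmatrix} 2^j & \ell\, 2^{j/2} \\ 0 & 2^{j/2} \end{pmatrix}$, whose first diagonal entry is the square of the second, together with the normalization $|\det A|^{j/2} = 2^{3j/4}$. The Gaussian passed as `psi` in the usage line is only a placeholder callable; it does not satisfy the support conditions required of a shearlet generator.

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, np.sqrt(2.0)]])   # anisotropic dilation
B = np.array([[1.0, 1.0], [0.0, 1.0]])            # shear


def shearlet_matrix(j, l):
    """Return M = B^l A^j and the normalization |det A|^{j/2} used in psi_{j,l,k}."""
    M = np.linalg.matrix_power(B, l) @ np.linalg.matrix_power(A, j)
    return M, abs(np.linalg.det(A)) ** (j / 2)


def shearlet(psi, j, l, k):
    # psi_{j,l,k}(x) = |det A|^{j/2} psi(B^l A^j x - k), for any callable psi on R^2
    M, c = shearlet_matrix(j, l)
    return lambda x: c * psi(M @ np.asarray(x, dtype=float) - np.asarray(k, dtype=float))


M, c = shearlet_matrix(4, 3)
print(M, c)   # [[16, 12], [0, 4]] and 8 (up to rounding)
g = shearlet(lambda y: np.exp(-y @ y), 4, 3, (1, 0))   # placeholder generator
```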
The shearlet construction can be viewed as a natural extension of wavelets to two dimensions, and it yields the same nearly optimal approximation error rate as curvelets. For more information, see Easley, Labate and Lim (2006).

Theorem III.4. (See Guo and Labate, 2007.) Given $f \in \mathcal{E}^2(A)$, the nonlinear shearlet approximation error has decay rate
$$\|f - \tilde f_N^{(S)}\|_2^2 \le C N^{-2} (\log N)^3.$$

Finally, we mention that this spatial-frequency tiling bears a direct relation to the discrete Radon transform described in Section V. Indeed, the implementation of this spatial-frequency tiling was developed using this discrete Radon transform (Easley, Labate and Lim, 2006 and in press).
IV. THE DISCRETE p-ADIC RADON TRANSFORM

This section develops a Fourier slice theorem for the discrete p-adic Radon transform and derives an inversion formula. We then obtain a matrix representation in terms of the DFT that allows us to show that the reconstruction is numerically stable. We assume $N = p^n$, where $p$ is a prime and $n$ is a positive integer.

Definition IV.1. The discrete p-adic Radon transform over $\ell^2(\mathbb{Z}_N^2)$ is defined as follows. For $f \in \ell^2(\mathbb{Z}_N^2)$, $l, m \in \mathbb{Z}_N$, and $s \in \mathbb{Z}_{N/p}$, let
$$Rf_m^1(l) = \sum_{x=0}^{N-1} f\big(x, [l + mx]_N\big) \quad \text{and} \quad Rf_s^2(l) = \sum_{y=0}^{N-1} f\big([l + psy]_N, y\big),$$
where $[k]_N$ is the least nonnegative residue of $k$ modulo $N$.
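A direct NumPy transcription of Definition IV.1 may help fix the index conventions. In this sketch the image is stored as an array `f[x, y]`, and the function name `padic_radon` is our own. For each fixed $m$ (or $s$), the $N$ lines indexed by $l$ partition the pixels, so every row of the output sums to the total mass of $f$.

```python
import numpy as np


def padic_radon(f, p):
    """Discrete p-adic Radon transform of an N x N array f (Definition IV.1).

    f[x, y] holds f(x, y); N must be a power of the prime p.
    Returns R1 with R1[m, l] = Rf^1_m(l), m in Z_N, and R2 with R2[s, l] = Rf^2_s(l), s in Z_{N/p}.
    """
    N = f.shape[0]
    idx = np.arange(N)
    R1 = np.empty((N, N), dtype=f.dtype)
    R2 = np.empty((N // p, N), dtype=f.dtype)
    for l in range(N):
        for m in range(N):
            # sum of f over the line {(x, [l + m x]_N) : x in Z_N}
            R1[m, l] = f[idx, (l + m * idx) % N].sum()
        for s in range(N // p):
            # sum of f over the line {([l + p s y]_N, y) : y in Z_N}
            R2[s, l] = f[(l + p * s * idx) % N, idx].sum()
    return R1, R2


rng = np.random.default_rng(0)
f = rng.standard_normal((9, 9))      # N = 9 = 3^2
R1, R2 = padic_radon(f, 3)
print(R1.shape, R2.shape)            # (9, 9) (3, 9)
```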
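Theorem IV.1 (discrete p-adic Fourier slice theorem). Let $f \in \ell^2(\mathbb{Z}_N^2)$ and let $\hat f$ denote the 2D Fourier transform of $f$. For $k \in \{0, \ldots, n\}$, $m, v, u \in \mathbb{Z}_N$, and $s \in \mathbb{Z}_{N/p}$, we have
$$\hat f\big(p^k [-mv]_{N/p^k},\, p^k v\big) = \sum_{l=0}^{N/p^k - 1} \sum_{j=0}^{p^k - 1} Rf_m^1\big(l + jN/p^k\big)\, e^{-2\pi i l v/(N/p^k)}, \tag{4}$$
$$\hat f\big(p^k u,\, p^k [-psu]_{N/p^k}\big) = \sum_{l=0}^{N/p^k - 1} \sum_{j=0}^{p^k - 1} Rf_s^2\big(l + jN/p^k\big)\, e^{-2\pi i l u/(N/p^k)}. \tag{5}$$

Proof. Use induction on $k$. For $k = 0$, we obtain
$$\widehat{Rf_m^1}[v] = \sum_{l=0}^{N-1} \sum_{x=0}^{N-1} f\big(x, [l + mx]_N\big)\, e^{-2\pi i l v/N} = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-2\pi i ([-mv]_N x + v y)/N} = \hat f\big([-mv]_N, v\big),$$
proving Eq. (4). The proof of Eq. (5) is analogous. Suppose that Eq. (4) holds for some nonnegative integer $k < n$. Since $p\,[-mv]_{N/p^{k+1}} = [-mpv]_{N/p^k}$, applying Eq. (4) with $v$ replaced by $pv$ gives
$$\hat f\big(p^{k+1} [-mv]_{N/p^{k+1}},\, p^{k+1} v\big) = \hat f\big(p^k [-mpv]_{N/p^k},\, p^k pv\big) = \sum_{l=0}^{N/p^k - 1} \sum_{j=0}^{p^k - 1} Rf_m^1\big(l + jN/p^k\big)\, e^{-2\pi i l v/(N/p^{k+1})}.$$
Next observe that
$$\Big\{ l + j\frac{N}{p^k} : 0 \le l \le \frac{N}{p^k} - 1,\ 0 \le j \le p^k - 1 \Big\} = \Big\{ l + j\frac{N}{p^{k+1}} : 0 \le l \le \frac{N}{p^{k+1}} - 1,\ 0 \le j \le p^{k+1} - 1 \Big\},$$
since both sets equal $\{0, \ldots, N - 1\}$. Moreover, because $jN/p^k$ is a multiple of $N/p^{k+1}$, the factor $e^{-2\pi i l v/(N/p^{k+1})}$ depends only on the residue of the index $l + jN/p^k$ modulo $N/p^{k+1}$. Hence,
$$\sum_{l=0}^{N/p^k - 1} \sum_{j=0}^{p^k - 1} Rf_m^1\big(l + jN/p^k\big)\, e^{-2\pi i l v/(N/p^{k+1})} = \sum_{l=0}^{N/p^{k+1} - 1} \sum_{j=0}^{p^{k+1} - 1} Rf_m^1\big(l + jN/p^{k+1}\big)\, e^{-2\pi i l v/(N/p^{k+1})}.$$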
The proof of Eq. (4) follows at once. We leave the proof of Eq. (5) to the reader.

The discrete p-adic Fourier slice theorem allows us to obtain an inversion formula for the discrete p-adic Radon transform.

Theorem IV.2. An inverse of the discrete p-adic Radon transform is given by
$$f(x, y) = \chi_0(x, y) - (p - 1) \sum_{k=1}^{n-1} \chi_k(x, y) - \frac{p}{N^2} \sum_{l=0}^{N-1} Rf_0^1(l), \tag{6}$$
where $\chi_k(x, y) = \chi_k^1(x, y) + \chi_k^2(x, y)$, with
$$\chi_k^1(x, y) = \frac{1}{p^k N} \sum_{m=0}^{N/p^k - 1} \sum_{j=0}^{p^k - 1} Rf_m^1\big([y - mx]_{N/p^k} + jN/p^k\big),$$
$$\chi_k^2(x, y) = \frac{1}{p^k N} \sum_{s=0}^{N/p^{k+1} - 1} \sum_{j=0}^{p^k - 1} Rf_s^2\big([x - psy]_{N/p^k} + jN/p^k\big).$$
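Equations (4) to (6) can be checked numerically on small examples. The sketch below is an illustration under our own conventions: images are stored as `f[x, y]`, and NumPy's `fft2` is used because it applies the kernel $e^{-2\pi i(ux + vy)/N}$ assumed in Theorem IV.1. The helper `fold` forms the inner sums $\sum_j Rf_m^1(l + jN/p^k)$ that appear in both theorems; all function names are hypothetical.

```python
import numpy as np


def padic_radon(f, p):
    # Definition IV.1 with f[x, y] = f(x, y): R1[m, l] = Rf^1_m(l), R2[s, l] = Rf^2_s(l)
    N = f.shape[0]
    i = np.arange(N)
    R1 = np.array([[f[i, (l + m * i) % N].sum() for l in range(N)] for m in range(N)])
    R2 = np.array([[f[(l + p * s * i) % N, i].sum() for l in range(N)]
                   for s in range(N // p)])
    return R1, R2


def fold(r, pk):
    # fold(r)[l] = sum_{j < p^k} r[l + j N/p^k], for l = 0, ..., N/p^k - 1
    return r.reshape(pk, -1).sum(axis=0)


def slice_theorem_holds(f, p, k):
    # Eq. (4): fhat(p^k [-mv]_{N/p^k}, p^k v) = sum_l fold(Rf^1_m)[l] e^{-2 pi i l v/(N/p^k)}
    N = f.shape[0]
    pk, M = p**k, N // p**k
    R1, _ = padic_radon(f, p)
    F = np.fft.fft2(f)                       # F[u, v] = sum f(x, y) e^{-2 pi i (ux + vy)/N}
    for m in range(N):
        rhs = np.fft.fft(fold(R1[m], pk))    # length-M DFT of the folded projection
        for v in range(N):
            if not np.isclose(F[pk * ((-m * v) % M), (pk * v) % N], rhs[v % M]):
                return False
    return True


def padic_inverse(R1, R2, p):
    # Eq. (6)
    N = R1.shape[1]
    n, M = 0, N
    while M > 1:                             # recover n from N = p^n
        M //= p
        n += 1
    X, Y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")

    def chi(k):
        pk, M = p**k, N // p**k
        out = np.zeros((N, N))
        for m in range(M):
            out += fold(R1[m], pk)[(Y - m * X) % M]
        for s in range(M // p):
            out += fold(R2[s], pk)[(X - p * s * Y) % M]
        return out / (pk * N)

    f = chi(0) - (p - 1) * sum(chi(k) for k in range(1, n))
    return f - p / N**2 * R1[0].sum()


rng = np.random.default_rng(1)
for p, n in [(2, 3), (3, 2), (5, 2)]:
    f = rng.standard_normal((p**n, p**n))
    R1, R2 = padic_radon(f, p)
    ok_slice = all(slice_theorem_holds(f, p, k) for k in range(n + 1))
    print(p**n, ok_slice, np.allclose(padic_inverse(R1, R2, p), f))   # expected: True True
```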
The proof, inspired by the proof of the inversion formula in Hsung, Lun and Siu (1996), can be found in Colonna and Easley (2005).

We now show that the discrete p-adic Radon transform can be viewed as a matrix multiplication. Let $R_1$ (respectively, $R_2$) be the $N \times N$ (respectively, $N \times N/p$) matrix whose $(l, m)$-entry (respectively, $(l, s)$-entry) is $Rf_m^1[l]$ (respectively, $Rf_s^2[l]$). Let $\mathbf{f}$ be the image $f$ represented as a column vector and let
$$\mathbf{r} = \begin{pmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{pmatrix},$$
where $\mathbf{r}_1$ and $\mathbf{r}_2$ are the column vectors obtained from $R_1$ and $R_2$ by stacking the columns. For $0 \le l \le N - 1$, define
$$L_{m,l} = \begin{cases} \{(x, [l + mx]_N) : x \in \mathbb{Z}_N\} & \text{if } 0 \le m \le N - 1, \\ \{([l + p(m - N)y]_N, y) : y \in \mathbb{Z}_N\} & \text{if } N \le m \le N + N/p - 1. \end{cases}$$

Definition IV.2. The p-adic Radon matrix relative to $N$ is the $(N^2 + N^2/p) \times N^2$ matrix $R$ whose entries are defined by
$$R_{(m,l)}[x, y] = \begin{cases} 1 & \text{if } (x, y) \in L_{m,l}, \\ 0 & \text{otherwise.} \end{cases}$$
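The matrix $R$ of Definition IV.2 is easy to assemble explicitly for small $N$. The following NumPy sketch (helper name hypothetical) orders rows as $mN + l$ and vectorizes images by stacking columns, consistent with $\mathbf{r}$ above. For $N = 9$, $p = 3$ it also reports the condition number, which can be compared with the value $\sqrt{(1 + 1/p)N} = \sqrt{12}$ given in Theorem IV.3.

```python
import numpy as np


def padic_radon_matrix(N, p):
    """(N^2 + N^2/p) x N^2 0/1 matrix of Definition IV.2.

    Row m*N + l is the indicator of the line L_{m,l}; column y*N + x is pixel (x, y),
    i.e. images are vectorized by stacking columns (order="F").
    """
    R = np.zeros((N * N + N * N // p, N * N))
    t = np.arange(N)
    for m in range(N + N // p):
        for l in range(N):
            if m < N:                          # L_{m,l} = {(x, [l + m x]_N)}
                xs, ys = t, (l + m * t) % N
            else:                              # L_{m,l} = {([l + p(m - N) y]_N, y)}
                xs, ys = (l + p * (m - N) * t) % N, t
            R[m * N + l, ys * N + xs] = 1.0
    return R


N, p = 9, 3
R = padic_radon_matrix(N, p)
f = np.random.default_rng(2).standard_normal((N, N))
r = R @ f.flatten(order="F")
x = np.arange(N)
# r[m*N + l] should equal Rf^1_m(l) = sum_x f(x, [l + m x]_N) for m < N
print(np.isclose(r[4 * N + 7], f[x, (7 + 4 * x) % N].sum()))   # True
print(np.linalg.cond(R), np.sqrt((1 + 1 / p) * N))             # both about 3.4641
```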
Observe that $\mathbf{r} = R\mathbf{f}$. We wish to obtain a description of the p-adic Radon matrix in terms of the DFT matrix of order $N$, that is, the matrix $W_N$ with entries $w_{k,l} = e^{-2\pi i kl/N}$. For this purpose we need the following definition.

Definition IV.3. The Kronecker product of an $m \times n$ matrix $A = (a_{i,j})$ and a matrix $B$ is defined by
$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}.$$

Definition IV.4. Let $S$ be the $(1 + 1/p)N^2 \times N^2$ matrix whose entries are 0 except for the entries 1 in position $(mN + v,\ vN + [-mv]_N)$ for $m, v \in \{0, \ldots, N - 1\}$, and in position $(sN + u,\ [-p(s - N)u]_N N + u)$ for $s = N, \ldots, N + N/p - 1$ and $u = 0, \ldots, N - 1$. We call $S$ a p-adic selection matrix.

Proposition IV.1. (a) The p-adic Radon matrix relative to $N$ can be represented as
$$R = \frac{1}{N}\big(I_{N+N/p} \otimes W_N^T\big)\, S\, (W_N \otimes W_N),$$
where $I_K$ is the identity matrix of order $K$ and $W_N$ is the DFT matrix of order $N$. (Here and below, the superscript $T$ applied to a complex matrix denotes the conjugate transpose.)
(b) $R^T R = N (W_N \otimes W_N)^{-1} S^T S (W_N \otimes W_N)$. In particular, $R^T R$ and $N S^T S$ have the same eigenvalues.

Proof. First observe that $(W_N \otimes W_N)\mathbf{f}$ is the 2D Fourier transform of $f$ in vector form. Thus, $S(W_N \otimes W_N)\mathbf{f}$ is the column vector whose entries are the values of $\hat f([-mv]_N, v)$ ($m, v \in \mathbb{Z}_N$) and $\hat f(u, [-psu]_N)$ ($u \in \mathbb{Z}_N$, $s \in \mathbb{Z}_{N/p}$). By the discrete Fourier slice theorem [see Eqs. (4) and (5) for the case $k = 0$], these values are $\widehat{Rf_m^1}$ and $\widehat{Rf_s^2}$ arranged in a single column. Recalling that $\mathbf{r} = R\mathbf{f}$, we deduce
$$\Big(\widehat{Rf_0^1}[0] \cdots \widehat{Rf_{N-1}^1}[N - 1]\ \ \widehat{Rf_0^2}[0] \cdots \widehat{Rf_{N/p-1}^2}[N - 1]\Big)^T = \big(I_{N+N/p} \otimes W_N\big) R\mathbf{f}.$$
Thus, $S(W_N \otimes W_N)\mathbf{f} = (I_{N+N/p} \otimes W_N) R\mathbf{f}$. Left multiplication by the inverse of $(I_{N+N/p} \otimes W_N)$, together with the property $W_N^{-1} = \frac{1}{N} W_N^T$, yields (a).
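Part (a) of Proposition IV.1, together with the eigenvalue statements of part (b) and Lemma IV.1, can be confirmed numerically for a small case. In this NumPy sketch, `np.fft.fft(np.eye(N))` supplies the DFT matrix with entries $e^{-2\pi i kl/N}$, the superscript $T$ on complex matrices is taken as the conjugate transpose (`.conj().T`), and the helper names are hypothetical.

```python
import numpy as np


def padic_radon_matrix(N, p):
    # Definition IV.2; rows m*N + l, columns y*N + x (column-stacked images)
    R = np.zeros((N * N + N * N // p, N * N))
    t = np.arange(N)
    for m in range(N + N // p):
        for l in range(N):
            xs, ys = (t, (l + m * t) % N) if m < N else ((l + p * (m - N) * t) % N, t)
            R[m * N + l, ys * N + xs] = 1.0
    return R


def padic_selection_matrix(N, p):
    # Definition IV.4: exactly one entry 1 in each row
    S = np.zeros((N * N + N * N // p, N * N))
    for m in range(N):
        for v in range(N):
            S[m * N + v, v * N + (-m * v) % N] = 1.0
    for s in range(N, N + N // p):
        for u in range(N):
            S[s * N + u, ((-p * (s - N) * u) % N) * N + u] = 1.0
    return S


N, p = 9, 3
W = np.fft.fft(np.eye(N))          # DFT matrix with entries exp(-2 pi i k l / N)
WW = np.kron(W, W)
S = padic_selection_matrix(N, p)
R = padic_radon_matrix(N, p)
# Proposition IV.1(a)
R_fact = np.kron(np.eye(N + N // p), W.conj().T) @ S @ WW / N
print(np.allclose(R_fact, R))                                        # True
# Proposition IV.1(b)
print(np.allclose(R.T @ R, N * np.linalg.inv(WW) @ S.T @ S @ WW))    # True
# Lemma IV.1: S^T S is diagonal with extreme entries (1 + 1/p) N = 12 and 1
d = np.diag(S.T @ S)
print(d.max(), d.min())
```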
By properties (7), (5), and (10) on page 23 in Davis (1979), we determine
$$\begin{aligned}
R^T R &= \frac{1}{N}(W_N \otimes W_N)^T S^T \big(I_{N+N/p} \otimes W_N^T\big)^T \times \frac{1}{N}\big(I_{N+N/p} \otimes W_N^T\big) S (W_N \otimes W_N) \\
&= \frac{1}{N}\big(W_N^T \otimes W_N^T\big) S^T \big(I_{N+N/p} \otimes W_N\big) \times \frac{1}{N}\big(I_{N+N/p} \otimes W_N^T\big) S (W_N \otimes W_N) \\
&= \frac{1}{N}\big(W_N^T \otimes W_N^T\big) S^T \big(I_{N+N/p} \otimes I_N\big) S (W_N \otimes W_N) \\
&= N (W_N \otimes W_N)^{-1} S^T S (W_N \otimes W_N).
\end{aligned}$$
Since $R^T R$ and $N S^T S$ are similar, they have the same eigenvalues.

Lemma IV.1. $S^T S$ is a diagonal matrix whose largest eigenvalue is $(1 + 1/p)N$ and whose smallest eigenvalue is 1.

Proof. Each row of $S$ has a unique entry 1 and all other entries are 0. We prove that $S$ has (maximum) rank $N^2$ by showing that each column contains at least one entry 1, that is, for any $k \in \{0, \ldots, N^2 - 1\}$ either there exist $m, v \in \mathbb{Z}_N$ such that $k = vN + [-mv]_N$ or there exist $u \in \mathbb{Z}_N$ and $s \in \{N, \ldots, N + N/p - 1\}$ such that $k = [-p(s - N)u]_N N + u$. By the division algorithm, there exists a unique pair of nonnegative integers $(q, r)$ such that $k = qN + r$ with $0 \le r \le N - 1$. Let $d = \gcd(q, N)$, the greatest common divisor of $q$ and $N$. If $d$ divides $r$, then, setting $v = q$, the congruence $-vx \equiv r \pmod N$ has at least one solution modulo $N$. Indeed, the number of incongruent solutions modulo $N$ is $d$. Setting $m$ equal to the least nonnegative residue of $x$ modulo $N$, we obtain $k = vN + [-mv]_N$. If $d$ does not divide $r$, then $d = p^j$ for some positive integer $j \le n$, and so the highest power of $p$ that divides $r$ is at most $p^{j-1}$. Set $u = r$ and observe that the congruence $-ux \equiv q/p \pmod{N/p}$ has at least one solution $x$ modulo $N/p$, since $\gcd(-u, N/p)$ divides $q/p$. Then $s = N + [x]_{N/p}$ satisfies the condition $k = [-p(s - N)u]_N N + u$. Thus, $S^T S$ is an invertible diagonal matrix, and each diagonal entry $d_k$ is the number of 1s in the $k$th column of $S$. The first column contains precisely $(1 + 1/p)N$ 1s, since its 1s occur exactly in the rows whose indices are multiples of $N$, and there are $N + N/p$ such rows.

Observe that no two 1s in any column other than the first column can be in positions $j_1$ and $j_2$ with $0 \le j_1 \le N^2 - 1$ and $N^2 \le j_2 \le N^2 + N^2/p - 1$. To prove this, suppose there exist $m, v, u \in \mathbb{Z}_N$ and $s \in \{N, \ldots, N + N/p - 1\}$ such that $vN + [-mv]_N = [-p(s - N)u]_N N + u$. Then $v = [-p(s - N)u]_N$ and, reducing modulo $N$, we obtain $u = [-mv]_N$. Thus $v = -p(s - N)(aN - mv) + bN$ for some integers $a, b$. Hence, $(pm(s - N) - 1)v$ is a multiple of $N$. Since $p$ does not divide $pm(s - N) - 1$, $v$ must be 0. This implies that $vN + [-mv]_N = 0$; that is, 1s in positions $mN + v$ and $sN + u$ of the same column can occur only in the first column. As argued above, repeated 1s within a column occur in rows whose indices differ by a multiple of $N$. Thus, the column with the greatest number of 1s must be the first column. Hence, the largest eigenvalue of $S^T S$ is $(1 + 1/p)N$.

Next we show that the last column of $S$ contains a single entry 1, which occurs in row $N^2 - 1$ (for $m = v = N - 1$). Indeed, for $m = v = N - 1$ the value of $vN + [-mv]_N$ is $N^2 - 1$. Furthermore, $N^2 - 1 = qN + r$ with $q = r = N - 1$, and so the greatest common divisor of $q$ and $N$ is 1. By the above reasoning, there is a unique solution $(m, v)$ with $0 \le m, v \le N - 1$ such that $vN + [-mv]_N = N^2 - 1$. Therefore, the smallest eigenvalue of $S^T S$ is 1.

Since the discrete p-adic Radon transform is an invertible operator, it can be viewed as a frame operator in $\ell^2(\mathbb{Z}_N^2)$ with frame given by the characteristic functions of the sets $L_{m,l}$. From part (b) of Proposition IV.1 and Lemma IV.1, we derive the following result.

Theorem IV.3. The tightest bounds for the discrete p-adic Radon frame in $\ell^2(\mathbb{Z}_N^2)$ are $A = N$ and $B = (1 + 1/p)N^2$. Thus the condition number of the associated matrix is $\sqrt{(1 + 1/p)N}$.

Definition IV.5. A circulant matrix is a square matrix in which each row is obtained by shifting the previous row cyclically one position to the right; that is, a circulant matrix has the form
$$\begin{pmatrix} c_1 & c_2 & \cdots & c_n \\ c_n & c_1 & \cdots & c_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ c_2 & c_3 & \cdots & c_1 \end{pmatrix}.$$
A block circulant matrix is a block matrix of the form
$$\begin{pmatrix} B_1 & B_2 & \cdots & B_n \\ B_n & B_1 & \cdots & B_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ B_2 & B_3 & \cdots & B_1 \end{pmatrix}.$$
Denote by $\mathrm{BCCB}_{m,n}$ the set of block circulant matrices of order $mn$ consisting of circulant blocks of order $n$. Recall the following result.
Theorem IV.4. (See Davis, 1979.) Let $A = (W_m \otimes W_n)^{-1} \Lambda (W_m \otimes W_n)$, where $\Lambda$ is a diagonal matrix. Then $A \in \mathrm{BCCB}_{m,n}$.

By part (b) of Proposition IV.1, Lemma IV.1, and Theorem IV.4, we deduce the following.

Corollary IV.1. $R^T R \in \mathrm{BCCB}_{N,N}$.

In the special case when $N$ is a prime, $R^T R$ is circulant, a property that was observed in Do and Vetterli (2001).

A. Improved Stability of the Discrete p-Adic Radon Transform

Our aim is to obtain a variant of the discrete p-adic Radon transform that has a lower condition number. To accomplish this, we normalize an image $f$ by subtracting its mean value from all pixels; that is, we replace $f$ by $\tilde f = f - \operatorname{mean}(f)$. Since
$$\frac{1}{N^2} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \tilde f(x, y) = 0,$$
we have $\hat{\tilde f}(0, 0) = 0$, and by Theorem IV.1 it follows that $\widehat{R\tilde f_m^1}[0] = \widehat{R\tilde f_s^2}[0] = 0$. In particular, since $\widehat{Rf_0^1}[0] = \sum_{l=0}^{N-1} Rf_0^1(l)$, the reconstruction formula in Eq. (6) simplifies to
$$\tilde f(x, y) = \chi_0(x, y) - (p - 1) \sum_{k=1}^{n-1} \chi_k(x, y),$$
where the functions $\chi_k$ are now computed from the Radon transform of $\tilde f$.
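The simplified formula for zero-mean images can be tested in the same way. This self-contained NumPy sketch (helper names hypothetical) subtracts the mean, checks that $\sum_l Rf_0^1(l)$ vanishes so that the last term of Eq. (6) drops out, and verifies that $\chi_0 - (p - 1)\sum_{k=1}^{n-1}\chi_k$ alone recovers $\tilde f$.

```python
import numpy as np


def padic_radon(f, p):
    # Definition IV.1 with f[x, y] = f(x, y)
    N = f.shape[0]
    i = np.arange(N)
    R1 = np.array([[f[i, (l + m * i) % N].sum() for l in range(N)] for m in range(N)])
    R2 = np.array([[f[(l + p * s * i) % N, i].sum() for l in range(N)]
                   for s in range(N // p)])
    return R1, R2


def chi(R1, R2, p, k):
    # chi_k = chi_k^1 + chi_k^2 from Theorem IV.2
    N = R1.shape[1]
    pk, M = p**k, N // p**k
    X, Y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    out = np.zeros((N, N))
    for m in range(M):
        out += R1[m].reshape(pk, M).sum(axis=0)[(Y - m * X) % M]
    for s in range(M // p):
        out += R2[s].reshape(pk, M).sum(axis=0)[(X - p * s * Y) % M]
    return out / (pk * N)


p, n = 3, 2
N = p**n
f = np.random.default_rng(6).standard_normal((N, N))
f_tilde = f - f.mean()                           # zero-mean image
R1, R2 = padic_radon(f_tilde, p)
rec = chi(R1, R2, p, 0) - (p - 1) * sum(chi(R1, R2, p, k) for k in range(1, n))
print(np.isclose(R1[0].sum(), 0.0), np.allclose(rec, f_tilde))   # True True
```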
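Define the matrix $\tilde S$ by
$$\tilde s_{i,j} = \begin{cases} s_{i,j} & \text{if } j \ne 0, \\ 0 & \text{if } j = 0. \end{cases}$$
Then $(I_{N+N/p} \otimes W_N) R\tilde{\mathbf{f}} = S(W_N \otimes W_N)\tilde{\mathbf{f}} = \tilde S(W_N \otimes W_N)\tilde{\mathbf{f}}$. Thus, if we let $\tilde R = \frac{1}{N}(I_{N+N/p} \otimes W_N^T)\tilde S(W_N \otimes W_N)$, we obtain
$$R\tilde{\mathbf{f}} = \tilde R\tilde{\mathbf{f}}.$$
Furthermore, $\tilde R^T \tilde R$ is similar to $N\tilde S^T \tilde S$. Since all entries of the first column of $\tilde S$ are 0, the smallest eigenvalue of $\tilde S^T \tilde S$, and hence of $\tilde R^T \tilde R$, is 0. To avoid this, we replace the entry 0 of the first column of $\tilde S$ in position $(N^2, 0)$ with 1. With this modification, the condition $R\tilde{\mathbf{f}} = \tilde R\tilde{\mathbf{f}}$ still holds. We deduce the following result.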
Theorem IV.5. The tightest bounds for the discrete p-adic Radon frame in $\ell^2(\mathbb{Z}_N^2)$ restricted to the space of zero-mean functions are $A = N$ and $B = N^2/p$. Consequently, the condition number of the associated matrix is $\sqrt{N/p}$.

B. Discrete p-Adic Ridgelet Transform

Consider signals of length $N = p^n$ for some prime $p$ and $n \in \mathbb{N}$. Two approaches can be used to apply the wavelet transform when $p$ is not necessarily equal to 2. In the first, let $\tilde N = 2^J$ be the greatest dyadic number that is less than or equal to $N$; we then apply the wavelet transform to the first $\tilde N$ coefficients and carry the $N - \tilde N$ remaining coefficients over unchanged. Alternatively, we add trailing 0s to the signal until it has dyadic length $\tilde N = 2^J$ for some $J \in \mathbb{N}$. We prefer the second approach when dealing with denoising applications, and it is the one used in the definition of the discrete p-adic ridgelet transform stated below.

Definition IV.6. Let $h$ be a scaling filter and let $g(k) = (-1)^k h(1 - k)$. Given an $N \times N$ image $f$ with $N = p^n$ for some $n \in \mathbb{N}$ and $\tilde N = \min\{2^j : j \in \mathbb{N},\ 2^j \ge N\} = 2^J$, the discrete p-adic ridgelet transform of $f$ is a decomposition of a discrete p-adic Radon transform $\{Rf_m^1[l]\}_{m=0}^{N-1} \cup \{Rf_s^2[l]\}_{s=0}^{N/p-1}$ into the collection of vectors
$$\bigcup_{a=1}^{2} \Big( \big\{ d^a_{j,m}(k) : 1 \le j \le J;\ k \in \mathbb{Z}_N \big\} \cup \big\{ c^a_{J,m}(k) : k \in \mathbb{Z}_N \big\} \Big),$$
where
$$c^a_{0,m}[l] = \begin{cases} Rf^a_m[l] & \text{for } l = 0, \ldots, N - 1, \\ 0 & \text{for } l = N, \ldots, \tilde N - 1, \end{cases}$$
and, for $0 \le j < J$,
$$c^a_{j+1,m} = {\downarrow}\big(c^a_{j,m} * h\big), \qquad d^a_{j+1,m} = {\downarrow}\big(c^a_{j,m} * g\big),$$
for $m \in \mathbb{Z}_N$ if $a = 1$, and for $m \in \mathbb{Z}_{N/p}$ if $a = 2$.
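The wavelet stage of Definition IV.6 is sketched below for a single projection. The text leaves the scaling filter $h$ open; here we assume the Haar filter $h = (1/\sqrt2, 1/\sqrt2)$, so that $g(k) = (-1)^k h(1 - k)$ gives $g = (1/\sqrt2, -1/\sqrt2)$, and we read ${\downarrow}(c * h)$ as periodic convolution followed by keeping the even-indexed samples. With these choices each analysis step is orthonormal, so the coefficients preserve the energy of the zero-padded projection. The random vector is only a stand-in for some $Rf^a_m$; the full transform would apply the same routine to every projection.

```python
import numpy as np

# Assumed Haar scaling filter h and g(k) = (-1)^k h(1 - k), i.e. g = (h[1], -h[0])
H = np.array([1.0, 1.0]) / np.sqrt(2.0)
G = np.array([H[1], -H[0]])


def analysis_step(c):
    # Periodic convolutions (c * h)[t] = sum_k h[k] c[t - k], then keep even t (downsampling)
    ch = sum(H[k] * np.roll(c, k) for k in range(len(H)))
    cg = sum(G[k] * np.roll(c, k) for k in range(len(G)))
    return ch[::2], cg[::2]


def padic_ridgelet_1d(projection):
    """Wavelet part of Definition IV.6 applied to one projection Rf^a_m."""
    N = len(projection)
    J = (N - 1).bit_length()           # smallest J with 2^J >= N, so N~ = 2^J
    c = np.zeros(2**J)
    c[:N] = projection                 # c_0[l] = Rf[l] for l < N, trailing zeros after
    details = []
    for _ in range(J):
        c, d = analysis_step(c)
        details.append(d)              # d_1, ..., d_J
    return details, c                  # c is c_J


proj = np.random.default_rng(8).standard_normal(9)   # stand-in projection, N = 9
details, cJ = padic_ridgelet_1d(proj)
print([len(d) for d in details], len(cJ))            # [8, 4, 2, 1] 1
```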
V. GENERALIZED DISCRETE RADON TRANSFORM

This section gives a definition of the Radon transform in a discrete setting that incorporates earlier definitions and yields multiple discrete versions of the Fourier slice theorem.
Definition V.1. Let $Q$ be an orthogonal matrix of order $N$. Given an $N \times N$ image matrix $f$, define the 2D $Q$-transform of $f$ as $\hat{\mathbf{f}}_Q = (Q \otimes Q)\mathbf{f}$, where $\mathbf{f}$ is the matrix $f$ turned into a vector by stacking its columns. The inverse $Q$-transform of $F$ is defined as $\check F_Q = (Q^T \otimes Q^T)F$.

Let $R^{\hat f}$ be the $N \times N$ matrix obtained by arranging $\hat{\mathbf{f}}_Q$ columnwise. We now expand the concept of selection matrix introduced in Definition IV.4 for the p-adic case. Given an integer $k > N$, decompose $R^{\hat f}$ into $k$ sequences
$$\big\{ R_j^{\hat f}[l] \big\}_{l=1}^{N}, \qquad j = 1, \ldots, k,$$
in such a way that all the entries of $R^{\hat f}$ are contained in at least one of these sequences. This decomposition can be described in terms of a transformation that creates redundancies when applied to a vector. Intuitively, consider a rectangular matrix that, when multiplied by a vector, yields a larger vector containing all the entries of the original vector arranged in some order, with some repetitions. More precisely, we define a selection matrix to be a $kN \times N^2$ matrix $S$ consisting of 0s and 1s such that each row contains a single entry 1 and each column contains at least one entry 1. In particular,
$$S\hat{\mathbf{f}}_Q = \big( R_1^{\hat f}[1] \ \ldots\ R_1^{\hat f}[N] \ \ldots\ R_k^{\hat f}[1] \ \ldots\ R_k^{\hat f}[N] \big)^T.$$

Definition V.2. The generalized discrete Radon transform relative to $Q$ and the selection matrix $S$ is defined as
$$Rf_k[l] = \sum_{m=1}^{N} Q^T(l, m)\, R_k^{\hat f}[m].$$
The Radon matrix relative to $Q$ and $S$ is defined by
$$R = (I_k \otimes Q^T)\, S\, (Q \otimes Q). \tag{7}$$
If we denote by $\mathbf{Rf}$ the column vector whose entries are $Rf_k[l]$ arranged columnwise, we can write
$$\mathbf{Rf} = R\mathbf{f}. \tag{8}$$
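To see that Eq. (7) agrees with Definition V.2, the NumPy sketch below builds a selection matrix from an explicit list of index sequences and compares $R\mathbf{f}$ with the sequence-by-sequence sums. The random orthogonal $Q$ and the particular sequences (all columns, all rows, and the main diagonal of $R^{\hat f}$, so $k = 2N + 1$) are hypothetical choices made only for illustration.

```python
import numpy as np


def selection_matrix(sequences, N):
    """kN x N^2 selection matrix from k sequences of N positions (r, c) in R^fhat.

    Position (r, c) of the N x N coefficient matrix is entry c*N + r of the
    column-stacked vector fhat_Q, so each row of S receives a single 1.
    """
    S = np.zeros((len(sequences) * N, N * N))
    for j, seq in enumerate(sequences):
        for l, (r, c) in enumerate(seq):
            S[j * N + l, c * N + r] = 1.0
    return S


def example_sequences(N):
    # Hypothetical choice: every column, every row, plus the main diagonal (k = 2N + 1)
    cols = [[(r, c) for r in range(N)] for c in range(N)]
    rows = [[(r, c) for c in range(N)] for r in range(N)]
    return cols + rows + [[(i, i) for i in range(N)]]


N = 4
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # a random orthogonal matrix
seqs = example_sequences(N)
k = len(seqs)
S = selection_matrix(seqs, N)
R = np.kron(np.eye(k), Q.T) @ S @ np.kron(Q, Q)    # Eq. (7)

f = rng.standard_normal((N, N))
Rhat = Q @ f @ Q.T          # (Q kron Q) vec(f), arranged columnwise as R^fhat
# Definition V.2, evaluated one sequence at a time
direct = np.concatenate([Q.T @ np.array([Rhat[r, c] for r, c in seq]) for seq in seqs])
print(np.allclose(R @ f.flatten(order="F"), direct))   # True
```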
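Example V.1. Let $Q$ be the DFT matrix of order 2, and let $k = 3$. Suppose the chosen sequences, turned into column vectors, are
$$R_1^{\hat f} = \begin{pmatrix} \hat f(1, 0) \\ \hat f(0, 0) \end{pmatrix}, \qquad R_2^{\hat f} = \begin{pmatrix} \hat f(0, 1) \\ \hat f(0, 0) \end{pmatrix}, \qquad R_3^{\hat f} = \begin{pmatrix} \hat f(1, 1) \\ \hat f(1, 0) \end{pmatrix}.$$
Then
$$S\hat{\mathbf{f}}_Q = S \begin{pmatrix} \hat f(0, 0) \\ \hat f(1, 0) \\ \hat f(0, 1) \\ \hat f(1, 1) \end{pmatrix} = \begin{pmatrix} \hat f(1, 0) \\ \hat f(0, 0) \\ \hat f(0, 1) \\ \hat f(0, 0) \\ \hat f(1, 1) \\ \hat f(1, 0) \end{pmatrix},$$
so
$$S = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$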
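Example V.1 can be reproduced numerically. In this sketch we take the normalized DFT matrix of order 2, which happens to be real, $Q = \frac{1}{\sqrt2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$. The code rebuilds $S$ from the three sequences, shows that $S^T S$ is diagonal with the column counts $2, 2, 1, 1$, and recovers $f$ from $R\mathbf{f}$ using a left inverse built from $(S^T S)^{-1}$; the test image is arbitrary.

```python
import numpy as np

# Normalized order-2 DFT matrix (real here: entries are +-1/sqrt(2))
Q = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
# Positions in the column-stacked vector fhat_Q = (fhat(0,0), fhat(1,0), fhat(0,1), fhat(1,1))
pos = {(0, 0): 0, (1, 0): 1, (0, 1): 2, (1, 1): 3}
seqs = [[(1, 0), (0, 0)], [(0, 1), (0, 0)], [(1, 1), (1, 0)]]   # R_1, R_2, R_3
S = np.zeros((6, 4))
for row, ij in enumerate(ij for seq in seqs for ij in seq):
    S[row, pos[ij]] = 1.0
print(S.astype(int))        # the 6 x 4 matrix S of Example V.1
print(np.diag(S.T @ S))     # [2. 2. 1. 1.]

R = np.kron(np.eye(3), Q.T) @ S @ np.kron(Q, Q)   # Radon matrix, Eq. (7)
f = np.array([[1.0, 2.0], [3.0, 4.0]])
rf = R @ f.flatten(order="F")
# Left inverse using the invertible diagonal matrix S^T S
R_inv = np.kron(Q.T, Q.T) @ np.linalg.inv(S.T @ S) @ S.T @ np.kron(np.eye(3), Q)
print(np.allclose(R_inv @ rf, f.flatten(order="F")))   # True
```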
Orthogonal matrices that can be used effectively to implement the generalized ridgelet transform are
$$Q_{SE}(l, k) = \sqrt{\frac{2}{N+1}}\, \sin\frac{lk\pi}{N+1},$$
$$Q_S(l, k) = \frac{2}{\sqrt{2N+1}}\, \sin\frac{2lk\pi}{2N+1},$$
$$Q_F(l, k) = \frac{1}{\sqrt{N}}\, e^{2\pi i (l-1)(k-1)/N},$$
$$Q_{Hel}(l, k) = \begin{cases} 1/\sqrt{N} & \text{if } l = 1, \\ 1/\sqrt{(l-1)l} & \text{if } l > 1,\ k < l, \\ -(l-1)/\sqrt{(l-1)l} & \text{if } l = k > 1, \\ 0 & \text{if } 1 < l < k, \end{cases}$$
$$Q_{Har}(l, k) = \frac{1}{\sqrt{N}} \Big[ \sin\Big(2(l-1)(k-1)\frac{\pi}{N}\Big) + \cos\Big(2(l-1)(k-1)\frac{\pi}{N}\Big) \Big],$$
$$Q_C(l, k) = \sqrt{\frac{2}{N}}\, \cos\Big(\frac{\pi}{N}(l - 1/2)(k - 1/2)\Big),$$
for $l, k = 1, \ldots, N$. The matrix $Q_{SE}$ is the symmetric eigenvector matrix of a second difference matrix; $Q_S$, $Q_F$, $Q_{Har}$, and $Q_C$ are the matrices associated with the discrete sine transform (DST), the normalized DFT, the discrete Hartley transform, and the discrete cosine transform (DCT), respectively; $Q_{Hel}$ is called the Helmert matrix of order $N$. The notion of generalized discrete Radon transform can be further extended by defining the Radon matrix relative to three preassigned orthogonal matrices $Q$, $U$, and $V$ and a selection matrix $S$ as $R = (I_k \otimes Q) S (U \otimes V)$. The purpose of extending the definition in this manner is to be able to apply the transform to image deconvolution (Easley, Healy and Berenstein, 2005b).

Remark V.1. If $S$ is a selection matrix, then $S^T S$ is a diagonal matrix whose $j$th diagonal entry is the number of 1s in the $j$th column of $S$.
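As a quick sanity check on the formulas listed above, the following NumPy sketch (function name hypothetical) builds each matrix for a given $N$ with indices $l, k = 1, \ldots, N$ and tests $Q^* Q = I$, where $Q^*$ is the conjugate transpose; for the real matrices this is simply $Q^T Q = I$, and for $Q_F$ it expresses unitarity.

```python
import numpy as np


def orthogonal_matrices(N):
    l = np.arange(1, N + 1)[:, None]   # row index l = 1..N
    k = np.arange(1, N + 1)[None, :]   # column index k = 1..N
    QSE = np.sqrt(2 / (N + 1)) * np.sin(l * k * np.pi / (N + 1))
    QS = 2 / np.sqrt(2 * N + 1) * np.sin(2 * l * k * np.pi / (2 * N + 1))
    QF = np.exp(2j * np.pi * (l - 1) * (k - 1) / N) / np.sqrt(N)
    arg = 2 * (l - 1) * (k - 1) * np.pi / N
    QHar = (np.sin(arg) + np.cos(arg)) / np.sqrt(N)
    QC = np.sqrt(2 / N) * np.cos(np.pi / N * (l - 0.5) * (k - 0.5))
    QHel = np.zeros((N, N))
    QHel[0, :] = 1 / np.sqrt(N)                  # row l = 1
    for r in range(2, N + 1):                    # rows l = 2..N
        QHel[r - 1, : r - 1] = 1 / np.sqrt((r - 1) * r)
        QHel[r - 1, r - 1] = -(r - 1) / np.sqrt((r - 1) * r)
    return {"SE": QSE, "S": QS, "F": QF, "Hel": QHel, "Har": QHar, "C": QC}


for name, Q in orthogonal_matrices(8).items():
    print(name, np.allclose(Q.conj().T @ Q, np.eye(8)))   # True for every matrix
```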
A. Special Cases

When $N = p^n$, $k = N + N/p$, $Q$ is the normalized DFT matrix $Q_F$, and $S$ is the selection matrix in Definition IV.4, the generalized discrete Radon transform relative to $Q$ and $S$ is a scaled version of the discrete p-adic Radon transform. When $Q$ is the normalized DFT matrix $Q_F$ with the columns reordered so that the zero-frequency terms are in the center, and $S$ is the selection matrix defined by the pseudopolar grid (see Figure 11b), the discrete Radon transform relative to $Q$ and $S$ is the Radon transform given in Starck, Candès and Donoho (2002), which we shall refer to as the direct slice Radon transform. Since the discrete Hartley transform of an image $f$ can be written as $\operatorname{Re} \hat f - \operatorname{Im} \hat f$, the matrix $Q_{Har}$ behaves like $Q_F$ with the lowest frequency in the center. Thus, for both $Q_F$ and $Q_{Har}$, we choose the pseudopolar grid as the selection matrix. The rotation theorem for the 2D Fourier transform states that if an image function $f$ is rotated through some angle, its 2D Fourier transform $\hat f$ is rotated through the same angle. As a consequence, the Hartley transform has the same property. Therefore, the Fourier and the Hartley transforms are rotationally invariant under radial slices, and hence the projections of the radial slices are also rotationally invariant. We can create other transforms that share this rotation invariance property by taking polynomial functions of the real and the imaginary parts of the DFT. We successfully tested this invariance using the transform defined by $f \mapsto (\operatorname{Re} \hat f)^n + (\operatorname{Im} \hat f)^n$ for various choices of the positive integer $n$.

FIGURE 11. (a) Triangular grid. (b) Pseudopolar grid.

FIGURE 12. Pseudospiral grid.

When $Q$ is $Q_{SE}$, $Q_S$, $Q_{Hel}$, $Q_C$, or any orthogonal matrix such that lower frequency values start in the upper left-hand corner, we choose a triangular grid as the selection matrix (see Figure 11a). The pseudospiral grid illustrated in Figure 12 models spiral-like MRI data collection. Illustrations of some of these transforms and the corresponding ridgelet coefficients for different choices of selection matrices are shown in Figure 14.

The DCT and the DST closely approximate the Karhunen–Loève transform for a class of random signals known as first-order Markov processes that model several real-world images. Theoretically, these yield a sparse decomposition with much of the energy concentrated in the upper left-hand corner. The coefficients of the DCT for an $N \times N$ image $f = (f_{m,n})$ are
$$F_{j,k} = \frac{2}{N} \sum_{m=1}^{N} \sum_{n=1}^{N} f_{m,n} \cos\big(\pi(m - 1/2)(j - 1/2)/N\big) \cos\big(\pi(n - 1/2)(k - 1/2)/N\big).$$
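Because $Q_C$ is symmetric and the cosine kernel factors, the coefficients $F_{j,k}$ above are the entries of $Q_C f Q_C^T$ (the kernel $\cos(\pi(m - 1/2)(j - 1/2)/N)$ is that of the type-IV DCT). The short NumPy sketch below compares the double sum with this separable matrix form for a random $8 \times 8$ test image; since $Q_C$ is orthogonal, the energy of $f$ is also preserved.

```python
import numpy as np

N = 8
rng = np.random.default_rng(4)
f = rng.standard_normal((N, N))
idx = np.arange(1, N + 1)
# Q_C(l, k) = sqrt(2/N) cos(pi (l - 1/2)(k - 1/2) / N)
QC = np.sqrt(2 / N) * np.cos(np.pi * np.outer(idx - 0.5, idx - 0.5) / N)

# Direct double sum for F_{j,k}
F_direct = np.zeros((N, N))
for j in range(1, N + 1):
    for k in range(1, N + 1):
        cj = np.cos(np.pi * (idx - 0.5) * (j - 0.5) / N)   # indexed by m
        ck = np.cos(np.pi * (idx - 0.5) * (k - 0.5) / N)   # indexed by n
        F_direct[j - 1, k - 1] = 2 / N * cj @ f @ ck
# Separable form F = Q_C f Q_C^T
F_sep = QC @ f @ QC.T
print(np.allclose(F_direct, F_sep))   # True
```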
The coefficients of the DST have a similar description in terms of the sine function. It follows that, by taking sections of $F_{j,k}$ that start at $(0, 0)$ and extend radially to the edges of the matrix $F$ along the triangular grid, we obtain slices representing coefficients from lowest to highest frequency. The application of the 1D inverse transform along these slices typically produces a nonsparse decomposition of 1D signals with an increased degree of smoothness. Thus, the wavelet representation is likely to approximate the signals better, and to improve the estimate of the denoised signals, compared with the wavelet representation of the 1D slices obtained without applying an orthogonal transform. To illustrate this point, consider the $32 \times 32$ image of a square with a sharp edge discontinuity shown in Figure 13a. Figure 13b shows that the projections obtained after taking the generalized discrete Radon transform corresponding to the DCT and the triangular grid have an increased degree of smoothness.

FIGURE 13. An illustration of the smoothing properties of the generalized discrete Radon transform relative to the DCT. (a) Triangular edge. (b) Radon projection.

Using Eq. (7) and Lemma IV.1, we obtain the following extension of Theorem IV.3.

Theorem V.1. Let $N = p^n$, where $n \ge 2$ and $p$ is a prime, and let $Q$ be any orthogonal matrix of order $N$. Then the generalized discrete Radon matrix $R$ relative to $Q$ and the p-adic selection matrix $S$ in Definition IV.4 is a frame with frame bounds 1 and $(1 + 1/p)N$. In particular, its condition number is $\sqrt{(1 + 1/p)N}$. We call $R$ a generalized p-adic Radon frame.

The numerical stability of the generalized Radon frame operator can be improved by scaling the selection matrix via multiplication by a suitable diagonal matrix.

Proposition V.1. Let $S$ be a $kN \times N^2$ selection matrix, where $k$ and $N$ are positive integers such that $k > N$. For each $j = 1, \ldots, N^2$, let $n_j$ be the number of 1s in the $j$th column of $S$ and let $D$ be the diagonal matrix with diagonal entries $d_1, \ldots, d_{N^2}$, where $d_j = 1/\sqrt{n_j}$. Then the matrix
$$R_{\mathrm{norm}} = (I_k \otimes Q^T)\, S D\, (Q \otimes Q)$$
is a tight discrete frame with frame bounds equal to 1. In particular, its condition number is 1.

FIGURE 14. Examples of transforms for the $32 \times 32$ top image. (a) Time-domain implementation of the discrete Radon transform. (b) Wavelet coefficients of the projections from (a). (c) Discrete dyadic Radon transform. (d) Wavelet coefficients of the projections from (c). (e) Generalized discrete Radon transform corresponding to $Q_{SE}$. (f) Wavelet coefficients of the projections from (e). (g) Generalized discrete Radon transform corresponding to $Q_{Har}$. (h) Wavelet coefficients of the projections from (g). (i) Generalized discrete Radon transform corresponding to $Q_F$ using a pseudospiral grid. (j) Wavelet coefficients of the projections from (i).

Proof. The matrix $S^T S$ is diagonal and its $j$th diagonal entry is $n_j$. From the definition of selection matrix, it follows that each $n_j > 0$. Thus, the diagonal matrix $D$ whose diagonal entries are $1/\sqrt{n_j}$ ($j = 1, \ldots, N^2$) is well defined and invertible. Furthermore, $(SD)^T(SD) = D(S^T S)D$ is a product of diagonal matrices whose $j$th diagonal entry is $n_j/n_j = 1$, so it is the identity matrix. Since $(R_{\mathrm{norm}})^T R_{\mathrm{norm}}$ is similar to $(SD)^T(SD)$, its eigenvalues are all equal to 1. Thus, $R_{\mathrm{norm}}$ is a tight frame with frame bounds equal to 1.

Using properties of the Kronecker product and the invertibility of $S^T S$, we obtain the following inversion formula.

Theorem V.2. Let $R$ be the Radon matrix $R = (I_k \otimes Q) S (U \otimes V)$, where $Q$, $U$, and $V$ are orthogonal matrices and $S$ is a selection matrix. The inverse of the generalized discrete Radon transform with Radon matrix $R$ in matrix form is given by
$$R_{\mathrm{inv}} = (U^T \otimes V^T)(S^T S)^{-1} S^T (I_k \otimes Q^T).$$

We now show that the generalized discrete Radon transform satisfies the convolution property.

Theorem V.3 (discrete convolution property). Let $Rh$ be the generalized discrete Radon transform of $h$ relative to the normalized DFT matrix $Q_F$ and any selection matrix $S$. Given $N \times N$ image functions $f, g$, for any $m = 1, \ldots, N$,
$$Rf_m \circledast Rg_m = R(f \circledast g)_m.$$

Proof. By the discrete convolution theorem (see Mallat, 1998), we have
$$(Rf_m \circledast Rg_m)^{\wedge}[k] = \widehat{Rf_m}[k]\, \widehat{Rg_m}[k].$$
Using the vector notation introduced in Definition V.1 and in Eq. (8) for the special case where $Q$ is the normalized DFT matrix, we obtain
$$\widehat{\mathbf{Rf}}[k]\, \widehat{\mathbf{Rg}}[k] = S\hat{\mathbf{f}}[k]\, S\hat{\mathbf{g}}[k] = S\,\widehat{\mathbf{f} \circledast \mathbf{g}}[k] = \widehat{\mathbf{R}(f \circledast g)}[k].$$
Thus, $(Rf_m \circledast Rg_m)^{\wedge}[k] = (R(f \circledast g)_m)^{\wedge}[k]$. The result follows at once by taking the inverse Fourier transform.

The convolution property for the generalized discrete Radon transform relative to the DFT matrix $Q_F$ holds only if the data outside the observed
blurred image are assumed to be a periodic extension of the data inside. There are two additional types of boundary assumptions that can be considered when using the generalized discrete Radon transform for image convolution/deconvolution problems (Easley, Berenstein and Healy, 2005a, 2005b). One boundary assumption, known as the zero boundary condition, is that the data outside the domain of the observed blurred image are 0. Another, known as the reflexive boundary condition, is that the data outside the observed blurred image are a reflection of the data inside. For the convolution property of the generalized discrete Radon transform to hold in these cases, the orthogonal matrices $Q$, $U$, and $V$ used to define the generalized discrete Radon transform must diagonalize the convolution matrix that represents the boundary assumptions. In Ng, Chan and Tang (1999) it is shown that if the blurring function is symmetric with respect to the origin, then the convolution matrix for the reflexive boundary conditions can be diagonalized by $Q_C$. Easley, Healy and Berenstein (2005b) show how the singular value decomposition (SVD) can be used to find orthogonal matrices $Q$, $U$, and $V$ that diagonalize the convolution matrix when the zero boundary condition is assumed.

B. Generalized Discrete Ridgelet Transform

Definition V.3. Let $h$ be a scaling filter and let $g(k) = (-1)^k h(1 - k)$. Given an image $f: \mathbb{Z}_N^2 \to \mathbb{C}$ with $N = 2^J$ for some $J \in \mathbb{N}$, the generalized discrete ridgelet transform is a decomposition of a generalized discrete Radon transform $\{Rf_m[l]\}_{m=0}^{K-1}$ into the collection of $K$ vectors
$$\{ d_{j,m}(k) : 1 \le j \le J;\ k \in \mathbb{Z}_N \} \cup \{ c_{J,m}(k) : k \in \mathbb{Z}_N \}, \qquad m = 0, \ldots, K - 1,$$
where $c_{0,m} = Rf_m$ and, for $0 \le j < J$,
$$c_{j+1,m} = {\downarrow}(c_{j,m} * h), \qquad d_{j+1,m}(k) = {\downarrow}(c_{j,m} * g).$$
C. Examples of Frame Elements

The frame elements of a generalized discrete Radon transform can be found by reshaping each row of the Radon matrix into an $N \times N$ matrix. Figure 15 shows sets of four consecutive frame elements for an $8 \times 8$ image. The frame elements of the generalized discrete Radon transform for $Q_C$ do not represent traditional lines when the selection matrix is chosen to correspond to the triangular grid.
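The sketch below (NumPy) extracts frame elements by reshaping rows of a Radon matrix built with $Q_C$, then checks Proposition V.1 and Theorem V.2 numerically; for Theorem V.2 our matrix corresponds to the choice $Q \to Q_C^T$ and $U = V = Q_C$. Because the triangular grid is not given in closed form here, the code substitutes a hypothetical selection matrix that picks every column, every row, and the main diagonal of the coefficient matrix, so the column counts $n_j$ are 3 on the diagonal and 2 elsewhere.

```python
import numpy as np

N = 8
idx = np.arange(1, N + 1)
QC = np.sqrt(2 / N) * np.cos(np.pi * np.outer(idx - 0.5, idx - 0.5) / N)

# Hypothetical selection (not the triangular grid): every column, every row, and
# the main diagonal of the N x N coefficient matrix, giving k = 2N + 1 sequences.
seqs = ([[(r, c) for r in range(N)] for c in range(N)]
        + [[(r, c) for c in range(N)] for r in range(N)]
        + [[(i, i) for i in range(N)]])
k = len(seqs)
S = np.zeros((k * N, N * N))
for j, seq in enumerate(seqs):
    for l, (r, c) in enumerate(seq):
        S[j * N + l, c * N + r] = 1.0      # (r, c) is entry c*N + r of the stacked vector

R = np.kron(np.eye(k), QC.T) @ S @ np.kron(QC, QC)   # Eq. (7) with Q = Q_C
# Frame element for row i: reshape the row back into an N x N image (column stacking)
frame_elements = [R[i].reshape((N, N), order="F") for i in range(4)]

# Proposition V.1: scaling by D = diag(1 / sqrt(n_j)) yields a tight frame with bounds 1
n = S.sum(axis=0)                      # n_j = number of 1s in column j
D = np.diag(1 / np.sqrt(n))
R_norm = np.kron(np.eye(k), QC.T) @ S @ D @ np.kron(QC, QC)
print(np.allclose(R_norm.T @ R_norm, np.eye(N * N)))   # True

# Theorem V.2 with Q -> QC^T and U = V = QC: a left inverse of R
R_inv = np.kron(QC.T, QC.T) @ np.diag(1 / n) @ S.T @ np.kron(np.eye(k), QC)
print(np.allclose(R_inv @ R, np.eye(N * N)))            # True
```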
FIGURE 15. Each image consists of examples of successive frame elements for the generalized discrete Radon transform corresponding to $Q_C$.
D. A Direct Radon Matrix Approach

A discrete Radon transform can be implemented by rotating the image block by an interpolation method and calculating the projections for each angle. Because its inversion by a standard backprojection routine can be unstable and cause undesirable artefacts in the reconstructed image (compare with Kak and Slaney, 1987, and Natterer, 2001), we have conceived a much simpler time-domain procedure to be used as part of a local ridgelet transform. We consider a matrix $R$ such that $R\mathbf{f}$ gives the sums of the image values over perpendicular lines for each line that makes up the pseudopolar grid. In particular, no interpolation is required, and direct inversion is possible by means of a pseudoinverse. This method may be ill conditioned for large image block sizes. For small block sizes, however, the condition number is fairly low. For example, with $8 \times 8$ and $16 \times 16$ block sizes, the condition numbers are estimated to be 12.4656 and 26.4763, respectively.

Definition V.4. Let $f$ be an $N \times N$ image, let $S_m$ be the slice of direction $m$ in the $N \times N$ pseudopolar grid, and, for $j = 1, \ldots, N$, let $P_{m,j}$ be the slice through $S_m[j]$ perpendicular to $S_m$. Then define the direct matrix discrete (time-domain) Radon transform of $f$ by $Rf$, where for $m, j = 1, \ldots, N$
$$Rf_m[j] = \sum_{(k,l) \in P_{m,j}} f(k, l).$$
Through regularization applied to the pseudoinverse matrix $R^+$, it is possible to improve stability at the cost of approximating the solution. For example, with the traditional Tikhonov regularization
$$\tilde f = R_\lambda^+ r = \left(R^T R + \lambda I\right)^{-1} R^T r$$
for some λ > 0, the condition number can be reduced. Let $R = U\Sigma V^*$ be the SVD of R, where U and V are orthogonal matrices and Σ is a diagonal matrix whose diagonal entries $\sigma_1, \dots, \sigma_{N^2}$ satisfy $\sigma_1 \ge \cdots \ge \sigma_{N^2} \ge 0$. Then $R_\lambda^+ = V\Sigma_\lambda^+ U^*$, where $\Sigma_\lambda^+$ is the diagonal matrix whose kth diagonal entry is $\sigma_k/(\sigma_k^2 + \lambda)$, and the condition number becomes $\sigma_1\sigma_{N^2}/(\sigma_{N^2}^2 + \lambda)$. For example, for blocks of size 8 × 8 and 16 × 16 and λ = 1, the condition numbers reduce to 5.1683 and 10.1938, respectively. Our numerical simulations confirm that this regularization improves reconstruction performance.
Definition V.5. Let h be a scaling filter and let $g(k) = (-1)^k h(1-k)$. Given an N × N image f with $N = 2^J$ for some $J \in \mathbb{N}$, the direct matrix (time-domain) discrete ridgelet transform is a decomposition of the discrete time-domain Radon transform $\{Rf_m[j]\}_{m=0}^{2N-1}$ into the collection of 2N vectors
$$\{d_{j,m}(k) : 1 \le j \le J+1,\ k \in \mathbb{Z}_{2N+1}\} \cup \{c_{J+1,m}(k) : k \in \mathbb{Z}_{2N+1}\}, \qquad m = 0, \dots, 2N-1,$$
where $c_{0,m} = Rf_m$ and, for $0 \le j < J+1$,
$$c_{j+1,m} = \downarrow(c_{j,m} * h), \qquad d_{j+1,m} = \downarrow(c_{j,m} * g).$$
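The two computational ingredients of this subsection can be sketched in a few lines of Python. The first two functions only evaluate the quantities quoted above from a given list of singular values (computing the SVD of R itself is not shown). The remaining functions implement the relation $g(k) = (-1)^k h(1-k)$ and the convolve-and-downsample recursion of Definition V.5 for a single projection $Rf_m$. The Haar scaling filter, the periodic boundary handling and all function names are illustrative assumptions; the experiments in this chapter used a Symmlet.

```python
import math

# --- Tikhonov-regularized pseudoinverse: quantities quoted in the text ---

def tikhonov_filter(singular_values, lam):
    """Diagonal entries sigma_k / (sigma_k**2 + lam) of Sigma_lambda^+."""
    return [s / (s * s + lam) for s in singular_values]

def regularized_condition(singular_values, lam):
    """sigma_1 * sigma_min / (sigma_min**2 + lam); equals sigma_1 / sigma_min when lam == 0."""
    s1, smin = max(singular_values), min(singular_values)
    return s1 * smin / (smin * smin + lam)

# --- One-dimensional wavelet recursion of Definition V.5 ---

# Hypothetical scaling filter (Haar): h(0) = h(1) = 1/sqrt(2), stored as {k: h(k)}.
HAAR_H = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}

def highpass_from_scaling(h):
    """Build g(k) = (-1)**k * h(1 - k) from a filter stored as {k: h(k)}."""
    return {1 - k: (-1) ** (1 - k) * v for k, v in h.items()}

def conv_down(c, filt):
    """Periodic convolution (c * filt)(s) = sum_k filt(k) c(s - k), kept only at s = 2t."""
    n = len(c)
    return [sum(v * c[(2 * t - k) % n] for k, v in filt.items()) for t in range(n // 2)]

def wavelet_decompose(projection, levels, h=HAAR_H):
    """Return ([d_1, ..., d_levels], c_levels) for one Radon projection Rf_m.

    The projection length is assumed to be divisible by 2**levels.
    """
    g = highpass_from_scaling(h)
    c, details = list(projection), []
    for _ in range(levels):
        details.append(conv_down(c, g))  # d_{j+1} = down(c_j * g)
        c = conv_down(c, h)              # c_{j+1} = down(c_j * h)
    return details, c
```

Because the Haar pair is orthonormal and each downsampled output combines a disjoint pair of neighbouring samples, the total energy of the coefficients equals that of the input, which makes a convenient sanity check.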
E. Generalized Discrete Local Ridgelet Transform and Discrete Curvelet Transform
We adopted a denoising routine inspired by Stark, Candès and Donoho (2002), based on applying the generalized discrete ridgelet transform to each block of a redundant dyadic block decomposition of the image. The image is then reassembled by combining the blocks according to a specific weighting function w. More precisely, we have the following definition.
Definition V.6. Given an L × L image f with $L = 2^n k$, $n, k \in \mathbb{N}$, partition f into blocks of size $2^n$ as follows.
• Partition the image into k × k blocks.
• Partition the image into k × (k − 1) blocks with the top left and bottom right block corners at $(0, 2^{n-1})$ and $(L, L - 2^{n-1})$, respectively.
• Partition the image into (k − 1) × k blocks with the top left and bottom right block corners at $(2^{n-1}, 0)$ and $(L - 2^{n-1}, L)$, respectively.
• Partition the image into (k − 1) × (k − 1) blocks with the top left and bottom right block corners at $(2^{n-1}, 2^{n-1})$ and $(L - 2^{n-1}, L - 2^{n-1})$, respectively.
By a generalized discrete local ridgelet transform of f we mean the application of the generalized discrete ridgelet transform to each of the above dyadic blocks. Figure 16 gives an illustration of this partitioning with k = 16 and n = 4.
To obtain a denoised image, after applying a generalized discrete ridgelet transform to each of the blocks, the ridgelet coefficients that are no greater than the expected noise level are set equal to 0. The inverse generalized discrete ridgelet transform is then applied, and the block partitions are reassembled as follows.
• Multiply each block in the first partition by a weighting function w, such as $\sin^2\big(\pi(j-1)/2^n\big)\,\mathrm{block}(i,j)$.
• Apply w only to the right (respectively, the left) side of the block corresponding to the first (respectively, the last) column.
• Apply w to all blocks in the second partition.
• Add together the first and second partitions.
• Apply w in the horizontal direction.
• Add together the third and fourth partitions.
• Apply w in the vertical direction to the two regions obtained in the fourth and sixth steps.
• Add these two regions.
In our demonstrations we use the generalized discrete curvelet transform (DCT) inspired by Stark, Candès and Donoho (2002), which consists of applying the à trous algorithm to an image to decompose it into frequency subbands and then applying the generalized local ridgelet transform to each subband.
FIGURE 16. Local block decomposition of a band-pass image of Peppers (see Figure 19).
Definition V.7. Given an image function f, use the à trous algorithm to decompose f as
$$f(x,y) = c_J(x,y) + \sum_{j=1}^{J} w_j(x,y)$$
for J scales, where $c_J$ is a smooth version of f and $w_j$ represents the details of f at scale $2^{-j}$. The generalized discrete curvelet transform of f is then the application of the generalized local ridgelet transform, with varying block sizes, to $c_J$ and $\{w_j\}_{j=1}^{J}$.
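A one-dimensional sketch of the à trous decomposition in Definition V.7 follows. It assumes the B3-spline kernel [1, 4, 6, 4, 1]/16 and periodic boundaries; the chapter does not state which smoothing kernel was used. At each scale the kernel taps are spread $2^j$ samples apart, and each detail band is the difference of two successive smoothings, so the bands add back to the input exactly. For an image, the same kernel would typically be applied along rows and then along columns.

```python
# Assumed smoothing kernel: the B3-spline [1, 4, 6, 4, 1] / 16.
B3 = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]

def a_trous(signal, scales):
    """1D à trous decomposition with periodic boundaries.

    Returns (c_J, [w_1, ..., w_J]) such that signal[i] == c_J[i] + sum_j w_j[i].
    """
    n = len(signal)
    c = [float(v) for v in signal]
    details = []
    for j in range(scales):
        step = 2 ** j  # taps are 2**j samples apart ("holes") at scale j + 1
        smooth = [
            sum(B3[t] * c[(i + (t - 2) * step) % n] for t in range(5))
            for i in range(n)
        ]
        details.append([a - b for a, b in zip(c, smooth)])  # w_{j+1} = c_j - c_{j+1}
        c = smooth
    return c, details
```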
Our experimental results showed that the generalized discrete curvelet transform is an effective method for denoising. In our demonstrations we divided the image into 5 subbands (J = 4) and applied the generalized local ridgelet transform to w1 and w2 with 16 × 16 blocks, to w3 and w4 with 32 × 32 blocks, and to c4 with 64 × 64 blocks.
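The four overlapping partitions of Definition V.6 and the weighting function used in the reassembly can be enumerated as follows. This is a hypothetical helper that lists block corners as (row, column) pairs, reading the first coordinate of the corners in the definition as the row; it does not perform the ridgelet transform itself.

```python
import math

def dyadic_partitions(L, n):
    """Top-left corners (row, col) of the 2**n x 2**n blocks in the four partitions of Definition V.6."""
    b = 2 ** n
    if L % b:
        raise ValueError("L must be a multiple of 2**n")
    k = L // b
    half = b // 2
    aligned = [i * b for i in range(k)]             # k offsets: 0, b, 2b, ...
    shifted = [half + i * b for i in range(k - 1)]  # k - 1 offsets: b/2, 3b/2, ...
    return [
        [(r, c) for r in aligned for c in aligned],  # k x k
        [(r, c) for r in aligned for c in shifted],  # k x (k - 1), shifted horizontally
        [(r, c) for r in shifted for c in aligned],  # (k - 1) x k, shifted vertically
        [(r, c) for r in shifted for c in shifted],  # (k - 1) x (k - 1), shifted both ways
    ]

def weight(j, n):
    """Weight sin^2(pi (j - 1) / 2**n) for the j-th column (1-based) of a 2**n-wide block."""
    return math.sin(math.pi * (j - 1) / 2 ** n) ** 2
```

Since $\sin^2\big(\pi(j-1)/2^n\big) + \sin^2\big(\pi(j-1+2^{n-1})/2^n\big) = \sin^2 + \cos^2 = 1$, two blocks offset by half a block width receive weights that sum to 1 where they overlap, which is presumably why this weight suits the half-shifted partitions.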
VI. NOISE REMOVAL EXPERIMENTS
In this section we demonstrate the capability of the generalized Radon transform to remove noise when used in a local ridgelet or curvelet scheme. Experiments on images with artificially added noise allow explicit measures of performance. It is also important, however, to see how these methods perform in real applications. One such application is the removal of a type of noise known as speckle from synthetic aperture radar (SAR) imagery.
A wave transmitted by a radar interacts with the objects in the imaged scene, and the radar returns both a phase and an amplitude associated with the objects. As the radar moves through an aperture, the data collected are in the form of a restricted subset (an annulus) of the frequency data from a Radon transform of the scene. This information is then used to form an SAR image. On a scale finer than the resolution cell of the radar, scatterers at different parts of the resolution cell contribute different small phases to the return. These small phase variations result in a noiselike degradation called speckle. It has been shown that when the intensity image is logarithmically transformed, the speckle noise is approximately additive Gaussian noise (Arsenault and April, 1976).
It is important to remove speckle to improve the performance of automatic target detection and recognition algorithms based on SAR imagery. Some common techniques for reducing speckle noise are based on local statistical filters (see Dewaele et al., 1990). These filters, however, are not very effective in suppressing isolated noise spikes (Lu et al., 1999). Speckle reduction schemes based on wavelets have been investigated and have shown promise (Guo et al., 1994; Fukuda and Hirosawa, 1997; Sveinsson and Benediktsson, 2000, 2001; and Ulfarsson, Sveinsson and Benediktsson, 2002).
For our denoising experiments, we used an SAR image obtained from the Moving and Stationary Target Acquisition and Recognition (MSTAR) database. We estimated the standard deviation $\sigma_{\text{noise}}$ of the noise in the given N × N SAR image f from the first level of the standard wavelet transform of the image. Specifically, given a wavelet ψ, let
$$\psi^3_{j,n,m}(x,y) = \frac{1}{2^j}\,\psi^3\!\left(\frac{x - 2^j n}{2^j}, \frac{y - 2^j m}{2^j}\right),$$
where we recall from Section II that $\psi^3(x,y) = \psi(x)\psi(y)$. We used (Mallat, 1998)
$$\sigma_{\text{noise}} = \frac{\operatorname{median}\big(|\langle f, \psi^3_{1,n,m}\rangle|\big)_{0 \le n,m < N/2}}{0.6745}.$$
TABLE 1
GENERALIZED LOCAL RIDGELET AND CURVELET METHODS IN TERMS OF ENL
Algorithm                                 ENL
Local ridgelet (p = 7)                    18.34
Local ridgelet (dyadic, 8)                21.71
Local ridgelet (modified dyadic, 8)       22.07
Local ridgelet (modified dyadic, 8)       24.25
Local ridgelet (p-adic, 9)                13.34
Local ridgelet (mod. p-adic, 9)           14.26
Local ridgelet (time-domain, 8, λ = 0)    27.50
Local ridgelet (time-domain, 8, λ = 2)    30.46
Local ridgelet (8, QSE)                   18.20
Local ridgelet (8, QS)                    22.63
Local ridgelet (8, QF)                    28.56
Local ridgelet (8, QHel)                  23.40
Local ridgelet (8, QHar)                  21.01
Local ridgelet (8, QC)                    23.62
Curvelet transform (QSE)                  26.03
Curvelet transform (QF)                   39.36
Curvelet transform (QHar)                 31.41
Curvelet transform (dyadic)               32.23
Local ridgelet (p = 17)                   21.65
Local ridgelet (dyadic, 16)               20.86
Local ridgelet (modified dyadic, 16)      25.84
Local ridgelet (modified dyadic, 16)      29.12
Local ridgelet (dyadic, 27)               23.67
Local ridgelet (mod. p-adic, 27)          18.90
Local ridgelet (time-domain, 16, λ = 0)   21.78
Local ridgelet (time-domain, 16, λ = 2)   24.42
Local ridgelet (16, QSE)                  21.18
Local ridgelet (16, QS)                   28.95
Local ridgelet (16, QF)                   27.61
Local ridgelet (16, QHel)                 33.74
Local ridgelet (16, QHar)                 23.43
Local ridgelet (16, QC)                   24.42
Curvelet transform (QS)                   30.86
Curvelet transform (QHel)                 40.83
Curvelet transform (QC)                   33.31
Curvelet transform (time-domain, λ = 2)   43.24
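A sketch of the noise estimate above follows. It assumes the orthonormal Haar wavelet for ψ (the text does not fix ψ at this point), so that the first-level diagonal coefficients $\langle f, \psi^3_{1,n,m}\rangle$ reduce to signed sums over 2 × 2 pixel blocks. The function names are illustrative.

```python
import statistics

def haar_diagonal_details(img):
    """First-level diagonal coefficients <f, psi^3_{1,n,m}> for an assumed orthonormal Haar wavelet.

    With psi = (1, -1)/sqrt(2), psi(x)psi(y) on a 2 x 2 block is [[1, -1], [-1, 1]] / 2.
    img is an N x N list of lists with N even.
    """
    N = len(img)
    return [
        0.5 * (img[2 * n][2 * m] - img[2 * n][2 * m + 1]
               - img[2 * n + 1][2 * m] + img[2 * n + 1][2 * m + 1])
        for n in range(N // 2)
        for m in range(N // 2)
    ]

def estimate_noise_sigma(img):
    """sigma_noise = median(|<f, psi^3_{1,n,m}>|) / 0.6745."""
    return statistics.median(abs(v) for v in haar_diagonal_details(img)) / 0.6745
```

The median of absolute values makes the estimate insensitive to the few large diagonal coefficients produced by genuine image edges, which is why the rule works on the finest scale, where noise dominates.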
The denoising methods were evaluated by computing the equivalent number of looks, $\mathrm{ENL} = E^2(f)/\sigma^2(f)$, where E(f) is the mean value of a homogeneous region of f and σ(f) is the standard deviation of f over that region (Fukuda and Hirosawa, 1997). An additional postfiltering step was applied to the images estimated by the generalized discrete local ridgelet and curvelet algorithms, using an adaptive Wiener filter with a 2 × 2 window. This improves performance only slightly; it was suggested by Do and Vetterli (2001) for use with the FRIT algorithm. The original SAR image has an ENL of 3.81. Results for some of the algorithms are shown in Figures 17 and 18. Table 1 lists the results for all of the generalized discrete local ridgelet and curvelet methods in terms of ENL.
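The ENL figure of merit is a ratio of two sample statistics over a homogeneous region chosen by the user. The sketch below uses the population variance, which is an assumption since the text does not specify the estimator; the function name is illustrative.

```python
import statistics

def equivalent_number_of_looks(region):
    """ENL = E^2 / sigma^2 over a homogeneous region given as a flat list of pixel values."""
    mean = statistics.fmean(region)
    variance = statistics.pvariance(region, mu=mean)  # population variance (assumed estimator)
    return mean * mean / variance
```

A larger ENL indicates a smoother homogeneous region, so a good despeckling method should raise it well above the 3.81 of the unprocessed image.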
FIGURE 17. (a) Original speckled SAR image. (b) The local ridgelet transform using 16 × 16 blocks with QHel. (c) The curvelet transform with QHel. (d) The curvelet transform using the time-domain implementation with λ = 2.
FIGURE 18. A closeup view of the results in Figure 17.
Using the images of Peppers, Goldhill, Elaine, and Baboon shown in Figure 19, we artificially added white Gaussian noise to test the new methods. We used the quantitative measure
$$\mathrm{SNR}(f, f_{\text{est}}) = 10\log_{10}\left(\frac{\sigma^2(f)}{\operatorname{mean}\big((f - f_{\text{est}})^2\big)}\right),$$
where f and $f_{\text{est}}$ are the original image and the estimated image, respectively. We compared our methods with some state-of-the-art wavelet-based denoising routines and commented on how many artefacts remained. These methods were the following:
• Soft thresholding of the discrete wavelet transform using the Daubechies wavelet with 6 vanishing moments (DWT Daubechies 6 with soft thresholding).
• Soft thresholding of the stationary wavelet transform using the Daubechies wavelet with 6 vanishing moments (SWT Daubechies 6 with soft thresholding).
• Hard thresholding of the discrete biorthogonal wavelet transform using the Daubechies–Antonini 7/9 filters (DWT 7/9 filters with hard thresholding).
• Hard thresholding of the stationary biorthogonal wavelet transform using the Daubechies–Antonini 7/9 filters (SWT 7/9 filters with hard thresholding).
Table 2 shows the results of these experiments. We have also included some of the results from the local ridgelet transform using the dyadic Radon transform with an 8 × 8 block decomposition and from the local ridgelet transform using the time-domain Radon transform (λ = 2) with an 8 × 8 block decomposition. The SNR results for all of the methods are given in Table 3. Figure 20 shows some of the results of the algorithms for the Elaine image.
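The thresholding rules named in the list and the SNR measure can be sketched as follows. The SNR function reads the denominator as the mean squared error between the original and the estimate; images are passed as flat lists of pixel values, and the wavelet transforms themselves are not shown. All names are illustrative.

```python
import math
import statistics

def hard_threshold(coeffs, t):
    """Keep coefficients with |c| > t and set the others to 0."""
    return [c if abs(c) > t else 0.0 for c in coeffs]

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward 0 by t; those with |c| <= t become 0."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def snr_db(original, estimate):
    """SNR(f, f_est) = 10 log10(var(f) / mean((f - f_est)^2)) for flat lists of pixels."""
    variance = statistics.pvariance(original)
    mse = statistics.fmean((a - b) ** 2 for a, b in zip(original, estimate))
    return 10 * math.log10(variance / mse)
```

Hard thresholding keeps surviving coefficients unchanged, whereas soft thresholding also shrinks them, which biases large coefficients but tends to produce fewer isolated spikes.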
FIGURE 19. Test images. From top left, clockwise: Peppers, Elaine, Goldhill, and Baboon.
Closeup images of some of the results are shown in Figure 21. Noticeably more detail is visible with two of our best-performing transforms, the curvelets based on the dyadic and on the time-domain Radon transforms (Figure 21e and f).
VII. APPLICATIONS TO IMAGE RECOGNITION
In this section, we study the application of some of the discrete Radon transforms developed above to image identification. The Fourier-based direct slice Radon transform, the discrete Hartley Radon transform, and the time-domain Radon transform have rotationally invariant features, which makes them well suited for invariant object recognition. Another discrete Radon transform that is geometrically faithful to the continuous Radon transform and avoids the use of interpolation is the fast slant stack Radon transform. Stark, Candès and Donoho (2002) reported that its use in the ridgelet and curvelet denoising algorithms was not as effective as that of the direct slice version. Since its rotation invariance properties for recognition have not been investigated before, we include it in this analysis.
TABLE 2
RESULTS OF WAVELET-BASED DENOISING ROUTINES (SNR IN dB)
Algorithm                                    Peppers  Goldhill    Elaine    Baboon  Comments
Noisy image                                     8.61      7.83      7.25      6.51
DWT Daubechies 6 with soft thresholding        13.15     11.06     12.10      3.73  Very many artefacts
SWT Daubechies 6 with soft thresholding        16.90     13.66     15.03      6.18  Few artefacts
DWT 9-7 filters with hard thresholding         14.41     11.87     13.79      3.95  Many artefacts
SWT 9-7 filters with hard thresholding         16.92     13.81     14.90      7.48  Many artefacts
Local ridgelet dyadic Radon, 8                 17.25     14.88     15.60      9.54  Very few artefacts
Local ridgelet (time-domain, 8, λ = 2)         17.05     14.62     15.58      9.17  Very few artefacts
TABLE 3
GENERALIZED LOCAL RIDGELET AND CURVELET METHODS IN TERMS OF SNR (IN dB)
Algorithm                                    Peppers  Goldhill    Elaine    Baboon
Noisy image                                     8.61      7.83      7.25      6.51
Local ridgelet (p = 7)                         17.10     14.79     15.32      9.38
Local ridgelet (dyadic, 8)                     16.46     14.45     14.87      9.27
Local ridgelet (modified dyadic, 8)            17.25     14.88     15.60      9.54
Local ridgelet (p-adic, 9)                     16.28     14.47     14.59      9.39
Local ridgelet (modified p-adic, 9)            16.36     14.51     14.68      9.36
Local ridgelet (time-domain, 8, λ = 0)         16.91     14.56     15.36      9.27
Local ridgelet (time-domain, 8, λ = 2)         17.05     14.62     15.58      9.17
Local ridgelet (8, QSE)                        16.89     14.65     15.27      9.49
Local ridgelet (8, QS)                         16.86     14.61     15.29      9.44
Local ridgelet (8, QF)                         16.29     14.21     15.17      8.43
Local ridgelet (8, QHel)                       16.35     14.22     15.05      8.81
Local ridgelet (8, QHar)                       17.11     14.82     15.40      9.52
Local ridgelet (8, QC)                         16.84     14.60     15.34      9.39
Curvelet transform (QSE)                       16.92     14.58     15.46      9.41
Curvelet transform (QF)                        16.03     13.99     15.16      8.38
Curvelet transform (QHar)                      16.88     14.63     15.48      9.44
Curvelet transform (dyadic)                    17.08     14.80     15.62      9.64
Local ridgelet (p = 17)                        16.55     14.32     14.89      8.98
Local ridgelet (dyadic, 16)                    15.90     14.06     14.46      9.04
Local ridgelet (modified dyadic, 16)           16.55     14.40     15.10      9.21
Local ridgelet (dyadic, 27)                    16.00     14.15     14.66      9.12
Local ridgelet (modified p-adic, 27)           16.08     14.30     14.68      9.34
Local ridgelet (time-domain, 16, λ = 0)        16.68     14.41     15.13      8.94
Local ridgelet (time-domain, 16, λ = 2)        17.00     14.61     15.45      9.29
Local ridgelet (16, QSE)                       16.64     14.54     14.95      9.54
Local ridgelet (16, QS)                        16.62     14.52     14.94      9.52
Local ridgelet (16, QF)                        15.79     13.96     14.42      8.86
Local ridgelet (16, QHel)                      15.44     13.42     14.16      8.36
Local ridgelet (16, QHar)                      16.83     14.63     15.08      9.51
Local ridgelet (16, QC)                        16.63     14.46     14.99      9.46
Curvelet transform (QS)                        16.86     14.71     15.48      9.52
Curvelet transform (QHel)                      15.14     13.28     14.36      8.13
Curvelet transform (QC)                        16.76     14.62     15.45      9.49
Curvelet transform (time-domain, λ = 2)        16.84     14.46     15.62      9.07
FIGURE 20. (a) Original noisy image of Elaine. (b) Local discrete prime Radon transform (p = 7). (c) Discrete dyadic Radon transform (8). (d) Time-domain Radon transform with λ = 2 (8).
Parts of the fast slant stack Radon transform have their origins in the seismic and radar imaging communities. Its formulation as a discrete analysis tool was developed in Averbuch et al. (in press), and its 3D version has appeared in Averbuch and Shkolnisky (2003).
Definition VII.1. Let f be an N × N image. The fast slant stack Radon transform of f is defined by
$$Rf = F_1^{-1}\,[Z_1, Z_2^T],$$
where $F_1^{-1}$ is the 1D inverse DFT along the columns,
$$Z_1(j_1, j_2) = \frac{4}{N^2}\sum_{k_1=-N/2}^{N/2-1}\sum_{k_2=-N/2}^{N/2-1} f(k_1,k_2)\, e^{-\frac{i\pi}{N}\left(j_1 k_1 - 2 j_1 j_2 k_2/N\right)}$$
for $-N \le j_1 \le N-1$, $-N/2 \le j_2 \le N/2-1$, and
$$Z_2(j_1, j_2) = \frac{4}{N^2}\sum_{k_1=-N/2}^{N/2-1}\sum_{k_2=-N/2}^{N/2-1} f(k_1,k_2)\, e^{-\frac{i\pi}{N}\left(j_2 k_2 - 2 j_2 j_1 k_1/N\right)}$$
for $-N/2 \le j_1 \le N/2-1$, $-N \le j_2 \le N-1$.
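For very small images, Definition VII.1 can be checked by direct summation. The sketch below evaluates $Z_1$ from its defining double sum (O(N^4) work, unlike the fast algorithm) and applies a length-2N inverse DFT along $j_1$ for one fixed $j_2$; the 1/(2N) normalisation of the inverse DFT, the array layout and the function names are assumptions. Relabelling the summation indices shows that $Z_2(j_1, j_2)$ equals $Z_1$ computed from the transposed image and evaluated at $(j_2, j_1)$, so the same routine covers both families.

```python
import cmath

def slant_stack_z1(f):
    """Direct O(N^4) evaluation of Z_1(j1, j2) from Definition VII.1.

    f is an N x N list of lists with f[a][b] = f(k1, k2), k1 = a - N/2, k2 = b - N/2.
    Returns a dict keyed by (j1, j2) with -N <= j1 <= N-1 and -N/2 <= j2 <= N/2-1.
    """
    n = len(f)
    half = n // 2
    z = {}
    for j1 in range(-n, n):
        for j2 in range(-half, half):
            acc = 0j
            for a in range(n):
                for b in range(n):
                    k1, k2 = a - half, b - half
                    acc += f[a][b] * cmath.exp(-1j * cmath.pi / n * (j1 * k1 - 2 * j1 * j2 * k2 / n))
            z[(j1, j2)] = 4 / n ** 2 * acc
    return z

def inverse_dft_along_j1(z, j2, n):
    """Length-2N inverse DFT over j1 for one fixed j2 (1/(2N) normalisation assumed).

    Returns the values at offsets t = -N, ..., N-1.
    """
    return [
        sum(z[(j1, j2)] * cmath.exp(2j * cmath.pi * j1 * t / (2 * n)) for j1 in range(-n, n)) / (2 * n)
        for t in range(-n, n)
    ]
```

For $j_2 = 0$ the phase reduces to $j_1 k_1$, so the inverse DFT returns $(4/N^2)\sum_{k_2} f(k_1, k_2)$ at offsets $k_1 = -N/2, \dots, N/2-1$ and zeros elsewhere, which is an easy way to test the indexing.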
FIGURE 21. Elaine. (a) Noisy image. (b) Original image. (c) Stationary orthogonal wavelet transform using Daubechies filter 6. (d) Stationary biorthogonal wavelet transform using 7/9 filters. (e) Curvelet using the dyadic Radon transform. (f) Curvelet using the time-domain Radon transform with λ = 2.
Because these discrete Radon transforms will be used as part of an image recognition algorithm, it is important to consider how efficiently they can be computed. For an N × N image, the fast slant stack Radon algorithm requires $O(N^2 \log N)$ operations (Averbuch et al., in press). The direct slice Radon transform and the Hartley Radon transform can be computed efficiently using the fast Fourier and fast Hartley transforms; the number of operations for these transforms is also of the order of $N^2 \log N$. The time-domain method and the standard Radon algorithm both use $O(MN^2)$ flops, where M is the number of angles. Although the time-domain approach uses $O(N^3)$ flops in the case M = N, the computations can be streamlined to reduce the number of flops to $O(N^2 \log N)$ (Brady, 1998). This reduction is possible because each shared partial sum is computed only once.
A. Translation Invariant Feature Vectors
The rapid transform described below is translation invariant and hence useful for image and signal processing. We shall use the rapid transform to obtain translation invariant feature vectors from the discrete Radon transforms of interest.
Definition VII.2. (See Burkhardt and Müller, 1980.) Let $n \in \mathbb{N}$, $N = 2^n$, and let m be a divisor of N. For each vector $x_0$ of length N, let $x_{(i,m)}$ be the ith of the m subvectors containing N/m consecutive elements of $x_0$. Let F be the transformation defined by F(x) = [y, z], where
$$y_i = x_i + x_{i+L/2}, \qquad z_i = |x_i - x_{i+L/2}|,$$
for i = 1, . . . , L/2, L being the length of the vector x. The rapid transform of $x_0$ is defined by the following sequential applications of F:
$$\begin{aligned} [x_{(1,2)}, x_{(2,2)}] &= F(x_0),\\ [x_{(1,4)}, x_{(2,4)}] &= F(x_{(1,2)}),\\ [x_{(3,4)}, x_{(4,4)}] &= F(x_{(2,2)}),\\ &\;\;\vdots\\ [x_{(2^n-1,2^n)}, x_{(2^n,2^n)}] &= F(x_{(2^{n-1},2^{n-1})}), \end{aligned}$$
with final output $\{x_{(1,2^n)}, x_{(2,2^n)}, \dots, x_{(2^n,2^n)}\}$.
To obtain a translation invariant feature vector from the Radon data we can also take the absolute value of the DFT along the angular direction, since a (cyclic) translation becomes a phase multiplication in the Fourier domain.
B. Construction of a Rotation Invariant Feature Vector
Applying the discrete Radon transforms of interest provides a coverage of essentially 180 degrees. In analogy with the standard Radon transform in the continuous setting, to obtain 360 degrees of coverage we adjoin to the output matrix its reflection across the last column.
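A minimal sketch of the rapid transform of Definition VII.2 follows, together with the DFT-magnitude alternative mentioned in Subsection A. The input length is assumed to be a power of two, and the function names are illustrative.

```python
import cmath

def rapid_transform(x):
    """Rapid transform (Definition VII.2) of a sequence whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    blocks = [list(x)]
    while len(blocks[0]) > 1:
        nxt = []
        for b in blocks:
            half = len(b) // 2
            y = [b[i] + b[i + half] for i in range(half)]       # y_i = x_i + x_{i+L/2}
            z = [abs(b[i] - b[i + half]) for i in range(half)]  # z_i = |x_i - x_{i+L/2}|
            nxt.extend([y, z])  # F(x) = [y, z], keeping the subvector order of the definition
        blocks = nxt
    return [b[0] for b in blocks]

def dft_magnitude(x):
    """|DFT(x)|; a cyclic shift of x only multiplies each DFT coefficient by a unit phase."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]
```

For example, `rapid_transform([1, 2, 3, 4])` first produces y = [4, 6] and z = [2, 2], and then [10, 2, 4, 0]; the cyclically shifted input [2, 3, 4, 1] yields the same output, which is the property exploited when building feature vectors from Radon projections.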
To obtain a rotation and translation invariant feature vector from the discrete Radon transforms we tested, we proceed as follows. After deriving the projections that cover 360 degrees, we center each projection around its center of mass (where by mass we mean the gray-scale intensity) by a cyclic shift of the projection. One method for obtaining a rotation invariant feature vector (as presented in Al-Shaykh and Doherty, 1996) is to take, for a given image f, the Radon projection matrix $(Rf_k[l])_{l,k}$, apply a singular value decomposition, and use the diagonal entries of the resulting singular value matrix as the feature vector. This vector is invariant under translation (since the singular values are arranged in decreasing order). It can be further refined by normalizing it to obtain scale invariance and by removing both the first singular value, which accounts for the DC coefficient (i.e., the mean value), and the small values, which are sensitive to noise. In our experiments with 64 × 64 images, we retained the singular values $\{\sigma_i\}_{i=2}^{17}$ as feature vectors. The angle of rotation can be estimated by calculating the shift of the left singular vectors. Another method for providing rotation invariant feature vectors is based on the application of the rapid transform (You and Ford, 1992). We propose applying Fourier descriptors to the projection matrix along the angular axis.
A widely used tool in image recognition for obtaining scale invariance is the Mellin transform M, defined by
$$Mf(s) = \int_0^\infty r^{s-1} f(r)\,dr,$$
where f(r) is the input function and s is in general a complex number; here s is chosen to be a real number that is not too large, to simplify the computation and to limit noise sensitivity. Thus, scale invariance can be obtained by applying the Mellin transform followed by either one of the above transforms.
Examples of the use of the standard Radon transform and the fast slant stack Radon algorithm are shown in Figure 22. Observe that if one extracts as feature vectors the sequences of singular values of the transposed projection matrices shown in Figure 22, the differences among the feature vectors extracted from the fast slant stack Radon algorithm are 0, whereas the differences among the feature vectors extracted from the standard Radon transform are not. The feature vectors provided by a more algebraically exact algorithm (i.e., one that involves no interpolation), such as the fast slant stack Radon algorithm, tend to be more similar across rotation angles than in the standard Radon transform case. Although the discrete Radon transforms used in this section may show more visually pronounced variations than the standard transform (as Figure 22c demonstrates), they offer many advantages in terms of speed and efficiency.
FIGURE 22. (a) Given image. (b) Radon projections. (c) Fast slant stack Radon.
C. Classifiers: Feedforward Artificial Neural Network and Nearest-Neighbor Network
We propose the use of a feedforward artificial neural network (ANN) architecture for classification consisting of three layers: the input layer, the hidden layer, and the output layer. Given a feature vector $x = (x_1, \dots, x_n)$, the input layer consists of n nodes. If m and l are the numbers of nodes in the hidden layer and the output layer, respectively, the output of the hidden layer at the ith node is given by
$$o_i^h = f\Big(\sum_{j=1}^{n} w_{ij}^h x_j\Big)$$
for i = 1, . . . , m, where f is the sigmoidal activation function
$$f(x) = \frac{1}{1 + e^{-x}}.$$
FIGURE 23. Illustration of a feedforward neural network.
Here $w_{ij}^h$ is the weight between the jth input unit and the ith hidden unit. The output of the output layer at the kth node is given by
$$o_k^o = f\Big(\sum_{i=1}^{m} w_{ki}^o o_i^h\Big)$$
for k = 1, . . . , l, where $w_{ki}^o$ is the weight between the ith hidden unit and the kth output unit. The input object is then classified according to which output node has the greatest value. An illustration of a feedforward neural network is shown in Figure 23. The network is trained to find the weights $w^h$ and $w^o$ by means of the backpropagation algorithm (Haykin, 1999). The parameters used in the experiments are n = 16, m = 14, and l = 3. The classification rate is calculated as the sum of the diagonal elements of the confusion matrix (i.e., the matrix whose columns represent the instances in a predicted class and whose rows represent the instances of the actual class), divided by the total number of examples given to the network.
Another simple network to use for classification is a nearest-neighbor network. The test feature vector $x_{\text{test}}$ is assigned the class label of the feature vector $x_{\text{train}}$ in the training set for which the angle between $x_{\text{test}}$ and $x_{\text{train}}$ is as small as possible.
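The classifier equations above translate directly into code. The sketch below covers only the forward pass of the three-layer network (training by backpropagation is omitted), the angle-based nearest-neighbor rule, and the classification rate computed from a confusion matrix; all names are illustrative.

```python
import math

def sigmoid(x):
    """Logistic activation f(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def ann_forward(x, w_hidden, w_out):
    """Forward pass of the three-layer network.

    w_hidden is m x n (entries w^h_ij) and w_out is l x m (entries w^o_ki).
    Returns the index of the output node with the greatest value, and all outputs.
    """
    hidden = [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in w_hidden]
    out = [sigmoid(sum(w * oh for w, oh in zip(row, hidden))) for row in w_out]
    return max(range(len(out)), key=lambda k: out[k]), out

def angle_between(a, b):
    """Angle in radians between two nonzero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norms = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return math.acos(max(-1.0, min(1.0, dot / norms)))  # clamp guards against rounding

def nearest_neighbor_label(x_test, training_set):
    """training_set: list of (feature_vector, label); returns the label at the smallest angle."""
    return min(training_set, key=lambda item: angle_between(x_test, item[0]))[1]

def classification_rate(confusion):
    """Sum of the diagonal of the confusion matrix divided by the total number of examples."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total
```

Using the angle rather than the Euclidean distance makes the nearest-neighbor rule insensitive to an overall scaling of the feature vectors.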
VIII. RECOGNITION EXPERIMENTS
We created an experimental dataset consisting of the letters X, Y, and Z, which can be presented at any desired rotation angle. The dataset is generated by rotating the linear segments at a very refined scale using rotation matrices and then pixelating, that is, quantizing to a discrete grid by rounding off the coordinates of the linear segments. This avoids the use of interpolation to rotate the images artificially and allows us to display a character at any angle. Examples of the characters obtained with this technique are shown in Figure 24. Examples of tested noisy images rotated at angles of 0, 30, and 60 degrees are shown in Figure 25. Because this kind of dataset lets us test all possible rotations, we can determine the limitations of our recognition techniques.
We discovered that the feature vectors extracted by taking the SVD of the standard Radon transform did not provide complete class separation for our original dataset. By this we mean that when we trained the neural network on a set of objects at zero rotation angle and then tested the recognition performance using these feature vectors on the images rotated in increments of 5 degrees with no noise added, we achieved a recognition rate below 100%. The problem was that the feature vectors did not clearly separate the classes of objects for all rotation angles. We therefore adjusted the images so that each object had a slightly different gray scale. This modified dataset conformed more closely to the dataset tested in Al-Shaykh and Doherty (1996), which consists of images with varying gray scale. After doing so, we were able to classify correctly for all rotation angles using the standard Radon transform method.
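The character-generation procedure described above (rotate at a fine scale, then round to the pixel grid) can be sketched as follows. The letter coordinates, the 64 × 64 grid and the number of sample points per segment are hypothetical choices for illustration, not the values used to build the chapter's dataset.

```python
import math

# Hypothetical letter "X": two diagonal segments on a 64 x 64 grid, as ((x0, y0), (x1, y1)).
X_SEGMENTS = [((8.0, 8.0), (55.0, 55.0)), ((8.0, 55.0), (55.0, 8.0))]

def rasterize_rotated(segments, angle_deg, size=64, samples=400):
    """Rotate line segments about the grid centre at a fine scale, then pixelate by rounding."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    centre = (size - 1) / 2
    img = [[0] * size for _ in range(size)]
    for (x0, y0), (x1, y1) in segments:
        for t in range(samples + 1):
            u = t / samples  # dense sampling along the segment (the "refined scale")
            x = x0 + u * (x1 - x0) - centre
            y = y0 + u * (y1 - y0) - centre
            xr = cos_t * x - sin_t * y + centre  # 2 x 2 rotation matrix applied to (x, y)
            yr = sin_t * x + cos_t * y + centre
            row, col = round(yr), round(xr)  # quantize to the discrete grid
            if 0 <= row < size and 0 <= col < size:
                img[row][col] = 1
    return img
```

Because the rotation is applied to the continuous segment coordinates before rounding, no image interpolation is involved, and any angle can be generated.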
FIGURE 24. Example of database of images rotated at angles from 0 to 360 degrees.
FIGURE 25. Test images rotated at angles of 0, 30, and 60 degrees.
When dealing with the neural network trained by backpropagation, we discovered that using just one example of each object was inadequate for training the network to be robust against a small amount of noise. For this reason, we trained the network by providing an example of each object rotated at angles of 0, 5, and 10 degrees to improve convergence. With this adjusted training set, the network seems to perform well when using the standard Radon transform. We used the same set to train other neural networks to classify objects by means of the fast slant stack, direct slice, Hartley, and time-domain Radon transforms. The performance results given in Figure 26 were calculated as a function of the signal-to-noise ratio (SNR). The single SNR value was calculated by averaging the individual SNR values of the different images. For the other recognition techniques, the feature vectors were not influenced by the gray scale, so we tested the remaining classifiers on images of objects with equal gray-scale values. In addition, the single SNR value then better reflects the signal-to-noise ratio of each image. The results of the classification obtained by applying the rapid transform to the various Radon transforms are shown in Figure 27. Figure 28 displays the results of the classification based on the Fourier descriptors after performing the various Radon transforms. Figures 29 and 30 show the performance of the classification when the ridgelet transforms are used. More results of similar experiments can be found in Easley and Colonna (2004).
FIGURE 26. Performance of ANN given by singular values as feature vectors.
FIGURE 27. Rapid transform nearest-neighbor results.
FIGURE 28. Fourier descriptor nearest-neighbor results.
FIGURE 29. Ridgelet Fourier descriptor nearest-neighbor results.
FIGURE 30. Ridgelet rapid transform nearest-neighbor results.
IX. CONCLUSION
In this chapter, we have extended the versions of the discrete Radon transform found in the literature and shown that the new transforms are well suited for many image-processing applications. We developed the generalized discrete ridgelet transform and demonstrated its usefulness as a denoising tool. We obtained a closed-form formula for inverting the discrete p-adic Radon transform and determined its frame bounds. We also derived a modified transform with a lower condition number when the domain is restricted to zero-mean functions. Our experiments with the various choices of transforms indicate that the curvelet version performs better in terms of ENL than the local ridgelet transforms. Such a difference in performance between curvelets and local ridgelets did not arise for the artificially corrupted images. During the development of the algorithms we noticed that their performance varies with the particular wavelet and orthogonal matrix chosen. For comparison purposes we used one fixed wavelet (Symmlet with four vanishing moments) rather than tuning this choice to optimize the results. In many cases, some of the new ridgelet and curvelet variants performed better than the original versions. The success of our versions of the Radon transform can be explained by the fact that the corresponding Radon matrices behave as smoothing matrices that decouple the 2D representation into a set of 1D representations, in analogy with the Radon transform in the continuous setting.
We applied the direct slice Radon and the discrete Hartley transforms to image recognition. The results show that all the proposed methods work well in the presence of a very high amount of noise. The best performance came from applying the Fourier descriptor or the rapid transform to the various ridgelet transforms. In some cases, the direct slice and the fast slant stack Radon transforms performed best. Zero padding was not implemented when computing the direct slice and the Hartley Radon transforms. This caused some slight wraparound artefacts in the background of the projection matrices for some angles. It also means that these projection matrices were the smallest among the various alternatives, so a reduction in dimensionality occurred when extracting the feature vectors from these transforms, which is one of the reasons for their success. The time-domain method was very successful in the case of the Fourier descriptors. In fact, it had a 100% classification rate down to an SNR value of approximately 0.2, performing much better than the standard Radon transform. Overall, we have suggested some very efficient and effective invariant object recognition algorithms. A newly developed transform not yet tested is a new fast polar transform (Averbuch et al., 2003), which uses interpolation to convert the pseudopolar grid from the intermediate step of the fast slant stack algorithm to a polar grid. We intend to study the use of this transform for image recognition in the future.
ACKNOWLEDGEMENTS
Most definitions, theorems, and proofs on the generalized discrete Radon transform given in Sections IV and V have been adapted from Colonna and Easley (2005), with kind permission of Springer Science and Business Media.
REFERENCES
Al-Shaykh, O.K., Doherty, J.F. (1996). Invariant image analysis based on Radon transform and SVD. IEEE Trans. on Circuits and Systems II 43, 123–133.
Arsenault, H.H., April, G. (1976). Properties of speckle integrated with a finite aperture and logarithmically transformed. J. Opt. Soc. Am. 66, 1160–1163.
Averbuch, A., Shkolnisky, Y. (2003). 3D Fourier based discrete Radon transform. Appl. Comput. Harmonic Anal. 15, 33–69.
Averbuch, A., Coifman, R., Donoho, D., Elad, M., Israeli, M. (2003). Accurate and fast discrete polar Fourier transform. In: Proc. 37th Asilomar Conf. Signals, Systems, and Computers, vol. 2. IEEE, pp. 1933–1937.
Averbuch, A., Coifman, R., Donoho, D., Israeli, M., Waldén, J. (in press). Fast slant stack: A notion of Radon transform for data in a Cartesian grid which is rapidly computable, algebraically exact, geometrically faithful and invertible. SIAM J. Sci. Comput.
Benedetto, J.J., Ferreira, P.J.S.G. (Eds.) (2001). Modern Sampling Theory: Mathematics and Applications. Applied and Numerical Harmonic Analysis Series. Birkhäuser, Boston, MA.
Berenstein, C.A., Casadio-Tarabusi, E., Cohen, J.M., Picardello, M.A. (1991). Integral geometry on trees. Am. J. Math. 113, 441–470.
Betori, W., Faraut, J., Pagliacci, M. (1989). An inversion formula for the Radon transform on trees. Math. Z. 201, 327–337.
Betori, W., Pagliacci, M. (1986). The Radon transform on trees. Bol. Unione Mat. Ital. B 5, 267–277.
Bolker, E. (1987). The finite Radon transform. In: Contemp. Math., vol. 63, pp. 27–50.
Brady, M.L. (1998). A fast discrete approximation algorithm for the Radon transform. SIAM J. Comput. 27, 107–119.
Burkhardt, H., Müller, X. (1980). On invariant sets of a certain class of fast translation-invariant transforms. IEEE Trans. Acoust. Speech Signal Proc. 28, 517–523.
Candès, E.J. (1999a). Monoscale ridgelets for the representation of images with edges. Technical Report. Department of Statistics, Stanford University, Stanford, CA.
Candès, E.J. (1999b). Ridgelets and their derivatives: Representation of images with edges. In: Cohen, A., Rabut, C., Schumaker, L. (Eds.), Curves and Surface Fitting: Saint-Malo 1999. Vanderbilt University Press, Nashville, TN.
Candès, E.J., Donoho, D.L. (2002). New tight frames of curvelets and optimal representations of objects with piecewise-C2 singularities. Comm. Pure Appl. Math. 57, 219–266.
Candès, E., Romberg, J., Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory 52, 489–509.
Candès, E.J., Demanet, L., Donoho, D.L., Ying, L. (2005). Fast discrete curvelet transforms. Multiscale Model. Simul. 5, 861–899.
Casadio-Tarabusi, E., Cohen, J., Colonna, F. (2000). Range of the horocyclic Radon transform on trees. Ann. Inst. Fourier Grenoble 50, 211–234.
Cohen, J.M., Colonna, F. (1993). The functional analysis on the X-ray transform on trees. Adv. Appl. Math. 14, 123–138.
Colonna, F., Easley, G.R. (2004). The multichannel deconvolution problem: A discrete analysis. J. Fourier Anal. Appl. 10, 351–376.
Colonna, F., Easley, G.R. (2005). Generalized Radon transforms and their use in the ridgelet transform. J. Math. Imaging Vis. 23, 145–165.
GENERALIZED DISCRETE RADON TRANSFORMS
237
Cormack, A.M. (1963). Representation of a function by its line integrals, with some radiological applications. J. Appl. Phys. 34, 2722–2727. Cormack, A.M. (1964). Representation of a function by its line integrals, with some radiological applications II. J. Appl. Phys. 35, 195–207. Davis, P.J. (1979). Circulant Matrices Pure and Applied Mathematics. John Wiley and Sons, New York. Deans, S. (1983). The Radon Transform and Some of Its Applications. John Wiley and Sons, New York. Dewaele, P., Wambacq, P., Oosterlinck, A., Marchand, J.L. (1990). Comparison of some speckle reduction techniques for SAR images. IGARSS 10, 2417–2422. Do, M.N., Vetterli, M. (2000a). Orthonormal finite ridgelet transform for image compression. Proc. Int. Conf. Image Processing 2, 367–370. Do, M.N., Vetterli, M. (2000b). Image denoising using orthonormal finite ridgelet transform. Proc. SPIE Conf. Wavelet Appl. Signal and Image Processing VIII 4119. San Diego, CA. Do, M.N., Vetterli, M. (2001). The finite ridgelet transform for image representation. Technical Report DSC/2001/019. Communication Systems Department EPFL. Donoho, D.L. (2001). Sparse components of images and optimal atomic decomposition. Constr. Approx. 17, 353–382. Donoho, D.L., Stark, P.B. (1989). Uncertainty principles and signal recovery. SIAM J. Appl. Math. 49, 906–931. Easley, G.R., Berenstein, C.A., Healy, D.M. (2005a). Deconvolution in a ridgelet and curvelet domain. In: Proc. SPIE, Independent Component Analyses, Wavelets, Unsupervised Smart Sensors, and Neural Networks III, vol. 5818. SPIE, Orlando, FL. Easley, G.R., Healy, D.M., Berenstein, C.A. (2005b). Image deconvolution in a general ridgelet and curvelet domain. Preprint. Easley, G.R., Colonna, F. (2004). Invariant object recognition based on the generalized discrete Radon transforms. Proc. SPIE, Independent Component Analyses, Wavelets, Unsupervised Smart Sensors, and Neural Networks II 5439. Orlando, FL. Easley, G.R., Labate, D., Lim, W. (2006). 
Optimally sparse image representations using shearlets. In: Proc. 40th Asilomar Conf. Signals, Systems, and Computers. IEEE, pp. 974–978. Easley, G.R., Labate, D., Lim, W. (in press). Sparse directional image representations using the discrete shearlet transform. Appl. Comput. Harmonic Anal. Flandrin, P. (1999). Time-Frequency/Time-Scale Analysis. Academic Press, New York. Fourier, J.B.J. (1822). Théorie Analytique de la Chaleur. In: Oeuvres I, with notes by G. Darboux.
238
EASLEY AND COLONNA
Fukuda, S., Hirosawa, H. (1997). Multiresolution analysis and processing of synthetic aperture radar images using wavelets. Geoscience and Remote Sensing Symp., Proc. IGARSS 1997, IEEE Int. Conf. 3, 1187–1189. Fukumi, M., Omatu, S., Nishikawa, Y. (1997). Rotation-invariant neural pattern recognition system estimating a rotation angle. IEEE Trans. Neural Networks 8, 568–581. Gertner, I. (1988). A new efficient algorithm to compute the two-dimensional discrete Fourier transform. IEEE Trans. Acoust. Speech Signal Proc. 36, 1036–1050. Gröchenig, K. (1993). A discrete theory of irregular sampling. Linear Algebra Appl. 193, 129–150. Guo, K., Labate, D. (2007). Optimally sparse multidimensional representation using shearlets. SIAM J. Math. Anal. 39, 298–318. Guo, H., Odegard, J.E., Lang, M., Gopinath, R.A., Selesnick, I.W., Burrus, C.S. (1994). Wavelet based speckle reduction with application to SAR based ATD/R. Image Processing, Proc. ICIP-94, IEEE Int. Conf. 1, 75– 79. Haar, A. (1910). Zur Theorie der onthogonalen Funktionensysteme. Mathematische Annalen 69, 331–371. Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Prentice Hall, Upper Saddle River, New Jersey. Helgason, S. (1980). The Radon Transform. Progr. Math., vol. 5. Birkhäuser, Boston. Hsung, T.C., Lun, D.P.K., Siu, W.C. (1996). The discrete periodic Radon transform. IEEE Trans. Signal Proc. 44, 2651–2657. Khotanzad, A., Hong, Y.H. (1990). Invariant image recognition by Zernike moments. IEEE Trans. Patt. Anal. 12, 489–497. Lu, Y.H., Loh, Y.L., Yeo, T.S., Zhang, C.B. (1999). Speckle reduction by wavelet transform. Microwave Conference, 1999 Asia Pacific 2, 542–545. Lun, D.P.K., Hsung, T.C., Shen, T.W. (2003). Orthogonal discrete periodic Radon transform, Part I: Theory and realization. Signal Processing (EURASIP) 83, 941–955. Elsevier Science, Amsterdam. Kak, A.C., Slaney, M. (1987). Principles of Computerized Tomography Imaging. IEEE Press. Kung, J.P.S. (1987). 
Matchings and Radon transforms in lattices. II. Concordant sets. Math. Proc. Camb. Phil. Soc. 101, 221–231. Macovski, A., Meyer, C. (1986). A novel fast-scanning system. In: Proc. Fifth Annual Meeting of the Society of Magnetic Resonance in Medicine. Society of Magnetic Resonance, New York, pp. 156–157. Mallat, S. (1998). A Wavelet Tour of Signal Processing. Academic Press, New York. Natterer, F. (2001). The Mathematics of Computerized Tomography. Classics in Applied Math. SIAM, Philadelphia.
GENERALIZED DISCRETE RADON TRANSFORMS
239
Ng, M.K., Chan, R.H., Tang, W. (1999). A fast algorithm for deblurring models with Neumann boundary conditions. SIAM J. Sci. Comput. 21, 851–866. Pun, C.M., Lee, M.C. (2003). Log-polar wavelet energy signatures for rotation and scale invariant texture classification. IEEE Trans. Patt. Anal. 25, 590– 603. Radon, J. (1917). Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Berichte Sächsische Akademie der Wissenschaften, Leipzig, Math.-Phys. Kl. 69, 262–267 [reprinted in Helgason, 1980, 177–192]. Sha, L., Guo, H., Song, A.W. (2003). An improved gridding method for spiral MRI using nonuniform fast Fourier transform. J. Magn. Reson. 162, 250– 258. Stark, J.L., Candès, E.J., Donoho, D.L. (2002). The curvelet transform for image denoising. IEEE Trans. Image Proc. 11, 670–684. Sveinsson, J.R., Benediktsson, J.A. (2000). Speckle reduction of SAR images using wavelet-domain hidden Markov models. Geoscience and Remote Sensing Symp., Proc. IGARSS 2000, IEEE Int. Conf. 4, 1666–1668. Sveinsson, J.R., Benediktsson, J.A. (2001). Speckle reduction of SAR images in the complex wavelet-domain. Geoscience and Remote Sensing Symp., Proc. IGARSS 2001, IEEE Int. Conf. 5, 2346–2348. Ulfarsson, M.O., Sveinsson, J.R., Benediktsson, J.A. (2002). Speckle reduction of SAR images in the curvelet domain. Geoscience and Remote Sensing Symp., Proc. IGARSS 2002, IEEE Int. Conf. 1, 315–317. Walnut, D.F. (2002). An Introduction to Wavelet Analysis. Birkhäuser, Boston. You, S.D., Ford, G.E. (1992). Object recognition based on projection. International Joint Conference on Neural Networks 4, 31–36.
This page intentionally left blank
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 151
Lie Algebraic Methods in Charged Particle Optics
TOMÁŠ RADLIČKA
Institute of Scientific Instruments AS CR, Královopolská 147, 612 64 Brno, Czech Republic
I. Introduction
II. Trajectory Equations
  A. Newton Formulation
  B. Lagrangian Approach
  C. Hamiltonian Approach
  D. Hamilton–Jacobi Approach
III. The Field Computation
  A. The Basic Equations
  B. The Field in the Vicinity of the Optic Axis
IV. Trajectory Equations: Solution Methods
  A. Paraxial Approximation
    1. Paraxial Approximation from the Trajectory Equation
    2. The Hamiltonian Formulation of the Paraxial Approximation
  B. The Paraxial Transformation: General Dispersion Case
  C. Polynomial Form of Solution
  D. Numerical Methods
V. The Analytic Perturbation Method
  A. The Differential Algebraic Method
  B. The Trajectory Method
  C. The Eikonal Method
    1. The First-Order Perturbation
    2. The Second-Order Perturbation
  D. The Lie Algebraic Method
  E. Dispersion and Chromatic Aberration
    1. The Trajectory Method
    2. The Lie Algebraic Method
    3. The Differential Algebraic Method
  F. Example: Round Magnetic Lens
    1. The Eikonal Method
    2. The Lie Algebraic Method
    3. The Trajectory Method
    4. Comparison of Methods
VI. The Symplectic Classification of Geometric Aberrations
  A. Aberrations and Lie Transformations
  B. Aberrations and Paraxial Approximation
  C. Lie Algebra h2
  D. Representation of h2 on the Space of Complex Polynomials
  E. Example: Decomposition of Polynomial Space up to Fourth Order
    1. Polynomials of the Third Order
    2. Polynomials of the Fourth Order
  F. Representation of h2 on the Real Space of Polynomials
  G. Example: Real Polynomials up to the Fourth Order
    1. The Third-Order Polynomials
    2. Polynomials of the Fourth Order
  H. Decomposition of Polynomials with Respect to M1
  I. The Third-Order Axially Symmetric Aberrations
  J. Reflection Symmetry
  K. Changing Parameterization
VII. Axial Symmetric Aberrations of the Fifth Order
  A. Axial Symmetric Polynomial of the Sixth Order
  B. Algebra of the Axial Symmetric Polynomials up to the Sixth Order
  C. Interaction Hamiltonian of a Round Magnetic Lens
  D. Calculation of g4 and g6
  E. Spherical Aberration of the Fifth Order
  F. Distortion: Seidel Weight −5/2
  G. Coma: Seidel Weight 3/2
  H. Peanut Aberration: Seidel Weight 1/2
  I. Elliptical Coma: Seidel Weight −1/2
  J. Astigmatism and Field Curvature: Seidel Weight −3/2
Appendix A. The Hamiltonian Transformation
Appendix B. The Form of the Interaction Hamiltonian for the Round Magnetic Lens
References
ISSN 1076-5670 DOI: 10.1016/S1076-5670(07)00404-1
Copyright 2008, Elsevier Inc. All rights reserved.
I. INTRODUCTION
Calculations of electron optical systems are essential for the design of electron microscopes, lithography systems, and other devices. The optical properties are calculated in the approximation of geometrical optics; quantum effects are negligible in most cases. The standard computation methods start either from the trajectory equation, which is derived from the equation of motion of a particle in an electromagnetic field, or from the eikonal equation. Unfortunately, neither equation can be solved analytically except in trivial cases. The paraxial approximation provides the first insight into the properties of an optical system. This linear approximation of the trajectory equation is also used in light optics, and the description of the paraxial properties is essential for the basic design of a system. However, neglecting the higher-order terms of the trajectory equation is much more problematic in electron optics than in light optics. In light optics the maximal resolution is limited by the wavelength of light, which lies between 400 and 800 nm; details smaller than this cannot be resolved. The wavelength of electrons, on the other hand, is of the order of picometers, depending on their energy. Hence, in electron optics the resolution is limited mainly by the nonlinear terms of the trajectory equation.
LIE ALGEBRAIC METHODS
A consequence of the nonlinear terms in the trajectory equation is that the optical system does not image a point into a point. This property of optical systems is already known from light optics, where it was described by Seidel in 1856 (Seidel aberrations). However, electron optical systems contain more complicated elements, and their nonlinear properties are therefore also more complicated: elements with lower than axial symmetry are often used, and because of the magnetic field the electron rays are not perpendicular to the wave surfaces. As a consequence, electron optics exhibits aberrations that have no analogue in light optics. For these reasons it was necessary to develop new perturbation methods suitable for describing the nonlinear properties of electron optical systems. These methods are the trajectory method, the eikonal method, and the Lie algebraic method. All are analytical perturbation methods, and each has its own advantages and disadvantages. The trajectory method is based on an iterative solution of the trajectory equations, the eikonal method is derived from the perturbation method for the eikonal, and the Lie algebraic method is based on the canonical perturbation method. The trajectory and eikonal methods have been used since the early days of electron optics: both were introduced in the 1930s, the trajectory method by Scherzer (1933) and the eikonal method by Glaser (1935). The eikonal method is used for the optical calculation of more complicated systems; it was explained pedagogically by Hawkes and Kasper (1989) in their textbook of electron optics, and its practical use was demonstrated by Rose (2004) in the calculation of aberration correctors. An alternative method was developed by Dragt et al. at the University of Maryland in the 1980s.
The basic concepts of the Lie algebraic method were published in a series of articles (Dragt, 1987; Dragt and Forest, 1986a, 1986b; Dragt et al., 1988). The method was developed as a tool for describing nonlinear behavior in accelerators, where the global stability of the system is the most important property. This problem is connected with the theory of dynamical systems, in which the Hamiltonian formalism is a very common and powerful tool. The stability of the motion in phase space under the transfer map is investigated, and the normal form method is the standard tool for describing the stability of the system (Bazzani, 1988; Bazzani et al., 1994). Even though the method is used primarily in the accelerator community, its good properties in high-order aberration calculations led to its application in electron optics. First, the axially symmetric magnetic field was treated (Dragt, 1987; Dragt and Forest, 1986a); deflection systems followed later (Hu and Tang, 1998, 1999). The goal of all these works was to find new results, mainly for high-order aberrations. There is no work that compares the Lie algebraic method with the standard methods of electron optics, and such a comparison is the main aim of the present work. First, we introduce each method, and then we compare the procedures for solving a simple round lens.
The second chapter contains the basic approaches to describing the properties of optical systems, particularly the different forms of the trajectory equation and their derivations. In the third chapter, the basic properties of the fields used in charged particle optics are summarized, and particular forms of the series expansions of the fields in the vicinity of the optic axis are presented. The next chapter deals with methods for solving the trajectory equations. Most of the chapter describes the paraxial approximation for the monoenergetic and the general dispersion case; the form of the solution is described in both Lagrangian and Hamiltonian variables, and a procedure for the general case is outlined at the end of the chapter. The fifth chapter provides an overview of the perturbation methods. After a short introduction to the differential algebraic method, the trajectory, eikonal, and Lie algebraic methods are presented in more detail. We show that the Lie algebraic method is based on canonical perturbation theory, and we demonstrate a significant relationship between the factorization theorem and canonical perturbation theory. We also extend the Lie algebraic method to the parameterization by the positions in the object and aperture planes, which is often used in electron microscopy. All the perturbation methods mentioned are applied to a simple but important example, a round magnetic lens. The sixth chapter describes the structure of the aberration polynomials according to the representation of Sp(2, R). The way in which the classification is connected to the form of the paraxial approximation affects the relationship among the aberration coefficients.
Because of the complexity of the general classification, we concentrate on the classification of stigmatic systems. The classification of the aberration polynomials is described as a representation of the Lie group adjoint to the algebra of quadratic polynomials determined by the quadratic part of the Hamiltonian; this representation is presented explicitly. The last sections cover the symmetry of the aberration polynomials with respect to reflection and to a change of parameterization. The last chapter deals with the description and calculation of the fifth-order geometric aberrations of an axially symmetric magnetic lens. We present the analytical form of all coefficients. The results are applied to a simple magnetic lens, and the value of the spherical aberration coefficient is compared with results calculated by ray tracing.
II. TRAJECTORY EQUATIONS
The motion of particles in an electromagnetic field is completely determined by the equations of motion. Unfortunately, their mathematical form does not describe the optical properties in an appropriate manner: using time as the parameter when only the trajectories of the particles are relevant leads to clumsy formulations without direct optical meaning. For this reason, the time parameterization of trajectories is replaced by another parameterization (e.g., by the position along the optic axis). The equations of motion, which contain time derivatives, are then replaced by the trajectory equation, which contains derivatives with respect to the new independent variable. The solutions of such equations are just the particle trajectories, not the motion of the particles; however, once the rays are known, the velocity of a particle at a given point can easily be calculated. The z axis of the coordinate system coincides with the optic axis, and the coordinates x and y span the transverse plane. Optical devices are constructed so that a particle of a given energy that starts on the optic axis with zero slope remains on the axis. Such a particle is called the design particle, its energy the design energy, and its trajectory the design trajectory. Unless stated otherwise, we restrict ourselves to systems with a straight optic axis. The treatment of the trajectory equations differs according to which formulation of mechanics is used; we review some of them.

A. Newton Formulation
Newton's second law, relativistically modified for an electron in an electromagnetic field (Jackson, 1998),
$$\frac{d}{dt}\left(\gamma m\dot{\mathbf r}\right) = -e\mathbf E - e\,\dot{\mathbf r}\times\mathbf B, \tag{1}$$
completely describes the evolution of the system. The standard notation for the time derivative is used, $\gamma = (1 - v^2/c^2)^{-1/2}$, $\mathbf r = (x, y, z)$, the electron charge is $-e = -1.602\cdot 10^{-19}$ C, and the electron mass is $m = 9.109534\cdot 10^{-31}$ kg.
Now we switch to parameterizing the trajectory by the axis coordinate z. Only time-independent forces are considered. We begin with the time-parameterized trajectory $\zeta(t) = (x(t), y(t), z(t))$ and suppose that it can be reparameterized by the axis coordinate as $\zeta(z) = (x(z), y(z), z)$. The time derivative of the trajectory is then
$$\frac{d\zeta}{dt} = \frac{dz(t)}{dt}\frac{d\zeta}{dz} = v_z\frac{d\zeta}{dz},$$
or, more generally, the time derivative along a trajectory takes the form
$$\frac{d}{dt} = v_z\frac{d}{dz}. \tag{2}$$
The z component of the velocity reads
$$v_z = \frac{v}{\sqrt{1 + q'^2}}, \tag{3}$$
with the notation $\mathbf q = (x, y)^T$ for the vector of transverse deviations of the trajectory from the optic axis and $\mathbf q' = d\mathbf q/dz$. The speed v is directly related to the kinetic energy of the particle, which can be expressed using the scalar potential Φ. If all particles have the same energy, the additive constant in Φ can be chosen so that eΦ coincides with the kinetic energy of the particle,
$$T = e\Phi. \tag{4}$$
Because the total energy can be written in the form
$$E = \gamma mc^2 = T + mc^2, \tag{5}$$
using Eq. (4) the relativistic factor γ may be written
$$\gamma = 1 + \frac{e\Phi}{mc^2}. \tag{6}$$
The kinetic momentum and the energy are connected by the relation
$$\frac{E^2}{c^2} - g^2 = m^2c^2; \tag{7}$$
hence, using Eqs. (7), (5), and (4), we can write
$$g = \frac{e}{\eta}\Phi^{*1/2}, \tag{8}$$
where $\eta = \sqrt{e/2m}$ and the acceleration potential
$$\Phi^* = \Phi\left(1 + \frac{e\Phi}{2mc^2}\right) \tag{9}$$
were introduced. Now we can express the velocity from the kinetic momentum,
$$v = \frac{g}{m\gamma} = \frac{e\Phi^{*1/2}}{\eta\gamma m}, \tag{10}$$
and using Eq. (3), the z component of the velocity is found to be
$$v_z = \frac{e\Phi^{*1/2}}{m\eta\gamma\sqrt{1 + q'^2}}.$$
From Eq. (2), we find
$$\frac{d}{dt} = \frac{e\Phi^{*1/2}}{m\eta\gamma\sqrt{1 + q'^2}}\,\frac{d}{dz}. \tag{11}$$
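As a numerical illustration of Eqs. (3)–(10), the following minimal Python sketch evaluates γ, Φ*, the kinetic momentum g, and the speed v for an electron accelerated through a potential Φ, together with the de Broglie wavelength λ = h/g that sets the picometer scale mentioned in the Introduction. It assumes CODATA 2018 values for the constants (slightly more precise than the rounded values quoted above); the function names are illustrative and not part of the text.

```python
import math

# Physical constants in SI units (CODATA 2018; an assumption of this sketch,
# the text quotes slightly rounded values)
E_CHARGE = 1.602176634e-19   # C, magnitude e of the electron charge
M_E = 9.1093837015e-31       # kg, electron rest mass m
C_LIGHT = 299792458.0        # m/s
H_PLANCK = 6.62607015e-34    # J s

ETA = math.sqrt(E_CHARGE / (2.0 * M_E))  # eta = sqrt(e / 2m), Eq. (8)


def kinematics(phi):
    """Return gamma, Phi*, g, v and the de Broglie wavelength for T = e*phi (phi in volts)."""
    mc2 = M_E * C_LIGHT**2
    gamma = 1.0 + E_CHARGE * phi / mc2                     # Eq. (6)
    phi_star = phi * (1.0 + E_CHARGE * phi / (2.0 * mc2))  # Eq. (9)
    g = E_CHARGE / ETA * math.sqrt(phi_star)               # Eq. (8), kinetic momentum
    v = g / (M_E * gamma)                                  # Eq. (10)
    wavelength = H_PLANCK / g                              # lambda = h / g
    return gamma, phi_star, g, v, wavelength


def axial_velocity(phi, slope_sq):
    """v_z from Eq. (3); slope_sq is |q'|^2 = x'^2 + y'^2."""
    v = kinematics(phi)[3]
    return v / math.sqrt(1.0 + slope_sq)


if __name__ == "__main__":
    for kilovolts in (10, 100, 300):
        gamma, phi_star, g, v, lam = kinematics(kilovolts * 1e3)
        print(f"{kilovolts:3d} kV: gamma = {gamma:.4f}, Phi* = {phi_star:.4e} V, "
              f"v/c = {v / C_LIGHT:.4f}, lambda = {lam * 1e12:.3f} pm")
```

At 100 kV the sketch gives λ of about 3.7 pm. The consistency check worth keeping in mind is Eq. (7): with E = γmc², the momentum g computed from Φ* must satisfy (γmc)² − g² = (mc)².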
Substituting into Eq. (1) yields (after some straightforward calculations) the trajectory equations
$$\frac{d}{dz}\left(\frac{\Phi^{*1/2}\,\mathbf q'}{\sqrt{1 + q'^2}}\right) = \frac{\sqrt{1 + q'^2}}{2\Phi^{*1/2}}\,\gamma\nabla\Phi + \eta\begin{pmatrix} B_y - y'B_z \\ -B_x + x'B_z \end{pmatrix}, \tag{12}$$
where ∇Φ stands for the transverse gradient (∂Φ/∂x, ∂Φ/∂y)^T. Note that this form of the trajectory equations can be used only if the trajectory can be parameterized by the axis coordinate; otherwise, as in the case of an electric mirror, some other parameterization must be adopted (Hawkes and Kasper, 1989).
For a beam of particles with different energies, a new quantity δ, the deviation of the particle energy from the design energy, must be introduced. As the system is time independent, δ remains constant along each ray. For the scalar potential to be determined uniquely, eΦ must coincide with the kinetic energy of a particle with the design energy. The relationship between the relativistic factor γ and the scalar potential Φ is then modified to
$$\gamma = 1 + \frac{e\Phi}{mc^2} + \frac{\delta}{mc^2} = \gamma_0 + \frac{\delta}{mc^2}, \tag{13}$$
and Eq. (8) similarly becomes
$$g = \frac{e}{\eta}\left(\Phi^* + \frac{\gamma_0\delta}{e} + \frac{\delta^2}{2mec^2}\right)^{1/2}. \tag{14}$$
The subscript 0 denotes the value for particles with the same energy as the design particle, that is, $\gamma_0 = 1 + e\Phi/mc^2$. The form of the trajectory equation (12) remains unchanged; only the acceleration potential must be replaced by
$$\Phi^* + \frac{\gamma_0\delta}{e} + \frac{\delta^2}{2mec^2}, \tag{15}$$
and the relativistic factor γ by Eq. (13).
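A minimal numerical sketch of Eq. (12), assuming a constant potential Φ (so that ∇Φ = 0) and a uniform axial magnetic field B = (0, 0, B_z), follows. Off-energy particles are handled by replacing Φ* with expression (15). To turn Eq. (12) into an explicit second-order system, the code uses the identity d/dz(q'/s) = s⁻¹(I − q'q'ᵀ/s²)q'' with s = (1 + q'²)^{1/2}, whose inverse gives q'' = s(I + q'q'ᵀ)w, where w denotes the right-hand side of Eq. (12) divided by Φ*^{1/2}. The integrator (classical fourth-order Runge–Kutta), the field strength, the step count, and all function names are illustrative choices, not taken from the text.

```python
import math

E_CHARGE = 1.602176634e-19   # C
M_E = 9.1093837015e-31       # kg
C_LIGHT = 299792458.0        # m/s
ETA = math.sqrt(E_CHARGE / (2.0 * M_E))


def effective_phi_star(phi, delta=0.0):
    """Expression (15): Phi* + gamma0*delta/e + delta^2/(2 m e c^2); delta in joules."""
    mc2 = M_E * C_LIGHT**2
    phi_star = phi * (1.0 + E_CHARGE * phi / (2.0 * mc2))
    gamma0 = 1.0 + E_CHARGE * phi / mc2
    return phi_star + gamma0 * delta / E_CHARGE + delta**2 / (2.0 * M_E * E_CHARGE * C_LIGHT**2)


def derivatives(state, phi_eff, bz):
    """Right-hand side of Eq. (12) for constant Phi and B = (0, 0, bz).

    state = (x, y, x', y'); returns (x', y', x'', y'')."""
    x, y, xp, yp = state
    s = math.sqrt(1.0 + xp**2 + yp**2)
    # w = d/dz (q'/s): the magnetic term of Eq. (12) divided by Phi*^(1/2)
    wx = -ETA * yp * bz / math.sqrt(phi_eff)
    wy = ETA * xp * bz / math.sqrt(phi_eff)
    # d/dz (q'/s) = (1/s)(I - q'q'^T/s^2) q''  inverts to  q'' = s (I + q'q'^T) w
    dot = xp * wx + yp * wy
    return (xp, yp, s * (wx + xp * dot), s * (wy + yp * dot))


def integrate(state, z_end, n_steps, phi_eff, bz):
    """Classical fourth-order Runge-Kutta integration from z = 0 to z = z_end."""
    h = z_end / n_steps
    for _ in range(n_steps):
        k1 = derivatives(state, phi_eff, bz)
        k2 = derivatives([a + 0.5 * h * b for a, b in zip(state, k1)], phi_eff, bz)
        k3 = derivatives([a + 0.5 * h * b for a, b in zip(state, k2)], phi_eff, bz)
        k4 = derivatives([a + h * b for a, b in zip(state, k3)], phi_eff, bz)
        state = tuple(a + h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
                      for a, b1, b2, b3, b4 in zip(state, k1, k2, k3, k4))
    return state


if __name__ == "__main__":
    phi, bz = 100e3, 0.01            # 100 kV beam in a 10 mT axial field (illustrative)
    start = (1e-3, 0.0, 0.0, 0.01)   # x, y in metres, slopes x', y'
    for delta_ev in (0.0, 10.0):     # design particle and a particle with +10 eV
        phi_eff = effective_phi_star(phi, delta_ev * E_CHARGE)
        print(delta_ev, integrate(start, 0.5, 5000, phi_eff, bz))
```

In this field the magnetic term of Eq. (12) is perpendicular to q', so |q'| stays constant along the ray, which gives a convenient accuracy check. One can also verify algebraically that expression (15) equals the acceleration potential (9) evaluated at the shifted potential Φ + δ/e.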
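B. Lagrangian Approach
The particle trajectories in time parameterization are found as extremals of the functional
$$S = \int_{t_1}^{t_2} L\bigl(\mathbf r(t), \dot{\mathbf r}(t)\bigr)\,dt, \tag{16}$$
called the action. In the case of an electromagnetic field, the function L, the Lagrangian, reads (Jackson, 1998)
$$L = mc^2\left(1 - \sqrt{1 - \frac{v^2}{c^2}}\right) - e(\mathbf v\cdot\mathbf A - \Phi), \tag{17}$$
where A denotes the vector potential. Because we assume that the Lagrangian does not depend explicitly on time, the Maupertuis principle (Goldstein, 1980) can be used instead of a direct reparameterization. The trajectories are then found as the extremals of the functional
$$S = \int_{t_1}^{t_2}\mathbf p\cdot\dot{\mathbf r}\,dt = \int_{\zeta}\mathbf p\cdot d\mathbf r, \tag{18}$$
where the canonical momentum is defined as
$$\mathbf p = \frac{\partial L(\mathbf r, \mathbf v)}{\partial\mathbf v} = \mathbf g - e\mathbf A. \tag{19}$$
Substituting Eq. (19) into Eq. (18), using the z parameterization, and dropping the constant factor e/η (which does not affect the extremals) yields
$$S = \int_{z_o}^{z_i}\left[\Phi^{*1/2}\left(1 + q'^2\right)^{1/2} - \eta\left(A_x x' + A_y y' + A_z\right)\right]dz. \tag{20}$$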
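For a purely electrostatic field (A = 0), the integrand of Eq. (20) reduces to Φ*^{1/2}(1 + q'²)^{1/2}, so Φ*^{1/2} acts like a refractive index. To illustrate this Fermat-type picture, the sketch below assumes an idealized configuration (chosen here, not taken from the text): two field-free regions at potentials Φ₁ and Φ₂ separated by a thin plane layer at z = 0, with rays straight on either side. It locates the crossing point that makes the discretized action stationary using a golden-section search, then evaluates the residual of the refraction law Φ₁*^{1/2} sin θ₁ = Φ₂*^{1/2} sin θ₂, where θ is measured from the optic axis.

```python
import math

E_CHARGE = 1.602176634e-19   # C
M_E = 9.1093837015e-31       # kg
C_LIGHT = 299792458.0        # m/s


def phi_star(phi):
    """Acceleration potential, Eq. (9)."""
    return phi * (1.0 + E_CHARGE * phi / (2.0 * M_E * C_LIGHT**2))


def action(x_cross, a, b, h1, h2, phi1, phi2):
    """Eq. (20) with A = 0 for a ray from (x=a, z=-h1) to (x=b, z=h2) that is
    straight in each uniform region and crosses the interface z=0 at x_cross."""
    return (math.sqrt(phi_star(phi1)) * math.hypot(x_cross - a, h1)
            + math.sqrt(phi_star(phi2)) * math.hypot(b - x_cross, h2))


def golden_section_min(f, lo, hi, tol=1e-12):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - r * (hi - lo)
        d = lo + r * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


def snell_residual(a, b, h1, h2, phi1, phi2):
    """n1*sin(theta1) - n2*sin(theta2) at the crossing point that extremizes S."""
    xc = golden_section_min(lambda x: action(x, a, b, h1, h2, phi1, phi2), a, b)
    sin1 = (xc - a) / math.hypot(xc - a, h1)
    sin2 = (b - xc) / math.hypot(b - xc, h2)
    return math.sqrt(phi_star(phi1)) * sin1 - math.sqrt(phi_star(phi2)) * sin2


if __name__ == "__main__":
    # illustrative geometry (metres) and potentials (volts)
    print(snell_residual(0.0, 0.01, 0.05, 0.05, 10e3, 40e3))
```

The residual should be small, limited mainly by the resolution of the line search. A real electrode gap produces a smooth potential variation, which Eq. (12) handles directly; the sharp interface here serves only to make the analogy with light optics explicit.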
The integrand can be interpreted as the refractive index of an anisotropic, inhomogeneous medium. Let us denote it
$$M(\mathbf q, \mathbf q', z) = \Phi^{*1/2}\left(1 + q'^2\right)^{1/2} - \eta\left(A_x x' + A_y y' + A_z\right). \tag{21}$$
The trajectory equations, obtained as the equations for the extremals of Eq. (20),
$$\frac{d}{dz}\frac{\partial M}{\partial\mathbf q'} - \frac{\partial M}{\partial\mathbf q} = 0, \tag{22}$$
take the form
$$\frac{d}{dz}\left(\frac{\Phi^{*1/2}\,\mathbf q'}{\sqrt{1 + q'^2}}\right) = \frac{\sqrt{1 + q'^2}}{2\Phi^{*1/2}}\,\gamma\nabla\Phi + \eta\begin{pmatrix} B_y - y'B_z \\ -B_x + x'B_z \end{pmatrix},$$
which are identical to Eq. (12). Even though the final trajectory equation is the same as in the Newtonian formulation, the advantage of this approach is that it reveals the analogy between light optics, represented by Fermat's principle, and electron optics. Moreover, Lagrangian perturbation methods can be used in the aberration computation. The extension to particles with different energies is completely analogous to the treatment in the Newtonian formulation. The action to be extremized takes the form
$$S = \int_{z_o}^{z_i}\left[\left(\Phi^* + \frac{\gamma_0\delta}{e} + \frac{\delta^2}{2mec^2}\right)^{1/2}\left(1 + q'^2\right)^{1/2} - \eta\left(A_x x' + A_y y' + A_z\right)\right]dz, \tag{23}$$
and its extremals are the solutions of the trajectory equation extended to particles with different energies. Note that setting δ = 0 reduces Eq. (23) to Eq. (20).

C. Hamiltonian Approach
The relativistic electron moving through an electromagnetic field is described by the Hamiltonian (Jackson, 1998)
$$H = \sqrt{m^2c^4 + c^2(\mathbf p + e\mathbf A)^2} - e\Phi - mc^2. \tag{24}$$
The value of the Hamiltonian along a phase-space trajectory equals δ, the deviation of the particle energy from the design energy. The phase-space trajectories in time parameterization can be found as extremals of the action
$$S = \int_{t_1}^{t_2}\left[\mathbf p\cdot\dot{\mathbf r} - H(\mathbf r, \mathbf p, t)\right]dt = \int_{\zeta}\left[\mathbf p\cdot d\mathbf r - H(\mathbf r, \mathbf p)\,dt\right]. \tag{25}$$
With the notation $p_t = -H$, the action in the z parameterization of the trajectory reads
$$S = \int_{z_o}^{z_i}\left[\mathbf p\cdot\mathbf q' + p_t t' - K(\mathbf q, \mathbf p, p_t, z)\right]dz, \tag{26}$$
where henceforth p and A denote the transverse parts of the momentum, p = (p_x, p_y)^T, and of the vector potential, A = (A_x, A_y)^T, respectively. The function K, the solution of
$$H(p_x, p_y, -K, x, y, z) = -p_t,$$
has the meaning of the Hamiltonian in the z parameterization and takes the form
$$K = -\sqrt{\frac{e^2}{\eta^2}\Phi^* + \frac{p_t^2}{c^2} - 2mp_t\gamma_0 - (\mathbf p + e\mathbf A)^2} + eA_z. \tag{27}$$
The time t and p_t play the role of canonical variables in the z parameterization. It is convenient to choose phase-space variables that all vanish on the design trajectory. Evidently this is true for all of them except t, which must be replaced by the new variable τ = c(t − z/v_0), where v_0 is the speed of the design particle. This changes the canonically conjugate variable p_t to p_τ = p_t/c. The change is represented by the canonical transformation with the generating function
$$F_2 = \mathbf q\cdot\tilde{\mathbf p} + c\,p_\tau t - \frac{p_\tau z}{\beta_0}, \tag{28}$$
where β_0 = v_0/c. The canonical variables q and p do not change under the transformation, and for convenience we drop the tilde (˜). The new Hamiltonian, denoted 𝓗, reads
$$\mathcal H = -\sqrt{\frac{e^2}{\eta^2}\Phi^* + p_\tau^2 - 2mcp_\tau\gamma_0 - (\mathbf p + e\mathbf A)^2} + eA_z - \frac{p_\tau}{\beta_0}. \tag{29}$$
The trajectory equations are the equations for the extremals of Eq. (26), which take the form of the standard Hamilton equations
$$\mathbf q' = \frac{\partial\mathcal H}{\partial\mathbf p}, \qquad \mathbf p' = -\frac{\partial\mathcal H}{\partial\mathbf q}, \tag{30a}$$
$$\tau' = \frac{\partial\mathcal H}{\partial p_\tau}, \qquad p_\tau' = -\frac{\partial\mathcal H}{\partial\tau} = 0. \tag{30b}$$
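The defining relation of K can be checked numerically. The sketch below implements Eqs. (24) and (27) directly, assuming SI units with A in T·m and δ in joules (so that p_t = −δ), and verifies that substituting p_z = −K back into the time Hamiltonian (24) returns −p_t. The sample momenta, potentials, and function names are arbitrary illustrative choices.

```python
import math

E_CHARGE = 1.602176634e-19   # C
M_E = 9.1093837015e-31       # kg
C_LIGHT = 299792458.0        # m/s
ETA = math.sqrt(E_CHARGE / (2.0 * M_E))


def hamiltonian_t(px, py, pz, ax, ay, az, phi):
    """Eq. (24): H = sqrt(m^2 c^4 + c^2 (p + eA)^2) - e*Phi - m c^2."""
    kin_sq = ((px + E_CHARGE * ax)**2 + (py + E_CHARGE * ay)**2
              + (pz + E_CHARGE * az)**2)
    return (math.sqrt(M_E**2 * C_LIGHT**4 + C_LIGHT**2 * kin_sq)
            - E_CHARGE * phi - M_E * C_LIGHT**2)


def hamiltonian_k(px, py, pt, ax, ay, az, phi):
    """Eq. (27): K(q, p, p_t, z), the Hamiltonian with z as independent variable."""
    mc2 = M_E * C_LIGHT**2
    phi_star = phi * (1.0 + E_CHARGE * phi / (2.0 * mc2))   # Eq. (9)
    gamma0 = 1.0 + E_CHARGE * phi / mc2
    radicand = (E_CHARGE**2 / ETA**2 * phi_star + pt**2 / C_LIGHT**2
                - 2.0 * M_E * pt * gamma0
                - (px + E_CHARGE * ax)**2 - (py + E_CHARGE * ay)**2)
    return -math.sqrt(radicand) + E_CHARGE * az


if __name__ == "__main__":
    phi = 50e3                       # design kinetic energy 50 keV (illustrative)
    delta = 5.0 * E_CHARGE           # energy deviation of 5 eV, so p_t = -delta
    px, py = 1e-24, -2e-24           # transverse canonical momenta (kg m/s)
    ax, ay, az = 1e-6, -2e-6, 5e-7   # vector potential (T m)
    k = hamiltonian_k(px, py, -delta, ax, ay, az, phi)
    # substituting p_z = -K into Eq. (24) should give back -p_t = delta
    print(hamiltonian_t(px, py, -k, ax, ay, az, phi) / delta)
```

For the design particle on the axis (p = 0, A = 0, p_t = 0), Eq. (27) reduces to K = −(e/η)Φ*^{1/2}, that is, minus the kinetic momentum g of Eq. (8).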
The case in which all particles have the same energy and the forces do not depend on time is described by the Hamiltonian (29) with p_τ = 0.

D. Hamilton–Jacobi Approach
The Hamilton–Jacobi approach is based on the fact that the motion can be described as a canonical transformation that maps the initial state onto the state at a given z. Finding such a transformation is equivalent to solving the trajectory equations. In the transformed coordinates the evolution is the identity, and consequently the transformed Hamiltonian vanishes. This leads to the Hamilton–Jacobi equation (Goldstein, 1980)
$$\mathcal H\left(\mathbf q, \frac{\partial F_2}{\partial\mathbf q}, z\right) + \frac{\partial F_2(\mathbf q, \tilde{\mathbf p}, z)}{\partial z} = 0, \tag{31}$$
whose solution is the generating function of the desired canonical transformation; the trajectories $\mathbf q = \mathbf q(\tilde{\mathbf q}, \tilde{\mathbf p}, z)$ are solutions of the algebraic equations
$$\tilde{\mathbf q} = \frac{\partial F_2}{\partial\tilde{\mathbf p}}, \qquad \mathbf p = \frac{\partial F_2}{\partial\mathbf q}, \tag{32}$$
where $\tilde{\mathbf q}$ and $\tilde{\mathbf p}$ play the role of the initial conditions. Generally, solving the partial differential equation (31) is more complicated than solving the trajectory equation; however, the formalism of canonical transformations is very well suited to perturbation calculus. The idea is not to compensate the entire Hamiltonian at once [as in Eq. (31)], but to do so step by step, only for those parts that contribute to a given perturbation order. With a slight modification, we use this approach in the Lie algebraic method.
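As a worked illustration (an example chosen here, not taken from the text), consider a field-free region with constant Φ, A = 0 and p_τ = 0. Writing g₀ = (e/η)Φ*^{1/2} for the kinetic momentum of Eq. (8), the Hamiltonian (29) reduces to 𝓗 = −(g₀² − p²)^{1/2}, and a solution of Eq. (31) can be guessed directly:

```latex
\begin{aligned}
F_2(\mathbf q, \tilde{\mathbf p}, z) &= \mathbf q\cdot\tilde{\mathbf p} + z\sqrt{g_0^2 - \tilde{\mathbf p}^2},\\
\mathcal H\!\left(\mathbf q, \frac{\partial F_2}{\partial \mathbf q}\right) + \frac{\partial F_2}{\partial z}
  &= -\sqrt{g_0^2 - \tilde{\mathbf p}^2} + \sqrt{g_0^2 - \tilde{\mathbf p}^2} = 0,\\
\tilde{\mathbf q} = \frac{\partial F_2}{\partial \tilde{\mathbf p}}
  &= \mathbf q - \frac{z\,\tilde{\mathbf p}}{\sqrt{g_0^2 - \tilde{\mathbf p}^2}}
  \quad\Longrightarrow\quad
  \mathbf q(z) = \tilde{\mathbf q} + \frac{z\,\tilde{\mathbf p}}{\sqrt{g_0^2 - \tilde{\mathbf p}^2}},\\
\mathbf p = \frac{\partial F_2}{\partial \mathbf q} &= \tilde{\mathbf p}.
\end{aligned}
```

Equations (32) therefore reproduce straight rays with the constant slope p̃/(g₀² − p̃²)^{1/2}. This is the same slope that Eq. (30a) gives for q' = ∂𝓗/∂p in a field-free region.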
III. THE FIELD COMPUTATION
Apart from a few trivial assumptions, we have not yet described the field in which the electrons move; therefore, we first summarize the basic properties and forms of the electromagnetic field. Because the applied fields are static in the majority of practical optical devices, we restrict our attention here to stationary fields.

A. The Basic Equations
The electromagnetic field is described by the Maxwell equations, which for stationary fields reduce to (Jackson, 1998)
$$\nabla\times\mathbf E = 0, \qquad \nabla\times\mathbf H = \mathbf j, \tag{33a}$$
$$\nabla\cdot\mathbf D = \rho, \qquad \nabla\cdot\mathbf B = 0. \tag{33b}$$
These equations are completed by the material equations
$$\mathbf D = \varepsilon\mathbf E, \qquad \mathbf B = \mu\mathbf H \quad (\mathbf H = \nu\mathbf B). \tag{34}$$
In ferromagnetic materials the reluctivity ν is a function of B = |B|. The space charge density and the current density are regarded as given functions of position.
The homogeneous (source-free) Maxwell equations allow us to introduce the electromagnetic potentials
$$\mathbf E = -\nabla\Phi, \qquad \mathbf B = \nabla\times\mathbf A. \tag{35}$$
The scalar potential can then be found as the solution of
$$-\nabla\cdot\bigl(\varepsilon(\mathbf r)\nabla\Phi\bigr) = \rho, \tag{36}$$
which reduces to
$$\nabla^2\Phi = 0 \tag{37}$$
in vacuum domains free of space charge. Similarly, the equation for the vector potential is
$$\nabla\times\Bigl(\nu\bigl(|\nabla\times\mathbf A|\bigr)\,\nabla\times\mathbf A\Bigr) = \mathbf j, \tag{38}$$
which, for constant ν and the gauge ∇·A = 0, reduces (using ∇×∇×A = ∇(∇·A) − ∇²A) to
$$\nabla^2\mathbf A = -\mu\mathbf j. \tag{39}$$
Yet another simplification is possible in current-free vacuum domains, where ∇×H = 0. In such cases it is permissible (at least in simply connected regions) to write
$$\mathbf B(\mathbf r) = -\nabla W(\mathbf r), \tag{40}$$
where W(r) is the scalar magnetic potential. Since ∇·B = 0, the scalar magnetic potential satisfies the Laplace equation
$$\nabla^2 W = 0. \tag{41}$$
The simplification lies in the fact that only one scalar differential equation has to be solved instead of three coupled equations.

B. The Field in the Vicinity of the Optic Axis
The region where the field will be computed is a source-free vacuum domain; hence, Eqs. (37) and (41) can be used for the field calculation. The procedure is similar for the electric and magnetic fields; therefore, we show it for the electric case and only briefly summarize the magnetic one. Using the standard separation of variables, the solution of the Laplace equation in cylindrical coordinates can be found in the form
$$\Phi = \sum_{m=0}^{\infty}\Phi_m(r, z, \varphi) = \sum_{m=0}^{\infty}\bigl[\Phi_{m,s}(r, z)\sin(m\varphi) + \Phi_{m,c}(r, z)\cos(m\varphi)\bigr], \tag{42}$$
where Φ_0 is an axially symmetric field, Φ_1 a dipole field, Φ_2 a quadrupole field, and so on. For each Φ_{m,c} and Φ_{m,s}, one finds (Venturini and Dragt, 1999)
$$\Phi_{m,\alpha}(r, z) = \int_{-\infty}^{\infty}dk\,e^{ikz}I_m(kr)\,a_{m,\alpha}(k) \qquad (\alpha = c, s), \tag{43}$$
where I_m is the modified Bessel function
$$I_m(x) = i^{-m}J_m(ix) = \sum_{n=0}^{\infty}\frac{x^{2n+m}}{2^{2n+m}\,n!\,(n+m)!} \tag{44}$$
and a_{m,α} is some function of k. Hence,
$$\Phi_{m,\alpha} = \sum_{n=0}^{\infty}\frac{r^{2n+m}}{2^{2n+m}\,n!\,(n+m)!}\int_{-\infty}^{\infty}dk\,e^{ikz}k^{2n+m}a_{m,\alpha}(k). \tag{45}$$
When we introduce the functions 1 cm,α (z) = m 2
∞ dk eikz k m am,α (k),
(46)
dk eikz k 2n+m am,α (k),
(47)
−∞
the derivatives of which read (2n) (z) cm,α
(−1)n = 2m
∞ −∞
Eq. (45) can be written in the form Φm,α
∞ (2n) (−1)n cm,α (z) 2n+m = r . n!(n + m)!22n
(48)
n=0
We will next show the form of the standard multipole fields, starting from the axially symmetric field. In this case, the sum in Eq. (42) consists of only one member with m = 0, that is, Φ = Φ0 =
(2n) ∞ (−1)n c0,c (z) n=0
n!2 22n
r 2n .
(49)
(2n)
The meaning of the coefficient c0,c can be easily found φ(z) = Φ(0, z, φ) = c0,c ;
(50)
ˇ RADLI CKA
254 hence, we can write Φ = Φ0,c =
∞ (−1)n φ (2n) (z) n=0
n!2 22n
r 2n .
(51)
The form of the dipole field is more complicated; in this case m = 1 and Eq. (42) reduces to Φ = Φ1,c cos(ϕ) + Φ1,s sin(ϕ) ∞ (2n) (−1)n (2n) r 2n+1 c1,c cos(ϕ) + c1,s sin(ϕ) = 2n n!(n + 1)!2 n=0
=
∞ n=0
(2n) (−1)n (2n) r 2n c1,c x + c1,s y . 2n n!(n + 1)!2
The meaning of the coefficients can be found similarly ∂Φ1 F1 = Ex (z) = − = −c1,c , ∂x x=y=0,z ∂Φ1 = −c1,s . F2 = Ey (z) = − ∂y
(52)
(53)
x=y=0,z
For quadrupole fields, m = 2 and Φ(r, z, ϕ) = Φ2 (r, z, ϕ) ∞ (2n) (−1)n (2n) = r 2n+2 c2,c cos(2ϕ) + c2,s sin(2ϕ) . 2n n!(n + 2)!2
(54)
n=0
Because
and
r 2 sin(2ϕ) = 2r 2 sin ϕ cos ϕ = 2xy
(55)
r 2 cos(2ϕ) = r 2 cos2 ϕ − sin2 ϕ = x 2 − y 2 ,
(56)
Eq. (54) reads Φ2 (r, z, ϕ) =
∞ (−1)n (x 2 + y 2 )n n=0
Because
n!(n + 2)!22n
∂ 2 Φ2 p2 = = 2c2,c , ∂x 2 x=y=0,z
(2n) (2n) c2,c x 2 − y 2 + 2c2,s xy . (57)
∂ 2 Φ2 q2 = = 2c2,s , (58) ∂x∂y x=y=0,z
255
LIE ALGEBRAIC METHODS
the quadrupole field reads Φ2 (r, z, ϕ) =
∞ (−1)n (x 2 + y 2 )n 1 n=0
n!(n + 2)!22n
p2(2n) x 2 2
−y
2
+ q2(2n) xy
.
(59)
A similar approach can be used for higher-orders multipoles. The magnetic scalar potential also satisfies the Poisson equation; hence, the form of the expansion will be similar to the expansion for the electrostatic potential, that is, W (r, z, ϕ) =
∞
Wn (r, z, ϕ),
(60)
n=0
where Wm (r, z, ϕ) ∞ (2n) (−1)n (2n) = r 2n+m dm,c (z) cos(mϕ) + dm,s (z) sin(mϕ) . (61) 2n n!(n + m)!2 n=0
The meaning of the coefficients dm,α can be simply found d0,s = 0, d0,c = − B(0, 0, z) dz, d1,c = −Bx (0, 0, z) =: −B1 , 1 ∂ 2 W d2,c = =: P2 , 2 ∂x 2 (0,0,z)
(62a)
d1,s = −By (0, 0, z) =: −B2 , (62b) 1 ∂ 2 W d1,s = =: Q2 . (62c) 2 ∂x∂y (0,0,z)
A knowledge of the vector potential is necessary for Lagrangian and Hamiltonian formulation. We will derive the form for only the axial symmetric case. The vector potential in the general case in appropriate gauge can be found in Hawkes and Kasper (1989): 1 2 y 2 Ax = − B − x + y B 2 8 1 4 xy 1 3 1 x − y 4 B2 − B1 + x y + xy 3 B1 + x 2 − y 2 B2 − 4 48 2 24 1 3 1 3 − (63a) x − 3xy 2 Q2 − y − 3x 2 y P2 , 12 12 1 x B − x 2 + y 2 B Ay = 2 8 1 4 xy 1 3 1 2 B2 − + x − y 2 B1 − x − y 4 B1 + x y + xy 3 B2 4 48 2 24
ˇ RADLI CKA
256
1 3 1 3 x − 3xy 2 P2 + y − 3x 2 y Q2 , (63b) 12 12 1 Az = −xB2 + yB1 + x 2 + y 2 (xB2 − yB1 ) 8 1 4 1 3 1 2 + x − y 2 Q2 − xyP2 − x − y 4 Q2 + x y + xy 3 P2 2 24 12 1 3 1 3 2 2 − x − 3xy Q3 − y − 3x y P3 . (63c) 6 6 Now we derive the axially symmetric vector potential up to higher orders, because it will be needed later. The axial symmetry of the system implies that all components of the vector potential except Aϕ vanish; that is, A = Aϕ eϕ ; hence, using Eq. (35) −
Br = −
∂Aϕ ∂z
(64)
and using Eqs. (40) and (61) ∞ n(−1)n 2n−1 (2n−1) r d0,c (z) Aϕ = − Br dz = ∂r W0 dz = (n!)2 22n−1 n=1
=
∞
(−1)n
n=0
(n + 1)!n!22n+1
r 2n+1 B (2n) (z) =
1 rΠ (z, r), 2
(65)
where Π (z, r) =
∞ (−1)n B (2n) 2n r . (n + 1)!n!22n
(66)
n=0
In Cartesian coordinates the vector potential reads 1 Ax = − sin ϕAϕ = − yΠ z, q2 , 2 1 Ay = cos ϕAϕ = xΠ z, q2 , 2 Az = 0.
(67a) (67b) (67c)
The last yet not the least trivial issue relates to determining the axial field distribution. It can be obtained from numerical solution of Maxwell equations. Three common methods exist—the boundary elements method (BEM) (Harrington, 1967); the finite difference method (FDM) (Rouse and Munro, 1989); and the finite elements method (FEM) (Khursheed, 1999; Lencová, 1995). Each method computes the field values in nodal points; the approximate field value in any point can then be computed using some
LIE ALGEBRAIC METHODS
257
of the spline methods (Barth et al., 1990; Press et al., 1986). However, the computation of potential derivatives is more sophisticated; in fact, the accuracy of the methods used is not good. This creates other sources of errors in analytical calculation of aberrations. Several methods of calculating the high-order derivatives of the axial potential exist. Munro et al. (MEBS, 2007; Wang et al., 2004) use the expansion into the basis of Hermite functions. Matsuya, Saito and Nakagawa (1995) used the expansion into another basis. Berz (1999) introduced the method based on Gaussian wavelet transformation, which Liu (2006) used for the calculation of an electrostatic lens. The methods mentioned do not use any property that the field fulfills. Conversely, Venturini and Dragt (1999) proposed for magnetic fields a way of using the fact that the field is a solution of the Poisson equation. A similar method was introduced by Manikonda and Berz (2006) and was implemented in the computer code COSY INFINITY (Michigan State University, Department of Physics and Astronomy, East Lansing, MI, USA) (COSY INFINITY, 2007).
IV. T RAJECTORY E QUATIONS : S OLUTION M ETHODS We have shown that the forms of the trajectory equations are similar independent of the manner in which they are derived or of the coordinates used. From the mathematical point of view, they are the set of two nonlinear ordinary differential equations of the second order with variable coefficients in the case of Lagrangian coordinates, which is equivalent to the set of four ordinary differential equations of the first order in the Hamiltonian case. In general, no analytic form of the solution exists; hence, perturbation or numerical methods must be used. Although the numerical solution relying mostly on the Runge–Kutta algorithm is the easiest and most exact, the methods that allow the solution to be found in the form of a polynomial in initial conditions are very suitable. We describe the features of these two classes of methods in this chapter. A. Paraxial Approximation The particles moving close to the optic axis are well described by the paraxial approximation. This approach is based on the fact that the deviation of the particle trajectories from the axis and their slopes are so small that their second and higher powers can be neglected. It is the standard assumption of Gaussian optics. First, we will presuppose that the energy of particles does not differ. Similar to the case of the trajectory equation, two approaches can be used here: the direct linearization of the trajectory equation or the Hamiltonian approach. We review both of them.
ˇ RADLI CKA
258
1. Paraxial Approximation from the Trajectory Equation The paraxial trajectory equations emerge by linearization of general trajectory Eq. (12), d ∗ 1 F1 (F1 x + F2 y) γ Ex + η(By − y Bz ) − , φ 2x = − 1 3 dz 2φ ∗ 2 4φ ∗ 2
(68a)
γ Ey d ∗ 1 F2 (F1 x + F2 y) + η(x Bz − Bx ) − , φ 2y = − 1 3 dz 2φ ∗ 2 4φ ∗ 2
(68b)
where 1 φ x + F1 − p2 x − q2 y, 2 1 Ey = φ y + F2 + p2 y − q2 x, 2 Ez = −φ
Ex =
and 1 Bx = − B x + B1 − Q2 y − P2 x, 2 1 By = − B y + B2 − Q2 x + P2 y, 2 Bz = B. eφ Henceforward γ relates to the design particle, γ = 1 + mc 2 . Having the orientation of dipoles and quadrupoles selected such that x–z plane is the symmetry plane of the electric dipole and quadrupole fields and the antisymmetry plane of the magnetic dipole and quadrupole fields, F2 , q2 , B1 , and P2 vanish. This reduces the paraxial approximation to F12 η 1 γ φ γφ γp2 ηQ2 B y+y B − ∗ + + ∗2 x + 1 x + ∗x + 1 2φ 4φ ∗ 2φ 4φ φ∗ 2 φ∗ 2 2 γ F1 ηB2 =− ∗ + 1, (69a) 2φ φ∗ 2 η 1 γ φ γφ γp2 ηQ2 B x + x B = 0. y + ∗y + y− 1 + ∗ − 1 2φ 4φ ∗ 2φ φ∗ 2 φ∗ 2 2 (69b) The linear differential equations of the second-order (69a) and (69b) are not separated and homogeneous. The term that causes the mixing of coordinates generates the rotation of particles around the z-axis; it can be eliminated by
LIE ALGEBRAIC METHODS
transition to rotating coordinates X x = Rˆ , Y y where Rˆ =
cos Θ − sin Θ
sin Θ cos Θ
259
(70) (71)
represents the rotation through an angle η Θ(z) = 2
z
B(z) φ ∗ 2 (z) 1
zi
dz.
In such coordinates the paraxial trajectory equations read F12 γ φ γ φ η2 B 2 + + ∗2 X X + ∗X + 2φ 4φ ∗ 4φ ∗2 8φ 2 F1 γp2 ηQ2 cos(2Θ)X − sin(2Θ)Y + − ∗ + 1 ∗2 2φ 8φ φ∗ 2 γ F1 ηB2 cos Θ, − = ∗ 12 2φ ∗ φ F12 γ φ γ φ η2 B 2 Y + ∗Y + + + ∗2 Y 2φ 4φ ∗ 4φ ∗2 8φ 2 F1 γp2 ηQ2 − sin(2Θ)X + cos(2Θ)Y − + 1 2φ ∗ 8φ ∗2 φ∗ 2 ηB2 γ F1 =− sin Θ. − 1 2φ ∗ φ∗ 2
(72a)
(72b)
Unfortunately, the equations are still neither separated nor homogeneous, because the field considered is too general for real optic devices, which contain fields restricted by given requirements. Some of the requirements for the use of most of these devices are considered here. The stability condition for the design particle trajectory is the first such requirement. It is conditioned by vanishing of the right-hand sides of Eqs. (72a) and (72b)—that is, B2 −
γ F1 2ηφ
∗ 12
= B2 −
F1 = 0, vo
(73)
ˇ RADLI CKA
260
F IGURE 1.
The paraxial rays in a magnetic lens using standard and rotating coordinates.
which forms a relationship between magnetic and electric dipole fields—the Wien condition. Because the paraxial trajectory equations (72a) and (72b) are not separated, another natural restriction is to have a field in which they would be. Two such situations exist. The first represents systems in which no axial magnetic field is present. Thus, Θ = 0, which leads to the paraxial trajectory equations X +
F12 γ φ γp2 ηQ2 X = 0, + − + 1 4φ ∗ 2φ ∗ 4φ ∗2 φ∗ 2 γ φ γφ γp2 ηQ2 Y + ∗ Y + Y = 0. + − 1 2φ 4φ ∗ 2φ ∗ φ∗ 2
γ φ X + 2φ ∗
LIE ALGEBRAIC METHODS
261
The second and more interesting situation occurs when F12 γp2 ηQ2 − ∗ + = 0. (74) 1 2φ 8φ ∗2 φ∗ 2 The paraxial trajectory equations are then separated; moreover, their form is the same for X and Y coordinates: F12 γ φ γ φ η2 B 2 (75) + + ∗2 Q = 0, Q + ∗Q + 2φ 4φ ∗ 4φ ∗2 8φ where the vector Q defined as Q = (X, Y )T was used. A consequence is that an electron initially traveling on any surface αX + βY = 0 remains on this surface. Such types of systems are called stigmatic, and the condition expressed in Eq. (74) is known as the stigmatic condition. We describe only stigmatic systems fulfilling the Wien condition unless stated otherwise. This form of the paraxial equations occurs later; therefore, it is suitable to define a linear operator Pˆ1 : F12 γ φ γ φ η2 B 2 f. (76) + + Pˆ1 (f ) = f + ∗ f + 2φ 4φ ∗ 4φ ∗2 8φ ∗2 Eq. (75) thus is then equivalent to Pˆ1 (Q) = 0. The trajectory equations may be written in more compact form using Picht’s transformation Qp = φ ∗ 4 Q, 1
(77)
which transforms the trajectory equations into F12 (2 + γ 2 )φ 2 η2 B 2 Qp + + + ∗2 Qp = 0. (78) 4φ ∗ 16φ ∗2 8φ The result is of great interest for two reasons. First, it is simpler to perform numerical calculations with this approach than with Eqs. (69a) and (69b). Secondly, as φ ∗ 0, the coefficient F12 (2 + γ 2 )φ 2 η2 B 2 + + 4φ ∗ 16φ ∗2 8φ ∗2 in Eq. (78) is essentially nonnegative, which imposes an interesting restriction on stigmatic electron lenses: they always exert a converging action. Using numerical calculations the solution for Eq. (75) can be found in the form of a block matrix g(z)1ˆ h(z)1ˆ Qo Q(z) , (79) = Qo Q (z) g (z)1ˆ h (z)1ˆ G(z) =
ˇ RADLI CKA
262
where 1ˆ = 10 01 , subscript o means that the value of a function it evaluated in the object plane z = zo , and g(z) or h(z) fulfill Pˆ1 (g) = 0, Pˆ1 (h) = 0,
g(zo ) = 1, g (zo ) = 0, h(zo ) = 0, h (zo ) = 1.
The solution in the original coordinates then takes form g(z)1ˆ h(z)1ˆ qo 0 Rˆ −1 q = , qo −Θ JˆRˆ −1 Rˆ −1 q g (z)1ˆ h (z)1ˆ where Jˆ is the standard symplectic matrix 0 1 Jˆ = . −1 0
(80)
(81)
The solutions in Eqs. (79) and (80) are parameterized by the position of the ray in object plane and its slopes. Similarly, we can parameterize the rays using the position in the object plane and the position in the aperture plane z = za . Let us choose the pair of independent solutions s(z) and t (z) that fulfil s(zo ) = 1, s(za ) = 0, Pˆ1 (s) = 0, t (zo ) = 0, t (za ) = 1. Pˆ1 (t) = 0, The solution in rotating coordinates then reads Q(z) s(z)1ˆ t (z)1ˆ Qo , = Q (z) s (z)1ˆ t (z)1ˆ Qa and in original coordinates takes form q 0 Rˆ −1 s(z)1ˆ t (z)1ˆ qo = . q −Θ JˆRˆ −1 Rˆ −1 s (z)1ˆ t (z)1ˆ qa In the first case, we can write the Wronskian function as ∗ − 1 ∗ − 1 2 2 φ φo Wg = g(z)h (z) − h(z)g (z) = Wg (zo ) o∗ = , ∗ φ φ whereas in the second case, the Wronskian reads ∗ − 1 2 φ W = s(z)t (z) − t (z)s (z) = Ws (zo ) o∗ . φ In both cases the quality φ ∗ 2 (z)W (z) = const 1
is invariant.
(82)
(83)
(84)
(85)
(86)
LIE ALGEBRAIC METHODS
263
2. The Hamiltonian Formulation of the Paraxial Approximation Unlike the expansion of the trajectory equation used in the previous paragraph, the Hamiltonian approach is based on the expansion of the Hamiltonian into polynomials in canonical variables. The second-order terms of the expanded Hamiltonian completely describe the paraxial properties. Because we are considering the monochromatic systems, pτ = 0 is substituted into Eq. (29). Using the same restriction on dipole and quadrupole fields as in the previous paragraph, the expansion up to the second order takes the form e 1 (87a) H0 = − φ ∗ 2 , η F1 (87b) − B2 x, H1 = e v0 eF12 η ηB 1 eγ0 φ eηB 2 2 H2 = q2 p + Lz + + + ∗ 12 ∗ 12 ∗ 12 ∗ 12 ∗ 32 2 2eφ 2φ 4ηφ 4φ 8ηφ 2 1 eF1 eγ0 p2 (87c) + − + eQ2 x 2 − y 2 . ∗ 12 2 8ηφ ∗ 32 2ηφ The notation Lz = xpy − ypx for the z-component of angular momentum L was used. The zero-order part of the Hamiltonian does not contribute to the equations of motion, and vanishing of the term H1 is equivalent to fulfilling Wien’s condition. The transition into rotating coordinates is represented by the extended canonical transformation ˆ Q = Rq, η ˆ P = Rp, e
(88)
where Rˆ is given in Eq. (71), which transforms the Hamiltonian according to ∂F2 η ˜ H q(Q, P, z), q(Q, P, z), z + , H = e ∂z where the generation function F2 reads ˆ F2 = (Rq)P. Applying this to the quadratic part of the Hamiltonian yields 2B 2 F12 γ φ 1 1 η 2 Q2 P + + + H˜ 2 = ∗ 12 ∗ 12 ∗ 12 ∗ 32 2 2φ 4φ 4φ 8φ 2 1 F1 γ 0 p2 ηQ2 cos(2Θ) sin(2Θ) T Q Q, + − + 1 1 sin(2Θ) − cos(2Θ) 2 8φ ∗ 32 2φ ∗ 2 φ∗ 2 (89)
ˇ RADLI CKA
264
which for stigmatic systems reduces to 2B 2 F12 γ φ 1 1 η 2 Q2 . H˜ 2 = P + + + ∗ 12 ∗ 12 ∗ 12 ∗ 32 2 2φ 4φ 4φ 8φ
(90)
The quadratic part of such a Hamiltonian is axially symmetric, which implies that Lz is the integral of motion in the paraxial approximation. The Hamilton equations of motion can be used to find the paraxial trajectory equations Q = φ ∗− 2 P, F12 γ φ η2 B 2 P =− Q, + + 1 1 3 4φ ∗ 2 4φ ∗ 2 8φ ∗ 2 1
(91a) (91b)
which, after elimination of P, can be written in the form of ordinary differential equations of the second order F12 γ φ γ φ η2 B 2 (92) + + ∗2 Q = 0, Q + ∗Q + 2φ 4φ ∗ 4φ ∗2 8φ in agreement with the results of the previous paragraph. The solution then takes a form analogous to Eq. (79) ⎛ ⎞ ∗− 12 ˆ ˆ g(z)1 φo h(z)1 Q Qo ⎝ ⎠ = (93) 1 φ∗ P Po φ ∗ 2 g (z)1ˆ h (z)1ˆ φo∗
and in original coordinates ˆ −1 R q = p 0
0 e ˆ −1 R
η
⎛ ⎝
g(z)1ˆ φ ∗ 2 g (z)1ˆ 1
∗− 12
φo ∗
h1ˆ
φ ˆ φo∗ h (z)1
⎞ ⎠
qo po
.
(94)
When the position in the object and aperture planes is used to parameterize the rays, the solution in rotating coordinates takes a similar form s(z)1ˆ t (z)1ˆ Qo Q (95) = 1 1 P Qa φ ∗ 2 s (z)1ˆ φ ∗ 2 t (z)1ˆ and in the original coordinates ˆ −1 s(z)1ˆ t (z)1ˆ 0 R 1 q = e ˆ −1 ∗ 12 ∗ 12 0 p ˆ ˆ 0 R φ s (z)1 φ t (z)1 η
0 ˆ a) R(z
qo qa
.
(96)
LIE ALGEBRAIC METHODS
265
The Picht transformation corresponds to the canonical transformation Qp = φ ∗ 4 Q, 1
Pp = φ ∗− 4 P, 1
(97)
which transforms the quadratic part of the Hamiltonian into the form of the Hamiltonian for a linear oscillator F12 1 2 1 (2 + γ 2 )φ 2 η2 B 2 ˜ H2 = P + + + ∗2 Q2 . (98) 2 2 4φ ∗ 16φ ∗2 8φ The equations of motion then agree with trajectory equations (78). In both approaches the paraxial approximation is represented by the linear transformation described by the transfer matrix. The main consequence is that a point in the object plane is imaged into a point in the image plane. This is an important property of the paraxial monochromatic optics. For detailed description of the paraxial properties, cardinal elements, and so forth, see Hawkes and Kasper (1989), for example. B. The Paraxial Transformation: General Dispersion Case We can now abandon the assumption that all electrons have the same energy. According to the previous description, the new variable δ describing the energy deviation must be introduced. Just as for the space variables, only the first power of δ will contribute to the paraxial approximation. The paraxial trajectory equation is then obtained by linearization of the general trajectory equation described previously. If the same assumptions as in the previous subsection are used for the field, the trajectory equations read F12 η 1 γ φ γφ γp2 ηQ2 B x + ∗ x + x + − + + y + y B 1 1 2φ 4φ ∗ 2φ ∗ 4φ ∗2 φ∗ 2 φ∗ 2 2 ηB2 δF1 γ F1 , (99a) =− ∗ + 1 + 2φ 4eφ ∗2 φ∗ 2 η 1 γ φ γφ γp2 ηQ2 B x + x B = 0. (99b) y− 1 + ∗ − y + ∗y + 1 2φ 4φ ∗ 2φ φ∗ 2 φ∗ 2 2 The equations must be completed by the equation δ = 0. In the rotation coordinates the trajectory equations of the stigmatic systems fulfilling the Wien condition can be written F12 δF1 γ φ γ φ η2 B 2 X + ∗ X + X= + + cos(Θ), (100a) ∗ ∗2 ∗2 2φ 4φ 4φ 8φ 4eφ ∗2
ˇ RADLI CKA
266 γ φ Y + ∗Y + 2φ
F12 δF1 γ φ η2 B 2 + + ∗2 Y = − sin(Θ). (100b) ∗ ∗2 4φ 4φ 8φ 4eφ ∗2
These are two separate differential equations but, unlike those of the previous chapter, they are inhomogeneous. The method of variation of parameters can be used to find the general solution X g(z) h(z) −g(z)μ(h) ˆ + h(z)μ(g) ˆ Xo X = g (z) h (z) −g (z)μ(h) ˆ + h (z)μ(g) ˆ Xo , (101) δ δ 0 0 1 Y g(z) h(z) −g(z)νˆ (h) + h(z)ν(g) ˆ Yo (102) Yo , Y = g (z) h (z) −g (z)νˆ (h) + h (z)νˆ (g) δ δ 0 0 1 where the functionals are defined as follows: 1 F1 f cos Θ μ(f ˆ )= dz, 1 ∗2 ∗ 32 φ 4eφo 1 F1 f sin Θ νˆ (f ) = − dz. 3 ∗ 12 φ∗ 2 4eφo
(103)
Finding the solution in the original coordinates x, y is trivial. Note that both μ(f ˆ ) and ν(f ˆ ) vanish when the dipole field is not present in the system; when the system does not contain a dipole field, there is no dispersion. In the Hamiltonian approach the dispersion case is described by the quadratic part of the Hamiltonian (29), in which, unlike Eq. (87c), pτ is not neglected. When we consider the stigmatic systems in which the Wien condition is satisfied, the quadratic part of the Hamiltonian reads eF12 η ηB 1 eγ0 φ eηB 2 2 q2 H2 = p + Lz + + + 1 1 1 3 2 4ηφ ∗ 12 2eφ ∗ 2 2φ ∗ 2 4φ ∗ 2 8ηφ ∗ 2 +
ηmc2 4e2 φ
∗ 32
pτ2 +
F1 mcη 2eφ ∗ 2 3
xpτ .
(104)
The transition into rotating coordinates and using Pτ = ηe pτ transforms the Hamiltonian into F12 1 1 γ φ η2 B 2 2 ˜ Q2 H2 = P + + + ∗ 12 ∗ 12 ∗ 12 ∗ 32 2 2φ 4φ 4φ 8φ +
mc2 4eφ
∗ 32
Pτ2 +
F1 mcη 2eφ
∗ 23
cos ΘXPτ −
F1 mcη 2eφ ∗ 2 3
sin ΘY Pτ . (105)
LIE ALGEBRAIC METHODS
267
We start from results of the previous subsection. First a canonical transformation, which compensates the geometrical part of Eq. (105), is applied. It coincides with Eq. (93) and the transformed Hamiltonian takes the form 1 mc2 ˜ 2 F1 mcη ˜τ g(z)X˜ + h(z)φo∗− 2 P˜x H¯ 2 = + cos Θ P P τ ∗ 32 ∗ 32 4eφ 2eφ F1 mcη ∗− 1 − sin Θ P˜τ g(z)Y˜ + h(z)φo 2 P˜y . 3 2eφ ∗ 2
(106)
In these coordinates, the trajectory equations are trivial X˜ =
F1 mcη 2eφ
P˜x = −
∗ 32
∗1 φo 2
F1 mcη 2eφ
∗ 32
h(z) cos Θ P˜τ ,
g(z) cos Θ P˜τ ,
F1 mcη Y˜ = − h(z) sin Θ P˜τ , 1 ∗ 32 ∗ 2 2eφ φo P˜y =
F1 mcη 2eφ
∗ 32
g(z) sin Θ P˜τ ,
P˜τ = 0.
(107a) (107b) (107c)
The solution is found after integration: ⎛1 0 0 0 ⎞ 2mcημ(h) ˆ ⎛ ˜ ⎞ ⎛ ˜ ⎞ Xo X 0 1 0 0 2mcη ν ˆ (h) ⎜ ⎟ ˜ Y ⎜ ⎟ ⎜ ⎟ ⎜ Y˜o ⎟ ∗ 12 ⎜ P˜ ⎟ ⎜ ⎟ ⎜ P˜ ⎟ ˆ ⎟ ⎜ xo ⎟ . ⎜ x ⎟ = ⎜ 0 0 1 0 −2mcηφo μ(g) ⎟ ⎝ P˜ ⎠ ⎝ P˜ ⎠ ⎜ y yo ⎝ ⎠ ∗ 12 ˆ 0 0 0 1 −2mcηφo ν(g) P˜τ P˜τ o 0 0 0 0 1
(108)
η δ verifies that the Transforming to rotating coordinates and using Pτ = ec result is in agreement with Eqs. (101) and (102). As in the monochromatic case, the dispersion paraxial approximation is described by a linear transformation. But in contrast to the monochromatic case, the added dimension causes the following result—in presence of the dipole field, a point in the object plane is not imaged into a point in the image plane. In fact, the trajectories of electrons with energy about δ higher than the design energy are shifted about a value proportional to δ. An example of imaging with two different energies of electrons can be seen in Figure 2.
C. Polynomial Form of Solution A direct extension of the paraxial approximation, in which the second and higher powers of the trajectory deviations and slopes are neglected, results in methods in which only powers of higher order than a given one can
ˇ RADLI CKA
268
F IGURE 2.
Focus of the rays with different energy.
be neglected. The computed solution is then in the form of a k-order polynomial, where k is the highest power that is not neglected. Unfortunately, this extension leads to great difficulties in solving the trajectory equation. Adding the higher-orders terms causes the equations to become nonlinear. The trajectory equations in rotating coordinates now read δF1 Pˆ1 (X) − cos Θ = f2 (Q, Q , Q , δ, z) + · · · + fn (Q, Q , Q , δ, z), 4eφ ∗2 (109a) δF 1 sin Θ = g2 (Q, Q , Q , δ, z) + · · · + gn (Q, Q , Q , δ, z), Pˆ1 (Y ) + 4eφ ∗2 (109b) where fl (Q, Q , Q , δ, z) and gl (Q, Q , Q , δ, z) are homogeneous polynomials of l-th order in the variables Q, Q , Q , and δ with z-dependent coefficients. These equations cannot be solved analytically. Fortunately, we do not need the exact solution but only its part up to the n-th order polynomial in the variables mentioned: x(z) =
i+j +k+ln
aij kl x i y j x k y l
(110)
LIE ALGEBRAIC METHODS
in the case of Lagrangian coordinates, or bij kl x i y j pxk pyl x(z) =
269
(111)
i+j +k+ln
in the case of Hamiltonian coordinates. A number of perturbation methods exist for the computation of such a solution. They are based on an interactive solution of the trajectory equations, Lagrangian or Hamiltonian perturbation method, or on the solution of the trajectory equations in the higher-degree polynomial space where they become linear. Some of these methods are described in the next section. What effect do the higher-order terms have from a physical point of view? They change the image properties rapidly. In brief, they cause an object plane point not to be imaged into a point in the image plane; the focusing is perturbed. We observe it as image imperfections. The benefits of this form of solution compared to direct numerical solution of the equations of motion lie in the possibility of obtaining a qualitative description of the imperfection. Each coefficient of the polynomial represents an aberration, a basic optical characteristic of the device. Such coefficients are known as aberration coefficients. Knowledge of the aberration coefficients is sufficient to classify the imperfections and to find ways of compensating for them. The perturbation methods compute the coefficients of the polynomial solution either in analytical or numerical form. The analytical form of the coefficients includes most information about the optical elements; for example, the spherical aberration coefficient of the axial symmetric magnetic lens reads (Hawkes and Kasper, 1989) CS =
∗− 1 φo 2
zi
L1 h4 + 2L2 h2 h 2 + L3 h 4 dz,
(112)
zo
where L1 =
η2 B B
+
η4 B 4
, (113) 1 3 8φ ∗ 2 32φ ∗ 2 η2 B 2 L2 = , (114) 1 8φ ∗ 2 1 1 L3 = φ ∗ 2 . (115) 2 The influence of the field is readily apparent from this form of solution, but the mathematical form of the term is complicated even if the system is very
270
ˇ RADLI CKA
F IGURE 3. Paraxial beam focus and focus of the same beam influenced by the spherical aberration. zi is the position of the Gaussian image.
271
LIE ALGEBRAIC METHODS
simple. In fact, the complexity of such terms increases with the polynomial order and number of elements in the device. In general, its use for higherorder aberrations is very difficult. On the other hand, there are the numerical methods, which allow us to express the numerical solution in polynomial form. The aberration coefficients are then computed numerically. They do not include as much information about the system, but the method can easily be used for higher-order aberrations. In general, the coefficients differ depending on the choice of the parameterization method. We now show the relationship between the coefficients in parameterization by qo , qo and the coefficients in parameterization by canonically conjugate variables qo , po . The relationship between these two parameterizations and parameterization by the object and aperture position is discussed later in this chapter. The relation between p and q is given by p=
∂M , ∂q
where M is defined in Eq. (21), that is, p=Φ
∗ 12
#
(116)
q 1 + q 2
−η
Ax Ay
.
(117)
In real devices, the object is often outside the field and Eq. (117) is then reduced to p = φ∗ 2 # 1
q 1 + q 2
.
(118)
In this case, x i y j pxk pyl = φ ∗
k+l 2
= φ∗
k+l 2
− k+l x i y j x k y l 1 + q 2 2
xi yj x ky l 2 2 k+l 1 1 2 +1 q × 1 − (k + l)q + (k + l) + ··· . 2 4 2 (119)
Substituting in Eq. (111) and calculating the coefficients at x i y j x k y l , we can easily find the desired relationship. As an example we show how to proceed in the case of a lens, which has only spherical aberration. The ray is then described by 2 3 (120) q = M q + C3 p2 p + C5 p2 p + C7 p2 p + · · ·
ˇ RADLI CKA
272 or by
2 3 q = M q + C˜ 3 q 2 q + C˜ 5 q 2 q + C˜ 7 q 2 q + · · · .
(121)
With the aid of Eq. (119), Eq. (120) can be transformed to 3 2 15 2 2 ∗ 32 2 q = M q + C3 φ q q 1 − q + q 2 8 3 2 ∗ 52 2 2 ∗ 72 2 3 + C7 φ q q 1− q q + o(9), (122) + C5 φ q 2 and by comparing Eqs. (121) and (122), we derive C˜ 3 = φ ∗ 2 C3 , 3
(123a) 3 3 C˜ 5 = φ C5 − φ ∗ 2 C3 , (123b) 2 7 3 5 15 3 C˜ 7 = φ ∗ 2 C7 − φ ∗ 2 C5 + φ ∗ 2 C3 , (123c) 2 8 where coefficients with tildes denote the spherical aberration coefficients in the parameterization by object position and slope. ∗ 52
D. Numerical Methods The numerical methods, which are based primarily on the standard Runge– Kutta numerical algorithm, compute the ray from its position and slope in the object plane. The methods do not use the analytical form of the field in the vicinity of the optic axis, but the values in an arbitrary point are calculated by a spline method (Barth et al., 1990; Zhu and Munro, 1989) from the values of the field at nodal points computed by (FEM, BEM, FDM). The advantage of this approach is that the field derivatives need not be computed, hereby avoiding a source of errors. The computation usually does not use the trajectory equations but it is based on the solution of the equation of motion in time parameterization d (mγ r˙ ) = −eE − e˙r × B, (124) dt where, in contrast to analytical methods, the values of γ , E, and B are given in every point in the manner mentioned previously. The accuracy of such methods is determined by the choice of numerical algorithm; generally it is very high, especially for small z. Nevertheless, methods have two disadvantages. The first restriction is common to any numerical method. Hence the computation of one ray does not include any information about neighboring
LIE ALGEBRAIC METHODS
273
rays, each ray of interest must be computed separately. This means that we cannot qualitatively describe the relationship between object ray conditions and the image properties—the aberrations coefficients—as we could from the polynomial form of solutions. Although the polynomial regression of numerical results is the method used for such problems, it is advisable for low orders of aberrations only (Hawkes and Kasper, 1989). The second disadvantage lies in the fact that the method forms a relationship between only the initial and final coordinates without any parameter adopted. This means that the solution itself does not include any information about the system; hence, it is not possible to recognize which elements must be changed to improve it. Despite these issues, the method is a powerful tool for exact ray computation when the field distribution is known. Moreover, it is useful for checking the accuracy of the polynomial form of solution, but its use in designing optical devices is limited.
V. T HE A NALYTIC P ERTURBATION M ETHOD The analytical perturbation methods are used to describe the nonlinear properties of electron optical systems. They are represented by the trajectory method, the eikonal method, and the Lie algebraic method. The differential algebraic method also can be used for calculation of aberration integrals, but it is used for frequently for numerical evaluation of the aberration coefficients. The first calculation of aberrations was done in the early 1930s by Scherzer (1933), who used the trajectory method, and by Glaser (1935), who introduced the eikonal method into electron optics. The other methods were introduced much later: the Lie algebraic method by Dragt (Dragt and Forest, 1986b) in the 1980s and the differential algebraic method by Berz (1999) in the late 1990s. We first briefly describe each of the perturbation methods and then show the application of the trajectory method, the eikonal method, and the Lie algebraic method to the simple system of the round magnetic lens. A. The Differential Algebraic Method It is not common to start the introduction to perturbation methods used in electron optics with the differential algebraic method (Berz, 1999), which is not as familiar in the electron optics community as the methods described in the following subsections. Even though the polynomial form of the solution is not exact, it is exact up to order of the solution polynomial. For example,
ˇ RADLI CKA
274 the polynomial solution
1 x = xo − f 2 (z)xo2 + g(z)xo 2 is the second-order approximation to the exact solution x = xo + cos f (z)xo − 1 + sin g(z)xo .
(125)
The principle of the differential algebraic method rests in the observation that the nonlinear approximated solution (125) in coordinates xo , xo corresponds to the linear approximation of the map in coordinates xo , xo extended by the polynomial coordinates xo2 , xo xo , and (xo )2 ⎞⎛ ⎞ ⎛ xo ⎛ ⎞ 1 g(z) − 12 f 2 (z) 0 0 x ⎜ ⎟ ⎜ 0 0 ⎟ ⎜ x ⎟ ⎜ 0 g (z) −f (z)f (z) ⎟ ⎜ xo ⎟ ⎜ 2 ⎟ ⎜ ⎟⎜ ⎟ 0 1 2g(z) g 2 (z) ⎟ ⎜ xo2 ⎟ , (126) ⎜ x ⎟ = ⎜0 ⎟ ⎜ ⎟ ⎝ xx ⎠ ⎜ ⎝0 0 0 g (z) g g ⎠ ⎝ xo xo ⎠ 2 (x ) 0 0 0 0 g 2 (xo )2 which is the solution of some linear differential equation in the polynomial space of the second order. Thus, if such equations from the nonlinear differential equations are found in coordinates x and x , it is just the set of the linear differential equations that will be solved, the solution of which can be written in a matrix form similar to Eq. (126). The coefficients in the polynomial solution will be represented by the matrix elements present in the first two rows. Now let us apply it to the trajectory equation. The trajectory equations in accuracy up to k-th order read ∂Hk+1 (q, p, z) ∂H2 (q, p, z) + ··· + , ∂p ∂p ∂Hk+1 (q, p, z) ∂H2 (q, p, z) − ··· − . p = − ∂q ∂q
q =
(127a) (127b)
We will now find the linear approximation to these equations in the polynomial space of the k-th order in the variables q and p. We define the differential algebraic basis element |l1 , l2 , l3 , l4 = x l1 y l2 pxl3 pyl4
(128)
whose derivative can be calculated by Poisson brackets (Goldstein, 1980) [g, h] =
∂g ∂f ∂g ∂f − ∂q ∂p ∂p ∂q
(129)
LIE ALGEBRAIC METHODS
275
in the form

$$
\frac{d}{dz}|l_1,l_2,l_3,l_4\rangle = \bigl[\,|l_1,l_2,l_3,l_4\rangle,\; H_2 + \cdots + H_l\,\bigr], \tag{130}
$$

with $l = k - l_1 - l_2 - l_3 - l_4 + 2$, which is enough to describe the k-th aberration order because $\deg([g,h]) = \deg(g) + \deg(h) - 2$. This procedure yields a set of $\binom{4+k}{4} - 1$ first-order linear differential equations that determine the solution up to the k-th order. In general, the set takes the form

$$
w' + \hat{A}(z)\,w = 0, \tag{131}
$$

where

$$
w^{T} = \bigl(x, y, p_x, p_y, x^2, xy, \ldots, p_y^2, \ldots, x^k, \ldots, p_y^k\bigr). \tag{132}
$$
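The size of the linear system (131) follows from counting monomials: $n$ phase-space variables have $\binom{n+d-1}{d}$ monomials of degree $d$, so degrees 1 to $k$ give $\binom{n+k}{n} - 1$ unknowns. The helper names in this sketch are our own; it enumerates the monomials in the degree-ordered layout of Eq. (132) and reports how many there are of each degree, i.e. the number of rows in each degree block of $\hat A$. The same count with five variables applies when $P_\tau$ is added to the basis.

```python
from itertools import combinations_with_replacement
from math import comb


def monomials(n_vars, max_deg):
    """Exponent tuples of all monomials of degree 1..max_deg in n_vars variables,
    ordered by degree as in the vector w of Eq. (132)."""
    out = []
    for d in range(1, max_deg + 1):
        for combo in combinations_with_replacement(range(n_vars), d):
            exps = [0] * n_vars
            for v in combo:
                exps[v] += 1
            out.append(tuple(exps))
    return out


def block_sizes(n_vars, max_deg):
    """Number of monomials of each degree 1..max_deg."""
    return [comb(n_vars + d - 1, d) for d in range(1, max_deg + 1)]


k = 3
w4 = monomials(4, k)   # variables (x, y, p_x, p_y)
w5 = monomials(5, k)   # variables (X, Y, P_x, P_y, P_tau)
print(len(w4), comb(4 + k, 4) - 1, block_sizes(4, k))   # 34 34 [4, 10, 20]
print(len(w5), comb(5 + k, 5) - 1, block_sizes(5, k))   # 55 55 [5, 15, 35]
print(w4[:6])  # x, y, p_x, p_y, then x^2, xy as exponent tuples
```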
The evolution operator of this system can be formally written as

$$
\mathcal{M}(z,z_o) = T\exp\left(-\int_{z_o}^{z}\hat{A}(t)\,dt\right), \tag{133}
$$

where T denotes the time (here, z) ordering. The usual practical computation is based on an iterative method, but unfortunately the matrix $\hat A$, with the block structure

$$
\hat{A} = \begin{pmatrix}
4\times 4 & 4\times 20 & \cdots & 4\times\binom{3+k}{k}\\
0 & 20\times 20 & \cdots & 20\times\binom{3+k}{k}\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \binom{3+k}{k}\times\binom{3+k}{k}
\end{pmatrix}, \tag{134}
$$

is not nilpotent, which implies that $\hat{A}^n \neq 0$ for any n; hence, the iteration does not terminate after a finite number of steps. The standard approach in the differential algebraic method is therefore to solve Eq. (131) numerically. The method is thus not analytical; it combines analytical and numerical approaches. It has been implemented in a computer code (COSY INFINITY, 2007) and is often used for the numerical calculation of higher-order aberrations (Cheng et al., 2006). Such calculations are based on Eq. (131); the result is not a formula for the aberration coefficient, like the formula in Eq. (112), but only its numerical value.
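The numerical route of Eqs. (130) to (133) and (138) can be illustrated without any DA library. The following is a minimal, hypothetical sketch in pure Python with one degree of freedom and a toy Hamiltonian $H = (p^2 + x^2)/2 + a x^3/3$ of our own choosing (not an optical system from the text). It builds the matrix $C$ of the linear system $w' = Cw$ for all monomials $x^i p^j$ of degree 1 to $k$ (so $\hat A = -C$ in the sign convention of Eq. (131)), integrates the matrix evolution $\mathcal{M}' = C\mathcal{M}$ from the identity with the classical Runge–Kutta scheme, and reads the polynomial map of $x$ from the first row, as in Eq. (138).

```python
import math


def pbracket(f, g):
    """[f, g] = f_x g_p - f_p g_x for polynomials {(i, j): c} meaning c * x**i * p**j."""
    out = {}
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            val = c1 * c2 * (i1 * j2 - j1 * i2)   # both products give the same monomial
            if val:
                e = (i1 + i2 - 1, j1 + j2 - 1)
                out[e] = out.get(e, 0.0) + val
    return out


def build_generator(H, k):
    """Basis of monomials of degree 1..k and the matrix C with w' = C w."""
    basis = [(i, d - i) for d in range(1, k + 1) for i in range(d, -1, -1)]
    index = {e: n for n, e in enumerate(basis)}
    C = [[0.0] * len(basis) for _ in basis]
    for r, e in enumerate(basis):
        for e2, c in pbracket({e: 1.0}, H).items():   # d(monomial)/dz = [monomial, H]
            if 1 <= sum(e2) <= k:                     # drop terms above order k
                C[r][index[e2]] += c
    return basis, C


def evolve(C, z, steps):
    """Classical RK4 for M' = C M with M(0) = identity."""
    n = len(C)
    mat = lambda A, B: [[sum(A[r][m] * B[m][c] for m in range(n)) for c in range(n)] for r in range(n)]
    axpy = lambda A, B, s: [[A[r][c] + s * B[r][c] for c in range(n)] for r in range(n)]
    M = [[float(r == c) for c in range(n)] for r in range(n)]
    h = z / steps
    for _ in range(steps):
        k1 = mat(C, M)
        k2 = mat(C, axpy(M, k1, h / 2))
        k3 = mat(C, axpy(M, k2, h / 2))
        k4 = mat(C, axpy(M, k3, h))
        M = [[M[r][c] + h / 6 * (k1[r][c] + 2 * k2[r][c] + 2 * k3[r][c] + k4[r][c])
              for c in range(n)] for r in range(n)]
    return M


def x_polynomial(x0, p0, basis, M):
    """Analogue of Eq. (138): x(z) = sum_i M_1i w_i(z_o)."""
    w0 = [x0**i * p0**j for i, j in basis]
    return sum(M[0][c] * w0[c] for c in range(len(basis)))


def x_direct(x0, p0, a, z, steps=2000):
    """Reference solution: RK4 on x' = p, p' = -x - a x^2."""
    x, p, h = x0, p0, z / steps
    f = lambda x_, p_: (p_, -x_ - a * x_ * x_)
    for _ in range(steps):
        k1 = f(x, p)
        k2 = f(x + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = f(x + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = f(x + h * k3[0], p + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x


a = 0.8
H = {(0, 2): 0.5, (2, 0): 0.5, (3, 0): a / 3}   # toy H = (p^2 + x^2)/2 + a x^3/3
x0, p0, z = 0.01, 0.0, 1.0
exact = x_direct(x0, p0, a, z)
for k in (1, 2, 3):
    basis, C = build_generator(H, k)
    err = abs(x_polynomial(x0, p0, basis, evolve(C, z, 400)) - exact)
    print(k, f"{err:.1e}")
```

Each increase of k should shrink the discrepancy with the direct integration by a large factor, because the neglected terms are of higher order in the small initial amplitude.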
RADLIČKA
The Hamiltonian must be expressed in the DA basis in order to automate the method for any aberration order and any optical system. This task can be solved by using the approach of nonstandard analysis (Berz, 1999). In the DA basis the Hamiltonian takes the form

$$
H = \sum_{ijkl} a_{ijkl}\,|i,j,k,l\rangle, \tag{135}
$$

and Eq. (130) then reads

$$
\frac{d}{dz}|l_1,l_2,l_3,l_4\rangle = \sum_{ijkl} a_{ijkl}\,\bigl[\,|l_1,l_2,l_3,l_4\rangle,\,|i,j,k,l\rangle\,\bigr]. \tag{136}
$$

The Poisson brackets are easily found:

$$
\begin{aligned}
\bigl[\,|l_1,l_2,l_3,l_4\rangle,\,|i,j,k,l\rangle\,\bigr] ={}& (l_1k - l_3i)\,|l_1+i-1,\,l_2+j,\,l_3+k-1,\,l_4+l\rangle\\
&+ (l_2l - l_4j)\,|l_1+i,\,l_2+j-1,\,l_3+k,\,l_4+l-1\rangle.
\end{aligned} \tag{137}
$$
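Eq. (137) reduces the bracket of two basis elements to integer arithmetic on exponents, which is what makes the construction easy to automate. A minimal sketch, with function names of our own, implements Eq. (137) directly and uses it to assemble the right-hand side of Eq. (136) for an illustrative drift Hamiltonian $H = (p_x^2 + p_y^2)/2$ (an assumption for the example, not a system from the text).

```python
from collections import defaultdict


def bracket_basis(a, b):
    """Poisson bracket [|a>, |b>] of two DA basis elements via Eq. (137).

    a = (l1, l2, l3, l4) and b = (i, j, k, l) are exponents of x, y, p_x, p_y.
    Returns {exponent tuple: integer coefficient}.
    """
    l1, l2, l3, l4 = a
    i, j, k, l = b
    out = defaultdict(int)
    cx = l1 * k - l3 * i          # contribution of the (x, p_x) pair
    if cx:
        out[(l1 + i - 1, l2 + j, l3 + k - 1, l4 + l)] += cx
    cy = l2 * l - l4 * j          # contribution of the (y, p_y) pair
    if cy:
        out[(l1 + i, l2 + j - 1, l3 + k, l4 + l - 1)] += cy
    return {e: c for e, c in out.items() if c}


def derivative(elem, hamiltonian):
    """Right-hand side of Eq. (136): d|elem>/dz = sum_ijkl a_ijkl [|elem>, |i,j,k,l>]."""
    out = defaultdict(float)
    for b, a_ijkl in hamiltonian.items():
        for e, c in bracket_basis(elem, b).items():
            out[e] += a_ijkl * c
    return {e: c for e, c in out.items() if c}


# Illustrative Hamiltonian (assumed): a field-free drift, H = (p_x^2 + p_y^2)/2.
drift = {(0, 0, 2, 0): 0.5, (0, 0, 0, 2): 0.5}
print(derivative((1, 0, 0, 0), drift))             # x' = p_x
print(bracket_basis((2, 0, 0, 0), (0, 0, 2, 0)))   # [x^2, p_x^2] = 4 x p_x
```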
Using this procedure, the solution of Eq. (131) can be automated for general optical systems and for any order of aberration. The calculation is then done numerically, using a method such as the Runge–Kutta method. In general, the solution takes the form $w(z) = \mathcal{M}(z,z_o)\,w(z_o)$, and hence

$$
x = \sum_i \mathcal{M}_{1i}\,w_i(z_o), \tag{138}
$$

$$
y = \sum_i \mathcal{M}_{2i}\,w_i(z_o); \tag{139}
$$
from these equations the numerical values of the aberration coefficients are easily found. The preceding text is only a brief description of the principles of the method; more information can be found in Berz (1999). The form of the method is independent of the aberration order, but, unfortunately, the analytical form of the field must be known in order to know the form of the Hamiltonian. Thus, given the exact axial potentials and their derivatives, we can compute aberrations of any order. The use of the method is limited by the inaccuracy of the higher-order derivatives of the axial potential in real optical systems.

B. The Trajectory Method

Although the differential algebraic method yields the polynomial form of the solution in a very transparent manner, finding the analytical form of the differential equations in higher polynomial spaces is too lengthy. In the trajectory method, the polynomial form of the solution is computed directly from the trajectory equations; conversely, the method loses the transparency of the differential algebraic method. The computation is commonly based on the iterative solution of the trajectory equation (12), in which the linear approximation is taken as the initial guess (Hawkes and Kasper, 1989). The equation for the k-th iteration in the rotating coordinates then takes the form

$$
\hat{P}_1 Q^{[k]} = f_2\bigl(Q^{[k-1]}, Q'^{[k-1]}, Q''^{[k-1]}, z\bigr) + f_3\bigl(Q^{[k-1]}, Q'^{[k-1]}, Q''^{[k-1]}, z\bigr) + \cdots, \tag{140}
$$

where $Q^{[0]} = Q^{[0]}(Q_o, Q'_o, z)$ is the solution of the paraxial equation in rotating coordinates. The highest order of the terms retained on the right-hand side coincides with the order of the aberrations computed. Eq. (140) forms a set of two linear inhomogeneous second-order differential equations, whose solution can be found by variation of parameters. For parameterization by the object position and slope,

$$
\begin{aligned}
Q^{[k]} ={}& \frac{h(z)}{\phi_o^{*1/2}}\int_{z_o}^{z} g(t)\,\phi^{*1/2}\Bigl[f_2\bigl(Q^{[k-1]}, Q'^{[k-1]}, Q''^{[k-1]}, t\bigr) + f_3\bigl(Q^{[k-1]}, Q'^{[k-1]}, Q''^{[k-1]}, t\bigr) + \cdots\Bigr]dt\\
&- \frac{g(z)}{\phi_o^{*1/2}}\int_{z_o}^{z} h(t)\,\phi^{*1/2}\Bigl[f_2\bigl(Q^{[k-1]}, Q'^{[k-1]}, Q''^{[k-1]}, t\bigr) + f_3\bigl(Q^{[k-1]}, Q'^{[k-1]}, Q''^{[k-1]}, t\bigr) + \cdots\Bigr]dt,
\end{aligned} \tag{141}
$$

where $Q^{[k-1]} = Q^{[k-1]}(Q_o, Q'_o, z)$. For parameterization by the positions in the object and aperture planes,

$$
\begin{aligned}
Q^{[k]} = \frac{1}{W_{so}\,\phi_o^{*1/2}}\Biggl\{& t(z)\int_{z_o}^{z} s(\alpha)\,\phi^{*1/2}\Bigl[f_2\bigl(Q^{[k-1]}, Q'^{[k-1]}, Q''^{[k-1]}, \alpha\bigr) + f_3\bigl(Q^{[k-1]}, Q'^{[k-1]}, Q''^{[k-1]}, \alpha\bigr) + \cdots\Bigr]d\alpha\\
&- s(z)\int_{z_o}^{z} t(\alpha)\,\phi^{*1/2}\Bigl[f_2\bigl(Q^{[k-1]}, Q'^{[k-1]}, Q''^{[k-1]}, \alpha\bigr) + f_3\bigl(Q^{[k-1]}, Q'^{[k-1]}, Q''^{[k-1]}, \alpha\bigr) + \cdots\Bigr]d\alpha\Biggr\},
\end{aligned} \tag{142}
$$

where $Q^{[k-1]} = Q^{[k-1]}(Q_o, Q_a, z)$.
This procedure is commonly used; however, it is not obvious how many iteration steps are needed to obtain the solution exactly up to the k-th order. To answer this, we must examine the trajectory equations more carefully. The best approach is to introduce a perturbation parameter λ. Let us suppose the solution is written in the polynomial form

$$
Q = Q_1(Q_o, Q'_o, z) + Q_2(Q_o, Q'_o, z) + Q_3(Q_o, Q'_o, z) + \cdots, \tag{143}
$$

where $Q_k(Q_o, Q'_o, z)$ is a k-th order homogeneous polynomial in $Q_o$ and $Q'_o$. Similarly, when the parameterization by the positions in the object and aperture planes is used, the solution takes the form

$$
Q = Q_1(Q_o, Q_a, z) + Q_2(Q_o, Q_a, z) + Q_3(Q_o, Q_a, z) + \cdots, \tag{144}
$$

where $Q_k(Q_o, Q_a, z)$ is a k-th order homogeneous polynomial in $Q_o$ and $Q_a$. Using the perturbation parameter, the solution can be rewritten as

$$
Q(\lambda) = Q_1 + \lambda Q_2 + \lambda^2 Q_3 + \cdots, \tag{145}
$$

which is used in the perturbation procedure. Comparing Eqs. (143) and (145), one easily sees that $Q(\lambda = 1) = Q$. Similarly to Eq. (143), the right-hand side of the trajectory equation (140) must be rewritten as

$$
\hat{P}_1 Q(\lambda) = \lambda f_2 + \lambda^2 f_3 + \cdots, \tag{146}
$$

where $f_k$ is a k-th order homogeneous polynomial in $Q_o$ and $Q'_o$, or in $Q_o$ and $Q_a$. Substituting Eq. (145) into the trajectory equation gives

$$
\begin{aligned}
\hat{P}_1\bigl(Q_1 + \lambda Q_2 + \lambda^2 Q_3 + \cdots\bigr) ={}& \lambda f_2\bigl(Q_1 + \lambda Q_2 + \cdots,\; Q'_1 + \lambda Q'_2 + \cdots,\; Q''_1 + \lambda Q''_2 + \cdots,\; z\bigr)\\
&+ \lambda^2 f_3\bigl(Q_1 + \cdots,\; Q'_1 + \cdots,\; Q''_1 + \cdots,\; z\bigr) + \cdots.
\end{aligned} \tag{147}
$$
Comparing the zero-order terms in λ, one finds the paraxial trajectory equations

$$
\hat{P}_1(Q_1) = Q''_1 + \frac{\gamma\phi'}{2\phi^*}Q'_1 + \left(\frac{\gamma\phi''}{4\phi^*} + \frac{\eta^2B^2}{4\phi^*} + \frac{\gamma^2F_1^2}{8\phi^{*2}}\right)Q_1 = 0, \tag{148}
$$

while comparing the first-order terms leads to the equations that determine the second aberration order,

$$
Q''_2 + \frac{\gamma\phi'}{2\phi^*}Q'_2 + \left(\frac{\gamma\phi''}{4\phi^*} + \frac{\eta^2B^2}{4\phi^*} + \frac{\gamma^2F_1^2}{8\phi^{*2}}\right)Q_2 = f_2(Q_1, Q'_1, Q''_1, z). \tag{149}
$$

Because the right-hand side is only a function of z ($Q_1$ is the solution of the paraxial approximation), the solution of Eq. (149) can be evaluated by variation of parameters; for parameterization by the object position and
slope it takes the form

$$
Q_2 = \frac{h(z)}{\phi_o^{*1/2}}\int_{z_o}^{z}\phi^{*1/2}g(t)\,f_2(Q_1,Q'_1,Q''_1,t)\,dt - \frac{g(z)}{\phi_o^{*1/2}}\int_{z_o}^{z}\phi^{*1/2}h(t)\,f_2(Q_1,Q'_1,Q''_1,t)\,dt, \tag{150}
$$

or, for parameterization by the positions in the object and aperture planes,

$$
Q_2 = \frac{1}{W_{so}\,\phi_o^{*1/2}}\left[t(z)\int_{z_o}^{z}\phi^{*1/2}s(\alpha)\,f_2(Q_1,Q'_1,Q''_1,\alpha)\,d\alpha - s(z)\int_{z_o}^{z}\phi^{*1/2}t(\alpha)\,f_2(Q_1,Q'_1,Q''_1,\alpha)\,d\alpha\right]. \tag{151}
$$
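Eq. (150) is ordinary variation of parameters with the Wronskian $\phi^{*1/2}(gh' - g'h) = \phi_o^{*1/2}$. The sketch below checks it on a toy case chosen purely for illustration: constant $\phi^*$ (so the $\phi^{*1/2}$ factors cancel), a constant lens strength $\kappa$ so that $\hat P_1 Q = Q'' + \kappa^2 Q$ with $g = \cos\kappa z$, $h = \sin(\kappa z)/\kappa$, a paraxial ray $Q_1 = a\,g(z)$, and the hypothetical right-hand side $f_2 = -Q_1^2$. The integral formula is compared with a closed-form $Q_2$ and with a direct Runge–Kutta solution of $Q'' + \kappa^2 Q = -Q^2$; since the residual $Q - (Q_1 + Q_2)$ is of third order, halving $a$ should reduce it roughly eightfold.

```python
import math

KAPPA = 1.3  # illustrative constant lens strength (assumption); phi* taken constant


def g(z):
    """Paraxial ray with g(0) = 1, g'(0) = 0 for Q'' + KAPPA^2 Q = 0."""
    return math.cos(KAPPA * z)


def h(z):
    """Paraxial ray with h(0) = 0, h'(0) = 1."""
    return math.sin(KAPPA * z) / KAPPA


def simpson(func, a, b, n=400):
    """Composite Simpson rule with an even number n of sub-intervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3


def q2_variation(amp, z):
    """Eq. (150) with constant phi*, z_o = 0 and the illustrative f2 = -Q1^2."""
    f2 = lambda t: -(amp * g(t)) ** 2
    return (h(z) * simpson(lambda t: g(t) * f2(t), 0.0, z)
            - g(z) * simpson(lambda t: h(t) * f2(t), 0.0, z))


def q_direct(amp, z, steps=4000):
    """RK4 solution of Q'' + KAPPA^2 Q = -Q^2 with Q(0) = amp, Q'(0) = 0."""
    acc = lambda q: -KAPPA**2 * q - q * q
    q, v, dz = amp, 0.0, z / steps
    for _ in range(steps):
        k1q, k1v = v, acc(q)
        k2q, k2v = v + dz / 2 * k1v, acc(q + dz / 2 * k1q)
        k3q, k3v = v + dz / 2 * k2v, acc(q + dz / 2 * k2q)
        k4q, k4v = v + dz * k3v, acc(q + dz * k3q)
        q += dz / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        v += dz / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return q


def residual(amp, z=3.0):
    """|Q - (Q1 + Q2)|, which should be of third order in amp."""
    return abs(q_direct(amp, z) - (amp * g(z) + q2_variation(amp, z)))


print(residual(0.02) / residual(0.01))  # roughly 8
```

For this particular $f_2$ the closed form is $Q_2 = (a^2/\kappa^2)\bigl(-\tfrac12 + \tfrac16\cos 2\kappa z + \tfrac13\cos\kappa z\bigr)$, which the quadrature should reproduce to near machine precision.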
Computing higher aberration orders requires comparing higher powers of λ; for example, the third aberration order is determined by the equations

$$
\hat{P}_1(Q_3) = \sum_{i=1}^{2}\left[\frac{\partial f_2(Q_1,Q'_1,Q''_1,z)}{\partial Q_i}Q_{2i} + \frac{\partial f_2(Q_1,Q'_1,Q''_1,z)}{\partial Q'_i}Q'_{2i} + \frac{\partial f_2(Q_1,Q'_1,Q''_1,z)}{\partial Q''_i}Q''_{2i}\right] + f_3(Q_1,Q'_1,Q''_1,z). \tag{152}
$$

The solution of these equations found by the trajectory method takes a form similar to Eq. (150) or (151). Note that Eqs. (150) and (151) simplify when the aberrations are expressed in the image plane, where $g(z_i) = M$ and $h(z_i) = 0$, or $s(z_i) = M$ and $t(z_i) = 0$. In such cases they reduce to

$$
Q_2(z_i) = -\frac{M}{\phi_o^{*1/2}}\int_{z_o}^{z_i}\phi^{*1/2}h(t)\,f_2(Q_1,Q'_1,Q''_1,t)\,dt \tag{153}
$$

or

$$
Q_2(z_i) = -\frac{M}{W_{so}\,\phi_o^{*1/2}}\int_{z_o}^{z_i}\phi^{*1/2}t(\alpha)\,f_2(Q_1,Q'_1,Q''_1,\alpha)\,d\alpha. \tag{154}
$$
Use of the preceding procedure allows computation of aberrations of any order; moreover, introducing the perturbation parameter makes the calculation more transparent than the procedure based on Eq. (140).

C. The Eikonal Method

The eikonal method belongs to the class of perturbation methods based on variational principles. First introduced into charged particle optics by Glaser (1935), this approach was described by Rose (1987), who later used it for the calculation of correctors (Rose, 2004). Thanks to its use by Hawkes and Kasper (1989), the method is very familiar to the electron optics community. Let us start from the Lagrangian formulation of the trajectory equations, namely from Eq. (20), where henceforth we denote the integrand by M, that is,

$$
M = \Phi^{*1/2}\bigl(1 + q'^2\bigr)^{1/2} - \eta\bigl(A_x x' + A_y y' + A_z\bigr). \tag{155}
$$
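The paraxial approximation is described by the action, known in geometric optics as the eikonal,

$$
S^{(0)}_{12} = \int_{z_1}^{z_2} M_2\bigl(q^{(0)}, q'^{(0)}, z\bigr)\,dz, \tag{156}
$$

where $M_2$ is the quadratic part of the Lagrangian (155). Varying this action yields

$$
\delta S^{(0)}_{12} = \int_{z_1}^{z_2}\left[\frac{\partial M_2(q^{(0)}, q'^{(0)}, z)}{\partial q^{(0)}} - \frac{d}{dz}\frac{\partial M_2(q^{(0)}, q'^{(0)}, z)}{\partial q'^{(0)}}\right]\delta q^{(0)}\,dz + p^{(0)}_2\,\delta q^{(0)}_2 - p^{(0)}_1\,\delta q^{(0)}_1, \tag{157}
$$

where $p^{(0)} = \partial M_2/\partial q'$. When we assume that $q^{(0)}$ satisfies the paraxial equation of motion

$$
\frac{\partial M_2(q^{(0)}, q'^{(0)}, z)}{\partial q} - \frac{d}{dz}\frac{\partial M_2(q^{(0)}, q'^{(0)}, z)}{\partial q'} = 0, \tag{158}
$$

the variation reduces to

$$
\delta S^{(0)}_{12} = p^{(0)}_2\,\delta q^{(0)}_2 - p^{(0)}_1\,\delta q^{(0)}_1. \tag{159}
$$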
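However, our intention here is to describe the situation in which the higher-order terms of the Lagrangian are retained. Let us suppose that Eq. (155) can be expanded to

$$
M(q, q', z) = M_2(q, q', z) + \lambda M^{I}(q, q', z) + \lambda^2 M^{II}(q, q', z) + \cdots; \tag{160}
$$

the paraxial trajectory is then changed to

$$
q = q^{(0)} + \lambda q^{(1)} + \lambda^2 q^{(2)} + \cdots, \tag{161}
$$

where λ is the perturbation parameter. The variation of the action

$$
S_{12} = \int_{z_1}^{z_2} M(q, q', z)\,dz \tag{162}
$$

can be found either by a method similar to that used to derive Eq. (159),

$$
\begin{aligned}
\delta S ={}& p_2\,\delta q_2 - p_1\,\delta q_1\\
={}& \bigl(p^{(0)}_2 + \lambda p^{(1)}_2 + \lambda^2 p^{(2)}_2 + \cdots\bigr)\,\delta\bigl(q^{(0)}_2 + \lambda q^{(1)}_2 + \lambda^2 q^{(2)}_2 + \cdots\bigr)\\
&- \bigl(p^{(0)}_1 + \lambda p^{(1)}_1 + \lambda^2 p^{(2)}_1 + \cdots\bigr)\,\delta\bigl(q^{(0)}_1 + \lambda q^{(1)}_1 + \lambda^2 q^{(2)}_1 + \cdots\bigr),
\end{aligned} \tag{163}
$$

with $p^{(1)} = \partial M^{I}/\partial q'$, $p^{(2)} = \partial M^{II}/\partial q'$, etc., or by varying the action (162) into which Eqs. (160) and (161) have been substituted,

$$
\begin{aligned}
S = \int_{z_1}^{z_2}\Bigl[& M_2\bigl(q^{(0)} + \lambda q^{(1)} + \lambda^2 q^{(2)} + \cdots,\; q'^{(0)} + \lambda q'^{(1)} + \lambda^2 q'^{(2)} + \cdots,\; z\bigr)\\
&+ \lambda M^{I}\bigl(q^{(0)} + \lambda q^{(1)} + \lambda^2 q^{(2)} + \cdots,\; q'^{(0)} + \lambda q'^{(1)} + \lambda^2 q'^{(2)} + \cdots,\; z\bigr)\\
&+ \lambda^2 M^{II}\bigl(q^{(0)} + \lambda q^{(1)} + \lambda^2 q^{(2)} + \cdots,\; q'^{(0)} + \lambda q'^{(1)} + \lambda^2 q'^{(2)} + \cdots,\; z\bigr) + \cdots\Bigr]dz.
\end{aligned} \tag{164}
$$

The idea is to expand Eqs. (163) and (164) in powers of λ,

$$
S_{12} = S^{(0)}_{12} + \lambda S^{(1)}_{12} + \lambda^2 S^{(2)}_{12} + \cdots, \tag{165a}
$$

$$
\delta S_{12} = \delta S^{(0)}_{12} + \lambda\,\delta S^{(1)}_{12} + \lambda^2\,\delta S^{(2)}_{12} + \cdots, \tag{165b}
$$

and to find $q^{(i)}$ and $p^{(i)}$ by comparing the coefficients. We demonstrate the procedure for the first- and second-order perturbations.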
1. The First-Order Perturbation

This perturbation is described by

$$
S^{(1)}_{12} = \int_{z_1}^{z_2}\left[M^{I}\bigl(q^{(0)}, q'^{(0)}, z\bigr) + \frac{\partial M_2(q^{(0)}, q'^{(0)}, z)}{\partial q}\,q^{(1)} + \frac{\partial M_2(q^{(0)}, q'^{(0)}, z)}{\partial q'}\,q'^{(1)}\right]dz, \tag{166}
$$

which, using integration by parts and the paraxial equations of motion, reduces to

$$
S^{(1)}_{12} = S^{I}_{12}\bigl(q^{(0)}_1, q'^{(0)}_1, z\bigr) + p^{(0)}_2\,q^{(1)}_2 - p^{(0)}_1\,q^{(1)}_1, \tag{167}
$$

where

$$
S^{I}_{12} = \int_{z_1}^{z_2} M^{I}\bigl(q^{(0)}, q'^{(0)}, z\bigr)\,dz \tag{168}
$$

is a given function of $q_1$, $p_1$, and z. The variation of Eq. (167) reads

$$
\delta S^{(1)}_{12} = \delta S^{I}_{12} + \delta p^{(0)}_2\,q^{(1)}_2 + p^{(0)}_2\,\delta q^{(1)}_2 - \delta p^{(0)}_1\,q^{(1)}_1 - p^{(0)}_1\,\delta q^{(1)}_1; \tag{169}
$$

on the other hand, we can express the variation from Eq. (163),

$$
\delta S^{(1)}_{12} = p^{(1)}_2\,\delta q^{(0)}_2 + p^{(0)}_2\,\delta q^{(1)}_2 - p^{(1)}_1\,\delta q^{(0)}_1 - p^{(0)}_1\,\delta q^{(1)}_1, \tag{170}
$$
and comparing the two previous equations, one finds the first-order perturbation relation (Sturrock, 1955)

$$
\delta S^{I}_{12} = p^{(1)}_2\,\delta q^{(0)}_2 - p^{(1)}_1\,\delta q^{(0)}_1 - \delta p^{(0)}_2\,q^{(1)}_2 + \delta p^{(0)}_1\,q^{(1)}_1. \tag{171}
$$

This relation is more general than is commonly needed; for example, the perturbed rays may be required to satisfy some constraints. Two such cases are relevant. The first,

$$
q^{(1)}_1 = p^{(1)}_1 = 0, \tag{172}
$$

constrains the initial position and momentum of the perturbed rays to be the same as those of the unperturbed ones; the perturbed ray is then determined by its position and slope at $z_1$. The first-order perturbation relation then reads

$$
\delta S^{I}_{12} = p^{(1)}_2\,\delta q^{(0)}_2 - q^{(1)}_2\,\delta p^{(0)}_2, \tag{173}
$$
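which leads to the set of equations

$$
\frac{\partial S^{I}_{12}}{\partial q_1} = \sum_{k=1}^{2}\left(p^{(1)}_{2k}\frac{\partial q^{(0)}_{2k}}{\partial q_1} - q^{(1)}_{2k}\frac{\partial p^{(0)}_{2k}}{\partial q_1}\right), \tag{174a}
$$

$$
\frac{\partial S^{I}_{12}}{\partial p_1} = \sum_{k=1}^{2}\left(p^{(1)}_{2k}\frac{\partial q^{(0)}_{2k}}{\partial p_1} - q^{(1)}_{2k}\frac{\partial p^{(0)}_{2k}}{\partial p_1}\right), \tag{174b}
$$

which can be written in the form

$$
\begin{pmatrix}\partial S^{I}_{12}/\partial q_1\\ \partial S^{I}_{12}/\partial p_1\end{pmatrix} = \mathcal{M}_1^{T}\begin{pmatrix}p^{(1)}_2\\ -q^{(1)}_2\end{pmatrix}, \tag{175}
$$

where $\mathcal{M}_1$ is the paraxial matrix in the parameterization by $q_o$ and $p_o$. The coordinates $q^{(1)}_2$ and $p^{(1)}_2$ are then easy to calculate:

$$
\begin{pmatrix}p^{(1)}_2\\ -q^{(1)}_2\end{pmatrix} = \bigl(\mathcal{M}_1^{T}\bigr)^{-1}\begin{pmatrix}\partial S^{I}_{12}/\partial q_1\\ \partial S^{I}_{12}/\partial p_1\end{pmatrix}. \tag{176}
$$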
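The first-order relations (172) to (176) can be exercised on a one-dimensional toy Lagrangian of our own choosing, $M = \tfrac12 q'^2 - \tfrac12 q^2 + \lambda M^{I}$ with $M^{I} = -q^4/4$, for which $p = q'$, the paraxial ray is $q^{(0)} = q_1\cos z + p_1\sin z$, and $\mathcal{M}_1$ is a rotation matrix. The sketch evaluates $S^{I}_{12}$ by quadrature, differentiates it with respect to $q_1$ and $p_1$ by central differences, applies Eq. (176), and compares the resulting $q^{(1)}_2$ with a central difference in λ of two direct integrations of the full Euler–Lagrange equation $q'' = -q - \lambda q^3$ started from the same $(q_1, p_1)$, as the constraint (172) requires.

```python
import math


def simpson(func, a, b, n=600):
    """Composite Simpson rule with an even number n of sub-intervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3


def s_first(q1, p1, z2):
    """S^I_12 of Eq. (168) for the toy M^I = -q^4/4 along q0 = q1 cos z + p1 sin z."""
    q0 = lambda t: q1 * math.cos(t) + p1 * math.sin(t)
    return simpson(lambda t: -0.25 * q0(t) ** 4, 0.0, z2)


def first_order_from_eikonal(q1, p1, z2, eps=1e-6):
    """q_2^(1), p_2^(1) from Eq. (176): (p2, -q2)^T = (M1^T)^(-1) grad S^I."""
    dS_dq1 = (s_first(q1 + eps, p1, z2) - s_first(q1 - eps, p1, z2)) / (2 * eps)
    dS_dp1 = (s_first(q1, p1 + eps, z2) - s_first(q1, p1 - eps, z2)) / (2 * eps)
    c, s = math.cos(z2), math.sin(z2)
    mt = [[c, -s], [s, c]]                 # transpose of M1 = [[c, s], [-s, c]]
    det = mt[0][0] * mt[1][1] - mt[0][1] * mt[1][0]
    inv = [[mt[1][1] / det, -mt[0][1] / det],
           [-mt[1][0] / det, mt[0][0] / det]]
    p2 = inv[0][0] * dS_dq1 + inv[0][1] * dS_dp1
    minus_q2 = inv[1][0] * dS_dq1 + inv[1][1] * dS_dp1
    return -minus_q2, p2


def endpoint(q1, p1, z2, lam, steps=4000):
    """RK4 for q'' = -q - lam q^3, the Euler-Lagrange equation of the full toy M."""
    acc = lambda q: -q - lam * q**3
    q, v, dz = q1, p1, z2 / steps
    for _ in range(steps):
        k1q, k1v = v, acc(q)
        k2q, k2v = v + dz / 2 * k1v, acc(q + dz / 2 * k1q)
        k3q, k3v = v + dz / 2 * k2v, acc(q + dz / 2 * k2q)
        k4q, k4v = v + dz * k3v, acc(q + dz * k3q)
        q += dz / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        v += dz / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return q


q1, p1, z2, lam = 0.5, 0.3, 2.0, 1e-3
q2_eikonal, _ = first_order_from_eikonal(q1, p1, z2)
q2_direct = (endpoint(q1, p1, z2, lam) - endpoint(q1, p1, z2, -lam)) / (2 * lam)
print(q2_eikonal, q2_direct)  # the two estimates of q_2^(1) should agree closely
```

The eikonal route never integrates the perturbed ray: it needs only the paraxial ray, one quadrature, and the inverse of $\mathcal{M}_1^T$, which is the practical appeal of Eq. (176).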
When rotating coordinates are used, the previous result can be simplified for stigmatic systems to

$$
\begin{pmatrix}P^{(1)}_2\\ -Q^{(1)}_2\end{pmatrix} = \begin{pmatrix}\left(\phi^*/\phi_o^*\right)^{1/2}h'\,\hat{1} & -\phi^{*1/2}g'\,\hat{1}\\ -\phi_o^{*-1/2}h\,\hat{1} & g\,\hat{1}\end{pmatrix}\begin{pmatrix}\partial S^{I}_{12}/\partial Q_1\\ \partial S^{I}_{12}/\partial P_1\end{pmatrix}, \tag{177}
$$

and in the image plane, where $h(z_i) = 0$ and $g(z_i) = M$, we can write

$$
Q^{(1)}_i = -M\,\frac{\partial S^{I}_{12}}{\partial P_o}. \tag{178}
$$

The second reasonable choice of constraint reads

$$
q^{(1)}_1 = q^{(1)}_2 = 0, \tag{179}
$$

which means that the perturbed ray is determined by its positions at $z_1$ and $z_2$. The first-order perturbation relation then takes the form

$$
\delta S^{I}_{12} = p^{(1)}_2\,\delta q^{(0)}_2 - p^{(1)}_1\,\delta q^{(0)}_1. \tag{180}
$$

In practical computations, the rays are determined by their positions in the object and aperture planes, which means that the constraints take the form

$$
q^{(1)}_o = q^{(1)}_a = 0 \tag{181}
$$
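and the first-order perturbation relations read

$$
\delta S^{I}_{o2} = p^{(1)}_2\,\delta q^{(0)}_2 - q^{(1)}_2\,\delta p^{(0)}_2 - p^{(1)}_o\,\delta q^{(0)}_o, \tag{182}
$$

$$
\delta S^{I}_{a2} = p^{(1)}_2\,\delta q^{(0)}_2 - q^{(1)}_2\,\delta p^{(0)}_2 - p^{(1)}_a\,\delta q^{(0)}_a. \tag{183}
$$

The quantities $p^{(1)}_2$ and $q^{(1)}_2$ can be evaluated from the equations

$$
\frac{\partial S^{I}_{o2}}{\partial q_a} = \sum_{k=1}^{2}\left(p^{(1)}_{2k}\frac{\partial q^{(0)}_{2k}}{\partial q_a} - q^{(1)}_{2k}\frac{\partial p^{(0)}_{2k}}{\partial q_a}\right), \tag{184a}
$$

$$
\frac{\partial S^{I}_{a2}}{\partial q_o} = \sum_{k=1}^{2}\left(p^{(1)}_{2k}\frac{\partial q^{(0)}_{2k}}{\partial q_o} - q^{(1)}_{2k}\frac{\partial p^{(0)}_{2k}}{\partial q_o}\right), \tag{184b}
$$

where the functions $S^{I}_{o2}$ and $S^{I}_{a2}$ are parameterized by the positions in the object and aperture planes, $S^{I}_{o2}(q_o, q_a, z)$ and $S^{I}_{a2}(q_o, q_a, z)$. As in the previous case,

$$
\begin{pmatrix}\partial S^{I}_{a2}/\partial q_o\\ \partial S^{I}_{o2}/\partial q_a\end{pmatrix} = \mathcal{M}_1^{T}\begin{pmatrix}p^{(1)}_2\\ -q^{(1)}_2\end{pmatrix}, \tag{185}
$$

where $\mathcal{M}_1$ is the paraxial matrix in the parameterization by $q_o$ and $q_a$. For a stigmatic system in rotating coordinates,

$$
\begin{pmatrix}P^{(1)}_2\\ -Q^{(1)}_2\end{pmatrix} = \frac{1}{W_{so}\,\phi_o^{*1/2}}\begin{pmatrix}\phi^{*1/2}t'\,\hat{1} & -\phi^{*1/2}s'\,\hat{1}\\ -t\,\hat{1} & s\,\hat{1}\end{pmatrix}\begin{pmatrix}\partial S^{I}_{a2}/\partial Q_o\\ \partial S^{I}_{o2}/\partial Q_a\end{pmatrix}, \tag{186}
$$

where $W_s = st' - s't$ is the Wronskian (85). Thus, in the image plane, where $t(z_i) = 0$ and $s(z_i) = M$, we can write

$$
Q^{(1)}_i = -\frac{M}{\phi_o^{*1/2}W_{so}}\,\frac{\partial S^{I}_{o2}}{\partial Q_a}.
$$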
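2. The Second-Order Perturbation

This perturbation order is described by the part of the action

$$
\begin{aligned}
S^{(2)}_{12} = \int_{z_1}^{z_2}\Biggl\{& M^{II}\bigl(q^{(0)}, q'^{(0)}, z\bigr) + \frac{\partial M^{I}(q^{(0)}, q'^{(0)}, z)}{\partial q}\,q^{(1)} + \frac{\partial M^{I}(q^{(0)}, q'^{(0)}, z)}{\partial q'}\,q'^{(1)} + \frac{\partial M_2(q^{(0)}, q'^{(0)}, z)}{\partial q}\,q^{(2)}\\
&+ \frac{\partial M_2(q^{(0)}, q'^{(0)}, z)}{\partial q'}\,q'^{(2)} + \frac{1}{2}\sum_{i,j}\left[\frac{\partial^2 M_2}{\partial q_i\,\partial q_j}\,q^{(1)}_i q^{(1)}_j + 2\,\frac{\partial^2 M_2}{\partial q_i\,\partial q'_j}\,q^{(1)}_i q'^{(1)}_j + \frac{\partial^2 M_2}{\partial q'_i\,\partial q'_j}\,q'^{(1)}_i q'^{(1)}_j\right]\Biggr\}dz,
\end{aligned} \tag{187–188}
$$

where the second derivatives of $M_2$ are evaluated at $(q^{(0)}, q'^{(0)}, z)$. Similarly to the first-order perturbation, by use of the equation of motion we can write

$$
\int_{z_1}^{z_2}\left[\frac{\partial M_2(q^{(0)}, q'^{(0)}, z)}{\partial q}\,q^{(2)} + \frac{\partial M_2(q^{(0)}, q'^{(0)}, z)}{\partial q'}\,q'^{(2)}\right]dz = p^{(0)}_2\,q^{(2)}_2 - p^{(0)}_1\,q^{(2)}_1. \tag{189}
$$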
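$S^{(2)}_{12}$ can also be simplified using

$$
\int_{z_1}^{z_2}\left(q^{(1)}\frac{\partial M}{\partial q} + q'^{(1)}\frac{\partial M}{\partial q'}\right)dz = q^{(1)}_2 p_2 - q^{(1)}_1 p_1, \tag{190}
$$

where the left-hand side can be expanded in powers of the perturbation parameter λ,

$$
\begin{aligned}
&\int_{z_1}^{z_2}\left(q^{(1)}\frac{\partial M_2}{\partial q} + q'^{(1)}\frac{\partial M_2}{\partial q'}\right)dz + \lambda\Biggl\{\int_{z_1}^{z_2}\left(q^{(1)}\frac{\partial M^{I}}{\partial q} + q'^{(1)}\frac{\partial M^{I}}{\partial q'}\right)dz\\
&\quad + \int_{z_1}^{z_2}\sum_{i,j}\left[q^{(1)}_i q^{(1)}_j\frac{\partial^2 M_2}{\partial q_i\,\partial q_j} + 2q^{(1)}_i q'^{(1)}_j\frac{\partial^2 M_2}{\partial q_i\,\partial q'_j} + q'^{(1)}_i q'^{(1)}_j\frac{\partial^2 M_2}{\partial q'_i\,\partial q'_j}\right]dz\Biggr\} + O\bigl(\lambda^2\bigr),
\end{aligned} \tag{191}
$$

and the right-hand side to

$$
q^{(1)}_2 p_2 - q^{(1)}_1 p_1 = q^{(1)}_2 p^{(0)}_2 - q^{(1)}_1 p^{(0)}_1 + \lambda\bigl(q^{(1)}_2 p^{(1)}_2 - q^{(1)}_1 p^{(1)}_1\bigr) + O\bigl(\lambda^2\bigr). \tag{192}
$$

When the terms linear in λ are compared, we obtain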
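$$
\begin{aligned}
\int_{z_1}^{z_2}\left(q^{(1)}\frac{\partial M^{I}}{\partial q} + q'^{(1)}\frac{\partial M^{I}}{\partial q'}\right)dz ={}& -\int_{z_1}^{z_2}\sum_{i,j}\left[q^{(1)}_i q^{(1)}_j\frac{\partial^2 M_2}{\partial q_i\,\partial q_j} + 2q^{(1)}_i q'^{(1)}_j\frac{\partial^2 M_2}{\partial q_i\,\partial q'_j} + q'^{(1)}_i q'^{(1)}_j\frac{\partial^2 M_2}{\partial q'_i\,\partial q'_j}\right]dz\\
&+ q^{(1)}_2 p^{(1)}_2 - q^{(1)}_1 p^{(1)}_1.
\end{aligned} \tag{193}
$$

Hence,

$$
S^{(2)}_{12} = S^{II}_{12} + p^{(0)}_2 q^{(2)}_2 - p^{(0)}_1 q^{(2)}_1 + p^{(1)}_2 q^{(1)}_2 - p^{(1)}_1 q^{(1)}_1, \tag{194}
$$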
where the second-order characteristic function

$$
S^{II}_{12} = \int_{z_1}^{z_2}\left\{M^{II}\bigl(q^{(0)}, q'^{(0)}, z\bigr) - \frac{1}{2}\sum_{i,j}\left[\frac{\partial^2 M_2}{\partial q_i\,\partial q_j}\,q^{(1)}_i q^{(1)}_j + 2\,\frac{\partial^2 M_2}{\partial q_i\,\partial q'_j}\,q^{(1)}_i q'^{(1)}_j + \frac{\partial^2 M_2}{\partial q'_i\,\partial q'_j}\,q'^{(1)}_i q'^{(1)}_j\right]\right\}dz, \tag{195}
$$

with the derivatives of $M_2$ evaluated at $(q^{(0)}, q'^{(0)}, z)$, depends either on z and the values of q and q' at $z_1$, or on z and the positions of the ray in the object and aperture planes, similar to the second and third terms on the left-hand side, while the functions that must be computed appear on the right-hand side. Varying Eq. (188) and comparing it with the expansion of Eq. (163), the second-order perturbation relation can be derived (Sturrock, 1955):

$$
\delta S^{II}_{12} - q^{(1)}_2\,\delta p^{(1)}_2 + q^{(1)}_1\,\delta p^{(1)}_1 = p^{(2)}_2\,\delta q^{(0)}_2 - p^{(2)}_1\,\delta q^{(0)}_1 - q^{(2)}_2\,\delta p^{(0)}_2 + q^{(2)}_1\,\delta p^{(0)}_1. \tag{196}
$$
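Two perturbation relations can be found by a procedure similar to that used for the first-order perturbation. The first is for rays determined by their position and slope in the object plane,

$$
\delta S^{II}_{12} - q^{(1)}_2\,\delta p^{(1)}_2 = p^{(2)}_2\,\delta q^{(0)}_2 - q^{(2)}_2\,\delta p^{(0)}_2, \tag{197}
$$

which leads to the set of equations

$$
\frac{\partial S^{II}_{12}}{\partial q_1} - \sum_{k=1}^{2} q^{(1)}_{2k}\frac{\partial p^{(1)}_{2k}}{\partial q_1} = \sum_{k=1}^{2}\left(p^{(2)}_{2k}\frac{\partial q^{(0)}_{2k}}{\partial q_1} - q^{(2)}_{2k}\frac{\partial p^{(0)}_{2k}}{\partial q_1}\right), \tag{198a}
$$

$$
\frac{\partial S^{II}_{12}}{\partial p_1} - \sum_{k=1}^{2} q^{(1)}_{2k}\frac{\partial p^{(1)}_{2k}}{\partial p_1} = \sum_{k=1}^{2}\left(p^{(2)}_{2k}\frac{\partial q^{(0)}_{2k}}{\partial p_1} - q^{(2)}_{2k}\frac{\partial p^{(0)}_{2k}}{\partial p_1}\right). \tag{198b}
$$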
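As in the first-order perturbation, this takes the form

$$
\begin{pmatrix}\dfrac{\partial S^{II}_{12}}{\partial q_1} - \sum_{k=1}^{2} q^{(1)}_{2k}\dfrac{\partial p^{(1)}_{2k}}{\partial q_1}\\[2ex] \dfrac{\partial S^{II}_{12}}{\partial p_1} - \sum_{k=1}^{2} q^{(1)}_{2k}\dfrac{\partial p^{(1)}_{2k}}{\partial p_1}\end{pmatrix} = \mathcal{M}_1^{T}\begin{pmatrix}p^{(2)}_2\\ -q^{(2)}_2\end{pmatrix}; \tag{199}
$$

hence,

$$
\begin{pmatrix}p^{(2)}_2\\ -q^{(2)}_2\end{pmatrix} = \bigl(\mathcal{M}_1^{T}\bigr)^{-1}\begin{pmatrix}\dfrac{\partial S^{II}_{12}}{\partial q_1} - \sum_{k=1}^{2} q^{(1)}_{2k}\dfrac{\partial p^{(1)}_{2k}}{\partial q_1}\\[2ex] \dfrac{\partial S^{II}_{12}}{\partial p_1} - \sum_{k=1}^{2} q^{(1)}_{2k}\dfrac{\partial p^{(1)}_{2k}}{\partial p_1}\end{pmatrix}, \tag{200}
$$

and for a stigmatic system in rotating coordinates in the image plane it can be written as

$$
Q^{(2)}_i = -M\left(\frac{\partial S^{II}_{oi}}{\partial P_o} - \sum_{k=1}^{2} Q^{(1)}_{ik}\frac{\partial P^{(1)}_{ik}}{\partial P_o}\right). \tag{201}
$$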
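The second perturbation relation, for rays determined by their positions in the object and aperture planes, reads

$$
\delta S^{II}_{o2} - q^{(1)}_2\,\delta p^{(1)}_2 = p^{(2)}_2\,\delta q^{(0)}_2 - q^{(2)}_2\,\delta p^{(0)}_2 - p^{(2)}_o\,\delta q^{(0)}_o, \tag{202a}
$$

$$
\delta S^{II}_{a2} - q^{(1)}_2\,\delta p^{(1)}_2 = p^{(2)}_2\,\delta q^{(0)}_2 - q^{(2)}_2\,\delta p^{(0)}_2 - p^{(2)}_a\,\delta q^{(0)}_a, \tag{202b}
$$

which leads to the set of equations

$$
\frac{\partial S^{II}_{o2}}{\partial q_a} - \sum_{k=1}^{2} q^{(1)}_{2k}\frac{\partial p^{(1)}_{2k}}{\partial q_a} = \sum_{k=1}^{2}\left(p^{(2)}_{2k}\frac{\partial q^{(0)}_{2k}}{\partial q_a} - q^{(2)}_{2k}\frac{\partial p^{(0)}_{2k}}{\partial q_a}\right), \tag{203a}
$$

$$
\frac{\partial S^{II}_{a2}}{\partial q_o} - \sum_{k=1}^{2} q^{(1)}_{2k}\frac{\partial p^{(1)}_{2k}}{\partial q_o} = \sum_{k=1}^{2}\left(p^{(2)}_{2k}\frac{\partial q^{(0)}_{2k}}{\partial q_o} - q^{(2)}_{2k}\frac{\partial p^{(0)}_{2k}}{\partial q_o}\right). \tag{203b}
$$

Using the same procedure as in the previous case, we find for the aberration in the image plane

$$
Q^{(2)}_i = -\frac{M}{\phi_o^{*1/2}W_{so}}\left(\frac{\partial S^{II}_{oi}}{\partial Q_a} - \sum_{k=1}^{2} Q^{(1)}_{ik}\frac{\partial P^{(1)}_{ik}}{\partial Q_a}\right). \tag{204}
$$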
D. The Lie Algebraic Method

This method was introduced by Dragt et al. at the University of Maryland in the 1980s. The basic ideas were published by Dragt and colleagues (Dragt, 1987; Dragt and Forest, 1986a, 1986b; Dragt et al., 1988). It is commonly used in beam dynamics (Dragt et al., 1988; Forest, 1998), where it was employed in the computer code MARYLIE (Department of Physics, University of Maryland, College Park, MD, USA) (MARYLIE, 2006). The use of the method in electron optics is rare (Dragt, 1990; Dragt and Forest, 1986b; Ximen, 1995); the authors have concentrated primarily on the calculation of high-order aberrations of simple systems (Hu and Tang, 1998, 1999). A major strength of the method is its description of the stability of Hamiltonian systems (Bazzani, 1988; Dragt, 1982; Meyer, 1974). The name of the method derives from the mathematical structure it uses: it is based on the Poisson brackets, which give the phase space the structure of a Lie algebra, and on canonical perturbation theory (Cary, 1981). We use the standard notation

$$
[f,g] = \frac{\partial f}{\partial q}\frac{\partial g}{\partial p} - \frac{\partial f}{\partial p}\frac{\partial g}{\partial q} \tag{205}
$$

for the Poisson bracket of two functions. It is easy to see that the equations of motion can then be written concisely in the form

$$
w' = [w, H], \tag{206}
$$

where w is the vector in phase space

$$
w = \begin{pmatrix} q\\ p\end{pmatrix}. \tag{207}
$$
Direct calculation easily demonstrates the Lie algebra structure that the Poisson bracket generates on the phase space:

$$
[f, \alpha g + \beta h] = \alpha[f,g] + \beta[f,h] \quad\text{(linearity)}, \tag{208a}
$$

$$
[f,g] = -[g,f] \quad\text{(antisymmetry)}, \tag{208b}
$$

$$
\bigl[f,[g,h]\bigr] + \bigl[h,[f,g]\bigr] + \bigl[g,[h,f]\bigr] = 0 \quad\text{(Jacobi identity)}. \tag{208c}
$$

Because the Poisson bracket exists for all differentiable functions, this operation can be used to assign a linear operator to every such function,

$$
f \to\, :f:, \qquad :f:g = [f,g]. \tag{209}
$$
This is called a Lie operator. Its linearity is a direct consequence of Eq. (208a),

$$
:f:(\alpha g + \beta h) = \alpha\,:f:g + \beta\,:f:h, \tag{210}
$$

and the Jacobi identity implies that it acts as a derivation on the algebra of functions with the product given by the Poisson bracket,

$$
:f:[g,h] = [:f:g,\, h] + [g,\, :f:h]. \tag{211}
$$

A similar property also holds for the algebra of functions in which the product is the ordinary product of functions (Steinberg, 1986),

$$
:f:(gh) = (:f:g)\,h + g\,(:f:h). \tag{212}
$$
All of these properties have important consequences, as shown later. The Lie algebraic method is based on canonical transformations. The standard description of a canonical transformation uses a generating function, in which the new and original coordinates are mixed (Goldstein, 1980); that is,

$$
\tilde{q} = \frac{\partial F(\tilde{p}, q, z)}{\partial \tilde{p}}, \qquad p = \frac{\partial F(\tilde{p}, q, z)}{\partial q}. \tag{213}
$$

For the purposes of perturbation theory, however, a description of the transformation that uses only the original coordinates is more convenient. This is possible with Lie transformations, which are defined as the exponential of the Lie operator; that is,

$$
w(\tilde{w}, z) = e^{:g(\tilde{w},z):}\,\tilde{w}, \tag{214}
$$

where

$$
e^{:g:} = \hat{1} + :g: + \frac{1}{2!}:g:^2 + \frac{1}{3!}:g:^3 + \cdots. \tag{215}
$$

It can be shown (Steinberg, 1986) that, because Eq. (211) holds, the transformation so produced is canonical; that is,

$$
e^{:g:}[f,h] = \bigl[e^{:g:}f,\; e^{:g:}h\bigr]. \tag{216}
$$

Moreover, Eq. (212) implies that all analytic functions transform as (Steinberg, 1986)

$$
\tilde{f}(\tilde{q}, \tilde{p}, z) = f\bigl(q(\tilde{q}, \tilde{p}, z),\, p(\tilde{q}, \tilde{p}, z),\, z\bigr) = e^{:g(\tilde{q}, \tilde{p}, z):}\,f(\tilde{q}, \tilde{p}, z). \tag{217}
$$

Unfortunately, the situation for the Hamiltonian is more complicated, because the transformation rule for Hamiltonians differs from that for functions (Goldstein, 1980). An expansion in terms of the Lie transformation can be found in the form (Cary, 1977)

$$
\tilde{H}(\tilde{q}, \tilde{p}, z) = e^{:g(\tilde{q}, \tilde{p}, z):}\,H(\tilde{q}, \tilde{p}, z) + \int_0^1 e^{\theta:g(\tilde{q}, \tilde{p}, z):}\,\frac{\partial g}{\partial z}\,d\theta \tag{218}
$$

(see also Appendix A).
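The Lie operator (209) and the Lie transformation (215) are easy to realize for polynomials. This sketch uses one degree of freedom, stores polynomials as {(i, j): c} for $c\,x^i p^j$, and uses helper names of our own. It truncates the series (215) at a chosen degree (safe for a cubic or higher generator, which by Eq. (223) only raises the degree), checks the derivation property (212), and uses the illustrative generator $g = x^2 p$, for which $e^{:g:}x$ should reproduce the series of $x/(1+x)$ and $e^{:g:}p = p(1+x)^2$. The transformed pair keeps $[\tilde x, \tilde p] = 1$ through the retained order, as Eq. (216) requires.

```python
from math import factorial


def pbracket(f, g):
    """Poisson bracket (205) for polynomials {(i, j): c} meaning c * x**i * p**j."""
    out = {}
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            val = c1 * c2 * (i1 * j2 - j1 * i2)   # [x^i1 p^j1, x^i2 p^j2] is one monomial
            if val:
                e = (i1 + i2 - 1, j1 + j2 - 1)
                out[e] = out.get(e, 0) + val
    return {e: c for e, c in out.items() if c}


def pmul(f, g):
    """Ordinary product of two polynomials."""
    out = {}
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            e = (i1 + i2, j1 + j2)
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c}


def padd(f, g):
    """Sum of two polynomials."""
    out = dict(f)
    for e, c in g.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c}


def lie_transform(g, f, max_deg, n_max=50):
    """e^{:g:} f from the series (215), dropping monomials of degree > max_deg."""
    keep = lambda poly: {e: c for e, c in poly.items() if sum(e) <= max_deg}
    result, term = keep(f), keep(f)
    for n in range(1, n_max + 1):
        term = keep(pbracket(g, term))        # :g:^n f
        if not term:
            break
        result = padd(result, {e: c / factorial(n) for e, c in term.items()})
    return result


x, p = {(1, 0): 1}, {(0, 1): 1}
gen = {(2, 1): 1}                             # illustrative generator g = x^2 p
xt = lie_transform(gen, x, 4)                 # x - x^2 + x^3 - x^4
pt = lie_transform(gen, p, 5)                 # p + 2xp + x^2 p
low = {e: c for e, c in pbracket(xt, pt).items() if sum(e) <= 3}
lhs = pbracket(gen, pmul(x, p))                                      # :g:(xp)
rhs = padd(pmul(pbracket(gen, x), p), pmul(x, pbracket(gen, p)))     # Eq. (212)
print(xt, pt, low, lhs == rhs)
```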
In fact, the Lie algebraic method is just a modification of the canonical perturbation methods used in classical mechanics (Lichtenberg and Lieberman, 1982). Like the Hamilton–Jacobi theory, it is based on the fact that the motion is a canonical transformation that compensates the Hamiltonian. In contrast to the Hamilton–Jacobi theory, however, the Hamiltonian is not compensated by a single transformation but term by term. In our case, where the Hamiltonian is easily expanded into a polynomial in the canonical variables,

$$
H = H_2 + H_3 + H_4 + \cdots, \tag{219}
$$

it is natural to look for a sequence of canonical transformations that compensate the Hamiltonian order by order. Let us say that the canonical transformation $\mathcal{M}_k$ transforms the system with Hamiltonian (219) into a system with Hamiltonian

$$
H = H_{k+2} + H_{k+3} + \cdots. \tag{220}
$$
The equations of motion then take the form

$$
\frac{d\tilde{w}}{dz} = [\tilde{w}, H] = f_{k+1} + f_{k+2} + \cdots; \tag{221}
$$

that is, the lowest order of the terms on the right-hand side is k + 1. The physical meaning of $\mathcal{M}_k$ follows from the fact that the solution of such an equation can be written in the form

$$
\tilde{w} = \tilde{w}_o + g_{k+1}(\tilde{w}_o, z) + g_{k+2}(\tilde{w}_o, z) + \cdots; \tag{222}
$$

hence, no evolution up to the k-th order is present. If we do not include terms of order higher than k in the calculation, the evolution described by the previous equation is approximated by the identity, $\tilde{w} = \tilde{w}_o$, and the canonical transformation $w = \mathcal{M}_k\tilde{w}_o$ completely describes the evolution up to the k-th order. Before presenting the procedure for finding such a transformation, we mention a property that is useful when polynomial orders are compared. The Poisson bracket of two homogeneous polynomials of the k-th and l-th orders is a homogeneous polynomial of order k + l − 2; that is,

$$
[g_k, g_l] = h_{k+l-2}. \tag{223}
$$

The proof follows directly from the definition of the Poisson bracket, which consists of multiplications and two derivatives.
The procedure starts with compensation of the quadratic part of the Hamiltonian, which is equivalent to solving the paraxial approximation. Two separate approaches can be used. The first is to use parameterization by the initial position and momentum. The extended canonical transformation representing this transformation is described by the linear map (94)

$$
w = \mathcal{M}_1\tilde{w} = \begin{pmatrix}\hat{R}^{-1} & 0\\ 0 & \dfrac{e}{\eta}\hat{R}^{-1}\end{pmatrix}\begin{pmatrix} g(z)\,\hat{1} & \phi_o^{*-1/2}h(z)\,\hat{1}\\ \phi^{*1/2}g'(z)\,\hat{1} & \left(\phi^*/\phi_o^*\right)^{1/2}h'(z)\,\hat{1}\end{pmatrix}\tilde{w}, \tag{224}
$$

and the new Hamiltonian (the interaction Hamiltonian) reads

$$
H^{int} = \frac{\eta}{e}\Bigl[H_3\bigl(w(\tilde{w}, z), z\bigr) + H_4\bigl(w(\tilde{w}, z), z\bigr) + \cdots\Bigr] = H^{int}_3(\tilde{w}, z) + H^{int}_4(\tilde{w}, z) + \cdots. \tag{225}
$$
In the second approach the rays are determined by their positions in the object and aperture planes. The corresponding extended canonical transformation in rotating coordinates takes the form

$$
\begin{pmatrix} Q\\ P\end{pmatrix} = \begin{pmatrix} s\,\hat{1} & t\,\hat{1}\\ s'\phi^{*1/2}\,\hat{1} & t'\phi^{*1/2}\,\hat{1}\end{pmatrix}\begin{pmatrix}\tilde{Q}\\ \tilde{P}\end{pmatrix}, \tag{226}
$$

where $\tilde{P}$ corresponds to $Q_a$, which represents the generalized momentum. The interaction Hamiltonian in this case reads

$$
H^{int} = \frac{\eta}{e\,\phi_o^{*1/2}W_{so}}\Bigl[H_3\bigl(w(\tilde{w}, z), z\bigr) + H_4\bigl(w(\tilde{w}, z), z\bigr) + \cdots\Bigr] = H^{int}_3(\tilde{w}, z) + H^{int}_4(\tilde{w}, z) + \cdots, \tag{227}
$$

where $W_{so} = s(z_o)t'(z_o) - t(z_o)s'(z_o)$ is the Wronskian (85). The subsequent procedure is similar for both types of parameterization. We find the canonical transformation that compensates the third-order part of the interaction Hamiltonian. Let us assume that it takes the form

$$
\tilde{w} = e^{:g_3(w^{[1]}, z):}\,w^{[1]}.
$$

It transforms the Hamiltonian into

$$
H^{[1]}\bigl(w^{[1]}, z\bigr) = e^{:g_3(w^{[1]}, z):}\,H^{int}\bigl(w^{[1]}, z\bigr) + \int_0^1 e^{\theta:g_3(w^{[1]}, z):}\,\frac{\partial g_3}{\partial z}\,d\theta. \tag{228}
$$

Moreover, we require the transformed Hamiltonian not to include the third-order part; therefore,

$$
H^{[1]}\bigl(w^{[1]}, z\bigr) = H^{[1]}_4\bigl(w^{[1]}, z\bigr) + H^{[1]}_5\bigl(w^{[1]}, z\bigr) + \cdots. \tag{229}
$$
Comparing the third-order terms on the right-hand sides of Eqs. (228) and (229), one finds

$$
H^{int}_3 + \frac{\partial g_3}{\partial z} = 0, \tag{230}
$$

whose solution

$$
g_3 = -\int_{z_o}^{z} H^{int}_3\bigl(w^{[1]}, z_1\bigr)\,dz_1 \tag{231}
$$

completely describes the desired canonical transformation. This transformation, together with $\mathcal{M}_1$, describes the system up to the second aberration order,

$$
w^{(2)}(z) = \mathcal{M}_2 w^{[1]} = \mathcal{M}_1 e^{:g_3(w^{[1]}, z):}\,w^{[1]}, \tag{232}
$$

and when we neglect the evolution of the third and higher orders, $w^{[1]} = w_o$, we can write

$$
w^{(2)}(z) = \mathcal{M}_2 w_o = \mathcal{M}_1 e^{:g_3(w_o, z):}\,w_o. \tag{233}
$$
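Explicitly, for stigmatic systems in rotating coordinates, it takes the form

$$
\begin{pmatrix} Q^{(2)}\\ P^{(2)}\end{pmatrix} = \begin{pmatrix} g(z)\,\hat{1} & \phi_o^{*-1/2}h(z)\,\hat{1}\\ \phi^{*1/2}g'(z)\,\hat{1} & \left(\phi^*/\phi_o^*\right)^{1/2}h'(z)\,\hat{1}\end{pmatrix}\begin{pmatrix} Q_o - \partial g_3/\partial P_o\\ P_o + \partial g_3/\partial Q_o\end{pmatrix}, \tag{234}
$$

and in the image plane it can be written as

$$
Q^{(2)} = M\left(Q_o - \frac{\partial g_3}{\partial P_o}\right). \tag{235}
$$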
When the aperture position is used instead of the initial momentum, we must proceed differently. Applying Eq. (233) to the coordinate Q, we find

$$
Q^{(2)} = s(z)\,e^{:g_3:}Q_o + t(z)\,e^{:g_3:}Q_a, \tag{236}
$$

which reduces to

$$
Q^{(2)}(z_a) = e^{:g_3(z_a):}\,Q_a \tag{237}
$$

in the aperture plane ($s(z_a) = 0$, $t(z_a) = 1$). This means that the perturbed ray does not intersect the aperture plane at the point $Q_a$; hence, we modify the value

$$
Q_a \to e^{-:g_3(z_a):}\,Q_a, \tag{238}
$$
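and Eq. (233) is modified to

$$
\begin{pmatrix} Q^{(2)}\\ P^{(2)}\end{pmatrix} = \mathcal{M}_1 e^{:g_3(z):}\begin{pmatrix} Q_o\\ e^{-:g_3(z_a):}Q_a\end{pmatrix} = \mathcal{M}_1\begin{pmatrix} e^{:g_3(z):}Q_o\\ e^{:g_{3a}(z):}Q_a\end{pmatrix}, \tag{239}
$$

where

$$
g_{3a} = -\int_{z_a}^{z} H^{int}_3(z_1)\,dz_1. \tag{240}
$$

Note that $Q_a$ is the generalized momentum; the result expressed up to the third order thus takes the form

$$
\begin{pmatrix} Q^{(2)}\\ P^{(2)}\end{pmatrix} = \begin{pmatrix} s(z)\,\hat{1} & t(z)\,\hat{1}\\ \phi^{*1/2}s'(z)\,\hat{1} & \phi^{*1/2}t'(z)\,\hat{1}\end{pmatrix}\begin{pmatrix} Q_o - \partial g_3/\partial Q_a\\ Q_a + \partial g_{3a}/\partial Q_o\end{pmatrix}, \tag{241}
$$

which reduces to

$$
Q^{(2)} = M\left(Q_o - \frac{\partial g_3}{\partial Q_a}\right) \tag{242}
$$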
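in the image plane. Using Eq. (228), one can evaluate the transformed Hamiltonian

$$
H^{[1]}_4 = H^{int}_4 - \frac{1}{2}\int_{z_o}^{z} dz_1\,\Bigl[H^{int}_3\bigl(w^{[1]}, z_1\bigr),\, H^{int}_3\bigl(w^{[1]}, z\bigr)\Bigr], \tag{243}
$$

$$
\begin{aligned}
H^{[1]}_5 ={}& H^{int}_5 - \int_{z_o}^{z} dz_1\,\Bigl[H^{int}_3\bigl(w^{[1]}, z_1\bigr),\, H^{int}_4\bigl(w^{[1]}, z\bigr)\Bigr]\\
&+ \frac{1}{3}\int_{z_o}^{z} ds\int_{z_o}^{z} dt\,\Bigl[H^{int}_3\bigl(w^{[1]}, t\bigr),\,\Bigl[H^{int}_3\bigl(w^{[1]}, s\bigr),\, H^{int}_3\bigl(w^{[1]}, z\bigr)\Bigr]\Bigr].
\end{aligned} \tag{244}
$$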
The previous procedure can be used to find the canonical transformation $w^{[1]}(w^{[2]}, z) = e^{:g_4(w^{[2]}, z):}\,w^{[2]}$, which compensates the fourth-order part of the Hamiltonian, that is,

$$
H^{[2]} = e^{:g_4(w^{[2]}, z):}\,H^{[1]}\bigl(w^{[2]}, z\bigr) + \int_0^1 e^{\theta:g_4(w^{[2]}, z):}\,\frac{\partial g_4(w^{[2]}, z)}{\partial z}\,d\theta, \tag{245a}
$$

$$
H^{[2]} = H^{[2]}_5 + H^{[2]}_6 + \cdots. \tag{245b}
$$
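Comparing the fourth-order terms and solving the differential equation yields

$$
\begin{aligned}
g_4 &= -\int_{z_o}^{z} H^{[1]}_4\bigl(w^{[2]}, z_1\bigr)\,dz_1\\
&= -\int_{z_o}^{z} H^{int}_4\bigl(w^{[2]}, z_1\bigr)\,dz_1 + \frac{1}{2}\int_{z_o}^{z} dz_2\int_{z_o}^{z_2} dz_1\,\Bigl[H^{int}_3\bigl(w^{[2]}, z_1\bigr),\, H^{int}_3\bigl(w^{[2]}, z_2\bigr)\Bigr].
\end{aligned} \tag{246}
$$

The transformed Hamiltonian is determined by Eq. (245a), and the canonical transformation to the interaction coordinates, $\tilde{w} = \tilde{\mathcal{M}}_3 w^{[2]}$, reads

$$
\tilde{\mathcal{M}}_3 = e^{:g_3(w^{[1]}, z):}\,e^{:g_4(w^{[2]}, z):} = e^{:g_4(w^{[2]}, z):}\,e^{:g_3(w^{[2]}, z):}. \tag{247}
$$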
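The rays are then described up to the third aberration order by the canonical transformation

$$
w^{(3)} = \mathcal{M}_1 e^{:g_4:}\,e^{:g_3:}\,w_o, \tag{248}
$$

which for stigmatic systems in rotating coordinates takes the form

$$
\begin{pmatrix} Q^{(3)}\\ P^{(3)}\end{pmatrix} = \begin{pmatrix} g(z)\,\hat{1} & \phi_o^{*-1/2}h(z)\,\hat{1}\\ \phi^{*1/2}g'(z)\,\hat{1} & \left(\phi^*/\phi_o^*\right)^{1/2}h'(z)\,\hat{1}\end{pmatrix}\begin{pmatrix} Q_o - \dfrac{\partial g_3}{\partial P_o} - \dfrac{\partial g_4}{\partial P_o} - \dfrac{1}{2}\Bigl[g_3, \dfrac{\partial g_3}{\partial P_o}\Bigr]\\[2ex] P_o + \dfrac{\partial g_3}{\partial Q_o} + \dfrac{\partial g_4}{\partial Q_o} + \dfrac{1}{2}\Bigl[g_3, \dfrac{\partial g_3}{\partial Q_o}\Bigr]\end{pmatrix}, \tag{249}
$$

and when the aperture position is used instead of the initial momentum, we find

$$
\begin{pmatrix} Q^{(3)}\\ P^{(3)}\end{pmatrix} = \begin{pmatrix} s(z)\,\hat{1} & t(z)\,\hat{1}\\ \phi^{*1/2}s'(z)\,\hat{1} & \phi^{*1/2}t'(z)\,\hat{1}\end{pmatrix}\begin{pmatrix} Q_o - \dfrac{\partial g_3}{\partial Q_a} - \dfrac{\partial g_4}{\partial Q_a} - \dfrac{1}{2}\Bigl[g_3, \dfrac{\partial g_3}{\partial Q_a}\Bigr]\\[2ex] Q_a + \dfrac{\partial g_{3a}}{\partial Q_o} + \dfrac{\partial g_{4a}}{\partial Q_o} + \dfrac{1}{2}\Bigl[g_{3a}, \dfrac{\partial g_{3a}}{\partial Q_o}\Bigr]\end{pmatrix}. \tag{250}
$$

By applying this procedure to higher-order terms, it is possible to find the canonical transformation that compensates the Hamiltonian up to a given order, say up to the k-th,

$$
\tilde{\mathcal{M}}_{k-1} = e^{:g_k(\tilde{w}, z):}\,e^{:g_{k-1}(\tilde{w}, z):}\cdots e^{:g_3(\tilde{w}, z):}. \tag{251}
$$

The evolution in the original coordinates can then be evaluated as

$$
w^{(k-1)} = \mathcal{M}_1\tilde{\mathcal{M}}_{k-1}\,w_o, \tag{252}
$$

which is the analogue of the factorization theorem presented by Dragt (Dragt and Forest, 1986b). In the case of parameterization by the positions in the object and aperture planes,

$$
w^{(k-1)} = \mathcal{M}_1\begin{pmatrix}\tilde{\mathcal{M}}_{k-1}Q_o\\ \tilde{\mathcal{M}}^{(a)}_{k-1}Q_a\end{pmatrix}, \tag{253}
$$

where

$$
\tilde{\mathcal{M}}^{(a)}_{k-1} = e^{:g_{k,a}(\tilde{w}, z):}\,e^{:g_{k-1,a}(\tilde{w}, z):}\cdots e^{:g_{3,a}(\tilde{w}, z):}. \tag{254}
$$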
E. Dispersion and Chromatic Aberration

So far we have described the perturbation methods only for systems in which all electrons have the same energy. Let us abandon this assumption and consider the case in which a small energy deviation is present. This subsection describes the changes and modifications of the perturbation methods caused by this extension.

1. The Trajectory Method

From Eqs. (12) and (100a) it is clear that the additional dimension changes the trajectory equations to

$$
\hat{P}_1(Q) + \frac{F_1\,\delta}{4e\,\phi^{*7/4}}\begin{pmatrix}-\cos\Theta\\ \sin\Theta\end{pmatrix} = f_2\bigl(\delta, Q, Q', Q'', z\bigr) + f_3\bigl(\delta, Q, Q', Q'', z\bigr) + \cdots, \tag{255a}
$$

$$
\delta = \text{const}, \tag{255b}
$$

where the terms $f_k(\delta, Q, Q', Q'', z)$ on the right-hand side are k-th order homogeneous polynomials in the variables δ, Q, Q', and Q'' with z-dependent coefficients. Following the procedure used in the monochromatic case, with the perturbation parameter λ one finds

$$
\begin{aligned}
\hat{P}_1\bigl(Q_1 + \lambda Q_2 + \lambda^2 Q_3 + \cdots\bigr) + \frac{F_1\,\delta}{4e\,\phi^{*7/4}}\begin{pmatrix}-\cos\Theta\\ \sin\Theta\end{pmatrix} ={}& \lambda f_2\bigl(\delta,\, Q_1 + \lambda Q_2 + \cdots,\, Q'_1 + \lambda Q'_2 + \cdots,\, Q''_1 + \lambda Q''_2 + \cdots,\, z\bigr)\\
&+ \lambda^2 f_3\bigl(\delta,\, Q_1 + \cdots,\, Q'_1 + \cdots,\, Q''_1 + \cdots,\, z\bigr) + \cdots,
\end{aligned}
$$

from which, by comparing each order in λ, one obtains the equations that determine all $Q_i$, namely

$$
\hat{P}_1(Q_1) = -\frac{F_1}{4e\,\phi^{*7/4}}\begin{pmatrix}-\cos\Theta\\ \sin\Theta\end{pmatrix}\delta, \tag{256a}
$$

$$
\hat{P}_1(Q_2) = f_2\bigl(\delta, Q_1, Q'_1, Q''_1, z\bigr), \tag{256b}
$$

$$
\hat{P}_1(Q_3) = \sum_{i=1}^{2}\left[\frac{\partial f_2(\delta, Q_1, Q'_1, Q''_1, z)}{\partial Q_i}Q_{2i} + \frac{\partial f_2(\delta, Q_1, Q'_1, Q''_1, z)}{\partial Q'_i}Q'_{2i} + \frac{\partial f_2(\delta, Q_1, Q'_1, Q''_1, z)}{\partial Q''_i}Q''_{2i}\right] + f_3\bigl(\delta, Q_1, Q'_1, Q''_1, z\bigr), \tag{256c}
$$
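$$
\vdots
$$

The subsequent procedure is similar to that described in Section IV.A.

2. The Lie Algebraic Method

For the description of the dispersion case, the phase space must include two more canonically conjugate variables, τ and $p_\tau$, and the Poisson brackets therefore read

$$
[f,g] = \frac{\partial f}{\partial q}\frac{\partial g}{\partial p} - \frac{\partial f}{\partial p}\frac{\partial g}{\partial q} + \frac{\partial f}{\partial \tau}\frac{\partial g}{\partial p_\tau} - \frac{\partial f}{\partial p_\tau}\frac{\partial g}{\partial \tau}. \tag{257}
$$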
The definition and properties of the Lie operator and the Lie transformation remain unchanged, except that the Poisson brackets and the phase space change. Hence, the Hamiltonian must include the variable $p_\tau$, and its expansion takes the form

$$
H(q, p, p_\tau, z) = H_2(q, p, p_\tau, z) + H_3(q, p, p_\tau, z) + H_4(q, p, p_\tau, z) + \cdots.
$$

Moreover, if dipole fields, represented by the coefficient $F_1$, are present, the paraxial approximation takes the form

$$
\mathcal{M}_1 = \begin{pmatrix}\hat{R}^{-1} & \hat{0} & 0\\ \hat{0} & \dfrac{e}{\eta}\hat{R}^{-1} & 0\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix} g\,\hat{1} & \phi_0^{*-1/2}h\,\hat{1} & 0\\ \phi^{*1/2}g'\,\hat{1} & \left(\phi^*/\phi_0^*\right)^{1/2}h'\,\hat{1} & 0\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix} 1 & 0 & 0 & 0 & 2mc\eta\,\hat{\mu}(h)\\ 0 & 1 & 0 & 0 & 2mc\eta\,\hat{\nu}(h)\\ 0 & 0 & 1 & 0 & -2mc\eta\,\hat{\mu}(g)\\ 0 & 0 & 0 & 1 & -2mc\eta\,\hat{\nu}(g)\\ 0 & 0 & 0 & 0 & 1\end{pmatrix}, \tag{258}
$$

which causes the following change of the interaction coordinates,

$$
\begin{pmatrix} q\\ p\\ p_\tau\end{pmatrix} = \mathcal{M}_1\begin{pmatrix}\tilde{q}\\ \tilde{p}\\ \tilde{p}_\tau\end{pmatrix}, \tag{259}
$$

and of the interaction Hamiltonian as well. The rest of the procedure is unaffected.

3. The Differential Algebraic Method

In the dispersion case, the particle trajectories are determined by their energy deviation δ in addition to the initial position and momentum. For this method, the differential algebraic basis must therefore include further elements. The general basis elements then read

$$
|l_1, l_2, l_3, l_4, d\rangle = X^{l_1}Y^{l_2}P_x^{l_3}P_y^{l_4}P_\tau^{d} \tag{260}
$$
and since Pτ remains constant along each ray (i.e., Pτ = 0), the derivative of the general differential algebraic element takes the form d |l1 , l2 , l3 , l4 , d = − H2 + H3 + H4 + · · · , |l1 , l2 , l3 , l4 , d . (261) dz − 1 linear differential equations of the first In this case, one can find 5+k 5 order, which determine the solution up to the k-th order. The procedure that follows is analogous to the monochromatic case. F. Example: Round Magnetic Lens The round magnetic lens is the basic element of most electron microscopes. Our intention is to illustrate the application of the perturbation methods described above rather than to describe the physical properties of the lens. First, we perform the calculation using the eikonal method and the Lie algebraic method. The trajectory method will be applied last. We omit the differential algebraic method because it is not appropriate for analytic calculation of aberrations. We than compare the results of all these methods. The magnetic axially symmetric field is determined by one function— the axial flux density B(z). We use the following expansion of the vector potential 1 1 Ax = − yB(z) + y x 2 + y 2 B (z), (262a) 2 16 1 1 Ay = xB(z) − x x 2 + y 2 B (z), (262b) 2 16 Az = 0. (262c)
ˇ RADLI CKA
298
The paraxial equation of the system has already been studied, so we can start to calculate the aberration. 1. The Eikonal Method The Lagrangian of the system up to the fourth order reads M = M 2 + M4 , where 1 ∗1 2 1 φ 2 q − ηB(z)(xy − x y), (263) 2 2 1 1 2 1 (264) M4 = − φ ∗ 2 q 2 + ηB (xy − x y)q2 ; 8 16 hence, the third-order Lagrangian is not present in the system. We will solve the problem in rotating coordinates, where the Lagrangian reads M2 =
1 ∗ 1 2 η2 B 2 (z) 2 φ 2Q − Q , (265) 1 2 8φ ∗ 2 2 1 2 1 1 M4 = − L1 Q2 − L2 Q2 Q 2 − L3 Q 2 − R(XY − X Y )2 4 2 4 − CP Q2 (XY − X Y ) − CQ Q 2 (XY − X Y ) (266) M2 =
with the coefficients defined as follows: L1 = L2 = L3 = R= CQ = CP =
η4 B 4 ∗ 32
32φ η2 B 2
−
η2 8φ
∗ 12
, 1 8φ ∗ 2 1 ∗1 φ 2, 2 η2 B 2 , 1 8φ ∗ 2 1 ηB, 4 η3 B 3 ηB . − 16φ ∗ 16
BB ,
(267a) (267b) (267c) (267d) (267e) (267f)
When we use the paraxial approximation in parameterization with coordinate and momentum in the object plane after substitution of Eq. (93) for Q and Q
LIE ALGEBRAIC METHODS
299
and using L2z = Q2 P2 − (QP)2 , we derive 2 1 M˜ 4 = − φ ∗−2 L1 h4 + 2L2 h2 h 2 + L3 h 4 P2o 4 3 − φ ∗− 2 L1 gh3 + L2 hh (gh) + L3 g h 3 P2o (Qo Po ) − φ ∗−1 L1 g 2 h2 + 2L2 ghg h + L3 g 2 h 2 − R (Qo Po )2 1 − φ ∗−1 L1 g 2 h2 + L2 g 2 h2 + g 2 h 2 + L3 g 2 h 2 + 2R Q2o P2o 2 1 − φ ∗− 2 L1 hg 3 + L2 gg (hg) + L3 h g 3 Q2o (Qo Po ) 2 1 − L1 g 4 + 2L2 g 2 g 2 + L3 g 4 Q2o 4 3 − φ ∗− 2 CQ h 2 + CP h2 P2o Lz − 2φ ∗−1 (CQ g h + CP gh)Qo Po Lz 1 − φ ∗ 2 CQ g 2 + CP g 2 Q2o Lz .
(268)
Because there is no electric field in the system, that is, φ ∗ = φo∗ , a slightly simplified form of the Wronskian remains constant gh − g h = g(zo )h (zo ) − g (zo )h(zo ) = 1,
st − s t = s(zo )t (zo ) − s (zo )t (zo ) = Wso .
(269a) (269b)
When we introduce the notation C=φ
∗− 12
z
L1 h4 + 2L2 h2 h 2 + L3 h 4 dz,
(270a)
zo
K=φ
∗− 12
z
L1 gh3 + L2 hh (gh) + L3 g h 3 dz,
(270b)
L1 g 2 h2 + 2L2 ghg h + L3 g 2 h 2 − R dz,
(270c)
zo
A=φ
∗− 12
z
zo
F =φ
∗− 12
z zo
L1 g 2 h2 + L2 g 2 h2 + g 2 h 2 + L3 g 2 h 2 + 2R dz, (270d)
ˇ RADLI CKA
300 D=φ
∗− 12
E = φ ∗− 2 1
z zo z
L1 hg 3 + L2 gg (hg) + L3 h g 3 dz,
L1 g 4 + 2L2 g 2 g 2 + L3 g 4 dz,
(270e)
(270f)
zo
k=φ
∗− 12
a = φ ∗− 2 1
d = φ ∗− 2 1
z zo z
zo z
CQ h 2 + CP h2 dz,
(271a)
(CQ g h + CP gh) dz,
(271b)
CQ g 2 + CP g 2 dz
(271c)
zo
the action takes the form of 2 3 1 1 I Soi = − φ ∗− 2 C P2o − φ ∗−1 KP2o (Qo Po ) − φ ∗− 2 A(Qo Po )2 4 1 1 1 1 2 − φ ∗− 2 F Q2o P2o − DQ2o (Qo Po ) − φ ∗ 2 E Q2o − φ ∗−1 kP2o Lz 2 4 − 2φ ∗− 2 aQo Po Lz − dQ2o Lz 1
(272)
and using Eq. (178) the perturbation in the image plane reads ∗− 3 2 CP2 Po + φ ∗−1 K P2 Qo + 2(Qo Po )Po Q(1) o o i =M φ + 2φ ∗− 2 A(Qo Po )Qo + φ ∗− 2 F Q2o Po + DQ2o Qo + kφ ∗−1 2Lz Po − P2o JˆQo 1 + 2aφ ∗− 2 Lz Qo − (Qo Po )JˆQo − dQ2o JˆQo , 1
1
(273)
1 where Jˆ was defined in Eq. (81). When we use Po = φ ∗ 2 Qo , Q(1) = M CQo2 Qo + K Qo2 Qo + 2(Qo Qo )Qo + 2A(Qo Qo )Qo + F Q2o Qo + DQ2o Qo + k 2Kz Qo − Qo2 JˆQo (274) + 2a Kz Qo − (Qo Qo )JˆQo − dQ2o JˆQo
with Kz = Xo Yo − Yo Xo .
LIE ALGEBRAIC METHODS
301
In the case of parameterization by position in the object and aperture plane, we proceed similarly 2 1 M˜ 4 = − L1 t 4 + 2L2 t 2 t 2 + L3 t 4 Q2a 4 − L1 st 3 + L2 tt (st) + L3 s t 3 Q2a (Qo Qa ) − L1 s 2 t 2 + 2L2 sts t + L3 s 2 t 2 − RWs2 (Qo Qa )2 1 L1 s 2 t 2 + L2 s 2 t 2 + s 2 t 2 + L3 s 2 t 2 + 2RWs2 Q2o Q2a 2 − L1 ts 3 + L2 ss (ts) + L3 t s 3 Q2o (Qo Qa ) −
2 1 L1 s 4 + 2L2 s 2 s 2 + L3 s 4 Q2o 4 − Ws CQ t 2 + CP t 2 Q2a Lz
−
− 2Ws (CQ s t + CP st)Qo Qa Lz − Ws CQ s 2 + CP s 2 Q2o Lz (275) where now Lz = Xo Ya − Xa Yo . Using coefficients C=φ
∗− 12
Ws−1
z
L1 t 4 + 2L2 t 2 t 2 + L3 t 4 dz,
(276a)
zo
K=φ
∗− 12
Ws−1
z
L1 st 3 + L2 tt (st) + L3 s t 3 dz,
(276b)
zo
A=φ
∗− 12
Ws−1
z
L1 s 2 t 2 + 2L2 sts t + L3 s 2 t 2 − RWs2 dz,
(276c)
zo
F =φ
∗− 12
Ws−1
D = φ ∗− 2 Ws−1 1
z zo z
L1 s 2 t 2 + L2 s 2 t 2 + s 2 t 2 + L3 s 2 t 2 + 2RWs2 dz, (276d)
L1 ts 3 + L2 ss (ts) + L3 t s 3 dz,
(276e)
zo
E=φ
∗− 12
Ws−1
z zo
L1 s 4 + 2L2 s 2 s 2 + L3 s 4 dz,
(276f)
ˇ RADLI CKA
302 k=φ
∗− 12
a = φ ∗− 2 1
d = φ ∗− 2 1
z zo z
zo z
CQ t 2 + CP t 2 dz,
(277a)
(CQ s t + CP st) dz,
(277b)
CQ s 2 + CP s 2 dz
(277c)
zo
the action takes the form of C 2 F I = − Q2a − KQ2a (Qo Qa ) − A(Qo Qa )2 − Q2o Q2a − DQ2o (Qo Qa ) Soi 4 2 E 2 2 2 2 (278) − Q − kQa Lz − 2aQo Qa Lz − dQo Lz 4 o and using Eq. (187) yields Q(1) = M CQ2a Qa + K Q2a Qo + 2(Qo Qa )Qa + 2A(Qo Qa )Qo + F Q2o Qa + DQ2o Qo + k 2Lz Qa − Q2a JˆQo + 2a Lz Qo − (Qo Qa )JˆQo − dQ2o JˆQo . (279) 2. The Lie Algebraic Method The Hamiltonian up to the fourth order takes the form H = H2 + H 4 ,
(280)
where H2 =
η ∗ 12
2eφ η3
p2 +
ηB 2φ
∗ 12
Lz +
ηeB 2 4φ
∗ 12
q2 ,
(281)
2 2 2 2 η3 B 2 2 2 eη3 eη 4 H4 = q p q + B − BB p + 3 3 3 1 ∗ ∗ ∗ ∗ 8e3 φ 2 16eφ 2 128φ 2 32φ 2 3 3 η3 B 2 2 η B ηB η3 B 2 L + L + − q + Lz p2 (282) z ∗ 32 z ∗ 32 ∗ 12 ∗ 32 2 8eφ 16φ 16φ 4e φ which in rotating coordinates takes the form H2 =
P2 2φ
∗ 12
+
η2 B 2 8φ
∗ 12
Q2 ,
(283)
303
LIE ALGEBRAIC METHODS
H4 =
2 1 2 2 1 ∗−1 1 L1 Q + φ L2 Q2 P2 + φ ∗−2 L3 P2 − φ ∗−1 RL2z 4 2 4 + φ ∗− 2 CP Q2 Lz + φ ∗− 2 CQ P2 Lz . 1
3
(284)
In the case of parameterization with coordinates and momentum in the object plane, the interaction Hamiltonian can be calculated using ˜ P), ˜ P(Q, ˜ P) ˜ , H4int = H4 Q(Q, (285) ˜ P) ˜ in the paraxial approximation is where the transformation (Q, P) → (Q, described in Eq. (93); hence, H4int =
2 1 ∗−2 φ L1 h4 + 2L2 h2 h 2 + L3 h 4 P˜ 2 4 3 ˜ P) ˜ + φ ∗− 2 L1 gh3 + L2 hh (gh) + L3 g h 3 P˜ 2 (Q ˜ P) ˜ 2 + φ ∗−1 L1 g 2 h2 + 2L2 ghg h + L3 g 2 h 2 − R (Q 2 2 1 ˜ P˜ + φ ∗−1 L1 g 2 h2 + L2 g 2 h2 + g 2 h 2 + L3 g 2 h 2 + 2R Q 2 2 1 ˜ P) ˜ ˜ (Q + φ ∗− 2 L1 hg 3 + L2 gg (hg) + L3 h g 3 Q 2 2 1 ˜ + L1 g 4 + 2L2 g 2 g 2 + L3 g 4 Q 4 3 + φ ∗− 2 CQ h 2 + CP h2 P˜ 2 Lz ˜ PL ˜ z + 2φ ∗−1 (CQ g h + CP gh)Q 1 ˜ 2 Lz + φ ∗ 2 CQ g 2 + CP g 2 Q
(286)
where we used L2z = p2 q2 − (qp)2 . Eqs. (231), (243) and (246) can be used to write g3 = 0,
(287) z
g4 = −
H4int Q[2] , P[2] , z dz;
(288)
zo
that is, 2 2 3 1 1 g4 = − φ ∗− 2 C P[2]2 − φ ∗−1 KP[2]2 Q[2] P[2] − φ ∗− 2 A Q[2] P[2] 4 1 1 − φ ∗− 2 F Q[2]2 P[2]2 − DQ[2]2 Q[2] P[2] 2 2 1 1 1 − φ ∗ 2 E Q[2]2 − φ ∗−1 kP[2]2 Lz − 2φ ∗− 2 aQ[2] P[2] Lz − dQ[2]2 Lz . 4
ˇ RADLI CKA
304
The position in the image plane can be calculated using Eq. (249) ∂g4 (Qo , Po ) Q(3) = M Qo − , ∂Po
(289)
4 (wo ) where the second term −M ∂g∂P agrees with Q(1) from Eq. (273). o When the parameterization with position in image and aperture plane is used, the interaction Hamiltonian reads 2 2 1 ˜a H4int = L1 t 4 + 2L2 t 2 t 2 + L3 t 4 Q 4 2 ˜Q ˜ a) ˜ a (Q + L1 st 3 + L2 tt (st) + L3 s t 3 Q ˜Q ˜ a )2 + L1 s 2 t 2 + 2L2 sts t + L3 s 2 t 2 − R (Q
2 2 1 ˜ Q ˜a L1 s 2 t 2 + L2 s 2 t 2 + s 2 t 2 + L3 s 2 t 2 + 2R Q 2 ˜Q ˜ a) ˜ 2 (Q + L1 ts 3 + L2 ss (ts) + L3 t s 3 Q 2 2 2 1 ˜ ˜ a Lz + CQ t 2 + C P t 2 Q + L1 s 4 + 2L2 s 2 s 2 + L3 s 4 Q 4 ˜Q ˜ a Lz + C Q s 2 + C P s 2 Q ˜ 2 Lz , − 2(CQ s t + CP st)Q (290) +
2 [2] [2] 2 1 g4 = − C Q[2]2 Q Qa − A Q[2] Q[2] − KQ[2]2 a a a 4 1 [2]2 2 1 − F Q[2]2 Q[2]2 − DQ[2]2 Q[2] Q[2] − E Q a a 2 4 [2] [2] [2]2 − kQ[2]2 L − 2aQ Q L − dQ L z z z a a and using Eq. (250), we derive Q(3)
∂g4 (Qo , Qa ) , = M Qo − ∂Qa
(291)
(292)
which corresponds to Eq. (279). 3. The Trajectory Method The trajectory equation can be derived from the Lagrangian equations d ∂M ∂M = 0, − dz ∂Q ∂Q which leads to φ ∗ 2 Q + 1
η2 B 2 4φ
∗ 12
ˆ Q = (L3 Q − L2 Q − CQ J2 Q − 2CQ Jˆ2 Q )Q 2
(293)
LIE ALGEBRAIC METHODS
305
+ (L2 Q + L2 Q − L1 Q − CP Jˆ2 Q − 2Cp Jˆ2 Q )Q2 + 2(L2 Q − CP Jˆ2 Q)QQ + 2(L3 Q − CQ Jˆ2 Q)Q Q + 2(CQ Q − R Jˆ2 Q)Kz + 2(CQ Q + CQ Q − CP Q − 2R Jˆ2 Q − R Jˆ2 Q)Kz ,
(294)
where Kz = XY − Y X . Hence, the trajectory equation takes the form of Q +
η2 B 2 Q = f3 (Q, Q , Q , z); 4φ
(295)
that is, there is no polynomial of the second order on the right-hand side. Hence, Q2 = 0 [Eq. (150)] and Eq. (152) is reduced to Q3 +
η2 B 2 Q3 = f3 (Q1 , Q1 , Q1 , z). 4φ
(296)
Using the method of variation of parameters yields Q3 =
∗− 1 −Mφo 2
zi
φ ∗ 2 h(t)f3 (Q1 , Q1 , Q1 , t) dt 1
zo
zi = −M
h(t)f3 (Q1 , Q1 , Q1 , t) dt,
(297)
zo
or Q3 = −
M
zi
∗1 Wso φo 2 zo
M =− Wso
zi
φ ∗ 2 t (α)f3 (Q1 , Q1 , Q1 α) dα 1
t (α)f3 (Q1 , Q1 , Q1 α) dα,
(298)
zo
depending on the parameterization used. Because the practical calculation is lengthy, we show it only for the case of parameterization by position and slopes in the object plane. In such a case, 2 Q1 = gQo + hQo , Wg = gh − hg = 1, and Q1 = − L L3 Q1 ; hence, Eq. (296)
306
ˇ RADLI CKA
reads η2 B 2 Q3 4φ ∗ 1 = φ ∗− 2 I1 Qo 2 Qo + I2 Qo2 Qo + I3 (Qo Qo )Qo + I4 (Qo Qo )Qo + I5 Q2o Qo + I6 Q2o Qo + A1 Kz Qo + A2 Qo 2 Jˆ2 Qo + A3 Kz Qo + A4 (Qo Qo )Jˆ2 Qo + A5 Q2o Jˆ2 Qo + N1 (Qo Qo )Jˆ2 Q + N2 Qo2 Jˆ2 Qo + N3 Q2o Jˆ2 Qo (299)
Q3 +
where the coefficients take the forms L22 3 h − 2L2 h 2 h + L2 h2 h , I1 = − L1 + L3 L2 I2 = − L1 + 2 gh2 − 2L2 h 2 g + L2 g h2 − 2R h − 4Rh , L3 L22 gh2 − 4L2 hg h + 2L2 ghh + 2R h + 4Rh , I3 = −2 L1 + L3 L22 hg 2 − 4L2 gg h − 4Rg − 2R g + 2L2 ghg , I4 = −2 L1 + L3 L22 2 I5 = − L1 + g h − 2L2 hg 2 + L2 g 2 h + 4Rg + 2R g, L3 L2 I6 = − L1 + 2 g 3 − 2L2 g 2 g + L2 g 2 g , L3 L2 A1 = 2CQ h, h − 2 CP + CQ L3
(300a) (300b) (300c) (300d) (300e) (300f)
(301a)
2 A2 = −CQ h g − CP h2 g − 2CQ h 2 g − 2CP h2 g L2 ghh , − 2 CP − CQ (301b) L3 L2 g + 2CQ g, (301c) A3 = −2 CP + CQ L3 L2 2 2 ghg A4 = −2CQ gg h − 2CP g h − 4CQ g h − 6CP − 2CQ L3 L2 2 g h, − 2 CP − CQ (301d) L3 L2 2 g g , (301e) gg 2 − 2 2CP − CQ A5 = −2CQ g 3 − CP g 3 − CQ L3
LIE ALGEBRAIC METHODS
307
L2 L2 2 ghh − 2 CP − CQ h g − 4CQ g h 2 N1 = − 6CP − 2CQ L3 L3 − 2CQ hg h − 2CP gh2 , (302a) L2 2 h h − 2CQ h 3 − CP h3 − CQ hh 2 , (302b) N2 = −2 2CP − CQ L3 L2 N3 = −2 CP − CQ ghg − 2CP g 2 h − 2CQ g 2 h − CP g 2 h L3 − CQ hg 2 .
(302c)
We used Kz Jˆ2 Q = Q 2 Q − (QQ )Q ,
(303a)
Kz Jˆ2 Q = (QQ )Q − Q2 Q .
(303b)
Using Eq. (297), we can write Q3 in the form of Q3 = M I˜1 Qo 2 Qo + I˜2 Qo2 Qo + I˜3 (Qo Qo )Qo + I˜4 (Qo Qo )Qo + I˜5 Q2o Qo + I˜6 Q2o Qo + A˜ 1 Kz Qo + A˜ 2 Qo2 Jˆ2 Qo + A˜ 3 Kz Qo + A˜ 4 (Qo Qo )Jˆ2 Qo + A˜ 5 Q2o Jˆ2 Qo + N˜ 1 (Qo Qo )Jˆ2 Q + N˜ 2 Qo2 Jˆ2 Qo + N˜ 3 Q2o Jˆ2 Qo (304) where coefficients with tildes are given by ∗− I˜1 = −φo 2 1
zi h(t)I1 (t) dt,
....
(305)
zo
It is clear that Eq. (304) must correspond to Eq. (274); hence, we express the coefficients from Eq. (304) as I˜1 =
∗− 1 φo 2
zi L1 + zo
=
∗− 1 φo 2
L22 4 h − L2 h h3 + 2L2 h 2 h2 dz L3
zi d 4 2 2 4 3 3 L1 h + 2L2 h h + L3 h − L3 h h + L2 h h dz dz zo ∗− 12
= C − φo
L3 h 3 h + L 2 h 3 h
zi zo
=C
(306)
ˇ RADLI CKA
308
2 where we used h = − L L3 h and h(zo ) = h(zi ) = 0. For evaluation of the other coefficients, we proceed similarly:
I˜2 = K − φ
∗− 12
zi
d L3 hh 2 g + L2 h3 g − 2Rh2 dz = K, dz
(307a)
zo
zi d ∗− 12 2 2 2 ˜ L3 hh g + L2 gh h + Rh dz = 2K, (307b) I3 = 2 K − φ dz zo
I˜4 = 2 A − φ
∗− 12
zi zo
I˜5 = F − φ
∗− 12
zi
d −Rgh + L2 gg h2 + L3 g 2 hh dz = 2A, dz (307c)
d 2Rgh + L2 hg 2 h + L3 hh g 2 dz = F, dz
(307d)
d L3 hg 3 + L2 hg 2 g dz = D, dz
(307e)
zo
I˜6 = D − φ ∗− 2 1
zi zo
A˜ 1 = 2k − 2φ
zi
∗− 12
d (CQ h h) dz = 2k, dz
(308a)
d CQ h 2 hg + CP h3 g dz = −k, dz
(308b)
d (CQ g h) dz = 2a, dz
(308c)
zo
A˜ 2 = −k + φ
zi
∗− 12
zo
A˜ 3 = 2a − 2φ ∗− 2 1
zi zo
A˜ 4 = −2a + 2φ
zi
∗− 12
d CP g 2 h2 + CQ ghg h dz = −2a, (308d) dz
zo
A˜ 5 = −d + φ ∗− 2 1
zi zo
d CP g 3 h + CQ ghg 2 dz = −d, dz
(308e)
LIE ALGEBRAIC METHODS
N˜ 1 = 2φ ∗− 2 1
zi
d CP gh3 + CQ h2 g h dz = 0, dz
309
(309a)
zo
N˜ 2 = φ ∗− 2 1
N˜ 3 = φ ∗− 2 1
zi zo zi
d CP h4 + CQ h2 h 2 dz = 0, dz
(309b)
d CP g 2 h2 + CQ h2 g 2 dz = 0. dz
(309c)
zo
The third-order aberration polynomial then takes the form Q3 = M CQo2 Qo + K Qo2 Qo + 2(Qo Qo )Qo + 2A(Qo Qo )Qo + F Q2o Qo + DQ2o Qo + k 2Kz Qo − Qo2 Jˆ2 Qo + 2a Kz Qo − (Qo Qo )Jˆ2 Qo − dQ2o Jˆ2 Qo (310) which coincides with Eq. (274). 4. Comparison of Methods In this section we applied all the analytical perturbation methods described to the simple example of a round magnetic lens. The calculation is not new, but our purpose was to compare the different approaches of the perturbation methods. The comparison shows that even though the formulation of the trajectory method is very straightforward, the practical application is very demanding. On the other hand, there are no significant differences in complexity between the eikonal and the Lie algebraic method. We showed that even though the numerical values of the aberration coefficients are independent of the method used, this does not mean that the form of the aberration integrals is also identical. However, they can be transformed into the same form by using integration by parts, as shown in the comparison of the results of the trajectory method with the eikonal or Lie algebraic method. The eikonal method and the Lie algebraic method are much more desirable for the calculation of higher-order aberrations. Because both methods are based on the symplectic structure of the phase space, the relationship among the aberration coefficients is much more apparent. However, whereas the procedure of derivation of the second-order perturbation relations (196) is not transparent, the calculation of higher-order perturbation in the Lie algebraic method is a straightforward procedure. Moreover, the Lie algebraic method provides insight into the structure of the aberration polynomials—using
ˇ RADLI CKA
310
symplectic classification, we can describe the relationship among aberrations and determine which fields influence a given aberration coefficient. Finally, we can state that the trajectory method is not advisable for calculation of higher-order aberration coefficients. However, we were not able to show any great differences between the eikonal and the Lie algebraic methods. We will show that introduction of the Lie algebra structure into the polynomial space offers some advantages, mostly in the aberrations classification or in the study of periodic systems, but the calculation of the aberration coefficients is still a lengthy task that is rarely done without the use of computer programs for symbolic computation such as Maple (Symbolic Computation Group, University of Waterloo, Waterloo, Ontario, Canada) or Maxima.
VI. T HE S YMPLECTIC C LASSIFICATION OF G EOMETRIC A BERRATIONS The theory of aberrations describes the nature of the optical imperfections of various devices. The aberrations describe the deviation of an electron trajectory from the ideal paraxial ray, determined as a linear function of positions and slopes in the object plane or some other parameterization. From the mathematical point of view, the ray is function r = r(z; ro , ro ),
(311)
which can be expanded in a Taylor series in initial conditions r0 and r0 for each z. The aberration part is determined by the nonlinear part of the expansion. Unfortunately, since the ray is a function of four initial variables, the Taylor polynomial contains a large number of members even in relatively low order, 1 (n + 3)(n + 2)(n + 1). (312) 6 Pn denotes the number of members in the Taylor polynomial of n-th order. It gives 20 members of the third order, 56 members of the fifth order, 220 members of the 9-th order, and so on. Two questions arise. The first is the meaning of each member in the aberration polynomial, including its influence on the optical properties of the system. The second is whether some structure exists among the aberration coefficients, that might be useful for understanding the system properties. Both questions are partly answered by the symplectic classification of aberrations. The symplectic classification of the aberration polynomials of axial symmetric systems was presented in Dragt and Forest (1986a); general systems were classified in a series of works (Navarro-Saad and Wolf, 1986; Pn =
311
LIE ALGEBRAIC METHODS
Wolf, 1986, 1987). We present the classification of aberration polynomials of stigmatic systems in terms of the representation of group adjoint to the algebra of quadratic polynomials that are determined by the quadratic part of the Hamiltonian. We also show the essential influence of the form of the paraxial approximation on the relationship among the aberration polynomials. A. Aberrations and Lie Transformations The previous chapter showed the relation between aberrations and Lie transformations. We found the transfer map in the form of a Lie transformation
M = M1 · · · e:gk (wo ,z): · · · e:g3 (wo ,z): .
(313)
Now we can find the aberrations as the expansion of w(wo , z) = M(wo )
(314)
as a Taylor series in powers of wo ; that is, w = f1 (wo , z) + f2 (wo , z) + · · · + fk−1 (wo , z) + · · · ,
(315)
where fl denotes a homogeneous polynomial of l-th order in the phase-space variables w. Hence, we can describe aberrations using the Lie transformation, particularly using the homogeneous polynomials gk . One of the most significant properties of aberrations is the invariance with respect to a canonical transformation e:f : that may, for example, represent the rotation around the optic axis. The invariance of the aberrations with respect to a canonical transformation e:f : is defined as follows: The canonical transformation exp(:g(w):) is invariant with respect to the canonical transformation exp(:f (w):) when the diagram (316) commutes, w
e:g(w):
˜ w
e:f (w):
wI
˜ e:f (w):
e:g(w
I ):
(316)
˜ I; w
that is, ˜ e:g(w ): e:f (w): = e:f (w): e:g(w): . I
(317)
Such a property of the aberration must influence the form of the Lie transformation, namely g(w) e:g(w): = e−:f (w˜ ): e:g(w ): e:f (w): = e:g:(w ) e−f (w ) e−:g(w ): e:g(w ): ef (w ) = exp :e:f (w): g(w): ; (318) I
I
I
I
I
I
I
ˇ RADLI CKA
312 hence, g(w) must satisfy
g(w) = e:f (w): g(w).
(319)
The canonical transformation does not change the form of g. The invariance of aberrations with respect to a canonical transformation imposes a condition on the Lie transformation. We explore the properties of the Lie transformation in the following text. B. Aberrations and Paraxial Approximation The paraxial approximation describes the optimal electron paths, but we have seen previously that it also plays an important role in the aberration theory. It played an important role in all the perturbation methods. We will focus our attention on the Lie algebraic method, where the paraxial approximation is described by the linear operator M1 that satisfies the equation d M1 = −M1 :H2 :. (320) dz Generally, this equation cannot be solved analytically; however, using the Magnus formula (Jagannathan and Khan, 1996) we can find the operator in the form of the Lie transformation, M1 = exp :g2 (w, z): (321) where z g2 (w, z) = −
1 dz1 H2 (w, z1 ) + 2
zo
z
z2 dz2
zo
dz1 H2 (z1 ), H2 (z2 ) + · · · .
zo
(322) It is a member of subalgebra h2 of the Lie algebra of the quadratic polynomials in phase-space variables with z-dependent coefficients, where the Poisson bracket plays the role of Lie brackets. Mathematically, h2 is the smallest subalgebra of the algebra that contains the quadratic part of Hamiltonian. In the case when [H2 (z1 ), H2 (z2 )] = 0 for each z1 and z2 , h2 is reduced to a one-dimensional (1D) subspace and Eq. (322) is reduced to z g2 (w, z) = −
dz1 H2 (w, z1 ),
if H2 (z1 ), H2 (z2 ) = 0 ∀z1 , z2 .
zo
Unfortunately, this condition is not generally satisfied, and the previous equation is no longer valid. However, if the structure of h2 is known, it is also
313
LIE ALGEBRAIC METHODS
possible to describe the general structure of action of M1 on the polynomial subspace. Let us denote by P the vector space of polynomials in the phase-space variables. The map that assigns for each element g ∈ h2 the adjoint Lie operator :g: ∈ gl(P ) is a representation of the algebra h2 . Using standard procedures we can decompose P into the irreducible subspaces according to the representation of h2 . Let us consider any polynomial f0 that is an element of the invariant subspace U ⊂ P and the series f0 ,
f1 = :g2 :f0 ,
f2 = :g2 :f1 ,
....
Because the first member of the series f0 ∈ U and U is invariant according to the representation of h2 , f1 ∈ U , and similarly for each fi ∈ U (i.e., all members of the series are elements of U ). Moreover, U is a vector space; thus, any linear combination of members of U is member of U as well. Hence,
M1 f = exp(:g2 :)f =
∞ :g2 :n n=0
n!
f ∈ U.
(323)
If the space is invariant with respect to the representation of h2 , then it is invariant with respect to the action M1 = exp(:g2 :), where g2 ∈ h2 . We have thus shown that, if we find the decomposition of the polynomial subspace on the irreducible subspaces under the representation of h2 , these subspaces are also invariant under the action of M1 . But why should we do that? We can provide three reasons. The n-th order part of the Hamiltonian takes the form Hn = aij kl x i y j pxk pyl . i+j +k+l=n
The transition to the interaction Hamiltonian (225) ˜ z), z = M1 Hn (w, ˜ z) = Hn w(w, ˜ z) H int (w,
(324)
is of a complicated form in these coordinates; however, when we decompose the n-th order polynomials on irreducible subspaces Vn = Vn1 ⊕ · · · ⊕ Vnk , the parts of the Hamiltonian belonging to different invariant subspaces do not mix. This simplifies the transition. The action of M1 also arises in two other situations. The first one is on recalculation of the aberration coefficients expressed in the object coordinates to those expressed in the paraxial coordinates in the image. Generally, the coordinates in the image can be expressed by wi = M1 wo + f2 (wo ) + f3 (wo ) + · · · −1 p p p = wi + f2 M−1 1 w i + f 3 M1 w i + · · · ,
(325)
ˇ RADLI CKA
314 p
where wi = M1 wo denotes paraxial coordinates in the image. By using the feature of the Lie transformation (217) the previous equation takes the form of p p −1 p (326) wi = wi + M−1 1 f 2 w i + M1 f 3 w i + · · · , which is in the same form as the previous case. Similarly, we can say that the parts of the aberration polynomials belonging to different irreducible subspaces do not mix during the transformation. The last example concerns the combination of systems. Let us consider two systems, the first of which is described by transition map MI and the second by transition map MI I . The resulting map of these systems, which is formed by a combination of these two maps, is described by the transfer map (Dragt and Forest, 1986b) M = MI MI I . By using decomposition of transfer maps into paraxial and nonlinear parts, we find M = M1 exp :g3 (wo ) + g4 (wo ) + · · · : = MI1 exp :g3I (wo ) + g4I (wo ) + · · · : · MI1I exp :g3I I (wo ) + g4I I (wo ) + · · · : , (327) which can be written by using the manipulations described in the previous section in the form I I )−1 (g
M = MI1 MI1I e:(M1
3 (wo )+g4 (wo )+···):
e:g3
I I (w )+g I I (w )+···: o o 4
. (328)
Using the well-known Baker–Campbell–Hausdorff formula yields −1 I g3 (wo ) = MI1I g3 (wo ) + g3I I (wo ), (329a) −1 I 1 I I −1 I g4 (wo ) + g4I I (wo ) + M1 g3 (wo ), g3I I (wo ) . g4 (wo ) = MI1I 2 (329b) From the last equation, it is clear that it is the aberrations belonging to corresponding irreducible subspaces that mix when combining the transfer maps. These examples show the important role of the paraxial approximation, which can be described by the action of M1 . The decomposition of aberration polynomials into irreducible subspaces shows the structure of the aberration and might be helpful for understanding the optical properties of the system. In particular, we will proceed as follows: • We find the general form of the algebra h2 for stigmatic systems. • We find the decomposition of the polynomial space in irreducible subspaces, according to the representation of h2 . • We describe the structure of invariant subspaces of the polynomial space under the action of M1 .
LIE ALGEBRAIC METHODS
315
C. Lie Algebra h2 In the general case, the quadratic part of the Hamiltonian is described by Eq. (87c); the subalgebra h2 is hence generated by four polynomials q2 , p2 , Lz , and x 2 − y 2 . The algebra h2 is therefore equal to the algebra of all quadratic polynomials, which is via Eq. (312) a 10-dimensional vector space. The structure of such a space is too complicated. Fortunately, the general form of the quadratic part of the Hamiltonian is not necessary, as most optical devices are designed to be stigmatic. When one applies the stigmatic condition (74), the quadratic part of the Hamiltonian reduces to eF12 η ηB 1 eγ0 φ eηB 2 2 q2 . (330) p + Lz + + + H2 = ∗ 12 ∗ 12 ∗ 12 ∗ 12 ∗ 32 2 2eφ 2φ 4ηφ 4φ 8ηφ The subalgebra h2 is then formed by four polynomials p2 , q2 , qp, and Lz . It is usual to introduce the notation (Dragt and Forest, 1986a) 1 a + = − p2 , 2 1 a − = q2 , 2 1 1 a0 = a + , a − = qp, 2 2 Lz = xpy − ypx . One can then find the commutation relations Lz , a + = 0, Lz , a − = 0, + − a0 , a + = a + , a , a = 2a0 ,
(331a) (331b) (331c) (331d)
[Lz , a0 ] = 0, a0 , a − = −a − .
The polynomials a + , a − , and a0 form the Lie subalgebra of the quadratic polynomial algebra, which is isomorphic to sp(2, R); moreover, as the polynomial Lz commutes with the others, the structure of h2 can be written h2 ∼ = sp(2, R) ⊕ R.
(332)
This is the structure considered in the following text. Let us denote the adjoint operator to a + , a − , a0 , and Lz as follows: 1 aˆ + = :a + : = − :p2 :, 2 1 aˆ − = :a − : = :q2 :, 2
(333a) (333b)
ˇ RADLI CKA
316
1 1 + − : a , a := :qp:, 2 2 Lˆ z = :Lz : = :xpy − ypx :. aˆ 0 = :a0 : =
(333c) (333d)
These form the basis of the adjoint algebra :h2 :, which has the same algebraic structure as algebra h2 . D. Representation of h2 on the Space of Complex Polynomials When the structure of h2 is known, we can describe the adjoint representation on the polynomial space. The first important property is that the polynomial order remains unchanged under the action of h2 : the homogeneous polynomials of different order do not mix and they can be investigated independently. Hence, the polynomial space can be decomposed to V =
∞ 8
Vn ,
(334)
n=1
where each Vn is a reducible representation of h2 . The order of a polynomial can be calculated as the eigenvalue of a number operator d d d d +y + px Nˆ = x + py . (335) dx dy dpx dpy Because the action h2 conserves the order of homogeneous polynomials, (336) [Nˆ , aˆ 0 ] = Nˆ , aˆ + = Nˆ , aˆ − = [Nˆ , Lˆ z ] = 0. Now we must find a decomposition of each Vn to irreducible subspaces. First, we describe the eigensubspaces of Lˆ z . This operator is the generator of rotation around the axis z (Goldstein, 1980). The transformation has no eigendirection in the real plane perpendicular to the axis z. The standard manner of describing the decomposition on the irreducible subspaces is to extend the real space to a complex space. In the complex extension, it is possible to find the basis of eigenvectors of any linear operator. We introduce the coordinates √ z = (x + iy)/√2, (337a) z¯ = (x − iy)/ 2 (337b) and canonically adjoint momenta found by a standard procedure (Goldstein, 1980), read √ pz = (px − ipy )/ 2, (337c)
LIE ALGEBRAIC METHODS
√ pz¯ = (px + ipy )/ 2.
317 (337d)
The z-coordinate of angular momentum takes the form Lz = i(zpz − z¯ pz¯ ),
(338)
the action of which is described by Lˆ z zi1 z¯ i2 pzi3 p¯ zi4 = i(i3 − i4 − i1 + i2 )zi1 z¯ i2 pzi3 p¯ zi4 ;
(339)
i that is, the polynomials of form zi1 z¯ i2 pz3 p¯ zi4 are eigenvectors of Lˆ z . Let us denote
l = (i3 − i4 − i1 + i2 ). Using n = i1 + i2 + i3 + i4 , one can find that l = n − 2(i1 + i4 ). Because i1 + i4 can take any value from 0 to n for a given polynomial order n l ∈ {−n, −n + 2, . . . , n − 2, n}.
(340)
The eigenvectors with the same value of l form a subspace of Vn . As [sp(2, R), Lz ] = 0, these subspaces are invariant under the action of h2 and one can write Vn =
n 8
Vn,n−2k .
(341)
k=0 i
Let us consider the polynomial u = zi1 z¯ i2 pz3 pzi¯4 ∈ Vn,l , then the polynomial i u¯ = zi2 z¯ i1 pzi4 pz¯3 ∈ Vn,l must exist in the subspace Vn . Because Lˆ z u = ilu, the action of Lˆ z on u¯ takes the form ¯ Lˆ z u¯ = −i(i3 − i4 − i1 + i2 )u¯ = −il u; hence, u¯ ∈ Vn,−l . Using the linearity of Lˆ z one can find, for each u ∈ Vn,l , a complex conjugate polynomial u¯ ∈ Vn,−l . Thus, it was shown that subspaces Vn,l and Vn,−l are complex conjugate, that is, Vn,l = V¯n,−l .
(342)
The next step is to describe the action of sp(2, R) on the polynomial subspace Vnl . The polynomials a + , a − , and a0 transform as follows: a + = −pz pz¯ ,
(343a)
a = z¯z, 1 a0 = (zpz + z¯ pz¯ ). 2
(343b)
−
(343c)
ˇ RADLI CKA
318
Let us note that all the polynomials Lz , a + , a − , and a0 are real polynomials expressed in complex coordinates. First, we describe general properties of a finite dimensional complex representations of sp(2, R), which is for our case represented by operators aˆ + , aˆ − , and aˆ 0 . Here it is customary to introduce the Casimir operator (Bäuerle and Kerf, 1999), aˆ 2 = aˆ − aˆ + + aˆ 02 + aˆ 0 = aˆ + aˆ − + aˆ 02 − aˆ 0 .
(344)
This commutes with all elements of sp(2, R) 2 − 2 + 2 aˆ , aˆ = aˆ , aˆ = aˆ , aˆ 0 = 0, hence, in the irreducible subspace there exists a basis formed by common eigenvectors of aˆ 2 and aˆ 0 . Let us denote them |j, m, aˆ 2 |j, m = j (j + 1)|j, m,
(345)
aˆ 0 |j, m = m|j, m.
(346)
From the commutation relations, the meaning of $\hat a^+$ and $\hat a^-$ as raising and lowering operators, respectively, can be derived:
$$\hat a_0\hat a^\pm|j,m\rangle = \hat a^\pm\left(\hat a_0 \pm 1\right)|j,m\rangle = (m\pm 1)\,\hat a^\pm|j,m\rangle, \tag{347}$$
$$\hat a^\pm|j,m\rangle \sim |j,m\pm 1\rangle; \tag{348}$$
that is, the action of $\hat a^+$ transforms a vector with eigenvalue $m$ relative to $\hat a_0$ into a vector with eigenvalue $m+1$; similarly, the action of $\hat a^-$ transforms a vector with eigenvalue $m$ into a vector with eigenvalue $m-1$. Because the representation is finite, there must exist a vector $|j,m_1\rangle \neq 0$ that vanishes under the action of $\hat a^+$, i.e., $\hat a^+|j,m_1\rangle = 0$. Using Eq. (344), we find $\hat a^2|j,m_1\rangle = (m_1^2 + m_1)|j,m_1\rangle$, and comparing with Eq. (345) gives $m_1 = j$. Hence, in the irreducible subspace there exists a vector $|j,j\rangle \neq 0$ that vanishes under the action of $\hat a^+$. Applying the operator $\hat a^-$, we can generate a chain of vectors,
$$|j, j-k\rangle \sim \left(\hat a^-\right)^k|j,j\rangle, \tag{349}$$
which is invariant under the action of sp(2, R) and forms an irreducible subspace. For the representation to be finite, the chain must be finite as well; hence, there must exist $k_m$ such that $|j, j-k_m\rangle \neq 0$ vanishes under the action of $\hat a^-$; that is,
$$\hat a^-|j, j-k_m\rangle \sim |j, j-k_m-1\rangle = 0, \tag{350}$$
LIE ALGEBRAIC METHODS
and when we calculate the action of $\hat a^2$,
$$\hat a^2|j,j-k_m\rangle = \left(\hat a^+\hat a^- + \hat a_0^2 - \hat a_0\right)|j,j-k_m\rangle = \left[(j-k_m)^2 - j + k_m\right]|j,j-k_m\rangle = (j^2+j)\,|j,j-k_m\rangle, \tag{351}$$
and compare the coefficients in the last equation, we find $k_m = 2j$. Hence, the possible eigenvalues of $\hat a_0$ are
$$m \in \{j, j-1, \ldots, -j+1, -j\}. \tag{352}$$
Moreover, because $k_m$ is an integer, the possible values of $j$ are constrained to
$$j \in \left\{0, \tfrac12, 1, \tfrac32, \ldots\right\}. \tag{353}$$
The irreducible representations of sp(2, R) can be characterized by the eigenvalues of $\hat a^2$. We will denote them $D_j$,
$$D_j = \left\{|j,j\rangle, |j,j-1\rangle, \ldots, |j,-j+1\rangle, |j,-j\rangle\right\}. \tag{354}$$
We have not yet completely defined the polynomial $|j,m\rangle$; we have only noted that it is proportional to $(\hat a^-)^{j-m}|j,j\rangle$. Now we determine the multiplicative factor. We will use a normalization different from that used in quantum mechanics, where the representation of $sl(2,\mathbb R) \cong sp(2,\mathbb R)$ is used for the description of the angular momentum operator. We will use the normalization introduced by Dragt and Forest (1986a),
$$|j,m-1\rangle = \frac{1}{m+j}\,\hat a^-|j,m\rangle, \tag{355a}$$
$$|j,m+1\rangle = \frac{1}{m-j}\,\hat a^+|j,m\rangle. \tag{355b}$$
It is well known (Bäuerle and Kerf, 1999) that the representation of sp(2, R) is completely reducible, which in our case means that the space $V_{n,l}$ can be written as a direct sum
$$V_{n,l} = \bigoplus_{j\in K} V_{n,l,j} = \bigoplus_{j\in K} D_j, \tag{356}$$
where $K$ is an index set. Now two questions arise: (1) what is the form of the elements of $D_j$, and (2) how can we determine the index set $K$? It was shown that each irreducible space $D_j$ is determined by the corresponding vector $|j,j\rangle$; hence, if one wants to describe all $D_j \subset V_{n,l}$, one must find all the polynomials $u \in V_{n,l}$ that vanish under the action of $\hat a^+$ and are eigenvectors of $\hat a_0$ and $\hat a^2$. The polynomials
$$u = p_z^a\,p_{\bar z}^b\left[i\left(zp_z - \bar z p_{\bar z}\right)\right]^c = p_z^a\,p_{\bar z}^b\,L_z^c, \qquad a + b + 2c = n,$$
fulfil these conditions. In fact, $\hat a^+u = 0$, $\hat a_0u = \tfrac12(a+b)\,u$, and $\hat a^2u = \left[\tfrac14(a+b)^2 + \tfrac12(a+b)\right]u$. Moreover, we require the polynomial to be an element of $V_{n,l}$. Let us find the connection between $a$, $b$, $c$ and $n$, $l$, $j$:
$$\hat N u = (a+b+2c)\,u = n\,u, \tag{357a}$$
$$\hat L_z u = i(a-b)\,u = il\,u, \tag{357b}$$
$$\hat a^2 u = \tfrac12(a+b)\left[\tfrac12(a+b) + 1\right]u = j(j+1)\,u. \tag{357c}$$
Comparing the results and solving the algebraic equations gives $u$ in the form
$$u = p_z^{\,j+\frac12 l}\;p_{\bar z}^{\,j-\frac12 l}\;L_z^{\frac12 n-j} =: {}^nP_j^{\,j;l}, \tag{358}$$
where we introduced a notation for the polynomial by its eigenvalues relative to the operators $\hat N$, $\hat L_z$, $\hat a^2$, and $\hat a_0$; thus ${}^nP_m^{j;l}$ satisfies
$$\hat N\,{}^nP_m^{j;l} = n\,{}^nP_m^{j;l}, \tag{359a}$$
$$\hat L_z\,{}^nP_m^{j;l} = il\,{}^nP_m^{j;l}, \tag{359b}$$
$$\hat a^2\,{}^nP_m^{j;l} = j(j+1)\,{}^nP_m^{j;l}, \tag{359c}$$
$$\hat a_0\,{}^nP_m^{j;l} = m\,{}^nP_m^{j;l}. \tag{359d}$$
Moreover, it can easily be shown that
$${}^nP_j^{\,j;l} = L_z^{\frac n2 - j}\;{}^{2j}P_j^{\,j;l}. \tag{360}$$
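Because the exponents of $p_z$, $p_{\bar z}$, and $L_z$ in Eq. (358) cannot be negative, the value of $j$ is constrained by the other quantum numbers, that is,
$$j \le \tfrac12 n, \qquad j \ge \tfrac12 l, \qquad j \ge -\tfrac12 l; \tag{361}$$
hence,
$$j \in \left\{\frac{|l|}{2}, \frac{|l|}{2}+1, \ldots, \frac n2\right\}, \tag{362}$$
and the subspace
$$V_{n,l} \cong D_{\frac{|l|}{2}} \oplus D_{\frac{|l|}{2}+1} \oplus \cdots \oplus D_{\frac n2} \oplus U_{n,l}, \tag{363}$$
where each $D_j$ is generated by the action of $\hat a^-$ on the polynomial ${}^nP_j^{j;l}$,
$${}^nP_m^{j;l} = \alpha(j,m)\,L_z^{\frac n2 - j}\left(\hat a^-\right)^{j-m} p_z^{\,j+\frac12 l}\,p_{\bar z}^{\,j-\frac12 l} = L_z^{\frac n2 - j}\;{}^{2j}P_m^{j;l}, \tag{364}$$
$U_{n,l}$ is a remaining subspace (still to be determined), and the multiplicative factor is $\alpha(j,m) = \prod_{i=m+1}^{j} 1/(i+j)$. Using Eq. (364), it is easy to see that
$$V_{n,l,j} = L_z^{\frac n2 - j}\cdot V_{2j,l,j}. \tag{365}$$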
Now we show that $U_{n,l} = 0$ by comparing the subspace dimensions. The dimension of the space of $n$th-order polynomials can be evaluated as
$$\dim V_n = \frac{(n+3)!}{n!\,3!} = \frac16(n+3)(n+2)(n+1). \tag{366}$$
Conversely, the dimension can be found as
$$\dim V_n = \sum_{k=0}^{n}\dim V_{n,n-2k} = \sum_{k=0}^{n}\left[\sum_{j=|n-2k|/2}^{n/2}\dim D_j + \dim U_{n,n-2k}\right],$$
and, as the dimension of $D_j$ is $2j+1$,
$$\begin{aligned}\dim V_n &= \sum_{k=0}^{n}\left[\sum_{j=|n-2k|/2}^{n/2}(2j+1) + \dim U_{n,n-2k}\right] = \sum_{k=0}^{n}\left[\left(\frac n2 - \frac{|n-2k|}{2} + 1\right)\left(\frac n2 + \frac{|n-2k|}{2} + 1\right) + \dim U_{n,n-2k}\right]\\ &= \sum_{k=0}^{n}\left[\left(\frac n2 + 1\right)^2 - \frac{(n-2k)^2}{4} + \dim U_{n,n-2k}\right] = \frac16(n+3)(n+2)(n+1) + \sum_{k=0}^{n}\dim U_{n,n-2k}.\end{aligned} \tag{367}$$
Comparing Eqs. (366) and (367) shows that $\dim U_{n,l} = 0$ and
$$V_{n,l} \cong V_{n,l,\frac{|l|}{2}} \oplus V_{n,l,\frac{|l|}{2}+1} \oplus \cdots \oplus V_{n,l,\frac n2}, \tag{368}$$
where each $V_{n,l,j} \cong D_j$. We have shown that $V_{n,l} = \bar V_{n,-l}$; when we use the decomposition in Eq. (368), it is easy to see that each of the subspaces $V_{n,l,j}$ is the complex conjugate of the subspace $V_{n,-l,j}$, or more particularly,
$$\overline{{}^nP_m^{j;l}} = {}^nP_m^{j;-l}. \tag{369}$$
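The dimension count in Eqs. (366) to (368) can be checked mechanically. The sketch below is plain Python; the helper names `allowed_j` and `dim_from_decomposition` are illustrative and not taken from the text. For each $l = n - 2k$ it enumerates the values of $j$ allowed by Eq. (362), adds up $\dim D_j = 2j+1$, and compares the total with $\binom{n+3}{3}$ from Eq. (366); agreement for every $n$ is exactly the statement $\dim U_{n,l} = 0$.

```python
from fractions import Fraction
from math import comb


def allowed_j(n, l):
    """Values j = |l|/2, |l|/2 + 1, ..., n/2 permitted by Eq. (362)."""
    j, top = Fraction(abs(l), 2), Fraction(n, 2)
    values = []
    while j <= top:
        values.append(j)
        j += 1
    return values


def dim_from_decomposition(n):
    """Sum of dim D_j = 2j + 1 over all l = n - 2k and all allowed j, as in Eq. (367)."""
    return sum(2 * j + 1 for k in range(n + 1) for j in allowed_j(n, n - 2 * k))


for n in range(10):
    # Eq. (366): number of monomials of degree n in the four variables (x, y, p_x, p_y).
    assert dim_from_decomposition(n) == comb(n + 3, 3)
```

For example, `allowed_j(3, -1)` returns `[Fraction(1, 2), Fraction(3, 2)]`, which is the decomposition $V_{3,-1}\cong D_{1/2}\oplus D_{3/2}$ used in the examples that follow.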
The decomposition of the polynomial space into irreducible subspaces is obtained by combining Eqs. (334), (341), and (368). The particular form of the polynomials is described by Eq. (364).

E. Example: Decomposition of Polynomial Space up to Fourth Order

We use the method explained in the previous subsection. Since the irreducible subspaces of different orders do not mix, we can describe the decomposition of each order independently.

1. Polynomials of the Third Order

For the subspace $V_3$, $n = 3$, and using Eq. (340), $l \in \{-3,-1,1,3\}$. Hence,
$$V_3 = V_{3,-3}\oplus V_{3,-1}\oplus V_{3,1}\oplus V_{3,3}. \tag{370}$$
For $V_{3,-3}$, $n = 3$ and $l = -3$; using Eq. (362), $j \in \{\tfrac32\}$; hence,
$$V_{3,-3} = V_{3,-3,\frac32} \cong D_{3/2}. \tag{371}$$
The irreducible subspace $D_{3/2}$ is generated from polynomial (358); that is,
$${}^3P_{3/2}^{3/2;-3} = p_{\bar z}^3, \tag{372a}$$
and, applying the operator $\hat a^-$,
$${}^3P_{1/2}^{3/2;-3} = \tfrac13\hat a^-\,{}^3P_{3/2}^{3/2;-3} = \tfrac13\hat a^-p_{\bar z}^3 = zp_{\bar z}^2, \tag{372b}$$
$${}^3P_{-1/2}^{3/2;-3} = \tfrac12\hat a^-\,{}^3P_{1/2}^{3/2;-3} = \tfrac12\hat a^-zp_{\bar z}^2 = z^2p_{\bar z}, \tag{372c}$$
$${}^3P_{-3/2}^{3/2;-3} = \hat a^-\,{}^3P_{-1/2}^{3/2;-3} = \hat a^-z^2p_{\bar z} = z^3; \tag{372d}$$
hence,
$$V_{3,-3} = \left\{p_{\bar z}^3,\; zp_{\bar z}^2,\; z^2p_{\bar z},\; z^3\right\}. \tag{373}$$
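Using the same procedure, one can find for $l = -1$ that $j \in \{\tfrac12, \tfrac32\}$ and $V_{3,-1}\cong D_{1/2}\oplus D_{3/2}$, with $D_{1/2}$ formed by the polynomials
$${}^3P_{1/2}^{1/2;-1} = ip_{\bar z}\left(zp_z - \bar zp_{\bar z}\right) = p_{\bar z}L_z, \tag{374a}$$
$${}^3P_{-1/2}^{1/2;-1} = iz\left(zp_z - \bar zp_{\bar z}\right) = zL_z \tag{374b}$$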
and $D_{3/2}$ by
$${}^3P_{3/2}^{3/2;-1} = p_zp_{\bar z}^2, \tag{375a}$$
$${}^3P_{1/2}^{3/2;-1} = \tfrac13\bar zp_{\bar z}^2 + \tfrac23zp_{\bar z}p_z, \tag{375b}$$
$${}^3P_{-1/2}^{3/2;-1} = \tfrac23\bar zzp_{\bar z} + \tfrac13z^2p_z, \tag{375c}$$
$${}^3P_{-3/2}^{3/2;-1} = \bar zz^2. \tag{375d}$$
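The lowering chains in Eqs. (372) and (375) involve only Poisson brackets with $a^- = z\bar z$ and the normalization of Eq. (355a), so they can be reproduced with exact polynomial arithmetic. The following is a minimal sketch, assuming (as the examples imply) that $(z, p_z)$ and $(\bar z, p_{\bar z})$ are treated as two independent canonical pairs and that $[f,g] = \sum(\partial_q f\,\partial_p g - \partial_p f\,\partial_q g)$. Polynomials are stored as dictionaries from exponent tuples $(z,\bar z,p_z,p_{\bar z})$ to rational coefficients; all helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F

# A polynomial is a dict {exponent tuple: Fraction}; exponent order (z, zbar, p_z, p_zbar).
PAIRS = ((0, 2), (1, 3))  # assumed canonical pairs (z, p_z) and (zbar, p_zbar)


def clean(d):
    return {k: v for k, v in d.items() if v != 0}


def add(*ps):
    out = defaultdict(F)
    for p in ps:
        for k, v in p.items():
            out[k] += v
    return clean(out)


def scale(p, s):
    return clean({k: v * s for k, v in p.items()})


def mul(*ps):
    out = {(0, 0, 0, 0): F(1)}
    for p in ps:
        acc = defaultdict(F)
        for k1, v1 in out.items():
            for k2, v2 in p.items():
                acc[tuple(a + b for a, b in zip(k1, k2))] += v1 * v2
        out = clean(acc)
    return out


def diff(p, i):
    """Partial derivative with respect to variable number i."""
    out = {}
    for k, v in p.items():
        if k[i]:
            e = list(k)
            e[i] -= 1
            out[tuple(e)] = v * k[i]
    return out


def pb(f, g):
    """Poisson bracket [f, g] = sum over pairs of (df/dq dg/dp - df/dp dg/dq)."""
    return add(*(add(mul(diff(f, q), diff(g, p)), scale(mul(diff(f, p), diff(g, q)), -1))
                 for q, p in PAIRS))


def poly(*terms):
    """Build a polynomial from (coefficient, exponent tuple) pairs."""
    return add(*({tuple(e): F(c)} for c, e in terms))


A_MINUS = poly((1, (1, 1, 0, 0)))  # a^- = z zbar


def lowering_chain(top, j):
    """Apply Eq. (355a), |j, m-1> = a^-|j, m> / (m + j), from m = j down to m = -j."""
    chain, m = [top], F(j)
    while m > -j:
        chain.append(scale(pb(A_MINUS, chain[-1]), 1 / (m + j)))
        m -= 1
    return chain


# D_{3/2} inside V_{3,-1}, generated from p_z p_zbar^2, Eqs. (375a)-(375d).
chain = lowering_chain(poly((1, (0, 0, 1, 2))), F(3, 2))
assert chain == [
    poly((1, (0, 0, 1, 2))),                                 # p_z p_zbar^2
    poly((F(1, 3), (0, 1, 0, 2)), (F(2, 3), (1, 0, 1, 1))),  # zbar p_zbar^2/3 + 2 z p_z p_zbar/3
    poly((F(2, 3), (1, 1, 0, 1)), (F(1, 3), (2, 0, 1, 0))),  # 2 zbar z p_zbar/3 + z^2 p_z/3
    poly((1, (2, 1, 0, 0))),                                 # zbar z^2
]
assert pb(A_MINUS, chain[-1]) == {}  # a^- annihilates the lowest-weight vector
```

The same `lowering_chain` applied to $p_{\bar z}^3$ reproduces Eqs. (372b) to (372d). Exact `Fraction` arithmetic is used so that the normalization factors $1/(m+j)$ can be compared with the printed coefficients without rounding.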
The subspace $V_{3,1}$ is complex conjugate to $V_{3,-1}$, i.e., $V_{3,1}\cong D_{1/2}\oplus D_{3/2}$ with
$$D_{1/2} = \left\{p_zL_z,\; \bar zL_z\right\}, \tag{376a}$$
$$D_{3/2} = \left\{p_z^2p_{\bar z},\; \tfrac13zp_z^2 + \tfrac23\bar zp_zp_{\bar z},\; \tfrac23z\bar zp_z + \tfrac13\bar z^2p_{\bar z},\; z\bar z^2\right\}, \tag{376b}$$
and the subspace $V_{3,3}\cong D_{3/2}$,
$$V_{3,3} = \left\{p_z^3,\; \bar zp_z^2,\; \bar z^2p_z,\; \bar z^3\right\}, \tag{377}$$
is complex conjugate to $V_{3,-3}$.

2. Polynomials of the Fourth Order

For the fourth-order polynomials, the values of $l$ are elements of the set $\{-4,-2,0,2,4\}$. We show the procedure only for the case $l = 0$; for the others, we present only the results. When $l = 0$, $j$ goes through the set $\{0,1,2\}$; thus, $V_{4,0}\cong V_{4,0,0}\oplus V_{4,0,1}\oplus V_{4,0,2}$. The irreducible space $V_{4,0,0}\cong D_0$ is a one-dimensional vector space generated by the polynomial
$${}^4P_0^{0;0} = L_z^2 = -\left(zp_z - \bar zp_{\bar z}\right)^2. \tag{378}$$
The irreducible subspace $V_{4,0,1}\cong D_1$ is generated by the polynomial
$${}^4P_1^{1;0} = p_zp_{\bar z}L_z, \tag{379a}$$
and, applying the operator $\hat a^-$, we generate the basis
$${}^4P_0^{1;0} = \tfrac12\hat a^-\,{}^4P_1^{1;0} = \tfrac12\left(zp_z + \bar zp_{\bar z}\right)L_z, \tag{379b}$$
$${}^4P_{-1}^{1;0} = \hat a^-\,{}^4P_0^{1;0} = \bar zzL_z. \tag{379c}$$
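The subspace $V_{4,0,2}\cong D_2$, the descending series determined by the polynomial
$${}^4P_2^{2;0} = \left(p_zp_{\bar z}\right)^2, \tag{380a}$$
takes the form
$${}^4P_1^{2;0} = \tfrac12p_zp_{\bar z}\left(zp_z + \bar zp_{\bar z}\right), \tag{380b}$$
$${}^4P_0^{2;0} = \tfrac16\left(zp_z + \bar zp_{\bar z}\right)^2 + \tfrac13\bar zzp_zp_{\bar z}, \tag{380c}$$
$${}^4P_{-1}^{2;0} = \tfrac12\bar zz\left(zp_z + \bar zp_{\bar z}\right), \tag{380d}$$
$${}^4P_{-2}^{2;0} = \bar z^2z^2. \tag{380e}$$
In the subspace $V_{4,-4}$, $j$ must be equal to 2; hence, $V_{4,-4} = V_{4,-4,2}\cong D_2$, which is generated by
$$V_{4,-4,2} = \left\{p_{\bar z}^4,\; zp_{\bar z}^3,\; z^2p_{\bar z}^2,\; z^3p_{\bar z},\; z^4\right\}. \tag{381}$$
For $V_{4,-2}$, $j$ goes through the set $\{1,2\}$; hence, $V_{4,-2} = V_{4,-2,1}\oplus V_{4,-2,2}\cong D_1\oplus D_2$, where
$$V_{4,-2,1} = \left\{p_{\bar z}^2L_z,\; zp_{\bar z}L_z,\; z^2L_z\right\} \tag{382}$$
and
$$V_{4,-2,2} = \left\{p_zp_{\bar z}^3,\; \tfrac14p_{\bar z}^2\left(\bar zp_{\bar z} + 3zp_z\right),\; \tfrac12zp_{\bar z}\left(zp_z + \bar zp_{\bar z}\right),\; \tfrac14z^2\left(zp_z + 3\bar zp_{\bar z}\right),\; \bar zz^3\right\}. \tag{383}$$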
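The fourth-order chains can be checked the same way. This sketch repeats the small exact-polynomial toolkit (illustrative names; $(z,p_z)$ and $(\bar z,p_{\bar z})$ assumed to be independent canonical pairs, $a^- = z\bar z$) and lowers the highest-weight polynomials of Eqs. (380a) and (383) with the normalization of Eq. (355a).

```python
from collections import defaultdict
from fractions import Fraction as F

PAIRS = ((0, 2), (1, 3))  # exponent order (z, zbar, p_z, p_zbar); assumed canonical pairs


def clean(d):
    return {k: v for k, v in d.items() if v != 0}


def add(*ps):
    out = defaultdict(F)
    for p in ps:
        for k, v in p.items():
            out[k] += v
    return clean(out)


def scale(p, s):
    return clean({k: v * s for k, v in p.items()})


def mul(*ps):
    out = {(0, 0, 0, 0): F(1)}
    for p in ps:
        acc = defaultdict(F)
        for k1, v1 in out.items():
            for k2, v2 in p.items():
                acc[tuple(a + b for a, b in zip(k1, k2))] += v1 * v2
        out = clean(acc)
    return out


def diff(p, i):
    out = {}
    for k, v in p.items():
        if k[i]:
            e = list(k)
            e[i] -= 1
            out[tuple(e)] = v * k[i]
    return out


def pb(f, g):
    """Poisson bracket [f, g] = sum over pairs of (df/dq dg/dp - df/dp dg/dq)."""
    return add(*(add(mul(diff(f, q), diff(g, p)), scale(mul(diff(f, p), diff(g, q)), -1))
                 for q, p in PAIRS))


def poly(*terms):
    return add(*({tuple(e): F(c)} for c, e in terms))


A_MINUS = poly((1, (1, 1, 0, 0)))  # a^- = z zbar


def lowering_chain(top, j):
    """Eq. (355a) applied repeatedly from m = j down to m = -j."""
    chain, m = [top], F(j)
    while m > -j:
        chain.append(scale(pb(A_MINUS, chain[-1]), 1 / (m + j)))
        m -= 1
    return chain


# D_2 inside V_{4,0,2}, generated from (p_z p_zbar)^2, Eqs. (380a)-(380e).
chain_380 = lowering_chain(poly((1, (0, 0, 2, 2))), 2)
assert chain_380 == [
    poly((1, (0, 0, 2, 2))),
    poly((F(1, 2), (1, 0, 2, 1)), (F(1, 2), (0, 1, 1, 2))),
    poly((F(1, 6), (2, 0, 2, 0)), (F(2, 3), (1, 1, 1, 1)), (F(1, 6), (0, 2, 0, 2))),
    poly((F(1, 2), (2, 1, 1, 0)), (F(1, 2), (1, 2, 0, 1))),
    poly((1, (2, 2, 0, 0))),
]

# D_2 inside V_{4,-2,2}, generated from p_z p_zbar^3, Eq. (383).
chain_383 = lowering_chain(poly((1, (0, 0, 1, 3))), 2)
assert chain_383 == [
    poly((1, (0, 0, 1, 3))),
    poly((F(1, 4), (0, 1, 0, 3)), (F(3, 4), (1, 0, 1, 2))),
    poly((F(1, 2), (2, 0, 1, 1)), (F(1, 2), (1, 1, 0, 2))),
    poly((F(1, 4), (3, 0, 1, 0)), (F(3, 4), (2, 1, 0, 1))),
    poly((1, (3, 1, 0, 0))),
]
```

Polynomials containing $L_z$, such as Eqs. (379) and (382), are not handled directly here because $L_z = i(zp_z - \bar zp_{\bar z})$ has an imaginary coefficient; since $L_z$ commutes with $\hat a^-$, their chains follow from the chains of $p_zp_{\bar z}$ and $p_{\bar z}^2$ multiplied by $L_z$.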
The subspace $V_{4,2}$ is complex conjugate to $V_{4,-2}$, and $V_{4,4}$ is complex conjugate to $V_{4,-4}$.

F. Representation of $h_2$ on the Real Space of Polynomials

We have described the representation of $h_2$ on the space of complex polynomials, but the real representation is more complicated. We have noted that, except for the axially symmetric polynomials, there exist no eigenvectors of $\hat L_z$ in the real polynomial space. Fortunately, using the decomposition of the space of complex polynomials, we can find the irreducible subspaces of real polynomials. Let us consider the irreducible subspaces $V_{n,l,j}$ and $V_{n,-l,j}$. We showed that these subspaces are complex conjugate, and each element ${}^nP_m^{j;l}\in V_{n,l,j}$ is complex conjugate to ${}^nP_m^{j;-l}\in V_{n,-l,j}$. Let us define two real polynomials
$$\xi = \tfrac12\left({}^nP_m^{j;l} + {}^nP_m^{j;-l}\right), \tag{384a}$$
$$\eta = \tfrac i2\left({}^nP_m^{j;l} - {}^nP_m^{j;-l}\right). \tag{384b}$$
Even though these two polynomials are not eigenvectors of $\hat L_z$, the subspace $\{\xi,\eta\}$ is invariant under its action; that is,
$$\hat L_z\xi = \tfrac12\left(il\,{}^nP_m^{j;l} - il\,{}^nP_m^{j;-l}\right) = l\eta, \tag{385a}$$
$$\hat L_z\eta = \tfrac i2\left(il\,{}^nP_m^{j;l} + il\,{}^nP_m^{j;-l}\right) = -l\xi. \tag{385b}$$
Now we apply this approach to the whole space $V$. Let us define the real vectors
$${}^nC_m^{j;l} = 2^{j-1}\left({}^nP_m^{j;l} + {}^nP_m^{j;-l}\right), \tag{386a}$$
$${}^nS_m^{j;l} = i\,2^{j-1}\left({}^nP_m^{j;l} - {}^nP_m^{j;-l}\right), \tag{386b}$$
and, for each $n$, $l > 0$, and $j$, the real subspaces
$$U^+_{n,l,j} = \left\{{}^nC_j^{j;l}, \ldots, {}^nC_{-j}^{j;l}\right\}, \tag{387a}$$
$$U^-_{n,l,j} = \left\{{}^nS_j^{j;l}, \ldots, {}^nS_{-j}^{j;l}\right\}. \tag{387b}$$
Using the properties of the representation of $h_2$ on the space of complex polynomials, we find
$$\frac{\hat a^-}{j+m}\,{}^nC_m^{j;l} = 2^{j-1}\left(\frac{\hat a^-}{j+m}\,{}^nP_m^{j;l} + \frac{\hat a^-}{j+m}\,{}^nP_m^{j;-l}\right) = 2^{j-1}\left({}^nP_{m-1}^{j;l} + {}^nP_{m-1}^{j;-l}\right) = {}^nC_{m-1}^{j;l}, \tag{388}$$
and similarly
$$\frac{\hat a^-}{j+m}\,{}^nS_m^{j;l} = {}^nS_{m-1}^{j;l}. \tag{389}$$
The subspace $U^+_{n,l,j}$ can thus be generated from ${}^nC_j^{j;l}$ by lowering with $\hat a^-$, and the subspace $U^-_{n,l,j}$ can likewise be generated from the polynomial ${}^nS_j^{j;l}$. Hence, these subspaces are irreducible under the action of sp(2, R) and isomorphic to $D_j$. Unfortunately, they are not invariant under the action of $\hat L_z$,
$$\hat L_z\,{}^nC_m^{j;l} = l\,{}^nS_m^{j;l}, \tag{390a}$$
$$\hat L_z\,{}^nS_m^{j;l} = -l\,{}^nC_m^{j;l}, \tag{390b}$$
which causes these two subspaces to mix. The irreducible subspace under the action of $h_2$ is then
$$U_{n,l,j} = U^+_{n,l,j}\oplus U^-_{n,l,j}, \tag{391}$$
where $\oplus$ in this one instance means just vector-space addition. When $l = 0$, the situation is more straightforward because all the polynomials ${}^nP_m^{j;0}$ are real polynomials expressed in complex coordinates. We need only express the polynomials in real coordinates and normalize; that is,
$${}^nC_m^{j;0} = 2^j\,{}^nP_m^{j;0}. \tag{392}$$
Hence, the decomposition of the real polynomial space
$${}^nU = \bigoplus_{k=0}^{n/2}\,\bigoplus_{j=k}^{n/2} U_{n,2k,j} \tag{393}$$
is the decomposition into irreducible subspaces according to the action of $h_2$.

G. Example: Real Polynomials up to the Fourth Order

We have calculated the decomposition of the space of complex polynomials into irreducible subspaces according to the action of $h_2$. Now we use that result and, applying the procedure described in the previous subsection, show the decomposition of the real polynomial space up to the fourth order.

1. The Third-Order Polynomials

We show the procedure for the construction of the space $U_{3,3,3/2}$. Using Eqs. (373), (377), and (386a), the basis of $U^+_{3,3,3/2}$ can be found:
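$${}^3C_{3/2}^{3/2;3} = 2^{1/2}\left({}^3P_{3/2}^{3/2;3} + {}^3P_{3/2}^{3/2;-3}\right) = 2^{1/2}\left(p_z^3 + p_{\bar z}^3\right) = p_x^3 - 3p_xp_y^2, \tag{394a}$$
$${}^3C_{1/2}^{3/2;3} = 2^{1/2}\left(\bar zp_z^2 + zp_{\bar z}^2\right) = x\left(p_x^2 - p_y^2\right) - 2yp_xp_y, \tag{394b}$$
$${}^3C_{-1/2}^{3/2;3} = 2^{1/2}\left(p_z\bar z^2 + p_{\bar z}z^2\right) = p_x\left(x^2 - y^2\right) - 2xyp_y, \tag{394c}$$
$${}^3C_{-3/2}^{3/2;3} = 2^{1/2}\left(z^3 + \bar z^3\right) = x^3 - 3xy^2. \tag{394d}$$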
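Alternatively, this basis can be found as the descending series generated by the polynomial ${}^3C_{3/2}^{3/2;3}$. The basis of $U^-_{3,3,3/2}$ takes the form
$${}^3S_{3/2}^{3/2;3} = i\,2^{1/2}\left({}^3P_{3/2}^{3/2;3} - {}^3P_{3/2}^{3/2;-3}\right) = i\,2^{1/2}\left(p_z^3 - p_{\bar z}^3\right) = 3p_x^2p_y - p_y^3, \tag{395a}$$
$${}^3S_{1/2}^{3/2;3} = i\,2^{1/2}\left(\bar zp_z^2 - zp_{\bar z}^2\right) = y\left(p_x^2 - p_y^2\right) + 2xp_xp_y, \tag{395b}$$
$${}^3S_{-1/2}^{3/2;3} = i\,2^{1/2}\left(p_z\bar z^2 - p_{\bar z}z^2\right) = p_y\left(x^2 - y^2\right) + 2xyp_x, \tag{395c}$$
$${}^3S_{-3/2}^{3/2;3} = i\,2^{1/2}\left(\bar z^3 - z^3\right) = 3x^2y - y^3. \tag{395d}$$
Hence,
$$U^+_{3,3,\frac32} = \left\{p_x^3 - 3p_xp_y^2,\; x\left(p_x^2 - p_y^2\right) - 2yp_xp_y,\; p_x\left(x^2 - y^2\right) - 2xyp_y,\; x^3 - 3xy^2\right\}, \tag{396a}$$
$$U^-_{3,3,\frac32} = \left\{3p_x^2p_y - p_y^3,\; y\left(p_x^2 - p_y^2\right) + 2xp_xp_y,\; p_y\left(x^2 - y^2\right) + 2xyp_x,\; 3x^2y - y^3\right\}. \tag{396b}$$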
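Because $\hat a^-$ is a real operator, the bases in Eq. (396) can also be generated directly in real coordinates, as Eqs. (388) and (389) state. The sketch below does this with exact polynomial arithmetic in the variables $(x, y, p_x, p_y)$, assuming $a^- = z\bar z = \tfrac12\mathbf q^2$ in real form (the identification $z = (x+iy)/\sqrt2$, $p_z = (p_x - ip_y)/\sqrt2$ reproduces Eqs. (394) and (395)) and the bracket $[f,g] = \sum(\partial_q f\,\partial_p g - \partial_p f\,\partial_q g)$. Helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F

PAIRS = ((0, 2), (1, 3))  # exponent order (x, y, p_x, p_y); canonical pairs (x, p_x), (y, p_y)


def clean(d):
    return {k: v for k, v in d.items() if v != 0}


def add(*ps):
    out = defaultdict(F)
    for p in ps:
        for k, v in p.items():
            out[k] += v
    return clean(out)


def scale(p, s):
    return clean({k: v * s for k, v in p.items()})


def mul(*ps):
    out = {(0, 0, 0, 0): F(1)}
    for p in ps:
        acc = defaultdict(F)
        for k1, v1 in out.items():
            for k2, v2 in p.items():
                acc[tuple(a + b for a, b in zip(k1, k2))] += v1 * v2
        out = clean(acc)
    return out


def diff(p, i):
    out = {}
    for k, v in p.items():
        if k[i]:
            e = list(k)
            e[i] -= 1
            out[tuple(e)] = v * k[i]
    return out


def pb(f, g):
    """Poisson bracket [f, g] = sum over pairs of (df/dq dg/dp - df/dp dg/dq)."""
    return add(*(add(mul(diff(f, q), diff(g, p)), scale(mul(diff(f, p), diff(g, q)), -1))
                 for q, p in PAIRS))


def poly(*terms):
    return add(*({tuple(e): F(c)} for c, e in terms))


A_MINUS = poly((F(1, 2), (2, 0, 0, 0)), (F(1, 2), (0, 2, 0, 0)))  # assumed real form: q^2 / 2


def lowering_chain(top, j):
    """Eqs. (388)-(389): divide a^- applied to the m-th element by (j + m)."""
    chain, m = [top], F(j)
    while m > -j:
        chain.append(scale(pb(A_MINUS, chain[-1]), 1 / (m + j)))
        m -= 1
    return chain


c_chain = lowering_chain(poly((1, (0, 0, 3, 0)), (-3, (0, 0, 1, 2))), F(3, 2))  # from (394a)
s_chain = lowering_chain(poly((3, (0, 0, 2, 1)), (-1, (0, 0, 0, 3))), F(3, 2))  # from (395a)
assert c_chain == [
    poly((1, (0, 0, 3, 0)), (-3, (0, 0, 1, 2))),                      # p_x^3 - 3 p_x p_y^2
    poly((1, (1, 0, 2, 0)), (-1, (1, 0, 0, 2)), (-2, (0, 1, 1, 1))),  # x(p_x^2 - p_y^2) - 2 y p_x p_y
    poly((1, (2, 0, 1, 0)), (-1, (0, 2, 1, 0)), (-2, (1, 1, 0, 1))),  # p_x(x^2 - y^2) - 2 x y p_y
    poly((1, (3, 0, 0, 0)), (-3, (1, 2, 0, 0))),                      # x^3 - 3 x y^2
]
assert s_chain == [
    poly((3, (0, 0, 2, 1)), (-1, (0, 0, 0, 3))),                      # 3 p_x^2 p_y - p_y^3
    poly((1, (0, 1, 2, 0)), (-1, (0, 1, 0, 2)), (2, (1, 0, 1, 1))),   # y(p_x^2 - p_y^2) + 2 x p_x p_y
    poly((1, (2, 0, 0, 1)), (-1, (0, 2, 0, 1)), (2, (1, 1, 1, 0))),   # p_y(x^2 - y^2) + 2 x y p_x
    poly((3, (2, 1, 0, 0)), (-1, (0, 3, 0, 0))),                      # 3 x^2 y - y^3
]
```

Working in real coordinates avoids complex coefficients altogether, which is convenient when the same routine is reused for Eqs. (397) to (403).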
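Now we present the results for the other subspaces:
$$U^+_{3,1,\frac12} = \left\{p_xL_z,\; xL_z\right\}, \qquad U^-_{3,1,\frac12} = \left\{p_yL_z,\; yL_z\right\}, \tag{397}$$
$$U^+_{3,1,\frac32} = \left\{p_x\mathbf p^2,\; \tfrac13x\mathbf p^2 + \tfrac23p_x\,\mathbf q\mathbf p,\; \tfrac23x\,\mathbf q\mathbf p + \tfrac13p_x\mathbf q^2,\; x\mathbf q^2\right\}, \tag{398a}$$
$$U^-_{3,1,\frac32} = \left\{p_y\mathbf p^2,\; \tfrac13y\mathbf p^2 + \tfrac23p_y\,\mathbf q\mathbf p,\; \tfrac23y\,\mathbf q\mathbf p + \tfrac13p_y\mathbf q^2,\; y\mathbf q^2\right\}. \tag{398b}$$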
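2. Polynomials of the Fourth Order

Apart from the case $l = 0$, the procedure is similar to that for the polynomials of the third order. Only the results are presented here. $U_{4,4,2} = U^+_{4,4,2}\oplus U^-_{4,4,2}$, where the polynomials
$${}^4C_2^{2;4} = \left(p_x^2 - p_y^2\right)^2 - 4p_x^2p_y^2, \tag{399a}$$
$${}^4C_1^{2;4} = \left(xp_x - yp_y\right)\left(p_x^2 - p_y^2\right) - 2p_xp_y\left(xp_y + yp_x\right), \tag{399b}$$
$${}^4C_0^{2;4} = \left(x^2 - y^2\right)\left(p_x^2 - p_y^2\right) - 4xyp_xp_y, \tag{399c}$$
$${}^4C_{-1}^{2;4} = \left(xp_x - yp_y\right)\left(x^2 - y^2\right) - 2xy\left(xp_y + yp_x\right), \tag{399d}$$
$${}^4C_{-2}^{2;4} = \left(x^2 - y^2\right)^2 - 4x^2y^2 \tag{399e}$$
form a basis of $U^+_{4,4,2}$, and the polynomials
$${}^4S_2^{2;4} = 4p_xp_y\left(p_x^2 - p_y^2\right), \tag{400a}$$
$${}^4S_1^{2;4} = \left(xp_y + yp_x\right)\left(p_x^2 - p_y^2\right) + 2p_xp_y\left(xp_x - yp_y\right), \tag{400b}$$
$${}^4S_0^{2;4} = 2\left(xp_x - yp_y\right)\left(xp_y + yp_x\right), \tag{400c}$$
$${}^4S_{-1}^{2;4} = \left(xp_y + yp_x\right)\left(x^2 - y^2\right) + 2xy\left(xp_x - yp_y\right), \tag{400d}$$
$${}^4S_{-2}^{2;4} = 4xy\left(x^2 - y^2\right) \tag{400e}$$
form a basis of $U^-_{4,4,2}$. The subspace $U_{4,2}$ is formed from the subspaces
$$U^+_{4,2,2} = \left\{\left(p_x^2 - p_y^2\right)\mathbf p^2,\; xp_x^3 - yp_y^3,\; \left(xp_x - yp_y\right)\mathbf q\mathbf p,\; x^3p_x - y^3p_y,\; \left(x^2 - y^2\right)\mathbf q^2\right\}, \tag{401a}$$
$$U^-_{4,2,2} = \left\{2p_xp_y\mathbf p^2,\; \tfrac12\mathbf p^2\left(xp_y + yp_x\right) + p_xp_y\,\mathbf q\mathbf p,\; \left(xp_y + yp_x\right)\mathbf q\mathbf p,\; \tfrac12\left(xp_y + yp_x\right)\mathbf q^2 + xy\,\mathbf q\mathbf p,\; 2xy\mathbf q^2\right\}, \tag{401b}$$
$$U^+_{4,2,1} = \left\{\left(p_x^2 - p_y^2\right)L_z,\; \left(xp_x - yp_y\right)L_z,\; \left(x^2 - y^2\right)L_z\right\}, \tag{402a}$$
$$U^-_{4,2,1} = \left\{2p_xp_yL_z,\; \left(xp_y + yp_x\right)L_z,\; 2xyL_z\right\}. \tag{402b}$$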
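The subspace belonging to $l = 0$ is formed from the irreducible subspaces
$$U_{4,0,2} = \left\{\left(\mathbf p^2\right)^2,\; \mathbf p^2\,\mathbf q\mathbf p,\; \tfrac13\left[\mathbf p^2\mathbf q^2 + 2(\mathbf q\mathbf p)^2\right],\; \mathbf q^2\,\mathbf q\mathbf p,\; \left(\mathbf q^2\right)^2\right\}, \tag{403a}$$
$$U_{4,0,1} = \left\{\mathbf p^2L_z,\; \mathbf q\mathbf p\,L_z,\; \mathbf q^2L_z\right\}, \tag{403b}$$
$$U_{4,0,0} = \left\{L_z^2\right\}. \tag{403c}$$

H. Decomposition of Polynomials with Respect to $M_1$

As shown previously, the irreducible subspaces of the real polynomial space under the action of $h_2$ are also invariant with respect to
$$M_1 = e^{{:}g_2{:}}, \qquad g_2\in h_2. \tag{404}$$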
Moreover, if $B(z)\neq 0$, then, using Eqs. (322) and (330),
$$g_2 = a(z)\,\mathbf p^2 + b(z)\,\mathbf q\mathbf p + c(z)\,\mathbf q^2 + d(z)\,L_z, \tag{405}$$
where each coefficient is nonzero, and these invariant subspaces are irreducible. The decomposition into irreducible subspaces is then described by Eq. (393).

Now we can easily find the physical meaning of the numbers that characterize the basis vectors of the irreducible subspaces. First, we show the meaning of the number $l$. In the complex polynomial space, it determines the eigenvalue of $\hat L_z$, that is,
$$\hat L_z\,{}^nP_m^{j;l} = il\,{}^nP_m^{j;l}; \tag{406}$$
this interpretation of $l$ is not available for real polynomials because, except for the axially symmetric polynomials, there are no eigenvectors of $\hat L_z$ among them. Let us consider the polynomial
$${}^nC_m^{j;l} = 2^{j-1}\left({}^nP_m^{j;l} + {}^nP_m^{j;-l}\right), \tag{407}$$
which, under the action of $\exp(\varphi{:}L_z{:})$, takes the form
$$\exp(\varphi{:}L_z{:})\,{}^nC_m^{j;l} = 2^{j-1}\left(e^{\varphi{:}L_z{:}}\,{}^nP_m^{j;l} + e^{\varphi{:}L_z{:}}\,{}^nP_m^{j;-l}\right) = 2^{j-1}\left(e^{i\varphi l}\,{}^nP_m^{j;l} + e^{-i\varphi l}\,{}^nP_m^{j;-l}\right). \tag{408}$$
When $\varphi l = 2\pi k$, the polynomial ${}^nC_m^{j;l}$ is invariant under the Lie transformation $\exp(\varphi{:}L_z{:})$, which represents a rotation of phase space around the optic axis through the angle $\varphi$. Thus, the polynomials ${}^nC_m^{j;l}$ and ${}^nS_m^{j;l}$ remain unchanged when the phase space is rotated around the optic axis through an angle $2\pi k/l$.

The next interesting characteristic of the homogeneous polynomials is the eigenvalue $m$ of the Lie operator $\hat a_0$. The polynomial $a_0 = \tfrac12\,\mathbf q\mathbf p$ generates the Lie transformation $\exp(\tau{:}a_0{:})$, which acts as a pure magnification in four-dimensional (4D) phase space,
$$\exp(\tau{:}a_0{:})\begin{pmatrix}\mathbf q\\ \mathbf p\end{pmatrix} = \exp\left(\tfrac{\tau}{2}{:}\mathbf q\mathbf p{:}\right)\begin{pmatrix}\mathbf q\\ \mathbf p\end{pmatrix} = \begin{pmatrix}e^{-\tau/2}\,\mathbf q\\ e^{\tau/2}\,\mathbf p\end{pmatrix}. \tag{409}$$
Thus, the value $m$ describes how the influence of an aberration changes with magnification; more simply, $2m$ is the excess of the number of $p$'s over the number of $q$'s in the polynomial. The number $m$ is known as the Seidel weight (Navarro-Saad and Wolf, 1986). Aberrations with the highest weight are the most important in practical calculations for magnifying systems because they contain only $p_x$ and $p_y$, and their influence does not change with distance from the axis in the object plane. The eigenvalue $j$ of the operator $\hat a^2$ has the least obvious meaning. It determines the order of the skew variable $L_z = xp_y - yp_x$ in the homogeneous polynomial ${}^nC_m^{j;l}$, which equals $n/2 - j$. In the axially symmetric case, when the order of $L_z$ is odd, the polynomial is not invariant under the discrete space reflections $x\to -x$, $p_x\to -p_x$, $y\to y$, $p_y\to p_y$ or $y\to -y$, $p_y\to -p_y$, $x\to x$, $p_x\to p_x$, and the aberrations that it describes are anisotropic.

I. The Third-Order Axially Symmetric Aberrations

The third-order axially symmetric aberrations are described by the Lie transformation $\exp({:}g_4{:})$, where $g_4\in V_{4,0}$, that is,
$$g_4 = \sum_{j=0}^{2}\sum_{m=-j}^{j}\alpha_{j,m}\,{}^4C_m^{j;0}. \tag{410}$$
This type of aberration is well described and classified; we now mention the standard meaning of each term in Eq. (410). The term $\alpha_{2,2}\,{}^4C_2^{2;0}$ describes the spherical aberration,
$$\exp\left(\alpha_{2,2}{:}{}^4C_2^{2;0}{:}\right)\mathbf q = \exp\left(\alpha_{2,2}{:}(\mathbf p^2)^2{:}\right)\mathbf q = \mathbf q - 4\alpha_{2,2}\,\mathbf p^2\mathbf p + o(5) = \mathbf q + C_3\,\mathbf p^2\mathbf p + o(5), \tag{411}$$
where we introduced the coefficient of third-order spherical aberration $C_3 = -4\alpha_{2,2}$. The next term describes the coma,
$$\exp\left(\alpha_{2,1}{:}{}^4C_1^{2;0}{:}\right)\mathbf q = \exp\left(-K{:}\mathbf p^2\,\mathbf q\mathbf p{:}\right)\mathbf q = \mathbf q + K\left[\mathbf p^2\mathbf q + 2(\mathbf q\mathbf p)\mathbf p\right] + o(5), \tag{412}$$
where the coefficient of the third-order coma $K = -\alpha_{2,1}$ was introduced. The polynomial $\alpha_{2,-1}\,{}^4C_{-1}^{2;0}$ generates the distortion; that is, using $D = -\alpha_{2,-1}$,
$$\exp\left(\alpha_{2,-1}{:}{}^4C_{-1}^{2;0}{:}\right)\mathbf q = \exp\left(-D{:}\mathbf q^2\,\mathbf q\mathbf p{:}\right)\mathbf q = \mathbf q + D\,\mathbf q^2\mathbf q + o(5). \tag{413}$$
The polynomial ${}^4C_{-2}^{2;0}$ has no effect on the change of coordinates; it affects only the final slopes. It is necessary to describe the effect of ${}^4C_0^{2;0}$ and ${}^4C_0^{0;0}$ together; they generate the astigmatism and the field curvature. We use the definition of Dragt and Forest (1986b) rather than that of Hawkes and Kasper (1989):
$$\alpha_{2,0}\,{}^4C_0^{2;0} + \alpha_{0,0}\,{}^4C_0^{0;0} = \alpha_{2,0}\left[\tfrac13\mathbf p^2\mathbf q^2 + \tfrac23(\mathbf q\mathbf p)^2\right] + \alpha_{0,0}\left[\mathbf q^2\mathbf p^2 - (\mathbf q\mathbf p)^2\right] = -A(\mathbf q\mathbf p)^2 - \tfrac12F\,\mathbf p^2\mathbf q^2, \tag{414}$$
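where the coefficients of astigmatism $A = \alpha_{0,0} - \tfrac23\alpha_{2,0}$ and field curvature $F = -\tfrac23\alpha_{2,0} - 2\alpha_{0,0}$ were introduced. The effect of astigmatism then reads
$$\exp\left(-A{:}(\mathbf q\mathbf p)^2{:}\right)\mathbf q = \mathbf q + 2A(\mathbf q\mathbf p)\,\mathbf q + o(5), \tag{415}$$
and similarly the effect of the field curvature,
$$\exp\left(-\tfrac12F{:}\mathbf q^2\mathbf p^2{:}\right)\mathbf q = \mathbf q + F\,\mathbf q^2\mathbf p + o(5). \tag{416}$$
Both aberrations mix the effect of the polynomials ${}^4C_0^{2;0}$ and ${}^4C_0^{0;0}$. In fact, these polynomials differ only in the value of the "spin" $j$; their other characteristics are identical. The polynomials from the space $V_{4,0,1}$ are not invariant under reflection with respect to any plane that contains the optic axis. Thus, the aberrations described by these polynomials are anisotropic. The first is the anisotropic coma,
$$\exp\left(\alpha_{1,1}{:}{}^4C_1^{1;0}{:}\right)\mathbf q = \exp\left(-k{:}L_z\mathbf p^2{:}\right)\mathbf q = \mathbf q + k\left(2L_z\mathbf p - \mathbf p^2\hat J_2\mathbf q\right) + o(5) = \mathbf q + 2k\,(\mathbf q\mathbf p)\hat J_2\mathbf p - 3k\,\mathbf p^2\hat J_2\mathbf q + o(5), \tag{417}$$
with $k = -\alpha_{1,1}$; the second is the anisotropic astigmatism,
$$\exp\left(\alpha_{1,0}{:}{}^4C_0^{1;0}{:}\right)\mathbf q = \exp\left(-a{:}L_z\,\mathbf q\mathbf p{:}\right)\mathbf q = \mathbf q + a\left(L_z\mathbf q - \mathbf q\mathbf p\,\hat J_2\mathbf q\right) + o(5) = \mathbf q + a\left(\mathbf q^2\hat J_2\mathbf p - 2\,\mathbf q\mathbf p\,\hat J_2\mathbf q\right) + o(5), \tag{418}$$
where $a = -\alpha_{1,0}$; and the last is the anisotropic distortion,
$$\exp\left(\alpha_{1,-1}{:}{}^4C_{-1}^{1;0}{:}\right)\mathbf q = \exp\left(-d{:}L_z\mathbf q^2{:}\right)\mathbf q = \mathbf q - d\,\mathbf q^2\hat J_2\mathbf q + o(5), \tag{419}$$
with $d = -\alpha_{1,-1}$. These coefficients have no analogue in light optics; their existence is caused by the presence of a magnetic field (Hawkes and Kasper, 1989). The structure of the aberration polynomials merits further mention. They are represented by third-order homogeneous polynomials as the result of the Poisson bracket $[V_{4,0},\mathbf q]$, and because each $g\in V_{4,0}$ satisfies
$$\hat L_z[g,\mathbf q] = [\hat L_zg,\mathbf q] + [g,\hat L_z\mathbf q] = [g,\hat L_z\mathbf q], \tag{420}$$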
where $[V_{4,0},\mathbf q]\subset V_{3,1}$. The polynomial space $V_{3,1}$ is formed by 12 polynomials and, considering the phase-space dimension, this gives $12\times 4 = 48$ possible combinations. However, we showed that there are just nine independent aberration coefficients. This is because the transformation is canonical and can be described by $g_4\in V_{4,0}$ through the Lie transformation $\exp({:}g_4{:})$. Hence,
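the general axially symmetric aberration polynomial reads
$$\begin{aligned}\begin{pmatrix}\tilde{\mathbf q}\\ \tilde{\mathbf p}\end{pmatrix} = \exp({:}g_4{:})\begin{pmatrix}\mathbf q_o\\ \mathbf p_o\end{pmatrix} ={}& \begin{pmatrix}\mathbf q_o\\ \mathbf p_o\end{pmatrix} + C_3\begin{pmatrix}\mathbf p_o^2\mathbf p_o\\ 0\end{pmatrix} + K\begin{pmatrix}2(\mathbf q_o\mathbf p_o)\mathbf p_o + \mathbf p_o^2\mathbf q_o\\ -\mathbf p_o^2\mathbf p_o\end{pmatrix} + 2A\begin{pmatrix}(\mathbf q_o\mathbf p_o)\mathbf q_o\\ -(\mathbf q_o\mathbf p_o)\mathbf p_o\end{pmatrix} + F\begin{pmatrix}\mathbf q_o^2\mathbf p_o\\ -\mathbf p_o^2\mathbf q_o\end{pmatrix}\\ &+ D\begin{pmatrix}\mathbf q_o^2\mathbf q_o\\ -2(\mathbf q_o\mathbf p_o)\mathbf q_o - \mathbf q_o^2\mathbf p_o\end{pmatrix} + E\begin{pmatrix}0\\ -\mathbf q_o^2\mathbf q_o\end{pmatrix} + k\begin{pmatrix}2L_z\mathbf p_o - \mathbf p_o^2\hat J_2\mathbf q_o\\ -\mathbf p_o^2\hat J_2\mathbf p_o\end{pmatrix}\\ &+ a\begin{pmatrix}L_z\mathbf q_o - (\mathbf q_o\mathbf p_o)\hat J_2\mathbf q_o\\ -L_z\mathbf p_o - (\mathbf q_o\mathbf p_o)\hat J_2\mathbf p_o\end{pmatrix} - d\begin{pmatrix}\mathbf q_o^2\hat J_2\mathbf q_o\\ 2L_z\mathbf q_o + \mathbf q_o^2\hat J_2\mathbf p_o\end{pmatrix} + o(5).\end{aligned} \tag{421}$$

J. Reflection Symmetry

We have noted that reflection symmetry has interesting consequences for the properties of the aberrations in the axially symmetric case. Therefore, we describe the general reflection symmetry of the aberration polynomials. Let us denote by $\hat\Omega_\alpha$ the reflection with respect to the plane $-x\sin\alpha + y\cos\alpha = 0$, the plane that arises from the plane $xz$ by rotation around the $z$-axis through the angle $\alpha$. Such a transformation can be described as the composition
$$\hat\Omega_\alpha = \hat R_{-\alpha}\,\hat\Omega_0\,\hat R_\alpha. \tag{422}$$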
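Returning briefly to the aberration maps of Section I: the first-order position terms in Eqs. (411) to (419), which form the upper components of Eq. (421), each follow from a single Poisson bracket $-w[f,\mathbf q]$, where the map is written as $\exp(-c\,w{:}f{:})$ with aberration coefficient $c$. The sketch below checks them with exact polynomial arithmetic in $(x, y, p_x, p_y)$. It assumes the bracket $[f,g] = \sum(\partial_q f\,\partial_p g - \partial_p f\,\partial_q g)$ and reads $\hat J_2(v_x, v_y) = (v_y, -v_x)$, which is the interpretation consistent with Eqs. (417) to (419); helper names are illustrative, and only the $\mathbf q$ components are checked.

```python
from collections import defaultdict
from fractions import Fraction as F

PAIRS = ((0, 2), (1, 3))  # exponent order (x, y, p_x, p_y)


def clean(d):
    return {k: v for k, v in d.items() if v != 0}


def add(*ps):
    out = defaultdict(F)
    for p in ps:
        for k, v in p.items():
            out[k] += v
    return clean(out)


def scale(p, s):
    return clean({k: v * s for k, v in p.items()})


def mul(*ps):
    out = {(0, 0, 0, 0): F(1)}
    for p in ps:
        acc = defaultdict(F)
        for k1, v1 in out.items():
            for k2, v2 in p.items():
                acc[tuple(a + b for a, b in zip(k1, k2))] += v1 * v2
        out = clean(acc)
    return out


def diff(p, i):
    out = {}
    for k, v in p.items():
        if k[i]:
            e = list(k)
            e[i] -= 1
            out[tuple(e)] = v * k[i]
    return out


def pb(f, g):
    return add(*(add(mul(diff(f, q), diff(g, p)), scale(mul(diff(f, p), diff(g, q)), -1))
                 for q, p in PAIRS))


X, Y, PX, PY = ({e: F(1)} for e in ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)))
P2, Q2 = add(mul(PX, PX), mul(PY, PY)), add(mul(X, X), mul(Y, Y))
QP, LZ = add(mul(X, PX), mul(Y, PY)), add(mul(X, PY), scale(mul(Y, PX), -1))
P, Q = (PX, PY), (X, Y)
J2P, J2Q = (PY, scale(PX, -1)), (Y, scale(X, -1))  # assumed J_2 (v_x, v_y) = (v_y, -v_x)


def first_order(f, w):
    """Coefficient of c in exp(-c w :f:) q to first order, i.e. -w [f, q]."""
    return tuple(scale(pb(f, v), -w) for v in (X, Y))


def lincomb(*terms):
    """Componentwise sum of (scalar polynomial) * (2-vector of polynomials)."""
    return tuple(add(*(mul(s, vec[i]) for s, vec in terms)) for i in range(2))


checks = {
    "spherical C3 (411)": (first_order(mul(P2, P2), F(1, 4)), lincomb((P2, P))),
    "coma K (412)": (first_order(mul(P2, QP), 1), lincomb((P2, Q), (scale(QP, 2), P))),
    "distortion D (413)": (first_order(mul(Q2, QP), 1), lincomb((Q2, Q))),
    "astigmatism A (415)": (first_order(mul(QP, QP), 1), lincomb((scale(QP, 2), Q))),
    "field curvature F (416)": (first_order(mul(Q2, P2), F(1, 2)), lincomb((Q2, P))),
    "anisotropic coma k (417)": (first_order(mul(LZ, P2), 1),
                                 lincomb((scale(QP, 2), J2P), (scale(P2, -3), J2Q))),
    "anisotropic astigmatism a (418)": (first_order(mul(LZ, QP), 1),
                                        lincomb((Q2, J2P), (scale(QP, -2), J2Q))),
    "anisotropic distortion d (419)": (first_order(mul(LZ, Q2), 1), lincomb((scale(Q2, -1), J2Q))),
}
for name, (got, want) in checks.items():
    assert got == want, name
```

The weight `w` records the normalization used in the text (for example $\tfrac14$ because $C_3 = -4\alpha_{2,2}$, and $\tfrac12$ because the field-curvature generator is $-\tfrac12F\,\mathbf q^2\mathbf p^2$).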
Next we find the transformation property of the complex polynomial ${}^nP_m^{j;l}$. It is readily seen that $\hat\Omega_0L_z = -L_z$; hence,
$$\hat\Omega_0\,{}^nP_m^{j;l} = \hat\Omega_0\left(L_z^{\frac n2 - j}\;{}^{2j}P_m^{j;l}\right) = (-1)^{\frac n2 - j}L_z^{\frac n2 - j}\;\hat\Omega_0\,{}^{2j}P_m^{j;l}. \tag{423}$$
The transformation $\hat\Omega_0$, which in fact changes the sign of $y$ and $p_y$, is equivalent to complex conjugation for the complex polynomials ${}^{2j}P_m^{j;l}$. Hence, using Eq. (369),
$$\hat\Omega_0\,{}^{2j}P_m^{j;l} = {}^{2j}P_m^{j;-l}, \tag{424}$$
and combining the previous results, we find
$$\hat\Omega_0\,{}^nP_m^{j;l} = (-1)^{\frac n2 - j}L_z^{\frac n2 - j}\;{}^{2j}P_m^{j;-l} = (-1)^{\frac n2 - j}\;{}^nP_m^{j;-l}. \tag{425}$$
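The transformation property of ${}^nP_m^{j;l}$ under $\hat\Omega_\alpha$ can then be found:
$$\hat\Omega_\alpha\,{}^nP_m^{j;l} = \hat R_{-\alpha}\hat\Omega_0\hat R_\alpha\,{}^nP_m^{j;l} = \hat R_{-\alpha}\hat\Omega_0\,e^{il\alpha}\,{}^nP_m^{j;l} = (-1)^{\frac n2 - j}e^{il\alpha}\,\hat R_{-\alpha}\,{}^nP_m^{j;-l} = (-1)^{\frac n2 - j}e^{2il\alpha}\,{}^nP_m^{j;-l}. \tag{426}$$
There are two interesting cases: the first, $\alpha = k\pi/l$, for which
$$\hat\Omega_{k\pi/l}\,{}^nP_m^{j;l} = (-1)^{\frac n2 - j}\,{}^nP_m^{j;-l}, \tag{427}$$
and the second, $\alpha = \pi/(2l) + k\pi/l$, for which
$$\hat\Omega_{\frac{\pi}{2l} + \frac{k\pi}{l}}\,{}^nP_m^{j;l} = -(-1)^{\frac n2 - j}\,{}^nP_m^{j;-l}. \tag{428}$$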
These properties can be used to describe the transformation properties of real polynomials. First, we direct our attention to the polynomial ${}^nC_m^{j;l}$:
$$\hat\Omega_\alpha\,{}^nC_m^{j;l} = \hat\Omega_\alpha\,2^{j-1}\left({}^nP_m^{j;l} + {}^nP_m^{j;-l}\right) = (-1)^{\frac n2 - j}\,2^{j-1}\left(e^{2il\alpha}\,{}^nP_m^{j;-l} + e^{-2il\alpha}\,{}^nP_m^{j;l}\right); \tag{429}$$
hence,
$$\hat\Omega_{k\pi/l}\,{}^nC_m^{j;l} = (-1)^{\frac n2 - j}\,{}^nC_m^{j;l}, \tag{430a}$$
$$\hat\Omega_{\frac{\pi}{2l} + \frac{k\pi}{l}}\,{}^nC_m^{j;l} = -(-1)^{\frac n2 - j}\,{}^nC_m^{j;l}, \tag{430b}$$
and similarly for the polynomial ${}^nS_m^{j;l}$,
$$\hat\Omega_\alpha\,{}^nS_m^{j;l} = \hat\Omega_\alpha\,i\,2^{j-1}\left({}^nP_m^{j;l} - {}^nP_m^{j;-l}\right) = i(-1)^{\frac n2 - j}\,2^{j-1}\left(e^{2il\alpha}\,{}^nP_m^{j;-l} - e^{-2il\alpha}\,{}^nP_m^{j;l}\right); \tag{431}$$
hence,
$$\hat\Omega_{k\pi/l}\,{}^nS_m^{j;l} = -(-1)^{\frac n2 - j}\,{}^nS_m^{j;l}, \tag{432a}$$
$$\hat\Omega_{\frac{\pi}{2l} + \frac{k\pi}{l}}\,{}^nS_m^{j;l} = (-1)^{\frac n2 - j}\,{}^nS_m^{j;l}. \tag{432b}$$
The previous equations completely describe the reflection symmetry of the polynomials. Unfortunately, from the reflection symmetry of the optical system it is not possible to find the reflection symmetry of the aberration coefficients. This is a consequence of the anisotropic effect of the magnetic field, which mixes the polynomials ${}^nC_m^{j;l}$ with ${}^nS_m^{j;l}$. We can proceed in this manner only for electrostatic lenses or in light optics.
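The sign rules (430a) to (432b) can be spot-checked numerically. This is a minimal sketch, assuming that $\hat\Omega_\alpha$ acts on a polynomial by reflecting $\mathbf q$ and $\mathbf p$ alike in the line at angle $\alpha$ to the $x$-axis (the trace of the plane $-x\sin\alpha + y\cos\alpha = 0$). It uses the basis polynomials of Eqs. (394a) and (395a) ($n = 3$, $j = \tfrac32$, $l = 3$, parity factor $+1$) and of Eq. (402) ($n = 4$, $j = 1$, $l = 2$, parity factor $-1$); the function names are illustrative.

```python
import math
import random


def reflect(alpha, v):
    """Reflect a 2-vector in the line through the origin at angle alpha to the x axis."""
    c, s = math.cos(2 * alpha), math.sin(2 * alpha)
    return (c * v[0] + s * v[1], s * v[0] - c * v[1])


def lz(q, p):
    return q[0] * p[1] - q[1] * p[0]


def c3(q, p):  # 3C_{3/2}^{3/2;3}, Eq. (394a)
    return p[0] ** 3 - 3 * p[0] * p[1] ** 2


def s3(q, p):  # 3S_{3/2}^{3/2;3}, Eq. (395a)
    return 3 * p[0] ** 2 * p[1] - p[1] ** 3


def c4(q, p):  # 4C_1^{1;2}, first element of Eq. (402a)
    return (p[0] ** 2 - p[1] ** 2) * lz(q, p)


def s4(q, p):  # 4S_1^{1;2}, first element of Eq. (402b)
    return 2 * p[0] * p[1] * lz(q, p)


def check(f, alpha, sign, trials=20):
    """Assert that f(Omega_alpha q, Omega_alpha p) == sign * f(q, p) at random points."""
    rng = random.Random(0)
    for _ in range(trials):
        q = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        lhs = f(reflect(alpha, q), reflect(alpha, p))
        assert math.isclose(lhs, sign * f(q, p), rel_tol=1e-9, abs_tol=1e-12)
    return True


for k in range(3):
    # l = 3, (-1)^(n/2 - j) = +1: Eqs. (430a), (432a), (430b), (432b).
    check(c3, k * math.pi / 3, +1)
    check(s3, k * math.pi / 3, -1)
    check(c3, math.pi / 6 + k * math.pi / 3, -1)
    check(s3, math.pi / 6 + k * math.pi / 3, +1)
    # l = 2, (-1)^(n/2 - j) = -1: the same four equations with the opposite parity factor.
    check(c4, k * math.pi / 2, -1)
    check(s4, k * math.pi / 2, +1)
    check(c4, math.pi / 4 + k * math.pi / 2, +1)
    check(s4, math.pi / 4 + k * math.pi / 2, -1)
```

A numerical test at random points is used here instead of symbolic algebra because the reflection angles $\pi/(2l)$ introduce irrational coefficients.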
K. Changing Parameterization

We have described how the aberration coefficients change when the paraxial image coordinates are used instead of the object ones. Now we describe the change of the aberration coefficients when the parameterization by the positions in the object and aperture planes is used, or when the position of the aperture plane is changed. Let us denote the interaction coordinates by $\mathbf q_1$ and $\mathbf p_1$ in the case of parameterization by the object position and momentum, and by $\mathbf q_2$ and $\mathbf p_2 = \mathbf q_a$ in the case of parameterization by the positions in the object and aperture planes. The transitions between rotating coordinates and interaction coordinates are then described by
$$\begin{pmatrix}\mathbf q\\ \mathbf p\end{pmatrix} = \begin{pmatrix}g\,\hat 1 & \phi_o^{*-1/2}h\,\hat 1\\ \phi^{*1/2}g'\,\hat 1 & \phi^{*1/2}\phi_o^{*-1/2}h'\,\hat 1\end{pmatrix}\begin{pmatrix}\mathbf q_1\\ \mathbf p_1\end{pmatrix} \tag{433}$$
in the case of parameterization by position and momentum in the object plane, or by
$$\begin{pmatrix}\mathbf q\\ \mathbf p\end{pmatrix} = \begin{pmatrix}s\,\hat 1 & t\,\hat 1\\ \phi^{*1/2}s'\,\hat 1 & \phi^{*1/2}t'\,\hat 1\end{pmatrix}\begin{pmatrix}\mathbf q_2\\ \mathbf p_2\end{pmatrix} \tag{434}$$
in the case of parameterization by the positions in the object and aperture planes. It is clear that the meaning of $\mathbf q_1$ and $\mathbf q_2$ does not differ; thus,
$$\mathbf q_1 = \mathbf q_2. \tag{435}$$
Comparing the equations for $\mathbf q$,
$$\mathbf q = g\,\mathbf q_1 + \phi_o^{*-1/2}h\,\mathbf p_1, \tag{436}$$
$$\mathbf q = s\,\mathbf q_2 + t\,\mathbf p_2 = s\,\mathbf q_1 + t\,\mathbf p_2, \tag{437}$$
we find
$$\mathbf p_2 = g_a\,\mathbf q_1 + \phi_o^{*-1/2}h_a\,\mathbf p_1, \tag{438}$$
where we used
$$s = g - \frac{g_a}{h_a}\,h, \tag{439}$$
$$t = \frac{1}{h_a}\,h. \tag{440}$$
The subscript $a$ indicates that the function is evaluated at $z_a$, i.e., $h_a = h(z_a)$.
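The transformation takes the form of an extended canonical transformation
$$\begin{pmatrix}\mathbf q_2\\ \mathbf p_2\end{pmatrix} = \begin{pmatrix}\hat 1 & \hat 0\\ g_a\hat 1 & \phi_o^{*-1/2}h_a\hat 1\end{pmatrix}\begin{pmatrix}\mathbf q_1\\ \mathbf p_1\end{pmatrix} = \exp\left(\frac{g_a}{2}{:}\mathbf q^2{:}\right)\begin{pmatrix}\hat 1 & \hat 0\\ \hat 0 & \phi_o^{*-1/2}h_a\hat 1\end{pmatrix}\begin{pmatrix}\mathbf q_1\\ \mathbf p_1\end{pmatrix}. \tag{441}$$
Now we determine how the transformation acts on the symplectic polynomials ${}^nC_m^{j;l}$, that is,
$${}^nC_m^{j;l}(\mathbf q_2,\mathbf p_2) = \mathcal M\,{}^nC_m^{j;l} = {}^nC_m^{j;l}\big(\mathbf q_2(\mathbf q_1,\mathbf p_1),\, \mathbf p_2(\mathbf q_1,\mathbf p_1)\big). \tag{442}$$
First, we show the effect of the scaling, which is described by the diagonal matrix
$$\mathcal M_s = \begin{pmatrix}\hat 1 & \hat 0\\ \hat 0 & \phi_o^{*-1/2}h_a\hat 1\end{pmatrix}. \tag{443}$$
This transformation leaves $\mathbf q_1$ unchanged and scales $\mathbf p_1$ by the factor $\phi_o^{*-1/2}h_a$. The power of $\mathbf p_1$ in ${}^nC_m^{j;l}$ is expressed by the eigenvalue $m$, that is,
$${}^n\tilde C_m^{j;l} = \mathcal M_s\,{}^nC_m^{j;l} = \left(\phi_o^{*-1/2}h_a\right)^{\frac n2 + m}\,{}^nC_m^{j;l}. \tag{444}$$
The action of the canonical part of the transformation takes the form
$$\mathcal M_c\,{}^n\tilde C_m^{j;l} = \exp\left(\frac{g_a}{2}{:}\mathbf q^2{:}\right){}^n\tilde C_m^{j;l} = \sum_{i=0}^{\infty}\frac{\left(g_a\hat a^-\right)^i}{i!}\,{}^n\tilde C_m^{j;l}; \tag{445}$$
using Eq. (388), we find
$$\mathcal M_c\,{}^n\tilde C_m^{j;l} = \sum_{i=0}^{j+m}\binom{j+m}{i}g_a^i\,{}^n\tilde C_{m-i}^{j;l}. \tag{446}$$
Finally, the transformation is described by the composition of these two maps:
$${}^nC_m^{j;l}(\mathbf q_2,\mathbf p_2) = \mathcal M_c\mathcal M_s\,{}^nC_m^{j;l} = \sum_{i=0}^{j+m}\binom{j+m}{i}\left(\phi_o^{*-1/2}h_a\right)^{\frac n2 + (m-i)}g_a^i\;{}^nC_{m-i}^{j;l}(\mathbf q_1,\mathbf p_1). \tag{447}$$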
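Equation (447) is an exact polynomial identity, so it can be spot-checked at rational points. The sketch below does this for the real basis of $U_{4,0,2}$ from Eq. (403a), with arbitrary rational test values standing in for $g_a$ and for $c = \phi_o^{*-1/2}h_a$ (these numbers are illustrative only). It then confirms that coefficients transformed with Eq. (450) describe the same polynomial $g_n$, and that the lower-triangular matrix of Eq. (450) has the rows printed in Eq. (452). The function names are hypothetical.

```python
from fractions import Fraction as F
from math import comb


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def C(m, q, p):
    """Real basis 4C_m^{2;0} of U_{4,0,2}, Eq. (403a); q and p are 2-vectors."""
    q2, qp, p2 = dot(q, q), dot(q, p), dot(p, p)
    return {2: p2 ** 2, 1: p2 * qp, 0: (p2 * q2 + 2 * qp ** 2) / 3,
            -1: q2 * qp, -2: q2 ** 2}[m]


def transfer_matrix(j, g):
    """Lower-triangular matrix of Eq. (450); rows k and columns m both run j, ..., -j."""
    idx = range(j, -j - 1, -1)
    return [[comb(j + m, m - k) * g ** (m - k) if m >= k else 0 for m in idx] for k in idx]


j, n = 2, 4
g_a, c = F(2, 7), F(-3, 5)  # illustrative stand-ins for g_a and phi_o^{*-1/2} h_a
q1, p1 = (F(3, 2), F(-2)), (F(1, 3), F(5, 4))
p2 = (g_a * q1[0] + c * p1[0], g_a * q1[1] + c * p1[1])  # Eq. (438), with q2 = q1

# Eq. (447): C_m(q2, p2) expanded in C_{m-i}(q1, p1).
for m in range(-j, j + 1):
    rhs = sum(comb(j + m, i) * c ** (n // 2 + m - i) * g_a ** i * C(m - i, q1, p1)
              for i in range(j + m + 1))
    assert C(m, q1, p2) == rhs

# Eq. (450): transformed coefficients describe the same polynomial g_n, Eq. (448).
a_tilde = {2: F(1), 1: F(-2, 3), 0: F(5), -1: F(1, 4), -2: F(-7)}
a = {k: c ** (n // 2 + k) * sum(comb(j + m, m - k) * g_a ** (m - k) * a_tilde[m]
                                for m in range(k, j + 1))
     for k in range(-j, j + 1)}
assert sum(a_tilde[m] * C(m, q1, p2) for m in a_tilde) == sum(a[k] * C(k, q1, p1) for k in a)
```

The last row of `transfer_matrix(2, g)` is `[g**4, g**3, g**2, g, 1]`, matching the bottom row of the matrix in Eq. (452); `transfer_matrix(1, g)` gives the matrix of Eq. (453).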
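We can see that all irreducible subspaces $V_{n,l,j}$ are invariant under the transformation $\mathcal M$, and we can describe them independently. Let us consider the polynomial
$$g_n = \sum_{m=-j}^{j}a_m\,{}^nC_m^{j;l}(\mathbf q_1,\mathbf p_1) = \sum_{m=-j}^{j}\tilde a_m\,{}^nC_m^{j;l}(\mathbf q_2,\mathbf p_2). \tag{448}$$
Using Eq. (447), we find
$$\begin{aligned}g_n &= \sum_{m=-j}^{j}\tilde a_m\sum_{i=0}^{j+m}\binom{j+m}{i}\left(\phi_o^{*-1/2}h_a\right)^{\frac n2 + (m-i)}g_a^i\;{}^nC_{m-i}^{j;l}(\mathbf q_1,\mathbf p_1)\\ &= \sum_{k=-j}^{j}\left(\phi_o^{*-1/2}h_a\right)^{\frac n2 + k}\,{}^nC_k^{j;l}(\mathbf q_1,\mathbf p_1)\sum_{m=k}^{j}g_a^{m-k}\,\tilde a_m\binom{j+m}{m-k},\end{aligned} \tag{449}$$
and comparing the previous two equations, it is easy to find the transformation of the aberration coefficients in the form
$$a_k = \left(\phi_o^{*-1/2}h_a\right)^{\frac n2 + k}\sum_{m=k}^{j}\tilde a_m\,g_a^{m-k}\binom{j+m}{m-k}. \tag{450}$$
Let us now apply the previous results to the transformation of the third-order axially symmetric aberrations. They are described by $g_4$ in Eq. (495a),
$$g_4 = \sum_{j,m}\alpha_{j,m}\,{}^4C_m^{j;0}(\mathbf q_1,\mathbf p_1) = \sum_{j,m}\tilde\alpha_{j,m}\,{}^4C_m^{j;0}(\mathbf q_2,\mathbf p_2), \tag{451}$$
and using Eq. (450), we can write for $V_{4,0,2}$
$$\begin{pmatrix}\left(\phi_o^{*-1/2}h_a\right)^{-4}\alpha_{2,2}\\ \left(\phi_o^{*-1/2}h_a\right)^{-3}\alpha_{2,1}\\ \left(\phi_o^{*-1/2}h_a\right)^{-2}\alpha_{2,0}\\ \left(\phi_o^{*-1/2}h_a\right)^{-1}\alpha_{2,-1}\\ \alpha_{2,-2}\end{pmatrix} = \begin{pmatrix}1 & 0 & 0 & 0 & 0\\ 4g_a & 1 & 0 & 0 & 0\\ 6g_a^2 & 3g_a & 1 & 0 & 0\\ 4g_a^3 & 3g_a^2 & 2g_a & 1 & 0\\ g_a^4 & g_a^3 & g_a^2 & g_a & 1\end{pmatrix}\begin{pmatrix}\tilde\alpha_{2,2}\\ \tilde\alpha_{2,1}\\ \tilde\alpha_{2,0}\\ \tilde\alpha_{2,-1}\\ \tilde\alpha_{2,-2}\end{pmatrix}, \tag{452}$$
and similarly for $V_{4,0,1}$
$$\begin{pmatrix}\left(\phi_o^{*-1/2}h_a\right)^{-3}\alpha_{1,1}\\ \left(\phi_o^{*-1/2}h_a\right)^{-2}\alpha_{1,0}\\ \left(\phi_o^{*-1/2}h_a\right)^{-1}\alpha_{1,-1}\end{pmatrix} = \begin{pmatrix}1 & 0 & 0\\ 2g_a & 1 & 0\\ g_a^2 & g_a & 1\end{pmatrix}\begin{pmatrix}\tilde\alpha_{1,1}\\ \tilde\alpha_{1,0}\\ \tilde\alpha_{1,-1}\end{pmatrix}, \tag{453}$$
and finally
$$\left(\phi_o^{*-1/2}h_a\right)^{-2}\alpha_{0,0} = \tilde\alpha_{0,0}. \tag{454}$$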
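The transformation of the aberration coefficients can easily be found from the transformation property of $\alpha_{j,m}$ using their definitions presented in Section VI.I. The aberration coefficients also vary when the aperture position is changed. Let us consider a system with the object position at $z_o$ and the aperture position at $z_a$. The rays are parameterized by the position in the object plane, $\mathbf q_1$, and by the position in the aperture plane, which plays the role of the generalized momentum $\mathbf p_1$. When the position of the aperture is changed to $z = z_{\tilde a}$, the rays are similarly parameterized by $\mathbf q_2 = \mathbf q_1$ and $\mathbf p_2$. The paraxial ray then reads
$$\mathbf q = s\,\mathbf q_1 + t\,\mathbf p_1 = \tilde s\,\mathbf q_1 + \tilde t\,\mathbf p_2; \tag{455}$$
hence,
$$\mathbf p_2 = \frac{s - \tilde s}{\tilde t}\,\mathbf q_1 + \frac{t}{\tilde t}\,\mathbf p_1, \tag{456}$$
which, using
$$\tilde t = \frac{t}{t(z_{\tilde a})}, \qquad \tilde s = s - \frac{s(z_{\tilde a})}{t(z_{\tilde a})}\,t, \tag{457}$$
can be written in the form
$$\mathbf p_2 = s(z_{\tilde a})\,\mathbf q_1 + t(z_{\tilde a})\,\mathbf p_1. \tag{458}$$
The transformation matrix between $(\mathbf q_1,\mathbf p_1)$ and $(\mathbf q_2,\mathbf p_2)$ then describes the extended canonical transformation
$$\begin{pmatrix}\mathbf q_2\\ \mathbf p_2\end{pmatrix} = \begin{pmatrix}\hat 1 & \hat 0\\ s_{\tilde a}\hat 1 & t_{\tilde a}\hat 1\end{pmatrix}\begin{pmatrix}\mathbf q_1\\ \mathbf p_1\end{pmatrix} = \exp\left(\frac{s_{\tilde a}}{2}{:}\mathbf q^2{:}\right)\begin{pmatrix}\hat 1 & \hat 0\\ \hat 0 & t_{\tilde a}\hat 1\end{pmatrix}\begin{pmatrix}\mathbf q_1\\ \mathbf p_1\end{pmatrix}. \tag{459}$$
Because the procedure is similar to the previous case, we present only the results:
$${}^nC_m^{j;l}(\mathbf q_2,\mathbf p_2) = \mathcal M_c\mathcal M_s\,{}^nC_m^{j;l} = \sum_{i=0}^{j+m}\binom{j+m}{i}\left(t_{\tilde a}\right)^{\frac n2 + (m-i)}s_{\tilde a}^i\;{}^nC_{m-i}^{j;l}(\mathbf q_1,\mathbf p_1), \tag{460}$$
and, analogously to Eq. (450), we can determine
$$a_k = \left(t_{\tilde a}\right)^{\frac n2 + k}\sum_{m=k}^{j}\tilde a_m\,s_{\tilde a}^{m-k}\binom{j+m}{m-k}. \tag{461}$$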
When the results are applied to the third-order axially symmetric aberrations, for $V_{4,0,2}$ we derive
$$\begin{pmatrix}(t_{\tilde a})^{-4}\alpha_{2,2}\\ (t_{\tilde a})^{-3}\alpha_{2,1}\\ (t_{\tilde a})^{-2}\alpha_{2,0}\\ (t_{\tilde a})^{-1}\alpha_{2,-1}\\ \alpha_{2,-2}\end{pmatrix} = \begin{pmatrix}1 & 0 & 0 & 0 & 0\\ 4s_{\tilde a} & 1 & 0 & 0 & 0\\ 6s_{\tilde a}^2 & 3s_{\tilde a} & 1 & 0 & 0\\ 4s_{\tilde a}^3 & 3s_{\tilde a}^2 & 2s_{\tilde a} & 1 & 0\\ s_{\tilde a}^4 & s_{\tilde a}^3 & s_{\tilde a}^2 & s_{\tilde a} & 1\end{pmatrix}\begin{pmatrix}\tilde\alpha_{2,2}\\ \tilde\alpha_{2,1}\\ \tilde\alpha_{2,0}\\ \tilde\alpha_{2,-1}\\ \tilde\alpha_{2,-2}\end{pmatrix}. \tag{462}$$
Similarly, for $V_{4,0,1}$ we derive
$$\begin{pmatrix}(t_{\tilde a})^{-3}\alpha_{1,1}\\ (t_{\tilde a})^{-2}\alpha_{1,0}\\ (t_{\tilde a})^{-1}\alpha_{1,-1}\end{pmatrix} = \begin{pmatrix}1 & 0 & 0\\ 2s_{\tilde a} & 1 & 0\\ s_{\tilde a}^2 & s_{\tilde a} & 1\end{pmatrix}\begin{pmatrix}\tilde\alpha_{1,1}\\ \tilde\alpha_{1,0}\\ \tilde\alpha_{1,-1}\end{pmatrix}, \tag{463}$$
and finally
$$(t_{\tilde a})^{-2}\alpha_{0,0} = \tilde\alpha_{0,0}. \tag{464}$$

VII. AXIAL SYMMETRIC ABERRATIONS OF THE FIFTH ORDER

Recently, the demand for aberration calculations has increased, since exact calculations of the field and ray tracing lead to better-designed electron optical systems. Moreover, the possibility of eliminating all primary aberrations (Rose, 1987) makes a knowledge of the fifth-order aberrations necessary. Therefore, we present a calculation of the fifth-order aberrations of a round magnetic lens based on the Lie algebraic method. The value of the spherical aberration will be compared with the value calculated by fitting ray-tracing results. The analytical calculation of the fifth-order aberrations is not an easy task because the symbolic calculation becomes very lengthy. We used the mathematical program Maple (Maple, 2006) for the symbolic calculation. The fifth-order aberrations are described by $g_4$ and $g_6$; hence, for the calculation, the symplectic structure of the axially symmetric polynomials up to the sixth order must be known. This is solved in the first two subsections. Next, we show the expansion of the interaction Hamiltonian up to the sixth order and the calculation of the aberrations using the Lie algebraic method.
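A. Axial Symmetric Polynomials of the Sixth Order

According to Eq. (368), the space of the sixth-order axially symmetric polynomials is formed by
$$V_{6,0}\cong V_{6,0,0}\oplus V_{6,0,1}\oplus V_{6,0,2}\oplus V_{6,0,3}. \tag{465}$$
The space $V_{6,0,3}$ is the descending series determined by the polynomial $(\mathbf p^2)^3$,
$${}^6C_3^{3;0} = \left(\mathbf p^2\right)^3, \tag{466a}$$
$${}^6C_2^{3;0} = \tfrac16\hat a^-\left(\mathbf p^2\right)^3 = \mathbf q\mathbf p\left(\mathbf p^2\right)^2, \tag{466b}$$
$${}^6C_1^{3;0} = \tfrac15\hat a^-\,{}^6C_2^{3;0} = \tfrac15\mathbf q^2\left(\mathbf p^2\right)^2 + \tfrac45(\mathbf q\mathbf p)^2\mathbf p^2, \tag{466c}$$
$${}^6C_0^{3;0} = \tfrac14\hat a^-\,{}^6C_1^{3;0} = \tfrac25(\mathbf q\mathbf p)^3 + \tfrac35\mathbf q^2\mathbf p^2\,\mathbf q\mathbf p, \tag{466d}$$
$${}^6C_{-1}^{3;0} = \tfrac13\hat a^-\,{}^6C_0^{3;0} = \tfrac15\mathbf p^2\left(\mathbf q^2\right)^2 + \tfrac45\mathbf q^2(\mathbf q\mathbf p)^2, \tag{466e}$$
$${}^6C_{-2}^{3;0} = \tfrac12\hat a^-\,{}^6C_{-1}^{3;0} = \left(\mathbf q^2\right)^2\mathbf q\mathbf p, \tag{466f}$$
$${}^6C_{-3}^{3;0} = \hat a^-\,{}^6C_{-2}^{3;0} = \left(\mathbf q^2\right)^3. \tag{466g}$$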
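The seven polynomials in Eq. (466) can be regenerated from $(\mathbf p^2)^3$ by repeated lowering. This is a minimal sketch with exact polynomial arithmetic in $(x, y, p_x, p_y)$, assuming $a^- = \tfrac12\mathbf q^2$ (the real form of $z\bar z$) and the bracket $[f,g] = \sum(\partial_q f\,\partial_p g - \partial_p f\,\partial_q g)$; the helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F

PAIRS = ((0, 2), (1, 3))  # exponent order (x, y, p_x, p_y)


def clean(d):
    return {k: v for k, v in d.items() if v != 0}


def add(*ps):
    out = defaultdict(F)
    for p in ps:
        for k, v in p.items():
            out[k] += v
    return clean(out)


def scale(p, s):
    return clean({k: v * s for k, v in p.items()})


def mul(*ps):
    out = {(0, 0, 0, 0): F(1)}
    for p in ps:
        acc = defaultdict(F)
        for k1, v1 in out.items():
            for k2, v2 in p.items():
                acc[tuple(a + b for a, b in zip(k1, k2))] += v1 * v2
        out = clean(acc)
    return out


def diff(p, i):
    out = {}
    for k, v in p.items():
        if k[i]:
            e = list(k)
            e[i] -= 1
            out[tuple(e)] = v * k[i]
    return out


def pb(f, g):
    return add(*(add(mul(diff(f, q), diff(g, p)), scale(mul(diff(f, p), diff(g, q)), -1))
                 for q, p in PAIRS))


P2 = {(0, 0, 2, 0): F(1), (0, 0, 0, 2): F(1)}  # p^2
Q2 = {(2, 0, 0, 0): F(1), (0, 2, 0, 0): F(1)}  # q^2
QP = {(1, 0, 1, 0): F(1), (0, 1, 0, 1): F(1)}  # q.p
A_MINUS = scale(Q2, F(1, 2))                   # assumed real form of a^- = z zbar


def lowering_chain(top, j):
    """Eq. (355a) applied repeatedly from m = j down to m = -j."""
    chain, m = [top], F(j)
    while m > -j:
        chain.append(scale(pb(A_MINUS, chain[-1]), 1 / (m + j)))
        m -= 1
    return chain


chain = lowering_chain(mul(P2, P2, P2), 3)
expected = [
    mul(P2, P2, P2),                                                        # (466a)
    mul(QP, P2, P2),                                                        # (466b)
    add(scale(mul(Q2, P2, P2), F(1, 5)), scale(mul(QP, QP, P2), F(4, 5))),  # (466c)
    add(scale(mul(QP, QP, QP), F(2, 5)), scale(mul(Q2, P2, QP), F(3, 5))),  # (466d)
    add(scale(mul(P2, Q2, Q2), F(1, 5)), scale(mul(Q2, QP, QP), F(4, 5))),  # (466e)
    mul(Q2, Q2, QP),                                                        # (466f)
    mul(Q2, Q2, Q2),                                                        # (466g)
]
assert chain == expected
```

Because every polynomial in $V_{6,0,3}$ is built from the invariants $\mathbf p^2$, $\mathbf q\mathbf p$, and $\mathbf q^2$, comparing against products of `P2`, `QP`, and `Q2` keeps the check readable.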
Using Eq. (365), we can easily compute the other subspaces from the axially symmetric polynomials of lower orders. The first is the anisotropic subspace of the axially symmetric polynomials of the sixth order,
$$V_{6,0,2} = L_z\,V_{4,0,2}, \tag{467}$$
that is,
$${}^6C_m^{2;0} = L_z\,{}^4C_m^{2;0}. \tag{468}$$
The next subspace is formed by the isotropic polynomials
$$V_{6,0,1} = L_z^2\,V_{2,0,1} = \left\{L_z^2\mathbf p^2,\; L_z^2\,\mathbf q\mathbf p,\; L_z^2\mathbf q^2\right\}. \tag{469}$$
The last one is formed by one anisotropic polynomial,
$$V_{6,0,0} = \left\{L_z^3\right\}. \tag{470}$$
B. Algebra of the Axial Symmetric Polynomials up to the Sixth Order

The subspace of the axially symmetric polynomials up to the sixth order takes the form
$$O_6 = h_2\oplus V_{4,0,0}\oplus V_{4,0,1}\oplus V_{4,0,2}\oplus V_{6,0,0}\oplus V_{6,0,1}\oplus V_{6,0,2}\oplus V_{6,0,3}, \tag{471}$$
where all the subspaces were explicitly described previously. We introduce the algebraic structure of $O_6$, which is necessary for the description of the fifth-order axially symmetric aberrations. We have shown that each subspace in the decomposition in Eq. (471) is irreducible with respect to the action of $h_2$. However, the space $O_6$ is not closed under the standard Poisson bracket because, e.g., the Poisson bracket $[{}^6C_0^{3;0}, {}^6C_2^{3;0}]$ is a polynomial of the tenth order. To avoid this difficulty, we modify the definition of the Lie bracket on the space $O_6$ as follows:
$$\left[{}^{n_1}C_{m_1}^{j_1;0}, {}^{n_2}C_{m_2}^{j_2;0}\right]_6 = \begin{cases}\left[{}^{n_1}C_{m_1}^{j_1;0}, {}^{n_2}C_{m_2}^{j_2;0}\right] & \text{if } n_1 + n_2 - 2 \le 6,\\ 0 & \text{otherwise.}\end{cases} \tag{472}$$
This definition ensures that Poisson brackets that would be of order higher than six vanish. This means that the Lie brackets of any two polynomials from $V_{6,0}$, or of one polynomial from $V_{4,0}$ and one from $V_{6,0}$, vanish. Because the action of $h_2$ was described previously, we need only describe the Lie bracket of any two polynomials from $V_{4,0}$. Because the polynomials are axially symmetric, the Lie brackets $[V_{4,0,0}, O_6]$ vanish. Now we calculate the Lie brackets $[V_{4,0,1}, V_{4,0,1}\oplus V_{4,0,2}]$:
$$\left[{}^4C_{m_1}^{1;0}, {}^4C_{m_2}^{j;0}\right] = L_z^{3-j}\left[{}^2C_{m_1}^{1;0}, {}^{2j}C_{m_2}^{j;0}\right] = \begin{cases}-2L_z^{3-j}\left[a^+, {}^{2j}C_{m_2}^{j;0}\right] = -2(m_2 - j)\,{}^6C_{m_2+1}^{j;0} & \text{if } m_1 = 1,\\ 2L_z^{3-j}\left[a_0, {}^{2j}C_{m_2}^{j;0}\right] = 2m_2\,{}^6C_{m_2}^{j;0} & \text{if } m_1 = 0,\\ 2L_z^{3-j}\left[a^-, {}^{2j}C_{m_2}^{j;0}\right] = 2(m_2 + j)\,{}^6C_{m_2-1}^{j;0} & \text{if } m_1 = -1.\end{cases} \tag{473}$$
The calculation of the Lie brackets $[V_{4,0,2}, V_{4,0,2}]$ is more complicated. We will use several properties to shed light on the algebraic structure. First, we describe the eigenvalues of the Lie bracket of two polynomials from $V_{4,0}$ with respect to $\hat a_0$, $\hat L_z$, $\hat N$, and $\hat a^2$. Because the Lie bracket consists of a multiplication and two derivatives, we can write for the number operator
$$\hat N\left[{}^4C_{m_1}^{j_1;0}, {}^4C_{m_2}^{j_2;0}\right] = 6\left[{}^4C_{m_1}^{j_1;0}, {}^4C_{m_2}^{j_2;0}\right]. \tag{474}$$
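The cases $m_1 = 0$ and $m_1 = -1$ of Eq. (473) involve only $a_0$ and $a^-$, so they can be checked with the same exact-arithmetic toolkit, together with the brackets quoted later in Eqs. (480a) to (480c). This is a minimal sketch, assuming the real forms $a_0 = \tfrac12\mathbf q\mathbf p$ and $a^- = \tfrac12\mathbf q^2$, the basis ${}^4C_m^{2;0}$ of Eq. (403a), ${}^4C_0^{1;0} = \mathbf q\mathbf p\,L_z$ and ${}^4C_{-1}^{1;0} = \mathbf q^2L_z$ from Eq. (403b), ${}^6C_m^{2;0} = L_z\,{}^4C_m^{2;0}$ from Eq. (468), and ${}^6C_1^{1;0} = L_z^2\mathbf p^2$; helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F

PAIRS = ((0, 2), (1, 3))  # exponent order (x, y, p_x, p_y)


def clean(d):
    return {k: v for k, v in d.items() if v != 0}


def add(*ps):
    out = defaultdict(F)
    for p in ps:
        for k, v in p.items():
            out[k] += v
    return clean(out)


def scale(p, s):
    return clean({k: v * s for k, v in p.items()})


def mul(*ps):
    out = {(0, 0, 0, 0): F(1)}
    for p in ps:
        acc = defaultdict(F)
        for k1, v1 in out.items():
            for k2, v2 in p.items():
                acc[tuple(a + b for a, b in zip(k1, k2))] += v1 * v2
        out = clean(acc)
    return out


def diff(p, i):
    out = {}
    for k, v in p.items():
        if k[i]:
            e = list(k)
            e[i] -= 1
            out[tuple(e)] = v * k[i]
    return out


def pb(f, g):
    return add(*(add(mul(diff(f, q), diff(g, p)), scale(mul(diff(f, p), diff(g, q)), -1))
                 for q, p in PAIRS))


P2 = {(0, 0, 2, 0): F(1), (0, 0, 0, 2): F(1)}
Q2 = {(2, 0, 0, 0): F(1), (0, 2, 0, 0): F(1)}
QP = {(1, 0, 1, 0): F(1), (0, 1, 0, 1): F(1)}
LZ = {(1, 0, 0, 1): F(1), (0, 1, 1, 0): F(-1)}  # L_z = x p_y - y p_x

C = {  # 4C_m^{2;0}, Eq. (403a)
    2: mul(P2, P2),
    1: mul(P2, QP),
    0: add(scale(mul(P2, Q2), F(1, 3)), scale(mul(QP, QP), F(2, 3))),
    -1: mul(Q2, QP),
    -2: mul(Q2, Q2),
}
for m in C:
    # Eq. (473), m1 = 0: [q.p L_z, 4C_m] = 2 m L_z 4C_m.
    assert pb(mul(QP, LZ), C[m]) == scale(mul(LZ, C[m]), 2 * m)
    # Eq. (473), m1 = -1: [q^2 L_z, 4C_m] = 2 (m + 2) L_z 4C_{m-1}, zero for m = -2.
    expected = scale(mul(LZ, C[m - 1]), 2 * (m + 2)) if m > -2 else {}
    assert pb(mul(Q2, LZ), C[m]) == expected

# Eqs. (480a)-(480c).
C6_1_1 = mul(LZ, LZ, P2)  # 6C_1^{1;0}
C6_3_1 = add(scale(mul(Q2, P2, P2), F(1, 5)), scale(mul(QP, QP, P2), F(4, 5)))  # Eq. (466c)
assert pb(C[2], C[1]) == scale(mul(P2, P2, P2), -4)
assert pb(C[2], C[0]) == scale(mul(P2, P2, QP), -8)
assert pb(C[2], C[-1]) == add(scale(C6_1_1, F(-8, 5)), scale(C6_3_1, -12))
```

The decomposition in the last assertion confirms the coefficients $c_1 = -\tfrac85$ and $c_3 = -12$ of Eq. (479) for $m_1 + m_2 = 1$.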
LIE ALGEBRAIC METHODS
$\hat L_z$ and $\hat a_0$ are Lie operators; hence,
$$\hat L_z[{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}] = [\hat L_z\,{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}] + [{}^4C^{j_1;0}_{m_1}, \hat L_z\,{}^4C^{j_2;0}_{m_2}] = 0, \qquad (475a)$$
$$\hat a_0[{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}] = [\hat a_0\,{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}] + [{}^4C^{j_1;0}_{m_1}, \hat a_0\,{}^4C^{j_2;0}_{m_2}] = (m_1 + m_2)[{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}]. \qquad (475b)$$
The first equation shows that the Lie bracket of any two axially symmetric polynomials is again an axially symmetric polynomial. Because the sixth-order polynomials satisfy $|m| \le j \le 3$, a consequence of the second equation is
$$[{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}] = 0 \quad \text{for } |m_1 + m_2| > 3. \qquad (476)$$
The situation is more complicated for $\hat a_2$ because it is not a Lie operator, and the Lie bracket $[{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}]$ need not be an eigenvector of $\hat a_2$; hence,
$$[{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}] = \sum_{j \in J} c_j\,{}^6C^{j;0}_{m_1+m_2}. \qquad (477)$$
However, we can at least reduce the set $J$. We have shown that the eigenvalue $m$ is connected with the eigenvalue $j$, i.e., $m \in \{-j, \ldots, j\}$; hence, the possible values of $j$ are constrained to $j \in \{|m_1 + m_2|, \ldots, 3\}$. Because a reflection in a plane that contains the optic axis is a canonical transformation, the Lie brackets satisfy
$$\Omega_0[{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}] = [\Omega_0\,{}^4C^{j_1;0}_{m_1}, \Omega_0\,{}^4C^{j_2;0}_{m_2}] = (-1)^{2-j_1}(-1)^{2-j_2}[{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}]. \qquad (478)$$
Thus the Lie bracket of two isotropic or of two anisotropic polynomials is an isotropic polynomial, whereas the bracket of an isotropic and an anisotropic polynomial is anisotropic. In our case, $j_1 = j_2 = 2$, both polynomials are isotropic, so only isotropic sixth-order polynomials, those with $(-1)^{3-j} = 1$, can appear; this restricts the possible values of $j$ in Eq. (477) to $j \in \{1, 3\}$. Summarizing the previous results, we find
$$[{}^4C^{2;0}_{m_1}, {}^4C^{2;0}_{m_2}] = \begin{cases} c_3\,{}^6C^{3;0}_{m_1+m_2}, & 1 < |m_1 + m_2| \le 3,\\ c_1\,{}^6C^{1;0}_{m_1+m_2} + c_3\,{}^6C^{3;0}_{m_1+m_2}, & |m_1 + m_2| < 2.\end{cases} \qquad (479)$$
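The selection rules that lead to Eq. (479) can be tabulated mechanically. The short Python sketch below is only illustrative. The function name `allowed_j` is ours, and the function encodes three facts stated above: the bracket has $\hat a_0$-eigenvalue $m_1 + m_2$ (Eq. 475b), the eigenvalues satisfy $|m| \le j \le 3$ on $O_6$, and only odd $j$ survive for $j_1 = j_2 = 2$ (Eq. 478).

```python
def allowed_j(m1, m2, j_max=3):
    """Candidate j values in Eq. (477) for the bracket [4C^{2;0}_{m1}, 4C^{2;0}_{m2}].

    Rules taken from the text: the result carries a0-eigenvalue m = m1 + m2,
    |m| <= j <= 3 on O6, and only odd j survive for j1 = j2 = 2.
    """
    m = m1 + m2
    if abs(m) > j_max:  # Eq. (476): the bracket vanishes
        return []
    return [j for j in range(abs(m), j_max + 1) if j % 2 == 1]


# Tabulate the selection rules for all distinct pairs m1 > m2 of V_{4,0,2} indices.
for m1 in range(2, -3, -1):
    for m2 in range(m1 - 1, -3, -1):
        print(f"[4C_{m1:+d}, 4C_{m2:+d}] -> 6C^j_{m1 + m2:+d}, j in {allowed_j(m1, m2)}")
```

The output matches the pattern of Eq. (480). For example, $[{}^4C^{2;0}_2, {}^4C^{2;0}_1]$ can only produce $j = 3$, while $[{}^4C^{2;0}_2, {}^4C^{2;0}_{-1}]$ can produce both $j = 1$ and $j = 3$.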
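However, the concrete values of the coefficients $c_1$ and $c_3$ must be determined by a direct calculation of the Poisson brackets. We present only the results:
$$[{}^4C^{2;0}_2, {}^4C^{2;0}_1] = -4(\mathbf p^2)^3 = -4\,{}^6C^{3;0}_3, \qquad (480a)$$
$$[{}^4C^{2;0}_2, {}^4C^{2;0}_0] = -8(\mathbf p^2)^2(\mathbf q\cdot\mathbf p) = -8\,{}^6C^{3;0}_2, \qquad (480b)$$
$$[{}^4C^{2;0}_2, {}^4C^{2;0}_{-1}] = -8\mathbf p^2(\mathbf q\cdot\mathbf p)^2 - 4(\mathbf p^2)^2\mathbf q^2 = -\tfrac85\,{}^6C^{1;0}_1 - 12\,{}^6C^{3;0}_1, \qquad (480c)$$
$$[{}^4C^{2;0}_2, {}^4C^{2;0}_{-2}] = -16\,\mathbf p^2\mathbf q^2(\mathbf q\cdot\mathbf p) = -\tfrac{32}5\,{}^6C^{1;0}_0 - 16\,{}^6C^{3;0}_0, \qquad (480d)$$
$$[{}^4C^{2;0}_1, {}^4C^{2;0}_0] = -4\mathbf p^2(\mathbf q\cdot\mathbf p)^2 = \tfrac45\,{}^6C^{1;0}_1 - 4\,{}^6C^{3;0}_1, \qquad (480e)$$
$$[{}^4C^{2;0}_1, {}^4C^{2;0}_{-1}] = -4(\mathbf q\cdot\mathbf p)^3 - 4\mathbf q^2\mathbf p^2(\mathbf q\cdot\mathbf p) = \tfrac45\,{}^6C^{1;0}_0 - 8\,{}^6C^{3;0}_0, \qquad (480f)$$
$$[{}^4C^{2;0}_1, {}^4C^{2;0}_{-2}] = -8(\mathbf q\cdot\mathbf p)^2\mathbf q^2 - 4\mathbf p^2(\mathbf q^2)^2 = -\tfrac85\,{}^6C^{1;0}_{-1} - 12\,{}^6C^{3;0}_{-1}, \qquad (480g)$$
$$[{}^4C^{2;0}_0, {}^4C^{2;0}_{-1}] = -4\mathbf q^2(\mathbf q\cdot\mathbf p)^2 = \tfrac45\,{}^6C^{1;0}_{-1} - 4\,{}^6C^{3;0}_{-1}, \qquad (480h)$$
$$[{}^4C^{2;0}_0, {}^4C^{2;0}_{-2}] = -8(\mathbf q^2)^2(\mathbf q\cdot\mathbf p) = -8\,{}^6C^{3;0}_{-2}, \qquad (480i)$$
$$[{}^4C^{2;0}_{-1}, {}^4C^{2;0}_{-2}] = -4(\mathbf q^2)^3 = -4\,{}^6C^{3;0}_{-3}. \qquad (480j)$$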
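The relations (480a)–(480j) can be spot-checked with direct polynomial algebra. The pure-Python sketch below makes several assumptions:

- It uses the bracket convention $[f,g] = \sum_k(\partial f/\partial q_k\,\partial g/\partial p_k - \partial f/\partial p_k\,\partial g/\partial q_k)$, which is the sign that reproduces Eq. (480a).
- It identifies ${}^4C^{2;0}_2 = (\mathbf p^2)^2$, ${}^4C^{2;0}_1 = \mathbf p^2(\mathbf q\cdot\mathbf p)$, ${}^4C^{2;0}_{-1} = \mathbf q^2(\mathbf q\cdot\mathbf p)$ and ${}^4C^{2;0}_{-2} = (\mathbf q^2)^2$, as read off from those equations.
- The dictionary representation and all helper names are hypothetical.

The sketch also implements the truncated bracket of Eq. (472).

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical representation: a polynomial in (x, y, px, py) is a dict mapping
# exponent tuples (ex, ey, epx, epy) to exact Fraction coefficients.
Q_IDX, P_IDX = (0, 1), (2, 3)


def add(*polys):
    out = defaultdict(Fraction)
    for poly in polys:
        for mono, c in poly.items():
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}


def scale(poly, k):
    return {m: c * k for m, c in poly.items() if c * k != 0}


def mul(a, b):
    out = defaultdict(Fraction)
    for ma, ca in a.items():
        for mb, cb in b.items():
            out[tuple(i + j for i, j in zip(ma, mb))] += ca * cb
    return {m: c for m, c in out.items() if c != 0}


def diff(poly, i):
    """Partial derivative with respect to variable number i."""
    out = {}
    for mono, c in poly.items():
        if mono[i]:
            m = list(mono)
            m[i] -= 1
            out[tuple(m)] = c * mono[i]
    return out


def bracket(f, g):
    """[f, g] = sum_k (df/dq_k dg/dp_k - df/dp_k dg/dq_k)."""
    terms = []
    for qk, pk in zip(Q_IDX, P_IDX):
        terms.append(mul(diff(f, qk), diff(g, pk)))
        terms.append(scale(mul(diff(f, pk), diff(g, qk)), -1))
    return add(*terms)


def degree(poly):
    return max((sum(m) for m in poly), default=0)


def bracket6(f, g):
    """Truncated bracket of Eq. (472): zero when n1 + n2 - 2 > 6."""
    if degree(f) + degree(g) - 2 > 6:
        return {}
    return bracket(f, g)


def var(i):
    mono = [0, 0, 0, 0]
    mono[i] = 1
    return {tuple(mono): Fraction(1)}


x, y, px, py = (var(i) for i in range(4))
q2 = add(mul(x, x), mul(y, y))
p2 = add(mul(px, px), mul(py, py))
qp = add(mul(x, px), mul(y, py))

# Assumed monomial forms of the V_{4,0,2} basis elements appearing in Eq. (480).
C2, C1 = mul(p2, p2), mul(p2, qp)
Cm1, Cm2 = mul(q2, qp), mul(q2, q2)

print(bracket(C2, C1) == scale(mul(p2, mul(p2, p2)), -4))          # Eq. (480a)
print(bracket(C1, Cm1) == add(scale(mul(qp, mul(qp, qp)), -4),
                              scale(mul(q2, mul(p2, qp)), -4)))     # Eq. (480f)
print(bracket6(mul(p2, mul(p2, p2)), mul(q2, mul(q2, q2))) == {})   # Eq. (472)
```

Each `print` should show `True`. The same helpers can be used to check the remaining relations or the double brackets that appear in the aberration formulas.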
These commutation relations completely describe the algebraic structure of $O_6$.

C. Interaction Hamiltonian of a Round Magnetic Lens

The vector potential of an axially symmetric magnetic field reads (Hawkes and Kasper, 1989)
$$A_x = -\tfrac12\,y\,\Pi(z, \mathbf q^2), \qquad (481)$$
$$A_y = \tfrac12\,x\,\Pi(z, \mathbf q^2), \qquad (482)$$
$$A_z = 0, \qquad (483)$$
where $\Pi$ takes the form
$$\Pi = B(z) - \tfrac18\,\mathbf q^2 B''(z) + \tfrac1{192}(\mathbf q^2)^2 B^{(4)}(z) + \cdots. \qquad (484)$$
When the potential is substituted into Eq. (29) with $p_\tau = 0$ and $\Phi^* = \phi$, the Hamiltonian reads
, 1 e2 ∗ 2 − eL Π z, q2 − e2 q2 Π 2 z, q2 . H = φ − p (485) z 4 η2
In rotating coordinates (88), the expansion of the Hamiltonian takes the form
$$H_2 = \tfrac12\phi^{*-1/2}\mathbf p^2 + \tfrac12\phi^{*1/2}\alpha^2\mathbf q^2, \qquad (486)$$
2 1 ∗− 3 2 2 1 ∗− 1 2 2 2 1 ∗ 1 4 φ 2 p + φ 2 α q p + φ 2 α − αα q2 8 4 8 1 ∗− 1 2 2 1 ∗−1 2 1 3 1 α − α Lz q2 , + φ 2 α Lz + φ αp Lz + (487) 2 2 2 8 2 2 3 1 1 ∗− 5 2 3 3 1 H6 = φ 2 p + φ ∗− 2 α 2 q2 p2 + φ ∗− 2 3α 4 − αα p2 q2 16 16 16 2 3 1 ∗1 6 1 1 3 2 (4) q + φ 2 α − α α + α − αα 16 8 12 1 ∗− 1 1 2 2 3 2 ∗− 3 2 2 4 + φ 2 3α − αα Lz q + α φ 2 Lz p 4 2 4 3 ∗−2 2 2 1 ∗−1 1 2 2 3 3α − α q p Lz + αφ p Lz + φ 8 4 4 1 1 3 1 2 3α 5 + α (4) − α 2 α Lz q2 + φ ∗−1 α 3 L3z + (488) 8 24 2 2 H4 =
where
$$\alpha(z) = \frac{\eta}{2\phi^{*1/2}}\,B(z). \qquad (489)$$
In the calculation we use the parameterization by the positions in the object and aperture planes; thus the paraxial approximation takes the form
$$\mathbf q(z) = s(z)\mathbf q_o + t(z)\mathbf q_a, \qquad (490)$$
$$\mathbf p(z) = \phi^{*1/2}s'(z)\mathbf q_o + \phi^{*1/2}t'(z)\mathbf q_a. \qquad (491)$$
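The sketch below computes the two paraxial rays of Eqs. (490) and (491) numerically, which makes the object/aperture parameterization concrete. It rests on the following assumptions:

- The lens is purely magnetic, so $\phi^*$ is constant. Hamilton's equations for $H_2$ in Eq. (486) then reduce to $\mathbf q'' = -\alpha^2\mathbf q$, consistent with $\mathbf p = \phi^{*1/2}\mathbf q'$ in Eq. (491).
- The rays satisfy $s(z_o) = 1$, $s(z_a) = 0$, $t(z_o) = 0$ and $t(z_a) = 1$, which the parameterization implies.
- The bell-shaped $\alpha(z)$ is a hypothetical profile, not the field of Figure 4. Only the plane positions $z_o = -0.04$ m, $z_a = -0.02$ m and $z_i = 0.01$ m are borrowed from the test lens of Section E.

```python
def alpha(z, k=50.0, a=0.005):
    """Hypothetical bell-shaped profile alpha(z) = k / (1 + (z/a)^2), in 1/m."""
    return k / (1.0 + (z / a) ** 2)


def rhs(z, q, dq):
    """First-order form of the paraxial equation q'' = -alpha(z)^2 q."""
    return dq, -alpha(z) ** 2 * q


def solve_paraxial(z0, z1, q0, dq0, n):
    """Classical RK4 from z0 to z1 in n steps; returns lists of z, q and q'."""
    h = (z1 - z0) / n
    zs, qs, dqs = [z0], [q0], [dq0]
    q, dq = q0, dq0
    for i in range(n):
        z = z0 + i * h
        k1 = rhs(z, q, dq)
        k2 = rhs(z + h / 2, q + h / 2 * k1[0], dq + h / 2 * k1[1])
        k3 = rhs(z + h / 2, q + h / 2 * k2[0], dq + h / 2 * k2[1])
        k4 = rhs(z + h, q + h * k3[0], dq + h * k3[1])
        q += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dq += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        zs.append(z0 + (i + 1) * h)
        qs.append(q)
        dqs.append(dq)
    return zs, qs, dqs


zo, za, zi = -0.04, -0.02, 0.01
n = 5000
ia = round((za - zo) / (zi - zo) * n)  # grid index of the aperture plane

zs, u, du = solve_paraxial(zo, zi, 1.0, 0.0, n)  # u(zo) = 1, u'(zo) = 0
_, v, dv = solve_paraxial(zo, zi, 0.0, 1.0, n)   # v(zo) = 0, v'(zo) = 1

# Combine the fundamental rays so that s(zo)=1, s(za)=0 and t(zo)=0, t(za)=1.
r = u[ia] / v[ia]
s = [ui - r * vi for ui, vi in zip(u, v)]
ds = [dui - r * dvi for dui, dvi in zip(du, dv)]
t = [vi / v[ia] for vi in v]
dt = [dvi / v[ia] for dvi in dv]

# For q'' = -alpha^2 q the Wronskian s t' - s' t is constant along the axis.
wronskian = [si * dti - dsi * ti for si, dsi, ti, dti in zip(s, ds, t, dt)]
print(zs[ia], s[ia], t[ia], wronskian[0], wronskian[-1])
```

A drift of the Wronskian between $z_o$ and $z_i$ gives a simple accuracy check on the step size.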
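The interaction coordinates are then in the form of Eq. (226), and the interaction Hamiltonian can be calculated using Eq. (227):
$$H^{\mathrm{int}} = \frac{\eta}{e\,\phi_o^{*1/2}\,W s_o}\Big[H_4\big(\mathbf w(\tilde{\mathbf w}, z), z\big) + H_6\big(\mathbf w(\tilde{\mathbf w}, z), z\big) + \cdots\Big]. \qquad (492)$$
After a lengthy but straightforward calculation, $H_4^{\mathrm{int}}$ and $H_6^{\mathrm{int}}$ can be written in the form
$$H_4^{\mathrm{int}} = \sum_{j=0}^{2}\sum_{m=-j}^{j} a_{j,m}\,{}^4C^{j;0}_m, \qquad H_6^{\mathrm{int}} = \sum_{j=0}^{3}\sum_{m=-j}^{j} b_{j,m}\,{}^6C^{j;0}_m. \qquad (493)$$
The form of the coefficients $a_{j,m}$ and $b_{j,m}$ is shown in Appendix B.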
RADLIČKA
D. Calculation of g4 and g6

The Lie transformation that describes the aberrations of axially symmetric systems up to the fifth order is determined by the polynomials $g_4$ and $g_6$,
$$M_1\,e^{:g_6:}e^{:g_4:}. \qquad (494)$$
The axial symmetry determines the form of $g_4$ and $g_6$; in general they are written as
$$g_4 = \sum_{j=0}^{2}\sum_{m=-j}^{j}\alpha_{j,m}\,{}^4C^{j;0}_m \qquad (495a)$$
and
$$g_6 = \sum_{j=0}^{3}\sum_{m=-j}^{j}\beta_{j,m}\,{}^6C^{j;0}_m. \qquad (495b)$$
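Now the coefficients $\alpha_{j,m}$ and $\beta_{j,m}$ must be determined. Because
$$g_4(z_i) = -\int_{z_o}^{z_i} H_4^{\mathrm{int}}(\mathbf q_o, \mathbf q_a, z)\,dz = -\sum_{j=0}^{2}\sum_{m=-j}^{j}\left(\int_{z_o}^{z_i} a_{j,m}(z)\,dz\right){}^4C^{j;0}_m, \qquad (496)$$
using Eq. (495a) we can easily show that
$$\alpha_{j,m} = -\int_{z_o}^{z_i} a_{j,m}(z)\,dz. \qquad (497)$$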
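The situation for $\beta_{j,m}$ is more complicated because the term $g_6$ is calculated from
$$\begin{aligned} g_6 &= -\int_{z_o}^{z_i} H_6^{\mathrm{int}}(\mathbf q_o, \mathbf q_a, z)\,dz + \frac12\int_{z_o}^{z_i}dz\int_{z_o}^{z}dz_1\,\big[H_4^{\mathrm{int}}(\mathbf q_o, \mathbf q_a, z_1), H_4^{\mathrm{int}}(\mathbf q_o, \mathbf q_a, z)\big]\\ &= -\sum_{j=0}^{3}\sum_{m=-j}^{j}\left(\int_{z_o}^{z_i} b_{j,m}(z)\,dz\right){}^6C^{j;0}_m\\ &\quad + \frac12\sum_{j_1,j_2=0}^{2}\sum_{m_1=-j_1}^{j_1}\sum_{m_2=-j_2}^{j_2}\int_{z_o}^{z_i}dz\int_{z_o}^{z}dz_1\,a_{j_1,m_1}(z_1)\,a_{j_2,m_2}(z)\,[{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}]. \end{aligned} \qquad (498)$$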
The Lie brackets $[{}^4C^{j_1;0}_{m_1}, {}^4C^{j_2;0}_{m_2}]$ make the calculation difficult. We therefore calculate the contribution of each Seidel weight independently. First, we describe the contribution with Seidel weight $m = 3$; we show later that this polynomial describes the fifth-order spherical aberration. Using the algebraic structure of $O_6$ described previously, we find that the only bracket contributing to ${}^6C^{3;0}_3$ is $[{}^4C^{2;0}_2, {}^4C^{2;0}_1] = -4\,{}^6C^{3;0}_3$; hence,
$$\beta_{3,3} = -\int_{z_o}^{z_i} b_{3,3}(z)\,dz - 2\int_{z_o}^{z_i}dz\int_{z_o}^{z}dz_1\,\big[a_{2,2}(z_1)a_{2,1}(z) - a_{2,2}(z)a_{2,1}(z_1)\big]. \qquad (499)$$
To simplify the notation, let us define the bilinear operator
$$I(f, g) = \int_{z_o}^{z_i}dz\int_{z_o}^{z}dz_1\,\big[f(z_1)g(z) - f(z)g(z_1)\big]. \qquad (500)$$
Eq. (499) now reads
$$\beta_{3,3} = -\int_{z_o}^{z_i} b_{3,3}(z)\,dz - 2I(a_{2,2}, a_{2,1}). \qquad (501)$$
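Numerically, Eqs. (497) and (500)–(501) need only running integrals. The inner integral in Eq. (500) runs from $z_o$ to $z$, so $I(f,g) = \int_{z_o}^{z_i}[F(z)g(z) - f(z)G(z)]\,dz$, where $F$ and $G$ are the running integrals of $f$ and $g$. The pure-Python sketch below makes two assumptions:

- It uses the trapezoidal rule, which is our choice and not necessarily the author's.
- It uses hypothetical Gaussian coefficient profiles in place of the Appendix B expressions.

```python
import math


def cumulative_trapezoid(values, z):
    """Running integral: out[k] approximates the integral of values from z[0] to z[k]."""
    out = [0.0]
    for k in range(1, len(z)):
        out.append(out[-1] + 0.5 * (values[k] + values[k - 1]) * (z[k] - z[k - 1]))
    return out


def integral(values, z):
    return cumulative_trapezoid(values, z)[-1]


def bilinear_I(f, g, z):
    """I(f, g) of Eq. (500) for f, g sampled on the grid z (z[0] = zo, z[-1] = zi)."""
    F = cumulative_trapezoid(f, z)  # F(z) = int_{zo}^{z} f(z1) dz1
    G = cumulative_trapezoid(g, z)
    return integral([F[k] * g[k] - f[k] * G[k] for k in range(len(z))], z)


# Hypothetical sampled coefficient functions (not the Appendix B expressions).
zo, zi, n = -0.04, 0.01, 2001
z = [zo + (zi - zo) * k / (n - 1) for k in range(n)]
a22 = [math.exp(-(zz / 0.01) ** 2) for zz in z]
a21 = [zz * math.exp(-(zz / 0.01) ** 2) for zz in z]
b33 = [0.5 * math.exp(-(zz / 0.01) ** 2) for zz in z]

alpha22 = -integral(a22, z)                               # Eq. (497)
beta33 = -integral(b33, z) - 2 * bilinear_I(a22, a21, z)  # Eq. (501)
print(alpha22, beta33)
```

By construction $I(f,f) = 0$ and $I(f,g) = -I(g,f)$. This is why only antisymmetric combinations of the $a_{j,m}$ appear in Eqs. (499)–(515).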
Next, we find the part with Seidel weight $m = -2$, which describes the distortion; that is, we must find $\beta_{3,-2}$ and $\beta_{2,-2}$, the coefficients of ${}^6C^{3;0}_{-2}$ and ${}^6C^{2;0}_{-2}$, respectively. We use Eq. (498) and the commutation relations in the space $O_6$. The contribution of the Poisson bracket in Eq. (498) to ${}^6C^{3;0}_{-2}$ is produced by
$$[{}^4C^{2;0}_0, {}^4C^{2;0}_{-2}] = -[{}^4C^{2;0}_{-2}, {}^4C^{2;0}_0] = -8(\mathbf q_o^2)^2(\mathbf q_o\cdot\mathbf q_a) = -8\,{}^6C^{3;0}_{-2};$$
hence,
$$\beta_{3,-2} = -\int_{z_o}^{z_i} b_{3,-2}(z)\,dz + 4I(a_{2,-2}, a_{2,0}).$$
The contribution to ${}^6C^{2;0}_{-2}$ is given by the Lie brackets
$$[{}^4C^{2;0}_{-2}, {}^4C^{1;0}_0] = -[{}^4C^{1;0}_0, {}^4C^{2;0}_{-2}] = 4\,{}^6C^{2;0}_{-2}, \qquad [{}^4C^{2;0}_{-1}, {}^4C^{1;0}_{-1}] = -[{}^4C^{1;0}_{-1}, {}^4C^{2;0}_{-1}] = -2\,{}^6C^{2;0}_{-2}, \qquad (502)$$
and $\beta_{2,-2}$ then takes the form
$$\beta_{2,-2} = -\int_{z_o}^{z_i} b_{2,-2}(z)\,dz + 2I(a_{2,-2}, a_{1,0}) - I(a_{2,-1}, a_{1,-1}). \qquad (503)$$
Using a similar approach, we find for $m = 2$
$$\beta_{3,2} = -\int_{z_o}^{z_i} b_{3,2}(z)\,dz - 4I(a_{2,2}, a_{2,0}), \qquad (504)$$
$$\beta_{2,2} = -\int_{z_o}^{z_i} b_{2,2}(z)\,dz + 2I(a_{1,0}, a_{2,2}) + I(a_{2,1}, a_{1,1}), \qquad (505)$$
for $m = 1$
$$\beta_{3,1} = -\int_{z_o}^{z_i} b_{3,1}(z)\,dz + 6I(a_{2,-1}, a_{2,2}) + 2I(a_{2,0}, a_{2,1}), \qquad (506)$$
$$\beta_{1,1} = -\int_{z_o}^{z_i} b_{1,1}(z)\,dz + \tfrac45 I(a_{2,-1}, a_{2,2}) + I(a_{1,0}, a_{1,1}) + \tfrac25 I(a_{2,1}, a_{2,0}), \qquad (507)$$
$$\beta_{2,1} = -\int_{z_o}^{z_i} b_{2,1}(z)\,dz + 4I(a_{1,-1}, a_{2,2}) + 2I(a_{2,0}, a_{1,1}) + I(a_{1,0}, a_{2,1}), \qquad (508)$$
for $m = 0$
$$\beta_{3,0} = -\int_{z_o}^{z_i} b_{3,0}(z)\,dz + 4I(a_{2,-1}, a_{2,1}) + 8I(a_{2,-2}, a_{2,2}), \qquad (509)$$
$$\beta_{1,0} = -\int_{z_o}^{z_i} b_{1,0}(z)\,dz + \tfrac65 I(a_{2,-1}, a_{2,1}) + \tfrac{32}5 I(a_{2,-2}, a_{2,2}) + 2I(a_{1,-1}, a_{1,1}), \qquad (510)$$
$$\beta_{2,0} = -\int_{z_o}^{z_i} b_{2,0}(z)\,dz + 3I(a_{1,-1}, a_{2,1}) + 3I(a_{2,-1}, a_{1,1}), \qquad (511)$$
$$\beta_{0,0} = -\int_{z_o}^{z_i} b_{0,0}(z)\,dz + \tfrac25 I(a_{1,-1}, a_{2,1}) + \tfrac25 I(a_{2,-1}, a_{1,1}), \qquad (512)$$
and for $m = -1$
$$\beta_{3,-1} = -\int_{z_o}^{z_i} b_{3,-1}(z)\,dz + 6I(a_{2,-2}, a_{2,1}) + 2I(a_{2,-1}, a_{2,0}), \qquad (513)$$
$$\beta_{1,-1} = -\int_{z_o}^{z_i} b_{1,-1}(z)\,dz + \tfrac45 I(a_{2,-2}, a_{2,1}) + I(a_{1,-1}, a_{1,0}) + \tfrac25 I(a_{2,0}, a_{2,-1}), \qquad (514)$$
$$\beta_{2,-1} = -\int_{z_o}^{z_i} b_{2,-1}(z)\,dz + 2I(a_{1,-1}, a_{2,0}) + 4I(a_{2,-2}, a_{1,1}) + I(a_{2,-1}, a_{1,0}). \qquad (515)$$
The formulas are completed by substituting the terms $a_{j,m}$ and $b_{j,m}$, which can be found in Appendix B.

E. Spherical Aberration of the Fifth Order

This subsection presents the calculation of the fifth-order spherical aberration. The ray is described by the Lie transformation
$$\mathbf q(z) = M_1\,e^{:g_6(\mathbf q_o, \mathbf p_o, z):}e^{:g_4(\mathbf q_o, \mathbf p_o, z):}\mathbf q_o. \qquad (516)$$
In the image plane and in the rotating coordinates, this can be expanded into
$$\mathbf q_i = M\mathbf q_o + M[g_4(z_i), \mathbf q_o] + M[g_6(z_i), \mathbf q_o] + \tfrac12 M[g_4(z_i), [g_4(z_i), \mathbf q_o]] + \cdots. \qquad (517)$$
The first term describes the paraxial approximation, the second determines the third-order aberrations, and the last two terms cover the fifth-order aberrations. The paraxial approximation and the third-order aberrations were described previously; here we describe only the fifth-order aberrations. The spherical aberration is determined by the polynomials with the highest Seidel weight ($m = 5/2$ for fifth-order polynomials). We treat the effects of $g_6$ and $g_4$ separately. The effect of $g_6$ consists of a sum of Lie brackets of the form
$$[{}^6C^{j;0}_m, \mathbf q_o] = \left[{}^6C^{j;0}_m, \begin{pmatrix}{}^1C^{1/2;1}_{-1/2}\\ {}^1S^{1/2;1}_{-1/2}\end{pmatrix}\right], \qquad (518)$$
where we used $x = {}^1C^{1/2;1}_{-1/2}$ and $y = {}^1S^{1/2;1}_{-1/2}$. If this term describes the spherical aberration, it must have the Seidel weight 5/2, the highest weight among fifth-order polynomials. On the other hand, because $\hat a_0$ is a Lie operator, the Seidel weight of the Lie bracket in Eq. (518) is $m - 1/2$; thus $m = 3$. Hence, the effect of $g_6$ on the spherical aberration is represented by
$$M\sum_j\beta_{j,3}[{}^6C^{j;0}_3, \mathbf q_o] = M\beta_{3,3}[(\mathbf q_a^2)^3, \mathbf q_o] = -6M\beta_{3,3}(\mathbf q_a^2)^2\mathbf q_a. \qquad (519)$$
In the case of $g_4$, the contribution consists of a sum of double Lie brackets of the form
$$[{}^4C^{j_1;0}_{m_1}, [{}^4C^{j_2;0}_{m_2}, \mathbf q_o]] = \left[{}^4C^{j_1;0}_{m_1}, \left[{}^4C^{j_2;0}_{m_2}, \begin{pmatrix}{}^1C^{1/2;1}_{-1/2}\\ {}^1S^{1/2;1}_{-1/2}\end{pmatrix}\right]\right], \qquad (520)$$
which has the Seidel weight $m_1 + m_2 - 1/2$. Because the Seidel weight must be 5/2, we easily obtain the constraint $m_1 + m_2 = 3$. Hence, the contribution of $g_4$ to the fifth-order spherical aberration takes the form
$$\frac12 M\sum_{m_1+m_2=3}\sum_{j_1,j_2}\alpha_{j_1,m_1}\alpha_{j_2,m_2}[{}^4C^{j_1;0}_{m_1}, [{}^4C^{j_2;0}_{m_2}, \mathbf q_o]], \qquad (521)$$
which can be evaluated as
$$\begin{aligned} &\tfrac12 M\alpha_{2,2}\alpha_{2,1}\big([{}^4C^{2;0}_2, [{}^4C^{2;0}_1, \mathbf q_o]] + [{}^4C^{2;0}_1, [{}^4C^{2;0}_2, \mathbf q_o]]\big)\\ &+ \tfrac12 M\alpha_{2,2}\alpha_{1,1}\big([{}^4C^{2;0}_2, [{}^4C^{1;0}_1, \mathbf q_o]] + [{}^4C^{1;0}_1, [{}^4C^{2;0}_2, \mathbf q_o]]\big). \end{aligned} \qquad (522)$$
The first row represents the isotropic spherical aberration and the second the anisotropic spherical aberration. After a short calculation, the first row vanishes and the contribution reduces to
$$-4M\alpha_{2,2}\alpha_{1,1}(\mathbf q_a^2)^2\begin{pmatrix} y_a\\ -x_a\end{pmatrix}. \qquad (523)$$
The fifth-order spherical aberration then becomes
$$-6M\beta_{3,3}(\mathbf q_a^2)^2\mathbf q_a - 4M\alpha_{2,2}\alpha_{1,1}(\mathbf q_a^2)^2\begin{pmatrix} y_a\\ -x_a\end{pmatrix}. \qquad (524)$$
Summarizing, the coefficient of the isotropic spherical aberration reads
$$C_5 = -6\beta_{3,3} \qquad (525)$$
and that of the anisotropic one
$$c_5 = -4\alpha_{2,2}\alpha_{1,1} = -C_3k_3. \qquad (526)$$
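Both coefficients can also be recovered from ray tracing, as described next for the lens of Figure 4. Fitting the image-plane coordinates to the odd power series of Eqs. (527) and (528) is a linear least-squares problem. The pure-Python sketch below makes two assumptions:

- The synthetic data are generated from hypothetical coefficients and a hypothetical magnification, not from the values in Table 1.
- $x_a$ is rescaled to improve the conditioning of the normal equations.

In practice the data would come from ray tracing.

```python
def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_powers(xa, w, powers, scale):
    """Least-squares fit w ~ sum_k C_k * xa**k; xa is rescaled by `scale` for conditioning."""
    rows = [[(v / scale) ** k for k in powers] for v in xa]
    n = len(powers)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atw = [sum(r[i] * wi for r, wi in zip(rows, w)) for i in range(n)]
    coeffs = solve(ata, atw)
    return [c / scale ** k for c, k in zip(coeffs, powers)]


# Synthetic "ray-tracing" data with hypothetical values (not the lens of Figure 4).
M = -0.5                                # magnification
C3, C5, c5 = 1.5e6, 3.0e11, 7.5e10      # m^-2, m^-4, m^-4
xa = [k * 1e-5 for k in range(1, 101)]  # aperture positions up to 1 mm, ya = 0
xi = [M * (C3 * v**3 + C5 * v**5) for v in xa]  # Eq. (527), truncated
yi = [-M * c5 * v**5 for v in xa]               # Eq. (528), truncated

C3_fit, C5_fit, C7_fit = fit_powers(xa, [v / M for v in xi], [3, 5, 7], 1e-3)
c5_fit, c7_fit = fit_powers(xa, [-v / M for v in yi], [5, 7], 1e-3)
print(C3_fit, C5_fit, c5_fit)
```

Including the next power ($C_7$, $c_7$) in the fit keeps higher-order contributions from leaking into $C_5$ and $c_5$ when real ray-tracing data are used.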
The results were tested on the simple magnetic lens shown in Figure 4. The field was calculated in the program EOD (Electronic Optic Design) (Lencová and Zlámal, 2006). We used a regular mesh to obtain the axial flux density at equally spaced points. The derivatives of the axial flux density up to fourth order were calculated using Gaussian wavelets (Berz, 1999). The object plane z = −0.04 m is imaged onto the image plane z = 0.01 m, and the aperture plane was chosen to be located at z = −0.02 m. The coefficients of the third- and fifth-order spherical aberration were calculated from the aberration integrals derived above. The spherical aberration coefficients can also be calculated easily by fitting ray-tracing results. Let us assume that all rays start from the axial object point with $y_a = 0$. For the ray intersections with the image plane we then have
$$x_i = M(C_3x_a^3 + C_5x_a^5 + C_7x_a^7 + \cdots), \qquad (527)$$
$$y_i = -M(c_5x_a^5 + c_7x_a^7 + \cdots). \qquad (528)$$
The aberration coefficients are obtained by fitting the ray-tracing results to these expressions. Table 1 compares them with the values computed from the aberration integrals. The two methods agree to within about 2% for C5 and to better than 0.1% for C3 and c5. We showed that the spherical aberration also has an anisotropic part, which is a surprising property of fifth-order aberrations. It shows that finding the general structure of the fifth-order aberrations of axially symmetric systems is an important and difficult task. We will proceed for each Seidel weight independently, using a procedure similar to that used above.

F. Distortion: Seidel Weight −5/2

The distortion depends only on the position in the object plane and is hence described by the polynomials that do not depend on the aperture position.
FIGURE 4. Testing a round magnetic lens and the field of the lens on the axis.
TABLE 1
Coefficient    Aberration integral    Fitting
C3             1.532 · 10^6 m^−2      1.532 · 10^6 m^−2
C5             2.889 · 10^11 m^−4     2.831 · 10^11 m^−4
c5             7.649 · 10^10 m^−4     7.650 · 10^10 m^−4
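For fifth-order polynomials, the distortion terms have the Seidel weight −5/2. The effect of $g_6$ can be expressed as
$$M\sum_j\beta_{j,-2}[{}^6C^{j;0}_{-2}, \mathbf q_o] = M\big(\beta_{3,-2}[{}^6C^{3;0}_{-2}, \mathbf q_o] + \beta_{2,-2}[{}^6C^{2;0}_{-2}, \mathbf q_o]\big) = M\beta_{3,-2}[(\mathbf q_o^2)^2(\mathbf q_o\cdot\mathbf q_a), \mathbf q_o] + M\beta_{2,-2}[L_z(\mathbf q_o^2)^2, \mathbf q_o]. \qquad (529)$$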
The effect of $g_4$ has the form of Eq. (520), but the Seidel weight is now −5/2; thus $m_1 + m_2 = -2$, and
$$\frac12 M\sum_{m_1+m_2=-2}\sum_{j_1,j_2}\alpha_{j_1,m_1}\alpha_{j_2,m_2}[{}^4C^{j_1;0}_{m_1}, [{}^4C^{j_2;0}_{m_2}, \mathbf q_o]]. \qquad (530)$$
After a short calculation, we find
$$-M\left(\beta_{3,-2} + 4\alpha_{2,0}\alpha_{2,-2} - \tfrac32\alpha_{2,-1}^2 + \tfrac12\alpha_{1,-1}^2\right)(\mathbf q_o^2)^2\mathbf q_o \qquad (531)$$
for the isotropic distortion of the fifth order and
$$M\left(\beta_{2,-2} - 2\alpha_{1,-1}\alpha_{2,-1} + 2\alpha_{2,-2}\alpha_{1,0}\right)(\mathbf q_o^2)^2\hat J\mathbf q_o \qquad (532)$$
for the anisotropic distortion.

G. Coma: Seidel Weight 3/2

We now describe the aberrations that are linear in the distance from the optic axis in the object plane. These aberrations are known as coma. Note that we use the terminology proposed by Liu (2002) rather than that of Zhu et al. (1997). These aberrations must have the Seidel weight $m = 3/2$. Using an approach similar to that for the spherical aberration, we can write the effect of $g_6$ on the fifth-order coma as
$$M\sum_j\beta_{j,2}[{}^6C^{j;0}_2, \mathbf q_o] = M\beta_{3,2}[{}^6C^{3;0}_2, \mathbf q_o] + M\beta_{2,2}[{}^6C^{2;0}_2, \mathbf q_o] = M\beta_{3,2}[(\mathbf q_a^2)^2(\mathbf q_o\cdot\mathbf q_a), \mathbf q_o] + M\beta_{2,2}[(\mathbf q_a^2)^2L_z, \mathbf q_o]. \qquad (533)$$
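Similarly, for the effect of $g_4$ we obtain the constraint $m_1 + m_2 = 2$; hence,
$$\frac12 M\sum_{m_1+m_2=2}\sum_{j_1,j_2}\alpha_{j_1,m_1}\alpha_{j_2,m_2}[{}^4C^{j_1;0}_{m_1}, [{}^4C^{j_2;0}_{m_2}, \mathbf q_o]], \qquad (534)$$
which can be divided into an isotropic part
$$\frac M2\alpha_{2,2}\alpha_{2,0}\big([{}^4C^{2;0}_2, [{}^4C^{2;0}_0, \mathbf q_o]] + [{}^4C^{2;0}_0, [{}^4C^{2;0}_2, \mathbf q_o]]\big) + \frac M2\alpha_{1,1}^2[{}^4C^{1;0}_1, [{}^4C^{1;0}_1, \mathbf q_o]] + \frac M2\alpha_{2,1}^2[{}^4C^{2;0}_1, [{}^4C^{2;0}_1, \mathbf q_o]] \qquad (535)$$
and an anisotropic part
$$\frac M2\alpha_{2,2}\alpha_{1,0}\big([{}^4C^{2;0}_2, [{}^4C^{1;0}_0, \mathbf q_o]] + [{}^4C^{1;0}_0, [{}^4C^{2;0}_2, \mathbf q_o]]\big) + \frac M2\alpha_{2,1}\alpha_{1,1}\big([{}^4C^{2;0}_1, [{}^4C^{1;0}_1, \mathbf q_o]] + [{}^4C^{1;0}_1, [{}^4C^{2;0}_1, \mathbf q_o]]\big). \qquad (536)$$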
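Combining the previous equations, we find for the isotropic part of the fifth-order coma
$$-M\left(\beta_{3,2} - \tfrac43\alpha_{2,2}\alpha_{2,0} + 8\alpha_{2,2}\alpha_{0,0} + \tfrac52\alpha_{1,1}^2 + \tfrac12\alpha_{2,1}^2\right)(\mathbf q_a^2)^2\mathbf q_o - 4M\left(\beta_{3,2} + \tfrac43\alpha_{2,2}\alpha_{2,0} - 2\alpha_{2,2}\alpha_{0,0} - \tfrac12\alpha_{2,1}^2 - \tfrac12\alpha_{1,1}^2\right)(\mathbf q_o\cdot\mathbf q_a)\mathbf q_a^2\mathbf q_a \qquad (537)$$
and for the anisotropic part
$$M\left(5\beta_{2,2} + 2\alpha_{2,2}\alpha_{1,0} - 2\alpha_{2,1}\alpha_{1,1}\right)(\mathbf q_a^2)^2\hat J\mathbf q_o - 4M\left(\beta_{2,2} + 2\alpha_{2,2}\alpha_{1,0}\right)\mathbf q_a^2(\mathbf q_o\cdot\mathbf q_a)\hat J\mathbf q_a. \qquad (538)$$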
H. Peanut Aberration: Seidel Weight 1/2

The peanut aberration has the Seidel weight 1/2. Using the same approach, we find the effect of $g_6$ to be
$$M\sum_j\beta_{j,1}[{}^6C^{j;0}_1, \mathbf q_o] = M\beta_{3,1}[{}^6C^{3;0}_1, \mathbf q_o] + M\beta_{1,1}[{}^6C^{1;0}_1, \mathbf q_o] + M\beta_{2,1}[{}^6C^{2;0}_1, \mathbf q_o]. \qquad (539)$$
The constraint on $m_1$ and $m_2$ in the double Lie bracket (520), which describes the effect of $g_4$, reads $m_1 + m_2 = 1$; hence,
$$\frac12 M\sum_{m_1+m_2=1}\sum_{j_1,j_2}\alpha_{j_1,m_1}\alpha_{j_2,m_2}[{}^4C^{j_1;0}_{m_1}, [{}^4C^{j_2;0}_{m_2}, \mathbf q_o]]. \qquad (540)$$
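After a short calculation, we derive
$$\begin{aligned} &-2M\left(\tfrac45\beta_{3,1} - \beta_{1,1} + 3\alpha_{2,1}\alpha_{0,0} + 2\alpha_{1,1}\alpha_{1,0}\right)\mathbf q_a^2(\mathbf q_o\cdot\mathbf q_a)\mathbf q_o\\ &- M\left(\tfrac45\beta_{3,1} + 4\beta_{1,1} - \tfrac23\alpha_{2,1}\alpha_{2,0} + 4\alpha_{2,2}\alpha_{2,-1} - 2\alpha_{2,1}\alpha_{0,0} - \alpha_{1,1}\alpha_{1,0}\right)\mathbf q_o^2\mathbf q_a^2\mathbf q_a\\ &- 2M\left(\tfrac45\beta_{3,1} - 2\beta_{1,1} + 4\alpha_{2,2}\alpha_{2,-1} - \tfrac23\alpha_{2,1}\alpha_{2,0} - 2\alpha_{2,1}\alpha_{0,0} - 2\alpha_{1,1}\alpha_{1,0}\right)(\mathbf q_o\cdot\mathbf q_a)^2\mathbf q_a \end{aligned} \qquad (541)$$
for the isotropic part and
$$\begin{aligned} &2M\left(2\beta_{2,1} - \alpha_{2,1}\alpha_{1,0} - \tfrac23\alpha_{1,1}\alpha_{2,0} + 4\alpha_{2,2}\alpha_{1,-1} + \alpha_{1,1}\alpha_{0,0}\right)\mathbf q_a^2(\mathbf q_o\cdot\mathbf q_a)\hat J\mathbf q_o\\ &- M\left(\beta_{2,1} + 6\alpha_{1,1}\alpha_{0,0} + 8\alpha_{2,2}\alpha_{1,-1}\right)\mathbf q_a^2\mathbf q_o^2\hat J\mathbf q_a\\ &- 2M\left(\beta_{2,1} + \alpha_{2,1}\alpha_{1,0} + 4\alpha_{2,2}\alpha_{1,-1} - \tfrac23\alpha_{1,1}\alpha_{2,0} - 2\alpha_{1,1}\alpha_{0,0}\right)(\mathbf q_o\cdot\mathbf q_a)^2\hat J\mathbf q_a \end{aligned} \qquad (542)$$
for the anisotropic part.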
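I. Elliptical Coma: Seidel Weight −1/2

The fifth-order polynomials with Seidel weight −1/2 describe the elliptical coma. The contribution of $g_6$ then takes the form
$$M\sum_j\beta_{j,0}[{}^6C^{j;0}_0, \mathbf q_o] = M\big(\beta_{3,0}[{}^6C^{3;0}_0, \mathbf q_o] + \beta_{1,0}[{}^6C^{1;0}_0, \mathbf q_o] + \beta_{2,0}[{}^6C^{2;0}_0, \mathbf q_o] + \beta_{0,0}[{}^6C^{0;0}_0, \mathbf q_o]\big). \qquad (543)$$
The constraint on $m_1$ and $m_2$ now takes the form $m_1 + m_2 = 0$; hence, the contribution of $g_4$ reads
$$\frac12 M\sum_{m_1+m_2=0}\sum_{j_1,j_2}\alpha_{j_1,m_1}\alpha_{j_2,m_2}[{}^4C^{j_1;0}_{m_1}, [{}^4C^{j_2;0}_{m_2}, \mathbf q_o]]. \qquad (544)$$
Substituting the forms of the polynomials and calculating the Lie brackets, we derive
$$\begin{aligned} &-M\left(\beta_{1,0} + \tfrac35\beta_{3,0} + 8\alpha_{2,2}\alpha_{2,-2} - \alpha_{2,1}\alpha_{2,-1} + \tfrac29\alpha_{2,0}^2 + 5\alpha_{1,1}\alpha_{1,-1} + \tfrac43\alpha_{2,0}\alpha_{0,0} - \tfrac12\alpha_{1,0}^2 + 2\alpha_{0,0}^2\right)\mathbf q_o^2\mathbf q_a^2\mathbf q_o\\ &- M\left(\tfrac65\beta_{3,0} - 3\beta_{1,0} + 2\alpha_{1,0}^2 - \tfrac89\alpha_{2,0}^2 + 2\alpha_{2,1}\alpha_{2,-1} + \tfrac83\alpha_{2,0}\alpha_{0,0} - 2\alpha_{1,1}\alpha_{1,-1} - 2\alpha_{0,0}^2\right)(\mathbf q_o\cdot\mathbf q_a)^2\mathbf q_o\\ &- M\left(2\beta_{1,0} + \tfrac65\beta_{3,0} - 2\alpha_{1,1}\alpha_{1,-1} - 4\alpha_{2,0}\alpha_{0,0} - \alpha_{1,0}^2 + 16\alpha_{2,2}\alpha_{2,-2} + 2\alpha_{2,1}\alpha_{2,-1} - \tfrac43\alpha_{2,0}^2\right)\mathbf q_o^2(\mathbf q_o\cdot\mathbf q_a)\mathbf q_a \end{aligned} \qquad (545)$$
for the isotropic elliptical coma of the fifth order and
$$\begin{aligned} &M\left(\beta_{2,0} + 3\beta_{0,0} - 2\alpha_{1,0}\alpha_{0,0} - \tfrac23\alpha_{1,0}\alpha_{2,0}\right)\mathbf q_o^2\mathbf q_a^2\hat J\mathbf q_o\\ &+ M\left(2\beta_{2,0} - 3\beta_{0,0} + 4\alpha_{1,0}\alpha_{0,0} + 2\alpha_{2,1}\alpha_{1,-1} - \tfrac83\alpha_{1,0}\alpha_{2,0} + 2\alpha_{1,1}\alpha_{2,-1}\right)(\mathbf q_o\cdot\mathbf q_a)^2\hat J\mathbf q_o\\ &+ 2M\left(-\beta_{2,0} - 4\alpha_{2,1}\alpha_{1,-1} + \tfrac23\alpha_{1,0}\alpha_{2,0} - \alpha_{1,0}\alpha_{0,0}\right)\mathbf q_o^2(\mathbf q_o\cdot\mathbf q_a)\hat J\mathbf q_a \end{aligned} \qquad (546)$$
for the anisotropic elliptical coma.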
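J. Astigmatism and Field Curvature: Seidel Weight −3/2

The astigmatism and the field curvature are described by polynomials linear in p; for fifth-order polynomials, they have the Seidel weight −3/2. Hence, the effect of $g_6$ reads
$$M\sum_j\beta_{j,-1}[{}^6C^{j;0}_{-1}, \mathbf q_o] = M\big(\beta_{3,-1}[{}^6C^{3;0}_{-1}, \mathbf q_o] + \beta_{1,-1}[{}^6C^{1;0}_{-1}, \mathbf q_o] + \beta_{2,-1}[{}^6C^{2;0}_{-1}, \mathbf q_o]\big), \qquad (547)$$
and the effect of $g_4$ is
$$\frac12 M\sum_{m_1+m_2=-1}\sum_{j_1,j_2}\alpha_{j_1,m_1}\alpha_{j_2,m_2}[{}^4C^{j_1;0}_{m_1}, [{}^4C^{j_2;0}_{m_2}, \mathbf q_o]]. \qquad (548)$$
After the calculation, one finds
$$\begin{aligned} &-2M\left(\tfrac45\beta_{3,-1} - \beta_{1,-1} - \tfrac23\alpha_{2,0}\alpha_{2,-1} + 4\alpha_{2,1}\alpha_{2,-2} + \alpha_{1,0}\alpha_{1,-1} + \alpha_{0,0}\alpha_{2,-1}\right)\mathbf q_o^2(\mathbf q_o\cdot\mathbf q_a)\mathbf q_o\\ &- M\left(\tfrac25\beta_{3,-1} + \beta_{1,-1} - \alpha_{1,0}\alpha_{1,-1} - 2\alpha_{0,0}\alpha_{2,-1} + 4\alpha_{2,1}\alpha_{2,-2} - \tfrac23\alpha_{2,0}\alpha_{2,-1}\right)(\mathbf q_o^2)^2\mathbf q_a \end{aligned} \qquad (549)$$
for the isotropic part of the astigmatism and field curvature and
$$\begin{aligned} &2M\left(\beta_{2,-1} - \tfrac23\alpha_{2,0}\alpha_{1,-1} + 4\alpha_{1,1}\alpha_{2,-2} - \alpha_{1,0}\alpha_{2,-1} + \alpha_{0,0}\alpha_{1,-1}\right)\mathbf q_o^2(\mathbf q_o\cdot\mathbf q_a)\hat J\mathbf q_o\\ &- M\left(\beta_{2,-1} + \tfrac83\alpha_{2,0}\alpha_{1,-1} + 4\alpha_{1,1}\alpha_{2,-2} - 2\alpha_{1,0}\alpha_{2,-1} + 2\alpha_{0,0}\alpha_{1,-1}\right)(\mathbf q_o^2)^2\hat J\mathbf q_a \end{aligned} \qquad (550)$$
for the anisotropic part.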
APPENDIX A. THE HAMILTONIAN TRANSFORMATION

Although the transformation rule for the Hamiltonian under a z-dependent canonical transformation has been derived by Cary (1977), for completeness it is summarized here. We start from the rule valid in the z-independent case and derive the extension to the z-dependent case using the extended phase-space formalism. Let us consider two canonical coordinate systems $\mathbf w$, $\tilde{\mathbf w}$ with 2 degrees of freedom and a z-dependent Lie transformation $e^{:g(\tilde{\mathbf w},z):}$ describing their relationship, $\mathbf w = e^{:g(\tilde{\mathbf w},z):}\tilde{\mathbf w}$. In the z-independent case the transformation rule for the Hamiltonian does not differ from the transformation rule for functions, but in the z-dependent case the rule is not obvious. Fortunately, each canonical z-dependent system with 2 degrees of freedom is equivalent to a canonical z-independent system with 3 degrees of freedom, known as the extended phase space (Goldstein, 1980), in which z is one of the canonical variables, with canonical conjugate momentum $p_z$. A vector in the extended phase space then reads $\mathbf W = (\mathbf w, z, p_z)$, and the Hamiltonian takes the form $\bar H = H + p_z$. The canonical transformation $e^{:g(\tilde{\mathbf w},z):}$ is represented by a canonical transformation $e^{:g(\tilde{\mathbf W}):_E}$, which does not depend on the independent variable $t$ in the extended phase space. For the Lie operator in the extended phase space we use the notation $:f:_E$, with the Poisson bracket defined according to
$$[g, h]_E = [g, h] + \frac{\partial g}{\partial z}\frac{\partial h}{\partial p_z} - \frac{\partial g}{\partial p_z}\frac{\partial h}{\partial z}.$$
The transformed Hamiltonian is then written as
$$\bar{\tilde H}(\tilde{\mathbf W}) = e^{:g(\tilde{\mathbf W}):_E}\,\bar H(\tilde{\mathbf W}), \qquad (551)$$
and, in the coordinates of the original phase space, it can be written as
$$\begin{aligned} \tilde H(\tilde{\mathbf w}, z) + \tilde p_z &= e^{:g(\tilde{\mathbf w},z):_E}\big(H(\tilde{\mathbf w}, z) + p_z\big)\\ &= e^{:g(\tilde{\mathbf w},z):}H(\tilde{\mathbf w}, z) + \frac{\partial g}{\partial z} + \frac12\Big[g, \frac{\partial g}{\partial z}\Big] + \frac16\Big[g, \Big[g, \frac{\partial g}{\partial z}\Big]\Big] + \cdots + \tilde p_z,\\ \tilde H(\tilde{\mathbf w}, z) &= e^{:g(\tilde{\mathbf w},z):}H(\tilde{\mathbf w}, z) + \int_0^1 d\theta\,e^{\theta:g(\tilde{\mathbf w},z):}\,\frac{\partial g(\tilde{\mathbf w}, z)}{\partial z}, \end{aligned} \qquad (552)$$
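which is the desired expression. The situation is described by the diagram
$$\begin{array}{ccc} (\mathbf w, z),\ H(\mathbf w, z) & \xrightarrow{\ e^{:g(\tilde{\mathbf w},z):}\ } & (\tilde{\mathbf w}, z),\ \tilde H(\tilde{\mathbf w}, z) = {?}\\ \downarrow & & \uparrow\ \tilde H(\tilde{\mathbf w}, z) = \bar{\tilde H}(\tilde{\mathbf W}) - \tilde p_z\\ (\mathbf W, t),\ \bar H(\mathbf W) = H(\mathbf w, z) + p_z & \xrightarrow{\ e^{:g(\tilde{\mathbf W}):_E}\ } & (\tilde{\mathbf W}, t),\ \bar{\tilde H}(\tilde{\mathbf W}) \end{array} \qquad (553)$$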
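As a minimal worked check of Eq. (552), consider the illustrative generator $g(\tilde{\mathbf w}, z) = \tfrac12 c(z)\tilde{\mathbf q}^2$ acting on the free Hamiltonian $H = \tfrac12\mathbf p^2$. Both choices are hypothetical and serve only as an example. We assume the bracket convention $[f, g] = \sum_k(\partial_{q_k}f\,\partial_{p_k}g - \partial_{p_k}f\,\partial_{q_k}g)$ and $:f:g = [f, g]$. Then $:g:\tilde{\mathbf q} = 0$ and $:g:\tilde{\mathbf p} = c\,\tilde{\mathbf q}$, so every Lie series terminates. Lie transformations act on functions through their arguments, which gives
$$\begin{aligned} \mathbf q &= e^{:g:}\tilde{\mathbf q} = \tilde{\mathbf q}, \qquad \mathbf p = e^{:g:}\tilde{\mathbf p} = \tilde{\mathbf p} + c(z)\tilde{\mathbf q},\\ e^{:g:}H(\tilde{\mathbf w}) &= \tfrac12\big(\tilde{\mathbf p} + c(z)\tilde{\mathbf q}\big)^2, \qquad \int_0^1 d\theta\,e^{\theta:g:}\frac{\partial g}{\partial z} = \tfrac12 c'(z)\tilde{\mathbf q}^2,\\ \tilde H(\tilde{\mathbf w}, z) &= \tfrac12\big(\tilde{\mathbf p} + c(z)\tilde{\mathbf q}\big)^2 + \tfrac12 c'(z)\tilde{\mathbf q}^2. \end{aligned}$$
The same result follows from the type-2 generating function $F_2(\mathbf q, \tilde{\mathbf p}, z) = \mathbf q\cdot\tilde{\mathbf p} + \tfrac12 c(z)\mathbf q^2$. For it, $\mathbf p = \tilde{\mathbf p} + c\,\mathbf q$, $\tilde{\mathbf q} = \mathbf q$ and $\tilde H = H + \partial F_2/\partial z$, so the extra term $\tfrac12 c'\tilde{\mathbf q}^2$ is exactly the contribution of the $\theta$-integral in Eq. (552).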
A PPENDIX B. T HE F ORM OF THE I NTERACTION H AMILTONIAN FOR THE ROUND M AGNETIC L ENS This appendix presents the form of the interaction Hamiltonian of the round magnetic lens. We use the notation introduced in Section VII.C. The direct calculations were done in the mathematical program Maple (Maple, 2006). We suppose that H4int and H6int are in the form as in Eq. (493); the coefficients in the fourth-order Hamiltonian then read 2 1 a2,2 = W −1 −αα t 4 + α 2 t 2 + t 2 , (554a) 8 1 a2,1 = W −1 α 4 − αα t 3 s + α 2 stt 2 + α 2 s t t 2 + s t 3 , (554b) 2 1 a2,0 = W −1 4α 2 sts t + −3αα + 3α 4 t 2 s 2 + 3s 2 t 2 4 (554c) + α2t 2s 2 + α2s 2t 2 , 1 a0,0 = W −1 α 2 s 2 t 2 − 2α 2 sts t + α 2 t 2 s 2 + 3α 2 W 2 , (554d) 6 1 a2,−1 = W −1 s 3 t + α 2 s t s 2 + α 2 sts 2 + α 4 − αα ts 3 , (554e) 2 1 −1 4 4 a2,−2 = W s + α − αα s 4 + 2 α 2 s 2 s 2 , (554f) 8 1 3 1 2 1 2 α − α t + αt , (555a) a1,1 = 8 2 2 1 (555b) a1,0 = − α + α 3 ts + s t α, 4 1 3 1 2 1 2 α − α s + αs (555c) a1,−1 = 2 8 2 and for the sixth-order Hamiltonian, they take the form 4 1 3 α − αα t 2 t 4 + 3α 2 t 2 t 4 b3,3 = 16W 1 1 2 (4) 3 6 6 6 , (556) αα − α α + α + α t + t + 12 8 1 b3,2 = 12α 2 s t 3 t 2 + 6α 2 stt 4 + −2αα + 6α 4 t 4 t s 16W i + 12α 4 − 4αα t 2 t 3 s 1 (4) 3 2 5 3 6 5 t s + 6s t , (557) + −6α α + 6α + αα + α 2 4
b3,1 =
b1,1 =
b3,0
1 1 3 3 15 − αα + α 4 t 4 s 2 + α 2 s 2 t 4 + s 2 t 4 + 6α 2 sts t 3 4W 4 4 4 4 9 4 3 2 2 2 9 2 2 2 2 α − αα s t t + α s t t + 6α 4 − 2αα st 3 s t + 2 2 2 5 15 15 15 αα (4) + α 2 + α 6 − α 3 α s 2 t 4 , (558) + 16 32 4 4 1 1 15 − α −30W 2 α 3 + 5W 2 α t 2 + α 2 W 2 t 2 5W 8 4 1 3 1 − α −6α 3 + 2α t 2 s 2 t 2 + α 2 s 2 t 4 − α 12α 3 − 4α s t st 3 8 4 8 3 2 3 1 3 3 2 4 2 2 2 2 (559) − α sts t − α −6α + 2α s t + α s t t , 2 8 4
1 5s 3 t 3 + 9α 2 sts 2 t 2 + −3αα + 9α 4 s t s 2 t 2 = 4W 5 5 + 5α 6 + αα (4) − 5α 3 α + α 2 s 3 t 3 + 3α 4 − αα st 3 s 2 12 8 4 2 3 2 3 2 2 3 2 (560) + 3α s t t + 3α − αα s tt + 3α s t s ,
b1,0
b3,−1
1 1 − α −30W 2 α 3 + 5W 2 α ts + 15α 2 W 2 s t = 10W 2 3 − α 3α − α tt 2 s 3 − 2α α 3 − α t s s 2 t 2 + 3α 2 s t 3 s 2 3 2 3 2 2 2 2 3 2 (561) − α 3α − α s st − 6α sts t + 3α s t t ,
1 15 4 2 s t + 4(3α 4 − αα )s 3 ts t + 12α 2 ts 3 t s = 8W 2 3 4 1 4 2 α − αα s t + 9α 2 s 2 t 2 s 2 + 3 3α 4 − αα s 2 t 2 s 2 + 2 2 15 6 15 2 5 (4) 15 3 4 2 3 2 2 4 α + α + αα − α α s t + α t s , + 2 16 8 2 2 (562)
b1,−1
b3,−2
W 2 1 − α 5α − 30α 3 s 2 + 15α 2 W 2 s 2 − α α − 3α 3 t 2 s 4 = 20W 2 + 2α α − 3α 3 ts t s 3 − α α − 3α 3 s 2 s 2 t 2 + 3α 2 s 2 t 2 s 2 (563) − 6α 2 ts 3 t s + 3α 2 t 2 s 4 , 1 3s 5 t + 2 3α 4 − αα s 2 ts 3 + 3α 2 sts 4 + 6α 2 s 3 t s 2 = 8W 3 2 1 α − 3α 3 α + αα (4) + 3α 6 ts 5 + −αα + 3α 4 s 4 t s , + 8 4 (564) 1 b3,−3 = −αα + 3α 4 s 2 s 4 + 3α 2 s 2 s 4 + s 6 16W 1 2 1 3 6 (4) s6 , (565) α − α α + α + αα + 8 12
b2,2 =
3 5 3 1 (4) 4 1 3 α − α 2 α + α t + − α + α 3 t 2 t 2 , 8 16 192 16 4 (566) 3 3 1 2 3 3 1 2 3 α − α t ts + α − α t ts + αs t 3 = 2 8 2 8 2 1 (4) 3 5 3 2 3 α + α − α α t s, + (567) 48 2 4
3 4 αt + 8
b2,1
1 3 1 (4) 9 2 9 5 2 2 α − α α + α t s b2,0 = − α + α 3 s 2 t 2 + 16 4 32 8 4 1 3 3 2 2 9 2 2 1 3 + − α + α t s + αs t + − α + 3α t s ts, 16 4 4 4 (568) 1 3 3 2 2 1 (4) 3 2 3 5 2 2 b0,0 = − α + α s t + α − α α + α t s 20 5 240 20 10 1 3 3 2 2 3 2 2 1 3 3 α − α t s ts, + − α + α t s + αs t + 20 5 10 20 5 (569)
3 3 1 2 3 3 1 2 3 α − α s ts + α − α s ts + s 3 t α 2 8 2 8 2 1 (4) 3 5 3 2 α + α − α α ts 3 , (570) + 48 2 4 3 5 3 2 1 (4) 4 3 5 4 b2,−2 = s + α s α − α α + α 8 16 192 8 1 3 3 2 2 + − α + α s s . (571) 16 4
b2,−1 =
REFERENCES
Barth, J., Lencová, B., Wisselink, G. (1990). Field-evaluation from potentials calculated by the finite-element method for ray tracing—The slice method. Nucl. Instr. Methods Phys. Res. A 298, 263–268.
Bäuerle, G., Kerf, E. (1999). Lie Algebras, Part 1—Finite and Infinite Dimensional Lie Algebras and Applications in Physics. Studies in Mathematical Physics. Elsevier, New York.
Bazzani, A. (1988). Normal forms for symplectic maps of $\mathbb R^{2n}$. Celest. Mech. 42, 107–128.
Bazzani, A., Todesco, E., Turchetti, G., Servizi, G. (1994). A normal form approach to the theory of nonlinear betatronic motion. CERN. Yellow Report 94/02, Geneva.
Berz, M. (1999). Modern map methods in particle beam physics. Adv. Imaging Electron Phys. 108.
Cary, J. (1977). Time-dependent canonical transformations and the symmetry-equals-invariant theorem. J. Math. Phys. 18, 2432–2435.
Cary, J. (1981). Lie transform perturbation-theory for Hamiltonian systems. Phys. Rep. 79 (2), 129–159.
Cheng, M., Shanfang, Z., Yilong, L., Tiantong, T. (2006). Differential algebraic method for arbitrary-order chromatic aberration analysis of electrostatic lenses. Optik 117, 196–198.
COSY INFINITY (2007). COSY INFINITY, computer code. http://bt.pa.msu.edu/index_files/cosy.htm.
Dragt, A. (1982). Lectures on nonlinear dynamics in 1981 Fermilab Summer School. AIP Conf. Proc. 7, 147.
Dragt, A., Forest, E. (1986a). Foundations of a Lie algebraic theory of geometrical optics. In: Sánchez-Mondragón, J., Wolf, K.B. (Eds.), Lie Methods in Optics. Springer-Verlag, Berlin, pp. 45–103.
Dragt, A., Forest, E. (1986b). Lie algebraic theory of charged-particle optics and electron-microscopes. Adv. Electr. Electron Phys. 67, 65.
Dragt, A., Neri, F., Rangarajan, G., Douglas, D., Healy, L., Ryne, R. (1988). Lie algebraic treatment of linear and nonlinear beam dynamics. Annu. Rev. Nucl. Sci. 38, 455–496.
Dragt, A.J. (1987). Elementary and advanced Lie algebraic methods with applications to accelerator design, electron microscopes and light optics. Nucl. Inst. Meth. Phys. Res. A 258, 339–354.
Dragt, A.J. (1990). Numerical third-order transfer map for solenoid. Nucl. Inst. Meth. Phys. Res. A 298, 441–459.
Forest, E. (1998). Beam Dynamics—A New Attitude and Framework. Harwood Academic Publishers, New York.
Glaser, W. (1935). Zur Bildfehlertheorie des Elektronenmikroskops. Z. Physik 97, 177–201.
Goldstein, H. (1980). Classical Mechanics. Addison-Wesley.
Harrington, R. (1967). Matrix methods for field problems. Proc. Inst. Electr. Electron. Eng. 55 (2), 136–149.
Hawkes, P., Kasper, E. (1989). Principles of Electron Optics. Academic Press, London.
Hu, K., Tang, T.T. (1998). Lie algebraic aberration theory and calculation method for combined electron beam focusing-deflection systems. J. Vac. Sci. Technol. B 16 (6), 3248–3255.
Hu, K., Tang, T.T. (1999). Lie algebra deflection aberration theory for magnetic deflection systems. Optik 110 (1), 9–16.
Jackson, J. (1998). Classical Electrodynamics. John Wiley & Sons, New York.
Jagannathan, R., Khan, S. (1996). Quantum theory of the optics of charged particles. Adv. Imaging Electr. Phys. 97, 257.
Khursheed, A. (1999). The Finite Element Method in Charged Particle Optics. Kluwer Academic.
Lencová, B. (1995). Computation of electrostatic lenses and multipoles by the first-order finite-element method. Nucl. Instr. Meth. Phys. Res. A 363, 190–197.
Lencová, B., Zlámal, J. (2006). A new program for the design of electron microscopes. In: Abstracts—CPO7, pp. 71–72. Available at http://www.mebs.co.uk.
Lichtenberg, A., Lieberman, M. (1982). Regular and Stochastic Motion. Springer-Verlag, New York.
Liu, Z.X. (2002). Fifth-order canonical geometric aberration analysis of electrostatic round lenses. Nucl. Instr. Methods Phys. Res. A 488, 42–50.
Liu, Z.X. (2006). Differential algebraic method for aberration analysis of typical electrostatic lenses. Ultramicroscopy 106, 220–232.
Manikonda, S., Berz, M. (2006). Multipole expansion solution of the Laplace equation using surface data. Nucl. Inst. Meth. Phys. Res. A 558, 175–183.
Maple (2006). Symbolic Computation Group. University of Waterloo, Waterloo, Ontario, Canada. Available at maplesoft.com.
MARYLIE (2006). Available at www.physics.umd.edu/dsat/dsatmarylie.html.
Matsuya, M., Saito, M., Nakagawa, S. (1995). A semianalytical aberration calculation method using polynomials and Lie algebra. Nucl. Instr. Meth. Phys. Res. A 363, 261–269.
MEBS (2007). Munro Electron Beam Software, Ltd, London, England.
Meyer, K. (1974). Normal form for Hamiltonian. Celest. Mech., 517–522.
Navarro-Saad, M., Wolf, K. (1986). The group theoretical treatment of aberrating systems 1. J. Math. Phys. 27 (5).
Press, W., Teukolsky, S., Vetterling, W.T., Flannery, B. (1986). Numerical Recipes in Fortran. Cambridge University Press.
Rose, H. (1987). Hamiltonian magnetic optics. Nucl. Instr. Meth. A 258, 374–401.
Rose, H. (2004). Outline of an ultracorrector compensating for all primary chromatic and geometrical aberrations of charged-particle lenses. Nucl. Instr. Meth. Phys. Res. A 519, 12–27.
Rouse, J., Munro, E. (1989). 3-dimensional computer modeling of electrostatic and magnetic electron-optical components. J. Vac. Sci. Tech. B 7 (6), 1891–1897.
Scherzer, O. (1933). Zur Theorie der elektronenoptischen Linsenfehler. Z. Physik 80, 193–202.
Steinberg, S. (1986). Lie series, Lie transformations, and their applications. In: Sánchez-Mondragón, J., Wolf, K.B. (Eds.), Lie Methods in Optics. Springer-Verlag, Berlin, pp. 45–103.
Sturok, P. (1955). Static and Dynamic Electron Optics. Cambridge University Press, Cambridge, MA.
Venturini, M., Dragt, A. (1999). Accurate computation of transfer maps from magnetic field data. Nucl. Instr. Meth. Phys. Res. A 427, 387–392.
Wang, L., Rouse, J., Liu, H., Munro, E., Zhu, X. (2004). Simulation of electron optical systems by differential algebraic method combined with Hermite fitting for practical lens fields. Microelectron. Eng. 90, 73–74.
Wolf, K. (1986). The group theoretical treatment of aberrating systems 2. J. Math. Phys. 27 (5).
Wolf, K. (1987). The group theoretical treatment of aberrating systems 3. J. Math. Phys. 28 (10).
Ximen, J. (1995). Canonical aberration theory in electron optics up to ultrahigh-order approximation. Adv. Imaging Electr. Phys. 91, 1–36.
Zhu, X., Liu, H., Munro, E., Rouse, J. (1997). Analysis of off-axis shaped beam systems for high throughput electron beam lithography. In: Proceedings of Charged Particle Optics III. SPIE, pp. 47–61.
Zhu, X., Munro, E. (1989). A computer-program for electron-gun design using second-order finite-elements. J. Vac. Sci. Technol. B 7 (6), 1862–1869.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 151
Recent Developments in Electron Backscatter Diffraction
VALERIE RANDLE
Materials Research Centre, School of Engineering, University of Wales Swansea, Swansea, UK
I. Introduction 363
II. Fundamental Aspects of EBSD 364
   A. Early Development 365
   B. The Basic EBSD System 368
   C. The EBSD Pattern 369
III. The Orientation Map and Data Processing 374
   A. Core Concepts in EBSD Data Processing 374
   B. Orientation Maps 375
IV. Established Applications of EBSD 380
V. Recent Advances in EBSD 382
   A. Overview of Advances in EBSD Technology 382
   B. Overview of Expansion in EBSD Applications 384
VI. Advances in EBSD Technology 388
   A. Data Collection Optimization 388
      1. Camera Technology and Speed 388
      2. Resolution 390
   B. Interface Characterization 393
      1. Connectivity 394
      2. The Interface Plane 396
   C. Three-Dimensional EBSD 403
      1. Sectioning 403
      2. Focused Ion Beam Tomography 404
   D. Multiphase Materials 407
      1. Phase Identification 407
      2. Orientation Relationships 408
   E. Strain Assessment 409
VII. Trends in EBSD Usage 410
References 411
ISSN 1076-5670 DOI: 10.1016/S1076-5670(07)00405-3
Copyright 2008, Elsevier Inc. All rights reserved.

I. INTRODUCTION

The aim of this review is to describe state-of-the-art electron backscatter diffraction (EBSD) and to capture some of the more significant advances in the philosophy, technology, and application of EBSD that have emerged in the past four or five years. EBSD is a scanning electron microscope (SEM)-based technique that has become a powerful mainstream experimental tool for materials scientists, physicists, geologists, and others. It is no exaggeration to say that EBSD has revolutionized the microcharacterization of crystalline materials.

EBSD is based on the acquisition and analysis of diffraction patterns from the surface of a specimen. To obtain an EBSD pattern, a stationary electron beam is positioned on the specimen. Backscattered electrons diffract at crystal lattice planes within the probe volume according to Bragg's law. The fraction of diffracted backscattered electrons that escape from the specimen surface is maximized by tilting the specimen so that its surface makes a small angle with the incoming electron beam. The diffracted signal is collected on a phosphor screen and viewed with a low-light video camera. These diffraction patterns provide crystallographic information that can be related back to the point of origin on the specimen. The results of evaluating and indexing the patterns are output in a variety of statistical and pictorial formats. The most exciting and revealing of these outputs is the orientation map, a quantitative depiction of the microstructure of a region in terms of its crystallographic constituents.

It has been more than twenty years since EBSD was introduced as an add-on to the SEM. Following early commercialization, EBSD steadily attracted interest in SEM user communities working with crystalline materials, and its popularity grew as its capabilities and sophistication evolved.

Section II of this review sets the scene by tracing the development and fundamental aspects of EBSD. Key aspects of data processing, particularly the orientation map, are then described in Section III, followed by established applications of EBSD in Section IV. Thereafter the emphasis is on recent advances in EBSD. Section V identifies these advances and outlines their development, from the perspective of improvements in the technology and the consequent expansion in applications. Section VI describes advances in EBSD technology in detail: data collection optimization, interface characterization, three-dimensional (3D) EBSD, phase identification in multiphase materials, and strain assessment. Section VII notes trends in EBSD usage.
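To give a feel for the Bragg condition mentioned above, the sketch below computes the relativistically corrected electron wavelength and the Bragg angle for one set of lattice planes. The 20 kV accelerating voltage and the nickel lattice parameter of 0.3524 nm are illustrative assumptions, not values taken from this chapter, and the function names are hypothetical.

```python
import math

# Physical constants in SI units
PLANCK = 6.62607015e-34           # J s
ELECTRON_MASS = 9.1093837015e-31  # kg
CHARGE = 1.602176634e-19          # C
LIGHT_SPEED = 2.99792458e8        # m/s


def electron_wavelength_nm(kilovolts):
    """Relativistically corrected electron wavelength for an accelerating voltage in kV."""
    energy = CHARGE * kilovolts * 1e3  # kinetic energy in J
    momentum = math.sqrt(2 * ELECTRON_MASS * energy
                         * (1 + energy / (2 * ELECTRON_MASS * LIGHT_SPEED ** 2)))
    return PLANCK / momentum * 1e9


def cubic_d_spacing_nm(a_nm, h, k, l):
    """Interplanar spacing of (hkl) planes in a cubic lattice."""
    return a_nm / math.sqrt(h * h + k * k + l * l)


def bragg_angle_deg(d_nm, wavelength_nm, order=1):
    """Solve n * lambda = 2 * d * sin(theta) for theta."""
    return math.degrees(math.asin(order * wavelength_nm / (2 * d_nm)))


wavelength = electron_wavelength_nm(20.0)    # assumed 20 kV beam
d_111 = cubic_d_spacing_nm(0.3524, 1, 1, 1)  # assumed Ni lattice parameter
theta = bragg_angle_deg(d_111, wavelength)
print(f"lambda = {wavelength * 1000:.2f} pm, d(111) = {d_111:.4f} nm, theta = {theta:.2f} deg")
```

Under these assumptions the wavelength is roughly 8.6 pm and the Bragg angle only about 1.2 degrees, which is why each Kikuchi band, whose angular width is about twice the Bragg angle, spans only a few degrees of the pattern.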
II. FUNDAMENTAL ASPECTS OF EBSD

This section briefly describes how EBSD has evolved and how the basic information in the diffraction pattern is acquired and evaluated. More detailed information on EBSD operation, data acquisition and processing, specimen preparation, SEM operating conditions, diffraction pattern processing, calibration, and so forth is not included here but is available on the EBSD manufacturers' websites and in several publications (Anon., 2007a, 2007c; Randle, 2003; Randle and Engler, 2000; Schwartz, Kumar and Adams, 2000).

A. Early Development

It has been more than fifty years since "high-angle Kikuchi patterns" were first obtained from bulk specimens (Alam, Blackman and Pashley, 1953). Following the first commercial availability of SEMs in the mid-1960s, work on high-angle Kikuchi patterns from reflected electrons was revived, in parallel with the development of other aspects of diffraction in the SEM (Venables and Bin-Jaya, 1977; Venables and Harland, 1973). Figure 1a shows an example of a Kikuchi pattern from that era. The pattern, which is from evaporated gold, was photographed directly from the phosphor screen with a standard 35 mm camera. The black projections in Figure 1a are a calibration tool: spheres (ball bearings) were mounted in front of the phosphor screen so that their cast shadows allowed the diffraction pattern origin (the emitting point of the backscattered electrons) to be located from the intersection of lines passing through the long axes of the elliptical shadows.

Later, in the 1980s, diffraction patterns were recorded as hard copy on cut film inserted via a modification to the microscope column (Dingley, Baba-Kishi and Randle, 1994). Figures 1b and c show patterns recorded in this manner. Figure 1b is from a low atomic number element, a single crystal silicon wafer cleaved on {001}, and Figure 1c is from a high atomic number element, tungsten. Major zone axes are marked on both figures. The greater contrast and detail in Figure 1c compared with Figure 1b are due to the larger backscattering coefficient of the higher atomic number element.

FIGURE 1. (a) One of the first hard copy EBSD patterns, taken from evaporated gold. (Courtesy J. Venables.) (b) Hard copy EBSD pattern from single crystal silicon. (c) Hard copy EBSD pattern from tungsten.

EBSD was transformed from a laboratory instrument into a sophisticated analysis system in which diffraction patterns could be both viewed and indexed in real time by video detection (Dingley, 1981). These landmarks in EBSD history may sound straightforward now, but they involved solving several challenges, such as how to obtain sufficient signal gain from a low-light TV camera to allow the patterns to be viewed via a phosphor screen rather than on cut film, how to superimpose computer graphics on the diffraction pattern, and how to achieve calibration accurate enough for the patterns to be indexed. Details of how these developments evolved are available elsewhere (Dingley, 1981; Randle, 2003; Randle and Engler, 2000). By 1984 the first EBSD system was commercially available. Initially, commercial EBSD systems were used to measure microtexture, defined as a population of crystallographic orientations whose individual components are linked to their location within the microstructure.
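The ball-bearing calibration described in this section reduces to finding the point where several lines meet. With measurement noise the long-axis lines will not meet exactly, so one natural choice is the least-squares point that minimizes the summed squared perpendicular distance to all lines. The sketch below assumes each shadow has already been reduced to a center point and a long-axis direction in screen coordinates; it illustrates the geometry only, is not the historical procedure, and the function name is hypothetical.

```python
import math


def intersect_lines(points, directions):
    """Least-squares intersection of 2D lines.

    Line i passes through points[i] with direction directions[i]. The returned
    point minimizes the sum of squared perpendicular distances to all lines.
    """
    a11 = a12 = a22 = 0.0
    b1 = b2 = 0.0
    for (px, py), (dx, dy) in zip(points, directions):
        norm = math.hypot(dx, dy)
        dx, dy = dx / norm, dy / norm
        # Projector onto the line's normal direction: I - d d^T
        p11, p12, p22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += p11
        a12 += p12
        a22 += p22
        b1 += p11 * px + p12 * py
        b2 += p12 * px + p22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; intersection is undefined")
    # Solve the 2x2 normal equations by Cramer's rule
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Hypothetical shadow centers whose long axes all point at (2, 3)
centers = [(10.0, 0.0), (0.0, 10.0), (-5.0, -5.0)]
axes = [(2.0 - x, 3.0 - y) for x, y in centers]
print(intersect_lines(centers, axes))
```

Note that this locates the point on the screen where the shadow axes converge; recovering the full source-to-screen geometry needs additional calibration information.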
Diffraction patterns were obtained by manually positioning a stationary electron probe on an area of interest. Patterns were solved, for cubic symmetries only, by identifying the Miller indices of zone axes, and approximately 100 data points could be indexed in an hour. Data could be output in the form of pole figures or used to compute the misorientation between individual orientations from neighboring grains, hence giving access to grain boundary parameters. This indexing procedure was superseded by manual location of Kikuchi bands (Dingley, 1989; Schmidt and Olesen, 1989) and was extended to crystal systems other than cubic.

The next landmark in the development of EBSD was automatic indexing of diffraction patterns by computer. The main obstacle was that most EBSD patterns have a low signal-to-noise ratio (SNR): whereas the eye can readily pick out a Kikuchi band on the diffuse gray background, reliable computer recognition was challenging. Consequently, a primary thrust of the development work was enhancing the contrast of diffraction patterns and improving camera hardware, in parallel with developing algorithms for band detection and pattern indexing. The Hough transform (Section II.C) became the basis for pattern recognition (Krieger-Lassen, Juul Jensen and Conradsen, 1992). Once reliable automated pattern indexing had matured, orientation imaging microscopy (OIM), known generically as orientation mapping, was developed (Adams, Wright and Kunze, 1993).

Orientation mapping is the most elegant and versatile outcome of EBSD. To compose an orientation map, the beam is moved automatically in regular steps over a predefined region of the stationary specimen, and an EBSD pattern is automatically captured and solved after each step. The output is an orientation map: a visual portrayal of some orientation-related aspect of the microstructure at each sampling point, accompanied by the crystallographic orientation. In summary, by the beginning of this century EBSD had made its mark.

B. The Basic EBSD System

The components of an EBSD system are a low-light video camera interfaced to a phosphor screen, and a computer with dedicated software for camera control, data processing, pattern indexing, analysis, and output. The EBSD system can be interfaced to virtually any SEM. The crucial prerequisite for EBSD analysis is that the incident beam must make a small angle, usually 20 degrees, with the specimen surface in order to maximize the diffracted backscattered signal. This tilting means that the depth sampled by the electron probe becomes very shallow, typically 10–40 nm, depending on the accelerating voltage and the atomic number of the target volume.
Within this volume the backscattered electrons have undergone only a few inelastic scattering events and so can still escape from the specimen surface, relatively undeviated, to contribute to the diffraction pattern. The area of the sampled volume on the specimen surface is determined by the diameter of the incident beam probe. The backscattered electrons form a Kikuchi-type diffraction pattern arising from the sampled volume of the specimen. The pattern impinges on the phosphor screen and is captured by the low-light video camera, which is interfaced to the computer. The software controls not only the enhancement, processing, evaluation, and indexing of the diffraction pattern but also SEM image capture and movement of the electron beam on the specimen. Figure 2 shows a typical view of EBSD hardware inside a microscope chamber, indicating the juxtaposition of the tilted specimen, the EBSD camera, and the microscope polepiece.

FIGURE 2. View of EBSD setup inside the SEM chamber.

A tungsten-filament SEM is the norm in many laboratories; it is satisfactory for standard EBSD applications and is the most cost-effective option. A LaB6 emission source has advantages in terms of available beam current. Field emission gun SEMs (FEGSEMs) have become increasingly popular for EBSD because they give a two- to threefold improvement in spatial resolution over a tungsten-filament SEM (El Dasher, Adams and Rollett, 2003; Humphreys et al., 1999). A slow-scan charge-coupled device (SSCCD) camera with solid-state sensors is now used for EBSD, replacing the first generation of silicon-intensified target (SIT) cameras. For most EBSD work, rapid data acquisition is a primary consideration; a TV-rate camera combines sufficient speed with adequate pattern contrast and definition. Recent improvements to the EBSD system, including camera technology, are described in Section VI.A.

C. The EBSD Pattern

The EBSD Kikuchi diffraction pattern provides a projection of the geometry of the crystal lattice within the sampled volume. As shown in Figure 1, the pattern consists of many bands, or pairs of Kikuchi lines, which represent lattice planes. (Strictly, these are pseudo-Kikuchi bands because of the mechanism of their formation, but in practice they are usually referred to as Kikuchi bands.) Recent thinking on the formation mechanisms of EBSD patterns is recorded in detail elsewhere (Zaefferer, 2007).

The diffraction pattern, as recorded on the flat phosphor screen or film, is projected gnomonically. This means that the plane of projection is tangential to a notional reference sphere centered on the pattern source point on the specimen. The radius of the reference sphere is the distance between the pattern source point on the specimen and the origin of the diffraction pattern on the recording medium; hence, the magnification of the pattern is controlled by the specimen-to-screen distance. For comparison with the gnomonically projected patterns, Figure 3 shows a computer simulation of an equivalent pattern projected onto the intermediate reference sphere.

FIGURE 3. Simulation of a spherically projected EBSD pattern from aluminum.

The diffraction pattern is used in three ways in EBSD practice. First, and most commonly, it is used to obtain the crystallographic orientation of the sampled volume of crystal when the phase is known. Second, if the phase is unknown, the pattern is used to deduce the crystal symmetry and the atoms in the unit cell. Finally, the clarity or, conversely, the diffuseness of the pattern can be evaluated, which in turn is related to lattice strain.

Figure 4a shows a processed EBSD pattern from an austenitic steel. The raw EBSD pattern is usually very dim because the SNR is inherently poor. Two types of image-processing routine are applied to enhance the pattern: averaging and background subtraction. After these steps the pattern has sufficient contrast for Kikuchi bands to be readily picked out, and a uniform average intensity, which is important for automated indexing.

The quality (clarity and sharpness) of the Kikuchi bands in the pattern is governed principally by the presence of lattice defects and the condition of the specimen surface. Crystal defects such as dislocations progressively broaden the Kikuchi bands, so pattern quality is an indicator of lattice strain. The pattern quality parameter can also be used to differentiate phases in a material; for example, it has been applied in this way to steels to measure the volume fraction of phases (Petrov et al., 2007; Wilson, Madison and Spanos, 2003). With regard to the specimen surface, the electron probe must be able to access the top few nanometers of the surface adequately, so surface contamination and/or poor preparation are extremely detrimental to EBSD analysis. Low-quality patterns reduce the accuracy and speed of the pattern solve routine, so rigorous specimen preparation is essential. Figure 4a is an example of a high-quality processed pattern in which the Kikuchi bands are sharp and the contrast is good. Various metrics can be used to quantify pattern quality; these are described and compared in detail elsewhere (Wright and Nowell, 2006).
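A minimal sketch of the two enhancement steps named above, using plain Python lists as images. It assumes, hypothetically, that the background is a separately acquired, band-free intensity image (for example, an average over many grains, in which the Kikuchi bands blur out); commercial software may estimate and apply the background differently, for instance by division rather than subtraction. The function names are illustrative only.

```python
def average_frames(frames):
    """Pixel-wise mean of equally sized frames (each a list of rows)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(frame[r][c] for frame in frames) / n for c in range(cols)]
            for r in range(rows)]


def subtract_background(pattern, background, out_max=255.0):
    """Subtract a background image, then stretch the result to 0..out_max."""
    diff = [[p - b for p, b in zip(p_row, b_row)]
            for p_row, b_row in zip(pattern, background)]
    lo = min(min(row) for row in diff)
    hi = max(max(row) for row in diff)
    span = (hi - lo) or 1.0  # avoid division by zero for a featureless frame
    return [[out_max * (v - lo) / span for v in row] for row in diff]


# Toy 2 x 2 example: a smooth background plus a "band" pixel at (0, 1), with +/-1 noise
background = [[100.0, 110.0], [120.0, 130.0]]
frames = [[[101.0, 131.0], [121.0, 131.0]],
          [[99.0, 129.0], [119.0, 129.0]]]
averaged = average_frames(frames)
print(subtract_background(averaged, background))
```

Averaging suppresses the random noise, while removing the slowly varying background leaves the band contrast spread over the full intensity range, giving the uniform average intensity that automated indexing relies on.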
The diffraction pattern is evaluated automatically by the EBSD software in terms of a pattern quality parameter, crystallographic indexing, and a confidence parameter for the indexing. Kikuchi bands are detected automatically by a procedure that transforms the Kikuchi lines into points, since the location of a point (but not a line) can be determined very accurately. The Hough transform was adapted for this process (Krieger-Lassen, Juul Jensen and Conradsen, 1992). The equation of a straight line in (x, y) coordinate space can be written as

$$\rho = x\cos\phi + y\sin\phi \qquad (1)$$

where ρ is the perpendicular distance from the origin to the line and ϕ is the angle between that perpendicular and the x-axis. These variables are indicated in Figure 5a, in which the centerline of a Kikuchi band is defined by four points lying on it. Each point in (x, y) space maps to a sinusoidal curve in (ϕ, ρ) space (i.e., Hough space); because points on the same line share the same ρ and ϕ, their curves all pass through one common point in Hough space, as illustrated in Figure 5b.

FIGURE 4. Sequence showing online diffraction pattern acquisition and indexing. (a) Processed pattern from an austenitic steel. (b) Hough transform (medium resolution) of the diffraction pattern in (a). (c) Detection of bands in the processed pattern. (d) Indexing of the pattern. (See Color Insert.)

FIGURE 5. The principle of the Hough transform. (a) A single Kikuchi band in (x, y) coordinate space, in arbitrary units. (b) Locus of points 1, 2, 3, and 4 from (a) plotted in (ϕ, ρ) space (i.e., Hough space).

Figure 4b shows the Hough transform corresponding to the EBSD pattern in Figure 4a, in which Kikuchi bands are now represented by peaks of high intensity. A mask matched to the characteristic butterfly shape of these intensity peaks is used to sharpen them and hence to locate the centerlines of the Kikuchi bands accurately. The final part of Kikuchi band detection is reconstruction of the Kikuchi pattern in terms of the centerlines of the detected bands, as shown in Figure 4c; the reconstructed lines are a good match to the bands in the diffraction pattern. Once the Kikuchi bands have been detected, measurements of interplanar angles, interzonal angles, and band intensities are used to index the pattern, as shown in Figure 4d. The final step is to relate the indexed EBSD pattern to the real crystallographic planes and directions in the specimen via the geometric relationship between the pattern on the phosphor screen and the tilted specimen in the microscope. Hence, the orientation of the sampled volume is determined. More details on these procedures can be found elsewhere (Randle, 2003; Randle and Engler, 2000).
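The voting idea behind Eq. (1) can be sketched as follows: every point votes for all (ϕ, ρ) cells consistent with it, and collinear points pile their votes into a single cell. This bare-bones illustration assumes binary points on a band centerline and a coarse 1-degree by 1-unit accumulator; it omits the intensity weighting and butterfly masking described above, and the function name is hypothetical.

```python
import math


def hough_peak(points, n_phi=180, rho_step=1.0):
    """Return the (phi_deg, rho) accumulator cell with the most votes.

    Each point votes once per phi bin, using rho = x cos(phi) + y sin(phi) (Eq. 1).
    """
    votes = {}
    for x, y in points:
        for i in range(n_phi):
            phi = math.pi * i / n_phi  # phi sampled over [0, 180) degrees
            rho = x * math.cos(phi) + y * math.sin(phi)
            cell = (i, round(rho / rho_step))
            votes[cell] = votes.get(cell, 0) + 1
    (i, j), _ = max(votes.items(), key=lambda item: item[1])
    return 180.0 * i / n_phi, j * rho_step


# Points on a hypothetical band centerline with phi = 30 degrees and rho = 10
phi0 = math.radians(30.0)
line = [(10 * math.cos(phi0) - t * math.sin(phi0), 10 * math.sin(phi0) + t * math.cos(phi0))
        for t in range(-50, 51, 10)]
print(hough_peak(line))
```

The returned cell recovers the line's (ϕ, ρ) parameters. In a real pattern, many bands produce many peaks, which is why peak sharpening matters before the bands are reconstructed and indexed.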
III. THE ORIENTATION MAP AND DATA PROCESSING

Although in certain circumstances it is still desirable to obtain EBSD patterns manually from a specimen surface, EBSD data are almost always collected in the form of an orientation map, which has become recognized as the hallmark of EBSD. Once a region of a microstructure has been selected for study, a map is acquired by collecting and evaluating diffraction patterns from a grid of points on the specimen surface. The grid of points constitutes the spatial coordinates of the map, and the evaluation of each diffraction pattern provides the content of each pixel. Clearly, the choice of grid resolution, or step size, influences the detail in the map; as a rule of thumb, the step size is typically about one-tenth of the average grain size. For each sampling point, an EBSD data file contains the map coordinates, the orientation in Euler angle format, an indexing confidence indicator, the phase identity, and a pattern quality measure.

A. Core Concepts in EBSD Data Processing

The basic EBSD orientation data can be output as statistical representations such as pole figures. Once the gridded diffraction data have been acquired, a wealth of options exists for processing and data output. The core concepts are (1) the connection between microstructure coordinates (the map) and crystallographic orientation, (2) user-controlled partitioning of the data into sets based either on the map or on the accompanying statistical output, and (3) data exportability for advanced or customized analysis. Each is discussed briefly below.

1. The connection between microstructure coordinates (the map) and the orientation. In most cases, the output from an EBSD investigation is an orientation map. The map is a fully quantitative instrument because each pixel comprises a crystallographic orientation. These data can be tapped in various ways, allowing the user to connect the coordinates of the orientation map with individual or statistical distributions of orientations and misorientations, and with other measures of microstructure.
2. User-controlled data partitioning. Data partitioning is a very powerful aspect of EBSD data processing that greatly enhances its analysis capability. Data can be grouped into user-defined sets based either on the map or on the accompanying statistical output.

3. Data portability. Commercial EBSD software packages have been designed to provide a comprehensive range of standard options, such as microtexture and pole figure determination, as a general tool kit. For more customized analysis, however, EBSD data can readily be exported to other applications as a text file or as stored diffraction patterns.

B. Orientation Maps

The content of an orientation map can be chosen by appropriate selection of output options to provide maximum visual impact and appreciation of pertinent features. Furthermore, the map components can be directly related to a statistical distribution of the data (and vice versa), for example, by keeping the same color coding in both the map and the statistical distribution, such as a pole figure or histogram. The many possibilities for customized map components can be somewhat bewildering to an unfamiliar user; simple options may be chosen initially, and a more customized repertoire developed with experience.

Three of the most common parameters displayed in orientation maps, which can therefore be considered standard maps, are pattern quality, microtexture, and misorientation. A map of pattern quality (or diffuseness) most closely resembles an optical or secondary electron image; hence, it reveals the general microstructure. Decreasing pattern quality is frequently shown by progressively darker gray levels, with unsolved patterns in black. An additional advantage of the pattern quality map is that pattern quality can be displayed even if the pattern cannot subsequently be indexed.
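The sketch below shows one way a pattern quality map could be rendered from the per-point fields listed at the start of Section III. The `EbsdPoint` record and the linear gray scaling are hypothetical choices for illustration, not any vendor's file format or display algorithm; the optional flag reflects the point above that quality can still be shown for patterns that were not indexed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EbsdPoint:
    """One sampling point of a hypothetical EBSD data file."""
    x: float
    y: float
    euler: Optional[Tuple[float, float, float]]  # None if the pattern was not indexed
    confidence: float
    phase: int
    quality: float


def quality_gray_levels(points: List[EbsdPoint], ncols: int,
                        black_unsolved: bool = True) -> List[List[int]]:
    """Map pattern quality to 0-255 gray levels, row by row.

    Lower quality gives a darker gray. Unsolved points are set to black when
    black_unsolved is True; otherwise their measured quality is shown.
    """
    shaded = [p for p in points if p.euler is not None or not black_unsolved]
    lo = min(p.quality for p in shaded)
    hi = max(p.quality for p in shaded)
    span = (hi - lo) or 1.0
    grays = [0 if (black_unsolved and p.euler is None)
             else round(255 * (p.quality - lo) / span)
             for p in points]
    return [grays[i:i + ncols] for i in range(0, len(grays), ncols)]


# Hypothetical 2 x 2 map in row-major order; the last point was not indexed
pts = [EbsdPoint(0, 0, (10.0, 20.0, 30.0), 0.9, 1, 10.0),
       EbsdPoint(1, 0, (10.0, 20.0, 30.0), 0.8, 1, 20.0),
       EbsdPoint(0, 1, (45.0, 5.0, 60.0), 0.9, 1, 40.0),
       EbsdPoint(1, 1, None, 0.0, 0, 5.0)]
print(quality_gray_levels(pts, ncols=2))
print(quality_gray_levels(pts, ncols=2, black_unsolved=False))
```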
Pattern quality maps indicate variations in lattice strain and so are used as an aid to the study of deformation. Grain boundaries appear in the map as darker lines because they are regions of higher strain. These attributes of a pattern quality map are illustrated in Figure 6a.

Orientation mapping is used most frequently to represent microtexture, usually displayed in terms of either the Euler angles or a single specimen axis in the crystal coordinate system. A color (usually red, green, or blue) is allocated to each of the Euler angles or crystal axes, respectively, and this scheme links color in the map to orientation. A color key is then needed to decode the map in terms of its orientation information. Figure 6b is one type of microtexture map counterpart to Figure 6a, in which the crystallographic orientation of the specimen surface is depicted in colors; the color key is shown in Figure 6c. Note that this type of microtexture map shows only the orientation distribution normal to one specimen reference surface; maps showing the orientation distribution normal to the other two specimen surfaces can also be generated. There are many extensions to the basic microtexture map in terms of measures of orientation. Figure 7 shows a {111} pole figure comprising all the data in Figure 6b; in this case, the texture is almost random.

Orientation differences between adjacent pixels are used to delineate grain boundaries, phase boundaries, or subboundaries in maps. The most basic type of misorientation map shows high-angle boundaries and low-angle boundaries (above a lower threshold, say, 2 degrees) in different colors. Figure 6d is the misorientation map counterpart to Figure 6a. There are many variations on this basic grain boundary map, such as misorientation angle or axis categories, or coincidence site lattice (CSL) designations, which involve categorizing boundaries according to a Σ value (e.g., Randle, 2001). In Figure 6d, random high-angle (>15 degrees) boundaries are black, low-angle (3–15 degrees) boundaries are gray, Σ3 interfaces are red, Σ9 interfaces are blue, and Σ27 interfaces are yellow. Most of the Σ3 interfaces are annealing twins.

FIGURE 6. Basic orientation maps from an annealed nickel specimen. (a) Diffraction pattern quality map in which decreasing pattern quality is shown by increasing grayscale and unsolved patterns are depicted in black. (b) Microtexture map showing crystallographic orientations of the specimen normal direction. (c) Color key for the map in (b). (d) Misorientation map showing random high-angle (>15 degrees) boundaries in black, low-angle (3–15 degrees) boundaries in gray, and Σ3, Σ9, and Σ27 interfaces in red, blue, and yellow, respectively. (e) Raw data diffraction pattern quality map with unsolved pixels in black. (See Color Insert.)

FIGURE 7. {111} pole figure comprising all the data in Figure 6b.

In practice, multiple map components are often included in a single map. The range of values and the representation styles and colors for these and all other map components are specified mainly by the user to suit particular requirements.

The maps shown throughout this chapter have undergone appropriate noise-reduction processing. Various noise-filtering or cleanup algorithms are used to reduce the number of null pixels in the map, either by extrapolating data into adjacent regions of poor pattern solve success or by replacing single orientations that are obviously wrong. Pattern solve success is reduced at grain boundaries, both because two patterns may be sampled together and because the lattice in the sampled volume may be distorted; it is therefore appropriate to use filter routines so that grain boundaries are displayed as continuous lines in maps. Use of cleanup routines should be carefully controlled to avoid introducing artefacts into the data. Figure 6e shows a portion of Figure 6a before cleanup; the majority of unsolved points are at grain boundaries.

In addition to the three basic map types described above and shown in Figure 6, there are many variations on these types, and other components can be invoked to extract the information sought. For example, grain boundary information such as misorientation axes, specific microtexture information such as proximity to a particular orientation, concurrent chemical mapping, or more specialized information such as slip systems or Taylor factors (Hong and Lee, 2003) can be shown in a map. Furthermore, users can design customized map components.
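As a rough illustration of the "extrapolate into null pixels" style of cleanup described above, the sketch below fills unsolved pixels by majority vote among their eight neighbors. It assumes, for simplicity, that each solved pixel already carries a discrete grain or phase label; commercial routines typically work on orientations with an angular tolerance, so this is a hypothetical simplification rather than any package's algorithm. The `min_neighbors` parameter is one knob for keeping the cleanup conservative and limiting artefacts.

```python
from collections import Counter


def fill_unsolved(grid, min_neighbors=3, max_passes=10):
    """Fill None pixels from their 8-neighbors by majority vote.

    A null pixel is filled only if at least min_neighbors of its neighbors are
    solved; raising min_neighbors makes the cleanup more conservative. Updates
    are applied after each full pass, so a pass does not feed on its own output.
    """
    rows, cols = len(grid), len(grid[0])
    grid = [row[:] for row in grid]  # work on a copy
    for _ in range(max_passes):
        updates = {}
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] is not None:
                    continue
                neighbors = [grid[rr][cc]
                             for rr in range(max(r - 1, 0), min(r + 2, rows))
                             for cc in range(max(c - 1, 0), min(c + 2, cols))
                             if (rr, cc) != (r, c) and grid[rr][cc] is not None]
                if len(neighbors) >= min_neighbors:
                    updates[(r, c)] = Counter(neighbors).most_common(1)[0][0]
        if not updates:
            break
        for (r, c), label in updates.items():
            grid[r][c] = label
    return grid


raw = [[1, 1, 2],
       [1, None, 2],
       [None, 1, 2]]
print(fill_unsolved(raw))
```

In this toy map the corner pixel has too few solved neighbors on the first pass and is filled only on the second, which shows how repeated passes let the filled region grow outward from well-solved areas.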
IV. ESTABLISHED APPLICATIONS OF EBSD

The primary established applications of EBSD are measurement of orientation distributions, grain boundary misorientation, lattice strain, microstructure geometry, and phase differentiation or identification. These categories are not mutually exclusive, and the nature of EBSD analysis is such that much of this information is obtained concurrently.

A major category of EBSD application, for which automated EBSD was originally developed, is microtexture determination (e.g., Figure 6b). The term microtexture was coined to describe the measurement of texture on the scale of the microstructure (Randle, 1992). Microtexture is fundamentally different from traditional X-ray macrotexture because it is intimately related to features in the microstructure and, furthermore, the full orientation of the sampled volume of crystal is measured rather than the orientation of lattice planes. In the context of texture measurement, EBSD is typically used to measure the microtexture of regions of interest on a specimen (e.g., around a crack or for a particular grain size category). EBSD can also be used to measure macrotexture, the overall texture (Randle and Engler, 2000).

Another large category of EBSD application is the measurement of interface parameters (e.g., Figure 6d). An orientation is expressed relative to reference axes, but it can alternatively be expressed relative to another orientation, usually a neighboring one, in which case it is termed the misorientation. The concept of misorientation gives access to the study of interfaces and orientation relationships in polycrystals. This is a major application of EBSD, which has made possible the large-scale characterization of grain boundaries in polycrystals. Before the advent of EBSD, grain boundary crystallography was studied in the transmission electron microscope (TEM). EBSD now offers advantages over TEM in terms of scale, speed, and convenience, although TEM continues to be used for studies of dislocations. The development of EBSD has been a major factor in the progress made in understanding the links between grain boundary structure and properties.

Processing methodology and interpretation of EBSD data for misorientation studies require a different approach and different nomenclature from those for orientation determination (i.e., microtexture). Although it is a simple matter to produce a misorientation map automatically, extracting meaningful data from it requires considerable knowledge of the parameters involved. This is not a simple undertaking, because there is an array of possible options, some of which require customization of the standard processing options. One of the largest applications of EBSD to interfaces so far has been the categorization of misorientation statistics in metals and the linking of this information with patterns of intergranular degradation (e.g., Gourgues, 2002; Lee, Ryoo and Hwang, 2003).
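The misorientation concept above can be made concrete with a short sketch for cubic crystals. It assumes Bunge Euler angles, orientation matrices that map sample axes to crystal axes, and the misorientation Δg = g_B g_Aᵀ; the angle does not depend on these convention choices. Crystal symmetry is handled by taking the smallest rotation angle over the 24 proper cubic rotations, and the boundary classes reuse the 3 and 15 degree thresholds quoted for Figure 6d. Only the angle is computed, not the axis, and all function names are illustrative.

```python
import itertools
import math


def bunge_matrix(phi1, Phi, phi2):
    """Orientation matrix (sample -> crystal) from Bunge Euler angles in radians."""
    c1, s1 = math.cos(phi1), math.sin(phi1)
    c, s = math.cos(Phi), math.sin(Phi)
    c2, s2 = math.cos(phi2), math.sin(phi2)
    return [[c1 * c2 - s1 * s2 * c, s1 * c2 + c1 * s2 * c, s2 * s],
            [-c1 * s2 - s1 * c2 * c, -s1 * s2 + c1 * c2 * c, c2 * s],
            [s1 * s, -c1 * s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cubic_rotations():
    """The 24 proper cubic rotations: signed permutation matrices with det +1."""
    ops = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1, -1), repeat=3):
            m = [[0] * 3 for _ in range(3)]
            for row in range(3):
                m[row][perm[row]] = signs[row]
            if det3(m) == 1:
                ops.append(m)
    return ops


CUBIC_OPS = cubic_rotations()


def misorientation_deg(euler_a, euler_b):
    """Smallest rotation angle between two cubic orientations (Euler angles in degrees)."""
    g_a = bunge_matrix(*map(math.radians, euler_a))
    g_b = bunge_matrix(*map(math.radians, euler_b))
    dg = matmul(g_b, transpose(g_a))
    best = math.pi
    for op in CUBIC_OPS:
        m = matmul(op, dg)
        cos_angle = (m[0][0] + m[1][1] + m[2][2] - 1) / 2
        best = min(best, math.acos(max(-1.0, min(1.0, cos_angle))))
    return math.degrees(best)


def classify(angle_deg, low=3.0, high=15.0):
    """Boundary class using the thresholds quoted for Figure 6d."""
    if angle_deg < low:
        return "not a boundary"
    return "low-angle" if angle_deg <= high else "high-angle"


angle = misorientation_deg((0, 0, 0), (10, 0, 0))
print(round(angle, 3), classify(angle))
```

Because a 90 degree rotation about a cube axis is a symmetry operation, orientations such as (0, 0, 0) and (90, 0, 0) come out as identical, which is why the minimum over symmetry operators is essential.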
There are two routes for investigating the distribution of plastic strain in the microstructure using EBSD: diffraction pattern quality analysis (Kamaya, Wilkinson and Titchmarsh, 2006) and subboundary misorientation analysis (Humphreys, 2001). The quality or sharpness of the diffraction pattern degrades with increasing lattice strain (e.g., Figure 6a), and this degradation can be semiquantified. This application of EBSD is a useful microstructural tool because it allows, for example, the recrystallized fraction to be readily measured. Plastic strain can also be assessed by measuring the small misorientations associated with subboundaries in a deformed specimen. Strain is inferred from changes in the misorientation (i.e., the lattice rotation), with which it has an approximately linear relationship in the low-angle regime.

The EBSD diffraction pattern contains the full symmetry of the crystal and as such was used, when patterns were recorded on photographic film, to deduce the point group and space group of a specimen (Dingley, Baba-Kishi and Randle, 1994). In more recent years this approach has been updated and extended to the analysis of unknown phases by digitally recording a high-contrast, detailed diffraction pattern, combined with a spot chemical analysis from the same region. Nevertheless, EBSD has been used far less for phase differentiation or identification than for texture and grain boundary studies.

FIGURE 8. Orientation map showing an intergranular crack path in type 304 austenitic steel. Σ3, Σ9, and all other CSL boundaries are red, blue, and yellow, respectively. The gray background shows pattern quality, and unsolved pixels are black. (Courtesy D. Engleberg.) (See Color Insert.)

A further application of EBSD arises because a high-resolution orientation map not only provides an accurate portrayal of the crystallographic elements of the microstructure, particularly the grain structure, but also reveals features such as macroscopic specimen defects as unindexed points. In these respects an EBSD map has tremendous advantages over a conventional micrograph, because it portrays the microstructure in a digital format. The grain size distribution and other metrics of microstructure morphology, such as grain shape, can therefore be easily extracted. Figure 8 shows an example in which an intergranular crack is delineated in an orientation map taken from type 304 austenitic steel.
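Because the map is digital, extracting a grain size distribution amounts to grouping pixels into grains and measuring their areas. The sketch below does this with a breadth-first flood fill over 4-connected neighbors and reports equivalent circle diameters. It is a hypothetical sketch: a single scalar angle stands in for the full orientation, the `same_grain` test is a caller-supplied tolerance rule, and comparing each pixel only with its neighbor means that a gradual orientation drift can merge regions. A real analysis would use a full misorientation calculation instead.

```python
import math
from collections import deque


def segment_grains(grid, same_grain):
    """Label 4-connected regions whose neighboring pixels satisfy same_grain(a, b).

    Unsolved pixels (None) are left unlabeled. Returns (labels, pixel_counts).
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[None] * cols for _ in range(rows)]
    counts = []
    for r0 in range(rows):
        for c0 in range(cols):
            if grid[r0][c0] is None or labels[r0][c0] is not None:
                continue
            grain_id = len(counts)
            labels[r0][c0] = grain_id
            queue, count = deque([(r0, c0)]), 0
            while queue:
                r, c = queue.popleft()
                count += 1
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and grid[rr][cc] is not None and labels[rr][cc] is None
                            and same_grain(grid[r][c], grid[rr][cc])):
                        labels[rr][cc] = grain_id
                        queue.append((rr, cc))
            counts.append(count)
    return labels, counts


def equivalent_circle_diameters(counts, step):
    """Diameter of the circle with the same area as each grain (square grid)."""
    return [2.0 * math.sqrt(n * step * step / math.pi) for n in counts]


# Hypothetical map: a single scalar angle stands in for the full orientation
orientation_map = [[0.0, 1.0, 40.0, 41.0],
                   [0.5, 1.5, None, 40.5]]
labels, counts = segment_grains(orientation_map, lambda a, b: abs(a - b) < 5.0)
print(counts, [round(d, 2) for d in equivalent_circle_diameters(counts, step=2.0)])
```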
V. RECENT ADVANCES IN EBSD

The preceding sections summarized the development stages of EBSD and the basic components of an EBSD system. They also covered the evaluation of the output, both as crystallographic data (the diffraction pattern) and as crystallographic data combined with spatial data to give orientation maps. The applications of EBSD that have become routine practice, and toward which commercial EBSD software is geared, were then summarized. The use of EBSD in such applications has become incorporated into many materials investigations. In many ways, the basic EBSD processing technology reached a plateau of maturity when the landmark advances of automated pattern solving and orientation mapping occurred. Since then, developments have continued on a more incremental basis. The remainder of this review describes and illustrates some of the more significant advances in EBSD technology and applications that have emerged over approximately the past four years.

A. Overview of Advances in EBSD Technology

In recent years the biggest driver of EBSD hardware and software development has been the speed of data collection. This is because a major asset of EBSD has always been that data are acquired in large, statistically significant quantities from bulk specimens. This advantage is enhanced if these large data sets can be obtained in a time-efficient manner, allowing cost-effective use of the SEM and even more reliable statistics. At the time of writing, the maximum pattern solve rate on suitable materials is approximately 150–200 patterns per second (Anon., 2007a, 2007b, 2007c). There have also been progressive improvements in other aspects of data collection, particularly spatial resolution.
ELECTRON BACKSCATTER DIFFRACTION
Application of EBSD to interfaces has allowed the misorientations of grain boundaries to be characterized for large populations of boundaries. Typically such characterization has been linked to properties and/or to the evolution of the misorientation distribution as a consequence of processing (Tan, Sridharan and Allen, 2007). The CSL model has been used widely as a characterization tool. However, there is increasing realization that it is simply a geometrical model and is linked to grain boundary properties only in certain cases. The network properties of grain boundaries have also been explored via customized applications of the EBSD software (Section VI.B.1). Over the past decade or so a research area known as grain boundary engineering (GBE) has emerged. GBE uses processing regimes to shift the crystallography of grain boundaries toward geometries with better properties. The development of EBSD and orientation mapping has been a key driver in this area because it has allowed the routine collection of statistically significant quantities of misorientation data. Furthermore, EBSD technology has begun to be exploited more widely for the analysis of interphase interfaces and orientation relationships (Section VI.D.2). Recently the study of grain boundary populations by EBSD has undergone a radical change in approach, which can be used to supplement misorientation data. It is now possible to measure the density distribution of all five parameters (degrees of freedom) of a boundary. These five parameters comprise the misorientation (three parameters) and the crystallographic indices of the boundary plane itself (two parameters). The grain boundary plane has been recognized as a crucial determinant of boundary properties, and it has been measured manually for small data sets. Nevertheless, measuring it involves considerable technical challenges.
The new five-parameter methodology has overcome many of these challenges and is leading to an innovative stereological approach to research on interfaces (Section VI.B.2). Like microscopy-based techniques in general, EBSD and orientation mapping are essentially surface techniques: they provide information from a two-dimensional (2D) section of an opaque three-dimensional (3D) specimen. There has long been interest in accessing the full 3D microstructure, usually via serial sectioning, and EBSD is an ideal tool for this task. Orientation mapping combined with serial sectioning is made more powerful by reconstructing the sections to replicate the 3D microstructure. This, in itself, is an advance in EBSD technology. However, a new generation of SEM instrumentation has recently begun to take 3D EBSD to a new level. This is focused ion beam (FIB) tomography, in which thin slices of the specimen are sectioned in situ. The FIB facility can be combined with EBSD analysis in a single instrument, paving the way for 3D orientation maps (Section VI.C). Multiphase materials and phase identification are a slow-growing area of application for EBSD. An unknown phase can be identified by analysis of the
symmetry elements in the diffraction pattern, coupled with elemental analysis in the SEM and consultation of a library of possible phase matches (Michael, 2000). Several commercial systems are now available for phase identification by EBSD. A recent advance is that in situ phase identification has become practical (Section VI.D.1). Finally, it is already established that EBSD data can be exploited to assess the strain-related aspects of microstructure. There have been advances in this area, including assessment of elastic strain and links to dislocation content (Section VI.E). To summarize, five areas have been identified, each of which constitutes a recent advance in EBSD technology:
• Data collection optimization (Section VI.A)
• Interface characterization (Section VI.B)
• Three-dimensional EBSD (Section VI.C)
• Multiphase materials (Section VI.D)
• Strain assessment (Section VI.E)
Each of these topics is discussed in the section indicated.

B. Overview of Expansion in EBSD Applications
Both the improvements in EBSD technology listed in the previous section and a general familiarity with EBSD as part of the SEM analysis capability are resulting in an escalation of EBSD applications. Figure 9a confirms this trend by showing the numbers of EBSD-related publications from 2003 to 2006. The increase is approximately exponential, particularly since 2004. The breakdown of these statistics is discussed at the end of this review. Probably the most profound shift in EBSD application concerns its role. EBSD was originally conceived as a tool for crystallographic evaluation, but the launch of orientation mapping has also made it an invaluable tool for microstructure characterization. With increased mapping efficiencies it is now viable to collect an EBSD map in place of a traditional micrograph, both to depict the microstructure and to quantify its morphology. In some cases, access to all the underlying crystallographic data is only a secondary benefit. For example, in the map series in Figure 6 the grain structure and crystallography of the region are revealed in a map that was collected in a matter of minutes. Grain size distribution statistics can be computed readily by the EBSD software, which recognizes grain boundaries and hence grains. The map is digital and can therefore be input directly into various image analysis routines. Because it relies only on crystallographic characterization to determine a
grain, rather than on etching or other traditional methods, the orientation map is replacing optical microscopy for characterization of grain size. For example, the effect of second-phase particles on the rate of grain refinement has been studied in an aluminum alloy (Apps, Bowen and Prangnell, 2003). The misorientation threshold that defines a grain boundary or subboundary in an orientation map is user defined and will, of course, affect the calculated grain size. A more subtle consequence of using orientation mapping to define grains is that a grain is specified on the basis that the orientations of neighboring points lie within a certain small tolerance of each other. However, the cumulative spread of orientation within a grain can be relatively large, especially in deformed materials (Kamaya, Wilkinson and Titchmarsh, 2006). This calls into question whether it is always appropriate to assign a single orientation to a grain. An example of microstructure characterization by EBSD in place of, or as a supplement to, traditional optical microscopy was shown in Figure 8, where the crack morphology is revealed in the map. This benefit of EBSD (i.e., in some cases superseding optical microscopy for general microstructure
FIGURE 9. (a) Numbers of papers published each year this century that report EBSD, showing an approximately exponential increase (source: ScienceDirect). (b) Breakdown of EBSD application according to material type based on publications in 2003. (c) Breakdown of EBSD application according to material type based on publications in 2006. (d) Breakdown of EBSD applications according to metals (or alloys based on that metal) in 2006.
characterization) has only really come of age because mapping is now so rapid. Furthermore, modern EBSD allows microstructure to be defined fully as a combination of morphology, chemistry, and crystallography. This certainly constitutes an advance in EBSD and indeed in the way microstructure characterization is perceived in general. A cornerstone philosophy of materials science is that the properties of a material are intimately linked to its microstructure. Consequently, EBSD data are increasingly applied to extend knowledge of structure/property relationships, and a growing number of reports in the literature illustrate this point (Figure 9). A noteworthy advance is that EBSD data are increasingly being used as input to other analysis packages. This highlights the portability of EBSD data, which can be exported as a text file. Figure 10 captures how recent technical advances in EBSD are underpinning a range of investigations, with the ultimate goal of further understanding the link between microstructure and properties. Here circles depict technological developments, which are shown contributing to advanced microstructural characterization encompassing crystallography, chemistry, and morphology. In turn, these data contribute to the understanding of microstructure/property links, often via technological advances in the use of EBSD.
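The tolerance-based grain definition discussed above can be sketched in a few lines. The following Python example is a minimal illustration, not the algorithm of any commercial EBSD package. For simplicity, it assumes that each map point carries a single orientation angle about a shared axis, so the misorientation between neighbors is just the difference of two angles and crystal symmetry is ignored. Grains are grown by flood fill: a neighbor is added whenever its misorientation to the adjacent point is below a user-defined tolerance (`tol_deg`, a hypothetical parameter name). Unindexed points (`None`) are skipped. The second map shows the subtlety noted in the text. A steady orientation gradient is accepted as one grain even though its total spread far exceeds the tolerance.

```python
from collections import deque


def segment_grains(angles, tol_deg):
    """Label grains in a 2D map by flood fill.

    angles: 2D list of orientation angles in degrees (simplified to a single
    rotation about a common axis); None marks an unindexed point.
    Returns (labels, n_grains); unindexed points keep the label -1.
    """
    rows, cols = len(angles), len(angles[0])
    labels = [[-1] * cols for _ in range(rows)]
    n_grains = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1 or angles[r0][c0] is None:
                continue
            labels[r0][c0] = n_grains
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if labels[nr][nc] != -1 or angles[nr][nc] is None:
                        continue
                    # Compare with the adjacent point only, not with a grain mean.
                    if abs(angles[nr][nc] - angles[r][c]) < tol_deg:
                        labels[nr][nc] = n_grains
                        queue.append((nr, nc))
            n_grains += 1
    return labels, n_grains


def grain_sizes(labels, n_grains):
    """Number of map points (area in pixels) belonging to each grain."""
    sizes = [0] * n_grains
    for row in labels:
        for lab in row:
            if lab >= 0:
                sizes[lab] += 1
    return sizes


# Two grains 30 degrees apart, each 3 x 3 points.
two_grains = [[10.0, 10.0, 10.0, 40.0, 40.0, 40.0] for _ in range(3)]
labels, n = segment_grains(two_grains, tol_deg=5.0)
print(n, grain_sizes(labels, n))  # 2 [9, 9]

# A 1-degree-per-step orientation gradient: one "grain" with an 11-degree spread.
gradient = [[float(i) for i in range(12)]]
labels, n = segment_grains(gradient, tol_deg=2.0)
print(n, max(gradient[0]) - min(gradient[0]))  # 1 11.0
```

Comparing each point with its immediate neighbor, rather than with a running grain average, is what allows a gradual gradient to pass as a single grain. This is one reason why the choice of grain definition matters for deformed materials.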
FIGURE 10. Diagram illustrating the relationship between areas of recent advances in EBSD and their applications.
VI. ADVANCES IN EBSD TECHNOLOGY

This section details the advances in EBSD identified in Section V.A.

A. Data Collection Optimization

This section discusses recent improvements in data acquisition speed, spatial resolution, and orientation measurement precision, which together constitute the critical data collection efficiency parameters.

1. Camera Technology and Speed

Efficient data collection is an advantage in most EBSD investigations. An exception is the application of EBSD to phase identification, where the best-quality, most detailed pattern is required (Section VI.D.1). The total time required per map pixel is the sum of three components: the time taken to move to the map coordinates and site the electron probe, the time taken to collect a diffraction pattern, and the time taken to analyze the pattern. Map points are acquired by automatically
stepping the electron beam across the specimen surface within the field of view of the SEM image. Further fields of view can then be sampled automatically according to input specimen coordinates, and the analyzed fields can subsequently be stitched together. In this way, maps can be collected over many hours or even days, provided the microscope conditions remain stable. This method provides an efficient means of automated map collection. The greatest gains in data collection speed are due to improvements in camera technology. The improved quality of the captured diffraction pattern is due in part to an improved dynamic range of the camera: the number of distinguishable gray levels has increased tenfold with a CCD camera compared with a SIT camera. The improvement is also due to a recent technological advance whereby pixels in the diffraction pattern can be grouped together (binned) into super pixels, which increases the sensitivity of the camera. For example, binning a block of 8 × 8 pixels increases the camera sensitivity 64-fold (8 × 8 = 64), which in turn yields a corresponding reduction in diffraction pattern collection time. The gain in pattern collection speed brought about by binning is further enhanced by recent improvements in electronic processing, such as frame averaging and amplification of the captured diffraction pattern. This has led to increasingly fast mapping rates, depending on the material. For example, a map was acquired from a martensite-ferrite steel in 12 minutes (i.e., 88 points per second). In another example, a map was acquired from a nickel superalloy at a rate of 70 points per second, with 99% of the data indexed correctly (Dingley, 2004). It is important to realize that these gains in speed do not incur any loss in indexing accuracy or spatial resolution.
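Binning and the per-pixel time budget described above can be illustrated with simple arithmetic. The Python sketch below is an idealized model and is not how any particular camera implements binning. It assumes that a super pixel simply sums the counts of its k × k constituent pixels and that noise is ignored. The function names `bin_pattern` and `map_time_s` and the timing values are hypothetical. The sketch shows three things: the 64-fold signal gain from 8 × 8 binning, the map time obtained by summing the move, collection, and indexing times per point, and the number of points implied by the steel example (88 points per second for 12 minutes).

```python
def bin_pattern(pattern, k):
    """Sum each k x k block of a 2D intensity array into one super pixel."""
    rows, cols = len(pattern), len(pattern[0])
    if rows % k or cols % k:
        raise ValueError("pattern dimensions must be multiples of k")
    return [
        [sum(pattern[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(0, cols, k)]
        for r in range(0, rows, k)
    ]


def map_time_s(n_points, t_move_s, t_collect_s, t_index_s):
    """Total map time: each point costs probe move + pattern collection + indexing."""
    return n_points * (t_move_s + t_collect_s + t_index_s)


# A uniform 16 x 16 pattern with 1 count per pixel, binned 8 x 8.
binned = bin_pattern([[1.0] * 16 for _ in range(16)], 8)
print(binned)  # [[64.0, 64.0], [64.0, 64.0]]

# In this idealized model the exposure needed for a given signal scales as 1/k^2.
exposure_unbinned_s = 0.064  # hypothetical value
print(exposure_unbinned_s / 8**2)  # 0.001

# Hypothetical per-point times for a 100 x 100 map.
print(map_time_s(100 * 100, 0.001, 0.010, 0.005))  # about 160 s

# Points implied by the steel example: 88 points/s for 12 minutes.
print(88 * 12 * 60)  # 63360
```

Because the three per-point costs add, shortening pattern collection by binning only pays off fully if probe positioning and indexing are also fast.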
The data collection speed achievable in individual cases depends on the type of SEM, EBSD camera, operating parameters, and specimen type. For example, the backscattered electron signal increases with atomic number. High atomic number materials therefore yield stronger diffraction patterns and can in general be analyzed faster than low atomic number species. Degraded diffraction patterns increase analysis time; therefore, specimen preparation must always be meticulous. At the time of writing, a new generation of EBSD cameras has achieved mapping speeds of up to 200 patterns per second (Anon., 2007b). The latest generation of cameras also has distortion-free lenses and a rectangular phosphor screen, replacing the previous circular screen, so that the entire diffraction pattern is captured and used. The camera is also shaped so that it can be moved physically close to the specimen, which increases both the camera sensitivity and the spatial resolution (Anon., 2007c). Once the diffraction pattern has been collected, the speed of pattern analysis and indexing depends on the processing speed of the computer and the selections
made in the pattern-solving algorithm. Some issues of indexing ambiguity are discussed elsewhere (Nowell and Wright, 2005). The main influential variables are the resolution of the Hough transform and the number of Kikuchi bands used to recognize the pattern (typically 3–8). The pattern indexing rate is increased by using a low-resolution Hough transform and few Kikuchi bands. However, these selections must be tempered by the requirement to optimize the proportion of indexable patterns, sometimes called the hit rate. Unsolved patterns have two causes. Either the electron probe was sited on a region where the diffraction pattern is corrupted (typically at a grain boundary or a surface blemish), or the Hough transform resolution and/or the number of Kikuchi bands was set too low. Finally, the operator must judiciously select the acceptance criterion for correct indexing, which affects both the speed of map generation and the solve accuracy. The factors discussed in this section are now illustrated by an example from annealed copper. In practice, consider meticulously prepared specimens that provide high-quality diffraction patterns. For these, the optimum mapping speed for a particular EBSD camera and system is determined primarily by balancing three choices: the binning level, the diffraction pattern noise reduction, and the hit rate tolerance. Figure 11 shows the effect of these variables. First, a reference map with a maximized hit rate of 99.1% was collected using 4 × 4 binning and a high level of noise reduction (obtained by repeated frame averaging). Figure 11a shows the as-collected reference map. The mean cycle time (i.e., the total time per data point) was 0.274 s/pixel. Next, the binning level was increased to 8 × 8 and the noise reduction was set to the lowest level (i.e., a single-frame average pass). This resulted in a mean cycle time of 0.020 s/pixel and a hit rate of 90.5%.
This was considered an acceptable combination of speed and hit rate for the SEM capabilities and the investigation in question. Note that the hit rate need not be maximized because certain unsolved points can be filled in during cleanup routines. Figure 11b shows the as-collected, speed-optimized map, and Figure 11c shows the same map after cleanup processing. This illustrates that a proportion of unsolved points is acceptable if they can be filled in without materially altering the data.

2. Resolution

Resolution, with regard to EBSD, can refer either to spatial resolution or to angular resolution, that is, the orientation measurement precision. The precision of orientation measurement by EBSD has been found experimentally to be approximately 0.5–1.0 degrees, and better in the best cases in an FEGSEM (El Dasher, Adams and Rollett, 2003). This precision applies to patterns of optimum quality and is reduced significantly for poor-quality patterns. EBSD
FIGURE 11. Illustration of the effect of map-processing parameters, taken from an annealed copper specimen. (a) As-collected reference map. (b) As-collected map optimized for speed. (c) Postprocessed optimized map. The maps show a pattern quality background with grain boundaries in black.
can now also be applied in an environmental SEM (Sztwiertnia, Faryna and Sawina, 2006). A recent advance in EBSD is that better resolutions are obtainable, first through the use of an FEGSEM rather than a conventional SEM and second through the improvements in camera technology described in Section VI.A.1. This improved resolution is an advantage for several types of investigation, including studies of deformed materials, of nanocrystals, and of regions close to grain boundaries. The move to the FEGSEM has therefore extended the range of EBSD applications, especially into areas that were traditionally the province of TEM. For example, the size of the structural elements in severely deformed copper has been determined by EBSD and found to be larger than that determined by TEM (Dobatkin et al., 2007). The theoretical limit to spatial resolution is governed by the volume of specimen sampled by the electron probe. This volume in turn depends on the SEM capabilities, probe current, material, and accelerating voltage. For a flat specimen the sampled volume would project as a circle on the specimen surface. For EBSD, however, the high specimen tilt angle (usually 70 degrees) distorts this shape into an ellipse with an aspect ratio of (tan 70 degrees):1, that is, approximately 3:1. In other words, the extent of the probe footprint on the specimen surface
is about three times smaller parallel to the tilt axis than perpendicular to it. The depth resolution is approximately 40 nm for silicon and 10 nm for nickel (Dingley, 2004). It is reported that, in iron at a 15 kV accelerating voltage, half the electrons emerge from a depth of 2 nm (Zaefferer, 2007). In theory, the smallest probe sizes would give the best resolution. In practice, however, the effective resolution is a more useful concept. It takes into account that, for smaller probe sizes, the signal-to-noise ratio (SNR) falls to a level that significantly degrades the diffraction pattern. If the probe current is too small, therefore, the resolution is reduced (Humphreys, 2004). A typical probe current is 0.1 nA. On the other hand, if the probe is too large, more than one diffraction pattern may be sampled concurrently. Increasing the number of frames over which the diffraction pattern is averaged improves the quality of the diffraction pattern and hence the effective resolution, albeit with a time penalty. The type of SEM has a major effect on spatial resolution. An FEGSEM is capable of some threefold improvement in spatial resolution compared with a tungsten-filament SEM, with LaB6 SEM performance lying between the two. The best spatial resolution for a material such as brass, for example, is 25–50 nm in a tungsten-filament SEM and 9–22 nm in an FEGSEM, both measured parallel to the tilt axis (Dingley, 2000; Humphreys, 2001). A resolution of 6–9 nm in a nickel superalloy has been reported (Dingley, 2004). A further advantage of EBSD in the FEGSEM is that resolution is much less sensitive to probe current than in a tungsten-filament SEM. The atomic number of the specimen also affects resolution: higher atomic number species give better resolution because the backscattering signal increases with atomic number. Figure 12 shows effective resolutions for aluminum, brass, and iron in both a tungsten-filament SEM and an FEGSEM.
Beam spread increases with accelerating voltage, so the voltage should be kept as low as practicable. In a tungsten-filament SEM the accelerating voltage has typically been 20 kV. The latest FEGSEMs, by contrast, provide sufficient beam current at typically 5–10 kV, allowing further gains in resolution.

B. Interface Characterization

Routine characterization of interfaces by EBSD is based on the misorientation between neighboring grains. Grain boundaries and other interfaces are then delineated in maps according to some chosen attribute of their misorientation (as shown in Figure 6d). Statistics of the misorientation distribution can also be derived and output; these are standard options in EBSD analysis packages. Recently there have been significant extensions to this basic approach, encompassing the connectivity of the grain boundary network and the measurement of interface planes. These topics are discussed in this section.

FIGURE 12. Effective EBSD spatial resolution for various metals in tungsten-filament and FEG SEMs. (From Humphreys, 2004.)

1. Connectivity

Studying the connectivity of interfaces is informative because many interfacial phenomena that control the behavior of materials are governed by how the various interfaces are connected, not by the crystallography of individual interfaces alone. Examples include diffusion and interfacial corrosion. Two methods of studying interface connectivity by EBSD are grain junction assessment and percolation analysis. For both methods, standard EBSD procedures have recently been modified and extended. In a stable microstructure three grain boundaries meet at a triple junction. The misorientations of the three constituent boundaries are linked such that the misorientation of any one boundary in the junction is determined by those of the other two. In practice, this is important when junctions contain one or more boundaries with a special crystallography, because such junctions could arrest transport phenomena through the grain boundary network. These special boundaries are often identified as CSL boundaries, and EBSD can be used to categorize junctions by the number of CSL boundaries they contain. In
FIGURE 13. Constitution of triple junctions in nickel in terms of twin-related boundaries (S) and random boundaries (R), before and after grain boundary engineering.
recent years, statistics on triple junctions have begun to appear in reports alongside grain boundary proportions (Schuh, Kumar and King, 2003). For example, Figure 13 shows the constitution of triple junctions in nickel, in terms of twin-related boundaries (S) and random boundaries (R), before and after grain boundary engineering. The statistics demonstrate that processing has increased the proportion of triple junctions that contain twin-related CSLs. Other metrics are beginning to be used to assess the connectivity and topology of the grain boundary network. Percolation theory is relevant to the length of connected intergranular pathways or clusters through the microstructure. A simple approach is to consider that such pathways consist only of boundaries with random geometries. A skeletonized EBSD map can be used to show the evolution of connected random boundaries with processing. The data can then be exported to other applications to quantify the percolation characteristics. This has been carried out, for example, on a grain boundary–engineered nickel-base alloy (Schuh, Kumar and King, 2003).
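Two ideas above can be sketched in Python: the constraint that links the three misorientations at a triple junction, and the classification of junctions by their number of CSL boundaries. The example is an illustration under stated assumptions, not the procedure of any particular EBSD package. Misorientations are 3 × 3 rotation matrices composed by multiplication, with the convention that the A|C misorientation equals (B|C) × (A|B). This order is an assumption, since orientation conventions differ between systems. Only the misorientation angle is reduced by the 24 rotations of the cubic point group. Composing two Σ3 twins (60 degrees about ⟨111⟩) gives a third boundary that is either Σ1 (0 degrees, i.e., no boundary) or Σ9 (about 38.9 degrees), depending on whether the two twins share the same ⟨111⟩ axis. The junction tally uses hypothetical boundary labels.

```python
import itertools
import math
from collections import Counter


def rotation(axis, angle_deg):
    """Rotation matrix for angle_deg about axis (Rodrigues' formula)."""
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    t = 1.0 - c
    return [
        [c + t * x * x, t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, c + t * y * y, t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, c + t * z * z],
    ]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cubic_rotations():
    """The 24 proper rotations of the cubic group: signed permutation matrices with det +1."""
    ops = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1, -1), repeat=3):
            m = [[0] * 3 for _ in range(3)]
            for row, col in enumerate(perm):
                m[row][col] = signs[row]
            if det3(m) == 1:
                ops.append(m)
    return ops


CUBIC = cubic_rotations()


def misorientation_angle(m):
    """Smallest rotation angle (degrees) among the symmetry-equivalent forms S*m."""
    best_trace = max(sum(matmul(s, m)[i][i] for i in range(3)) for s in CUBIC)
    cos_angle = max(-1.0, min(1.0, (best_trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def junction_types(junctions, is_special):
    """Count junctions by how many of their three boundaries are special."""
    return Counter(sum(1 for b in j if is_special(b)) for j in junctions)


sigma3_a = rotation((1, 1, 1), 60.0)   # boundary A|B
sigma3_b = rotation((1, 1, -1), 60.0)  # boundary B|C
# The third boundary, A|C, follows from the other two.
print(round(misorientation_angle(matmul(sigma3_b, sigma3_a)), 2))  # 38.94 (Sigma9)
print(round(misorientation_angle(matmul(sigma3_a, sigma3_a)), 2))  # 0.0 (Sigma1)

# Hypothetical junctions labelled by boundary type (S = CSL, R = random).
tally = junction_types([("S3", "S3", "S9"), ("S3", "R", "R"), ("R", "R", "R")],
                       lambda b: b.startswith("S"))
print(sorted(tally.items()))  # [(0, 1), (1, 1), (3, 1)]
```

Reducing by symmetry on one side only is sufficient for the angle, because applying a symmetry operation on the other side is equivalent to a conjugation, which leaves the rotation angle unchanged. A full disorientation (angle plus axis) would need both sides.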
2. The Interface Plane

Characterization of a grain boundary frequently refers only to its misorientation, usually expressed as an angle and axis of misorientation, but this is an incomplete description. Five independent parameters (degrees of freedom) are required to parameterize a grain boundary: three for the misorientation and an additional two for the boundary plane or surface between the two neighboring grains. These five parameters give rise to a huge number of possible grain boundary geometries. For example, between two cubic crystals there are about 10⁵ possible different boundaries, assuming a 5-degree resolution (Saylor, Morawiec and Rohrer, 2003). The crystallographic orientation of the boundary plane has a strong influence on boundary properties. For example, the Σ3 misorientation, 60 degrees about ⟨111⟩, is well known to have boundary planes that include {111} (the coherent twin) and {112} (the incoherent twin). Although the misorientation is the same in both cases, the Σ3 on {111} is almost immobile, whereas the Σ3 on {112} is highly mobile. It is now thought that the boundary plane is often more important than the misorientation in determining boundary properties (Randle et al., 2006). Given the importance of the boundary plane, it is at first surprising that it has been largely omitted from EBSD data. The reason is that a grain boundary appears only as a line segment (its trace) in the 2D metallographic section. The rest of the grain boundary surface is buried in the opaque specimen and requires additional techniques to access it. This usually involves depth-calibrated serial sectioning to recover the grain boundary inclination, which is a tedious and difficult endeavor. However, if both the boundary inclination and the boundary surface trace are known, the Miller indices of the boundary plane can be calculated from the orientations of the neighboring grains. This calculation assumes that the boundary is planar within the sectioned depth.
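The geometric step described at the end of the preceding paragraph can be sketched as follows. This minimal Python example assumes a set of conventions that are not taken from the source:

- The sample frame has z along the section normal.
- The boundary inclination is the dihedral angle between the boundary plane and the section surface (90 degrees for a boundary perpendicular to the section).
- The grain orientation is a rotation matrix `g` that maps sample-frame vectors into the crystal frame.

The plane contains the trace direction and a direction running into the specimen at the inclination angle. The plane normal is therefore their cross product, which is then expressed in crystal coordinates. With the trace along [1 -1 0] and an inclination of about 54.74 degrees in a grain whose axes coincide with the sample axes, the recovered normal is [111], the plane of a coherent twin.

```python
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def boundary_normal_sample(trace_xy, inclination_deg):
    """Unit normal of a planar boundary in the sample frame.

    trace_xy: trace direction on the section surface (x, y).
    inclination_deg: dihedral angle between boundary and section surface.
    """
    t = normalize((trace_xy[0], trace_xy[1], 0.0))
    p = cross((0.0, 0.0, 1.0), t)  # in-surface direction perpendicular to the trace
    a = math.radians(inclination_deg)
    # In-plane direction that dips into the specimen (towards -z) at angle a.
    d = (math.cos(a) * p[0], math.cos(a) * p[1], -math.sin(a))
    return normalize(cross(t, d))


def to_crystal(g, v):
    """Express a sample-frame vector in the crystal frame: v_c = g v_s."""
    return tuple(sum(g[i][k] * v[k] for k in range(3)) for i in range(3))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
incl = math.degrees(math.atan(math.sqrt(2.0)))  # about 54.74 degrees
n_c = to_crystal(identity, boundary_normal_sample((1.0, -1.0), incl))
print([round(c * math.sqrt(3.0), 6) for c in n_c])  # [1.0, 1.0, 1.0]
```

The sign of the dip matters. The same trace and inclination measured toward the other side of the trace gives a different plane, so a real analysis must record which way the boundary dips.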
In the 1990s, EBSD was adapted to obtain grain boundary plane geometry via serial sectioning and manual data collection (Randle, 1995). The approach was not widely adopted by the materials community because of the technical difficulties involved. More recently, interest has grown in using the information provided by the grain boundary trace direction on the specimen surface and dispensing with the serial sectioning step. The crystallographic boundary trace direction plus the misorientation between neighboring grains provides four of the five boundary parameters. These can be used to check whether certain well-defined criteria are met, for example, whether the boundary could be a twin (Randle and Davies, 2002). The basis for this calculation is that the trace vector T must lie in the boundary plane, which has normal N, so T must be perpendicular to N. For a coherent annealing twin boundary, N is ⟨111⟩. Hence the scalar product of N and T, which must be zero for a twin, determines whether or not it is possible for the boundary
FIGURE 14. Σ3 boundaries (white) in a nickel-based superalloy. Those boundaries selected for trace analysis are numbered (see text for details).
to be a twin. For example, Figure 14 shows Σ3 boundaries depicted in white. For grain boundary 1/2 the angle between T and ⟨111⟩ is 88.3 and 87.8 degrees in the two neighboring grains, respectively. The boundary also deviates by only 0.3 degrees from the Σ3 reference misorientation. This boundary is therefore probably a coherent twin, and similarly for boundary 3/4. On the other hand, for boundary 5/6 in Figure 14 the angle between T and ⟨111⟩ is 76.6 and 73.8 degrees in the two neighboring grains, respectively; therefore, this boundary cannot be a coherent twin. The identification of possible twins in this way has been automated by an algorithm that extracts boundary trace positions from EBSD orientation maps (Wright and Larsen, 2002). This algorithm has been used to investigate twinning in zirconium, nickel, and copper. Importantly, it is also used as part of a procedure to determine all five boundary parameters automatically from a single section (see below). The boundary trace reconstruction routine works on an orientation map in which grains have been identified as groups of similarly oriented points. Grain boundaries are then defined from these grains according to a predefined tolerance. Triple junctions, where three grains meet, are then identified by the software. If one of the boundary segment paths is followed until the next triple junction is encountered, a first attempt at reconstructing the boundary trace can be made by joining the two neighboring triple junctions, as shown in Figure 15a. However, the grain boundary is rarely a straight line between the two junctions, so the reconstructed trace needs to be segmented in
FIGURE 15. Illustration of the boundary trace reconstruction routine. (a) First reconstruction attempt, by joining adjacent triple junctions. (b) Segmentation of the reconstructed trace. (c) Small map in which reconstructed boundaries are superimposed on true boundaries. Grains are colored randomly.
order to follow the true boundary more closely. This is done by locating the point on the true boundary farthest from the reconstructed boundary. If the perpendicular distance between this point and the reconstructed boundary exceeds a predefined tolerance, the reconstructed boundary is split into two line segments, as shown in Figure 15b. This procedure is repeated until all points on the true boundary are within the tolerance distance of the reconstructed boundary. Figure 15c shows a small map in which reconstructed boundaries are superimposed on true boundaries. To minimize errors, a small step size must be used to generate the map, so that the boundary positions are reproduced as faithfully as the discrete measurement grid allows. The segmenting process must then aim to reproduce the true boundary rather than any noise along the boundary length. It is therefore essential that a small tolerance (e.g., twice the map step size) be chosen to reconstruct boundaries. The expected error for a boundary six times longer than the scan step size would be ±2 degrees (Wright and Larsen, 2002). Finally, an average orientation from each neighboring grain is associated with each segment of the reconstructed boundary trace; measurement points between one and five steps from the boundary are used in the averaging. Recently a method has been devised to measure the five-parameter grain boundary distribution. This is defined as the relative frequency of occurrence of a grain boundary in terms of misorientation and boundary plane normal, expressed in multiples of a random distribution (MRD). The original five-parameter analysis did not use automatically reconstructed boundaries. Instead, it used SEM images to determine the boundary positions, coupled with serial sectioning. One of the first materials to be analyzed was a hot-pressed polycrystalline MgO specimen (Saylor, Morawiec and Rohrer, 2003).
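The split-at-the-farthest-point procedure described above is essentially the Ramer–Douglas–Peucker line simplification algorithm. The Python sketch below implements that generic algorithm. It assumes that a boundary path is available as an ordered list of (x, y) pixel positions running from one triple junction to the next. It is not the code of Wright and Larsen (2002), and `segment_trace` and `tol` are hypothetical names. Following the text, `tol` would be set to roughly twice the map step size.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def segment_trace(points, tol):
    """Approximate an ordered boundary path by straight segments.

    Starts with the chord joining the end points (the two triple junctions)
    and splits it at the farthest path point while that distance exceeds tol.
    Returns the vertices of the reconstructed trace.
    """
    if len(points) < 3:
        return [points[0], points[-1]]
    a, b = points[0], points[-1]
    idx, dmax = max(
        ((i, perpendicular_distance(points[i], a, b))
         for i in range(1, len(points) - 1)),
        key=lambda item: item[1],
    )
    if dmax <= tol:
        return [a, b]
    left = segment_trace(points[: idx + 1], tol)
    right = segment_trace(points[idx:], tol)
    return left[:-1] + right  # drop the duplicated split vertex


# An L-shaped boundary path on a unit-step grid, tolerance = 2 x step size.
path = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(segment_trace(path, tol=2 * 1.0))  # [(0, 0), (3, 0), (3, 3)]
```

With a looser tolerance (for example 2.5 steps) the same path collapses to the single chord between the two junctions. This illustrates why the text insists on a small tolerance.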
Grain boundaries were digitized manually using a program that allowed the operator to trace, and hence skeletonize, the boundaries with a computer mouse. A coarse orientation map of the same region was matched to the SEM image so that an orientation could be assigned to each grain. Information on the inclination of each boundary was then obtained by sequential, high-precision serial sectioning; data from three to five layers were collected. Once the geometry and orientation of all grains on all layers had been determined, adjacent layers were aligned to establish the connectivity of grains between them, and a meshed surface was created between adjacent layers. For each mesh element, the area of the boundary plane, the misorientation across the plane, and the crystallographic normal to the plane were then specified. This procedure allows the complete five-parameter grain boundary distribution function, encompassing both the misorientation and the boundary plane, to be calculated. Details of how the five-dimensional domain of
400
RANDLE
misorientations and planes is parameterized, taking into account that crystal symmetry leads to numerous symmetrically equivalent boundaries, are given elsewhere (Saylor, Morawiec and Rohrer, 2003). Essentially, the domain is normalized and partitioned into cells of equal volume, so that the grain boundary distribution falling in each cell can be expressed as a multiple of a random distribution (MRD). For MgO, it was shown that grain boundaries most frequently adopted asymmetric configurations, with the boundary plane parallel to {100} in one of the two neighboring grains.
Considerable technical challenges were involved in the original experimental procedure, mainly owing to the need for serial sectioning. Errors arise both in montaging neighboring images within a single section layer and, more especially, in aligning adjacent layers in the same global reference frame, and these must be taken into consideration. The resolution of the sectioning process is far lower than that of the microscopy, leading to further errors. Furthermore, serial sectioning preferentially reveals planes that are perpendicular to the analysis surface, which introduces a sampling bias. Finally, the entire procedure is time consuming. For these reasons, the five-parameter methodology has recently been refined so that the grain boundary distribution is estimated from a single section through the microstructure, obviating the need for serial sectioning. The stereological procedure for extracting boundary plane data from a single section has been adapted from established methods for determining the habit planes of embedded crystals. The details of the method are described elsewhere (Saylor et al., 2004), and only a brief summary of the principles is given here. The procedure uses EBSD mapping both to obtain grain orientations and to reconstruct the boundary segment traces as described.
This provides four of the five independent parameters needed to describe the grain boundary; only the boundary inclination angle is unknown. However, it is known that the boundary plane must belong to the set of planes whose normals are perpendicular
FIGURE 16. Schematic illustration of the principle of the five-parameter stereology method. T is the boundary trace direction and N is the boundary plane normal. If a sufficiently large number of observations are made, the true boundary plane will accumulate more than the false planes and will form a peak in the distribution at N. See text for details. (Courtesy G. Rohrer.) (See Color Insert.)
to the boundary trace on the single section. These normals lie on a great circle in the stereographic projection, as shown in Figure 16. Because the sample population of boundary traces is large (at least 50,000 trace segments for a material with cubic symmetry), many crystal pairs have the same misorientation but different boundary planes. Each boundary trace generates a set of possible boundary planes. The probability that the set contains the true boundary plane is 1, whereas the probability that it contains any given false plane is less than 1. Therefore, if a sufficiently large number of observations is made, the true boundary plane accumulates more counts than the false planes and forms a peak in the distribution. The background (i.e., the false planes) is then removed on the assumption of random sampling, as illustrated schematically in Figure 16. From the remaining peaks, the ratios of the observed line lengths specify the relative areas of each boundary type, that is, the five-parameter grain boundary distribution. The new single-section method for estimating the five-parameter boundary distribution has been tested against both simulated data and known boundary distributions obtained by serial sectioning (Saylor et al., 2004). Agreement was very good, provided that sufficient data were included in the stereological analysis (i.e., at least 50,000 boundary segments for cubic materials and more for noncubic materials). With the advent of rapid EBSD orientation mapping, as described in Section VI.A, such quantities of data can be collected easily and with high spatial resolution, which makes this technique clearly preferable to serial sectioning with its inherent difficulties. The five-dimensional (5D) grain boundary distribution function is analyzed by visual examination of various sections through the 5D space.
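The accumulation argument can be checked with a small simulation. The Python sketch below is not the algorithm of Saylor et al. (2004); it models a single boundary type whose plane normal is fixed in the crystal frame and assumes, for illustration, that section planes are oriented at random with respect to the crystal. For each simulated trace it votes for every test direction lying within a tolerance (here 5 degrees, an arbitrary choice) of the great circle of normals perpendicular to the trace, and then converts the votes to MRD by dividing by the mean count. Background subtraction is omitted.

```python
import math
import random


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def random_unit_vector(rng):
    # Isotropic direction: normalized vector of three Gaussian samples.
    while True:
        v = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        if dot(v, v) > 1e-12:
            return normalize(v)


def fibonacci_directions(n):
    # Roughly uniform set of n test directions covering the whole sphere.
    golden = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        dirs.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return dirs


def accumulate_trace_votes(true_normal, test_dirs, n_traces, tol_deg=5.0, seed=0):
    rng = random.Random(seed)
    sin_tol = math.sin(math.radians(tol_deg))
    votes = [0] * len(test_dirs)
    for _ in range(n_traces):
        section_normal = random_unit_vector(rng)
        # The trace lies in both the boundary plane and the section plane.
        trace = cross(true_normal, section_normal)
        if dot(trace, trace) < 1e-12:
            continue  # section parallel to the boundary: no trace
        trace = normalize(trace)
        for k, d in enumerate(test_dirs):
            # Candidate normals are perpendicular to the trace (a great circle);
            # d is within tol of that circle when |d . trace| < sin(tol).
            if abs(dot(d, trace)) < sin_tol:
                votes[k] += 1
    return votes


def votes_to_mrd(votes):
    # Multiples of a random distribution: each count divided by the mean count.
    mean = sum(votes) / len(votes)
    return [v / mean for v in votes]


true_normal = (0.0, 0.0, 1.0)
test_dirs = [true_normal, (1.0, 0.0, 0.0)] + fibonacci_directions(300)
votes = accumulate_trace_votes(true_normal, test_dirs, n_traces=2000)
mrd = votes_to_mrd(votes)
print(f"MRD at true normal: {mrd[0]:.1f}; MRD 90 degrees away: {mrd[1]:.2f}")
```

Every trace votes for the true normal, whereas a direction 90 degrees away is caught only occasionally, so the true plane emerges as a peak well above 1 MRD.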
It is physically convenient to maintain the distinction between misorientation (three parameters) and boundary plane (two parameters). Misorientation axes and angles, and combinations thereof, can be displayed using standard facilities available in EBSD software packages. Data specific to the five-parameter analysis (i.e., boundary plane distributions) are shown in stereographic projection, in which the stereogram is referred to the crystal axes and the resolution is usually 10 degrees. The five-parameter grain boundary analysis technique is illustrated here with an example from annealed brass. Figure 17a shows the distribution of boundary planes in the brass specimen, averaged over all misorientations and expressed in MRD. There are strong maxima at {111} and minima at {100}. In this specimen, 60% of the boundary length is Σ3, much of which consists of coherent annealing twins, and these twins account for a large proportion of the {111} peak. However, when the Σ3 boundaries are excluded from the data, the same trend is still revealed in the general boundary population, namely, a small maximum at {111} (MRD value 1.4) and a minimum at {100} (Figure 17b).
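Misorientation angle/axis descriptors such as 30 degrees/[111] are obtained by reducing the misorientation with the crystal symmetry. The Python sketch below assumes cubic symmetry and orientation matrices g that transform sample coordinates into crystal coordinates; it returns the smallest rotation angle over the 24 symmetrically equivalent descriptions. The accompanying axis is one symmetry-equivalent choice, not necessarily the one in the standard fundamental zone that commercial packages report.

```python
import itertools
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cubic_rotations():
    """The 24 proper rotations of the cubic point group (signed permutation matrices, det = +1)."""
    ops = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1, -1), repeat=3):
            m = [[0, 0, 0] for _ in range(3)]
            for row, col in enumerate(perm):
                m[row][col] = signs[row]
            if det3(m) == 1:
                ops.append(m)
    return ops


def axis_angle_matrix(axis, angle_deg):
    """Rotation matrix for a rotation of angle_deg about axis (Rodrigues formula)."""
    n = math.sqrt(sum(c * c for c in axis))
    x, y, z = (c / n for c in axis)
    t = math.radians(angle_deg)
    c, s, v = math.cos(t), math.sin(t), 1.0 - math.cos(t)
    return [[c + x * x * v, x * y * v - z * s, x * z * v + y * s],
            [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
            [z * x * v - y * s, z * y * v + x * s, c + z * z * v]]


def disorientation(g_a, g_b):
    """Smallest misorientation angle (degrees) between cubic orientations g_a and g_b,
    plus one symmetry-equivalent rotation axis (None when the angle is ~0)."""
    # Transforms crystal-A coordinates into crystal-B coordinates.
    dg = matmul(g_b, transpose(g_a))
    best_angle, best_r = None, None
    # Left multiplication by the 24 operators suffices for the angle, because the
    # rotation angle is unchanged by conjugation with a symmetry operator.
    for op in cubic_rotations():
        r = matmul(op, dg)
        cos_t = max(-1.0, min(1.0, (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0))
        angle = math.acos(cos_t)
        if best_angle is None or angle < best_angle:
            best_angle, best_r = angle, r
    if best_angle < 1e-6:
        return 0.0, None
    # Axis from the antisymmetric part of the minimizing rotation matrix.
    s = 2.0 * math.sin(best_angle)
    axis = ((best_r[2][1] - best_r[1][2]) / s,
            (best_r[0][2] - best_r[2][0]) / s,
            (best_r[1][0] - best_r[0][1]) / s)
    return math.degrees(best_angle), axis


angle, axis = disorientation(IDENTITY, axis_angle_matrix((1, 1, 1), 30.0))
print(round(angle, 3), [round(c, 3) for c in axis])  # 30.0 [0.577, 0.577, 0.577]
```

A 90 degree rotation about [001], by contrast, returns an angle of zero, because it is itself a cubic symmetry operation.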
FIGURE 17. Distribution of boundary planes in a brass specimen averaged over all misorientations and expressed as multiples of a random distribution (MRD), shown in standard stereographic projection. (a) All boundaries. (b) Σ3 boundaries excluded. (See Color Insert.)
This trend has also been revealed in other fcc metals and alloys, such as nickel, aluminum, and copper. To view the distribution of boundary planes in more detail, particular misorientations of interest are selected. A good starting point is to select misorientations with a low-index misorientation axis. For the brass data illustrated here, there were few misorientations on [100], but more on [111] and [110]. Although the entire misorientation angle range was inspected in the original analysis, only one angle/axis combination for each misorientation axis is shown here for illustration. Figure 18a shows 30 degrees/[100], which includes very few planes. Figure 18b shows 30 degrees/[111]; the maximum at (111), with an MRD value of 3.32, reveals that (111) twist boundaries are prevalent. Finally, Figure 18c shows 30 degrees/[110]; here the plane density is distributed along the [110] zone, with multiple peaks reaching an MRD value of 18.81, which shows that asymmetrical tilt boundaries are prevalent. The [110] misorientation axis is marked. These distributions
FIGURE 18. Distribution of boundary planes for selected misorientations in a brass specimen expressed as multiples of a random distribution (MRD) and shown in standard stereographic projection. (a) 30 degrees/[100] misorientation, (b) 30 degrees/[111] misorientation, (c) 30 degrees/[110] misorientation. The [110] misorientation axis is marked on (c). (See Color Insert.)
show that there is anisotropy in the distribution of boundary planes. Since tilt and twist boundaries have lower energies than random configurations, energy considerations have governed the choice of boundary planes here. EBSD can also be used to determine the orientation of facets, for example, on a fracture surface (Ayer, Mueller and Neeraj, 2006; Randle, 1999). This is an awkward procedure, both because of uncertainties in the facet alignment with respect to the microscope geometry and because surface preparation is not viable. Recently the technique has been improved so that the stereological measurements and the EBSD data are both collected in situ, without the need to remove and replace the specimen and the associated errors (Ro, Agnew and Gangloff, 2005). The uncertainty in facet crystallography determination was found to be 2–5 degrees, and the technique was used to show that fatigue fracture surface facets in an Al–Cu–Li alloy were close to, but not identical to, {111}. EBSD has also been coupled with Auger electron spectroscopy to investigate the correlation between grain boundary segregation and grain boundary plane orientation in Nb-doped TiO₂ (Peng and Wynblatt, 2005).
It should be noted that EBSD is enabling great progress in grain boundary studies: this is the first time that data on which boundary planes exist in polycrystals have become available, and this has paved the way for expanding knowledge of grain boundary structure and of how it links to properties.
C. Three-Dimensional EBSD
Traditionally, microstructure studies have relied on extracting information from a single 2D section and extrapolating to three dimensions, either by direct inference or by stereological procedures. Sometimes this approach is inadequate. Efforts are therefore being made to obtain 3D microstructural data sets, and modern EBSD is contributing to this field.
1. Sectioning
The most common way to characterize the 3D microstructure is by depth-calibrated serial sectioning.
Precision serial sectioning to obtain a population of boundary plane inclinations has already been described in Section VI.B.2. There are other examples in which EBSD is supplemented by other techniques for 3D studies. EBSD measurements have been coupled with synchrotron X-ray microtomography to study the interaction of fatigue cracks with grain boundaries in cast Al alloys (Ludwig et al., 2003). Here grains were visualized in three dimensions by decorating the grain boundaries with liquid gallium, while EBSD was used on the sample surface to obtain grain orientations. The 3D structure of diamond films has been studied
by acquiring an EBSD map, ion milling to remove a layer of the film, and repeating the mapping (Chen and Rudolph, 2003). A final example is the combination of serial sectioning with EBSD to provide a methodology for 3D microstructure reconstruction, demonstrated on an austenitic steel (Lewis et al., 2006). Although sectioning is usually performed on parallel planes, a series of oblique double sections can also be obtained and combined with EBSD data; this can give a better representation of the bulk microstructure and offers certain experimental advantages over serial sectioning (Homer, Adams and Fullwood, 2006). Once the 3D data have been obtained, reconstruction and visualization in 3D is a further step, for which several commercial or in-house rendering programs are available (e.g., Lewis et al., 2006). Such data, in addition to giving insights into the true microstructure, also provide input for various computer models.
2. Focused Ion Beam Tomography
Focused ion beam (FIB) tomography is a relatively new technique in which an ion beam is used to section and shape specimens on a nanometer scale. An exciting extension has been the recent integration of FIB technology with an SEM in a dual-beam instrument capable of both precision sectioning and high-resolution imaging. An added benefit of the dual-beam configuration is that EBSD is possible. The applications of this setup fall into two main categories: analysis of certain delicate specimens and acquisition of 3D data. Delicate specimens include those that degrade readily, so that a freshly milled surface is advantageous for diffraction pattern collection, and those whose form makes preparation awkward (e.g., thin wires and nanomaterials). For example, FIB-SEM has been used to record the microtexture and microstructure of gold interconnect wires and tungsten lamp filaments (Anon., 2007c).
A 3D EBSD data set can be obtained by removing successive slices with the FIB column and then manually switching to the electron column to acquire successive EBSD maps. However, a real step forward in efficiency occurred recently when the entire process was automated using software that communicates both with the dual-beam instrument, for controlled sectioning, and with the EBSD system. Maps from up to 200 slices have been obtained (Anon., 2007c). The best spatial resolution of the 3D pixels (voxels) is 100 nm × 100 nm × 100 nm, the angular resolution is 0.5 degrees, and the maximum observable volume is 50 µm × 50 µm × 50 µm. A slice of nickel measuring 15 µm × 8 µm × 200 nm would be removed in about 25 minutes using a 30 kV ion beam and a current of 3 nA (Anon., 2007c). The integration of EBSD and FIB-SEM is experimentally challenging because of obstruction by the various detectors and the need to collect EBSD
FIGURE 19. Schematic illustration of the geometry of the FIB column, the SEM column, the EBSD detector, and the specimen stage. (Courtesy M. Ferry.)
patterns at the crossover point of the electron and ion beams, which occurs at a very short working distance (5–8 mm). Figure 19 illustrates the combined geometry of the FIB column, the SEM column, the EBSD detector, and the specimen stage; an energy-dispersive spectroscopy (EDS) detector may also be required. The specimen surface must first be held parallel to the ion beam for milling and then at (typically) 20 degrees to the electron beam for EBSD, so the specimen must be precisely rotated and repositioned between successive operations. The ion beam consists of Ga+ ions with typical energies between 5 and 30 keV. Further experimental and technical details can be found elsewhere (Konrad, Zaefferer and Raabe, 2006; Xu et al., 2007; Zaafarani et al., 2006). The Ga+ beam produces some surface damage, depending on the atomic number of the material and the operating conditions. This in turn can reduce the clarity, and hence the indexing success rate, of the diffraction patterns. Low atomic number metals are more affected than those with high atomic numbers (Ferry et al., 2007). Figure 20 shows an example of FIB/EBSD data from an investigation that revealed particle-stimulated nucleation in a nickel alloy, which would have been difficult to detect in a 2D section. Figure 20a shows EBSD maps of consecutive FIB sections through partially recrystallized Ni–0.3 wt% Si. The slices represent section depths of 0.2 µm ± 0.05 µm. SiO₂ particles and recrystallized grains containing annealing twins are seen in the deformed matrix. A 3D rendering of the sections in Figure 20a is shown in Figure 20b. As yet there are few examples in the literature of the application of FIB/EBSD to materials research. One study evaluated the microstructure and microtexture below a nanoindent and compared these measurements with finite element simulations (Zaafarani et al., 2006). In another example, orientation gradients around a hard Laves particle in a warm-rolled Fe₃Al-based alloy were analyzed (Konrad, Zaefferer and Raabe, 2006). As for conventional serial sectioning, a variety of ex situ packages are used to compile and manipulate 3D renderings from the 2D section maps. Notably, there are few publications on combined sectioning and EBSD for 3D microstructure characterization using only a conventional SEM. This perhaps reflects both the labor-intensive nature of the process and the greater attractions of FIB. FIB/EBSD is complementary to the serial sectioning or stereological methods of 3D characterization because
FIGURE 20. (a) EBSD maps of consecutive FIB sections through partially recrystallized Ni–0.3 wt% Si. The slices represent section depths of 0.2 µm ± 0.05 µm. SiO₂ particles and recrystallized grains containing annealing twins are seen in the deformed matrix. (b) 3D rendering of (a). (Courtesy M. Ferry.) (See Color Insert.)
FIB/EBSD is a slow process and is viable only for small volumes, whereas serial sectioning or stereological methods provide statistical data from much larger specimens.
D. Multiphase Materials
EBSD is increasingly being used to analyze complex materials, both as a phase identification tool and to analyze interphase relationships.
1. Phase Identification
Dedicated application of EBSD to phase identification is a more recent development than its application to microtexture or grain boundary crystallography. The EBSD approach offers better spatial resolution than X-ray powder diffraction analysis, plus concurrent microstructure and microtexture information. A high-sensitivity EBSD camera is used for dedicated phase
identification work. The principle of phase identification by EBSD is that the full crystal symmetry of the specimen is embodied in the symmetry of the diffraction pattern. If an unknown, experimentally acquired EBSD pattern is indexed against the correct phase, all parts of the pattern will be consistently and accurately matched by the pattern simulated for that candidate phase. Candidate phases are selected from compiled or existing external crystallographic databases, and chemical composition analysis of the phase by EDS is used as a filter in their selection. The EDS and EBSD analyses can be performed as two separate steps. For example, very small intergranular Bi–In phases have been identified in Zn powders (Perez et al., 2006), and calcium compounds have been identified in aluminum alloys (Zaldívar-Cadena and Flores-Valdés, 2007). A recent advance is the integration of both steps into a single interface, so that chemical and crystallographic data are acquired simultaneously (Dingley, 2004). This requires conjoint data collection from a highly tilted specimen (for EBSD), no shadowing from either the EDS detector or the EBSD camera, and reconciliation of the different dwell times needed to acquire EDS and EBSD data. The dedicated phase identification package compares automatically collected diffraction patterns from an unknown phase with simulated patterns from reference phases, using the chemical composition from the EDS spectra to filter out impossible solutions. Links to a number of external databases, such as that of the International Center for Diffraction Data (ICDD), are used to find the candidate reference phases and then to simulate diffraction patterns from the crystallographic parameters, including full structure factor calculations. Such databases were often originally compiled for X-ray diffraction, so conversions are applied for electron diffraction. Phases from all seven crystal systems can be identified by the combined method.
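The two-stage logic of combined EDS/EBSD phase identification (discard chemically impossible candidates, then rank the remainder by how well their simulated patterns fit) can be sketched as follows. Everything here is hypothetical: real packages match full simulated patterns with structure-factor intensities, whereas this Python sketch scores each candidate only by the fraction of measured interplanar angles it reproduces within a tolerance. The candidate names and angle values are invented toy data, and the EDS filter rule (every element of a candidate must have been detected) is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    elements: frozenset      # elements present in the candidate phase
    band_angles_deg: tuple   # simulated interplanar angles (toy values)


def eds_filter(candidates, detected):
    """Discard candidates that contain an element EDS did not detect (assumed rule)."""
    return [c for c in candidates if c.elements <= detected]


def match_score(measured, simulated, tol_deg):
    """Fraction of measured angles reproduced by at least one simulated angle."""
    if not measured:
        return 0.0
    hits = sum(1 for m in measured if any(abs(m - s) <= tol_deg for s in simulated))
    return hits / len(measured)


def identify(candidates, detected, measured, tol_deg=1.0):
    """Rank the chemically possible candidates by fit, best first."""
    ranked = [(match_score(measured, c.band_angles_deg, tol_deg), c.name)
              for c in eds_filter(candidates, detected)]
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


# Hypothetical toy data, not real phases.
candidates = [
    Candidate("phase_X", frozenset({"Fe", "Ni"}), (35.3, 45.0, 54.7, 90.0)),
    Candidate("phase_Y", frozenset({"Fe", "Cr"}), (35.3, 54.7, 90.0)),
    Candidate("phase_Z", frozenset({"Fe"}), (30.0, 60.0, 90.0)),
]
print(identify(candidates, {"Fe", "Ni", "Si"}, [35.1, 54.9, 90.2]))
```

In this toy case phase_Y would fit the measured angles as well as phase_X, but it is rejected before any pattern matching because no Cr was detected; this is the role the EDS filter plays in removing crystallographically plausible but chemically impossible solutions.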
For example, complex phase compositions in the Cr–Si–Nb system have been identified, including orthorhombic (Cr,Nb)₁₁Si₈ and orthorhombic (Cr,Nb)₆Si₅ (Anon., 2007c).
2. Orientation Relationships
In the past few years EBSD has been used more frequently to investigate crystallographic relationships between phases, which in turn is contributing to the understanding of phase transformations. This application of EBSD, a natural outcome of the increased efficiency of orientation mapping of multiphase materials, has recently been reviewed in detail (Gourgues-Lorenzon, 2007). EBSD allows the crystallographic units in a phase microstructure to be distinguished from the morphological units observed in an image; the crystallographic units are usually more complex than the image implies. A crystallographic orientation relationship (OR) often exists between two phases that have a parent/product relationship in a phase transformation.
The most frequently investigated orientation relationships involve either the decomposition of austenite in steels to form ferrite or other products such as martensite or bainite (Cabus, Réglé and Bacroix, 2007; Petrov et al., 2007), or the α/β (hexagonal close-packed/body-centred cubic, hcp/bcc) phase transformation in titanium and titanium alloys (Seward et al., 2004). All of these phase transformations are technologically important. An OR may have several variants, depending on the crystallography of the system. A well-known example is the Burgers relationship, in which both the close-packed planes and the close-packed directions of the hcp and bcc phases are parallel. This relationship produces six crystallographically equivalent variants. For the α/β phase transformation in pure titanium, for example, it has been found that plates of β have a Burgers OR with the parent α (Seward et al., 2004), and several variants tended to occur in one grain. The Burgers relationship in zirconium alloys has also been investigated (Chauvy, Barberis and Montheillet, 2006). Various EBSD investigations have recently shown that variant selection usually operates, favoring variants that either obey the OR with both parent grains or have a special OR with the boundary plane (Gourgues-Lorenzon, 2007). EBSD has also recently been used on several systems to calculate the orientation of the parent phase from measured orientations of the product (Cayron, Artaud and Briottet, 2006). Phase relationships of an increasingly complex nature are being investigated by EBSD. For example, the phases in the Nb–Si system have been studied (Miura, Ohkubo and Mohri, 2007); several ORs were found between the Nb and Nb₅Si₃ phases formed during the eutectoid reaction, some of which show good atomic matching. In another investigation the relationship between NbC and ZrO₂ was elucidated (Faryna, 2003).
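The number of variants produced by an OR follows from the symmetry of the parent phase: variants are distinct only up to the rotations shared by parent and product once their lattices are aligned by the OR. Writing G_p and G_c for the proper rotation groups of the parent and product, M for the rotation that the OR defines between their crystal frames, and H for the shared subgroup, the usual counting argument gives the relations below. For the Burgers OR, the only shared rotations are the identity and the 2-fold rotation about [0001]α ∥ ⟨110⟩β, since 60 and 120 degree rotations about [0001] are not cubic symmetries and no in-plane hexagonal 2-fold axis coincides with a cubic one.

```latex
N = \frac{|G_{\mathrm{p}}|}{|H|}, \qquad H = G_{\mathrm{p}} \cap M\, G_{\mathrm{c}}\, M^{-1}

% Burgers OR: (0001)_\alpha \parallel \{110\}_\beta, \quad \langle 11\bar{2}0\rangle_\alpha \parallel \langle 111\rangle_\beta
% H = \{\, 1,\ 2\text{-fold about } [0001]_\alpha \parallel \langle 110\rangle_\beta \,\}, \quad |H| = 2

N_{\alpha \to \beta} = \frac{|G_{622}|}{|H|} = \frac{12}{2} = 6,
\qquad
N_{\beta \to \alpha} = \frac{|G_{432}|}{|H|} = \frac{24}{2} = 12
```

This reproduces the six variants quoted above for an hcp (α) parent; with a bcc (β) parent the same argument gives the familiar 12 α variants.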
Misorientations in the composite systems Al₂O₃/WC and Al₂O₃/W have also been studied in an ESEM (Sztwiertnia, Faryna and Sawina, 2006).
E. Strain Assessment
Although not a widespread application of EBSD, advances in analyses connected with lattice strain are noteworthy because they give insights into deformation and some other physical characteristics of materials. A new parameter, called the modified crystal deformation, has been defined to quantify the spread of crystal orientation within individual grains arising from dislocation accumulation during plastic deformation (Kamaya, Wilkinson and Titchmarsh, 2006); this parameter correlates well with the degree of plastic strain. Until recently, measurement of elastic strain had been beyond the scope of EBSD. However, small shifts in the positions of zone axes in EBSD patterns can now be detected and measured
with sufficient sensitivity to enable elastic strain measurement (Wilkinson, Meaden and Dingley, 2006). The technique has been used to measure strains across the interface in a Si–Ge epilayer on a Si substrate. Other workers have exploited the strain-sensitive parameters in EBSD patterns to evaluate the elastically distorted regions in GaN epilayers (Luo et al., 2006); strain was detected over a distance of 100 to 200 nm. Other strain-related applications of EBSD have concentrated on visualizing plastic deformation or measuring dislocation content. An algorithm has been developed to map plastic deformation and has been used to study deformation zones around microstructural features with high strain gradients, such as crack tips and indentations, in an austenitic steel (Brewer et al., 2006). Analysis of grain boundary maps or misorientation gradient maps has been adapted to connect the observed deformation with the local density of geometrically necessary dislocations (GNDs): simple dislocation models can be used to estimate local GND densities from misorientation maps by measuring the lattice curvature. The long-range accumulation of dislocations near grain boundaries has been obtained by this method (El Dasher, Adams and Rollett, 2003). A second approach to visualizing and quantifying strain uses the average orientation deviation angle (Field et al., 2005).
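A minimal version of the simple dislocation model mentioned above treats the misorientation θ between neighboring measurement points a step Δx apart as a tilt wall of edge dislocations with Burgers vector magnitude b. The wall then contains θ/b dislocations per unit length, and spreading it over a cell of width Δx gives ρ_GND ≈ θ/(bΔx). The Python sketch below assumes that both points belong to the same grain, so that no crystal-symmetry reduction is needed for such small misorientations; published models differ in the geometric prefactor (of order unity) and in how neighbors are chosen. The values b = 0.25 nm and Δx = 0.5 µm are illustrative only.

```python
import math


def misorientation_angle(g1, g2):
    """Rotation angle (radians) of g2 * g1^T, with no crystal-symmetry reduction."""
    # trace(g2 g1^T) equals the element-wise sum of g2[i][j] * g1[i][j].
    tr = sum(g2[i][j] * g1[i][j] for i in range(3) for j in range(3))
    return math.acos(max(-1.0, min(1.0, (tr - 1.0) / 2.0)))


def gnd_density(theta_rad, step_m, burgers_m):
    """Tilt-wall estimate rho = theta / (b * step), in m^-2."""
    return theta_rad / (burgers_m * step_m)


def rot_z(angle_deg):
    t = math.radians(angle_deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


# Two neighboring points 0.5 um apart, misoriented by 0.5 degrees (illustrative values).
theta = misorientation_angle(rot_z(0.0), rot_z(0.5))
rho = gnd_density(theta, step_m=0.5e-6, burgers_m=0.25e-9)
print(f"{math.degrees(theta):.3f} deg -> {rho:.2e} m^-2")
```

For these assumed values the estimate is roughly 7 × 10¹³ m⁻²; halving the step size doubles the estimate for the same misorientation, which is one reason such GND densities depend on the mapping step.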
VII. TRENDS IN EBSD USAGE
The emphasis in this review has been on describing state-of-the-art EBSD principles and practice and on capturing the latest developments in both the technique itself and its applications. The hundreds of papers now published annually show that this is a large task, so some selectivity has been exercised in the topics covered. Topics likely to be developed further in the near future have been given prominence, whereas applications of well-established aspects of EBSD, such as microtexture determination, have not been discussed in any detail. Although the application of EBSD in the geological sciences is increasing significantly, it has not been covered in this review because of the author’s limited experience in this area, not because the topic is unimportant. Advances in EBSD resolution, phase identification, and multiphase mapping are of great benefit to the analysis of geological materials, which are often complex in terms of both the phases present and their morphology. The trends in EBSD usage have been assessed and quantified by analyzing the papers published in two recent years, 2003 and 2006. The overall trend is a diversification of EBSD usage, in terms of both
materials analyzed and applications. Figures 9b and 9c show a breakdown of EBSD usage in 2003 and 2006, respectively, by material type. There is also a group labeled technique, which refers to publications that describe aspects of EBSD methodology rather than being material specific. Metals are overwhelmingly the largest category and are further divided into fcc (face-centred cubic), bcc (body-centred cubic), and hexagonal crystal structures. The total number of publications doubled over the period 2003–2006 (Figure 9a). In 2003 the emphasis was particularly on fcc metals, which accounted for nearly half the publications at that time. The most noticeable trend between 2003 and 2006 is the upsurge in application of EBSD to crystal structures and material groups other than fcc metals, particularly the more complex ones. The number of papers on geological materials increased more than fourfold, and articles on hexagonal, bcc, and electronic materials more than doubled. This is evidence that EBSD is becoming more widely applicable to the entire range of crystalline materials. In 2006 fcc metals remained the biggest single group for EBSD-related investigations, composed mainly of aluminum, copper, and nickel alloys. The bcc group, representing mainly steels, is the second largest. The hexagonal metals group is the smallest of the metals groups and represents mainly titanium and magnesium alloys. The proportions of all these and other metals and alloys are shown in Figure 9d. Whereas a few years ago the primary application of EBSD was microtexture, Figure 9a shows that texture-related EBSD papers have increased more slowly than EBSD investigations as a whole. The range of applications has now broadened to include the whole field of microstructure characterization, as depicted in Figure 10 and discussed throughout this review.
Increasingly, EBSD is being combined with other characterization techniques. For example, EBSD is often linked with atomic force microscopy to correlate crystallography with surface evolution (Chandrasekaran and Nygårds, 2003; Schuh, Kumar and King, 2003). Another development is the use of EBSD data as input for finite element modeling and other simulations (e.g., Zhang, Zhang and McDowell, 2007). These and many other applications are augmenting and enriching the use of EBSD as an investigative tool.
R EFERENCES Adams, B.L., Wright, S.I., Kunze, K. (1993). Orientation imaging: The emergence of a new microscopy. Met. Trans. A 24, 819–831. Alam, M.N., Blackman, M., Pashley, D.W. (1953). High-angle Kikuchi patterns. Proc. Roy. Soc. Lond. A 221, 224–242. Anon. (2007a). http://www.edax.com/products/TSL.cfm.
412
RANDLE
Anon. (2007b). http://www.edax.com/snippet.cfm?Snippet_Id=1536. Anon. (2007c). http://www.oxinst.com/wps/wcm/connect/Oxford+Instruments/ Products/Microanalysis/CHANNEL+5/HKL+CHANNEL+5. Apps, P.J., Bowen, J.R., Prangnell, P.B. (2003). The effect of coarse secondphase particles on the rate of grain refinement during severe deformation processing. Acta Mater. 51, 2811–2822. Ayer, R., Mueller, R.R., Neeraj, T. (2006). Electron backscattered diffraction study of cleavage fracture in pure iron. Mater. Sci. Eng. A 417, 243–248. Brewer, L.N., Othon, M.A., Young, L.M., Angeliu, T.M. (2006). Misorientation mapping for visualization of plastic deformation via electron backscattered diffraction. Microsc. Microanal. 12, 85–91. Cabus, C., Réglé, H., Bacroix, B. (2007). Orientation relationship between austenite and bainite in a multiphased steel. Mater. Charact. 58, 332–338. Cayron, C., Artaud, B., Briottet, L. (2006). Reconstruction of parent grains from EBSD data. Mater. Charact. 57, 386–401. Chandrasekaran, D., Nygårds, M. (2003). A study of the surface deformation behaviour at grain boundaries in an ultra-low-carbon steel. Acta Mater. 51, 5375–5384. Chauvy, C., Barberis, P., Montheillet, F. (2006). Microstructure transformation during warm working of β-treated lamellar Zircaloy-4 within the upper α-range. Mater. Sci. Eng. A 431, 59–67. Chen, H.-W., Rudolph, V. (2003). The 3-D structure of polycrystalline diamond film by electron backscattering diffraction (EBSD). Diamond Relat. Mater. 12, 1633–1639. Dingley, D.J. (1981). A comparison of diffraction techniques for the SEM. Scan. Elect. Micros. IV, 273–286. Dingley, D.J. (1989). Developments in on-line crystal orientation determination. In: EMAG-MICRO ’89. In: Inst. Phys. Conf. Ser., No. 98. IOP Publishing, Bristol, UK, pp. 473–476. Dingley, D.J. (2000). The development of automated diffraction in scanning and transmission electron microscopy. In: Schwartz, A.J., et al. 
(Eds.), Electron Backscatter Diffraction in Materials Science. Kluwer Academic, New York, pp. 1–18. Dingley, D.J. (2004). Progressive steps in the development of electron backscatter diffraction and orientation imaging microscopy. J. Micros. 213, 214–224. Dingley, D.J., Baba-Kishi, K., Randle, V. (1994). An Atlas of Backscatter Kikuchi Diffraction Patterns. Institute of Physics Publishing, Bristol, UK. Dobatkin, S.V., Szpunar, J.A., Zhilyaev, A.P., Cho, J.-Y., Kuznetsov, A.A. (2007). Effect of the route and strain of equal-channel angular pressing on structure and properties of oxygen-free copper. Mater. Sci. Eng. A 462, 132–138.
ELECTRON BACKSCATTER DIFFRACTION
413
El Dasher, B.S., Adams, B.L., Rollett, A.D. (2003). Viewpoint: Experimental recovery of geometrically necessary dislocation density in polycrystals. Scripta Mater. 48, 141–145.
Faryna, M. (2003). TEM and EBSD comparative studies of oxide–carbide composites. Mater. Chem. Phys. 81, 301–304.
Ferry, M., Xu, W., Mateescu, N., Cairney, J.M., Humphreys, F.J. (2007). On the viability of FIB tomography for generating 3-D orientation maps in deformed and annealed metals. Mater. Sci. For. 550, 55–64.
Field, D.P., Trivedi, P.B., Wright, S.I., Kumar, M. (2005). Analysis of local orientation gradients in deformed single crystals. Ultramicroscopy 103, 33–39.
Gourgues, A.-F. (2002). Review: Electron backscatter diffraction and cracking. Mater. Sci. Tech. 18, 119–133.
Gourgues-Lorenzon, A.-F. (2007). Application of electron backscatter diffraction to the study of phase transformations. Int. Mater. Rev. 57, 65–128.
Homer, E.R., Adams, B.L., Fullwood, D.T. (2006). Recovery of the grain boundary character distribution through oblique double-sectioning. Scripta Mater. 54, 1017–1102.
Hong, S., Lee, D.N. (2003). Grain coarsening in IF steel during strain annealing. Mater. Sci. Eng. A 357, 75–85.
Humphreys, F.J. (2001). Review—Grain and subgrain characterization by electron backscatter diffraction. J. Mater. Sci. 36, 3833–3854.
Humphreys, F.J. (2004). Characterisation of fine-scale microstructures by electron backscatter diffraction (EBSD). Scripta Mater. 51, 771–776.
Humphreys, F.J., Huang, Y., Brough, I., Harris, C. (1999). Electron backscatter diffraction of grain and subgrain structures—Resolution considerations. J. Microsc. 195, 212–216.
Kamaya, M., Wilkinson, A.J., Titchmarsh, J.M. (2006). Quantification of plastic strain of stainless steel and nickel alloy by electron backscatter diffraction. Acta Mater. 54, 539–548.
Konrad, J., Zaefferer, S., Raabe, D. (2006). Investigation of orientation gradients around a hard Laves particle in a warm-rolled Fe3Al-based alloy using a 3D EBSD-FIB technique. Acta Mater. 54, 1369–1380.
Krieger-Lassen, N., Juul Jensen, D., Conradsen, K. (1992). Image processing procedures for analysis of EBSPs. Scan. Microsc. 6, 115–121.
Lee, D.S., Ryoo, H.S., Hwang, S.K. (2003). A grain boundary engineering approach to promote special boundaries in Pb-base alloy. Mater. Sci. Eng. A 354, 106–111.
Lewis, A.C., Bingert, J.F., Rowenhorst, D.J., Gupta, A., Geltmacher, A.B., Spanos, G. (2006). Two- and three-dimensional microstructural characterization of a super-austenitic stainless steel. Mater. Sci. Eng. A 418, 11–18.
RANDLE
Ludwig, W., Buffière, J.-Y., Savelli, S., Cloetens, P. (2003). Study of the interaction of a short fatigue crack with grain boundaries in a cast Al alloy using X-ray microtomography. Acta Mater. 51, 585–598.
Luo, J.F., Ji, Y., Zhong, T.X., Zhang, Y.Q., Wang, J.Z., Liu, J.P., Niu, N.H., Han, J., Guo, X., Shen, G.D. (2006). EBSD measurements of elastic strain fields in a GaN/sapphire structure. Microelectr. Reliab. 46, 178–182.
Michael, J. (2000). Phase identification using electron backscatter diffraction in the SEM. In: Schwartz, A.J., et al. (Eds.), Electron Backscatter Diffraction in Materials Science. Kluwer Academic, New York, pp. 75–90.
Miura, S., Ohkubo, K., Mohri, T. (2007). Microstructural control of Nb–Si alloy for large Nb grain formation through eutectic and eutectoid reactions. Intermetallics 15, 783–790.
Nowell, M.M., Wright, S.I. (2005). Orientation effects on indexing of electron backscatter diffraction patterns. Ultramicroscopy 103, 41–58.
Peng, Y., Wynblatt, P. (2005). Correlation between grain boundary segregation and grain boundary plane orientation in Nb-doped TiO2. J. Am. Ceram. Soc. 88, 2286–2291.
Perez, M.G., Kenik, E.A., O’Keefe, M.J., Miller, F.S., Johnson, B. (2006). Identification of phases in zinc alloy powders using electron backscatter diffraction. Mater. Sci. Eng. A 424, 239–250.
Petrov, R., Kestens, L., Wasilkowska, A., Houbaert, Y. (2007). Microstructure and texture of a lightly deformed TRIP-assisted steel characterized by means of the EBSD technique. Mater. Sci. Eng. A 447, 285–297.
Randle, V. (1992). Microtexture Determination and Its Applications, 1st edition. Institute of Materials, London.
Randle, V. (1995). Crystallographic characterization of planes in the scanning electron microscope. Mater. Charact. 34, 29–34.
Randle, V. (1999). Crystallographic analysis of facets using electron backscatter diffraction. J. Microsc. 195, 226–232.
Randle, V. (2001). The coincidence site lattice and the ‘sigma enigma’. Mater. Charact. 47, 411–416.
Randle, V. (2003). Microtexture Determination and Its Applications, 2nd edition. Institute of Materials, London.
Randle, V., Davies, H. (2002). A comparison between three-dimensional and two-dimensional grain boundary plane analysis. Ultramicroscopy 90, 153–162.
Randle, V., Engler, O. (2000). Introduction to Texture Analysis: Macrotexture, Microtexture and Orientation Mapping. Taylor and Francis, London.
Randle, V., Rohrer, G., Kim, C., Hu, Y. (2006). Changes in the five-parameter grain boundary character distribution in alpha-brass brought about by iterative thermomechanical processing. Acta Mater. 54, 4489–4502.
Ro, Y.J., Agnew, S.R., Gangloff, R.P. (2005). Uncertainty in the determination of fatigue crack facet crystallography. Scripta Mater. 52, 531–536.
Saylor, D.M., Morawiec, A., Rohrer, G.S. (2003). Distribution of grain boundaries in magnesia as a function of five macroscopic parameters. Acta Mater. 51, 3663–3674.
Saylor, D.M., El-Dasher, B.S., Adams, B.L., Rohrer, G.S. (2004). Measuring the five-parameter grain-boundary distribution from observations of planar sections. Met. Mater. Trans. A 35, 1981–1989.
Schmidt, N.-H., Olesen, N.O. (1989). Computer-aided determination of crystal-lattice orientation from electron-channeling patterns in the SEM. Can. Mineral. 27, 15–22.
Schuh, C.A., Kumar, M., King, W.E. (2003). Analysis of grain boundary networks and their evolution during grain boundary engineering. Acta Mater. 51, 687–700.
Schwartz, A.J., Kumar, M., Adams, B.L. (Eds.) (2000). Electron Backscatter Diffraction in Materials Science. Kluwer Academic, New York.
Seward, G.G.E., Celotto, S., Prior, D.J., Wheeler, J., Pond, R.C. (2004). In situ SEM-EBSD observations of the hcp to bcc phase transformation in commercially pure titanium. Acta Mater. 52, 821–832.
Sztwiertnia, K., Faryna, M., Sawina, G. (2006). Misorientation characteristics of interphase boundaries in particulate Al2O3-based composites. J. Eur. Cer. Soc. 26, 2973–2978.
Tan, L., Sridharan, K., Allen, T.R. (2007). Effect of thermomechanical processing on grain boundary character distribution of a Ni-based superalloy. J. Nucl. Mater. 371, 171–175.
Venables, J.A., Bin-Jaya, R. (1977). Accurate microcrystallography using electron back-scattering patterns. Phil. Mag. A 35, 1317–1332.
Venables, J.A., Harland, C.J. (1973). Electron backscattering pattern—A new technique for obtaining crystallographic information in the scanning electron microscope. Phil. Mag. 27, 1193–1200.
Wilkinson, A.J., Meaden, G., Dingley, D.J. (2006). High-resolution elastic strain measurement from electron backscatter diffraction patterns: New levels of sensitivity. Ultramicroscopy 106, 307–313.
Wilson, A.W., Madison, J.D., Spanos, G. (2003). Determining phase volume fraction in steels by electron backscattered diffraction. Scripta Mater. 45, 1335–1340.
Wright, S.I., Larsen, R.J. (2002). Extracting twins from orientation imaging microscopy scan data. J. Microsc. 205, 245–252.
Wright, S.I., Nowell, M. (2006). EBSD image quality mapping. Microsc. Microanal. 12, 72–84.
Xu, W., Ferry, M., Mateescu, N., Cairney, J.M., Humphreys, F.J. (2007). Techniques for generating 3-D EBSD microstructures by FIB tomography. Mater. Charact. 58, 961–967.
Zaafarani, N., Raabe, D., Singh, R.N., Roters, F., Zaefferer, S. (2006). Three-dimensional investigation of the texture and microstructure below a nanoindent in a Cu single crystal using 3D EBSD and crystal plasticity finite element simulations. Acta Mater. 54, 1863–1876.
Zaefferer, S. (2007). On the formation mechanisms, spatial resolution and intensity of backscatter Kikuchi Patterns. Ultramicroscopy 107, 254–266.
Zaldívar-Cadena, A.A., Flores-Valdés, A. (2007). Prediction and identification of calcium-rich phases in Al–Si alloys by electron backscatter diffraction EBSD/SEM. Mater. Charact. 58, 834–841.
Zhang, M., Zhang, J., McDowell, D.L. (2007). Microstructure-based crystal plasticity modeling of cyclic deformation of Ti–6Al–4V. Int. J. Plastic. 23, 1328–1348.
Index
A
Aberration coefficients 244, 269, 271, 273, 275, 276, 309, 310, 313, 333, 334, 336, 337, 349 integral 273, 309, 349 order 276, 278, 279, 292 polynomials 244, 309–311, 314 Aberrations 243, 257, 269, 273, 276, 277, 279, 280, 287, 297, 298, 310–312, 314, 329–332, 338, 344, 348, 351 axially symmetric 330, 336, 338, 340 fifth-order 338, 347, 349 geometric 310 Accelerating voltage 392, 393 Achromatic axis 78, 94–99 Acquisition system 73, 74 Adapted hybrid color space 149 Algebra of axial symmetric polynomials 340 of functions 289 of quadratic polynomials 244, 312, 315 Algorithm à trous 183, 186, 187, 215, 216 color pixels classification 149, 151 fast slant stack Radon 225, 227 Anisotropic part 349, 352, 353, 355 Aperture planes 244, 262, 264, 277–279, 283, 286, 287, 292, 295, 301, 304, 334, 337, 349 position 271, 292, 294, 337, 349 Astigmatism 330, 331, 355 Asymptote 43, 46, 51 Axial potential 257, 276 Axial symmetric magnetic field 342 magnetic lens 244, 269 space 255 systems 310
B
Backprojection 24, 25, 28, 31, 37, 60, 62, 230 segment 20, 23, 24, 34–37, 42, 50 Beam 74, 270, 368, 369, 393, 405 Boundaries reconstructed 398, 399 true 398, 399 Boundary assumptions 212 planes 383, 396, 399–403, 409 trace 397, 401 Brain 67, 69, 71, 89 Brightness 70, 85, 88
C Cameras 70, 71, 79, 80, 109, 110, 365, 368, 369, 389 one-chip color 71, 72 Canonical transformation 250, 251, 265, 267, 289–294, 311, 312, 356 Cardiac CT 3, 52, 55 Cardiac cycle 53 Charged particle optics 244, 280 Chromaticity coordinates 78, 79, 81–83, 91, 92, 96, 97, 106, 107 diagram 78, 81–87, 107 Chrominance components 85, 86, 88, 89, 91, 92 CIE color appearance model (CIECAM) 102 CIE (Commission Internationale de l’Éclairage) 76, 80, 81, 86, 92, 94, 102 Circle and line trajectory (CLT) 3, 34–36, 41, 60 Class construction 123, 158, 161 Class of pixels 111, 119, 121, 122, 126–128, 131–133, 135, 137, 138, 145, 149, 150, 153, 158, 159
Clusters 111, 119, 121–123, 134, 138, 395 of color points 111, 119–121, 133, 136, 138 well-separated 119, 136 Coding of color spaces 104–108, 161 Coding scheme 105–107, 109 dependent 105–107, 109 independent 106, 107, 109 Color 66–76, 78–81, 83, 84, 86, 92, 94, 96, 102, 104, 106, 107, 109–112, 115–121, 129, 130, 132, 136–138, 140, 144, 145, 149, 153, 156, 375, 376 camera 71–73, 75 component images 112, 123, 124, 128, 137 levels 66, 119, 126, 137, 153 values 78, 105 components 73, 74, 78, 81, 88–90, 92–102, 104, 105, 107, 111, 114, 117, 118, 126, 129, 130, 135, 147, 149, 153, 157, 158, 161 transformed 104, 105 cube 97 differences 85, 86 discrepancies 144 display 74, 104 distribution 104, 119, 121, 123, 126, 133, 136, 140 domains 137–139 edge detection 113 pixels 114 filters 71, 74, 80 gradient 112–114 histogram 111, 121, 123, 126, 129–131, 151 image 67, 71, 73–75, 91, 100, 102, 104, 111, 123, 126, 129, 137, 139, 151, 159 acquisition 66, 67, 71, 74, 102, 109 analysis 66, 74–76, 79, 91, 101, 102, 104, 105, 109 digitization 66, 67, 73 segmentation 66, 101, 102, 109, 110, 115, 123, 130, 140, 143, 146–148, 151, 154, 161 visualization 74 information 68, 71, 89, 92, 140 key 375, 376
management 80 measurement 69 mixture 69, 70 additive 69, 70, 73, 75, 77, 78, 81, 83 subtractive 69, 70, 80 model 94, 102 point distributions 119, 121, 136, 139 points 85, 111, 119–123, 125, 132, 136, 138, 139 properties 110, 111, 119, 136, 156, 159 quantization 73, 130 segmentation 140 sensation 67 signal 69 space 66, 74–81, 83–86, 88, 92, 100–102, 104–107, 109–112, 115, 117, 119–123, 125, 126, 129, 130, 132, 133, 135, 136, 138–140, 143, 147–149, 151, 153, 156–161 analysis 119, 161 (C, M, Y) 80 Carron’s (Y, Ch1, Ch2) 122 CIE (R, G, B) 76–78 CIE (X, Y, Z) 81, 106 conversion 80, 85 families 103, 104 (I, S, T) 94, 97, 111, 112, 116–118, 121 (L, C, H) 94 (L, M, S) 89, 90 most discriminating 140, 147, 151 primary 80 (R, G, B) 66, 79–84, 88–91, 100–102, 106, 111, 116–123, 125, 126, 132, 155–160 reference (R, G, B) 77, 78 sRGB 80 (U, V, W) 86 uniform 86, 88 (X, Y, Z) 81, 82, 84–86, 88, 92, 106 (Y, Cb, Cr) 89 (Y, Dr, Db) 89 (Y, U, V) 89, 111, 112, 116–118, 121, 158, 160 space-coding scheme 104 stimulus 67–71, 77, 78, 81, 90 subsets 138, 139 transformations 104 vectors 77, 92, 100, 153, 159 Colored lights 70, 74
Colorfulness 70 Colorimetry 69, 70 Colors false 71, 117, 128, 131, 132, 138, 159 homogeneous 138 human perception of 67, 70 primary 76, 78, 81, 92, 96 Complex polynomials, space of 316, 324–326, 329 Computed tomography, see CT Computers 71, 80, 367, 368, 389 Cones 68, 69, 89 Connectedness degree 156–158 Conversion of color spaces 81, 82, 88, 90 Convexity analysis 131–133 Convolution matrix 212 property 172, 211, 212 Coordinate spaces perceptual 92, 94 polar 92–94 Coordinates focus-detector 8 interaction 294, 297, 334 planar-detector 10, 11, 25, 36 rotating 259, 260, 262–264, 266–268, 277, 283, 284, 287, 292, 294, 302, 334, 342, 347 Crystallography 383, 384, 387, 394, 409 CT 1–4, 52 CT reconstruction 2, 6, 61 Curvelet transform 178, 179, 194–196, 219, 223 Curvelets 194, 195, 197, 225, 234
D Decay rates 188, 193 Design particle 245, 247 Detector 2–4, 7, 8, 10, 15, 24, 49, 50, 404, 405, 408 rows 3, 4, 40, 60 wedge 8–10 Deviation, standard 111, 130, 154, 217, 218 Device, color image display 67 DFT matrix 178, 200, 207, 211 Diagonal matrix 201, 207, 209, 211, 335 Differential algebraic method 244, 273–277, 297
Differential equations 266, 274, 277, 294 linear 258, 274, 275, 297 ordinary 257, 264 Diffraction pattern 364, 365, 367, 368, 370–374, 381, 382, 384, 388–390, 393, 405, 408 Digital color image 104 acquisition 67, 109 Discrepancy measure 141–143, 145 Discrete cosine transform (DCT) 206, 208, 209 Discrete frame 184, 185 Discrete sine transform (DST) 206, 208 Discriminating powers 153, 158, 159 Dispersion 137, 139, 266, 296 Distribution 110, 119, 123, 130, 400–402 statistical 374, 375 Dyadic 174, 178, 193, 194, 218, 223
E EBSD 363, 364, 367–369, 371, 374, 380–385, 387, 388, 390, 392–394, 396, 403–411 camera 369, 389, 390, 408 data 374, 375, 380, 384, 387, 396, 403–405, 408, 411 maps 382, 384, 404–406 patterns 367, 370, 374, 409, 410 system 367–369, 382, 404 technology 364, 382–384 ECG (electrocardiogram) 53, 55 Edge 111, 117, 118, 132, 141, 142, 188, 189, 191, 194, 195, 209 binary images 113 detection 111, 114, 115, 140, 141, 143, 146, 161 pixels 111, 112, 115, 116, 141 Eigenvalues 101, 178, 185, 201, 316, 318, 320, 329, 335, 340, 341 Eigenvectors 100, 324, 325 Eikonal method 243, 273, 280, 297, 298, 309, 310 Electron 71, 242, 245, 251, 261, 265, 267, 364, 365, 393, 405 optical systems 242, 243, 273 optics 242–244, 249, 273, 288 probe 368, 371, 388, 390, 392 Electron backscatter diffraction, see EBSD Electrostatic lenses 257, 333
Equations of motion 245, 263, 265, 269, 282, 290 Expansion 193, 255, 257, 263, 297, 310, 338, 342 Eye 67–71, 89, 367
F FCC 75, 83, 85 Feature vectors 178, 227, 228, 230–232, 235 rotation invariant 226, 227 translation invariant 226, 227 Federal Communications Commission, see FCC FEGSEM (field emission gun SEMs) 369, 390, 392, 393 FIB 383, 404–407 tomography 404 Field curvature 330, 331, 355 electromagnetic 242, 245, 248, 249, 251 of view (FOV) 7, 38, 42 quadrupole 254, 255, 258 Figure of merit (FOM) 142 Filter line 26–29, 35–39, 43, 45, 46, 48, 50–54, 56–58 tangential 37, 39, 51, 52, 56–58 Filtering 3, 19, 24, 25, 28, 30, 43 Filters 3, 24, 37, 115, 182, 217, 220, 222, 225, 408 scaling 182, 183, 204, 212, 214 First-derivative function 154 First-order perturbation 282–287 Focus 10, 19, 268, 270 detector 7–10, 13, 56 Focused ion beam, see FIB Fourier descriptors 227, 231, 235 slice theorem 18, 19, 172, 197, 204 Frame bounds 184, 185, 209, 234 elements 177, 194, 212, 213
G Gamma correction 74, 75, 109 Gradient 26, 43, 45, 47, 111, 112, 137 Grain boundary 375, 379–381, 383–385, 390–397, 399, 400, 403, 410 distribution 400 network 394, 395 Grains 384, 385, 397–399, 403, 409 neighboring 367, 393, 396, 397, 399, 400 Gray scale 227, 230, 231 Grid pseudopolar 178, 207, 213, 214, 235 pseudospiral 208, 210 Ground truth 140–143, 146, 151, 152
H Haar system 181, 182 Hamilton–Jacobi approach 250, 290 equation 251 Hamiltonian 249–251, 263–267, 276, 290, 291, 293, 294, 296, 302, 311, 313, 315, 342, 355, 356 interaction 291, 297, 303, 304, 313, 338, 357 Helical CT 3, 35, 42, 59 Helix 4, 8, 12, 14, 15, 23, 24, 34, 42, 43, 46, 50, 55 Hexcone model 94, 97, 98 double 94, 99 Histogram 123, 125, 126, 128–133, 153–158, 375 one-dimensional 123, 124, 129, 155, 156 smoothed 154, 156, 157 Holes 141, 142 Homogeneous polynomials 290, 311, 316, 329 Homogram 137 Hough transform 368, 371–373, 390 Hue 70, 76, 92, 94, 96, 97 component 96, 98, 100, 109 Human eye response 75, 88, 91 Human observer 67, 73, 77, 85, 86 Human visual system 73, 89 Hybrid color spaces (HCS) 102, 104, 109, 140, 147, 149, 151, 152
I Illuminants 80, 82–84, 90, 106, 107 Image acquired 73, 74 acquisition 79, 90 analysis 2, 101, 104, 161
binary 111–113, 141 coding 80, 104 data 2, 60 digital 66, 67, 89 matrix 205, 225 plane 73, 110, 111, 115, 119, 121, 139, 140, 153, 156, 265, 267, 269, 279, 283, 284, 287, 292, 293, 300, 304, 347, 349 processing 2, 104, 169, 175, 179 recognition 177, 179, 221, 227, 235 segmentation 65, 140, 147, 149, 161 sensor 67, 71, 73, 109 synthetic 111, 112, 115, 119, 120, 132, 136 Independent axis spaces 76, 100–102, 109 Independent component analyses 101 Indexing 364, 368, 371, 372, 389 Interpolation 8, 10, 178, 213, 221, 227, 230, 235 Intersection point (IP) 23, 24, 34–36, 38, 39, 42, 43, 47–51, 55 Irreducible subspaces 313, 314, 316, 318, 322, 324, 326, 328, 329, 340 ITU (International Telecommunication Union) 80, 83, 85, 89
K Karhunen–Loeve transformation 100, 101, 126, 128, 208 Kikuchi bands 367, 370, 371, 373, 390
L Lagrangian 248, 280, 298 perturbation method 269 Lattice strain 371, 375, 380, 409 Lie algebraic method 243, 244, 251, 273, 288–290, 296, 297, 302, 309, 312, 338 brackets 312, 340, 341, 345, 348, 353, 354 transform 296 Light 67, 68, 70, 73, 74, 242 optics 242, 243, 249, 333 Line, Pi 14, 15, 35, 37, 42, 43 Luminance 70, 78, 81, 84, 85, 88 component 81, 85, 86, 88, 89, 91–94, 101
Luminance-chrominance spaces 76, 84, 85, 91–94, 101, 102, 107
M Maps 274, 313, 314, 335, 374–376, 379, 384, 385, 389–391, 393, 398, 399, 404 transfer 243, 311, 314 Material 67, 68, 70, 371, 382, 387, 389, 392–394, 399, 401, 405, 409, 411 crystalline 364, 411 multiphase 383, 384, 407, 408 Matrix, orthogonal 178, 179, 205, 208, 209, 234 Maxwell triangle 78, 96 Mean square error (MSE) 144, 145, 148 Mean-shift method 138 Metals, fcc 402, 411 Microstructure 364, 367, 368, 374, 375, 380–385, 387, 395, 400, 403, 404 characterization 384, 385, 387, 406, 411 Microtexture 367, 375, 380, 404, 406, 407, 411 Misorientation 367, 374, 375, 380, 381, 383, 385, 393, 396, 399–402, 409 axes 379, 396, 401, 402 map 376, 380, 410 Mode detection 125, 126, 129–132, 154 Monitors 67, 70, 74, 75, 80 Monochromatic 267, 295, 297 Multiples of a random distribution (MRD) 399–402
N NTSC (National Television Standards Committee) 75, 80, 88, 90, 106 Numerical methods 73, 257, 271, 272, 275 Numerical solution 256, 257, 271
O Object plane 262, 265, 267, 272, 286, 298, 303, 305, 310, 329, 334, 337, 349, 351 points 7, 12, 14, 15, 20, 23–25, 28, 31, 32, 35, 38, 40–43, 45, 47, 50 position 272, 277, 278, 334, 337 Optical systems 72, 242–244, 276, 333
Orientation 68, 85, 96, 177, 258, 367, 374–376, 379, 380, 385, 396, 399, 403, 409 crystallographic 367, 368, 370, 374, 376, 396 distributions 376, 380 mapping 368, 375, 382–385, 408 maps 364, 368, 374, 375, 381–383, 385, 397
P Paraxial approximation 312 Partition 66, 129, 193, 215 Pattern quality 371, 375 Patterns 177, 367, 369–372, 375, 379–382, 388–390, 405 Perturbation methods 243, 244, 269, 273, 280, 295, 297, 309, 312 parameter 278, 280, 285, 295 Phase identification 364, 383, 384, 388, 407, 408, 410 space 243, 288, 296, 309 Phase-space variables 250, 311–313 Phosphor screen 364, 365, 367, 368, 374 Photons 4, 67, 71 Pixel 66, 72–74, 79, 80, 100–102, 109–112, 114, 116–119, 121–123, 125–141, 144, 145, 147, 149, 152, 153, 156–161, 203, 374, 379, 389, 390, 404 classes 123, 125, 131–133, 136, 138, 149, 161 classification 123, 147, 149, 159 Pixels colors of 110, 119, 123, 130 connected 110, 121, 138 neighboring 117, 137 Planes 20, 24, 31, 36, 50, 78, 81, 95, 170, 258, 331, 370, 399–401 chromatic 119, 120, 125, 130, 132, 136 Player pixels 150–153 Poisson brackets 274, 276, 288–290, 296, 312, 331, 340, 341 equation 252, 255, 257 Polynomial solution 269, 274
space 274, 277, 310, 314, 316, 322, 324, 326 Polynomials 257, 263, 269, 290, 305, 313, 315–317, 319, 320, 322–328, 330, 331, 333, 336, 339–341, 344, 345, 347, 349, 354, 355 anisotropic 339, 341 axial symmetric 242, 332, 338–340 fifth-order 348, 351, 354, 355 isotropic 339, 341 k-th order homogeneous 278 real 242, 318, 324, 326, 333 symmetric 324, 338–341 Pratt’s discrepancy measure 142, 143 Primaries 69, 70, 75–83, 89, 106 Primary spaces 76, 80, 81, 89, 90, 92, 94, 102, 106, 107, 109 Principal component analysis (PCA) 100, 101, 126 Prism 72 Probabilistic error approach 141, 144, 145 Probability error 143, 144, 148 Probe current 392, 393 Projection matrices 235 Projections 6, 10, 11, 13–15, 17, 20, 28, 32–34, 36–38, 40, 41, 43, 55, 56, 78, 96–99, 136, 178, 208–210, 213, 227, 369, 370
Q Quadratic polynomials 244, 311, 312, 315 Quadratic part of Hamiltonian 263–266, 291, 311, 315 of Lagrangian 280 Quantitative evaluation methods 140, 141, 143, 146
R Radon matrix 205, 206, 211, 212 p-adic 199, 200 planes 16, 19, 23, 24, 34, 36, 38–40, 42, 43, 50, 51, 55 transform 15–19, 170–172, 175, 178, 190, 191, 204, 207, 214, 217, 224, 231, 234 classical 171 discrete 173, 175, 177–179, 197, 207, 210, 213, 221, 225–227, 231
discrete p-adic 178, 197, 199, 202–204, 207, 234 generalized discrete 177–179, 205–207, 209–213, 235 standard 177, 226, 227, 230, 231, 235 Rank order 155, 158 Rays 245, 262, 264, 268, 271–273, 283, 286, 287, 294, 310, 337, 347 perturbed 282, 283, 292 Receptors 71, 72 Reconstruction 2, 3, 5, 20, 27, 28, 50, 53, 55, 59, 170, 185, 197, 373, 383, 404 algorithms 2, 3, 15, 17, 19, 23, 24, 34, 38, 41, 42, 44, 50, 55, 60 exact 12, 27, 49 Reference white 76–80, 82, 85, 86, 90, 106, 109, 161 Reflection symmetry 332, 333 Region construction 115, 117, 119, 140, 143, 147, 148 Region growing method 115 Region of interest (ROI) 7 Regions 7, 51, 66, 85, 101, 110, 111, 115–119, 121, 125, 126, 130, 133, 136–139, 144–147, 149, 153, 158, 159, 161, 168, 215, 374, 375, 380, 381 adjacent 110, 116–118, 379 inhomogeneous color 147 neighboring 116 reconstructed 126, 139, 159 uniform color 118 Representation 75, 94, 99, 109, 184, 185, 234, 244, 311, 313, 318 by biorthogonal wavelets 185 of h2 313, 314, 316, 324, 325 of Lie group 244 of sl(2, R) 319 of Sp(2, R) 244 of sp(2, R) 319 Resolution 2, 71, 72, 137, 181, 186, 242, 390, 392, 393, 400, 401, 410 cell 217 effective 393 RGB color cube 77, 78, 81, 94, 96 Ridgelet 190, 191, 221, 234 local 217, 218, 222, 223 transform 177–179, 190, 191, 193, 231, 234–236 finite 178
generalized discrete 212, 215, 234 generalized local 216, 217 local 178, 213, 219, 220, 234 ROC (receiver operator characteristic) curves 144–146, 148 Rotation 2, 177, 178, 227, 230, 258, 259, 311 angles 227, 230 axis 6, 8, 9, 32 Round magnetic lens 244, 273, 297, 309, 338, 342, 350, 357
S Sampled volume 368, 370, 374, 380, 392 SAR (synthetic aperture radar) images 217, 218, 239 Saturation 70, 92, 94, 95 components 76, 96–100 Scale invariance 227 Scaling functions 181, 185, 186 Scanning electron microscope, see SEM Screen 74, 75 Segment, Pi 14, 15, 27, 42, 43 Segmentation 66, 110, 111, 115, 117, 118, 121, 128, 137–141, 144–146, 159–161, 398 Segmented image 117, 121, 126, 127, 130, 133, 141–144, 146–148, 159 Segments 2, 14, 37, 60, 61, 97, 111, 137, 147, 151, 230, 399 Seidel weight 329, 345, 348, 349, 351, 353–355 Selection matrix 179, 205, 207–209, 211, 212 SEM 363–365, 368, 384, 389, 392, 393, 404, 406 image 368, 389, 399 tungsten-filament 369, 393 Sensor array 71, 72 Serial sectioning 383, 396, 399–401, 404, 406, 407 Shearlets 194, 196 Signal to noise ratio (SNR) 2, 231 Signals 69, 74, 88, 89, 180, 188, 189, 192, 209 Singularity 191, 193 Slices 41, 170, 177, 209, 214, 231, 235, 404–406 Slope 8, 245, 257, 262, 267, 272, 277, 279, 282, 286, 305, 310
Smoothness 176, 191 SMPTE (Society of Motion Picture and Television Engineers) 83, 85, 88 Source positions 8, 9, 11, 12, 28, 40, 43, 45, 47 Space of axially symmetric polynomials 339 of zero-mean functions 204 Spaces 75, 76, 81, 84, 85, 90, 91, 100, 101, 104, 106, 111, 145, 147, 158, 171, 173, 202, 315, 325, 401 antagonist 89, 91 perceptual 76, 92, 102, 109 perceptually uniform 85, 86, 94 television 85, 88, 90 Spatial resolution 369, 382, 388–390, 392, 393, 407 Spatial-color classification 136 Spatial-frequency tiling 194–197 SPD (spectral power distribution) 68, 70, 71, 73 Speckle 217 Spectral response 73 Sphere in Radon space 17, 18 Spherical aberration 244, 270, 271, 330, 338, 345, 347–349, 351 Spot, focal 5, 7, 10–12, 20, 25 Steels, austenitic 371, 372, 381, 382, 404, 410 Stereographic projection 10, 401 Stigmatic systems 244, 261, 264, 283, 284, 287, 292, 294, 311, 314 Structure, algebraic 316, 340, 342, 345 Sunrise 14, 15 Sunset 14, 15 System Deutsches Institut für Normung (DIN) 75 Munsell 75 Natural Color System (NCS) 75 Optical Society of America 75
T Tangency 37, 39, 58 Taylor polynomial 310 Texture 376, 380, 381 Thresholding hard 220, 222 soft 219, 220, 222
Thresholds 111, 115–117, 125, 129, 130, 134, 154, 158, 159 Time parameterization 245, 248, 249, 272 Trajectories 3, 6, 10, 19, 20, 23, 24, 26, 32–35, 42, 60, 245–249, 251, 257, 267, 297 circular 6, 10, 24 helical 5, 6, 8, 12, 17, 42 Trajectory equations 242–245, 247–251, 257, 258, 261, 263, 265, 267–269, 272, 274, 277, 278, 280, 295, 304, 305 paraxial 258–261, 264 method 243, 273, 276, 277, 279, 295, 297, 304, 309, 310 Transform, rapid 226, 227, 231, 235 Transformation 78, 84, 101, 105, 106, 161, 250, 289–292, 303, 311, 312, 314, 332, 335–337, 344, 347, 362 matrix 80, 85, 88, 90, 91, 101, 106, 337 phase 408, 409, 413 Transmission electron microscope (TEM) 380, 392 Triangle model 94, 95, 97 Triangular grid 207–209, 212 Triple junctions 394, 395, 397 Tristimulus values 69, 73, 78, 81, 86, 92 Tube-detector system 2, 4–6
V Values, singular 227, 232 Vision, human color 66, 67
W Wavelengths 68, 70, 71, 73, 76, 80, 242 Wavelet approximation error 188, 190 coefficients 186, 210 transform 178, 181, 183, 186, 190 Wavelets 178, 179, 181, 183, 186, 188, 190, 197, 217, 234 biorthogonal 183, 185 Wedge-detector projections 8 White point 76, 83, 98 Window boundary, Pi 14, 43, 51, 52, 57 Windows, Pi 12–15, 27, 28, 50, 55, 56, 59
X X-ray tube 2, 4, 7
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 2. Color mixture. (a) Additive color mixture. (b) Subtractive color mixture.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 3. Color camera. (a) One-chip color camera. (b) Three-chip color camera.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 4. RGB color cube.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 10. Triangle model.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 11. Hexcone model.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 12. Double hexcone model.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 13. Color space families.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 16. Synthetic image whose color is coded in different color spaces. (a) Color synthetic (R, G, B) image. (b) Image of (a) coded in the (Y, U, V) color space. (c) Image of (a) coded in the (I, S, T) color space.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 17. Different approaches to color edge detection. (a) Analysis of edge binary images. (b) Analysis of the norm of a color gradient. (c) Analysis of a color gradient vector.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 18. Edge pixels of the images in Figure 16 detected by Di Zenzo’s method. The parameters α, Thh, and Thl are set to 1.0, 10, and 5, respectively. (a) (R, G, B) color space. (b) (Y, U, V) color space. (c) Triangular (I, S, T) color space.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 19. Segmentation of the images in Figure 16 by region growing. The region grows until the maximum of the differences between a color component of the region and the corresponding component of the examined pixel exceeds 10. (a) (R, G, B) color space. (b) (Y, U, V) color space. (c) Triangular (I, S, T) color space.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 20. Adjacency graph of regions. (a) Presegmented image. (b) Schematic adjacency graph of the image in (a).
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 21. Clusters of color points of the image in (a) in the (R, G, B) color space. (a) Color synthetic image. (b) Clusters of color points in the (R, G, B) color space. (c) Clusters of color points in the (R, G) chromatic plane. (d) Clusters of color points in the (G, B) chromatic plane. (e) Clusters of color points in the (R, B) chromatic plane.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 22. Clusters of color points of the image in Figure 21a in different color spaces. (a) Clusters of color points in the (Y, U, V) color space. (b) Clusters of color points in the (I, S, T) color space.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 23. Clusters corresponding to different classes projected onto different color spaces. (a) Classes of pixels to be considered. (b) Color points in the (R, G, B) color space. (c) Color points in Carron’s (Y, Ch1, Ch2) color space. (d) Color points in the hybrid (x, Ch2, I3) color space.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 24. One-dimensional histograms of the image in Figure 16a. (a) One-dimensional histogram of the color component image IR. (b) One-dimensional histogram of the color component image IG. (c) One-dimensional histogram of the color component image IB.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 25. Mode detection in the (G, B) chromatic plane by analysis of the 1D histograms H^G[I] and H^B[I] of the image in Figure 21a.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 26. Iterative segmentation of the Hand image. (a) Hand image. (b) Pixels assigned to the class built at the first iteration. (c) Pixels to be analyzed after the first iteration. (d) Pixels assigned to the class built at the second iteration. (e) Pixels to be analyzed after the second iteration. (f) Pixels assigned to the class built at the third iteration. (g) Pixels to be analyzed after the third iteration. (h) Pixels assigned to the class built at the fourth iteration. (i) Pixels to be analyzed after the fourth iteration. (j) Pixels assigned to the class built at the fifth iteration. (k) Pixels to be analyzed after the fifth iteration. (l) Segmented image.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 28. One-dimensional histograms of the IX1, IX2, and IX3 color component images in Figure 27. (a) One-dimensional histogram of the image in Figure 27a. (b) One-dimensional histogram of the image in Figure 27b. (c) One-dimensional histogram of the image in Figure 27c.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 29. Image in Figure 21a segmented by Tominaga’s scheme. (a) Pixels assigned to the class built at the first iteration step. (b) Image segmented after the last iteration step.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 30. Mode detection by convexity analysis of the color histogram. (a) Image. (b) Histogram. (c) Modes of the histogram detected by the convexity test, identified by false colors. (d) Prototypes of the three pixel classes built. (e) Pixels assigned to the three classes.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 31. Clustering-based segmentation of the image in Figure 21a. (a) Image segmented by the c-means scheme with a random initialization of the gravity centers. (b) Image segmented by the c-means scheme with an interactive initialization of the gravity centers. (c) Image segmented by the competitive learning scheme.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 32. Relationship between the regions in the image and the clusters of color points in a color space. (a) Original synthetic image. (b) Clusters of color points projected onto the (R, G) chromatic plane.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 33. Image in Figure 32a segmented by mean-shift analysis. Pixel labels are shown in false colors. (a) hs = 8, hr = 4, M = 1. (b) hs = 8, hr = 8, M = 1.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 34. Image in Figure 32a segmented by the SCD analysis. (a) Color domains of the image in Figure 32a automatically selected by the proposed procedure. (b) Color image segmented by analysis of the image in Figure 32a. The boundaries of the reconstructed regions are displayed in black.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 35. Example of three classes of pixels representing the players of the two opposing teams and the referee.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 36. Color soccer images (125 × 125 pixels).
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 37. Player pixels extracted from the images in Figure 36.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 38. Player pixels of Figure 37 classified in the hybrid color space.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 39. Ground truth of the player pixels of Figure 37.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 41. One-dimensional histograms $H^{i,j}$ of the House image in the (R, G, B) color space ($i = 1$). (a) Original image. (b) Red one-dimensional histogram $H^{1,1}$. (c) Green one-dimensional histogram $H^{1,2}$. (d) Blue one-dimensional histogram $H^{1,3}$.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 42. Smoothed one-dimensional histograms $H_\sigma^{i,j}$ computed by Lin et al.'s method from the one-dimensional histograms of Figure 41. (a) Original image. (b) Smoothed one-dimensional histogram $H_\sigma^{1,1}$. (c) Smoothed one-dimensional histogram $H_\sigma^{1,2}$. (d) Smoothed one-dimensional histogram $H_\sigma^{1,3}$.
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 43. Features of the detected modes from the smoothed 1D histogram $H_\sigma^{1,3}$.
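Figures 42 and 43 show smoothed one-dimensional histograms and the modes detected on them. The chapter's smoothing follows Lin et al.'s method, whose details are not given in these captions, so the sketch below swaps in plain Gaussian smoothing with standard deviation `sigma` and treats each local maximum as a mode, reporting its bin and height. The kernel, the function names and the chosen mode features are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_smooth(hist: Sequence[float], sigma: float) -> List[float]:
    """Convolve a 1D histogram with a normalized Gaussian truncated at 3*sigma.

    Bins outside the histogram are treated as empty (zero).
    """
    radius = max(1, int(math.ceil(3 * sigma)))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (t / sigma) ** 2) for t in offsets]
    norm = sum(kernel)
    kernel = [w / norm for w in kernel]
    n = len(hist)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for t, w in zip(offsets, kernel):
            j = i + t
            if 0 <= j < n:
                acc += w * hist[j]
        smoothed.append(acc)
    return smoothed


def detect_modes(smoothed: Sequence[float],
                 min_height: float = 0.0) -> List[Tuple[int, float]]:
    """Return (bin, height) for each local maximum higher than min_height.

    On a flat top, only the leftmost bin of the plateau is reported.
    """
    modes = []
    n = len(smoothed)
    for i in range(n):
        left = smoothed[i - 1] if i > 0 else -math.inf
        right = smoothed[i + 1] if i < n - 1 else -math.inf
        if smoothed[i] > left and smoothed[i] >= right and smoothed[i] > min_height:
            modes.append((i, smoothed[i]))
    return modes


if __name__ == "__main__":
    # Hypothetical bimodal histogram with two spikes.
    hist = [0.0] * 20
    hist[4] = hist[15] = 10.0
    print(detect_modes(gaussian_smooth(hist, sigma=1.5)))
```

Raising `sigma` merges nearby spikes into a single mode, which is why the smoothing scale directly controls how many modes (and hence pixel classes) are found.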
BUSIN, VANDENBROUCKE AND MACAIRE, FIGURE 44. Segmentation of the House image (255 × 255 pixels) by the proposed method. (a) Original House image. (b) House image segmented by the proposed method. (c) House image segmented when the (R, G, B) color space is selected at each iteration. (d) House image segmented when the (Y, U, V) color space is selected at each iteration.
BUSIN, VANDENBROUCKE AND MACAIRE, TABLE 11. Number of modes $N^{I,J(I)}$ and discriminating power $R^{I,J(I)}$ of the most discriminating 1D histogram $J(I)$ of the color space $I$ selected at each step of the procedure applied to the House image.

Iteration step   I               J(I)   N^{I,J(I)}   R^{I,J(I)}
1                (Y, U, V)       3      4            3.42
2                (Y, U, V)       3      3            2.54
3                (R, G, B)       2      5            3.48
4                (R, G, B)       2      4            2.52
5                (bw, rg, by)    1      3            1.75
6                (I1, I2, I3)    1      3            1.94
7                (r, g, b)       1      2            1.20
8                (Y, U, V)       1      2            1.49
RANDLE, FIGURE 4. Sequence showing online diffraction pattern acquisition and indexing. (a) Processed pattern from an austenitic steel. (b) Hough transform (medium resolution) of the diffraction pattern in (a). (c) Detection of bands in the processed pattern. (d) Indexing of the pattern.
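Figure 4 relies on the Hough transform to turn the bands of a diffraction pattern into peaks that are easier to detect. The sketch below is a generic intensity-weighted straight-line Hough transform, assuming the pattern is a 2D list of intensities. The function names, the 1-degree angular step and the integer rounding of rho are illustrative assumptions, not the settings of any EBSD acquisition software.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, int]  # (angle index k, rho in pixels)


def hough_transform(image: Sequence[Sequence[float]],
                    theta_steps: int = 180) -> Dict[Cell, float]:
    """Intensity-weighted straight-line Hough transform.

    Pixel (x, y) with intensity v > 0 adds v to cell (k, rho) for every k,
    where theta_k = k * pi / theta_steps and
    rho = round(x * cos(theta_k) + y * sin(theta_k)).
    """
    trig = [(math.cos(k * math.pi / theta_steps), math.sin(k * math.pi / theta_steps))
            for k in range(theta_steps)]
    acc: Dict[Cell, float] = defaultdict(float)
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v <= 0:
                continue  # background pixels cast no votes
            for k, (c, s) in enumerate(trig):
                acc[(k, round(x * c + y * s))] += v
    return dict(acc)


def strongest_cell(acc: Dict[Cell, float]) -> Tuple[Cell, float]:
    """Return the accumulator cell with the most votes (the dominant line)."""
    return max(acc.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical 10 x 100 pattern with one bright horizontal line on row 5.
    pattern = [[1.0 if y == 5 else 0.0 for _ in range(100)] for y in range(10)]
    (k, rho), votes = strongest_cell(hough_transform(pattern))
    print("theta (deg):", k, "rho:", rho, "votes:", votes)
```

A bright horizontal line on row 5 collapses into a single accumulator peak at theta = 90 degrees and rho = 5; real Kikuchi bands have a finite width, which is why practical band detection looks for butterfly-shaped peaks rather than single cells.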
RANDLE, FIGURE 6. Basic orientation maps from an annealed nickel specimen. (a) Diffraction pattern quality map wherein decreasing pattern quality is shown by increasing grayscale and unsolved patterns are depicted black. (b) Microtexture map showing crystallographic orientations of the specimen normal direction. (c) Color key for the map in (b). (d) Misorientation map showing random high-angle (>15 degrees) boundaries in black, low-angle (3–15 degrees) boundaries in gray, and Σ3, Σ9, and Σ27 interfaces in red, blue, and yellow, respectively. (e) Raw-data diffraction pattern quality map with unsolved pixels in black.
RANDLE, FIGURE 6 (continued).
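Panel (d) of Figure 6 shades boundaries according to their misorientation angle. Assuming the misorientation angle between neighboring pixels has already been computed (crystal symmetry included), the angular thresholds quoted in the caption reduce to a small classification rule. The function and label names are hypothetical, and the Σ3, Σ9 and Σ27 categories, which require a CSL test on the full misorientation rather than on the angle alone, are not covered.

```python
# Shades used for the angle-based classes of Figure 6d (CSL colors not modeled here).
MAP_SHADE = {"high-angle": "black", "low-angle": "gray"}


def classify_boundary(misorientation_deg: float) -> str:
    """Classify a boundary from its misorientation angle in degrees.

    Thresholds follow the caption: above 15 degrees is high-angle,
    3 to 15 degrees is low-angle; smaller angles are not drawn.
    """
    if misorientation_deg < 0:
        raise ValueError("misorientation angle must be non-negative")
    if misorientation_deg > 15.0:
        return "high-angle"
    if misorientation_deg >= 3.0:
        return "low-angle"
    return "not drawn"


if __name__ == "__main__":
    for angle in (1.0, 3.0, 15.0, 40.0):
        label = classify_boundary(angle)
        print(angle, label, MAP_SHADE.get(label))
```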
RANDLE, FIGURE 8. Orientation map showing an intergranular crack path in type 304 austenitic steel. Σ3, Σ9, and all other CSL boundaries are red, blue, and yellow, respectively. The gray background shows pattern quality, and unsolved pixels are black. (Courtesy D. Engleberg.)
RANDLE, FIGURE 16. Schematic illustration of the principle of the five-parameter stereology method. T is the boundary trace direction and N is the boundary plane normal. If a sufficiently large number of observations is made, the true boundary plane accumulates more observations than the false planes and forms a peak in the distribution at N. See text for details. (Courtesy G. Rohrer.)
RANDLE, FIGURE 17. Distribution of boundary planes in a brass specimen averaged over all misorientations and expressed as multiples of a random distribution (MRD), shown in standard stereographic projection. (a) All boundaries. (b) Σ3 boundaries excluded.
RANDLE, FIGURE 18. Distribution of boundary planes for selected misorientations in a brass specimen expressed as multiples of a random distribution (MRD) and shown in standard stereographic projection. (a) 30 degrees/[100] misorientation, (b) 30 degrees/[111] misorientation, (c) 30 degrees/[110] misorientation. The [110] misorientation axis is marked on (c).
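Figures 17 and 18 express boundary-plane distributions as multiples of a random distribution (MRD): each bin's observed fraction is divided by the fraction a random (uniform) distribution would place in that bin, so a value of 1 means "as frequent as random". The sketch below shows only this normalization, assuming the plane normals have already been binned and that the expected random fraction of a bin is proportional to its area. The function name `to_mrd` is hypothetical, and the five-parameter stereology that produces the counts is not shown.

```python
from typing import List, Optional, Sequence


def to_mrd(counts: Sequence[float],
           areas: Optional[Sequence[float]] = None) -> List[float]:
    """Express binned counts as multiples of a random distribution (MRD).

    MRD of a bin = (observed fraction) / (fraction expected for a random
    distribution), the latter taken proportional to the bin area.
    Equal-area bins are assumed when areas is None.
    """
    if areas is None:
        areas = [1.0] * len(counts)
    if len(areas) != len(counts):
        raise ValueError("counts and areas must have the same length")
    total = float(sum(counts))
    total_area = float(sum(areas))
    if total <= 0 or total_area <= 0:
        raise ValueError("need a positive total count and total area")
    return [(n / total) / (a / total_area) for n, a in zip(counts, areas)]


if __name__ == "__main__":
    # Hypothetical counts in four equal-area bins.
    print(to_mrd([10, 30, 20, 20]))
```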
RANDLE, FIGURE 20. (a) EBSD maps of consecutive FIB sections through partially recrystallized Ni–0.3wt% Si. The slices represent section depths of 0.2 µm ± 0.05 µm. SiO2 particles and recrystallized grains containing annealing twins are seen in the deformed matrix. (b) 3D rendering of (a). (Courtesy M. Ferry.)
RANDLE, FIGURE 20 (continued).