Chemometrics in Spectroscopy
Howard Mark
Mark Electronics
Suffern, New York
USA

Jerry Workman Jr.
Thermo Fisher Scientific Inc.
Molecular Spectroscopy & Microanalysis
Madison, WI
USA
Amsterdam • Boston • Heidelberg • London • New York • Oxford
Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo
Academic Press is an imprint of Elsevier
84 Theobald’s Road, London WC1X 8RR, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
Linacre House, Jordan Hill, Oxford OX2 8DP, UK
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA

First edition 2007

Copyright © 2007 Elsevier Inc. All rights reserved

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

ISBN: 978-0-12-374024-3
For information on all Academic Press publications visit our website at books.elsevier.com
Printed and bound in USA 07 08 09 10 11
10 9 8 7 6 5 4 3 2 1
Working together to grow libraries in developing countries www.elsevier.com | www.bookaid.org | www.sabre.org
Dedication
To our families and to our readers ...
– Howard Mark and Jerry Workman
Contents

Preface
Note to Readers
1. A New Beginning ...
2. Elementary Matrix Algebra: Part 1
3. Elementary Matrix Algebra: Part 2
4. Matrix Algebra and Multiple Linear Regression: Part 1
5. Matrix Algebra and Multiple Linear Regression: Part 2
6. Matrix Algebra and Multiple Linear Regression: Part 3 – The Concept of Determinants
7. Matrix Algebra and Multiple Linear Regression: Part 4 – Concluding Remarks
8. Experimental Designs: Part 1
9. Experimental Designs: Part 2
10. Experimental Designs: Part 3
11. Analytic Geometry: Part 1 – The Basics in Two and Three Dimensions
12. Analytic Geometry: Part 2 – Geometric Representation of Vectors and Algebraic Operations
13. Analytic Geometry: Part 3 – Reducing Dimensionality
14. Analytic Geometry: Part 4 – The Geometry of Vectors and Matrices
15. Experimental Designs: Part 4 – Varying Parameters to Expand the Design
16. Experimental Designs: Part 5 – One-at-a-time Designs
17. Experimental Designs: Part 6 – Sequential Designs
18. Experimental Designs: Part 7 – β, the Power of a Test
19. Experimental Designs: Part 8 – β, the Power of a Test (Continued)
20. Experimental Designs: Part 9 – Sequential Designs Concluded
21. Calculating the Solution for Regression Techniques: Part 1 – Multivariate Regression Made Simple
22. Calculating the Solution for Regression Techniques: Part 2 – Principal Component(s) Regression Made Simple
23. Calculating the Solution for Regression Techniques: Part 3 – Partial Least Squares Regression Made Simple
24. Looking Behind and Ahead: Interlude
25. A Simple Question: The Meaning of Chemometrics Pondered
26. Calculating the Solution for Regression Techniques: Part 4 – Singular Value Decomposition
27. Linearity in Calibration
28. Challenges: Unsolved Problems in Chemometrics
29. Linearity in Calibration: Act II Scene I
30. Linearity in Calibration: Act II Scene II – Reader’s Comments ...
31. Linearity in Calibration: Act II Scene III
32. Linearity in Calibration: Act II Scene IV
33. Linearity in Calibration: Act II Scene V
34. Collaborative Laboratory Studies: Part 1 – A Blueprint
35. Collaborative Laboratory Studies: Part 2 – using ANOVA
36. Collaborative Laboratory Studies: Part 3 – Testing for Systematic Error
37. Collaborative Laboratory Studies: Part 4 – Ranking Test
38. Collaborative Laboratory Studies: Part 5 – Efficient Comparison of Two Methods
39. Collaborative Laboratory Studies: Part 6 – MathCad Worksheet Text
40. Is Noise Brought by the Stork? Analysis of Noise: Part 1
41. Analysis of Noise: Part 2
42. Analysis of Noise: Part 3
43. Analysis of Noise: Part 4
44. Analysis of Noise: Part 5
45. Analysis of Noise: Part 6
46. Analysis of Noise: Part 7
47. Analysis of Noise: Part 8
48. Analysis of Noise: Part 9
49. Analysis of Noise: Part 10
50. Analysis of Noise: Part 11
51. Analysis of Noise: Part 12
52. Analysis of Noise: Part 13
53. Analysis of Noise: Part 14
54. Derivatives in Spectroscopy: Part 1 – The Behavior of the Derivative
55. Derivatives in Spectroscopy: Part 2 – The “True” Derivative
56. Derivatives in Spectroscopy: Part 3 – Computing the Derivative
57. Derivatives in Spectroscopy: Part 4 – Calibrating with Derivatives
58. Comparison of Goodness of Fit Statistics for Linear Regression: Part 1 – Introduction
59. Comparison of Goodness of Fit Statistics for Linear Regression: Part 2 – The Correlation Coefficient
60. Comparison of Goodness of Fit Statistics for Linear Regression: Part 3 – Computing Confidence Limits for the Correlation Coefficient
61. Comparison of Goodness of Fit Statistics for Linear Regression: Part 4 – Confidence Limits for Slope and Intercept
62. Correction and Discussion Regarding Derivatives
63. Linearity in Calibration: Act III Scene I – Importance of Nonlinearity
64. Linearity in Calibration: Act III Scene II – A Discussion of the Durbin-Watson Statistic, a Step in the Right Direction
65. Linearity in Calibration: Act III Scene III – Other Tests for Nonlinearity
66. Linearity in Calibration: Act III Scene IV – How to Test for Nonlinearity
67. Linearity in Calibration: Act III Scene V – Quantifying Nonlinearity
68. Linearity in Calibration: Act III Scene VI – Quantifying Nonlinearity, Part II, and a News Flash
69. Connecting Chemometrics to Statistics: Part 1 – The Chemometrics Side
70. Connecting Chemometrics to Statistics: Part 2 – The Statistics Side
71. Limitations in Analytical Accuracy: Part 1 – Horwitz’s Trumpet
72. Limitations in Analytical Accuracy: Part 2 – Theories to Describe the Limits in Analytical Accuracy
73. Limitations in Analytical Accuracy: Part 3 – Comparing Test Results for Analytical Uncertainty
74. The Statistics of Spectral Searches
75. The Chemometrics of Imaging Spectroscopy
Glossary of Terms
Index
Colour Plate Section
Preface
This large single volume fulfils the need for chemometric-based tutorials on topics of interest to analytical chemists or other scientists performing modern mathematical and statistical operations for use with analytical measurements. The book covers a very broad range of chemometric topics, as indicated in the extensive table of contents. This book is a collection of the series of columns first published in Spectroscopy, providing detailed mathematical and philosophical discussions on the use of chemometrics and statistical methods for scientific measurements and analytical methods. In addition, the new revolution in biotechnology and the use of spectroscopic techniques therein provides an opportunity for those scientists to strengthen their use of mathematics and calibration through the use of this book.

Subjects covered include those of interest to many groups of scientists, mathematicians, and practicing analysts for daily problem solving, as well as detailed insights into subjects that are difficult for the non-specialist to grasp thoroughly. The coverage relies more on concept delineation than on rigorous mathematics, but the descriptive mathematics and derivations are included for the more rigorously minded. Sections on matrix algebra, analytic geometry, experimental design, instrument and system calibration, noise, derivatives and their use in data analysis, and linearity and nonlinearity are included. Collaborative laboratory studies, using ANOVA, testing for systematic error, ranking tests for collaborative studies, and efficient comparison of two analytical methods are also covered, as are the limitations in analytical accuracy, together with brief introductions to the statistics of spectral searches and the chemometrics of imaging spectroscopy.

The popularity of the Chemometrics in Spectroscopy series (ongoing since the early 1990s) as well as the Statistics in Spectroscopy series and books has been overwhelming, and we sincerely thank our readership over the years. We have received e-mails from many people; one memorable message thanked us for a career change that was made largely because of the renewed interest in statistics and chemometrics stimulated by our thought-provoking columns. We hope you find this collection useful and will continue to read the columns and write to us with your thoughts, comments, and questions regarding this stimulating topic.

Howard Mark
Suffern, NY

Jerry Workman
Madison, WI
Note to Readers
In some cases there were errors, both trivial and significant, in the original column from which a given chapter was taken. Sometimes we found the error ourselves (unfortunately after the column was printed) and sometimes, more embarrassingly, the error was brought to our attention by one of our ever-vigilant readers. For all significant errors, the necessary corrections were made in a subsequent column; in all cases, the corrected version is what is in this book. Sometimes, for the more serious errors, we note that the corresponding column was erroneous, so that any reader who wants to go back to the original will be aware that a comparison with what is presented here will fail.
1
A New Beginning ...
Why do we title this chapter “A New Beginning ...”? Well, there are a lot of reasons. First of all, of course, is the simple fact that that is just the way we do things. Secondly, there is the fact that we developed this book in much the same way we developed our previous book Statistics in Spectroscopy (SiS). Those of you out there who have followed the series of articles published in Spectroscopy magazine since 1986 know that for the most part, each column in the series was pretty much self-contained and could stand alone, yet also fit into that series in the appropriate place and contributed to the flow of information in that series as a whole. We hope to be able to reproduce that on a larger scale. Just as the series Statistics in Spectroscopy (this is too long to write out each time; from here on we will abbreviate it SiS) was self-contained and stood alone, so too will we try to make this new series stand alone, and at the same time be a worthy successor to SiS, and also continue to develop the concepts we began there.

Thirdly, there is the fact that we are finally starting to write again. To you, our readership, it may seem like we have been writing continuously since we began SiS, but in fact we have been running on backlog for a longer time than you would believe. That was advantageous in that it allowed us time to pursue our personal and professional lives, including such other projects as arranging for SiS to be published as a book [1]. The downside of our getting ahead of ourselves, on the other hand, is that we were not able to keep you abreast of the latest developments related to our favorite topic.

However, since the last time we actually wrote something, there have been a number of noteworthy developments. Our last series dealt only with the elementary concepts of statistics related to the general practice of calibration used for UV-VIS-NIR and occasionally for IR spectroscopy.
Our purpose in writing SiS was to help provide a small foot bridge to cross the gap between specialized chemometrics literature written at the expert level and those general statistics articles and texts dealing with examples and questions far removed from chemistry or spectroscopic practice. Since the beginning of the “Statistics” series in 1986, several reviews, tutorials, and textbooks have been published to begin the construction of a major highway bridging this gap. Most notably, at least in our minds, have been tutorial articles on classical least squares (CLS), principal components regression (PCR), and partial least squares regression (PLSR) by Haaland and Thomas [2, 3]. Other important work includes textbooks on calibration and chemometrics by Naes and Martens [4], and Mark [5]. Chemometric reviews discussing the progress of tutorial and textbook literature appear regularly in Analytical Chemistry, Critical Review issues. Another recent series of articles on chemometric concepts termed “The Chemometric Space” by Naes and Isaksson has appeared [6]. In addition, there is a North American chapter of the International Chemometrics Society (NAmICS) which we are told has
over 300 members. Those interested in joining or obtaining further information may contact Professor Thomas O’Haver at the Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742 (Donald B. Dahlberg, 1993, personal communication).

All the foregoing was true as of when the Chemometrics column began in 1993. Now in 2006, when we are preparing this for book publication, there are many more sources of information about Chemometrics. However, since this is not a review of the field, we forbear to list them all, but will correct one item that has changed since then: to obtain information about NAmICS, or to join the discussion group, contact David Duewer at NIST ([email protected]) or send a message to the discussion group ([email protected]).

Finally, since imitation is the sincerest form of flattery (or so they tell us), we are pleased to see that others have also taken the route of printing longer tutorial discussions in the form of a series of related articles on a given topic. Two series that we have no qualms recommending, on topics related to ours, have appeared in some of the sister publications of Spectroscopy [7–15]. (Note: there have been recent indications that the series in Spectroscopy International has continued beyond the articles we have listed. If we can obtain more information we will keep you posted. Spectroscopy International has also undergone some transformations and it is not always easy to get copies.)

So, overall the chemometrics bridge between the lands of the overly simplistic and severely complex is well under construction; one may find at least a single lane open by which to pass. So why another series? Well, it is still our labor of love to deal with specific issues that plague ourselves and our colleagues involved in the practice of multivariate qualitative and quantitative spectroscopic calibration.
Having collectively worked with hundreds of instrument users over 25 combined years of calibration problems, we are compelled, like bees loaded with pollen, to disseminate the problems, answers, and questions brought about by these experiences. Then what would a series named “Chemometrics in Spectroscopy” hope to cover which is of interest to the readers of “Spectroscopy”?

We have been taken to task (with perhaps some justice) for using the broader title label “Chemometrics in Spectroscopy” for what we have claimed will be discussions of the somewhat narrower range of topics included in the field of multivariate statistical algorithms applied to chemical problems, when the term “Chemometrics” actually applies to a much wider range of topics. Nevertheless, we will use this title, for a number of reasons. First, that is what we said we were going to do, and we hate to not follow through, even on such a minor point. Secondly, we have said before (with all due arrogance) that this is our column, and we have been pretty fortunate that the editors of Spectroscopy have always pretty much let us do as we please. Finally, at this point we consider the possibility that we may very well eventually extend our range to include some of these other topics that the broader term will cover.

As of right now, some of the topics we foresee being able to expand upon over the series will include, but not be limited to
• The multivariate normal distribution
• Defining the bounds for a data set
• The concept of Mahalanobis distance
• Discriminant analysis and its subtopics of
  – Sample selection
  – Spectral matching (Qualitative analysis)
• Finding the maximum variance in the multivariate distribution
• Matrix algebra refresher
• Analytic geometry refresher
• Principal components analysis (PCA)
• Principal components regression (PCR)
• More on Multiple linear least squares regression (MLLSR), also known as Multiple linear regression (MLR) and P-matrix, and its sibling, K-matrix
• More on Simple linear least squares regression (SLLSR), also known as Simple least squares regression (SLSR) or univariate least squares regression
• Partial least squares regression (PLSR)
• Validation of calibration models
• Laboratory data and assessing error
• Diagnosis of data problems
• An attempt to standardize statistical/chemometric terms
• Special calibration problems (and solutions)
• The concept of outliers: theory and practice
• Standardization concepts and methods for transfer of calibrations
• Collaborative study problems related to methods and instruments.

We also plan to include in the discussions the important statistical concepts, such as correlation, bias, slope, and associated errors and confidence limits. Beyond this, it is also our hope that readers will write to us with their comments or suggestions for chemometric challenges which confront them. If time and energy permit, we may be able to discuss such issues as neural networks, general factor analysis, clustering techniques, maximizing graphical presentation of data, and signal processing.
THE MULTIVARIATE NORMAL DISTRIBUTION

We will begin with the concept of the multivariate normal distribution. Think of a cigar, suspended in space. If you cannot think of a cigar suspended in space, look at Figure 1-1a. Now imagine the cigar filled with little flecks of stuff, as in Figure 1-1b (it does not really matter what the stuff is, mathematics never concerned itself with such unimportant details). Imagine the flecks being more densely packed toward the middle of the cigar. Now imagine a swarm of gnats surrounding the cigar; if they are attracted to the cigar, then naturally there will be fewer of them far away from the cigar than close to it (Figure 1-1c). Next take away the cigar, and just leave the flecks and the gnats. By this time, of course, you should realize that the flecks and the gnats are really the same thing, and are neither flecks nor gnats but simply abstract representations of points in space. What is left looks like Figure 1-1d.
Figure 1-1 (panels a to d) Development of the concept of the Multivariate Normal Distribution (this one shown having three dimensions) – see text for details. The density of points along a cross-section of the distribution in any direction is also an MND, of lower dimension.
Figure 1-1d, of course, is simply a pictorial/graphical representation of what a Multivariate Normal Distribution (MND) would look like, if you could see it. Furthermore, it is a representation of only one particular MND. First of all, this particular MND is a three-dimensional MND. A two-dimensional MND will be represented by points in a plane, and a one-dimensional MND is simply the ordinary Normal distribution that we have come to know and love [16]. An MND can have any number of dimensions; unfortunately we humans cannot visualize anything with more than three dimensions, so for our examples we are limited to such pictures.

Also, the MND depicted has a particular shape and orientation. In general, an MND can have a variety of shapes and orientations, depending upon the dispersion of the data along the different axes. Thus, for example, it would not be uncommon for the dispersion along two of the axes to be equal and independent. In this case, which represents one limiting situation, an appropriate cross-section of the MND would be circular rather than elliptical. Another limiting situation, by the way, is for two or more of the variables to be perfectly correlated, in which case the data would lie along a straight line (or plane, or hyperplane as the corresponding higher-dimensional figure is called).

Each point in the MND can be projected onto the planes defined by each pair of the axes of the coordinate system. For example, Figure 1-2 shows the projection of the data onto the plane at the “bottom” of the coordinate system. There it forms a two-dimensional MND, which is characterized by several parameters. The two-dimensional MND is the prototype for all MNDs of higher dimension, and its properties are the key defining characteristics of any MND.

Figure 1-2 Projecting each point of the three-dimensional MND onto any of the planes defined by two axes of the coordinate system (or, more generally, any plane passing through the coordinate system) results in the projected points being represented by a two-dimensional MND. The correlation coefficients for the projections in all planes are needed to fully describe the original MND.

First of all, the data contributing to an MND itself has a Normal distribution along any of the axes of the MND. We have discussed the Normal distribution previously [16], and have seen that it is described by the expression:

$$ f(x) = a\,e^{-\left(\frac{x-\bar{x}}{\sigma}\right)^{2}} \tag{1-1} $$
The MND can be mathematically described by an expression that is similar in form, but has the characteristic that each of the individual parts of the expression represents the multivariate analog of the corresponding part of equation 1-1. Thus, for example, where $\bar{x}$ represents the mean of the data for which equation 1-1 describes the distribution, there is a corresponding quantity $\bar{X}$ that represents in matrix notation the fact that for each of the axes shown in Figure 1-1, each datum has a value, and therefore the collection of data has a mean value along each dimension. This quantity, represented as a list of the set of means along all the different dimensions, is called a vector, and is represented as $\bar{X}$ (as opposed to $\bar{x}$, an individual mean).

If we project the MND onto each axis of the coordinate system containing the MND, then as stated above, these projections of the data will be distributed as an ordinary Normal distribution, as shown in Figure 1-3. This distribution will itself then have a standard deviation, so that another defining characteristic of the MND is the standard deviation of the projection of the MND along each axis. This must also then be represented by a vector.
Figure 1-3 Projecting the points onto a line results in a point density that is our familiar Normal Distribution.
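The two vectors just described, the means and the standard deviations along each axis, can be computed directly from a table of data. The following is only a minimal sketch in Python: the function names and the data values are hypothetical, and the sample (n − 1) form of the standard deviation is an assumption on our part.

```python
import math

def column_means(rows):
    """Mean of each variable (column); together these form the mean vector."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def column_std_devs(rows):
    """Sample standard deviation (n - 1 denominator) of each variable."""
    n = len(rows)
    means = column_means(rows)
    return [
        math.sqrt(sum((value - mean) ** 2 for value in col) / (n - 1))
        for col, mean in zip(zip(*rows), means)
    ]

# Hypothetical data: four observations (rows) of three variables (columns).
data = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.7],
    [3.0, 5.9, 0.4],
    [4.0, 8.0, 0.8],
]

mean_vector = column_means(data)    # one mean per axis
std_vector = column_std_devs(data)  # one standard deviation per axis
print(mean_vector)
print(std_vector)
```

Each entry of `mean_vector` and `std_vector` corresponds to the projection of the data onto one axis, as in Figure 1-3.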
The final key point to note about the MND, which can also be seen from Figure 1-2, is the fact that when the MND is projected onto the plane defined by any two axes of the coordinate system the data may show some correlation (as does the data in Figure 1-2). In fact, the projection onto any of the planes defined by two of the axes will have some value for the correlation coefficient between the corresponding pair of variables. The amount of correlation between projections along any pair of axes can vary from zero, in which case the data would lie in a circular blob, to unity, in which case the data would all lie exactly on a straight line. Since each pair of axes defines another plane, many such projections may be possible, depending on the number of dimensions in which the MND exists. Indeed, every possible pair of axes in the coordinate system defines such a plane. As we have noted, we mere mortals cannot visualize more than three dimensions, and so our examples and diagrams will be limited to showing data in three or fewer dimensions, but the mathematical descriptions can be extended with all generality to as high a dimensionality as might be needed.

Thus, the full description of the MND must include all the correlations of the data between every pair of axes. This is conventionally done by creating what is known as the correlation matrix. This matrix is a square matrix, in which any given row or column corresponds to a variable, and the individual positions (i.e., the m, n position for example, where m and n represent indices of the variables) in the matrix represent the correlation between the variable represented by the row it lies in and the variable represented by the column it lies in. In actuality, for mathematical reasons, the correlation itself is not used; rather, the related quantity, the covariance, replaces the correlation coefficient in the matrix.
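To make the construction of these square matrices concrete, here is a small sketch in Python that builds both the covariance matrix and the correlation matrix for a data set. The function names and the data are hypothetical, and the n − 1 denominator is our assumption; it is an illustration of the idea rather than a prescribed procedure.

```python
import math

def covariance_matrix(rows):
    """Sample covariance (n - 1 denominator) for every pair of variables."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    return [
        [
            sum((a - mean_i) * (b - mean_j) for a, b in zip(col_i, col_j)) / (n - 1)
            for col_j, mean_j in zip(columns, means)
        ]
        for col_i, mean_i in zip(columns, means)
    ]

def correlation_matrix(rows):
    """Divide each covariance by the two standard deviations involved."""
    cov = covariance_matrix(rows)
    size = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(size)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(size)] for i in range(size)]

# Hypothetical data: five observations (rows) of two variables (columns).
data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]

cov = covariance_matrix(data)
corr = correlation_matrix(data)
print(cov)
print(corr)
```

Position (m, n) of either matrix relates variable m to variable n, so both matrices are symmetric: the (m, n) and (n, m) entries are the same number.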
The elements of the matrix that lie along what is called the main diagonal (i.e., where the column and row numbers are the same) are then the variances (the square of the standard deviation – this shows that there is a rather close relationship between the standard deviation and the correlation) of the data. This matrix is thus called the variance-covariance matrix, and sometimes just the covariance matrix for simplicity.

Since it is necessary to represent the various quantities by vectors and matrices, the operations for the MND that correspond to operations using the univariate (simple) Normal distribution must be matrix operations. Discussion of matrix operations is beyond the scope of this column, but for now it suffices to note that the simple arithmetic operations of addition, subtraction, multiplication, and division all have their matrix counterparts. In addition, certain matrix operations exist which do not have counterparts in simple arithmetic. The beauty of the scheme is that many manipulations of data using matrix operations can be done using the same formalism as for simple arithmetic, since when they are expressed in matrix notation, they follow corresponding rules. However, there is one major exception to this: the commutative rule, whereby for simple arithmetic:

A (operation) B = B (operation) A

e.g.:

A + B = B + A
A × B = B × A

does not, in general, hold true for matrix multiplication:

A × B ≠ B × A
That is because of the way matrix multiplication is defined. Thus, for this case, changing the order of appearance of the two matrices to be multiplied may provide different matrices as the answer.

Thus, instead of f(x) and the expression for it in equation 1-1 describing the simple Normal distribution, the MND is described by the corresponding multivariate expression (1-2):

$$ f(X) = K\,e^{-(X-\bar{X})^{T} A\,(X-\bar{X})} \tag{1-2} $$

where now the capital letters $X$ and $\bar{X}$ represent vectors, K is a normalizing constant, and the capital letter A represents the matrix obtained from the covariance matrix (strictly speaking, a multiple of its inverse). This is, by the way, a somewhat straightforward extension of the definition (although it may not seem so at first glance) because for the simple univariate case (with the data scaled to unit standard deviation) the matrix A degenerates into the number 1, $X$ becomes $x$, and thus the exponent becomes simply the square of $x - \bar{x}$.

Most texts dealing with multivariate statistics have a section on the MND, but a particularly good one, if a bit heavy on the math, is the discussion by Anderson [17]. To help with this a bit, our next few chapters will include a review of some of the elementary concepts of matrix algebra.

Another very useful series of chemometric-related articles has been written by David Coleman and Lynn Vanatta. Their series is on the subject of regression analysis. It has appeared in American Laboratory in a set of over twenty-five articles. Copies of the back articles are available on the web at the URL address found in reference [18].
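Returning for a moment to equation 1-2, the exponent is an ordinary number obtained from the vector of deviations and the matrix A. The sketch below, in Python, evaluates that exponent and checks the univariate special case in which A is the number 1. The function names are hypothetical, K is treated as a plain scalar, and the two-dimensional matrix A is an invented example, not one taken from any data.

```python
import math

def quadratic_form(x, mean, a):
    """Return (x - mean)^T A (x - mean), with A stored as a list of rows."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    size = len(d)
    return sum(d[i] * a[i][j] * d[j] for i in range(size) for j in range(size))

def mnd_value(x, mean, a, k=1.0):
    """Evaluate K * exp(-(x - mean)^T A (x - mean)), the form of equation 1-2."""
    return k * math.exp(-quadratic_form(x, mean, a))

# Univariate check: with A = [[1]] the exponent is just (x - mean) squared.
one_dim = quadratic_form([3.0], [1.0], [[1.0]])

# Two-dimensional illustration with a hypothetical symmetric matrix A.
two_dim = quadratic_form([1.0, 2.0], [0.0, 0.0], [[2.0, 0.5], [0.5, 1.0]])

print(one_dim, two_dim)
print(mnd_value([1.0], [1.0], [[1.0]]))  # at the mean the exponent is zero
```

At the mean the deviations vanish, so the function simply returns K, which is the peak of the distribution.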
REFERENCES

1. Mark, H. and Workman, J., Statistics in Spectroscopy (Academic Press, Boston, 1991).
2. Haaland, D. and Thomas, E., Analytical Chemistry 60, 1193–1202 (1988).
3. Haaland, D. and Thomas, E., Analytical Chemistry 60, 1202–1208 (1988).
4. Naes, T. and Martens, H., Multivariate Calibration (John Wiley & Sons, New York, 1989).
5. Mark, H., Principles and Practice of Spectroscopic Calibration (John Wiley & Sons, New York, 1991).
6. Naes, T. and Isaksson, T., “The Chemometric Space”, NIR News (PO Box 10, Selsey, Chichester, West Sussex, PO20 9HR, UK, 1992).
7. Bonate, P.L., “Concepts in Calibration Theory”, LC/GC, 10(4), 310–314 (1992).
8. Bonate, P.L., “Concepts in Calibration Theory”, LC/GC, 10(5), 378–379 (1992).
9. Bonate, P.L., “Concepts in Calibration Theory”, LC/GC, 10(6), 448–450 (1992).
10. Bonate, P.L., “Concepts in Calibration Theory”, LC/GC, 10(7), 531–532 (1992).
11. Miller, J.N., “Calibration Methods in Spectroscopy”, Spectroscopy International 3(2), 42–44 (1991).
12. Miller, J.N., “Calibration Methods in Spectroscopy”, Spectroscopy International 3(4), 41–43 (1991).
13. Miller, J.N., “Calibration Methods in Spectroscopy”, Spectroscopy International 3(5), 43–46 (1991).
14. Miller, J.N., “Calibration Methods in Spectroscopy”, Spectroscopy International 3(6), 45–47 (1991).
15. Miller, J.N., “Calibration Methods in Spectroscopy”, Spectroscopy International 4(1), 41–43 (1992).
16. Mark, H. and Workman, J., “Statistics in Spectroscopy – Part 6 – The Normal Distribution”, Spectroscopy 2(9), 37–44 (1987).
17. Anderson, T.W., An Introduction to Multivariate Statistical Analysis (Wiley, New York, 1958).
18. Coleman, D. and Vanatta, L., Statistics in Analytical Chemistry, International Scientific Communications, Inc. found at http://www.iscpubs.com/articles/index.php?2.
2
Elementary Matrix Algebra: Part 1
You may recall that in the first chapter we promised that a review of elementary matrix algebra would be forthcoming; so the next several chapters will cover this topic all the way from the very basics to the more advanced spectroscopic subjects. You may already have discovered that the term “matrix” is a fanciful name for a table or list. If you have recently made a grocery list you have created an n × 1 matrix, or in more correct nomenclature, an $\mathbf{X}_{n \times 1}$ matrix, where n is the number of items you would like to buy (rows) and 1 is the number of columns. If you have become a highly sophisticated shopper and have made lists consisting of one column for Store A and a second one for Store B, you have ascended into the world of the $\mathbf{X}_{n \times 2}$ matrix. If you include the price of each item and put brackets around the entire column(s) of prices, you will have created a numerical matrix.

By definition, a numerical matrix is a rectangular array of numbers (termed “elements”) enclosed by square brackets [ ]. Matrices can be used to organize information such as size versus cost in a grocery department, or they may be used to simplify the problems associated with systems or groups of linear equations. Later in this chapter we will introduce the operations involved for linear equations (see Table 2-1 for common symbols used).
Table 2-1 Common symbols used in matrix notation

Quantity                            Symbol
Matrix*                             [X] or $\mathbf{X}$
Determinant*                        |X|
Vectors*                            $\mathbf{x}$
Scalars*                            $x$
Parameters or matrix names          A, B, C, G, H, P, Q, R, S, U, V
Errors and residuals                D, E, F
Addition                            +
Subtraction                         −
Multiplication                      × or •
Division                            ÷ or /
Empty or null set                   ∅
Inverse of a matrix                 [X]^−1
Transpose of a matrix               (X)′ or [X]^T
Generalized inverse of a matrix     [X]^−
Identity matrix                     [I] or [1]

* Where X or x are represented by any letter, generally those listed under “Parameters or matrix names” in this table.
The symbols below represent a matrix:

$$\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix}$$

Note that $a_1$ and $a_2$ are in column 1, $b_1$ and $b_2$ are in column 2, $a_1$ and $b_1$ are in row 1, and $a_2$ and $b_2$ are in row 2. The above matrix is a 2 × 2 (rows × columns) matrix. The first number indicates the number of rows, and the second indicates the number of columns. Matrices can be denoted as $\mathbf{X}_{2 \times 2}$ using a capital, boldface letter with the row and column subscript.
MATRIX OPERATIONS

The following illustrations are useful to describe very basic matrix operations. Discussions covering more advanced matrix operations will be included in later chapters, but for now, just review these elementary operations.
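Matrix addition

To add two matrices, the following operation is performed:

$$\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} + \begin{bmatrix} c_1 & d_1 \\ c_2 & d_2 \end{bmatrix} = \begin{bmatrix} a_1 + c_1 & b_1 + d_1 \\ a_2 + c_2 & b_2 + d_2 \end{bmatrix}$$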
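To add larger matrices, the following operation applies:

$$\begin{bmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \end{bmatrix} + \begin{bmatrix} e_1 & f_1 & g_1 & h_1 \\ e_2 & f_2 & g_2 & h_2 \end{bmatrix} = \begin{bmatrix} a_1 + e_1 & b_1 + f_1 & c_1 + g_1 & d_1 + h_1 \\ a_2 + e_2 & b_2 + f_2 & c_2 + g_2 & d_2 + h_2 \end{bmatrix}$$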
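Subtraction

For subtraction, use the following operations:

$$\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} - \begin{bmatrix} c_1 & d_1 \\ c_2 & d_2 \end{bmatrix} = \begin{bmatrix} a_1 - c_1 & b_1 - d_1 \\ a_2 - c_2 & b_2 - d_2 \end{bmatrix}$$

The same operation holds true for larger matrices such as

$$\begin{bmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \end{bmatrix} - \begin{bmatrix} e_1 & f_1 & g_1 & h_1 \\ e_2 & f_2 & g_2 & h_2 \end{bmatrix} = \begin{bmatrix} a_1 - e_1 & b_1 - f_1 & c_1 - g_1 & d_1 - h_1 \\ a_2 - e_2 & b_2 - f_2 & c_2 - g_2 & d_2 - h_2 \end{bmatrix}$$

and so on.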
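The element-by-element rule for addition and subtraction can be sketched in a few lines of Python. This is only an illustration: the function names `mat_add` and `mat_sub` and the numerical values are assumptions made for the example, with a matrix stored as a list of rows.

```python
def mat_add(x, y):
    """Element-by-element sum of two matrices given as lists of rows."""
    if len(x) != len(y) or any(len(rx) != len(ry) for rx, ry in zip(x, y)):
        raise ValueError("matrices must have the same dimensions")
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def mat_sub(x, y):
    """Element-by-element difference of two matrices given as lists of rows."""
    if len(x) != len(y) or any(len(rx) != len(ry) for rx, ry in zip(x, y)):
        raise ValueError("matrices must have the same dimensions")
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


# Illustrative 2 x 4 matrices (values chosen arbitrarily for the example)
X = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
Y = [[10, 20, 30, 40],
     [50, 60, 70, 80]]

total = mat_add(X, Y)       # [[11, 22, 33, 44], [55, 66, 77, 88]]
difference = mat_sub(Y, X)  # [[9, 18, 27, 36], [45, 54, 63, 72]]
```

Both functions refuse matrices of different dimensions, since the operations are only defined when every element has a partner.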
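Matrix multiplication

To multiply a scalar by a matrix (or a vector) we use

$$A \times \begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} = \begin{bmatrix} A \times a_1 & A \times b_1 \\ A \times a_2 & A \times b_2 \end{bmatrix}$$

where A is a scalar value.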
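The product of two matrices (or vectors) is given by

$$\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} \times \begin{bmatrix} c_1 & d_1 \\ c_2 & d_2 \end{bmatrix} = \begin{bmatrix} a_1 c_1 + b_1 c_2 & a_1 d_1 + b_1 d_2 \\ a_2 c_1 + b_2 c_2 & a_2 d_1 + b_2 d_2 \end{bmatrix}$$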
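In another example, in which an $\mathbf{X}_{1 \times 2}$ matrix is multiplied by an $\mathbf{X}_{2 \times 1}$ matrix, we have:

$$\begin{bmatrix} a_1 & b_1 \end{bmatrix} \times \begin{bmatrix} a_2 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_1 a_2 + b_1 b_2 \end{bmatrix}$$

denoted by $\mathbf{X}_{1 \times 2} \times \mathbf{X}_{2 \times 1}$ in matrix notation.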
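The row-by-column rule for the matrix product can be written out as a short Python sketch. The helper names `mat_mul` and `scalar_mul` and the sample numbers are assumptions made for illustration, with each matrix stored as a list of rows.

```python
def scalar_mul(k, x):
    """Multiply every element of matrix x by the scalar k."""
    return [[k * a for a in row] for row in x]


def mat_mul(x, y):
    """Product of an r x n matrix and an n x c matrix.

    Element (i, j) of the result is the sum of products of
    row i of x with column j of y.
    """
    if any(len(row) != len(y) for row in x):
        raise ValueError("number of columns of x must equal number of rows of y")
    cols = list(zip(*y))  # the columns of y
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


X = [[1, 2],
     [3, 4]]
Y = [[5, 6],
     [7, 8]]

scaled = scalar_mul(2, X)                   # [[2, 4], [6, 8]]
product = mat_mul(X, Y)                     # [[19, 22], [43, 50]]
row_by_col = mat_mul([[1, 2]], [[3], [4]])  # 1 x 2 times 2 x 1 gives [[11]]
```

The dimension check mirrors the requirement that each row of the first matrix has as many elements as each column of the second.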
Matrix division

Division of a matrix by a scalar is accomplished by:

$$\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} \div A = \begin{bmatrix} a_1 / A & b_1 / A \\ a_2 / A & b_2 / A \end{bmatrix}$$

where A is a scalar value.
Inverse of a matrix

The inverse of a matrix is the conceptual equivalent of its reciprocal. Therefore, if we denote our matrix by $\mathbf{X}$, then the inverse of $\mathbf{X}$ is denoted as $\mathbf{X}^{-1}$ and the following relationship holds:

$$\mathbf{X} \times \mathbf{X}^{-1} = [\mathbf{1}] = \mathbf{X}^{-1} \times \mathbf{X}$$

where [1] is an identity matrix. Only square matrices, which have an equal number of rows and columns (for example, 2 × 2, 3 × 3, 4 × 4, etc.), can have inverses. Several computer packages provide the algorithms for calculating the inverse of square matrices. The identity matrix for a 2 × 2 matrix is

$$[\mathbf{1}]_{2 \times 2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

and for a 3 × 3 matrix, the identity matrix is

$$[\mathbf{1}]_{3 \times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

and so on. Note that the diagonal is always composed of ones for the identity matrix, and all other values are zero. To summarize, by definition:

$$\mathbf{X}_{2 \times 2} \times \mathbf{X}^{-1}_{2 \times 2} = [\mathbf{1}]_{2 \times 2}$$

The basic methods for calculating $\mathbf{X}^{-1}$ will be addressed in the next chapter.
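As a small illustration of the defining relationship, the sketch below builds an identity matrix and checks that a matrix multiplied by its inverse gives that identity, in either order. The diagonal matrix used here is a hypothetical example chosen because its inverse (the reciprocals of the diagonal elements) can be written down by inspection; the function names are likewise assumptions.

```python
from fractions import Fraction


def identity(n):
    """n x n identity matrix: ones on the diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(x, y):
    """Row-by-column matrix product for matrices stored as lists of rows."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


# Hypothetical diagonal matrix and its inverse (exact fractions avoid rounding)
X = [[2, 0],
     [0, 4]]
X_inv = [[Fraction(1, 2), 0],
         [0, Fraction(1, 4)]]

left = mat_mul(X, X_inv)   # X times its inverse
right = mat_mul(X_inv, X)  # inverse times X
print(left == identity(2) and right == identity(2))  # True
```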
Transpose of a matrix

The transpose of a matrix is denoted by $\mathbf{X}'$ (or, alternatively, by $\mathbf{X}^T$). For example, for the matrix:

$$[\mathbf{X}] = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{bmatrix}$$

then

$$[\mathbf{X}]' = \begin{bmatrix} a_1 & a_2 \\ b_1 & b_2 \\ c_1 & c_2 \end{bmatrix}$$

The first column of [X] becomes the first row of [X]′; the second column of [X] becomes the second row of [X]′; the third column of [X] becomes the third row of [X]′; and so on.
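The column-to-row exchange can be sketched in Python as follows. The function name `transpose` and the numbers are assumptions made for the example.

```python
def transpose(x):
    """Return the transpose: column k of x becomes row k of the result."""
    return [list(col) for col in zip(*x)]


# A 2 x 3 matrix (arbitrary illustrative values)
X = [[1, 2, 3],
     [4, 5, 6]]

Xt = transpose(X)            # 3 x 2 result: [[1, 4], [2, 5], [3, 6]]
round_trip = transpose(Xt)   # transposing twice restores the original
```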
ELEMENTARY OPERATIONS FOR LINEAR EQUATIONS

To solve problems involving calibration equations using multivariate linear models, we need to be able to perform elementary operations on sets or systems of linear equations. So before using our newly discovered powers of matrix algebra, let us solve a problem using the algebra many of us learned very early in life. The elementary operations used for manipulating linear equations include three simple rules [1, 2]:

• Equations can be listed in any order for convenience and organizational purposes.
• Any equation may be multiplied by any real number other than zero.
• Any equation in a series of equations can be replaced by the sum of itself and any other equation in the system.

As an example, we can illustrate these operations using the three equations below as part of what is termed an “equation system” or simply a “system” (equations 2-1 through 2-3):

$$1a_1 + 1b_1 = -2 \tag{2-1}$$

$$4a_1 + 2b_1 + c_1 = 6 \tag{2-2}$$

$$6a_1 - 2b_1 - 4c_1 = 14 \tag{2-3}$$
To solve this system of three equations, we begin by following the three elementary operations rules above:

• We can rearrange the equations in any order. In our case the equations happen to be in a useful order.
• We decide to multiply equation 2-1 by a factor such that the coefficients of $a_1$ are of opposite sign and of the same absolute value for equations 2-1 and 2-2. Therefore, we multiply equation 2-1 by −4 to yield

$$-4a_1 - 4b_1 = 8 \tag{2-4}$$

• We can eliminate $a_1$ between the first and the second equations by adding equations 2-4 and 2-2 to give equation 2-5:

$$(-4a_1 - 4b_1 = 8) + (4a_1 + 2b_1 + c_1 = 6) \;\Rightarrow\; -2b_1 + c_1 = 14 \tag{2-5}$$

and we bring equation 2-1 back into the system by dividing equation 2-4 by −4 to get

$$a_1 + b_1 = -2 \tag{2-6}$$

$$-2b_1 + c_1 = 14 \tag{2-7}$$

$$6a_1 - 2b_1 - 4c_1 = 14 \tag{2-8}$$

Now to eliminate the $a_1$ term between equations 2-6 and 2-8, we multiply equation 2-6 by −6 to yield

$$-6a_1 - 6b_1 = 12 \tag{2-9}$$

Then we add equation 2-9 to equation 2-8:

$$(-6a_1 - 6b_1 = 12) + (6a_1 - 2b_1 - 4c_1 = 14) \;\Rightarrow\; -8b_1 - 4c_1 = 26 \tag{2-10}$$
Now we bring back equation 2-6 in its original form by dividing equation 2-9 by −6, and our system of equations looks like this:

$$a_1 + b_1 = -2 \tag{2-11}$$

$$-2b_1 + c_1 = 14 \tag{2-12}$$

$$-8b_1 - 4c_1 = 26 \tag{2-13}$$

We can eliminate the $b_1$ term between equations 2-12 and 2-13 by multiplying equation 2-12 by −8 and equation 2-13 by 2 to obtain

$$16b_1 - 8c_1 = -112 \tag{2-14}$$

$$-16b_1 - 8c_1 = 52 \tag{2-15}$$

Adding these equations, we find

$$-16c_1 = -60 \tag{2-16}$$

Restore equation 2-7 by dividing equation 2-14 by −8 to yield

$$a_1 + b_1 = -2 \tag{2-17}$$

$$-2b_1 + c_1 = 14 \tag{2-18}$$

$$-16c_1 = -60 \tag{2-19}$$
The solution

Solving for $c_1$, we find $c_1 = (-60/-16) = 3.75$. Substituting $c_1$ into equation 2-18, we obtain $-2b_1 + 3.75 = 14$. Solving this for $b_1$, we find $b_1 = -5.125$, or −5.13 when rounded. Substituting $b_1$ into equation 2-17, we find $a_1 + (-5.13) = -2$. Solving this for $a_1$, we find $a_1 = 3.13$. Finally,

$$a_1 = 3.13 \qquad b_1 = -5.13 \qquad c_1 = 3.75$$

A system of equations where the first unknown is missing from every equation after the first, and the second unknown is missing from every equation after the second, is said to be in echelon form. Every system of linear equations can be brought into echelon form by using elementary algebraic operations. The use of augmented matrices can accomplish the task of solving the equation system just illustrated.
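Once a system is in echelon form, the unknowns follow by working upward from the last equation, exactly as in the substitution steps above. A minimal Python sketch of that back-substitution is shown here; the function name `back_substitute` and the augmented-row layout (coefficients followed by the right-hand side) are assumptions made for the example, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def back_substitute(aug):
    """Solve an echelon (upper-triangular) system.

    Each row of aug holds the coefficients followed by the right-hand side.
    """
    n = len(aug)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        # contribution of the unknowns that are already solved
        known = sum(Fraction(aug[i][j]) * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(aug[i][n]) - known) / Fraction(aug[i][i])
    return x


# Equations 2-17 through 2-19: coefficients of a1, b1, c1, then the constant
echelon = [[1, 1, 0, -2],
           [0, -2, 1, 14],
           [0, 0, -16, -60]]

a1, b1, c1 = back_substitute(echelon)
print(float(a1), float(b1), float(c1))  # 3.125 -5.125 3.75
```

The unrounded values 3.125 and −5.125 correspond to the 3.13 and −5.13 quoted in the text.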
For our previous example, the original equations

$$a_1 + b_1 = -2 \tag{2-20}$$

$$4a_1 + 2b_1 + c_1 = 6 \tag{2-21}$$

$$6a_1 - 2b_1 - 4c_1 = 14 \tag{2-22}$$

can be written in augmented matrix form as:

$$\left[\begin{array}{rrr|r} 1 & 1 & 0 & -2 \\ 4 & 2 & 1 & 6 \\ 6 & -2 & -4 & 14 \end{array}\right] \tag{2-23}$$

The echelon form of the equations can also be put into matrix form as follows.

Echelon form:

$$a_1 + b_1 = -2 \tag{2-24}$$

$$-2b_1 + c_1 = 14 \tag{2-25}$$

$$-16c_1 = -60 \tag{2-26}$$

Matrix form:

$$\left[\begin{array}{rrr|r} 1 & 1 & 0 & -2 \\ 0 & -2 & 1 & 14 \\ 0 & 0 & -16 & -60 \end{array}\right] \tag{2-27}$$
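To make the bookkeeping concrete, the sketch below assembles an augmented matrix from a coefficient matrix and a column of constants, and tests whether it is in the echelon form described in this chapter. The helper names `augment` and `is_echelon` are assumptions for the example, and the echelon test is deliberately simplified to "every coefficient below the main diagonal is zero".

```python
def augment(coeffs, rhs):
    """Append the right-hand-side constant to each row of coefficients."""
    return [list(row) + [value] for row, value in zip(coeffs, rhs)]


def is_echelon(aug):
    """Simplified check: all coefficients below the main diagonal are zero."""
    return all(aug[i][j] == 0 for i in range(len(aug)) for j in range(i))


# Equations 2-20 through 2-22
original = augment([[1, 1, 0],
                    [4, 2, 1],
                    [6, -2, -4]], [-2, 6, 14])

# Equations 2-24 through 2-26
echelon = augment([[1, 1, 0],
                   [0, -2, 1],
                   [0, 0, -16]], [-2, 14, -60])

print(is_echelon(original), is_echelon(echelon))  # False True
```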
SUMMARY

In this chapter, we have used elementary operations for linear equations to solve a problem. The three rules listed for these operations have a parallel set of three rules used for elementary matrix operations on linear equations. In our next chapter we will explore the rules for solving a system of linear equations by using matrix techniques.
REFERENCES

1. Kowalski, B.R., Recommendations to IUPAC Chemometrics Society (Laboratory for Chemometrics, Department of Chemistry, BG-10, University of Washington, Seattle, WA 98195; 1985), pp. 1–2.
2. Britton, J.R. and Bello, I., Topics in Contemporary Mathematics (Harper & Row, New York, 1984), pp. 408–457.
3
Elementary Matrix Algebra: Part 2
ELEMENTARY MATRIX OPERATIONS

To solve the set of linear equations introduced in our previous chapter, referenced as [1], we will now use elementary matrix operations. These matrix operations have a set of rules which parallel the rules used for elementary algebraic operations used for solving systems of linear equations. The rules for elementary matrix operations are as follows [2]:

1) Rows can be listed in any order for convenience or organizational purposes.
2) All elements within a row may be multiplied using any real number other than zero.
3) Any row can be replaced by the element-by-element sum of itself and any other row.

To solve a system of equations, our first step is to put zeros into the second and the third rows of the first column, and into the third row of the second column. For our exercise we will bring forward equations 2-1 through 2-3 as (equation set 3-1):

$$\begin{aligned} 1a_1 + 1b_1 &= -2 \\ 4a_1 + 2b_1 + 1c_1 &= 6 \\ 6a_1 - 2b_1 - 4c_1 &= 14 \end{aligned} \tag{3-1}$$
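We can put the above set or system of equations in matrix notation as:

$$\mathbf{A} = \begin{bmatrix} 1 & 1 & 0 \\ 4 & 2 & 1 \\ 6 & -2 & -4 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} a_1 \\ b_1 \\ c_1 \end{bmatrix} \qquad \mathbf{C} = \begin{bmatrix} -2 \\ 6 \\ 14 \end{bmatrix}$$

and so,

$$\mathbf{A}\mathbf{B} = \mathbf{C} \qquad \text{or} \qquad \mathbf{A} \bullet \mathbf{B} = \mathbf{C}$$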
Matrix A is termed the “matrix of the equation system”. The matrix formed by [A|C] is termed the “augmented matrix”. For this problem the augmented matrix is given as:

$$[\mathbf{A}|\mathbf{C}] = \left[\begin{array}{rrr|r} 1 & 1 & 0 & -2 \\ 4 & 2 & 1 & 6 \\ 6 & -2 & -4 & 14 \end{array}\right]$$
Now if we were to find a set of equations with zeros in the second and the third rows of the first column, and in the third row of the second column, we could use equations 2-17 through 2-19 [1], which look like (equation set 3-2):

$$\begin{aligned} a_1 + b_1 &= -2 \\ -2b_1 + c_1 &= 14 \\ -16c_1 &= -60 \end{aligned} \tag{3-2}$$

We can rewrite these equations in matrix notation as:

$$\mathbf{G} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & -2 & 1 \\ 0 & 0 & -16 \end{bmatrix} \qquad \mathbf{H} = \begin{bmatrix} a_1 \\ b_1 \\ c_1 \end{bmatrix} \qquad \mathbf{P} = \begin{bmatrix} -2 \\ 14 \\ -60 \end{bmatrix}$$

and the augmented form of the above matrices is written as:

$$[\mathbf{G}|\mathbf{P}] = \left[\begin{array}{rrr|r} 1 & 1 & 0 & -2 \\ 0 & -2 & 1 & 14 \\ 0 & 0 & -16 & -60 \end{array}\right]$$

In equation set 3-2, we can reduce or simplify the third row in [G|P] by following Rule 2 of the basic matrix operations previously mentioned. As such we can multiply row III in [G|P] by 1/2 to give

$$[\mathbf{G}|\mathbf{P}] = \left[\begin{array}{rrr|r} 1 & 1 & 0 & -2 \\ 0 & -2 & 1 & 14 \\ 0 & 0 & -8 & -30 \end{array}\right]$$

We can use elementary row operations, also known as elementary matrix operations, to obtain matrix [G|P] from [A|C]. By the way, if we can achieve [G|P] from [A|C] using these operations, the matrices are termed “row equivalent”, denoted by $\mathbf{X}_1 \sim \mathbf{X}_2$. To begin with an illustration of the use of elementary matrix operations let us use the following example. Our original augmented matrix [A|C] above can be manipulated to yield zeros in rows II and III of column I by a series of row operations. The example below illustrates this:

$$\left[\begin{array}{rrr|r} 1 & 1 & 0 & -2 \\ 4 & 2 & 1 & 6 \\ 6 & -2 & -4 & 14 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & 1 & 0 & -2 \\ 0 & -2 & 1 & 14 \\ 0 & -8 & -4 & 26 \end{array}\right]$$

The left-hand augmented matrix is converted to the right-hand augmented matrix by II/II − 4I, or row II is replaced by row II minus 4 times row I. Then III/III − 6I, or row III is replaced by row III minus 6 times row I. To complete the row operations to yield [G|P] from [A|C] we write

$$\left[\begin{array}{rrr|r} 1 & 1 & 0 & -2 \\ 0 & -2 & 1 & 14 \\ 0 & -8 & -4 & 26 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & 1 & 0 & -2 \\ 0 & -2 & 1 & 14 \\ 0 & 0 & -8 & -30 \end{array}\right]$$
This is accomplished by III/III − 4II, or row III is replaced by row III minus 4 times row II. As we have just shown using two series of row operations we have

$$\left[\begin{array}{rrr|r} 1 & 1 & 0 & -2 \\ 0 & -2 & 1 & 14 \\ 0 & 0 & -8 & -30 \end{array}\right]$$

which is equivalent to equations 2-17 through 2-19 and to equation set 3-2 above; this is shown here as (equation set 3-3):

$$\begin{aligned} a_1 + b_1 &= -2 \\ -2b_1 + c_1 &= 14 \\ -8c_1 &= -30 \end{aligned} \tag{3-3}$$

Now, solving for $c_1 = -30/-8 = 3.75$; substituting $c_1$ into equation 2-18, we find $-2b_1 + 3.75 = 14$, therefore $b_1 = -5.13$; and substituting $b_1$ into equation 2-17, we find $a_1 + (-5.13) = -2$, therefore $a_1 = 3.13$; and so,

$$a_1 = 3.13 \qquad b_1 = -5.13 \qquad c_1 = 3.75$$

Thus matrix operations provide a simplified method for solving equation systems as compared to elementary algebraic operations for linear equations.
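The three row operations used above (II/II − 4I, III/III − 6I and III/III − 4II) can be replayed in a short Python sketch. The helper `add_multiple` is an assumed name for the example; it combines Rules 2 and 3 by adding a multiple of one row to another, and rows are indexed from zero, so row I is index 0.

```python
def add_multiple(aug, target, source, factor):
    """Return a copy of aug in which row `target` is replaced by
    row `target` + factor * row `source`."""
    out = [row[:] for row in aug]
    out[target] = [t + factor * s for t, s in zip(aug[target], aug[source])]
    return out


# Augmented matrix [A|C] of equation set 3-1
AC = [[1, 1, 0, -2],
      [4, 2, 1, 6],
      [6, -2, -4, 14]]

step1 = add_multiple(AC, 1, 0, -4)     # II  <- II  - 4 I
step2 = add_multiple(step1, 2, 0, -6)  # III <- III - 6 I
GP = add_multiple(step2, 2, 1, -4)     # III <- III - 4 II

for row in GP:
    print(row)
# [1, 1, 0, -2]
# [0, -2, 1, 14]
# [0, 0, -8, -30]
```

The final rows reproduce equation set 3-3, from which the unknowns follow by substitution.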
CALCULATING THE INVERSE OF A MATRIX

In Chapter 2, we promised to show the steps involved in taking the inverse of a matrix. Given a 2 × 2 matrix $\mathbf{X}_{2 \times 2}$, how is the inverse calculated? We can ask the question another way as, “What matrix when multiplied by a given matrix $\mathbf{X}_{r \times c}$ will give the identity matrix ([I])?” In matrix form we may write a specific example as:

$$\begin{bmatrix} -2 & 1 \\ -3 & 2 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Therefore,

$$\begin{bmatrix} -2 & 1 \\ -3 & 2 \end{bmatrix} \times \begin{bmatrix} c_1 & d_1 \\ c_2 & d_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = [\mathbf{1}]$$

or stated in matrix notation as A × B = I, where B is the inverse matrix of A, and [I] is the identity matrix.
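By multiplying A × B we can calculate the two basic equation systems to use in solving this problem as:

$$\text{System 1:} \qquad \begin{aligned} -2c_1 + 1c_2 &= 1 \\ -3c_1 + 2c_2 &= 0 \end{aligned}$$

$$\text{System 2:} \qquad \begin{aligned} -2d_1 + 1d_2 &= 0 \\ -3d_1 + 2d_2 &= 1 \end{aligned}$$

Because both systems share the same coefficients, their augmented matrices can be written together as:

$$\left[\begin{array}{rr|rr} -2 & 1 & 1 & 0 \\ -3 & 2 & 0 & 1 \end{array}\right]$$

The preceding matrix is reduced to echelon form (a zero in the second row of column one) by

$$\left[\begin{array}{rr|rr} -2 & 1 & 1 & 0 \\ -3 & 2 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{rr|rr} -2 & 1 & 1 & 0 \\ 0 & -1 & 3 & -2 \end{array}\right]$$

The row operation is II/3I − 2II, or row II is replaced by three times row I minus two times row II. The next steps are as follows:

$$\left[\begin{array}{rr|rr} -2 & 1 & 1 & 0 \\ 0 & -1 & 3 & -2 \end{array}\right] \sim \left[\begin{array}{rr|rr} -2 & 0 & 4 & -2 \\ 0 & -1 & 3 & -2 \end{array}\right] \sim \left[\begin{array}{rr|rr} 1 & 0 & -2 & 1 \\ 0 & -1 & 3 & -2 \end{array}\right]$$

with row operations as (I/I + II) and I/−1/2 I. Multiplying row II by −1 then leaves the identity matrix on the left-hand side:

$$\left[\begin{array}{rr|rr} 1 & 0 & -2 & 1 \\ 0 & 1 & -3 & 2 \end{array}\right]$$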
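Thus $c_1 = -2$, $c_2 = -3$, $d_1 = 1$, and $d_2 = 2$. So $\mathbf{B} = \mathbf{A}^{-1}$ (inverse of A) and

$$\mathbf{A}^{-1} = \begin{bmatrix} -2 & 1 \\ -3 & 2 \end{bmatrix}$$

So now we check our work by multiplying $\mathbf{A} \bullet \mathbf{A}^{-1}$ as follows:

$$\mathbf{A} \times \mathbf{A}^{-1} = \begin{bmatrix} -2 & 1 \\ -3 & 2 \end{bmatrix} \times \begin{bmatrix} -2 & 1 \\ -3 & 2 \end{bmatrix} = \begin{bmatrix} (-2)(-2) + (1)(-3) & (-2)(1) + (1)(2) \\ (-3)(-2) + (2)(-3) & (-3)(1) + (2)(2) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = [\mathbf{1}]$$

By coincidence, we have found a matrix which when multiplied by itself gives the identity matrix or, saying it another way, it is its own inverse. Of course, that does not generally happen; a matrix and its inverse are usually different.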
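The same idea, row-reducing the matrix augmented with the identity until the identity appears on the left, can be sketched for a square matrix of any size. The routine below is one possible arrangement of the row operations (Gauss-Jordan elimination), not necessarily the sequence of steps used in the text; the function name `invert` is an assumption, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by row-reducing the augmented matrix [A | I]."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Rule 1: reorder rows so that the pivot element is nonzero
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular and has no inverse")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Rule 2: scale the pivot row so that the pivot element becomes 1
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Rules 2 and 3 combined: subtract a multiple of the pivot row
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]  # the right-hand half is the inverse


A = [[-2, 1],
     [-3, 2]]
A_inv = invert(A)
print([[int(v) for v in row] for row in A_inv])  # [[-2, 1], [-3, 2]]
```

For this particular matrix the result equals A itself, in agreement with the check carried out above.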
SUMMARY

Hopefully Chapters 2 and 3 have refreshed your memory of early studies in matrix algebra. In this chapter we have tried to review the basic steps used to solve a system of linear equations using elementary matrix algebra. In addition, basic row operations were used to calculate the inverse of a matrix. In the next chapter we will address the matrix nomenclature used for a simple case of multiple linear regression.
REFERENCES

1. Workman, J., Jr. and Mark, H., Spectroscopy 8(7), 16–19 (1993).
2. Britton, J.R. and Bello, I., Topics in Contemporary Mathematics (Harper & Row, New York, 1984), pp. 408–457.
4
Matrix Algebra and Multiple Linear Regression: Part 1
In a previous chapter we noted that by augmenting the matrix of coefficients with a unit matrix (i.e., one that has all the members equal to zero except on the main diagonal, where the members of the matrix equal unity), we could arrive at the solution to the simultaneous equations that were presented. Since simultaneous equations are, in one sense, a special case of regression (i.e., the case where there are no degrees of freedom for error), it is still appropriate to discuss a few odds and ends that were left dangling. We started in the previous chapter with the set of simultaneous equations:

$$1a + 1b + 0c = -2 \tag{4-1a}$$

$$4a + 2b + 1c = 6 \tag{4-1b}$$

$$6a - 2b - 4c = 14 \tag{4-1c}$$

(where we now leave the subscripts off the variables for simplicity, with no loss of generality for our current purposes; also note that here we write all the coefficients out explicitly, even when the ones and zeroes do not necessarily appear in the original equations, so that they will not be inadvertently left out of the matrix expressions, where the “place filling” function must be performed), and we noted that we could express these equations in matrix notation as:

$$\mathbf{A} = \begin{bmatrix} 1 & 1 & 0 \\ 4 & 2 & 1 \\ 6 & -2 & -4 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \qquad \mathbf{C} = \begin{bmatrix} -2 \\ 6 \\ 14 \end{bmatrix}$$

where the equations then take the matrix form:

$$\mathbf{A} * \mathbf{B} = \mathbf{C} \tag{4-2}$$

The question here is, how did we get from equations 4-1 to equation 4-2? The answer is that it is not at all obvious, even in such a simple and straightforward case, how to break up a group of algebraic equations into their equivalent matrix expression. It turns out, however, that going in the other direction is often much simpler and more straightforward. Thus, when setting up matrix expressions, it is often desirable to run a check on the work to verify that the matrix expression indeed correctly represents the algebraic expression of interest. In the current case, this can be done very simply by carrying out the matrix multiplication indicated on the left-hand side of equation 4-2. Thus, expanding the matrix expression AB into its full representation, we obtain

$$\begin{bmatrix} 1 & 1 & 0 \\ 4 & 2 & 1 \\ 6 & -2 & -4 \end{bmatrix} \times \begin{bmatrix} a \\ b \\ c \end{bmatrix} \tag{4-3}$$
From our previous chapter defining the elementary matrix operations, we recall the operation for multiplying two matrices: the i, j element of the result matrix (where i and j represent the row and the column of an element in the matrix respectively) is the sum of cross-products of the ith row of the first matrix and the jth column of the second matrix (this is the reason that the result of multiplying matrices depends upon the order of appearance of the matrices; if the indicated ith row and jth column do not have the same number of elements, the matrices cannot be multiplied). Now let us apply this definition to the pair of matrices listed above. The first matrix (A) has three rows and three columns. The second matrix (B) has three rows and one column. Since each row of A has three elements, and the single column of B has three elements, matrix multiplication is possible. The resulting matrix will have three rows, each row resulting from one of the rows of matrix A, and one column, corresponding to the single column in the matrix B. Thus the first row of the result matrix will have the single element resulting from the sum-of-products of the first row of A times the column of B, which will be

$$1a + 1b + 0c \tag{4-4}$$

Similarly the second row of the result matrix will have the single element resulting from the sum-of-products of the second row of A times the column of B, which will be

$$4a + 2b + 1c \tag{4-5}$$

and the third row of the result matrix will have the single element resulting from the sum-of-products of the third row of A times the column of B, which will be

$$6a + (-2)b + (-4)c \tag{4-6}$$

or, simplifying:

$$6a - 2b - 4c \tag{4-7}$$

The entire matrix product, then, is

$$\mathbf{A}\mathbf{B} = \begin{bmatrix} 1a + 1b + 0c \\ 4a + 2b + 1c \\ 6a - 2b - 4c \end{bmatrix}$$

Equations 4-4, 4-5, and 4-6 represent the three elements of the matrix product of A and B. Note that each row of this resulting matrix contains only one element, even though each of these elements is the result of a fairly extensive sequence of arithmetic operations. Equations 4-4, 4-5, and 4-7, however, represent the symbolism you would normally expect to see when looking at the set of simultaneous equations that these matrix expressions replace. Note further that this matrix product AB is the same as the entire left-hand side of the original set of simultaneous equations that we originally set out to solve. Thus we have shown that these matrix expressions can be readily verified through straightforward application of the basic matrix operations, thus clearing up one of the loose ends we had left.
Another loose end is the relationship between the quasi-algebraic expressions that matrix operations are normally written in and the computations that are used to implement those relationships. The computations themselves have been covered at some length in the previous two chapters [1, 2]. To relate these to the quasi-algebraic operations that matrices are subject to, let us look at those operations a bit more closely.
QUASI-ALGEBRAIC OPERATIONS

Thus, considering equation 4-2, we note that the matrix expression looks like a simple algebraic expression relating the product of two variables to a third variable, even though in this case the “variables” in question are entire matrices. In equation 4-2, the matrix B represents the unknown quantities in the original simultaneous equations. If equation 4-2 were a simple algebraic equation, clearly the solution would be to divide both sides of this equation by A, which would result in the equation B = C/A. Since A and C both represent known quantities, a simple calculation would give the solution for the unknown B. There is no defined operation of division for matrices. However, a comparable result can be obtained by multiplying both sides of an equation (such as equation 4-2) by the inverse of matrix A. The inverse (of matrix A, for example) is conventionally written as $\mathbf{A}^{-1}$. Thus, the symbolic solution to equation 4-2 is generated by multiplying both sides of equation 4-2 by $\mathbf{A}^{-1}$:

$$\mathbf{A}^{-1}\mathbf{A}\mathbf{B} = \mathbf{A}^{-1}\mathbf{C} \tag{4-8}$$

There are a couple of key points to note about this operation. The main point is that since the order of appearance of the matrices matters, it is important that the new matrix, the one we are multiplying both sides of the equation by, is placed at the beginning of the expressions on each side of the equation. The second key point is the accomplishment of a desired goal: on the left-hand side of equation 4-8 we have the expression $\mathbf{A}^{-1}\mathbf{A}$. We noted earlier that the key defining characteristic of the inverse of a matrix is the fact that when multiplied by the original matrix (that it is the inverse of), the result is a unit matrix. Thus equation 4-8 is equivalent to

$$[\mathbf{1}]\mathbf{B} = \mathbf{A}^{-1}\mathbf{C} \tag{4-9}$$

where [1] represents the unit matrix. Since the property of the unit matrix is that when multiplied by any other matrix, the result is the same as the other matrix, then [1]B = B, and equation 4-9 becomes

$$\mathbf{B} = \mathbf{A}^{-1}\mathbf{C} \tag{4-10}$$

Thus we have symbolically solved equation 4-2 for the unknown matrix B, the elements of which are the unknown variables of the original set of simultaneous equations. Performing the matrix multiplication of $\mathbf{A}^{-1}\mathbf{C}$ will then provide the values of these unknown variables.
Let us examine these symbolic transformations with a view toward seeing how they translate into the required arithmetic operations that will provide the answers to the original simultaneous equations. There are two key operations involved. The first is the inversion of the matrix, to provide the inverse matrix. This is an extremely intensive computational task, so much so that it is in general done only on computers, except in the simplest cases for pedagogical purposes, such as we did in our previous chapter. In this regard we are reminded of an old, and somewhat famous, cartoon, where two obviously professor-type characters are staring at a large blackboard. On the left side of the blackboard are a large number of mathematical symbols, obviously representing some complicated and abstruse mathematical derivations. On the right side of the blackboard is a similar set of symbols. In the middle of the blackboard is a large blank space, in the middle of which is written, in big letters: “AND THEN SOME MAGIC HAPPENS”, and one of the characters is saying to the other: “I think you need to be a bit more explicit here in step 10.” To some extent, we feel the same way about matrix inversions. The complications and amount of computation involved in actually doing a matrix inversion are enough to make even the most intrepid mathematician/statistician/chemometrician run for the nearest computer with a preprogrammed algorithm for the task. Indeed, there sometimes seem to be just about as many algorithms for performing a matrix inversion as there are people interested in doing them. In most cases, then, this process is in practice treated as a “black box” where “some magic happens”. Except for the theoretical mathematician, however, there is usually little interest in “being more explicit”, as long as the program gives the right answer. As is our wont, however, our previous chapter worked out the gory details for the simplest possible case, the case of a 2 × 2 matrix. 
For larger matrices, the amount of computation increases so rapidly with matrix size that even the 3 × 3 matrix is left to the computer to handle. But how can we tell then if the answer is correct? Well, there is a way, and one that is not too overwhelming. From the definition of the inverse of a matrix, you should obtain a unit matrix if you multiply the inverse of a given matrix by the matrix itself. In our previous chapter [1] we showed this for the 2 × 2 case. For the simultaneous equations at hand, however, the process is only a little more extensive. From the original matrix of coefficients in the simultaneous equations that we are working with, the one called A above, we find that the inverse of this matrix is

$$\mathbf{A}^{-1} = \begin{bmatrix} -0.375 & 0.25 & 0.0625 \\ 1.375 & -0.25 & -0.0625 \\ -1.25 & 0.5 & -0.125 \end{bmatrix} \tag{4-11}$$
How did we find this? Well, we used some of our magic. The details of the computations needed were described in the previous chapter, for the 2 × 2 case; we will not even try to go through the computations needed for the 3 × 3 case we concern ourselves with here. However, having a set of numbers that purports to be the inverse of a matrix, we can verify whether or not it is the inverse of that matrix: all we need to do is multiply by the original matrix and see if the result is a unit matrix. We have done this for the 2 × 2 matrix in our previous chapter. An exercise for the reader is to verify that the matrix shown in equation 4-11 is, in fact, the inverse of the matrix A.
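The verification suggested above takes only a few lines on a computer. The sketch below, with an assumed helper name `mat_mul`, multiplies the purported inverse of equation 4-11 by the original matrix A and compares the result with the 3 × 3 unit matrix. All of the decimal values involved happen to be exactly representable in binary floating point, so an exact comparison is safe here; in general a small tolerance would be needed.

```python
def mat_mul(x, y):
    """Row-by-column matrix product for matrices stored as lists of rows."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


A = [[1, 1, 0],
     [4, 2, 1],
     [6, -2, -4]]

# Purported inverse, from equation 4-11
A_inv = [[-0.375, 0.25, 0.0625],
         [1.375, -0.25, -0.0625],
         [-1.25, 0.5, -0.125]]

unit = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]]

product = mat_mul(A_inv, A)
print(product == unit)  # True
```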
That was the hard part. It now remains to calculate out the expressions shown in equation 4-10, to find the final values for the unknowns in the original simultaneous equations. Thus, we need to form the matrix product of $\mathbf{A}^{-1}$ and C:

$$\mathbf{A}^{-1}\mathbf{C} = \begin{bmatrix} -0.375 & 0.25 & 0.0625 \\ 1.375 & -0.25 & -0.0625 \\ -1.25 & 0.5 & -0.125 \end{bmatrix} \times \begin{bmatrix} -2 \\ 6 \\ 14 \end{bmatrix} \tag{4-12}$$

This matrix multiplication is similar to the one we did before: we need to multiply a 3 × 3 matrix by a 3 × 1 matrix; the result will then also have dimensions of three rows and one column. By equation 4-10 this result is the matrix B, and its three rows will thus be the result of these computations:

$$B_{11} = (-0.375)(-2) + (0.25)(6) + (0.0625)(14) = 0.75 + 1.5 + 0.875 = 3.125 \tag{4-13a}$$

$$B_{21} = (1.375)(-2) + (-0.25)(6) + (-0.0625)(14) = -2.75 + (-1.5) + (-0.875) = -5.125 \tag{4-13b}$$

$$B_{31} = (-1.25)(-2) + (0.5)(6) + (-0.125)(14) = 2.5 + 3 + (-1.75) = 3.75 \tag{4-13c}$$

Thus, in matrix terms, the matrix B is

$$\mathbf{B} = \begin{bmatrix} 3.125 \\ -5.125 \\ 3.75 \end{bmatrix} \tag{4-14}$$

and this may be compared to the result we obtained algebraically in the last chapter (and found to be identical, within the limits of different roundings used).

At first glance it would seem as though this approach has the additional characteristic of requiring fewer computations than our previous method of solving similar equations. However, the computations are exactly the same, but most of them are “hidden” inside the matrix inversion.

It might also seem that we have been repetitive in our explanation of these simultaneous equations. This is intentional: we are attempting to explicate the relationship between the algebraic approach and the matrix approach to solving the equations. Our first solution (in the previous chapter) was strictly algebraic. Our second solution used matrix terminology and concepts, in addition to explicitly writing out all the arithmetic involved. Our third approach uses symbolic matrix manipulation, substituting numbers only in the last step.
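The final step of the symbolic approach, forming the product of the inverse matrix with the column of constants, can be sketched in Python as follows. The variable names are assumptions made for the example; the numbers are those of equations 4-11 and 4-12.

```python
# Inverse of the coefficient matrix, from equation 4-11
A_inv = [[-0.375, 0.25, 0.0625],
         [1.375, -0.25, -0.0625],
         [-1.25, 0.5, -0.125]]

# Right-hand sides of the simultaneous equations 4-1a through 4-1c
C = [-2, 6, 14]

# Each unknown is the sum of products of one row of the inverse with C
B = [sum(a * c for a, c in zip(row, C)) for row in A_inv]
print(B)  # [3.125, -5.125, 3.75]

# Check: substituting a, b, c back into the original equations recovers C
A = [[1, 1, 0],
     [4, 2, 1],
     [6, -2, -4]]
recovered = [sum(a * b for a, b in zip(row, B)) for row in A]
print(recovered)  # [-2.0, 6.0, 14.0]
```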
MULTIPLE LINEAR REGRESSION

In Chapters 2 and 3, we discussed the rules related to solving systems of linear equations using elementary algebraic manipulation, including simple matrix operations. The past chapters have described the inverse and transpose of a matrix in at least an introductory fashion. In this installment we would like to introduce the concepts of matrix algebra and their relationship to multiple linear regression (MLR). Let us start with the basic spectroscopic calibration relationship:

Concentration = Bias + (Regression Coefficient 1) × (Absorbance at Wavelength 1) + (Regression Coefficient 2) × (Absorbance at Wavelength 2)

Also written as:

$$\text{Concentration} = \beta_0 + \beta_1 A_1 + \beta_2 A_2 \tag{4-15}$$
In this example we state that the concentration of an analyte within a sample is a linear combination of two variables. These variables, in our case, are measured in the same units, that is, Absorbance units. In this case the concentration is known as the dependent variable or response variable because its magnitude depends on, or responds to, changes in the Absorbances at Wavelengths 1 and 2. The Absorbances are the x-variables, referred to as independent variables, regressor variables, or predictor variables. Thus an equation such as equation 4-15 attempts to explain the relationship between concentration and changes in Absorbance. This calibration equation or calibration model is said to be linear because the relationship is a linear combination of multiplier terms or regression coefficients as predictors of the concentration (response or dependent variable). Note that the $\beta_1$ and $\beta_2$ terms are called Regression Coefficients, Multiplier Terms, Multipliers, or sometimes Parameters. The analysis described is referred to as Linear Regression, Least-Squares, Linear Least-Squares, or most properly, MLR. In more formal notation, we can rewrite equation 4-15 as:

$$E(c_j) = \beta_0 + \beta_1 A_1 + \beta_2 A_2 \tag{4-16}$$

where $E(c_j)$ is the expected value for the concentration. Note: the difference between $E(c_j)$ and $c_j$ is the difference between the predicted or expected value $E(c_j)$ and the actual or observed value $c_j$. This can be rewritten as:

$$c_j - E(c_j) = c_j - (\beta_0 + \beta_1 A_1 + \beta_2 A_2) \tag{4-17}$$

and

$$c_j = \beta_0 + \beta_1 A_1 + \beta_2 A_2 + \varepsilon_j \tag{4-18}$$

where $\varepsilon_j$ is termed the Prediction Error, Residual Error, Residual, Error, Lack of Fit Error, or the Unexplained Error.
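The relationship between the expected value, the observed value and the residual can be illustrated with a short Python sketch. Every number below (the coefficients, the absorbances and the observed concentration) is hypothetical and chosen only to make the arithmetic easy to follow; the function name `expected_concentration` is likewise an assumption.

```python
def expected_concentration(beta, absorbances):
    """Evaluate beta_0 + beta_1 * A_1 + beta_2 * A_2 (equation 4-16)."""
    bias, *coefficients = beta
    return bias + sum(b * a for b, a in zip(coefficients, absorbances))


# Hypothetical values, for illustration only
beta = (0.5, 2.0, -1.0)     # beta_0, beta_1, beta_2
absorbances = (0.25, 0.5)   # A_1, A_2 for one sample
observed = 0.75             # observed concentration c_j for that sample

expected = expected_concentration(beta, absorbances)  # 0.5 + 0.5 - 0.5 = 0.5
residual = observed - expected                        # epsilon_j = 0.25
```

Rearranging the last line gives observed = expected + residual, which is equation 4-18 for a single sample.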
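We can also rewrite the equation in matrix form as:

$$\mathbf{C} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix} \qquad \mathbf{A} = \begin{bmatrix} 1 & A_{11} & A_{12} \\ 1 & A_{21} & A_{22} \\ 1 & A_{31} & A_{32} \\ \vdots & \vdots & \vdots \\ 1 & A_{N1} & A_{N2} \end{bmatrix} \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} \qquad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_N \end{bmatrix} \tag{4-19}$$

This equation of the model in matrix notation is written as:

$$\mathbf{C} = \mathbf{A}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{4-20}$$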
THE LEAST SQUARES METHOD

The problem now becomes: how do we handle the situation in which we have more equations than unknowns? When there are fewer equations than unknowns it is clear that there is not enough information available to determine the values of the unknown variables. When we have more equations than unknowns, however, we would seem to have the problem of having too much information; how do we handle all this extra information and put it to use? For example, consider the following set of simultaneous equations:

$$1a + 1b + 0c = -2 \tag{4-21a}$$

$$4a + 2b + 1c = 6 \tag{4-21b}$$

$$6a - 2b - 4c = 14 \tag{4-21c}$$

$$1a + 3b + (-1)c = -15 \tag{4-21d}$$

This is a set of four equations in three unknowns. The first three of these equations are the ones we dealt with above, and we have seen that the solution to the first three equations is

$$a = 3.125 \tag{4-22a}$$

$$b = -5.125 \tag{4-22b}$$

$$c = 3.75 \tag{4-22c}$$
However, when we replace a, b and c in equation 4-21d by those values, we find that 1 × 3.125 + 3 × (−5.125) + (−1) × 3.75 = −16, rather than the −15 that the equation specifies. If we were to use different subsets of three of these equations at a time, we would obtain different answers depending on which set of three equations we used. There seems to be an inconsistency here, yet in the set of four equations represented by equations 4-21 (a–d) all the equations have the same significance; there are no a priori criteria for eliminating any one of them. This is the situation we must handle. We cannot simply ignore one or more of these equations arbitrarily; dealing with them properly has become known variously as the Least Squares method, Multiple Least Squares, or Multiple Linear Regression. As spectroscopists, we are concerned with the application of these mathematical techniques to the solution of spectroscopic problems, particularly the use of spectroscopy to perform quantitative analysis, which is done by applying these concepts to a set of linear equations, as we will see.

In this least squares method example the object is to calculate the terms $\beta_0$, $\beta_1$ and $\beta_2$ which produce a prediction model yielding the smallest or “least squared” differences or residuals between the actual analyte value $c_j$ and the predicted or expected concentration $E(c_j)$. To calculate the multiplier terms or regression coefficients $\beta_j$ for the model we can begin with the matrix notation:

$$\mathbf{A}'\mathbf{A}\boldsymbol{\beta} = \mathbf{A}'\mathbf{C} \tag{4-23}$$
When solving for ˆ the expression becomes ˆ To illustrate the matrix ⎡ 2 1 j ⎢ ⎢ A� A = ⎢ j 1 × Aj1 ⎣ 1 × Aj2 j
= A� A−1 A� C
algebra involved for this problem we write 2 2 ⎤ ⎡ Aj1 × 1 Aj2 × 1 N j j A1•2 ⎥ ⎢A Aj1 ⎥ Aj1 × Aj1 Aj2 × Aj1 ⎥ = ⎢ •1 j ⎣ j j ⎦ A•2 Aj1 Aj2 Aj1 × Aj2 Aj2 × Aj2 j j
j
(4-24)
⎤ A2• Aj2 Aj1 ⎥ ⎥ j 2 ⎦ Aj2 j
(4-25) Then rewriting in summation notation we have N
12 = N
and
j=1 N
Aj1 × Aj2 =
Aj1 Aj2
(4-26)
j=1 N j=1
Aj1 =
Aj•
j
Note that A� C is also required for the computations (see equation 4-24) and is given as: ⎡ ⎤ ⎡ ⎤ 1 × Cj NCj j ⎢ ⎥ ⎢ A C ⎥ ⎢ ⎥ j1 j ⎥ (4-27) A� C = ⎢ j Aj1 Cj ⎥ = ⎢ j ⎣ ⎦ ⎣ A C ⎦ j2 j Aj2 Cj j j
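We represent our spectroscopic data using the following symbols:

$j$ = Spectrum number
$C_j$ = Actual concentration for each spectrum
$N$ = Rank of each spectrum (1)
$A_{j1}$ = Absorbance at Wavelength 1
$A_{j2}$ = Absorbance at Wavelength 2.

From this information we can calculate the $\hat{\beta}$ (see equation 4-8) using

$$
C = \begin{bmatrix} c_1 \\ c_2 \\ \bullet \\ \bullet \\ \bullet \\ c_j \end{bmatrix}
\qquad
A = \begin{bmatrix}
1 & A_{11} & A_{12} \\
1 & A_{21} & A_{22} \\
1 & A_{31} & A_{32} \\
\bullet & \bullet & \bullet \\
\bullet & \bullet & \bullet \\
1 & A_{j1} & A_{j2}
\end{bmatrix}
\tag{4-28}
$$

and

$$
A'C = \begin{bmatrix} N\bar{C} \\ \sum_j A_{j1}C_j \\ \sum_j A_{j2}C_j \end{bmatrix}
$$

If we then calculate the inverse of $A'A$, written as $(A'A)^{-1}$, the computations are nearly complete and we finally obtain

$$
\hat{\beta} = (A'A)^{-1}A'C = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}
\tag{4-29}
$$

which in conclusion gives the completed regression equation

$$E(\hat{C}) = \hat{\beta}_0 + \hat{\beta}_1 A_1 + \hat{\beta}_2 A_2 \tag{4-30}$$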
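As a minimal sketch of what equations 4-23 and 4-24 do with an overdetermined system, the following applies them to the four equations 4-21a through 4-21d using NumPy. The variable names are illustrative assumptions, and the system here has the unknowns a, b and c rather than spectroscopic data, so there is no column of ones.

```python
import numpy as np

# Coefficients and right-hand sides of equations 4-21a through 4-21d.
A = np.array([[1.0, 1.0, 0.0],
              [4.0, 2.0, 1.0],
              [6.0, -2.0, -4.0],
              [1.0, 3.0, -1.0]])
y = np.array([-2.0, 6.0, 14.0, -15.0])

# Exact solution of the first three equations only (equations 4-22a to 4-22c).
first_three = np.linalg.solve(A[:3], y[:3])

# Left-hand side of equation 4-21d evaluated at that solution.
lhs_fourth = A[3] @ first_three

# Least squares solution of all four equations via the normal equations
# A'A x = A'y (the form of equation 4-23).
least_squares = np.linalg.solve(A.T @ A, A.T @ y)

# Residuals of the least squares fit; A' times the residuals is zero
# (to rounding) at the least squares solution.
residuals = y - A @ least_squares

print("first three equations:", first_three)
print("equation 4-21d left-hand side:", lhs_fourth)
print("least squares solution:", least_squares)
print("A' times residuals:", A.T @ residuals)
```

Solving the normal equations with `np.linalg.solve` is used here instead of forming the inverse explicitly; both correspond to equation 4-24, and the explicit inverse is shown in the text only because it is convenient for hand calculation.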
In our next installment, we will review the “how to” of the matrix operations for this example using numerical data. Authors’ note: This initial chapter dealing with matrix algebra and regression has been adapted for spectroscopic nomenclature from Shayle R. Searle’s book, Matrix Algebra Useful for Statistics (John Wiley & Sons, New York, 1982), pp. 363–368. Other particularly useful reference sources with page numbers are listed below as [1–3].
REFERENCES
1. Draper, N.R. and Smith, H., Applied Regression Analysis (John Wiley & Sons, New York, 1981), pp. 70–87.
2. Kleinbaum, D.G. and Kupper, L.L., Applied Regression Analysis and Other Multivariable Methods (Duxbury Press, Boston, 1978), pp. 508–520.
3. Workman, J., Jr. and Mark, H., Spectroscopy 8(7), 16–19 (1993).
5
Matrix Algebra and Multiple Linear Regression: Part 2
In the previous chapter we presented the problem of fitting data when there is more information (in the form of equations relating the several variables involved) available than the minimum amount that will allow for the solution of the equations. We then presented the matrix equations for calculating the least squares solution to this case of overdetermined variables. How did we get from one to the other?

As we described the situation, when there are more equations than unknowns, one possibility is to ignore some of the equations. This is unsatisfactory, for a number of reasons. In the first place, there is no a priori criterion for deciding which equations to ignore, so that any choice is arbitrary. Secondly, by rejecting some of the equations, we are also rejecting and wasting the work that went into the collection of the data represented by those equations. Thirdly, and perhaps most importantly, when we ignore some of the equations, we are also ignoring the (rather important) fact that the lack of perfect fit to all the equations is itself an important piece of information. What the set of equations is telling us in this case is that there is, in fact, not a perfect fit of the data, taken as a whole, to any of the equations in the set. Rather, there is some average equation that in some sense gives a best fit to all of the data taken as a set, without favoring any particular subset of them. It is this “average” equation that we would like to be able to find.

In the history of the development of mathematics, one important branch was the study of the behavior of randomness. Initially, there were no highfalutin ideas of making “science” out of what appeared to be disorder; rather, the investigations of random phenomena that led to what we now know as the science of Statistics began as studies of the behavior of the random phenomena that existed in the somewhat more prosaic context of gambling.
It was not until much later that the recognition came that the same random phenomena that affected, say, dice, also affected the values obtained when physical measurements were made. By the time this realization arose, it was well recognized that random phenomena were describable only by probabilistic statements; by definition it is not possible to state a priori what the outcome of any given random event will be. Thus, when the attention of the mathematicians of the time turned to the description of overdetermined systems, such as we are dealing with here, it was natural for them to seek the desired solution in terms of probabilistic descriptions. They then defined the “best fitting” equation for an overdetermined set of data as being the “most probable” equation, or, in more formal terminology, the “maximum likelihood” equation. Under the proper conditions (said conditions being that the errors that prevent all the data relationships from being described by a single equation are normally [1, 2] distributed) it can be proven mathematically that the “most probable” equation is exactly the one that is the “least square” equation. While we have discussed this point
briefly in the past [3] it is, perhaps, appropriate at this point to revisit it, in a bit more detail.

The basis upon which this concept rests is the very fact that not all the data follows the same equation. Another way to express this is to note that an equation describes a line (or more generally, a plane or hyperplane if more than two dimensions are involved. In fact, anywhere in this discussion, when we talk about a calibration line, you should mentally add the phrase “... or plane, or hyperplane ...”). Thus any point that fits the equation will fall exactly on the line. On the other hand, since the data points themselves do not fall on the line (recall that, by definition, the line is generated by applying some sort of [at this point undefined] averaging process), any given data point will not fall on the line described by the equation. The difference between these two points, the one on the line described by the equation and the one described by the data, is the error in the estimate of that data point by the equation. For each of the data points there is a corresponding point described by the equation, and therefore a corresponding error. The least square principle states that the sum of the squares of all these errors should have a minimum value; and as we stated above, this will also provide the “maximum likelihood” equation.

It is certainly true that for any arbitrarily chosen equation, we can calculate what the point described by that equation is, that corresponds to any given data point. Having done that for each of the data points, we can easily calculate the error for each data point, square these errors, and add together all these squares. Clearly, the sum of squares of the errors we obtain by this procedure will depend upon the equation we use, and some equations will provide smaller sums of squares than other equations.
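The least square principle just stated can be written out for the two-wavelength model of the previous chapter. What follows is a sketch of the standard calculus argument, using the symbols of equations 4-23 through 4-30; it is offered as an aid to the reader and is not taken from the sources cited in this chapter. The sum of the squared errors is

$$S(\beta_0, \beta_1, \beta_2) = \sum_{j=1}^{N} \left( C_j - \beta_0 - \beta_1 A_{j1} - \beta_2 A_{j2} \right)^2$$

At a minimum, the partial derivative of $S$ with respect to each coefficient is zero:

$$
\begin{aligned}
\frac{\partial S}{\partial \beta_0} &= -2 \sum_j \left( C_j - \beta_0 - \beta_1 A_{j1} - \beta_2 A_{j2} \right) = 0 \\
\frac{\partial S}{\partial \beta_1} &= -2 \sum_j A_{j1} \left( C_j - \beta_0 - \beta_1 A_{j1} - \beta_2 A_{j2} \right) = 0 \\
\frac{\partial S}{\partial \beta_2} &= -2 \sum_j A_{j2} \left( C_j - \beta_0 - \beta_1 A_{j1} - \beta_2 A_{j2} \right) = 0
\end{aligned}
$$

Rearranging each line so that the unknowns are on the left gives

$$
\begin{aligned}
N\beta_0 + \beta_1 \sum_j A_{j1} + \beta_2 \sum_j A_{j2} &= \sum_j C_j \\
\beta_0 \sum_j A_{j1} + \beta_1 \sum_j A_{j1}^2 + \beta_2 \sum_j A_{j1}A_{j2} &= \sum_j A_{j1}C_j \\
\beta_0 \sum_j A_{j2} + \beta_1 \sum_j A_{j1}A_{j2} + \beta_2 \sum_j A_{j2}^2 &= \sum_j A_{j2}C_j
\end{aligned}
$$

The coefficients on the left are the elements of $A'A$ in equation 4-25 and the right-hand sides are the elements of $A'C$ in equation 4-27, so these three equations are $A'A\beta = A'C$, which is equation 4-23.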
It is not necessarily intuitively obvious that there is one and only one equation that will provide the smallest possible sum of squares of these errors under these conditions; however, it has been proven mathematically to be so. This proof is very abstruse and difficult. In fact, it is easier to find the equation that provides this “least square” solution than it is to prove that the solution is unique. A reasonably accessible demonstration, expressed in both algebraic and matrix terms, of how to find the least square solution is available. Even though regression analysis (one of the more common names for the application of the least square principle) is a general mathematical technique, when we are dealing with spectroscopic data, so that the equation we wish to fit must be fitted to data obtained from systems that follow Beer’s law, it is convenient to limit our discussion to the properties of spectroscopic systems. Thus we will couch our discussion in terms of quantitative analysis performed using spectroscopic data; then the dependent variable of the least square regression analysis (usually called the “Y” variable by mathematicians) will represent the concentration of analyte in the set of samples used to calibrate the system, and the independent (or “X”) variable will represent absorbance values measured by a suitable instrument in whichever spectral region we are dealing with. We will begin our discussion by demonstrating that, for a non-overdetermined system of equations, the algebraic approach and the least-square approach provide the same solution. We will then extend the discussion to the case of an overdetermined system of equations. Therefore this chapter will continue the multiple linear regression (MLR) discussion introduced in the previous chapter, by solving a numerical example for MLR. Recalling
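the basic ultraviolet, visible, near-infrared, and infrared use of MLR for spectroscopic calibration, we have

Concentration = Constant term (or Bias) + (Regression coefficient 1) • (Absorbance at wavelength 1) + (Regression coefficient 2) • (Absorbance at wavelength 2) + · · · + (Regression coefficient N) • (Absorbance at wavelength N)

Also written in equation form as:

$$\text{Concentration} = \beta_0 + \beta_1 A_{\lambda 1} + \beta_2 A_{\lambda 2} + \cdots + \beta_N A_{\lambda N} \tag{5-1}$$

By including an error term, we can write the equation as:

$$\text{Concentration} = \beta_0 + \beta_1 A_{\lambda 1} + \beta_2 A_{\lambda 2} + \cdots + \beta_N A_{\lambda N} + e \tag{5-2}$$

And also in expanded matrix form as:

$$
c = \begin{bmatrix} c_1 \\ c_2 \\ \bullet \\ \bullet \\ \bullet \\ c_M \end{bmatrix}
\quad
A = \begin{bmatrix}
A_{11} & A_{12} & A_{13} & A_{14} & \bullet & \bullet & A_{1N} \\
A_{21} & A_{22} & A_{23} & A_{24} & \bullet & \bullet & A_{2N} \\
\bullet & \bullet & \bullet & \bullet & \bullet & \bullet & \bullet \\
\bullet & \bullet & \bullet & \bullet & \bullet & \bullet & \bullet \\
\bullet & \bullet & \bullet & \bullet & \bullet & \bullet & \bullet \\
A_{M1} & A_{M2} & A_{M3} & A_{M4} & \bullet & \bullet & A_{MN}
\end{bmatrix}
\quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \bullet \\ \bullet \\ \beta_N \end{bmatrix}
\quad
e = \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ \bullet \\ \bullet \\ e_M \end{bmatrix}
\tag{5-3}
$$

and in simplified matrix notation, the equation is

$$c = A\beta + e \tag{5-4}$$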
Because we have limited time and space, let us solve our problem using two wavelengths (or frequencies) and a basic calculator. To define the problem, we start with a set of calibration samples with the characteristics listed in Table 5-1.

Table 5-1 Characteristics of the calibration samples

| Sample number | Concentration | Signal at wavelength 1 | Signal at wavelength 2 |
|---|---|---|---|
| 1 | 2.0 | 0.75 | 0.28 |
| 2 | 4.0 | 0.51 | 0.485 |
| 3 | 7.0 | 0.32 | 0.78 |

The system of equations for solving this problem can be written as

$$2.0 = \beta_0 + \beta_1(0.75) + \beta_2(0.28) \tag{5-5a}$$

$$4.0 = \beta_0 + \beta_1(0.51) + \beta_2(0.485) \tag{5-5b}$$

$$7.0 = \beta_0 + \beta_1(0.32) + \beta_2(0.78) \tag{5-5c}$$
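and in simplified matrix form as

$$C = [A] \bullet [\beta] \tag{5-6}$$

and written in matrix form (with the constant term as the third column) as:

$$
C = \begin{bmatrix} 2.0 \\ 4.0 \\ 7.0 \end{bmatrix}, \qquad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_0 \end{bmatrix}, \qquad
A = \begin{bmatrix} 0.75 & 0.28 & 1 \\ 0.51 & 0.485 & 1 \\ 0.32 & 0.78 & 1 \end{bmatrix}
\tag{5-7}
$$

The augmented matrix formed by $[A|C]$ is designated as:

$$
[A|C] = \left[\begin{array}{ccc|c}
0.75 & 0.28 & 1 & 2.0 \\
0.51 & 0.485 & 1 & 4.0 \\
0.32 & 0.78 & 1 & 7.0
\end{array}\right]
\tag{5-8}
$$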
The first task is to use elementary matrix row operations to manipulate matrix $[A|C]$ to yield zeros in rows II and III of column I. The row operations are to replace row II by row II minus 0.68 times row I, that is, II = II − 0.68 × I; followed by replacing row III by row III minus 0.4267 times row I, that is, III = III − 0.4267 × I. To complete our row operations we must place zeros in columns I and II of row III by replacing row III by row III minus 2.242 times row II, that is, III = III − 2.242 × II. These row operations yield (remember to keep as much precision as possible in your calculations):

$$
\left[\begin{array}{ccc|c}
0.75 & 0.28 & 1 & 2.0 \\
0 & 0.2946 & 0.32 & 2.64 \\
0 & 0 & -0.1442 & 0.2274
\end{array}\right]
\tag{5-9}
$$

In summary, by using two series of row operations, namely II = II − 0.68 I and III = III − 0.4267 I, followed by III = III − 2.242 II, we have

$$
\left[\begin{array}{ccc|c}
0.75 & 0.28 & 1 & 2.0 \\
0.51 & 0.485 & 1 & 4.0 \\
0.32 & 0.78 & 1 & 7.0
\end{array}\right]
\sim
\left[\begin{array}{ccc|c}
0.75 & 0.28 & 1 & 2.0 \\
0 & 0.2946 & 0.32 & 2.64 \\
0 & 0 & -0.1442 & 0.2274
\end{array}\right]
\tag{5-10}
$$

These two matrices (original and final) are row equivalent because by using simple row operations the right matrix was formed from the left matrix. The final matrix is equivalent to a set of equations as shown below:

$$0.75\beta_1 + 0.28\beta_2 + 1.0\beta_0 = 2.0 \tag{5-11a}$$

$$0.2946\beta_2 + 0.32\beta_0 = 2.64 \tag{5-11b}$$

$$-0.1442\beta_0 = 0.2274 \tag{5-11c}$$

Now solving the system of equations yields $(-0.1442)\beta_0 = 0.2274$, so $\beta_0 = -1.577$; solving for $\beta_2$, we find $(0.2946)\beta_2 + 0.32(-1.577) = 2.64$, so $\beta_2 = 10.674$; solving for $\beta_1$ yields $(0.75)\beta_1 + 0.28(10.674) + 1(-1.577) = 2.0$, so $\beta_1 = 0.784$.
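The row operations and back substitution above can be sketched in a few lines of Python. This is an illustration only: it uses full-precision multipliers rather than the rounded 0.68, 0.4267 and 2.242, assumes that no pivot element is zero (so no row exchanges are attempted), and the variable names are our own.

```python
import numpy as np

# Augmented matrix [A|C] of equation 5-8; columns are the signal at
# wavelength 1, the signal at wavelength 2, the constant term, and C.
aug = np.array([[0.75, 0.28, 1.0, 2.0],
                [0.51, 0.485, 1.0, 4.0],
                [0.32, 0.78, 1.0, 7.0]])

n = aug.shape[0]

# Forward elimination: subtract a multiple of each pivot row from the
# rows below it so that the entries under the pivot become zero.
for pivot in range(n - 1):
    for row in range(pivot + 1, n):
        factor = aug[row, pivot] / aug[pivot, pivot]
        aug[row] -= factor * aug[pivot]

# Back substitution, starting from the last row.
solution = np.zeros(n)
for row in range(n - 1, -1, -1):
    known = aug[row, row + 1:n] @ solution[row + 1:]
    solution[row] = (aug[row, n] - known) / aug[row, row]

# Unknowns come out in the column order of equation 5-7.
beta1, beta2, beta0 = solution
print(beta0, beta1, beta2)
```

Because the multipliers are not rounded, the printed coefficients can differ from the hand-calculated values in the last digit or two.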
And so,

$$\beta_0 = -1.577, \qquad \beta_1 = 0.784, \qquad \beta_2 = 10.674$$

Substituting into the original equations and calculating the differences between predicted and actual results, we find the results shown in Table 5-2.

Table 5-2 Results after substituting into the original equations and calculating the differences between predicted and actual results (using manual row operations)

| Sample number | $\beta_0 + \beta_1(A_{\lambda 1}) + \beta_2(A_{\lambda 2})$ | Predicted | Actual | Residual |
|---|---|---|---|---|
| 1 | −1.577 + 0.784(0.75) + 10.674(0.28) | 2.0 | 2.0 | 0 |
| 2 | −1.577 + 0.784(0.51) + 10.674(0.485) | 4.0 | 4.0 | 0 |
| 3 | −1.577 + 0.784(0.32) + 10.674(0.78) | 7.0 | 7.0 | 0 |

The foregoing discussion is all based on one important assumption: that the equation describing the relationship between the data does, in fact, include a constant term. If Beer’s law is strictly followed, however, when the concentration of all absorbing constituents is zero, then the absorbance (at all wavelengths, no less) is also zero: that is, the equation describing the relationship between the data generates a line that passes through the origin. If this condition holds, then the constant term of the equation is also exactly zero, and may be dropped from the equation. It has been shown possible to generate a least squares expression for this case also, that is, with the constant of the equation forced to be zero: it is merely necessary to formulate the expression for the prediction equation as equation 5-11d:

$$\text{Conc} = \beta_1 A_1 + \beta_2 A_2 \tag{5-11d}$$

Starting from this expression, one can execute the derivation just as in the case of the full equation (i.e., the equation including the constant term), and arrive at a set of equations that result in the least square expression for an equation that passes through the origin. We will not dwell on this point since it is not common in practice. However, we will use this concept to fit the data presented, just to illustrate its use, and for the sake of comparison, ignoring the fact that without the constant term these data are overdetermined, while they are not overdetermined if the constant term is included – if we had more data (even only one more relationship), they would be overdetermined in both cases.

Then, if the equation system is solved with no constant term ($\beta_0$), we have the following results (you can either take our word for it or perform the row operations for yourself. Exercise for the reader: do those row operations.): $\beta_2(0.2946) = 2.64$, $\beta_2 = 8.9613$; and $\beta_1(0.75) + 0.28(8.9613) = 2.0$, $\beta_1 = -0.679$. And so,

$$\beta_1 = -0.679, \qquad \beta_2 = 8.9613$$

and the results are shown in Table 5-3.

Table 5-3 Results when there is no constant (bias) term after substituting into the original equations and calculating the differences between predicted and actual results

| Sample number | $\beta_1(A_{\lambda 1}) + \beta_2(A_{\lambda 2})$ | Predicted | Actual | Residual |
|---|---|---|---|---|
| 1 | −0.679(0.75) + 8.9613(0.28) | 2.0 | 2.0 | 0.0 |
| 2 | −0.679(0.51) + 8.9613(0.485) | 4.0 | 4.0 | 0.0 |
| 3 | −0.679(0.32) + 8.9613(0.78) | 6.77 | 7.0 | −0.23 |

Another exercise for the reader: Why is a bias term often used in regression for spectroscopic data?
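As a rough sketch of the zero-intercept case, the following compares two calculations on the Table 5-1 data. The first solves the first two sample equations exactly with no constant term, which is what the row operations quoted above amount to. The second is a least squares fit of the two-coefficient model to all three samples using NumPy's `lstsq`; it is an added illustration, not the computation used in the text, and it gives different coefficients because it spreads the error over all three samples. Variable names are assumptions made for the example.

```python
import numpy as np

# Signals at wavelengths 1 and 2 (Table 5-1), with no column of ones
# because the constant term is forced to zero.
X = np.array([[0.75, 0.28],
              [0.51, 0.485],
              [0.32, 0.78]])
c = np.array([2.0, 4.0, 7.0])

# Exact solution of the first two sample equations only.
beta_two_samples = np.linalg.solve(X[:2], c[:2])

# Least squares fit of Conc = b1*A1 + b2*A2 to all three samples.
beta_lstsq, *_ = np.linalg.lstsq(X, c, rcond=None)

for name, beta in [("first two samples", beta_two_samples),
                   ("least squares", beta_lstsq)]:
    residuals = X @ beta - c  # predicted minus actual
    print(name, beta, residuals, np.sum(residuals ** 2))
```

The sum of squared residuals printed for the least squares fit cannot exceed the one for the two-sample solution, since least squares minimizes exactly that quantity.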
THE POWER OF MATRIX MATHEMATICS

Now let us see what happens when we use pure, unadulterated matrix power to solve this equation system, such that

$$A'A\hat{\beta} = A'C \tag{5-12}$$

as equation 4-23 showed us. When solving for the regression coefficients ($\beta$), we have

$$
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \hat{\beta} = (A'A)^{-1}A'C
\tag{5-13}
$$

Noting the matrix algebra for this problem (Equation 25 from reference [1])

$$
A'A =
\begin{bmatrix}
\sum_j A_{j0}^2 & \sum_j A_{j1}A_{j0} & \sum_j A_{j2}A_{j0} \\
\sum_j A_{j0}A_{j1} & \sum_j A_{j1}^2 & \sum_j A_{j2}A_{j1} \\
\sum_j A_{j0}A_{j2} & \sum_j A_{j1}A_{j2} & \sum_j A_{j2}^2
\end{bmatrix}
=
\begin{bmatrix}
N & A_{\bullet 1} & A_{\bullet 2} \\
A_{\bullet 1} & \sum_j A_{j1}^2 & \sum_j A_{j2}A_{j1} \\
A_{\bullet 2} & \sum_j A_{j1}A_{j2} & \sum_j A_{j2}^2
\end{bmatrix}
\tag{5-14}
$$

and substituting the numbers from our current example, we illustrate the following steps:

$$
A = \begin{bmatrix} 1 & 0.75 & 0.28 \\ 1 & 0.51 & 0.485 \\ 1 & 0.32 & 0.78 \end{bmatrix}
\tag{5-15}
$$

and so the transpose of $A$ (which is $A'$) is

$$
A' = \begin{bmatrix} 1 & 1 & 1 \\ 0.75 & 0.51 & 0.32 \\ 0.28 & 0.485 & 0.78 \end{bmatrix}
\tag{5-16}
$$
and to continue, $A$ transpose ($A'$) times $A$ is

$$
\begin{aligned}
A'A &=
\begin{bmatrix}
1 \times 1 + 1 \times 1 + 1 \times 1 & 1 \times 0.75 + 1 \times 0.51 + 1 \times 0.32 & 1 \times 0.28 + 1 \times 0.485 + 1 \times 0.78 \\
0.75 \times 1 + 0.51 \times 1 + 0.32 \times 1 & 0.75 \times 0.75 + 0.51 \times 0.51 + 0.32 \times 0.32 & 0.75 \times 0.28 + 0.51 \times 0.485 + 0.32 \times 0.78 \\
0.28 \times 1 + 0.485 \times 1 + 0.78 \times 1 & 0.28 \times 0.75 + 0.485 \times 0.51 + 0.78 \times 0.32 & 0.28 \times 0.28 + 0.485 \times 0.485 + 0.78 \times 0.78
\end{bmatrix} \\
&=
\begin{bmatrix}
3 & 1.58 & 1.545 \\
1.58 & 0.925 & 0.707 \\
1.545 & 0.707 & 0.922
\end{bmatrix}
\end{aligned}
\tag{5-17}
$$

Next we need to calculate the inverse of $[A'A]$, designated $[A'A]^{-1}$. Because $A'A$ is a 3 × 3 problem, we had better use a computer program suitably equipped to calculate the inverse [2].

$$
\begin{bmatrix}
3 & 1.58 & 1.545 \\
1.58 & 0.925 & 0.707 \\
1.545 & 0.707 & 0.922
\end{bmatrix}
\sim
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\tag{5-18}
$$

Exercise for the reader: See if you are able to determine all the row operations required to find the inverse of $A'A$. (We recommend you set aside the better part of an afternoon to work this one through!) The augmented form is written as

$$
\left[\begin{array}{ccc|ccc}
3 & 1.58 & 1.545 & 1 & 0 & 0 \\
1.58 & 0.925 & 0.707 & 0 & 1 & 0 \\
1.545 & 0.707 & 0.922 & 0 & 0 & 1
\end{array}\right]
\tag{5-19}
$$

Thanks to the power of computers we find that the inverse of $A'A$ is

$$
(A'A)^{-1} =
\begin{bmatrix}
348.0747 & -359.3786 & -307.7061 \\
-359.3786 & 373.6609 & 315.6969 \\
-307.7061 & 315.6969 & 274.639
\end{bmatrix}
\tag{5-20}
$$
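Then the next step is to calculate

$$
\begin{aligned}
A'c &=
\begin{bmatrix}
\sum_j A_{j0}c_j \\
\sum_j A_{j1}c_j \\
\sum_j A_{j2}c_j
\end{bmatrix}
=
\begin{bmatrix}
N\bar{c} \\
\sum_j A_{j1}c_j \\
\sum_j A_{j2}c_j
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 \\
0.75 & 0.51 & 0.32 \\
0.28 & 0.485 & 0.78
\end{bmatrix}
\bullet
\begin{bmatrix} 2.0 \\ 4.0 \\ 7.0 \end{bmatrix} \\
&=
\begin{bmatrix}
1(2) + 1(4) + 1(7) \\
0.75(2) + 0.51(4) + 0.32(7) \\
0.28(2) + 0.485(4) + 0.78(7)
\end{bmatrix}
=
\begin{bmatrix} 13 \\ 5.78 \\ 7.96 \end{bmatrix}
\end{aligned}
\tag{5-21}
$$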
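The matrix steps of equations 5-13 and 5-15 through 5-21 can be reproduced with NumPy as a check on the arithmetic. This is a sketch under the assumption that the full-precision products are used throughout (the printed matrix in equation 5-17 is rounded to three decimals, and inverting the rounded matrix would give noticeably different numbers). Variable names are illustrative.

```python
import numpy as np

# Design matrix of equation 5-15: a column of ones for the constant term,
# then the signals at wavelengths 1 and 2 (Table 5-1).
A = np.array([[1.0, 0.75, 0.28],
              [1.0, 0.51, 0.485],
              [1.0, 0.32, 0.78]])
c = np.array([2.0, 4.0, 7.0])

AtA = A.T @ A                 # equation 5-17
AtA_inv = np.linalg.inv(AtA)  # equation 5-20
Atc = A.T @ c                 # equation 5-21
beta = AtA_inv @ Atc          # equation 5-13

# Solving the normal equations directly avoids forming the inverse.
beta_solve = np.linalg.solve(AtA, Atc)

np.set_printoptions(precision=4, suppress=True)
print(AtA)
print(AtA_inv)
print(Atc)
print(beta, beta_solve)
```

The two routes to the coefficients agree to within rounding; `np.linalg.solve` is generally the preferred one in practice because it does not require the explicit inverse.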
To solve for the regression coefficients ($\beta_i$), we are required to calculate $(A'A)^{-1}A'C$ as follows (see equation 5-13):

$$
\begin{aligned}
\beta = (A'A)^{-1}A'C &=
\begin{bmatrix}
348.0747 & -359.3786 & -307.7061 \\
-359.3786 & 373.6609 & 315.6969 \\
-307.7061 & 315.6969 & 274.639
\end{bmatrix}
\bullet
\begin{bmatrix} 13.0 \\ 5.78 \\ 7.96 \end{bmatrix} \\
&=
\begin{bmatrix}
348.0747(13) + (-359.3786)(5.78) + (-307.7061)(7.96) \\
(-359.3786)(13) + 373.6609(5.78) + 315.6969(7.96) \\
(-307.7061)(13) + 315.6969(5.78) + 274.639(7.96)
\end{bmatrix} \\
&=
\begin{bmatrix} -1.577 \\ 0.786 \\ 10.675 \end{bmatrix}
=
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
\end{aligned}
\tag{5-22}
$$

And, checking our work, we arrive at Table 5-4.

Table 5-4 Results after substituting into the original equations and calculating the differences between predicted and actual results (using MATLAB calculations)

| Sample number | $\beta_0 + \beta_1(A_{\lambda 1}) + \beta_2(A_{\lambda 2})$ | Predicted | Actual | Residual |
|---|---|---|---|---|
| 1 | −1.577 + 0.786(0.75) + 10.675(0.28) | 2.002 | 2.0 | 0.002 |
| 2 | −1.577 + 0.786(0.51) + 10.675(0.485) | 4.001 | 4.0 | 0.001 |
| 3 | −1.577 + 0.786(0.32) + 10.675(0.78) | 7.001 | 7.0 | 0.001 |

Now, if we took our original set of data, as expressed in equations 5-5a–5-5c, and added one more relationship to them, we come up with the following situation:

$$2.0 = b_0 + b_1(0.75) + b_2(0.28) \tag{5-23a}$$

$$4.0 = b_0 + b_1(0.51) + b_2(0.485) \tag{5-23b}$$

$$7.0 = b_0 + b_1(0.32) + b_2(0.78) \tag{5-23c}$$

$$8.0 = b_0 + b_1(0.40) + b_2(0.79) \tag{5-23d}$$

Now we have the situation we discussed earlier: we have four relationships among a set of data, and only three possible variables (even including the $b_0$ term) that we can use to fit these data. We can solve any subset of three of these relationships, simply by leaving one of the four equations out of the solution. If we do that we come up with the following table of results (we forbear to show all the computations here; however, we do recommend to our readers that they do one or two of these, for the practice):

| | $b_0$ | $b_1$ | $b_2$ |
|---|---|---|---|
| Eliminating equation 5-23a | −9.47843 | 10.39215 | 16.86274 |
| Eliminating equation 5-23b | −10.86455 | 10.15801 | 18.73589 |
| Eliminating equation 5-23c | −5.20039 | 4.1461 | 14.6100 |
| Eliminating equation 5-23d | −1.5777 | 0.78492 | 10.675 |
The last entry in this table, the results obtained from eliminating equation 5-23d, represents of course the results obtained from the original set of three equations, since eliminating equation 5-23d from the set leaves us with exactly that same set. However, even though there does not seem to be much difference between the various equations represented by equations 5-23a–5-23d, it is clear that the fitting equation depends very strongly upon which subset of these equations we choose to keep in our calculations. Thus we see that we cannot arbitrarily select any subset of the data to use in our computations; it is critical to keep all the data, in order to achieve the correct result, and that requires using the regression approach, as we discussed above. If we do that, then we find that the correct fitting equation is (again, this system of equations is simple enough to do for practice – the matrix inversion can be performed using the row operations as we described previously):

Regression results:

| $b_0$ | $b_1$ | $b_2$ |
|---|---|---|
| −6.85119 | 6.15659 | 15.50951 |
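For readers who would rather let a computer do the practice runs, the following sketch recomputes the four leave-one-out solutions and the regression solution for the four-equation system, and also forms the simple average of the subset solutions for comparison. The code and its variable names are illustrative assumptions, not the program used to produce the values in the text.

```python
import numpy as np

# The four-sample system: a column of ones for b0, then the signals at
# wavelengths 1 and 2; the last row is the added fourth relationship.
A = np.array([[1.0, 0.75, 0.28],
              [1.0, 0.51, 0.485],
              [1.0, 0.32, 0.78],
              [1.0, 0.40, 0.79]])
c = np.array([2.0, 4.0, 7.0, 8.0])

# Solve each subset of three equations exactly, leaving one equation out.
subset_solutions = []
for left_out in range(len(c)):
    keep = [i for i in range(len(c)) if i != left_out]
    subset_solutions.append(np.linalg.solve(A[keep], c[keep]))
subset_solutions = np.array(subset_solutions)

# Least squares fit to all four equations via the normal equations.
b_regression = np.linalg.solve(A.T @ A, A.T @ c)

# Simple average of the four subset solutions, for comparison.
b_average = subset_solutions.mean(axis=0)

np.set_printoptions(precision=5, suppress=True)
print(subset_solutions)
print(b_regression)
print(b_average)
```

Each row of `subset_solutions` corresponds to leaving out one equation, in order, with columns b0, b1, b2.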
Note, by the way, that if you thought that the regression solution would simply be the average of all the other solutions, you were wrong. By now some of you must be thinking that there must be an easier way to solve systems of equations than wrestling with manual row operations. Well, of course there are better ways, which is why we will refresh your memory on the concept of determinants in the next chapter. After we have introduced determinants we will conclude our introductory coverage of matrix algebra and MLR with some final remarks.
REFERENCES
1. Mark, H. and Workman, J., Statistics in Spectroscopy (Academic Press, Boston, 1991), pp. 45–56; see also Mark, H. and Workman, J., Spectroscopy 2(9), 37–43 (1987).
2. Mark, H., Principles and Practice of Spectroscopic Calibration (John Wiley & Sons, New York, 1991), pp. 21–24.
3. Mark, H. and Workman, J., Statistics in Spectroscopy (Academic Press, Boston, 1991), pp. 271–281; see also Mark, H. and Workman, J., Spectroscopy 7(3), 20–23 (1992).
6 Matrix Algebra and Multiple Linear Regression: Part 3 – The Concept of Determinants
In the previous chapter [1] we promised a discussion of an easier way to solve equation systems – the method of determinants [2]. To begin, given a 2 × 2 matrix $[A]$ as

$$
A = \begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix}
\tag{6-1}
$$

the determinant of $A$ is designated by

$$
|A| = \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix}
\tag{6-2}
$$

Note that the brackets [ ] used to denote matrices are converted to vertical lines to denote a determinant. To continue, the determinant of $A$ is calculated this way:

$$A_{det} = a_1 b_2 - a_2 b_1 \tag{6-3}$$

The determinant is found by cross-multiplying the diagonal elements in a matrix and subtracting one diagonal product from the other, such that

$$
A_{det} = \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} = a_1 b_2 - a_2 b_1
\tag{6-4}
$$

A numerical example is given as follows: Given $A$, find its determinant:

$$
\text{If } A = \begin{bmatrix} 0.75 & 0.28 \\ 0.51 & 0.485 \end{bmatrix}
\text{ then } A_{det} = \begin{vmatrix} 0.75 & 0.28 \\ 0.51 & 0.485 \end{vmatrix}
= 0.75 \times 0.485 - 0.28 \times 0.51 = 0.364 - 0.143 = 0.221
\tag{6-5}
$$
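The cross-multiplication rule of equation 6-3 is short enough to write as a function. The sketch below applies it to the matrix of the numerical example and compares the result with NumPy's general-purpose determinant routine; the function name `det2` is our own.

```python
import numpy as np

def det2(matrix):
    """Determinant of a 2 x 2 matrix by cross-multiplication (equation 6-3)."""
    (a1, b1), (a2, b2) = matrix
    return a1 * b2 - a2 * b1

A = [[0.75, 0.28],
     [0.51, 0.485]]

print(det2(A))
print(np.linalg.det(np.array(A)))
```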
To use determinants to solve a system of linear equations, we look at a simple application given two equations and two unknowns. For the equation system

$$C_1 = \beta_1 A_{k11} + \beta_2 A_{k12} \tag{6-6a}$$

$$C_2 = \beta_1 A_{k21} + \beta_2 A_{k22} \tag{6-6b}$$

we denote $\beta_1$ and $\beta_2$ as unknown regression coefficients. By algebraic manipulation, we can eliminate the $\beta_2$ term from the equation system by multiplying the first equation
by $A_{k22}$ and the second equation by $A_{k12}$. By subtracting the two equations, we arrive at equations 6-7a through 6-7d:

$$A_{k22} C_1 = A_{k22} \beta_1 A_{k11} + A_{k22} \beta_2 A_{k12} \tag{6-7a}$$

$$(-)\; A_{k12} C_2 = A_{k12} \beta_1 A_{k21} + A_{k12} \beta_2 A_{k22} \tag{6-7b}$$

$$A_{k22} C_1 - A_{k12} C_2 = A_{k22} \beta_1 A_{k11} - A_{k12} \beta_1 A_{k21} \tag{6-7c}$$

and

$$A_{k22} C_1 - A_{k12} C_2 = (A_{k22} A_{k11} - A_{k12} A_{k21}) \beta_1 \tag{6-7d}$$

If the $(A_{k22} A_{k11} - A_{k12} A_{k21})$ term is nonzero, then we can divide this term into the above equation (6-7d) to arrive at

$$\beta_1 = \frac{A_{k22} C_1 - A_{k12} C_2}{A_{k22} A_{k11} - A_{k12} A_{k21}} \tag{6-8}$$

Note the denominator can be written as the determinant

$$
\begin{vmatrix} A_{k11} & A_{k12} \\ A_{k21} & A_{k22} \end{vmatrix}
\tag{6-9}
$$

referred to as the determinant of coefficients. We can also write the numerator as the determinant:

$$
\begin{vmatrix} C_1 & A_{k12} \\ C_2 & A_{k22} \end{vmatrix}
\tag{6-10}
$$

and so,

$$
\beta_1 = \frac{\begin{vmatrix} C_1 & A_{k12} \\ C_2 & A_{k22} \end{vmatrix}}{\begin{vmatrix} A_{k11} & A_{k12} \\ A_{k21} & A_{k22} \end{vmatrix}}
\tag{6-11}
$$
We can also solve for $\beta_2$ by algebraic manipulation of the equation system. Elimination of the $\beta_1$ term is accomplished by multiplying the first equation by $A_{k21}$ and the second equation by $A_{k11}$ and subtracting the results, dividing by the common term, and lastly, by converting both the numerator and the denominator to determinants, finally arriving at equation 6-12.

$$
\beta_2 = \frac{\begin{vmatrix} A_{k11} & C_1 \\ A_{k21} & C_2 \end{vmatrix}}{\begin{vmatrix} A_{k11} & A_{k12} \\ A_{k21} & A_{k22} \end{vmatrix}}
\tag{6-12}
$$
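To summarize what is referred to as Cramer’s rule, we can use the following general expressions given a system of two equations (6-13a and 6-13b) in two unknowns such that

$$C_1 = \beta_1 A_{k11} + \beta_2 A_{k12} \tag{6-13a}$$

$$C_2 = \beta_1 A_{k21} + \beta_2 A_{k22} \tag{6-13b}$$

We can generalize a solution to this system of equations by using the following determinant notation:

$$
D = \begin{vmatrix} A_{k11} & A_{k12} \\ A_{k21} & A_{k22} \end{vmatrix}, \qquad
D_{\beta_1} = \begin{vmatrix} C_1 & A_{k12} \\ C_2 & A_{k22} \end{vmatrix}, \qquad
D_{\beta_2} = \begin{vmatrix} A_{k11} & C_1 \\ A_{k21} & C_2 \end{vmatrix}
$$

And so, if $D \neq 0$, then we can solve for $\beta_1$ and $\beta_2$ using the relationships

$$
\beta_1 = \frac{D_{\beta_1}}{D} = \frac{\begin{vmatrix} C_1 & A_{k12} \\ C_2 & A_{k22} \end{vmatrix}}{\begin{vmatrix} A_{k11} & A_{k12} \\ A_{k21} & A_{k22} \end{vmatrix}}
\tag{6-14}
$$

and

$$
\beta_2 = \frac{D_{\beta_2}}{D} = \frac{\begin{vmatrix} A_{k11} & C_1 \\ A_{k21} & C_2 \end{vmatrix}}{\begin{vmatrix} A_{k11} & A_{k12} \\ A_{k21} & A_{k22} \end{vmatrix}}
\tag{6-15}
$$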
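Cramer's rule for two equations in two unknowns translates directly into code. The following is a minimal sketch; the function names are our own, and the numerical values are chosen only for illustration (they are the first two samples of Table 5-1 treated with no constant term), not taken from this chapter.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as nested sequences."""
    return m[0][0] * m[1][1] - m[1][0] * m[0][1]

def cramer_2x2(A, C):
    """Solve C1 = b1*A11 + b2*A12 and C2 = b1*A21 + b2*A22 by Cramer's rule."""
    D = det2(A)
    if D == 0:
        raise ValueError("determinant of coefficients is zero")
    # Replace the first column of A by C for b1, the second column for b2.
    D_b1 = det2([[C[0], A[0][1]], [C[1], A[1][1]]])
    D_b2 = det2([[A[0][0], C[0]], [A[1][0], C[1]]])
    return D_b1 / D, D_b2 / D

# Illustrative values only.
A = [[0.75, 0.28], [0.51, 0.485]]
C = [2.0, 4.0]
beta1, beta2 = cramer_2x2(A, C)
print(beta1, beta2)
```

The explicit check for a zero determinant mirrors the condition on D in the text: when D is zero the two equations do not determine a unique solution.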
There are, of course, additional rules for solving larger equation systems. We will address this subject again in later chapters when we discuss multivariate calibration in greater depth.
REFERENCES
1. Workman, J., Jr. and Mark, H., Spectroscopy 9(1), 16–19 (1994).
2. Britton, J.R. and Bello, I., Topics in Contemporary Mathematics (Harper & Row, New York, 1984), pp. 445–451.
7
Matrix Algebra and Multiple Linear Regression:
Part 4 – Concluding Remarks
Our discussions on MLR in previous chapters are all based on one important assumption: that the equation describing the relationship between the data does include a constant term. If Beer’s law is strictly followed, however, when the concentration of all absorbing constituents is zero, then the absorbance (at all wavelengths, no less) is also zero, that is, the equation describing the relationship between the data generates a line that passes through the origin. If this condition holds, then the constant term of the equation is also exactly zero, and may be dropped from the equation. It has been shown possible to generate a least square expression for this case also, that is, with the constant of the equation forced to be zero: it is merely necessary to formulate the expression for the prediction equation as equation 7-1:

$$\text{Conc} = b_1 A_1 + b_2 A_2 \tag{7-1}$$

Starting from this expression, one can execute the derivation just as in the case of the full equation (i.e., the equation including the constant term), and arrive at a set of equations that result in the least square expression for an equation that passes through the origin. We will not dwell on this point since it is not common in practice. We will use this concept to fit the data presented, just to illustrate its use, and for the sake of comparison, ignoring the fact that without the constant term these data are overdetermined, while they are not overdetermined if the constant term is included – if we had more data (even only one more relationship) they would be overdetermined in both cases.

If we take our original set of data, as expressed in equations 5-5a–5-5c [1], and add one more relationship to them, we come up with the following situation:

$$2.0 = b_0 + b_1(0.75) + b_2(0.28) \tag{7-2a}$$

$$4.0 = b_0 + b_1(0.51) + b_2(0.485) \tag{7-2b}$$

$$7.0 = b_0 + b_1(0.32) + b_2(0.78) \tag{7-2c}$$

$$8.0 = b_0 + b_1(0.40) + b_2(0.79) \tag{7-2d}$$
We now have the situation we discussed earlier: we have four relationships among a set of data, and only three possible variables (even including the b0 term) that we can use to fit these data. We can solve any subset of three of these relationships, simply by leaving one of the four equations out of the solution. If we do that we come up with the
following table of results (we forbear to show all the computations here; however, we do recommend to our readers that they do one or two of these, for the practice):
| | $b_0$ | $b_1$ | $b_2$ |
|---|---|---|---|
| Eliminating equation 7-2a | −9.47843 | 10.39215 | 16.86274 |
| Eliminating equation 7-2b | −10.86455 | 10.15801 | 18.73589 |
| Eliminating equation 7-2c | −5.20039 | 4.1461 | 14.6100 |
| Eliminating equation 7-2d | −1.5777 | 0.78492 | 10.675 |
The last entry in this table, the results obtained from eliminating equation 7-2d, of course represents the results obtained from the original set of three equations, since eliminating equation 7-2d from the set leaves us with exactly that same original set. However, even though there does not seem to be much difference between the various equations represented by equations 7-2a–7-2d, clearly the fitting equation depends very strongly upon which subset of these equations we choose to keep in our calculations. Thus we see that we cannot arbitrarily select any subset of the data to use in our computations; it is critical to keep all the data, to achieve the correct result, and that requires using the regression approach, as we discussed above. If we do that, then we find that the correct fitting equation is (again, this system of equations is simple enough to do for practice – the matrix inversion can be performed using the row operations as we described previously):

Regression results:

| $b_0$ | $b_1$ | $b_2$ |
|---|---|---|
| −6.85119 | 6.15659 | 15.50951 |
Note, by the way, if you thought that the regression solution would simply be the average of all the other solutions, you were incorrect. With this chapter we will suspend our coverage of elementary matrix operations until a later chapter.
A WORD OF CAUTION

We have noticed recently a growing tendency for the chemical/spectroscopic community to draw the inference that the term “chemometrics” is virtually equivalent to “quantitative analysis algorithms”. This misconception seems to be due to the overwhelming concentration of interest in that aspect of the application of chemometric techniques. This perceived equivalency is, of course, incorrect. The purview of chemometrics is much wider than that single application area, and encompasses a wide variety of techniques, including algorithms not only for quantitative and qualitative chemical analysis, but also for methods for analyzing, categorizing and generally dealing with data in a variety of ways (just look at the topic list included in the Analytical Chemistry reviews issue when Chemometrics is included). We ourselves have to plead guilty to some extent to promoting this misconception. While discussing and explaining the underlying concepts, we have also inherently spent much time and attention on that single topic, in much the same way that many other authors do.
However, we do recognize and wish to caution our readers to recognize the fact that Chemometrics does in fact include this variety of methodologies alluded to above. We do, in fact, hope to eventually discuss these other concepts. Two items prevent us from just jumping in chin first, however. The first item is that there are, in fact, useful and important things that need to be said about the application of the quantitative analysis algorithms. The second item is the fact that while we are knowledgeable concerning some of the other areas of chemometric interest, we are not and could not possibly be experts in all such areas. We have discussed this between ourselves, and have decided that the only reasonable way to deal with this limitation is to entertain submissions from our readership. Anyone who has particular expertise in a topic that falls under the wider definition of “chemometrics” is welcome to submit one (or more) chapters dealing with that topic. We only request that you try to keep your discussions both simple and complete, using, as we say, only words of one syllable or less.
REFERENCE 1. Workman, J., Jr. and Mark, H., Spectroscopy 9(1), 16–19 (1994).
8
Experimental Designs: Part 1
The next several chapters will deal with the philosophy of experimental designs. Experimental design is at the very heart of the scientific method; without proper design, it is well-nigh impossible to glean high-quality information from the experimental data collected. No amount of sophisticated processing or chemometrics can create information not present within the data.

Every scientist has designed experiments. So what is there left for us to say about that topic that chemometrics/statistics can shed some light on? Well, quite a bit actually, since not all experiments are designed equally, but some are definitely more equal than others (to steal a paraphrase). Another way to say it is that every experiment is a designed experiment, but some designs are better than others.

In point of fact, the sciences of statistics and chemometrics each have their own approach to how experiments should be designed, each with a view toward making experimental procedures “better” in some sense. There is a gradation between the two approaches; nevertheless there is also somewhat of a distinction between what might be thought of as classical “statistical experimental design” and the more currently fashionable experimental designs considered from a chemometric point of view. These differences in approach reflect differences in the nature of the information to be obtained from each.

Experimental designs, and in particular “statistical” experimental designs, are used in order to achieve one or more of the following goals:
1) Increase efficiency of resource use, that is, obtain the desired information using the fewest possible experiments (this is usually what is thought of when “statistical experimental designs” are considered). This aspect of experimentation is particularly important when the experiment is large to begin with, or if the experiment uses resources that are rare or expensive, or if the experiment is destructive, so that materials (especially expensive ones) are used up.
2) Determine which variables or phenomena (“factors” in statistical/chemometric parlance) in an experiment are the “important” ones. This has two aspects. The first is whether an effect is large enough that we can be sure it is real, and not due simply to noise (or error) alone (i.e., “statistically significant”). We have treated this question to some extent in our previous chapters, and in the book derived from them (both titled “Statistics in Spectroscopy”). The second aspect is, if the effect of a factor is indeed real, is it of sufficiently large magnitude to be of practical importance? While the answer to this question is important to understanding the outcome of the experiment, it is not a statistical question, and we will give it fairly short shrift.
3) Accommodate noise and/or other random error.
4) Allow estimates to be made of the magnitude of the noise and/or other random error, if for no other reason than to have something against which to compare our results, so as to tell whether they are statistically significant.
5) Allow estimates to be made of the sensitivity to variations in the several factors. This can help decide whether any of the variations seen are of practical importance. A good design also allows these estimates of sensitivity to be made against an error background that is reduced compared to the actual error. This is accomplished by causing the effects of the factors to be effectively “averaged”, thus reducing the effect of error by the square root of the number of items being averaged.
6) Optimize some characteristic of the experimental system.

To achieve these goals, certain requirements are imposed on the design and/or the data to be collected. The maximum amount of information can be obtained when:
1) The standard requirements for the behavior of the errors are met, that is, the errors associated with the various measurements are random, independent, normally (i.e., Gaussian) distributed, and are a random sample from a (hypothetical, perhaps) population of similar errors that have a mean of zero and a variance equal to some finite value of sigma-squared.
2) The design is balanced. This requirement is critical for certain types of designs and unimportant in others. Balance, in the sense used here, means that each value of a given experimental variable (factor) occurs in combination with all of the values of every other factor. For example, common variables in chemical experimentation are temperature and pressure. For a balanced design, experiments should be carried out where the material is held at low temperature, and at both high and low pressure. Additionally, experiments should be carried out where the material is held at high temperature, and at both high and low pressure. If a third variable, such as concentration of a reactant, is to be studied, then high and low pressure and high and low temperature should coexist with both the high and the low concentrations.

The foregoing would seem to imply that a balanced experiment would require all possible combinations of conditions. While all-possible-combinations is certainly one way to achieve this balance, the advantage of “statistical” designs comes from the fact that clever ways have been devised to achieve balance while needing far fewer experiments than the all-possible-combinations approach would require.

As an illustration of this, let us consider the three aforementioned variables: temperature, pressure, and concentration of reactant. An all-possible-combinations design would require eight experiments, with the set of conditions for each experiment shown in Table 8-1 (where H and L represent the high and the low temperatures, pressures, etc.). However, to achieve balance, it is not necessary to carry out eight experiments; balance can be achieved with only four experiments with the conditions suitably set (Table 8-2). Check it out: high reactant concentration occurs in combination with each (high and low) temperature, and with each pressure; similarly for low reactant concentration.
Table 8-1 An all-possible-combinations design of three factors, needing eight experiments and sets of conditions

Experiment number   Temperature   Pressure   Concentration
1                   L             L          L
2                   L             L          H
3                   L             H          L
4                   L             H          H
5                   H             L          L
6                   H             L          H
7                   H             H          L
8                   H             H          H
Table 8-2 Balanced design for three factors, needing only four experiments

Experiment number   Temperature   Pressure   Concentration
1                   L             L          L
2                   L             H          H
3                   H             L          H
4                   H             H          L
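The balance property claimed for Table 8-2 can be checked mechanically. The following is a minimal sketch in Python (the language, the function name is_balanced, and the variable names are our assumptions for illustration, not part of the original text). It treats a design as balanced when, for every pair of factors, every combination of levels occurs the same number of times.

```python
from collections import Counter
from itertools import combinations, product

LEVELS = ("L", "H")
FACTORS = ("temperature", "pressure", "concentration")

# All-possible-combinations design (Table 8-1): 2**3 = 8 runs.
full_factorial = list(product(LEVELS, repeat=len(FACTORS)))

# Four-run design copied from Table 8-2.
half_fraction = [
    ("L", "L", "L"),
    ("L", "H", "H"),
    ("H", "L", "H"),
    ("H", "H", "L"),
]

def is_balanced(runs, n_factors, levels=LEVELS):
    """Return True if, for every pair of factors, each combination of
    levels occurs, and all combinations occur equally often."""
    wanted = set(product(levels, repeat=2))
    for a, b in combinations(range(n_factors), 2):
        counts = Counter((run[a], run[b]) for run in runs)
        if set(counts) != wanted or len(set(counts.values())) != 1:
            return False
    return True

for name, runs in (("full factorial", full_factorial),
                   ("half fraction", half_fraction)):
    print(name, len(runs), "runs, balanced:",
          is_balanced(runs, len(FACTORS)))
```

Dropping any one run from the four-run design breaks the balance, which is one way to see why missing data is so damaging to these efficient designs.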
You will find the same situation for the other variables. This is not to say that there are no benefits to the larger experimental design, but we are making the point that balance can be achieved with the smaller one, and for those designs where balance is an important consideration, much work (and resources, and MONEY) can be saved.

Balance is not always achievable in practice due to physical constraints on the measurements that can be made. Certain designs do not require balance, and in fact to enforce balance would negate some of the benefits of the design. In particular, there are some designs where future experiments to be performed are determined by the results of the past experiments. To enforce balance here would require extra, unnecessary experimentation that did not contribute to the main goal of the whole venture.

The various designs that have been generated can be classified into one of several categories. One way to classify experimental designs is as follows:
1) Classical designs
2) Screening designs
3) Analytical designs
4) Optimization designs.
In one sense, it is possible to think of the categories involved as “building blocks” for designs, which can then be combined in various ways which depend upon the information that you want to obtain which, in turn, determines the nature of the data to collect. These
general categories, by the way, are not mutually exclusive. It is even possible to consider some types of designs as extensions of others, or, vice versa, as subsets, or special cases of other types of designs. Some of these main categories are
A) Factorial designs
B) Fractional factorial designs
C) Nested designs
D) Blocked designs
E) Response surface designs.
The key to all “statistical experimental” designs is planning. A properly planned experiment can achieve all the goals set forth above, and in fewer runs than you might expect (that’s where achieving the goal of efficiency comes in). However, there are certain requirements that must be met:

The experiment must be executed according to the plan! All the planning in the world is for naught if carrying out the experiment results in blunders (e.g., even something as crude as dropping a key sample on the floor, and look at how often that has been done!). The statistical literature contains examples (unfortunately) where large experiments, which cost millions of dollars to perform, were completely ruined by carelessness on the part of the personnel actually carrying them out.

As noted above, the variations in the data representing the error must meet the usual conditions for statistical validity: they must be random and statistically independent, and it is highly desirable that they be homoscedastic and Normally distributed. The data should be a representative sampling of the populations that the experiment is supposed to explore.

Blunders must be eliminated, and all specified data must be collected. The efficiency of these experimental designs has another side effect: any missing or defective data has a disproportionate effect relative to the amount of information that can be extracted from the final data set. When simpler experimental designs are used, where each piece of data is collected for the sole purpose of determining the effect of one variable, loss of that piece of data results in the loss of only that one result. When the more efficient “statistical” experimental designs are used, each piece of data contributes to more than one of the final results; thus each one is used the equivalent of many times, and any missing piece of data causes the loss of all the results that are dependent upon it.

These types of experimental designs also have some limitations.
The first is the exaggeration of the effect of missing or defective data on the results, as mentioned above. The second is the fact that until the entire plan is carried out, little or no information can be obtained. There are generally few, if any, “intermediate results”; only after all the data is available can any results at all be calculated, and then all of them are calculated at once. This phenomenon is related to the first caveat: until each piece of data is collected, it is “missing” from the experiment, and therefore the results that depend upon it cannot be calculated. The simplest possible experimental design would almost not be recognized as an “experimental design” at all, but does serve as a prototype situation (as we like to use for pedagogical purposes). The situation arises when there is one variable (factor) to investigate, and the question is, does this factor have an effect on the property studied? We have introduced this situation earlier, in our discussion of hypothesis testing, as in
our previous Statistics in Spectroscopy book [1–3]. We will discuss how we treated this situation previously, then change our point of view to see how we would do it from the point of view of an “experimental design”.
REFERENCES
1. H. Mark, and J. Workman, “Statistics in Spectroscopy; Elementary Matrix Algebra and Multiple Linear Regression: Conclusion”, Spectroscopy 9(5), 22–23 (June, 1994).
2. H. Mark, and J. Workman, “Statistics in Spectroscopy”, Spectroscopy 4(7), 53–54 (1989).
3. H. Mark, and J. Workman, Statistics in Spectroscopy (Academic Press, Boston, 1991), chapter 18.
9
Experimental Designs: Part 2
As we mentioned in the last chapter, “Experimental Design” often takes a form in scientific investigations such that some of the experimental objects have been exposed to one level of the variable, while others have not been so exposed. Oftentimes this situation is called the “experimental subject” versus the “control subject” type of experiment. In the face of experimental error, or other source of variability of the readings, both the “experimental” and the “control” readings would be taken multiple times. That provides the information about the “natural” variability of the system against which the difference between the two can be compared. Then, a t-test is used to see if the difference between the “experimental” and the “control” subjects is greater than can be accounted for by the inherent variability of the system. If it is, we conclude that the difference is “statistically significant”, and that there is a real effect due to the “treatment” applied to the experimental subject. Of course there are variations on this theme: the difference between the “experimental” and the “control” subjects can be due to different amounts of something applied to the two types of object, for example.

That is how we have treated this type of experiment previously. We will now consider a somewhat different way to formulate the same experiment, the purpose being to set up the experimental design, and the analysis of the data, in such a way that it can be generalized to more complicated types of experiments. In order to do this, we recognize that the value of any individual reading, whether from the experimental subject or the control subject, can be expressed as the sum of three quantities. These three quantities arise from a careful consideration of the nature of the data.
Given that a particular measurement belongs either to the experimental group or to the control group, then the value of the data collected can be expressed as the sum of these three quantities:
1) The grand mean of all the data (experimental + control)
2) The difference between the mean of the data group (experimental or control) and the grand mean of the data
3) The difference between the individual reading and the mean reading of its pertinent group.
This can then be expressed mathematically as:
$$X_{ij} = \bar{\bar{X}} + (\bar{X}_i - \bar{\bar{X}}) + (X_{ij} - \bar{X}_i) \qquad (9\text{-}1)$$
where
$X_{ij}$ represents each individual datum.
$\bar{X}_i$ represents the mean of the particular data group (experimental or control) that the individual datum belongs to.
$\bar{\bar{X}}$ represents the grand mean of all the data (from both groups).

By rearranging equation 9-1, we can also express it as follows, wherein the fact that it is a mathematical identity becomes apparent:
$$X_{ij} = (\bar{\bar{X}} - \bar{\bar{X}}) + (\bar{X}_i - \bar{X}_i) + X_{ij} \qquad (9\text{-}2)$$
We have previously shown that through the operation called “partitioning the sums of squares”, the following equality holds [1]:
$$\sum_i X_i^2 = \sum_i \bar{X}^2 + \sum_i (X_i - \bar{X})^2 \qquad (9\text{-}3)$$
Note that what we call the grand mean here is simply called the mean in the prior discussion. That is because in the prior discussion there was no further splitting of the data into subgroups. In the current discussion we have indeed split the data into subgroups; and we note that what was previously the total difference from the mean now consists of two contributions: the difference of each subgroup’s mean from the grand mean, and the difference of each datum’s value from its subgroup’s mean. We might expect, and it turns out to be so (again we leave the proof as an “exercise for the reader”), that the sum of squares of the differences of each datum’s value from the grand mean can also be partitioned; thus:
$$\sum_i \sum_j X_{ij}^2 = \sum_i \sum_j \bar{\bar{X}}^2 + \sum_i \sum_j (\bar{X}_i - \bar{\bar{X}})^2 + \sum_i \sum_j (X_{ij} - \bar{X}_i)^2 \qquad (9\text{-}4)$$
We had previously discussed the situation (from a slightly different point of view) where more than two subgroups of data existed. In that case we noted that we could generate two estimates of sigma, the within-group standard deviation. One estimate is calculated from the pooled within-group standard deviation. The other is calculated from the standard deviation between the means of the various subgroups. This quantity, you recall, is equal to the within-group standard deviation divided by the square root of n, the number of data used in the calculation of each subgroup’s mean. However, the second calculation is correct only if the differences between the means are due to the random variations of the data itself, and there are no external influences.
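The partition of the sums of squares in equation 9-4 can be checked numerically. The sketch below is an illustration only: the readings are invented, and the variable names are our own. It accumulates the three terms on the right-hand side of equation 9-4 and compares their total with the raw sum of squares.

```python
import math

# Hypothetical readings, invented purely for illustration.
groups = {
    "experimental": [5.2, 5.6, 5.4, 5.7],
    "control": [4.9, 5.0, 4.8, 5.1],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Left-hand side of equation 9-4: sum of the squared readings.
total_ss = sum(x ** 2 for x in all_values)

# Right-hand side: the three partitioned sums, accumulated per datum.
grand_ss = 0.0    # grand mean squared, once per datum
between_ss = 0.0  # (group mean - grand mean) squared
within_ss = 0.0   # (datum - group mean) squared
for values in groups.values():
    group_mean = sum(values) / len(values)
    for x in values:
        grand_ss += grand_mean ** 2
        between_ss += (group_mean - grand_mean) ** 2
        within_ss += (x - group_mean) ** 2

partition_holds = math.isclose(total_ss, grand_ss + between_ss + within_ss)
print("partition holds:", partition_holds)
```

The equality holds because the cross-product terms sum to zero, which is the substance of the “exercise for the reader”.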
If such influences exist, then the second calculation (from the between-group means) will estimate a larger value for sigma than the first calculation (the pooled within-group standard deviations). This was then used as the basis of a statistical hypothesis test: if the value of sigma calculated from the between-groups means is statistically significantly larger than the value of sigma calculated from within the groups, then we have evidence to conclude that there are indeed external influences acting upon the data, and we used an F-test to determine whether there was more scatter between the means than could be accounted for by the random variations within the subgroups. In the case at hand, with only two subgroups, we can proceed the same way. The difference is that now, with only two subgroups, there is only one degree of freedom
available for the difference between the subgroups. No matter; an F-test with one degree of freedom is possible. Thus, to analyze the data from the model of equation 9-4, we calculate the mean square between the subgroups, and the mean square within the subgroups, and perform an F-test (rather than a t-test as before) between these two mean squares. We would recommend doing it formally, with an ANOVA table, but this is the basic calculation. The conclusions drawn will be identical to those drawn by use of the t-test. Check it out: the tabled value of F for one and n degrees of freedom is equal to the square of the value of t for n degrees of freedom. We might also note here, almost parenthetically, that if the hypothesis test gives a statistically significant result, it would be valid to calculate the sensitivity of the result to the difference between the two groups (i.e., divide the difference in the means of the two groups by the difference in the values of the variable that correspond to the “experimental” and “control” groups).

As an example of using an experimental design together with its associated analysis of variance to obtain a meaningful result, we have here an example based on some real data that we have collected. The problem was interesting: to troubleshoot a method of (wet) chemical analysis. A large quantity of sample was available, and had been well-ground and mixed. Suitable data was collected to permit performing a straightforward one-way analysis of variance. To start with, 5 g of sample was dissolved in 100 ml of water, and 20 repeat analyses were performed. The resulting values are shown in Table 9-1. The entry in the third row, second column was noted to have been measured under abnormal conditions. Since an assignable cause for this discrepant value was available, the reading was discarded. The statistics for the remaining data were Mean = 5.01, SD = 0.327.
This value for the standard deviation was accepted as the best available approximation to the population value for σ. The next step was to take several different aliquots from a large sample (a different sample than used previously) and collect multiple readings from each of them. Six aliquots were placed in six flasks, one per flask, and six repeat measurements were made on each of these six flasks. Each aliquot consisted of 10 g of test sample/100 ml water. The results are shown in Table 9-2. The value for the pooled within-flask standard deviation, while somewhat higher than for the twenty repeat readings, is not so high as to be worrisome. Strictly speaking, we should have done an F-test between the variances from the two sets of results to see if there is any extra variance there, but we will ignore that question for now, because the important point here is the highly statistically significant value of the “between” flasks standard deviation, indicating that some extra source of variation was superimposed on the analytical value.
Table 9-1 Results from 20 repeat readings of 5 g of sample dissolved in 100 ml water

5.12    5.60    5.18    4.71
5.28    5.14    4.74    4.72
4.97    3.85    5.39    4.94
5.20    4.69    4.49    4.91
4.50    5.12    5.61    4.99
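The summary statistics quoted for Table 9-1 can be reproduced with a few lines of code. This is a minimal sketch using Python's standard statistics module; the layout of the readings list (five rows by four columns, as in the table) and the variable names are our assumptions.

```python
import statistics

# Table 9-1, laid out as five rows by four columns.
readings = [
    [5.12, 5.60, 5.18, 4.71],
    [5.28, 5.14, 4.74, 4.72],
    [4.97, 3.85, 5.39, 4.94],
    [5.20, 4.69, 4.49, 4.91],
    [4.50, 5.12, 5.61, 4.99],
]

# Discard the entry in the third row, second column (0-based index 2, 1),
# which the text identifies as measured under abnormal conditions.
discarded = (2, 1)
kept = [
    value
    for r, row in enumerate(readings)
    for c, value in enumerate(row)
    if (r, c) != discarded
]

mean = statistics.mean(kept)
sd = statistics.stdev(kept)  # sample SD, n - 1 in the denominator
print(f"n = {len(kept)}, mean = {mean:.3f}, SD = {sd:.3f}")
```

The computed mean and standard deviation agree with the quoted values of 5.01 and 0.327 to within rounding in the last digit.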
Table 9-2 Results of repeat readings of six aliquots in six flasks (from 10-g samples)

Flask #     1       2       3       4       5       6
            7.25    10.07   5.96    7.10    5.74    4.74
            7.68    9.02    6.66    6.10    6.90    6.75
            7.76    9.51    5.87    6.27    6.29    6.71
            8.10    10.64   6.95    5.99    6.37    6.51
            7.50    10.27   6.54    6.32    5.99    5.95
            7.58    9.64    6.29    5.54    6.58    6.50
Means:      7.64    9.85    6.37    6.22    6.31    6.19
SDs:        0.28    0.58    0.42    0.51    0.41    0.77

Pooled SD = 0.52, “Between” SD = 1.46
Expected “Between” SD = 0.212
F = 47
F(crit) = F(0.95, 5, 30) = 2.53
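The quantities listed under Table 9-2 follow directly from the one-way analysis of variance described earlier. The sketch below, using only the Python standard library, is one way to carry out the calculation; the variable names are ours, and the critical value is simply copied from the table rather than computed.

```python
import math
import statistics

# Table 9-2: six repeat readings from each of six flasks.
flasks = [
    [7.25, 7.68, 7.76, 8.10, 7.50, 7.58],
    [10.07, 9.02, 9.51, 10.64, 10.27, 9.64],
    [5.96, 6.66, 5.87, 6.95, 6.54, 6.29],
    [7.10, 6.10, 6.27, 5.99, 6.32, 5.54],
    [5.74, 6.90, 6.29, 6.37, 5.99, 6.58],
    [4.74, 6.75, 6.71, 6.51, 5.95, 6.50],
]
n_per_flask = len(flasks[0])

flask_means = [statistics.mean(f) for f in flasks]

# Pooled within-flask SD: square root of the mean within-flask variance
# (valid as written because every flask has the same number of readings).
pooled_sd = math.sqrt(statistics.mean([statistics.variance(f) for f in flasks]))

# Observed SD of the flask means, and the value expected if the means
# differed only through the within-flask random variation.
between_sd = statistics.stdev(flask_means)
expected_between_sd = pooled_sd / math.sqrt(n_per_flask)

f_ratio = (between_sd / expected_between_sd) ** 2
F_CRIT = 2.53  # F(0.95, 5, 30), as quoted under Table 9-2

print(f"pooled SD = {pooled_sd:.2f}, between SD = {between_sd:.2f}")
print(f"expected between SD = {expected_between_sd:.3f}, F = {f_ratio:.1f}")
print("significant:", f_ratio > F_CRIT)
```

The same function of the data applies unchanged to the later tables, provided each group holds the same number of readings.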
Having found a statistically significant “between” flasks standard deviation, the next step was to formulate hypotheses as to the possible physical causes of this situation. The list we arrived at was the following:
• Inhomogeneous sample
• Drift between sets of readings
• Sampling error
• Something else.
The first physical cause considered was the possibility of an inhomogeneous sample. To eliminate this as a possibility, the sample was ground before aliquots were taken. The sample size was still 10 g of sample per 100 ml of water. In this case, however, time constraints permitted only three replicate readings per flask. The results are shown in Table 9-3. We note that there is still a much larger difference between the different flasks’ readings than can be accounted for by the within-flask repeatability.

Table 9-3 Results of repeat readings of six aliquots in six flasks (from 10-g samples ground)

            6.57    5.06    8.07    4.93    4.78    6.23
            6.27    6.27    7.82    5.64    5.50    7.37
            6.35    5.88    8.52    5.19    5.99    5.27
Means:      6.39    5.74    8.19    5.25    5.43    7.29
SDs:        0.16    0.61    0.35    0.36    0.61    1.01

Pooled SD = 0.58, “Between” SD = 1.14
Expected “Between” SD = 0.33
F = 11.3
F(crit) = F(0.95, 5, 12) = 3.10

Therefore we press onward to consider another possible cause of the variation; in this case we consider the possibility of inhomogeneity of the sample, at a scale not affected by grinding. For example, the sample might contain small specks of material that are too small to be ground further, but which are large enough to measurably affect the analysis. In this case, the expected distribution of the sampling variation of such particles would be the Poisson distribution [2]. In such a case, if we take a larger sample, we would expect the standard deviation to decrease as the square root of the sample size. Thus, if we take samples ten times larger than previously, the standard deviation of the “between” readings should become approximately one-third of the previous value. Therefore, for the next test, 100 g samples each were dissolved in 1 liter of water. The results are shown in Table 9-4. Note that the “between” standard deviation is almost identical to the previous value; we conclude that inhomogeneity of the sample is not the problem.

Table 9-4 Results from using 10× larger (100-g) samples

            8.29    8.61    10.04   8.86
            8.12    8.72    11.67   9.02
            8.72    8.42    11.38   9.29
            8.54    8.76    10.19   8.63
Means:      8.42    8.63    10.82   8.94
SDs:        0.26    0.15    0.82    0.26

Pooled SD = 0.46, “Between” SD = 1.10
Expected “Between” SD = 0.23
F = 23
F(crit) = F(0.95, 3, 12) = 3.49

The possibility of drift between sets of readings was ruled out by virtue of the fact that many of the steps of the analytical procedure were done simultaneously on the several readings of the different aliquots. The possibility of drift between readings was ruled out by repeating the readings in different orders; the same values were obtained regardless of the order of reading.

This left “something else” as the possible cause of the variability. When we considered the nature of the test, which was sensitive to parts per million of organic materials, we realized that one possibility was contamination of the glassware by the soap used to clean it. We next cleaned all glassware with chromic acid cleaning solution, and reran the tests, with the result as shown in Table 9-5. Removal of the extraneous source of variability did indeed reduce the “between-flasks” variance to a level that is now explainable (in the statistical sense) by the underlying random variations attributable to the within-flask variability.

Table 9-5 Results after cleaning glassware with chromic acid

            4.65    5.98    5.19    4.97    4.62    3.93
            5.03    4.61    3.96    4.43    4.94    4.60
            4.38    4.49    4.92    4.79    3.37    5.95
Means:      4.68    5.16    4.69    4.73    4.31    4.84
SDs:        0.33    0.73    0.64    0.27    0.83    1.03

Pooled SD = 0.69, “Between” SD = 0.27
Expected “Between” SD = 0.39
F = 0.47
F(crit) = F(0.95, 5, 12) = 3.10
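The course of the troubleshooting can be summarized by recomputing the F ratio for each stage from the standard deviations quoted under Tables 9-2 through 9-5. The sketch below is illustrative only: the dictionary layout and names are ours, the critical values are copied from the tables, and because the inputs are the rounded SDs, the computed ratios can differ slightly from the tabulated F values.

```python
import math

# (observed "between" SD, expected "between" SD, critical F),
# all copied from the summary lines under each table.
summary = {
    "Table 9-2": (1.46, 0.212, 2.53),
    "Table 9-3": (1.14, 0.33, 3.10),
    "Table 9-4": (1.10, 0.23, 3.49),
    "Table 9-5": (0.27, 0.39, 3.10),
}

def f_ratio(between_sd, expected_sd):
    """F is the ratio of the two variance estimates, i.e. the squared SD ratio."""
    return (between_sd / expected_sd) ** 2

results = {}
for name, (between_sd, expected_sd, f_crit) in summary.items():
    f = f_ratio(between_sd, expected_sd)
    results[name] = (f, f > f_crit)
    print(f"{name}: F = {f:.2f}, F(crit) = {f_crit}, significant = {f > f_crit}")

# Poisson-type sampling error would shrink the "between" SD by sqrt(10)
# when the sample is made ten times larger (Table 9-3 to Table 9-4).
predicted_between_sd = 1.14 / math.sqrt(10)
print(f"predicted = {predicted_between_sd:.2f}, observed = 1.10")
```

Only the last stage, after the glassware was cleaned with chromic acid, gives an F ratio below its critical value, and the predicted shrinkage of the “between” SD for a Poisson-type sampling error is clearly not what Table 9-4 shows.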
Table 9-6 Types of experimental designs

Number of    Number of factors
levels       Single                                  Multiple
Two          Experimental versus control subjects    One-at-a-time designs
                                                     Factorial designs
                                                     Fractional factorial designs
                                                     Nested designs
                                                     Special designs
Multiple     Sensitivity testing                     Response surface designs
             Simple regression                       Multiple regression
End of example

From the prototype experiment, we can generate many variations of the basic scheme. The two main ways that the model shown in equation 9-4 can be varied are to increase the number of factors and to increase the number of levels of each factor. A given factor must have at least two levels (even if one of the levels is an implied zero), and may have any number greater than two. Table 9-6 lists the types of designs that fall into each of these categories. The types of designs used by scientists in simple settings, not usually considered “statistical” designs, are the “experimental versus control” designs (discussed above), the one-at-a-time designs (where each factor is individually changed from its “control” value to its “experimental” value, then restored when the next factor is changed), and the simple regression (often used in calibration work when only one physical variable is affected; in chemistry, electrochemical and chromatographic applications come to mind).

The table is not exhaustive, although it does include a majority of the experimental designs that are used. One-at-a-time designs are the usual “non-statistical” type of experiments that are often carried out by scientists in all disciplines. Not included explicitly, however, are experimental designs that are generated from combinations of listed items. For example, a multi-factor experiment may have several levels of some of the factors but only two levels of other factors. Also, due to the nature of the physical factors involved, the values of some of the factors may not be under the experimenter’s control. Thus, some factors may be nested, while others may not be.
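As an illustration of the one-at-a-time design described above, the following sketch generates the runs for a given number of two-level factors. The function name one_at_a_time and the “L”/“H” labels for the control and experimental values are our own choices for the illustration.

```python
def one_at_a_time(n_factors, control="L", experimental="H"):
    """Baseline run plus one run per factor, in which only that factor is
    moved to its experimental value and all others stay at control."""
    baseline = [control] * n_factors
    runs = [tuple(baseline)]
    for k in range(n_factors):
        run = list(baseline)      # every other factor is restored
        run[k] = experimental
        runs.append(tuple(run))
    return runs

runs_oat = one_at_a_time(3)
n_full_factorial = 2 ** 3  # all-possible-combinations design, for comparison

for run in runs_oat:
    print(run)
print(len(runs_oat), "runs versus", n_full_factorial, "for the full factorial")
```

Such a design needs only one run more than the number of factors, but no run ever has two factors at their experimental values together, so it is not balanced in the sense defined in the previous chapter.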
REFERENCES 1. Mark, H. and Workman, J., Statistics in Spectroscopy (Academic Press, Boston, 1991), pp. 80–81. 2. Mark, H. and Workman, J., Spectroscopy 5(3), 55–56 (1991).
10
Experimental Designs: Part 3
We continue with this chapter specifically dealing with experimental design issues. When we leave the realm of the simplest designs, we find that the experiments, and the analyses of the data therefrom, acquire characteristics not existing in the simpler designs, and beyond obvious extensions of them. For example, consider a two-factor design with each factor at two levels. This is also a form of all-possible-combinations experiment.

One item we note here is that there is more than one way to describe the form of an experiment, and we include a short digression here to explicate this multiplicity of ways of describing an experiment. In this particular case, we have two factors, each at two levels. We can describe it as a listing of values corresponding to each experiment (Table 10-1). Alternatively, we can describe it as the experiment number that will correspond to each set of combinations of factors (Table 10-2).

Whichever way we choose to describe the design, it (and the others of this type) has some attractive features. We will illustrate these features with a numerical example. For our example, we will imagine an experiment where the scientist is interested in determining the influence of solvent and of catalyst on the yield of a chemical reaction. The questions to be answered are: does the choice of solvent make a difference, and does the type of catalyst make a difference? The experiment is to consist of trying each of the four available catalysts with each of three solvents, and determining the yield. The experiment can be described by Table 10-3. In a more complicated case, where the factor was a physical variable such as temperature, which can be assigned meaningful physical values, and the sensitivity of the yield to temperature was of concern, we would then need to retain the information regarding the actual temperatures.

For our first look at this experiment we will examine the behavior of the experiment under two sets of conditions. The first scenario gives a set of conditions with the results obtained under the following assumptions:
1) There is no influence of solvent
2) None of the catalysts have an effect
3) There are no random influences on the experiment.
The second scenario has similar conditions, but with one change:
1) There is no influence of solvent
2) One of the four catalysts has an effect
3) There are no random influences on the experiment.
Table 10-1 All-possible-combinations experiment organized as a list of values

Experiment number   Factor #1   Factor #2
1                   L           L
2                   L           H
3                   H           L
4                   H           H
Table 10-2 All-possible-combinations experiment organized as a table where the body of the table contains the experiment number corresponding to each set of experimental conditions

                 Factor #2
                 L       H
Factor #1   L    1       2
            H    3       4
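The two descriptions in Tables 10-1 and 10-2 carry the same information, and one can be converted into the other mechanically. The sketch below shows the conversion from the list form to the table form; the dictionary representation and the names are assumptions made for the illustration.

```python
LEVELS = ("L", "H")

# List form (Table 10-1): experiment number -> (Factor #1, Factor #2).
design_list = {
    1: ("L", "L"),
    2: ("L", "H"),
    3: ("H", "L"),
    4: ("H", "H"),
}

# Table form (Table 10-2): rows are Factor #1 levels, columns are
# Factor #2 levels, and each cell holds the experiment number.
design_table = {f1: {f2: None for f2 in LEVELS} for f1 in LEVELS}
for experiment, (f1, f2) in design_list.items():
    design_table[f1][f2] = experiment

print("Factor #1 \\ Factor #2:", *LEVELS)
for f1 in LEVELS:
    print(f1, *(design_table[f1][f2] for f2 in LEVELS))
```

A cell still holding None after the loop would flag a combination of levels that the list form never visits, that is, a design that is not all-possible-combinations.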
Table 10-3 Conditions for the experiment consisting of determining the yield of a chemical reaction with different solvents and catalysts (the body of the table contains the experiment number)

Catalyst number   Solvent #1   Solvent #2   Solvent #3
1                 1            2            3
2                 4            5            6
3                 7            8            9
4                 10           11           12
In both experiments, Conditions 1 and 2 together mean that all results from the experi ment will be the same in the first scenario, and all results except the ones corresponding to the “effective” catalyst will be the same; while that one will differ. Condition 3 means that we do not need to use any statistical or chemometric considerations to help explain the results. However, for pedagogical purposes we will examine this experiment as though random error were present, in order to be able to compare the analyses we obtain in the presence and in the absence of random effects. The data from these two scenarios might look like that shown in Table 10-4. For each scenario, the statistical analysis of this type of experimental design would be a two-way analysis of variance. This is predicated on the construction of the experiment, which includes some implicit assumptions. These assumptions are 1) The influence of the factors changing between the rows is independent of the influence of the factors changing between the columns.
Experimental Designs: Part 3
65
Table 10-4 Hypothetical data under two different scenarios, for the experiment examining the effect of temperature and catalyst on yield; with no random variations affecting the data Catalyst number
1 2 3 4
First scenario
Second scenario
Solvent number
Solvent number
1
2
3
1
2
3
25 25 25 25
25 25 25 25
25 25 25 25
25 25 35 25
25 25 35 25
25 25 35 25
2) The influence of the factors changing between the columns is independent of the influence of the factors changing between the rows.
3) Any error (in these first two scenarios assumed zero) is random, has a mean value of zero, and is Normally distributed.
If these assumptions hold, then each quantity in the data table can be expressed as the sum of the following four factors:
1) The grand mean of all the data
2) The influence of the value of the factor corresponding to each row
3) The influence of the value of the factor corresponding to each column
4) The variation superimposed by any random phenomena affecting the data.
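The four contributions just listed can be written compactly as an additive model. The notation here is an assumption introduced only for illustration and is not used elsewhere in the text: x_ij is the datum in row i and column j, μ the grand mean, ρ_i and κ_j the row and column influences (expressed as differences from the grand mean, so each set sums to zero), and ε_ij the random variation.

```latex
x_{ij} = \mu + \rho_i + \kappa_j + \varepsilon_{ij},
\qquad \sum_i \rho_i = 0, \qquad \sum_j \kappa_j = 0
```

Written this way, the independence assumptions amount to saying that no term depends on i and j jointly except the random one.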
This being the case, the quantities computed for a two-way analysis of variance are the following:
1) The grand mean of all the data
2) The mean of each row, and the difference of each row mean from the grand mean (this estimates the influence of the values of the factor corresponding to the rows)
3) The mean of each column, and the difference of each column mean from the grand mean (this estimates the influence of the values of the factor corresponding to the columns)
4) Any difference between the actual data and the corresponding values calculated from the grand mean and the influences of the row and column factors (this estimates the error variability).
In Table 10-5, we present the standard representation of this breakdown of the data. There are two important points to note about the results in this table. First, the data, shown in the body of the table in Part A, are in fact equal to the sum of the following quantities:
1) the grand mean (shown in Part A)
2) + row differences from the grand mean (shown in Part B)
Table 10-5 Part A – ANOVA for the errorless data from Table 10-4

First scenario
Catalyst number   Solvent 1   Solvent 2   Solvent 3   Row means
1                 25          25          25          25
2                 25          25          25          25
3                 25          25          25          25
4                 25          25          25          25
Col. means:       25          25          25          25*

Second scenario
Catalyst number   Solvent 1   Solvent 2   Solvent 3   Row means
1                 25          25          25          25
2                 25          25          25          25
3                 35          35          35          35
4                 25          25          25          25
Col. means:       27.5        27.5        27.5        27.5*

* Grand mean
Table 10-5 Part B – RESIDUALS for ANOVA from Table 10-4 after correcting for row and column means

First scenario
Catalyst number               Solvent 1   Solvent 2   Solvent 3   Row diffs
1                             0           0           0           0
2                             0           0           0           0
3                             0           0           0           0
4                             0           0           0           0
Mean diff. from grand mean:   0           0           0

Second scenario
Catalyst number               Solvent 1   Solvent 2   Solvent 3   Row diffs
1                             0           0           0           −2.5
2                             0           0           0           −2.5
3                             0           0           0           7.5
4                             0           0           0           −2.5
Mean diff. from grand mean:   0           0           0
3) + column differences from the grand mean (shown in Part B)
4) + residuals (shown in the body of Part B).
The second point is that the residuals, representing the error portion of the data, are all zero; the data are accounted for entirely by the systematic variations between the rows and between the columns (of course, the column differences happen to be zero in these data). Now the really interesting stuff happens when we do in fact have error in the data. Let us look at what happens to these two scenarios when there is a small amount of random error variability superimposed on the data. Now the experimental conditions for the two scenarios are as follows:
Scenario #3:
1) There is no influence of solvent
2) None of the catalysts has an effect
3) There is a random error superimposed on the experiment.
Scenario #4:
1) There is no influence of solvent
2) One of the catalysts has an effect
3) The same random error exists as in Scenario #3.
For these two situations, let us suppose each error has the value shown in Table 10-6 for the corresponding datum. The values in Table 10-6 were selected randomly, and have a mean of zero and a standard deviation of unity. When these error values are superimposed on the data, we arrive at Table 10-7. When we subject these data to the same ANOVA calculations as the errorless data, we arrive at the results shown in Table 10-8. It is instructive to compare the values in these tables with the corresponding values in the ANOVA tables for the errorless data. In particular, note that in the table corresponding to Scenario 3, even though there are no underlying systematic variations in the data, both the row and the column means are perturbed by the random variations superimposed on the data. How, then, can we differentiate these differences from the ones due to real systematic variations such as are present in Scenario 4? The answer, of course, is to do a statistical hypothesis test, but as it stands, we do not seem to have enough information available for such a test. We can compute variances between rows and also between columns, in order to have the mean squares for the corresponding differences, but what are we going to compare these mean squares to? In particular, what are we going to use
Table 10-6 For Scenarios 3 and 4 each error has the following value for the corresponding datum

−0.3583   −0.9583    0.0416   −1.0583
 0.8416   −1.2583   −1.3583    0.4416
 0.5416    1.4416    1.4416    0.2416
Table 10-7 Hypothetical data under two different scenarios; for the experiment examining the effect of solvent and catalyst on yield, random variations (from Table 10-6) have zero mean and unity standard deviation

                  Third scenario                          Fourth scenario
Catalyst number   Solvent 1   Solvent 2   Solvent 3       Solvent 1   Solvent 2   Solvent 3
1                 25.8416     24.6416     25.5416         25.8416     24.6416     25.5416
2                 23.7416     24.0416     26.4416         23.7416     24.0416     26.4416
3                 23.6416     25.0416     26.4416         33.6416     35.0416     36.4416
4                 25.4416     23.9416     25.2416         25.4416     23.9416     25.2416
Table 10-8 Part A – DATA: ANOVA for the hypothetical data containing error with mean equal 0 and standard deviation (S) equal to unity

Third scenario
Catalyst number   Solvent 1   Solvent 2   Solvent 3   Row means
1                 25.8416     24.6416     25.5416     25.3416
2                 23.7416     24.0416     26.4416     24.7416
3                 23.6416     25.0416     26.4416     25.0416
4                 25.4416     23.9416     25.2416     24.875
Col. means:       24.6666     24.4166     25.9166     25*

Fourth scenario
Catalyst number   Solvent 1   Solvent 2   Solvent 3   Row means
1                 25.8416     24.6416     25.5416     25.3416
2                 23.7416     24.0416     26.4416     24.7416
3                 33.6416     35.0416     36.4416     35.0416
4                 25.4416     23.9416     25.2416     24.875
Col. means:       27.1666     26.9166     28.4166     27.5*

* Grand mean
Table 10-8 Part B – RESIDUALS for the hypothetical data containing error with mean equal 0 and standard deviation (S) equal to unity

Third scenario
Catalyst number              Solvent 1   Solvent 2   Solvent 3   Row diff. from grand mean
1                            0.8333      −0.1166     −0.7166     0.3416
2                            −0.6666     −0.1166     0.7833      −0.2583
3                            −1.0666     0.5833      0.4833      0.0416
4                            0.9         −0.35       −0.55       −0.125
Col. diff from grand mean:   −0.3333     −0.5833     0.9166

Fourth scenario
Catalyst number              Solvent 1   Solvent 2   Solvent 3   Row diff. from grand mean
1                            0.8333      −0.1166     −0.7166     −2.1583
2                            −0.6666     −0.1166     0.7833      −2.7583
3                            −1.0666     0.5833      0.4833      7.5416
4                            0.9         −0.35       −0.55       −2.625
Col. diff from grand mean:   −0.3333     −0.5833     0.9166
to represent the error, to see if the row mean squares or the column mean squares are larger than can be accounted for by the error of the data? The answer to this question is in the residuals. While the residuals might not seem to bear any relationship to either the original data or the errors (which in this case we know because we created them and they are listed above), in fact the residuals contain the variance present in the errors of the original data. However, the value of the error sum of squares is reduced from that of the original data, because some fraction of the error variation was removed from the total when the row and column means were subtracted from the data. This reduction in the sum of squares can be compensated for by making a corresponding adjustment to the degrees of freedom used to calculate the mean square from the sum of squares. In these data the sum of squares of the residuals is 5.24 (check it out). The number of degrees of freedom in these residuals is calculated by starting with the total (which is twelve, one for each piece of data in the experiment) and subtracting one degree of freedom for each quantity calculated from and subtracted from the data. What are these? Well, there is one grand mean, four row means, and three column means. Because the row means and the column means are each expressed as differences from the grand mean, only r − 1 = 3 of the row differences and c − 1 = 2 of the column differences are independent, so the number of degrees of freedom lost is 1 + (r − 1) + (c − 1) = 1 + 3 + 2 = 6. Thus there is a loss of six degrees of freedom from the twelve, leaving (r − 1)(c − 1) = (4 − 1)(3 − 1) = 6 for the residuals. The mean square for the residuals is thus 5.24/6, or 0.873, and as a check, the square root of that value, 0.934, is an estimate of the error (which we know is unity).
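The breakdown described above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' own program: it takes the third-scenario data from Table 10-7 (rows are catalysts, columns are solvents), removes the grand mean, the row differences and the column differences, and then forms the residual sum of squares, degrees of freedom and mean square. The function and variable names are illustrative assumptions.

```python
# Third-scenario data from Table 10-7: rows = catalysts 1-4, columns = solvents 1-3.
data = [
    [25.8416, 24.6416, 25.5416],
    [23.7416, 24.0416, 26.4416],
    [23.6416, 25.0416, 26.4416],
    [25.4416, 23.9416, 25.2416],
]


def two_way_breakdown(table):
    """Split a rows x columns table into grand mean, row and column
    differences from the grand mean, and residuals."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_diff = [sum(row) / c - grand for row in table]
    col_diff = [sum(table[i][j] for i in range(r)) / r - grand for j in range(c)]
    resid = [
        [table[i][j] - grand - row_diff[i] - col_diff[j] for j in range(c)]
        for i in range(r)
    ]
    return grand, row_diff, col_diff, resid


grand, row_diff, col_diff, resid = two_way_breakdown(data)
r, c = len(data), len(data[0])

ss_resid = sum(e * e for row in resid for e in row)   # residual sum of squares
df_resid = r * c - 1 - (r - 1) - (c - 1)              # same as (r - 1) * (c - 1)
ms_resid = ss_resid / df_resid                        # residual mean square

print("grand mean:", round(grand, 4))
print("row differences:", [round(v, 4) for v in row_diff])
print("column differences:", [round(v, 4) for v in col_diff])
print("SS, df, MS, sqrt(MS):", round(ss_resid, 3), df_resid,
      round(ms_resid, 3), round(ms_resid ** 0.5, 3))
```

Running the same function on the fourth-scenario data should leave the residuals unchanged and move the catalyst effect into the row differences, which is the point the comparison of the two scenarios is meant to make.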
11 Analytic Geometry: Part 1 – The Basics in Two and Three Dimensions
Analytic geometry is a branch of mathematics in which geometry is described through the use of algebra. Rene Descartes (1596–1650) is credited for conceptualizing this mathematical discipline. Recalling the basics, we can express the points of a plane as a pair of numbers with x-axis and y-axis coordinates, designated by (x, y). Note that the x-axis coordinate is termed the “abscissa”, and the y-axis the “ordinate”.
THE DISTANCE FORMULA
In two dimensions (x and y), the distance between two points (x1, y1) and (x2, y2) in two-dimensional space (as shown in Figure 11-1) is given by the Pythagorean theorem as

$$D^2 = (x_2 - x_1)^2 + (y_2 - y_1)^2 = |x_2 - x_1|^2 + |y_2 - y_1|^2 \tag{11-1}$$

and

$$D = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{11-2}$$

Note: This relationship holds even when x1 or y1 or both are negative (also shown in Figure 11-1).
In three dimensions (x, y, z), we describe three lines at right angles to one another, designated as the x, y, z axes. Three planes are represented as xy, yz, and zx, and the distance between two points (x1, y1, z1) and (x2, y2, z2) is given by

$$D^2 = (x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 = |x_2 - x_1|^2 + |y_2 - y_1|^2 + |z_2 - z_1|^2 \tag{11-3}$$

and

$$D = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} \tag{11-4}$$
Figure 11-1 The distance between two points, (x1, y1) and (x2, y2), in a two-dimensional coordinate space is determined using the Pythagorean theorem.
DIRECTION NOTATION
For two-dimensional problems, given a line with respect to two axes x and y, there is a set of angles α and β that are designated as the x direction angle and y direction angle, respectively. Thus, as illustrated by using Figures 11-2a and 11-2b, a clearly defined line segment can be described given the angles α and β on the coordinate axes x and y. The only restriction that applies here is that both angles α and β must be ≥ 0° and ≤ 180°.
THE COSINE FUNCTION
The cosine function applied to Figures 11-2a and 11-2b is given as

$$\cos \alpha = \frac{x_2 - x_1}{d} \tag{11-5a}$$

and

$$\cos \beta = \frac{y_2 - y_1}{d} \tag{11-5b}$$

where

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{11-6}$$

Figure 11-2 Two illustrations, (a) and (b), of the x-direction angle (α) and y-direction angle (β) for a two-dimensional coordinate system.
Note that cos α and cos β are referred to as the direction cosines of the line segment described. To summarize in expanded notation:

$$\cos \alpha = \frac{x_2 - x_1}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}} \tag{11-7a}$$

and

$$\cos \beta = \frac{y_2 - y_1}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}} \tag{11-7b}$$
Example: Find the direction cosines and corresponding angles for a line segment AB, where A is (3, 5) and B is (2, 7); check your work using cos²α + cos²β = 1.0, and draw a graphic of the line segment (Figure 11-3). The solution proceeds as follows:

$$x_2 - x_1 = 2 - 3 = -1 \tag{11-8a}$$

and

$$y_2 - y_1 = 7 - 5 = 2 \tag{11-8b}$$

Therefore, the distance (d) is given by

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{11-9a}$$

$$d = \sqrt{(-1)^2 + 2^2} = \sqrt{5} \tag{11-9b}$$

From the formulas above, we can determine that

$$\cos \alpha = -1/\sqrt{5}$$

and the corresponding angle is given as

$$\alpha = \cos^{-1}\left(-1/\sqrt{5}\right) = 116.57^\circ$$

We also know that

$$\cos \beta = 2/\sqrt{5}$$

therefore the angle is given by

$$\beta = \cos^{-1}\left(2/\sqrt{5}\right) = 26.57^\circ$$

Checking our work using the formula cos²α + cos²β = 1.0, we find that

$$\cos^2 116.57^\circ + \cos^2 26.57^\circ = 0.20 + 0.80 = 1.0$$

Figure 11-3 The x-direction angle (α = 116.57°) and y-direction angle (β = 26.57°) for a line segment, where A is (3, 5) and B is (2, 7) (see example in text).
DIRECTION IN 3-D SPACE
To continue our discussion of direction angles, we will use the same nomenclature: x, designated by α; y, designated by β; and z, newly designated by γ. We can determine the cosine of any direction angle, given the corresponding x, y, z coordinates for designated points in space as:

$$\cos \alpha = (x_2 - x_1)/d \tag{11-10a}$$

and

$$\cos \beta = (y_2 - y_1)/d \tag{11-10b}$$

and

$$\cos \gamma = (z_2 - z_1)/d \tag{11-10c}$$

where

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} \tag{11-11}$$

It follows algebraically that

$$\cos^2 \alpha + \cos^2 \beta + \cos^2 \gamma = 1.0 \tag{11-12}$$
Example: Find the direction cosines and corresponding angles for a line segment AB where A is (2, −1, 4) and B is (4, 1, 2). To solve, use

$$x_2 - x_1 = 4 - 2 = 2$$

and

$$y_2 - y_1 = 1 - (-1) = 2$$

and

$$z_2 - z_1 = 2 - 4 = -2$$

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$

$$d = \sqrt{2^2 + 2^2 + (-2)^2} = \sqrt{12} = 3.46$$

and

$$\cos \alpha = 2/3.46 = 0.577, \qquad \cos \beta = 2/3.46 = 0.577, \qquad \cos \gamma = -2/3.46 = -0.577$$

To find the direction angles corresponding to the above we use

$$\alpha = \cos^{-1}(0.577) = 54.76^\circ, \qquad \beta = \cos^{-1}(0.577) = 54.76^\circ, \qquad \gamma = \cos^{-1}(-0.577) = 125.24^\circ$$

Checking the calculations, we use cos²α + cos²β + cos²γ = 1.0, or

$$0.333 + 0.333 + 0.333 = 1.00$$
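The two worked examples of this chapter can be reproduced with a few lines of code. This is only a sketch using the Python standard library; the function names `direction_cosines` and `direction_angles` are illustrative choices, not part of the text. Because the code does not round the cosines to three digits first, the three-dimensional angles come out a few hundredths of a degree away from the values quoted above.

```python
import math


def direction_cosines(a, b):
    """Return (length, list of direction cosines) for the segment from a to b.
    Works for any number of dimensions."""
    deltas = [q - p for p, q in zip(a, b)]
    d = math.sqrt(sum(t * t for t in deltas))
    if d == 0:
        raise ValueError("the two points coincide, so no direction is defined")
    return d, [t / d for t in deltas]


def direction_angles(cosines):
    """Convert direction cosines to direction angles in degrees (0 to 180)."""
    # Clamp to [-1, 1] to guard against tiny floating-point overshoot.
    return [math.degrees(math.acos(max(-1.0, min(1.0, cv)))) for cv in cosines]


# Two-dimensional example: A = (3, 5), B = (2, 7)
d2, cos2 = direction_cosines((3, 5), (2, 7))
ang2 = direction_angles(cos2)

# Three-dimensional example: A = (2, -1, 4), B = (4, 1, 2)
d3, cos3 = direction_cosines((2, -1, 4), (4, 1, 2))
ang3 = direction_angles(cos3)

for label, d, cs, angs in (("2-D", d2, cos2, ang2), ("3-D", d3, cos3, ang3)):
    print(label, "d =", round(d, 4),
          "cosines =", [round(v, 4) for v in cs],
          "angles =", [round(v, 2) for v in angs],
          "sum of squared cosines =", round(sum(v * v for v in cs), 6))
```

The sum of the squared direction cosines printed on each line is the check given in equation 11-12.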
DEFINING SLOPE IN TWO DIMENSIONS
The slope m of a line segment between two points is given as:

$$m = (y_2 - y_1)/(x_2 - x_1) = \tan \theta \tag{11-13}$$

where θ is the x direction angle, lying between 0° and 360°. This well-known expression is also equivalent to the tangent of the x direction angle for the line segment defined by the two points on the line. Thus the slope of the line given in Figure 11-4 is tan(120°) = −1.73. Just store this information away for the next several chapters as we build a pre-chemometrics view of analytic geometry.
Figure 11-4 Illustration of the slope of a line given an x-direction angle of 120° (θ = 120°).
RECOMMENDED READING
We recommend a standard text on vector analytic geometry. One good example is
1. White, P.A., Vector Analytic Geometry (Dickenson, Belmont, CA, 1966).
12 Analytic Geometry: Part 2 – Geometric Representation of Vectors and Algebraic Operations
We continue with our pre-chemometrics review of analytic geometry, noting the term “vector” in all cases can be represented by a matrix of r × c dimensions, where r = # of rows and c = # of columns. The operations defined below will be employed in future discussions.
VECTOR MULTIPLICATION (SCALAR × VECTOR)
If M represents a vector with components (or elements) (Mx, My), then sM (where s is a real number, also termed a “scalar”) is defined as the vector represented by (sMx, sMy); and the length of sM is |s| times the length of M. One can relate the direction angles of M to those of sM as follows: For the case where s > 0 (s is a positive, real number), then

$$\cos \alpha_{sM} = \cos \alpha_M \tag{12-1a}$$

and

$$\cos \beta_{sM} = \cos \beta_M \tag{12-1b}$$

So the vectors sM and M have exactly the same direction. For the case where s < 0 (where s is a negative, real number), then

$$\cos \alpha_{sM} = -\cos \alpha_M \tag{12-1c}$$

and

$$\cos \beta_{sM} = -\cos \beta_M \tag{12-1d}$$

In this case, the vectors sM and M have exactly opposite directions. (Note: When s = 0, the result is the zero vector, for which no direction is defined.)
Example problem. If M = (1, 5), then 2M (where s = 2) = (2 × 1, 2 × 5) = (2, 10), represented in Figure 12-1 as the line segment from point (0, 0) to (2, 10). (Note: The expression −2M = (−2, −10) is represented by the line segment from point (0, 0) to (−2, −10).)
Figure 12-1 An example of scalar × vector multiplication: if M = (1, 5), then 2M = (2, 10) and −2M = (−2, −10). M is the segment from (0, 0) to (1, 5), 2M the segment from (0, 0) to (2, 10), and −2M the segment from (0, 0) to (−2, −10).
VECTOR DIVISION (VECTOR ÷ SCALAR)
Vector division is represented as vector multiplication by using a fractional multiplier term. For example, if s = 1/2, then sM = (0.5, 2.5); if s = −1/2, then sM = (−0.5, −2.5), and so forth.
VECTOR ADDITION (VECTOR + VECTOR)
Given M = (Mx, My), where M = (1, 3); and N = (Nx, Ny), where N = (3, 1), then

$$M + N = (M_x + N_x,\; M_y + N_y) \tag{12-2}$$

The geometric representation is shown in Figure 12-2 for (1 + 3, 3 + 1) = (4, 4).
Figure 12-2 An example of vector + vector addition: If M = (1, 3) and N = (3, 1), then M + N = (4, 4).
VECTOR SUBTRACTION (VECTOR − VECTOR)
Given M = (Mx, My), where M = (1, 3), and N = (Nx, Ny), where N = (3, 1), then

$$M - N = (M_x - N_x,\; M_y - N_y)$$

The geometric representation of M − N = (1 − 3, 3 − 1) = (−2, 2) is shown in Figure 12-3.
Figure 12-3 An example of vector − vector subtraction: If M = (1, 3) and N = (3, 1), then M − N = (−2, 2). The figure constructs the difference by adding −N to M.
In our next chapter we will look at the problem of representing higher dimensional space with fewer dimensions; it will be a precursor to discussions of the dimensional aspects of multivariate algorithms.
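The four operations of this chapter are short enough to write out directly. The sketch below uses plain Python tuples as vectors; the helper names `scale`, `add`, `subtract` and `length` are assumptions made for illustration. Subtraction is written as the addition of −N, which is the construction drawn in Figure 12-3.

```python
import math


def scale(s, m):
    """Scalar x vector: multiply every component of m by s."""
    return tuple(s * v for v in m)


def add(m, n):
    """Vector + vector: add corresponding components."""
    return tuple(a + b for a, b in zip(m, n))


def subtract(m, n):
    """Vector - vector, computed as m + (-1)n."""
    return add(m, scale(-1, n))


def length(m):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in m))


M, N = (1, 3), (3, 1)
print(add(M, N))                             # (4, 4)
print(subtract(M, N))                        # (-2, 2)
print(scale(2, (1, 5)), scale(-2, (1, 5)))   # (2, 10) (-2, -10)
print(scale(0.5, (1, 5)))                    # (0.5, 2.5)

# The length scales by |s|, whatever the sign of s.
print(length(scale(-2, M)), 2 * length(M))
```

Division by a scalar needs no separate function here: it is `scale` with a fractional multiplier, as in the last example.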
13 Analytic Geometry: Part 3 – Reducing Dimensionality
For this chapter, we will reduce three-dimensional data to one-dimensional data using the techniques of projection and rotation. The (x, y, z) data will be projected onto the (x, z) plane and then rotated onto the x axis. This chapter is purely pedagogical and is intended only to demonstrate the use of projection and rotation as geometric terms.
REDUCING DIMENSIONALITY
The exercise for this chapter is to reduce a point on a vector in 3-D space to a point on a vector in 2-D space, then to further reduce the point on a vector in 2-D space to a point on a vector in 1-D space, all the while maintaining as much information as possible. So (x, y, z) is reduced to (x, z), which is further reduced to (x). This process can be represented in symbolic language as (x, y, z) → (x, z) → x.
3-D TO 2-D BY PROJECTION
Let us calculate some of the angles relative to the vector in 3-D space as shown in Figure 13-1. To calculate these angles, we refer to Chapter 11, and if we proceed with our calculations we find

$$\alpha = \cos^{-1}(0.7071) = 45^\circ \tag{13-1}$$

and

$$\cos \beta = \frac{y_2 - y_1}{d} = \frac{2 - 0}{\sqrt{8}} = 0.7071, \qquad \beta = \cos^{-1}(0.7071) = 45^\circ \tag{13-2}$$

where

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} = \sqrt{(2 - 0)^2 + (2 - 0)^2} = \sqrt{8}$$
Figure 13-1 A point (X, Y, Z) = (2, 2, 6) located along a vector in 3-D space. Both the angle α (the angle to the x-axis) and the angle β (the angle to the y-axis), as illustrated in the figure, are shown as a projection of the 3-D vector (2, 2, 6) onto the (x, y) plane, and the calculations for both α and β from what is then a 2-D vector are as given in equations 13-1 and 13-2.
Because the third dimension is represented by the z axis, we calculate the x-direction angle on the (x, z) plane as α:

$$\alpha = \cos^{-1}\left(\frac{x_2 - x_1}{\sqrt{(x_2 - x_1)^2 + (z_2 - z_1)^2}}\right) = \cos^{-1}\left(\frac{2 - 0}{\sqrt{(2 - 0)^2 + (6 - 0)^2}}\right) = \cos^{-1}(0.3162) = 71.57^\circ \tag{13-3}$$
Now look at Table 13-1, which describes the trigonometric functions of a right triangle (Figure 13-2). If we apply Table 13-1 to this problem, we can calculate the length of a vector using trigonometric functions. Figure 13-3 illustrates the geometric problem of solving for the length of the vector from A to B, that is, from the point (0, 0) to the point (2, 6) on the (x, z) plane. The angle α calculated in equation 13-3 is represented in Figures 13-3 and 13-4; the additional angle shown in Figure 13-1 is not discussed. Because the third dimension is represented by the z-axis, the angle given by equation 13-3 is the x-direction angle α on the (x, z) plane. To calculate the length of the projection of vector AB onto the (x, z) plane, we can use sin θ = opp/hyp
Table 13-1 Trigonometric functions of a right triangle

sin θ = opposite/hypotenuse      csc θ = hypotenuse/opposite
cos θ = adjacent/hypotenuse      sec θ = hypotenuse/adjacent
tan θ = opposite/adjacent        cot θ = adjacent/opposite
Figure 13-2 A right triangle showing adjacent (adj.), hypotenuse (hyp.) and opposite sides relative to angle θ.
Figure 13-3 The geometric problem associated with calculating the length of a vector AB, given a point (x, z) = (2, 6) in 2-D space. Note that the angle θ is equal to 90° − 71.57° = 18.43°.
Figure 13-4 Illustration of two-dimensional reduction to one dimension by an x-directional rotation of 71.57° (vector length L = 6.33, α = 71.57°).
which becomes hyp = opp/sin θ = 2/sin(18.43°) = 6.33. Therefore, we can project the AB vector in 3-D space onto 2-D space by using a projection onto the (x, z) plane, resulting in a point on a vector in the 2-D (x, z) plane, the vector being 6.33 units in length and having an x-direction angle equal to 71.57° (as in Figure 13-4).
2-D INTO 1-D BY ROTATION
By rotating the vector in 2-D space through 71.57° in the x-direction, we can align it to the x axis as a 1-D line 6.33 units in length (as shown in Figure 13-5).
Figure 13-5 By projecting a vector in (x, y, z) space onto a plane in (x, z) space, and by an x-directional rotation of 71.57° in the (x, z) plane, we have the reduction of a point on a vector in 3-D space to a point on a vector in 1-D space (L = 6.33).
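The projection and rotation steps can be followed numerically. The sketch below is one possible way to do it with the Python standard library, under the assumption that "projection onto the (x, z) plane" means dropping the y coordinate and that the rotation is an ordinary planar rotation about the origin; the variable names are illustrative. Computed without intermediate rounding, the length is √40 ≈ 6.32, slightly below the 6.33 obtained in the text from the rounded angle of 18.43°.

```python
import math

point = (2.0, 2.0, 6.0)

# Step 1: project onto the (x, z) plane by discarding the y coordinate.
x, _, z = point
length = math.hypot(x, z)                # length of the projected vector
alpha = math.degrees(math.atan2(z, x))   # x-direction angle in the (x, z) plane
theta = 90.0 - alpha                     # the angle called theta in Figure 13-3

# Step 2: rotate the projected vector by -alpha so that it lies along the x axis.
a = math.radians(-alpha)
x_rot = x * math.cos(a) - z * math.sin(a)
z_rot = x * math.sin(a) + z * math.cos(a)

print("alpha =", round(alpha, 2), "theta =", round(theta, 2))
print("length =", round(length, 4))
print("rotated point =", (round(x_rot, 4), round(z_rot, 4)))
```

After the rotation the z component is zero to within floating-point error, so the single remaining coordinate carries the whole length of the projected vector.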
In our next chapter, we will be applying the lessons reviewed over these past three chapters toward a better understanding of the geometric concepts relative to multivariate regression.
14 Analytic Geometry: Part 4 – The Geometry of Vectors and Matrices
In this chapter, we plan to use the information presented over the past three chapters to illustrate the geometry of vectors and matrices; these concepts will continue to be discussed routinely throughout this series in relation to regression vectors.
ROW VECTORS IN COLUMN SPACE
Let us begin by representing a row matrix M = [1, 2, 3] in column space as shown in Figure 14-1. Note that the row vector M = [1, 2, 3] projects onto the plane defined by columns 1 and 2 as a point (1, 2) or a vector (straight line) with a C1 direction angle (α) equal to

$$\alpha = \cos^{-1}\left(\frac{C1_2 - C1_1}{d}\right) = \cos^{-1}\left(\frac{1 - 0}{\sqrt{5}}\right) = \cos^{-1}(0.4472) = 63.43^\circ \tag{14-1}$$

and a C2 direction angle (β) equal to

$$\beta = \cos^{-1}\left(\frac{C2_2 - C2_1}{d}\right) = \cos^{-1}\left(\frac{2 - 0}{\sqrt{5}}\right) = \cos^{-1}(0.8944) = 26.57^\circ \tag{14-2a}$$

where

$$d = \sqrt{(C1_2 - C1_1)^2 + (C2_2 - C2_1)^2} = \sqrt{1^2 + 2^2} = \sqrt{5} \tag{14-2b}$$
COLUMN VECTORS IN ROW SPACE
A matrix consisting of more than one row, such as

$$M = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

can be represented by 2-D row space as shown in Figure 14-2. Note that each column in the matrix can be represented by a column vector as shown in the figure.
Figure 14-1 A representation of a row vector M = [1, 2, 3] in column space (axes Column 1, Column 2 and Column 3), and the projection of this vector onto the plane represented by Columns 1 and 2, with direction angles α and β.
Figure 14-2 The representation of column vectors in row space (axes Row 1 and Row 2) of the matrix M with rows [1, 2] and [3, 4].
PRINCIPAL COMPONENTS FOR REGRESSION VECTORS
Figure 14-3a shows the projection of two column vectors, C1 = [1, 3] and C2 = [3, 1], onto their vector sum (or principal component (PC1)). We note that the product [1, 3] × [3, 1] = [1 × 3, 3 × 1] = [3, 3]. The vector sum of the two column vectors passes through the point (3, 3), but the projection of each column onto PC1 gives a vector with a length equal to line segments B + C as shown in Figure 14-3b.
Figure 14-3 (a) The representation of two columns of a matrix in row space. The vector sum of the two column vectors is the first principal component (PC1). (b) A close-up view of Figure 14-3a, illustrating the line segments (A, B, C, D, E), direction angles (∠α, ∠β, ∠C, ∠D), and projection of Columns 1 and 2 onto the first principal component.
To determine the geometry for Figures 14-3a and 14-3b, we begin by calculating the length of line segment E (Column 1) by using the Pythagorean theorem as

$$E^2 = \mathrm{Hyp}^2 = (3 - 0)^2 + (1 - 0)^2 = 3^2 + 1^2 = 10, \qquad \text{therefore } E = \sqrt{10} = 3.162 \tag{14-3}$$

Then the angle C can be determined using

$$\tan C = \frac{\mathrm{opp}}{\mathrm{adj}} = \frac{1}{3} \tag{14-4a}$$

and

$$\tan^{-1}\frac{1}{3} = 18.435^\circ \tag{14-4b}$$

So ∠C = 18.435°, ∠D = 18.435°, and ∠α + ∠β + 2 × 18.435° = 90°. Thus, ∠α and ∠β are each equal to 26.565°. It follows that the projection of the vectors represented by Columns 1 and 2 onto the vector PC1 yields a right triangle defined by the three line segments C + B, D, and E. The length of PC1 (the hypotenuse) is equal to line segments C + B and is given by

$$\cos \alpha = \frac{\mathrm{adj}}{\mathrm{hyp}} \Rightarrow \cos 26.565^\circ = \frac{E}{C + B} \Rightarrow 0.8944 = \frac{3.162}{\mathrm{hyp}} \Rightarrow \mathrm{hyp} = 3.5353 \tag{14-5}$$

So the length of the hypotenuse (segments C + B) is 3.5353. We can check our work by calculating the opposite side (D) length as

$$\tan \alpha = \frac{\mathrm{opp}}{\mathrm{adj}} \Rightarrow \tan 26.565^\circ = \frac{D}{E} \Rightarrow 0.500 = \frac{\mathrm{opp}}{3.162} \Rightarrow \mathrm{opp} = 1.5810 \tag{14-6}$$
And by using the Pythagorean theorem we can calculate the length of the hypotenuse:

$$3.162^2 + 1.5810^2 = 3.5352^2 \tag{14-7}$$
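The arithmetic of equations 14-3 through 14-7 can be retraced in a few lines. This sketch simply follows the triangle as the text sets it up (segment E as the side adjacent to angle α, segments C + B as the hypotenuse, segment D as the opposite side); it is a check on the numbers, not an independent derivation, and the variable names are illustrative.

```python
import math

# Eq. 14-3: length of segment E from the coordinates (3, 1).
E = math.hypot(3.0, 1.0)

# Eq. 14-4: angle C from tan C = 1/3, in degrees.
angle_C = math.degrees(math.atan2(1.0, 3.0))

# Angles alpha and beta are equal and, with angles C and D, fill the 90 degree quadrant.
alpha = (90.0 - 2.0 * angle_C) / 2.0

# Eq. 14-5: hypotenuse (segments C + B) from cos(alpha) = E / hyp.
hyp = E / math.cos(math.radians(alpha))

# Eq. 14-6: opposite side D from tan(alpha) = D / E.
D = E * math.tan(math.radians(alpha))

# Eq. 14-7: Pythagorean check on the same triangle.
check = math.hypot(E, D)

print(round(E, 4), round(angle_C, 3), round(alpha, 3))
print(round(hyp, 4), round(D, 4), round(check, 4))
```

The last two printed lengths, `hyp` and `check`, agree, which is the consistency test expressed by equation 14-7.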
By representing a row vector in column space, or a column vector in row space, we can illustrate the geometry of regression. These concepts combined with matrix algebra will be useful for further discussions of regression. In Chapters 15–20, we will digress from these topics and revisit experimental design concepts. Readers may wish to study additional materials related to the subject of analytical geometry and regression. We recommend two sources of such information below.
RECOMMENDED READING
1. Beebe, K.R. and Kowalski, B.R., Analytical Chemistry 59(17), 1007A–1017A (1987).
2. Fogiel, M., ed., The Geometry Problem Solver (Research and Education Association, New York, 1987).
15 Experimental Designs: Part 4 – Varying Parameters to Expand the Design
We have discussed experimental designs in previous papers [1–4], and in Chapters 8–10. In those previous chapters, the designs we discussed were, with the exception of one particularly interesting design (representing a special case of a more general type of design that we will discuss later), rather simple and plain, in the sense that the designs included only small numbers of levels of the various factors of interest, and were basically considerations of “all possible combinations” of those factors – the types of experiments that scientists have been designing “forever” without any thought or consideration that they were “statistical experimental designs”. Obviously, though, since they represent special cases of wider classes of designs, they must also come under that umbrella. So what is special about the experimental designs that we call “statistical” or “chemometric” designs? Actually, very little, until we take a look at what happens when we need to scale these designs up to larger sample numbers or more complex designs. Before we do that, let us consider the various types of experiments involved, and the nature of the factors that are used in those experiments. Someone doing an experiment is generally trying to learn about the effect of some phenomenon on some quantity that can be measured. While there are cases that do not fit the description we are about to present, one very common type of experiment involves changing (or allowing the change of) some parameter, and then measuring the effect of that change. If there is only one such parameter, the situation is pretty straightforward, but things start getting interesting when two or more possible parameters are involved. Intuitively, the first instinct is to measure the results that are obtained for all possible combinations of the available values of the parameters. In Chapter 8, we looked at some experiments that involved two parameters (factors), each at two levels.
In Chapter 10, we briefly looked at a three-factor, two-level design, with attention to how it could be represented geometrically. The use of the term “three factor, two level” to describe the design means that each factor was present at two levels, that is, the corresponding parameters were each permitted to assume two values. There are several ways we can expand a design such as this: we can increase the number of factors, the number of levels of each factor, or we can do both, of course. There are other differences that can be superimposed over the basic idea of the simple, all-possible combinations of factors, such as whether we can control the levels of the factors (if we can, then we can do things that are not possible if we cannot control the levels of the factors), and whether the “levels” correspond to physical characteristics that can be evaluated and the values described have real physical meaning (temperature, for example, has real physical meaning, while catalyst type does not, even though different catalysts in an experiment may all have different degrees of effectiveness, and reproducibly so).
Another consideration is whether all the factors can be changed independently through their range of possible values, or whether there are limits on the possible values. The most obvious limiting situation is the case of mixtures, where all the components of a mixture must sum to 100%. Other limitations might be imposed by the physical (or chemical) behavior of the materials involved: solubility as a function of temperature, for example, or as a function of other materials present (maximum solubility of salt in water–alcohol mixtures, for example, will vary with the ratio of the two solvents). Other limits might be set by practical considerations such as safety; except for specialized work by scientists experienced in the field, few experimenters would want to work, for example, with materials at concentrations above their explosive limits.
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 9(8), 26–27 (1994).
2. Mark, H. and Workman, J., Spectroscopy 9(9), 30–32 (1994).
3. Mark, H. and Workman, J., Spectroscopy 6(1), 13–16 (1991).
4. Mark, H. and Workman, J., Spectroscopy 10(1), 17–20 (1995).
16 Experimental Designs: Part 5 – One-at-a-time Designs
In Chapter 15, which was based on reference [1], we began our discussions of factorial designs. If we expand the basic n-factor two-level experiment by increasing the number of factors, maintaining the restriction of allowing each to assume only two values, then the number of experiments required is $2^n$, where n is the number of factors. Even for experiments that are easy to perform, this number quickly gets out of hand; if eight different factors are of interest, the number of experiments needed to determine the effect of all possible combinations is 256, and this number increases exponentially. The other obvious way we might want to expand the experiment is to increase the number of levels (values) that some or all of the factors take. In this case, the number of experiments required increases even faster than $2^n$. So, for example, if each factor is at three levels, then the number of experiments needed is $3^n$ (for eight factors, corresponding to our previous calculation, this comes to 6,561 experiments!). In the general case, the number of experiments needed is $\prod_i n_i$, where $n_i$ is the number of levels of the ith factor. It should be clear at this point that the problem with this scenario is the sheer number of experiments needed, which in the real world translates into time, resources, and expense. Something must be done. Several “somethings” have been done. The intuitive experimenter, expert in his particular field of science but untrained in “statistical” designs, simplifies the whole process by throwing out all the combinations, and uses what are known simply as “one-at-a-time” designs [2]. Five variations of this basic design are described, but basically these are only useful when the random noise or error is small (compared to the expected magnitude of the effects), and involve the experimenter changing one variable (factor) at a time to see which one(s) cause the greatest effect.
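The growth in the number of runs for a full set of combinations is easy to tabulate. The following sketch uses only the Python standard library (`math.prod` requires Python 3.8 or later); the function name `full_factorial_runs` and the three-factor example with 2, 3 and 4 levels are illustrative assumptions, not taken from the text.

```python
from itertools import product
from math import prod


def full_factorial_runs(levels):
    """Number of runs when every combination of factor levels is tested.
    `levels` holds the number of levels of each factor."""
    return prod(levels)


print(full_factorial_runs([2] * 8))     # eight factors at two levels: 256
print(full_factorial_runs([3] * 8))     # eight factors at three levels: 6561

# A hypothetical mixed design: three factors with 2, 3 and 4 levels.
mixed = [2, 3, 4]
runs = list(product(*(range(n) for n in mixed)))  # enumerate every combination
print(len(runs), full_factorial_runs(mixed))      # both give 24
```

Enumerating the combinations explicitly, as in the last lines, is practical only for small designs, which is exactly the difficulty the paragraph above describes.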
Sometimes those are then examined in greater detail, by varying them over larger ranges, and/or at values lying within the original range. This solves the problem of the proliferation of experiments, since the number of experiments needed is now only 1+i ni instead of i ni , a much smaller number. It also provides a first-order indication of the effect of each of the factors. The difficulty now is the possibility of throwing out the baby with the bathwater, so to speak, by losing all information about the actual noise level, and information about any possible synergistic or inhibitory interactions between the factors. Thus, when statisticians got into the act, there saw a need to retain the information that was not included in the one-at-a-time plans, while still keeping the total number of experiments manageable; the birth of “statistical experimental designs”. Several types of “statistical experimental designs” have been developed over the years, with, of course,
92
Chemometrics in Spectroscopy
innumerable variations. However, they can be placed into a fairly small group of main design types:
1) Factorial
2) Fractional factorial
3) Sequential
4) a) Latin square
   b) Graeco-Latin square
   c) Latin and Graeco-Latin cubes
5) Model-building
6) Response surface.
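To make the growth in the number of experiments concrete, the counts quoted earlier in this chapter can be checked with a few lines of Python. This is only an illustrative sketch: the function names are ours, and the one-at-a-time count simply follows the expression given in the text (one plus the sum of the level counts).

```python
from math import prod

def full_factorial_runs(levels):
    """Experiments needed to test every combination: the product of the level counts."""
    return prod(levels)

def one_at_a_time_runs(levels):
    """Count quoted in the text for one-at-a-time plans: 1 + the sum of the level counts."""
    return 1 + sum(levels)

two_level = [2] * 8     # eight factors, two levels each
three_level = [3] * 8   # eight factors, three levels each

print(full_factorial_runs(two_level))    # 256
print(full_factorial_runs(three_level))  # 6561
print(one_at_a_time_runs(three_level))   # 25
```

A mixed design is handled the same way; for example, `full_factorial_runs([2, 3, 4])` multiplies the three level counts together.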
By far the most statistical energy has been spent on the design and analysis of factorial designs. Books dealing with such designs (e.g., [3, 4]) spend a good part of their space discussing the variations required to accommodate such considerations as replication, blocking, how to deal with situations where the experiment itself is destructive (so that the same specimen is never available for retesting), whether the experimental conditions can be reproduced at will, and whether the experimental factors (or the desired response) can be assigned meaningful numerical values. Each of these considerations dictates the types of designs that can be considered and how they must be implemented. For our current discussions, however, we have been taking the path of discussing ways to reduce the required number of experiments, while still retaining the advantages of obtaining several types of information about the system under consideration. The simplest such type of design is the sequential design, simplest if for no other reason than that the type of design it replaces is one of the simplest designs itself. We will discuss this type of design in Chapter 17.
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 10(9), 21–22 (1995).
2. Daniel, C., Journal of the American Statistical Association 68(342), 353–360 (1973).
3. Davies, O.L., The Design and Analysis of Industrial Experiments (Longman, New York, 1978).
4. Box, G.E.P., Hunter, W.G. and Hunter, J.S., Statistics for Experimenters (John Wiley & Sons, New York, 1978).
17 Experimental Designs: Part 6 – Sequential Designs
We begin our discussion of resource-conserving (for want of a better generic term) experimental designs with a look at sequential designs. This is the first of the types of experimental designs that have as one of their goals a reduction in the required number of experiments, while still retaining the advantages of obtaining several types of information about the system under consideration. The simplest such type of design is the sequential design, simplest if for no other reason than that the type of design it replaces is one of the simplest designs itself. The design it replaces is the simple test for comparison of means, using the Z-test or the t-test as the test statistic; we have discussed these in our previous column series and book, “Statistics in Spectroscopy” (now in its second edition [1]).

The standard t-test (or Z-test) specifies a predefined number of measurements to be made, either for a single condition or for a pair of conditions (i.e., sample versus “control”). The difference between the two states is compared to the experimental error evidenced in the data, and a decision is made based on whether the difference between the states is “large enough” compared to the noise (or error).

For a sequential test, the number of experiments is not predefined. Rather, experiments are performed sequentially (surprise!), and the series is terminated as soon as enough data is available that a decision can be made as to whether the difference is “large enough”. True, it is theoretically possible for such a sequence of experiments to be indefinitely long; in practice, however, it is far more common for the situation to become decidable after fewer experiments than are required for the case of a fixed number of experiments.

So how does this “magic” experimental design work? The best available discussion we know of is in reference [2]. The standard concept behind this experimental design is illustrated in Figure 17-1.
As this figure shows, the “universe” is divided into three regions: region A is the region of acceptance of the null hypothesis; region C is the region of acceptance of the alternative hypothesis. The middle region, B, is the region of continuation: as long as values fall into this region, we must continue with the experiments, since there is not enough information to make a decision.

Figure 17-2 shows how this works for two typical cases. First a single experiment is performed, and the results noted. If these results fall into the region of continuation (virtually inevitable after only one experiment), then a second experiment is performed, and so forth. Figure 17-2 shows typical results for two possible sequences of experiments: the one indicated by the crosses enters the region of acceptance of the alternative hypothesis after seven experiments, while the one indicated by the circles enters the region of acceptance of the null hypothesis after nine experiments. Obviously, the actual number of experiments required will depend on both the nature of the experiments and the definition of the two regions of acceptance. The x-axis represents, clearly, the number of experiments that have been carried out. The y-axis represents a function of
Figure 17-1 Standard concept behind sequential experimental design (see text for definition of function f( )). [Regions A, B, and C; y-axis: f(α, β); x-axis: number of experiments.]
Figure 17-2 Typical results for two possible experimental sequences. [Regions A, B, and C; y-axis: f(α, β); x-axis: number of experiments.]
the results of the experiments. Important to note at this point is the fact that the quantity plotted along the y-axis is a function, not of the result of a single experiment, but, in one way or another, of the cumulative results of all the experiments done up to that point.

The key point, then, is how the lines separating the different regions are defined. The total answer will depend, of course, on which statistic is being plotted and on the details of the nature of the hypothesis test being done (e.g., two-tailed versus one-tailed, etc.). For an illustration we consider the sequential test of the hypothesis of the mean of a sample being the same as that of a given population, with the standard deviation known. In the case of fixed sample size, this would be done using a statistical hypothesis test with the Z statistic as the test statistic, and the probability level set simply to $\alpha$. For a sequential test, both the theory and the computations are a bit more complicated.

In the case at hand, the defining limits are constructed as shown in Figure 17-3. The expected value of any given measurement is, of course, $\mu_0$, the population mean. Then the expected value of the sum of n readings, which we label T, equals $n\mu_0$ for each value
Figure 17-3 The relationship between the expected value of the statistic and the lines separating the regions of acceptance and rejection from the region indicating continuation of the experiment. [Regions A, B, and C; intercept $h_0$ marked; y-axis: f(α, β); x-axis: number of experiments.]
of n, and plotting these sums as a function of n gives the central straight line shown in Figure 17-3; this line represents the expected value of the sum, and has a slope equal to $\mu_0$. As can be seen, data that agrees with the null hypothesis will follow this line and eventually move into region A, the region of acceptance of the null hypothesis. The lines separating the regions are defined by their slope and intercepts. If we let $\delta$ represent the minimum difference from $\mu_0$ we wish to detect, then the slope of the lines (which is common to the two lines: they are parallel) equals $\mu_0 + \delta/2$. The y-intercepts, which we designate h, are

$$h_0 = -\ln\left(\frac{1-\alpha}{\beta}\right)\frac{\sigma^2}{\delta} \qquad h_1 = \ln\left(\frac{1-\beta}{\alpha}\right)\frac{\sigma^2}{\delta}$$

We note several interesting points about these expressions. First, the positions of the lines of demarcation depend, as we would expect, on both the minimum expected departure from $\mu_0$ we wish to detect and $\alpha$. They also depend upon a quantity that is a logarithm, and the logarithm of the quantity $\beta$ no less, a quantity that we have always previously dismissed. While a discussion of $\beta$ properly belongs in the realm of elementary statistics, at this point it is worthwhile to go back to some of those discussions, to examine how this impacts our current interests. We will proceed along with this digression in our next chapter.
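The decision rule described in this chapter can be sketched in a few lines of Python. This is a minimal sketch rather than a definitive implementation. It assumes the boundary lines take the usual Wald form for a known standard deviation, with common slope $\mu_0 + \delta/2$ and intercepts $h_0 = -(\sigma^2/\delta)\ln[(1-\alpha)/\beta]$ and $h_1 = (\sigma^2/\delta)\ln[(1-\beta)/\alpha]$, and that $\delta$ is positive (the alternative mean lies above $\mu_0$). The function names and the example readings are hypothetical.

```python
from math import log

def sequential_boundaries(mu0, delta, sigma, alpha, beta):
    """Return (slope, h0, h1) of the two parallel decision lines for the running sum T."""
    slope = mu0 + delta / 2.0
    h0 = -log((1.0 - alpha) / beta) * sigma ** 2 / delta  # lower line: accept the null at or below it
    h1 = log((1.0 - beta) / alpha) * sigma ** 2 / delta   # upper line: accept the alternative at or above it
    return slope, h0, h1

def sequential_test(readings, mu0, delta, sigma, alpha=0.05, beta=0.05):
    """Add readings one at a time and stop as soon as the running sum leaves the continuation region."""
    slope, h0, h1 = sequential_boundaries(mu0, delta, sigma, alpha, beta)
    total = 0.0
    for n, x in enumerate(readings, start=1):
        total += x
        if total >= h1 + slope * n:
            return "accept alternative", n
        if total <= h0 + slope * n:
            return "accept null", n
    return "continue", len(readings)

# Idealized, noise-free readings, used only to show the mechanics.
print(sequential_test([11.0] * 10, mu0=10.0, delta=1.0, sigma=1.0))  # ('accept alternative', 6)
print(sequential_test([10.0] * 10, mu0=10.0, delta=1.0, sigma=1.0))  # ('accept null', 6)
print(sequential_test([10.5] * 10, mu0=10.0, delta=1.0, sigma=1.0))  # ('continue', 10)
```

The third call shows the theoretical possibility mentioned earlier: readings that sit exactly midway between the two hypotheses never leave the continuation region.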
REFERENCES
1. Mark, H. and Workman, J., Statistics in Spectroscopy, 1st ed. (Academic Press, New York, 1991).
2. Davies, O.L., The Design and Analysis of Industrial Experiments (Longman, New York, 1978).
18 Experimental Designs: Part 7 – β, the Power of a Test
In Chapter 17 and reference [1], we started discussing the way a series of experiments could be designed so that the decision to perform another experiment could be based on the outcomes of the experiments already done. We saw there that we needed to be able to tell if we could stop because the result had become statistically significant; and we also saw that we needed a way to tell if we could stop because we had reached the statistically significant conclusion that there is no real difference between the sample and the (hypothetical) reference population. This is necessary, indeed crucial; otherwise we could continue experimenting endlessly, waiting for a statistically significant result when there was no real difference to detect, so that none would be expected.

The first stopping criterion is straightforward: it is simply the standard hypothesis test, based on the probabilities that we have previously discussed of a sample coming from the hypothesized population $P_0$ [2]. The second stopping criterion, however, seems to fly in the face of our previous discussions on the topic, where we said that you could not prove two populations the same. The reason for that earlier statement is easier to see if we reword the problem as a double negative: instead of asking whether we can prove that a sample came from a given population, we ask whether we can prove that it did not come from a different population. Now the nature of the difficulty becomes clearer: we have no information about the nature of the “different” population that we want to test against.

Now that we can see the problem, we can find a point of attack against it. We can hypothesize a population $P_a$ with any given characteristics we want, and then consider the consequences of dealing with that alternate population.
In particular, we consider the probabilities of either accepting or rejecting our original null hypothesis (based on $P_0$) if, in fact, our sample came from the alternate population $P_a$. The probability of coming to the incorrect conclusion that the sample came from $P_0$ when it really came from $P_a$ is called the $\beta$ probability (compare with the $\alpha$ probability, which is the probability of drawing the incorrect conclusion that a sample did not come from $P_0$ when it really did). Its complement, $1 - \beta$, is known in statistical parlance as the “power” of the statistical test.

Thus, in performing a statistical hypothesis test, we would normally consider only the ordinary tests against the alpha error as a means of determining statistical significance. However, as we have seen, that leaves completely open the number of samples needed. The power of a test gives us a criterion which will allow determining the number of samples. To redefine the term: the power of a statistical test is the probability of obtaining a statistically significant result given that in fact the null hypothesis is false. Ordinarily, to show a non-significant result is easy: just use few enough samples. To show that you have obtained a non-significant result when there is a high probability of obtaining a significant result for a false hypothesis is convincing indeed, and also gives us the basis for determining the number of samples needed. On the other hand, we do not want to go overboard and use so many samples that we get statistically significant results for
tiny, unimportant differences. As we will see below, the power of the test does allow us to specify the minimum number of samples required, but this number can quickly get out of hand, and show up tiny differences, if we are not careful about how we specify the requirements.

The problem with defining criteria for such a test is that it depends on the $\beta$ probability, which is difficult to determine (although we could arbitrarily specify a value, such as 95%). It also depends on the smallest difference you need to detect, the number of samples, the variability of the data (which at least can be determined from the data, the same way it is done for determining $\alpha$), and the probability of detecting the given difference at a specified alpha-significance level. Thus what we do is to work backwards, so to speak. Since we want to find the number of samples corresponding to different probabilities for $\alpha$, $\beta$, and D (the difference between the data and $\mu_0$), we first find the difference corresponding to given values of the other quantities. This can be seen more easily in Figure 18-1.

To summarize Figure 18-1 in words, the top curve represents the characteristics of a population $P_0$ with mean $\mu_0$. Also indicated in Figure 18-1 is the upper critical limit, marking the 95% point for a standard hypothesis test $H_0$ that the mean of a given sample is consistent with $\mu_0$. A measured value above the critical value indicates that it would be “too unlikely” to have come from population $P_0$, so we would conclude that such a reading came from a different population. Two such possible different, or alternate, populations are also shown in Figure 18-1, and labeled $P_1$ and $P_2$. Now, if in fact a random sample was taken from one of these alternate populations, there is a given probability, whose value depends on which population it came from, that it would fall above (or below) the upper critical limit indicated for $H_0$.
The shaded areas in Figure 18-1 indicate the probabilities for a random sample falling below the critical value for $H_0$, when one of those alternate populations is in fact the correct population from which the sample was taken. As can be seen, these probabilities are 50% for population $P_1$ and roughly 5% for population $P_2$. These probabilities are
Figure 18-1 Characteristics of population $P_0$ with mean $\mu_0$ and alternate populations $P_1$ and $P_2$ (note that the X-axes have been offset for clarity). [The upper critical limit for $P_0$ is marked.]
the probabilities of (incorrectly) concluding that the data is consistent with $H_0$, for the two cases. This same topic is continued in our next chapter.
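The two shaded-area probabilities quoted for Figure 18-1 can be reproduced numerically. The sketch below is illustrative only. It assumes Normal populations that share the same standard deviation, a single reading, and a one-sided upper critical limit at the 95% point of $P_0$; the function name and the choice of alternate means are ours, picked so that $P_1$ is centered on the critical limit and $P_2$ lies twice as far from $\mu_0$.

```python
from statistics import NormalDist

def beta_probability(mu0, mu_a, sigma, alpha=0.05):
    """Probability that a reading drawn from the alternate population (mean mu_a)
    falls below the upper critical limit of the null population (mean mu0)."""
    critical = NormalDist(mu0, sigma).inv_cdf(1.0 - alpha)  # upper critical limit for P0
    return NormalDist(mu_a, sigma).cdf(critical)

z95 = NormalDist().inv_cdf(0.95)                # about 1.645 standard deviations
beta_p1 = beta_probability(0.0, z95, 1.0)       # P1 centered on the critical limit
beta_p2 = beta_probability(0.0, 2 * z95, 1.0)   # P2 twice as far from mu0
print(round(beta_p1, 3), round(beta_p2, 3))     # 0.5 0.05
```

Moving the alternate mean further from $\mu_0$ shrinks the shaded area, which is the behavior the figure is meant to convey.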
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 11(2), 43 (1996).
2. Mark, H. and Workman, J., Statistics in Spectroscopy, 1st ed. (Academic Press, New York, 1991).
19 Experimental Designs: Part 8 – β, the Power of a Test (Continued)
Continuing from our previous discussion in Chapter 18, based on reference [1]: analogous to making what we have called (and is the standard statistical terminology) the $\alpha$ error when the data is above the critical value but is really from $P_0$, this new error is called the $\beta$ error, and the corresponding probability is called the $\beta$ probability. As a caveat, we must note that the correct value of $\beta$ can be obtained only subject to the usual considerations of all statistical calculations: errors are random and independent, and so on. In addition, since we do not really know the characteristics of the alternate population, we must make additional assumptions. One of these assumptions is that the standard deviation of the alternate population $P_a$ is the same as that of the hypothesized population $P_0$, regardless of the value of its mean.

The existence of the $\beta$ probability provides us with the tool for determining what is called the power of the test, which is just $1 - \beta$, the probability of coming to the correct conclusion when in fact the data did not come from the hypothesized population $P_0$. This is the answer to our earlier question: once we have defined the alternate population $P_a$, we can determine the probability of a sample having come from $P_a$, just as we can determine the probability of that sample having come from $P_0$.

So how does this help us determine n? As we know from our previous discussion of the Central Limit Theorem [2], the standard deviation of the mean of a sample from a population decreases from the population standard deviation as n increases. Thus, we can fix $\mu_0$ and $\mu_a$ and adjust the $\alpha$ and $\beta$ probabilities by adjusting n and the critical value. Normally, it is convenient to adjust the critical value to be equidistant from $\mu_0$ and $\mu_a$, and then adjust n so that that critical value represents the desired probability levels for $\alpha$ and $\beta$.
As an example, we can set alpha- and beta-levels to the same value, which makes for a simple computation of the number of samples needed, at least for the simple case we have been considering: the comparison of means. If we use the 95% value for both (a very stringent test), which corresponds to a Z-value of 1.96 (as we know), then if we let D represent the difference in means between the two values (sample data and population mean), and S the precision of the data, we find that

$$D \geq 3.92\, S/\sqrt{n} \tag{19-1}$$

so that

$$n = (3.92\, S/D)^2 \approx 15\, (S/D)^2 \tag{19-2}$$
In words, we would need 15 samples for 95% confidence on both alpha and beta, to distinguish a difference of the means equal to the precision of the measurement, and the number increases as the square of any decrease in difference we want to detect.
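As a quick numerical check of equation 19-2, the following sketch evaluates the expression directly. The function name and default arguments are ours; the Z-value of 1.96 is taken from the text for both the alpha and beta levels.

```python
def samples_needed(S, D, z_alpha=1.96, z_beta=1.96):
    """Number of samples n for which D equals (z_alpha + z_beta) * S / sqrt(n)."""
    return ((z_alpha + z_beta) * S / D) ** 2

n_equal = samples_needed(S=1.0, D=1.0)  # difference equal to the precision
n_half = samples_needed(S=1.0, D=0.5)   # difference half the precision
print(round(n_equal, 2), round(n_half, 2))  # 15.37 61.47
```

The first value is the figure that the text rounds to 15; halving the difference to be detected quadruples the requirement, as the square law predicts.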
To compute the power for a hypothesis test based on standard deviation, we would have to read off the corresponding probability points from a chi-square table; for 95% confidence on both alpha and beta, the square root of the ratio of $\chi^2(0.95, \nu)$ to $\chi^2(0.05, \nu)$ ($\nu$ = the degrees of freedom, close enough to n for now) is the ratio of standard deviations that can be distinguished at that level of power. Similarly to the case of the means, $\nu$ would also be related to the square of that ratio, but $\chi^2$ would still have to be read from tables (or computed numerically). As an example, for 35 samples, the precision of the instrument could not be tested to be better than

$$\sqrt{48.6/21.6} = \sqrt{2.25} = 1.5 \tag{19-3}$$

or 1.5 times the precision of the reference method with that amount of power, and as before, n will increase as the square of any improvement we want to demonstrate. The ratio of $\chi^2(0.95, \nu)$ to $\chi^2(0.05, \nu)$ does decrease as $\nu$ increases, but not nearly as fast as the square increases: it is a losing fight.

Thus, the use of the concept of the Power of a Test allows specification of the number of samples (although it may turn out to be very high), and by virtue of that forms the basis for performing experiments as a sequential series.
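The chi-square ratio in equation 19-3 can also be computed numerically rather than read from tables. The sketch below is an approximation, not the tabulated calculation: because the Python standard library has no chi-square quantile function, it substitutes the Wilson-Hilferty approximation, and it assumes 34 degrees of freedom (n − 1 for 35 samples), which appears to be what the tabulated values 48.6 and 21.6 correspond to. The function names are ours.

```python
from math import sqrt
from statistics import NormalDist

def chi2_quantile_wh(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile (not an exact table value)."""
    z = NormalDist().inv_cdf(p)
    k = 2.0 / (9.0 * dof)
    return dof * (1.0 - k + z * sqrt(k)) ** 3

def distinguishable_sd_ratio(dof, p=0.95):
    """Square root of the ratio of the upper to the lower chi-square points."""
    return sqrt(chi2_quantile_wh(p, dof) / chi2_quantile_wh(1.0 - p, dof))

ratio_34 = distinguishable_sd_ratio(34)
ratio_100 = distinguishable_sd_ratio(100)
print(round(ratio_34, 2), round(ratio_100, 2))
```

With 34 degrees of freedom the ratio comes out close to the 1.5 of equation 19-3, and it shrinks only slowly as the degrees of freedom increase.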
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 11(6), 30–31 (1996).
2. Mark, H. and Workman, J., Spectroscopy 3(1), 44–48 (1988).
20 Experimental Designs: Part 9 – Sequential Designs Concluded
Our previous two chapters, based on references [1, 2], describe how the use of the power concept for a hypothesis test allows us to determine a value for n at which we can state with both $\alpha$- and $\beta$-% certainty that the given data either is or is not consistent with the stated null hypothesis $H_0$. To recap those results briefly, as a lead-in for returning to our main topic [3], we showed that the concept of the power of a statistical hypothesis test allowed us to determine both the $\alpha$ and the $\beta$ probabilities, and that these two known values allowed us to then determine, for every n, what was otherwise a “floating” quantity, D.

At this point it should be starting to become clear what is going on. If a given set of $\alpha$, $\beta$, and D allow us to determine n, then similarly, a corresponding set of $\alpha$, $\beta$, and n allow us to determine D. Thus for a given $\alpha$ and $\beta$, n and D are functions of each other, and it then becomes a simple matter (at least in principle; in practice the math involved is extremely hairy) to determine the functionality.

In fact the actual situation is considerably more complicated to determine mathematically. In our previous discussions, we have made a number of simplifying assumptions which cannot be used if we wish to calculate correct values for our expressions, and for which the actual situation must be incorporated into the math.

The first of these assumptions is the use of the Normal distribution. When we perform an experiment using a sequential design, we are implicitly using the experimentally determined value of s, the sample standard deviation, against which to compare the difference between the data and the hypothesis. As we have discussed previously, the use of the experimental value of s for the standard deviation, rather than the population value of $\sigma$, means that we must use the t-distribution as the basis of our comparisons, rather than the Normal distribution.
This, of course, causes a change in the critical value we must consider, especially at small values of n (which is where we want to be working, after all).

The other key assumption that we more or less implied was that the standard deviation used for the comparison is constant. Of course we know that as n changes, the comparison value changes as the square root of n. This is on top of, and in addition to, the changes caused by the use of the t rather than the Normal (Z) distribution.

So how is this related to the nature of the graph used for the sequential experimental design? We forgo the detailed math here, in deference to trying to impart an intuitive grasp of the topic, and we have already presented the equations involved [3]. The limits of the allowable values around the hypothesized value close in on it as n increases. This behavior is shown in Figure 20-1. If, in fact, we were to plot the mean of the population as a function of n, it would be a horizontal line, just as shown. The mean of the actual data would vary around this horizontal line (assuming the null hypothesis was correct), at smaller and smaller distances, as n increased.
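The way the critical limits close in on the hypothesized mean can be illustrated numerically. This sketch deliberately keeps the simplifying assumption discussed above: it uses the Normal (Z) distribution with a known standard deviation rather than the t-distribution, so it shows only the square-root-of-n narrowing. The function name and the example values are ours.

```python
from math import sqrt
from statistics import NormalDist

def critical_limits(mu0, sigma, n, alpha=0.05):
    """Two-sided critical limits for the mean of n readings, Normal approximation."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return mu0 - half_width, mu0 + half_width

for n in (1, 4, 16, 64):
    lower, upper = critical_limits(mu0=0.0, sigma=1.0, n=n)
    print(n, round(lower, 3), round(upper, 3))
```

Each fourfold increase in n halves the distance between the limits and the hypothesized mean; using the t-distribution instead would widen the limits further at small n.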
Figure 20-1 The limits of the allowable values around the hypothesized value close in on it as n increases. [Upper critical limit, mean ($\mu_0$), and lower critical limit plotted against n.]
If the null hypothesis was wrong, then the data would vary around a line offset from the line representing $\mu_0$, and get closer and closer to that offset line, instead. Eventually, at some value of n, this line would cross the converging lines representing the critical limits around $\mu_0$, indicating the result. This is the basic picture, shown in Figure 20-2. For a sequential experimental plan, the sequence is terminated at the first significant experiment, as shown.

The details differ, however. By convention, instead of plotting the mean, $\mu_0$, as a function of n, the sum of the data, which has a theoretical value of $n\mu_0$, is used. Clearly this line will slope upward with a slope of $\mu_0$, instead of being horizontal, as will the data plot. The rest of the conceptual picture is the same, however. As we saw previously in reference [3], the slope of the line represented by $n\mu_0$ is paralleled by the confidence limits for the sum of the data, as represented by the equations in that
Figure 20-2 If the null hypothesis was wrong, then the data would vary around a line offset from the line representing $\mu_0$ and get closer and closer to that line. [Upper critical limit, mean ($\bar{x}$), mean ($\mu_0$), lower critical limit, and the first significant reading plotted against n.]
Figure 20-3 The approach of the upper line, representing the $\alpha$ probability, corresponds to the approach of the curved lines to the $n\mu_0$ line (representing the null hypothesis). [Lines $n\bar{x}$ and $n\mu_0$, upper and lower critical limits, and the first significant point plotted against n.]
column; thus, at the point where the line representing the successive mean values from the experimental design crosses the confidence limit in Figure 20-2, so does the line representing the successive sums eventually cross the line specified by the equations in reference [3], and illustrated in Figure 20-3 here.

According to the derived equations, as we saw previously, the actual confidence limits representing the $\alpha$ and $\beta$ probabilities are straight lines parallel to each other but not parallel to the line representing $n\mu_0$. The approach of the upper line, representing the $\alpha$ probability, corresponds to the approach of the curved lines, shown in Figure 20-3, to the $n\mu_0$ line (representing the null hypothesis) there. The line representing $\beta$, however, being parallel to the $\alpha$ line, departs from the null hypothesis. This can be interpreted as stating, as we have previously implied, that it is always harder to “prove” the null hypothesis than to disprove it.
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 11(6), 30–31 (1996).
2. Mark, H. and Workman, J., Spectroscopy 11(8), 34 (1996).
3. Mark, H. and Workman, J., Spectroscopy 11(4), 32–33 (1996).
21 Calculating the Solution for Regression Techniques: Part 1 – Multivariate Regression Made Simple
For the next several chapters we will illustrate the straightforward calculations used for multiple linear regression (MLR), principal components regression (PCR), partial least squares regression (PLS), and singular value decomposition (SVD). In all cases we will use the same notation and perform all mathematical operations using MATLAB (Matrix Laboratory) software [1, 2]. We have already discussed and shown many of the manual methods for calculating the matrix algebra in references [3–6].

Let us begin by identifying a simple data matrix denoted by A. A is used to represent a set of absorbances for three samples as a rows × columns matrix. For our example each row represents a different sample spectrum, and each column a different data channel, absorbance or frequency. We arbitrarily designate A for our example as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \\ A_{31} & A_{32} \end{bmatrix} = A_{I \times K} = \begin{bmatrix} 1 & 7 \\ 4 & 10 \\ 6 & 14 \end{bmatrix} \tag{21-1}$$

Thus, the integers 1 and 7 represent the instrument signal for two data channels (frequencies 1 and 2) for sample Spectrum #1, 4 and 10 represent the same data channel signals (e.g., frequencies 1 and 2) for sample Spectrum #2, and so on. If we arbitrarily set our concentration c vector representing a single component to be a single column of numbers as

$$c_{r \times c} = \begin{bmatrix} c_{11} \\ c_{21} \\ c_{31} \end{bmatrix} = c_{I \times 1} = \begin{bmatrix} 4 \\ 8 \\ 11 \end{bmatrix} \tag{21-2}$$

we now have the data necessary to calculate the matrix of regression coefficients b, which is given by

$$b = \begin{bmatrix} b_{11} \\ b_{21} \end{bmatrix} = (A'A)^{-1}A'c = A^{+}c = \hat{p} \tag{21-3}$$

This b (also known as $\hat{p}$, the prediction vector) is often referred to as the regression vector or set of regression coefficients. Note that $(A'A)^{-1}A'$ is referred to as the pseudoinverse of A, designated as $A^{+}$. Note that there is one regression coefficient for each frequency (or data channel). The vector of predicted values is easily obtained as Matrix A (the data matrix) × Vector b (the regression coefficients) = Vector c (the predicted values). This is shown in matrix notation as

$$A \times b = c \tag{21-4}$$
Now using the MATLAB command line software, we can easily demonstrate this solution (for the multivariate problem we have identified) using a series of simple matrix operations as shown in Table 21-1 below:

Table 21-1 Matrix operations in MATLAB to compute equations 21-1–21-4

| Command line | Comments |
|---|---|
| » A = [1 7;4 10;6 14] | Enter the A matrix |
| A = | Display the A matrix |
| 1 7 | |
| 4 10 | |
| 6 14 | |
| » c = [4;8;11] | Enter the concentration vector c |
| c = | Display the concentration vector c |
| 4 | |
| 8 | |
| 11 | |
| » b = inv(A'*A)*A'*c | Calculate the regression vector [Note: The inverse applies only to (A'*A)] |
| b = | Display the regression vector b |
| 0.7722 | |
| 0.4662 | |
| » A*b | Predict the concentrations [Note: A residual (or difference) exists between the predicted and the actual concentrations 4, 8, and 11] |
| ans = | |
| 4.0356 | |
| 7.7509 | |
| 11.1601 | |
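For readers without access to MATLAB, the same calculation can be sketched in plain Python using only the standard library. This is an illustrative translation, not the authors' code: the helper functions (`transpose`, `matmul`, `inverse_2x2`) are hypothetical names written out here, and the explicit 2 × 2 inverse is adequate only for this two-channel example.

```python
def transpose(M):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse_2x2(M):
    """Explicit inverse of a 2 x 2 matrix (enough for two data channels)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 7.0], [4.0, 10.0], [6.0, 14.0]]   # data matrix, equation 21-1
c = [[4.0], [8.0], [11.0]]                   # concentration vector, equation 21-2

At = transpose(A)
b = matmul(matmul(inverse_2x2(matmul(At, A)), At), c)   # equation 21-3
c_hat = matmul(A, b)                                    # equation 21-4

print([round(row[0], 4) for row in b])      # [0.7722, 0.4662]
print([round(row[0], 4) for row in c_hat])  # [4.0356, 7.7509, 11.1601]
```

The printed values agree with the regression vector and predicted concentrations shown in Table 21-1.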
REFERENCES
1. MATLAB software for Windows from The MathWorks, Inc., 24 Prime Park Way, Natick, Mass. 01760-1500. Internet: [email protected]
2. O’Haver, T.C., Chemometrics and Intelligent Laboratory Systems 6, 95 (1989).
3. Workman, J. and Mark, H., Spectroscopy 8(9), 16 (1993).
4. Workman, J. and Mark, H., Spectroscopy 9(1), 16 (1994).
5. Workman, J. and Mark, H., Spectroscopy 9(4), 18 (1994).
6. Mark, H. and Workman, J., Spectroscopy 9(5), 22 (1994).
22 Calculating the Solution for Regression Techniques: Part 2 – Principal Component(s) Regression Made Simple
For the next several chapters in this book we will illustrate the straightforward calculations used for multivariate regression. In each case we continue to perform all mathematical operations using MATLAB software [1, 2]. We have already discussed and shown the manual methods for calculating most of the matrix algebra used here in references [3–6]. You may wish to program these operations yourselves or use other software to routinely make these calculations.

As in Chapter 21, we begin by identifying a simple data matrix denoted by A. A is used to represent a set of absorbances for three samples as a rows × columns matrix. For our example each row represents a different sample spectrum, and each column a different data channel, absorbance or frequency. We arbitrarily designate A for our example as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \\ A_{31} & A_{32} \end{bmatrix} = A_{I \times K} = \begin{bmatrix} 1 & 7 \\ 4 & 10 \\ 6 & 14 \end{bmatrix} \tag{22-1}$$
Thus, 1 and 7 represent the instrument signal for two data channels (frequencies 1 and 2) for sample spectrum #1; 4 and 10 represent the same data channel signals (e.g., frequencies 1 and 2) for sample spectrum #2, and so on.

We now have the data necessary to calculate the singular value decomposition (SVD) for matrix A. The operation performed in SVD is sometimes referred to as eigenanalysis, principal components analysis, or factor analysis. If we perform SVD on the A matrix, the result is three matrices, termed the left singular vectors (LSV) matrix or the U matrix; the singular values matrix (SVM) or the S matrix; and the right singular vectors (RSV) matrix or the V matrix.

We now have enough information to find our Scores matrix and Loadings matrix. First of all, the Loadings matrix is simply the right singular vectors matrix or the V matrix; this matrix is referred to as the P matrix in principal components analysis terminology. The Scores matrix is calculated as the data matrix A × the Loadings matrix V = the Scores matrix T:

$$A \times V = T \tag{22-2}$$
Note: the Scores matrix is referred to as the T matrix in principal components analysis terminology. Let us look at what we have completed so far by showing the SVD calculations in MATLAB as illustrated in Table 22-1.
Table 22-1 Matrix operations in MATLAB to compute the SVD of data matrix A

| Command line | Comments |
|---|---|
| » A = [1 7;4 10;6 14] | Enter the A matrix |
| A = | Display the A matrix |
| 1 7 | |
| 4 10 | |
| 6 14 | |
| » [U,S,V] = svd(A); | Perform SVD on the A matrix |
| » U | Display the U matrix or the left singular vectors (LSV) matrix |
| U = | |
| 0.3468 0.9303 0.1193 | |
| 0.5417 -0.0949 -0.8352 | |
| 0.7656 -0.3543 0.5369 | |
| » S | Display the S matrix or the singular values (SV) matrix |
| S = | |
| 19.8785 0 | |
| 0 1.6865 | |
| 0 0 | |
| » V | Display the V matrix or the right singular vectors (RSV) matrix (Note: this is also known as the P matrix or Loadings matrix) |
| V = | |
| 0.3576 -0.9339 | |
| 0.9339 0.3576 | |
| » T = A*V | Calculate the Scores matrix or the T matrix |
| T = | |
| 6.8948 1.5690 | |
| 10.7691 -0.1600 | |
| 15.2198 -0.5976 | |
If we arbitrarily set our concentration c vector representing a single component to be a single column of numbers as

$$c_{r \times c} = \begin{bmatrix} c_{11} \\ c_{21} \\ c_{31} \end{bmatrix} = c_{I \times 1} = \begin{bmatrix} 4 \\ 8 \\ 11 \end{bmatrix} \tag{22-3}$$

we can now use S, V, and T to calculate the following. A reconstruction of the original data matrix A is computed by using the preselected number of principal components (i.e., columns in our T and V matrices) as

$$A_{\text{estimated}} = T \times V' \tag{22-4}$$

The set of regression coefficients (i.e., the regression vector) is calculated as

$$b = V \times S^{-1} \times U' \times c \tag{22-5}$$
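It may help to see why equation 22-5 gives the same regression vector as the pseudoinverse expression of Chapter 21 when every principal component is retained. The short derivation below is a sketch under stated assumptions: U is taken as the first K columns of the left singular vectors matrix and S as the corresponding K × K diagonal block (the sizes used in the MATLAB command for b), all singular values are nonzero, and the orthogonality relations $U'U = I$ and $V'V = VV' = I$ hold.

```latex
\begin{aligned}
A &= U S V' \\
A'A &= V S U' U S V' = V S^{2} V' \\
(A'A)^{-1} A' &= V S^{-2} V' \, V S U' = V S^{-1} U' \\
b &= (A'A)^{-1} A' c = V S^{-1} U' c
\end{aligned}
```

When fewer components are retained, only the leading columns of V and U and the leading block of S enter the product, and the result is no longer identical to the full least-squares solution.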
Table 22-2 Matrix operations in MATLAB to compute equations 22-4–22-6

| Command line | Comments |
|---|---|
| » Aest = T*V' | Estimate the A data matrix |
| Aest = | Display the estimate for A |
| 1.0000 7.0000 | |
| 4.0000 10.0000 | |
| 6.0000 14.0000 | |
| » b = V(:,1:2)*inv(S(1:2,1:2))*U(:,1:2)'*c; | Calculate the regression vector [Note: The inverse operation refers only to the singular values matrix S. The calculation to determine b can only be performed using two columns in each of the V, S, and U matrices; this number is equivalent to the number of latent variables (or principal components) used.] |
| b = | Display the regression vector |
| 0.7722 | |
| 0.4662 | |
| » cest = (T*V')*b | Predict the concentrations [Note: This computation is equivalent to (Aest × b)] |
| cest = | Display the concentration vector [Note: For this example of PCR a residual (or difference) exists between the predicted and the actual concentrations 4, 8, and 11] |
| 4.0356 | |
| 7.7509 | |
| 11.1601 | |
The predicted or estimated values of c are computed as
$$\mathbf{c}\ (\text{estimated}) = \mathbf{T} \times \mathbf{V}' \times \mathbf{b} \tag{22-6}$$
Now using the MATLAB command line software, we can easily demonstrate this solution (for the multivariate problem we have identified) using a series of matrix operations as shown in Table 22-2.
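For readers without MATLAB, the same PCR numbers can be cross-checked with a short sketch in Python using only the standard library. This is not the code used in the text: it is a minimal illustration that obtains the singular values and loadings of the 3 × 2 matrix A from the eigenvalues and eigenvectors of A'A, and all variable names (`loadings`, `scores`, `weights`, and so on) are our own assumptions. The signs of the columns of V are arbitrary and may differ from those in Table 22-1; the regression vector and predicted concentrations are unaffected.

```python
import math

# Data matrix A (3 samples x 2 data channels) and concentration vector c,
# both taken from the worked example in the text.
A = [[1.0, 7.0], [4.0, 10.0], [6.0, 14.0]]
c = [4.0, 8.0, 11.0]

# G = A'A is a symmetric 2 x 2 matrix. Its eigenvalues are the squared
# singular values of A, and its unit eigenvectors are the columns of V.
g11 = sum(row[0] * row[0] for row in A)
g12 = sum(row[0] * row[1] for row in A)
g22 = sum(row[1] * row[1] for row in A)

half_trace = (g11 + g22) / 2.0
root = math.hypot((g11 - g22) / 2.0, g12)
eigenvalues = [half_trace + root, half_trace - root]  # largest first
singular_values = [math.sqrt(ev) for ev in eigenvalues]

# The eigenvector for eigenvalue ev is proportional to (g12, ev - g11).
# The sign of each column is arbitrary and may differ from MATLAB's.
loadings = []  # loadings[k] is column k of V
for ev in eigenvalues:
    norm = math.hypot(g12, ev - g11)
    loadings.append([g12 / norm, (ev - g11) / norm])

# Scores T = A * V, stored column by column: scores[k][i].
scores = [[sum(a * v for a, v in zip(row, col)) for row in A] for col in loadings]

# Regression vector b = V * inv(S) * U' * c. Because U = T * inv(S),
# this equals V * inv(S)^2 * T' * c, so U is never formed explicitly.
weights = [sum(t * ci for t, ci in zip(scores[k], c)) / singular_values[k] ** 2
           for k in range(2)]
b = [sum(weights[k] * loadings[k][j] for k in range(2)) for j in range(2)]

# Predicted concentrations c_est = A * b (equation 22-6 with both components).
c_est = [sum(a * bj for a, bj in zip(row, b)) for row in A]

print("singular values:", [round(s, 4) for s in singular_values])
print("b:", [round(v, 4) for v in b])
print("c_est:", [round(v, 4) for v in c_est])
```

The printed values can be compared directly with the S matrix of Table 22-1 and with b and cest in Table 22-2.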
REFERENCES
1. MATLAB software from The MathWorks, Inc., 24 Prime Park Way, Natick, Mass. 01760-1500. Internet: [email protected].
2. O’Haver, T.C., Chemometrics and Intelligent Laboratory Systems 6, 95 (1989).
3. Workman, J. and Mark, H., Spectroscopy 8(9), 16 (1993).
4. Workman, J. and Mark, H., Spectroscopy 9(1), 16 (1994).
5. Workman, J. and Mark, H., Spectroscopy 9(4), 18 (1994).
6. Mark, H. and Workman, J., Spectroscopy 9(5), 22 (1994).
23
Calculating the Solution for Regression Techniques:
Part 3 – Partial Least Squares Regression Made Simple
For the past three chapters we have described the most basic calculations for MLR, PCR, and PLS. Our intent is to show basic computations for these regression methods while avoiding unnecessary complexity which could confuse rather than instruct. There are of course a number of difficulties in taking this simplistic approach; namely, the assumptions made for our simple cases do not always hold, and poorly behaved matrices are the rule rather than the exception. We have not yet discussed the concepts of rank, collinearity, scaling, or data conditioning. Issues of graphical representation, details of computational methods, and the assessment of model performance are forthcoming. We ask that you bear with us over the next several chapters as we intend to delve much more deeply into the details and problems associated with regression methods.
For this chapter we will illustrate the straightforward calculations used for PLS regression utilizing singular value decomposition. For PLS a special case of SVD is used. You will notice that the PLS form of SVD includes the use of the concentration vector c as well as the data matrix A. The reader will note that the scores and loadings are determined using the concentration values for PLS SVD whereas only the data matrix A is used to perform SVD for principal components analysis. The SVD and PLS SVD will be the subject of several future chapters so we will only introduce their use here and not their derivation. All mathematical operations are completed using MATLAB software [1, 2]. As previously discussed, the manual methods for calculating the matrix algebra used within these chapters are found in references [3–7]. You may wish to program these operations yourselves or use other software to routinely make the calculations.
As in our last installment we begin by identifying a simple data matrix denoted by A. A is used to represent a set of absorbances for three samples and three data channels, as a rows × columns matrix.
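For our example each row represents a different sample spectrum, and each column a different data channel, absorbance or frequency. We arbitrarily designate A for our example as
$$\mathbf{A}_{r \times c} = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} = \mathbf{A}_{I \times K} = \begin{bmatrix} 1 & 7 & 9 \\ 4 & 10 & 12 \\ 6 & 14 & 16 \end{bmatrix} \tag{23-1}$$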
Thus, 1, 7, and 9 represent the instrument signal for three data channels (frequencies 1, 2, and 3) for sample spectrum #1; 4, 10, and 12 represent the same data channel signals (e.g., frequencies 1, 2, and 3) for sample spectrum #2, and so on.
If we arbitrarily set our concentration c vector representing a single component to be a single column of numbers as
$$\mathbf{c}_{r \times c} = \begin{bmatrix} c_{11} \\ c_{21} \\ c_{31} \end{bmatrix} = \mathbf{c}_{I \times 1} = \begin{bmatrix} 4 \\ 8 \\ 11 \end{bmatrix} \tag{23-2}$$
we now have both the data matrix A and the concentration vector c required to calculate PLS SVD. Both A and c are necessary to calculate the special case of PLS singular value decomposition (PLSSVD). The operation performed in PLSSVD is sometimes referred to as the PLS form of eigenanalysis, or factor analysis. If we perform PLSSVD on the A matrix and the c vector, the result is three matrices, termed the left singular values (LSV) matrix or the U matrix; the singular values matrix (SVM) or the S matrix; and the right singular values matrix (RSV) or the V matrix. We now have enough information to find our PLS Scores matrix and PLS Loadings matrix. First of all, the PLS Loadings matrix is simply the right singular values matrix or the V matrix; this matrix is referred to as the P matrix in principal components analysis and partial least squares terminology. The PLS Scores matrix is calculated as the data matrix A × the PLS Loadings matrix V = the PLS Scores matrix T, that is,
$$\mathbf{A} \times \mathbf{V} = \mathbf{T} \tag{23-3}$$
Note: the PLS Scores matrix is referred to as the T matrix in principal components analysis and partial least squares terminology. Let us look at what we have completed so far by showing the PLS SVD calculations in MATLAB as illustrated in Table 23-1. We can now use S, V, and T to calculate the following. A reconstruction of the original data matrix A is computed by using the preselected number of factors (i.e., columns in our T and V matrices) as
$$\mathbf{A}_{\text{estimated}} = \mathbf{T} \times \mathbf{V}' \tag{23-4}$$
The set of regression coefficients (i.e., the regression vector) is calculated as
$$\mathbf{b}\ (\text{regression vector}) = \mathbf{V} \times \mathbf{S}^{-1} \times \mathbf{U}' \times \mathbf{c} \tag{23-5}$$
The predicted or estimated values of c are computed as
$$\mathbf{c}_{\text{estimated}} = \mathbf{T} \times \mathbf{V}' \times \mathbf{b} \tag{23-6}$$
This expression is equivalent to
$$\mathbf{c}_{\text{estimated}} = \mathbf{A}_{\text{estimated}} \times \mathbf{b} = \mathbf{A} \times \mathbf{b} \tag{23-7}$$
or can be used to predict a single sample spectrum a using the expression
$$c_{\text{estimated}} = \mathbf{a}_{\text{estimated}} \times \mathbf{b} = \mathbf{a} \times \mathbf{b} \tag{23-8}$$
Now using the MATLAB command line software, we can easily demonstrate this solution (for the multivariate problem we have identified) using a series of matrix operations as shown in Table 23-2.

Table 23-1 Matrix operations in MATLAB to compute the PLS SVD calculations of data matrix A (see equations 23-1–23-3)

Command line                            Comments

A = [1 7 9; 4 10 12; 6 14 16]           Enter the A matrix

A =                                     Display the A matrix
     1     7     9
     4    10    12
     6    14    16

c = [4;8;11]                            Enter the c vector

c =                                     Display the c vector
     4
     8
    11

[U,S,V] = SVDPLS(A,c,3);                Perform PLS SVD on the A matrix. This is a
                                        CPAC [7] version of the PLS SVD algorithm.

U                                       Display the U matrix or the left singular
U =                                     values (LSV) matrix
    0.3817   -0.9067   -0.1797
    0.5451    0.0638    0.8359
    0.7465    0.4170   -0.5186

S                                       Display the S matrix or the singular
S =                                     values (SV) matrix
   29.5796   -0.2076    0.0000
    0.0000    1.9904   -0.0367
    0.0000    0.0000    0.2038

V                                       Display the PLS V matrix or the right
V =                                     singular values (RSV) matrix (Note: this is
    0.2446    0.9345    0.2588          also known as the P matrix or PLS Loadings
    0.6283    0.0506   -0.7764          matrix)
    0.7386   -0.3525    0.5747

T = A*V                                 Calculate the PLS Scores matrix or the
T =                                     T matrix
   11.2894   -1.8839   -0.0034
   16.1236    0.0138    0.1680
   22.0801    0.6750   -0.1210


Table 23-2 Matrix operations in MATLAB to compute equations 23-4–23-8

Command line                            Comments

Aest = T*V'                             Estimate the A data matrix

Aest =                                  Display the estimate for A
    1.0000    7.0000    9.0000
    4.0000   10.0000   12.0000
    6.0000   14.0000   16.0000

b = V*inv(S)*U'*c                       Calculate the PLS regression vector [Note:
                                        The inverse operation refers only to the
                                        singular values matrix S. The calculation to
                                        determine b is performed using three columns
                                        in each of the V, S, and U matrices; this
                                        number is equivalent to the number of latent
                                        variables (or PLS factors) used.]

b =                                     Display the regression vector
    1.1667
   -0.6667
    0.8333

cest = T*V'*b                           Predict the concentrations [Note: This
                                        computation is equivalent to (Aest × b).]

cest =                                  Display the concentration vector [Note: For
    4.0000                              this simple example of PLS no residual (or
    8.0000                              difference) exists between the predicted and
   11.0000                              the actual concentrations 4, 8, and 11.]

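The zero residual in Table 23-2 is what one would expect when all three factors are retained for a square, non-singular 3 × 3 data matrix: if the three matrices reproduce A as U × S × V', then V × inv(S) × U' is simply the inverse of A, and b is the exact solution of A × b = c. The following sketch, in Python rather than MATLAB and using only the standard library, checks this by solving A × b = c directly in exact rational arithmetic. It is not the CPAC SVDPLS routine, whose code is not reproduced in the text; the helper name `solve` is our own.

```python
from fractions import Fraction

# Data matrix A and concentration vector c from equations 23-1 and 23-2.
A = [[1, 7, 9], [4, 10, 12], [6, 14, 16]]
c = [4, 8, 11]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination in exact arithmetic."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        # Scale the pivot row so the pivot equals one.
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


b = solve(A, c)
c_est = [sum(a * bj for a, bj in zip(row, b)) for row in A]

print("b:", [str(v) for v in b])
print("b (decimal):", [round(float(v), 4) for v in b])
print("c_est:", [str(v) for v in c_est])
```

The decimal form of b can be compared with the regression vector displayed in Table 23-2. With fewer than three factors the regression vector would no longer solve A × b = c exactly, and a residual would generally appear, as it does in the PCR example of Table 22-2.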
REFERENCES
1. MATLAB software Version 4.2 for Windows from The MathWorks, Inc., 24 Prime Park Way, Natick, Mass. 01760-1500. Internet: [email protected].
2. O’Haver, T.C., Chemometrics and Intelligent Laboratory Systems 6, 95 (1989).
3. Workman, J. and Mark, H., Spectroscopy 8(9), 16 (1993).
4. Workman, J. and Mark, H., Spectroscopy 9(1), 16 (1994).
5. Workman, J. and Mark, H., Spectroscopy 9(4), 18 (1994).
6. Mark, H. and Workman, J., Spectroscopy 9(5), 22 (1994).
7. Center for Process Analytical Chemistry, University of Washington, Seattle, WA, m-script library, 1993 (Contact Mel Koch or Dave Veltkamp for current versions).
24 Looking Behind and Ahead: Interlude
We depart from discussion of our usual topics in this chapter. Over the years since beginning writing on this topic, there has been a spate of telephone calls where the callers, after introducing themselves, said something that could generically be rendered as: “By chance I came across a copy of one of your articles, and am interested in reading more about this subject. Are there any more articles like this, and what are they, and how can I get them?” After discussing this between ourselves, we decided that we have reached a point where it is worthwhile to present our readers with a complete set of the chemometrics writings published to date. Those of you who have been reading our work for a long time will recall that the column series “Chemometrics in Spectroscopy” is a continuation of our previous column series, “Statistics in Spectroscopy”. Statistics in Spectroscopy was published from 1986 to 1992, with some preliminary articles in 1985. The columns from the earlier series, “Statistics in Spectroscopy”, have been collected and published in their entirety as a book (with minor editorial changes appropriate to the change in format from a series of columns to a book) of the same name, now in its second edition.
So much for the past; what about the discussion? The last few chapters have been presenting the “nuts and bolts” of some of the more common chemometric techniques for performing quantitative chemometric/spectroscopic calibration, even getting down to the level of a “cookbook” of actual code (written for the MATLAB Matrix Algebra multivariate analysis software). The following chapters will deal first with completing a discussion on the various chemometric techniques in current use, and then go “under the hood” with them to emphasize the underlying mathematical and theoretical framework that these methods rest upon. One upcoming topic will be a description of the so-called “statistical design of experiments” methodologies, emphasizing those techniques that tend to be obscure but are more useful than their treatment in mainstream chemometric discussions would suggest.
25 A Simple Question: The Meaning of Chemometrics Pondered
In a 1997 paper, Steve Brown and Barry Lavine state, “Chemometrics is not a subfield of Statistics. Although statistical methods are employed in Chemometrics, they are not the primary vehicles for data analysis” [1]. Parenthetically, we recommend this article as a very nice nonmathematical introduction for the average chemist as to what Chemometrics is, and how it can be used. As far as the quote is concerned, we have to both agree and disagree. On the one hand, we have to recognize the de facto truth that many users of Chemometric techniques are not aware of the Statistical backgrounds of the techniques, and indeed, we sometimes suspect that even the developers of those techniques may also not be aware of the statistical considerations, or at least may not give them their proper weight. Having said that, we will issue some disclaimers a little further on, because there are some legitimate and justifiable reasons for the existence of this situation. However, ignoring the existence of this situation means that nobody is paying the attention that would eventually lead to the condition being corrected, which would result in a better theoretical understanding of the techniques themselves, with a concomitant improvement in their reliability and definition of their range of applicability. This leads us to the other hand, which, it should be obvious, is that we feel that Chemometrics should be considered a subfield of Statistics, for the reasons given above. Questions currently plaguing us, such as “How many MLR/PCA/PLS factors should I use in my model?”, “Can I transfer my calibration model?” (or more importantly and fundamentally: “How can I tell if I can transfer my calibration model?”), may never be answered in a completely rigorous and satisfactory fashion, but certainly improvements in the current state of knowledge should be attainable, with attendant improvements in the answers to such questions.
New questions may arise which only fundamental statistical/probabilistic considerations may answer; one that has recently come to our attention is, “What is the best way to create a qualitative (i.e., identification) model, if there may be errors in the classifications of the samples used for training the algorithm?” Part of the problem, of course, is that the statistical questions involved are very difficult, and have not yet been solved completely and rigorously even by statisticians. Another part of the problem is that very few first-class statisticians are interested in, or perhaps even aware of, the existence of our subdiscipline or its problems. Thus of necessity we push on and muddle through in the face of not always having a completely firm, mathematically rigorous foundation on which to base our use of the techniques we deal with (here comes our disclaimer). So we use these techniques anyway because otherwise we would have nothing: if we waited for complete rigor before we did anything, we would likely be waiting a long, long time, maybe indefinitely, for a solution that might never appear, and in the meanwhile be helpless in the face of the real (and real-world) problems that confront us.
But that does not mean that we should not fight the good fight while we are trying to solve current problems, or let that effort distract us. This means two things. The first is to do as we have been doing, and use our imperfect tools and our imperfect understanding of them, to continue to solve problems as best we can. But the second thing we need to do is what we have not been doing, which is to improve our understanding of the tools we use. In this endeavor, more widespread and better understanding and application of the fundamental statistical/probabilistic basis of our chemometric algorithms is crucial. Maybe one of the things we need to accomplish this is to recruit more first-class statisticians into our ranks, so that they can pay proper attention to the fundamentals, and explain them to the rest of us. Also each of us should pay attention and put some effort into learning more about these fundamentals ourselves. Then we could ourselves better understand the phenomena we see occurring in our data and analyses thereof, and then maybe eventually learn how to deal with them properly. In order to appreciate how understanding new statistical concepts can help us, let us look at an example of where we can better apply known statistical concepts, to understand phenomena currently afflicting us. To this end, let us pose the seemingly innocuous question: “When doing quantitative calibration, why is it that we use the formulation of the problem that makes the constituent values the dependent (i.e., the Y ) variable, and make the spectroscopic data the X (or independent) variable, called the Inverse Beer’s Law formulation (sometimes called the P-matrix formulation)?” (For that matter, why is the formulation that we most commonly use called “Inverse Beer’s Law” instead of the direct “Beer’s Law”?) Now, we are sure that everybody reading this chapter thinks they know the answer. 
Now, if you are among those readers, then you are wrong already, because there are multiple answers to this question, all of them correct, and each of them incomplete. Let us dispose of the most common answer first. This answer is the one given in most of the discussions about the relative merits of the two formulations, e.g. [2], and is essentially a practical one: we use the Inverse Beer’s Law formulation because by doing so, we need only determine the concentration(s) of the analyte(s) of interest. In the Beer’s law formulation, you must determine the concentrations of all components in a mixture, whether they are of interest or not. Of course, there is benefit to that also; as Malinowski points out, you can determine the number of components in a mixture and their spectra, as well as their concentrations, by proper application of the techniques of factor analysis in such a case [3].
The second answer is similar, but even more simplistic. Figure 25-1 shows a graphical depiction of a two-wavelength calibration situation: the values on the two wavelength axes determine the point on the calibration plane from which to strike a line to the concentration axis. The situation, however, is symmetric; so why don’t we consider the possibility of using the value along one of the wavelength axes along with the concentration value to determine the value along the other wavelength axis? In theory this could be done, but the reason we do not do it is the same as the answer to the main question above: we do not care; this case is of no interest to us. As chemists, we are interested in determining quantities of chemical interest, and we use the spectroscopic values as a means of attaining this goal; the reverse calculation is of no interest to us as chemists.
[Figure 25-1 Symbolic graphical depiction of a two-wavelength calibration. Axes: WL 1, WL 2, and CONC; surface: calibration plane.]
None of these answers deals with fundamentals. So finally we get to the substantive part of the discussion, the one that connects with our original diatribe concerning the goal and role of Statistics in Chemometric calculations, the one that will give us an answer to our original question that is based on fundamental considerations, and therefore the one that is the purpose of this whole discussion.
To fully appreciate the point we have to go back a bit and look at the historical development of spectroscopic quantitative analysis. Back when we were in school and taking academic courses in Analytical Chemistry, spectroscopy was only one of many techniques presented (and one of the “minor” ones, at that). Now, we cannot really compare our experiences with what is being done currently because we are somewhat out of touch with academia, but back then what we now call the Beer’s Law formulation (i.e., making the constituent concentration the X-variable) was the one presented and taught, and we were required to use it. Of course, as an academic exercise the system was simplified: there was only one analyte in a pure solvent, so in principle it would seem that we could have put either variable on the X-axis. Nowadays, standard practice would impel us to put the analyte concentration on the Y-axis even in this simplified situation (whether it belonged there or not).
What has changed between then and now? Well, in fact a considerable amount has changed, in both the nature of the situation surrounding the analysis and the instruments we use to do the measurements. Back in the days of our academic exercises, spectrometers were based on vacuum-tube technology (remember them? – or are we dating ourselves?), were noisy, drifted terribly, and were full of all manner of error sources. The samples we used to calibrate the instrument, on the other hand, were made synthetically, by weighing the analyte on an analytical balance and dissolving it in the fixed volume of a volumetric flask. Both of these items were considered to be the highest-precision, highest-accuracy measuring devices available.
Therefore, in those days, the accuracy of the spectroscopic measurements was considered to be far inferior to the accuracy of the training samples. In those days, Statistics was more highly regarded than it is now, and the analytical chemists then knew the fundamental requirements of doing calibration work. There are several; we need not go into all of them now, but the one that is pertinent to our current discussion is the one that states that, while the Y-variable may contain error, the X-variable must be known without error. Now, in the real world this is never true, since all quantities are the result of some measurement, which will therefore have error associated with it. In practice, however, it is sometimes possible to reduce the error to a sufficiently small value that it approximates zero well enough for the calibration calculations to work.
What happens if we do not manage to keep the X error “sufficiently small”? Let us examine a situation which is just complicated enough to show the effects; three sets of data are presented in Table 25-1, which we will use, along with some of the statistics associated with calibration calculations based on those data. Graphical representations of the three data sets are displayed in Figures 25-2a through 25-2c, so that the respective models can be compared to the data.

Table 25-1 Three sets of data illustrating the effect of errors in X and in Y on the results obtained by calibration

(A) No error
Sample #      X      Y
1             0      0
2             0      0
3            10     10
4            10     10
Intercept = 0
Slope = 1
Correlation coeff = 1
SEE = 0
PRESS = 0

(B) Error in Y
Sample #      X      Y
1             0     −1
2             0      1
3            10      9
4            10     11
Intercept = 0
Slope = 1
Correlation coeff = 0.98058
SEE = 1.4142
PRESS = 2.000

(C) Error in X
Sample #      X      Y
1            −1      0
2             1      0
3             9     10
4            11     10
Intercept = 0.19231
Slope = 0.96154
Correlation coeff = 0.98058
SEE = 1.38675
PRESS = 1.92018

[Figure 25-2 Graphical representation of three regression situations. (a) No error. (b) Error in y only. (c) Error in x only. See text for discussion. Each panel plots Y against X and shows the correct model; panel (c) also shows the calculated model.]

We present univariate data, since that shows the effects we wish to illustrate, and is the simplest example that will do so. The biggest advantage to a scenario like this is that we know the “right” answer, because we can make it whatever we want it to be. In this case, the right answer is that the intercept is zero and the slope is 1 (unity). Table 25-1A represents this condition with four samples whose data follow that model without error. The data in Table 25-1A are the prototype data upon which we will build data containing error, and investigate the effects of errors in Y and in X. We use four data points, in coincident pairs, so that when we introduce error, we can retain certain important properties that will result in the same model being the correct one for the data. Along with the data, we show the results of doing the calibration calculations on the data. For Table 25-1A, the slope and the intercept are as we described, the error (which we measure as both the Standard Error of Estimate [SEE] and using cross-validation [the PRESS statistic, using the leave-one-out algorithm]) is zero (naturally), and the correlation coefficient is unity – a necessary concomitant of having zero error.
Now in Table 25-1B, we introduce error into the Y variable. We do so by adding +1 to one each of the high and low values, and −1 to each of the other high and low values. This maintains symmetry and keeps the average position of each pair of points the same, which guarantees that the correct model for the data does not change. This is in accordance with theory and is borne out when the calibration calculations are performed: the model is identical, even though the error (SEE) is no longer zero and the correlation coefficient is no longer unity. Go ahead: redo the calculations and check this out for yourself. Now, the purists and the sharper-eyed among us may argue that another requirement of regression theory is that the errors follow a Normal (i.e., Gaussian) distribution and that these errors are not distributed properly. We counter this argument by pointing out that there is not enough data to tell the difference; there is no significance test that can be used to demonstrate that the data either do or do not follow any predetermined distribution. Finally, and of most interest, are the data in Table 25-1C. Here we have taken the same errors as in Table 25-1B and applied them to the X variable rather than the Y variable. By symmetry arguments, we might expect that we should find the same results as in Table 25-1B. In fact, however, the results are different, in several notable ways. In the first place, we arrive at the wrong model. We know that this model is not correct because we know what the right model is, since we predetermined it. This is the first place where what the statisticians have told us about the results is seen. In statistical parlance, the presence of error in the X variable “biases the coefficient toward zero”, and so we find: the slope is decreased (always decreased) from the correct value (of unity, with this data) to 0.96+. So the first problem is that we obtain the wrong model.
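For readers who do want to redo the calculations, the following is a minimal sketch in Python (standard library only) of an ordinary least-squares fit applied to the data of Tables 25-1B and 25-1C. Two points are assumptions on our part, inferred because they reproduce the tabulated figures: the SEE is computed with n − 2 degrees of freedom, and the tabulated “PRESS” is taken to be the root mean square of the leave-one-out prediction errors rather than their raw sum of squares. The function and variable names are our own.

```python
import math


def calibration_stats(x, y):
    """Least-squares line y = intercept + slope * x with summary statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Assumption: SEE uses n - 2 degrees of freedom (two fitted coefficients).
    see = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    # Leave-one-out prediction error for sample i equals e_i / (1 - h_i),
    # where h_i = 1/n + (x_i - mean_x)^2 / sxx is the leverage of sample i.
    loo = [e / (1.0 - (1.0 / n + (xi - mean_x) ** 2 / sxx))
           for e, xi in zip(residuals, x)]

    # Assumption: the tabulated "PRESS" is the root mean square of these errors.
    press = math.sqrt(sum(e ** 2 for e in loo) / n)

    return {"intercept": intercept, "slope": slope, "r": r,
            "see": see, "press": press}


table_b = calibration_stats([0, 0, 10, 10], [-1, 1, 9, 11])  # error in Y
table_c = calibration_stats([-1, 1, 9, 11], [0, 0, 10, 10])  # error in X

for name, stats in (("B", table_b), ("C", table_c)):
    print(name, {key: round(value, 5) for key, value in stats.items()})
```

The size of the slope bias in (C) can be read off from the sums of squares: the errors raise the sum of squared deviations of X from 100 to 104 while leaving the cross-product at 100, so the fitted slope is 100/104.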
The next item we will look at is the correlation coefficient. The correlation coefficient for Table 25-1C is identical to that in Table 25-1B. There is nothing particularly noteworthy about this, except that the correlation coefficient is useless as a means of distinguishing between the two cases: obviously, since we obtain the same result in both situations, we cannot tell from the value of the correlation coefficient which situation we are dealing with. Now we come to the Standard Error of Estimate and the PRESS statistic, which show interesting behavior indeed. Compare the values of these statistics in Tables 25-1B and 25-1C. Note that the value in Table 25-1C is lower than the value in Table 25-1B. Thus, using either of these as a guide, an analyst would prefer the model of Table 25-1C to that of Table 25-1B. But we know a priori that the model in Table 25-1C is the wrong model. Therefore we come to the inescapable conclusion that in the presence of error in the X variable, the use of SEE, or even cross-validation as an indicator, is worse than useless, since it is actively misleading us as to the correct model to use to describe the data.
This is for univariate data; what happens in the case of multivariate (multiwavelength) spectroscopic analysis? The same thing, only worse. To calculate the effects rigorously and quantitatively is an extremely difficult exercise for the multivariate case, because not only are the errors themselves involved, but in addition the correlation structure of the data exacerbates the effects. Qualitatively we can note that, just as in the univariate case, the presence of error in the absorbance data will “bias the coefficient(s) toward zero”, to use the formal statistical description. In the multivariate case, however, each coefficient will be biased by different amounts, reflecting the different amounts of noise (or error, more generally) affecting the data at different wavelengths. As mentioned above, these effects will be exacerbated by intercorrelation between the data at different wavelengths. The difficulty comes when you realize that it is not simply the correlations between pairs of wavelengths that are operative in this regard, but also the intercorrelation effects of the data when the wavelengths are taken 3, 4, ..., n at a time. This is what has made the problem so intractable.
Now, we are sure that there are some readers who will read this and say something along the lines of “well, all you need do is do a PCA/PLS analysis and get rid of all those effects”. Actually, there might be a germ of truth to that – if you can always do all your calibration modeling using only the first two or three PCA or PLS factors. Beyond that you will run into what we might almost call the Law of Conservation of Error (except for the fact that, as we all know, error is much easier to create than destroy!). In special cases, however, such as PCA and PLS, the total error really is constant, so that we quickly get into territory where the noise that you pushed out of the first couple of factors reappears, and affects the higher factors even more than the original noise affected the original data.
So in the long-gone days of our academic lives, the chemical measurements, being based on high-accuracy gravimetric and volumetric techniques, were indeed the proper ones to put on the X-axis. Contrast that with the current state of technology: instruments have improved enormously, and rather than making up training samples by simple gravimetric dilutions, we often obtain our training, or reference, values through complicated analytical methodologies, which are themselves fraught with so much error that even in favorable cases, the error can be 5–10% of the analytical value. In our current practice, therefore, the error in the reference lab values really is greater than the error in the absorbance data.
For this reason it is now appropriate to reverse the positions of the concentration and absorbance values relative to their place in the calculation schema. So it is the changing nature of the world and the types of analyses we do that dictate how we go about organizing the calculations we use to do them. This comes from fundamental considerations of the behavior of the modeling process, which the science of Statistics can tell us about.
REFERENCES
1. Lavine, B.K. and Brown, S., Today’s Chemist at Work 6(9), 29–37 (1997).
2. Brown, C.W., Spectroscopy 1(4), 32–37 (1986).
3. Malinowski, E.R., Factor Analysis in Chemistry, 2nd ed. (John Wiley & Sons, New York, 1991).
26
Calculating the Solution for Regression Techniques:
Part 4 – Singular Value Decomposition
In Chapters 21–23 and in this chapter, we have described the most basic calculations for MLR, PCR, and PLS. To reiterate, our intention is to demonstrate these basic computations for each mathematical method presently, and then to delve into greater detail as the chapters progress; consider these articles linear algebra bytes. For this chapter we will illustrate the basic calculation and mathematical relationships of different matrices for the calculations of Singular Value Decomposition or SVD. You will note from previous chapters that SVD is used for modern computations of principal components regression (PCR) and partial least squares regression (PLSR), although slightly different forms of SVD are used for each set of computations. Recall that for PCR we simply used SVD, and for PLS a special case of SVD that we called PLS SVD was used. You will also recall that the PLS form of SVD includes the use of the concentration vector c as well as the data matrix A. The reader will note that the scores (T) and loadings (V) are determined using the concentration values for PLS SVD whereas only the data matrix A is used to perform SVD for principal components analysis. All mathematical operations used for this chapter are completed using MATLAB software for Windows [1]. As previously discussed, the manual methods for calculating the matrix algebra used within these chapters are found in references [2–5]. You may wish to program these operations yourselves or use other software to routinely make the calculations.
As in previous installments we begin by identifying a simple data matrix denoted by A. A is used to represent a set of absorbances for three samples and three data channels, as a rows × columns matrix. For our example each row represents a different sample spectrum, and each column a different data channel, absorbance or frequency.
We arbitrarily designate A for our example as
$$\mathbf{A}_{r \times c} = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} = \mathbf{A}_{I \times K} = \begin{bmatrix} 1 & 7 & 9 \\ 4 & 10 & 12 \\ 6 & 14 & 16 \end{bmatrix} \tag{26-1}$$
Thus, 1, 7, and 9 represent the instrument signal for three data channels (frequencies 1, 2, and 3) for sample spectrum #1; 4, 10, and 12 represent the same data channel signals (e.g., frequencies 1, 2, and 3) for sample spectrum #2, and so on. Given any data matrix A of arbitrary size (as rows × columns) the matrix A can be written or defined using the computation of Singular Value Decomposition [6–8] as
$$\mathbf{A} = \mathbf{U}\mathbf{S}\mathbf{V}' = \mathbf{U} \times \mathbf{S} \times \mathbf{V}' \tag{26-2}$$
where U is the left singular values matrix, V is the loadings matrix, and S is the diagonal matrix containing information on the variance described by each principal component
128
Chemometrics in Spectroscopy
(as the S matrix columns). It is important to note when reviewing the use of SVD in the literature that many references define the scores matrix (T) as U × S. Keep in mind that the scores can be calculated as U×S=A×V=T
(26-3)
and it holds that the original data matrix A can be reconstructed as U × S × V = T × V = A × V × V = A × I = A
(26-4)
We can demonstrate the interrelationships between the different matrices resulting from the SVD calculations by the use of MATLAB as shown in Table 26-1. By studying the relationships between the various matrices resulting from the computation of SVD, one can observe that there are several ways to compute the same final results, making it somewhat difficult to follow the literature. However, knowing these inner mathematical relationships can help clarify our understanding of the different nomenclature. We will compare and contrast some of the literature and the use of different terms in later installments; right now just tuck this information away for future reference.

Table 26-1 Simple SVD performed on matrix A using MATLAB; other matrix relationships are also shown (see equations 26-1 through 26-4)

Command line:
  A = [1 7 9;4 10 12;6 14 16]
Comments: Enter the A matrix

Command line:
  A =
       1     7     9
       4    10    12
       6    14    16
Comments: Display the A matrix

Command line:
  [U,S,V] = svd(A)
Comments: Calculate the SVD of A

Command line:
  U =
      0.3821    0.9061   -0.1814
      0.5451   -0.0624    0.8361
      0.7463   -0.4183   -0.5178
Comments: Display the U matrix, also known as the left singular vectors matrix, and rarely referred to as the scores matrix. The scores matrix is most often denoted as U × S or A × V, which as it turns out are exactly the same.

Command line:
  S =
     29.5803         0         0
           0    1.9907         0
           0         0    0.2038
Comments: Display the S matrix or the singular values matrix. This diagonal matrix contains the variance described by each principal component. Note: the squares of the singular values are termed the eigenvalues.

Command line:
  V =
      0.2380   -0.9312    0.2762
      0.6279   -0.0694   -0.7752
      0.7410    0.3579    0.5681
Comments: Display the V matrix or the right singular vectors matrix; this is also known as the loadings matrix. Note: this matrix is the eigenvectors corresponding to the positive eigenvalues.

Command line:
  U*S*V'
  ans =
      1.0000    7.0000    9.0000
      4.0000   10.0000   12.0000
      6.0000   14.0000   16.0000
Comments: U*S*V' is equivalent to the original data matrix A derived using the SVD computation

Command line:
  T = A*V
  T =
     11.3024    1.8038   -0.0370
     16.1231   -0.1243    0.1704
     22.0748   -0.8328   -0.1055
Comments: The scores matrix (often designated as T) can be calculated as A × V

Command line:
  U*S
  ans =
     11.3024    1.8038   -0.0370
     16.1231   -0.1243    0.1704
     22.0748   -0.8328   -0.1055
Comments: As mentioned in the text of the article, the scores matrix T can also be calculated as U × S.

Command line:
  T*V'
  ans =
      1.0000    7.0000    9.0000
      4.0000   10.0000   12.0000
      6.0000   14.0000   16.0000
Comments: As we have stated, the original data matrix A can be estimated as the scores matrix (T) × the transpose of the loadings matrix (V') as shown.

Command line:
  A*V*V'
  ans =
      1.0000    7.0000    9.0000
      4.0000   10.0000   12.0000
      6.0000   14.0000   16.0000
Comments: Just another way to estimate the original data matrix A. In this case, V times the transpose of V (itself) is a diagonal matrix with ones along the diagonal, as shown below. Note: this matrix of ones along the diagonal is called an identity matrix or (I).
      1.0000    0.0000    0.0000
      0.0000    1.0000    0.0000
      0.0000    0.0000    1.0000
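The relationships in equations 26-3 and 26-4 can also be checked outside MATLAB. The following is a minimal sketch in plain Python that assumes only the four-decimal values printed in Table 26-1; the helper names (`matmul`, `transpose`, `max_abs_diff`) are ours, and because the inputs are rounded the comparisons are made against a loose tolerance rather than exactly.

```python
# Rounded values as printed in Table 26-1 (four decimal places).
A = [[1, 7, 9], [4, 10, 12], [6, 14, 16]]
U = [[0.3821, 0.9061, -0.1814],
     [0.5451, -0.0624, 0.8361],
     [0.7463, -0.4183, -0.5178]]
S = [[29.5803, 0.0, 0.0],
     [0.0, 1.9907, 0.0],
     [0.0, 0.0, 0.2038]]
V = [[0.2380, -0.9312, 0.2762],
     [0.6279, -0.0694, -0.7752],
     [0.7410, 0.3579, 0.5681]]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Swap rows and columns."""
    return [list(col) for col in zip(*X)]


def max_abs_diff(X, Y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


scores_from_US = matmul(U, S)                        # T = U x S
scores_from_AV = matmul(A, V)                        # T = A x V
A_rebuilt = matmul(scores_from_US, transpose(V))     # T x V'
VVt = matmul(V, transpose(V))                        # should be close to I

print("max |U*S - A*V|    =", max_abs_diff(scores_from_US, scores_from_AV))
print("max |U*S*V' - A|   =", max_abs_diff(A_rebuilt, A))
print("max |V*V' - I|     =", max_abs_diff(VVt, IDENTITY))
```

All three differences should be small compared with the entries of A; they are not exactly zero only because the tabulated U, S, and V are rounded.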
REFERENCES
1. MATLAB software for Windows from The MathWorks, Inc., 24 Prime Park Way, Natick, Mass. 01760-1500. Internet: [email protected].
2. Workman, J. and Mark, H., Spectroscopy 8(9), 16 (1993).
3. Workman, J. and Mark, H., Spectroscopy 9(1), 16 (1994).
4. Workman, J. and Mark, H., Spectroscopy 9(4), 18 (1994).
5. Mark, H. and Workman, J., Spectroscopy 9(5), 22 (1994).
6. Mandel, J., American Statistician 36, 15 (1982).
7. Golub, G.H. and Van Loan, C.F., Matrix Computations, 2nd ed. (The Johns Hopkins University Press, Baltimore, MD, 1989), pp. 427, 431.
8. Searle, S.R., Matrix Algebra Useful for Statistics (John Wiley & Sons, New York, 1982), p. 316.
27 Linearity in Calibration
Those who know us know that we have always been proponents of the approach to calibration that uses a small number of selected wavelengths. The reasons for this are partly historical, since we became involved in Chemometrics through our involvement in near-infrared spectroscopy, back when wavelength-based calibration techniques were essentially the only ones available, and these methods did yeoman’s service for many years. When full-spectrum methods (PCR, PLS) came on the scene and became popular, we adopted them as another set of tools in our chemometric armamentarium, but we always kept in mind our roots and used wavelength-based techniques when necessary and appropriate. We always knew that they could sometimes perform better than the full-spectrum techniques under the proper conditions, despite all the hype of the proponents of the full-spectrum methods. Lately, various other workers have also noticed that eliminating “extra” wavelengths could improve the results, but nobody (including ourselves) could predict when this would happen, or explain or define the conditions that make it possible.

The advantages of the full-spectrum methods are obvious, and are promoted by their proponents at every opportunity: the ability to reduce noise by averaging data over both wavelengths and spectra; noise rejection by rejecting the higher factors, into which the noise is preferentially placed; the advantages inherent in the use of orthogonal variables; and the avoidance of the time-consuming step of performing the wavelength selection process. The main problem was to define the conditions where wavelength selection was superior; we could never quite put our finger on what characteristics of spectra would allow the wavelength-based techniques to perform better than full-spectrum methods. Until recently.
What sparked our realization of (at least one of) the key characteristics was an on-line discussion of the NIR discussion group [1] dealing with a similar question, whereupon the ideas floating around in our heads congealed. At the time, the concept was proposed simply as a thought experiment, but afterward, the realization dawned that it was a relatively simple matter to convert the thought experiment into a computer simulation of the situation, and check it out in reality (or at least as near to reality as a simulation permits). The advantage of this approach is that simulation allows the experimenter to separate the effect under study from all other effects and investigate its behavior in isolation, something which cannot be done in the real world, especially when the subject is something as complicated as the calibration process based on real spectroscopic data. The basic situation is illustrated in Figure 27-1. What we have here is a simulation of an ideal case: a transmission measurement using a perfectly noise-free spectrometer through a clear, non-absorbing solvent, with a single, completely soluble analyte dissolved in it. The X-axis represents the wavelength index, the Y -axis represents the measured absorbance. In our simulation there are six evenly spaced concentrations of analyte, with simulated “concentrations” ranging from 1 to 6 units, and a maximum simulated
Figure 27-1 Six samples worth of spectra with two bands, without (left) and with (right) stray light. X-axis: wavelength index (1 to 301); Y-axis: absorbance (–0.2 to 1.6). (see Color Plate 1)
absorbance for the highest concentration sample of 1.5 absorbance units. Theoretically, this situation should be describable, and modeled, by a single wavelength or a single factor. Therefore in our simulation we use only one wavelength (or factor) to study.

For the purpose of our simulation, the solute is assumed to have two equal bands, both of which perfectly follow Beer’s law. What we want to study is the effect of nonlinearities on the calibration. Any nonlinearity would do, but in the interest of retaining some resemblance to reality, we created the nonlinearity by simulating the effect of stray light in the instrument, such that the spectra are measured with an instrument that exhibits 5% stray light at the higher wavelengths. Now, 5% might be considered an excessive amount of stray light, and certainly most actual instruments can easily exhibit more than an order of magnitude better performance. However, this whole exercise is being done for pedagogical purposes, and for that reason, it is preferable for the effects to be large enough to be visible to the eye; 5% is about right for that purpose.

Thus, the band at the lower wavelengths exhibits perfect linearity, but the one at the higher wavelengths does not. Therefore, even though the underlying spectra follow Beer’s law, the measured spectra not only show nonlinearity, they do so differently at different wavelengths. This is clearly shown in Figure 27-2, where absorbance versus concentration is plotted for the two peaks.

Now, what is interesting about this situation is that ordinary regression theory and the theory of PCA and PLS specify that the model generated must be linear in the coefficients. Nothing is specified about the nature of the data (except that it be noise-free, as our simulated data is); the data may be non-linear to any degree. Ordinarily this is not a problem because any data transform may be used to linearize the data, if that is desirable.
In this case, however, one band is linearly related to the concentrations and one is not; a transformation, blindly applied, that linearized the absorbance of the higher-wavelength band would cause the other band to become non-linear. So now, what is the effect of this all on the calibration results that would be obtained? Clearly, in a wavelength-based approach, a single wavelength (which would be theoretically correct), at the peak of the lower-wavelength band, would give a perfect fit to the absorbance data. On the other hand, a single wavelength at the higher-wavelength band would give errors due to the nonlinearity of the absorbance. The key question then becomes, how would a full-wavelength (factor-based) approach behave in this situation?
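The two single-wavelength cases can be sketched numerically. The text does not give the formula used to simulate stray light, so the sketch below assumes one common model, in which a fraction s of unabsorbed light is added to the transmitted signal, giving an apparent transmittance (T + s)/(1 + s). The function and variable names are ours. Only the band maxima are needed here, so the band shapes do not enter.

```python
import math


def measured_absorbance(true_abs, stray=0.0):
    """Apparent absorbance under the assumed stray-light model (T + s) / (1 + s)."""
    transmittance = 10.0 ** (-true_abs)
    return -math.log10((transmittance + stray) / (1.0 + stray))


def simple_regression(x, y):
    """Least-squares line y = b0 + b1*x; returns (SEE, correlation coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_res / (n - 2)), sxy / math.sqrt(sxx * syy)


concentrations = [1, 2, 3, 4, 5, 6]
# Beer's law at the band maximum: 1.5 absorbance units at the highest concentration.
peak_absorbance = [1.5 * c / 6 for c in concentrations]

linear_band = [measured_absorbance(a) for a in peak_absorbance]
nonlinear_band = [measured_absorbance(a, stray=0.05) for a in peak_absorbance]

# Calibrate concentration against absorbance at each band maximum.
see_linear, r_linear = simple_regression(linear_band, concentrations)
see_nonlinear, r_nonlinear = simple_regression(nonlinear_band, concentrations)

print(f"linear band:     SEE = {see_linear:.4f}, r = {r_linear:.4f}")
print(f"non-linear band: SEE = {see_nonlinear:.4f}, r = {r_nonlinear:.4f}")
```

Under this assumed model the linear band fits essentially perfectly, while the band with 5% stray light gives an SEE and correlation coefficient close to the values reported for the non-linear wavelength in Table 27-1. That agreement suggests, but does not prove, that a similar stray-light model was used for the original simulation.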
Figure 27-2 Absorbance versus concentration, without (upper) and with (lower) stray light. X-axis: concentration (1 to 6); Y-axis: absorbance (0 to 1.6).
In the discussion group, it was conjectured that a single factor would split the difference; the factor would take on some character of both absorbance bands, and would adjust itself to give less error than the non-linear band alone, but still not be as good as using the linear band. Figure 27-3 shows the factor obtained from the PCA of this data. It seems to be essentially Gaussian in the region of the lower-wavelength band, and somewhat flattened in the region of the higher-wavelength band, conforming to the nature of the underlying absorbances in the two spectral regions. Because of the way the data was created, we can rely on the calibration statistics as an indicator of performance. There is no need to use a validation set of data here. Validation sets are required mainly to assess the effects of noise and intercorrelation. Our simulated data contains no noise. Furthermore, since we are using only one wavelength or one factor, intercorrelation effects are not operative, and can be ignored. Therefore the final test lies in the values obtained from the sets of calibration results, which are presented in Table 27-1. Those results seem to bear out our conjecture. The different calibration statistics all show the same effects: the full-wavelength approach does seem to sort of “split the difference” and accommodate some, but not all, of the non-linearities; the algorithm
Figure 27-3 First principal component from concentration spectra. (X-axis ticks run from 1 to 157; Y-axis ticks run from 0 to 0.2.)

Table 27-1 Calibration statistics obtained from the three calibration models discussed in the text

                Linear wavelength   Non-linear wavelength   Principal component
SEE             0                   0.237                   0.0575
Corr. Coeff.    1                   0.9935                  0.9996
F               (not listed)        305                     5294
uses the data from the linear region to improve the model over what could be achieved from the non-linear region alone. On the other hand, it could not do so completely; it could not ignore the effect of the nonlinearity entirely to give the best model that this data was capable of achieving. Only the single-wavelength model using only the linear region of the spectrum was capable of that. So we seem to have identified a key characteristic of chemometric modeling that influences the capabilities of the models that can be achieved: not nonlinearity per se, because simple nonlinearity could be accommodated by a suitable transformation of the data, but differential nonlinearity, which cannot be fixed that way. In those cases where this type of differential, or non-uniform, nonlinearity is an important characteristic of the data, then selecting those wavelengths and only those wavelengths where the data are most nearly linear will provide better models than the full-spectrum methods, which are forced to include the non-linear regions as well, are capable of. Now, the following discussion does not really constitute a proof of this condition (in the mathematical sense), but this line of reasoning is fairly convincing that this must be so. If, in fact, a full-spectrum method is splitting the difference between spectral regions with different types and degrees of nonlinearity, then those regions, at different wavelengths, themselves must have different amounts of nonlinearity, so that some regions must be less nonlinear than others. Furthermore, since the full-spectrum method (e.g., PCR) has a nonlinearity that is, in some sense, between that of the lowest and highest, then the wavelengths of least nonlinearity must be more linear than the full-spectrum method and therefore give a more accurate model than the full-spectrum algorithm. All that is needed in such a case, then, is to find and use those wavelengths. 
Thus, when this condition of differential nonlinearity exists in the data, modeling techniques based on searching through and selecting the “best” wavelengths (essentially we’re saying MLR) are capable of creating more accurate models than full-wavelength methods, since almost by definition this approach will find the wavelength(s) where the effects of nonlinearity are minimal, which the full-spectrum methods (PCA, PLS) cannot do.
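The comparison of the three models can be sketched end to end. The band positions, band width, column mean-centering, and stray-light model below are all our assumptions (the text specifies only two equal bands, six concentrations, a 1.5 maximum absorbance, and 5% stray light at the higher wavelengths), so the numbers will not reproduce Table 27-1 exactly. The first principal component scores are obtained by power iteration on the 6 × 6 Gram matrix, a stand-in for whatever PCA routine was actually used.

```python
import math


def measured_absorbance(true_abs, stray=0.0):
    """Assumed stray-light model: apparent transmittance (T + s) / (1 + s)."""
    transmittance = 10.0 ** (-true_abs)
    return -math.log10((transmittance + stray) / (1.0 + stray))


def gaussian(x, center, width):
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def see_of_fit(x, y):
    """Standard error of estimate for the least-squares line y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    ss_res = sum(((yi - my) - b1 * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_res / (n - 2))


concentrations = [1, 2, 3, 4, 5, 6]
LOW, HIGH, WIDTH, N_POINTS = 75, 225, 20.0, 301   # hypothetical band layout

# Simulated spectra: two equal Beer's-law bands, stray light above index 150.
spectra = []
for c in concentrations:
    row = []
    for w in range(N_POINTS):
        true_abs = 0.25 * c * (gaussian(w, LOW, WIDTH) + gaussian(w, HIGH, WIDTH))
        row.append(measured_absorbance(true_abs, 0.05 if w > 150 else 0.0))
    spectra.append(row)

# Mean-center each wavelength, then build the sample-by-sample Gram matrix.
col_means = [sum(col) / len(col) for col in zip(*spectra)]
centered = [[v - m for v, m in zip(row, col_means)] for row in spectra]
gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered] for ri in centered]

# Power iteration for the dominant eigenvector; scores = eigenvector * singular value.
u = [float(c) for c in concentrations]   # start vector not orthogonal to the trend
for _ in range(200):
    v = [sum(g * x for g, x in zip(row, u)) for row in gram]
    norm = math.sqrt(sum(x * x for x in v))
    u = [x / norm for x in v]
eigenvalue = sum(ui * sum(g * x for g, x in zip(row, u)) for ui, row in zip(u, gram))
scores = [ui * math.sqrt(eigenvalue) for ui in u]

see_linear = see_of_fit([row[LOW] for row in spectra], concentrations)
see_nonlinear = see_of_fit([row[HIGH] for row in spectra], concentrations)
see_pc1 = see_of_fit(scores, concentrations)
print(f"SEE linear = {see_linear:.4f}, non-linear = {see_nonlinear:.4f}, PC1 = {see_pc1:.4f}")
```

In this sketch the one-factor model should land between the two single-wavelength models, which is the “split the difference” behavior described in the text; how close it comes to either extreme depends on the assumed band shapes.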
REFERENCE 1. The moderator of this discussion group was Bruce Campbell. He can be reached for information, or to join the discussion group by sending a message to: [email protected]. New members are welcome.
28 Challenges: Unsolved Problems in Chemometrics
We term the issues we plan to discuss in this chapter as “unsolved” problems, but that may be incorrect. It may be, perhaps, more accurate to call them “Unaddressed Problems in Chemometrics”. Calling them “unsolved” implies that attempts have been made to solve them, but those attempts were unsuccessful, possibly because these problems are too difficult, or possibly because maybe we are not smart enough. Calling them “unaddressed” on the other hand, really gets to the heart of the matter: a number of problems have come to our attention that nobody seems to be paying any heed to. It may very well turn out that some of these problems are too difficult to solve at the current state of the art in Chemometrics, and maybe we are really not smart enough, but at this point we do not know, and we will never know if nobody tries. Our attention was drawn to these problems via various routes. Some arose from our own work on various projects. Some arose from discussions in the on-line discussion group. Some have been floating around in the backs of our minds for what seems like forever, but only recently crystallized into something concrete enough to write down in a coherent manner so that it could be explained to somebody else. Answers – we have none, only questions. We bring up these points to stir up some discussion, and maybe even a little controversy, and certainly with the hope that we can prod some of our compatriots “out there” to tackle some of these. Conspicuous by its absence is the question of calibration transfer, even though we consider it unsolved in the general sense, in that there is no single “recipe” or algorithm that is pretty much guaranteed to work in all (or at least a majority) of cases. Nevertheless, not only are many people working on the problem (so that it is hardly “unaddressed”), but there have been many specific solutions developed over the years, albeit for particular calibration models on particular instruments. 
So we do not need to beat up on this one by ourselves. So what are these problems? 1) The first one we mention is the question of the validity of a test set. We all know and agree (at least, we hope that we all do) that the best way to test a calibration model, whether it is a quantitative or a qualitative model, is to have some samples in reserve, that are not included among the ones on which the calibration calculations are based, and use those samples as “validation samples” (sometimes called “test samples” or “prediction samples” or “known” samples). The question is, how can we define a proper validation set? Alternatively, what criteria can we use to ascertain whether a given set of samples constitutes an adequate set for testing the calibration model at hand? A very limited version of this question, does in fact, sometimes appear, when the question arises of how many samples from a given calibration set to keep in reserve for
the validation process. Answers range from one (at a time, in the PRESS algorithm) to half the set, and there is no objective, scientific criterion given for any of the choices that indicate whether that amount is optimum. Each one is justified by a different heuristic criterion, and there is never any discussion of the failings of any particular approach. For example, while the PRESS algorithm is appealing, it does not even test the calibration model: if anything, for n samples it tests n different models, none of which is the one to be used, and so forth. Another shortcoming of PRESS is that if each sample was read multiple times, then a computer program that simply removes one reading at a time does not remove the effect of that sample from the data. Even so, at best any of these answers treat only one aspect of the larger question, which includes not only how many samples, but which ones? A properly taken random sample is indeed representative of the population from which it comes. So one subquestion here is, how should we properly sample? The answer is “randomly” but how many workers select their validation samples in a verifiably random manner? How can someone then tell if their test set is then valid, and against what criteria? Some of this goes back to the original question of obtaining a proper and valid set of calibration samples in the first place, but that is a different, although related problem. We can turn that question around in the same way: what are the criteria for telling if a calibration sample set is a valid set? Maybe both problems have the same solution, but we do not know because nobody is working on either one. But to pose the question more directly: how can we tell if any set of samples constitute a valid test set? Even if they were chosen in a proper random manner, are there any independent tests for their validity? What characteristics should the criteria for deciding be based on, and what are the criteria to use? 
2) The next problem we bring up for discussion is the definition of “validation”. Now, we are sure there are some who will complain that we are arguing terminology rather than substance. However, we think that agreement on what terms mean has substantive consequences, especially in modern times when standards-setting organizations (e.g., ASTM) and government agencies are taking an interest in what we do. As we will see below, there is the question of the time required to validate. On the one hand, we recognize that verifying the accuracy of a given model at the time that model is created may or may not be a sufficient test of its long-term behavior, and we may need to include long-term testing procedures. On the other hand, if government agencies create regulations for how models are to be validated, which presumably they are likely to do on the basis of what we ourselves decide is required, do we want to be constrained to not being able to declare that we have created a model until months or years have passed? Such questions involve much more than terminology, especially if the government decides that “validation” is, in fact, whatever we claim it is. As we hinted above, the most common use of the term “validation” involves simply retaining some samples separately from the main set of calibration samples and using those as a more-or-less independent test of the accuracy of the calibration model obtained. However, this definition is not universally agreed to. When the subject came up in the on-line discussion group, the following comment was made by Richard Kramer of the discussion group [1]: The issue Howard raises is an important one. However, I disagree with his characterization of validation and with the resulting conclusion. It all depends upon
what one means by the concept of validation. If validation means the ongoing validation of a plurality of alternative models (my preferred meaning), it DOES become the means of selecting one model over others. And importantly, it permits selection of models which exhibit the best performance with respect to time-related properties such as robustness. It is not uncommon to observe that the model which initially appears to be optimum is the one whose performance degrades most rapidly as time passes. Validation over time also provides a means of gaining insight into which portions of the data might contain more confusion than information and would be best discarded. In particular, it can be interesting to look at the data residuals over time. It is not uncommon to find that the residuals in some parts of the data space increase more rapidly, over time, than the residuals in other parts of the data space. Generally excluding (or de-weighting) the former from the model can improve the model’s performance, short term and long term. Certainly Richard raises valid points, and you can hardly fault his prescription for monitoring and improving the results. However, is that considered, or should that be considered a requirement for validation, or even a necessary part of the validation process? The response comment to Richard at the time was as follows: I think Rich & I agree more than we disagree. If you use his definition of validation then what he says follows. However, that definition is not the one in common use – the MUCH more common definition is simply the one that tells you to separate your calibration samples & keep some out of the calibration calculations, then use those to validate. Once you’ve gone to the trouble to collect data over time then your options expand greatly. Not only can you use that data for ongoing validation, you can also include those new readings in the calibration calculations. 
There are at least two ways to do this: 1) As Richard implies, one way is to gradually replace the older data with the new as it becomes available. This has been standard practice for a long time, for example in the agricultural industry, where old samples will never be seen again. A grain elevator, e.g., will never again have to measure another sample from the 1989 crop year. 2) The other obvious extension, which is more useful for the case where you may still have to measure samples with the same characteristics as the old ones, is to simply keep adding to and expanding the calibration set as new samples become available. The new samples then not only allow you to test for robustness, but inclusion of such samples will actually make the calibration more robust. I think we all know this intuitively, but I have also been able to prove this mathematically. So validation may not only involve the time frame required to perform it, it may also involve questions of the models (or at least the number of models) being tested. So there we have it: what exactly is “validation”? 3) The next unsolved problem we bring up is the question of error in the classification of training samples when calibrating an instrument to do identification. We mentioned
this briefly in a recent column, but it is worth some more discussion. The problem appears to arise primarily in medical applications, so as a non-proprietary example, let us imagine we are interested in identifying the degree of burn of a burn victim: that is whether the subject has a 1st, 2nd or 3rd degree burn. The distinctions are medically important, and furthermore there are qualitative differences between them despite the fact that they arise out of the quantitative difference in the amount of heat involved. In these respects this typifies other medical situations. We could take spectra of the burned areas from subjects who have been burned, but there is a certain amount of subjectivity in assigning the degree of burn in a given case, and occasionally two physicians will disagree on the designation of the degree of burn in some cases. Clearly, if they disagree, they both cannot be correct, so if we use one or the other’s diagnosis, the training classification will also occasionally be in error. While there is certainly a progression in the intensity and severity of the burn as we go from 1st to 3rd degree burns, we cannot simply use a quantitative scale, for a number of reasons: a quantitative scale of that sort is not agreed to by all physicians, it would be, at best, highly nonlinear, and most importantly, there are real qualitative differences between tissue subjected to the different extents of damage, besides the potential quantitative ones. Because of this, a straightforward quantitative approach would not suffice, even if one could be developed. We need methods to deal with the existence of errors in the training classifications when training instruments to do automated identification. 4) The final problem we bring up is based on the question of modeling based on individual wavelengths versus full-spectrum methods and the modern variations on those themes. Basically the question can be put: “How far should we go in eliminating wavelengths?”. 
As we discussed in a recent column, as well as in times past, our backgrounds are from the days of pre-PCA/PLS/PCR/NN calibration modeling, and we there learned the value of wavelength-based models (principally MLR, or P-matrix as it’s sometimes called), which we only recently crystallized into something concrete enough to write down in a coherent manner so that it could be explained to somebody else. (Does that sound familiar?) The full-spectrum methods (PLS, PCR, K-matrix, etc.) have their advantages and, as we recently discussed, so do the individual-wavelength methods. The users of the full-spectrum approaches have in recent years taken an empirical, ad hoc approach to the question of wavelength elimination, finding that there was benefit to it, even if there were no explanations of the reasons for that benefit. Our initial reaction was something on the order of: why not go the whole way and eliminate all the wavelengths except those few that are needed to do the analysis (i.e., go to the limit of wavelength elimination, which essentially brings it back to MLR)? However, now that we know what the benefit of MLR-type modeling is, it is clear that eliminating all those wavelengths is counterproductive, because it throws the baby out with the bathwater, so to speak. Ideally, we should like to devise criteria for determining how many wavelengths, and which wavelengths, to keep and which to eliminate, to obtain the optimum balance between the noise-reduction capabilities of the full-spectrum methods and the linearity-maximization capabilities of the individual-wavelength approaches.
Well, there we have it: our list of current unsolved/unaddressed problems. Hop to it, readers!!!
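One of the PRESS shortcomings raised under problem 1, that deleting one reading at a time does not remove the influence of a sample that was read several times, can be made concrete with a small sketch. It assumes a one-variable calibration and entirely hypothetical data (four samples, two readings each); the function names and the grouping argument are ours, not part of any standard PRESS routine.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


def press(x, y, groups):
    """Sum of squared prediction errors, leaving out one whole group at a time."""
    total = 0.0
    for g in sorted(set(groups)):
        train = [i for i, gi in enumerate(groups) if gi != g]
        held_out = [i for i, gi in enumerate(groups) if gi == g]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        total += sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in held_out)
    return total


# Hypothetical data: four samples, each read twice.
absorbance = [0.10, 0.11, 0.21, 0.20, 0.33, 0.32, 0.39, 0.40]
reference = [1, 1, 2, 2, 3, 3, 4, 4]
sample_id = [0, 0, 1, 1, 2, 2, 3, 3]
reading_id = list(range(len(absorbance)))   # every reading is its own group

# Leave-one-reading-out: the twin reading of each sample stays in the calibration.
press_by_reading = press(absorbance, reference, reading_id)
# Leave-one-sample-out: both readings of a sample are removed together.
press_by_sample = press(absorbance, reference, sample_id)

print("PRESS, one reading removed at a time:", press_by_reading)
print("PRESS, one sample removed at a time: ", press_by_sample)
```

Grouping by sample fits four models instead of eight, and each prediction is made by a model that has never seen the sample in question. Neither variant answers the larger question posed above, since none of the fitted models is the one that will actually be used.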
REFERENCE 1. Chemometrics discussion group moderated by Bruce Campbell. He can be reached for information, or to join the discussion group by sending a message to: [email protected]. New members are welcome.
29 Linearity in Calibration: Act II Scene I
When we first published our chapter “Linearity in Calibration” as an article in Spectroscopy magazine [1] we did not quite realize what a firestorm we were going to ignite, although, truth be told, we did not expect everybody to agree with us, either. But if so many actually took the trouble to send their criticisms to us, then there must also be a large “silent majority” out there that are upset, perhaps angry, and almost certainly misunderstanding what we said. We prepared responses to these criticisms, but they became so lengthy that we could not print them all in a single published column, and thus the topic is included in several smaller chapters.

At this point in our discussion, let us raise the question of the linearity of spectroscopic data as a general topic. There are a number of causes of nonlinearity that most chemists and spectroscopists are familiar with. Let us define our terms. When speaking of “linearity” the meaning of the term depends on your point of view, and your interests. An engineer is concerned, perhaps, with the linearity of detector response as a function of incident radiant energy. To a chemist or spectroscopist, the interest is in the linearity of an instrument’s readings as a function of the concentration of an analyte in a set of samples. In practice, this is generally interpreted to mean that when measuring a transparent, non-scattering sample, the response of the instrument can be calculated as some constant times the concentration of the analyte (or at least some function of the instrument response can be calculated as a constant times some other function of the concentration). In spectroscopic usage, that is normally interpreted as meaning the condition described theoretically by Beer’s Law, that is the instrument response function is the negative exponential of the concentration:

$$I = k I_0 e^{-bC}$$ (29-1)

where
I = the radiation passing through the sample
k = the multiplying constant
I_0 = the radiation incident on the sample
b = the product of the pathlength and absorptivity
C = the concentration of the analyte.
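The linear relationship that chemists usually have in mind follows from equation 29-1 by taking logarithms. As a short worked step (the symbol A for absorbance is our notation at this point, and we assume the reading is referenced to k I_0):

```latex
\begin{aligned}
\frac{I}{k I_0} &= e^{-bC} \\
A \equiv -\log_{10}\!\left(\frac{I}{k I_0}\right) &= bC \,\log_{10} e \approx 0.4343\, bC
\end{aligned}
```

so it is the absorbance, rather than the transmitted radiation itself, that is expected to be proportional to the concentration C.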
When other types of samples are measured, the resulting data is usually known to be nonlinear (except possibly in a few special cases), so those measurements are of no interest to us here. Thus, in practice, the invocation of “linearity” implies the assumption that Beer’s Law holds, therefore discussions of nonlinearity are essentially about those phenomena that cause departures from Beer’s law.
These include

1) Chemical causes
   a) Hydrogen bonding
   b) Self-polymerization or condensation
   c) Interaction with solvent
   d) Self-interaction
2) Instrumental causes
   a) Nonlinear detector
   b) Nonlinear electronics
   c) Instrument bandwidth broad compared to absorbance band
   d) Stray light
   e) Noncollimated radiation
   f) Excessive signal levels (saturation).

Most chemists and spectroscopists expect that in the absence of these distinct phenomena causing nonlinearity, Beer’s Law provides an exact description of the relationship between the absorbance and the analyte concentration. Unfortunately the world is not so simple, and Beer’s Law never holds exactly, EVEN IN PRINCIPLE. The reason for this arises from thermodynamics.

Optical designers and specialists in heat transfer calculations in the chemical engineering and mechanical engineering sciences are familiar with the mathematical construct known as The Equation of Radiative Transfer, although most chemists and spectroscopists are not. The Equation of Radiative Transfer states that, disregarding absorbance and scattering, in a lossless optical system

$$dE = I(\lambda)\, d\lambda\, d\omega\, da\, dt$$ (29-2)

where
dE = the differential energy transferred in differential time dt
I(λ) = the optical intensity as a function of wavelength (i.e., the “spectrum”)
dλ = the differential wavelength increment
dω = the differential optical solid angle the beam encompasses
da = the differential area occupied by the beam.

For a static (i.e., unvarying with time) system, we can recast equation 29-2 as:

$$dE/dt = I(\lambda)\, d\lambda\, d\omega\, da$$ (29-3)
where dE/dt is the power in the beam. The application of these equations to heat transfer problems is obvious, since by knowing the radiation characteristics of a source and the geometry of the system, these equations allow an engineer, by integrating over the differential terms of equation 29-2 or equation 29-3, to calculate the amount of energy transferred by electromagnetic radiation from one place to another. Furthermore, the first law of thermodynamics assures us that dE/dt will be constant anywhere along the optical beam, since any change would require that the energy in the
beam be either increased or decreased, which would require that energy would be either created or destroyed, respectively.

Less obviously, perhaps, the second law of thermodynamics assures us that the intensity, I, is also constant along the beam, for if this were not the case, then it would be possible to focus all the radiation from a hot body onto a part of itself, increasing the radiation flux onto that portion and raising the temperature of that portion without doing work – a violation of the second law.

The constancy of beam energy and intensity has other consequences, some of which are familiar to most of us. If we solve equation 29-3 for the product (dω da) we get:

$$d\omega\, da = \frac{dE/dt}{I(\lambda)\, d\lambda} \tag{29-4}$$
All the terms on the right-hand side of equation 29-4 are constants, therefore for any given wavelength and source characteristics, the product (dω da) is a constant, and in an optical system one can be traded off for the other. We are all familiar with this characteristic of optical systems, in the magnification and demagnification of images described by geometric optics. Whenever light is brought to a small focus (i.e., da becomes small) the light converges on the focal point through a large range of angles (i.e., dω becomes large) and vice versa. This trade-off of parameters is more obvious to us when seen through the paradigm of geometric optics, but now we see that this is a manifestation of the thermodynamics underlying it all.

We are also familiar with this effect in another context: in the fact that we cannot focus light to an arbitrarily small focal point, but are limited to what we usually call the “diffraction limit” of the radiation in the beam. This effect also comes out of equation 29-4, since there is a physical (or perhaps a geometrical) limit to dω: dω cannot become arbitrarily large, therefore da cannot become arbitrarily small. Again, we are familiar with this effect by coming across it in another context, but we see that it is another manifestation of the underlying thermodynamic reality.

Getting back to our main line of discussion, we can see from equation 29-2 (or equation 29-3) that the differential terms must all have finite values. If any of the terms dλ, dω, or da were zero, then zero energy would pass through the system and we could not make any measurements. One thing this tells us, of interest to us as spectroscopists, is that we can never build an instrument with perfect resolution. The mechanistic fundamentals (quantum broadening, Doppler broadening, etc.) have been extensively discussed by one of our colleagues [2].
This effect also manifests itself in the fact that every technology has an “instrument function” that is convolved with the sample spectrum, and each instrument function is explained by the paradigms of the associated technology, but since “perfect” resolution means that dλ = 0, we see again that this is another result of the same underlying thermodynamics.

More to the point of our discussion regarding nonlinearity, however, is the fact that dω cannot be zero. dω is related to the concept of “collimation”: for a “perfectly collimated” beam, dω = 0. But as we have just seen, such a beam can transfer zero energy; so just as with dλ and da, a perfectly collimated beam has no energy. Beer’s law, on the other hand, is based on the assumption that there is a single pathlength (normally represented by the variable b in the equation A = abc) for all rays through the sample. In a real, physical, measurement system, this assumption is always false, because of the fact that dω cannot be zero. As Figure 29-1 shows, the actual
Figure 29-1 Diagram showing the pathlength in a sample of thickness b for a ray going straight through (from I0 to I1) and for those going at an angle θ, up to θmax (to I2).
rays have pathlengths that range from b (for those rays that travel “straight through”, i.e., normal to the sample surfaces) to b/cos(θmax) (for the rays at the most extreme angles). We noted this effect above as item 2e in our list of sources of nonlinearity, and here we see the reason that there is a fundamental limitation.

Mechanistically, the nonlinearity is caused by the fact that the absorbance for the rays traveling normally is abc, while for the extreme rays it is abc/cos(θmax). Thus the non-normal rays suffer higher absorbance than the normal ones do, and the discrepancy (whose magnitude is abc(1/cos θ − 1)) increases with increasing concentration.

When the medium is completely nonabsorbing, then the difference in pathlength does not affect the measurement. When the sample has absorbance, however, it is clear that ray I2 will have its intensity reduced more than ray I1, due to the longer pathlength. Thus not all rays are reduced by the same amount and this leads to the nonlinearity of the measurement. Mathematically, this can be expressed by noting that the intensity measured when a beam with a finite range of angles passes through a sample is

$$I = I_0 \int_0^{\theta_{\max}} e^{-bC/\cos\theta}\, d\theta \tag{29-5}$$
rather than the simpler form shown in equation 29-1 (which, we remind the reader, only holds true for “perfectly collimated” beams, which have zero energy). In practice, of course, this effect is very small, normally much smaller than any of the other sources of nonlinear behavior, and we are ordinarily safe in ignoring it, and calling Beer’s law behavior “linear” in the absence of any of the other known sources of nonlinear behavior. However, the point here is that this completes the demonstration of our statement above, that Beer’s law never exactly holds IN PRINCIPLE and that as spectroscopists we never ever really work with perfectly linear data.
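The size of this effect can be gauged numerically. The following is a minimal sketch, not a definitive calculation: it assumes rays are spread uniformly over angles from 0 to θmax, normalizes the integral by the angular range so that the transmittance is 1 for a nonabsorbing sample, and uses hypothetical values for θmax and for the product bC. The function name `apparent_absorbance` is introduced here for illustration only.

```python
import numpy as np

def apparent_absorbance(bC, theta_max, n=2000):
    """Apparent absorbance -ln(T) for a beam filling angles 0..theta_max.

    T is the average of exp(-bC / cos(theta)) over a uniform grid of angles
    (midpoint rule). Dividing by the angular range is an assumption made
    here so that T = 1 when bC = 0.
    """
    theta = (np.arange(n) + 0.5) * theta_max / n   # midpoints of n equal slices
    transmittance = np.mean(np.exp(-bC / np.cos(theta)))
    return -np.log(transmittance)

theta_max = np.radians(10.0)          # hypothetical half-angle of the beam
for bC in (0.01, 0.1, 1.0, 3.0):      # hypothetical values of the product b*C
    ratio = apparent_absorbance(bC, theta_max) / bC
    print(f"bC = {bC:5.2f}   apparent absorbance / bC = {ratio:.6f}")
```

For a perfectly collimated beam the printed ratio would be exactly 1 at every concentration. With a finite angular range the ratio is expected to sit slightly above 1 and to drift slowly downward as bC grows, which is the departure from strict proportionality described in the text, and it is very small for a narrow beam.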
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 13(6), 19–21 (1998).
2. Ball, D.W., Spectroscopy 11(1), 29–30 (1996).
30 Linearity in Calibration: Act II Scene II – Reader’s Comments ...
Some time ago we wrote an article entitled “Linearity in Calibration” [1], in which we presented some unexpected results when comparing a calibration model using MLR with the model found using PCR. That column generated an active response, so we are discussing the subject in some detail, spread over several columns. The first part of these discussions has been published [2]; this chapter is the continuation of that one. In this chapter we now present the responses we received to the original published article [1], following which we will comment about them in subsequent chapters. Here, in order of receipt, are the comments:

The first set of comments we received were from Richard Kramer:

[Howard & Jerry],
I’m afraid that this month’s Spectroscopy Column is badly off the mark (pun intended (with apologies)). The errors are two-fold with the most serious error so significant that the other error is moot.
1) If I understand the column correctly, a 1-factor model was used. Well, a single linear factor can never be sufficient to properly model a non-linear system. A minimum of 2 factors are required. The synthetic data did NOT demonstrate the advantage of a single linear wavelength over a multiple wavelength model, it merely illustrated the fact that a single linear factor is not sufficient to model non-linear data. We could stop here, but, for the sake of completeness ...
2) The second problem is that that we never have the luxury of working with noise-free data. Thus, the column did not ask the right question(s). The proper question to ask is “In what ways and under which circumstances do the signal averaging advantages of multiple-wavelength models outperform or underperform with respect to a single (or n wavelength, where n is a small integer) wavelength calibration when noise is present?” The answer will depend upon the levels of noise and non-linearity and the number of wavelengths in each model.
Regards,
Richard

We went back and forth a couple of times, but rather than list each of our conversations individually, we will reserve comments until we have looked at all the comments, and then we will summarize our responses to all four respondents together, since several of these response comments say the same things, to some extent.
Second, we received comments from Patrick Wiegand:

Gents,
I have always looked forward to reading your articles on Chemometrics in Spectroscopy. They are truly a valuable resource – I usually cut them out and save them for future reference. However, I think your article “Linearity in Calibration” in the June 1998 issue of Spectroscopy leads the reader to an erroneous conclusion. This conclusion results largely because of the assumptions you make about the application of PLS and PCR. I know of no experienced practitioner of chemometrics who would blindly use the “full spectrum” when applying PLS or PCR. In the book “Chemometrics” by Beebe, Pell and Seasholtz, the first step they suggest is to “examine the data.” Likewise, Kramer in his new book has two essential conditions: The data must have information content and the information in the data must have some relationship with the property or properties which we are trying to predict. Likewise, in the course I teach at Union Carbide, I begin by saying that “no modeling technique, no matter how complex, can produce good predictions from bad data.”
In your article, you appear to be creating an artificial set of circumstances:
1) You start with a “perfectly noise-free spectrum”
2) You create an excessively high degree of non-linearity which would never be tolerated by an experienced spectroscopist.
3) You assume the spectroscopist will use the entire spectrum blindly when applying PLS or PCR, even though some parts of the spectrum clearly have no information and other parts are clearly nonlinear.
4) You limit the number of factors for PLS/PCR to 1, even though the number of latent variables must be greater, due to the nonlinearity.
In regards to number 1, by using a perfectly noise-free spectrum, you have eliminated the main advantage of PLS/PCR. That is, the whole point of using these techniques is that they have better ability to reject noise than MLR.
To come to an adequate conclusion as to the best performer, you should at least add an amount of random noise an order of magnitude greater than normal, since the amount of nonlinearity you use is an order of magnitude greater than normal. Number 2 – I understand that you wanted to use a high degree of nonlinearity so that the absorbance vs. concentration plot will be nonlinear to the naked eye, but you can’t really expect to use this degree of nonlinearity to make a judgmental comparison between two techniques if it is not realistic that it will ever occur in real life. Number 3 – There are many well-established techniques for choosing which wavelength regions to use when modeling with PLS/PCR. First, I advise people to make sure that the pure component spectrum actually has a band in the location being modeled. If this is not possible, at least only include regions that look like
valid bands – no sense in trying to include low s/n baseline regions. Plots of a linear correlation coefficient vs. wavelength for the property of interest are also useful in choosing the right regions to include in the model. Finally, if the initial model is built using the full-spectrum, an examination of factor plots would reveal areas in which there is no activity. Number 4 – In cases where there is no choice but to deal with nonlinearity in the spectra, then it will be necessary to use more factors than the number of chemical species in the system. Once again, an experienced practitioner will use other ways of choosing the right number of factors, like a PRESS plot, etc. Thus your conclusion – that MLR is more capable of producing accurate models than PLS/PCR – is based on a contrived set of circumstances that would not occur in reality, especially when the chemometrician/spectroscopist is experienced. It would be very interesting also, since the performance of the models presented are so similar, to see how the performance would be affected by noise, drift, etc. which are always present in actuality. I would not be surprised if PLS/PCR outperformed MLR under those circumstances. All of the above would seem to indicate that I am totally against using MLR. This is not the case. In my practice, I always try the simplest approach first. This means first trying MLR. If that does not work, then I use PLS. If that does not work – well, some people may use neural networks, but I have not yet found a need to do so. I think you are right in saying that there has been a lot of hype over PLS (although not as much as there has been over neural nets!) In many cases MLR works great, and I will continue to use it. To paraphrase Einstein, “Always use the simplest approach that works – but no simpler.” The third set of comments we received were from Fred Cahn: I read your article in Spectroscopy (13(6), June 1998) with interest. 
However, I don’t agree with the conclusions and the way your simulation was carried out and/or presented. While I am no longer working in this field, and cannot easily do simulations, I think that a 2 factor PCR or PLS model would fully model the simulated spectra. At any wavelength in your simulation, a second degree power series applies, which is linear in coefficients, and the coefficients of a 2 factor PCR or PLS model will be a linear function of the coefficients of the power series. (This assumes an adequate number of calibration spectra, that is, at least as many spectra as factors and a sufficient number of wavelength, which the full spectrum method assures.) The PCR or PLS regression should find the linear combination of these PCR/PLS coefficients that is linear in concentration.
See my publication: Cahn, F. and S. Compton, “Multivariate Calibration of Infrared Spectra for Quantitative Analysis Using Designed Experiments”, Applied Spectroscopy, 42:865–872 (July, 1988).
Fred supplied a copy of the cited paper, and we read it. Again, the comments about it will be included among the general comments.

And finally, the fourth set of comments we received were from Paul Chabot:

Hello,
I recently read your column in the Spectroscopy issue of June 1998, which was dealing with “Linearity in Calibration”. First, I have to tell you that I really like your monthly column. You do a good job at explaining the basics and more of many topics related to chemometrics, and “demistify” the subjects. As an avid user of PLS, I was concerned when you were comparing MLR to PLS and PCR on your synthetic data set. Even though I agree with you that in some cases, MLR is a much better approach than PLS or PCR, sometimes the use of a full spectrum technique is essential. In this particular case, I do not doubt your results showing that MLR outperforms the full spectrum techniques because the data set was designed to do so. But out of the full spectrum techniques, I would expect PLS to outperform PCR, and the loading of the first principal component to be mostly located around the lower wavelength peak for PLS. Did you notice any difference between PCR and PLS on this data set? I would appreciate it if you could let me know if you tried both approaches and the results you obtained so I don’t have to regenerate the data.
Thank you very much, and keep up the good work,
Paul Chabot

To summarize the comments (including ones presented during subsequent discussions, and therefore not included above):
1) Richard Kramer, Patrick Wiegand, and Fred Cahn felt that we should have tried two factors.
2) Richard Kramer and Patrick Wiegand thought we should have added simulated noise to the data.
3) All four responders indicated that we should have tried PLS.
4) Richard Kramer, Patrick Wiegand, and Paul Chabot indicated that one PLS factor might do as well as one wavelength.
5) Richard Kramer and Patrick Wiegand thought that our conclusion was that MLR is better than PCA.
As stated in the introduction to this chapter, we present our responses in chapters to follow.
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 13(6), 19–21 (1998).
2. Mark, H. and Workman, J., Spectroscopy 13(11), 18–21 (1998).
31 Linearity in Calibration: Act II Scene III
In Chapter 27, we discussed a previously published paper entitled “Linearity in Calibration” [1]. In the chapter and original paper we presented some unexpected results when comparing a calibration model using MLR with the model found using PCR. That chapter, when first published as an article, generated a rather active response, so we are discussing the subject and responding to the comments received in some detail, spread over several chapters. The first two parts of our response were included as Chapters 29 and 30, which refer to the papers published as [2, 3]; this Chapter 31 is the continuation of those.

We ended Chapter 30 with a summary of the comments received regarding a previous “Linearity in Calibration” paper. We therefore pick up where we left off by starting this chapter with that same summary (naturally, anyone who wishes to read the full text of the comments will have to go back and reread Chapter 30 derived from reference [3]):
1) Richard Kramer, Patrick Wiegand, and Fred Cahn felt that we should have tried two factors.
2) Richard Kramer and Patrick Wiegand thought we should have added simulated noise to the data.
3) All four responders indicated that we should have tried PLS.
4) Richard Kramer, Patrick Wiegand, and Paul Chabot indicated that one PLS factor might do as well as one wavelength.
5) Richard Kramer and Patrick Wiegand thought that our conclusion was that MLR is better than PCA.

In addition, each of the responders had some of their own individual comments; we discuss all these below. We now continue with our responses, and discussion of these comments:

It may surprise some to hear this, especially in light of some of the comments we make below, but we agree with the responders more than we disagree. We also believe, for example, in pre-screening the data, at least as strongly as Patrick Wiegand does, and we believe his comments regarding the way all (or at least, let’s hope all) experienced chemometricians approach a problem.
Indeed, fully half the book that one of us authored [4] was spent on just that point: how to “look at the data”. However, our experience in the “real world” (as some like to call it) of instrument manufacturers has given us a somewhat different slant on the reality of what actually happens when users get hold of a new super-whiz-bang package of calculation.

In many years of experience in the NIR applications department at Technicon Instruments, there was about an hour and a half available to teach both theory and practice of calibration to each group of new users; the rest of the training time was spent teaching the students how to set the instrument up, prepare samples, take reproducible readings,
and learn the rest of the mechanics needed to run the instrument, take readings, and collect the data. How much attention do you think could be paid to the finer points? This seems to be typical of what happens in the majority of cases involving novice users, and it is rare that there is anyone “back at the plant” who can pick up the ball and take them any further.

Even experienced practitioners can be misled, however. As was pointed out, real data contains various types and amounts of variations in both the X and Y variables. Furthermore, in the usual case, neither the constituent values nor the optical readings are spaced at nice, even, uniform intervals. Under such circumstances, it is extremely difficult to pick out the various effects that are operative at the different wavelengths, and even when the data analyst does examine the data, it may not always be clear which phenomena are affecting the spectra at each particular wavelength.

Now we will respond to the various comments, and make some more observations of our own. We will re-quote the pertinent parts of the communications from the responders, collecting together those on a similar topic and commenting on them collectively. Note that some of these quotes were from later messages than those quoted in our previous column, because they were generated during subsequent discussions, and so may not have appeared previously.

We hope nobody takes our reply comments personally. Both some of the comments and some of our responses are energetic, because we seem to have touched on a subject that turned out to be somewhat controversial. So we do not take the responders’ comments personally, but we do enter with zest and gusto into what looks like something turning into a rather lively debate, and we sincerely hope that everybody can take our own comments in that same spirit.
The format of this column is as follows: each numbered section starts with the comments from the various responders dealing with a given aspect of the subject, followed by our response to them collectively. So now let us consider the various points raised, starting with the use of noise-free data:

1) “You start with a ‘perfectly noise-free spectrum’ ” (Patrick Wiegand)

“In regards to number 1, by using a perfectly noise-free spectrum, you have eliminated the main advantage of PLS/PCR. That is, the whole point of using these techniques is that they have better ability to reject noise than MLR. To come to an adequate conclusion as to the best performer, you should at least add an amount of random noise an order of magnitude greater than normal, since the amount of nonlinearity you use is an order of magnitude greater than normal.” (Patrick Wiegand)

“The second problem is that that we never have the luxury of working with noise-free data. Thus, the column did not ask the right question(s). The proper question to ask is ‘In what ways and under which circumstances do the signal averaging advantages of multiple-wavelength models outperform or underperform with respect to a single (or n wavelength, where n is a small integer) wavelength calibration when noise is present?’ The answer will depend upon the levels of noise and nonlinearity and the number of wavelengths in each model.” (Richard Kramer)

“It isn’t a case of ‘extreme difficulty’. It is a situation where, in one case you use a factor which happens to be based upon an explicit model (i.e. linearity) which is correct
for the data while stacking the deck against the second case by denying any opportunity to be correct.” (Richard Kramer)

Response: Of course we used noise-free data. Otherwise we could not be sure that the effects we see are due to the characteristics we impose on the data, rather than the random effects of the noise. When anyone does an actual, physical experiment and takes real readings, the noise level or the signal-to-noise ratio is a consideration of paramount importance, and any experimenter normally takes great pains to reduce the noise as much as possible, for just that reason. Why shouldn’t we do the same in a computer experiment?

On the other hand, PCA and PLS are both known to perform better than MLR when the data is noisy because of the inherent averaging that they include. In this we agree fully; indeed, we also mentioned this characteristic in Chapter 27, as well as in the original column. Richard Kramer hit the nail on the head with his question “In what ways ...?” The important question, then, that needs to be asked (and answered) is, at what point does one phenomenon or the other become dominant, so as to control or determine which algorithm will provide a better model? The next important question is, how can we tell which phenomenon is dominant in any particular case?

Rich Kramer also had the insight to go to the next step, and realized that the only way to determine whether the nonlinearity is “small” or “large” is by having something to compare to, and the natural characteristic to compare it to is the noise. On this score we also agree with Richard and Patrick fully, and this is one place where much research is needed (there are others; and we will get to them in due course): How do you compare the systematic behavior of nonlinearity with the random behavior of noise?
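The averaging advantage referred to above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the noise is taken to be independent and Gaussian at every wavelength, a plain unweighted mean stands in for the weighted averaging that PCA and PLS perform, and the noise level, number of wavelengths, and number of trials are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
sigma = 1e-3                     # hypothetical noise per wavelength (absorbance units)
n_wavelengths = 16               # hypothetical number of wavelengths averaged
n_trials = 20000                 # number of simulated measurements

# Each row is one simulated measurement of n_wavelengths noisy readings.
noise = rng.normal(0.0, sigma, size=(n_trials, n_wavelengths))

single_sd = noise[:, 0].std()            # noise when a single wavelength is used
averaged_sd = noise.mean(axis=1).std()   # noise after averaging all wavelengths

print(f"single wavelength : {single_sd:.2e}")
print(f"average of {n_wavelengths:2d}     : {averaged_sd:.2e}")
print(f"ratio             : {single_sd / averaged_sd:.2f}"
      f"  (sqrt(n) = {np.sqrt(n_wavelengths):.2f})")
```

Under these assumptions the random error shrinks by roughly the square root of the number of wavelengths averaged. A systematic error such as nonlinearity is not reduced by this kind of averaging, which is why the two effects have to be weighed against each other.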
The standard application of the science of Statistics provides us with tools to detect systematic effects, but how do we go to the next step and ascertain their relative effects on calibration models? These are among the fundamental behavioral properties of calibrations that are not being investigated, but need to be. There are important theoretical reasons to reduce the spectral noise when doing calibrations. Nevertheless, if the main advantage of PLS is its behavior in the presence of noisy data (as Patrick Wiegand states), that is poor praise indeed. Noise levels of modern instruments are far below those of the past. In some cases, and NIR instruments come to mind here, the noise levels are so low that they are tantamount to having “zero noise” to start with. This improvement in instrumentation is a good thing, and we sincerely doubt that anybody would recommend using a noisy instrument for the sole purpose of justifying a more sophisticated algorithm. In any case, even if all the above statements are 100% true, it does not affect our discussion because they are beside the point. The behavior of calibration algorithms in the face of noisy data is an important topic and perhaps should be studied in depth, but it was not at issue in the “Linearity in Calibration” column. 2) “You create an excessively high degree of nonlinearity which would never be tolerated by an experienced spectroscopist.” (Patrick Wiegand) Response: In the absence of random variation, ANY amount of nonlinearity would give the same results, and if we used less, any differences from the results we presented would be only of degree, not of kind. Any amount of nonlinearity is infinitely greater
than zero. As we explained in the original column, we deliberately chose an unrealistically large amount of nonlinearity for pedagogical purposes; what would be the point of comparing different calibration lines that the naked eye saw as equally straight? The fact that it is “unrealistically” large is immaterial.

3) “You assume the spectroscopist will use the entire spectrum blindly when applying PLS or PCR, even though some parts of the spectrum clearly have no information and other parts are clearly nonlinear.” (Patrick Wiegand)

Response: Above, I described the situation as we see it, regarding the traps that both experienced and novice users of these very sophisticated algorithms can fall into. Keep in mind the pedagogy involved as well as the chemometrics: by suitable choice of values for the “constituent”, the peaks at the nonlinear wavelengths could have been made to appear equally spaced, and the linear wavelengths appear stretched out at the higher values. The “clarity” of the nonlinearity is due to the presentation, not to any fundamental property of the data, and this clarity does not normally exist in real data. How is someone to detect this, especially if not looking for it? Attempts to address this issue have been made in the past (see [5]) with results that in our opinion are mixed, at best. And that simulated data was also noise-free.

With real data, a more scientifically valid approach would be to correct the nonlinearity from physical theory. In the current case, for example, a scientifically valid approach would be to convert the data to transmission mode, subtract the stray light and reconvert to absorbance: the nonlinear wavelengths would have become linear again.
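The correction just described can be sketched in a few lines. This is a minimal illustration, not the procedure used to generate the original data: the stray-light model (a constant fraction of the source reaching the detector without passing through the sample), the stray-light fraction, the concentrations, and the absorptivity are all assumptions introduced here.

```python
import numpy as np

stray = 0.05                              # hypothetical stray-light fraction
conc = np.linspace(0.0, 5.0, 11)          # hypothetical "constituent" values
true_abs = 0.3 * conc                     # absorbance that obeys Beer's law

# Assumed stray-light model: a constant fraction of the source reaches the
# detector without passing through the sample.
true_trans = 10.0 ** (-true_abs)
measured_trans = (true_trans + stray) / (1.0 + stray)
measured_abs = -np.log10(measured_trans)  # nonlinear in concentration

# Correction: absorbance -> transmission, remove stray light, -> absorbance.
trans = 10.0 ** (-measured_abs)
corrected_trans = trans * (1.0 + stray) - stray
corrected_abs = -np.log10(corrected_trans)

print(np.round(measured_abs, 4))          # bends over at high concentration
print(np.round(corrected_abs, 4))         # proportional to concentration again
```

The correction recovers the linear absorbance only because the stray-light fraction is known exactly in this sketch; with real data that value would itself have to be determined.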
There are, of course, several things wrong with this procedure, all of them stemming from the fact that this data was created in a specific way for a specific purpose, not necessarily to be representative of real data: a) You would have to know a priori that only certain wavelengths (and which ones) were subject to the “stray light” or whatever source of nonlinearity was present. b) One of the problems of current chemometric practice is the “numbers game” aspect. No matter how soundly based in physical theory a procedure is, if the numbers it produces are not as good (whatever that might mean in a specific case) as a different, more empirical, procedure, the second procedure will be used, no matter how empirical its basis. The counter-argument to that, of course, is something on the order of “Well, we have to get as good results as we can for the user” and there is a certain amount of legitimacy to this statement. However, we know of no other field of scientific study where a situation of this sort is tolerated. Certainly, every field has areas of unknown effects where not all the fundamental physical theory is available, but in all fields other than chemometrics, there are workers investigating these dark areas, to try to fill in the missing knowledge. In chemometrics, on the other hand, for at least the 22 years we have been involved with the field, all we have seen the workers in the field doing are building bigger and higher and more fanciful mathematical superstructures on foundations that few, if any of them, seem to be aware of. We will have more to say about this below. c) The simple fact that sometimes the nature of the correct physical theory to use is unknown. d) Finally, the real reason we presented these results the way we did was that the whole purpose of the exercise was to study the effect of this type of variation of
the data, so that simply removing it would not only be trivial, it would also be a counterproductive procedure.

4) “If I understand the column correctly, a 1-factor model was used. Well, a single linear factor can never be sufficient to properly model a non-linear system. A minimum of 2 factors are required.” (Richard Kramer)

“PLS should have, in principle, rejected a portion of the non-linear variance resulting in a better, although not completely exact, fit to the data with just 1 factor. The PLS does tend to reject (exclude) those portions of the x-data which do not correlate linearly to the y-block.” (Richard Kramer)

“You limit the number of factors for PLS/PCR to 1, even though the number of latent variables must be greater, due to the nonlinearity.” (Patrick Wiegand)

“In principle, in the absence of noise, the PLS factor should completely reject the non-linear data by rotating the first factor into orthogonality with the dimensions of the x-data space which are ‘spawned’ by the nonlinearity. The PLS algorithm is supposed to find the (first) factor which maximizes the linear relationship between the x-block scores and the y-block scores.
So clearly, in the absence of noise, a good implementation of PLS should completely reject all of the nonlinearity and return a factor which is exactly linearly related to the y-block variances.” (Richard Kramer) “While I am no longer working in this field, and cannot easily do simulations, I think that a 2 factor PCR or PLS model would fully model the simulated spectra.” (Fred Cahn) “My “objection” is that you did not seem to look at the 2nd factor, which I think is needed to accurately model the spectra after the background is added.” (Fred Cahn) “I would expect PLS to outperform PCR, and the loading of the first principal component to be mostly located around the lower wavelength peak for PLS.” (Paul Chabot) Response: Yes, but: The point being that, as our conclusions indicate, this is one case where the use of latent variables is not the best approach. The fact remains that with data such as this, one wavelength can model the constituent concentration exactly, with zero error – precisely because it can avoid the regions of nonlinearity, which the PCA/PLS methods cannot do. It is not possible to model the “constituent” better than that, and even if PLS could model it just as well (a point we are not yet convinced of since it has not yet been tried – it should work for a polynomial nonlinearity but this nonlinearity is logarithmic) with one or even two factors, you still wind up with a more complicated model, something that there is no benefit to. Richard Kramer suggested that we use two wavelengths (with the MLR approach) to see what happens. Well, here’s what happens: if the second wavelength is also on the linear absorbance band, you get a “divide by zero” error upon performing the matrix inversion due to the perfect collinearity between the data at the two wavelengths. 
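The behavior described in this response can be reproduced on a small synthetic data set. The sketch below is a hypothetical stand-in for the original simulation, not a copy of it: the band shapes, concentrations, and stray-light model are assumptions, and the one-factor PCR is implemented here simply as a regression on the first principal-component score of the mean-centered spectra. It also covers the two-wavelength cases raised by Richard Kramer's suggestion.

```python
import numpy as np

conc = np.linspace(1.0, 5.0, 9)                  # hypothetical concentrations
wl = np.linspace(0.0, 1.0, 50)                   # arbitrary wavelength axis
band_lin = np.exp(-((wl - 0.25) / 0.05) ** 2)    # band that obeys Beer's law
band_non = np.exp(-((wl - 0.75) / 0.05) ** 2)    # band distorted by stray light

stray = 0.05                                     # assumed stray-light fraction
abs_lin = 0.2 * np.outer(conc, band_lin)
ideal_non = 0.3 * np.outer(conc, band_non)
abs_non = -np.log10((10.0 ** (-ideal_non) + stray) / (1.0 + stray))
spectra = abs_lin + abs_non                      # rows = samples, cols = wavelengths

def fit(predictors, y):
    """Least-squares fit of y on predictors plus intercept: (coefs, RMSE)."""
    X = np.column_stack([np.ones(len(y)), predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs, np.sqrt(np.mean((X @ coefs - y) ** 2))

i_lin = int(np.argmax(band_lin))                 # peak of the linear band
i_non = int(np.argmax(band_non))                 # peak of the nonlinear band

# One wavelength on the linear band.
_, rmse_single = fit(spectra[:, i_lin], conc)

# One-factor PCR: regress on the first principal-component score.
centered = spectra - spectra.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
_, rmse_pcr = fit(U[:, 0] * S[0], conc)

# Second wavelength also on the linear band: perfectly collinear columns.
X_collinear = np.column_stack([np.ones(len(conc)),
                               spectra[:, i_lin], spectra[:, i_lin + 1]])
rank_collinear = np.linalg.matrix_rank(X_collinear)

# Second wavelength on the nonlinear band: its coefficient should vanish.
coef_two, _ = fit(spectra[:, [i_lin, i_non]], conc)

print(f"single-wavelength RMSE : {rmse_single:.2e}")
print(f"one-factor PCR RMSE    : {rmse_pcr:.2e}")
print(f"rank with collinear wavelengths: {rank_collinear} of 3")
print(f"coefficient of nonlinear wavelength: {coef_two[2]:.2e}")
```

In this sketch the single wavelength fits the concentrations to within rounding error, while the first principal component mixes in the nonlinear band and leaves a residual. A rank below 3 for the collinear pair is the numerical counterpart of the failed matrix inversion.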
If the second wavelength is on the nonlinear band, the regression coefficient calculated for it is exactly zero (at least to 16 digits, where the computer truncation error becomes important), since it plays exactly no role in the modeling. In other words, not only is it
unnecessary to add a second wavelength to the model, it is impossible to do so if you try; when the model is perfectly correct you can’t force a second wavelength into that model even if you want to. Richard Kramer, Patrick Wiegand, and Paul Chabot suggested that a one-factor PLS model should reject the data from the nonlinear wavelength and therefore also provide a perfect fit to the “constituent”. I offered to provide the data as an EXCEL spreadsheet to these responders; Paul accepted the offer, and I e-mailed the data to him. We will see the results at an appropriate stage.
5) “There are many well-established techniques for choosing which wavelength regions to use when modeling with PLS/PCR. First, I advise people to make sure that the pure component spectrum actually has a band in the location being modeled.” (Patrick Wiegand)
Response: That indeed is a good procedure when you can do it (keeping in mind our earlier discussion regarding users’ reactions to the case of a conflict between theoretical correctness and the experimental “numbers game”), and we also make the same recommendation when appropriate. If anything, proper wavelength choice is even more important when using MLR than either PCA or PLS. But what do you do when the “constituent” is a physical property, with no distinct absorbance band? This consideration becomes particularly pernicious when that property is not itself being calibrated for, but is a variation superimposed on the data that needs a factor (or wavelength) to compensate for it, yet has no absorbance band of its own. The prototype example of this is the “repack” effect found when the measurements are made by diffuse reflectance: “repack” does not have an absorbance band. Other situations arise where that approach fails: when the chemistry is unknown or too complicated (octane rating in gasoline, for example).
Here again, even though a fair amount is known about the chemistry behind octane rating, there is no absorbance band for “octane value”. Another case is where the chemistry is known, but the spectroscopy is unknown, because the pure material is not available. Protein, for example, cannot be extracted from wheat (or at least not and still remain protein), so the spectrum of “pure” protein as it exists in wheat is unknown. Even simpler molecules are subject to this effect: we can measure the spectrum of pure water easily enough, for example, but that is not the same spectrum as water has when it is present as an intimate mixture in a natural product – the changes in the hydrogen bonding completely change the nature of the spectrum. And these examples are ones we know about!
6) “Finally, the calibration statistics presented in Table 27-1 show a correlation coefficient of 0.9996 for PCR, even when an obviously nonlinear region is used! I am not sure if this is significantly different from the one shown for MLR using only the linear region. To me either model would be acceptable at the stage of method development where the article ended. Besides, it is unlikely that someone would be able to know a priori that the linear region was the better region to use for MLR.” (Patrick Wiegand)
Response: As a purely practical matter, we agree with that interpretation. However, we hope that by now we have convinced you that we are trying to do more than that – we are trying to find out what really goes on inside the “black boxes” of chemometric
calculations. The fact that the value of the PCR correlation coefficient differs significantly from unity becomes clear when you look at the other term of the ANOVA equation: in the MLR case the sum-squared error is zero, in the PCR case it is “infinitely” greater than that. Don’t forget that “significance”, at least in the statistical sense, is defined only when dealing with random variables. This also relates to the earlier comment regarding how to find ways to compare the relative effects of noise and nonlinearity on calibration models.
7) “It would be very interesting also, since the performance of the models presented are so similar, to see how the performance would be affected by noise, drift, etc. which are always present in actuality. I would not be surprised if PLS/PCR outperformed MLR under those circumstances.” (Patrick Wiegand)
Response: Yes, it certainly would be most interesting to investigate this question. This is closely related to the previous discussion concerning the relationship between noise and nonlinearity, so I would modify the statement of the problem to “At what point does one or another effect dominate the behavior of the calibration?”, that is, where is the crossover point? Investigating questions of this sort is called “research”, and a more fundamental question arises: why isn’t anybody doing such investigations? Other, related, questions are also important: Having determined this in isolation, how does the data analyst determine this in real data, where unknown amounts of several effects may be present? There is a similarity here to Richard’s earlier point regarding the relationship between the amount of noise and the amount of nonlinearity. Here are more fertile areas for research into the behavior of calibration models.
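The contrast drawn in the responses to comments 4 and 6 (zero sum-squared error for one linear wavelength, a nonzero error for one latent factor, a zero coefficient for a second wavelength on the nonlinear band, and a breakdown when the second wavelength is collinear with the first) can be reproduced in a few lines. What follows is only a minimal sketch in Python with NumPy, not the data or software used for the column: the band positions, the 5% stray-light fraction, the stray-light formula, the mean-centering before the PCA step, and all variable names are our assumptions for illustration. Because a least-squares solver does not raise a "divide by zero" error, the collinear case is reported through the condition number of the design matrix instead.

```python
import numpy as np

# Hypothetical synthetic data; NOT the spreadsheet used in the original column.
conc = np.linspace(0.2, 2.0, 10)                      # "constituent" values
index = np.arange(300)                                # spectral index
band_lin = np.exp(-0.5 * ((index - 80) / 15.0) ** 2)  # band obeying Beer's law
band_nl = np.exp(-0.5 * ((index - 220) / 15.0) ** 2)  # band distorted by stray light
stray = 0.05                                          # assumed stray-light fraction

# Assumed stray-light model: measured T = (true T + stray) / (1 + stray).
true_abs_nl = np.outer(conc, band_nl)
meas_abs_nl = -np.log10((10.0 ** (-true_abs_nl) + stray) / (1.0 + stray))
spectra = np.outer(conc, band_lin) + meas_abs_nl


def fit(predictors, y):
    """Least-squares fit with an intercept; returns (coefficients, SSE)."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return coef, float(resid @ resid)


# MLR with a single wavelength at the peak of the linear band.
_, sse_mlr = fit(spectra[:, 80], conc)

# One-factor PCR: regress on the scores of the first principal component
# of the mean-centered spectra (centering is an assumption of this sketch).
centered = spectra - spectra.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
_, sse_pcr = fit(centered @ vt[0], conc)

# MLR with a second wavelength taken from the nonlinear band.
coef_two, _ = fit(spectra[:, [80, 220]], conc)

# MLR with a second wavelength on the same linear band: collinear columns.
collinear = np.column_stack([np.ones(len(conc)), spectra[:, [80, 70]]])
cond_collinear = np.linalg.cond(collinear)

print("SSE, one-wavelength MLR:", sse_mlr)
print("SSE, one-factor PCR:", sse_pcr)
print("Coefficient of the nonlinear-band wavelength:", coef_two[2])
print("Condition number with two linear-band wavelengths:", cond_collinear)
```

The sketch says nothing about which method is preferable on real data, where noise is always present; it only isolates the one effect under discussion.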
8) “At any wavelength in your simulation, a second degree power series applies, which is linear in coefficients, and the coefficients of a 2 factor PCR or PLS model will be a linear function of the coefficients of the power series. (This assumes an adequate number of calibration spectra, that is, at least as many spectra as factors and a sufficient number of wavelength, which the full spectrum method assures.) The PCR or PLS regression should find the linear combination of these PCR/PLS coefficients that is linear in concentration.” (Fred Cahn)
Response: We have read the indicated section of that paper [6], and scanned the rest of it. We agree with much of what it says, both in the paper and in Fred Cahn’s messages, but we are not sure we see the relevance to the column. Certainly, nonlinearities in real data can have several possible causes, both chemical (e.g., interactions that make the true concentrations of any given species different from what is expected or might be calculated solely from what was introduced into a sample; interactions can change the underlying absorbance bands, to boot) and physical (such as the stray light that we simulated). Approximating these nonlinearities with a Taylor expansion is a risky procedure unless you know a priori what the error bound of the approximation is, but in any case it remains an approximation, not an exact solution. In the case of our simulated data, the nonlinearity was logarithmic, thus even a second-order Taylor expansion would be of limited accuracy. Alternative methods, such as correcting the nonlinearity through the application of an appropriate physical theory as we described above, may do as well or even better than a Taylor series approximation, but a rigorous theory is not always available. Even in
cases where a theory exists, often the physical conditions for which the theory is valid cannot be achieved; we demonstrated this in the discussion in Chapters 29 and 30 of the fundamental impossibility of truly achieving “Beer’s Law linearity”. Thus we are left with a situation where even in the best cases we can achieve, there can be residual nonlinearities in the data. The purpose of our column was to investigate the behavior of different modeling methods in the face of nonlinearity.
9) “Thus, my interest in 2 or more factor chemometric models of your simulation is in line with this view of chemometrics. I agree with the need for better physical understanding of instrument responses as well as of the spectra themselves. I would not choose PCR/PLS or MLR to construct such physical models, however.” (Fred Cahn)
Response: We were not trying to use the chemometric techniques to create a physical model in the column. We also agree that physical models should be created in the traditional manner, based on the study of the physical considerations of a situation. Ideally you would start from a fundamental physical law and derive, through logic and mathematics, the behavior of a particular system: this is how all other fields of science work. A chemometric technique then would be used only to ascertain the value (from a series of physical measurements) of an unknown parameter that the mathematical derivation created. What we were trying to do in the column was to ascertain the behavior of a mathematical (not physical!) system in the face of a certain type of (simulated) physical behavior. There is nothing wrong with trying to come up with empirical methods for improving the practical performance of chemometric calibration, but one of the philosophical problems with the current state of chemometrics is that nobody is trying to do anything else, that is, to determine the fundamental behavior of these mathematical systems.
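The remark in the response to comment 8, that a second-order Taylor expansion of a logarithmic nonlinearity is of limited accuracy, can be checked numerically. The sketch below is an illustration under stated assumptions, not the calculation behind the column: the stray-light formula, the 5% stray-light fraction, the absorbance range of 0 to 2, and the function and variable names are all ours. It compares the measured absorbance with its second-order Taylor expansion about zero absorbance (derivatives worked out by hand for the assumed formula) and, for reference, with a least-squares quadratic fitted over the whole range.

```python
import numpy as np

stray = 0.05                           # assumed stray-light fraction
true_abs = np.linspace(0.0, 2.0, 41)   # true absorbance, proportional to concentration


def measured(a):
    """Apparent absorbance under the assumed stray-light model."""
    return -np.log10((10.0 ** (-a) + stray) / (1.0 + stray))


meas_abs = measured(true_abs)

# Second-order Taylor expansion about zero absorbance.
# First derivative at 0:  1 / (1 + stray)
# Second derivative at 0: -ln(10) * stray / (1 + stray)**2
slope = 1.0 / (1.0 + stray)
curvature = -np.log(10.0) * stray / (1.0 + stray) ** 2
taylor2 = slope * true_abs + 0.5 * curvature * true_abs ** 2
taylor_err = np.abs(taylor2 - meas_abs)

# Least-squares quadratic over the whole range, for comparison.
quad = np.polyfit(true_abs, meas_abs, 2)
quad_err = np.abs(np.polyval(quad, true_abs) - meas_abs)

print("Taylor error at the top of the range:", taylor_err[-1])
print("Largest error of the least-squares quadratic:", quad_err.max())
```

Under these assumptions the expansion is exact at the expansion point and degrades toward the top of the range, while the fitted quadratic spreads a smaller, but still nonzero, error over the whole range; in both cases the result is an approximation whose error bound has to be known before it can be trusted.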
10) “The synthetic data did NOT demonstrate the advantage of a single linear wavelength over a multiple wavelength [sic] model” (Richard Kramer)
“in one case you use a factor which happens to be based upon an explicit model (i.e. linearity) which is correct for the data while stacking the deck against the second case by denying any opportunity to be correct.” (Richard Kramer)
“In your article, you appear to be creating an artificial set of circumstances:” (Patrick Wiegand)
“Thus your conclusion – that MLR is more capable of producing accurate models than PLS/PCR – is based on a contrived set of circumstances that would not occur in reality, especially when the chemometrician/spectroscopist is experienced.” (Patrick Wiegand)
Response: Artificial? Contrived? Only insofar as any experimental study is based on a “contrived” set of circumstances – contrived to enable the experimenter to separate the phenomenon of interest and study its effects, with “everything else the same”. But that is a minor matter. Richard and Patrick (and how many others, who didn’t respond?) believe that we concluded that “MLR is better than PCA/PLS”. The really critical point here is that that is NOT our conclusion, and anyone who thinks this has misunderstood us. We put the fault for this on ourselves, since the one thing that is clear is that we did not explain ourselves sufficiently.
Therefore let us clarify the point here and now: we are not fighting a “holy war” against PCA/PLS etc. The purpose of the exercise was NOT to “prove that MLR with wavelength selection is better”, but to investigate and explain conditions that cause that to be so, when it happens (which it does, sometimes). As we discussed in the original column, more and more discussions about calibration processes, both oral and in the literature, describe situations where wavelength selection improved the results (in PCR and PLS as well as MLR), but there has previously been no explanation for this phenomenon. Therefore we decided to investigate nonlinearity since we suspected that to be a major consideration, and so it turned out to be. We continue our discussion in the following chapters.
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 13(6), 19–21 (1998).
2. Mark, H. and Workman, J., Spectroscopy 13(11), 18–21 (1998).
3. Mark, H. and Workman, J., Spectroscopy 14(1), 16–17 (1999).
4. Mark, H., Principles and Practice of Spectroscopic Calibration (John Wiley & Sons, New York, 1991).
5. Mark, H., Applied Spectroscopy 42(5), 832–844 (1988).
6. Cahn, F. and Compton, S., Applied Spectroscopy 42, 865–872 (1988).
32 Linearity in Calibration: Act II Scene IV
This chapter continues our discussion started by the responses received to our Chapter 27 when it was first published as a paper entitled “Linearity in Calibration” [1]. So far our discussion has extended over three previous chapters (29 through 31) whose original published citations are given in references [2–4]. In Chapter 31, originally referenced as [4], we stated, “we are not fighting a ‘holy war’ against PCA/PLS etc.” and then went on to discuss what our original column was really about. However, if there is a “holy war” being fought at all, then from our point of view it is against the practice of simply accepting the results of the computer’s cogitations without attempting to understand the underlying phenomena that affect the behavior of the calibration models, regardless of the algorithm used. This has been our “fight” since the beginning – which can be verified by going back and rereading our very first column ever [5].
The authors do not always agree, but we do agree on the following: it is incomprehensible how a person calling himself a scientist can fail to wonder WHY calibration models behave the way they do, and try to relate their behavior to the properties of the data giving rise to them. There are reasons for everything that happens, whether we know what those reasons are or not, and the goal of science is to determine what those underlying reasons or principles are. At least that is the goal of every other field of scientific endeavor that we are aware of – why is Chemometrics exempt? Real data, as we have seen, is far too complicated to work with to try to obtain fundamental understanding, just as the physical world is often too complicated to study directly in toto.
Therefore work such as was presented in the “Linearity in Calibration” chapter is needed, creating a simplified system where the characteristic of interest can be isolated and studied – just as physical experiments often work with a simplified portion of the physical world for the same reason. This might be categorized as “Experimental Chemometrics”, controlling the nature of the data in a way that allows us to relate the properties of the data to the behavior of the model. Does this mimic the “real world”? No, but it does provide a window into the inner workings of the calibration calculations, and we need as many such windows as we can get. We will go so far as to make an analogy with Chemistry itself. The alchemists of old had an enormous empirical knowledge base, and from that could do all manner of useful things. But we do not consider alchemy a science, and it did not become a science until the underlying principles and phenomena were discovered and codified in a way that all could use. The current state of Chemometrics is more nearly akin to alchemy than Chemistry: we can do all manner of useful things with it, but it is all empirical and there are still many areas where even the most expert and prominent practitioners treat it as a “black box” and make no attempt to understand the inner workings of that black box.
Empiricism is important and even necessary, but hardly sufficient. The ultimate test of whether something is scientific is its ability to predict – and that does NOT mean SEP!! The irony of the situation is that a good deal of basic knowledge is available. The field of Chemometrics bypasses all the Statistical basics and jumps right into the heavy-duty sophisticated algorithms: everybody just wants to start running before they can even crawl. We commented on this situation in earlier Chapters 29–31 and previous publications [6], and what response we received was on the order of “Why was so much space wasted before getting to the important part?” It is certainly unfortunate that the portion of the discussion that was perceived as “wasted space” was the important part, but was not recognized as such. The early foundations of Statistics go back to the 1600s or so, to the time when probability theory was recognized as a distinct branch of mathematics. The problem is that nobody currently seems to apply the knowledge gained over the intervening span of time, or to be interested in applying that knowledge, or to do fundamental investigations at all. The chemometric community completely ignores the previous mathematical basis underlying its structure. The science of Statistics does, in fact, form a firm foundation that Chemometrics is built on. It is almost shameful that the modern Chemometrics community seems to be content to build ever higher and fancier superstructures on a foundation that is solid enough, but to which it is hardly connected. Worse, there seems to be an active antipathy to such investigations: just look at the firestorm we aroused by publishing a very small and innocuous study of the fundamental behavior of a particular data system!
In fact, from the response, you would almost think we committed heresy or attacked religious beliefs, in daring to suggest that PCR/PLS was not always the best way to go, much less do some serious research on the subject. Everybody gives lip service to the concept of “fundamental research is good for the long run”, but nobody seems interested in putting that concept into practice, even with the possibility of fairly short-term returns. Let us look at a couple of examples. In reference [7] we found the following passage:
“But, it would be dangerous to assume that we can routinely get away with extrapolation of that kind. Sometimes it can be done, sometimes it can’t. There is no simple rule that can tell us which situation we might be facing.” (see p. 129 in [7])
And that passage seems to sum up the current state of affairs. Theoretically, a good straight line should be extrapolatable almost indefinitely, yet we all know how risky it is to extrapolate even a little bit beyond the range of our data. Why does practice not conform to theory? The obvious answer is that something is nonlinear. But why can we not detect this? As Rich says, we do not have any simple rules. Well, OK, so we do not have simple rules. Maybe no simple rules exist. But then, why do we not at least have complicated rules to help us make such important decisions? At least then we would have a way to predict (in the scientific sense) something that is worthwhile knowing. As it stands we have nothing, and nobody seems interested in finding out why. Maybe a new approach is needed. Maybe this is where Fred Cahn’s work is pertinent: if you can approximate the nonlinearity with a Taylor series, then maybe the quality of the fit can provide a diagnostic to form the foundation of a rule on which to base a decision. Maybe something else will work. We do not know, but it is a possible starting
point. Fred, you are in the ideal position to pursue this, how about it – will you accept this challenge? The above example, of course, is relatively abstract and “academic”, and as such perhaps not of too much interest to the majority. Another example, with more practical application, is transfer of calibration models from one instrument to another. This is an endeavor of enormous current practical importance. Witness that hardly a month passes without at least one article on that topic in one or more of the analytical or spectroscopic journals. Yet all those reports are the same: “Effect of Data Treatment ABC Combined with Algorithm XYZ Compared to Algorithm UVW” or some such; they are all completely empirical studies. In themselves there is nothing wrong with that. The problem is that there is nothing else. There are no critical reviews summarizing all this work and extracting those aspects that are common and beneficial (or common and harmful, for that matter). Even worse, there are no fundamental studies dealing with the relationship of the algorithm’s behavior to the underlying physics, chemistry, mathematics, or instrumental effects. It is not difficult to see that the calibration transfer problem breaks down into two pieces:
a) The effect of instrumental variation on the data
b) The effect of variations of the data on the model.
Studying the effects of instrumental performance should be the province of the manufacturers. Unfortunately, the perception is that it is to their benefit to release such results only if they turn out to be “good”, and there is little incentive for them to perform studies whose only purpose is to increase scientific knowledge. Thus it is up to academia to pick up this particular ball, if there is any interest in it at all. Fundamental studies in those areas will eventually give rise to real knowledge about how and when calibrations can be transferred, and provide us with trustworthy recipes for doing the transfer.
Such knowledge will also provide us with the confidence of knowing that the underlying science is sound, and thus take us beyond the “my algorithm is better than your algorithm” stage that we are now at. Furthermore, true fundamental understanding could also be applied in reverse. Then instrument manufacturers could concentrate on those aspects of construction and operation that affect the transferability situation, and be able to verify their capabilities in an unambiguous, scientifically valid and agreed-on manner. This is just one other example of a current problem that COULD be attacked with fundamental studies, with both short- and long-term benefits that are obvious to all. Connecting to the statistical foundations, as described above, can have other benefits. For example, computing an SEP on a validation set of data is considered the be-all and end-all of calibration diagnostics. This is an important calculation, to be sure, but it has its limitations, as well. For example, the SEP alone has no diagnostic capability: it tells you nothing about what you need to do in order to improve a calibration model. For another, even when you compare SEPs from different models and choose the model with the smallest SEP, that does not necessarily mean you are choosing the best model. We often see “robustness” bandied about in discussions of calibration models, but what diagnostics do we have to quantify “robustness”? Without such a diagnostic, how can we expect to evaluate “robustness” either in isolation or to compare with SEP?
By focusing all our attention on the SEP we have also lost the ability to evaluate calibrations on their own. When calibrating spectrometers to do quantitative analysis, where samples are cheap and easy to come by, this loss is not too serious, but what do you do when a project requires calibration runs that cost a million (or ten million) dollars per run, and minimizing the number of runs is the absolute top priority? In such a case, you will not only not have validation data, you will likely not even have enough calibration data to do a leave-one-out calculation, and then being able to evaluate models from calibration diagnostics alone will be critical. Statisticians have, in fact, developed diagnostic tests that provide information about such characteristics, but the Chemometric community, in our arrogance, think we know better and ignore all this prior work. The statistical community has also developed many local and semi-local diagnostic tools to help understand and improve calibration models; we really need to get back to the roots on this, as well. There are innumerable unsolved problems in Chemometrics that need to be addressed: real, scientific problems, not just new ways to throw numbers around.
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 13(6), 19–21 (1998).
2. Mark, H. and Workman, J., Spectroscopy 14(2), 16–27 (1999).
3. Mark, H. and Workman, J., Spectroscopy 14(1), 16–17 (1999).
4. Mark, H. and Workman, J., Spectroscopy 13(11), 18–21 (1998).
5. Mark, H. and Workman, J., Spectroscopy 2(1), 38–39 (1987).
6. Mark, H. and Workman, J., Spectroscopy 13(4), 26–29 (1998).
7. Kramer, R., Chemometric Techniques for Quantitative Analysis (Marcel Dekker, New York, 1998).
33 Linearity in Calibration: Act II Scene V
This chapter is still a continuation of our discussion started by the responses received to Chapter 27 from our initial publication of “Linearity in Calibration” [1]. Up until now our discussion has extended over Chapters 29–32 as original paper publications ([2–5], respectively). At this point, however, we are finally getting toward the end of our obsession with considerations of linearity – at least until we receive another set of comments from our readers. Incidentally, we welcome such feedback, even those that disagree with us or with which we disagree, so please keep it coming. Indeed, it seems that we do not get much feedback unless our readers disagree with us, and feel strongly enough about it to say so. That is great – there is nothing like a little controversy to keep a book like this interesting: who said chemometrics and statistics and mathematics were dry subjects, anyway?!
In our original column on this topic [1] we had only done a principal component analysis to compare with the MLR results. One of the comments made, and it was made by all the responders, was to ask why we did not also do a PLS analysis of the synthetic linearity data. There were a number of reasons, and we offered to send the data to any or all of the responders who would care to do the PLS analysis and report the results. Of the original responders, Paul Chabot took us up on our offer. In addition, at the 1998 International Diffuse Reflectance Conference (The “Chambersburg” meeting), Susan Foulk also offered to do the PLS analysis of this data. Gratifyingly, when Paul and Susan reported their PLS loadings they were identical, even though they used different software packages to do the PLS calculations (PLSIQ and Unscrambler). We are certainly glad we do not have to worry about sorting out differences in software packages (due to different convergence criteria, etc., that sometimes creep into results such as these) on top of the Chemometric issues we want to address.
Figure 33-1 presents the plot of the PLS loadings. Paul and Susan each computed both loadings. Note that the first loading is indistinguishable to the eye from the first PCA loading (see our original column on this topic [1]). Paul and Susan each also computed the two calibration models and performance statistics for both models. Except that various programs did not compute the same sets of performance statistics (although in one case a different computation seemed to be given the same label as SEE), the ones that were reported by both programs had identical values. As expected by all responders, and by your hosts as well, when two-factor models (either PCR or PLS) were computed, the fit of the model to the synthetic data was perfect. Table 33-1 presents a summary of the numerical results obtained, for one-factor calibration models. Interestingly, when comparing the calibration results we find that the reported correlation coefficients agree among the different programs using the same algorithm, but the SEE values differ appreciably; it would seem that not all programs use the same
[Figure 33-1: plot titled “PLS Loadings”; vertical axis “Loading”, ticks from –0.3 to 0.2; horizontal axis “Index”, ticks from 0 to 300.]
Figure 33-1 PLS loadings from the synthetic data used to test the fit of models to nonlinearity. (see Colour Plate 2)
Table 33-1 Summary of results obtained from synthetic linearity data using one PCA or PLS factor. We present only those performance results listed by the data analyst as Correlation Coefficient and Standard Error of Estimate
Data analyst    Type of analysis    Corr. Coeff.    SEE
Column          PCR                 0.999622439     0.057472
Chabot          PCR                 0.999622411     0.01434417
Chabot          PLS                 0.999623691     0.01436852
Foulk           PLS                 0.999624        0.051319
definition of SEE. This leaves in question, for example, whether the value reported for SEE from PLS by Susan Foulk is really as large an improvement over the SEE for PCR reported by your columnists, or if it is due to a difference in the computation used. Since Paul Chabot reported SEE for both algorithms and his values are more nearly the same, even though his computation seems to differ from both the others, the tentative conclusion is that there is a difference in the computation. Indeed, we find that if we multiply our own value for SEE by the square root of 4/5, we obtain a value of 0.0514045, a value that compares to the SEE obtained by Susan Foulk in more nearly the same way that Paul Chabot’s values compare to each other, indicating a possibility that there is a discrepancy in the determination of degrees of freedom that are used in the two algorithms. Based on the values of the correlation coefficients, then, we can find the following comparisons between the two algorithms: as several of the responders indicated, the PLS model did provide improved results over the PCR model. On the other hand, the degree of improvement was not the major effect that at least some of the responders expected. As Richard Kramer expected,
“PLS should have, in principle, rejected a portion of the non-linear variance resulting in a better, although not completely exact, fit to the data with just 1 factor.”
Some of this variance was indeed rejected by the PLS algorithm, but the amount, compared to the Principal Component algorithm, seems to have been rather minuscule, rather than providing a nearly exact fit. The specifics of nonlinearity are not extensively discussed in the multivariate calibration literature, to say the least. Textbooks routinely cover the issues of multiple linear regression and nonlinearity, but do not cover the issue with “full-spectrum” methods such as PCR and PLS. Some discussion does exist relative to multiple linear regression, for example in Chemometrics: A Textbook by D.L. Massart et al. [6], see Section 2.1, “Linear Regression” (pp. 167–175) and Section 2.2, “Non-linear Regression,” (pp. 175–181). The authors state,
“In general, a much larger number of parameters [wavelengths, frequencies, or factors] needs to be calculated in overlapping peak systems [some spectra or chromatograms] than in the linear regression problems.” (p. 176)
The authors describe the use of a Taylor expansion to negate the second and the higher order terms under specific mathematical conditions in order to make “any function” (i.e., our regression model) first-order (or linear). They introduce the use of the Jacobian matrix for solving nonlinear regression problems and describe the matrix mathematics in some detail (pp. 178–181). There are also forms of nonlinear PCR and PLS where the linear PCR or PLS factors are subjected to a nonlinear transformation during singular value decomposition; the nonlinear transformation function can be varied with the nonlinearity expected within the data. These forms of PCR/PLS utilize a polynomial inner relation as spline fit functions or neural networks. References for these methods are found in [7].
A mathematical description of the nonlinear decomposition steps in PLS is found in [8]. These methods can be used to empirically fit data for building calibration models in nonlinear systems. The interesting point is that there are cases, such as the one demonstrated in the Linearity in Calibration chapter where nonlinearity is the dominant phenomenon, where MLR will fit the data more closely with fewer terms than either PCR or PLS. One could imagine a real case where an analyte would have a minor absorption band such that the magnitude of the spectral band is within a linear region of the measuring instrument. One could also imagine the major absorption band of this analyte is somewhat nonlinear at the higher concentration ranges. In this special case the MLR would provide a closer fit with fewer terms than either the PLS or the PCR, unless the minor band was isolated prior to model development using the PCR or PLS. This points to a continuing need for spectral band selection algorithms that can automatically search for the optimum spectral information and linear fit prior to the calibration modeling step. But all other things being equal, cases remain where MLR with an automatic channel selection feature will provide a better fit than either PCR or PLS. Surprising indeed, to some people! In their day, Principal Components and Partial Least Squares were each considered almost as “the magic answer to all calibration problems”. It took a long time for the realization to dawn that they contain no “magic” and are subject to most of the
same problems as the algorithm previously available (at that time, what we now call MLR). Now we see a surge in other new algorithms: wavelets, neural networks, genetic algorithms, as well as the combining of techniques (e.g., selecting wavelengths before performing a PCA or PLS calculation). While some of the veterans of the “PC wars” (not “political correctness”, by the way) realize that these algorithms can be overfit just as MLR calibrations can, and have become wary of the problem and more cautious with new algorithms, there is some evidence that a large number, perhaps the majority, of users are not nearly so careful, and are still looking for their “magic answer”.
There is a generic caution that needs to be promoted, and that all users should be made aware of, when dealing with these more sophisticated methods. That is the simple fact that every new parameter that can be introduced into a calibration procedure is another way to overfit and to hide the fact that it is happening. Worse, the more sophisticated the algorithm, the harder it is to see and recognize that overfitting is going on. With PCR and PLS we introduced the extra parameter of the number of factors: one extra parameter. With wavelets we introduce the order and the locality of each wavelet: two extra parameters. With neural nets, we have the number of nodes in each layer: n extra parameters, and then there is even a metaparameter: the number of layers. No wonder reports of overfitting abound (and don’t forget: those are only the ones that are recognized)! And nary a diagnostic in sight.
In a perfect world, a new algorithm would not be introduced until a corresponding set of diagnostic methods had been developed to inform the user how the algorithm was behaving. As long as we are dreaming, let us have those diagnostics be informative, in the sense that if the algorithm was misbehaving, they would point the user in the proper direction to fix it.
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 13(6), 19–21 (1998).
2. Mark, H. and Workman, J., Spectroscopy 13(11), 18–21 (1998).
3. Mark, H. and Workman, J., Spectroscopy 14(1), 16–17 (1999).
4. Mark, H. and Workman, J., Spectroscopy 14(2), 16–27 (1999).
5. Mark, H. and Workman, J., Spectroscopy 14(5), 12–14 (1999).
6. Massart, D.L., Vandeginste, B.G.M., Deming, S.N., Michotte, Y. and Kaufman, L., Chemometrics: A Textbook (Elsevier Science Publishers, Amsterdam, 1988).
7. Wold, S., Kettaneh-Wold, N. and Skagerberg, B., Chemometrics and Intelligent Laboratory Systems 7, 53–65 (1989).
8. Wold, S., Chemometrics and Intelligent Laboratory Systems 14 (1992).
34
Collaborative Laboratory Studies: Part 1 – A Blueprint
We will begin by taking a look at the detailed aspects of a basic problem that confronts most analytical laboratories. This is the problem of comparing two quantitative methods performed by different operators or at different locations. This is an area that is not restricted to spectroscopic analysis; many of the concepts we describe here can be applied to evaluating the results from any form of chemical analysis. In our case we will examine a comparison of two standard methods to determine precision, accuracy, and systematic errors (bias) for each of the methods and laboratories involved in an analytical test. As it happens, in the case we use for our example, one of the analytical methods is spectroscopic and the other is an HPLC method.
A particularly opportune event occurred recently, almost simultaneously with our writing these next few chapters: an article [1] appeared in LC-GC, a sister magazine to Spectroscopy, that also takes concepts that we discussed and described in some of our early chapters and applies them to a real-life situation (or at least a simulation of a real-life situation; the main difference is that the experiment described deals with macroscopic objects while the “real world” deals in atoms and molecules). In past chapters [2, 3] we also described how probabilistic phenomena give rise to distributions, and even included computer programs to allow simulations of this, but given the constraints of time and text space, we were not able to link that to the actual behavior of the physical world nearly as well as Hinshaw does. In the case described, given the venue, the interest is in the chromatography, and for that reason we will not dwell on the application. However, we do strongly urge our readers to obtain a copy of this article and read it for its description of the basis and generation of the distributions that arise from the effects of the random behavior of the physical world.
The probabilistic and statistical experiments described are superb examples of how concepts such as these can be illustrated and brought to life.
The statistical tools we describe in the next few chapters, and use for this demonstration, are ones that we have previously described. These tools include statistical hypothesis testing and ANOVA. Our previous descriptions of these topics were generic and rather general; at that time we were interested in presenting the theoretical background and reasoning behind the development of these statistical techniques. Now we will use them in a practical situation, to show how these methods can be used to evaluate various characteristics relating to the precision and accuracy of analytical methods, applying them to real data to simultaneously demonstrate how to use them and the nature of the results that can be obtained. We will use ANOVA to evaluate potential bias in reported results inherent in the analytical methods themselves, or due to the operators (i.e., location of laboratory) performing the methods. For the next series of articles all computations were completed using MathCad Worksheets [4] written by the authors. The objective of this next set of articles is to determine the precision, accuracy, and bias due to choice of analytical
method and/or operator for the determination of an analyte within a set of hypothetical production samples and spiked recovery samples (samples of gravimetrically known composition). The discussion will occupy Chapters 34–39.
EXPERIMENTAL DESIGN
The experimental design used for this hypothetical study is based on a relatively simple factorial model where individual samples are measured as shown in Figure 34-1 and Table 34-1. We have previously discussed factorial designs [5] although, as was the case with ANOVA, our previous discussion was simplified and primarily theoretical, to demonstrate the principles involved, while in the current discussion we apply these concepts to a more realistic practical situation. For this hypothetical test, samples consist of three production run samples (i.e., Nos. 1–3) with a target analyte value of 3.60 units (percent, grams, pounds, etc.). In addition, three spiked recovery samples with target analyte levels of 3.40, 3.61, and 3.80%, respectively, are represented by Nos. 4–6. This experimental model allows the methods and locations (labs or operators) to be compared for precision, accuracy, and systematic errors. We will use the designations Lab 1 and Lab 2 to indicate different locations and/or operators performing the identical procedures for METHODS A and B (or I and II).
Before considering the design and its analysis in detail, let us take a look at the factors that are being included in the design, and their impact on the experimental design and the analysis of this design: we have six samples, two methods of analysis for the constituent of interest, two laboratories, two chemists in each laboratory, and five repeat readings of the constituents of each sample by each chemist. Statistical hypothesis
[Figure 34-1: tree diagram. Each sample (n = 6) branches by Location (Lab 1, Lab 2), then by Method (Method I, Method II), then by Replicates (r1–r5).]
Figure 34-1 A simple factorial design for collaborative data collection. Each sample analyzed (in this hypothetical case n = 6) requires multiple labs, or operators, using both methods of analysis and replicating each measurement a number of times (r = 5 for this hypothetical case).
Table 34-1 “As reported” analytical data∗ for collaborative study

Sample no. – Replicate no.   Lab 1 – Method B   Lab 2 – Method B   Lab 1 – Method A   Lab 2 – Method A
1-1                          3.507              3.507              3.462              3.460
1-2                          3.463              3.497              3.442              3.443
1-3                          3.467              3.503              3.460              3.447
1-4                          3.501              3.473              3.517              —
1-5                          3.489              3.447              3.460              —
Mean                         3.485              3.485              3.468              3.450

2-1                          3.479              3.497              3.446              3.460
2-2                          3.453              3.660              3.448              3.470
2-3                          3.459              3.473              3.455              3.450
2-4                          3.461              3.447              3.456              3.460
2-5                          3.481              3.453              3.455              3.460
Mean                         3.467              3.506              3.452              3.460

3-1                          3.366              3.370              3.318              3.337
3-2                          3.362              3.327              3.330              3.317
3-3                          3.351              3.387              3.328              3.337
3-4                          3.353              3.430              3.322              3.330
3-5                          3.347              3.383              3.323              3.330
Mean                         3.356              3.379              3.324              3.330

4-1                          3.421              3.407              3.366              3.380
4-2                          3.377              3.400              3.360              3.380
4-3                          3.399              3.417              3.361              3.380
4-4                          3.379              3.353              3.362              3.380
4-5                          3.379              3.380              3.370              3.380
Mean                         3.391              3.391              3.364              3.380

5-1                          3.565              3.540              3.538              3.560
5-2                          3.568              3.550              3.539              3.580
5-3                          3.561              3.573              3.544              3.590
5-4                          3.576              3.533              3.540              3.580
5-5                          3.587              3.543              3.543              3.560
Mean                         3.571              3.548              3.541              3.570

6-1                          3.764              3.860              3.741              3.740
6-2                          3.742              3.833              3.740              3.760
6-3                          3.775              3.933              3.739              3.730
6-4                          3.767              3.870              3.742              3.770
6-5                          3.766              3.810              3.744              3.750
Mean                         3.763              3.881              3.741              3.740

∗ Note: For this hypothetical exercise, Samples 1–3 have a target value of 3.60% absolute, whereas Samples 4–6 are Spiked Recovery Samples with target values of 3.40 (No. 4), 3.61 (No. 5), and 3.80 (No. 6).
testing provides us with an objective method of determining whether or not a given difference in conditions (i.e., factor) has an effect on the readings. We have the following a priori expectations for the behavior of these several factors:
a) Since we know that the samples are of different composition, we expect the measurements of the constituent value to reflect this genuine difference in composition; the differences between samples should therefore be systematic and constant across all other factors. Any departure from constant differences (beyond the amount expected from random variation due to unavoidable random error of the analysis, of course) can be attributed to an effect of the corresponding factor, or to blunders such as improper mixing or sampling of the material.
b) There may be an effect due to the use of two different laboratories. This effect may or may not be the same for the two different methods of analysis. This can be examined by comparing the results of measurements on the same sample by the same method in each of the two different laboratories. Under the proper circumstances, results from multiple samples may be combined to achieve a more definitive test. Before doing so, the existence of the appropriate circumstances must first be determined.
c) There may be an effect due to the use of two different methods of analysis. This effect may or may not be the same in the two different laboratories. There may or may not be a difference between the two chemists in each laboratory. This can be examined by comparing the results of measurements on the same sample by the two different methods of analysis. Under the proper circumstances, results from multiple samples may be combined to achieve a more definitive test; if circumstances are appropriate, results from the two chemists in each laboratory and the results from the two laboratories may also be combined. Before doing so, the existence of the appropriate circumstances must first be determined.
d) There may or may not be a difference between the two chemists’ readings of the constituent values in a given laboratory. If we arbitrarily label the chemists in each laboratory as “Chemist #1” and “Chemist #2”, we would not expect a systematic difference between the corresponding chemists in the two different laboratories. This can, however, happen by coincidence. This can be examined by comparing the results of measurements on the same sample by the two different chemists in each laboratory. Under the proper circumstances, results from multiple samples may be combined to achieve a more definitive test. Before doing so, the existence of the appropriate circumstances must first be determined. Many of these aspects will be presented over the next several chapters.
e) We do not expect any systematic effects among the five repeat readings of each sample by each chemist in each laboratory. We do expect random variations, reflecting unavoidable random errors of measurement. These unavoidable random errors of measurement are quantified by the terms “precision” and “accuracy”.
f) We expect the precision and accuracy for each method to be the same at both laboratories. This can be examined by comparing the precision and accuracy of each method in each laboratory, combining results from multiple samples when appropriate. Before doing so, the existence of the appropriate circumstances must first be determined.
g) We do not expect the precision and accuracy to be the same for the two methods except by coincidence.
h) We expect the precision and accuracy to be the same for all four chemists for each method, unless we find a difference in precision and/or accuracy between laboratories. This can be examined by comparing the precision and accuracy of each method as performed by each chemist, combining results from multiple samples when appropriate. Before doing so, the existence of the appropriate circumstances must first be determined.
The use of the statistical tools of ANOVA and statistical hypothesis testing, described previously in these chapters and whose application is described in further detail below, allows separation of the effects due to the various factors and objective verification as to which ones are statistically significant. In the absence of any systematic effects due to one or more of the factors, our a priori expectation is that any differences seen are due to the effects of unavoidable random errors only, and will therefore be non-significant. Therefore, any statistically significant effects found due to differences between sets of readings indicate that the corresponding factor has a real, systematic effect on the readings. Posing the scientific questions about the effects of the factors in the formalism of statistical hypothesis tests [6] gives us the handle we need to extract that information from the mass of data we obtain from this simple-seeming, but (as we see) actually very complicated, experimental design.
Data analysis for this series was performed using MathCad, and the statistical methods used are described in greater detail in Youden’s monograph [7] and in Mark and Workman [8]. We use the MathCad worksheets both to illustrate how the theoretical concepts can be put to actual use and also to demonstrate how to perform the calculations we describe.
The worksheets will be printed along with the chapters in which they are first used. At a later date we are planning to enable you to go to the Spectroscopy home page (http://www.spectroscopymag.com) and find them. If, and when, the actual URLs for the worksheets become available, we will let you know.
The primary goal of this series of chapters is to describe the statistical tests required to determine the magnitude of the random (i.e., precision and accuracy) and systematic (i.e., bias) error contributions due to choosing Analytical METHODS A or B, and/or the location/operator where each standard method is performed. The statistical analysis for this series of articles consists of five main parts:
Part 1: Overall comparison of both locations and analytical methods for precision and accuracy;
Part 2: Analysis of Variance testing for both locations and analytical methods to determine if an overall bias exists for location or analytical method;
Part 3: Testing for systematic error in each method by performing a comparison test for a set of measurements versus the known True Value;
Part 4: Performing a ranking test to determine if either analytical method or location affects the results as a systematic error (bias); and
Part 5: Computing the “efficient comparison of two methods” as described by Youden and Steiner in reference [7].
The analyst may use one or more of these statistical test methods to compare analytical results depending upon individual requirements. It is recommended that the easiest
and most fruitful test for the effort expended would be the test method described in Chapter 38. This simple set of tests statistically compares precision, accuracy, and systematic error for two methods with the minimum quantity of analytical effort. Chapter 38 is recommended above Chapters 34–37, but it is useful to work through the earlier chapters before proceeding to Chapter 38. The basic experimental design required for the statistical methods in Chapters 34–37 is demonstrated in Figure 34-1 and the data are presented in Table 34-1. The basic experimental design required for the Chapter 38 statistical methods is given in Figure 34-2 and the corresponding data in Table 34-2. Thus, if you would like to follow along by performing these tests on your own real data, the basic designs are demonstrated here to allow you to collect data before proceeding through the statistical methods described within the next 6 chapters.
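For readers collecting their own data, the factorial layout of Figure 34-1 can be written out as a run list. The Python fragment below is only an illustrative sketch and is not part of the authors' MathCad worksheets; the names `SAMPLES`, `LABS`, `METHODS`, `REPLICATES`, and `build_run_list` are hypothetical, and the factor levels (six samples, two labs, two methods, five replicates) are taken from the design described in this chapter.

```python
from itertools import product

# Factor levels as described in the text (assumed labels, chosen for illustration).
SAMPLES = range(1, 7)        # samples 1-6
LABS = ("Lab 1", "Lab 2")    # locations
METHODS = ("A", "B")         # analytical methods
REPLICATES = range(1, 6)     # replicates r1-r5


def build_run_list():
    """Return one record per planned measurement in the full factorial design."""
    return [
        {"sample": s, "lab": lab, "method": m, "replicate": r, "result": None}
        for s, lab, m, r in product(SAMPLES, LABS, METHODS, REPLICATES)
    ]


runs = build_run_list()
print(len(runs))  # number of planned measurements: 6 * 2 * 2 * 5
```

Each record has an empty `result` field to be filled in as measurements arrive, so missing readings (such as the two absent replicates for Sample 1 in Table 34-1) remain visible rather than silently dropped.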
[Figure 34-2: tree diagram. Method (Method A, Method B) branches by Sample (Sample X, Sample Y), then by Replicates (r1–r5).]
Figure 34-2 Simple experimental design for Youden/Steiner comparison of two Methods (data shown in Table 34-2).
Table 34-2 Analytical data entry for comparison of two methods tests

         Method A                  Method B
         Sample X    Sample Y      Sample X    Sample Y
         3.366       3.741         3.421       3.764
         3.380       3.740         3.407       3.860
         3.360       3.740         3.377       3.742
         3.380       3.760         3.400       3.833
Mean     3.372       3.745         3.401       3.800
ANALYTICAL METHODS
Sample collection and handling
Let us say the first three samples tested were collected by Lab 2 from their production facility. These samples were retained from actual production lots. An aliquot from each retained jar was removed and shipped to Lab 1 in appropriate sealed containers. METHOD B testing was started at both laboratories the day following receipt of the samples to rule out any possible aging effects. METHOD A testing was performed in Lab 1 on the following day, while the METHOD A testing in Lab 2 occurred a week later. The second three samples were spiked, produced at Lab 2 using the pure analyte reagent and Control material. An aliquot of each sample was shipped to Lab 1 in appropriate sealed containers. Once again, the METHOD B testing was performed on the same day at both locations. METHOD A testing was done at both sites within a 2-day time period.
METHOD A and B analysis
All six samples at both sites were prepared the same way. Five separate aliquots from each sample were separately sampled and prepared for testing. Each aliquot was then measured three times. Conditions and standard operating procedures for METHODS A and B were carefully specified for both Labs 1 and 2.
RESULTS AND DATA ANALYSIS
Comparing all laboratories and all methods for precision and accuracy
COMPARISON OF PRECISION AND ACCURACY FOR METHODS AND LABORATORIES USING THE GRAND MEAN FOR SAMPLES No. 1–3 (Collabor_GM Worksheet), OR BY USING A SPIKED RECOVERY STUDY FOR SAMPLES No. 4–6 (Collabor_TV Worksheet)
To compute the results shown in Tables 34-3 and 34-4, the precision of each set of replicates for each sample, method, and location is individually calculated using the root mean square deviation equation as shown (Equations 34-1 and 34-2) in standard symbolic and MathCad notation, respectively. Thus the standard deviation of each set of sample replicates yields an estimate of the precision for each sample, for each method, and for each location. The precision is calculated where each yij is an individual replicate (j) measurement for the ith sample; ȳi is the average of the replicate measurements for the ith sample, for each method, at each location; and N is the number of replicates for each sample, method, and location. The results of these computations for these data
Table 34-3 Individual sample analysis precision for hypothetical production samples

Sample no.   METHOD B – Lab 1   METHOD B – Lab 2   METHOD A – Lab 1   METHOD A – Lab 2
Sample 1     0.020              0.025              0.0089             0.0089
Sample 2     0.013              0.088              0.0066             0.010
Sample 3     0.0079             0.037              0.0068             0.012
Pooled       0.015              0.057              0.008              0.010
Table 34-4 Individual sample analysis precision for hypothetical spiked recovery samples

Sample no.   METHOD B – Lab 1   METHOD B – Lab 2   METHOD A – Lab 1   METHOD A – Lab 2
Sample 4     0.019              0.025              0.0041             0.000
Sample 5     0.010              0.015              0.0026             0.013
Sample 6     0.012              0.047              0.0019             0.016
Pooled       0.014              0.032              0.0030             0.012
are found in Tables 34-3 and 34-4 representing samples 1–3 (hypothetical production samples), and 4–6 (hypothetical spiked samples), respectively.
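$$ S = \sqrt{\frac{\sum_{j=1}^{N} \left( y_{ij} - \bar{y}_i \right)^2}{N - 1}} \tag{34-1} $$

$$ S = \sqrt{\frac{\sum \overrightarrow{\left( Y - \operatorname{mean}(Y) \right)^2}}{N - 1}} \tag{34-2} $$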
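As a cross-check outside MathCad, Equation 34-1 can be sketched in a few lines of Python. This is an illustrative fragment only, not the authors' worksheet; the function name `precision` is an assumption, and the data are the Sample 1, METHOD B, Lab 1 replicates from Table 34-1.

```python
import math


def precision(replicates):
    """Standard deviation of one set of replicates about their own mean (Equation 34-1)."""
    n = len(replicates)
    mean = sum(replicates) / n
    return math.sqrt(sum((y - mean) ** 2 for y in replicates) / (n - 1))


# Sample 1, METHOD B, Lab 1 (Table 34-1)
sample1_method_b_lab1 = [3.507, 3.463, 3.467, 3.501, 3.489]
s = precision(sample1_method_b_lab1)
print(f"{s:.3f}")
```

For these five replicates the result is about 0.020, in line with the corresponding entry of Table 34-3.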
The pooled precision and accuracy for each sample for both analytical methods and locations are calculated using Equations 34-3 and 34-4, representing standard symbolic and MathCad notation, respectively. The pooled precision is calculated where each yij is an individual replicate measurement for an individual sample; ȳi is the average of the replicate measurements for each sample, each method, each location; and Ni is the number of replicates for an individual (ith) sample, method, and location. The results of these computations for these data are found in the Pooled rows of Tables 34-3 and 34-4, representing samples 1–3 and 4–6, respectively. The results in Tables 34-3 and 34-4 show no trend in error with respect to concentration.
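$$ P_s = \sqrt{\frac{\sum_{j=1}^{N_1} \left( y_{1j} - \bar{y}_1 \right)^2 + \sum_{j=1}^{N_2} \left( y_{2j} - \bar{y}_2 \right)^2 + \sum_{j=1}^{N_3} \left( y_{3j} - \bar{y}_3 \right)^2 + \sum_{j=1}^{N_4} \left( y_{4j} - \bar{y}_4 \right)^2}{N_1 + N_2 + N_3 + N_4 - 4}} \tag{34-3} $$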
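Equation 34-3 can be sketched in Python as follows. The fragment is illustrative only; `pooled_precision` is a hypothetical name, the function is written for any number of replicate sets rather than exactly four, and the choice to pool the four method/location sets for Sample 5 of Table 34-1 is an assumption made for this example.

```python
import math


def pooled_precision(groups):
    """Pooled standard deviation over several replicate sets (Equation 34-3).

    Each group contributes its sum of squared deviations about its own mean;
    the denominator is the total number of readings minus the number of groups.
    """
    sum_sq = 0.0
    dof = 0
    for g in groups:
        mean = sum(g) / len(g)
        sum_sq += sum((y - mean) ** 2 for y in g)
        dof += len(g) - 1
    return math.sqrt(sum_sq / dof)


# Sample 5 from Table 34-1: four method/location replicate sets
sample5 = [
    [3.565, 3.568, 3.561, 3.576, 3.587],  # METHOD B, Lab 1
    [3.540, 3.550, 3.573, 3.533, 3.543],  # METHOD B, Lab 2
    [3.538, 3.539, 3.544, 3.540, 3.543],  # METHOD A, Lab 1
    [3.560, 3.580, 3.590, 3.580, 3.560],  # METHOD A, Lab 2
]
ps = pooled_precision(sample5)
print(f"{ps:.4f}")
```

With four sets of five replicates the denominator is 20 − 4 = 16, matching the N1 + N2 + N3 + N4 − 4 term of the equation, and the result is roughly 0.011.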
$$ P_s = \sqrt{\frac{\sum \overrightarrow{\left( Y1 - \operatorname{mean}(Y1) \right)^2} + \sum \overrightarrow{\left( Y2 - \operatorname{mean}(Y2) \right)^2}}{N1 + N2 + N3 + N4 - 4} + \frac{\sum \overrightarrow{\left( Y3 - \operatorname{mean}(Y3) \right)^2} + \sum \overrightarrow{\left( Y4 - \operatorname{mean}(Y4) \right)^2}}{N1 + N2 + N3 + N4 - 4}} \tag{34-4} $$

Table 34-5 Individual sample analysis estimated accuracy using grand mean calculation

Sample no.   METHOD B – Lab 1   METHOD B – Lab 2   METHOD A – Lab 1   METHOD A – Lab 2
Sample 1     0.025              0.029              0.029              0.029
Sample 2     0.014              0.096              0.031              0.017
Sample 3     0.012              0.051              0.037              0.024
Pooled       0.018              0.065              0.033              0.024

To compute the results shown in Table 34-5 for production samples, the accuracy of each set of replicates for each sample, method, and location was individually calculated using the root mean square deviation equation as shown in Equations 34-5 and 34-6 in standard symbolic and MathCad notation, respectively. The root mean square deviation of each set of sample replicates about the Grand Mean yields an estimate of the accuracy for each sample, for each method, and for each location. The accuracy is calculated where each yij is an individual replicate measurement; GM is the Grand Mean of the replicate measurements for each sample, both methods, both locations; and N is the number of replicates for each sample, method, and location. The results found in Table 34-5 represent samples 1–3. Note: Each sample had a Grand Mean computed by taking the mean for all measurements made for each of the samples 1–3.

$$ S_i = \sqrt{\frac{\sum_{j=1}^{N} \left( y_{ij} - GM_i \right)^2}{N - 1}} \tag{34-5} $$

$$ S = \sqrt{\frac{\sum \overrightarrow{\left( Y - GM \right)^2}}{N - 1}} \tag{34-6} $$
To compute the results shown in Table 34-6 for the Spiked Recovery samples, the accuracy of each set of replicates for each sample, method, and location can be individually calculated using the root mean square deviation equation as shown in Equations 34-5 and 34-6 in standard symbolic and MathCad 7.0 notation, respectively, with the Spiked or true values (TV) substituted for GM. The root mean square deviation of each set of sample replicates about the true value yields an estimate of the accuracy for each sample, for each method, and for each location. The accuracy is calculated where each yij is an individual replicate measurement and N is the number of replicates for each sample, method, and location. The results found in Table 34-6 represent samples 4 through 6. Note: Each sample had a True Value given by a known analyte spike into the sample.
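The accuracy calculation of Equation 34-5, with either the Grand Mean or the true (spiked) value as the reference, can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' worksheet: the function name `accuracy` is hypothetical, the Grand Mean of 3.472 for Sample 1 is taken from Table 34-7, and the true value of 3.40 for Sample 4 is taken from the note to Table 34-1.

```python
import math


def accuracy(replicates, reference):
    """Root mean square deviation of replicates about a reference value (Equation 34-5).

    `reference` is the Grand Mean for production samples or the true
    (spiked) value for recovery samples.
    """
    n = len(replicates)
    return math.sqrt(sum((y - reference) ** 2 for y in replicates) / (n - 1))


# Sample 1, METHOD B, Lab 1 against the Grand Mean listed in Table 34-7
acc_gm = accuracy([3.507, 3.463, 3.467, 3.501, 3.489], 3.472)

# Sample 4, METHOD B, Lab 1 against the spiked true value of 3.40
acc_tv = accuracy([3.421, 3.377, 3.399, 3.379, 3.379], 3.40)

print(f"{acc_gm:.3f} {acc_tv:.3f}")
```

The two results come out near 0.025 and 0.022, in line with the corresponding entries of Tables 34-5 and 34-6. The only difference from the precision calculation is the reference value, which is why a bias in a method inflates accuracy but not precision.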
Table 34-6 Individual sample analysis accuracy using Spiked Recovery study

Sample no.   METHOD B – Lab 1   METHOD B – Lab 2   METHOD A – Lab 1   METHOD A – Lab 2
Sample 4     0.022              0.027              0.041              0.022
Sample 5     0.044              0.071              0.077              0.042
Sample 6     0.043              0.083              0.066              0.058
Pooled       0.038              0.065              0.063              0.043
Table 34-7 Individual sample precision and accuracy for combined Methods A and B and Labs 1 and 2 – Production samples

No.        GM      Precision   Accuracy
Sample 1   3.472   0.0231      0.0278
Sample 2   3.471   0.0479      0.0538
Sample 3   3.347   0.021       0.033
Pooled     3.430   0.033       0.040
Table 34-8 Individual sample precision and accuracy for combined Methods A and B and Labs 1 and 2 – Spiked Recovery samples

No.        TR      Precision   Accuracy
Sample 4   3.40    0.016       0.029
Sample 5   3.61    0.011       0.061
Sample 6   3.80    0.025       0.064
Pooled     3.603   0.018       0.054
The analytical results for each sample can again be pooled into a table of precision and accuracy estimates for all values reported for any individual sample. The pooled results for Tables 34-7 and 34-8 are calculated using equations 34-1 and 34-2 where precision is the root mean square deviation of all replicate analyses for any particular sample, and where accuracy is determined as the root mean square deviation between individual results and the Grand Mean of all the individual sample results (Table 34-7) or as the root mean square deviation between individual results and the True (Spiked) value for all the individual sample results (Table 34-8). The use of spiked samples allows a better comparison of precision to accuracy, as the spiked samples include the effects of systematic errors, whereas use of the Grand Mean averages the systematic errors across methods and shifts the apparent true value to include the systematic error. Table 34-8 yields a better estimate of the true precision and accuracy for the methods tested. A simple statistical test for the presence of systematic errors can be computed using data collected as in the experimental design shown in Figure 34-2. (This method is demonstrated in the Measuring Precision without Duplicates sections of the MathCad Worksheets Collabor_GM and Collabor_TV found in Chapter 39.) The results of this test are shown in Tables 34-9 and 34-10. A systematic error is indicated by the test using
Table 34-9 Statistical test for the presence of systematic errors (using samples 1 and 2 only)
F-test for bias: 16.53     F-critical for bias: 9.27

Table 34-10 Statistical test for the presence of systematic errors (using samples 4 and 5 only)
F-test for bias: 2.261     F-critical for bias: 9.277
Samples 1 and 2, but not for Samples 4 and 5. This indicates that the difference between precision and accuracy is large enough to indicate a bias inherent within the analytical method(s). Since these are the same methods and locations tested, further evaluation is required to determine if a bias actually exists.
REFERENCES
1. Hinshaw, J.V., LC-GC 17(7), 616–625 (1999).
2. Mark, H. and Workman, J., Spectroscopy 2(2), 60–64 (1987).
3. Workman, J. and Mark, H., Spectroscopy 2(6), 58–60 (1987).
4. MathCad; MathSoft, Inc.: 101 Main Street, Cambridge, MA 02142; Vol. v. 7.0; (1997).
5. Mark, H. and Workman, J., Spectroscopy 10(1), 17–20 (1995).
6. Mark, H. and Workman, J., Spectroscopy 4(7), 53–54 (1989).
7. Youden, W. J. and Steiner, E. H., Statistical Manual of the AOAC, 1st ed. (Association of Official Analytical Chemists, Washington, DC, 1975).
8. Mark, H. and Workman, J., Statistics in Spectroscopy, 1st ed. (Academic Press, New York, 1991).
35
Collaborative Laboratory Studies: Part 2 – using ANOVA
In this chapter we describe the use of ANOVA in collaborative study work.
ANOVA TEST COMPARISONS FOR LABORATORIES AND METHODS (ANOVA_s4 WORKSHEET)
Analysis of Variance (ANOVA) is a useful tool for comparing sets of analytical results to determine whether there is a statistically meaningful difference between results for a sample analyzed by different methods or at different locations by different analysts. The reader is referred to reference [1] and other basic books on statistical methods for discussions of the theory and applications of ANOVA; examples of such texts are [2, 3].
Table 35-1 illustrates the ANOVA results for each individual sample in our hypothetical study. This test indicates whether any of the reported results from the analytical methods or locations is significantly different from the others. From the table it can be observed that statistically significant variation in the reported analytical results is to be expected based on these data. However, there is no apparent pattern in the method or location most often varying from the others. Thus, this statistical test is inconclusive and further investigation is warranted.
Table 35-1 ANOVA: comparing methods and laboratories

No.        F-test for bias   F-critical for bias   Bias   Difference
Sample 1   1.81              3.34                  No     —
Sample 2   1.21              3.34                  No     —
Sample 3   6.89              3.34                  Yes    METHOD B-LAB 1 + METHOD B-LAB 2 vs. METHOD A-LAB 1 + METHOD A-LAB 2
Sample 4   3.28              3.24                  Yes    METHOD A-LAB 1
Sample 5   10.52             3.24                  Yes    METHOD B-LAB 1 + METHOD A-LAB 2 vs. METHOD B-LAB 2 + METHOD A-LAB 1
Sample 6   24.10             3.24                  Yes    METHOD B-LAB 2
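The F statistic behind a comparison such as Table 35-1 can be sketched as a one-way ANOVA in Python. The fragment below is an illustration only and is not the ANOVA_s4 worksheet; the function name `one_way_anova_f` is hypothetical, the four groups are the Sample 5 replicate sets from Table 34-1, and the critical value is simply copied from Table 35-1 rather than computed.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: readings about their own group mean.
    ss_within = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Sample 5 from Table 34-1: METHOD B Lab 1, METHOD B Lab 2, METHOD A Lab 1, METHOD A Lab 2
sample5_groups = [
    [3.565, 3.568, 3.561, 3.576, 3.587],
    [3.540, 3.550, 3.573, 3.533, 3.543],
    [3.538, 3.539, 3.544, 3.540, 3.543],
    [3.560, 3.580, 3.590, 3.580, 3.560],
]
F_CRITICAL = 3.24  # value listed for Sample 5 in Table 35-1
f_value, df_b, df_w = one_way_anova_f(sample5_groups)
print(f"F = {f_value:.2f} on ({df_b}, {df_w}) degrees of freedom; bias: {f_value > F_CRITICAL}")
```

For Sample 5 this gives F of about 10.5 on (3, 16) degrees of freedom, which exceeds the tabulated critical value. The same function applies to the two-group laboratory comparisons discussed next by passing only two replicate sets.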
ANOVA test comparisons (using ANOVA_s2 worksheet)
Table 35-2 shows the ANOVA results comparing laboratories (i.e., different locations) performing the same METHOD B analytical procedure for analysis. This statistical test indicates that for the higher concentration spiked samples (i.e., Nos. 5 and 6 at the 3.61 and 3.80% levels, respectively) a significant difference in reported average values occurred. However, Lab 1 was higher for Sample No. 5 and lower for Sample No. 6, so there is no apparent trend in the analytical results reported by the two labs and no evidence of a systematic difference between labs using METHOD B.
Table 35-3 illustrates the ANOVA results comparing laboratories (i.e., different locations) performing the same METHOD A for analysis. This statistical test indicates that for the lower and mid-level concentration spiked samples (i.e., Nos. 4 and 5 at the 3.40 and 3.61% levels, respectively) a significant difference in reported average values occurred. However, this trend did not continue for the highest concentration sample (i.e., Sample No. 6) with a concentration of 3.80%. Lab 1 was slightly lower in reported value for Samples 4 and 5. There is no significant systematic error observed between laboratories using METHOD A.
Table 35-4 reports ANOVA comparing the METHOD B procedure to the METHOD A procedure for combined laboratories. Thus the combined METHOD B analyses for each sample were compared to the combined METHOD A analyses for the same sample. This statistical test indicates whether there is a significant bias in the reported results for each method, irrespective of operator or location. An apparent trend is indicated using this statistical analysis, that trend being a positive bias for METHOD B as compared to
Table 35-2 ANOVA: comparing laboratories for METHOD B (Lab 1 vs. Lab 2)

No.        Method     F-test for bias   F-critical for bias   Difference   Bias
Sample 1   METHOD B   0                 5.32                  —            No
Sample 2   METHOD B   0.98              5.32                  —            No
Sample 3   METHOD B   1.99              5.32                  —            No
Sample 4   METHOD B   0.0008            5.32                  —            No
Sample 5   METHOD B   8.14              5.32                  0.024        Yes
Sample 6   METHOD B   20.91             5.32                  −0.098       Yes
Table 35-3 ANOVA: comparing laboratories for METHOD A spectrophotometry (Lab 1 vs. Lab 2)

No.        Method     F-test for bias   F-critical for bias   Difference   Bias
Sample 1   METHOD A   1.10              5.99                  —            No
Sample 2   METHOD A   2.52              5.99                  —            No
Sample 3   METHOD A   1.18              5.99                  —            No
Sample 4   METHOD A   7.63              5.32                  −0.016       Yes
Sample 5   METHOD A   29.52             5.32                  −0.029       Yes
Sample 6   METHOD A   1.53              5.32                  —            No
Table 35-4 ANOVA: comparing methods for combined laboratories and operators, all Method B vs. all Method A
Sample No.  Method comparison       F-test for bias  F-critical for bias  Difference  Bias
1           METHOD B vs. METHOD A   5.05             4.49                 0.024       Yes
2           METHOD B vs. METHOD A   1.93             4.49                 —           No
3           METHOD B vs. METHOD A   15.9             4.49                 0.041       Yes
4           METHOD B vs. METHOD A   7.06             4.41                 0.019       Yes
5           METHOD B vs. METHOD A   0.07             4.41                 —           No
6           METHOD B vs. METHOD A   11.44            4.41                 0.066       Yes
METHOD A. Thus METHOD B would be expected to report a higher level of analyte than METHOD A.
REFERENCES 1. Mark, H. and Workman, J., Statistics in Spectroscopy, 1st ed. (Academic Press, New York, 1991). 2. Draper, N. and Smith, H., Applied Regression Analysis (John Wiley & Sons, New York, 1981). 3. Zar, J.H., Biostatistical Analysis (Prentice Hall, Englewood Cliffs, NJ, 1974).
36 Collaborative Laboratory Studies: Part 3 – Testing for Systematic Error
TESTING FOR SYSTEMATIC ERROR IN A METHOD: COMPARISON TEST FOR A SET OF MEASUREMENTS VERSUS TRUE VALUE – SPIKED RECOVERY METHOD (COMPARET WORKSHEET)
The Student’s (W.S. Gosset) t-test is useful for comparisons of the means and standard deviations of different analytical test methods. Descriptions of the theory and use of this statistic are readily available in standard statistical texts, including those in the references [1–6]. Use of this test will indicate whether the difference between a set of measurements and the true (known) value for those measurements is statistically meaningful.
For Table 36-1, the METHOD B test results for each of the locations are compared to the known spiked analyte value for each sample. This statistical test indicates that METHOD B results are lower than the known analyte values for Sample No. 5 (Lab 1 and Lab 2) and Sample No. 6 (Lab 1). The METHOD B reported value is higher for Sample No. 6 (Lab 2). Average results for this test indicate that METHOD B may result in analytical values trending lower than actual values.
For Table 36-2, the METHOD A results for each of the locations are compared to the known spiked analyte value for each sample. This statistical test indicates that METHOD A results are lower than the known analyte values for Sample Nos. 4–6 for both Lab 1 and Lab 2. Average results for this test indicate that METHOD A is consistently lower than actual values.
Table 36-1 Comparison of METHOD B test results to true value
Sample No.            4               4               5               5               6               6
Method–Location       METHOD B–LAB 1  METHOD B–LAB 2  METHOD B–LAB 1  METHOD B–LAB 2  METHOD B–LAB 1  METHOD B–LAB 2
t-test for bias       1.06            0.76            8.37            9.06            6.73            2.94
t-critical for bias   2.776           2.776           2.776           2.776           2.776           2.776
Difference            —               —               −0.038          −0.062          −0.037          0.061
Bias                  No              No              Yes             Yes             Yes             Yes
Table 36-2 Comparison of METHOD A results to true value
Sample No.            4               4               5               5               6               6
Method–Location       METHOD A–LAB 1  METHOD A–LAB 2  METHOD A–LAB 1  METHOD A–LAB 2  METHOD A–LAB 1  METHOD A–LAB 2
t-test for bias       19.52           9.0             5.98            6.0             6.84            7.07
t-critical for bias   2.776           2.776           2.776           2.776           2.776           2.776
Difference            −0.036          −0.018          −0.069          −0.036          −0.058          −0.050
Bias                  Yes             Yes             Yes             Yes             Yes             Yes
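The comparison of replicate results against a spiked true value can be sketched in Python as follows. This is a minimal illustration, not the CompareT worksheet itself: the function name is hypothetical, the replicates are the rounded Sample 5 values for METHOD B, Lab 1 as listed in the Collabor_TV worksheet of Chapter 39, and because that listing is rounded the t value need not reproduce the Table 36-1 entry exactly.

```python
import math
import statistics

def t_vs_true_value(measurements, true_value):
    """Return (mean - true value, t statistic) for replicate results vs. a known value."""
    n = len(measurements)
    mean = statistics.mean(measurements)
    s = statistics.stdev(measurements)        # sample standard deviation (n - 1 divisor)
    standard_error = s / math.sqrt(n)         # standard deviation of the mean
    difference = mean - true_value
    return difference, difference / standard_error

# Rounded replicates for Sample 5, METHOD B, Lab 1 (Collabor_TV listing); spiked value 3.61
x05 = [3.56, 3.57, 3.56, 3.58, 3.59]
T_CRITICAL = 2.776   # two-tailed 5% value for 4 degrees of freedom, as in Table 36-1

difference, t_stat = t_vs_true_value(x05, true_value=3.61)
biased = abs(t_stat) > T_CRITICAL
print(f"difference = {difference:.3f}, |t| = {abs(t_stat):.2f}, bias = {biased}")
```

A negative difference with |t| above the critical value corresponds to a "Yes" entry with a negative Difference in the tables above.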
REFERENCES 1. MathCad; MathSoft, Inc.: 101 Main Street, Cambridge, MA 02142; Vol. v. 7.0 (1997). 2. Youden, W.J. and Steiner, E.H., Statistical Manual of the AOAC, 1st ed. (Association of Official Analytical Chemists, Washington, DC, 1975). 3. Mark, H. and Workman, J., Statistics in Spectroscopy, 1st ed. (Academic Press, New York, 1991). 4. Draper, N. and Smith, H., Applied Regression Analysis (John Wiley & Sons, New York, 1981). 5. Zar, J.H., Biostatistical Analysis (Prentice Hall, Englewood Cliffs, NJ, 1974). 6. Owen, D.B., Handbook of Statistical Tables (Addison-Wesley Publishing Co., Inc., Reading, MA, 1962).
37
Collaborative Laboratory Studies: Part 4 – Ranking Test
RANKING TEST FOR LABORATORIES AND METHODS (MANUAL COMPUTATIONS)
The ranking test for laboratories provides for the calculation of individual ranks for each laboratory or method using the averaged results collected for all replicates and all methods/locations. The summary of averaged analytical results discussed in this series is shown in Table 37-1a. These compiled results are assigned ranks by column from the largest to the smallest reported analytical values. The largest analytical result in each column receives a score of 1, whereas the smallest result receives the largest number. When two results in a column are identical, a 0.5 is added to the rank number, and the subsequent number is not used. Note column 1 in Table 37-1a; both row 1 and row 2 have the identical value of 3.485 and are assigned 1.5 as rank score values. Note that rank 2 is not used due to the tie, and the lower analytical results are given ranks 3 and 4, respectively. The rows are summed resulting in a rank score as column #8, Table 37-1b.
Table 37-1a Results table for ranking test
                      Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6
L1: METHOD B–LAB 1    3.485     3.467     3.356     3.391     3.571     3.763
L2: METHOD B–LAB 2    3.485     3.506     3.379     3.391     3.548     3.861
L3: METHOD A–LAB 1    3.468     3.542     3.324     3.364     3.541     3.741
L4: METHOD A–LAB 2    3.450     3.460     3.330     3.380     3.570     3.740
Table 37-1b Ranked results table
                      Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6  Score∗
L1: METHOD B–LAB 1    1.5       2         2         1.5       1         2         10
L2: METHOD B–LAB 2    1.5       1         1         1.5       3         1         9
L3: METHOD A–LAB 1    3         3         4         4         4         3         21
L4: METHOD A–LAB 2    4         4         3         3         2         4         20
∗ If an individual laboratory score is equal to or outside of the limit boundaries, then we conclude that there is a pronounced systematic error present between the laboratory, or laboratories, with the extreme score. In this particular case the limits are 8–22.
Table 37-1c Approximate 5% two-tail limits for laboratory ranking scores (from Ref. [1])
No. of             Number of samples
locations/tests    3      4      5      6      7      8      9      10
3                  —      4–12   5–15   7–17   8–20   10–22  12–24  13–27
4                  —      4–16   6–19   8–22   10–25  12–28  14–31  16–34
5                  —      5–19   7–23   9–27   11–31  13–35  16–38  18–42
6                  3–18   5–23   7–28   10–32  12–37  15–41  18–45  21–49
7                  3–21   5–27   8–32   11–37  14–42  17–47  20–52  23–57
8                  3–24   6–30   9–36   12–42  15–48  18–54  22–59  25–65
9                  3–27   6–34   9–41   13–47  16–54  20–60  24–66  27–73
10                 4–29   7–37   10–45  14–52  17–60  21–67  26–73  30–80
The score values are compared to a statistical table of values found in reference [1]. This table is partially reproduced as Table 37-1c. If an individual laboratory score is equal to or outside of the limit boundaries, then we conclude that there is a pronounced systematic error present between the laboratory, or laboratories, with the extreme score. In this particular case the limits are 8 to 22, therefore there is no significant systematic error in the methods as determined using this test.
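The ranking rule and the comparison against the limits can be sketched in Python. This is an illustrative sketch under stated assumptions: the helper name rank_descending is hypothetical, ties are handled by giving tied results the mean of the ranks they occupy (which yields 1.5 for two results tied for first place, as described above), and the row scores are summed from the ranks as published in Table 37-1b.

```python
def rank_descending(values):
    """Rank from largest (rank 1) to smallest; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # best rank this value could occupy
        count = ordered.count(v)                # number of tied results
        ranks.append(first + (count - 1) / 2)   # two results tied for 1st both get 1.5
    return ranks

# Sample 1 column of Table 37-1a (rows L1 to L4)
sample_1 = [3.485, 3.485, 3.468, 3.450]
sample_1_ranks = rank_descending(sample_1)

# Ranks per row as published in Table 37-1b
ranked = {
    "L1": [1.5, 2, 2, 1.5, 1, 2],
    "L2": [1.5, 1, 1, 1.5, 3, 1],
    "L3": [3, 3, 4, 4, 4, 3],
    "L4": [4, 4, 3, 3, 2, 4],
}
LOWER, UPPER = 8, 22   # Table 37-1c entry for 4 locations and 6 samples

scores = {row: sum(r) for row, r in ranked.items()}
# A score equal to or outside the limits flags a pronounced systematic error
flagged = [row for row, s in scores.items() if s <= LOWER or s >= UPPER]
print(sample_1_ranks, scores, flagged)
```

With the published ranks, every score lies strictly inside the limits, so no row is flagged.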
REFERENCE 1. Youden, W.J. and Steiner, E.H., Statistical Manual of the AOAC, 1st ed. (Association of Official Analytical Chemists, Washington, DC, 1975).
38
Collaborative Laboratory Studies: Part 5 – Efficient
Comparison of Two Methods
COMPUTATIONS FOR EFFICIENT COMPARISON OF TWO METHODS (COMP_METH WORKSHEET)
The section following shows a statistical test (text for the Comp_Meth MathCad Worksheet) for the efficient comparison of two analytical methods. This test requires that replicate measurements be made on two different samples using two different analytical methods. The test will determine whether there is a significant difference in the precision and accuracy for the two methods. It will also determine whether there is significant systematic error between the methods, and calculate the magnitude of that error (as bias). This efficient statistical test requires the minimum data collection and analysis for the comparison of two methods. The experimental design for data collection has been shown graphically in Chapter 35 (Figure 35-2), with the numerical data for this test given in Table 38-1. Two methods are used to analyze two different samples, with approximately five replicate measurements per sample as shown graphically in the previously mentioned figure.
The analytical results can immediately be plotted using the Youden/Steiner two-sample graphic shown in Figure 38-1. This graphic gives a rapid method for visually determining if the reported analytical values contain systematic error. The presence of systematic error is indicated by the occurrence of two-sample plot points that are found in the lower left and upper right quadrants of the charts. The presence of points in these quadrants indicates that high analyte value samples are biased to the high end, and low analyte containing samples are biased to the low end. Analytical methods not exhibiting systematic (bias) errors should have randomly distributed two-sample plot points throughout all the quadrants of the chart. Figure 38-1 gives an indication that METHOD A has a negative bias, and that METHOD B is more random. However, the range of the axes is much lower for Method A, indicating that the overall bias is quite small, and significantly less than Method B.
The calculations for the efficient two-method comparison are shown in Table 38-2 and the subsequent equations following. The mathematical expressions are given in MathCad symbolic notation showing that the difference is taken for each replicate set of X and Y and the mean is computed. Then the sum for each replicate set of X and Y is calculated and the mean is computed. The difference in the sums is computed (as d) and the differences are summed and reported as an absolute value (as Σd). The mean difference is calculated as mean(d). Each X and Y result contains the systematic error of the analytical method for its respective laboratory, noting that the systematic error is assumed to be identical for
Table 38-1 Analytical data entry for comparison of two methods tests
          METHOD A              METHOD B
          Sample X   Sample Y   Sample X   Sample Y
          3.366      3.741      3.421      3.764
          3.380      3.740      3.407      3.860
          3.360      3.740      3.377      3.742
          3.380      3.760      3.400      3.833
Mean      3.372      3.745      3.401      3.800
[Figure 38-1: two panels. METHOD A: AY versus AX, with mean(AY) and mean(AX) marked. METHOD B: BY versus BX, with mean(BY) and mean(BX) marked.]
Figure 38-1 Two-sample charts illustrating systematic errors for Methods A vs. B.
Table 38-2 Calculations for comparison tests
METHOD A: $ADxy := |AX - AY|$, $\mathrm{mean}(ADxy) = 0.374$; $ATxy := (AX + AY)$, $\mathrm{mean}(ATxy) = 7.117$
METHOD B: $BDxy := |BX - BY|$, $\mathrm{mean}(BDxy) = 0.399$; $BTxy := (BX + BY)$, $\mathrm{mean}(BTxy) = 7.201$
$d := |ATxy - BTxy|$, $\sum d = 0.337$; Mean Difference: $\mathrm{mean}(d) = 0.084$; $d2 := BTxy - ATxy$
X and Y for each method. When the difference between X and Y is calculated (as d) the systematic error drops out so that the difference (d) between X and Y contains no systematic errors, only random errors. We then estimate the precision by using the difference quantities. The difference between the true analyte concentrations of X and Y represents the true analyte difference between X and Y without the systematic error, but
with the random errors. The relative precision between the two methods is calculated using Table 38-2 and equations 38-1 and 38-2. The F-statistic used to compare the sizes of the Method A vs. Method B precision values is given by equation 38-5 and is compared to the F-statistic table value (equation 38-7). The null hypothesis (Ho) states that there is no difference in the precision of the two methods, whereas the alternate hypothesis (Ha) states that there is a difference in the precision. For the methods compared in this study the precision value for METHOD B is significantly larger (i.e., METHOD B is less precise) than that for METHOD A. Method A precision is 0.007, whereas Method B precision is 0.037, representing a 5.3 factor increase.
When summing the X and Y values, the systematic contribution is found twice. The two used in the denominator is indicative of the error contribution from each independent set of results (i.e., X and Y). Given independent random errors only, the standard deviation of the sum of two measurements X and Y would be identical to the standard deviation of the differences between the two measurements X and Y. In the absence of any systematic error, Sr² and Sd² estimate the same variance. In the presence of systematic error, Sd² is large compared to Sr². The larger the Sd², the greater is the systematic error contribution. The relative systematic error between the two methods is calculated using Table 38-2 and equations 38-3 and 38-4. The F-statistic used to compare the sizes of the Method A vs. Method B systematic error values is given by equation 38-6 and is compared to the F-statistic table value (equation 38-7). The null hypothesis (Ho) states that there is no difference in the systematic error found in the two methods, whereas the alternate hypothesis (Ha) states that there is a difference in the size of the systematic error. For the methods compared in this study there is a significantly larger systematic error for METHOD B as compared to METHOD A.
The test to determine whether the bias is significant incorporates the Student’s t-test. The method for calculating the t-test statistic is shown in equation 38-10 using MathCad symbolic notation. Equation 38-8 is used to calculate the standard deviation of the differences between the sums of X and Y for analytical methods A and B, whereas equation 38-9 is used to calculate the standard deviation of the mean. The t-table statistic for comparison of the test statistic is given in equations 38-11 and 38-12. The F-statistic and t-statistic tables can be found in standard statistical texts such as references [1–3]. The null hypothesis (Ho) states that there is no systematic difference between the two methods, whereas the alternate hypothesis (Ha) states that there is a significant systematic difference between the methods. It can be seen from these results that the bias is significant between these two methods and that METHOD B has results biased by 0.084 above the results obtained by METHOD A. The estimated bias is given by the Mean Difference calculation.
Measuring the Precision and Standard Deviation of the Methods (Youden/Steiner)
Note that for the calculations of precision and standard deviation (equations 38-1 through 38-4), the denominator expression is given as 2(n − 1). This expression is used due to the 2 times error contribution from independent errors found in each independent set (i.e., X and Y) of results.
Precision (Sr)
$$ASr := \sqrt{\frac{1}{2\,(nY-1)}\sum\left(ADxy-\mathrm{mean}(ADxy)\right)^{2}} \tag{38-1}$$
$$ASr = 6.692658\cdot10^{-3}$$
$$BSr := \sqrt{\frac{1}{2\,(nY-1)}\sum\left(BDxy-\mathrm{mean}(BDxy)\right)^{2}} \tag{38-2}$$
$$BSr = 0.037334$$
Standard deviation (Sd)
$$ASd := \sqrt{\frac{1}{2\,(nY-1)}\sum\left(ATxy-\mathrm{mean}(ATxy)\right)^{2}} \tag{38-3}$$
$$ASd = 0.012428$$
$$BSd := \sqrt{\frac{1}{2\,(nY-1)}\sum\left(BTxy-\mathrm{mean}(BTxy)\right)^{2}} \tag{38-4}$$
$$BSd = 0.045387$$
F-statistic calculation (Fs) for precision ratio
Sr² ratio:
$$PFs := \frac{BSr^{2}}{ASr^{2}} \tag{38-5}$$
$$PFs = 31.118$$
Ho: If Fs is less than or equal to Ft, then there is NO DIFFERENCE in precision estimation.
Ha: If Fs is greater than Ft, then there is a DIFFERENCE in precision estimation.
F-statistic calculation (Fs) for presence of systematic errors
Sd² ratio:
$$SFs := \frac{BSd^{2}}{ASd^{2}} \tag{38-6}$$
$$SFs = 13.337$$
Ho: If Fs is less than or equal to Ft, then there is NO DIFFERENCE in systematic error.
Ha: If Fs is greater than Ft, then there is a DIFFERENCE in systematic error.
F-statistic table value (Ft)
$$df1 := nY - 1 \qquad df1 = 3 \qquad qF(0.95,\, df1,\, df1) = 9.277 \tag{38-7}$$
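Equations 38-1 through 38-6 can be checked with a short Python sketch using the four replicate pairs of Table 38-1. The function name youden_sr_sd is illustrative only (it is not part of the MathCad worksheet), and the table value 9.277 is taken from equation 38-7 rather than computed.

```python
import math

def youden_sr_sd(x, y):
    """Return (Sr, Sd) from paired results on samples X and Y (2(n - 1) divisor)."""
    n = len(x)
    diffs = [abs(a - b) for a, b in zip(x, y)]    # Dxy
    totals = [a + b for a, b in zip(x, y)]        # Txy

    def spread(values):
        m = sum(values) / n
        return math.sqrt(sum((v - m) ** 2 for v in values) / (2 * (n - 1)))

    return spread(diffs), spread(totals)

# Table 38-1 replicate data
ax = [3.366, 3.380, 3.360, 3.380]
ay = [3.741, 3.740, 3.740, 3.760]
bx = [3.421, 3.407, 3.377, 3.400]
by = [3.764, 3.860, 3.742, 3.833]

a_sr, a_sd = youden_sr_sd(ax, ay)
b_sr, b_sd = youden_sr_sd(bx, by)
pfs = b_sr ** 2 / a_sr ** 2    # precision ratio, equation 38-5
sfs = b_sd ** 2 / a_sd ** 2    # systematic error ratio, equation 38-6
F_TABLE = 9.277                # qF(0.95, 3, 3), equation 38-7

print(f"ASr={a_sr:.6f} BSr={b_sr:.6f} ASd={a_sd:.6f} BSd={b_sd:.6f}")
print(f"PFs={pfs:.3f} SFs={sfs:.3f} both exceed Ft: {pfs > F_TABLE and sfs > F_TABLE}")
```

Both ratios exceed the table value, which is the basis for the conclusion that METHOD B shows the larger precision value and the larger systematic error contribution.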
Student’s t-test for the difference in the biases between two methods
Mean Difference: $\mathrm{mean}(d) = 0.084$
$$s := \sqrt{\frac{1}{df1}\sum\left(d2-\mathrm{mean}(d)\right)^{2}} \tag{38-8}$$
$$s = 0.053$$
$$sm := \frac{s}{\sqrt{nY}} \tag{38-9}$$
$$sm = 0.026$$
Calculate t-test statistic:
$$Te := \frac{\mathrm{mean}(d)}{sm} \tag{38-10}$$
$$Te = 3.201$$
Enter alpha value as α2: $\alpha_2 := 0.95$
Calculate t-table value:
$$\alpha_1 := \frac{\alpha_2 + 1}{2} \tag{38-11}$$
$$\alpha_1 = 0.975$$
$$t := qt(\alpha_1,\, df1) \tag{38-12}$$
t-table value: $t = 3.182$
Ho: If Te is less than or equal to the t-table value, then there is NO SYSTEMATIC DIFFERENCE between method results.
Ha: If Te is greater than t, then there is a SYSTEMATIC DIFFERENCE (BIAS) between method results.
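The bias test of equations 38-8 through 38-10 can be reproduced with the following Python sketch, again using the Table 38-1 data. Variable names mirror the worksheet symbols where possible; the t-table value 3.182 is taken from equation 38-12 rather than computed.

```python
import math

# Table 38-1 replicate data
ax = [3.366, 3.380, 3.360, 3.380]
ay = [3.741, 3.740, 3.740, 3.760]
bx = [3.421, 3.407, 3.377, 3.400]
by = [3.764, 3.860, 3.742, 3.833]

a_totals = [x + y for x, y in zip(ax, ay)]            # ATxy
b_totals = [x + y for x, y in zip(bx, by)]            # BTxy
d2 = [b - a for a, b in zip(a_totals, b_totals)]      # d2 := BTxy - ATxy

n = len(d2)
df1 = n - 1
mean_d = sum(abs(v) for v in d2) / n                  # mean(d), with d := |ATxy - BTxy|
s = math.sqrt(sum((v - mean_d) ** 2 for v in d2) / df1)   # equation 38-8
sm = s / math.sqrt(n)                                 # equation 38-9
te = mean_d / sm                                      # equation 38-10
T_TABLE = 3.182                                       # qt(0.975, 3), equation 38-12

print(f"mean(d)={mean_d:.3f} s={s:.3f} sm={sm:.3f} Te={te:.3f} bias: {te > T_TABLE}")
```

Because Te only slightly exceeds the table value, the conclusion of a significant bias is sensitive to small changes in the data.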
SUMMARY
This set of articles presents the computational details and actual values for each of the statistical methods shown for collaborative tests. These methods include the use of precision and estimated accuracy comparisons, ANOVA tests, Student’s t-testing, the Rank Test for Method Comparison, and the Efficient Comparison of Methods tests. From using these statistical tests the following conclusions can be derived:
1. Both analytical methods are quite precise and accurate, therefore the production samples are below target value concentration.
2. The precision for METHOD B is significantly larger than METHOD A, indicating METHOD A is more precise than METHOD B.
3. There is no correlation of analytical error with concentration over the range tested (i.e., 3.40–3.80% analyte).
4. Analytical results comparing METHOD B and METHOD A will show significant variation due to the high precision of both analytical methods.
5. There is no operator/laboratory bias between labs for METHOD B.
6. There is no operator/laboratory bias between labs for METHOD A.
7. There is a significant bias between METHOD B and METHOD A; METHOD B yields higher results.
8. Both METHOD B and METHOD A results trend lower than actual values, but by small quantities (approximately −0.04% at the target value of 3.60%).
9. The laboratory ranking test did not show any laboratory or method outside of confidence limits, therefore neither method nor laboratory is consistently high or low in reported results.
10. METHOD B precision is a factor of 5.3 times greater than that of METHOD A.
11. The systematic error contribution is larger for METHOD B than METHOD A.
12. METHOD B is biased to +0.084 as compared to METHOD A.
ACKNOWLEDGEMENT The real analytical data used for Chapters 34–38 was graciously provided by Dan Devine of Kimberly-Clark Analytical Science & Technology.
REFERENCES 1. MathCad; MathSoft, Inc.: 101 Main Street, Cambridge, MA 02142 (1997). 2. Mark, H. and Workman, J., Spectroscopy 10(1), 17–20 (1995). 3. Mark, H. and Workman, J., Spectroscopy 4(7), 53–54 (1989).
39
Collaborative Laboratory Studies: Part 6 – MathCad
Worksheet Text
The MathCad worksheets used for this Chemometrics in Spectroscopy collaborative study series are given below in hard copy format. Unless otherwise noted, the worksheets have been written by the authors. The text files for the MathCad v7.0 Worksheets used for the statistical tests in this report are attached as Collabor_GM, Collabor_TV, ANOVA_s4, ANOVA_s2, CompareT, and Comp_Meth. References [1–11] are excellent sources of information on the details of these statistical methods.
Collabor_GM
Collaborative Test Worksheet
-------------------------
RAW DATA ENTRY:
X01: 3.51 3.46 3.47 3.50 3.49
X02: 3.51 3.50 3.50 3.47 3.45
X03: 3.46 3.44 3.46 3.52 3.46
X04: 3.46 3.44 3.45
X05: 3.48 3.45 3.46 3.46 3.48
X06: 3.50 3.66 3.47 3.45 3.45
X07: 3.45 3.45 3.46 3.46 3.46
X08: 3.46 3.47 3.45
X09: 3.37 3.36 3.35 3.35 3.35
X10: 3.37 3.33 3.39 3.43 3.38
X11: 3.32 3.33 3.33 3.32 3.32
X12: 3.34 3.32 3.34
Mean values for Data:
n01 := rows(X01)   n02 := rows(X02)   n03 := rows(X03)   n04 := rows(X04)
mean(X01) = 3.485   mean(X02) = 3.485   mean(X03) = 3.468   mean(X04) = 3.45
n05 := rows(X05)   n06 := rows(X06)   n07 := rows(X07)   n08 := rows(X08)
mean(X05) = 3.467   mean(X06) = 3.506   mean(X07) = 3.452   mean(X08) = 3.46
n09 := rows(X09)   n10 := rows(X10)   n11 := rows(X11)   n12 := rows(X12)
mean(X09) = 3.356   mean(X10) = 3.379   mean(X11) = 3.324   mean(X12) = 3.3303
--------------------------------------------------------
GRAND MEANS FOR EACH ROW (USE IF NO “TRUE VALUE” IS AVAILABLE):
$$GM1 := \frac{\mathrm{mean}(X01)+\mathrm{mean}(X02)+\mathrm{mean}(X03)+\mathrm{mean}(X04)}{4}$$
$$GM2 := \frac{\mathrm{mean}(X05)+\mathrm{mean}(X06)+\mathrm{mean}(X07)+\mathrm{mean}(X08)}{4}$$
$$GM3 := \frac{\mathrm{mean}(X09)+\mathrm{mean}(X10)+\mathrm{mean}(X11)+\mathrm{mean}(X12)}{4}$$
GRAND MEANS FOR EACH ROW: GM1 = 3.472   GM2 = 3.47115   GM3 = 3.347433
COMPUTATIONS FOR PRECISION AND ACCURACY:
Precision (the same expression is evaluated for each data set X01 through X12; shown here for X01):
$$SDp(X01) := \sqrt{\frac{1}{n01-1}\sum\left(X01-\mathrm{mean}(X01)\right)^{2}}$$
SDp(X01) = 0.02          SDp(X02) = 0.025   SDp(X03) = 8.888 · 10⁻³   SDp(X04) = 8.888 · 10⁻³
SDp(X05) = 0.013         SDp(X06) = 0.088   SDp(X07) = 6.557 · 10⁻³   SDp(X08) = 0.01
SDp(X09) = 7.918 · 10⁻³   SDp(X10) = 0.037   SDp(X11) = 6.812 · 10⁻³   SDp(X12) = 0.012
Accuracy (X01 through X04 are taken about GM1, X05 through X08 about GM2, and X09 through X12 about GM3; shown here for X01):
$$SDa(X01) := \sqrt{\frac{1}{n01-1}\sum\left(X01-GM1\right)^{2}}$$
SDa(X01) = 0.025   SDa(X02) = 0.029   SDa(X03) = 0.029   SDa(X04) = 0.029
SDa(X05) = 0.014   SDa(X06) = 0.096   SDa(X07) = 0.031   SDa(X08) = 0.017
SDa(X09) = 0.012   SDa(X10) = 0.051   SDa(X11) = 0.037   SDa(X12) = 0.024
Pooled Standard Deviations (As Precision):
Row 1:
$$SpR1 := \sqrt{\frac{\sum(X01-\mathrm{mean}(X01))^{2}+\sum(X02-\mathrm{mean}(X02))^{2}+\sum(X03-\mathrm{mean}(X03))^{2}+\sum(X04-\mathrm{mean}(X04))^{2}}{n01+n02+n03+n04-4}}$$
SpR1 = 0.0231474
Row 2:
$$SpR2 := \sqrt{\frac{\sum(X05-\mathrm{mean}(X05))^{2}+\sum(X06-\mathrm{mean}(X06))^{2}+\sum(X07-\mathrm{mean}(X07))^{2}+\sum(X08-\mathrm{mean}(X08))^{2}}{n05+n06+n07+n08-4}}$$
SpR2 = 0.0478817
Row 3:
$$SpR3 := \sqrt{\frac{\sum(X09-\mathrm{mean}(X09))^{2}+\sum(X10-\mathrm{mean}(X10))^{2}+\sum(X11-\mathrm{mean}(X11))^{2}+\sum(X12-\mathrm{mean}(X12))^{2}}{n09+n10+n11+n12-4}}$$
SpR3 = 0.021
Pooled Standard Deviations (As Accuracy):
Row 1:
$$SpR1 := \sqrt{\frac{\sum(X01-GM1)^{2}+\sum(X02-GM1)^{2}+\sum(X03-GM1)^{2}+\sum(X04-GM1)^{2}}{n01+n02+n03+n04-4}}$$
SpR1 = 0.0277715
Row 2:
$$SpR2 := \sqrt{\frac{\sum(X05-GM2)^{2}+\sum(X06-GM2)^{2}+\sum(X07-GM2)^{2}+\sum(X08-GM2)^{2}}{n05+n06+n07+n08-4}}$$
SpR2 = 0.0537719
Row 3:
$$SpR3 := \sqrt{\frac{\sum(X09-GM3)^{2}+\sum(X10-GM3)^{2}+\sum(X11-GM3)^{2}+\sum(X12-GM3)^{2}}{n09+n10+n11+n12-4}}$$
SpR3 = 0.033
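The distinction the Collabor_GM worksheet draws between precision (deviations about each data set's own mean) and accuracy (deviations about the row grand mean), together with the pooled standard deviation, can be sketched in Python. The helper names are hypothetical, the data are the rounded Row 1 values (X01 through X04) from the raw data listing, and results computed from the rounded listing need not match the worksheet output to every digit.

```python
import math

def sd_about(values, center):
    """Square root of summed squared deviations from `center`, n - 1 divisor."""
    return math.sqrt(sum((v - center) ** 2 for v in values) / (len(values) - 1))

def pooled_sd(groups, centers):
    """Pooled standard deviation; degrees of freedom = total n minus number of groups."""
    ss = sum((v - c) ** 2 for g, c in zip(groups, centers) for v in g)
    dof = sum(len(g) for g in groups) - len(groups)
    return math.sqrt(ss / dof)

# Row 1 of the Collabor_GM raw data entry (rounded values as listed)
row1 = [
    [3.51, 3.46, 3.47, 3.50, 3.49],   # X01
    [3.51, 3.50, 3.50, 3.47, 3.45],   # X02
    [3.46, 3.44, 3.46, 3.52, 3.46],   # X03
    [3.46, 3.44, 3.45],               # X04
]
means = [sum(g) / len(g) for g in row1]
gm1 = sum(means) / len(means)                    # grand mean of the four means

sdp_x01 = sd_about(row1[0], means[0])            # precision: about its own mean
sda_x01 = sd_about(row1[0], gm1)                 # accuracy: about the grand mean
sp_precision = pooled_sd(row1, means)
sp_accuracy = pooled_sd(row1, [gm1] * len(row1))

print(f"SDp(X01)={sdp_x01:.4f} SDa(X01)={sda_x01:.4f}")
print(f"pooled precision={sp_precision:.4f} pooled accuracy={sp_accuracy:.4f}")
```

Since deviations about a group's own mean give the smallest possible sum of squares, the accuracy figures can never be smaller than the corresponding precision figures.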
Measuring Precision without Duplicates (Youden/Steiner):
------------------------------------------------
RAW DATA ENTRY
(Enter single Determinations for Sample X from different laboratories or operators):
Sample X:   LAB #1  3.51   LAB #2  3.51   LAB #3  3.46   LAB #4  3.46
nX := rows(X)   mean(X) = 3.484
(Enter single Determinations for Sample Y from different laboratories or operators):
Sample Y:   LAB #1  3.48   LAB #2  3.50   LAB #3  3.45   LAB #4  3.46
nY := rows(Y)   mean(Y) = 3.47
[Figure: Y versus X for the four laboratories, with mean(Y) and mean(X) marked.]
Two-sample Chart illustrating systematic errors
CALCULATIONS:
Dxy := (X − Y)   Txy := (X + Y)   mean(Dxy) = 0.014   mean(Txy) = 6.955
Precision (Sr):
$$Sr := \sqrt{\frac{1}{2\,(nY-1)}\sum\left(Dxy-\mathrm{mean}(Dxy)\right)^{2}}$$
Sr = 8.276473 · 10⁻³
Measuring the Standard Deviation of the Data (Youden/Steiner):
-----------------------------------------------------
Standard Deviation (Sd):
$$Sd := \sqrt{\frac{1}{2\,(nY-1)}\sum\left(Txy-\mathrm{mean}(Txy)\right)^{2}}$$
Sd = 0.033653
Statistical Test for presence of systematic errors (Youden/Steiner):
------------------------------------------------------
F-statistic Calculation (Fs):
$$Fs := \frac{Sd^{2}}{Sr^{2}}$$
Fs = 16.533
F-statistic Table Value (Ft): df1 := nY − 1   df1 = 3   qF(0.95, df1, df1) = 9.277
Test Criteria:
If Fs is less than or equal to Ft, then there is NO SYSTEMATIC ERROR
If Fs is greater than Ft, then there is SYSTEMATIC ERROR (BIAS)
Standard Deviation estimate for the distribution of systematic errors (Sb2):
$$Sb2 := \frac{Sd^{2}-Sr^{2}}{2}$$
Sb2 = 5.32 · 10⁻⁴
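As a hedged illustration of the single-set Youden/Steiner test and the Sb2 estimate, the sketch below uses the four rounded laboratory results for Samples X and Y listed in the Collabor_GM data entry. Because the listing is rounded to two decimals, the computed Fs and Sb2 are close to, but not identical with, the worksheet output; the conclusion relative to the table value 9.277 is the same.

```python
# Single determinations per laboratory (rounded values from the Collabor_GM listing)
x = [3.51, 3.51, 3.46, 3.46]
y = [3.48, 3.50, 3.45, 3.46]

n = len(x)
dxy = [a - b for a, b in zip(x, y)]     # differences: systematic error cancels
txy = [a + b for a, b in zip(x, y)]     # totals: systematic error counted twice

def variance_2n(values):
    """Sum of squared deviations divided by 2(n - 1), i.e. Sr^2 or Sd^2."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (2 * (len(values) - 1))

sr2 = variance_2n(dxy)                  # Sr^2, random error only
sd2 = variance_2n(txy)                  # Sd^2, random plus systematic error
fs = sd2 / sr2
sb2 = (sd2 - sr2) / 2                   # estimate for the systematic error distribution
F_TABLE = 9.277                         # qF(0.95, 3, 3) from the worksheet

print(f"Fs={fs:.3f} Sb2={sb2:.3e} systematic error: {fs > F_TABLE}")
```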
Collabor_TV
Collaborative Test Worksheet
-------------------------
RAW DATA ENTRY:
X01: 3.42 3.38 3.40 3.38 3.38
X02: 3.41 3.40 3.42 3.35 3.38
X03: 3.37 3.36 3.36 3.36 3.37
X04: 3.38 3.38 3.38 3.38 3.38
X05: 3.56 3.57 3.56 3.58 3.59
X06: 3.54 3.55 3.57 3.53 3.54
X07: 3.54 3.54 3.54 3.54 3.54
X08: 3.56 3.58 3.59 3.58 3.56
X09: 3.76 3.74 3.77 3.77 3.77
X10: 3.86 3.83 3.93 3.87 3.81
X11: 3.74 3.74 3.74 3.74 3.74
X12: 3.74 3.76 3.73 3.77 3.75
Mean Values for Data Rows:
n01 := rows(X01)   n02 := rows(X02)   n03 := rows(X03)   n04 := rows(X04)
mean(X01) = 3.391   mean(X02) = 3.391   mean(X03) = 3.364   mean(X04) = 3.38
n05 := rows(X05)   n06 := rows(X06)   n07 := rows(X07)   n08 := rows(X08)
mean(X05) = 3.571   mean(X06) = 3.548   mean(X07) = 3.541   mean(X08) = 3.574
n09 := rows(X09)   n10 := rows(X10)   n11 := rows(X11)   n12 := rows(X12)
mean(X09) = 3.763   mean(X10) = 3.861   mean(X11) = 3.741   mean(X12) = 3.75
----------------------------------------------------------
ENTER TRUE VALUES FOR EACH ROW (SPIKED RECOVERY SAMPLES):
TR1 := 3.40   TR2 := 3.61   TR3 := 3.80
COMPUTATIONS FOR PRECISION AND ACCURACY:
Precision (the same expression is evaluated for each data set X01 through X12; shown here for X01):
$$SDp(X01) := \sqrt{\frac{1}{n01-1}\sum\left(X01-\mathrm{mean}(X01)\right)^{2}}$$
SDp(X01) = 0.019   SDp(X02) = 0.025   SDp(X03) = 0               SDp(X04) = 0
SDp(X05) = 0.01    SDp(X06) = 0.015   SDp(X07) = 2.588 · 10⁻³    SDp(X08) = 0.013
SDp(X09) = 0.012   SDp(X10) = 0.047   SDp(X11) = 1.924 · 10⁻³    SDp(X12) = 0.016
Accuracy (X01 through X04 are taken about TR1, X05 through X08 about TR2, and X09 through X12 about TR3; shown here for X01):
$$SDa(X01) := \sqrt{\frac{1}{n01-1}\sum\left(X01-TR1\right)^{2}}$$
SDa(X01) = 0.022   SDa(X02) = 0.027   SDa(X03) = 0.041   SDa(X04) = 0.022
SDa(X05) = 0.044   SDa(X06) = 0.071   SDa(X07) = 0.077   SDa(X08) = 0.042
SDa(X09) = 0.043   SDa(X10) = 0.083   SDa(X11) = 0.066   SDa(X12) = 0.058
Pooled Standard Deviations (As Precision):
Row 1:
$$SpR1 := \sqrt{\frac{\sum(X01-\mathrm{mean}(X01))^{2}+\sum(X02-\mathrm{mean}(X02))^{2}+\sum(X03-\mathrm{mean}(X03))^{2}+\sum(X04-\mathrm{mean}(X04))^{2}}{n01+n02+n03+n04-4}}$$
SpR1 = 0.0159961
Row 2:
$$SpR2 := \sqrt{\frac{\sum(X05-\mathrm{mean}(X05))^{2}+\sum(X06-\mathrm{mean}(X06))^{2}+\sum(X07-\mathrm{mean}(X07))^{2}+\sum(X08-\mathrm{mean}(X08))^{2}}{n05+n06+n07+n08-4}}$$
SpR2 = 0.0114967
Row 3:
$$SpR3 := \sqrt{\frac{\sum(X09-\mathrm{mean}(X09))^{2}+\sum(X10-\mathrm{mean}(X10))^{2}+\sum(X11-\mathrm{mean}(X11))^{2}+\sum(X12-\mathrm{mean}(X12))^{2}}{n09+n10+n11+n12-4}}$$
SpR3 = 0.025
Pooled Standard Deviations (As Accuracy):
Row 1:
$$SpR1 := \sqrt{\frac{\sum(X01-TR1)^{2}+\sum(X02-TR1)^{2}+\sum(X03-TR1)^{2}+\sum(X04-TR1)^{2}}{n01+n02+n03+n04-4}}$$
SpR1 = 0.0289623
Row 2:
$$SpR2 := \sqrt{\frac{\sum(X05-TR2)^{2}+\sum(X06-TR2)^{2}+\sum(X07-TR2)^{2}+\sum(X08-TR2)^{2}}{n05+n06+n07+n08-4}}$$
SpR2 = 0.0608954
Row 3:
$$SpR3 := \sqrt{\frac{\sum(X09-TR3)^{2}+\sum(X10-TR3)^{2}+\sum(X11-TR3)^{2}+\sum(X12-TR3)^{2}}{n09+n10+n11+n12-4}}$$
SpR3 = 0.064
Measuring Precision without Duplicates (Youden/Steiner):
-----------------------------------------------
RAW DATA ENTRY
(Enter single Determinations for Sample X from different laboratories or operators):
Sample X:   LAB #1  3.42   LAB #2  3.41   LAB #3  3.37   LAB #4  3.38
nX := rows(X)   mean(X) = 3.394
(Enter single Determinations for Sample Y from different laboratories or operators):
Sample Y:   LAB #1  3.56   LAB #2  3.54   LAB #3  3.54   LAB #4  3.56
nY := rows(Y)   mean(Y) = 3.551
CALCULATIONS:
Dxy := (X − Y)   Txy := (X + Y)   mean(Dxy) = −0.157   mean(Txy) = 6.944
[Figure: Y versus X for the four laboratories, with mean(Y) and mean(X) marked.]
Two-sample Chart illustrating systematic errors
Precision (Sr):
$$Sr := \sqrt{\frac{1}{2\,(nY-1)}\sum\left(Dxy-\mathrm{mean}(Dxy)\right)^{2}}$$
Sr = 0.015805
Measuring the Standard Deviation of the Data (Youden/Steiner):
-----------------------------------------------------
Standard Deviation (Sd):
$$Sd := \sqrt{\frac{1}{2\,(nY-1)}\sum\left(Txy-\mathrm{mean}(Txy)\right)^{2}}$$
Sd = 0.023765
Statistical Test for presence of systematic errors (Youden/Steiner):
------------------------------------------------------
F-statistic Calculation (Fs):
$$Fs := \frac{Sd^{2}}{Sr^{2}}$$
Fs = 2.261
F-statistic Table Value (Ft): df1 := nY − 1   df1 = 3   qF(0.95, df1, df1) = 9.277
If Fs is less than or equal to Ft, then there is NO SYSTEMATIC ERROR
If Fs is greater than Ft, then there is SYSTEMATIC ERROR (BIAS)
Standard Deviation estimate for the distribution of systematic errors (Sb2):
$$Sb2 := \frac{Sd^{2}-Sr^{2}}{2}$$
Sb2 = 1.575 · 10⁻⁴
ANOVA_s4
ANOVA (Analysis of Variance) Test
-------------------------------------------------------
This Worksheet demonstrates using Mathcad’s F distribution function and programming operators to conduct an analysis of variance (ANOVA) test.
Enter sample data used in test:
An element of D represents the data collected with a particular factor.
Data Entry:
D0: 3.421 3.377 3.399 3.379 3.379
D1: 3.407 3.400 3.417 3.353 3.380
D2: 3.366 3.360 3.361 3.362 3.370
D3: 3.380 3.380 3.380 3.380 3.380
Enter level of significance α: α := 0.05
Program for conducting ANOVA test:

ANOVA(D, α) :=
    n_total ← 0
    SX ← 0
    SX2 ← 0
    T ← 0
    for i ∈ 0 .. last(D)
        SDi ← Σ D_i
        nDi ← length(D_i)
        SX ← SX + SDi
        SX2 ← SX2 + D_i · D_i
        T ← T + SDi² / nDi
        n_total ← n_total + nDi
    SS_factor ← T − SX² / n_total
    SS_error ← SX2 − T
    SS_total ← SX2 − SX² / n_total
    df_factor ← length(D) − 1
    df_error ← n_total − length(D)
    df_total ← n_total − 1
    Analysis_0 ← | SS_factor   df_factor   SS_factor / df_factor |
                 | SS_error    df_error    SS_error / df_error   |
                 | SS_total    df_total    0                     |
    Analysis_1 ← (Analysis_0)_{0,2} / (Analysis_0)_{1,2}
    Analysis_2 ← qF(1 − α, df_factor, df_error)
    Analysis_3 ← Analysis_1 < Analysis_2
    Analysis
Calculate Mean Values:
mean(D0) = 3.391    mean(D1) = 3.3914    mean(D2) = 3.3638    mean(D3) = 3.38

Conducting an analysis of variance:
For a given set of grouped data D and level of significance α:

ANOVA(D, α) = [ {3,3}   3.281   3.239   0 ]   (displayed as a column vector; the first element is the nested 3 × 3 ANOVA table)

The ANOVA table, ANOVA(D, α)_0:

                  SS               df    MS
Between Groups    2.519 · 10^-3    3     8.396 · 10^-4
Within Groups     4.094 · 10^-3    16    2.559 · 10^-4
Total             6.613 · 10^-3    19    0

The Calculated F statistic: ANOVA(D, α)_1 = 3.281485
The critical F statistic: ANOVA(D, α)_2 = 3.238872
The hypothesis test conclusion at the specified level of significance: ANOVA(D, α)_3 = 0
0 = reject hypothesis – there is a significant difference
1 = accept hypothesis – there is not a significant difference
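For readers without Mathcad, the same running-sums calculation can be sketched in Python. The function name `one_way_anova` is hypothetical, the grouping of the twenty values into D0 to D3 is the one that reproduces the four group means printed in the worksheet, and the critical F value is simply copied from the worksheet rather than computed.

```python
def one_way_anova(groups):
    """One-way ANOVA from running sums; returns (table, F statistic)."""
    n_total = 0
    sx = 0.0    # sum of all observations
    sx2 = 0.0   # sum of all squared observations
    t = 0.0     # sum over groups of (group sum)^2 / (group size)
    for g in groups:
        s = sum(g)
        sx += s
        sx2 += sum(v * v for v in g)
        t += s * s / len(g)
        n_total += len(g)
    correction = sx * sx / n_total
    ss_factor = t - correction        # between groups
    ss_error = sx2 - t                # within groups
    ss_total = sx2 - correction
    df_factor = len(groups) - 1
    df_error = n_total - len(groups)
    table = [
        (ss_factor, df_factor, ss_factor / df_factor),
        (ss_error, df_error, ss_error / df_error),
        (ss_total, n_total - 1, None),
    ]
    return table, table[0][2] / table[1][2]

D = [
    [3.421, 3.377, 3.399, 3.379, 3.379],   # D0
    [3.407, 3.400, 3.417, 3.353, 3.380],   # D1
    [3.366, 3.360, 3.361, 3.362, 3.370],   # D2
    [3.380, 3.380, 3.380, 3.380, 3.380],   # D3
]
F_CRITICAL = 3.238872   # qF(0.95, 3, 16), taken from the worksheet output

table, f_stat = one_way_anova(D)
accept = f_stat < F_CRITICAL   # same sense as Analysis_3 in the worksheet
for ss, df, ms in table:
    print(ss, df, ms)
print("F =", f_stat, " accept hypothesis:", accept)
```

Because the raw sums of squares are close to 228 while the differences of interest are near 0.003, this textbook formula loses several digits to cancellation; double precision is ample for these data, but a two-pass calculation about the group means would be the safer choice for larger or noisier data sets.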
ANOVA_s2
ANOVA (Analysis of Variance) Test
-----------------------------
This Worksheet demonstrates using Mathcad’s F distribution function and programming operators to conduct an analysis of variance (ANOVA) test.
Enter sample data used in test:
An element of D represents the data collected with a particular factor.
Data Entry:

D0       D1
3.421    3.366
3.377    3.360
3.399    3.361
3.379    3.362
3.379    3.370

Enter level of significance α:    α := 0.05
Program for conducting ANOVA test:

ANOVA(D, α) :=
    n_total ← 0
    SX ← 0
    SX2 ← 0
    T ← 0
    for i ∈ 0 .. last(D)
        SDi ← Σ D_i
        nDi ← length(D_i)
        SX ← SX + SDi
        SX2 ← SX2 + D_i · D_i
        T ← T + SDi² / nDi
        n_total ← n_total + nDi
    SS_factor ← T − SX² / n_total
    SS_error ← SX2 − T
    SS_total ← SX2 − SX² / n_total
    df_factor ← length(D) − 1
    df_error ← n_total − length(D)
    df_total ← n_total − 1
    Analysis_0 ← | SS_factor   df_factor   SS_factor / df_factor |
                 | SS_error    df_error    SS_error / df_error   |
                 | SS_total    df_total    0                     |
    Analysis_1 ← (Analysis_0)_{0,2} / (Analysis_0)_{1,2}
    Analysis_2 ← qF(1 − α, df_factor, df_error)
    Analysis_3 ← Analysis_1 < Analysis_2
    Analysis
Calculate Mean Values:
mean(D0) = 3.391    mean(D1) = 3.3638

Conducting an analysis of variance:
For a given set of grouped data D and level of significance α:

ANOVA(D, α) = [ {3,3}   9.755   5.318   0 ]   (displayed as a column vector; the first element is the nested 3 × 3 ANOVA table)

The ANOVA table, ANOVA(D, α)_0:

                  SS               df    MS
Between Groups    1.85 · 10^-3     1     1.85 · 10^-3
Within Groups     1.517 · 10^-3    8     1.896 · 10^-4
Total             3.366 · 10^-3    9     0

The Calculated F statistic: ANOVA(D, α)_1 = 9.755274
The critical F statistic: ANOVA(D, α)_2 = 5.317655
The hypothesis test conclusion at the specified level of significance: ANOVA(D, α)_3 = 0
0 = reject hypothesis – there is a significant difference
1 = accept hypothesis – there is not a significant difference
CompareT
Comparison Test for a Set of Measurements vs. True Value
-------------------------------------------------
DATA ENTRY:
X1 :=    5.10    5.20    5.30    5.10    5.00
n := rows(X1)
Mean of X1: mean(X1) = 5.14
Enter True Value (μ): μ := 5.2

Precision (or standard deviation):
$$\mathrm{sd}(X1) := \left(\frac{1}{n - 1} \sum \left(X1 - \mathrm{mean}(X1)\right)^2\right)^{1/2}$$
sd(X1) = 0.114

Compute degrees of freedom as (n − 1): df := n − 1
Enter alpha value as α2: α2 := .95
Calculate t-table value:
$$\alpha_1 := \frac{\alpha_2 + 1}{2} \qquad t := \mathrm{qt}(\alpha_1, df)$$
t-value: t = 2.776

t experimental (Te):
$$Te := \frac{\left|\mathrm{mean}(X1) - \mu\right|}{\mathrm{sd}(X1)} \cdot \sqrt{n}$$
Te = 1.177

If Te ≤ t-value, then there is NO SIGNIFICANT DIFFERENCE
If Te > t-value, then there IS A SIGNIFICANT DIFFERENCE between the set of measured values and the TRUE VALUE (i.e., they are different)
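The comparison against a true value is short enough to restate in Python as a check on the worksheet arithmetic. This is a sketch under simple assumptions: the variable names are chosen here for illustration, and the t-table value is copied from the worksheet rather than computed.

```python
import math
import statistics

x1 = [5.10, 5.20, 5.30, 5.10, 5.00]
mu = 5.2          # the stated true value
t_table = 2.776   # qt(0.975, 4), as given in the worksheet

n = len(x1)
mean_x1 = statistics.mean(x1)
sd_x1 = statistics.stdev(x1)    # sample standard deviation, n - 1 denominator
te = abs(mean_x1 - mu) / sd_x1 * math.sqrt(n)
significant = te > t_table

print(f"mean={mean_x1:.2f}  sd={sd_x1:.3f}  Te={te:.3f}  significant={significant}")
```

Taking the absolute value of the difference makes this a two-sided test, which is why the table value is looked up at (α2 + 1)/2 = 0.975 rather than at 0.95.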
Comp_Meth
Computations for the Comparison of Two Methods (Youden/Steiner):
---------------------------------------------------------
RAW DATA ENTRY FOR METHOD A
(Enter single Determinations for Sample X from different laboratories using Method A):

METHOD A:
Sample X    LAB #1    LAB #2    LAB #3    LAB #4
AX :=       3.37      3.38      3.36      3.38
nX := rows(AX)    mean(AX) = 3.372

(Enter single Determinations for Sample Y from different laboratories using Method A):

Sample Y    LAB #1    LAB #2    LAB #3    LAB #4
AY :=       3.74      3.74      3.74      3.76
nY := rows(AY)    mean(AY) = 3.746

RAW DATA ENTRY FOR METHOD B:
(Enter single Determinations for Sample X from different laboratories using Method B):

METHOD B:
Sample X    LAB #1    LAB #2    LAB #3    LAB #4
BX :=       3.42      3.41      3.38      3.40
nX := rows(BX)    mean(BX) = 3.401

(Enter single Determinations for Sample Y from different laboratories using Method B):

Sample Y    LAB #1    LAB #2    LAB #3    LAB #4
BY :=       3.76      3.86      3.74      3.83
nY := rows(BY)    mean(BY) = 3.8
[Figure, two panels. METHOD A: AY (with mean(AY) marked) plotted against AX (with mean(AX) marked). METHOD B: BY (with mean(BY) marked) plotted against BX (with mean(BX) marked).]
Two-sample Charts illustrating systematic errors for Methods A vs. B:
CALCULATIONS:
METHOD A:
ADxy := |AX − AY|    mean(ADxy) = 0.374
ATxy := (AX + AY)    mean(ATxy) = 7.117
METHOD B:
BDxy := |BX − BY|    mean(BDxy) = 0.399
BTxy := (BX + BY)    mean(BTxy) = 7.201

d := ATxy − BTxy    |Σd| = 0.335
Mean Difference: |mean(d)| = 0.084
d2 := BTxy − ATxy
Measuring the Precision and Standard Deviation of the Methods (Youden/Steiner):
----------------------------------------------------------
Precision (Sr):
$$ASr := \sqrt{\frac{1}{2\,(nY - 1)} \sum \left(ADxy - \mathrm{mean}(ADxy)\right)^2} \qquad BSr := \sqrt{\frac{1}{2\,(nY - 1)} \sum \left(BDxy - \mathrm{mean}(BDxy)\right)^2}$$
ASr = 6.692658 · 10^-3    BSr = 0.037334

Standard Deviation (Sd):
$$ASd := \sqrt{\frac{1}{2\,(nY - 1)} \sum \left(ATxy - \mathrm{mean}(ATxy)\right)^2} \qquad BSd := \sqrt{\frac{1}{2\,(nY - 1)} \sum \left(BTxy - \mathrm{mean}(BTxy)\right)^2}$$
ASd = 0.013056    BSd = 0.045387

Statistical Test for presence of systematic errors (Youden/Steiner):
------------------------------------------------------
F-statistic Calculation (Fs) for Precision Ratio:
Sr² Ratio:
$$PFs := \frac{BSr^2}{ASr^2}$$
PFs = 31.118
Ho: If Fs is less than or equal to Ft, then there is NO DIFFERENCE in Precision estimation.
Ha: If Fs is greater than Ft, then there is a DIFFERENCE in Precision estimation.

F-statistic Calculation (Fs) for Presence of Systematic Errors:
Sd² Ratio:
$$SFs := \frac{BSd^2}{ASd^2}$$
SFs = 12.085
Ho: If Fs is less than or equal to Ft, then there is NO DIFFERENCE in systematic error for methods.
Ha: If Fs is greater than Ft, then there is a DIFFERENCE in systematic error for methods.

F-statistic Table Value (Ft):
df1 := nY − 1    df1 = 3    qF(0.95, df1, df1) = 9.277
Student’s t-Test for the Difference in the biases between Two Methods:
mean(d) = −0.084
Mean Difference: mean(d2) = 0.084
$$s := \sqrt{\frac{1}{df1} \sum \left(d2 - \mathrm{mean}(d2)\right)^2}$$
s = 0.053
$$sm := \frac{s}{\sqrt{nY}}$$
sm = 0.026
t-test Statistic:
$$Te := \frac{\mathrm{mean}(d2)}{sm}$$
Te = 3.189

Enter alpha value as α2: α2 := .95
Calculate t-table value:
$$\alpha_1 := \frac{\alpha_2 + 1}{2}$$
α1 = 0.975    t := qt(α1, df1)
t-Table Value: t = 3.182
Ho: If Te is less than or equal to t, then there is NO SYSTEMATIC DIFFERENCE between method results.
Ha: If Te is greater than t, then there is a SYSTEMATIC DIFFERENCE (BIAS) between method results.
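The two F ratios of this worksheet can be reproduced approximately with the short Python sketch below. The helper names are invented for illustration, and the data are the two-decimal values printed above, so the ratios differ from the worksheet's 31.118 and 12.085, which were evidently computed from unrounded data. The paired t-test is deliberately left out: the worksheet's result (Te = 3.189 against t = 3.182) is too close to the critical value to be reproduced reliably from rounded data.

```python
def ss_about_mean(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def sr2_sd2(x, y):
    """Return (Sr^2, Sd^2) for paired single determinations x and y."""
    n = len(x)
    diffs = [a - b for a, b in zip(x, y)]
    totals = [a + b for a, b in zip(x, y)]
    denom = 2 * (n - 1)
    return ss_about_mean(diffs) / denom, ss_about_mean(totals) / denom

# Rounded data as printed in the worksheet.
AX = [3.37, 3.38, 3.36, 3.38]
AY = [3.74, 3.74, 3.74, 3.76]
BX = [3.42, 3.41, 3.38, 3.40]
BY = [3.76, 3.86, 3.74, 3.83]
FT = 9.277   # qF(0.95, 3, 3), the table value quoted in the worksheet

a_sr2, a_sd2 = sr2_sd2(AX, AY)
b_sr2, b_sd2 = sr2_sd2(BX, BY)
pfs = b_sr2 / a_sr2   # precision ratio
sfs = b_sd2 / a_sd2   # systematic-error ratio

print(f"PFs={pfs:.3f}  SFs={sfs:.3f}  Ft={FT}")
print("precision differs:", pfs > FT, " systematic error differs:", sfs > FT)
```

The sign convention chosen for the differences does not matter here, since only squared deviations from the mean enter Sr.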
REFERENCES
1. Hinshaw, J.V., LC-GC 17(7), 616–625 (1999).
2. Mark, H. and Workman, J., Spectroscopy 2(2), 60–64 (1987).
3. Workman, J. and Mark, H., Spectroscopy 2(6), 58–60 (1987).
4. MathCad; MathSoft, Inc.: 101 Main Street, Cambridge, MA 02142; Vol. v. 7.0 (1997).
5. Mark, H. and Workman, J., Spectroscopy 10(1), 17–20 (1995).
6. Mark, H. and Workman, J., Spectroscopy 4(7), 53–54 (1989).
7. Youden, W.J. and Steiner, E.H., Statistical Manual of the AOAC, 1st ed. (Association of Official Analytical Chemists, Washington, DC, 1975).
8. Mark, H. and Workman, J., Statistics in Spectroscopy, 1st ed. (Academic Press, New York, 1991).
9. Draper, N. and Smith, H., Applied Regression Analysis (John Wiley & Sons, New York, 1981).
10. Zar, J.H., Biostatistical Analysis (Prentice Hall, Englewood Cliffs, NJ, 1974).
11. Owen, D.B., Handbook of Statistical Tables (Addison-Wesley Publishing Co., Inc., Reading, MA, 1962).
40
Is Noise Brought by the Stork? Analysis of Noise: Part 1
Well no, actually. If the truth be told, we all know that noise is brought (on) by quantum mechanics. Now, if we could some day find a really good quantum mechanic, one who could actually fix once and for all, all those broken quanta around us, then maybe all the noise would go away, but that is probably too much to ask for and not likely to happen. About as likely as our getting away with making more of these sorts of bad jokes; those are more in the domain of other spectroscopy writers.

On to more serious matters: where does the noise come from, and how does noise affect our data, that is, the spectra we measure? Chemists are interested in the effects that various phenomena have on the accuracy of chemical analyses. General books about instrumental analysis discuss some of the sources of error, and even provide elementary derivations relating some of the instrumental phenomena to their effect on the error of the chemical analysis. Elementary texts [1, 2] derive a formula for the “optimum” absorbance a sample should have. More recent work has also been directed to ascertaining the “optimum” transmittance (or reflectance) value a sample should have for best quantitative accuracy, particularly for the situation when multivariate methods of analysis are in use [3, 4]. One standard treatment of the problem derives the error in concentration of an analyte caused by error of the spectral value, presents the often-seen curve showing that the relative error in concentration, ΔC/C, goes through a minimum, and computes that the minimum occurs at a transmittance of 0.368 (that is, 1/e), corresponding to an absorbance of 0.4343. More advanced texts [5] relate the measurements and the measurement process to the noise of the spectrum given the nature of different noise sources, “noise” being the term generally (although rather loosely, to be sure) used to describe error of an instrumental reading, while “error” is used more generally. At the end of the day, though, they really mean the same thing: the random variations superimposed on the desired information.

Close examination reveals that these expositions are wanting. Sometimes a simplifying assumption is made that results in an incorrect description [2]. In other cases the argument is taken into the statistical domain prematurely, leaving no room to accommodate different situations [5]. It is clear, however, that one formula cannot fit all cases. There are a large number of ways in which instruments react to various sources of variation of the signal; we summarize some of them here:

1) Many common infrared and near-infrared detectors are subject to phenomena that are mainly thermal in origin, and therefore the detector noise is independent of the signal level.

2) Some detectors for the visible and UV spectral regions can detect individual photons. These detectors are shot-noise limited. X-ray and gamma-ray spectroscopy also detects individual photons and therefore is also limited by this source of variation. Since shot noise follows Poisson statistics, the detector noise in these cases increases with the square root of the signal.

3) Sometimes the detector noise is not the limiting noise source. One prime noise source can be generically called “scintillation noise”: variation in the amount of energy impinging on the detector. These variations often have mechanical causes: vibration of the source, or vignetting at an optical stop in the optical system, changing the geometry of the radiation on the detector. Astronomical measurements, of course, are subject to this noise source from atmospheric fluctuations, and represent the classic example of this type of variation. From whatever source, however, scintillation noise is directly proportional to the energy of the optical signal.

4) Other cases of non-detector noise occur when the noise is introduced after the detector. These are usually a result of limitations of the instrument and in principle could be reduced by re-engineering the instrument. Examples include power-line pickup, and mechanical vibrations affecting a sensitive part (generically called “microphonics”). The magnitude of these would also tend to be independent of the signal level.

5) One noise source tends to affect spectrometers of older design, namely those that use the optical-null principle. In optical-null spectrometers, various electrical (random noise and power-line pickup) and mechanical (vibrations) noise sources can be introduced after the transmittance via the optical null is determined (P.R. Griffiths, 1998, personal communication), and in those cases the error of the transmittance will be constant. In fact, because of the historical origins, this is the case that is usually treated in the extant literature. However, this is not a simple one-to-one relationship either, since it depends on how the instrument designer chose to deal with the problem.
Many of those types of instruments had variable slits, and the slits could be opened or closed during a scan, according to some preset (hardwired, to be sure: these were not computer-controlled instruments) program. One possibility, of course, was to leave the slit at a constant opening that was preset before the scan was run. A second possibility was to program the slit for a constant bandwidth across the spectrum. A third possibility was to program the slit for constant reference energy. Here again, it is clear that the noise characteristics of the instrument will depend on how the construction of the instrument determined which of these situations applied, which gives us at least three subcases here.

6) Variations in the temperature of a blackbody used as the source in a spectrometer. The energy density of blackbody radiation is given by the well-known formula:

$$\frac{dE}{d\nu} = \frac{8\pi h\nu^3}{c^3}\,\frac{V}{e^{h\nu/kt} - 1} \tag{40-1a}$$

for radiation in the frequency range from $\nu$ to $\nu + d\nu$, where $t$ is the temperature, $V$ is the volume of the enclosure containing the radiation and $h$, $k$ and $c$ have their usual meanings. Collecting the constants (to simplify the expression), we obtain

$$\frac{dE}{d\nu} = \frac{K\nu^3}{e^{h\nu/kt} - 1} \tag{40-1b}$$

Taking the derivative of this with respect to temperature, we obtain

$$\frac{d}{dt}\left(\frac{dE}{d\nu}\right) = K\nu^3\,\frac{-1}{\left(e^{h\nu/kt} - 1\right)^2}\,e^{h\nu/kt}\left(\frac{-h\nu}{kt^2}\right) \tag{40-2}$$

Back substituting equation 40-1b into equation 40-2, we obtain

$$\frac{d}{dt}\left(\frac{dE}{d\nu}\right) = \frac{dE}{d\nu}\,\frac{h\nu\,e^{h\nu/kt}}{kt^2\left(e^{h\nu/kt} - 1\right)} \tag{40-3}$$

and we see that the relative energy change (as a fraction of the energy, per unit change in temperature) in the frequency interval between $\nu$ and $\nu + d\nu$ is given by the expression:

$$\frac{h\nu\,e^{h\nu/kt}}{kt^2\left(e^{h\nu/kt} - 1\right)} \tag{40-4}$$
7) Variation of pathlength will create a source of variation in the data such that the change in absorbance is proportional to the absorbance. This can happen even in transmission spectroscopy if the walls of the sample cell for some reason should not be rigidly fixed in place, or possibly the cell might expand through temperature changes. Of course, in that case the sample itself is also likely to be affected directly; expansion of a liquid sample would have an effect equivalent to a reduction in pathlength. It can also happen, and is perhaps more common, in the case of diffuse reflectance. In that measurement technique, absent a rigorous theory to describe this physical phenomenon, the concept of a variable pathlength is used as a first approximation to the nature of the change in the measurements.

8) There are other sources of noise, whose behavior cannot be described analytically. They are often principally due to the sample. A premier example is the variability of the measured reflectance of powdered solids. Since we do not have a rigorous ab initio theory of diffuse reflectance, we cannot create analytic expressions that describe the variation of the reflectance. Situations where the sample is unavoidably inhomogeneous will also fall into this category. In all such cases the nature of the noise will be unique to each situation and would have to be dealt with on a case-by-case basis.

9) Another source of variability, which can have still different characteristics, is comprised of the interaction of any of the above factors with a nonlinearity anywhere in the system. These nonlinearities could consist of nonlinearity in the detector, in the spectrometer’s electronics, optical effects such as changes in the field of view, and so on. Many of these nonlinearities are likely to be idiosyncratic to the cause, and would have to be characterized individually and also analyzed on a case-by-case basis.

10) Another, specialized, case would be nondispersive analyzers. For these instruments the whole concept of determining the signal between $\nu$ and $\nu + d\nu$ is inapplicable, since the measured signal represents the integrated optical intensity of the incident radiation over a broad range of wavelengths, likely including wavelength regions where the optical radiation is weak as well as where it is strong. Furthermore, this will be sample-dependent, and almost certainly would have to be dealt with on a sample-by-sample basis.
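As a check on the blackbody case (item 6), the relative change of energy with temperature, equation 40-4, can be compared with a finite-difference derivative of equation 40-1b. The sketch below is illustrative only: the frequency and temperature are arbitrary assumed values, and the collected constant K is set to 1 because it cancels in the relative change.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23   # Boltzmann constant, J/K

def spectral_energy(nu, t):
    """Equation 40-1b with the collected constant K set to 1."""
    return nu ** 3 / math.expm1(H * nu / (K_B * t))

def relative_change(nu, t):
    """Equation 40-4: fractional change in energy per kelvin."""
    x = H * nu / (K_B * t)
    return H * nu * math.exp(x) / (K_B * t ** 2 * math.expm1(x))

nu = 1.0e14   # Hz, roughly 3 micrometres (assumed for illustration)
t = 1500.0    # K, an assumed source temperature
dt = 0.01     # K, step for the central difference

numeric = (spectral_energy(nu, t + dt) - spectral_energy(nu, t - dt)) / (2 * dt)
numeric /= spectral_energy(nu, t)
analytic = relative_change(nu, t)

print(f"finite difference: {numeric:.6e} per K")
print(f"equation 40-4:     {analytic:.6e} per K")
```

Under these assumed conditions the source energy at this frequency changes by roughly a quarter of a percent per kelvin, which gives a feel for how tightly the source temperature must be held for this noise source to be negligible.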
Thus, given the variety of ways that the noise output of a detector is related to the optical signal into the detector, the argument that a single formula cannot account for them all becomes even more forceful. This being so, it is clear that each case needs to be treated separately in order to obtain a correct description of the effect on the noise of the spectrum. For single-beam spectra the noise can be described directly. For ratioed spectra, it is of interest to ascertain the effect of the various noise sources on the ratioed spectrum (i.e., the transmittance or reflectance spectrum as the case may be), on the absorbance spectrum, and also to determine, as was done previously [1, 2, 5], the optimum value for the sample to have that will give the minimum error of the calculated value. We will be doing this exercise during the course of the next few chapters. We will consider each of these types of noise one at a time. We will start from first principles, derive the appropriate expressions and deal with them in a completely rigorous manner. During the course of this we will compare our results with the ones in the literature and see where the standard derivations (NOT deviations!) depart from our presentation. We will begin in the next chapter with an analysis of the effect of one of the most common cases: constant detector noise, typical of mid-infrared and near-infrared instruments.
REFERENCES
1. Strobel, H.A., Chemical Instrumentation – A Systematic Approach to Instrumental Analysis (Addison-Wesley Publishing Co., Reading, MA, 1960).
2. Ewing, G., Instrumental Methods of Chemical Analysis, 4th ed. (McGraw-Hill, New York, 1975).
3. Honigs, D.E., Hieftje, G.M. and Hirschfeld, T., Applied Spectroscopy 39(2), 253–256 (1985).
4. Hirschfeld, T., Honigs, D. and Hieftje, G., Applied Spectroscopy 39(3), 430–433 (1985).
5. Ingle, J.D. and Crouch, S.R., Spectrochemical Analysis (Prentice-Hall, Upper Saddle River, NJ, 1988).
41 Analysis of Noise: Part 2
Note to the Reader: Chapters 41 through 53 are derived from a series of papers written about the subject of noise. They are sequential in nature and the rationale and descriptions follow a series of equations, figures and tables that are best followed using a serial numbering system running sequentially throughout the chapter series. Thus the equations, figures, and tables for these chapters will contain the chapter number and then the sequential equation, figure, or table number. For example, Chapter 42 begins with Equation 41-19, and this equation would be designated there as (42-19), following this format.

Chapter 40 is based on reference [1]. In that chapter we brought up the question of how various types of noise are related to the noise characteristics of the spectra one observes. In this chapter and the subsequent ones (Chapters 41 through 53), we will derive the expressions for the various situations that arise; these situations have been described in greater detail within Chapter 40. We begin with a fairly simple case: that of constant detector noise. This chapter will also serve to lay out the general conditions that apply to these derivations, such as nomenclature. We will treat this first case in excruciating detail, so that the methods we will use are clear; then, for the cases we will deal with in the future, we will be able to give an abbreviated form of the derivations, and anyone interested in following through themselves will be able to see how to do it. Also, some of the results are so unexpected that without our giving every step, they may not be believed.

Since the measurements of reflectance and transmittance are defined by essentially the same equation, we will couch our discussion in terms of a transmittance measurement. The important difference lies, as we discussed previously, in the nature of the error superimposed on the measurement. Therefore, we begin by noting that transmittance ($T$) is defined by equation 41-1:

$$T = \frac{E_s - E_{0s}}{E_r - E_{0r}} \tag{41-1}$$

where $E_s$ and $E_r$ represent the signal due to the sample and reference readings, respectively, and $E_{0s}$ and $E_{0r}$ are the “dark” or “blank” readings associated with $E_s$ and $E_r$. ($E_r - E_{0r}$), of course, must be non-zero. The measured value of $T$, including the error $\Delta T$, is

$$T + \Delta T = \frac{(E_s + \Delta E'_s) - (E_{0s} + \Delta E'_{0s})}{(E_r + \Delta E'_r) - (E_{0r} + \Delta E'_{0r})} \tag{41-2}$$

where the $\Delta$ terms represent the fluctuation in the reading due to the instantaneous random effect of noise. An important point to note here is that $E_s$, $E_r$, and $T$, for any given set
of readings at a given wavelength, are constants. All variations in the readings, due to noise, are associated with $\Delta E_s$, $\Delta E_r$, and $\Delta T$. Rearranging equation 41-2 we have

$$T + \Delta T = \frac{(E_s - E_{0s}) + (\Delta E'_s - \Delta E'_{0s})}{(E_r - E_{0r}) + (\Delta E'_r - \Delta E'_{0r})} \tag{41-3}$$

The difference between two random variables is itself a random variable, therefore we replace the terms $(\Delta E'_s - \Delta E'_{0s})$ and $(\Delta E'_r - \Delta E'_{0r})$ in equation 41-3 with the equivalent, simpler terms $\Delta E_s$ and $\Delta E_r$, respectively:

$$T + \Delta T = \frac{E_s - E_{0s} + \Delta E_s}{E_r - E_{0r} + \Delta E_r} \tag{41-4}$$

The presence of a non-zero dark reading, $E_0$, will, of course, cause an error in the value of $T$ computed. However, this is a systematic error and therefore is of no interest to us here; we are interested only in the behavior of random variables. Therefore we set $E_{0s}$ and $E_{0r}$ equal to zero and note that, if $T$ as described in equation 41-1 represents the “true” value of the transmittance, then the value we obtain for a given reading, including the instantaneous random effect of noise, is

$$T + \Delta T = \frac{E_s + \Delta E_s}{E_r + \Delta E_r} \tag{41-5}$$

and we also find that upon setting $E_{0s}$ and $E_{0r}$ equal to zero in equation 41-1, equation 41-1 becomes

$$T = \frac{E_s}{E_r} \tag{41-6}$$

where $\Delta E_s$ and $\Delta E_r$ represent the instantaneous, random values of the change in the sample and reference readings due to the noise. Since, as we noted above, $T$, $E_s$, and $E_r$ are constant for any given reading, any change in the measured value due to noise is contained in the terms $\Delta E_r$ and $\Delta E_s$. In statistical jargon this would be called “a point estimate of $T$ from a single reading”, and $\Delta T$ is the corresponding instantaneous change in the computed value of the transmittance. Again, $E_r$ must be non-zero. We note here that $\Delta E_s$ and $\Delta E_r$ need not be equal; that will not affect the derivation. For the case we are considering in this chapter, however, we are assuming constant detector noise, therefore when we pass to the statistical domain, we will consider $\Delta E_s$ to be equal to $\Delta E_r$. That, of course, refers only to the expected values; since the noise is random, the instantaneous values will virtually never be the same. Upon subtracting equation 41-6 from equation 41-5 we obtain the following:

$$T + \Delta T - T = \frac{E_s + \Delta E_s}{E_r + \Delta E_r} - \frac{E_s}{E_r} \tag{41-7}$$

$$\Delta T = \frac{E_r (E_s + \Delta E_s) - E_s (E_r + \Delta E_r)}{E_r (E_r + \Delta E_r)} \tag{41-8}$$

$$\Delta T = \frac{E_r E_s + E_r \Delta E_s - E_s E_r - E_s \Delta E_r}{E_r (E_r + \Delta E_r)} \tag{41-9}$$

$$\Delta T = \frac{E_r \Delta E_s - E_s \Delta E_r}{E_r (E_r + \Delta E_r)} \tag{41-10}$$
Equation 41-10 might look familiar. If you check an elementary calculus book, you will find that it is about the second-to-last step in the derivation of the derivative of a ratio (about all you need to do is go to the limit as $\Delta E_s$ and $\Delta E_r \to 0$). However, for our purposes we can stop here and consider equation 41-10. We find that the total change in $T$, that is $\Delta T$, is the result of two contributions:

$$\Delta T = \frac{E_r \Delta E_s}{E_r (E_r + \Delta E_r)} - \frac{E_s \Delta E_r}{E_r (E_r + \Delta E_r)} \tag{41-11}$$
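The algebra leading from equations 41-5 and 41-6 to equation 41-11 is easy to confirm numerically. The short sketch below uses arbitrary illustrative values for the energies and for one instantaneous pair of noise excursions; none of these numbers comes from the text.

```python
# Arbitrary illustrative values (assumptions, not data from the text).
e_s, e_r = 0.4, 1.0           # true sample and reference energies
d_es, d_er = 0.003, -0.002    # one instantaneous pair of noise values

# Equation 41-5 minus equation 41-6: the change in T computed directly.
delta_t_direct = (e_s + d_es) / (e_r + d_er) - e_s / e_r

# Equation 41-11: the same change split into its two contributions.
denominator = e_r * (e_r + d_er)
sample_noise_term = e_r * d_es / denominator      # first term
reference_noise_term = e_s * d_er / denominator   # second term
delta_t_split = sample_noise_term - reference_noise_term

print(delta_t_direct, delta_t_split)
```

Setting `e_s` to zero removes the second contribution but leaves the first untouched, which is the point taken up in the discussion of equation 41-14.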
We note that, since by assumption $E_r$ is non-zero, and $\Delta E_r$ is non-zero and independent of $E_r$, the first term of equation 41-11 is non-zero. The value of the second term of equation 41-11, however, will depend on the value of $E_s$, that is, on the transmittance of the sample. In order to determine the standard deviation of $T$ we need to consider what would happen if we take multiple sample and reference readings; then we can characterize the variability of $T$. Since $E_r$ and $E_s$ are fixed quantities, when we take multiple readings we note that we arrive at different values of $T + \Delta T$ due to the differences in the values of $\Delta E_r$ and $\Delta E_s$ on each reading, causing a change in $\Delta T$. Therefore we need to compute the standard deviation of $\Delta T$, which we do from the expression for $\Delta T$ in equation 41-11:

$$SD(\Delta T) = SD\left(\frac{E_r \Delta E_s}{E_r (E_r + \Delta E_r)} - \frac{E_s \Delta E_r}{E_r (E_r + \Delta E_r)}\right) \tag{41-12}$$

Or equivalently, we calculate the variance of $\Delta T$, which is the square of the standard deviation:

$$Var(\Delta T) = Var\left(\frac{E_r \Delta E_s}{E_r (E_r + \Delta E_r)} - \frac{E_s \Delta E_r}{E_r (E_r + \Delta E_r)}\right) \tag{41-13}$$

The proof that the variance of the sum of two terms is equal to the sum of the variances of the individual terms is a standard derivation in Statistics, but since most chemists are not familiar with it we present it in the Appendix. Having proven that theorem, and noting that $\Delta E_s$ and $\Delta E_r$ are independent random variables, they are uncorrelated and we can apply that theorem to show that the variance of $\Delta T$ is:

$$Var(\Delta T) = Var\left(\frac{E_r \Delta E_s}{E_r (E_r + \Delta E_r)}\right) + Var\left(\frac{-E_s \Delta E_r}{E_r (E_r + \Delta E_r)}\right) \tag{41-14}$$

Since $\Delta E_r$ is small compared to $E_r$, the $\Delta E_r$ in the denominator terms will have little effect on the variance of $T$ and in the limit approaches zero. In a case where this is not true, the derivation must be suitably modified to include this term. This is relatively straightforward: substitute the parenthesized terms into the equation for variance (e.g., as we do in the appendix), hook up about a 100-hp motor or so and “turn the crank”, as we will do in due course. It is mostly algebra, although a lot of it! In our current development, however, we assume $\Delta E_r$ is small and therefore negligible compared to $E_r$, so we replace ($E_r + \Delta E_r$) with $E_r$:

$$Var(\Delta T) = Var\left(\frac{E_r \Delta E_s}{E_r^2}\right) + Var\left(\frac{-E_s \Delta E_r}{E_r^2}\right) \tag{41-15}$$

$$Var(\Delta T) = Var\left(\frac{\Delta E_s}{E_r}\right) + Var\left(\frac{-T \Delta E_r}{E_r}\right) \tag{41-16}$$

We have shown previously that if $a$ represents a constant, then $Var(aX) = a^2\,Var(X)$ ([2], or see [3] Chapter 11, p. 94). Hence equation 41-16 becomes

$$Var(\Delta T) = \left(\frac{1}{E_r}\right)^2 Var(\Delta E_s) + \left(\frac{-T}{E_r}\right)^2 Var(\Delta E_r) \tag{41-17}$$

Since we have assumed constant detector noise for this chapter, $Var(\Delta E_s) = Var(\Delta E_r) = Var(\Delta E)$, and

$$Var(\Delta T) = \frac{1 + T^2}{E_r^2}\,Var(\Delta E) \tag{41-18}$$

Finally, reconverting variance back to SD by taking square roots on both sides of equation 41-18:

$$SD(\Delta T) = \frac{SD(\Delta E)}{E_r}\sqrt{1 + T^2} \tag{41-19}$$
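Equation 41-19 lends itself to a quick Monte Carlo check: simulate many noisy sample and reference readings, form the ratio, and compare the observed standard deviation of T with the prediction. The sketch below makes several assumptions not stated in the text: Gaussian noise, a reference energy of 1, a noise standard deviation of 0.001 (small compared with the reference energy, as the derivation requires), and a fixed random seed. The function names are illustrative.

```python
import math
import random

def simulated_sd_of_t(t_true, e_r=1.0, noise_sd=0.001, n=100_000, seed=1):
    """Standard deviation of (Es + dEs) / (Er + dEr) over n simulated readings."""
    rng = random.Random(seed)
    e_s = t_true * e_r
    readings = [
        (e_s + rng.gauss(0.0, noise_sd)) / (e_r + rng.gauss(0.0, noise_sd))
        for _ in range(n)
    ]
    mean = sum(readings) / n
    return math.sqrt(sum((r - mean) ** 2 for r in readings) / (n - 1))

def predicted_sd_of_t(t_true, e_r=1.0, noise_sd=0.001):
    """Equation 41-19."""
    return noise_sd / e_r * math.sqrt(1.0 + t_true ** 2)

results = {
    t: (simulated_sd_of_t(t), predicted_sd_of_t(t))
    for t in (0.0, 0.5, 1.0)
}
for t, (sim, pred) in results.items():
    print(f"T = {t:.1f}   simulated = {sim:.6f}   predicted = {pred:.6f}")
```

With the same noise added independently to both readings, the simulated values should track the prediction to within sampling error; fixing the reference reading (no noise in the denominator) would instead give a constant value, which is the assumption the text goes on to criticize.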
We remind our readers here that $\Delta E$, as we have been using it in this derivation, is, as you will recall, the difference between $\Delta E'$ and $\Delta E'_0$ in equation 41-4, and the expected value in the statistical nomenclature is therefore $2^{1/2}$ times as large as $\Delta E'$ (due to the fact that it is the result of the difference between random variables with equal variance), a difference that should be taken note of when comparing results with the original definition of S/N in equation 41-2.

We next note, and this is in accordance with expectations, that the noise of the transmission spectrum, $SD(\Delta T)$, is dependent on the noise-to-signal ratio of the readings, the inverse of the S/N ratio commonly used and presented as a spectrometer specification, at least as long as the noise is small compared to the reference energy reading so that the approximation made in equation 41-15 remains valid. Recall that $E_r$ is the energy of the reference reading and $SD(\Delta E)$ is the noise of the readings from the detector; this ratio of $SD(\Delta E)/E_r$ is the (inverse of the) true signal-to-noise ratio; the noise observed on a transmission spectrum, while related to S/N, is in itself not the true S/N ratio.

Next we note further, and this is probably contrary to most spectroscopists’ expectations, that the noise of the transmittance spectrum is not constant, but depends on the transmittance of the sample, being higher for highly transmitting samples than for dark samples. Since $T$ can vary from 0 (zero) to 1 (unity), the noise level can vary by a factor of the square root of two, from a relative value of unity (when $T = 0$) to 1.414... (when $T = 1$). This behavior is shown in Figure 41-1. The increase in noise with increasing signal might be considered counterintuitive, and therefore surprising, by some. Intuition tells us that the S/N ratio might be expected to improve with increased signal regardless of its source, or that the noise level of the transmittance spectrum should at least remain constant, for constant detector noise. This misapprehension has worked its way into the literature to modern times: “In most infrared measurements situations, the detector constitutes the limiting noise source. Because the resulting fluctuations have the same effect as a fixed uncertainty in the signal readout, they appear as a constant error in the transmittance”. [4]
[Plot: relative noise (vertical axis) versus sample transmittance (horizontal axis).]
Figure 41-1 Noise level of a transmittance spectrum as a function of the sample transmittance.
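The curve in Figure 41-1 follows directly from equation 41-19: relative to its value at zero transmittance, the noise is the square root of (1 + T²). A few points can be tabulated with the sketch below; the step of 0.1 in transmittance is chosen here only for illustration.

```python
import math

def relative_noise(t):
    """sqrt(1 + T^2): noise of the transmittance spectrum relative to T = 0."""
    return math.sqrt(1.0 + t * t)

curve = [(i / 10, relative_noise(i / 10)) for i in range(11)]
for t, r in curve:
    print(f"T = {t:.1f}   relative noise = {r:.3f}")
```

The tabulated values rise monotonically from 1 at T = 0 to the square root of two at T = 1, the factor quoted in the text.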
Intuition tells us that if the transmittance is zero, then it should have no effect on the readings. In fact this is true, but misleading. The transmittance being zero, or the sample energy being zero, does not mean that the variability of the reading is zero. The explanation of the actual behavior comes from a careful perusal of the intermediate equations developed in the course of arriving at equation 41-19, specifically equation 41-14. From the first term in that equation we see that the irreducible minimum noise is contributed by the reference signal level (Er � multiplied by the variation of the sample signal (�Es �, independently of the value of the sample signal. Increasing sample signal then serves to add additional noise to the total, through its contribution, in the second term of equation 41-14, which comes from the sample signal through its being multiplied by the reference noise. Conventional developments of the subject contain flaws that are usually hidden and subtle. In Ewing’s book, for example [5], the development includes the step (see page 43, the section between equations 3-6 and 3-7) of noting that, since the reference energy is essentially set equal to unity, log (Er � (or P0 , the equivalent in Ewing’s terminology) is set equal to zero. However, this is done before the separation of P0 from �P0 , creating the implicit, but erroneous, result that �P0 is zero as well. In our nomenclature, this causes the second term of equation 41-14 to vanish, and as a consequence the erroneous result obtained is that �T is independent of T . This, of course, appears to confirm intuition and since it is based on mathematics, appears to be beyond question. Other treatments [6] simply do not question the origin of the noise in T and assume a priori that it is constant, and work from there. 
The more sophisticated treatment of Ingle and Crouch [7] comes very close but also misses the mark; for an unexplained reason they insert the condition: "... it is assumed there is no uncertainty in measuring $E_{rt}$ and $E_{0t}$ ...". Now in fact this could happen (or at least there could be no variation in $\Delta E_r$); for example, if one reference spectrum were used in conjunction with multiple sample spectra using an FTIR spectrometer. However, that would not be a true indication of the total error of the measurement, since the effect of the noise in the reference reading would have been removed from the calculated SD, whereas the true total error of the reading would in fact include that source of error, even though part of it were constant. It is to their credit that these authors explicitly state their assumption that they ignore the variability of $E_r$ rather than hiding it. Furthermore, they allude to the fact that something is going on when they state "... the approximation is good to within a factor of $2^{1/2}$." Nevertheless, they failed to follow through and derive the exact solution to the problem.

The bottom line to all this is that, in one way or another, previous treatments of this subject have invariably failed to consider the effect of the noise of the reference reading, and therefore arrived at an erroneous conclusion.

Whew! I think that is enough for one chapter. I need a rest. And so does the typesetter! We will continue the derivation in our next chapter.
APPENDIX

Proof that the variance of a sum equals the sum of the variances

Let $A$ and $B$ be random variables. Then the variance of $(A + B)$ is by definition:

$$\mathrm{Var}(A+B) = \frac{\sum\left[(A+B) - \overline{(A+B)}\right]^2}{n-1} \tag{41-A1}$$

Since $\overline{(A+B)} = \bar{A} + \bar{B}$, we can separate the numerator terms and then expand the numerator:

$$\mathrm{Var}(A+B) = \frac{\sum\left(\begin{array}{l} A^2 + AB - A\bar{A} - A\bar{B} + AB + B^2 - \bar{A}B - B\bar{B} \\ {} - A\bar{A} - \bar{A}B + \bar{A}^2 + \bar{A}\bar{B} - A\bar{B} - B\bar{B} + \bar{A}\bar{B} + \bar{B}^2 \end{array}\right)}{n-1} \tag{41-A2}$$

We can now collect terms as follows:

$$\mathrm{Var}(A+B) = \frac{\sum\left(A^2 - 2A\bar{A} + \bar{A}^2\right)}{n-1} + \frac{\sum\left(B^2 - 2B\bar{B} + \bar{B}^2\right)}{n-1} + 2\,\frac{\sum(A-\bar{A})(B-\bar{B})}{n-1} \tag{41-A3}$$

Equation 41-A3 can be checked by expanding the last term, collecting terms and verifying that all the terms of equation 41-A2 are regenerated. The third term in equation 41-A3 is a quantity called the covariance between $A$ and $B$. The covariance is a quantity related to the correlation coefficient. Since the differences from the mean are randomly positive and negative, the products of the two differences from their respective means are also randomly positive and negative, and tend to cancel when summed. Therefore, for independent random variables the covariance is zero, since the correlation coefficient is zero for uncorrelated variables. In fact, the mathematical definition of "uncorrelated" is that this sum-of-cross-products term is zero. Therefore, since $A$ and $B$ are random, uncorrelated variables:

$$\mathrm{Var}(A+B) = \frac{\sum(A-\bar{A})^2}{n-1} + \frac{\sum(B-\bar{B})^2}{n-1} \tag{41-A4}$$

The two terms of equation 41-A4 are, by definition, the variances of $A$ and $B$:

$$\mathrm{Var}(A+B) = \mathrm{Var}(A) + \mathrm{Var}(B) \tag{41-A5}$$

QED
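The decomposition in equation 41-A3 can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes two independent Gaussian samples (the sample size, seed, and standard deviations are arbitrary choices) and uses only the Python standard library. The sample variance of the sum equals the two variances plus twice the sample covariance exactly, while the covariance term itself is small but not exactly zero for a finite sample.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    # Variance with the n - 1 denominator, as in equation 41-A1.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_cov(xs, ys):
    # Cross-product term of equation 41-A3 (without the factor of 2).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


random.seed(0)  # arbitrary seed, for repeatability only
n = 50_000
a = [random.gauss(0.0, 1.0) for _ in range(n)]
b = [random.gauss(0.0, 2.0) for _ in range(n)]
s = [x + y for x, y in zip(a, b)]

var_sum = sample_var(s)
var_parts = sample_var(a) + sample_var(b)
cross_term = 2.0 * sample_cov(a, b)

print("Var(A+B)            :", var_sum)
print("Var(A) + Var(B)     :", var_parts)
print("2 Cov(A,B)          :", cross_term)
```

The first two printed values differ only by the third, which shrinks as the number of readings grows.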
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 15(10), 24–25 (2000).
2. Mark, H. and Workman, J., Spectroscopy 3(8), 13–15 (1988).
3. Mark, H. and Workman, J., Statistics in Spectroscopy, 1st ed. (Academic Press, New York, 1991).
4. Honigs, D.E., Hieftje, G.M. and Hirschfeld, T., Applied Spectroscopy 39(2), 253–256 (1985).
5. Ewing, G., Instrumental Methods of Chemical Analysis, 4th ed. (McGraw-Hill, New York, 1975).
6. Strobel, H.A., Chemical Instrumentation – A Systematic Approach to Instrumental Analysis (Addison-Wesley Publishing Co., Reading, MA, 1960).
7. Ingle, J.D. and Crouch, S.R., Spectrochemical Analysis (Prentice-Hall, Upper Saddle River, NJ, 1988).
42
Analysis of Noise: Part 3
We have been discussing the question of how noise in a spectrometer affects the observed noise in the spectra we measure. This question was introduced [1] and various known phenomena were presented that contribute (or, at least, can contribute) to the noise level of the observed spectra. Since this is a continuation of the previous chapters, we will continue the numbering of equations, figures, and so on as though it were all one chapter. In Chapter 41, based on reference [2], we derived the following expression for the noise of a transmission measurement, for the case of constant detector noise, as is commonly found in IR and NIR spectrometers:

$$\mathrm{SD}(\Delta T) = \frac{\mathrm{SD}(\Delta E)}{E_r}\sqrt{1+T^2} \tag{42-19, also shown as 41-19}$$
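As an illustrative check of equation 42-19 (41-19), the sketch below simulates noisy sample and reference readings and compares the observed standard deviation of $T$ with the formula. The noise level (0.001 of the reference), the number of readings, and the seed are assumptions chosen so that the noise is small compared with $E_r$, which is the regime in which the formula was derived; none of these values comes from the original text.

```python
import math
import random


def sd_delta_t_formula(sd_e, e_r, t):
    # Equation 42-19: SD(dT) = (SD(dE) / Er) * sqrt(1 + T^2)
    return (sd_e / e_r) * math.sqrt(1.0 + t * t)


def sd_delta_t_simulated(sd_e, e_r, t, n=100_000, seed=1):
    # Add independent Gaussian detector noise of equal SD to both readings,
    # form the ratio, and compute the SD of the deviations from the true T.
    rng = random.Random(seed)
    e_s = t * e_r
    devs = []
    for _ in range(n):
        noisy_s = e_s + rng.gauss(0.0, sd_e)
        noisy_r = e_r + rng.gauss(0.0, sd_e)
        devs.append(noisy_s / noisy_r - t)
    m = sum(devs) / n
    return math.sqrt(sum((d - m) ** 2 for d in devs) / (n - 1))


for t in (0.0, 0.5, 1.0):
    print(t, sd_delta_t_formula(0.001, 1.0, t), sd_delta_t_simulated(0.001, 1.0, t))
```

Note that the simulated noise at $T = 0$ is not zero: it equals the detector noise relative to the reference level, and it grows by a factor of $\sqrt{2}$ as $T$ approaches unity.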
To continue the derivation, the next step is to determine the variation of the absorbance readings, starting with the definition of absorbance. The extension we present here, of course, is based on Beer's law, which is valid for clear solutions. For other types of measurements, diffuse reflectance for example, the derivation should be based on a suitable function of $T$ that applies to the situation; for diffuse reflectance, for example, the Kubelka-Munk function should be used:

$$A = -\log(T) \tag{42-20a}$$

$$A = -0.4343\,\ln(T) \tag{42-20b}$$

We take the derivative,

$$dA = -0.4343\,\frac{dT}{T} \tag{42-21}$$

and substitute the expressions for $T$ (Equation 41-6) and $dT$, replacing the differentials by finite differences so that we can use the expression for $\Delta T$ found previously (Equation 41-11):

$$\Delta A = \frac{-0.4343\left[\dfrac{E_r\,\Delta E_s}{E_r(E_r+\Delta E_r)} - \dfrac{E_s\,\Delta E_r}{E_r(E_r+\Delta E_r)}\right]}{E_s/E_r} \tag{42-22}$$

$$\Delta A = \frac{-0.4343\,E_r}{E_s}\left[\frac{E_r\,\Delta E_s}{E_r(E_r+\Delta E_r)} - \frac{E_s\,\Delta E_r}{E_r(E_r+\Delta E_r)}\right] \tag{42-23}$$

$$\Delta A = \frac{-0.4343\,E_r\left(E_r\,\Delta E_s - E_s\,\Delta E_r\right)}{E_r(E_r+\Delta E_r)\,E_s} \tag{42-24}$$

Again allowing ourselves to neglect $\Delta E_r$ in comparison with $E_r$:

$$\Delta A = \frac{-0.4343\left(E_r\,\Delta E_s - E_s\,\Delta E_r\right)}{E_s E_r} \tag{42-25}$$
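At this stage we have two branches of a derivation "tree" to pursue: one is to determine the standard deviation of $\Delta A$; the other is to continue the derivation toward the final result corresponding to the "standard" treatments of the topic, but using our rigorously derived equations.

We start with the computation of the standard deviation of $\Delta A$, which is straightforward. We cut the derivation short slightly, however, in that we apply the same sequence of steps that we previously applied to the case of $T$ [2], but present only the result of each step, not all the intermediate equations. These steps are:

- separating the fraction in equation 42-25 into two terms,
- taking the variance of both sides of the equation,
- noting that $\mathrm{Var}(\Delta E_s) = \mathrm{Var}(\Delta E_r) = \mathrm{Var}(\Delta E)$,
- applying the two theorems that tell us 1) $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ and 2) $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$,
- simplifying the expressions when possible, and then taking square roots again.

So we start by multiplying through and separating the fractions in equation 42-25:

$$\Delta A = \frac{-0.4343\,\Delta E_s}{E_s} + \frac{0.4343\,\Delta E_r}{E_r} \tag{42-26}$$

taking the variance of both sides of the equation:

$$\mathrm{Var}(\Delta A) = \mathrm{Var}\!\left(\frac{-0.4343\,\Delta E_s}{E_s} + \frac{0.4343\,\Delta E_r}{E_r}\right) \tag{42-27}$$

apply the theorem $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$:

$$\mathrm{Var}(\Delta A) = \mathrm{Var}\!\left(\frac{-0.4343\,\Delta E_s}{E_s}\right) + \mathrm{Var}\!\left(\frac{0.4343\,\Delta E_r}{E_r}\right) \tag{42-28}$$

and then the theorem $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$:

$$\mathrm{Var}(\Delta A) = \left(\frac{-0.4343}{E_s}\right)^2 \mathrm{Var}(\Delta E_s) + \left(\frac{0.4343}{E_r}\right)^2 \mathrm{Var}(\Delta E_r) \tag{42-29}$$

Let $\mathrm{Var}(\Delta E_s) = \mathrm{Var}(\Delta E_r) = \mathrm{Var}(\Delta E)$:

$$\mathrm{Var}(\Delta A) = \left(\frac{-0.4343}{E_s}\right)^2 \mathrm{Var}(\Delta E) + \left(\frac{0.4343}{E_r}\right)^2 \mathrm{Var}(\Delta E) \tag{42-30}$$

$$\mathrm{Var}(\Delta A) = \left[\left(\frac{-1}{E_s}\right)^2 + \left(\frac{1}{E_r}\right)^2\right] 0.4343^2\,\mathrm{Var}(\Delta E) \tag{42-31}$$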
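and finally:

$$\mathrm{SD}(\Delta A) = 0.4343\,\mathrm{SD}(\Delta E)\sqrt{\frac{1}{E_s^2} + \frac{1}{E_r^2}} \tag{42-32}$$

We may compare this with the $\mathrm{SD}(\Delta A)$ that would be obtained if $\Delta E_r$ were set to zero in equation 42-25 (as per the conventional derivation):

$$\mathrm{SD}(\Delta A) = \frac{0.4343\,\mathrm{SD}(\Delta E)}{E_s} \tag{42-33}$$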
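Equations 42-32 and 42-33 are simple enough to evaluate directly. The sketch below is only an illustration; the function names and the example values ($E_r = 1$, noise-to-signal 0.01) are our own choices. It shows that the two expressions agree at low transmittance and separate as $E_s$ approaches $E_r$.

```python
import math


def sd_abs_exact(sd_e, e_s, e_r):
    # Equation 42-32: both sample and reference noise retained.
    return 0.4343 * sd_e * math.sqrt(1.0 / e_s**2 + 1.0 / e_r**2)


def sd_abs_approx(sd_e, e_s):
    # Equation 42-33: reference noise neglected.
    return 0.4343 * sd_e / e_s


e_r, sd_e = 1.0, 0.01  # illustrative values
for t in (0.1, 0.5, 1.0):
    e_s = t * e_r
    exact = sd_abs_exact(sd_e, e_s, e_r)
    approx = sd_abs_approx(sd_e, e_s)
    print(f"T={t:4.2f}  exact={exact:.6f}  approx={approx:.6f}  ratio={exact / approx:.4f}")
```

Dividing equation 42-32 by equation 42-33 gives a ratio of $\sqrt{1 + T^2}$, so the conventional expression underestimates the absorbance noise by a factor that rises from 1 at $T = 0$ to $\sqrt{2}$ at $T = 1$.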
Since $E_s$ can go from zero to $E_r$, it is interesting and instructive to plot these two functions, in order to compare the effect of eliminating the terms involving $E_r$ from the expressions. We do this in Figure 42-2.

To continue on the second branch of our derivation "tree" as described above, we next derive expressions for relative precision, $\Delta A/A$, starting with the use of equations 42-20b and 42-25:

$$\frac{\Delta A}{A} = \frac{\dfrac{-0.4343\left(E_r\,\Delta E_s - E_s\,\Delta E_r\right)}{E_s E_r}}{-0.4343\,\ln(T)} \tag{42-34}$$

$$\frac{\Delta A}{A} = \frac{E_r\,\Delta E_s - E_s\,\Delta E_r}{E_s E_r \ln(T)} \tag{42-35}$$

$$\frac{\Delta A}{A} = \frac{1}{\ln(T)}\left(\frac{\Delta E_s}{E_s} - \frac{\Delta E_r}{E_r}\right) \tag{42-36}$$
[Figure 42-2 plot: "Exact versus approximate solution"; vertical axis: absorbance noise; horizontal axis: %T.]

Figure 42-2 Absorbance noise as a function of transmittance, for the exact solution (upper curve: equation 42-32) and the approximate solution (lower curve: equation 42-33). The noise-to-signal ratio, i.e., $\Delta E/E_r$, was set to 0.01. (see Color Plate 3)
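Again going through the steps needed to convert to the statistical domain (as we did before), we first take the variance of both sides of equation 42-36 to obtain

$$\mathrm{Var}\!\left(\frac{\Delta A}{A}\right) = \mathrm{Var}\!\left[\frac{1}{\ln(T)}\left(\frac{\Delta E_s}{E_s} - \frac{\Delta E_r}{E_r}\right)\right] \tag{42-36a}$$

Then apply the theorem $\mathrm{Var}(A + B) = \mathrm{Var}(A) + \mathrm{Var}(B)$:

$$\mathrm{Var}\!\left(\frac{\Delta A}{A}\right) = \mathrm{Var}\!\left(\frac{1}{\ln(T)}\,\frac{\Delta E_s}{E_s}\right) + \mathrm{Var}\!\left(\frac{1}{\ln(T)}\,\frac{-\Delta E_r}{E_r}\right) \tag{42-36b}$$

And then the theorem $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$:

$$\mathrm{Var}\!\left(\frac{\Delta A}{A}\right) = \left(\frac{1}{E_s \ln(T)}\right)^2 \mathrm{Var}(\Delta E_s) + \left(\frac{-1}{E_r \ln(T)}\right)^2 \mathrm{Var}(\Delta E_r) \tag{42-36c}$$

$$\mathrm{Var}\!\left(\frac{\Delta A}{A}\right) = \frac{1}{E_s^2 \ln(T)^2}\,\mathrm{Var}(\Delta E_s) + \frac{1}{E_r^2 \ln(T)^2}\,\mathrm{Var}(\Delta E_r) \tag{42-36d}$$

$$\mathrm{Var}\!\left(\frac{\Delta A}{A}\right) = \frac{1}{\ln(T)^2}\left[\frac{1}{E_s^2}\,\mathrm{Var}(\Delta E_s) + \frac{1}{E_r^2}\,\mathrm{Var}(\Delta E_r)\right] \tag{42-36e}$$

Then setting $\mathrm{Var}(\Delta E_s) = \mathrm{Var}(\Delta E_r) = \mathrm{Var}(\Delta E)$:

$$\mathrm{Var}\!\left(\frac{\Delta A}{A}\right) = \frac{1}{\ln(T)^2}\left[\frac{1}{E_s^2}\,\mathrm{Var}(\Delta E) + \frac{1}{E_r^2}\,\mathrm{Var}(\Delta E)\right] \tag{42-36f}$$

$$\mathrm{Var}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{Var}(\Delta E)}{\ln(T)^2}\left(\frac{1}{E_s^2} + \frac{1}{E_r^2}\right) \tag{42-36g}$$

$$\mathrm{Var}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{Var}(\Delta E)}{\ln(T)^2}\left(\frac{E_s^2 + E_r^2}{E_s^2 E_r^2}\right) \tag{42-36h}$$

And finally, taking square roots on both sides to convert to standard deviations, and substituting $E_s/E_r$ for $T$:

$$\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{-\mathrm{SD}(\Delta E)\sqrt{E_s^2 + E_r^2}}{E_s E_r \ln(E_s/E_r)} \tag{42-37}$$

We may compare this, for example, with the equation at an equivalent point in Ingle and Crouch's development [3] (taking that as a "typical" derivation):

$$\frac{\sigma_A}{A} = \frac{-s_t}{T E_r \ln T} \tag{Ingle and Crouch's equation 5-45}$$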
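To see how the two expressions behave, they can be tabulated over the transmittance range. The sketch below is an illustration under stated assumptions: it substitutes $E_s = T E_r$, expresses the noise as the ratio $\mathrm{SD}(\Delta E)/E_r$ (called `nsr` here, set to 0.01), takes the magnitude of $\ln T$ so that the result is positive, and treats $s_t$ in the conventional expression as the same detector noise. The grid search is our own device for locating the smallest value.

```python
import math


def rel_sd_exact(nsr, t):
    # Equation 42-37 with Es = T * Er; nsr = SD(dE) / Er.
    return nsr * math.sqrt(t * t + 1.0) / (t * abs(math.log(t)))


def rel_sd_approx(nsr, t):
    # Conventional form: reference noise neglected.
    return nsr / (t * abs(math.log(t)))


nsr = 0.01
grid = [i / 1000.0 for i in range(1, 1000)]  # T = 0.001 ... 0.999

t_min_exact = min(grid, key=lambda t: rel_sd_exact(nsr, t))
t_min_approx = min(grid, key=lambda t: rel_sd_approx(nsr, t))

print("T of minimum relative error, exact      :", t_min_exact)
print("T of minimum relative error, approximate:", t_min_approx)
```

On this grid the conventional expression is smallest near $T = 1/e$, while the exact expression is smallest at a somewhat lower transmittance.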
The relationship and differences between the two equations are obvious, except we may note that, while $\sigma$ can never be negative, there is always the issue, when taking a square root, of determining the sign. Since $E_s/E_r$ is less than unity, the logarithm in the denominator is negative, and therefore we must take the sign of the square root in the numerator to be negative as well in order to obtain a positive value for $\mathrm{SD}(\Delta A)$. Equation 42-37 then reduces to the Ingle & Crouch equation if $\Delta E_r$ goes to zero (as Ingle & Crouch assume) and we pass to the statistical domain. Again, it is interesting and instructive to compare the two expressions by plotting them as a function of $T$, which we do in Figure 42-3.

[Figure 42-3 plots: "Exact versus Approx Solution for SD [Δ(A)/A]" and "Expansion of SD [Δ(A)/A]"; vertical axis: Δ(A)/A; horizontal axis: %T; curves: Exact, Approx.]

Figure 42-3 Comparison of the exact (upper curve: equation 42-37) and approximate (lower curve: Ingle and Crouch equation 5-45) expressions for the standard deviation of $\Delta A/A$ as a function of %T. Noise-to-signal is set at 0.01.

From Figure 42-3 we also see the well-known effect on the relative precision of spectral analysis of, on the one hand, $T \to 0$ and, on the other, $\ln T \to 0$ as $T \to 1$. The minimum relative error occurs, in the standard treatment, at $T = 0.368$ [4]. Examining the data table from which Figure 42-3 was created (using EXCEL™) confirms what Figure 42-3 leads us to suspect: using the exact solution, there is a shift from the previously accepted value; the optimum value of transmittance occurs at 33.0%T rather than the generally accepted value of 36.8%T.

We wish to develop an analytic expression for this situation. To do so, we will follow the same steps used in the standard development, but use the rigorously correct equation (i.e., equation 42-37) instead of the approximate equation previously used. The steps are the standard ones used for finding a minimum (or maximum) of a function: take the derivative of equation 42-37, then set that derivative equal to zero. Since the derivative of interest is the derivative with respect to $T$, in preparation for this we reorganize equation 42-37 as follows: we substitute equation 41-6 (reference [2]), reorganized to $E_s = TE_r$ (since $E_r$ is a constant), into equation 42-37; this enables us to eliminate $E_s$ from the equation:
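$$\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)\sqrt{(TE_r)^2 + E_r^2}}{TE_r\,E_r \ln(T)} \tag{42-38}$$

$$\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)\sqrt{E_r^2\left(T^2 + 1\right)}}{TE_r^2 \ln(T)} \tag{42-39}$$

$$\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)\sqrt{T^2 + 1}}{TE_r \ln(T)} \tag{42-40}$$

We could work with equation 42-40, but it is instructive to slightly reorganize it:

$$\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)}{E_r}\,\frac{\sqrt{T^2 + 1}}{T \ln(T)} \tag{42-41}$$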
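$\mathrm{SD}(\Delta E)/E_r$ is, as before, the noise-to-signal ratio of the reference signal $E_r$. We can also note that if the variation of the reference reading were neglected, then the term under the radical would simply be unity and the expression would again reduce to the conventional expression. We are now ready to take the derivative with respect to $T$:

$$\frac{d}{dT}\,\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{d}{dT}\left[\frac{\mathrm{SD}(\Delta E)}{E_r}\,\frac{\sqrt{T^2 + 1}}{T \ln(T)}\right] \tag{42-42}$$

We note again that $\mathrm{SD}(\Delta E)/E_r$ is a constant:

$$\frac{d}{dT}\,\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)}{E_r}\,\frac{d}{dT}\left[\frac{\sqrt{T^2 + 1}}{T \ln(T)}\right] \tag{42-43}$$

Applying the theorem for the derivative of a ratio:

$$\frac{d}{dT}\,\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)}{E_r}\left\{\frac{T \ln(T)\,\dfrac{d}{dT}\sqrt{T^2 + 1} - \sqrt{T^2 + 1}\,\dfrac{d}{dT}\bigl(T \ln(T)\bigr)}{\bigl(T \ln(T)\bigr)^2}\right\} \tag{42-44}$$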
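Since the derivative in the first term in the numerator of equation 42-44 is of the form $U^n$, where $n$ has the value of 1/2, we apply the theorem that the derivative of $U^n$ is $nU^{n-1}\,dU$ to that part. And since the derivative in the second term of equation 42-44 is of the form $U \times V$, where $U = T$ and $V = \ln(T)$, we apply the theorem that the derivative of the product $U \times V$ is $U\,dV + V\,dU$ to that part. Then:

$$\frac{d}{dT}\,\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)}{E_r}\left\{\frac{\dfrac{T \ln(T)}{2\sqrt{T^2 + 1}}\,\dfrac{d}{dT}\left(T^2 + 1\right) - \sqrt{T^2 + 1}\left[T\,\dfrac{d}{dT}\ln(T) + \ln(T)\,\dfrac{d}{dT}T\right]}{\bigl(T \ln(T)\bigr)^2}\right\} \tag{42-45}$$

Now we can start simplifying (in several steps):

$$\frac{d}{dT}\,\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)}{E_r}\left\{\frac{\dfrac{T \ln(T)}{2\sqrt{T^2 + 1}}\,2T - \sqrt{T^2 + 1}\left[T\,\dfrac{1}{T} + \ln(T)\right]}{\bigl(T \ln(T)\bigr)^2}\right\} \tag{42-46}$$

$$\frac{d}{dT}\,\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)}{E_r}\left\{\frac{T^2 \ln(T) - \left(\sqrt{T^2 + 1}\right)^2\bigl(1 + \ln(T)\bigr)}{\sqrt{T^2 + 1}\,\bigl(T \ln(T)\bigr)^2}\right\} \tag{42-47}$$

$$\frac{d}{dT}\,\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)}{E_r}\,\frac{T^2 \ln(T) - \left(T^2 + 1\right)\bigl(1 + \ln(T)\bigr)}{\sqrt{T^2 + 1}\,\bigl(T \ln(T)\bigr)^2} \tag{42-48}$$

For comparison, we note that the corresponding equation from the conventional formulation is

$$\frac{d}{dT}\,\mathrm{SD}\!\left(\frac{\Delta A}{A}\right) = \frac{\mathrm{SD}(\Delta E)}{E_r}\,\frac{1 + \ln(T)}{\bigl(T \ln(T)\bigr)^2} \tag{42-49}$$

Now we set the derivative in equation 42-48 equal to zero and obtain

$$T^2 \ln(T) - \left(T^2 + 1\right)\bigl(1 + \ln(T)\bigr) = 0 \tag{42-50}$$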
This is a transcendental equation, which is not easily solved by ordinary methods. Nowadays, however, computers make the solution of such equations by successive approximations easy. In this case, again using EXCEL™, we find that the value of $T$ that makes the left-hand side of equation 42-50 become zero, and which thus gives the transmittance corresponding to minimum relative error, is 0.32994, rather than the previously accepted value of 0.368.

By now you probably think we are done. Not by a long shot! There is considerably more to learn about the effect of noise on a spectrum when the detector noise is constant, some of which is even more surprising than what we have seen until now. More to come in the next chapters. Stay tuned!
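The successive-approximation solution of equation 42-50 can be reproduced in a few lines. The sketch below uses plain bisection, which is our choice of method and not necessarily what was done in the spreadsheet; the bracketing interval 0.1 to 0.9 is likewise an assumption, chosen because the left-hand side changes sign between those values. Expanding the product shows that the left-hand side of equation 42-50 equals $-(\ln T + T^2 + 1)$, whereas the conventional equation 42-49 vanishes when $1 + \ln T = 0$, that is at $T = 1/e$.

```python
import math


def derivative_numerator(t):
    # Left-hand side of equation 42-50.
    return t * t * math.log(t) - (t * t + 1.0) * (1.0 + math.log(t))


def bisect(f, lo, hi, tol=1e-10):
    # Plain bisection: repeatedly halve an interval on which f changes sign.
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


t_opt_exact = bisect(derivative_numerator, 0.1, 0.9)
t_opt_conventional = math.exp(-1.0)  # root of 1 + ln(T) = 0

print("optimum T, exact treatment       :", round(t_opt_exact, 5))
print("optimum T, conventional treatment:", round(t_opt_conventional, 5))
```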
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 15(10), 24–25 (2000).
2. Mark, H. and Workman, J., Spectroscopy 15(11), 20–23 (2000).
3. Ingle, J.D. and Crouch, S.R., Spectrochemical Analysis (Prentice-Hall, Upper Saddle River, NJ, 1988).
4. Strobel, H.A., Chemical Instrumentation – A Systematic Approach to Instrumental Analysis (Addison-Wesley Publishing Co., Reading, MA, 1960).
43
Analysis of Noise: Part 4
This chapter is the continuation of a set [1–3] dealing with the rigorous derivation of the expressions relating instrument noise to its contribution to the spectra we observe. Our first chapter in this set was an overview; since then we have been analyzing the effect of noise on spectra when the noise is constant detector noise, that is, noise that is independent of the strength of the optical signal. Inasmuch as we are dealing with a continuous set of equations, we again continue our discussion by continuing our equation numbering, figure numbering, use of symbols, and so on as though there were no break.

We left off in Chapter 42, based on the original publication [3], with determining the sample transmittance corresponding to the best relative precision of a spectral measurement; we then noted that there is more to learn about noise effects on quantitative spectroscopic analysis. "What more is there?" you might ask. Well, in the previous chapters, we learned that the transmittance level affects the noise. In this chapter we will learn that the noise can also affect the transmittance. To see how, let us go to equation 41-14 (reference [2]), which we reproduce here, and note the discussion that followed it (which we won't reproduce here: the reader may go back to the original and reread it):

$$\mathrm{Var}(\Delta T) = \mathrm{Var}\!\left(\frac{E_r\,\Delta E_s}{E_r(E_r + \Delta E_r)}\right) + \mathrm{Var}\!\left(\frac{-E_s\,\Delta E_r}{E_r(E_r + \Delta E_r)}\right) \tag{41-14}$$
Basically, the development of the mathematical derivations from Chapter 41 (Equations 41-15 onward) was based on the assumption that in Equation 41-14, $\Delta E_r$ was small compared to $E_r$ so that it could be ignored; this was done for several reasons, one being that it allowed considerable simplification of the equations, which was pedagogically useful. More significant and fundamental is that it represents a limiting case of the situation.

But suppose the noise is not small enough to be ignored, that is, it is not small compared to $E_r$? Then we cannot ignore it, or its effect on the equations. As we might expect, it also complicates the analysis of the situation enormously. We mentioned at that time that we would discuss that situation in due course, and the time has now come to do so. Normally, mathematical derivations are done the other way round: the full equations are developed first and then the special cases described and their effect on the equations worked through. But we chose to do it "backwards", so to speak, because we felt it is more pedagogically effective that way; and it allows our readers to follow along with us in the simpler situations, before becoming immersed in the full complexities of the equations.
Besides, that is the way we like to do things.

As we will see, there are significant consequences of non-negligible noise. To start our discussion we will go back even farther than equation 41-14 and start with equation 41-5 (reference [2]), which we again reproduce here:

$$T + \Delta T = \frac{E_s + \Delta E_s}{E_r + \Delta E_r} \tag{Equation 41-5 from Chapter 41}$$

This can be separated into two terms:

$$T + \Delta T = \frac{E_s}{E_r + \Delta E_r} + \frac{\Delta E_s}{E_r + \Delta E_r} \tag{43-51}$$

so that now we can equate corresponding terms on the two sides of the equation:

$$T_M = \frac{E_s}{E_r + \Delta E_r} \tag{43-52a}$$

$$\Delta T = \frac{\Delta E_s}{E_r + \Delta E_r} \tag{43-52b}$$
where $T_M$ represents the measured transmittance value of a reading subjected to noise. So now we see that equation 43-52a represents the computed transmittance of the reading, and equation 43-52b represents the deviation of that transmittance due to noise. We will address the possibility of a contribution of equation 43-52b to the computed value of $T_M$ a bit later in this chapter. We will also occasionally use the term $f(\Delta E_r)$ to refer to the expression on the right-hand side of equation 43-52a.

Upon averaging several values of equation 43-52a, the fact that the noise is in the denominator causes the average value of the effect of the noise not to approach zero, and therefore averaging several values of $T$ will result in a computed value different from the actual value of $E_s/E_r$. This is because division is a non-linear arithmetic operation. To illustrate this effect, we will use a numerical example, and consider two readings of $T$ with values of $\Delta E_r$ equal to 0.2 and −0.2 times $E_r$ (remember that we are dealing specifically with the case where the noise is not negligibly small compared to the signal); this will make the "noise" symmetrical around $E_r$. Then, the general formula for the average value of $T$ computed will be

$$\overline{T} = \frac{1}{n}\sum_{i=1}^{n}\frac{E_s}{E_r + \Delta E_i} \tag{43-53}$$

and for two readings as we described this becomes:

$$\overline{T} = \frac{1}{2}\left(\frac{E_s}{E_r + \alpha E_r} + \frac{E_s}{E_r - \alpha E_r}\right) \tag{43-54}$$

where $\alpha$ represents the fractional change of the measurement. For our example, the specific value of $\alpha = 0.2$:

$$\overline{T} = \frac{1}{2}\left(\frac{E_s}{E_r + 0.2 \times E_r} + \frac{E_s}{E_r - 0.2 \times E_r}\right) \tag{43-55}$$

$$\overline{T} = \frac{E_s}{2E_r}\left(\frac{1}{1 + 0.2} + \frac{1}{1 - 0.2}\right) \tag{43-56}$$

$$\overline{T} = \frac{E_s}{2E_r}\left(0.833333333 + 1.25\right) \tag{43-57}$$

$$\overline{T} = 1.0416666\,\frac{E_s}{E_r} \tag{43-58}$$
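The arithmetic of equations 43-54 to 43-58 generalizes to any fractional deviation $\alpha$ below unity. The short sketch below (the function name is ours) evaluates the factor that multiplies $E_s/E_r$; algebraically that factor is $1/(1 - \alpha^2)$, which makes plain why it grows without limit as $\alpha$ approaches 1.

```python
def relative_mean_t(alpha):
    """Factor multiplying Es/Er in equation 43-54 for readings at Er*(1 +/- alpha)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return 0.5 * (1.0 / (1.0 + alpha) + 1.0 / (1.0 - alpha))


for alpha in (0.0, 0.1, 0.2, 0.5, 0.9):
    print(f"alpha={alpha:3.1f}  relative mean T={relative_mean_t(alpha):.7f}")
```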
Thus we see that, even though the noise values of the reference readings are equally distributed around their mean value of $E_r$, their effect on the computed value of transmittance is not symmetrically distributed, due to the nonlinearity of the division process, resulting in a change of the computed value from the (in this case, known) true value. Smaller variations will show smaller effects and larger variations will show larger effects (i.e., change the computed value of $T$ less or more than the amount shown). The relative effect, for this somewhat artificial case, is shown in Figure 43-4. As the noise becomes larger and larger compared to the value of the reference signal, the second term in equation 43-54 becomes more and more dominant. Therefore we cannot allow the "noise" to equal the signal value, since in that case the denominator of the second term would become zero and the "average" value of $T$ would be infinite. This concern will occur again as we continue discussing this situation.

One obvious consequence of this is that if data are to be coadded, the coaddition of sample and reference signals should be done individually, before the computation of $T$, rather than computing $T$ for each reading and then averaging the several values of $T$ together.

An interesting side note here: in the real world there is nothing to prevent the noise from becoming greater than the signal (except for the alertness of the spectroscopist doing the work!), thus it is entirely possible for the measured value of the reference reading to become arbitrarily small and the computed value of $T$ to become arbitrarily large. Presumably, any spectroscopist will recognize that such data have no meaning. However, we here find ourselves in the quite unusual situation of allowing the mathematics to limit the extent of our analysis, rather than "real world" considerations, just the reverse of what usually happens.
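The recommendation about the order of coaddition can be illustrated with a small simulation. This is a sketch under assumptions of our own: the noise is drawn from a uniform distribution of half-width 0.3 (so that a noisy reference reading can never reach zero), the true transmittance is 0.5, and 16 readings are coadded per measurement. None of these values comes from the original text.

```python
import random


def compare_coaddition(e_s=0.5, e_r=1.0, half_width=0.3, n_readings=16,
                       n_trials=5000, seed=2):
    """Return (mean of per-reading ratios, mean of ratio of coadded signals)."""
    rng = random.Random(seed)
    sum_mean_of_ratios = 0.0
    sum_ratio_of_means = 0.0
    for _ in range(n_trials):
        s = [e_s + rng.uniform(-half_width, half_width) for _ in range(n_readings)]
        r = [e_r + rng.uniform(-half_width, half_width) for _ in range(n_readings)]
        # T computed for every reading, then averaged.
        sum_mean_of_ratios += sum(si / ri for si, ri in zip(s, r)) / n_readings
        # Sample and reference coadded first, then a single T computed.
        sum_ratio_of_means += sum(s) / sum(r)
    return sum_mean_of_ratios / n_trials, sum_ratio_of_means / n_trials


per_reading, coadded = compare_coaddition()
print("true T                        : 0.5")
print("T averaged reading by reading :", per_reading)
print("T from coadded signals        :", coadded)
```

With these settings, averaging the individual ratios leaves a bias of a few percent above the true value, while coadding before dividing brings the result much closer to it.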
It is even possible for an individual noise pulse to exceed $-E_r$ so that a negative reading of $E_r$ will be obtained. This happens in the real world and therefore must be taken into account in the mathematical description. This is a good place to also note that since the transmittance of a physical sample must be between zero and unity, $E_s$ must be no greater than $E_r$, and therefore when $E_r$ is small an individual reading of $E_s$ can also be negative. Therefore it is entirely possible for an individual computed value of $T$ to be negative.

Now, while we are concerned in these chapters with a thorough analysis of these effects, in practice this is usually not too serious a problem, for several reasons. The first reason is that if the data are noisy and need to be coadded to reduce the noise level, the coadding is normally done in accordance with our recommendation above: before the computation of $T$. Therefore the error of the values of $E_s$ and $E_r$ is reduced before the computation of $T$ is performed, thus keeping it out of the regime where the nonlinear effect becomes important.

The second reason is that under normal measurement conditions, the only place where such a high N/S ratio is liable to occur will be at the ends of the spectral range, where $E_r$ is becoming very small. Here, however, the effect will be masked by other effects contained in the data, such as the effect of small changes in source intensity, external interference or, in the case of FTIR, interferometer misalignment, or any of several other effects that change the actual values of reference and sample energy at the limits of the spectral range.

On the other hand, if the measurement situation is such that the reference energy is small and cannot be increased (e.g. outdoor open-air monitoring, or insufficient time available for coaddition of data), so that the noise level is an appreciable fraction of the reference signal, then this phenomenon can become important.

[Figure 43-4 plots: "Relative computed transmittance" and "Expansion of plot"; vertical axis: relative increase; horizontal axis: α.]

Figure 43-4 The relative change in computed value of $T$ from equation 43-53 for various values of $\alpha$.

Now it is time to examine the effect of a more realistic type of noise than we have been considering so far. In a real situation, of course, where many readings may be averaged together, some will contain small errors and some will contain large errors, each one making its nonlinear contribution to the value of $T$. Obviously, only one average value of $T$ will be computed from the data. The net effect on the value of $T$ computed from many readings will thus depend not only on the standard deviation of $\Delta E_r$ compared to $E_r$, but also on how many readings there are of each value of $\Delta E_r$, that is, on the distribution of the values of $\Delta E_r$. Statisticians call this average of many values of a quantity the "expected value" of that quantity. For many reasons that we have discussed previously [4], the Normal distribution is the one that inevitably occurs in nature when there is no overriding factor to change it; therefore it is the one we consider.

How do we determine the effect of using the Normal distribution? Basically, what we want to do is find the average value for many readings, when we know how often each reading occurs; after all, that is the meaning of a distribution. This would be the expected value. If we had discrete readings, we would let $W_i$ represent the weight of the $i$th value, that is, how often that value occurs, and $X_i$ represent the value, then use the formula for a weighted average:

$$\overline{X}_W = \frac{\sum_i W_i\,F(X_i)}{\sum_i W_i} \tag{43-59}$$
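Equation 43-59 translates directly into code. In the sketch below, the three deviation values and their weights are hypothetical numbers chosen only for illustration, and $F$ is taken to be $1/(1 + x)$, i.e. the factor by which a fractional reference deviation $x$ changes the computed transmittance.

```python
def weighted_mean(values, weights, f=lambda x: x):
    """Weighted average of f(X_i) with weights W_i, as in equation 43-59."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * f(x) for x, w in zip(values, weights)) / total_weight


# Hypothetical fractional deviations of the reference reading and how often each occurs.
deviations = [-0.2, 0.0, 0.2]
occurrences = [1, 2, 1]

relative_t = weighted_mean(deviations, occurrences, f=lambda x: 1.0 / (1.0 + x))
print("weighted mean of the deviations     :", weighted_mean(deviations, occurrences))
print("weighted mean of 1 / (1 + deviation):", relative_t)
```

The deviations themselves average to zero, but the weighted mean of the nonlinear function of them does not average to unity.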
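The Normal distribution, however, is a continuous distribution, as is the distribution of values of $E_s/(E_r + \Delta E_r)$; therefore we have to change the summations to integrations:

$$\overline{X}_W = \frac{\int W(x)\,f(x)\,dx}{\int W(x)\,dx} \tag{43-60}$$

and, since in this case $W(x)$ represents the Normal distribution weighting:

$$\frac{1}{\sigma(2\pi)^{1/2}}\,e^{-\frac{1}{2}\left(\frac{\Delta E_r - \overline{\Delta E_r}}{\sigma}\right)^2}$$

which specifies the relative weights of the different values, we replace $W(x)$ with the expression for the Normal distribution, and $f(x)$ is the function $E_s/(E_r + \Delta E_r)$, so that equation 43-60 becomes

$$\overline{T}_{WN} = \frac{\dfrac{1}{\sigma(2\pi)^{1/2}}\displaystyle\int_{-\infty}^{\infty}\frac{E_s}{E_r + \Delta E_r}\,e^{-\frac{1}{2}\left(\frac{\Delta E_r - \overline{\Delta E_r}}{\sigma}\right)^2}\,d\Delta E_r}{\dfrac{1}{\sigma(2\pi)^{1/2}}\displaystyle\int_{-\infty}^{\infty}e^{-\frac{1}{2}\left(\frac{\Delta E_r - \overline{\Delta E_r}}{\sigma}\right)^2}\,d\Delta E_r} \tag{43-61}$$

where $\sigma$ is the standard deviation of the variations of the energy readings and $\overline{T}_{WN}$ represents the mean computed transmittance for Normally distributed detector noise. Since the normalization factor in front of the integral representing the Normal distribution in the denominator is intended to make the final value of the integrated Normal distribution be unity, the denominator of equation 43-61 is therefore unity, hence:

$$\overline{T}_{WN} = \frac{1}{\sigma(2\pi)^{1/2}}\int_{-\infty}^{\infty}\frac{E_s}{E_r + \Delta E_r}\,e^{-\frac{1}{2}\left(\frac{\Delta E_r - \overline{\Delta E_r}}{\sigma}\right)^2}\,d\Delta E_r \tag{43-62}$$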
A plot presenting the two parts of the integrand, and their product, is shown in Figure 43-5.

[Figure 43-5 plots: "Integration terms" and "Expansion of integral functions"; curves: f(ΔE_r), Normal distribution, Product; horizontal axis: ΔE_r.]

Figure 43-5 The Normal curve, the function $f(\Delta E_r)$ [$= E_r/(E_r + \Delta E_r)$] from equation 43-62 and their product. (see Color Plate 4)

We made an attempt to perform the integration analytically, which failed. While that approach may still be possible, it does not seem likely, for a couple of reasons. The difficulty arises from two sources. One is the general difficulty of integrating the Normal distribution (the integral is sometimes called the Error Function, for obvious reasons). The other is that the Normal distribution is infinite in extent, and therefore, regardless of the value of $E_r$ or of the standard deviation being represented by the particular Normal distribution in use, there will inevitably be a point at which the term $E_s/(E_r + \Delta E_r)$ in equation 43-62 attains a value of infinity (when $\Delta E_r = -E_r$). While this in itself does not automatically preclude performing the integration, or prevent the integral from having a finite value, it points to a problem area, one which indicates that if the integral can be evaluated at all, it will require special methods, as the evaluation of the error function itself does.

Now in fact, all this is also in accord with reality: an attempt to use data in which the reference energy becomes so small that the noise brings even a single reading down to zero will cause the computed value corresponding to that reading to become infinite; then, averaging that with any finite number of other finite values will still result in an infinite value for the computed value of $T$. This is, of course, catastrophic to our attempt to deal with this situation analytically.

Another point to note: if we look at equation 43-62 critically, we note that the variables are not completely separable. While we can remove $E_s$ from inside the integration, $E_r$ is not so easily removed. How, then, can we determine the effect on the computed value of $T$? One way is to multiply the right-hand side of equation 43-62 by unity, in the form of $E_r/E_r$; this leads to

$$\overline{T}_{WN} = \frac{E_s}{E_r}\,\frac{1}{\sigma(2\pi)^{1/2}}\int_{-\infty}^{\infty}\frac{E_r}{E_r + \Delta E_r}\,e^{-\frac{1}{2}\left(\frac{\Delta E_r - \overline{\Delta E_r}}{\sigma}\right)^2}\,d\Delta E_r \tag{43-63}$$
which now puts the expression into the form of the ratio of the measured values of Es and Er , with a multiplier. It also, perhaps, makes what is going on somewhat clearer: in the limit of small values of Er the base expression reduces to Er /Er which is unity; the integral then reduces to the ordinary Normal distribution, which, as we noted, also evaluates to unity, so that in the limit of small levels of noise, T becomes Es /Er , as it should. However, we still have that pesky Er inside the integral. As we might expect, the effect of the noise, Er , is really going to be affected by its relationship to Er , the signal strength. The overall noise value is contained in the exponent of the Normal distribution weighting factor, but its presence in the first part of the integral indicates that it has more than just that effect. Thus, if we try to determine the effect of changing the signal-to-noise ratio, at constant noise level, by changing Er , we must realize that Er then becomes a parameter affecting the value of the integral. Therefore in order to represent the effect of varying the signal-to-noise in this regime, we will require a family of curves rather than just a single one. Since we have seen that the integral cannot be evaluated analytically, there are several alternatives to analytic integration of equation 43-63: we can perform the integration numerically, we can investigate the behavior of equation 43-63 using a Monte-Carlo simulation, or we can expand equation 43-63 into a power series. In all cases we need to take at least a brief look at what happens when Er is close to the asymptote at −Er ; basically, it goes off to +infinity when approaching from above (as we saw), and to −infinity when approaching from below. If we do not try to compute values when we are too close to −Er , therefore, using either approach there will be a tendency toward cancellation of the positive and negative terms, leaving a finite result. 
In the case of a power series expansion, the closer we come to unity, the more terms we would need to include in the series. We now report on the evaluation of the integral in equation 43-63, which was done numerically by computer, using MATLAB. Here we examine the conditions and the results obtained for this exercise. Before attempting to evaluate the integral, we first tested for convergence, that is, that the integral is finite, and also that when evaluating it we are using a sufficiently fine interval of integration to provide accurate results. To do this, we evaluated the integral for a small region around the point Er = 0, using different values of the integration interval. The integration range was −0.01 to +0.01. Integration intervals ranged from 10^-2 to 10^-7. The standard deviation of the Normal distribution was set to unity (note that we will eventually investigate the behavior of equation 43-63 for various values of the standard deviation, so that at this point setting it equal to unity is convenient for pedagogical purposes, and also for a quick “ballpark” evaluation of equation 43-63 for other values of this parameter), and the mean of the Normal distribution was also set equal to unity. Since the section of the Normal distribution that is 1 standard deviation away from the mean is the region that has the maximum slope, these conditions gave the maximum weight to the region around the infinity of f(ΔEr); thus if the integral did not diverge here it would not diverge at any other point of the Normal distribution. The results are in Table 43-1.
Table 43-1 Values of the integral between −0.01 and 0.01, for various values of the integration interval
Integration interval    Value of integral
10^-2                   0.012130208846544832
10^-3                   0.012130457519586295
10^-4                   0.012130476382397228
10^-5                   0.012130478208633820
10^-6                   0.012130478390650151
10^-7                   0.012130478408845785
Since the value of the integration interval also determines how close to the point of infinity any contribution may be, presumably, if the integral were to diverge, what we would see around the point of infinity would be contributions to the integral increasing faster and faster as the computation included points closer to the infinity. Under those circumstances, we would observe an increasing value of the integral as we used finer and finer intervals of integration. What we see in Table 43-1, on the other hand, is that as we use smaller intervals of integration, more digits of the integral remain stable; thus we conclude that the integral does indeed seem to be converging on a finite value. We also observe that using an integration interval of 10^-4 provides precision on the order of one part in 10^7, which is more than sufficient for our purposes.
For the full computation, the range of integration was set to be wide enough (10 standard deviations) that, at the number of iterations we used, there is no further appreciable contribution to the integral from values beyond that range; the value of the Normal distribution at 10 standard deviations is approximately 2×10^-22. The integral is computed for various values of Er, each set of such integrals forming one curve that we will plot. The family of curves is generated by using various values of sigma (σ, the standard deviation of the readings due to detector noise).
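The convergence test summarized in Table 43-1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the integrand used here, exp(−(x − 1)²/2)/x with the singular point x = 0 omitted from the sum, is our inference from the tabulated values (it reproduces them), and equation 43-63 itself is not repeated in this passage. The function name `pv_integral` is hypothetical, and the original computation was done in MATLAB, not Python.

```python
import numpy as np

def pv_integral(step, half_width=0.01, mean=1.0, sigma=1.0):
    """Rectangle-rule sum of exp(-(x - mean)^2 / (2 sigma^2)) / x over
    [-half_width, +half_width], leaving out the singular point x = 0.

    ASSUMPTION: this integrand is inferred from the values in Table 43-1;
    it is not quoted from equation 43-63.
    """
    n = int(round(half_width / step))
    x = np.arange(1, n + 1) * step  # grid points on the positive side

    def f(v):
        return np.exp(-0.5 * ((v - mean) / sigma) ** 2) / v

    # Add each point to its mirror image so that the large positive and
    # negative contributions near x = 0 cancel pair by pair.
    return float(np.sum(f(x) + f(-x)) * step)

for exponent in range(2, 8):
    step = 10.0 ** -exponent
    print(f"10^-{exponent}: {pv_integral(step):.18f}")
```

As the interval is reduced, more leading digits stay fixed, which is the behavior the table is used to demonstrate.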
For our demonstration, we compute the curve of multiplication factor versus Er for values of sigma of 0.1, 0.2, ..., 1.0. The point at ΔEr = −Er, with the infinite value, was deleted from the set before adding the terms of the integral. Since we are using the Normal distribution, we take this opportunity to point out some of the other characteristics of the error, in particular the fact that the errors have a mean value of zero. The multiplication factor according to the integral of equation 43-63 was computed, and the family of curves is presented in Figure 43-6. Interestingly, while the values of individual computations of the multiplication factor for a finite number of discrete points can reach large values, as we saw above, we find that the expected value of the multiplication factor reaches a maximum value at a modest level, and then approaches zero as Er approaches zero. The explanation is that at large values of the reference signal strength, Er, where the noise becomes small compared to the signal, the multiplication factor approaches unity, so that the computed value of TW approaches Es/Er, as we
would expect. As the reference signal strength decreases so that it becomes comparable to the noise level, occasional individual data points will be measured in the regime where the nonlinearity of the division process becomes important; this nonlinearity then causes the computed value of T to be higher than the value computed under strong-signal (i.e., low-noise) conditions. When Er approaches zero, however, the Normal curve then allows occasional negative values to be included in the integral, and more and more often as the reference signal strength decreases further. In reality, noise can indeed cause an apparent negative value of Er, which would result in a negative value for the computed quantity T, even though it is a mathematical artifact and cannot correspond to an actual negative value for the physical property, T. In the limit of the reference signal strength approaching zero, there will be equal contributions of negative and positive excursions from zero, so that the average value will be zero. Since the sample signal strength must be less than the reference signal strength, the same thing is happening to Es, the sample signal, so that in fact the computation would assume the undefined form of 0/0. Examining Figure 43-6, however, shows that the limiting value of T as Er approaches zero is also zero. The family of curves obtained, and presented in Figure 43-6, shows that, not surprisingly, the controlling parameter of the family of curves is the standard deviation of the noise; the maximum value of the multiplication factor occurs at a given multiple of the standard deviation of the energy readings. Successive approximations show that the maximum multiplier of approximately 1.28 occurs when Er is approximately 2.11 times sigma, the standard deviation of ΔEr. Some miscellaneous questions arise, which we address here: First of all, since the value of a reading can become infinite, why is the integral finite and well-behaved?
The answer is that while a single reading can indeed become large beyond all bounds as ΔEr approaches −Er, the probability of obtaining a value closer and closer to exactly −Er becomes smaller and smaller, and the probability of
[Figure 43-6 plot: “Multiplication factor for T as a function of Er”. Vertical axis: multiplication factor, 0 to 1.4. Horizontal axis: Er, 0 to 4.84. Curves labeled from σ = 0.1 to σ = 1.0.]
Figure 43-6 Family of curves of multiplication factor as a function of Er , for different values of the parameter sigma (the noise standard deviation), for Normally distributed error. Values of sigma range from 0.1 to 1.0 for the ten curves shown. (see Color Plate 5)
being exactly −Er is exactly zero; therefore in reality an infinity will not occur. Hence the integral, representing the average of what will actually occur, remains finite. There are other factors, also. One factor is that, as we consider two values of ΔEr at equal distances on opposite sides of −Er, we realize that as the two values get closer to −Er there is less room for the nonlinearities to act; therefore the magnitudes of the two values of f(ΔEr) become more and more nearly the same, and since they have opposite signs they cancel each other more and more exactly.
Secondly, why do the curves pass through a maximum and then go to zero as Er approaches −1? If we look at Figure 43-5, and particularly at the expanded plot, we see that the asymmetry of the Normal curve with respect to the function f(ΔEr) causes the cross-product of the two curves (which, after all, is what is being integrated) to exhibit a fairly large area between the peak of the Normal curve and where the curve of f(ΔEr) really “takes off” that has no counterpart in the region where f(ΔEr) is negative. This creates a net positive contribution to the integral. As Er approaches −1, the Normal curve “slides under” f(ΔEr), and there is an increasing contribution from the negative portion of f(ΔEr), until symmetry assures us that when Er = −1 there is always a negative contribution of f(ΔEr) to cancel each positive contribution, so that TW = 0 at that point.
Thirdly, when we separated equation 43-51 into two terms, we only worked with the first term. The second term, which we presented in equation 43-52B, was neglected. Is it possible that the nonlinear effects observed for equation 43-52A will also operate on equation 43-52B? The answer is yes, it will, but ... And the “but” is this: ΔEs is a random variable, just as ΔEr is. Furthermore, it is uncorrelated with ΔEr.
Therefore, in order to evaluate the integral representing the variation of both ΔEs and ΔEr, it would be necessary to perform a double integration over both variables. Now, for each value of ΔEs, the nonlinearity caused by the presence of ΔEr in the denominator would apply. However, ΔEs is symmetrically distributed around zero; therefore for every positive value of ΔEs there is an equal but negative value that is subject to exactly the same nonlinear effect. The net result is that these pairs always form equal and opposite contributions to the integral, which therefore cancel, leaving no effect due to ΔEs. We have analyzed the effect that noise has on the computed transmittance, just as we previously analyzed the effect that the sample transmittance has on the computed noise value. We can experimentally measure the variation in noise level due to the sample transmittance. On the other hand, we will not be able to realize the effect of noise on the computed transmittance, for reasons we will discover in our next chapter, which will deal with the noise of the transmittance when the energy is low, or the noise is high, so that again we cannot make the “low noise” approximation we made previously.
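The family of curves in Figure 43-6 can be approximated with a short numerical sketch. It assumes that the multiplication factor is the expected value of Er/(Er + ΔEr) for Normally distributed ΔEr with zero mean, which is how the text describes it; the exact form of equation 43-63 is not repeated here. The function name `multiplication_factor`, the step size and the grid limits are illustrative choices, not the authors' MATLAB settings. The grid is built symmetrically about the singular point ΔEr = −Er, and that point is left out, as described above.

```python
import numpy as np

def multiplication_factor(er, sigma=1.0, step=1e-3, span=10.0):
    """Approximate expected value of Er / (Er + dEr), dEr ~ Normal(0, sigma).

    ASSUMPTION: this is our reading of the quantity plotted in Figure 43-6.
    The grid is symmetric about dEr = -Er and omits that singular point.
    """
    n = int(round((span * sigma + er) / step))
    k = np.arange(1, n + 1) * step            # distances from the singular point
    offsets = np.concatenate((-k[::-1], k))   # values of Er + dEr, never zero
    d_er = offsets - er
    weight = np.exp(-0.5 * (d_er / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return float(np.sum(er / offsets * weight) * step)

# Scan Er (in units of sigma) and locate the maximum of the curve.
er_values = np.arange(0.05, 5.0, 0.01)
curve = np.array([multiplication_factor(er) for er in er_values])
peak = int(np.argmax(curve))
print(f"maximum multiplier {curve[peak]:.3f} at Er = {er_values[peak]:.2f} sigma")
```

With sigma set to unity the scan is in units of the noise standard deviation, so a single curve stands in for the whole family; other values of sigma rescale the Er axis.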
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 15(10), 24–25 (2000).
2. Mark, H. and Workman, J., Spectroscopy 15(11), 20–23 (2000).
3. Mark, H. and Workman, J., Spectroscopy 15(12), 14–17 (2000).
4. Mark, H. and Workman, J., Spectroscopy 3(1), 44–48 (1988).
44
Analysis of Noise: Part 5
This chapter is the continuation of Chapters 40–43 from a set of articles [1–4] dealing with the rigorous derivation of the expressions relating instrument (and other) noise to its effects on the spectra we observe. Chapter 40 in this set was an overview; since then we have been analyzing the effect of noise on spectra, when the noise is constant detector noise, that is, noise that is independent of the strength of the optical signal. Inasmuch as we are dealing with a continuous set of chapters (40 through 53) on the same subject, we continue our discussion by serially numbering our equations, figures, use of symbols, and so on, as though there were no break across these chapters.
It seems we said something wrong. When we first began this series of chapters (starting at 40) dealing with the effects of various kinds of noise on spectra [1, 2], we said that there does not seem to have been any recent attention paid to the question of noise in spectra. It turns out that that is not quite true. Edward Voigtman pointed out that in fact, he had performed and published computer simulation studies of just this subject [5, 6]. His studies were based on computer simulations of the behavior of various analytical instruments in various situations using a simulation engine described in an Analytical Chemistry Report [7]. In addition to the simulations of spectrometers, he also published simulations of polarimeters [8, 9] with results that are interesting, if not of direct application to our current study. The diagrams he published [5] clearly show the difference in the optimum absorbance values (i.e., minimum relative absorbance error) between these simulations and the conventional theory in use previously. Unfortunately the noise levels of the simulations were too high to precisely determine the actual minimum. When Dr.
Voigtman contacted us to inform us of these papers, we discussed the results he obtained, and he revealed that due to the limitations of the computer hardware available at the time the simulations were performed, he could not use more than a few hundred repeats of the Monte-Carlo experiments, resulting in the high noise levels observed. Having seen our early Chapters 40 and 41 dealing with this topic from the papers first published [1, 2], he reprogrammed his simulation engine to perform new simulations and compared the results with the exact solution we derived (see equation 41-19 [2]), and with new hardware allowing use of much more extensive Monte-Carlo calculations, he found excellent agreement (E. Voigtman, 2001, personal communication). We are grateful to Dr. Voigtman for pointing out the previous literature that we had missed, as well as sharing the results of his new simulations with us.
Now let us recap where we came from in our discussion, in this mini-series-within-a-book, and where we are going. In Chapter 41, referenced in [2], we demonstrated that, because previous treatments of this topic failed to take into account the effect of the noise of the reference reading, they did not come up with the rigorously correct formula to describe the effect of transmittance on the computed value of the noise. The rigorously exact solution to this situation shows that the noise level of a transmittance
spectrum increases with the transmittance of the sample, rather than being independent of the sample characteristics, as previously thought. We then continued the development of those equations in Chapter 42 [3] to show the effect of the random noise on absorbance spectra, and on the relative precision, SD(ΔA)/A, in both cases comparing the result of the rigorous treatment of the topic to the previous mathematical analysis, and showing that in both cases, the results from the rigorous treatment differ slightly but noticeably from the previous results. Finally we developed and solved the equations for the minimum in the curve of SD(ΔA)/A, this being the generally accepted criterion for determining the best value of transmittance (or absorbance) that a sample should have, to obtain the most accurate results from this form of spectroscopic chemical analysis. Our conclusion here was that the optimum value of transmittance under these conditions, that is, constant detector noise, is approximately 33 %T rather than the previously accepted 36.8 %T.
We next noted in Chapter 43 [4] that all the results obtained up until that point were relevant only to the condition where the detector noise was small compared to the reference signal, and therefore the S/N ratio was high. We then noted that if that condition did not hold for any particular set of measurements, then other phenomena also come into action. We then pointed out that under low-noise conditions the signal can affect the noise level, but under conditions where the signal is weak or the noise excessive, the noise can affect the computed transmittance, as well.
The expressions we obtained showed that as the reference signal gets weaker and weaker (or the noise gets larger and larger), the system first reaches a point where the expected value of T is larger than Es/Er, and as the reference signal continues to decrease, the multiplying factor first goes through a maximum and then decreases, so that the expected value of T approaches zero as the reference signal energy, Er, approaches zero. We are now ready in this chapter to consider the behavior of the noise under conditions where it is not small compared to the signal. We start with the definition of transmittance, as we pointed out previously, and we rewrite the equation here:
$$T = \frac{E_s}{E_r} \tag{44-6}$$
To put equation 44-6 into a usable form under the conditions we wish to consider, we could start from any of several points of view: the statistical approach of Hald (see [10], pp. 115–118), for example, which starts from fundamental probabilistic considerations and also derives confidence intervals (albeit for various special cases only); the mathematical approach (e.g., [11], pp. 550–554); or the Propagation of Uncertainties approach of Ingle and Crouch ([12], p. 548). Inasmuch as any of these starting points will arrive at the same result when done properly, the choice of how to attack an equation such as equation 44-6 is a matter of familiarity, simplicity and, to some extent, taste. At this point, however, we again need to take cognizance of comments we received after the material of this chapter was published as a column. One of our respondents noted that the analysis performed could be done in a different way, a way which might be superior to the way we did it. Normally, if we agree with someone who takes issue with our work we would simply publish a correction, or, when rewriting the material for this book, use the corrected form (as we have done in various places). In this case, however, that seems inappropriate, for several reasons. First, we are not convinced that
our original approach is “wrong”, therefore we wish to retain it. Secondly, some of our readers may wish to refresh themselves about our original material. Thirdly, some of our readers may wish to compare the two approaches for themselves, to decide if the original one is “wrong” or simply “not as good”, or whether, in fact, the new analysis is better. Therefore we present, at this point, the original analysis of the situation, the same way it was presented in the original column except, perhaps, for some minor enhancements in the wording to improve the comprehensibility. Later on in this chapter, under the heading “Alternate Analysis”, we present the new analysis, as recommended. Therefore, continuing as we originally did, we note that we, being chemists and spectroscopists, and writing for spectroscopists, will use the Propagation of Uncertainties approach of Ingle and Crouch:
$$dF(C, D) = \frac{\partial F(C, D)}{\partial C}\,dC + \frac{\partial F(C, D)}{\partial D}\,dD \tag{44-64}$$
Note that we use the letters C, D to represent the variables in equation 44-64 to avoid confusion with our usage of A to mean absorbance. Applying this to equation 44-6:
$$\Delta T = \frac{\partial (E_s/E_r)}{\partial E_s}\,\Delta E_s + \frac{\partial (E_s/E_r)}{\partial E_r}\,\Delta E_r \tag{44-65}$$
$$\Delta T = \frac{\Delta E_s}{E_r} + \frac{-E_s\,\Delta E_r}{E_r^2} \tag{44-66}$$
As usual, we take the variance of this:
$$\mathrm{Var}(\Delta T) = \mathrm{Var}\left(\frac{\Delta E_s}{E_r} + \frac{-E_s\,\Delta E_r}{E_r^2}\right) \tag{44-67}$$
And apply first the theorem that Var(A + B) = Var(A) + Var(B):
$$\mathrm{Var}(\Delta T) = \mathrm{Var}\left(\frac{\Delta E_s}{E_r}\right) + \mathrm{Var}\left(\frac{-E_s\,\Delta E_r}{E_r^2}\right) \tag{44-68}$$
and then the theorem that Var(aX) = a² Var(X):
$$\mathrm{Var}(\Delta T) = \frac{1}{E_r^2}\,\mathrm{Var}(\Delta E_s) + \left(\frac{-E_s}{E_r^2}\right)^2 \mathrm{Var}(\Delta E_r) \tag{44-69}$$
and continue as before by setting ΔEs = ΔEr = ΔE (that is, by giving both noise terms the same generic variance):
$$\mathrm{Var}(\Delta T) = \left(\frac{1}{E_r^2} + \frac{E_s^2}{E_r^4}\right) \mathrm{Var}(\Delta E) \tag{44-70}$$
and finally take square roots to obtain:
$$\mathrm{SD}(\Delta T) = \sqrt{\frac{1}{E_r^2} + \frac{E_s^2}{E_r^4}}\;\mathrm{SD}(\Delta E) \tag{44-71}$$
This is clearly a function of both Er and Es; in the regime we are concerned with in this chapter, however, as Er approaches 0, the second term under the radical dominates the expression, although clearly the point at which its numerical value becomes large compared to 1/Er² will depend on the value of Es as well, or equivalently, the transmittance of the sample. Here, again, therefore, the behavior of the noise of the transmittance must be expressed as a family of curves. Figures 44-7 and 44-8 present the behavior of this family of curves. Note that equation 44-71 can be reduced to equation 41-19 [2], which is appropriate when the signal-to-noise ratio is high and may be considered constant. Under these conditions Er is large, the second term under the radical is small, and the first term under the radical, which is independent of Es, dominates; then the noise of the transmittance increases with T as √(1 + T²) and inversely with the reference energy. Here, however, under low-signal/high-noise conditions, where the variation of Er cannot be ignored and therefore the S/N ratio varies, we must use the full expression of equation 44-71. Note further that when Er is small enough, as we noted above, the second term under the radical dominates; then
$$\mathrm{SD}(\Delta T) = \sqrt{\frac{T^2}{E_r^2}}\;\mathrm{SD}(\Delta E) = \frac{T}{E_r}\,\mathrm{SD}(\Delta E) \tag{44-72}$$
The noise of the transmittance thus becomes directly proportional to T and inversely proportional to Er. Under these conditions, the noise of the transmittance approaches infinite values as Er approaches zero, even as the expected value of the transmittance approaches zero, as we saw in Chapter 43 [4]. To summarize the effects at low signal-to-noise, for comparison with the high signal-to-noise case summarized above: here the noise of the transmittance increases directly with T and still inversely with the reference energy. We now wish to follow through, as we did before, on finding the “optimum” value for sample transmittance under these conditions.
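As a small illustration of equations 44-71 and 44-72, the following Python sketch evaluates both expressions for given energies. The function names and the numerical values are hypothetical, chosen only to show the two regimes; they are not taken from the original columns.

```python
import math

def sd_transmittance(es, er, sd_e):
    """Equation 44-71: SD of the transmittance from propagation of uncertainties."""
    return math.sqrt(1.0 / er**2 + es**2 / er**4) * sd_e

def sd_transmittance_low_signal(es, er, sd_e):
    """Equation 44-72: keeps only the second term under the radical."""
    t = es / er
    return (t / er) * sd_e

# Illustrative values (assumptions): detector noise SD of 0.01 energy units.
sd_e = 0.01

# Strong reference signal: equation 44-71 equals (SD/Er) * sqrt(1 + T^2).
es, er = 0.5, 1.0
t = es / er
print(sd_transmittance(es, er, sd_e), sd_e / er * math.sqrt(1.0 + t**2))

# Weak reference signal at the same Es: the second term dominates,
# so equations 44-71 and 44-72 nearly coincide.
es, er = 0.5, 0.01
print(sd_transmittance(es, er, sd_e), sd_transmittance_low_signal(es, er, sd_e))
```

Holding Es fixed while Er shrinks is what makes the second term dominate; the ratio of the two terms under the radical is Es²/Er².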
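To find that optimum, we start with equation 44-24 (reference [3]):
$$\Delta A = \frac{-0.4343\,E_r}{E_s}\left(\frac{E_r\,\Delta E_s - E_s\,\Delta E_r}{E_r\,(E_r + \Delta E_r)}\right) \tag{44-24}$$
This is the point at which, in the previous development, we considered the effect of letting ΔEr become negligible, but of course in this case we wish to investigate the small-signal/large-noise behavior. We now, therefore, go directly to dividing ΔA by A (from equation 44-20b):
$$\frac{\Delta A}{A} = \frac{\dfrac{-0.4343\,E_r}{E_s}\left(\dfrac{E_r\,\Delta E_s - E_s\,\Delta E_r}{E_r\,(E_r + \Delta E_r)}\right)}{-0.4343\,\ln T} \tag{44-73}$$
$$\frac{\Delta A}{A} = \frac{E_r}{E_s \ln T}\left(\frac{E_r\,\Delta E_s - E_s\,\Delta E_r}{E_r\,(E_r + \Delta E_r)}\right) \tag{44-74}$$
$$\frac{\Delta A}{A} = \frac{1}{\ln T}\left(\frac{E_r}{E_s}\,\frac{\Delta E_s}{E_r + \Delta E_r} + \frac{-\Delta E_r}{E_r + \Delta E_r}\right) \tag{44-75}$$
$$\frac{\Delta A}{A} = \frac{1}{T \ln T}\,\frac{\Delta E_s}{E_r + \Delta E_r} + \frac{1}{\ln T}\,\frac{-\Delta E_r}{E_r + \Delta E_r} \tag{44-76}$$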
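The algebra leading from equation 44-24 to equation 44-76 can be checked numerically. The sketch below computes ΔA/A once directly (equation 44-24 divided by A = −0.4343 ln T, as used in the denominator of equation 44-73) and once from the two-term form of equation 44-76. The function names and the test values are hypothetical.

```python
import math

def delta_a_over_a_direct(es, er, d_es, d_er):
    """Equation 44-24 divided by A = -0.4343 ln T."""
    t = es / er
    a = -0.4343 * math.log(t)
    delta_a = (-0.4343 * er / es) * (er * d_es - es * d_er) / (er * (er + d_er))
    return delta_a / a

def delta_a_over_a_split(es, er, d_es, d_er):
    """Equation 44-76: the same quantity written as two separate terms."""
    t = es / er
    denom = er + d_er
    first = d_es / (t * math.log(t) * denom)
    second = -d_er / (math.log(t) * denom)
    return first + second

# Arbitrary illustrative energies and noise excursions (assumptions).
values = dict(es=0.3, er=1.0, d_es=0.01, d_er=-0.02)
print(delta_a_over_a_direct(**values), delta_a_over_a_split(**values))
```

The two printed numbers should agree to within floating-point rounding, since the second form is only a rearrangement of the first.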
To determine the variance of ΔA/A we perform our usual exercise of taking the variance of both sides of equation 44-76 and applying our two favorite theorems; the result is
$$\mathrm{Var}\left(\frac{\Delta A}{A}\right) = \left(\frac{1}{T \ln T}\right)^2 \mathrm{Var}\left(\frac{\Delta E_s}{E_r + \Delta E_r}\right) + \left(\frac{1}{\ln T}\right)^2 \mathrm{Var}\left(\frac{-\Delta E_r}{E_r + \Delta E_r}\right) \tag{44-77}$$
We cannot simplify this equation further; in particular, we cannot separate out the variances of ΔEs and ΔEr in order to replace them with the same generic value. To determine the variance of ΔA/A, that is, the relative precision (in chemists' terms), we need to evaluate the variance of the two terms in equation 44-77. As we had observed previously, as the value of ΔEr approaches −Er, the value of the expressions attains infinite values. However, a difference here is that when computing the variance, these values are squared, and hence the computations are always done using positive values. This differs from our previous case, where the presence of both positive and negative values afforded the opportunity for cancellation of near-infinite contributions; we do not have that situation here. Therefore we are faced with the possibility that the variance will be infinite. An empirical test of this possibility was performed by computing values of the variance of the two terms in equation 44-77. The Normal random number generator of MATLAB was used to create multiple values of Normally distributed random numbers for ΔEr and ΔEs; these were plugged into the two expressions of equation 44-77 and the variance computed. Between 100 and 10^6 values were used in each computation of the variance. When Er was more than five standard deviations away from the center of the Normal distribution representing ΔEr, the computed variance was fairly small and reasonably stable, and decreased as Er was moved further away from the center of ΔEr. This might be considered an empirical determination of the point of demarcation of the “small-signal” case.
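A minimal sketch of this empirical test, written in Python with NumPy rather than the MATLAB used originally, is given below. The function name `term_variances`, the seed and the sample sizes are assumptions made for illustration. It draws Normally distributed ΔEs and ΔEr and computes the sample variance of the two random terms of equation 44-77.

```python
import numpy as np

def term_variances(er, n, sigma=1.0, seed=0):
    """Sample variances of dEs/(Er + dEr) and -dEr/(Er + dEr).

    dEs and dEr are independent Normal(0, sigma) draws; er is the
    reference energy in the same units as sigma.
    """
    rng = np.random.default_rng(seed)
    d_es = rng.normal(0.0, sigma, n)
    d_er = rng.normal(0.0, sigma, n)
    first = d_es / (er + d_er)
    second = -d_er / (er + d_er)
    return float(np.var(first, ddof=1)), float(np.var(second, ddof=1))

# Er well above five standard deviations: small, stable variances.
# Er close to the noise level: large values that change from run to run.
for er in (10.0, 5.0, 2.0, 1.0):
    for n in (100, 10_000, 100_000):
        v1, v2 = term_variances(er, n)
        print(f"Er = {er:4.1f} sigma, n = {n:6d}: {v1:.4g}  {v2:.4g}")
```

Changing the seed leaves the results for large Er essentially unchanged, while the results for Er near the noise level can move by orders of magnitude, which is the instability described in the text.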
When Er was moved below five standard deviations, the computed value of the variance became very unstable; computed values of the variance would differ by as much as four orders of magnitude. The closer Er came to the center of the ΔEr distribution, the more erratic the computed variance became. It was clear that bringing Er close to the center of ΔEr afforded more opportunity for a given reading of the noise to become close to −Er, thus giving a value approaching infinity that would be included in the calculation. Furthermore, for a given relationship between Er and ΔEr, the more readings that were included in the computation, the higher the values of variance that would be calculated. For example, with 100 readings, values of variance might fall between 10^1 and 10^4, while with 10,000 readings calculated variance values would fall in the range of approximately 10^3 to 10^6. This is attributed to the increased likelihood of more data points being close to −Er and also of at least a few points being closer to −Er than with fewer data. Another test of whether the variance actually diverges and becomes infinite is the same as the test we applied in the previous chapter: to integrate the expressions in equation 44-77 in a small region around the point ΔEr = −Er using different intervals of integration and see if the values converge or diverge. Basically, except for a multiplying factor these are both the same expression, so evaluating the expression once suffices to settle the question for both of them. Furthermore, since we are integrating over values of
Table 44-2 Value of integral of 1/Er² over range −0.01 to +0.01
Integration interval    Value of integral
10^-2                   2.0000000000000000e+002
10^-3                   3.0995354623330845e+003
10^-4                   3.2699678003698089e+004
10^-5                   3.2878691333625099e+005
10^-6                   3.2896681436917488e+006
10^-7                   3.2898481337470137e+007
variance, the expression that needs to be integrated is 1/Er². The result of performing this test is presented in Table 44-2. In contrast to the previous test results, the values are clearly growing larger without bound as the integration interval is reduced. The conclusion from all this is that the variance, and therefore the standard deviation, attains infinite values when the reference energy is so low that the range of its readings includes the value zero. However, in a probabilistic way it is still possible to perform computations in this regime and obtain at least some rough idea of how the various quantities involved will change as the reference energy approaches zero; after all, real data is obtained with a finite number of readings, each of which is finite, and will give some finite answer. What we can do for the rest of this current analysis is perform empirical computations to find out what the expectation for that behavior is; we will do that in the next chapter.
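The divergence shown in Table 44-2 is easy to reproduce. The sketch below is an illustration in Python (the original work used MATLAB); the function name `inverse_square_sum` is hypothetical. It sums 1/x² over a grid from −0.01 to +0.01, omits the point x = 0, and multiplies by the integration interval, which reproduces the tabulated values.

```python
import numpy as np

def inverse_square_sum(step, half_width=0.01):
    """Rectangle-rule sum of 1/x^2 over [-half_width, +half_width],
    leaving out the singular point x = 0."""
    n = int(round(half_width / step))
    x = np.arange(1, n + 1) * step
    # 1/x^2 is even, so the negative side contributes the same amount.
    return float(2.0 * np.sum(1.0 / x**2) * step)

for exponent in range(2, 8):
    step = 10.0 ** -exponent
    print(f"10^-{exponent}: {inverse_square_sum(step):.16e}")
```

Each tenfold reduction of the interval increases the sum roughly tenfold, so there is no limiting value, in contrast with the behavior seen in Table 43-1.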
ALTERNATE ANALYSIS
Here we present the revised analysis of the situation of the effect on the expected noise level of noise that is not small compared to the signal level Er. Before we proceed, however, there is a technical point we need to clear up. This is the numbering of the equations, figures, etc. The previous column/chapter ended with equation 43-63. Therefore it is appropriate to begin the analysis with equation 43-64, as we did above in this chapter, and in the original analysis published in the columns. For obvious reasons, however, we cannot simply repeat using the same equation/figure/etc. numbers that we did above. Neither can we simply continue from the last number used in the first analysis, above, because then we would have to renumber all equations, figures, etc., for the rest of this series of chapters. While laborious, that could be done, but would raise another, insoluble, problem: it would put the numbering of the equations, etc., out of synchronization with the numbering of the original columns. Therefore anybody reading the later chapters and wishing to compare them with the original columns would find that task well-nigh impossible. Fortunately, none of the equations developed in this chapter, nor the figures, used any suffix, as was occasionally done in previous chapters (we do refer to equation 42-20b above, but that equation is in a previous chapter and we will not repeat the use of equation 42-20 here. We will also copy equation 43-52b from the previous column, but the b suffix does not signify a new equation, since it is the equation used previously; also, a b suffix is not indicative of a copy of an equation number in this section). Therefore, we can distinguish the numbering of any equations or other numbered entities in this
section by appending the suffix “a” to the number, without causing confusion with other corresponding entities. Now we are ready to proceed. We reached this point from the discussion just prior to equation 44-64, and there we noted that a reader of the original column felt that equation 44-64 was being incorrectly used. Equation 44-64, of course, is a fundamental equation of elementary calculus and is itself correct. The problem pointed out was that the use of the derivative terms in equation 44-64 implicitly states that we are using the small-noise model, which, especially when changing the differentials to finite differences in equation 44-65, results in incorrect equations. In our previous column [4] we had created an expression for T + ΔT (as equation 44-51) and separated out an expression for ΔT (as equation 44-52b). We present these two equations here:
$$T + \Delta T = \frac{E_s}{E_r + \Delta E_r} + \frac{\Delta E_s}{E_r + \Delta E_r} \tag{44-51}$$
from which we concluded that:
$$\Delta T = \frac{\Delta E_s}{E_r + \Delta E_r} \tag{44-52b}$$
At this point we would like to compute the variance of ΔT, but simply computing the variance of ΔEs/(Er + ΔEr) would also not be correct, since it would ignore the influence of the variability of the first term in equation 44-51 [4], and not take its contribution to the variance into proper account. Therefore the expression for ΔT in equation 44-52b is not correct, even though it is the result of the formal breakup of equation 44-51 [4]. We should be using a formula such as:
$$\Delta T = \frac{E_s}{E_r + \Delta E_r} + \frac{\Delta E_s}{E_r + \Delta E_r} \tag{44-64a}$$
in order to include the variability of the first term, also. This, however, leads to another problem: subtracting equation 44-64a from equation 44-51 leaves us with the result that T = 0. Furthermore, the definition of T then gives us the result that Es is zero, and that therefore ΔT is in fact equal to the expression given by equation 44-52b anyway, despite our efforts to include the contribution to the variance of the first term in equation 44-51. Our conclusion is that the original separation of equation 44-51 into two equations, while it served us well for computing TM and TA, fails us here. This is because ΔEs and ΔEr are random variables and we cannot treat their influences separately; we have no expectation that they will either cancel or reinforce each other, wholly or partially, in any particular measurement. Therefore when we compute the variance of ΔT we wish to retain the contribution from both terms. This also raises a further question: the analysis of equation 44-52a by itself served us well, as we noted; but was it proper, or should we have maintained all of equation 44-51, as we find we must do here? The answer is yes, it was correct, and the justification is given toward the end of the previous column [4]. The symmetry of the expression when
averaged over values of ΔEs means that the average will be zero for each value of ΔEr, and therefore the average of the entire second term will always be zero. Therefore, the best way to maintain the entire expression is to go back still a further step, and note that the ultimate source of equation 44-51 was equation 44-5 [2]:
$$T + \Delta T = \frac{E_s + \Delta E_s}{E_r + \Delta E_r} \tag{44-5}$$
Therefore we solve equation 44-5 for ΔT and, noting the definition of T, we find:
$$\Delta T = \frac{E_s + \Delta E_s}{E_r + \Delta E_r} - \frac{E_s}{E_r} \tag{44-65a}$$
Then we take the variance of both sides:
$$\mathrm{Var}(\Delta T) = \mathrm{Var}\left(\frac{E_s + \Delta E_s}{E_r + \Delta E_r} - \frac{E_s}{E_r}\right) \tag{44-66a}$$
Once again applying the rule that the variance of a sum is the sum of the variances, we obtain:
$$\mathrm{Var}(\Delta T) = \mathrm{Var}\left(\frac{E_s + \Delta E_s}{E_r + \Delta E_r}\right) + \mathrm{Var}\left(\frac{E_s}{E_r}\right) \tag{44-67a}$$
Since Es/Er is the true transmittance of the sample, the value of T for a given sample is constant, and therefore the variance of that term is zero, resulting in:
$$\mathrm{Var}(\Delta T) = \mathrm{Var}\left(\frac{E_s + \Delta E_s}{E_r + \Delta E_r}\right) \tag{44-68a}$$
The variables in equation 44-68a are again not separable. While we could formally split equation 44-68a into the sum of two variances:
$$\mathrm{Var}(\Delta T) = \mathrm{Var}\left(\frac{E_s}{E_r + \Delta E_r}\right) + \mathrm{Var}\left(\frac{\Delta E_s}{E_r + \Delta E_r}\right) \tag{44-69a}$$
that would not be correct, because the two variances that we wish to add have a common term, Er + ΔEr, and therefore are not independent of each other, as application of the rule for adding variances requires [2]. Also, evaluation of a variance by integration requires the integral of the square of the varying term, which as we have seen previously [13] is always positive, and therefore the integrals of both terms of equation 44-69a diverge.
Thus we conclude that we must compute the variance of ΔT directly from equation 44-68a and the definition of variance:
$$\mathrm{Var}(\Delta T) = \frac{\displaystyle\sum_{i=1}^{n}\left[\left(\frac{E_s + \Delta E_s}{E_r + \Delta E_r}\right) - \overline{\left(\frac{E_s + \Delta E_s}{E_r + \Delta E_r}\right)}\right]^2}{n - 1} \tag{44-70a}$$
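We can learn something interesting by again noting, as we did previously [4], that ΔEs has a mean of zero; therefore equation 44-70a becomes:
$$\mathrm{Var}(\Delta T) = \frac{\displaystyle\sum_{i=1}^{n}\left[\left(\frac{E_s + \Delta E_s}{E_r + \Delta E_r}\right) - \overline{\left(\frac{E_s}{E_r + \Delta E_r}\right)}\right]^2}{n - 1} \tag{44-71a}$$
and by splitting up the first term in the numerator of equation 44-71a into its two parts:
$$\mathrm{Var}(\Delta T) = \frac{\displaystyle\sum_{i=1}^{n}\left[\left(\frac{E_s}{E_r + \Delta E_r}\right) + \left(\frac{\Delta E_s}{E_r + \Delta E_r}\right) - \overline{\left(\frac{E_s}{E_r + \Delta E_r}\right)}\right]^2}{n - 1} \tag{44-72a}$$
and rearranging the terms:
$$\mathrm{Var}(\Delta T) = \frac{\displaystyle\sum_{i=1}^{n}\left[\left(\frac{E_s}{E_r + \Delta E_r}\right) - \overline{\left(\frac{E_s}{E_r + \Delta E_r}\right)} + \left(\frac{\Delta E_s}{E_r + \Delta E_r}\right)\right]^2}{n - 1} \tag{44-73a}$$
and again using the definition of variance (the cross term that arises on expanding the square is dropped here; it averages to zero because ΔEs has zero mean and is independent of ΔEr):
$$\mathrm{Var}(\Delta T) = \mathrm{Var}\left(\frac{E_s}{E_r + \Delta E_r}\right) + \frac{\displaystyle\sum_{i=1}^{n}\left(\frac{\Delta E_s}{E_r + \Delta E_r}\right)^2}{n - 1} \tag{44-74a}$$
and then the definition of the average value:
$$\mathrm{Var}(\Delta T) = \mathrm{Var}\left(\frac{E_s}{E_r + \Delta E_r}\right) + \frac{n}{n - 1}\,\overline{\left(\frac{\Delta E_s}{E_r + \Delta E_r}\right)^2} \tag{44-75a}$$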
where we note that n/(n − 1) → 1 as n becomes indefinitely large. Of course, the noise level we want will be the square root of equation 44-75a. We have previously seen, in equation 44-77 [13], that the variance term in equation 44-75a diverges, and clearly, as ΔEr → −Er the second term in equation 44-75a also becomes infinitely large. However, as we discussed at the conclusion of the original analysis, using finite differences means that the probability of a given data point having ΔEr close enough to −Er to cause a problem is small, especially as Er increases. This allows for the possibility that a finite value for an integral can be computed. To recapitulate some of that here, it was a matter of noting two points: first, that as Er gets further and further away from zero (in terms of SD) it becomes increasingly unlikely that any given value of ΔEr will be close enough to Er to cause trouble. The second point is that, in a real instrument there is, of necessity, some maximum limit on the value that 1/(Er − ΔEr) can attain, due to the inability to contain an actually infinite number. Therefore it is not unreasonable to impose a corresponding limit on our calculations, to correspond to that physical limit.

We now consider how to compute the variance of ΔT, according to equation 44-68a. Ordinarily we would first discuss converting the summations of finite differences to
integrals, as we did previously, but we will forbear that, leaving it as an exercise for the reader. Instead we will go directly to consideration of the numerical evaluation of equation 44-68a, since a conversion to an integral would require a back-conversion to finite differences in order to perform the calculations. We wish to evaluate equation 44-68a for different values of Es and Er, when each is subject to random variation. Note that although Var(ΔEs) = Var(ΔEr), we cannot simply set the two terms equal to a common generic value of ΔE as we did previously, since that would imply that the instantaneous values of ΔEs and ΔEr were the same, but of course they are not, since we assume that they are independent noise contributions, although they have the same variance. Under these conditions it is simplest to work with equation 44-68a itself, rather than any of the other forms into which we found it convenient to convert equation 44-68a for the illustrations of the various points we presented and discussed.

There are still a variety of ways we can approach the calculations. We could assume that Es or Er were constant and examine how the noise varies as the other was changed. We could also hold the transmittance constant and examine how the transmittance noise varies as both Es and Er are changed proportionately. What we will actually do here, however, is all of these. First we will assume that the ratio Es/Er, representing T, the true transmittance of the sample, is constant, and examine how the noise varies as the S/N ratio is changed by varying the value of Er, for a constant noise contribution to both Es and Er.
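The point about independent noise contributions can be illustrated numerically. The following Python sketch is not the authors' MATLAB code; the function name `sd_of_ratio`, the seed and the sample count are illustrative assumptions. It compares the standard deviation of (Es + ΔEs)/(Er + ΔEr) when ΔEs and ΔEr are drawn independently with the case where one common ΔE is reused for both.

```python
import random
import statistics


def sd_of_ratio(e_s, e_r, shared, n=20_000, seed=0):
    """SD of (Es + dEs)/(Er + dEr) for unit-variance Normal noise.

    shared=True reuses one noise draw for both numerator and denominator
    (the "generic dE" shortcut); shared=False draws them independently.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        d_er = rng.gauss(0.0, 1.0)
        d_es = d_er if shared else rng.gauss(0.0, 1.0)
        ratios.append((e_s + d_es) / (e_r + d_er))
    return statistics.pstdev(ratios)


if __name__ == "__main__":
    # Es = Er = 10, i.e. T = 1 at a reference S/N of 10.
    print("independent noise:", sd_of_ratio(10.0, 10.0, shared=False))
    print("shared noise:     ", sd_of_ratio(10.0, 10.0, shared=True))
```

With Es = Er the shared-noise ratio is identically 1, so its computed noise collapses to zero, whereas the independent case gives a finite value. This is one way to see why the two noise terms cannot be merged into a single generic term in this calculation.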
The noise level itself, of course, is the square root of the expression in equation 44-67a:

$$\mathrm{SD}(\Delta T) = \sqrt{\mathrm{Var}\left(\frac{E_s + \Delta E_s}{E_r + \Delta E_r}\right)} \tag{44-76a}$$

To do the computations, we again use the random number generator of MATLAB to produce Normally distributed random numbers with unity variance to represent the noise; values of Er will then directly represent the S/N ratio of the data being evaluated. For the computations reported here, we use 100,000 synthetic values of the ratio on the RHS of equation 44-76a to calculate the variance, for each combination of conditions we investigate. A graph of the transmittance noise as a function of the reference S/N ratio is presented in Figure 44-7a-1, and an expanded portion of Figure 44-7a-1 is shown in Figure 44-7a-2. The “true” transmittance Es/Er was set to unity (i.e., 100 %T). The inevitable existence of a limit on the value of TM, as described in the section following equation 44-75a, was examined in Figure 44-7a-1 by performing the computations for two values of that limit, setting the limit value (somewhat arbitrarily, to be sure) to 1,000 and 10,000, corresponding to the lower and upper curves, respectively.

Note that there are effectively two regimes in Figure 44-7a-1, with the transition between regimes occurring when the value of S/N ratio equals approximately 4. When the value of Er was greater than approximately four, i.e., the S/N ratio was greater than four, the curves are smooth and appear to be well-behaved. When Er was below an S/N of four, the graph entered a regime of behavior that shows an appreciable random component. The transition point between these two regimes would seem to represent an implicit definition of the “low noise” versus the “high noise” conditions of measurement. In the low-noise regime the transmittance noise decreases smoothly and continuously as
Figure 44-7a-1 Transmittance noise as a function of reference S/N ratio (Er/ΔEr), for alternate analysis (equation 44-68a). The sample transmittance was set to unity. The limit for the value of (Es + ΔEs)/(Er + ΔEr) was set to 10,000 for the upper curve and to 1,000 for the lower curve. (see Color Plate 6)

Figure 44-7a-2 Expansion of Figure 44-7a-1. (see Color Plate 7)
the S/N ratio increases. This was verified by other graphs (not shown) that extended the value of S/N ratio beyond what is shown here. The “high-noise” regime seen in Figure 44-7a-1 is the range of values of S/N ratio where the computed standard deviation is grossly affected by the closeness of the approach of individual values of ΔEr to −Er. This is, in fact, a probabilistic effect, since
Figure 44-8a Comparison of empirically determined transmittance noise value with those determined according to the low-noise approximations of equation 44-19 and equation 44-52b, as a function of S/N (Er/ΔEr). Curves: Monte Carlo (equation 44-76a), Theory (equation 44-19), Approx (equation 44-52b). (see Color Plate 8)
it depends not only on how closely the two numbers approach each other, but also on how often that occurs; a single or only a few “close approaches” will be lost in a large number of readings where that does not happen. As we will see below, there is indeed a regime where the theoretical “low-noise” approximation differs from the results we find here, without becoming randomized. Changing the number of values of (Es + ΔEs)/(Er + ΔEr) used for the computation of the variance made no difference in the nature of the graph. As is the case in Figure 44-7a-1, the transition between the low- and high-noise regimes continues to occur at a value between 4 and 5.

Figure 44-8a shows the graph of transmittance noise computed empirically from equation 44-76a, compared to the transmittance noise computed from the theory of the low-noise approximation, as per equation 44-19 [2] and the approach, under question, of using equation 44-52b. We see that there is a third regime, where the difference between the actual noise level and the low-noise approximation is noticeable, but the computed noise has not yet become subject to the extreme fluctuations engendered by the too-close approach of ΔEr to −Er. Since the empirically determined curve approaches the theoretical curve asymptotically as the S/N increases, where the separation becomes “noticeable” will depend on how hard you look, but there is certainly a region in which this occurs, in any case. This is the situation we alluded to above, representing the “middle ground” of the transmittance noise.

Figure 44-9a-1 shows what happens to the noise level, for the same condition of constant “sample transmittance” as a function of S/N, for different values of sample transmittance. As we see, in the “low noise” regime the noise has the behavior we have derived for it. However, the effect of the exaggeration of the random variations very quickly takes over, and in the “high noise” regime there is virtually no difference in the
Figure 44-9a-1 Transmittance noise as a function of reference S/N ratio (Er/ΔEr), at various values of sample transmittance. Blue curve: T = 1. Green curve: T = 0.5. Red curve: T = 0.1. (see Color Plate 9)

Figure 44-9a-2 Expansion of Figure 44-9a-1. (see Color Plate 10)
noise behavior at different values of transmittance, since that is now dominated by the divergence of the integrals involved. A verification of the effects seen in Figures 44-9a-1 and 44-9a-2, which is also an investigation that is part of our original plan, is presented in Figure 44-10a, a graph showing the transmittance noise as a function of the sample transmittance Es/Er. As we see, except for the occasional spike, when the S/N ratio is
Figure 44-10a Transmittance noise as a function of transmittance, for different values of reference energy S/N ratio (recall that, since the standard deviation of the noise equals unity, the set value of the reference energy equals the S/N ratio). Labeled curves include S/N = 4 and S/N = 4.5. (see Color Plate 11)
5 and even when it is only 4.5, the transmittance noise varies essentially as we saw in working out the exact solution for transmittance noise in the low-noise case. Naturally, the underlying transmittance noise value is higher when the reference S/N ratio is lower. When the S/N ratio decreases to 4, then “spikes” happen frequently enough that it becomes almost impossible to tell where the “underlying” transmittance noise level is, since the computed values are again dominated by the divergent integrals.
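The Monte Carlo procedure described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' MATLAB program: the function name `transmittance_noise`, the seed, and in particular the way the limit is imposed (clipping each computed ratio to ±`limit`) are our own choices, since the text does not spell out how the limit was applied.

```python
import random
import statistics


def transmittance_noise(e_r, t_true=1.0, n=100_000, limit=1000.0, seed=0):
    """Estimate SD of (Es + dEs)/(Er + dEr), as in equation 44-76a.

    Noise terms are independent Normal draws with unit variance, so e_r
    is numerically equal to the reference S/N ratio.
    """
    rng = random.Random(seed)
    e_s = t_true * e_r
    ratios = []
    for _ in range(n):
        d_es = rng.gauss(0.0, 1.0)
        d_er = rng.gauss(0.0, 1.0)
        denom = e_r + d_er
        # Assumption: an exact zero denominator is assigned the limit value.
        ratio = limit if denom == 0.0 else (e_s + d_es) / denom
        # Assumption: the instrument limit is modeled by clipping the ratio.
        ratios.append(max(-limit, min(limit, ratio)))
    return statistics.stdev(ratios)


if __name__ == "__main__":
    for s_n in (2.0, 3.0, 4.0, 5.0, 7.0, 10.0):
        sd = transmittance_noise(s_n, n=20_000)
        print(f"S/N = {s_n:4.1f}   SD(T) = {sd:.4f}")
```

Rerunning the loop with different seeds should show stable values at the higher S/N settings and erratic ones at the lowest settings, which is the qualitative behavior the figures describe. The exact numbers depend on the seed and on the assumed clipping rule.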
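ABSORBANCE NOISE IN THE “HIGH NOISE” REGIME

Just as equation 41-5, which led to equation 44-76a, was the starting point for investigating the behavior of transmittance noise in the high noise regime, so too is equation 42-24 the starting point for investigating the behavior of absorbance noise in the high noise regime. While we presented equation 42-24 above, in the original analysis, we did not follow through to investigate its behavior, since we went directly to the analysis of the behavior of Var(ΔA/A) instead. Therefore we present equation 44-24 again, and take this opportunity to investigate it:

$$\Delta A = \frac{-0.4343\,E_r\left(E_r\,\Delta E_s - E_s\,\Delta E_r\right)}{E_s E_r \left(E_r + \Delta E_r\right)} \tag{44-24}$$

We therefore take the variance of ΔA:

$$\mathrm{Var}(\Delta A) = \mathrm{Var}\left(\frac{-0.4343\,E_r\left(E_r\,\Delta E_s - E_s\,\Delta E_r\right)}{E_s E_r \left(E_r + \Delta E_r\right)}\right) \tag{44-77a}$$

Then we multiply through:

$$\mathrm{Var}(\Delta A) = \mathrm{Var}\left(\frac{-0.4343\left(E_r^2\,\Delta E_s - E_r E_s\,\Delta E_r\right)}{E_s E_r \left(E_r + \Delta E_r\right)}\right) \tag{44-78a}$$

Using the definition of variance, we get:

$$\mathrm{Var}(\Delta A) = \frac{\displaystyle\sum_{i=1}^{n}\left[\frac{-0.4343\left(E_r^2\,\Delta E_s - E_r E_s\,\Delta E_r\right)}{E_s E_r \left(E_r + \Delta E_r\right)} - \overline{\left(\frac{-0.4343\left(E_r^2\,\Delta E_s - E_r E_s\,\Delta E_r\right)}{E_s E_r \left(E_r + \Delta E_r\right)}\right)}\right]^2}{n-1} \tag{44-79a}$$

Again, the mean values of ΔEs and ΔEr are both zero; therefore the mean term of equation 44-79a vanishes, leaving us with:

$$\mathrm{Var}(\Delta A) = \frac{\displaystyle\sum_{i=1}^{n}\left[\frac{-0.4343\left(E_r^2\,\Delta E_s - E_r E_s\,\Delta E_r\right)}{E_s E_r \left(E_r + \Delta E_r\right)}\right]^2}{n-1} \tag{44-80a}$$
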
Again we see that the variance of the absorbance equals n/(n − 1) times the mean value of the summand of equation 44-80a, and also that we can ignore the premultiplier term n/(n − 1) for large values of n. We begin our investigation of the behavior of the absorbance noise by comparing it to the theoretical expectation from the low-noise condition according to equation 42-32 [3]. This comparison is shown in Figures 44-11a-1 and 44-11a-2. These figures show what we might expect: that as the S/N increases the computed value approaches the theoretical
Figure 44-11a-1 Comparison of computed absorbance noise to the theoretical value (according to equation 44-32), as a function of S/N ratio (Er/ΔEr), for constant transmittance (set to unity). Curves: Computed, Theory. (see Color Plate 12)

Figure 44-11a-2 Expansion of Figure 44-11a-1. (see Color Plate 13)
value for the low-noise approximation, and also an excessive bulge at very low values of S/N, apparently similar to the abnormally large values observed in the behavior of the transmittance at very low values of S/N. After performing this comparison, we will not pursue the analysis any further, since we will obtain the results we would expect to get from the analysis of the transmission behavior.

There is, however, something unexpected about Figure 44-11a-1. That is the decrease in absorbance noise at the very lowest values of S/N, i.e., those lower than approximately Er = 1. This decrease is not a glitch or an artifact or a result of the random effects of divergence of the integral of the data such as we saw when performing a similar computation on the simulated transmission values. The effect is consistent and reproducible. In fact, it appears to be somewhat similar in character to the decrease in computed transmittance we observed at very low values of S/N for the low-noise case, e.g., that shown in Figure 43-6.
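The absorbance computation of equation 44-80a can be sketched in the same way. The Python below is an illustration under assumptions, not the authors' program: the function names and seed are our own, and `low_noise_estimate` is simply the first-order limit of equation 44-24 for ΔEr much smaller than Er, namely 0.4343·sqrt(1/Es² + 1/Er²) for unit noise variance. It is offered as a plausibility check and is not claimed to be identical to equation 42-32.

```python
import math
import random

LOG10_E = 0.4343  # log10(e), the constant multiplying equation 44-24


def absorbance_noise(e_r, t_true=1.0, n=100_000, seed=0):
    """Square root of equation 44-80a for unit-variance Normal noise."""
    rng = random.Random(seed)
    e_s = t_true * e_r
    sum_sq = 0.0
    for _ in range(n):
        d_es = rng.gauss(0.0, 1.0)
        d_er = rng.gauss(0.0, 1.0)
        # Equation 44-24; an exactly zero denominator is not handled here
        # because it has negligible probability with floating-point draws.
        delta_a = (-LOG10_E * e_r * (e_r * d_es - e_s * d_er)
                   / (e_s * e_r * (e_r + d_er)))
        sum_sq += delta_a ** 2
    return math.sqrt(sum_sq / (n - 1))


def low_noise_estimate(e_r, t_true=1.0):
    """First-order (dEr << Er) limit of equation 44-24, unit noise variance."""
    e_s = t_true * e_r
    return LOG10_E * math.sqrt(1.0 / e_s ** 2 + 1.0 / e_r ** 2)


if __name__ == "__main__":
    for s_n in (5.0, 10.0, 20.0, 50.0):
        print(s_n, absorbance_noise(s_n, n=20_000), low_noise_estimate(s_n))
```

At high S/N the two columns should agree closely; at low S/N the computed value is expected to depart from the first-order estimate and to vary from seed to seed.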
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 15(10), 24–25 (2000).
2. Mark, H. and Workman, J., Spectroscopy 15(11), 20–23 (2000).
3. Mark, H. and Workman, J., Spectroscopy 15(12), 14–17 (2000).
4. Mark, H. and Workman, J., Spectroscopy 16(2), 44–52 (2001).
5. Voigtman, E., Analytical Instrumentation 21(1&2), 43–62 (1993).
6. Voigtman, E., Analytical Chemistry 69(2), 226–234 (1997).
7. Voigtman, E., Analytical Chemistry 65, 1029A–1035A (1993).
8. Voigtman, E., Analytical Chemistry 64, 2590–2598 (1992).
9. Voigtman, E., Analyst 120(February), 325–330 (1995).
10. Hald, A., Statistical Theory with Engineering Applications (John Wiley & Sons, Inc., New York, 1952).
11. Korn, G.A. and Korn, T.M., Mathematical Handbook for Scientists and Engineers, 1st ed. (McGraw-Hill Book Company, New York, 1961).
12. Ingle, J.D. and Crouch, S.R., Spectrochemical Analysis (Prentice-Hall, Upper Saddle River, NJ, 1988).
13. Mark, H. and Workman, J., Spectroscopy 16(4), 34–37 (2001).
45
Analysis of Noise: Part 6
This chapter is the continuation of Chapters 40–44, referenced from their original papers [1–5], dealing with the rigorous derivation of the expressions relating the effect of instrument (and other) noise to their effects on the spectra we observe. Chapter 40 in this noise series was an overview; since then we have been analyzing the effect of noise on spectra, when the noise is constant detector noise, that is, noise that is independent of the strength of the optical signal. Inasmuch as we are dealing with a continuous set of chapters, we again continue our discussion by serially numbering our equations, figures, and use of symbols, and so on as though there were no break.

We left off in our previous Chapter 44 with having concluded that the noise level becomes infinite, both for individual noise pulses, and for the variance of the noise, when the value of the reference signal actually crosses zero and becomes negative; we learned this from the following equation, which we reproduce from our previous chapter:

$$\mathrm{Var}\left(\frac{\Delta A}{A}\right) = \left(\frac{1}{T \ln T}\right)^2 \mathrm{Var}\left(\frac{\Delta E_s}{E_r + \Delta E_r}\right) + \left(\frac{1}{\ln T}\right)^2 \mathrm{Var}\left(\frac{-\Delta E_r}{E_r + \Delta E_r}\right) \tag{45-77}$$

and we showed that both variance terms become infinite at sufficiently small values of Er. However, that still leaves open the question of the behavior of the noise while the reference signal is not quite low enough for the noise to become infinite, but still small enough for the noise level to not be considered completely negligible.

First of all, we must note that the two terms of equation 45-77 are not exactly the same. While we tested the behavior of the expressions using a random number generator that produces a Normal distribution of numbers with unity variance, the variance of the entire term is not necessarily unity, especially when, as in the second term of equation 45-77, the same random variable appears in both the numerator and the denominator. The first task, then, is to compare the behavior of those two terms.

It was necessary to empirically determine the variances of the two terms in equation 45-77 for comparison. To do this, 10,000 random values for ΔEr, created by the MATLAB random number generator to be Normally distributed with variance = 1, were used for each of the two terms in equation 45-77, then the variance was computed for various values of Er between 3 and 20. A different set of 10,000 random numbers was used for each different value of Er. Figure 45-9 presents the two curves obtained. It is clear that, while the variance of ΔEr/(Er − ΔEr) is larger than that of ΔEs/(Er − ΔEr) when Er is small, the two curves converge for values of Er above approximately 8 times the variance of the noise. From this it would seem, then, that when the reference signal is at least approximately 3 times its noise level as measured by its standard deviation, we are entering the “low-noise” regime that we discussed previously in Chapters 41 and 42, where the approximations made there apply [2, 3].
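The comparison of the two variance terms can be sketched as below. This is a hedged Python illustration rather than the authors' MATLAB code: the function name `term_variances` and the seed handling are assumptions, and the denominators follow the Er + ΔEr form of equation 45-77 (for symmetric noise the sign convention in the denominator does not change the variances).

```python
import random
import statistics


def term_variances(e_r, n=10_000, seed=0):
    """Return (Var of dEs/(Er + dEr), Var of -dEr/(Er + dEr)).

    dEs and dEr are independent Normal draws with unit variance.
    An exactly zero denominator is not handled; its probability is
    negligible for floating-point draws.
    """
    rng = random.Random(seed)
    s_term = []
    r_term = []
    for _ in range(n):
        d_es = rng.gauss(0.0, 1.0)
        d_er = rng.gauss(0.0, 1.0)
        denom = e_r + d_er
        s_term.append(d_es / denom)
        r_term.append(-d_er / denom)
    return statistics.variance(s_term), statistics.variance(r_term)


if __name__ == "__main__":
    # A different set of random numbers (seed) is used for each Er.
    for i, e_r in enumerate(range(3, 21)):
        var_s, var_r = term_variances(float(e_r), seed=i)
        print(f"Er = {e_r:2d}   Var(s term) = {var_s:.5f}   Var(r term) = {var_r:.5f}")
```

At the larger values of Er both variances should approach 1/Er², which is the sense in which the two curves converge; at small Er the printed values can change substantially with the seed.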
Figure 45-9 Values of the variance of ΔEr/(Er − ΔEr) and ΔEs/(Er − ΔEr) for various values of Er, with a Normal distribution of values for the errors. Upper panel: variances of the two terms in equation 45-77 versus Er. Lower panel: expansion of the plot of the terms in equation 45-77.
Now, in this regime, where the two variances become equal, we can again equate ΔEs and ΔEr and replace them both with a generic term, ΔE; then the variance can be factored from equation 45-77:

$$\mathrm{Var}\left(\frac{\Delta A}{A}\right) = \left[\left(\frac{1}{T \ln T}\right)^2 + \left(\frac{1}{\ln T}\right)^2\right] \mathrm{Var}\left(\frac{-\Delta E}{E_r + \Delta E}\right) \tag{45-78}$$

so that now, when standard deviations are taken, it can be put into terms of the standard deviation of the expression involving the generic ΔE.

However, that only addresses the limiting case. We are interested in the behavior of the standard deviation of ΔA/A in this whole intermediate regime, so that we can determine the optimum sample transmittance, just as we did before, for data measured
in the regime where signal is always much greater than the noise. This also assumes that we can assign a meaning to the word “optimum”, in a situation where the noise is comparable to or even greater than the signal. But that is a philosophical question, which we will not attempt to address here; we want to simply follow where the mathematics lead us.

Since we can, however, compute the variances corresponding to the two terms in equation 45-77 for various values of Er, we can plot the family of curves of SD(ΔA/A), with Er as the parameter of the family. Since the two variances are, in the regime of interest, unequal and are multiplied by different functions of T, it is not unreasonable to expect that the minima of those curves corresponding to different members of the family will occur at different values of T. Figure 45-10 presents this family, for values of Er between 3 and 10, and for %T between 0.1 and 0.9. It is clear that there is indeed a family of curves. However, the variation on the ordinate is due mainly to the changes in signal-to-noise ratio as Er decreases. What is of more concern to us here is whether the value of %T at which the curve passes through a minimum changes, and if so how, as Er changes.

To this end, the program that computed the curves in Figure 45-10 was modified, and instead of simply computing the values of variance it also computed the derivative (estimated as the first difference) of those curves, and then solved for the value at which the derivative was zero, for the various values of Er. The results are shown in Figure 45-11. It is obvious that for values of Er greater than five (standard deviations of the noise), the optimum transmittance remains at the level we noted previously, 33 %T. When the reference energy level falls below five standard deviations, however, the “optimum” transmittance starts to decrease.
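The first-difference search for the optimum transmittance can be sketched as follows. This is a minimal Python illustration under assumptions, not the modified program the text refers to: the function names, the grid from 0.05 to 0.95 and the step size are our own choices, and the two variances of equation 45-77 are passed in as plain parameters (`var_s`, `var_r`) rather than being estimated by simulation.

```python
import math


def sd_rel_absorbance(t, var_s=1.0, var_r=1.0):
    """Square root of equation 45-77 for given variances of its two terms."""
    ln_t = math.log(t)
    return math.sqrt(var_s / (t * ln_t) ** 2 + var_r / ln_t ** 2)


def optimum_transmittance(var_s=1.0, var_r=1.0, step=0.0005):
    """Locate the minimum of SD(dA/A) on a grid using first differences."""
    count = int(round((0.95 - 0.05) / step)) + 1
    ts = [0.05 + i * step for i in range(count)]
    ys = [sd_rel_absorbance(t, var_s, var_r) for t in ts]
    for i in range(1, len(ys) - 1):
        # The first difference changes sign from negative to non-negative
        # at the grid point nearest the minimum.
        if ys[i] - ys[i - 1] < 0.0 <= ys[i + 1] - ys[i]:
            return ts[i]
    return None


if __name__ == "__main__":
    print("equal variances:          ", optimum_transmittance(1.0, 1.0))
    print("larger reference variance:", optimum_transmittance(1.0, 4.0))
```

With equal variances the search returns a value close to 0.33, matching the 33 %T quoted above. Making the reference-term variance larger than the sample-term variance moves the minimum to lower transmittance, in the same direction as the falloff described for low values of Er.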
The erratic nature of the variance at these low values of Er, however, makes it difficult to ascertain the exact amount of falloff with any degree of precision; nevertheless it is clear that as much as we can talk about an optimum transmittance level under these conditions, where variance can become infinite and the actual transmittance value itself is affected, it decreases at such low values of Er. Nevertheless, a close look reveals that when

Figure 45-10 Family of curves for SD(ΔA)/A as a function of %T, for different values of Er (labeled curves: Er = 3 and Er = 10). (see Color Plate 14)
Figure 45-11 Optimum transmittance as a function of Er. Upper panel: optimum transmittance using 5,000 values in the variance computation. Lower panel: optimum transmittance using 100,000 values in the variance computation.
Er has dropped to five standard deviations, the optimum transmittance has dropped to 3.2, and then drops off quickly below that value. Surprisingly, the optimum value of transmittance appears to reach a minimum value, and then increase again as Er continues to decrease. It is not entirely clear whether this is simply appearance or actually reflects the correct description of the behavior of the noise in this regime, given the unstable nature of the variance values upon which it is based. In fact, originally these curves were computed only for values of Er equal to or greater than three due to the expectation that no reasonable results could be obtained at lower values of Er . However, when the unexpectedly smooth decrease in the optimum value of %T was observed down to that level, it seemed prudent to extend the calculations to still lower values, whereupon the results in Figure 45-11 were obtained. Verifying the nature of the curve for at least two sets of variances, calculated from different numbers of random values, was necessary in light of the larger values of
Figure 45-12 Values of the variances in the two terms of equation 45-77, using different numbers of values (5,000 and 100,000), as a function of Er. Lower panel: expansion of the plot. (see Color Plate 15)
variance for the two terms of equation 45-77 encountered when more values were included in the calculation, as described above. However, as Figure 45-12 shows, at even moderate values of Er, all the calculated values of the variance converge.

From Figure 45-12, it appears that once the signal level has fallen low enough to include zero with non-negligible probability, the optimum transmittance varies randomly between zero and a well-defined upper limiting value. This upper limit varies in a well-defined manner, from 0.3 at large values of signal as we saw previously, through a minimum at roughly 2.5 standard deviations above zero. While it does not seem possible to observe this directly, comparing Figure 45-12 with the results we found for the maximum value for computed transmittance under high-noise conditions (see Figure 45-6 and the discussion of that) suggests that it would not be surprising if the minimum actually occurred when the signal was 2.11 standard deviations above zero.
The overall conclusion of all this work is that it is surely unfortunate that the effect of noise in the reference reading was not considered for lo these many a year, since that is where all the action seems to be. We continue in our next chapter by considering a special case of constant noise, with characteristics that give somewhat different results than the ones we have obtained here.
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 15(10), 24–25 (2000).
2. Mark, H. and Workman, J., Spectroscopy 15(11), 20–23 (2000).
3. Mark, H. and Workman, J., Spectroscopy 15(12), 14–17 (2000).
4. Mark, H. and Workman, J., Spectroscopy 16(2), 44–52 (2001).
5. Mark, H. and Workman, J., Spectroscopy 16(4), 34–37 (2001).
46 Analysis of Noise: Part 7
This chapter is the continuation of Chapters 40–45, found as papers first published as [1–6], dealing with the rigorous derivation of the expressions relating the effect of instrument (and other) noise to their effects on the spectra we observe. Our first chapter in this set was an overview; since then we have been analyzing the effect of noise on spectra, when the noise is constant detector noise, that is, noise that is independent of the strength of the optical signal. As we do in each chapter in this section of the book, we take this opportunity to note that we are dealing with a continuous set of chapters, and so we again continue our discussion by continuing our equation numbering, figure numbering, use of symbols, and so on as though there were no break.

We left off in Chapter 45 with having found an expression for the optimum value of transmittance, in situations where the noise is large compared to the signal (or, alternatively, where the signal is small enough to be comparable to the noise), a regime we have investigated for the previous three chapters. Most of the derivations and mathematical analyses we have done so far have been very general, applying to any and all types of noise that might be superimposed on the spectral signal, as long as the noise level was constant and independent of the signal level. Stating it somewhat more rigorously, we assumed that regardless of the signal level, the noise contribution to each measured value represented a random sample taken from a fixed population of such values. In particular, for the most part we made no assumptions about the distribution of the values in the population of the noise readings.
In Chapters 43–45 [6], however, we found it necessary to introduce the assumption that the noise was Normally distributed, in order to be able to determine the expected value for the average transmittance and for the expected standard deviation of the noise level in the case where the signal level was small enough to be comparable to the noise. The Normal distribution is, of course, an important and a common distribution to solve for in this development, but there is another important case where a noise contribution also has a constant standard deviation (i.e., independent of the signal level) but does not have a Normal distribution. These days, this contribution is probably almost as common as the ones having the Normal distribution, although it is not as obvious. Also, it is arguably less important than the other contributions, one reason being that it usually (at least in well-designed instruments) will be swamped out by the other noise sources, and therefore rarely observed. Nevertheless, this contribution does exist and therefore is worthy of being treated in this compilation of the effects of noise, if only for the purpose of completeness. This source of noise is not usually called noise; in most technical contexts it is more commonly called “error” rather than noise, but that is just a label; since it is a random contribution to the measured signal, it qualifies as noise just as much as any other noise source. So what is this mystery phenomenon? It is the quantization noise introduced by the analog-to-digital (A/D) conversion process, and is engendered by the fact that for
any analog signal with a value between two adjacent levels that the A/D converter can assign, the difference between the actual value of the electrical voltage and the value represented by the assigned digital value is an error, or noise, and the distribution of this error is uniform. In the past, when instruments were not computer-controlled and all signal processing was done using analog circuits, digitization was not an important consideration. Nowadays, however, since almost all instruments use computerized data collection, this noise source is much more important, since it is so much more common than it used to be. The situation is illustrated in Figure 46-13. The actual voltage is a continuous, linear physical phenomenon. The values represented by the output of the A/D converter, however, can only take discrete levels, as illustrated. The double-headed arrows represent the error introduced by digitizing the continuous physical voltage at various points. The error cannot be greater than 1/2 the difference between representing adjacent levels of the converter; if the voltage increases beyond 1/2 the difference between levels, then the conversion will provide the next step’s representation of the value. Furthermore, if the sampling point is random with respect to the A/D conversion levels, as happens, for example, with any varying signal, then the actual voltage at the sampling point can be anywhere between two adjacent levels with equal probability, therefore the error (or noise) introduced will be uniformly distributed between +1/2 and −1/2 of the step size. This can happen even in the absence of other noise sources; as long as the signal varies, as it would, say, when a source is modulated. In that case, then, the measurement points will have a random relationship to the digitization levels. 
This effect could conceivably even become observable as the dominant error source, if the instrument has an extremely low noise level (a favorable case) or too-large differences between A/D levels due to the A/D converter having too few bits (an unfavorable case).
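The uniform character of quantization error can be checked with a short simulation. The following Python sketch is illustrative only; the function name `quantization_errors`, the voltage span and the seed are assumptions. It digitizes randomly placed "voltages" to the nearest level and examines the error. For a uniform distribution of width one step, the variance is step²/12, which is a standard property of the uniform distribution rather than a result stated in this chapter.

```python
import random
import statistics


def quantization_errors(n=50_000, step=1.0, span=100.0, seed=0):
    """Digitize random voltages to the nearest A/D level; return the errors."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        voltage = rng.uniform(0.0, span)        # sampling point random w.r.t. levels
        level = round(voltage / step) * step    # nearest representable level
        errors.append(level - voltage)
    return errors


if __name__ == "__main__":
    errs = quantization_errors()
    print("largest |error|:", max(abs(e) for e in errs))   # at most step/2
    print("variance       :", statistics.variance(errs))   # near step**2 / 12
```

The largest error never exceeds half a step, and the variance settles near 1/12 of the squared step size, so the error behaves as range-limited, uniformly distributed noise.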
Figure 46-13 The actual voltage is a continuous, linear function. The values represented by the output of the A/D converter, however, can only take discrete levels. The double-headed arrows represent the error introduced by digitizing the continuous physical voltage at various points. (Axes: measured value versus applied voltage; labels: A/D step, error, actual voltage.)
EFFECT OF NOISE ON COMPUTED TRANSMITTANCE

Therefore it is necessary at this point to repeat the investigations we did for Normally distributed noise, but to consider the effect of range-limited, uniformly distributed, noise. We will find that investigating this special case is relatively simple compared to the previous derivations, both because the expressions we find are much simpler than the previous ones and also because we have previously derived much of what we need here, and so can simply start at an appropriate point and continue along the appropriate path. The point in our previous discussions where the distribution of the noise was found to matter was the point at which we had to introduce the distribution of the errors in the first place; all previous discussion, derivations, and so on prior to that were independent of the distribution of the errors. That point was equation 43-60 in Chapter 43, first published as [4], where we introduced the weighted average in order to be able to compute the expected value for the measured transmittance, under conditions where the signal was small enough to be comparable to the noise. So let us repeat our previous work, starting at the appropriate point, and investigate both the computed transmittance and the noise of the transmittance, when the noise and signal have comparable magnitudes, but the noise is now uniformly distributed:

$$\bar{X}_W = \frac{\int W(x)\,f(x)\,dx}{\int W(x)\,dx} \tag{46-60}$$

In the case we investigated there, we had previously derived that the calculated transmittance for an individual reading was

$$f(x) = \frac{E_s}{E_r + \Delta E_r} \tag{46-52a}$$

and in that case, we set the weighting function W(x) to be the Normal distribution. We are now interested in what happens when the weighting function is a uniform distribution. Therefore the formula for the expected value of the mean transmittance, found by using equation 46-52a for f(x) and (1) for W(x) in the interval from −1/2 to +1/2 (and zero outside that interval), becomes

$$T_{WU} = \frac{\displaystyle\int_{-1/2}^{1/2} \frac{E_s}{E_r + \Delta E_r}\,(1)\,d\Delta E_r}{\displaystyle\int_{-1/2}^{1/2} (1)\,d\Delta E_r} \tag{46-79}$$

In equation 46-79, TWU represents the mean computed transmittance for Uniformly distributed noise and the parenthesized (1) in both the numerator and the denominator is a surrogate for the actual voltage difference between successive values represented by the A/D steps: essentially a normalization factor for the actual physical voltages involved. In any case, if the actual voltage difference were used in equation 46-79, it would be factored out of both the numerator and the denominator integrals, and the two would then cancel. Since the denominator is unity in either case, equation 46-79 now simplifies to

$$T_{WU} = \int_{-1/2}^{1/2} \frac{E_s}{E_r + \Delta E_r}\,d\Delta E_r \tag{46-80}$$

Equation 46-80 is of reasonably simple form; indeed, the evaluation of this integral is considerably simpler than when the noise was Normally distributed. Not only is it possible to evaluate equation 46-80 analytically, it is one of the Standard Forms for indefinite integrals and can be found in integral tables in elementary calculus texts, in handbooks such as the Handbook of Chemistry and Physics, and in other reference books. The standard form for this integral is

$$\int \frac{1}{a + bx}\,dx = \frac{1}{b}\ln(a + bx)$$

To convert equation 46-80 to its Standard Form, we simply move $E_s$ outside the integral, whereupon equation 46-80 becomes

$$T_{WU} = E_s \int_{-1/2}^{1/2} \frac{1}{E_r + \Delta E_r}\,d(\Delta E_r) \tag{46-81}$$

By setting $a = E_r$ and $b = 1$, the integral of equation 46-81 is

$$T_{WU} = E_s \Big[\ln(E_r + \Delta E_r)\Big]_{-1/2}^{1/2} \tag{46-82}$$
On setting $E_s = T E_r$ and expanding equation 46-82 by substituting the limits of integration:

$$T_{WU} = T E_r \ln\left|E_r + \tfrac{1}{2}\right| - T E_r \ln\left|E_r - \tfrac{1}{2}\right| \tag{46-83}$$

From equation 46-83 we see that the expectation for the measured value of $T_{WU}$ is proportional to the true value of $T$ (i.e., $E_s/E_r$), multiplied by a multiplier that is a function of $E_r$. Figure 46-14 presents this function.

[Figure 46-14: multiplication factor (ordinate, 0 to 2.5) plotted against $E_r$ (abscissa, 0 to 2.4).]
Figure 46-14 Plot of the multiplication factor of equation 46-83 as a function of $E_r$. Abscissa unit is the difference between digitization levels.

Just as the expected value for transmittance ($T_W$) in the case of Normally distributed noise went through a maximum, so too does the expected value for uniformly distributed noise, and the multiplier approaches unity at large values of $E_r$, as it should. We note, however, that the value of the function at $E_r = 0.5$ is not a valid value. When $E_r = 0.5$, the argument of the logarithm in the second term of equation 46-83 is zero, and the value of the log is undefined. The function approaches an asymptote at $E_r = 0.5$, indicating the mathematical undecidability of the value of the function, even though an actual physical A/D converter will indeed produce one or the other value at that point.
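The multiplier in equation 46-83 is easy to tabulate, and the closed form can be compared with a direct numerical evaluation of the integral in equation 46-81. The following is a minimal sketch under stated assumptions: the function names and the midpoint-rule integrator are illustrative choices, not part of the original derivation, and the numerical check is meaningful only for $E_r > 0.5$, where the integrand has no singularity inside the interval.

```python
import math


def multiplier(er):
    """Multiplier of equation 46-83, i.e. T_WU / T, as a function of E_r.

    Raises ValueError at er == 0.5, where the logarithm's argument is zero.
    """
    return er * (math.log(abs(er + 0.5)) - math.log(abs(er - 0.5)))


def multiplier_numeric(er, n=100_000):
    """Midpoint-rule estimate of E_r * integral of 1/(E_r + dE) over [-1/2, 1/2].

    Hypothetical helper for cross-checking; valid only when er > 0.5.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        d = -0.5 + (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += 1.0 / (er + d)
    return er * total * h


if __name__ == "__main__":
    for er in (0.6, 1.0, 2.0, 10.0):
        print(f"E_r = {er:5.2f}  closed form = {multiplier(er):.6f}  "
              f"numeric = {multiplier_numeric(er):.6f}")
```

In this sketch the multiplier decreases toward unity as $E_r$ grows, in line with the behavior described for Figure 46-14.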
COMPUTED TRANSMITTANCE NOISE

Here again, our task is simplified by the two facts we have mentioned above: first, that we can reuse many of the results we obtained previously for the case of Normally distributed noise, and second, that the characteristics of uniformly distributed noise simplify the mathematical analysis. Our first step in this analysis starts with equation 44-71, which we derived previously in Chapter 44 (referenced as [5]) as a general description of noise behavior:

$$\mathrm{SD}(\Delta T) = \sqrt{\frac{1}{E_r^2} + \frac{E_s^2}{E_r^4}}\;\mathrm{SD}(\Delta E) \tag{44-71, from Chapter 44}$$

In our previous development, we presented a family of curves, corresponding to different values of $\mathrm{SD}(\Delta E)$. In the case of uniformly distributed noise, which is of necessity contained within a limited range of values, the well-known fact that the standard deviation of the noise equals the range$/\sqrt{12}$ ([7], p. 146) helps us, in that it requires only one curve to display, rather than a family of curves. For this case, then, equation 44-71 becomes equation 46-84:

$$\mathrm{SD}(\Delta T) = \frac{1}{\sqrt{12}}\sqrt{\frac{1}{E_r^2} + \frac{E_s^2}{E_r^4}} \tag{46-84}$$

where the unit of measure for $E_s$ and $E_r$ is the digitization interval of the A/D converter. We forbear plotting this function since it is simply one of the family we have presented previously in Chapter 44, as Figures 44-1 and 44-3 (referenced in [5]). Similarly, in Chapter 44, we have previously derived the absorbance noise and relative absorbance noise, and presented those as equations 44-24 and 44-77, respectively.

$$\mathrm{Var}\left(\frac{\Delta A}{A}\right) = \left(\frac{1}{T\ln(T)}\right)^2 \mathrm{Var}\left(\frac{\Delta E_s}{E_r + \Delta E_r}\right) + \left(\frac{1}{\ln(T)}\right)^2 \mathrm{Var}\left(\frac{-\Delta E_r}{E_r + \Delta E_r}\right) \tag{44-77}$$
In order to evaluate equation 44-77 it is necessary to assume a distribution for the variability of $E_s$ and $E_r$, and in the earlier chapter the distribution used was the Normal distribution; here, therefore, we want to evaluate this function for the case of a uniform distribution. We note here that much of the discussion in the earlier chapter concerning the evaluation of equation 44-77 applies now as well, so it behooves the reader to review the procedures used there, and also in Chapter 45, immediately preceding this one (first published as [6]), since we will apply those procedures again, with the difference that we will use a uniform distribution for the variability of the noise terms.

[Figure 46-15: "Variances for uniformly distributed noise"; variance (ordinate, 0 to 2.0) plotted against $E_r$ (abscissa, 0 to 9.9).]
Figure 46-15 Values of the variance of $\Delta E_r/(E_r + \Delta E_r)$ and $\Delta E_s/(E_r + \Delta E_r)$ for various values of $E_r$, with a uniform distribution of values for the errors.

Figures 44-6 and 44-1 from our Chapter 44 (referenced as [5]) are unchanged, since they do not depend on the distribution of the errors. The figure corresponding to Figure 45-9 (which appeared in Chapter 45 [6]), which was calculated for Normally distributed noise, is Figure 46-15, which presents the results of calculating the variance of the two terms of equation 44-77 for uniformly distributed noise instead. We note that while these terms follow the same trends as for the Normally distributed errors, these errors do not become appreciable until $E_r$ has fallen below 0.6, which corresponds to the point where values occur close to or less than zero. For values of $E_r$ below 0.6 the values of both terms of equation 44-77 become very large and erratic.

Following along the developments in Chapter 45, we find that the plot of $\Delta A/A$ depends on $T$, but the variance terms that depend on $E_r$ as the parameter are essentially independent of $T$. Therefore we expect that the plots of $\Delta A/A$ as a function of $T$ will result in a family of curves similar to what we found in Figure 45-11, but different in the values of $\Delta A/A$. However, Figure 45-11 shows only the net result of seeking the minimum of the function; it does not reveal the nature of the curves contributing to the erratic behavior of the minimum. Therefore, we now present a set of the curves for which the minimum can be found, in Figure 46-16.
We see in Figure 46-16a that the behavior of the curve of $\Delta A/A$ is systematic when $E_r$ is large enough for the variance to remain small, while Figure 46-16b shows how the erratic behavior of the two standard deviation terms in equation 44-77 results in a set of curves that form a family, but an erratic family rather than a well-ordered and well-behaved one. At this point we have completed our analysis of spectral noise for the case where the noise is constant (or at least independent of the signal level). Having completed this part of the analyses originally proposed in Chapter 40 (referenced as [1]), we will continue by doing a similar analysis for a complicated case.
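Equation 46-84 can be checked numerically when $E_r$ is well above the digitization interval. The sketch below is illustrative only and rests on labeled assumptions: digitization error is modeled as independent uniform noise on $[-1/2, +1/2]$ added to both readings, the function names are hypothetical, and the comparison is expected to hold only approximately, because equation 44-71 neglects $\Delta E_r$ relative to $E_r$.

```python
import math
import random


def sd_t_uniform(es, er):
    """Equation 46-84; es and er are in units of the digitization interval."""
    return math.sqrt(1.0 / er**2 + es**2 / er**4) / math.sqrt(12.0)


def simulate_sd_t(es, er, n_trials=50_000, seed=0):
    """Monte Carlo estimate of SD(T) with uniform noise on both readings.

    Assumption: the errors on Es and Er are independent and uniformly
    distributed over one digitization interval, [-0.5, +0.5].
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        d_es = rng.uniform(-0.5, 0.5)
        d_er = rng.uniform(-0.5, 0.5)
        values.append((es + d_es) / (er + d_er))
    mean = sum(values) / n_trials
    var = sum((v - mean) ** 2 for v in values) / (n_trials - 1)
    return math.sqrt(var)


if __name__ == "__main__":
    es, er = 10.0, 20.0  # arbitrary example: T = 0.5, reference spans 20 A/D steps
    print("equation 46-84:", sd_t_uniform(es, er))
    print("simulation:    ", simulate_sd_t(es, er))
```

For small $E_r$ (approaching 0.5) the simulated values would be expected to depart from equation 46-84 and become erratic, as the text describes for Figure 46-15.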
[Figure 46-16: (a) tabulated values for $E_r$ = 0.1 to 0.6; (b) plot of $\Delta A/A$ (ordinate, 0 to 50000) against $T$ (abscissa, 0.001 to 0.973).]

(a)
T       Er = 0.1      Er = 0.2      Er = 0.3      Er = 0.4      Er = 0.5      Er = 0.6
0.001   1.07737E+08   2.68976E+03   9.07148E+02   4.86867E+02   2.99824E+02   1.96293E+02
0.002   3.32775E+07   8.30808E+02   2.80198E+02   1.50383E+02   9.26091E+01   6.06308E+01
0.003   1.69267E+07   4.22594E+02   1.42524E+02   7.64928E+01   4.71060E+01   3.08401E+01
0.004   1.05393E+07   2.63126E+02   8.87422E+01   4.76280E+01   2.93304E+01   1.92025E+01
0.005   7.32527E+06   1.82886E+02   6.16802E+01   3.31038E+01   2.03861E+01   1.33467E+01
0.006   5.45604E+06   1.36219E+02   4.59413E+01   2.46567E+01   1.51842E+01   9.94105E+00
0.007   4.26147E+06   1.06395E+02   3.58831E+01   1.92585E+01   1.18598E+01   7.76459E+00
0.008   3.44565E+06   8.60277E+01   2.90140E+01   1.55718E+01   9.58951E+00   6.27823E+00
0.009   2.86035E+06   7.14152E+01   2.40858E+01   1.29268E+01   7.96068E+00   5.21184E+00
0.010   2.42412E+06   6.05245E+01   2.04128E+01   1.09555E+01   6.74670E+00   4.41706E+00
0.011   2.08898E+06   5.21577E+01   1.75910E+01   9.44109E+00   5.81408E+00   3.80647E+00
0.012   1.82508E+06   4.55692E+01   1.53690E+01   8.24853E+00   5.07967E+00   3.32566E+00
0.013   1.61296E+06   4.02735E+01   1.35830E+01   7.28997E+00   4.48937E+00   2.93919E+00
0.014   1.43948E+06   3.59426E+01   1.21224E+01   6.50605E+00   4.00662E+00   2.62314E+00
0.015   1.29549E+06   3.23479E+01   1.09100E+01   5.85540E+00   3.60593E+00   2.36081E+00

Figure 46-16 The behavior of the family of curves of $\Delta A/A$. Figure 46-16a shows the systematic behavior obtained when $E_r$ is greater than 0.2 (in this case $0.2 < E_r < 1$). Figure 46-16b shows the erratic behavior obtained when $E_r$ is less than 0.2 (in this case $0.06 < E_r < 0.2$).
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 15(10), 24–25 (2000).
2. Mark, H. and Workman, J., Spectroscopy 15(11), 20–23 (2000).
3. Mark, H. and Workman, J., Spectroscopy 15(12), 14–17 (2000).
4. Mark, H. and Workman, J., Spectroscopy 16(2), 44–52 (2001).
5. Mark, H. and Workman, J., Spectroscopy 16(4), 34–37 (2001).
6. Mark, H. and Workman, J., Spectroscopy 16(5), 20–24 (2001).
7. Ingle, J. D. and Crouch, S. R., Spectrochemical Analysis (Prentice-Hall, Upper Saddle River, NJ, 1988).
47
Analysis of Noise: Part 8
This chapter continues the set of Chapters 40 through 46, first published as [1–7], dealing with the rigorous derivation of the expressions relating instrument (and other) noise to its effects on the spectra we observe. Chapter 40 was an overview; since then we have been analyzing the effect of noise on spectra by considering the case when the noise is constant detector noise, that is, noise that is independent of the strength of the optical signal, which is the typical behavior of detectors for the IR and near-IR.

As we do in each chapter in this section of the book, we take this opportunity to note that we are dealing with a continuous set of chapters, and so we again continue our equation numbering, figure numbering, use of symbols and so on as though there were no break between the chapters. However, this chapter differs somewhat from the previous seven in that, as we will see shortly, we will be performing parts of the same derivations all over again. Therefore, when we reuse previously derived equations, we will use the same equation numbers as we did for the original derivation. When we change course from the previous derivation, we will number the equations starting with the next higher equation number from the last one we used (which, we note, was equation 46-84 [7]). This procedure will also allow us to use some of our previous results to save time and space, allowing us to move along somewhat faster without sacrificing either rigor or detail.

We left off in Chapter 46 by noting that we had just about exhausted the topic of the constant-noise (and by implication, relatively “simple”) case (although not completely: there is still more to be said about the constant-noise case, but that is for the future, and right now it is time to move on), with the threat to begin discussion of a complicated case.
Whether in fact it is more complicated than what we have been discussing remains to be seen; the question of whether something is “complicated” and “difficult” is partially subjective, since it depends on the perceptions of the person doing the evaluating. Something that is “difficult” for one may be “easy” for another because of a better background or more familiarity with the topic. Be that as it may, having decided to move on from the constant-detector-noise case, there remained the question of what to move on TO, that is which of the ten or so types of noise we originally brought up [1] should be tackled next. Tossing a mental coin, the decision was to analyze the case of noise proportional to the square root of the signal. This, as you will recall, is Poisson-distributed noise, characteristic of the noise encountered when the limiting noise source is the shot noise that occurs when individual photons are detected and represent the ultimate sensitivity of the measurement. This is a situation that is fairly commonly encountered, since it occurs, as mentioned previously, in UV-Vis instrumentation as well as in X-ray and gamma-ray measurements. This noise source may also enter into readings made in mass spectrometers, if the detection method includes counting individual ions. We have, in
fact, discussed some general properties of this distribution quite a long time ago (see [8] or p. 175 in [9]). Now, we are not particular experts in X-ray and gamma-ray spectroscopy (nor mass spectroscopy, for that matter), but our understanding of those technologies is that they are used mainly in emission mode. Even when the exciting source is a continuum source, such as is found when an X-ray tube is used to produce the exciting X-rays for an X-ray Fluorescence (XRF) measurement, the measurement itself consists of counting the X-rays emitted from the sample after the sample absorbs an X-ray from the source. These measurements are themselves the equivalent of single-beam measurements and will thus also be Poisson-distributed in accordance with the basic physics of the phenomenon.

The interesting parts occur when we calculate the transmittance (or reflectance) or absorbance of the sample under consideration, and therefore we must take a dual-beam measurement (or at least the logically equivalent measurement of sample and reference readings) and compute the transmittance/reflectance or absorbance from those readings. Therefore, while the underlying physics results in the same form of noise characteristic in all those technologies, our results will be applicable mainly to UV-Vis measurements, where the quantity actually of interest is the amount of energy removed from the optical beam by absorption in the sample. For the mathematical development we wish to pursue, we will therefore again assume (as we did for the constant-noise case) that we are measuring transmittance through a clear (non-scattering) solution, and that Beer’s law applies. Examining Ingle and Crouch ([10], p. 152) we find the same situation as we found for constant detector noise: the computed noise of absorbance values does not take into account the effect of the noise of the reference reading.
Hence, we can expect the results of our derivations to differ from the classic values for this situation, as they did for the constant-detector-noise case. We have recently found, however, that in a much more obscure part of the book [10], in Table 6-2, there are expressions for absorbance noise that include terms for the noise of both sample and reference beam readings. The expressions given there are very complicated, since they include the combined effect of several different noise sources. However, the main discussion in that book does not deal with the broader picture, and the full expression is relegated to an obscure part of the book with no pointer to it in the text, so that it is easily missed; we are therefore forced to treat Poisson noise as though it, too, has not been derived for the full situation, despite our finding it in that table. Indeed, the main discussion in Chapter 5 gives expressions and results that, as we shall see, conform to the expressions obtained when the reference noise is neglected.

Also, we just received a last-minute bulletin: one of the authors of [10] has kindly pointed out a typographical error in Table 6-2, so that we might put the matter right. The $T$ within the parentheses in the first expression for $s_T$ should be squared; this will correct an otherwise erroneous result that might be derived from that expression (J.D. Ingle, 2001, personal communication). With this correction, the expression in Table 6-2 results in exactly the same expression we obtained in our own derivation for the constant-noise case [2].

We begin, as we did before, with the basic expression for the transmittance of a sample; since this is a repeat of previous equations we use the same numbers instead of starting with new numbering for the same equations:

$$T = \frac{E_s - E_{0s}}{E_r - E_{0r}} \tag{47-1}$$
and, with the addition of noise affecting the computation of $T$:

$$T + \Delta T = \frac{(E_s + \Delta E'_s) - (E_{0s} + \Delta E'_{0s})}{(E_r + \Delta E'_r) - (E_{0r} + \Delta E'_{0r})} \tag{47-2}$$

At this point we make a slight alteration to what we did previously. Strictly speaking we are being slightly premature here, but the gain in simplification of the equations more than compensates for the slight departure from complete rigor. Since the noise for the pure Poisson case is related to the signal, the noise at zero signal is zero; that is, $\Delta E'_{0s}$ and $\Delta E'_{0r}$ are both zero. Therefore, for this case $\Delta E_s = \Delta E'_s$ and $\Delta E_r = \Delta E'_r$. With this substitution, we can write equation 47-4 unchanged; however, we must keep in mind the difference in the meaning of these two terms ($\Delta E_s$ and $\Delta E_r$) compared to the meaning in the previous chapters. Hence,

$$T + \Delta T = \frac{E_s - E_{0s} + \Delta E_s}{E_r - E_{0r} + \Delta E_r} \tag{47-4}$$
From this point, up to and including equation 47-17, the derivation is identical to what we did previously. To save time, space, forests and our readers’ patience we forbear to repeat all that here and refer the interested reader to Chapter 41 (referenced as [2]) for the details of those intermediate steps. Here we present only equation 47-17, which serves as the starting point for the departure to work out the noise behavior for the case of Poisson-distributed detector noise:

$$\mathrm{Var}(\Delta T) = \left(\frac{1}{E_r}\right)^2 \mathrm{Var}(\Delta E_s) + \left(\frac{-T}{E_r}\right)^2 \mathrm{Var}(\Delta E_r) \tag{47-17}$$

This is the point at which we must depart from the previous work. At this point in the previous (constant-noise) case we noted that $\mathrm{SD}(\Delta E_s) = \mathrm{SD}(\Delta E_r)$ and therefore we set both of those quantities equal to $\mathrm{SD}(\Delta E)$. We cannot make this equivalency in this case, since the noise values (or, at least, the expected noise values) will in general NOT be equal except when $E_s = E_r$, that is, when the transmittance (or reflectance) of the sample is unity. Poisson-distributed noise, however, has an interesting characteristic: the expected standard deviation of the data is equal to the square root of the expected mean of the data ([11], p. 714), and therefore the variance of the data is equal (and note, that is equal, not merely proportional) to the mean of the data. Therefore we can replace $\mathrm{Var}(\Delta E_s)$ with $E_s$ and $\mathrm{Var}(\Delta E_r)$ with $E_r$ in equation 47-17:

$$\mathrm{Var}(\Delta T) = \left(\frac{1}{E_r}\right)^2 E_s + \left(\frac{-T}{E_r}\right)^2 E_r \tag{47-85}$$

The next transformation we are going to have to do in really tiny little baby steps, lest we be accused of doing something illegal to equation 47-85:

$$\mathrm{Var}(\Delta T) = \frac{E_s}{E_r^2} + \frac{E_r T^2}{E_r^2} \tag{47-86}$$

$$\mathrm{Var}(\Delta T) = \frac{T}{E_r} + \frac{T^2}{E_r} \tag{47-87}$$
And upon converting variance to standard deviation:

$$\mathrm{SD}(\Delta T) = \sqrt{\frac{T + T^2}{E_r}} \tag{47-88}$$
Compare equation 47-87 for Poisson noise with equation 47-18, or equation 47-88 with equation 47-19, as we derived for constant detector noise [2]. Equation 47-88 has also been previously derived by Voigtman, it turns out [12], in the course of his simulation studies. We note that now, instead of varying over a relative range of 1 to $\sqrt{2}$, the noise will vary over a range of zero to $\sqrt{2}$ as the sample transmittance varies from zero to unity. What is even more interesting is that nowhere in equation 47-88 is there a term representing the S/N (or N/S) ratio, as we found in equation 47-19. This is because the noise level of a detector with Poisson-distributed noise is predetermined by the signal level, and was implicitly introduced when we substituted $E_s$ and $E_r$ for $\mathrm{Var}(\Delta E_s)$ and $\mathrm{Var}(\Delta E_r)$ in equation 47-85. Therefore the shape of the transmittance noise curve as a function of sample transmittance is constant (as it was for the case of constant noise). However, as equation 47-88 shows, the value of the noise is scaled by the reference signal, and varies inversely with the square root of the reference signal.

We present the curve of $\mathrm{SD}(\Delta T)$ as a function of $T$ in Figure 47-17. From Figure 47-17 we note several ways in which the behavior of the transmittance noise for the Poisson-distributed detector noise case differs from the behavior of the constant-noise case. First we note, as we did above, that at $T = 0$ the noise is zero, rather than unity. This justifies our earlier step of setting the zero-signal noise terms to zero, so that $\Delta E'$ could be replaced by $\Delta E$ for both the sample and the reference readings. Second, we note that the curve is convex upward rather than concave upward. Third, we note that for values of $T$ greater than roughly 0.25, the curve appears almost linear, at least to the eye.
This is a consequence of the fact that, at small values of $T$, the square of $T$ inside the radical becomes negligible compared to $T$, causing the overall value of the curve to be roughly proportional to $\sqrt{T}$, while as $T$ approaches unity the square term grows to equal the linear term, pulling the curve toward proportionality to $\sqrt{T^2}$, or, in other words, to $T$.

[Figure 47-17: "Poisson-distributed transmittance noise"; relative noise (ordinate, 0 to 1.6) plotted against %T (abscissa, 0 to 0.99).]
Figure 47-17 Standard deviation of $T$ as a function of $T$.

Another issue to bring up is the question of units. In the case of constant noise, as expressed by equation 47-19, $T$ was dimensionless, being a ratio of two numbers ($E_s$ and $E_r$) with the same units, whatever those units might be, and the other term in equation 47-19, $\mathrm{SD}(\Delta E_r)/E_r$, is also a ratio of two numbers with the same units. In equation 47-88, on the other hand, $T$ is still dimensionless, but $E_r$ is not dimensionless; since it is a measurement, it must have units. The question of the units of $E_r$ brings us to an important caveat concerning the interpretation of equation 47-88 and Figure 47-17.

First, to answer the question of units, we recall that the Poisson distribution applies to measurements for X-ray, UV, and visible detectors, and the reason that distribution applies is that it is the distribution describing the behavior of the number of discrete events occurring in a given time interval; the actual data, then, is the number of counts occurring during the measurement time. The unit of $E_r$, then, is the absolute number of counts, and this brings us to our caveat. Equation 47-88 and Figure 47-17 are presented as describing a continuous series of values, and if $E_r$ is sufficiently large (large enough that a change of 1 count is small compared to the total number of counts), these equations and figures are a good approximation to a continuum. However, suppose $E_r$ is small. Let us pick a small number and see what happens: let us say $E_r$ is five. That means that the reference reading is five counts. Now it is immediately clear that we simply cannot have just any value of $T$ along the X-axis of Figure 47-17. Since $E_s$ can take only integer values (0, 1, 2, 3, ...), $T$ between zero and unity can take only the discrete values 0, 0.2, 0.4, 0.6, 0.8, and unity, since you cannot have a fraction of a count as data.
For those values of $T$, Figure 47-17 will provide an accurate measure of the expected value for $\mathrm{SD}(\Delta T)$, but not necessarily the actual value you will measure in any particular measurement. This is a result of the randomness inherent in the measurement and the discreteness of the measurement of $E_s$ as well as $E_r$. We discussed these issues a long time ago, when our series was still called “Statistics in Spectroscopy” rather than its current appellation of “Chemometrics in Spectroscopy”; we recommend that our readers go back and reread those columns, or the book that they were collected into [9], or any good book about elementary Statistics.

Another consequence of the behavior of the Poisson distribution is that for small values of $E_r$, the N/S ratio becomes large, to the point where values of $T$ appreciably greater than unity may be measured. For example, if $E_r = 5$ as we presented just above, the standard deviation of $E_r$ can be calculated as $\mathrm{SD}(E_r) = 2.23$. Given a ±2 standard deviation range, we can expect (truncating to the nearest integer) that values of $E_s$ (when $T = 1$) as high as 5 + 2 × 2.23 = 5 + 4 = 9 counts will be observed, corresponding to a calculated value of $T = 9/5 = 1.8$.

Furthermore, one of the steps taken during the omitted sequence between equation 47-4 and equation 47-17 was to neglect $\Delta E_r$ compared to $E_r$. Clearly this step is also only valid for large values of $E_r$, both for the case of constant detector noise and for the current case of Poisson-distributed detector noise. Therefore, from both of these considerations, it is clear that equation 47-88 and Figure 47-17 should be used only when $E_r$ is sufficiently large for the approximation to apply. Hence our caveats: equation 47-88 and Figure 47-17 are best reserved for cases of high signal, where the continuum approximation will be valid.
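The high-signal regime in which equation 47-88 is expected to hold can be explored with a small simulation. This is a minimal sketch, not part of the original derivation: the Poisson sampler is Knuth's multiplication method (an illustrative choice that is practical only for moderate means), the function names are hypothetical, and agreement with equation 47-88 is expected to be approximate because the derivation neglects $\Delta E_r$ relative to $E_r$.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method.

    Assumption: lam is moderate, so that exp(-lam) does not underflow.
    """
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sd_t_poisson(t, er):
    """Equation 47-88: expected SD of T for Poisson-distributed noise."""
    return math.sqrt((t + t * t) / er)


def simulate_sd_t(t, er, n_trials=3000, seed=0):
    """Monte Carlo SD of Es/Er with Es ~ Poisson(t*er) and Er ~ Poisson(er)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_trials):
        es_counts = poisson_sample(t * er, rng)
        er_counts = poisson_sample(er, rng)
        if er_counts > 0:  # a zero reference reading gives no usable ratio
            ratios.append(es_counts / er_counts)
    mean = sum(ratios) / len(ratios)
    var = sum((r - mean) ** 2 for r in ratios) / (len(ratios) - 1)
    return math.sqrt(var)


if __name__ == "__main__":
    t, er = 0.5, 400.0  # arbitrary example with a fairly large reference count
    print("equation 47-88:", sd_t_poisson(t, er))
    print("simulation:    ", simulate_sd_t(t, er))
```

Rerunning the sketch with a small reference count, such as the five counts used in the text, would be expected to show larger departures from equation 47-88, which is the point of the caveats above.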
Now that we have completed our expository interlude, we continue our derivation along the same lines we did previously. The next step, as it was for the constant-noise case, is to derive the absorbance noise for Poisson-distributed detector noise. As we did above in the derivation of transmittance noise, we start by repeating the definition and the previously derived expressions for absorbance [3]:

$$A = -\log(T) \tag{47-20a}$$

$$A = -0.4343\,\ln(T) \tag{47-20b}$$

We take the derivative

$$dA = -0.4343\,\frac{dT}{T} \tag{47-21}$$

and substitute the expressions for $T$ (47-6) and $dT$, replacing the differentials by finite differences so that we can use the expression for $\Delta T$ found previously (J.D. Ingle, 2001, personal communication):

$$\Delta A = -0.4343\,\frac{\dfrac{E_r\,\Delta E_s - E_s\,\Delta E_r}{E_r\,(E_r + \Delta E_r)}}{\dfrac{E_s}{E_r}} \tag{47-22}$$
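Again in the interests of saving time, space, and so on, we skip over the repetition of the intermediate steps between equation 47-22 and equation 47-29:

$$\mathrm{Var}(\Delta A) = \left(\frac{-0.4343}{E_s}\right)^2 \mathrm{Var}(\Delta E_s) + \left(\frac{0.4343}{E_r}\right)^2 \mathrm{Var}(\Delta E_r) \tag{47-29}$$

And again our departure from the derivation for the constant detector noise case is to note and use the fact that for Poisson-distributed noise, $\mathrm{Var}(\Delta E_r) = E_r$ and $\mathrm{Var}(\Delta E_s) = E_s$:

$$\mathrm{Var}(\Delta A) = \left(\frac{-0.4343}{E_s}\right)^2 E_s + \left(\frac{0.4343}{E_r}\right)^2 E_r \tag{47-89}$$

And simplifying as we did above:

$$\mathrm{Var}(\Delta A) = \frac{0.4343^2}{E_s^2}\,E_s + \frac{0.4343^2}{E_r^2}\,E_r \tag{47-90}$$

$$\mathrm{Var}(\Delta A) = \frac{0.4343^2}{E_s} + \frac{0.4343^2}{E_r} \tag{47-91}$$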
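and since $T = E_s/E_r$, we solve for $E_s = T E_r$ and substitute this into equation 47-91:

$$\mathrm{Var}(\Delta A) = \frac{0.4343^2}{T E_r} + \frac{0.4343^2}{E_r} \tag{47-92}$$

and factor out $0.4343^2/E_r$:

$$\mathrm{Var}(\Delta A) = \frac{0.4343^2}{E_r}\left(\frac{1}{T} + 1\right) \tag{47-93}$$

and upon taking square roots:

$$\mathrm{SD}(\Delta A) = \frac{0.4343}{\sqrt{E_r}}\sqrt{\frac{1}{T} + 1} \tag{47-94, for Poisson noise}$$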
Again we can compare the expression in equation 47-94 with the equivalent expression for the constant detector noise case, which starts with equation 42-32, also equation 47-32 [3]:

$$\mathrm{SD}(\Delta A) = 0.4343\,\mathrm{SD}(\Delta E)\sqrt{\frac{1}{E_s^2} + \frac{1}{E_r^2}} \tag{47-32, for constant noise}$$

It is instructive to put equation 47-32 into a form similar to equation 47-94 (for Poisson noise) by replacing $E_s$ with $T E_r$:

$$\mathrm{SD}(\Delta A) = 0.4343\,\mathrm{SD}(\Delta E)\sqrt{\frac{1}{T^2 E_r^2} + \frac{1}{E_r^2}} \tag{47-95, for constant noise}$$

$$\mathrm{SD}(\Delta A) = 0.4343\,\frac{\mathrm{SD}(\Delta E)}{E_r}\sqrt{\frac{1}{T^2} + 1} \tag{47-96, for constant noise}$$

Thus, in the constant-noise case the absorbance noise is again proportional to the N/S ratio, although this is clearer now than it was in the earlier chapter; there, however, we were interested in making a different comparison. The comparison of interest here, of course, is the way the noise varies as $T$ varies, which is immediately seen by comparing the expressions in the radicals in equations 47-94 (for Poisson noise) and 47-96. Also, as equation 47-94 shows, the absorbance noise is again inversely proportional to the square root of the reference signal, as was the transmittance noise. And once again we remind our readers of the caveats under which equation 47-94 is valid.

We present the variation of absorbance noise for the two cases (equations 47-94 and 47-96, corresponding to the Poisson-noise and constant-noise cases) in Figure 47-18. While both curves diverge to infinity as the transmittance → 0 (and the absorbance → ∞), the curve for constant detector noise clearly does so more rapidly, at all transmittance levels. Again, we continue our derivations in our next chapter.
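The comparison between the radicals of equations 47-94 and 47-96 can be made concrete by evaluating the two shape factors at a few transmittance values. This is a small illustrative sketch; the function names are hypothetical, and the scale factors in front of the radicals ($0.4343/\sqrt{E_r}$ and $0.4343\,\mathrm{SD}(\Delta E)/E_r$) are deliberately left out, so only the dependence on $T$ is compared.

```python
import math


def poisson_shape(t):
    """Radical of equation 47-94: sqrt(1/T + 1)."""
    return math.sqrt(1.0 / t + 1.0)


def constant_shape(t):
    """Radical of equation 47-96: sqrt(1/T**2 + 1)."""
    return math.sqrt(1.0 / (t * t) + 1.0)


if __name__ == "__main__":
    print("   T    Poisson   constant")
    for t in (0.1, 0.25, 0.5, 0.75, 1.0):
        print(f"{t:5.2f}  {poisson_shape(t):8.4f}  {constant_shape(t):9.4f}")
```

The two factors coincide at $T = 1$, and for $0 < T < 1$ the constant-noise factor is the larger of the two because $1/T^2 > 1/T$ there.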
[Figure 47-18: "Absorbance noise"; relative absorbance noise (ordinate, 2 to 12) plotted against %T (abscissa, 1 down to 0), with one curve labeled "Constant noise" and one labeled "Poisson noise".]
Figure 47-18 Comparison between absorbance noise for the constant-detector-noise case and the Poisson-distributed detector noise case. Note that we present the curves only down to $T = 0.1$, since they both asymptotically → ∞ as $T$ → 0, as per equations 47-94 and 47-96.
REFERENCES
1. Mark, H. and Workman, J., Spectroscopy 15(10), 24–25 (2000).
2. Mark, H. and Workman, J., Spectroscopy 15(11), 20–23 (2000).
3. Mark, H. and Workman, J., Spectroscopy 15(12), 14–17 (2000).
4. Mark, H. and Workman, J., Spectroscopy 16(2), 44–52 (2001).
5. Mark, H. and Workman, J., Spectroscopy 16(4), 34–37 (2001).
6. Mark, H. and Workman, J., Spectroscopy 16(5), 20–24 (2001).
7. Mark, H. and Workman, J., Spectroscopy 16(7), 36–40 (2001).
8. Mark, H. and Workman, J., Spectroscopy 5(3), 55–56 (1990).
9. Mark, H. and Workman, J., Statistics in Spectroscopy, 1st ed. (Academic Press, New York, 1991).
10. Ingle, J.D. and Crouch, S.R., Spectrochemical Analysis (Prentice-Hall, Upper Saddle River, NJ, 1988).
11. Hald, A., Statistical Theory with Engineering Applications (John Wiley & Sons, Inc., New York, 1952).
12. Voigtman, E., Analytical Instrumentation 21(1&2), 43–62 (1993).
48
Analysis of Noise: Part 9
We keep learning more about the history of noise calculations. It seems that the topic of the noise of a spectrum in the constant-detector-noise case was addressed more than 50 years ago [1]. Not only that, but it was done while taking into account the noise of the reference readings. The calculation of the optimum absorbance value was performed using several different criteria for “optimum”. One of these criteria, which Cole called the Probable Error Method, gives the same results that we obtained for the optimum transmittance value of 32.99%T [2]. Cole’s approach, however, had several limitations. The main one, from our point of view, is the fact that he directed his equations to represent the absorbance noise as soon as possible in his derivation. Thus his derivation, as well as virtually all the ones since then, bypassed consideration of the behavior of noise of transmittance spectra. This, coupled with the fact that the only place we have found that presented an expression for transmittance noise had a typographical error as we reported in our previous column [3], means that as far as we know, the correct expression for the behavior of transmittance noise has still never been previously reported in the literature. On the other hand, we do have to draw back a bit and admit that the correct expression for the optimum transmittance has been reported. Not only that, but Cole points out and laments that, at that time, other scientists were already using the incorrect formulas for noise behavior. That means that the same situation that exists now, existed over 50 years ago, and in all the intervening time has not been corrected. This, perhaps, explains why the incorrect theory is still being used today. We can only hope that our efforts are more successful in persuading both the practitioners and teachers of spectroscopic theory to use the more exact formulations we have developed. 
Getting back to the current state of the columns, this column is one more in the set [2–9] dealing with the rigorous derivation of the expressions relating the effect of instrument (and other) noise to their effects to the spectra we observe. The impetus for this was the realization that the previously existing theory was deficient in that the derivations extant ignored the effect of noise in the reference reading, which turns out to have appreciable effects on the nature of the derived noise behavior. Our first chapter in this set [4] was an overview; the next six examined the effects of noise when the noise was due to constant detector noise, and the last one on the list is the first of the chapters dealing with the effects of noise when the noise is due to detectors, such as photomultipliers, that are shot-noise-limited, so that the detector noise is Poisson-distributed and therefore the standard deviation of the noise equals the square root of the signal level. We continue along this line in the same manner we did previously: by finding the proper expression to describe the relative error of the absorbance, which by virtue of Beer’s law also describes the relative error of the concentration as determined by the spectrometric readings, and from that determine the
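value of transmittance a sample should have in order to optimize the analysis, in the sense that the relative error of the concentration is minimized. As we do in each chapter in this section of the book, we take this opportunity to note that we are dealing with a continuous set of chapters, and so we again continue our discussion by continuing our equation numbering, figure numbering, use of symbols, and so on as though there were no break, except that when we repeat an equation or series of equations that were derived and presented previously, we retain the original number(s) for those equation(s).

So let us continue. We now wish to generate the expression for the relative error of the absorbance, $\Delta A/A$, which we again obtain by using the expression in equation 48-25

$$\Delta A = -0.4343\,\frac{E_r\,\Delta E_s - E_s\,\Delta E_r}{E_s E_r} \tag{48-25}$$

for $\Delta A$, and the expression in equation 42-20b, $A = -0.4343\ln(T)$, for $A$. This results in the same expression we obtained previously, which we present, as usual, without repeating all the intermediate steps:

$$\frac{\Delta A}{A} = \frac{1}{\ln(T)}\left(\frac{\Delta E_s}{E_s} - \frac{\Delta E_r}{E_r}\right) \tag{48-36}$$

We again go through the usual sequence of steps needed to pass to the statistical domain, which we do in detail here since, looking back, we find that we had neglected to present them previously due to somewhat of a feeling of being rushed. First we take the variance of both sides of equation 48-36:

$$\mathrm{Var}\left(\frac{\Delta A}{A}\right) = \mathrm{Var}\left[\frac{1}{\ln(T)}\left(\frac{\Delta E_s}{E_s} - \frac{\Delta E_r}{E_r}\right)\right] \tag{48-97}$$

$$\mathrm{Var}\left(\frac{\Delta A}{A}\right) = \mathrm{Var}\left[\frac{1}{\ln(T)}\frac{\Delta E_s}{E_s} - \frac{1}{\ln(T)}\frac{\Delta E_r}{E_r}\right] \tag{48-98}$$

Then we apply the theorem that Var(A + B) = Var(A) + Var(B), which holds when A and B are uncorrelated, as the noise of the sample and reference readings is taken to be:

$$\mathrm{Var}\left(\frac{\Delta A}{A}\right) = \mathrm{Var}\left[\frac{1}{\ln(T)}\frac{\Delta E_s}{E_s}\right] + \mathrm{Var}\left[\frac{-1}{\ln(T)}\frac{\Delta E_r}{E_r}\right] \tag{48-99}$$

And then we apply the theorem that, if a is a constant, then $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$:

$$\mathrm{Var}\left(\frac{\Delta A}{A}\right) = \frac{1}{\left(E_s\ln(T)\right)^2}\,\mathrm{Var}(\Delta E_s) + \frac{1}{\left(E_r\ln(T)\right)^2}\,\mathrm{Var}(\Delta E_r) \tag{48-100}$$

Again we use the property of the Poisson distribution that the variance of a value is equal to the value, so that $\mathrm{Var}(\Delta E_s) = E_s$ and $\mathrm{Var}(\Delta E_r) = E_r$:

$$\mathrm{Var}\left(\frac{\Delta A}{A}\right) = \frac{E_s}{\left(E_s\ln(T)\right)^2} + \frac{E_r}{\left(E_r\ln(T)\right)^2} \tag{48-101}$$

$$\mathrm{Var}\left(\frac{\Delta A}{A}\right) = \frac{1}{\ln^2(T)}\left(\frac{1}{E_s} + \frac{1}{E_r}\right) \tag{48-102}$$

and finally:

$$\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{\ln(T)}\sqrt{\frac{1}{E_s} + \frac{1}{E_r}} \tag{48-103}$$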
Interestingly, in Voigtman’s development of these equations, his expression corresponding to equation 48-103 is missing the $1/E_r$ term inside the radical, even though he arrived at the correct equation corresponding to equation 47-88, as we noted in Chapter 47 (first published as [3]). There are now two ways to proceed with equation 48-103. One way is to replace $T$ in the denominator with $E_s/E_r$, which makes it easier to compare with equation 42-37, the corresponding equation describing the constant-noise case. Alternatively, we can replace $E_s$ in the denominator of equation 48-103 with $TE_r$, which is more convenient for plotting the expression. Since we wish to explore both phenomena, we will do both transformations of equation 48-103. First we will replace $T$ in the denominator with $E_s/E_r$:
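$$\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{\ln(E_s/E_r)}\sqrt{\frac{E_r}{E_s E_r} + \frac{E_s}{E_s E_r}} \tag{48-104}$$

$$\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{\ln(E_s/E_r)}\sqrt{\frac{E_s + E_r}{E_s E_r}} \tag{48-105}$$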
Equation 48-105 is the closest we can come to the form of equation 42-37, so that we can compare the functions describing the relative precision for the constant-noise case to that of the Poisson-noise case. To put equation 48-103 into a form easier to plot, we now replace $E_s$ in the denominator of equation 48-103 with $TE_r$:
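$$\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{\ln(T)}\sqrt{\frac{1}{TE_r} + \frac{1}{E_r}} \tag{48-106}$$

$$\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{\ln(T)}\sqrt{\frac{1}{E_r}\left(\frac{1}{T} + 1\right)} \tag{48-107}$$

$$\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{\sqrt{E_r}\,\ln(T)}\sqrt{\frac{1}{T} + 1} \tag{48-108}$$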
Qualitatively we can note that equation 48-108 also passes through a minimum, since it will diverge as $T \to 0$ (in the denominator of the radical) and also as $T \to 1$, which causes $\ln(T) \to 0$. Again, we see that the actual value of the relative error is scaled inversely with the square root of the reference reading, as it did for both transmittance and absorbance noise. We verify the behavior of equation 48-108 by plotting $\frac{1}{\ln(T)}\sqrt{\frac{1}{T}+1}$ versus $T$ in Figure 48-19 (actually, we plot $\left|\frac{1}{\ln(T)}\sqrt{\frac{1}{T}+1}\right|$, for reasons that will be discussed below).

Figure 48-19 Relative absorbance precision for Poisson-distributed detector noise. (Ordinate: SD(Δ(A))/A; abscissa: %T.)

Unsurprisingly, the optimum transmittance (roughly T = 0.11 from the data table used to plot Figure 48-19) differs appreciably from what was found for the corresponding situation when the detector noise was constant. The more interesting and important question, however, is how the value we arrived at compares with the “optimum” obtained from the previously derived expression, which neglected the effect of the noise in the reference reading. To continue, therefore, we proceed in the usual manner for finding a minimum: we take the derivative of equation 48-108 and then set the derivative equal to zero. Since equation 48-108 is complicated, and the derivative more so, we will generate the derivative in several steps:
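$$\frac{d}{dT}\,\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{\sqrt{E_r}\,\ln(T)}\,\frac{d}{dT}\sqrt{\frac{1}{T}+1} + \sqrt{\frac{1}{T}+1}\;\frac{d}{dT}\left(\frac{1}{\sqrt{E_r}\,\ln(T)}\right) \tag{48-109}$$

$$\frac{d}{dT}\,\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{\sqrt{E_r}\,\ln(T)}\cdot\frac{1}{2\sqrt{\frac{1}{T}+1}}\,\frac{d}{dT}\left(\frac{1}{T}+1\right) + \sqrt{\frac{1}{T}+1}\cdot\frac{1}{\sqrt{E_r}}\,\frac{d}{dT}\left(\frac{1}{\ln(T)}\right) \tag{48-110}$$

$$\frac{d}{dT}\,\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{2\sqrt{E_r}\,\ln(T)\sqrt{\frac{1}{T}+1}}\,\frac{d}{dT}\left(\frac{1}{T}+1\right) + \sqrt{\frac{1}{T}+1}\cdot\frac{-1}{\sqrt{E_r}\,\ln^2(T)}\,\frac{d\ln(T)}{dT} \tag{48-111}$$

$$\frac{d}{dT}\,\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{2\sqrt{E_r}\,\ln(T)\sqrt{\frac{1}{T}+1}}\cdot\frac{-1}{T^2} + \frac{-\sqrt{\frac{1}{T}+1}}{\sqrt{E_r}\,\ln^2(T)}\cdot\frac{1}{T} \tag{48-112}$$

$$\frac{d}{dT}\,\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{-1}{2T^2\sqrt{E_r}\,\ln(T)\sqrt{\frac{1}{T}+1}} + \frac{-\sqrt{\frac{1}{T}+1}}{T\sqrt{E_r}\,\ln^2(T)} \tag{48-113}$$

It will help our cause to factor out from equation 48-113 what we can

$$\frac{d}{dT}\,\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{T\sqrt{E_r}\,\ln(T)}\left[\frac{-1}{2T\sqrt{\frac{1}{T}+1}} + \frac{-\sqrt{\frac{1}{T}+1}}{\ln(T)}\right] \tag{48-114}$$

and then combine the terms:

$$\frac{d}{dT}\,\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{T\sqrt{E_r}\,\ln(T)}\left[\frac{-\ln(T)}{2T\ln(T)\sqrt{\frac{1}{T}+1}} + \frac{-2T\left(\frac{1}{T}+1\right)}{2T\ln(T)\sqrt{\frac{1}{T}+1}}\right] \tag{48-115}$$

$$\frac{d}{dT}\,\mathrm{SD}\left(\frac{\Delta A}{A}\right) = \frac{1}{T\sqrt{E_r}\,\ln(T)}\left[\frac{-\ln(T) - 2T\left(\frac{1}{T}+1\right)}{2T\ln(T)\sqrt{\frac{1}{T}+1}}\right] \tag{48-116}$$

Now we can set the derivative equal to zero:

$$0 = \frac{1}{T\sqrt{E_r}\,\ln(T)}\left[\frac{-\ln(T) - 2T\left(\frac{1}{T}+1\right)}{2T\ln(T)\sqrt{\frac{1}{T}+1}}\right] \tag{48-117}$$

and simplify the expression:

$$0 = -\ln(T) - 2T\left(\frac{1}{T}+1\right) \tag{48-118}$$

$$0 = \ln(T) + 2T + 2 \tag{48-119}$$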
Equation 48-119 is a much simpler equation than most of the ones we have had to deal with before, including equation 42-50 (which is the corresponding equation for the constant-detector-noise case [2]); nevertheless, it is still a transcendental equation and is best solved by successive approximations. The solution to 5 decimal places is 0.10886, or 10.886%T. The solution given by Ingle and Crouch for this case, which, again, does not take into account the variation of the reference channel, is 13.5%T ([10], p. 153).
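The successive approximation can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names and the bracketing interval are hypothetical, and bisection is just one convenient choice of method, not necessarily the one used to obtain the value quoted above.

```python
import math

def relative_error(T, Er=1.0):
    """Absolute value of equation 48-108 for a transmittance 0 < T < 1."""
    return abs(math.sqrt(1.0 / T + 1.0) / (math.sqrt(Er) * math.log(T)))

def optimum_transmittance(lo=1e-6, hi=0.9, tol=1e-12):
    """Solve ln(T) + 2T + 2 = 0 (equation 48-119) by bisection."""
    def g(T):
        return math.log(T) + 2.0 * T + 2.0
    # g increases monotonically on (0, 1): negative at lo, positive at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

T_opt = optimum_transmittance()
print(f"T_opt = {T_opt:.5f}")
print(f"relative error at T_opt : {relative_error(T_opt):.4f}")
print(f"relative error at 0.135 : {relative_error(0.135):.4f}")
```

Evaluating the relative error at both transmittances shows how shallow the minimum is, while the root itself agrees with the five-decimal value given in the text.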
We therefore see that in this case also, neglecting the reference channel error causes a noticeable change in the answer from the correct one.

To finish up this chapter, we discuss the use of $\left|\frac{1}{\ln(T)}\sqrt{\frac{1}{T}+1}\right|$ as the expression we plotted in Figure 48-19. In passing from equation 48-102 to 48-103, we did the usual and intuitive step of using the positive square root of the expression in equation 48-102, which seems reasonable, since we are working with variances, which must always be positive, and standard deviations, which we also want to have positive values. However, when we come to plot the expression in equation 48-108, we find that since $T$ is always less than unity, $\ln(T)$ is negative, and therefore the entire expression is negative. Thus, plotting this expression directly results in the curve having a maximum rather than a minimum at the point where the derivative is zero. Since this does not conform to reality, where we obtain the best precision rather than the worst, it is clear that this is an artifact of our choice of sign for the square root; the way we obtain a unique answer, and one that is in conformance with the real world, is to use the absolute value of the expression. Again, we continue our derivations in our next chapter.
REFERENCES
1. Cole, R., Journal of the Optical Society of America 41, 38–40 (1951).
2. Mark, H. and Workman, J., Spectroscopy 15(12), 14–17 (2000).
3. Mark, H. and Workman, J., Spectroscopy 16(11), 36–40 (2001).
4. Mark, H. and Workman, J., Spectroscopy 15(10), 24–25 (2000).
5. Mark, H. and Workman, J., Spectroscopy 15(11), 20–23 (2000).
6. Mark, H. and Workman, J., Spectroscopy 16(2), 44–52 (2001).
7. Mark, H. and Workman, J., Spectroscopy 16(4), 34–37 (2001).
8. Mark, H. and Workman, J., Spectroscopy 16(5), 20–24 (2001).
9. Mark, H. and Workman, J., Spectroscopy 16(7), 36–40 (2001).
10. Ingle, J.D. and Crouch, S.R., Spectrochemical Analysis (Prentice-Hall, Upper Saddle River, NJ, 1988).
49 Analysis of Noise: Part 10
This chapter is one more in the set of chapters starting at Chapter 40 and first published as [1–9], dealing with the rigorous derivation of the expressions relating instrument (and other) noise to its effects on the spectra we observe. The impetus for this was the realization that the previously existing theory was deficient in that the derivations extant ignored the effect of noise in the reference reading, which turns out to have appreciable effects on the nature of the derived noise behavior. Chapter 40 in this set, referenced as [1], was an overview; Chapters 41–46 examined the effects of noise when the noise was due to constant detector noise (e.g., IR/NIR spectroscopy), and the last two chapters (47 and 48) began by considering the effects of noise when the noise is due to detectors, such as photomultipliers, that are shot-noise-limited, so that the detector noise is Poisson-distributed and therefore the standard deviation of the noise equals the square root of the signal level. The path we are taking pretty well follows the one we used for the constant-detector-noise case, and those two chapters derived the effects when the noise is small compared to the measured signal. Since we wish to continue following that same path, we now need to consider what happens when the optical signal falls to the point where the noise becomes an appreciable fraction of the measured signal, and the effects of the noise, such as induced nonlinearities, can no longer be neglected. And as we do in each chapter in this section of the book, we once more take this opportunity to remind our readers that we are dealing with a continuous series of chapters, and so we again continue our discussion by continuing our equation numbering, figure numbering, use of symbols, and so on as though there were no break, except that when we reuse an equation or series of equations that were derived and presented previously, we retain the original number(s) for those equation(s).
So let us continue. In Chapter 43 [4], which the interested reader may wish to go back and refresh themselves about, we discussed the general descriptions of how and why the equations came about, we noted that the point of departure for investigating what happens when the noise level becomes large enough that it can no longer be ignored was equation 49-5:
$$T + \Delta T = \frac{E_s + \Delta E_s}{E_r + \Delta E_r} \tag{49-5}$$
and we noted that in that case, that of Normally distributed noise, the expected computed value of T was
$$T = \frac{E_s}{E_r + \Delta E_r} \tag{49-52a}$$

the reason being, as we pointed out, that the other term that arose, $\Delta E_s/(E_r + \Delta E_r)$, would vanish from the expression for the expected value of $T$ because of symmetry. In the current case, however, we cannot rely on that argument. The Poisson distribution is not symmetric around any particular value, as we will observe shortly when we present a graph of the members of the family of Poisson distributions, despite the fact that this distribution approaches the Normal distribution in the limit as the parameter $\lambda \to \infty$. However, in addition to the fact that the distribution never becomes exactly Normal, our interest in this chapter is specifically to examine the effects occurring at small values of $\lambda$. Hence, in this case we must work with equation 49-5, rather than the simpler equation 49-52a:

$$T + \Delta T = \frac{E_s + \Delta E_s}{E_r + \Delta E_r} \tag{49-5}$$
We next noted that the expected value of $T$ is computed from the general equation for an expected value:

$$\bar{T}_W = \frac{\sum_i W(X_i)\,F(X_i)}{\sum_i W(X_i)} \tag{49-59}$$
$F(X)$, here, is $(E_s + \Delta E_s)/(E_r + \Delta E_r)$, as we just noted. In the previous case, the weighting function was the Normal distribution. Our current interest is to find out what happens when the noise is Poisson-distributed, rather than Normally distributed, since that is the distribution that applies to data whose noise is shot-noise-limited; the Poisson distribution is therefore the one we need to use for the weighting factor. Using $P$ to represent the Poisson distribution, equation 49-59 now becomes

$$\bar{X}_{WP} = \frac{\sum_i P(X_i)\,F(X_i)}{\sum_i P(X_i)} \tag{49-120}$$
and since probability distributions have integrals that always equal unity (reflecting the reality that the argument must have SOME value every time it is evaluated, so that it is certain that some value will be obtained over the entire range of summation; certainty of obtaining the value of a means that $P(a) = 1$), the denominator of equation 49-120 equals unity and drops out. Equation 49-120 therefore reduces to

$$\bar{X}_{WP} = \sum_i P(X_i)\,F(X_i) \tag{49-121}$$
The Poisson distribution is actually a limiting case of the binomial distribution, a fact that is only of mild peripheral interest here, as we will not be using that fact. The formula for the Poisson distribution is

$$P(X) = \frac{e^{-\lambda}\,\lambda^{X}}{X!} \tag{49-122}$$
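As a quick numerical illustration of equation 49-122, the following minimal Python sketch tabulates the distribution for two values of the parameter. The function name and the truncation of the sum at X = 59 are our own assumptions, chosen only so that the neglected tail is negligible for the values of λ shown.

```python
import math

def poisson_pmf(x, lam):
    """Equation 49-122: probability of observing the integer count x."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

for lam in (1, 11):
    probs = [poisson_pmf(x, lam) for x in range(60)]
    total = sum(probs)
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    # the total is unity, and the mean and the variance both equal lambda
    print(f"lambda={lam:2d} total={total:.6f} mean={mean:.4f} var={var:.4f}")

# asymmetry at small lambda: P(0) equals P(1), but P(2) is only half of P(1)
print(poisson_pmf(0, 1), poisson_pmf(1, 1), poisson_pmf(2, 1))
```

The printed totals confirm that the weights sum to unity, and the last line makes the lack of symmetry at small λ concrete.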
In our terminology, the parameter $\lambda$ corresponds to $E_r$ or $E_s$, the (fixed) value of the energy to be measured, and $X$ corresponds to $\Delta E_r$ or $\Delta E_s$, as appropriate. Therefore equation 49-122 becomes

$$P(X) = \frac{e^{-E_r}\,E_r^{\Delta E_r}}{\Delta E_r!} \tag{49-123}$$
Figure 49-20 presents the Poisson distribution; Figure 49-20a shows the distribution for integer values of λ up to λ = 11, and Figure 49-20b shows this distribution for 1 ≤ λ ≤ 11

Figure 49-20 (a) Poisson distribution. (Ordinate: P(X); abscissa: X; curves labeled from λ = 1 to λ = 11.)
Figure 71-2 Relationship of Laboratory CV (as powers of 2) with analyte concentration (as powers of $10^{-\mathrm{exp}}$). (For example, 6 on the abscissa represents a concentration of $10^{-6}$, or 1 ppm, with a CV (%) of $2^4$.) (Abscissa: analyte concentration exponent.)
Table 71-2 Relationship of Laboratory CV (%) (as powers of 2) with analyte concentration (as powers of 10)

CV (%)    Analyte conc.    Absolute conc.    Conc. in ppm
2^0       10^0             Near 100%         10^6
2^1       10^-1            10%               10^5
2^2       10^-2            1.0%              10^4
2^3       10^-3            0.1%              10^3
2^4       10^-6            1 ppm             1
2^5       10^-9            1 ppb             10^-3
2^6       10^-12           1 ppt             10^-6
Limitations in Analytical Accuracy: Part 1
Table 71-3 Relationship of Laboratory CV (as powers of 2) with analyte concentration (as powers of 10)

CV exponent (power of 2)    Conc. exponent (power of 10)    Absolute conc.    Conc. in ppm
0                           0                               Near 100%         10^6
1                           −1                              10%               10^5
2                           −2                              1.0%              10^4
3                           −3                              0.1%              10^3
4                           −6                              1 ppm             1
5                           −9                              1 ppb             10^-3
6                           −12                             1 ppt             10^-6
of magnitude that concentration decreases; for low (micro) concentrations, CV doubles for every 3 orders of magnitude decrease in concentration. Note that this represents the between-laboratory variation. The within-laboratory variation should be 50–66% of the between-laboratory variation. Reflecting on Figures 71-1 and 71-2, some have called this Horwitz’s trumpet. How interesting that he plays such a tune for analytical scientists. Since CV (%) is another term for % relative standard deviation (%RSD), the relationship can also be expressed as equation 71-6 (reference [6]):

$$\%\mathrm{RSD} = 2^{(1 - 0.5\log C)} \tag{71-6}$$
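Equation 71-6 is easy to evaluate numerically. The following is a minimal sketch, assuming that the logarithm is taken to base 10 and that C is expressed as a mass fraction between 0 and 1; the function name is hypothetical.

```python
import math

def horwitz_rsd_percent(c):
    """Equation 71-6: %RSD = 2**(1 - 0.5*log10(C)), with C a mass fraction."""
    if not 0.0 < c <= 1.0:
        raise ValueError("C must be a mass fraction in the interval (0, 1]")
    return 2.0 ** (1.0 - 0.5 * math.log10(c))

for c in (1.0, 1e-2, 1e-6, 1e-9):
    print(f"C = {c:g}: %RSD = {horwitz_rsd_percent(c):.1f}")
```

Each decrease of two orders of magnitude in C doubles the computed %RSD, which is the behavior that gives the curve its trumpet shape.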
There are many tests for uncertainty in analytical results and we will continue to present and discuss these within this series.
REFERENCES
1. Horwitz, W., Analytical Chemistry 54(1), 67A–76A (1982).
2. Horwitz, W., Laverne, R.K. and Boyer, W.K., Journal – Association of Official Analytical Chemists 63(6), 1344 (1980).
3. ASTM E177 – 86. Form and Style for ASTM Standards, ASTM International, West Conshohocken, PA. ASTM E177 – 86 “Standard Practice for Use of the Terms Precision and Bias in ASTM Test Methods.”
4. Helland, S., Scandinavian Journal of Statistics 17, 97 (1990).
5. Mark, H. and Workman, J., Statistics in Spectroscopy, 2nd ed. (Elsevier, Amsterdam, 2003), pp. 205–211, 213–222.
6. Personal communication with G. Clark Dehne, Capital University, Columbus, OH 43209-2394 (2004), ASTM International E-13 Meeting.
72 Limitations in Analytical Accuracy: Part 2 – Theories to Describe the Limits in Analytical Accuracy
Recall from our previous chapter [1] how Horwitz throws down the gauntlet to analytical scientists stating that a general equation can be formulated for the representation of analytical precision based on analyte concentration (reference [2]). He states this as equation 72-1:

$$CV(\%) = 2^{(1 - 0.5\log C)} \tag{72-1}$$
where $C$ is the mass fraction as concentration expressed in powers of 10 (e.g., 0.1% analyte is equal to $C = 10^{-3}$). A paper published by Hall and Selinger [3] points out that this is an empirical formula relating the concentration ($c$) to the coefficient of variation (CV), which is also known as the precision ($\sigma$). They derive the origin of the “trumpet curve” using a binomial distribution explanation. Their final derived relationship becomes equation 72-2:

$$CV = \frac{c^{-0.15}}{50} \tag{72-2}$$
They further simplify the Horwitz trumpet relationship in two forms as:

$$CV(\%) = 0.006\,c^{-0.5} \tag{72-3a}$$

and

$$\sigma = 0.006\,c^{0.5} \tag{72-3b}$$

They then derive their own binomial model relationships using Horwitz’s data with variable apparent sample size.

$$CV(\%) = 0.02\,c^{-0.15} \tag{72-4a}$$

and

$$\sigma = 0.02\,c^{0.85} \tag{72-4b}$$
Both sets of relationships depict relative error as an inverse power of analyte concentration, so that the relative error grows as the concentration falls. In yet a more detailed incursion into this subject, Rocke and Lorenzato [4] describe two disparate conditions in analytical error: (1) concentrations near zero; and (2) macrolevel concentrations, say greater than 0.5% for argument’s sake. They propose that analytical error is comprised of two types, additive and multiplicative. So their derived model for this condition is (72-5):

$$x = \mu e^{\eta} + \varepsilon \tag{72-5}$$
where $x$ is the measured concentration, $\mu$ is the true analyte concentration, and $\eta$ is a Normally distributed analytical error with mean 0 and standard deviation $\sigma_\eta$. It should be noted that $\eta$ represents the multiplicative or proportional error with concentration and $\varepsilon$ represents the additive error demonstrated at small concentrations; we write $\sigma_\varepsilon$ for its standard deviation. Using this approach, the critical level at which the CV is a specific value can be found by solving for $x$ using the relationship shown in equation 72-6:

$$(CV \cdot x)^2 = (\sigma_\eta x)^2 + (\sigma_\varepsilon)^2 \tag{72-6}$$

where $x$ is the measured analyte concentration as the practical quantitation level (PQL, used by the U.S. Environmental Protection Agency (EPA)). With $x$ expressed in units of $\sigma_\varepsilon$ (that is, taking $\sigma_\varepsilon = 1$), this relationship is simplified to equation 72-7; for any other value of $\sigma_\varepsilon$ the right-hand side is multiplied by $\sigma_\varepsilon$.

$$x = \sqrt{\frac{1}{CV^2 - \sigma_\eta^2}} \tag{72-7}$$

where CV is the critical level at which the coefficient of variation is a preselected value to be achieved using a specific analytical method, and $\sigma_\eta$ is the standard deviation of the multiplicative or measurement error of the method. For example, if the desired CV is 0.3 and $\sigma_\eta$ is 0.1, then the PQL or $x$ is computed as 3.54. This is the lowest analyte concentration that can be determined given the parameters used. The authors describe the model above as a linear exponential calibration curve as equation 72-8.

$$y = \alpha + \beta\mu e^{\eta} + \varepsilon \tag{72-8}$$

where $y$ is the observed measurement data. This model approximates a consistent or constant standard deviation model at low concentrations and approximates a constant CV model for high concentrations, where the multiplicative error varies as $\mu e^{\eta}$.
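The worked PQL value can be checked with a short Python sketch. The function and argument names are hypothetical, and the optional sigma_eps argument reflects our assumption that equation 72-7 is written for an additive standard deviation of unity; the general form follows directly from equation 72-6.

```python
import math

def practical_quantitation_level(cv, sigma_eta, sigma_eps=1.0):
    """Solve (CV*x)**2 = (sigma_eta*x)**2 + sigma_eps**2 for x (equation 72-6)."""
    if cv <= sigma_eta:
        raise ValueError("the target CV must exceed the multiplicative error")
    return sigma_eps / math.sqrt(cv ** 2 - sigma_eta ** 2)

pql = practical_quantitation_level(cv=0.3, sigma_eta=0.1)
print(f"PQL = {pql:.2f}")
```

The guard clause makes explicit a point that the formula implies: no finite concentration can deliver a CV at or below the multiplicative error of the method.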
DETECTION LIMIT FOR CONCENTRATIONS NEAR ZERO
Finally detection limit (D) is estimated using equation 72-9.

$$D = \frac{3\sigma_\varepsilon}{\sqrt{r}} \tag{72-9}$$

where $\sigma_\varepsilon$ is the standard deviation of the measurement error measured at low (near zero) concentration, and $r$ is the number of replicate measurements made.
REFERENCES
1. Workman, J. and Mark, H., “Chemometrics in Spectroscopy: Limitations in Analytical Accuracy – Part 1 Horwitz’s Trumpet,” Spectroscopy 21(9), 18–24 (2006).
2. Horwitz, W., Analytical Chemistry 54(1), 67A–76A (1982).
3. Hall, P. and Selinger, B., Analytical Chemistry 61, 1465–1466 (1989).
4. Rocke, D. and Lorenzato, S., Technometrics 37(2), 176–184 (1995).
73
Limitations in Analytical Accuracy: Part 3 – Comparing Test Results for Analytical Uncertainty
UNCERTAINTY IN AN ANALYTICAL MEASUREMENT
By making replicate analytical measurements one may estimate the certainty of the analyte concentration using a computation of the confidence limits. As an example, consider five replicate measurement results: 5.30%, 5.44%, 5.78%, 5.00%, and 5.30%. The precision (or standard deviation) is computed using equation 73-1,

$$s = \sqrt{\frac{\sum_{i=1}^{r}\left(x_i - \bar{x}\right)^2}{r - 1}} \tag{73-1}$$

where $s$ represents the precision, $\sum$ means summation of all the $(x_i - \bar{x})^2$ values, $x_i$ is an individual replicate analytical result, $\bar{x}$ is the mean of the replicate results, and $r$ is the total number of replicates included in the group (this is often represented as $n$). For the above set of replicates $s = 0.282$. The degrees of freedom are indicated by $r - 1 = 4$. If we want to calculate the 95% confidence limits, we note that the t-value is 2.776. So the uncertainty ($U$) of our measurement result is calculated as 73-2:

$$U = \bar{x} \pm t \cdot \frac{s}{\sqrt{r}} \tag{73-2}$$

So the example case results in an uncertainty interval from 5.014 to 5.714, a range of 0.7. Therefore if we have a relatively unbiased analytical method, there is a 95% probability that our true analyte value lies between these upper and lower concentration limits.
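The arithmetic of equations 73-1 and 73-2 can be reproduced with Python’s standard library. This is a sketch only: the function name is hypothetical, and the t-value of 2.776 is simply taken from the text rather than computed.

```python
import math
import statistics

replicates = [5.30, 5.44, 5.78, 5.00, 5.30]
T_VALUE_95_DF4 = 2.776  # two-sided 95% t-value for 4 degrees of freedom (from the text)

def confidence_limits(values, t_value):
    """Return mean, s (equation 73-1) and the limits of equation 73-2."""
    r = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)           # uses the r - 1 denominator
    half_width = t_value * s / math.sqrt(r)
    return mean, s, mean - half_width, mean + half_width

mean, s, lower, upper = confidence_limits(replicates, T_VALUE_95_DF4)
print(f"mean = {mean:.3f}, s = {s:.3f}, limits = {lower:.3f} to {upper:.3f}")
```

Because the sketch carries s at full precision rather than rounding it to 0.282 first, its limits can differ from the quoted ones in the third decimal place.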
COMPARISON TEST FOR A SINGLE SET OF MEASUREMENTS VERSUS A TRUE ANALYTICAL RESULT
Now let us start this discussion by assuming we have a known analytical value by artificially creating a standard sample using impossibly precise weighing and mixing methods so that the true analytical value is 5.2% analyte. So we make one measurement and obtain a value of 5.7%. So then we refer to errors using statistical terms as follows:

Measured value: 5.7%
“True” value: $\mu$ = 5.2%
Absolute error: Measured Value − True Value = 0.5%
Relative % error: 0.5/5.2 × 100 = 9.6%

Then we recalibrate our instrumentation and obtain the results: 5.10, 5.20, 5.30, 5.10, and 5.00. Thus our mean value ($\bar{x}$) is 5.14. Our precision as the standard deviation ($s$) of these five replicate measurements is calculated as 0.114 with $n - 1 = 4$ degrees of freedom. The t-value from the t table, $\alpha$ = 0.95, degrees of freedom as 4, is 2.776. To determine if a specific test result is significantly different from the true or mean value, we use equation 73-3 as the test statistic $T_e$:

$$T_e = \left|\frac{\bar{x} - \mu}{s}\cdot\sqrt{n}\right| \tag{73-3}$$

For this example $T_e = 1.177$. We note there is no significant difference in the measured value versus the expected or true value if $T_e \le$ t-value. And there is a significant difference between the set of measured values and the true value if $T_e >$ t-value. We must then conclude here that there is no difference between the measured set of values and the true value, as $1.177 \le 2.776$.
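A minimal Python sketch of the test statistic in equation 73-3, using the five recalibrated results from the text, follows. The function name is hypothetical and the critical t-value is again copied from the text.

```python
import math
import statistics

def t_statistic(values, true_value):
    """Equation 73-3: |(mean - mu) / s * sqrt(n)|."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return abs((mean - true_value) / s * math.sqrt(n))

results = [5.10, 5.20, 5.30, 5.10, 5.00]
T_CRITICAL = 2.776  # 4 degrees of freedom, taken from the text

te = t_statistic(results, true_value=5.2)
print(f"Te = {te:.3f}; significant difference: {te > T_CRITICAL}")
```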
COMPARISON TEST FOR TWO SETS OF MEASUREMENTS
If we take two sets of five measurements using two calibrated instruments and the mean results are $\bar{x}_1 = 5.14$ and $\bar{x}_2 = 5.16$, we would like to know if the two sets of results are statistically identical. So we calculate the standard deviation for both sets and find $s_1 = 0.114$ and $s_2 = 0.193$. The pooled standard deviation $s_{12} = 0.079$. The degrees of freedom in this case is $n_1 - 1$, which equals $5 - 1 = 4$. The t-value at $\alpha$ = 0.95, d.f. = 4, is 2.776. To determine if one set of measurements is significantly different from the other set of measurements we use equation 73-4 as the test statistic $T_{e12}$:

$$T_{e12} = \left|\frac{\bar{x}_1 - \bar{x}_2}{s_{12}\cdot\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}\right| \tag{73-4}$$

For this example, $T_{e12} = 0.398$. If there is no significant difference in the sets of measured values we would expect $T_e \le$ t-value, and if there is a significant difference between the sets of measured values we expect $T_e >$ t-value. Since $0.398 \le 2.776$, we must conclude here that there is no difference between the sets of measured values.
CALCULATING THE NUMBER OF MEASUREMENTS REQUIRED TO ESTABLISH A MEAN VALUE (OR ANALYTICAL RESULT) WITH A PRESCRIBED UNCERTAINTY (ACCURACY)
If error is random and follows probabilistic (normally distributed) variance phenomena, we must be able to make additional measurements to reduce the measurement noise or variability. This is certainly true in the real world to some extent. Most of us having some basic statistical training will recall the concept of calculating the number of measurements required to establish a mean value (or analytical result) with a prescribed accuracy. For this calculation one would designate the allowable error ($e$), and a probability (or risk) that a measured value ($m$) would be different by an amount ($d$). We begin this estimate by computing the standard deviation of measurements; this is determined by first calculating the mean, then taking the difference of each control result from the mean, squaring that difference, summing the squares, dividing by $n - 1$, then taking the square root. All these operations are included in equation 73-5.

$$s = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n - 1}} \tag{73-5}$$

where $s$ represents the standard deviation, $\sum$ means summation of all the $(x_i - \bar{x})^2$ values, $x_i$ is an individual control result, $\bar{x}$ is the mean of the control results, and $n$ is the total number of control results included in the group. If we were to follow a cookbook approach for computing the various parameters we proceed as follows:
(1) Compute an estimate of ($s$) for the method (see above)
(2) Choose the allowable margin of error ($d$)
(3) Choose the probability level as alpha ($\alpha$), as the risk that our measurement value ($m$) will be off by more than ($d$)
(4) Determine the appropriate t value for $t_{1-\alpha/2}$ for $n - 1$ degrees of freedom
(5) Finally, apply the formula for $n$ (the number of discrete measurements required) for a given uncertainty as equation 73-6.

$$n = \frac{t^2 \cdot s^2}{d^2} + 1 \tag{73-6}$$

Problem Example: We want to learn the average value for the quantity of toluene in a test sample for a set of hydrocarbon mixtures. $s = 1$, $\alpha = 0.95$, $d = 0.1$. For this problem $t_{1-\alpha/2} = 1.96$ (from t table) and thus $n$ is computed as equation 73-7:

$$n = \frac{1.96^2 \cdot 1^2}{0.1^2} + 1 = 385 \tag{73-7}$$

So if we take 385 measurements we conclude with 95% confidence that the true analyte value (mean value) will lie within ±0.1 of the average of the 385 results ($\bar{X}$).
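The cookbook steps above reduce to a one-line calculation, sketched here in Python. The function name is hypothetical, and the sketch returns the unrounded value so that the rounding choice stays visible; the text rounds it to 385.

```python
def required_measurements(t_value, s, d):
    """Equation 73-6: n = t**2 * s**2 / d**2 + 1 (returned unrounded)."""
    if d <= 0:
        raise ValueError("the allowable margin of error d must be positive")
    return (t_value ** 2 * s ** 2) / d ** 2 + 1

n = required_measurements(t_value=1.96, s=1.0, d=0.1)
print(f"n = {n:.2f}, reported as {round(n)}")
```

Since n scales with the inverse square of d, halving the allowable margin of error roughly quadruples the number of measurements required.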
THE Q-TEST FOR OUTLIERS [1–3]
We make five replicate measurements using an analytical method to calculate basic statistics regarding the method. Then we want to determine if a seemingly aberrant single result is indeed a statistical outlier. The five replicate measurements are 5.30%, 5.44%, 5.78%, 5.00%, and 5.30%. The result we are concerned with is 6.0%. Is this result an outlier? To find out we first calculate the absolute values of the individual deviations:

Compute deviation    Absolute deviation
5.30 − 6.00          0.70
5.44 − 6.00          0.56
5.78 − 6.00          0.22
5.00 − 6.00          1.00
5.30 − 6.00          0.70

Thus the minimum deviation ($D_{Min}$) is 0.22; the maximum deviation is 1.00 and the deviation range ($R$) is 1.00 − 0.22 = 0.78. We then calculate the Q-Test Value as $Q_n$ using equation 73-8:

$$Q_n = \frac{D_{Min}}{R} \tag{73-8}$$

This results in a $Q_n$ of 0.22/0.78 = 0.28 for $n = 5$. Using the Q-Value Table (90% Confidence Level) as Table 73-1, we note that if $Q_n \le$ Q-Value, then the measurement is NOT an outlier. Conversely, if $Q_n >$ Q-Value, then the measurement IS an outlier. So since $0.28 \le 0.642$ this test value is not considered an outlier.
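The procedure just described can be sketched in Python as follows. The sketch follows the arithmetic exactly as given in the text (minimum absolute deviation divided by the range of the absolute deviations); the function names are hypothetical, and the critical values are the 90% entries of Table 73-1.

```python
# 90% confidence Q-values from Table 73-1, keyed by n
Q90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507,
       8: 0.468, 9: 0.437, 10: 0.412}

def q_statistic(replicates, suspect):
    """Equation 73-8: Qn = DMin / R, computed from the absolute deviations."""
    deviations = [abs(value - suspect) for value in replicates]
    d_min = min(deviations)
    deviation_range = max(deviations) - d_min
    return d_min / deviation_range

def is_outlier(replicates, suspect, table=Q90):
    """Compare Qn with the tabulated Q-value for n = number of replicates."""
    return q_statistic(replicates, suspect) > table[len(replicates)]

replicates = [5.30, 5.44, 5.78, 5.00, 5.30]
qn = q_statistic(replicates, suspect=6.00)
print(f"Qn = {qn:.2f}, outlier: {is_outlier(replicates, 6.00)}")
```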
Table 73-1 Q-Value table (at different confidence levels)

n:       3       4       5       6       7       8       9       10
Q90%:    0.941   0.765   0.642   0.560   0.507   0.468   0.437   0.412
Q95%:    0.970   0.829   0.710   0.625   0.568   0.526   0.493   0.466
Q99%:    0.994   0.926   0.821   0.740   0.680   0.634   0.598   0.568

SUMMATION OF VARIANCE FROM SEVERAL DATA SETS
We sum the variance from several separate sets of data by computing the variance of each set of measurements; this is determined by first calculating the mean for each set, then taking the difference of each result from the mean, squaring that difference, and dividing by $r - 1$, where $r$ is the number of replicates in each individual data set. All these operations are included in equation 73-9:

$$s^2 = \frac{\sum_{i=1}^{r}\left(x_i - \bar{x}\right)^2}{r - 1} \tag{73-9}$$

where $s^2$ represents the variance for each set, $\sum$ means summation of all the $(x_i - \bar{x})^2$ values, $x_i$ is an individual result, $\bar{x}$ is the mean of each set of results, and $r$ is the total number of results included in each set. The pooled variance is given as equation 73-10:

$$s_p^2 = \frac{s_1^2 + s_2^2 + \cdots + s_k^2}{k} \tag{73-10}$$

where $s_k^2$ represents the variance for each data set, and $k$ is the total number of data sets included in the pooled group. The pooled standard deviation $\sigma_p$ is given as 73-11:

$$\sigma_p = \sqrt{s_p^2} \tag{73-11}$$
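Equations 73-9 through 73-11 can be illustrated with a short Python sketch. As an assumption made purely for illustration, it pools the two five-replicate data sets that appear earlier in this chapter; the function names are hypothetical.

```python
import math
import statistics

def pooled_variance(data_sets):
    """Equation 73-10: the average of the per-set variances of equation 73-9."""
    variances = [statistics.variance(values) for values in data_sets]
    return sum(variances) / len(variances)

def pooled_standard_deviation(data_sets):
    """Equation 73-11: the square root of the pooled variance."""
    return math.sqrt(pooled_variance(data_sets))

set_1 = [5.30, 5.44, 5.78, 5.00, 5.30]
set_2 = [5.10, 5.20, 5.30, 5.10, 5.00]

print(f"pooled variance = {pooled_variance([set_1, set_2]):.5f}")
print(f"pooled SD       = {pooled_standard_deviation([set_1, set_2]):.4f}")
```

The simple average in equation 73-10 weights every data set equally, which is appropriate here because both sets contain the same number of replicates.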
REFERENCES
1. Miller, J.C. and Miller, J.N., Statistics for Analytical Chemistry, 2nd ed. (Ellis Horwood Limited Publishers, Chichester, 1992), pp. 63–64.
2. Dixon, W.J. and Massey, F.J., Jr, Introduction to Statistical Analysis, 4th ed. (ed. W.J. Dixon) (McGraw-Hill, New York, 1983), pp. 377, 548.
3. Rohrabacher, D.B., “Dixon’s Q-Tables for Multiple Probability Levels,” Analytical Chemistry 63, 139 (1991).
74
The Statistics of Spectral Searches
There are a variety of mathematical techniques used for determining the matching index (or agreement) between an unknown test spectrum (or signal pattern) and a set of known or reference spectra (or multiple signal patterns) [1–12]. The set of known spectra is often referred to as a reference spectral library. In general, a high match score or similarity value is indicative of greater “alikeness” between an unknown test spectrum and single or multiple known reference spectra contained within a reference library. A basic list of the techniques used to compare an unknown test spectrum to a set of known library spectra is found in Table 74-1. Some of the mathematical approaches used will be described in greater detail in this chapter.
COMMON SPECTRAL MATCHING APPROACHES
The ASTM (American Society for Testing and Materials) has published a “Standard Practice for General Techniques for Qualitative Analysis” (Method E 1252-88). The method describes techniques useful for qualitative evaluation of liquids, solids, and gases using the spectral measurement region of 4000 to 50 cm$^{-1}$ (above 2500 nm) [1, 2].
MAHALANOBIS DISTANCE MEASUREMENTS
The Mahalanobis Distance statistic (or more correctly the square of the Mahalanobis Distance), $D^2$, is a scalar measure of where the spectral vector $a$ lies within the multivariate parameter space used in a calibration model [3, 4]. The Mahalanobis distance is used for spectral matching, for detecting outliers during calibration or prediction, or for detecting extrapolation of the model during analyses. Various commercial software packages may use $D$ instead of $D^2$, or may use other related statistics as an indication of high leverage outliers, or may call the Mahalanobis Distance by another name. $D^2$ is preferred here since it is more easily related to the number of samples and variables. Model developers should attempt to verify exactly what is being calculated. Both mean-centered and not mean-centered definitions for Mahalanobis Distance exist, with the mean-centered approach being preferred. Regardless of whether mean-centering of data is performed, the statistic designated by $D^2$ has valid utility for qualitative calculations. If $a$ is a spectral vector (dimension $f$ by 1) and $A$ is the matrix of calibration spectra (of dimension $f$ by $n$, with one spectrum per column, so that $AA^t$ conforms with $a$), then the Mahalanobis Distance is defined as:

$$D^2 = a^t\left(AA^t\right)^{+}a \tag{74-5a}$$

where the superscript + denotes the generalized (pseudo) inverse.
Table 74-1 A listing of classic spectral search algorithms and terminology

1. Visual overlap of test spectrum (t) and reference spectrum (r) to compare spectral shapes for similarity
2. Search and identify (compare individual peaks or sets of spectral peaks)
3. Compare physical data or chemical measurements between samples
4. Use mathematical methods, such as Hit quality index (HQI) value (or ‘similarity’ value vs. library reference sample). An example list of such HQI methods includes
   a. Euclidean distance (d) algorithm:
      $$d = \left[\sum_{i=1}^{n}\left(t_i - r_i\right)^{2}\right]^{1/2}$$ (74-1)
   b. First derivative Euclidean distance algorithm (1Dd), where ${}^{1}t_i$ and ${}^{1}r_i$ are the first-derivative spectra:
      $$\text{1Dd} = \left[\sum_{i=1}^{n}\left({}^{1}t_i - {}^{1}r_i\right)^{2}\right]^{1/2}$$ (74-2)
   c. Sum of differences (sd):
      $$sd = \sum_{i=1}^{n}\left(t_i - r_i\right)$$ (74-3)
   d. Correlation [row matrix (row vector) dot product]:
      $$r = \mathbf{T}\cdot\mathbf{R} = \sum_{i=1}^{n} T_i R_i$$ (74-4)
5. Other approaches: Hamming networks, pattern recognition, wavelets, and neural network learning systems are sometimes discussed but have not been commercially implemented.
For a mean-centered calibration, $\mathbf{a}$ and $\mathbf{A}$ in equation 74-5a are replaced by $\mathbf{a} - \bar{\mathbf{a}}$ and $\mathbf{A} - \bar{\mathbf{A}}$, respectively. If a weighted regression is used, the expression for the Mahalanobis Distance becomes equation 74-5b:

$$D^2 = \mathbf{a}^{t}\left(\mathbf{A}\mathbf{R}\mathbf{A}^{t}\right)^{+}\mathbf{a}$$ (74-5b)

In MLR, if $\mathbf{m}$ is the vector (dimension $k$ by 1) of the selected absorbance values obtained from a spectral vector $\mathbf{a}$, and $\mathbf{M}$ is the matrix of selected absorbance values for the calibration samples, then the Mahalanobis Distance is defined as equation 74-6a:

$$D^2 = \mathbf{m}^{t}\left(\mathbf{M}\mathbf{M}^{t}\right)^{-1}\mathbf{m}$$ (74-6a)

If a weighted regression is used, the expression for the Mahalanobis Distance becomes equation 74-6b:

$$D^2 = \mathbf{m}^{t}\left(\mathbf{M}\mathbf{R}\mathbf{M}^{t}\right)^{-1}\mathbf{m}$$ (74-6b)

In PCR and PLS, the Mahalanobis Distance for a sample with spectrum $\mathbf{a}$ is obtained by substituting the decomposition for PCR, or for PLS, into equation 74-5a. The statistic is expressed as equation 74-7a:

$$D^2 = \mathbf{s}^{t}\mathbf{s}$$ (74-7a)

If a weighted PCR or PLS regression is used, the expression for the Mahalanobis Distance becomes equation 74-7b:

$$D^2 = \mathbf{s}^{t}\left(\mathbf{S}^{t}\mathbf{R}\mathbf{S}\right)^{-1}\mathbf{s}$$ (74-7b)
The Mahalanobis Distance statistic provides a useful indication of the first type of extrapolation. For the calibration set, one sample will have a maximum Mahalanobis Distance, $D^2_{max}$. This is the most extreme sample in the calibration set, in that it is the farthest from the center of the space defined by the spectral variables. If the Mahalanobis Distance for an unknown sample is greater than $D^2_{max}$, then the estimate for the sample clearly represents an extrapolation of the model. Provided that outliers have been eliminated during the calibration, the distribution of Mahalanobis Distances should be representative of the calibration model, and $D^2_{max}$ can be used as an indication of extrapolation.
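As an illustration only, the following Python sketch evaluates the MLR form of the statistic (equation 74-6a) for the simple case of two selected absorbances, using the mean-centered definition, and then compares an unknown sample against the largest calibration value. The function names, the two-variable restriction, and the absorbance values are assumptions made for this example; they are not taken from any particular software package.

```python
def mahalanobis_d2(sample, calibration):
    """D^2 = m^t (M M^t)^-1 m for k = 2 selected absorbances (equation 74-6a).

    sample      : [x1, x2], absorbances at the two selected wavelengths
    calibration : list of [x1, x2] rows, one row per calibration sample
    Both are mean-centered on the calibration set before use.
    """
    n = len(calibration)
    mean = [sum(row[j] for row in calibration) / n for j in range(2)]
    centered = [[row[j] - mean[j] for j in range(2)] for row in calibration]
    v = [sample[j] - mean[j] for j in range(2)]
    # Elements of the 2 x 2 matrix M M^t (sums of squares and cross-products)
    s11 = sum(r[0] * r[0] for r in centered)
    s12 = sum(r[0] * r[1] for r in centered)
    s22 = sum(r[1] * r[1] for r in centered)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("M M^t is singular; the two variables are collinear")
    # v^t (M M^t)^-1 v, written out with the closed-form 2 x 2 inverse
    return (s22 * v[0] ** 2 - 2 * s12 * v[0] * v[1] + s11 * v[1] ** 2) / det


def check_extrapolation(sample, calibration):
    """Return (D^2 of sample, D^2 max of calibration set, extrapolation flag)."""
    d2_max = max(mahalanobis_d2(row, calibration) for row in calibration)
    d2 = mahalanobis_d2(sample, calibration)
    return d2, d2_max, d2 > d2_max


# Hypothetical absorbances at two wavelengths for five calibration samples
calibration = [[0.10, 0.20], [0.20, 0.25], [0.30, 0.38],
               [0.40, 0.41], [0.50, 0.55]]
d2, d2_max, flagged = check_extrapolation([0.90, 0.10], calibration)
print("D2 =", d2, "D2max =", d2_max, "extrapolation:", flagged)
```

With this definition the $D^2$ values of the calibration samples sum to the number of variables (here 2), which is one way to see why $D^2$ relates readily to the number of samples and variables.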
EUCLIDEAN DISTANCE

Some future algorithm or approach may yet be developed from these concepts, but for now consider the Euclidean Distance approach (equation 74-8):

$$d = \left[(X_{21}-X_{11})^{2} + (X_{22}-X_{12})^{2} + (X_{23}-X_{13})^{2} + \cdots + (X_{2i}-X_{1i})^{2}\right]^{0.5}$$ (74-8)

where $X_{ki}$ are data points from each of two spectra, $k$ is the spectrum or sample number, and $i$ is the data point number in the spectrum. The squared difference is calculated at each data point (from 1 to $i$), comparing the test spectrum ($k = 2$) with each reference spectrum in turn ($k = 1$ in equation 74-8). The distance from a reference spectrum to the test spectrum is the Euclidean distance $d$.
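A minimal Python sketch of equation 74-8 follows, together with a simple library search that ranks reference spectra by increasing distance from the test spectrum. The function names and the three-point spectra are hypothetical and serve only to show the arithmetic.

```python
import math


def euclidean_distance(test, reference):
    """Euclidean distance between two spectra of equal length (equation 74-8)."""
    if len(test) != len(reference):
        raise ValueError("spectra must have the same number of data points")
    return math.sqrt(sum((t - r) ** 2 for t, r in zip(test, reference)))


def rank_library(test, library):
    """Return (name, distance) pairs, closest reference spectrum first.

    library : dict mapping a reference name to its spectrum (list of floats)
    """
    scored = [(name, euclidean_distance(test, spectrum))
              for name, spectrum in library.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical three-point spectra
library = {"compound A": [0.10, 0.50, 0.90],
           "compound B": [0.10, 0.50, 0.20]}
for name, d in rank_library([0.10, 0.50, 0.85], library):
    print(name, d)
```

Note that for a distance measure the best match is the smallest value, in contrast to the correlation-type indices, where the best match is the largest value.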
COMMON SPECTRAL MATCHING (CORRELATION OR DOT PRODUCT)

Techniques for matching sample spectra include the use of the Mahalanobis Distance and cross-correlation techniques (“correlation matching”), described earlier. The general method for comparing two spectra (test versus reference), where the reference is a known compound or the mean spectrum of a set of known spectra, is given as the MI (Match Index). The MI is computed from the vector dot product of the test and reference spectra, normalized by the lengths of the two vectors. The theoretical values for the normalized dot product range from −1.0 to +1.0, where −1.0 is a perfect negative (inverse) correlation and +1.0 is a perfect match. Since for near-infrared spectroscopy only positive absorbance values are used to compute the dot products, the values for the match index must fall within the 0.0 to +1.0 range. The mathematics is straightforward and is demonstrated below. The MI is equal to the cosine of the angle (designated as $\theta$) between two row vectors (the test and reference spectra) projected onto a two-dimensional plane, and is equivalent to the correlation (r) between the two spectra (row vectors) when each spectrum is mean-centered, as in equation 74-9:

$$\mathrm{MI} = \cos\theta = \frac{\mathbf{T}\cdot\mathbf{R}}{|\mathbf{T}|\,|\mathbf{R}|}$$ (74-9)
where T is the test spectrum row matrix, and R is the reference spectrum row matrix.
Note the following equation 74-10a:

$$\mathbf{T}\cdot\mathbf{R} = \sum_{i=1}^{n} T_i R_i$$ (74-10a)
where Ti represents the individual data points for the test spectrum (designated as the absorbance values of spectrum T from wavelengths i = 1 through n), and Ri represents the individual data points for the reference spectrum (designated as the absorbance values of spectrum R from wavelengths i = 1 through n).
And where

$$|\mathbf{T}|\,|\mathbf{R}| = \left[\sum_{i=1}^{n} T_i^{2}\right]^{1/2}\left[\sum_{i=1}^{n} R_i^{2}\right]^{1/2}$$ (74-10b)
Note that the angle ($\theta$), in degrees, between two vectors can be determined from the MI using

$$\theta = \cos^{-1}(\mathrm{MI})$$ (74-10c)
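Equations 74-9 through 74-10c can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation; the function names are assumptions, and the clamping step is added only to guard against floating-point rounding pushing the MI slightly outside the range −1 to +1.

```python
import math


def match_index(test, reference):
    """MI = (T . R) / (|T| |R|), equations 74-9, 74-10a and 74-10b."""
    dot = sum(t * r for t, r in zip(test, reference))
    norm_t = math.sqrt(sum(t * t for t in test))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_t == 0 or norm_r == 0:
        raise ValueError("a spectrum of all zeros has no defined direction")
    return dot / (norm_t * norm_r)


def match_angle_degrees(mi):
    """Angle between the two spectra in degrees, equation 74-10c."""
    clamped = max(-1.0, min(1.0, mi))  # guard against rounding error
    return math.degrees(math.acos(clamped))


# A reference spectrum and a test spectrum that is an exact multiple of it
reference = [1.0, 2.0, 3.0]
test = [2.0, 4.0, 6.0]
mi = match_index(test, reference)
print("MI =", mi, "angle (degrees) =", match_angle_degrees(mi))
```

Because the MI depends only on the direction of the two vectors, a test spectrum that differs from the reference only by a multiplicative factor (for example, a pathlength change) still gives an MI of essentially 1.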
The “alikeness” of one test spectrum (or series of spectra) to a reference spectrum can be determined by calculating a point-by-point correlation between the absorbance data for each test and reference spectrum. The correlation matching can be carried out for all available data points or for a pre-selected set only. The more alike the test and reference spectra are, the higher (closer to 1.00) are the r (correlation coefficient) and R2 (coefficient of determination) values. A perfect match of the two spectra would produce r and R2 values of 1.00000. The sensitivity of the technique can be increased by pre-treating the spectra as first- or higher-order derivatives and then calculating the correlation between test and reference spectra. Full spectral data can also be truncated (or reduced) to include only spectral regions of particular interest, a practice that further improves matching sensitivity for a particular spectral feature.

Sample selection using this technique involves selecting the samples most different from the mean population spectrum for the full sample set. Those samples with correlations of the lowest absolute values (including negative correlations) are selected first, then samples of second lowest correlation are selected (and so on) until the single sample of highest correlation is found. The distribution of spectra about the mean is assumed to follow a normal distribution with a computable standard deviation. This assumption indicates that a uniformly distributed sample set can be selected based on the correlation between test spectra and the mean spectrum of a population of spectra.
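The sample selection procedure just described can be sketched as follows: compute the mean spectrum of the set, correlate each spectrum with that mean, and order the samples from lowest to highest absolute correlation. The function names and the four short spectra are hypothetical; a practical implementation would also handle derivative pre-treatment and region selection.

```python
import math


def pearson_r(x, y):
    """Point-by-point correlation coefficient between two spectra."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    return sxy / math.sqrt(sxx * syy)


def select_by_correlation(spectra):
    """Return sample indices ordered from least to most like the mean spectrum."""
    n = len(spectra)
    n_points = len(spectra[0])
    mean_spectrum = [sum(s[i] for s in spectra) / n for i in range(n_points)]
    scored = [(abs(pearson_r(s, mean_spectrum)), index)
              for index, s in enumerate(spectra)]
    scored.sort()  # lowest absolute correlation first
    return [index for _, index in scored]


# Hypothetical set: three similar spectra and one that is clearly different
spectra = [[1.0, 2.0, 3.0, 4.0],
           [1.1, 2.1, 3.0, 4.2],
           [0.9, 2.0, 3.1, 3.9],
           [4.0, 1.0, 3.0, 2.0]]
print(select_by_correlation(spectra))
```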
REFERENCES

1. ASTM “Practice for General Techniques for Qualitative Infrared Analysis”, ASTM Committee E 13, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959.
2. ASTM Committee E13.11, “Practice for Near Infrared Qualitative Analysis”, ASTM Committee E 13, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959.
3. Mahalanobis, P.C., Proceedings of the National Institute of Science 2, 49–55 (1936).
4. Mark, H.L. and Tunnell, D., Analytical Chemistry 57, 1449–1456 (1985).
5. Workman, J., Mobley, P., Kowalski, B., Bro, R., Applied Spectroscopy Reviews 31(1–2), 73 (1996).
6. Whitfield, R.G., Gerber, M.E. and Sharp, R.L., Applied Spectroscopy 41, 1204–1213 (1987).
7. Mahalanobis, P.C., Proceedings of the National Institute of Science 2, 49 (1936).
8. Reid, J.C. and Wong, E.C., “Data-Reduction and -Search System for Digital Absorbance Spectra”, Applied Spectroscopy 20, 320–325 (1966).
9. Owens, P.M. and Isenhour, T.L., “Infrared Spectral Compression Procedure for Resolution Independent Search Systems”, Analytical Chemistry 55, 1548–1553 (1983).
10. Tanabe, K. and Saeki, S., “Computer Retrieval of Infrared Spectra by a Correlation Coefficient Method”, Analytical Chemistry 47, 118–122 (1975).
11. Azarraga, L.V., Williams, R.R. and de Haseth, J.A., “Fourier Encoded Data Searching of Infrared Spectra (FEDS/IRS)”, Applied Spectroscopy 35, 466–469 (1981).
12. de Haseth, J.A. and Azarraga, L.V., “Interferogram-Based Infrared Search System”, Analytical Chemistry 53, 2292–2296.
75 The Chemometrics of Imaging Spectroscopy
Imaging spectroscopy is particularly useful for understanding the structure and functional relationships of materials and biological samples. Spatial images of chemical structure demonstrate physical or chemical phenomena related to a particular structural anatomy. Software packages such as MATLAB and many others provide easily learned methods for image display and mathematical manipulation of matrices of data [1]. Imaging data may be measured using array and camera data composed of spatial data on an X and Y plane, with the Z-axis related to frequency and the fourth dimension related to amplitude or signal strength. The figures below illustrate the types of data useful for imaging problems. Figures 75-1a and 75-1b illustrate second-order data. Figure 75-1a shows data composed of signal amplitude (A), multiple frequencies (λ/ν), and time; in this image model, one is taking spectroscopic measurements over time. Figure 75-1b shows another form of second-order data, where spectroscopic amplitude at a single wavelength is combined with spatial information. Figure 75-2 shows third-order data, or a hyperspectral data cube, where the spectral amplitude is measured at multiple frequencies (a spectrum) with X and Y spatial dimensions included. Each plane in the figure represents the amplitude of the spectral signal at a single frequency for an X and Y coordinate spatial image. The data shown in these figures provide powerful information relating structure to chemical knowledge. Such data may be measured by rastering a spectrometer or microscope over a particular area, or by using an array detection scheme for collecting spectroscopic data. Imaging provides an entirely expanded dimension of spectroscopy and increases the power of spectroscopic techniques to reveal new information in investigations of new materials and biological or chemical interactions.
IMAGE PROJECTION OF SPECTROSCOPIC DATA

Table 75-1 demonstrates the rows × columns data matrix that can be obtained by rastering a spectrophotometer across a two-dimensional plane surface of paper with a pattern entered onto the paper, using, for example, water or an invisible ink that has a unique spectral absorption. For illustrative purposes the data shown here were created with a computer. One might imagine spectroscopic data measured at single or multiple wavelengths to obtain a similar data matrix. One may also enhance the signal or spectra using the toolbox of preprocessing techniques to draw out a clearer image. The data matrix is preprocessed using any signal enhancement technique to obtain the spectroscopic data of greatest interest as it relates to the spatial characteristics of the material or sample surface under study. In this particular case each data point represents the absorbance difference between a non-absorbing wavelength for the paper surface and an absorbing wavelength for the transparent ink added to the paper. The difference in
Figure 75-1 (a) Second order data (amplitude, multiple frequencies, time); (b) Second order data (amplitude at one frequency, with X and Y spatial dimensions). [Axes: (a) A, t, λ/ν; (b) A at one λ/ν, X, Y.]
Figure 75-2 Third order data (Hyperspectral Data Cube: Amplitude, multiple frequencies, and X, Y spatial data – each plane represents the amplitude of spectral signal at a single frequency for an X, Y coordinate spatial image). [Axes: A, λ/ν, X, Y.]
Table 75-1 Simulated absorbance data depicting an ink pattern on a two-dimensional paper surface with spatial dimensions X and Y
A = [.001 .001 .001 .001 .001 .001 .001 .001 .001 .001 1.1 .001 .001 .001 .001 .001 .001 .001 .001 .001 .001;
.001 .001 .001 .001 .001 .001 .001 .001 .001 .001 1.11 .001 .001 .001 .001 .001 .001 .001 .001 .001 .001;
.001 1.11 .001 .001 .001 .001 .001 .001 .001 .001 1.10 .001 .001 .001 .001 .001 .001 .001 .001 1.10 .001;
.0012 .0012 1.10 .0012 .0012 .0012 .0012 .0012 .0012 .0012 1.11 .0012 .0012 .0012 .0012 .0012 .0012 .0012 1.10 .0012 .0012;
.0011 .0011 .0011 1.11 .0011 .0011 .0011 .0011 .0011 .0011 1.12 .0011 .0011 .0011 .0011 .0011 .0011 1.10 .0011 .0011 .0011;
.0011 .0011 .0011 .0011 1.11 .0011 .0011 .0011 .0011 .0011 1.11 .0011 .0011 .0011 .0011 .0011 1.11 .0011 .0011 .0011 .0011;
.0012 .0012 .0012 .0012 .0012 1.10 .0012 .0012 .0012 .0012 1.10 .0012 .0012 .0012 .0012 1.12 .0012 .0012 .0012 .0012 .0012;
.0010 .0010 .0010 .0010 .0010 .0010 1.10 .0010 .0010 .0010 1.11 .0010 .0010 .0010 1.10 .0010 .0010 .0010 .0010 .0010 .0010;
.0011 .0011 .0011 .0011 .0011 .0011 .0011 1.10 .0011 .0011 1.12 .0011 .0011 1.10 .0011 .0011 .0011 .0011 .0011 .0011 .0011;
.0011 .0011 .0011 .0011 .0011 .0011 .0011 .0011 1.11 .0011 1.11 .0011 1.11 .0011 .0011 .0011 .0011 .0011 .0011 .0011 .0011;
.0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012 1.12 1.12 1.10 .0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012;
.0011 .0011 .0011 .0011 .0011 .0011 .0011 .0011 1.11 .0011 1.11 .0011 1.11 .0011 .0011 .0011 .0011 .0011 .0011 .0011 .0011;
.0012 .0012 .0012 .0012 .0012 .0012 .0012 1.11 .0012 .0012 1.11 .0012 .0012 1.11 .0012 .0012 .0012 .0012 .0012 .0012 .0012;
.0011 .0011 .0011 .0011 .0011 .0011 1.10 .0011 .0011 .0011 1.11 .0011 .0011 .0011 1.11 .0011 .0011 .0011 .0011 .0011 .0011;
.0011 .0011 .0011 .0011 .0011 1.12 .0011 .0011 .0011 .0011 1.10 .0011 .0011 .0011 .0011 1.11 .0011 .0011 .0011 .0011 .0011;
.0011 .0011 .0011 .0011 1.10 .0011 .0011 .0011 .0011 .0011 1.12 .0011 .0011 .0011 .0011 .0011 1.12 .0011 .0011 .0011 .0011;
.0012 .0012 .0012 1.11 .0012 .0012 .0012 .0012 .0012 .0012 1.11 .0012 .0012 .0012 .0012 .0012 .0012 1.10 .0012 .0012 .0012;
.001 .001 1.11 .001 .001 .001 .001 .001 .001 .001 1.11 .001 .001 .001 .001 .001 .001 .001 1.10 .001 .001;
.0012 1.11 .0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012 1.11 .0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012 1.10 .0012;
.0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012 1.10 .0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012 .0012;
.001 .001 .001 .001 .001 .001 .001 .001 .001 .001 1.10 .001 .001 .001 .001 .001 .001 .001 .001 .001 .001]
Figure 75-3 Two-dimensional contour plot of data matrix A found in Table 75-1. [Plot title: 2-D image map; axes: X-dimension, Y-dimension.]
absorbance between these two wavelengths will be directly related to the amount of ink added to the paper surface. By applying imaging software to the data matrix, an image of the ink content added to the paper will appear. The first graphical representation using MATLAB® software is a two-dimensional contour plot of the data from Table 75-1 [2]. This plot (Figure 75-3) can represent multiple levels of z-axis data (absorbance) by the use of contours and color schemes. The MATLAB® commands for generating this image are given in Table 75-2, where A represents the raster data matrix shown in Table 75-1. The second graphical representation using MATLAB® software is a three-dimensional surface plot (Table 75-3, Figure 75-4). This plot visually represents the three-dimensional data, where the X and Y axes are spatial dimensions and the Z axis depicts absorbance. The MATLAB® commands for this graphic are given in Table 75-3, where A represents the raster data matrix given in Table 75-1.
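The two-wavelength difference that produces each element of the data matrix can be sketched as below. This Python example is illustrative only: the function names, the tiny 2 × 3 rasters, and the threshold of 0.5 absorbance units are assumptions chosen to show the idea, not values taken from the text.

```python
def difference_image(abs_absorbing, abs_reference):
    """Pixel-by-pixel absorbance difference between two single-wavelength rasters.

    abs_absorbing : raster measured at a wavelength where the ink absorbs
    abs_reference : raster measured at a non-absorbing wavelength
    Both are lists of rows with identical shape.
    """
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(abs_absorbing, abs_reference)]


def ink_pixels(image, threshold=0.5):
    """Return (row, column) positions whose difference exceeds the threshold."""
    return [(y, x)
            for y, row in enumerate(image)
            for x, value in enumerate(row)
            if value > threshold]


# Hypothetical 2 x 3 rasters: ink is present at row 0, column 1 only
absorbing = [[0.101, 1.200, 0.101],
             [0.101, 0.101, 0.101]]
reference = [[0.100, 0.100, 0.100],
             [0.100, 0.100, 0.100]]
image = difference_image(absorbing, reference)
print(ink_pixels(image))
```

The resulting matrix plays the same role as matrix A in Table 75-1: background pixels are near zero and ink pixels are near the ink absorbance, which is the contrast the plotting commands then display.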
Table 75-2 MATLAB® commands for generating a contour plot of data matrix A found in Table 75-1

contour(A)               % contour plot of the raster data matrix A
grid on
title('2-D Image Map')
xlabel('X-Dimension')
ylabel('Y-Dimension')
Table 75-3 MATLAB® commands for generating a 3-D surface plot of data matrix A found in Table 75-1

surf(A)                  % surface plot; Z is the absorbance value in A
colormap(cool)
title('3-D Image Map')
xlabel('X-Dimension')
ylabel('Y-Dimension')
zlabel('Z-Dimension')
Figure 75-4 Three-dimensional surface plot of data matrix A found in Table 75-1. [Plot title: 3-dimensional map; axes: X-dimension, Y-dimension, Z-dimension.]
The third graphical representation using MATLAB® software is a two-dimensional contour map overlaid onto a three-dimensional surface plot (Table 75-4, Figure 75-5). This plot visually represents Figure 75-3 overlaid onto Figure 75-4. For this three-dimensional graphic, the X and Y axes are the spatial dimensions and the Z-axis depicts absorbance (or spectroscopic signal). The MATLAB® commands for this graphic are given in Table 75-4, where A represents the raster data matrix given in Table 75-1. Thus, by producing a matrix of data containing a contrast between the signal and the background, one may obtain useful images for study. In order to use this technique to optimize image quality, one must process the raw signal to enhance the difference between the component of interest and the background signal. The signal is enhanced using many of the techniques described in this text. MLR, PCR, PLS, background correction, derivatives, and the like can all be used to enhance the signal-to-noise ratio between the component of interest for imaging and the background signal. Once this contrast is achieved the simple techniques described
Table 75-4 MATLAB® commands for generating a 2-D contour plot over a 3-D surface plot

C = A - 1;               % shift the surface down by one absorbance unit
surf(C)
axis([0 25 0 25 -1 1])
hold on                  % keep the surface while adding the contour plot
contour(A)
grid on
title('3-D Image Map with Contour')
xlabel('X-Dimension')
ylabel('Y-Dimension')
zlabel('Z-Dimension')
Figure 75-5 Two-dimensional contour plot overlay onto three-dimensional surface plot of data matrix A found in Table 75-1. (see Color Plate 24) [Plot title: 3-D image map with contour; axes: X-dimension, Y-dimension, Z-dimension.]
here are useful for projecting the image for structure-chemical composition studies or for detecting the presence and location of impurities.
REFERENCES

1. Workman, J., NIR News 9(3), 4–5 (1998).
2. MATLAB® software from The Mathworks, Inc., 24 Prime Park Way, Natick, MA 01760.
Glossary of Terms
This set of terms is a supplement to the text. Many of these terms are included to clarify issues discussed in the text. We refer to the text index for more detailed coverage of the statistics and chemometrics terms. Many of these terms refer to the measuring instrument or the process of making a measurement rather than to mathematical concepts.

Action limit, n – the limiting value from an instrument performance test, beyond which the instrument or analytical method is expected to produce potentially invalid results.
Analysis, v – the process of applying a calibration model to an absorption spectrum so as to estimate a component concentration value or property.
Analyzer, n – all piping, hardware, computer, software, instrumentation, and one or more calibration models required to automatically perform analysis of a specific sample type.
Analyzer calibration, n – see multivariate calibration.
Analyzer model, n – see multivariate model.
Analysis precision, n – a statistical measure of the expected repeatability of results for an unchanging sample, produced by an analytical method or instrument for samples whose spectra represent an interpolation of a multivariate calibration. The reader is cautioned to refer to specific definitions for precision and repeatability based on the context of use.
Analysis result, n – the numerical or qualitative estimate of a physical, chemical, or quality parameter produced by applying the calibration model to the spectral data collected by an instrument according to specified measurement conditions.
Analysis validation test, n – see validation test.
Calibration, v – a process used to create a model relating two types of measured data. Also, a process for creating a model that relates component concentrations or properties to absorbance spectra for a set of samples with known reference values.
Calibration model, n – the mathematical expression that relates component concentrations or properties of a set of reference samples to their absorbances. It is used to predict the properties of samples based upon their measured spectrum.
Calibration, multivariate, n – a process for creating a model that relates component concentrations or properties to the absorbances of a set of known reference samples at more than one wavelength or frequency.
Calibration samples, n – the set of samples used for creating a calibration model. Reference component concentration or property values need to be known, or measured by a suitable reference method, in order that they may be related to the measured absorbance spectra during the calibration process.
Calibration transfer, n – a method of applying a multivariate calibration developed on one instrument to data measured on a different instrument, by mathematically modifying the calibration model or by a process of instrument standardization.
Check sample, n – a single pure compound, or a known, reproducible mixture of compounds whose spectrum is constant over time such that it can be used as a quality or validation or verification sample during an instrument performance or function test.
Control limit, n – for validation tests, the maximum difference allowed between a valid analytical result and a reference method result for the same sample. A measured value that exceeds a control limit requires that action be taken to correct the process. Control limits are statistically determined.
Estimate, n – the value for a component concentration or property obtained by applying the calibration model for the analysis of an absorption spectrum; v – this is also a general statistical term referring to an approximation of a parameter based upon theoretical computation.
Inlier, n – see nearest neighbor distance inlier.
Inlier detection methods, n – statistical tests which are conducted to determine if a spectrum resides within a region of the multivariate calibration space which is sparsely populated.
Instrument standardization, v – a procedure for standardizing the response of multiple instruments such that a common multivariate model is applicable for measurements conducted across these instruments, the standardization being accomplished via adjustment of the spectrophotometer hardware or via mathematical treatment of one or a series of collected spectra.
Model validation, v – the process of testing a calibration model to determine bias between the estimates from the model and the reference method, and to test the expected agreement between estimates made with the model and the reference method.
Multivariate calibration, n – an analyzer calibration that relates the spectrum at multiple wavelengths or frequencies to the physical, chemical, or quality parameters; v – the process or action of calibrating.
Multivariate model, n – a multivariate mathematical rule or formula used to calculate a physical, chemical, or quality parameter from the measured spectrum.
Nearest neighbor distance inlier, n – a spectrum residing within a significant gap in the multivariate calibration space, the result for which is subject to possible interpolation error across the sparsely populated calibration space.
Optical background, n – the spectrum of radiation incident on a sample under test, typically obtained by measuring the radiation transmitted through or reflected from the spectrophotometer when no sample is present, or when an optically thin or non-absorbing standard material is present.
Optical reference filter, n – an optical filter or other device which can be inserted into the optical path in the spectrophotometer or probe, producing an absorption spectrum which is known to be constant over time such that it can be used in place of a check or test sample in a performance test.
Outlier detection limits, n – the limiting value for application of an outlier detection method to a spectrum, beyond which the spectrum represents an extrapolation of the calibration model.
Outlier detection methods, n – statistical tests which are conducted to determine if the analysis of a spectrum using a multivariate model represents an interpolation of the model.
Outlier spectrum, n – a spectrum whose analysis by a multivariate model represents an extrapolation of the model.
Performance test, n – a test that verifies that the performance of an instrument is consistent with historical data and adequate to produce valid analysis results.
Physical correction, n – a type of post-processing where the correction made to the numerical value produced by the multivariate model is based on a separate physical measurement of, for example, sample density, sample pathlength, or particulate scattering.
Post-processing, n – performing a mathematical operation on an intermediate analysis result to produce the final result, including correcting for temperature effects, adding a mean property value of the calibration model, or converting the instrument results into appropriate units for reporting purposes.
Pre-processing, n – performing mathematical operations on raw spectral data prior to multivariate analysis or model development, such as selecting wavelength regions, correcting for baseline, smoothing, mean centering, and assigning weights to certain spectral positions.
Primary method, n – see reference method.
Reference method, n – the analytical method that is used to estimate the reference component concentration or property value which is used in calibration and validation procedures.
Reference values, n – the component concentrations or property values for the calibration or validation samples which are measured by the reference analytical method.
Spectrophotometer cell, n – an apparatus which allows a liquid sample or gas to flow between two optical surfaces which are separated by a fixed distance, referred to as the sample pathlength, while simultaneously allowing light to pass through the liquid. There are variations of this, including variable-pathlength cells, multi-pass cells, and so on.
Test sample, n – a sample, or a mixture of samples, which has a constant spectrum for a limited time period, which is well characterized by the primary method, and which can be used as a QC sample in a performance test. Test samples and their spectra are generally not reproducible over extended periods.
Validation, v – the process by which it is established that an analytical method is suitable for its intended purpose.
Validation samples, n – a set of samples used in validating a calibration model. Validation samples are not generally part of the set of calibration samples. Reference component concentrations or property values are known (measured using a reference method), and are compared to those estimated using the model.
Validated result, n – a result produced by the spectroscopic (or instrumental) method that is equivalent, within control limits, to the result expected from the reference method so that the result can be used in lieu of the direct measurement of the sample by the reference method.
Validation test, n – a test performed on a validation sample that demonstrates that the result produced by the instrument or analytical method and the result produced by the reference method are equivalent to within statistical tests.
Index
A (estimated), 110, 114 A/D converter, 274, 277, 306 Ab initio theory, 225 Abscissa (x-axis), 71, 298, 340, 384–6, 479–80 Absorbance noise, 265–6, 277, 282, 286–8, 289, 291, 311, 321–2 Absorbance, 28 Absorptivity, 165, 283, 461, 480, 500 Accuracy, 121, 125, 136, 167, 173–7, 329, 453, 478, 484, 490 Actual result, 37–8, 40, 315 Addition, 6, 10, 78 Alchemy, 159 Algebra, matrix, 9–16, 17–20, 23–31, 33–41, 43–5, 47–9 Algebraic manipulation, 28, 43–4 Algebraic transformation, 26 Algorithms, 26, 48–9, 135–6, 152, 159, 160, 161, 163–6, 461 multivariate, 79 Alikeness, 376, 493, 496 All-possible-combinations design of three factors, 53 All-possible-combinations experiment, 63–4 All-possible combinations of factors, 89 Allowable uncertainty, 478 Alpha error, 97 Alpha-level(s), 101 Alpha-significance level, 98 Alternate population, 97–8, 101 Alternative hypothesis test, 93, 392 Alternative hypothesis, Ha, 93, 392 American Pharmaceutical Review, 423 American Society for Testing and Materials International (ASTM), 493 Amount of non-linearity, 146, 150–2, 155, 447–9, 453, 455, 457 Amplitude, 326, 330, 344, 499–500 Analog-to-digital (A/D) conversion, 273
Analysis of noise, 223–6, 227–33, 235–41, 243–52, 253–66, 267–72, 273–9, 281–8, 289–94, 295–307, 309–11, 313–17, 319–23, 325–33 Analysis of variance (ANOVA), 59, 64–5, 171, 179, 210, 213, 215, 431, 450 accuracy, 167 data table, 59 general discussion, 59, 66–8 precision, 167–8 preclude to, 248 for regression, 155 results comparing laboratories, 179–80 statistical design of experiments, 168–72 table showing calculations, 67, 212 table, 59, 67, 212, 215 Test Comparisons for Laboratories and Methods, 179, 180 Analyte, 28, 30, 34, 121, 131, 141, 165, 168, 183, 187, 188, 223, 378, 379, 382–3, 385, 386, 390, 421, 430, 437, 478, 480 concentration, 121, 142, 188, 377, 378, 389, 420, 429, 431, 435–6, 441, 479–81, 483–4, 487 Analytic geometry, 71–6, 77–9, 81–4, 85–8 refresher, 3 Analytical Chemistry: A-pages, 477 critical review issues, 1 fundamental reviews, 1, 48–9 Analytical designs, 53 Analytical uncertainty, 487 Anscombe, 421, 425, 427–9, 435, 442 data, 428, 442–3 Anscombe’s plot, 442 Antibiotics, 419 Anticholesterol drugs, 419 AOAC, Association of Official Analytical Chemists, 479 AOTF, 365, 415 Applied spectroscopy, 313, 459 Applied statistics, 59, 376–80, 429
Approximation, 155, 232, 328, 340, 348, 350, 355, 368, 456 Array detection, 499 Association of Official Analytical Chemists (AOAC), 479 Astronomical measurements, 224 Atomic absorption, 479 Augmented matrices, 14–15, 17–18, 20, 36 Auxiliary statistics, calibration, 1, 120–5, 133–4, 141, 154, 398, 422 Average analytical value, 479 Average of samples from a population, 52, 54, 59, 94, 390 Average, 33, 48, 52, 183, 185, 245, 247, 262, 306, 326, 358, 372, 479 Balanced design for three factors, 52–3 Band position, in spectroscopy, 132–3 Beer’s law, 34, 37, 47, 120–1, 132, 141–4, 156, 235, 282, 289, 368 Behavior of the derivative, 335, 341, 346, 348 Best-fit line, 361, 440, 451 Best-fit linear model, 453 Beta-level(s), 101 Between-laboratory variation, 481 Between-treatment mean square, 59, 67, 70, 176 Bias-corrected standard error (SEP(c)), 378, 382–3, 478–9 Bias-corrected standard error (SEV(c)), 477–8 Bias, 3, 124, 167, 171, 177, 180, 187, 189, 379, 478–80 due to location or analytical method, 167–8, 171, 187 Biased estimator, 187–9, 379, 480 Big “if”, 423 Binomial distribution, 296, 483 Bioassay, 479 Biological interactions, 499 Biological samples, 499 Black box, 26, 154, 159 Blackbody radiation, energy density of, 224 Blank sample, 227 Bounds for a data set, 2 C (estimated), 111, 114 Calculating correlation, 381–2 Calculations for Comparison Tests, 188 Calculus, 229, 260, 276, 313, 457–8, 469–71, 473
Calibration: auxiliary statistics, 133–4, 154, 421 developing the model, 381 equations, 12, 28 error sources, 121–2, 132–3 linear regression, 28–9, 131, 165 lines, 34, 152, 424, 431–2, 463 sample selection, 494–6 samples, 35, 136–7, 379, 385, 494, 506 set, 135, 137, 377, 379, 389, 463, 495 of spectrometers, 121, 131, 162 in spectroscopy, 2, 28, 35, 117, 418–19, 429, 459, 462 transfer, 135, 161, 460, 506 Central limit theorem, derivation of, 101 Chebyshev polynomials, 437, 440 Chemical causes, 142 Chemical interactions, 463, 499 Chemical measurements: qualitative, 125 quantitative, 125 Chemical variation in sample, 500 Chemometric calibrations, 156, 333 Chemometric designs, 89 Chemometric modeling, 134 Chemometrician, 26, 147, 149, 156, 464, 467, 475 Chemometrics-based approach, 473 Chemometrics, 1–2, 48–9, 89, 117, 119–21, 131, 134, 135, 159–60, 163 Chi-squared distribution (χ²), 429, 433 set of tables, 102 Chromatography, 167, 418, 420, 479 Classical designs, 53 Coefficient of determination, 375, 379, 385, 398, 496 Coefficient of multiple determination, 28–30, 361, 364–5, 453 Coefficient of variation (CV), 479, 483–4 Coefficients for orthogonalized functions, 452 Collaborative Laboratory Studies, 167–77, 179–81, 183–4, 185–6, 187–92, 193–221 Collaborative study problems, 3, 169, 478 Collinearity, 113, 153 Color schemes, 501 Column vectors in row space, 85 Common Spectral Matching, 493, 495 Commutative rule, 6 Comparing laboratories methods for precision and accuracy, 170, 173–7 Comparing test results for analytical uncertainty, 487
Comparison of correlation coefficient: and SEE, 393, 399, 401 and standard deviation, 379–80 Comparison test: for a set of measurements versus true value, 171, 183, 216 for two sets of measurements, 488 Compliance, 478 Computed transmittance noise, 277 Concentration, 28, 30, 31, 34, 35, 37, 47, 48, 52, 63, 90, 107, 110, 113, 114, 120, 121, 125, 127, 131, 132, 141, 142, 144, 146, 147, 153, 155, 165, 174, 180, 188, 223, 289, 290, 368, 369, 373, 375, 377, 381, 382, 389, 395, 420, 421, 429, 431, 435, 436, 439, 441, 443, 460, 462, 463, 479, 481, 483, 484, 487 expressed in powers of, 10, 479, 483 units, 28, 132 Confidence interval, 254, 390, 429 Confidence level, 389, 390, 392, 395, 402, 404, 405, 407, 478, 487, 490 Confidence limits: for correlation coefficient, 390 for slope and intercept, 395, 405–6 Constant term, 35–7, 47, 439 Continuous population: distribution of means, 273–4 probability of obtaining a mean value within a given range, 97, 251, 305–7 Contour surface plot, 501 Controlled experiment, 57–9, 62, 93, 159 Correlation coefficient, 5, 6, 123, 124, 147, 154, 155, 163, 164, 232, 375, 379–86, 389–93, 398, 399, 402, 404, 439, 440, 443, 450, 452, 455, 474, 475, 496 confidence levels, 379–80, 390–1, 393, 405 discussion of use, correlation coefficient, population value for, p, 59, 103, 469 methods for computing, 398 Correlation or dot product, 495 Correlation, 3–6, 123–5, 154, 163, 164, 175, 232, 375, 377–87, 389–93, 398, 399, 402, 404, 420, 427, 428, 439, 440, 443, 450, 452, 455, 474, 494–6 Cosine, 72, 73, 74, 437, 495 Counting, 281, 282, 298 Covariance, 6, 7, 232, 474, 475 Covariance of (X, Y), 381, 382 Cramer’s rule, 45 Critical level, 484, 485
515 Critical value, 98, 101, 103, 428, 429 Critical, 1, 41, 48, 52, 98, 101, 103, 104, 156, 161, 162, 212, 215, 219, 428, 429, 475, 484 Cross Correlation techniques, 495 Cross-product matrix, 475 Cross-product, 24, 232, 252, 299, 301, 303, 474, 475 CV, coefficient of variation, 479, 483, 484 Daniel and Wood, 440, 444 Data: continuous, 274, 285, 305, 319 discrete, 247, 250, 274, 285, 305, 309, 315, 327, 332, 336, 489 historical, 433 Data conditioning, 113 Data matrix A, 109, 110, 113, 114, 127, 128, 501 Data set: bounds for, 2 synthetic, 148 Dependent (or “Y”) variable, 28, 34, 379, 468, 469 Dependent events, II, 367, 468 Dependent variable (Y variable), 34, 124, 368 Derivative (difference) ratios, 229, 240–1, 284 Derivatives (different spacings), 341, 344, 349, 351 Derivatives of spectra, 335, 409 Descartes, Rene (1596–1650), 71 Designed experiments, 51, 147 Detection limit, 477, 484 for concentrations near zero, 484 Detection, 282, 376, 477, 479, 484, 499 Detector noise, 223, 224, 226–8, 230, 235, 241, 243, 247, 250, 253, 254, 267, 273, 281–9, 292, 293, 295, 309–11, 313, 320, 325, 327, 328, 332 Determinants, 41–5, 440 Deterministic considerations, 478–9 Developing the model, calibration, 154–5, 381 Diagnosis of data problems, 3 Diagonal elements in a matrix, 43 Diagonal product, 43 Differences, successive, 423, 424 Different size populations, 379, 380, 392, 404 Diffuse reflectance, 154, 163, 225, 235 Digitized spectrum, 273–4, 281 Dimensionality, reducing, 81 Direction angles, 74, 75, 77 Direction cosines, 73, 74
516 Direction in 3-D space (cosine), 74 Direction notation, 72 Discriminant analysis and its subtopics of, 3 Distance between two points, 71 Distance formula, 71 Distribution of means: continuous population, 273–4 discrete population, 250, 273–4 sampling, 54, 60, 61, 170, 274 Distribution(s), 167, 296, 298, 305, 314, 328, 350, 376, 433, 449, 456 binomial, 296, 483 Chi (, 102 Chi-squared ( 2 102 constituent, 459–60 continuous, 247, 309, 319 discrete, 309, 332 F, 210, 213 finite, 248–50, 252, 259, 262–3, 340–1, 357 Gaussian (normal), 52, 124, 433, 449 Gaussian mathematical expression, 52, 124, 335–6 hypergeometric, 4, 33–4 infinite, 248–51, 259, 267 of means for a discrete population, 250, 273–4 multinomial, 296, 357, 436, 442–3, 483–4 Normal (Gaussian) mathematical expression, 3–7, 103, 247–9, 247–50, 267–8, 275, 449 Poisson, 61, 285, 290, 296–9, 304, 309, 315, 319, 327, 328, 332 Poisson, formula for, 283–4, 286, 296–9, 320 of a population, 52, 54, 389–90, 392 of possible measurements showing confidence limits, figure showing, 389–93 probability, 296 of S, 175 of standard deviations for a discrete population, 59, 247, 305 t, 103, 389–90, 392 of variances, 489, 491 of (X - JS), 5–7 of X variable, 473–4 of Y variable, 473–4 Divide-by-zero computation, 309 Division, 6, 11, 25, 78, 244, 245, 251, 340, 341, 346 Dot product, 494, 495
Index Double blind, 274, 331, 347, 371 Double negative, 97 Draper & Smith, 427–8, 441–2 Drift, 60, 61, 121, 147, 155, 417, 418 between sets of readings, 60, 61 instrument, 61, 155, 417 and other systematic error, 418 Durbin-Watson Statistic, 421, 423, 424, 427–9, 431, 432, 435 Echelon form, 14, 15, 20 Effect of instrumental variation on the data, 161 Effect of noise on computed transmittance, 275 Effect of variations of the data on the model, 161 Efficacy, 419 Efficient comparison of two methods, 171, 187 Eigenanalysis, 109, 114 Eigenvectors, 128 Electrochemistry, 420 Electromagnetic spectrum, 142 Electronic noise error, 225 Elementary calculus book, 229 Elementary row operations, 18 Elementary statistics, 95, 285, 306, 379 Elimination, matrix operation, 17, 18, 24, 48 Empty or null set, 9 Energy density of blackbody radiation, 224 Energy-distribution product, 330 Error of integral, 329 Error propagation, 289–91 Error source, 121, 223, 231–2, 274, 325, 417–18 Error sources, calibration, 121–2, 132–3 Error(s) combined, 28–9, 121–2, 123–4, 145, 153, 155, 170, 176, 187–9, 370, 409, 410 electronic noise, 225 estimating total, 3, 34, 70, 164, 392, 408, 429 experimental, 57, 93 heteroscedastic, 424 homoscedastic, 54 of interpretation, 421 maximum, 370, 414 non-random, 428 peak-to-peak, 343, 345, 347–8, 460 population, 98, 101, 103 propagation of, 289–91
random, 52, 64, 66, 67, 170, 171, 188, 189, 418, 421, 424, 447, 448, 453, 460, 462, 463 reference method, 28–9, 70, 91, 97, 123 repack, 154 and residuals, 9 sampling, 60 at some stated confidence interval, 3, 34, 70, 164, 392, 408, 429 source of, 232 in spectroscopic data, 31, 34, 38, 120, 131, 141, 353, 355, 359, 367, 377, 499 stochastic, 52, 64–6, 91, 101, 170–1,
188–9, 273–4, 418, 421, 424–5,
447–8, 460, 463, 489
systematic, 167, 168, 176, 177, 188, 190, 200, 201, 208, 209, 219–21 true, 121–2, 231–2, 489 undefined, 251, 277, 305 unsystematic (random), 52, 64–6, 91, 101, 170–1, 188–9, 273–4, 418, 421, 424–5, 447–8, 460, 463, 489 ESR, 335 Euclidean distance (D), 494, 495 Events, dependent, II, 28, 33 Ewing’s terminology, 231 EXCEL™, 241 Excessive signal levels (saturation), 142 Expectation, 170, 171, 230, 259, 260, 265, 270, 276, 311, 315, 341 Expected result, 28, 94–5, 228, 230, 247, 254, 256, 273, 275–7, 285, 295–6, 298–9, 309, 325–7 Expected value of a parameter (E(S), 285, 299–300 Expected value of a parameter S, 228, 299, 432 Experiment: balls in jar, 150, 161 controlled, 54 Experimental chemometrics, 159 Experimental design, 51, 53–5, 57, 59, 62–4, 88, 89, 91, 93, 94, 97, 101, 103, 105, 168, 171, 172, 176, 187, 460, 461 balanced, 52–3
crossed, 93–5, 104–105
efficient, 53–4
fractional factorial, 54, 92
nested, 54, 62
one-at-a-time, 62, 91
one-factor, two-level experiment, 91 seven factors, table showing, 53 three-factor, two-level crossed experiment, 89, 461
two-factor crossed experiment, 63
two-factor, two-level crossed
experiment, 63 Experimental versus control designs, 62 Expression for relative absorbance noise, 320 Expression for transmittance noise, 289, 320 Extrapolating or generalizing results, 160, 493–5 F-distribution, 210, 213, 397, 432 F-ratio, 432 F-statistic Calculation (Fs) for precision ratio, 190, 220 F statistic, 189, 190, 191, 200, 208, 209, 212, 215, 220, 221, 478 F test, 58, 59, 421, 431, 433, 478 statistical significance of, 432–3 for the regression, 58–9, 431–3 F values, 431–2 F, t2 statistic, 189 Factor analysis scores, 109, 114 Factor analysis, 3, 109, 114, 120 Factorial, 92, 307 design for collaborative data collection, 168 designs, 54, 91, 92, 168 model experimental design, 168 Factors in statistical/chemometric parlance, 51 Failure to use adequate controls, 57–8 Family of curves of multiplication factor as a function of Er, 251 Fatal flaw, 432 FDA/ICH guidelines, 427, 431, 435, 436 Finite population, 273 First difference (derivative), 269, 350 Fisher’s Z transformation (i.e., the Z-statistic), 389, 390 Food and Drug Administration, 447 Fourier coefficients, 28, 381, 383 Fourier transform infrared (FTIR), 231, 246, 335, 365, 415
100% line, 151, 263, 481
table of standard deviations, 479–81
mid-infrared spectrometer, 100% line,
231–2, 246, 270, 365, 415
spectrometers, 231–2 Fractional factorial designs, 54 Fractional factorial, 92
Frequency, 107, 109, 113, 127, 224, 315, 499, 500 FWHH, full width at half height, 336 Gamma-ray spectroscopy, 223, 282 Gauss, Carl Friedrich, 249, 253, 314 Gaussian distribution, 124, 433, 449 Gaussian-shaped bands, 335 Generalized inverse of a matrix, 9 Generalizing results, 160, 493–5 Genetic algorithms (GA), 166 Goodness of fit test, 375–9 Goodness of fit, 375, 379, 381, 389, 395, 398, 429, 441 Gossett, W.S. (Student’s t-test), 183 Graeco-latin square design, 92 Grand Mean, 57, 58, 65, 66, 70, 173, 175, 176, 194 H statistic, 98, 103, 189 Ha, alternative hypothesis, 93, 392 Hamming networks, 494 Handbook of Chemistry and Physics, 276 Heterogenous, variance, 59, 229–30, 262–3, 267–8, 313–15 Heteroscedastic error, 424 Higher order differences (derivatives), 165, 372, 373, 496 Hit quality index (HQI), 494 Ho, null hypothesis, 93, 95, 97, 103–5, 189, 392, 404 Homogeneous, variance, 268, 376 Homoscedastic error, 54 Horwitz’s Trumpet, 477 Hydrogen bonding, 142, 154 in NIR and IR, 235 Hyperplane, 4, 34 Hyperspectral data cube, 499 Hypotenuse, 87, 88 Hypothesis test, 54, 58, 59, 67, 94, 97, 98, 102, 103, 167, 171, 212, 215, 389, 392, 393, 405 Chi-square, 102 nomenclature, 389, 392–3 null, 392 Hypothesized population, 97, 101 Hypothetical synthetic data, 448 ICH specifications, 419 Identity matrix, 11, 12, 19, 20 Image projection, 499
Index Imaging, 499, 501, 502 Incorrect choice of factors/wavelengths, 418 Independent error, 189, 424 Independent variable (X variable), 28, 34, 120, 379 Inferences, statistical, 375, 377 Infinite-finite numbers, 248–52, 259, 262–3, 305 Infrared, 35, 147, 223, 226, 230, 495 Ingle and Crouch’s development, 238 Inhomogeneous sample, 60 Instrument: bandwidth broad Compared to absorbance band, 142 noise, 223–6, 243 Instrument (and other) noise, 223–6, 243 Instrumental causes, 142 Integers, population of, 101, 389, 392 Integral, 247–52, 261–4, 266, 275, 276, 296, 298, 299, 307, 327, 328, 330–2, 436, 457, 458 Integrated circuit problem, 274 Integration interval, 249, 250, 259, 328, 329 Interaction: between variables, 91, 461 with solvent, 142 Intercept (k0 , 95, 123, 375, 379, 380, 395–8, 405–407, 429, 452, 453 confidence limits, 375, 396, 397 of a linear regression line, 381, 395 of a regression line, 379 Interference, 246, 459, 477 Interlaboratory tests, 477 International Chemometrics Society (NAmICS), 1, 362 Interpretive spectroscopy, 377 Inverse Beer’s Law, 120 Inverse of a matrix, 11, 19, 21, 25, 26 K-matrix (multiple linear regression), 3, 138 Known samples, 135 Kowalski, Bruce, 467 Kubelka-Munk function for diffuse reflectance, 235 Laboratory data and assessing error, 3 Laboratory error, 477 Lack of fit error, 28 Latin & Graeco-latin cubes, 92 Latin Square design, 92 Latin squares, 92 LC-GC, 167
Learning set, for calibration, 378, 384, 460, 475 Least squared differences, 30 Least-squares, 28, 357–9, 433, 436, 457, 469, 471, 473, 475
criterion, 421, 436
line, 28
property, 468
Left singular values (LSV) matrix or the U matrix, 109, 114, 127 Level of significance, 210, 212, 213, 215, 405 Limit of detection (LOD), 376 Limit of reliable measurement, 477 Limits in analytical accuracy, 483 Linear least-squares, 28 Linear regression, 165, 375, 376, 379, 381, 389, 395, 431
calibration, 28–9, 165, 431
Linearity, 132, 138, 141, 163, 164, 417, 418, 420, 421, 423, 424, 428, 429, 431, 433, 435, 436, 439–43, 447, 449, 450, 452, 459, 460, 461, 463, 464 assumption of, 47, 141–4 calibration, 131–4, 141, 145, 146, 148, 149, 159, 163, 165, 417, 423, 431, 435, 447, 455 Loadings matrix V, 110, 114 Log, 1/R, 235, 286 -Log(R), 277, 294, 322 Logarithm, 95, 153, 155, 238, 277, 322 Lorentzian distribution, 337–40, 410, 411 Low-noise case, 264, 266, 322, 325, 332 Lower confidence limit(LCL), 389, 390 Lower limit, 327, 328, 390, 391, 395, 404, 407, 408 Luck, concept of, 359 Mahalanobis distance, 3, 493–5 weighted regression, 494 Main diagonal (of matrix), 6, 23 Malinowski, Ed, 120 Mandel, John, 477 Manual wet chemistry, 431, 435 Mass fraction of analyte, 479 Match index, 495 Matching index, 493 MathCAD, 167, 171, 173–6, 187, 189, 193, 210, 213, 375, 379–82, 389, 392, 395, 398 Mathematical constructs, 142 Mathematical statistics, 467
Mathematician, 26, 33, 34, 467 MATLAB (Matrix Laboratory), 40, 107–11, 113, 114, 116, 117, 127, 128, 249, 258, 263, 267, 315, 328, 362, 364, 419, 501, 502 Matrix, 5–7, 9–12, 15, 17–20, 23–31, 33–6, 38, 41, 43, 47–9, 55, 77, 85, 88, 107–11, 113, 114, 117, 120, 127, 128, 138, 153, 165, 362–4, 381, 382, 389, 439, 468–71, 473, 475, 493–5, 499, 501, 502 addition, 10 algebra refresher, 3 algebra, 7, 9, 12, 23, 28, 30, 31, 33, 38, 41, 43, 47, 88, 107, 109, 113, 117, 127, 471, 473
division, 11
form, 15, 17, 19, 23, 29, 35, 36, 120,
439, 469 inversion, 26, 27, 41, 48, 153, 439, 469 multiplication, 6, 7, 11, 23–5, 27, 363, 470 nomenclature, 21 notation, 5, 6, 11, 17–19, 23, 29, 30, 35, 107, 381, 468, 470, 471, 475 operations, 6, 10, 17–19, 24, 25, 28, 31, 48, 108, 111, 114–16, 362, 471
product, 24, 27
row operations, 36
Maximum error, 433, 459–60 Maximum likelihood: equation, 33, 34 estimator, 33–4, 433 method, 433 Maximum variance in the multivariate distribution, 3 MDL, minimum detection limit, 477 Mean: population, 94, 101, 103, 496 of a population, (mu), 94, 101, 103, 496 of a sample (X bar), 94 sample, 5, 58–9, 104–105 Mean deviation, 173, 175, 176, 183, 189 Mean square: between-treatment, 59 within-treatment, 59 regression, 440 residuals, 30, 70, 421 Mean square error (MSE), 59, 67–70, 450, 479 Means and standard deviations from a population of integers of random samples, Computer, 98, 136, 273
520 Means and standard deviations of a population of integers, computer program, 59–61 Microphonics, 224 Mid-IR, 226 Miller and Miller, 375, 383, 395, 396, 405–408 Minimum detection limit (MDL), 477 MND (Multivariate normal distribution), 2–7 Mode, 152, 282 Model-building, 92 Model for the experiment, equation example, 58–9, 168 Molar absorptivity, 479 Molar concentration, 479–81 Molecular absorption spectroscopy, 479 Monitor, 137, 246, 478 Monte-Carlo calculations, 253 Monte-Carlo numerical computer simulation, 314 Monte Carlo study, 249, 253, 314 Most probable equation, 33 Multilinear regression, 23–9, 33–41, 47–9 Multinomial distribution, 296, 437–9, 483 Multiple correlation, 378 Multiple frequencies, 499, 500 Multiple linear least squares regression (MLLSR), 3 Multiple linear regression(MLR), 3, 21, 23, 28, 30, 33, 34, 35, 41, 43, 47, 107, 113, 119, 127, 134, 138, 145–51, 153–7, 163, 165, 166, 418, 441, 459, 460, 494, 502 Multiplication, 6, 7, 9, 11, 23–5, 27, 77, 78, 250, 251, 332, 363, 470 Multiplier terms, 28, 30 Multipliers, 28, 315, 356 Multiplying both sides of an equation, 25 Multivariate distribution, 3 Multivariate linear models, 12 Multivariate normal distribution (MND), 2–7 Multivariate regression, 84, 107, 109 N random samples, Table of standard deviations of, 67–9 N as number of total specimens, 91–2 Narrow band, 336 Near-infrared, 35, 131, 223, 226 detectors, 223 reflectance analysis, 223 Nested designs, 54, 62
Index Neural networks (NN), 3, 138, 147, 165, 166 learning systems, 494 New materials, 499 News Flash, 455, 459, 460 95% confidence limit, 101, 102, 478, 487, 489 NIR, 1, 131, 149, 151, 235, 295, 335, 365, 415, 417–19, 421, 459–61, 463 NMR, 335 Noise-to-signal ratio of the reference signal, 240 Noise, 223–6, 227–32, 235–41, 243–52, 253–66, 277–9 characteristics, 224, 227, 277, 320 FTIR spectrometer, 231 instrument, 243, 253, 267, 273, 289, 295, 309, 313, 325 level, 91, 151, 230, 235, 245, 246, 249, 251–4, 257, 259, 262–4, 267, 273, 274, 284, 295, 306, 309–11, 321, 325, 327, 330, 341, 357, 369, 373, 374 ratio, 151, 230, 249, 256, 269, 325, 355, 356 spectra, 253, 267, 273, 281, 289, 295, 309, 313, 319, 325, 332, 369 spectrum, 223, 230, 241, 254, 289, 356, 357, 369, 374 variance, 369 Noisy data, figures of, 151, 353 Non-Beer’s law relationship, 120–1 Non-collimated radiation, 142 Non-detector noise, 224 Non-dispersive analyzers, 225 Non-linear detector, 142 Non-linear dispersion, of spectrometer, non-linearity, 4, 133–4 Non-linear electronics, 142 Non-linearities, 132, 155, 225, 252, 295, 443 Non-significant result, 97 Normal (Gaussian) distribution, 449 Normal distribution weighting factor, 249 Normal distribution, 4–7, 103, 247–50, 258, 267, 273, 275, 277, 296, 298, 304–307, 315, 326–8, 330, 331, 337–40, 350, 355, 367, 409, 414, 423, 433, 452, 456, 496 Normal method, 3–4, 103 Normal probability distribution, 296 Normal random number generator, 258 Normality of Residuals, 433 Normally-distributed noise, 275, 277, 278, 295, 301, 309
Index Normally distributed, 54, 65, 251, 258, 263, 267, 273, 278, 296, 328, 376, 389, 425, 427, 433, 452, 453, 455, 484, 489 Null hypothesis test, 392 Number of experiments needed, 91 Number of measurements required, 489 Number of samples in the calibration set, 389 One-at-a-time designs, 62, 91 One-factor, two-level experiment, figure showing, 133, 154, 163, 252 100% line, from FTIR spectrometer, 231 One-hundred-percent transmittance line, from FTIR spectrometer, 246 One-tailed hypothesis test, figure showing, 94 Operative difference for denominator, 431 Operative difference for numerator, 432 Opposite sides, 83 Optical-null principle, 224 Optimization designs, 53 Ordinary regression theory, 132 Ordinate (y-axis), 71 Ordinate, 269, 384–6, 479 Original population, 94, 97–8, 101, 103, 379 Orthogonal, 153, 440 Chebyshev polynomials, 440 Orthogonalize the variables, 440 Orthogonalized functions, 452 Orthogonalized quadratic term, 452 Outliers, 378, 379, 417, 433, 479, 490, 493, 495 prediction, 378–9, 464, 490 samples, 378 theory and practice, 3 Overdetermined, 33, 34, 37, 47 P (probability), 97–8, 101, 298, 306, 309, 330, 332, 375–6 P-matrix (multiple linear regression), 3, 138 P-matrix formulation, 120 Painkillers, 419 Pairs of values, 389, 455 Parabola, 347 Parameter , 251 Parameters, 4, 28, 89, 143, 165, 166, 299, 301, 359, 379, 420, 436, 437, 449, 484, 489 estimate, 301 or matrix names, 9 population, 97–8, 101, 389–90 statistical, 375, 379, 381, 398 Partial F or t-squared test for a regression coefficient, 58–9, 189, 191, 299
521 Partial least squares (PLS), 107, 113, 114, 119, 125, 127, 131, 132, 134, 138, 146–57, 159, 160, 163–6, 418, 460, 494, 502 Partial least squares regression (PLSR), 1, 3, 107, 113, 127 Partitioning the sums of squares, 58, 449, 475 Pascal’s triangle, table of, 83 Pathlength, 141, 143, 144, 225 Pattern recognition, 494 Peak picking algorithm, 347 Peak-to-peak error, 132, 344–5 Peak, 132, 148, 152, 153, 165, 252, 336, 337, 343–5, 347, 355, 460 Pedagogic, 26, 54, 64, 81, 132, 152, 243, 250, 341, 375, 449, 452 Percent CV, 479 Perfectly noise-free spectrum, 146, 150 Pharmacopoeia, 419 Physical variation in sample, 377 Pitfalls of statistics, 375–80 Plane, 4, 6, 34, 71, 81–5, 120, 463, 495, 499 PLS singular value decomposition (plsSVD), 114 PLS, partial least squares regression, 1, 3, 107, 113, 127 Point estimates, 228 Poisson distribution, 61, 285, 290, 296, 298, 299, 304–307, 309, 315, 319, 327, 328, 332 formula for, 296, 306 Poisson-noise case, 291 Polynomials, 357, 359, 361, 373, 436–41, 447 Pooled precision, 174 and accuracy, 174 Pooled standard deviation, 197, 198, 206, 488, 491 Poor choice of algorithm and/or data transformation, 418 Population, 52, 54, 59, 94, 97, 98, 101, 103, 136, 273, 379, 380, 389, 390, 392, 404, 468, 496 distribution of, 103–105 error, 52 finite, 136, 273 of integers, 94, 97–8, 101, 103 large, 273, 379, 389, 392–3 mean (, 94, 101 original, 94, 97–8, 101, 103, 379 parameters, 97–8, 101, 389–90 of spectra, 496
522 Population (Continued) value, 59, 103, 468 variance, 59, 103 Potency, 419 Power of the statistical test, 97 Practical quantitation level (PQL), 484 Precision and standard deviation of methods (Comparison), 189, 220 Precision, 36, 101, 102, 121, 167, 168, 170–4, 176, 177, 187–90, 194, 197, 199, 200, 202, 206–208, 216, 220, 237, 239, 243, 250, 254, 258, 269, 291, 294, 307, 442, 452, 459, 473, 477–9, 483, 487, 488 Prediction: error, 28, 382, 383 samples, 135 vector, 107 Prediction error sum of squares (PRESS), 122–4, 136, 147 PRESS statistic, 123, 124 Principal components analysis (PCA), 3, 109, 113, 114, 119, 125, 127, 132–4, 138, 148, 149, 151, 153, 154, 156, 157, 159, 163, 166 Principal components for regression vectors, 86 Principal components regression (PCR), 1, 3, 107, 113, 127, 131, 134, 138, 145–50, 152–7, 163–6, 418, 460, 494, 502 Principal components scores, 109–11 Probabilistic answer, 119–21 Probabilistic calculations, 251–2, 259, 264, 306, 315, 375–6, 487–8 Probabilistic considerations, 119 Probabilistic force, 33, 119, 120, 167, 254, 259, 264, 427, 489 Probabilistic statements, 33, 427 Probability, 94, 97, 98, 101, 102, 105, 160, 251, 252, 262, 271, 274, 296, 298, 305, 306, 309, 315, 320, 328, 330, 332, 375, 376, 427, 428, 463, 487, 489 distribution, normal, 296 sampling, 274 theory, 160, 306 and statistics, connection between, 375 Projection, 4–6, 81–3, 86, 87, 463, 499 Proof that the variance of the sums equals the sums of the variances, 229, 232 Propagation of uncertainties expression, 310 Proportion, 224, 225, 256, 263, 276, 281, 283–5, 287, 317–21, 325, 330, 340, 341, 344, 346, 347, 351, 368, 420, 452, 483, 484
Pseudoinverse, 107, 468 Pseudoinverse, theorem, 107, 468 Q-Test for Outliers, 490 Quadratic non-linearity, 452 Quadratic polynomial, 442, 447 Qualitative analysis (Spectral matching), 3 Qualitative, chemical measurements, 48 Quantitative analysis, 30, 34, 48, 49, 125, 162, 367, 381 Quantitative, chemical measurements, 254, 418–19 Quasi-algebraic operations, 25 Quintic polynomial, 360 R-squared (R²), 379, 398 Random (stochastic) noise, 91, 146, 150, 224, 254, 370, 418, 424 Random effect(s), 64, 151, 266, 449 of noise, 227, 228 Random error, 52, 64, 66, 67, 170, 171, 188, 189, 418, 421, 424, 447, 448, 453, 460, 462, 463 Random numbers, 252, 263, 267, 467 generator, 258, 263, 267 Random phenomena, 33, 65 Random sample, 52, 98, 136, 267, 273, 450, 455, 489 Random variable, 155, 228, 229, 230, 232, 252, 260, 267, 298, 314, 353, 356 Randomness, 33, 285 behavior of, 33, 285 test for, 33, 285 Rank, 185 Ranking test, 171, 185 for Laboratories and Methods, 185 Rastering, 499 Ratio of the range (Sr) to the SEE, 386 Ratioed spectra, 226 Ratios of upper to lower confidence limits, table of, 98, 389 Real data, 59, 150, 152, 155, 159, 167, 172, 259, 336, 341, 347, 425, 437, 440, 455 Real world samples, 119, 157, 245, 305, 336, 425, 489 Reducing dimensionality, 81 Reference laboratory value, 107, 167, 183, 187, 193–201 Reference method error, 33–5, 70, 119, 121–5, 171–2, 183–4, 273–4, 289–91, 439–41 Reference noise, 231, 282, 310 Reference spectral library, 493
Index Reflectance (reflection), 154, 163, 223, 225, 226, 227, 235, 282, 283, 378 Reflection (reflectance), 154, 163, 223, 225–6, 227, 235, 282, 283, 378 Regression: algorithms, 26, 48–9 analysis, 7, 34, 421, 440, 450, 468 calculations, 421 coefficients, 28, 30, 38, 40, 43, 107, 110, 114, 469 Regression (MLR), and P-matrix, and its sibling, K-matrix, 3 Regression line, 379, 381, 382, 395, 397, 420 linear equation, 379 Relative error of the absorbance, 289, 290 Relative mean deviation, 183 Reliability, 119 Repack, 154 averaging, 59–61 Repeat readings, 59, 168, 170 Repeatability, 60, 460, 477, 478, 505 Replicate measurement, 173, 174, 175, 187, 484, 487, 488, 490 Replicates, 173–6, 185, 478, 487–91 Representative sample, 54, 136 Reproducibility, 477, 478 Residual Error, 28 Residual sum of squares, 421, 432 Residuals, 433 Resource-conserving experimental design, 93–5 Response surface designs, 54 Response surface, 54, 62, 92 Result, actual, 37–8, 40, 315 Result, expected, 97, 145, 149, 179 Rho, table of exact values, 403 RHS, 263, 474 Right singular values matrix (RSV) or the V matrix, 109, 114 Right triangle, 82, 87 Rocke and Lorenzato, 483 Root mean square error (RMS), of FTIR spectrometer signal, 231, 415 difference, 176–7 FTIR % line, 246, 415 Rotation, 81, 83, 84, 365, 415 Row effects, 36, 70, 85–6 Row equivalent, 18, 36 Row operations, 18, 19, 20, 36, 37, 39, 41, 48 Row vectors in column space, 85 RSSK/Norm, 373, 374
523 S: calculation of the sample standard deviation, 103 standard deviation of a sample, 101, 103 Sample: blank, 227 mean, 172 non-homogeneity, 60–1 pathlength, 507, 508 presentation error, 123 representative, 54 selection, 3, 496 calibration, 35 statistic, 93–5, 97 Sampling, 54, 60, 61, 170, 274 Sampling distribution, 54, 60, 61, 170, 274 expression for, 290 Sampling error, 274 Savitzky-Golay convolution functions, 355, 357, 371, 372, 435 Savitzky-Golay/Steinier tables, 359 Scalars, 9 Scaling, 113, 299, 301, 341, 364, 414 Scatter diagrams, 378 Science of Statistics, 33, 125, 151, 160, 467, 473 Scintillation noise, 224, 316, 319, 322, 325, 327, 330, 332 Scores matrix T, 109, 114 Screening designs, 53 Second derivative, 335, 337–8, 340, 343, 345–6, 347–8 of the normal distribution, 338, 409 Second difference (derivative), 347–8 Second law of thermodynamics, 143 Second-order data, 499 SECV, Standard error of cross validation, 419 SED, standard error of difference, 123–4, 163–4 SEE (Standard Error of Estimate), 123, 124, 379, 380, 383, 386, 398, 402, 406 SEL, Standard error of the laboratory, 478 Self-interaction, 142 Self-polymerization or condensation, 142 Sensitivity testing, 62 SEP (Standard Error of Prediction), 161–2, 381–4, 419 Sequential design, 92, 93, 103 Sequential experimental design, 103 Set of regression coefficients, 107, 110, 117 Shot-noise, 223, 289, 296
Signal-to-noise (S/N) ratio, 347 Significance level, 98, 404 Simple correlation, 378 Simple least squares regression (SLSR), 3 Simple linear least squares regression (SLLSR), 3 Simultaneous equations, 23, 24, 25, 26, 27, 29, 439, 469, 470 Sine, 330 Single wavelength, 132, 134, 359, 499 Singular value decomposition (SVD), 127 Singular values matrix (SVM) or the S matrix, 109, 114 Slope (k1), 75, 395 confidence limits, 396 defining in two dimensions, 75–6 of a linear regression line, 395 Solvent interactions, 63, 142 Source of error, 232 Spatial dimension, 499, 501, 502 Spatial information, 499 Special designs, 62 Specimen, 92 Spectra: of noise, 131–3 population of, 496 Spectral matching (Qualitative analysis), 3 Spectral matching approaches, 493 Spectral search algorithms, 494 Spectral searches, 493 Spectrophotometry, 479 Spectroscopic amplitude, 499 Spectroscopic imaging, 499–503 Spectroscopist, 30, 141, 142, 143, 144, 146, 147, 151, 152, 156, 245, 255 Spectroscopy: calibration, 367–74 FTIR, 231, 246, 335, 365, 415 home page, 171 magazine, 1, 141, 467 Spectrum, noise, 151, 278, 369–70 Specular reflection, 124, 223, 225–6, 270, 282 Spiked or true values (TV), 175 Spiked recovery method, 183 Square of the correlation coefficient, 450 Square root of variance (standard deviation), 474 Squares for residuals, 424 Standard calibration set, 379
Standard deviation (s or S), calculation for a sample, 101
of A, 237–9
of a sample (s or S), 101
of difference (SOD), 423, 479
pooled, 58, 60, 197–8, 206, 488, 491
of T, 229, 289
Standard deviation of a population(), 59, 94, 101, 103 Standard error of calibration (SEC), 122, 124, 163–4, 380, 385 Standard error of cross validation (SECV), 410 Standard error of estimate (SEE), 123, 124, 379, 380, 383, 386, 398, 402, 406 Standard error of laboratory (SEL), 478 Standard error of prediction (SEP), 161–2, 381–4, 419 Standard error of the laboratory (SEL), 478 Standard error of the mean, 382 Standard error of validation (SEV), 477–8 Standard error of validation (SEV), 123–4 Standard Practice for General Techniques for Qualitative Analysis, 493 Standardization concepts, 3 Statistic, test, 93, 94 Statistical analysis, 8, 17, 180, 423, 425, 491 Statistical conclusion, 441–3 Statistical design of experiments, Statistical design of experiments, using ANOVA, 168–72, 473–5 Statistical experimental design, 51, 54, 62, 89, 91 Statistical inferences, 375 Statistical significance, 97, 441, 450, 461 Statistical tests, 171, 192, 193, 375, 378, 443, 447, 449, 506, 507, 508 Statistical variability, 423 Statistically designed experiments, 41, 147 Statistically significant, 51–2, 57–60, 97, 171, 179, 424, 427–9, 439, 441, 443, 447, 455, 477, 478 Statistician, 91, 119–20, 162, 247, 376, 423–6, 433, 464 Statistics: applied, 428, 449–50 general, 1, 379–80, 429 mathematical, 58, 311, 314 pitfalls, 375–80 science of, 33, 125, 151, 160, 467, 473 Steinier, 359, 360, 361, 362
Index Stochastic (random) noise, 91, 146, 150, 224, 254, 370, 418, 424 Stochastic error, 52, 64–6, 91, 101, 170–1, 188–9, 273–4, 418, 421, 424–5, 447–8, 460, 463, 489 Stray light effects, 132–3, 463 Stray light, 132, 142, 152, 155, 463 Student’s (W.S. Gossett) t-test, 183 Student’s t-statistic, 396 Student’s t-test, mathematical description, 221 Student’s t-value for a regression, 189, 390 Studentized t-test for the residual, 183–4 Subclasses, 378 Subsamples, 478 Subtraction, 6, 10, 70, 79, 449 Sum of differences, 494 Sum of squares, 23, 34, 58, 70, 421, 432, 449, 450, 457, 458, 470, 475 between-groups, 70, 449–50, 457–8 due to error, 34, 70, 470 for regression, 34 for residuals, 421 within-groups, 449 Summation, 490 notation, 30, 381. 382, 395, 396 of variance from several data sets, 490 Super-whiz-bang chemometrics, 149 Survey, 478 Systematic effects, 151, 170, 171, 449 Systematic error (bias), 171, 187, 201, 209, 477 Systematic errors for methods A vs. B, 188, 219 T-distribution, nature of, 93, 103–4 T-statistic, 189, 395 T-table, 189, 191, 192, 216, 222 T-test, 57, 59, 93, 183, 189, 191, 192, 221, 439 T, calculation, 191 T, F1/2 , 189 Tangent of the x direction angle, 75 Test for non-linearity, 431, 435 Test samples, 135, 168, 508 Test spectrum, 493, 495, 496 row matrix, 495 Test statistic, 93, 94, 122, 189, 191, 221, 392, 404, 488 Testing correlation for different size populations, 392, 404 Testing for non-linearity, 421, 435
525 Testing for systematic error in a method, 183 Tests for non-linearity, 133–4 Tests for randomness, 33 Thermal, independent noise, 263 Third-order data, 499 3-D to 2-D projection, 81 3rd -order data, 499, 500 Three-dimensional data, 81, 501 Three-dimensional surface plot, 502, 503 Three factor, two level, 89 Total degrees of freedom, calculation, 59, 70, 216, 487–8 calculation, 475 Training set (Calibration set), 135, 137, 379, 389, 495 Transfer of calibrations, 3 Transmittance multiplication factor, 331 Transmittance, 275 Transpose of a matrix, 12, 28 Trigonometric functions of a right triangle, 82 True derivative, 347 True error, 231 True value, 175, 183, 194, 202, 216, 217, 468 Trumpet curve, 483 2-D into 1-D by rotation, 84 Two-dimensional contour: map overlay onto a three-dimensional surface plot, 502 plot, 501, 503 Two-dimensional coordinate space, 72 Two-dimensional reduction, 83 Two equations and two unknowns, 43 Two-factor design, 63 Two-factor, two-level crossed experiment, 91–2 Two-sample charts, 188, 219 Two-way analysis of variance (ANOVA), 64, 65 Type of noise, 246, 332 U.S. Environmental Protection Agency (EPA), 484 UCL, upper confidence limit, 390, 391 Unaddressed Problems in Chemometrics, 135 Unbiased estimators, 431, 487 Uncertainty in an Analytical Measurement, 487 Uncontrolled, non-systematic variable, 187–8 Undefined error, 34, 251 Unexplained Error, 28 Uniformly distributed noise, 275, 277, 278
Unit matrix, 23, 25, 26, 469 Univariate least squares regression, or Simple least squares regression (SLSR), 3 Univariate methods, 419, 421 Univariate statistics, 6–7, 123–4, 419–21, 462–3 Unsolved problems, 135, 162 Unsystematic (random) errors, 52, 64, 66, 170–1, 188–9, 421, 424, 447–9, 453, 460, 462, 463 Upper confidence limit (UCL), 390, 391 Upper critical limit, 98 Upper Limit, 271, 307, 389, 390, 404, 407, 408, 439 Validation, 133, 135–7, 375, 419, 464 of calibration models, 3 parameters, 420 Validity of a test set, 135 Variability, measures of, 61, 98, 260, 277, 429 Variable, 4, 6, 23, 25, 28, 29, 33, 40, 47, 51, 52, 53, 131, 146, 150, 153, 155, 228, 229, 330 apparent sample size, 483 interaction, 460–2 nested, 54, 62 uncontrolled, non-systematic, 187–8 Variance(s) (σ²), 58, 250 addition of, 357 between groups, 58–9, 212, 215 computation of, 258, 315, 564 definition of, 262, 265, 473 heterogeneous, 356 homogeneous, 268, 376 population (σ²), 52 sample (s² or S²), 431–2, 491 square root of, 474 sum of, 229, 232, 261 techniques, 168, 171, 179–80 terms become infinite at sufficiently small values of Er, 267 of variance, 65, 262, 473 within groups, 58–9 of X, 356, 369, 376 of Y, 376 Variation of pathlength, 225
Variations in temperature, 63, 224–5 Vector(s), 6, 7, 11, 77, 85, 86, 87, 382, 471, 495, 496 addition, 78 division, 78 multiplication, 77, 78 subtraction, 79 Vignetting the beam, 326 Voigtman’s development, 291 Wavelength selection error, 131, 157, 459, 460, 464 Wavelets, 166, 494 Within-laboratory variation, 481 Within-treatment mean square, 70, 175–6 X-axis (abscissa), 71, 82, 93, 121, 125, 131, 285, 336, 337, 342, 343, 350, 364, 378, 415, 424, 425, 449, 450 X-direction angle, 82, 83 X-ray, 223, 281, 282, 285, 298 X-scale, 365, 415 X-variable, 28, 121, 425, 445, 449, 460 X versus Y, 375–7, 379 figure of, 378 X, Y coordinate spatial image, 500 Y-axis (ordinate), 71 Y-direction angle, 72, 73 Y distribution, 376 Y estimate, 34, 120, 122–4, 187–9, 337, 381 Y-intercept, 95, 421 Y variable, 34, 124, 150, 368, 474 Youden, W.J., 171, 172, 187 Youden/Steiner Comparison of Two Methods, 172 Youden’s monograph, 171 Z: calculation of, 403, 404 as number of standard deviations from the hypothesized population mean, 330 Z axis data, 501 Z statistic, 94, 375, 389, 390, 391, 392, 404 Z-test, 93 Zero-crossing, 350 μ (mu), mean of a population, X, 98, 103–104 ρ (rho), population value for correlation coefficient, 103
COLOUR PLATE SECTION
Colour Plate 1 Six samples worth of spectra with two bands, without (left) and with (right) stray light. (see Figure 27-1, p. 132)
[Plot: PLS Loadings. x-axis: Index; y-axis: Loading.]
Colour Plate 2 PLS loadings from the synthetic data used to test the fit of models to nonlinearity. (see Figure 33-1, p. 164)
[Plot: Exact versus approximate solution. x-axis: %T; y-axis: Absorbance noise.]
Colour Plate 3 Absorbance noise as a function of transmittance, for the exact solution (upper curve: equation 42-32) and the approximate solution (lower curve: equation 42-33). The noise-to-signal ratio, i.e., ΔE/Er, was set to 0.01. (see Figure 42-2, p. 237)
[Plot, two panels: "Integration terms" and "Expansion of integral functions". x-axis: ΔEr; curves: f(Er), Normal distribution, Product.]
Colour Plate 4 The Normal curve, the function f(ΔEr) [= Er/(Er + ΔEr)] from equation 43-62, and their product. (see Figure 43-5, p. 248)
[Plot: Multiplication factor for T as a function of Er. x-axis: Er; y-axis: Multiplication factor; labelled curves: σ = 0.1 and σ = 1.0.]
Colour Plate 5 Family of curves of multiplication factor as a function of Er, for different values of the parameter sigma (the noise standard deviation), for Normally distributed error. Values of sigma range from 0.1 to 1.0 for the ten curves shown. (see Figure 43-6, p. 251)
[Plot. x-axis: S/N (Er/ΔEr); y-axis: Transmittance noise.]
Colour Plate 6 Transmittance noise as a function of reference S/N ratio, for alternate analysis (equation 44-68a). The sample transmittance was set to unity. The limit for the value of (Es + ΔEs)/(Er + ΔEr) was set to 10,000 for the upper curve and to 1000 for the lower curve. (see Figure 44-7a-1, p. 263)
[Plot. x-axis: S/N (Er/ΔEr); y-axis: Transmittance noise.]
Colour Plate 7 Expansion of Figure 44-7a-1. (see Figure 44-7a-2, p. 263)
[Plot. x-axis: S/N (Er/ΔEr); y-axis: Transmittance noise; curves: Monte Carlo (equation 44-76a), Theory (equation 44-19), Approx (equation 44-52b).]
Colour Plate 8 Comparison of empirically determined transmittance noise values with those determined according to the low-noise approximations of equation 44-19 and equation 44-52b. (see Figure 44-8a, p. 264)
[Plot. x-axis: S/N (Er/ΔEr); y-axis: Transmittance noise.]
Colour Plate 9 Transmittance noise as a function of reference S/N ratio, at various values of sample transmittance. Blue curve: T = 1. Green curve: T = 0.5. Red curve: T = 0.1. (see Figure 44-9a-1, p. 265)
[Plot. x-axis: S/N (Er/ΔEr); y-axis: Transmittance noise; curves: T = 1, T = 0.5, T = 0.1.]
Colour Plate 10 Expansion of Figure 44-9a-1. (see Figure 44-9a-2, p. 265)
[Plot. x-axis: Transmittance; y-axis: Transmittance noise; curves: S/N = 4 and S/N = 4.5.]
Colour Plate 11 Transmittance noise as a function of transmittance, for different values of reference energy S/N ratio (recall that, since the standard deviation of the noise equals unity, the set value of the reference energy equals the S/N ratio). (see Figure 44-10a, p. 266)
[Plot. x-axis: S/N (Er/ΔEr); y-axis: Absorbance noise; curves: Computed, Theory.]
Colour Plate 12 Comparison of computed absorbance noise to the theoretical value (according to equation 44-32), as a function of S/N ratio, for constant transmittance (set to unity). (see Figure 44-11a-1, p. 267)
[Plot. x-axis: S/N (Er/ΔEr); y-axis: Absorbance noise; curves: Computed, Theory.]
Colour Plate 13 Expansion of Figure 44-11a-1. (see Figure 44-11a-2, p. 268)
[Plot. x-axis: %T; y-axis: SD(A)/A; curves: Er = 10 and Er = 3.]
Colour Plate 14 Family of curves for SD(A)/A for different values of Er. (see Figure 45-10, p. 273)
[Plot, two panels: "Variances using 5,000 and 100,000 values" and "Expansion of plot". x-axis: Er; y-axis: Variance; curves: Er term, 100,000 values; Es term, 100,000 values; 5,000 values.]
Colour Plate 15 Values of the variances in the two terms of equation 45-77, using different numbers of values. (see Figure 45-12, p. 275)
[Plot, panel (a): Poisson distribution, 1 ≤ λ ≤ 11. x-axis: X; y-axis: P(X); labelled curves: λ = 1 and λ = 11.]