Adaptive Filter Theory Solution manual only (4th Edition)

CHAPTER 1

1.1

Let

r_u(k) = E[u(n) u*(n - k)]    (1)

r_y(k) = E[y(n) y*(n - k)]    (2)

We are given that

y(n) = u(n + a) - u(n - a)    (3)

Hence, substituting Eq. (3) into (2), and then using Eq. (1), we get

r_y(k) = E[(u(n + a) - u(n - a))(u*(n + a - k) - u*(n - a - k))]
       = 2 r_u(k) - r_u(2a + k) - r_u(-2a + k)

1.2

We know that the correlation matrix R is Hermitian; that is,

R^H = R

Given that the inverse matrix R^{-1} exists, we may write

R^{-1} R = I

where I is the identity matrix. Taking the Hermitian transpose of both sides, and using R^H = R:

R R^{-H} = I

Hence,

R^{-H} = R^{-1}

That is, the inverse matrix R^{-1} is Hermitian.

1.3

For the case of a two-by-two matrix, we may write

R_u = R_s + R_ν

    = [ r_11  r_12 ] + [ σ^2   0  ]
      [ r_21  r_22 ]   [  0   σ^2 ]

    = [ r_11 + σ^2      r_12      ]
      [   r_21       r_22 + σ^2   ]

For R_u to be nonsingular, we require

det(R_u) = (r_11 + σ^2)(r_22 + σ^2) - r_12 r_21 > 0

With r_12 = r_21 for real data, this condition reduces to

(r_11 + σ^2)(r_22 + σ^2) - r_12^2 > 0

Since this is quadratic in σ^2, we may impose the following condition on σ^2 for nonsingularity of R_u:

σ^2 > (1/2)(r_11 + r_22) [ (1 - 4Δ_r/(r_11 + r_22)^2)^{1/2} - 1 ]

where Δ_r = r_11 r_22 - r_12^2.

1.4

We are given

R = [ 1  1 ]
    [ 1  1 ]

This matrix is nonnegative definite because

a^T R a = [a_1  a_2] [ 1  1 ] [ a_1 ]
                     [ 1  1 ] [ a_2 ]

        = a_1^2 + 2 a_1 a_2 + a_2^2

        = (a_1 + a_2)^2 ≥ 0   for all a_1 and a_2

It is not strictly positive definite, since a^T R a = 0 for the nonzero vector with a_1 = -a_2. Moreover, the matrix R is singular because

det(R) = (1)(1) - (1)(1) = 0

Hence, it is possible for a matrix to be nonnegative definite and yet singular.

1.5

(a)

R_{M+1} = [ r(0)  r^H ]    (1)
          [  r    R_M ]

Let

R_{M+1}^{-1} = [ a  b^H ]    (2)
               [ b   C  ]

where the scalar a, the vector b, and the matrix C are to be determined. Multiplying (1) by (2):

I_{M+1} = [ r(0)  r^H ] [ a  b^H ]
          [  r    R_M ] [ b   C  ]

where I_{M+1} is the identity matrix. Therefore,

r(0) a + r^H b = 1    (3)

r a + R_M b = 0    (4)

r b^H + R_M C = I_M    (5)

r(0) b^H + r^H C = 0^T    (6)

From Eq. (4):

b = -R_M^{-1} r a    (7)

Hence, from (3) and (7):

a = 1 / (r(0) - r^H R_M^{-1} r)    (8)

Correspondingly,

b = -R_M^{-1} r / (r(0) - r^H R_M^{-1} r)    (9)

From (5):

C = R_M^{-1} - R_M^{-1} r b^H

  = R_M^{-1} + R_M^{-1} r r^H R_M^{-1} / (r(0) - r^H R_M^{-1} r)    (10)

As a check, the results of Eqs. (9) and (10) should satisfy Eq. (6):

r(0) b^H + r^H C = -r(0) r^H R_M^{-1} / (r(0) - r^H R_M^{-1} r) + r^H R_M^{-1}
                   + r^H R_M^{-1} r r^H R_M^{-1} / (r(0) - r^H R_M^{-1} r)
                 = 0^T

We have thus shown that

R_{M+1}^{-1} = [ 0    0^T     ] + a [      1      ] [ 1   -r^H R_M^{-1} ]
               [ 0   R_M^{-1} ]     [ -R_M^{-1} r ]

where the scalar a is defined by Eq. (8).
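As a numerical sanity check (not part of the original solution), the partitioned-inverse formula of part (a) can be verified with NumPy; the dimension M and the random Hermitian positive-definite test matrix below are illustrative assumptions:

```python
import numpy as np

# Verify R_{M+1}^{-1} = [[0, 0^T], [0, R_M^{-1}]] + a * u u^H,
# with u = [1, -R_M^{-1} r] and a from Eq. (8).
rng = np.random.default_rng(0)
M = 4
X = rng.normal(size=(M + 1, 8)) + 1j * rng.normal(size=(M + 1, 8))
R = X @ X.conj().T / 8                      # Hermitian, positive definite

r0 = R[0, 0].real                           # r(0)
r = R[1:, 0].copy()                         # column vector r
RM_inv = np.linalg.inv(R[1:, 1:])           # R_M^{-1}

a = 1.0 / (r0 - r.conj() @ RM_inv @ r)      # Eq. (8)
u = np.concatenate(([1.0], -RM_inv @ r))
block = np.zeros((M + 1, M + 1), dtype=complex)
block[1:, 1:] = RM_inv
R_inv_formula = block + a * np.outer(u, u.conj())

err = np.abs(R_inv_formula - np.linalg.inv(R)).max()
print(err)   # should be at machine-precision level
```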

(b)

R_{M+1} = [ R_M     r^{B*} ]    (11)
          [ r^{BT}  r(0)   ]

Let

R_{M+1}^{-1} = [ D    e ]    (12)
               [ e^H  f ]

where the matrix D, the vector e, and the scalar f are to be determined. Multiplying (11) by (12):

I_{M+1} = [ R_M     r^{B*} ] [ D    e ]
          [ r^{BT}  r(0)   ] [ e^H  f ]

Therefore,

R_M D + r^{B*} e^H = I    (13)

R_M e + r^{B*} f = 0    (14)

r^{BT} e + r(0) f = 1    (15)

r^{BT} D + r(0) e^H = 0^T    (16)

From (14):

e = -R_M^{-1} r^{B*} f    (17)

Hence, from (15) and (17):

f = 1 / (r(0) - r^{BT} R_M^{-1} r^{B*})    (18)

Correspondingly,

e = -R_M^{-1} r^{B*} / (r(0) - r^{BT} R_M^{-1} r^{B*})    (19)

From (13):

D = R_M^{-1} - R_M^{-1} r^{B*} e^H

  = R_M^{-1} + R_M^{-1} r^{B*} r^{BT} R_M^{-1} / (r(0) - r^{BT} R_M^{-1} r^{B*})    (20)

As a check, the results of Eqs. (19) and (20) must satisfy Eq. (16). Thus

r^{BT} D + r(0) e^H = r^{BT} R_M^{-1}
                      + r^{BT} R_M^{-1} r^{B*} r^{BT} R_M^{-1} / (r(0) - r^{BT} R_M^{-1} r^{B*})
                      - r(0) r^{BT} R_M^{-1} / (r(0) - r^{BT} R_M^{-1} r^{B*})
                    = 0^T

We have thus shown that

R_{M+1}^{-1} = [ R_M^{-1}  0 ] + f [ -R_M^{-1} r^{B*} ] [ -r^{BT} R_M^{-1}   1 ]
               [ 0^T       0 ]     [        1         ]

where the scalar f is defined by Eq. (18).

1.6

(a) We express the difference equation describing the first-order AR process u(n) as

u(n) = v(n) + w_1 u(n - 1)

where w_1 = -a_1. Solving this equation by repeated substitution, we get

u(n) = v(n) + w_1 v(n - 1) + w_1^2 u(n - 2)
     = ...
     = v(n) + w_1 v(n - 1) + w_1^2 v(n - 2) + ... + w_1^{n-1} v(1)    (1)

Here we have used the initial condition u(0) = 0, or equivalently u(1) = v(1). Taking the expected value of both sides of Eq. (1) and using

E[v(n)] = μ   for all n,

we get the geometric series

E[u(n)] = μ + w_1 μ + w_1^2 μ + ... + w_1^{n-1} μ

        = μ (1 - w_1^n)/(1 - w_1),   w_1 ≠ 1
        = μ n,                       w_1 = 1

This result shows that if μ ≠ 0, then E[u(n)] is a function of time n. Accordingly, the AR process u(n) is not stationary. If, however, the AR parameter satisfies the condition |a_1| < 1, or |w_1| < 1, then

E[u(n)] → μ / (1 - w_1)   as n → ∞

Under this condition, we say that the AR process is asymptotically stationary to order one.

(b) When the white noise process v(n) has zero mean, the AR process u(n) will likewise have zero mean. Then

var[v(n)] = σ_v^2

var[u(n)] = E[u^2(n)]    (2)

Substituting Eq. (1) into (2), and recognizing that for the white noise process

E[v(n) v(k)] = σ_v^2,   n = k
             = 0,       n ≠ k    (3)

we get the geometric series

var[u(n)] = σ_v^2 (1 + w_1^2 + w_1^4 + ... + w_1^{2n-2})

          = σ_v^2 (1 - w_1^{2n})/(1 - w_1^2),   w_1 ≠ 1
          = σ_v^2 n,                            w_1 = 1

When |a_1| < 1, or |w_1| < 1, then

var[u(n)] ≈ σ_v^2 / (1 - w_1^2) = σ_v^2 / (1 - a_1^2)   for large n

(c) The autocorrelation function of the AR process u(n) equals E[u(n) u(n - k)]. Substituting Eq. (1) into this formula, and using Eq. (3), we get

E[u(n) u(n - k)] = σ_v^2 (w_1^k + w_1^{k+2} + ... + w_1^{k+2n-2})

                 = σ_v^2 w_1^k (1 - w_1^{2n})/(1 - w_1^2),   w_1 ≠ 1
                 = σ_v^2 n,                                  w_1 = 1

For |a_1| < 1, or |w_1| < 1, we may therefore express this autocorrelation function as

r(k) = E[u(n) u(n - k)] ≈ σ_v^2 w_1^k / (1 - w_1^2)   for large n

Case 1: 0 < a_1 < 1. In this case, w_1 = -a_1 is negative, and r(k) alternates in sign as it decays with lag k.

[Figure: r(k) versus lag k, decaying envelope with alternating signs.]

Case 2: -1 < a_1 < 0. In this case, w_1 = -a_1 is positive, and r(k) decays monotonically with lag k while remaining positive.

[Figure: r(k) versus lag k, monotonically decaying positive sequence.]
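The large-n variance and autocorrelation results of parts (b) and (c) can be checked by simulation; the parameter choice w_1 = 0.8 and unit noise variance below are illustrative assumptions, not values from the problem:

```python
import numpy as np

# Monte Carlo check of var[u(n)] ~ sigma_v^2/(1 - w1^2) and
# r(1) ~ w1 * var[u(n)] for a first-order AR process.
rng = np.random.default_rng(1)
w1, sigma_v2, N = 0.8, 1.0, 200_000
v = rng.normal(scale=np.sqrt(sigma_v2), size=N)
u = np.empty(N)
u[0] = v[0]
for n in range(1, N):
    u[n] = w1 * u[n - 1] + v[n]          # u(n) = v(n) + w1 u(n-1)

u = u[1000:]                             # discard the start-up transient
var_est = u.var()
var_theory = sigma_v2 / (1.0 - w1**2)
r1_est = np.mean(u[1:] * u[:-1])
r1_theory = w1 * var_theory
print(var_est, var_theory, r1_est, r1_theory)
```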

1.7


(a) The second-order AR process u(n) is described by the difference equation

u(n) = u(n - 1) - 0.5 u(n - 2) + v(n)

Hence

w_1 = 1,   w_2 = -0.5

and the AR parameters equal

a_1 = -1,   a_2 = 0.5

Accordingly, we write the Yule-Walker equations as

[ r(0)  r(1) ] [  1   ] = [ r(1) ]
[ r(1)  r(0) ] [ -0.5 ]   [ r(2) ]

(b) Writing the Yule-Walker equations in expanded form:

r(0) - 0.5 r(1) = r(1)
r(1) - 0.5 r(0) = r(2)

Solving the first relation for r(1):

r(1) = (2/3) r(0)    (1)

Solving the second relation for r(2):

r(2) = (1/6) r(0)    (2)
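The expanded Yule-Walker relations of part (b) form a small linear system in the unknowns r(1) and r(2); a quick check with NumPy, taking r(0) = 1 for normalization:

```python
import numpy as np

# r(0) - 0.5 r(1) = r(1)   ->  1.5 r(1)        = r(0)
# r(1) - 0.5 r(0) = r(2)   ->  -r(1) + r(2)    = -0.5 r(0)
r0 = 1.0
A = np.array([[1.5, 0.0],
              [-1.0, 1.0]])
b = np.array([r0, -0.5 * r0])
r1, r2 = np.linalg.solve(A, b)
print(r1, r2)   # 2/3 and 1/6, as derived above
```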

(c) Since the noise v(n) has zero mean, so will the AR process u(n). Hence,

var[u(n)] = E[u^2(n)] = r(0)

We know that

σ_v^2 = Σ_{k=0}^{2} a_k r(k)
      = r(0) + a_1 r(1) + a_2 r(2)    (3)

with a_0 = 1. Substituting (1) and (2) into (3), and solving for r(0), we get

r(0) = σ_v^2 / (1 + (2/3) a_1 + (1/6) a_2) = σ_v^2 / (5/12) = 1.2

for the given noise variance σ_v^2 = 0.5.

1.8

By definition,

P_0 = average power of the AR process u(n)
    = E[|u(n)|^2]
    = r(0)    (1)

where r(0) is the autocorrelation function of u(n) for zero lag. We note that the set of AR parameters {a_1, a_2, ..., a_M} is in one-to-one correspondence with the set of normalized correlations

{ r(1)/r(0), r(2)/r(0), ..., r(M)/r(0) }

Equivalently, except for the scaling factor r(0), {a_1, a_2, ..., a_M} corresponds to

{ r(1), r(2), ..., r(M) }    (2)

Combining Eqs. (1) and (2), the set { P_0, a_1, a_2, ..., a_M } uniquely determines { r(0), r(1), r(2), ..., r(M) }, and vice versa.

1.9

(a) The transfer function of the MA model of Fig. 2.3 is

H(z) = 1 + b_1^* z^{-1} + b_2^* z^{-2} + ... + b_K^* z^{-K}

(b) The transfer function of the ARMA model of Fig. 2.4 is

H(z) = (b_0^* + b_1^* z^{-1} + b_2^* z^{-2} + ... + b_K^* z^{-K}) / (1 + a_1^* z^{-1} + a_2^* z^{-2} + ... + a_M^* z^{-M})

(c) The ARMA model reduces to an AR model when

b_1 = b_2 = ... = b_K = 0

It reduces to an MA model when

a_1 = a_2 = ... = a_M = 0

1.10

We are given

x(n) = v(n) + 0.75 v(n - 1) + 0.25 v(n - 2)

Taking the z-transforms of both sides:

X(z) = (1 + 0.75 z^{-1} + 0.25 z^{-2}) V(z)

Hence, the transfer function of the MA model is

X(z)/V(z) = 1 + 0.75 z^{-1} + 0.25 z^{-2}
          = 1 / (1 + 0.75 z^{-1} + 0.25 z^{-2})^{-1}    (1)

Using long division, we may perform the following expansion of the denominator in Eq. (1):

(1 + 0.75 z^{-1} + 0.25 z^{-2})^{-1}
  = 1 - (3/4) z^{-1} + (5/16) z^{-2} - (3/64) z^{-3} - (11/256) z^{-4} + (45/1024) z^{-5}
    - (91/4096) z^{-6} + (93/16384) z^{-7} + (85/65536) z^{-8} - (627/262144) z^{-9}
    + (1541/1048576) z^{-10} + ...

  ≈ 1 - 0.75 z^{-1} + 0.3125 z^{-2} - 0.0469 z^{-3} - 0.043 z^{-4} + 0.0439 z^{-5}
    - 0.0222 z^{-6} + 0.0057 z^{-7} + 0.0013 z^{-8} - 0.0024 z^{-9} + 0.0015 z^{-10}    (2)
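The long-division coefficients in Eq. (2) satisfy the simple recursion c_n = -0.75 c_{n-1} - 0.25 c_{n-2} with c_0 = 1, which makes them easy to reproduce:

```python
import numpy as np

# Coefficients of 1/(1 + 0.75 z^-1 + 0.25 z^-2), computed by the recursion
# c[n] = -0.75 c[n-1] - 0.25 c[n-2], c[0] = 1 (c[-1] taken as 0).
c = np.zeros(11)
c[0] = 1.0
for n in range(1, 11):
    c[n] = -0.75 * c[n - 1] - (0.25 * c[n - 2] if n >= 2 else 0.0)
print(np.round(c, 4))
```

The printed values match the decimal approximations quoted in Eq. (2).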

(a) M = 2. Retaining terms in Eq. (2) up to z^{-2}, we may approximate the MA model with an AR model of order two as follows:

X(z)/V(z) ≈ 1 / (1 - 0.75 z^{-1} + 0.3125 z^{-2})

(b) M = 5. Retaining terms in Eq. (2) up to z^{-5}, we obtain the following approximation in the form of an AR model of order five:

X(z)/V(z) ≈ 1 / (1 - 0.75 z^{-1} + 0.3125 z^{-2} - 0.0469 z^{-3} - 0.043 z^{-4} + 0.0439 z^{-5})

(c) M = 10. Finally, retaining terms in Eq. (2) up to z^{-10}, we obtain the following approximation in the form of an AR model of order ten:

X(z)/V(z) ≈ 1/D(z)

where D(z) is given by the polynomial on the right-hand side of Eq. (2).

1.11

(a) The filter output is

x(n) = w^H u(n)

where u(n) is the tap-input vector. The average power of the filter output is therefore

E[|x(n)|^2] = E[w^H u(n) u^H(n) w]
            = w^H E[u(n) u^H(n)] w
            = w^H R w

(b) If u(n) is extracted from a zero-mean white noise of variance σ^2, we have

R = σ^2 I

where I is the identity matrix. Hence,

E[|x(n)|^2] = σ^2 w^H w

1.12

(a) The process u(n) is a linear combination of Gaussian samples. Hence, u(n) is Gaussian.

(b) From inverse filtering, we recognize that v(n) may also be expressed as a linear combination of the samples of u(n). Hence, if u(n) is Gaussian, then v(n) is also Gaussian.

1.13

(a) From the Gaussian moment factoring theorem:

E[(u_1 u_2^*)^k] = E[u_1 ... u_1 u_2^* ... u_2^*]
                 = k! E[u_1 u_2^*] ... E[u_1 u_2^*]
                 = k! (E[u_1 u_2^*])^k    (1)

(b) Putting u_2 = u_1 = u, Eq. (1) reduces to

E[|u|^{2k}] = k! (E[|u|^2])^k
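The result of part (b) can be spot-checked by Monte Carlo for k = 2 (where the factor is 2! = 2); the unit-variance complex Gaussian below is an illustrative choice:

```python
import numpy as np

# Check E[|u|^4] = 2 (E[|u|^2])^2 for a zero-mean complex Gaussian u.
rng = np.random.default_rng(2)
N = 1_000_000
u = rng.normal(size=N) + 1j * rng.normal(size=N)   # E[|u|^2] = 2
m2 = np.mean(np.abs(u) ** 2)
m4 = np.mean(np.abs(u) ** 4)
print(m4, 2 * m2**2)   # the two values should nearly agree
```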

1.14

It is not permissible to interchange the order of expectation and limiting operations in Eq. (1.113). The reason is that the expectation is a linear operation, whereas the limiting operation with respect to the number of samples N is nonlinear.

1.15

The filter output is

y(n) = Σ_i h(i) u(n - i)

Similarly, we may write

y(m) = Σ_k h(k) u(m - k)

Hence,

r_y(n, m) = E[y(n) y^*(m)]
          = E[ Σ_i h(i) u(n - i) Σ_k h^*(k) u^*(m - k) ]
          = Σ_i Σ_k h(i) h^*(k) E[u(n - i) u^*(m - k)]
          = Σ_i Σ_k h(i) h^*(k) r_u(n - i, m - k)

1.16

The mean-square value of the filter output in response to white noise input is

P_o = 2 σ^2 Δω / π

The value Po is linearly proportional to the filter bandwidth ∆ω. This relation holds irrespective of how small ∆ω is, compared to the mid-band frequency of the filter. 1.17

(a) The variance of the filter output is

σ_y^2 = 2 σ^2 Δω / π

We are given

σ^2 = 0.1 volt^2
Δω = 2π × 1 radians/sec

Hence,

σ_y^2 = (2 × 0.1 × 2π) / π = 0.4 volt^2

(b) The pdf of the filter output y is

f(y) = (1 / (√(2π) σ_y)) exp(-y^2 / (2 σ_y^2))
     = (1 / (0.63 √(2π))) exp(-y^2 / 0.8)
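The arithmetic of part (a) is a one-liner worth confirming:

```python
import math

# sigma_y^2 = 2 * sigma^2 * delta_omega / pi, with the values given above.
sigma2 = 0.1                 # volt^2
delta_omega = 2 * math.pi    # rad/s
sigma_y2 = 2 * sigma2 * delta_omega / math.pi
print(sigma_y2)   # ~0.4, and sigma_y = sqrt(0.4) ~ 0.63 as used in the pdf
```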

1.18

(a) We are given

U_k = Σ_{n=0}^{N-1} u(n) exp(-j n ω_k),   k = 0, 1, ..., N-1

where u(n) is real valued and

ω_k = (2π/N) k

Hence,

E[U_k U_l^*] = E[ Σ_{n=0}^{N-1} Σ_{m=0}^{N-1} u(n) u(m) exp(-j n ω_k + j m ω_l) ]
             = Σ_{n=0}^{N-1} Σ_{m=0}^{N-1} exp(-j n ω_k + j m ω_l) E[u(n) u(m)]
             = Σ_{n=0}^{N-1} Σ_{m=0}^{N-1} exp(-j n ω_k + j m ω_l) r(n - m)
             = Σ_{m=0}^{N-1} exp(j m ω_l) Σ_{n=0}^{N-1} r(n - m) exp(-j n ω_k)    (1)

By definition, we also have

Σ_{n=0}^{N-1} r(n) exp(-j n ω_k) = S_k

Moreover, since r(n) is periodic with period N, we may invoke the time-shifting property of the discrete Fourier transform to write

Σ_{n=0}^{N-1} r(n - m) exp(-j n ω_k) = exp(-j m ω_k) S_k

Thus, recognizing that ω_k = (2π/N) k, Eq. (1) reduces to

E[U_k U_l^*] = S_k Σ_{m=0}^{N-1} exp(j m (ω_l - ω_k))

             = S_k,   l = k
             = 0,     otherwise

(b) Part (a) shows that the complex spectral samples U_k are uncorrelated. If they are Gaussian, then they will also be statistically independent. Hence,

f_U(U_0, U_1, ..., U_{N-1}) = (1 / ((2π)^N det(Λ))) exp(-(1/2) U^H Λ^{-1} U)

where

U = [U_0, U_1, ..., U_{N-1}]^T

Λ = (1/2) E[U U^H] = (1/2) diag(S_0, S_1, ..., S_{N-1})

det(Λ) = (1/2^N) Π_{k=0}^{N-1} S_k

Therefore,

f_U(U_0, U_1, ..., U_{N-1}) = (1 / ((2π)^N 2^{-N} Π_{k=0}^{N-1} S_k)) exp(-Σ_{k=0}^{N-1} |U_k|^2 / S_k)

                            = π^{-N} exp( Σ_{k=0}^{N-1} ( -|U_k|^2 / S_k - ln S_k ) )

1.19

The mean-square value of the increment process dz(ω) is

E[|dz(ω)|^2] = S(ω) dω

Hence E[|dz(ω)|^2] is measured in watts.

1.20

The third-order cumulant of a process u(n) is

c_3(τ_1, τ_2) = E[u(n) u(n + τ_1) u(n + τ_2)]
              = third-order moment

All odd-order moments of a zero-mean Gaussian process are known to be zero; hence,

c_3(τ_1, τ_2) = 0

The fourth-order cumulant is

c_4(τ_1, τ_2, τ_3) = E[u(n) u(n + τ_1) u(n + τ_2) u(n + τ_3)]
                     - E[u(n) u(n + τ_1)] E[u(n + τ_2) u(n + τ_3)]
                     - E[u(n) u(n + τ_2)] E[u(n + τ_1) u(n + τ_3)]
                     - E[u(n) u(n + τ_3)] E[u(n + τ_1) u(n + τ_2)]

For a zero-mean Gaussian process, the fourth-order moment factors into exactly these three pairwise products; for example, at τ_1 = τ_2 = τ_3 = 0 the fourth-order moment of a Gaussian variate of variance σ^2 is 3σ^4, while each pairwise term contributes σ^4. Hence, the fourth-order cumulant is zero. Indeed, all cumulants of a Gaussian process higher than order two are zero.

1.21

The trispectrum is

C_4(ω_1, ω_2, ω_3) = Σ_{τ_1=-∞}^{∞} Σ_{τ_2=-∞}^{∞} Σ_{τ_3=-∞}^{∞} c_4(τ_1, τ_2, τ_3) e^{-j(ω_1 τ_1 + ω_2 τ_2 + ω_3 τ_3)}

Let the process be passed through a three-dimensional band-pass filter centered on ω_1, ω_2, and ω_3. We assume that the bandwidth (along each dimension) is small compared to the respective center frequency. The average power of the filter output is then proportional to the trispectrum C_4(ω_1, ω_2, ω_3).

1.22

(a) Starting with the formula

c_k(τ_1, τ_2, ..., τ_{k-1}) = γ_k Σ_{i=-∞}^{∞} h_i h_{i+τ_1} ... h_{i+τ_{k-1}}

the third-order cumulant of the filter output is

c_3(τ_1, τ_2) = γ_3 Σ_{i=-∞}^{∞} h_i h_{i+τ_1} h_{i+τ_2}

where γ_3 is the third-order cumulant of the filter input. The bispectrum is

C_3(ω_1, ω_2) = Σ_{τ_1=-∞}^{∞} Σ_{τ_2=-∞}^{∞} c_3(τ_1, τ_2) e^{-j(ω_1 τ_1 + ω_2 τ_2)}

              = γ_3 Σ_{i=-∞}^{∞} Σ_{τ_1=-∞}^{∞} Σ_{τ_2=-∞}^{∞} h_i h_{i+τ_1} h_{i+τ_2} e^{-j(ω_1 τ_1 + ω_2 τ_2)}

Hence,

C_3(ω_1, ω_2) = γ_3 H(e^{jω_1}) H(e^{jω_2}) H^*(e^{j(ω_1 + ω_2)})

(b) From this formula, we immediately deduce that

arg[C_3(ω_1, ω_2)] = arg H(e^{jω_1}) + arg H(e^{jω_2}) - arg H(e^{j(ω_1 + ω_2)})

1.23

The output of a filter of impulse response h_i due to an input u(n) is given by the convolution sum

y(n) = Σ_i h_i u(n - i)

The third-order cumulant of the filter output is

C_3(τ_1, τ_2) = E[y(n) y(n + τ_1) y(n + τ_2)]
              = E[ Σ_i h_i u(n - i) Σ_k h_k u(n + τ_1 - k) Σ_l h_l u(n + τ_2 - l) ]
              = E[ Σ_i h_i u(n - i) Σ_k h_{k+τ_1} u(n - k) Σ_l h_{l+τ_2} u(n - l) ]
              = Σ_i Σ_k Σ_l h_i h_{k+τ_1} h_{l+τ_2} E[u(n - i) u(n - k) u(n - l)]

For an input sequence of independent and identically distributed random variables, we note that

E[u(n - i) u(n - k) u(n - l)] = γ_3,   i = k = l
                              = 0,     otherwise

Hence,

C_3(τ_1, τ_2) = γ_3 Σ_{i=-∞}^{∞} h_i h_{i+τ_1} h_{i+τ_2}

In general, we may thus write

C_k(τ_1, τ_2, ..., τ_{k-1}) = γ_k Σ_{i=-∞}^{∞} h_i h_{i+τ_1} ... h_{i+τ_{k-1}}
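The cumulant formula above can be spot-checked by Monte Carlo with an i.i.d. non-Gaussian input; the exponential input (third cumulant γ_3 = 2) and the two-tap filter h = [1, 0.5] below are illustrative assumptions:

```python
import numpy as np

# Check C3(tau1, tau2) = gamma3 * sum_i h_i h_{i+tau1} h_{i+tau2}
# at (0, 0) and (0, 1) for x = Exp(1) - 1 filtered by h = [1, 0.5].
rng = np.random.default_rng(3)
N = 2_000_000
x = rng.exponential(1.0, size=N) - 1.0     # zero mean, gamma3 = 2
h = np.array([1.0, 0.5])
y = np.convolve(x, h)[:N]                  # filter output

gamma3 = 2.0
c3_00_theory = gamma3 * np.sum(h**3)                 # tau1 = tau2 = 0
c3_01_theory = gamma3 * np.sum(h[:-1]**2 * h[1:])    # tau1 = 0, tau2 = 1
c3_00_est = np.mean(y**3)
c3_01_est = np.mean(y[:-1]**2 * y[1:])
print(c3_00_est, c3_00_theory, c3_01_est, c3_01_theory)
```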

1.24

By definition:

r^{(α)}(k) = (1/N) Σ_{n=0}^{N-1} E[u(n) u^*(n - k) e^{-j2παn}] e^{jπαk}

Hence,

r^{(α)}(-k) = (1/N) Σ_{n=0}^{N-1} E[u(n) u^*(n + k) e^{-j2παn}] e^{-jπαk}

r^{(α)*}(k) = (1/N) Σ_{n=0}^{N-1} E[u^*(n) u(n - k) e^{j2παn}] e^{-jπαk}

We are told that the process u(n) is cyclostationary, which means that

E[u(n) u^*(n + k) e^{-j2παn}] = E[u^*(n) u(n - k) e^{j2παn}]

It follows therefore that

r^{(α)}(-k) = r^{(α)*}(k)

1.25

For α = 0, the input to the time-average cross-correlator reduces to the squared amplitude of the narrow-band filter output with mid-band frequency ω. Correspondingly, the time-average cross-correlator reduces to an average power meter. Thus, for α = 0, the instrumentation of Fig. 1.16 reduces to that of Fig. 1.13.


CHAPTER 2

2.1

(a) Let

w_k = x + jy
p(-k) = a + jb

We may then write

f = w_k p^*(-k)
  = (x + jy)(a - jb)
  = (ax + by) + j(ay - bx)

Let f = u + jv with

u = ax + by
v = ay - bx

Hence,

∂u/∂x = a,   ∂u/∂y = b
∂v/∂y = a,   ∂v/∂x = -b

From these results we immediately see that

∂u/∂x = ∂v/∂y
∂v/∂x = -∂u/∂y

In other words, the product term w_k p^*(-k) satisfies the Cauchy-Riemann equations, and so this term is analytic.

(b) Let

f = w_k^* p(-k)
  = (x - jy)(a + jb)
  = (ax + by) + j(bx - ay)

Let f = u + jv with

u = ax + by
v = bx - ay

Hence,

∂u/∂x = a,   ∂u/∂y = b
∂v/∂x = b,   ∂v/∂y = -a

From these results we immediately see that (for a, b not both zero)

∂u/∂x ≠ ∂v/∂y
∂v/∂x ≠ -∂u/∂y

In other words, the product term w_k^* p(-k) does not satisfy the Cauchy-Riemann equations, and so this term is not analytic.

2.2

(a) From the Wiener-Hopf equation, we have

w_o = R^{-1} p    (1)

We are given

R = [ 1    0.5 ]        p = [ 0.5  ]
    [ 0.5  1   ]            [ 0.25 ]

Hence, the inverse matrix R^{-1} is

R^{-1} = (1/0.75) [  1    -0.5 ]
                  [ -0.5   1   ]

Using Eq. (1), we therefore get

w_o = (1/0.75) [  1    -0.5 ] [ 0.5  ]
               [ -0.5   1   ] [ 0.25 ]

    = (1/3) [ 1.5 ] = [ 0.5 ]
            [ 0   ]   [ 0   ]

(b) The minimum mean-square error is

J_min = σ_d^2 - p^H w_o

      = σ_d^2 - [0.5  0.25] [ 0.5 ]
                            [ 0   ]

      = σ_d^2 - 0.25

(c) The eigenvalues of matrix R are roots of the characteristic equation

(1 - λ)^2 - (0.5)^2 = 0

That is, the two roots are

λ_1 = 0.5   and   λ_2 = 1.5

The associated eigenvectors are defined by

R q = λ q

For λ_1 = 0.5, we have

[ 1    0.5 ] [ q_11 ] = 0.5 [ q_11 ]
[ 0.5  1   ] [ q_12 ]       [ q_12 ]

Expanding:

q_11 + 0.5 q_12 = 0.5 q_11
0.5 q_11 + q_12 = 0.5 q_12

Therefore, q_11 = -q_12. Normalizing the eigenvector q_1 to unit length, we have

q_1 = (1/√2) [  1 ]
             [ -1 ]

Similarly, for the eigenvalue λ_2 = 1.5, we may show that

q_2 = (1/√2) [ 1 ]
             [ 1 ]

Accordingly, we may express the Wiener filter in terms of its eigenvalues and eigenvectors as follows:

w_o = ( (1/λ_1) q_1 q_1^H + (1/λ_2) q_2 q_2^H ) p

    = ( (1/0.5)(1/2) [  1  -1 ] + (1/1.5)(1/2) [ 1  1 ] ) [ 0.5  ]
                     [ -1   1 ]                [ 1  1 ]   [ 0.25 ]

    = [ 0.5 ]
      [ 0   ]
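Both the direct Wiener solution and its eigen-expansion above can be confirmed numerically; a minimal sketch with NumPy:

```python
import numpy as np

# Problem 2.2 data.
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.5, 0.25])

wo = np.linalg.solve(R, p)                 # direct solution, ~[0.5, 0]

# Eigen-expansion w_o = sum_i (1/lambda_i) q_i q_i^H p
lam, Q = np.linalg.eigh(R)
wo_eig = sum((Q[:, i] @ p) / lam[i] * Q[:, i] for i in range(2))
print(wo, wo_eig, lam)
```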

2.3

(a) From the Wiener-Hopf equation we have

w_o = R^{-1} p    (1)

We are given

R = [ 1     0.5   0.25 ]        p = [ 0.5   ]
    [ 0.5   1     0.5  ]            [ 0.25  ]
    [ 0.25  0.5   1    ]            [ 0.125 ]

Hence, the use of these values in Eq. (1) yields

w_o = [  1.33  -0.67   0    ] [ 0.5   ]
      [ -0.67   1.67  -0.67 ] [ 0.25  ]
      [  0     -0.67   1.33 ] [ 0.125 ]

    = [ 0.5  0  0 ]^T

(b) The minimum mean-square error is

J_min = σ_d^2 - p^H w_o

      = σ_d^2 - [0.5  0.25  0.125] [ 0.5  0  0 ]^T

      = σ_d^2 - 0.25

(c) The eigenvalues of matrix R are

λ = 0.4069,  0.75,  1.8431

The corresponding eigenvectors constitute the orthogonal matrix

Q = [ -0.4544  -0.7071   0.5418 ]
    [  0.7662   0        0.6426 ]
    [ -0.4544   0.7071   0.5418 ]

Accordingly, we may express the Wiener filter in terms of its eigenvalues and eigenvectors as follows:

w_o = ( Σ_{i=1}^{3} (1/λ_i) q_i q_i^H ) p

    = [ (1/0.4069) [  0.2065  -0.3482   0.2065 ]
                   [ -0.3482   0.5871  -0.3482 ]
                   [  0.2065  -0.3482   0.2065 ]

      + (1/0.75)   [  0.5   0  -0.5 ]
                   [  0     0   0   ]
                   [ -0.5   0   0.5 ]

      + (1/1.8431) [ 0.2935  0.3482  0.2935 ]
                   [ 0.3482  0.4129  0.3482 ]
                   [ 0.2935  0.3482  0.2935 ] ]  [ 0.5   ]
                                                 [ 0.25  ]
                                                 [ 0.125 ]

2.4

By definition, the correlation matrix

R = E[u(n) u^H(n)]

where

u(n) = [u(n), u(n-1), ..., u(0)]^T

Invoking the ergodicity theorem, we may estimate R by the time average

R(N) = (1/(N+1)) Σ_{n=0}^{N} u(n) u^H(n)

Likewise, we may compute the cross-correlation vector

p = E[u(n) d^*(n)]

as the time average

p(N) = (1/(N+1)) Σ_{n=0}^{N} u(n) d^*(n)

The tap-weight vector of the Wiener filter is thus estimated by

w_o(N) = ( Σ_{n=0}^{N} u(n) u^H(n) )^{-1} ( Σ_{n=0}^{N} u(n) d^*(n) )

which is dependent on the length (N+1) of the time series.

2.5

(a) The correlation matrix is

R = E[u(n) u^H(n)]
  = E[(α(n) s(n) + v(n))(α^*(n) s^H(n) + v^H(n))]

With α(n) uncorrelated with v(n), we have

R = E[|α(n)|^2] s(n) s^H(n) + E[v(n) v^H(n)]
  = σ_α^2 s(n) s^H(n) + R_v    (1)

where R_v is the correlation matrix of v(n).

(b) The cross-correlation vector between the input vector u(n) and the desired response d(n) is

p = E[u(n) d^*(n)]    (2)

If d(n) is uncorrelated with u(n), we have p = 0. Hence, the tap-weight vector of the Wiener filter is

w_o = R^{-1} p = 0

(c) With σ_α^2 = 0, Eq. (1) reduces to

R = R_v

With the desired response

d(n) = v(n - k)

Eq. (2) yields

p = E[(α(n) s(n) + v(n)) v^*(n - k)]
  = E[v(n) v^*(n - k)]
  = E[ [v(n), v(n-1), ..., v(n-M+1)]^T v^*(n - k) ]
  = [r_v(k), r_v(k-1), ..., r_v(k-M+1)]^T,   0 ≤ k ≤ M-1    (3)

where r_v(k) is the autocorrelation of v(n) for lag k. Accordingly, the tap-weight vector of the (optimum) Wiener filter is

w_o = R^{-1} p = R_v^{-1} p

where p is as defined in Eq. (3).

(d) For a desired response

d(n) = α(n) e^{-jωτ}

the cross-correlation vector p is

p = E[u(n) d^*(n)]
  = E[(α(n) s(n) + v(n)) α^*(n) e^{jωτ}]
  = s(n) e^{jωτ} E[|α(n)|^2]
  = σ_α^2 e^{jωτ} [1, e^{-jω}, ..., e^{-jω(M-1)}]^T
  = σ_α^2 [e^{jωτ}, e^{jω(τ-1)}, ..., e^{jω(τ-M+1)}]^T

The corresponding value of the tap-weight vector of the Wiener filter is

w_o = σ_α^2 (σ_α^2 s(n) s^H(n) + R_v)^{-1} [e^{jωτ}, e^{jω(τ-1)}, ..., e^{jω(τ-M+1)}]^T

    = ( s(n) s^H(n) + (1/σ_α^2) R_v )^{-1} [e^{jωτ}, e^{jω(τ-1)}, ..., e^{jω(τ-M+1)}]^T

2.6

The optimum filtering solution is defined by the Wiener-Hopf equation

R w_o = p    (1)

for which the minimum mean-square error equals

J_min = σ_d^2 - p^H w_o    (2)

Combine Eqs. (1) and (2) into a single relation:

[ σ_d^2  p^H ] [  1   ] = [ J_min ]
[ p      R   ] [ -w_o ]   [  0    ]

Define

A = [ σ_d^2  p^H ]    (3)
    [ p      R   ]

Since

σ_d^2 = E[d(n) d^*(n)],
p = E[u(n) d^*(n)],   and
R = E[u(n) u^H(n)],

we may rewrite Eq. (3) as

A = [ E[d(n) d^*(n)]   E[d(n) u^H(n)] ]
    [ E[u(n) d^*(n)]   E[u(n) u^H(n)] ]

  = E[ [ d(n) ] [ d^*(n)   u^H(n) ] ]
       [ u(n) ]

The minimum mean-square error equals

J_min = σ_d^2 - p^H w_o    (4)

Eliminate σ_d^2 between Eqs. (1) and (4):

J(w) = J_min + p^H w_o - p^H w - w^H p + w^H R w    (5)

Eliminate p between (2) and (5):

J(w) = J_min + w_o^H R w_o - w_o^H R w - w^H R w_o + w^H R w    (6)

where we have used the property R^H = R. We may rewrite Eq. (6) simply as

J(w) = J_min + (w - w_o)^H R (w - w_o)

which clearly shows that J(w_o) = J_min.

2.7

The minimum mean-square error equals

J_min = σ_d^2 - p^H R^{-1} p    (1)

Using the spectral theorem, we may express the correlation matrix R as

R = Q Λ Q^H = Σ_{k=1}^{M} λ_k q_k q_k^H

Hence, the inverse of R equals

R^{-1} = Σ_{k=1}^{M} (1/λ_k) q_k q_k^H    (2)

Substituting Eq. (2) into (1):

J_min = σ_d^2 - Σ_{k=1}^{M} (1/λ_k) p^H q_k q_k^H p

      = σ_d^2 - Σ_{k=1}^{M} (1/λ_k) |q_k^H p|^2

When the length of the Wiener filter is greater than the model order m, the tail end of the tap-weight vector of the Wiener filter is zero; thus, am

wo =

0

Therefore, the only possible solution for the case of an over-fitted model is am

wo = 2.9

0

(a) The Wiener solution is defined by RM aM = pM r M-m

RM H

r M-m

am

=

R M-m, M-m 0 M-m

pm p M-m

R M am = pm H

r M-m a m = p M-m H

H

–1

p M-m = r M-m a m = r M-m R M p m

(1)

(b) Applying the condition of Eq. (1) to the example in Section 2.7: H

r M-m = [ – 0.05, 0.1, 0.15 ]

33

0.8719 a m = – 0.9129 0.2444 The last entry in the 4-by-1 vector p is therefore H

r M-m a m = – 0.0436 – 0.0912 + 0.1222 = – 0.0126 2.10

J_min = σ_d^2 - p^H w_o = σ_d^2 - p^H R^{-1} p

When m = 0:

J_min = σ_d^2 = 1.0

When m = 1:

J_min = 1 - 0.5 × (1/1.1) × 0.5 = 0.7727

When m = 2:

J_min = 1 - [0.5  -0.4] [ 1.1  0.5 ]^{-1} [  0.5 ]
                        [ 0.5  1.1 ]      [ -0.4 ]

      = 1 - 0.6781 = 0.3219

When m = 3:

J_min = 1 - [0.5  -0.4  -0.2] [ 1.1  0.5  0.1 ]^{-1} [  0.5 ]
                              [ 0.5  1.1  0.5 ]      [ -0.4 ]
                              [ 0.1  0.5  1.1 ]      [ -0.2 ]

      = 1 - 0.6859 = 0.3141

When m = 4:

J_min = 1 - 0.6859 = 0.3141

Thus any further increase in the filter order beyond m = 3 does not produce any meaningful reduction in the minimum mean-square error.
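The progression of J_min values can be reproduced directly from J_min = 1 - p^T R^{-1} p, using the correlation values quoted in the displays above:

```python
import numpy as np

# r(0), r(1), r(2) of the input and the cross-correlations p(k),
# as read off the matrices shown above (sigma_d^2 = 1).
r = [1.1, 0.5, 0.1]
p = [0.5, -0.4, -0.2]

def jmin(m):
    if m == 0:
        return 1.0
    R = np.array([[r[abs(i - j)] for j in range(m)] for i in range(m)])
    pm = np.array(p[:m])
    return 1.0 - pm @ np.linalg.solve(R, pm)

print([round(jmin(m), 4) for m in range(4)])   # compare with the values above
```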

2.11

(a) We are given

u(n) = x(n) + ν_2(n)    (1)

d(n) = -0.8458 d(n-1) + ν_1(n)    (2)

x(n) = d(n) + 0.9458 x(n-1)    (3)

From Eq. (3),

d(n) = x(n) - 0.9458 x(n-1)

Using Eqs. (2) and (3):

x(n) - 0.9458 x(n-1) = 0.8458 [-x(n-1) + 0.9458 x(n-2)] + ν_1(n)

Rearranging terms:

x(n) = (0.9458 - 0.8458) x(n-1) + 0.8 x(n-2) + ν_1(n)
     = 0.1 x(n-1) + 0.8 x(n-2) + ν_1(n)

[Figure: (a) first-order AR loop with coefficient -0.8458 generating d(n) from ν_1(n); (b) first-order recursive loop with coefficient 0.9458 generating x(n) from d(n); u(n) = x(n) + ν_2(n).]

(b) We have u(n) = x(n) + ν_2(n), where x(n) and ν_2(n) are uncorrelated. Therefore,

R = R_x + R_ν2

R_x = [ r_x(0)  r_x(1) ]
      [ r_x(1)  r_x(0) ]

With the AR parameters a_1 = -0.1 and a_2 = -0.8,

r_x(0) = σ_x^2 = ((1 + a_2)/(1 - a_2)) · σ_1^2 / ((1 + a_2)^2 - a_1^2) = 1

(which equals 1 for the given noise variance σ_1^2 = 0.27), and

r_x(1) = (-a_1/(1 + a_2)) r_x(0) = 0.5

Hence,

R_x = [ 1    0.5 ]        R_ν2 = [ 0.1  0   ]
      [ 0.5  1   ]               [ 0    0.1 ]

R = R_x + R_ν2 = [ 1.1  0.5 ]
                 [ 0.5  1.1 ]

Next, p = [p(0), p(1)]^T, where

p(k) = E[u(n - k) d(n)],   k = 0, 1

p(0) = r_x(0) - 0.9458 r_x(1) = 1 - 0.9458 × 0.5 = 0.5272

p(1) = r_x(1) - 0.9458 r_x(0) = 0.5 - 0.9458 = -0.4458

Therefore,

p = [  0.5272 ]
    [ -0.4458 ]

(c) The optimum weight vector is

w_o = R^{-1} p = [ 1.1  0.5 ]^{-1} [  0.5272 ]
                 [ 0.5  1.1 ]      [ -0.4458 ]

    = [  0.8363 ]
      [ -0.7853 ]
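The chain of results in parts (b) and (c) can be reproduced numerically; σ_1^2 = 0.27 below is the value implied by the normalization r_x(0) = 1 used above:

```python
import numpy as np

# AR(2) variance/correlation and the Wiener solution of Problem 2.11.
a1, a2 = -0.1, -0.8
sigma12 = 0.27   # implied by r_x(0) = 1

rx0 = ((1 + a2) / (1 - a2)) * sigma12 / ((1 + a2) ** 2 - a1 ** 2)
rx1 = (-a1 / (1 + a2)) * rx0
R = np.array([[rx0 + 0.1, rx1],
              [rx1, rx0 + 0.1]])          # R_x + R_nu2
p = np.array([rx0 - 0.9458 * rx1,
              rx1 - 0.9458 * rx0])
wo = np.linalg.solve(R, p)
print(rx0, rx1, wo)                       # r_x(0) ~ 1, r_x(1) ~ 0.5
```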

2.12

(a) For M = 3 taps, the correlation matrix of the tap inputs equals

R = [ 1.1   0.5   0.85 ]
    [ 0.5   1.1   0.5  ]
    [ 0.85  0.5   1.1  ]

The cross-correlation vector between the tap inputs and the desired response equals

p = [ 0.527, -0.446, 0.377 ]^T

(b) The inverse of the correlation matrix equals

R^{-1} = [  2.234  -0.304  -1.666 ]
         [ -0.304   1.186  -0.304 ]
         [ -1.666  -0.304   2.234 ]

Hence, the optimum weight vector equals

w_o = R^{-1} p = [ 0.738, -0.803, 0.138 ]^T

The minimum mean-square error equals

J_min = 0.15

2.13

(a) The correlation matrix R is

R = E[u(n) u^H(n)]
  = E[|A_1|^2] s(ω_1) s^H(ω_1) + I E[|v(n)|^2]
  = σ_1^2 s(ω_1) s^H(ω_1) + σ_v^2 I

where I is the identity matrix and

s(ω_1) = [1, e^{-jω_1}, ..., e^{-jω_1(M-1)}]^T

(b) The tap-weight vector of the Wiener filter is

w_o = R^{-1} p

From part (a),

R = σ_v^2 I + σ_1^2 s(ω_1) s^H(ω_1)

We are given

p = σ_0^2 s(ω_0)

To invert the matrix R, we use the matrix inversion lemma (see Chapter 9), as described here: if

A = B^{-1} + C D^{-1} C^H

then

A^{-1} = B - B C (D + C^H B C)^{-1} C^H B

In our case,

A = R,   B^{-1} = σ_v^2 I,   D^{-1} = σ_1^2,   C = s(ω_1)

Hence,

R^{-1} = (1/σ_v^2) [ I - s(ω_1) s^H(ω_1) / (σ_v^2/σ_1^2 + s^H(ω_1) s(ω_1)) ]

The corresponding value of the Wiener tap-weight vector is

w_o = R^{-1} p

    = (σ_0^2/σ_v^2) [ s(ω_0) - s(ω_1) s^H(ω_1) s(ω_0) / (σ_v^2/σ_1^2 + s^H(ω_1) s(ω_1)) ]

We note that

s^H(ω_1) s(ω_1) = M

s^H(ω_1) s(ω_0) = scalar

Hence,

w_o = (σ_0^2/σ_v^2) [ s(ω_0) - ( s^H(ω_1) s(ω_0) / (σ_v^2/σ_1^2 + M) ) s(ω_1) ]
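The closed-form inverse obtained from the matrix inversion lemma can be checked against a direct inverse; the dimension and the values of ω_1, σ_v^2, σ_1^2 below are illustrative assumptions:

```python
import numpy as np

# Check R^{-1} = (1/sv2) [I - s s^H / (sv2/s12 + s^H s)]
# for R = sv2 I + s12 s s^H.
M, w1 = 4, 0.3 * np.pi
s = np.exp(-1j * w1 * np.arange(M))        # steering vector s(omega_1)
sv2, s12 = 0.5, 2.0                        # sigma_v^2, sigma_1^2
R = sv2 * np.eye(M) + s12 * np.outer(s, s.conj())

R_inv = (np.eye(M) - np.outer(s, s.conj()) / (sv2 / s12 + s.conj() @ s)) / sv2
err = np.abs(R_inv - np.linalg.inv(R)).max()
print(err)   # should be at machine-precision level
```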

2.14

The output of the array processor equals

e(n) = u(1, n) - w u(2, n)

The mean-square error equals

J(w) = E[|e(n)|^2]
     = E[(u(1, n) - w u(2, n))(u^*(1, n) - w^* u^*(2, n))]
     = E[|u(1, n)|^2] + |w|^2 E[|u(2, n)|^2]
       - w E[u(2, n) u^*(1, n)] - w^* E[u(1, n) u^*(2, n)]

Differentiate J(w) with respect to w:

∂J/∂w = -2 E[u(1, n) u^*(2, n)] + 2 w E[|u(2, n)|^2]

Putting ∂J/∂w = 0 and solving for the optimum value of w:

w_o = E[u(1, n) u^*(2, n)] / E[|u(2, n)|^2]

2.15

Define the index of performance (i.e., cost function)

J(w) = E[|e(n)|^2] + c^H S^H w + w^H S c - 2 c^H D^{1/2} 1

     = w^H R w + c^H S^H w + w^H S c - 2 c^H D^{1/2} 1

where c is a vector of Lagrange multipliers. Differentiate J(w) with respect to w and set the result equal to zero:

∂J/∂w = 2 R w + 2 S c = 0

Hence,

w_o = -R^{-1} S c

But we must constrain w_o to satisfy

S^H w_o = D^{1/2} 1

This constraint yields

-S^H R^{-1} S c = D^{1/2} 1

Therefore, the vector c equals

c = -(S^H R^{-1} S)^{-1} D^{1/2} 1

Correspondingly, the optimum weight vector equals

w_o = R^{-1} S (S^H R^{-1} S)^{-1} D^{1/2} 1
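A useful property of the constrained solution just derived is that it satisfies the constraints exactly; a quick check with arbitrary illustrative data (R, S, and D^{1/2}1 below are assumptions, not problem data):

```python
import numpy as np

# Verify that w_o = R^{-1} S (S^H R^{-1} S)^{-1} D^{1/2} 1
# satisfies S^H w_o = D^{1/2} 1.
rng = np.random.default_rng(4)
M, K = 6, 2
X = rng.normal(size=(M, 20))
R = X @ X.T / 20                        # positive-definite correlation matrix
S = rng.normal(size=(M, K))             # constraint matrix
D_half_1 = np.array([1.0, 0.5])         # D^{1/2} 1

Ri_S = np.linalg.solve(R, S)
wo = Ri_S @ np.linalg.solve(S.T @ Ri_S, D_half_1)
resid = np.abs(S.T @ wo - D_half_1).max()
print(resid)   # should be at machine-precision level
```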

2.16

The weight vector w of the beamformer that maximizes the output signal-to-noise ratio

(SNR)_o = (w^H R_s w) / (w^H R_v w)

is derived in part (b) of the solution to Problem 2.18. There it is shown that the optimum weight vector w_SN so defined is given by

w_SN = R_v^{-1} s    (1)

where s is the signal component and R_v is the correlation matrix of the noise component v(n). On the other hand, the optimum weight vector of the LCMV beamformer is defined by

w_o = g^* R^{-1} s(φ) / (s^H(φ) R^{-1} s(φ))    (2)

where s(φ) is the steering vector. In general, the formulas (1) and (2) yield different values for the weight vector of the beamformer.

2.17

Let τ_i be the propagation delay, measured from the zero-time reference to the ith element of a nonuniformly spaced array, for a plane wave arriving from a direction defined by angle θ with respect to the perpendicular to the array. For a signal of angular frequency ω, this delay amounts to a phase shift equal to -ωτ_i. Let the phase shifts for all elements of the array be collected together in a column vector denoted by d(ω, θ). The response of a beamformer with weight vector w to a signal (with angular frequency ω) originating from angle θ is then w^H d(ω, θ). Hence, constraining the response of the array at ω and θ to some value g involves the linear constraint

w^H d(ω, θ) = g

Thus, the constraint vector d(ω, θ) serves the purpose of generalizing the idea of an LCMV beamformer beyond simply the case of a uniformly spaced array. Everything else is the same as before, except for the fact that the correlation matrix of the received signal is no longer Toeplitz for the case of a nonuniformly spaced array.

2.18

(a) Under hypothesis H1, we have

u = s + v

The correlation matrix of u equals

R = E[u u^T]
  = s s^T + R_N

where R_N = E[v v^T]. The tap-weight vector w_k is chosen so that w_k^T u yields an optimum estimate of the kth element of s. Thus, with s(k) treated as the desired response, the cross-correlation vector between u and s(k) equals

p_k = E[u s(k)]
    = s s(k),        k = 1, 2, ..., M

Hence, the Wiener-Hopf equation yields the optimum value of w_k as

w_ko = R^{-1} p_k
     = (s s^T + R_N)^{-1} s s(k),        k = 1, 2, ..., M        (1)

To apply the matrix inversion lemma (introduced in Problem 2.13), we let

A = R,   B^{-1} = R_N,   C = s,   D = 1

Hence,

R^{-1} = R_N^{-1} - (R_N^{-1} s s^T R_N^{-1}) / (1 + s^T R_N^{-1} s)        (2)

Substituting Eq. (2) into (1):

w_ko = [ R_N^{-1} - (R_N^{-1} s s^T R_N^{-1}) / (1 + s^T R_N^{-1} s) ] s s(k)
     = [ R_N^{-1} s (1 + s^T R_N^{-1} s) - R_N^{-1} s s^T R_N^{-1} s ] s(k) / (1 + s^T R_N^{-1} s)
     = [ s(k) / (1 + s^T R_N^{-1} s) ] R_N^{-1} s
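The identity used in the last step can be spot-checked numerically. A minimal sketch in exact rational arithmetic (the particular 2-by-2 matrix and vector below are illustrative assumptions, not values from the text):

```python
from fractions import Fraction as F

def inv2(m):
    # Inverse of a 2x2 matrix via the adjugate formula.
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

s = [F(1), F(2)]                      # assumed signal vector
RN = [[F(2), F(1)], [F(1), F(3)]]     # assumed noise correlation matrix

# Left side: (s s^T + R_N)^{-1} s
R = [[RN[i][j] + s[i] * s[j] for j in range(2)] for i in range(2)]
left = matvec(inv2(R), s)

# Right side: R_N^{-1} s / (1 + s^T R_N^{-1} s)
RNinv_s = matvec(inv2(RN), s)
denom = 1 + sum(x * y for x, y in zip(s, RNinv_s))
right = [x / denom for x in RNinv_s]

print(left == right)  # True: both sides agree exactly
```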

(b) The output signal-to-noise ratio equals

SNR = E[(w^T s)^2] / E[(w^T v)^2]
    = (w^T s s^T w) / (w^T E[v v^T] w)
    = (w^T s s^T w) / (w^T R_N w)        (3)

Since R_N is positive definite, we may write

R_N = R_N^{1/2} R_N^{1/2}

Define the vector

a = R_N^{1/2} w

or, equivalently,

w = R_N^{-1/2} a        (4)

Accordingly, we may rewrite Eq. (3) as

SNR = (a^T R_N^{-1/2} s s^T R_N^{-1/2} a) / (a^T a)        (5)

where we have used the symmetric property of R_N. Define the normalized vector

ā = a / ||a||

where ||a|| is the norm of a. Then we may rewrite Eq. (5) as

SNR = ā^T R_N^{-1/2} s s^T R_N^{-1/2} ā
    = |ā^T R_N^{-1/2} s|^2

Thus the output signal-to-noise ratio SNR equals the squared magnitude of the inner product of the two vectors ā and R_N^{-1/2} s. This inner product is maximized when ā is aligned with R_N^{-1/2} s. That is,

a_SN = R_N^{-1/2} s        (6)

Let w_SN denote the value of the tap-weight vector that corresponds to Eq. (6). Hence, the use of Eq. (4) in (6) yields

w_SN = R_N^{-1/2} (R_N^{-1/2} s)
     = R_N^{-1} s

(c) Since the noise vector v(n) is Gaussian, its joint probability density function equals

f_V(v) = (2π)^{-M/2} (det R_N)^{-1/2} exp(-½ v^T R_N^{-1} v)

Under hypothesis H0 we have u = v, and

f_U(u|H0) = (2π)^{-M/2} (det R_N)^{-1/2} exp(-½ u^T R_N^{-1} u)

Under hypothesis H1 we have u = s + v, and

f_U(u|H1) = (2π)^{-M/2} (det R_N)^{-1/2} exp(-½ (u - s)^T R_N^{-1} (u - s))

Hence, the likelihood ratio equals

Λ = f_U(u|H1) / f_U(u|H0)
  = exp(-½ s^T R_N^{-1} s + s^T R_N^{-1} u)

The natural logarithm of the likelihood ratio equals

ln Λ = -½ s^T R_N^{-1} s + s^T R_N^{-1} u

The first term represents a constant. Hence, testing ln Λ against a threshold is equivalent to the test

s^T R_N^{-1} u  ≷(H1/H0)  λ

where λ is some threshold. Equivalently, we may write

w_ML = R_N^{-1} s

where w_ML is the maximum-likelihood weight vector. The results of parts (a), (b) and (c) show that the three criteria discussed here yield the same optimum value for the weight vector, except for a scaling factor.

2.19

(a) Assuming the use of a noncausal Wiener filter, we write

Σ_{i=-∞}^{∞} w_{oi} r(i - k) = p(-k),        k = 0, ±1, ±2, ...        (1)

where the sum now extends from i = -∞ to i = ∞. Define the z-transforms

S(z) = Σ_{k=-∞}^{∞} r(k) z^{-k}

H_u(z) = Σ_{k=-∞}^{∞} w_{o,k} z^{-k}

P(z) = Σ_{k=-∞}^{∞} p(k) z^{-k}

so that the transform of the reflected sequence p(-k) is P(z^{-1}). Hence, applying the z-transform to Eq. (1):

H_u(z) S(z) = P(z^{-1})

which gives

H_u(z) = P(1/z) / S(z)        (2)

(b) We are given

P(z) = 0.36 / [(1 - 0.2z^{-1})(1 - 0.2z)]

so that

P(1/z) = 0.36 / [(1 - 0.2z)(1 - 0.2z^{-1})]

S(z) = 1.37 (1 - 0.146z^{-1})(1 - 0.146z) / [(1 - 0.2z^{-1})(1 - 0.2z)]

Thus, applying Eq. (2) yields

H_u(z) = 0.36 / [1.37 (1 - 0.146z^{-1})(1 - 0.146z)]
       = 0.36 z^{-1} / [1.37 (1 - 0.146z^{-1})(z^{-1} - 0.146)]
       = 0.0392/(z^{-1} - 0.146) + 0.2685/(1 - 0.146z^{-1})

Clearly, this system is noncausal. Its impulse response is

h_u(n) = inverse z-transform of H_u(z)
       = 0.2685 (0.146)^n u_step(n) - (0.0392/0.146)(1/0.146)^n u_step(-n)

where u_step(n) is the unit-step function,

u_step(n) = 1 for n = 0, 1, 2, ...;  0 for n = -1, -2, ...

and u_step(-n) is its mirror image,

u_step(-n) = 1 for n = 0, -1, -2, ...;  0 for n = 1, 2, ...

Simplifying,

h_u(n) = 0.2685 (0.146)^n u_step(n) - 0.2685 (0.146)^{-n} u_step(-n)

Evaluating h_u(n) for varying n: h_u(0) = 0, and

h_u(1) = 0.039,   h_u(2) = 0.005,   h_u(3) = 0.0008
h_u(-1) = -0.039,  h_u(-2) = -0.005,  h_u(-3) = -0.0008

These are plotted in the following figure:

(Figure: stem plot of h_u(n) versus time n, showing the odd-symmetric, two-sided exponential decay about n = 0.)

(c) A delay by 3 time units applied to the impulse response will make the system causal and therefore realizable.
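The partial-fraction expansion quoted in part (b) can be verified numerically. A small sketch in terms of x = z^{-1} (the sample points are arbitrary, chosen away from the poles):

```python
def H_direct(x):
    # H_u as a function of x = z^{-1}, before partial fractions.
    return 0.36 * x / (1.37 * (1 - 0.146 * x) * (x - 0.146))

def H_partial(x):
    # The partial-fraction form: 0.0392/(x - 0.146) + 0.2685/(1 - 0.146 x).
    return 0.0392 / (x - 0.146) + 0.2685 / (1 - 0.146 * x)

for x in (0.5, -0.3, 2.0, 0.9):
    assert abs(H_direct(x) - H_partial(x)) < 1e-3
print("partial fractions agree")
```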

CHAPTER 3

3.1

(a) Let a_M denote the tap-weight vector of the forward prediction-error filter. With a tap-input vector u_{M+1}(n), the forward prediction error at the filter output equals

f_M(n) = a_M^H u_{M+1}(n)

The mean-square value of f_M(n) equals

E[|f_M(n)|^2] = E[a_M^H u_{M+1}(n) u_{M+1}^H(n) a_M]
             = a_M^H E[u_{M+1}(n) u_{M+1}^H(n)] a_M
             = a_M^H R_{M+1} a_M

where R_{M+1} = E[u_{M+1}(n) u_{M+1}^H(n)] is the correlation matrix of the tap-input vector.

(b) The leading element of the vector a_M equals 1. Hence, the constrained cost function to be minimized is

J(a_M) = a_M^H R_{M+1} a_M + λ a_M^H 1 + λ* 1^T a_M

where λ is the Lagrange multiplier and 1 is the first unit vector, defined by 1 = [1, 0, ..., 0]^T. Differentiating J(a_M) with respect to a_M and setting the result equal to zero yields

2 R_{M+1} a_M + 2λ 1 = 0

Solving for a_M:

R_{M+1} a_M = -λ 1        (1)

However, we may partition R_{M+1} as

R_{M+1} = [ r(0)  r^H ]
          [ r     R_M ]

Hence, the first row of Eq. (1) gives

-λ = [r(0), r^H] a_M = P_M

where P_M is the minimum prediction-error power. Accordingly, we may rewrite Eq. (1) as

R_{M+1} a_M = P_M 1 = [P_M, 0, ..., 0]^T
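This augmented Wiener-Hopf equation is easy to confirm on a concrete example. A sketch using the second-order AR process of Problem 3.11, where r(0) = 1.2, r(1) = 0.8, r(2) = 0.2 and the order-2 prediction-error filter is a_2 = [1, -1, 0.5]^T (these numbers are taken from that later problem):

```python
from fractions import Fraction as F

# Autocorrelation values r(0), r(1), r(2) and prediction-error
# filter coefficients for the AR(2) process of Problem 3.11.
r = [F(6, 5), F(4, 5), F(1, 5)]
a = [F(1), F(-1), F(1, 2)]

# Toeplitz correlation matrix R_3 (real data, so r(-k) = r(k)).
R = [[r[abs(i - j)] for j in range(3)] for i in range(3)]

# R_3 a_2 should equal [P_2, 0, 0]^T.
out = [sum(R[i][j] * a[j] for j in range(3)) for i in range(3)]
print(out)  # first entry is the prediction-error power P_2
```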

3.2

(a) Let a_M^{B*} denote the tap-weight vector of the backward prediction-error filter. With a tap-input vector u_{M+1}(n), the backward prediction error equals

b_M(n) = a_M^{BT} u_{M+1}(n)

The mean-square value of b_M(n) equals

E[|b_M(n)|^2] = E[a_M^{BT} u_{M+1}(n) u_{M+1}^H(n) a_M^{B*}]
             = a_M^{BT} E[u_{M+1}(n) u_{M+1}^H(n)] a_M^{B*}
             = a_M^{BT} R_{M+1} a_M^{B*}

(b) The last element of a_M^{B*} equals 1. Hence, the constrained objective function to be minimized is

J(a_M^B) = a_M^{BT} R_{M+1} a_M^{B*} + λ a_M^{BT} 1^B + λ* 1^{BT} a_M^{B*}

where λ is the Lagrange multiplier, and

1^B = [0, 0, ..., 1]^T

Differentiating J with respect to a_M^{B*} and setting the result equal to zero:

2 R_{M+1} a_M^{B*} + 2λ 1^B = 0

Solving for a_M^{B*}, we get

R_{M+1} a_M^{B*} = -λ 1^B        (1)

However, we may express R_{M+1} in the partitioned form

R_{M+1} = [ R_M     r^{B*} ]
          [ r^{BT}  r(0)   ]

Therefore, the last row of Eq. (1) gives

-λ = [r^{BT}, r(0)] a_M^{B*} = P_M

where P_M is the minimum backward prediction-error power. We may thus rewrite Eq. (1) as

R_{M+1} a_M^{B*} = P_M 1^B = [0, ..., 0, P_M]^T

3.3

(a) Writing the Wiener-Hopf equation Rg = r^{B*} in expanded form, we have

[ r(0)    r(-1)   ...  r(-M+1) ] [ g_1 ]   [ r(M)   ]
[ r(1)    r(0)    ...  r(-M+2) ] [ g_2 ]   [ r(M-1) ]
[   :       :             :    ] [  :  ] = [   :    ]
[ r(M-1)  r(M-2)  ...  r(0)    ] [ g_M ]   [ r(1)   ]

Equivalently, we may write

Σ_{k=1}^{M} g_k r(k - i) = r(M + 1 - i),        i = 1, 2, ..., M

Let k = M - l + 1, or l = M - k + 1. Then

Σ_{l=1}^{M} g_{M-l+1} r(M - l + 1 - i) = r(M + 1 - i),        i = 1, 2, ..., M

Next, put M + 1 - i = j, or i = M + 1 - j. Then

Σ_{l=1}^{M} g_{M-l+1} r(j - l) = r(j),        j = 1, 2, ..., M

Putting this relation into matrix form, we write

[ r(0)     r(1)     ...  r(M-1) ] [ g_M     ]   [ r(1) ]
[ r(-1)    r(0)     ...  r(M-2) ] [ g_{M-1} ]   [ r(2) ]
[   :        :              :   ] [   :     ] = [  :   ]
[ r(-M+1)  r(-M+2)  ...  r(0)   ] [ g_1     ]   [ r(M) ]

This, in turn, may be put in the compact form

R^T g^B = r*

(b) The product r^{BT} g equals

r^{BT} g = [r(-M), r(-M+1), ..., r(-1)] [g_1, g_2, ..., g_M]^T
         = Σ_{k=1}^{M} g_k r(k - 1 - M)        (1)

The product r^T g^B equals

r^T g^B = [r(-1), r(-2), ..., r(-M)] [g_M, g_{M-1}, ..., g_1]^T
        = Σ_{k=1}^{M} g_{M+1-k} r(-k)

Put M + 1 - k = l, or k = M + 1 - l. Hence,

r^T g^B = Σ_{l=1}^{M} g_l r(l - 1 - M)        (2)

From Eqs. (1) and (2):

r^{BT} g = r^T g^B

3.4

Starting with the formula

r(m) = -κ_m* P_{m-1} - Σ_{k=1}^{m-1} a*_{m-1,k} r(m-k)

and solving for κ_m, we get

κ_m = -(1/P_{m-1}) [ r*(m) + Σ_{k=1}^{m-1} a_{m-1,k} r*(m-k) ]        (1)

We also note

a_{m,k} = a_{m-1,k} + κ_m a*_{m-1,m-k},        k = 0, 1, ..., m        (2)

P_m = P_{m-1} (1 - |κ_m|^2)        (3)

(a) We are given

r(0) = 1,  r(1) = 0.8,  r(2) = 0.6,  r(3) = 0.4

We also note that

P_0 = r(0) = 1

Hence, the use of Eq. (1) for m = 1 yields

κ_1 = -r*(1)/P_0 = -0.8

The use of Eq. (3) for m = 1 yields

P_1 = P_0 (1 - |κ_1|^2) = 1 - 0.64 = 0.36

We next reapply Eq. (1) for m = 2:

κ_2 = -r*(2)/P_1 - (r*(1)/P_1) κ_1

where we have noted that κ_1 = a_{1,1}. Hence,

κ_2 = -0.6/0.36 + (0.8 × 0.8)/0.36 = 0.04/0.36 = 1/9 = 0.1111

The use of Eq. (3) for m = 2 yields

P_2 = P_1 (1 - |κ_2|^2)
    = 0.36 (1 - 1/81) = 16/45 = 0.3556

Next, we reapply Eq. (1) for m = 3:

κ_3 = -r*(3)/P_2 - (1/P_2)(a_{2,1} r*(2) + a_{2,2} r*(1))

From Eq. (2) we have

a_{2,2} = κ_2 = 1/9 = 0.1111

a_{2,1} = a_{1,1} + κ_2 a*_{1,1} = κ_1 + κ_2 κ_1*
        = -4/5 - (1/9)(4/5) = -8/9 = -0.8889

Hence,

κ_3 = -0.4/0.3556 - (1/0.3556)(-0.8889 × 0.6 + 0.1111 × 0.8)
    = (-0.4 + 0.4444)/0.3556 = 0.0444/0.3556 = 1/8 = 0.125

Note that all three reflection coefficients have a magnitude less than unity; hence, the lattice-predictor (prediction-error filter) representation of the process is minimum phase.

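A short Levinson-Durbin routine provides a numerical cross-check of the recursion; a sketch for real-valued data (function names are my own):

```python
def levinson(r):
    # Levinson-Durbin recursion for real-valued data: returns the
    # reflection coefficients kappa_m and prediction-error powers P_m.
    a = [1.0]                      # prediction-error filter, order 0
    P = r[0]
    kappas, powers = [], [P]
    for m in range(1, len(r)):
        delta = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        kappa = -delta / P
        aug = a + [0.0]            # a_{m-1,m} = 0
        a = [aug[k] + kappa * aug[m - k] for k in range(m + 1)]
        P *= 1.0 - kappa * kappa
        kappas.append(kappa)
        powers.append(P)
    return kappas, powers

kappas, powers = levinson([1.0, 0.8, 0.6, 0.4])
print([round(k, 4) for k in kappas])  # [-0.8, 0.1111, 0.125]
```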

(b) This lattice-predictor representation is as shown in the following figure:

(Figure: a three-stage lattice prediction-error filter operating on u(n). Each stage contains a unit delay z^{-1} in the backward path and cross-coupling through its reflection coefficient; stages 1, 2, 3 use κ_1 = -0.8, κ_2 = 0.111, κ_3 = 0.125, producing f_3(n) and b_3(n) at the output.)

(c) From the calculations presented in part (a), we have

P_0 = 1,  P_1 = 0.36,  P_2 = 0.3556

To complete the calculations required, we note that

P_3 = P_2 (1 - |κ_3|^2) = 0.3556 (1 - 0.125^2) = 0.3556 × 0.9844 = 0.35

(Plot: P_m versus prediction order m, for m = 0 to 3.)

From the power plot we note that the average prediction-error power P_m decreases monotonically with the prediction order m, with the largest drop occurring at order 1.

3.5

The estimation error e(n) is

e(n) = u(n) - w^H x(n)

where

x(n) = u(n - Δ) = [u(n-Δ), u(n-1-Δ), ..., u(n-M-Δ)]^T

The mean-square value of the estimation error is

J = E[|e(n)|^2]
  = E[(u(n) - w^H x(n))(u*(n) - x^H(n) w)]
  = E[|u(n)|^2] - w^H E[x(n) u*(n)] - E[u(n) x^H(n)] w + w^H E[x(n) x^H(n)] w
  = P_0 - w^H E[u(n-Δ) u*(n)] - E[u(n) u^H(n-Δ)] w + w^H E[u(n-Δ) u^H(n-Δ)] w        (1)

We now note the following:

E[u(n-Δ) u*(n)] = [r(-Δ), r(-Δ-1), ..., r(-Δ-M)]^T = r_Δ

E[u(n) u^H(n-Δ)] = r_Δ^H

E[u(n-Δ) u^H(n-Δ)] = R

We may thus rewrite Eq. (1) as

J = P_0 - w^H r_Δ - r_Δ^H w + w^H R w

The optimum value of the weight vector is therefore

w_o = R^{-1} r_Δ

where R^{-1} is the inverse of the correlation matrix R.

3.6

We are given the difference equation

u(n) = 0.9 u(n-1) + v(n)

(a) For a prediction-error filter of order two, we have

a_{2,1} = -0.9,   a_{2,2} = 0

The prediction-error filter representation of the process is therefore

(Figure: u(n) and its delayed version u(n-1), the latter weighted by -0.9, summed to produce v(n).)

(b) The corresponding reflection coefficients of the lattice predictor are

κ_1 = a_{1,1} = a_{2,1} = -0.9
κ_2 = a_{2,2} = 0

We are given a first-order difference equation as the description of the AR process u(n). It is therefore natural that a first-order predictor suffices for the representation of this process.

3.7

(a) (i) The tap-weight vector of the prediction-error filter of order M is

a_M = [1, -w_o^T]^T        (1)

where

w_o = R_M^{-1} r_M        (2)

r_M = E[u_M(n-1) u*(n)]
    = σ_α^2 [e^{-jω}, e^{-j2ω}, ..., e^{-jMω}]^T
    = σ_α^2 e^{-jω} s_M(ω)        (3)

R_M = σ_α^2 s_M(ω) s_M^H(ω) + σ_v^2 I_M        (4)

s_M(ω) = [1, e^{-jω}, ..., e^{-jω(M-1)}]^T

From the matrix inversion lemma (see Chapter 9), we have: if

A = B^{-1} + C D^{-1} C^H

then

A^{-1} = B - BC(D + C^H B C)^{-1} C^H B

For our application,

B^{-1} = σ_v^2 I_M,   D^{-1} = σ_α^2,   C = s_M(ω)

Hence,

R_M^{-1} = (1/σ_v^2) I_M - [ (1/σ_v^4) / (1/σ_α^2 + (1/σ_v^2) s_M^H(ω) s_M(ω)) ] s_M(ω) s_M^H(ω)

We also note that

s_M^H(ω) s_M(ω) = M

Hence,

R_M^{-1} = (1/σ_v^2) [ I_M - s_M(ω) s_M^H(ω) / ((σ_v^2/σ_α^2) + M) ]        (5)

Substituting Eqs. (3) and (5) into (2) yields

w_o = (σ_α^2/σ_v^2) e^{-jω} [ 1 - M/((σ_v^2/σ_α^2) + M) ] s_M(ω)
    = [ e^{-jω} / ((σ_v^2/σ_α^2) + M) ] s_M(ω)        (6)

Equations (1) and (6) define the tap-weight vector of the prediction-error filter. Moreover, the final value of the prediction-error power is

P_M = r(0) - r_M^H w_o
    = r(0) - [ σ_α^2 / ((σ_v^2/σ_α^2) + M) ] s_M^H(ω) s_M(ω)        (7)

Using s_M^H(ω) s_M(ω) = M and the fact that

r(0) = σ_α^2 + σ_v^2

we may rewrite Eq. (7) as

P_M = σ_α^2 + σ_v^2 - M σ_α^2 / ((σ_v^2/σ_α^2) + M)
    = σ_v^2 [ 1 + M + (σ_v^2/σ_α^2) ] / ((σ_v^2/σ_α^2) + M)        (8)
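Equations (4) and (6) can be cross-checked by verifying numerically that the normal equations R_M w_o = r_M hold. A small complex-arithmetic sketch (the particular values of M, ω, σ_α^2, σ_v^2 below are arbitrary assumptions):

```python
import cmath

M, omega = 4, 0.7
sig_a2, sig_v2 = 2.0, 0.5      # sigma_alpha^2 and sigma_v^2 (assumed)

s = [cmath.exp(-1j * omega * k) for k in range(M)]                  # s_M(omega)
r = [sig_a2 * cmath.exp(-1j * omega * (k + 1)) for k in range(M)]   # r_M

# R_M = sig_a2 * s s^H + sig_v2 * I, Eq. (4).
R = [[sig_a2 * s[i] * s[j].conjugate() + (sig_v2 if i == j else 0.0)
      for j in range(M)] for i in range(M)]

# Claimed Wiener solution, Eq. (6).
scale = cmath.exp(-1j * omega) / (sig_v2 / sig_a2 + M)
w = [scale * si for si in s]

# Check that R_M w_o reproduces r_M.
Rw = [sum(R[i][j] * w[j] for j in range(M)) for i in range(M)]
err = max(abs(x - y) for x, y in zip(Rw, r))
print(err < 1e-12)
```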

(a) (ii) The mth reflection coefficient is [from Eq. (3.56) of the text]

κ_m = -Δ_{m-1} / P_{m-1}

where

Δ_{m-1} = r_m^{BT} a_{m-1}

Hence,

κ_m = -(r_m^{BT} a_{m-1}) / P_{m-1}        (9)

From Eq. (3) we deduce that

r_m^B = σ_α^2 e^{-jω} s_m^B(ω) = σ_α^2 [e^{-jmω}, e^{-j(m-1)ω}, ..., e^{-jω}]^T        (10)

From Eq. (8):

P_{m-1} = σ_v^2 [ m + (σ_v^2/σ_α^2) ] / ((σ_v^2/σ_α^2) + m - 1)        (11)

From Eqs. (1) and (6):

a_{m-1} = [ 1,  -(e^{-jω}/((σ_v^2/σ_α^2) + m - 1)) s_{m-1}^T(ω) ]^T        (12)

Hence, the combined use of Eqs. (9)-(12) yields

Δ_{m-1} = σ_α^2 e^{-jmω} [ 1 - (m-1)/((σ_v^2/σ_α^2) + m - 1) ]
        = σ_v^2 e^{-jmω} / ((σ_v^2/σ_α^2) + m - 1)

and therefore

κ_m = -σ_α^2 e^{-jmω} / (σ_v^2 + m σ_α^2)

(a) (iii) As the noise variance σ_v^2 approaches zero, the mth reflection coefficient κ_m approaches -(1/m) e^{-jmω}, the magnitude of which, in turn, approaches zero as m approaches infinity.

(b) (i) The tap-weight vector of the prediction-error filter of order M is

a_M = [ 1,  α* e^{-jω},  0_{M-1}^T ]^T

where 0_{M-1} is a null vector of dimension M-1.

(ii) The reflection coefficients of a lattice predictor of order M are

κ_1 = α* e^{-jω}
κ_m = 0        for m = 2, ..., M

(c) We may consider u_1(n) = u_2(n) under the limiting conditions

α → 1   and   σ_v^2 → 0

where σ_v^2 refers to the noise in the AR process.

3.8

The Levinson-Durbin order update for the forward prediction-error filter is

a_m = [ a_{m-1} ] + κ_m [ 0            ]
      [ 0       ]       [ a_{m-1}^{B*} ]

In expanded form, we have

[ a_{m,0}   ]   [ a_{m-1,0}   ]        [ 0            ]
[ a_{m,1}   ]   [ a_{m-1,1}   ]        [ a*_{m-1,m-1} ]
[    :      ] = [     :       ] + κ_m  [      :       ]
[ a_{m,m-1} ]   [ a_{m-1,m-1} ]        [ a*_{m-1,1}   ]
[ a_{m,m}   ]   [ 0           ]        [ a*_{m-1,0}   ]

or, equivalently,

a_{m,k} = a_{m-1,k} + κ_m a*_{m-1,m-k},        k = 0, 1, ..., m

Put m - k = l, or k = m - l. Then

a_{m,m-l} = a_{m-1,m-l} + κ_m a*_{m-1,l},        l = 0, 1, ..., m

Complex-conjugating both sides:

a*_{m,m-l} = a*_{m-1,m-l} + κ*_m a_{m-1,l},        l = 0, 1, ..., m

Rewriting this relation in matrix form:

[ a*_{m,m}   ]   [ 0            ]         [ a_{m-1,0}   ]
[ a*_{m,m-1} ]   [ a*_{m-1,m-1} ]         [ a_{m-1,1}   ]
[     :      ] = [      :       ] + κ*_m  [     :       ]
[ a*_{m,1}   ]   [ a*_{m-1,1}   ]         [ a_{m-1,m-1} ]
[ a*_{m,0}   ]   [ a*_{m-1,0}   ]         [ 0           ]

or, equivalently,

a_m^{B*} = [ 0            ] + κ*_m [ a_{m-1} ]
           [ a_{m-1}^{B*} ]        [ 0       ]

Note that, for all m, a_{m,k} = 1 for k = 0 and a_{m,k} = 0 for k > m.

3.9

We start with the formula

Δ_{m-1} = Σ_{k=0}^{m-1} a_{m-1,k} r(k - m)        (1)

The autocorrelation function r(k - m) equals (by definition)

r(k - m) = E[u(n-m) u*(n-k)]        (2)

Hence, substituting Eq. (2) into (1):

Δ_{m-1} = Σ_{k=0}^{m-1} a_{m-1,k} E[u(n-m) u*(n-k)]
        = E[ u(n-m) Σ_{k=0}^{m-1} a_{m-1,k} u*(n-k) ]        (3)

But, by definition,

f_{m-1}(n) = Σ_{k=0}^{m-1} a*_{m-1,k} u(n-k)

Hence, we may rewrite Eq. (3) as

Δ_{m-1} = E[u(n-m) f*_{m-1}(n)]        (4)

Next we note that

u(n-m) = û(n-m | U_{n-1}) + b_{m-1}(n-1)        (5)

where U_{n-1} is the space spanned by u(n-1), ..., u(n-m+1), and b_{m-1}(n-1) is the backward prediction error produced by a predictor of order m-1. The estimate û(n-m | U_{n-1}) equals

û(n-m | U_{n-1}) = Σ_{k=1}^{m-1} w*_{b,k} u(n-k)        (6)

Accordingly, using Eqs. (5) and (6) in (4):

Δ_{m-1} = E[b_{m-1}(n-1) f*_{m-1}(n)] + Σ_{k=1}^{m-1} w*_{b,k} E[u(n-k) f*_{m-1}(n)]        (7)

But, by the principle of orthogonality,

E[f_{m-1}(n) u*(n-k)] = 0,        1 ≤ k ≤ m-1

so every expectation inside the sum vanishes. Hence, Eq. (7) simplifies to

Δ_{m-1} = E[b_{m-1}(n-1) f*_{m-1}(n)]

3.10

The polynomial x(z) equals

x(z) = a_{M,M} z^M + a_{M,M-1} z^{M-1} + ... + a_{M,0}
     = Σ_{k=0}^{M} a_{M,k} z^k
     = z^M H_{b,M}(z)

where H_{b,M}(z) is the transfer function of a backward prediction-error filter of order M. The reciprocal polynomial x′(z) equals

x′(z) = a*_{M,M} + a*_{M,M-1} z + ... + a*_{M,0} z^M
      = z^M H_{f,M}(z)

where H_{f,M}(z) is the transfer function of a forward prediction-error filter of order M. Next, we note, by definition, that

T[x(z)] = a_{M,0} x(z) - a_{M,M} x′(z)
        = Σ_{k=0}^{M} a_{M,k} z^k - a_{M,M} Σ_{k=0}^{M} a*_{M,M-k} z^k
        = Σ_{k=0}^{M} (a_{M,k} - a_{M,M} a*_{M,M-k}) z^k

But from the inverse recursion:

a_{M-1,k} = (a_{M,k} - a_{M,M} a*_{M,M-k}) / (1 - |a_{M,M}|^2),        k = 0, 1, ..., M

Therefore,

T[x(z)] = (1 - |a_{M,M}|^2) Σ_{k=0}^{M} a_{M-1,k} z^k
        = (1 - |a_{M,M}|^2) [z^{M-1} H_{b,M-1}(z)]

where we have used the fact that a_{M-1,M} is zero. This shows that T[x(z)] is of order M-1, one less than the order of the original polynomial x(z). Similarly, we have

T^2[x(z)] = (1 - |a_{M,M}|^2)(1 - |a_{M-1,M-1}|^2) [z^{M-2} H_{b,M-2}(z)]

We also note that

T^1[x(0)] = 1 - |a_{M,M}|^2

T^2[x(0)] = (1 - |a_{M,M}|^2)(1 - |a_{M-1,M-1}|^2)

Thus

T^0[x(z)] = T^0[x(0)] z^M H_{b,M}(z)
T^1[x(z)] = T^1[x(0)] z^{M-1} H_{b,M-1}(z)
T^2[x(z)] = T^2[x(0)] z^{M-2} H_{b,M-2}(z)

where, in the first line, we have T^0[x(z)] = x(z) and x(0) = a_{M,0} = 1. We may generalize these results by writing

T^i[x(z)] = T^i[x(0)] z^{M-i} H_{b,M-i}(z)

where

T^i[x(0)] = Π_{j=0}^{i-1} (1 - |a_{M-j,M-j}|^2),        1 ≤ i ≤ M

3.11

(a) The AR parameters equal

a_1 = -1,   a_2 = 1/2

Since v(n) has zero mean, the average power of u(n) equals

P_0 = r(0) = (σ_v^2 (1 + a_2)) / ((1 - a_2)[(1 + a_2)^2 - a_1^2]) = 1.2

(b) For prediction order M = 2, the prediction-error filter coefficients equal the AR parameters:

a_{2,1} = a_1 = -1,   a_{2,2} = a_2 = 1/2

The use of the inverse Levinson-Durbin recursion for real-valued data,

a_{m-1,k} = (a_{m,k} - a_{m,m} a_{m,m-k}) / (1 - a_{m,m}^2),        k = 0, ..., m

for m = 2 gives

a_{1,k} = (a_{2,k} - a_{2,2} a_{2,2-k}) / (1 - a_{2,2}^2),        k = 0, 1

Hence,

a_{1,0} = 1

a_{1,1} = (a_{2,1} - a_{2,2} a_{2,1}) / (1 - a_{2,2}^2) = -2/3

The reflection coefficients are thus as follows:

κ_1 = a_{1,1} = -2/3
κ_2 = a_{2,2} = 1/2

(c) Use of the formula (for real-valued data)

P_m = P_{m-1} (1 - κ_m^2)

yields the following values for the average prediction-error powers:

P_1 = P_0 (1 - κ_1^2) = 2/3
P_2 = P_1 (1 - κ_2^2) = 1/2

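The inverse recursion and the power computation can be confirmed in a few lines; a sketch (assuming real-valued data; function names are my own):

```python
def inverse_levinson(a):
    # Inverse Levinson-Durbin recursion: recover the reflection
    # coefficients from the order-M filter a = [1, a_M1, ..., a_MM].
    kappas = []
    for m in range(len(a) - 1, 0, -1):
        k = a[m]                      # kappa_m = a_{m,m}
        kappas.append(k)
        a = [(a[i] - k * a[m - i]) / (1 - k * k) for i in range(m)]
    return list(reversed(kappas))

kappas = inverse_levinson([1.0, -1.0, 0.5])
P0 = 1.2
P1 = P0 * (1 - kappas[0] ** 2)
P2 = P1 * (1 - kappas[1] ** 2)
print(kappas, P1, P2)  # kappas = [-2/3, 1/2], P1 = 2/3, P2 = 1/2
```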

3.12

For real data, we have

r(m) = -κ_m P_{m-1} - Σ_{k=1}^{m-1} a_{m-1,k} r(m-k)

For m = 1,

r(1) = -κ_1 P_0 = 0.8

For m = 2,

r(2) = -κ_2 P_1 - a_{1,1} r(1) = 0.2

3.13

(a) The transfer function of the forward prediction-error filter may be factored as

H_{f,M}(z) = (1 - z_i z^{-1}) C_i(z)

where

z_i = ρ_i e^{jω_i}

The power spectral density of the prediction error f_M(n) equals

S_f(ω) = |H_{f,M}(e^{jω})|^2 S(ω)

where S(ω) is the power spectral density of the input process u(n). Hence, the mean-square value of the prediction error f_M(n) equals

ε = ∫_{-π}^{π} |H_{f,M}(e^{jω})|^2 S(ω) dω
  = ∫_{-π}^{π} |1 - ρ_i e^{jω_i} e^{-jω}|^2 |C_i(e^{jω})|^2 S(ω) dω
  = ∫_{-π}^{π} [1 - 2ρ_i cos(ω - ω_i) + ρ_i^2] |C_i(e^{jω})|^2 S(ω) dω

(b) Differentiating ε with respect to ρ_i:

∂ε/∂ρ_i = 2 ∫_{-π}^{π} [-cos(ω - ω_i) + ρ_i] |C_i(e^{jω})|^2 S(ω) dω

Suppose z_i lies outside the unit circle, so that ρ_i > 1. We note that (regardless of ρ_i)

-1 ≤ cos(ω - ω_i) ≤ 1

Hence,

-cos(ω - ω_i) + ρ_i > 0        if ρ_i > 1

Since |C_i(e^{jω})|^2 and S(ω) are both positive, it follows that

∂ε/∂ρ_i > 0        if ρ_i > 1

If the prediction-error filter is to be optimum, its parameters (and therefore the ρ_i) must be chosen in such a way that ∂ε/∂ρ_i = 0. Hence, it is not possible to have ρ_i > 1 and yet satisfy the optimality condition ∂ε/∂ρ_i = 0. The conclusion to be drawn is that the transfer function of an optimum forward prediction-error filter cannot have any of its zeros outside the unit circle. In other words, a forward prediction-error filter is necessarily minimum phase.

3.14

An AR process u(n) of order M is described by the difference equation

u(n) = -a_1* u(n-1) - ... - a_M* u(n-M) + v(n)

Equivalently, in the z-domain we have

U(z) = V(z) / (1 + a_1* z^{-1} + ... + a_M* z^{-M})

When the process u(n) is applied to a forward prediction-error filter described by the transfer function

H_f(z) = 1 + a_1* z^{-1} + ... + a_M* z^{-M}

the z-transform of the resulting output is

H_f(z) U(z) = V(z)

In other words, the output consists of the white-noise sequence v(n). Suppose next that the process u(n) is applied to a backward prediction-error filter described by the transfer function

H_b(z) = a_M + a_{M-1} z^{-1} + ... + a_1 z^{-M+1} + z^{-M}

The z-transform of the resulting output is

H_b(z) U(z) = (a_M + a_{M-1} z^{-1} + ... + a_1 z^{-M+1} + z^{-M}) / (1 + a_1* z^{-1} + ... + a_M* z^{-M})

This rational function is recognized as an all-pass (nonminimum-phase) function with unit magnitude but nonzero phase. Equivalently, we may state that the corresponding output sequence is an anticausal realization of white noise.

3.15

Let

I = (1/2πj) ∮_C φ_m(z) φ_k(1/z) S(z) z^{-1} dz

On the unit circle, z = e^{jω} and dz = j e^{jω} dω. Hence,

I = (1/2π) ∫_{-π}^{π} φ_m(e^{jω}) φ_k(e^{-jω}) S(ω) dω        (1)

From the definition of φ_m(z), we have

φ_m(e^{jω}) = (1/√P_m) Σ_{i=0}^{m} a_{m,i} e^{jω(m-i)}

Hence, we may rewrite Eq. (1) as

I = (1/(2π √(P_m P_k))) ∫_{-π}^{π} Σ_{i=0}^{m} Σ_{l=0}^{k} a_{m,i} a_{k,l} e^{jω(m-i)} e^{-jω(k-l)} S(ω) dω
  = (1/√(P_m P_k)) Σ_{i=0}^{m} Σ_{l=0}^{k} a_{m,i} a_{k,l} (1/2π) ∫_{-π}^{π} S(ω) e^{jω(m-k+l-i)} dω        (2)

From the Einstein-Wiener-Khintchine relations:

(1/2π) ∫_{-π}^{π} S(ω) e^{jω(m-k+l-i)} dω = r(m-k+l-i)

Accordingly, we may simplify Eq. (2) as

I = (1/√(P_m P_k)) Σ_{i=0}^{m} Σ_{l=0}^{k} a_{m,i} a_{k,l} r(m-k+l-i)        (3)

From the augmented Wiener-Hopf equation for linear prediction:

Σ_{i=0}^{m} a_{m,i} r(m-k+l-i) = { P_k,  if m = k and l = 0
                                 { 0,    otherwise                (4)

Substituting Eq. (4) into (3), we get

I = { 1,  if m = k
    { 0,  otherwise

where it is noted that a_{k,l} = 1 for l = 0. Equivalently,

I = δ_mk

as required.

3.16

From Eqs. (3.81) and (3.82):

H_{f,m}(z) = Π_{i=1}^{m} (1 - z_i z^{-1})

H_{b,m}(z) = Π_{i=1}^{m} (z^{-1} - z_i*)
           = Π_{i=1}^{m} [ (z^{-1} - z_i*) / (1 - z_i z^{-1}) ] (1 - z_i z^{-1})
           = H_{f,m}(z) Π_{i=1}^{m} (z^{-1} - z_i*) / (1 - z_i z^{-1})

The factor (z^{-1} - z_i*)/(1 - z_i z^{-1}) represents an all-pass structure with a magnitude response equal to unity for z = e^{jω} and all ω. Hence,

|H_{b,m}(e^{jω})| = |H_{f,m}(e^{jω})|

Given an input u(n) of power spectral density S_u(ω) applied to both H_{f,m}(e^{jω}) and H_{b,m}(e^{jω}), we immediately see that

S_{f,m}(ω) = S_u(ω) |H_{f,m}(e^{jω})|^2
           = S_u(ω) |H_{b,m}(e^{jω})|^2
           = S_{b,m}(ω)

for all m.

3.17

(a) The reflection coefficients of the two-stage lattice predictor equal

κ_1 = -2/3,   κ_2 = 1/2

Hence, the structure of this lattice predictor is as follows:

(Figure: a two-stage lattice predictor. The input u(n) feeds f_0(n) and b_0(n); stage 1 uses the reflection coefficient -2/3 and stage 2 uses 1/2, each with a unit delay z^{-1} in the backward path, producing f_2(n) and b_2(n).)

(b) The inverse lattice filter for generating the second-order AR process from a white-noise process is as follows (see Fig. 3.11):

(Figure: the corresponding two-stage inverse (synthesis) lattice. White noise v(n) enters at the left; the stages use the coefficient pairs 1/2, -1/2 and -2/3, 2/3, each with a unit delay z^{-1} in the backward path, and the output is the AR process u(n).)
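The synthesis structure can also be simulated directly and compared against the AR difference equation of Problem 3.11. A minimal sketch (real data, so κ* = κ; the deterministic driving-noise samples are an arbitrary assumption):

```python
k1, k2 = -2/3, 1/2                     # reflection coefficients
v = [((1103515245 * n + 12345) % 1000) / 1000 - 0.5 for n in range(50)]

# Two-stage synthesis (inverse) lattice, zero initial conditions.
u_lattice = []
b0_prev = b1_prev = 0.0                # delayed backward prediction errors
for vn in v:
    f1 = vn - k2 * b1_prev
    f0 = f1 - k1 * b0_prev             # f0(n) = u(n)
    b1_prev, b0_prev = k1 * f0 + b0_prev, f0
    u_lattice.append(f0)

# Direct AR(2) recursion: u(n) = u(n-1) - 0.5 u(n-2) + v(n).
u_direct = []
for n, vn in enumerate(v):
    u1 = u_direct[n - 1] if n >= 1 else 0.0
    u2 = u_direct[n - 2] if n >= 2 else 0.0
    u_direct.append(u1 - 0.5 * u2 + vn)

err = max(abs(a - b) for a, b in zip(u_lattice, u_direct))
print(err < 1e-12)
```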

From this latter structure, we see that

u(n) = (2/3) u(n-1) - (1/2)[-(2/3) u(n-1) + u(n-2)] + v(n)

That is,

u(n) = u(n-1) - 0.5 u(n-2) + v(n)

which is exactly the same as the difference equation specified in Problem 3.11.

3.18

(a) From Fig. 3.10, f_M(n) is obtained by passing u(n) through a minimum-phase prediction-error filter of order M, whereas b_M(n) is obtained by passing u(n) through a maximum-phase prediction-error filter of order M. Hence, going through the steps outlined in Problem 3.16, we readily see that in passing from the input f_M(n) to the output b_M(n), we will have gone through an all-pass filter of order M.

(b) In going from the input f_M(n) to the output u(n), we will have passed through the inverse of a forward prediction-error filter of order M. Since such a filter is minimum phase, with all of its zeros confined to the interior of the unit circle, it follows that its inverse is an all-pole filter with all of its poles confined to the interior of the unit circle; hence, its physical realizability is assured.

3.19

(a) The (M+1)-by-(M+1) lower triangular matrix L is defined by

L = [ 1                              ]
    [ a_{1,1}    1                   ]
    [ a_{2,2}    a_{2,1}    1        ]
    [    :          :            :   ]
    [ a_{M,M}  a_{M,M-1}  ...    1   ]

Let

Y = LR

where the (M+1)-by-(M+1) correlation matrix R is defined by

R = [ r(0)    r(-1)   ...  r(-M)   ]
    [ r(1)    r(0)    ...  r(-M+1) ]
    [   :       :             :    ]
    [ r(M)   r(M-1)   ...  r(0)    ]

Hence, the kmth element of the matrix product LR equals

y_km = Σ_{l=0}^{k} a_{k,k-l} r(m-l),        m, k = 0, 1, ..., M        (1)

For k = m, we thus have

y_mm = Σ_{l=0}^{m} a_{m,m-l} r(m-l),        m = 0, 1, ..., M

However, from the augmented Wiener-Hopf equation for backward prediction we have

Σ_{l=0}^{m} a*_{m,m-l} r(l-i) = { 0,    i = 0, 1, ..., m-1
                                { P_m,  i = m

We therefore find that y_mm is real-valued and that

y_mm = P_m,        m = 0, 1, ..., M

m = 0, 1, …, M .

(b) The mth column of matrix Y equals

r(k)

y 0m

1

∑ a1, 1-l r ( m-l )

y 1m

y mm

=

m

...

...

l=0

∑ am, m-l r ( m-l )

...

l=0

y Mm

0

The kth element of this column vector equals ykm that is defined by Eq. (1). The element ykm is recognized as the output produced by a backward prediction-error with tap weights a * , a * and tap inputs r ( m ), r ( m-1 ), …, r ( m-k ) , , …, a * k, k

k, k-1

k, 0

respectively. By summing the inner products of the respective tap weights and tap inputs, we get ykm. Hence, the mth column of matrix Y is obtained by passing the autocorrelation sequence { r ( 0 ), r ( 1 ), …, r ( m ) } through the sequence of backward prediction-error filters whose transfer functions equal H b, 0 ( z ), H b, 1 ( z ), …, H b, m ( z ) .

77

(c) Apply the autocorrelation sequence { r ( 0 ), r ( 1 ), …, r ( m ) } to the input of a lattice predictor of order m. Denote the variables appearing at the various points on the lower part of the predictor as x 0, x 1, …, x m , as shown here:

.

Σ

.

...

κ1

κm

κ1*

κm

Σ

{r(0),...,r(m)}

.

z−1

.

Σ

x0

. x1

z−1 . . .

. . z−1

xm-1

Σ

. xm

Figure 1 At time m we may express the resulting values of these outputs as follows: x0 = r(m) x1 = output of backward prediction-error filter or order 1 and with tap inputs r(m-1), r(m) 1

=

∑ a1, 1-l r ( m – l )

...

l=0

xm =

output of backward prediction-error filter of order k and with tap inputs r(0), r(1),...,r(m) m

=

∑ am, m-l r ( m – l ) l=0

The various sets of prediction-error filter coefficients are related to the reflection coefficients κ1,κ2,...,κm in accordance with the Levinson-Durbin recursion. Hence, the variables appearing at the various points on the lower line of the lattice predictor in Fig. 1at time m equal the elements of the mth column of matrix Y. (d) The lower output of stage m at time m equals the mmth element ymm of matrix Y. This output equals Pm, as shown in part (a). The upper output of stage m in the lattice predictor is equivalent to the output of a forward prediction-error filter of order m.

78

Hence, this output at time m+1, in response to the autocorrelation sequence { r ( 1 ), r ( 2 ), …, r ( m+1 ) } used as input, equals m

∑ am, l r ( m+1-l ) *

*

= ∆m

l=0

Thus, we deduce that (except for a minus sign) the ratio of the upper output of stage m in the lattice predictor of Fig. 1at time m+1 to the lower output of this stage at time m equals the complex conjugate of the reflection coefficient κm+1 for stage m+1 in the lattice predictor. (e) Using the autocorrelation sequence { r ( 0 ), r ( 1 ), …, r ( m ) } as input into the lattice predictor of Fig. 1, we may thus compute the corresponding sequence of reflection coefficients of the predictor as follows: (i) At the input of the lattice predictor (i.e., m=0), the upper output equals r(1) at time 1, and the lower input equals r(0) at time 0. Hence, the ratio of these two outputs equals [r(1)/r(0)] = – κ *1 . (ii) The upper output of stage 1 at time 2 equals ∆ *1 , and the lower output of this stage at time 1 equals P1. Hence, the ratio of these two outputs equals ( ∆ *1 ⁄ P 1 ) = – κ *2 . (iii)The ratio of the upper output of stage 2 at time 3 to the lower output of this stage at time 2 equals ( ∆ *2 ⁄ P 2 ) = – κ *3 , and so on for the higher stages of the lattice predictor. 3.20

A lattice filter exhibits some interesting correlation properties between the forward and backward prediction errors developed at the various stages of the filter. Basically, these properties are consequences of the principle of orthogonality, as described below: Property 1. The forward prediction error fm(n) and the input signal u(n) are orthogonal: E [ f m ( n )u* ( n – k ) ] = 0,

1≤k≤m

(1)

Similarly, the backward prediction error bm(n) and the input signal u(n) are orthogonal: E [ b m ( n )u* ( n – k ) ] = 0,

1 ≤ k ≤ m-1

Note the difference between the ranges of the index k in Eqs. (1) and (2).

79

(2)

Equations (1) and (2) are both restatements of the principle of orthogonality. By definition, the forward prediction error fm(n) equals the difference between u(n) and the prediction of u(n), given the tap inputs u(n-1), u(n-2), ..., u(n-m). By the principle of orthogonality, the error fm(n) is orthogonal to u(n-k), k = 1, 2, ..., m. This proves Eq. (1). The backward prediction error bm(n), by definition, equals the difference between u(n-m) and the prediction of u(n-m), given the tap inputs u(n), u(n-1), ..., u(n-m+1). Here again, by the principle of orthogonality, the error bm(n) is orthogonal to u(n-k), k = 0, 1, ..., m-1. This proves Eq. (2).

Property 2. The cross-correlation of the forward prediction error fm(n) and the input u(n) equals the cross-correlation of the backward prediction error bm(n) and the time-shifted input u(n-m), as shown by

E[fm(n)u*(n)] = E[bm(n)u*(n-m)] = Pm          (3)

where Pm is the corresponding prediction-error power. To prove the first part of this property, we note that u(n) equals the forward prediction error fm(n) plus the prediction of u(n), given the samples u(n-1), u(n-2), ..., u(n-m). Since this prediction is orthogonal to the error fm(n) [which is a corollary to the principle of orthogonality], it follows that

E[fm(n)u*(n)] = E[fm(n)fm*(n)] = Pm

To prove the second part of the property, we note that u(n-m) equals the backward prediction error bm(n) plus the prediction of u(n-m), given the samples u(n), u(n-1), ..., u(n-m+1). Since this prediction is orthogonal to the error bm(n), it follows that

E[bm(n)u*(n-m)] = E[bm(n)bm*(n)] = Pm

This completes the proof of Eq. (3).

Property 3. The backward prediction errors are orthogonal to each other, as shown by

E[bm(n)bi*(n)] = Pm,  m = i
              = 0,   m ≠ i

The forward prediction errors do not, however, exhibit the same orthogonality property as the backward prediction errors; rather, they are correlated, as shown by

E[fm(n)fi*(n)] = Pm,   m ≥ i          (4)
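Before turning to the proofs, Properties 2 and 3 can be illustrated numerically. The sketch below is not part of the original solution; it assumes an AR(1) process u(n) = 0.8u(n-1) + v(n), for which the optimal forward and backward predictors of every order use the single coefficient 0.8, so the prediction errors can be written down directly and the expectations approximated by sample averages.

```python
import numpy as np

# AR(1) process: u(n) = 0.8 u(n-1) + v(n), v(n) white with unit variance.
rng = np.random.default_rng(0)
N = 200_000
v = rng.standard_normal(N)
u = np.zeros(N)
for n in range(1, N):
    u[n] = 0.8 * u[n - 1] + v[n]

# For an AR(1) process the optimal predictor of any order uses only the
# nearest sample; position t of each array corresponds to time n = t + 2.
f1 = u[2:] - 0.8 * u[1:-1]   # forward error  f1(n) = u(n)   - 0.8 u(n-1)
b1 = u[1:-1] - 0.8 * u[2:]   # backward error b1(n) = u(n-1) - 0.8 u(n)
b2 = u[:-2] - 0.8 * u[1:-1]  # backward error b2(n) = u(n-2) - 0.8 u(n-1)

# Property 3: backward errors of different orders are orthogonal.
print(np.mean(b1 * b2))                       # close to 0
# Property 2: E[f1(n)u(n)] equals the prediction-error power P1 = E[f1(n)^2].
print(np.mean(f1 * u[2:]), np.mean(f1 * f1))  # both close to 1
```

With 200,000 samples the sample cross-correlation of b1 and b2 is within a few thousandths of zero, in agreement with Property 3.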

Without loss of generality, we may assume that m > i. To prove the orthogonality relation of Property 3, we express the backward prediction error bi(n) in terms of the input u(n) as the convolution sum

bi(n) = Σ_{k=0}^{i} ai,i-k* u(n-k)          (5)

where the ai,i-k*, k = 0, 1, ..., i, are the coefficients of a backward prediction-error filter of order i. Hence, we may write

E[bm(n)bi*(n)] = E[ bm(n) Σ_{k=0}^{i} ai,i-k u*(n-k) ]          (6)

Now, by Property 1, we have

E[bm(n)u*(n-k)] = 0,   0 ≤ k ≤ m-1

For the case when m > i, and with 0 ≤ k ≤ i, we therefore find that all the expectation terms inside the summation on the right side of Eq. (6) are zero. Correspondingly,

E[bm(n)bi*(n)] = 0,   m ≠ i

When m = i, Eq. (6) reduces to

E[bm(n)bi*(n)] = E[bm(n)bm*(n)] = Pm,   m = i

This completes the proof of the first part of Property 3. To prove Eq. (4), we express the forward prediction error fi(n) in terms of the input u(n) as the convolution sum

fi(n) = Σ_{k=0}^{i} ai,k* u(n-k)          (7)

where the ai,k, k = 0, 1, ..., i, are the coefficients of a forward prediction-error filter of order i. Hence,

E[fm(n)fi*(n)] = E[ fm(n) Σ_{k=0}^{i} ai,k u*(n-k) ]
             = E[fm(n)u*(n)] + Σ_{k=1}^{i} ai,k E[fm(n)u*(n-k)]          (8)

where we have used the fact that ai,0 = 1. However, by Property 1, we have

E[fm(n)u*(n-k)] = 0,   1 ≤ k ≤ m

Also, by Property 2, we have

E[fm(n)u*(n)] = Pm

Therefore, Eq. (8) reduces to

E[fm(n)fi*(n)] = Pm,   m ≥ i

This completes the proof of Eq. (4).

Property 4. The time-shifted versions of the forward and backward prediction errors are orthogonal, as shown by, respectively,

E[fm(n)fi*(n-l)] = E[fm(n+l)fi*(n)] = 0,   1 ≤ l ≤ m-i,  m > i          (9)

E[bm(n)bi*(n-l)] = E[bm(n+l)bi*(n)] = 0,   0 ≤ l ≤ m-i-1,  m > i          (10)

where l is an integer lag.

To prove Eq. (9), we use Eq. (7) to write

E[fm(n)fi*(n-l)] = E[ fm(n) Σ_{k=0}^{i} ai,k u*(n-l-k) ]
               = Σ_{k=0}^{i} ai,k E[fm(n)u*(n-l-k)]          (11)

By Property 1, we have

E[fm(n)u*(n-l-k)] = 0,   1 ≤ l+k ≤ m          (12)

In the summation on the right side of Eq. (11) we have 0 ≤ k ≤ i. For the orthogonality relationship of Eq. (12) to hold for all values of k inside this range, the lag l must correspondingly satisfy the condition 1 ≤ l ≤ m-i. Thus, with the lag l bounded in this way, and with m > i, all the expectation terms inside the summation on the right side of Eq. (11) are zero. We therefore have

E[fm(n)fi*(n-l)] = 0,   1 ≤ l ≤ m-i,  m > i

By definition, we have

E[fm(n)fi*(n-l)] = E[fm(n+l)fi*(n)]

Therefore, if the expectation E[fm(n)fi*(n-l)] is zero, then so is the expectation E[fm(n+l)fi*(n)]. This completes the proof of Eq. (9).

To prove Eq. (10), we use Eq. (5) to write

E[bm(n)bi*(n-l)] = E[ bm(n) Σ_{k=0}^{i} ai,i-k u*(n-l-k) ]
               = Σ_{k=0}^{i} ai,i-k E[bm(n)u*(n-l-k)]          (13)

By Property 1, we have

E[bm(n)u*(n-l-k)] = 0,   0 ≤ l+k ≤ m-1          (14)

For the orthogonality relationship to hold for 0 ≤ k ≤ i, the lag l must satisfy the condition 0 ≤ l ≤ m-i-1. Then, with m > i, we find that all the expectations inside the summation on the right side of Eq. (13) are zero. We therefore have

E[bm(n)bi*(n-l)] = 0,   0 ≤ l ≤ m-i-1,  m > i

By definition, we have E [ b m ( n )b *i ( n – l ) ] = E [ b m ( n + l )b *i ( n ) ] Hence, if the expectation E [ b m ( n )b *i ( n – l ) ] is zero, then so is the expectation E [ b m ( n + l )b *i ( n ) ] . This completes the proof of Eq. (10). Property 5. The time-shifted forward prediction errors fm(n+m) and fi(n+i) are orthogonal, as shown by  P m=i E [ f m ( n – m ) f *i ( n + i ) ] =  m,  0, m ≠ i

(15)

The corresponding time-shifted backward prediction errors bm(n+m) and bi(n+i), on the other hand, are correlated as shown by E [ b m ( n + m )b *i ( n + i ) ] = P m,

m≥i

(16)

Equations (15) and (16) are the duals of Eqs. (3) and (4), respectively Without loss of generality, we may assume m > i. To prove Eq. (15), we first recognize that E [ f m ( n + m ) f *i ( n + i ) ] = E [ f m ( n ) f *i ( n – m + i ) ]

84

= E [ f m ( n ) f *i ( n – l ) ]

(17)

where l = m–i Therefore, with m > i, we find from Property 4 that the expectation in Eq. (17) is zero. When, however, m = i, the lag l is zero, and this expectation equals Pm, the mean-square value of fm(n). This completes the proof of Eq. (15). To prove Eq. (16) we recognize that E [ b m ( n + m )b *i ( n + i ) ] = E [ b m ( n )b *i ( n – m + i ) ] = E [ b m ( n )b *i ( n – l ) ]

(18)

where l = m–i The value of l lies outside the range for which the expectation in Eq. (18) is zero [see Eq. (10)]. This means that bm(n+m) and bi(n+i) are correlated. To determine this correlation, we use Eq. (5) to write i

E [ bm ( n

+ m )b *i ( n

+ i ) ] = E b m ( n + m ) ∑ a *i, i-k u * ( n + i – k ) k=0

= E [ b m ( n )u * ( n ) ] i-1

+

∑ a*i, i-k E [ bm ( n + m )u

*

(n + i – k)]

(19)

k=0

where we have used ai,0 = 1. By Property 1, we have *

*

E [ b m ( n + m )u ( n + i – k ) ] = E [ b m ( n )u ( n + i – k – m ) ] = 0,

0≤k+m–i≤m–1

85

(20)

The orthogonality relationship of Eq. (20) holds for i-m < k < i-1. The summation on the right side of Eq. (19) applies for 0 < k < i-1. Hence, with m > i, all the expectation terms inside this summation are zero. Correspondingly, Eq. (19) reduces to E [ b m ( n + m )b *i ( n + i ) ] = E [ b m ( n + m )u * ( n ) ] = E [ b m ( n )u * ( n – m ) ] = P m,

m≥i

where we have made use of Property 2. This completes the proof of Eq. (16).

Property 6. The forward and backward prediction errors exhibit the following cross-correlation property:

E[fm(n)bi*(n)] = κi* Pm,  m ≥ i
              = 0,       m < i          (21)

To prove this property for the case m ≥ i, we use Eq. (5) to write

E[fm(n)bi*(n)] = E[ fm(n) Σ_{k=0}^{i} ai,i-k u*(n-k) ]
              = ai,i E[fm(n)u*(n)] + Σ_{k=1}^{i} ai,i-k E[fm(n)u*(n-k)]          (22)

By Property 1, we have

E[fm(n)u*(n-k)] = 0,   1 ≤ k ≤ m

With m ≥ i, we therefore find that all the expectation terms in the second term (summation) on the right side of Eq. (22) are zero. Hence,

Σ_{k=1}^{i} ai,i-k E[fm(n)u*(n-k)] = 0   for m ≥ i

By Property 2, we have

E[fm(n)u*(n)] = Pm

Therefore, with ai,i = κi*, we find that Eq. (22) reduces to

E[fm(n)bi*(n)] = κi* Pm,   m ≥ i

For the case when m < i, we adapt Eq. (7) to write

E[fm(n)bi*(n)] = E[ Σ_{k=0}^{m} am,k* u(n-k) bi*(n) ]
              = Σ_{k=0}^{m} am,k* E[u(n-k)bi*(n)]          (23)

By Property 1, we have

E[bi(n)u*(n-k)] = 0,   0 ≤ k ≤ i-1

Therefore, with m < i, we find that all the expectation terms inside the summation on the right side of Eq. (23) are zero. Thus,

E[fm(n)bi*(n)] = 0,   m < i

This completes the proof of Eq. (21).

... > 1, and so we conclude that all the zeros of the transfer function Hf(z) must lie inside the unit circle. That is, Hf(z) is minimum phase.

8.5

For forward linear prediction we have:

(a) The data matrix is

A^H = [ u(M)    u(M+1)  ...  u(N-1)
        u(M-1)  u(M)    ...  u(N-2)
        ...
        u(1)    u(2)    ...  u(N-M) ]

The correlation matrix is Φ = A^H A.

(b) The desired response vector is

d^H = [u(M+1), u(M+2), ..., u(N)]

The cross-correlation vector is A^H d.

(c) The minimum value of Ef is

Ef,min = d^H d - d^H A (A^H A)^{-1} A^H d

8.6

For backward linear prediction we have:

(a) The data matrix is

A^H = [ u*(2)    u*(3)    ...  u*(N-M+1)
        u*(3)    u*(4)    ...  u*(N-M+2)
        ...
        u*(M+1)  u*(M+2)  ...  u*(N) ]

Note the difference between this data matrix and that for forward linear prediction. The correlation matrix is

Φ = A^H A

(b) The desired response vector is

d^H = [u*(1), u*(2), ..., u*(N-M)]

The cross-correlation vector is A^H d.

(c) The minimum value of Eb is

Eb,min = d^H d - d^H A (A^H A)^{-1} A^H d

8.7

The data matrix is

A^H = [u(M), u(M+1), ..., u(N)]

The desired response vector is

d^H = [d(M), d(M+1), ..., d(N)]

The cost function is

E(w) = d^H d - w^H z - z^H w + w^H Φw

where z is the cross-correlation vector

z = A^H d

and Φ is the correlation matrix

Φ = A^H A

Differentiating the cost function with respect to the weight vector:

∂E(w)/∂w = -2z + 2Φw

Setting this result equal to zero and solving for the optimum w, we get

ŵ = Φ^{-1} z = (A^H A)^{-1} A^H d
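The normal-equations solution just derived can be checked against a standard least-squares routine. The sketch below is illustrative only; the data matrix A and desired response d are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data matrix A (rows are tap-input vectors) and desired
# response vector d, as in the least-squares formulation above.
A = rng.standard_normal((50, 4))
d = A @ np.array([0.5, -1.0, 2.0, 0.25]) + 0.1 * rng.standard_normal(50)

# Normal-equations solution: w_hat = (A^H A)^{-1} A^H d
w_hat = np.linalg.solve(A.conj().T @ A, A.conj().T @ d)

# It coincides with the minimizer returned by a library least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(A, d, rcond=None)
print(np.allclose(w_hat, w_lstsq))  # True
```

Solving the normal equations directly is adequate here because A has full column rank; for ill-conditioned data matrices, the SVD-based route of Problems 8.11-8.13 is numerically preferable.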

8.8

The regularized cost function is

E_reg = Σ_{n=1}^{K} |w^H u(n)|² + λ(w^H s(θ) - 1) + δ||w||²

Taking the derivative of E_reg with respect to the weight vector w and setting the result equal to zero:

∂E_reg/∂w = Σ_{n=1}^{K} 2u(n)u^H(n)w + λs(θ) + 2δw = 0

Hence,

ŵ = -(λ/2) ( Σ_{n=1}^{K} u(n)u^H(n) + δI )^{-1} s(θ)          (1)

Since ŵ^H s(θ) = 1,

-λ = 2 ŵ^H ( Σ_{n=1}^{K} u(n)u^H(n) + δI ) ŵ          (2)

By virtue of Eq. (8.80) in the text, we have

Φŵ = -(1/2) λ s(θ)          (3)

Using Eqs. (1) and (2) in (3) and rearranging terms, we obtain

Φ = Σ_{n=1}^{K} u(n)u^H(n) + δI
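To see the closed form at work, the sketch below (illustrative only; the snapshots and steering vector are made up) solves Eq. (1) with the Lagrange multiplier fixed by the constraint ŵ^H s(θ) = 1, and checks that the resulting Φŵ is proportional to s(θ), in line with Eq. (3).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical snapshots u(n) (rows of U) and steering vector s.
M_taps = 4
U = rng.standard_normal((20, M_taps)) + 1j * rng.standard_normal((20, M_taps))
s = np.exp(-1j * 0.7 * np.arange(M_taps))
delta = 0.1

# Phi = sum_n u(n) u^H(n) + delta*I  (the bracketed matrix in Eq. (1))
Phi = U.conj().T @ U + delta * np.eye(M_taps)

# Enforcing w^H s = 1 fixes lambda, giving w_hat = Phi^{-1} s / (s^H Phi^{-1} s).
Phi_inv_s = np.linalg.solve(Phi, s)
w_hat = Phi_inv_s / (s.conj() @ Phi_inv_s)

print(np.isclose(w_hat.conj() @ s, 1.0))  # constraint w^H s = 1 satisfied
# Phi w_hat is proportional to s, i.e., Phi w_hat = -(1/2) lambda s(theta).
ratio = (Phi @ w_hat) / s
print(np.allclose(ratio, ratio[0]))       # all components equal
```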

8.9

Starting with the cost function

E = Σ_{i=M+1}^{N} ( |fM(i)|² + |bM(i)|² )

where

fM(i) = aM^H uM+1(i)
aM^H = [1, -w1, -w2, ..., -wM] = [1, -w^H]
uM+1^H(i) = [u(i), u(i-1), ..., u(i-M)]
bM(i) = aM^H uM+1^B(i)
uM+1^{BH}(i) = [u(i-M), u(i-M+1), ..., u(i)]

(a) Using these definitions, rewrite the cost function as

E = Σ_{i=M+1}^{N} ( aM^H uM+1(i) uM+1^H(i) aM + aM^H uM+1^B(i) uM+1^{BH}(i) aM )

By definition,

aM = [ 1; -w ]   and   uM+1(i) = [ u(i); uM(i-1) ]

Hence,

E = Σ_{i=M+1}^{N} ( |u(i)|² + w^H Φf w - θf^H w - w^H θf
                  + |u(i-M)|² + w^H Φb w - θb^H w - w^H θb )

where

Φf = Σ_{i=M+1}^{N} uM(i-1) uM^H(i-1)

θf = Σ_{i=M+1}^{N} uM(i-1) u*(i)

Φb = Σ_{i=M+1}^{N} uM^{B*}(i) uM^{BH}(i)

θb = Σ_{i=M+1}^{N} uM^{B*}(i) u(i-M)

Setting

∂E/∂w = 0

and solving for the optimum value of w, we obtain

ŵ = Φ^{-1} θ

where

Φ = Φf + Φb = Σ_{i=M+1}^{N} [ uM(i-1)uM^H(i-1) + uM^{B*}(i)uM^{BH}(i) ]

θ = θf + θb = Σ_{i=M+1}^{N} [ uM(i-1)u*(i) + uM^{B*}(i)u(i-M) ]

(b) Finally,

Emin = Ef,min + Eb,min = Σ_{i=M+1}^{N} [ |u(i)|² + |u(i-M)|² ] - θ^H ŵ

(c)

â = [ 1; -ŵ ],   Φâ = [ Emin; 0 ]

where

Φ = Φf + Φb = Σ_{i=M+1}^{N} [ uM(i-1)uM^H(i-1) + uM^{B*}(i)uM^{BH}(i) ]          (1)

Examining Eq. (1) and noting that

uM^H(i) = [u(i), u(i-1), ..., u(i-M)]
uM^{BH}(i) = [u(i-M), u(i-M+1), ..., u(i)]

we easily find that

φ(M-k, M-t) = φ*(t, k),   0 ≤ (k, t) ≤ M
φ(k, t) = φ*(t, k),       0 ≤ (k, t) ≤ M

8.10

The SIR maximization problem may be stated as follows:

max_w { (w^H s s^H w) / (w^H R w) }   subject to   C_{N-1}^H w = f_{N-1}

(a) Using Lagrange's method, the SIR maximization problem can be written as a minimization problem:

min_w J(w) = min_w { (1/2) w^H R w + λ^H (C_{N-1}^H w - f_{N-1}) }

where λ^H = [λ1, ..., λ_{N-1}] is the Lagrange vector (0 < λi < 1).

Setting ∂J(w)/∂w = 0, we obtain

Rw + C_{N-1} λ = 0

That is,

w_opt = -R^{-1} C_{N-1} λ          (1)

Since C_{N-1}^H w_opt = f_{N-1} = -C_{N-1}^H R^{-1} C_{N-1} λ, we have

λ = -[C_{N-1}^H R^{-1} C_{N-1}]^{-1} f_{N-1}          (2)

Using Eq. (2) in (1):

w_opt = R^{-1} C [C^H R^{-1} C]^{-1} f

(b) When w_opt = R^{-1} C [C^H R^{-1} C]^{-1} f, the SIR becomes

(w^H s s^H w) / (w^H R w)
  = ( f^H [C^H R^{-1} C]^{-H} C^H R^{-H} s s^H R^{-1} C [C^H R^{-1} C]^{-1} f )
    / ( f^H [C^H R^{-1} C]^{-H} C^H R^{-H} R R^{-1} C [C^H R^{-1} C]^{-1} f )

(c) When there are no auxiliary constraints, C_{N-1} = 0, the fixed value fo will only determine the normalization of the solution, yielding an unconditional maximum for the SIR:

w_opt = αR^{-1} s   (α is a constant)

SIR_max = ( s^H R^{-H} s s^H R^{-1} s ) / ( s^H R^{-H} s ) = s^H R^{-1} s

(d) The correlation matrix may be estimated as

R̂(n) = (1/n) Σ_{i=1}^{n} u(i)u^H(i)

or, recursively,

R̂(n) = λR̂(n-1) + u(n)u^H(n)

where 0 < λ < 1; the parameter λ is a weighting factor, not to be confused with the Lagrange multiplier.

8.11

(a) We are given

A = [ 1    -1
      0.5   2 ]

Therefore,

A^T A = [ 1   0.5 ] [ 1   -1 ]   =   [ 1.25  0 ]
        [ -1  2   ] [ 0.5  2 ]       [ 0     5 ]

This is a diagonal matrix. Hence, its eigenvalues are

λ1 = 1.25,   λ2 = 5

The singular values of matrix A are therefore

σ1 = √λ1 = √1.25 = √5/2,   σ2 = √λ2 = √5

The eigenvectors of A^T A are the right singular vectors of A. For the problem at hand, the eigenvectors of A^T A constitute a unit matrix. We therefore have

V = [ 1  0
      0  1 ]

(b) The matrix product AA^T is

AA^T = [ 1    -1 ] [ 1   0.5 ]   =   [ 2     -1.5  ]
       [ 0.5   2 ] [ -1  2   ]       [ -1.5   4.25 ]

The eigenvalues of AA^T are the roots of the characteristic equation

(2 - λ)(4.25 - λ) - (1.5)² = 0

Expanding this equation:

λ² - 6.25λ + 6.25 = 0

Solving for the roots of this quadratic equation:

λ1 = 1.25,   λ2 = 5

The singular values of the data matrix A are therefore σ1 = √5/2 and σ2 = √5, which are identical to the values calculated in part (a). To find the eigenvectors of AA^T, we note that

[ 2     -1.5  ] [ q11 ]   =   1.25 [ q11 ]
[ -1.5   4.25 ] [ q12 ]            [ q12 ]

Hence,

0.75 q11 - 1.5 q12 = 0

or, equivalently, q11 = 2 q12. Setting q11² + q12² = 1, we find that

q12 = ±1/√5   and   q11 = ±2/√5

We may thus write

q1 = (1/√5) [ 2
              1 ]

Similarly, we may show that

q2 = (1/√5) [ -1
               2 ]

The eigenvectors of AA^T are the left singular vectors of A. Choosing the signs so that A = UΣV^T holds, we may express the left singular vectors of matrix A as

U = (1/√5) [ 2  -1
             1   2 ]

The pseudoinverse of the matrix A is given by

A+ = V Σ^{-1} U^T
   = [ 1  0 ] [ 2/√5  0    ] (1/√5) [ 2   1 ]
     [ 0  1 ] [ 0     1/√5 ]        [ -1  2 ]
   = (1/5) [ 4   2 ]
           [ -1  2 ]

8.12

Given the 2 x 2 complex matrix

A = [ 1 + j      1 + 0.5j ]
    [ 0.5 - j    1 - j    ]

we have

A^H A = [ 3.25  3    ],   AA^H = [ 3.25   3j   ]
        [ 3     3.25 ]           [ -3j    3.25 ]

The eigenvalues and eigenvectors are found in the usual way by first solving |A - λI| = 0 for the eigenvalues λi, i = 1, 2, and then using these values in

A [ x1; x2 ] = λi [ x1; x2 ]

to determine the eigenvectors x. The two eigenvalues in descending order, for both A^H A and AA^H, are λ1 = 6.25 and λ2 = 0.25. Regarding the eigenvectors, we have for A^H A

V = [ x   y ]  =  (1/√2) [ 1   1 ]
    [ x  -y ]            [ 1  -1 ]

where x and y are arbitrary real numbers, and we arbitrarily make the simplest choice that gives us an orthonormal set of vectors. For AA^H we similarly have

U = [ x    y  ]  =  (1/√2) [ 1   1 ]
    [ -jx  jy ]            [ -j  j ]

for another arbitrary choice of x and y. The SVD of A is given by

A = UΣV^H

where the right singular matrix V is the eigen-matrix of A^H A and the left singular matrix U is the eigen-matrix of AA^H. The singular values are the square roots of the common eigenvalues: σ1 = √λ1 = 2.5 and σ2 = √λ2 = 0.5, with

Σ = [ σ1  0
      0   σ2 ]

Interestingly enough,

UΣV^H = B = [ 1.5   1     ]
            [ -j    -1.5j ]

where, obviously, B ≠ A, but B^H B = A^H A and BB^H = AA^H. In order to have B = A, the arbitrariness in choosing the eigenvectors of A^H A and AA^H above should be removed. The natural constraint is that A = UΣV^H holds true. Starting from that premise, we take V as is and define U using

U = AVΣ^{-1} = [ 0.5657 + j0.4243    j0.7071 ]
               [ 0.4243 - j0.5657   -0.7071  ]

We can easily verify that the new matrix U is indeed composed of eigenvectors of AA^H as determined above. Clearly, the new U is related to the old one through a unitary transformation. This problem illustrates the need for a less arbitrary procedure for determining the SVD of a matrix, even in the simplest of cases, with a 2 x 2 matrix.

8.13

We are given

A = [ 2   3
      1   2
      -1  1 ]

We note that

A^T = [ 2  1  -1
        3  2   1 ]

Next, we set up

A^T A = [ 6   7  ]
        [ 7   14 ]

The eigenvalues of A^T A are the roots of the characteristic equation

(6 - λ)(14 - λ) - (7)² = 0

or

λ² - 20λ + 35 = 0

Solving for the roots of this equation, we get λ = 10 ± √65. That is,

λ1 = 10 - √65 ≈ 1.94,    σ1 = √λ1 ≈ 1.393
λ2 = 10 + √65 ≈ 18.06,   σ2 = √λ2 ≈ 4.25

To find the eigenvectors of A^T A, we note that

[ 6  7  ] [ q11 ]   =   1.94 [ q11 ]
[ 7  14 ] [ q12 ]            [ q12 ]

Hence,

4.06 q11 + 7 q12 = 0   or   q12 = -0.58 q11

We also note that q11² + q12² = 1. Therefore,

q11² + (0.58)² q11² = 1

or

q11 = ±1/√1.336 ≈ ±0.87

Hence, q12 ≈ ∓0.505. Similarly, we may show that the eigenvector associated with λ2 = 18.06 is defined by

q21 = -0.505,   q22 = -0.87

Therefore, the right singular vectors of the data matrix A constitute the matrix

V = [ 0.87    -0.505 ]
    [ -0.505  -0.87  ]          (1)

Next, we set up

AA^T = [ 13  8  1 ]
       [ 8   5  1 ]
       [ 1   1  2 ]

The eigenvalues of AA^T are the roots of the third-order characteristic equation

| 13-λ  8    1   |
| 8     5-λ  1   |  =  0
| 1     1    2-λ |

or

λ(λ² - 20λ + 35) = 0

The eigenvalues of AA^T are therefore λ0 = 0, λ1 ≈ 1.94, and λ2 ≈ 18.06. The two nonzero eigenvalues of AA^T are the same as those found for A^T A. To find the eigenvectors of AA^T, we note that for λ1 = 1.94:

[ 13  8  1 ] [ q11 ]          [ q11 ]
[ 8   5  1 ] [ q12 ]  =  1.94 [ q12 ]
[ 1   1  2 ] [ q13 ]          [ q13 ]

that is,

11.06 q11 + 8 q12 + q13 = 0
8 q11 + 3.06 q12 + q13 = 0
q11 + q12 + 0.06 q13 = 0

Using the first two equations to eliminate q13:

3.06 q11 + 4.94 q12 = 0,   hence q12 ≈ -0.619 q11

Using the first and third equations to eliminate q12:

3.06 q11 + 0.52 q13 = 0,   hence q13 ≈ -5.88 q11

Next, we note that q11² + q12² + q13² = 1. Therefore,

q11² + 0.383 q11² + 34.574 q11² = 1,   that is, 36 q11² ≈ 1,   so q11 ≈ ±0.167

Correspondingly, q12 ≈ ∓0.109 and q13 ≈ ∓0.98. We may thus set

q1 ≈ [ 0.167, -0.109, -0.981 ]^T          (2)

For the eigenvalue λ2 = 18.06, we may write

[ 13  8  1 ] [ q21 ]           [ q21 ]
[ 8   5  1 ] [ q22 ]  =  18.06 [ q22 ]
[ 1   1  2 ] [ q23 ]           [ q23 ]

that is,

-5.06 q21 + 8 q22 + q23 = 0
8 q21 - 13.06 q22 + q23 = 0
q21 + q22 - 16.06 q23 = 0

Using the first two equations to eliminate q23: -13.06 q21 + 21.06 q22 = 0, so that q22 = 0.62 q21. Using the first and third equations to eliminate q22: -13.06 q21 + 129.48 q23 = 0, so that q23 ≈ 0.101 q21. We next note that q21² + q22² + q23² = 1. We therefore obtain

q21² + 0.384 q21² + 0.01 q21² = 1,   so q21 ≈ ±0.845

Hence, q22 ≈ ±0.524 and q23 ≈ ±0.085. We may thus set

q2 ≈ [ -0.845, -0.524, -0.085 ]^T          (3)

Note that q2 is orthogonal to q1, as it should be; that is, q1^T q2 = 0.

To complete the eigencomputation, we need to determine the eigenvector associated with the zero eigenvalue of matrix AA^T. Here we note

[ 13  8  1 ]
[ 8   5  1 ] q0 = 0,   λ0 = 0
[ 1   1  2 ]

where ||q0|| = 1. Solving for q0, we get

q0 = [ 0.5071, -0.8452, 0.169 ]^T          (4)

Putting together Eqs. (2) through (4) for the eigenvectors q1, q2, and q0, we may express the left singular vectors of the data matrix A as the matrix

U = [ 0.167   -0.845   0.5071  ]
    [ -0.109  -0.524  -0.8452  ]
    [ -0.981  -0.085   0.169   ]

where the third column corresponds to the zero eigenvalue of AA^T. The singular value decomposition of the data matrix A may therefore be expressed as

A = [ 0.167   -0.845   0.5071  ] [ 1.393  0    ]
    [ -0.109  -0.524  -0.8452  ] [ 0      4.25 ] [ 0.87    -0.505 ]
    [ -0.981  -0.085   0.169   ] [ 0      0    ] [ -0.505  -0.87  ]

  = [ 0.167   -0.845 ]
    [ -0.109  -0.524 ] [ 1.393  0    ] [ 0.87    -0.505 ]
    [ -0.981  -0.085 ] [ 0      4.25 ] [ -0.505  -0.87  ]

As a check, carrying out the matrix multiplication given here, we get

A ≈ [ 2.016     3.0069 ]
    [ 0.9925    2.0142 ]
    [ -1.0065   1.0044 ]

which is very close to the original data matrix.

(a) The pseudoinverse of matrix A is

A+ = Σ_{i=1}^{2} (1/σi) vi ui^T

(b) The least-squares weight vector is

ŵ = A+ d = Σ_{i=1}^{2} (ui^T d / σi) vi = (u1^T d / σ1) v1 + (u2^T d / σ2) v2

Evaluating this expression for the desired response vector d specified in the problem yields

ŵ = [ 0.3859 ]
    [ 0.3859 ]

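The hand computations in Problems 8.11 and 8.13 can be cross-checked with a numerical SVD routine. The sketch below is not part of the original solution:

```python
import numpy as np

# Matrix from Problem 8.11.
A1 = np.array([[1.0, -1.0],
               [0.5,  2.0]])
s1 = np.linalg.svd(A1, compute_uv=False)
print(s1)                  # [sqrt(5), sqrt(5)/2] = [2.236..., 1.118...]
print(np.linalg.pinv(A1))  # equals (1/5) [[4, 2], [-1, 2]]

# Matrix from Problem 8.13.
A2 = np.array([[ 2.0, 3.0],
               [ 1.0, 2.0],
               [-1.0, 1.0]])
s2 = np.linalg.svd(A2, compute_uv=False)
print(s2)                  # approximately [4.25, 1.393]
```

The singular values agree with the analytic values √(10 ± √65), and since A1 is square and nonsingular, its pseudoinverse coincides with its ordinary inverse.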

8.14

First we set up the following identifications:

Least-squares              Normalized LMS
Data matrix A              x^H(n)
Desired data vector d      e*(n)
Parameter vector w         c(n+1)
Eigenvalue σi²             ||x(n)||²
Eigenvector ui             1

Hence, the application of the linear least-squares solution yields

c(n+1) = (1/||x(n)||²) x(n) e*(n)

That is to say,

δŵ(n+1) = (1/(||u(n)||² + a)) u(n) e*(n)

Correspondingly, we may write

ŵ(n+1) = ŵ(n) + (μ/(||u(n)||² + a)) u(n) e*(n)

where μ is the step-size parameter.

8.15

We are given an SVD computer that may be pictured as follows:

Data matrix A  →  SVD Computer  →  singular values: σ1, σ2, ..., σW
                                   left singular vectors: u1, u2, ..., uK
                                   right singular vectors: v1, v2, ..., vM

W: rank of data matrix A
K: number of rows of A
M: number of columns of A

The MVDR spectrum is defined by

S_MVDR(ω) = 1 / (s^H(ω) Φ^{-1} s(ω))          (1)

where s(ω) is the steering vector, and Φ^{-1} is the inverse of the correlation matrix Φ. Specifically, Φ is defined in terms of the data matrix A by

Φ = A^H A

where

A = Σ_{i=1}^{W} σi ui vi^H

A^H = Σ_{i=1}^{W} σi vi ui^H

That is,

Φ = Σ_{i=1}^{W} σi² vi vi^H,   ||ui||² = 1 for all i

Correspondingly, we may express the inverse matrix Φ^{-1} as

Φ^{-1} = Σ_{i=1}^{W} (1/σi²) vi vi^H          (2)

Hence, the denominator of the MVDR spectrum may be expressed as

s^H(ω) Φ^{-1} s(ω) = Σ_{i=1}^{W} (1/σi²) s^H(ω) vi vi^H s(ω)
                   = Σ_{i=1}^{W} |zi(ω)|²          (3)

where zi(ω) is a frequency-dependent scalar that is defined by the inner product

zi(ω) = (1/σi) s^H(ω) vi,   i = 1, 2, ..., W          (4)

Accordingly, Eq. (1) may be rewritten as

S_MVDR(ω) = 1 / Σ_{i=1}^{W} |zi(ω)|²          (5)

Thus, to evaluate the MVDR spectrum, we may proceed as follows:

• Compute the SVD of the data matrix A.
• Use the right singular vectors vi and the corresponding singular values σi in Eq. (4) to evaluate zi(ω) for i = 1, 2, ..., W, where W is the rank of A.
• Use Eq. (5) to evaluate the MVDR spectrum.


CHAPTER 9

9.1

Assume that

β(n, i) = λ(i) β(n, i-1),   i = 1, ..., n

Hence, for i = n:

β(n, n) = λ(n) β(n, n-1)

Since β(n, n) = 1, we have

λ(n) = β^{-1}(n, n-1)

Next, for i = n-1,

β(n, n-1) = λ(n-1) β(n, n-2)

or, equivalently,

β(n, n-2) = λ^{-1}(n-1) β(n, n-1)
          = λ^{-1}(n-1) λ^{-1}(n)

Proceeding in this way, we may thus write

β(n, i) = λ^{-1}(i+1) ··· λ^{-1}(n-1) λ^{-1}(n) = Π_{k=i+1}^{n} λ^{-1}(k)

For β(n, i) to equal λ^{n-i}, we must have

λ^{-1}(i+1) ··· λ^{-1}(n-1) λ^{-1}(n) = λ^{n-i}

This requirement is satisfied by choosing λ(k) = λ^{-1} for all k. We thus find that

β(n, i) = λ ··· λ λ   (n-i terms)
        = λ^{n-i}

9.2

The matrix inversion lemma states that if

A = B^{-1} + CD^{-1}C^H          (1)

then

A^{-1} = B - BC(D + C^H BC)^{-1} C^H B          (2)

To prove this lemma, we multiply Eq. (1) by (2):

AA^{-1} = (B^{-1} + CD^{-1}C^H)[B - BC(D + C^H BC)^{-1} C^H B]
        = B^{-1}B - B^{-1}BC(D + C^H BC)^{-1} C^H B
          - CD^{-1}C^H BC(D + C^H BC)^{-1} C^H B + CD^{-1}C^H B

We have to show that AA^{-1} = I. Since (D + C^H BC)^{-1}(D + C^H BC) = I and B^{-1}B = I, we may rewrite this result as

AA^{-1} = I - C(D + C^H BC)^{-1} C^H B
          + CD^{-1}(D + C^H BC)(D + C^H BC)^{-1} C^H B
          - CD^{-1}C^H BC(D + C^H BC)^{-1} C^H B
        = I - [C - CD^{-1}(D + C^H BC) + CD^{-1}C^H BC](D + C^H BC)^{-1} C^H B
        = I - (C - CD^{-1}D)(D + C^H BC)^{-1} C^H B

Since D^{-1}D = I, the second term in this last line is zero; hence,

AA^{-1} = I

which is the desired result.

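The lemma is also easy to verify numerically. The sketch below uses made-up matrix sizes and real-valued matrices (so C^H becomes C^T):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 5, 2

# Random positive-definite B and D, and an arbitrary C.
B = (lambda X: X @ X.T + n * np.eye(n))(rng.standard_normal((n, n)))
D = (lambda X: X @ X.T + m * np.eye(m))(rng.standard_normal((m, m)))
C = rng.standard_normal((n, m))

# A = B^{-1} + C D^{-1} C^H
A = np.linalg.inv(B) + C @ np.linalg.inv(D) @ C.T

# Lemma: A^{-1} = B - B C (D + C^H B C)^{-1} C^H B
A_inv_lemma = B - B @ C @ np.linalg.inv(D + C.T @ B @ C) @ C.T @ B

print(np.allclose(A_inv_lemma, np.linalg.inv(A)))  # True
```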

9.3

We are given

Φ(n) = δI + u(n)u^H(n)          (1)

Let

A = B^{-1} + CD^{-1}C^H          (2)

Then, according to the matrix inversion lemma,

A^{-1} = B - BC(D + C^H BC)^{-1} C^H B          (3)

From Eqs. (1) and (2), we note

A = Φ(n),   B^{-1} = δI,   C = u(n),   D = I

Hence, using Eq. (3):

Φ^{-1}(n) = (1/δ)I - (1/δ²) u(n) (1 + (1/δ)u^H(n)u(n))^{-1} u^H(n)
          = (1/δ)[ I - u(n)u^H(n)/(δ + u^H(n)u(n)) ]

9.4

From Section 9.6, we have

ŵ(n) = ŵ(n-1) + k(n)ξ*(n)          (1)

Define

ε(n) = wo - ŵ(n)

We may thus write

ε(n) = ε(n-1) - k(n)ξ*(n)          (2)

Since

ξ(n) = d(n) - ŵ^H(n-1)u(n)
eo(n) = d(n) - wo^H u(n)

we may expand Eq. (2) as follows:

ε(n) = ε(n-1) - k(n)(d(n) - ŵ^H(n-1)u(n))*
     = ε(n-1) - k(n)(eo(n) + ε^H(n-1)u(n))*
     = ε(n-1) - k(n)eo*(n) - k(n)u^H(n)ε(n-1)
     = [1 - k(n)u^H(n)]ε(n-1) - k(n)eo*(n)

Hence,

a(n) = 1 - k(n)u*(n),   |a(n)| < 1

provided that 0 < |k(n)u*(n)| < 1. Then, under this condition, ε(n) is guaranteed to decrease with increasing n. The convergence process is perturbed by the white noise eo(n).

9.5

From Eq. (9.25) of the text:

ŵ(n) = ŵ(n-1) + k(n)ξ*(n)

where we have

ŵ(1) = ŵ(0) + k(1)ξ*(1)
ŵ(2) = ŵ(1) + k(2)ξ*(2)
...
ŵ(n) = ŵ(n-1) + k(n)ξ*(n)

Summing the above n equations, we have

ŵ(n) = ŵ(0) + Σ_{i=1}^{n} k(i)ξ*(i)

Hence,

ε(n) = wo - ŵ(n)
     = wo - ŵ(0) - Σ_{i=1}^{n} k(i)ξ*(i)
     = ε(0) - Σ_{i=1}^{n} k(i)ξ*(i)

9.6

The a posteriori estimation error e(n) and the a priori estimation error ξ(n) are related by

e(n) = γ(n)ξ(n)

where γ(n) is defined by [see Eq. (9.42) of the text]

γ(n) = 1 / (1 + λ^{-1} u^H(n) Φ^{-1}(n-1) u(n))

Hence, for n = 1 we have

e(1) = γ(1)ξ(1)

where

γ(1) = 1 / (1 + λ^{-1} u^H(1) Φ^{-1}(0) u(1)),   Φ^{-1}(0) = δ^{-1} I
     = 1 / (1 + λ^{-1} δ^{-1} u^H(1)u(1))
     = λδ / (λδ + ||u(1)||²)
     ≈ λδ / ||u(1)||²,   δ << ||u(1)||²

(a) For n > M+1,

E[Φ^{-1}(n)] = (1/(n-M+1)) R^{-1}

Usually n >> M-1. Hence,

E[Φ^{-1}(n)] ≈ (1/n) R^{-1}

(b) By definition, D(n) = E[ε^H(n)ε(n)] and K(n) = E[ε(n)ε^H(n)]; thus

D(n) = tr[K(n)]

From Eq. (9.58), we may obtain

K(n) = σo² E[Φ^{-1}(n)] = (1/n) σo² R^{-1}

Hence,

D(n) = (1/n) σo² tr[R^{-1}] = (1/n) σo² Σ_{i=1}^{M} (1/λi)

9.9

Consider the difference

X(n) = ∈^H(n)Φ(n)∈(n) - ∈^H(n-1)Φ(n-1)∈(n-1)          (1)

From RLS theory:

∈(n) = ∈(n-1) - Φ^{-1}(n)u(n)ξ*(n)          (2)

We may therefore rewrite Eq. (1) as

X(n) = [∈(n-1) - Φ^{-1}(n)u(n)ξ*(n)]^H Φ(n) [∈(n-1) - Φ^{-1}(n)u(n)ξ*(n)]
       - ∈^H(n-1)Φ(n-1)∈(n-1)
     = ∈^H(n-1)[Φ(n) - Φ(n-1)]∈(n-1)
       - ξ(n)u^H(n)Φ^{-1}(n)Φ(n)∈(n-1)
       - ξ*(n)∈^H(n-1)Φ(n)Φ^{-1}(n)u(n)
       + |ξ(n)|² u^H(n)Φ^{-1}(n)Φ(n)Φ^{-1}(n)u(n)
     = ∈^H(n-1)[Φ(n) - Φ(n-1)]∈(n-1)
       - ξ(n)u^H(n)∈(n-1) - ξ*(n)∈^H(n-1)u(n)
       + |ξ(n)|² u^H(n)Φ^{-1}(n)u(n)          (3)

Moreover, from RLS theory:

Φ(n) = Φ(n-1) + u(n)u^H(n),   λ = 1          (4)

r(n) = 1 + u^H(n)Φ^{-1}(n-1)u(n),   λ = 1          (5)

Hence, using Eqs. (4) and (5) in (3) yields

X(n) = ∈^H(n-1)u(n)u^H(n)∈(n-1)
       - ξ(n)u^H(n)∈(n-1) - ξ*(n)∈^H(n-1)u(n)
       + |ξ(n)|²(1 - r^{-1}(n))          (6)

Again, from RLS theory:

∈^H(n-1)u(n) = ξu(n)          (7)

ν(n) = ξ(n) - ξu(n)          (8)

Hence, substituting Eqs. (7) and (8) into Eq. (6), we finally get

X(n) = |ξu(n)|² - ξ(n)ξu*(n) - ξ*(n)ξu(n) + |ξ(n)|²(1 - r^{-1}(n))
     = |ξ(n) - ξu(n)|² - r^{-1}(n)|ξ(n)|²
     = |ν(n)|² - r^{-1}(n)|ξ(n)|²          (9)

Recalling the definition of X(n) given in Eq. (1), we may thus write

∈^H(n)Φ(n)∈(n) - ∈^H(n-1)Φ(n-1)∈(n-1) = |ν(n)|² - r^{-1}(n)|ξ(n)|²

or, equivalently,

∈^H(n)Φ(n)∈(n) + |ξ(n)|²/r(n) = ∈^H(n-1)Φ(n-1)∈(n-1) + |ν(n)|²

which is the desired result.
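The identity just derived can be checked numerically by running the λ = 1 RLS recursions on made-up data; the model w_o, signal sizes, and noise level below are arbitrary, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
M, N, delta = 3, 100, 1.0
w_o = np.array([1.0, -0.5, 0.25])  # hypothetical "true" weight vector

Phi = delta * np.eye(M)            # Phi(0) = delta*I
w_hat = np.zeros(M)
lhs_prev = (w_o - w_hat) @ Phi @ (w_o - w_hat)
ok = True
for n in range(N):
    u = rng.standard_normal(M)
    nu = 0.1 * rng.standard_normal()        # disturbance nu(n)
    d = w_o @ u + nu
    r = 1.0 + u @ np.linalg.inv(Phi) @ u    # r(n) = 1 + u^H Phi^{-1}(n-1) u
    Phi = Phi + np.outer(u, u)              # Phi(n) = Phi(n-1) + u u^H
    xi = d - w_hat @ u                      # a priori estimation error
    w_hat = w_hat + np.linalg.solve(Phi, u) * xi   # Eq. (2) rearranged
    eps = w_o - w_hat
    # Identity: eps^H Phi eps + xi^2/r = previous value + nu^2
    ok &= np.isclose(eps @ Phi @ eps + xi**2 / r, lhs_prev + nu**2)
    lhs_prev = eps @ Phi @ eps
print(ok)  # True
```

The check holds to machine precision at every step, since the identity is a purely algebraic consequence of the recursions (4), (5), and (2).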

CHAPTER 10

10.1

Let

α(1) = y(1)          (1)

α(2) = y(2) + A1,1 y(1)          (2)

where the matrix A1,1 is to be determined. This matrix is chosen so as to make the innovations processes α(1) and α(2) uncorrelated with each other. That is,

E[α(2)α^H(1)] = 0          (3)

Substitute Eqs. (1) and (2) into (3):

E[y(2)y^H(1)] + A1,1 E[y(1)y^H(1)] = 0

Postmultiplying both sides of this equation by the inverse of E[y(1)y^H(1)] and rearranging:

A1,1 = -E[y(2)y^H(1)] {E[y(1)y^H(1)]}^{-1}

We may rewrite Eqs. (1) and (2) in the compact form

[ α(1) ]   [ I     0 ] [ y(1) ]
[ α(2) ] = [ A1,1  I ] [ y(2) ]

This relation shows that, given the observation vectors y(1) and y(2), we may compute the innovations processes α(1) and α(2). The block lower triangular transformation matrix

[ I     0 ]
[ A1,1  I ]

is invertible since its determinant equals 1. Hence, we may recover y(1) and y(2) from α(1) and α(2) by using the relation

[ y(1) ]   [ I     0 ]^{-1} [ α(1) ]     [ I      0 ] [ α(1) ]
[ y(2) ] = [ A1,1  I ]      [ α(2) ]  =  [ -A1,1  I ] [ α(2) ]

In general, we may express the innovations process α(n) as a linear combination of the observation vectors y(1), y(2), ..., y(n) as follows:

α(n) = y(n) + A_{n-1,1} y(n-1) + ... + A_{n-1,n-1} y(1)
     = Σ_{k=1}^{n} A_{n-1,k-1} y(n-k+1),   n = 1, 2, ...,

where A_{n-1,0} = I. The set of matrices {A_{n-1,k}} is chosen to satisfy the conditions

E[α(n+1)α^H(n)] = 0,   n = 1, 2, ...

We may thus write

[ α(1) ]   [ I            0            ...  0 ] [ y(1) ]
[ α(2) ] = [ A1,1         I            ...  0 ] [ y(2) ]
[ ...  ]   [ ...          ...          ...  ...] [ ... ]
[ α(n) ]   [ A_{n-1,n-1}  A_{n-1,n-2}  ...  I ] [ y(n) ]

The block lower triangular transformation matrix is invertible since its determinant equals one. Hence, we may go back and forth between the given set of observation vectors {y(1), y(2), ..., y(n)} and the corresponding set of innovations processes {α(1), α(2), ..., α(n)} without any loss of information.

10.2

First, we note that

E[ε(n, n-1)v1^H(n)] = E[x(n)v1^H(n)] - E[x̂(n|y_{n-1})v1^H(n)]

Since the estimate x̂(n|y_{n-1}) consists of a linear combination of the observation vectors y(1), ..., y(n-1), and since

E[y(k)v1^H(n)] = 0,   0 ≤ k ≤ n

it follows that

E[x̂(n|y_{n-1})v1^H(n)] = 0

We also have

E[x(n)v1^H(n)] = Φ(n, 0)E[x(0)v1^H(n)] + Σ_{i=1}^{n-1} Φ(n, i)E[v1(i)v1^H(n)]

Since, by hypothesis,

E[x(0)v1^H(n)] = 0

and

E[v1(i)v1^H(n)] = 0,   0 ≤ i ≤ n

it follows that

E[x(n)v1^H(n)] = 0

Accordingly, we deduce that

E[ε(n, n-1)v1^H(n)] = 0

Next, we note that

E[ε(n, n-1)v2^H(n)] = E[x(n)v2^H(n)] - E[x̂(n|y_{n-1})v2^H(n)]

We have

E[x(n)v2^H(n)] = 0

Also, since x̂(n|y_{n-1}) consists of a linear combination of y(1), ..., y(n-1), and since

E[y(k)v2^H(n)] = 0,   1 ≤ k ≤ n-1

it follows that

E[x̂(n|y_{n-1})v2^H(n)] = 0

We therefore conclude that

E[ε(n, n-1)v2^H(n)] = 0

10.3

The estimated state-error vector equals

ε(i, n) = x(i) - x̂(i|y_n)
        = x(i) - Σ_{k=1}^{n} bi(k)α(k)

The expected value of the squared norm of ε(i, n) equals

E[||ε(i, n)||²] = E[ε^H(i, n)ε(i, n)]
  = Σ_{k=1}^{n} Σ_{l=1}^{n} bi^H(k) bi(l) E[α*(k)α(l)]
    - Σ_{k=1}^{n} bi^H(k) E[x(i)α*(k)]
    - Σ_{k=1}^{n} E[x^H(i)α(k)] bi(k) + E[x^H(i)x(i)]

Differentiating this index of performance with respect to the vector bi(k) and setting the result equal to zero, we find that the optimum value of bi(k) is determined by

2bi(k)E[α(k)α*(k)] - 2E[x(i)α*(k)] = 0

Hence, the optimum value of bi(k) equals

bi(k) = E[x(i)α*(k)] σα^{-2}

where

σα² = E[|α(k)|²]

Correspondingly, the estimate of the state vector equals

x̂(i|y_n) = Σ_{k=1}^{n} E[x(i)α*(k)] σα^{-2} α(k)
         = Σ_{k=1}^{n} E[x(i)ϕ*(k)] ϕ(k)

where ϕ(k) is the normalized innovation

ϕ(k) = α(k)/σα

10.4

(a) The matrices K(n,n-1) and Q2(n) are both correlation matrices and therefore nonnegative definite. In particular, we note that
K(n,n-1) = E[ε(n,n-1) ε^H(n,n-1)]
Q2(n) = E[v2(n) v2^H(n)]
where ε(n,n-1) is the predicted state-error vector and v2(n) is the measurement noise vector. We may therefore express R(n) in the equivalent form
R(n) = E[e(n) e^H(n)],   where e(n) = C(n) ε(n,n-1) + v2(n)
and where it is noted that ε(n,n-1) and v2(n) are uncorrelated. We now see that R(n) is a correlation matrix and therefore nonnegative definite.
(b) For R(n) to be nonsingular, and therefore for the inverse matrix R^{-1}(n) to exist, we demand that Q2(n) be positive definite, so that the determinant of
C(n) K(n,n-1) C^H(n) + Q2(n)
is nonzero. This requirement, in effect, says that no measurement is exact; hence the unavoidable presence of measurement noise and therefore of Q2(n). Such a requirement is reasonable on physical grounds.

10.5

In the limit, as n approaches infinity, we may put
K(n+1,n) = K(n,n-1) = K
Under this condition, Eq. (10.54) of the text simplifies to
K = (I - GC) K (I - C^H G^H) + Q1 + G Q2 G^H     (1)
where it is assumed that the state-transition matrix equals the identity matrix, and G, C, Q1 and Q2 are the limiting values of the matrices G(n), C(n), Q1(n) and Q2(n), respectively. From Eq. (10.49) we find that the limiting value of the Kalman gain equals
G = K C^H (C K C^H + Q2)^{-1}     (2)
Expanding Eq. (1) and then using Eq. (2) to eliminate G, we get
K C^H (C K C^H + Q2)^{-1} C K - Q1 = O
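A scalar sketch (my own, not from the text) makes this concrete: iterating Eq. (1) with C = 1 drives K to its fixed point, after which the limiting relation above is satisfied.

```python
# Scalar sketch: iterate the steady-state relation of Eq. (1) with C = 1,
# then verify that K C (C K C + Q2)^-1 C K - Q1 vanishes at the fixed point.
q1, q2 = 1.0, 0.5
k = 1.0
for _ in range(200):
    g = k / (k + q2)                             # limiting Kalman gain, Eq. (2)
    k = (1 - g) * k * (1 - g) + q1 + g * q2 * g  # Eq. (1)

residual = k * k / (k + q2) - q1                 # should be ~0 in steady state
print(k, residual)
```

For these values the fixed point is k = (1 + sqrt(3))/2, the positive root of k^2 - q1*k - q1*q2 = 0.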

10.6

x(n+1) = F x(n) + v1(n),   F = [0 1; 1 1],   Q1 = I,   E[v1] = 0
y(n) = C x(n) + v2(n),     C = [1 0],        Q2 = 1,   E[v2] = 0

(a) Using Table 10.2, we formulate the recursions for computing the Kalman filter.
Known parameters:
F(n+1,n) = F = [0 1; 1 1],   C(n) = C = [1 0],   Q1(n) = I,   Q2(n) = 1
Unknown parameter:
K(n,n-1) = K = [k11 k12; k21 k22]
(time arguments of the k_ij are suppressed where no confusion can arise). We may then write

G(n) = F K(n,n-1) C^T [C K(n,n-1) C^T + 1]^{-1}
     = 1/(k11+1) [k21; k11+k21]

α(n) = y(n) - [1 0] x̂(n|Y_{n-1}) = y(n) - x̂1(n|Y_{n-1})

x̂(n+1|Y_n) = F x̂(n|Y_{n-1}) + G(n) α(n), that is, componentwise,

x̂1(n+1|Y_n) = x̂2(n|Y_{n-1}) + [k21/(k11+1)] [y(n) - x̂1(n|Y_{n-1})]
x̂2(n+1|Y_n) = x̂1(n|Y_{n-1}) + x̂2(n|Y_{n-1}) + [(k11+k21)/(k11+1)] [y(n) - x̂1(n|Y_{n-1})]     (1)

K(n) = K(n,n-1) - F^{-1} G(n) C K(n,n-1),   with F^{-1} G(n) = 1/(k11+1) [k11; k21],
so that

K(n) = [k11/(k11+1)    k12/(k11+1)
        k21/(k11+1)    k22 - k21 k12/(k11+1)]

K(n+1,n) = F K(n) F^T + I     (2)

For this particular case, the Kalman filtering process is entirely described by the pair of equations (1) and (2).

(b) The generalized form of the algebraic Riccati equation is
K - F K F^H + F K C^H (C K C^H + Q2)^{-1} C K F^H - Q1 = 0
where we use the fact that the predicted state-error correlation matrix K is symmetric. For our problem:
F = [0 1; 1 1],   C = [1 0],   Q1 = I,   Q2 = [1],   K = [k1 k2; k2 k3]
Therefore, we have
F K F^T = [k3        k2+k3
           k2+k3     k1+2k2+k3]
C K C^T + Q2 = k1 + 1
F K C^T = [k2; k1+k2]
F K C^T (k1+1)^{-1} C K F^T = 1/(k1+1) [k2^2          k2(k1+k2)
                                        k2(k1+k2)     (k1+k2)^2]
Substituting these into the Riccati equation and setting the individual elements to zero:
(1,2):  -k3 + k2(k1+k2)/(k1+1) = 0   ⇒   k3 = k2(k1+k2)/(k1+1)
(1,1):  k1 - k3 - 1 + k2^2/(k1+1) = 0, which with the above value of k3 reduces to
        -k1 k2/(k1+1) + k1 - 1 = 0   ⇒   k2 = (k1^2 - 1)/k1
(2,2):  (k1+k2)^2/(k1+1) - k1 - 2k2 - 1 = 0   ⇒   k2^2 - 2k2 - 2k1 - 1 = 0
Eliminating k2 between the last two relations:
[(k1^2-1)/k1]^2 - 2(k1^2-1)/k1 - 2k1 - 1 = 0
k1^4 - 2k1^2 + 1 - 2k1^3 + 2k1 - 2k1^3 - k1^2 = 0
k1^4 - 4k1^3 - 3k1^2 + 2k1 + 1 = 0
This equation has four different real-valued solutions, but it is easy to show that only one of them (the one yielding a positive definite K) is meaningful in the context of our problem.
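The meaningful solution can be checked numerically. The sketch below is my own (not part of the solution): it iterates the Riccati recursion of part (a) to steady state and confirms that the limiting k1 = K(1,1) satisfies the quartic above.

```python
import numpy as np

# Numerical check: iterate the Riccati recursion for F = [[0,1],[1,1]],
# C = [1,0], Q1 = I, Q2 = 1, and verify the quartic in k1 at convergence.
F = np.array([[0.0, 1.0], [1.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q1 = np.eye(2)
Q2 = np.array([[1.0]])

K = np.eye(2)                               # arbitrary initial K(1,0)
for _ in range(500):
    S = C @ K @ C.T + Q2                    # innovation variance
    G = F @ K @ C.T @ np.linalg.inv(S)      # Kalman gain, as in part (a)
    Kf = K - np.linalg.inv(F) @ G @ C @ K   # filtered error covariance K(n)
    K = F @ Kf @ F.T + Q1                   # predicted covariance K(n+1,n)

k1 = K[0, 0]
residual = k1**4 - 4 * k1**3 - 3 * k1**2 + 2 * k1 + 1
print(k1, residual)
```

The iteration converges (the pair (F, C) is observable), and the limiting k1 lies between 4 and 5.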


10.7

(a) The state equations are
x(n+1) = F(n+1,n) x(n) + v1(n)     (1)
y(n) = C(n) x(n) + v2(n)           (2)
We are given
x(n) = [a1(n),...,aM(n),...,aM+N(n)]^T
C(n) = [-y(n-1),...,-y(n-M), v(n-1),...,v(n-N)]
We are also given
a_k(n+1) = a_k(n) + w_k(n),   k = 1,...,M+N
which means that
x(n+1) = x(n) + w(n)               (3)
Comparing Eqs. (1) and (3), we therefore deduce that for the problem at hand
F(n+1,n) = I   and   v1(n) = w(n)
We next note that the difference equation describing the given ARMA process may be recast as
y(n) = -Σ_{k=1}^{M} a_k(n) y(n-k) + Σ_{k=1}^{N} a_{M+k}(n) v(n-k) + v(n)
     = C(n) x(n) + v(n)            (4)
Comparing Eqs. (2) and (4), we therefore deduce
y(n) = y(n),   v2(n) = v(n)
This completes the evaluation of the state equations.
(b) Since the transition matrix equals the identity matrix, we find that the predicted and filtered forms of the state vector are the same. Hence,
G(n) = K(n,n-1) C^H(n) [C(n) K(n,n-1) C^H(n) + Q2(n)]^{-1}
α(n) = y(n) - C(n) x̂(n|Y_{n-1})
x̂(n+1|Y_n) = x̂(n|Y_{n-1}) + G(n) α(n)
x̂(n|Y_n) = x̂(n+1|Y_n)
K(n) = K(n,n-1) - G(n) C(n) K(n,n-1)
K(n+1,n) = K(n) + Q1(n)
(c) To initialize the algorithm, we set
x̂(0|Y_0) = E[x(0)],   K(0) = E[x(0) x^H(0)]

10.8

We are given the state equations
x(n+1) = A x(n) + b v(n)     (1)
y(n) = h^T x(n) + w(n)       (2)
We may therefore make the following identifications with the standard state-space model:
F(n+1,n) = A,   v1(n) = b v(n),   C(n) = h^T,   v2(n) = w(n)
Using the Kalman filtering algorithm, we have
G(n) = A K(n,n-1) h [h^T K(n,n-1) h + σw^2]^{-1}
α(n) = y(n) - h^T x̂(n|Y_{n-1})
x̂(n+1|Y_n) = A x̂(n|Y_{n-1}) + G(n) α(n)
K(n) = K(n,n-1) - K(n,n-1) h [h^T K(n,n-1) h + σw^2]^{-1} h^T K(n,n-1)
K(n+1,n) = A K(n) A^T + σv^2 b b^T
(The filtered estimate x̂(n|Y_n) follows from x̂(n|Y_{n-1}) by adding the filtered-gain-weighted innovation K(n,n-1) h [h^T K(n,n-1) h + σw^2]^{-1} α(n).)
We note the following:
1. The factor (h^T K(n,n-1) h + σw^2) is a scalar.
2. The Kalman gain G(n) is a vector.
3. The matrix A is M-by-M; the matrix K(n,n-1) is also M-by-M; the vector h is M-by-1. Hence, the Kalman gain is an M-by-1 vector.
The dynamics of the message source are represented by the state equation (1), where the M-by-M matrix A and the M-by-1 vector b are defined by
A = the down-shift matrix, with 1's along the first subdiagonal and 0's elsewhere
b = [1, 0, ..., 0]^T
The elements of the state vector x(n) consist of the channel input, represented by u(n) = v(n), and successively delayed versions of it; v(n) is modeled as a random binary white-noise sequence. Equation (1) simply states that each succeeding component at time n is equal to the previous component at time n-1. The channel output is described by Eq. (2), where y(n) is the measured output, h^T is the 1-by-M row vector of channel coefficients, and w(n) is a Gaussian white-noise sequence independent of v(n).
For the digital communication system described by Eqs. (1) and (2), use of the Kalman filter yields
x̂(n+1|Y_n) = A x̂(n|Y_{n-1}) + G(n) α(n)
where G(n) is the Kalman gain, and
α(n) = y(n) - h^T x̂(n|Y_{n-1})
The resulting general form of the Kalman filter is depicted in Fig. 1. We see from this figure that the Kalman filter consists of an IIR filter with forward coefficients defined by the channel impulse response and feedback coefficients defined by the Kalman gain (which in this problem is a column vector).
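The recursion just described can be sketched numerically. The following is my own illustration with an assumed order M = 3 and hypothetical channel coefficients h (not taken from the text); it runs the one-step predictor for the shift-register state model and checks that the predicted error covariance stays symmetric and positive semidefinite.

```python
import numpy as np

# Sketch (assumed M = 3, hypothetical h): Kalman one-step predictor for
# x(n+1) = A x(n) + b v(n), y(n) = h^T x(n) + w(n), with A the down-shift
# matrix and b = [1,0,...,0]^T, as in the text.
rng = np.random.default_rng(1)
M = 3
A = np.diag(np.ones(M - 1), -1)            # down-shift matrix
b = np.zeros((M, 1)); b[0, 0] = 1.0
h = np.array([[0.5], [1.0], [0.3]])        # hypothetical channel coefficients
sv2, sw2 = 1.0, 0.1                        # variances of v(n) and w(n)

x = np.zeros((M, 1)); xhat = np.zeros((M, 1)); K = np.eye(M)
for n in range(200):
    x = A @ x + b * rng.standard_normal()                 # true state
    y = (h.T @ x).item() + np.sqrt(sw2) * rng.standard_normal()
    R = (h.T @ K @ h).item() + sw2         # innovation variance (a scalar)
    G = A @ K @ h / R                      # Kalman gain (an M-by-1 vector)
    xhat = A @ xhat + G * (y - (h.T @ xhat).item())
    Kf = K - (K @ h) @ (h.T @ K) / R       # filtered error covariance
    K = A @ Kf @ A.T + sv2 * (b @ b.T)     # predicted error covariance

eigs = np.linalg.eigvalsh((K + K.T) / 2)
print(eigs.min())
```

The filtered covariance Kf is a Schur-complement-type downdate of K, so it (and hence K) remains positive semidefinite throughout.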

[Figure 1: the Kalman filter drawn as an IIR structure. Forward path: a delay line of z^{-1} elements with taps h0, h1,...,hM-1 (the channel impulse response) forms the estimate h^T x̂(n|Y_{n-1}), which is subtracted from the channel output y(n) to produce the innovation α(n). Feedback path: gains g1, g2,...,gM (the elements of the Kalman gain) feed the innovation back into the state estimate, delivering x̂_M(n+1|Y_n); the mean of the input v(n) is given to be zero.]

10.9

We start with
α(n) = y(n) - C(n) x̂(n|Y_{n-1})     (1)
x̂(n+1|Y_n) = F(n+1,n) x̂(n|Y_{n-1}) + G(n) α(n)     (2)
Substituting Eq. (1) into (2):
x̂(n+1|Y_n) = F(n+1,n) x̂(n|Y_{n-1}) + G(n) [y(n) - C(n) x̂(n|Y_{n-1})]
The filtered state estimate is
x̂(n|Y_n) = F(n,n+1) x̂(n+1|Y_n)
          = x̂(n|Y_{n-1}) + F(n,n+1) G(n) [y(n) - C(n) x̂(n|Y_{n-1})]
          = [I - F(n,n+1) G(n) C(n)] x̂(n|Y_{n-1}) + F(n,n+1) G(n) y(n)
But
y(n) = C(n) x(n) + v2(n)
Hence,
x̂(n|Y_n) = [I - F(n,n+1) G(n) C(n)] x̂(n|Y_{n-1}) + F(n,n+1) G(n) C(n) x(n) + F(n,n+1) G(n) v2(n)
Taking expectations, and recognizing that the measurement noise vector v2(n) has zero mean:
E[x̂(n|Y_n)] = [I - F(n,n+1) G(n) C(n)] E[x̂(n|Y_{n-1})] + F(n,n+1) G(n) C(n) E[x(n)]
For n = 1, we thus get
E[x̂(1|Y_1)] = [I - F(1,2) G(1) C(1)] E[x̂(1|Y_0)] + F(1,2) G(1) C(1) E[x(1)]
Since the one-step prediction x̂(1|Y_0) must be specified, we have
E[x̂(1|Y_0)] = x̂(1|Y_0)
Accordingly, substituting the choice of initial condition x̂(1|Y_0) = E[x(1)] yields
E[x̂(1|Y_1)] = [I - F(1,2) G(1) C(1)] E[x(1)] + F(1,2) G(1) C(1) E[x(1)] = E[x(1)]
By induction, we may go on to show that, in general,
E[x̂(n|Y_n)] = E[x(n)]
In other words, the filtered estimate x̂(n|Y_n) produced by the Kalman filter is an unbiased estimate for the specified method of initialization.

10.10 MAP derivation of the Kalman filter
(a)
f_X(x(n)|Y_n) = f_XY(x(n), Y_n) / f_Y(Y_n) = f_XY(x(n), y(n), Y_{n-1}) / f_Y(y(n), Y_{n-1})

(1)
where
f_XY(x(n), y(n), Y_{n-1}) = f_Y(y(n)|x(n), Y_{n-1}) f_XY(x(n), Y_{n-1})
                          = f_Y(y(n)|x(n), Y_{n-1}) f_X(x(n)|Y_{n-1}) f_Y(Y_{n-1})
                          = f_Y(y(n)|x(n)) f_X(x(n)|Y_{n-1}) f_Y(Y_{n-1})     (2)
where we have used the fact that y(n) = C(n) x(n) + v2(n), and v2(n) does not depend on Y_{n-1}; consequently,
f_Y(y(n)|x(n), Y_{n-1}) = f_Y(y(n)|x(n))
Substituting Eq. (2) into (1), we thus get
f_X(x(n)|Y_n) = f_Y(y(n)|x(n)) f_X(x(n)|Y_{n-1}) f_Y(Y_{n-1}) / [f_Y(y(n)|Y_{n-1}) f_Y(Y_{n-1})]
              = f_Y(y(n)|x(n)) f_X(x(n)|Y_{n-1}) / f_Y(y(n)|Y_{n-1})     (3)
(b) Examine first f_Y(y(n)|x(n)); its mean is
E[y(n)|x(n)] = E[C(n) x(n) + v2(n) | x(n)] = C(n) x(n)
and its covariance is
var[y(n)|x(n)] = var[v2(n)|x(n)] = Q2(n)
where Q2(n) is the correlation matrix of the measurement noise v2(n). Thus, assuming Gaussianity, we may write
f_Y(y(n)|x(n)) = A1 exp{ -(1/2) (y(n) - C(n)x(n))^H Q2^{-1}(n) (y(n) - C(n)x(n)) }     (4)
where the constant A1 is a proper scaling factor. Consider next f_X(x(n)|Y_{n-1}); its mean is
E[x(n)|Y_{n-1}] = E[F(n,n-1) x(n-1) + v1(n-1) | Y_{n-1}] = F(n,n-1) x̂(n-1|Y_{n-1}) = x̂(n|Y_{n-1})
and its covariance is
var[x(n)|Y_{n-1}] = var[x(n) - x̂(n|Y_{n-1})] = var[ε(n,n-1)]
where ε(n,n-1) is the predicted state-error vector. Denote this covariance by K(n,n-1), which is to be determined. Again assuming Gaussianity, we may write
f_X(x(n)|Y_{n-1}) = A2 exp{ -(1/2) (x(n) - x̂(n|Y_{n-1}))^H K^{-1}(n,n-1) (x(n) - x̂(n|Y_{n-1})) }     (5)
where A2 is another appropriate scaling factor. Thus, substituting Eqs. (4) and (5) into (3), we get
f_X(x(n)|Y_n) = A exp{ -(1/2) (y(n) - C(n)x(n))^H Q2^{-1}(n) (y(n) - C(n)x(n))
                       -(1/2) (x(n) - x̂(n|Y_{n-1}))^H K^{-1}(n,n-1) (x(n) - x̂(n|Y_{n-1})) }     (6)
where A = A1 A2 is another constant.
(c) By definition, the MAP estimate of the state is defined by the condition
∂ ln f_X(x(n)|Y_n) / ∂x(n) = 0   at   x(n) = x̂_MAP(n)     (7)
Hence, substituting Eq. (6) into (7) yields
x̂_MAP(n) = [C^H(n) Q2^{-1}(n) C(n) + K^{-1}(n,n-1)]^{-1} [K^{-1}(n,n-1) x̂(n|Y_{n-1}) + C^H(n) Q2^{-1}(n) y(n)]     (8)
From a computational point of view, we need to put the first inverse matrix into a more convenient form. To that end, we apply the matrix inversion lemma, which may be stated as follows (see Section 9.2 of the text). If
A = B^{-1} + C D^{-1} C^H
then
A^{-1} = B - B C (D + C^H B C)^{-1} C^H B
With the expression [C^H(n) Q2^{-1}(n) C(n) + K^{-1}(n,n-1)]^{-1} as the issue of concern, we note the following:
B = K(n,n-1),   C = C^H(n),   D = Q2(n)
Hence, applying the matrix inversion lemma:
[C^H(n) Q2^{-1}(n) C(n) + K^{-1}(n,n-1)]^{-1}
 = K(n,n-1) - K(n,n-1) C^H(n) (Q2(n) + C(n) K(n,n-1) C^H(n))^{-1} C(n) K(n,n-1)     (9)
Substituting Eq. (9) into (8), and then going through some lengthy but straightforward algebraic manipulations, we get
x̂_MAP(n) = x̂(n|Y_{n-1}) + G(n) [y(n) - C(n) x̂(n|Y_{n-1})]     (10)
where G(n) is the Kalman gain defined by
G(n) = F(n+1,n) K(n,n-1) C^H(n) [C(n) K(n,n-1) C^H(n) + Q2(n)]^{-1}     (11)
The one issue yet to be determined is K(n,n-1). Here we note
ε(n,n-1) = x(n) - x̂(n|Y_{n-1})
         = F(n,n-1) x(n-1) + v1(n-1) - F(n,n-1) x̂_MAP(n-1)
         = F(n,n-1) ε_MAP(n-1) + v1(n-1)
Therefore,
K(n,n-1) = F(n,n-1) K(n-1) F^H(n,n-1) + Q1(n-1)     (12)
where K(n) = var[ε_MAP(n)]. There only remains for us to determine K(n-1). Here we note
ε_MAP(n) = x(n) - x̂_MAP(n) = x(n) - x̂(n|Y_{n-1}) - G(n) [y(n) - C(n) x̂(n|Y_{n-1})]
But y(n) = C(n) x(n) + v2(n). Hence, noting that ε(n,n-1) = x(n) - x̂(n|Y_{n-1}),
ε_MAP(n) = ε(n,n-1) - G(n) [C(n) ε(n,n-1) + v2(n)]
         = [I - G(n) C(n)] ε(n,n-1) - G(n) v2(n)
which yields
K(n) = var[ε_MAP(n)] = [I - G(n)C(n)] K(n,n-1) [I - G(n)C(n)]^H + G(n) Q2(n) G^H(n)
After some manipulations, this formula reduces to
K(n) = K(n,n-1) - F(n,n+1) G(n) C(n) K(n,n-1)     (13)
The algorithm for computing the MAP estimate x̂_MAP(n) is now complete; it is made up of Eqs. (10) through (13). Indeed, comparing these equations with the Kalman filtering algorithm summarized in Table 10.2 of the text, we find that the MAP estimate x̂_MAP(n) is nothing but the filtered estimate x̂(n) in standard Kalman filter theory.
(d) The second derivative of ln f_X(x(n)|Y_n) with respect to x(n) is
-[C^H(n) Q2^{-1}(n) C(n) + K^{-1}(n,n-1)]
which is negative definite. Hence, the stationary point defined by Eq. (7) is indeed a maximum, and the condition of the problem is satisfied by the MAP estimate x̂_MAP(n).

10.11
Given a system
x(n+1) = F x(n)
y(n) = C x(n)
(a)

Show that
x̂(n|Y_n) = F(I - G(n)C) x̂(n|Y_{n-1}) + C G(n) y(n)
α(n) = y(n) - C x̂(n|Y_{n-1})
By definition, the innovation is
α(n) = y(n) - ŷ(n|Y_{n-1})
Following the same reasoning used to derive Eq. (10.31) of the text, we have
ŷ(n|Y_{n-1}) = C x̂(n|Y_{n-1})
Therefore,
α(n) = y(n) - C x̂(n|Y_{n-1})
which confirms the second equation of the problem. Next, y(n) = C x(n) implies
α(n) = C [x(n) - x̂(n|Y_{n-1})] = C ε(n,n-1)
R(n) = E[α(n) α^H(n)] = C E[ε(n,n-1) ε^H(n,n-1)] C^H = C K(n,n-1) C^H
Similarly to Eq. (10.45) of the text, we can write
x̂(n+1|Y_n) = F x̂(n|Y_{n-1}) + G(n) α(n)
G(n) = E[x(n+1) α^H(n)] R^{-1}(n) = F K(n,n-1) C^H R^{-1}(n)
Here we have used Eqs. (10.44)-(10.49) of the text as a model. Moreover,
ε(n+1,n) = [F - G(n)C] ε(n,n-1)
K(n+1,n) = E[ε(n+1,n) ε^H(n+1,n)] = [F - G(n)C] K(n,n-1) [F^H - C^H G^H(n)] = F K(n) F^H
K(n) = (I - F^{-1} G(n) C) K(n,n-1)
Similarly to Eq. (10.58) of the text,
x̂(n|Y_n) = F^{-1} x̂(n+1|Y_n) = x̂(n|Y_{n-1}) + F^{-1} G(n) α(n)
          = x̂(n|Y_{n-1}) + F^{-1} G(n) [y(n) - C x̂(n|Y_{n-1})]
          = (I - F^{-1} G(n) C) x̂(n|Y_{n-1}) + F^{-1} G(n) y(n)
(Note: An error was made in the first printing of the book; the first equation of the problem should read as above.)
(b) The innovations sequence α(1), α(2),...,α(n) consists of samples that are orthogonal to each other; see Property 2, p. 467 of the text. Orthogonality is synonymous with whiteness here. Moreover, invoking the very definition of the innovations process, we may refer to the Kalman filter as a whitening filter.
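The whitening property can be illustrated numerically. The sketch below is my own, using an assumed noisy scalar state-space model (rather than the noise-free one above, which would be degenerate): the sample autocorrelation of the Kalman innovations at nonzero lag is negligible.

```python
import numpy as np

# Sketch (assumed scalar AR(1) model with process and measurement noise):
# the Kalman innovations alpha(n) form a white sequence.
rng = np.random.default_rng(2)
f, c, q1, q2 = 0.9, 1.0, 1.0, 0.5
N = 20000

x, xhat, k = 0.0, 0.0, 1.0
alphas = np.empty(N)
for n in range(N):
    x = f * x + np.sqrt(q1) * rng.standard_normal()
    y = c * x + np.sqrt(q2) * rng.standard_normal()
    r = c * k * c + q2                   # innovation variance
    g = f * k * c / r                    # one-step-prediction Kalman gain
    alphas[n] = y - c * xhat             # innovation
    xhat = f * xhat + g * alphas[n]
    k = f * k * f - g * c * k * f + q1   # Riccati update for predicted variance

a = alphas - alphas.mean()
rho1 = (a[:-1] * a[1:]).sum() / (a * a).sum()   # lag-1 sample autocorrelation
print(abs(rho1))
```

For a correctly specified model, the lag-1 autocorrelation is of order 1/sqrt(N), i.e., indistinguishable from zero here.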

 H 10.12 K XY ( n ) = E  [ x ( n ) – xˆ ( n Y n-1 ) ] [ y ( n ) – yˆ ( n Y n-1 ) ]     H K YY ( n ) = E  [ y ( n ) – yˆ ( n Y n-1 ) ] [ y ( n ) – yˆ ( n Y n-1 ) ]    –1

Show that G f ( n ) = K XY ( n )K YY ( n ) The innovation vector α ( n ) = y ( n ) – yˆ ( n Y n-1 ) - by definition. H

Therefore, K YY ( n ) = E [ α ( n )α ( n ) ] = R ( n ) Also, by definition, –1

H

–1

G f ( n ) = F ( n+1, n )G ( n ) = F ( n, n+1 )E [ x ( n+1 )α ( n ) ]R ( n )

  H H H E [ x ( n+1 )α ( n ) ] = F ( n+1, n )E [ x ( n )α ( n ) ] = F ( n+1, n )E  [ x ( n )-xˆ ( n Y n-1 ) ]α ( n )  ˜ ˜ ˜   = F ( n+1, n )K XY ( n )

228

H

Here we have used the fact that the estimate xˆ ( n Y n-1 ) is orthogonal to α ( n ) . ˜ Therefore, adding it inside the expectation term does not change the expectation value. Using the results obtained: –1

–1

G f ( n ) = F ( n, n-1 )F ( n+1, n )K XY ( n )R ( n ) = K XY ( n )K YY ( n ) 10.13 Let us consider the unforced dynamical model x ( n+1 ) = F ( n+1, n )x ( n )

(1)
y(n) = u^H(n) x(n) + v(n)     (2)
where F(n+1,n) = λ^{-1/2} I. From Eq. (1) we can see that
x(n) = λ^{-n/2} x(0)     (3)
(a) Considering a deterministic multiple regression model d(n) = e_o(n) + w_o^H u(n), we can write
d*(0) = u^H(0) w_o + e_o*(0)
d*(1) = u^H(1) w_o + e_o*(1)
...
d*(n) = u^H(n) w_o + e_o*(n)
which represents a deterministic system of linear equations.
(b) From Eqs. (1), (2) and (3), we have
y(n) = λ^{-n/2} u^H(n) x(0) + v(n),   n = 0, 1, 2, ...
or, equivalently,
λ^{n/2} y(n) = u^H(n) x(0) + λ^{n/2} v(n),   n = 0, 1, 2, ...
which represents a stochastic system of linear equations.
(c) Both the stochastic and the deterministic systems of linear simultaneous equations describe the same problem, and therefore should have a common solution. Comparing the two systems, we can set
x(0) = w_o,   y(n) = λ^{-n/2} d*(n),   v(n) = λ^{-n/2} e_o*(n)

10.14
The reason for the RLS algorithm operating satisfactorily is the fact that the minimum sum of error squares follows the recursion (see Eq. (9.30))
E_min(n) = λ E_min(n-1) + ξ(n) e*(n)
Hence, with λ in the interval 0 < λ < 1, the stored value of the minimum error energy, E_min(n-1), is reduced by the factor λ as the minimum value is recursively updated.

10.15
Consider first the correspondence between x̂(1|Y_0) and ŵ(0). For the one-step prediction, we have the correspondence
x̂(n+1|Y_n):   λ^{-n/2} ŵ(n)
Therefore, setting n = 0, we see that x̂(1|Y_0) in the Kalman filter corresponds to ŵ(0) in the RLS algorithm. Consider next the correlation matrix of the error in state prediction, for which we have the correspondence
K(n):   λ^{-1} P(n)
Putting n = 0, we readily see that K(0) in the Kalman filter corresponds to λ^{-1} P(0) in the RLS algorithm.

10.16
The condition number of K(n) is
χ(K) = λ_max / λ_min
Given that K(n) = U(n) D(n) U^H(n), we may write

χ(K) = χ(U D U^H) ≤ χ(U) χ(D) χ(U^H)     (1)
The eigenvalues of a triangular matrix are the same as the diagonal elements of the matrix. For the situation at hand, the upper triangular matrix U(n) has 1's for all its diagonal elements. Hence, the λ_max and λ_min of U(n) are both equal to one, and so
χ(U) = χ(U^H) = 1
Accordingly, Eq. (1) simplifies to
χ(D) ≥ χ(K)

10.17
Let K(n) = K^{1/2}(n) K^{H/2}(n). Hence,
χ(K) ≤ χ(K^{1/2}) χ(K^{H/2}) = (χ(K^{1/2}))^2
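A quick numerical check (my own sketch, not part of the original solution): for a Cholesky-type square root K = L L^H, the singular values of L are the square roots of the eigenvalues of K, so the bound above is in fact attained with equality in the 2-norm.

```python
import numpy as np

# Check: for positive definite K with Cholesky factor K = L L^H,
# cond(L) = sqrt(cond(K)) in the 2-norm.
rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
K = M @ M.T + 0.1 * np.eye(5)      # a positive definite "covariance" matrix
L = np.linalg.cholesky(K)          # square root in the sense K = L L^H

chi_K = np.linalg.cond(K)          # 2-norm condition number
chi_L = np.linalg.cond(L)
print(chi_K, chi_L ** 2)
```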

The implication of this result is that the condition number of the square root K^{1/2}(n) is the square root of the condition number of the original matrix K(n).

10.18
Adding the known vector d(n) to the state equation is equivalent to assuming that the state noise vector v1(n) has a mean equal to d(n). In general, we have

x̂(n+1|Y_n) = F(n+1,n) x̂(n|Y_n) + v̂1(n|Y_n)
If, therefore, v1(n) has mean d(n), it follows that
x̂(n+1|Y_n) = F(n+1,n) x̂(n|Y_n) + d(n)

CHAPTER 11

11.1

To derive the square-root information filter, we proceed as follows. Let the pre-array be

A(n) = [ λ^{1/2} K^{-H/2}(n-1)               λ^{1/2} u(n)
         x̂^H(n|Y_{n-1}) K^{-H/2}(n-1)        y*(n)
         0^T                                  1        ]

Postmultiplying A(n) by its Hermitian transpose A^H(n):

A(n)A^H(n) =
[ λK^{-1}(n-1) + λu(n)u^H(n)                          λ^{1/2}[K^{-1}(n-1)x̂(n|Y_{n-1}) + u(n)y(n)]     λ^{1/2}u(n)
  λ^{1/2}[x̂^H(n|Y_{n-1})K^{-1}(n-1) + y*(n)u^H(n)]   x̂^H(n|Y_{n-1})K^{-1}(n-1)x̂(n|Y_{n-1}) + |y(n)|^2   y*(n)
  λ^{1/2}u^H(n)                                        y(n)                                              1    ]

This result pertains to the pre-array of the square-root information filter. Consider next the post-array, whose Hermitian transpose has the form

B^H(n) = [ B11^H(n)     0
           b21^H(n)     b22*(n)
           b31^H(n)     b32*(n) ]

Equating terms in A(n)A^H(n) to the corresponding terms in the product of the post-array with its own Hermitian transpose, we get:

1. B11^H(n)B11(n) = λK^{-1}(n-1) + λu(n)u^H(n) = K^{-1}(n) = K^{-H/2}(n)K^{-1/2}(n)
Hence,
B11^H(n) = K^{-H/2}(n)

2. B11^H(n)b21(n) = λ^{1/2}[K^{-1}(n-1)x̂(n|Y_{n-1}) + u(n)y(n)] = K^{-1}(n)x̂(n+1|Y_n)
Hence,
b21(n) = K^{-1/2}(n)x̂(n+1|Y_n)

3. B11^H(n)b31(n) = λ^{1/2}u(n)
Hence,
b31(n) = λ^{1/2}K^{H/2}(n)u(n)

4. b31^H(n)b31(n) + |b32(n)|^2 = 1
Hence,
|b32(n)|^2 = 1 - λu^H(n)K(n)u(n) = r^{-1}(n)
That is,
b32(n) = r^{-1/2}(n)

5. b21^H(n)b21(n) + |b22(n)|^2 = x̂^H(n|Y_{n-1})K^{-1}(n-1)x̂(n|Y_{n-1}) + |y(n)|^2
Hence,
|b22(n)|^2 = x̂^H(n|Y_{n-1})K^{-1}(n-1)x̂(n|Y_{n-1}) + |y(n)|^2 - x̂^H(n+1|Y_n)K^{-1}(n)x̂(n+1|Y_n)     (1)

6. b31^H(n)b21(n) + b32*(n)b22(n) = y(n)
Hence,
r^{-1/2}(n)b22(n) = y(n) - λ^{1/2}u^H(n)K^{1/2}(n)K^{-1/2}(n)x̂(n+1|Y_n)
That is,
b22(n) = r^{1/2}(n)[y(n) - λ^{1/2}u^H(n)x̂(n+1|Y_n)]
But we know that
x̂(n+1|Y_n) = λ^{-1/2}x̂(n|Y_{n-1}) + α(n)g(n)
where g(n) is the Kalman gain and α(n) is the innovation. Therefore,
b22(n) = r^{1/2}(n)[y(n) - u^H(n)x̂(n|Y_{n-1}) - λ^{1/2}α(n)u^H(n)g(n)]
       = r^{1/2}(n)[α(n) - λ^{1/2}α(n)u^H(n)g(n)]
       = r^{1/2}(n)α(n)[1 - λ^{1/2}u^H(n)g(n)]
But
λ^{1/2}g(n) = K(n)u(n)r^{-1}(n)
Therefore,
b22(n) = r^{1/2}(n)α(n)[1 - u^H(n)K(n)u(n)r^{-1}(n)]
       = r^{-1/2}(n)α(n)[r(n) - u^H(n)K(n)u(n)]
       = r^{-1/2}(n)α(n)     (2)
where we have used r(n) = 1 + u^H(n)K(n)u(n).
Final Check: In a laborious way, it can be shown that Eq. (1) is satisfied exactly by the value of b22(n) defined in Eq. (2).

11.2

Consider a linear dynamical system described by the state-space model:
x(n+1) = λ^{-1/2} x(n)     (1)
y(n) = u^H(n) x(n) + v(n)     (2)
where v(n) is a Gaussian variable of zero mean and variance Q(n). The Kalman filtering algorithm for the model is described by:
R(n) = u^H(n) K(n-1) u(n) + Q(n)
g(n) = λ^{-1/2} K(n-1) u(n) R^{-1}(n)
α(n) = y(n) - u^H(n) x̂(n|Y_{n-1})
x̂(n+1|Y_n) = λ^{-1/2} x̂(n|Y_{n-1}) + g(n) α(n)
K(n) = λ^{-1} K(n-1) - λ^{-1/2} g(n) u^H(n) K(n-1)
Then, proceeding in a manner similar to that described in Chapter 11, we may formulate the extended square-root information filter for the state-space model of Eqs. (1) and (2) as follows:
K^{-1}(n) = λ [K^{-1}(n-1) + Q^{-1}(n) u(n) u^H(n)]     (3)
K^{-1}(n) x̂(n+1|Y_n) = λ^{1/2} [K^{-1}(n-1) x̂(n|Y_{n-1}) + Q^{-1}(n) u(n) y(n)]     (4)
where g(n) is the Kalman gain (vector). Thus, in light of Eqs. (3) and (4), we may formulate the following array structure for the square-root information filter:

[ λ^{1/2} K^{-H/2}(n-1)               λ^{1/2} Q^{-1/2}(n) u(n)
  x̂^H(n|Y_{n-1}) K^{-H/2}(n-1)       Q^{-1/2}(n) y*(n)
  0^T                                 Q^{-1/2}(n)            ]  Θ(n)  =

[ K^{-H/2}(n)                           0
  x̂^H(n+1|Y_n) K^{-H/2}(n)             R^{-1/2}(n) α*(n)
  λ^{1/2} Q^{-1}(n) u^H(n) K^{1/2}(n)   R^{-1/2}(n)          ]

This equation includes an ordinary square-root information filter as a special case: putting Q(n) = 1, we recover the pre-array-to-post-array transformation of Problem 11.1.

11.3

(a) Verify the expression for the extended square-root information filter given by

[ λ^{1/2} K^{-H/2}(n-1)            λ^{1/2} u(n)
  x̂^H(n|Y_{n-1}) K^{-H/2}(n-1)    y*(n)
  0^T                               1
  λ^{1/2} K^{1/2}(n-1)              0        ]  Θ(n)  =

[ K^{-H/2}(n)                       0
  x̂^H(n+1|Y_n) K^{-H/2}(n)         r^{-1/2}(n) α*(n)
  λ^{1/2} u^H(n) K^{1/2}(n)         r^{-1/2}(n)
  K^{1/2}(n)                        -g(n) r^{1/2}(n)  ]

Let A(n) denote the pre-array, and let the Hermitian transpose of the post-array be

B^H(n) = [ B11^H(n)     0
           b21^H(n)     b22*(n)
           b31^H(n)     b32*(n)
           B41^H(n)     B42^H(n) ]

Then, equating the matrix product A(n)A^H(n) to the matrix product B^H(n)B(n), and comparing their corresponding terms, we may say the following:
• Points 1 through 6 discussed in the solution to Problem 11.1 remain valid. Accordingly, the entries B11(n), b21(n), b31(n), b22(n) and b32(n) all take the values determined there.
• We have the following additional equations to consider:
1. B41^H(n)B11(n) = I. Since B11(n) = K^{-1/2}(n), it follows that
B41^H(n) = K^{1/2}(n)     (1)
2. B41^H(n)b31(n) + B42^H(n)b32(n) = 0. Hence,
B42^H(n) = -B41^H(n)b31(n) / b32(n)
Since b32(n) = r^{-1/2}(n) and b31(n) = λ^{1/2}K^{H/2}(n)u(n), it follows that
B42^H(n) = -λ^{1/2} r^{1/2}(n) K^{1/2}(n) K^{H/2}(n) u(n) = -λ^{1/2} r^{1/2}(n) K(n) u(n)     (2)
But from Kalman filter theory for the dynamical system being considered here, we know that
λ^{1/2} K(n) u(n) = g(n)
Accordingly, we may simplify Eq. (2) to
B42^H(n) = -r^{1/2}(n) g(n)
where g(n) is the Kalman gain. This completes the evaluation of the post-array for the extended square-root information filter.
3. In addition to the two equations described above, several other relations follow from A(n)A^H(n) = B^H(n)B(n); these additional relations merely provide a means of checking the values already determined for the entries of the post-array.

(b) Using results from part (a), derive the extended QR-RLS algorithm.
To derive the extended QR-RLS algorithm, we make use of the following correspondences between the Kalman and RLS variables:
K^{-1}(n) → λΦ(n)
α(n) → λ^{-n/2} ξ*(n)
r^{-1}(n) → γ(n)
g(n) → λ^{-1/2} k(n)
y(n) → λ^{-n/2} d*(n)
x̂(n|Y_{n-1}) → λ^{-n/2} ŵ(n-1)
Therefore, the extended QR-RLS algorithm may be written as

[ λ^{1/2} Φ^{1/2}(n-1)      u(n)
  λ^{1/2} p^H(n-1)          d*(n)
  0^T                       1
  λ^{-1/2} Φ^{-H/2}(n-1)    0     ]  Θ(n)  =

[ Φ^{1/2}(n)                0
  p^H(n)                    ξ(n) γ^{1/2}(n)
  u^H(n) Φ^{-H/2}(n)        γ^{1/2}(n)
  Φ^{-H/2}(n)               -k(n) γ^{-1/2}(n) ]

where we have used the same reasoning as in obtaining Eqs. (11.40)-(11.47) of the text. The weight vector is then updated as
ŵ(n) = ŵ(n-1) + [k(n) γ^{-1/2}(n)] [ξ(n) γ^{-1/2}(n)]*

11.4

We start with
Q(n) A(n) = [R(n); O]
Let
Q(n) = [Q1(n); Q2(n)]
Hence,
[Q1(n); Q2(n)] A(n) = [R(n); O]
A(n) = [Q1^H(n), Q2^H(n)] [R(n); O] = Q1^H(n) R(n)
A^H(n) = R^H(n) Q1(n)
The projection matrix is therefore
P(n) = A(n) (A^H(n) A(n))^{-1} A^H(n)
     = Q1^H(n) R(n) (R^H(n) Q1(n) Q1^H(n) R(n))^{-1} R^H(n) Q1(n)
Since Q1(n) Q1^H(n) = I, we have
P(n) = Q1^H(n) R(n) (R^H(n) R(n))^{-1} R^H(n) Q1(n)
We also note that for a square, nonsingular upper triangular matrix R(n),
R(n) (R^H(n) R(n))^{-1} R^H(n) = I
Hence,
P(n) = Q1^H(n) Q1(n)

11.5

In a prediction-error filter, the input u(n) represents the desired response and the tap inputs u(n-1),...,u(n-M) represent the variables used to estimate u(n). Hence, we may restructure the inputs of the systolic array in Fig. 11.4 of the text in the following manner so that it operates as a prediction-error filter (illustrated here for order M = 3):

[Figure: the systolic array of Fig. 11.4 redrawn as a prediction-error filter of order M = 3; the sequence u(4), u(3), u(2), ... is applied as the desired response, the delayed sequences u(3), u(2), u(1), ... (padded with leading zeros) feed the triangular section, and the array output is the forward prediction error f_M(n).]

11.6

We are given that

R^H(n) a(n) = s

Hence,

a(n) = R^{-H}(n) s

We note that R^H(n) is a lower triangular matrix. Hence, given R(n) and s, the vector a(n) may be computed by means of a linear section using forward substitution.
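As an illustrative aside (not part of the original solution; the function name and numerical values are our own), forward substitution on a lower triangular system such as R^H(n) a(n) = s can be sketched as follows. The backward-substitution solve used later for the MVDR beamformer of Problem 11.8 works the same way, run from the bottom row up:

```python
import numpy as np

def forward_substitution(L, s):
    """Solve L a = s where L is lower triangular and nonsingular."""
    n = L.shape[0]
    a = np.zeros(n, dtype=np.result_type(L, s, float))
    for i in range(n):
        # a[i] depends only on the previously computed entries a[0..i-1]
        a[i] = (s[i] - L[i, :i] @ a[:i]) / L[i, i]
    return a

# R(n) upper triangular, so R^H(n) is lower triangular
R = np.array([[2.0, 1.0, 0.5],
              [0.0, 1.5, 0.3],
              [0.0, 0.0, 1.0]])
s = np.array([1.0, 2.0, 3.0])
a = forward_substitution(R.conj().T, s)   # solves R^H a = s
```

Each unknown is resolved with one inner product, which is why a linear (pipelined) section suffices in the systolic implementation.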


11.7

The output of the last internal cell in the bottom row of the triangular section is given by (see Fig. 11.2(a) of the text):

u_out = c u_in - s* λ^{1/2} x

where c and s are the Givens parameters, u_in is the input to the cell, and x is the value stored in the cell. At time n, we have

u_in(n) = d(n)

s*(n) λ^{1/2} x(n-1) = c(n) ŵ^H(n-1) u(n)

where d(n) is the desired response, u(n) is the input vector, and ŵ(n-1) is the previous value of the least-squares weight vector. Hence,

u_out = c(n) d(n) - c(n) ŵ^H(n-1) u(n)
      = c(n) [d(n) - ŵ^H(n-1) u(n)]

Recognizing that

c(n) = γ^{1/2}(n)

and

d(n) - ŵ^H(n-1) u(n) = ξ(n),

we finally get

u_out = γ^{1/2}(n) ξ(n).

11.8

(a) We note that

R^H(n) a(n) = s(φ)

Hence,

a(n) = R^{-H}(n) s(φ)

Taking the Hermitian transpose:

a^H(n) = s^H(φ) R^{-1}(n)

(b) For an MVDR beamformer:

R(n) ŵ(n) = a(n)/(a^H(n) a(n))

Hence,

ŵ(n) = R^{-1}(n) a(n)/(a^H(n) a(n))

We note that a^H(n) a(n) is a scalar. With R(n) being an upper triangular matrix, it follows that the linear section performs backward substitution. The resulting output of this section is the weight vector ŵ(n).

11.9

Reformulating the prearray of the extended QR-RLS algorithm in Problem 2 of Chapter 11 by making use of the correspondences in Table 11.3, we may write

[ λ^{1/2} Φ^{1/2}(n-1)    u(n) ]          [ Φ^{1/2}(n)            0                   ]
[ λ^{1/2} a^H(n-1)        0    ]  Θ(n) =  [ a^H(n)                -e'(n) γ^{-1/2}(n)  ]
[ 0^T                     1    ]          [ u^H(n) Φ^{-H/2}(n)    γ^{1/2}(n)          ]

Squaring both sides of this equation, and retaining the terms on the second rows:

λ ||a(n-1)||^2 = ||a(n)||^2 + γ^{-1}(n) |e'(n)|^2

The term e'(n)/γ^{1/2}(n) is recognized as an estimation error, denoted by ε(n). Hence,

λ ||a(n-1)||^2 = ||a(n)||^2 + |ε(n)|^2

11.10 The standard RLS filter is the covariance version of the Kalman filter. The inverse QR-RLS filter is the square-root version of the covariance Kalman filter. It follows therefore that the inverse QR-RLS filter is the square-root RLS filter.
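As a closing aside on the rotations that underlie all of the QR-based algorithms in this chapter (our own sketch; the function name and the real-valued example are assumptions, not the text's notation), a Givens rotation is chosen so that one component of a two-element vector is annihilated while the rotation remains unitary:

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with c real so that [[c, s], [-conj(s), c]]
    rotates [a, b] into [r, 0], with c**2 + |s|**2 = 1."""
    if b == 0:
        return 1.0, 0.0
    r = np.hypot(abs(a), abs(b))
    c = abs(a) / r
    s = c * b / a          # valid for a != 0; complex-safe
    return c, s

c, s = givens(3.0, 4.0)
rotated = np.array([[c, s], [-np.conjugate(s), c]]) @ np.array([3.0, 4.0])
# rotated is numerically [5, 0]
```

A sequence of such rotations, applied cell by cell, is exactly what the unitary matrix Θ(n) of the pre-array/post-array formulations represents.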


CHAPTER 12

12.1

J_fb,m = (1/2)(E[|f_{m-1}(n)|^2] + E[|b_{m-1}(n-1)|^2])(1 + |κ_m|^2)
         + κ_m E[f_{m-1}(n) b*_{m-1}(n-1)] + κ*_m E[b_{m-1}(n-1) f*_{m-1}(n)]

Differentiating with respect to the complex variable κ_m:

∂J_fb,m/∂κ_m = κ_m (E[|f_{m-1}(n)|^2] + E[|b_{m-1}(n-1)|^2])
               + E[b_{m-1}(n-1) f*_{m-1}(n)] + E[b_{m-1}(n-1) f*_{m-1}(n)]

             = κ_m (E[|f_{m-1}(n)|^2] + E[|b_{m-1}(n-1)|^2]) + 2 E[b_{m-1}(n-1) f*_{m-1}(n)]

12.2

(a) Suppose we write

f_m(i) = f_{m-1}(i) + κ̂*_m(n) b_{m-1}(i-1)
b_m(i) = b_{m-1}(i) + κ̂_m(n) f_{m-1}(i)

Then, substituting these relations into Burg's formula:

            2 Σ_{i=1}^n b_{m-1}(i-1) f*_{m-1}(i)
κ̂_m(n) = - ------------------------------------------------
            Σ_{i=1}^n (|f_{m-1}(i)|^2 + |b_{m-1}(i-1)|^2)

            2 Σ_{i=1}^{n-1} b_{m-1}(i-1) f*_{m-1}(i) + 2 b_{m-1}(n-1) f*_{m-1}(n)
        = - ----------------------------------------------------------------------,
            E_{m-1}(n-1) + |f_{m-1}(n)|^2 + |b_{m-1}(n-1)|^2

Cross-multiplying and proceeding in a manner similar to that described after Eqs. (12.12) and (12.13) in the text, we finally get

                        f*_{m-1}(n) b_m(n) + b*_{m-1}(n-1) f_m(n)
κ̂_m(n) = κ̂_m(n-1) - ---------------------------------------------
                                   E_{m-1}(n-1)

(b) The algorithm so formulated is impractical because to compute the updated forward prediction error f_m(n) and backward prediction error b_m(n) we need to know the updated κ̂_m(n). This is not possible because the correction term for κ̂_m(n) requires knowledge of f_m(n) and b_m(n).

12.3

For the transversal filter of Fig. 12.6 in the text we have:

tap-weight vector = k_M(n)
tap-input vector  = u_M(i),    i = 1, 2, ..., n

desired response, d(i) = { 1,  i = n
                         { 0,  i = n-1, ..., 1

The a posteriori estimation error equals

e(i) = d(i) - k_M^H(n) u_M(i),    i = 1, 2, ..., n

The deterministic cross-correlation vector φ(n) equals

φ(n) = Σ_{i=1}^n λ^{n-i} u_M(i) d*(i) = u_M(n)

We also note that

E_d(n) = Σ_{i=1}^n λ^{n-i} |d(i)|^2 = 1

Hence, the sum of weighted error squares equals

E_min(n) = E_d(n) - φ^H(n) ŵ(n)
         = 1 - u_M^H(n) k_M(n)

We note that the inner product u_M^H(n) k_M(n) is a real-valued scalar. Hence,

E_min(n) = 1 - k_M^H(n) u_M(n) = γ_M(n)

12.4

We start with

Φ_m(n) = λ Φ_m(n-1) + u_m(n) u_m^H(n)

where u_m(n) is the tap-input vector, Φ_m(n-1) is the past value of the deterministic correlation matrix, and Φ_m(n) is its present value. Hence,

λ Φ_m(n-1) = Φ_m(n) - u_m(n) u_m^H(n)
           = [I - u_m(n) u_m^H(n) Φ_m^{-1}(n)] Φ_m(n)

where I is the identity matrix. Taking the determinants of both sides, and noting that Φ_m is m-by-m so that det[λ Φ_m(n-1)] = λ^m det[Φ_m(n-1)]:

λ^m det[Φ_m(n-1)] = det[I - u_m(n) u_m^H(n) Φ_m^{-1}(n)] det[Φ_m(n)]        (1)

But, using the identity det(I - AB) = det(I - BA),

det[I - u_m(n) u_m^H(n) Φ_m^{-1}(n)] = 1 - u_m^H(n) Φ_m^{-1}(n) u_m(n) = γ_m(n)

We may therefore rewrite Eq. (1) as

λ^m det[Φ_m(n-1)] = det[Φ_m(n)] γ_m(n)

Hence, we may express the conversion factor γ_m(n) as

γ_m(n) = λ^m det[Φ_m(n-1)] / det[Φ_m(n)]

12.5

(a) The (m+1)-by-(m+1) correlation matrix Φ_{m+1}(n) may be expressed in the form

Φ_{m+1}(n) = [ U(n)     φ_1^H(n) ]                                           (1)
             [ φ_1(n)   Φ_m(n-1) ]

Define the inverse of this matrix as

Φ_{m+1}^{-1}(n) = [ α_1   β_1^H ]                                            (2)
                  [ β_1   Γ_1   ]

Hence, from Eqs. (1) and (2):

I_{m+1} = Φ_{m+1}(n) Φ_{m+1}^{-1}(n)

        = [ U(n)     φ_1^H(n) ] [ α_1   β_1^H ]
          [ φ_1(n)   Φ_m(n-1) ] [ β_1   Γ_1   ]

        = [ U(n)α_1 + φ_1^H(n)β_1       U(n)β_1^H + φ_1^H(n)Γ_1   ]
          [ φ_1(n)α_1 + Φ_m(n-1)β_1     φ_1(n)β_1^H + Φ_m(n-1)Γ_1 ]

From this relation we deduce the following four equations:

U(n)α_1 + φ_1^H(n)β_1 = 1                                                    (3)
U(n)β_1^H + φ_1^H(n)Γ_1 = 0^T                                                (4)
φ_1(n)α_1 + Φ_m(n-1)β_1 = 0_m                                                (5)
φ_1(n)β_1^H + Φ_m(n-1)Γ_1 = I_m                                              (6)

Eliminate β_1 between Eqs. (3) and (5):

U(n)α_1 - φ_1^H(n) Φ_m^{-1}(n-1) φ_1(n) α_1 = 1

Hence,

α_1 = 1/(U(n) - φ_1^H(n) Φ_m^{-1}(n-1) φ_1(n))                               (7)

which is real-valued. Correspondingly,

β_1 = -Φ_m^{-1}(n-1) φ_1(n) α_1
    = -Φ_m^{-1}(n-1) φ_1(n) / (U(n) - φ_1^H(n) Φ_m^{-1}(n-1) φ_1(n))         (8)

From Eq. (6):

Γ_1 = Φ_m^{-1}(n-1) - Φ_m^{-1}(n-1) φ_1(n) β_1^H
    = Φ_m^{-1}(n-1)
      + Φ_m^{-1}(n-1) φ_1(n) φ_1^H(n) Φ_m^{-1}(n-1) / (U(n) - φ_1^H(n) Φ_m^{-1}(n-1) φ_1(n))    (9)

Check: Substituting Eqs. (8) and (9) into the left-hand side of Eq. (4):

U(n)β_1^H + φ_1^H(n)Γ_1
    = -U(n) φ_1^H(n) Φ_m^{-1}(n-1) / (U(n) - φ_1^H(n) Φ_m^{-1}(n-1) φ_1(n))
      + φ_1^H(n) Φ_m^{-1}(n-1)
      + φ_1^H(n) Φ_m^{-1}(n-1) φ_1(n) φ_1^H(n) Φ_m^{-1}(n-1) / (U(n) - φ_1^H(n) Φ_m^{-1}(n-1) φ_1(n))
    = 0^T

This agrees with the right-hand side of Eq. (4). From Eq. (7) we note that

α_1 = 1/F_m(n)

From Eq. (8) we note that

β_1 = -ŵ_f(n)/F_m(n)

From Eq. (9) we note that

Γ_1 = Φ_m^{-1}(n-1) + ŵ_f(n) ŵ_f^H(n)/F_m(n)

Hence, we may express the inverse matrix of Eq. (1) as follows:

Φ_{m+1}^{-1}(n) = [ 0     0_m^T         ] + (1/F_m(n)) [  1      ] [1, -ŵ_f^H(n)]
                  [ 0_m   Φ_m^{-1}(n-1) ]              [ -ŵ_f(n) ]

                = [ 0     0_m^T         ] + (1/F_m(n)) a_m(n) a_m^H(n)
                  [ 0_m   Φ_m^{-1}(n-1) ]
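The identity just derived can be checked numerically. In the following sketch (our own construction, not part of the manual), any symmetric positive definite matrix serves as a stand-in for Φ_{m+1}(n); the right-hand side is built from the Schur complement F_m(n) and the vector a_m(n) = [1, -ŵ_f^T(n)]^T and compared with the directly computed inverse:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 3
B = rng.standard_normal((m + 1, m + 1))
Phi_full = B @ B.T + np.eye(m + 1)        # stand-in for Phi_{m+1}(n)

U = Phi_full[0, 0]                        # scalar U(n)
phi1 = Phi_full[1:, 0]                    # vector phi_1(n)
Phi_sub = Phi_full[1:, 1:]                # stand-in for Phi_m(n-1)

w_f = np.linalg.solve(Phi_sub, phi1)      # forward predictor w_f(n)
F = U - phi1 @ w_f                        # F_m(n): the Schur complement
a_vec = np.concatenate(([1.0], -w_f))     # forward prediction-error filter a_m(n)

inv_block = np.zeros((m + 1, m + 1))
inv_block[1:, 1:] = np.linalg.inv(Phi_sub)
inv_block += np.outer(a_vec, a_vec) / F   # block form of Phi_{m+1}^{-1}(n)
```

The same construction, mirrored to the other corner, yields the part (b) identity below.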

(b) Consider next the second form of the (m+1)-by-(m+1) correlation matrix Φ_{m+1}(n), given by

Φ_{m+1}(n) = [ Φ_m(n)     φ_2(n) ]                                           (10)
             [ φ_2^H(n)   U(n-m) ]

Define the inverse of this matrix as

Φ_{m+1}^{-1}(n) = [ Γ_2     β_2 ]                                            (11)
                  [ β_2^H   α_2 ]

Using Eqs. (10) and (11):

I_{m+1} = Φ_{m+1}(n) Φ_{m+1}^{-1}(n)

        = [ Φ_m(n)     φ_2(n) ] [ Γ_2     β_2 ]
          [ φ_2^H(n)   U(n-m) ] [ β_2^H   α_2 ]

        = [ Φ_m(n)Γ_2 + φ_2(n)β_2^H       Φ_m(n)β_2 + φ_2(n)α_2   ]
          [ φ_2^H(n)Γ_2 + U(n-m)β_2^H     φ_2^H(n)β_2 + U(n-m)α_2 ]

We thus deduce the following four relations:

Φ_m(n)Γ_2 + φ_2(n)β_2^H = I_m                                                (12)
Φ_m(n)β_2 + φ_2(n)α_2 = 0_m                                                  (13)
φ_2^H(n)Γ_2 + U(n-m)β_2^H = 0^T                                              (14)
φ_2^H(n)β_2 + U(n-m)α_2 = 1                                                  (15)

Eliminate β_2 between Eqs. (13) and (15):

-φ_2^H(n) Φ_m^{-1}(n) φ_2(n) α_2 + U(n-m) α_2 = 1

Hence,

α_2 = 1/(U(n-m) - φ_2^H(n) Φ_m^{-1}(n) φ_2(n))                               (16)

Correspondingly, β_2 equals

β_2 = -Φ_m^{-1}(n) φ_2(n) α_2
    = -Φ_m^{-1}(n) φ_2(n) / (U(n-m) - φ_2^H(n) Φ_m^{-1}(n) φ_2(n))           (17)

Substitute Eq. (17) in (12) and solve for Γ_2:

Γ_2 = Φ_m^{-1}(n) - Φ_m^{-1}(n) φ_2(n) β_2^H
    = Φ_m^{-1}(n)
      + Φ_m^{-1}(n) φ_2(n) φ_2^H(n) Φ_m^{-1}(n) / (U(n-m) - φ_2^H(n) Φ_m^{-1}(n) φ_2(n))    (18)

Check: Substituting Eqs. (17) and (18) into the left-hand side of Eq. (14):

φ_2^H(n) Φ_m^{-1}(n)
  + φ_2^H(n) Φ_m^{-1}(n) φ_2(n) φ_2^H(n) Φ_m^{-1}(n) / (U(n-m) - φ_2^H(n) Φ_m^{-1}(n) φ_2(n))
  - U(n-m) φ_2^H(n) Φ_m^{-1}(n) / (U(n-m) - φ_2^H(n) Φ_m^{-1}(n) φ_2(n))
  = 0^T

which agrees with the right-hand side of Eq. (14). Next, we note that

α_2 = 1/B_m(n)

β_2 = -ŵ_b(n)/B_m(n)

Γ_2 = Φ_m^{-1}(n) + ŵ_b(n) ŵ_b^H(n)/B_m(n)

Hence, we may express the inverse matrix Φ_{m+1}^{-1}(n) in the alternative form:

Φ_{m+1}^{-1}(n) = [ Φ_m^{-1}(n)   0_m ] + (1/B_m(n)) [ -ŵ_b(n) ] [-ŵ_b^H(n), 1]
                  [ 0_m^T         0   ]              [  1      ]

                = [ Φ_m^{-1}(n)   0_m ] + (1/B_m(n)) c_m(n) c_m^H(n)
                  [ 0_m^T         0   ]

12.6

(a) From the solution to part (a) of Problem 12.5, we have

Φ_{m+1}^{-1}(n) = [ 0     0_m^T         ] + (1/F_m(n)) a_m(n) a_m^H(n)       (1)
                  [ 0_m   Φ_m^{-1}(n-1) ]

Correspondingly, the input vector u_{m+1}(n) is partitioned as follows:

u_{m+1}(n) = [ u(n)     ]                                                    (2)
             [ u_m(n-1) ]

From the definition of the conversion factor, we have

γ_{m+1}(n) = 1 - u_{m+1}^H(n) Φ_{m+1}^{-1}(n) u_{m+1}(n)                     (3)

Therefore, substituting Eqs. (1) and (2) into (3):

γ_{m+1}(n) = 1 - u_m^H(n-1) Φ_m^{-1}(n-1) u_m(n-1)
             - (1/F_m(n)) u_{m+1}^H(n) a_m(n) a_m^H(n) u_{m+1}(n)

           = γ_m(n-1) - |f_m(n)|^2/F_m(n)                                    (4)

where we have used the fact that

f_m(n) = a_m^H(n) u_{m+1}(n)

(b) From the solution to part (b) of Problem 12.5, we have

Φ_{m+1}^{-1}(n) = [ Φ_m^{-1}(n)   0_m ] + (1/B_m(n)) c_m(n) c_m^H(n)         (5)
                  [ 0_m^T         0   ]

This time, we partition the input vector u_{m+1}(n) as follows:

u_{m+1}(n) = [ u_m(n) ]                                                      (6)
             [ u(n-m) ]

Therefore, substituting Eqs. (5) and (6) into (3):

γ_{m+1}(n) = 1 - u_m^H(n) Φ_m^{-1}(n) u_m(n)
             - (1/B_m(n)) u_{m+1}^H(n) c_m(n) c_m^H(n) u_{m+1}(n)

           = γ_m(n) - |b_m(n)|^2/B_m(n)                                      (7)

where we have made use of the fact that

b_m(n) = c_m^H(n) u_{m+1}(n)

(c) We invoke the following property of the conversion factor:

γ_m(n-1) = f_m(n)/η_m(n)                                                     (8)

Also, we note that

F_m(n) = λ F_m(n-1) + η_m(n) f_m^*(n)                                        (9)

Therefore, eliminating η_m(n) between Eqs. (8) and (9):

F_m(n) = λ F_m(n-1) + |f_m(n)|^2/γ_m(n-1)                                    (10)

Finally, eliminating |f_m(n)|^2 between Eqs. (4) and (10):

γ_{m+1}(n) = γ_m(n-1) - (γ_m(n-1)/F_m(n)) [F_m(n) - λ F_m(n-1)]
           = λ F_m(n-1) γ_m(n-1)/F_m(n)

(d) Next, we invoke another property of the conversion factor:

γ_m(n) = b_m(n)/β_m(n)                                                       (11)

We also note that

B_m(n) = λ B_m(n-1) + b_m^*(n) β_m(n)                                        (12)

Therefore, eliminating β_m(n) between Eqs. (11) and (12):

B_m(n) = λ B_m(n-1) + |b_m(n)|^2/γ_m(n)                                      (13)

Eliminating |b_m(n)|^2 between Eqs. (7) and (13):

γ_{m+1}(n) = γ_m(n) - (γ_m(n)/B_m(n)) [B_m(n) - λ B_m(n-1)]
           = λ B_m(n-1) γ_m(n)/B_m(n)
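Two of the conversion-factor identities established above lend themselves to a quick numerical check. The sketch below is our own (randomly generated real-valued stand-ins, not the manual's data): first the determinant expression of Problem 12.4, then the order update of part (b) above:

```python
import numpy as np

rng = np.random.default_rng(6)
m = 3
lam = 0.98

# Check of Problem 12.4: gamma_m(n) = lam^m det Phi_m(n-1) / det Phi_m(n)
A = rng.standard_normal((m, m))
Phi_prev = A @ A.T + np.eye(m)                # stand-in for Phi_m(n-1)
u_m = rng.standard_normal(m)
Phi_m = lam * Phi_prev + np.outer(u_m, u_m)   # Phi_m(n) via the rank-one time update
gamma_m = 1.0 - u_m @ np.linalg.solve(Phi_m, u_m)

# Check of part (b): gamma_{m+1}(n) = gamma_m(n) - |b_m(n)|^2 / B_m(n)
M = rng.standard_normal((m + 1, m + 1))
Phi = M @ M.T + np.eye(m + 1)                 # stand-in for Phi_{m+1}(n)
u = rng.standard_normal(m + 1)                # u_{m+1}(n) = [u_m(n); u(n-m)]
gamma_next = 1.0 - u @ np.linalg.solve(Phi, u)
gamma_top = 1.0 - u[:-1] @ np.linalg.solve(Phi[:-1, :-1], u[:-1])

wb = np.linalg.solve(Phi[:-1, :-1], Phi[:-1, -1])
c_m = np.concatenate((-wb, [1.0]))            # backward prediction-error filter c_m(n)
B_m = Phi[-1, -1] - Phi[-1, :-1] @ wb         # B_m(n)
b_m = c_m @ u                                 # b_m(n) = c_m^H(n) u_{m+1}(n)
```

Both relations hold exactly (up to rounding) for any positive definite stand-in, since they are purely algebraic consequences of the partitioned-inverse formulas.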

12.7

(a) Starting from the definition,

F_m(n) = Σ_{i=1}^n λ^{n-i} |f_m(i)|^2

       = Σ_{i=1}^n λ^{n-i} f_m^*(i) f_m(i)

       = Σ_{i=1}^n λ^{n-i} [u(i) - ŵ_{f,m}^H(n) u_m(i-1)] f_m^*(i)

       = Σ_{i=1}^n λ^{n-i} u(i) f_m^*(i) - ŵ_{f,m}^H(n) ( Σ_{i=1}^n λ^{n-i} u_m(i-1) f_m^*(i) )

The bracketed sum vanishes by the principle of orthogonality, so

F_m(n) = Σ_{i=1}^n λ^{n-i} u(i) f_m^*(i)

       = Σ_{i=1}^n λ^{n-i} [η_m(i) + ŵ_{f,m}^H(n-1) u_m(i-1)] f_m^*(i)

       = Σ_{i=1}^n λ^{n-i} η_m(i) f_m^*(i) + ŵ_{f,m}^H(n-1) ( Σ_{i=1}^n λ^{n-i} u_m(i-1) f_m^*(i) )

where the bracketed sum again vanishes. Hence,

F_m(n) = λ Σ_{i=1}^{n-1} λ^{n-1-i} η_m(i) f_m^*(i) + η_m(n) f_m^*(n)

       = λ F_m(n-1) + η_m(n) f_m^*(n)

(b) Following a similar procedure, we may use the relations

B_m(n) = Σ_{i=1}^n λ^{n-i} |b_m(i)|^2

and

Σ_{i=1}^n λ^{n-i} u_m(i) b_m^*(i) = 0

to derive the recursion

B_m(n) = λ B_m(n-1) + β_m(n) b_m^*(n)

12.8

(a) We start with

Δ_{m-1}(n) = Σ_{i=1}^n λ^{n-i} b_{m-1}(i-1) f_{m-1}^*(i)

           = Σ_{i=1}^n λ^{n-i} [u(i-m) - ŵ_{b,m-1}^H(n-2) u_{m-1}(i-1)] f_{m-1}^*(i)

           = Σ_{i=1}^n λ^{n-i} u(i-m) f_{m-1}^*(i)
             - ŵ_{b,m-1}^H(n-2) ( Σ_{i=1}^n λ^{n-i} u_{m-1}(i-1) f_{m-1}^*(i) )

The bracketed sum vanishes by the principle of orthogonality. Substituting u(i-m) = β_{m-1}(i-1) + ŵ_{b,m-1}^H(n-1) u_{m-1}(i-1):

Δ_{m-1}(n) = Σ_{i=1}^n λ^{n-i} β_{m-1}(i-1) f_{m-1}^*(i)
             + ŵ_{b,m-1}^H(n-1) ( Σ_{i=1}^n λ^{n-i} u_{m-1}(i-1) f_{m-1}^*(i) )

where the bracketed sum again vanishes. Hence,

Δ_{m-1}(n) = λ Σ_{i=1}^{n-1} λ^{n-1-i} β_{m-1}(i-1) f_{m-1}^*(i) + β_{m-1}(n-1) f_{m-1}^*(n)

           = λ Δ_{m-1}(n-1) + β_{m-1}(n-1) f_{m-1}^*(n)                      (1)

Comparing this equation with Eq. (12.69) of the text, we deduce the equivalence

η_{m-1}^*(n) b_{m-1}(n-1) = β_{m-1}(n-1) f_{m-1}^*(n)                        (2)

(b) Applying Eqs. (12.48) and (12.49) of the text, we may write

η_{m-1}^*(n) = f_{m-1}^*(n)/γ_{m-1}(n-1)                                     (3)

b_{m-1}(n-1) = γ_{m-1}(n-1) β_{m-1}(n-1)                                     (4)

Multiplying Eqs. (3) and (4):

η_{m-1}^*(n) b_{m-1}(n-1) = (f_{m-1}^*(n)/γ_{m-1}(n-1)) γ_{m-1}(n-1) β_{m-1}(n-1)
                          = f_{m-1}^*(n) β_{m-1}(n-1)

which proves Eq. (2).
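The time updates in Problems 12.7 and 12.8 rest on the same mechanism: an exponentially weighted sum S(n) = Σ_{i=1}^n λ^{n-i} x(i) admits the one-term update S(n) = λS(n-1) + x(n). A minimal sketch of that equivalence (our own notation, not the manual's):

```python
import numpy as np

lam = 0.95
rng = np.random.default_rng(7)
x = rng.standard_normal(100)           # stands in for eta_m(i) f_m^*(i), etc.

S_rec = 0.0
for xi in x:                           # S(n) = lam * S(n-1) + x(n), with S(0) = 0
    S_rec = lam * S_rec + xi

n = len(x)
S_direct = np.sum(lam ** (n - np.arange(1, n + 1)) * x)
```

The recursion costs one multiply-add per sample instead of an O(n) weighted sum, which is what makes the order-recursive least-squares lattice algorithms practical.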

12.9

(a) Let Φ m+1 ( n ) denote the (m+1)-by-(m+1) correlation matrix of the tap-input vector ˜ um+1(i) applied to the forward prediction error filter of order m, where 1 < i < n. Let am(n) denote the tap-weight vector of this filter, and Fm(n) denote the corresponding sum of weighted prediction-error squares. We may characterize this filter by the augmented normal equations

Φ m+1 ( n )a m ( n ) = ˜

F m(n)

(1)

0m

where 0m is the m-by-1 null vector. The correlation matrix Φ m+1 ( n ) may be ˜ partitioned in two different ways, depending on how we interpret the first or last element of the tap-input vector um+1(i). The form of partitioning that we like to use first is the one that enables us to relate the tap-weight vector am(n), pertaining to prediction order m, to the tap-weight vector am-1(n), pertaining to prediction order m-1. This aim is realized by using

Φ m+1 ( n ) = ˜

Φm ( n ) ˜ H

φ2 ( n ) ˜

φ2 ( n ) ˜

(2)

U 2(n)

where Φ m ( n ) is the m-by-m correlation matrix of the tap-input vector u m ( i ), φ 2 ( n ) is ˜ the m-by-1 cross-correlation vector between um(i) and u(i-m), and U2(n) is the˜ sum of weighted squared values of the input u(i-m) for 1 < i < n. Note that U2(n) is zero for n - m < 0. We postmultiply both sides of Eq. (2) by an (m+1)-by-1 vector whose first m elements are defined by the vector am-1(n) and whose last element equals zero. We may thus write

Φ m+1 ( n ) ˜

a m-1 ( n ) 0

Φm ( n ) = ˜ H φ2 ( n ) ˜

φ2 ( n ) a m-1 ( n ) ˜ 0 U 2(n)

Φ m ( n )a m-1 ( n ) = ˜ H φ 2 ( n )a m-1 ( n ) ˜

(3)

Both Φ m ( n ) and am-1(n) have the same time argument n. Furthermore, in the first line ˜ of Eq. (3), they are both positioned in such a way that when the matrix multiplication

258

is performed Φ m ( n ) becomes postmultiplied by am-1(n). For a forward prediction˜ error filter of order m-1, evaluated at time n, the set of augmented normal equations defined in Eq. (1) takes the form

Φ m ( n )a m-1 ( n ) = ˜

F m-1 ( n ) 0 m-1

Define the scalar H

∆ m-1 ( n ) = φ 2 ( n )a m-1 ( n ) ˜

(4)

Accordingly, we may rewrite Eq. (3) as

Φ m+1 ( n ) ˜

a m-1 ( n ) 0

F m-1 ( n ) =

(5)

0 m-1 ∆ m-1 ( n )

(b) For a definition of ∆ m-1 ( n ) , we have n

∆ m-1 ( n ) =

∑λ

n-i * f m-1 ( i )b m-1 ( i-1 )

(6)

i=1

For another definition of this same quantity, we have H

∆ m-1 ( n ) = φ 2 ( n )a m-1 ( n ) ˜

(7)

where n

φ2 ( n ) = ∑ λ ˜ i=1

n-i

*

u m ( i )u ( i-1 )

(8)

To show that these two definitions are equivalent, we first substitute Eq. (8) into (7): n

∆ m-1 ( n ) =

∑λ

n-i H u m ( i )a m-1 ( n )u ( i-m )

i=1

259

(9)

From the definition of the forward prediction error, we have H

f m-1 ( i ) = a m-1 ( n )u m ( i ) ,

1≤i≤n

We may therefore rewrite Eq. (9) as n

∆ m-1 ( n ) =

∑λ

n-i * f m-1 ( i )u ( i-m )

(10)

i=1

Next, from the definition of the backward prediction error, we have m-1

b m-1 ( i ) = u ( i-m ) –

∑ wb, m-1,k ( n )u ( i-k ) *

(11)

k=1

where w b, m-1,k ( n ) is the kth element of the backward predictor’s coefficient vector. Therefore, eliminating u(i-m) between Eqs. (10) and (11): n

∆ m-1 ( n ) =

∑λ

n-i * f m-1 ( i )b m-1 ( i )

i=1 n

+∑

n

∑λ

n-i * * w b, m-1,k ( n ) f m-1 ( i )u ( i-k )

(12)

i=1 k=1

But, the tap inputs u(i-1),u(i-2),...,u(i-m+1) are the very inputs involved in computing the forward prediction error. From the principle of orthogonality, we therefore have m-1

∑λ

n-i * f m-1 ( i )u ( i-k )

= 0

for all i

k=1

That is, n

∆ m-1 ( n ) =

∑λ

n-i * f m-1 ( i )b m-1 ( i )

i=1

H

= φ 2 ( n )a m-1 ( n ) ˜

which is the desired result.

260

(c) Consider next the backward prediction-error filter of order m. Let cm(n) denote its tapweight vector, and Bm(n) denote the corresponding sum of weighted prediction-error squares. This filter is characterized by the augmented normal equations written in the matrix form: 0m Φ m+1 ( n )c m ( n ) = ˜ B m(n)

(13)

where Φ m+1 ( n ) is as defined previously, and 0m is the m-by-1 null vector. This time ˜ we use the other partitioned form of the correlation matrix Φ m ( n ), as shown by ˜

Φ m+1 ( n ) = ˜

U 1(n) φ1 ( n ) ˜

H

φ1 ( n ) ˜

(14)

Φ 1 ( n-1 ) ˜

where U1(n) is the sum of weighted squared values of the input u(i) for the time interval 1 < i < n, φ 1 ( n ) is the m-by-1 cross-correlation vector between u(i) and the ˜ tap-input vector um(i-1), and Φ 1 ( n-1 ) is the m-by-m correlation matrix of um(i-1). ˜ Correspondingly, we postmultiply Φ m+1 ( n ) by an (m+1)-by-1 vector whose first ˜ element is zero and whose m remaining elements are defined by the tap-weight vector cm-1(n-1) that pertains to a backward prediction-error filter of order m-1. We may thus write H

U 1(n) φ1 ( n ) 0 0 = Φ m+1 ( n ) ˜ ˜ c m-1 ( n-1 ) φ 1 ( n ) Φ m ( n-1 ) c m-1 ( n-1 ) ˜ ˜ H

φ ( n )c m-1 ( n-1 ) = ˜1 Φ m ( n-1 )c m-1 ( n-1 ) ˜

(15)

Both Φ m ( n-1 ) and cm-1(n-1) have the same time argument, n-1. Also, they are both ˜ positioned in the first line of Eq. (15) in such a way that, when the matrix multiplication is performed, Φ m-1 ( n-1 ) becomes postmultiplied by cm-1(n-1). For a ˜

261

backward prediction-error filter of order m-1, evaluated at time n-1, the set of augmented normal equations in Eq. (13) takes the form 0 m-1 Φ m ( n-1 )c m-1 ( n-1 ) = ˜ B m-1 ( n-1 ) Define the second scalar H

∆′ m-1 ( n-1 ) = φ 1 ( n )c m-1 ( n-1 ) ˜

(16)

where the prime is intended to distinguish this new parameter from ∆ m-1 ( n-1 ) . Accordingly, we may rewrite Eq. (15) as

0 Φ m+1 ( n ) = ˜ c m-1 ( n-1 )

∆′ m-1 ( n ) (17)

0 m-1 B m-1 ( n-1 )

(d) The parameters ∆ m-1 ( n-1 ) and ∆′ m-1 ( n-1 ), defined by Eqs. (4) and (16), respectively, are in actual fact the complex conjugate of one another; that is, *

∆′ m-1 ( n ) = ∆ m-1 ( n )

(18)

*

where ∆ m-1 ( n ) is the complex conjugate of ∆ m-1 ( n ) . We prove this relation in three stages: 1. We premultiply both sides of Eq. (5) by the row vector H

[ 0, c m-1 ( n-1 ) ] where the superscript H denotes Hermitian transposition. The result of this matrix multiplication is the scalar

H

[ 0, c m-1 ( n-1 ) ]Φ m+1 ( n ) ˜

a m-1 ( n ) 0

H

= [ 0, c m-1 ( n-1 ) ]

F m-1 ( n-1 ) 0 m-1 ∆ m-1 ( n )

262

= ∆ m-1 ( n )

(19)

where we have used the fact that the last element of c m-1 ( n-1 ) equals unity. 2. We apply Hermitian transposition to both sides of Eq. (17), and use the Hermitian property of the correlation matrix Φ m+1 ( n ) , thereby obtaining ˜ H

*

T

[ 0, c m-1 ( n-1 ) ]Φ m+1 ( n ) = [ ∆′ m-1 ( n ), 0 m-1, B m-1 ( n-1 ) ] ˜ *

where ∆′ m-1 ( n ) is the complex conjugate of ∆′ m-1 ( n ) and B m-1 ( n-1 ) is real valued. Next we use this relation to evaluate the scalar

[ 0, c m-1 ( n-1 ) ]Φ m+1 ( n ) ˜

a m-1 ( n )

*

0

T

= ∆′ m-1 ( n )0 m-1, B m-1 ( n-1 )

a m-1 ( n ) 0

*

= ∆′ m-1 ( n )

(20)

where we have used the fact that the first element of am-1(n) equals unity. 3. Comparison of Eqs. (19) and (20) immediately yields the relation of Eq. (18) between the parameters ∆ m-1 ( n ) and ∆′ m-1 ( n ) . (e) We are now equipped with the relations needed to derive the desired time-update for recursive computation of the parameter ∆ m-1 ( n ) . Consider the m-by-1 tap-weight vector am-1(n-1) that pertains to a forward predictionerror filter of order m-1, evaluated at time n-1. The reason for considering time n-1 will become apparent presently. Since the leading element of the vector am-1(n-1) equals unity, we may express ∆ m-1 ( n ) as follows [see Eq. (18) and (20)]: T

∆ m-1 ( n ) = [ ∆ m-1 ( n ), 0 m-1, B m-1 ( n-1 ) ]

263

a m-1 ( n-1 ) 0

(21)

Taking the Hermitian transpose of both sides of Eq. (17), recognizing the Hermitian property of Φ m+1 ( n ) and using the relation of Eq. (19), we get ˜ T

[ 0, c m-1 ( n-1 ) ]Φ m+1 ( n ) = [ ∆ m-1 ( n ), 0 m-1, B m-1 ( n-1 ) ] ˜

(22)

Hence, substitution of Eq. (22) into (21) yields

∆ m-1 ( n ) = [ 0, c m-1 ( n-1 ) ]Φ m+1 ( n ) ˜

a m-1 ( n-1 )

(23)

0

But the correlation matrix Φ m+1 ( n ) may be time-updated as follows: ˜ H

Φ m+1 ( n ) = λΦ m+1 ( n-1 ) + u m+1 ( n )u m+1 ( n ) ˜ ˜

(24)

Accordingly, we may use this relation for Φ m+1 ( n ) to rewrite Eq. (23) as ˜ H

∆ m-1 ( n ) = λ [ 0, c m-1 ( n-1 ) ]Φ m+1 ( n-1 ) ˜

a m-1 ( n-1 )

H

0

H

+ [ 0, c m-1 ( n-1 ) ]u m+1 ( n )u m+1 ( n )

a m-1 ( n-1 )

(25)

0

Next, we recognize from the definition of forward a priori prediction error that H

u m+1 ( n )

a m-1 ( n-1 ) 0

H

*

= [ u m ( n ), u ( n-m ) ]

a m-1 ( n-1 ) 0

H

= u m ( n )a m-1 ( n-1 ) *

= η m-1 ( n )

(26)

and from the definition of the backward a posteriori prediction error that

264

H

H

[ 0, c m-1 ( n-1 ) ]u m+1 ( n ) = [ 0, c m-1 ( n-1 ) ]

u(n) u m ( n-1 )

H

= c m-1 ( n-1 )u m ( n-1 ) = b m-1 ( n-1 )

(27)

Also, by substituting n-1 for n into Eq. (5), we have

Φ m+1 ( n-1 ) ˜

a m-1 ( n-1 )

F m-1 ( n-1 ) =

0 m-1

0

∆ m-1 ( n-1 )

Hence, using this relation and the fact that the last element of the tap-weight vector cm-1(n-1), pertaining to the backward prediction-error filter, equals unity, we may write the first term on the right-hand side of Eq. (25), except for λ, as H

[ 0, c m-1 ( n-1 ) ]Φ m+1 ( n-1 ) ˜

a m-1 ( n-1 )

H

= [ 0, c m-1 ( n-1 ) ]

0 F m-1 ( n-1 ) 0 m-1 ∆ m-1 ( n-1 )

= ∆ m-1 ( n-1 )

(28)

Finally, substituting Eqs. (26), (27), and (28) into (25), we may express the timeupdate recursion for ∆ m-1 ( n ) simply as *

∆ m-1 ( n ) = λ∆ m-1 ( n-1 ) + b m-1 ( n-1 )η m-1 ( n ) which is the desired result.

265

(29)

12.10 (a) We start with the relations

Φ m+1 ( n )

a m-1 ( n )

F m-1 ( n ) =

(1)

0 m-1

0

∆ m-1 ( n )

and

0 = Φ m+1 ( n ) c m-1 ( n-1 )

∆ *m-1 ( n ) (2)

0 m-1 B m-1 ( n-1 )

Multiplying Eq. (2) by the ratio ∆ m-1 ( n ) ⁄ B m-1 ( n-1 ) and subtracting the result from Eq. (1):

Φ m+1 ( n )

a m-1 ( n ) 0

∆ m-1 ( n ) 0 – ------------------------B m-1 ( n-1 ) c m-1 ( n-1 ) 2

F m-1 ( n ) – ∆ m-1 ( n ) ⁄ B m-1 ( n-1 ) = 0m

(3)

Equation (3) represents the augmented normal equations for forward prediction with order m, as shown by

Φ m+1 ( n )a m-1 ( n ) =

Fm(n)

(4)

0m

on the basis of which we may immediately write 2

∆ m-1 ( n ) F m ( n ) = F m-1 ( n ) – ------------------------B m-1 ( n-1 ) (b) Multiplying Eq. (1) by ∆ *m-1 ( n ) ⁄ F m-1 ( n ) and subtracting the result from Eq. (2):

266

∆ *m-1 ( n ) a ( n ) 0 – -------------------- m-1 c m-1 ( n-1 ) F m-1 ( n ) 0

Φ m+1 ( n )

=

0m

(5)

2

B m-1 ( n-1 ) – ∆ m-1 ( n ) ⁄ F m-1 ( n )

Equation (5) represents the augmented normal equations for backward prediction with order m, as shown by 0m

Φ m+1 ( n )c m-1 ( n ) =

(6)

B m-1 ( n )

on the basis of which we may immediately write 2

B m ( n ) = B m-1 ( n-1 ) – ∆ m-1 ( n ) ⁄ F m-1 ( n ) 12.11 (a) From part (a) of problem 12.5, we have

–1 Φ M+1 ( n )

=

T

0

0M

0M

Φ M ( n-1 )

–1

H 1 + ----------------a M ( n )a M ( n ) . FM (n)

Similarly, from part (b) of the same problem, we have –1

–1 Φ M+1 ( n )

=

ΦM ( n ) T

0M

0M 0

H 1 + ----------------c M ( n )c M ( n ) . BM ( n )

–1

–1

Subtracting these two equations gives Φ M+1 ( n ) – Φ M+1 ( n ) = 0 , or

0 =

0 0M

T

–1

ΦM ( n ) H 1 + ----------------a M ( n )a M ( n ) – FM(n) –1 T Φ M ( n-1 ) 0M 0M

This is easily rearranged as

267

0M 0

H 1 – ----------------c M ( n )c M ( n ) BM ( n )

–1

ΦM ( n )

0M

T

0M



0

T

0

0M

0M

Φ M ( n-1 )

H H 1 1 = ----------------a M ( n )a M ( n ) – ----------------c M ( n )c M ( n ) . BM ( n ) FM (n)

–1

(b) From page 441, equation (9.16) we have the basic matrix recursion –1 ΦM ( n )

= λ

–1

–1 Φ M ( n-1 )

–1 λ

–1

H

–1

–1

Φ M ( n-1 )u ( n )u ( n )Φ M ( n-1 ) – λ -----------------------------------------------------------------------------------–1 –1 H 1 + λ u ( n )Φ M ( n-1 )u ( n )

From Eq. (9.18) of the text, we may introduce the gain vector –1

–1

λ Φ M ( n-1 )u ( n ) k M ( n ) = --------------------------------------------------------------------–1 H –1 1 + λ u ( n )Φ M ( n-1 )u ( n ) which yields the expression –1

–1

–1

–1

H

–1

Φ M ( n ) = λ Φ M ( n-1 ) – λ k M ( n )u ( n )Φ M ( n-1 ) Now, again from Eq. (9.18) of the text, we have –1 H

–1

H

–1 H

–1

λ u ( n )Φ M ( n-1 ) = k M ( n ) ( 1 + λ u ( n )Φ M ( n-1 )u ( n ) ) But from Eq. (10.100) of the text, we may introduce the conversion factor as 1 γ M ( n ) = --------------------------------------------------------------------–1 H –1 1 + λ u ( n )Φ M ( n-1 )u ( n ) so that the basic matrix inversion lemma becomes H

–1 ΦM ( n )

= λ

–1

–1 Φ M ( n-1 )

k M ( n )k M ( n ) – -------------------------------γ M (n)

(c) The result of part (b) can be rearranged as H

–1 Φ M ( n-1 )

=

–1 λΦ M ( n ) +

k M ( n )k M ( n ) λ -------------------------------γ M (n)

268

Inserting this recursion into part (a), we have –1

ΦM ( n )

0M

T 0M

0



T

0

0M –1

H

λΦ M ( n ) + λk M ( n )k M ( n ) ⁄ γ M ( n )

0M

H H 1 1 = ----------------a M ( n )a M ( n ) – ----------------c M ( n )c M ( n ) FM (n) BM ( n )

This, in turn, can be rearranged as –1

ΦM ( n )

0M

T 0M

0

T

0

0M

0M

ΦM ( n )

–λ

–1

H

H 0 k M ( n ) c ( n )c H ( n ) a M ( n )a M ( n ) 0 M M -------------------------- – -----------------------------= -------------------------------- + λ -. FM (n) BM ( n ) kM ( n ) γ M ( n )

(d) Now we multiply the solution from (c) from the left by 2

M

[ 1, ( z ⁄ λ ), ( z ⁄ λ ) , …, ( z ⁄ λ ) ] , and from the right by M H

2

[ 1, ( w ⁄ λ ), ( (w ) ⁄ λ , …, w ⁄ λ) ] . First note that, if we set 1 0M 0

w* ⁄ λ ...

M

P ( z, w* ) = [ 1 ( z ⁄ λ )… ( z ⁄ λ ) ]

–1 ΦM ( n ) T 0M

( w* ⁄ λ )

M

then, upon displacing this matrix one position along the main diagonal, we obtain

269

1 0 0M

w* ⁄ λ

= zw*P ( z, w* )

...

M

[ 1 ( z ⁄ λ )… ( z ⁄ λ ) ]λ

T 0M –1 ΦM ( n )

( w* ⁄ λ )

M

Thus the left-hand side of part (c) yields the two-variable polynomial ( 1 – zw* )P ( z, w* ) Similarly, rewrite the right-hand side as the sum and difference of dyads: H

H

cM ( n ) aM ( n ) cM ( n ) aM ( n ) 0 λ λ  -------------- – -------------------H -------------------- -------------------- + ---------------------------------γ M ( n ) kM ( n )  γ M ( n ) 0 kM ( n )  FM (n) FM (n) BM ( n ) BM ( n ) Accordingly, if we set M aM ( n ) A ( z ) = [ 1, z ⁄ λ … ( z ⁄ λ ) ] -------------------FM(n) M 0 λ K ( z ) = [ 1, z ⁄ λ … ( z ⁄ λ ) ] --------------γ M ( n ) kM ( n ) M cM ( n ) C ( z ) = [ 1, z ⁄ λ … ( z ⁄ λ ) ] -------------------BM ( n )

then we will have

M aM ( n ) [ 1, z ⁄ λ … ( z ⁄ λ ) ] -------------------- -------------------FM (n) FM (n)

1 w* ⁄ λ

= A ( z ) A* ( w ),

...

H aM ( n )

( w* ⁄ λ )

M

and likewise for the remaining terms. Putting all this together, we finally obtain ( 1 – zw* )P ( z, w* ) = A ( z ) A* ( w ) + K ( z )K * ( w ) – B ( z )B* ( w )

270

(e) If we set z = w = e jω, then the result of part (d) reads as jω ( – j )ω

e = 0



) P(e , e

– jω

      

(1 – e

) = A(e



) A* ( e



) + K (e



)K * ( e



) – C(e



)C* ( e

This gives A(e

jω 2

jω 2

) + K (e

)

= C(e

jω 2

) ,

for all ω.

(f) If we set w* = z* in the result of part (d), we have 2

( 1 – z )P ( z, z* ) = A ( z ) A* ( z ) + K ( z )K * ( z ) – C ( z )C* ( z ) 2

2

= A(z) + K (z) – C (z)

2

(1)

–1

Now, since Φ M ( n ) is a positive definite matrix, it is true that 1 M-1

–1

]Φ m ( n )

z* ⁄ λ

> 0,

...

P ( z, z* ) = [ 1, z ⁄ λ, …, ( z ⁄ λ )

( z* ⁄ λ )

for all z.

M

Accordingly, the right-hand side of Eq. (1) above must take the same sign as (1 - |z|2), which gives  0, 2

2

2

z >1; z =1; z 1.

Now, if C(z) were to have a zero with modulus greater than one, that is, C ( z 0 ) = 0,

with z 0 >1

271

(2)



)

then we would have 2

C ( z0 )

2

2

2

= A ( z 0 ) + K ( z 0 ) ≥ 0.

    

2

A ( z0 ) + K ( z0 ) –

With |z_0| > 1, we obtain a contradiction to Eq. (2) above. Accordingly, C(z) can have no zeros in |z| > 1, so that it may be determined without phase ambiguity from A(z) and K(z) by way of spectral factorization using the result of part (e).

12.12 (a) Backward prediction

The first three lines of Table 12.4 follow from the state-space models of Eqs. (12.87) through (12.95), together with Table 10.3 of Chapter 10 in the text. For the remaining three entries, we may proceed as follows:

K(n-1) ↔ λ^{-1} Φ^{-1}(n-1) = λ^{-1} ( Σ_{i=1}^{n-1} λ^{n-i+1} |ε_{f,m-1}(i-1)|^2 )^{-1} = λ^{-1} F_{m-1}^{-1}(n-1)

g(n) ↔ λ^{-1/2} k(n) = λ^{-1/2} F_{m-1}^{-1}(n) ε_{f,m-1}(n)

α(n) ↔ λ^{-n/2} γ_{m-1}^{1/2}(n-1) β*_m(n)

e(n) ↔ λ^{-n/2} ( ε*_{b,m-1}(n-1) + ε*_{f,m-1}(n) κ_{b,m}(n) )
     = λ^{-n/2} γ_{m-1}^{-1/2}(n-1) ( b*_{m-1}(n-1) + f*_{m-1}(n) κ_{b,m}(n) )
     = λ^{-n/2} γ_{m-1}^{-1/2}(n-1) b*_m(n)

Hence, noting that b_m(n) = γ_m(n) β_m(n),

r(n) = α(n)/e(n) = γ_{m-1}(n-1) ( β*_m(n)/b*_m(n) ) = γ_{m-1}(n-1)/γ_m(n)

(b) The relations for joint-process estimation follow a similar procedure.

12.13 (a) The correction in the update equation for E_min(n) is defined by the product term α_m(n)e*_m(n). This correction term is also equal to |ε_{m+1}(n)|^2. Hence,

ε_{m+1}(n) ε*_{m+1}(n) = α_m(n) e*_m(n)

By definition,

ε_m(n) = [ e_m(n) α_m(n) ]^{1/2}

We know that α_m(n)e*_m(n) is real. Hence, we must have

arg[α_m(n)] = arg[e_m(n)]

Moreover, it is natural to have

arg[ε_m(n)] = arg[e_m(n)]

which goes to show that

arg[ε_m(n)] = arg[e_m(n)] = arg[α_m(n)]

(b) The correction term in the update equation for B_min(n) is defined by the product term ψ_m(n)b*_m(n). This correction term is also equal to |ε_{b,m}(n)|^2. Hence,

ε_{b,m}(n) ε*_{b,m}(n) = ψ_m(n) b*_m(n)

By definition,

ε_{b,m}(n) = [ b_m(n) ψ_m(n) ]^{1/2}

We know that ψ_m(n)b*_m(n) is real. Hence, we must have

arg[b_m(n)] = arg[ψ_m(n)]

Moreover, it is natural to have

arg[ε_{b,m}(n)] = arg[b_m(n)]

Therefore,

arg[ε_{b,m}(n)] = arg[b_m(n)] = arg[ψ_m(n)]

(c) The correction term in the update equation for F_m(n) is η_m(n)f*_m(n), which is also equal to |ε_{f,m}(n)|^2. Hence,

ε_{f,m}(n) ε*_{f,m}(n) = η_m(n) f*_m(n)

By definition,

ε_{f,m}(n) = [ f_m(n) η_m(n) ]^{1/2}

We know that η_m(n)f*_m(n) is real. Hence, we must have

arg[f_m(n)] = arg[η_m(n)]

Moreover, it is natural to have

arg[ε_{f,m}(n)] = arg[f_m(n)]

Therefore,

arg[ε_{f,m}(n)] = arg[f_m(n)] = arg[η_m(n)]

12.14 We start with

c_{b,m-1}(n) = λ^{1/2} B_{m-1}^{1/2}(n-1) / B_{m-1}^{1/2}(n)        (1)

Also, we note that

c_{f,m-1}(n) = λ^{1/2} F_{m-1}^{1/2}(n-1) / F_{m-1}^{1/2}(n)        (2)

But from the solution to part (d) of Problem 12.6:

[ λ^{1/2} B_{m-1}^{1/2}(n-1) / B_{m-1}^{1/2}(n) ] γ_{m-1}^{1/2}(n) = γ_m^{1/2}(n)        (3)

We next adapt the relation of part (c) of Problem 12.6 to our present situation:

γ_m^{1/2}(n) = [ λ^{1/2} F_{m-1}^{1/2}(n-1) / F_{m-1}^{1/2}(n) ] γ_{m-1}^{1/2}(n)        (4)

Hence, using Eqs. (1) and (3):

γ_m^{1/2}(n) = c_{b,m-1}(n) γ_{m-1}^{1/2}(n)        (5)

and using Eqs. (2) and (4):

γ_m^{1/2}(n) = c_{f,m-1}(n) γ_{m-1}^{1/2}(n)        (6)

Multiplying Eqs. (5) and (6), we get

γ_m(n) = c_{b,m-1}(n) c_{f,m-1}(n) γ_{m-1}(n)        (7)

which is the desired result.

12.15 The matrix product Φ_{m+1}(n)L_m^H(n) equals

Φ_{m+1}(n)L_m^H(n) = Φ_{m+1}(n) [ 1  c_{1,1}(n)  …  c_{m-1,m-1}(n)  c_{m,m}(n)
                                   0  1           …  c_{m-1,m-2}(n)  c_{m,m-1}(n)
                                   :  :               :              :
                                   0  0           …  1               c_{m,1}(n)
                                   0  0           …  0               1          ]        (1)

From the augmented system of normal equations, we have

Φ_{m+1}(n) [ c_{m,m}(n), c_{m,m-1}(n), …, c_{m,1}(n), 1 ]^T = [ 0, 0, …, 0, B_m(n) ]^T        (2)

From the solution to Problem 12.9 we note that

Φ_{m+1}(n) = [ Φ_m(n)    φ_2(n)
               φ_2^H(n)  U_2(n) ]

Hence

Φ_{m+1}(n) [ c_{m-1,m-1}(n), …, c_{m-1,1}(n), 1, 0 ]^T = [ 0, …, 0, B_{m-1}(n), φ_2^H(n)c_{m-1}(n) ]^T        (3)

Thus far we have dealt with the last two columns of the matrix on the right-hand side of Eq. (1). Similarly, we may go on to show that

Φ_{m+1}(n) [ c_{1,1}(n), 1, 0, …, 0 ]^T = [ 0, B_1(n), x, …, x ]^T

where the crosses refer to some nonzero elements. Finally, we arrive at

Φ_{m+1}(n) [ 1, 0, …, 0 ]^T = [ B_0(n), x, …, x ]^T

where again the crosses refer to some other nonzero elements. Putting all of these pieces together, we conclude that

Φ_{m+1}(n)L_m^H(n) = [ B_0(n)  0       …  0           0
                        x       B_1(n)  …  0           0
                        :       :          :           :
                        x       x       …  B_{m-1}(n)  0
                        x       x       …  x           B_m(n) ]

This shows that Φ_{m+1}(n)L_m^H(n) is a lower triangular matrix. Since L_m(n) is itself a lower triangular matrix (all of its elements above the main diagonal are zero), the product L_m(n)Φ_{m+1}(n)L_m^H(n) is lower triangular as well. Moreover, since Φ_{m+1}(n) is Hermitian, L_m(n)Φ_{m+1}(n)L_m^H(n) is likewise Hermitian. Accordingly, all the elements of this product below the main diagonal are also zero. Finally, since all the diagonal elements of L_m(n) equal unity, we conclude that L_m(n)Φ_{m+1}(n)L_m^H(n) is a diagonal matrix, as shown by

L_m(n)Φ_{m+1}(n)L_m^H(n) = diag[ B_0(n), B_1(n), …, B_{m-1}(n), B_m(n) ]

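The diagonalization above is easy to check numerically. The sketch below is an assumption-laden illustration: a random positive-definite matrix stands in for Φ_{m+1}(n), and the backward predictors (the rows of L) are obtained directly from the normal equations rather than from a lattice recursion.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))
Phi = A @ A.T + 5 * np.eye(5)   # random positive-definite stand-in for Phi_{m+1}(n)

M = Phi.shape[0]
L = np.eye(M)                   # unit-lower-triangular L_m(n)
for m in range(1, M):
    # order-m backward predictor: Phi[:m+1,:m+1] [c; 1]^T = [0, ..., 0, B_m]^T,
    # so the leading coefficients solve Phi[:m,:m] c = -Phi[:m, m]
    L[m, :m] = np.linalg.solve(Phi[:m, :m], -Phi[:m, m])

D = L @ Phi @ L.T               # should be diagonal, per Problem 12.15
off_diag = np.max(np.abs(D - np.diag(np.diag(D))))
B = np.diag(D)                  # backward prediction-error energies B_0, ..., B_m
```

The off-diagonal entries of D vanish to machine precision, and the diagonal entries (the energies B_m) are positive because Φ is positive definite and L has full rank.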

12.16 The joint probability density function of the time series u(n), u(n-1), …, u(n-M) equals

f_U(u) = ( 1/[2π det(R)]^{1/2} ) exp( -(1/2) u^H(n) R^{-1} u(n) )

where u^T(n) = [u(n), u(n-1), …, u(n-M)] and R is the (M+1)-by-(M+1) ensemble-averaged correlation matrix of the input vector u(n). The log-likelihood function equals

L = ln f_U(u)
  = -(1/2) ln[2π det(R)] - (1/2) u^H(n) R^{-1} u(n)        (1)

For n > M+1, we may approximate the correlation matrix R in terms of the deterministic correlation matrix Φ(n) as

R ≈ (1/n) Φ(n),   n ≥ M+1        (2)

so that R^{-1} ≈ n Φ^{-1}(n). Hence, we may rewrite Eq. (1) as

L ≈ -(1/2) ln[2π det(R)] - (n/2) u^H(n) Φ^{-1}(n) u(n)        (3)

The second term on the right side of Eq. (3) equals

(n/2) u^H(n) Φ^{-1}(n) u(n) = (n/2) [ 1 - γ(n) ]

where γ(n) is the conversion factor.

12.17 The a posteriori estimation error e_m(n) is order-updated by using the recursion

e_m(n) = e_{m-1}(n) - h*_m(n) b_m(n),   m = 1, 2, …, M        (1)

By definition, we have

e_m(n) = d(n) - d̂(n | U_{n-m})        (2)

where d(n) is the desired response and d̂(n | U_{n-m}) is the least-squares estimate of d(n) given the input samples u(n), …, u(n-m+1), u(n-m) that span the space U_{n-m}. Similarly,

e_{m-1}(n) = d(n) - d̂(n | U_{n-m+1})        (3)

where d̂(n | U_{n-m+1}) is the least-squares estimate of d(n) given the input samples u(n), …, u(n-m+1). Substituting Eqs. (2) and (3) into (1), we get

d̂(n | U_{n-m}) = d̂(n | U_{n-m+1}) + h*_m(n) b_m(n)        (4)

Equation (4) shows that, given the regression coefficient h_m(n), we only need b_m(n) as the new piece of information for updating the least-squares estimate of the desired response. Hence, b_m(n) may be viewed as a form of innovation, which is in perfect agreement with the discussion presented in Section 10.1.

12.18 From Eq. (12.159) of the text:

D_{m+1}(n) = L_m(n) Φ_{m+1}(n) L_m^H(n)

Equivalently, we may write

Φ_{m+1}^{-1}(n) = L_m^H(n) D_{m+1}^{-1}(n) L_m(n)

Hence,

P(n) = Φ_{m+1}^{-1}(n) = [ D^{-1/2}(n) L(n) ]^H [ D^{-1/2}(n) L(n) ]
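This factorization of P(n) can be verified along the same lines as Problem 12.15. In the sketch below, a random positive-definite matrix again stands in for Φ_{m+1}(n) (an assumption for illustration), and D^{-1/2}(n)L(n) is formed explicitly.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
Phi = A @ A.T + 4 * np.eye(4)   # stand-in for the positive-definite Phi_{m+1}(n)

M = Phi.shape[0]
L = np.eye(M)                   # rows hold the backward predictors (normal equations)
for m in range(1, M):
    L[m, :m] = np.linalg.solve(Phi[:m, :m], -Phi[:m, m])

D = L @ Phi @ L.T                        # diagonal, by Problem 12.15
S = np.diag(1.0 / np.sqrt(np.diag(D))) @ L   # D^{-1/2} L
P = S.T @ S                              # claimed equal to Phi^{-1}
err = np.max(np.abs(P - np.linalg.inv(Phi)))
```

The reconstruction error `err` is at the level of floating-point round-off, confirming P(n) = [D^{-1/2}L]^H [D^{-1/2}L] for this example.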

12.19 (a) From Eq. (12.152),

x̂(n+1 | Y_n) = λ^{-1/2} x̂(n+1 | Y_{n-1}) + g(n) α(n)        (1)

From Eq. (12.97) and the solution to Problem 12.12:

g(n) ↔ λ^{-1/2} F_{m-1}^{-1}(n) ε_{f,m-1}(n)        (2)

α(n) ↔ λ^{-n/2} γ_{m-1}^{1/2}(n-1) β*_m(n)        (3)

(It is important to note that the relation corresponding to α(n) must involve the error signal appropriate to the estimation problem of interest.) Also, from Table 12.4 (under backward prediction),

x̂(n+1 | Y_{n-1}) ↔ -λ^{-n/2} κ_{b,m}(n-1)        (4)

Hence, substituting Eqs. (2), (3), and (4) into (1), and cancelling the common factor λ^{-(n+1)/2}:

κ_{b,m}(n) = κ_{b,m}(n-1) - [ γ_{m-1}^{1/2}(n-1) ε_{f,m-1}(n) / F_{m-1}(n) ] β*_m(n)        (5)

But, from Eq. (12.80):

ε_{f,m-1}(n) = γ_{m-1}^{1/2}(n-1) η_{m-1}(n)        (6)

Thus Eq. (5) simplifies to

κ_{b,m}(n) = κ_{b,m}(n-1) - [ γ_{m-1}(n-1) η_{m-1}(n) / F_{m-1}(n) ] β*_m(n)        (7)

which is the desired result; see Eq. (12.154).

(b) From Table 12.4 (under joint-process estimation) and the solution to Problem 12.12:

x̂(n | Y_{n-1}) ↔ λ^{-n/2} h_{m-1}(n-1)        (10)

g(n) ↔ λ^{-1/2} B_{m-1}^{-1}(n) ε_{b,m-1}(n)        (11)

α(n) ↔ λ^{-n/2} γ_{m-1}^{1/2}(n) ξ*_m(n)        (12)

Hence, using Eqs. (10) through (12) in (1), we get

h_{m-1}(n) = h_{m-1}(n-1) + [ γ_{m-1}^{1/2}(n) ε_{b,m-1}(n) / B_{m-1}(n) ] ξ*_m(n)        (13)

From Eq. (12.80):

ε_{b,m-1}(n) = γ_{m-1}^{1/2}(n) β_{m-1}(n)

Hence,

h_{m-1}(n) = h_{m-1}(n-1) + [ γ_{m-1}(n) β_{m-1}(n) / B_{m-1}(n) ] ξ*_m(n)        (14)

which is exactly the same as Eq. (12.155).

12.20 For all m, when we put

κ_{b,m}(n) = κ_{f,m}(n) = κ_m(n)
γ_m(n) = 1

in Table 12.7, we get

F_{m-1}(n) = λ F_{m-1}(n-1) + |f_{m-1}(n)|^2        (1)

B_{m-1}(n-1) = λ B_{m-1}(n-2) + |b_{m-1}(n-1)|^2        (2)

f_m(n) = f_{m-1}(n) + κ*_m(n-1) b_{m-1}(n-1)        (3)

b_m(n) = b_{m-1}(n-1) + κ_m(n-1) f_{m-1}(n)        (4)

κ_m(n) = κ_m(n-1) - b*_{m-1}(n-1) f_m(n) / B_{m-1}(n-1)        (5)

       = κ_m(n-1) - f*_{m-1}(n) b_m(n) / F_{m-1}(n)        (6)

Adding Eqs. (1) and (2), and setting

E_{m-1}(n) = F_{m-1}(n) + B_{m-1}(n-1)

we get

E_{m-1}(n) = λ E_{m-1}(n-1) + ( |f_{m-1}(n)|^2 + |b_{m-1}(n-1)|^2 )        (7)

Equation (7) is recognized as the update equation (12.9) for the GAL algorithm in the text. The only outstanding issue is the recursive relation for the reflection coefficient. Adding Eqs. (5) and (6), setting κ_m(n) = κ̂_m(n), and dividing by two, we get

κ̂_m(n) = κ̂_m(n-1) - (1/2) [ b*_{m-1}(n-1) f_m(n) / B_{m-1}(n-1) + f*_{m-1}(n) b_m(n) / F_{m-1}(n) ]        (8)

Now, assuming that

B_{m-1}(n-1) = F_{m-1}(n) = (1/2) E_{m-1}(n)        (9)

Eq. (8) takes the form

κ̂_m(n) = κ̂_m(n-1) - ( 1/E_{m-1}(n) ) [ b*_{m-1}(n-1) f_m(n) + f*_{m-1}(n) b_m(n) ]        (10)

Equation (10) is recognized as the update equation (12.15) of the GAL algorithm in the text.

Conclusions: The GAL algorithm is a special case of the recursive LSL algorithm using a posteriori estimation errors, under the following settings:

κ_{b,m}(n) = κ_{f,m}(n) = κ̂_m(n)
γ_m(n) = 1
F_{m-1}(n) = B_{m-1}(n-1) = (1/2) E_{m-1}(n)

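As a sanity check on this reduction, a single GAL stage built from the energy recursion (7) and the reflection-coefficient update (10) can be run on white noise, for which the optimal reflection coefficient is zero. The test signal, weighting factor λ, and initialization below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(500)   # white-noise test input (assumed)

lam = 0.99     # exponential weighting factor (assumed value)
kappa = 0.0    # reflection-coefficient estimate kappa_hat_1(n)
E = 1e-2       # energy E_0(n), initialized to a small positive constant

for n in range(1, len(u)):
    f0, b0 = u[n], u[n - 1]            # stage-0 errors f_0(n) and b_0(n-1)
    E = lam * E + f0**2 + b0**2        # Eq. (7): total prediction-error energy
    f1 = f0 + kappa * b0               # Eq. (3): forward order-update (real data)
    b1 = b0 + kappa * f0               # Eq. (4): backward order-update
    kappa -= (b0 * f1 + f0 * b1) / E   # Eq. (10): reflection-coefficient update
```

Because the input is white, the reflection coefficient hovers near zero while the energy estimate stays positive, as the recursions require.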

12.21 The forward reflection coefficient equals

κ_{f,m}(n) = -∆_{m-1}(n) / B_{m-1}(n-1)

           = -λ ∆_{m-1}(n-1)/B_{m-1}(n-1) - b_{m-1}(n-1) f*_{m-1}(n) / ( γ_{m-1}(n-1) B_{m-1}(n-1) )

           = -λ [ B_{m-1}(n-2)/B_{m-1}(n-1) ] [ ∆_{m-1}(n-1)/B_{m-1}(n-2) ] - b_{m-1}(n-1) f*_{m-1}(n) / ( γ_{m-1}(n-1) B_{m-1}(n-1) )

where, in the first term, we have multiplied and divided by B_{m-1}(n-2). But

κ_{f,m}(n-1) = -∆_{m-1}(n-1) / B_{m-1}(n-2)

Hence,

κ_{f,m}(n) = [ λ B_{m-1}(n-2)/B_{m-1}(n-1) ] { κ_{f,m}(n-1) - b_{m-1}(n-1) f*_{m-1}(n) / ( λ B_{m-1}(n-2) γ_{m-1}(n-1) ) }        (1)

We now use the relation (see part (d) of Problem 12.6)

λ B_{m-1}(n-2) / B_{m-1}(n-1) = γ_m(n-1) / γ_{m-1}(n-1)

Hence, we may rewrite Eq. (1) as

κ_{f,m}(n) = [ γ_m(n-1)/γ_{m-1}(n-1) ] { κ_{f,m}(n-1) - b_{m-1}(n-1) f*_{m-1}(n) / ( λ B_{m-1}(n-2) γ_{m-1}(n-1) ) }

Next, the backward reflection coefficient equals

κ_{b,m}(n) = -∆*_{m-1}(n) / F_{m-1}(n)

           = -λ ∆*_{m-1}(n-1)/F_{m-1}(n) - b*_{m-1}(n-1) f_{m-1}(n) / ( F_{m-1}(n) γ_{m-1}(n-1) )

           = λ κ_{b,m}(n-1) [ F_{m-1}(n-1)/F_{m-1}(n) ] - b*_{m-1}(n-1) f_{m-1}(n) / ( F_{m-1}(n) γ_{m-1}(n-1) )

           = [ λ F_{m-1}(n-1)/F_{m-1}(n) ] { κ_{b,m}(n-1) - b*_{m-1}(n-1) f_{m-1}(n) / ( λ F_{m-1}(n-1) γ_{m-1}(n-1) ) }        (2)

We now recognize that (see part (c) of Problem 12.6)

λ F_{m-1}(n-1) / F_{m-1}(n) = γ_m(n) / γ_{m-1}(n-1)

Hence, we may rewrite Eq. (2) as

κ_{b,m}(n) = [ γ_m(n)/γ_{m-1}(n-1) ] { κ_{b,m}(n-1) - b*_{m-1}(n-1) f_{m-1}(n) / ( λ F_{m-1}(n-1) γ_{m-1}(n-1) ) }

12.22 The forward a posteriori prediction error equals

f_m(n) = f_{m-1}(n) + κ*_{f,m}(n) b_{m-1}(n-1)
       = f_{m-1}(n) - [ ∆*_{m-1}(n)/B_{m-1}(n-1) ] b_{m-1}(n-1)

Hence, the normalized value of f_m(n) equals

f̄_m(n) = f_m(n) / [ F_m^{1/2}(n) γ_m^{1/2}(n-1) ]
        = f_{m-1}(n) / [ F_m^{1/2}(n) γ_m^{1/2}(n-1) ] - ∆*_{m-1}(n) b_{m-1}(n-1) / [ B_{m-1}(n-1) F_m^{1/2}(n) γ_m^{1/2}(n-1) ]        (1)

But

F_m(n) = F_{m-1}(n) [ 1 - |∆_{m-1}(n)|^2 / ( F_{m-1}(n) B_{m-1}(n-1) ) ]
       = F_{m-1}(n) [ 1 - |∆̄_{m-1}(n)|^2 ]        (2)

where

∆̄_{m-1}(n) = ∆_{m-1}(n) / [ F_{m-1}^{1/2}(n) B_{m-1}^{1/2}(n-1) ]

Similarly,

B_m(n) = B_{m-1}(n-1) [ 1 - |∆_{m-1}(n)|^2 / ( B_{m-1}(n-1) F_{m-1}(n) ) ]
       = B_{m-1}(n-1) [ 1 - |∆̄_{m-1}(n)|^2 ]        (3)

Also, we may write

γ_m(n-1) = γ_{m-1}(n-1) [ 1 - |b_{m-1}(n-1)|^2 / ( γ_{m-1}(n-1) B_{m-1}(n-1) ) ]
         = γ_{m-1}(n-1) [ 1 - |b̄_{m-1}(n-1)|^2 ]        (4)

where

b̄_{m-1}(n-1) = b_{m-1}(n-1) / [ γ_{m-1}^{1/2}(n-1) B_{m-1}^{1/2}(n-1) ]        (5)

Hence, we may use Eqs. (2) and (4) to express the first term on the right side of Eq. (1) as

f_{m-1}(n) / [ F_m^{1/2}(n) γ_m^{1/2}(n-1) ]
  = { f_{m-1}(n) / [ F_{m-1}^{1/2}(n) γ_{m-1}^{1/2}(n-1) ] } / { [ 1 - |∆̄_{m-1}(n)|^2 ]^{1/2} [ 1 - |b̄_{m-1}(n-1)|^2 ]^{1/2} }
  = f̄_{m-1}(n) / { [ 1 - |∆̄_{m-1}(n)|^2 ]^{1/2} [ 1 - |b̄_{m-1}(n-1)|^2 ]^{1/2} }        (6)

Next, we use Eqs. (2) to (4) to express the second term on the right side of Eq. (1) as

∆*_{m-1}(n) b_{m-1}(n-1) / [ B_{m-1}(n-1) F_m^{1/2}(n) γ_m^{1/2}(n-1) ]
  = ∆̄*_{m-1}(n) b̄_{m-1}(n-1) / { [ 1 - |∆̄_{m-1}(n)|^2 ]^{1/2} [ 1 - |b̄_{m-1}(n-1)|^2 ]^{1/2} }        (7)

Substituting Eqs. (6) and (7) into (1), we may thus write

f̄_m(n) = [ f̄_{m-1}(n) - ∆̄*_{m-1}(n) b̄_{m-1}(n-1) ] / { [ 1 - |∆̄_{m-1}(n)|^2 ]^{1/2} [ 1 - |b̄_{m-1}(n-1)|^2 ]^{1/2} }        (8)

Next, the backward a posteriori prediction error equals

b_m(n) = b_{m-1}(n-1) - [ ∆_{m-1}(n)/F_{m-1}(n) ] f_{m-1}(n)

Hence, the normalized value of b_m(n) equals

b̄_m(n) = b_m(n) / [ B_m^{1/2}(n) γ_m^{1/2}(n) ]
        = b_{m-1}(n-1) / [ B_m^{1/2}(n) γ_m^{1/2}(n) ] - ∆_{m-1}(n) f_{m-1}(n) / [ F_{m-1}(n) B_m^{1/2}(n) γ_m^{1/2}(n) ]        (9)

The conversion factor may be updated by using the recursion

γ_m(n) = γ_{m-1}(n-1) - |f_{m-1}(n)|^2 / F_{m-1}(n)
       = γ_{m-1}(n-1) [ 1 - |f_{m-1}(n)|^2 / ( F_{m-1}(n) γ_{m-1}(n-1) ) ]
       = γ_{m-1}(n-1) [ 1 - |f̄_{m-1}(n)|^2 ]        (10)

where

f̄_{m-1}(n) = f_{m-1}(n) / [ F_{m-1}^{1/2}(n) γ_{m-1}^{1/2}(n-1) ]

Using Eqs. (3) and (10), we may express the first term on the right side of Eq. (9) as

b_{m-1}(n-1) / [ B_m^{1/2}(n) γ_m^{1/2}(n) ]
  = { b_{m-1}(n-1) / [ B_{m-1}^{1/2}(n-1) γ_{m-1}^{1/2}(n-1) ] } / { [ 1 - |∆̄_{m-1}(n)|^2 ]^{1/2} [ 1 - |f̄_{m-1}(n)|^2 ]^{1/2} }
  = b̄_{m-1}(n-1) / { [ 1 - |∆̄_{m-1}(n)|^2 ]^{1/2} [ 1 - |f̄_{m-1}(n)|^2 ]^{1/2} }        (11)

Using Eqs. (3) and (10), we may also rewrite the second term on the right side of Eq. (9) as

∆_{m-1}(n) f_{m-1}(n) / [ F_{m-1}(n) B_m^{1/2}(n) γ_m^{1/2}(n) ]
  = ∆̄_{m-1}(n) f̄_{m-1}(n) / { [ 1 - |∆̄_{m-1}(n)|^2 ]^{1/2} [ 1 - |f̄_{m-1}(n)|^2 ]^{1/2} }        (12)

Hence, the use of Eqs. (11) and (12) in (9) yields

b̄_m(n) = [ b̄_{m-1}(n-1) - ∆̄_{m-1}(n) f̄_{m-1}(n) ] / { [ 1 - |∆̄_{m-1}(n)|^2 ]^{1/2} [ 1 - |f̄_{m-1}(n)|^2 ]^{1/2} }        (13)

Finally, we note that

∆_{m-1}(n) = λ ∆_{m-1}(n-1) + b_{m-1}(n-1) f*_{m-1}(n) / γ_{m-1}(n-1)

The normalized value of ∆_{m-1}(n) equals

∆̄_{m-1}(n) = ∆_{m-1}(n) / [ F_{m-1}^{1/2}(n) B_{m-1}^{1/2}(n-1) ]
            = λ ∆_{m-1}(n-1) / [ F_{m-1}^{1/2}(n) B_{m-1}^{1/2}(n-1) ] + b_{m-1}(n-1) f*_{m-1}(n) / [ F_{m-1}^{1/2}(n) B_{m-1}^{1/2}(n-1) γ_{m-1}(n-1) ]        (14)

Using Eqs. (2) and (3), we may rewrite the first term on the right side of Eq. (14) as

λ ∆_{m-1}(n-1) / [ F_{m-1}^{1/2}(n) B_{m-1}^{1/2}(n-1) ]
  = { ∆_{m-1}(n-1) / [ F_{m-1}^{1/2}(n-1) B_{m-1}^{1/2}(n-2) ] } · λ F_{m-1}^{1/2}(n-1) B_{m-1}^{1/2}(n-2) / [ F_{m-1}^{1/2}(n) B_{m-1}^{1/2}(n-1) ]        (15)

We next use the relations:

λ F_{m-1}(n-1) / F_{m-1}(n) = γ_m(n) / γ_{m-1}(n-1)

λ B_{m-1}(n-2) / B_{m-1}(n-1) = γ_m(n-1) / γ_{m-1}(n-1)

∆̄_{m-1}(n-1) = ∆_{m-1}(n-1) / [ F_{m-1}^{1/2}(n-1) B_{m-1}^{1/2}(n-2) ]

Hence, we may rewrite Eq. (15) as

λ ∆_{m-1}(n-1) / [ F_{m-1}^{1/2}(n) B_{m-1}^{1/2}(n-1) ] = ∆̄_{m-1}(n-1) [ γ_m(n) γ_m(n-1) ]^{1/2} / γ_{m-1}(n-1)        (16)

Next, multiplying Eq. (4) by (10), we have

γ_m(n) γ_m(n-1) / γ_{m-1}^2(n-1) = [ 1 - |f̄_{m-1}(n)|^2 ] [ 1 - |b̄_{m-1}(n-1)|^2 ]

Accordingly, we may rewrite Eq. (16) as

λ ∆_{m-1}(n-1) / [ F_{m-1}^{1/2}(n) B_{m-1}^{1/2}(n-1) ] = ∆̄_{m-1}(n-1) [ 1 - |f̄_{m-1}(n)|^2 ]^{1/2} [ 1 - |b̄_{m-1}(n-1)|^2 ]^{1/2}        (17)

Next, the second term on the right side of Eq. (14) equals

b_{m-1}(n-1) f*_{m-1}(n) / [ F_{m-1}^{1/2}(n) B_{m-1}^{1/2}(n-1) γ_{m-1}(n-1) ] = b̄_{m-1}(n-1) f̄*_{m-1}(n)        (18)

Thus, substituting Eqs. (17) and (18) into (14), we get the desired time-update

∆̄_{m-1}(n) = ∆̄_{m-1}(n-1) [ 1 - |f̄_{m-1}(n)|^2 ]^{1/2} [ 1 - |b̄_{m-1}(n-1)|^2 ]^{1/2} + b̄_{m-1}(n-1) f̄*_{m-1}(n)        (19)

Equations (19), (8) and (13), in that order, constitute the normalized LSL algorithm.
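A minimal single-stage sketch of recursions (19), (8), and (13) is shown below, run on an assumed AR(1) test signal. The stage-0 normalization by running energies F_0(n) and B_0(n) (with γ_0 = 1), the weighting factor, and the small guard on the denominators are implementation choices for illustration, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
u = np.zeros(2000)
for n in range(1, len(u)):
    u[n] = 0.8 * u[n - 1] + rng.standard_normal()   # AR(1) test signal (assumed)

lam = 0.99
F0 = B0 = 1e-2    # stage-0 energies, small positive initialization
delta = 0.0       # normalized cross-correlation, Delta-bar_0(n)

for n in range(1, len(u)):
    F0 = lam * F0 + u[n] ** 2
    B0 = lam * B0 + u[n - 1] ** 2
    fbar = u[n] / np.sqrt(F0)        # normalized stage-0 forward error
    bbar = u[n - 1] / np.sqrt(B0)    # normalized stage-0 backward error
    # Eq. (19): time-update of the normalized Delta (stays inside [-1, 1])
    delta = delta * np.sqrt((1.0 - fbar**2) * (1.0 - bbar**2)) + bbar * fbar
    # Eqs. (8) and (13): order-updates of the normalized prediction errors
    den_f = max(np.sqrt((1.0 - delta**2) * (1.0 - bbar**2)), 1e-12)
    den_b = max(np.sqrt((1.0 - delta**2) * (1.0 - fbar**2)), 1e-12)
    f1 = (fbar - delta * bbar) / den_f
    b1 = (bbar - delta * fbar) / den_b
```

For this AR(1) input, the normalized Delta settles near the lag-one correlation coefficient of the process, and the square-root arguments stay nonnegative because every normalized quantity is bounded by one in magnitude.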


CHAPTER 13

13.1

The analog (infinite-precision) form of the LMS algorithm is described by

ŵ(n+1) = ŵ(n) + µ u(n) [ d(n) - ŵ^T(n) u(n) ],   n = 0, 1, 2, …

where ŵ(n) is the tap-weight vector estimate at time n, u(n) is the tap-input vector, d(n) is the desired response, and µ is the step-size parameter. The digital (finite-precision) counterpart of this update may be expressed as

ŵ_q(n+1) = ŵ_q(n) + Q[ µ u_q(n) e_q(n) ]

where

e(n) = d(n) - ŵ^T(n) u(n)

and the use of subscript q signifies the use of finite-precision arithmetic. Let

Q[ µ u_q(n) e_q(n) ] = µ u_q(n) e_q(n) + v(n)

where the quantizing noise vector v(n) is determined by the manner in which the term u_q(n)e_q(n) is computed. Hence,

ŵ_q(n+1) = ŵ_q(n) + µ u_q(n) e_q(n) + v(n)        (1)

The quantized value of the estimation error e(n) may be expressed as

e_q(n) = e(n) - ∆u^T(n) ŵ(n) - u^T(n) ∆ŵ(n) + ζ(n)

where ζ(n) denotes a residual error. The quantized value of u(n) equals

u_q(n) = u(n) + ∆u(n)

Hence, we may express u_q(n)e_q(n), ignoring second-order effects, as follows:

u_q(n) e_q(n) = u(n) e(n) + ∆u(n) e(n) - u(n) ∆u^T(n) ŵ(n) - u(n) u^T(n) ∆ŵ(n) + u(n) ζ(n)

We may therefore rewrite Eq. (1) as

ŵ_q(n+1) = ŵ_q(n) + µ u(n) e(n) + µ ∆u(n) e(n) - µ u(n) ∆u^T(n) ŵ(n) - µ u(n) u^T(n) ∆ŵ(n) + µ u(n) ζ(n) + v(n)        (2)

But

ŵ_q(n+1) = ŵ(n+1) + ∆ŵ(n+1)
ŵ_q(n) = ŵ(n) + ∆ŵ(n)
ŵ(n+1) = ŵ(n) + µ u(n) e(n)

Hence, from Eq. (2) we deduce that

∆ŵ(n+1) = F(n) ∆ŵ(n) + t(n)

where

F(n) = I - µ u(n) u^T(n)

t(n) = µ ∆u(n) e(n) - µ u(n) ŵ^T(n) ∆u(n) + µ u(n) ζ(n) + v(n)

Note that

∆u^T(n) ŵ(n) = ŵ^T(n) ∆u(n)

13.2

Building on the solution to Problem 13.1, assume that ∆w ( n ) is stationary. Then the expectation of ∆w ( n ) is zero because the expectation of t(n) is zero.
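The deviation ∆ŵ(n) analyzed in Problems 13.1 and 13.2 can be observed directly by running the analog and digital LMS recursions side by side. The 12-bit rounding model, step size, and test channel below are assumptions for illustration.

```python
import numpy as np

def quantize(x, bits=12):
    # fixed-point model (assumed): round to the nearest multiple of 2**-bits
    q = 2.0 ** (-bits)
    return np.round(x / q) * q

rng = np.random.default_rng(2)
M, N, mu = 4, 5000, 0.05
w_true = np.array([0.5, -0.3, 0.2, 0.1])   # hypothetical channel
u = rng.standard_normal(N + M)

w = np.zeros(M)    # infinite-precision weights w_hat(n)
wq = np.zeros(M)   # finite-precision weights w_hat_q(n)
for n in range(M, N + M):
    x = u[n - M + 1:n + 1][::-1]                   # tap-input vector u(n)
    d = w_true @ x + 0.01 * rng.standard_normal()  # desired response
    w = w + mu * x * (d - w @ x)                   # analog LMS update
    wq = wq + quantize(mu * x * (d - wq @ x))      # digital LMS: quantized correction

dev = np.linalg.norm(wq - w)                       # norm of Delta w_hat(n)
```

Consistent with Problem 13.2, the weight deviation does not drift away: the contraction factor F(n) = I - µu(n)u^T(n) keeps ∆ŵ(n) bounded near the quantization level.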

13.3

We note that

y_I(n) = Σ_i w_i u(n-i)

y_II(n) = Σ_i w_iq u(n-i)

Hence,

ε(n) = y_I(n) - y_II(n) = Σ_i ( w_i - w_iq ) u(n-i)

The mean-square value of ε(n) is

E[ε^2(n)] = E[ Σ_i Σ_j ( w_i - w_iq )( w_j - w_jq ) u(n-i) u(n-j) ]
          = Σ_i Σ_j ( w_i - w_iq )( w_j - w_jq ) E[ u(n-i) u(n-j) ]        (1)

Assuming that

E[ u(n-i) u(n-j) ] = A_rms^2 for j = i, and 0 for j ≠ i,

we may simplify Eq. (1) as

E[ε^2(n)] = A_rms^2 Σ_i ( w_i - w_iq )^2
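The formula E[ε^2(n)] = A_rms^2 Σ_i (w_i - w_iq)^2 is easy to confirm by simulation with a white ±A_rms input, for which E[u(n-i)u(n-j)] = A_rms^2 δ_ij holds exactly. The filter length and bit depth below are assumptions matching Problem 13.4.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 17
w = rng.standard_normal(M)
wq = np.round(w * 2**12) / 2**12   # 12-bit coefficient quantization

A_rms = 1.0
u = A_rms * rng.choice([-1.0, 1.0], size=200_000)   # white +/- A_rms input

eps = np.convolve(u, w - wq, mode="valid")   # eps(n) = sum_i (w_i - w_iq) u(n-i)
empirical = np.mean(eps ** 2)
predicted = A_rms ** 2 * np.sum((w - wq) ** 2)
```

The empirical mean-square error agrees with the closed-form prediction to within sampling fluctuations.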

13.4

(a) The digital residual error is

e_D(n) = LSB / ( µ A_rms )

With 12-bit quantization, the least significant bit is

LSB = 2^{-12} ≈ 0.25 × 10^{-3}

We are given

µ = 0.07
A_rms = 1

Hence, the digital residual error is

e_D(n) = 2^{-12} / ( 0.07 × 1 ) ≈ 0.25 × 10^{-3} / 0.07 ≈ 0.35 × 10^{-2}

(b) The rms quantization error is

(QE)_rms = ( E[ε^2(n)] )^{1/2} = A_rms ( Σ_{i=0}^{M-1} ( w_i - w_iq )^2 )^{1/2} ≤ A_rms M^{1/2} ( LSB )

For the problem at hand, with M = 17, we have

(QE)_rms ≈ 17^{1/2} × 0.25 × 10^{-3} ≈ 10^{-3}

We thus see that the digital residual error is about 3.5 times worse than (QE)_rms.

13.5

We start with

1/κ_q(n) = 1/( λ + π_q(n) u(n) )
         ≈ [ 1/( λ + π(n) u(n) ) ] [ 1 - η_π(n) u(n)/( λ + π(n) u(n) ) ]        (1)

where it is noted that

π_q(n) = π(n) + η_π(n)

Next, we note that

k_q(n) = P_q(n-1) u(n) / κ_q(n)        (2)

Let

P_q(n-1) = P(n-1) + η_P(n-1)        (3)

Therefore, using Eqs. (1) and (3) in (2):

η_k(n) = k_q(n) - k(n)
       = [ ( P(n-1) + η_P(n-1) ) u(n) ] [ 1/( λ + π(n)u(n) ) ] [ 1 - η_π(n)u(n)/( λ + π(n)u(n) ) ] - k(n)
       ≈ P(n-1)u(n)/( λ + π(n)u(n) ) + η_P(n-1)u(n)/( λ + π(n)u(n) ) - P(n-1)u(n) η_π(n)u(n)/( λ + π(n)u(n) )^2 - k(n)
       = η_P(n-1)u(n)/( λ + π(n)u(n) ) - P(n-1)u(n) η_π(n)u(n)/( λ + π(n)u(n) )^2

We next calculate (using the relation η_π(n) = u^H(n) η_P(n-1))

η_{P′}(n) = η_k(n) π(n) + k(n) η_π(n)
          ≈ η_P(n-1)u(n)π(n)/( λ + π(n)u(n) ) - P(n-1)u(n) η_π(n)u(n)π(n)/( λ + π(n)u(n) )^2 + P(n-1)u(n) η_π(n)/( λ + π(n)u(n) )
          = η_P(n-1)u(n)π(n)/( λ + π(n)u(n) ) - P(n-1)u(n)u^H(n)η_P(n-1)u(n)π(n)/( λ + π(n)u(n) )^2 + P(n-1)u(n)u^H(n)η_P(n-1)/( λ + π(n)u(n) )

Finally, we calculate

η_P(n) = (1/λ) ( η_P(n-1) - η_{P′}(n) )
       = (1/λ) { η_P(n-1) - η_P(n-1)u(n)π(n)/( λ + π(n)u(n) ) + P(n-1)u(n)u^H(n)η_P(n-1)u(n)π(n)/( λ + π(n)u(n) )^2 - P(n-1)u(n)u^H(n)η_P(n-1)/( λ + π(n)u(n) ) }

We now note that

P(n-1)u(n)/( λ + π(n)u(n) ) = k(n)

π(n) = u^H(n) P(n-1)

Hence, assuming that P(n-1) is Hermitian:

η_P(n) = (1/λ) [ η_P(n-1) - η_P(n-1)u(n)k^H(n) + k(n)u^H(n)η_P(n-1)u(n)k^H(n) - k(n)u^H(n)η_P(n-1) ]
       = (1/λ) ( I - k(n)u^H(n) ) η_P(n-1) - (1/λ) ( I - k(n)u^H(n) ) η_P(n-1) u(n) k^H(n)
       = (1/λ) ( I - k(n)u^H(n) ) η_P(n-1) ( I - k(n)u^H(n) )^H

From this result we readily see that η_P^H(n) = η_P(n) whenever η_P^H(n-1) = η_P(n-1), which demonstrates the symmetry-preserving property of the RLS algorithm summarized in Table 13.1.

13.6

The condition for persistent excitation (assuming real data) may be expressed as

aI ≤ Σ_{i=n_0}^{n} λ^{n-i} u(i) u^T(i) ≤ bI

where a and b are both positive numbers. Premultiplying by z^T and postmultiplying by z:

a z^T z ≤ Σ_{i=n_0}^{n} λ^{n-i} z^T u(i) u^T(i) z ≤ b z^T z

We now recognize that

z^T u(i) = u^T(i) z

Hence,

a z^T z ≤ Σ_{i=n_0}^{n} λ^{n-i} | z^T u(i) |^2 ≤ b z^T z

For a nonzero vector z, this condition requires that we have

| z^T u(i) | > α   for n_0 ≤ i ≤ n

This is another way of defining the condition for persistent excitation.

13.7

We start with the matrix relation

[ λ^{1/2} Φ^{1/2}(n-1)     u(n)
  λ^{1/2} p^H(n-1)         d(n)
  λ^{-1/2} Φ^{-H/2}(n-1)   0
  0^T                      1    ] Θ(n) = [ Φ^{1/2}(n)          0
                                           p^H(n)              ξ(n) γ^{1/2}(n)
                                           Φ^{-H/2}(n)         -k(n) γ^{-1/2}(n)
                                           u^H(n) Φ^{-H/2}(n)  γ^{1/2}(n)        ]        (1)

where Θ(n) is a unitary rotation. Let

X(n) = Φ^{-H/2}(n) + η_x(n)        (2)

where η_x(n) represents the effect of round-off errors. Assuming that no additional local errors are introduced at time n, the recursion pertaining to the third rows of the prearray and postarray of Eq. (1) takes on the following form:

[ λ^{-1/2} X(n-1),  0 ] Θ(n) = [ X(n),  -y(n) ]        (3)

The vector y(n) is the quantized version of k(n)γ^{-1/2}(n):

y(n) = k(n) γ^{-1/2}(n) + η_y(n)        (4)

where k(n) is the gain vector, γ(n) is the conversion factor, and η_y(n) is the round-off error. Substituting Eqs. (2) and (4) into (3), we may express the quantized version of these rows of Eq. (1) as follows:

[ λ^{-1/2} Φ^{-H/2}(n-1) + λ^{-1/2} η_x(n-1),  0 ] Θ(n) = [ Φ^{-H/2}(n) + η_x(n),  -k(n)γ^{-1/2}(n) - η_y(n) ]        (5)

Under infinite-precision arithmetic, we have from the corresponding rows of Eq. (1):

[ λ^{-1/2} Φ^{-H/2}(n-1),  0 ] Θ(n) = [ Φ^{-H/2}(n),  -k(n)γ^{-1/2}(n) ]        (6)

Hence, comparing Eqs. (5) and (6), we infer

[ λ^{-1/2} η_x(n-1),  0 ] Θ(n) = [ η_x(n),  -η_y(n) ]        (7)

Equation (7) reveals that the error propagation due to η_x(n-1) is NOT necessarily stable, in that the local errors tend to grow unboundedly. The unlimited error growth is due to (1) the amplification produced by the factor λ^{-1/2} for λ < 1, and (2) the fact that the unitary rotation Θ(n) is independent of the error η_x. Consequently, as the recursion progresses, the stored values of Φ^{-H/2} and Φ^{1/2} deviate more and more from each other's Hermitian transpose, thereby contradicting the very premise on which the extended QR-RLS algorithm is based.

13.8

We start with the relations (see Problem 12.5):

Φ_{M+1}^{-1}(n) = [ Φ_M^{-1}(n)  0_M
                    0_M^T         0   ] + ( 1/B_M(n) ) c_M(n) c_M^H(n)

u_{M+1}(n) = [ u_M(n)
               u(n-M) ]

Hence,

k_{M+1}(n) = Φ_{M+1}^{-1}(n) u_{M+1}(n)
           = [ k_M(n)
               0      ] + ( b_M(n)/B_M(n) ) c_M(n)        (1)

Let k_{M+1,M+1}(n) denote the last element of the gain vector k_{M+1}(n). Then, recognizing that the last element of c_M(n) is unity by definition, we immediately deduce from Eq. (1) that

k_{M+1,M+1}(n) = b_M(n) / B_M(n)

Normalizing with respect to γ_{M+1}(n), we write

k̃_{M+1,M+1}(n) = k_{M+1,M+1}(n) / γ_{M+1}(n) = b_M(n) / ( γ_{M+1}(n) B_M(n) )        (2)

Next, we use the relation (see part (d) of the solution to Problem 12.6)

γ_{M+1}(n) = λ [ B_M(n-1)/B_M(n) ] γ_M(n)        (3)

But

B_M(n) = λ B_M(n-1) + b*_M(n) β_M(n)

or

λ B_M(n-1) / B_M(n) = 1 - b*_M(n) β_M(n) / B_M(n)

We may therefore rewrite Eq. (3) in the equivalent form:

γ_M(n) = [ 1 - b*_M(n) β_M(n)/B_M(n) ]^{-1} γ_{M+1}(n)

The rescue variable is thus defined by

R = γ_{M+1}(n)/γ_M(n) = 1 - b*_M(n) β_M(n)/B_M(n)        (4)

Ideally, we have 0 < R < 1. Eliminating b_M(n)/B_M(n) between Eqs. (2) and (4), we may express the rescue variable in the equivalent form:

R = 1 - γ_{M+1}(n) k̃_{M+1,M+1}(n) β_M(n)


CHAPTER 14

14.1

In an adaptive equalizer, the input signal equals the channel output and the desired response equals the channel input (i.e., transmitted signal). In a stationary environment, both of these signals are stationary with the result that the error-performance surface is fixed in all respects. On the other hand, in a nonstationary environment, the channel output (i.e., equalizer input) is nonstationary with the result that both the correlation matrix R of the input vector and the cross-correlation vector p between the input vector and desired response take on time-varying forms. Consequently, the error-performance surface is continually changing its shape and is also in a constant state of motion.

14.2

In adaptive prediction applied to a nonstationary process, both the input vector (defined by a set of past values of the process) and the desired response (defined by the present value of the process) are nonstationary. Accordingly, in such a case the error-performance surface behaves in a manner similar to that described for adaptive equalization in Problem 14.1. Specifically, the error-performance surface constantly changes its shape and constantly moves. In contrast, the error-performance surface for the adaptive prediction of a stationary process is completely fixed.

14.3

We have, by definition,

\epsilon_1(n) = \hat{w}(n) - E[\hat{w}(n)]
\epsilon_2(n) = E[\hat{w}(n)] - w_o

We may therefore expand

E[\epsilon_1^H(n)\epsilon_2(n)] = E[(\hat{w}(n) - E[\hat{w}(n)])^H (E[\hat{w}(n)] - w_o)]
  = E[\hat{w}^H(n)E[\hat{w}(n)] - E[\hat{w}^H(n)]E[\hat{w}(n)] - \hat{w}^H(n)w_o + E[\hat{w}^H(n)]w_o]
  = -E[\hat{w}^H(n)w_o] + E[\hat{w}^H(n)]E[w_o]

Invoking the assumption that \hat{w}(n) and w_o are statistically independent, we may go on to write

E[\epsilon_1^H(n)\epsilon_2(n)] = -E[\hat{w}^H(n)]E[w_o] + E[\hat{w}^H(n)]E[w_o] = 0    (1)

From this result we immediately deduce that we also have

E[\epsilon_2^H(n)\epsilon_1(n)] = 0    (2)

Finally, we note that

E[\|\epsilon(n)\|^2] = E[\epsilon^H(n)\epsilon(n)]
  = E[(\epsilon_1(n) + \epsilon_2(n))^H(\epsilon_1(n) + \epsilon_2(n))]
  = E[\|\epsilon_1(n)\|^2] + E[\epsilon_1^H(n)\epsilon_2(n)] + E[\epsilon_2^H(n)\epsilon_1(n)] + E[\|\epsilon_2(n)\|^2]
  = E[\|\epsilon_1(n)\|^2] + E[\|\epsilon_2(n)\|^2]

where in the last line we have made use of Eqs. (1) and (2).

14.4

Invoking the low-pass filtering action of the LMS filter for small \mu, we note that \epsilon_1(n) and \epsilon_2(n) are both independent of the input vector u(n). We may therefore write:

1. E[\epsilon_1^H(n)u(n)u^H(n)\epsilon_1(n)] = tr\{E[\epsilon_1^H(n)u(n)u^H(n)\epsilon_1(n)]\}
     = E\{tr[\epsilon_1^H(n)u(n)u^H(n)\epsilon_1(n)]\}
     = E\{tr[u(n)u^H(n)\epsilon_1(n)\epsilon_1^H(n)]\}
     = tr\{E[u(n)u^H(n)\epsilon_1(n)\epsilon_1^H(n)]\}
     = tr\{E[u(n)u^H(n)]E[\epsilon_1(n)\epsilon_1^H(n)]\}
     = tr[R K_1(n)]

2. Similarly, we may show that

   E[\epsilon_2^H(n)u(n)u^H(n)\epsilon_2(n)] = tr[R K_2(n)]

3. E[\epsilon_1^H(n)u(n)u^H(n)\epsilon_2(n)] = tr\{E[\epsilon_1^H(n)u(n)u^H(n)\epsilon_2(n)]\}
     = tr\{E[u(n)u^H(n)]E[\epsilon_2(n)\epsilon_1^H(n)]\}
     = tr\{R\,E[\epsilon_2(n)\epsilon_1^H(n)]\}    (1)

   Next we note that

   E[\epsilon_2(n)\epsilon_1^H(n)] = 0

   It follows therefore that

   E[\epsilon_1^H(n)u(n)u^H(n)\epsilon_2(n)] = 0

   Similarly, we may show that

   E[\epsilon_2^H(n)u(n)u^H(n)\epsilon_1(n)] = 0

14.5

Evaluating the mean-square values of both sides of Eq. (14.27) yields

E[|v_k(n+1)|^2] = (1 - \mu\lambda_k)^2 E[|v_k(n)|^2] + E[|\phi_k(n)|^2]
               = (1 - 2\mu\lambda_k + \mu^2\lambda_k^2)E[|v_k(n)|^2] + \sigma_{\phi,k}^2    (1)

For small \mu, we may ignore the term \mu^2\lambda_k^2 in comparison to the unity term, and so approximate Eq. (1) as

E[|v_k(n+1)|^2] \approx (1 - 2\mu\lambda_k)E[|v_k(n)|^2] + \sigma_{\phi,k}^2    (2)

Under steady-state conditions, v_k(n+1) \to v_k(n) as n \to \infty, in which case Eq. (2) reduces further to

2\mu\lambda_k E[|v_k(n)|^2] \approx \sigma_{\phi,k}^2 \approx \mu^2\sigma_\nu^2\lambda_{u,k} + \lambda_{\omega,k}

Solving for E[|v_k(n)|^2], we get

E[|v_k(n)|^2] \approx \frac{\sigma_\nu^2}{2}\mu + \frac{\lambda_{\omega,k}}{2\mu\lambda_{u,k}}    (3)

Hence, the mean-square deviation is (see Eq. (14.31))

D(n) = E[\|\epsilon_0(n)\|^2]
     = \sum_{k=1}^{M} E[|v_k(n)|^2]
     = \frac{M}{2}\sigma_\nu^2\mu + \frac{1}{2\mu}\sum_{k=1}^{M}\frac{\lambda_{\omega,k}}{\lambda_{u,k}}
     = \frac{M}{2}\sigma_\nu^2\mu + \frac{1}{2\mu}tr[R_u^{-1}R_\omega]    (4)

where R_u = E[u(n)u^H(n)], R_\omega = E[\omega(n)\omega^H(n)], u(n) is the tap-input vector, and \omega(n) is the process noise.
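As a numerical sanity check on the small-step-size approximation behind Eq. (3), the sketch below (all constants are illustrative choices, not values from the text) compares the exact steady-state power of the first-order recursion v(n+1) = (1 - mu*lam) v(n) + phi(n) with the approximation sigma_phi^2/(2*mu*lam):

```python
# Check (illustrative, not from the manual) of the steady-state approximation:
# for v(n+1) = (1 - mu*lam) v(n) + phi(n), phi white with variance s2_phi,
# the exact steady-state power is s2_phi / (1 - (1 - mu*lam)^2); the small-mu
# approximation used in the solution above is s2_phi / (2*mu*lam).
def steady_state_power_exact(mu, lam, s2_phi):
    a = 1.0 - mu * lam
    return s2_phi / (1.0 - a * a)

def steady_state_power_approx(mu, lam, s2_phi):
    return s2_phi / (2.0 * mu * lam)

if __name__ == "__main__":
    mu, lam, s2_phi = 1e-3, 0.5, 1e-4    # assumed illustrative values
    exact = steady_state_power_exact(mu, lam, s2_phi)
    approx = steady_state_power_approx(mu, lam, s2_phi)
    # relative error is of order mu*lam/2
    print(abs(exact - approx) / exact)
```

The two expressions agree to order \mu\lambda_k, which is what justifies dropping the \mu^2\lambda_k^2 term in Eq. (2).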

14.6

The misadjustment of the LMS algorithm is given by (see Eqs. (5.91) and (14.36))

M \approx \frac{1}{\sigma_\nu^2}\sum_{k=1}^{M}\lambda_{u,k}E[|v_k(n)|^2]    (1)

where M is the filter length. From the solution to Problem 14.5, we have

E[|v_k(n)|^2] \approx \frac{\mu}{2}\sigma_\nu^2 + \frac{1}{2\mu}\frac{\lambda_{\omega,k}}{\lambda_{u,k}}    (2)

Substituting Eq. (2) into (1), we get

M \approx \frac{1}{\sigma_\nu^2}\sum_{k=1}^{M}\lambda_{u,k}\left(\frac{\mu}{2}\sigma_\nu^2 + \frac{1}{2\mu}\frac{\lambda_{\omega,k}}{\lambda_{u,k}}\right)
  = \frac{\mu}{2}\sum_{k=1}^{M}\lambda_{u,k} + \frac{1}{2\mu\sigma_\nu^2}\sum_{k=1}^{M}\lambda_{\omega,k}
  = \frac{\mu}{2}tr[R_u] + \frac{1}{2\mu\sigma_\nu^2}tr[R_\omega]    (3)

which is the desired result.

14.7

To simplify the presentation, we use the following notations in the solution to this problem: R_u = R and R_\omega = Q.

(a) The minimum misadjustment for the LMS algorithm is

M_{min}^{LMS} = \frac{1}{\sigma_\nu}(tr[R]\,tr[Q])^{1/2}    (1)

The corresponding value for the RLS algorithm is

M_{min}^{RLS} = \frac{1}{\sigma_\nu}(M\,tr[RQ])^{1/2}    (2)

For Q = c_1 R, Eqs. (1) and (2) yield the ratio:

\frac{M_{min}^{LMS}}{M_{min}^{RLS}} = \frac{tr[R]}{(M\,tr[R^2])^{1/2}}    (3)

Now

R = \sum_{i=1}^{M}\lambda_i q_i q_i^H
R^2 = \sum_{i=1}^{M}\lambda_i^2 q_i q_i^H

We may therefore write

tr[R] = \sum_{i=1}^{M}\lambda_i    (4)

tr[R^2] = \sum_{i=1}^{M}\lambda_i^2    (5)

Let

\lambda = [\lambda_1, \lambda_2, \ldots, \lambda_M]^T
1 = [1, 1, \ldots, 1]^T

We may then reformulate Eqs. (4) and (5) as follows, respectively:

tr[R] = \lambda^T 1
tr[R^2] = \lambda^T\lambda = \|\lambda\|^2

Applying the Cauchy-Schwarz inequality to the product \lambda^T 1:

|\lambda^T 1|^2 \le \|\lambda\|^2 \cdot \|1\|^2

Since \|1\|^2 = M, it follows that

(tr[R])^2 \le tr[R^2] \cdot M

or, equivalently,

\frac{tr[R]}{(M\,tr[R^2])^{1/2}} \le 1    (6)

Accordingly, we may rewrite Eq. (3) as

\frac{M_{min}^{LMS}}{M_{min}^{RLS}} \le 1

or

M_{min}^{LMS} \le M_{min}^{RLS},    Q = c_1 R
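The Cauchy-Schwarz bound in Eq. (6) can be spot-checked numerically; the sketch below (the test matrix is an arbitrary choice, not from the text) verifies (tr R)^2 <= M tr(R^2) for a symmetric positive semidefinite R:

```python
# Numeric sanity check (illustration, not from the manual) of the bound (6):
# (tr R)^2 <= M * tr(R^2) for any symmetric PSD M-by-M matrix R.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

if __name__ == "__main__":
    # An arbitrary matrix; R = A^T A is guaranteed positive semidefinite.
    A = [[1.0, 2.0, 0.5],
         [0.0, 1.5, -1.0],
         [2.0, 0.3, 0.7]]
    R = matmul(transpose(A), A)
    lhs = trace(R) ** 2
    rhs = len(R) * trace(matmul(R, R))
    print(lhs <= rhs)   # the bound holds; equality only for equal eigenvalues
```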

(b) Consider next the minimum mean-square deviation as the criterion of interest. For the LMS algorithm, we have

D_{min}^{LMS} = \sigma_\nu(M\,tr[R^{-1}Q])^{1/2}

and for the RLS algorithm:

D_{min}^{RLS} = \sigma_\nu(tr[R^{-1}]\,tr[Q])^{1/2}

Therefore,

\frac{D_{min}^{LMS}}{D_{min}^{RLS}} = \left(\frac{M\,tr[R^{-1}Q]}{tr[R^{-1}]\,tr[Q]}\right)^{1/2}

For Q = c_2 R^{-1},

\frac{D_{min}^{LMS}}{D_{min}^{RLS}} = \frac{(M\,tr[R^{-2}])^{1/2}}{tr[R^{-1}]}    (7)

Since

R^{-1} = \sum_{i=1}^{M}\lambda_i^{-1} q_i q_i^H

and

R^{-2} = \sum_{i=1}^{M}\lambda_i^{-2} q_i q_i^H,

it follows that

tr[R^{-1}] = \sum_{i=1}^{M}\lambda_i^{-1}

and

tr[R^{-2}] = \sum_{i=1}^{M}\lambda_i^{-2}

Let

\lambda_{inv} = [\lambda_1^{-1}, \lambda_2^{-1}, \ldots, \lambda_M^{-1}]^T
1 = [1, 1, \ldots, 1]^T

Hence,

tr[R^{-1}] = \lambda_{inv}^T 1
tr[R^{-2}] = \lambda_{inv}^T\lambda_{inv} = \|\lambda_{inv}\|^2

Applying the Cauchy-Schwarz inequality to the product \lambda_{inv}^T 1, we may write

|\lambda_{inv}^T 1|^2 \le \|\lambda_{inv}\|^2 \cdot \|1\|^2

That is,

(tr[R^{-1}])^2 \le tr[R^{-2}] \cdot M

or, equivalently,

\frac{(M\,tr[R^{-2}])^{1/2}}{tr[R^{-1}]} \ge 1    (8)

Accordingly, we may rewrite Eq. (7) as

\frac{D_{min}^{LMS}}{D_{min}^{RLS}} \ge 1

That is,

D_{min}^{LMS} \ge D_{min}^{RLS},    Q = c_2 R^{-1}

14.8

As with Problem 14.7, here again we simplify the presentation by using the notations R_u = R and R_\omega = Q.

(a) LMS Algorithm

For Q = c_1 R, or equivalently, R^{-1}Q = c_1 I:

(i)  D_{min}^{LMS} = \sigma_\nu(M\,tr[R^{-1}Q])^{1/2} = \sigma_\nu(M\,tr[c_1 I])^{1/2} = \sigma_\nu M\sqrt{c_1}    (1)

     \mu_{opt} = \frac{1}{\sigma_\nu}\left(\frac{tr[R^{-1}Q]}{M}\right)^{1/2} = \frac{\sqrt{c_1}}{\sigma_\nu}    (2)

(ii) M_{min}^{LMS} = \frac{1}{\sigma_\nu}(tr[R]\,tr[Q])^{1/2} = \frac{\sqrt{c_1}}{\sigma_\nu}tr[R]    (3)

Given the two-by-two correlation matrix

R = \begin{bmatrix} r_{11} & r_{21} \\ r_{21} & r_{22} \end{bmatrix}

we may write

tr[R] = r_{11} + r_{22}    (4)

Therefore, substituting Eq. (4) into (3):

M_{min}^{LMS} = \frac{\sqrt{c_1}(r_{11} + r_{22})}{\sigma_\nu}    (5)

The optimum step-size parameter is

\mu_{opt} = \frac{1}{\sigma_\nu}\left(\frac{tr[Q]}{tr[R]}\right)^{1/2} = \frac{\sqrt{c_1}}{\sigma_\nu}    (6)

which is the same as the \mu_{opt} for minimum D^{LMS}.

Consider next the case of Q = c_2 R^{-1}:

(iii) D_{min}^{LMS} = \sigma_\nu(M\,tr[R^{-1}Q])^{1/2} = \sigma_\nu(2c_2\,tr[R^{-2}])^{1/2}    (7)

With

R^{-1} = \frac{1}{\Delta_r}\begin{bmatrix} r_{22} & -r_{21} \\ -r_{21} & r_{11} \end{bmatrix},    \Delta_r = r_{11}r_{22} - r_{21}^2

R^{-2} = R^{-1}R^{-1} = \frac{1}{\Delta_r^2}\begin{bmatrix} r_{22}^2 + r_{21}^2 & -r_{21}(r_{11} + r_{22}) \\ -r_{21}(r_{11} + r_{22}) & r_{11}^2 + r_{21}^2 \end{bmatrix}

we have

tr[R^{-2}] = \frac{1}{\Delta_r^2}(r_{11}^2 + 2r_{21}^2 + r_{22}^2)    (8)

Substituting Eq. (8) into (7), and noting that M = 2:

D_{min}^{LMS} = \sigma_\nu\sqrt{2c_2}\,\frac{(r_{11}^2 + 2r_{21}^2 + r_{22}^2)^{1/2}}{r_{11}r_{22} - r_{21}^2}    (9)

The optimum step-size parameter is

\mu_{opt} = \frac{1}{\sigma_\nu}\left(\frac{tr[R^{-1}Q]}{M}\right)^{1/2} = \frac{1}{\sigma_\nu}\left(\frac{c_2}{2}tr[R^{-2}]\right)^{1/2}
          = \frac{1}{\sigma_\nu}\sqrt{\frac{c_2}{2}}\,\frac{(r_{11}^2 + 2r_{21}^2 + r_{22}^2)^{1/2}}{r_{11}r_{22} - r_{21}^2}    (10)

(iv) M_{min}^{LMS} = \frac{1}{\sigma_\nu}(tr[R]\,tr[Q])^{1/2} = \frac{\sqrt{c_2}}{\sigma_\nu}(tr[R]\,tr[R^{-1}])^{1/2}
                   = \frac{\sqrt{c_2}}{\sigma_\nu}\,\frac{r_{11} + r_{22}}{(r_{11}r_{22} - r_{21}^2)^{1/2}}    (11)

\mu_{opt} = \frac{1}{\sigma_\nu}\left(\frac{tr[Q]}{tr[R]}\right)^{1/2} = \frac{\sqrt{c_2}}{\sigma_\nu}\left(\frac{tr[R^{-1}]}{tr[R]}\right)^{1/2} = \frac{\sqrt{c_2}}{\sigma_\nu}(r_{11}r_{22} - r_{21}^2)^{-1/2}    (12)

which is different from the \mu_{opt} for minimum D^{LMS}.

(b) RLS Algorithm

For Q = c_1 R:

(i)  D_{min}^{RLS} = \sigma_\nu(tr[R^{-1}]\,tr[Q])^{1/2} = \sigma_\nu\sqrt{c_1}(tr[R^{-1}]\,tr[R])^{1/2}
                   = \sigma_\nu\sqrt{c_1}\,\frac{r_{11} + r_{22}}{(r_{11}r_{22} - r_{21}^2)^{1/2}}    (13)

     \lambda_{opt} = 1 - \frac{1}{\sigma_\nu}\left(\frac{tr[Q]}{tr[R^{-1}]}\right)^{1/2} = 1 - \frac{\sqrt{c_1}}{\sigma_\nu}\sqrt{r_{11}r_{22} - r_{21}^2}    (14)

(ii) M_{min}^{RLS} = \frac{1}{\sigma_\nu}(M\,tr[RQ])^{1/2} = \frac{\sqrt{c_1}}{\sigma_\nu}(2\,tr[R^2])^{1/2},    M = 2
                   = \frac{\sqrt{2c_1}}{\sigma_\nu}\sqrt{r_{11}^2 + 2r_{21}^2 + r_{22}^2}    (15)

     \lambda_{opt} = 1 - \frac{1}{\sigma_\nu}\left(\frac{tr[RQ]}{M}\right)^{1/2},    M = 2
                   = 1 - \frac{1}{\sigma_\nu}\sqrt{\frac{c_1}{2}}\sqrt{r_{11}^2 + 2r_{21}^2 + r_{22}^2}    (16)

which is different from the \lambda_{opt} for minimum D^{RLS}.

Consider next the case of Q = c_2 R^{-1}, or equivalently, RQ = c_2 I:

(iii) D_{min}^{RLS} = \sigma_\nu(tr[R^{-1}]\,tr[Q])^{1/2} = \sigma_\nu\sqrt{c_2}\,tr[R^{-1}] = \sigma_\nu\sqrt{c_2}\,\frac{r_{11} + r_{22}}{r_{11}r_{22} - r_{21}^2}    (17)

      \lambda_{opt} = 1 - \frac{1}{\sigma_\nu}\left(\frac{tr[Q]}{tr[R^{-1}]}\right)^{1/2} = 1 - \frac{\sqrt{c_2}}{\sigma_\nu}    (18)

(iv)  M_{min}^{RLS} = \frac{1}{\sigma_\nu}(M\,tr[RQ])^{1/2},    M = 2
                    = \frac{2\sqrt{c_2}}{\sigma_\nu}    (19)

      \lambda_{opt} = 1 - \frac{1}{\sigma_\nu}\left(\frac{1}{M}tr[c_2 I]\right)^{1/2},    M = 2
                    = 1 - \frac{\sqrt{c_2}}{\sigma_\nu}    (20)

which, in this special case, coincides with the \lambda_{opt} for minimum D_{min}^{RLS}, since tr[Q]/tr[R^{-1}] = tr[RQ]/M = c_2 when Q = c_2 R^{-1}.

Comparisons of the LMS and RLS algorithms:

1. Q = c_1 R:

\frac{D_{min}^{LMS}}{D_{min}^{RLS}} = \frac{2\sqrt{r_{11}r_{22} - r_{21}^2}}{r_{11} + r_{22}}

\frac{M_{min}^{LMS}}{M_{min}^{RLS}} = \frac{r_{11} + r_{22}}{\sqrt{2}\sqrt{r_{11}^2 + 2r_{21}^2 + r_{22}^2}}

2. Q = c_2 R^{-1}:

\frac{D_{min}^{LMS}}{D_{min}^{RLS}} = \frac{\sqrt{2}\sqrt{r_{11}^2 + 2r_{21}^2 + r_{22}^2}}{r_{11} + r_{22}}

\frac{M_{min}^{LMS}}{M_{min}^{RLS}} = \frac{r_{11} + r_{22}}{2\sqrt{r_{11}r_{22} - r_{21}^2}}
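The first comparison ratio above can be cross-checked numerically; the sketch below (matrix entries and constants are arbitrary test values, not from the text) verifies that the trace formulas and the closed-form ratio agree for Q = c_1 R, independently of c_1 and \sigma_\nu:

```python
# Cross-check (illustration, not from the manual) of the Q = c1*R case:
# D_LMS/D_RLS from the trace formulas should equal
# 2*sqrt(r11*r22 - r21^2)/(r11 + r22).
import math

r11, r21, r22 = 3.0, 0.8, 2.0           # any positive-definite 2x2 choice
c1, sigma = 0.05, 1.3                    # arbitrary constants
M = 2

tr_R = r11 + r22
det_R = r11 * r22 - r21 ** 2
tr_Rinv = tr_R / det_R                   # trace of R^{-1} for a 2x2 matrix
tr_RinvQ = c1 * M                        # Q = c1*R, so R^{-1}Q = c1*I

D_lms = sigma * math.sqrt(M * tr_RinvQ)
D_rls = sigma * math.sqrt(tr_Rinv * c1 * tr_R)

ratio_traces = D_lms / D_rls
ratio_closed = 2.0 * math.sqrt(det_R) / tr_R
print(abs(ratio_traces - ratio_closed) < 1e-12)
```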

CHAPTER 15

15.1

For the output error method,

y(n) = \sum_{i=0}^{M} a_i(n)u(n-i) + \sum_{i=1}^{N} b_i(n)y(n-i)    (1)

Taking the z-transform of both sides of Eq. (1):

Y(z) = A(z)U(z) + B(z)Y(z)    (2)

For the equation error method,

y'(n) = \sum_{i=0}^{M} a_i(n)u(n-i) + \sum_{i=1}^{N} b_i(n)d(n-i)
      = \sum_{i=0}^{M} a_i(n)u(n-i) + \sum_{i=1}^{N} b_i(n)[e'(n-i) + y'(n-i)]
      = \sum_{i=0}^{M} a_i(n)u(n-i) + \sum_{i=1}^{N} b_i(n)e'(n-i) + \sum_{i=1}^{N} b_i(n)y'(n-i)    (3)

Taking the z-transform of both sides of Eq. (3):

Y'(z) = A(z)U(z) + B(z)E'(z) + B(z)Y'(z)
      = A(z)U(z) + B(z)(1 - B(z))E(z) + B(z)Y'(z)

(1 - B(z))Y'(z) = A(z)U(z) + B(z)(1 - B(z))E(z)

Y'(z) = \frac{1}{1 - B(z)}A(z)U(z) + B(z)E(z)    (4)

which explains why the transfer function 1/(1 - B(z)) comes in.
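The distinction between the two error formulations can be illustrated in a few lines of code; the sketch below (coefficients and input are arbitrary test values, not from the text) shows that when the adaptive coefficients equal the true system coefficients, both the output error d(n) - y(n) and the equation error d(n) - y'(n) vanish, where y' feeds back past *desired* samples and y feeds back past *model outputs*:

```python
# Illustration (assumed toy setup, not from the manual) of the output-error
# and equation-error model outputs for matched coefficients.
a = [0.5, 0.2]          # feedforward coefficients a_0, a_1 (M = 1)
b = [0.3]               # feedback coefficient b_1 (N = 1)

def run(u):
    d, y, yp = [], [], []   # d: true output, y: output-error model, yp: equation-error model
    for n in range(len(u)):
        ff = sum(ai * (u[n - i] if n - i >= 0 else 0.0) for i, ai in enumerate(a))
        fb = lambda seq: sum(bi * (seq[n - 1 - i] if n - 1 - i >= 0 else 0.0)
                             for i, bi in enumerate(b))
        d.append(ff + fb(d))      # true IIR system output
        y.append(ff + fb(y))      # output-error model: recurses on its own output
        yp.append(ff + fb(d))     # equation-error model: recurses on desired samples
    return d, y, yp

if __name__ == "__main__":
    d, y, yp = run([1.0, -0.5, 0.25, 0.8, -1.0, 0.1])
    print(max(abs(dn - yn) for dn, yn in zip(d, y)),
          max(abs(dn - yn) for dn, yn in zip(d, yp)))   # → 0.0 0.0
```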

15.2

The approximations used in Eqs. (15.18) and (15.19), reproduced here for convenience of presentation,

\alpha_j(n) \approx \alpha_0(n-j),    j = 1, 2, \ldots, M
\beta_j(n) \approx \beta_1(n-j+1),    j = 2, \ldots, N

are based on the following observation: when the adaptive filtering algorithm reaches a convergent point, the parameters of the filter could be held constant, at which point the two relations become exact. Hence,

\alpha_j(n) = \frac{\partial y(n)}{\partial a_j(n)} \approx \frac{\partial y(n-j)}{\partial a_0(n-j)} = \alpha_0(n-j),    j = 1, 2, \ldots, M

\beta_j(n) = \frac{\partial y(n)}{\partial b_j(n)} \approx \frac{\partial y(n-j+1)}{\partial b_1(n-j+1)} = \beta_1(n-j+1),    j = 2, \ldots, N

15.3

To mitigate the stability problem in adaptive IIR filters, the following measures can be borne in mind in formulating the algorithm:

1. Use the equation error method in place of the output error method as a rough approximation.
2. Use a lattice structure for the IIR filter.1
3. Combine IIR and FIR structures to devise a hybrid filter, e.g., the Laguerre transversal filter.

15.4

LMS Algorithm for the Laguerre transversal filter:

Initialization: Initialize the weights w_0, w_1, \ldots, w_M by setting them equal to zero, or else by assigning them small random values.

Computation:

L(z, \alpha) = \frac{z^{-1} - \alpha}{1 - \alpha z^{-1}}

L_i(z, \alpha) = L_0(z, \alpha)(L(z, \alpha))^i,    i = 0, 1, 2, \ldots

U_i(z, \alpha) = L_i(z, \alpha)U(z),    i = 0, 1, 2, \ldots

y(n) = \sum_{i=0}^{M} w_i z^{-1}[U_i(z, \alpha)] = \sum_{i=0}^{M} w_i u_i(n, \alpha)

e(n) = d(n) - y(n)

Update:

w_i(n+1) = w_i(n) + \tilde{\mu} z^{-1}[U_i(z, \alpha)]e(n) = w_i(n) + \tilde{\mu} u_i(n, \alpha)e(n)

1. Miao, K.X., et al., IEEE Trans. Signal Processing, vol. 42, pp. 721-742, 1994.
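The algorithm above can be sketched in code. Note the assumptions: the first stage L_0(z, a) is taken here as the normalized low-pass sqrt(1 - a^2)/(1 - a z^{-1}), a common Laguerre convention that the solution does not spell out, and all signal values and constants below are illustrative:

```python
# Sketch (assumed conventions, not the manual's exact formulation) of the
# Laguerre transversal filter with an LMS tap-weight update.
import math, random

def laguerre_taps(u, order, alpha):
    """Tap-output vectors: low-pass first stage, then identical allpass sections
    L(z,a) = (z^-1 - a)/(1 - a z^-1)."""
    scale = math.sqrt(1.0 - alpha * alpha)
    x_prev = [0.0] * (order + 1)
    out = []
    for un in u:
        x = [0.0] * (order + 1)
        x[0] = alpha * x_prev[0] + scale * un
        for i in range(1, order + 1):
            # allpass recursion: x_i(n) = a x_i(n-1) + x_{i-1}(n-1) - a x_{i-1}(n)
            x[i] = alpha * x_prev[i] + x_prev[i - 1] - alpha * x[i - 1]
        out.append(x)
        x_prev = x
    return out

def laguerre_lms(u, d, order, alpha, mu):
    """One LMS pass over the data; returns the final weight vector."""
    w = [0.0] * (order + 1)
    for x, dn in zip(laguerre_taps(u, order, alpha), d):
        e = dn - sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
    return w

if __name__ == "__main__":
    rng = random.Random(0)
    u = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
    # Desired response synthesized to lie exactly in the model span.
    taps = laguerre_taps(u, 2, 0.4)
    d = [2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2] for x in taps]
    w = laguerre_lms(u, d, 2, 0.4, 0.05)
    print([round(wi, 2) for wi in w])   # → [2.0, -1.0, 0.5]
```

In the noiseless case the weights converge to the synthesizing coefficients, since the Laguerre tap signals of a white input are (approximately) orthogonal with equal power.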

15.5

In order to prove the shift-invariance property, we first need a lemma.

Lemma: Let y_1(n) and y_2(n) be two random signals obtained by linear time-invariant filtering of the (wide-sense) stationary signal x(n), and let L denote filtering by a transfer function L_1(z) with |L_1(z)| = 1 on the unit circle. Then

E\{L[y_1(n)](L[y_2(n)])^*\} = E\{y_1(n)y_2^*(n)\}

Proof: Let H_1(z) and H_2(z) be the filters used to get y_1(n) and y_2(n) from x(n). Then y_1(n) and y_2(n) are jointly wide-sense stationary, and

E\{y_1(n)y_2^*(n)\} = \frac{1}{2\pi j}\oint_{z=e^{j\omega}} H_1(z)H_2^*(z)S_x(z)\frac{dz}{z}

where S_x(e^{j\omega}) is the power spectrum of the input signal x(n). We may then write

E\{L[y_1(n)](L[y_2(n)])^*\} = \frac{1}{2\pi j}\oint_{z=e^{j\omega}} |L_1(z)|^2 H_1(z)H_2^*(z)S_x(z)\frac{dz}{z}
  = \frac{1}{2\pi j}\oint_{z=e^{j\omega}} H_1(z)H_2^*(z)S_x(z)\frac{dz}{z}    since |L_1(z)|^2 = 1
  = E\{y_1(n)y_2^*(n)\}

With this lemma at hand, we now examine \xi_j(n) and \xi_k(n), and assume j > k without loss of generality. Applying the lemma, we have

E\{\xi_j(n)\xi_k^*(n)\} = \frac{1}{2\pi}\int_{-\pi}^{\pi} L_j(e^{j\omega})L_k^*(e^{j\omega})S_x(e^{j\omega})\,d\omega

Since L_j(z) = L_0(z)\left(\frac{z^{-1} - a}{1 - az^{-1}}\right)^j and \left|\frac{z^{-1} - a}{1 - az^{-1}}\right| = 1 for |z| = 1, it follows that

E\{\xi_j(n)\xi_k^*(n)\} = \frac{1}{2\pi}\int_{-\pi}^{\pi} |L_0(e^{j\omega})|^2\left(\frac{e^{-j\omega} - a}{1 - ae^{-j\omega}}\right)^{j-k} S_x(e^{j\omega})\,d\omega
  = \frac{1}{2\pi}\int_{-\pi}^{\pi} L_{j-k}(e^{j\omega})L_0^*(e^{j\omega})S_x(e^{j\omega})\,d\omega
  = E\{\xi_{j-k}(n)\xi_0^*(n)\}

The above proof naturally holds for the real-valued situation as a special case:

E\{\xi_j(n)\xi_k(n)\} = E\{\xi_{j-k}(n)\xi_0(n)\}
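The property this proof leans on, that each Laguerre section is allpass on the unit circle, is easy to confirm numerically; the sketch below (an illustration, not from the text) evaluates |(e^{-j\omega} - a)/(1 - a e^{-j\omega})| on a grid of frequencies:

```python
# Numeric confirmation (illustration, not from the manual) that the Laguerre
# section (z^-1 - a)/(1 - a z^-1) has unit magnitude on the unit circle.
import cmath, math

def section_gain(omega, a):
    z_inv = cmath.exp(-1j * omega)
    return abs((z_inv - a) / (1.0 - a * z_inv))

if __name__ == "__main__":
    a = 0.6
    gains = [section_gain(2.0 * math.pi * k / 16, a) for k in range(16)]
    print(max(abs(g - 1.0) for g in gains))   # ~0 up to rounding
```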

15.6

Let

\xi(n) = [\xi_0(n), \xi_1(n), \ldots, \xi_M(n)]^T

From the solution to Problem 15.5, we know that \xi_j(n) satisfies the shift-invariance property

E[\xi_j(n)\xi_k(n)] = E[\xi_{j-k}(n)\xi_0(n)] = C_\xi(j - k)    (1)

Now examine the correlation matrix R = E[\xi(n)\xi^T(n)], which is an (M+1)-by-(M+1) matrix. For any element (j, k) of this matrix, Eq. (1) holds. For the (j+1, k+1) element,

E[\xi_{j+1}(n)\xi_{k+1}(n)] = E[\xi_{j-k}(n)\xi_0(n)] = E[\xi_j(n)\xi_k(n)] = C_\xi(j - k)

and, more generally, for the (j+m, k+m) element we have

E[\xi_{j+m}(n)\xi_{k+m}(n)] = E[\xi_j(n)\xi_k(n)]

Thus every diagonal of R is constant, and hence R is a Toeplitz matrix.

15.7

(a) The derivation of the algorithm summarized in Table 15.3 is based on the Burg formula for the reflection coefficients of a lattice predictor operating on real-valued data [see Eq. (2.7)]:

\hat{\kappa}_m(n) = -\frac{2\sum_{i=1}^{n} b_{m-1}(i-1)f_{m-1}(i)}{\sum_{i=1}^{n}(f_{m-1}^2(i) + b_{m-1}^2(i-1))} = -\frac{\Delta_m(n)}{D_m(n)}    (1)

where

\Delta_m(n) = 2\sum_{i=1}^{n} b_{m-1}(i-1)f_{m-1}(i)

D_m(n) = \sum_{i=1}^{n}(f_{m-1}^2(i) + b_{m-1}^2(i-1))

For recursive computation of the numerator \Delta_m(n) and denominator D_m(n), we proceed by writing

\Delta_m(n) = 2\sum_{i=1}^{n-1} b_{m-1}(i-1)f_{m-1}(i) + 2b_{m-1}(n-1)f_{m-1}(n)
            = \Delta_m(n-1) + 2b_{m-1}(n-1)f_{m-1}(n)

D_m(n) = \sum_{i=1}^{n-1}(f_{m-1}^2(i) + b_{m-1}^2(i-1)) + f_{m-1}^2(n) + b_{m-1}^2(n-1)
       = D_m(n-1) + f_{m-1}^2(n) + b_{m-1}^2(n-1)

To exercise designer control over these two recursive computations, we modify them respectively as follows:

\Delta_m(n) = \lambda\Delta_m(n-1) + 2b_{m-1}(n-1)f_{m-1}(n)    (2)

D_m(n) = \lambda D_m(n-1) + f_{m-1}^2(n) + b_{m-1}^2(n-1)    (3)

where \lambda is a design parameter, 0 < \lambda < 1. Finally, we apply Eq. (15.35) of the text in accordance with the Laguerre formulation of the lattice filter:

\tilde{b}_{m-1}(n) = b_{m-1}(n-1) + \alpha(\tilde{b}_{m-1}(n-1) - b_{m-1}(n))    (4)

where \alpha is another design parameter. Thus, using \tilde{b}_{m-1}(n) in place of b_{m-1}(n-1) in Eqs. (2) and (3), we get the entries in the first two lines of the time updates. All that remains is to use Eq. (1) to compute the reflection coefficient \hat{\kappa}_m(n); hence the third line of the time updates.

(b) Compared with the conventional GAL algorithm, the Laguerre-based lattice structure has faster initial convergence and lower computational complexity.
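The recursions (2)-(3) can be checked against the batch Burg sums in Eq. (1); the sketch below (the f and b sequences are arbitrary test values, not from the text) confirms that with \lambda = 1 the recursive and batch computations coincide exactly:

```python
# Check (illustration, not from the manual) that the exponentially weighted
# recursions (2)-(3) with lambda = 1 reproduce the batch Burg formula (1).
def burg_batch(f, b):
    num = 2.0 * sum(b[i - 1] * f[i] for i in range(1, len(f)))
    den = sum(f[i] ** 2 + b[i - 1] ** 2 for i in range(1, len(f)))
    return -num / den

def burg_recursive(f, b, lam=1.0):
    delta, D = 0.0, 0.0
    for i in range(1, len(f)):
        delta = lam * delta + 2.0 * b[i - 1] * f[i]
        D = lam * D + f[i] ** 2 + b[i - 1] ** 2
    return -delta / D

if __name__ == "__main__":
    f = [0.3, -0.7, 1.1, 0.2, -0.4, 0.9]    # forward prediction errors
    b = [0.5, 0.1, -0.6, 0.8, -0.2, 0.3]    # backward prediction errors
    print(abs(burg_batch(f, b) - burg_recursive(f, b)) < 1e-12)
```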


CHAPTER 16

16.1

(a) The received signal of a digital communication system in baseband form is given by

u(t) = \sum_{m=-\infty}^{\infty} x_m h(t - mT) + v(t)

where x_m is the transmitted symbol, h(t) is the overall impulse response, T is the symbol period, and v(t) is the channel noise. Evaluating u(t) at times t_1 and t_2:

u(t_1) = \sum_{m=-\infty}^{\infty} x_m h(t_1 - mT) + v(t_1)

u(t_2) = \sum_{l=-\infty}^{\infty} x_l h(t_2 - lT) + v(t_2)

Hence, the autocorrelation function of u(t) is

r_u(t_1, t_2) = E[u(t_1)u^*(t_2)]
  = E\left[\sum_{m=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} x_m x_l^* h(t_1 - mT)h^*(t_2 - lT)\right] + E[v(t_1)v^*(t_2)]
  = \sum_{m=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} r_x(mT - lT)h(t_1 - mT)h^*(t_2 - lT) + \sigma_v^2\delta(t_1 - t_2)    (1)

where r_x(mT - lT) is the autocorrelation function of the transmitted signal. From Eq. (1) we immediately see that

r_u(t_1 + T, t_2 + T) = r_u(t_1, t_2)

which shows that u(t) is indeed cyclostationary in the wide sense.

(b) Applying the definitions

r_u^\alpha(\tau) = \frac{1}{T}\int_{-T/2}^{T/2} r_u\left(t + \frac{\tau}{2}, t - \frac{\tau}{2}\right)e^{j2\pi\alpha t}\,dt

S_u^\alpha(\omega) = \int_{-\infty}^{\infty} r_u^\alpha(\tau)e^{-j2\pi f\tau}\,d\tau,    \omega = 2\pi f

\alpha = k/T,    k = 0, \pm 1, \pm 2, \ldots

to the result obtained in part (a), we may show that

S_u^{k/T}(\omega) = \frac{1}{T}H(e^{j\omega + jk\pi/T})H^*(e^{j\omega - jk\pi/T})S_x\left(\omega + \frac{k\pi}{T}\right) + \sigma_v^2\delta(k),    k = 0, \pm 1, \pm 2, \ldots    (2)

As a check, we see that for k = 0, Eq. (2) reduces to the standard result:

S_u(\omega) = \frac{1}{T}|H(e^{j\omega})|^2 S_x(\omega) + \sigma_v^2

Let \Psi_k(\omega) denote the phase response of S_u^{k/T} and \Phi(\omega) denote the phase response of the channel. Then, recognizing that the power spectral density S_x(\omega) of the transmitted signal is real-valued, we readily find from Eq. (2) that

\Psi_k(\omega) = \Phi\left(\omega + \frac{k\pi}{T}\right) - \Phi\left(\omega - \frac{k\pi}{T}\right),    k = 0, \pm 1, \pm 2, \ldots    (3)

(c) From the formula for the inverse Fourier transform, we have

\psi_k(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Psi_k(\omega)e^{j\omega\tau}\,d\omega

\phi(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi(\omega)e^{j\omega\tau}\,d\omega

Applying these definitions to Eq. (3):

\psi_k(\tau) = \phi(\tau)e^{-j\pi k\tau/T} - \phi(\tau)e^{j\pi k\tau/T}
             = -2j\phi(\tau)\sin\left(\frac{k\pi\tau}{T}\right),    k = 0, \pm 1, \pm 2, \ldots    (4)

On the basis of Eq. (4), we may make two important observations:

(1) For k = 0, or for k\tau/T equal to an integer, \psi_k(\tau) is identically zero; for these values, \phi(\tau) cannot be determined. This means that, for an arbitrary channel, the unknown phase response \Phi(\omega) cannot be identified at those points by using cyclostationary second-order statistics of the channel output.

(2) For |k| = 2 and higher, the use of \psi_k(\tau) does not reveal any more information about the channel phase response than what can be obtained with |k| = 1. We may therefore just as well work with k = 1, for which \psi_k(\tau) has the largest support, as shown by

\psi_1(\tau) = -2j\phi(\tau)\sin\left(\frac{\pi\tau}{T}\right)

That is,

\phi(\tau) = \frac{j\psi_1(\tau)}{2\sin(\pi\tau/T)}

which shows that \phi(\tau) is identifiable from \psi_1(\tau) except at \tau = mT, where m is an integer.

16.2

In the noise-free case, we have

u_n = H x_n    (1)

where H is the LN-by-(M+N) multichannel filtering matrix, x_n is the (M+N)-by-1 transmitted signal vector, and u_n is the LN-by-1 multichannel received signal vector. Let u_n be applied to a multichannel structure characterized by the (M+N)-by-LN filtering matrix T such that we have

T u_n = x_n    (2)

This zero-forcing condition ensures the perfect recovery of x_n from u_n. Substituting Eq. (1) into (2):

T H = I    (3)

where I is the identity matrix. From Eq. (3) we immediately deduce that

T = H^+

where H+ is the pseudoinverse of H.
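The zero-forcing relation T = H^+ can be demonstrated in a toy setting; the sketch below (the 3-by-2 matrix and symbol vector are arbitrary test values, not from the text) builds the left pseudoinverse (H^T H)^{-1} H^T for a tall full-column-rank H and recovers the transmitted vector exactly from the noise-free observation:

```python
# Toy demonstration (illustration, not from the manual) of zero-forcing
# recovery via the pseudoinverse for a tall, full-column-rank H.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def inv2(A):
    # closed-form inverse of a 2x2 matrix
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

if __name__ == "__main__":
    H = [[1.0, 0.5],
         [0.2, 1.0],
         [0.0, 0.3]]                        # LN = 3 observations, M+N = 2 symbols
    Ht = transpose(H)
    T = matmul(inv2(matmul(Ht, H)), Ht)     # H^+ = (H^T H)^{-1} H^T
    x = [[2.0], [-1.0]]
    u = matmul(H, x)                        # noise-free received vector
    x_hat = matmul(T, u)                    # zero-forcing recovery
    print([round(v[0], 6) for v in x_hat])  # → [2.0, -1.0]
```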

16.3

The relationship between g_k and H on the one hand, and h and G_k on the other, as described in Eq. (16.27), namely,

g_k^H H H^H g_k = h^H G_k^H G_k h

follows directly from the two sets of definitions:

H = \begin{bmatrix} H^{(0)} \\ H^{(1)} \\ \vdots \\ H^{(L-1)} \end{bmatrix},    H^{(l)} = \begin{bmatrix} h_0^{(l)} & \cdots & h_M^{(l)} & \cdots & 0 \\ \vdots & \ddots & & \ddots & \vdots \\ 0 & \cdots & h_0^{(l)} & \cdots & h_M^{(l)} \end{bmatrix}

g_k = \begin{bmatrix} g_k^{(0)} \\ g_k^{(1)} \\ \vdots \\ g_k^{(L-1)} \end{bmatrix},    G_k = \begin{bmatrix} G_k^{(0)} \\ G_k^{(1)} \\ \vdots \\ G_k^{(L-1)} \end{bmatrix},    G_k^{(l)} = \begin{bmatrix} g_{k,0}^{(l)} & \cdots & g_{k,N-1}^{(l)} & \cdots & 0 \\ \vdots & \ddots & & \ddots & \vdots \\ 0 & \cdots & g_{k,0}^{(l)} & \cdots & g_{k,N-1}^{(l)} \end{bmatrix}

and

h = \begin{bmatrix} h^{(0)} \\ h^{(1)} \\ \vdots \\ h^{(L-1)} \end{bmatrix}

16.4

For a noiseless channel, the received signal is

u_n^{(l)} = \sum_{m=0}^{M} h_m^{(l)} x_{n-m},    l = 0, 1, \ldots, L-1    (1)

By definition, we have

H^{(l)}(z) = \sum_{m=0}^{M} h_m^{(l)} z^{-m}

where z^{-1} is the unit-delay operator. We therefore rewrite Eq. (1) in the equivalent form:

u_n^{(l)} = H^{(l)}(z)[x_n]    (2)

where H^{(l)}(z) acts as an operator. Multiplying Eq. (2) by G^{(l)}(z) and then summing over l:

\sum_{l=0}^{L-1} G^{(l)}(z)[u_n^{(l)}] = \sum_{l=0}^{L-1} G^{(l)}(z)H^{(l)}(z)[x_n]    (3)

According to the generalized Bezout identity:

\sum_{l=0}^{L-1} G^{(l)}(z)H^{(l)}(z) = 1

We may therefore simplify Eq. (3) to

\sum_{l=0}^{L-1} G^{(l)}(z)[u_n^{(l)}] = x_n

Let

y^{(l)}(n) = G^{(l)}(z)[u_n^{(l)}],    l = 0, 1, \ldots, L-1

and let G^{(l)}(z) be written in the expanded form:

G^{(l)}(z) = \sum_{i=0}^{M} g_i^{(l)} z^{-i}

Hence,

y^{(l)}(n) = \sum_{i=0}^{M} g_i^{(l)} z^{-i}[u_n^{(l)}] = \sum_{i=0}^{M} g_i^{(l)} u_{n-i}^{(l)}

From linear prediction theory, we recognize that

\hat{u}_{n+1}^{(l)} = \sum_{i=0}^{M} g_i^{(l)} u_{n-i}^{(l)}

It follows therefore that

y^{(l)}(n) = z^{-1}[\hat{u}_{n+1}^{(l)}]

and so we may rewrite

x_n = \sum_{l=0}^{L-1} z^{-1}[\hat{u}_{n+1}^{(l)}]

16.5

We are given

\hat{x} = \frac{1}{f_Y(y)}\int_{-\infty}^{\infty} x f_V(y - c_0 x)f_X(x)\,dx

where

f_X(x) = \begin{cases} 1/(2\sqrt{3}), & -\sqrt{3} \le x < \sqrt{3} \\ 0, & \text{otherwise} \end{cases}

f_V(v) = \frac{1}{\sqrt{2\pi}\sigma}e^{-v^2/2\sigma^2}

and

f_Y(y) = \int_{-\infty}^{\infty} f_X(x)f_V(y - c_0 x)\,dx

Hence,

\hat{x} = \frac{\int_{-\sqrt{3}}^{\sqrt{3}} x e^{-(y - c_0 x)^2/2\sigma^2}\,dx}{\int_{-\sqrt{3}}^{\sqrt{3}} e^{-(y - c_0 x)^2/2\sigma^2}\,dx}    (1)

Let

t = (y - c_0 x)/\sigma,    dt = -c_0\,dx/\sigma

Then, we may rewrite Eq. (1) as

\hat{x} = \frac{\int_{(y - \sqrt{3}c_0)/\sigma}^{(y + \sqrt{3}c_0)/\sigma} \frac{\sigma}{c_0^2}(y - t\sigma)e^{-t^2/2}\,dt}{\int_{(y - \sqrt{3}c_0)/\sigma}^{(y + \sqrt{3}c_0)/\sigma} \frac{\sigma}{c_0}e^{-t^2/2}\,dt}
       = \frac{1}{c_0}\left(y - \sigma\,\frac{\int_{(y - \sqrt{3}c_0)/\sigma}^{(y + \sqrt{3}c_0)/\sigma} t e^{-t^2/2}\,dt}{\int_{(y - \sqrt{3}c_0)/\sigma}^{(y + \sqrt{3}c_0)/\sigma} e^{-t^2/2}\,dt}\right)    (2)

Next we recognize the following two results:

\int_{(y - \sqrt{3}c_0)/\sigma}^{(y + \sqrt{3}c_0)/\sigma} t e^{-t^2/2}\,dt = \left[-e^{-t^2/2}\right]_{(y - \sqrt{3}c_0)/\sigma}^{(y + \sqrt{3}c_0)/\sigma}
  = \sqrt{2\pi}\left\{Z\left(\frac{y - \sqrt{3}c_0}{\sigma}\right) - Z\left(\frac{y + \sqrt{3}c_0}{\sigma}\right)\right\}

\int_{(y - \sqrt{3}c_0)/\sigma}^{(y + \sqrt{3}c_0)/\sigma} e^{-t^2/2}\,dt = \int_{(y - \sqrt{3}c_0)/\sigma}^{\infty} e^{-t^2/2}\,dt - \int_{(y + \sqrt{3}c_0)/\sigma}^{\infty} e^{-t^2/2}\,dt
  = \sqrt{2\pi}\{Q((y - \sqrt{3}c_0)/\sigma) - Q((y + \sqrt{3}c_0)/\sigma)\}

where Z(x) = e^{-x^2/2}/\sqrt{2\pi} is the standard Gaussian density and Q is the Gaussian tail function. We may therefore rewrite Eq. (2) in the compact form:

\hat{x} = \frac{1}{c_0}y + \frac{\sigma}{c_0}\,\frac{Z((y + \sqrt{3}c_0)/\sigma) - Z((y - \sqrt{3}c_0)/\sigma)}{Q((y - \sqrt{3}c_0)/\sigma) - Q((y + \sqrt{3}c_0)/\sigma)}

16.6

For convergence of a Bussgang algorithm:

E[y(n)y(n + k)] = E[y(n)g(y(n + k))]

For large n:

E[y^2(n)] = E[y(n)g(y(n))]

For y(n) = x(n) to achieve perfect equalization:

E[x^2] = E[x g(x)]

With x being of zero mean and unit variance, we thus have

E[x g(x)] = 1

16.7

We start with (for real-valued data)

y(n) = \sum_i \hat{w}_i(n)u(n - i) = \hat{w}^T(n)u(n) = u^T(n)\hat{w}(n)

Let

y = [y(n_1), y(n_2), \ldots, y(n_K)]^T

U = \begin{bmatrix} u^T(n_1) \\ u^T(n_2) \\ \vdots \\ u^T(n_K) \end{bmatrix}

Assuming that \hat{w}(n) has a constant value \hat{w}, averaged over the whole block of data, we then have

y = U\hat{w}

The solution for \hat{w} is

\hat{w} = U^+ y

where U+ is the pseudoinverse of the matrix U.
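The closed-form conditional-mean estimate derived in Problem 16.5 can be verified by brute-force numerical integration of Eq. (1); the sketch below (test values for y, c_0, \sigma are arbitrary, not from the text) compares the two:

```python
# Numeric verification (illustration, not from the manual) of the closed-form
# estimate of Problem 16.5 against midpoint-rule integration of Eq. (1).
import math

def xhat_closed(y, c0, sigma):
    Z = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2.0))   # Gaussian tail
    a = (y - math.sqrt(3.0) * c0) / sigma
    b = (y + math.sqrt(3.0) * c0) / sigma
    return y / c0 + (sigma / c0) * (Z(b) - Z(a)) / (Q(a) - Q(b))

def xhat_numeric(y, c0, sigma, n=20000):
    s3 = math.sqrt(3.0)
    dx = 2.0 * s3 / n
    num = den = 0.0
    for i in range(n):
        x = -s3 + (i + 0.5) * dx            # midpoint rule on the uniform support
        w = math.exp(-((y - c0 * x) ** 2) / (2.0 * sigma ** 2))
        num += x * w
        den += w
    return num / den

if __name__ == "__main__":
    y, c0, sigma = 0.7, 1.2, 0.5
    print(abs(xhat_closed(y, c0, sigma) - xhat_numeric(y, c0, sigma)))
```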

16.8

(a) For the binary PSK system, a plot of the error signal versus the equalizer output has the form shown by the continuous curve in Fig. 1:

[Fig. 1: error signal e(n) versus equalizer output y(n) for binary PSK; the continuous curve is the CMA error characteristic and the dashed curve is the signed-error (SE-CMA) characteristic.]

(b) The corresponding plot for the signed-error (SE) version of the CMA is shown by the dashed curve in Fig. 1.

(c) The CMA is a stochastic algorithm minimizing the Godard criterion

J_{cm} = \frac{1}{4}E[(|y_n|^2 - R_2)^2]

where the positive constant R_2 is the dispersion constant, chosen in accordance with the source statistics. For a fractionally spaced equalizer (FSE), the update algorithm is described by

w(n+1) = w(n) + \mu u(n)\underbrace{y^*(n)(\gamma^2 - |y(n)|^2)}_{\psi(y_n)},    \gamma^2 = 1 + R_2

where \mu is a small positive step size and \psi(\cdot) is called the CMA error function. The signed-error SE-CMA algorithm is described as follows:

w(n+1) = w(n) + \mu u(n)\,\mathrm{sgn}(\psi(y(n)))

where sgn(\cdot) denotes the signum function. Specifically,

\mathrm{sgn}(x) = \begin{cases} 1 & \text{for } x > 0 \\ -1 & \text{for } x < 0 \end{cases}

The SE-CMA is computationally more efficient than the CMA.

16.9

The update equation for the constant modulus algorithm (CMA) is described by

\hat{w}(n+1) = \hat{w}(n) + \mu u(n)e^*(n)

where

e(n) = y(n)(\gamma^2 - |y(n)|^2),    \gamma^2 = 1 + R_2

In quadriphase-shift keying (QPSK), the output signal y(n) is complex-valued, as shown by

y(n) = y_I(n) + jy_Q(n)

where y_I(n) is the in-phase component and y_Q(n) is the quadrature component. Hence,

e_I(n) = y_I(n)(R_2 - y_I^2(n) - y_Q^2(n))

e_Q(n) = y_Q(n)(R_2 - y_I^2(n) - y_Q^2(n))

For the signed CMA, we thus have

\hat{w}(n+1) = \hat{w}(n) + \mu u(n)\,\mathrm{sgn}(e(n))
             = \hat{w}(n) + \mu u(n)\,\mathrm{sgn}[e_I(n) + je_Q(n)]

The weights are complex-valued. Hence, following the analysis presented in Section 5.3 of the text, we may write

\hat{w}_I(n+1) = \hat{w}_I(n) + \mu(u_I(n)\,\mathrm{sgn}[e_I(n)] - u_Q(n)\,\mathrm{sgn}[e_Q(n)])    (1)

\hat{w}_Q(n+1) = \hat{w}_Q(n) + \mu(u_Q(n)\,\mathrm{sgn}[e_I(n)] - u_I(n)\,\mathrm{sgn}[e_Q(n)])    (2)

where

\hat{w}(n) = \hat{w}_I(n) + j\hat{w}_Q(n)
u(n) = u_I(n) + ju_Q(n)

The standard version of the complex CMA is as follows:

\hat{w}_I(n+1) = \hat{w}_I(n) + \mu(u_I(n)e_I(n) - u_Q(n)e_Q(n))    (3)

\hat{w}_Q(n+1) = \hat{w}_Q(n) + \mu(u_Q(n)e_I(n) - u_I(n)e_Q(n))    (4)

Both versions of the CMA, the signed version of Eqs. (1) and (2) and the standard version of Eqs. (3) and (4), can now be treated in the same way as the real-valued CMA in Problem 16.8.


16.10 Dithered signed-error version of CMA, hereafter referred to as DSE-CMA:

(a) According to quantization theory, the operator \alpha\,\mathrm{sgn}(v(n)) has an effect equivalent to that of the two-level quantizer

Q(v(n)) = \begin{cases} \Delta/2, & v(n) \ge 0 \\ -\Delta/2, & v(n) < 0 \end{cases}

where

v(n) = e(n) + \alpha\varepsilon(n)

Furthermore, since the samples of the dither \varepsilon(n) are i.i.d. over the interval [-1, 1], \{\alpha\varepsilon(n)\} satisfies the requirement for a valid dither process if the constant \alpha satisfies the condition

\alpha \ge |e(n)|

[Fig. 1: equivalent additive-dither model, in which the sgn operation on e(n) + \alpha\varepsilon(n) is replaced by e(n) plus an additive quantization-noise term.]

The equivalent model is illustrated in Fig. 1. Hence, we may rewrite the DSE-CMA update formula as

w(n+1) = w(n) + \mu u(n)(e(n) + \varepsilon(n))    (1)

Also, since \varepsilon(n) is an uncorrelated random process, its conditional first moment is

E[\varepsilon(n) \mid e(n)] = E[\varepsilon(n)] = 0    (2)

Taking the conditional expectation of the DSE-CMA error function, we find that it is a hard-limited version of e(n), as shown by

E[v(n) \mid y(n)] = \begin{cases} \alpha, & e(n) > \alpha \\ e(n), & |e(n)| \le \alpha \\ -\alpha, & e(n) < -\alpha \end{cases}

which follows from Eqs. (1) and (2).


16.11 (a) The Shalvi-Weinstein equalizer is based on the cost function

J(n) = E[|y(n)|^4]    subject to the constraint    E[|y(n)|^2] = \sigma_x^2

where y(n) is the equalizer output and \sigma_x^2 is the variance of the original data sequence applied to the channel input. Applying the method of Lagrange multipliers, we may define a cost function for the Shalvi-Weinstein equalizer that incorporates the constraint as follows:

J_{SW}(n) = E[|y(n)|^4] + \lambda(E[|y(n)|^2] - \sigma_x^2)    (1)

where \lambda is the Lagrange multiplier. From Eq. (16.105) of the text, we find that the cost function for the Godard algorithm takes the following form for p = 2:

J_G = E[(|y(n)|^2 - R_2)^2]
    = E[|y(n)|^4] - 2R_2 E[|y(n)|^2] + R_2^2    (2)

where R_2 is a positive real constant. Comparing Eqs. (1) and (2), we see that these two cost functions have the same mathematical form. Hence, we may infer that these two equalization algorithms share the same optimization criterion.

(b) For a more detailed discussion of the equivalence between the Godard and Shalvi-Weinstein algorithms, we may proceed as follows. First, rewrite the tap-weight vector of the equalizer in polar form (i.e., a unit-norm vector times a radial scale factor), and then optimize the Godard cost function with respect to the radial factor. The "reduced" cost function that results from this transformation is then recognized as a monotonic transformation of the corresponding Shalvi-Weinstein cost function. Since the transformation relating the two criteria is monotonic, their stationary points and local/global minima coincide to within a radial factor. By taking this approach, the equivalence between the Godard and Shalvi-Weinstein equalizers can be shown to hold under general conditions (linear or nonlinear channels, i.i.d. or correlated data sequences applied to the channel input, Gaussian or non-Gaussian channel noise, etc.).1

1. For further details on the issues raised herein, see P.A. Regalia, "On the equivalence between the Godard and Shalvi-Weinstein schemes of blind equalization", Signal Processing, vol. 73, pp. 185-190, 1999.


16.12 For the derivation of Eq. (16.116), see the Appendix of the paper by Johnson et al.2 Note, however, the CMA cost function for binary PSK given in Eq. (63) of that paper is four times that of Eq. (16.116).

2. C.R. Johnson, et al., "Blind equalization using the constant modulus criterion: A review", Proc. IEEE, vol. 86, pp. 1927-1950, October 1998.


CHAPTER 17

17.1

(a) The complementary error function

\varphi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt

qualifies as a sigmoid function for two reasons:

1. The function is a monotonically increasing function of x, with

\varphi(-\infty) = 0,    \varphi(0) = 0.5,    \varphi(\infty) = 1

For x = \infty, \varphi equals the total area under the probability density function of a Gaussian variable with zero mean and unit variance; this area is unity by definition.

2. The function \varphi(x) is continuously differentiable:

\frac{d\varphi}{dx} = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}

(b) The inverse tangent function

\varphi(x) = \frac{2}{\pi}\tan^{-1}(x)

also qualifies as a sigmoid function for two reasons:

1. \varphi(-\infty) = -1,    \varphi(0) = 0,    \varphi(\infty) = +1

2. \varphi(x) is continuously differentiable:

\frac{d\varphi}{dx} = \frac{2}{\pi}\cdot\frac{1}{1 + x^2}

The two functions differ from each other in the following respects:

• The complementary error function is unipolar (nonsymmetric).
• The inverse tangent function is bipolar (antisymmetric).

17.2

The incorporation of momentum modifies the update rule for the synaptic weight w_{kj} as follows:

\Delta w_{kj}(n) = \alpha\Delta w_{kj}(n-1) - \eta\frac{\partial E(n)}{\partial w_{kj}}    (1)

where

\alpha = momentum constant
\eta = learning-rate parameter
E(n) = cost function to be minimized
n = iteration number

Equation (1) represents a first-order difference equation in \Delta w_{kj}(n). Solving it for \Delta w_{kj}(n), we get

\Delta w_{kj}(n) = -\eta\sum_{t=0}^{n}\alpha^{n-t}\frac{\partial E(t)}{\partial w_{kj}}    (2)

For -1 < \alpha < 0, we may rewrite Eq. (2) as

\Delta w_{kj}(n) = -\eta\sum_{t=0}^{n}(-1)^{n-t}|\alpha|^{n-t}\frac{\partial E(t)}{\partial w_{kj}}

Thus, the use of -1 < \alpha < 0 in place of the commonly used range 0 < \alpha < 1 merely introduces the multiplying factor (-1)^{n-t}, which (for a specified n) alternates in algebraic sign as t increases.

Consider the real-valued version of the back-propagation algorithm summarized in Table 17.2 of the text. In the backward pass, starting from the output layer, the error signal decreases in magnitude the further away a layer is from the output. This suggests that the learning rates used to adjust the synaptic weights in the multilayer perceptron should be increased as we move away from the output layer, to make up for the decrease in the error signal. In so doing, the rate at which the learning process takes place in the different layers of the network is equalized, which is highly desirable.

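The attenuation of the back-propagated error signal described above can be illustrated with a small numerical sketch; the chain of single-unit tanh layers and the weight range are hypothetical choices, not from the text:

```python
import math, random

random.seed(0)

def tanh_prime(v):
    # Derivative of tanh: phi'(v) = 1 - tanh(v)^2, so |phi'| <= 1
    return 1.0 - math.tanh(v) ** 2

# A toy chain of layers, each a single tanh unit with a small weight.
# With |w| < 1 and |phi'| <= 1, the local gradient shrinks each time it
# is back-propagated, so earlier layers see a weaker error signal.
weights = [random.uniform(-0.5, 0.5) for _ in range(5)]

# Forward pass through the chain
x = 1.0
nets = []
for w in weights:
    net = w * x
    nets.append(net)
    x = math.tanh(net)

# Backward pass: local gradient at the output, propagated toward the input
delta = 1.0 * tanh_prime(nets[-1])
deltas = [abs(delta)]
for w, net in zip(reversed(weights[1:]), reversed(nets[:-1])):
    delta = delta * w * tanh_prime(net)
    deltas.append(abs(delta))

# deltas[0] is at the output layer, deltas[-1] nearest the input:
# the magnitudes decrease monotonically in this setting.
assert all(deltas[i] >= deltas[i + 1] for i in range(len(deltas) - 1))
```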

17.4

We are given the time series

u(n) = Σ_{i=1}^{3} a_i v(n−i) + Σ_{i=1}^{2} Σ_{j=1}^{2} a_ij v(n−i) v(n−j)
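As a sanity check, the model can be simulated sample by sample; the coefficient values and the input sequence below are arbitrary illustrative choices, not from the text:

```python
# Sample-by-sample evaluation of
#   u(n) = sum_{i=1..3} a_i v(n-i)
#        + sum_{i=1..2} sum_{j=1..2} a_ij v(n-i) v(n-j)
# Coefficient values below are arbitrary, for illustration only.
a = {1: 0.5, 2: -0.3, 3: 0.2}                       # linear taps a_i
aq = {(1, 1): 0.1, (1, 2): 0.05,
      (2, 1): 0.05, (2, 2): -0.02}                  # quadratic taps a_ij

def u_of_n(v, n):
    """Evaluate u(n) from the past samples v(n-1), v(n-2), v(n-3)."""
    linear = sum(a[i] * v[n - i] for i in range(1, 4))
    quad = sum(aq[i, j] * v[n - i] * v[n - j]
               for i in range(1, 3) for j in range(1, 3))
    return linear + quad

v = [1.0, 2.0, -1.0, 0.5, 3.0]    # a toy input sequence v(0)..v(4)
print(u_of_n(v, 4))                # uses v(1), v(2), v(3)
```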

We may implement it as follows:

[Block diagram: the input v(n) drives a tapped delay line of three unit delays z⁻¹, producing v(n−1), v(n−2), and v(n−3). The delayed samples are weighted by the linear coefficients a1, a2, a3; the pairwise products of v(n−1) and v(n−2) are weighted by a11, a21 + a12, and a22; all weighted terms are summed to form the output u(n).]

17.5

The minimum description length (MDL) criterion strives to optimize the model order. In particular, it provides the best compromise between its two components: a likelihood function, and a penalty function. Realizing that model order has a direct bearing on model complexity, it may therefore be argued that the MDL criterion tries to match the complexity of a model to the underlying complexity of the input data. The risk R of Eq. (17.63) also has two components: one determined by the training data, and the other determined by the network complexity. Loosely speaking, the roles of these two components are comparable to those of the likelihood function and the penalty function in the MDL criterion, respectively. Increasing the first component of the risk R at the expense of the second component implies that the training data are highly reliable, whereas increasing the second component of R at the expense of the first component implies that the training data are of poor quality.

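The compromise the MDL criterion strikes can be made concrete with a small sketch; the specific form MDL(k) = (N/2) ln(RSS_k / N) + (k/2) ln N, and the synthetic data below, are illustrative assumptions, not taken from the text:

```python
import math, random

random.seed(1)
N = 200
xs = [i / N for i in range(N)]
ys = [2.0 * x + 0.1 * random.gauss(0.0, 1.0) for x in xs]   # truly linear data

def rss_constant(xs, ys):
    """Residual sum of squares for the best constant (mean) model, k = 1."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def rss_line(xs, ys):
    """RSS for the least-squares straight line y = b0 + b1*x, k = 2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def mdl(rss, k, n):
    # likelihood term (data fit)  +  penalty term (model complexity)
    return (n / 2.0) * math.log(rss / n) + (k / 2.0) * math.log(n)

mdl_const = mdl(rss_constant(xs, ys), 1, N)
mdl_line = mdl(rss_line(xs, ys), 2, N)
assert mdl_line < mdl_const   # MDL favors the (correct) line model
```

The extra penalty (k/2) ln N is paid only when the improved fit justifies it, which is the compromise between data fit and model complexity discussed above.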

17.6

(a) The Laguerre-based version of the MLP has the following structure:

[Block diagram: the input signal u(n) drives a cascade of Laguerre sections, L₀(z,α) followed by repeated stages of L(z,α), producing the tap signals u_i(n,α), i = 0, 1, ..., N−1. The taps feed M hidden units through weights w_{i,j}; the hidden-unit outputs are combined through weights v_j, j = 0, 1, ..., M−1, to form the network output y(n), which is compared against the desired response d(n).]

(b) The new BP algorithm can be devised for the Laguerre-based version of the MLP in a manner similar to the LMS algorithm formulated for the Laguerre filter (Problem 4 in Chapter 15). The only difference between the new BP algorithm and the conventional BP algorithm lies in the adjustment of the input-to-hidden-layer weights and the calculation of each hidden-unit output. Recalling the solution to Problem 4 in Chapter 15, we have the following.

For the hidden-unit outputs:

h_j = Σ_{i=0}^{N−1} w_{i,j} u_i(n,α),   where u_i(n,α) = Z⁻¹[U_i(z,α)]

ϕ(h_j) = tanh(h_j),   j = 0, 1, ..., M−1

y = tanh( Σ_{j=0}^{M−1} v_j ϕ(h_j) )


For the adaptation of the weights:

1. Output layer (for simplicity, consider only one output unit):

∆v_j(n) = µ̃ ϕ(h_j(n)) ϕ′( Σ_{k=0}^{M−1} v_k ϕ(h_k) ) [d − y(n)]

∆Bias(n) = µ̃ ϕ′( Σ_{k=0}^{M−1} v_k ϕ(h_k) ) [d − y(n)]

2. Hidden layer (for simplicity, consider only one hidden layer):

∆w_{ij}(n) = µ̃ u_i(n,α) ϕ′(h_j) v_j ϕ′( Σ_{k=0}^{M−1} v_k ϕ(h_k) ) [d − y(n)]

∆Bias_j(n) = µ̃ ϕ′(h_j) v_j ϕ′( Σ_{k=0}^{M−1} v_k ϕ(h_k) ) [d − y(n)]

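A minimal forward-pass sketch of the Laguerre-based MLP described above; the Laguerre recursions follow the standard low-pass/all-pass cascade, and all weights, the input sequence, and the value of α are arbitrary illustrative choices, not from the text:

```python
import math, random

random.seed(2)

def laguerre_taps(u, alpha, N):
    """Generate tap signals u_i(n, alpha), i = 0..N-1, for input sequence u.

    L0(z) = sqrt(1 - alpha^2) / (1 - alpha z^-1), followed by N-1 all-pass
    sections A(z) = (z^-1 - alpha) / (1 - alpha z^-1).
    """
    c = math.sqrt(1.0 - alpha * alpha)
    taps = []
    prev = [0.0] * N                       # tap values at time n-1
    for un in u:
        cur = [0.0] * N
        cur[0] = alpha * prev[0] + c * un                  # low-pass section
        for i in range(1, N):                              # all-pass chain
            cur[i] = alpha * prev[i] + prev[i - 1] - alpha * cur[i - 1]
        taps.append(cur)
        prev = cur
    return taps

def forward(taps_n, W, V):
    """One forward pass: h_j = sum_i w_ij u_i; y = tanh(sum_j v_j tanh(h_j))."""
    hidden = [math.tanh(sum(W[i][j] * taps_n[i] for i in range(len(taps_n))))
              for j in range(len(V))]
    return math.tanh(sum(v * h for v, h in zip(V, hidden)))

N, M, alpha = 4, 3, 0.5
W = [[random.uniform(-0.5, 0.5) for _ in range(M)] for _ in range(N)]
V = [random.uniform(-0.5, 0.5) for _ in range(M)]

u = [random.gauss(0.0, 1.0) for _ in range(50)]
taps = laguerre_taps(u, alpha, N)
y = forward(taps[-1], W, V)
assert -1.0 < y < 1.0        # tanh output is bounded
```

Back-propagation would then adjust V and W exactly as in the adaptation equations above, with u_i(n,α) playing the role of the input-layer signals.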