OPTIMALITY CONDITIONS IN CONVEX OPTIMIZATION
A Finite-Dimensional View
Anulekha Dhara Joydeep Dutta
In memory of Professor M. C. Puri and Professor Alex M. Rubinov
Contents

List of Figures
Symbol Description
Foreword
Preface

1 What Is Convex Optimization?
  1.1 Introduction
  1.2 Basic Concepts
  1.3 Smooth Convex Optimization

2 Tools for Convex Optimization
  2.1 Introduction
  2.2 Convex Sets
    2.2.1 Convex Cones
    2.2.2 Hyperplane and Separation Theorems
    2.2.3 Polar Cones
    2.2.4 Tangent and Normal Cones
    2.2.5 Polyhedral Sets
  2.3 Convex Functions
    2.3.1 Sublinear and Support Functions
    2.3.2 Continuity Property
    2.3.3 Differentiability Property
  2.4 Subdifferential Calculus
  2.5 Conjugate Functions
  2.6 ε-Subdifferential
  2.7 Epigraphical Properties of Conjugate Functions

3 Basic Optimality Conditions Using the Normal Cone
  3.1 Introduction
  3.2 Slater Constraint Qualification
  3.3 Abadie Constraint Qualification
  3.4 Convex Problems with Abstract Constraints
  3.5 Max-Function Approach
  3.6 Cone-Constrained Convex Programming

4 Saddle Points, Optimality, and Duality
  4.1 Introduction
  4.2 Basic Saddle Point Theorem
  4.3 Affine Inequalities and Equalities and Saddle Point Condition
  4.4 Lagrangian Duality
  4.5 Fenchel Duality
  4.6 Equivalence between Lagrangian and Fenchel Duality

5 Enhanced Fritz John Optimality Conditions
  5.1 Introduction
  5.2 Enhanced Fritz John Conditions Using the Subdifferential
  5.3 Enhanced Fritz John Conditions under Restrictions
  5.4 Enhanced Fritz John Conditions in the Absence of Optimal Solution
  5.5 Enhanced Dual Fritz John Optimality Conditions

6 Optimality without Constraint Qualification
  6.1 Introduction
  6.2 Geometric Optimality Condition: Smooth Case
  6.3 Geometric Optimality Condition: Nonsmooth Case
  6.4 Separable Sublinear Case

7 Sequential Optimality Conditions
  7.1 Introduction
  7.2 Sequential Optimality: Thibault's Approach
  7.3 Fenchel Conjugates and Constraint Qualification
  7.4 Applications to Bilevel Programming Problems

8 Representation of the Feasible Set and KKT Conditions
  8.1 Introduction
  8.2 Smooth Case
  8.3 Nonsmooth Case

9 Weak Sharp Minima in Convex Optimization
  9.1 Introduction
  9.2 Weak Sharp Minima and Optimality

10 Approximate Optimality Conditions
  10.1 Introduction
  10.2 ε-Subdifferential Approach
  10.3 Max-Function Approach
  10.4 ε-Saddle Point Approach
  10.5 Exact Penalization Approach
  10.6 Ekeland's Variational Principle Approach
  10.7 Modified ε-KKT Conditions
  10.8 Duality-Based Approach to ε-Optimality

11 Convex Semi-Infinite Optimization
  11.1 Introduction
  11.2 Sup-Function Approach
  11.3 Reduction Approach
  11.4 Lagrangian Regular Point
  11.5 Farkas–Minkowski Linearization
  11.6 Noncompact Scenario: An Alternate Approach

12 Convexity in Nonconvex Optimization
  12.1 Introduction
  12.2 Maximization of a Convex Function
  12.3 Minimization of d.c. Functions

Bibliography

Index
List of Figures

1.1 Lower semicontinuous hull.
1.2 Graph of a real-valued differentiable convex function.
1.3 Local minimizer is global minimizer.
2.1 Convex and nonconvex sets.
2.2 F1, F2, and F1 ∩ F2 are convex while F1^c, F2^c, and F1 ∪ F2 are nonconvex.
2.3 Line segment principle.
2.4 Tangent cone.
2.5 Normal cone.
2.6 Graph and epigraph of convex function.
2.7 Epigraphs of improper functions φ1 and φ2.
2.8 Graph of ∂(|·|).
2.9 Graph of ∂1(|·|).
3.1 NC(x̄) is not polyhedral.
3.2 C ∩ Y.
5.1 Pseudonormality.
5.2 Not pseudonormal.
9.1 Pictorial representation of Theorem 9.6.
Symbol Description

∅ : empty set
∞ : infinity
N : set of natural numbers
R : real line
R̄ : R ∪ {−∞, +∞}
Rn : n-dimensional Euclidean space
R+ : nonnegative orthant of R
Rn+ : nonnegative orthant of Rn
[x, y] : closed line segment joining x and y
(x, y) : open line segment joining x and y
RI : product space ∏_I R
R[I] : {λ ∈ RI : λi ≠ 0 for finitely many i ∈ I}
R[I]+ : positive cone in R[I]
supp λ : {i ∈ I : λ ∈ R[I], λi ≠ 0}
B : open unit ball
Bδ(x̄) : open ball with radius δ > 0 and center at x̄
cl F : closure of F
co F : convex hull of F
cl co F : closed convex hull of F
aff F : affine hull of F
int F : interior of F
ri F : relative interior of F
cone F : cone generated by F
F+ : positive polar cone of F
F◦ : polar cone of F
x → x̄ : x converges to x̄
lim : limit
lim inf : limit infimum
lim sup : limit supremum
⟨·, ·⟩ : inner product
‖·‖ : norm
φ(F) : image space of F under φ
gph Φ : graph of set-valued map Φ
dom φ : effective domain of φ : X → R̄
epi φ : epigraph of φ
lev≤α φ : α-lower level set of φ
δF : indicator function to F
dF : distance function to F
projF(x̄) : projection of x̄ to F
σ(· ; F) : support function to F
φ* : conjugate function of φ
φ+(x) : max{0, φ(x)}
∇φ(x̄) : derivative or gradient of φ at x̄
φ◦(x̄; d) : Clarke directional derivative of φ at x̄ in the direction d
∂φ/∂x : partial derivative of φ with respect to x
∂²φ/∂xi∂xj : second-order partial derivative of φ with respect to xi and xj
∂φ(x̄) : convex subdifferential of φ at x̄
∂εφ(x̄) : ε-subdifferential of φ at x̄
∂◦φ(x̄) : Clarke subdifferential or generalized gradient of φ at x̄
∇²φ(x̄) : Hessian of φ at x̄
Jφ(x̄) : Jacobian of φ at x̄
TF(x̄) : tangent cone to F at x̄
NF(x̄) : normal cone to F at x̄
Nε,F(x̄) : ε-normal set to F at x̄
Foreword
The roots of the mathematical topic of optimization go back to ancient Greece, when Euclid considered the minimal distance of a point to a line; convex sets were investigated by Minkowski about a hundred years ago, and fifty years ago, J.-J. Moreau [87] defined the notion of the subdifferential of a convex function. In 1970, R.T. Rockafellar wrote his monograph [97] on convex analysis. Since then, the field of convex optimization and convex analysis has developed rapidly, a huge number of papers on that topic have been published in scientific journals, and a large number of monographs and textbooks have been produced. Now we have a new book at hand, and one can ask why read this book.
A recent topic of research in mathematical optimization is the need to compute global optima of nonconvex problems. To do that, the problem can be convexified, using the optimal function value of the resulting convex optimization problem as a bound for the problem investigated. Combining this with an enumeration idea, the problem can be solved. The same approach of convexification plus enumeration can serve as a way to solve mixed-integer nonlinear optimization problems, which is a second challenging problem of recent and future research. Moreover, many practical situations lead directly to convex programming problems. Hence the need to develop a deep insight into convex optimization.
The theory of convex differentiable optimization is well established. Every student will be introduced in basic courses on mathematical optimization to the Fritz John and Karush–Kuhn–Tucker necessary optimality conditions. For guaranteeing the Karush–Kuhn–Tucker conditions, a constraint qualification such as the Slater condition is needed. But, in many applications, this condition is violated. There are a large number of ways out of such a situation: the Abadie constraint qualification can be supposed, sequential optimality conditions can be used, or we can try to filter out the full information of the (enhanced) Fritz John necessary optimality conditions. These nonstandard but essential parts of the theory of convex optimization need to be described in detail and in close relation to each other.
Nonsmooth analysis (see, for example, Mordukhovich [86]) is a quickly developing area in mathematical optimization. The initial point of nonsmooth analysis is convex analysis, but recent developments in nonsmooth analysis have a good influence on convex analysis.
The aim of this book is to develop deep insight into the theory of convex
optimization, combining very recent ideas of nonsmooth analysis with standard and nonstandard theoretical results. Lagrange and Fenchel duality use different tools and can be applied successfully in distinct directions. But in the end, both are shown to coincide. If, at an optimal solution, no constraint qualification is satisfied, algorithms solving the Karush–Kuhn–Tucker conditions cannot be used to compute this point. And, how to characterize such a point? Roughly speaking one idea is the existence of a sequence outside of the feasible set with smaller objective function values converging to that point. These are the enhanced Fritz John necessary optimality conditions. A second idea is to characterize optimality via subgradients of the regular Lagrangian function at perturbed points converging to zero. This is the sequential optimality condition. Both optimality conditions work without constraint qualifications. ε-optimal solutions can be characterized using ε-subgradients. One special convex optimization problem is also investigated. This is the problem to find a best point within the set of optimal solutions of a convex optimization problem. If the objective function is convex, this is a convex optimization problem called a simple bilevel programming problem. It is easy to see that standard regularity conditions are violated at every feasible point. For this problem, a very general constraint qualification is derived. A last question is if convexity can successfully be used to investigate nonconvex problems as the maximization of convex functions or the minimization of a function being the difference of convex functions. Algorithmic approaches for solving convex optimization problems are not described in this book. This results in much more space for theoretical properties. The result is a book illuminating not only the body but also the bounds and corners of the theory of convex optimization. Many of the results presented are usually not contained in books on this topic. But, if more and more (applied) difficult optimization problems need to be solved, we are more likely be faced with instances where usual approaches fail. Then it is necessary to search away from standard tools for applicable approaches. I am sure that this book will be very helpful. I deeply recommend this book for advanced reading.
Stephan Dempe
Freiberg, Germany
Preface
This is a book on convex optimization. More precisely, it is a book on the recent advances in the theory of optimality conditions for convex optimization. The question is why should one need an additional book on the subject? Possibly the books on convex analysis are much more in number than the ones on convex optimization. In the books dealing with convex analysis, like the classic Convex Analysis by Rockafellar [97] or the more recent Convex Analysis and Nonlinear Optimization by Borwein and Lewis [17], one would find that convex optimization theory appears as an application of various results of convex analysis. However, from 1970 until now there has been a growing body of research in the area of optimality conditions for convex optimization. Many of these results address the question of what happens when the Slater condition fails for a convex optimization problem, or whether there are very general constraint qualification conditions that hold even if the most popular ones fail. The books on convex analysis usually do not present results of this type and thus these results remain largely scattered in the vast literature on convex optimization. On the other hand, the books dealing with convex optimization largely focus on algorithms, or on algorithms and theory associated with a certain special class of problems like second-order conic programming or semidefinite programming. Some recent books like Introductory Lectures in Convex Optimization by Nesterov [90] or Lectures on Modern Convex Optimization by Ben-Tal and Nemirovskii [8] deal with algorithms and the special problems, respectively.
This book has a completely different focus. It deals with optimality conditions in convex optimization. It attempts to bring in most of the important and recent results in this area that are scattered in the literature. However, we do not ignore the required convex analysis either. We provide a detailed chapter on the main convex analytic tools and also provide some new results that have appeared recently in the literature. These results are usually not found in standard books on convex analysis but they are essential in developing many of the important results in this book.
This book actually began as a survey paper, but then we realized that it had too much material to be considered a survey, and we thought of converting the survey paper into the form of a monograph. We would like to thank the many people who encouraged us to write the book. Professor Stephan Dempe agreed very kindly to write the foreword. Professor Boris Mordukhovich, Professor Suresh Chandra, and Professor Juan En-
rique Martinez-Legaz also encouraged us to go ahead and write the book. We are indeed grateful to them. We would also like to thank Aastha Sharma of Taylor & Francis, India, for superb handling of the whole book project and Shashi Kumar from the helpdesk of Taylor & Francis for helping with the formatting. We would also like to extend our deepest gratitude to our families for their support. Joydeep Dutta would like to thank his daughter Naina and his wife Lalty for their understanding and patience during the time this book was written. Anulekha Dhara would like to express her deepest and sincerest regards and gratitude to her parents Dr. Madhu Sudan Dhara and Dolly Dhara for their understanding and support. She would also like to thank the National Board for Higher Mathematics, Mumbai, India, for providing financial support during her tenure at the Indian Institute of Technology Kanpur, India. The book is intended for research mathematicians in convex optimization and also for graduate students in the area of optimization theory. This could be of interest also to the practitioner who might be interested in the development of the theory. We have tried our best to make the book free of errors. But to err is human, so we take the responsibility for any errors the readers might find in the book. We would also like to request that readers communicate with us by email at the address: [email protected]. We sincerely hope that the young researchers in the field of optimization will find this book helpful.
Anulekha Dhara
Avignon, France

Joydeep Dutta
Kanpur, India
Chapter 1 What Is Convex Optimization?
1.1 Introduction
Optimization is the heart of applied mathematics. Various problems encountered in the areas of engineering, sciences, management science, and economics are based on the fundamental idea of mathematical formulation. Optimization is an essential tool for the formulation of many such problems expressed in the form of minimization of a function under certain constraints like inequalities, equalities, and/or abstract constraints. It is thus rightly considered a science of selecting the best of the many possible decisions in a complex real-life environment. Even though optimization problems have existed since very early times, the optimization theory has settled as a solid and autonomous field only in recent decades. The origin of analytic optimization lies in the classical calculus of variations and is interrelated with the development of calculus. The very concept of derivative introduced by Fermat in the mid-seventeenth century via the tangent slope to the graph of a function was motivated by solving an optimization problem, leading to the Fermat stationary principle. Around 1684, Leibniz developed a method to distinguish between minima and maxima via second-order derivatives. The calculus of variations was introduced by Euler while solving the Brachistochrone problem, which was posed by Bernoulli in 1696. The problem is stated as “Given two points x and y in the vertical plane. A particle is allowed to move under its own gravity from x to y. What should be the curve along which the particle should move so as to reach y from x in the shortest time?” In 1759, Lagrange gave a completely different approach to solve the problems in calculus of variations, today known as the Lagrange multiplier rule. The Lagrange multipliers are viewed as the auxiliary variables that are primarily used to derive the optimality conditions for constrained optimization problems. These optimality conditions are the building blocks of optimization theory. During the second world war, Dantzig developed the simplex method to solve linear programming problems. The first attempt to develop the Lagrange multiplier rules for nonlinear optimization problem was made by Fritz John [71] in 1948. In 1951, Kuhn and Tucker [73] gave the Lagrange multiplier rule for convex and other nonlinear optimization problems involving differen1 © 2012 by Taylor & Francis Group, LLC
tiable functions. It was later found that Karush in 1939 had independently established the optimality conditions similar to those of Kuhn and Tucker. These optimality conditions are today famous as the Karush–Kuhn–Tucker (KKT) optimality conditions. All the initial theories were developed with the differentiability assumptions of the functions involved. Meanwhile, efforts were made to shed the differentiability hypothesis, thereby leading to the development of nonsmooth convex analysis as a subject in itself. This added a new chapter to optimization theory. The key contributors in the development of convexity theory are Fenchel [45], Moreau [88], and Rockafellar [97]. An important milestone in this direction was the publication of Convex Analysis by Rockafellar [97], where the theory of nonsmooth convex analysis was presented in detail for the first time. No wonder this text is by far a must for all optimization researchers. In the early 1970s, his student Clarke coined the term nonsmooth optimization to categorize the theory involving nondifferentiable optimization problems. He extended the calculus rules and applied them to optimization problems involving locally Lipschitz functions. This was just the beginning. The subsequent decade witnessed a large development in the field of nonsmooth nonconvex optimization. For details on nonsmooth analysis, one may refer to Borwein and Lewis [17]; Borwein and Zhu [18]; Clarke [27]; Clarke, Ledyaev, Stern and Wolenshi [28]; Mordukhovich [86]; and Rockafellar and Wets [101]. However, such developments have not overshadowed the importance of convex optimization, which still is and will remain a pivotal area of research. It has paved a path not only for theoretical improvements, but also algorithmic designing aspects. In this book we focus mainly on convex analysis and its application to the development of convex optimization theory.
1.2 Basic Concepts
By convex optimization we simply mean the problem of minimizing a convex function over a convex set. More precisely, we are concerned with the following problem min f (x)
subject to   x ∈ C,                (CP)
where f : Rn → R is a convex function and C ⊂ Rn is a convex set. Of course, in most cases the set C is described by a system of convex inequalities and affine equalities. Thus we can write

C = {x ∈ Rn : gi(x) ≤ 0, i = 1, 2, . . . , m and hj(x) = 0, j = 1, 2, . . . , l},

where gi : Rn → R, i = 1, 2, . . . , m are convex functions and hj : Rn → R,
j = 1, 2, . . . , l are affine functions. When C is expressed explicitly as above, (CP) is called the convex programming problem.
A set C ⊂ Rn is a convex set if for any x, y ∈ C, the line segment joining them, that is,

[x, y] = {z ∈ Rn : z = (1 − λ)x + λy, 0 ≤ λ ≤ 1},

is also in C. A function φ : Rn → R is a convex function if for any x, y ∈ Rn and λ ∈ [0, 1],

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y),

while it is an affine function if it is a translate of a linear function; that is, φ is affine if φ(x) = ⟨a, x⟩ + b, where a ∈ Rn and b ∈ R.
It is important to note at the very outset that in optimization theory it is worthwhile to consider extended-valued functions, that is, functions that take values in R̄ = R ∪ {−∞, +∞}. The need to do so arises when we seek to convert a constrained optimization problem into an unconstrained one. Consider for example the problem (CP), which can be restated as

min f0(x)   subject to   x ∈ Rn,

where

f0(x) = { f(x),  x ∈ C,
          +∞,   otherwise.
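The same device is easy to mimic in numerical code: an extended-valued objective simply returns +∞ outside the feasible set. The sketch below is only an illustration of this reformulation for a hypothetical objective f and a box-shaped set C; the names f, in_C, and f0 and the data are invented for the example and are not part of the text.

```python
import numpy as np

def f(x):
    # a sample convex objective, f(x) = ||x||^2
    return float(np.dot(x, x))

def in_C(x, lower=-1.0, upper=1.0):
    # membership test for a simple convex set C: the box [lower, upper]^n
    return bool(np.all(x >= lower) and np.all(x <= upper))

def f0(x):
    # extended-valued reformulation: equals f on C and +infinity outside C
    return f(x) if in_C(x) else np.inf

print(f0(np.array([0.5, -0.2])))  # finite value: the point lies in C
print(f0(np.array([2.0, 0.0])))   # inf: the point lies outside C
```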
All the modern books on convex analysis beginning with the classic Convex Analysis by Rockafellar [97] follow this framework. However, when we include infinities, we need to know how to deal with them. Most rules with infinity are intuitively clear except possibly 0 × (+∞) and ∞ − ∞. Because we will be dealing mainly with minimization problems, we will follow the convention

0 × (+∞) = (+∞) × 0 = 0   and   ∞ − ∞ = ∞.

This convention was adopted in Rockafellar and Wets [101] and we shall follow it. However, we would like to ascertain that we really need not get worried about ∞ − ∞ as the functions considered in this book are real-valued or proper functions. An extended-valued function φ : Rn → R̄ is said to be a proper function if φ(x) > −∞ for every x ∈ Rn and dom φ is nonempty, where

dom φ = {x ∈ Rn : φ(x) < +∞}

is the domain of φ. It is worthwhile to note that the definition of a convex function given above can be extended to the case when φ is an extended-valued function. An extended-valued function φ : Rn → R̄ is a convex function if for any x, y ∈ Rn and λ ∈ [0, 1],

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y),
with the convention that ∞ − ∞ = +∞. A better way to handle the convexity of an extended-valued convex function is to use its associated geometry. In this direction we describe the epigraph of a function φ : Rn → R̄, which is given as

epi φ = {(x, α) ∈ Rn × R : φ(x) ≤ α}.

A function is said to be convex if the epigraph is convex. We leave it as a simple exercise for the reader to show that if the epigraph of a function φ : Rn → R̄ is convex in Rn × R, then φ is a convex function over Rn. For more details see Chapter 2.
In case of extended-valued functions, one can work with the semicontinuity of the functions rather than the continuity. Before we define those notions, we present certain notations that will be used throughout. For any two sets F1, F2 ⊂ Rn, define

F1 + F2 = {x1 + x2 ∈ Rn : x1 ∈ F1, x2 ∈ F2}.

For any set F ⊂ Rn and any scalar λ ∈ R,

λF = {λx ∈ Rn : x ∈ F}.

The closure of a set F is denoted by cl F while the interior is given by int F. The open unit ball, or simply unit ball, is denoted by B. By Bδ(x̄) we mean an open ball of radius δ > 0 with center at x̄. Explicitly, Bδ(x̄) = x̄ + δB. For vectors x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) in Rn, the inner product of x and y is denoted by ⟨x, y⟩ = Σ_{i=1}^{n} xi yi, while the norm of x is given by ‖x‖ = √⟨x, x⟩. We state a standard result on the norm.
Proposition 1.1 (Cauchy–Schwarz Inequality) For any two vectors x, y ∈ Rn , |hx, yi| ≤ kxkkyk. The above inequality holds as equality if and only if x = αy for some scalar α ∈ R. To discuss the concept of continuities of a function, we shall consider the notions of limit infimum and limit supremum of a function. But first we discuss the convergence of sequences in Rn . Definition 1.2 A sequence {xk ∈ R : k = 1, 2, . . .} or simply {xk } ⊂ R is said to converge to x ¯ ∈ R if for every ε > 0, there exists kε such that |xk − x ¯| < ε, ∀ k ≥ kε . A sequence {xk } ⊂ Rn converges to x ¯ ∈ Rn if the i-th component of xk
converges to the i-th component of x ¯. The vector x ¯ is called the limit of {xk }. Symbolically it is expressed as xk → x ¯
or   lim_{k→∞} xk = x̄.
The sequence {xk } ⊂ Rn is bounded if each of its components is bounded. Equivalently, {xk } is bounded if and only if there exists M ∈ R such that kxk k ≤ M for every k ∈ N. A subsequence of {xk } ⊂ Rn is a sequence {xkj }, j = 1, 2, . . ., where each xkj is a member of the original sequence and the order of the elements as in the original sequence is maintained. A vector x¯ ∈ Rn is a limit point of {xk } ⊂ Rn if there exists a subsequence of {xk } converging to x ¯. If the limit point is unique, it is the limit of {xk }. Next we state the classical result on the bounded sequences. Proposition 1.3 (Bolzano–Weierstrass Theorem) A bounded sequence in Rn has a convergent subsequence. For a sequence {xk } ⊂ R, define zr = inf{xk : k ≥ r}
and
yr = sup{xk : k ≥ r}.
It is obvious that the sequences {zr } and {yr } are nondecreasing and nonincreasing, respectively. If {xk } is bounded below or bounded above, the sequences {zr } or {yr }, respectively, have a limit. The limit of {zr } is called the limit infimum or lower limit of {xk } and denoted by lim inf k→∞ xk , while that of {yr } is called the limit supremum or upper limit of {xk } and denoted by lim supk→∞ xk . Equivalently, lim inf xk = lim { inf xr } k→∞
k→∞ r≥k
and
lim sup xk = lim {sup xr }. k→∞
k→∞ r≥k
For a sequence {xk }, lim inf k→∞ xk = −∞ if the sequence is unbounded below while lim supk→∞ xk = +∞ if the sequence is unbounded above. Therefore, {xk } converges to x ¯ if and only if −∞ < lim inf xk = x ¯ = lim sup xk < +∞. k→∞
k→∞
Now we move on to define the semicontinuities of a function that involve the limit infimum and limit supremum of the function. ¯ is said to be lower semicontinuous Definition 1.4 A function φ : Rn → R n (lsc) at x ¯ ∈ R if for every sequence {xk } ⊂ Rn converging to x ¯, φ(¯ x) ≤ lim inf φ(xk ). k→∞
Equivalently, φ(¯ x) ≤ lim inf φ(x), x→¯ x
© 2012 by Taylor & Francis Group, LLC
6
What Is Convex Optimization?
where the term on the right-hand side of the inequality denotes the limit infimum or the lower limit of the function φ defined as lim inf φ(x) = lim x→¯ x
inf
δ↓0 x∈Bδ (¯ x)
φ(x).
The function φ is lsc over a set F ⊂ Rn if φ is lsc at every x ¯ ∈ F. ¯ For a function φ : Rn → R,
inf
x∈Bδ (¯ x)
φ(x) ≤ φ(¯ x).
Taking the limit as δ ↓ 0 in the above inequality leads to lim inf φ(x) ≤ φ(¯ x). x→¯ x
Thus, the inequality in the above definition of lsc can be replaced by an ¯ is lsc at x equality, that is, φ : Rn → R ¯ if φ(¯ x) = lim inf φ(x). x→¯ x
Similar to the concept of lower semicontinuity and limit infimum, we next define the upper semicontinuity and the limit supremum of a function.

Definition 1.5 A function φ : Rn → R̄ is said to be upper semicontinuous (usc) at x̄ ∈ Rn if for every sequence {xk} ⊂ Rn converging to x̄,

φ(x̄) ≥ lim sup_{k→∞} φ(xk).

Equivalently,

φ(x̄) ≥ lim sup_{x→x̄} φ(x),

where the term on the right-hand side of the inequality denotes the limit supremum or the upper limit of the function φ defined as

lim sup_{x→x̄} φ(x) = lim_{δ↓0} sup_{x∈Bδ(x̄)} φ(x).

The function φ is usc over a set F ⊂ Rn if φ is usc at every x̄ ∈ F.

Definition 1.6 A function φ : Rn → R̄ is said to be continuous at x̄ if it is lsc as well as usc at x̄, that is,

lim_{x→x̄} φ(x) = φ(x̄).

Alternatively, φ is continuous at x̄ if for any ε > 0 there exists δ(ε, x̄) > 0 such that

|φ(x) − φ(x̄)| ≤ ε   whenever   ‖x − x̄‖ < δ(ε, x̄).
The function φ is continuous over a set F ⊂ Rn if φ is continuous at every x ¯ ∈ F.
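As a concrete illustration (not taken from the text), consider the step function φ(x) = 0 for x < 0 and φ(x) = 1 for x ≥ 0. A rough numerical probe of the two one-sided notions at x̄ = 0 samples φ on shrinking balls Bδ(x̄); the code, grid, and radii below are only a sketch of that computation.

```python
import numpy as np

def phi(x):
    # step function: 0 for x < 0, 1 for x >= 0
    return np.where(x < 0, 0.0, 1.0)

xbar = 0.0
for delta in [1.0, 0.1, 0.01, 0.001]:
    # sample phi on the ball B_delta(xbar) and record its infimum and supremum
    xs = np.linspace(xbar - delta, xbar + delta, 10001)
    lo, hi = phi(xs).min(), phi(xs).max()
    print(f"delta={delta:7.3f}  inf={lo:.1f}  sup={hi:.1f}")

# The infima stay at 0 < phi(0) = 1, so phi is not lsc at 0,
# while the suprema stay at 1 = phi(0), so phi is usc at 0.
```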
Because we will be considering minimization problems, the continuity of a function will be replaced by lower semicontinuity. Before moving on, we state a result on the infimum and supremum operations from Rockafellar and Wets [101].

Proposition 1.7 (i) Consider an extended-valued function φ : Rn → R̄ and sets Fi ⊂ Rn, i = 1, 2 such that F1 ⊂ F2. Then

inf_{x1∈F1} φ(x1) ≥ inf_{x2∈F2} φ(x2)   and   sup_{x1∈F1} φ(x1) ≤ sup_{x2∈F2} φ(x2).

(ii) Consider the functions φ1, φ2 : Rn → R̄ and a set F ⊂ Rn. Then

inf_{x∈F} φ1(x) + inf_{x∈F} φ2(x) ≤ inf_{x∈F} (φ1 + φ2)(x),
sup_{x∈F} (φ1 + φ2)(x) ≤ sup_{x∈F} φ1(x) + sup_{x∈F} φ2(x).

Also, for functions φi : R^{ni} → R̄ and sets Fi ⊂ R^{ni}, i = 1, 2,

inf_{x1∈F1} φ1(x1) + inf_{x2∈F2} φ2(x2) = inf_{(x1,x2)∈F1×F2} (φ1(x1) + φ2(x2)),
sup_{x1∈F1} φ1(x1) + sup_{x2∈F2} φ2(x2) = sup_{(x1,x2)∈F1×F2} (φ1(x1) + φ2(x2)).

(iii) Consider an extended-valued function φ : Rn → R̄ and a set F ⊂ Rn. Then for λ ≥ 0,

inf_{x∈F} (λφ)(x) = λ inf_{x∈F} φ(x)   and   sup_{x∈F} (λφ)(x) = λ sup_{x∈F} φ(x),

provided 0 × (+∞) = 0 = 0 × (−∞).

The next result from Rockafellar and Wets [101] gives a characterization of limit infimum of an arbitrary extended-valued function.

Lemma 1.8 For an extended-valued function φ : Rn → R̄,

lim inf_{x→x̄} φ(x) = min{α ∈ R̄ : there exists xk → x̄ satisfying φ(xk) → α}.
Proof. Suppose that lim inf x→¯x φ(x) = α. ¯ We claim that for xk → x ¯ with φ(xk ) → α, α ≥ α ¯ . As xk → x ¯, for any δ > 0, there exists kδ ∈ N such that xk ∈ Bδ (¯ x) for every k ≥ kδ . Therefore, φ(xk ) ≥
inf_{x∈Bδ(x̄)} φ(x).
Taking the limit as k → +∞ in the above inequality,

α ≥ inf_{x∈Bδ(x̄)} φ(x), ∀ δ > 0.
Because δ is arbitrarily chosen, so taking the limit δ ↓ 0 along with the definition of the limit infimum of φ leads to α ≥ lim inf φ(x), x→¯ x
that is, α ≥ α ¯. To prove the result, we shall show that there exists a sequence xk → x ¯ such that φ(xk ) → α ¯ . For a nonnegative sequence {δk }, define α ¯k =
inf_{x∈Bδk(x̄)} φ(x).
As δk → 0, by Definition 1.4 of limit infimum, α ¯k → α ¯ . Now for every k ∈ N, by the definition of infimum it is possible to find xk ∈ Bδk (¯ x) for which φ(xk ) is very close to α ¯ k , that is, in an interval [¯ αk , αk ] where α ¯ k < αk and αk → α ¯. Therefore, as k → +∞, xk → x ¯, and φ(xk ) → α ¯ , thereby establishing the result. After the characterization of limit infimum of a function, the result below gives an equivalent characterization of lower semicontinuity of the function in terms of the epigraph and lower level set. ¯ Then the following condiTheorem 1.9 Consider a function φ : Rn → R. tions are equivalent: (i) φ is lsc over Rn . (ii) The epigraph of φ, epi φ, is a closed set in Rn × R. (iii) The lower-level set lev≤α φ = {x ∈ Rn : φ(x) ≤ α} is closed for every α ∈ R. Proof. If φ ≡ ∞, the result holds trivially. So assume that dom φ is nonempty and thus epi φ and lev≤α φ are nonempty. We will first show that (i) implies (ii). Consider a sequence {(xk , αk )} ⊂ epi φ such that (xk , αk ) → (¯ x, α). ¯ Therefore, φ(xk ) ≤ αk , which implies that lim inf φ(x) ≤ lim inf φ(xk ) ≤ α ¯. x→¯ x
By the lower semicontinuity of φ, φ(x̄) = lim inf_{x→x̄} φ(x),
which reduces the preceding condition to φ(¯ x) ≤ α, ¯ thereby proving that epi φ is a closed set in Rn × R. Next we show that (ii) implies (iii). For a fixed α ∈ R, suppose that {xk } ⊂ lev≤α φ such that xk → x ¯. Therefore, φ(xk ) ≤ α, that is, (xk , α) ∈ epi φ. By (ii), epi φ is closed, which implies (¯ x, α) ∈ epi φ, that is, φ(¯ x) ≤ α. Thus, x ¯ ∈ lev≤α φ, thereby yielding condition (iii).
Finally, to obtain the equivalence, we will establish that (iii) implies (i). To show that φ is lsc, we need to show that for every x ¯ ∈ Rn , φ(¯ x) ≤ lim inf φ(xk ) k→∞
whenever
xk → x ¯.
On the contrary, assume that for some x ¯ ∈ Rn and some sequence xk → x ¯, φ(¯ x) > lim inf φ(xk ), k→∞
which implies there exists α ∈ R such that φ(¯ x) > α > lim inf φ(xk ). k→∞
(1.1)
Thus, there exists a subsequence, without relabeling, say {xk } such that φ(xk ) ≤ α for every k ∈ N, which implies xk ∈ lev≤α φ. By (iii), the lower level set lev≤α φ is closed and hence x ¯ ∈ lev≤α φ, that is, φ(¯ x) ≤ α, which contradicts (1.1). Therefore, φ is lsc over Rn . The proof of the last implication, that is, (iii) implies (i) of Theorem 1.9 by contradiction was from Bertsekas [12]. We present an alternative proof for the same from Rockafellar and Wets [101]. It is obvious that for any x ¯ ∈ Rn , α ¯ = lim inf φ(x) ≤ φ(¯ x). x→¯ x
Therefore, to establish the lower semicontinuity of φ at x ¯, we need to prove that φ(¯ x) ≤ α ¯ . By Lemma 1.8, there exists a sequence {xk } ⊂ Rn with xk → x ¯ such that φ(xk ) → α. ¯ Thus, for every α > α ¯ , φ(xk ) ≤ α, which implies xk ∈ lev≤α φ. Now if condition (iii) of the above theorem holds, that is, lev≤α φ is closed in Rn , x ¯ ∈ lev≤α φ, ∀ α > α ¯. Thus, φ(¯ x) ≤ α, which leads to φ(¯ x) ≤ α ¯ . Because x ¯ ∈ Rn was arbitrarily n chosen, φ is lsc over R . Theorem 1.9 gives equivalent characterization of lower semicontinuity of a function. But if the function is not lsc, its epigraph is not closed. The result below gives an equivalent characterization of the closure of the epigraph of any arbitrary function. ¯ Proposition 1.10 For any arbitrary extended-valued function φ : Rn → R, (¯ x, α) ¯ ∈ cl epi φ if and only if lim inf φ(x) ≤ α ¯. x→¯ x
Proof. Suppose that (¯ x, α) ¯ ∈ cl epi φ, which implies that there exists {(xk , αk )} ⊂ epi φ such that (xk , αk ) → (¯ x, α). ¯ Thus, taking the limit as k → +∞, the condition lim inf φ(x) ≤ lim inf φ(xk ) xk →¯ x
yields lim inf_{x→x̄} φ(x) ≤ ᾱ,
as desired. Conversely, assume that lim inf x→¯x φ(x) ≤ α ¯ but (¯ x, α) ¯ 6∈ cl epi φ. We claim that, lim inf x→¯x φ(x) = α. ¯ On the contrary, suppose that lim inf x→¯x φ(x) < α ¯ . As (¯ x, α) ¯ ∈ / cl epi φ, there exists δ¯ > 0 such that for ¯ every δ ∈ (0, δ), Bδ ((¯ x, α)) ¯ ∩ cl epi φ = ∅, which implies for every (x, α) ∈ Bδ ((¯ x, α)), ¯ φ(x) > α. In particular for (x, α) ¯ ∈ Bδ ((¯ x, α)), ¯ φ(x) > α ¯ , that is, φ(x) > α ¯ , ∀ x ∈ Bδ (¯ x). Therefore, taking the limit as δ → 0 along with the definition of limit infimum of a function yields lim inf φ(x) ≥ α ¯, x→¯ x
which is a contradiction. Therefore, lim inf x→¯x φ(x) = α. ¯ By Lemma 1.8, there exists a sequence xk → x ¯ such that φ(xk ) → α ¯ . Because (xk , φ(xk )) ∈ epi φ, (¯ x, α) ¯ ∈ cl epi φ, thereby reaching a contradiction and hence the result. Now the question is whether it is possible to construct a function that is the closure of the epigraph of another function. This leads to the concept of closure of a function. ¯ an lsc function that is conDefinition 1.11 For any function φ : Rn → R, structed in such a way that its epigraph is the closure of the epigraph of φ is called the lower semicontinuous hull or the closure of the function φ and is denoted by cl φ. Therefore, epi(cl φ) = cl epi φ. Equivalently, the closure of φ is defined as cl φ(¯ x) = lim inf φ(x), ∀ x ¯ ∈ Rn . x→¯ x
By Proposition 1.10, it is obvious that (¯ x, α) ¯ ∈ cl epi φ if and only if (¯ x, α) ¯ ∈ epi cl φ. The function φ is said to be closed if cl φ = φ.
FIGURE 1.1: Lower semicontinuous hull.
If φ is lsc, then it is closed as well. Also cl φ is lsc and the greatest of all lsc functions ψ such that ψ(x) ≤ φ(x) for every x ∈ Rn . From Theorem 1.9, one has that closedness is the same as lower semicontinuity over Rn . In this discussion, the function φ was defined over Rn . But what if φ is defined over some subset of Rn . Then one cannot talk about the lower semicontinuity of the function over Rn . In such a case, how is the closedness of a function related to lower semicontinuity? This issue was addressed by Bertsekas [12]. Consider ¯ Observe that here we define φ over a set F ⊂ Rn and a function φ : F → R. n the set F and not R . The function φ can be extended over Rn by defining a ¯ as function φ¯ : Rn → R φ(x), x ∈ F, ¯ φ(x) = +∞, otherwise. Note that both the extended-valued functions φ and φ¯ have the same epigraph. Thus from the above discussion, one has φ is closed if and only if φ¯ is lsc over Rn . Also observe that the lower semicontinuity of φ over dom φ is not sufficient for φ to be closed. In addition, one has to assume the closedness of dom φ. ¯ To emphasize this fact, let us consider a simple example. Consider φ : R → R defined as 0, x ∈ (−1, 1), φ(x) = +∞, otherwise. Here, dom φ = (−1, 1) over which the function is lsc but epi φ is not closed and hence, φ is not closed. The closure of φ is given by 0, x ∈ [−1, 1], cl φ(x) = +∞, otherwise. Observe that in Figure 1.1, epi φ is not closed while epi cl φ is closed. Therefore, we have the following result from Bertsekas [12].
¯ If dom φ is Proposition 1.12 Consider F ⊂ Rn and a function φ : F → R. closed and φ is lsc over dom φ, then φ is closed. Because we are interested in studying the minimization problem, it is important to know whether a minimizer exists or not. In this respect, we have the classical Weierstrass theorem, according to which “A continuous function attains its minimum over a compact set.” For a more general version of this theorem from Bertsekas [12], we require the notion of coercivity. ¯ is said to be coercive over a set Definition 1.13 A function φ : Rn → R F ⊂ Rn if for every sequence {xk } ⊂ F lim φ(xk ) = +∞
as k → ∞ whenever ‖xk‖ → +∞.
For F = Rn , φ is simply called coercive. Observe that for a coercive function, every nonempty lower level set is bounded. Below we prove the Weierstrass Theorem. Theorem 1.14 (Weierstrass Theorem) Consider a proper lsc function ¯ and assume that one of the following holds: φ : Rn → R (i) dom φ is bounded. (ii) there exists α ∈ R such that the lower level set lev≤α φ is nonempty and bounded. (iii) φ is coercive. Then the set of minimizers of φ over Rn is nonempty and compact. Proof. Suppose that condition (i) holds, that is, dom φ is bounded. Because φ is proper, φ(x) > −∞ for every x ∈ Rn and dom φ is nonempty. Denote φinf = inf x∈Rn φ(x), which implies φinf = inf x∈dom φ φ(x). Therefore, there exists a sequence {xk } ⊂ dom φ such that φ(xk ) → φinf . Because dom φ is bounded, {xk } is a bounded sequence, which by Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent subsequence. Without loss of generality, assume that xk → x ¯. By the lower semicontinuity of φ, φ(¯ x) ≤ lim inf φ(xk ) = lim φ(xk ) = φinf , k→∞
k→∞
which implies that x ¯ is a point of minimizer of φ over Rn . Denote the set of minimizers by S. Therefore, x ¯ ∈ S and hence S is nonempty. Because S ⊂ dom φ which is bounded, S is a bounded set. Also, S is the intersection of the lower level sets lev≤α φ, where α > m. For an lsc function φ, lev≤α φ is closed by Theorem 1.9 and thus S is closed. Hence S is compact. Assume that condition (ii) holds; that is, for some α ∈ R, the lower level
set lev≤α φ is nonempty and bounded. Consider a proper function φ̄ : Rn → R̄ defined as

φ̄(x) = { φ(x),  φ(x) ≤ α,
          +∞,   otherwise.

Therefore, dom φ̄ = lev≤α φ, which is nonempty and bounded by condition (ii). Because φ is lsc, Theorem 1.9 implies that dom φ̄ is closed. Also, by the lower semicontinuity of φ along with Proposition 1.12, φ̄ is closed and hence lsc. Moreover, the set of minimizers of φ̄ is the same as that of φ. The result follows by applying condition (i) to φ̄.
Suppose that condition (iii) is satisfied, that is, φ is coercive. Because φ is proper, dom φ is nonempty and thus has a nonempty lower level set. By the coercivity of φ, it is obvious that the nonempty lower level sets of φ are bounded, thereby satisfying condition (ii), and therefore leading to the desired result.
As we all know, the next concept that comes to mind after limit and continuity is the derivative of a function. Below we define this very notion.

Definition 1.15 For a scalar-valued function φ : Rn → R, the derivative of φ at x̄ is denoted by ∇φ(x̄) ∈ Rn and is defined as

lim_{‖h‖→0} [φ(x̄ + h) − φ(x̄) − ⟨∇φ(x̄), h⟩] / ‖h‖ = 0.

Equivalently, the derivative can also be expressed as

φ(x) = φ(x̄) + ⟨∇φ(x̄), x − x̄⟩ + o(‖x − x̄‖),

where lim_{x→x̄} o(‖x − x̄‖)/‖x − x̄‖ = 0. A function φ is differentiable if it is differentiable at every x ∈ Rn. The derivative, ∇φ(x̄), of φ at x̄ is also called the gradient of φ at x̄, which can be expressed as

∇φ(x̄) = ( ∂φ/∂x1 (x̄), ∂φ/∂x2 (x̄), . . . , ∂φ/∂xn (x̄) ),

where ∂φ/∂xi, i = 1, 2, . . . , n denotes the i-th partial derivative of φ. If φ is continuously differentiable, that is, the map x ↦ ∇φ(x) is continuous over Rn, then φ is called a smooth function. If φ is not smooth, it is called a nonsmooth function.
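The limit in Definition 1.15 can be checked numerically: for a smooth φ, central finite differences should reproduce ∇φ(x̄) up to discretization error. The sketch below does this for one hypothetical function; the test function, point, and step size are chosen only for illustration.

```python
import numpy as np

def phi(x):
    # a smooth sample function on R^2
    return x[0] ** 2 + np.exp(x[0] * x[1])

def grad_phi(x):
    # its analytic gradient
    return np.array([2 * x[0] + x[1] * np.exp(x[0] * x[1]),
                     x[0] * np.exp(x[0] * x[1])])

def fd_gradient(f, x, h=1e-6):
    # central finite-difference approximation of the gradient
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

xbar = np.array([0.3, -0.7])
print(np.max(np.abs(fd_gradient(phi, xbar) - grad_phi(xbar))))  # of the order 1e-9
```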
Similar to the first-order differentiability, we have the second-order differentiability notion as follows.

Definition 1.16 For a scalar-valued function φ : Rn → R, the second-order derivative of φ at x̄ is denoted by ∇²φ(x̄) ∈ Rn×n and is defined as

lim_{‖h‖→0} [φ(x̄ + h) − φ(x̄) − ⟨∇φ(x̄), h⟩ − (1/2)⟨∇²φ(x̄)h, h⟩] / ‖h‖² = 0,

which is equivalent to

φ(x) = φ(x̄) + ⟨∇φ(x̄), x − x̄⟩ + (1/2)⟨∇²φ(x̄)(x − x̄), x − x̄⟩ + o(‖x − x̄‖²).

The matrix ∇²φ(x̄) is also referred to as the Hessian, with the ij-th entry of the matrix being the second-order partial derivative ∂²φ/∂xi∂xj (x̄). If φ is twice continuously differentiable, then the matrix ∇²φ(x̄) is a symmetric matrix.
In the above definitions we considered the function φ to be a scalar-valued function. Next we define the notion of differentiability for a vector-valued function Φ.

Definition 1.17 For a vector-valued function Φ : Rn → Rm, the derivative of Φ at x̄ is denoted by JΦ(x̄) ∈ Rm×n and is defined as

lim_{‖h‖→0} ‖Φ(x̄ + h) − Φ(x̄) − ⟨JΦ(x̄), h⟩‖ / ‖h‖ = 0.

The matrix JΦ(x̄) is also called the Jacobian of Φ at x̄. If Φ = (φ1, φ2, . . . , φm), Φ is differentiable if each φi : Rn → R, i = 1, 2, . . . , m is differentiable. The Jacobian of Φ at x̄ can be expressed as

JΦ(x̄) = [ ∇φ1(x̄)
           ∇φ2(x̄)
             ⋮
           ∇φm(x̄) ],

with the ij-th entry of the matrix being the partial derivative ∂φi/∂xj (x̄). In the above expression of JΦ(x̄), the vectors ∇φ1(x̄), ∇φ2(x̄), . . . , ∇φm(x̄) are written as row vectors. Observe that the derivative is a local concept and it is defined at a point x if x ∈ int dom φ. Below we state the Mean Value Theorem, which plays a pivotal role in the study of optimality conditions.

Theorem 1.18 (Mean Value Theorem) Consider a continuously differentiable function φ : Rn → R. Then for every x, y ∈ Rn, there exists z ∈ [x, y] such that

φ(y) − φ(x) = ⟨∇φ(z), y − x⟩.
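The theorem can be verified directly along a segment: writing z = x + t(y − x), the scalar function g(t) = ⟨∇φ(x + t(y − x)), y − x⟩ − (φ(y) − φ(x)) has a root in [0, 1]. The sketch below locates such a root by bisection for a hypothetical convex φ (convexity makes g nondecreasing, so the endpoints bracket a root); the function and points are invented for illustration.

```python
import numpy as np

def phi(x):
    # a smooth convex sample function
    return np.log(1.0 + np.exp(x[0] + 2.0 * x[1])) + 0.5 * np.dot(x, x)

def grad_phi(x):
    s = 1.0 / (1.0 + np.exp(-(x[0] + 2.0 * x[1])))  # sigmoid of the linear part
    return np.array([s, 2.0 * s]) + x

x = np.array([0.0, 0.0])
y = np.array([1.0, -0.5])
target = phi(y) - phi(x)

def g(t):
    # directional derivative along the segment minus the mean slope
    return np.dot(grad_phi(x + t * (y - x)), y - x) - target

lo, hi = 0.0, 1.0   # convexity gives g(0) <= 0 <= g(1), so bisection applies
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)

z = x + 0.5 * (lo + hi) * (y - x)
print(np.dot(grad_phi(z), y - x) - target)  # ~0, as the Mean Value Theorem asserts
```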
With all these basic concepts we now move on to the study of convexity. The importance of convexity in optimization stems from the fact that whenever we minimize a convex function over a convex set, every local minimum is a global minimum. Many other issues in optimization depend on convexity. However, convex functions suffer from the drawback that they need not be differentiable at every point of their domain of definition and the nondifferentiability may be precisely at the point where the minimum is achieved. For instance, consider the minimization of the absolute value function, |x|, over R. At the point of minima, x ¯ = 0, the function is nondifferentiable. How this major difficulty was overcome by the development of a completely different type of analysis is possibly one of the most thrilling developments in optimization theory. This analysis depends on set-valued maps, which we briefly present below. Definition 1.19 A set-valued map Φ from Rn to Rm associates every x ∈ Rn to a set in Rm ; that is, for every x ∈ Rn , Φ(x) ⊂ Rm . Symbolically it is expressed as Φ : Rn ⇉ Rm . A set-valued map is associated with its graph defined as gph Φ = {(x, y) ∈ Rn × Rm : y ∈ Φ(x)}. Φ is said to be a proper map if there exists x ∈ Rn such that Φ(x) 6= ∅. Φ is said to be closed-valued or convex-valued or bounded-valued if for every x ∈ Rn , the sets Φ(x) are closed or convex or bounded, respectively. Φ is locally bounded at x ¯ ∈ Rn if there exists δ > 0 and a bounded set F ⊂ Rn such that Φ(x) ⊂ V, ∀ x ∈ Bδ (¯ x). The set-valued map Φ is said to be closed if it has a closed graph; that is, for any sequence {xk } ⊂ Rn with xk → x ¯ and yk ∈ Φ(xk ) with yk → y¯, y¯ ∈ Φ(¯ x). A set-valued map Φ : Rn → Rm is said to be upper semicontinuous (usc) at x ¯ ∈ Rn if for any ε > 0, there exists δ > 0 such that Φ(x) ⊂ Φ(¯ x) + εB, ∀ x ∈ Bδ (¯ x), where the balls are in the respective spaces. If Φ is locally bounded and has a closed graph, then it is usc. If Φ is single-valued, that is, Φ(x) is singleton for every x, the upper semicontinuity of Φ coincides with continuity. For more on set-valued maps, the readers may refer to Berge [10]. A detailed analysis of convex function appears in Chapter 2.
1.3 Smooth Convex Optimization
Recall the convex optimization problem (CP ) stated in Section 1.1, that is,
FIGURE 1.2: Graph of a real-valued differentiable convex function.
min f(x)   subject to   x ∈ C,                (CP)
where f : Rn → R is a convex function and C is a closed convex set in Rn . Let us additionally assume that f is differentiable. It is mentioned in Chapter 2 that if f is differentiable, then for any x ∈ Rn , f (y) − f (x) ≥ h∇f (x), y − xi, ∀ y ∈ Rn . Conversely, if the above relation holds for a function, then the function is convex. This fact appears as Theorem 2.81 in the next chapter. It is mentioned there as a consequence of more general facts. However, we provide a direct proof here. Observe that if f is convex, then for any x, y ∈ Rn and any λ ∈ [0, 1], (1 − λ)f (x) + λf (y) ≥ f (x + λ(y − x)). Hence, for any λ ∈ (0, 1), f (y) − f (x) ≥
[f(x + λ(y − x)) − f(x)] / λ.
Taking the limit as λ ↓ 0, the above inequality yields f (y) − f (x) ≥ h∇f (x), y − xi.
(1.2)
For the converse, suppose that (1.2) holds for any x, y ∈ Rn. Setting z = x + λ(y − x) with λ ∈ (0, 1), then

f(y) − f(z) ≥ ⟨∇f(z), y − z⟩,      (1.3)
f(x) − f(z) ≥ ⟨∇f(z), x − z⟩.      (1.4)
The result is obtained by simply multiplying (1.3) with λ and (1.4) with (1 − λ) and then adding them up. This description geometrically means that the tangent plane should always lie below the graph of the function. For a convex function f : R → R, it looks something like Figure 1.2. This important characterization of a convex function leads to the following result. Theorem 1.20 Consider the convex optimization problem (CP ) where f is a differentiable convex function and C is a closed convex set in Rn . Then x ¯ is a point of minimizer of (CP ) if and only if h∇f (¯ x), x − x ¯i ≥ 0, ∀ x ∈ C.
(1.5)
Proof. It is simple to see that as C is a convex set, for x ∈ C, x ¯ + λ(x − x ¯) ∈ C, ∀ λ ∈ [0, 1]. Therefore, if x ¯ is a point of minimum, f (¯ x + λ(x − x ¯)) ≥ f (¯ x), that is, f (¯ x + λ(x − x ¯)) − f (¯ x) ≥ 0. Dividing both sides by λ > 0 and taking the limit as λ ↓ 0 leads to h∇f (¯ x), x − x ¯i ≥ 0. Because x ∈ C was arbitrarily chosen, h∇f (¯ x), x − x ¯i ≥ 0, ∀ x ∈ C. Also as f is convex, by the condition (1.2), for any x ∈ C, f (x) − f (¯ x) ≥ h∇f (¯ x), x − x ¯i. Now if (1.5) is satisfied, then the above inequality reduces to f (x) ≥ f (¯ x), ∀ x ∈ C, thereby proving the requisite result.
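A quick numerical sanity check of this characterization is possible for a simple instance of (CP): take a strongly convex quadratic over a box, compute a minimizer by projected gradient iterations, and then test condition (1.5) at randomly sampled feasible points. The quadratic, box, step size, and sample size below are invented purely for illustration and are not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = 0.5 <x, Ax> - <b, x> with A positive definite, over the box C = [0, 1]^2
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([4.0, -1.0])
grad_f = lambda x: A @ x - b
lo, hi = np.zeros(2), np.ones(2)

# projected gradient iteration: x <- proj_C(x - t * grad f(x))
x = np.zeros(2)
t = 1.0 / np.linalg.norm(A, 2)
for _ in range(5000):
    x = np.clip(x - t * grad_f(x), lo, hi)

# check the optimality condition (1.5): <grad f(xbar), y - xbar> >= 0 for all y in C
ys = rng.uniform(lo, hi, size=(10000, 2))
print(np.min((ys - x) @ grad_f(x)))  # nonnegative up to the accuracy of the iterate
```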
Remark 1.21 Expressing the optimality condition in the form of (1.5) leads to what is called a variational inequality. Let F : Rn → Rn be a given function and C be a closed convex set in Rn . Then the variational inequality V I(F, C) is the problem of finding x ¯ ∈ C such that hF (¯ x), x − x ¯i ≥ 0, ∀ x ∈ C.
FIGURE 1.3: Local minimizer is global minimizer.
When f is a differentiable convex function, for F = ∇f , V I(∇f, C) is nothing but the condition (1.5). In order to solve V I(F, C) efficiently, one needs an additional property on F which is monotonicity. A function F : Rn → Rn is called monotone if for any x, y ∈ Rn , hF (y) − F (x), y − xi ≥ 0. However, when f is a convex function, one has the following pleasant property. Theorem 1.22 A differentiable function f is convex if and only if ∇f is monotone. For proof, see Rockafellar [97]. However, the reader should try to prove it on his/her own. We have shown that when (CP ) has a smooth f , one can write down a necessary and sufficient condition for a point x¯ ∈ C to be a global minimizer of (CP ). In fact, as already mentioned, the importance of studying convexity in optimization stems from the following fact. For the problem (CP ), every local minimizer is a global minimizer irrespective of the fact whether f is smooth or not. This can be proved in a simple way as follows. If x¯ is a local minimizer of (CP ), then there exists δ > 0 such that f (x) ≥ f (¯ x), ∀ x ∈ C ∩ Bδ . Now consider any x ∈ C. Then it is easy to observe from Figure 1.3 that there exists λ0 ∈ (0, 1) such that for every λ ∈ (0, λ0 ), λx + (1 − λ)¯ x ∈ C ∩ Bδ . Hence f (λx + (1 − λ)¯ x) ≥ f (¯ x).
The convexity of f shows that λ(f (x) − f (¯ x)) ≥ 0. Because λ > 0, f (x) ≥ f (¯ x). As x ∈ C was arbitrary, our claim is established. The result can also be obtained using the approach of contradiction as done in Theorem 2.90. Now consider the following function θ(x) = sup h∇f (x), x − yi. y∈C
The interesting feature of the function is that θ(x) ≥ 0, ∀ x ∈ C and if θ(x) = 0 for x ∈ C, then x solves the problem (CP). Furthermore, if x solves the problem (CP), we have θ(x) = 0. The function θ is usually called the gap function or the merit function associated with (CP). For the variational inequality problem, such a function was first introduced by Auslender [5]. The next question is how useful the function θ is to the problem (CP). What we will now show is that for certain classes of the problem (CP), the function θ can provide an error bound for the problem (CP). By an error bound we mean an upper estimate of the distance of a point in C to the solution set of (CP). The class of convex optimization problems where such a thing can be achieved is the class of strongly convex optimization problems. A function f : Rn → R is strongly convex with modulus of strong convexity ρ > 0 if for any x, y ∈ Rn and λ ∈ [0, 1],

(1 − λ)f(x) + λf(y) ≥ f(x + λ(y − x)) + ρλ(1 − λ)‖y − x‖².

If f is differentiable, then f is strongly convex if and only if for any x, y ∈ Rn,

f(y) − f(x) ≥ ⟨∇f(x), y − x⟩ + ρ‖y − x‖².

Observe that f(x) = (1/2)⟨x, Ax⟩, where x ∈ Rn and A is a positive definite n × n matrix, is strongly convex with ρ = λmin(A), the minimum eigenvalue of A, while f(x) = x with x ∈ Rn is not strongly convex. If f is a twice continuously differentiable strongly convex function, then ∇²f(x) is always positive definite for each x. Now if f is strongly convex with modulus of convexity ρ > 0, then for any x, y ∈ Rn,
f(y) − f(x) ≥ ⟨∇f(x), y − x⟩ + ρ‖y − x‖²,
f(x) − f(y) ≥ ⟨∇f(y), x − y⟩ + ρ‖y − x‖².

Adding the above inequalities leads to

0 ≥ ⟨∇f(x), y − x⟩ + ⟨∇f(y), x − y⟩ + 2ρ‖y − x‖²,
that is,

⟨∇f(y) − ∇f(x), y − x⟩ ≥ 2ρ‖y − x‖².    (1.6)
The property of ∇f given by (1.6) is called strong monotonicity, with 2ρ as the modulus of monotonicity. It is in fact interesting to observe that if f : Rn → R is a differentiable function for which there exists ρ > 0 such that for every x, y ∈ Rn,

⟨∇f(y) − ∇f(x), y − x⟩ ≥ 2ρ‖y − x‖²,

then in particular ⟨∇f(y) − ∇f(x), y − x⟩ ≥ ρ‖y − x‖² for all x, y ∈ Rn. We request the reader to show that f is then strongly convex with modulus ρ > 0. In fact, if f is strongly convex with ρ > 0, one can also show that ∇f is strongly monotone with ρ > 0. Thus we conclude that f is strongly convex with modulus of strong convexity ρ > 0 if and only if ∇f is strongly monotone with modulus of monotonicity ρ > 0.

It is important to note that one cannot guarantee θ to be finite unless C satisfies some additional conditions, for example, C is compact. Assume that C is compact and let x̄ be a solution of the problem (CP), where f is strongly convex. (Think why a solution should exist.) As f is strongly convex, it is simple enough to see that x̄ is the unique solution of (CP). Thus, from the definition of θ, for any x ∈ C and y = x̄,

θ(x) ≥ ⟨∇f(x), x − x̄⟩.

By strong convexity of f with ρ > 0 as the modulus of strong convexity, ∇f is strongly monotone with modulus 2ρ. Thus,

⟨∇f(x), x − x̄⟩ ≥ ⟨∇f(x̄), x − x̄⟩ + 2ρ‖x − x̄‖²,

thereby yielding

θ(x) ≥ ⟨∇f(x̄), x − x̄⟩ + 2ρ‖x − x̄‖².    (1.7)

But by the optimality condition in Theorem 1.20, ⟨∇f(x̄), x − x̄⟩ ≥ 0 for every x ∈ C. Therefore, the inequality (1.7) reduces to θ(x) ≥ 2ρ‖x − x̄‖², which leads to

‖x − x̄‖ ≤ √(θ(x)/(2ρ)).
This provides an error bound for (CP) when f is strongly convex and C is compact. If in this derivation ∇f were only strongly monotone with modulus ρ > 0, then the error bound would take the form

‖x − x̄‖ ≤ √(θ(x)/ρ).

Observe that as ρ > 0,

√(θ(x)/(2ρ)) ≤ √(θ(x)/ρ),

and hence the error bound obtained by using the fact that ∇f is strongly monotone with modulus 2ρ is the sharper one. Now the question is whether we can design a merit function for (CP) that can be used to develop an error bound even when C is noncompact. Such a merit function was first developed by Fukushima [48] for general variational inequalities. In our context, the function is given by

θ̂α(x) = sup_{y ∈ C} { ⟨∇f(x), x − y⟩ − (α/2)‖y − x‖² },  α > 0.

It will be an interesting exercise for the reader to show that θ̂α(x) ≥ 0 for all x ∈ C, and that θ̂α(x) = 0 for x ∈ C if and only if x is a solution of (CP). Observe that

θ̂α(x) = − inf_{y ∈ C} { ⟨∇f(x), y − x⟩ + (α/2)‖y − x‖² },  α > 0.
For a fixed x, observe that the function

φ_x^α(y) = ⟨∇f(x), y − x⟩ + (α/2)‖y − x‖²

is strongly convex and coercive (Definition 1.13). Hence φ_x^α attains its infimum on C, and the point of minimum is unique since φ_x^α is strongly convex. Thus, for each x the function φ_x^α has a finite minimum value, so θ̂α(x) is always finite, thereby leading to the following error bound.

Theorem 1.23 Consider the convex optimization problem (CP) where f is a differentiable strongly convex function with modulus ρ > 0 and C is a closed convex set in Rn. Let x̄ ∈ C be the unique solution of (CP). Furthermore, if ρ > α/2, then for any x ∈ C,

‖x − x̄‖ ≤ √(2θ̂α(x)/(2ρ − α)).
Proof. For any x ∈ C and y = x̄ in particular,

θ̂α(x) ≥ ⟨∇f(x), x − x̄⟩ − (α/2)‖x − x̄‖².

By the fact that ∇f is strongly monotone with modulus ρ > 0,

θ̂α(x) ≥ ⟨∇f(x̄), x − x̄⟩ + ρ‖x − x̄‖² − (α/2)‖x − x̄‖².    (1.8)

Because x̄ is the unique minimizer of (CP), by Theorem 1.20, ⟨∇f(x̄), x − x̄⟩ ≥ 0, thereby reducing the inequality (1.8) to

θ̂α(x) ≥ (ρ − α/2)‖x − x̄‖².

Therefore,

‖x − x̄‖ ≤ √(2θ̂α(x)/(2ρ − α)),

as desired.
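To see Theorem 1.23 in action, the following sketch evaluates the regularized gap function θ̂α for a strongly convex quadratic over a box and checks the error bound numerically. This is a hypothetical illustration and not part of the original text: it assumes NumPy is available, the data A, b and the box bounds are arbitrary choices, and the helper names (grad, proj, theta_hat) are illustrative. For a box, the inner minimization defining θ̂α is separable, so it is solved exactly by projecting x − ∇f(x)/α onto the box componentwise.

```python
import numpy as np

# A hypothetical instance: f(x) = 0.5 x'Ax - b'x over the box C = [0, 1]^2.
A = np.array([[2.0, 0.0], [0.0, 4.0]])   # symmetric positive definite
b = np.array([3.0, -1.0])
l, u = np.array([0.0, 0.0]), np.array([1.0, 1.0])

grad = lambda x: A @ x - b
proj = lambda y: np.clip(y, l, u)        # projection onto the box C

# <grad f(y) - grad f(x), y - x> = (y-x)'A(y-x) >= lambda_min(A)||y-x||^2,
# so rho = lambda_min(A) works as the strong monotonicity modulus of grad f.
rho = np.linalg.eigvalsh(A).min()
alpha = rho                               # any 0 < alpha < 2*rho satisfies rho > alpha/2

def theta_hat(x):
    # inf_{y in C} <grad f(x), y-x> + (alpha/2)||y-x||^2 is attained at
    # y* = proj_C(x - grad f(x)/alpha), since the objective is a separable quadratic in y.
    g = grad(x)
    y = proj(x - g / alpha)
    return -(g @ (y - x) + 0.5 * alpha * np.dot(y - x, y - x))

# Approximate the unique solution x_bar by projected gradient iterations.
x_bar = np.zeros(2)
step = 1.0 / np.linalg.eigvalsh(A).max()
for _ in range(5000):
    x_bar = proj(x_bar - step * grad(x_bar))

# Check the error bound of Theorem 1.23 at a few points of C.
for x in [np.array([0.3, 0.9]), np.array([1.0, 0.0]), np.array([0.5, 0.5])]:
    lhs = np.linalg.norm(x - x_bar)
    rhs = np.sqrt(2.0 * theta_hat(x) / (2.0 * rho - alpha))
    print(lhs <= rhs + 1e-8, lhs, rhs)
```

In this sketch ρ is taken as the strong monotonicity modulus of ∇f, which for the quadratic equals the smallest eigenvalue of A; any α with 0 < α < 2ρ then satisfies the hypothesis ρ > α/2 of the theorem, and the printed comparisons illustrate the bound rather than prove it.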
The reader is urged to show that, under the hypothesis of the above theorem, one can prove the tighter error bound

‖x − x̄‖ ≤ √(2θ̂α(x)/(4ρ − α)).

The study of optimality conditions with C explicitly given by functional constraints will begin in Chapter 3.
Chapter 2 Tools for Convex Optimization
2.1 Introduction
With the basic concepts discussed in the previous chapter, we devote this chapter to the study of concepts from convex analysis. Convex analysis is the branch of mathematics that studies convex objects, namely convex sets, convex functions, and convex optimization theory. These concepts will be used in the subsequent chapters to discuss the details of convex optimization theory and in the development of the book.
2.2 Convex Sets
Recall that for any x, y ∈ Rn, the set

[x, y] = {z ∈ Rn : z = (1 − λ)x + λy, 0 ≤ λ ≤ 1}

denotes the line segment joining the points x and y. The open line segment joining x and y is given by

(x, y) = {z ∈ Rn : z = (1 − λ)x + λy, 0 < λ < 1}.

Definition 2.1 A set F ⊂ Rn is a convex set if

λx + (1 − λ)y ∈ F, ∀ x, y ∈ F, ∀ λ ∈ [0, 1].

Equivalently, for any x, y ∈ F, the line segment [x, y] is contained in F. Figure 2.1 presents convex and nonconvex sets. Consider the hyperplane defined as H(a, b) = {x ∈ Rn : ⟨a, x⟩ = b}, where a ∈ Rn and b ∈ R. Observe that it is a convex set. Similarly, the closed half spaces given by

H≤(a, b) = {x ∈ Rn : ⟨a, x⟩ ≤ b}
and H≥(a, b) = {x ∈ Rn : ⟨a, x⟩ ≥ b},
FIGURE 2.1: Convex and nonconvex sets.
and the open half spaces given by

H<(a, b) = {x ∈ Rn : ⟨a, x⟩ < b}   and   H>(a, b) = {x ∈ Rn : ⟨a, x⟩ > b}
are also convex. Another class of sets that are also convex is the class of affine sets.

Definition 2.2 A set M ⊂ Rn is said to be an affine set if

(1 − λ)x + λy ∈ M, ∀ x, y ∈ M, ∀ λ ∈ R,

where the set {z ∈ Rn : z = (1 − λ)x + λy, λ ∈ R} denotes the line passing through x and y. Equivalently, M ⊂ Rn is affine if for any x, y ∈ M, the line passing through them is contained in M. Note that a hyperplane is an example of an affine set. The empty set ∅ and the whole space Rn are the trivial examples of affine sets. Even though affine sets are convex, the converse need not be true, as is obvious from the example of half spaces. Next we state some basic properties of convex sets.

Proposition 2.3 (i) The intersection of an arbitrary collection of convex sets is convex.

(ii) For two convex sets F1, F2 ⊂ Rn, F1 + F2 is convex.
(iii) For a convex set F ⊂ Rn and scalar λ ∈ R, λF is convex.

(iv) For a convex set F ⊂ Rn and scalars λ1 ≥ 0 and λ2 ≥ 0, (λ1 + λ2)F = λ1F + λ2F, which is convex.

Proof. The properties (i)–(iii) can be established by simply using Definition 2.1. The readers are urged to prove (i)–(iii) on their own. Here we will prove only (iv). Consider z ∈ (λ1 + λ2)F. Thus, there exists x ∈ F such that

z = (λ1 + λ2)x = λ1x + λ2x ∈ λ1F + λ2F.
Because z ∈ (λ1 + λ2)F was arbitrary,

(λ1 + λ2)F ⊂ λ1F + λ2F.    (2.1)

Conversely, let z ∈ λ1F + λ2F, which implies that there exist x1, x2 ∈ F such that

z = λ1x1 + λ2x2 = (λ1 + λ2) [ (λ1/(λ1 + λ2)) x1 + (λ2/(λ1 + λ2)) x2 ].    (2.2)

Because λi/(λ1 + λ2) ∈ [0, 1] for i = 1, 2, the convexity of F implies that

x = (λ1/(λ1 + λ2)) x1 + (λ2/(λ1 + λ2)) x2 ∈ F.    (2.3)
Combining the conditions (2.2) and (2.3) leads to

z = (λ1 + λ2)x ∈ (λ1 + λ2)F.

As z ∈ λ1F + λ2F was arbitrarily chosen, (λ1 + λ2)F ⊃ λ1F + λ2F, which along with the inclusion (2.1) yields the desired equality. Observe that (ii) and (iii) lead to the convexity of (λ1 + λ2)F = λ1F + λ2F.

From Proposition 2.3, it is obvious that the intersection of finitely many closed half spaces is again a convex set. Sets that can be expressed in this form are called polyhedral sets. These sets play an important role in linear programming problems. We will deal with polyhedral sets later in this chapter. However, unlike the intersection and the sum of convex sets, the union as well as the complement of convex sets need not be convex. For instance, consider the sets

F1 = {(x, y) ∈ R2 : x² + y² ≤ 1}
and F2 = {(x, y) ∈ R2 : y ≥ x²}.
Observe from Figure 2.2 that both F1 and F2, along with their intersection, are convex sets, but neither their complements nor the union of these two sets is convex. To overcome such situations, where nonconvex sets come into the picture in convex analysis, one has to convexify the nonconvex sets. This leads to the notions of convex combination and convex hull.

Definition 2.4 A point x ∈ Rn is said to be a convex combination of the points x1, x2, . . . , xm ∈ Rn if

x = λ1x1 + λ2x2 + · · · + λmxm

with λi ≥ 0, i = 1, 2, . . . , m, and Σ_{i=1}^m λi = 1.
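Definition 2.4 also has a direct computational counterpart: deciding whether a given point is a convex combination of finitely many points x1, . . . , xm is a small linear feasibility problem in the weights λ. The sketch below is a hypothetical illustration (it assumes NumPy and SciPy are available; the function name and the sample points are arbitrary choices):

```python
import numpy as np
from scipy.optimize import linprog

def is_convex_combination(x, points):
    """Check whether x = sum_i lam_i * points[i] with lam_i >= 0 and sum_i lam_i = 1."""
    X = np.asarray(points, dtype=float)          # shape (m, n)
    m, n = X.shape
    A_eq = np.vstack([X.T, np.ones((1, m))])     # n equations for x, one for sum(lam) = 1
    b_eq = np.concatenate([np.asarray(x, dtype=float), [1.0]])
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    return res.success                           # feasible <=> x is a convex combination

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]        # vertices of a triangle
print(is_convex_combination((0.25, 0.25), pts))   # True: inside the triangle
print(is_convex_combination((0.8, 0.8), pts))     # False: outside the triangle
```

The feasibility viewpoint anticipates Theorem 2.7 below: the points reachable this way are exactly the convex hull of the given finite set.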
FIGURE 2.2: F1 , F2 , and F1 ∩ F2 are convex while F1c , F2c , and F1 ∪ F2 are nonconvex.
The next result expresses the concept of a convex set in terms of the convex combinations of its elements.

Theorem 2.5 A set F ⊂ Rn is convex if and only if it contains all the convex combinations of its elements.

Proof. From Definition 2.1 of a convex set, F ⊂ Rn is convex if and only if

(1 − λ)x1 + λx2 ∈ F, ∀ x1, x2 ∈ F, ∀ λ ∈ [0, 1],

that is, every convex combination of m = 2 elements belongs to F. To establish the result, we will use induction. Suppose that every convex combination of m = l − 1 elements of F belongs to F, and consider m = l. A convex combination of l elements is

x = λ1x1 + λ2x2 + · · · + λlxl,
where xi ∈ F and λi ≥ 0, i = 1, 2, . . . , l, with Σ_{i=1}^l λi = 1. Because Σ_{i=1}^l λi = 1, there exists at least one λj > 0 for some j ∈ {1, 2, . . . , l}. (If λj = 1, then x = xj ∈ F trivially, so we may take λj ∈ (0, 1).) Denote

x̃ = λ̃1x1 + · · · + λ̃j−1xj−1 + λ̃j+1xj+1 + · · · + λ̃lxl,

where λ̃i = λi/(1 − λj) ≥ 0, i = 1, . . . , j − 1, j + 1, . . . , l. Observe that Σ_{i=1, i≠j}^l λ̃i = 1 and thus x̃ ∈ F because it is a convex combination of l − 1 elements of F. The element x can now be expressed in terms of xj and x̃ as

x = λjxj + (1 − λj)x̃.

Therefore, x is a convex combination of two elements of F, which implies that x ∈ F. Thus a convex set contains all the convex combinations of its elements, as desired.

Similar to the concept of convex combination of points, next we introduce the notion of the convex hull of a set.

Definition 2.6 The convex hull of a set F ⊂ Rn is the smallest convex set containing F and is denoted by co F. It is nothing but the intersection of all the convex sets containing F. Further, the convex hull of F can be expressed in terms of the convex combinations of the elements of F, as presented in the theorem below.

Theorem 2.7 For any set F ⊂ Rn, the convex hull of F, co F, consists of all the convex combinations of the elements of F, that is,

co F = {x ∈ Rn : x = Σ_{i=1}^m λi xi, xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, m ∈ N}.

Proof. Denote the set of all convex combinations of the elements of F by ℱ, that is,

ℱ = {x ∈ Rn : x = Σ_{i=1}^m λi xi, xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, m ∈ N}.
From Definition 2.6, co F is the smallest convex set containing F. Therefore, F ⊂ co F. By Theorem 2.5, the convex combinations of the elements of F also belong to the convex set co F, that is,

co F ⊃ ℱ.    (2.4)
To establish the result, we will show that ℱ is also convex. Suppose that x, x̃ ∈ ℱ, which implies there exist xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, with Σ_{i=1}^m λi = 1 and x̃i ∈ F, λ̃i ≥ 0, i = 1, 2, . . . , l, with Σ_{i=1}^l λ̃i = 1 such that

x = λ1x1 + λ2x2 + · · · + λmxm,
x̃ = λ̃1x̃1 + λ̃2x̃2 + · · · + λ̃lx̃l.

For any λ ∈ [0, 1],

(1 − λ)x + λx̃ = (1 − λ)λ1x1 + · · · + (1 − λ)λmxm + λλ̃1x̃1 + · · · + λλ̃lx̃l.

Observe that (1 − λ)λi ≥ 0, i = 1, 2, . . . , m, and λλ̃i ≥ 0, i = 1, 2, . . . , l, satisfying

(1 − λ) Σ_{i=1}^m λi + λ Σ_{i=1}^l λ̃i = 1.

Thus, for any λ ∈ [0, 1], (1 − λ)x + λx̃ ∈ ℱ. As x, x̃ ∈ ℱ were arbitrary, ℱ is convex. Also, F ⊂ ℱ. Therefore, by Definition 2.6 of the convex hull, co F ⊂ ℱ, which along with the inclusion (2.4) leads to the desired result.

It follows from the above discussion that a set F ⊂ Rn is convex if and only if co F = F, equivalently, if and only if F contains all the convex combinations of its elements. From the above theorem we observe that co F is expressed in terms of convex combinations of m elements of F, where m is arbitrary. But the obvious question is how large this m has to be chosen in the result. This is answered in the famous Carathéodory Theorem, which we present next. Though one finds various approaches to prove the result [12, 97, 101], we present a simple proof from Mangasarian [82].

Theorem 2.8 (Carathéodory Theorem) Consider a nonempty set F ⊂ Rn. Then any point of the convex hull of F is representable as a convex combination of at most n + 1 points of F.

Proof. From Theorem 2.7, any element in co F can be expressed as a convex combination of m elements of F. We have to show that m can be taken to be at most n + 1. Suppose that x ∈ co F, which implies that there exist xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, with Σ_{i=1}^m λi = 1 such that

x = λ1x1 + λ2x2 + · · · + λmxm.
Assume that m > (n + 1). We will prove that x can be expressed as a convex combination of (m − 1) elements. The result can be established by applying
the reduction process until m = n + 1. In case λi = 0 for some i ∈ {1, 2, . . . , m}, x is a convex combination of (m − 1) elements of F. So suppose that λi > 0, i = 1, 2, . . . , m. It is known that for l > n, any l vectors in Rn are linearly dependent. As m − 1 > n, there exist αi ∈ R, i = 1, 2, . . . , m − 1, not all zero, such that

α1(x1 − xm) + α2(x2 − xm) + · · · + αm−1(xm−1 − xm) = 0.

Define αm = −(α1 + α2 + · · · + αm−1). Observe that

Σ_{i=1}^m αi = 0   and   Σ_{i=1}^m αi xi = 0.    (2.5)
Define λ̃i = λi − γαi, i = 1, 2, . . . , m, where γ > 0 is chosen such that λ̃i ≥ 0 for every i ∈ {1, 2, . . . , m} and λ̃j = 0 for some j ∈ {1, 2, . . . , m}. This is possible by taking

1/γ = max_{i=1,...,m} αi/λi = αj/λj.

By this choice, λ̃j = 0 and λ̃i ≥ 0, i = 1, . . . , j − 1, j + 1, . . . , m, which by the condition (2.5) yields

Σ_{i=1, i≠j}^m λ̃i = Σ_{i=1}^m λ̃i = Σ_{i=1}^m λi − γ Σ_{i=1}^m αi = 1

and

x = Σ_{i=1}^m λi xi = Σ_{i=1}^m λ̃i xi + γ Σ_{i=1}^m αi xi = Σ_{i=1, i≠j}^m λ̃i xi,
which implies that x is now expressed as a convex combination of (m − 1) elements of F. This reduction can be carried out until m = n + 1, and thus any element in the convex hull of F is representable as a convex combination of at most n + 1 elements of F, as desired.

Using the above theorem, we have the following important result for a compact set, from Bertsekas [12] and Rockafellar [97].

Theorem 2.9 For a compact set F ⊂ Rn, its convex hull co F is also a compact set.

Proof. We claim that co F is closed. Consider a sequence {z_k} ⊂ co F. By the Carathéodory Theorem, Theorem 2.8, there exist sequences {λ_i^k} ⊂ R+, i = 1, 2, . . . , n + 1, satisfying Σ_{i=1}^{n+1} λ_i^k = 1 and {x_i^k} ⊂ F, i = 1, 2, . . . , n + 1, such that

z_k = Σ_{i=1}^{n+1} λ_i^k x_i^k, ∀ k ∈ N.    (2.6)
As for any k ∈ N, λ_i^k ≥ 0, i = 1, 2, . . . , n + 1, with Σ_{i=1}^{n+1} λ_i^k = 1, each {λ_i^k} is a bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, {λ_i^k}, i = 1, 2, . . . , n + 1, has a convergent subsequence. Without loss of generality, assume that λ_i^k → λi, i = 1, 2, . . . , n + 1, where λi ≥ 0, i = 1, 2, . . . , n + 1, with Σ_{i=1}^{n+1} λi = 1. By the compactness of F, the sequence {x_i^k}, i = 1, 2, . . . , n + 1, is bounded. Again by the Bolzano–Weierstrass Theorem, {x_i^k} has a convergent subsequence. Without loss of generality, let x_i^k → xi, i = 1, 2, . . . , n + 1. By the compactness of F, F is a closed set and thus xi ∈ F, i = 1, 2, . . . , n + 1. Taking the limit as k → +∞, (2.6) yields that

z_k → z =
Σ_{i=1}^{n+1} λi xi ∈ co F.
Because {z_k} ⊂ co F was an arbitrary sequence, co F is a closed set. To prove that co F is compact, we will establish that co F is bounded. As F is compact, it is a bounded set, which implies that there exists M > 0 such that ‖x‖ ≤ M for every x ∈ F. Now consider z ∈ co F, which by the Carathéodory Theorem implies that there exist λi ≥ 0, i = 1, 2, . . . , n + 1, satisfying Σ_{i=1}^{n+1} λi = 1 and xi ∈ F, i = 1, 2, . . . , n + 1, such that

z = Σ_{i=1}^{n+1} λi xi.
Therefore, the boundedness of F along with the fact that λi ∈ [0, 1], i = 1, 2, . . . , n + 1, yields that

‖z‖ = ‖Σ_{i=1}^{n+1} λi xi‖ ≤ Σ_{i=1}^{n+1} λi ‖xi‖ ≤ (n + 1)M.
Because z ∈ co F was arbitrary, every element of co F is bounded in norm by (n + 1)M and thus co F is bounded. Hence co F is a compact set.

However, the above result does not hold if one replaces the compactness of F by mere closedness of the set. To verify this fact, we present an example from Bertsekas [12]. Consider the closed set F defined as

F = {(0, 0)} ∪ {(x1, x2) ∈ R2 : x1x2 ≥ 1, x1 ≥ 0, x2 ≥ 0},

whose convex hull is

co F = {(0, 0)} ∪ {(x1, x2) ∈ R2 : x1 > 0, x2 > 0},

which is not a closed set. Now, similar to the concepts of convex combination and convex hull, we present the notions of affine combination and affine hull.
Definition 2.10 A point x ∈ Rn is said to be an affine combination of the points x1, x2, . . . , xm ∈ Rn if

x = λ1x1 + λ2x2 + · · · + λmxm

with λi ∈ R, i = 1, 2, . . . , m, and Σ_{i=1}^m λi = 1.
Definition 2.11 The affine hull of a set F ⊂ Rn is the smallest affine set containing F and is denoted by aff F. It consists of all affine combinations of the elements of F.

Now we move on to the properties of the closure, interior, and relative interior of convex sets.

Definition 2.12 The closure of a set F ⊂ Rn, cl F, is expressed as

cl F = ∩_{ε>0} (F + εB) = ∩_{ε>0} {x + εB : x ∈ F},

while the interior of the set F, int F, is defined as

int F = {x ∈ Rn : there exists ε > 0 such that (x + εB) ⊂ F}.

It is well known that an arbitrary intersection of closed sets is closed, but the same is not true for unions. However, the following relation holds: for any arbitrary family of sets {Fλ}, λ ∈ Λ, where the index set Λ may possibly be infinite,

∪_{λ∈Λ} cl Fλ ⊂ cl ∪_{λ∈Λ} Fλ.
The notion of interior suffers from the drawback that even for a nonempty convex set it may turn out to be empty. For example, consider a line in R2. From the above definition, it is obvious that its interior is empty. But the set of interior points relative to the affine hull of the set is nonempty. This motivates us to introduce the notion of relative interior.

Definition 2.13 The relative interior of a convex set F ⊂ Rn, ri F, is the interior of F relative to the affine hull of F, that is,

ri F = {x ∈ F : there exists ε > 0 such that (x + εB) ∩ aff F ⊂ F}.

For an n-dimensional convex set F ⊂ Rn, aff F = Rn and thus ri F = int F. Though the notion of relative interior helps in overcoming the emptiness of the interior of a convex set, it also suffers from a drawback. For nonempty convex sets F1, F2 ⊂ Rn,

F1 ⊂ F2  =⇒  cl F1 ⊂ cl F2  and  int F1 ⊂ int F2,
but ri F1 ⊂ ri F2 need not hold. For instance, consider F1 = {(0, 0)} and F2 = {(0, y) ∈ R2 : y ≥ 0}. Here F1 ⊂ F2 with ri F1 = {(0, 0)} and ri F2 = {(0, y) ∈ R2 : y > 0}. Here the relative interiors are nonempty and disjoint. Next we present some properties of closure and relative interior of convex sets. The proofs are from Bertsekas [11, 12] and Rockafellar [97]. Proposition 2.14 Consider a nonempty convex set F ⊂ Rn . Then the following hold: (i) ri F is nonempty. (ii) (Line Segment Principle) Let x ∈ ri F and y ∈ cl F . Then for λ ∈ [0, 1), (1 − λ)x + λy ∈ ri F. (iii) (Prolongation Principle) x ∈ ri F if and only if every line segment in F having x as one end point can be prolonged beyond x without leaving F , that is, for every y ∈ F there exists γ > 1 such that x + (γ − 1)(x − y) ∈ F. (iv) ri F and cl F are convex sets with the same affine hulls as that of F . Proof. (i) Without loss of generality assume that 0 ∈ F . Then the affine hull of F , af f F , is a subspace containing F . Denote the dimension of af f F by m. If m = 0, then F as well as af f F consist of a single point and hence ri F is the point itself, thus proving the result. Suppose that m > 0. Then one can always find linearly independent elements x1 , x2 , . . . , xm from F such that af f F = span{x1 , x2 , . . . , xm }, that is, x1 , x2 , . . . , xm form a basis of the subspace af f F . If this was not possible, then there exist linearly independent elements y1 , y2 , . . . , yl with l < m from F such that F ⊂ span{y1 , y2 , . . . , yl }, thereby contradicting the fact that the dimension of af f F is m. Observe that co {0, x1 , x2 , . . . , xm } ⊂ F has a nonempty interior with respect to af f F , which implies co {0, x1 , x2 , . . . , xm } ⊂ ri F , thereby yielding that ri F is nonempty. (ii) Suppose that y ∈ cl F , which implies there exists {yk } ⊂ F such that yk → y. As x ∈ ri F , there exists ε > 0 such that Bε (x) ∩ af f F ⊂ F . For λ ∈ [0, 1), define yλ = (1 − λ)x + λy and yk,λ = (1 − λ)x + λyk . Therefore, from Figure 2.3, it is obvious that each point of B(1−λ)ε (yk,λ ) ∩ af f F is a convex combination of yk and some point from Bε (x) ∩ af f F . By the convexity of F , B(1−λ)ε (yk,λ ) ∩ af f F ⊂ F, ∀ k ∈ N. Because yk → y, yk,λ → yλ . Thus, for sufficiently large k, B(1−λ)ε/2 (yλ ) ⊂ B(1−λ)ε (yk,λ ),
FIGURE 2.3: Line segment principle.
which implies

B_(1−λ)ε/2(yλ) ∩ aff F ⊂ B_(1−λ)ε(yk,λ) ∩ aff F ⊂ F.

Hence, yλ = (1 − λ)x + λy ∈ ri F for λ ∈ [0, 1).

Aliter. The above approach was direct and somewhat cumbersome. In the proof to follow, we use the fact that relative interiors are preserved under one-to-one affine transformations of Rn to itself, and hence these transformations preserve affine hulls. This property simplifies the proofs. If F is an m-dimensional set in Rn, there exists a one-to-one affine transformation of Rn to itself that carries aff F to the subspace

S = {(x1, . . . , xm, xm+1, . . . , xn) ∈ Rn : xm+1 = 0, . . . , xn = 0}.

Thus, S can be considered a copy of Rm. From this view, one can simply consider the case when F ⊂ Rn is an n-dimensional set, which implies that ri F = int F. We will now establish the result for int F instead of ri F. Because y ∈ cl F, y ∈ F + εB for all ε > 0. Therefore, for every ε > 0,

(1 − λ)x + λy + εB ⊂ (1 − λ)x + λ(F + εB) + εB = (1 − λ)[x + (ε(1 + λ)/(1 − λ))B] + λF, ∀ λ ∈ [0, 1).
Because x ∈ int F, we can choose ε > 0 sufficiently small such that

x + (ε(1 + λ)/(1 − λ))B ⊂ F,
which along with the convexity of F reduces the preceding relation to (1 − λ)x + λy + εB ⊂ (1 − λ)F + λF ⊂ F, ∀ λ ∈ [0, 1). Thus, (1 − λ)x + λy ∈ int F for λ ∈ [0, 1) as desired. (iii) For x ∈ ri F , by the definition of relative interior, the condition holds. Conversely, suppose that x ∈ Rn satisfies the condition. We claim that x ∈ ri F . By (i) there exists x ˜ ∈ ri F . If x = x ˜, we are done. So assume that x 6= x ˜. As x ˜ ∈ ri F ⊂ F , by the condition, there exists γ > 1 such that y = x + (γ − 1)(x − x ˜) ∈ F. Therefore, for λ =
1/γ ∈ (0, 1),

x = (1 − λ)x̃ + λy,
which by the fact that y ∈ F ⊂ cl F along with the line segment principle (ii) implies that x ∈ ri F , thereby establishing the result. (iv) Because ri F ⊂ cl F , by (ii) we have that ri F is convex. From (i), we know that there exist x1 , x2 , . . . , xm ∈ F such that af f {x1 , x2 , . . . , xm } = af f F and co {0, x1 , x2 , . . . , xm } ⊂ ri F . Therefore, ri F has an affine hull the same as that of F . By Proposition 2.3, F +εB is convex for every ε > 0. Also, as intersection of convex sets is convex, cl F , which is the intersection of the collection of the sets F + εB over ε > 0 is convex. Because F ⊂ af f F , cl F ⊂ cl af f F = af f F , and as F ⊂ cl F , af f F ⊂ af f cl F , which together implies that the affine hull of cl F coincides with af f F . In the result below we discuss the closure and relative interior operations. The proofs are from Bertsekas [12] and Rockafellar [97]. Proposition 2.15 Consider nonempty convex sets F, F1 , F2 ⊂ Rn . Then the following hold: (i) cl(ri F ) = cl F . (ii) ri(cl F ) = ri F . (iii) ri F1 ∩ ri F2 ⊂ ri (F1 ∩ F2 ) and cl (F1 ∩ F2 ) ⊂ cl F1 ∩ cl F2 . In addition if ri F1 ∩ ri F2 6= ∅, ri F1 ∩ ri F2 = ri (F1 ∩ F2 )
and
cl (F1 ∩ F2 ) = cl F1 ∩ cl F2 .
(iv) Consider a linear transformation L : Rn → Rm . Then L(cl F ) ⊂ cl (LF )
and
L(ri F ) = ri (LF ).
(v) ri (αF ) = α ri F for every α ∈ R. (vi) ri (F1 + F2 ) = ri F1 + ri F2 and cl F1 + cl F2 ⊂ cl (F1 + F2 ). If either F1 or F2 is bounded, cl F1 + cl F2 = cl (F1 + F2 ) Proof. (i) Because ri F ⊂ F , it is obvious that cl(ri F ) ⊂ cl F. Conversely, suppose that y ∈ cl F . We claim that y ∈ cl(ri F ). Consider any x ∈ ri F . By the line segment principle, Proposition 2.14 (ii), for every λ ∈ [0, 1), (1 − λ)x + λy ∈ ri F. Observe that the sequence {(1 − λk )x + λk y} ⊂ ri F is such that as the limit λk → 1, (1 − λk )x + λk y → y, which implies that y ∈ cl(ri F ), as claimed. Hence the result. (ii) We know that F ⊂ cl F and by Proposition 2.14 (iv), af f F = af f (cl F ). Consider x ∈ ri F , which by the definition of relative interior along with the preceding facts imply that there exists ε > 0 such that (x + εB) ∩ af f (cl F ) = (x + εB) ∩ af f F ⊂ F ⊂ cl F, thereby yielding that x ∈ ri (cl F ). Hence, ri F ⊂ ri (cl F ). Conversely, suppose that x ∈ ri (cl F ). We claim that x ∈ ri F . By the nonemptiness of ri F , Proposition 2.14 (i), there exists x ˜ ∈ ri F ⊂ cl F . If in particular x = x ˜, we are done. So assume that x 6= x ˜. We can choose γ > 1, sufficiently close to 1 such that by applying the Prolongation Principle, Proposition 2.14 (iii), y = x + (γ − 1)(x − x ˜) ∈ cl F. Therefore, for λ =
1/γ ∈ (0, 1),

x = (1 − λ)x̃ + λy,
which by the Line Segment Principle, Proposition 2.14 (ii), implies that x ∈ ri F , thereby leading to the requisite result.
(iii) Suppose that x ∈ ri F1 ∩ ri F2 and y ∈ F1 ∩ F2 . By the Prolongation Principle, Proposition 2.14 (iii), there exist γi > 1, i = 1, 2 such that x + (γi − 1)(x − y) ∈ Fi , i = 1, 2. Choosing γ = min{γ1 , γ2 } > 1, the above condition reduces to x + (γ − 1)(x − y) ∈ F1 ∩ F2 , which again by the Prolongation Principle leads to x ∈ ri (F1 ∩ F2 ). Thus, ri F1 ∩ ri F2 ⊂ ri (F1 ∩ F2 ). Because F1 ∩ F2 ⊂ cl F1 ∩ cl F2 , it is obvious that cl (F1 ∩ F2 ) ⊂ cl F1 ∩ cl F2 as intersection of arbitrary closed sets is closed. Assume that ri F1 ∩ ri F2 is nonempty. Suppose that x ∈ ri F1 ∩ ri F2 and y ∈ cl F1 ∩ cl F2 . By the Line Segment Principle, Proposition 2.14 (ii), for every λ ∈ [0, 1), (1 − λ)x + λy ∈ ri F1 ∩ ri F2 . Observe that the sequence {(1 − λk )x + λk y} ⊂ F is such that as λk → 1, (1 − λk )x + λk y → y and hence y ∈ cl (ri F1 ∩ ri F2 ). Therefore, cl F1 ∩ cl F2 ⊂ cl (ri F1 ∩ ri F2 ) ⊂ cl (F1 ∩ F2 ),
(2.7)
thereby yielding the desired equality, that is, cl F1 ∩ cl F2 = cl (F1 ∩ F2 ). Also from the inclusion (2.7), cl (ri F1 ∩ ri F2 ) = cl (F1 ∩ F2 ). By (ii), the above condition leads to ri (ri F1 ∩ ri F2 ) = ri (cl (ri F1 ∩ ri F2 )) = ri (cl (F1 ∩ F2 )) = ri (F1 ∩ F2 ), which implies that ri F1 ∩ ri F2 ⊃ ri (F1 ∩ F2 ), thus establishing the requisite result. (iv) Suppose that x ∈ cl F , which implies there exists a sequence {xk } ⊂ F such that xk → x. Because L is a linear transformation, it is continuous. Therefore, L(xk ) → L(x), which implies L(x) ∈ cl(LF ) and hence L(cl F ) ⊂ cl(LF ).
As ri F ⊂ F , on applying the linear transformation L, L(ri F ) ⊂ LF and thus cl L(ri F ) ⊂ cl (LF ). Also, as F ⊂ cl F , proceeding as before which along with (i) and the closure inclusion yields LF ⊂ L(cl F ) = L(cl (ri F )) ⊂ cl L(ri F ). Therefore, cl (LF ) ⊂ cl L(ri F ), which by earlier condition leads to cl (LF ) = cl L(ri F ). By (ii), ri (LF ) = ri (cl (LF )) = ri (cl L(ri F )) = ri (L(ri F )), thereby yielding ri (LF ) ⊂ L(ri F ). Conversely, suppose that x ¯ ∈ L(ri F ), which implies that there exists x ˜ ∈ ri F such that x ¯ = L(˜ x). Consider any y¯ ∈ LF and corresponding y˜ ∈ F such that y¯ = L(˜ y ). By the Prolongation Principle, Proposition 2.14 (iii), there exists γ > 1 such that (1 − γ)˜ y + γx ˜ ∈ F, which under the linear transformation leads to (1 − γ)¯ y + γx ¯ ∈ LF. Because y¯ ∈ LF was arbitrary, again applying the Prolongation Principle yields x ¯ ∈ ri (LF ), that is, L(ri F ) ⊂ ri (LF ), thereby establishing the desired equality. (v) For arbitrary but fixed α ∈ R, define a linear transformation Lα : Rn → Rn given by Lα (x) = αx. Therefore, for a set F , Lα F = αF . For every α ∈ R, applying (iv) to Lα F leads to α ri F = Lα (ri F ) = ri (Lα F ) = ri (αF ) and hence the result. (vi) Define a linear transformation L : Rn × Rn → Rn given by L(x1 , x2 ) = x1 + x2 which implies L(F1 , F2 ) = F1 + F2 . Now applying (iv) to L yields ri (F1 + F2 ) = ri F1 + ri F2
and
cl F1 + cl F2 ⊂ cl (F1 + F2 ).
To establish the equality in the closure part, assume that F1 is bounded. Suppose that x ∈ cl(F1 + F2 ), which implies that there exist {xki } ⊂ Fi , i = 1, 2, such that xk1 + xk2 → x. Because F1 is bounded, {xk1 } is a bounded sequence, which leads to the boundedness of {xk2 }. By the Bolzano–Weierstrass Theorem, Proposition 1.3, the sequence {(xk1 , xk2 )} has a subsequence converging to (x1 , x2 ) such that x1 +x2 = x. As xi ∈ cl Fi for i = 1, 2, x ∈ cl F1 +cl F2 , hence establishing the result cl F1 + cl F2 = cl (F1 + F2 ).
Note that for the equality part in Proposition 2.15 (iii), the nonemptiness of ri F1 ∩ ri F2 is required, otherwise the equality need not hold. We present an example from Bertsekas [12] to illustrate this fact. Consider the sets F1 = {x ∈ R : x ≥ 0}
and
F2 = {x ∈ R : x ≤ 0}.
Therefore, ri (F1 ∩ F2 ) = {0} 6= ∅ = ri F1 ∩ ri F2 . For the closure part, consider F1 = {x ∈ R : x > 0}
and
F2 = {x ∈ R : x < 0}.
Thus, cl (F1 ∩ F2) = ∅ ≠ {0} = cl F1 ∩ cl F2. Also, the boundedness assumption in (vi) for the closure equality is necessary. For instance, consider the sets

F1 = {(x1, x2) ∈ R2 : x1x2 ≥ 1, x1 > 0, x2 > 0},
F2 = {(x1, x2) ∈ R2 : x1 = 0}.
Here, both F1 and F2 are closed unbounded sets, whereas the sum F1 + F2 = {(x1 , x2 ) ∈ R2 : x1 > 0} is not closed. Thus cl F1 + cl F2 = F1 + F2 $ cl (F1 + F2 ). As a consequence of Proposition 2.15, we have the following result from Rockafellar [97]. Corollary 2.16 (i) Consider two convex sets F1 and F2 in Rn . Then cl F1 = cl F2 if and only if ri F1 = ri F2 . Equivalently, ri F1 ⊂ F2 ⊂ cl F1 . (ii) Consider a convex set F ⊂ Rn . Then any open set that meets cl F also meets ri F . (iii) Consider a convex set F ⊂ Rn and an affine set H ⊂ Rn containing a point from ri F . Then ri (F ∩ H) = ri F ∩ H
and
cl(F ∩ H) = cl F ∩ H.
Proof. (i) Suppose that cl F1 = cl F2. Invoking Proposition 2.15 (ii),

ri F1 = ri(cl F1) = ri(cl F2) = ri F2.    (2.8)

Now assume that ri F1 = ri F2, which by Proposition 2.15 (i) implies that

cl F1 = cl(ri F1) = cl(ri F2) = cl F2.    (2.9)

Combining the relations (2.8) and (2.9) leads to

ri F1 = ri F2 ⊂ F2 ⊂ cl F2 = cl F1,
thereby establishing the desired result. (ii) Denote the open set by O. Suppose that O meets cl F , that is, there exists x ∈ Rn such that x ∈ O ∩ cl F . By Proposition 2.15 (i), cl F = cl(ri F ), which implies x ∈ O ∩ cl(ri F ). Because x ∈ cl(ri F ), there exists {xk } ⊂ ri F such that xk → x. Therefore, for k sufficiently large, one can choose ε¯ > 0 such that xk ∈ x + ε¯B. Also, as O is an open set and x ∈ O, there exists ε˜ > 0 such that x + ε˜B ⊂ O. Define ε = min{¯ ε, ε˜}. Thus for sufficiently large k, xk ∈ x + εB ⊂ O, which along with the fact that xk ∈ ri F implies that O also meets ri F , hence proving the result. (iii) Observe that for an affine set H, ri H = H = cl H. Therefore, by the given hypothesis, ri F ∩ H = ri F ∩ ri H 6= ∅. Thus, by Proposition 2.15 (iii), ri(F ∩ H) = ri F ∩ H
and cl(F ∩ H) = cl F ∩ H,
thereby completing the proof.
Before moving on to the various classes of convex sets, we would like to mention the concept of core of a set like the notions of closure and interior of a set from Borwein and Lewis [17]. Definition 2.17 The core of a set F ⊂ Rn , denoted by core F , is defined as core F = {x ∈ F : for every d ∈ Rn there exists λ > 0 such that x + λd ∈ F }. It is obvious that int F ⊂ core F . For a convex set F ⊂ Rn , int F = core F .
2.2.1 Convex Cones
Since we are interested in the study of convex optimization theory, a class of sets that plays an active role in this direction is the epigraphical set as discussed briefly in Chapter 1. From the definition it is obvious that epigraphical sets are unbounded. Thus it seems worthwhile to understand the class of unbounded convex sets for which one needs the idea of recession cones. But before that, we require the concept of cones.
Definition 2.18 A set K ⊂ Rn is said to be a cone if for every x ∈ K, λx ∈ K for every λ ≥ 0. Therefore, for any set F ⊂ Rn, the cone generated by F is denoted by cone F and is defined as

cone F = ∪_{λ≥0} λF = {z ∈ Rn : z = λx, x ∈ F, λ ≥ 0}.
Note that for a nonconvex set F, cone F may or may not be convex. For example, consider F = {(1, 1), (2, 2)}. Here, cone F = {z ∈ R2 : z = λ(1, 1), λ ≥ 0}, which is convex. Now consider F = {(−1, 1), (1, 1)}. Observe that the cone generated by F comprises two rays, that is, cone F = {z ∈ R2 : z = λ(−1, 1) or z = λ(1, 1), λ ≥ 0}. But we are interested in the convex scenarios, thereby moving on to the notion of the convex cone.

Definition 2.19 The set K ⊂ Rn is said to be a convex cone if it is convex as well as a cone. Therefore, for any set F ⊂ Rn, the convex cone generated by F is denoted by cone co F and is expressed as the set containing all conic combinations of the elements of the set F, that is,

cone co F = {x ∈ Rn : x = Σ_{i=1}^m λi xi, xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, m ∈ N}.

The convex cone generated by the set F is the smallest convex cone containing F. Also, for a collection of convex sets Fi ⊂ Rn, i = 1, 2, . . . , m, the convex cone generated by the Fi, i = 1, 2, . . . , m, can easily be shown to be

cone co ∪_{i=1}^m Fi = ∪_{λ ∈ R^m_+} Σ_{i=1}^m λi Fi.
Some of the important convex cones that play a pivotal role in convex optimization are the polar cone, tangent cone, and the normal cone. We shall discuss them later in the chapter. Before going back to the discussion of unbounded sets, we characterize the class of convex cones in the result below. Theorem 2.20 A cone K ⊂ Rn is convex if and only if K + K ⊂ K. Proof. Suppose that the cone K is convex. Consider x, y ∈ K. Because K is convex, for λ = 1/2, (1 − λ)x + λy =
(1/2)(x + y) ∈ K.
Also, as K is a cone, x + y ∈ 2K ⊂ K which implies that K + K ⊂ K. Conversely, suppose that K + K ⊂ K. Consider x, y ∈ K. Because K is a cone, for λ ∈ [0, 1], (1 − λ)x ∈ K
and
λy ∈ K,
which along with the assumption K + K ⊂ K leads to (1 − λ)x + λy ∈ K, ∀ λ ∈ [0, 1]. As x, y ∈ K were arbitrary, the cone K is convex, thus proving the result. Now coming back to unbounded convex sets, a set can be thought to be unbounded if for any point in the set there exists a direction moving along which one still remains within the set. Such directions are known as the directions of recession and are independent of the point chosen. Definition 2.21 For a convex set F ⊂ Rn , d ∈ Rn is said to be the direction of recession of F if x + λd ∈ F for every x ∈ F and for every λ ≥ 0. The collection of all the directions of recession of a set F ⊂ Rn form a cone known as the recession cone of F and is denoted by 0+ F . Equivalently, 0+ F = {d ∈ Rn : F + d ⊂ F }.
(2.10)
It is easy to observe that for any d ∈ 0+F, d belongs to the set on the right-hand side of the relation (2.10), by choosing in particular λ = 1. Now suppose that d ∈ Rn belongs to the set on the right-hand side of the relation (2.10). Therefore, for any x ∈ F, x + d ∈ F. Invoking the condition iteratively, x + kd ∈ F for k ∈ N. Because F is convex, for any λ̄ ∈ [0, 1],

(1 − λ̄)x + λ̄(x + kd) = x + λ̄kd ∈ F, ∀ k ∈ N.

Denoting λ = λ̄k ≥ 0, the above condition reduces to x + λd ∈ F for every λ ≥ 0. As this relation holds for any x ∈ F, d ∈ 0+F, thereby establishing the relation (2.10). Below we present some properties of the recession cone, with proofs from Bertsekas [12].

Proposition 2.22 Consider a closed convex set F ⊂ Rn. Then the following hold:

(i) 0+F is a closed convex cone.

(ii) d ∈ 0+F if and only if there exists x ∈ F such that x + λd ∈ F for every λ ≥ 0.

(iii) The set F is bounded if and only if 0+F = {0}.
Proof. (i) Suppose that d ∈ 0+ F , which implies that for every x ∈ F and ¯ = λ/α ≥ 0. Then, λ ≥ 0, x + λd ∈ F . Consider α > 0. Denote λ ¯ x + λ(αd) = x + λd ∈ F, which implies αd ∈ 0+ F for α > 0. For α = 0, it is trivial. Thus, 0+ F is a cone. Suppose that d1 , d2 ∈ 0+ F , which implies for every x ∈ F and λ ≥ 0, x + λdi ∈ F, i = 1, 2. ˜ ∈ [0, 1], By the convexity of F , for any λ ˜ ˜ + λd2 ) = x + λ((1 − λ)d ˜ 1 + λd ˜ 2 ) ∈ F, (1 − λ)(x + λd1 ) + λ(x ˜ 1 + λd ˜ 2 ∈ 0+ F , thus implying the convexity of 0+ F . which yields that (1 − λ)d Finally, to establish the closedness of 0+ F , suppose that d ∈ cl 0+ F , which implies that there exists {dk } ⊂ 0+ F . Therefore, for every x ∈ F and every λ ≥ 0, x + λdk ∈ F . Because F is closed, x + λd ∈ F, ∀ x ∈ F, ∀ λ ≥ 0, which implies that d ∈ 0+ F , thereby implying that 0+ F is closed.
(ii) If d ∈ 0+ F , then from the definition of recession cone itself, the condition is satisfied. Conversely, suppose that d ∈ Rn is such that there exists x ∈ F satisfying x + λd ∈ F, ∀ λ ≥ 0. Without loss of generality, assume that d 6= 0. Consider arbitrary x ˜ ∈ F. Because 0+ F is a cone, it suffices to prove that x ˜ + d ∈ F . Define xk = x + kd, k ∈ N, which by the condition implies that {xk } ⊂ F . If x ˜ = xk for some k ∈ N, then again by the condition x ˜ + d = x + (k + 1)d ∈ F and thus we are done. So assume that x ˜ 6= xk for every k. Define dk = ˜= Therefore, for λ
xk − x ˜ kdk, ∀ k ∈ N. kxk − x ˜k
kyk ≥ 0, kxk − x ˜k ˜ x + λx ˜ k, x ˜ + dk = (1 − λ)˜
© 2012 by Taylor & Francis Group, LLC
2.2 Convex Sets
43
which implies that x ˜ + dk lies on the line starting at x ˜ and passing through xk . Now consider dk kdk
= = = =
xk − x ˜ kxk − x ˜k xk − x x−x ˜ + kxk − x ˜k kxk − x ˜k x−x ˜ kxk − xk xk − x + kxk − x ˜k kxk − xk kxk − x ˜k kxk − xk d x−x ˜ + . kxk − x ˜k kdk kxk − x ˜k
By the construction of {xk }, we know that it is an unbounded sequence. Therefore, kxk − xk kkdk = →1 kxk − x ˜k kx − x ˜ + kdk
and
x−x ˜ x−x ˜ = → 0, kxk − x ˜k kx − x ˜ + kdk
which along with the preceding condition leads to dk → d. The vector x ˜ + dk ∈ (˜ x, xk ) for every k ∈ N such that kxk − x ˜k ≥ kdk, which by the convexity of F implies that x ˜ + dk ∈ F . Therefore, x ˜ + dk → x ˜ + d, which by the closedness of F leads to x ˜ + d ∈ F . As x ˜ ∈ F was arbitrarily chosen, F + d ⊂ F , thereby implying that d ∈ 0+ F .
(iii) Suppose that F is bounded. Consider 0 6= d ∈ 0+ F , which implies that for every x ∈ F , x + λd ∈ F, ∀ λ ≥ 0. Therefore, as the limit λ → ∞, kx + λdk → ∞, thereby contradicting the boundedness of F . Hence, 0+ F = {0}. Conversely, suppose that F is unbounded. Consider x ∈ F and an unbounded sequence {xk } ⊂ F . Define dk =
xk − x . kxk − xk
Observe that {dk } is a bounded sequence and thus by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent subsequence. Without loss of generality, assume that dk → d and as kdk k = 1, kdk = 1. For any fixed λ ≥ 0, x + λdk ∈ (x, xk ) for every k ∈ N such that kxk − xk ≥ λ. By the convexity of F , x + λdk ∈ F . Because x + λdk → x + λd, which by the closedness of F implies that x + λd ∈ F, ∀ λ ≥ 0. Applying (ii) yields that 0 6= d ∈ 0+ F , thereby establishing the result.
Observe that if the set F is not closed, then the recession cone of F need not be closed. Also the equivalence in (ii) of the above proposition need not hold. To verify this claim, we present an example from Rockafellar [97]. Consider the set F = {(x, y) ∈ R2 : x > 0, y > 0} ∪ {(0, 0)}, which is not closed. Here the recession cone 0+ F = F and hence is not closed. Also (1, 0) ∈ / 0+ F but (1, 1)+λ(1, 0) ∈ F for every λ ≥ 0, thereby contradicting the equivalence in (ii).
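Proposition 2.22 (ii) also suggests a simple numerical check: for a closed convex set, a candidate direction d is a direction of recession as soon as a single point x ∈ F satisfies x + λd ∈ F for all λ ≥ 0. The sketch below is a hypothetical illustration (it assumes NumPy; sampling finitely many values of λ is only a heuristic, not a proof) for the closed convex set F = {(x, y) : y ≥ x²}.

```python
import numpy as np

# F = {(x, y) in R^2 : y >= x^2}, a closed convex set.
def in_F(p, tol=1e-12):
    x, y = p
    return y >= x * x - tol

def looks_like_recession_direction(d, x0=np.array([0.0, 1.0]),
                                   lambdas=np.logspace(-2, 6, 60)):
    # Heuristic check of Proposition 2.22 (ii): x0 + lam*d should stay in F
    # for all lam >= 0.  A single failure certifies that d is not in 0+F,
    # while passing all sampled lam is only evidence in favour.
    return all(in_F(x0 + lam * d) for lam in lambdas)

print(looks_like_recession_direction(np.array([0.0, 1.0])))   # True: vertical directions recede
print(looks_like_recession_direction(np.array([1.0, 0.0])))   # False: leaves F for lam > 1
print(looks_like_recession_direction(np.array([1.0, 5.0])))   # False: x^2 eventually outgrows 1 + 5*lam
```

For this set the recession cone is 0+F = {(0, d2) : d2 ≥ 0}, which is consistent with the outcome of the three tests above.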
2.2.2 Hyperplane and Separation Theorems
An unbounded convex set that plays a pivotal role in the development of convex optimization is the hyperplane. A hyperplane divides the space into two half spaces. This property helps in the study of separation theorems, thus moving us a step ahead in the study of convex analysis. Definition 2.23 A hyperplane H ⊂ Rn is defined as H = {x ∈ Rn : ha, xi = b}, where a ∈ Rn with a 6= 0 and b ∈ R. If x ¯ ∈ H, then the hyperplane can be equivalently expressed as H = {x ∈ Rn : ha, xi = ha, x ¯i} = x ¯ + {x ∈ Rn : ha, xi = 0}. Therefore, H is an affine set parallel to {x ∈ Rn : ha, xi = 0}. Definition 2.24 The hyperplane H divides the space into two half spaces, either closed or open. The closed half spaces associated with H are H≤ = {x ∈ Rn : ha, xi ≤ b}
and
H≥ = {x ∈ Rn : ha, xi ≥ b},
while the open half spaces associated with H are H< = {x ∈ Rn : ha, xi < b}
and
H> = {x ∈ Rn : ha, xi > b}.
As already mentioned, the notion of separation is based on the fact that the hyperplane in Rn divides it into two parts. Before discussing the separation theorems, we first present types of separation that we will be using in our subsequent study of developing the optimality conditions for convex optimization problems. Definition 2.25 Consider two convex sets F1 and F2 in Rn . A hyperplane H ⊂ Rn is said to separate F1 and F2 if ha, x1 i ≤ b ≤ ha, x2 i, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .
2.2 Convex Sets
45
The separation is said to be strict if ha, x1 i ≤ b < ha, x2 i, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 . The separation is proper if sup ha, x1 i ≤ inf ha, x2 i
x1 ∈F1
x2 ∈F2
and
inf ha, x1 i < sup ha, x2 i.
x1 ∈F1
x2 ∈F2
In particular, if F1 = {¯ x} and F2 = F such that x ¯ ∈ cl F , a hyperplane that separates {¯ x} and F is called a supporting hyperplane to F at x ¯, that is, ha, x ¯i ≤ ha, xi, ∀ x ∈ F. The next obvious question is when will the separating hyperplane or the supporting hyperplane exist. In this respect we prove some existence results below. The proof is from Bertsekas [12]. Theorem 2.26 (i) (Supporting Hyperplane Theorem) Consider a nonempty convex set F ⊂ Rn and x ¯∈ / ri F . Then there exist a ∈ Rn with a 6= 0 and b ∈ R such that ha, x ¯i ≤ b ≤ ha, xi, ∀ x ∈ F. (ii) (Separation Theorem) Consider two nonempty convex sets F1 and F2 in Rn such that either F1 ∩ F2 = ∅ or F1 ∩ ri F2 = ∅. Then there exists a hyperplane in Rn separating them. (iii) (Strict Separation Theorem) Consider two nonempty convex sets F1 and F2 in Rn such that F1 ∩F2 = ∅. Furthermore, if F1 −F2 is closed or F1 is closed while F2 is compact, then there exists a hyperplane in Rn strictly separating them. In particular, consider a nonempty closed convex set F ⊂ Rn and x ¯∈ / F. Then there exist a ∈ Rn with a 6= 0 and b ∈ R such that ha, x ¯i < b ≤ ha, xi, ∀ x ∈ F. (iv) (Proper Separation Theorem) Consider a nonempty convex set F ⊂ Rn and x ¯ ∈ Rn . There exists a hyperplane separating F and x ¯ properly if and only if x ¯∈ / ri F. Further, consider two nonempty convex sets F1 and F2 in Rn . Then ri F1 ∩ ri F2 = ∅ if and only if there exists a hyperplane in Rn separating the sets properly. Proof. (i) Consider the closure of F , cl F , which by Proposition 2.14 (iv) is also convex. Because x ¯∈ / ri F , there exists a sequence {xk } such that xk ∈ / cl F
46
Tools for Convex Optimization
and xk → x ¯. Denote the projection of xk on cl F by x ¯k . By Proposition 2.52 (see Section 2.3), for every k ∈ N, hxk − x ¯k , x − x ¯k i ≤ 0, ∀ x ∈ cl F, which implies for every k ∈ N, h¯ xk − xk , xi
≥ h¯ xk − xk , x ¯i = h¯ xk − xk , x ¯ − xk i + h¯ xk − xk , xk i
≥
h¯ xk − xk , xk i, ∀ x ∈ cl F.
Dividing the above inequality throughout by k¯ xk − xk k and denoting x ¯k − xk , ak = k¯ xk − xk k hak , xk i ≤ hak , xi, ∀ x ∈ cl F, ∀ k ∈ N. As kak k = 1 for every k, {ak } is a bounded sequence. By the Bolzano– Weierstrass Theorem, Proposition 1.3, {ak } has a convergent subsequence. Without loss of generality, assume that ak → a, where a 6= 0 with kak = 1. Taking the limit as k → +∞ in the above inequality yields ha, x ¯i ≤ ha, xi, ∀ x ∈ cl F. Because F ⊂ cl F , the above inequality holds in particular for x ∈ F , that is, ha, x ¯i ≤ b ≤ ha, xi, ∀ x ∈ F, where b = inf x∈cl F ha, xi, thereby yielding the desired result. If x ¯ ∈ cl F , then the hyperplane so obtained supports F at x ¯. (ii) Define the set F = F1 − F2 = {x ∈ Rn : x = x1 − x2 , xi ∈ Fi , i = 1, 2}. Suppose that either F1 ∩ F2 = ∅ or F1 ∩ ri F2 = ∅. Under both scenarios, 0 ∈ / ri F . By the Supporting Hyperplane Theorem, that is (i), there exist a ∈ Rn with a 6= 0 such that ha, xi ≥ 0, ∀ x ∈ F, which implies ha, x1 i ≥ ha, x2 i, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 , hence proving the requisite result. (iii) We shall prove the result under the assumption that F2 −F1 is closed as by Proposition 2.15, the closedness of F1 along with the compactness of F2 imply
2.2 Convex Sets
47
that F1 − F2 is closed. As F1 ∩ F2 = ∅, 0 6∈ F2 − F1 . Suppose that a ∈ F2 − F1 is the projection of origin on F2 − F1 . Therefore, there exist x ¯i ∈ Fi , i = 1, 2, x ¯1 + x ¯2 . Then the projection of x such that a = x ¯2 − x ¯1 . Define x ¯= ¯ on cl F1 2 is x ¯1 while that on cl F2 is x ¯2 . By Proposition 2.52, h¯ x−x ¯i , xi − x ¯i i ≤ 0, ∀ xi ∈ Fi , i = 1, 2, which implies kak2 < ha, x ¯i, ∀ x1 ∈ F1 , 2 kak2 ha, x ¯i < ha, x ¯i + ≤ ha, x2 i, ∀ x2 ∈ F2 . 2
ha, x1 i ≤ ha, x ¯i −
Denoting b = ha, x ¯i, the above inequality leads to ha, x1 i < b < ha, x2 i, ∀ xi ∈ Fi , i = 1, 2, thus obtaining the strict separation result. Now consider a closed convex set F ⊂ Rn with x ¯∈ / F . Taking F1 = F and F2 = {¯ x} in the strict separation result, there exist a ∈ Rn with a 6= 0 and b ∈ Rn such that ha, x ¯i < b < ha, xi, ∀ x ∈ F. Defining ¯b = inf x∈F ha, xi, the above inequality yields ha, x ¯i < ¯b ≤ ha, xi, ∀ x ∈ F, as desired. (iv) Suppose that there exists a hyperplane that separates F and x ¯ properly; that is, there exists a ∈ Rn with a 6= 0 such that ha, x ¯i ≤ inf ha, xi x∈F
and
ha, x ¯i < sup ha, xi. x∈F
We claim that x ¯∈ / ri F . Suppose on the contrary that x ¯ ∈ ri F . Therefore by the conditions of proper separation, ha, .i attains its minimum at x ¯ over F . By the assumption that x ¯ ∈ ri F implies that ha, xi = ha, x ¯i for every x ∈ F , thereby violating the strict inequality. Hence the supposition was wrong and x ¯∈ / ri F . Conversely, suppose that x ¯∈ / ri F . Consider the following two cases. (a) x ¯ 6∈ af f F : Because af f F is a closed convex subset of Rn , by the Strict Separation Theorem, that is (iii), there exists a ∈ Rn with a 6= 0 such that ha, x ¯i < ha, xi, ∀ x ∈ af f F.
48
Tools for Convex Optimization As F ⊂ af f F , the above inequality holds for every x ∈ F and hence ha, x ¯i ≤ inf ha, xi x∈F
and
ha, x ¯i < sup ha, xi, x∈F
thereby establishing the proper separation between F and x ¯. (b) x ¯ ∈ af f F : Consider a subspace C parallel to af f F and define the orthogonal complement of C as C ⊥ = {x∗ ∈ Rn : hx∗ , xi = 0, ∀ x ∈ C}. Define F˜ = F +C ⊥ and thus, by Proposition 2.15 (vi), ri F˜ = ri F +C ⊥ . We claim that x ¯∈ / ri F˜ . On the contrary, assume that x ¯ ∈ ri F˜ , which ⊥ implies that there exists x ∈ ri F such that x ¯ −x ∈ C . As x ¯, x ∈ af f F , x ¯ −x ∈ C. Therefore, k¯ x −xk2 = 0, thereby yielding x ¯ = x ∈ ri F , which is a contradiction. Therefore, x ¯ 6∈ ri F˜ . By the Supporting Hyperplane Theorem, that is (i), there exists a ∈ Rn with a 6= 0 such that ha, x ¯i ≤ ha, x ˜i, ∀ x ˜ ∈ F˜ , which implies that ha, x ¯i ≤ ha, x + yi, ∀ x ∈ F, ∀ y ∈ C ⊥ . Suppose that ha, y¯i 6= 0 for some y¯ ∈ C ⊥ . Without loss of generality, let ha, y¯i > 0. Consider x ˜ = x + α¯ y . Therefore, as the limit α → −∞, ha, x ˜i → −∞, thereby contradicting the above inequality. Thus, ha, yi = 0, ∀ y ∈ C ⊥ . Now by Proposition 2.14, ri F˜ is nonempty and thus ha, xi is not constant over F˜ . Thus, by the above condition on C ⊥ , ha, x ¯i
< =
sup ha, x ˜i
x ˜∈F˜
sup ha, xi + sup ha, yi = sup ha, xi,
x∈F
y∈C ⊥
x∈F
thereby establishing the proper separation between F and x ¯. Thus, the equivalence between the proper separation of F and x ¯ and the fact that x ¯∈ / ri F is proved. Consider the nonempty convex sets F1 , F2 ⊂ Rn . Define F = F2 − F1 , which by Proposition 2.15 (v) and (vi) implies that ri F = ri F1 − ri F2 .
2.2 Convex Sets
49
Therefore, ri F1 ∩ ri F2 = ∅ if and only if 0 ∉ ri F. By the proper separation result, 0 ∉ ri F is equivalent to the existence of a ∈ Rn with a ≠ 0 such that

0 ≤ inf_{x∈F} ⟨a, x⟩   and   0 < sup_{x∈F} ⟨a, x⟩.

By Proposition 1.7,

sup_{x1∈F1} ⟨a, x1⟩ ≤ inf_{x2∈F2} ⟨a, x2⟩   and   inf_{x1∈F1} ⟨a, x1⟩ < sup_{x2∈F2} ⟨a, x2⟩,

thereby completing the proof.
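The construction used in the proof of the Strict Separation Theorem is easy to carry out numerically whenever the projection onto the set is available in closed form. The sketch below is a hypothetical illustration (it assumes NumPy; the set, the point and the helper names are arbitrary choices): it separates a point x̄ ∉ F from the closed Euclidean unit ball F by taking a = p − x̄, where p is the projection of x̄ onto F, exactly as in part (iii) of Theorem 2.26.

```python
import numpy as np

rng = np.random.default_rng(0)

def proj_unit_ball(z):
    # Projection onto F = {x : ||x|| <= 1}.
    nz = np.linalg.norm(z)
    return z if nz <= 1.0 else z / nz

x_bar = np.array([2.0, 1.0])          # a point outside F
p = proj_unit_ball(x_bar)
a = p - x_bar                         # normal of the separating hyperplane, a != 0
b = a @ p                             # claim: <a, x> >= b on F, while <a, x_bar> < b

print(a @ x_bar < b)                  # strict inequality on the side of x_bar

# The projection inequality <x_bar - p, x - p> <= 0 gives <a, x> >= <a, p> = b on F;
# we verify it on sample points of the ball obtained by projecting random vectors.
samples = np.array([proj_unit_ball(s) for s in rng.normal(size=(1000, 2))])
print(bool(np.all(samples @ a >= b - 1e-12)))
```

The same recipe applies to any closed convex set for which a projection routine is at hand, which is precisely how Proposition 2.52 enters the proofs of this section.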
A consequence of the Separation Theorem is the following characterization of a closed convex set. Theorem 2.27 A closed convex set F ⊂ Rn is the intersection of closed half spaces containing it. Consequently for any F˜ ⊂ Rn , cl co F˜ is the intersection of all the closed half spaces containing F˜ . Proof. Without loss of generality, we assume that F 6= Rn , otherwise the result is trivial. For any x ¯∈ / F , define F1 = {¯ x} and F2 = F . Therefore by Theorem 2.26 (iii), there exist (a, b) ∈ Rn × R with a 6= 0 such that ha, x ¯i < b ≤ ha, xi, ∀ x ∈ F, which implies that a closed half space associated with the supporting hyperplane contains F and not x ¯. Thus the intersection of the closed half spaces containing F has no points that are not in F . For any F˜ ⊂ Rn , taking F = cl co F˜ and applying the result for closed convex set F yields that cl co F˜ is the intersection of all the closed half spaces containing F˜ . Another application of the separation theorem is the famous Helly’s Theorem. We state the result from Rockafellar [97] without proof. Proposition 2.28 (Helly’s Theorem) Consider a collection of nonempty closed convex sets Fi , i ∈ I in Rn , where I is an arbitrary index set. Assume that the sets Fi have no common direction of recession. If every subcollection T consisting of n + 1 or fewer sets has nonempty intersection, then i∈I Fi is nonempty. The supporting hyperplanes through the boundary points of a set characterizes the convexity of the set that we present in the result below. The proof is from Schneider [103]. Proposition 2.29 Consider a closed set F ⊂ Rn such that int F is nonempty and through each boundary point of F there is a supporting hyperplane. Then F is convex.
50
Tools for Convex Optimization
Proof. Suppose that F is not convex, which implies that there exist x, y ∈ F such that z ∈ [x, y] but z ∈ / F . Because int F is nonempty, there exists some a ∈ int F such that x, y and a are affinely independent. Also as F is closed, [a, z) meets the boundary of F , say at b ∈ F . By the given hypothesis, there is a supporting hyperplane H to F through b with a 6∈ H. Therefore, H meets af f {x, y, a} in a line and hence x, y and a must lie on the same side of the line, which is a contradiction. Hence, F is a convex set.
2.2.3 Polar Cones
From the previous discussions, we know that closed half spaces are closed convex sets and by Proposition 2.3 that arbitrary intersection of half spaces give rise to another closed convex set. One such class is of the polar cones. Definition 2.30 Consider a set F ⊂ Rn . The cone defined as F ◦ = {x∗ ∈ Rn : hx∗ , xi ≤ 0, ∀ x ∈ F } is called the polar cone of F . Observe that the elements of the polar cone make an obtuse angle with every element of the set. The cone F ◦◦ = (F ◦ )◦ is called the bipolar cone of the set F . Thus, the polar of the set F is a closed convex cone irrespective of whether F is closed convex or not. We present some properties of polar and bipolar cones. Proposition 2.31 (i) Consider two sets F1 , F2 ⊂ Rn such that F1 ⊂ F2 . Then F2◦ ⊂ F1◦ . (ii) Consider a nonempty set F ⊂ Rn . Then
F ◦ = (cl F )◦ = (co F )◦ = (cone co F )◦ . (iii) Consider a nonempty set F ⊂ Rn . Then F ◦◦ = cl cone co F. If F is a convex cone, F ◦◦ = cl F and in addition if F is closed, F ◦◦ = F . (iv) Consider two cones Ki ∈ Rni , i = 1, 2. Then (K1 × K2 )◦ = K1◦ × K2◦ . (v) Consider two cones K1 , K2 ⊂ Rn . Then (K1 + K2 )◦ = K1◦ ∩ K2◦ . (vi) Consider two closed convex cones K1 , K2 ⊂ Rn . Then (K1 ∩ K2 )◦ = cl(K1◦ + K2◦ ). The closure is superfluous under the condition K1 ∩ int K2 6= ∅.
2.2 Convex Sets
51
Proof. (i) Suppose that x∗ ∈ F2◦ , which implies that hx∗ , xi ≤ 0, ∀ x ∈ F2 . Because F1 ⊂ F2 , the above inequality leads to hx∗ , xi ≤ 0, ∀ x ∈ F1 , thereby showing that F2◦ ⊂ F1◦ .
(ii) As F ⊂ cl F , by (i) (cl F )◦ ⊂ F ◦ . Conversely, suppose that x∗ ∈ F ◦ . Consider x ∈ cl F , which implies that there exists {xk } ⊂ F such that xk → x. Because x∗ ∈ F ◦ , by Definition 2.30, hx∗ , xk i ≤ 0, which implies that hx∗ , xi ≤ 0. Because x ∈ cl F was arbitrary, the above inequality holds for every x ∈ cl F and hence x∗ ∈ (cl F )◦ , thus yielding F ◦ = (cl F )◦ . As F ⊂ co F , by (i) (co F )◦ ⊂ F ◦ . Conversely, suppose that x∗ ∈ F ◦ , which implies hx∗ , xi ≤ 0, ∀ x ∈ F. For any λ ∈ [0, 1], hx∗ , (1 − λ)x + λyi ≤ 0, ∀ x, y ∈ F, which implies that hx∗ , zi ≤ 0, ∀ z ∈ co F. Therefore, x∗ ∈ (co F )◦ , as desired. Also, because F ⊂ cone F , again by (i) (cone F )◦ ⊂ F ◦ . Conversely, suppose that x∗ ∈ F ◦ . For any λ ≥ 0, hx∗ , λxi ≤ 0, ∀ x ∈ F, which implies that hx∗ , zi ≤ 0, ∀ z ∈ cone F. Therefore, x∗ ∈ (cone F )◦ , thereby yielding the desired result. (iii) We shall first establish the case when F is a closed convex cone. Suppose that x ∈ F . By Definition 2.30 of F ◦ , hx∗ , xi ≤ 0, ∀ x∗ ∈ F ◦ ,
52
Tools for Convex Optimization
which implies that x ∈ F ◦◦ . Therefore, F ⊂ F ◦◦ . Conversely, suppose that x ¯ ∈ F ◦◦ . We claim that x ∈ F . On the contrary, assume that x ¯ 6∈ F . Because F is closed, by Theorem 2.26 (iii), there exist a ∈ Rn with a 6= 0 and b ∈ R such that ha, x ¯i < b ≤ ha, xi, ∀ x ∈ F. As F is a cone, 0 ∈ F , which along with the above inequality implies that b ≤ 0 and ha, x ¯i < 0. We claim that a ∈ −F ◦ . If not, then there exists x ˜∈F ˜ > 0 such that ha, λ˜ ˜ xi < b. Again, as F is a such that ha, x ˜i < 0. Choosing λ ˜ x ∈ F , thereby contradicting the fact that cone, λ˜ b ≤ ha, xi, ∀ x ∈ F. Therefore, a ∈ −F ◦ . Because ha, x ¯i < 0 for x ¯ ∈ F ◦◦ , it contradicts a ∈ −F ◦ . Thus we arrive at a contradiction and hence F ◦◦ ⊂ F , thereby leading to the requisite result. Now from (ii), it is obvious that F ◦ = (cl cone co F )◦ . Therefore, F ◦◦ = (cl cone co F )◦◦ , which by the fact that cl cone co F is a closed convex cone yields that F ◦◦ = cl cone co F, as desired. If F is a convex cone, from the above condition it is obvious that F ◦◦ = cl F . (iv) Suppose that d = (d1 , d2 ) ∈ (K1 × K2 )◦ , which implies that hd, xi ≤ 0, ∀ x ∈ K1 × K2 . Therefore, for x = (x1 , x2 ) ∈ K1 × K2 , hd1 , x1 i + hd2 , x2 i ≤ 0, ∀ x1 ∈ K1 , ∀ x2 ∈ K2 . Because K1 and K2 are cones, 0 ∈ Ki , i = 1, 2. In particular, for x2 = 0, the above inequality reduces to hd1 , x1 i ≤ 0, ∀ x1 ∈ K1 , which implies that d1 ∈ K1◦ . Similarly it can be shown that d2 ∈ K2◦ . Thus, d ∈ K1◦ × K2◦ , thereby leading to (K1 × K2 )◦ ⊂ K1◦ × K2◦ . Conversely, suppose that di ∈ Ki◦ , i = 1, 2, which implies hdi , xi i ≤ 0, ∀ xi ∈ Ki , i = 1, 2.
Therefore, h(d1 , d2 ), (x1 , x2 )i ≤ 0, ∀ (x1 , x2 ) ∈ K1 × K2 , which yields (d1 , d2 ) ∈ (K1 × K2 )◦ , that is, K1◦ × K2◦ ⊂ (K1 × K2 )◦ , thereby proving the result. (v) Suppose that x∗ ∈ (K1 + K2 )◦ , which implies that for xi ∈ Ki , i = 1, 2, hx∗ , x1 + x2 i ≤ 0, ∀ x1 ∈ K1 , ∀ x2 ∈ K2 . Because K1 and K2 are cones, 0 ∈ Ki , i = 1, 2, which reduces the above inequality to hx∗ , xi i ≤ 0, ∀ xi ∈ Ki , i = 1, 2. Therefore, x∗ ∈ K1◦ ∩ K2◦ , thereby leading to (K1 + K2 )◦ ⊂ K1◦ ∩ K2◦ . Conversely, suppose that x∗ ∈ K1◦ ∩ K2◦ , which implies that hx∗ , xi i ≤ 0, ∀ xi ∈ Ki , i = 1, 2. Thus, for x = x1 + x2 ∈ K1 + K2 , the above inequality leads to hx∗ , xi ≤ 0, ∀ x ∈ K1 + K2 , which implies that x∗ ∈ (K1 + K2 )◦ , thus yielding the desired result. (vi) Replacing Ki by Ki◦ , i = 1, 2, in (iv) along with (iii) leads to (K1◦ + K2◦ )◦ = K1 ∩ K2 . Again by (iii), the above condition becomes cl (K1◦ + K2◦ ) = (K1 ∩ K2 )◦ , thereby yielding the requisite result.
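As a quick computational illustration of Definition 2.30 and Proposition 2.31 (ii) (the Python sketch below is an editorial addition, not part of the original text), membership in the polar of a finitely generated cone can be tested against the generators alone; the generators, the test point, and the tolerance are arbitrary choices.

```python
import numpy as np

# For a finite set F = {a1, ..., am}, Proposition 2.31 (ii) gives
# F° = (cone co F)°, so x* lies in F° exactly when <ai, x*> <= 0 for all i.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])   # generators a1, a2, a3

def in_polar(x_star, gens, tol=1e-12):
    """Membership test for the polar cone of cone co{gens}."""
    return bool(np.all(gens @ x_star <= tol))

x_star = np.array([-1.0, -2.0])          # satisfies <ai, x_star> <= 0 for every i
y = A.T @ rng.uniform(0.0, 1.0, size=3)  # an arbitrary point of cone co F

assert in_polar(x_star, A)
assert np.dot(x_star, y) <= 1e-12        # obtuse angle with every element of the cone
```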
Similar to the polar cone, we have the notion of a positive polar cone. Definition 2.32 Consider a set F ⊂ Rn . The positive polar cone to the set F is defined as F + = {x∗ ∈ Rn : hx∗ , xi ≥ 0, ∀ x ∈ F }. Observe that F + = (−F )◦ = −F ◦ . The notion of polarity will play a major role in the study of tangent and normal cones that are polar to each other. These cones are important in the development of convex optimization.
2.2.4
Tangent and Normal Cones
In the analysis of a constrained optimization problem, we try to look at the local behavior of the function in the neighboring feasible points. To move from one point to another feasible point, we need a direction that leads to the notion of feasible directions. Definition 2.33 Let F ⊂ Rn and x ¯ ∈ F . A vector d ∈ Rn is said to be a feasible direction of F at x ¯ if there exists α ¯ > 0 such that x ¯ + αd ∈ F, ∀ α ∈ [0, α]. ¯ It is easy to observe that the set of all feasible directions forms a cone. For a convex set F , the set of feasible directions at x ¯ is of the form α(x − x ¯) where α ∈ [0, 1] and x ∈ F . However, in case F is nonconvex, the set of feasible directions may reduce to singleton {0}. For example, consider the nonconvex set F = {(−1, 1), (1, 1)}. The only feasible direction possible is {0}. This motivates us to introduce the concept of tangent cones that would provide local information of the set F at a point even when feasible direction is just zero. The notion of tangent cones may be considered a generalization of the tangent concept in a smooth scenario to that in a nonsmooth case. Definition 2.34 Consider a set F ⊂ Rn and x ¯ ∈ F . A vector d ∈ Rn is said to be a tangent to F at x ¯ if there exist {dk } ⊂ Rn with dk → d and {tk } ⊂ R+ with tk → 0 such that x ¯ + tk dk ∈ F, ∀ k ∈ N. Observe that if d is a tangent, then so is λd for λ ≥ 0. Thus, the collection of all tangents form a cone known as the tangent cone denoted by TF (¯ x) and given by TF (¯ x) = {d ∈ Rn : there exist dk → d, tk ↓ 0 such that x ¯ + tk dk ∈ F, ∀ k ∈ N}. In the above definition, denote xk = x ¯ + tk dk ∈ F . Taking the limit as k → +∞, tk → 0, and dk → d, which implies that tk dk → 0, thereby leading to xk → x ¯. Also from construction, xk − x ¯ = dk → d. tk
Thus, the tangent cone can be equivalently expressed as

TF(x̄) = {d ∈ Rn : there exist {xk} ⊂ F with xk → x̄ and tk ↓ 0 such that (xk − x̄)/tk → d}.

Figure 2.4 is a representation of the tangent cone to a convex set F. Next we present some properties of the tangent cone. The proofs are from Hiriart-Urruty and Lemaréchal [63].
FIGURE 2.4: Tangent cone TF(x̄) to a convex set F at x̄, together with its translate x̄ + TF(x̄).
Theorem 2.35 Consider a set F ⊂ Rn and x̄ ∈ F. Then the following hold:

(i) TF(x̄) is closed.

(ii) If F is convex, TF(x̄) is the closure of the cone generated by F − {x̄}, that is, TF(x̄) = cl cone(F − x̄), and hence is convex.

Proof. (i) Suppose that {dk} ⊂ TF(x̄) such that dk → d. Because dk ∈ TF(x̄), there exist {x_k^r} ⊂ F with x_k^r → x̄ and {t_k^r} ⊂ R+ with t_k^r → 0 (as r → +∞) such that

(x_k^r − x̄)/t_k^r → dk, ∀ k ∈ N.

For a fixed k, one can always find r̄(k) such that

‖(x_k^r − x̄)/t_k^r − dk‖ < 1/k, ∀ r ≥ r̄(k).
Taking the limit as k → +∞ along such a diagonal choice, one can generate a sequence {xk} ⊂ F with xk → x̄ and {tk} ⊂ R+ with tk → 0 such that

(xk − x̄)/tk → d.

Thus, d ∈ TF(x̄), thereby establishing that TF(x̄) is closed.

(ii) Suppose that d ∈ TF(x̄), which implies that there exist {xk} ⊂ F with xk → x̄ and {tk} ⊂ R+ with tk → 0 such that

(xk − x̄)/tk → d.

Observe that xk − x̄ ∈ F − x̄. As tk > 0, 1/tk > 0, which implies that

(xk − x̄)/tk ∈ cone (F − x̄),

thereby implying that d ∈ cl cone (F − x̄). Hence

TF(x̄) ⊂ cl cone (F − x̄).   (2.11)

Conversely, consider an arbitrary but fixed element x ∈ F. Define a sequence

xk = x̄ + (1/k)(x − x̄), k ∈ N.

By the convexity of F, it is obvious that {xk} ⊂ F. Taking the limit as k → +∞, xk → x̄, while by construction k(xk − x̄) = x − x̄. Denoting tk = 1/k > 0, we have tk → 0 and (xk − x̄)/tk → x − x̄, which implies that x − x̄ ∈ TF(x̄). Because x ∈ F is arbitrary, F − x̄ ⊂ TF(x̄). As TF(x̄) is a cone, cone (F − x̄) ⊂ TF(x̄). By (i), TF(x̄) is closed, which implies

cl cone (F − x̄) ⊂ TF(x̄).

The above inclusion along with the reverse inclusion (2.11) yields the desired equality. Because F is convex, the set F − x̄ is also convex. Invoking Proposition 2.14 (iv) implies that TF(x̄) is a convex set.

We now move on to another conical approximation of a convex set, namely the normal cone, which plays a major role in establishing the optimality conditions.
FIGURE 2.5: Normal cone NF(x̄) to a convex set F at x̄, together with its translate x̄ + NF(x̄).
Definition 2.36 Consider a convex set F ⊂ Rn and x̄ ∈ F. A vector d ∈ Rn is normal to F at x̄ if

⟨d, x − x̄⟩ ≤ 0, ∀ x ∈ F.

Observe that if d is a normal, then so is λd for λ ≥ 0. The collection of all normals forms a cone, called the normal cone and denoted by NF(x̄). For a convex set, the relation between the tangent cone and the normal cone is given by the proposition below.

Proposition 2.37 Consider a convex set F ⊂ Rn and x̄ ∈ F. Then TF(x̄) and NF(x̄) are polar to each other, that is,

NF(x̄) = (TF(x̄))◦  and  TF(x̄) = (NF(x̄))◦.
Proof. Suppose that d ∈ NF (¯ x), which implies that hd, x − x ¯i ≤ 0, ∀ x ∈ F.
Observe that for x ∈ F, x − x̄ ∈ F − x̄, which implies that d ∈ (F − x̄)◦. By Proposition 2.31 (ii), along with the convexity of F and hence of F − x̄, and Theorem 2.35 (ii),

d ∈ (cl cone (F − x̄))◦ = (TF(x̄))◦,

thereby implying that NF(x̄) ⊂ (TF(x̄))◦.

Conversely, suppose that d ∈ (TF(x̄))◦. As F − x̄ ⊂ TF(x̄), by Proposition 2.31 (i), (TF(x̄))◦ ⊂ (F − x̄)◦, which implies that

⟨d, x − x̄⟩ ≤ 0, ∀ x ∈ F,

that is, d ∈ NF(x̄). Therefore, NF(x̄) = (TF(x̄))◦, as desired.

For a convex set F, TF(x̄) is a closed convex cone. Therefore, by Proposition 2.31 (iii),

(NF(x̄))◦ = (TF(x̄))◦◦ = TF(x̄),

thereby yielding the requisite result.
Figure 2.5 is a representation of the normal cone to a convex set F. Observe that the normal cone is polar to the tangent cone in Figure 2.4. Now we present some simple examples of tangent cones and normal cones.

Example 2.38 (i) For a convex set F ⊂ Rn, it can be easily observed that TF(x) = Rn for every x ∈ int F and, by polarity, NF(x) = {0} for every x ∈ int F.

(ii) For a closed convex cone K ⊂ Rn, by Theorem 2.35 (ii) it is obvious that TK(0) = K, while by Proposition 2.37, NK(0) = K◦. Also, for 0 ≠ x ∈ K, from the definition of the normal cone,

NK(x) = {d ∈ Rn : d ∈ K◦, ⟨d, x⟩ = 0}.

(iii) Consider the closed convex set F ⊂ Rn given by

F = {x ∈ Rn : ⟨ai, x⟩ ≤ bi, i = 1, 2, . . . , m}

and define the active index set I(x) = {i ∈ {1, 2, . . . , m} : ⟨ai, x⟩ = bi}. The set F is called a polyhedral set, which we will discuss in the next section. Then

TF(x) = {d ∈ Rn : ⟨ai, d⟩ ≤ 0, ∀ i ∈ I(x)},
NF(x) = cone co {ai : i ∈ I(x)}.
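The polyhedral case in Example 2.38 (iii) is easy to experiment with numerically. The following Python sketch is an editorial addition; the matrix A, the point x, and the trial directions are made-up data. It tests membership in TF(x) via the active inequalities and membership in NF(x) via a nonnegative least squares fit.

```python
import numpy as np
from scipy.optimize import nnls

A = np.array([[-1.0, 0.0],   # -x1 <= 0
              [0.0, -1.0]])  # -x2 <= 0  -> F is the nonnegative orthant
b = np.zeros(2)
x = np.array([0.0, 2.0])                       # x lies on the face x1 = 0
active = np.where(np.isclose(A @ x, b))[0]     # active index set I(x) = {0}

def in_tangent(d, tol=1e-10):
    # T_F(x) = {d : <ai, d> <= 0 for all active i}
    return bool(np.all(A[active] @ d <= tol))

def in_normal(d, tol=1e-8):
    # N_F(x) = cone co {ai : i in I(x)}: fit d as a nonnegative combination
    # of the active rows and check that the residual vanishes.
    coeffs, residual = nnls(A[active].T, d)
    return residual <= tol

print(in_tangent(np.array([1.0, -1.0])))   # True: points into the orthant
print(in_normal(np.array([-1.0, 0.0])))    # True: outward normal to x1 >= 0
print(in_normal(np.array([0.0, 1.0])))     # False
```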
Before moving on to discuss the polyhedral sets, we present some results on the tangent and normal cones.
Proposition 2.39 (i) Consider two closed convex sets Fi ⊂ Rn , i = 1, 2. Let x ¯ ∈ F1 ∩ F2 . Then T (¯ x; F1 ∩ F2 ) ⊂ T (¯ x; F1 ) ∩ T (¯ x; F2 ), N (¯ x; F1 ∩ F2 ) ⊃ N (¯ x; F1 ) + N (¯ x; F2 ). If in addition, ri F1 ∩ ri F2 6= ∅, the above inclusions hold as equality.
(ii) Consider two closed convex sets Fi ⊂ Rni , i = 1, 2. Let x ¯i ∈ Fi , i = 1, 2. Then T ((¯ x1 , x ¯2 ); F1 × F2 ) = T (¯ x1 ; F1 ) × T (¯ x2 ; F2 ), N ((¯ x1 , x ¯2 ); F1 × F2 ) = N (¯ x1 ; F1 ) × N (¯ x2 ; F2 ). (iii) Consider two closed convex sets Fi ⊂ Rn , i = 1, 2. Let x ¯i ∈ Fi , i = 1, 2. Then T (¯ x1 + x ¯2 ; F1 + F2 ) = cl (T (¯ x1 ; F1 ) + T (¯ x2 ; F2 )), N (¯ x1 + x ¯2 ; F1 + F2 ) = N (¯ x1 ; F1 ) ∩ N (¯ x2 ; F2 ). Proof. (i) We first establish the result for the normal cone and then use it to derive the result for the tangent cone. Suppose that di ∈ NFi (¯ x), i = 1, 2, which implies that hdi , xi − x ¯i ≤ 0, ∀ xi ∈ Fi , i = 1, 2. For any x ∈ F1 ∩ F2 , the above inequality is still valid for i = 1, 2. Therefore, hd1 + d2 , x − x ¯i ≤ 0, ∀ x ∈ F1 ∩ F2 , which implies that d1 + d2 ∈ NF1 ∩F2 (¯ x). Because di ∈ NFi (¯ x), i = 1, 2, were arbitrarily chosen, NF1 (¯ x) + NF2 (¯ x) ⊂ NF1 ∩F2 (¯ x). By Propositions 2.31 (i), (v), and 2.37, TF1 ∩F2 (¯ x) ⊂ (NF1 (¯ x) + NF2 (¯ x))◦ = TF1 (¯ x) ∩ TF2 (¯ x), as desired. We shall prove the equality part as an application of the subdifferential sum rule, Theorem 2.91. (ii) Suppose that d = (d1 , d2 ) ∈ NF1 ×F2 (¯ x1 , x ¯2 ), which implies h(d1 , d2 ), (x1 , x2 ) − (¯ x1 , x ¯2 )i ≤ 0, ∀ (x1 , x2 ) ∈ F1 × F2 , that is, hd1 , x1 − x ¯1 i + hd2 , x2 − x ¯2 i ≤ 0, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 . The above inequality holds in particular for x2 = x ¯2 , thereby reducing it to hd1 , x1 − x ¯1 i ≤ 0, ∀ x1 ∈ F1 ,
which by Definition 2.36 implies that d1 ∈ NF1 (¯ x1 ). Similarly, it can be shown that d2 ∈ NF2 (¯ x2 ). Because (d1 , d2 ) ∈ NF1 ×F2 (¯ x1 , x ¯2 ) was arbitrary, NF1 ×F2 (¯ x1 , x ¯2 ) ⊂ NF1 (¯ x1 ) × NF2 (¯ x2 ). Conversely, consider d1 ∈ NF1 (¯ x1 ) and d2 ∈ NF2 (¯ x2 ), which implies that (d1 , d2 ) ∈ NF1 (¯ x1 ) × NF2 (¯ x2 ). Therefore, hdi , xi − x ¯i i ≤ 0, ∀ xi ∈ Fi , i = 1, 2, which leads to h(d1 , d2 ), (x1 , x2 ) − (¯ x1 , x ¯2 )i ≤ 0, ∀ (x1 , x2 ) ∈ F1 × F2 , thereby yielding that (d1 , d2 ) ∈ NF1 ×F2 (¯ x1 , x ¯2 ). As di ∈ NFi (¯ xi ), i = 1, 2, were arbitrary, NF1 ×F2 (¯ x1 , x ¯2 ) ⊃ NF1 (¯ x1 ) × NF2 (¯ x2 ), thereby leading to the desired result. The result on the tangent cone can be obtained by applying Propositions 2.31 (iv) and 2.37. (iii) Suppose that d ∈ NF1 +F2 (¯ x1 + x ¯2 ), which leads to hd, x1 − x ¯1 i + hd, x2 − x ¯2 i ≤ 0, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 . In particular, for x2 = x ¯2 , the above inequality reduces to hd, x1 − x ¯1 i ≤ 0, ∀ x1 ∈ F1 , that is, d ∈ NF1 (¯ x1 ). Similarly, d ∈ NF2 (¯ x). Because d ∈ NF1 +F2 (¯ x1 + x ¯2 ) was arbitrary, NF1 +F2 (¯ x1 + x ¯2 ) ⊂ NF1 (¯ x1 ) ∩ NF2 (¯ x2 ). Conversely, consider d ∈ NF1 (¯ x1 ) ∩ NF2 (¯ x2 ). Therefore, hd, xi − x ¯i i ≤ 0, ∀ xi ∈ Fi , i = 1, 2, which implies that hd, (x1 + x2 ) − (¯ x1 + x ¯2 )i ≤ 0, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 . This leads to d ∈ NF1 +F2 (¯ x1 + x ¯2 ). As d ∈ NF1 (¯ x1 ) ∩ NF2 (¯ x2 ) was arbitrary, NF1 (¯ x1 ) ∩ NF2 (¯ x2 ) ⊂ NF1 +F2 (¯ x1 + x ¯2 ), thus establishing the desired result. The result on tangent cone can be obtained by applying Propositions 2.31 (vi) and 2.37.
2.2.5
Polyhedral Sets
As discussed in the beginning, finite intersection of closed half spaces generate a class of convex sets known as the polyhedral sets. Here, we discuss briefly this class of sets. Definition 2.40 A set P ⊂ Rn is said to be a polyhedral set if it is nonempty and is expressed as P = {x ∈ Rn : hai , xi ≤ bi , i = 1, 2, . . . , m}, where ai ∈ Rn and bi ∈ R for i = 1, 2, . . . , m. Obviously, P is a convex set.
Polyhedral sets play an important role in the study of linear programming problems. A polyhedral set can also be considered a finite intersection of closed half spaces and hyperplanes. Any hyperplane ⟨a, x⟩ = b can be segregated into two half spaces, ⟨a, x⟩ ≤ b and ⟨−a, x⟩ ≤ −b, and thus can be expressed in the form given in the definition. If in the above definition of polyhedral sets, bi = 0, i = 1, 2, . . . , m, we get the notion of polyhedral cones.

Definition 2.41 A polyhedral set P is a polyhedral cone if and only if it can be expressed as the intersection of a finite collection of closed half spaces whose supporting hyperplanes pass through the origin. Equivalently, the polyhedral cone P is given by

P = {x ∈ Rn : ⟨ai, x⟩ ≤ 0, i = 1, 2, . . . , m},

where ai ∈ Rn for i = 1, 2, . . . , m.

Next we state some operations on polyhedral sets and cones. For proofs, the readers are advised to refer to Rockafellar [97].

Proposition 2.42 (i) Consider a polyhedral set (cone) P ⊂ Rn and a linear transformation A : Rn → Rm. Then A(P) as well as A−1(P) are polyhedral sets (cones).

(ii) Consider polyhedral sets (cones) Fi ⊂ Rni, i = 1, 2, . . . , m. Then the Cartesian product F1 × F2 × . . . × Fm is a polyhedral set (cone).

(iii) Consider polyhedral sets (cones) Fi ⊂ Rn, i = 1, 2, . . . , m. Then the intersection ∩_{i=1}^{m} Fi and the sum ∑_{i=1}^{m} Fi are also polyhedral sets (cones).

With the notion of polyhedral sets, another concept that comes into the picture is that of a finitely generated set.

Definition 2.43 A set F ⊂ Rn is a finitely generated set if and only if there exist xi ∈ Rn, i = 1, 2, . . . , m, such that for a fixed integer j, 0 ≤ j ≤ m, F is given by

F = {x ∈ Rn : x = ∑_{i=1}^{j} λi xi + ∑_{i=j+1}^{m} λi xi, λi ≥ 0, i = 1, 2, . . . , m, ∑_{i=1}^{j} λi = 1},

where {x1, x2, . . . , xm} are the generators of the set. For a finitely generated cone, it is the same set with j = 0, and then {x1, x2, . . . , xm} are the generators of the cone.
Below we mention some characterizations and properties of polyhedral sets and finitely generated cones. The results are stated without proofs. For more details on polyhedral sets, one can refer to Bertsekas [12], Rockafellar [97], and Wets [111]. Proposition 2.44 (i) A set (cone) is polyhedral if and only if it is finitely generated. (ii) The polar of a polyhedral convex set is polyhedral. (iii) Let x1 , x2 , . . . , xm ∈ Rn . Then the finitely generated cone F = cone co{x1 , x2 , . . . , xm } is closed and its polar cone is a polyhedral cone given by F ◦ = {x ∈ Rn : hxi , xi ≤ 0, i = 1, 2, . . . , m}. With all these background on convex sets, we move on to the study of convex functions.
2.3
Convex Functions
We devote this section to the study of convex functions and their properties. We also look into some special classes of convex functions, namely the sublinear functions. We begin by formally defining convex functions. But before that, let us recall some notions.

Definition 2.45 Consider a function φ : Rn → R̄. The domain of the function φ is defined as

dom φ = {x ∈ Rn : φ(x) < +∞}.

The epigraph of the function φ is given by

epi φ = {(x, α) ∈ Rn × R : φ(x) ≤ α}.

Observe that the notion of epigraph involves domain points only. The function is proper if φ(x) > −∞ for every x ∈ Rn and dom φ is nonempty. A function is said to be improper if there exists x̂ ∈ Rn such that φ(x̂) = −∞.

Definition 2.46 A function φ : Rn → R̄ is said to be convex if for any x, y ∈ Rn and λ ∈ [0, 1] we have

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y).
FIGURE 2.6: Graph and epigraph of a convex function.
If φ is a convex function, then the function ψ : Rn → R defined as ψ = −φ is said to be a concave function.

Definition 2.47 A function φ : Rn → R̄ is said to be strictly convex if for distinct x, y ∈ Rn and λ ∈ (0, 1) we have

φ((1 − λ)x + λy) < (1 − λ)φ(x) + λφ(y).

The proposition given below is an equivalent characterization of a convex function in terms of its epigraph, mentioned in Chapter 1.

Proposition 2.48 Consider a proper function φ : Rn → R̄. φ is convex if and only if epi φ is a convex set in Rn × R.

Proof. Suppose φ is convex. Consider (xi, αi) ∈ epi φ, i = 1, 2, which implies that φ(xi) ≤ αi, i = 1, 2. This along with the convexity of φ yields that for every λ ∈ [0, 1],

φ((1 − λ)x1 + λx2) ≤ (1 − λ)φ(x1) + λφ(x2) ≤ (1 − λ)α1 + λα2.

Thus, ((1 − λ)x1 + λx2, (1 − λ)α1 + λα2) ∈ epi φ. Because (xi, αi), i = 1, 2, were arbitrary, this leads to the convexity of epi φ in Rn × R.

Conversely, suppose that epi φ is convex. Consider x1, x2 ∈ dom φ. It is obvious that (xi, φ(xi)) ∈ epi φ, i = 1, 2. By the convexity of epi φ, for every λ ∈ [0, 1],

(1 − λ)(x1, φ(x1)) + λ(x2, φ(x2)) ∈ epi φ,

which implies that

φ((1 − λ)x1 + λx2) ≤ (1 − λ)φ(x1) + λφ(x2), ∀ λ ∈ [0, 1],

thereby implying the convexity of φ, and thus establishing the result.
Figure 2.6 represents the graph and epigraph of a convex function. Observe that the epigraph is a convex set. Another alternate characterization of a convex function is in terms of the strict epigraph set. So next we state the notion of strict epigraph and present the equivalent characterization.

Definition 2.49 The strict epigraph of the function φ is given by

epis φ = {(x, α) ∈ Rn × R : φ(x) < α}.

Proposition 2.50 Consider a proper function φ : Rn → R̄. φ is convex if and only if epis φ is a convex set in Rn × R.

Proof. The necessary part, that is, that the convexity of φ implies epis φ is convex, can be worked out along the lines of the proof of Proposition 2.48. Conversely, suppose that epis φ is convex. Consider x1, x2 ∈ dom φ and αi ∈ R, i = 1, 2, such that φ(xi) < αi, i = 1, 2. Therefore, (xi, αi) ∈ epis φ, i = 1, 2. By the convexity of epis φ, for every λ ∈ [0, 1],

(1 − λ)(x1, α1) + λ(x2, α2) ∈ epis φ,

which implies that

φ((1 − λ)x1 + λx2) < (1 − λ)α1 + λα2, ∀ λ ∈ [0, 1].

As the above inequality holds for every αi > φ(xi), i = 1, 2, taking the limit as αi → φ(xi), i = 1, 2, the above condition becomes

φ((1 − λ)x1 + λx2) ≤ (1 − λ)φ(x1) + λφ(x2), ∀ λ ∈ [0, 1].

Because x1 and x2 were arbitrarily chosen, the above inequality leads to the convexity of φ and hence the result.

The definitions presented above are for extended-valued functions. They can also be stated for a function that is convex over a convex set F ⊂ Rn by restricting dom φ to F. However, in this book, we will be considering real-valued functions unless otherwise specified. Next we state Jensen's inequality for proper convex functions. The proof can be worked out using induction and the readers are advised to do so.

Theorem 2.51 (Jensen's Inequality) Consider a proper convex function φ : Rn → R̄. Let xi ∈ dom φ and λi ≥ 0 for i = 1, 2, . . . , m with ∑_{i=1}^{m} λi = 1. Then φ is convex if and only if

φ(∑_{i=1}^{m} λi xi) ≤ ∑_{i=1}^{m} λi φ(xi)

for every such collection of xi and λi, i = 1, 2, . . . , m.
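A quick numerical sanity check of Jensen's inequality is given below; this Python sketch is an editorial addition, and the convex function, the points, and the weights are arbitrary choices.

```python
import numpy as np

# Check Jensen's inequality (Theorem 2.51) for phi(x) = ||x||^2.
rng = np.random.default_rng(1)
phi = lambda x: float(np.dot(x, x))

pts = rng.normal(size=(5, 3))              # x1, ..., x5 in R^3
lam = rng.uniform(size=5)
lam /= lam.sum()                           # lambda_i >= 0, sum = 1

lhs = phi(lam @ pts)                       # phi(sum lambda_i x_i)
rhs = sum(l * phi(x) for l, x in zip(lam, pts))
assert lhs <= rhs + 1e-12
```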
Consider a set F ⊂ Rn. The indicator function δF : Rn → R̄ of the set F is defined as

δF(x) = 0 if x ∈ F, and δF(x) = +∞ otherwise.

It can be easily shown that δF is lsc and convex if and only if F is closed and convex, respectively. Also, for sets F1, F2 ⊂ Rn,

δF1∩F2(x) = δF1(x) + δF2(x).

The indicator function plays an important role in the study of optimality conditions by converting a constrained problem into an unconstrained one. Consider a constrained programming problem

min f(x)  subject to  x ∈ C,

where f : Rn → R and C ⊂ Rn. Then the associated unconstrained problem is

min f0(x)  subject to  x ∈ Rn,

where f0 : Rn → R̄ is the function given by f0(x) = f(x) + δC(x), that is,

f0(x) = f(x) if x ∈ C, and f0(x) = +∞ otherwise.

We will look into this aspect more when we study the derivations of the optimality conditions for the convex programming problem (CP) presented in Chapter 1 in the subsequent chapters.

For a set F ⊂ Rn, the distance function dF : Rn → R to F from a point x̄ is defined as

dF(x̄) = inf_{x∈F} ‖x − x̄‖.

For a convex set F, the distance function dF is a convex function. If the infimum is attained, say at x̃ ∈ F, that is,

inf_{x∈F} ‖x − x̄‖ = ‖x̃ − x̄‖,

then x̃ is said to be a projection of x̄ on F and is denoted by projF(x̄). Below we present an important result on projection.

Proposition 2.52 Consider a closed convex set F ⊂ Rn and x̄ ∈ Rn. Then x̃ ∈ projF(x̄) if and only if

⟨x̄ − x̃, x − x̃⟩ ≤ 0, ∀ x ∈ F.   (2.12)
Proof. Suppose that the inequality (2.12) holds for x̃ ∈ F and x̄ ∈ Rn. For any x ∈ F, consider

‖x − x̄‖2 = ‖x − x̃‖2 + ‖x̃ − x̄‖2 − 2⟨x̄ − x̃, x − x̃⟩ ≥ ‖x̃ − x̄‖2 − 2⟨x̄ − x̃, x − x̃⟩, ∀ x ∈ F.

Because (2.12) is assumed to hold, the above condition leads to

‖x − x̄‖2 ≥ ‖x̃ − x̄‖2, ∀ x ∈ F,

thereby implying that x̃ ∈ projF(x̄).

Conversely, suppose that x̃ ∈ projF(x̄). Consider any x ∈ F and for α ∈ [0, 1] define xα = (1 − α)x̃ + αx ∈ F. Therefore,

‖x̄ − xα‖2 = ‖(1 − α)(x̄ − x̃) + α(x̄ − x)‖2 = (1 − α)2 ‖x̄ − x̃‖2 + α2 ‖x̄ − x‖2 + 2α(1 − α)⟨x̄ − x̃, x̄ − x⟩.

Observe that, as a function of α, ‖x̄ − xα‖2 has a point of minimum over [0, 1] at α = 0. Thus, ∇α{‖x̄ − xα‖2}|α=0 ≥ 0, which implies

2(−‖x̄ − x̃‖2 + ⟨x̄ − x, x̄ − x̃⟩) ≥ 0.

The above inequality leads to

−⟨x̄ − x̃, x̄ − x̃⟩ + ⟨x̄ − x, x̄ − x̃⟩ = ⟨x̄ − x̃, x̃ − x⟩ ≥ 0, ∀ x ∈ F,

thereby yielding (2.12) and hence completing the proof.
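Proposition 2.52 can be checked numerically whenever the projection is available in closed form. The sketch below is an editorial addition in Python; it uses the box F = [0, 1]^n, for which the projection is a componentwise clip, and the dimension and sample points are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
x_bar = rng.normal(scale=2.0, size=n)
x_tilde = np.clip(x_bar, 0.0, 1.0)         # projection of x_bar onto F = [0,1]^n

# Inequality (2.12): <x_bar - x_tilde, x - x_tilde> <= 0 for every x in F.
for _ in range(1000):
    x = rng.uniform(0.0, 1.0, size=n)      # a point of F
    assert np.dot(x_bar - x_tilde, x - x_tilde) <= 1e-12
```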
Other classes of functions that are also convex in nature are the sublinear functions and the support functions. These classes of functions will be discussed in the next subsection. But before that we present some operations on convex functions that again yield convex functions.

Proposition 2.53 (i) Consider proper convex functions φi : Rn → R̄ and αi ≥ 0, i = 1, 2, . . . , m. Then φ = ∑_{i=1}^{m} αi φi is also a convex function.

(ii) Consider a proper convex function φ : Rn → R̄ and a nondecreasing proper convex function ψ : R → R̄. Then the composition function defined as (ψ ◦ φ)(x) = ψ(φ(x)) is a convex function provided ψ(+∞) = +∞.

(iii) Consider a family of proper convex functions φi : Rn → R̄, i ∈ I, where I is an arbitrary index set. Then φ = sup_{i∈I} φi is a convex function.
(iv) Consider a convex set F ⊂ Rn+1 . Then φ(x) = inf{α ∈ R : (x, α) ∈ F } is convex.
Proof. (i) By Definition 2.46 of convexity, for any x, y ∈ Rn and any λ ∈ [0, 1],

φi((1 − λ)x + λy) ≤ (1 − λ)φi(x) + λφi(y), i = 1, 2, . . . , m.

As αi ≥ 0, i = 1, 2, . . . , m, multiplying the above inequality by αi and adding them leads to

∑_{i=1}^{m} αi φi((1 − λ)x + λy) ≤ (1 − λ) ∑_{i=1}^{m} αi φi(x) + λ ∑_{i=1}^{m} αi φi(y),

thereby yielding the convexity of ∑_{i=1}^{m} αi φi.

(ii) By the convexity of φ, for every x, y ∈ Rn and for every λ ∈ [0, 1],

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y).

Because ψ is a nondecreasing convex function, for every x, y ∈ Rn,

ψ(φ((1 − λ)x + λy)) ≤ ψ((1 − λ)φ(x) + λφ(y)) ≤ (1 − λ)ψ(φ(x)) + λψ(φ(y)), ∀ λ ∈ [0, 1].

Thus, (ψ ◦ φ) is a convex function.

(iii) Observe that epi φ = ∩_{i∈I} epi φi, which on applying Proposition 2.3 (i) leads to the convexity of epi φ. Now invoking Proposition 2.48 yields the convexity of φ.

(iv) Consider any arbitrary ε > 0. Then for any xi ∈ Rn, i = 1, 2, there exist (xi, αi) ∈ F, i = 1, 2, such that αi ≤ φ(xi) + ε. By the convexity of F, for any λ ∈ [0, 1],

φ((1 − λ)x1 + λx2) ≤ (1 − λ)α1 + λα2 ≤ (1 − λ)φ(x1) + λφ(x2) + ε.

Because the above condition holds for every ε > 0, taking the limit as ε → 0, the above condition reduces to

φ((1 − λ)x1 + λx2) ≤ (1 − λ)φ(x1) + λφ(x2), ∀ λ ∈ [0, 1],

thereby leading to the convexity of φ.
The proof of (iv) is from Hiriart-Urruty and Lemar´echal [63]. These properties play an important role in convex analysis. From the earlier discussions we have that a constrained problem can be equivalently expressed as an unconstrained problem using the indicator function. Under the convexity assumptions as in the convex programming problem (CP ) and using (i) of the above proposition, one has that f0 (x) = (f + δC )(x) is a convex function, thereby reducing (CP ) to an unconstrained convex programming problem that then
leads to the KKT optimality conditions under some assumptions, as we shall see in Chapter 3. The property (ii) of Proposition 2.53 leads to the formulation of conjugate functions. We will discuss this class of functions later in this chapter, as it will also play a pivotal role in the study of convex optimization theory.

Next we define the infimal convolution, or simply inf-convolution, of convex functions. The motivation for this operation comes from the sum of epigraphs and the infimum operation as in (iv) of the above proposition. Consider two proper convex functions φi : Rn → R̄, i = 1, 2. Then by Proposition 2.3 (ii), the set F = epi φ1 + epi φ2 is a convex set in Rn × R. Explicitly, F is expressed as

F = {(x1 + x2, α1 + α2) ∈ Rn × R : (xi, αi) ∈ epi φi, i = 1, 2}.

Then by (iv) of Proposition 2.53, the function

φ(x) = inf{α1 + α2 : (x1 + x2, α1 + α2) ∈ F, x1 + x2 = x}

is a convex function. This function φ can be reduced to the form known as the inf-convolution of φ1 and φ2, as defined below.

Definition 2.54 Consider proper convex functions φi : Rn → R̄, i = 1, 2. Then the infimal convolution or inf-convolution of φ1 and φ2 is denoted by φ1 □ φ2 : Rn → R̄ and defined as

(φ1 □ φ2)(x̄) = inf{φ1(x1) + φ2(x2) : xi ∈ Rn, i = 1, 2, x1 + x2 = x̄} = inf{φ1(x) + φ2(x̄ − x) : x ∈ Rn}.

A simple consequence of the inf-convolution is the distance function. Consider a convex set F ⊂ Rn. Then the distance function φ(x) = dF(x) can be expressed as φ(x) = (φ1 □ φ2)(x), where φ1(x) = ‖x‖ and φ2(x) = δF(x). As it turns out, the inf-convolution of convex functions is again convex. To verify this claim, we will need the following result on the sum of strict epigraphs. The result appears in Moreau [89], but here we present its proof, and that of the proposition to follow, from Attouch, Buttazzo, and Michaille [3].

Proposition 2.55 Consider two proper convex functions φi : Rn → R̄, i = 1, 2. Then

epis (φ1 □ φ2) = epis φ1 + epis φ2.   (2.13)

Consequently,

cl epi (φ1 □ φ2) = cl (epi φ1 + epi φ2).   (2.14)
Proof. Consider (x, α) ∈ epis (φ1 □ φ2), which implies (φ1 □ φ2)(x) < α. The above inequality holds if and only if there exist x1, x2 ∈ Rn with x1 + x2 = x such that φ1(x1) + φ2(x2) < α. This is equivalent to the existence of x1, x2 ∈ Rn and α1, α2 ∈ R with x1 + x2 = x and α1 + α2 = α such that φi(xi) < αi, i = 1, 2, thereby establishing (2.13).

By Definition 2.49 of the strict epigraph, it is obvious that for any function φ,

epis φ ⊂ epi φ  and  cl epis φ = cl epi φ,

which along with (2.13) implies that

epi (φ1 □ φ2) ⊂ cl epis (φ1 □ φ2) = cl (epis φ1 + epis φ2) ⊂ cl (epi φ1 + epi φ2).   (2.15)

Now suppose that (xi, αi) ∈ epi φi, i = 1, 2, which along with the definition of inf-convolution implies that

(φ1 □ φ2)(x1 + x2) ≤ φ1(x1) + φ2(x2) ≤ α1 + α2.

Therefore, (x1 + x2, α1 + α2) ∈ epi (φ1 □ φ2). Because (xi, αi) ∈ epi φi, i = 1, 2, were arbitrary,

epi φ1 + epi φ2 ⊂ epi (φ1 □ φ2).

Taking closure on both sides of the above relation along with the condition (2.15) yields the condition (2.14), as desired.

Using this proposition, we now move on to show that the inf-convolution of proper convex functions is also convex.

Proposition 2.56 Consider two proper convex functions φ1, φ2 : Rn → R̄. Then φ1 □ φ2 is also convex.

Proof. From Proposition 2.55, epis (φ1 □ φ2) = epis φ1 + epis φ2. As φ1 and φ2 are convex functions, by Proposition 2.50, epis φ1 and epis φ2 are convex sets. This along with the above condition implies that epis (φ1 □ φ2) is convex, which again by the characterization of convex functions, Proposition 2.50, leads to the convexity of φ1 □ φ2.
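The identity dF = ‖·‖ □ δF mentioned above can be observed numerically by discretizing the infimum in Definition 2.54. The following Python sketch is an editorial addition; the grid, the interval F = [1, 2], and the tolerances are arbitrary choices.

```python
import numpy as np

grid = np.linspace(-3.0, 4.0, 701)
phi1 = np.abs(grid)                                           # ||.||
phi2 = np.where((grid >= 1.0) & (grid <= 2.0), 0.0, np.inf)  # indicator of F = [1, 2]

def inf_conv(f, g, xs):
    # (f box g)(x) ~ min over grid points y of f(x - y) + g(y)
    out = np.empty_like(xs)
    for i, x in enumerate(xs):
        vals = np.interp(x - xs, xs, f, left=np.inf, right=np.inf) + g
        out[i] = vals.min()
    return out

d_F = inf_conv(phi1, phi2, grid)
assert abs(d_F[np.argmin(np.abs(grid - 0.0))] - 1.0) < 1e-2   # d_F(0) = 1
assert abs(d_F[np.argmin(np.abs(grid - 3.0))] - 1.0) < 1e-2   # d_F(3) = 1
```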
An application of the inf-convolution can be seen in the following property of the indicator function. For convex sets F1 and F2 in Rn, the indicator function of the sum of the sets is δF1+F2 = δF1 □ δF2. The importance of the inf-convolution will be discussed in the study of conjugate functions later in this chapter. For more on the inf-convolution, the readers may refer to Strömberg [107].

Just as the inf-convolution is motivated by taking F = epi φ1 + epi φ2, the notion of the convex hull of a function is motivated by taking F = co epi φ. Below we define this concept.

Definition 2.57 The convex hull of a nonconvex function φ is denoted by co φ and is obtained from Proposition 2.53 (iv) with F = co epi φ. Therefore, by Theorem 2.7, (x, α) ∈ F if and only if there exist (xi, αi) ∈ epi φ and λi ≥ 0, i = 1, 2, . . . , m, with ∑_{i=1}^{m} λi = 1 such that

(x, α) = λ1(x1, α1) + λ2(x2, α2) + . . . + λm(xm, αm) = (λ1 x1 + λ2 x2 + . . . + λm xm, λ1 α1 + λ2 α2 + . . . + λm αm).

Because φ(xi) ≤ αi, i = 1, 2, . . . , m, Proposition 2.53 (iv) leads to

co φ(x) = inf{λ1 φ(x1) + λ2 φ(x2) + . . . + λm φ(xm) ∈ R : λ1 x1 + λ2 x2 + . . . + λm xm = x, λi ≥ 0, i = 1, 2, . . . , m, ∑ λi = 1}.

It is the greatest convex function majorized by φ. If φ is convex, co φ = φ. The convex hull of an arbitrary collection of functions {φi : i ∈ I} is denoted by co ∪_{i∈I} φi and is the convex hull of the pointwise infimum of the collection, that is,

co ∪_{i∈I} φi = co (inf_{i∈I} φi).

It is the function obtained from Proposition 2.53 (iv) by taking

F = co ∪_{i∈I} epi φi.

It is the greatest convex function majorized by every φi, i ∈ I. The closed convex hull of a function φ is denoted by cl co φ and defined as

cl co φ(x′) = sup{⟨ξ, x′⟩ − α : ⟨ξ, x⟩ − α ≤ φ(x), ∀ x ∈ Rn}.

Similar to the closure of a function, cl co φ satisfies the condition epi cl co φ = cl co epi φ.
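In one dimension, Carathéodory's theorem allows the infimum defining co φ to be taken over combinations of at most two points, which makes a brute-force approximation easy. The Python sketch below is an editorial addition; the grid and the double-well function are arbitrary examples.

```python
import numpy as np

xs = np.linspace(-2.0, 2.0, 81)
phi = (xs**2 - 1.0)**2                     # "double well", not convex

def convex_hull_1d(xs, vals):
    # co(phi)(x) ~ best convex combination of two grid values bracketing x
    env = vals.copy()
    for k, x in enumerate(xs):
        for i in range(len(xs)):
            for j in range(i + 1, len(xs)):
                if xs[i] <= x <= xs[j]:
                    lam = (x - xs[i]) / (xs[j] - xs[i])
                    env[k] = min(env[k], (1 - lam) * vals[i] + lam * vals[j])
    return env

co_phi = convex_hull_1d(xs, phi)
assert co_phi[np.argmin(np.abs(xs))] < 1e-6    # co(phi)(0) = 0, while phi(0) = 1
assert np.all(co_phi <= phi + 1e-12)           # co(phi) is majorized by phi
```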
For more details on the convex hull and the closed convex hull of a function, readers are advised to refer to Hiriart-Urruty and Lemar´echal [62, 63]. Now before moving on with the properties of convex functions, we briefly discuss an important class of convex functions, namely sublinear and support functions, which as we will see later in the chapter are important in the study of convex analysis.
2.3.1
Sublinear and Support Functions
Definition 2.58 A proper function p : Rn → R̄ is said to be a sublinear function if and only if p is subadditive and positively homogeneous, that is,

p(x1 + x2) ≤ p(x1) + p(x2), ∀ x1, x2 ∈ Rn (subadditive property),
p(λx) = λp(x), ∀ x ∈ Rn, ∀ λ > 0 (positively homogeneous property).

From the positive homogeneity property, p(0) = λp(0) for every λ > 0, which is satisfied for p(0) = 0 as well as p(0) = +∞. Most sublinear functions satisfy p(0) = 0. As p is proper, dom p is nonempty. So if p(x) < +∞, then by the positive homogeneity property, p(λx) < +∞ for every λ > 0, which implies that dom p is a cone. Observe that as p is positively homogeneous, for x, y ∈ Rn and any λ ∈ (0, 1),

p((1 − λ)x) = (1 − λ)p(x)  and  p(λy) = λp(y).

By the subadditive property of p,

p((1 − λ)x + λy) ≤ p((1 − λ)x) + p(λy) = (1 − λ)p(x) + λp(y), ∀ λ ∈ (0, 1).

Because x, y ∈ Rn were arbitrary, p is convex. Therefore, sublinear functions form a particular class of convex functions and hence dom p is convex. Next we present a proposition that gives the geometric characterization of sublinear functions. For the proof, we will also need the equivalent form of positive homogeneity from Hiriart-Urruty and Lemaréchal [63], according to which

p(λx) ≤ λp(x), ∀ x ∈ Rn, ∀ λ > 0.

Note that if p is positively homogeneous, then the above condition holds trivially. Conversely, if the above inequality holds, then for any λ > 0,

p(x) = p(λ−1 λx) ≤ (1/λ) p(λx), ∀ x ∈ Rn,

which along with the preceding inequality yields that p is positively homogeneous.

Theorem 2.59 Consider a proper function p : Rn → R̄. p is a sublinear function if and only if its epigraph, epi p, is a convex cone in Rn × R.
Proof. Suppose that p is sublinear. From the above discussion, p is a convex function as well and thus epi p is convex. Consider (x, α) ∈ epi p, which implies that p(x) ≤ α. By the positively homogeneous property p(λx) = λp(x) ≤ λα, λ > 0, which implies that λ(x, α) = (λx, λα) ∈ epi p for every λ > 0. Also, (0, 0) ∈ epi p. Thus, epi p is a cone. Conversely, suppose that epi p is a convex cone. By Theorem 2.20, for any (xi , αi ) ∈ epi p, i = 1, 2, (x1 + x2 , α1 + α2 ) ∈ epi p. In particular for αi = p(xi ), i = 1, 2, the above condition leads to p(x1 + x2 ) ≤ p(x1 ) + p(x2 ). Because x1 , x2 ∈ Rn are arbitrarily chosen, the above inequality implies that p is subadditive. Also, as epi p is a cone, any (x, α) ∈ epi p implies that λ(x, α) ∈ epi p for every λ > 0. In particular, for α = p(x), p(λx) ≤ λp(x), ∀ λ > 0, which is an equivalent definition for positive homogeneity, as discussed before. Hence, p is a sublinear function. Sublinear functions are particular class of convex functions. For a convex cone K ⊂ Rn , the indicator function δK and the distance function dK are also sublinear functions. An important class of sublinear functions is that of support functions. We will discuss the support functions in brief. ¯ Definition 2.60 Consider a set F ⊂ Rn . The support function, σF : Rn → R, to F at x ¯ ∈ Rn is defined as σF (¯ x) = sup h¯ x, xi. x∈F
From Proposition 1.7 (ii) and (iii), it is obvious that a support function is sublinear. As it is the supremum of linear functions, which are continuous, a support function is lsc. For a closed convex cone K ⊂ Rn,

σK(x̄) = 0 if ⟨x̄, x⟩ ≤ 0 for every x ∈ K, and σK(x̄) = +∞ otherwise,

which is nothing but the indicator function of the polar cone K◦. Equivalently,

σK = δK◦  and  δK = σK◦.
Next we present some properties of the support functions, the proofs of which are from Burke and Deng [22], Hiriart-Urruty and Lemar´echal [63], and Rockafellar [97].
Proposition 2.61 (i) Consider two convex sets F1 and F2 in Rn. Then

F1 ⊂ F2 =⇒ σF1(x) ≤ σF2(x), ∀ x ∈ Rn.

(ii) For a set F ⊂ Rn, one has σF = σcl F = σco F = σcl co F.

(iii) Consider a convex set F ⊂ Rn. Then x̄ ∈ cl F if and only if

⟨x∗, x̄⟩ ≤ σF(x∗), ∀ x∗ ∈ Rn.

(iv) For convex sets F1, F2 ⊂ Rn, cl F1 ⊂ cl F2 if and only if

σF1(x∗) ≤ σF2(x∗), ∀ x∗ ∈ Rn.

(v) Let F1, F2 ⊂ Rn be convex sets and K ⊂ Rn be a closed convex cone. Then

σF1(x) ≤ σF2(x), ∀ x ∈ K ⇐⇒ σF1(x) ≤ σF2+K◦(x), ∀ x ∈ Rn ⇐⇒ F1 ⊂ cl(F2 + K◦).

(vi) The support function of a set F ⊂ Rn is finite everywhere if and only if F is bounded.

Proof. (i) By Proposition 1.7 (i), it is obvious that for F1 ⊂ F2,

sup_{x1∈F1} ⟨x, x1⟩ ≤ sup_{x2∈F2} ⟨x, x2⟩, ∀ x ∈ Rn,

thereby leading to the desired result.

(ii) As ⟨x, .⟩ is linear and hence continuous over Rn, on taking the supremum over F,

σF(x) = σcl F(x), ∀ x ∈ Rn.

Because F ⊂ co F, by (i), σF(x) ≤ σco F(x), ∀ x ∈ Rn. Also, for any x′ ∈ co F, by the Carathéodory Theorem, Theorem 2.8, there exist x′i ∈ F and λi ≥ 0, i = 1, 2, . . . , n + 1, satisfying ∑_{i=1}^{n+1} λi = 1 such that x′ = ∑_{i=1}^{n+1} λi x′i. Therefore,

⟨x, x′⟩ = ∑_{i=1}^{n+1} λi ⟨x, x′i⟩ ≤ ∑_{i=1}^{n+1} λi σF(x) = σF(x).

Because x′ ∈ co F was arbitrary, the above inequality holds for every x′ ∈ co F and hence

σco F(x) ≤ σF(x), ∀ x ∈ Rn,
thus yielding the equality as desired. These relations also imply that σF = σcl co F.

(iii) Invoking Theorem 2.27, the desired result holds.

(iv) By (i) and (ii), cl F1 ⊂ cl F2 implies that

σF1(x∗) ≤ σF2(x∗), ∀ x∗ ∈ Rn.

Conversely, suppose that the above inequality holds, which implies that for every x ∈ cl F1,

⟨x∗, x⟩ ≤ σF2(x∗), ∀ x∗ ∈ Rn.

Therefore, by (iii), x ∈ cl F2. Because x ∈ cl F1 was arbitrary, cl F1 ⊂ cl F2, thereby completing the proof.

(v) Consider x ∈ K. As F2 ⊂ F2 + K◦, (i) along with Proposition 1.7 and the definition of the polar cone leads to

σF2(x) ≤ σF2+K◦(x) = σF2(x) + σK◦(x) ≤ σF2(x),

that is, σF2(x) = σF2+K◦(x) for x ∈ K. Now if x ∉ K, there exists z ∈ K◦ such that ⟨z, x⟩ > 0. Consider y ∈ F2. Therefore, in the limit λ → +∞, ⟨y + λz, x⟩ → +∞, which implies σF2+K◦(x) = +∞, thus establishing the first equivalence. The second equivalence can be obtained by (ii) and (iv).

(vi) Suppose that F is bounded, which implies that there exists M > 0 such that

‖x′‖ ≤ M, ∀ x′ ∈ F.

Therefore, by the Cauchy–Schwarz Inequality, Proposition 1.1,

⟨x, x′⟩ ≤ ‖x‖ ‖x′‖ ≤ ‖x‖ M, ∀ x′ ∈ F,

which implies that σF(x) ≤ ‖x‖ M for every x ∈ Rn. Thus, σF is finite everywhere.

Conversely, suppose that σF is finite everywhere. In the next section, we will present a result establishing the local Lipschitz property and hence continuity of a convex function, Theorem 2.72. This leads to local boundedness. Therefore there exists M such that

⟨x, x′⟩ ≤ σF(x) ≤ M, ∀ (x, x′) ∈ B × F.

If x′ ≠ 0, taking x = x′/‖x′‖, the above inequality leads to ‖x′‖ ≤ M for every x′ ∈ F, thereby establishing the boundedness of F and hence proving the result.
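Parts (ii) and (vi) of Proposition 2.61 lend themselves to a simple numerical experiment. The Python sketch below is an editorial addition; the random point cloud and the sampling of co F by convex combinations are arbitrary modelling choices.

```python
import numpy as np

rng = np.random.default_rng(4)
F = rng.normal(size=(6, 2))                       # six points of R^2 (a bounded set)

def sigma(points, x):
    return float(np.max(points @ x))              # sup of <x, x'> over a finite sample

# sample points of co F as random convex combinations of the rows of F
W = rng.dirichlet(np.ones(len(F)), size=2000)
co_F_samples = W @ F

for _ in range(100):
    x = rng.normal(size=2)
    s = sigma(F, x)
    assert np.isfinite(s)                         # (vi): F bounded => sigma_F finite
    assert sigma(co_F_samples, x) <= s + 1e-9     # consistent with (ii): sigma_co F = sigma_F
```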
As mentioned earlier, the support function is lsc and sublinear. Similarly, a closed sublinear function can be viewed as a support function. We end this subsection by presenting this important result to assert the preceding statement. The proof is again due to Hiriart-Urruty and Lemar´echal [63].
Theorem 2.62 For a proper lsc sublinear function σ : Rn → R̄, there exists a linear function minorizing σ. In fact, σ is the supremum of the linear functions minorizing it; that is, σ is the support function of the closed convex set given by

Fσ = {x ∈ Rn : ⟨x, d⟩ ≤ σ(d), ∀ d ∈ Rn}.

Proof. Because sublinear functions are convex, σ is a proper lsc convex function. As we will discuss in one of the later sections, every proper lsc convex function can be represented as a pointwise supremum of the affine functions majorized by it, Theorem 2.100, so there exists (x, α) ∈ Rn × R such that

⟨x, d⟩ − α ≤ σ(d), ∀ d ∈ Rn.

As σ(0) = 0, the preceding inequality leads to α ≥ 0. By the positive homogeneity of σ,

⟨x, d⟩ − α/λ ≤ σ(d), ∀ d ∈ Rn, ∀ λ > 0.

Taking the limit as λ → +∞,

⟨x, d⟩ ≤ σ(d), ∀ d ∈ Rn,

that is, σ is minorized by linear functions. As mentioned in the beginning, convex functions are suprema of affine functions, which for sublinear functions can be restricted to linear functions. Therefore, by Theorem 2.100,

σ(d) = sup_{x∈Fσ} ⟨x, d⟩,

and hence σ is the support function of Fσ.
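Theorem 2.62 can be made concrete with σ(d) = ‖d‖1, whose associated set Fσ is the box [−1, 1]^n, so that maximizing ⟨x, d⟩ over the box recovers σ. The Python check below is an editorial sketch; the dimension and the random test directions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3

def sigma(d):
    return float(np.abs(d).sum())       # sigma(d) = ||d||_1, lsc and sublinear

def support_of_box(d):
    # sup{<x, d> : -1 <= x_i <= 1} is attained at x_i = sign(d_i)
    return float(np.sign(d) @ d)

for _ in range(100):
    d = rng.normal(size=n)
    assert abs(sigma(d) - support_of_box(d)) < 1e-12
```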
After discussing these classes of convex functions, we move on to discuss the nature of convex functions.
2.3.2
Continuity Property
We have already discussed the operations that preserve convexity of the functions. Now we shall study the continuity, Lipschitzian and differentiability properties of the function. But before doing so, let us recall proper functions. ¯ is proper if φ(x) > −∞ for every x ∈ Rn and A function φ : Rn → R dom φ is nonempty, that is, epi φ is nonempty and contains no vertical lines. A function that is not proper is called an improper function. We know that for a convex function, the epigraph is a convex set. If φ is an improper convex function such that there exists x ¯ ∈ ri dom φ such that φ(¯ x) = −∞, then the convexity of epi φ is broken unless φ(x) = −∞ for every x ∈ ri dom φ. Such
FIGURE 2.7: Epigraphs of improper functions φ1 and φ2 .
functions can, however, have finite values at boundary points. For example, consider φ1 : R → R̄ given by

φ1(x) = −∞ if |x| < 1, φ1(x) = 0 if |x| = 1, and φ1(x) = +∞ if |x| > 1.

Here, φ1 is an improper convex function with finite values at the boundary points of the domain, x = −1 and x = 1. Also it is obvious that φ cannot have a finite value on ri dom φ and the value −∞ at a boundary point. To see this, suppose that x ∈ ri dom φ is such that φ(x) > −∞ and let y be a boundary point of dom φ with φ(y) = −∞. By the convexity of φ,

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y), ∀ λ ∈ (0, 1),

which implies that for (1 − λ)x + λy ∈ ri dom φ,

φ((1 − λ)x + λy) = −∞.

This contradicts the convexity of the epigraph. This aspect can be easily visualized by modifying the previous example as follows. Define an improper function φ2 : R → R̄ as

φ2(x) = −∞ if x = 1, φ2(x) = 0 if −1 ≤ x < 1, and φ2(x) = +∞ if |x| > 1.

Obviously φ2 cannot be convex, as epi φ2 is not convex, as shown in Figure 2.7. These discussions can be stated as the following result from Rockafellar [97].
Proposition 2.63 Consider an improper convex function φ : Rn → R̄. Then φ(x) = −∞ for every x ∈ ri dom φ. Thus φ is necessarily infinite except perhaps at boundary points of dom φ. Moreover, an lsc improper convex function can have no finite values.

As discussed in Chapter 1, the continuity of a function plays an important role in the study of its bounds and hence in optimization problems. Before discussing the continuity property of convex functions we shall present some results on the interior of the epigraph of a convex function and the closure of a convex function.

Proposition 2.64 Consider a convex function φ : Rn → R̄ such that ri dom φ is nonempty. Then ri epi φ is also nonempty and is given by

ri epi φ = {(x, α) ∈ Rn × R : x ∈ ri dom φ, φ(x) < α}.

Equivalently, (x̄, ᾱ) ∈ ri epi φ if and only if ᾱ > lim sup_{x→x̄} φ(x).

Proof. To obtain the result for ri epi φ, it is sufficient to derive it for int epi φ, that is,

int epi φ = {(x, α) ∈ Rn × R : x ∈ int dom φ, φ(x) < α}.

By Definition 2.12, for (x̄, ᾱ) ∈ int epi φ, there exists ε > 0 such that (x̄, ᾱ) + εB ⊂ epi φ, which implies that x̄ ∈ int dom φ along with φ(x̄) < ᾱ. As (x̄, ᾱ) ∈ int epi φ is arbitrary,

int epi φ ⊂ {(x, α) ∈ Rn × R : x ∈ int dom φ, φ(x) < α}.

Now suppose that x̄ ∈ int dom φ and φ(x̄) ≤ ᾱ. Consider x1, x2, . . . , xm ∈ dom φ such that x̄ ∈ int F where F = co {x1, x2, . . . , xm}. Define

γ = max_{i=1,2,...,m} φ(xi).

By the convexity of F, for any x ∈ F there exist λi ≥ 0, i = 1, 2, . . . , m, satisfying ∑_{i=1}^{m} λi = 1 such that

x = ∑_{i=1}^{m} λi xi.

Because φ is convex,

φ(x) ≤ ∑_{i=1}^{m} λi φ(xi) ≤ ∑_{i=1}^{m} λi γ = γ.
Therefore, the open set {(x, α) ∈ Rn × R : x ∈ int F, γ < α} ⊂ epi φ. In particular, for any α > γ, (¯ x, α) ∈ int epi φ. Thus, (¯ x, α) ¯ can be considered as lying on the interior of line segment passing through the points (¯ x, α) ∈ int epi φ, which by the line segment principle, Proposition 2.14, (¯ x, α) ¯ ∈ int epi φ. Because (¯ x, α) ¯ is arbitrary, int epi φ ⊃ {(x, α) ∈ Rn × R : x ∈ int dom φ, φ(x) < α}, thereby leading to the requisite result. Now we move on to prove the equivalent part for ri epi φ. Suppose that (¯ x, α) ¯ ∈ ri epi φ. Therefore, by the earlier characterization one can always find an ε > 0 such that x ¯ ∈ ri dom φ
and
sup φ(x) < α ¯. x∈Bε (¯ x)
Taking the limit as ε → 0 along with Definition 1.5 of limit supremum, lim sup φ(x) < α. ¯ x→¯ x
Conversely, suppose that for (¯ x, α) ¯ the strict inequality condition holds which implies lim sup φ(x) < α ¯. ε↓0 x∈Bε (¯ x)
Therefore, there exists ε > 0 such that sup φ(x) < α ¯, x∈Bε (¯ x)
which yields φ(¯ x) < α ¯ with x ¯ ∈ ri dom φ, thereby proving the equivalent result. Note that this equivalence can be established for int epi φ as well. Note that the above result can also be obtained for the relative interior of the epigraph as it is nothing but the interior relative to the affine hull of the epigraph. As a consequence of the above characterization of ri F , we have the following result from Rockafellar [97]. ¯ to be a proper convex Corollary 2.65 Consider α ∈ R and φ : Rn → R function such that for some x ∈ dom φ, φ(x) < α. Then actually φ(x) < α for some x ∈ ri dom φ. Proof. Define a hyperplane H as H = {(x, µ) ∈ Rn × R : µ < α}.
© 2012 by Taylor & Francis Group, LLC
2.3 Convex Functions
79
Because for some x ∈ Rn , φ(x) < α, in particular for µ = φ(x), we have that H meets epi φ. Invoking Corollary 2.16 (ii), H also meets ri epi φ, which by Proposition 2.64 implies that there exists x ∈ ri dom φ such that φ(x) < α, thereby yielding the desired result. ¯ Recall that in the previous chapter the closure of a function φ : Rn → R was defined as cl φ(¯ x) = lim inf φ(x), ∀ x ¯ ∈ Rn , x→¯ x
which is a bit complicated to compute. In case of a convex function, it is much easier to compute and is presented in the next proposition. The proof is from Rockafellar [97]. ¯ Then cl φ Proposition 2.66 Consider a proper convex function φ : Rn → R. agrees with φ in ri dom φ and for x ˆ ∈ ri dom φ, cl φ(x) = lim φ((1 − λ)ˆ x + λx), ∀ x ∈ Rn . λ→1
Proof. From Definition 1.11 of closure of a function, cl φ is lsc and cl φ ≤ φ. Therefore, by the lower semicontinuity of cl φ, lim inf (cl φ)((1 − λ)ˆ x + λx) = cl φ(x) ≤ lim inf φ((1 − λ)ˆ x + λx). λ→1
λ→1
To prove the result, we will establish the following inequality cl φ(x) ≥ lim sup φ((1 − λ)ˆ x + λx). λ→1
Consider α ∈ R such that cl φ(x) ≤ α, which implies that (x, α) ∈ epi cl φ = cl epi φ. Consider any (ˆ x, α) ˆ ∈ ri epi φ. Applying the Line Segment Principle, Proposition 2.14, (1 − λ)(ˆ x, α) ˆ + λ(x, α) ∈ ri epi φ, ∀ λ ∈ [0, 1). By Proposition 2.64, φ((1 − λ)ˆ x + λx) < (1 − λ)ˆ α + λα, ∀ λ ∈ [0, 1). Taking the limit superior as λ → 1, the above inequality leads to lim sup φ((1 − λ)ˆ x + λx) ≤ lim sup(1 − λ)ˆ α + λα = α. λ→1
λ→1
In particular, taking α = cl φ(x) in the above inequality yields the desired result.
© 2012 by Taylor & Francis Group, LLC
80
Tools for Convex Optimization In the relation cl φ(x) = lim φ((1 − λ)ˆ x + λx), λ→1
in particular, taking x = x ˆ ∈ ri dom φ leads to cl φ(ˆ x) = φ(ˆ x). Because x ˆ ∈ ri dom φ is arbitrary, cl φ = φ on ri dom φ, that is, cl φ agrees with φ in ri dom φ. Next we present some results from Rockafellar [97] on closure and relative interior. ¯ and let Proposition 2.67 Consider a proper convex function φ : Rn → R α ∈ R such that α > inf x∈Rn φ(x). Then the level sets {x ∈ Rn : φ(x) ≤ α}
and
{x ∈ Rn : φ(x) < α}
have the same closure and relative interior, namely {x ∈ Rn : cl φ(x) ≤ α}
and
{x ∈ Rn : x ∈ ri dom φ, φ(x) < α},
respectively. Proof. Define a hyperplane H = {(x, α) ∈ Rn × R : x ∈ Rn } in Rn+1 . Applying Corollary 2.65 and Proposition 2.64, H intersects ri epi φ, which implies that ri H ∩ ri epi φ = H ∩ ri epi φ 6= ∅. Now consider H ∩ epi φ = {(x, α) ∈ Rn × R : φ(x) ≤ α}. Invoking Corollary 2.16 (iii), cl(H ∩ epi φ) = cl H ∩ cl epi φ = H ∩ epi cl φ,
ri(H ∩ epi φ) = ri H ∩ ri epi φ = H ∩ ri epi φ.
(2.16) (2.17)
The projection of these sets in Rn are, respectively, cl {x ∈ Rn : φ(x) ≤ α} = {x ∈ Rn : cl φ(x) ≤ α},
ri {x ∈ Rn : φ(x) ≤ α} = {x ∈ Rn : x ∈ ri dom φ, φ(x) < α}.
The latter relation implies that ri {x ∈ Rn : φ(x) ≤ α} ⊂ {x ∈ Rn : φ(x) < α} ⊂ {x ∈ Rn : φ(x) ≤ α}. Therefore, by Corollary 2.16 (ii), {x ∈ Rn : φ(x) < α} has the same closure and relative interior as {x ∈ Rn : φ(x) ≤ α}.
© 2012 by Taylor & Francis Group, LLC
2.3 Convex Functions
81
¯ Proposition 2.68 Consider proper convex functions φi : Rn → R, i = 1, 2, . . . , m. If every φi , i = 1, 2, . . . , m, is lsc and φ1 +φ2 +. . .+φm 6≡ +∞, then φ1 + φ2 + . . . + φm is a proper lsc convex function. If φi , i = 1, 2, . . . , m, are not all lsc but ri dom φ1 ∩ ri dom φ2 ∩ . . . ∩ ri dom φm is nonempty, then cl (φ1 + φ2 + . . . + φm ) = cl φ1 + cl φ2 + . . . + cl φm . Proof. Define φ = φ1 + φ2 + . . . + φm and assume x ˆ ∈ ri dom φ = ri (
m \
dom φi ).
i=1
By Proposition 2.66, for every x ∈ Rn , cl φ(x) = lim φ((1 − λ)ˆ x + λx) = lim λ→1
λ→1
m X i=1
φi ((1 − λ)ˆ x + λx).
(2.18)
If φi , i = 1, 2, . . . , m, are all lsc, then the above condition becomes cl φ(x) = φ1 (x) + φ2 (x) + . . . + φm (x), ∀ x ∈ Rn and thus cl φ = φ. Suppose that φi , i = 1, 2, . . . , m, are not all lsc. If m \
i=1
ri dom φi 6= ∅,
by Proposition 2.15 (iii), m \
ri dom φi = ri
i=1
m \
dom φi = ri dom φ.
i=1
Therefore, x ˆ ∈ ri dom φi , i = 1, 2, . . . , m. Again by Proposition 2.66, cl φi (x) = lim φi ((1 − λ)ˆ x + λx), i = 1, 2, . . . , m. λ→1
Therefore, the condition (2.18) becomes cl φ(x) = cl φ1 (x) + cl φ2 (x) + . . . + cl φm (x), ∀ x ∈ Rn , thereby completing the proof.
Using the above propositions, one can prove the continuity property of the convex functions.
© 2012 by Taylor & Francis Group, LLC
82
Tools for Convex Optimization
¯ is continuous on Theorem 2.69 A proper convex function φ : Rn → R ri dom φ. Proof. By Proposition 2.66, cl φ agrees with φ in ri dom φ, which implies that φ is lsc on ri dom φ. Now suppose that x ¯ ∈ ri dom φ. For any α such that (¯ x, α) ∈ ri epi φ, by Proposition 2.64, lim sup φ(x) < α. x→¯ x
Taking the limit as α → φ(¯ x), the preceding condition becomes lim sup φ(x) ≤ φ(¯ x), x→¯ x
thereby implying the upper semicontinuity of φ at x ¯. Because x ¯ ∈ ri dom φ is arbitrary, φ is usc on ri dom φ. Thus φ is continuous on ri dom φ, thereby yielding the desired result. Before moving on to discuss the derivative property of a convex function, we shall discuss its Lipschitzian property. For that we first define Lipschitz and locally Lipschitz functions. Definition 2.70 A function φ : Rn → R is said to be Lipschitz if there exists L > 0 such that |φ(x) − φ(y)| ≤ L kx − yk, ∀ x, y ∈ Rn . The positive number L is called the Lipschitz constant of φ, or φ is said to be Lipschitz with constant L. Definition 2.71 Consider a function φ : Rn → R and x ¯ ∈ Rn . Then φ is said to be locally Lipschitz if there exist Lx¯ > 0 and a neighborhood N (¯ x) of x ¯ such that |φ(x) − φ(y)| ≤ Lx¯ kx − yk, ∀ x, y ∈ N (¯ x). It is a well known that a Lipschitz function is continuous but the converse need not hold. From Theorem 2.69, we know that a convex function function is continuous in the relative interior of its domain. In the results to follow, we show that local boundedness of a convex function implies that the function is continuous as well as is locally Lipschitz. The result is from Attouch, Buttazzo, and Michaille [3]. ¯ and Theorem 2.72 Consider a proper convex function φ : Rn → R x ¯ ∈ dom φ. For some ε > 0 such that sup φ(x) = M < +∞. x∈Bε (¯ x)
Then φ is continuous at x ¯. Moreover, φ is Lipschitz continuous on every ball Bε′ (¯ x) with ε′ < ε and |φ(x) − φ(y)| ≤
© 2012 by Taylor & Francis Group, LLC
2M kx − yk, ∀ x, y ∈ Bε′ (¯ x). ε − ε′
2.3 Convex Functions
83
Proof. Without loss of generality, by translation (that is, by considering the function φ(x + x ¯) − φ(¯ x)), the problem reduces to the case when x ¯ = 0 and φ(0) = 0. Therefore, the local boundedness in the neighborhood of x¯ = 0 reduces to sup φ(x) = M < +∞. x∈Bε (0)
Consider an arbitrary δ ∈ (0, 1] and x ∈ Bδε (0). Now expressing 1 x = (1 − δ)0 + δ( x), δ
1 where x ∈ Bε (0). The convexity of φ along with the local boundedness δ condition leads to 1 φ(x) ≤ (1 − δ)φ(0) + δφ( x) ≤ δM. δ Rewriting 0= where
δ −1 1 x+ ( x), 1+δ 1+δ δ
1 x ∈ Bε (0). Again, the convexity of φ yields δ δ −1 1 δM 1 φ( x) ≤ φ(x) + , 0 = φ(0) ≤ φ(x) + δ 1+δ δ 1+δ 1+δ
which along with the previous condition on φ(x) implies that −δM ≤ φ(x) ≤ δM. Because x ∈ Bδε (0) is arbitrary, |φ(x)| ≤ δM, ∀ x ∈ Bδε (0), thereby establishing the continuity of φ at 0. In the above discussion, in particular for δ = 1, |φ(x + x ¯) − φ(¯ x)| ≤ M, ∀ x ∈ Bε (0).
Consider arbitrary x, y ∈ Bε′ (¯ x) with x 6= y. Denoting δ = ε − ε′ > 0, z =x+ Observe that
δ (x − y) kx − yk
kz − x ¯k
© 2012 by Taylor & Francis Group, LLC
and
λ=
kx − yk . δ + kx − yk
δ (x − y)k kx − yk δ ≤ ky − x kx − yk ¯k + kx − yk = ε′ + δ = ε,
= k(y − x ¯) + +
84
Tools for Convex Optimization
which implies z ∈ Bε (¯ x). Also kx − ykz = (δ + kx − yk)x − δy, which implies that x = (1 − λ)y + λz, ∀ λ ∈ (0, 1). By the convexity of φ, φ(x) ≤ (1 − λ)φ(y) + λφ(x) = φ(y) + λ(φ(z) − φ(y)), which leads to φ(x) − φ(y) ≤ λ(φ(z) − φ(y)) ≤ λ|φ(z) − φ(y)|. Observe that |φ(z) − φ(y)| ≤ |φ(z) − φ(¯ x)| + |φ(y) − φ(¯ x)| ≤ 2M, as z ∈ Bε (¯ x) and y ∈ Bε′ (¯ x) ⊂ Bε (¯ x). Therefore, φ(x) − φ(y) ≤
2M kx − yk kx − yk. 2M ≤ δ + kx − yk δ
Interchanging the roles of x and y yields |φ(x) − φ(y)| ≤
2M kx − yk, δ
thereby establishing the result.
In the above result, we showed that if a proper convex function is locally bounded at a point, then it is locally Lipschitz at that point. As a matter of fact, it is more than that which is presented in the result below, the proof of which is along the lines of Hiriart-Urruty and Lemar´echal [63]. ¯ Then φ is Theorem 2.73 Consider a proper convex function φ : Rn → R. locally Lipschitz on ri dom φ. Proof. Similar to the proof of Proposition 2.14 (i), consider n + 1 linearly independent vectors x1 , x2 , . . . , xn+1 ∈ dom φ such that x ¯ ∈ ri co {x1 , x2 , . . . , xn+1 } ⊂ dom φ. Now consider ε > 0 such that Bε (¯ x) ⊂ co {x1 , x2 , . . . , xn+1 }. For any arbitrary x ∈ Bε (¯ x), there exist λi ≥ 0, Pn+1 i = 1, 2, . . . , n + 1, satisfying i=1 λi = 1 such that x=
n+1 X i=1
© 2012 by Taylor & Francis Group, LLC
λi xi .
2.3 Convex Functions
85
By the convexity of φ, φ(x) ≤
n+1 X i=1
λi φ(xi ) ≤
max
1,2,...,n+1
φ(xi ) = M < +∞.
Because x ∈ Bε (¯ x) is arbitrary, the above condition holds for every x ∈ Bε (¯ x). Therefore, by Theorem 2.72, for ε′ < ε, |φ(x) − φ(y)| ≤
2M kx − yk, ∀ x, y ∈ Bε′ (¯ x), ε − ε′
thus proving that φ is locally Lipschitz at x ¯ ∈ ri dom φ. Because x ¯ ∈ ri dom φ is arbitrary, φ is locally Lipschitz on ri dom φ.
2.3.3
Differentiability Property
After discussing the continuity and the Lipschitzian property of a convex function, we will now make a move toward studying its differentiability nature. In general, a convex function need not be differentiable on the whole of Rn . For instance, consider the convex function φ(x) = |x|. It is differentiable everywhere except x = 0, which is the point of minimizer if we minimize this function over the whole of R. Another example of nonsmooth convex function that appears naturally is the max-function. Consider φ(x) = max{x, x2 }. As we know from Proposition 2.53, the supremum of convex functions is convex, so φ is convex. Here both x and x2 are differentiable over R but φ is not differentiable at x = 0 and x = 1. Again for the unconstrained minimization of φ over R, the point of minimizer is x ¯ = 0. So how is one supposed to study the optimality at a point if the function is not differentiable there. This means the notion of differentiability must be replaced by some other concept so as to facilitate nonsmooth convex functions. For a differentiable function we know that both left-sided as well as right-sided derivatives exist and are equal. In case of a convex function, the right-sided derivative always exists. So in the direction to replace differentiability, we first introduce the concept of one-sided directional derivative or simply directional derivative. ¯ the directional Definition 2.74 For a proper convex function φ : Rn → R, derivative of φ at x ¯ ∈ dom φ in the direction d ∈ Rn is defined as φ′ (¯ x, d) = lim λ↓0
φ(¯ x + λd) − φ(¯ x) , λ
provided +∞ and −∞ are allowed as limits. Before we move on to present the result on the existence of directional derivatives of a convex function, we present a result from Rockafellar and Wets [101].
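Definition 2.74 is easy to probe numerically for the nonsmooth convex functions mentioned above. The Python sketch below is an editorial addition; the step sizes are an arbitrary decreasing sequence, and the printed difference quotients approach the one-sided directional derivatives of |x| at 0 and of max{x, x^2} at 1.

```python
import numpy as np

def directional_derivative(phi, x_bar, d, lambdas=(1e-1, 1e-3, 1e-5, 1e-7)):
    # difference quotients (phi(x_bar + lam*d) - phi(x_bar)) / lam for lam -> 0+
    return [(phi(x_bar + lam * d) - phi(x_bar)) / lam for lam in lambdas]

print(directional_derivative(abs, 0.0, 1.0))      # -> 1, the derivative of |.| at 0 along +1
print(directional_derivative(abs, 0.0, -1.0))     # -> 1, along -1 (so |.| is not differentiable at 0)

f = lambda x: max(x, x * x)
print(directional_derivative(f, 1.0, 1.0))        # -> 2, the right slope of x^2 at 1
print(directional_derivative(f, 1.0, -1.0))       # -> -1, the left piece is x
```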
Proposition 2.75 (Slope Inequality) Consider a function φ : I → R where I ⊂ R denotes an interval. Then φ is convex on I if and only if for arbitrary points x < z < y in I, φ(y) − φ(x) φ(y) − φ(z) φ(z) − φ(x) ≤ ≤ . z−x y−x y−z
(2.19)
φ(y) − φ(x) is nondecreasing on I for every y ∈ I \{x}. y−x Moreover, if φ is differentiable over an open interval I ⊂ R, then ∇φ is nondecreasing on I. Consequently, ψ(y) =
Proof. We know that the convexity of φ on I is equivalent to y−z z−x φ(x) + φ(y), ∀ x < z < y in I. φ(z) ≤ y−x y−x The above inequality leads to φ(z) − φ(x) ≤ =
y−z z−x − 1 φ(x) + φ(y) y−x y−x φ(y) − φ(x) , (z − x) y−x
as desired. The other inequalities can be established similarly, thereby leading to (2.19). Conversely, suppose that x < z < y, which implies that there exists λ ∈ (0, 1) such that z = (1 − λ)x + λy. Substituting z = (1 − λ)x + λy in (2.19) leads to φ(y) − φ(x) φ((1 − λ)x + λy) − φ(x) , ≤ λ(y − x) y−x that is, φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y). Because x and y were arbitrarily chosen, the above inequality holds for any x, y ∈ I and any λ ∈ [0, 1] (the above inequality holds trivially for λ = 0 and λ = 1). Hence, φ is a convex function. Suppose that y1 , y2 ∈ I such that yi 6= x, i = 1, 2, and y1 < y2 . Consider the following cases: x < y1 < y2 ,
y1 < x < y2
and
y1 < y2 < x.
Suppose that x < y1 < y2 . In particular, for z = y1 and y = y2 in the inequality (2.19) yields ψ(y1 ) =
© 2012 by Taylor & Francis Group, LLC
φ(y2 ) − φ(x) φ(y1 ) − φ(x) ≤ = ψ(y2 ). y1 − x y2 − x
2.3 Convex Functions
87
Applying (2.19) to the remaining two cases leads to the fact that φ(y) − φ(x) is nondecreasing. ψ(y) = y−x
Suppose that φ is convex, which implies (2.19) holds. As φ is differentiable, for x1 , x2 ∈ I with x1 < x2 , ∇φ(x1 ) ≤
φ(x1 ) − φ(x2 ) φ(x2 ) − φ(x1 ) = ≤ ∇φ(x2 ), x2 − x1 x1 − x2
thereby establishing the result.
¯ is a proper convex function, dom φ may be considered an If φ : R → R interval I. Then from the nondecreasing property of ψ in the above proposition, the right-sided derivative of φ, φ′+ , exists at x ¯ provided both −∞ and +∞ values are allowed and is defined as φ′+ (¯ x) = lim x↓¯ x
If
φ(x) − φ(¯ x) . x−x ¯
φ(x) − φ(¯ x) has a finite lower bound, x−x ¯ lim x↓¯ x
φ(x) − φ(¯ x) φ(x) − φ(¯ x) = inf , x>¯ x, x∈I x−x ¯ x−x ¯
φ(x) − φ(¯ x) φ(x) − φ(¯ x) is nondecreasing on I. In case does not have x−x ¯ x−x ¯ a finite lower bound, because
lim x↓¯ x
φ(x) − φ(¯ x) φ(x) − φ(¯ x) = inf = −∞, x>¯ x, x∈I x−x ¯ x−x ¯
and for the case when I = {¯ x}, inf
x>¯ x, x∈I
φ(x) − φ(¯ x) = +∞ x−x ¯
as {x ∈ R : x > x ¯, x ∈ I} = ∅. Thus, φ′+ (¯ x) =
inf
x>¯ x, x∈I
φ(x) − φ(¯ x) . x−x ¯
¯ and Theorem 2.76 Consider a proper convex function φ : Rn → R n ′ x ¯ ∈ dom φ. Then for every d ∈ R , the directional derivative φ (¯ x, d) exists with φ′ (¯ x, 0) = 0 and φ′ (¯ x, d) = inf
λ>0
φ(¯ x + λd) − φ(¯ x) . λ
Moreover, φ′ (¯ x, d) is a sublinear function in d for every d ∈ Rn .
© 2012 by Taylor & Francis Group, LLC
88
Tools for Convex Optimization
¯ given by Proof. Define ψ : R → R ψ(λ) = φ(¯ x + λd). As x ¯ ∈ dom φ, ψ(0) = φ(¯ x) < +∞, which along with the convexity of φ ¯ defined implies that ψ is a proper convex function. Now consider ϕ : R → R as ϕ(λ) =
φ(¯ x + λd) − φ(¯ x) ψ(λ) − ψ(0) = . λ λ
By Proposition 2.75, ϕ is nondecreasing when λ > 0. Then by the discussion ′ preceding the theorem, ψ+ (0) exists and ′ ψ+ (0) = lim ϕ(λ) = inf ϕ(λ), λ→0
λ>0
as desired. Suppose that d ∈ Rn and α > 0. Then φ′ (¯ x, αd)
φ(¯ x + λαd) − φ(¯ x) λ→0 λ φ(¯ x + λαd) − φ(¯ x) = lim α λ→0 λα φ(¯ x + λ′ d) − φ(¯ x) = α lim = αφ′ (¯ x, d), λ′ →0 λ′
=
lim
which implies that φ′ (x, .) is positively homogeneous. Suppose that d1 , d2 ∈ Rn and α ∈ [0, 1], by the convexity of φ, φ(¯ x + λ((1 − α)d1 + αd2 )) − φ(¯ x) ≤ (1 − α)(φ(¯ x + λd1 ) − φ(¯ x)) +α(φ(¯ x + λd2 ) − φ(¯ x)). Dividing both sides by λ > 0 and taking the limit as λ → 0, the above inequality reduces to φ′ (¯ x, ((1 − α)d1 + αd2 )) ≤ (1 − α)φ′ (¯ x, d1 ) + αφ(¯ x, d2 ), ∀ α ∈ [0, 1]. In particular for α = 1/2 and applying the positive homogeneity property, the above condition yields φ′ (¯ x, (d1 + d2 )) ≤ φ′ (¯ x, d1 ) + φ(¯ x, d2 ). Because d1 , d2 ∈ Rn were arbitrary, the above inequality implies that φ′ (¯ x, .) is subadditive, which along with positive homogeneity implies that φ′ (¯ x, .) is sublinear. For a differentiable convex function φ : Rn → R, the following relation holds between the directional derivative and the gradient of the function φ φ′ (¯ x, d) = h∇φ(¯ x), di, ∀ d ∈ Rn .
© 2012 by Taylor & Francis Group, LLC
2.3 Convex Functions
89
But in absence of differentiability, can one have such a relation for the directional derivative? The answer is yes. The notion that replaces the gradient in the above condition is the subgradient. ¯ and Definition 2.77 Consider a proper convex function φ : Rn → R n x ¯ ∈ dom φ. Then ξ ∈ R is said to be the subgradient of the function φ at x ¯ if φ(x) − φ(¯ x) ≥ hξ, x − x ¯i, ∀ x ∈ Rn . The collection of all such vectors constitute the subdifferential of φ at x ¯ and is denoted by ∂φ(¯ x). For x ¯∈ / dom φ, ∂φ(¯ x) is empty. For a differentiable function, its gradient at any point acts as a tangent to the graph of the function at that point. In a similar way, from the definition above it can be seen that the affine function φ(¯ x) + hξ, x − x ¯i is a supporting hyperplane to the epigraph of φ at (¯ x, φ(¯ x)) with the slope ξ. In fact, at the point of nondifferentiability, there can be an infinite number of such supporting hyperplanes and the collection of the slopes of each of these hyperplanes forms the subdifferential. Recall the indicator function to the convex set F ⊂ Rn . Obviously ¯ is a proper convex function. Now from the above definition, the δF : Rn → R subdifferential of δF at x ¯ ∈ F is given by ∂δF (¯ x)
= {ξ ∈ Rn : δF (x) − δF (¯ x) ≥ hξ, x − x ¯i, ∀ x ∈ Rn } = {ξ ∈ Rn : 0 ≥ hξ, x − x ¯i, ∀ x ∈ F },
which is nothing but the normal cone to the set F at x ¯. Therefore, for a convex set F , ∂δF = NF . Consider the norm function φ(x) = kxk, x ∈ Rn . Observe that φ is a convex function. At x ¯ = 0, φ is not differentiable and ∂φ(¯ x) = B. Like the relation between the directional derivative and gradient, we are interested in deriving a relationship between the directional derivative and the subdifferential, which we establish in the next result. ¯ and Theorem 2.78 Consider a proper convex function φ : Rn → R x ¯ ∈ dom φ. Then ∂φ(¯ x) = {ξ ∈ Rn : φ′ (¯ x, d) ≥ hξ, di, ∀ d ∈ Rn }. Proof. Suppose that ξ ∈ ∂φ(¯ x), which by Definition 2.77 implies that φ(x) − φ(¯ x) ≥ hξ, x − x ¯i, ∀ x ∈ Rn . In particular, for x = x ¯ + λd with λ > 0, the above condition reduces to φ(¯ x + λd) − φ(¯ x) ≥ hξ, di, ∀ d ∈ Rn . λ
© 2012 by Taylor & Francis Group, LLC
90
Tools for Convex Optimization
Taking the limit as λ → 0 leads to φ′ (¯ x, d) ≥ hξ, di, ∀ d ∈ Rn , as desired. Conversely, suppose that ξ ∈ Rn satisfies φ′ (¯ x, d) ≥ hξ, di, ∀ d ∈ Rn . By the alternate definition of φ′ (¯ x, d) from Theorem 2.76 leads to φ(¯ x + λd) − φ(¯ x) ≥ hξ, di, ∀ d ∈ Rn . λ In particular, for λ ∈ [0, 1] and d = x − x ¯, which along with the convexity of φ leads to φ(x) − φ(¯ x) ≥
φ(¯ x + λ(x − x ¯)) − φ(¯ x) ≥ hξ, x − x ¯i, ∀ x ∈ Rn , λ
which implies that ξ ∈ ∂φ(¯ x), thereby establishing the result.
The result below from Rockafellar [97] shows that actually the directional derivative is the support function of the subdifferential set. ¯ and Theorem 2.79 Consider a proper convex function φ : Rn → R x ¯ ∈ dom φ. Then cl φ′ (¯ x, d) =
sup hξ, di = σ∂φ(¯x) (d), ∀ d ∈ Rn .
ξ∈∂φ(¯ x)
However, if x ¯ ∈ ri dom φ, φ′ (¯ x, d) = σ∂φ(¯x) (d), ∀ d ∈ Rn and if x ¯ ∈ int dom φ, φ′ (¯ x, d) is finite for every d ∈ Rn . Proof. Because φ′ (¯ x, .) is sublinear, combining Theorems 2.62 and 2.78 leads to cl φ′ (¯ x, d) = σ∂φ(¯x) (d). If x ¯ ∈ ri dom φ, the domain of φ′ (¯ x, .) is an affine set that is actually a subspace parallel to the affine hull of dom φ. By sublinearity, φ′ (¯ x, 0) = 0, it is not identically −∞ on the affine set. Therefore, by Proposition 2.63, cl φ′ (¯ x, .) and hence φ′ (¯ x, .) is a proper function. By Proposition 2.66, cl φ′ (¯ x, .) agrees with φ′ (¯ x, .) on the affine set and hence is closed, thereby leading to the desired condition. For x ¯ ∈ int dom φ, the domain of φ′ (¯ x, .) is Rn and hence it is finite everywhere. As mentioned earlier for a differentiable convex function, for every d ∈ Rn , φ (¯ x, d) = h∇φ(¯ x), di. So the question is: for a differentiable convex function, how are the gradient and the subdifferential related? We discuss this aspect in the result below. ′
© 2012 by Taylor & Francis Group, LLC
2.3 Convex Functions
91
Proposition 2.80 Consider a convex function φ : Rn → R differentiable at x ¯ with gradient ∇φ(¯ x). Then the unique subgradient of φ at x ¯ is the gradient, that is, ∂φ(¯ x) = {∇φ(¯ x)}. Proof. For a differentiable convex function φ, φ′ (¯ x, d) = h∇φ(¯ x), di, ∀ d ∈ Rn . By Theorem 2.79, for every ξ ∈ ∂φ(¯ x), h∇φ(¯ x) − ξ, di ≥ 0, ∀ d ∈ Rn . Because the above condition holds for every d ∈ Rn , it reduces to h∇φ(¯ x) − ξ, di = 0, ∀ d ∈ Rn , which leads to ∇φ(¯ x) = ξ. As ξ ∈ ∂φ(¯ x) is arbitrary, the subdifferential is a singleton with ∂φ(¯ x) = {∇φ(¯ x)}. From the above theorem, we have the following result, which gives the equivalent characterization of a differentiable convex function. Theorem 2.81 Consider a differentiable function φ : Rn → R. Then φ is convex if and only if φ(y) − φ(x) ≥ h∇φ(x), y − xi, ∀ x, y ∈ Rn . Observe that in Theorem 2.79 we defined the relation between the directional derivative and the support function of the subdifferential for point x¯ in the relative interior of the domain. The reason for this is the fact that at the boundary of the domain, the subdifferential may be an empty set. For a clear view into this aspect, we consider the following example from Bertsekas [12]. ¯ be a proper convex function given by Let φ : R → R √ − x, 0 ≤ x ≤ 1, φ(x) = +∞, otherwise. The subdifferential of φ is
−1 √ , 2 x ∂φ(x) = [−1/2, +∞) , ∅,
0 < x < 1, x = 1, x ≤ 0 or x > 1.
Note that the subdifferential is empty at the boundary point x = 0. Also at the other boundary point x = 1, it is unbounded. But the subdifferential may also turn out to be unbounded at a point in the relative interior of the domain.
© 2012 by Taylor & Francis Group, LLC
92
Tools for Convex Optimization
¯ defined For example, consider the following proper convex function φ : R → R as 0, x = 0, φ(x) = +∞, otherwise. Observe that at x = 0, ∂φ(x) = R, which is unbounded even though 0 is in the relative interior of the domain. Based on these illustrations, we have the following result from Rockafellar [97] and Attouch, Buttazzo, and Michaille [3]. ¯ and Proposition 2.82 Consider a proper convex function φ : Rn → R x ¯ ∈ dom φ. Then ∂φ(¯ x) is closed and convex. For x ¯ ∈ ri dom φ, the subdifferential ∂φ(¯ x) is nonempty. Furthermore, if x ¯ ∈ int dom φ, ∂φ(¯ x) is nonempty and compact. Moreover, if φ is continuous at x ¯ ∈ dom φ, then ∂φ(¯ x) is compact. Proof. Suppose that {ξk } ⊂ ∂φ(¯ x) such that ξk → ξ. By Definition 2.77 of subdifferential, φ(x) − φ(¯ x) ≥ hξk , x − x ¯i, ∀ x ∈ Rn . Taking the limit as k → +∞, the above inequality leads to φ(x) − φ(¯ x) ≥ hξ, x − x ¯i, ∀ x ∈ Rn , which implies that ξ ∈ ∂φ(¯ x), thereby yielding the closedness of ∂φ(¯ x). Consider ξ1 , ξ2 ∈ ∂φ(¯ x), which implies that for i = 1, 2, φ(x) − φ(¯ x) ≥ hξi , x − x ¯i, ∀ x ∈ Rn . Therefore, for any λ ∈ [0, 1], φ(x) − φ(¯ x) ≥ h(1 − λ)ξ1 + λξ2 , x − x ¯i, ∀ x ∈ Rn , which implies (1 − λ)ξ1 + λξ2 ∈ ∂φ(¯ x). Because ξ1 , ξ2 were arbitrary, ∂φ(¯ x) is convex. From the proof of Theorem 2.79, for x ¯ ∈ ri dom φ, φ′ (¯ x, .) is the support function of ∂φ(¯ x), which is proper. Hence, ∂φ(¯ x) is nonempty. Again by Theorem 2.79, for x ¯ ∈ int dom φ, φ′ (¯ x, .) is finite everywhere. Because it is a support of ∂φ(¯ x), by Proposition 2.61, ∂φ(¯ x) is bounded and hence compact. Now suppose that φ is continuous at x ¯ ∈ dom φ. We have already seen that ∂φ is always closed and convex. Therefore to establish that ∂φ(¯ x) is compact, we only need to show that it is bounded. By the continuity of φ at x ¯, it is bounded in the neighborhood of x¯. Thus, there exist ε > 0 and M ≥ 0 such that φ(¯ x + εd) ≤ M, ∀ d ∈ B.
© 2012 by Taylor & Francis Group, LLC
2.3 Convex Functions
93
Consider ξ ∈ ∂φ(¯ x), which implies that hξ, x − x ¯i ≤ φ(x) − φ(¯ x), ∀ x ∈ Rn . In particular, for any d ∈ B, the above inequality along with the boundedness of φ in the neighborhood of x ¯ leads to hξ, εdi ≤ φ(¯ x + εd) − φ(¯ x) ≤ M + |φ(¯ x)|, which implies that hξ, di ≤
1 (M + |φ(¯ x)|), ∀ d ∈ B. ε
Therefore, kξk ≤
1 (M + |φ(¯ x)|). ε
Because ξ ∈ ∂φ(¯ x) was arbitrary, ∂φ(¯ x) is bounded and hence compact.
If we consider a real-valued convex function φ : Rn → R, then int dom φ = R and therefore, the above result reduces to the following. n
Proposition 2.83 Consider a convex function φ : Rn → R. Then the subdifferential ∂φ(x) is nonempty, convex, and compact for every x ∈ Rn . With the discussion on subdifferentials, we present some properties of the subdifferential as x varies by treating it as a multifunction or set-valued mapping x 7→ ∂φ(x) starting with some of the fundamental continuity results of the subdifferential mapping. Theorem 2.84 (Closed Graph Theorem) Consider a proper lsc convex func¯ If for sequences {xk }, {ξk } ⊂ Rn such that ξk ∈ ∂φ(xk ) with tion φ : Rn → R. ¯ then ξ¯ ∈ ∂φ(¯ xk → x ¯ and ξk → ξ, x). This means gph ∂φ is a closed subset of Rn × Rn . ¯ then from Definition 2.77 Proof. Because ξk ∈ ∂φ(xk ) with (xk , ξk ) → (¯ x, ξ), of subdifferential, φ(x) − φ(xk ) ≥ hξk , x − xk i, ∀ x ∈ Rn . Taking the limit infimum as k → +∞, which along with the lower semicontinuity of φ reduces the above condition to ¯ x−x φ(x) − φ(¯ x) ≥ hξ, ¯i, ∀ x ∈ Rn , thereby implying that ξ¯ ∈ ∂φ(¯ x) and thus establishing that gph ∂φ is closed, as desired.
© 2012 by Taylor & Francis Group, LLC
94
Tools for Convex Optimization
From the above theorem one may note that the normal cone to a convex set F ⊂ Rn is also graph closed as it is nothing but the subdifferential of the convex indicator function δF , that is, NF = ∂δF . In general we know that the arbitrary union of closed sets need not be closed. But in the proposition below from Bertsekas [12] and Rockafellar [97] we have that the union of the subdifferential over a compact set is compact. n Proposition 2.85 Consider a convex S function φ : R → R and a compact n set F ∈ R . Then the set ∂φ(F ) = x∈F ∂φ(x) is nonempty and compact.
Proof. Because F is a nonempty subset of dom φ = Rn , by Proposition 2.82, ∂φ(F ) is nonempty. We claim that ∂φ(F ) is closed. Consider a sequence {ξk } ⊂ ∂φ(F ) such ¯ As ξk ∈ ∂φ(F ) for k ∈ N, there exist xk ∈ F such that that ξk → ξ. ξk ∈ ∂φ(xk ), k ∈ N. By the compactness of F , {xk } is a bounded sequence that by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent subsequence. Without loss of generality, suppose that xk → x ¯, which by the closedness of F implies that x ¯ ∈ F . Invoking the Closed Graph Theorem, Theorem 2.84, ξ¯ ∈ ∂φ(¯ x) ⊂ ∂φ(F ). Thus, ∂φ(F ) is closed. Now to establish the compactness of ∂φ(F ), we will establish the boundedness of ∂φ(F ). On the contrary, suppose that there exist a bounded sequence {xk } ⊂ F and an unbounded sequence {ξk } ⊂ Rn such that ξk ∈ ∂φ(xk ). ξk , which is a bounded sequence. Because {xk } and {ηk } are Define ηk = kξk k bounded sequences, by the Bolzano–Weierstrass Theorem, have a convergent subsequence. As ξk ∈ ∂φ(xk ), by Definition 2.77 of subdifferential, φ(xk + ηk ) − φ(xk ) ≥ hξk , ηk i = kξk k. By Theorem 2.69, φ is continuous on Rn , which along with the convergence of {xk } and {ηk } yields that φ(xk + ηk ) − φ(xk ) is bounded. Therefore, by the above inequality, {ξk } is a bounded sequence, thereby contradicting our assumption. Thus, ∂φ(F ) is a bounded set and hence compact. ¯ Then ∂φ is Theorem 2.86 Consider a proper convex function φ : Rn → R. n usc on int dom φ. Moreover, if φ : R → R is a differentiable convex function, then it is continuously differentiable. Proof. By Proposition 2.82, ∂φ(¯ x) is nonempty and compact if and only if x ¯ ∈ int dom φ. By Theorem 2.84, ∂φ is graph closed. Therefore, from the discussion on set-valued mappings in Chapter 1, ∂φ is usc on int dom φ. As for a single-valued map, the notion of upper semicontinuity coincides with that of continuity and by Proposition 2.80, for a differentiable convex function ∂φ = {∇φ}, φ is continuously differentiable. Below we state another important characteristic of the subdifferential without proof. For more details on the treatment of ∂φ as a multifunction, one may refer to Rockafellar [97].
© 2012 by Taylor & Francis Group, LLC
2.3 Convex Functions
95
¯ Then Theorem 2.87 Consider a closed proper convex function φ : Rn → R. the subdifferential ∂φ is a maximal monotone where by monotonicity we mean that for any x1 , x2 ∈ Rn , hξ1 − ξ2 , x1 − x2 i ≥ 0, ∀ ξi ∈ ∂φ(xi ), i = 1, 2, and maximal monotone map in the sense that its graph is not properly contained in the graph of any other monotone map. Similar to the standard Mean Value Theorem, Theorem 1.18, we present the Mean Value Theorem for convex functions in terms of the subdifferential. Theorem 2.88 Consider a convex function φ : Rn → R. Then for x, y ∈ Rn , there exists z ∈ (x, y) such that φ(y) − φ(x) ∈ h∂φ(z), y − xi, where h∂φ(z), y − xi = {hξ, y − xi : ξ ∈ ∂φ(z)}. Proof. Consider the function ψ : [0, 1] → R defined by ψ(λ) = φ(x + λ(y − x)) − φ(x) + λ(φ(x) − φ(y)). Because φ is real-valued and by Theorem 2.69 it is continuous on Rn , hence ψ is a real-valued continuous function on [0,1]. Observe that ψ(0) = 0 = ψ(1). Also, by the convexity of φ, ψ(λ) ≤ (1 − λ)φ(x) + λφ(y) − φ(x) + λ(φ(x) − φ(y)) = 0, ∀ λ ∈ [0, 1]. Thus, ψ attains its maximum at λ = 0 and λ = 1 and hence there exists ¯ ∈ (0, 1) at which ψ attains its minimum over [0, 1]. Therefore, λ ¯ d) ≥ 0, ∀ d ∈ R. ψ ′ (λ, ¯ − x) ∈ (x, y). Therefore, Denote z = x + λ(y ¯ d) ψ ′ (λ,
¯ + λd) − ψ(λ) ¯ ψ(λ λ↓0 λ ¯ ¯ − x)) φ(x + (λ + λd)(y − x)) − φ(x + λ(y = lim + d(φ(x) − φ(y)) λ↓0 λ = φ′ (z, d(y − x)) + d(φ(x) − φ(y)), ∀ d ∈ R, =
lim
which implies that φ′ (z, d(y − x)) ≥ d(φ(y) − φ(x)), ∀ d ∈ R. In particular, taking d = 1 in the above condition leads to φ(y) − φ(x) ≤ φ′ (z, y − x),
© 2012 by Taylor & Francis Group, LLC
96
Tools for Convex Optimization
whereas taking d = −1 yields −φ′ (z, x − y) ≤ φ(y) − φ(x). Combining the preceding inequalities imply −φ′ (z, x − y) ≤ φ(y) − φ(x) ≤ φ′ (z, y − x), which by Theorem 2.79 becomes inf hξ, y − xi = − sup hξ, x − yi ≤ φ(y) − φ(x) ≤ sup hξ, y − xi.
ξ∈∂φ(z)
ξ∈∂φ(z)
ξ∈∂φ(z)
By Proposition 2.83, ∂φ(z) is compact, which along with the continuity of hξ, y − xi implies that there exists ξ¯ ∈ ∂φ(z) such that ¯ y − xi ∈ h∂φ(z), y − xi, φ(y) − φ(x) = hξ, thereby completing the proof.
We have discussed the various continuity and differentiability behaviors of convex functions but in most cases these properties were restricted to the interior or relative interior of the domain of the function. As seen in the discussion preceding Proposition 2.82, the subdifferential set may be empty at the boundary of the domain. To overcome this flaw of the subdifferential of a convex function, we have the notion of ε-subdifferentials, which have the nonemptiness property throughout the domain of the function. We will discuss this notion in a later section in the chapter. As we are interested in the convex optimization problem, we first give the optimality condition for the unconstrained convex programming problem min f (x)
subject to
where f : Rn → R is a convex function.
x ∈ Rn ,
(CPu )
Theorem 2.89 Consider the unconstrained convex programming problem (CPu ). Then x ¯ ∈ Rn is the point of minimizer of (CPu ) if and only if 0 ∈ ∂f (¯ x). Proof. Suppose that x ¯ ∈ Rn is a point of minimizer of (CPu ), which implies that f (x) − f (¯ x) ≥ 0, ∀ x ∈ Rn . By Definition 2.77 of subdifferential, 0 ∈ ∂f (¯ x). The converse can be proved by again employing the definition of the subdifferential. Now recall the constrained convex programming problem presented in Chapter 1: min f (x)
© 2012 by Taylor & Francis Group, LLC
subject to
x ∈ C,
(CP )
2.3 Convex Functions
97
where f : Rn → R is a convex function and C is a convex subset of Rn . Recall the important property of convex optimization discussed in Section 1.3 that makes its study useful is that every local minimizer is also a global minimizer. The next result provides an alternative proof to this fact. ¯ be a Theorem 2.90 Consider a convex set C ⊂ Rn and let f : Rn → R proper convex function. Then the point of local minimum is a point of global minimum. If in addition f is strictly convex, there exists at most one global point of minimum. Proof. Suppose that x ¯ ∈ Rn is a point of local minimum of f over C. We claim that x ¯ is a point of global minimum. On the contrary, assume that x¯ is not a point of global minimum. Thus there exists x˜ ∈ C such that f (˜ x) < f (¯ x). By the convexity of f , for every λ ∈ (0, 1), f ((1 − λ)¯ x + λ˜ x) ≤ (1 − λ)f (¯ x) + λf (˜ x) < f (¯ x).
(2.20)
Also by the convexity of C, (1 − λ)¯ x + λ˜ x ∈ C. Taking λ sufficiently small, (1 − λ)¯ x + λ˜ x is in the neighborhood of x ¯, which by the inequality (2.20) implies that f ((1 − λ)¯ x + λ˜ x) < f (¯ x), which contradicts that x ¯ is a point of local minimum. Hence, x ¯ is a point of global minimum of f over C. Suppose that f is a strictly convex function with x ¯ and y¯ as the points of global minimum. Let f (¯ x) = f (¯ y ) = fmin , say. We claim that x ¯ = y¯. On the contrary, assume that x ¯ 6= y¯. By Definition 2.47 of strict convexity, for every λ ∈ (0, 1), f ((1 − λ)¯ x + λ¯ y ) < (1 − λ)f (¯ x) + λf (¯ y ) = fmin .
(2.21)
By the convexity of C, (1 − λ)¯ x + λ¯ y ∈ (¯ x, y¯) ⊂ C. Now the strict inequality (2.21) contradicts the fact that x ¯ and y¯ are the points of global minimizers of f over C, which is a contradiction. Thus, x ¯ = y¯, thereby implying that minimizing a strictly convex function f over a convex set C has at most one point of global minimum. As discussed earlier in this chapter, the above problem can be converted into the unconstrained convex programming problem of the form (CPu ) with the objective function f replaced by f + δC . From the above theorem, x ¯ is the point of minimizer of (CP ) if and only if 0 ∈ ∂(f + δC )(¯ x). To express the above inclusion explicitly in terms of the subdifferentials of the objective function f and the indicator function δC , one needs the calculus rules for the subdifferentials. Thus, following this path we shall now discuss the subdifferential calculus rules.
© 2012 by Taylor & Francis Group, LLC
98
2.4
Tools for Convex Optimization
Subdifferential Calculus
As we have already seen that subdifferentials play a pivotal role in the convex analysis. It replaces the role of derivative in case of nondifferentiable convex functions. So it is obvious to look into the matter as to whether or not the differential calculus is carried over to subdifferential calculus. As we proceed in this direction, one will see that it does satisfy results similar to standard calculus but under certain assumptions. We begin our journey of subdifferential calculus with the sum rule. Theorem 2.91 (Moreau–Rockafellar Sum Rule) Consider two proper convex ¯ i = 1, 2. Suppose that ri dom φ1 ∩ ri dom φ2 6= ∅. functions φi : Rn → R, Then ∂(φ1 + φ2 )(x) = ∂φ1 (x) + ∂φ2 (x) for every x ∈ dom(φ1 + φ2 ). Proof. We first show that ∂φ1 (¯ x) + ∂φ2 (¯ x) ⊂ ∂(φ1 + φ2 )(¯ x).
(2.22)
Suppose that ξi ∈ ∂φi (¯ x), i = 1, 2. By the definition of a subdifferential, φi (x) − φi (¯ x) ≥ hξi , x − x ¯i, ∀ x ∈ Rn , i = 1, 2. Therefore, (φ1 + φ2 )(x) − (φ1 + φ2 )(¯ x) ≥ hξ1 + ξ2 , x − x ¯i, ∀ x ∈ Rn , which implies that (ξ1 + ξ2 ) ∈ ∂(φ1 + φ2 )(¯ x), thereby establishing (2.22). To obtain the result, we will now prove the reverse inclusion, that is, ∂(φ1 + φ2 )(¯ x) ⊂ ∂φ1 (¯ x) + ∂φ2 (¯ x).
(2.23)
Suppose that ξ ∈ ∂(φ1 + φ2 )(¯ x). Define two convex functions ψ1 (x) = φ1 (x + x ¯) − φ1 (¯ x) − hξ, xi
and ψ2 (x) = φ2 (x + x ¯) − φ2 (¯ x).
Here, ψ1 (0) = ψ2 (0) = 0. Observe that ξ ∈ ∂(φ1 + φ2 )(¯ x) which by the above constructed functions is equivalent to (ψ1 + ψ2 )(x) ≥ 0, ∀ x ∈ Rn , that is, 0 ∈ ∂(ψ1 + ψ2 )(0). Thus, without loss of generality, consider x¯ = 0, ξ = 0, and φ1 (0) = φ2 (0) = 0 such that 0 ∈ ∂(φ1 + φ2 )(0),
© 2012 by Taylor & Francis Group, LLC
2.4 Subdifferential Calculus
99
which implies (φ1 + φ2 )(x) ≥ (φ1 + φ2 )(0) = 0, ∀ x ∈ Rn , that is, φ1 (x) ≥ −φ2 (x) for every x ∈ Rn . Define F1 = {(x, α) ∈ Rn × R : φ1 (x) ≤ α} and
F2 = {(x, α) ∈ Rn × R : α ≤ −φ2 (x)}.
Observe that both F1 and F2 are closed convex sets, where by Proposition 2.64, ri F1 = ri epi φ1 = {(x, α) ∈ Rn × R : x ∈ ri dom φ1 , φ1 (x) < α}. As φ1 (x) ≥ −φ2 (x), we have ri F1 ∩ F2 = ∅ with (0, 0) ∈ F1 ∩F2 . Therefore, by the separation theorem, Theorem 2.26 (ii), there exists (x∗ , α∗ ) ∈ Rn × R with (x∗ , α∗ ) 6= (0, 0) such that hx∗ , xi + α∗ α ≥ 0, ∀ (x, α) ∈ F1 , hx∗ , xi + α∗ α ≤ 0, ∀ (x, α) ∈ F2 . By assumption as φ1 (0) = 0, we have (0, α) ∈ F1 for α ≥ 0. Therefore, from the inequality above, we have α∗ ≥ 0. We claim that α∗ 6= 0. Suppose that α∗ = 0. Thus the above inequalities imply hx∗ , x1 i ≥ 0 ≥ hx∗ , x2 i, ∀ x1 ∈ dom φ1 , ∀ x2 ∈ dom φ2 . This implies that dom φ1 and dom φ2 can be separated, which contradicts the hypothesis that ri dom φ1 ∩ ri dom φ2 6= ∅. Hence, α∗ > 0 and can be normalized to one and thus hx∗ , xi + α ≥ 0, ∀ (x, α) ∈ F1 , hx∗ , xi + α ≤ 0, ∀ (x, α) ∈ F2 . In particular, for (x, φ1 (x)) ∈ F1 and (x, −φ2 (x)) ∈ F2 , we have −x∗ ∈ ∂φ1 (0) and x∗ ∈ ∂φ2 (0), thereby leading to 0 ∈ ∂φ1 (0) + ∂φ2 (0), thus establishing (2.23) and hence completing the proof.
The necessity of the condition ri dom φ1 ∩ ri dom φ2 6= ∅ can be seen from ¯ defined as the following example from Phelps [93]. Consider φ1 , φ2 : R2 → R φ1 (x) = δF1 (x), φ2 (x) = δF2 (x),
© 2012 by Taylor & Francis Group, LLC
F1 = epi y 2 , y ∈ R,
F2 = {(y1 , y2 ) ∈ R2 : y2 = 0}.
100
Tools for Convex Optimization
Here, ∂(φ1 + φ2 )(0) = R2 whereas ∂φ1 (0) = {(0, ξ) ∈ R2 : ξ ≤ 0}
and
∂φ2 (0) = {(0, ξ) ∈ R2 : ξ ∈ R}.
Therefore, ∂(φ1 + φ2 )(0) 6= ∂φ1 (0) + ∂φ2 (0). Observe that dom φ1 ∩ dom φ2 = F1 ∩ F2 = {(0, 0)} while ri dom φ1 ∩ ri dom φ2 = ri F1 ∩ ri F2 = ∅. Now as an application of the Subdifferential Sum Rule, we prove the equality in Proposition 2.39 (i) under the assumption of ri F1 ∩ ri F2 6= ∅. Proof of Proposition 2.39 (i). For convex sets F1 , F2 ⊂ Rn , define φ1 = δF1 and φ2 = δF2 . Observe that dom φi = Fi for i = 1, 2. If ri F1 ∩ ri F2 6= ∅, then ri dom φ1 ∩ ri dom φ2 6= ∅. Now applying the Sum Rule, Theorem 2.91, ∂(φ1 + φ2 )(¯ x) = ∂φ1 (¯ x) + ∂φ2 (¯ x), ∀ x ¯ ∈ dom φ1 ∩ dom φ2 , which along with the facts that δF1 + δF2 = δF1 ∩F2 and ∂δF = NF implies that NF1 ∩F2 (¯ x) = NF1 (¯ x) + NF2 (¯ x), ∀ x ¯ ∈ F 1 ∩ F2 , hence completing the proof.
n
Now if in Theorem 2.91, φi : R → R for i = 1, 2 are real-valued convex functions, then the Sum Rule can be derived using the directional derivative. We briefly discuss that approach from Hiriart-Urruty and Lemar´echal [63]. Using Theorem 2.79, the support of ∂φ1 (¯ x) + ∂φ2 (¯ x) is φ′1 (¯ x, .) + φ′2 (¯ x, .). Readers are advised to verify this fact using the definition of support. Also, the support of ∂(φ1 + φ2 )(¯ x) is (φ1 + φ2 )′ (¯ x, .), which is same as that of ∂φ1 (¯ x) + ∂φ2 (¯ x). Because the support functions are same for both sets, ∂(φ1 + φ2 )(¯ x) = ∂φ1 (¯ x) + ∂φ2 (¯ x). Observe that no additional assumption was required as here ri dom φ1 as well as ri dom φ2 is Rn . Other than the sum of convex functions being convex, from Proposition 2.53, we have that the composition of a nondecreasing convex function with a convex function is also convex. So before presenting the Chain Rule, we introduce the notion of increasing function defined over Rn and a result on the subdifferential of a nondecreasing function. Recall that in Proposition 2.53, the nondecreasing function ψ was defined over R. Definition 2.92 A function φ : Rn → R is called nondecreasing if for x, y ∈ Rn with xi ≥ yi , i = 1, 2, . . . , n, implies that φ(x) ≥ φ(y). Theorem 2.93 Consider a nondecreasing convex function φ : Rn → R. Then for every x ∈ Rn , ∂φ(x) ⊂ Rn+ .
© 2012 by Taylor & Francis Group, LLC
2.4 Subdifferential Calculus
101
Proof. Because φ is a nondecreasing convex function, φ(¯ x) ≥ φ(¯ x − ei ) ≥ φ(¯ x) + hξ, −ei i, where ei = (0, . . . , 0, 1, 0, . . . , 0) with 1 at the i-th place and ξ ∈ ∂φ(¯ x). This implies that φ(¯ x) ≥ φ(¯ x) − ξi , that is, ξi ≥ 0. Since i was arbitrary, ξi ≥ 0, i = 1, 2, . . . , n and thus ∂φ(¯ x) ⊂ Rn+ .
We now present the subdifferential calculus rule of the composition of convex functions. The proof is from Hiriart-Urruty and Lemar´echal [63]. Theorem 2.94 (Chain Rule) Consider a nondecreasing convex function φ : Rm → R and a vector-valued function Φ : Rn → Rm given by Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)) where φi : Rn → R, i = 1, 2, . . . , m be a convex function. Then (m X ∂(φ ◦ Φ)(¯ x) = µi ξi : (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(¯ x)), i=1
ξi ∈ ∂φi (¯ x), i = 1, 2, . . . , m} .
Proof. Define (m ) X F= µi ξi : (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(¯ x)), ξi ∈ ∂φi (¯ x), i = 1, 2, . . . , m . i=1
We will prove the result in the following steps: 1. We shall show that F is a convex compact set as ∂(φ ◦ Φ). 2. We shall calculate the support function of F. 3. We shall calculate the support function of ∂(φ ◦ Φ) and establish that it is same as the support of F. The result is completed by the fact that two convex sets are equal if and only if their support functions are equal. Step 1: Consider any ξ ∈ F. Thus there exist (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(¯ x)) and ξi ∈ ∂φi (¯ x), i = 1, 2, . . . , m, such that ξ=
m X
µi ξi .
i=1
Therefore, kξk ≤
© 2012 by Taylor & Francis Group, LLC
m X i=1
|µi | kξi k.
102
Tools for Convex Optimization
By Proposition 2.83, ∂φ(Φ(¯ x)) as well ∂φi (¯ x), i = 1, 2, . . . , m, are bounded sets and hence ξ is bounded. Because ξ ∈ F was arbitrary, F is a bounded set. Moreover, ∂φ(Φ(¯ x)) and ∂φi (¯ x), i = 1, 2, . . . , m, are closed sets; thus F is also a closed set, thereby yielding the compactness of F . Suppose that ξ1 , ξ2 ∈ F, which implies for j = 1, 2, m X
ξj =
µji ξij ,
i=1
where (µj1 , µj2 , . . . , µjm ) ∈ ∂φ(Φ(¯ x) and ξij ∈ ∂φi (¯ x), i = 1, 2, . . . , m, for j = 1, 2. Now for any λ ∈ (0, 1), define ξλ = (1 − λ)ξ1 + λξ2 . From Theorem 2.93, µji ≥ 0 for i = 1, 2, . . . , m and j = 1, 2. Define µλi = (1 − λ)µ1i + λµ2i , i = 1, 2, . . . , m. Note that µλi = 0 only when µ1i = µ2i = 0 as λ ∈ (0, 1). Therefore, X (1 − λ)µ1 λµ2i 2 i 1 ξi + ξi , ξ= µi µi µi ¯ i∈I
where I¯ = {i ∈ {1, 2, . . . , m} : µi > 0}. By Proposition 2.83, ∂φ(Φ(¯ x)) and ∂φi (¯ x), i = 1, 2, . . . , m, are convex sets and hence (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(¯ x)) (1 − λ)µ1i 1 λµ2i 2 ξi + ξ ∈ ∂φi (¯ x), i = 1, 2, . . . , m, µi µi i
and
thereby showing that F is convex. Step 2: Denote Φ′ (¯ x, d) = (φ′1 (¯ x, d), φ′2 (¯ x, d), . . . , φ′m (¯ x, d)). We will establish that σF (d) = φ′ (Φ(¯ x), Φ′ (¯ x, d)). Consider ξ ∈ F, which implies that ξ=
m X
µi ξi ,
i=1
where (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(¯ x)) and ξi ∈ ∂φi (¯ x), i = 1, 2, . . . , m. By Theorem 2.79, hξi , di ≤ φ′i (¯ x, d), i = 1, 2, . . . , m.
© 2012 by Taylor & Francis Group, LLC
2.4 Subdifferential Calculus
103
By Theorem 2.93, µi ≥ 0, i = 1, 2, . . . , m, which along with the above inequality implies that hξ, di =
m X i=1
µi hξi , di ≤
m X
µi φ′i (¯ x, d).
i=1
As µ = (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(¯ x)), m X i=1
µi φ′i (¯ x, d) = hµ, Φ′ (¯ x, d)i ≤ φ′ (Φ(¯ x), Φ′ (¯ x, d)).
We claim that there exists ξ¯ ∈ F such that ¯ di = φ′ (Φ(¯ hξ, x), Φ′ (¯ x, d)). By Proposition 2.83, ∂φ(Φ(¯ x)) is compact and therefore, there exists µ ¯ = (¯ µ1 , µ ¯2 , . . . , µ ¯m ) ∈ ∂φ(Φ(¯ x)) such that m X i=1
µ ¯i φ′i (¯ x, d) = h¯ µ, Φ′ (¯ x, d)i = φ′ (Φ(¯ x), Φ′ (¯ x, d)).
(2.24)
Also, for i = 1, 2, . . . , m, ∂φi (¯ x) is compact, which implies there exists ξ¯i ∈ ∂φi (¯ x) such that hξ¯i , di = φ′i (¯ x, d), i = 1, 2, . . . , m. Therefore, the condition (2.24) becomes m X i=1
Denoting ξ¯ =
m X i=1
µ ¯i hξ¯i , di = φ′ (Φ(¯ x), Φ′ (¯ x, d)).
µ ¯i ξ¯i ∈ F, ¯ di = φ′ (Φ(¯ hξ, x), Φ′ (¯ x, d)),
which implies that σF (d) = φ′ (Φ(¯ x), Φ′ (¯ x, d)), ∀ d ∈ Rn . Step 3: It is obvious that the support function of ∂(φ ◦ Φ)(¯ x) is (φ ◦ Φ)′ (¯ x, d). We claim that (φ ◦ Φ)′ (¯ x, d) = φ′ (Φ(¯ x), Φ′ (¯ x, d)).
© 2012 by Taylor & Francis Group, LLC
104
Tools for Convex Optimization
For real-valued convex functions φi , i = 1, 2, . . . , m, from Definition 2.74 of directional derivative, it is obvious that φi (¯ x + λd) = φi (¯ x) + λφ′i (¯ x, d) + o(λ), i = 1, 2, . . . , m, which implies that Φ(¯ x + λd) = Φ(¯ x) + λΦ′ (¯ x, d) + o(λ). By Theorem 2.69, φ is continuous on ri dom φ = Rn which yields φ(Φ(¯ x + λd)) = φ(Φ(¯ x) + λΦ′ (¯ x, d)) + o(λ), which again by the definition of φ′ leads to φ(Φ(¯ x + λd)) = φ(Φ(¯ x)) + λφ′ (Φ(¯ x), Φ′ (¯ x, d)) + o(λ). Dividing throughout by λ > 0 and taking the limit as λ → 0 reduces the above condition to (φ ◦ Φ)′ (¯ x, d) = φ′ (Φ(¯ x), Φ′ (¯ x, d)). Because the support functions of both the sets are same, the sets ∂(φ ◦ Φ) and F coincide.
As we will discuss in this book, one of the ways to derive the optimality conditions for (CP ) is the max-function approach, thereby hinting at the use of subdifferential calculus for the max-function. Consider the convex functions φi : Rn → R, i = 1, 2, . . . , m, and define the max-function φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)}. Observe that φ can be expressed as a composition of the functions Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x))
and
ϕ(y) = max{y1 , y2 , . . . , ym }
given by φ(x) = (ϕ◦Φ)(x). It is now natural to apply the Chain Rule presented above but along with that one needs to calculate ∂ϕ or ϕ′ (x, d). So before moving on to establish the Max-Function Rule, we will present a result to derive ϕ′ (x, d). The proof is from Hiriart-Urruty [59]. Theorem 2.95 Consider differentiable convex functions ϕi : Rn → R for i = 1, 2, . . . , m. For x ∈ Rn , define ϕ(x) = max{ϕ1 (x), ϕ2 (x), . . . , ϕm (x)} and denote the active index set by I(x) defined as I(x) = {i ∈ {1, 2, . . . , m} : ϕ(x) = ϕi (x)}. Then ϕ′ (¯ x, d) = max {h∇ϕi (¯ x), di}. i∈I(¯ x)
© 2012 by Taylor & Francis Group, LLC
2.4 Subdifferential Calculus
105
Proof. Without loss of generality, assume that I(¯ x) = {1, 2, . . . , m} because those ϕi where the maximum is not attained, do not affect ϕ′ (¯ x, d). By the definition of the max-function, ϕ(¯ x + λd) ≥ ϕi (¯ x + λd), ∀ i = 1, 2, . . . , m, which implies that ϕ(¯ x + λd) − ϕ(¯ x) ≥ ϕi (¯ x + λd) − ϕ(¯ x), ∀ i = 1, 2, . . . , m. As ϕ(¯ x) = ϕi (¯ x) for i ∈ I(¯ x), ϕ(¯ x + λd) − ϕ(¯ x) ≥ ϕi (¯ x + λd) − ϕi (¯ x), ∀ i ∈ I(¯ x). By Definition 2.74 of the directional derivative, ϕ′ (¯ x, d) ≥ lim λ↓0
ϕi (¯ x + λd) − ϕi (¯ x) , ∀ i ∈ I(¯ x). λ
Because φi , i ∈ I(¯ x) are differentiable functions, which along with the above inequality yields ϕ′ (¯ x, d) ≥ max h∇ϕi (¯ x), di. i∈I(¯ x)
To establish the result, we will prove the reverse inequality, that is, ϕ′ (¯ x, d) ≤ max h∇ϕi (¯ x), di. i∈I(¯ x)
We claim that there exists a neighborhood N (¯ x) such that I(x) ⊂ I(¯ x) for every x ∈ N (¯ x). On the contrary, assume that there exists {xk } ⊂ Rn with xk → x ¯ such that I(xk ) 6⊂ I(¯ x). Therefore, we may choose ik ∈ I(xk ) but ik ∈ / I(¯ x). As {ik } ⊂ {1, 2, . . . , m} for every k ∈ N, by the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. Without loss of generality, suppose that ik → ¯i. Because I(xk ) is closed, ¯i ∈ I(xk ), which implies ϕ¯i (xk ) = ϕ(xk ). By Theorem 2.69, the functions are continuous on Rn . Thus ϕ¯i (¯ x) = ϕ(¯ x), that is, ¯i ∈ I(¯ x). Because ik ∈ / I(¯ x) for every k ∈ N, ¯ which implies that i 6∈ I(¯ x), which is a contradiction, thereby establishing the claim. Now consider {λk } ⊂ R+ such that λk → 0. Observe that ϕik (¯ x + λk d) = ϕ(¯ x + λk d), ∀ ik ∈ I(¯ x + λk d). For sufficiently large k ∈ N, we may choose ik ∈ I(¯ x). Because I(¯ x) is closed, which along with the Bolzano–Weierstrass Theorem implies that ik has a convergent subsequence. Without loss of generality, assume that {ik } converges to ¯i ∈ I(¯ x). We may choose ik = ¯i. Therefore, lim
k→∞
ϕ(¯ x + λk d) − ϕ(¯ x) ≤ max h∇ϕi (¯ x), di. λk i∈I(¯ x)
© 2012 by Taylor & Francis Group, LLC
106
Tools for Convex Optimization
By Theorem 2.76, the directional derivative of a convex function always exists and therefore, ϕ′ (¯ x, d) = lim
λ→0
ϕ(¯ x + λd) − ϕ(¯ x) ≤ max h∇ϕi (¯ x), di, λ i∈I(¯ x)
hence completing the proof.
We are now in a position to obtain the Subdifferential Max-Function Rule as an application of the Chain Rule, Theorem 2.94, and the result Theorem 2.95 established above. Theorem 2.96 (Max-Function Rule) Consider convex functions φi : Rn → R, i = 1, 2, . . . , m, and let φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)}. Then [ ∂φ(¯ x) = co , ∂φi (¯ x), i∈I(¯ x)
where I(¯ x) denotes the active index set. Proof. In the discussion preceding Theorem 2.95, we observed that φ = ϕ ◦Φ, where Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x))
and
ϕ(y) = max{y1 , y2 , . . . , ym }
with y = (y1 , y2 , . . . , ym ) ∈ Rm . By Theorem 2.95, ϕ′ (y, d) = max {hei , di}, ′ i∈I (y)
where ei = (0, . . . , 0, 1, 0, . . . , 0) ∈ Rm with 1 at the i-th place and I ′ (y) = {i ∈ {1, 2, . . . , m} : yi = ϕ(y)}. It is obvious that ϕ′ (y, .) is a support function of {ei ∈ Rm : i ∈ I ′ (y)} and by Proposition 2.61, it is also the support function of co {ei ∈ Rm : i ∈ I ′ (y)}. Therefore, by Theorem 2.79, ∂ϕ(y) = co {ei ∈ Rm : i ∈ I ′ (y)}, that is, ∂ϕ(y) = {(µ1 , µ2 , . . . , µm ) ∈ Rm : µi ≥ 0, i ∈ I ′ (y), µi = 0, i ∈ / I ′ (y),
m X
µi = 1}.
m X
µi = 1}.
i=1
Thus, ∂ϕ(Φ(¯ x)) = {(µ1 , µ2 , . . . , µm ) ∈ Rm : µi ≥ 0, i ∈ I ′ (Φ(¯ x)), µi = 0, i ∈ / I ′ (Φ(¯ x)),
© 2012 by Taylor & Francis Group, LLC
i=1
2.4 Subdifferential Calculus
107
As I ′ (Φ(¯ x)) = I(¯ x), the above condition reduces to ∂ϕ(Φ(¯ x)) = {(µ1 , µ2 , . . . , µm ) ∈ Rm : µi ≥ 0, i ∈ I(¯ x), m X µi = 0, i ∈ / I(¯ x), µi = 1}. i=1
As ϕ is a nondecreasing convex function, applying Theorem 2.94 to φ = ϕ ◦ Φ yields ∂φ(¯ x)
m X = { µi ξi : (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(¯ x)), i=1
ξi ∈ ∂φi (¯ x), i = 1, 2, . . . , m} m m X X = { µi ξi : µi ≥ 0, i ∈ I(¯ x), µi = 0, i ∈ / I(¯ x), µi = 1, i=1
i=1
ξi ∈ ∂φi (¯ x), i = 1, 2, . . . , m},
which implies ∂φ(¯ x) = co
[
∂φi (¯ x),
i∈I(¯ x)
thereby leading to the desired result.
Observe that in the Max-Function Rule above, the maximum was over a finite index set. Now if the index set is a compact set, need not be finite, then what will the subdifferential for sup-function be? This aspect was looked into by Valadier [109] and thus is also referred to as the Valadier Formula. Below we present the Valadier Formula from Ruszczy´ nski [102]. Theorem 2.97 Consider a function Φ(x) = sup φ(x, y), y∈Y
¯ Let x where φ : Rn × Y → R. ¯ ∈ dom Φ such that (i) φ(., y) is convex for every y ∈ Y , (ii) φ(x, .) is usc for every x ∈ Rn , (iii) Y ⊂ Rm is compact. Furthermore, if φ(., y) is continuous at x ¯ for every y ∈ Y , then [ ∂x φ(¯ x, y), ∂Φ(¯ x) = co y∈Yˆ (¯ x)
where Yˆ (¯ x) = {y ∈ Y : φ(¯ x, y) = Φ(¯ x)} and ∂x φ denotes the subdifferential with respect to x.
© 2012 by Taylor & Francis Group, LLC
108
Tools for Convex Optimization
Proof. Observe that (ii) and (iii) ensure that Yˆ (¯ x) is nonempty and compact. Suppose that ξ ∈ ∂x φ(¯ x, y¯) for some y¯ ∈ Yˆ (¯ x). By Definition 2.77 of the subdifferential, φ(x, y¯) − φ(¯ x, y¯) ≥ hξ, x − x ¯i, ∀ x ∈ Rn . As y¯ ∈ Yˆ (¯ x), φ(¯ x, y¯) = Φ(¯ x). Therefore, the above inequality leads to Φ(x) − Φ(¯ x) = sup φ(x, y) − φ(¯ x, y¯) ≥ hξ, x − x ¯i, ∀ x ∈ Rn , y∈Y
thus implying that ξ ∈ ∂Φ(¯ x). Because y¯ ∈ Yˆ (¯ x) and ξ ∈ ∂x φ(¯ x, y¯) were arbitrary, ∂Φ(¯ x) ⊃ ∂x φ(¯ x, y), ∀ y ∈ Yˆ (¯ x). Because ∂Φ(¯ x) is convex, the preceding inclusion yields [ ∂Φ(¯ x) ⊃ co ∂x φ(¯ x, y). y∈Yˆ (¯ x)
To establish the converse, we will prove the reverse inclusion in the above S x, y) is relation. Because ∂Φ(¯ x) is closed, we first show that y∈Yˆ (¯x) ∂x φ(¯ ¯ ˆ closed. Suppose that ξk ∈ ∂x φ(¯ x, yk ), where {yk } ⊂ Y (¯ x) such that ξk → ξ. ˆ Because Y (¯ x) is compact and hence closed, {yk } is a bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. Without loss of generality, suppose yk → y¯, which by the closedness of Yˆ (¯ x) implies that y¯ ∈ Yˆ (¯ x). By the definition of subdifferential along with the facts that {yk } ⊂ Yˆ (¯ x) and y¯ ∈ Yˆ (¯ x), that is, φ(¯ x, yk ) = Φ(¯ x) = φ(¯ x, y¯) imply that for every x ∈ Rn , φ(x, yk )
≥ φ(¯ x, yk ) + hξk , x − x ¯i
= φ(¯ x, y¯) + hξk , x − x ¯i, ∀ k ∈ N.
Taking the limit supremum as k → +∞, which by the upper semicontinuity of φ(x, .) over Rn leads to φ(x, y¯) ≥ lim sup φ(x, yk ) k→∞
≥ φ(¯ x, y¯) + lim suphξk , x − x ¯i k→∞
¯ x−x = φ(¯ x, y¯) + hξ, ¯i, ∀ x ∈ Rn ,
S thereby yielding that ξ¯ ∈ ∂x φ(¯ x, y¯). Hence, y∈Yˆ (¯x) ∂x φ(¯ x, y) is closed. Now let us assume on the contrary that [ ∂Φ(¯ x) 6⊂ co ∂x φ(¯ x, y), y∈Yˆ (¯ x)
© 2012 by Taylor & Francis Group, LLC
2.4 Subdifferential Calculus
109
that is, there exists ξ¯ ∈ ∂Φ(¯ x) such that ξ¯ ∈ / co
[
∂x φ(¯ x, y).
y∈Yˆ (¯ x)
S As co y∈Yˆ (¯x) ∂x φ(¯ x, y) is a closed convex set, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists d ∈ Rn with d 6= 0 such that ¯ di > hξ, di, ∀ ξ ∈ ∂x φ(¯ hξ, x, y), ∀ y ∈ Yˆ (¯ x).
(2.25)
Consider a sequence {λk } ⊂ R+ such that λk → 0. As Φ is convex, by the definition of subdifferential, Φ(¯ x + λk d) − Φ(¯ x) ¯ di. ≥ hξ, λk
(2.26)
For k ∈ N, define the set φ(¯ x + λk d, y) − Φ(¯ x) ¯ di . ≥ hξ, Yk = y ∈ Y : λk We claim that Yk is compact and nonempty. Consider {yr } ∈ Yk such that yr → yˆ. Because yr ∈ Yk , φ(¯ x + λk d, yr ) − Φ(¯ x) ¯ di. ≥ hξ, λk Taking the limit supremum as r → +∞, which along with the upper semicontinuity of φ(x, .) for every x ∈ Rn implies that φ(¯ x + λk d, yˆ) − Φ(¯ x) ¯ di. ≥ hξ, λk Thus, yˆ ∈ Yk and hence Yk is closed for every k ∈ N. As Yk ⊂ Y and Y is compact, Yk is closed and bounded and thus compact. Also by the upper semicontinuity of φ(x, .) for every x ∈ Rn , Yˆ (¯ x + λk d) is nonempty. From the inequality (2.26) and the definition of the set Yk , Yˆ (¯ x + λk d) ⊂ Yk and hence Yk is nonempty. For every y ∈ Y , consider the expression φ(¯ φ(¯ x + λd, y) − Φ(¯ x) x + λd, y) − φ(¯ x, y) φ(¯ x, y) − Φ(¯ x) = + . λ λ λ
(2.27)
From the discussion preceding Theorem 2.76 on directional derivatives, the first term on the right-hand side of the above expression is a nondecreasing function of λ, that is, φ(¯ φ(¯ x + λ1 d, y) − φ(¯ x, y) x + λ2 d, y) − φ(¯ x, y) ≤ , ∀ 0 < λ 1 ≤ λ2 . λ1 λ2
© 2012 by Taylor & Francis Group, LLC
(2.28)
110
Tools for Convex Optimization
Also, as Φ(¯ x) ≥ φ(¯ x, y) for every y ∈ Y ,
φ(¯ φ(¯ x, y) − Φ(¯ x) x, y) − Φ(¯ x) ≤ , ∀ 0 < λ 1 ≤ λ2 , λ1 λ2
(2.29)
which implies that the second term is also nondecreasing in λ. Thus, combining the conditions (2.28) and (2.29), the expression φ(¯ x + λ2 d, y) − Φ(¯ x) φ(¯ x + λ1 d, y) − Φ(¯ x) ≤ , ∀ 0 < λ 1 ≤ λ2 , λ1 λ2 that is, the expression (2.27) is nondecreasing in λ. From the above inequality, it is obvious that Y1 ⊂ Y2 for every 0 < λ1 ≤ λ2 . As {λk } is a decreasing sequence, Y1 ⊃ Y2 ⊃ Y3 ⊃ . . . . As for every k ∈ N, Yk is compact and nonempty, there exists y˜ ∈ Yk for all k ∈ N. Therefore, φ(¯ x + λk d, y˜) − Φ(¯ x) ¯ di, ∀ k ∈ N, ≥ hξ, λk
which implies that the term on the left-hand side is bounded below for every k ∈ N. By the continuity of φ(., y) at x ¯ for every y ∈ Y , φ(¯ x + λk d, y˜) → φ(¯ x, y˜) which along with the lower boundedness yields that y˜ ∈ Yˆ (¯ x), that is, Φ(¯ x) = φ(¯ x, y˜). Taking the limit as k → +∞ in the above inequality along with Definition 2.74 of the directional derivative implies that ¯ di. φ′ ((¯ x, y˜), d) ≥ hξ, As φ(., y˜) is continuous at x ¯, any neighborhood of x ¯ is contained in dom φ(., y˜). Thus, x ¯ ∈ int dom φ(., y˜), which by Theorem 2.79 implies that φ′ (¯ x, y˜) is the support function of ∂x φ(¯ x, y˜). Also, by Proposition 2.82, ∂x φ(¯ x, y˜) is compact. Therefore, there exists ξ ∈ ∂x φ(¯ x, y˜) such that the above inequality becomes ¯ di, hξ, di ≥ hξ,
thereby contradicting the inequality (2.25) as y˜ ∈ Yˆ (¯ x), hence completing the proof. From Proposition 2.56, another operation on the convex functions that leads to a convex function is the inf-convolution. We end this section of the subdifferential calculus rules by presenting the subdifferential rule for the infconvolution for a particular case from Lucchetti [79]. ¯ i = 1, 2. Theorem 2.98 Consider proper lsc convex functions φi : Rn → R, n Let x ¯, x1 , x2 ∈ R be such that x1 + x2 = x ¯
and
(φ1 φ2 )(¯ x) = φ1 (x1 ) + φ2 (x2 ).
Then ∂(φ1 φ2 )(¯ x) = ∂φ1 (x1 ) ∩ ∂φ2 (x2 ).
© 2012 by Taylor & Francis Group, LLC
2.5 Conjugate Functions
111
Proof. Suppose that ξ ∈ ∂φ1 (x1 ) ∩ ∂φ2 (x2 ). By Definition 2.77 of the subdifferential, for i = 1, 2, φi (yi ) − φi (xi ) ≥ hξ, yi − xi i, ∀ yi ∈ Rn . Define y1 +y2 = y¯. The above inequality along with the given hypothesis leads to φ1 (y1 ) + φ2 (y2 ) ≥ (φ1 φ2 )(¯ x) + hξ, y¯ − x ¯i, ∀ y1 , y2 ∈ Rn . Taking the infimum over y1 and y2 satisfying y1 + y2 = y¯ in the above inequality, which by Definition 2.54 of the inf-convolution yields (φ1 φ2 )(¯ y ) ≥ (φ1 φ2 )(¯ x) + hξ, y¯ − x ¯i. As y¯ ∈ Rn was arbitrary, the above inequality holds for every y¯ ∈ Rn . Thus, ξ ∈ ∂(φ1 φ2 )(¯ x). Because ξ ∈ ∂φ1 (x1 ) ∩ ∂φ2 (x2 ) was arbitrary, ∂φ1 (x1 ) ∩ ∂φ2 (x2 ) ⊂ ∂(φ1 φ2 )(¯ x). Conversely, suppose that ξ ∈ ∂(φ1 φ2 )(¯ x). Therefore, ∂(φ1 φ2 )(¯ y ) ≥ φ1 (x1 ) + φ2 (x2 ) + hξ, y¯ − x ¯i, ∀ y¯ ∈ Rn . As the above inequality holds for any y¯ ∈ Rn , then y¯ = x + x2 for some x ∈ Rn . Substituting in the above inequality along with the definition of the inf-convolution yields φ1 (x) + φ2 (x2 ) ≥ φ1 (x1 ) + φ2 (x2 ) + hξ, (x + x2 ) − (x1 + x2 )i, ∀ x ∈ Rn , which implies that φ1 (x) ≥ φ1 (x1 ) + hξ, x − x1 i, ∀ x ∈ Rn . Therefore, ξ ∈ ∂φ1 (x1 ). Similarly, it can be shown that ξ ∈ ∂φ2 (x2 ) and hence ξ ∈ ∂φ1 (x1 ) ∩ φ2 (x2 ). Because ξ ∈ ∂(φ1 φ2 )(¯ x) was arbitrary, ∂(φ1 φ2 )(¯ x) ⊂ ∂φ1 (x1 ) ∩ φ2 (x2 ), thereby establishing the result.
2.5
Conjugate Functions
All this background on convexity, convex sets as well as convex functions, the subdifferentials and their calculus form a backbone for the study of convex optimization theory. Optimization problems appear not only in the specialized fields of engineering, management sciences, and finance, but also in some simple real-life problems. For instance, if the cost of manufacturing x1 , x2 , . . . , xn quantities of n goods is given by φ(x) and the price of selling these goods
© 2012 by Taylor & Francis Group, LLC
112
Tools for Convex Optimization
is ξ1 , ξ2 , . . . , ξn , respectively, then the manufacturer would like to choose the quantities x1 , x2 , . . . , xn in such a way that it leads to maximum profit, where the profit function is given by the affine function {hξ, xi − φ(x)}. Theoretically, this problem had been expressed using the conjugate functions of φ introduced by Fenchel [45], which forms a class of convex functions. As we will see in a short while, these conjugate functions are related to not only the subdifferential for a convex function by the Fenchel–Young inequality but also to the ε-subdifferential via its epigraph. For convex functions, the very idea of conjugacy seems to derive from the fact that a proper lsc convex function is a pointwise supremum of affine functions majorized by it. But before moving on to this result, we present the following lemma from Lucchetti [79]. ¯ Let Lemma 2.99 Consider a proper lsc convex function φ : Rn → R. x ¯ ∈ dom φ and γ ∈ R such that φ(¯ x) > γ. Then there exists (a, b) ∈ Rn × R such that the affine function h(x) = ha, xi + b satisfies φ(x) ≥ h(x), ∀ x ∈ Rn
and
h(¯ x) > γ.
Proof. As φ is an lsc convex function, by Theorem 1.9 and Proposition 2.48, epi φ is a closed convex set in Rn × R. From the given hypothesis, it is obvious that (¯ x, γ) ∈ / epi φ. By the Strict Separation Theorem, Theorem 2.26 (iii), there exist (a, λ) ∈ Rn × R with (a, λ) 6= (0, 0) and b ∈ R such that ha, xi + λα ≥ b > ha, x ¯i + λγ, ∀ (x, α) ∈ epi φ.
(2.30)
In particular, taking (¯ x, φ(¯ x)) ∈ epi φ, the above inequality reduces to λ(φ(¯ x) − γ) > 0. As φ(¯ x) > γ, the above strict inequality leads to λ > 0. Again, taking (x, φ(x)) ∈ epi φ in the condition (2.30) yields φ(x) ≥ h(x), ∀ x ∈ dom φ
and
h(¯ x) > γ,
b −a , xi + . Observe that for x ∈ / dom φ, the first inequality where h(x) = h λ λ holds trivially, that is, φ(x) ≥ h(x), ∀ x ∈ Rn , thereby establishing the result.
Now we present the main result, the proof of which is from Lucchetti [79]. ¯ can be expressed Theorem 2.100 A proper lsc convex function φ : Rn → R as a pointwise supremum of the collection of all affine functions majorized by it, that is, for every x ∈ Rn , φ(x) = sup{h(x) : φ(x) ≥ h(x), h(x) = ha, xi + b, a ∈ Rn , b ∈ R}.
© 2012 by Taylor & Francis Group, LLC
2.5 Conjugate Functions
113
¯ as Proof. Define the function Φ : Rn → R Φ(x) = sup{h(x) : φ(x) ≥ h(x), h(x) = ha, xi + b, a ∈ Rn , b ∈ R}. Because Φ is a pointwise supremum of affine functions, it is an lsc convex function. Also, as φ(x) ≥ h(x), φ(x) ≥ Φ(x) for every x ∈ Rn , which implies that epi φ is contained in the intersection of epigraph of the affine functions h, epi h, which are majorized by φ, that is, φ(x) ≥ h(x) for every x ∈ Rn . Therefore, to complete the proof, it is sufficient to prove that for (¯ x, γ) 6∈ epi φ, there exists an affine function h such that h(¯ x) > γ. By Lemma 2.99, for x ¯ ∈ dom φ such an h exists. Now suppose that x ¯∈ / dom φ. As (¯ x, γ) 6∈ epi φ, working along the lines of the proof of Lemma 2.99, there exist (a, λ) ∈ Rn × R with (a, λ) 6= (0, 0) and b ∈ R such that ha, xi + λα ≥ b > ha, x ¯i + λγ, ∀ (x, α) ∈ epi φ. If λ 6= 0, the affine function h exists as in the lemma. If λ = 0, the above inequality reduces to ha, xi ≥ b > ha, x ¯i, ∀ x ∈ dom φ. From the above condition, h(x) ≤ 0, ∀ x ∈ dom φ
and
h(¯ x) > 0,
where h(x) = h−a, xi + b. As a consequence of Lemma 2.99, it is obvious that a proper lsc convex function has at least one affine function majorized by it. ¯ majorized by it, that is φ(x) ≥ h(x) ¯ Therefore, φ has an affine function, say h, n for every x ∈ R . Now for any µ > 0, ¯ φ(x) ≥ h(x) + µh(x), ∀ x ∈ dom φ. The above inequality holds trivially for x ∈ / dom φ. Thus, ¯ φ(x) ≥ (h + µh)(x), ∀ x ∈ Rn , ¯ is majorized by φ. As h(¯ which implies the affine function (h + µh) x) > 0, for ¯ x) > γ, thereby establishing the result. µ sufficiently large, (h + µh)(¯ Denote the set of all affine functions by H. Consider the support set of φ denoted by supp(φ, H), which is the collection of all affine functions majorized by φ, that is, supp(φ, H) = {h ∈ H : h(x) ≤ φ(x), ∀ x ∈ Rn }. An affine function h ∈ H is the affine support of φ if h(x) ≤ φ(x), ∀ x ∈ Rn
© 2012 by Taylor & Francis Group, LLC
and
h(¯ x) = φ(¯ x), for some x ¯ ∈ Rn .
114
Tools for Convex Optimization
¯ and x Consider φ : Rn → R ¯ ∈ dom φ such that ∂φ(¯ x) is nonempty. Then for any ξ ∈ ∂φ(¯ x), by Definition 2.77, φ(x) ≥ hξ, xi + (φ(¯ x) − hξ, x ¯i), ∀ x ∈ Rn .
(2.31)
Define an affine function h : Rn → R given by h(x) = hξ, xi + (φ(¯ x) − hξ, x ¯i).
(2.32)
Combining (2.31) and (2.32), h(x) ≤ φ(x), ∀ x ∈ Rn
and
h(¯ x) = φ(¯ x),
thereby implying that h ∈ H is an affine support of φ. Therefore, if ∂φ is nonempty, then there exists an affine support to it. Now consider a set Φ∗ ⊂ Rn defined as ¯ α) ¯ xi − α Φ∗ = {(ξ, ¯ ∈ Rn × R : h(x) = hξ, ¯ ≤ φ(x)}, which implies for every x ∈ Rn , h(x) ≤ φ(x). Therefore, ¯ xi − φ(x)}, α ¯ ≥ sup {hξ, x∈Rn
which implies Φ∗ can be considered the epigraph of the function φ∗ , which is the conjugate of φ. We formally introduce the notion of conjugate below. ¯ The conjugate of φ, Definition 2.101 Consider a function φ : Rn → R. ¯ is defined as φ∗ : Rn → R, φ∗ (ξ) = sup {hξ, xi − φ(x)}. x∈Rn
Observe that Φ∗ = epi φ∗ , as discussed above. The biconjugate of φ, φ∗∗ , is the conjugate of φ∗ , that is, φ∗∗ (x) = sup {hξ, xi − φ∗ (ξ)}. ξ∈Rn
Consider a set F ⊂ Rn . The conjugate of the indicator function to the set F is δF∗ (ξ) = sup {hξ, xi − δF (x)} = sup hξ, xi, x∈Rn
x∈F
which is actually the support function to the set F . Therefore, δF∗ = σF for any set F . Observe that the definitions of conjugate and biconjugate functions are given for any arbitrary function. Below we present some properties of conjugate functions.
© 2012 by Taylor & Francis Group, LLC
2.5 Conjugate Functions
115
¯ the conjugate function Proposition 2.102 For any function φ : Rn → R, φ∗ is always lsc convex. In addition, if φ is proper convex, then φ∗ is also a proper convex function. Proof. Consider any ξ1 , ξ2 ∈ Rn . Then for every λ ∈ [0, 1], φ∗ ((1 − λ)ξ1 + λξ2 )
sup {h((1 − λ)ξ1 + λξ2 ), xi − φ(x)}
=
x∈Rn
sup {(1 − λ)(hξ1 , xi − φ(x)) + λ(hξ2 , xi − φ(x))},
=
x∈Rn
which by Proposition 1.7 leads to φ∗ ((1 − λ)ξ1 + λξ2 ) ≤ (1 − λ) sup {hξ1 , xi − φ(x)} + x∈Rn
λ sup {hξ2 , xi − φ(x)} x∈Rn
=
∗
∗
(1 − λ)φ (ξ1 ) + λφ (ξ2 ), ∀ λ ∈ [0, 1].
Because ξ1 and ξ2 are arbitrary, from the above inequality φ∗ is convex. Also, as φ∗ is a pointwise supremum of affine functions hx, .i − φ(x), it is lsc. As φ is a proper convex function, dom φ is a nonempty convex set in Rn , which by Proposition 2.14 (i) implies that ri dom φ is nonempty. Also, by Proposition 2.82, for any x¯ ∈ ri dom φ, ∂φ(¯ x) is nonempty. Suppose that ξ ∈ ∂φ(¯ x), which by Definition 2.77 of the subdifferential implies that hξ, x ¯i − φ(¯ x) ≥ hξ, xi − φ(x), ∀ x ∈ Rn , which along with the definition of conjugate φ∗ implies that hξ, x ¯i − φ(¯ x) = φ∗ (ξ). As φ(¯ x) is finite, φ∗ (ξ) is also finite, that is, ξ ∈ dom φ∗ . Also, by the properness of φ and the definition of φ∗ , it is obvious that φ∗ (ξ) > −∞ for every x ∈ Rn , thereby showing that φ∗ is proper convex function.
Observe that φ∗ is lsc convex irrespective of the nature of φ but for φ∗ to be proper, we need φ to be a proper convex function. Simply assuming φ to be proper need not imply that φ∗ is proper. For instance, consider φ(x) = −x2 , which is a nonconvex proper function. Then φ∗ ≡ +∞ and hence not proper. Next we state some conjugate rules that can be proved directly using the definition of conjugate functions. ¯ φ : Rn → R. ¯ Proposition 2.103 Consider a function φ, (i) If φ¯ ≤ φ, then φ¯∗ ≥ φ∗ .
¯ (ii) If φ(x) = φ(x) + c, φ¯∗ (ξ) = φ∗ (ξ) − c.
¯ (iii) If φ(x) = λφ(x) for λ > 0, φ¯∗ (ξ) = λφ∗ (ξ/λ).
© 2012 by Taylor & Francis Group, LLC
116
Tools for Convex Optimization
(iv) For every x and ξ in Rn , φ∗ (ξ) + φ(x) ≥ hξ, xi. This is known as the Fenchel–Young Inequality. Equivalently, φ∗∗ (x) ≤ φ(x), ∀ x ∈ Rn . The readers are urged to verify these properties simply using Definition 2.101 of conjugate and biconjugate functions. As we discussed in Theorem 2.100 that a convex function is pointwise supremum of affine functions, the biconjugate of the function plays an important role in this respect. Below we present a result that relates the biconjugate with the support set. The proof is along the lines of Hiriart-Urruty and Lemar´echal [63]. ¯ Then φ∗∗ is the Theorem 2.104 Consider a proper function φ : Rn → R. pointwise supremum of all affine functions majorized by φ, that is, φ∗∗ (¯ x) =
sup
h(¯ x).
h∈supp(φ,H)
More precisely, φ∗∗ = cl co φ. Proof. An affine function h is majorized by φ, that is, h(x) ≤ φ(x) for every x ∈ Rn . Because an affine function is expressed as h(x) = hξ, xi − α for some ξ ∈ Rn and α ∈ R, hξ, xi − α ≤ φ(x), ∀ x ∈ Rn . Therefore, by Definition 2.101 of the conjugate function, φ∗ (ξ) ≤ α, which implies ξ ∈ dom φ∗ . Then for any x ∈ Rn , sup
h(x)
=
sup ξ∈dom φ∗ , φ∗ (ξ)≤α
h∈supp(φ,H)
=
sup ξ∈dom
=
φ∗
{hξ, xi − α}
{hξ, xi − φ∗ (ξ)}
sup {hξ, xi − φ∗ (ξ)} = φ∗∗ (x),
ξ∈Rn
thereby yielding the desired result. From Definition 2.57 of the closed convex function, φ∗∗ = cl co φ, as desired. Combining Theorems 2.100 and 2.104 we have the following result for a proper lsc convex function. ¯ Then Theorem 2.105 Consider a proper lsc convex function φ : Rn → R. ∗∗ φ = φ.
© 2012 by Taylor & Francis Group, LLC
2.5 Conjugate Functions
117
Observe that the above theorem holds when the function is lsc. What if φ is only proper convex but not lsc, then how is one supposed to relate the function φ to its biconjugate φ∗∗ ? The next result from Attouch, Buttazzo, and Michaille [3] looks into this aspect. ¯ Assume Proposition 2.106 Consider a proper convex function φ : Rn → R. ∗∗ that φ admits a continuous affine minorant. Then φ = cl φ. Consequently, φ is lsc at x ¯ ∈ Rn if and only if φ(¯ x) = φ∗∗ (¯ x). Proof. By the Fenchel–Young inequality, Proposition 2.103 (iv), φ(x) ≥ hξ, xi − φ∗ (ξ), ∀ x ∈ Rn , which implies that h(x) = hξ, xi − φ∗ (ξ) belongs to supp(φ, H). By Definition 2.101 of the biconjugate function, φ∗∗ (x) = sup {hξ, xi − φ∗ (ξ)}, ξ∈Rn
which leads to φ∗∗ being the upper envelope of the continuous affine minorants of φ. Applying Proposition 2.102 to φ∗ , φ∗∗ is a proper lsc convex function and thus, φ∗∗ ≤ cl φ ≤ φ. This inequality along with Proposition 2.103 (i) leads to (φ∗∗ )∗∗ ≤ (cl φ)∗∗ ≤ φ∗∗ . As φ∗∗ and cl φ are both proper lsc convex functions, by Theorem 2.105, (φ∗∗ )∗∗ = φ∗∗
and
(cl φ)∗∗ = cl φ,
thereby reducing the preceding inequality to φ∗∗ ≤ cl φ)∗∗ ≤ φ∗∗ . Hence, φ∗∗ = cl φ, thereby establishing the first part of the result. From Chapter 1, we know that closure of a function φ is defined as cl φ(¯ x) = lim inf φ(x), x→¯ x
which is the same as φ(¯ x) if φ is lsc at x ¯ by Definition 1.4, thereby yielding φ(¯ x) = cl φ(¯ x). Consequently, by the first part, the lower semicontinuity of φ at x ¯ is equivalent to φ(¯ x) = φ∗∗ (¯ x), thereby completing the proof. With all the preceding results, and discussions on the properties of the conjugates and biconjugates, we now move on to see how the conjugates of the function operations are defined. More precisely, if given some functions and we perform some operation on them, like the sum operation or the supremum operation, then how are their conjugates related to the conjugates of the given functions? In the next result from Hiriart-Urruty and Lemar´echal [63] and Rockafellar [97], we look into this aspect of conjugate functions.
Theorem 2.107 (i) (Inf-Convolution Rule) Consider proper functions φi : Rn → R̄, i = 1, 2, . . . , m, satisfying ∩_{i=1}^m dom φi* ≠ ∅. Then

(φ1 □ φ2 □ . . . □ φm)* = φ1* + φ2* + . . . + φm*.

(ii) (Sum Rule) Consider proper convex functions φi : Rn → R̄, i = 1, 2, . . . , m, satisfying ∩_{i=1}^m dom φi ≠ ∅. Then

(cl φ1 + cl φ2 + . . . + cl φm)* = cl (φ1* □ φ2* □ . . . □ φm*).

If ∩_{i=1}^m ri dom φi ≠ ∅, then

(φ1 + φ2 + . . . + φm)* = φ1* □ φ2* □ . . . □ φm*

and for every ξ ∈ dom (φ1 + φ2 + . . . + φm)*, the infimum of the problem

inf{φ1*(ξ1) + φ2*(ξ2) + . . . + φm*(ξm) : ξ1 + ξ2 + . . . + ξm = ξ}

is attained.

(iii) (Infimum Rule) Consider a family of proper functions φi : Rn → R̄, i ∈ I, where I is an arbitrary index set, having a common affine minorant and satisfying sup_{i∈I} φi*(ξ) < +∞ for some ξ ∈ Rn. Then

(inf_{i∈I} φi)* = sup_{i∈I} φi*.

(iv) (Supremum Rule) Consider a family of proper lsc convex functions φi : Rn → R̄, i ∈ I, where I is an arbitrary index set. If sup_{i∈I} φi is not identically +∞, then

(sup_{i∈I} φi)* = cl co (inf_{i∈I} φi*).
Proof. (i) From Definition 2.101 of the conjugate function and Definition 2.54 of the inf-convolution along with Proposition 1.7,

(φ1 □ . . . □ φm)*(ξ) = sup_{x ∈ Rn} { ⟨ξ, x⟩ − inf_{x1+...+xm = x} (φ1(x1) + . . . + φm(xm)) }
                     = sup_{x ∈ Rn} sup_{x1+...+xm = x} { ⟨ξ, x⟩ − (φ1(x1) + . . . + φm(xm)) }
                     = sup_{x1,...,xm ∈ Rn} { ⟨ξ, x1⟩ − φ1(x1) + . . . + ⟨ξ, xm⟩ − φm(xm) }
                     = φ1*(ξ) + . . . + φm*(ξ),

thereby establishing the desired result.

(ii) Replacing φi by φi* for i = 1, 2, . . . , m, in (i) along with Proposition 2.106 leads to

cl φ1 + cl φ2 + . . . + cl φm = (φ1* □ φ2* □ . . . □ φm*)*.
Taking the conjugate on both sides and again applying Proposition 2.106 yields the requisite condition,

(cl φ1 + cl φ2 + . . . + cl φm)* = cl (φ1* □ φ2* □ . . . □ φm*).

If ∩_{i=1}^m ri dom φi is nonempty, then by Proposition 2.68,

cl φ1 + cl φ2 + . . . + cl φm = cl (φ1 + φ2 + . . . + φm).
Also, by the definition of conjugate functions,

(cl φ1 + cl φ2 + . . . + cl φm)* = (cl (φ1 + φ2 + . . . + φm))* = (φ1 + φ2 + . . . + φm)*.

Now to establish the result, it is enough to prove that φ1* □ φ2* □ . . . □ φm* is lsc. By Theorem 1.9, it is equivalent to showing that the lower-level set

Sα = {ξ ∈ Rn : (φ1* □ . . . □ φm*)(ξ) ≤ α}

is closed for every α ∈ R. Consider a sequence {ξk} ⊂ Sα such that ξk → ξ. By Definition 2.54 of the inf-convolution, there exist ξk^i ∈ Rn with Σ_{i=1}^m ξk^i = ξk such that

φ1*(ξk^1) + . . . + φm*(ξk^m) ≤ α + 1/k, ∀ k ∈ N.   (2.33)

By assumption, suppose that x̂ ∈ ∩_{i=1}^m ri dom φi. As φi, i = 1, 2, . . . , m, are convex, by Theorem 2.69 the functions are continuous at x̂. Therefore, for some ε > 0 and Mi ∈ R, i = 1, 2, . . . , m,

φi(x) ≤ Mi, ∀ x ∈ Bε(x̂), i = 1, 2, . . . , m.   (2.34)
For any d ∈ Bε(0), consider

⟨ξk^1, d⟩ = ⟨ξk^1, x̂⟩ − ⟨ξk^1, x̂ − d⟩
         = ⟨ξk^1, x̂⟩ + ⟨ξk^2, x̂ − d⟩ + . . . + ⟨ξk^m, x̂ − d⟩ − ⟨ξk, x̂ − d⟩,

which by the Fenchel–Young inequality, Proposition 2.103 (iv), and the Cauchy–Schwarz inequality, Proposition 1.1, leads to

⟨ξk^1, d⟩ ≤ φ1*(ξk^1) + φ1(x̂) + φ2*(ξk^2) + φ2(x̂ − d) + . . . + φm*(ξk^m) + φm(x̂ − d) + ‖ξk‖ ‖x̂ − d‖.

By the conditions (2.33) and (2.34), the above inequality reduces to

⟨ξk^1, d⟩ ≤ α + 1/k + M1 + M2 + . . . + Mm + ‖ξk‖ ‖x̂ − d‖,
which along with the boundedness of {ξk } implies that {ξk1 } ⊂ Rn is a bounded sequence. Similarly, it can be shown that {ξki }, i = 2, . . . , m, are bounded sequences. By the Bolzano–Weierstrass Theorem, Proposition 1.3, {ξki }, i = 1, 2, . . . , m, have a convergent subsequence. Without loss of generality, assume that ξki → ξi , i = 1, 2, . . . , m. Because ξk = ξk1 + ξk2 + . . . + ξkm , as limit k → +∞, ξ = ξ1 + ξ2 + . . . + ξm . By Proposition 2.102, φ∗i , i = 1, 2, . . . , m, are proper lsc convex functions, therefore taking the limit as k → +∞, φ∗1 (ξ1 ) + φ∗2 (ξ2 ) + . . . + φ∗m (ξm ) ≤ α. By the definition of inf-convolution, the above inequality leads to (φ∗1 φ∗2 . . . φ∗m )(ξ) ≤ α, which implies that ξ ∈ Sα . Because α ∈ R was arbitrary, the lower-level set is closed for every α ∈ R and hence φ∗1 φ∗2 . . . φ∗m is closed. Repeating the same arguments with (φ∗1 φ∗2 . . . φ∗m )(ξ) = α
and ξk = ξ yields that the infimum is achieved, thereby completing the proof.

(iii) By Definition 2.101, for every ξ ∈ Rn,

(inf_{i∈I} φi)*(ξ) = sup_{x ∈ Rn} { ⟨ξ, x⟩ − inf_{i∈I} φi(x) }
                  = sup_{x ∈ Rn} sup_{i∈I} { ⟨ξ, x⟩ − φi(x) }
                  = sup_{i∈I} sup_{x ∈ Rn} { ⟨ξ, x⟩ − φi(x) } = sup_{i∈I} φi*(ξ),

as desired.

(iv) Replacing φi by φi* for i ∈ I in (iii),

sup_{i∈I} φi** = (inf_{i∈I} φi*)*.

As φi, i ∈ I, are lsc, the above condition reduces to

sup_{i∈I} φi = (inf_{i∈I} φi*)*.

Taking the conjugate on both sides leads to

(sup_{i∈I} φi)* = (inf_{i∈I} φi*)**,

which by Theorem 2.104 yields

(sup_{i∈I} φi)* = cl co (inf_{i∈I} φi*).
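Parts (i) and (ii) can also be checked numerically whenever the conjugates are known in closed form. The Python sketch below does this for two quadratics; the coefficients a and b, the grid, and the error print-outs are illustrative choices, and the closed forms φi*(ξ) = ξ²/(2ai) are the standard conjugates of ai·x²/2.

```python
import numpy as np

# Sketch of the Sum Rule (ii): for phi1(x) = a*x**2/2 and phi2(x) = b*x**2/2
# (both finite-valued, so ri dom phi1 and ri dom phi2 intersect), the rule
# predicts (phi1 + phi2)^* = phi1^* box phi2^* with an exact inf-convolution.
# The grid and the particular a, b are illustrative choices.

a, b = 1.0, 3.0
grid = np.linspace(-10.0, 10.0, 4001)

conj = lambda f: np.array([np.max(grid * xi - f(grid)) for xi in grid])

sum_conj = conj(lambda x: 0.5 * a * x**2 + 0.5 * b * x**2)

phi1_star = grid**2 / (2 * a)
# inf-convolution on the grid: minimize over the split xi = eta + (xi - eta)
inf_conv = np.array([np.min(phi1_star + (xi - grid)**2 / (2 * b)) for xi in grid])

print(np.max(np.abs(sum_conj - inf_conv)))                  # small discretization error
print(np.max(np.abs(inf_conv - grid**2 / (2 * (a + b)))))   # matches xi^2 / (2(a+b))
```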
Next, using the Fenchel–Young inequality, we present an equivalent characterization of the subdifferential of a convex function.

Theorem 2.108 Consider a proper convex function φ : Rn → R̄. Then for any x, ξ ∈ Rn,

ξ ∈ ∂φ(x)  ⟺  φ(x) + φ*(ξ) = ⟨ξ, x⟩.

In addition, if φ is also lsc, then for any x and ξ in Rn,

ξ ∈ ∂φ(x)  ⟺  φ(x) + φ*(ξ) = ⟨ξ, x⟩  ⟺  x ∈ ∂φ*(ξ).
Proof. Suppose that ξ ∈ ∂φ(¯ x), which by Definition 2.77 of the subdifferential implies that φ(x) − φ(¯ x) ≥ hξ, x − x ¯i, ∀ x ∈ Rn . The above inequality leads to hξ, x ¯i − φ(¯ x) ≥ sup {hξ, xi − φ(x)} = φ∗ (ξ), x∈Rn
that is, φ(¯ x) + φ∗ (ξ) ≤ hξ, x ¯i which along with the Fenchel–Young inequality, Proposition 2.103 (iv), reduces to the desired condition φ(¯ x) + φ∗ (ξ) = hξ, x ¯i. Conversely, suppose that above condition is satisfied, which by Definition 2.101 of the conjugate function implies hξ, x ¯i − φ(¯ x) ≥ hξ, xi − φ(x), ∀ x ∈ Rn , that is, φ(x) − φ(¯ x) ≥ hξ, x − x ¯i, ∀ x ∈ Rn . Thus, ξ ∈ ∂φ(¯ x), thereby establishing the equivalence. Now if φ is lsc as well, then by Theorem 2.105, φ = φ∗∗ . Then the equivalent condition can be expressed as ξ¯ ∈ ∂φ(¯ x) if and only if ¯ = hξ, ¯x φ∗∗ (¯ x) + φ∗ (ξ) ¯i.
By Definition 2.101 of the biconjugate function, the above condition is equivalent to

⟨ξ̄, x̄⟩ − φ*(ξ̄) ≥ ⟨ξ, x̄⟩ − φ*(ξ), ∀ ξ ∈ Rn,

that is,

φ*(ξ) − φ*(ξ̄) ≥ ⟨ξ − ξ̄, x̄⟩, ∀ ξ ∈ Rn.

By the definition of the subdifferential, x̄ ∈ ∂φ*(ξ̄). The converse can be worked out along the lines of the previous part, thus establishing the desired relation.

As an application of the above theorem, consider a closed convex cone K ⊂ Rn. We claim that ξ̄ ∈ ∂δK(x̄) if and only if x̄ ∈ ∂δK◦(ξ̄). Suppose that ξ̄ ∈ ∂δK(x̄) = NK(x̄), which is equivalent to

⟨ξ̄, x − x̄⟩ ≤ 0, ∀ x ∈ K.

In particular, taking x = 0 and x = 2x̄, respectively, implies that ⟨ξ̄, x̄⟩ = 0. Therefore, the above inequality reduces to

⟨ξ̄, x⟩ ≤ 0, ∀ x ∈ K,

which by Definition 2.30 implies that ξ̄ ∈ K◦. Thus, ξ̄ ∈ NK(x̄) is equivalent to

ξ̄ ∈ K◦,   x̄ ∈ K,   ⟨ξ̄, x̄⟩ = 0.

For a closed convex cone K, by Proposition 2.31, K◦◦ = K. As x̄ ∈ K = K◦◦,

⟨ξ, x̄⟩ ≤ 0, ∀ ξ ∈ K◦.

Because ⟨ξ̄, x̄⟩ = 0, the above condition is equivalent to

⟨ξ − ξ̄, x̄⟩ ≤ 0, ∀ ξ ∈ K◦,

which implies that x̄ ∈ NK◦(ξ̄), thereby proving our claim.
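Theorem 2.108 can be tested directly on a one-dimensional example. The following Python sketch uses φ(x) = |x|, whose conjugate is the indicator of [−1, 1] and whose subdifferential is known; the sample points and the tolerance are illustrative choices.

```python
import numpy as np

# Sketch of Theorem 2.108 for phi(x) = |x| (a proper lsc convex function).
# Its conjugate is the indicator of [-1, 1] and its subdifferential is
#   {sign(x)} for x != 0 and [-1, 1] at x = 0,
# so the equality phi(x) + phi*(xi) = xi*x should single out exactly these xi.

def phi(x):        return abs(x)
def phi_star(xi):  return 0.0 if abs(xi) <= 1 else np.inf

def in_subdiff(xi, x, tol=1e-12):
    # equality up to tol; ">=" always holds by Fenchel-Young
    return phi(x) + phi_star(xi) <= xi * x + tol

print(in_subdiff(0.3, 0.0))    # True : 0.3 lies in [-1, 1] = d phi(0)
print(in_subdiff(1.0, 2.0))    # True : d phi(2) = {1}
print(in_subdiff(0.5, 2.0))    # False
print(in_subdiff(1.5, 0.0))    # False: phi*(1.5) = +infinity
```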
2.6 ε-Subdifferential
In Subsection 2.3.3 on differentiability properties of a convex function, from Proposition 2.82 and the examples preceding it, we noticed that ∂φ(x) may turn out to be empty, even though x ∈ dom φ. To overcome this aspect of subdifferentials, the concept of the ε-subdifferential came into existence; it not only overcomes the drawback of subdifferentials but is also important from the optimization point of view. The idea can be found in the work of Brønsted and Rockafellar [19] but the theory of ε-subdifferential calculus was given by Hiriart-Urruty [58].
Definition 2.109 Consider a proper convex function φ : Rn → R̄. For ε > 0, the ε-subdifferential of φ at x̄ ∈ dom φ is given by

∂εφ(x̄) = {ξ ∈ Rn : φ(x) − φ(x̄) ≥ ⟨ξ, x − x̄⟩ − ε, ∀ x ∈ Rn}.

For the zero function, 0 : Rn → R, defined as 0(x) = 0 for every x ∈ Rn, ∂ε0(x̄) = {0} for every ε > 0. Otherwise, if there exists ξ ∈ ∂ε0(x̄) with ξ ≠ 0, by the above definition of the ε-subdifferential,

0 ≥ ⟨ξ, x − x̄⟩ − ε = Σ_{i=1}^n ξi(xi − x̄i) − ε, ∀ x ∈ Rn.

Because ξ ≠ 0, there exists some j ∈ {1, 2, . . . , n} such that ξj ≠ 0. In particular, taking xj = x̄j + 2ε/ξj and xi = x̄i, i ≠ j, the above inequality yields ε ≤ 0, which is a contradiction.

As shown in Section 2.4, for a convex set F ⊂ Rn the subdifferential of the indicator function coincides with the normal cone, that is, ∂δF = NF. Along similar lines, we define the ε-normal set.

Definition 2.110 Consider a convex set F ⊂ Rn. Then for ε > 0, the ε-subdifferential of the indicator function at x̄ ∈ F is

∂εδF(x̄) = {ξ ∈ Rn : δF(x) − δF(x̄) ≥ ⟨ξ, x − x̄⟩ − ε, ∀ x ∈ Rn}
         = {ξ ∈ Rn : ε ≥ ⟨ξ, x − x̄⟩, ∀ x ∈ F},

which is also called the ε-normal set and denoted by Nε,F(x̄). Note that Nε,F is not a cone, unlike NF, which is always a cone.

Recall the proper convex function φ : R → R̄ given by

φ(x) = −√x for 0 ≤ x ≤ 1, and +∞ otherwise,

considered in Subsection 2.3.3. As already mentioned, for x = 0 the subdifferential ∂φ(x) is empty. But for any ε > 0, the ε-subdifferential at x = 0 contains the interval (−∞, −1/(2ε)] and hence is nonempty.

In the proposition below we present some properties of the ε-subdifferential of convex functions.

Proposition 2.111 Consider a proper lsc convex function φ : Rn → R̄ and let ε > 0 be given. Then for every x̄ ∈ dom φ, the ε-subdifferential ∂εφ(x̄) is a nonempty closed convex set and

∂φ(x̄) = ∩_{ε>0} ∂εφ(x̄).

For ε1 ≥ ε2, ∂ε2φ(x̄) ⊂ ∂ε1φ(x̄).
Proof. Observe that for x ¯ ∈ dom φ and ε > 0, φ(¯ x) − ε < φ(¯ x), which implies (¯ x, φ(¯ x) − ε) 6∈ epi φ. Because φ is a lsc convex, by Theorem 1.9 and Proposition 2.48, epi φ is closed convex set in Rn × R. Therefore, applying the Strict Separation Theorem, Theorem 2.26 (iii), there exists (ξ, γ) ∈ Rn × R with (ξ, γ) 6= (0, 0) such that hξ, x ¯i + γ(φ(¯ x) − ε) < hξ, xi + γα, ∀ (x, α) ∈ epi φ. As (x, φ(x)) ∈ epi φ for every x ∈ dom φ, the above condition leads to hξ, x ¯i + γ(φ(¯ x) − ε) < hξ, xi + γφ(x), ∀ x ∈ dom φ.
(2.35)
In particular, taking x = x̄ in the preceding inequality yields γ > 0. Now dividing (2.35) throughout by γ implies that

⟨−ξ/γ, x − x̄⟩ − ε < φ(x) − φ(x̄), ∀ x ∈ dom φ.

The above condition is also satisfied by x ∉ dom φ, which implies

⟨−ξ/γ, x − x̄⟩ − ε < φ(x) − φ(x̄), ∀ x ∈ Rn.

By Definition 2.109 of the ε-subdifferential, −ξ/γ ∈ ∂εφ(x̄). Thus, ∂εφ(x̄) is nonempty for every x̄ ∈ dom φ.

Suppose that {ξk} ⊂ ∂εφ(x̄) is such that ξk → ξ. By the definition of the ε-subdifferential,

φ(x) − φ(x̄) ≥ ⟨ξk, x − x̄⟩ − ε, ∀ x ∈ Rn.

Taking the limit as k → +∞, the above inequality leads to

φ(x) − φ(x̄) ≥ ⟨ξ, x − x̄⟩ − ε, ∀ x ∈ Rn,

which implies that ξ ∈ ∂εφ(x̄), thereby yielding the closedness of ∂εφ(x̄). Consider ξ1, ξ2 ∈ ∂εφ(x̄), which implies that for i = 1, 2,

φ(x) − φ(x̄) ≥ ⟨ξi, x − x̄⟩ − ε, ∀ x ∈ Rn.

Therefore, for any λ ∈ [0, 1],

φ(x) − φ(x̄) ≥ ⟨(1 − λ)ξ1 + λξ2, x − x̄⟩ − ε, ∀ x ∈ Rn,

which implies (1 − λ)ξ1 + λξ2 ∈ ∂εφ(x̄). Because ξ1, ξ2 were arbitrary, ∂εφ(x̄) is convex. Now we will prove that

∂φ(x̄) = ∩_{ε>0} ∂εφ(x̄).
Suppose that ξ ∈ ∂φ(¯ x), which by Definition 2.77 of the subdifferential implies that for every x ∈ Rn , φ(x) − φ(¯ x)
≥ hξ, x − x ¯i
≥ hξ, x − x ¯i − ε, ∀ ε > 0.
Thus, by the definition of ε-subdifferential, ξ ∈ ∂ε φ(¯ x) for every ε > 0. Because ξ ∈ ∂φ(¯ x) was arbitrary, \ ∂φ(¯ x) ⊂ ∂ε φ(¯ x). ε>0
Conversely, consider ξ ∈ ∂ε φ(¯ x) for every ε > 0, which implies that for every x ∈ Rn , φ(x) − φ(¯ x) ≥ hξ, x − x ¯i − ε, ∀ ε > 0. As the preceding inequality holds for every ε > 0, taking the limit as ε → 0 leads to φ(x) − φ(¯ x) ≥ hξ, x − x ¯i, ∀ x ∈ Rn , thereby yielding ξ ∈ ∂φ(¯ x). Because ξ was arbitrary, the reverse inclusion is satisfied, that is, \ ∂φ(¯ x) ⊃ ∂ε φ(¯ x), ε>0
hence establishing the result. The relation ∂ε2φ(x̄) ⊂ ∂ε1φ(x̄) for ε1 ≥ ε2 can be easily worked out using the definition of the ε-subdifferential. The proof of the nonemptiness of the ε-subdifferential is from Lucchetti [79]. In the example, it is easy to observe that at x = 0, ∩_{ε>0} ∂εφ(x) is empty, as is ∂φ(x).

Before moving any further, let us consider the absolute value function, φ(x) = |x|. The subdifferential of φ is given by

∂φ(x) = {1} for x > 0,   [−1, 1] for x = 0,   {−1} for x < 0.

Now for ε > 0, the ε-subdifferential of φ is

∂εφ(x) = [1 − ε/x, 1] for x > ε/2,   [−1, 1] for −ε/2 ≤ x ≤ ε/2,   [−1, −1 − ε/x] for x < −ε/2.
The graphs of ∂φ and ∂ε φ for ε = 1 are shown in Figures 2.8 and 2.9. The graph of the subdifferential is a simple step function. Similar to the characterization of the subdifferential in terms of the conjugate function, the following result provides a relation between the ε-subdifferential and the conjugate function.
FIGURE 2.8: Graph of ∂(|·|).

FIGURE 2.9: Graph of ∂1(|·|).
Theorem 2.112 Consider a proper convex function φ : Rn → R̄. Then for any ε > 0 and x ∈ dom φ,

ξ ∈ ∂εφ(x)  ⟺  φ(x) + φ*(ξ) − ⟨ξ, x⟩ ≤ ε.
Proof. Consider any ξ ∈ ∂ε φ(¯ x). By Definition 2.109 of ε-subdifferential, φ(x) − φ(¯ x) ≥ hξ, x − x ¯i − ε, ∀ x ∈ Rn , which implies that hξ, xi − φ(x) + φ(¯ x) − hξ, x ¯i ≤ ε, ∀ x ∈ Rn . By Definition 2.101 of the conjugate function, φ∗ (ξ) + φ(¯ x) − hξ, x ¯i ≤ ε, as desired. Conversely, suppose that the inequality holds, which by the definition of conjugate function implies that hξ, xi − φ(x) + φ(¯ x) − hξ, x ¯i ≤ ε, ∀ x ∈ Rn , which yields that ξ ∈ ∂ε φ(¯ x), thus establishing the equivalence.
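The characterization can be compared with the closed-form ε-subdifferential of the absolute value function written down above. In the Python sketch below, φ(x) = |x| and φ* is the indicator of [−1, 1], so the conjugate test reduces to |ξ| ≤ 1 together with |x| − ξx ≤ ε; the sample points and the value of ε are illustrative choices.

```python
import numpy as np

# Sketch of Theorem 2.112 for phi(x) = |x|, compared with the closed-form
# description of d_eps phi given above.  Since phi* is the indicator of
# [-1, 1], the characterization reads: |xi| <= 1 and |x| - xi*x <= eps.

eps = 1.0

def in_eps_subdiff(xi, x):
    if abs(xi) > 1:                    # phi*(xi) = +infinity
        return False
    return abs(x) - xi * x <= eps

def closed_form(xi, x):
    if x > eps / 2:   return 1 - eps / x <= xi <= 1
    if x < -eps / 2:  return -1 <= xi <= -1 + eps / (-x)
    return -1 <= xi <= 1               # case |x| <= eps/2

for x in (-3.0, -0.4, 0.0, 0.4, 3.0):
    for xi in np.linspace(-1.5, 1.5, 13):
        assert in_eps_subdiff(xi, x) == closed_form(xi, x)
print("conjugate test and closed form agree")
```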
As mentioned earlier, the notion of ε-subdifferential appears in the wellknown work of Brønsted and Rockafellar [19] in which they estimated how well ∂ε φ “approximates” ∂φ. We present the modified version of that famous Brønsted–Rockafellar Theorem from Thibault [108] below. The proof involves the famous Ekeland’s Variational Principle [41, 42, 43], which we state without proof before moving on with the result. Theorem 2.113 (Ekeland’s Variational Principle) Consider a closed proper ¯ and for ε > 0, let x function φ : Rn → R ¯ ∈ Rn be such that φ(¯ x) ≤ infn φ(x) + ε. x∈R
Then for any λ > 0, there exists xλ ∈ Rn such that

‖xλ − x̄‖ ≤ ε/λ   and   φ(xλ) ≤ φ(x̄),

and xλ is the unique minimizer of the unconstrained problem

inf φ(x) + (ε/λ)‖x − xλ‖   subject to   x ∈ Rn.
Observe that the second condition in Ekeland’s Variational Principle, φ(xλ ) ≤ φ(¯ x), implies that φ(xλ ) − φ(¯ x) ≤ 0 ≤ ε.
(2.36)
From the condition on x ¯, φ(¯ x) ≤ infn φ(x) + ε ≤ φ(xλ ) + ε, x∈R
which implies that φ(¯ x) − φ(xλ ) ≤ ε. The above condition along with (2.36) leads to |φ(xλ ) − φ(¯ x)| ≤ ε. Now to establish the modified version of Brønsted–Rockafellar Theorem, we can apply |φ(xλ ) − φ(¯ x)| ≤ ε
instead of
φ(xλ ) ≤ φ(¯ x)
in the Ekeland’s Variational Principle. Theorem 2.114 (A modified version of the Brønsted–Rockafellar Theorem) ¯ and x Consider a proper lsc convex function φ : Rn → R ¯ ∈ dom φ. Then for any ε > 0 and for any ξ ∈ ∂ε φ(¯ x), there exist xε ∈ Rn and ξε ∈ ∂φ(xε ) such that √ √ ¯i − φ(¯ x)| ≤ 2ε. kxε − x ¯k ≤ ε, kξε − ξk ≤ ε and |φ(xε ) − hξε , xε − x Proof. By Definition 2.109 of the ε-subdifferential, ξ ∈ ∂ε φ(¯ x) implies φ(x) − φ(¯ x) ≥ hξ, x − x ¯i − ε, ∀ x ∈ Rn , that is, φ(¯ x) − hξ, x ¯i ≥ φ(x) − hξ, xi + ε, ∀ x ∈ Rn . By applying Ekeland’s Variational Principle, Theorem √ 2.113, to φ − hξ, .i with √ ¯k ≤ ε, λ = ε, there exists xε ∈ Rn such that kxε − x |φ(xε ) − hξ, xε i − φ(¯ x) + hξ, x ¯i| ≤ ε
(2.37)
and φ(xε ) − hξ, xε i ≤ φ(x) − hξ, xi +
√
εkx − xε k, ∀ x ∈ Rn .
(2.38)
By the definition of subdifferential, the above condition (2.38) implies that √ (2.39) ξ ∈ ∂(φ + εk. − xε k)(xε ). As dom k. − xε k = Rn , by Theorem 2.69, k. − xε k is continuous on Rn . Therefore, by Theorem 2.91 along with the fact that ∂(k. − xε k)(xε ) = B, (2.39) becomes √ ξ ∈ ∂φ(xε ) + εB.
© 2012 by Taylor & Francis Group, LLC
2.6 ε-Subdifferential
129 √ Thus, there exists ξε ∈ ∂φ(xε ) such that kξε − ξk ≤ ε. From condition (2.37) along with the Cauchy–Schwarz inequality, Proposition 1.1, |φ(xε ) − hξ, xε − x ¯i − φ(¯ x)|
≤ ε + |hξε − ξ, xε − x ¯i| ≤ ε + kξε − ξkkxε − x ¯k = 2ε,
thereby completing the proof.
As in the study of optimality conditions, we need the subdifferential calculus rules; similarly, ε-subdifferentials also play a pivotal role in this respect. Below we present the ε-subdifferential Sum Rule, Max-Function and the Scalar Product Rules that we will need in our study of optimality conditions for the convex programming problem (CP ). The proofs of the Sum and the MaxFunction Rules are from Hiriart-Urruty and Lemar´echal [62]. ¯ Theorem 2.115 (Sum Rule) Consider two proper convex functions φi : Rn → R, i = 1, 2 such that ri dom φ1 ∩ ri dom φ2 6= ∅. Then for ε > 0, [ (∂ε1 φ1 (¯ x) + ∂ε2 φ2 (¯ x)) ∂ε (φ1 + φ2 )(¯ x) = ε1 ≥ 0, ε2 ≥ 0, ε1 + ε2 = ε
for every x ¯ ∈ dom φ1 ∩ dom φ2 . Proof. Suppose that ε1 ≥ 0 and ε2 ≥ 0 such that ε1 + ε2 = ε. Consider ξi ∈ ∂εi φi (¯ x), i = 1, 2, which by Definition 2.109 of the ε-subdifferential implies that for every x ∈ Rn , φi (x) − φi (¯ x) ≥ hξi , x − x ¯i − εi , i = 1, 2. The above condition along with the assumption ε1 + ε2 = ε leads to (φ1 + φ2 )(x) − (φ1 + φ2 )(¯ x) ≥
hξ1 + ξ2 , x − x ¯i − (ε1 + ε2 )
= hξ1 + ξ2 , x − x ¯i − ε, ∀ x ∈ Rn ,
thereby yielding ξ1 + ξ2 ∈ ∂ε (φ1 + φ2 )(¯ x). Because εi ≥ 0 and ξi ∈ ∂εi φi (¯ x) for i = 1, 2, were arbitrary, [ ∂ε (φ1 + φ2 )(¯ x) ⊃ (∂ε1 φ1 (¯ x) + ∂ε2 φ2 (¯ x)) ε1 ≥ 0, ε2 ≥ 0, ε1 + ε2 = ε
Conversely, suppose that ξ ∈ ∂ε (φ1 + φ2 )(¯ x), which by the definition of ε-subdifferential implies that (φ1 + φ2 )(x) − (φ1 + φ2 )(¯ x) ≥ hξ, x − x ¯i − ε, ∀ x ∈ Rn . By Definition 2.101 of the conjugate function, (φ1 + φ2 )∗ (ξ) + (φ1 + φ2 )(¯ x) − hξ, x ¯i ≤ ε.
© 2012 by Taylor & Francis Group, LLC
(2.40)
130
Tools for Convex Optimization
By the Sum Rule of the conjugate function, Theorem 2.107 (ii), as the assumption ri dom φ1 ∩ ri dom φ2 6= ∅ holds, (φ1 + φ2 )∗ (ξ) = (φ∗1 φ∗2 )(ξ), and the infimum is attained, which implies there exist ξi ∈ Rn , i = 1, 2, satisfying ξ1 + ξ2 = ξ such that (φ1 + φ2 )∗ (ξ) = φ∗1 (ξ1 ) + φ∗2 (ξ2 ). Therefore, the inequality (2.40) becomes (φ∗1 (ξ1 ) + φ1 (¯ x) − hξ1 , x ¯i) + (φ∗2 (ξ2 ) + φ2 (¯ x) − hξ2 , x ¯i) ≤ ε. Denote εi = φ∗i (ξi ) + φi (¯ x) − hξi , x ¯i, i = 1, 2, which by the Fenchel–Young inequality, Proposition 2.103 (iv), implies that εi ≥ 0, i = 1, 2. Observe that ε1 + ε2 ≤ ε. Again, by the definition of conjugate function, φi (x) − φi (¯ x) ≥
≥
where ε¯i = εi +
hξi , x − x ¯ i − εi
hξi , x − x ¯i − ε¯i ,
ε − ε1 − ε2 ≥ εi , i = 1, 2. Therefore, for i = 1, 2, 2 ξi ∈ ∂ε¯i φi (¯ x),
with ε¯1 + ε¯2 = ε. Thus, ξ = ξ1 + ξ2 ∈ ∂ε¯1 φ1 (¯ x) + ∂ε¯2 φ2 (¯ x). Because ξ ∈ ∂ε (φ1 + φ2 )(¯ x) was arbitrary, [ ∂ε (φ1 + φ2 )(¯ x) ⊂ (∂ε1 φ1 (¯ x) + ∂ε2 φ2 (¯ x)), ε1 ≥ 0, ε2 ≥ 0, ε1 + ε2 = ε
thereby completing the proof.
Before proving the ε-subdifferential Max-Function Rule, we state a result from Hiriart-Urruty and Lemar´echal [62] without proof and present the Scalar Product Rule. ¯ Proposition 2.116 Consider proper convex functions φi : Rn → R, i = 1, . . . , m. Let φ(x) = max{φ (x), φ (x), . . . , φ (x)} and p = min{m, n + 1}. 2 m Sm 1 ∗ For every ξ ∈ dom φP = co i=1 dom φ∗i , there exist ξi ∈ dom φ∗i and λi ≥ 0, p i = 1, 2, . . . , p, with i=1 λi = 1 such that φ∗ (ξ) =
p X i=1
© 2012 by Taylor & Francis Group, LLC
λi φ∗i (ξi )
and
ξ=
p X i=1
λi ξi .
2.6 ε-Subdifferential
131
More precisely, (ξi , λi ) solve the problem inf
p X i=1
subject to ξ =
λi φ∗i (ξi ) p X
λi ξi ,
i=1
ξi ∈ dom φ∗i ,
p X
λi = 1,
i=1 λi ≥
(P )
0, i = 1, 2, . . . , p.
For the ε-subdifferential Max-Function Rule, we will need the Scalar Product Rule that we present below. Theorem 2.117 (Scalar Product Rule) For a proper convex function ¯ and any ε ≥ 0, φ : Rn → R ∂ε (λg)(¯ x) = λ∂ε/λ g(¯ x), ∀ λ > 0. Proof. Suppose that xi ∈ ∂ε (λφ)(¯ x), which by Definition 2.109 implies that (λφ)(x) − (λφ)(¯ x) ≥ hξ, x − x ¯i − ε, ∀ x ∈ Rn . As λ > 0, dividing throughout by λ leads to ε ξ ¯i − , ∀ x ∈ Rn , φ(x) − φ(¯ x) ≥ h , x − x λ λ ξ ε ∈ ∂ε˜φ(¯ x), where ε˜ = , that is, ξ ∈ λ∂ε˜φ(¯ x). Because λ λ ξ ∈ ∂ε (λφ)(¯ x) was arbitrary,
which implies
∂ε (λg)(¯ x) ⊂ λ∂ε˜g(¯ x). Conversely, suppose that ξ ∈ λ∂ε˜φ(¯ x) for λ > 0, which implies there exists ˜ By the definition of ε-subdifferential, ξ˜ ∈ ∂ε˜φ(¯ x) such that ξ = λξ. ˜ x−x φ(x) − φ(¯ x) ≥ hξ, ¯i − ε˜, ∀ x ∈ Rn , which implies (λφ)(x) − (λφ)(¯ x) ≥ hξ, x − x ¯i − ε, ∀ x ∈ Rn , where ε = λ˜ ε. Therefore, ξ ∈ ∂ε (λφ)(¯ x). Because ξ ∈ λ∂ε˜φ(¯ x) was arbitrary, ∂ε (λg)(¯ x) ⊃ λ∂ε˜g(¯ x), thereby yielding the desired result.
Now we proceed with establishing the ε-subdifferential Max-Function Rule with the above results as the tool.
© 2012 by Taylor & Francis Group, LLC
132
Tools for Convex Optimization
Theorem 2.118 (Max-Function Rule) Consider proper convex functions ¯ i = 1, 2, . . . , m. Let φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)} and φi : Rn → R, p = min{m, n + 1}. Then ξ ∈ ∂ε φ(¯ x) if P and only if there exist ξi ∈ dom φ∗i , p εi ≥ 0, and λi ≥ 0, i = 1, 2, . . . , p, with i=1 λi = 1 such that ξi ∈ ∂εi /λi φi (¯ x)
ξ=
p X
λi ξi
for every
and
i=1
p X i=1
i
satisfying
εi + φ(¯ x) −
p X i=1
λi > 0,
(2.41)
λi φi (¯ x) ≤ ε.
(2.42)
Proof. By Proposition 2.116, φ∗ (ξ) =
p X
λi φ∗i (ξi ),
i=1
where p = min{m, n + 1} and (ξi , λi ) ∈ dom φ∗i × R+ , i = 1, 2, . . . , p, solves the problem (P ), that is, satisfies p X
ξ=
λi ξi
p X
and
i=1
λi = 1.
i=1
By the relation between the ε-subdifferential and the conjugate function, Theorem 2.112, as ξ ∈ ∂ε φ(x), φ∗ (ξ) + φ(x) − hξ, xi ≤ ε, which by the conditions on (ξi , λi ), i = 1, 2, . . . , p, along with the definition of φ leads to p p X X λi φ∗i (ξi ) + φ(x) − λi hξi , xi ≤ ε. (2.43) i=1
i=1
The above condition can be rewritten as p X i=1
εi + φ(x) −
p X i=1
λi φi (x) ≤ ε,
where εi = λi (φ∗i (ξi ) + φi (x) − hξi , xi), i = 1, 2, . . . , p, which by Theorem 2.112 yields that ξi ∈ ∂εi /λi φi (x) provided λi > 0, thereby leading to the conditions (2.41) and (2.42) as desired. Conversely, suppose that the conditions (2.41) and (2.42) hold. By Theorem 2.112, (2.41) implies that for λi > 0, λi (φ∗i (ξi ) + φi (x) − hξi , xi) ≤ εi , which along with (2.42) lead to p X i=1
© 2012 by Taylor & Francis Group, LLC
λi φ∗i (ξi ) + φ(x) −
p X i=1
λi hξi , xi ≤ ε,
2.6 ε-Subdifferential
133
that is, the inequality (2.43). Invoking Proposition 2.116 yields φ∗ (ξ) =
p X
λi φ∗i (ξi ),
i=1
which along with (2.43) and Theorem 2.112 implies that ξ ∈ ∂ε φ(x), thereby completing the proof. Remark 2.119 In the above result, applying the Scalar Product Rule, Theorem 2.117, to the condition (2.41) implies that ξ˜i = λi ξi ∈ ∂εi (λi φi )(x) provided λi > 0. Therefore, ξ ∈ ∂ε φ(x) is such that there exist ξ˜i ∈ ∂εi (λi φi )(x), i = 1, 2, . . . , p, satisfying ξ=
p X
ξ˜i
and
i=1
p X i=1
εi + φ(x) −
p X i=1
λi φi (x) ≤ ε.
As p = min{m, n + 1}, we consider two cases. If p = m, for some j ∈ {1, 2, . . . , p}, define ε˜j = εj + (ε − and the conditions become
p X
εi )
i=1
and
ε˜i = εi , i 6= j
ξ˜i ∈ ∂εi (λi φi )(¯ x) for every i satisfying λi > 0, m m m X X X ˜ λi φi (¯ x) = ε. εi + φ(¯ x) − ξi and ξ= i=1
i=1
(2.44) (2.45)
i=1
If p < m,Pdefine λi = 0 and Pp εi > 0 arbitrary for i = p + 1, p + 2, . . . , m, m such that i=p+1 εi = ε − j=1 εj . As already discussed, ∂εi (λi φi )(x) = {0}, i = p + 1, p + 2, . . . , m and hence yield the conditions (2.44) and (2.45). Thus, if ξ ∈P ∂ε φ(x), then there exist ξi ∈ dom φ∗i , εi ≥ 0 and λi ≥ 0, i = 1, 2, . . . , m, m with i=1 λi = 1 such that the coditions (2.44) and (2.45) hold. ¯ and In particular, for a proper convex function φ : Rn → R + φ (x) = max{0, φ(x)}, ∂ε (φ+ )(¯ x) ⊂ {∂η (λφ)(¯ x) : 0 ≤ λ ≤ 1, η ≥ 0, ε = η + φ+ (¯ x) − λφ(¯ x)}. In the results stated above, the ε-subdifferential calculus rules were expressed in terms of the ε-subdifferential itself. Below we state a result by Hiriart-Urruty and Phelps [64] relating the Sum Rule of the subdifferentials and the ε-subdifferentials. ¯ Theorem 2.120 Consider two proper lsc convex functions φi : Rn → R, i = 1, 2. Then for any x ¯ ∈ dom φ1 ∩ dom φ2 , \ ∂(φ1 + φ2 )(¯ x) = cl (∂ε φ1 (¯ x) + ∂ε φ2 (¯ x)). ε>0
© 2012 by Taylor & Francis Group, LLC
134
Tools for Convex Optimization
Proof. Suppose that ξi ∈ ∂ε φi (¯ x), i = 1, 2, which implies φi (x) − φi (¯ x) ≥ hξi , x − x ¯i − ε, ∀ x ∈ Rn , i = 1, 2. Therefore, (φ1 + φ2 )(x) − (φ1 + φ2 )(¯ x) ≥ hξ1 + ξ2 , x − x ¯i − 2ε, ∀ x ∈ Rn , that is, ξ1 + ξ2 ∈ ∂2ε (φ1 + φ2 )(¯ x). Because ξi ∈ ∂ε φi (¯ x), i = 1, 2, are arbitrary, which along with the closedness of ε-subdifferential by Proposition 2.111 yields ∂2ε (φ1 + φ2 )(¯ x) ⊃ cl (∂ε φ1 (¯ x) + ∂ε φ2 (¯ x)). Further, applying Proposition 2.111 leads to \ ∂(φ1 + φ2 )(¯ x) ⊃ cl (∂ε φ1 (¯ x) + ∂ε φ2 (¯ x)). ε>0
To establish the result, we shall prove the reverse containment in the above condition. Suppose that ξ¯ ∈ ∂(φ1 + φ2 )(¯ x). By Theorem 2.108, ¯ = hξ, ¯x (φ1 + φ2 )(¯ x) + (φ1 + φ2 )∗ (ξ) ¯i, which along with the Fenchel–Young inequality, Proposition 2.103 (iv), implies that ¯ ≤ hξ, ¯x (φ1 + φ2 )(¯ x) + (φ1 + φ2 )∗ (ξ) ¯i. Applying the Sum Rule of conjugate functions, Theorem 2.107 (ii), to proper lsc convex functions φ1 and φ2 leads to (φ1 + φ2 )∗ = cl (φ∗1 φ∗2 ). Define φ(ξ) = (φ∗1 φ∗2 )(ξ) − hξ, x ¯i, which implies cl φ = cl (φ∗1 φ∗2 ) − h., x ¯i. By the preceding conditions, ¯ ≤ α. It is easy to observe that denoting α = −(φ1 + φ2 )(¯ x) yields φ(ξ) \ {ξ ∈ Rn : cl φ(ξ) ≤ α} = cl {ξ ∈ Rn : φ(ξ) ≤ α + ε/2}. ε>0
Therefore, for every ε > 0, ξ¯ ∈ cl {ξ ∈ Rn : φ(ξ) ≤ α + ε/2}. If φ(ξ) ≤ α + ε/2, then φ(ξ) − α = =
inf
¯i − hξ2 , x ¯i + φ1 (¯ x) + φ2 (¯ x)} {φ∗1 (ξ1 ) + φ∗2 (ξ2 ) − hξ1 , x
inf
¯i + φ1 (¯ x)) + (φ∗2 (ξ2 ) − hξ2 , x ¯i + φ2 (¯ x))}. {(φ∗1 (ξ1 ) − hξ1 , x
ξ=ξ1 +ξ2 ξ=ξ1 +ξ2
© 2012 by Taylor & Francis Group, LLC
2.6 ε-Subdifferential
135
Therefore, there exist ξ1 , ξ2 such that ξ = ξ1 + ξ2 and (φ∗1 (ξ1 ) − hξ1 , x ¯i + φ1 (¯ x)) + (φ∗2 (ξ2 ) − hξ2 , x ¯i + φ2 (¯ x))} < ε. By the Fenchel–Young inequality, φ∗i (ξi ) − hξi , x ¯i + φi (¯ x) ≥ 0, i = 1, 2, which along with Definition 2.101 of the conjugate and the preceding conditions imply that hξi , x − x ¯i − φ(x) + φi (¯ x) ≤ ε, ∀ x ∈ Rn , i = 1, 2, that is, ξi ∈ ∂εi φi (¯ x) for i = 1, 2. Thus, cl {ξ ∈ Rn : φ(ξ) ≤ α + ε/2} ⊂ cl (∂ε φ1 (¯ x) + ∂ε φ2 (¯ x)), which implies ξ¯ ∈ cl (∂ε φ1 (¯ x) + ∂ε φ2 (¯ x)) for every ε > 0. As ξ¯ ∈ ∂(φ1 + φ2 )(¯ x) was arbitrary, ∂(φ1 + φ2 )(¯ x) ⊂
\
cl (∂ε φ1 (¯ x) + ∂ε φ2 (¯ x)),
ε>0
thus establishing the result.
Now if one goes back to the optimality condition 0 ∈ ∂φ(¯ x) in Theorem 2.89, it gives an equivalent characterization to the point of minimizer x¯ of the unconstrained problem (CPu ). So one will like to know then what the condition 0 ∈ ∂ε φ(¯ x) implies. As it turns out, it leads to the concept of approximate optimality conditions, which we will deal with in one of the later chapters. For now we simply state the result on approximate optimality for the unconstrained convex programming problem (CPu ). ¯ and let ε > 0 Theorem 2.121 Consider a proper convex function φ : Rn → R be given. Then 0 ∈ ∂ε φ(¯ x) if and only if φ(¯ x) ≤ infn φ(x) + ε. x∈R
The point x ¯ is called an ε-solution of (CPu ). In the above theorem we mentioned only one of the notions of approximate solutions, namely the ε-solution. But there are other approximate solution concepts, as we shall see later in the book, some of which are motivated by the Ekeland’s Variational Principle, Theorem 2.113.
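Theorem 2.121 is easy to test numerically: a point is an ε-solution exactly when 0 is an ε-subgradient there. The Python sketch below does this for φ(x) = (x − 1)², whose infimum is 0; the test points, the grid, and the value of ε are illustrative choices.

```python
import numpy as np

# Sketch of Theorem 2.121: x_bar is an eps-solution of the unconstrained
# problem exactly when 0 lies in d_eps phi(x_bar).  Test function:
# phi(x) = (x - 1)**2, whose infimum is 0 (illustrative choice).

phi = lambda x: (x - 1.0)**2
xs = np.linspace(-50.0, 50.0, 20001)
eps = 0.01

def zero_in_eps_subdiff(x_bar):
    # 0 in d_eps phi(x_bar)  <=>  phi(x) - phi(x_bar) >= -eps for all x
    return np.all(phi(xs) - phi(x_bar) >= -eps)

for x_bar in (1.05, 1.2):
    print(x_bar, zero_in_eps_subdiff(x_bar), phi(x_bar) <= 0.0 + eps)
# 1.05 -> True  True   (phi(1.05) = 0.0025 <= eps)
# 1.2  -> False False  (phi(1.2)  = 0.04   >  eps)
```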
2.7 Epigraphical Properties of Conjugate Functions
With the study of conjugate function and ε-subdifferential, we are now in a position to present the relation of the epigraph of conjugate functions with the ε-subdifferentials of a convex function from Jeyakumar, Lee, and Dinh [68]. This relation plays an important part in the study of sequential optimality conditions as we shall see in the chapter devoted to its study. ¯ and let Theorem 2.122 Consider a proper lsc convex function φ : Rn → R x ¯ ∈ dom φ. Then [ epi φ∗ = {(ξ, hξ, x ¯ − φ(¯ x) + ε) : ξ ∈ ∂ε φ(¯ x)}. ε≥0
Proof. Denote F=
[
ε≥0
{(ξ, hξ, x ¯ − φ(¯ x) + ε) : ξ ∈ ∂ε φ(¯ x)}.
Suppose that (ξ, α) ∈ epi φ∗ , which implies φ∗ (ξ) ≤ α. By Definition 2.101 of the conjugate function, hξ, xi − φ(x) ≤ α, ∀ x ∈ Rn . Denoting ε = α − hξ, x ¯i + φ(¯ x), the above inequality becomes φ(x) − φ(¯ x)
≥ hξ, xi − φ(¯ x) − α
= hξ, x − x ¯i − ε, ∀ x ∈ Rn ,
which by Definition 2.109 of the ε-subdifferential implies that ξ ∈ ∂ε φ(¯ x). Therefore, (ξ, α) ∈ F. Because (ξ, α) ∈ epi φ∗ was arbitrary, epi φ∗ ⊂ F. Conversely, suppose that (ξ, α) ∈ F, which implies there exists ε ≥ 0 and x ¯ ∈ dom ∂φ with ξ ∈ ∂ε φ(¯ x)
and
α = hξ, x ¯i − φ(¯ x) + ε.
As ξ ∈ ∂ε φ(¯ x), by the definition of ε-subdifferential, φ(x) − φ(¯ x) ≥ hξ, x − x ¯i − ε, ∀ x ∈ Rn , which by the definition of conjugate function leads to φ∗ (ξ) ≤ hξ, x ¯i − φ(¯ x) + ε = α. Thus, (ξ, α) ∈ epi φ∗ . Because (ξ, α) ∈ F was arbitrary, epi φ∗ ⊃ F, thereby establishing the result. Next we discuss the epigraphical conditions for the operations of the conjugate functions.
Theorem 2.123 (i) (Sum Rule) Consider proper lsc convex functions ¯ i = 1, 2, . . . , m. Then φi : Rn → R, epi(φ1 + φ2 + . . . + φm )∗ = cl (epi φ∗1 + epi φ∗2 + . . . + epi φ∗m ). (ii) (Supremum Rule) Consider a family of proper lsc convex functions ¯ i ∈ I, where I is an arbitrary index set. Then φi : Rn → R, [ epi (sup φi )∗ = cl co epi φ∗i . i∈I
i∈I
(iii) Consider proper lsc convex functions φi : Rn → R, i = 1, 2, . . . , m. Define a vector-valued convex function Φ : Rn → Rm , defined as Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)). Then [ epi (λΦ)∗ is a convex cone. λ∈Rm +
(iv) Consider a proper lsc convex function φ : Rn → R. Then for every λ > 0, epi (λφ)∗ = λ epi φ∗ . Proof. (i) As φi , i = 1, 2, . . . , m are proper lsc convex functions, the condition of Theorem 2.107 (ii) reduces to (φ1 + φ2 + . . . + φm )∗ = cl (φ∗1 φ∗2 . . . φ∗m ), which implies epi (φ1 + φ2 + . . . + φm )∗ = epi cl (φ∗1 φ∗2 . . . φ∗m ). By Definition 1.11 of the closure of a function, the above condition becomes epi (φ1 + φ2 + . . . + φm )∗ = cl epi (φ∗1 φ∗2 . . . φ∗m ), which by Proposition 2.55 leads to the desired condition. (ii) Theorem 2.107 (iv) along with Definition 2.57 of the closed convex hull of a function implies that [ epi (sup φi )∗ = epi cl co (inf φ∗i ) = cl co epi φ∗i , i∈I
i∈I
i∈I
thereby establishing the result. S (iii) Suppose that (ξ, α) ∈ λ∈Rm (λΦ)∗ , which implies that there exists + ′ ∗ λ′ ∈ Rm + such that (ξ, α) ∈ epi (λ Φ) . This along with Definition 2.101 of the conjugate function leads to hξ, xi − (λ′ Φ)(x) ≤ (λ′ Φ)∗ (ξ) ≤ α, ∀ x ∈ Rn .
© 2012 by Taylor & Francis Group, LLC
138
Tools for Convex Optimization
Multiplying throughout by any γ > 0, hγξ, xi − ((γλ′ )Φ)(x) ≤ γα, ∀ x ∈ Rn , where γλ′ ∈ Rm + . Again, by the definition of conjugate function, the above condition leads to ((γλ′ )Φ)∗ (γξ) ≤ γα, which implies that γ(ξ, α) ∈ epi ((γλ′ )Φ)∗ ⊂
[
λ∈Rm +
epi (λΦ)∗ , ∀ γ > 0.
S Hence, λ∈Rm epi (λΦ)∗ is a cone. + S Now consider (ξi , αi ) ∈ λ∈Rm epi (λΦ)∗ , i = 1, 2, which implies there + ∗ exist λi ∈ Rm + such that (ξi , αi ) ∈ epi (λi Φ) for i = 1, 2. Therefore, by the definition of conjugate function, for every x ∈ Rn , hξi , xi − (λi Φ)(x) ≤ αi , i = 1, 2. For any γ ∈ [0, 1], the above condition leads to h(1 − γ)ξ1 + γξ2 , xi − (λ′ Φ)(x) ≤ (1 − γ)α1 + γα2 , ∀ x ∈ Rn , where λ′ = (1 − γ)λ1 + γλ2 ∈ Rm + . Therefore, (λ′ Φ)∗ ((1 − γ)ξ1 + γξ2 ) ≤ (1 − γ)α1 + γα2 , which implies that (1 − γ)(ξ1 , α1 ) + γ(ξ2 , α2 ) ∈ epi (λ′ Φ)∗ ⊂
[
λ∈Rm +
Because (ξi , αi ), i = 1, 2, were arbitrary, thus set.
epi (λΦ)∗ , ∀ γ ∈ [0, 1].
S
λ∈Rm +
epi (λΦ)∗ is a convex
(iv) Suppose that (ξ, α) ∈ epi (λφ)∗ , which implies that (λφ)∗ (ξ) ≤ α. As λ > 0, Proposition 2.103 (iii) leads to α ξ ∗ ≤ , φ λ λ ξ α which implies , ∈ epi φ∗ , that is, (ξ, α) ∈ λ epi φ∗ . Because λ λ (ξ, α) ∈ epi (λφ)∗ was arbitrary, epi (λφ)∗ ⊂ λepi φ∗ . The reverse inclusion can be obtained by following the steps backwards, thereby establishing the result.
¯ From the above theorem, for two proper lsc convex functions φi : Rn → R, i = 1, 2, epi(φ1 + φ2 )∗ = cl (epi φ∗1 + epi φ∗2 ). In general, epi φ∗1 + epi φ∗2 need not be closed. But under certain additional conditions, it can be shown that epi φ∗1 + epi φ∗2 is closed. We present below the result from Burachik and Jeyakumar [20] and Dinh, Goberna, L´opez, and Son [32] to establish the same. ¯ Proposition 2.124 Consider two proper lsc convex functions φi : Rn → R, i = 1, 2, such that dom φ1 ∩ dom φ2 6= ∅. If cone(dom φ1 − dom φ2 ) is a closed subspace or at least one of the functions is continuous at some point in dom φ1 ∩ dom φ2 , then epi φ∗1 + epi φ∗2 is closed. Proof. As cone(dom φ1 − dom φ2 ) is a closed subspace, by Theorem 1.1 of Attouch and Br´ezis [2] or Theorem 3.6 of Str¨omberg [107], the exact infimal convolution holds, that is, (φ1 + φ2 )∗ = φ∗1 φ∗2 . The above condition along with Theorem 2.123 (i) leads to cl (epi φ∗1 + epi φ∗2 ) = epi (φ1 + φ2 )∗ = epi (φ∗1 φ∗2 ) = epi φ∗1 + epi φ∗2 , thereby yielding the result that epi φ∗1 + epi φ∗2 is closed. Suppose that φ1 is continuous at x ˆ ∈ dom φ1 ∩ dom φ2 , which yields 0 ∈ core (dom φ1 − dom φ2 ). This implies that cone (dom φ1 − dom φ2 ) is a closed subspace and thus leads to the desired result. Note that the result gives only sufficient condition for the closedness of epi φ∗1 + epi φ∗2 . The converse need not be true. For a better understanding, we consider the following example from Burachik and Jeyakumar [20]. Let φ1 = δ[0,∞) and φ2 = δ(−∞,0] . Therefore, epi φ∗1 = epi σ[0,∞) = R− × R+
and epi φ∗2 = epi σ(−∞,0] = R+ × R+ ,
which leads to epi φ∗1 + epi φ∗2 = R × R+ , a closed convex cone. Observe that cone(dom φ1 − dom φ2 ) = [0, ∞), which is not a subspace, and also neither φ1 nor φ2 are continuous at dom φ1 ∩ dom φ2 = {0}. Thus, the condition, epi φ∗1 + epi φ∗2 is closed, is a relaxed condition in comparison to the other assumptions. Using this closedness assumption, Burachik and Jeyakumar [21] obtained an equivalence between the exact inf-convolution and ε-subdifferential Sum Rule, which we present next.
© 2012 by Taylor & Francis Group, LLC
140
Tools for Convex Optimization
¯ Theorem 2.125 Consider two proper lsc convex functions φi : Rn → R, i = 1, 2, such that dom φ1 ∩ dom φ2 6= ∅. Then the following are equivalent: (i) (φ1 + φ2 )∗ = φ∗1 φ∗2 with exact infimal convolution, (ii) For every ε ≥ 0 and every x ¯ ∈ dom φ1 ∩ dom φ2 , [ ∂ε (φ1 + φ2 )(¯ x) = (∂ε1 φ1 (¯ x) + ∂ε2 φ2 (¯ x)). ε1 ≥ 0, ε2 ≥ 0, ε1 + ε2 = ε
(iii) epi φ∗1 + epi φ∗2 is closed, Proof. (i) ⇒ (ii): The proof follows along the lines of Theorem 2.115.
(ii) ⇒ (iii): Suppose that (ξ, γ) ∈ cl (epi φ∗1 + epi φ∗2 ). By Theorem 2.123 (i), (ξ, γ) ∈ epi (φ1 + φ2 )∗ . Let x ¯ ∈ dom φ1 ∩ dom φ2 . By Theorem 2.122, there exists ε ≥ 0 such that ξ ∈ ∂ε (φ1 + φ2 )(¯ x)
and
γ = hξ, x ¯i − (φ1 + φ2 )(¯ x) + ε.
By (ii), there exist εi ≥ 0 and ξi ∈ ∂εi φi (¯ x), i = 1, 2, such that ξ = ξ1 + ξ2
and
ε = ε 1 + ε2 .
Define γi = hξi , x ¯i−φi (¯ x)+εi , i = 1, 2. Then from Theorem 2.122, for i = 1, 2, (ξi , γi ) ∈ epi φ∗i , which implies (ξ, γ) = (ξ1 , γ1 ) + (ξ2 , γ2 ) ∈ epi φ∗1 + epi φ∗2 , thereby leading to (iii). (iii) ⇒ (i): Suppose that there exists ξ ∈ Rn such that ξ ∈ dom (φ1 + φ2 )∗ (ξ). Otherwise (i) holds trivially. By (iii), epi (φ1 + φ2 )∗ = cl (epi φ∗1 + epi φ∗2 ) = epi φ∗1 + epi φ∗2 , which implies (ξ, (φ1 + φ2 )∗ (ξ)) ∈ epi φ∗1 + epi φ∗2 . Thus for i = 1, 2, there exist (ξi , γi ) ∈ epi φ∗i such that ξ = ξ1 + ξ2
and
(φ1 + φ2 )∗ = γ1 + γ2 ,
which implies there exists ξ¯ ∈ Rn such that ¯ + φ∗ (ξ) ¯ ≤ (φ1 + φ2 )∗ (ξ). φ∗1 (ξ − ξ) 2 Therefore, ¯ + φ∗ (ξ) ¯ ≤ (φ1 + φ2 )∗ (ξ). (φ∗1 φ∗2 )(ξ) ≤ φ∗1 (ξ − ξ) 2
By Theorem 2.107 and (iii), (φ1 + φ2 )∗ (ξ) = cl (φ∗1 φ∗2 )(ξ) ≤ (φ∗1 φ∗2 )(ξ), which along with the preceding condition leads to the exact infimal convolution, thereby establishing (i). Though it is obvious that under the closedness of epi φ∗1 + epi φ∗2 , one can obtain the subdifferential Sum Rule by choosing ε = 0 in (ii) of the above theorem, we present a detailed version of the result from Burachik and Jeyakumar [20]. Below is an alternative approach to the Sum Rule, Theorem 2.91. ¯ Theorem 2.126 Consider two proper lsc convex functions φi : Rn → R, ∗ ∗ i = 1, 2, such that dom φ1 ∩ dom φ2 6= ∅. If epi φ1 + epi φ2 is closed, then ∂(φ1 + φ2 )(¯ x) = ∂φ1 (¯ x) + ∂φ2 (¯ x), ∀ x ¯ ∈ dom φ1 ∩ dom φ2 . Proof. Let x ¯ ∈ dom φ1 ∩ dom φ2 . It is easy to observe that ∂(φ1 + φ2 )(¯ x) ⊃ ∂φ1 (¯ x) + ∂φ2 (¯ x). To prove the result, we shall prove the converse inclusion. Suppose that ξ ∈ ∂(φ1 + φ2 )(¯ x). By Theorem 2.108, (φ1 + φ2 )∗ (ξ) + (φ1 + φ2 )(¯ x) = hξ, x ¯i. Therefore, the above condition along with the given hypothesis (ξ, hξ, x ¯i − (φ1 + φ2 )(¯ x)) ∈ epi (φ1 + φ2 )∗ = epi φ∗1 + epi φ∗2 , which implies that there exist (ξi , γi ) ∈ epi φ∗i , i = 1, 2, such that ξ = ξ1 + ξ2
and
hξ, x ¯i − (φ1 + φ2 )(¯ x) = γ1 + γ2 .
Also, as (ξi , γi ) ∈ epi φ∗i , i = 1, 2, which along with the above conditions lead to φ∗1 (ξ1 ) + φ∗2 (ξ2 ) ≤ hξ, x ¯i − (φ1 + φ2 )(¯ x) = hξ1 , x ¯i + hξ2 , x ¯i − φ1 (¯ x) − φ2 (¯ x). By the Fenchel–Young inequality, Proposition 2.103 (iv), φ∗1 (ξ1 ) + φ∗2 (ξ2 ) ≥ hξ1 , x ¯i + hξ2 , x ¯i − φ1 (¯ x) − φ2 (¯ x), which together with the preceding inequality leads to φ∗1 (ξ1 ) + φ∗2 (ξ2 ) = hξ1 , x ¯i + hξ2 , x ¯i − φ1 (¯ x) − φ2 (¯ x). Again by the Fenchel–Young inequality and the above equation, φ∗1 (ξ1 ) + φ1 (¯ x) − hξ1 , x ¯i = hξ2 , x ¯i − φ2 (¯ x) − φ∗2 (ξ2 ) ≤ 0,
which by Theorem 2.108 yields ξ1 ∈ ∂φ1 (¯ x). Along similar lines it can be obtained that ξ2 ∈ ∂φ2 (¯ x). Thus, ξ = ξ1 + ξ2 ∈ ∂φ1 (¯ x) + ∂φ2 (¯ x), which implies that ∂(φ1 + φ2 )(¯ x) ⊂ ∂φ1 (¯ x) + ∂φ2 (¯ x), thereby leading to the desired result.
We end this chapter with an application of Theorem 2.126 to provide an alternative assumption to establish equality in Proposition 2.39 (i). Corollary 2.127 Consider convex sets F1 , F2 ⊂ Rn such that F1 ∩ F2 6= ∅. If epi σF1 + epi σF2 is closed, then NF1 ∩F2 (¯ x) = NF1 (¯ x) + NF2 (¯ x), ∀ x ¯ ∈ F 1 ∩ F2 . Proof. We know that for any convex set F ⊂ Rn , δF∗ = σF . Thus the condition epi σF1 + epi σF2
is closed
epi δF∗ 1 + epi δF∗ 2
is closed.
is equivalent to
As the condition for Theorem 2.126 to hold is satisfied, ∂(δF1 + δF2 )(¯ x) = ∂δF1 (¯ x) + ∂δF2 (¯ x), x ¯ ∈ dom δF1 ∩ dom δF2 . Because δF1 + δF2 = δF1 ∩F2 , the above equality condition along with the fact that for any convex set F ⊂ Rn , ∂δF = NF , yields the desired result, that is, NF1 ∩F2 (¯ x) = NF1 (¯ x) + NF2 (¯ x), ∀ x ¯ ∈ F 1 ∩ F2 .
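Corollary 2.127 can be illustrated with two half-spaces in the plane, where the normal cones at the origin are the two nonnegative coordinate rays and their sum is the nonnegative orthant. The Python sketch below tests membership in N_{F1∩F2}(0) on sampled feasible points; the sets, the sample size, and the test vectors are illustrative choices, and the sampled test is only a check of the defining inequality on finitely many points.

```python
import numpy as np

# Sketch of Corollary 2.127 for two half-spaces in R^2:
#   F1 = {x : x1 <= 0},  F2 = {x : x2 <= 0},  x_bar = 0.
# Then N_F1(0) = R_+ x {0}, N_F2(0) = {0} x R_+, and the corollary predicts
#   N_{F1 cap F2}(0) = N_F1(0) + N_F2(0) = R_+^2.

rng = np.random.default_rng(0)
samples = -np.abs(rng.normal(size=(5000, 2)))   # points of F1 cap F2

def in_normal_cone(v):
    # v in N_{F1 cap F2}(0)  <=>  <v, x - 0> <= 0 for all x in F1 cap F2
    return np.all(samples @ v <= 1e-12)

for v in ([1.0, 2.0], [0.0, 3.0], [-1.0, 1.0]):
    print(v, in_normal_cone(np.array(v)), np.all(np.array(v) >= 0))
# [1, 2]  -> True  True     (lies in R_+^2 = N_F1(0) + N_F2(0))
# [0, 3]  -> True  True
# [-1, 1] -> False False    (fails on sampled points with |x1| > |x2|)
```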
Chapter 3 Basic Optimality Conditions Using the Normal Cone
3.1 Introduction

Recall the convex optimization problem presented in Chapter 1,

min f(x)   subject to   x ∈ C,      (CP)

where f : Rn → R is a convex function and C ⊂ Rn is a convex set. It is natural to think that f′(x, h) and ∂f(x) will play a major role in the process of establishing the optimality conditions, as these objects have been successful in overcoming the difficulty posed by the absence of a derivative. In this chapter we will not bother ourselves with extended-valued functions, although such a framework can easily be adapted to the current one; extended-valued convex functions might still appear within some of the proofs, since one will need the calculus rules for subdifferentials such as the Sum Rule or the Chain Rule. To begin our discussion more formally, we right away state the following basic result.

Theorem 3.1 Consider the convex optimization problem (CP). Then x̄ is a point of minimizer of (CP) if and only if either of the two conditions hold:

(i) f′(x̄, d) ≥ 0, ∀ d ∈ TC(x̄), or

(ii) 0 ∈ ∂f(x̄) + NC(x̄).

Proof. (i) As x̄ ∈ C and C is a convex set,

x̄ + λ(x − x̄) ∈ C, ∀ x ∈ C, ∀ λ ∈ [0, 1].

Also, as x̄ is a point of minimizer of (CP), for every x ∈ C,

f(x̄ + λ(x − x̄)) ≥ f(x̄), ∀ λ ∈ [0, 1].

Therefore,

lim_{λ↓0} [f(x̄ + λ(x − x̄)) − f(x̄)] / λ ≥ 0,
which implies f ′ (¯ x, x − x ¯) ≥ 0, ∀ x ∈ C. By Theorem 2.76, the directional derivative is sublinear in the direction and thus f ′ (¯ x, d) ≥ 0, ∀ d ∈ cl cone(C − x ¯). By Theorem 2.35, TC (¯ x) = cl cone(C − x ¯) and therefore, the above inequality becomes f ′ (¯ x, d) ≥ 0, ∀ d ∈ TC (¯ x). Conversely, suppose condition (i) holds. As f : Rn → R, applying Proposition 2.83 and Theorem 2.79, for any d ∈ Rn there exists ξ ∈ ∂f (¯ x) such that hξ, di = f ′ (¯ x, d). For every x ∈ C, x − x ¯ ∈ TC (¯ x). Therefore, the convexity of f along with the above condition and condition (i) implies that for every x ∈ C, there exists ξ ∈ ∂f (¯ x) such that f (x) − f (¯ x) ≥ hξ, x − x ¯i ≥ 0, ∀ x ∈ C, thereby proving that x ¯ is the point of minimizer of f over C. (ii) As x ¯ is a point of minimizer of f over C, we have that x ¯ also solves the problem min (f + δC )(x).
x∈Rn
Hence, by the optimality conditions for the unconstrained optimization problem, Theorem 2.89, 0 ∈ ∂(f + δC )(¯ x). Because x ¯ ∈ dom δC , by Proposition 2.14, ri dom δC = ri C is nonempty. Also ri dom f = Rn and hence ri dom f ∩ ri dom δC 6= ∅. Now using the Sum Rule, Theorem 2.91, 0 ∈ ∂f (¯ x) + ∂δC (¯ x), which by the fact that ∂δC (x) = NC (x) leads to 0 ∈ ∂f (¯ x) + NC (¯ x).
Conversely, suppose that condition (ii) is satisfied, which means that there exists ξ ∈ ∂f(x̄) such that −ξ ∈ NC(x̄), that is,

⟨ξ, x − x̄⟩ ≥ 0, ∀ x ∈ C.

Therefore, by the convexity of f, f(x) − f(x̄) ≥ ⟨ξ, x − x̄⟩ for every x ∈ Rn, which along with the above inequality yields

f(x) ≥ f(x̄), ∀ x ∈ C,

thereby leading to the desired result.
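Condition (ii) can be verified on a one-dimensional instance. The Python sketch below takes f(x) = (x − 2)² over C = [−1, 1], where f is differentiable and the normal cone of an interval has a simple description; the problem data and the helper normal_cone_interval are illustrative choices.

```python
import numpy as np

# Sketch of condition (ii) of Theorem 3.1 for
#   min (x - 2)**2  subject to  x in C = [-1, 1]   (illustrative example).
# The minimizer over C is x_bar = 1 and N_C(1) = [0, +infinity), so
# 0 in grad f(1) + N_C(1) amounts to -grad f(1) >= 0.

f_grad = lambda x: 2.0 * (x - 2.0)

def normal_cone_interval(x, lo=-1.0, hi=1.0):
    """N_[lo,hi](x) returned as an interval (a, b), possibly infinite."""
    if lo < x < hi:  return (0.0, 0.0)
    if x == hi:      return (0.0, np.inf)
    if x == lo:      return (-np.inf, 0.0)
    raise ValueError("x not in C")

for x in (1.0, 0.0, -1.0):
    lo, hi = normal_cone_interval(x)
    stationary = lo <= -f_grad(x) <= hi          # 0 in grad f(x) + N_C(x)
    print(x, stationary)
# 1.0  -> True   (the minimizer: -grad f(1) = 2 lies in [0, inf))
# 0.0  -> False  (interior point, grad f(0) = -4 != 0)
# -1.0 -> False
```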
By condition (ii) of the above theorem, there exists ξ ∈ ∂f(x̄) such that

⟨−ξ, x⟩ ≤ ⟨−ξ, x̄⟩, ∀ x ∈ C.

As x̄ ∈ C, the above condition yields that the support function of the set C at −ξ is given by σC(−ξ) = −⟨ξ, x̄⟩. Thus, condition (ii) is equivalent to this condition. Again, by condition (ii) of Theorem 3.1, there exists ξ ∈ ∂f(x̄) such that −ξ ∈ NC(x̄), which can be equivalently expressed as

⟨(x̄ − αξ) − x̄, x − x̄⟩ ≤ 0, ∀ x ∈ C, ∀ α ≥ 0.

Therefore, by Proposition 2.52, condition (ii) is equivalent to x̄ = projC(x̄ − αξ) for every α ≥ 0. We state the above discussion as the following result.

Theorem 3.2 Consider the convex optimization problem (CP). Then x̄ is a point of minimizer of (CP) if and only if there exists ξ ∈ ∂f(x̄) such that either

σC(−ξ) = −⟨ξ, x̄⟩   or   x̄ = projC(x̄ − αξ), ∀ α ≥ 0.
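The projection characterization lends itself to the same example. In the Python sketch below, C = [−1, 1], x̄ = 1 and ξ = f′(1) = −2, so the fixed-point test x̄ = projC(x̄ − αξ) should hold for every α ≥ 0 and fail at non-minimizers; the chosen step sizes are illustrative.

```python
import numpy as np

# Sketch of the projection characterization in Theorem 3.2, continuing the
# example min (x - 2)**2 over C = [-1, 1] with x_bar = 1 and xi = f'(1) = -2
# (f is differentiable, so the subdifferential is a singleton).

proj_C = lambda y: np.clip(y, -1.0, 1.0)
x_bar, xi = 1.0, -2.0

for alpha in (0.0, 0.5, 1.0, 10.0):
    print(alpha, proj_C(x_bar - alpha * xi) == x_bar)   # True for every alpha >= 0

# A non-minimizer fails the fixed-point test:
x, xi_x = 0.0, 2.0 * (0.0 - 2.0)
print(proj_C(x - 1.0 * xi_x) == x)                      # False: proj_C(4) = 1 != 0
```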
3.2 Slater Constraint Qualification
Now consider the case where C is represented only through convex inequality constraints. Observe that the equality affine constraints of the form hj (x) = 0, j = 1, 2, . . . , l, n
where hj : R → R, j = 1, 2, . . . , l, are affine functions can also be expressed in the convex inequality form as hj (x) ≤ 0, j = 1, 2, . . . , l,
−hj (x) ≤ 0, j = 1, 2, . . . , l.
Thus, we define C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},
(3.1)
where gi : Rn → R, i = 1, 2, . . . , m, are convex functions. In practice, this is most often the case. In order to write an explicit optimality condition we need to compute NC (¯ x) and express it in terms of the constraint functions gi , i = 1, 2, . . . , m. So how do we do that? In this respect, we present the following result. Proposition 3.3 Consider the set C as in (3.1). Assume that the active index set at x ¯, that is, I(¯ x) = {i ∈ {1, 2, . . . , m} : gi (¯ x) = 0} is nonempty. Let the Slater constraint qualification hold, that is, there exists x ˆ ∈ Rn such that gi (ˆ x) < 0, for i = 1, 2, . . . , m. Then X NC (¯ x) = λi ξi ∈ Rn : ξi ∈ ∂gi (¯ x), λi ≥ 0, i ∈ I(¯ x) . i∈I(¯ x)
In order to prove the above proposition, we need to do a bit of work, which we will do step by step. Denote the set on the right-hand side of the equality by X b x) = S(¯ λi ξi ∈ Rn : ξi ∈ ∂gi (¯ x), λi ≥ 0, i ∈ I(¯ x) . (3.2) i∈I(¯ x)
One might get curious as to what are these λi , i = 1, 2, . . . , m, in the expresb x). These are the Lagrange multipliers, vital stuff in sion of the elements of S(¯ optimization and that we need to discuss more in detail. In order to establish b x), we will prove that S(¯ b x) is a closed Proposition 3.3, that is, NC (¯ x) = S(¯ convex cone for which we need the following lemma whose proof is as given in van Tiel [110].
Proposition 3.4 Consider a nonempty compact set A ⊂ Rn with 0 6∈ A. Let K be the cone generated by A, that is, K = coneA = {λa ∈ Rn : λ ≥ 0, a ∈ A}. Then K is a closed set. Proof. Consider a sequence {xk } ⊂ K such that xk → x ˜. To prove the result, we need to show that x ˜ ∈ K. As xk ∈ K, there exist λk ≥ 0 and ak ∈ A such that xk = λk ak for every k. Because A is compact, {aν } is a bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, {ak } has a
convergent subsequence. Thus, without loss of generality, let ak → a ˜ and as A is closed, a ˜ ∈ A. Because 0 6∈ A, it is simple to observe that there exists α > 0 such that kak ≥ α for every a ∈ A. Hence, |λk | =
1 |λk |kak k ≤ kλk ak k. kak k α
As λk ak → x ˜, kλk ak k is bounded, thereby implying that {λk } is a bounded sequence, that by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a ˜ This convergent subsequence. Without loss of generality, assume that λk → λ. shows that ˜a xk = λk ak → λ˜
k → +∞.
as
˜a = x By the assumption xk → x ˜ and as the limit is unique, λ˜ ˜. Hence x ˜ ∈ K, thereby establishing the result. b x) is a closed convex cone. This fact will We will now show that the set S(¯ play a major role in the proof of Proposition 3.3. Lemma 3.5 Assume that I(¯ x) is nonempty and the Slater constraint qualib x) given by (3.2) is a closed convex cone. fication holds. Then the set S(¯
b x) is a cone. To prove the convexity of S(¯ b x), let Proof. Observe that S(¯ P j j j j b x). Then vj = x), v1 , v2 (6= 0) ∈ S(¯ i∈I(¯ x) λi ξi where λi ≥ 0 and ξi ∈ ∂gi (¯ b x) is a cone, to show that it is convex, by Theoi ∈ I(¯ x) for j = 1, 2. As S(¯ b x). Consider rem 2.20 we just have to show that v1 + v2 ∈ S(¯ v1 + v2
=
X
λ1i ξi1 + λ2i ξi2
i∈I(¯ x)
=
X
(λ1i + λ2i )
i∈I(¯ x)
λ2i λ1i 1 2 ξ + ξ . λ1i + λ2i i λ1i + λ2i i
Because ∂gi (¯ x) is a convex set, λ1i
λ2 λ1i ξi1 + 1 i 2 ξi2 ∈ ∂gi (¯ x). 2 + λi λi + λi
b x). Hence, v1 + v2 ∈ S(¯ b x) is closed. Consider the function Finally, we have to show that S(¯ g(x) = max{g1 (x), g2 (x), . . . , gm (x)}.
Moreover, as I(¯ x) is nonempty, g(¯ x) = 0 with J(¯ x) = I(¯ x), where J(¯ x) = {i ∈ {1, 2, . . . , m} : gi (¯ x) = g(¯ x)}.
Further, from the Max-Function Rule, Theorem 2.96, [ ∂g(¯ x) = co ∂gi (¯ x) .
(3.3)
i∈I(¯ x)
b x) = cone(∂g(¯ We claim that S(¯ x)), that is,
b x) = {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(¯ S(¯ x)}.
(3.4)
b x) is given as above and applying Proposition 3.4 But before showing that S(¯ b to conclude that S(¯ x) is closed, we first need to show that 0 6∈ ∂g(¯ x). As the Slater constraint qualification holds, there exists xˆ such that gi (ˆ x) < 0 for every i = 1, 2, . . . , m. Hence g(ˆ x) < 0. By the convexity of g, hξ, x ˆ−x ¯i ≤ g(ˆ x) − g(¯ x), ∀ ξ ∈ ∂g(¯ x). Because J(¯ x) = I(¯ x) is nonempty, for every ξ ∈ ∂g(¯ x), hξ, x ˆ−x ¯i < 0. As x ˆ 6= x ¯, it is clear that 0 6∈ ∂g(¯ x). Otherwise, if 0 ∈ ∂g(¯ x), the above inequality will be violated. Hence, observe that 0 6∈ ∂gi (¯ x) for every i ∈ J(¯ x) = I(¯ x). b x) is a cone, 0 ∈ S(¯ b x). For λ = 0, Because S(¯ 0 ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(¯ x)}.
b x) with v 6= 0. We will show that Consider v ∈ S(¯
v ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(¯ x)}.
b x), there exist λi ≥ 0 and ξi ∈ ∂gi (¯ As vP∈ S(¯ x), i ∈ I(¯ x) such that x) for all i ∈ I(¯ x), it is clear that v = i∈I(¯x) λi ξi . Because v 6= 0 and 0 6∈ ∂gi (¯ P all the λi , i ∈ I(¯ x) cannot be simultaneously zero and hence i∈I(¯x) λi > 0. P P Let α = i∈I(¯x) λi and thus i∈I(¯x) λi /α = 1. Therefore, X λi [ 1 v= ξi ∈ co ∂gi (¯ x) , α α i∈I(¯ x)
i∈I(¯ x)
which by (3.3) implies that v ∈ α ∂g(¯ x). Hence,
b x) ⊆ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(¯ S(¯ x)}.
Conversely, consider v ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(¯ x)} with v 6= 0.
© 2012 by Taylor & Francis Group, LLC
3.2 Slater Constraint Qualification
149
Therefore, v = λξ for some λ ≥ 0, ξ ∈ ∂g(¯ x). The condition (3.3) yields that there exist µi ≥ 0 and ξi ∈ ∂gi (¯ x) for i ∈ I(¯ x) such that X ξ= µi ξi i∈I(¯ x)
with
P
i∈I(¯ x)
µi = 1. Therefore, v=
X
λµi ξi =
i∈I(¯ x)
X
λ′i ξi ,
i∈I(¯ x)
b x). Because v was arbitrary, where λ′i = λµi ≥ 0 for i ∈ I(¯ x). Hence, v ∈ S(¯ (3.4) holds, which along with the fact that 0 6∈ ∂g(¯ x) and Proposition 3.4 b yields that S(¯ x) is closed. b x) is proved to be closed under Remark 3.6 It may be noted here that S(¯ the Slater constraint qualification, which is equivalent to [ 0 6∈ co ∂gi (¯ x). i∈I(¯ x)
This observation was made by Wolkowicz [112]. In the absence of such condib x) need not be closed. tions, S(¯
Now we turn to establish Proposition 3.3 according to which, if the Slater constraint qualification holds, then b x). NC (¯ x) = S(¯
b x) ⊆ NC (¯ Proof of Proposition 3.3. First we will prove that S(¯ x). Consider b any P v ∈ S(¯ x). Thus, there exist λi ≥ 0 and ξi ∈ ∂gi (¯ x) for i ∈ I(¯ x) such that v = i∈I(¯x) λi ξi . Hence, for any x ∈ C, hv, x − x ¯i =
X
i∈I(¯ x)
λi hξi , x − x ¯i.
By the convexity of gi , i ∈ I(¯ x), hξi , x − x ¯i ≤ gi (x) − gi (¯ x) ≤ 0, ∀ x ∈ C. Thus hv, x − x ¯i ≤ 0 for every x ∈ C, thereby showing that v ∈ NC (¯ x). b x). On Conversely, suppose that v ∈ NC (¯ x). We have to show that v ∈ S(¯ b b the contrary, assume that v 6∈ S(¯ x). As S(¯ x) is a closed convex cone, by the strict separation theorem, Theorem 2.26 (iii), there exists w ∈ Rn with w 6= 0 such that b x). hw, ξi ≤ 0 < hw, vi, ∀ ξ ∈ S(¯ © 2012 by Taylor & Francis Group, LLC
150
Basic Optimality Conditions Using the Normal Cone b x) = cone co S As S(¯ x) , for each i ∈ I(¯ x), hw, ξi i ≤ 0 for every i∈I(¯ x) ∂gi (¯ ξi ∈ ∂gi (¯ x), which along with Theorem 2.79 yields gi′ (¯ x, w) ≤ 0, ∀ i ∈ I(¯ x).
(3.5)
Define K = {u ∈ Rn : gi′ (¯ x, u) < 0, ∀ i ∈ I(¯ x)}. Our first step is to show that K is nonempty. By the Slater constraint qualification, there exists x ˆ such that gi (ˆ x) < 0 for every i = 1, 2, . . . , m, and corresponding to that x ˆ, set u = x ˆ−x ¯. By the convexity of each gi and Theorem 2.79, gi′ (¯ x, x ˆ−x ¯) ≤ gi (ˆ x) − gi (¯ x), ∀ i ∈ I(¯ x), which implies gi′ (¯ x, x ˆ−x ¯) < 0, ∀ i ∈ I(¯ x). Hence, x ˆ−x ¯ ∈ K, thereby showing that K is nonempty. Observe that for any u ∈ K, there exists λ > 0 sufficiently small such that gi (¯ x + λu) < 0 for all i = 1, 2, . . . , m, which implies x ¯ + λu ∈ C. Therefore, u∈
1 (C − x ¯) ⊆ cone(C − x ¯) ⊆ cl cone(C − x ¯). λ
By Theorem 2.35, u ∈ TC (¯ x). Because TC (¯ x) is closed, cl K ⊆ TC (¯ x). Also, as K is nonempty, it is simple to show that cl K = {u ∈ Rn : gi′ (¯ x, u) ≤ 0, ∀ i ∈ I(¯ x)}. By (3.5), w ∈ cl K and hence, w ∈ TC (¯ x). As v ∈ NC (¯ x), hv, wi ≤ 0, thereby contradicting the fact that hv, wi > 0 and thus establishing the result. Recall condition (ii) from Theorem 3.1, that is, 0 ∈ ∂f (¯ x) + NC (¯ x). By combining it with Proposition 3.3, we can conclude that under the Slater constraint qualification, x ¯ is a point of minimizer of the convex programming ¯ ∈ Rm such problem (CP ) with C given by (3.1) if and only if there exists λ + that X ¯ i ∂gi (¯ λ x). 0 ∈ ∂f (¯ x) + i∈I(¯ x)
¯ i = 0 for i 6∈ I(¯ Setting λ x), the above expression can be rewritten as 0 ∈ ∂f (¯ x) +
m X i=1
© 2012 by Taylor & Francis Group, LLC
¯ i ∂gi (¯ λ x)
and
¯ i gi (¯ λ x) = 0, i = 1, 2, . . . , m.
3.2 Slater Constraint Qualification
151
The above two expressions form the celebrated Karush–Kuhn–Tucker (KKT) optimality conditions for the convex programming problem (CP ) with C given ¯ ∈ Rm is called a Lagrange multiplier or a Karush–Kuhn– by (3.1). The vector λ + Tucker (KKT) multiplier. The second condition is known as the complementary slackness condition. Now suppose that the KKT optimality conditions are satisfied. Then there exist ξ0 ∈ ∂f (¯ x) and ξi ∈ ∂gi (¯ x), i = 1, 2, . . . , m, such that 0 = ξ0 +
m X
λi ξi .
(3.6)
i=1
Therefore, by the convexity of f and gi , i = 1, 2, . . . , m, for every x ∈ Rn , f (x) − f (¯ x) ≥
gi (x) − gi (¯ x) ≥
hξ0 , x − x ¯i,
hξi , x − x ¯i, i = 1, 2, . . . , m.
The above inequalities along with (3.6) yields that for every x ∈ Rn , f (x) − f (¯ x) +
m X i=1
λi (gi (x) − gi (¯ x)) ≥ hξ0 , x − x ¯i +
m X i=1
λi hξi , x − x ¯i = 0. (3.7)
The above inequality holds, in particular, for x ∈ C ⊂ Rn . Invoking the complementary slackness condition along with the feasibility of x ∈ C, the condition (3.7) reduces to f (x) ≥ f (¯ x), ∀ x ∈ C. Thus, x ¯ is a point of minimizer of (CP ). This discussion can be summed up in the form of the following theorem. Theorem 3.7 Consider the convex programming problem (CP ) with C given by (3.1). Assume that the Slater constraint qualification holds. Then x ¯ is a point of minimizer of (CP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that 0 ∈ ∂f (¯ x) +
m X
¯ i ∂gi (¯ λ x)
and
¯ i gi (¯ λ x) = 0, i = 1, 2, . . . , m.
i=1
It is obvious that the computation of the normal cone in Proposition 3.3 plays a major role in the derivation of the KKT optimality conditions. What is shown by the computation of the normal cone in Proposition 3.3 is that the Lagrange multipliers are not just auxiliary multipliers that help us convert a constrained problem into a unconstrained one but are related to the geometry of the feasible set.
© 2012 by Taylor & Francis Group, LLC
152
Basic Optimality Conditions Using the Normal Cone
Remark 3.8 In Proposition 3.3, we have seen how to compute the normal cone when the convex inequality constraints need not be smooth. Now if gi , i = 1, 2, . . . , m, are differentiable and the Slater constraint qualification holds, then from Proposition 3.3 X NC (¯ x) = {v ∈ Rn : λi ∇gi (¯ x), λi ≥ 0, ∀ i ∈ I(¯ x)}. (3.8) i∈I(¯ x)
This can be actually computed easily. Note that v ∈ NC (¯ x) if x ¯ is a point of minimizer of the problem min − hv, xi
subject to
gi (x) ≤ 0, i = 1, 2, . . . , m.
As the Slater condition holds, by Theorem 3.7 there exist λi ≥ 0, i = 1, 2, . . . , m, such that −v +
m X i=1
λi ∇gi (¯ x) = 0.
By the complementary slackness condition, λi = 0, i 6∈ I(¯ x); thus the above relation becomes X −v + λi ∇gi (¯ x) = 0. i∈I(¯ x)
Hence, any v ∈ NC (¯ x) belongs to the set on the right-hand side. One can simply check that any element on the right-hand side is also an element in the normal cone. From (3.8), it is simple to see that NC (¯ x) is a finitely generated cone with {∇gi (¯ x) : i ∈ I(¯ x)} being the set of generators. Thus, NC (¯ x) is polyhedral when the gi , i = 1, 2, . . . , m, are differentiable and the Slater constraint qualification holds. Is the normal cone also polyhedral if the Slater constraint qualification holds but the constraint functions gi , i = 1, 2, . . . , m, are not be differentiable? What is seen from Proposition 3.3 is that in the case nondifferentiable constraints, NC (¯ x) can be represented as X NC (¯ x) = { λi ξi ∈ Rn : λi ≥ 0, ξi ∈ ∂gi (¯ x), i ∈ I(¯ x)} i∈I(¯ x)
=
[
{
X
ξi ∈∂gi (¯ x) i∈I(¯ x)
λi ξi ∈ Rn : λi ≥ 0, i ∈ I(¯ x)},
that is, the union of a family of polyhedral cones. We will now show by an example that even though NC (¯ x) is a union of a family of polyhedral cones, it itself need not be polyhedral. Consider the set C ⊆ R3 given as q C = {x ∈ R3 : x21 + x22 ≤ −x3 , x3 ≤ 0}.
[FIGURE 3.1: NC(x̄) is not polyhedral. The figure shows the cone C in R³ and the normal cone NC(x̄) at x̄ = (0, 0, 0).]
It is clear that C is described by the constraints

√(x1² + x2²) + x3 ≤ 0,
x3 ≤ 0.

Each of these is a convex function. It is simple to see that the Slater condition holds: just take the point x̂ = (0, 0, −1). It is also simple to see that the first constraint is not differentiable at x̄ = (0, 0, 0). However, from the geometry,
Figure 3.1, it is simple to observe that

NC(x̄) = { v ∈ R³ : √(v1² + v2²) ≤ v3, v3 ≥ 0 }.
It is easy to observe that this cone, which is also known as the second-order cone, is not polyhedral as it has an infinite number of generators and hence is not finitely generated.
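The claim that NC(x̄) is the (non-polyhedral) second-order cone can be probed numerically. The sketch below is only illustrative and not part of the text: it samples points of C and checks the defining inequality ⟨v, x − x̄⟩ ≤ 0 for several boundary directions v = (cos t, sin t, 1) of the second-order cone; each value of t gives a distinct generator, which is why the cone is not finitely generated.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_C(n):
    # points of C = {x in R^3 : sqrt(x1^2 + x2^2) <= -x3}: push x3 far enough down
    pts = rng.uniform(-1.0, 1.0, size=(n, 3))
    pts[:, 2] = -np.abs(pts[:, 2]) - np.hypot(pts[:, 0], pts[:, 1])
    return pts

X = sample_C(20000)
for t in np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False):
    v = np.array([np.cos(t), np.sin(t), 1.0])   # boundary direction of the second-order cone
    ok = np.all(X @ v <= 1e-10)                 # <v, x - 0> <= 0 for all sampled x in C
    print(f"t = {t:.2f}: v appears to lie in N_C(x_bar): {ok}")
```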
3.3 Abadie Constraint Qualification
From the previous section it is obvious that an important feature in deriving the KKT conditions is that the Slater constraint qualification is satisfied. But what happens if the Slater constraint qualification is not satisfied? Is there any other route to derive the KKT conditions? In this direction, we introduce what is known as the Abadie constraint qualification. Consider the problem (CP) with C given by (3.1), that is,

C = {x ∈ R^n : gi(x) ≤ 0, i = 1, 2, ..., m},

where gi, i = 1, 2, ..., m, are convex functions. Then the Abadie constraint qualification is said to hold at x̄ ∈ C if

TC(x̄) = {v ∈ R^n : gi′(x̄, v) ≤ 0, ∀ i ∈ I(x̄)}.

As C is convex, (TC(x̄))° = NC(x̄) and the expression (ii) in Theorem 3.1 can be written as

0 ∈ ∂f(x̄) + (TC(x̄))°.

If the Abadie constraint qualification holds, we can compute NC(x̄) as

NC(x̄) = (S(x̄))°,

where (S(x̄))° denotes the polar cone of the cone

S(x̄) = {v ∈ R^n : gi′(x̄, v) ≤ 0, ∀ i ∈ I(x̄)}.

It can be easily verified that S(x̄) is a closed convex cone. Also observe that

TC(x̄) ⊂ {v ∈ R^n : gi′(x̄, v) ≤ 0, ∀ i ∈ I(x̄)}

is always satisfied. So one may simply consider the reverse inclusion as the Abadie constraint qualification. We will now compute (S(x̄))°. But before we do that, let us convince ourselves through an example that the Abadie
constraint qualification can hold even if the Slater constraint qualification fails. Consider C = {x ∈ R : |x| ≤ 0, x ≤ 0}. Here, g1(x) = |x|, g2(x) = x and of course C = {0}. Let us set x̄ = 0. This shows that TC(x̄) = {0}. Further, because both constraints are active at x̄,

S(x̄) = {v ∈ R : g1′(x̄, v) ≤ 0, g2′(x̄, v) ≤ 0}
      = {v ∈ R : g1′(x̄, v) ≤ 0, ⟨∇g2(x̄), v⟩ ≤ 0}
      = {v ∈ R : |v| ≤ 0, v ≤ 0} = {0}.

Hence TC(x̄) = S(x̄), showing that the Abadie constraint qualification holds while it is clear that the Slater constraint qualification does not hold. Now we present the following result.

Proposition 3.9 (S(x̄))° = cl Ŝ(x̄).
Proof. From the relation (3.2), X b x) = S(¯ λi ξi : λi ≥ 0, ξi ∈ ∂gi (¯ x), i ∈ I(¯ x) i∈I(¯ x)
is a convex cone from Lemma 3.5. Recall from the proof of Lemma 3.5 that b x) was shown to be closed under the Slater constraint qualification. In the S(¯ b x) need not be closed. First absence of the Slater constraint qualification, S(¯ ◦ b x) ⊆ (S(¯ b x), which implies there we show that cl S(¯ x)) . Consider any v ∈ S(¯ P exist λi ≥ 0 and ξi ∈ ∂gi (¯ x) for i ∈ I(¯ x) such that v = i∈I(¯x) λi ξi . Consider any element w ∈ S(¯ x), that is, gi′ (¯ x, w) ≤ 0 for i ∈ I(¯ x). Hence for every i ∈ I(¯ x), by Theorem 2.79, hξi , wi ≤ 0 for every ξi ∈ ∂gi (¯ x), which implies * + X λi ξi , w ≤ 0, i∈I(¯ x)
◦ b x) ⊆ (S(¯ that is, hv, wi ≤ 0. Because w ∈ S(¯ x) was arbitrarily chosen, S(¯ x)) , ◦ ◦ b x) ⊆ (S(¯ which by closedness of (S(¯ x)) leads to cl S(¯ x)) . To complete the proof, we will establish the reverse inclusion, that is, ◦ ◦ b x). On the contrary, assume that (S(¯ b x), which (S(¯ x)) ⊆ cl S(¯ x)) * cl S(¯ ◦ b b implies there exists w ∈ (S(¯ x)) and w ∈ / cl S(¯ x). As cl S(¯ x) is a closed convex cone, by the strict separation theorem, Theorem 2.26 (iii), there exists v ∈ Rn with v 6= 0 such that
sup hv, ξi < hv, wi.
b x) ξ∈cl S(¯
b x), hv, wi > 0. We claim that v ∈ (cl S(¯ b x))◦ , that is, Because 0 ∈ cl S(¯ b x). If v 6∈ (cl S(¯ b x))◦ , then there exists ξ˜ ∈ cl S(¯ b x) hv, ξi ≤ 0 for every ξ ∈ cl S(¯ ˜ ˜ ˜ b x) such that hv, ξi > 0. For every λ > 0, λhv, ξi = hv, λξi > 0. Because cl S(¯ ˜ b is a cone, λξ ∈ cl S(¯ x) for λ > 0, which means that as λ becomes sufficiently large, the inequality ˜ < hv, wi hv, λξi
b x))◦ . Further, observe that for i ∈ I(¯ will be violated. Thus, v ∈ (cl S(¯ x), b x), where ξi ∈ ∂gi (¯ ξi ∈ S(¯ x). Therefore, hv, ξi i ≤ 0 for every ξi ∈ ∂gi (¯ x), i ∈ I(¯ x), which implies that gi′ (¯ x, v) ≤ 0 for every i ∈ I(¯ x). This shows that ◦ v ∈ S(¯ x) and therefore, hv, wi ≤ 0 because w ∈ (S(¯ x)) . This leads to a contradiction, thereby establishing the result. The result below presents the KKT optimality conditions under the Abadie constraint qualification. Theorem 3.10 Consider the convex programming problem (CP ) with C given by (3.1). Let x ¯ be a point of minimizer of (CP ) and assume that the Abadie constraint qualification holds at x ¯. Then b x). 0 ∈ ∂f (¯ x) + cl S(¯
(3.9)
Conversely, if (3.9) holds for some x ¯ ∈ Rn , then x ¯ is a point of minimizer of (CP ). Moreover, the standard KKT optimality conditions hold at x ¯ if either b x) is closed or the functions gi , i ∈ I(¯ S(¯ x), are smooth functions. Proof. If the Abadie constraint qualification holds at x¯, then using Proposition 3.9, the relation (3.9) holds. Conversely, suppose that (3.9) holds at x ¯. By the convexity of gi , i ∈ I(¯ x), for every ξi ∈ ∂gi (¯ x), hξi , x − x ¯i ≤ gi (x) − gi (¯ x) ≤ 0, ∀ x ∈ C. b x), there exist λi ≥ 0 and ξi ∈ ∂gi (¯ For every v ∈ S(¯ x) for i ∈ I(¯ x) such that P v = i∈I(¯x) λi ξi . Therefore, by the above inequality, hv, x − x ¯i =
X
i∈I(¯ x)
hλi ξi , x − x ¯i ≤ 0, ∀ x ∈ C,
b x) ⊆ NC (¯ which implies v ∈ NC (¯ x). Thus S(¯ x), which along with the fact that b x) ⊆ NC (¯ NC (¯ x) is closed implies that cl S(¯ x). Therefore, (3.9) yields 0 ∈ ∂f (¯ x) + NC (¯ x)
and hence, by Theorem 3.1 (ii), x ¯ is a point of minimizer of the convex programming problem (CP ).
If gi, i ∈ I(x̄), are smooth,

Ŝ(x̄) = { ∑_{i∈I(x̄)} λi ∇gi(x̄) ∈ R^n : λi ≥ 0, i ∈ I(x̄) }.

Thus, Ŝ(x̄) is a finitely generated cone and hence is closed. Therefore, it is clear that when either Ŝ(x̄) is closed or gi, i ∈ I(x̄), are smooth functions, then under the Abadie constraint qualification, the standard KKT conditions are satisfied.
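For the one-dimensional example of this section, with g1(x) = |x| and g2(x) = x, the cone Ŝ(0) can be written out directly: it equals {λ1 ξ1 + λ2 : λi ≥ 0, ξ1 ∈ [−1, 1]} = R, which is already closed. The sketch below is only an illustration and not from the text; the objective f(x) = x is a hypothetical choice used to check condition (3.9) and the standard KKT condition of Theorem 3.10 at x̄ = 0.

```python
import numpy as np

# Example of Section 3.3: C = {x in R : |x| <= 0, x <= 0} = {0}, x_bar = 0.
# Hypothetical objective f(x) = x, so its (sub)gradient at 0 is 1.
grad_f = 1.0
subdiff_g1 = np.linspace(-1.0, 1.0, 201)   # subdifferential of |x| at 0 is [-1, 1]
grad_g2 = 1.0                              # gradient of g2(x) = x

# S_hat(0) = {lam1*xi + lam2*grad_g2 : lam >= 0, xi in [-1, 1]}; taking xi = -1 and
# lam1 large reaches any negative number, so S_hat(0) = R and it is closed.
# Standard KKT: find lam1, lam2 >= 0 and xi in subdiff_g1 with grad_f + lam1*xi + lam2 = 0.
lam1, xi, lam2 = 1.0, -1.0, 0.0
print("KKT residual:", grad_f + lam1 * xi + lam2 * grad_g2)   # 0.0
print("xi lies in subdifferential of |x| at 0:", subdiff_g1.min() <= xi <= subdiff_g1.max())
```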
3.4 Convex Problems with Abstract Constraints
After studying the convex programming problem involving only inequality constraints, in this section we turn our attention to a slightly modified version of (CP), which we denote as (CP1), given as

min f(x)   subject to   gi(x) ≤ 0, i = 1, 2, ..., m,  x ∈ X,    (CP1)

where we have the additional abstract constraint x ∈ X with X a closed convex subset of R^n. The question is how to write down the KKT conditions for the problem (CP1).

Theorem 3.11 Let us consider the problem (CP1). Assume the Slater-type constraint qualification, that is, there exists x̂ ∈ ri X such that gi(x̂) < 0 for i = 1, 2, ..., m. Then the KKT optimality conditions are necessary as well as sufficient at a point of minimizer x̄ of (CP1) and are given as

0 ∈ ∂f(x̄) + ∑_{i=1}^m λi ∂gi(x̄) + NX(x̄)   and   λi gi(x̄) = 0, i = 1, 2, ..., m.
Proof. The problem (CP 1) can be written as min f (x)
subject to
x ∈ C ∩ X,
where C is given by (3.1). Thus if x ¯ is a point of minimizer of (CP 1), then x ¯ solves the unconstrained problem min (f + δC∩X )(x),
x∈Rn
that is, x ¯ solves min (f + δC + δX )(x).
x∈Rn
By the optimality condition for unconstrained problem, Theorem 2.89, 0 ∈ ∂(f + δC + δX )(¯ x). The fact that ri dom f = Rn along with the Slater-type constraint qualification and Propositions 2.15 and 2.67 imply that xˆ ∈ ri dom f ∩ ri C ∩ ri X. Invoking the Sum Rule, Theorem 2.91, along with the facts that ∂δC (¯ x) = NC (¯ x) and ∂δX (¯ x) = NX (¯ x), the above relation leads to 0 ∈ ∂f (¯ x) + NC (¯ x) + NX (¯ x). The Slater-type constraint qualification implies the Slater constraint qualification which along with Proposition 3.3 yields X NC (¯ x) = { λi ∂gi (¯ x) : λi ≥ 0, i ∈ I(¯ x)}. i∈I(¯ x)
By choosing λi = 0, i ∈ / I(¯ x), the desired KKT optimality conditions are obtained. Conversely, by the optimality condition, there exist ξ0 ∈ ∂f (¯ x) and ξi ∈ ∂gi (¯ x), i = 1, 2 . . . , m, such that −ξ0 −
m X i=1
λi ξi ∈ NX (¯ x),
that is, hξ0 , x − x ¯i +
m X i=1
hλi ξi , x − x ¯i ≥ 0, ∀ x ∈ X.
The convexity of f and gi, i = 1, 2, ..., m, along with the above condition leads to

f(x) − f(x̄) + ∑_{i=1}^m λi gi(x) − ∑_{i=1}^m λi gi(x̄) ≥ 0, ∀ x ∈ X.

In particular, using the complementary slackness condition and the fact that gi(x) ≤ 0 for x ∈ C, the above inequality reduces to f(x) ≥ f(x̄) for every x ∈ C ∩ X, thereby establishing that x̄ is a point of minimizer of (CP1).
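A quick numerical check of Theorem 3.11 on a smooth instance with an abstract constraint may be helpful; the data below (f(x) = (x1 − 1)² + (x2 + 1)², g1(x) = x1 + x2 − 3, X = R²₊) are a hypothetical illustration, not taken from the text.

```python
import numpy as np

# Hypothetical instance of (CP1): minimize (x1-1)^2 + (x2+1)^2
# subject to g1(x) = x1 + x2 - 3 <= 0 and x in X = R^2_+.
# Slater-type CQ: x_hat = (1, 1) lies in ri X with g1(x_hat) = -1 < 0.
x_bar = np.array([1.0, 0.0])            # point of minimizer of f over C intersected with X
lam = 0.0                               # g1(x_bar) = -2 < 0, so lam = 0 by slackness

grad_f = np.array([2 * (x_bar[0] - 1), 2 * (x_bar[1] + 1)])   # = (0, 2)
grad_g1 = np.array([1.0, 1.0])
w = -(grad_f + lam * grad_g1)           # must belong to N_X(x_bar)

# Normal cone to R^2_+ at (1, 0): zero in the first (interior) coordinate,
# any nonpositive number in the second (boundary) coordinate.
in_normal_cone = np.isclose(w[0], 0.0) and w[1] <= 1e-12
print("0 in grad_f + lam*grad_g1 + N_X(x_bar):", in_normal_cone)   # True
print("complementary slackness:", lam * (x_bar.sum() - 3.0))        # 0.0
```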
Next consider the problem

min f(x)   subject to   x ∈ C = {x ∈ R^n : Ax = b},    (CP2)

where A is an m × n matrix and b ∈ R^m. It is clear that C is a polyhedron. Further, a point x̄ ∈ C is a point of minimizer of f over C if and only if 0 ∈ ∂f(x̄) + NC(x̄).
If v ∈ NC(x̄), then x̄ solves the following smooth problem: min −⟨v, x⟩ subject to Ax = b. As the constraints are affine, the KKT optimality conditions for this problem automatically hold, that is, there exists λ ∈ R^m such that −v + A^T λ = 0, that is, v = A^T λ. Therefore,

NC(x̄) = { v ∈ R^n : v = A^T λ, λ ∈ R^m }.
Hence, the optimality condition is that there exists λ ∈ Rm such that −AT λ ∈ ∂f (¯ x). Using the convexity of f , the above relation implies that x¯ is a point of minimizer of (CP 2). This discussion can be stated as the following theorem. Theorem 3.12 Consider the problem (CP 2). Then x ¯ is a point of minimizer of (CP 2) if and only if there exists λ ∈ Rm such that −AT λ ∈ ∂f (¯ x).
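Theorem 3.12 is easy to test on a least-norm instance; the following sketch uses the hypothetical data f(x) = ½‖x‖², A = [1 1], b = 1, for which x̄ is the projection of the origin onto the affine set.

```python
import numpy as np

# Hypothetical instance of (CP2): minimize 0.5*||x||^2 subject to Ax = b.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Minimizer: x_bar = A^T (A A^T)^{-1} b, the least-norm solution of Ax = b.
x_bar = A.T @ np.linalg.solve(A @ A.T, b)          # = (0.5, 0.5)

# Theorem 3.12: there is lam in R^m with -A^T lam in the subdifferential of f at x_bar,
# which here is the singleton {x_bar}.
lam = np.linalg.solve(A @ A.T, A @ (-x_bar))       # = -0.5
print("x_bar:", x_bar.ravel())
print("residual of -A^T lam - x_bar:", (-A.T @ lam - x_bar).ravel())   # ~ [0, 0]
print("feasibility A x_bar - b:", (A @ x_bar - b).ravel())             # ~ [0]
```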
3.5 Max-Function Approach
Until now the convex programming problems were tackled without modifying the constraint sets. But every convex programming problem can be expressed as a nonsmooth convex programming problem with a smaller number of constraints. Consider the problem (CP) where C is given by (3.1), that is, by convex inequality constraints. Assume that the objective function f and the constraint functions gi, i = 1, 2, ..., m, are convex and smooth. Then (CP) can be equivalently posed as a problem with only one constraint, given as

min f(x)   subject to   g(x) ≤ 0,    (CPeq)

where g : R^n → R is defined as

g(x) = max{g1(x), g2(x), ..., gm(x)}.

Hence g is intrinsically nonsmooth. We would like to invite the reader to deduce the optimality condition of the problem (CP) using (CPeq). It is clear that one needs to use the Max-Function Rule, Theorem 2.96, for evaluating the subdifferential of the max-function. Thus, at a very fundamental level,
every convex programming problem (smooth or nonsmooth) is a nonsmooth convex programming problem. The Max-Function Rule is also in some sense very fundamental to convex programming problems, as can be seen in the result below, where we derive the KKT optimality conditions for the convex programming problem (CP) with C given by (3.1) using the max-function approach.

Theorem 3.13 Consider the convex programming problem (CP) with C given by (3.1). Assume that the Slater constraint qualification holds. Then x̄ is a point of minimizer of (CP) if and only if there exist λ̄i ≥ 0, i = 1, 2, ..., m, such that

0 ∈ ∂f(x̄) + ∑_{i=1}^m λ̄i ∂gi(x̄)   and   λ̄i gi(x̄) = 0, i = 1, 2, ..., m.
Proof. As x ¯ is a point of minimizer of (CP ), it also solves the unconstrained problem min F (x)
subject to
x ∈ Rn ,
where F(x) = max{f(x) − f(x̄), g1(x), g2(x), ..., gm(x)}. Then by the unconstrained optimality condition, Theorem 2.89, 0 ∈ ∂F(x̄). Applying the Max-Function Rule, Theorem 2.96,

0 ∈ co { ∂f(x̄) ∪ ( ⋃_{i∈I(x̄)} ∂gi(x̄) ) },

where I(x̄) is the active index set at x̄. Therefore, there exist λi ≥ 0, i ∈ {0} ∪ I(x̄), satisfying ∑_{i∈{0}∪I(x̄)} λi = 1 such that

0 ∈ λ0 ∂f(x̄) + ∑_{i∈I(x̄)} λi ∂gi(x̄).    (3.10)
We claim that λ0 ≠ 0. On the contrary, assume that λ0 = 0. Thus, the above inclusion reduces to

0 ∈ ∑_{i∈I(x̄)} λi ∂gi(x̄),

that is, there exist ξi ∈ ∂gi(x̄) such that

0 = ∑_{i∈I(x̄)} λi ξi.    (3.11)
By the convexity of gi , i ∈ I(¯ x), gi (x) = gi (x) − gi (¯ x) ≥ hξi , x − x ¯i, ∀ x ∈ Rn , i ∈ I(¯ x), which along with (3.11) implies that X λi gi (x) ≥ 0, ∀ x ∈ Rn . i∈I(¯ x)
As the Slater constraint qualification holds, there exists xˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m. Thus, X λi gi (ˆ x) < 0, i∈I(¯ x)
which is a contradiction of the preceding inequality. Therefore, λ0 6= 0 and hence dividing (3.10) throughout by λ0 yields X ¯ i ∂gi (¯ 0 ∈ ∂f (¯ x) + λ x), i∈I(¯ x)
where λ̄i = λi/λ0, i ∈ I(x̄). Taking λ̄i = 0, i ∉ I(x̄), the above condition becomes

0 ∈ ∂f(x̄) + ∑_{i=1}^m λ̄i ∂gi(x̄),

that is, the KKT optimality condition. It is easy to observe that λ̄i gi(x̄) = 0, i = 1, 2, ..., m, thus yielding the desired conditions. The sufficiency part can be worked out using the convexity of the functions, as done in the previous KKT optimality theorems.
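The mechanics of this proof can be mimicked numerically on a smooth instance; the data below (f(x) = x1² + x2², g1 = 1 − x1, g2 = 1 − x2) are a hypothetical illustration, not from the text. At x̄ = (1, 1) all three pieces of F are active, and the convex-combination weights recover the KKT multipliers λ̄i = λi/λ0.

```python
import numpy as np

# Hypothetical instance: f(x) = x1^2 + x2^2, g1(x) = 1 - x1, g2(x) = 1 - x2.
# At x_bar = (1, 1) the max-function F(x) = max{f(x) - f(x_bar), g1(x), g2(x)}
# has all three pieces active; 0 must lie in the convex hull of their gradients.
grads = np.array([[2.0, 2.0],    # gradient of f(x) - f(x_bar) at x_bar
                  [-1.0, 0.0],   # gradient of g1
                  [0.0, -1.0]])  # gradient of g2

# Convex weights (lam0, lam1, lam2): weighted gradients sum to 0, weights sum to 1.
M = np.vstack([grads.T, np.ones(3)])          # 3 equations, 3 unknowns
lam = np.linalg.solve(M, np.array([0.0, 0.0, 1.0]))
print("convex weights:", lam)                 # [0.2, 0.4, 0.4], all nonnegative
lam_bar = lam[1:] / lam[0]                    # KKT multipliers of Theorem 3.13
print("KKT multipliers:", lam_bar)            # [2.0, 2.0]
print("KKT residual:", grads[0] + lam_bar @ grads[1:])   # [0.0, 0.0]
```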
3.6 Cone-Constrained Convex Programming
A convex optimization problem can be posed in a more general format. Consider a nonempty closed convex cone S ⊂ R^m. Then consider the problem

min f(x)   subject to   G(x) ∈ −S,    (CCP)

where f : R^n → R is a convex function and G : R^n → R^m is an S-convex function; that is, for any x, y ∈ R^n and λ ∈ [0, 1],

(1 − λ)G(x) + λG(y) − G((1 − λ)x + λy) ∈ S.
In particular, if S = R^m_+, then the above problem reduces to (CP) with C given by (3.1). If S = R^s_+ × {0}^{m−s}, then (CCP) reduces to a convex problem with both inequality and equality constraints. If S is not one of these two cones, then (CCP) is called a cone-constrained problem. We will now derive optimality conditions for a slightly more general problem that has an added abstract constraint. Consider the problem

min f(x)   subject to   G(x) ∈ −S, x ∈ X,    (CCP1)

where X ⊂ R^n is a nonempty closed convex set. There are many ways to approach this problem; here we demonstrate one. Define C = {x ∈ R^n : G(x) ∈ −S}. As S and X are nonempty convex sets, by Proposition 2.14, ri S and ri X are both nonempty. Assume that the Slater-type constraint qualification holds, that is, there exists x̂ ∈ ri X such that G(x̂) ∈ −ri S. The most natural approach is to observe that if x̄ solves (CCP1), then x̄ is also a point of minimizer of the unconstrained problem

min (f + δ_{C∩X})(x)   subject to   x ∈ R^n.

This is equivalent to the problem min (f + δ_C + δ_X)(x). As dom f = R^n, the Slater-type constraint qualification implies that x̂ ∈ ri dom f ∩ ri C ∩ ri X. Invoking the Sum Rule, Theorem 2.91,

0 ∈ ∂f(x̄) + ∂δ_C(x̄) + ∂δ_X(x̄),

and thus,

0 ∈ ∂f(x̄) + NC(x̄) + NX(x̄).

So our main concern now is to explicitly compute NC(x̄). How does one do that? We have already observed that it is not so straightforward to compute the normal cone when the inequality constraints are not smooth. Let us mention that in this case also we do not consider G to be differentiable. Thus we shall now introduce the notion of a subdifferential of a cone convex function. As G is an S-convex function, we call an m × n matrix A a subgradient of G at x ∈ R^n if

G(y) − G(x) − A(y − x) ∈ S, ∀ y ∈ R^n.

Then the subdifferential of the cone convex function G at x is given as

∂G(x) = {A ∈ R^{m×n} : G(y) − G(x) − A(y − x) ∈ S, ∀ y ∈ R^n}.

The important question is whether the set ∂G(x) is nonempty.
It was shown, for example, in Luc, Tan, and Tinh [78] that if G is an S-convex function, then G is continuous on Rn . Further, G is also a locally Lipschitz function on Rn ; that is, for any x0 ∈ Rn , there exists a neighborhood N (x0 ) of x0 such that there exists Lx0 > 0 satisfying kG(y) − G(x)k ≤ Lx0 ky − xk, ∀ x, y ∈ N (x0 ). Observe that Lx0 depends on the chosen x0 and is also called the Lipschitz constant at x0 . Also, note that a locally Lipschitz vector function need not be differentiable everywhere. For a locally Lipschitz function G, the Clarke Jacobian of G at x is given as follows, ∂C G(x) = co A ∈ Rm×n : A = lim JG(xk ) where xk → x, xk ∈ D , k→∞
where D is the set of points on Rn at which G is differentiable and JG(y) denotes the Jacobian of G at y. In fact, there is a famous theorem of Rademacher that says that Rn \D is a set of Lebesgue measure zero. The set ∂C G(x) 6= ∅ for all x ∈ Rn and is convex and compact. For more details on the Clarke Jacobian, see for example Clarke [27] or Demyanov and Rubinov [30]. The property that will be important to us is the Clarke Jacobian as a set-valued map is locally bounded and graph closed. It was shown for example in Luc, Tan, and Tinh [78] that ∂C G(x) ⊆ ∂G(x), thereby proving that if G is an S-convex function, then ∂G(x) 6= ∅ for every x ∈ Rn . Before we proceed to develop the optimality conditions for (CCP 1), let us look at a locally Lipschitz function φ : Rn → R. Recall that a function φ : Rn → R is locally Lipschitz at x0 if there exists a neighborhood N (x0 ) of x0 and Lx0 > 0 such that |φ(y) − φ(x)| ≤ Lx0 ky − xk, ∀ x, y ∈ N (x0 ). Naturally a locally Lipschitz scalar-valued function is not differentiable everywhere and the Rademacher Theorem tells us that the set of points where φ is not differentiable forms a set of measure zero. Therefore, at any x ∈ Rn , the Clarke generalized gradient or Clarke subdifferential is given as ˜ ∂ ◦ φ(x) = co {ξ ∈ Rn : ξ = lim ∇φ(xk ) where xk → x, xk ∈ D}, k→∞
˜ denotes the set of points at which φ is differentiable. One can observe where D that if m = 1, the Clarke Jacobian reduces to the Clarke subdifferential. The Clarke subdifferential is nonempty, convex, and compact. If x¯ is a local minimum of φ over Rn , then 0 ∈ ∂ ◦ φ(¯ x). It is important to note that this condition is necessary but not sufficient. Now we state a calculus rule that will be useful in our computation of the normal cone. The Sum Rule is from Clarke [27].
Consider two locally Lipschitz functions φ1, φ2 : R^n → R. Then

∂°(φ1 + φ2)(x) ⊆ ∂°φ1(x) + ∂°φ2(x).

If one of the functions is continuously differentiable, then equality holds.
The Chain Rule that we state is from Demyanov and Rubinov [30] (see also Dutta [36]). Consider the function φ ∘ Φ, where Φ : R^n → R^m and φ : R^m → R are locally Lipschitz functions. Assume that φ is continuously differentiable. Then

∂°(φ ∘ Φ)(x) = {z^T ∇φ(Φ(x)) ∈ R^n : z ∈ ∂_C Φ(x)}.

Observe that v ∈ NC(x̄) (in the current context of (CCP1)) if and only if x̄ is a point of minimizer of the problem

min −⟨v, x⟩   subject to   G(x) ∈ −S.    (NP)

For simplicity, assume that C = {x ∈ R^n : G(x) ∈ −S} is an n-dimensional convex set. The approach to derive the necessary and sufficient condition for optimality is due to Rockafellar [100] (see also Chapter 6 of Rockafellar and Wets [101]). As the above problem is a convex programming problem, x̄ is a global point of minimizer of (NP). Further, without loss of generality, we can assume it to be unique. Observe that if we define

f(x) = −⟨v, x⟩   and   f̃(x) = −⟨v, x⟩ + ε‖x − x̄‖²,

then ∂f(x̄) = ∂f̃(x̄) = {−v} and x̄ is the unique minimizer of f̃ because it is a strictly convex function. Consider an n-dimensional convex compact set Y ⊂ R^n such that x̄ ∈ int Y and C ∩ Y ≠ ∅ (Figure 3.2). It is simple to see that x̄ is also the unique minimizer of the problem

min f(x)   subject to   G(x) ∈ −S, x ∈ Y.    (NP1)

Also observe that the normal cone NC(x̄) = N_{C∩Y}(x̄). (We would urge the readers to think why.) Our approach depends on the use of penalization, a method popular for designing algorithms for constrained optimization. Consider the problem

min f̂(x, u) = f(x)   subject to   G(x) − u = 0, (x, u) ∈ Y × (−S).    (N̂P)

As x̄ is the unique point of minimizer of (NP1), we deduce that (x̄, ū) = (x̄, G(x̄)) is the unique point of minimizer of (N̂P). For a sequence εk ↓ 0, consider the sequence of penalty approximations

min f̂_k(x, u) = f(x) + (1/(2εk)) ‖G(x) − u‖²   subject to   (x, u) ∈ Y × (−S).    (N̂P_k)
[FIGURE 3.2: The sets Y, C, and C ∩ Y, with the point x̄ in C ∩ Y.]

Consider the following closed set

S_k = {(x, u) ∈ Y × (−S) : f̂(x, u) ≤ f̂(x̄, ū) = f(x̄)}.

Note that ū = G(x̄). Also, S_k is nonempty as (x̄, ū) ∈ S_k for each k. Because S_k is nonempty for each k, solving (N̂P_k) is the same as minimizing f̂_k over S_k. Denote the minimum of f over the compact set Y by µ. For any (x, u) ∈ S_k,

f̂_k(x, u) ≤ f̂_k(x̄, ū) = f(x̄),
1 kG(x) − uk2 ≤ f (¯ x), 2εk
where (x, u) ∈ Sk ⊂ Y × (−S). As f (x) ≥ µ for every x ∈ Y , µ+
1 kG(x) − uk2 ≤ f (¯ x), 2εk
which leads to kG(x) − uk ≤ Thus for any given k,
p 2εk (f (¯ x) − µ).
S k ⊆ {(x, u) ∈ Y × (−S) : kG(x) − uk ≤
© 2012 by Taylor & Francis Group, LLC
p
2εk (f (¯ x) − µ)}.
(3.12)
166
Basic Optimality Conditions Using the Normal Cone
Also, for a fixed k, kuk ≤ kG(x)k +
p
2εk (f (¯ x) − µ).
As G is an S-convex function, it is also locally Lipschitz and hence G(Y ) is a compact set. This shows that the right-hand side of (3.12) is bounded for a fixed k. From this, along with the compactness of Y , we can deduce d that Sk is compact and thus fˆk achieves a minimum over Sk . Hence, (N P k) has a point of minimizer that naturally need not be unique. Denote a point d of minimizer of (N P k ) by (xk , uk ) and thus, obtaining a bounded sequence {(xk , uk )} ⊂ Y × (−S), which satisfies p x) − µ), kG(xk ) − uk k ≤ 2εk (f (¯ and as (xk , uk ) ∈ Sk ,
f (xk ) ≤ fˆk (xk , uk ) ≤ f (¯ x). Because {(xk , uk )} is bounded, by the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. Without loss of generality, assume that xk → x ˜ and uk → u ˜. Therefore, as k → ∞, εk → 0 and thus kG(˜ x) − u ˜k = 0
and
f (˜ x) ≤ f (¯ x).
d Hence, u ˜ = G(˜ x) and thus (˜ x, u ˜) is also a minimizer of (N P ). But as (¯ x, u ¯) is d the unique point of minimizer of (N P ), we have x ˜=x ¯ and u ˜=u ¯. d Because (xk , uk ) is a point of minimizer of (N P k ), it is a simple exercise to ˆ see that xk minimizes fk (x, uk ) over Y and uk minimizes fˆk (xk , u) over −S. Hence, 0 ∈ ∂x◦ fˆk (xk , uk ) + NY (xk ), 0 ∈ ∂ ◦ fˆk (xk , uk ) + N−S (uk ). u
(3.13) (3.14)
Now we analyze these conditions in more detail. Denote yk =
1 (G(xk ) − uk ). εk
From condition (3.14), −∇u fˆk (xk , uk ) ∈ N−S (uk ). One can easily compute ∇u fˆk (xk , uk ) to see that yk = −∇u fˆk (xk , uk ) and hence yk ∈ N−S (uk ). Moreover, applying the Sum Rule and the Chain Rule for a locally Lipschitz to (3.13), then for each k, 0 ∈ −v + ∂C G(xk )T yk + NY (xk ).
© 2012 by Taylor & Francis Group, LLC
(3.15)
3.6 Cone-Constrained Convex Programming
167
Suppose that {yk } is bounded and thus by the Bolzano–Weierstrass Theorem has a convergent subsequence. Without loss of generality, suppose that yk → y¯. Noting that the normal cone is graph closed as a set-valued map and ∂C G is locally bounded, taking the limit as k → ∞ in (3.15) leads to 0 ∈ −v + ∂C G(¯ x)T y¯ + NY (¯ x). But as x ¯ ∈ int Y , by Example 2.38 NY (¯ x) = {0}. As ∂C G(¯ x) ⊂ ∂G(¯ x), v ∈ ∂C G(¯ x)T y¯ ⊂ ∂G(¯ x)T y¯. Thus, v = z T y¯ for some z ∈ ∂G(¯ x). The important question is can {yk } be unbounded? We show that if {yk } is unbounded, then the Slater constraint qualification, that is, there exists x ˆ ∈ Rn such that G(ˆ x) ∈ −ri S is violated. On the contrary, assume that {yk } is unbounded and thus kyk k → ∞ as k → ∞. Hence, noting that ∂C G(xk ) ⊂ ∂G(xk ), from (3.15) we have 0∈
1 yk 1 (−v) + ∂G(xk )T + NY (xk ), kyk k kyk k kyk k
which implies 0∈
1 (−v) + ∂G(xk )T wk + NY (xk ), kyk k
(3.16)
yk . Hence, {wk } is a bounded sequence and thus by the kyk k Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent subsequence. Without loss of generality, assume that wk → w ¯ with kwk ¯ = 1. Hence from (3.16), 0 ∈ ∂G(¯ x)T w. ¯ (3.17) where wk =
As yk ∈ N−S (uk ), we have wk ∈ N−S (uk ). Again using the fact that the normal cone map has a closed graph, w ¯ ∈ N−S (¯ u). Hence, hw, ¯ z − G(¯ x)i ≤ 0, z ∈ −S. Because S is a cone, 0 ∈ −S, thus hw, ¯ −G(¯ x)i ≤ 0, that is, hw, ¯ G(¯ x)i ≥ 0.
(3.18)
Consider p ∈ −S. As S is a convex cone, by Theorem 2.20, G(¯ x) + p ∈ −S. Hence, hw, ¯ G(¯ x) + p − G(¯ x)i ≤ 0, which implies hw, ¯ pi ≤ 0. Because p was arbitrary, hw, ¯ pi ≤ 0, ∀ p ∈ −S.
© 2012 by Taylor & Francis Group, LLC
168
Basic Optimality Conditions Using the Normal Cone
Thus, w ¯ ∈ S + . Hence, h¯ x, G(¯ x)i ≤ 0, which together with (3.18) leads to hw, ¯ G(¯ x)i = 0. For any y ∈ Rn , G(y) − G(¯ x) − A(y − x ¯) ∈ S, ∀ A ∈ ∂G(¯ x), which implies hw, ¯ G(y)i − hw, ¯ G(¯ x)i − hw, ¯ A(y − x ¯)i ≥ 0, ∀ A ∈ ∂G(¯ x). From (3.17), if {yk } is unbounded, there exists z¯ ∈ ∂G(¯ x) such that z¯T w ¯ = 0. Thus, from the above inequality we have hw, ¯ G(y)i − hw, ¯ G(¯ x)i − h¯ z T w, ¯ (y − x ¯)i ≥ 0, ∀ y ∈ Rn , which along with hw, ¯ G(¯ x)i = 0 and z¯T w ¯ = 0 yields hw, ¯ G(y)i ≥ 0, ∀ y ∈ Rn . If the Slater constraint qualification holds, there exists xˆ ∈ Rn such that G(ˆ x) ∈ −ri S. As kwk ¯ = 1 and w ¯ ∈ S + , hw, ¯ G(ˆ x)i < 0, which contradicts the above inequality. Therefore, {yk } cannot be an unbounded sequence. Thus, we leave it to the reader to see that v = z T y¯,
where
y¯ ∈ S + ,
and hence conclude that NC (¯ x) = {v ∈ Rn : there exists y¯ ∈ S + , z ∈ ∂G(¯ x) satisfying h¯ y , G(¯ x)i = 0 such that v = z T y¯}. The reader is now urged to write the necessary and sufficient optimality conditions for the problem (CCP 1), as the structure of the normal cone to C at x ¯ is now known.
© 2012 by Taylor & Francis Group, LLC
Chapter 4 Saddle Points, Optimality, and Duality
4.1
Introduction
In the previous chapter, the KKT optimality conditions was studied using the normal cone as one of the main vehicles of expressing the optimality conditions. One of the central issues in the previous chapter was the computation of the normal cone at the point of the feasible set C where the set C was explicitly described by the inequality constraints. In this chapter our approach to the KKT optimality condition will take us deeper into convex optimization theory and also we can avoid the explicit computation of the normal cone. This approach uses the saddle point condition of the Lagrangian function associated with (CP ). We motivate the issue using two-person-zero-sum games. Consider a two-person-zero-sum game where we denote the players as Player 1 and Player 2 having strategy sets X ⊂ Rn and Λ ⊂ Rm , respectively, which we assume to be compact for simplicity. In each move of the game, the players reveal their choices simultaneously. For every choice x ∈ X by Player 1 and λ ∈ Λ by Player 2, an amount L(x, λ) is paid by Player 1 to Player 2. Now Player 1 behaves in the following way. For any given choice of strategy x ∈ X, he would like to know what the maximum amount he would have to give to Player 2. In effect, he computes the function φ(x) = max L(x, λ). λ∈Λ
Further, it is natural that he would choose an x ∈ X that minimizes φ(x), that is, Player 1 solves the problem min φ(x)
subject to
x ∈ X,
which implies that in effect, he solves a minimax problem min max L(x, λ).
x∈X λ∈Λ
Similarly, Player 2 would naturally want to know what the guaranteed amount he will receive once he makes a move λ ∈ Λ. This means he computes the function ψ(λ) = min L(x, λ). x∈X
169 © 2012 by Taylor & Francis Group, LLC
170
Saddle Points, Optimality, and Duality
Of course he would like to maximize the amount of money he gets and therefore solves the problem max ψ(λ)
subject to
λ ∈ Λ,
that is, he solves max min L(x, λ). λ∈Λ x∈X
Thus, in every game there are two associated optimization problems. The minimization problem for Player 1 and the maximization problem for Player 2. In the optimization literature, the problem associated with Player 1 is called the primal problem while that associated with Player 2 is called the dual problem. Duality is a deep issue in modern optimization theory. In this chapter, we will have quite a detailed discussion on duality in convex optimization. The game is said to have a value if min max L(x, λ) = max min L(x, λ).
x∈X λ∈Λ
λ∈Λ x∈X
The above relation is the minimax equality. ˜ ∈ Λ, For any given λ ˜ ≤ min max L(x, λ). min L(x, λ)
x∈X
x∈X λ∈Λ
˜ ∈ Λ is arbitrary, we obtain the minimax inequality, that is, Because λ max min L(x, λ) ≤ min max L(x, λ), λ∈Λ x∈X
x∈X λ∈Λ
which always holds true. Of course the minimax equality would hold true if a saddle point exists, ¯ ∈ X × Λ exists that satisfies the following inequality, that is, a pair (¯ x, λ) ¯ ≤ L(x, λ), ¯ ∀ x ∈ X, ∀ λ ∈ Λ. L(¯ x, λ) ≤ L(¯ x, λ) The above relation is called the saddle point condition. It is easy to observe ¯ ∈ X × Λ is a saddle point if and only if that (¯ x, λ) ¯ = min L(x, λ). ¯ max L(¯ x, λ) = L(¯ x, λ) x∈X
λ∈Λ
The above condition implies ¯ ≤ max min L(x, λ), min max L(x, λ) ≤ max L(¯ x, λ) = min L(x, λ)
x∈X λ∈Λ
λ∈Λ
x∈X
λ∈Λ x∈X
which along with the minimax inequality yields the minimax equality. Before moving on to study the optimality of the convex programming
© 2012 by Taylor & Francis Group, LLC
4.2 Basic Saddle Point Theorem
171
problem (CP ) via the saddle point approach, we state the Saddle Point Theorem (Proposition 2.6.9, Bertsekas [12]) for which we will need the following notations. ¯ as For each λ ∈ Λ, define the proper function φλ : Rn → R L(x, λ), x ∈ X, φλ (x) = +∞, otherwise, ¯ is given by and for every x ∈ X, the proper function ψx : Rm → R −L(x, λ), λ ∈ Λ, ψx (λ) = +∞, otherwise. Proposition 4.1 (Saddle Point Theorem) Assume that for every λ ∈ Λ, φλ and for every x ∈ X, ψx are lsc and convex. The set of saddle points of L is nonempty and compact under any one of the following conditions: (i) X and Λ are compact. ¯ ∈ Λ and α ∈ R such that the set (ii) Λ is compact and there exists λ ¯ ≤ α} {x ∈ X : L(x, λ) is nonempty and compact. (iii) X is compact and there exists x ¯ ∈ X and α ∈ R such that the set {λ ∈ Λ : L(¯ x, λ) ≥ α} is nonempty and compact. ¯ ∈ Λ, and α ∈ R such that (iv) There exist x ¯ ∈ X, λ ¯ ≤ α} {x ∈ X : L(x, λ)
and
{λ ∈ Λ : L(¯ x, λ) ≥ α}
are nonempty and compact. This proposition will play a pivotal role in the study of enhanced optimality conditions in Chapter 5.
4.2
Basic Saddle Point Theorem
The saddle point condition can itself be taken as an optimality condition for the problem of Player 1, that is, min φ(x)
© 2012 by Taylor & Francis Group, LLC
subject to
x ∈ X.
172
Saddle Points, Optimality, and Duality
Our question is, can we construct a function like L(x, λ) for the convex (CP ) for which f (x) can be represented in a way as φ(x) has been represented through L(x, λ)? Note that if we remove the compactness from Λ, then φ(x) could take up +∞ value for some x. It is quite surprising that for the objective function f (x) of (CP ), such a function can be obtained by considering the classical Lagrangian function from calculus. For the problem (CP ) with inequality constraints, we construct the Lagrangian function L : Rn × Rm + → R as L(x, λ) = f (x) +
m X
λi gi (x)
i=1
with λ = (λ1 , λ2 , . . . , λm ) ∈ Rm + . Observe that it is a simple matter to show that f (x), x is feasible, sup L(x, λ) = +∞, otherwise. λ∈Rm + Here, the Lagrangian function L(x, λ) is playing the role of L(x, λ). So the next pertinent question is, if we can solve (CP ) then does there exist a saddle point for it? Does the existence of a saddle point for L(x, λ) guarantee that a solution to the original problem (CP ) is obtained? The following theorem answers the above questions. Recall the convex programming problem min f (x)
subject to
x∈C
(CP )
with C given by C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},
(4.1)
where gi : Rn → R, i = 1, 2, . . . , m, are now assumed to be convex and non-affine functions. Theorem 4.2 Consider the convex programming problem (CP ) with C given by (4.1). Assume that the Slater constraint qualification holds, that is, there exists x ˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m. Then x ¯ is a point of ¯ = (λ ¯1, λ ¯2, . . . , λ ¯ m ) ∈ Rm minimizer of (CP ) if and only if there exists λ + ¯ satisfying the complementary slackness condition, that is, λi gi (¯ x) = 0 for i = 1, 2, . . . , m and the saddle point condition ¯ ≤ L(x, λ), ¯ ∀ x ∈ Rn , λ ∈ Rm . L(¯ x, λ) ≤ L(¯ x, λ) + Proof. As x ¯ is a point of minimizer of (CP ) the following system f (x) − f (¯ x) < 0 gi (x) < 0, i = 1, 2, . . . , m
© 2012 by Taylor & Francis Group, LLC
,
4.2 Basic Saddle Point Theorem
173
has no solution. Define a set Λ = {(y0 , y) ∈ R × Rm : there exists x ∈ Rn such that
f (x) − f (¯ x) < y0 , gi (x) < yi , i = 1, 2, . . . , m}.
We leave it the reader to prove that the set Λ is convex and open. It is clear that (0, 0) ∈ / Λ. Hence, by the Proper Separation Theorem, Theorem 2.26 (iv), there exists (λ0 , λ) ∈ R × Rm with (λ0 , λ) 6= (0, 0) such that λ0 y0 +
m X i=1
λi yi ≥ 0, ∀ (y0 , y) ∈ Λ.
(4.2)
Corresponding to x ¯ ∈ Rn , for yi > 0, i = 0, 1, . . . , m, (y0 , y) ∈ Λ. Also, for any γ > 0, (y0 + γ, y) ∈ Λ. Therefore, from condition (4.2), m X 1 λi yi }, λ0 ≥ − {λ0 y0 + γ i=1
which as the limit γ → ∞ leads to λ0 ≥ 0. It is now left to the reader to prove in a similar fashion that λ ∈ Rm +. For any x ∈ Rn , consider a fixed αi > 0, i = 0, 1, . . . , m. Then for any γi > 0, i = 0, 1, . . . , m, (f (x) − f (¯ x) + γ0 α0 , g1 (x) + γ1 α1 , . . . , gm (x) + γm αm ) ∈ Λ. Therefore, from (4.2), λ0 (f (x) − f (¯ x) + γ0 α0 ) +
m X i=1
λi (gi (x) + γi αi ) ≥ 0.
As γi → 0, the above inequality yields λ0 (f (x) − f (¯ x)) +
m X i=1
λi gi (x) ≥ 0, ∀ x ∈ Rn .
(4.3)
We claim that λ0 6= 0. On the contrary, suppose that λ0 = 0, thereby reducing (4.3) to m X i=1
λi gi (x) ≥ 0, ∀ x ∈ Rn .
This violates the Slater constraint qualification. Thus, λ0 > 0. Therefore, ¯ i = λi , the condition (4.3) yields denoting λ λ0 f (x) − f (¯ x) +
© 2012 by Taylor & Francis Group, LLC
m X i=1
¯ i gi (x) ≥ 0, ∀ x ∈ Rn . λ
174
Saddle Points, Optimality, and Duality Pm ¯ In particular, x = x ¯ in the above inequality leads to i=1 λ x) = 0. Because i gi (¯ the sum of negative numbers is zero only if each term is zero, the complemen¯ i gi (¯ tary slackness condition, that is, λ x) = 0, i = 1, 2, . . . , m, holds. Therefore, the preceding inequality leads to f (x) +
m X i=1
¯ i gi (x) ≥ f (¯ λ x) +
m X i=1
¯ i gi (¯ λ x), ∀ x ∈ Rn ,
which implies ¯ ≥ L(¯ ¯ ∀ x ∈ Rn . L(x, λ) x, λ), Pm Further, for any λ ∈ Rm x) ≤ 0. Thus, +, i=1 λi gi (¯ f (¯ x) +
m X i=1
λi gi (¯ x) ≤ f (¯ x) = f (¯ x) +
m X
¯ i gi (¯ λ x),
i=1
that is, ¯ ∀ λ ∈ Rm , L(¯ x, λ) ≤ L(¯ x, λ), + thereby establishing the saddle point condition. ¯ ∈ Rm such that the saddle point Conversely, suppose that there exists λ + condition and the complementary slackness condition hold at x¯. We first prove that x ¯ is feasible, that is, −g(¯ x) = (−g1 (¯ x), −g2 (¯ x), . . . , −gm (¯ x)) ∈ Rm + . On m m the contrary, assume that −g(¯ x) 6∈ R+ . As R+ is a closed convex cone, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists λ ∈ Rm + with λ 6= 0 such that hλ, g(¯ x)i =
m X
λi gi (¯ x) > 0.
i=1
Therefore, f (¯ x) +
m X
λi gi (¯ x) > f (¯ x),
i=1
¯ thereby contradicting the saddle point conwhich implies L(¯ x, λ) > L(¯ x, λ), dition. Hence, x ¯ is feasible to (CP ). ¯ ≤ L(x, λ) ¯ and the complementary slackness condition is Because L(¯ x, λ) satisfied, f (¯ x) ≤ f (x) +
m X i=1
λi gi (x) ≤ f (x), ∀ x ∈ C.
Thus, x ¯ is a point of minimizer of (CP ).
© 2012 by Taylor & Francis Group, LLC
4.3 Affine Inequalities and Equalities and Saddle Point Condition
175
¯ is a saddle The consequence of the saddle point criteria is simple. If (¯ x, λ) point associated with the Lagrangian function of (CP ) where x ¯ is a point of minimizer of f over C, then ¯ = min L(x, λ) ¯ L(¯ x, λ) n x∈R
¯ i gi (¯ with λ x) = 0 for i = 1, 2, . . . , m. Therefore, by the optimality condition for the unconstrained problem, Theorem 2.89, ¯ 0 ∈ ∂x L(¯ x, λ), which under Slater constraint qualification yields 0 ∈ ∂f (¯ x) +
m X
¯ i ∂gi (¯ λ x),
i=1
thus leading to the KKT optimality conditions for (CP ).
4.3
Affine Inequalities and Equalities and Saddle Point Condition
Observe that in the previous section we had mentioned that the convex function are non-affine. This eventually has to do with the Slater constraint qualification. Consider the set C = {(x1 , x2 ) ∈ R2 : x1 + x2 ≤ 0, − x1 ≤ 0}. This set is described by affine inequalities. However, C = {(0, 0)} and hence the Slater constraint qualification fails. The question is whether in such a situation the saddle point condition exists or not. What we show below is that the presence of affine inequalities does not affect the saddle point condition. In fact, we should only bother about the Slater constraint qualification for the convex non-affine inequalities. The presence of affine inequalities by itself is a constraint qualification. To the best of our knowledge, the first study in this respect was due to Jaffray and Pomerol [65]. We present their result establishing the saddle point criteria under a modified version of Slater constraint qualification using the separation theorem. For that we now consider the feasible set C of the convex programming problem (CP ) defined by convex non-affine and affine inequalities as C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, hj (x) ≤ 0, j = 1, 2, . . . , l},
(4.4)
where gi : Rn → R, i = 1, 2, . . . , m, are convex functions while hj : Rn → R,
© 2012 by Taylor & Francis Group, LLC
176
Saddle Points, Optimality, and Duality
j = 1, 2, . . . , l, are affine functions. Observe that C is a convex set. Corresponding to this convex programming problem (CP ), the associated Lagrangian l function L : Rn × Rm + × R+ is defined as L(x, λ, µ) = f (x) +
m X
λi gi (x) +
i=1
l X
µj hj (x).
j=1
¯ µ Then (¯ x, λ, ¯) is the saddle point of (CP ) with C given by (4.4) if ¯ µ ¯ µ L(¯ x, λ, µ) ≤ L(¯ x, λ, ¯) ≤ L(x, λ, ¯). We shall now present the proof of Jaffray and Pomerol in a more detailed and simplified manner. Theorem 4.3 Consider (CP ) with C defined by (4.4). Assume that the modified Slater constraint qualification holds, that is, there exists x ˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m, and hj (ˆ x) ≤ 0, j = 1, 2, . . . , l. Then x ¯ is a point of l ¯ µ minimizer of (CP ) if and only if there exist (λ, ¯) ∈ Rm + × R+ such that l ¯ µ ¯ µ L(¯ x, λ, µ) ≤ L(¯ x, λ, ¯) ≤ L(x, λ, ¯), ∀ x ∈ Rn , λ ∈ Rm + , µ ∈ R+
along with the complementary slackness conditions, that is, ¯ i gi (¯ λ x) = 0, i = 1, 2, . . . , m
and
µ ¯j hj (¯ x) = 0, j = 1, 2, . . . , l.
Proof. Consider an index set J as the (possibly empty) maximal subset exists αj > 0 for j ∈ J such that P of {1, 2, . . . , l} such that there n α h (x) = 0 for every x ∈ R . Observe that for every x ∈ C, j j j∈J hj (x) = 0, ∀ j ∈ J.
Otherwise, if for some x ∈ C and for some j ∈ J, hj (x) < 0, the maximality of J is contradicted. Define the Lagrange covers of (CP ) as Λ = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ Rn such that f (x) − f (¯ x) ≤ y0 , gi (x) ≤ yi , i = 1, 2, . . . , m,
hj (x) ≤ zj , j ∈ J c , hj (x) = zj , j ∈ J},
where J c = {j ∈ {1, 2, . . . , l} : j ∈ / J}. We claim that the set Λ is convex. Consider (y01 , y 1 , z 1 ) and (y02 , y 2 , z 2 ) in Λ with x1 and x2 the respective associated elements from Rn . For any λ ∈ [0, 1], x = λx1 + (1 − λ)x2 ∈ Rn . By the convexity of f and gi , i = 1, 2, . . . , m, f (x) − f (¯ x) ≤ λ(f (x1 ) − f (¯ x)) + (1 − λ)(f (x2 ) − f (¯ x)) 1 2 ≤ λy0 + (1 − λ)y0 , gi (x)
≤ λgi (x1 ) + (1 − λ)gi (x2 ) ≤ λyi1 + (1 − λ)yi2 , i = 1, 2, . . . , m,
© 2012 by Taylor & Francis Group, LLC
4.3 Affine Inequalities and Equalities and Saddle Point Condition
177
while the affineness of hj , j = 1, 2, . . . , l leads to hj (x) = λhj (x1 ) + (1 − λ)hj (x2 ) ≤ λzj1 + (1 − λ)zj2 , j ∈ J c , hj (x) = λhj (x1 ) + (1 − λ)hj (x2 ) = λzj1 + (1 − λ)zj2 , j ∈ J.
Thus, for every λ ∈ [0, 1], λ(y01 , y 1 , z 1 ) + (1 − λ)(y02 , y 2 , z 2 ) ∈ Λ with x ∈ Rn as the associated element, thereby implying the convexity of Λ. Observe that corresponding to the point of minimizer of (CP ), x ¯ ∈ Rn , (¯ y0 , 0, 0) ∈ Λ if and only if y¯0 ≥ 0. Also, (y0 , 0, 0) belongs to the affine hull of Λ for every y0 ∈ R, and hence, (0, 0, 0) belongs to the relative boundary of Λ. Applying the Proper Separation Theorem, Theorem 2.26 (iv), to the Lagrange cover Λ and the relative boundary point (0, 0, 0), there exists (λ0 , λ, µ) ∈ R1+m+l with (λ0 , λ, µ) 6= (0, 0, 0) such that λ0 y0 +
m X
λi yi +
i=1
l X j=1
µj zj ≥ 0, ∀ (y0 , y, z) ∈ Λ
(4.5)
and for some (y0 , y, z) ∈ Λ, λ0 y0′ +
m X
λi yi′ +
i=1
l X
µj zj′ > 0.
(4.6)
j=1
Consider (y0 , y, z) ∈ Λ. For any α0 > 0 and α ∈ int Rm + , (y0 +α0 , y +α, z) ∈ Λ. Therefore, by (4.5), i′ = 0, 1, . . . , m, ′ m l iX −1 m X X X 1 λ0 y0 + λi yi + µj zj + λi αi + λi αi , λi′ ≥ − αi′ ′ i=1
i=1
i=0
i=i +1
which as the limit αi′ → +∞ yields λi′ ≥ 0, i′ = 0, 1, . . . , m. Using the above technique, we can also show that µj ≥ 0, j ∈ J c . The reader is advised to check this out. Observe that µj for j ∈ J are unrestricted. Let us proceed by assuming that J is nonempty. Therefore, there exist P αj > 0, j ∈ J such that j∈J αj hj (x) = 0 for every x ∈ Rn . Redefining λi , i = 0, 1, . . . , m, and µj , j = 1, 2, . . . , l, as ˆ i = λi , i = 0, 1, . . . , m, λ
µ ˆj = µj , j ∈ J c
and µ ˆj = µj + γαj , j ∈ J,
where γ > 0 is chosen such that µ ˆj > 0 for j ∈ J. Also, observe that X X X X µ ˆj hj (x) = µj hj (x) + γαj hj (x) = µj hj (x). j∈J
j∈J
j∈J
j∈J
ˆ 0 , λ, ˆ µ Thus, the conditions (4.5) and (4.6) hold for (λ ˆ) as well. ˆ ˆ We claim that λ0 , λi , i = 1, 2, . . . , m, and µ ˆj , j ∈ J c , are not all simulˆ ˆ i = 0, i = 1, 2, . . . , m, taneously zero. On the contrary, assume that λ0 = 0, λ
© 2012 by Taylor & Francis Group, LLC
178
Saddle Points, Optimality, and Duality
and µ ˆj = 0, j ∈ J c . Therefore, from the construction of Λ along with (4.5) yields X µ ˆj hj (x) ≥ 0, ∀ x ∈ Rn . j∈J
As x ¯ is feasible for (CP ), the above condition becomes X µ ˆj hj (¯ x) = 0. j∈J
P Therefore, the affine function j∈J µ ˆj hj (.) achieves its minimum over Rn at x ¯. Moreover, an affine function is unbounded over Rn . This shows that X µ ˆj hj (x) = 0, ∀ x ∈ Rn . j∈J
By condition (4.6), there exists x′ ∈ Rn associated to (y0′ , y ′ , z ′ ) ∈ Λ such that X µ ˆj hj (x′ ) > 0. j∈J
ˆ0, λ ˆ i , i = 1, 2, . . . , m, and Hence, a contradiction is reached. Therefore, λ c µ ˆj , j ∈ J , are not all simultaneously zero. ˆ 0 = 0 and λ ˆ i = 0, i = 1, 2, . . . , m, and for some j ∈ J c , Next suppose that λ µ ˆj > 0. Again working along the preceding lines, one obtains X X µ ˆj hj (x) + µ ˆj hj (x) = 0, ∀ x ∈ Rn . j∈J c :ˆ µj >0
j∈J
Observe that {j ∈ J c : µ ˆj > 0} is nonempty. Because the above condition holds for j ∈ {j ∈ J c : µ ˆj > 0} ∪ {j ∈ J : µ ˆj > 0}, thereby contradicting ˆ ˆ i , i = 1, 2, . . . , m, are not the maximality of the index set J. Hence λ0 and λ simultaneously zero. ˆ 0 = 0. As the modified Slater constraint qualification holds, Assume that λ there exists x ˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m, and hj (ˆ x) ≤ 0, j = 1, 2, . . . , l, corresponding to x ˆ, (f (ˆ x) − f (¯ x), g1 (ˆ x), . . . , gm (ˆ x), h1 (ˆ x), . . . , hl (ˆ x)) ∈ Λ, which along with condition (4.5) and the modified Slater constraint qualification leads to 0>
m X i=1
ˆ i gi (ˆ λ x) +
l X j=1
ˆ0 = which is a contradiction. Hence λ 6 0.
© 2012 by Taylor & Francis Group, LLC
µ ˆj hj (ˆ x) ≥ 0,
4.3 Affine Inequalities and Equalities and Saddle Point Condition
179
ˆ 0 yields Now dividing (4.5) throughout by λ y0 +
m X
¯ i yi + λ
i=1
l X j=1
µ ¯j zj ≥ 0, ∀ (y0 , y, z) ∈ Λ,
(4.7)
ˆ µ ˆj ¯ i = λi , i = 1, 2, . . . , m, and µ , j = 1, 2, . . . , l. Corresponding ¯j = where λ ˆ0 ˆ0 λ λ to every x ∈ Rn , (f (x) − f (¯ x), g1 (x), . . . , gm (x), h1 (x), . . . , hl (x)) ∈ Λ, thereby reducing the inequality (4.7) to f (¯ x) ≤ f (x) +
m X
¯ i gi (x) + λ
i=1
l X j=1
µ ¯j hj (x), ∀ x ∈ Rn .
(4.8)
l ¯ µ By the feasibility of x ¯ for (CP ) and the fact that (λ, ¯) ∈ Rm + × R+ , condition (4.8) implies that
¯ µ ¯ µ L(¯ x, λ, ¯) ≤ L(x, λ, ¯), ∀ x ∈ Rn . In particular, taking x = x ¯ in (4.8), along with the feasibility of x¯, leads to m X
¯ i gi (¯ λ x) +
i=1
l X
µ ¯j hj (¯ x) = 0.
j=1
This shows that ¯ i gi (¯ λ x) = 0, i = 1, 2, . . . , m
and
µ ¯j hj (¯ x) = 0, j = 1, 2, . . . , l,
thereby establishing the complementary slackness condition. For any l (λ, µ) ∈ Rm ¯, + × R+ , again by the feasibility of x m X
λi gi (¯ x) +
l X j=1
i=1
µj hj (¯ x) ≤ 0 =
m X i=1
¯ i gi (¯ λ x) +
l X
µ ¯j hj (¯ x),
j=1
that is, l ¯ µ L(¯ x, λ, µ) ≤ L(¯ x, λ, ¯), ∀ λ ∈ Rm + , µ ∈ R+ ,
thereby leading to the desired result. The converse of the above the result can be obtained in a manner similar to Theorem 4.2. In the convex programming problem (CP ) considered by Jaffray and Pomerol [65], the problem involved only convex non-affine and affine inequalities. Next we present a similar result from Florenzano and Van [47] to derive
© 2012 by Taylor & Francis Group, LLC
180
Saddle Points, Optimality, and Duality
the saddle point criteria under a modified version of Slater constraint qualification but for a more general scenario involving additional affine equalities and abstract constraints in (4.4). Consider the feasible set C of the convex programming problem (CP ) as C = {x ∈ X : gi (x) ≤ 0, i = 1, 2, . . . , m, hj (x) ≤ 0, j = 1, 2, . . . , s,
hj (x) = 0, j = s + 1, s + 2, . . . , l}, (4.9)
where gi : Rn → R, i = 1, 2, . . . , m, are convex non-affine functions; hj : Rn → R, j = 1, 2, . . . , l, are affine functions; and X ⊂ Rn is a convex set. Corresponding to this problem, the associated Lagrangian function l L : X × Rm + × R → R is defined as L(x, λ, µ) = f (x) +
m X
λi gi (x) +
i=1
l X
µj hj (x),
j=1
¯ µ x, λ, ¯) is called the saddle point of the where µ = (ˆ µ, µ ˜) ∈ Rs+ × Rl−s . Then (¯ above problem if l ¯ µ ¯ µ ¯) ≤ L(x, λ, ¯), ∀ x ∈ X, λ ∈ Rm L(¯ x, λ, µ) ≤ L(¯ x, λ, +, µ ∈ R ,
ˆ¯, µ ˜¯) are in Rs+ × Rl−s . where µ = (ˆ µ, µ ˜) and µ ¯ = (µ Theorem 4.4 Consider the convex programming problem (CP ) with C defined by (4.9). Let x ¯ be a point of minimizer of (CP ). Assume that there exists x ˆ ∈ ri X such that hj (ˆ x) ≤ 0, j = 1, 2, . . . , s, hj (ˆ x) = 0,
j = s + 1, s + 2, . . . , l.
Then there exist (λ0 , λ) ∈ R+ × Rm + with (λ0 , λ) µ = (ˆ µ, µ ˜) ∈ Rs+ × Rl−s such that λ0 f (¯ x) ≤ λ0 f (x) +
m X
λi gi (x) +
i=1
λi gi (¯ x) = 0, i = 1, 2, . . . , m,
l X j=1
and
6=
(0, 0), and
µj hj (x), ∀ x ∈ X,
µ ˆj hj (¯ x) = 0, j = 1, 2, . . . , s.
Proof. Consider the set Λ = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ X such that f (x) − f (¯ x) < y0 , gi (x) < yi , i = 1, 2, . . . , m, hj (x) = zj , j = 1, 2, . . . , l}. It can be easily shown as in the proof of Theorem 4.3 that Λ is a convex set.
© 2012 by Taylor & Francis Group, LLC
4.3 Affine Inequalities and Equalities and Saddle Point Condition
181
Also, Λ is nonempty because corresponding to the point of minimizer x¯ of (CP ), one can define (y0 , y, z) ∈ Λ as y0 > 0,
yi > 0, i = 1, 2, . . . , m,
and
zj = hj (¯ x), j = 1, 2, . . . , l.
As Λ is a nonempty convex set, by Proposition 2.14, ri Λ is also a nonempty convex set. Note that Λ ∩ (R1+m+s × {0Rl−s }) = ∅. − Otherwise, there exists an element in Λ such that the associated x ∈ X is feasible for (CP ) satisfying f (x) < f (¯ x), which is a contradiction to the fact that x ¯ is a point of minimizer of (CP ). Therefore, by Proposition 2.15, ri Λ ∩ ri (R1+m+s × {0Rl−s }) = ∅. − Invoking the Proper Separation Theorem, Theorem 2.26 (iv), there exists (λ0 , λ, µ) ∈ R1+m+l with (λ0 , λ, µ) 6= (0, 0, 0) such that λ0 y0 +
m X
λi yi +
i=1
l X j=1
µj zj ≥ λ0 w0 +
m X
λi wi +
i=1
s X
µj vj
(4.10)
j=1
for every (y0 , y, z) ∈ Λ and (w0 , w, v) ∈ R1+m+s , and there exists (y0′ , y ′ , z ′ ) ∈ Λ such that λ0 y0′ +
m X i=1
λi yi′ +
l X
µj zj′ > 0.
(4.11)
j=1
Let us partition µ = (ˆ µ, µ ˜) ∈ Rs × Rl−s . We claim that λ0 ≥ 0, λ ∈ Rm + and µ ˆ ∈ Rs+ . Corresponding to the point of minimizer x ¯, choose y0 > 0, yi > 0, i = 1, 2, . . . , m, and zj = hj (¯ x), j = 1, 2, . . . , l. From condition (4.10), for i′ = 0, 1, . . . , m, ′
iX −1 m l m s X X X X 1 {− λi yi − λi yi − µj zj + λ0 w0 + λi wi + µj vj }. λi′ ≥ yi′ ′ i=0 j=1 i=1 j=1 i=i +1
Taking the limit as yi′ → ∞ yields λi′ ≥ 0, i′ = 0, 1, . . . , m. Again from (4.10), for j ′ = 1, 2, . . . , s, m l m X X X 1 {λ0 y0 + µ ≥ λi yi + µj zj − λ0 w0 − λi wi vj ′ i=1 j=1 i=1 j′
−
′ jX −1
j=1
µj vj −
Taking the limit as vj ′ → ∞ leads to µj ′ ≥ 0, j ′ = 1, 2, . . . , s.
© 2012 by Taylor & Francis Group, LLC
s X
j=j ′ +1
µj vj }.
182
Saddle Points, Optimality, and Duality
Now consider any x ∈ X and δ > 0. Define = f (x) − f (¯ x) + δ,
y0 yi zj
= gi (x) + δ, i = 1, 2, . . . , m, = hj (x), j = 1, 2, . . . , l.
Therefore, (y0 , y, z) ∈ Λ and for (0, 0, 0) ∈ Rm+s × {0Rl−s }, the condition − (4.10) yields that for every x ∈ X and every δ > 0, λ0 (f (x) − f (¯ x)) +
m X
λi gi (x) +
i=1
l X
µj hj (x) +
j=1
m X i=0
λi δ ≥ 0.
Because δ > 0 was arbitrarily chosen, as δ → 0 the above condition reduces to λ0 (f (x) − f (¯ x)) +
m X
λi gi (x) +
i=1
l X j=1
µj hj (x) ≥ 0, ∀ x ∈ X.
(4.12)
In particular, for x = x ¯, condition (4.12) yields m X
λi gi (¯ x) +
i=1
l X j=1
µj hj (¯ x) ≥ 0,
which along with the feasibility of x¯ for (CP ) leads to λi gi (¯ x) = 0, i = 1, 2, . . . , m,
and
µ ˆj hj (¯ x) = 0, j = 1, 2, . . . , s,
as in the proof of Theorem 4.3. We claim that (λ0 , λ) 6= (0, 0). On the contrary, suppose that (λ0 , λ) = (0, 0). Therefore, condition (4.12) leads to l X j=1
µj hj (x) ≥ 0, ∀ x ∈ X.
By the given hypothesis, for x ˆ ∈ ri X along with the above inequality implies that l X
µj hj (ˆ x) = 0,
j=1
Pl that is, the affine function j=1 µj hj (.) achieves its minimum at a relative interiorPpoint. Because an affine function achieves its minimum at a boundary l point, j=1 µj hj (.) has a constant value zero over X, that is, l X j=1
© 2012 by Taylor & Francis Group, LLC
µj hj (x) = 0, ∀ x ∈ X.
(4.13)
4.3 Affine Inequalities and Equalities and Saddle Point Condition
183
Corresponding to (y0′ , y ′ , z ′ ) ∈ Λ satisfying (4.11) there exists x′ ∈ X such that l X
µj hj (x′ ) > 0,
j=1
which contradicts (4.13). Therefore, λi , i = 0, 1, . . . , m, are not all simultaneously zero, which along with (4.12) leads to the desired result. Theorem 4.5 Consider the convex programming problem (CP ) with C defined by (4.9). Assume that the modified Slater constraint qualification is satisfied, that is there exists x ˆ ∈ ri X such that gi (ˆ x) < 0, hj (ˆ x) ≤ 0, hj (ˆ x) = 0,
i = 1, 2, . . . , m, j = 1, 2, . . . , s, j = s + 1, s + 2, . . . , l.
¯ ∈ Rm , Then x ¯ is a point of minimizer of (CP ) if and only if there exist λ + s l−s ¯ ¯ µ ¯ = (µ ˆ, µ ˜) ∈ R+ × R such that l ¯ µ ¯ µ ¯) ≤ L(x, λ, ¯), ∀ x ∈ X, λ ∈ Rm L(¯ x, λ, µ) ≤ L(¯ x, λ, +,µ ∈ R ,
where µ = (ˆ µ, µ ˜) ∈ Rs+ × Rl−s along with λi gi (¯ x) = 0, i = 1, 2, . . . , m,
and
µ ˆj hj (¯ x) = 0, j = 1, 2, . . . , s.
Proof. Because the modified Slater constraint qualification is satisfied, the hypothesis of Theorem 4.4 also holds. Thus, if x ¯ is a point of minimizer of (CP ), there exist (λ0 , λ) ∈ R+ ×Rm µ, µ ˜) ∈ Rs+ × Rl−s + with (λ0 , λ) 6= (0, 0) and µ = (ˆ such that λ0 f (¯ x) ≤ λ0 f (x) +
m X
λi gi (x) +
l X j=1
i=1
µj hj (x), ∀ x ∈ X
(4.14)
and λi gi (¯ x) = 0, i = 1, 2, . . . , m,
and
µj hj (¯ x) = 0, j = 1, 2, . . . , s.
(4.15)
We claim that λ0 6= 0. On the contrary, suppose that λ0 = 0. Because (λ0 , λ) 6= (0, 0), λ = 6 0. Therefore, the optimality condition (4.14) becomes m X
λi gi (x) +
i=1
l X j=1
µj hj (x) ≥ 0, ∀ x ∈ X.
In particular, for x = x ˆ, the above condition along with the modified Slater constraint qualification leads to 0>
m X i=1
© 2012 by Taylor & Francis Group, LLC
λi gi (ˆ x) +
l X j=1
µj hj (ˆ x) ≥ 0,
184
Saddle Points, Optimality, and Duality
which is a contradiction. Thus, λ0 > 0 and hence dividing (4.14) throughout by λ0 yields f (¯ x) ≤ f (x) +
m X
¯ i gi (x) + λ
i=1
l X j=1
µ ¯j hj (x), ∀ x ∈ X,
µj ¯ i = λi , i = 1, 2, . . . , m, and µ , j = 1, 2, . . . , l. This inequality ¯j = where λ λ0 λ0 along with the condition (4.15) leads to ¯ µ ¯ µ L(¯ x, λ, ¯) ≤ L(x, λ, ¯), ∀ x ∈ X. ˆ x) ∈ Rs and h(¯ ˜ x) = {0}Rl−s . As x ¯ is feasible for (CP ), g(¯ x) ∈ −Rm + , −h(¯ + s l−s Therefore, for λ ∈ Rm , µ = (ˆ µ , µ ˜ ) ∈ R × R , + + m X
λi gi (¯ x) +
i=1
l X j=1
µj hj (¯ x) ≤ 0 =
m X
¯ i gi (¯ λ x) +
l X
µ ¯j hj (¯ x),
j=1
i=1
which leads to ¯ µ L(¯ x, λ, µ) ≤ L(¯ x, λ, ¯), thereby proving the desired saddle point result. The converse can be worked out as in Theorem 4.2. Observe that the saddle point condition in the above theorem ¯ µ ¯ µ L(¯ x, λ, ¯) ≤ L(x, λ, ¯), ∀ x ∈ X can be rewritten as f (¯ x) +
m X
¯ i gi (¯ λ x) +
s X
µ ˆj hj (¯ x) +
≤ f (x) +
µ ˜j hj (¯ x) + δX (¯ x)
j=s+1
j=1
i=1
l X
m X
λi gi (x) +
i=1
s X
µ ˆj hj (x) +
l X
µ ˜j hj (x) + δX (x)
j=s+1
j=1
for every x ∈ Rn . The above inequality implies that 0 ∈ ∂(f +
l X i=1
λi gi +
s X j=1
µ ˆj hj (x) +
l X
µ ˜j hj (x) + δX )(¯ x).
j=s+1
By the modified Tl qualification xˆ ∈ ri X and therefore, Tm Slater constraint ri dom f ∩ i=1 ri dom gi ∩ j=1 ri dom hj ∩ ri dom δX = ri X is
© 2012 by Taylor & Francis Group, LLC
4.4 Lagrangian Duality
185
nonempty. Applying the Sum Rule, Theorem 2.91 along with the fact that ∂δX (¯ x) = NX (¯ x) yields the KKT optimality condition 0 ∈ ∂f (¯ x) +
m X
λi ∂gi (¯ x) +
i=1
s X
µ ˆj ∂hj (¯ x) + ∂(
j=1
l X
µ ˜j hj )(¯ x) + NX (¯ x).
j=s+1
By the affineness of hj , j = 1, 2, . . . , l, ∂hj (¯ x) = {∇hj (¯ x)}, thereby reducing the above condition to the standard KKT optimality condition 0 ∈ ∂f (¯ x) +
m X
λi ∂gi (¯ x) +
i=1
s X j=1
µ ˆj ∇hj (¯ x) +
l X
j=s+1
µ ˜j ∇hj (¯ x) + NX (¯ x).
We state this discussion as the following result.

Theorem 4.6 Consider the convex programming problem (CP) with C defined by (4.9). Assume that the modified Slater constraint qualification is satisfied. Then x̄ is a point of minimizer of (CP) if and only if there exist λi ≥ 0, i = 1, 2, ..., m; µ̂j ≥ 0, j = 1, 2, ..., s; and µ̃j ∈ R, j = s + 1, ..., l, such that

0 ∈ ∂f(x̄) + ∑_{i=1}^m λi ∂gi(x̄) + ∑_{j=1}^s µ̂j ∇hj(x̄) + ∑_{j=s+1}^l µ̃j ∇hj(x̄) + NX(x̄)
along with

λi gi(x̄) = 0, i = 1, 2, ..., m,   and   µ̂j hj(x̄) = 0, j = 1, 2, ..., s.

4.4 Lagrangian Duality
In the beginning of this chapter, we tried to motivate the notion of a saddle point using two-person-zero-sum games. We observed that two optimization problems were being simultaneously solved. Player 1 was solving a minimization problem while Player 2 was solving a maximization problem. The maximization problem is usually referred to as the dual of the minimization problem. Similarly, corresponding to the problem (CP ), one can actually construct a dual problem following an approach quite similar to that of the two-personzero-sum games. Consider the problem (CP ) with the feasible set given by (4.9). Then if vL denotes the optimal value of (CP ), then observe that vL = inf
x∈C
sup
L(x, λ, µ ˆ, µ ˜),
(λ,ˆ µ,˜ µ)∈Ω
s l−s . Taking a clue from the two-person-zero-sum where Ω = Rm + × R+ × R games, the dual problem to (CP ) that we denote by (DP ) can be stated as
© 2012 by Taylor & Francis Group, LLC
186
Saddle Points, Optimality, and Duality sup w(λ, µ ˆ, µ ˜)
(λ, µ ˆ, µ ˜) ∈ Ω,
subject to
(DP )
where w(λ, µ ˆ, µ ˜) = min L(x, λ, µ ˆ, µ ˜). We denote the optimal value of (DP ) x∈X
by dL . Our main aim here is to check if dL =
sup
w(λ, µ ˆ, µ ˜ ) = vL ,
(4.16)
(λ,ˆ µ,˜ µ)∈Ω
that is, sup
inf L(x, λ, µ ˆ, µ ˜) = inf
(λ,ˆ µ,˜ µ)∈Ω x∈X
x∈C
sup
L(x, λ, µ ˆ, µ ˜).
(λ,ˆ µ,˜ µ)∈Ω
The statement (4.16) is known as strong duality. We now present a result that shows when strong duality holds. Theorem 4.7 Consider the problem (CP ) where the set C is defined by (4.9). Assume that (CP ) has a lower bound, that is, it has an infimum value, vL , that is finite. Also, assume that the modified Slater constraint qualification is satisfied. Then the dual problem (DP ) has a supremum and the supremum is attained with vL = d L . Proof. We always have vL ≥ dL . This is absolutely straightforward and we urge the reader to establish this. This is called weak duality. The problem (CP ) has an infimum, vL , that is, vL = inf f (x). x∈C
Working along the lines of the proof of Theorem 4.4, we conclude from (4.12) s l−s ¯ 0 , λ, ¯ µ ˆ¯, µ ˜¯) ∈ R+ × Rm that there exists nonzero (λ such that + × R+ × R ¯ 0 (f (x) − vL ) + λ
m X
¯ i gi (x) + λ
s X
ˆ¯j hj (x) + µ
j=1
i=1
l X
j=s+1
˜¯j hj (x) ≥ 0, ∀ x ∈ X. µ
As the modified Slater constraint qualification holds, by Theorem 4.5, it is ¯ 0 6= 0 and without loss of generality, assume λ ¯ 0 = 1. simple to observe that λ Hence, (f (x) − vL ) +
m X
¯ i gi (x) + λ
i=1
s X j=1
ˆ¯j hj (x) + µ
l X
j=s+1
˜¯j hj (x) ≥ 0, ∀ x ∈ X. µ
Therefore, ¯ µ ˆ¯, µ ˜¯) ≥ vL , ∀ x ∈ X, L(x, λ,
© 2012 by Taylor & Francis Group, LLC
4.4 Lagrangian Duality
187
¯ µ ˆ ˜ that is, w(λ, ¯, µ ¯) ≥ vL . Hence, sup (λ,ˆ µ,˜ µ)∈Ω
¯ µ ˆ¯, µ ˜¯) ≥ vL . w(λ, µ ˆ, µ ˜) ≥ w(λ,
By the weak duality, vL ≥ sup(λ,ˆµ,˜µ)∈Ω w(λ, µ ˆ, µ ˜). Thus, sup
w(λ, µ ˆ, µ ˜) = vL = inf f (x), x∈C
(λ,ˆ µ,˜ µ)∈Ω
thereby establishing the strong duality between (CP ) and (DP ).
It is important to note that the assumption of the Slater constraint qualification is quite crucial as its absence can give a positive duality gap. We provide below the following famous example due to Duffin [35]. Example 4.8 Consider the primal problem q x21 + x22 ≤ x1 . inf ex2 subject to
The Lagrangian dual problem is max w(λ)
subject to
λ ∈ Rm +,
where q w(λ) = inf2 ex2 + λ( x21 + x22 − x1 ), λ ≥ 0. x∈R
Observe that the only feasible point of the primal problem is (x1 , x2 ) = (0, 0) and hence inf ex2 = e0 = 1. Thus, the minimum value or the infimum value of the primal problem is vL = 1. Now let us evaluate the p function w(λ) for each λ ≥ 0. Observe that for every fixed x2 , the term ( x21 + x22 − x1 ) → 0 as x1 → +∞. Thus, for each x2 , the value ex2 dominates the expression q ex2 + λ( x21 + x22 − x1 ) as x1 → +∞. Hence, for a fixed x2 , q inf ex2 + λ( x21 + x22 − x1 ) = ex2 . x1
By letting x2 → −∞, w(λ) = 0, ∀ λ ≥ 0. Therefore, the supremum value of the dual problem is dL = 0. Hence, there is a positive duality gap. Observe that the Slater constraint qualification does not hold in the primal case.
© 2012 by Taylor & Francis Group, LLC
188
Saddle Points, Optimality, and Duality
We are now going to present some deeper properties of the dual variables (or Lagrange multipliers) for the problem (CP ) with convex non-affine inequality, that is, the feasible set C is given by (4.1), C = {x ∈ X : gi (x) ≤ 0, i = 1, 2, . . . , m}. The set of Lagrange multipliers at a given solution x¯ of (CP ) is given as M(¯ x) = {λ ∈ Rm x) + + : 0 ∈ ∂f (¯
m X
λi ∂gi (¯ x), λi gi (¯ x) = 0, i = 1, 2, . . . , m}.
i=1
It is quite natural to think that when we change x¯, the set of multipliers will also change. We now show that for a convex programming problem, the set M(¯ x) does not depend on the solution x ¯. Consider the set M = {λ ∈ Rm + : inf f (x) = infn L(x, λ)}. x∈C
x∈R
(4.17)
In the following result we show that M(¯ x) = M for any solution x ¯ of (CP ). The proof of this fact is from Attouch, Buttazzo, and Michaille [3]. Theorem 4.9 Consider the convex programming problem (CP ) with C defined by (4.1). Let x ¯ be the point of minimizer of (CP ). Then M(¯ x) = M. Proof. Suppose that λ ∈ M(¯ x). Then 0 ∈ ∂x L(¯ x, λ) with λi gi (¯ x) = 0, i = 1, 2, . . . , m, where ∂x L denotes the subdifferential with respect to x. Hence, x ¯ solves the problem min L(x, λ)
subject to
x ∈ Rn .
Therefore, for every x ∈ Rn , f (¯ x) +
m X i=1
λi gi (¯ x) ≤ f (x) +
m X
λi gi (x),
i=1
which along with λi gi (¯ x) = 0, i = 1, 2, . . . , m, implies f (¯ x) ≤ f (x) +
m X i=1
λi gi (x), ∀ x ∈ Rn .
Thus, f (¯ x) = infn (f + x∈R
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi gi )(x) = infn L(x, λ). x∈R
4.4 Lagrangian Duality
189
Further, f (¯ x) = inf f (x). Hence, λ ∈ M. x∈C
Conversely, suppose that λ ∈ M, which implies f (¯ x) = infn (f + x∈R
m X
λi gi )(x).
i=1
Therefore, f (¯ x) ≤ f (¯ x) +
m X
λi gi (¯ x),
i=1
thereby yielding m X i=1
λi gi (¯ x) ≥ 0.
The above inequality along with the feasibility of x¯ for (CP ) and nonnegativity of λi , i = 1, 2, . . . , m, leads to m X
λi gi (¯ x) = 0.
i=1
This further yields λi gi (¯ x) = 0, i = 1, 2, . . . , m. Thus, f (¯ x) +
m X
λi gi (¯ x) = infn (f + x∈R
i=1
m X
λi gi )(x),
i=1
which implies that x ¯ solves the problem min L(x, λ)
subject to
x ∈ Rn .
Therefore, 0 ∈ ∂x L(¯ x, λ). As dom f = dom gi = Rn , i = 1, 2, . . . , m, applying the Sum Rule, Theorem 2.91, 0 ∈ ∂f (¯ x) +
m X
λi ∂gi (¯ x).
i=1
This combined with the fact that λi gi (¯ x) = 0, i = 1, 2, . . . , m, shows that λ ∈ M(¯ x), thereby establishing that M(¯ x) = M. Remark 4.10 In the above theorem, x ¯ was chosen to be any arbitrary solution of (CP ). Thus, it is clear that M(¯ x) is independent of the choice of x ¯ and hence M(¯ x) = M for every solution x ¯ of (CP ).
© 2012 by Taylor & Francis Group, LLC
190
Saddle Points, Optimality, and Duality
Note that the above result can be easily extended to the problem with feasible set C defined by (4.9), that is, convex non-affine and affine inequalities along with affine equalities. If we take a careful look at the set M, we realize that for λ ∈ Rm + it is not essential that (CP ) has a solution; one merely needs (CP ) to be bounded below. Thus Attouch, Buttazzo, and Michaille [3] call the set M to be the set of generalized Lagrange multipliers. Of course if (CP ) has a solution, then M is the set of Lagrange multipliers. We now show how deeply the notion of Lagrange multipliers is associated with the perturbation of the constraints of the problem. From a numerical point of view, it is important to deal with constraint perturbations. Note that due to rounding off and other errors, often the iterates do not satisfy the constraints exactly but some perturbed version of it, that is, possibly in the form gi (x) ≤ yi , i = 1, 2, . . . , m. Thus, the function v(y) = inf{f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m} is called the value function or the marginal function associated with (CP ). It is obvious that if v(0) ∈ R, then v(0) is the optimal value of (CP ). We now ¯ is a convex function. In order to show that, we establish that v : Rm → R need the following interesting and important lemma. Lemma 4.11 Consider Φ : Rn × Rm → R ∪ {+∞}, which is convex in both variables. Then the function φ(v) = infn Φ(u, v) u∈R
is a convex function in v. Proof. Consider (vi , αi ) ∈ epis φ, i = 1, 2, that is, φ(vi ) < αi , i = 1, 2. Therefore, there exist u ¯1 , u ¯2 ∈ Rn such that by the definition of infimum, Φ(¯ ui , vi ) < αi , i = 1, 2. By the convexity of Φ, for every λ ∈ [0, 1], Φ((1 − λ)¯ u1 + λ¯ u2 , (1 − λ)v1 + λv2 ) ≤ (1 − λ)Φ(¯ u1 , v1 ) + λΦ(¯ u2 , v2 ) < (1 − λ)α1 + λα2 ,
which implies φ((1 − λ)v1 + λv2 ) < (1 − λ)α1 + λα2 , ∀ λ ∈ [0, 1].
© 2012 by Taylor & Francis Group, LLC
4.4 Lagrangian Duality
191
Thus ((1 − λ)v1 + λv2 ), (1 − λ)α1 + λα2 ) ∈ epis φ, which by Proposition 2.50 leads to the convexity of φ.
Observe that the value function can be expressed as v(y) = infn {f (x) + δC(y) (x)}, x∈R
(4.18)
where C(y) = {x ∈ Rn : gi (x) ≤ yi , i = 1, 2, . . . , m}. Now to prove the convexity of the value function, what one needs to show is that f (x)+δC(y) (x) is convex in both the variables x as well as y, and we leave it to the reader. Once that is done, we just have to use Lemma 4.11 to conclude that v is a convex function. Through the following result given in Attaouch, Buttazzo, and Michaille [3], we show how the Lagrange multipliers (or the generalized Lagrange multipliers) are related to the value function. Theorem 4.12 (i) Let v(0) ∈ R, then M = −∂v(0). Further, if the Slater constraint qualification holds, then v is continuous at the origin and hence M is convex compact set in Rm +. (ii) Consider the problem sup − v ∗ (−λ)
subject to
λ ∈ Rm +.
(DP 1)
The solutions of (DP 1) coincide with the set M. Further, for every λ ∈ Rm +, −v ∗ (−λ) = infn {f (x) + x∈R
m X
λi gi (x)}.
i=1
Thus the problem (DP 1) coincides with the Lagrangian dual problem of (CP ). Proof. (i) We begin by proving M = −∂v(0). Consider any λ ∈ M. By the definition of the value function v (4.18) and M (4.17), v(0) = infn {f + δC }(x) = infn {f + x∈R
x∈R
m X i=1
λi gi }(x).
For any given y ∈ Rm , consider the set C(y) = {x ∈ Rn : gi (x) ≤ yi , i = 1, 2, . . . , m}. As λ ∈ Rm + , for any x ∈ C(y), m X i=1
© 2012 by Taylor & Francis Group, LLC
λi gi (x) ≤
m X i=1
λi yi ,
192
Saddle Points, Optimality, and Duality
which implies that m X
f (x) +
i=1
λi gi (x) ≤ f (x) +
m X
λi yi .
i=1
Therefore, inf {f +
x∈C(y)
m X i=1
λi gi }(x) ≤
inf f (x) +
x∈C(y)
m X
λi yi .
(4.19)
i=1
Because C(y) ⊂ Rn , by Proposition 1.7, infn {f +
x∈R
m X i=1
λi gi }(x) ≤
inf {f +
x∈C(y)
m X i=1
λi gi }(x).
As λ ∈ M, by (4.17) along with (4.19) leads to v(0) ≤ v(y) +
m X
λi yi ,
i=1
that is, v(y) ≥ v(0) + h−λ, y − 0i, ∀ y ∈ Rm . This yields that −λ ∈ ∂v(0), thereby establishing that M ⊂ −∂v(0). Conversely, suppose that λ ∈ −∂v(0), that is, −λ ∈ ∂v(0). We will prove that λ ∈ M. Consider any y ∈ Rm + . Then it is easy to observe that C ⊂ C(y). Again by Proposition 1.7, inf f (x) ≥
x∈C
inf f (x),
x∈C(y)
that is, v(0) ≥ v(y), ∀ y ∈ Rm +. As −λ ∈ ∂v(0), which along with the above inequality leads to hλ, yi ≥ v(0) − v(y) ≥ 0. m Because y ∈ Rm + was arbitrary, it is clear that λ ∈ R+ . We now establish that λ ∈ M by proving that
inf f (x) = infn {f +
x∈C
x∈R
m X i=1
λi gi }(x),
that is, v(0) = infn {f + x∈R
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi gi }(x).
4.4 Lagrangian Duality
193
Note that if x ∈ C, gi (x) ≤ 0, i = 1, 2, . . . , m. Then λi ≥ 0, i = 1, 2, . . . , m. Thus, f (x) +
m X i=1
Pm
i=1
λi gi (x) ≤ 0 as
λi gi (x) ≤ f (x), ∀ x ∈ C.
Therefore, infn {f +
x∈R
m X i=1
λi gi }(x) ≤ inf {f + x∈C
m X i=1
λi gi }(x) ≤ inf f (x) = v(0). x∈C
(4.20)
The fact that −λ ∈ ∂v(0) leads to v(y) + hλ, yi ≥ v(0), ∀ y ∈ Rm , that is, for every y ∈ Rm , v(y) +
m X i=1
λi yi ≥ v(0).
Consider any x ˜ ∈ Rn and set y˜ = gi (˜ x), i = 1, 2, . . . , m. Therefore, the above inequality leads to v(˜ y) +
m X
λi gi (˜ x) ≥ v(0).
i=1
By the definition (4.18) of value function v(˜ y ) ≤ f (˜ x), which along with the above inequality leads to f (˜ x) +
m X i=1
λi gi (˜ x) ≥ v(0).
Because x ˜ was arbitrary, infn {f +
x∈R
m X i=1
λi gi }(x) ≥ v(0).
(4.21)
Combining (4.21) with (4.20), v(0) = infn {f + x∈R
m X i=1
λi gi }(x).
Therefore, λ ∈ M and thus establishing that M = −∂v(0). Now assume that v(0) is finite and the Slater constraint qualification holds, that is, there exists x ˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m. Thus there
© 2012 by Taylor & Francis Group, LLC
194
Saddle Points, Optimality, and Duality
exists δ > 0 such that for every y ∈ Bδ (0) = δB, gi (ˆ x) < yi , i = 1, 2, . . . , m, which implies that v(y) ≤ f (ˆ x), ∀ y ∈ Bδ (0). (4.22)
As dom f = Rn , f (ˆ x) < +∞, thereby establishing that v is bounded above on Bδ (0). This fact shows that Bδ (0) × [f (ˆ x), +∞) ⊂ epi v. We claim that v(y) > −∞ for every y ∈ Rm . On the contrary, assume that there exists yˆ ∈ Rm such that v(ˆ y ) = −∞. Thus, {ˆ y } × R ⊂ epi v. Consider z = −αˆ y such that α > 0 and kzk < δ. This is possible by choosing 1 1−λ δ . Setting λ = , we have λ ∈ (0, 1) and α = . This implies α= 2kˆ yk 1+α λ −(1 − λ) yˆ, that is, that z = λ λz + (1 − λ)ˆ y = 0. By choice, z ∈ Bδ (0), which by (4.22) implies that v(z) ≤ f (ˆ x) and thus, (z, f (ˆ x)) ∈ Bδ (0) × [f (ˆ x), +∞) ⊂ epi v. Further, for every t ∈ R, (ˆ y , t) ∈ {ˆ y } × R ⊂ epi v. As v is convex, by Proposition 2.48, epi v is a convex set, which implies that (λz + (1 − λ)ˆ y , λf (ˆ x) + (1 − λ)t) ∈ epi v, that is, (0, λf (ˆ x) + (1 − λ)t) ∈ epi v. Therefore, v(0) ≤ λf (ˆ x) + (1 − λ)t, ∀ t ∈ R. Taking the limit as t → −∞, v(0) ≤ −∞. But v(0) ≥ −∞ and hence v(0) = −∞, which is a contradiction because v(0) ∈ R. By Theorem 2.72, the function v : Rm → R ∪ {+∞} is majorized on a neighborhood of the origin and hence v is continuous at y = 0. Then by Proposition 2.82, ∂v(0) is convex compact set, which implies so is M. (ii) We already know that λ ∈ M if and only −λ ∈ ∂v(0). Therefore, from Theorem 2.108, −λ ∈ ∂v(0)
© 2012 by Taylor & Francis Group, LLC
⇐⇒
v(0) + v ∗ (−λ) = 0,
4.4 Lagrangian Duality
195
which implies λ ∈ M if and only if v(0) + v ∗ (−λ) = 0. From (i) we know that v is continuous at y = 0. Thus, by Proposition 2.106, v(0) = v ∗∗ (0). By Definition 2.101 of the biconjugate, v ∗∗ (0) = sup {−v ∗ (µ)} = sup {−v ∗ (−µ)}. µ∈Rm
µ∈Rm
Thus λ ∈ M if and only if −v ∗ (−λ) = v ∗∗ (0) = sup {−v ∗ (−µ)}, µ∈Rm
which is equivalent to the fact that λ solves the problem sup −v ∗ (µ)
µ ∈ Rm .
subject to
Observe that v ∗ (µ)
=
sup {hµ, yi − v(y)}
y∈Rm
=
sup {hµ, yi − inf {f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}} x
y∈Rm
=
sup {hµ, yi + sup{−f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}} x
y∈Rm
=
sup (y,x)∈Rm ×Rn
{hµ, yi − f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}.
If for some i ∈ {1, 2, . . . , m}, µi > 0, then v ∗ (µ) = +∞. So assume that µ ∈ −Rm + . Then v ∗ (µ) = sup {−f (x) + x∈Rn
As
Pm
i=1
sup
m X
yi ≥gi (x) i=1
µi yi }.
µi yi = hµ, yi is a linear function, sup
m X
µi yi =
yi ≥gi (x) i=1
m X
µi gi (x).
i=1
Hence, for µ ∈ −Rm +, m X µi gi (x) − f (x)}. v ∗ (µ) = sup { x∈Rn i=1
In particular, for µ = −λ, v ∗ (−λ) = sup {−(f (x) + x∈Rn
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi gi (x))},
196
Saddle Points, Optimality, and Duality
which implies ∗
−v (−λ) = infn {f (x) + x∈R
m X
λi gi (x)}.
i=1
Thus, −v ∗ (−λ) = w(λ), thereby showing that the dual problems (DP ) and (DP 1) are the same.
4.5
Fenchel Duality
In the last section it was clear that the notion of conjugation is linked to the understanding of Lagrangian duality. In this section we explore this relation a bit more. We will focus on Fenchel duality where the dual problem is expressed explicitly in terms of the conjugate functions. Also we shall make a brief presentation of Rockafellar’s perturbation approach to duality. Our approach to Fenchel duality will be that of Borwein and Lewis [17], which we present below. ¯ and Theorem 4.13 Consider proper convex functions f : Rn → R m n m ¯ ¯ g : R → R and a linear map A : R → R . Let vF , dF ∈ R be the optimal values of the primal and the dual problems given below: vF = infn {f (x) + g(Ax)} x∈R
and
dF = sup {−f ∗ (AT λ) − g ∗ (−λ)}, φ∈Rm
where AT denotes the conjugate of the linear map A or the transpose of the matrix represented by A. In fact, A can be viewed as an m×n matrix. Assume that the condition 0 ∈ core(dom g − A dom f ) holds. Then vF = dF and the supremum in the dual problem is attained if the optimal value is finite. (Instead of the term core, one can also use interior or relative interior.) Proof. We first prove that vF ≥ dF , that is, the weak duality holds. By the definition of conjugate function, Definition 2.101, f ∗ (AT λ) = sup {hAT λ, xi − f (x)} ≥ hλ, Axi − f (x), ∀ x ∈ Rn , x∈Rn
which implies f (x) ≥ hλ, Axi − f ∗ (AT λ), ∀ x ∈ Rn .
© 2012 by Taylor & Francis Group, LLC
4.5 Fenchel Duality
197
Similarly, we have g(Ax) ≥ −hλ, Axi − g ∗ (−λ), ∀ x ∈ Rn . The above inequalities immediately show that for any λ ∈ Rm and any x ∈ Rn , f (x) + g(Ax) ≥ −f ∗ (AT λ) − g ∗ (−λ). Thus, the above inequality it yields that vF ≥ dF . Next, to prove the equality under the given constraint qualification, define ¯ as the function h : Rm → R h(y) = infn {f (x) + g(Ax + y)}. x∈R
In the parlance of optimization, h is referred to as the optimal value function or just a value function. Here the vector y acts as a parameter. See the previous section for more details. Using Lemma 4.11, it is easy to observe that h is convex. We urge the reader to reason out for himself / herself. Further, one must decide what dom h is. We claim that dom h = dom g − A dom f. Consider y ∈ dom h, that is, h(y) < +∞. Hence there exists x ∈ Rn such that x ∈ dom f and Ax + y ∈ dom g, which leads to y ∈ dom g − A dom f. This holds for every y ∈ dom h and thus dom h ⊂ dom g − A dom f. Let z ∈ dom g − A dom f , which implies that there exists u ∈ dom g and x ˆ ∈ dom f such that z = u − Aˆ x. Hence z + Aˆ x ∈ dom g, that is, f (ˆ x) + g(z + Aˆ x) < +∞. Thus h(z) < +∞, thereby showing that z ∈ dom h. This proves the assertion toward the domain of h. Note that if vF = −∞, there is nothing to prove. Without loss of generality, we assume that vF is finite. By assumption, 0 ∈ core(dom h) (or 0 ∈ int(dom h)). By Proposition 2.82, ∂h(0) 6= ∅, which implies that there exists −ξ ∈ ∂h(0). Thus, by Definition 2.77 of the subdifferential along with the definition of h, h(0) ≤ h(y) + hξ, yi ≤ f (x) + g(Ax + y) + hξ, yi, ∀ y ∈ Rm . Hence, h(0) ≤ {f (x) − hA∗ ξ, xi} + {g(Ax + y) − h−ξ, Ax + yi}.
© 2012 by Taylor & Francis Group, LLC
198
Saddle Points, Optimality, and Duality
Taking the infimum first over y and then over x yields that h(0) ≤ −f ∗ (A∗ ξ) − g ∗ (−ξ) ≤ dF ≤ vF ≤ h(0), thereby establishing that vF = dF . Observe that the dual value is obtained at λ = ξ. It is important to mention that the above problem was also studied by Rockafellar [97]. In Rockafellar [97], the function g is taken to be a concave function, Definition 2.46, and the objective function of the primal problem and the dual problem are, respectively, given as f (x) − g(Ax)
g∗ (λ) − f ∗ (AT λ).
and
Further, g∗ denotes the conjugate of the concave function g, which is defined as g∗ (λ) = infm {hλ, yi − g(y)}. y∈R
From the historical point of view, we provide the statement of the classical Fenchel duality theorem as it appears in Rockafellar [97]. ¯ and a proper Theorem 4.14 Consider a proper convex function f : Rn → R n ¯ concave function g : R → R. Assume that one of the following conditions holds: (1) ri(dom f ) ∩ ri(dom g) 6= ∅, (2) f and g are lsc and ri(dom f ∗ ) ∩ ri(dom g∗ ) 6= ∅. Then inf {f (x) − g(x)} = sup {g∗ (λ) − f ∗ (λ)}.
x∈Rn
(4.23)
λ∈Rn
We request the readers to figure out how one will define the notion of a proper concave function. Of course, if we consider g to be a convex function, (4.23) can be written as inf {f (x) + g(x)} = sup {−g ∗ (−λ) − f ∗ (λ)}.
x∈Rn
λ∈Rn
Note that this can be easily proved using Theorem 4.13 by taking A to be the identity mapping I : Rn → Rn . Moreover, ri(dom f ) ∩ ri(dom g) 6= ∅ shows that 0 ∈ int(dom g − dom f ). Hence the result follows by invoking Theorem 4.13. We now look into the perturbation-based approach. This approach is due to Rockafellar. Rockafellar’s monogarph [99] entitled Conjugate Duality and Optimization makes a detailed study of this method in an infinite dimensional setting. We however discuss the whole issue from a finite dimensional
© 2012 by Taylor & Francis Group, LLC
4.5 Fenchel Duality
199
viewpoint. In this approach, one considers the original problem being embedded in a family of problems. In fact, we begin by considering the convexly parameterized family of convex problems min F (x, y)
subject to
x ∈ Rn ,
(CP (y)) ¯ where the vector y is called the parameter and the function F : Rn × Rm → R is assumed to be proper convex jointly in x and y. In fact, in such a situation, the optimal value function v(y) = inf F (x, y) x∈C
is a convex function by Lemma 4.11. Of course, the function F is so chosen that f0 (x) = F (x, 0), where f (x), x ∈ C, f0 (x) = +∞, otherwise. In fact, (CP ) can be viewed as min f0 (x)
subject to
x ∈ Rn ,
thus embedding the original problem (CP ) in (CP (y)). Now we pose the standard convex optimization problem as (CP (y)). Consider the problem (CP ) with C given by (3.1), that is, C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}, where gi : Rn → R, i = 1, 2, . . . , m, are convex functions. Corresponding to (CP ), introduce the family of parameterized problems (CP (y)) as follows min F (x, y) where F (x, y) =
f (x), +∞,
It is clear that F (x, 0) = f0 (x) =
subject to
x ∈ Rn ,
gi (x) ≤ yi , i = 1, 2, . . . , m, otherwise.
f (x), +∞,
gi (x) ≤ 0, i = 1, 2, . . . , m, otherwise.
Recall that the Lagrangian function corresponding to (CP ) is given by f (x) + hλ, g(x)i, λ ∈ Rm +, L(x, λ) = +∞, otherwise. Next we look at how to construct the dual problem for (CP (y)). Define the ¯ as Lagrangian function L : Rn × Rm → R L(x, λ) = infm {F (x, y) + hy, λi}, y∈R
© 2012 by Taylor & Francis Group, LLC
200
Saddle Points, Optimality, and Duality
that is, −L(x, λ) = sup {hy, λi − F (x, y)}. y∈Rm
Observe that F ∗ (x∗ , λ∗ ) = =
sup x∈Rn ,y∈Rm ∗
{hx∗ , xi + hλ∗ , yi − F (x, y)}
sup {hx , xi + sup (hλ∗ , yi − F (x, y))}
x∈Rn
=
y∈Rm
∗
sup {hx , xi − L(x, λ∗ )}.
x∈Rn
Thus, −F ∗ (0, λ∗ ) = infn L(x, λ∗ ). x∈R
Hence the Fenchel dual problem associated with (CP ) is sup (−F ∗ (0, λ))
subject to
λ ∈ Rm .
(DPF )
With the given Lagrangian in a similar fashion as before, one can define a saddle point (x, λ) of the Lagrangian function L(x, λ). We now state without proof the following result. For proof, see for example Lucchetti [79] and Rockafellar [99]. Theorem 4.15 Consider the problem (CP ) and (DPF ) as given above. Then the following are equivalent: ¯ be a saddle point of L, (i) (¯ x, λ) ¯ is a solution for (DPF ) and there is no (ii) x ¯ is a solution for (CP ) and λ duality gap. For more details on the perturbation-based approach, see Lucchetti [79] and Rockafellar [99].
4.6
Equivalence between Lagrangian and Fenchel Duality
In the previous sections, we studied two types of duality theories, namely the Lagrangian duality and the Fenchel duality. The obvious question that comes to mind is whether the two theories are equivalent or not. It was shown by Magnanti [81] that for a convex programming problem, both these forms of
© 2012 by Taylor & Francis Group, LLC
4.6 Equivalence between Lagrangian and Fenchel Duality
201
duality coincide. We end this chapter by taking a look at the equivalence between the two duality theories based on the approach of Magnanti [81]. Consider the following convex programming problems: Lagrange:
subject to x ∈ C,
inf f (x)
inf (f1 (x) − f2 (x))
Fenchel:
subject to x ∈ C1 ∩ C2 ,
where C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, hj (x) = 0, j = 1, 2, . . . , l, x ∈ X}, f : X → R, f1 : C1 → R are convex functions; f2 : C2 → R is a concave function; gi : X → R, i = 1, 2, . . . , m, are convex non-affine functions; hj : Rn → R, j = 1, 2, . . . , l, are affine functions; and C1 , C2 , X are convex subsets of Rn . Denote the optimal values of the Lagrangian and the Fenchel convex programming problems as vL and vF , respectively. Observe that the Lagrangian problem is a particular case of (CP ) with C given by (4.9). Corresponding to the two convex programming problem, we have the following dual problems: Lagrange:
sup inf L(x, λ, µ) x∈X
subject to
sup ((f2 )∗ (ξ) − f1∗ (ξ))
Fenchel:
l (λ, µ) ∈ Rm + ×R ,
subject to ξ ∈ Rn ,
l where the Lagrangian function L : Rn × Rm + × R is defined as m X
L(x, λ, µ) = f (x) +
λi gi (x) +
i=1
l X
µj hj (x).
j=1
As fi are defined over Ci , that is, dom fi = Ci for i = 1, 2, the conjugate functions reduce to f1∗ (ξ)
=
(f2 )∗ (ξ)
=
sup {hξ, xi − f1 (x)},
x∈C1
inf {hξ, xi − f2 (x)}.
x∈C2
Denote the optimal values of the Lagrangian and the Fenchel dual problems as dL and dF , respectively. Note that f1∗ (ξ) = +∞ for some ξ ∈ Rn is a possibility. Similarly, for the concave conjugate, (f2 )∗ (ξ) = −∞ for some ξ ∈ Rn is also a possibility. But these values play no role in the Fenchel dual problem and thus the problem may be considered as Fenchel:
sup ((f2 )∗ (ξ) − f1∗ (ξ))
subject to ξ ∈ C1∗ ∩ C2∗ ,
where C1∗ = {ξ ∈ Rn : f1∗ (ξ) < +∞}
© 2012 by Taylor & Francis Group, LLC
and C2∗ = {ξ ∈ Rn : (f2 )∗ (ξ) > −∞}.
202
Saddle Points, Optimality, and Duality
By Theorem 4.7 and Theorem 4.14, we have the strong duality results for the Lagrangian and the Fenchel problems, respectively, that is, vL = d L
and
vF = d F .
Now we move on to show that the two strong dualities are equivalent. But before doing so we present a result from Magnanti [81] on relative interior. Lemma 4.16 Consider the set Λ = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ X such that f (x) ≤ y0 ,
gi (x) ≤ yi , i = 1, 2, . . . , m, hj (x) = zj , j = 1, 2, . . . , l}.
If x ˆ ∈ ri X such that f (ˆ x) < yˆ0 , gi (ˆ x) < yˆi , i = 1, 2, . . . , m, and hj (ˆ x) = zˆj , j = 1, 2, . . . , l, then (ˆ y0 , yˆ, zˆ) ∈ ri Λ. Proof. By the convexity of the functions f , gi , i = 1, 2, . . . , m, and hj , j = 1, 2, . . . , l, and the set X, it is easy to observe that the set Λ is convex. It is left to the reader to verify this fact. To prove the result, we will invoke the Prolongation Principle, Proposition 2.14 (iii). Consider (y0 , y, z) ∈ Λ, that is, there exists x ∈ X such that f (x) ≤ y0 ,
gi (x) ≤ yi , i = 1, 2, . . . , m,
and hj (x) = zj , j = 1, 2, . . . , l.
n
Because X ⊂ R is a nonempty convex set and x ˆ ∈ ri X, by the Prolongation Principle, there exists γ > 1 such that γx ˆ + (1 − γ)x ∈ X, which by the convexity of X yields that αˆ x + (1 − α)x ∈ X, ∀ α ∈ (1, γ].
(4.24)
As dom f = dom gi = X, i = 1, 2, . . . , m with x ˆ ∈ ri X, for some α ∈ (1, γ], f (αˆ x + (1 − α)x) < αˆ y0 + (1 − α)y0 , gi (αˆ x + (1 − α)x) < αˆ yi + (1 − α)yi , i = 1, 2, . . . , m.
(4.25) (4.26)
By the affineness of hj , j = 1, 2, . . . , l, hj (αˆ x + (1 − α)x) < αˆ zj + (1 − α)zj , j = 1, 2, . . . , l.
(4.27)
Combining the conditions (4.24) through (4.27) yields that for α > 1, α(ˆ y0 , yˆ, zˆ) + (1 − α)(y0 , y, z) ∈ Λ. Because (y0 , y, z) ∈ Λ was arbitrary, by the Prolongation Principle, (ˆ y0 , yˆ, zˆ) ∈ ri Λ as desired. Now we present the equivalence between the strong duality results.
© 2012 by Taylor & Francis Group, LLC
4.6 Equivalence between Lagrangian and Fenchel Duality
203
Theorem 4.17 Lagrangian strong duality is equivalent to Fenchel strong duality, that is, Theorem 4.7 is equivalent to Theorem 4.14. Proof. Suppose that the Lagrangian strong duality, Theorem 4.7, holds under the assumption of modified Slater constraint qualification, that is, there exists x ˆ ∈ ri X such that gi (ˆ x) < 0, i = 1, 2, . . . , m,
and hj (ˆ x) = 0, j = 1, 2, . . . , l.
Define X = C1 × C2 × Rn and x = (x1 , x2 , x3 ). The Fenchel convex programming problem can now be expressed as vF = inf (f1 (x1 ) − f2 (x2 )), x∈C
where C = {x ∈ R3n : hrj (x) = (xj − x3 )r = 0, j = 1, 2, r = 1, 2, . . . , n, x ∈ X}. Observe that here hj : Rn → Rn . Note that the reformulated Fenchel problem is nothing but the Lagrangian convex programming problem. The corresponding Lagrangian dual problem is as follows: dL =
sup
inf {f1 (x1 ) − f2 (x2 ) +
(µ1 ,µ2 )∈R2n x∈X
n X r=1
µr1 (x1 − x3 )r +
n X r=1
µr2 (x2 − x3 )r },
that is, dL =
sup
inf {f1 (x1 ) − f2 (x2 ) + hµ1 , x1 i + hµ2 , x2 i
(µ1 ,µ2 )∈R2n x∈X
−hµ1 + µ2 , x3 i}.
(4.28)
From the assumption of Theorem 4.14, ri(dom f1 ) ∩ ri(dom f2 ) = ri C1 ∩ ri C2 6= ∅, which implies there exists x ˆ ∈ Rn such that x ˆ ∈ ri C1 ∩ ri C2 . Therefore, x = (ˆ x, x ˆ, x ˆ) ∈ ri X such that hrj (x) = 0, j = 1, 2, r = 1, 2, . . . , n; thereby implying that the modified Slater constraint qualification holds. Invoking the Lagrangian strong duality, Theorem 4.7, vF = d L ,
(4.29)
it is easy to note that if µ1 6= −µ2 , the infimum is −∞ as x3 ∈ Rn . So taking the supremum over µ = −µ1 = µ2 along with Proposition 1.7, the Lagrangian dual problem (4.28) leads to dL
= = =
sup
inf
µ∈Rn (x1 ,x2 )∈C1 ×C2
{f1 (x1 ) − f2 (x2 ) − hµ, x1 i + hµ, x2 i}
sup { inf (hµ, x2 i − f2 (x2 )) + inf (f1 (x1 ) − hµ, x1 i)}
µ∈Rn x2 ∈C2
sup {(f2 )∗ (µ) − f1∗ (µ)},
µ∈Rn
© 2012 by Taylor & Francis Group, LLC
x1 ∈C1
204
Saddle Points, Optimality, and Duality
thereby implying that dL = dF . This along with the relation (4.29) yields that vF = dF and hence the Fenchel strong duality holds. Conversely, suppose that the Fenchel strong duality holds under the assumption that ri C1 ∩ ri C2 6= ∅. Define C1 = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ X such that f (x) ≤ y0 , gi (x) ≤ yi , i = 1, 2, . . . , m,
hj (x) = zj , j = 1, 2, . . . , l}
and C2 = {(y0 , y, z) ∈ R1+m+l : yi ≤ 0, i = 1, 2, . . . , m, zj = 0, j = 1, 2, . . . , l}. The Lagrange convex programming problem can now be expressed as vL = inf{y0 : (y0 , y, z) ∈ C1 ∩ C2 }, which is of the form of the Fenchel problem with f1 (y0 , y, z) = y0 and f2 (y0 , y, z) = 0. The corresponding Fenchel dual problem is dF
= sup{((f2 )∗ (ξ) − f1∗ (ξ)) : ξ = (λ0 , λ, µ) ∈ R1+m+l } = sup { inf {λ0 y0 + hλ, yi + hµ, zi} (λ0 ,λ,µ)∈R1+m+l (y0 ,y,z)∈C2
− =
{
sup
inf
(λ0 ,λ,µ)∈R1+m+l (y0 ,y,z)∈C1
+
sup (y0 ,y,z)∈C1
{λ0 y0 + hλ, yi + hµ, zi − y0 }
{y0 − λ0 y0 − hλ, yi − hµ, zi} inf
(y0 ,y,z)∈C2
{λ0 y0 + hλ, yi + hµ, zi}}. (4.30)
By the assumption of Theorem 4.7, the modified Slater constraint qualification holds, which implies that there exists xˆ ∈ ri X such that gi (ˆ x) < 0, i = 1, 2, . . . , m, and hj (ˆ x) = 0, j = 1, 2, . . . , l. As dom gi = X, i = 1, 2, . . . , m, by Theorem 2.69, gi , i = 1, 2, . . . , m, is continuous on ri X. Therefore, there exists yˆi < 0 such that gi (ˆ x) < yˆi < 0, i = 1, 2, . . . , m. Also, as dom f = X with x ˆ ∈ ri X, one may choose yˆ0 ∈ R such that f (ˆ x) < yˆ0 . Thus, for x ˆ ∈ ri X, f (ˆ x) < yˆ0 , gi (ˆ x) < yˆi , i = 1, 2, . . . , m, and hj (ˆ x) = zˆj , j = 1, 2, . . . , l, where zˆj = 0, j = 1, 2, . . . , l. By Lemma 4.16, (ˆ y0 , yˆ, zˆ) ∈ ri C1 ∩ ri C2 , which implies ri C1 ∩ ri C2 6= ∅. Thus, by Theorem 4.14, vL = d F . From the definition of C2 , the second infimum in (4.30) reduces to inf{λ0 y0 +
m X i=1
© 2012 by Taylor & Francis Group, LLC
λi yi : y0 ∈ R, yi ≤ 0, i = 1, 2, . . . , m}.
(4.31)
4.6 Equivalence between Lagrangian and Fenchel Duality
205
The infimum is −∞ if either λ0 6= 0 or λi > 0 for some i = 1, 2, . . . , m and takes the value 0 otherwise. Therefore, the Fenchel dual problem becomes dF
=
=
=
sup
inf
{y0 − hλ, yi − hµ, zi}
sup
inf
{y0 +
l (y0 ,y,z)∈C1 (λ,µ)∈Rm − ×R
l (y0 ,y,z)∈C1 (λ,µ)∈Rm + ×R
sup
inf {f (x) +
l x∈X (λ,µ)∈Rm + ×R
m X i=1
m X
λi yi +
i=1
λi gi (x) +
l X j=1
l X
µj zj }
µj hj (x)},
j=1
which yields that dF = dL . This along with (4.31) implies that vL = dL , thereby establishing the Lagrangian strong duality.
© 2012 by Taylor & Francis Group, LLC
Chapter 5 Enhanced Fritz John Optimality Conditions
5.1
Introduction
Until now we have studied how to derive the necessary KKT optimality conditions for convex programming problems (CP ) or its slight variations such as (CP 1), (CCP ) or (CCP 1) via normal cone or saddle point approach. Observe that in the KKT optimality conditions, the multiplier associated with the subdifferential of the objective function is nonzero and thus normalized to one. As discussed in Chapters 3 and 4, some additional conditions known as the constraint qualifications are to be satisfied by the constraints to ensure that the multiplier is nonzero and hence the KKT optimality conditions hold. But in absence a of constraint qualification, one may not be able to derive KKT optimality conditions. For example, consider the problem min x
subject to
x2 ≤ 0.
In this example, f (x) = x and g(x) = x2 with C = {0} at which none of the constraint qualifications is satisfied. Observe that the KKT optimality conditions is also not satisfied at the point of minimizer x¯ = 0, the only feasible point, as there do not exist λ0 = 1 and λ ≥ 0 such that λ0 ∇f (¯ x) + λ∇g(¯ x) = 0
and
λg(¯ x) = 0.
In this chapter, we will consider the convex programming problem min f (x)
subject to x ∈ C
(CP 1)
where (CP 1) which involves not only inequality constraints but also additional abstract constraints, that is, C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X}. Here f, gi : Rn → R, i = 1, 2, . . . , m, are convex functions on Rn and X ⊂ Rn is a closed convex set. Below we present the standard Fritz John optimality conditions for (CP 1). 207 © 2012 by Taylor & Francis Group, LLC
208
Enhanced Fritz John Optimality Conditions
Theorem 5.1 Consider the convex programming problem (CP 1) and let x ¯ be a point of minimizer of (CP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that 0 ∈ λ0 ∂f (¯ x) +
m X
λi ∂gi (¯ x) + NX (¯ x)
λi gi (¯ x) = 0, ∀ i = 1, 2, . . . , m.
and
i=1
Proof. As x ¯ is a point of minimizer of (CP 1), it is a point of minimizer of the problem min F (x)
subject to
x ∈ X,
where F (x) = max{f (x) − f (¯ x), g1 (x), g2 (x), . . . , gm (x)} is a convex function. Therefore by the optimality condition (ii) of Theorem 3.1, 0 ∈ ∂F (¯ x) + NX (¯ x). Applying the Max-Function Rule, XTheorem 2.96, there exist λ0 ≥ 0 and λi ≥ 0, i ∈ I(¯ x), satisfying λ0 + λi = 1 such that i∈I(¯ x)
0 ∈ λ0 ∂f (¯ x) +
X
λi ∂gi (¯ x) + NX (¯ x),
i∈I(¯ x)
where I(¯ x) = {i ∈ {1, 2, . . . , m} : gi (¯ x) = 0} is the active index set at x ¯. For i∈ / I(¯ x), defining λi = 0 yields 0 ∈ λ0 ∂f (¯ x) +
m X
λi ∂gi (¯ x) + NX (¯ x)
i=1
along with λi gi (¯ x) = 0, i = 1, 2, . . . , m, hence completing the proof.
Note that in the example considered earlier, the Fritz John optimality condition holds if one takes λ0 = 0 and λ > 0. Observe that the Fritz John optimality conditions are only necessary and not sufficient. To study the sufficiency optimality conditions, one needs KKT optimality conditions.
5.2
Enhanced Fritz John Conditions Using the Subdifferential
Recently, Bertsekas [11, 12] studied the Fritz John optimality conditions, which are more enhanced than those stated above and hence called them enhanced Fritz John optimality conditions. The proof of the enhanced Fritz John
© 2012 by Taylor & Francis Group, LLC
5.2 Enhanced Fritz John Conditions Using the Subdifferential
209
optimality condition involves the combination of the quadratic penalty function and metric approximation approaches. The penalty function approach is an important theoretical as well as algorithmic method in the study of constrained programming problems. Corresponding to the given problem, a sequence of the unconstrained penalized problem is formulated and in the limiting scenario, the sequence of point of minimizers of the penalized problem converges to the point of minimizer of the original constrained problem. The approach of metric approximations was introduced by Mordukhovich [84, 85]. This approach involves approximating the objective function and the constraint functions by smooth functions and reducing the constrained into an unconstrained problem. The work of Bertsekas [11, 12] was based mostly on the work of Hestenes [55], which was in turn motivated by the penalty function approach of McShane [83] to establish the Fritz John optimality conditions. It was the work of Hestenes [55] in which the complementary slackness was strengthened to obtain a somewhat weaker condition than the complementary violation condition, which we will discuss in the subsequent derivation of enhanced Fritz John optimality condition. In their works, McShane [83] and Hestenes [55] considered X = Rn while Bertsekas extended the study when X 6= Rn . Below we discuss the above approach to establish the enhanced Fritz John optimality conditions for the convex programming problem (CP 1). Theorem 5.2 Consider the convex programming problem (CP 1) and let x ¯ be the point of minimizer of (CP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that (i) 0 ∈ λ0 ∂f (¯ x) +
m X
λi ∂gi (¯ x) + NX (¯ x).
i=1
(ii) Consider the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. If I¯ 6= ∅, then there exists a sequence {xk } ⊂ X that converges to x ¯ and is such that for all k sufficiently large, f (xk ) < f (¯ x)
and
¯ λi gi (xk ) > 0, ∀ i ∈ I.
Proof. For k = 1, 2, . . ., consider the penalized problem min Fk (x)
subject to x ∈ X ∩ cl Bε (¯ x),
(Pk )
where ε > 0 is such that f (¯ x) ≤ f (x) for every x ∈ cl Bε (¯ x) feasible to (CP 1). The function Fk : Rn → R is defined as Fk (x) = f (x) +
m 1 kX + (g (x))2 + kx − x ¯k2 , 2 i=1 2
where g + (x) = max{0, g(x)}. By the convexity of the functions f and gi , i = 1, 2, . . . , m, Fk is a real-valued convex on Rn . As dom Fk = Rn , by Theorem 2.69, Fk is continuous on Rn . Also, as X is a closed convex set
© 2012 by Taylor & Francis Group, LLC
210
Enhanced Fritz John Optimality Conditions
and cl Bε (¯ x) is a compact convex set, X ∩ cl Bε (¯ x) is a compact convex subset of Rn . By the Weierstrass Theorem, Theorem 1.14, there exists a point of minimizer xk for the problem (Pk ). Therefore, Fk (xk ) ≤ Fk (¯ x), ∀ k ∈ N, which implies f (xk ) +
m 1 kX + (g (xk ))2 + kxk − x ¯k2 ≤ f (¯ x), ∀ k ∈ N. 2 i=1 i 2
(5.1)
Because dom f = Rn , again by Theorem 2.69, f is continuous on Rn . Hence, it is continuous on X ∩ cl Bε (¯ x) and thus bounded over X ∩ cl Bε (¯ x). By the boundedness of f (xk ) over X ∩ cl Bε (¯ x) and the relation (5.1), we have lim gi+ (xk ) = 0, i = 1, 2, . . . , m.
k→∞
(5.2)
Otherwise as k → +∞, the left-hand side of (5.1) also tends to infinity, which is a contradiction. As {xk } is a bounded sequence, by the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. Without loss of generality, assume that {xk } converge to x ˜ ∈ X ∩ cl Bε (¯ x). By the condition (5.2), gi (˜ x) ≤ 0, i = 1, 2, . . . , m, and hence x ˜ is feasible for (CP 1). Taking the limit as k → +∞ in the condition (5.1) yields 1 x−x ¯k2 ≤ f (¯ x). f (˜ x) + k˜ 2 As x ¯ is the point of minimizer of (CP 1) and x ˜ is feasible to (CP 1), f (¯ x) ≤ f (˜ x). Thus the above inequality reduces to k˜ x−x ¯k2 ≤ 0, which implies k˜ x−x ¯k = 0, that is, x ˜=x ¯. Hence, the sequence xk → x ¯ and ¯ thus there exists a k¯ ∈ N such that xk ∈ ri X ∩ Bε (¯ x) for every k ≥ k. ¯ xk is a point of minimizer of the penalized problem (Pk ), which For k ≥ k, by Theorem 3.1 implies that 0 ∈ ∂Fk (xk ) + NX∩Bε (¯x) (xk ). As xk ∈ ri X ∩ Bε (¯ x), by Proposition 2.39, NX∩Bε (¯x) (xk ) = NX (xk ) + NBε (¯x) (xk ). Again, because xk ∈ Bε (¯ x), by Example 2.38, NBε (¯x) (xk ) = {0} and thus ¯ 0 ∈ ∂Fk (xk ) + NX (xk ), ∀ k ≥ k.
© 2012 by Taylor & Francis Group, LLC
5.2 Enhanced Fritz John Conditions Using the Subdifferential
211
As dom f = dom gi = Rn , i = 1, 2, . . . , m, by Theorem 2.69, f and gi , i = 1, 2, . . . , m are continuous on Rn . Applying the Sum Rule and the Chain Rule for the subdifferentials, Theorems 2.91 and 2.94, respectively, the above condition becomes 0 ∈ ∂f (xk ) + k
m X i=1
¯ gi+ (xk )∂gi+ (xk ) + (xk − x ¯) + NX (xk ), ∀ k ≥ k,
¯ there exist ξ k ∈ ∂f (xk ) and ξ k ∈ ∂gi (xk ), which implies that for every k ≥ k, 0 i i = 1, 2, . . . , m, such that −{ξ0k +
m X i=1
αik ξik + (xk − x ¯)} ∈ NX (xk ),
where αik = kβk gi+ (xk ) and βk ∈ [0, 1] for i = 1, 2, . . . , m. Denote v u m X u αk 1 k t (αik )2 , λk0 = k and λki = ki , i = 1, 2, . . . , m. γ = 1+ γ γ i=1
Observe that
(λk0 )2 +
m X i=1
¯ (λki )2 = 1, ∀ k ≥ k.
(5.3)
(5.4)
(5.5)
Therefore, the sequences {λk0 } and {λki }, i = 1, 2, . . . , m, are bounded sequences in R+ and thus by the Bolzano–Weierstrass Theorem, Proposition 1.3 have a convergent subsequence. Without loss of generality, let λki → λi , ¯ i = 0, 1, . . . , m. As αik ≥ 0, i = 1, 2, . . . , m and γ k ≥ 1 for every k ≥ k, k λi ≥ 0 and thereby implying that λi ≥ 0, i = 0, 1, . . . , m. Also by condition (5.5), it is obvious that λ0 , λ1 , . . . , λm are not simultaneously zero. Now dividing (5.3) by γ k leads to −{λk0 ξ0k +
m X
λki ξik +
i=1
1 (xk − x ¯)} ∈ NX (xk ). γk
(5.6)
As f and gi , i = 1, 2, . . . , m are continuous at xk ∈ Rn , therefore by Proposition 2.82, ∂f (xk ) and ∂gi (xk ), i = 1, 2, . . . , m, are compact. Thus {ξik }, i = 0, 1, . . . , m, are bounded sequences in Rn and hence by the Bolzano– Weierstrass Theorem have a convergent subsequence. Without loss of generality, let ξik → ξi , i = 0, 1, . . . , m. By the Closed Graph Theorem, Theorem 2.84, of the subdifferentials, ξ0 ∈ ∂f (¯ x) and ξi ∈ ∂gi (¯ x) for i = 1, 2, . . . , m. Taking the limit as k → +∞ in (5.6) along with the fact that the normal cone NX has a closed graph yields −{λ0 ξ0 +
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi ξi } ∈ NX (¯ x),
212
Enhanced Fritz John Optimality Conditions
which implies 0 ∈ λ0 ∂f (¯ x) +
m X
λi ∂gi (¯ x) + NX (¯ x),
i=1
thereby establishing condition (i). Now suppose that the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is non¯ corresponding to λi > 0, there exists a sequence λk → λi . empty. For i ∈ I, i ¯ By Therefore, for all k sufficiently large, λki > 0 and hence λi λki > 0 for i ∈ I. the condition (5.4), λi gi+ (xk ) > 0 for sufficiently large k, which implies ¯ λi gi (xk ) > 0, ∀ i ∈ I. Also, by condition (5.1), f (xk ) < f (¯ x) for sufficiently large k and hence condition (ii) is satisfied, thereby yielding the requisite result. Observe that the condition (ii) in the above theorem is a condition that replaces the complementary slackness condition in the Fritz John optimality condition. According to the condition (ii), if λi > 0, the corresponding constraint gi is violated at the points arbitrarily close to x¯. Thus the condition (ii) is called the complementary violation condition by Bertsekas and Ozdaglar [13]. Now let us consider, in particular, X = Rn and gi , i = 1, 2, . . . , m, to be affine in (CP 1). Then from the above theorem there exist λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero, such that conditions (i) and (ii) hold. Due to affineness of gi , i = 1, 2, . . . , m, we have gi (x) = gi (¯ x) + h∇gi (¯ x), x − x ¯i, ∀ x ∈ Rn .
(5.7)
Suppose that λ0 = 0. Then by condition (i) of Theorem 5.2, 0=
m X i=1
λi ∇gi (¯ x),
which implies that 0=
m X i=1
λi h∇gi (¯ x), x − x ¯i.
As all the scalars cannot be all simultaneously zero, the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. By condition (ii), there exists ¯ Therefore, by (5.7), a sequence {xk } ⊂ Rn such that gi (xk ) > 0 for i ∈ I. which along with the above condition for x = xk , leads to m X
λi gi (¯ x) =
i=1
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi gi (xk ) =
X i∈I¯
λi gi (xk ) > 0,
5.2 Enhanced Fritz John Conditions Using the Subdifferential
213
¯ thereby contradicting the feasiwhich implies that gi (¯ x) > 0 for some i ∈ I, bility of x ¯. Thus λ0 > 0 and hence can be normalized to one, thereby leading to the KKT optimality condition. Observe that in the case as discussed above, the KKT optimality condition holds without any assumption of constraint qualification. But if the convex programming problem is not of the above type, to ensure that λ0 6= 0, one has to impose some form of constraint qualification. In view of the enhanced Fritz John optimality conditions, Bertsekas [12] introduced the notion of pseudonormality, which is defined as follows. Definition 5.3 A feasible point x ¯ of (CP 1) is said to be pseudonormal if there does not exist any λi , i = 1, 2, . . . , m, and sequence {xk } ⊂ X such that (i) 0 ∈
m X
λi ∂gi (¯ x) + NX (¯ x)
i=1
(ii) λi ≥ 0, i = 1, 2, . . . , m and λi = 0 for i 6∈ I(¯ x). Recall that I(¯ x) = {i ∈ {1, 2, . . . , m} : gi (¯ x) = 0} denotes the active index set at x ¯. (iii) {xk } converges to x ¯ and m X i=1
λi gi (xk ) > 0, ∀ k ∈ N.
Below we present a result to show how the affineness of gi , i = 1, 2, . . . , m, or the Slater-type constraint qualification ensure the pseudonormality at a feasible point. Theorem 5.4 Consider the problem (CP 1) and let x ¯ be a feasible point of (CP 1). Then x ¯ is pseudonormal under either one of the following two criteria: (a) Linearity criterion: X = Rn and the functions gi , i = 1, 2, . . . , m, are affine. (b) Slater-type constraint qualification: there exists a feasible point x ˆ ∈X of (CP 1) such that gi (ˆ x) < 0, i = 1, 2, . . . , m. Proof. (a) Suppose on the contrary that x ¯ is not pseudonormal, which implies that there exist λi , i = 1, 2, . . . , m, and {xk } ⊂ Rn satisfying conditions (i), (ii), and (iii) in the Definition 5.3. By the affineness of gi , i = 1, 2, . . . , m, for every x ∈ Rn , gi (x) = gi (¯ x) + h∇gi (¯ x), x − x ¯i, which implies m X
λi gi (x) =
i=1
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi gi (¯ x) +
m X i=1
λi h∇gi (¯ x), x − x ¯i, ∀ x ∈ Rn .
(5.8)
214
Enhanced Fritz John Optimality Conditions
By the conditions (i) and (ii) in the definition of pseudonormality, 0=
m X i=1
λi ∇gi (¯ x)
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m,
thereby reducing the condition (5.8) to m X i=1
λi gi (x) = 0, ∀ x ∈ Rn .
This is a contradiction of condition (iii) of Definition 5.3 at x¯. Hence, x ¯ is pseudonormal. (b) On the contrary, suppose that x ¯ is not pseudonormal. By the convexity of gi , i = 1, 2, . . . , m, for every x ∈ Rn , gi (x) − gi (¯ x) ≥ hξi , x − x ¯i, ∀ ξi ∈ ∂gi (¯ x), i = 1, 2, . . . , m.
(5.9)
By condition (i) in the definition of pseudonormality, there exist ξ¯i ∈ ∂gi (¯ x), i = 1, 2, . . . , m, such that m X i=1
λi hξ¯i , x − x ¯i ≥ 0, ∀ x ∈ X.
The above inequality along with condition (ii) reduces the condition (5.9) to m X i=1
λi gi (x) ≥ 0, ∀ x ∈ X.
(5.10)
As the Slater constraint qualification is satisfied at xˆ ∈ X, m X
λi gi (ˆ x) < 0
i=1
if λi > 0 for some i ∈ I(¯ x). Thus, the condition (5.10) holds only if λi = 0 for i = 1, 2, . . . , m. But then this contradicts condition (iii). Therefore, x¯ is pseudonormal. In Chapter 3 we derived the KKT optimality conditions under the Slater constraint qualification as well as the Abadie constraint qualification. For the convex programming problem (CP ) considered in previous chapters, the feasible set C was given by (3.1), that is, C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}. Recall that the Abadie constraint qualification is said to hold at x¯ ∈ C if TC (¯ x) = {d ∈ Rn : gi′ (¯ x, d) ≤ 0, ∀ i ∈ I(¯ x)},
© 2012 by Taylor & Francis Group, LLC
5.2 Enhanced Fritz John Conditions Using the Subdifferential
215
where I(¯ x) is the active index set at x ¯. But unlike the Slater constraint qualification, the Abadie constraint qualification need not imply pseudonormality. For better understanding, let us recall the example C = {x ∈ R : |x| ≤ 0, x ≤ 0}. From the discussion in Chapter 3, we know that the Abadie constraint qualification is satisfied at x ¯ = 0 but the Slater constraint qualification does not hold as the feasible set C = {0}. Observe that both constraints are active at x ¯. Taking the scalars λ1 = λ2 = 1 and the sequence {xk } as {1/k}, conditions (i), (ii), and (iii) in Definition 5.3 are satisfied. Thus, x¯ = 0 is not pseudonormal. The Abadie constraint qualification is also known as quasiregularity at x ¯. This condition was defined for X = Rn . The notion of quasiregularity is implied by the concept of quasinormality. This concept was introduced by Hestenes [55] for the case X = Rn . The notion of quasinormality is further implied by pseudonormality. Now if X 6= Rn , the quasiregularity at x ¯ is defined as TC (¯ x) = {d ∈ Rn : gi′ (¯ x, d) ≤ 0, ∀ i ∈ I(¯ x)} ∩ TX (¯ x). The above condition was studied by Gould and Tolle [53] and Guignard [54]. It was shown by Ozdaglar [91] and Ozdaglar and Bertsekas [92] that under the regularity (Chapter 2 end notes) of the set X, pseudonormality implies quasiregularity. They also showed that unlike the case X = Rn where quasiregularity leads to KKT optimality conditions, the concept is not enough to derive KKT conditions when X 6= Rn unless some additional conditions are assumed. For more on quasiregularity and quasinormality, readers are advised to refer to the works of Bertsekas. Next we establish the KKT optimality conditions under the pseudonormality assumptions at the point of minimizer. Theorem 5.5 Consider the convex programming problem (CP 1). Assume that x ¯ satisfies pseudonormality. Then x ¯ is a point of minimizer of (CP 1) if and only if there exist λi ≥ 0, i = 1, . . . , m, such that 0 ∈ ∂f (¯ x) +
m X
λi ∂gi (¯ x) + NX (¯ x)
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m.
i=1
Proof. Observe that the complementary slackness condition is equivalent to condition (ii) in the definition of pseudonormality. Therefore, λi = 0 for every i∈ / I(¯ x). Suppose that the multiplier λ0 associated with the subdifferential of the objective function in the enhanced Fritz John optimality condition is zero. Therefore, 0∈
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi ∂gi (¯ x) + NX (¯ x),
216
Enhanced Fritz John Optimality Conditions
that is, condition (i) of Definition 5.3 holds. As all λi ≥ 0, i = 0, 1, . . . , m, are not simultaneously zero, λi > 0 for some i ∈ I(¯ x) and thus I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Therefore, by condition (ii) of the enhanced Fritz John condition, there exists a sequence {xk } ⊂ X converging to x ¯ such that ¯ λi gi (xk ) > 0, ∀ i ∈ I, which implies m X
λi gi (xk ) > 0.
i=1
Thus condition (iii) in the definition of pseudonormality is satisfied, thereby implying that x ¯ is not pseudonormal. This contradicts the given hypothesis. Therefore, λ0 6= 0, thereby satisfying the KKT optimality conditions. The sufficiency can be worked out using the convexity of the objective function and the constraint functions along with the convexity of the set X as done in Chapter 3.
5.3
Enhanced Fritz John Conditions under Restrictions
Observe that in the problem (CP 1), the functions f and gi , i = 1, 2, . . . , m, are convex on Rn . But if these functions are convex only over the closed convex set X, the line of proof of the above theorem breaks down. Bertsekas, Ozdaglar, and Tseng [14] gave an alternative version of the enhanced Fritz John optimality conditions, which is independent of the subdifferentials. The proof given by them, which we present below relies on the saddle point theory. Theorem 5.6 Consider the convex programming problem (CP 1) where the functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex set X ⊂ Rn and let x ¯ be a point of minimizer of (CP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that (i) λ0 f (¯ x) = min{λ0 f (x) + x∈X
m X
λi gi (x)}.
i=1
(ii) Consider the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. If I¯ 6= ∅, then there exists a sequence {xk } ⊂ X that converges to x ¯ and is such that lim f (xk ) = f (¯ x)
k→∞
lim sup gi (xk ) ≤ 0, i = 1, 2, . . . , m,
and
k→∞
and for all k sufficiently large f (xk ) < f (¯ x)
© 2012 by Taylor & Francis Group, LLC
and
¯ gi (xk ) > 0, ∀ i ∈ I.
5.3 Enhanced Fritz John Conditions under Restrictions
217
Proof. For the positive integers k and r, consider the saddle point function Lk,r : X × Rm + → R defined as Lk,r (x, α) = f (x) +
m X 1 1 2 kx − x ¯ k + αi gi (x) − kαk2 . 3 k 2r i=1
For fixed αi ≥ 0, i = 1, 2, . . . , m, by the lower semicontinuity and convexity of the functions f and gi , i = 1, 2, . . . , m, over X, Lk,r (., α) is an lsc convex function while for a fixed x ∈ X, Lk,r (x, .) is strongly concave and quadratic in α. For every k, define the set ¯ k (¯ Xk = X ∩ B x). Observe that Xk is a compact set. As f and gi , i = 1, 2 . . . , m, are lsc convex on X, the functions are lsc, convex on Xk . Also, as Lk,r (x, .) is strongly concave, it has a unique maximizer over Rm + and thus for some β ∈ R, the level set {α ∈ Rm x, α) ≥ β} + : Lk,r (¯ is nonempty and compact. Thus by condition (iii) of the Saddle Point Theorem, Proposition 4.1, Lk,r has a saddle point over Xk × Rm + , say (xk,r , αk,r ). By the saddle point definition, Lk,r (xk,r , α) ≤ Lk,r (xk,r , αk,r ) ≤ Lk,r (x, αk,r ), ∀ x ∈ Xk , ∀ α ∈ Rm + . (5.11) As Lk,r (., αk,r ) attains an infimum over Xk at xk,r , Lk,r (xk,r , αk,r )
= ≤
m X 1 1 2 ¯k + αk,r i gi (xk,r ) − kαk,r k2 f (xk,r ) + 3 kxk,r − x k 2r i=1
inf {f (x) +
x∈Xk
m X 1 2 kx − x ¯ k + αk,r i gi (x)} k3 i=1
≤
inf
{f (x) +
x∈Xk ,gi (x)≤0,∀i
≤
x∈Xk ,gi (x)≤0,∀i
inf
{f (x) +
m X 1 2 kx − x ¯ k + αk,r i gi (x)} k3 i=1
1 kx − x ¯k2 }. k3
As x ¯ ∈ Xk and satisfies gi (¯ x) ≤ 0, i = 1, 2, . . . , m, the above inequalities yield Lk,r (xk,r , αk,r ) ≤ f (¯ x).
(5.12)
Again from (5.11), Lk,r (xk,r , .) attains a supremum over α ∈ Rm + at αk,r . As a function of α ∈ Rm + , Lk,r (xk,r , .) is strongly concave and quadratic, and thus, has a unique supremum at αk,r i = rgi+ (xk,r ), i = 1, 2, . . . , m.
© 2012 by Taylor & Francis Group, LLC
(5.13)
218
Enhanced Fritz John Optimality Conditions
We leave it to the readers to figure out how to compute αk,r i . Therefore, Lk,r (xk,r , αk,r ) = f (xk,r ) +
1 r kxk,r − x ¯k2 + kg + (xk,r )k2 , k3 2
(5.14)
which implies that Lk,r (xk,r , αk,r ) ≥ f (xk,r ).
(5.15)
From the conditions (5.12) and (5.15), we have xk,r ∈ {x ∈ Xk : f (x) ≤ f (¯ x)}. As Xk is compact, the set {x ∈ Xk : f (x) ≤ f (xk )} is bounded and thus {xk,r } forms a bounded sequence. In fact, we leave it to the readers to show that f is also coercive on Xk . Thus, by the Bolzano–Weierstrass Theorem, Proposition 1.3, for a fixed k the sequence {xk,r } has a convergent subsequence. Without loss of generality, let {xk,r } converge to xk ∈ {x ∈ Xk : f (x) ≤ f (¯ x)}. As f is convex and coercive on Xk , by the Weierstrass Theorem, Theorem 1.14, an infimum over Xk exists. Therefore for each k, the sequence {f (xk,r )} is bounded below by inf x∈Xk f (x). Also, by condition (5.12), Lk,r (xk,r , αk,r ) is bounded above by f (¯ x). Thus, from (5.14), lim sup gi (xk,r ) ≤ 0, i = 1, 2, . . . , m, r→∞
which along with the lower semicontinuity of gi , i = 1, 2, . . . , m, implies that gi (xk ) ≤ 0 for i = 1, 2, . . . , m, thereby yielding the feasibility of xk for (CP 1). We urge the reader to work out the details. As x ¯ is the minimizer of (CP 1), f (xk ) ≥ f (¯ x), which along with the conditions (5.12), (5.15), and the lower semicontinuity of f leads to f (¯ x) ≤ f (xk ) ≤ lim inf f (xk,r ) ≤ lim sup f (xk,r ) ≤ f (¯ x), r→∞
r→∞
which implies that for each k, lim f (xk,r ) = f (¯ x).
r→∞
By the conditions (5.12) and (5.14), we have for every k ∈ N, lim xk,r = x ¯.
r→∞
Further note that using the definition of Lk,r (xk,r , αk,r ) and using (5.12) and (5.15), for every k, lim
r→+∞
© 2012 by Taylor & Francis Group, LLC
m X i=1
αk,ri gi (xk,r ) = 0.
5.3 Enhanced Fritz John Conditions under Restrictions
219
Therefore, by the preceding conditions, lim {f (xk,r ) − f (¯ x) +
r→∞
m X
αk,r i gi (xk,r )} = 0.
(5.16)
i=1
Denote γ k,r
v u m X u (αk,r i )2 , = t1 +
λk,r 0 =
i=1
1 γ k,r
and λk,r = i
αk,r i , i = 1, 2, . . . , m. γ k,r (5.17)
Dividing (5.16) by γ k,r > 0 leads to k,r lim {λk,r x) + 0 f (xk,r ) − λ0 f (¯
r→∞
m X
λk,r i gi (xk,r )} = 0.
i=1
For each k, we fix an integer rk such that k |λk,r f (xk,rk ) 0
−
k λk,r f (¯ x) 0
+
m X i=1
k λk,r gi (xk,rk )| ≤ i
1 k
(5.18)
and kxk,rk − x ¯k ≤
1 , k
|f (xk,rk ) − f (¯ x)| ≤
1 , k
|gi+ (xk,rk )| ≤
1 , i = 1, 2, . . . , m. k (5.19)
Dividing the saddle point condition Lk,rk (xk,rk , αk,rk ) ≤ Lk,rk (x, αk,rk ), ∀ x ∈ Xk by γ k,rk yields k λk,r f (xk,rk ) 0
m k X λk,r 2 0 k ¯k + λk,r gi (xk,rk ) + 3 kxk,rk − x i k i=1 k ≤ λk,r f (x) + 0
m X 1 2 k kx − x ¯ k + λk,r gi (x), ∀ x ∈ Xk . i k 3 γ k,rk i=1
As αik,rk ≥ 0, i = 1, 2, . . . , m, from the condition (5.17), γ k,rk ≥ 1 and k λk,r ≥ 0, i = 0, 1, . . . , m, along with i k 2 (λk,r ) + 0
m X
k 2 (λk,r ) = 1. i
i=1
k Therefore, {λk,r }, i = 0, 1, . . . , m, are bounded sequences in R+ and thus i by the Bolzano–Weierstrass Theorem, Proposition 1.3, have a convergent k subsequence. Without loss of generality, assume that λk,r → λi ≥ 0, i
© 2012 by Taylor & Francis Group, LLC
220
Enhanced Fritz John Optimality Conditions
i = 0, 1, . . . , m, not all simultaneously zero. Taking the limit as k → +∞ in the above inequality along with the condition (5.18) leads to λ0 f (¯ x) ≤ λ0 f (x) +
m X i=1
λi gi (x), ∀ x ∈ X,
which implies λ0 f (¯ x) ≤
inf {λ0 f (x) +
x∈X
≤
x∈X,gi (x)≤0,∀i
≤
x∈X,gi (x)≤0,∀i
inf
inf
m X
λi gi (x)}
i=1
{λ0 f (x) + λ0 f (x)
m X
λi gi (x)}
i=1
= λ0 f (¯ x). Therefore, λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero, satisfy condition (i), that is, λ0 f (¯ x) = inf {λ0 f (x) + x∈X
m X
λi gi (x)}.
i=1
Next suppose that the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is ¯ there is a sequence λk,rk → λi nonempty. Corresponding to λi > 0 for i ∈ I, i k,rk such that λi > 0, i = 1, 2, . . . , m, which along with the condition (5.13) implies gi (xk,rk ) > 0, ∀ i ∈ I¯ for sufficiently large k. For each k, choosing rk such that xk,rk 6= x ¯ and the condition (5.19) is satisfied, implies that xk,rk → x ¯,
f (xk,rk ) → f (¯ x),
gi+ (xk,rk ) → 0, i = 1, 2, . . . , m.
Also, by the condition (5.12), f (xk,rk ) ≤ f (¯ x), thereby proving (ii) and hence establishing the requisite result.
Similar to the pseudonormality notion defined earlier, the notion is stated as below for the enhanced Fritz John conditions obtained in Theorem 5.6. Definition 5.7 The constraint set of (CP 1) is said to be pseudonormal if there do not exist any scalars λi ≥ 0, i = 1, 2, . . . , m, and a vector x′ ∈ X such that
© 2012 by Taylor & Francis Group, LLC
5.3 Enhanced Fritz John Conditions under Restrictions (i) 0 = inf
x∈X
(ii)
m X
m X
221
λi gi (x),
i=1
λi gi (x′ ) > 0.
i=1
For a better understanding of the above definition of pseudonormality, we recall the idea of proper separation from Definition 2.25. A hyperplane H is said to separate two convex sets F1 and F2 properly if sup ha, x1 i ≤ inf ha, x2 i
x1 ∈F1
and
x2 ∈F2
inf ha, x1 i < sup ha, x2 i.
x1 ∈F1
x2 ∈F2
Now consider a set G = {g(x) = (g1 (x), g2 (x), . . . , gm (x)) : x ∈ X}. Then from Definition 5.7 it is easy to observe that pseudonormality implies that there exists no hyperplane H that separates the set G and the origin {0} properly. Similar to Theorem 5.4, the pseudonormality of the constraint set can be derived under the Linearity criterion or the Slater constraint qualification. Theorem 5.8 Consider the problem (CP 1). Then the constraint set is pseudonormal under either one of the following two criteria: (a) Linearity criterion: X = Rn and the functions gi , i = 1, 2, . . . , m, are affine. (b) Slater--type constraint qualification: there exists a feasible point x ˆ∈X of (CP 1) such that gi (ˆ x) < 0, i = 1, 2, . . . , m. Proof. (a) Suppose on the contrary that the constraint set is not pseudonormal, which implies that there exist λi ≥ 0, i = 1, 2, . . . , m, and a vector x′ ∈ Rn satisfying conditions (i) and (ii) in the Definition 5.7. Suppose that x ¯ ∈ Rn is feasible to (CP 1), that is, gi (¯ x) ≤ 0, i = 1, 2, . . . , m, which along with condition (i) yields m X λi gi (¯ x) = 0. (5.20) i=1
By the affineness of gi , i = 1, 2, . . . , m, gi (x) = gi (¯ x) + h∇gi (¯ x), x − x ¯i, ∀ x ∈ Rn , which again by condition (i) and (5.20) implies m X i=1
© 2012 by Taylor & Francis Group, LLC
λi h∇gi (¯ x), x − x ¯i ≥ 0, ∀ x ∈ Rn .
222
Enhanced Fritz John Optimality Conditions
λ
X = Rn λ 0
0
G
G H
H
Linearity criterion
Slater criterion
FIGURE 5.1: Pseudonormality. Pm x) ∈ NRn (¯ By Definition 2.36 of the normal cone, i=1 λi ∇gi (¯ x). As x ¯ ∈ Rn , n by Example 2.38, the normal cone NR (¯ x) = {0}, which implies m X i=1
λi ∇gi (¯ x) = 0.
This equality along with the condition (5.20) and the affineness of gi , i = 1, 2, . . . , m implies that m X i=1
λi gi (x) = 0, ∀ x ∈ Rn ,
thereby contradicting condition (ii) in the definition of pseudonormality. Hence the constraint set is pseudonormal. (b) Suppose on the contrary that the constraint set is not pseudonormal. As the Slater-type constraint qualification holds, there exists xˆ ∈ X such that gi (ˆ x) < 0, i = 1, 2, . . . , m, condition (i) is satisfied only if λi = 0, i = 1, 2, . . . , m, which contradicts condition (ii) in Definition 5.7. Therefore, the constraint set is pseudonormal. In case the Slater-type constraint qualification is satisfied, the set G intersects the orthant {x ∈ Rm : xi ≤ 0, i = 1, 2, . . . , m} as shown in Figure 5.1. Then obviously condition (i) in the definition of pseudonormality does not hold; that is, there exists no hyperplane H passing through origin supporting G such that G lies in the positive orthant. Now when one has the linearity criterion, that is, X = Rn and gi , i = 1, 2, . . . , m, are affine, the set G is also affine (see Figure 5.1) and thus, condition (ii) is violated; that is, the hyperplane H does not contain the set G completely. In the linearity criterion, if X is a polyhedron instead of X = Rn along with gi , i = 1, 2, . . . , m, being affine,
© 2012 by Taylor & Francis Group, LLC
5.3 Enhanced Fritz John Conditions under Restrictions
223
G
λ
H 0 FIGURE 5.2: Not pseudonormal.
pseudonormality need not hold as shown in Figure 5.2. These observations were made by Bertsekas, Ozdaglar, and Tseng [14]. We end this section by establishing the KKT optimality conditions similar to Theorem 5.5, under the pseudonormality of the constraint set. Theorem 5.9 Consider the convex programming problem (CP 1) where the functions f and gi , i = 1, 2, . . . , m are lsc and convex on the closed convex set X ⊂ Rn . Assume that the constraint set is pseudonormal. Then x ¯ is a point of minimizer of (CP 1) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that m X λi gi (x)} and λi gi (¯ x) = 0, i = 1, 2, . . . , m. f (¯ x) = min{f (x) + x∈X
i=1
Proof. Suppose that in the enhanced Fritz John optimality condition, Theorem 5.6, λ0 = 0. This implies 0 = min x∈X
m X
λi gi (x),
i=1
that is, λi ≥ 0, i = 1, 2, . . . , m, satisfies condition (i) in the definition of pseudonormality of the constraint set. As in the enhanced Fritz John condition λi , i = 0, 1, . . . , m, are not all simultaneously zero, there exists at least one i ∈ {1, 2, . . . , m} such that λi > 0, that is, I¯ is nonempty. Again by Theorem 5.6, there exist a sequence {xk } ⊂ X such that ¯ gi (xk ) > 0, ∀ i ∈ I,
which implies X i∈I¯
© 2012 by Taylor & Francis Group, LLC
λi gi (xk ) =
m X i=1
λi gi (xk ) > 0,
224
Enhanced Fritz John Optimality Conditions
that is, satisfying condition (ii) in Definition 5.7, thereby contradicting the fact that the constraint sets are pseudonormal. Thus, λ0 6= 0 and hence can be taken in particular as one, thereby establishing the optimality condition. Using the optimality condition along with the feasibility of x¯ leads to 0≤
m X i=1
λi gi (¯ x) ≤ 0,
that is, m X
λi gi (¯ x) = 0.
i=1
As the sum of nonpositive terms is zero if every term is zero, the above equality leads to λi gi (¯ x) = 0, i = 1, 2, . . . , m, thereby establishing the complementary slackness condition. Conversely, by the optimality condition, f (¯ x) ≤ f (x) +
m X i=1
λi gi (x), ∀ x ∈ X.
In particular, for any x feasible to (CP 1), that is, x ∈ X satisfying gi (x) ≤ 0, i = 1, 2, . . . , m, the above inequality reduces to f (¯ x) ≤ f (x), thus proving that x ¯ is a point of minimizer of (CP 1).
5.4
Enhanced Fritz John Conditions in the Absence of Optimal Solution
Up to now in this chapter, one observes two forms of enhanced Fritz John optimality conditions, one when the functions are convex over the whole space Rn while in the second scenario convexity of the functions is over the convex set X 6= Rn . The results obtained in Section 5.3 are in a form similar to strong duality. In all the results of enhanced Fritz John and KKT optimality conditions, it is assumed that the point of minimizer exists. But what if the convex programming problem (CP 1) has an infimum that is not attained? In such a case is it possible to establish a Fritz John optimality condition that can then be extended to KKT optimality conditions under the pseudonormality
© 2012 by Taylor & Francis Group, LLC
5.4 Enhanced Fritz John Conditions in the Absence of Optimal Solution 225 condition? The answer is yes and we present a result from Bertsekas [12] and Bertsekas, Ozdaglar, and Tseng [14] to establish the enhanced Fritz John optimality conditions similar to those derived in Section 5.3. But in the absence of a point of minimizer of (CP 1), the multipliers are now dependent on the infimum, as one will observe in the theorem below. Theorem 5.10 Consider the convex programming problem (CP 1) where the functions f and gi , i = 1, 2, . . . , m, are convex on the convex set X ⊂ Rn and let finf < +∞ be the infimum of (CP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that λ0 finf = inf {λ0 f (x) + x∈X
m X
λi gi (x)}.
i=1
Proof. If the infimum finf = −∞, then by the condition inf f (x) ≤
x∈X
inf
x∈X,gi (x)≤0,∀i
f (x) = finf ,
inf x∈X f (x) = −∞. Thus for λ0 = 1 and λi = 0, i = 1, 2, . . . , m, the requisite condition is obtained. Now suppose that finf is finite. To establish the Fritz John optimality condition we will invoke supporting hyperplane theorem. For that purpose, define a set in Rm+1 as M = {(d0 , d) ∈ R × Rm : there exists x ∈ X such that f (x) ≤ d0 , gi (x) ≤ di , i = 1, 2, . . . , m}. We claim that M is a convex set. For j = 1, 2, consider (dj0 , dj ) ∈ M, which implies that there exist xj ∈ X such that f (xj ) ≤ dj0
and
gi (xj ) ≤ dji , i = 1, 2, . . . , m.
As X is a convex set, for every µ ∈ [0, 1], y = µx1 + (1 − µ)x2 ∈ X. Also by the convexity of f and gi , i = 1, 2, . . . , m, f (y) ≤ µf (x1 ) + (1 − µ)f (x2 ) ≤ µd10 + (1 − µ)d20 , gi (y) ≤ µgi (x1 ) + (1 − µ)gi (x2 ) ≤ µd1i + (1 − µ)d2i , i = 1, 2, . . . , m, which implies that µ(d10 , d1 ) + (1 − µ)(d20 , d2 ) ∈ M for every µ ∈ [0, 1]. Hence M is a convex subset of R × Rm . Next we prove that (finf , 0) ∈ / int M. On the contrary, suppose that (finf , 0) ∈ int M, which by Definition 2.12 implies that there exists ε > 0 such that (finf − ε, 0) ∈ M. Thus, there exists x ∈ X such that f (x) ≤ finf − ε
and
gi (x) ≤ 0, i = 1, 2, . . . , m.
From the above condition it is obvious that x is a feasible point of (CP 1), thereby contradicting the fact that finf is the infimum of the problem (CP 1).
© 2012 by Taylor & Francis Group, LLC
226
Enhanced Fritz John Optimality Conditions
Hence (finf , 0) 6∈ int M. By the supporting hyperplane theorem, Theorem 2.26 (i), there exists (λ0 , λ) ∈ R × Rm with (λ0 , λ) 6= (0, 0) such that λ0 finf ≤ λ0 d0 +
m X i=1
λi di , ∀ (d0 , d) ∈ M.
(5.21)
Let (d0 , d) = (d0 , d1 , . . . , dm ) ∈ M. Then for αi > 0, (d0 , . . . , di−1 , di + αi , di+1 , . . . , dm ) ∈ M, i = 0, 1, . . . , m. If for some i ∈ {0, 1, . . . , m}, λi < 0, then as the corresponding αi → +∞, it leads to a contradiction of (5.21). Therefore, λi ≥ 0 for i = 0, 1, . . . , m. It is easy to observe that (f (x), g(x)) = (f (x), g1 (x), g2 (x), . . . , gm (x)) ∈ M for any x ∈ X. Therefore, the condition (5.21) becomes λ0 finf ≤ λ0 f (x) +
m X i=1
λi gi (x), ∀ x ∈ X,
which implies λ0 finf
≤
inf {λ0 f (x) +
x∈X
≤
x∈X,gi (x)≤0,∀i
≤
x∈X,gi (x)≤0,∀i
inf
inf
m X
λi gi (x)}
i=1
{λ0 f (x) + λ0 f (x)
m X
λi gi (x)}
i=1
= λ0 finf , thereby leading to the Fritz John optimality condition, as desired.
Note that in Theorem 5.10, there is no complementary slackness condition. Under the Slater-type constraint qualification, that is, there exists xˆ ∈ X such that gi (ˆ x) < 0, i = 1, 2, . . . , m, it can be ensured that λ0 6= 0. Otherwise if λ0 = 0, from the Fritz John optimality condition, there exist λi ≥ 0, i = 1, 2, . . . , m, not all simultaneously zero such that m X i=1
λi gi (x) ≥ 0, ∀ x ∈ X,
which contradicts the Slater-type constraint qualification. This discussion can be stated as follows. Theorem 5.11 Consider the convex programming problem (CP 1) where the functions f and gi , i = 1, 2, . . . , m, are convex on the convex set X ⊂ Rn and let finf < +∞ be the infimum of (CP 1). Assume that the Slater-type
© 2012 by Taylor & Francis Group, LLC
5.4 Enhanced Fritz John Conditions in the Absence of Optimal Solution 227 constraint qualification holds. Then there exist λi ≥ 0, i = 1, 2, . . . , m, such that finf = inf {f (x) + x∈X
m X
λi gi (x)}.
i=1
In Theorem 5.10, the Fritz John optimality condition is established in the duality form in the absence of any point of minimizer of (CP 1) but at the cost of the complementary slackness condition. Note that in Theorems 5.10 and 5.11, one requires the set X to be convex, but need not be closed. The enhanced Fritz John optimality condition similar to Theorem 5.6 has also been obtained in this scenario by Bertsekas, Ozdaglar, and Tseng [14] and Bertsekas [12]. The proof is similar to that of Theorem 5.6 but complicated as the point of minimizer does not exist. Theorem 5.12 Consider the convex programming problem (CP 1) where the functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex set X ⊂ Rn and let finf < +∞ be the infimum of (CP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that (i) λ0 finf = inf {λ0 f (x) + x∈X
m X
λi gi (x)}.
i=1
(ii) Consider the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. If I¯ = 6 ∅, then there exists a sequence {xk } ⊂ X such that lim f (xk ) = finf
k→∞
lim sup gi (xk ) ≤ 0, i = 1, 2, . . . , m
and
k→∞
and for all k sufficiently large f (xk ) < finf
and
¯ gi (xk ) > 0, ∀ i ∈ I.
Proof. If for every x ∈ X, f (x) ≥ finf , then the result holds for λ0 = 1 and λi = 0, i = 1, 2, . . . , m. Now suppose that there exists an x ¯ ∈ X such that f (¯ x) < finf , thereby implying that finf is finite. Consider the minimization problem min f (x)
subject to
gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ Xk .
(CP 1k )
In (CP 1k ), Xk is a closed convex subset of Rn defined as ¯ βk (0), ∀ k ∈ N Xk = X ∩ B
and β > 0 is chosen to be sufficiently large such that for every k, the constraint set {x ∈ Xk : gi (x) ≤ 0, i = 1, 2, . . . , m}
© 2012 by Taylor & Francis Group, LLC
228
Enhanced Fritz John Optimality Conditions
is nonempty. As f and gi , i = 1, 2, . . . , m, are lsc convex on X, they are lsc convex and coercive on Xk . Thus by the Weierstrass Theorem, Theorem 1.14, the problem (CP 1k ) has a point of minimizer, say x ¯k . As k → ∞, Xk → X and thus f (¯ xk ) → finf . Because Xk ⊂ X, finf ≤ f (¯ xk ). Define δk = f (¯ xk ) − finf . Observe that δk ≥ 0 for every k. If δk = 0 for some k, then x ¯ k ∈ Xk ⊂ X is a point of minimizer of (CP 1) and the result holds by Theorem 5.6 with finf = f (¯ xk ). Now suppose that δk > 0 for every k. For positive integers k and positive scalars r, consider the function Lk,r : Xk × Rm + → R given by Lk,r (x, α) = f (x) +
m X kαk2 δk2 2 kx . − x ¯ k + αi gi (x) − k 2 4k 2r i=1
Observe that the above function is similar to the saddle point function con1 sidered in Theorem 5.6 except that the term 3 kx − x ¯k2 is now replaced by k δk2 kx − x ¯k k2 . In Theorem 5.6, x ¯ is a point of minimizer of (CP 1) whereas 4k 2 here the infimum is not attained and thus the term involves x¯k , the point of minimizer of the problem (CP 1k ) and δk . Now working along the lines of proof of Theorem 5.6, Lk,r has a saddle point over Xk × Rm + , say (xk,r , αk,r ), which by the saddle point definition implies Lk,r (xk,r , α) ≤ Lk,r (xk,r , αk,r ) ≤ Lk,r (x, αk,r ), ∀ x ∈ Xk , ∀ α ∈ Rm + . (5.22) As Lk,r (., αk,r ) attains an infinimum over Xk at xk,r , Lk,r (xk,r , αk,r ) ≤ f (¯ xk ).
(5.23)
Also, from (5.22), L(xk,r , α) attains a supremum over α ∈ Rm + at αk,r i = rgi+ (xk,r ), i = 1, 2, . . . , m.
(5.24)
Lk,r (xk,r , αk,r ) ≥ f (xk,r ).
(5.25)
Therefore, Further, as in the proof of Theorem 5.6,
lim f (xk,r ) = f (¯ xk ).
r→∞
Note that in the proof, the problem (CP 1k ) is considered instead of (CP 1) as in Theorem 5.6 and hence the condition obtained involves the point of minimizer of (CP 1k ), that is, x ¯k . Now as δk = f (¯ xk )−finf , the above equality leads to lim f (xk,r ) = finf + δk . (5.26) r→∞
Now before continuing with the proof to obtain the multipliers for the Fritz John optimality condition, we present a lemma from Bertsekas, Ozdaglar, and Tseng [14].
© 2012 by Taylor & Francis Group, LLC
5.4 Enhanced Fritz John Conditions in the Absence of Optimal Solution 229 √ Lemma 5.13 For sufficiently large k and every r ≤ 1/ δk , f (xk,r ) ≤ finf −
δk . 2
(5.27)
√ Furthermore, there exists a scalar rk ≥ 1/ δk such that f (xk,rk ) = finf −
δk . 2
(5.28)
Proof. Define δ = finf − f (¯ x), where x ¯ ∈ X is such that f (¯ x) < finf . For sufficiently large k, x ¯ ∈ Xk . As x ¯k is the point of minimizer of the problem (CP 1k ), f (¯ xk ) ≥ finf such that f (¯ xk ) → finf , thus for sufficiently large k, f (¯ xk ) − finf < finf − f (¯ x), which implies δk < δ. By the convexity of f over X and that of Xk ⊂ X, for λk ∈ [0, 1], f (yk ) ≤ λk f (¯ x) + (1 − λk )f (¯ xk ) = λk (finf − δ) + (1 − λk )(finf + δk ) = finf − λk (δk + δ) + δk ,
where yk = λk x ¯ +(1−λk )¯ xk . Because 0 ≤ δk < δ, 0 ≤ λk =
2δk in the above condition yields δk + δ
2δk < 1. Substituting δk + δ
f (yk ) ≤ finf − δk .
(5.29)
Again by the convexity assumptions of gi , i = 1, 2, . . . , m, along with the 2δk , feasibility of x ¯k for (CP 1k ), for λk = δk + δ gi (yk ) ≤ λk gi (¯ x) + (1 − λk )gi (¯ xk ) 2δk gi (¯ x), i = 1, 2, . . . , m. ≤ δk + δ From the saddle point condition (5.22) along with (5.24) and (5.25), f (xk,r ) ≤ Lk,r (xk,r , αk,r ) δ2 r = inf {f (x) + k2 kx − x ¯k k2 + kg + (x)k2 }. x∈Xk 4k 2 ¯ βk (0), As x, x ¯ k ∈ Xk ⊂ B kx − x ¯k k ≤ kxk + k¯ xk ≤ 2βk,
© 2012 by Taylor & Francis Group, LLC
(5.30)
230
Enhanced Fritz John Optimality Conditions
thereby reducing the preceding inequality to r f (xk,r ) ≤ f (x) + (βδk )2 + kg + (x)k2 , ∀ x ∈ Xk . 2 In particular, taking x = yk ∈ Xk in the above condition, which along with (5.29) and (5.30) implies that for sufficiently large k, 2rδk2 kg + (¯ x)k2 . (δk + δ)2 √ For sufficiently large k, δk → 0 and for every r ≤ 1/ δk , the above inequality reduces to (5.27). Now by the saddle point condition (5.22), which along with (5.24) implies that f (xk,r ) ≤ finf − δk + (βδk )2 +
Lk,r (xk,r , αk,r )
r δk2 kxk,r − x ¯k k2 + kg + (xk,r )k2 2 4k 2 δk2 r 2 = inf {f (x) + 2 kx − x ¯k k + kg + (x)k2 }. x∈Xk 4k 2
=
f (xk,r ) +
Consider r¯ > 0. Then for every r ≥ r¯, Lk,¯r (xk,¯r , αk,¯r )
= ≤ ≤ = ≤
δk2 r¯ kx − x ¯k k2 + kg + (x)k2 } x∈Xk 4k 2 2 δ2 r¯ f (xk,r ) + k2 kxk,r − x ¯k k2 + kg + (xk,r )k2 4k 2 r δ2 f (xk,r ) + k2 kxk,r − x ¯k k2 + kg + (xk,r )k2 4k 2 Lk,r (xk,r , αk,r ) r δ2 ¯k k2 + kg + (xk,¯r )k2 . f (xk,¯r ) + k2 kxk,¯r − x 4k 2 inf {f (x) +
Thus as r ↓ r¯, Lk,r (xk,r , αk,r ) → Lk,¯r (xk,¯r , αk,¯r ). Now for r ≤ r¯, f (xk,¯r ) +
δk2 r kxk,¯r − x ¯k k2 + kg + (xk,¯r )k2 4k 2 2 δ2 r¯ ≤ f (xk,¯r ) + k2 kxk,¯r − x ¯k k2 + kg + (xk,¯r )k2 4k 2 = Lk,¯r (xk,¯r , αk,¯r ) δ2 r¯ ≤ f (xk,r ) + k2 kxk,r − x ¯k k2 + kg + (xk,r )k2 4k 2 δ2 r r¯ − r + kg (xk,r )k2 = f (xk,r ) + k2 kxk,r − x ¯k k2 + kg + (xk,r )k2 + 4k 2 2 r¯ − r + kg (xk,r )k2 = Lk,r (xk,r , αk,r ) + 2 δ2 r r¯ − r + kg (xk,r )k2 . ≤ f (xk,¯r ) + k2 kxk,¯r − x ¯k k2 + kg + (xk,¯r )k2 + 4k 2 2
© 2012 by Taylor & Francis Group, LLC
5.4 Enhanced Fritz John Conditions in the Absence of Optimal Solution 231 For every k, as gi , i = 1, 2, . . . , m, is lsc and coercive on Xk , {gi (xk,r )} is bounded below by inf gi (x), which exists by the Weierstrass Theorem, Thex∈Xk
orem 1.14. Therefore as r ↑ r¯, Lk,r (xk,r , αk,r ) → Lk,¯r (xk,¯r , αk,¯r ), which along with the previous case of r ↓ r¯ leads to the continuity of Lk,r (xk,r , αk,r ) in r, that is, as r → r¯, Lk,r (xk,r , αk,r ) → Lk,¯r (xk,¯r , αk,¯r ). By the conditions (5.23) and (5.25) xk,r belongs to the compact set {x ∈ Xk : f (x) ≤ f (¯ xk )} for every k and therefore {xk,r } is a bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, as r → r¯, it has convergent subsequence. Without loss of generality, let xk,r → x ˆk , where x ˆk ∈ {x ∈ Xk : f (x) ≤ f (¯ xk )}. The continuity of Lk,r (xk,r , αk,r ) in r along with the lower semicontinuity of f and gi , i = 1, 2, . . . , m, leads to Lk,¯r (xk,¯r , αk,¯r )
=
lim Lk,r (xk,r , αk,r )
r→¯ r
r δk2 kxk,r − x ¯k k2 + kg + (xk,r )k2 } r→¯ r 4k 2 2 r¯ δ2 xk − x ¯k k2 + kg + (ˆ ≥ f (ˆ xk ) + k2 kˆ xk )k2 4k 2 δ2 r¯ ≥ inf {f (x) + k2 kx − x ¯k k2 + kg + (x)k2 } x∈Xk 4k 2 = Lk,¯r (xk,¯r , αk,¯r ), =
lim {f (xk,r ) +
which implies x ˆk is the point of minimizer of f (x) +
δk2 r¯ kx − x ¯k k2 + kg + (x)k2 4k 2 2
over Xk . As a strict convex function has unique point of minimizer and f (x) + r¯ δk2 kx − x ¯k k2 + kg + (x)k2 is strictly convex, x ˆk = xk,¯r . 4k 2 2 We claim that f (xk,r ) → f (xk,¯r ) as r → r¯. As f is lsc, we will prove the upper semicontinuity of f in r. On the contrary, suppose that f (xk,¯r ) < lim supr→¯r f (xk,r ). As r → r¯, Lk,r (xk,r , αk,r ) → Lk,¯r (xk,¯r , αk,¯r ) and xk,r → x ˆk = xk,¯r , which implies that f (xk,¯r ) +
δk2 r¯ kxk,¯r − x ¯k k2 + lim inf kg + (xk,r )k2 2 r→¯ r 4k 2 < lim sup Lk,r (xk,r , αk,r ) r→¯ r
= Lk,¯r (xk,¯r , αk,¯r ) r¯ δ2 ¯k k2 + kg + (xk,¯r )k2 . = f (xk,¯r ) + k2 kxk,¯r − x 4k 2 But the above inequality is a contradiction of the lower semicontinuity of gi , i = 1, 2, . . . , m. Therefore, f (xk,r ) is continuous in r. Now by (5.26), for sufficiently large k, lim f (xk,r ) = finf + δk .
r→+∞
© 2012 by Taylor & Francis Group, LLC
232
Enhanced Fritz John Optimality Conditions
Therefore, taking ε =
3δk , for r sufficiently large, 2
|f (xk,r ) − (finf + δk )|
√ . As f (xk,r ) is continuous in r, by the δk √ Intermediate Value Theorem, there exists rk ≥ 1/ δk such that f (xk,r ) = finf − that is, (5.28) holds.
δk , 2
Now we continue proving the theorem. From the conditions (5.23) and (5.25), m X δk2 2 f (xk,r ) ≤ Lk,r (xk,r , αk,r ) ≤ inf {f (x) + 2 kx − x ¯k k + αk,r i gi (x)} x∈Xk 4k i=1
= f (xk,r ) + ≤ f (¯ xk ).
© 2012 by Taylor & Francis Group, LLC
m X δk2 2 kx − x ¯ k + αk,r i gi (xk,r ) k,r k 4k 2 i=1
5.4 Enhanced Fritz John Conditions in the Absence of Optimal Solution 233 1 For r = rk ≥ √ , the above condition along with (5.28) and the fact that as δk k → +∞, f (¯ xk ) → finf and δk → 0 imply that lim f (xk,rk ) − finf +
k→∞
m X δk2 2 kx − x ¯ k + αk,rk i gi (xk,rk ) = 0. k,rk k 4k 2 i=1
(5.32)
Define v u m X u 2 αk,r , γk = t1 + ki
λk0 =
i=1
1 , γk
λki =
αk,rk i , i = 1, 2, . . . , m. γk
(5.33)
As αk,rk ∈ Rm + , δk ≥ 1 for every k. Therefore, dividing (5.32) by γk along with the relation (5.33) leads to lim
k→∞
λk0 f (xk,rk )
−
λk0 finf
m X δk2 λk0 2 kxk,rk − x + ¯k k + λki gi (xk,rk ) = 0. (5.34) 4k 2 i=1
By the saddle point condition (5.22), Lk,rk (xk,rk , αk,rk ) ≤ Lk,rk (x, αk,rk ), ∀ x ∈ Xk . Dividing the above inequality throughout by γk along with the fact that kx − x ¯k k ≤ 2βk for every x ∈ Xk implies that λk0 f (xk,rk ) +
m X δk2 λk0 2 kx − x ¯ k + λki gi (xk,rk ) k,r k k 4k 2 i=1
≤ λk0 f (x) +
m X δk2 λk0 2 kx − x ¯ k + λki gi (x). k 4k 2 i=1
From (5.33), for every k, λki ≥ 0, i = 0, 1, . . . , m, such that (λk0 )2 +
m X
(λki )2 = 1.
i=1
Therefore {λki }, i = 1, 2, . . . , m are bounded sequences and hence, by the Bolzano–Weierstrass Theorem, Proposition 1.3, have convergent subsequences. Without loss of generality, assume that λki → λi , i = 1, 2, . . . , m, with λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero. Taking the limit as k → +∞ in the above inequality along with (5.34) leads to λ0 finf ≤ λ0 f (x) +
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi gi (x), ∀ x ∈ X.
234
Enhanced Fritz John Optimality Conditions
Therefore, λ0 finf
≤
inf {λ0 f (x) +
x∈X
≤
x∈X,gi (x)≤0,∀ i
≤
x∈X,gi (x)≤0,∀i
inf
inf
m X
λi gi (x)}
i=1
{λ0 f (x) +
λ0 f (x)
m X
λi gi (x)}
i=1
= λ0 finf , which leads to condition (i). Now dividing the condition (5.24) by γk , which along with (5.33) leads to λki =
rk gi+ (xk,rk ) , i = 1, 2, . . . , m. γk
As k → +∞, rk gi+ (xk,rk ) , i = 1, 2, . . . , m. k→∞ γk
λi = lim
In the beginning of the proof, we assumed that there exists x¯ ∈ X satisfying f (¯ x) < finf , which along with the condition (i) implies that the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Otherwise, if I¯ is empty, then inf λ0 f (x) ≤ λ0 f (¯ x) < λ0 finf ,
x∈X
¯ λi > 0, which implies there which is a contradiction to condition (i). For i ∈ I, k k exists a sequence λi → λi such that λi > 0. Therefore, gi+ (xk,rk ) > 0, that is, ¯ gi (xk,rk ) > 0, ∀ i ∈ I. 1 In particular, for r = rk ≥ √ , conditions (5.23) and (5.24) yield δk f (xk,rk ) +
rk + δ2 rk kg (xk,rk )k2 ≤ f (xk,rk ) + k2 kxk,rk − x ¯k k2 + kg + (xk,rk )k2 2 4k 2 ≤ f (¯ xk ),
which along with (5.28) and the relation δk = f (¯ xk ) − finf ≥ 0 implies that rk kg + (xk,rk )k2 ≤ 3δk . √ As k → +∞, δk → 0 and rk ≥ 1/ δk → ∞, the above inequality leads to g + (xk,rk ) → 0, that is, lim sup gi (xk,rk ) ≤ 0, i = 1, 2, . . . , m. k→∞
© 2012 by Taylor & Francis Group, LLC
5.5 Enhanced Dual Fritz John Optimality Conditions
235
Also, from the condition (5.28), f (xk,rk ) < finf
and
lim f (xk,rk ) = finf .
k→∞
Thus, the condition (ii) is satisfied by the sequence {xk,rk } ⊂ X, thereby yielding the desired result. Under the Slater-type constraint qualification, the multiplier λ0 can be ensured to be nonzero and hence can be normalized to one.
5.5
Enhanced Dual Fritz John Optimality Conditions
In this chapter we emphasize the enhanced Fritz John conditions. As observed in Section 5.4, we dealt with the situation where the infimum of the original problem (CP 1) exists but is not attained. Those results were extended to the dual scenario where the dual problem has a supremum but not attained by Bertsekas, Ozdaglar and Tseng [14]. Now corresponding to the problem (CP 1), the associated dual problem is sup w(λ)
subject to
λ ∈ Rm +
(DP 1)
where w(λ) = inf L(x, λ) with x∈X
m f (x) + X λ g (x), i i L(x, λ) = i=1 −∞,
λ ∈ Rm +, otherwise.
Before presenting the enhanced dual Fritz John optimality condition, we first prove a lemma that will be required in establishing the theorem. Lemma 5.14 Consider the convex programming problem (CP 1) where the functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the convex set X ⊂ Rn and (DP 1) is the associated dual problem. Let finf < +∞ be the infimum of (CP 1) and for every δ > 0, assume that fδ =
inf
x∈X,gi (x)≤δ,∀i
f (x).
Then the supremum of (DP 1), wsup , satisfies fδ ≤ wsup for every δ > 0 and wsup = lim fδ . δ↓0
Proof. For the problem (CP 1), as the infimum finf exists and finf < +∞, the feasible set of (CP 1) is nonempty, that is, there exists x ¯ ∈ X satisfying gi (¯ x) ≤ 0, i = 1, 2, . . . , m. Thus for δ > 0, the problem
© 2012 by Taylor & Francis Group, LLC
236 inf f (x)
Enhanced Fritz John Optimality Conditions gi (x) ≤ δ, i = 1, 2, . . . , m,
subject to
x ∈ X,
(CP 1δ )
satisfies the Slater-type constraint qualification as x¯ ∈ X with gi (¯ x) < δ, i = 1, 2, . . . , m. Therefore, by Theorem 5.11, there exist λδi ≥ 0, i = 1, 2, . . . , m, such that fδ
= ≤
inf {f (x) +
x∈X
inf {f (x) +
x∈X
δ
m X i=1
m X
λδi gi (x) − δ
m X i=1
λδi }
λδi gi (x)}
i=1
= w(λ ) ≤ sup w(λ) = wsup . λ∈Rm +
Therefore, for every δ > 0, fδ ≤ wsup and hence lim fδ ≤ wsup .
(5.35)
δ↓0
Now as δ → 0, the feasible region of (CP 1δ ) decreases and thus fδ is nondecreasing as δ ↓ 0 and for every δ > 0, fδ ≤ finf . This leads to two cases, either limδ→0 fδ > −∞ or limδ→0 fδ = −∞. If limδ→0 fδ > −∞, then fδ > −∞ for every δ > 0 sufficiently small. For those δ > 0, choosing xδ ∈ X such that gi (xδ ) ≤ δ, i = 1, 2, . . . , m, and f (xδ ) ≤ fδ + δ. Such xδ are called almost δ-solution of (CP 1), the concept that will be dealt with in Chapter 10. Therefore for λ ∈ Rm +, w(λ)
=
inf {f (x) +
x∈X
≤ f (xδ ) +
m X
m X
λi gi (x)}
i=1
λi gi (xδ ) i=1 m X
≤ fδ + δ + δ
λi .
i=1
Taking the limit as δ → 0 in the above inequality leads to w(λ) ≤ lim fδ , ∀ λ ∈ Rm +, δ→0
which implies wsup ≤ limδ→0 fδ . If limδ→0 fδ = −∞, then for δ > 0, choose xδ ∈ X such that gi (xδ ) ≤ δ, i = 1, 2, . . . , m, and f (xδ ) ≤ −1/δ. As in the previous case, for λ ∈ Rm +, w(λ) ≤
© 2012 by Taylor & Francis Group, LLC
m X −1 +δ λi , δ i=1
5.5 Enhanced Dual Fritz John Optimality Conditions
237
which leads to w(λ) = −∞ for every λ ∈ Rm + as δ ↓ 0 and hence, wsup = −∞ = limδ→0 fδ . From both these cases along with the condition (5.35), the requisite result is established. Finally, we present the enhanced dual Fritz John optimality conditions obtained by Bertsekas, Ozdaglar, and Tseng [14], which are expressed with respect to the supremum of the dual problem (DP 1). Theorem 5.15 Consider the convex programming problem (CP 1) where the functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex set X ⊂ Rn and (DP 1) is the associated dual problem. Let finf < +∞ be the infimum of (CP 1) and wsup > −∞ be the supremum of (DP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that (i) λ0 wsup = inf {λ0 f (x) + x∈X
m X
λi gi (x)}.
i=1
(ii) Consider the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. If I¯ = 6 ∅, then there exists a sequence {xk } ⊂ X such that lim f (xk ) = wsup
k→∞
lim sup gi (xk ) ≤ 0, i = 1, 2, . . . , m,
and
k→∞
and for all k sufficiently large f (xk ) < wsup
and
¯ gi (xk ) > 0, ∀ i ∈ I.
Proof. By the weak duality, wsup ≤ finf , which along with the hypothesis implies that finf and wsup are finite. For k = 1, 2, . . ., consider the problem: 1 , i = 1, 2, . . . , m, x ∈ X. (CP 1k ) k4 k By Lemma 5.14, the infimum finf of (CP 1k ) satisfies the condition k finf ≤ wsup for every k. For each k, consider x ˆk ∈ X such that min f (x)
subject to
f (ˆ xk ) ≤ wsup +
1 k2
gi (x) ≤
and
gi (ˆ xk ) ≤
1 , i = 1, 2, . . . , m. k4
(5.36)
Now consider another problem: min f (x)
subject to
gi (x) ≤
1 , i = 1, 2, . . . , m, k4
ˆk , x∈X
ˆ 1k ) (CP
ˆ k = X ∩ {x ∈ Rn : kxk ≤ k(maxj=1,...,k kˆ where X xj k + 1)} is a compact set. By the lower semicontinuity and convexity of f and gi , i = 1, 2, . . . , m, ˆ k . Therefore, by the over X, the functions are lsc convex and coercive on X ˆ 1k ) has a point of minimizer, say Weierstrass Theorem, Theorem 1.14, (CP ˆ 1k ) which leads to x ¯k . From (5.36), x ˆk is feasible for (CP f (¯ xk ) ≤ f (ˆ xk ) ≤ wsup +
© 2012 by Taylor & Francis Group, LLC
1 . k2
(5.37)
238
Enhanced Fritz John Optimality Conditions
For every k, define the Lagrangian function as Lk (x, α) = f (x) +
m X i=1
αi gi (x) −
kαk2 2k
and the set ˆ k ∩ {x ∈ Rn : gi (x) ≤ k, i = 1, 2, . . . , m}. Xk = X
(5.38)
For a fixed α ∈ Rm + , Lk (., α) is lsc convex and coercive on Xk by the lower ˆk semicontinuity convexity and coercivity of f and gi , i = 1, 2, . . . , m, on X whereas for a given x ∈ Xk , Lk (x, .) is quadratic negative definite in α. Then by the Saddle Point Theorem, Proposition 4.1, Lk has a saddle point over Xk × Rm + , say (xk , αk ), that is, Lk (xk , α) ≤ Lk (xk , αk ) ≤ L(x, αk ), ∀ x ∈ Xk , ∀ α ∈ Rm +. Because Lk (xk , .) is quadratic negative definite, it attains a supremum over Rm + at αik = kgi+ (xk ), i = 1, 2, . . . , m. (5.39) Also, as Lk (., αk ) attains the infimum over Xk at xk , which along with (5.37), (5.38) and (5.39) implies Lk (xk , αk )
= f (xk ) +
m X i=1
≤ f (xk ) + = ≤
m X
αik gi (xk ) −
kαk2 2k
αik gi (xk )
i=1
inf {f (x) + k
x∈Xk
inf
m X
gi+ (xk )gi (x)}
i=1
x∈Xk ,gi (x)≤ k14 ,∀i
{f (x) + k
m X
gi+ (xk )gi (x)}.
i=1
As xk ∈ Xk , gi (xk ) ≤ k for i = 1, 2, . . . , m. Therefore, the above inequality leads to m Lk (xk , αk ) ≤ inf 1 {f (x) + 2 } k x∈Xk ,gi (x)≤ k4 ,∀i m = f (¯ xk ) + 2 k m+1 ≤ wsup + . (5.40) k2 Due to the finiteness of wsup , there exists a sequence {µk } ⊂ Rm + satisfying w(µk ) → wsup
© 2012 by Taylor & Francis Group, LLC
and
kµk k2 → 0, 2k
(5.41)
5.5 Enhanced Dual Fritz John Optimality Conditions
239
which is ensured by choosing µk as the point of maximizer of the problem max w(α)
kαk ≤ k 1/3 , α ∈ Rm +.
subject to
Thus for every k, Lk (xk , αk )
= ≥ =
sup inf Lk (x, α)
x∈Xk α∈Rm +
sup inf Lk (x, α)
x∈X α∈Rm +
sup { inf {f (x) +
x∈X α∈Rm +
m X i=1
αi gi (x)} −
kαk2 } 2k
kαk2 = sup {w(α) − } 2k α∈Rm + ≥ w(µk ) −
kµk k2 . 2k
(5.42)
From the conditions (5.40) and (5.42), w(µk ) −
kµk k2 2k
≤ f (xk ) + ≤ f (xk ) +
m X i=1
m X
αik gi (xk ) −
kαk k2 2k
αik gi (xk )
i=1
m+1 . ≤ wsup + k2
(5.43)
Taking the limit as k → +∞ in the above inequality, which along with (5.41) implies that m X lim {f (xk ) − wsup + αik gi (xk )} = 0. (5.44) k→∞
i=1
Define
v u m X u (αik )2 , γk = t1 + i=1
λk0 =
1 γk
λki =
and
αik , i = 1, 2, . . . , m. (5.45) γk
Rm +,
As αk ∈ from the above condition it is obvious that γk ≥ 1 for every k and thus dividing (5.44) by γk yields lim {λk0 (f (xk ) − wsup ) +
k→∞
m X
λki gi (xk )} = 0.
i=1
As xk minimizes Lk (., αk ) over Xk , f (xk ) +
m X i=1
© 2012 by Taylor & Francis Group, LLC
αik gi (xk ) ≤ f (x) +
m X i=1
αik gi (x), ∀ x ∈ Xk ,
(5.46)
240
Enhanced Fritz John Optimality Conditions
which on dividing throughout by γk leads to λk0 f (xk )
+
m X
λki gi (xk )
i=1
≤
λk0 f (x)
+
m X i=1
λki gi (x), ∀ x ∈ Xk .
From the condition (5.45), (λk0 )2 +
m X
(λki )2 = 1,
i=1
which implies that the sequences {λki } ⊂ R+ , i = 0, 1, . . . , m, are bounded and thus by Bolzano–Weierstrass Theorem, Proposition 1.3, have a convergent subsequence. Without loss of generality, let λki → λi with λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero. Therefore, as k → +∞ in the preceding inequality, which along with (5.46) yields λ0 wsup ≤ λ0 f (x) +
m X i=1
λi gi (x), ∀ x ∈ X,
which leads to λ0 wsup ≤ inf {λ0 f (x) + x∈X
m X
λi gi (x)}.
(5.47)
i=1
If λ0 > 0, then from the above inequality (5.47), wsup ≤ inf {f (x) + x∈X
m X λi gi (x)} = w(λ/λ0 ) ≤ wsup , λ i=1 0
thereby satisfying condition (i). If λ0 = 0, then the relation (5.45) reduces to 0 ≤ inf
x∈X
m X
λi gi (x).
i=1
As finf exists and is finite, the feasible set of (CP 1) is nonempty, which implies that there exists x ∈ X satisfying gi (x) ≤ 0, i = 1, 2, . . . , m. Therefore, the above condition becomes 0 = inf
x∈X
m X
λi gi (x).
i=1
Therefore for both cases, condition (i) holds, that is, λ0 wsup = inf {λ0 f (x) + x∈X
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi gi (x)}.
5.5 Enhanced Dual Fritz John Optimality Conditions
241
Now suppose that the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Dividing the condition (5.39) throughout by γk and using (5.45), λki =
kgi+ (xk ) , i = 1, 2, . . . , m. δk
As k → +∞, λki → λi , i = 1, 2, . . . , m, thereby reducing the above equality to kgi+ (xk ) , i = 1, 2, . . . , m. k→∞ δk
λi = lim
¯ λi > 0, which implies for sufficiently large k, g + (xk ) > 0, that For any i ∈ I, i is, ¯ gi (xk ) > 0, ∀ i ∈ I. From the inequalities (5.43), for every k, k(f (xk ) − wsup ) + k By the condition (5.39), equality becomes
Pm
i=1
m X i=1
αik gi (xk ) ≤
αik gi (xk ) =
k(f (xk ) − wsup ) +
m X i=1
m+1 . k
1 kαk k2 . Therefore, the above ink
(αik )2 ≤
m+1 . k
Dividing the above inequality throughout by γk2 , which along with (5.45) implies that m
m+1 k(f (xk ) − wsup ) X k 2 + , (λi ) ≤ γk2 kγk2 i=1 which as k → +∞ yields lim sup k→∞
m X k(f (xk ) − wsup ) ≤− λ2i . 2 γk i=1
As I¯ is nonempty, the above inequality leads to lim sup k→∞
k(f (xk ) − wsup ) < 0, γk2
which for sufficiently large k implies that f (xk ) < wsup .
© 2012 by Taylor & Francis Group, LLC
(5.48)
242
Enhanced Fritz John Optimality Conditions
Now from (5.41) and (5.43), lim {f (xk ) − wsup +
k→∞
m X i=1
kαk k2 = 0, k→∞ 2k
αik gi (xk )} − lim
which by the condition (5.44) implies that kαk k2 = 0. k→∞ 2k lim
(5.49)
The condition (5.39) along with (5.41) and (5.43) leads to lim (f (xk ) − wsup ) +
k→∞
kαk k2 = 0, 2k
which together with (5.48) implies that f (xk ) → wsup . Also, (5.49) along with (5.39) and (5.48) yields lim k
k→∞
m X
(gi+ (xk ))2 = 0,
i=1
which shows that lim sup gi (xk ) ≤ 0. k→∞
¯ the sequence {xk } ⊂ X satisfies condition (ii), thereby Thus for nonempty I, establishing the desired result.
© 2012 by Taylor & Francis Group, LLC
Chapter 6 Optimality without Constraint Qualification
6.1
Introduction
In the last few chapters we saw how fundamental the role of constraint qualification is like the Slater constraint qualification in convex optimization. In Chapter 3 we saw that a relaxation of the Slater constraint qualification to the Abadie constraint qualification leads to an asymptotic version of the KKT conditions for the nonsmooth convex programming problems. Thus it is interesting to ask whether it is possible to develop necessary and sufficient optimality conditions for (CP ) without any constraint qualifications. Recently a lot of work has been done in this respect in the form of sequential optimality conditions. But to the best of our knowledge the first step in this direction was taken by Ben-Tal, Ben-Israel, and Zlobec [7]. They obtained the necessary and sufficient optimality conditions in the smooth scenario in the absence of constraint qualifications. This work was extended to the nonsmooth scenario by Wolkowicz [112]. All these studies involved direction sets, which we will discuss below. So before moving on with the discussion of the results derived by Ben-Tal, Ben-Israel, and Zlobec [7], and Wolkowicz [112], we present the notion of direction sets. Before that we introduce the definition of a blunt cone. A set K ⊂ Rn is said to be a cone (Definition 2.18) if λx ∈ K
whenever
λ ≥ 0 and x ∈ K,
whereas K is a blunt cone if K is a cone without origin, that is, 0∈ /K
and
λx ∈ K
if x ∈ K and λ > 0.
For example, R2+ \ {(0, 0)} is a blunt cone while the set K ⊂ R2 given as K = {(x, y) ∈ R2 : x = y} is not a blunt cone. Definition 6.1 Let φ : Rn → R be a given function and let x ¯ ∈ Rn be any given point. Then the set Dφrelation (¯ x) = {d ∈ Rn : there exists α ¯ > 0 such that
φ(¯ x + αd) relation φ(¯ x), ∀ α ∈ (0, α]}, ¯ 243
© 2012 by Taylor & Francis Group, LLC
244
Optimality without Constraint Qualification
where the relation can be =, ≤, . In particular, the set Dφ= is called the cone of directions of constancy that was considered by Ben-Tal, Ben-Israel, and Zlobec [7]. The other direction sets were introduced in the work of Wolkowicz [112]. We present certain examples of computing explicitly the set Dφ= (¯ x) from Ben-Tal, Ben-Israel, and Zlobec [7]. For a strictly convex function φ : Rn → R, Dφ= (¯ x) = {0} for any x ¯ ∈ Rn . Another interesting example from Ben-Tal, Ben-Israel, and Zlobec [7] is the cone of the directions of constancy for the so-called faithfully convex function given as φ(x) = h(Ax + b) + ha, xi + β, where h : Rn → R is strictly convex, A is an m × n matrix, b ∈ Rm , a ∈ Rn , and β ∈ R. The class of faithfully convex functions is quite broad, comprising all the strictly convex functions and quadratic convex functions. See Rockafellar [98] for more details. In the case of faithfully convex functions, A = Dφ (¯ x) = N ull = {d ∈ Rn : Ad = 0, ha, di = 0}, aT where N ull(S) is the null space of the matrix S. It is obvious that the null space is contained in Dφ= . For the sake of completeness, we provide an explanation for the reverse containment. We consider the following cases. 1. Ad = 0: Then by the definition of direction of constancy, ha, di = 0. 2. ha, di = 0: Suppose that d ∈ Dφ= (¯ x), which implies there exists α ¯ >0 such that h(A¯ x + αAd + b) = h(A¯ x + b), ∀ α ∈ (0, α]. ¯ Suppose Ad 6= 0, then A¯ x + αAd + b 6= A¯ x + b for every α ∈ (0, α]. ¯ Now two cases arise. If h(A¯ x + αAd ˆ + b) = h(A¯ x + b) for some α ˆ ∈ (0, α], ¯ then by the strict convexity of h, for every λ ∈ (0, 1), h(A¯ x + λˆ αAd + b) < (1 − λ)h(A¯ x + b) + λh(A¯ x + αAd ˆ + b), which implies h(A¯ x + αAd + b) < h(A¯ x + b), ∀ α ∈ (0, α) ˆ and hence, d ∈ / Dφ= (¯ x).
The second case is that h(A¯ x +αAd+b) 6= h(A¯ x +b) for every α ∈ (0, α]. ¯ x), which violates our assumption. Then again it implies that d 6∈ Dφ= (¯ Therefore, for d to be a direction of constancy, Ad = 0 .
© 2012 by Taylor & Francis Group, LLC
6.1 Introduction
245
3. Ad 6= 0, ha, di 6= 0: This implies d 6= 0. We will show that φ is strictly convex on the line segment [¯ x, x ¯ + αd]. ¯ Consider xi = x ¯ + αi d, i = 1, 2, where αi ∈ [0, α] ¯ and α1 6= α2 . Therefore x1 6= x2 . By the strict convexity of h, for every λ ∈ (0, 1), h(A(λx1 + (1 − λ)x2 ) + b) < λh(Ax1 + b) + (1 − λ)h(Ax2 + b), and by linearity of ha, .i, ha, λx1 + (1 − λ)x2 i = λha, x1 i + (1 − λ)ha, x2 i. Combining the above two conditions, φ(λx1 + (1 − λ)x2 ) < λφ(x1 ) + (1 − λ)φ(x2 ), ∀ λ ∈ (0, 1). This condition holds for every x1 , x2 ∈ [¯ x, x ¯ + αd] and thus φ is strictly convex on [¯ x, x ¯ + αd]. Hence as mentioned earlier, Dφ= (¯ x) = {0} for the strictly convex function φ. But this contradicts the fact that d 6= 0. Combining the above cases, we have Dφ= (¯ x) = {d ∈ Rn : Ad = 0, ha, di = 0}. Below we present some results on the direction sets that will be required in deriving the optimality conditions from Ben-Tal, Ben-Israel, and Zlobec [7], Ben-Tal and Ben-Israel [6], and Wolkowicz [112]. Proposition 6.2 (i) Consider a function φ : Rn → R and x ¯ ∈ Rn . Then Dφ= (¯ x) ⊂ {d ∈ Rn : φ′ (¯ x, d) = 0}. (ii) Consider a differentiable convex function φ : Rn → R and x ¯ ∈ Rn . Then = Dφ (¯ x) is a convex cone. (iii) Consider a convex function φ : Rn → R and x ¯ ∈ Rn . Then Dφ≤ (¯ x) is a < convex cone while Dφ (¯ x) is a convex blunt open cone. Also x) = {d ∈ Rn : φ′ (¯ x, d) ≤ 0} Dφ≤ (¯
and
Dφ< (¯ x) = {d ∈ Rn : φ′ (¯ x, d) < 0}.
(iv) Consider a convex function φ : Rn → R and x ¯ ∈ Rn . Assume that < Dφ (¯ x) 6= ∅ (equivalently 0 ∈ / ∂φ(¯ x)). Then (Dφ≤ (¯ x))◦ = cone ∂φ(¯ x). x), which implies there exists α ¯ > 0 such that Proof. (i) Consider d ∈ Dφ= (¯ φ(¯ x + αd) = φ(¯ x), ∀ α ∈ (0, α]. ¯
© 2012 by Taylor & Francis Group, LLC
246
Optimality without Constraint Qualification
Therefore, from the above condition, lim α↓0
φ(¯ x + αd) − φ(¯ x) = 0, α
which implies φ′ (¯ x, d) = 0, thereby yielding the desired result. (ii) Consider d ∈ Dφ= (¯ x), which implies there exists α ¯ > 0 such that φ(¯ x + αd) = φ(¯ x), ∀ α ∈ (0, α]. ¯ The above condition can be rewritten as φ(¯ x + α′ d′ ) = φ(¯ x), ∀ α′ ∈ (0, α ¯ ′ ], α ¯ α and α ¯ ′ = for any λ > 0. Also 0 ∈ Dφ= (¯ x). Therefore, λ λ = = λd ∈ Dφ (¯ x) for every λ ≥ 0 and hence Dφ (¯ x) is a cone. Now consider d1 , d2 ∈ Dφ= (¯ x). Then for i = 1, 2, there exists α ¯ i > 0 such that where d′ = λd, α′ =
φ(¯ x + αi di ) = φ(¯ x), ∀ αi ∈ (0, α ¯ i ]. Taking α ¯ = min{¯ α1 , α ¯ 2 } > 0, for i = 1, 2 the above condition becomes φ(¯ x + αdi ) = φ(¯ x), ∀ α ∈ (0, α]. ¯
(6.1)
For any λ ∈ [0, 1], consider d = λd1 + (1 − λ)d2 . The convexity of φ along with (6.1) on d1 and d2 yields φ(¯ x + αd)
=
φ(λ(¯ x + αd1 ) + (1 − λ)(¯ x + αd2 ))
≤ λφ(¯ x + αd1 ) + (1 − λ)φ(¯ x + αd2 ), ∀ α ∈ (0, α], ¯ that is, φ(¯ x + αd) ≤ φ(¯ x), ∀ α ∈ (0, α]. ¯
(6.2)
Again, by the convexity of φ for the differentiable case, for every α ∈ (0, α], ¯ ≥ φ(¯ x) + αh∇φ(¯ x), di
φ(¯ x + αd)
= φ(¯ x) + αλh∇φ(¯ x), d1 i + α(1 − λ)h∇φ(¯ x), d2 i.
(6.3)
For a differentiable convex function, φ′ (¯ x, d) = h∇φ(¯ x), di for any d ∈ Rn . Thus the relation in (i) becomes Dφ= (¯ x) ⊂ {d ∈ Rn : h∇φ(¯ x), di = 0}, which reduces the inequality (6.3) to φ(¯ x + αd) ≥ φ(¯ x), ∀ α ∈ (0, α] ¯
© 2012 by Taylor & Francis Group, LLC
6.1 Introduction
247
as d1 , d2 ∈ Dφ= (¯ x). This inequality along with the condition (6.2) implies that d ∈ Dφ= (¯ x), thereby leading to the convexity of Dφ= . (iii) We will prove the result for Dφ< (¯ x). Consider d ∈ Dφ< (¯ x), which implies there exists α ¯ > 0 such that φ(¯ x + αd) < φ(¯ x), ∀ α ∈ (0, α]. ¯ As done in (ii), the above inequality can be rewritten as φ(¯ x + α′ d′ ) < φ(¯ x), ∀ α′ ∈ (0, α ¯ ′ ], α ¯ α and α for any λ > 0. Note that 0 6∈ Dφ< (¯ ¯′ = x). λ λ < < Therefore, λd ∈ Dφ (¯ x) for every λ > 0 and hence Dφ (¯ x) is a blunt cone. Now consider d1 , d2 ∈ Dφ< (¯ x). Working along the lines of the proof in (ii), for i = 1, 2, φ(¯ x + αdi ) < φ(¯ x), ∀ α ∈ (0, α]. ¯ (6.4) where d′ = λd, α′ =
For any λ ∈ [0, 1], let d = λd1 + (1 − λ)d2 . The convexity of φ along with the condition (6.4) on d1 and d2 yields φ(¯ x + αd)
≤ λφ(¯ x + αd1 ) + (1 − λ)φ(¯ x + αd2 ) < φ(¯ x), ∀ α ∈ (0, α], ¯
thereby implying the convexity of Dφ< . From Definition 6.1, it is obvious that Dφ< is open by using the continuity of φ. Consider d ∈ Dφ< (¯ x), which implies that there exists α ¯ > 0 such that φ(¯ x + αd) < φ(¯ x), ∀ α ∈ (0, α]. ¯ By the convexity of φ, for every α ∈ (0, α], ¯ αhξ, di ≤ φ(¯ x + αd) − φ(¯ x) < 0, ∀ ξ ∈ ∂φ(¯ x). As dom φ = Rn , by Theorem 2.79 and Proposition 2.83, the directional derivative is the support function of the subdifferential, which along with the compactness of ∂φ is attained at some ξ ∈ ∂φ(¯ x). Thus φ′ (¯ x, d) < 0, which leads to Dφ< (¯ x) ⊂ {d ∈ Rn : φ′ (¯ x, d) < 0}. (6.5) Now consider d ∈ Rn such that φ′ (¯ x, d) < 0, that is, lim α↓0
φ(¯ x + αd) − φ(¯ x) < 0. α
Therefore, there exists α ¯ > 0 such that φ(¯ x + αd) < φ(¯ x), ∀ α ∈ (0, α], ¯
© 2012 by Taylor & Francis Group, LLC
248
Optimality without Constraint Qualification
which implies d ∈ Dφ< (¯ x), thereby establishing the equality in the relation (6.5). Working along the above lines of proof, readers are advised to prove the result for Dφ≤ (¯ x). (iv) As Dφ< (¯ x) is nonempty, (iii) implies that φ′ (¯ x, d) < 0, ∀ d ∈ Dφ< (¯ x). Because dom φ = Rn , by Theorem 2.79, the directional derivative acts as a support function of the subdifferential, which along with the above relation is equivalent to 0 6∈ ∂φ(¯ x). Therefore, by Proposition 3.4, cone ∂φ(¯ x) is closed. The proof can be worked along the lines of Proposition 3.9 by replacing S(¯ x) ≤ b by Dφ (¯ x) and cl S(¯ x) by the closed set cone ∂φ(¯ x).
Note that unlike (iii) where the relation holds as equality, one is able to prove only inclusion in (i) and not equality. For example, consider the strict convex function φ : R → R defined as φ(x) = x2 . For x ¯ = 0, Dφ= (¯ x) = {0} and ∇φ(¯ x) = 0. Observe that {d ∈ R : h∇φ(¯ x), di = 0} = R 6= {0} = Dφ= (¯ x). Hence, the equality need not hold in (i) even for a differentiable function. Also, for a differentiable function φ : Rn → R, if there are n linearly independent vectors di ∈ Dφ= (¯ x), i = 1, 2, . . . , n, then ∇φ(¯ x) = 0. Observe that one needs the differentiability assumption only in (ii). A careful look at the proof of (ii) shows that to prove the reverse inequality in (6.2), we make use of (i) under differentiability. So if φ is nondifferentiable, to prove the result one needs to assume that for some ξ ∈ ∂φ(¯ x), φ′ (¯ x, d) = hξ, di = 0 for every d ∈ Dφ= (¯ x). For a better understanding, we illustrate with an example from Ben-Tal, BenIsrael, and Zlobec [7]. Consider a convex nondifferentiable function φ : R2 → R defined as φ(x1 , x2 ) = max{x1 , x2 }. For x ¯ = (0, 0), ∂φ(¯ x) = co {(1, 0), (0, 1)} and Dφ= (¯ x) = {(d, 0) ∈ R2 : d ≤ 0} ∪ {(0, d) ∈ R2 : d ≤ 0}, which is not convex. Note that h(ξ¯1 , ξ¯2 ), (d, 0)i = 0 for ξ¯ = (0, 1) whereas ¯ ξ˜ ∈ ∂φ(¯ h(ξ˜1 , ξ˜2 ), (0, d)i = 0 for ξ˜ = (1, 0), that is, φ′ (¯ x, d) = 0 for ξ, x) with ˜ ξ¯ 6= ξ.
With all these discussions on the direction sets, we move on to study the work done by Ben-Tal, Ben-Israel, and Zlobec [7].
© 2012 by Taylor & Francis Group, LLC
6.2 Geometric Optimality Condition: Smooth Case
6.2
249
Geometric Optimality Condition: Smooth Case
Ben-Tal, Ben-Israel, and Zlobec [7] established the necessary and sufficient optimality conditions for (CP ) with the feasible set C given by (3.1), that is, C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}, in the absence of any constraint qualifications in the smooth scenario. The result relates the point of minimizer of (CP ) with the inconsistency of a system. We present the result below. Throughout we will assume that the active index set I(¯ x) = {i ∈ {1, 2, . . . , m} : gi (¯ x) = 0} is nonempty. Theorem 6.3 Consider the convex programming problem (CP ) with C given by (3.1). Let f and gi , i = 1, 2, . . . , m, be differentiable convex functions. Then x ¯ is a point of minimizer of (CP ) if and only if for every subset Ω ⊂ I(¯ x), the system h∇f (¯ x, di < 0, h∇gi (¯ x), di < 0, i ∈ Ω, (CPΩ ) d ∈ Di= (¯ x), i ∈ Ω∗ = I(¯ x)\Ω is inconsistent where Di= (¯ x) = Dg=i (¯ x) for i ∈ Ω∗ . It is important to note that ∗ for Ω = I(¯ x), Ω = ∅ and then by convention we will consider d ∈ Rn .
Proof. We will prove that the negation of the result, that is, x¯ is not a point of minimizer of (CP ) if and only if there exists some subset Ω ⊂ I(¯ x) such that the system (CPΩ ) is consistent. Suppose that x ¯ is not a point of minimizer of (CP ), which implies that there exists a feasible point x˜ ∈ C of (CP ) such that f (˜ x) < f (¯ x). Therefore, by the convexity of the differentiable functions f and gi , i ∈ I(¯ x), Theorem 2.81, h∇f (¯ x), x ˜−x ¯i ≤ f (˜ x) − f (¯ x) < 0, h∇gi (¯ x), x ˜−x ¯i ≤ gi (˜ x) − gi (¯ x) ≤ 0, i ∈ I(¯ x), which implies for d = x ˜−x ¯, the system h∇f (¯ x), di h∇gi (¯ x), di
< 0, ≤ 0, i ∈ I(¯ x).
Define the subset Ω of I(¯ x) as Ω = {i ∈ I(¯ x) : h∇gi (¯ x), di < 0}. Therefore, d satisfies the system h∇f (¯ x), di h∇gi (¯ x), di
h∇gi (¯ x), di
© 2012 by Taylor & Francis Group, LLC
< 0, < 0, i ∈ Ω, =
0, i ∈ Ω∗ .
250
Optimality without Constraint Qualification
We claim that for every i ∈ Ω∗ , Di= (¯ x) = {d ∈ Rn : h∇gi (¯ x), di = 0}. By Proposition 6.2 (i), Di= (¯ x) ⊂ {d ∈ Rn : gi′ (¯ x, d) = 0} = {d ∈ Rn : h∇gi (¯ x), di = 0}.
(6.6)
Thus, to establish our claim, we will prove the reverse inclusion in the condition (6.6). Consider any i ∈ Ω∗ . Define a differentiable convex function Gi : R → R as Gi (λ) = gi (¯ x + λd). Therefore, ∇Gi (λ)
Gi (λ + δ) − Gi (λ) δ↓0 δ gi (¯ x + (λ + δ)d) − gi (¯ x + λd) , = lim δ↓0 δ
=
lim
which for λ = 0 along with the fact that i ∈ Ω∗ implies that ∇Gi (λ) = lim δ↓0
gi (¯ x + (λ + δ)d) = h∇gi (¯ x), di = 0. δ
By Proposition 2.75, ∇Gi is a nondecreasing over λ > 0, that is, ∇Gi (λ) ≥ ∇Gi (0) = 0, ∀ λ > 0. Therefore, Gi is a nondecreasing function over λ > 0, which implies that λ = 0 is a point of minimizer of Gi . Hence, gi (¯ x + λd) = Gi (λ) ≥ Gi (0) = 0, ∀ λ > 0
(6.7)
as i ∈ Ω∗ ⊂ I(¯ x). As x ˜=x ¯ + d is feasible to (CP ), for i ∈ Ω∗ , gi (¯ x + d) ≤ 0. Thus, for λ = 1, the condition (6.7) reduces to gi (¯ x + d) = 0. By the convexity of gi , gi (¯ x + λd)
= gi ((1 − λ)¯ x + λ(¯ x + d)) ≤ (1 − λ)gi (¯ x) + λgi (¯ x + d) = 0, ∀ λ ∈ (0, 1),
which by (6.7) yields gi (¯ x + λd) = 0, ∀ λ ∈ (0, 1]. Thus, d ∈ Di= (¯ x). Because d ∈ {d ∈ Rn : h∇gi (¯ x), di = 0} was arbitrary, Di= (¯ x) ⊃ {d ∈ Rn : h∇gi (¯ x), di = 0}, thereby proving the claim. As the claim holds for every i ∈ Ω∗ , x ¯ is not a point of minimizer of (CP ) implies that the system (CPΩ ) is consistent.
© 2012 by Taylor & Francis Group, LLC
6.2 Geometric Optimality Condition: Smooth Case
251
Conversely, suppose that the system (CPΩ ) is consistent for some subset Ω ⊂ I(¯ x), that is, h∇f (¯ x, di < 0, h∇gi (¯ x), di < 0, d∈
Di= (¯ x),
i ∈ Ω,
i ∈ Ω∗ = I(¯ x)\Ω.
(6.8) (6.9) (6.10)
From the inequality (6.8), lim α↓0
f (¯ x + αd) − f (¯ x) < 0, α
which implies there exists α ¯ f > 0 such that f (¯ x + αd) < f (¯ x), ∀ α ∈ (0, α ¯ f ].
(6.11)
Similarly, from the condition (6.9), there exist α ¯i > 0, i ∈ Ω such that gi (¯ x + αd) < gi (¯ x) = 0, ∀ α ∈ (0, α ¯ i ], i ∈ Ω.
(6.12)
From (6.10), d ∈ Di= (¯ x), i ∈ Ω∗ , which by Definition 6.1 implies that there ∗ exist α ¯ i > 0, i ∈ Ω such that gi (¯ x + αd) = gi (¯ x) = 0, ∀ α ∈ (0, α ¯ i ], i ∈ Ω∗ .
(6.13)
For i 6∈ I(¯ x), gi (¯ x) < 0. As gi , i ∈ / I(¯ x) is continuous on Rn , there exist α ¯ i > 0, i ∈ I(¯ x) such that gi (¯ x + αd) < 0, ∀ α ∈ (0, α ¯ i ], i ∈ I(¯ x).
(6.14)
Define α ¯ = min{¯ αf , α ¯1 , . . . , α ¯ m }. Therefore, the conditions (6.12), (6.13), and (6.14) hold for α ¯ as well, which implies x ¯ + αd ¯ ∈ C, that is, feasible for (CP ). By the strict inequality (6.11), f (¯ x + αd) ¯ < f (¯ x), thereby leading to the fact that x ¯ is not a point of minimizer of (CP ), as desired. We illustrate the above result by the following example. Consider the convex programming problem min −x1 + x2 subject to x1 + x2 + 1 ≤ 0, x22 ≤ 0. Observe that x ¯ = (−1, 0) is the point of minimizer of the above problem. The KKT optimality condition at x¯ is given by −1 1 0 0 + λ1 + λ2 = , 1 1 0 0
© 2012 by Taylor & Francis Group, LLC
252
Optimality without Constraint Qualification
which is not satisfied by any λi ≥ 0, i = 1, 2. For x ¯, I(¯ x) = {1, 2} with D1= (¯ x) = {(d1 , d2 ) ∈ R2 : d1 + d2 = 0}, = D2 (¯ x) = {(d1 , d2 ) ∈ R2 : d2 = 0}. Now consider the following systems as in Theorem 6.3: −d1 + d2 < 0, d1 + d2 = 0, d2 = 0.
(CP∅ )
−d1 + d2 < 0, d1 + d2 < 0, d2 = 0.
(CP1 )
−d1 + d2 < 0, 0 < 0, d1 + d2 = 0.
−d1 + d2 < 0, d1 + d2 < 0, 0 < 0.
(CP2 )
(CPI(¯x) )
Observe that all four systems are inconsistent. Therefore, by the above theorem, x ¯ is the point of minimizer of the problem. Now if we consider x ˜ = (−2, 0), which is feasible for the problem, I(˜ x) = {2} with D2= (˜ x) = D2= (¯ x). For x ˜, the system −d1 + d2 < 0, (CPI(˜x) ) 0 < 0. is inconsistent whereas −d1 + d2 < 0, d2 = 0.
(CP∅ )
is consistent. Thus, by Theorem 6.3, x ˜ is not the point of minimizer. Theorem 6.3 was expressed in terms of the inconsistency of a system for every subset Ω ⊂ I(¯ x). Next we present the result of Ben-Tal, Ben-Israel, and Zlobec [7] in terms of the Fritz John type optimality conditions. But before establishing that result, we state the Dubovitskii–Milyutin Theorem, which acts as a tool in the proof. Proposition 6.4 Consider open blunt convex cones C1 , C2 , . . . , Cm and convex cone Cm+1 . Then m+1 \ i=1
© 2012 by Taylor & Francis Group, LLC
Ci = ∅
6.2 Geometric Optimality Condition: Smooth Case
253
if and only if there exists yi ∈ Ci◦ , i = 1, 2, . . . , m, not all simultaneously zero such that y1 + y2 + . . . + ym + ym+1 = 0. Theorem 6.5 Consider the convex programming problem (CP ) with C given by (3.1). Let f and gi , i = 1, 2, . . . , m, be differentiable convex functions. Then x ¯ is a point of minimizer of (CP ) if and only if for every subset Ω ⊂ I(¯ x) the system X ◦ = 0 ∈ λ0 ∇f (¯ x) + λi ∇gi (¯ x) + (DΩ x)) , ∗ (¯ (CPΩ′ ) i∈Ω λ0 ≥ 0, λi ≥ 0, i ∈ Ω, not all simultaneously zeros is consistent, where
= DΩ x) ∗ (¯
=
\ Di= (¯ x),
i∈Ω∗ n
R ,
if Ω∗ 6= ∅, if Ω∗ = ∅.
Proof. From Theorem 6.3, x ¯ is a point of minimum of (CP ) if and only if for every subset Ω ⊂ I(¯ x), the system (CPΩ ) is inconsistent, which by the differentiability of f and gi , i ∈ Ω, along with Proposition 6.2 (iii) is equivalent to ! \ < < = Df (¯ x) ∩ Di (¯ x) ∩ DΩ x) = ∅, ∗ (¯ i∈Ω
where Di< (¯ x) = Dg 0 sufficiently small. Then using the condition (6.15), the system
© 2012 by Taylor & Francis Group, LLC
6.3 Geometric Optimality Condition: Nonsmooth Case ˜ < 0, h∇f (¯ x, di ˜ < 0, i ∈ I(¯ h∇gi (¯ x), di x),
255 (CPI(¯x) )
˜ thereby leading to the desired result. is consistent for d,
Note that in the example considered in this section, the regularization condition did not hold. As a matter of fact, the Slater constraint qualification was not satisfied.
6.3
Geometric Optimality Condition: Nonsmooth Case
The work of Ben-Tal, Ben-Israel, and Zlobec [7] was extended by Wolkowicz [112] to nonsmooth convex scenario. The latter not only studied the optimality conditions by avoiding constraint qualifications, but also gave a geometrical interpretation to what he termed as badly behaved constraints. Before discussing the contributions of Wolkowicz [112] toward the convex programming problem (CP ) with the feasible set C given by (3.1), we will define some notations. The equality set is given by I = = {i ∈ {1, 2, . . . , m} : gi (x) = 0, ∀ x ∈ C}. For x ¯ ∈ C, define I < (¯ x) = I(¯ x)\I = , where I(¯ x) is the active index set at x ¯. Observe that while I < (¯ x) depends on = x ¯, I is independent of any x ∈ C. Using the direction notations presented in the beginning of this chapter, Wolkowicz [112] defined the set of badly behaved constraints. Definition 6.8 For x ¯ ∈ C, the set of badly behaved constraints is given by \ I b (¯ x) = {i ∈ I = : (Di> (¯ x) ∩ S(¯ x))\cl Di= (¯ x) 6= ∅}, i∈I =
where S(¯ x) = {d ∈ Rn : gi′ (¯ x, d) ≤ 0, ∀ i ∈ I(¯ x)}. Recall that we introduced the set S(¯ x) in Section 3.3 and proved in Proposition 3.9 that
where b x) = { S(¯
X
i∈I(¯ x)
© 2012 by Taylor & Francis Group, LLC
b x), (S(¯ x))◦ = cl S(¯ λi ξi : λi ≥ 0, ξi ∈ ∂gi (¯ x), i ∈ I(¯ x)}.
256
Optimality without Constraint Qualification
The set I b (¯ x) is the set of constraints that create problems in KKT conditions. A characterization of the above set in terms of the directional derivative was stated by Wolkowicz [112] without proof. We present the result with proof for a better understanding. Theorem 6.9 Consider the convex programming problem (CP ) with C given by (3.1). Let i∗ ∈ I = . Then i∗ ∈ I b (¯ x) if and only if the system gi′∗ (¯ x, d) = 0, gi′ (¯ x, d) ≤ 0, ∀ i\ ∈ I(¯ x)\i∗ , (CPb ) d∈ / Di=∗ (¯ x) ∪ cl Di= (¯ x). i∈I =
is consistent.
Proof. Suppose that i∗ ∈ I b (¯ x), which implies there exists d∗ ∈ Rn such that ∗ d ∈ Di>∗ (¯ x), ∗ d ∈ S(¯ x\ ), ∗ / cl Di= (¯ x). d ∈ i∈I =
As d∗ ∈ Di>∗ (¯ x), d∗ ∈ / Di=∗ (¯ x), which along with the last condition implies d∗ ∈ / Di=∗ (¯ x) ∪ cl
\
Di= (¯ x).
(6.16)
i∈I =
Also, as d∗ ∈ Di>∗ (¯ x), by Definition 6.1 there exists α∗ > 0 such that gi∗ (¯ x + αd∗ ) > gi∗ (¯ x), ∀ α ∈ (0, α∗ ]. Therefore, lim α↓0
gi∗ (¯ x + αd∗ ) − gi∗ (¯ x) ≥ 0, α
which implies Because d∗ ∈ S(¯ x),
gi′∗ (¯ x, d∗ ) ≥ 0.
(6.17)
gi′ (¯ x, d∗ ) ≤ 0, ∀ i ∈ I(¯ x).
(6.18)
In particular, taking i∗ ∈ I = ⊆ I(¯ x) in the above inequality along with (6.17) yields gi′∗ (¯ x, d∗ ) = 0. (6.19) Combining the conditions (6.16), (6.18), and (6.19) together imply that d∗ solves the system (CPb ), thereby leading to its consistency.
© 2012 by Taylor & Francis Group, LLC
6.3 Geometric Optimality Condition: Nonsmooth Case
257
Conversely, suppose that (CPb ) is consistent, which implies there exists d∗ ∈ Rn such that gi′∗ (¯ x, d∗ ) = 0, gi′ (¯ x, d∗ ) ≤ 0, ∀ i \ ∈ I(¯ x)\i∗ , d∗ ∈ / Di=∗ (¯ x) ∪ cl Di= (¯ x). i∈I =
The first equality condition can be expressed as two inequalities given by gi′∗ (¯ x, d∗ ) ≤ 0
and
gi′∗ (¯ x, d∗ ) ≥ 0.
(6.20)
As i∗ ∈ I = ⊆ I(¯ x) along with the above condition yields d∗ ∈ S(¯ x).
(6.21)
Also, from the inequality (6.20), there exists α∗ > 0 such that gi∗ (¯ x + αd∗ ) ≥ gi∗ (¯ x), ∀ α ∈ (0, α∗ ]. As d∗ 6∈ Di=∗ (¯ x), the above inequality holds as a strict inequality and hence d∗ ∈ Di>∗ (¯ x). The conditions (6.21) and (6.22) along with the fact that d∗ ∈ / cl implies that d∗ ∈ I b (¯ x), thereby establishing the desired result.
(6.22) T
i∈I =
Di= (¯ x)
Observe that if Di=∗ (¯ x) = {d ∈ Rn : gi′∗ (¯ x, d) = 0}, then by the above characterization of the badly behaved constraints, i∗ 6∈ I b (¯ x). The class of functions that are never badly behaved includes the class of all continuous linear functionals and the classical distance function. For more on badly behaved constraints, one can refer to Wolkowicz [112]. Before moving any further, we present a few results from Wolkowicz [112] that act as a tool in the derivation of the characterization for the point of minimum. Proposition 6.10 Consider the convex programming problem (CP ) with C given by (3.1). Suppose that x ¯ ∈ C. Then \ \ \ (i) Di≤ (¯ x) = Di= (¯ x) ∩ Di≤ (¯ x). i∈I =
i∈I(¯ x)
(ii)
\
i∈I =
Di= (¯ x) ∩
\
i∈I < (¯ x)
i∈I < (¯ x)
Di< (¯ x) 6= ∅.
Furthermore, suppose that the set Ω satisfies I b (¯ x) ⊂ Ω ⊂ I = . If \ either co Di= (¯ x) is closed or Ω = I = , i∈Ω
then
© 2012 by Taylor & Francis Group, LLC
258
Optimality without Constraint Qualification \ (iii) TC (¯ x) = cl Di≤ (¯ x). i∈I(¯ x)
(iv) cl co
\
i∈Ω
Di= (¯ x) ∩ S(¯ x) = cl
(v) TC (¯ x) = cl co
\
i∈Ω
(vi) −co
[
i∈I < (¯ x)
\
i∈Ω
Di= (¯ x) ∩ S(¯ x) = cl
\
i∈I =
Di= (¯ x) ∩ S(¯ x).
Di= (¯ x) ∩ S(¯ x).
∂gi (¯ x) ∩ (
\
i∈Ω
Di= (¯ x))◦ = ∅.
Proof. (i) Observe that I(¯ x) = I = ∪ I < (¯ x), which implies \ Di≤ (¯ x) = {d ∈ Rn : there exists α ¯ > 0 such that i∈I(¯ x)
=
\
Di≤ (¯ x)
i∈I =
∩
\
gi (¯ x + αd) ≤ gi (¯ x), ∀ i ∈ I(¯ x)} Di≤ (¯ x).
i∈I < (¯ x)
For any d ∈ Di≤ (¯ x), there exists α ¯ > 0 such that gi (¯ x + αd) ≤ gi (¯ x) = 0, α ∈ (0, α], ¯ which implies x ¯ + αd ∈ C for every α ∈ (0, α]. ¯ As for every i ∈ I = , gi (x) = 0 for every feasible point x ∈ C of (CP ), thereby implying that for every i ∈ I = , gi (¯ x + αd) = gi (¯ x) = 0, α ∈ (0, α], ¯ which implies Di≤ (¯ x) = Di= (¯ x) for every i ∈ I = . Therefore, by this condition, \ \ \ Di≤ (¯ x) = Di= (¯ x) ∩ Di≤ (¯ x), i∈I(¯ x)
i∈I =
i∈I < (¯ x)
as desired. (ii) If I(¯ x) = ∅, the result holds trivially by (i). Suppose that I = and I < (¯ x) are nonempty. Then corresponding to any i ∈ I < , there exists some x ˆ ∈C such that gi (ˆ x) < 0. By the convexity of gi , for every λ ∈ (0, 1], gi (¯ x + λ(ˆ x−x ¯)) ≤ λgi (ˆ x) + (1 − λ)gi (¯ x) < 0 = gi (¯ x), which implies that dˆ = x ˆ−x ¯ ∈ Di< (¯ x). Also, suppose that there is some j ∈ I < (¯ x), j 6= i, then corresponding to j there exists some x ˜ ∈ C such that gj (˜ x) < 0. Then as before, d˜ = x ˜−x ¯ ∈ Dj< (¯ x). Now if i and j are such that gi (˜ x) = 0
© 2012 by Taylor & Francis Group, LLC
and
gj (ˆ x) = 0,
6.3 Geometric Optimality Condition: Nonsmooth Case
259
then by the convexity of gi and gj , for every λ ∈ (0, 1), gi (λˆ x + (1 − λ)˜ x) ≤ λgi (ˆ x) + (1 − λ)gi (˜ x) < 0, gj (λˆ x + (1 − λ)˜ x) ≤ λgj (ˆ x) + (1 − λ)gj (˜ x) < 0, which implies for λ ∈ (0, 1), (λˆ x + (1 − λ)˜ x) − x ¯ = λdˆ + (1 − λ)d˜ such that λdˆ + (1 − λ)d˜ ∈ Di< (¯ x)
λdˆ + (1 − λ)d˜ ∈ Dj< (¯ x).
and
Proceeding as above, there exists d¯ ∈ Rn such that d ∈ Di< (¯ x), ∀ i ∈ I < (¯ x)
(6.23)
with corresponding α ¯ > 0 such that x ¯ +αd ∈ C for every α ∈ (0, α]. ¯ Therefore, for every i ∈ I = , gi (¯ x + αd) = 0 = gi (¯ x), ∀ α ∈ (0, α], ¯ that is, d ∈ Di= (¯ x), ∀ i ∈ I = , which along with the condition (6.23) proves the desired result. (iii) Consider a feasible point x ∈ C of (CP ) that implies gi (x) ≤ 0, ∀ i ∈ I(¯ x). By the convexity of gi , i ∈ I(¯ x), for every λ ∈ (0, 1], gi (¯ x + λ(x − x ¯)) ≤ λgi (x) + (1 − λ)gi (¯ x) ≤ 0 = gi (¯ x), ∀ i ∈ I(¯ x). T Therefore, x − x ¯ ∈ i∈I(¯x) Di≤ (¯ x) for every x ∈ C, which implies (C − x ¯) ⊂
As
T
i∈I(¯ x)
\
Di≤ (¯ x).
i∈I(¯ x)
x) is a cone, Di≤ (¯ cone (C − x ¯) ⊂
Suppose that d ∈
T
i∈I(¯ x)
\
Di≤ (¯ x).
(6.24)
i∈I(¯ x)
x), which implies there exists α ¯ > 0 such that Di≤ (¯
gi (¯ x + αd) ≤ gi (¯ x) = 0, ∀ α ∈ (0, α], ¯ ∀ i ∈ I(¯ x). For i 6∈ I(¯ x), gi (¯ x) < 0 and thus, there exists some α′ > 0 such that for any n d∈R , gi (¯ x + αd) < 0, ∀ α ∈ (0, α′ ), ∀ i 6∈ I(¯ x).
© 2012 by Taylor & Francis Group, LLC
260
Optimality without Constraint Qualification
Therefore, by the preceding inequalities, x′ = x ¯ + αd ∈ C for α ∈ (0, α∗ ], ∗ ′ where α = min{¯ α, α }, which implies αd ∈ C − x ¯, thereby leading to d ∈ cone (C − x ¯), which along with the condition (6.24) yields \ Di≤ (¯ x) = cone (C − x ¯). i∈I(¯ x)
By Theorem 2.35, TC (¯ x) = cl cone (C − x ¯) = cl
\
Di≤ (¯ x),
i∈I(¯ x)
hence establishing the result. (iv) By the given hypothesis Ω ⊂ I = , which implies that the containment relation \ \ \ cl Di= (¯ x) ∩ S(¯ x) ⊂ cl Di= (¯ x) ∩ S(¯ x) ⊂ cl co Di= (¯ x) ∩ S(¯ x) (6.25) i∈I =
i∈Ω
i∈Ω
holds. To establish the result, we will prove the following: \ \ (1) cl co Di= (¯ x) ∩ S(¯ x) ⊂ cl (co Di= (¯ x) ∩ S(¯ x)). i∈Ω
If co cl co
\
i∈Ω
T
i∈Ω
= x) i∈Ω Di (¯
is closed, then
Di= (¯ x) ∩ S(¯ x) = co
\
i∈Ω
Di= (¯ x) ∩ S(¯ x) ⊂ cl (co
\
i∈Ω
Di= (¯ x) ∩ S(¯ x)),
thereby establishing the above condition. If Ω = I = , we prove \ \ cl co Di= (¯ x) ∩ S(¯ x) ⊂ cl (co Di= (¯ x) ∩ S(¯ x)). i∈I =
i∈I =
As S(¯ x) is a closed convex set and cl co
\
i∈I =
Also S(x) =
T
i∈I =
Si (¯ x) ∩
T
T
i∈I =
Di= (¯ x) ⊂ S(¯ x),
Di= (¯ x) ⊂ S(¯ x).
i∈I < (¯ x)
Si (¯ x), where
Si (¯ x) = {d ∈ Rn : gi′ (¯ x, d) ≤ 0}.
Therefore, establishing (6.26) is equivalent to proving \ \ \ x) ∩ x) ∩ cl co Di= (¯ Si (¯ x) ⊂ cl (co Di= (¯ i∈I =
© 2012 by Taylor & Francis Group, LLC
i∈I < (¯ x)
i∈I =
\
i∈I < (¯ x)
Si (¯ x)).
(6.26)
6.3 Geometric Optimality Condition: Nonsmooth Case By condition (ii), there exists d ∈ Rn such that \ \ \ d∈ Di= (¯ x) ∩ Di< (¯ x) ⊂ co Di= (¯ x) ∩ int i∈I =
i∈I =
i∈I < (¯ x)
\
261
Si (¯ x),
i∈I < (¯ x)
which yields the above condition. \ \ (2) co Di= (¯ x) ∩ S(¯ x) = Di= (¯ x) ∩ S(¯ x). i∈Ω
i∈Ω
By Proposition 6.2 (i) and (iii), \ \ ≤ Di= (¯ x) ⊂ Di (¯ x). i∈Ω
Because
Di≤ (¯ x)
i∈Ω
is convex, co
\
i∈Ω
Di= (¯ x) ⊂
\
Di≤ (¯ x).
(6.27)
i∈Ω
As Ω ⊂ I = , for every feasible point x ∈ C, gi (x) = 0, i ∈ Ω. For any d ∈ Di≤ (¯ x), i ∈ Ω, there exists α ¯ i > 0 such that gi (¯ x + αd) ≤ gi (¯ x) = 0, ∀ α ∈ (0, α ¯ i ], which implies x ¯ + αd ∈ C. Therefore, for any i ∈ Ω, gi (¯ x + αd) = 0, ∀ α ∈ (0, α ¯ i ], thereby implying that d ∈ Di≤ (¯ x), i ∈ Ω. Thus, the condition (6.27) becomes \ \ \ co Di= (¯ x) ⊂ Di= (¯ x) ⊂ co Di= (¯ x). i∈Ω
i∈Ω
i∈Ω
T
The above relation implies that i∈Ω Di= (¯ x) is convex, thereby leading to \ \ co Di= (¯ x) ∩ S(¯ x) = Di= (¯ x) ∩ S(¯ x), i∈Ω
i∈Ω
T as desired. Note that, in particular, for Ω = I = , i∈I = Di= (¯ x) is convex. \ \ (3) cl ( Di= (¯ x) ∩ S(¯ x)) ⊂ cl Di= (¯ x) ∩ S(¯ x). i∈I =
i∈Ω
Suppose that Ω ( I = . We claim that \ \ Di= (¯ x) ∩ S(¯ x) ⊂ cl Di= (¯ x) ∩ S(¯ x). i∈Ω
i∈I =
Assume on the contrary that there exists d ∈ Rn such that \ \ d∈ Di= (¯ x) ∩ S(¯ x) \ (cl Di= (¯ x) ∩ S(¯ x)). i∈Ω
© 2012 by Taylor & Francis Group, LLC
i∈I =
262
Optimality without Constraint Qualification
˜ ⊂ I = \ Ω such that By the given hypothesis, there exists Ω \ d ∈ S(¯ x), d ∈ Di= (¯ x) ˜ i∈I = \Ω
but d∈ / Di= (¯ x) ∪ cl
\
i∈I =
˜ Di= (¯ x), ∀ i ∈ Ω.
˜ ⊂ I = \ Ω ⊂ I = \ I b (¯ By the hypothesis I b (¯ x) ⊂ Ω ⊂ I = , Ω x), which implies b ˜ Ω 6⊂ I (¯ x). By invoking Theorem 6.9, the system (CPb ) is inconsistent and thus ˜ gi′ (¯ x, d) < 0, ∀ i ∈ Ω. Therefore, d∈
\
˜ i∈Ω
\
Di< (¯ x) ∩
Di= (¯ x).
(6.28)
˜ i∈I = \Ω
By (ii), as \
i∈I =
Di= (¯ x) ∩
\
i∈I < (¯ x)
there exists d¯ ∈ Rn such that \ d¯ ∈ Di= (¯ x) ∩ i∈I =
Di< (¯ x) 6= ∅,
\
Di< (¯ x).
(6.29)
i∈I < (¯ x)
¯ By condition (6.28), for i ∈ Ω ˜ there exists α Define dλ = λd + (1 − λ)d. ¯i > 0 such that gi (¯ x + αd) < gi (¯ x), ∀ α ∈ (0, α ¯ i ]. (6.30) = ˜ ˜ As Ω ⊂ I , by condition (6.29), for i ∈ Ω there exists α ˆ i > 0 such that gi (¯ x + αd) = gi (¯ x), ∀ α ∈ (0, α ˆ i ].
(6.31)
˜ along with conditions Denote αi = min{¯ αi , α ˆ i }. By the convexity of gi , i ∈ Ω (6.30) and (6.31), for λ ∈ (0, 1], ¯ < gi (¯ gi (¯ x + αdλ ) ≤ λgi (¯ x + αd) + (1 − λ)gi (¯ x + αd) x), ∀ α ∈ (0, αi ], which implies dλ ∈
\
˜ i∈Ω
Di< (¯ x), ∀ λ ∈ (0, 1].
Again from (6.28), d∈
© 2012 by Taylor & Francis Group, LLC
\
˜ i∈I = \Ω
Di= (¯ x) ⊂
\
˜ i∈I = \Ω
Di≤ (¯ x),
(6.32)
6.3 Geometric Optimality Condition: Nonsmooth Case
263
and from (6.29), d¯ ∈
\
i∈I =
Di= (¯ x) ⊂
\
˜ i∈I = \Ω
Di= (¯ x) ⊂
\
Di≤ (¯ x).
˜ i∈I = \Ω
˜ are convex sets, Because Di≤ (¯ x), i ∈ I = \ Ω \ Di≤ (¯ x), ∀ λ ∈ (0, 1). dλ ∈
(6.33)
˜ i∈I = \Ω
By Theorem 2.69, gi , i ∈ I < (¯ x), is continuous on Rn , which along with condition (6.29) implies that there exists β ∈ (0, 1) such that \ dλ ∈ Di< (¯ x), λ ∈ (0, β]. (6.34) i∈I < (¯ x)
Observe that ˜ ∪ Ω, ˜ I(¯ x) = I < (¯ x) ∪ I = \ Ω which along with (i) leads to \ \ Di≤ (¯ x) ∩ Di< (¯ x) = i∈I(¯ x)
˜ i∈Ω
\
i∈I < (¯ x)
Di≤ (¯ x) ∩
\
˜ i∈I = \Ω
Di≤ (¯ x) ∩
\
Di< (¯ x).
˜ i∈Ω
Therefore, combining (6.32), (6.33), and (6.34) along with the above relation yields \ \ dλ ∈ Di≤ (¯ x) ∩ Di< (¯ x). ˜ i∈Ω
i∈I(¯ x)
˜ ⊂ I = , which along with (i) implies As Ω \ \ \ Di≤ (¯ x) = Di≤ (¯ x) ∩ Di= (¯ x) ⊂ i∈I(¯ x)
i∈I =
i∈I < (¯ x)
\
i∈I < (¯ x)
Di≤ (¯ x) ∩
Thus, dλ ∈
\
Di= (¯ x),
\
Di< (¯ x).
˜ i∈Ω
which is a contradiction to dλ ∈
˜ i∈Ω
Therefore, \
i∈Ω
x) ∩ S(¯ x) ⊂ cl Di= (¯
© 2012 by Taylor & Francis Group, LLC
\
i∈I =
Di= (¯ x) ∩ S(¯ x).
\
˜ i∈Ω
Di= (¯ x).
264
Optimality without Constraint Qualification
Because cl
\
Di= (¯ x) and S(¯ x) are closed sets,
i∈I =
cl (
\
i∈Ω
Di= (¯ x) ∩ S(¯ x)) ⊂ cl
\
i∈I =
Di= (¯ x) ∩ S(¯ x),
thereby establishing the desired result when Ω ( I = . If Ω = I = , \
i∈I =
Di= (¯ x) ∩ S(¯ x) ⊂ cl
\
i∈I =
Di= (¯ x) ∩ S(¯ x),
thus yielding the desired condition as before. From the conditions (1) through (3), it is easy to observe that \ \ cl co Di= (¯ x) ∩ S(¯ x) ⊂ cl (co Di= (¯ x) ∩ S(¯ x)) i∈Ω
i∈Ω
= cl (
\
i∈Ω
Di= (¯ x) ∩ S(¯ x)) ⊂ cl
\
i∈I =
Di= (¯ x) ∩ S(¯ x),
which along with (6.25) yields the requisite result. (v) Using (iii) and (iv), it is enough to show that \ \ cl Di≤ (¯ x) = cl ( Di= (¯ x) ∩ S(¯ x)). i∈I =
i∈I(¯ x)
From (ii) and Proposition 6.2 (iii), it is obvious that \ \ cl Di≤ (¯ x) ⊂ cl ( Di= (¯ x) ∩ S(¯ x)).
(6.35)
To prove the result, we claim that \ \ Di= (¯ x) ∩ S(¯ x) ⊂ cl Di≤ (¯ x).
(6.36)
i∈I(¯ x)
i∈I =
Suppose that d ∈
T
i∈I =
i∈I(¯ x)
Di= (¯ x) ∩ S(¯ x). By (ii), there exists d¯ ∈ Rn such that \ \ d¯ ∈ Di= (¯ x) ∩ Di< (¯ x).
i∈I =
i∈I =
i∈I < (¯ x)
¯ Therefore, by Theorem 2.79 and Proposition 6.2, Denote dλ = λd[ + (1 − λ)d. for every ξ ∈ ∂gi (¯ x), i∈I < (¯ x)
¯ < 0, ∀ λ ∈ [0, 1), hξ, dλ i = λhξ, di + (1 − λ)hξ, di
© 2012 by Taylor & Francis Group, LLC
6.3 Geometric Optimality Condition: Nonsmooth Case
265
which again by Theorem 2.79 implies that for every i ∈ I < (¯ x), gi′ (¯ x, dλ ) < 0, ∀ λ ∈ [0, 1). Therefore, by Proposition 6.2 (iii), dλ ∈ Also, by the convexity of
\
Di< (¯ x), ∀ λ ∈ [0, 1).
i∈I < (¯ x)
T
i∈I =
dλ ∈
\
(6.37)
Di= (¯ x),
i∈I =
Di= (¯ x), ∀ λ ∈ [0, 1).
(6.38)
Thus, by the relations (6.37) and (6.38), along with (i), we obtain dλ ∈
\
i∈I < (¯ x)
Di< (¯ x) ∩
\
i∈I =
Di= (¯ x) ⊂
\
i∈I(¯ x)
Di≤ (¯ x), ∀ λ ∈ [0, 1).
As the limit λ → 1, dλ → d, which implies d ∈ cl (6.36), which yields that cl (
\
i∈I =
Di= (¯ x) ∩ S(¯ x)) ⊂ cl
\
T
i∈I(¯ x)
Di≤ (¯ x), thus proving
Di≤ (¯ x).
i∈I(¯ x)
The above condition along with (6.35) establishes the desired result. (vi) Define F = −co
[
∂gi (¯ x).
i∈I < (¯ x)
We will prove the result by contradiction. Assume that \ F ∩( Di= (¯ x))◦ 6= ∅, i∈Ω
which implies there exists ξ ∈F ∩(
\
Di= (¯ x))◦ .
i∈Ω
As ξ ∈ F , there exists ξi ∈ ∂gi (¯ x) and λi ≥ 0, i ∈ I < (¯ x) with such that X ξ=− λi ξi . i∈I < (¯ x)
© 2012 by Taylor & Francis Group, LLC
P
i∈I < (¯ x)
λi = 1
266
Optimality without Constraint Qualification
By Proposition 2.31, \ Di= (¯ x) i∈Ω
−F ◦
⊂
{ξ}◦ = {d ∈ Rn : hξ, di ≤ 0},
⊂
−{ξ}◦ = {d ∈ Rn : hξ, di ≥ 0}.
Therefore, −F ◦ ∩
\
Di= (¯ x) ⊂ {d ∈ Rn : hξ, di = 0}.
\
Di= (¯ x) ∩
i∈Ω
By (ii), there exists dˆ ∈
i∈I =
\
⊂
Di= (¯ x)
i∈Ω
\
Di< (¯ x)
i∈I < (¯ x)
∩ −F ◦ ⊂ {d ∈ Rn : hξ, di = 0},
ˆ = 0. As dˆ ∈ T < D< (¯ ¯ i > 0, i ∈ I < (¯ x), that is, hξ, di i∈I (¯ x) i x), there exists α such that ˆ < 0, ∀ αi ∈ (0, α gi (¯ x + αi d) ¯ i ], ∀ i ∈ I < (¯ x). By the convexity of gi , i ∈ I < (¯ x), for every αi ∈ (0, α ¯ i ], ˆ ≤ gi (¯ ˆ − gi (¯ αi hξi , di x + αi d) x) < 0, ∀ i ∈ I < (¯ x), which implies hξ, di =
X
i∈I < (¯ x)
λi hξi , di < 0,
which is a contradiction, thereby leading to the requisite result.
Wolkowicz [112] derived a certain characterization in form of the KKT type optimality conditions. But before presenting that result, we present a lemma that will be required to prove the result. Lemma 6.11 Consider the convex programming problem (CP ) with C given by (3.1). Suppose that x ¯ ∈ C and F ⊂ Rn any nonempty set. Then the statement x ¯ is a point of minimizer of (CP ) if and only if the system X λi ∂gi (¯ x) + F 0 ∈ ∂f (¯ x) + i∈I(¯ x)
λi ≥ 0, i ∈ I(¯ x)
is consistent
© 2012 by Taylor & Francis Group, LLC
(6.39)
6.3 Geometric Optimality Condition: Nonsmooth Case
267
holds for any objective function f if and only if F satisfies b x) + F. NC (¯ x) = S(¯
(6.40)
Proof. Suppose that the statement is satisfied for any fixed objective function. We will prove the condition (6.40). Consider ξ ∈ NC (¯ x) and define the objective function as f (x) = −hξ, xi. Then ξ ∈ −∂f (¯ x)∩NC (¯ x), which implies 0 ∈ ∂f (¯ x) + NC (¯ x). By the optimality conditions for (CP ), Theorem 3.1 (ii), x ¯ is a point of minimizer of (CP ). Therefore, by (6.39) along with ∂f (¯ x) = {−ξ} leads to
that is,
b x) + F, ξ ∈ S(¯
b x) + F. NC (¯ x) ⊂ S(¯
(6.41)
b x) + F , which implies there exist ξi ∈ ∂gi (¯ Now suppose that ξ ∈ S(¯ x) and λi ≥ 0 for i ∈ I(¯ x) such that X ξ− λi ξi ∈ F. i∈I(¯ x)
Again define the objective function as f (x) = −hξ, xi, which implies ∂f (¯ x) = {−ξ}. By the above condition it is obvious that the condition (6.39) is satisfied and thus by the statement, x ¯ is a point of minimizer of (CP ). Applying Theorem 3.1, −ξ ∈ NC (¯ x), which implies b x) + F ⊂ NC (¯ S(¯ x).
The above containment along with the relation (6.41) yields the desired condition (6.40). Conversely, suppose that (6.40) holds. By Theorem 3.1 (ii), x¯ is a point of minimizer of (CP ) if and only if 0 ∈ ∂f (¯ x) + NC (¯ x), which by (6.40) is equivalent to b x) + F, 0 ∈ ∂f (¯ x) + S(¯
that is, the system (6.39) is consistent, thereby completing the proof.
b x) by PropoAs mentioned in the beginning of this section, (S(¯ x))◦ = cl S(¯ b x) is closed, condition (6.40) becomes sition 3.9. Therefore, if S(¯ NC (¯ x) = (S(¯ x))◦ + F.
© 2012 by Taylor & Francis Group, LLC
268
Optimality without Constraint Qualification
A similar result as the above theorem was studied by Gould and Tolle [53] under the assumption of differentiability of the functions but not necessarily convex. Applying the above lemma along with some additional conditions, Wolkowicz [112] established KKT type optimality conditions. We present the result below. Theorem 6.12 Consider the convex programming problem (CP ) with C given by (3.1) and x ¯ ∈ C. Suppose that the set Ω satisfies I b (¯ x) ⊂ Ω ⊂ I = and both the sets co
\
Di= (¯ x)
i∈Ω
and
b x) + ( S(¯
\
Di= (¯ x))◦
i∈Ω
are closed. Then x ¯ is a point of minimizer of (CP ) if and only if the system X \ 0 ∈ ∂f (¯ x) + λi ∂gi (¯ x) + ( Di= (¯ x))◦ , (6.42) i∈Ω i∈I(¯ x) λi ≥ 0, i ∈ I(¯ x), is consistent.
Proof. the system (6.42) is obtained, in particular, by taking T Observe that F = ( i∈Ω Di= (¯ x))◦ in Lemma 6.11. Thus, to establish the result, it is sufficient to prove that \ b x) + ( Di= (¯ x))◦ . (6.43) NC (¯ x) = S(¯ i∈Ω
By Proposition 6.10 (v),
TC (¯ x) = S(¯ x) ∩ cl co
\
Di= (¯ x),
i∈Ω
which by Propositions 2.31 and 3.9 imply that NC (¯ x)
= cl ((S(¯ x)◦ + (cl co b x) + ( = cl (S(¯
\
\
Di= (¯ x))◦ )
i∈Ω
Di= (¯ x))◦ ).
i∈Ω
The closedness assumption leads to the condition (6.43), thereby yielding the requisite result. In the above theorem, the closedness conditions on the sets \ \ b x) + ( co Di= (¯ x) and S(¯ Di= (¯ x))◦ i∈Ω
© 2012 by Taylor & Francis Group, LLC
i∈Ω
6.3 Geometric Optimality Condition: Nonsmooth Case
269
act as a constraint qualification. If, in particular, we choose Ω = I = , then the closedness conditions are no longer needed. In fact, \ b x) + ( NC (¯ x) = S(¯ Di= (¯ x))◦ i∈I =
is always satisfied. Below we present the result for this particular case. Theorem 6.13 x ¯ is a minimum of (CP ) if and only if the system X \ 0 ∈ ∂f (¯ x) + λi ∂gi (¯ x) + ( Di= (¯ x))◦ , i∈I =
i∈I(¯ x)
λi ≥ 0, i ∈ I(¯ x),
is consistent.
Proof. By Theorem 3.1 (ii), x ¯ ∈ C is a point of minimizer if and only if 0 ∈ ∂f (¯ x) + NC (¯ x). In order to establish the result, it is enough to show that \ b x) + ( NC (¯ x) = S(¯ Di= (¯ x))◦ .
(6.44)
i∈I =
Observe that int Di≤ (¯ x) = Di< (¯ x) for every i ∈ I < (¯ x). Thus, invoking Propositions 2.31 and 6.10 implies X \ NC (¯ x) = TC (¯ x)◦ = (Di≤ (¯ x))◦ + ( Di= (¯ x))◦ . i∈I =
i∈I < (¯ x)
Again by Proposition 6.10 (ii), Di< (¯ x) 6= ∅, which along with Proposition 6.2 (iv) yields X \ NC (¯ x) = { λi ∂gi (¯ x) : λi ≥ 0, i ∈ I < (¯ x)} + ( Di= (¯ x))◦ . i∈I =
i∈I < (¯ x)
Choosing λi = 0, i ∈ I = , the above condition leads to \ b x) + ( NC (¯ x) ⊂ S(¯ Di= (¯ x))◦ .
(6.45)
i∈I =
By Propositions 3.9, 2.31, and 6.10 imply that \ b x) ⊂ (S(¯ S(¯ x))◦ = ( Di≤ (¯ x))◦ = NC (¯ x). i∈I(¯ x)
Again, by Proposition 6.10, \ \ ( Di= (¯ x))◦ ⊂ ( Di≤ (¯ x))◦ = NC (¯ x). i∈I =
© 2012 by Taylor & Francis Group, LLC
i∈I(¯ x)
(6.46)
270
Optimality without Constraint Qualification
As NC (¯ x) is a closed convex cone, the above relation along with (6.46) leads to \ b x) + ( S(¯ Di= (¯ x))◦ ⊂ NC (¯ x), i∈I =
which together with (6.45) yields the desired condition (6.44).
In all these discussions, the notion of constraint qualification was not considered. Observe that in Theorem 6.13, instead of the standard KKT optimality conditions,TWolkowicz [112] derived KKT type optimality conditions involving the set i∈I =TDi= (¯ x). The system reduces to the standard KKT optimality conditions if ( i∈I = Di= (¯ x))◦ = {0}, that is, F = {0} in system (6.39) of Lemma 6.11. Similar to the regularization condition of Ben-Tal, Ben-Israel, and Zlobec [7], Wolkowicz [112] introduced the notion of regular point and weakest constraint qualification. Definition 6.14 A feasible point x ¯ ∈ C of (CP ) is a regular point if for any objective function f , the system (6.39) holds for F = {0}. A constraint qualification that is satisfied if and only if x¯ is a regular point is known as the weakest constraint qualification. For the differentiable case, Gould and Tolle [52, 53] showed that the Abadie constraint qualification, that is, TC (¯ x) = S(¯ x) is a weakest constraint qualification. Under the differentiability of the funcb x) is closed, which along with the Abadie contions gi , i ∈ I(¯ x), the set S(¯ straint qualification is equivalent to b x), NC (¯ x) = S(¯
which is a weakest constraint qualification. For the nonsmooth case, as discussed in Theorem 3.10, the Abadie constraint qualification along with the b x) is closed leads to the standard KKT conditions. In fact, assumption that S(¯ the Abadie constraint qualification is equivalent to the emptiness of the class of badly behaved constraints I b (¯ x). We present the result below. Proposition 6.15 Let x ¯ ∈ C. Then TC (¯ x) = S(¯ x) if and only if I b (¯ x) = ∅. Proof. Suppose that I b (¯ x) = ∅. Therefore by Proposition 6.10 (iii) and (v), it is obvious that TC (¯ x) = S(¯ x). Conversely, let I b (¯ x) 6= ∅, which implies there exists i∗ ∈ I b (¯ x) such that i∗ ∈ I = and there exists \ x) ∩ S(¯ x))\cl Di= (¯ x). v ∗ ∈ (Di>∗ (¯ i∈I =
© 2012 by Taylor & Francis Group, LLC
6.3 Geometric Optimality Condition: Nonsmooth Case
271
Again by Proposition 6.10, \ v∗ ∈ / cl Di= (¯ x) ∩ S(¯ x) = TC (¯ x), i∈I =
which implies TC (¯ x) 6= S(¯ x), thereby proving the result.
Now we illustrate by examples the above result. Consider C = {x ∈ Rn : x2 ≤ 0, x ≤ 0} with g1 (x) = x2 and g2 (x) = x. Observe that C = {0}. For x ¯ = 0, TC (¯ x) = {0}, and I(¯ x) = I = = {1, 2}. Here, S(¯ x)
= {v ∈ R : g1′ (¯ x, v) ≤ 0, g2′ (¯ x, v) ≤ 0} = {v ∈ R : h∇g1 (¯ x), vi ≤ 0, h∇g2 (¯ x), vi ≤ 0} = {v ∈ R : v ≤ 0}.
Thus, TC (¯ x) 6= S(¯ x), thereby showing that the Abadie constraint qualification is not satisfied. Also by the definitions of the cones of directions, we have D1> (¯ x) = {v ∈ R : v 6= 0}, D2> (¯ x) = {v ∈ R : v > 0}, D1= (¯ x) = {0} = D2= (¯ x). Observe that I b (¯ x) = {1}, that is, the set of badly behaved constraints is nonempty. Next let us consider the set C = {x ∈ R : |x| ≤ 0, x ≤ 0}. Recall from Chapter 3 that the Abadie constraint qualification is satisfied at x ¯ = 0 with S(¯ x) = {0}. Here also, the cones of directions are the same as that of the previous example but now Di> (¯ x) ∩ S(¯ x) = ∅, thereby showing that the set of badly behaved constraints I b (¯ x) is empty. Wolkowicz [112] gave an equivalent characterization of the regular point with Abadie constraint qualification and the set of badly behaved constraints I b (¯ x). We state the result below. The proof can be worked out using Theorem 3.10 and Proposition 6.15. Theorem 6.16 Consider the convex programming problem (CP ) with C given by (3.1) and let x ¯ ∈ C. Then the following are equivalent: (i) x ¯ is a regular point, b x) is closed, (ii) Abadie constraint qualification holds at x ¯ and S(¯
b x) is closed. (iii) I b (¯ x) is empty and S(¯ © 2012 by Taylor & Francis Group, LLC
272
Optimality without Constraint Qualification
In Chapter 3 we derived the optimality conditions not only under the Abadie constraint qualification, but also the Slater constraint qualification. It was observed by Wolkowicz [112] that the Slater constraint qualification is a weakest constraint qualification with respect to the Fritz John optimality condition, which we present below. Theorem 6.17 Consider the convex programming problem (CP ) with C given by (3.1). Then the Slater constraint qualification is a weakest constraint qualification. Proof. By Definition 6.14, the Slater constraint qualification is a weakest constraint qualification if and only if x¯ is a regular point. Consider the Fritz John optimality condition for (CP ); that is, if x ¯ ∈ C is a point of minimizer of (CP ), then there exist λi ≥ 0, i ∈ {0} ∪ I(¯ x), not all simultaneously zero such that X 0 ∈ λ0 ∂f (¯ x) + λi ∂gi (¯ x). i∈I(¯ x)
Suppose that the Slater constraint qualification is satisfied, that is, there exists x ˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m. We claim that λ0 6= 0. On the contrary, assume that λ0 = 0. Then the above condition implies that there exist λi ≥ 0, i ∈ I(¯ x), not all simultaneously zero, such that X 0∈ λi ∂gi (¯ x), i∈I(¯ x)
which implies that there exist ξi ∈ ∂gi (¯ x), i ∈ I(¯ x), such that X 0= λi ξi .
(6.47)
i∈I(¯ x)
By the convexity of gi , i ∈ I(¯ x), hξi , x ˆ−x ¯i ≤ gi (ˆ x) − gi (¯ x) < 0, ∀ ξi ∈ ∂gi (¯ x), which along with the condition (6.47) leads to a contradiction. Thus, λ0 6= 0 and hence can be normalized to one, thereby leading to the KKT optimality conditions. Observe that the KKT optimality condition holds at x¯ if the system X 0 ∈ ∂f (¯ x) + λi ∂gi (¯ x), i∈I(¯ x)
λi ≥ 0, i ∈ I(¯ x),
is consistent for any f , which is equivalent to the inconsistency of the system X 0∈ λi ∂gi (¯ x), i∈I(¯ x)
λi ≥ 0, i ∈ I(¯ x), not all simultaneously zero.
© 2012 by Taylor & Francis Group, LLC
6.3 Geometric Optimality Condition: Nonsmooth Case Thus, the inconsistency of the above system is equivalent to [ 0 6∈ cone co ∂gi (¯ x).
273
(6.48)
i∈I(¯ x)
We claim that the above condition is equivalent to the Slater constraint qualification. Suppose that the condition (6.48) holds. Because dom gi = Rn , i ∈ I(¯ x), by Proposition 2.83, ∂gi (¯ x) is a nonempty compact set. As S I(¯ x) ⊂ {1, 2, . . . , m} is finite, i∈I(¯x) ∂gi (¯ x) is also nonempty compact. Also, as [ [ ∂gi (¯ x) ⊂ cone co ∂gi (¯ x), i∈I(¯ x)
i∈I(¯ x)
the condition (6.48) implies that 0∈ /
[
∂gi (¯ x).
i∈I(¯ x)
Invoking Proposition 3.4, cone co
[
∂gi (¯ x)
i∈I(¯ x)
is a closed set. Invoking the Strict Separation Theorem, Theorem 2.26 (iii), there exists d¯ ∈ Rn and d¯ 6= 0 such that [ ¯ < 0, ∀ z ∈ cone co hz, di ∂gi (¯ x). i∈I(¯ x)
In particular, for ξi ∈ ∂gi (¯ x), i ∈ I(¯ x), the above inequality leads to ¯ < 0. hξ, di As dom gi = Rn , i ∈ I(¯ x), by Theorem 2.79, for i ∈ I(¯ x), ¯ = g ′ (¯ ¯ max hξi , di i x, d) < 0,
ξi ∈∂gi (¯ x)
which implies lim λ↓0
¯ ¯ − gi (¯ gi (¯ x + λd) gi (¯ x + λd) x) = lim < 0. λ↓0 λ λ
Therefore, for every λ > 0, ¯ < 0, ∀ i ∈ I(¯ gi (¯ x + λd) x).
(6.49)
For i 6∈ I(¯ x), gi (¯ x) < 0. Because dom gi = Rn , i 6∈ I(¯ x), by Theorem 2.69, ¯ > 0 such that gi , i 6∈ I(¯ x) is continuous over Rn . Thus, there exists λ ¯ < 0, ∀ d ∈ Rn . gi (¯ x + λd)
© 2012 by Taylor & Francis Group, LLC
274
Optimality without Constraint Qualification
¯ the above inequality becomes In particular, for d = d, ¯ d) ¯ < 0, ∀ i ∈ gi (¯ x+λ / I(¯ x).
(6.50)
¯ d¯ ∈ Rn , Combining (6.49) and (6.50), for x¯ + λ ¯ d) ¯ < 0, ∀ i = 1, 2, . . . , m, gi (¯ x+λ which implies that the Slater constraint qualification holds. Conversely, suppose that the Slater constraint qualification holds, that is, there exists x ˆ ∈ Rn such that gi (ˆ x) < 0, i ∈ I. By Definition 2.77 of subdifferentiability, for any ξi ∈ ∂gi (¯ x), i ∈ I(¯ x), hξ, x ˆ−x ¯i ≤ gi (ˆ x) − gi (¯ x) = gi (ˆ x) < 0, which implies that hz, x ˆ−x ¯i < 0, ∀ z ∈ cone co
[
∂gi (¯ x).
i∈I(¯ x)
S Therefore, z 6= 0 for any z ∈ cone co i∈I(¯x) ∂gi (¯ x), thereby establishing (6.48). Hence, the Slater constraint qualification is a weakest constraint qualification. In both these approaches, one makes use of the direction sets to establish optimality conditions in the absence of any constraint qualification for the convex programming problem (CP ). More recently, Jeyakumar and Li [69] studied a class of sublinear programming problems involving separable sublinear constraints in the absence of any constraint qualification, which we discuss in the next section.
6.4
Separable Sublinear Case
As already mentioned, the sublinear programming problem considered by Jeyakumar and Li [69] involved separable sublinear constraints. So before moving ahead with the problem, we state the concept of separable sublinear function. Definition 6.18 A sublinear function p : Rn → R is called a separable sublinear function if p(x) =
n X
pj (xj )
j=1
with each pj : R → R, j = 1, 2, . . . , n being a sublinear function.
© 2012 by Taylor & Francis Group, LLC
6.4 Separable Sublinear Case
275
The sublinear programming problem studied by Jeyakumar and Li [69] is min p0 (x)
pi (x) ≤ bi , i = 1, 2, . . . , m
subject to
(SP )
where p0 : Rn → R is a sublinear function, pi : Rn → R, i = 1, 2, . . . , m, is a separable sublinear function and bi ∈ R, i = 1, 2, . . . , m. Before establishing the optimality conditions for (SP ), we first present the Farkas’ Lemma derived by Jeyakumar and Li [69]. Farkas’ Lemma acts as a tool in the study of optimality conditions for (SP ) in the absence of any constraint qualification. Theorem 6.19 Consider the sublinear function p˜0 : Rn → R and separable sublinear functions p˜i : Rn → R, i = 1, 2, . . . , m. Then the following are equivalent: (i) x ∈ Rn , p˜i (x) ≤ 0, i = 1, 2, . . . , m ⇒ p˜0 (x) ≥ 0, (ii) There exist λi ≥ 0, i = 1, 2, . . . , m, such that p˜0 (x) +
m X i=1
λi p˜i (x) ≥ 0, ∀ x ∈ Rn .
Proof. Suppose that condition (i) holds. We claim that condition (ii) is also satisfied. On the contrary, assume that (ii) does not hold, which along with the fact that for a real-valued sublinear function p : Rn → R, p(0) = 0 implies that for any λi ≥ 0, i = 1, 2, . . . , m, x ¯ = 0 is not a point of minimizer of the unconstrained problem min p˜0 (x) +
m X
λi p˜i (x)
subject to
i=1
x ∈ Rn .
As sublinear functions are a special class of convex functions, the sublinear programming problem (SP ) is also a convex programming problem for which the KKT optimality conditions are necessary as well as sufficient for the point of minimizer. Therefore, the KKT optimality condition does not hold at x¯ = 0, that is, 0∈ / ∂(˜ p0 +
m X
λi p˜i )(0).
i=1
As dom p˜i = Rn , i = 0, 1, . . . , m, by Theorem 2.69, p˜i , i = 0, 1, . . . , m, are continuous on Rn . Applying the Sum Rule, Theorem 2.91, the above condition becomes 0 6∈ ∂ p˜0 (0) +
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi ∂ p˜i (0),
276
Optimality without Constraint Qualification
thereby implying ∂ p˜0 (0) ∩ (−P ) = ∅, where m X P ={ λi ∂ p˜i (0) : λi ≥ 0, i = 1, 2, . . . , m}. i=1
As p˜i , i = 1, 2, . . . , m, are separable sublinear functions, p˜i (x) =
n X
p˜ij (xj ), i = 1, 2, . . . , m,
j=1
where p˜ij are sublinear functions on R. Thus, m X P ={ λi (∂ p˜i1 (0) × ∂ p˜i2 (0) × . . . × ∂ p˜in (0)) : λi ≥ 0, i = 1, 2, . . . , m}. i=1
As p˜ij : R → R, by Proposition 2.83, ∂ p˜ij is a nonempty convex and compact set in R, that is, ∂ p˜ij (0) = [lij , uij ], i = 1, 2, . . . , m, j = 1, 2, . . . , n, for some lij , uij ∈ R with lij ≤ uij . Therefore, P
m X λi ([li1 , ui1 ] × [li2 , ui2 ] × . . . × [lin , uin ]) : λi ≥ 0, i = 1, 2, . . . , m} ={ i=1
= cone co
m [
i=1 m [
= cone co {
i=1
([li1 , ui1 ] × [li2 , ui2 ] × . . . × [lin , uin ]) (ai1 , ai2 , . . . , ain ) : aij ∈ [lij , uij ], j = 1, 2, . . . , n}.
Note that [li1 , ui1 ] × [li2 , ui2 ] × . . . × [lin , uin ] forms a convex polytope in Rn with 2n vertices denoted by r r r (vir ) = (vi1 , vi2 , . . . , vin ), i = 1, 2, . . . , m, r = 1, 2, . . . , 2n , r where vij ∈ {lij , uij }. Also, any element in the polytope can be expressed as the convex combination of the vertices. Therefore,
(ai1 , ai2 , . . . , ain ) = co{(vi1 ), (vi2 ), . . . , (vi2n )}, which implies that P = cone co
m [
{(vi1 ), (vi2 ), . . . , (vi2n )}.
i=1
Hence, P is a finitely generated convex cone and thus, by Proposition 2.44, is a polyhedral cone that is always closed.
© 2012 by Taylor & Francis Group, LLC
6.4 Separable Sublinear Case
277
As sublinear functions are convex, by Proposition 2.83, ∂ p˜0 (0) is a compact convex set and, from the above discussion, P is a closed convex cone. Therefore, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists α ∈ Rn with α 6= 0 such that sup hα, ξi < inf hα, ξi = − suphα, ξi. ξ∈−P
ξ∈∂ s˜0 (0)
(6.51)
ξ∈P
Consider suphα, ξi
=
ξ∈P
(
sup hα, ξi : ξ ∈ {
≥ sup{hα, ξi : ξ ∈ =
m X i=1
m X
i=1 m X i=1
λi ∂ p˜i (0) : λi ≥ 0, i = 1, 2, . . . , m}
)
λi ∂ p˜i (0)}, ∀ λ ∈ Rm +
λi p˜i (α), ∀ λ ∈ Rm +.
From the preceding relation and condition (6.51), sup hα, ξi < − suphα, ξi ≤ −
p˜0 (α) =
ξ∈P
ξ∈∂ p˜0 (0)
m X i=1
λi p˜i (α), ∀ λ ∈ Rm +,
(6.52)
which implies m X i=1
λi p˜i (α) < −˜ p0 (α), ∀ λ ∈ Rm +.
This inequality holds for every λ ∈ Rm ˜i (α) ≤ 0, i = 1, 2, . . . , m. Otherwise, + if s if for some i ∈ {1, 2, . . . , m}, s˜i (α) > 0, then choosing the corresponding λi → +∞, we arrive at a contradiction. Also, as P is a closed convex cone, from (6.52), p˜0 (α) < − sup hα, ξi ≤ 0. ξ∈P
Therefore, for α ∈ Rn , p˜0 (α) < 0
and
p˜i (α) ≤ 0, i = 1, 2, . . . , m,
which contradicts (i). Thus condition (ii) is satisfied. Conversely, suppose that condition (ii) holds, which implies for some λi ≥ 0, i = 1, 2, . . . , m, −
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi p˜i (x) ≤ p˜0 (x), ∀ x ∈ Rn .
278
Optimality without Constraint Qualification
If for some x ∈ Rn , p˜i (x) ≤ 0, i = 1, 2, . . . , m, from the above inequality p˜0 (x) ≥ 0, thereby establishing condition (i) and hence the desired result. We end this chapter by deriving the constraint qualification free optimality condition for the sublinear programming problem (SP ) from Jeyakumar and Li [69]. Theorem 6.20 Consider the sublinear programming problem (SP ). Then x ¯ is a minimizer of (SP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that 0 ∈ ∂p0 (0) +
m X
λi ∂pi (0)
and
p0 (¯ x) +
i=1
m X
λi bi = 0.
i=1
Proof. Observe that x ¯ is a minimizer of (SP ) if and only if pi (x) − bi ≤ 0, i = 1, 2, . . . , m
=⇒
p0 (x) − p0 (¯ x) ≥ 0.
(6.53)
But Theorem 6.19 cannot be applied directly to the above system as the theorem is for the system involving sublinear functions, whereas here neither pi (x) − bi , i = 1, 2, . . . , m, nor p0 (x) − p0 (¯ x) is positively homogeneous and hence not sublinear functions. So define p˜i : Rn × R → R, i = 0, 1, . . . , m, as p˜0 (x, t) = p0 (x) − tp0 (¯ x)
and
p˜i (x, t) = pi (x) − tbi , i = 1, 2, . . . , m.
Because pi , i = 1, 2, . . . , m, are separable sublinear functions on Rn , p˜i , i = 1, 2, . . . , m, are also separable sublinear functions along with the sublinearity of p˜0 on Rn × R. Now consider the system p˜i (x, t) ≤ 0, i = 1, 2, . . . , m
=⇒
p˜0 (x, t) ≥ 0.
(6.54)
This system is in the desired form needed for the application of Farkas’ Lemma, Theorem 6.19. To establish the result, we will first establish the equivalence between the systems (6.53) and (6.54). Suppose that the system (6.53) holds. We claim that (6.54) is also satisfied. On the contrary, assume that the system (6.54) does not hold, which implies there exists (˜ x, t˜) ∈ Rn × R such that p˜0 (˜ x, t˜) < 0
and
p˜i (˜ x, t˜) ≤ 0, i = 1, 2, . . . , m.
For t˜ > 0, by positive homogeneity of the sublinear function and the construction of p˜i , i = 0, 1, . . . , m, p0 (˜ x/t˜) − p0 (¯ x) = p˜0 (˜ x/t˜, 1) < 0, pi (˜ x/t˜) − bi = p˜i (˜ x/t˜, 1) ≤ 0, i = 1, 2, . . . , m, thereby contradicting (6.53).
© 2012 by Taylor & Francis Group, LLC
6.4 Separable Sublinear Case
279
Now, in particular, taking t˜ = 0, p0 (˜ x) = p˜0 (˜ x, 0) < 0
and
pi (˜ x) = p˜i (˜ x, 0) ≤ 0, i = 1, 2, . . . , m.
For t > 0, consider x ¯ + t˜ x ∈ Rn . Therefore, by the feasibility of x ¯ for (SP ) and the above condition, p0 (¯ x + t˜ x) − p0 (¯ x) ≤ tp0 (˜ x) < 0, pi (¯ x + t˜ x) − bi ≤ pi (¯ x) − bi + tpi (˜ x) ≤ 0,
i = 1, 2, . . . , m,
which is again a contradiction of the system (6.53). If t˜ < 0, then by construction of p˜i , i = 0, 1, . . . , m, p0 (˜ x) − t˜p0 (¯ x) < 0
pi (˜ x) − t˜bi ≤ 0, i = 1, 2, . . . , m.
and
Consider x ˜ + (−t˜ + 1)¯ x ∈ Rn . By the sublinearity of pi , i = 0, 1, . . . , m, p0 (˜ x + (−t˜ + 1)¯ x) ≤ p0 (˜ x) + (−t˜ + 1)p0 (¯ x) ≤ t˜p0 (¯ x) + (−t˜ + 1)p0 (¯ x) = p0 (¯ x), and pi (˜ x + (−t˜ + 1)¯ x) ≤ pi (˜ x) + (−t˜ + 1)pi (¯ x) ˜ ˜ ≤ tbi + (−t + 1)bi = bi , i = 1, 2, . . . , m, which contradicts (6.53). Thus from all three cases, it is obvious that our assumption is wrong and hence the system (6.54) holds. Conversely, taking t = 1 in system (6.54) yields (6.53). Hence, both systems (6.53) and (6.54) are equivalent. Applying Farkas’ Lemma, Theorem 6.19, for the sublinear systems to (6.54), there exist λi ≥ 0, i = 1, 2, . . . , m, such that p˜0 (x, t) +
m X i=1
λi p˜i (x, t) ≥ 0, ∀ (x, t) ∈ Rn × R,
which implies (0, 0) ∈ Rn × R is a point of minimizer of the unconstrained problem min p˜0 (x, t) +
m X
λi p˜i (x, t)
subject to
i=1
(x, t) ∈ Rn × R.
By the KKT optimality condition for the unconstrained problem, Theorem 2.89, (0, 0)
∈
∂(˜ p0 +
= ∂(p0 +
m X
i=1 m X i=1
© 2012 by Taylor & Francis Group, LLC
λi p˜i )(0, 0) λi pi )(0) × ∇(tp0 (¯ x) +
m X i=1
λi tbi )(0),
280
Optimality without Constraint Qualification
where the subdifferential is with respect to x and the gradient with respect to t. Therefore, a componentwise comparison leads to 0 ∈ ∂(p0 + λi pi )(0)
and
p0 (¯ x) +
m X
λi bi = 0.
i=1
As dom pi = Rn , i = 0, 1, . . . , m, by Theorem 2.69, pi , i = 0, 1, . . . , m, are continuous on Rn . Thus, by the Sum Rule, Theorem 2.91, the first relation yields 0 ∈ ∂p0 (0) +
m X
thereby establishing the desired result.
© 2012 by Taylor & Francis Group, LLC
λi ∂pi (0),
i=1
Chapter 7 Sequential Optimality Conditions
7.1
Introduction
In this chapter we are going to look into a completely different approach to develop optimality conditions in convex programming. These optimality conditions, called sequential optimality conditions, can hold without any qualification and thus both from a theoretical as well as practical point of view this is of great interest. To the best of our knowledge, this approach was initiated by Thibault [108]; Jeyakumar, Rubinov, Glover, and Ishizuka [70]; and Jeyakumar, Lee, and Dinh [68]. Unlike the approach of direction sets in Chapter 6, in the sequential approach one needs calculus rules for subdifferentials and εsubdifferentials, namely the Sum Rule and the Chain Rule. As the name itself suggests, the sequential optimality conditions are established as a sequence of subdifferentials at neighborhood points as in the work of Thibault [108] or sequence of ε-subdifferentials at the exact point as in the study of Jeyakumar and collaborators [68, 70]. Thibault [108] used the approach of sequential subdifferential calculus rules while Jeyakumar and collaborators [68, 70] used the approach of epigraphs of conjugate functions to study the sequential optimality conditions extensively. In both these approaches, the convex programming problem involved cone constraints and abstract constraints. But keeping in sync with the convex programming problem (CP ) studied in this book, we consider the feasible set C involving convex inequalities. The reader must have realized the central role of the Slater constraint qualification in the study of optimality and duality in optimization. However, as we have seen, the Slater constraint qualification can fail even for very simple problems. The failure of the Slater constraint qualification was overcome by the development of the so-called closed cone constraint qualification. It is a geometric qualification that uses the Fenchel conjugate of the constraint function. We will study this qualification condition in detail. 281 © 2012 by Taylor & Francis Group, LLC
282
7.2
Sequential Optimality Conditions
Sequential Optimality: Thibault’s Approach
We first discuss the approach due to Thibault [108]. As already mentioned, he makes use of sequential subdifferential rules in his work. As one will observe, the Sum Rule and the Chain Rule are expressed in terms of the sequence of subdifferentials at neighborhood points. We present below the Sum Rule from Thibault [108] which involves the application of the Sum Rule given by Hiriart-Urruty and Phelps [64]. Theorem 7.1 (Sequential Sum Rule) Consider two proper lsc convex func¯ i = 1, 2. Then for any x tions φi : Rn → R, ¯ ∈ dom φ1 ∩ dom φ2 , ∂(φ1 + φ2 )(¯ x) = lim sup {∂φ1 (x1 ) + ∂φ2 (x2 )}, xi →φi −h,i x ¯
where lim supxi →φi −h,i x¯ {∂φ1 (x1 ) + ∂φ2 (x2 )} denotes the set of all limits limk→∞ (ξ1k + ξ2k ) for which there exists xki → x ¯, i = 1, 2 such that ξik ∈ ∂φi (xki ), i = 1, 2, and φi (xki ) − hξik , xki − x ¯i → φi (¯ x), i = 1, 2.
(7.1)
Proof. Suppose that ξ ∈ ∂(φ1 + φ2 )(¯ x). By Theorem 2.120, \ ξ∈ cl{∂1/k φ1 (¯ x) + ∂1/k φ2 (¯ x)}, k∈N
which implies for every k ∈ N, ξ ∈ cl {∂1/k φ1 (¯ x) + ∂1/k φ2 (¯ x)}. From Definition 2.12 of the closure of a set, for every k ∈ N, ξ ∈ ∂1/k φ1 (¯ x) + ∂1/k φ2 (¯ x) +
1 B. k
Therefore, there exists ξi′k ∈ ∂1/k φi (¯ x), i = 1, 2, and bk ∈ B such that ξ = ξ1′k + ξ2′k +
1 k b . k
(7.2)
Applying the modified version of the Brøndsted–Rockafellar Theorem, Theorem 2.114, there exist xki ∈ Rn and ξik ∈ ∂φi (xki ) such that for i = 1, 2, 1 2 1 kxki − x ¯k ≤ √ , kξik − ξi′k k ≤ √ , |φi (xki ) − hξik , xki − x ¯i − φi (¯ x)| ≤ , (7.3) k k k √ which implies ξi′k = ξik + 1/ k bki for some bki ∈ B for i = 1, 2. Therefore, the condition (7.2) becomes 1 1 1 ξ = ξ1k + ξ2k + ( bk + √ bk1 + √ bk2 ), k k k
© 2012 by Taylor & Francis Group, LLC
7.2 Sequential Optimality: Thibault’s Approach
283
that is, ξ = lim (ξ1k + ξ2k ), k→∞
which along with (7.3) yields the desired inclusion. Conversely, suppose that ξ ∈ lim sup {∂φ1 (x1 ) + ∂φ2 (x2 )}, xi →φi −h,i x ¯
which implies for i = 1, 2, there exist xki → x ¯, ξik ∈ ∂φi (xki ) satisfying k k k φi (xi ) − hξi , xi − x ¯i → φi (¯ x) and ξ = lim (ξ1k + ξ2k ). k→∞
As ξik ∈ ∂φi (xki ), i = 1, 2, hξik , x − xki i ≤ φi (x) − φi (xki ), ∀ x ∈ Rn . Also, for every x ∈ Rn , hξik , x − x ¯i = hξik , x − xki i + hξik , xki − x ¯i k k k ≤ φi (x) − φi (xi ) + hξi , xi − x ¯i,
i = 1, 2,
thereby yielding hξ1k + ξ2k , x − x ¯i ≤ φ1 (x) + φ2 (x) − φ1 (xk1 ) − φ2 (xk2 )
+hξ1k , xk1 − x ¯i + hξ2k , xk2 − x ¯i
for every x ∈ Rn . Taking the limit as k → +∞ and using the condition (7.1), the above inequality reduces to hξ, x − x ¯i ≤ (φ1 + φ2 )(x) − (φ1 + φ2 )(¯ x), ∀ x ∈ Rn , which implies ξ ∈ ∂(φ1 + φ2 )(¯ x), thereby establishing the requisite result. Using a very different assumption, the Moreau–Rockafellar Sum Rule, Theorem 2.91, was obtained by Thibault [108]. ¯ Corollary 7.2 Consider two proper lsc convex functions φi : Rn → R, i = 1, 2. If 0 ∈ core(dom φ1 − dom φ2 ), then for every x ¯ ∈ dom φ1 ∩ dom φ2 , ∂(φ1 + φ2 )(¯ x) = ∂φ1 (¯ x) + ∂φ2 (¯ x).
© 2012 by Taylor & Francis Group, LLC
284
Sequential Optimality Conditions
Proof. By Definition 2.77 of the subdifferentials, it is easy to observe that the inclusion ∂(φ1 + φ2 )(¯ x) ⊃ ∂φ1 (¯ x) + ∂φ2 (¯ x) (7.4) always holds true. To prove the result, we will show the reverse inclusion in relation (7.4). Consider ξ ∈ ∂(φ1 + φ2 )(¯ x). Then by Theorem 7.1, for i = 1, 2, there exist xki → x ¯ and ξik ∈ ∂φi (xki ) such that ξ = lim (ξ1k + ξ2k ) k→∞
and
γik = φi (xki ) − hξik , xki − x ¯i → φi (¯ x).
(7.5)
Denote ξ k = ξ1k + ξ2k . As 0 ∈ core(dom φ1 − dom φ2 ), by Definition 2.17, for any y ∈ Rn and y 6= 0, there exist α > 0 and xi ∈ dom φi , i = 1, 2, such that αy = x1 − x2 . By the convexity of φi , i = 1, 2 along with (7.5), hξ1k , αyi
= hξ1k , x1 − xk1 i + hξ1k , xk1 − x ¯i + hξ1k , x ¯ − x2 i
≤ φ1 (x1 ) − φ(xk1 ) + hξ1k , xk1 − x ¯i + hξ1k , x ¯ − x2 i k k k = φ1 (x1 ) − γ1 + hξ , x ¯ − x2 i + hξ2 , x2 − xk2 i + hξ2k , xk2 − x ¯i k k k k k ≤ φ1 (x1 ) − γ1 + hξ , x ¯ − x2 i + φ2 (x2 ) − φ2 (x2 ) + hξ2 , x2 − x ¯i =
(φ1 (x1 ) − γ1k ) + (φ2 (x2 ) − γ2k ) + hξ k , x ¯ − x2 i.
As the limit k → +∞, using the conditions (7.5), (φ1 (x1 ) − γ1k ) + (φ2 (x2 ) − γ2k ) + hξ k , x ¯ − x2 i
→ (φ1 (x1 ) − φ1 (¯ x)) + (φ2 (x2 ) − φ2 (¯ x)) + hξ, x ¯ − x2 i.
Therefore, hξ1k , yi ≤
My , ∀ k ∈ N, α
My which is independent of k. Simiα k larly, the sequence {hξ1 , −yi} is bounded above. In particular, taking y = ei , i = 1, 2, . . . , n, where ei is a vector in Rn with i-th component 1 and all other zeroes, that is, {hξ1k , yi} is bounded above by
kξ1k k∞ =
max
i=1,2,...,n
|hξ1k , ei i| ≤
max
i=1,2,...,n
|Mi |.
Thus, {ξ1k } is a bounded sequence. As ξ1k + ξ2k → ξ, {ξ2k } is also a bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, the sequences {ξik }, i = 1, 2, have a convergent subsequence. Without loss of generality, assume that ξik → ξi , i = 1, 2, such that ξ1 + ξ2 = ξ. By Theorem 2.84, ξi ∈ ∂φi (¯ x), i = 1, 2, thereby yielding ∂(φ1 + φ2 )(¯ x) ⊂ ∂φ1 (¯ x) + ∂φ2 (¯ x),
© 2012 by Taylor & Francis Group, LLC
7.2 Sequential Optimality: Thibault’s Approach which along with (7.4) leads to the desired result.
285
Consider the convex optimization problem min f (x)
subject to
x ∈ C,
(CP )
¯ is a proper lsc convex function and C ⊂ Rn is a closed where f : Rn → R convex set. We shall now provide the sequential optimality condition for (CP ) as an application to Theorem 7.1. Theorem 7.3 Consider the convex optimization problem (CP ) where ¯ is an extended-valued proper lsc convex function. Then x f : Rn → R ¯ is a point of minimizer of (CP ) if and only if there exist xki → x ¯, i = 1, 2, with ξ1k ∈ ∂f (xk1 ) and ξ2k ∈ NC (xk2 ) such that ξ1k + ξ2k → 0,
f (xk1 ) − hξ1k , xk1 − x ¯i → f (¯ x)
and
hξ2k , xk2 − x ¯i → 0.
Proof. Observe that (CP ) is equivalent to the unconstrained problem min (f + δC )(x)
subject to
x ∈ Rn .
By the optimality condition for the unconstrained programming problem, Theorem 2.89, x ¯ is a minimum to (CP ) if and only if 0 ∈ ∂(f + δC )(¯ x). Applying Theorem 7.1, there exist sequence {xki } ⊂ Rn with xki → x ¯, i = 1, 2, ξ1k ∈ ∂f (xk1 ) and ξ2k ∈ δC (xk2 ) = NC (xk2 ) satisfying f (xk1 ) − hξ1k , xk1 − x ¯i → f (¯ x)
and
hξ2k , xk2 − x ¯i → 0
such that lim (ξ1k + ξ2k ) = 0,
k→∞
thereby yielding a sequential optimality condition.
It is important to note that the conditions on the problem data of (CP ) was minimal. The importance of the Sequential Sum Rule becomes clear because under the assumptions in (CP ), it is not obvious whether the qualification conditions needed to apply the exact Sum Rule holds or not. For the convex programming problem (CP ) with C given by (3.1), that is, C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}, it was discussed in Chapter 3 how the normal cones could be explicitly expressed in terms of the subdifferentials of the constraint functions gi , i = 1, 2, . . . , m, in presence of the Slater constraint qualification. But if the Slater constraint qualification is not satisfied, then how would one explicitly compute the normal cone. For that we first present the sequential Chain Rule
© 2012 by Taylor & Francis Group, LLC
286
Sequential Optimality Conditions
from Thibault [108] in a finite dimensional setting, a corollary to which plays a pivotal role in deriving the sequential optimality conditions, when C is explicitly given by convex inequalities. Note that in the following we will consider a vector-valued convex function Φ : Rn → Rm . This means that each component function of Φ is a real-valued convex function on Rn . Equivalently, Φ is convex if for every x1 , x2 ∈ Rn and for every λ ∈ [0, 1], (1 − λ)Φ(x1 ) + λΦ(x2 ) − Φ((1 − λ)x1 + λx2 ) ∈ Rm +.
(7.6)
The epigraph of Φ, epi Φ, is defined as epi Φ = {(x, µ) ∈ Rn × Rm : µ ∈ Φ(x) + Rm + }.
¯ is said to be nondecreasing on a set F ⊂ Rm if for A function φ : Rm → R every y1 , y2 ∈ F , φ(y1 ) ≤ φ(y2 )
whenever
y2 − y1 ∈ Rm +.
Consider a vector-valued convex function Φ and let φ be nondecreasing convex n function on Φ(Rn ) + Rm + . By the convexity of Φ, for every x1 , x2 ∈ R and for every λ ∈ [0, 1], the condition (7.6) leads to n m (1 − λ)Φ(x1 ) + λΦ(x2 ) ∈ Φ((1 − λ)x1 + λx2 ) + Rm + ⊂ Φ(R ) + R+ .
Also, Φ((1 − λ)x1 + λx2 ) ∈ Φ(Rn ) ⊂ Φ(Rn ) + Rm +.
As φ is a nondecreasing function on Φ(Rn ) + Rm + , by the convexity of Φ, (7.6) implies that φ(Φ((1 − λ)x1 + λx2 )) ≤ φ((1 − λ)Φ(x1 ) + λΦ(x2 )).
By the convexity of φ, for every x1 , x2 ∈ Rn ,
φ(Φ((1 − λ)x1 + λx2 )) ≤ (1 − λ)φ(Φ(x1 )) + λφ(Φ(x2 )), ∀ λ ∈ [0, 1], that is, for every λ ∈ [0, 1], (φ ◦ Φ)((1 − λ)x1 + λx2 ) ≤ (1 − λ)(φ ◦ Φ)(x1 ) + λ(φ ◦ Φ)(x2 ). Hence, (φ ◦ Φ) is a convex function. Below we present the Sequential Chain Rule from Thibault [108]. Theorem 7.4 (Sequential Chain Rule) Consider a vector-valued convex func¯ that is nontion Φ : Rn → Rm and a proper lsc convex function φ : Rm → R n m decreasing over Φ(R ) + R+ . Then for y¯ = Φ(¯ x) ∈ dom φ, ξ ∈ ∂(φ ◦ Φ)(¯ x) if and only if there exist xk → x ¯, yk → y¯, ξk → ξ, τk → 0, yk′ ∈ Φ(xk ) + Rm + with yk′ → Φ(¯ x) and ηk ∈ Rm + such that ηk + τk ∈ ∂φ(yk ),
ξk ∈ ∂(ηk Φ)(xk ),
hηk , yk′ i → hηk , Φ(xk )i
and φ(yk ) − hηk , yk − y¯i → φ(¯ y)
© 2012 by Taylor & Francis Group, LLC
and
hηk , Φ(xk ) − y¯i → 0.
7.2 Sequential Optimality: Thibault’s Approach
287
Proof. Define φ1 (x, y) = φ(y) and φ2 (x, y) = δepi Φ (x, y). We claim that ξ ∈ ∂(φ ◦ Φ)(¯ x)
if and only if
(ξ, 0) ∈ ∂(φ1 + φ2 )(¯ x, y¯).
(7.7)
Suppose that ξ ∈ ∂(φ ◦ Φ)(¯ x), which by Definition 2.77 of the subdifferential implies that φ(Φ(x)) − φ(Φ(¯ x)) = (φ ◦ Φ)(x) − (φ ◦ Φ)(¯ x) ≥ hξ, x − x ¯i, ∀ x ∈ Rn . m Consider (x, y) ∈ epi Φ, which implies y − Φ(x) ∈ Rm + , that is, y ∈ Φ(x) + R+ . n m Because φ is nondecreasing over Φ(R ) + R+ , φ(y) ≥ φ(Φ(x)) for every (x, y) ∈ epi Φ. Therefore, the above condition leads to
φ(y) − φ(¯ y ) ≥ hξ, x − x ¯i, ∀ (x, y) ∈ epi Φ, where y¯ = Φ(¯ x). From the definition of φ1 and φ2 , for every (x, y) ∈ Rn × Rm the above condition leads to φ1 (x, y) + φ2 (x, y) − φ1 (¯ x, y¯) − φ2 (¯ x, y¯) ≥ hξ, x − x ¯i + h0, y − y¯i, thereby implying that (ξ, 0) ∈ ∂(φ1 + φ2 )(¯ x, y¯). Conversely, suppose that (ξ, 0) ∈ ∂(φ1 + φ2 )(¯ x, y¯), which by the definition of subdifferential implies that for every (x, y) ∈ Rn × Rm , (φ1 + φ2 )(x, y) − (φ1 + φ2 )(¯ x, y¯) ≥ hξ, x − x ¯i + h0, y − y¯i. The above inequality holds in particular for every (x, y) ∈ epi Φ. As (x, Φ(x)) ∈ epi Φ, the above inequality reduces to φ(Φ(x)) − φ(Φ(¯ x)) ≥ hξ, x − x ¯i, ∀ x ∈ Rn , which implies that ξ ∈ ∂(φ ◦ Φ)(¯ x), thereby establishing our claim (7.7). Now by Theorem 7.1, (0, βk ) + (ξk , θk ) → (ξ, 0), where βk ∈ ∂φ(yk ), (ξk , θk ) ∈ ∂δepi Φ (xk , yk′ ), yk → y¯, (xk , yk′ ) → (¯ x, y¯), φ(yk ) − hβk , yk − y¯i → φ(¯ y ), φ2 (xk , yk′ ) − hθk , yk′ − y¯i − hξk , xk − x ¯i → φ2 (¯ x, y¯) = 0. Set θk = −ηk and define τk = βk − ηk . Observe that ξk → ξ and τk = βk + θk → 0. The preceding facts can thus be written as (0, ηk + τk ) + (ξk , −ηk ) → (ξ, 0),
© 2012 by Taylor & Francis Group, LLC
288
Sequential Optimality Conditions
with (ξk , −ηk ) ∈ ∂δepi Φ (xk , yk′ ), (xk , yk′ ) → (¯ x, y¯),
ηk + τk ∈ ∂φ(yk ), yk → y¯,
φ(yk ) − hηk + τk , yk − y¯i → φ(¯ y ), ′ ′ φ2 (xk , yk ) + hηk , yk − y¯i − hξk , xk − x ¯i → φ2 (¯ x, y¯) = 0.
(7.8) (7.9)
As (xk , yk′ ) ∈ epi Φ, φ2 (xk , yk′ ) = 0, which along with ξk → ξ and xk → x ¯ reduces (7.9) to hηk , yk′ − y¯i → 0. (7.10) Also, (ξk , −ηk ) ∈ ∂δepi Φ (xk , yk′ ) implies that
hξk , x − xk i − hηk , y − yk′ i ≤ 0, ∀ (x, y) ∈ epi Φ.
(7.11)
Observe that (xk , yk′ ) ∈ epi Φ, which implies that yk′ − Φ(xk ) ∈ Rm +. Therefore, for any y ′ ∈ Rm +, y ′ + yk′ − Φ(xk ) ∈ Rm +, that is, (xk , y ′ +yk′ ) ∈ epi Φ. In particular, taking x = xk and setting y = yk′ +y ′ for any y ′ ∈ Rm + in (7.11) yields hηk , y ′ i ≥ 0, ∀ y ′ ∈ Rm +, which implies that ηk ∈ Rm + . Taking x = xk and y = Φ(xk ) in (7.11), hηk , yk′ − Φ(xk )i ≤ 0 which along with the facts that ηk ∈ Rm + and ′ yk′ − Φ(xk ) ∈ Rm + leads to hηk , yk − Φ(xk )i = 0. Therefore, (7.11) is equivalent to hξk , x − xk i ≤ hηk , yi − hηk , Φ(xk )i, ∀ (x, y) ∈ epi Φ. In particular, for y = Φ(x), hξk , x − xk i ≤ (ηk Φ)(x) − (ηk Φ)(xk ), ∀ x ∈ Rn . Observe that as ηk ∈ Rm + , (ηk Φ) is a convex function and thus the above inequality implies that (7.11) is equivalent to ξk ∈ ∂(ηk Φ)(xk ). Also from the condition hηk , yk′ − Φ(xk )i = 0, we have hηk , yk′ i = hηk , Φ(xk )i. Inserting this fact in (7.10) leads to hηk , Φ(xk ) − y¯i → 0. Because τk → 0, (7.8) is equivalent to φ(yk ) − hηk , yk − y¯i → φ(¯ y ),
© 2012 by Taylor & Francis Group, LLC
7.2 Sequential Optimality: Thibault’s Approach thereby establishing the result.
289
Next we present the Sequential Chain Rule in a simpler form using the above theorem and a lemma for which one will require the notion of the Clarke subdifferential. Recall that at any x ∈ Rn the Clarke generalized gradient or Clarke subdifferential is given as ˜ ∂ ◦ φ(x) = co {ξ ∈ Rn : ξ = lim ∇φ(xk ) where xk → x, xk ∈ D}, k→∞
˜ denotes the set of points at which φ is differentiable. The Sum where D Rule from Clarke [27] is as follows. Consider two locally Lipschitz functions φ1 , φ2 : Rn → R. Then for λ1 , λ2 ∈ R, ∂ ◦ (λ1 φ1 + λ2 φ2 )(x) ⊂ λ1 ∂ ◦ φ1 (x) + λ2 ∂ ◦ φ2 (x). For a convex function φ, the convex subdifferential and the Clarke subdifferential coincide, that is, ∂φ(¯ x) = ∂ ◦ φ(¯ x). Now we present the lemma that plays an important role in obtaining the Sequential Chain Rule. Lemma 7.5 Consider a locally Lipschitz vector-valued function Φ : Rn → Rm . Suppose that there exist {λk } ⊂ Rm and {xk } ⊂ Rn with λk → 0 and xk → x ¯ such that ωk ∈ ∂ ◦ (λk Φ)(xk ), ∀ k ∈ N. Then ωk → 0. Proof. By the Clarke Sum Rule, ωk ∈ ∂ ◦ (λk Φ)(xk ) ⊂
m X i=1
λki ∂ ◦ φi (xk ), ∀ k ∈ N,
where Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)) and φi : Rn → R, i = 1, 2, . . . , m, are locally Lipschitz functions. From the above condition, there exist ωik ∈ ∂ ◦ φi (xk ), i = 1, 2, . . . , m, such that ωk =
m X
λki ωik .
i=1
Therefore, kωk k = k
m X i=1
λki ωik k ≤
m X i=1
|λki | kωik k.
Because the Clarke subdifferential ∂ ◦ φi (xk ), i = 1, 2, . . . , m, are compact, {ωik }, i = 1, 2, . . . , m, are bounded sequences and hence by the Bolzano– Weierstrass Theorem, Proposition 1.3, have a convergent subsequence. Without loss of generality, assume that ωik → ωi , i = 1, 2, . . . , m. Because the
© 2012 by Taylor & Francis Group, LLC
290
Sequential Optimality Conditions
Clarke subdifferential as a set-valued map has a closed graph, ωi ∈ ∂ ◦ φi (¯ x), i = 1, 2, . . . , m. Also, as λk → 0, |λki | → 0. Hence, by the compactness of the Clarke subdifferential, for i = 1, 2, . . . , m, kωi k ≤ Mi < +∞. Therefore, m X i=1
|λki |
kωik k
→
m X
0(Mi ) = 0,
i=1
thus implying that ωk → 0.
Theorem 7.6 (A simpler version of the Sequential Chain Rule) Consider a vector-valued convex function Φ : Rn → Rm and a proper lsc convex ¯ that is nondecreasing over Φ(Rn ) + Rm . Then for function φ : Rm → R + y¯ = Φ(¯ x) ∈ dom φ, ξ ∈ ∂(φ ◦ Φ)(¯ x) if and only if there exist ηk ∈ ∂φ(yk )
and
ξk ∈ ∂(ηk Φ)(xk )
satisfying xk → x ¯, yk → Φ(¯ x), ξk → ξ, φ(yk ) − hηk , yk − y¯i → φ(¯ y ) and hηk , Φ(xk ) − y¯i → 0. Proof. Consider ξ ∈ ∂(φ ◦ Φ)(¯ x). Suppose that xk , yk , yk′ , ξk , ηk , and τk are as in Theorem 7.4. Denote ζk = ηk + τk . Observe that for every k ∈ N, ξk ∈ ∂(ηk Φ)(xk )
and
ηk Φ = ζk Φ + (ηk − ζk )Φ,
with ηk − ζk = −τk → 0. Because Φ is convex and every component is locally Lipschitz, it is simple to show that Φ is also locally Lipschitz. As xk → x ¯, for sufficiently large k, xk ∈ N (¯ x) where N (¯ x) is a neighborhood of ¯ > 0. x ¯ on which Φ satisfies the Lipschitz property with Lipschitz constant L Hence, (ηk − ζk )Φ is also locally Lipschitz over N (xk ) with Lipschitz constant ¯ k − ζk k > 0. This follows from the fact that for any x, x′ ∈ N (xk ), Lkη |(ηk − ζk )Φ(x) − (ηk − ζk )Φ(x′ )| = |h(ηk − ζk ), Φ(x) − Φ(x′ )i|, which by the Cauchy–Schwarz inequality, Proposition 1.1, implies that |(ηk − ζk )Φ(x) − (ηk − ζk )Φ(x′ )|
≤ kηk − ζk k kΦ(x) − Φ(x′ )k ¯ k − ζk k kx − x′ k. ≤ Lkη
From Theorem 7.4, ηk ∈ Rm + , which implies (ηk Φ) is convex. However, (ζk Φ) and ((ηk − ζk )Φ) need not be convex. Thus, ξk ∈ ∂(ηk Φ)(xk ) implies ξk ∈ ∂ ◦ (ηk Φ)(xk ) = ∂ ◦ {ζk Φ + (ηk − ζk )Φ}(xk ). By the Clarke Sum Rule, ξk ∈ ∂ ◦ (ζk Φ)(xk ) + ∂ ◦ ((ηk − ζk )Φ)(xk ),
© 2012 by Taylor & Francis Group, LLC
7.2 Sequential Optimality: Thibault’s Approach
291
which implies that there exist ρk ∈ ∂ ◦ (ζk Φ)(xk ) and ̺ ∈ ∂ ◦ ((ηk − ζk )Φ)(xk ) such that ξk = ρk + ̺k . As ηk − ζk → 0 and xk → x ¯, by Lemma 7.5, ̺k → 0. Setting ̺k = −βk , ρk = ξk + βk ∈ ∂ ◦ (ζk Φ)(xk ). As the limit k → +∞, ρk → ξ, ζk = ηk + τk ∈ ∂φ(yk ), hζk , Φ(xk ) − y¯i → 0, and φ(yk ) − hζk , yk − y¯i → φ(¯ y ), thereby yielding the desired conditions. Conversely, suppose that conditions are satisfied. As ηk ∈ ∂φ(yk ), φ(y) − φ(yk ) ≥ hηk , y − yk i, ∀ y ∈ Rm , which implies that φ(y) ≥ φ(yk ) − hηk , yk − y¯i + hηk , y − y¯i, ∀ y ∈ Rm . In particular, for y = Φ(x) for every x ∈ Rn , the above inequality yields that for every x ∈ Rn , (φ ◦ Φ)(x) ≥ φ(yk ) − hηk , yk − y¯i + hηk , Φ(x) − y¯i = φ(yk ) − hηk , yk − y¯i + hηk , Φ(x) − Φ(xk )i + hηk , Φ(xk ) − y¯i
= φ(yk ) − hηk , yk − y¯i + (ηk ◦ Φ)(x) − (η ◦ Φ)(xk ) + hηk , Φ(xk ) − y¯i.
Because ξk ∈ ∂(ηk ◦ Φ)(xk ), for every x ∈ Rn , (φ ◦ Φ)(x) ≥ φ(yk ) − hηk , yk − y¯i + hξk , x − xk i + hηk , Φ(xk ) − y¯i, which as the limit k → +∞ reduces to ¯ x−x (φ ◦ Φ)(x) ≥ φ(¯ y ) + hξ, ¯i ¯ = (φ ◦ Φ)(¯ x) + hξ, x − x ¯i, ∀ x ∈ Rn . Thus, ξ ∈ ∂(φ ◦ Φ)(¯ x), thereby completing the result.
Now we move on to establish the sequential optimality condition for (CP ) with the real-valued convex objective function f : Rn → R obtained by Thibault [108] using the above theorem. To apply the result, we equivalently expressed the feasible set C as C = {x ∈ Rn : G(x) ∈ −Rm + }, where G(x) = (g1 (x), g2 (x), . . . , gm (x)). Observe that G : Rn → Rm is a vector-valued convex function as gi , i = 1, 2, . . . , m, are convex. Now using the sequential subdifferential calculus rules, Theorems 7.1 and 7.6, we present the sequential optimality conditions for the constrained problem (CP ).
© 2012 by Taylor & Francis Group, LLC
292
Sequential Optimality Conditions
Theorem 7.7 Consider the convex programming problem problem (CP ) with C given by (3.1). Then x ¯ ∈ C is a point of minimizer of (CP ) if and only if there exist xk → x ¯, yk → G(¯ x), λk ∈ Rm x), and ξk ∈ ∂(λk G)(xk ) + , ξ ∈ ∂f (¯ such that ξ + ξk → 0, hλk , yk i = 0, hλk , yk − G(¯ x)i → 0, hλk , G(xk ) − G(¯ x)i → 0. Proof. Observe that the problem (CP ) can be rewritten as the unconstrained problem min(f + (δ−Rm ◦ G))(x) +
subject to
x ∈ Rn .
By Theorem 2.89, x ¯ is a point of minimizer of (CP ) if and only if 0 ∈ ∂(f + (δ−Rm ◦ G))(¯ x). +
As dom f = Rn , invoking the Sum Rule, Theorem 2.91, 0 ∈ ∂f (¯ x) + ∂(δ−Rm ◦ G)(¯ x), + which is equivalent to the existence of ξ ∈ ∂f (¯ x) and ξˆ ∈ ∂(δ−Rm ◦ G)(¯ x) such + that ξ + ξˆ = 0. As G is a convex function and the indicator function δ−Rm is a proper lsc con+ vex function nondecreasing over G(Rn ) + Rm , thereby applying Theorem 7.6, + ˆ m ξ ∈ ∂(δ−R+ ◦ G)(¯ x) if and only if there exist xk → x ¯, yk → G(¯ x), ξk → ξˆ such that λk ∈ ∂δ−Rm (yk ) +
and ξk ∈ ∂(λk G)(xk ),
satisfying δ−Rm (yk ) − hλk , yk − G(¯ x)i → (δ−Rm ◦ G)(¯ x) and hλk , G(xk ) − G(¯ x)i → 0. + + As λk ∈ ∂δ−Rm (yk ) = N−Rm (yk ), the sequence {yk } ⊂ −Rm + with + + hλk , y − yk i ≤ 0, ∀ y ∈ −Rm +.
In particular, taking y = 0 and y = 2yk in the above inequality leads to hλk , yk i = 0. Thus, hλk , yi ≤ 0, ∀ y ∈ −Rm +,
m which implies {λk } ⊂ Rm + . Using the fact that {yk } ⊆ −R+ , the condition
δ−Rm (yk ) − hλk , yk − G(¯ x)i → (δ−Rm ◦ G)(¯ x) + + reduces to hλk , yk − G(¯ x)i → 0, thereby leading to the requisite result.
© 2012 by Taylor & Francis Group, LLC
7.3 Fenchel Conjugates and Constraint Qualification
7.3
293
Fenchel Conjugates and Constraint Qualification
Observe that in the previous section, the sequential optimality conditions are obtained in terms of the subdifferentials that are calculated at some neighboring point rather than the exact point of minimum, as is the case in the standard KKT conditions. But this can be overcome by using the Brøndsted– Rockafellar Theorem, Theorem 2.114, thereby expressing the result in terms of the ε-subdifferentails at the exact point. To the best of our knowledge this was carried out by Jeyakumar, Rubinov, Glover, and Ishizuka [70] and Jeyakumar, Lee, and Dinh [68]. In their approach, the epigraph of the conjugate function of the objective function and the constraints play a central role in the characterization of the optimality for the convex programming problem (CP ). The proof is based on the result in Jeyakumar, Rubinov, Glover, and Ishizuka [70]. Theorem 7.8 Consider the convex programming problem (CP ) with C given by (3.1). Then x ¯ is a point of minimizer of (CP ) if and only if (0, −f (¯ x)) ∈ epi f ∗ + cl
[
λ∈Rm +
m X epi ( λi gi )∗ .
(7.12)
i=1
Proof. Recall that the feasible set C of the convex programming problem (CP ) is given by (3.1), that is, C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}, which can be equivalently expressed as C = {x ∈ Rn : G(x) ∈ −Rm + }, where G : Rn → Rm is defined as G(x) = (g1 (x), g2 (x), . . . , gm (x)). Because gi , i = 1, 2, . . . , m, are convex functions, G is also a convex function. x ¯ is a point of minimizer of (CP ) if and only if f (x) ≥ f (¯ x), ∀ x ∈ C, that is, φ(x) + δC (x) ≥ 0, ∀ x ∈ Rn , where φ(x) = f (x) − f (¯ x). By Definition 2.101 of the conjugate function, (φ + δC )∗ (0)
=
sup {h0, xi − (φ + δC )(x)}
x∈Rn
=
sup −(φ + δC )(x) ≤ 0.
x∈Rn
© 2012 by Taylor & Francis Group, LLC
294
Sequential Optimality Conditions
This shows that (0, 0) ∈ epi (φ + δC )∗ , which by the epigraph of the conjugate of the sum, Theorem 2.123, implies that ∗ (0, 0) ∈ cl{epi φ∗ + epi δC }.
As dom φ = Rn , by Theorem 2.69, φ is continuous on Rn . Hence, by Propo∗ sition 2.124, epi φ∗ + epi δC is closed, which reduces the above condition to ∗ (0, 0) ∈ epi φ∗ + epi δC . (7.13) Consider (λG)(x) = hλ, G(x)i. m For x ∈ C, G(x) ∈ −Rm + , which implies (λG)(x) ≤ 0 for every λ ∈ R+ . Thus,
sup (λG)(x) = 0.
(7.14)
λ∈Rm +
If x ∈ / C, there exists some i ∈ {1, 2, . . . , m} such that gi (x) > 0. Hence, it is simple to see that sup (λG)(x) = +∞. (7.15) λ∈Rm +
Combining (7.14) and (7.15), δC (x) = sup (λG)(x). λ∈Rm +
Applying Theorem 2.123, relation (7.13) along with Proposition 2.103 yields [ (0, 0) ∈ epi f ∗ + (0, f (¯ x)) + cl co epi(λG)∗ . λ∈Rm +
By Theorem 2.123, relation reduces to (0, 0)
S
∈
λ∈Rm +
epi(λG)∗ is a convex cone and thus, the above
epi f ∗ + (0, f (¯ x)) + cl
= epi f ∗ + (0, f (¯ x)) + cl
[
[
λ∈Rm +
thereby leading to the requisite condition (7.12).
© 2012 by Taylor & Francis Group, LLC
epi(λG)∗
λ∈Rm + m X epi( λi gi )∗ , i=1
7.3 Fenchel Conjugates and Constraint Qualification
295
Conversely, suppose that condition (7.12) holds, which implies exPmthere k ∗ ist ξ ∈ dom f ∗ , α ≥ 0, αk ≥ 0, {λk } ⊂ Rm , {ξ } ⊂ dom ( λ g k + i=1 i i ) , i = 1, 2, . . . , m, such that ∗
(0, −f (¯ x)) = (ξ, f (ξ) + α) + lim (ξk , k→∞
m X
(λki gi )∗ (ξk ) + αk ).
i=1
Componentwise comparison leads to 0 −f (¯ x)
= ξ + lim ξk ,
(7.16)
k→∞
m X = f ∗ (ξ) + α + lim ( (λki gi )∗ (ξk ) + αk ). k→∞
(7.17)
i=1
By Definition 2.101 of the conjugate functions, the condition (7.17) implies that for every x ∈ Rn , f (¯ x) − f (x)
≤ ≤
m X −hξ, xi − α − lim ( (λki gi )∗ (ξk ) + αk ) k→∞
i=1
−hξ, xi − α − lim (hξk , xi − k→∞
m X
λki gi (x) + αk ).
i=1
In particular, taking x ∈ C, that is, gi (x) ≤ 0, i = 1, 2, . . . , m, in the above inequality along with the nonnegativity of α, αk , λki , i = 1, 2, . . . , m, and the condition (7.16) yields f (¯ x) ≤ f (x), ∀ x ∈ C. Therefore, x ¯ is a point of minimizer of (CP ), as desired.
As one can express the epigraph of conjugate functions in terms of the ε-subdifferential of the function, Theorem 2.122, Jeyakumar et al. [70, 68] expressed the above theorem in terms of the ε-subdifferentials, thus obtaining the sequential optimality conditions presented below. We present the same using the condition (7.12) obtained in Theorem 7.8. Theorem 7.9 Consider the convex programming problem (CP ) with C given by (3.1). Then x ¯ is a point of minimizer for (CP ) if and only if there exist ξ ∈ ∂f (¯ x), εki ≥ 0, λki ≥ 0, ξik ∈ ∂εki gi (¯ x), i = 1, 2, . . . , m, such that ξ+
m X i=1
λki ξik → 0,
m X i=1
λki gi (¯ x) → 0
and
m X i=1
λki εki ↓ 0
as
k → +∞.
Proof. Consider Theorem 7.8, according to which x ¯ is a point of minimizer of (CP ) if and only if the containment (7.12) is satisfied. By Theorem 2.122,
© 2012 by Taylor & Francis Group, LLC
296
Sequential Optimality Conditions Pm there exist ξ ∈ ∂ε f (¯ x), λki ≥ 0 and ξk ∈ ∂εk ( i=1 λki gi )(¯ x), i = 1, 2, . . . , m, with ε, εk ≥ 0 such that m X (0, −f (¯ x)) = (ξ, hξ, x ¯i + ε − f (¯ x)) + lim (ξk , hξk , x ¯ i + εk − ( λki gi )(¯ x)). k→∞
i=1
Componentwise comparison leads to 0
=
ξ + lim ξk , k→∞
m X −ε = hξ, x ¯i + lim (hξk , x ¯ i + εk − ( λki gi )(¯ x)), k→∞
i=1
which together imply that m X −ε = lim (εk − ( λki gi )(¯ x)). k→∞
i=1
This equation along with the nonnegativity of ε, εk , and λki , i = 1, 2, . . . , m, implies m X ε = 0, εk ↓ 0 and λki gi (¯ x) → 0. (7.18) i=1
n
As dom gi = R , i = 1, 2, . . . , m, by Theorem 2.115, there exist εki ≥ 0, i = 1, 2, . . . , m, such that ξk ∈
m X
∂εki (λki gi )(¯ x)
and
i=1
εk =
m X
εki .
i=1
Define I¯k = {i ∈ {1, 2, . . . , m} : λki > 0}. By Theorem 2.117, ∂εki (λki gi )(¯ x) = λki ∂ε¯ki gi (¯ x), ∀ i ∈ I¯k , where ε¯ki =
εki ≥ 0. Therefore, λki X X ξk ∈ λki ∂ε¯ki gi (¯ x) + ∂εki (λki gi )(¯ x). i∈I¯k
(7.19)
i∈ / I¯k
As discussed in Chapter 2, the ε-subdifferential of zero function is zero, that is, ∂ε 0(x) = {0}. Thus, ∂εki (λki gi )(¯ x) = 0 = λki ∂εki gi (¯ x), ∀ i ∈ / I¯k . The above relation along with the condition (7.19) yields that X X ξk ∈ λki ∂ε¯ki gi (¯ x) + λki ∂εki gi (¯ x). i∈I¯k
© 2012 by Taylor & Francis Group, LLC
i∈ / I¯k
(7.20)
7.3 Fenchel Conjugates and Constraint Qualification
297
Also, εk =
X
λki ε¯ki +
i∈I¯k
X
λki εki ,
i∈ / I¯k
which along with (7.20) leads to the desired sequential optimality conditions. Conversely, suppose that the sequential optimality conditions hold. From Definitions 2.77 and 2.109 of subdifferentials and ε-subdifferentials, f (x) − f (¯ x) ≥ hξ, x − x ¯i, k gi (x) − gi (¯ x) ≥ hξi , x − x ¯i − εki , i = 1, 2, . . . , m, respectively. The above inequalities along with the sequential optimality conditions imply that f (x) − f (¯ x) +
m X i=1
λki gi (x) ≥ 0, ∀ x ∈ Rn ,
where {λki } ⊂ R+ , i = 1, 2, . . . , m. In particular, taking x ∈ C, that is, gi (x) ≤ 0, i = 1, 2, . . . , m, which along with the condition on {λk } reduces the above inequality to f (x) ≥ f (¯ x), ∀ x ∈ C, thereby establishing the optimality of x¯ for (CP ).
Observe that not only is the optimality condition sequential, but one obtains a sequential complementary slackness condition. Note that we are working in a simple scenario with a convex inequality system. This helps in expressing the condition (7.12) derived in Theorem 7.8 in a more relaxed form. By applying Theorem 2.123, the condition becomes (0, −f (¯ x)) ∈ epi f ∗ + cl
[
λ∈Rm +
m X epi (λi gi )∗ ). cl(
(7.21)
i=1
By the closure properties of the arbitrary union of sets, the condition (7.21) leads to m [ X (0, −f (¯ x)) ∈ epi f ∗ + cl epi (λi gi )∗ . (7.22) i=1 λ∈Rm +
Define I¯λ = {i ∈ {1, 2, . . . , m} : λi > 0}. Again by Theorem 2.123, epi (λi gi )∗ = λi epi gi∗ , ∀ i ∈ I¯λ . For i 6∈ I¯λ with λi = 0, ∗
∗
(λi gi ) (ξ) = 0 (ξ) =
© 2012 by Taylor & Francis Group, LLC
0, ξ = 0, +∞, otherwise,
298
Sequential Optimality Conditions
which implies that epi (λi gi )∗ = {0} × R+ , ∀ i ∈ / I¯λ . Using the preceding conditions, the relation (7.22) becomes [ X X (0, −f (¯ x)) ∈ epi f ∗ + cl ( λi epi gi∗ + {0} × R+ ) ¯ λ∈Rm + i∈Iλ
= epi f ∗ + cl
[
(
X
i∈I¯λ
λ∈Rm +
i∈ / I¯λ
λi epi gi∗ + {0} × R+ ).
(7.23)
P Now consider (ξ, α) ∈ i∈I¯λ λi epi gi∗ , which implies that for i ∈ I¯λ there exist (ξi , αi ) ∈ epi gi∗ such that X (ξ, α) = λi (ξi , αi ). i∈I¯λ
Therefore, for any element (0, α) ¯ ∈ {0} × R+ , X (ξ, α + α) ¯ = λi (ξi , αi + α ¯ /λi ), i∈I¯λ
where
α ¯ ≥ 0. As (ξi , αi ) ∈ epi gi∗ , λi gi∗ (ξi ) ≤ αi ≤ αi +
α ¯ , ∀ i ∈ I¯λ , λi
which implies that (ξi , αi + α ¯ /λi ) ∈ epi gi∗ . Hence (ξ, α + α ¯) ∈ for every α ¯ ≥ 0. Therefore, (7.23) reduces to ∗
(0, −f (¯ x)) ∈ epi f + cl
m [ X
λ∈Rm +
P
i∈I¯λ
λi epi gi∗
λi epi gi∗ .
i=1
It is quite simple to see that cl
m [ X
λi epi gi∗ = cl cone co
i=1 λ∈Rm +
m [
epi gi∗ .
i=1
We leave this as an exercise for the reader. Hence, (0, −f (¯ x)) ∈ epi f ∗ + cl cone co
m [
epi gi∗ .
(7.24)
i=1
The condition (7.24) implies that there exist ξ ∈ dom f ∗ , α ≥ 0, ξik ∈ dom gi∗ , αik , λki ≥ 0, i = 1, 2, . . . , m, such that (0, −f (¯ x)) = (ξ, f ∗ (ξ) + α) + lim
k→∞
© 2012 by Taylor & Francis Group, LLC
m X i=1
λki (ξik , gi∗ (ξik ) + αik ).
7.3 Fenchel Conjugates and Constraint Qualification
299
Componentwise comparison leads to 0
= ξ + lim
k→∞
m X
λki ξik ,
= f ∗ (ξ) + α + lim
−f (¯ x)
(7.25)
i=1
k→∞
m X
λki (gi∗ (ξik ) + αik ).
(7.26)
i=1
By Definition 2.101 of the conjugate functions, the condition (7.26) implies that f (¯ x) − f (x) ≤ ≤
−hξ, xi − α − lim
k→∞
−hξ, xi − α − lim
k→∞
m X
λki (gi∗ (ξik ) + αik )
i=1
m X i=1
λki (hξik , xi − gi (x) + αik ).
In particular, taking x ∈ C along with the nonnegativity of α, αik , and λki , i = 1, 2, . . . , m, and the condition (7.25) yields f (¯ x) ≤ f (x), ∀ x ∈ C. Therefore, x ¯ is a point of minimizer of (CP ) under the relation (7.24). This discussion can be stated as the following result. Theorem 7.10 Consider the convex programming problem (CP ) with C given by (3.1). Then x ¯ is a point of minimizer of (CP ) if and only if (0, −f (¯ x)) ∈ epi f ∗ + cl cone co
m [
epi gi∗ .
(7.27)
i=1
Using the above result, we present an alternate proof to the sequential optimality conditions, Theorem 7.9. Alternate proof of Theorem 7.9. According to the Theorem 7.10, x ¯ is a point of minimizer of (CP ) if and only if the containment (7.27) is satisfied. By Theorem 2.122, there exist ξ ∈ ∂ε f (¯ x), ξik ∈ ∂εki gi (¯ x) and λki ≥ 0, i = 1, 2, . . . , m, with ε, εki ≥ 0, i = 1, 2, . . . , m, such that (0, −f (¯ x)) = (ξ, hξ, x ¯i + ε − f (¯ x)) + lim
k→∞
m X i=1
λki (ξik , hξik , x ¯i + εki − gi (¯ x)).
Componentwise comparison leads to 0
=
ξ + lim
k→∞
m X
λki ξik ,
i=1
−ε = hξ, x ¯i + lim
k→∞
© 2012 by Taylor & Francis Group, LLC
m X i=1
λki (hξik , x ¯i + εki − gi (¯ x)),
300
Sequential Optimality Conditions
which together imply that −ε = lim
k→∞
m X i=1
λki (εki − gi (¯ x)).
k k This equation along the nonnegativity Pm with Pm of kε,k εi and λi , i = 1, 2, . . . , m, k implies ε = 0, i=1 λi gi (¯ x) → 0 and i=1 λi εi ↓ 0 as k → +∞, thereby establishing the sequential optimality conditions. The converse can be verified as in Theorem 7.9.
As already discussed in the previous chapters, if one assumes certain constraint qualifications, then the standard KKT conditions can be established. If we observe the necessary and sufficient condition given S Pmin Theorem 7.8 carefully, we will observe that the term cl λ∈Rm epi ( i=1 λi gi )∗ prevents us + from further manipulation. On the other hand, one might feel that the route to the KKT optimality conditions lies in further manipulation of the condition (7.12). Further, observe that we arrived at the condition (7.12) without any constraint qualification. However, in order to derive the KKT optimality conditions, one needs some additional qualification conditions on the constraints. Thus from (7.12) it is natural to consider that the set [
λ∈Rm +
m X λi gi )∗ epi (
is closed.
i=1
This is usually known as the closed cone constraint qualification or the Farkas– Minkowski (FM) constraint qualification. One may also take the more relaxed constraint qualification based on condition (7.27), that is, cone (co
m [
epi gi∗ )
is closed.
i=1
We will call the above constraint qualification as the relaxed FM constraint qualification. Below we derive the standard KKT conditions under either of the two constraint qualification. Theorem 7.11 Consider the convex programming problem (CP ) with C given by (3.1). Assume that either the FM constraint qualification holds or the relaxed FM constraint qualification holds. Then x ¯ is a point of minimizer of (CP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that 0 ∈ ∂f (¯ x) +
m X
λi ∂gi (¯ x)
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m.
i=1
Proof. From Theorem 7.8, we know that x ¯ is a point of minimizer of (CP ) if and only if the relation (7.12) holds. As the FM constraint qualification is
© 2012 by Taylor & Francis Group, LLC
7.3 Fenchel Conjugates and Constraint Qualification
301
satisfied, (7.12) reduces to (0, −f (¯ x)) ∈ epi f ∗ +
[
λ∈Rm +
m X epi ( λi gi )∗ . i=1
By the ε-subdifferential characterization of the epigraph of the conjugate function, Theorem ξ ∈ ∂ε f (¯ x), λi ≥ 0, i = 1, 2, . . . , m, and Pm 2.122, there exist ξ ′ ∈ ∂ε′ ( i=1 λi gi )(¯ x) with ε, ε′ ≥ 0 such that m X (0, −f (¯ x)) = (ξ, hξ, x ¯i + ε − f (¯ x)) + (ξ ′ , hξ ′ , x ¯ i + ε′ − ( λi gi )(¯ x)). i=1
Componentwise comparison leads to 0 −f (¯ x)
= ξ + ξ′,
(7.28) m X
= hξ, x ¯i + ε − f (¯ x) + hξ ′ , x ¯ i + ε′ − (
λi gi )(¯ x).
(7.29)
i=1
By the feasibility of x ¯ ∈ C along with the nonnegativity of ε, ε′ , and λi , i = 1, 2, . . . , m, the condition (7.29) leads to ε = 0,
ε′ = 0,
m X
and
λi gi (¯ x) = 0.
i=1
Because ε = 0 and ε′ = 0, the condition (7.28) lead to the fact that 0 ∈ ∂f (¯ x) + Further,
Pm
i=1
m X
λi ∂gi (¯ x).
(7.30)
i=1
λi gi (¯ x) = 0 implies that λi gi (¯ x) = 0, i = 1, 2, . . . , m.
(7.31)
The conditions (7.30) and (7.31) together yield the KKT optimality conditions. Now if the relaxed constraint qualification is satisfied, (7.27) reduces to ∗
(0, −f (¯ x)) ∈ epi f + cone co
m [
epi gi∗ .
i=1
By the ε-subdifferential characterization of the epigraph of the conjugate function, Theorem 2.122, there exist ξ ∈ ∂ε f (¯ x), ξi ∈ ∂εi gi (¯ x) and λi ≥ 0, i = 1, 2, . . . , m, with ε, εi ≥ 0, i = 1, 2, . . . , m, such that (0, −f (¯ x)) = (ξ, hξ, x ¯i + ε − f (¯ x)) +
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi (ξi , hξi , x ¯i + εi − gi (¯ x)).
302
Sequential Optimality Conditions
Componentwise comparison leads to 0
= ξ+
m X
λi ξi ,
(7.32)
i=1
−f (¯ x)
= hξ, x ¯i + ε − f (¯ x) +
m X i=1
λi (hξi , x ¯i + εi − gi (¯ x)).
(7.33)
By the feasibility of x ¯ ∈ C along with the nonnegativity of ε, εi , and λi , i = 1, 2, . . . , m, the condition (7.33) leads to ε = 0,
m X
λ i εi = 0
and
i=1
m X
λi gi (¯ x) = 0.
i=1
Let us assume that I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Then ¯ εi = 0 and gi (¯ corresponding to any i ∈ I, x) = 0, imply that ξ ∈ ∂f (¯ x),
ξi ∈ ∂gi (¯ x)
and
¯ λi gi (¯ x) = 0, i ∈ I.
(7.34)
Therefore, from (7.32) and (7.34), X X 0=ξ+ λi ξi ∈ ∂f (¯ x) + λi ∂gi (¯ x). i∈I¯
i∈I¯
¯ choose εi = 0, the above condition leads to For i 6∈ I, 0 ∈ ∂f (¯ x) +
m X
λi ∂gi (¯ x),
i=1
along with the complementary slackness condition λi gi (¯ x) = 0, i = 1, 2, . . . , m and thereby establishing the standard KKT optimality conditions. The reader should try to see how to arrive at the KKT optimality conditions when I¯ is empty. The sufficiency can be worked out using Definition 2.77 of subdifferentials, as done in Chapter 3. The proof of the KKT optimality conditions under the FM constraint qualification was given by Jeyakumar, Lee, and Dinh [68] and that using the relaxed FM condition is based on Jeyakumar [67]. It has been shown by Jeyakumar, Rubinov, Glover, and Ishizuka [70] that under the Slater constraint qualification, the FM constraint qualification holds. We present the result below proving the same. Proposition 7.12 Consider the set C given by (3.1). Assume that the Slater constraint qualification holds, that is, there exists x ˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m. Then the FM constraint qualification is satisfied, that is, [
λ∈Rm +
© 2012 by Taylor & Francis Group, LLC
m X epi ( λi gi )∗ i=1
is closed.
7.3 Fenchel Conjugates and Constraint Qualification
303
Proof. Observe that defining G = (g1 , g2 , . . . , gm ), [
λ∈Rm +
m X [ epi ( λi gi )∗ = epi (λG)∗ . λ∈Rm +
i=1
Suppose that (ξk , αk ) → (ξ, α) ∈ cl
[
epi (λG)∗
λ∈Rm +
m with (λk G)∗ (ξk ) ≤ αk for some λk ∈ Rm + . As int R+ is nonempty, one can m always find a compact convex set R ⊂ R+ such that 0 ∈ / R and cone R = Rm +. Thus, λk = γk bk , where γk ≥ 0 and bk ∈ R. Assume that γk ≥ 0 for every k and bk → b ∈ R by the compactness of R. We consider the following cases.
(i) γk → γ > 0: Consider (λk G)∗ (ξk ) ≤ αk
⇐⇒ (γk bk G)∗ ≤ αk
⇐⇒ (bk G)∗ (ξk /γk ) ≤ αk /γk .
Because bk G → bG, ξk /γk → ξ/γ and αk /γk → α/γ, (bG)∗ (ξ/γ) ≤ lim inf (bk G)∗ (ξk /γk ) ≤ α/γ. k→∞
Therefore, (ξ/γ, α/γ) ∈ epi (bG)∗ and hence (ξ, α) ∈
S
λ∈Rm +
epi (λG)∗ .
(ii) γk → +∞: Then ξk /γk → 0 and αk /γk → 0. Therefore, (bG)∗ (0) ≤ lim inf (bk G)∗ (ξk /γk ) ≤ 0, k→∞
which implies − infn (bG)(x) = sup (−(bG)(x)) ≤ 0, x∈R
x∈Rn
that is, (bG)(x) ≥ 0 for every x ∈ Rn . But by the Slater constraint qualification, G(ˆ x) ∈ −int Rm x) < 0, which is a + and b 6= 0. Therefore, (bG)(ˆ contradiction. (iii) γk → 0: This implies that λk → 0 and thus (λk G) → 0. Therefore, 0∗ (ξ) ≤ lim inf (λk G)∗ (ξk ) ≤ α. k→∞
Observe that ∗
′
0 (ξ ) =
© 2012 by Taylor & Francis Group, LLC
0, ξ ′ = 0, +∞, otherwise,
304
Sequential Optimality Conditions
which leads to ξ = 0 and α ≥ 0. Thus, (0, α) ∈ epi (0G)∗ ⊂
[
epi (λG)∗ .
λ∈Rm +
Therefore, the closed cone constraint qualification is satisfied.
Next we present some examples to show that the FM constraint qualification is weaker in comparison to the Slater constraint qualification. Consider C = {x ∈ R : g(x) ≤ 0}, where 2 x , x ≤ 0, g(x) = x, x ≥ 0. Observe that C = {0} and hence the Slater constraint qualification is not satisfied. Also TC (0) = {0} while S(0)
= {d ∈ R : g ′ (0, d) ≤ 0} = {d ∈ R : d ≤ 0},
which implies that the Abadie constraint qualification is also not satisfied. For ξ ∈ R, 0, 0 ≤ ξ ≤ 1, ∗ g (ξ) = ξ 2 /4, ξ ≤ 0. Observe that as only one constraint is involved, [ [ epi (λg)∗ = λ epi g ∗ = cone epi g ∗ . λ≥0
λ≥0
Therefore, the FM constraint qualification reduces to the set cone epi g ∗ being closed, which is same as the relaxed FM constraint qualification. Here, cone epi g ∗ = {(ξ, α) ∈ R2 : ξ ≤ 0, α > 0} ∪ {(ξ, α) ∈ R2 : ξ ≥ 0, α ≥ 0} is not closed and hence the FM constraint qualification is not satisfied. Now suppose that in the previous example, −2x, x ≤ 0, g(x) = x, x ≥ 0. Again, C = {0} and the Slater constraint qualification does not hold. But unlike the above example, TC (0) = {0} = S(0) which implies that the Abadie constraint qualification is satisfied. For ξ ∈ R, g ∗ (ξ) = 0, − 2 ≤ ξ ≤ 1. Observe that the set cone epi g ∗ = {(ξ, α) ∈ R2 : ξ ∈ R, α ≥ 0}
© 2012 by Taylor & Francis Group, LLC
7.3 Fenchel Conjugates and Constraint Qualification
305
is closed. Thus, the FM constraint qualification also holds. Observe that in the above examples, either both the Abadie constraint qualification and the FM qualification are satisfied or neither holds. Now let us consider an example from Jeyakumar, Lee, and Dinh [70] showing that the FM constraint qualification is weaker than the Abadie constraint qualification as well. Consider a convex function g : R2 → R defined as q g(x1 , x2 ) = x21 + x22 − x2 .
Here C = {(x1 , x2 ) ∈ R2 : x1 = 0, x2 ≥ 0}. Observe that the Slater constraint qualification does not hold as for any (x1 , x2 ) ∈ C, g(x1 , x2 ) = 0. For (0, x2 ), x2 > 0, g is differentiable at (0, x2 ) and hence S(0, x2 ) = R2
while
TC (0, x2 ) = {(0, x2 ) : x2 ∈ R}.
Thus the Abadie constraint qualification is also not satisfied. Now, for any (ξ1 , ξ2 ) ∈ R2 , 0, ξ1 = ξ2 = 0, ∗ g (ξ1 , ξ2 ) = +∞, otherwise. Therefore, cone epi g ∗ = {(0, 0)} × R+ , which is closed. Hence, the FM constraint qualification holds, thereby showing that it is a weaker constraint qualification with respect to the Slater and Abadie constraint qualifications. Until now we considered the convex programming problem (CP ) with a real-valued objective function f . This fact played an important role in the derivation of Theorem 7.8 as the continuity of f on Rn along with Proposition 2.124 leads to the closedness of ∗ epi f ∗ + epi δC .
¯ is a proper lsc convex function and C involves inequality But if f : Rn → R constraints and additionally an abstract constraint, that is, C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X}
(7.35)
where X ⊂ Rn is a closed convex set, then one has to impose an additional condition along with the closed cone constraint qualification to establish the KKT optimality condition, namely the CC qualification condition, that is, epi f ∗ +
[
λ∈Rm +
© 2012 by Taylor & Francis Group, LLC
m X ∗ epi ( λi gi )∗ + epi δX i=1
is closed.
306
Sequential Optimality Conditions
Next we present the KKT optimality condition in the presence of the CC qualification condition from Dinh, Nghia, and Vallet [34]. A similar result was established by Burachik and Jeyakumar [20] under the assumption of CC as well as FM constraint qualification. Theorem 7.13 Consider the convex programming problem (CP ) where ¯ is a proper lsc convex function and the feasible set C is given f : Rn → R by (7.35). Assume that the CC qualification condition is satisfied. Then x ¯ ∈ dom f ∩ C is a point of minimizer of (CP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that 0 ∈ ∂f (¯ x) +
m X
λi ∂gi (¯ x) + NX (¯ x)
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m.
i=1
Proof. Suppose that x ¯ ∈ dom f ∩ C is a point of minimizer of the problem (CP ). Then working along the lines of Theorem 7.8, we have ∗ (0, 0) ∈ cl {epi φ∗ + epi δC },
where φ(x) = f (x) − f (¯ x). Expressing C = C ∩ X implies that δC = δC + δX , where C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}. From the proof of Theorem 7.8, ∗ = cl epi δC
m X epi ( λi gi )∗ .
[
λ∈Rm +
i=1
Therefore, by Theorem 2.123 and Propositions 2.102 and 2.15 (vi), the above condition becomes (0, 0)
∗ ∗ + epi δX cl {epi φ∗ + cl (epi δC )} m [ X ∗ ⊂ cl {epi φ∗ + cl ( epi ( λi gi )∗ + epi δX )}
∈
λ∈Rm +
⊂ cl {epi φ∗ +
[
λ∈Rm +
i=1
m X
epi (
i=1
∗ λi gi )∗ + epi δX }.
By Propositions 2.103 and 2.15 (vi), the above yields (0, −f (¯ x)) ∈ cl {epi f ∗ +
© 2012 by Taylor & Francis Group, LLC
[
λ∈Rm +
m X ∗ epi ( λi gi )∗ + epi δX }, i=1
7.3 Fenchel Conjugates and Constraint Qualification
307
which under the CC qualification condition reduces to (0, −f (¯ x)) ∈ epi f ∗ +
[
λ∈Rm +
m X ∗ epi ( λi gi )∗ + epi δX . i=1
Pm Applying Theorem 2.122, there exist ξf ∈ ∂εf f (¯ x), ξg ∈ ∂εg ( i=1 λi gi )(¯ x), and ξx ∈ ∂εx δX (¯ x) = NX,εx (¯ x) with εf , εg , εx ≥ 0 such that 0 = ξf + ξg + ξx , (7.36) −f (¯ x) = (hξf , x ¯i − f (¯ x ) + εf ) m X +(hξg , x ¯i − ( λi gi )(¯ x) + εg ) + hξx , x ¯i + εx . (7.37) i=1
Condition (7.36) leads to m X 0 ∈ ∂εf f (¯ x) + ∂εg ( λi gi )(¯ x) + NX,εx (¯ x).
(7.38)
i=1
Condition (7.37) along with (7.36) and the nonnegativity conditions yields εf + εg + εx −
m X
λi gi (¯ x) = 0.
i=1
From the above condition it is obvious that εf = 0, εg = 0, εx = 0,
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m.
Therefore, the condition (7.38) reduces to m X 0 ∈ ∂f (¯ x) + ∂( λi gi )(¯ x) + NX (¯ x). i=1
n
As dom gi = R , i = 1, 2, . . . , m, by Theorem 2.91, the above condition becomes 0 ∈ ∂f (¯ x) +
m X
λi ∂gi (¯ x) + NX (¯ x),
i=1
which along with the complementary slackness condition yields the desired optimality conditions. Conversely, suppose that the optimality conditions hold. Therefore, there exist ξ ∈ ∂f (¯ x) and ξi ∈ ∂gi (¯ x) such that −ξ −
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi ξi ∈ NX (¯ x),
308
Sequential Optimality Conditions
that is, hξ +
m X i=1
λi ξi , x − x ¯i ≥ 0, ∀ x ∈ X.
The convexity of f and gi , i = 1, 2, . . . , m, along with Definition 2.77 of the subdifferentials, imply that f (x) − f (¯ x) +
m X i=1
λi gi (x) −
m X i=1
λi gi (¯ x) ≥ 0, ∀ x ∈ X.
¯ that is, gi (x) ≤ 0, i = 1, 2, . . . , m, along with the In particular, for x ∈ C, complementary slackness condition, reduces the above condition to f (x) ≥ f (¯ x), ∀ x ∈ C. Thus, x ¯ is a point of minimizer of (CP ).
7.4
Applications to Bilevel Programming Problems
Consider the following bilevel problem: min f (x)
subject to x ∈ C,
(BP )
where C is given as C = argmin{φ(x) : x ∈ Θ}, f, φ : Rn → R are convex functions, and Θ ⊂ Rn is a convex set. Thus it is clear that C is a convex set and hence the problem (BP ) is a convex programming problem. As C is the solution set to a subproblem, which is again a convex optimization problem, here we call (BP ) a simple convex bilevel programming problem. In particular, (BP ) contains the standard differentiable convex optimization problem of the form min f (x)
subject to
gi (x) ≤ 0, i = 1, 2, . . . , m, Ax = b,
where f, gi : Rn → R, i = 1, 2, . . . , m, are differentiable convex functions, A is an l × n matrix, and b ∈ Rl . This problem can be posed as the problem (BP ) by defining φ as 2
φ(x) = ||Ax − b|| +
m X i=1
|| max{0, gi (x)}||2 ,
and the lower-level problem is to minimize φ over Rn .
© 2012 by Taylor & Francis Group, LLC
7.4 Applications to Bilevel Programming Problems
309
The bilevel programming problem (BP ) can be equivalently expressed as a convex programming problem by assuming C to be nonempty and defining α = inf φ(x). x∈Θ
Then the reformulated problem is given by min f (x)
φ(x) ≤ α, x ∈ Θ.
subject to
(RP )
Observe that (RP ) has the same form as the convex programming problem (CP ) studied in the previous section. From the definition of α, it is easy to see that there does not exist any x ˆ ∈ Θ such that φ(ˆ x) < α, which implies that the Slater constraint qualification does not hold for (RP ). We present the KKT optimality condition as a consequence of Theorem 7.13. Theorem 7.14 Consider the reformulated problem (RP ). Assume that ∗ {cone {(0, 1)} ∪ cone [(0, α) + epi φ∗ ]} + epi δΘ
is closed. Then x ¯ ∈ Θ is a point of minimizer of (RP ) if and only if there is λ ≥ 0 such that 0 ∈ ∂f (¯ x) + λ∂φ(¯ x) + NΘ (¯ x)
and
λ(φ(¯ x) − α) = 0.
Proof. Observe that the problem (RP ) is of the type considered in Theorem 7.13. We can invoke Theorem 7.13 if the CC qualification condition holds, that is, [ ∗ epi f ∗ + epi(µ(φ(.) − α))∗ + epi δΘ µ≥0
is closed. As dom f = Rn , by Theorem 2.69, f is continuous on Rn and thus the CC qualification condition can be replaced by the FM constraint qualification, that is, [ ∗ epi (µ(φ(.) − α))∗ + epi δΘ (7.39) µ≥0
is closed. For µ > 0, by Proposition 2.103, (µ(φ(.) − α))∗ (ξ) = µα + (µφ)∗ (ξ), which along with Theorem 2.123 leads to epi (µ(φ(.) − α))∗
= (0, µα) + epi (µφ)∗ = µ((0, α) + epi φ∗ ), ∀ µ > 0.
For µ = 0, ∗
∗
(µ(φ(.) − α)) (ξ) = 0 (ξ) =
© 2012 by Taylor & Francis Group, LLC
0, ξ = 0, +∞, otherwise,
(7.40)
310
Sequential Optimality Conditions
which implies epi (µ(φ(.) − α))∗ = 0 × R+ = cone {(0, 1)}, µ = 0.
(7.41)
Using (7.40) and (7.41), the condition (7.39) becomes [ ∗ cone{(0, 1)} ∪ { µ((0, α) + epi φ∗ )} + epi δΘ . µ>0
Observe that cone{(0, 1)}∪{(0, 0)} = cone{(0, 1)} and thus the above becomes ∗ cone{(0, 1)} ∪ cone ((0, α) + epi φ∗ ) + epi δΘ .
(7.42)
By the hypothesis of the theorem, (7.42) is a closed set and thus the reformulated problem (RP ) satisfies the FM constraint qualification. Now invoking Theorem 7.13, there exists λ ≥ 0 such that 0 ∈ ∂f (¯ x) + λ∂(φ(.) − α)(¯ x) + NΘ (¯ x)
and
λ(φ(¯ x) − α) = 0.
As ∂(φ(.) − α)(¯ x) = ∂h(¯ x), the optimality condition reduces to 0 ∈ ∂f (¯ x) + λ∂φ(¯ x) + NΘ (¯ x), thereby establishing the desired result. The converse can be proved as in Chapter 3. For a better understanding of the above result, consider the bilevel programming problem where f (x) = x2 + 1, Θ = [−1, 1], and φ(x) = max{0, x}. Observe that C = [−1, 0] and α = 0. Thus the reformulated problem is min x2 + 1
subject to
For ξ ∈ R, φ∗ (ξ) =
max {0, x} ≤ 0, x ∈ [−1, 0].
+∞, ξ < 0 or ξ > 1, 0, ξ ∈ [0, 1],
which implies epi φ∗ = {(ξ, γ) ∈ R2 : ξ ∈ [0, 1], ξ ≥ 0} = [0, 1] × R+ ∗ while epi δΘ = epi |.|. Therefore, ∗ cone epi φ∗ + epi δΘ = R2+ ∪ {(ξ, γ) ∈ R2 : ξ ≤ 0, γ ≥ −ξ},
which is a closed set. Because cone{(0, 1)} ⊂ cone epi φ∗ , the reformulated problem satisfies the qualification condition in Theorem 7.14. It is easy to see that x ¯ = 0 is a solution of the bilevel problem with NΘ (0) = {0}, ∂f (0) = {0}, and ∂φ(0) = [0, 1]. Thus the KKT optimality conditions of Theorem 7.14 are
© 2012 by Taylor & Francis Group, LLC
7.4 Applications to Bilevel Programming Problems
311
satisfied with λ = 0. Note that the Slater condition fails to hold for the reformulated problem. We end this chapter by presenting the optimality conditions for the bilevel programming problem inf f (x)
subject to
x ∈ C,
(BP 1)
where C is the solution set of the lower-level problem min φ(x)
subject to
gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X.
¯ is a proper, convex, lsc function and gi : Rn → R, Here, φ : Rn → R i = 1, 2, . . . , m, are convex functions, and X ⊂ Rn is a closed convex set. Define α = inf{φ(x) : gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X} < +∞. Without loss of generality, assume that α = 0. This can be achieved by setting φ(x) = φ(x) − α. Then the bilevel programming problem (BP 1) is equivalent to the following optimization problem: min f (x)
subject to φ(x) ≤ 0, gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X. (RP 1)
Below we present the result on optimality conditions for the bilevel programming problem (BP 1). Theorem 7.15 Consider the bilevel programming problem (BP 1). Assume that {
[
λ∈Rm +
m m X [ [ X ∗ epi ( λi gi )∗ } ∪ { λ0 epi φ∗ + epi ( λi gi )∗ } + epi δX i=1
λ∈Rm +
λ0 >0
i=1
is closed. Then x ¯ ∈ C is a point of minimizer of (BP 1) if and only if there exist λ0 ≥ 0 and λi ≥ 0, i = 1, 2, . . . , m, such that m X 0 ∈ ∂f (¯ x) + λ0 ∂φ(¯ x) + ∂( λi gi )(¯ x) + NX (¯ x), i=1
λ0 φ(¯ x) = 0
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m.
e = (λ0 , λ) ∈ R+ × Rm , Proof. Observe that for any λ + e (λg)(x) = λ0 φ(x) +
Therefore,
m X
λi gi (x).
i=1
m X e ∗ = cl{epi (λ0 φ)∗ + epi( epi(λg) λi gi )∗ }. i=1
© 2012 by Taylor & Francis Group, LLC
312
Sequential Optimality Conditions
As dom gi = Rn , i = 1, 2, . . . , m, dom (λi gi ) = Rn , i = 1, 2, . . . , m, which by Theorem 2.69 are continuous on Rn . By Proposition 2.124, m X e ∗ = epi (λ0 φ)∗ + epi ( epi(λg) λi gi )∗ .
(7.43)
i=1
Now consider the two cases, namely λ0 = 0 and λ0 > 0. For λ0 = 0, epi (λ0 φ)∗ = cone {(0, 1)}. Thus, the condition (7.43) reduces to m X e ∗ = cone {(0, 1)} + epi ( epi(λg) λi gi )∗ . i=1
Observe that for µ ≥ 0,
µ(0, 1) + (ξ, α) = (ξ, α + µ) ∈ epi ( m X
where (ξ, α) ∈ epi (
i=1
m X
λi gi )∗ ,
i=1
λi gi )∗ . Because µ ≥ 0 was arbitrary,
m m X X cone {(0, 1)} + epi ( λi gi )∗ ⊂ epi ( λi gi )∗ . i=1
Also, for any (ξ, α) ∈ epi (
m X
i=1
λi gi )∗ ,
i=1
(ξ, α) = (0, 0) + (ξ, α) ∈ cone {(0, 1)} + epi (
m X
λi gi )∗ .
i=1
m X λi gi )∗ was arbitrary, As (ξ, α) ∈ epi ( i=1
m m X X λi gi )∗ , λi gi )∗ ⊂ cone {(0, 1)} + epi ( epi ( i=1
i=1
thereby implying that m m X X λi gi )∗ . λi gi )∗ = epi ( cone {(0, 1)} + epi ( i=1
i=1
Thus, for λ0 = 0, m X e ∗ = epi ( epi(λg) λi gi )∗ . i=1
© 2012 by Taylor & Francis Group, LLC
7.4 Applications to Bilevel Programming Problems
313
For the case when λ0 > 0, the condition (7.43) becomes m X e ∗ = λ0 epi φ ∗ +epi ( epi(λg) λi gi )∗ . i=1
Therefore, [
e 1+m λ∈R +
e ∗ + epi δ ∗ = { epi(λg) X
[
λ∈Rm +
{
[
m X epi ( λi gi )∗ )} ∪ i=1
[
epi φ∗ +
λ∈Rm +
λ0 >0
m X ∗ epi ( λi gi )∗ } + epi δX . i=1
By the given hypothesis, the set [ e ∗ + epi δ ∗ epi(λg) X e 1+m λ∈R +
is closed. Hence, the FM constraint qualification holds for the problem (RP 1). Because dom f = Rn , by Theorem 2.69 f is continuous on Rn , CC qualification condition holds for (RP 1). As the bilevel problem (BP 1) is equivalent to (RP 1), by Theorem 7.11, x ¯ ∈ C is a point of minimizer of (BP 1) if and only e = (λ0 , λ) ∈ R+ × Rm such that if there exists λ + e x) + NX (¯ 0 ∈ ∂f (¯ x) + ∂(λg)(¯ x)
and
e x) = 0. (λg)(¯
(7.44)
As φ is proper convex, λ0 φ is also proper convex. Therefore, dom (λ0 φ) is a nonempty convex set in Rn . By Proposition 2.14 (i), ri dom (λ0 φ) is nonempty. Because dom g = Rn , dom (λg) = Rn . Now invoking the Sum Rule, Theorem 2.91, m X e x) = λ0 ∂φ(¯ ∂(λg)(¯ x) + ∂( λi gi )(¯ x). i=1
Thus the optimality condition in (7.44) becomes
m X 0 ∈ ∂f (¯ x) + λ0 ∂φ(¯ x) + ∂( λi gi )(¯ x) + NX (¯ x).
(7.45)
i=1
By the complementary slackness condition in (7.44), e x) = λ0 φ(¯ (λg)(¯ x) +
m X
λi gi (¯ x) = 0.
i=1
As (λ0 , λ) ∈ R+ × Rm ¯ yields that + , which along with the feasibility of x λ0 φ(¯ x) = 0
and λi gi (¯ x) = 0, i = 1, 2, . . . , m.
The above condition together with (7.45) leads to the requisite conditions. The converse can be proved as in Chapter 3.
© 2012 by Taylor & Francis Group, LLC
Chapter 8 Representation of the Feasible Set and KKT Conditions
8.1
Introduction
Until now, we discussed the convex programming problem (CP ) with the convex feasible set C given by (3.1), that is, C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2 . . . , m},
where gi : Rn → R, i = 1, 2, . . . , m, are convex functions and its variations like (CP 1) and (CCP ). But is the convexity of the functions forming the convex feasible set C important? For example, assume C as a subset of R2 given by C = {(x1 , x2 ) ∈ R2 : 1 − x1 x2 ≤ 0, x1 ≥ 0}. This set is convex even though g(x1 , x2 ) = 1−x1 x2 is a nonconvex function. As stated in Chapter 1, convex optimization basically means minimizing a convex function over a convex set with no emphasis on as to how the feasible set is obtained. Very recently (2010), Lasserre [74] published a very interesting paper discussing this aspect of convex feasibility for smooth convex optimization.
8.2
Smooth Case
In this section, we turn our attention to the case of smooth convex optimization studied by Lasserre [74]. From Chapter 2 we know that when a convex function is differentiable, then ∂φ(x) = {∇φ(x)} and its gradient is also continuous; thus any differentiable convex function is smooth. So one can obtain the KKT optimality conditions at the point of minimizer from the subdifferential optimality conditions discussed in Chapter 3; that is, if x¯ is the point of minimizer of (CP ) with (C) given by (3.1), then there exist λi ≥ 0, i = 1, 2, . . . , m, such that ∇f (¯ x) +
m X i=1
λi ∇gi (¯ x) = 0
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m. 315
© 2012 by Taylor & Francis Group, LLC
316
Representation of the Feasible Set and KKT Conditions
Observe that the KKT conditions for smooth convex optimization problems look absolutely the same as the KKT conditions for the usual smooth optimization problem. As discussed in earlier chapters, under certain constraint qualifications like the Slater constraint qualification, the above KKT conditions are necessary as well as sufficient. Lasserre observed that the convex feasible set C of (CP ) need not always be defined by convex inequality constraints as in the above example. The question that Lasserre answers is “in such a scenario what conditions would make the KKT optimality conditions necessary as well as sufficient? ” So now the convex set C given by (3.1) is considered, with the only difference that gi , i = 1, 2, . . . , m, need not be convex even though they are assumed to be smooth. Lasserre showed that if the Slater constraint qualification and an additional nondegeneracy condition hold, then the KKT condition is both necessary and sufficient. Though Lasserre defined the notion of nondegeneracy for every point of the set C, we define it for a particular point and extend it to the feasible set C. Definition 8.1 The nondegeneracy condition is said to hold at x ¯ ∈ C if ∇gi (¯ x) 6= 0, ∀ i ∈ I(¯ x), where I(¯ x) = {i ∈ {1, 2, . . . , m} : gi (¯ x) = 0} denotes the active index set at x ¯. The set C is said to satisfy the nondegeneracy condition if it holds for every x ¯ ∈ C. The Slater constraint qualification along with the nondegeneracy condition gives the following interesting characterization of a convex set given by Lasserre [74]. Theorem 8.2 Consider the set C given by (3.1) where gi , i = 1, 2, . . . , m, are smooth. Assume that the Slater constraint qualification is satisfied, that is, there exists x ˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m, and the nondegeneracy condition holds for C. Then C is convex if and only if h∇gi (x), y − xi ≤ 0, ∀ x, y ∈ C, ∀ i ∈ I(x).
(8.1)
Proof. Suppose that C is a convex set and consider x ¯ ∈ C. Therefore, for any y ∈ C, for every λ ∈ [0, 1], x ¯ + λ(y − x ¯) ∈ C, that is, gi (¯ x + λ(y − x ¯)) ≤ 0, ∀ i = 1, 2, . . . , m. Now for i ∈ I(¯ x), lim λ↓0
gi (¯ x + λ(y − x ¯)) − gi (¯ x) ≤ 0, λ
that is, for every i ∈ I(¯ x), h∇gi (¯ x), y − x ¯i ≤ 0, ∀ y ∈ C.
© 2012 by Taylor & Francis Group, LLC
8.2 Smooth Case
317
Because x ¯ ∈ C is arbitrary, the above inequality holds for every x¯ ∈ C, thereby establishing the desired inequality. Conversely, suppose that the condition (8.1) holds. Observe that C has an interior because the Slater constraint qualification holds. Further, (8.1) along with the nondegeneracy condition of the set C implies that each boundary point of C has a nontrivial supporting hyperplane. The supporting hyperplane is nontrivial due to the non-degeneracy condition on C. Using Proposition 2.29, C is a convex set. Observe that the nondegeneracy condition of the set C is required only in the sufficiency part of the proof. Till now in the book, we have mainly dealt with the Slater constraint qualification and some others namely, Abadie, pseudonormality and the FM constraint qualification. Another well known constraint qualification for the convex programming problem (CP ) is the Mangasarian–Fromovitz constraint qualification. For (CP ) with C given by (3.1) in the smooth scenario, Mangasarian–Fromovitz constraint qualification is said to hold at x ¯ ∈ Rn if n there exists d ∈ R such that h∇gi (¯ x), di < 0, ∀ i ∈ I(¯ x). One may observe that if this constraint qualification is satisfied for x¯ ∈ C, then ∇gi (¯ x) 6= 0, for every i ∈ I(¯ x), thereby ensuring that the nondegeneracy condition at x ¯. But the converse need not hold, that is the nondegeneracy condition need not imply the Mangasarian–Fromovitz constraint qualification. We verify this claim by the following example. Consider the set C ⊂ R2 given by C = {(x1 , x2 ) ∈ R2 : 1 − x1 x2 ≤ 0, x1 + x2 − 2 ≤ 0, x1 ≥ 0}. Here, g1 (x1 , x2 ) = 1 − x1 x2 , g2 (x1 , x2 ) = x1 + x2 − 2 and g3 (x1 , x2 ) = −x1 . Note that C = {(1, 1)} and thus trivially is a convex set. At x ¯ = (1, 1), the active index set is I(¯ x) = {1, 2} and −1 1 ∇g1 (¯ x) = 6= 0 and ∇g2 (¯ x) = 6= 0, −1 1 which implies that the nondegeneracy condition is satisfied for C = {¯ x}. But observe that there exists no (d1 , d2 ) ∈ R2 satisfying −d1 − d2 < 0
and
d 1 + d2 < 0
simultaneously, thereby not satisfying the Mangasarian–Fromovitz constraint qualification at x ¯. We end this section by presenting the result from Lasserre [74] establishing the necessary and sufficient optimality condition for a minimizer of (CP ) over C with gi , i = 1, 2, . . . , m, nonconvex smooth functions. As one will observe
© 2012 by Taylor & Francis Group, LLC
318
Representation of the Feasible Set and KKT Conditions
from the result below, the nondegeneracy condition is required only for the necessary part at the given point and not the set as mentioned in the statement of the theorem in Lasserre [74]. Also in the converse part, we require the necessary part of Theorem 8.2, which is independent of the nondegeneracy condition. Theorem 8.3 Consider the problem (CP ) where f is smooth and C is given by (3.1), where gi , i = 1, 2, . . . , m, are smooth but need not be convex. Assume that the Slater constraint qualification is satisfied and the nondegeneracy condition holds at x ¯ ∈ C. Then x ¯ is a point of minimizer of (CP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that ∇f (¯ x) +
m X i=1
λi ∇gi (¯ x) = 0
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m.
Proof. Let x ¯ be a point of minimizer of f over C. By the Fritz John optimality conditions, Theorem 5.1, there exist λ0 ≥ 0, λi ≥ 0, i = 1, 2, . . . , m, not all simultaneously zero such that λ0 ∇f (¯ x) +
m X i=1
λi ∇gi (¯ x) = 0
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m.
Suppose that λ0 = 0, which implies for some i ∈ {1, 2, . . . , m}, λi > 0. Therefore, the set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. By the complementary slackness condition, ¯ gi (¯ x) = 0, ∀ i ∈ I, which implies I¯ ⊂ I(¯ x). By the optimality condition, X λi ∇gi (¯ x) = 0, i∈I¯
which implies that X i∈I¯
λi h∇gi (¯ x), x − x ¯i = 0, ∀ x ∈ C.
As the Slater constraint qualification is satisfied, there exists xˆ ∈ Rn such that gi (ˆ x) < 0 for every i = 1, 2, . . . , m. As dom gi = Rn , i = 1, 2, . . . , m, by Theorem 2.69, gi , i = 1, 2, . . . , m, are continuous on Rn . Thus there exists δ > 0 such that for every x ∈ Bδ (ˆ x) and every i = 1, 2, . . . , m, gi (x) < 0, that is, Bδ (ˆ x) ⊂ int C. By the preceding equality, X λi h∇gi (¯ x), x − x ¯i = 0, ∀ x ∈ Bδ (ˆ x). (8.2) i∈I¯
© 2012 by Taylor & Francis Group, LLC
8.2 Smooth Case
319
As I¯ ⊂ I(¯ x), along with the convexity of C and Theorem 8.2, yields that for ¯ every i ∈ I, h∇gi (¯ x), x − x ¯i ≤ 0, ∀ x ∈ C, ¯ which along with the condition (8.2) implies that for i ∈ I, h∇gi (¯ x), x − x ¯i = 0, ∀ x ∈ Bδ (ˆ x).
(8.3)
Because x ˆ ∈ Bδ (ˆ x), the condition (8.3) reduces to ¯ h∇gi (¯ x), x ˆ−x ¯i = 0, ∀ i ∈ I.
(8.4)
For any d ∈ Rn , consider the vector x ˆ + λd such that for λ > 0 sufficiently ¯ small, x ˆ + λd ∈ Bδ (ˆ x). Hence, by the condition (8.3), for each i ∈ I, h∇gi (¯ x), x ˆ + λd − x ¯i = 0, which implies h∇gi (¯ x), x ˆ−x ¯i + λh∇gi (¯ x), di = 0. ¯ By condition (8.4), for every i ∈ I, h∇gi (¯ x), di = 0, ∀ d ∈ Rn . Hence, ∇gi (¯ x) = 0 for every i ∈ I¯ ⊂ I(¯ x) and thereby contradicting the nondegeneracy condition at x ¯. Thus, λ0 6= 0. Dividing the Fritz John optimality condition by λ0 , the KKT optimality condition is established at x¯ as ∇f (¯ x) +
m X i=1
¯ i ∇gi (¯ λ x) = 0
and
¯ i gi (¯ λ x) = 0, i = 1, 2, . . . , m,
¯ i = λi , i = 1, 2, . . . , m. where λ λ0 Conversely, suppose that x ¯ satisfies the KKT optimality conditions. Assume that x ¯ is not a point of minimizer of (CP ). Therefore, there exists x ∈ C such that f (x) < f (¯ x), which along with the convexity of f implies that 0 > f (x) − f (¯ x) ≥ h∇f (¯ x), x − x ¯i. Therefore, by the KKT optimality conditions, 0 > f (x) − f (¯ x) ≥ −
m X i=1
λi h∇gi (¯ x), x − x ¯i.
(8.5)
If λi = 0 for every i = 1, 2, . . . , m, we reach a contradiction. Now assume that I¯ 6= ∅. By Theorem 8.2, for every i ∈ I¯ ⊂ I(¯ x), h∇gi (¯ x), x − x ¯i ≥ 0, ∀ x ∈ C,
© 2012 by Taylor & Francis Group, LLC
320
Representation of the Feasible Set and KKT Conditions
which implies that m X i=1
λi h∇gi (¯ x), x − x ¯i =
X i∈I¯
λi h∇gi (¯ x), x − x ¯i ≥ 0, ∀ x ∈ C,
thereby contradicting the condition (8.5) and thus leading to the requisite result, that is, x ¯ is a point of minimizer of f over C.
8.3
Nonsmooth Case
Motivated by the above work of Lasserre [74], Dutta and Lalitha [40] extended the study to a nonsmooth scenario involving the locally Lipschitz function. But before we move on with the work done in this respect, we need some tools for nonsmooth Lipschitz functions. Consider a locally Lipschitz function φ : Rn → R. The Clarke directional derivative of φ at x ¯ in the direction d ∈ Rn is defined as φ◦ (¯ x, d) =
lim
x→¯ x, λ↓0
φ(x + λd) − φ(x) . λ
The Clarke directional derivative is a sublinear function of the direction d. In Section 3.6 we defined the Clarke subdifferential using the Rademacher Theorem. Here, we express the Clarke subdifferential of φ at x ¯ using the Clarke directional derivative defined above as ∂ ◦ φ(¯ x) = {ξ ∈ Rn : φ◦ (¯ x, d) ≥ hξ, di, ∀ d ∈ Rn }. The function φ is said to be regular at x ¯ if for every d ∈ Rn , the directional ′ derivative φ (¯ x, d) exists and φ◦ (¯ x, d) = φ′ (¯ x, d), ∀ d ∈ Rn . Every convex function is regular. In the nonsmooth scenario, Dutta and Lalitha [40] considered the convex feasible set C of (CP ) to be defined by inequality constraints involving nonsmooth locally Lipschitz functions that are regular. For example, consider φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)}, where φi : Rn → R, i = 1, 2, . . . , m, are smooth functions. Then φ is a locally Lipschitz regular function. Now, similar to the nondegeneracy condition in the smooth case given by Lasserre [74], Dutta and Lalitha [40] defined the notion for nonsmooth locally Lipschitz scenario as follows.
© 2012 by Taylor & Francis Group, LLC
8.3 Nonsmooth Case
321
Definition 8.4 Consider the set C given by (3.1) where each gi , i = 1, 2, . . . , m is a locally Lipschitz function. The set C is said to satisfy the nondegeneracy condition at x ¯ ∈ C if 0 6∈ ∂ ◦ gi (¯ x), ∀ i ∈ I(¯ x). If the condition holds for every x ¯ ∈ C, the nondegeneracy condition is said to hold for the set C. Before moving on to discuss the results obtained in this work, we present some examples from Dutta and Lalitha [40] to have a look at the above nondegeneracy condition. Consider the set C = {x ∈ R : g0 (x) ≤ 0}, where g0 (x) = max{x3 , x} − 1. Hence C = (−∞, 1]. At the boundary point x ¯ = 1, I(¯ x) = {0} where ∂ ◦ g0 (¯ x) = [1, 3], thereby satisfying the nondegeneracy condition. Now if we define the function g0 (x) = max{x3 , x}, then C = (−∞, 0] with boundary point x ¯ = 0 at which ∂ ◦ g0 (¯ x) = [0, 1]. Thus, the nondegeneracy condition is not satisfied at x¯. Observe that in both the cases g0 is a regular function and the Slater constraint qualification is also satisfied. Yet in the second scenario the nondegeneracy condition is not satisfied. But if the functions gi , i = 1, 2, . . . , m, involved are convex and the Slater constraint qualification holds for (CP ), then the Mangasarian–Fromovitz constraint qualification for the nonsmooth case is satisfied at x¯ ∈ C, that is there exists d ∈ Rn such that gi′ (¯ x, d) < 0, ∀ i ∈ I(¯ x). As the directional derivative is a support function to the subdifferential set, Theorem 2.79, the above condition is equivalent to hξi , di < 0, ∀ ξi ∈ ∂gi (¯ x), ∀ i ∈ I(¯ x), from which it is obvious that the nondegeneracy condition is ensured for the convex nonsmooth scenario. Next we present the equivalent characterization of the convex set C under the nonsmooth scenario. Theorem 8.5 Consider the set C be given by (3.1) represented by nonsmooth locally Lipschitz inequality constraints where gi , i = 1, 2, . . . , m, are regular. Assume that the Slater constraint qualification holds and satisfies the nondegeneracy condition. Then C is convex if and only if gi◦ (x, y − x) ≤ 0, ∀ x, y ∈ C, ∀ i ∈ I(x).
© 2012 by Taylor & Francis Group, LLC
(8.6)
322
Representation of the Feasible Set and KKT Conditions
Proof. Consider the convex set C. Working along the lines of Theorem 8.2, for arbitrary but fixed x ¯ ∈ C, for λ ∈ (0, 1), gi (¯ x + λ(y − x ¯)) − gi (¯ x) ≤ 0, ∀ i ∈ I(¯ x). λ As the functions gi , i = 1, 2, . . . , m, are locally Lipschitz regular functions, gi◦ (¯ x, y − x ¯) = gi′ (¯ x, y − x ¯) ≤ 0, ∀ y ∈ C, ∀ i ∈ I(x), thus leading to the requisite result. Conversely, suppose that (8.6) holds. As the Slater constraint qualification holds, the set C has an interior. Now consider any boundary point x ∈ C. By the condition (8.6) along with the fact that the Clarke directional derivative is the support function of the Clarke subdifferential, then for every y ∈ C, hξi , y − xi ≤ gi◦ (x, y − x) ≤ 0, ∀ ξi ∈ ∂ ◦ gi (x), ∀ i ∈ I(x). As the nondegeneracy condition is satisfied, ξi 6= 0 for every ξi ∈ ∂ ◦ gi (x) and every i ∈ I(x), which implies that there is a nontrivial supporting hyperplane to C at x. Hence, by Proposition 2.29, C is a convex set, as desired. Now we present the theorem establishing the necessary and sufficient optimality conditions for the class of problem (CP ) dealt with in this section. Theorem 8.6 Consider the problem (CP ) with C is given by (3.1), where gi , i = 1, 2, . . . , m, are locally Lipschitz regular functions. Assume that the Slater constraint qualification holds and the nondegeneracy condition is satisfied at x ¯ ∈ C. Then x ¯ is a point of minimizer of (CP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that 0 ∈ ∂f (¯ x) +
m X
λi ∂ ◦ gi (¯ x)
and
λi gi (¯ x) = 0, i = 1, . . . , m.
i=1
Proof. Suppose that x ¯ is a point of minimizer of f over C. We know by Theorem 2.72 that a convex function f is locally Lipschitz. Then by the optimality conditions for locally Lipschitz functions at x¯, there exist λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero, such that 0 ∈ λ0 ∂ ◦ f (¯ x) +
m X
λi ∂ ◦ gi (¯ x)
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m.
i=1
Because f is convex, ∂ ◦ f (¯ x) = ∂f (¯ x). Therefore, the optimality condition can be rewritten as 0 ∈ λ0 ∂f (¯ x) +
© 2012 by Taylor & Francis Group, LLC
m X i=1
λi ∂ ◦ gi (¯ x).
8.3 Nonsmooth Case
323
We claim that λ0 6= 0. On the contrary, suppose that λ0 = 0. As λi , i = 0, 1, . . . , m, are not all zeroes, the set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Then the above optimality condition reduces to X 0∈ λi ∂ ◦ gi (¯ x), i∈I¯
which implies there exist ξi ∈ ∂ ◦ gi (¯ x), i ∈ I¯ such that X 0= λi ξi . i∈I¯
From the definition of the Clarke subdifferential, the above condition leads to X X λi gi◦ (¯ x, d) ≥ λi hξi , di = 0, ∀ d ∈ Rn . (8.7) i∈I¯
i∈I¯
As the Slater constraint qualification is satisfied, there exists xˆ ∈ Rn such that gi (ˆ x) < 0 for every i = 1, . . . m. Also, as gi , i = 1, 2, . . . , m, are locally Lipschitz, and hence continuous. Thus there exists δ > 0 such that for every x ∈ Bδ (ˆ x), gi (x) < 0, i = 1, . . . , m. In condition (8.7), in particular, taking d=x−x ¯ where x ∈ Bδ (ˆ x) ⊂ C, X λi gi◦ (¯ x, x − x ¯) ≥ 0, ∀ x ∈ Bδ (ˆ x). (8.8) i∈I¯
¯ that is, By the complementary slackness condition, gi (¯ x) = 0 for every i ∈ I, ¯ I ⊂ I(¯ x). Therefore, by Theorem 8.5, as C is a convex set, we have ¯ gi◦ (¯ x, x − x ¯) ≤ 0, ∀ x ∈ Bδ (ˆ x), ∀ i ∈ I, ¯ which along with the condition (8.8) implies that for every i ∈ I, gi◦ (¯ x, x − x ¯) = 0, ∀ x ∈ Bδ (ˆ x).
(8.9)
In particular, for x ˆ ∈ Bδ (ˆ x), the above condition reduces to ¯ gi◦ (¯ x, x ˆ−x ¯) = 0, ∀ i ∈ I.
(8.10)
Consider any v ∈ Rn and choose λ > 0 sufficiently small such that ¯ x ˆ + λv ∈ Bδ (ˆ x). Hence, from the condition (8.9), for every i ∈ I, gi◦ (¯ x, x ˆ + λv − x ¯) = 0, ∀ v ∈ Rn . As the Clarke generalized directional derivative is sublinear in the direction, ¯ the above condition becomes for every i ∈ I, gi◦ (¯ x, x ˆ−x ¯) + λgi◦ (¯ x, v) ≥ 0, ∀ v ∈ Rn ,
© 2012 by Taylor & Francis Group, LLC
324
Representation of the Feasible Set and KKT Conditions
which by (8.10) leads to ¯ gi◦ (¯ x, v) ≥ 0, ∀ v ∈ Rn , ∀ i ∈ I. ¯ From the definition of the Clarke subdifferential, 0 ∈ ∂ ◦ gi (¯ x) for every i ∈ I, thereby contradicting the nondegeneracy condition. Therefore λ0 6= 0 and dividing the optimality condition throughout by λ0 reduces it to 0 ∈ ∂f (¯ x) +
m X
¯ i ∂ ◦ gi (¯ λ x)
¯ i gi (¯ λ x) = 0, i = 1, 2, . . . , m,
and
i=1
¯ i = λi , i = 1, 2, . . . , m leading to the requisite result. where λ λ0 Conversely, suppose that the conditions hold at x¯. On the contrary, assume that x ¯ is not a point of minimizer of f over C. Thus, there exists x ∈ C such that f (x) < f (¯ x), which along with the convexity of f , 0 > f (x) − f (¯ x) ≥ hξ, x − x ¯i, ∀ ξ ∈ ∂f (¯ x).
(8.11)
Using the optimality conditions at x¯, there exists ξ0 ∈ ∂f (¯ x) and ξi ∈ ∂ ◦ gi (¯ x), i = 1, 2, . . . , m, such that 0 = ξ0 +
m X
λi ξi .
i=1
The above condition along with (8.11) leads to 0>−
m X i=1
λi hξi , x − x ¯i,
which by the definition of Clarke subdifferential along with Theorem 8.5 yields 0>−
m X i=1
λi gi◦ (¯ x, x − x ¯) ≥ 0,
thereby leading to a contradiction. Therefore, x ¯ is a point of minimizer of (CP ). We end this chapter with an example from Dutta and Lalitha [40] to illustrate that in the absence of the nondegeneracy condition, even though the Slater constraint qualification and the regularity of the constraint functions hold, the KKT optimality condition need not be satisfied. Consider the problem min f (x)
© 2012 by Taylor & Francis Group, LLC
subject to
g1 (x) ≤ 0, g2 (x) ≤ 0
8.3 Nonsmooth Case
325
where f (x) = −x,
g1 (x) = x
3
and g2 (x) =
−x − 1, −1,
x ≤ 0, x > 0.
Then the feasible set is C = [−1, 0] and the point of minimizer is x ¯ = 0. Also, C does not satisfy the nondegeneracy condition but the Slater constraint qualification holds along with the constraint functions being regular. Observe that ∂f (¯ x) = {−1}, ∂ ◦ g1 (¯ x) = {0} and ∂ ◦ g2 (¯ x) = ∂g2 (¯ x) = [−1, 0], and thus the KKT optimality conditions are not satisfied. Now if in the above example one takes the objective function to be f (x) = x, then the point of minimizer is x ¯ = −1 at which ∂f (¯ x) = {1}, ∂ ◦ g1 (¯ x) = {3}, and ∂ ◦ g2 (¯ x) = ∂g2 (¯ x) = {−1}. Observe that the KKT optimality conditions hold with λ1 = 0 and λ2 = 1.
© 2012 by Taylor & Francis Group, LLC
Chapter 9 Weak Sharp Minima in Convex Optimization
9.1
Introduction
In the preceding chapters we studied the necessary and sufficient optimality conditions for x ¯ ∈ Rn to be a point of minimizer for the convex optimization problem wherein a convex objective function f is minimized over a convex feasible set C ⊂ Rn . From Theorem 2.90, if the objective function f is strictly convex, then the point of minimizer x ¯ is unique. The notion of unique minimizer was extended to the concept of sharp minimum or, equivalently, strongly unique local minimum. The ideas of sharp minimizer and strongly unique minimizer were introduced by Polyak [94, 95] and Cromme [29]. These notions played an important role in the approximation theory or the study of perturbation in optimization problems and also in the analysis of the convergence of algorithms [1, 26, 56]. Below we define the notion of sharp minimum.
¯ defined over a set F ⊂ Rn is said to Definition 9.1 A function φ : Rn → R be sharp minima at x ¯ ∈ F if there exists α > 0 such that φ(x) − φ(¯ x) ≥ α kx − x ¯k, ∀ x ∈ F. From the above definition it is obvious that a point of sharp minimizer is unique. This is one of the major drawbacks of the concept of sharp minimum as it rules out the most basic optimization problem, namely the linear programming problem. To overcome this difficulty, the notion of weak sharp minimum was introduced by Ferris [46]. We study this notion for the convex optimization problem min f (x)
subject to
x ∈ C,
(CP )
where f : Rn → R is a convex function and C ⊂ Rn is a closed convex set. 327 © 2012 by Taylor & Francis Group, LLC
328
9.2
Weak Sharp Minima in Convex Optimization
Weak Sharp Minima and Optimality
We begin this section by defining the weak sharp minimum for the convex optimization problem (CP ) from Ferris [46]. Definition 9.2 Let S ⊂ Rn denote the nonempty solution set of (CP ). Then S is said to be the set of weak sharp minimizer on C if there exists α > 0 such that f (x) − f (projS (x)) ≥ α kx − projS (x)k, ∀ x ∈ C. Observe that for any x ∈ C, projS (x) ∈ S and as S is the solution set, f (¯ x) = constant, ∀ x ¯ ∈ S. Equivalently, S is the set of the weak sharp minimizer if there exists α > 0 such that f (x) − f (¯ x) ≥ α dS (x), ∀ x ∈ C, ∀ x ¯ ∈ S. The equivalent definition was given in Burke and Ferris [25]. Before moving on with the results on equivalent conditions for weak sharp minimizers, we present some results from Aubin and Ekeland [4], Lucchetti [79], Luenberger [80], and Rockafellar [97], which act as a tool in proving the equivalence. Proposition 9.3 Consider nonempty closed convex set F ⊂ Rn . (i) For every x ∈ F ,
NF (x) = {v ∈ Rn : hv, xi = σF (v)}.
(ii) For every y ∈ Rn , dF (y) = max (hv, yi − σF (v)). v∈cl B
(iii) If F is a closed convex cone, then for every y ∈ Rn , dF (y) = σcl B∩F ◦ (y). (iv) For every y ∈ Rn , dF (y) = sup dx+TF (x) (y). x∈F
(v) For every x ∈ F , the subdifferential of the distance function dF is ∂dF (x) = cl B ∩ NF (x) and the directional derivative is d′F (x, v) = dTF (x) (v) = σcl B∩NF (x) (v), ∀ v ∈ Rn .
© 2012 by Taylor & Francis Group, LLC
9.2 Weak Sharp Minima and Optimality
329
Proof. (i) From Definition 2.36 of normal cone, NF (¯ x) = {v ∈ Rn : hv, x − x ¯i ≤ 0, ∀ x ∈ F }. Observe that any v ∈ NF (¯ x) along with the fact that x ¯ ∈ F satisfies the inequality hv, x ¯i ≤ σF (v) ≤ hv, x ¯i, that is, σF (v) ≤ hv, x ¯i. Thus NF (¯ x) = {v ∈ Rn : hv, x ¯i = σF (v)}. (ii) By the definition of the distance function, dF (y) = inf ky − xk
=
x∈F
=
inf sup hv, y − xi
x∈F v∈cl B
sup {hv, yi + inf (−hv, xi)} x∈F
v∈cl B
=
sup {hv, yi − σF (v)}.
v∈cl B
(iii) For a closed convex cone F , by Definition 2.30 of polar cone, F ◦ = {v ∈ Rn : hv, xi ≤ 0, ∀ x ∈ F }. Therefore, σF (v) =
0, v ∈ F ◦ , +∞, otherwise.
(9.1)
From (ii), which along with the above relation (9.1) yields that dF (y) = sup hv, yi, provided v ∈ F ◦ , v∈cl B
which implies dF (y) =
sup v∈cl B∩F ◦
hv, yi = σcl B∩F ◦ (y),
as desired. (iv) By Theorem 2.35, TF (x) is a closed convex cone and hence x + TF (x) is also a closed convex cone. Invoking (iii) along with Proposition 2.37 leads to dx+TF (x) (y) = dTF (x) (y − x) = σcl B∩NF (x) (y − x). Therefore, sup dx+TF (x) (y) = sup
x∈F
© 2012 by Taylor & Francis Group, LLC
sup
x∈F v∈cl B∩NF (x)
hv, y − xi.
330
Weak Sharp Minima in Convex Optimization
By (i) and (ii), the above condition reduces to sup dx+TF (x) (y) = sup {hv, yi − σF (v)} = dF (y),
x∈F
v∈cl B
thereby establishing the result. (v) As an example to inf-convolution, Definition 2.54, dF (x) = (k.k δF )(x), which is exact at every x ∈ Rn . Invoking the subdifferential inf-convolution rule at the point where the inf-convolution is exact, Theorem 2.98, ∂dF (x) = ∂k.k(y) ∩ ∂δF (x − y). For x ∈ int F , taking y = 0, ∂k.k(0) = cl B
while
∂δF (x) = NF (x) = {0}.
Thus, ∂dF (x) = {0} for x ∈ int F . For x ∈ bdry F , again taking y = 0, ∂k.k(0) = cl B
while
∂δF (x) = NF (x),
and hence ∂dF (x) = B ∩ NF (x). Therefore, ∂dF (x) = cl B ∩ NF (x), ∀ x ∈ F.
(9.2)
As dom dF = Rn , by Theorem 2.79 and the condition (9.2), d′F (x, v) = σ∂dF (x) (v) = σcl B∩NF (x) (v), which by (iii) implies that d′F (x, v) = dTF (x) (v) and hence the result.
As we know, the convex optimization problem can be equivalently expressed as the unconstrained problem min f0 (x)
subject to
x ∈ Rn ,
(CPu )
where f0 (x) = f (x) + δC (x) is an lsc proper convex function. As (CP ) and (CPu ) are equivalent, the solution set of both problems coincide, which implies that S is also the set of weak sharp minimizers of (CPu ). Before moving on to prove the main result on the characterization of the weak sharp minimizer, we present the results in terms of the objective function f0 of (CPu ). Lemma 9.4 Consider the unconstrained convex optimization problem (CPu ) and the set of weak sharp minimizers S. Let α > 0. Then the following are equivalent:
© 2012 by Taylor & Francis Group, LLC
9.2 Weak Sharp Minima and Optimality
331
(i) α cl B ∩ NS (x) ⊂ ∂f0 (x) for every x ∈ S, [ [ (ii) α cl B ∩ NS (x) ⊂ ∂f0 (x). x∈S
x∈S
Proof. It is easy to observe that (i) implies (ii). Conversely, suppose that (ii) holds. Consider x ¯ ∈ S with ξ ∈ α cl B ∩ NS (¯ x). As (ii) is satisfied, there exists y¯ ∈ S such that ξ ∈ ∂f0 (¯ y ). By Definition 2.77 of subdifferential, f0 (x) − f0 (¯ y ) ≥ hξ, x − y¯i, ∀ x ∈ Rn .
(9.3)
In particular, for any x ∈ S, f0 (x) = f0 (¯ y ), thereby reducing the above inequality to hξ, x − y¯i ≤ 0, ∀ x ∈ S, which implies ξ ∈ NS (¯ y ). By assumption, ξ ∈ NS (¯ x). Thus, by Proposition 9.3 (i), hξ, x ¯i = σS (ξ) = hξ, y¯i. (9.4) As x ¯ ∈ S, f0 (¯ x) = f0 (¯ y ), which along the conditions (9.3) and (9.4) leads to f0 (x) − f0 (¯ x) ≥ hξ, x − x ¯i, ∀ x ∈ Rn , thereby implying that ξ ∈ ∂f0 (¯ x). Because x ¯ ∈ S was arbitrary, (i) holds. The above result was from Burke and Ferris [25]. The next result from Burke and Deng [22] provides a characterization for weak sharp minimizer in terms of f0 . Theorem 9.5 Consider the convex optimization problem (CP ) and its equivalent unconstrained problem (CPu ). Let α > 0. Then S is the set of weak sharp minimizers with modulus α if and only if f0′ (¯ x, v) ≥ α dTS (¯x) (v), ∀ x ¯ ∈ S, ∀ v ∈ Rn .
(9.5)
Proof. Suppose that S is the set of weak sharp minimizers with modulus α > 0. Consider x ¯ ∈ S. Therefore, by Definition 9.2, f (x) − f (¯ x) ≥ α dS (x), ∀ x ∈ C. As x ¯ ∈ S ⊂ C, f0 (¯ x) = f (¯ x). Also for x ∈ C, f0 (x) = f (x). Therefore, the above inequality leads to f0 (x) − f0 (¯ x) ≥ α dS (x), ∀ x ∈ C.
(9.6)
For x 6∈ C, f0 (x) = +∞. Thus, f0 (x) − f0 (¯ x) ≥ α dS (x), ∀ x ∈ /C
© 2012 by Taylor & Francis Group, LLC
(9.7)
332
Weak Sharp Minima in Convex Optimization
trivially. Combining (9.6) and (9.7) yields f0 (x) − f0 (¯ x) ≥ α dS (x), ∀ x ∈ Rn . In particular, taking x = x ¯ + λv ∈ Rn for λ > 0 and v ∈ Rn in the above condition leads to f0 (¯ x + λv) − f0 (¯ x) ≥ α dS (¯ x + λv), ∀ λ > 0, ∀ v ∈ Rn , which implies that for every λ > 0, dS (¯ x + λv) f0 (¯ x + λv) − f0 (¯ x) ≥α , ∀ v ∈ Rn . λ λ
(9.8)
Observe that dS (¯ x + λv) = inf k¯ x + λv − xk = λ inf kv − x∈S
x∈S
(x − x ¯) k, λ
which by Definition 2.33 of tangent cone implies that dS (¯ x + λv) ≥ inf kv − yk = dTS (¯x) (v). λ y∈TS (¯ x)
(9.9)
Therefore, using (9.8) along with (9.9) leads to f0 (¯ x + λv) − f0 (¯ x) ≥ αdTS (¯x) (v), ∀ v ∈ Rn . λ Taking the limit as λ → 0 in the above inequality reduces it to f0′ (¯ x, v) ≥ αdTS (¯x) (v), ∀ v ∈ Rn . Because x ¯ ∈ S was arbitrary, the above condition yields (9.5). Conversely, suppose that the relation (9.5) is satisfied. Consider x ∈ C and x ¯ ∈ S. Therefore, f0 (x) − f0 (¯ x) ≥ f0′ (¯ x, x − x ¯) ≥ α dTS (¯x) (x − x ¯) = α dx¯+TS (¯x) (x). By Proposition 9.3 (iv), the above inequality leads to f0 (x) − f0 (¯ x) ≥ α sup dx¯+TS (¯x) (x) = α dS (x). x ¯∈S
Because x ∈ C and x ¯ ∈ S were arbitrary, the above condition holds for every x ∈ C and every x ¯ ∈ S, and hence S is the set of weak sharp minimizers. We end this chapter by giving equivalent characterizations for the set of weak sharp minimizers, S, for (CP ) from Burke and Deng [22].
© 2012 by Taylor & Francis Group, LLC
9.2 Weak Sharp Minima and Optimality
333
Theorem 9.6 Consider the convex optimization problem (CP ) and its equivalent unconstrained problem (CPu ). Let α > 0. Then the following statements are equivalent: (i) S is the set of weak sharp minimizers for (CP ) with modulus α > 0. (ii) For every x ¯ ∈ S and v ∈ TC (¯ x), f ′ (¯ x, v) ≥ α dTS (¯x) (v). (iii) For every x ¯ ∈ S, α cl B ∩ NS (¯ x) ⊂ ∂f0 (¯ x). (iv) The inclusion α cl B ∩
[
x ¯∈S
NS (¯ x) ⊂
[
∂f0 (¯ x)
x ¯∈S
holds. (v) For every x ¯ ∈ S and v ∈ TC (¯ x) ∩ NS (¯ x), f ′ (¯ x, v) ≥ α kvk. (vi) For every x ¯ ∈ S, α B ⊂ ∂f (¯ x) + (TC (¯ x) ∩ NS (¯ x))◦ . (vii) For every x ∈ C, f ′ (¯ x, x − x ¯) ≥ α dS (x), where x ¯ ∈ projS (x). Proof. [(i) =⇒ (ii)] Because S is the set of weak sharp minimizers, by Theorem 9.5, f0′ (x, v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn . (9.10) The above condition holds in particular for v ∈ TC (x). As f0 (x) = f (x) + δC (x), which along with the fact that f0′ (x, v) = f ′ (x, v) for every x ∈ S and v ∈ TC (x), and condition (9.10) yields f ′ (x, v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ TC (x), thereby establishing (ii). [(ii) =⇒ (iii)] As dom f = Rn , by Theorem 2.79 and the relation (ii), σ∂f (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ TC (x).
© 2012 by Taylor & Francis Group, LLC
334
Weak Sharp Minima in Convex Optimization
By Theorem 2.35, TC (x) is a closed convex cone. Invoking Proposition 2.61 (v) along with Proposition 2.37 yields σ∂f (x)+NC (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn . By the fact that NC (x) = ∂δC (x) and from the Sum Rule, Theorem 2.91, ∂f (x) + NC (x) ⊂ ∂(f + δC )(x) = ∂f0 (x), which is always true along with Proposition 2.61 (i), the above inequality yields σ∂f0 (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn .
(9.11)
By Proposition 9.3 (v), for any x ∈ S and v ∈ Rn , α dTS (x) (v) = α σcl B∩NS (x) (v) = α
sup v ∗ ∈cl B∩NS (x)
hv ∗ , vi.
As α > 0, the above condition becomes α dTS (x) (v)
=
sup v ∗ ∈cl
=
hα v ∗ , vi
B∩NS (x)
sup α v ∗ ∈α cl B∩NS (x)
hα v ∗ , vi = σα cl B∩NS (x) (v). (9.12)
Substituting the above relation in the inequality (9.11) leads to σ∂f0 (x) (v) ≥ σα cl B∩NS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn .
(9.13)
By Proposition 2.82, ∂f0 (x) is a closed convex set which along with Proposition 2.61 (iv) and (ii) implies that α cl B ∩ NS (x) ⊂ ∂f0 (x), ∀ x ∈ S, thereby proving (iii). [(iii) =⇒ (i)] By Proposition 2.61 (i), relation (9.13) holds which along with (9.12) implies that σ∂f0 (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn . By Theorem 2.79, the above inequality leads to f0′ (x, v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn , that is, (9.5) is satisfied. Therefore by Theorem 9.5, (i) holds. [(iii) ⇐⇒ (iv)] This holds by Lemma 9.4.
[(v) =⇒ (vi)] Because dom f = Rn , by Theorem 2.79, the relation (v) becomes σ∂f (x) (v) ≥ α sup hv ∗ , vi, ∀ x ∈ S, ∀ v ∈ TC (x) ∩ NS (x). v ∗ ∈cl B
© 2012 by Taylor & Francis Group, LLC
9.2 Weak Sharp Minima and Optimality
335
As α > 0, for every x ∈ S and every v ∈ TC (x) ∩ NS (x), the above inequality is equivalent to σ∂f (x) (v) ≥
sup
hα v ∗ , vi = σα cl B (v).
α v ∗ ∈α cl B
Because TC (x) ∩ NS (x) is a closed convex cone, by Proposition 2.61 (v), the above condition yields that for every x ∈ S, α cl B ⊂ cl {∂f (x) + (TC (x) ∩ NS (x))◦ }. Invoking Proposition 2.15, α B = int (α cl B) ⊂ int {∂f (x) + (TC (x) ∩ NS (x))◦ } ⊂ ∂f (x) + (TC (x) ∩ NS (x))◦ , ∀ x ∈ S, thereby leading to (vi). [(vi) =⇒ (v)] Applying Proposition 2.61 (v) to condition (vi) leads to σ∂f (x) (v) ≥ σα B (v), ∀ x ∈ S, ∀ v ∈ TC (x) ∩ NS (x). As dom f = Rn , by Theorem 2.79, the above inequality leads to f ′ (x, v) ≥ α kvk, ∀ x ∈ S, ∀ v ∈ TC (x) ∩ NS (x), thereby establishing (v). [(ii) =⇒ (v)] By Proposition 9.3 (vi), for every x ∈ S, dTS (x) (v) = σcl B∩NS (x) (v). For every v ∈ NS (x),
dTS (x) (v) = kvk.
(9.14)
Therefore, for every x ∈ S and every v ∈ TC (x)∩NS (x), the relation (ii) along with (9.14) leads to f ′ (x, v) ≥ α kvk, thereby deriving (v). [(v) =⇒ (vii)] Consider x ∈ C and let x ¯ ∈ projS (x). By Theorem 2.35, x−x ¯ ∈ TC (¯ x). As x ¯ ∈ projS (x), by Proposition 2.52, hx − x ¯, y¯ − x ¯i ≤ 0, ∀ y¯ ∈ S, which by Definition 2.36 of normal cone, x − x ¯ ∈ NS (¯ x). Therefore, x−x ¯ ∈ TC (¯ x) ∩ NS (¯ x).
© 2012 by Taylor & Francis Group, LLC
336
Weak Sharp Minima in Convex Optimization
iv
i
vii
ii
v
iii vi
FIGURE 9.1: Pictorial representation of Theorem 9.6.
Now by relation (v), f ′ (¯ x, x − x ¯) ≥ α kx − x ¯k. As x ¯ ∈ projS (x), dS (x) = kx − x ¯k. Thus the above inequality becomes f ′ (¯ x, x − x ¯) ≥ α dS (x). Because x ∈ C and x ¯ ∈ projS (x) were arbitrary, the inequality holds for every x ∈ C and x ¯ ∈ projS (x), thereby yielding the relation (vii). [(vii) =⇒ (i)] As dom f = Rn , by Theorem 2.79 along with Definition 2.77 of subdifferential and the relation (vii) leads to f (x) − f (¯ x) ≥ f ′ (¯ x, x − x ¯) ≥ α dS (x), ∀ x ∈ C,
(9.15)
with x ¯ ∈ projS (x). Because for any y¯ ∈ S with y¯ 6= x ¯, f (¯ y ) = f (¯ x). Thus, (9.15) holds for every x ∈ C and every x ¯ ∈ S, thereby leading to (i). Figure 9.1 presents the pictorial representation of Theorem 9.6. We have devoted this chapter only to the theoretical aspect of weak sharp minimizers, though as mentioned in the beginning this notion plays an important role from the algorithmic point of view. For readers interested in its computational aspects, one may refer to Burke and Deng [23, 24] and Ferris [46].
© 2012 by Taylor & Francis Group, LLC
Chapter 10 Approximate Optimality Conditions
10.1
Introduction
We have discussed the various aspects of studying optimality conditions for the convex programming problem (CP ). Throughout, we concentrated on establishing the standard or the sequential optimality conditions at the exact point of minima. But it may not be always possible to find the point of minimizer. There may be cases where the infimum exists but is not attainable. For instance, consider min ex
subject to
x ∈ R.
As we know, the infimum for the above problem is zero but it is not attainable over the whole real line. Thus for scenarios we try to approximate the solution. In this example, for a given ε > 0, one can always find x ¯ ∈ R such that ex¯ < ε. This leads to the notion of approximate solutions, which play a crucial role in algorithmic study of optimization problems. Recall the convex optimization problem min f (x)
subject to
x ∈ C,
(CP )
where f : Rn → R is a convex function and C is a convex subset of Rn . Definition 10.1 Let ε ≥ 0 be given. Then x ¯ ∈ C is said to be an ε-solution of (CP ) or an approximate up to ε for (CP ) if f (¯ x) ≤ f (x) + ε, ∀ x ∈ C. This is not the only way to study approximate solutions. In the literature, one finds the notions of various approximate solutions introduced over the years, such as quasi ε-solution, regular ε-solution, almost ε-solution [76], to name a few. We will define these solution concepts before moving on to study the approximate optimality conditions. The classes of quasi ε-solution and regular ε-solution are motivated by Ekeland’s variational principle stated in Chapter 2.
337 © 2012 by Taylor & Francis Group, LLC
338
Approximate Optimality Conditions
Definition 10.2 Let ε ≥ 0 be given. Then x ¯ ∈ C is said to be quasi ε-solution of (CP ) if √ ¯k, ∀ x ∈ C. f (¯ x) ≤ f (x) + εkx − x A point x ¯ ∈ C, which is an ε-solution as well as a quasi ε-solution of (CP ), is known as the regular ε-solution of (CP ). The class of almost ε-solution, as the name itself suggests, seems to be an approximation to the ε-solution. Actually, it is the approximate solution concept associated with the perturbed problem. Before defining the almost ε-solution, recall the feasible set C given by (3.1), that is, C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}, where gi : Rn → R, i = 1, 2, . . . , m, are convex functions. Definition 10.3 Let ε ≥ 0 be given. The ε-feasible set of (CP ) with the feasible set C given by (3.1) is defined as Cε = {x ∈ Rn : gi (x) ≤ ε, i = 1, 2, . . . , m}. Then x ¯ ∈ Rn is said to be an almost ε-solution of (CP ) if x ¯ ∈ Cε
and
f (¯ x) ≤ f (x) + ε, ∀ x ∈ C.
Observe that here the almost ε-solution need not be from the actual feasible set but should belong to the perturbed feasible set that is ε-feasible set. Now we move on to discuss the approximate optimality conditions for the various classes of approximate solutions. In this chapter we concentrate on the ε-solution, quasi ε-solution, and almost ε-solution. We begin with the study of ε-solutions.
10.2
ε-Subdifferential Approach
Consider the unconstrained convex programming problem (CPu ) min f (x) n
subject to
x ∈ Rn .
(CPu )
If x ¯ ∈ R is an ε-solution, then by Definition 10.1,
f (x) − f (¯ x) ≥ −ε, ∀ x ∈ Rn . Using the definition of ε-subdifferential, Definition 2.109, 0 ∈ ∂ε f (¯ x). The converse can be established by directly applying the definition of ε-subdifferential. This has been stated as a result characterizing the ε-solution in Theorem 2.121 as follows.
© 2012 by Taylor & Francis Group, LLC
10.2 ε-Subdifferential Approach
339
Theorem 10.4 Consider the unconstrained problem (CPu ). Then x ¯ ∈ Rn is an ε-solution of (CPu ) if and only if 0 ∈ ∂ε f (¯ x). As the convex programming problem (CP ) can be reformulated as an unconstrained problem with the objective function f replaced by (f + δC ), then from the above theorem one has that x ¯ is an ε-solution of (CP ) if and only if 0 ∈ ∂ε (f + δC )(¯ x).
Observe that dom f = Rn . If in addition, the Slater constraint qualification, that is, C has a nonempty relative interior holds, then by invoking the Sum Rule of ε-subdifferential, Theorem 2.115, along with the definition of ε-normal set, Definition 2.110, leads to 0 ∈ ∂ε1 f (¯ x) + NC,ε2 (¯ x) for some εi ≥ 0, i = 1, 2, with ε1 + ε2 = ε. This may be stated as the following theorem. Theorem 10.5 Consider the convex optimization problem (CP ). Assume that the Slater constraint qualification holds, that is ri C is nonempty. Let ε ≥ 0 be given. Then x ¯ ∈ C is an ε-solution of (CP ) if and only if there exist εi ≥ 0, i = 1, 2, satisfying ε1 + ε2 = ε such that 0 ∈ ∂ε1 f (¯ x) + NC,ε2 (¯ x). Note that for a nonempty convex set C, by Proposition 2.14 (i), ri C is nonempty and hence the Slater constraint qualification holds. From the above theorem it is obvious that to obtain the approximate optimality conditions in terms of the constraint functions gi , i = 1, 2, . . . , m, NC,ε (x) must be explicitly expressed in their terms. Below we present the result from Strodiot, Nguyen, and Heukemes [106], which acts as the tool in establishing the approximate optimality conditions. But before that, we define the right scalar multiplication from Rockafellar [97]. ¯ be a proper convex function and λ ≥ 0. Definition 10.6 Let φ : Rn → R The right scalar multiplication, φλ, is defined as λφ(λ−1 x), λ > 0, (φλ)(x) = δ{0} (x), λ = 0. A positively homogeneous convex function generated by φ, ψ, is defined as ψ(x) = inf{(φλ)(x) : λ ≥ 0}.
Proposition 10.7 Consider ε ≥ 0 and g : Rn → R is a convex function. Let x ¯ ∈ C¯ = {x ∈ Rn : g(x) ≤ 0}. Assume that the Slater constraint qualification holds, that is, there exist x ˆ ∈ Rn such that g(ˆ x) < 0. Then ξ ∈ NC,ε x) if and ¯ (¯ only if there exists λ ≥ 0 and ε¯ ≥ 0 such that ε¯ ≤ λg(¯ x) + ε
© 2012 by Taylor & Francis Group, LLC
and
ξ ∈ ∂ε¯(λg)(¯ x).
340
Approximate Optimality Conditions
Proof. Using the definition of an ε-normal set, Definition 2.110, NC,ε x) ¯ (¯
¯ = {ξ ∈ Rn : hξ, x − x ¯i ≤ ε, ∀ x ∈ C} = {ξ ∈ Rn : σC¯ (ξ) ≤ hξ, x ¯i + ε},
where σC¯ (ξ) denotes the support function to the set C¯ at ξ. Observe that dom g = Rn and hence by Theorem 2.69 continuous over the whole of Rn . Now invoking Theorem 13.5 from Rockafellar [97] (see also Remark 10.8), the support function σC¯ is the closure of the positively homogenous function φ generated by g ∗ , which is defined as φ(ξ) = inf (g ∗ λ)(ξ) = inf λg ∗ (λ−1 ξ) = inf (λg)∗ (ξ). λ≥0
λ≥0
λ≥0
Therefore, NC,ε x) ¯ (¯
= {ξ ∈ Rn : inf (λg)∗ (ξ) ≤ hξ, x ¯i + ε} λ≥0
= {ξ ∈ Rn : there exists λ ≥ 0 such that (λg)∗ (ξ) ≤ hξ, x ¯i + ε}
= {ξ ∈ Rn : there exists λ ≥ 0 such that (λg)∗ (ξ) + (λg)(¯ x) ≤ hξ, x ¯i + ε + (λg)(¯ x)} n = {ξ ∈ R : there exists λ ≥ 0 such that
(λg)(x) − (λg)(¯ x) ≥ hξ, x − x ¯i − ε − (λg)(¯ x), ∀ x ∈ Rn }.
From the above condition, there exists λ ≥ 0 such that ξ ∈ ∂ε+(λg)(¯x) (λg)(¯ x). As ∂ε1 φ(x) ⊂ ∂ε2 φ(x) whenever ε1 ≤ ε2 , there exists an ε¯ satisfying 0 ≤ ε¯ ≤ ε + (λg)(¯ x) such that ξ ∈ ∂ε¯(λg)(¯ x). Therefore, [ NC,ε x) = {ξ ∈ Rn : there exists λ ≥ 0 such that ¯ (¯ 0≤¯ ε≤ε+(λg)(¯ x)
=
[
[
∂ε¯(λg)(¯ x),
(λg)∗ (ξ) + (λg)(¯ x) ≤ hξ, x ¯i + ε¯}
0≤¯ ε≤ε+(λg)(¯ x) λ≥0
thereby leading to the desired result. Remark 10.8 We state Theorem 13.5 from Rockafellar [97]. ¯ be a proper lsc convex function. The support function Let φ : Rn → R of the set C = {x ∈ Rn : φ(x) ≤ 0} is then cl ψ, where ψ is the positively homogeneous convex function generated by φ∗ . Dually, the closure of the positively homogeneous convex function ψ generated by φ is the support function of the set {x∗ ∈ Rn : φ∗ (x∗ ) ≤ 0}. For more details, readers are advised to refer to Rockafellar [97].
© 2012 by Taylor & Francis Group, LLC
10.2 ε-Subdifferential Approach
341
Next we present the approximate optimality conditions for the convex programming problem (CP ). Theorem 10.9 Consider the convex programming problem (CP ) with C given by (3.1). Assume that the Slater constraint qualification is satisfied. Let ε ≥ 0. Then x ¯ is an ε-solution of (CP ) if and only if there exist ε¯0 ≥ 0, ¯ i ≥ 0, i = 1, . . . , m, such that ε¯i ≥ 0, and λ 0 ∈ ∂ε¯0 f (¯ x) +
m X
¯ i gi )(¯ ∂ε¯i (λ x)
m X
and
i=1
i=0
ε¯i − ε ≤
m X i=1
¯ i gi (¯ λ x) ≤ 0.
Proof. Observe that (CP ) is equivalent to the unconstrained problem min (f +
m X
δCi )(x)
subject to
i=1
x ∈ Rn ,
where Ci = {x ∈ Rn : gi (x) ≤ 0}, i = 1, 2, . . . , m. By the Slater constraint qualification, there exist x ˆ ∈ Rn such that gi (ˆ x) < 0 for every i = 1, 2, . . . , m, which implies ri Ci , i = 1, 2, . . . , m, is nonempty. Invoking Theorem 10.5, Pm there exist εi ≥ 0, i = 0, 1, . . . , m, with ε0 + i=1 εi = ε such that 0 ∈ ∂ε0 f (¯ x) +
m X
NCi ,εi (¯ x).
i=1
¯ i ≥ 0 and ε¯i ≥ 0, Applying Proposition 10.7 to Ci , i = 1, 2, . . . , m, there exist λ i = 1, 2, . . . , m, such that 0 ∈ ∂ε¯0 f (¯ x) +
m X
¯ i g)(¯ ∂ε¯i (λ x)
i=1
¯ i gi (¯ ε¯i − εi ≤ λ x) ≤ 0, i = 1, 2, . . . , m,
and
(10.1)
where ε¯0 = ε0P . Now summing (10.1) over i = 1, 2, . . . , m, and using the m condition ε0 + i=1 εi = ε leads to m X i=0
ε¯i − ε ≤
m X i=1
¯ i gi (¯ λ x) ≤ 0,
(10.2)
as desired. ¯ i gi (¯ Conversely, define εi = ε¯i − λ x), i = 1, 2, . . . , m. Applying Proposi¯ i gi )(¯ tion 10.7, ξi ∈ ∂ε¯i (λ x) is equivalent to ξi ∈ NCi ,εi (¯ x) for i = 1, 2, . . . , m. Also, from the condition (10.2), ε¯0 +
m X i=1
© 2012 by Taylor & Francis Group, LLC
εi +
m X i=1
¯ i gi (¯ λ x) − ε ≤
m X i=1
¯ i gi (¯ λ x) ≤ 0,
342
Approximate Optimality Conditions Pm
which implies ε¯0 + i=1 εi ≤ ε. Define ε0 = ε¯0 +εs , where εs = ε−¯ ε0 − Observe that εs ≥ 0. Therefore, 0 ∈ ∂ε¯0 f (¯ x) + where ε0 +
Pm
i=1 εi
m X i=1
NCi ,εi (¯ x) ⊂ ∂ε0 f (¯ x) +
m X
Pm
i=1 εi .
NCi ,εi (¯ x),
i=1
= ε. By Theorem 10.5, x ¯ is an ε-solution of (CP ).
Observe that in the above approximate optimality conditions instead of the complementary slackness conditions, we have an ε-complementary slackness condition. Also, we derived the approximate optimality conditions in terms of the ε-subdifferentials of the objective function as well as the constraint functions at the ε-solution of (CP ) by equivalent characterization of ε-normal set in terms of the ε-subdifferentials of the constraint functions gi , i = 1, 2, . . . , m.
10.3
Max-Function Approach
As discussed in the Section 3.5, another approach that is well known in establishing the standard KKT optimality conditions is the max-function approach. Applying a similar approach for an ε-solution, x ¯, of (CP ) we introduce an unconstrained minimization problem min F (x)
subject to
x ∈ Rn ,
(CPmax )
where F (x) = max{f (x)−f (¯ x)+ε, g1 (x), . . . , gm (x)}. Using this max-function, an alternative proof is provided to derive the approximate optimality conditions. But before that we present a result to study the relation between the ε-solution of (CP ) and those of the unconstrained problem (CPmax ). Theorem 10.10 Consider the convex programming problem (CP ) with C given by (3.1). If x ¯ is an ε¯-solution of (CP ), then x ¯ is an ε-solution of the unconstrained problem (CPmax ) for every ε ≥ ε¯. Conversely, if x ¯ is an ε-solution of (CPmax ), then it is an almost 2ε-solution of (CP ). Proof. Because x ¯ is an ε¯-solution of (CP ), x ¯ ∈ C with f (¯ x) ≤ f (x) + ε¯, ∀ x ∈ C.
(10.3)
Observe that F (¯ x) = ε¯. To show that for every ε ≥ ε¯, x ¯ ∈ Rn is an ε-solution for (CPmax ), it is sufficient to establish that F (¯ x) ≤ F (x) + ε¯, ∀ x ∈ Rn , which is equivalent to proving that F (x) ≥ 0 for every x ∈ Rn . For x ∈ C, gi (x) ≤ 0, i = 1, 2, . . . , m, while condition (10.3) ensures that
© 2012 by Taylor & Francis Group, LLC
10.3 Max-Function Approach
343
f (x) − f (¯ x) + ε¯ ≥ 0. Therefore, F (x) ≥ 0 for every x ∈ C. If x 6∈ C, then for some i ∈ {1, 2, . . . , m}, gi (x) > 0 and thus, F (x) > 0 for every x ∈ / C. Hence, x ¯ is an ε-solution of (CPmax ). Conversely, as x ¯ is an ε-solution of (CPmax ), F (¯ x) ≤ F (x) + ε, ∀ x ∈ Rn . Therefore, 0 < ε = max{ε, g1 (¯ x), g2 (¯ x), . . . , gm (¯ x)} ≤ F (x) + ε, ∀ x ∈ Rn . The above condition yields F (x) > 0
and
gi (¯ x) ≤ F (x) + ε, i = 1, 2, . . . , m, ∀ x ∈ Rn .
From the first condition, in particular for x ∈ C, f (¯ x) ≤ f (x) + ε ≤ f (x) + 2ε while in the second condition, taking x = x ¯ leads to gi (¯ x) ≤ 2ε, i = 1, 2, . . . , m, thereby implying that x ¯ is an almost 2ε-solution of (CP ).
Theorem 10.11 Consider the convex programming problem (CP ) with C defined by (3.1). Assume that the Slater constraint qualification is satisfied and let ε ≥ 0. Then x ¯ is an ε-solution of (CP ) if and only if there exist ε¯0 ≥ 0, ¯ i ≥ 0, i = 1, . . . , m, such that ε¯i ≥ 0, and λ 0 ∈ ∂ε¯0 f (¯ x) +
m X
¯ i gi )(¯ ∂ε¯i (λ x)
and
i=1
m X i=0
ε¯i − ε =
m X i=1
¯ i gi (¯ λ x) ≤ 0.
Proof. As x ¯ is an ε-solution of (CP ), then by Theorem 10.10, x ¯ is also an ε-solution of the unconstrained minimization problem (CPmax ). By the approximate optimality condition, Theorem 10.4, for the unconstrained problem, 0 ∈ ∂ε F (¯ x). By the ε-subdifferential Max-Function Pm Rule, Remark 2.119, there exist εi ≥ 0, λi ≥ 0, i = 0, 1, . . . , m, with i=1 λi = 1 and ξ0 ∈ ∂ε0 (λ0 f )(¯ x) provided λ0 > 0 and ξi ∈ ∂εi (λi gi )(¯ x) for those i ∈ {1, 2, . . . , m} satisfying λi > 0 such that 0 = ξ0 +
X
ξi
i∈I¯
and
m X i=0
εi + F (¯ x) − λ0 ε −
X
λi gi (¯ x) = ε,
(10.4)
i∈I¯
where I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. Now if λ0 = 0, again invoking
© 2012 by Taylor & Francis Group, LLC
344
Approximate Optimality Conditions
the ε-subdifferential Max-Function Rule, Remark 2.119, there exists some i ∈ {1, 2, . . . , m} such that λi > 0, which implies I¯ is nonempty. Thus, corre¯ there exist ξi ∈ ∂ε (λi gi )(¯ sponding to i ∈ I, x) such that i
0=
X
ξi
m X
and
i∈I¯
i=0
εi + F (¯ x) −
X
λi gi (¯ x) = ε.
(10.5)
i∈I¯
As F (¯ x) = ε, the second equality condition reduces to m X
εi =
X
λi gi (¯ x).
(10.6)
i∈I¯
i=0
By the definition of ε-subdifferentiability, Definition 2.109, ¯ λi gi (x) ≥ λi gi (¯ x) + hξi , x − x ¯i − εi , ∀ x ∈ Rn , i ∈ I. Therefore, the above inequality along with (10.5) and the nonnegativity of εi , i = 0, 1, . . . , m, leads to m X
λi gi (x) =
X i∈I¯
i=1
λi gi (x) ≥
X i∈I¯
λi gi (¯ x) −
m X
εi ,
i=0
which by the condition (10.6) yields m X i=1
λi gi (x) ≥ 0, ∀ x ∈ Rn .
(10.7)
As the Slater constraint qualification holds, there exists xˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m. Thus, m X
λi gi (ˆ x) < 0,
i=1
thereby contradicting the inequality (10.7). Therefore, λ0 6= 0. Now dividing both relations of (10.4) throughout by λ0 > 0, along with F (¯ x) = ε and Theorem 2.117, leads to 0 ∈ ξ¯0 +
X i∈I¯
ξ¯i
and
m X i=0
ε¯i − ε =
m X i=1
¯ i gi (¯ λ x) ≤ 0,
¯ i gi )(¯ ¯ ε¯i = εi , i = 0, 1, . . . , m, and where ξ¯0 ∈ ∂ε¯0 f (¯ x), ξ¯i ∈ ∂ε¯i (λ x), i ∈ I, λ0 λ i ¯i = ¯ i = 0 with ξ¯i = 0 ∈ ∂ε¯ (λ ¯ i gi )(¯ ¯ Corresponding to i ∈ ¯ λ , i ∈ I. λ / I, x), i λ0 thereby leading to the approximate optimality condition 0 ∈ ∂ε¯0 f (¯ x) +
© 2012 by Taylor & Francis Group, LLC
m X i=1
¯ i gi )(¯ ∂ε¯i (λ x)
10.4 ε-Saddle Point Approach
345
along with the ε-complementary slackness condition. The converse can be worked along the lines of Theorem 10.9 taking εs = 0. Note that in the ε-complementary slackness condition of Theorem 10.9, we had inequality whereas in the above theorem it is in the form of an equation. Actually, this condition can also be treated as an inequality if for condition (10.4) we consider F (¯ x) = max{ε, 0} ≥ ε instead of F (¯ x) = ε.
10.4
ε-Saddle Point Approach
While studying the optimality conditions for the convex programming problem (CP ), we have already devoted a chapter on saddle point theory. Now to derive the approximate optimality conditions, we make use of the ε-saddle point approach. Recall the Lagrangian function L : Rn × Rm + → R associated with the convex programming problem (CP ) with C given by (3.1), that is, involving convex inequalities, introduced in Chapter 4, is given by L(x, λ) = f (x) +
m X
λi gi (x).
i=1
¯ ∈ Rn × Rm is said to be an ε-saddle point Definition 10.12 A point (¯ x, λ) + of (CP ) if ¯ ≤ L(x, λ) ¯ + ε, ∀ x ∈ Rn , ∀ λ ∈ Rm . L(¯ x, λ) − ε ≤ L(¯ x, λ) + Below we present a saddle point result established by Dutta [37]. Theorem 10.13 Consider the convex programming problem (CP ) with C given by (3.1). Let ε ≥ 0 be given and x ¯ be an ε-solution of (CP ). Assume that the Slater constraint qualification holds. Then there exist ¯ i ≥ 0, i = 1, 2, . . . , m, such that (¯ ¯ is an ε-saddle point of (CP ) and λ x, λ) Pm ¯ g (¯ x ) ≥ 0. ε + i=1 λ i i
Proof. As x ¯ is an ε-solution of (CP ), the following system f (x) − f (¯ x) + ε < 0, gi (x) < 0, i = 1, 2, . . . , m, has no solution x ∈ Rn . Define the set
Λ = {(y, z) ∈ R × Rm : f (x) − f (¯ x) + ε < y, gi (x) < zi , i = 1, 2, . . . , m}. The reader is urged to verify that Λ is an open convex set. Observe that (0, 0) ∈ / Λ. Therefore, by the Separation Theorem, Theorem 2.26 (ii), there
© 2012 by Taylor & Francis Group, LLC
346
Approximate Optimality Conditions
exists (λ0 , λ) ∈ R × Rm with (λ0 , λ) 6= (0, 0) such that λ0 (f (x) − f (¯ x) + ε) +
m X i=1
λi gi (x) ≥ 0, ∀ x ∈ Rn .
(10.8)
Working along the lines of proof of Theorem 4.2, it can be proved that (λ0 , λ) ∈ R+ × Rm +. We claim that λ0 6= 0. On the contrary, suppose that λ0 = 0. By the Slater constraint qualification, there exists xˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m which implies m X
λi gi (ˆ x) < 0,
i=1
thereby contradicting (10.8). Therefore, λ0 6= 0 and thus the condition (10.8) can be expressed as f (x) − f (¯ x) + ε +
m X i=1
¯ i gi (x) ≥ 0, ∀ x ∈ Rn , λ
(10.9)
¯ i = λi for i = 1, 2, . . . , m. In particular, taking x = x ¯, the above where λ λ0 inequality reduces to m X ¯ i gi (¯ ε+ λ x) ≥ 0. (10.10) i=1
As gi (¯ x) ≤ 0, i = 1, 2, . . . , m, which along with (10.9) leads to f (¯ x) +
m X
¯ i gi (¯ λ x) ≤ f (x) +
i=1
which implies
m X i=1
¯ i gi (x) + ε, ∀ x ∈ Rn , λ
¯ ≤ L(x, λ) ¯ + ε, ∀ x ∈ Rn . L(¯ x, λ)
(10.11)
For any λi ≥ 0, i = 1, 2, . . . , m, the feasibility of x ¯ along with the nonnegativity of ε and (10.10) leads to f (¯ x) +
m X i=1
λi gi (¯ x) − ε ≤ f (¯ x) − ε ≤ f (¯ x) +
m X
¯ i gi (¯ λ x),
i=1
that is, ¯ ∀ λ ∈ Rm . L(¯ x, λ) − ε ≤ L(¯ x, λ), +
¯ is an ε-saddle point The above inequality along with (10.11) implies that (¯ x, λ) of (CP ), which satisfies (10.10), thereby yielding the desired result. Using this ε-saddle point result, we establish the approximate optimality conditions. But unlike Theorems 10.9 and 10.11, the result below is only necessary with a relaxed ε-complementary slackness condition.
© 2012 by Taylor & Francis Group, LLC
10.4 ε-Saddle Point Approach
347
Theorem 10.14 Consider the convex programming problem (CP ) with C given by (3.1). Let ε ≥ 0 be given and x ¯ be an ε-solution of (CP ). Assume that the Slater constraint qualificationP holds. Then there exist ε¯0 ≥ 0, ε¯i ≥ 0, ¯ i ≥ 0, i = 1, 2, . . . , m, with ε¯0 + m ε¯i = ε such that and λ i=1 0 ∈ ∂ε¯0 f (¯ x) +
m X
¯ i gi )(¯ ∂ε¯i (λ x)
and
ε+
i=1
m X i=1
¯ i gi (¯ λ x) ≥ 0.
¯ i ≥ 0, i = 1, 2, . . . , m, such that Proof. By the previous theorem, there exist λ ¯ ≤ L(x, λ) ¯ + ε, ∀ x ∈ Rn L(¯ x, λ)
Pm ¯ along with ε + i=1 λ x) ≥ 0. By Definition 10.1 of ε-solution, the above i gi (¯ inequality implies that x ¯ is an ε-solution of the unconstrained problem inf f (x) +
m X
¯ i gi (x) λ
subject to
i=1
x ∈ Rn .
By Theorem 10.4, the approximate optimality condition is 0 ∈ ∂ε (f +
m X
¯ i gi )(¯ λ x).
(10.12)
i=1
As dom f = Rn and dom gi = Rn , i = 1, 2, . . . , m, applying the Sum Rule of ε-subdifferential, Theorem 2.115, there exist ε¯i ≥ 0, i = 0, 1, . . . , m, satisfying Pm ε¯0 + i=1 ε¯i = ε such that (10.12) becomes 0 ∈ ∂ε¯0 f (¯ x) +
m X
¯ i gi )(¯ ∂ε¯i (λ x),
i=1
thereby establishing the result.
Observe that the conditions obtained in Theorem 10.14 are only necessary and not sufficient. The approach used in Theorems 10.9 and 10.11 for the sufficiency part cannot be invoked here. But if instead of the relaxed ε-complementary slackness condition, one has the standard complementary slackness, which is equivalent to m X
¯ i gi (¯ λ x) = 0,
i=1
then working along the lines of Theorem 10.9 the sufficiency can also be established. The result below shows that the optimality conditions derived in the above theorem imply toward the 2ε-solution of (CP ) instead of the ε-solution.
© 2012 by Taylor & Francis Group, LLC
348
Approximate Optimality Conditions
Theorem 10.15 Consider the convex programming problem (CP ) with C given by (3.1). Let ε ≥ 0 be given. Assume that the approximate optimality condition and the relaxed ε-complementary slackness condition of Theo¯ ∈ Rn × Rm and εi ≥ 0, i = 0, 1, . . . , m, satisfying rem 10.14 x, λ) + Pm hold for (¯ ε0 + i=1 εi = ε. Then x ¯ is a 2ε-solution of (CP ).
Proof. From the approximate optimality condition of Theorem P 10.14, there ¯ i ≥ 0, i = 1, 2, . . . , m, and εi ≥ 0, i = 0, 1, . . . , m, with ε0 + m εi = ε, exist λ i=1 ¯ i gi )(¯ ξ0 ∈ ∂ε0 f (¯ x), and ξi ∈ ∂εi (λ x), i = 1, 2, . . . , m such that 0 = ξ0 +
m X
ξi .
(10.13)
i=1
By Definition 2.109 of the ε-subdifferential, f (x) − f (¯ x) ≥ hξ0 , x − x ¯ i − ε0 , ¯ i gi (x) − λ ¯ i gi (¯ λ x) ≥ hξi , x − x ¯i − εi , i = 1, 2, . . . , m. Summing the above inequalities along with the condition (10.13) leads to f (x) +
m X i=1
¯ i gi (x) ≥ f (¯ λ x) +
m X i=1
¯ i gi (¯ λ x) − (ε0 +
m X
εi ).
i=1
For any x feasible for (CP ), gi (x) ≤ 0, i = 1, 2, . . . , m, which alongP with the rem laxed ε-complementary slackness condition and the fact that ε0 + i=1 εi = ε implies that f (x) ≥ f (¯ x) − 2ε, ∀ x ∈ C. Thus, x ¯ is a 2ε-solution of (CP ).
¯ is an ε-saddle point of (CP ) if From Definition 10.12, (¯ x, λ) ¯ ≤ L(x, λ) ¯ + ε, ∀ x ∈ Rn , ∀ λ ∈ Rm . L(¯ x, λ) − ε ≤ L(¯ x, λ) + ¯ With respect to the ε-solution, we will call x ¯ an ε-minimum solution of L(., λ) ¯ and similarly, call λ an ε-maximum solution of L(¯ x, .). We end this section by presenting a result relating the ε-solutions of the saddle point to the almost ε-solution of (CP ) that was derived by Dutta [37]. Theorem 10.16 Consider the convex programming problem (CP ) with C ¯ ∈ Rn ×Rm be such that x given by (3.1). Let (¯ x, λ) ¯ is an ε1 -minimum solution + ¯ and λ ¯ is an ε2 -maximum solution of L(¯ of L(., λ) x, .). Then x ¯ is an almost (ε1 + ε2 )-solution of (CP ). ¯ ∈ Rm is an ε2 -maximum solution of L(¯ x, λ) over Rm Proof. Because λ +, + ¯ ∀ λ ∈ Rm . L(¯ x, λ) − ε2 ≤ L(¯ x, λ), +
© 2012 by Taylor & Francis Group, LLC
10.4 ε-Saddle Point Approach Pm As L(x, λ) = f (x) + i=1 λi gi (x), the above inequality reduces to m X i=1
¯ i )gi (¯ (λi − λ x) ≤ ε2 , ∀ λi ≥ 0, i = 1, 2, . . . , m.
349
(10.14)
We claim that x ¯ ∈ Cε2 = {x ∈ Rn : gi (x) ≤ ε2 , i = 1, 2, . . . , m}. On the contrary, suppose that x ¯ 6∈ Cε2 , which implies that the system gi (¯ x) − ε2 ≤ 0, i = 1, 2, . . . , m does not hold. Equivalently, the above condition implies that (g1 (¯ x) − ε2 , g2 (¯ x) − ε2 , . . . , gm (¯ x ) − ε2 ) ∈ / Rm −. As Rm − is a closed convex set, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists γ ∈ Rm with γ 6= 0 such that m X i=1
γi gi (¯ x) −
m X i=1
γi ε2 > 0 ≥
m X i=1
γi yi , ∀ y ∈ Rm −.
(10.15)
m We claim that γ ∈ Rm + . On the contrary, assume that γ 6∈ R+ , which implies for some i ∈ {1, 2, . . . , m}, γi < 0. As the inequality (10.15) holds for every y ∈ Rm − , taking the corresponding yi → −∞ leads to a contradiction. Hence, γ ∈ Rm +. Pm Because γ 6= 0, it can be so chosen satisfying i=1 γi = 1. Therefore, the strict inequality condition in (10.15) reduces to m X
γi gi (¯ x) > ε2 .
(10.16)
i=1
¯ ∈ Rm and γ ∈ Rm , λ ¯ + γ ∈ Rm . Therefore, taking λ = λ ¯ + γ in (10.14) As λ + + + leads to m X i=1
γi gi (¯ x) ≤ ε2 ,
which contradicts (10.16). Thus, x ¯ ∈ Cε2 ⊂ Cε1 +ε2 , where Cε1 +ε2 = {x ∈ Rn : gi (x) ≤ ε1 + ε2 , i = 1, 2, . . . , m}. ¯ over Rn , As x ¯ is an ε1 -minimum solution of L(x, λ) ¯ ≤ L(x, λ) ¯ + ε1 , ∀ x ∈ Rn , L(¯ x, λ) which implies f (¯ x) +
m X i=1
¯ i gi (¯ λ x) ≤ f (x) +
© 2012 by Taylor & Francis Group, LLC
m X i=1
¯ i gi (x) + ε1 , ∀ x ∈ Rn . λ
350
Approximate Optimality Conditions
For any x feasible to (CP ), gi (x) ≤ 0, i = 1, 2, . . . , m, which implies ¯ λ Pi gmi (x)¯ ≤ 0, i = 1, 2, . . . , m. Taking λi = 0, i = 1, 2, . . . , m, in (10.14), x) ≥ −ε2 . Thus, the preceding inequality reduces to i=1 λi gi (¯ f (¯ x) ≤ f (x) + ε1 + ε2 , ∀ x ∈ C.
Therefore, x ¯ is an almost (ε1 + ε2 )-solution of (CP ).
10.5
Exact Penalization Approach
We have discussed different approaches like the ε-subdifferential approach, max-function approach, and saddle point approach to study the approximate optimality conditions. Another approach to deal with the relationship between the different classes of approximate solutions is the penalty function approach by Loridan [76]. In the work of Loridan that appeared in 1982, he dealt with the notion of regular and almost regular approximate solutions. But here we will concentrate more on ε-solutions and almost ε-solutions for which we move on to study the work done by Loridan and Morgan [77]. This approach helps in dealing with the stability analysis with respect to the perturbed problem, thereby relating the ε-solutions of the perturbed problem and almost ε-solutions of (CP ). We consider the exact penalty function fρ (x) = f (x) +
m X
ρi max{0, gi (x)},
i=1
where ρ = (ρ1 , ρ2 , . . . , ρm ), with ρi > 0, i = 1, 2, . . . , m and the following unconstrained problem min fρ (x)
subject to x ∈ Rn ,
(CP )ρ
is associated with it. The convergence of the ε-solutions of the sequence of problems (CP )ρ under certain assumptions leads to an ε-solution of the problem (CP ). So before moving on to establish the convergence result, we present a result relating the ε-solution of (CP )ρ with the almost ε-solution of (CP ). Theorem 10.17 Assume that f is bounded below on Rn . Then there ex(α + ε) where α = inf x∈C f (x) − inf x∈Rn f (x) such that whenever ists ρε = ε ρi ≥ ρε , i = 1, 2, . . . , m, every ε-solution of (CP )ρ is an almost ε-solution of (CP ). Proof. Suppose that xρ is an ε-solution for (CP )ρ . Then fρ (xρ ) ≤ fρ (x) + ε, ∀ x ∈ Rn .
© 2012 by Taylor & Francis Group, LLC
(10.17)
10.5 Exact Penalization Approach
351
Observe that for x ∈ C, as gi (x) ≤ 0, i = 1, 2, . . . , m, fρ (x) = f (x). This along with the condition (10.17) and the definition of fρ implies f (xρ ) ≤ fρ (xρ ) ≤ f (x) + ε, ∀ x ∈ C.
(10.18)
Again, from the definition of fρ along with (10.18), infn f (x) +
x∈R
m X i=1
which implies
ρi max{0, gi (xρ )} ≤ fρ (xρ ) ≤ inf f (x) + ε, x∈C
m X i=1
ρi max{0, gi (xρ )} ≤ α + ε.
(10.19)
(α + ε) for every ε i = 1, 2, . . . , m. Therefore, for the ε-solution xρ of (CP )ρ , the condition (10.19) leads to Now consider ρ = (ρ1 , ρ2 , . . . , ρm ) such that ρi ≥ ρε =
gi (xρ ) ≤ max{0, gi (xρ )} ≤
m X i=1
max{0, gi (xρ )} ≤ ε, ∀ i = 1, 2, . . . , m,
which implies xρ ∈ Cε . This along with (10.18) yields that xρ is an almost ε-solution of (CP ). In the above theorem, it was shown that the ε-solutions of the penalized problem (CP )ρ are almost ε-solutions of (CP ). But we are more interested in deriving an ε-solution rather than an almost ε-solution of (CP ). The next result paves a way in this direction by obtaining an ε-solution of (CP ) from the ε-solutions of the sequence of problems {(CP )ρk }k , where ρk = (ρk1 , ρk2 , . . . , ρkm ). Theorem 10.18 Assume that f is bounded below on Rn and satisfies the coercivity condition lim
kxk→+∞
f (x) = +∞.
Let {ρk }k be a sequence such that limk→+∞ ρki = +∞ for every i = 1, 2, . . . , m and xρk be the ε-solution of (CP )ρk . Then every convergent sequence of {xρk } has a limit point that is an ε-solution of (CP ). Proof. As {xρk } is the ε-solution of (CP )ρk , by Theorem 10.17, {xρk } is an almost ε-solution of (CP ) and thus satisfies f (xρk ) ≤ f (x) + ε, ∀ x ∈ C. Because f (xρk ) is bounded above for every k, therefore by the given hypothesis
© 2012 by Taylor & Francis Group, LLC
352
Approximate Optimality Conditions
{xρk } is a bounded sequence and thus by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent subsequence. Without loss of generality, assume that xρk → xρ . As dom f = Rn , by Theorem 2.69, f is continuous on Rn . Thus, taking the limit as k → +∞, the above inequality leads to f (xρ ) ≤ f (x) + ε, ∀ x ∈ C.
(10.20)
Using the condition (10.19) in the proof of Theorem 10.17 gi (xρk ) ≤ (α + ε)/ρki . Again, by Theorem 2.69, gi , i = 1, 2, . . . , m, is continuous on dom gi = Rn , i = 1, 2, . . . , m. Therefore, taking the limit as k → +∞, the above inequality leads to gi (xρ ) ≤ 0, ∀ i = 1, 2, . . . , m. Thus, xρ ∈ C along with the condition (10.20) implies that xρ is an ε-solution of (CP ). From the above discussions it is obvious that an ε-solution of (CP )ρ need not be an ε-solution of (CP ) when xρ ∈ / C. But in case xρ ∈ C, it may be considered as an ε-solution of (CP ). The result below tries to find an ε-solution for (CP ) by using an ε-solution of (CP )ρ under the Slater constraint qualification. Even though the result is from Loridan and Morgan [77] but the proof is based on the work by Zangwill [114] on penalty functions. Here we present the detailed proof for a better understanding. Theorem 10.19 Consider the convex programming problem (CP ) with C given by (3.1). Assume that f is bounded below on Rn and the Slater constraint qualification is satisfied, that is, there exists x ˆ ∈ Rn such that gi (ˆ x) < 0, i = 1, 2, . . . , m. Define β = inf x∈C f (x) − f (ˆ x) and γ = maxi=1,...,m gi (ˆ x) < 0. Let ρ0 = (β − 1)/γ > 0. For ρ = (ρ1 , ρ2 , . . . , ρm ) with ρi ≥ ρ0 , i = 1, 2, . . . , m, let xρ ∈ / C be an ε-solution for (CP )ρ . Let x ¯ be the unique point on the line segment joining xρ and x ˆ lying on the boundary of C. Then x ¯ is an ε-solution of (CP ). Proof. Because x ¯ is a unique point on the line segment joining xρ and x ˆ lying on the boundary, the active index set I(¯ x) = {i ∈ {1, 2, . . . , m} : gi (¯ x) = 0} is nonempty. Define a convex auxiliary function as X F(x) = f (x) + ρ0 gi (x). i∈I(¯ x)
Observe that for i ∈ I(¯ x), gi (¯ x) = 0 while for i ∈ / I(¯ x), gi (¯ x) < 0. Therefore, F(¯ x) = f (¯ x) = fρ0 (¯ x).
© 2012 by Taylor & Francis Group, LLC
(10.21)
10.5 Exact Penalization Approach
353
As x ¯ lies on the line segment joining xρ and x ˆ, there exists λ ∈ (0, 1) such that x ¯ = λxρ + (1 − λ)ˆ x. Then by the convexity of gi , i = 1, 2, . . . , m, gi (¯ x) ≤ λgi (xρ ) + (1 − λ)gi (ˆ x), i = 1, 2, . . . , m. For i ∈ I(¯ x), gi (¯ x) = 0, which along with the Slater constraint qualification reduces the above inequality to 0 < −(1 − λ)gi (ˆ x) ≤ λgi (xρ ), ∀ i ∈ I(¯ x). Therefore, for i ∈ I(¯ x), gi (xρ ) > 0, which implies X
gi (xρ ) =
i∈I(¯ x)
X
i∈I(¯ x)
max{0, gi (xρ )} ≤
m X
max{0, gi (xρ )},
i=1
thereby leading to the fact that F(xρ ) ≤ fρ0 (xρ ).
(10.22)
To prove the result, it is sufficient to show that F(¯ x) < F(xρ ). But first we will show that F(ˆ x) < F(¯ x). Consider F(ˆ x) = f (ˆ x) + ρ0 Because gi (ˆ x) < 0, i = 1, 2, . . . , m, by the given hypothesis implies
P
X
gi (ˆ x).
i∈I(¯ x)
x) i∈I(¯ x) gi (ˆ
≤ maxi=1,...,m gi (ˆ x), which
F(ˆ x) ≤ f (ˆ x) + ρ0 γ = inf f (x) − 1 < f (¯ x) = F(¯ x). x∈C
(10.23)
The convexity of F along with (10.23) leads to F(¯ x) < λF(xρ ) + (1 − λ)F(¯ x), which implies F(¯ x) < F(xρ ). Therefore, by (10.21) and (10.22), f (¯ x) < fρ0 (xρ ). By the definition of fρ , fρ0 (x) ≤ fρ (x) for every ρ = (ρ1 , ρ2 , . . . , ρm ) with ρi ≥ ρ0 , i = 1, 2, . . . , m, which along with the fact that xρ is an ε-solution of (CP )ρ implies f (¯ x) < fρ (xρ ) ≤ fρ (x) + ε, ∀ x ∈ Rn . For x ∈ C, fρ (x) = f (x), which reduces the above condition to f (¯ x) ≤ f (x) + ε, ∀ x ∈ C, thereby implying that x ¯ is an ε-solution of (CP ).
© 2012 by Taylor & Francis Group, LLC
354
Approximate Optimality Conditions
For a better understanding of the above result, let us consider the following example. Consider inf ex
x ≤ 0.
subject to
Obviously the Slater constraint qualification holds. Consider xˆ = −1 and then ρ0 = e−1 + 1. For ε = 2, xρ = 1/2 > 0 is an ε-solution for every ρ ≥ ρ0 . Here, x ¯ = 0 ∈ [−1, 1/2] is an ε-solution for the constrained problem. Observe that one requires the fact that xρ is an ε-solution of (CP )ρ only to establish that x ¯ is an ε-solution of (CP ). So from the proof of Theorem 10.19 it can also be worked out that under the Slater constraint qualification, corresponding to xρ 6∈ C, there exists x ¯ ∈ C such that f (¯ x) = fρ (¯ x) < fρ (xρ ) for every ρ ≥ ρ0 , where ρ0 is the same as in the previous theorem. As a matter of fact, because the set C is closed convex, one can always find such an x ¯ on the boundary of C. As xρ 6∈ C is arbitrarily chosen, then from the above inequality it is obvious that inf f (x) ≤ fρ (x), ∀ x ∈ / C.
x∈C
Also, for any x ∈ C, f (x) = fρ (x), which along with the above condition implies inf fρ (x) = inf f (x) ≤ infn fρ (x).
x∈C
x∈C
x∈R
The reverse inequality holds trivially. Therefore, inf f (x) = infn fρ (x).
x∈C
(10.24)
x∈R
This leads to the fact that every ε-solution of (CP ) is also an ε-solution of the penalized unconstrained problem. Next we derive the approximate optimality conditions for (CP ) using the penalized unconstrained problem. Theorem 10.20 Consider the convex programming problem (CP ) with C defined by (3.1). Assume that the Slater constraint qualification is satisfied. Let ε ≥ 0. Then x ¯ is an ε-solution of (CP ) if and only if there exist ε¯0 ≥ 0, ε¯i ≥ 0 ¯ i ≥ 0, i = 1, . . . , m, such that and λ 0 ∈ ∂ε¯0 f (¯ x) +
m X i=1
¯ i gi )(¯ ∂ε¯i (λ x)
and
m X i=0
ε¯i − ε =
m X i=1
¯ i gi (¯ λ x) ≤ 0.
Proof. As x ¯ ∈ C is an ε-solution of (CP ), from the above discussion it is also an ε-solution of the penalized unconstrained problem for ρ = (ρ1 , ρ2 , . . . , ρm ) with ρi ≥ ρ0 > 0, where ρ0 is defined in Theorem 10.19. Therefore, by the
© 2012 by Taylor & Francis Group, LLC
10.6 Ekeland’s Variational Principle Approach
355
approximate optimality condition, Theorem 10.4, for the unconstrained penalized problem (CP )ρ , 0 ∈ ∂ε fρ (¯ x). As dom f = dom gi = Rn , applying the ε-subdifferential P Sum Rule, Theom rem 2.115, there exist εi ≥ 0, i = 0, 1, . . . , m, satisfying i=0 εi = ε such that 0 ∈ ∂ε0 f (¯ x) +
m X
∂εi (max{0, ρi gi (.)})(¯ x).
i=1
By the ε-subdifferential Max-Function Rule, Remark 2.119, there exist 0 ≤ λi ≤ 1 and ε¯i ≥ 0 satisfying εi = ε¯i + max{0, ρi gi (¯ x)} − λi ρi gi (¯ x) = ε¯i − λi ρi gi (¯ x)
(10.25)
for every i = 1, 2, . . . , m such that 0 ∈ ∂ε¯0 f (¯ x) +
m X
¯ i gi )(¯ ∂ε¯i (λ x),
i=1
¯ i = ρi λi ≥ 0, i = 1, 2, . . . , m. The condition (10.25) where ε¯0 =P ε0 ≥ 0 and λ m along with i=0 εi = ε implies that m X i=0
ε¯i −
m X
¯ i gi (¯ λ x) = ε,
i=1
thereby leading to the requisite conditions. The converse can be proved in a similar fashion, as done in Theorem 10.9 with εs = 0. Note that the conditions obtained in the above theorem are the same as those in Theorem 10.11.
10.6
Ekeland’s Variational Principle Approach
In all the earlier sections, we concentrated on the ε-solutions. If x ¯ is an ε-solution of (CP ), then by the Ekeland’s variational principle, Theorem 2.113, mentioned in Chapter 2 there exists x ˆ ∈ C such that √ f (ˆ x) ≤ f (x) + εkx − x ˆk, ∀ x ∈ C. Any x ˆ satisfying the above condition is a quasi ε-solution of (CP ). Observe that we are emphasizing only one of the conditions of the Ekeland’s variational
© 2012 by Taylor & Francis Group, LLC
356
Approximate Optimality Conditions
principle and the other two need not be satisfied. In this section we deal with the quasi ε-solution and derive the approximate optimality conditions for this class of approximate solutions for (CP ). But before doing so, let us illustrate by an example that a quasi ε-solution may or may not be an ε-solution. Consider the problem inf
1 x
subject to
x > 0.
1 Note that the infimum of the problem is zero, which is not attained. For ε = , 4 it is easy to note that xε = 4 is an ε-solution. Now x ¯ > 0 is a quasi ε-solution if 1 1 1 ≤ + |x − x ¯|, ∀ x > 0. x ¯ x 2 Observe that x ¯ = 4.5 is a quasi ε-solution that is also an ε-solution satisfying all the conditions of the Ekeland’s variational principle, while x¯ = 3.5 is a quasi ε-solution that is not ε-solution. Also, it does not satisfy the condition 1 1 ≤ . These are not the only quasi ε-solutions. Even points that satisfy x ¯ xε only the unique minimizer condition of the variational principle, like x¯ = 3, are also the quasi ε-solution to the above problem. Now we move on to discuss the approximate optimality conditions for the quasi ε-solutions. Theorem 10.21 Consider the convex programming problem (CP ) with C given by (3.1). Let ε ≥ 0 be given. Assume that the Slater constraint qualification holds. Then x ¯ is a quasi ε-solution of (CP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that 0 ∈ ∂f (¯ x) +
m X
λi ∂gi (¯ x) +
√
εB
and
λi gi (¯ x) = 0, i = 1, 2, . . . , m.
i=1
Proof. A quasi ε-solution x ¯ of (CP ) can be considered a minimizer of the convex programming problem √ ¯k subject to x ∈ C, min f (x) + εkx − x where C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}. By the KKT optimality condition, Theorem 3.7, x ¯ is a minimizer of the above problem if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that 0 ∈ ∂(f +
√
εk. − x ¯k)(¯ x) +
m X
λi ∂gi (¯ x).
i=1
As dom f = dom k. − x ¯k = Rn , invoking the Sum Rule, Theorem 2.91, along
© 2012 by Taylor & Francis Group, LLC
10.6 Ekeland’s Variational Principle Approach
357
with the fact that ∂k. − x ¯k = B, the above inclusion becomes 0 ∈ ∂f (¯ x) +
m X √ εB + λi ∂gi (¯ x), i=1
along with λi gi (¯ x) = 0, i = 1, 2, . . . , m, thereby yielding the requisite conditions. Conversely, by the approximate optimality condition, there exist ξ0 ∈ ∂f (¯ x), ξi ∈ ∂gi (¯ x), i = 1, 2, . . . , m, and b ∈ B such that 0 = ξ0 +
m X
λi ξi +
√
εb.
(10.26)
i=1
By Definition 2.77 of the subdifferential, f (x) − f (¯ x) ≥ hξ0 , x − x ¯i, ∀ x ∈ Rn , gi (x) − gi (x) ≥ hξi , x − x ¯i, ∀ x ∈ Rn , i = 1, 2, . . . , m,
(10.27) (10.28)
and by the Cauchy–Schwartz inequality, Proposition 1.1, kbk kx − x ¯k ≥ hb, x − x ¯i, ∀ x ∈ Rn .
(10.29)
Combining the inequalities (10.27), (10.28), and (10.29) along with (10.26) implies f (x) − f (¯ x) +
m X i=1
λi gi (x) −
m X
λi gi (¯ x) +
i=1
√
εkbk kx − x ¯k ≥ 0, ∀ x ∈ Rn .
For any x feasible to (CP ), gi (x) ≤ 0, i = 1, 2, . . . , m, which along with the complementary slackness condition and the fact that λi ≥ 0, i = 1, 2, . . . , m, reduces the above inequality to √ ¯k ≥ 0, ∀ x ∈ C. f (x) − f (¯ x) + εkbk kx − x As b ∈ B, kbk ≤ 1, thereby leading to √ ¯k ≥ 0, ∀ x ∈ C f (x) − f (¯ x) + εkx − x and thus, establishing the requisite result.
Observe that the above theorem provides a necessary as well as sufficient characterization to the quasi ε-solution. Here the approximate optimality condition is in terms of B and the subdifferentials, unlike the earlier results of this chapter where the approximate optimality conditions were expressed in terms of the ε-subdifferentials. Also, here we obtain the standard complementary slackness condition instead of the ε-complementary slackness or relaxed ε-complementary slackness conditions. Results similar to the ε-saddle point can also be worked out for quasi ε-saddle points. For more details, one can look into Dutta [37].
© 2012 by Taylor & Francis Group, LLC
358
10.7
Approximate Optimality Conditions
Modified ε-KKT Conditions
In all discussions regarding the KKT optimality conditions in the earlier chapters, it was observed that under some constraint qualification, the optimality conditions are established at the point of minimizer, that is, the KKT optimality conditions are nothing but point conditions. Due to this very reason, the KKT conditions have not been widely incorporated in the optimization algorithm design but only used as stopping criteria. However, if one could find the direction of the minima using the deviations from the KKT conditions, it could be useful from an algorithmic point of view. Work has recently been done in this respect by Dutta, Deb, Arora, and Tulshyan [39]. They introduced a new notion of modified ε-KKT point and used it to study the convergence of the sequences of modified ε-KKT points to the minima of the convex programming problem (CP ). Below we define this new concept, which is again motivated the Ekeland’s variational principle. Definition 10.22 A feasible point x ¯ of (CP ) is said to be a modified √ ε-KKT point for a given ε > 0 if there exists x ˜ ∈ Rn satisfying k˜ x−x ¯k ≤ ε and there exist ξ˜0 ∈ ∂f (˜ x), ξ˜i ∈ ∂gi (˜ x) and λi ≥ 0, i = 1, 2, . . . , m, such that kξ˜0 +
m X i=1
λi ξ˜i k ≤
√ ε
and
ε+
m X i=1
λi gi (¯ x) ≥ 0.
Observe that in the ε-KKT condition, the subdifferentials are calculated at x), whereas the relaxed ε-complementary slackness condition some x ˜ ∈ B√ε (¯ is satisfied at x ¯ itself. Before moving on to the satability part, we try to relate the already discussed ε-solution with the modified ε-KKT point. Theorem 10.23 Consider the convex programming problem (CP ) with C given by (3.1). Assume that the Slater constraint qualification holds and let x ¯ be an ε-solution of (CP ). Then x ¯ is a modified ε-KKT point. Proof. Because x ¯ is an ε-solution of (CP ), by Theorem 10.13, there exist λi ≥ 0, i = 1, 2, . . . , m, such that x ¯ is also an ε-saddle point along with ε+
m X i=1
λi gi (¯ x) ≥ 0.
(10.30)
As x ¯ is an ε-saddle point, L(¯ x, λ) ≤ L(x, λ) + ε, ∀ x ∈ Rn , n which implies x ¯ is an ε-solution of L(., λ) vari√ over R . Applying Ekeland’s n ational principle, Theorem 2.113, for ε, there exists x ˜ ∈ R satisfying √ ˜ is a minimizer of the problem k˜ x−x ¯k ≤ ε such that x √ ˜k subject to x ∈ Rn . min L(x, λ) + εkx − x
© 2012 by Taylor & Francis Group, LLC
10.7 Modified ε-KKT Conditions
359
By the unconstrained optimality condition, Theorem 2.89, √ ˆk)(˜ 0 ∈ ∂(L(., λ) + εk. − x x). As the functions dom f = dom gi = Rn , i = 1, 2, . . . , m, applying the Sum Rule, Theorem 2.91, 0 ∈ ∂f (˜ x) +
m X
λi ∂gi (˜ x) +
√ εB,
i=1
which implies there exist ξ˜0 ∈ ∂f (˜ x), ξ˜i ∈ ∂gi (˜ x), i = 1, 2, . . . , m, and b ∈ B such that 0 = ξ˜0 +
m X
λi ξ˜i +
√ εb,
i=1
thereby leading to kξ˜0 +
m X i=1
λi ξ˜i k ≤
√ ε,
which along with the condition (10.30) implies that x¯ is a modified ε-KKT point as desired. In the case of the exact penalization approach, from Theorem 10.16 we have that every convergent sequence of ε-solutions of a sequence of penalized problems converges to an ε-solution of (CP ). Now is it possible to establish such a result by studying the sequence of modified ε-KKT points and the answer is yes, as shown in the following theorem. Theorem 10.24 Consider the convex programming problem (CP ) with C given by (3.1). Assume that the Slater constraint qualification holds and let {εk } ⊂ R+ such that εk ↓ 0 as k → +∞. For every k, let xk be a modified εk -KKT point of (CP ) such that xk → x ¯ as k → +∞. Then x ¯ is a point of minimizer of (CP ). Proof. As for every k, xk is a modified εk -KKT point of (CP ), there exists √ xk ), ξik ∈ ∂gi (˜ xk ), x ˜k ∈ Rn satisfying k˜ xk − xk k ≤ ε and there exist ξ0k ∈ ∂f (˜ k and λi ≥ 0, i = 1, 2, . . . , m, such that kξ0k +
m X i=1
λki ξik k ≤
√ εk
and
εk +
m X i=1
λki gi (xk ) ≥ 0.
(10.31)
k We claim that {λk } ⊂ Rm + is a bounded sequence. Suppose that {λ } is an k λ unbounded sequence. Define a bounded sequence γ k = with kγ k k = 1. kλk k
© 2012 by Taylor & Francis Group, LLC
360
Approximate Optimality Conditions
Because {γ k } is a bounded sequence, by the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. Without loss of generality, assume that γ k → γ with kγk = 1. Observe that k˜ xk − x ¯k
≤
k˜ xk − xk k + kxk − x ¯k √ εk + kxk − x ¯k. ≤
By the given hypothesis, as k → +∞, εk ↓ 0 and xk → x ¯, which implies x ˜k → x ¯. Now dividing both the conditions of (10.31) throughout by kλk k yields k
√ m εk 1 k X k k + ξ γ ξ k ≤ kλk k 0 i=1 i i kλk k
m X
and
i=1
γik gi (xk ) ≥ −
εk . kλk k
By Proposition 2.83, f and gi , i = 1, 2, . . . , m, have compact subdifferentials. Thus, {ξ0k } and {ξik }, i = 1, 2, . . . , m, are bounded sequences and hence by the Bolzano–Weierstrass Theorem, Proposition 1.3, have a convergent subsequence. Without loss of generality, let ξ0k → ξ0 and ξik → ξi , i = 1, 2, . . . , m. By the Closed Graph Theorem, Theorem 2.84, ξ0 ∈ ∂f (¯ x) and ξi ∈ ∂gi (¯ x), i = 1, 2, . . . , m. Therefore, as k → +∞, √ εk 1 k εk ξ → 0, → 0 and → 0, kλk k 0 kλk k kλk k Pm Pm Pm which implies k i=1 γi ξi k ≤ 0, that is, i=1 γi ξi = 0 and i=1 γi gi (¯ x) ≥ 0. By Definition 2.77 of the subdifferential, gi (x) − gi (¯ x) ≥ hξi , x − x ¯i, ∀ x ∈ Rn , i = 1, 2, . . . , m, which yields m X
γi gi (x) ≥
i=1
m X i=1
γi gi (¯ x) ≥ 0, ∀ x ∈ Rn ,
thereby contradicting the existence of a point x ˆ satisfying gi (ˆ x) < 0, i = 1, 2, . . . , m by the Slater constraint qualification. Therefore, the sequence {λk } is a bounded sequence and hence by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent subsequence. Without loss of generality, let λki → λi , i = 1, 2, . . . , m. Taking the limit as k → +∞ in (10.31) yields kξ0 +
m X i=1
λi ξi k ≤ 0
and
m X i=1
The norm condition in (10.32) implies that 0 = ξ0 +
m X i=1
© 2012 by Taylor & Francis Group, LLC
λi ξi ,
λi gi (¯ x) ≥ 0.
(10.32)
10.8 Duality-Based Approach to ε-Optimality
361
thereby leading to the optimality condition 0 ∈ ∂f (¯ x) +
m X
λi ∂gi (¯ x).
i=1
As {xk } is a modified εk -KKT point, it is a feasible point of (CP ), that is, gi (xk ) ≤ 0, i = 1, 2, . . . , m, which implies gi (¯ x) ≤ 0, i = 1, 2, . . . , m, as k → +∞. This along with the condition in (10.32) leads to the complementary slackness condition m X
λi gi (¯ x) = 0.
i=1
Hence, x ¯ satisfies the standard KKT optimality conditions. As (CP ) is a convex programming problem, by the sufficient optimality conditions, x¯ is a point of minimizer of (CP ), thereby establishing the desired result. Theorems 10.23 and 10.24 can be combined together and stated as follows. Theorem 10.25 Consider the convex programming problem (CP ) with C given by (3.1). Assume that the Slater constraint qualification holds. Let {xk } be a sequence of the εk -solution of (CP ) such that xk → x ¯ and εk ↓ 0 as k → +∞. Then x ¯ is a point of minimizer of (CP ).
10.8
Duality-Based Approach to ε-Optimality
In this chapter, in all the results on approximate optimality conditions, we have assumed the Slater constraint qualification. But what if neither the Slater nor any other constraint qualification is satisfied? Work has been done in this respect by Yokoyama [113] using the exact penalization approach. In that work, he replaced the assumption of the Slater constraint qualification by relating the penalty parameter to the ε-maximum solution of the dual problem associated with (CP). The results relate the ε-solutions of the given problem (CP), of its dual problem, and of the penalized problem. Here we will discuss some of his results in comparison to the ones derived in Section 10.5. For that purpose, we associate with (CP) the dual problem

sup w(λ)   subject to   λ ∈ R^m,    (DP)

where w(λ) = inf_{x∈R^n} L(x, λ) and L(x, λ) is the Lagrange function given by

L(x, λ) = f(x) + Σ_{i=1}^m λ_i g_i(x)   if λ_i ≥ 0, i = 1, 2, . . . , m,
L(x, λ) = −∞   otherwise.
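As a small numerical illustration of (DP) and of the duality gap used below (this sketch is not from the text; the one-dimensional problem data are chosen only for illustration), consider f(x) = x² with the single constraint g_1(x) = x + 1 ≤ 0. The Python sketch approximates w(λ) by a grid search and compares sup w(λ) with inf_{x∈C} f(x).

import numpy as np

# Hypothetical one-dimensional instance of (CP): min f(x) subject to g1(x) <= 0.
f = lambda x: x**2
g1 = lambda x: x + 1.0                       # feasible set C = {x : x <= -1}

xs = np.linspace(-5.0, 5.0, 20001)           # grid standing in for R^n (truncated)
feasible = xs[g1(xs) <= 0.0]

def w(lam):
    # Dual function w(lam) = inf_x { f(x) + lam * g1(x) }, approximated on the grid.
    return np.min(f(xs) + lam * g1(xs))

lams = np.linspace(0.0, 10.0, 2001)          # only lam >= 0 matters; otherwise L = -inf
primal_val = np.min(f(feasible))             # inf_{x in C} f(x) = 1, attained at x = -1
dual_val = max(w(lam) for lam in lams)       # sup_lam w(lam), attained near lam = 2
theta = primal_val - dual_val                # duality gap; here theta is (numerically) 0
print(primal_val, dual_val, theta)

Since the Slater constraint qualification holds for these data (x = −2 is strictly feasible), the printed gap θ is zero, which is the situation contrasted below with the general case θ ≥ 0.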
Denote the duality gap by θ = inf_{x∈C} f(x) − sup_{λ∈R^m} w(λ). Next we present the theorem relating the ε-solution of (CP)_ρ with the almost ε-solution of (CP) under the assumption of the existence of an ε-maximum solution of (DP). Recall the penalized problem

min f_ρ(x)   subject to   x ∈ R^n,    (CP)_ρ

where f_ρ(x) = f(x) + Σ_{i=1}^m ρ_i max{0, g_i(x)} and ρ = (ρ_1, ρ_2, . . . , ρ_m) with ρ_i > 0, i = 1, 2, . . . , m.

Theorem 10.26 Consider the convex programming problem (CP) with C given by (3.1) and its associated dual problem (DP). Then for ρ satisfying

ρ ≥ 3 + max_{i=1,...,m} λ̄_i + θ/ε,

where λ̄ = (λ̄_1, λ̄_2, . . . , λ̄_m) is an ε-maximum solution of (DP), every x̄ that is an ε-solution of (CP)_ρ is also an almost ε-solution of (CP).

Proof. Consider an ε-solution x̂ ∈ C of (CP), that is,

f(x̂) ≤ inf_{x∈C} f(x) + ε.
As x̄ is an ε-solution of (CP)_ρ, in particular,

f_ρ(x̄) ≤ f_ρ(x̂) + ε = f(x̂) + ε,

which implies that

f(x̄) + ρ Σ_{i=1}^m max{0, g_i(x̄)} ≤ inf_{x∈C} f(x) + 2ε.    (10.33)

By the definition of the duality gap θ,

inf_{x∈C} f(x) = sup_{λ∈R^m} w(λ) + θ.    (10.34)

For an ε-maximum solution λ̄ of the dual problem (DP),

sup_{λ∈R^m} w(λ) ≤ w(λ̄) + ε ≤ f(x̄) + Σ_{i=1}^m λ̄_i g_i(x̄) + ε.    (10.35)

Therefore, using the conditions (10.34) and (10.35), (10.33) becomes

f(x̄) + ρ Σ_{i=1}^m max{0, g_i(x̄)} ≤ f(x̄) + Σ_{i=1}^m λ̄_i g_i(x̄) + 3ε + θ,

that is,

ρ Σ_{i=1}^m max{0, g_i(x̄)} ≤ Σ_{i=1}^m λ̄_i g_i(x̄) + 3ε + θ.

Define the index set I^> = {i ∈ {1, 2, . . . , m} : g_i(x̄) > 0}. Thus

ρ Σ_{i∈I^>} g_i(x̄) = ρ Σ_{i=1}^m max{0, g_i(x̄)} ≤ Σ_{i=1}^m λ̄_i g_i(x̄) + 3ε + θ ≤ Σ_{i∈I^>} λ̄_i g_i(x̄) + 3ε + θ,

which implies

(ρ − max_{i=1,...,m} λ̄_i) Σ_{i∈I^>} g_i(x̄) ≤ Σ_{i∈I^>} (ρ − λ̄_i) g_i(x̄) ≤ 3ε + θ.

From the above condition and the given hypothesis on ρ,

Σ_{i∈I^>} g_i(x̄) ≤ (3ε + θ) / (ρ − max_{i=1,...,m} λ̄_i) ≤ ε,

thereby implying that x̄ ∈ C_ε = {x ∈ R^n : g_i(x) ≤ ε, i = 1, 2, . . . , m}. Also, f(x̄) ≤ f_ρ(x̄). As x̄ is an ε-solution of (CP)_ρ,

f_ρ(x̄) ≤ f_ρ(x) + ε, ∀ x ∈ R^n,

which along with the fact that f(x) = f_ρ(x) for every x ∈ C leads to

f(x̄) ≤ f(x) + ε, ∀ x ∈ C,

thus implying that x̄ is an almost ε-solution of (CP).
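The following short Python sketch (an illustration under assumed data, not part of the original text) replays Theorem 10.26 on the one-dimensional problem used earlier: it takes λ̄ = 2 as an ε-maximum solution of (DP), sets ρ = 3 + λ̄ + θ/ε, locates an ε-solution of (CP)_ρ by grid search, and checks that it is an almost ε-solution of (CP).

import numpy as np

f = lambda x: x**2
g1 = lambda x: x + 1.0                        # (CP): min f subject to g1 <= 0
xs = np.linspace(-5.0, 5.0, 20001)

eps = 0.01
theta = 0.0                                   # no duality gap for this instance
lam_bar = 2.0                                 # an eps-maximum solution of (DP)
rho = 3.0 + lam_bar + theta / eps             # penalty bound of Theorem 10.26

f_rho = f(xs) + rho * np.maximum(0.0, g1(xs))
x_pen = xs[np.argmin(f_rho)]                  # grid minimizer, hence an eps-solution of (CP)_rho

inf_C = np.min(f(xs[g1(xs) <= 0.0]))
almost = (g1(x_pen) <= eps) and (f(x_pen) <= inf_C + eps)
print(x_pen, almost)                          # expect x_pen close to -1 and almost == True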
This result is the same as Theorem 10.17 except for the bound on the penalty parameter. Recall from Theorem 10.17 that

ρ ≥ (α + ε)/ε,   where α = inf_{x∈C} f(x) − inf_{x∈R^n} f(x).

Also, in that result the Slater constraint qualification was not assumed. Observe that both results are similar but the parameter bounds are different. Under the Slater constraint qualification, it is known that strong duality holds, thus the duality gap θ = 0 and the dual problem (DP) is solvable. Consequently, under the Slater constraint qualification, the bound on the penalty parameter becomes

ρ ≥ 3 + max_{i=1,...,m} λ̄_i,

where λ̄ = (λ̄_1, λ̄_2, . . . , λ̄_m) is a maximizer of (DP). Here we were discussing the existence of an almost ε-solution of (CP), given an ε-solution of (CP)_ρ. From the discussion in Section 10.5, it is seen that under the Slater constraint qualification and for ρ ≥ ρ_0 with ρ_0 given in Theorem 10.19,

inf_{x∈C} f(x) = inf_{x∈R^n} f_ρ(x),

thereby implying that every x̄ that is an ε-solution of (CP) is also an ε-solution of (CP)_ρ. In the absence of any constraint qualification, Yokoyama [113] obtained that x̄ is a (2ε + θ)-solution of (CP)_ρ, as presented below.
Theorem 10.27 Consider the convex programming problem (CP) with C given by (3.1) and its associated dual problem (DP). Then for ρ satisfying

ρ ≥ max_{i=1,...,m} λ̄_i,

where λ̄ = (λ̄_1, λ̄_2, . . . , λ̄_m) is an ε-maximum solution of (DP), every x̄ that is an ε-solution of (CP) is also a (2ε + θ)-solution of (CP)_ρ.

Proof. As x̄ is an ε-solution of (CP), x̄ ∈ C, which implies

f_ρ(x̄) = f(x̄) ≤ inf_{x∈C} f(x) + ε.

As λ̄ is an ε-maximum solution of (DP), working along the lines of Theorem 10.26, the above condition becomes

f_ρ(x̄) ≤ f(x) + Σ_{i=1}^m λ̄_i g_i(x) + 2ε + θ, ∀ x ∈ R^n.

Using the hypothesis on ρ, the above inequality leads to

f_ρ(x̄) ≤ f(x) + ρ Σ_{i=1}^m max{0, g_i(x)} + 2ε + θ = f_ρ(x) + 2ε + θ, ∀ x ∈ R^n,

thereby implying that x̄ is a (2ε + θ)-solution of (CP)_ρ.
It was mentioned by Yokoyama [113] that in the presence of the Slater constraint qualification and with λ̄ taken as some optimal Lagrange multiplier, every x̄ that is an ε-solution of (CP) is also an ε-solution of (CP)_ρ. In his work, Yokoyama also derived the necessary approximate optimality conditions established in this chapter in the absence of any constraint qualification. The sufficiency could be established only under the assumption of the Slater constraint qualification.
Chapter 11 Convex Semi-Infinite Optimization
11.1 Introduction
In all the preceding chapters we considered the convex programming problem (CP) with the feasible set C of the form (3.1), that is,

C = {x ∈ R^n : g_i(x) ≤ 0, i = 1, 2, . . . , m},

where g_i : R^n → R, i = 1, 2, . . . , m, are convex functions. Observe that the problem involves only a finite number of constraints. In situations where the number of constraints is infinite, the problem extends to the class of semi-infinite programming problems. Such problems arise in many physical and social science models where it is necessary to impose constraints on the state or the control of the system over a period of time. For examples of real-life scenarios where semi-infinite programming problems are involved, readers may refer to Hettich and Kortanek [57] and the references therein. We consider the following convex semi-infinite programming problem,

inf f(x)   subject to   g(x, i) ≤ 0, i ∈ I,    (SIP)

where f, g(., i) : R^n → R, i ∈ I, are convex functions and the index set I ⊂ R^m is infinite. The term "semi-infinite programming" is derived from the fact that the decision variable x is finite dimensional while the index set I is infinite. But before moving on with the derivation of the KKT optimality conditions for (SIP), we present some notation that will be used in subsequent sections. Denote the feasible set of (SIP) by C_I, that is,

C_I = {x ∈ R^n : g(x, i) ≤ 0, i ∈ I}.

Let R^I be the product space of λ = (λ_i ∈ R : i ∈ I) and

R^[I] = {λ ∈ R^I : λ_i ≠ 0 for finitely many i ∈ I},

while the positive cone in R^[I], denoted R^[I]_+, is defined as

R^[I]_+ = {λ ∈ R^[I] : λ_i ≥ 0, ∀ i ∈ I}.

For a given z ∈ R^I and λ ∈ R^[I], define the supporting set of λ as

supp λ = {i ∈ I : λ_i ≠ 0},   ⟨λ, z⟩ = Σ_{i∈I} λ_i z_i = Σ_{i∈supp λ} λ_i z_i.
With these notations, we now move on to study the various approaches to obtain the KKT optimality conditions for (SIP ).
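A finitely supported multiplier λ ∈ R^[I] is conveniently represented in computations by storing only its support. The toy Python sketch below (an illustration, not from the text; the index values are hypothetical) models λ as a dictionary keyed by indices i ∈ I and evaluates the pairing ⟨λ, z⟩ over supp λ.

# Hypothetical finitely supported multiplier lambda in R^[I], with I = [0, 1] a subset of R.
lam = {0.25: 1.5, 0.75: 2.0}         # supp(lam) = {0.25, 0.75}; all other lam_i = 0

def pairing(lam, z):
    # <lam, z> = sum over supp(lam) of lam_i * z(i), with z given as a function on I.
    return sum(lam_i * z(i) for i, lam_i in lam.items())

z = lambda i: i**2                   # a sample element z = (z_i)_{i in I}
print(pairing(lam, z))               # 1.5*0.0625 + 2.0*0.5625 = 1.21875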
11.2 Sup-Function Approach
A possible approach to solving (SIP) is to associate with it a problem having a finite number of constraints, that is, the reduced form of (SIP)

inf f(x)   subject to   g(x, i) ≤ 0, i ∈ Ĩ,    (SĨP)

where Ĩ ⊂ I is finite and f and g(., i), i ∈ Ĩ, are as in (SIP) such that the optimal values of (SIP) and of the reduced problem (SĨP) coincide. Then (SĨP) is said to be the equivalent reduced problem of (SIP). One way to reduce (SIP) to an equivalent (SĨP) is to replace the infinite inequality constraints by a single constraint,

g̃(x) = sup_{i∈I} g(x, i),

where g̃ : R^n → R̄ is a convex function by Proposition 2.53 (iii). Therefore, the reduced problem is

inf f(x)   subject to   g̃(x) ≤ 0.    (SĨP_sup)

Such a formulation was studied by Pshenichnyi [96], where g(., i), for every i ∈ I, were taken to be convex differentiable functions. Observe that (SĨP_sup) is of the form (CP) studied in Chapter 3. It was seen that under the Slater constraint qualification, the standard KKT optimality conditions for (CP) can be obtained. Therefore, to apply Theorem 3.7, (SĨP_sup) should satisfy the Slater constraint qualification. But this problem is equivalent to (SIP), for which we introduce the following Slater constraint qualification for (SIP).

Definition 11.1 The Slater constraint qualification for (SIP) holds if

(i) I ⊂ R^m is a compact set,

(ii) g(x, i) is a continuous function of (x, i) ∈ R^n × I,

(iii) there exists x̂ ∈ R^n such that g(x̂, i) < 0 for every i ∈ I.
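As a concrete (purely illustrative, not from the text) instance of the sup-function reduction, take f(x) = (x − 2)², g(x, i) = i·x − 1 and I = [0, 1], so that C_I = {x : x ≤ 1} and x̂ = 0 is a Slater point. The Python sketch below approximates g̃ by maximizing over a grid of the compact index set I and solves the reduced problem (SĨP_sup) by grid search.

import numpy as np

f = lambda x: (x - 2.0)**2
g = lambda x, i: i * x - 1.0                     # g(., i) is convex (affine) for each i in I = [0, 1]

I_grid = np.linspace(0.0, 1.0, 101)              # compact index set, discretized
g_sup = lambda x: max(g(x, i) for i in I_grid)   # sup-function g~(x)

xs = np.linspace(-3.0, 3.0, 6001)
vals = [f(x) if g_sup(x) <= 0.0 else np.inf for x in xs]
x_bar = xs[int(np.argmin(vals))]
print(x_bar, g_sup(x_bar))                       # expect x_bar close to 1 with g~(x_bar) close to 0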
Observe that in the Slater constraint qualification for (CP), only condition (iii) is considered. Here the additional conditions (i) and (ii) ensure that the supremum is attained over I, which holds trivially in the finite index set scenario. We now present the KKT optimality condition for (SIP).

Theorem 11.2 Consider the convex semi-infinite programming problem (SIP). Assume that the Slater constraint qualification for (SIP) holds. Then x̄ ∈ R^n is a point of minimizer of (SIP) if and only if there exists λ ∈ R^[I(x̄)]_+ such that

0 ∈ ∂f(x̄) + Σ_{i∈supp λ} λ_i ∂g(x̄, i),

where I(x̄) = {i ∈ I : g(x̄, i) = 0} denotes the active index set and the subdifferential ∂g(x̄, i) is with respect to x.

Proof. As already observed, (SIP) is equivalent to (SĨP_sup) and thus x̄ is also a point of minimizer of (SĨP_sup). As the Slater constraint qualification for (SIP) holds, by conditions (i) and (ii) the supremum is attained over I. Therefore, by condition (iii) of the Slater constraint qualification for (SIP), there exists x̂ ∈ R^n such that

g̃(x̂) = sup_{i∈I} g(x̂, i) < 0,

which implies that (SĨP_sup) satisfies the Slater constraint qualification. Invoking Theorem 3.7, there exists λ′ ≥ 0 such that

0 ∈ ∂f(x̄) + λ′ ∂g̃(x̄)   and   λ′ g̃(x̄) = 0.    (11.1)

Now we consider two cases depending on g̃(x̄).

(i) g̃(x̄) < 0: By the complementary slackness condition, λ′ = 0. Also, because g̃(x̄) < 0, g(x̄, i) < 0 for every i ∈ I, which implies the active index set I(x̄) is empty. Thus the optimality condition (11.1) reduces to

0 ∈ ∂f(x̄),

and the KKT optimality condition for (SIP) holds with λ = 0 ∈ R^[I]_+.

(ii) g̃(x̄) = 0: By the complementary slackness condition, λ′ ≥ 0. Define the supremum set as

Î(x̄) = {i ∈ I : g(x̄, i) = g̃(x̄)} = {i ∈ I : g(x̄, i) = 0},

which implies that Î(x̄) = I(x̄). By the conditions (i) and (ii) of the Slater constraint qualification for (SIP), Î(x̄), and hence I(x̄), is nonempty. By the Valadier formula, Theorem 2.97, the optimality condition becomes

0 ∈ ∂f(x̄) + Σ_{i∈I(x̄)} λ_i ∂g(x̄, i),
where λ_i = λ′ λ̄_i ≥ 0, i ∈ I(x̄), with λ̄ ∈ R^[I(x̄)]_+ satisfying Σ_{i∈supp λ̄} λ̄_i = 1. As λ′ ≥ 0, λ ∈ R^[I(x̄)]_+ and thus the preceding optimality condition can be expressed as

0 ∈ ∂f(x̄) + Σ_{i∈supp λ} λ_i ∂g(x̄, i).

Thus, the KKT optimality condition is obtained for (SIP).

Conversely, suppose that the optimality condition holds, which implies that there exist ξ ∈ ∂f(x̄) and ξ_i ∈ ∂g(x̄, i) such that

0 = ξ + Σ_{i∈supp λ} λ_i ξ_i,    (11.2)

where λ ∈ R^[I(x̄)]_+. By Definition 2.77 of the subdifferential, for every x ∈ R^n,

f(x) ≥ f(x̄) + ⟨ξ, x − x̄⟩,
g(x, i) ≥ g(x̄, i) + ⟨ξ_i, x − x̄⟩, i ∈ supp λ,

which along with the condition (11.2) implies that

f(x) + Σ_{i∈supp λ} λ_i g(x, i) ≥ f(x̄) + Σ_{i∈supp λ} λ_i g(x̄, i), ∀ x ∈ R^n.

The above inequality along with the fact that g(x̄, i) = 0, i ∈ I(x̄), leads to

f(x) + Σ_{i∈supp λ} λ_i g(x, i) ≥ f(x̄), ∀ x ∈ R^n.

In particular, for x ∈ C_I, that is, g(x, i) ≤ 0, i ∈ I, the above condition reduces to

f(x) ≥ f(x̄), ∀ x ∈ C_I,

thereby implying that x̄ is the minimizer of (SIP).
Reduction Approach
As already mentioned in the preceding section, the reduction approach is one possible method to establish the KKT optimality condition for (SIP ). The sup-function approach was one such reduction technique. Another way to forg ) is to use the approach by Ben-Tal, Rosinger, and mulate an equivalent (SIP Ben-Israel [9] to derive a Helly-type Theorem for open convex sets using the
© 2012 by Taylor & Francis Group, LLC
11.3 Reduction Approach
369
result by Klee [72]. But this approach was a bit difficult to follow. So Borwein [16] provided a self-contained proof of the reduction approach involving quasiconvex functions. Here we present the same under the assumptions that g(., i) is convex for every i ∈ I and g(., .) is jointly continuous as a function of (x, i) ∈ Rn × I. In the proof, one only needs g(x, i) to be jointly usc as a function of (x, i) ∈ Rn × I along with the convexity assumption. Proposition 11.3 Consider open and closed convex sets U ⊂ Rn and C ⊂ Rn , respectively. The following are equivalent when I is compact. (i) There exists x ∈ C and ε > 0 such that x + εB ⊂ U,
g(y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I.
(ii) (a) For every set of n + 1 points {i0 , i1 , . . . , in } ⊂ I, there exists x ∈ C such that g(x, i0 ) < 0, g(x, i1 ) < 0, . . . , g(x, in ) < 0. (b) For every set of n points {i1 , i2 , . . . , in } ⊂ I, there exists x ∈ C such that x ∈ U, g(x, i1 ) < 0, g(x, i2 ) < 0, . . . , g(x, in ) < 0. Proof. It is obvious that (i) implies (ii)(b). Also, in particular, taking y = x ∈ x + εB in (i) yields (ii)(a). Therefore, to establish the result, we show that (ii) implies (i). Suppose that both (ii)(a) and (b) are satisfied. We first prove that (ii)(a) implies (i) with U = Rn . For any r ∈ N and any i ∈ I, define the set 1 C r (i) = {x ∈ C ∩ r cl B : g(y, i) < 0, ∀ y ∈ x + B}. r Observe that C r (i) ⊂ r cl B and hence is bounded. We claim that C r (i) is convex. Consider x1 , x2 ∈ C r (i), which implies that xj ∈ C ∩ r cl B, j = 1, 2. Because C and cl B are convex sets, C ∩ r cl B is also convex. Thus, (1 − λ)x1 + λx2 ∈ C ∩ r cl B, ∀ λ ∈ [0, 1]. 1 For any yj ∈ xj + B, j = 1, 2, r y = (1 − λ)y1 + λy2
1 1 (1 − λ)(x1 + B) + λ(x2 + B) r r 1 ⊂ (1 − λ)x1 + λx2 + B. r ∈
As x1 , x2 ∈ C r (i), for j = 1, 2, 1 g(yj , i) < 0, ∀ yj ∈ xj + B. r By the convexity of g(., i), for any λ ∈ [0, 1], g(y, i) ≤ (1 − λ)g(y1 , i) + λg(y2 , i) < 0. 1 Because the above conditions hold for arbitrary yj ∈ xj + B, j = 1, 2, r 1 g(y, i) < 0, ∀ y ∈ (1 − λ)x1 + λx2 + B. r Therefore, from the definition of C r (i), it is obvious that (1 − λ)x1 + λx2 ∈ C r (i), ∀ λ ∈ [0, 1]. Because x1 , x2 ∈ C r (i) are arbitrary, C r (i) is a convex set. Next we prove that C r (i) is closed. Suppose that x ¯ ∈ cl C r (i), which r implies there exists a sequence {xk } ⊂ C (i) with xk → x ¯. Because xk ∈ C r (i), xk ∈ C ∩ r cl B such that 1 g(y, i) < 0, ∀ y ∈ xk + B. r
(11.3)
Because C and cl B are closed sets, C ∩ r cl B is also closed and thus, 1 x ¯ ∈ C ∩ r cl B. Now if x ¯ ∈ / C r (i), there exists some y¯ ∈ x ¯ + B such that r 1 g(¯ y , i) ≥ 0. As xk → x ¯, for sufficiently large k, y¯ ∈ xk + B with g(¯ y , i) ≥ 0, r which is a contradiction to condition (11.3). Thus C r (i) is a closed set. Finally, we claim that for some r¯ ∈ N and every set of n + 1 points {i0 , i1 , . . . , in } ⊂ I, n \
j=0
C r¯(ij ) 6= ∅.
On the contrary, suppose that for every r ∈ N, there exist n + 1 points {ir0 , ir1 , . . . , irn } ⊂ I such that n \
j=0
C r (irj ) = ∅.
(11.4)
Define the sequence sr = (ir0 , ir1 , . . . , irn ) ∈ I n+1 . As I is a compact set, I n+1 is also compact and thus {sr } is a bounded sequence. By the Bolzano– Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. Without loss of generality, assume that sr → s¯, where s¯ = (¯i0 , ¯i1 , . . . , ¯in ) ∈ I n+1 . As (ii)(a) is satisfied, there exists x ¯ ∈ C such that g(¯ x, ¯i0 ) < 0, g(¯ x, ¯i1 ) < 0, . . . , g(¯ x, ¯in ) < 0.
Because g(., .) is jointly continuous on (x, i) ∈ Rn × I, hence jointly usc on (x, i) ∈ Rn × I. Therefore, by the above condition there exist ε > 0 and a neighborhood of ¯ij , N (¯ij ), j = 0, 1, . . . , n, such that g(y, ij ) < 0, ∀ y ∈ x ¯ + εB, ∀ ij ∈ N (¯ij ), j = 0, 1, . . . , n.
(11.5)
As irj → ¯ij , one may choose r¯ ∈ N sufficiently large such that k¯ xk ≤ r¯,
ε>
1 r¯
and
irj¯ ∈ N (¯ij ), j = 0, 1, . . . , n.
(11.6)
Combining (11.5) and (11.6), x¯ ∈ C ∩ r¯ cl B such that 1 g(y, irj¯) < 0, ∀ y ∈ x ¯ + B. r¯ Therefore, x ¯ ∈ C r¯(irj¯) for every j = 0, 1, . . . , n, which contradicts our assumption (11.4). Thus, for some r¯ ∈ N and every set of n + 1 points {i0 , i1 , . . . , in } ⊂ I, n \
j=0
C r¯(ij ) 6= ∅.
As C r¯(ij ), j = 0, 1, . . . , n, are nonempty compact convex sets, invoking Helly’s Theorem, Proposition 2.28, \ C r¯(i) 6= ∅. i∈I
From the above condition, there exists x ˜ ∈ C r (i) for every i ∈ I, which implies x ˜ ∈ C such that 1 g(y, i) < 0, ∀ y ∈ x ˜ + B, i ∈ I. r
1 for r ∈ N, the above condition yields (i). r To complete the proof, we have to finally show that (ii)(b) also implies (i). This can be done by expressing (ii(b) in the form of (ii)(a). Consider a point i′ ∈ / I and define I ′ = {i′ } ∪ I, which is again a compact set. Also define the function g ′ on Rn × I ′ as −δ, x ∈ U, ′ ′ g (x, i ) = and g ′ (x, i) = g(x, i), i ∈ I, +∞, x 6∈ U, Taking U = Rn and defining ε =
where δ > 0. Observe that g ′ (., i), i ∈ I ′ satisfies the convexity assumption and is jointly usc on Rn × I ′ . Therefore, (ii)(b) is equivalent to the existence of x ∈ C for every n points {i1 , i2 , . . . , in } ⊂ I, g ′ (x, i′ ) < 0, g ′ (x, i1 ) < 0, . . . , g ′ (x, in ) < 0.
(11.7)
As (ii)(a) is also satisfied, for every n+1 points {i0 , i1 , . . . , in } ⊂ I there exists x ∈ C such that g ′ (x, i0 ) < 0, g ′ (x, i1 ) < 0, . . . , g ′ (x, in ) < 0.
(11.8)
Combining the conditions (11.7) and (11.8), (ii)(b) implies that for every n+1 points {i0 , i1 , . . . , in } ⊂ I ′ there exists x ∈ C such that g ′ (x, i0 ) < 0, g ′ (x, i1 ) < 0, . . . , g ′ (x, in ) < 0, which is of the form (ii)(a). As we have already seen that (ii)(a) implies (i) with U = Rn , there exists x ∈ C and ε > 0 such that g ′ (x, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I ′ , which by the definition of the function g ′ implies that y ∈ U, g(x, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I, that is, x + εB ⊂ U, g(x, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I. Thus, (ii)(b) implies (i) and hence establishes the result.
Using the above proposition, Borwein [16] obtained the equivalent reduced form of (SIP) under the relaxed Slater constraint qualification. The convex semi-infinite programming problem (SIP) is said to satisfy the relaxed Slater constraint qualification if, given any n + 1 points {i_0, i_1, . . . , i_n} ⊂ I, there exists x̂ ∈ R^n such that

g(x̂, i_0) < 0, g(x̂, i_1) < 0, . . . , g(x̂, i_n) < 0.

Observe that the Slater constraint qualification for (SIP) implies the relaxed Slater constraint qualification for (SIP). Now we present the KKT optimality condition for (SIP) by reducing it to the equivalent (SĨP).
Theorem 11.4 Consider the convex semi-infinite programming problem (SIP ). Suppose that the relaxed Slater constraint qualification for (SIP ) holds. Then x ¯ is a point of minimizer of (SIP ) if and only if there exist n points {i1 , i2 , . . . , in } ⊂ I, λij ≥ 0, j = 1, 2, . . . , n, such that 0 ∈ ∂f (¯ x) +
n X
λij ∂g(¯ x, ij ).
j=1
Proof. Define an open set U = {x ∈ Rn : f (x) < f (¯ x)}.
Consider x1 , x2 ∈ U . By the convexity of f , f ((1 − λ)x1 + λx2 ) ≤ (1 − λ)f (x1 ) + λf (x2 ) < f (¯ x), ∀ λ ∈ [0, 1], which implies (1 − λ)x1 + λx2 ∈ U . Because x1 , x2 ∈ U were arbitrary, U is a convex set. As x ¯ is a point of minimizer of (SIP ), there does not exist any x ∈ Rn and ε > 0 such that x + εB ⊂ U,
g(y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I,
which implies that (i) of Proposition 11.3 does not hold. Therefore either (ii)(a) or (ii)(b) is not satisfied. As the relaxed Slater constraint qualification for (SIP ), which is the same as (ii)(a), holds, (ii)(b) cannot be satisfied. Thus, there exist n points {i1 , i2 , . . . , in } ⊂ I such that f (x) < f (¯ x),
g(x, ij ) < 0, j = 1, 2, . . . , n,
(11.9)
has no solution. We claim that x ¯ is a point of minimizer of the reduced problem inf f (x)
subject to
g(x, ij ) ≤ 0, j = 1, 2, . . . , n.
g ), that is, Consider a feasible point x ˜ of (SIP
g(x, ij ) ≤ 0, j = 1, 2, . . . , n.
g) (SIP (11.10)
Also, by the relaxed Slater constraint qualification for (SIP ), corresponding g ), to the n + 1 points {i0 , i1 , i2 , . . . , in } ⊂ I with {i1 , i2 , . . . , in } as in (SIP there exists x ˆ such that g(ˆ x, ij ) < 0, j = 0, 1, 2, . . . , n.
(11.11)
By the convexity of g(., ij ), j = 1, 2, . . . , n, along with the conditions (11.10) and (11.11), g((1 − λ)˜ x + λˆ x, ij ) ≤ (1 − λ)g(˜ x, ij ) + λg(ˆ x, ij ) < 0, ∀ λ ∈ (0, 1). Because the system (11.9) has no solution, f ((1 − λ)˜ x + λˆ x) ≥ f (¯ x), ∀ λ ∈ (0, 1).
(11.12)
As dom f = Rn , by Theorem 2.69, f is continuous on Rn . Thus, taking the limit as λ → 0, the inequality (11.12) leads to f (˜ x) ≥ f (¯ x). g ), x Because x ˜ is an arbitrary feasible point of (SIP ¯ is a point of minimizer of g (SIP ). Observe that by (11.11), the Slater constraint qualification is satisfied © 2012 by Taylor & Francis Group, LLC
374
Convex Semi-Infinite Optimization
by the reduced problem. Therefore, invoking Theorem 3.7, there exist λij ≥ 0, j = 1, 2, . . . , n, such that 0 ∈ ∂f (¯ x) +
n X
λij ∂g(¯ x, ij )
and
λij g(¯ x, ij ) = 0, j = 1, 2, . . . , n. (11.13)
j=1
[I(¯ x)]
We claim that λ ∈ R+ . From the complementary slackness condition in the optimality condition (11.13), if ij 6∈ I(¯ x), λij = 0, whereas for ij ∈ I(¯ x), λij ≥ 0. For i 6∈ {i1 , i2 , . . . , in } but i ∈ I(¯ x), define λi = 0. Therefore, [I(¯ x)] λ ∈ R+ such that X 0 ∈ ∂f (¯ x) + λi ∂g(¯ x, i), i∈supp λ
thereby yielding the KKT optimality condition for (SIP ). The converse can be worked out along the lines of Theorem 11.2.
11.4 Lagrangian Regular Point
In both the preceding sections on reduction techniques to establish the KKT optimality condition for (SIP), the index set I was taken to be compact. But what about the scenarios where the index set I need not be compact? To look into such situations, López and Vercher [75] introduced the concept of a Lagrangian regular point, which we present next. Before we define this concept, we introduce the following notation. For x̄ ∈ C_I having nonempty I(x̄), define

S̄(x̄) = {∂g(x̄, i) ∈ R^n : i ∈ I(x̄)} = ∪_{i∈I(x̄)} ∂g(x̄, i)

and

Ŝ(x̄) = cone co S̄(x̄) = { Σ_{i∈supp λ} λ_i ∂g(x̄, i) ∈ R^n : λ ∈ R^[I(x̄)]_+ }.
¯ x). By Definition 2.77 of the For any i ∈ I(¯ x), consider ξi ∈ ∂g(¯ x, i) ⊂ S(¯ subdifferential, g(x, i) − g(¯ x, i) ≥ hξi , x − x ¯i, ∀ x ∈ Rn . In particular, for {xk } ⊂ CI , that is, g(xk , i) ≤ 0 along with the fact that g(¯ x, i) = 0, i ∈ I(¯ x), the above inequality reduces to hξi , xk − x ¯i ≤ 0, ∀ k ∈ N.
© 2012 by Taylor & Francis Group, LLC
11.4 Lagrangian Regular Point
375
For any {αk } ⊂ R+ , hξi , αk (xk − x ¯)i ≤ 0, ∀ k ∈ N. Taking the limit as k → +∞ in the above inequality, hξi , zi = lim hξi , αk (xk − x ¯)i ≤ 0, k→∞
where z ∈ cl cone (CI − x ¯). By Theorem 2.35, z ∈ TCI (¯ x). Because i ∈ I(¯ x) ¯ x) were arbitrary, and ξi ∈ S(¯ ¯ x), hξ, zi ≤ 0, ∀ ξ ∈ S(¯ ¯ x))◦ . Because z ∈ TC (¯ which implies z ∈ (S(¯ x) is arbitrary, I ¯ x))◦ , TCI (¯ x) ⊂ (S(¯
(11.14)
b x) ⊂ NC (¯ which by Propositions 2.31(iii) and 2.37 implies that cl S(¯ x). As I (SIP ) is equivalent to the unconstrained problem, inf (f + δCI )(x)
subject to
x ∈ Rn ,
(SIPu )
therefore, if x ¯ is a point of minimizer of (SIP ), it is also a minimizer of (SIPu ). By Theorem 3.1, the following optimality condition 0 ∈ ∂f (¯ x) + NCI (¯ x)
(11.15)
b x) ⊂ NC (¯ holds. So rather than cl S(¯ x), one would prefer the reverse relation I so that the above condition may be explicitly expressed in terms of the subdifferential of the constraints. Thus, we move on with the notion of Lagrangian regular point studied in L´ opez and Vercher [75]. Definition 11.5 x ¯ ∈ CI is said to be a Lagrangian regular point if (i) I(¯ x) is empty: TCI (¯ x) = Rn . ¯ x))◦ ⊂ TC (¯ b x) is closed. (ii) I(¯ x) is nonempty: (S(¯ x) and S(¯ I
Recall the equivalent Abadie constraint qualification for (CP ) studied in Chapter 3, that is, S(¯ x) ⊂ TC (¯ x), where S(¯ x) = {v ∈ Rn : gi′ (¯ x, v) ≤ 0, ∀ i ∈ I(¯ x)}. By Proposition 3.9, b x) (S(¯ x))◦ = cl S(¯
which by Proposition 2.31(ii) and (iii) along with the fact that S(¯ x) is a closed convex cone implies that b x))◦ = (S(¯ ¯ x))◦ = S(¯ (S(¯ x) © 2012 by Taylor & Francis Group, LLC
376
Convex Semi-Infinite Optimization
where ¯ x) = {∂gi (¯ S(¯ x) : i = 1, 2, . . . , m} =
m [
∂gi (¯ x).
i=1
Therefore, the Abadie constraint qualification is equivalent to ¯ x))◦ ⊂ TC (¯ (S(¯ x). Moreover, in the derivation of the standard KKT optimality condition for (CP ), Theorem 3.10, under the Abadie constraint qualification, we further b x) was closed. A careful look at the Lagrangian regular point assumed that S(¯ when I(¯ x) is nonempty shows that it is an extension of the Abadie constraint qualification to (SIP ) along with the closedness condition. Next we derive the KKT optimality condition for (SIP ) under the Lagrangian regularity. The result is due to L´ opez and Vercher [75]. Theorem 11.6 Consider the convex semi-infinite programming problem (SIP ). Assume that x ¯ ∈ CI is a Lagrangian regular point. Then x ¯ is a point [I(¯ x)] of minimizer of (SIP ) if and only if there exists λ ∈ R+ such that X 0 ∈ ∂f (¯ x) + λi ∂g(¯ x, i). i∈supp λ
Proof. Suppose that x ¯ is a point of minimizer of (SIP ), which by the condition (11.15) implies 0 ∈ ∂f (¯ x) + NCI (¯ x). Therefore, there exists ξ ∈ ∂f (¯ x) such that −ξ ∈ NCI (¯ x). Depending on the emptiness and nonemptiness of I(¯ x), we consider the following two cases. (i) I(¯ x) is empty: As x ¯ is a Lagrangian regular point, TCI (¯ x) = Rn , which by Proposition 2.37 implies that NCI (¯ x) = (TCI (¯ x))◦ = {0}. Therefore, the optimality condition reduces to 0 ∈ ∂f (¯ x). ¯ x))◦ ⊂ TC (¯ (ii) I(¯ x) is nonempty: As x ¯ is a Lagrangian regular point, (S(¯ x), I which by Proposition 2.31(i) and (iii) yields that ¯ x))◦◦ NCI (¯ x) ⊂ (S(¯
¯ x) = cl S(¯ b x). = cl cone co S(¯
© 2012 by Taylor & Francis Group, LLC
(11.16)
11.4 Lagrangian Regular Point
377
¯ x))◦ , which implies that Also, by relation (11.14), TCI (¯ x) ⊂ (S(¯ b x) ⊂ NC (¯ cl S(¯ x) I
(11.17)
is always true. Combining the conditions (11.16) and (11.17), b x). NCI (¯ x) = cl S(¯
b x) is closed and Again, by Definition 11.5 of the Lagrangian regular point, S(¯ hence, the KKT optimality condition becomes b x), 0 ∈ ∂f (¯ x) + S(¯ [I(¯ x)]
which implies that there exists λ ∈ R+ X 0 ∈ ∂f (¯ x) +
such that
λi ∂g(¯ x, i),
i∈supp λ
as desired. The converse can be worked out as in Theorem 11.2, thereby establishing the requisite result. In Goberna and L´ opez [50], they consider the feasible direction cone to F ⊂ Rn at x ¯ ∈ F , DF (¯ x), defined as DF (¯ x) = {d ∈ Rn : there exists λ > 0 such that x ¯ + λd ∈ F }. It is easy to observe that DF (¯ x) ⊂ cone (F − x ¯).
(11.18)
In case F is a convex set, by Definition 2.46 of convex set, for every x ∈ F , x ¯ + λ(x − x ¯) ∈ F, ∀ λ ∈ (0, 1), which implies x − x ¯ ∈ DF (¯ x). Because x ∈ F was arbitrary, F − x ¯ ⊂ DF (¯ x). As DF (¯ x) is a cone, cone (F − x ¯) ⊂ DF (¯ x). (11.19) Combining the conditions (11.18) and (11.19), DF (¯ x) = cone (F − x ¯) and hence, the tangent cone to F at x ¯ is related to the feasible direction set as TF (¯ x) = cl DF (¯ x). For the convex semi-infinite programming problem (SIP ), the feasible set is CI . In particular, taking F = CI in the above condition yields TCI (¯ x) = cl DCI (¯ x).
© 2012 by Taylor & Francis Group, LLC
378
Convex Semi-Infinite Optimization
¯ x))◦ ; thus the above condition yields From (11.14) we have that TCI (¯ x) ⊂ (S(¯ ¯ x))◦ . DCI (¯ x) ⊂ TCI (¯ x) ⊂ (S(¯
(11.20)
In Definition 11.5 of the Lagrangian regular point, for x¯ ∈ CI with nonempty I(¯ x), ¯ x))◦ ⊂ TC (¯ (S(¯ x) = cl DCI (¯ x). I Combining the above condition with (11.20), which along with Proposib x) = cone co S(¯ ¯ x) implies that tion 2.31 and the fact that S(¯ b x) = (S(¯ ¯ x))◦◦ = (DC (¯ cl S(¯ x))◦ . I
b x) at the Lagrangian regular point x¯, the By the closedness condition of S(¯ preceding condition reduces to b x) = (DC (¯ S(¯ x))◦ . I
The above qualification condition is referred to as the convex locally FarkasMinkowski problem in Goberna and L´opez [50]. For x ¯ ∈ ri CI , TCI (¯ x) = Rn , which by Proposition 2.37 implies that NCI (¯ x) = (TCI (¯ x))◦ = {0}. ¯ x))◦ always holds, by Proposition 2.31, As TCI (¯ x) ⊂ (S(¯ ¯ x))◦◦ ⊂ (TC (¯ {0} ⊂ (S(¯ x))◦ = {0} I which implies b x) = cl cone co S(¯ ¯ x) = (S(¯ ¯ x))◦◦ = {0}. S(¯
b x) = NC (¯ Thus, S(¯ x) for every x ¯ ∈ ri CI . Therefore one needs to impose I the Lagrangian regular point condition to boundary points only. This fact was mentioned in Goberna and L´opez [50] and was proved in Fajardo and L´ opez [44]. Recall that in Chapter 3, under the Slater constraint qualification, Propob x), which by Propositions 2.31 and 2.37 is sition 3.3 leads to NC (¯ x) = S(¯ equivalent to b x))◦ = (cl S(¯ b x))◦ = S(¯ TC (¯ x) = (S(¯ x).
b x) is closed. Also, under the Slater constraint qualification, by Lemma 3.5, S(¯ Hence the Slater constraint qualification leads to the Abadie constraint qualification along with the closedness criteria. A similar result also holds for (SIP ). But before that we present Gordan’s Theorem of Alternative which plays an important role in establishing the result.
© 2012 by Taylor & Francis Group, LLC
11.4 Lagrangian Regular Point
379
Proposition 11.7 (Gordan’s Theorem of Alternative) Consider xi ∈ Rn for i ∈ I, where I is an arbitrary index set. If co{xi : i ∈ I} is a closed set, then the equivalence holds between the negation of system (I) and system (II), where (I) (II)
{x ∈ Rn : hxi , xi < 0, i ∈ I} 6= ∅, 0 ∈ co {xi : i ∈ I}.
Proof. If xi = 0 for some i ∈ I, then the result holds trivially as system (I) is not satisfied while system (II) holds. So without loss of generality, assume that xi 6= 0 for every i ∈ I. Suppose (I) does not hold. Let 0 6∈ co {xi : i ∈ I}. As by hypothesis co {xi : i ∈ I} is closed, by the Strict Separation Theorem, Theorem 2.26(iii), there exists a ∈ Rn with a 6= 0 such that ha, xi < 0, ∀ x ∈ co {xi : i ∈ I}. In particular, for xi ∈ co {xi : i ∈ I}, ha, xi i < 0, ∀ i ∈ I, which implies system (I) holds, a contradiction to our supposition. Thus 0 ∈ co {xi : i ∈ I}, that is, system (II) holds. [I] Suppose that system (II) holds, which implies that there exists λ ∈ R+ P with i∈supp λ λi = 1 such that X 0= λi xi . i∈supp λ
Let x ¯ ∈ {x ∈ Rn : hxi , xi < 0, i ∈ I}. Therefore, X 0 = h0, x ¯i = λi hxi , x ¯i < 0, i∈supp λ
which is a contradiction. Thus, system (I) does not hold, thereby completing the proof. The hypothesis that co {xi : i ∈ I} is a closed set is required as shown in the from L´ opez and Vercher [75]. Consider xi = (cos i, sin i) and h example π π . Observe that (0, 0) 6∈ co {xi : i ∈ I} as I= − , 2 2 0 ∈ co {cos i : i ∈ I}
and
0 ∈ co {sin i : i ∈ I}
cannot hold simultaneously because 0 ∈ co {cos i : i ∈ I} is possible only if π i = − at which sin i = −1. Thus system (II) is not satisfied. Also, there 2 does not exist any x = (xc , xs ) such that xc cos i + xs sin i < 0, ∀ i ∈ I.
© 2012 by Taylor & Francis Group, LLC
(11.21)
380
Convex Semi-Infinite Optimization
π On the contrary, suppose that such an x exists. In particular, taking i = − 2 π and i = 0 yields that xs > 0 and xc < 0, respectively. But as the limit i → , 2 xc cos i + xs sin i → xs , that is, for some i ∈ I, xc cos i + xs sin i > 0, which is a contradiction to (11.21). Hence, system (I) is also not satisfied. Note that co {xi : i ∈ I} is not π closed because taking the limit as i → , 2 xi = (cos i, sin i) → (0, 1) and (0, 1) ∈ / co {xi : i ∈ I}. Now we present the result from L´opez and Vercher [75] showing that the Slater constraint qualification for (SIP ) implies that every feasible point x ∈ CI is a Lagrangian regular point. Proposition 11.8 Consider the convex semi-infinite programming problem (SIP ). If the Slater constraint qualification for (SIP ) holds, then every x ¯ ∈ CI is a Lagrangian regular point. Proof. Suppose that x ¯ ∈ CI , that is, g(¯ x, i) ≤ 0, i ∈ I. Define g(x) = supi∈I g(x, i). (i) I(¯ x) is empty: By conditions (i) and (ii) of the Slater constraint qualification for (SIP ), g(¯ x) < 0 which by Proposition 2.67 implies that x¯ ∈ ri CI . Therefore, TCI (¯ x) = cl cone (CI − x ¯) = Rn . (ii) I(¯ x) is nonempty: We claim that I(¯ x) is compact. By condition (i) of the Slater constraint qualification for (SIP ), I is compact. Because I(¯ x) ⊂ I, I(¯ x) is bounded. Now consider {ik } ⊂ I(¯ x) such that ik → i. By the compactness of I and the fact that I(¯ x) ⊂ I, i ∈ I. As ik ∈ I(¯ x), g(¯ x, ik ) = 0, ∀ k ∈ N, which by condition (ii) of the Slater constraint qualification for (SIP ), that is, the continuity of g(x, i) with respect to (x, i) in Rn × I, implies that as the limit k → +∞, g(¯ x, i) = 0 and thus i ∈ I(¯ x). Therefore, I(¯ x) is closed, which along with the boundedness impliesSthat I(¯ x) is compact. ¯ x) = Next we will show that S(¯ x, i) is compact. Suppose that i∈I(¯ x) ∂g(¯ ¯ ¯ {ξk } ⊂ S(¯ x) with ξk → ξ. As ξk ∈ S(¯ x), there exists ik ∈ I(¯ x) such that ξk ∈ ∂g(¯ x, ik ), that is, by Definition 2.77 of the subdifferential, g(x, ik ) − g(¯ x, ik ) ≥ hξk , x − x ¯i, ∀ x ∈ Rn .
© 2012 by Taylor & Francis Group, LLC
11.4 Lagrangian Regular Point
381
As I(¯ x) is compact, without loss of generality, assume that ik → i ∈ I(¯ x). Taking the limit as k → +∞ in the above inequality along with condition (ii) of the Slater constraint qualification for (SIP ) yields g(x, i) − g(¯ x, i) ≥ hξ, x − x ¯i, ∀ x ∈ Rn , ¯ x), thereby implying that that is, ξ ∈ ∂g(¯ x, i) with i ∈ I(¯ x). Therefore ξ ∈ S(¯ ¯ x) is closed. As dom g(., i) = Rn , i ∈ I(¯ S(¯ x), by Proposition 2.83, ∂g(¯ x, i) is compact for i ∈ I(¯ x), which implies for every ξi ∈ ∂g(¯ x, i) there exists Mi > 0 such that kξi k ≤ Mi , ∀ i ∈ I(¯ x). Because I(¯ x) is compact, the supremum of Mi over I(¯ x) is attained, that is, sup Mi = M < +∞. i∈I(¯ x)
Therefore, for every i ∈ I(¯ x), kξk ≤ M, ∀ ξ ∈ ∂g(¯ x, i), ¯ x) is bounded. Thus S(¯ ¯ x) is compact. which implies that S(¯ As I(¯ x) is nonempty, g(¯ x) = 0. By condition (iii) of the Slater constraint qualification for (SIP ), there exists x ˆ such that g(ˆ x, i) < 0, i ∈ I, which along with condition (i) yields that g(ˆ x) < 0 = g(¯ x). Thus, x ¯ is not a point of minimizer of g and hence 0 ∈ / ∂g(¯ x). By the Valadier formula, Theorem 2.97, [ ¯ x). 0∈ / co ∂g(¯ x, i) = co S(¯ i∈I(¯ x)
¯ x) is compact, by Theorem 2.9, co S(¯ ¯ x) is also compact. By PropoBecause S(¯ ¯ b sition 3.4, cone co S(¯ x) and hence S(¯ x) is closed. Finally to establish that x ¯ is a Lagrangian regular point, we will prove ¯ x))◦ ⊂ TC (¯ that (S(¯ x). Define the set I ¯ x)}. S ′ (¯ x) = {x ∈ Rn : hξ, xi < 0, ∀ ξ ∈ S(¯ ¯ x) is closed with 0 6∈ co S(¯ ¯ x), by the Gordan’s Theorem of Alternative, As co S(¯ ¯ x))◦ Proposition 11.7, S ′ (¯ x) is nonempty. Therefore, by Proposition 2.67, ri (S(¯ ◦ ¯ x)) , which implies is nonempty. Consider z ∈ ri (S(¯ ¯ x), hξ, zi < 0, ∀ ξ ∈ S(¯
© 2012 by Taylor & Francis Group, LLC
382
Convex Semi-Infinite Optimization
which leads to ¯ x). hξ, zi < 0, ∀ ξ ∈ co S(¯ Again, by the Valadier formula, Theorem 2.97, [ ¯ x). ∂g(¯ x) = co ∂g(., i) = co S(¯ i∈I(¯ x)
Thus, hξ, zi < 0, ∀ ξ ∈ ∂g(¯ x). By the compactness of I, the supremum is attained by g(., i) over I. As dom g(., i) = Rn , dom g = Rn . Therefore, by Theorem 2.79, g ′ (¯ x, z) = max hξ, zi. ξ∈∂g(¯ x)
Also, because dom g = Rn , by Proposition 2.83, ∂g(¯ x) is compact and thus ¯ > 0 such that g ′ (¯ x, z) < 0, which implies that there exists λ ¯ g(¯ x + λz) < 0, ∀ λ ∈ (0, λ), ¯ which implies that for every λ ∈ (0, λ), g(¯ x + λz, i) < 0, ∀ i ∈ I. ¯ which yields Hence, x ¯ + λz ∈ CI for every λ ∈ (0, λ), z∈
1 (CI − x ¯) ⊂ cl cone (CI − x ¯) = TCI (¯ x). λ
¯ x))◦ was arbitrary, ri (S(¯ ¯ x))◦ ⊂ TC (¯ Because z ∈ ri (S(¯ x), which along with I the closedness of the tangent cone, TCI (¯ x), leads to ¯ x))◦ = cl (ri (S(¯ ¯ x))◦ ) ⊂ TC (¯ (S(¯ x). I From both cases, we obtain that x ¯ ∈ CI is a Lagrangian regular point. Because x ¯ was arbitrary, every feasible point is a Lagrangian regular point under the assumption of the Slater constraint qualification for (SIP ), thereby establishing the result.
11.5 Farkas–Minkowski Linearization
In the previous section on the Lagrangian regular point, observe that it is defined at a point and hence is a local qualification condition. We observed
© 2012 by Taylor & Francis Group, LLC
11.5 Farkas–Minkowski Linearization
383
that the notion of Lagrangian regular point is also known as the convex locally Farkas–Minkowski problem. In this section, we will discuss about the global qualification condition, namely the Farkas–Minkowski qualification studied in Goberna, L´ opez, and Pastor [51] and L´opez and Vercher [75]. Before introducing this qualification condition, let us briefly discuss the concept of Farkas–Minkowski system for a linear semi-infinite system from Goberna and L´ opez [50]. Consider a linear semi-infinite system Θ = {hxi , xi ≥ ci , i ∈ I}
(LSIS)
The relation h˜ x, xi ≥ c˜ is a consequence relation of the system Θ if every solution of Θ satisfies the relation. A consistent (LSIS) Θ is said to be a Farkas– Minkowski system, in short, an FM system, if every consequent relation is a consequence of some finite subsystem. Before we state the Farkas–Minkowski qualification for convex semi-infinite programming problem (SIP ), we present some results on the consequence relation and the FM system from Goberna and L´ opez [50]. Proposition 11.9 h˜ x, xi ≥ c˜ is a consequence relation of the consistent (LSIS) Θ if and only if (˜ x, c˜) ∈ cl cone co {(xi , ci ), i ∈ I; (0, −1)}. Proof. Denote by K ⊂ Rn+1 the convex cone K = cone co {(xi , ci ), i ∈ I; (0, −1)}. Consider i′ ∈ / I and define (xi′ , ci′ ) = (0, −1) and I ′ = {i′ } ∪ I. Thus K = cone co {(xi , ci ), i ∈ I ′ }. [I ′ ]
Suppose that (˜ x, c˜) ∈ cl K, which implies there exist {λk } ⊂ R+ , {sk } ∈ N, {xikj } ⊂ Rn and {cikj } ⊂ R satisfying ikj ∈ I ′ for j = 1, 2, . . . , sk , such that (˜ x, c˜) = lim
k→∞
sk X
λkj (xikj , cikj ).
(11.22)
j=1
As K ⊂ Rn+1 , by the Carath´eodory Theorem, Theorem 2.8, 1 ≤ sk ≤ n + 2. For any k ∈ N with sk < n + 2, define λkj = 0 and any arbitrary (xikj , cikj ) with ikj ∈ I ′ for j = sk + 1, sk + 2, . . . , n + 2. Therefore, condition (11.22) becomes n+2 X (˜ x, c˜) = lim λkj (xikj , cikj ). (11.23) k→∞
j=1
If x ¯ is a solution of (LSIS) Θ, hxikj , x ¯i ≥ cikj , j = 1, 2, . . . , n + 2,
© 2012 by Taylor & Francis Group, LLC
384
Convex Semi-Infinite Optimization
which along with the fact λkj ≥ 0, j = 1, 2, . . . , n + 2, leads to n+2 X j=1
λkj hxikj , x ¯i ≥
n+2 X
λkj cikj .
j=1
Taking the limit as k → +∞ in the above condition along with (11.23) yields h˜ x, x ¯i ≥ c˜. Because x ¯ was arbitrary, h˜ x, xi ≥ c˜ is a consequence relation of (LSIS) Θ. Conversely, suppose that h˜ x, xi ≥ c˜ is a consequence relation of (LSIS) Θ. We claim that (˜ x, c˜) ∈ cl K. On the contrary, suppose that (˜ x, c˜) 6∈ cl K. By the Strict Separation Theorem, Theorem 2.26 (iii), there exist (γ, γn+1 ) ∈ Rn × R with (γ, γn+1 ) 6= (0, 0) such that hγ, xi + γn+1 c > α ˜ = hγ, x ˜i + γn+1 c˜, ∀ (x, c) ∈ cl K.
(11.24)
As 0 ∈ K ⊂ cl K, the above condition implies that α ˜ < 0. We claim that hγ, xi + γn+1 c ≥ 0, ∀ (x, c) ∈ cl K. On the contrary, suppose that there exists (¯ x, c¯) ∈ cl K such that α ˜ < hγ, x ¯i + γn+1 c¯ < 0. Because cl K is a cone, λ(¯ x, c¯) ∈ cl K for λ > 0. Therefore, the above inequality along with the condition (11.24) implies that α ˜ < hγ, λ¯ xi + γn+1 λ¯ c < 0.
(11.25)
Taking the limit as λ → +∞, hγ, λ¯ xi + γn+1 λ¯ c → −∞, thereby contradicting the relation (11.25). Thus, hγ, xi + γn+1 c ≥ 0 > α ˜ , ∀ (x, c) ∈ cl K.
(11.26)
As (0, −1) ∈ K ⊂ cl K, for λ > 0, (0, −λ) ∈ cl K. Therefore, (11.26) leads to −λγn+1 ≥ 0, ∀ λ > 0, which implies that γn+1 ≤ 0. We now consider the following two cases. (i) γn+1 = 0: The condition (11.26) reduces to hγ, xi ≥ 0 > hγ, x ˜i, ∀ (x, c) ∈ cl K. In particular, for (xi , ci ) ∈ cl K, i ∈ I ′ , hγ, xi i ≥ 0 > hγ, x ˜i, ∀ i ∈ I ′ .
© 2012 by Taylor & Francis Group, LLC
(11.27)
11.5 Farkas–Minkowski Linearization
385
As (LSIS) Θ is consistent, there exists x ¯ ∈ Rn such that h¯ x, xi i ≥ ci , ∀ i ∈ I.
(11.28)
Therefore, from the inequalities (11.27) and (11.28), for any λ > 0, h¯ x + λγ, xi i ≥ ci , ∀ i ∈ I, which implies x ¯ + λγ is a solution of (LSIS) Θ. By our supposition, h˜ x, xi ≥ c˜ is a consequence relation of Θ, which implies that h˜ x, x ¯ + λγi ≥ c˜.
(11.29)
By the condition (11.27), as the limit λ → +∞, h¯ x + λγ, x ˜i → −∞, thereby contradicting the inequality (11.29). (ii) γn+1 < 0: As γn+1 6= 0, dividing (11.26) throughout by −γn+1 and setting −γ , x ¯= γn+1 h¯ x, xi − c ≥ 0 > h¯ x, x ˜i − c˜, ∀ (x, c) ∈ cl K. The above condition holds in particular for (xi , ci ) ∈ K ⊂ cl K, i ∈ I. Thus, h¯ x, xi i − ci ≥ 0 > h¯ x, x ˜i − c˜, ∀ i ∈ I, that is, h¯ x, xi i ≥ ci , i ∈ I
and
h¯ x, x ˜i < c˜.
Therefore, x ¯ is a solution of (LSIS) Θ but does not satisfy the consequence relation h˜ x, xi ≥ c˜, which is again a contradiction. Hence, our assumption was wrong and (˜ x, c˜) ∈ cl K, thereby completing the proof. Next we present the characterization of the FM system Θ from Goberna, L´ opez, and Pastor [51]. Proposition 11.10 (LSIS) Θ is an FM system if and only if (˜ x, c˜) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)}. Proof. Suppose that (˜ x, c˜) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)}.
© 2012 by Taylor & Francis Group, LLC
386
Convex Semi-Infinite Optimization
As in the proof of Proposition 11.9, consider i′ ∈ / I. Define (xi′ , ci′ ) = (0, −1) and I ′ = {i′ } ∪ I. Therefore, (˜ x, c˜) ∈ cone co {(xi , ci ), i ∈ I ′ } ⊂ Rn+1 , which by the Carath´eodory Theorem, Theorem 2.8, implies that there exist λj ≥ 0 and ij ∈ I ′ , j = 1, . . . , s, with 1 ≤ s ≤ n + 2 such that (˜ x, c˜) =
s X
λj (xij , cij ).
j=1
Invoking Proposition 11.9 to the finite system, h˜ x, xi ≥ c˜ is a consequence relation of the finite system hxij , xi ≥ cij , j = 1, 2, . . . , s. Therefore, (LSIS) Θ is an FM system. Conversely, suppose that Θ is an FM system, which implies that a consequence relation h˜ x, xi ≥ c˜ of the infinite system hxi , xi ≥ ci , i ∈ I can be expressed as a consequence of finite subsystem, that is, h˜ x, xi ≥ c˜ is a consequence relation of a finite subsystem hxi , xi ≥ ci , i = 1, 2, . . . , s. Applying Proposition 11.9 to the above finite system, (˜ x, c˜)
∈ cl cone co {(xi , ci ), i = 1, 2, . . . , s; (0, −1)} = cone co {(xi , ci ), i = 1, 2, . . . , s; (0, −1)}
which implies that (˜ x, c˜) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)}, thereby establishing the desired result.
Now we introduce the Farkas–Minkowski qualification for (SIP ) from Goberna, L´ opez, and Pastor [51]. Definition 11.11 The convex semi-infinite programming problem (SIP ) is said to satisfy the Farkas–Minkowski (FM) qualification if (LSIS) Θ = {g(y, i) + hξ, x − yi ≤ 0 : (y, i) ∈ Rn × I, ξ ∈ ∂g(y, i)} is an FM system.
© 2012 by Taylor & Francis Group, LLC
11.5 Farkas–Minkowski Linearization
387
Using the FM qualification, we present the KKT optimality condition for (SIP ) from Goberna, L´ opez, and Pastor [51]. Theorem 11.12 Consider the convex semi-infinite programming problem (SIP ). Assume that the FM qualification holds. Then x ¯ ∈ Rn is a point of [I(¯ x)] minimizer of (SIP ) if and only if there exists λ ∈ R+ such that X 0 ∈ ∂f (¯ x) + λi ∂gi (¯ x). i∈supp λ
e that is, Proof. Define the solution set of (LSIS) Θ by C,
e = {x ∈ Rn : g(y, i) + hξ, x − yi ≤ 0, ∀ (y, i) ∈ Rn × I, ∀ ξ ∈ ∂g(y, i)}. C
Note that as dom g(., i) = Rn , i ∈ I, by Proposition 2.83, ∂g(y, i) is nonempty e is defined. We claim that CI = C. e for every y ∈ Rn and hence C Suppose that x ˜ ∈ CI , which implies that g(˜ x, i) ≤ 0, i ∈ I. For any y ∈ Rn and ξi ∈ ∂g(y, i) for i ∈ I, by Definition 2.77 of the subdifferential, g(y, i) + hξi , x − yi ≤ g(x, i), ∀ x ∈ Rn . In particular, for x = x ˜, the above inequality becomes g(y, i) + hξi , x ˜ − yi ≤ g(˜ x, i) ≤ 0, ∀ i ∈ I, e Because x e which leads to x ˜ ∈ C. ˜ ∈ CI was arbitrary, CI ⊂ C. e Conversely, suppose that x ˜ ∈ C, which implies that for every y ∈ Rn and i ∈ I, g(y, i) + hξ, x ˜ − yi ≤ 0, ∀ ξ ∈ ∂g(y, i).
In particular, taking y = x ˜, the above condition reduces to g(˜ x, i) ≤ 0, i ∈ I, e was arbitrary, C e ⊂ CI . Hence thereby implying that x ˜ ∈ CI . Because x ˜∈C e CI = C and thus the FM system Θ is a linearization of CI . As x ¯ is a point of minimizer of (SIP ), it is also the point of minimizer of the equivalent problem inf f (x)
subject to
x ∈ CI .
Because dom f = Rn , by Theorem 3.1, 0 ∈ ∂f (¯ x) + NCI (¯ x), which implies that there exists ξ ∈ ∂f (¯ x) such that −ξ ∈ NCI (¯ x). By Definition 2.36 of the normal cone, hξ, x − x ¯i ≥ 0, ∀ x ∈ CI ,
© 2012 by Taylor & Francis Group, LLC
388
Convex Semi-Infinite Optimization
and thus it is a consequence relation of (LSIS) Θ. As Θ is an FM system, by Proposition 11.10, there exist λj ≥ 0, ξj ∈ ∂g(¯ x, ij ), ij ∈ I, j = 1, 2, . . . , s, and µ ≥ 0 such that (ξ, hξ, x ¯i) =
s X j=1
λj (−ξj , g(yj , ij ) − hξj , yj i) + (0, −µ).
Without loss of generality, assume that λj > 0, j = 1, 2, . . . , s. Now multiplying the above condition throughout by (−x, 1) leads to hξ, x ¯ − xi =
s X j=1
λj (g(yj , ij ) + hξj , x − yj i) − µ.
As µ ≥ 0, the above relation leads to hξ, x ¯ − xi ≤
s X
λj (g(yj , ij ) + hξj , x − yj i).
(11.30)
g(yj , ij ) + hξj , x − yj i ≤ g(x, ij ), ∀ x ∈ Rn .
(11.31)
j=1
As ξj ∈ ∂g(¯ x, ij ), j = 1, 2, . . . , s,
Also, because ξ ∈ ∂f (¯ x), f (¯ x) − f (x) ≤ hξ, x ¯ − xi, ∀ x ∈ Rn .
(11.32)
Combining the conditions (11.30), (11.31) and (11.32), yields that f (¯ x) ≤ f (x) +
s X j=1
λj g(x, ij ), ∀ x ∈ Rn .
(11.33)
In particular, taking x = x ¯ in the above inequality, which along with the feasibility of x ¯ leads to 0≤
that is,
s X
s X j=1
λj g(¯ x, ij ) ≤ 0,
λj g(¯ x, ij ) = 0. Thus,
j=1
λj g(¯ x, ij ) = 0, ∀ j = 1, 2, . . . , s. By our supposition, λj > 0, j = 1, 2, . . . , s which implies that g(¯ x, ij ) = 0, that is, ij ∈ I(¯ x), j = 1, 2, . . . , s. Define λi = 0 for i ∈ I(¯ x) and i 6∈ {i1 , i2 , . . . , is }.
© 2012 by Taylor & Francis Group, LLC
11.5 Farkas–Minkowski Linearization
389
Therefore, from the inequality (11.33), x¯ is the minimizer of the unconstrained problem X inf f (x) + λi g(x, i) subject to x ∈ Rn , i∈supp λ
[I(¯ x)]
where λ ∈ R+ Theorem 2.89,
. By the optimality condition for the unconstrained problem,
0 ∈ ∂(f +
X
λi g(., i))(¯ x).
i∈supp λ
As dom f = dom g(., i) = Rn for i ∈ I(¯ x), by the Sum Rule, Theorem 2.91, X 0 ∈ ∂f (¯ x) + λi ∂g(¯ x, i), i∈supp λ
thereby establishing the KKT optimality conditions for (SIP ). The converse can be worked out along the lines of Theorem 11.2. Another notion that implies (LSIS) Θ is an FM system is that of the canonically closed system. Below we define this concept and a result relating a canonically closed system and an FM system from Goberna, L´opez, and Pastor [51]. Definition 11.13 (LSIS) Θ = {hxi , xi ≥ ci , i ∈ I} is canonically closed if the following conditions hold: (i) There exists x ˆ ∈ Rn such that hxi , x ˆi > ci , i ∈ I. (ii) The set {(xi , ci ), i ∈ I} is compact. The following result provides different conditions under which Θ is an FM system, part of the proof is due to Hestenes [55]. Proposition 11.14 If the consistent (LSIS) Θ satisfies one of the following conditions, then it is an FM system: (i) cone co {(xi , ci ), i ∈ I; (0, −1)}) is closed. (ii) cone co {(xi , ci ), i ∈ I} is closed. (iii) (LSIS) Θ is canonically closed. Proof. (i) Suppose that cone co {(xi , ci ), i ∈ I; (0, −1)} is closed. Then by Proposition 11.9, h˜ x, xi ≥ c˜ is the consequence relation of (LSIS) Θ if and only if (˜ x, c˜) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)},
© 2012 by Taylor & Francis Group, LLC
390
Convex Semi-Infinite Optimization
which by Proposition 11.10 is equivalent to Θ being an FM system. (ii) Define K = cone co {(xi , ci ), i ∈ I; (0, −1)}
and
It is easy to observe that
e = cone co {(xi , ci ), i ∈ I}. K
e + cone (0, −1). K=K
(11.34)
(˜ xk , c˜k ) = (xk , ck ) + λk (0, −1), ∀ k ∈ N,
(11.35)
e is closed. We claim that K is closed, which by (i) will then Suppose that K imply that Θ is an FM system. Consider a bounded sequence {(˜ xk , c˜k )} ⊂ K such that (˜ xk , c˜k ) → (˜ x, c˜). Note that (˜ xk , c˜k ) for k ∈ N can be expressed as e and {λk } ⊂ R+ . Assume that {λk } is an unbounded where {(xk , ck )} ⊂ K sequence, which implies λk → +∞. From the condition (11.35) 1 1 (˜ (xk , ck ) + (0, −1), ∀ k ∈ N. xk , c˜k ) = λk λk
As (˜ xk , c˜k ) → (˜ x, c˜), taking the limit as k → +∞ in the above condition implies that 1 (xk , ck ) → (0, 1), λk e ⊂ cl K. By Proposition 11.9, that is, (0, 1) ∈ cl K 0 = h0, xi ≥ 1
is a consequent relation of (LSIS) Θ, which is impossible. Thus, {λk } is a bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. Without loss of generality, assume that λk → λ. By the condition (11.35) and boundedness of the sequences {(˜ xk , c˜k )} and {λk }, {(xk , ck )} is also a bounded sequence. Without loss of generality, by the e is closed, (x, c) ∈ K. e Bolzano–Weierstrass Theorem, let (xk , ck ) → (x, c). As K Therefore, taking the limit as k → +∞, (11.35) along with (11.34) yields that (˜ x, c˜) = (x, c) + λ(0, −1) ∈ K,
and hence K is closed, which by (i) implies that (LSIS) Θ is an FM system. (iii) Suppose that Θ is a canonically closed system. Therefore, the set {(xi , ci ), i ∈ I} is compact. We claim that e = cone co {(xi , ci ), i ∈ I} ⊂ Rn+1 K
is closed. On the contrary, assume that it is not closed, which implies there
© 2012 by Taylor & Francis Group, LLC
11.5 Farkas–Minkowski Linearization
391
e such that (˜ exists a convergent sequence {(˜ xk , c˜k )} ⊂ K xk , c˜k ) → (˜ x, c˜) with e but (˜ e Because (˜ e there exist λi ≥ 0, ik ∈ I (˜ x, c˜) ∈ cl K x, c˜) ∈ / K. xk , c˜k ) ∈ K, j kj for j = 1, 2, . . . , sk , {sk } ⊂ N such that (˜ xk , c˜k ) =
sk X
λikj (xikj , cikj ).
j=1
By the Carath´eodory Theorem, Theorem 2.8, 1 ≤ sk ≤ n + 2. For sk < n + 2, choose any ikj ∈ I with λikj = 0, j = sk + 1, sk + 2, . . . , n + 2. Therefore the above condition can be rewritten as (˜ xk , c˜k ) =
n+2 X
λikj (xikj , cikj ).
(11.36)
j=1
As {(xi , ci ), i ∈ I} is compact, by the Bolzano–Weierstrass Theorem, {(xikj , cikj )} has a convergent subsequence. Without loss of generality, assume e that (xi , ci ) → (xi , ci ) ∈ {(xi , ci ), i ∈ I}. By assumption, (˜ x, c˜) ∈ / K, kj
j
kj
j
which implies {λikj } is unbounded. Otherwise, there exists some convergent subsequence of {λikj }. Without relabeling, assume that λikj → λij . Therefore, taking the limit as k → +∞ in (11.36) leads to (˜ x, c˜) =
n+2 X j=1
e λij (xij , cij ) ∈ K,
which is a contradiction of our assumption. λikj Pn+2 } ⊂ R+ is a Denote λk = j=1 λikj . Observe that the sequence { λk bounded sequence and hence by the Bolzano–Weierstrass Theorem has a conλikj → λij ≥ 0, vergent subsequence. Without loss of generality, assume that λk Pn+2 j = 1, 2, . . . , n+2, with j=1 λij = 1. Dividing the condition (11.36) throughout by λk and taking the limit as k → +∞, which along with the fact that λk → +∞ yields n+2 X
λj (xij , cij ) = 0
with
j=1
n+2 X
λij = 1.
(11.37)
j=1
As (LSIS) Θ is canonically closed, there exists x ˆ ∈ Rn such that hxi , x ˆi > ci , i ∈ I, that is, h(xi , ci ), (ˆ x, −1)i = hxi , x ˆi − ci > 0, ∀ i ∈ I.
(11.38)
Combining the relations (11.37) and (11.38) along with the fact that λij ≥ 0, j = 1, 2, . . . , n + 2, not all simultaneously zero, implies that 0=
n+2 X j=1
λij h(xij , cij ), (ˆ x, −1)i =
© 2012 by Taylor & Francis Group, LLC
n+2 X j=1
λij (hxij , x ˆi − cij ) > 0,
392
Convex Semi-Infinite Optimization
e is closed, which is impossible. Thus our assumption was wrong and hence K which by (ii) yields that Θ is an FM system.
As seen in Section 11.4, the Slater constraint qualification for (SIP ) implies that every feasible point is a Lagrangian regular point; we will now present the relation between the Slater constraint qualification for (SIP ) and the FM qualification. For this we will need the following result from Goberna, L´ opez, and Pastor [51]. Proposition 11.15 Consider a closed convex set F ⊂ Rn and let Fb denote the boundary points of F . Also consider (LSIS) Θ = {hxi , xi ≤ ci , i ∈ I} such that (i) every point of F is a solution of the system Θ, (ii) there exists x ˆ ∈ F such that hxi , xi < ci , i ∈ I, (iii) given any y ∈ Fb , there exists some i ∈ I such that hxi , yi = ci . Then F is the solution set of Θ, that is, F = {x ∈ Rn : hxi , xi ≤ ci , i ∈ I}. Proof. Observe that by condition (i), F ⊂ {x ∈ Rn : hxi , xi ≤ ci , i ∈ I}. Conversely, suppose that there exists z ∈ {x ∈ Rn : hxi , xi ≤ ci , i ∈ I} and z 6∈ F . By (ii), there exists x ˆ ∈ F such that hxi , xi < ci for every i ∈ I. As F is a closed convex set, the line segment joining x ˆ and z meets the boundary Fb at only one point, say y ∈ (ˆ x, z). Therefore, there exists λ ∈ (0, 1) such that y = (1 − λ)ˆ x + λz ∈ Fb . By condition (iii), there exists ¯i ∈ I such that hx¯i , yi = c¯i .
(11.39)
By the conditions on x ˆ and z, hx¯i , x ˆi < c¯i
and
hx¯i , zi ≤ c¯i ,
respectively. Thus hx¯i , yi = (1 − λ)hx¯i , x ˆi + λhx¯i , zi < c¯i , which is a contradiction to (11.39). Hence, F ⊃ {x ∈ Rn : hxi , xi ≤ ci , i ∈ I}, thereby establishing the result. Now we are in a position to present the implication that the Slater constraint qualification for (SIP ) leads to the FM qualification from L´opez and Vercher [75].
© 2012 by Taylor & Francis Group, LLC
11.5 Farkas–Minkowski Linearization
393
Proposition 11.16 Consider the convex semi-infinite programming problem (SIP ) with bounded feasible set CI . If the Slater constraint qualification for (SIP ) holds, then the FM qualification condition is also satisfied. Proof. Define g(x) = sup g(x, i) and CIb = {x ∈ CI : g(x) = 0}. Consider the i∈I
(LSIS) e = {hξ, xi ≤ hξ, yi, y ∈ C b , ξ ∈ ∂g(y)}. Θ I
e is a linear representation of CI . For any ξ ∈ ∂g(y), by We claim that Θ Definition 2.77 of the subdifferential, hξ, x − yi ≤ g(x) − g(y), ∀ x ∈ Rn .
(11.40)
As the Slater constraint qualification for (SIP ) holds, by condition (i) and (ii), the supremum g(x) is attained. Therefore, in particular, for y ∈ CIb and x ∈ CI , that is, g(y) = 0 and g(x) = supi∈I g(x, i) ≤ 0, respectively, the above inequality reduces to hξ, xi ≤ hξ, yi, ∀ ξ ∈ ∂g(y). e Because x ∈ CI was arbitrary, every point of CI is a solution of Θ. By condition (iii) of the Slater constraint qualification for (SIP ), there exists x ˆ ∈ Rn such that g(ˆ x, i) < 0, ∀ i ∈ I, that is, x ˆ ∈ CI . By the conditions (i) and (ii) of the Slater constraint qualification for (SIP ), g(ˆ x) < 0. Therefore, in particular, taking y ∈ CIb and x = x ˆ, the condition (11.40) becomes hξ, x ˆ − yi ≤ g(ˆ x) < 0, ∀ ξ ∈ ∂g(y)
(11.41)
for every y ∈ CIb . Also, in particular, taking y = y¯ ∈ CIb and x = y¯ in the inequality (11.40), the relation holds with equality. From the above discussion, it is obvious that the conditions of Proposition 11.15 are satisfied and thus, CI coincides with the solution set of (LSIS) e that is, Θ, CI = {x ∈ Rn : hξ, xi ≤ hξ, yi, ∀ y ∈ CIb , ∀ ξ ∈ ∂g(y)}.
(11.42)
e is canonically closed and hence is an FM system. We now show that Θ By the condition (11.41), hξ, x ˆi < hξ, yi, ∀ y ∈ CIb , ∀ ξ ∈ ∂g(y)
e and thus, the condition of (ii) of Definition 11.13 is satisfied. Therefore, for Θ to be a canonically closed system, we need to show that the set e = {(ξ, hξ, yi), y ∈ CIb , ξ ∈ ∂g(y)} K
© 2012 by Taylor & Francis Group, LLC
394
Convex Semi-Infinite Optimization
is compact. As C_I is bounded and C_I^b ⊂ C_I, C_I^b is bounded. Also, by conditions (i) and (ii) of the Slater constraint qualification for (SIP), the supremum is attained over I. Therefore, as dom g(., i) = Rn for every i ∈ I, dom g = Rn, and so by Theorem 2.69 g is continuous over Rn. Thus, for a sequence {y_k} ⊂ C_I^b with y_k → ȳ, g(y_k) → g(ȳ). Also, as g(y_k) = 0 for every k ∈ N, g(ȳ) = 0, which implies ȳ ∈ C_I^b. Hence C_I^b is closed, thereby yielding the compactness of C_I^b. By Proposition 2.85,

∂g(C_I^b) = {ξ ∈ Rn : ξ ∈ ∂g(y), y ∈ C_I^b} = ∪_{y∈C_I^b} ∂g(y)

is compact.

Now consider a convergent sequence {(ξ_k, ⟨ξ_k, y_k⟩)} ⊂ K̃, where {y_k} ⊂ C_I^b and ξ_k ∈ ∂g(y_k) ⊂ ∂g(C_I^b). Suppose that (ξ_k, ⟨ξ_k, y_k⟩) → (ξ̃, γ̃), that is, ξ_k → ξ̃ and ⟨ξ_k, y_k⟩ → γ̃. As ξ_k → ξ̃, the compactness of ∂g(C_I^b) implies that ξ̃ ∈ ∂g(C_I^b). Because {y_k} ⊂ C_I^b, {y_k} is a bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. Without loss of generality, assume that y_k → ỹ, which by the compactness of C_I^b leads to ỹ ∈ C_I^b. As ⟨ξ_k, y_k⟩ → γ̃, the convergence of {ξ_k} and {y_k} implies that γ̃ = ⟨ξ̃, ỹ⟩. Because ξ_k ∈ ∂g(y_k) with ξ_k → ξ̃ and y_k → ỹ, by the Closed Graph Theorem, Theorem 2.84, ξ̃ ∈ ∂g(ỹ). Thus (ξ̃, ⟨ξ̃, ỹ⟩) ∈ K̃, thereby yielding the closedness of K̃.

By the compactness of C_I^b and ∂g(C_I^b), ‖y‖ ≤ M_1 for every y ∈ C_I^b and ‖ξ‖ ≤ M_2 for every ξ ∈ ∂g(C_I^b), respectively. Therefore, for any (ξ, ⟨ξ, y⟩) ∈ K̃, the Cauchy–Schwarz inequality, Proposition 1.1, yields

‖ξ‖² + |⟨ξ, y⟩| ≤ ‖ξ‖² + ‖ξ‖ ‖y‖ ≤ M_2(M_1 + M_2)

and hence K̃ is bounded. Therefore, K̃ is compact, thus implying that the system Θ̃ is canonically closed, which by Proposition 11.14 yields that Θ̃ is an FM system.

Next we claim that the (LSIS)

Θ = {⟨ξ, x − y⟩ ≤ −g(y, i) : (y, i) ∈ Rn × I, ξ ∈ ∂g(y, i)}

is equivalent to Θ̃, that is, both Θ and Θ̃ have the same solution set. To establish this claim, we will prove that C_I is the solution set of Θ. For any (y, i) ∈ Rn × I and ξ_i ∈ ∂g(y, i), by Definition 2.77 of the subdifferential,

⟨ξ_i, x − y⟩ ≤ g(x, i) − g(y, i), ∀ x ∈ Rn.   (11.43)

In particular, taking x ∈ C_I, that is, g(x, i) ≤ 0 for every i ∈ I,
the inequality (11.43) reduces to

⟨ξ_i, x − y⟩ ≤ −g(y, i), ∀ (y, i) ∈ Rn × I, ∀ ξ_i ∈ ∂g(y, i),

which implies that x is a solution of the (LSIS) Θ. Because x ∈ C_I was arbitrary, every point of C_I is a solution of Θ. By condition (iii) of the Slater constraint qualification, there exists x̂ ∈ Rn such that g(x̂, i) < 0 for every i ∈ I. In particular, taking x = x̂ in (11.43) yields that for every (y, i) ∈ Rn × I,

⟨ξ_i, x̂ − y⟩ ≤ g(x̂, i) − g(y, i) < −g(y, i), ∀ ξ_i ∈ ∂g(y, i).

Also, taking y = ỹ ∈ C_I^b, where C_I^b = {y ∈ Rn : there exists some i ∈ I such that g(y, i) = 0}, along with x = ỹ and ĩ ∈ I(ỹ) in the condition (11.43) leads to

⟨ξ_ĩ, ỹ − ỹ⟩ = 0 = −g(ỹ, ĩ), ∀ ξ_ĩ ∈ ∂g(ỹ, ĩ).

As the conditions of Proposition 11.15 are satisfied,

C_I = {x ∈ Rn : ⟨ξ, x − y⟩ ≤ −g(y, i), ∀ (y, i) ∈ Rn × I, ∀ ξ ∈ ∂g(y, i)},   (11.44)

that is, C_I is the solution set of the (LSIS) Θ.

From the conditions (11.42) and (11.44), both Θ̃ and Θ are equivalent (LSIS). Because Θ̃ is an FM system, Θ is also an FM system, which along with Definition 11.11 yields that (SIP) satisfies the FM qualification condition, thereby establishing the requisite result.
11.6 Noncompact Scenario: An Alternate Approach
In this section we discuss the recent epigraphical approach, or more precisely the sequential approach studied in Chapter 7, as a tool to establish the KKT optimality conditions for (SIP). This approach has been studied for convex programming problems with infinite constraints by Jeyakumar [66, 67] and Goberna, Jeyakumar, and López [49]. Here we will present the KKT optimality conditions for (SIP) from the work of Dinh, Mordukhovich, and Nghia [33] under the following relaxed closed cone constraint qualification for (SIP), that is,

cone co ∪_{i∈I} epi g*(., i) is closed.
But before establishing the optimality conditions for (SIP) as a consequence of the optimality conditions expressed in terms of the epigraph of the conjugate functions, we present a result from Jeyakumar [67].

Proposition 11.17 Consider an lsc proper convex function φ : Rn → R̄ and define F = {x ∈ Rn : φ(x) ≤ 0}. If F is nonempty, then

epi δ_F* = cl cone co epi φ*.

Proof. Suppose that F is nonempty. From the definition of the indicator function δ_F of the set F,

φ(x) ≤ 0 = δ_F(x) for x ∈ F   and   φ(x) ≤ +∞ = δ_F(x) for x ∉ F.

Therefore, φ(x) ≤ δ_F(x) for every x ∈ Rn. By Proposition 2.103,

δ_F*(ξ) ≤ φ*(ξ), ∀ ξ ∈ Rn.   (11.45)
We claim that cl cone co epi φ* ⊂ epi δ_F*. By Definition 2.101 of the conjugate function, δ_F* is the same as the support function of the set F, that is, δ_F* = σ_F. By Proposition 2.102, δ_F* is lsc, hence by Theorem 1.9, epi δ_F* is closed. Also, as σ_F is a sublinear function, by Theorem 2.59 epi σ_F is a convex cone. So it is sufficient to establish that epi φ* ⊂ epi δ_F*. Consider any (ξ, α) ∈ epi φ*, which by condition (11.45) implies that δ_F*(ξ) ≤ φ*(ξ) ≤ α. Therefore, (ξ, α) ∈ epi δ_F*. As (ξ, α) ∈ epi φ* was arbitrary, epi φ* ⊂ epi δ_F*. Because epi δ_F* is a closed convex cone,

cl cone co epi φ* ⊂ epi δ_F*.   (11.46)
To complete the proof, we will prove the converse inclusion, that is, epi δ_F* ⊂ cl cone co epi φ*. Suppose that (ξ, α) ∉ cl cone co epi φ*. As δ_F* = σ_F is a sublinear function with δ_F*(0) = 0, we have (0, −1) ∉ epi δ_F*, which by the relation (11.46) implies that (0, −1) ∉ cl cone co epi φ*. Define the convex set

F̃ = {(1 − λ)(ξ, α) + λ(0, −1) ∈ Rn × R : λ ∈ [0, 1]}.

We claim that

F̃ ∩ cl cone co epi φ* = ∅.

On the contrary, suppose that there exists λ̃ ∈ (0, 1) such that

(1 − λ̃)(ξ, α) + λ̃(0, −1) ∈ cl cone co epi φ*.   (11.47)
We claim that {0} × R+ ⊂ cl cone co epi φ*. To establish this fact, it is sufficient to show that (0, 1) ∈ cl cone co epi φ*. On the contrary, suppose that (0, 1) ∉ cl cone co epi φ*. Then by the Strict Separation Theorem, Theorem 2.26 (iii), there exists (a, γ) ∈ Rn × R with (a, γ) ≠ (0, 0) such that

⟨a, ξ⟩ + γα > γ, ∀ (ξ, α) ∈ cl cone co epi φ*.   (11.48)

As (0, 0) ∈ cl cone co epi φ*, γ < 0. We will show that

⟨a, ξ⟩ + γα ≥ 0 > γ, ∀ (ξ, α) ∈ cl cone co epi φ*.

On the contrary, suppose that there exists (ξ, α) ∈ cl cone co epi φ* such that

0 > ⟨a, ξ⟩ + γα > γ.   (11.49)

For any λ > 0, λ(ξ, α) ∈ cl cone co epi φ*, which by the conditions (11.48) and (11.49) implies that

0 > λ(⟨a, ξ⟩ + γα) > γ.

Taking the limit as λ → +∞ in the above inequality, λ(⟨a, ξ⟩ + γα) → −∞, which is a contradiction. Therefore,

⟨a, ξ⟩ + γα ≥ 0 > γ, ∀ (ξ, α) ∈ cl cone co epi φ*.   (11.50)

Consider any ξ ∈ dom φ* and ε > 0. Thus, (ξ, φ*(ξ) + ε) ∈ cl cone co epi φ*. Therefore, from the condition (11.50), ⟨a, ξ⟩ + γ(φ*(ξ) + ε) ≥ 0, which implies

(1/ε)(⟨a, ξ⟩ + γφ*(ξ)) + γ ≥ 0.

Taking the limit as ε → +∞ in the above inequality yields γ ≥ 0, which along with (11.50) gives 0 > γ ≥ 0, a contradiction. Thus, (0, 1) ∈ cl cone co epi φ* and hence

{0} × R+ = cone (0, 1) ⊂ cl cone co epi φ*.   (11.51)

From the relations (11.47) and (11.51),

(1 − λ̃)(ξ, α) = (1 − λ̃)(ξ, α) + λ̃(0, −1) + (0, λ̃) ∈ cl cone co epi φ*,
which implies

(ξ, α) = (1/(1 − λ̃)) · (1 − λ̃)(ξ, α) ∈ cl cone co epi φ*,

thereby contradicting our assumption. Thus

F̃ ∩ cl cone co epi φ* = ∅.
As F̃ is a compact convex set and cl cone co epi φ* is a closed convex cone, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists (a, γ) ∈ Rn × R with (a, γ) ≠ (0, 0) such that

⟨a, z⟩ + γβ > ⟨a, z̃⟩ + γβ̃   (11.52)

for every (z, β) ∈ cl cone co epi φ* and (z̃, β̃) ∈ F̃. As (0, 0) ∈ cl cone co epi φ*,

0 > ⟨a, z⟩ + γβ, ∀ (z, β) ∈ F̃.

Also, as (0, −1), (ξ, α) ∈ F̃, from condition (11.52),

γ > 0   and   ⟨a, ξ⟩ + γα < 0.   (11.53)

Repeating the discussion as before, we can show that

⟨a, z⟩ + γβ ≥ 0 > ⟨a, z̃⟩ + γβ̃

for every (z, β) ∈ cl cone co epi φ* and (z̃, β̃) ∈ F̃. For any ξ ∈ dom φ*, (ξ, φ*(ξ)) ∈ cl cone co epi φ*, which by the above inequality implies that

⟨a, ξ⟩ + γφ*(ξ) ≥ 0, ∀ ξ ∈ dom φ*.   (11.54)
Because φ is lsc, by Theorem 2.105, φ = φ**. Therefore, by the conditions (11.53) and (11.54),

φ(−a/γ) = φ**(−a/γ) = sup_{ξ∈Rn} {⟨ξ, −a/γ⟩ − φ*(ξ)} ≤ 0,

which implies that −a/γ ∈ F. Again using the condition (11.53),

δ_F*(ξ) = σ_F(ξ) ≥ ⟨ξ, −a/γ⟩ > α,

which implies (ξ, α) ∉ epi δ_F*, thereby establishing the desired result.
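To see Proposition 11.17 at work on a concrete instance, the following one-dimensional sketch may help (the function, grid, and all numerical choices are illustrative assumptions, not taken from the text). With φ(x) = x² − 1 we get F = [−1, 1], δ_F*(u) = σ_F(u) = |u| and φ*(u) = u²/4 + 1, and generating the cone over epi φ* numerically recovers epi |·| up to closure.

```python
import numpy as np

# Illustrative one-dimensional check of Proposition 11.17:
#   phi(x) = x**2 - 1,  F = {x : phi(x) <= 0} = [-1, 1],
#   delta_F^*(u) = sigma_F(u) = |u|,  phi^*(u) = u**2/4 + 1.
phi_conj = lambda u: u**2 / 4.0 + 1.0
sigma_F = lambda u: abs(u)

# The lower boundary of cone(epi phi^*) at u is the infimum over lam > 0 of
# lam * phi^*(u / lam); by the proposition its closure should match |u|.
lams = np.linspace(1e-6, 50.0, 200000)
for u in (-4.0, -1.0, 0.0, 0.5, 3.0):
    cone_val = np.min(lams * phi_conj(u / lams))
    print(f"u = {u:5}: inf over the cone = {cone_val:.4f},   sigma_F(u) = {sigma_F(u):.4f}")
```

The printed values agree to grid accuracy, reflecting the equality epi δ_F* = cl cone co epi φ*; the closure is what allows the value at u = 0 to be approached but not attained inside the cone itself.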
Now we move on to derive the optimality conditions in epigraphical form. Similar results have been studied in the form of generalized Farkas' Lemma in Dinh, Goberna, and López [31] and Dinh, Goberna, López, and Son [32].
Theorem 11.18 Consider the convex semi-infinite programming problem (SIP). Then x̄ is a point of minimizer of (SIP) if and only if

(0, −f(x̄)) ∈ epi f* + cl cone co ∪_{i∈I} epi g*(., i).   (11.55)

Proof. Suppose that x̄ is a point of minimizer of (SIP) and hence of the following unconstrained problem:

inf f(x) + δ_{C_I}(x)   subject to   x ∈ Rn.

Therefore, by Theorem 2.89, 0 ∈ ∂(f + δ_{C_I})(x̄), which by Theorem 2.108 and the fact that x̄ ∈ C_I implies that

f(x̄) + (f + δ_{C_I})*(0) = ⟨0, x̄⟩ = 0.

Therefore, the above condition leads to

(0, −f(x̄)) ∈ epi (f + δ_{C_I})*.

As dom f = Rn, by Theorem 2.69, f is continuous over Rn. Thus, by Proposition 2.124,

(0, −f(x̄)) ∈ epi f* + epi δ_{C_I}*.   (11.56)

Define the supremum function g(x) = sup_{i∈I} g(x, i), which implies that C_I = {x ∈ Rn : g(x) ≤ 0}. Because x̄ ∈ C_I, C_I is nonempty. Invoking Proposition 11.17, the condition (11.56) yields

(0, −f(x̄)) ∈ epi f* + cl cone co epi g*.

Applying Theorem 2.123 to the above relation leads to

(0, −f(x̄)) ∈ epi f* + cl cone co ∪_{i∈I} epi g*(., i),

thereby leading to the desired condition.

Conversely, suppose that the epigraphical condition (11.55) is satisfied, which implies that there exist ξ ∈ dom f*, α ≥ 0, λ_i^k ≥ 0, ξ_i^k ∈ dom g*(., i) and α_i^k ≥ 0 for i ∈ I such that

(0, −f(x̄)) = (ξ, f*(ξ) + α) + lim_{k→∞} Σ_{i∈I} λ_i^k (ξ_i^k, g*(ξ_i^k, i) + α_i^k).
As cone co ∪_{i∈I} epi g*(., i) ⊂ Rn+1, by the Carathéodory Theorem, Theorem 2.8, any element in the convex cone can be expressed as a sum of n + 2 elements from ∪_{i∈I} epi g*(., i). Therefore, the above condition becomes

(0, −f(x̄)) = (ξ, f*(ξ) + α) + lim_{k→∞} Σ_{j=1}^{n+2} λ_{i_j}^k (ξ_{i_j}^k, g*(ξ_{i_j}^k, i_j) + α_{i_j}^k),

where i_j ∈ I, j = 1, 2, . . . , n + 2. Componentwise comparison leads to

0 = ξ + lim_{k→∞} Σ_{j=1}^{n+2} λ_{i_j}^k ξ_{i_j}^k,   (11.57)

−f(x̄) = f*(ξ) + α + lim_{k→∞} Σ_{j=1}^{n+2} λ_{i_j}^k (g*(ξ_{i_j}^k, i_j) + α_{i_j}^k).   (11.58)
By Definition 2.101 of the conjugate function, condition (11.58) yields, for every x ∈ Rn,

f(x̄) − f(x) ≤ −⟨ξ, x⟩ − α − lim_{k→∞} Σ_{j=1}^{n+2} λ_{i_j}^k (g*(ξ_{i_j}^k, i_j) + α_{i_j}^k)
           ≤ −⟨ξ, x⟩ − α − lim_{k→∞} Σ_{j=1}^{n+2} λ_{i_j}^k (⟨ξ_{i_j}^k, x⟩ − g(x, i_j) + α_{i_j}^k).

Using condition (11.57), for every x ∈ C_I the above inequality leads to

f(x̄) − f(x) ≤ −α − lim_{k→∞} Σ_{j=1}^{n+2} λ_{i_j}^k α_{i_j}^k,

which by the nonnegativity of α and α_{i_j}^k, j = 1, 2, . . . , n + 2, yields

f(x̄) ≤ f(x), ∀ x ∈ C_I.

Thus, x̄ is a point of minimizer of (SIP), thereby establishing the result.

We end this chapter by presenting the KKT optimality condition for (SIP) from Dinh, Mordukhovich, and Nghia [33]. But before that, we define the set of active constraint multipliers as

A(x̄) = {λ ∈ R+^[I] : λ_i g(x̄, i) = 0, ∀ i ∈ supp λ}.

Theorem 11.19 Consider the convex semi-infinite programming problem (SIP). Assume that the closed cone constraint qualification holds. Then x̄ is a point of minimizer of (SIP) if and only if there exists λ ∈ A(x̄) such that

0 ∈ ∂f(x̄) + Σ_{i∈supp λ} λ_i ∂g(x̄, i).
Proof. By Theorem 11.18, x̄ is a point of minimizer of (SIP) if and only if condition (11.55) is satisfied. As the closed cone constraint qualification is satisfied, (11.55) reduces to

(0, −f(x̄)) ∈ epi f* + cone co ∪_{i∈I} epi g*(., i).

By Theorem 2.122, there exist ξ ∈ ∂_ε f(x̄), ε ≥ 0, λ_i ≥ 0, ξ_i ∈ ∂_{ε_i} g(x̄, i) and ε_i ≥ 0, i ∈ I, such that

(0, −f(x̄)) = (ξ, ⟨ξ, x̄⟩ − f(x̄) + ε) + Σ_{i∈I} λ_i (ξ_i, ⟨ξ_i, x̄⟩ − g(x̄, i) + ε_i).

Componentwise comparison leads to

0 = ξ + Σ_{i∈I} λ_i ξ_i,   (11.59)

−f(x̄) = (⟨ξ, x̄⟩ − f(x̄) + ε) + Σ_{i∈I} λ_i (⟨ξ_i, x̄⟩ − g(x̄, i) + ε_i).   (11.60)

Using the condition (11.59), (11.60) reduces to

0 = ε + Σ_{i∈I} λ_i (−g(x̄, i) + ε_i).

The above condition, along with the fact that x̄ ∈ C_I, that is, g(x̄, i) ≤ 0, i ∈ I, and the nonnegativity of ε, ε_i and λ_i, i ∈ I, implies that

ε = 0,   λ_i ε_i = 0   and   λ_i g(x̄, i) = 0, i ∈ I.

Thus, for i ∈ supp λ, ε_i = 0 and λ ∈ A(x̄). Therefore, ξ ∈ ∂f(x̄) and ξ_i ∈ ∂g(x̄, i), i ∈ supp λ, satisfying

0 = ξ + Σ_{i∈supp λ} λ_i ξ_i.

Therefore, for λ ∈ A(x̄),

0 ∈ ∂f(x̄) + Σ_{i∈supp λ} λ_i ∂g(x̄, i),   (11.61)

thereby yielding the KKT optimality condition for (SIP).

Conversely, suppose that (11.61) holds, which implies that there exist ξ ∈ ∂f(x̄) and ξ_i ∈ ∂g(x̄, i), i ∈ supp λ, such that

0 = ξ + Σ_{i∈supp λ} λ_i ξ_i.   (11.62)
By Definition 2.77 of the subdifferential, for every x ∈ Rn,

f(x) ≥ f(x̄) + ⟨ξ, x − x̄⟩,
g(x, i) ≥ g(x̄, i) + ⟨ξ_i, x − x̄⟩, i ∈ supp λ,

which along with the condition (11.62) implies that

f(x) + Σ_{i∈supp λ} λ_i g(x, i) ≥ f(x̄) + Σ_{i∈supp λ} λ_i g(x̄, i), ∀ x ∈ Rn.

As λ ∈ A(x̄), λ_i g(x̄, i) = 0 for i ∈ supp λ, which for every x ∈ C_I reduces the above inequality to

f(x) ≥ f(x) + Σ_{i∈supp λ} λ_i g(x, i) ≥ f(x̄), ∀ x ∈ C_I.

Therefore, x̄ is a point of minimizer of (SIP), hence completing the proof.
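The KKT condition of Theorem 11.19 can be checked numerically on a small smooth instance. The following Python sketch is illustrative only (the instance, tolerances, and grid are assumptions, not from the text): it minimizes f(x) = ‖x − (1, 1)‖² over the unit disk written semi-infinitely as g(x, t) = x_1 cos t + x_2 sin t − 1 ≤ 0 for all t ∈ [0, 2π]. The minimizer is x̄ = (1/√2, 1/√2), supp λ reduces to the single active index t = π/4, and one multiplier makes the condition hold.

```python
import numpy as np

# Illustrative semi-infinite instance: minimize f(x) = ||x - (1,1)||^2 subject to
#   g(x, t) = x1*cos(t) + x2*sin(t) - 1 <= 0   for all t in [0, 2*pi]  (unit disk).
f_grad = lambda x: 2.0 * (x - np.array([1.0, 1.0]))
g = lambda x, t: x[0] * np.cos(t) + x[1] * np.sin(t) - 1.0
g_grad = lambda x, t: np.array([np.cos(t), np.sin(t)])

x_bar = np.array([1.0, 1.0]) / np.sqrt(2.0)      # candidate minimizer
ts = np.linspace(0.0, 2.0 * np.pi, 2001)

# Feasibility and the (single) active index t = pi/4.
assert max(g(x_bar, t) for t in ts) <= 1e-9
active = [t for t in ts if abs(g(x_bar, t)) <= 1e-6]

# KKT of Theorem 11.19 with supp(lambda) = {pi/4}:
#   0 = grad f(x_bar) + lam * grad g(x_bar, pi/4).
t_star = np.pi / 4.0
lam = -f_grad(x_bar)[0] / g_grad(x_bar, t_star)[0]
residual = f_grad(x_bar) + lam * g_grad(x_bar, t_star)

print("active index set :", active)        # approximately [pi/4]
print("multiplier       :", lam)           # approximately 2*sqrt(2) - 2 > 0
print("KKT residual     :", residual)      # approximately (0, 0)
```

Since the computed multiplier is positive and g(x̄, π/4) = 0, it belongs to the active multiplier set A(x̄), so the displayed condition certifies x̄ as a minimizer of this particular instance.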
Chapter 12 Convexity in Nonconvex Optimization
12.1 Introduction
This is the final chapter of the book. What we want to discuss here is essentially outside the purview of convex optimization. Yet, as we will see, convexity will play a fundamental role in the issues discussed. We will discuss here two major areas in nonconvex optimization, namely maximization of a convex function and minimization of a d.c. function. The acronym d.c. stands for difference convex function, that is, functions expressed as the difference of two convex functions. Thus, more precisely, we would look into the following problems:

max f(x)   subject to   x ∈ C,   (P1)

min f(x) − g(x)   subject to   x ∈ C,   (P2)

where f, g : Rn → R are convex functions and C ⊂ Rn is a convex set. A large class of nonconvex optimization problems actually comes into this setting. Note that (P1) can be posed as

min −f(x)   subject to   x ∈ C

and thus as

min φ(x) − f(x)   subject to   x ∈ C,

where φ is the zero function. Thus the problem (P1) can also be viewed as a special case of (P2), though we will consider them separately for a better understanding.
12.2 Maximization of a Convex Function
The problem of maximizing a convex function over a convex set is a complete paradigm shift from that of minimization of a convex function over a convex
set. The problem of maximization of a convex function is, in fact, a hard nonconvex minimization problem. One of the early results in this direction appears in the classic text of Rockafellar [97] and we will mention a few of them here in order to motivate the reader. The first point that the reader should note is that local maxima of a convex function need not be global maxima. We leave it to the reader to create some examples that bring out this fact. The following result is given in Rockafellar [97]. We will not provide any proof; see Rockafellar [97] for the proof.

Theorem 12.1 Consider a convex function f : Rn → R and a convex set C ⊂ Rn. If f attains its supremum relative to C at some point in ri C, then f is constant on C.

The above theorem says that if f is a nonconstant convex function and if it attains its supremum on C, then it must be attained at the boundary. Of course, the more interesting question is when does the convex function actually attain its maximum? In this respect, one has the following interesting result from [97], where the set C is assumed to be polyhedral.

Theorem 12.2 Consider a convex function f : Rn → R and a convex set C ⊂ Rn that is polyhedral. Suppose that there are no half lines in C on which f is unbounded above. Then f attains its supremum over C.

For more general results, see [97]. One of the earliest papers dealing exclusively with the optimality conditions for maximizing a convex function over a convex set is due to Strekalovskii [104]. Though Strekalovskii [104] frames his problem in a general setting, his results are essentially useful for the convex case and the main results in his paper are given only for the convex case. Observe that if f : Rn → R is a convex function and x̄ ∈ C is the point where f attains a global maximum, then for every x ∈ C,

0 ≥ f(x) − f(x̄) ≥ ⟨ξ, x − x̄⟩, ∀ ξ ∈ ∂f(x̄),

which implies

⟨ξ, x − x̄⟩ ≤ 0, ∀ ξ ∈ ∂f(x̄), ∀ x ∈ C.

Thus the necessary condition is ∂f(x̄) ⊂ N_C(x̄). The reader should try to find a necessary condition when x̄ is a local maximum. Can we find a sufficient condition for a global maximum? Strekalovskii [104] attempts to answer this question by developing a set of necessary and sufficient conditions.

Theorem 12.3 Consider the problem of maximizing a convex function f over a closed convex set C. Assume that x̄ ∈ C is a point such that

−∞ < inf_{x∈Rn} f(x) < f(x̄) < +∞
and the set C̄ = {x ∈ Rn : f(x) ≤ f(x̄)} is compact with a nonempty interior, that is, int C̄ ≠ ∅. Then x̄ ∈ C is a global maximum of f on C if and only if

(a) for every x* ∈ ∂f(x̄), ⟨x*, x − x̄⟩ ≤ 0, ∀ x ∈ C, or

(b) for every y* ∈ S(f, x̄), ⟨y*, x − x̄⟩ ≤ 1, ∀ x ∈ C,

where S(f, x̄) = {y* ∈ Rn : ∃ y ∈ Rn, y ≠ x̄, f(y) = f(x̄) and ∃ α > 0, αy* ∈ ∂f(y) satisfying ⟨y*, y − x̄⟩ = 1}.

Proof. We will only prove (a) and leave (b) to the readers. If x̄ is a global maximum, then our discussion preceding the theorem shows that (a) holds, that is, the condition in (a) is necessary. Now we will look into the reverse direction, that is, whether (a) is sufficient for a global maximum or not. Observe that under the given hypothesis, for every x* ∈ ∂f(x̄),

⟨x*, x − x̄⟩ ≤ 0, ∀ x ∈ C.

As dom f = Rn, by Theorem 2.69, f is a continuous convex function, thus the set C̄ is closed and convex. Also, from the above inequality, cone ∂f(x̄) ⊂ N_C(x̄). Further, as C̄ has a nonempty interior, there exists x̂ such that f(x̂) < f(x̄). Hence

N_{C̄}(x̄) = {λξ : λ ≥ 0, ξ ∈ ∂f(x̄)},

that is, N_{C̄}(x̄) = cone ∂f(x̄). This shows that N_{C̄}(x̄) ⊂ N_C(x̄), which implies that C ⊂ C̄. Hence x̄ is the point where the maximum is achieved, as x̄ is already given to be an element of C.

It is important to note that without the additional conditions, ∂f(x̄) ⊂ N_C(x̄) does not render a global maximum. Here we put forward an example from Dutta [38]. Consider f : R → R defined as f(x) = max{x², x}. Now suppose that we want to maximize f over C = [−1, 0]. Consider x̄ = 0. Thus N_C(x̄) = R+ = {x ∈ R : x ≥ 0}. Observe that ∂f(0) = [0, 1]. Therefore, ∂f(0) ⊂ N_C(0). However, x̄ = 0 is a global minimizer of f over C and not a global maximizer.
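A quick numerical look at this example can be instructive; the following sketch is purely illustrative (the grid and the function encodings are assumptions, not from the text) and simply confirms that the maximum of f over C = [−1, 0] sits at x = −1, not at the candidate x̄ = 0 where the inclusion ∂f(0) ⊂ N_C(0) nevertheless holds.

```python
import numpy as np

# Illustrative check of the example above: f(x) = max{x**2, x} on C = [-1, 0].
f = lambda x: max(x**2, x)

xs = np.linspace(-1.0, 0.0, 10001)
vals = np.array([f(x) for x in xs])

print("f at the candidate x_bar = 0:", f(0.0))            # 0.0
print("max of f over C             :", vals.max())        # 1.0
print("attained at                 :", xs[vals.argmax()]) # -1.0
# At x_bar = 0 one has ∂f(0) = [0, 1] ⊂ N_C(0) = [0, +inf), yet 0 is the
# minimizer of f over C, which is why Theorem 12.3 needs its extra hypotheses.
```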
Strekalovskii refined the above result slightly to provide the following result, which appeared in [105].

Theorem 12.4 Consider a closed convex set C ⊂ Rn and let x̄ ∈ C. Assume that

−∞ ≤ inf_{x∈Rn} f(x) < f(x̄),

where f : Rn → R is a convex function. Then x̄ ∈ C is a global maximum for (P1) if and only if

∂f(x) ⊂ N_C(x), ∀ x ∈ Rn satisfying f(x) = f(x̄).

Readers are requested to have a look at the difference between Strekalovskii's result in Theorem 12.3 and this result. Though the above result is elegant, it suffers from a drawback, namely that one needs to calculate N_C(x) for every x ∈ Rn satisfying f(x) = f(x̄). Now if x ∉ C, then traditionally we define N_C(x) = ∅. However, for a convex function f : Rn → R, ∂f(x) ≠ ∅ for every x ∈ Rn. This drawback was overcome by Hiriart-Urruty and Ledyaev [61]. We now present their result but with a different approach to the proof.

Theorem 12.5 Consider a convex function f : Rn → R and a closed convex set C ⊂ Rn. Let x̄ ∈ C be such that

−∞ ≤ inf_{x∈C} f(x) < f(x̄).
Then x̄ ∈ C is a maximizer for (P1) if and only if

∂f(x) ⊂ N_C(x), ∀ x ∈ C satisfying f(x) = f(x̄).

Proof. If x̄ ∈ C is the global maximizer of the function f over C, then we have already seen that ∂f(x̄) ⊂ N_C(x̄). It is simple to see that if f(x) = f(x̄), then ∂f(x) ⊂ N_C(x). We leave this very simple proof to the reader.

Conversely, assume on the contrary that x̄ ∈ C is not a global maximizer of (P1). Therefore, there exists x̂ ∈ C such that f(x̂) > f(x̄). Consider the level set

S(x̄) = {x ∈ C : f(x) ≤ f(x̄)},

which is a closed convex set. It is clear that x̂ ∉ S(x̄). Thus, the projection problem

min (1/2)‖x − x̂‖²   subject to   f(x) ≤ f(x̄), x ∈ C

has a unique solution. Let x̃ ∈ C be that unique solution. Now using the Fritz John optimality conditions for a convex optimization problem, Theorem 5.1, there exist λ_0 ≥ 0 and λ_1 ≥ 0 with (λ_0, λ_1) ≠ (0, 0) such that

(i) 0 ∈ λ_0(x̃ − x̂) + λ_1 ∂f(x̃) + N_C(x̃),
(ii) λ_1(f(x̃) − f(x̄)) = 0.

Assume that λ_0 = 0, which implies λ_1 > 0. Thus the above conditions reduce to

0 ∈ λ_1 ∂f(x̃) + N_C(x̃)   and   f(x̃) = f(x̄).

The condition 0 ∈ λ_1 ∂f(x̃) + N_C(x̃) leads to the expression 0 ∈ ∂f(x̃) + N_C(x̃); this is obtained by dividing both sides by λ_1 and noting that N_C(x̃) is a cone. As f is convex, invoking Theorem 3.1, x̃ ∈ C is a point of minimizer of f over C, that is,

f(x̃) = inf_{x∈C} f(x).

The condition f(x̃) = f(x̄) along with the given hypothesis yields that

−∞ ≤ inf_{x∈C} f(x) < f(x̃),

thereby contradicting the fact that x̃ is the point of minimizer of f over C. Hence λ_0 > 0.

Now assume that λ_1 = 0. Therefore, the facts that λ_0 > 0 and N_C(x̃) is a cone yield that 0 ∈ (x̃ − x̂) + N_C(x̃), that is, x̂ − x̃ ∈ N_C(x̃). Because x̂ ∈ C,

0 ≥ ⟨x̂ − x̃, x̂ − x̃⟩ = ‖x̂ − x̃‖²,

implying that x̃ = x̂. This is indeed a contradiction. Hence λ_1 > 0. Thus there exist ξ ∈ ∂f(x̃) and η ∈ N_C(x̃) such that

0 = λ_0(x̃ − x̂) + λ_1 ξ + η.   (12.1)

As f(x̃) = f(x̄), by the given hypothesis, ∂f(x̃) ⊂ N_C(x̃), which implies

−⟨λ_1 ξ, x̂ − x̃⟩ ≥ 0.   (12.2)

Further, it is simple to see that

−⟨η, x̂ − x̃⟩ + λ_0 ‖x̂ − x̃‖² > 0.   (12.3)

The conditions (12.1), (12.2), and (12.3) lead to a contradiction, thereby establishing the result.
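The criterion of Theorem 12.5 is easy to test numerically in one dimension. The sketch below is an illustrative assumption (the function, the interval, the grid, and the tolerances are not from the text): for f(x) = x² on C = [−1, 2], the candidate x̄ = 2 passes the test, while x̄ = −1 fails because the point x = 1 lies on the same level set and ∂f(1) = {2} is not contained in N_C(1) = {0}.

```python
import numpy as np

# Illustrative check of the criterion of Theorem 12.5 for f(x) = x**2 on C = [-1, 2].
# For this smooth f the subdifferential is the singleton {2x}; the normal cone of
# the interval [a, b] is (-inf, 0] at a, {0} in the interior, and [0, +inf) at b.
a, b = -1.0, 2.0
df = lambda x: 2.0 * x

def in_normal_cone(xi, x, tol=1e-9):
    if abs(x - a) <= tol:
        return xi <= tol
    if abs(x - b) <= tol:
        return xi >= -tol
    return abs(xi) <= tol

def criterion(x_bar, grid):
    # Theorem 12.5: check ∂f(x) ⊂ N_C(x) at every x in C with f(x) = f(x_bar).
    level = [x for x in grid if abs(x**2 - x_bar**2) <= 1e-6]
    return all(in_normal_cone(df(x), x) for x in level)

grid = np.linspace(a, b, 3001)
print("criterion at x_bar =  2:", criterion(2.0, grid))   # True: global maximizer
print("criterion at x_bar = -1:", criterion(-1.0, grid))  # False: x = 1 violates it
```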
12.3 Minimization of d.c. Functions
In this section we will concentrate on deriving the optimality conditions for local and global minimization of a very important class of nonconvex problems. These problems are the ones where the objective function can be expressed as the difference of two convex functions. Such functions are referred to as difference convex functions or d.c. functions. Thus we will concentrate on the problem

min f(x) − g(x)   subject to   x ∈ C,   (P2)

where f, g : Rn → R are convex functions and C ⊂ Rn is a convex set. Note that f(x) − g(x) need not be a convex function unless g is a linear or affine function, so in general it is a nonconvex function. We begin by providing a necessary optimality condition for a local optimal point.

Theorem 12.6 Consider the problem (P2) and let x̄ be a local minimizer of (P2) where C = Rn. Then ∂f(x̄) ∩ ∂g(x̄) ≠ ∅.

Proof. Let x̄ be a local minimum. As f − g is locally Lipschitz, 0 ∈ ∂°(f − g)(x̄); for details, see Clarke [27] or Chapter 3. Hence, by the Sum Rule of the Clarke subdifferential,

0 ∈ ∂°f(x̄) + ∂°(−g)(x̄).

Noting that ∂°f(x̄) = ∂f(x̄) and ∂°(−g)(x̄) = −∂°g(x̄) = −∂g(x̄), the above condition becomes

0 ∈ ∂f(x̄) − ∂g(x̄).

This yields that ∂f(x̄) ∩ ∂g(x̄) ≠ ∅. We would again like to stress that for details on the Clarke subdifferential, see Clarke [27].

Let us note that the above condition is only necessary and not sufficient. Consider h(x) = f(x) − g(x), where f(x) = x² and g(x) = |x|. At x̄ = 0, ∂f(0) = {0} and ∂g(0) = [−1, 1]. Thus, ∂f(0) ∩ ∂g(0) = {0}. But it is clear that x̄ = 0 is not a local minimizer of h.
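A tiny numerical sketch (illustrative only; the radii and grid are assumptions) makes the failure of sufficiency visible: on every neighborhood of 0 the function h = f − g dips below h(0) = 0, so x̄ = 0 is not a local minimizer even though the necessary condition of Theorem 12.6 holds there.

```python
import numpy as np

# Illustrative check: h(x) = x**2 - |x| near the candidate x = 0.
h = lambda x: x**2 - abs(x)

for r in (1e-1, 1e-2, 1e-3):
    xs = np.linspace(-r, r, 1001)
    print(f"radius {r}: min of h on [-r, r] = {min(h(x) for x in xs):.6f}   (h(0) = 0)")
# Each radius gives a strictly negative minimum (about r**2 - r), confirming that
# 0 satisfies the necessary condition of Theorem 12.6 without being a local minimum.
```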
Now let us see what happens if we consider C ⊂ Rn. In this case, one would have

0 ∈ ∂°(f − g)(x̄) + N_C(x̄)

(see Clarke [27] for more details). Hence,

0 ∈ ∂f(x̄) − ∂g(x̄) + N_C(x̄).

Thus there exist ξ_f ∈ ∂f(x̄), ξ_g ∈ ∂g(x̄) and η ∈ N_C(x̄) such that ξ_g = ξ_f + η. Thus, the optimality condition can now be stated as follows: if x̄ is a local minimum for (P2), then there exists ξ_g ∈ ∂g(x̄) such that

ξ_g ∈ ∂f(x̄) + N_C(x̄).

For C = Rn, if x̄ is a global minimum for (P2),

f(x) − g(x) ≥ f(x̄) − g(x̄), ∀ x ∈ Rn.

Therefore,

f(x) − f(x̄) ≥ g(x) − g(x̄) ≥ ⟨ξ_g, x − x̄⟩, ∀ ξ_g ∈ ∂g(x̄),

thereby implying that ∂g(x̄) ⊂ ∂f(x̄). Note that this is again a necessary condition and not sufficient. We urge the reader to find an example demonstrating this fact.

We now present interesting and important necessary and sufficient optimality conditions for the global optimization of problem (P2). Here the optimality conditions will be expressed in terms of the ε-subdifferential. We present this result as given in Bomze [15].

Theorem 12.7 Consider the problem (P2) with C = Rn. Then x̄ ∈ Rn is a global minimizer of (P2) if and only if

∂_ε g(x̄) ⊂ ∂_ε f(x̄), ∀ ε > 0.

Proof. As x̄ ∈ Rn is a global minimizer of f − g over Rn,

f(x) − f(x̄) ≥ g(x) − g(x̄), ∀ x ∈ Rn.

If ξ ∈ ∂_ε g(x̄) for any ε > 0,

f(x) − f(x̄) ≥ g(x) − g(x̄) ≥ ⟨ξ, x − x̄⟩ − ε, ∀ x ∈ Rn,
thereby implying that ξ ∈ ∂_ε f(x̄). Because ε > 0 was arbitrary, this establishes that ∂_ε g(x̄) ⊂ ∂_ε f(x̄) for every ε > 0.

Let us now look at the converse. On the contrary, assume that x̄ is not a global minimizer of (P2), which implies that there exists x̂ ∈ Rn such that

f(x̂) − g(x̂) < f(x̄) − g(x̄).

This yields that f(x̄) − f(x̂) − g(x̄) + g(x̂) > 0. Set δ = (1/2)(f(x̄) − f(x̂) − g(x̄) + g(x̂)). It is simple to see that δ > 0. Now consider ξ̂ ∈ ∂g(x̂), which implies that

g(x̄) − g(x̂) − ⟨ξ̂, x̄ − x̂⟩ ≥ 0.

Because δ > 0,

g(x̄) − g(x̂) − ⟨ξ̂, x̄ − x̂⟩ + δ > 0.

Set ε = g(x̄) − g(x̂) − ⟨ξ̂, x̄ − x̂⟩ + δ. Then for any x ∈ Rn,

⟨ξ̂, x − x̄⟩ − ε = ⟨ξ̂, x − x̂ + x̂ − x̄⟩ − ε = ⟨ξ̂, x − x̂⟩ − δ + g(x̂) − g(x̄).

As ξ̂ ∈ ∂g(x̂), it is clear that ξ̂ ∈ ∂_δ g(x̂), which leads to

⟨ξ̂, x − x̂⟩ − δ + g(x̂) ≤ g(x).

Thus,

⟨ξ̂, x − x̄⟩ − ε ≤ g(x) − g(x̄), ∀ x ∈ Rn,

thereby implying that ξ̂ ∈ ∂_ε g(x̄). By the given hypothesis, ξ̂ ∈ ∂_ε f(x̄). Therefore, in particular for x = x̂,

f(x̂) − f(x̄) ≥ ⟨ξ̂, x̂ − x̄⟩ − ε.

Now

2δ = f(x̄) − f(x̂) − (g(x̄) − g(x̂)) ≤ ε − ⟨ξ̂, x̂ − x̄⟩ − (g(x̄) − g(x̂)).

The way in which ε is defined leads to

ε − (g(x̄) − g(x̂)) = δ + ⟨ξ̂, x̂ − x̄⟩.
Hence,

2δ ≤ δ + ⟨ξ̂, x̂ − x̄⟩ − ⟨ξ̂, x̂ − x̄⟩ = δ < 2δ,

which is a contradiction. Thus, x̄ is indeed a global solution for (P2).
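The inclusion test of Theorem 12.7 can also be explored numerically in one dimension. The sketch below is illustrative only (the functions, grid, tolerance, and choice of ε values are assumptions, not from the text): it approximates the ε-subdifferentials of f(x) = x² and g(x) = |x| as intervals and checks the inclusion at the global minimizer x̄ = 1/2 of f − g and at the non-minimizer x̄ = 0.

```python
import numpy as np

# Illustrative 1-d check of Theorem 12.7 for f(x) = x**2, g(x) = |x|;
# the d.c. function f - g attains its global minimum at x = +/- 1/2.
f = lambda x: x**2
g = lambda x: np.abs(x)
xs = np.linspace(-3.0, 3.0, 6001)

def eps_subdiff(phi, x_bar, eps):
    """Approximate the eps-subdifferential of a convex phi: R -> R at x_bar as [l, u],
    using the defining inequality phi(x) >= phi(x_bar) + xi*(x - x_bar) - eps."""
    right = xs[xs > x_bar + 1e-12]
    left = xs[xs < x_bar - 1e-12]
    u = np.min((phi(right) - phi(x_bar) + eps) / (right - x_bar))
    l = np.max((phi(left) - phi(x_bar) + eps) / (left - x_bar))
    return l, u

def inclusion_holds(x_bar, eps, tol=1e-2):
    lg, ug = eps_subdiff(g, x_bar, eps)
    lf, uf = eps_subdiff(f, x_bar, eps)
    return lf - tol <= lg and ug <= uf + tol

for x_bar in (0.5, 0.0):
    checks = [inclusion_holds(x_bar, eps) for eps in (0.01, 0.05, 0.1, 0.2)]
    print(f"x_bar = {x_bar}: inclusion of eps-subdifferentials holds? {checks}")
# Expected: all True at the global minimizer x_bar = 0.5, and failures for these
# small eps at x_bar = 0, which is not a global minimizer of f - g.
```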
Note that the above result also holds true if we assume f : Rn → R ∪ {+∞} and g : Rn → R. In that case, one just has to assume that x̄ ∈ dom f. The reader is encouraged to sketch the proof for such a scenario. However, we present the result here for the sake of convenience.

Theorem 12.8 Consider the problem (P2) with C = Rn and a lower semicontinuous convex function f : Rn → R ∪ {+∞} with dom f ≠ ∅. Let x̄ ∈ dom f. Then x̄ is a global minimum for (P2) if and only if

∂_ε g(x̄) ⊂ ∂_ε f(x̄), ∀ ε > 0.

Using the above result, one can deduce optimality conditions for the case when C ⊂ Rn and f : Rn → R. Observe that when C ⊂ Rn and f : Rn → R, the problem (P2) can be equivalently written as

min (f + δ_C)(x) − g(x)   subject to   x ∈ Rn.

Hence, x̄ is a global minimum for (P2) if and only if

∂_ε g(x̄) ⊂ ∂_ε (f + δ_C)(x̄), ∀ ε > 0.

This is done, of course, by applying Theorem 12.8. Invoking the Sum Rule of the ε-subdifferential, Theorem 2.115,

∂_ε (f + δ_C)(x̄) = ∪_{ε_1 ≥ 0, ε_2 ≥ 0, ε_1 + ε_2 = ε} (∂_{ε_1} f(x̄) + ∂_{ε_2} δ_C(x̄)).

Hence,

∂_ε g(x̄) ⊂ ∪_{ε_1 ≥ 0, ε_2 ≥ 0, ε_1 + ε_2 = ε} (∂_{ε_1} f(x̄) + N_{ε_2,C}(x̄)), ∀ ε > 0.
We just recall that ∂_{ε_2} δ_C(x̄) = N_{ε_2,C}(x̄) for any ε_2 ≥ 0. Theorem 12.8 can also be used to deduce necessary and sufficient optimality conditions for the problem (P1).

Corollary 12.9 Consider the problem (P1). Assume that C ⊂ Rn is a closed convex set. Then x̄ ∈ C is a global maximum for (P1) if and only if

∂_ε f(x̄) ⊂ N_{ε,C}(x̄), ∀ ε > 0.
Proof. Observe that the problem (P1) can be written as

min −f(x)   subject to   x ∈ C.

A further equivalent version can be given by

min (δ_C − f)(x)   subject to   x ∈ Rn.

Using Theorem 12.8 with δ_C playing the role of the lsc convex function, the optimality condition is

∂_ε f(x̄) ⊂ ∂_ε δ_C(x̄), ∀ ε > 0,

that is,

∂_ε f(x̄) ⊂ N_{ε,C}(x̄), ∀ ε > 0,

thereby establishing the result.
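Corollary 12.9 can likewise be probed numerically. The following one-dimensional sketch is an illustrative assumption (instance, grids, and tolerances are not from the text): it compares ∂_ε f(x̄) with the ε-normal set N_{ε,C}(x̄) = {ξ : ⟨ξ, x − x̄⟩ ≤ ε for all x ∈ C} for f(x) = x² over C = [−1, 2], whose global maximizer is x̄ = 2.

```python
import numpy as np

# Illustrative 1-d check of Corollary 12.9 for f(x) = x**2 over C = [-1, 2].
a, b = -1.0, 2.0
f = lambda x: x**2
xs = np.linspace(a - 3.0, b + 3.0, 12001)   # grid used for the eps-subdifferential
cs = np.linspace(a, b, 3001)                # grid used for the eps-normal set

def eps_subdiff(x_bar, eps):
    right, left = xs[xs > x_bar + 1e-12], xs[xs < x_bar - 1e-12]
    u = np.min((f(right) - f(x_bar) + eps) / (right - x_bar))
    l = np.max((f(left) - f(x_bar) + eps) / (left - x_bar))
    return l, u

def in_eps_normal_set(xi, x_bar, eps):
    return np.all(xi * (cs - x_bar) <= eps + 1e-9)

def corollary_check(x_bar, eps):
    l, u = eps_subdiff(x_bar, eps)
    return all(in_eps_normal_set(xi, x_bar, eps) for xi in np.linspace(l, u, 50))

for x_bar in (2.0, -1.0):
    print(f"x_bar = {x_bar}:",
          [corollary_check(x_bar, eps) for eps in (0.5, 1.0, 2.0, 4.0)])
# Expected: True for every eps at the maximizer x_bar = 2.0, while the inclusion
# eventually fails at x_bar = -1.0, which is not a maximizer of f over C.
```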
We end our discussion and the book here. However, for more details on the use of the above results, see for example Bomze [15], Hiriart-Urruty [60], and the references therein.
Bibliography
[1] F. A. Al-Khayyal and J. Kyparisis. Finite convergence of algorithms for nonlinear programs and variational inequalities. J. Optim. Theory Appl., 70:319–332, 1991.
[2] H. Attouch and H. Brézis. Duality for the sum of convex functions in general Banach spaces. In Aspects of Mathematics and its Applications, pages 125–133. Amsterdam, 1986.
[3] H. Attouch, G. Buttazzo, and G. Michaille. Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization. MPS/SIAM Series on Optimization, SIAM, Philadelphia, PA; MPS, Philadelphia, PA, 2006.
[4] J.-P. Aubin and I. Ekeland. Applied Nonlinear Analysis. Wiley-Interscience, New York, 1984.
[5] A. Auslender. Optimisation. M´ethodes num´eriques. Maˆıtrise de Mathmatiques et Applications Fondamentales. Masson, Paris-New YorkBarcelona, 1976. [6] A. Ben-Tal and A. Ben-Israel. Characterizations of optimality in convex programming: The nondifferentiable case. Applicable Anal., 9:137–156, 1979. [7] A. Ben-Tal, A. Ben-Israel, and S. Zlobec. Characterization of optimality in convex programming without a constraint qualification. J. Optim. Theory Appl., 20:417–437, 1976. [8] A. Ben-Tal and A. Nemirovskii. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS/SIAM Series on Optimization, SIAM, Philadelphia, PA, 2001. [9] A. Ben-Tal, E. E. Rosinger, and A. Ben-Israel. A Helly-type theorem and semi-infinite programming. In Constructive Approaches to Mathematical Models, pages 127–135. Academic Press, New York-London-Toronto, Ont., 1979. [10] C. Berge. Topological Spaces. Including a Treatment of Multi-Valued Functions, Vector Spaces and Convexity. Dover Publications, Inc., Mineola, NY, 1997. 413 © 2012 by Taylor & Francis Group, LLC
[11] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1999. [12] D. P. Bertsekas. Convex Analysis and Optimization. Athena Scientific, Belmont, MA, 2003. [13] D. P. Bertsekas and A. E. Ozdaglar. Pseudonormality and Lagrange multipler theory for constrained optimization. J. Optim. Theory Appl., 114:287–343, 2002. [14] D. P. Bertsekas, A. E. Ozdaglar, and P. Tseng. Enhanced Fritz John for convex programming. SIAM J. Optim., 16:766–797, 2006. [15] I. M. Bomze. Global optimization: A quadratic programming perspective. In Nolinear Optimization, Lecture Notes in Math., volume 1989. Springer-Verlag, Berlin, 2010. [16] J. M. Borwein. Direct theorems in semi-infinite convex programming. Math. Programming, 21:301–318, 1981. [17] J. M. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. CMS Books in Mathematics, SpringerVerlag, New York, 2000. [18] J. M. Borwein and Q. J. Zhu. Techniques of Variational Analysis. Springer, New York, 2005. [19] A. Brøndsted and R. T. Rockafellar. On the subdifferentiability of convex functions. Proc. Am. Math. Soc., 16:605–611, 1965. [20] R. S. Burachik and V. Jeyakumar. A dual condition for the convex subdifferential sum formula with applications. J. Convex Anal., 12:279– 290, 2005. [21] R. S. Burachik and V. Jeyakumar. A new geometric condition for Fenchel’s duality in infinite dimensional spaces. Math. Programming, 104:229–233, 2005. [22] J. V. Burke and S. Deng. Weak sharp minima revisited. Part I: Basic theory. Control Cybernetics, 31:439–469, 2002. [23] J. V. Burke and S. Deng. Weak sharp minima revisited. Part II: Application to linear regularity and error bounds. Math. Programming, 104:235–261, 2005. [24] J. V. Burke and S. Deng. Weak sharp minima revisited Part III: Error bounds for differentiable convex inclusions. Math. Programming, 116:37– 56, 2009. [25] J. V. Burke and M. C. Ferris. Weak sharp minima in mathematical programming. SIAM J. Control Optim., 31:1340–1359, 1993.
[26] E. W. Cheney. Introduction to Approximation Theory. McGraw-Hill, New York, 1966. [27] F. H. Clarke. Optimization and Nonsmooth Analysis. Wiley Interscience, New York, 1983. [28] F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski. Nonsmooth Analysis and Control Theory, volume 178: Graduate Texts in Mathematics. Springer-Verlag, New York. [29] L. Cromme. Strong uniqueness. Numer. Math., 29:179–193, 1978. [30] V. F. Demyanov and A. M. Rubinov. Constructive Nonsmooth Analysis. Approximation & Optimization. 7: Peter Lang, Frankfurt am Main, 1995. [31] N. Dinh, M. A. Goberna, and M. A. L´opez. From linear to convex systems: Consistency, Farkas’ lemma and applications. J. Convex Anal., 13:279–290, 2006. [32] N. Dinh, M. A. Goberna, M. A. L´opez, and T. Q. Son. New Farkastype constraint qualifications in convex infinite programmming. ESAIM Control Optim. Calc. Var., 13:580–597, 2007. [33] N. Dinh, B. S. Mordukhovich, and T. T. A. Nghia. Subdifferentials of value functions and optimality conditions for DC and bilevel infinite and semi-infinite programs. Math. Programming, 123:101–138, 2010. [34] N. Dinh, T. T. A. Nghia, and G. Vallet. A closedness condition and its application to DC programs with convex constraints. Optimization, 59:541–560, 2010. [35] R. J. Duffin. Convex analysis treated by linear programming. Math. Programming, 4:125–143, 1973. [36] J. Dutta. Generalized derivatives and nonsmooth optimization, a finite dimensional tour. Top, 13:185–314, 2005. [37] J. Dutta. Necessary optimality conditions and saddle points for approximate optimization in Banach spaces. Top, 13:127–144, 2005. [38] J. Dutta. Optimality conditions for maximizing a locally Lipschitz function. Optimization, 54:377–389, 2005. [39] J. Dutta, K. Deb, R. Arora, and R. Tulshyan. Approximate KKT points: Theoretical and numerical study. 2010. Preprint. [40] J. Dutta and C. S. Lalitha. Optimality conditions in convex optimization revisited. 2010. Preprint.
[41] I. Ekeland. On the variational principle. J. Math. Anal. Appl., 47:324– 353, 1974. [42] I. Ekeland. Nonconvex minimization problems. Bull. Am. Math. Soc., 1:443–474, 1979. [43] I. Ekeland and R. Temam. Convex Analysis and Variational Problems, volume 1: Studies in Mathematics and its Applications. North-Holland Publishing Co., Amsterdam-Oxford and American Elsevier Publishing Co., Inc., New York, 1976. [44] M. D. Fajardo and M. A. L´opez. Locally Farkas-Minkowski systems in convex semi-infinite programming. J. Optim. Theory Appl., 103:313– 335, 1999. [45] W. Fenchel. On conjugate convex functions. Canadian J. Math., 1:73– 77, 1949. [46] M. C. Ferris. Weak Sharp Minima and Penalty Functions in Mathematical Programming. University of Cambridge, Cambridge, 1988. Ph. D. thesis. [47] M. Florenzano and C. Le Van. Finite Dimensional Convexity and Optimization, volume 13: Studies in Economic Theory. Springer-Verlag, Berlin, 2001. [48] M. Fukushima. Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems. Math. Programming, 53:99–110, 1992. [49] M. A. Goberna, V. Jeyakumar, and M. A. L´opez. Necessary and sufficient constraint qualifications for solvability of systems of infinite convex inequalities. Nonlinear Anal., 68:1184–1194, 2008. [50] M. A. Goberna and M. A. L´opez. Linear Semi-Infinite Optimization, volume 2: Wiley Series in Mathematical Methods in Practice. John Wiley & Sons, Ltd., Chichester, 1998. [51] M. A. Goberna, M. A. L´opez, and J. Pastor. Farkas-Minkowski systems in semi-infinite programming. Appl. Math. Optim., 7:295–308, 1981. [52] F. J. Gould and J. W. Tolle. A necessary and sufficient qualification for constrained optimization. SIAM J. Appl. Math., 20:164–172, 1971. [53] F. J. Gould and J. W. Tolle. Geometry of optimality conditions and constraint qualifications. Math. Programming, 2:1–18, 1972. [54] M. Guignard. Generalized Kuhn-Tucker conditions for mathematical programming problems in a Banach space. SIAM J. Control, 7:232– 241, 1969.
[55] M. R. Hestenes. Optimization Theory: The Finite Dimensional Case. Wiley, New York, 1975. [56] R. Hettich. A review of numerical methods for semi-infinite optimization. In Semi-Infinite Programming and Applications, Lecture Notes in Econom. and Math. System, volume 215, pages 158–178. SpringerVerlag, Berlin, 1983. [57] R. Hettich and K. O. Kortanek. Semi-infinite programming: theory, methods and applications. SIAM Review, 35:380–429, 1993. [58] J.-B. Hiriart-Urruty. ε-Subdifferential Calculus. In Convex Analysis and Optimization, pages 43–92. Pitman, London, 1982. [59] J.-B. Hiriart-Urruty. What conditions are satisfied at points minimizing the maximum of a finite number of differentiable functions? In Nonsmooth Optimization: Methods and Applications (Erice, 1991), pages 166–174. Gordon and Breach, Montreux, 1992. [60] J. B. Hiriart-Urruty. Global optimality conditions for maximizing a convex quadratic function under convex quadratic constraints. J. Global Optim., 21:445–455, 2001. [61] J. B. Hiriart-Urruty and Y. S. Ledyaev. A note on the characterization of the global maxima of a (tangentailly) convex function over a convex set. J. Convex Anal., 3:55–61, 1996. [62] J.-B. Hiriart-Urruty and C. Lemar´echal. Convex Analysis and Minimization Algorithms I & II, volume 306: Fundamental Principles of Mathematical Sciences. Springer-Verlag, Berlin, 1993. [63] J.-B. Hiriart-Urruty and C. Lemar´echal. Fundamentals of Convex Analysis. Grundlehren Text Editions, Springer-Verlag, Berlin, 2001. [64] J. B. Hiriart-Urruty and R. R. Phelps. Subdifferential calculus using ε-subdifferentials. J. Funct. Anal., 118:154–166, 1993. [65] J. Y. Jaffray and J. Ch. Pomerol. A direct proof of the Kuhn-Tucker necessary optimality theoren for convex and affine inequalities. SIAM Review, 31:671–674, 1989. [66] V. Jeyakumar. Asymptotic dual conditions characterizing optimality for infinite convex programs. J. Optim. Theory Appl., 93:153–165, 1997. [67] V. Jeyakumar. Characterizing set containments involving infinite convex constraints and reverse-convex constraints. SIAM J. Optim., 13:947– 959, 2003. [68] V. Jeyakumar, G. M. Lee, and N. Dinh. New sequential Lagrange multiplier conditions characterizing optimality without constraint qualification for convex programs. SIAM J. Optim., 14:534–547, 2003.
[69] V. Jeyakumar and G. Y. Li. Farkas’ lemma for separable sublinear inequalities without qualifications. Optim. Lett., 3:537–545, 2009. [70] V. Jeyakumar, A. M. Rubinov, B. M. Glover, and Y. Ishizuka. Inequality systems and global optimization. J. Math. Anal. Appl., 202:900–919, 1996. [71] F. John. Extremum problems with inequalities as subsidiary conditions. In Studies and Essays Presented to R. Courant on His 60th Birthday, pages 187–204. Interscience Publishers, Inc., New York, 1948. [72] V. L. Klee. The critical set of a convex body. Am. J. Math., 75:178–188, 1953. [73] H. W. Kuhn and A. W. Tucker. Nonlinear programming. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, pages 481–492. University of California Press, Berkeley and Los Angeles, 1951. [74] J. B. Lasserre. On representation of the feasible set in convex optimization. Optim. Lett., 4:1–5, 2010. [75] M. A. L´ opez and E. Vercher. Optimality conditions for nondifferentiable convex semi-infinite programming. Math. Programming, 27:307– 319, 1983. [76] P. Loridan. Necessary conditions for ε-optimality. Math. Programming Stud., 19:140–152, 1982. [77] P. Loridan and J. Morgan. Penalty function in ε-programming and ε-minimax problems. Math. Programming, 26:213–231, 1983. [78] D. T. Luc, N. X. Tan, and P. N. Tinh. Convex vector functions and their subdifferential. Acta Math. Vietnam., 23:107–127, 1998. [79] R. Lucchetti. Convexity and Well-Posed Problems. Springer Science + Business Media, Inc., New York, 2006. [80] D. G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, Inc., New York, 1968. [81] T. L. Magnanti. Fenchel and Lagrange duality are equivalent. Math. Programming, 7:253–258, 1974. [82] O. L. Mangasarian. Nonlinear Programming. McGraw-Hill Book Company, New York, 1969. [83] E. J. McShane. The Lagrange multiplier rule. Am. Math. Monthly, 80:922–925, 1973.
[84] B. S. Mordukhovich. Approximation and maximum principle for nonsmooth problems of optimal control. Russian Math. Surveys, 196:263– 264, 1977. [85] B. S. Mordukhovich. Metric approximations and necessary optimality conditions for general class of nonsmooth extremal problems. Soviet Math. Doklady, 22:526–530, 1980. [86] B. S. Mordukhovich. Variational Analysis and Generalized Differentiation I & II. Springer-Verlag, Berlin, 2006. [87] J. J. Moreau. Fonctions convexes en dualite. Seminaire de Mathematiques de la Faculte des Sciences de Montpellier, (1), 1962. [88] J. J. Moreau. Convexity and duality. In Functional Analysis and Optimization, pages 145–169. Academic Press, New York, 1966. [89] J. J. Moreau. Inf-convolution, sous-additivit´e, convexit´e des fonctions num´eriques. J. Math. Pures Appl., 49:109–154, 1970. [90] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. In Applied Optimization, volume 87. Kluwer Academic Publishers, 2004. [91] A. E. Ozdaglar. Pseudonormality and a Lagrange Multiplier Theory for Constrained Optimization. Mass. Institute of Technology, Cambridge, MA, 2003. Ph. D. thesis. [92] A. E. Ozdaglar and D. P. Bertsekas. The relation between pseudonormality and quasiregularity in constrained optimization. Optim. Methods Softw., 19:493–506, 2004. [93] R. R. Phelps. Convex Functions, Monotone Operators and Differentiability, volume 1364: Lecture Notes in Mathematics. Springer-Verlag, Berlin. [94] B. T. Polyak. Sharp Minima. In Institute of Control Sciences Lecture Notes, Moscow, 1979. Presented at the IIASA Workshop on Generalized Lagrangians and Their Applications, IIASA, Laxenburg, Austria. [95] B. T. Polyak. Introduction to Optimization. Optimmization Software, Inc., Publications Division, New York, 1987. [96] B. N. Pshenichnyi. Necessary Conditions for an Extremum, volume 4: Pure and Applied Mathematics. Marcel Dekker, Inc., New York, 1971. [97] R. T. Rockafellar. Convex Analysis, volume 28: Princeton Mathematical Series. Princeton University Press, Princeton, NJ, 1970.
[98] R. T. Rockafellar. Some convex programs whose duals are linearly constrained. In Nonlinear Programming (Proc. Sympos., Univ. of Wisconsin, Madison, Wis., 1970), pages 293–322. Academic Press, New York, 1970. [99] R. T. Rockafellar. Conjugate Duality and Optimization. Society for Industrial and Applied Mathematics, Philadelphia, 1974. [100] R. T. Rockafellar. Lagrange multipliers and optimality. SIAM Review, 35:183–238, 1993. [101] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317: Fundamental Principles of Mathematical Sciences. Springer-Verlag, Berlin. [102] A. Ruszczynski. Nonlinear Optimization. Princeton University Press, Princeton, NJ, 2006. [103] R. Schneider. Convex Bodies: The Brunn–Minkowski Theory, volume 44: Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1993. [104] A. S. Strekavolski˘i. On the problem of the global extremum. Soviet Math. Doklady, 292:1062–1066, 1987. [105] A. S. Strekavolski˘i. Search for the global maximum of a convex functional on an admissible set. Comp. Math. Math. Phys., 33:349–363, 1993. [106] J.-J. Strodiot, V. H. Nguyen, and N. Heukemes. ε-Optimal solutions in nondifferentiable convex programming and some related questions. Math. Programming, 25:307–328, 1983. [107] T. Str¨ omberg. The operation of infimal convolution. Dissertationes Math., 352:1–61, 1996. [108] L. Thibault. Sequential convex subdifferential calculus and sequential Lagrange multipliers. SIAM J. Control Optim., 35:1434–1444, 1997. [109] M. Valadier. Sous-diffrentiels d’une borne suprieure et d’une somme continue de fonctions convexes. C. R. Acad. Sci. Paris, 268:39–42, 1969. [110] J. van Tiel. Convex Analysis. An Introductory Text. John Wiley & Sons, Inc., New York, 1984. [111] R.-J. Wets. Elementary constructive proofs of the theorems of Farkas, Minkowski and Weyl. In Econimic Decision Making: Games, Econometrics and Optimization, pages 427–432. Elsevier-Science, Amsterdam, 1990.
[112] H. Wolkowicz. Geometry of optimality conditions and constraint qualifications: The convex case. Math. Programming, 19:32–60, 1980. [113] K. Yokoyama. ε-Optimality criteria for convex programming problems via exact penalty functions. Math. Programming, 56:233–243, 1992. [114] W. I. Zangwill. Non-linear programming via penalty functions. Manag. Sci., 13:344–358, 1967.
Index
S-convex function, 161 ε-complementary slackness, 342 ε-feasible set, 338 ε-maximum solution, 348 ε-minimum solution, 348 ε-normal set, 123, 339 ε-saddle point, 345 ε-solution, 135, 337 ε-subdifferential, 96, 122, 123, 338, 409 Abadie constraint qualification, 154, 214, 270, 375 abstract constraint, 157 active constraint multipliers, 400 active index set, 58, 104, 146, 213, 249, 255, 316, 367 affine combination, 31 affine function, 3 affine hull, 31 affine set, 24 affine support, 113 almost ε-solution, 236, 338, 348, 350 approximate optimality conditions, 337 approximate solutions, 337 approximate up to ε, 337 badly behaved constraints, 255 biconjugate function, 114 bilevel programming, 308 bipolar cone, 50 blunt cone, 243 Bolzano–Weierstrass Theorem, 5 bounded sequence, 5 bounded-valued map, 15 Brønsted–Rockafellar Theorem, 128
canonically closed system, 389 Carath´eodory Theorem, 28 Cauchy–Schwarz inequality, 4 CC qualification condition, 305 Chain Rule, 101, 164 Clarke directional derivative, 320 Clarke generalized gradient, 163, 289 Clarke Jacobian, 163 Clarke subdifferential, 163, 289, 320 closed cone constraint qualification, 281, 300, 395 closed convex hull of function, 70 closed function, 10 closed graph theorem, 93 closed half space, 23, 44 closed map, 15 closed-valued map, 15 closure of function, 10, 77, 79 closure of set, 4, 31 coercive, 12, 351 complementary slackness, 151, 172 complementary violation condition, 212 concave function, 63 cone, 40, 243 cone constrained problem, 161, 162 cone convex function, 162 cone generated by set, 40 conic combination, 40 conjugate function, 68, 111, 112, 114, 198 consequence relation, 383 constraint qualification, 213 continuous, 6, 75 convergent sequence, 4 convex analysis, 2, 23 convex combination, 25 423
convex cone, 39, 40 convex cone generated by set, 40 convex function, 2, 3, 62, 286 convex hull, 27 convex hull of function, 70 convex locally Farkas-Minkowski problem, 378 convex optimization, 2, 315 convex programming, 3 convex set, 2, 3, 23 convex-valued map, 15 core, 39 d.c. function, 403, 408 derivative, 13, 14, 85 direction of constancy, 244 direction of recession, 41 direction sets, 243 directional derivative, 85, 320 distance function, 65 domain, 3, 62 dual problem, 170, 185, 235, 361 duality, 170 Dubovitskii–Milyutin Theorem, 252 Ekeland's variational principle, 127, 135, 355 enhanced dual Fritz John condition, 235 enhanced Fritz John optimality condition, 207, 208 epigraph, 4, 62, 75, 136, 286 equality set, 255 error bound, 19 exact penalty function, 350 extended-valued function, 3 faithfully convex function, 244 Farkas' Lemma, 275, 398 Farkas–Minkowski (FM) constraint qualification, 300 Farkas–Minkowski (FM) system, 383 Farkas–Minkowski qualification, 382, 383, 386 feasible direction, 54 feasible direction cone, 377
Index Fenchel duality, 196 Fenchel–Young inequality, 116 finitely generated cone, 61 finitely generated set, 61 Fritz John optimality condition, 207 gap function, 19 generalized Lagrange multiplier, 190 generators of cone, 61 generators of set, 61 geometric optimality condition, 249, 255 Gordan’s theorem of alternative, 379 gradient, 13 graph, 15 Helly’s Theorem, 49 Hessian, 14 hyperplane, 23, 44 improper function, 62, 75 indicator function, 65, 123 Inf-Convolution Rule, 118 infimal/inf-convolution, 68 Infimum Rule, 118 inner product, 4 interior, 4, 31 Jacobian, 14 Jensen’s inequality, 64 Karush–Kuhn–Tucker (KKT) optimality condition, 2, 151 Karush–Kuhn–Tucker multiplier, 151 Lagrange multiplier, 1, 146, 151, 188 Lagrangian duality, 185 Lagrangian function, 172, 238, 345 Lagrangian regular point, 374, 375 limit infimum of function, 6 limit infimum of sequence, 5 limit point, 5 limit supremum of function, 6 limit supremum of sequence, 5 line segment principle, 32 linear programming, 1, 25, 61, 327
Index linear semi-infinite system, 383 linearity criteria, 213, 221 Lipschitz constant, 82, 163 Lipschitz function, 2, 82, 320 locally bounded map, 15 locally Lipschitz function, 82, 163 lower limit of function, 6 lower limit of sequence, 5 lower semicontinuous (lsc), 5 lower semicontinuous hull, 10 lower-level set, 8 Mangasarian–Fromovitz constraint qualification, 317 marginal function, 190 max-function, 104, 159, 342 Max-Function Rule, 106, 132 maximal monotone, 95 Mean Value Theorem, 14 merit function, 19 metric approximation, 209 minimax equality, 170 minimax inequality, 170 modified ε-KKT conditions, 358 modified ε-KKT point, 358 modified Slater constraint qualification, 176, 183 monotone, 18 multifunction, 93 nonconvex optimization, 403 nondecreasing function, 100, 286 nondegeneracy condition, 316, 321 nonsmooth function, 13, 243, 320 nonsmooth optimization, 2 norm, 4 normal cone, 40, 54, 57, 89 open ball, 4 open half space, 24, 44 orthogonal complement, 48 parameter, 199 parameterized family, 199 penalty function, 209 polar cone, 40, 50
© 2012 by Taylor & Francis Group, LLC
425 polyhedral cone, 61 polyhedral set, 25, 58, 60 positive cone, 365 positive polar cone, 53 positively homogeneous, 71 primal problem, 170 product space, 365 projection, 65 prolongation principle, 32 proper function, 3, 62, 75 proper map, 15 proper separation, 45, 221 pseudonormality, 213, 220 quasi ε-solution, 338, 355 quasinormality, 215 quasiregularity, 215 Rademacher Theorem, 163 recession cone, 41 regular ε-solution, 338 regular function, 320 regular point, 270 regularization condition, 254 relative interior, 31 relaxed ε-complementary slackness, 346, 358 relaxed Slater constraint qualification, 372 right scalar multiplication, 339 saddle point, 169, 170 saddle point condition, 170, 216 Saddle Point Theorem, 171 Scalar Product Rule, 131 second-order derivative, 14 semi-infinite programming, 365 separable sublinear function, 274 separating hyperplane, 44 separation theorem, 44, 45 Sequential Chain Rule, 286, 290 sequential optimality conditions, 243, 281, 291, 395 Sequential Sum Rule, 282 set-valued map, 15, 93 sharp minimum, 327
426
Index
Slater constraint qualification, 145, 146, 167, 172, 214, 254, 272, 302, 316, 339, 366 Slater-type constraint qualification, 157, 162, 213, 221, 236 slope inequality, 86 smooth function, 13, 243, 315 strict convex function, 63 strict epigraph, 64 strict separation, 45 strong duality, 186 strongly convex function, 19 strongly convex optimization, 19 strongly monotone, 20 strongly unique local minimum, 327 subadditive, 71 subdifferential, 89, 162 subdifferential calculus, 98 subgradient, 89, 162 sublinear function, 66, 71, 274, 320 subsequence, 5 Sum Rule, 98, 118, 129, 137, 163 sup-function approach, 366 support function, 66, 71, 72 support set, 113, 366 supporting hyperplane, 45 Supremum Rule, 118, 137 tangent cone, 40, 54 two-person-zero-sum game, 169 upper upper upper upper
limit of function, 6 limit of sequence, 5 semicontinuous (usc), 6, 94 semicontinuous (usc) map, 15
Valadier Formula, 107 value function, 190, 197 value of game, 170 variational inequality, 17 weak duality, 186 weak sharp minimum, 327, 328 weakest constraint qualification, 270 Weierstrass Theorem, 12