{"id":166,"date":"2018-03-20T21:35:34","date_gmt":"2018-03-20T21:35:34","guid":{"rendered":"http:\/\/wordpress.cs.vt.edu\/optml\/?p=166"},"modified":"2018-03-20T21:44:16","modified_gmt":"2018-03-20T21:44:16","slug":"convex-optimization-with-submodular-functions","status":"publish","type":"post","link":"https:\/\/wordpress.cs.vt.edu\/optml\/2018\/03\/20\/convex-optimization-with-submodular-functions\/","title":{"rendered":"Convex Optimization with Submodular Functions"},"content":{"rendered":"<p><span style=\"font-weight: 400\">During our introductory lecture on submodular functions, based on Francis Bach\u2019s <a href=\"https:\/\/arxiv.org\/abs\/1111.6453\">paper<\/a>, we discussed the importance of combinatorial optimization for machine learning applications, reviewed set notation, and defined the key concepts from chapters 2-3 of Bach\u2019s paper.<\/span><\/p>\n<h3><b>Importance of Combinatorial Optimization for Machine Learning<\/b><\/h3>\n<p><span style=\"font-weight: 400\">Feature selection, factoring distributions, ranking, clustering, and graph cuts were given as examples of machine learning problems that are combinatorial by nature and require either maximization or minimization. When discussing the importance of submodular functions and combinatorial optimization, we referred to the feature selection example from this\u00a0<a href=\"http:\/\/luthuli.cs.uiuc.edu\/~daf\/courses\/optimization\/BilevelSubmodularMaterial\/submodularity-slides.pdf\">slide deck<\/a> produced by professors Andreas Krause and Carlos Guestrin.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-6 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/slide1-300x230.jpg\" alt=\"\" width=\"300\" height=\"230\" \/><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-7 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/slide2-300x232.jpg\" alt=\"\" width=\"300\" height=\"232\" \/><\/p>\n<p><span style=\"font-weight: 400\">In the feature selection example, we desire to choose the k best features that will add the most certainty in predicting the dependent variable (in this example, Y = \u201cSick\u201d). As Krause and Guestrin point out, this problem is inherently combinatorial and therefore requires combinatorial optimization techniques. The goal of choosing an optimal feature set, |A*|\u00a0\u2264 k, is outlined in the first two slides. The problem is NP-hard because there are (2^p)<\/span><span style=\"font-weight: 400\">\u00a0&#8211; 1 constraints so we use a greedy algorithm to solve it. This essentially means that we take the best individual element that minimizes the marginal uncertainty of adding it to the current optimal set, A*. We repeat this process for up to k iterations until our optimal set is finalized. Although this certainly isn\u2019t going to give us the true optimal set every time because some combinations of features may be more optimal when paired together, Krause and Guestrin inform us that it does give us a guarantee of at least 1 &#8211; 1\/e (~63%) sub-optimal performance.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-8 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/slide3-300x231.jpg\" alt=\"\" width=\"300\" height=\"231\" \/><\/p>\n<h3><strong>Background<\/strong><\/h3>\n<p><span style=\"font-weight: 400\">Before covering the Bach paper material, we reviewed set function notations for those people who hadn\u2019t taken a discrete math or combinatorics class. The slide below outlines the notations covered during our class lecture.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-169 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/optml\/wp-content\/uploads\/sites\/69\/2018\/03\/slide80-300x169.jpg\" alt=\"\" width=\"383\" height=\"215\" srcset=\"https:\/\/wordpress.cs.vt.edu\/optml\/wp-content\/uploads\/sites\/69\/2018\/03\/slide80-300x169.jpg 300w, https:\/\/wordpress.cs.vt.edu\/optml\/wp-content\/uploads\/sites\/69\/2018\/03\/slide80-768x432.jpg 768w, https:\/\/wordpress.cs.vt.edu\/optml\/wp-content\/uploads\/sites\/69\/2018\/03\/slide80-1024x576.jpg 1024w, https:\/\/wordpress.cs.vt.edu\/optml\/wp-content\/uploads\/sites\/69\/2018\/03\/slide80.jpg 1320w\" sizes=\"auto, (max-width: 383px) 100vw, 383px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">The three notations at the bottom of the slide seemed to be the ones that students were least familiar with. Cardinality (absolute value signs around a set) is simply the number of elements contained within a given set. The\u00a0indicator vector, 1_A, for the set A is simply a vector with a 1 in a particular dimension if the element corresponding to that vector dimension is contained in the set A and is a 0 if it isn\u2019t. Lastly, the notation V \\ A is simply set V with subset A removed.<\/span><\/p>\n<h3><strong>Submodular Functions and the\u00a0Lov\u00e1sz Extension<\/strong><\/h3>\n<p><strong><span style=\"font-weight: 400\">When we introduced submodular function notation, we used both Definition 2.1 and Proposition 2.2 from the book. When first learning about submodularity, we found that the proposition 2.2, which introduces the property of diminishing returns, was more intuitive and made understanding definition 2.1 easier.<\/span><\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-15 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/slide7-300x98.jpg\" alt=\"\" width=\"300\" height=\"98\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-16 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/slide8-300x43.jpg\" alt=\"\" width=\"300\" height=\"43\" \/><\/p>\n<p><strong><span style=\"font-weight: 400\">\u00a0<\/span><\/strong><span style=\"font-weight: 400\">To further ring home the concept of submodular functions and the property of diminishing returns, we reviewed the set cover example from the Krause and Guestrin slides below.\u00a0<\/span><span style=\"font-weight: 400\">The goal of this visual aid was to demonstrate how adding a new element to a set A, a subset of set B, can only cover the same or more surface area than adding the same new element to set B.\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-11 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/slide5-300x232.jpg\" alt=\"\" width=\"300\" height=\"232\" \/><\/p>\n<p><span style=\"font-weight: 400\">During the next part of the lecture, we reviewed associate polyhedra which was by far the most confusing material to cover for the class. The differences between submodular and base polyhedra expressed in Bach\u2019s paper is clear to understand. The base polyhedra, B(F), can be thought of as the hollow outer shell of the set function formed by the intersecting constraint hyperplanes and the submodular polyhedra, P(F), can be thought of as the base polyhdra and the discrete, encapsulated, non-empty interior space encompassed by the base polyhedra. What we found confusing was whether Figure 2.1 below was representing just the domain of the set function, F, or if it was also representing the function values. We agreed that we thought the figure was only representing the domain, because it would need an additional dimension to represent the set function values for the elements s1, s2, and s3. <\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-17 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/slide9-300x71.jpg\" alt=\"\" width=\"300\" height=\"71\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-18 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/slide10-300x159.jpg\" alt=\"\" width=\"300\" height=\"159\" \/><\/p>\n<p><span style=\"font-weight: 400\">Finally, we had the opportunity to introduce the Lov<\/span><span style=\"font-weight: 400\">\u00e1<\/span><span style=\"font-weight: 400\">sz extension (also referred to as the Choquet integral) which allows us to extend the vertices of a {0,1}^<\/span>p<span style=\"font-weight: 400\"> hypercube to a continuous [0, 1]^<\/span>p<span style=\"font-weight: 400\"> hypercube (think of taking the convex hull of the vertices). By dividing the continuous hypercube into p! simplices, we can create areas with ordered variable values like in the figures from Bach\u2019s paper below. <\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-19 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/slide11-300x131.jpg\" alt=\"\" width=\"300\" height=\"131\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-20 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/slide12-300x275.jpg\" alt=\"\" width=\"300\" height=\"275\" \/><\/p>\n<p><span style=\"font-weight: 400\">One of the last things we were able to cover in class was the properties of the Lov<\/span><span style=\"font-weight: 400\">\u00e1<\/span><span style=\"font-weight: 400\">sz extension. The Lov<\/span><span style=\"font-weight: 400\">\u00e1<\/span><span style=\"font-weight: 400\">sz extension is important mainly because it allows us to create a continuous extension of discrete problems. Additionally, it has a unique connection with submodularity and convexity. A set function is submodular if and only if its Lov<\/span><span style=\"font-weight: 400\">\u00e1<\/span><span style=\"font-weight: 400\">sz extension is convex. We ended the class by proving this theorem<\/span><span style=\"font-weight: 400\">.<\/span><\/p>\n<pre>Theorem: A set-function, F, is submodular if and only if its Lov<span style=\"font-weight: 400\">\u00e1<\/span>sz extension, f, is convex. \r\n\r\nProof: Let's assume f is convex. Let A, B \u2286 V. The vector,<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-22 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/equation1-300x33.jpg\" alt=\"\" width=\"219\" height=\"24\" \/><\/p>\n<pre>has components equal to 0 where [ V \\ (A \u222a B) ]\r\n                        2 where [ A \u2229 B ] \r\n                    and 1 where [ A \u0394 B = (A \\ B) \u222a (B \\ A) ].\r\n\r\nUsing the property,<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-23 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/equation2-300x26.jpg\" alt=\"\" width=\"300\" height=\"26\" \/><\/p>\n<pre>we get<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-24 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/equation3-300x34.jpg\" alt=\"\" width=\"300\" height=\"34\" \/><\/p>\n<p style=\"text-align: center\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0<img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-25\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/equation4_1.jpg\" alt=\"\" width=\"120\" height=\"31\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-26\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/equation4_2-300x45.jpg\" alt=\"\" width=\"200\" height=\"30\" \/><\/p>\n<p style=\"text-align: center\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0<img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-27\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/equation5-300x38.jpg\" alt=\"\" width=\"237\" height=\"30\" \/><\/p>\n<pre>Since f is convex, then<\/pre>\n<p style=\"text-align: center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-28\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/equation6-300x34.jpg\" alt=\"\" width=\"300\" height=\"34\" \/><\/p>\n<p style=\"text-align: center\"><strong>\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 =<\/strong><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-29\" src=\"http:\/\/wordpress.cs.vt.edu\/buntingj\/wp-content\/uploads\/sites\/72\/2018\/03\/equation7.jpg\" alt=\"\" width=\"132\" height=\"27\" \/><\/p>\n<pre>Therefore, F(A) + F(B) = F(A \u222a B) + F(A \u2229 B). This satisfies the definition of convexity, therefore if F is submodular, then it is sufficient that for all \u03c9 \u03f5 \u211d^p, f(\u03c9) is a maximum of linear functions, thus it is convex on \u211d^p.<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>During our introductory lecture on submodular functions, based on Francis Bach\u2019s paper, we discussed the importance of combinatorial optimization for machine learning applications, reviewed set notation, and defined the key concepts from chapters 2-3 of Bach\u2019s paper. Importance of Combinatorial Optimization for Machine Learning Feature selection, factoring distributions, ranking, clustering, and graph cuts were given &hellip; <a href=\"https:\/\/wordpress.cs.vt.edu\/optml\/2018\/03\/20\/convex-optimization-with-submodular-functions\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Convex Optimization with Submodular Functions<\/span><\/a><\/p>\n","protected":false},"author":145,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-166","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9CQAE-2G","_links":{"self":[{"href":"https:\/\/wordpress.cs.vt.edu\/optml\/wp-json\/wp\/v2\/posts\/166","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wordpress.cs.vt.edu\/optml\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wordpress.cs.vt.edu\/optml\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/optml\/wp-json\/wp\/v2\/users\/145"}],"replies":[{"embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/optml\/wp-json\/wp\/v2\/comments?post=166"}],"version-history":[{"count":4,"href":"https:\/\/wordpress.cs.vt.edu\/optml\/wp-json\/wp\/v2\/posts\/166\/revisions"}],"predecessor-version":[{"id":171,"href":"https:\/\/wordpress.cs.vt.edu\/optml\/wp-json\/wp\/v2\/posts\/166\/revisions\/171"}],"wp:attachment":[{"href":"https:\/\/wordpress.cs.vt.edu\/optml\/wp-json\/wp\/v2\/media?parent=166"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/optml\/wp-json\/wp\/v2\/categories?post=166"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/optml\/wp-json\/wp\/v2\/tags?post=166"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}