Greedy Method forOptimization Problems
Greedy:take the action that is best now (out of the current options) according to your criterion (i.e. commit to/pick a local optimal)it may cause you to miss the optimal solutionYou build the solution as you go. No need tobacktraceit.Examples:KnapsackPic the item with the largest value/weight ratio.Other possible measures: largest value, smallest weight.Interval schedulingPick the interval that finishes firstHuffman codesPrefix codes: no code is also a prefix of another code.Huffman tree to decode (the code for a character, x, takes you to the leaf with x)Edit distance– greedy? – not clear.
Unbounded(electronics warehouse)Unlimited amounts of each item.Cannot take fractions of an item.0/1 Fractional(market: flour, rice,wine)0/1 – have only one of each itemCan take a fraction of an item.What would a greedy thief do?Method:Sort in decreasing order of value/weight ratio.Pick as many of the largest ratio as possible. After that, try to take as many of the next ratio as possible and so on.Does NOT give an optimal solution. See next page.Would any of the above problems be solved optimally using Greedy?(Prove or give counter-example)Yes. The fractional version.
Greedy forKnapsack – Criterion: Ratio
Unbounded (not fractional)
0/1 (not fractional)
Counter example used to prove that Greedy fails for Unbounded Knapsack
Goal: construct an Unbounded Knapsack instance where Greedy does not give the optimal answer.Intuition: We want Greedy to pick only one item, when in fact two other items can be picked and together give a higher value:Item B: weight 2 (can fit 2), value 3 => total value 6Item A: weight 3, value 5 => total value 5Knapsack max weight: 4!!! You must double-check that Greedy would pick item A=> check the ratios: 5/3 > 3/2 (1.66 > 1.5).If item A had value 4, Greedy would also have picked B.Same example can be modified to work for 0/1 Knapsack:Item A: 3kg, $5ItemB: 2kg,$3ItemC: 2kg,$3
Interval scheduling versionsWorksheet
With values( job = (start, end, value) ):Each interval has a value. Goal: maximize sum of values of picked non-overlapping jobs.Greedy solution?AlgorithmIs it optimal?Give proof or counter example.Without values( job = (start,end) ):Pick the largest number of non-overlapping intervals.“Activity selection problem” in CLRS (page 415)Greedysolution?AlgorithmIsit optimal? Give proof or counter example. (CLRS proof at page 418, proof of Theorem 16.1)Which of the two versions is more general?Is one a special case (or special instance) of the other?If you have a program to solve problems of one type, can you easily use it to solve problems of the other type? Which type should the program solve (with value, or without value)?
Example showing that Greedy withlargest valueDoes not give an optimal solution.Greedy will pick the red job. Nothing else fits.Better(optimal): the 2 blue jobs.
Problem version:All jobs have the SAME value. => maximize number of jobs you can pick.Optimal:jobthatfinishesfirstjob thatstarts last(See book for proof is interested)NOT optimal:Shortest durationLeast overlapsEarliest start time
Interval Scheduling Greedy Criteria
Example showing that Shortest durationDoes not give an optimal solution.Greedy will pick the red job. Nothing else fits.Better(optimal): the 2 blue jobs.
Application: file compressionExample from CLRS:File with 100,000 characters.Characters:a,b,c,d,e,fFrequency in thousands (e.g. the frequency of b is 13000):Goal: binary encoding that requires less memory.
Internal nodes contain the sum of probabilities of the leaves in that subtree.
Optimal prefixcodewordtree(Huffman tree) – optimal encoding
(Images from CLRS.)
Compute the number of bits needed for the whole file using each of these encodings.
45,000*1 + 13,000* 3 + 12,000*3 +16,000*3 + 9,000* 4 + 5,000* 4= 224,000
100,000 * 3 = 300,000
Number of bits in code
Building the Huffman Tree
Make ‘leaves’ with letters and their frequency and arrange thems.t.they are always inincreasing order(of frequency).Repeatuntil there is only one tree:Create anew tree from the two leftmost trees(with the smallest frequencies) andPut it in its place.Label left/right branches with 0/1Encoding of char = path from root to leaf that has that charTreeproperty: Everyinternal node has two childrenSee book or lecture video for the step-by-step solution.In exam orhw, you must use this method. Make sure that you alwaysput the smallest child node to the left.
Glancing at implementation issues and options
Good reference:https://www2.cs.duke.edu/csed/poop/huff/info/File Compressionwith Huffman coding -Practical IssuesGiven an encoded file,how will youknow when it ended?Typically files are stored by the OS in sizes that are multiples of a specific value, so even though your actual file takes 121 bits, on disk 128 bits are written and in that will be read when you open the file. You need to know that after 121 you are done.=> need a new symbol to indicate end of file. Note that you must add this symbol (with count 1) to your set of distinct symbols used in building the Huffman tree, because you must be able to recognize it (decode it from bits into the symbol).Given an encoded file,how will you know how to decode it?Encoding tree is not standard, but depends on the given file =>Must record some information to know how to decode the file along with the file (at the beginning)=>Store the Huffman treeStore enough info to regenerate the tree.Build the tree efficiently – use a HeapHeaps are efficient (lgN) for finding min, removing min, inserting new itemWith heap: O(NlgN)Withsort and reinsert (like on paper): O(N2)Heaps will be covered in future lectures
Greedy algorithms do not always guarantee optimal solution. It depends on the problem.Difference between Greedy and Dynamic Programming:InDP typically you solve all the problems and then make your choice.In greedy, you make the greedy choice and you are left with only one problem.