Programming and Algorithms II Degree in Bioinformatics Fall 2018 *************************************************** *** Lab 4: *** *** More recursion *** *************************************************** Part 1. 1. Give a recursive function flip(n) that given an integer n returns another one that results from exchanging the even- and odd- position digits of n. Assume 0 leading digit if necessary. For example, for 1234 it should return 2143, and for 12345 it should return 103254. You can't use floats, lists or other complex structures. Only integer variables arithmetic and logical operations such as +, *, //, <, ==, etc. Hint: suppose that you can do this for integers up to n-1 digits. How do you do it then for integers with n digits? 2. (This is hard to do recursively from the start...) Give an *iterative* algorithm that computes the reverse of an integer in base 10. For example, the reverse of 1234 is 4321, and the reverse of 930 is 39. Hint: observe that the reversal of a number with 2 digits xy is 10*y + x. Now what if you have more digits. *************************************************** Part 2. Write a recursive function that says whether or not a list contains an element that equals the sum of the elements before it in the list. For example, it has to return True on the lists [1,5,-2,4,2] (because 4 = 1+5-2) and [0,4,2,9] (because 0 = sum of the empty list), but False on the lists [1,2,4,8], [-1,2,3,-2], for example. You will need to either add an extra parameter or return an extra result to make it possible and efficient, and if you want to avoid list copying, an integer parameter indicating the part of the list to look at. *************************************************** Part 3. In Python, the string type has a < operator that tells whether a string is lexicographically less than another. For example, "abracadabra" < "act" and "bar" < "bat" < "car" < "cat" < "char". Note that a longer string may be < than a shorter one. So comparing the lengths doesn't help much. Suppose, to complicate your life, that we are only allowed to use < with single characters, not strings For example, "a" < "z" and "b" < "c". (This is how < for strings is implemented internally in python, because computers basic instructions compare characters, not strings). Give a recursive function that given two strings s1 and s2 says whether one is less than the other using only character comparisons. For example, it returns 1 if s1 < s2, 0 if s1 == s2, and -1 if s1 > s2. Hint: Suppose you can answer this for strings of length no more than n-1. How do you solve the problem for strings of length n? In other words, consider one character of s1 and one character of s2, and decide based on these two characters and a recursive call with the rest of the strings. The characters can be the first ones, or the last ones. One choice leads to a good recursive solution, the other does not. Try. Additional: what is the running time or your solution? Linear or quadratic in the length of s1 or s2? (look closely...) If it is quadratic, how to you do it in linear time? *************************************************** The rest are extra exercises for you to practice towards the exam. Extra exercise 1: Here is an implementation of the Karatsuba-Ofman algorithm to multiply integers. The function KOprod calls the recursive function KOprodrec, with a parameter n that is the number of bits of x and y, whatever is larger. def KOprodrec(x, y, n): if (n == 1): return x*y # this is really the product of 1-bit integers else: k = n//2 x1 = x >> k x0 = x - (x1 << k) y1 = y >> k y0 = y - (y1 << k) a = KOprodrec(x1,y1,k) b = KOprodrec(x0,y0,k) c = KOprodrec(x0 + x1, y0 + y1, k) c = c - a - b return (a << (2*k)) + (c << k) + b def KOprod(x,y): return KOprodrec(x,y,max(x.bit_length(),y.bit_length())) Python instructions x<>k add k 0's and remove k lowest bits to x in binary (equivalent to *2^k and //2^k, but in linear time) 1. study the algorithm and make sure you understand it (use class slides) 2. run it on large pairs of integers (x,y); measure the time it takes as (x,y) grow. Do the times match what we said in class? Generate x and y at random having n approximately bits, and try n = 100, 200, 300, 400, ... o something like this, by controling the range of generation of the random numbers. 3. An optimization is as follows: instead of going down to n==1 (1-bit numbers) we could stop the recursion at larger n. For example, our CPU probably multiplies any two 16-bit numbers by hardware without overflow. This should make it faster because it avoids about 4 levels of recursion. Check it. *************************************************** Extra exercise 2: Mergesort Actually program it and run it. If you are brave, reprogram it to sort the lines in a file. You need a function split(file1,file2,file3) that given file1, splits it into file2 and file3. File2 contains the odd-numbered lines in file1, and file3 the even-numbered ones. Then reprogram the merge function that reads lines from two sorted files and created a file that contains the ordered merge of the two files. *************************************************** Extra exercise 3: Quicksort Consider the clever function def partition(lst, begin, end): pivot = begin for i in range(begin+1,end+1): if lst[i] <= lst[begin]: pivot += 1 lst[i], lst[pivot] = lst[pivot], lst[i] lst[pivot], lst[begin] = lst[begin], lst[pivot] return pivot It does the following: given a list l and two indices begin and end, it rearranges the elements in lst and returns a value i (the "pivot") such that Every element in lst[begin..i] is <= every element in lst[i+1...end] (but the elements within lst[begin..i] can be in any order, and the elements in lst[i+1..end] can be in any order; also, i can be anything between begin..end; it does not have to be at all near (begin+end)/2 For example, with lst = [7.6, 3.2, 8.0, 1.3, 2.2, 5.9, 6.8, 7.2], begin=0, end=len(lst)-1 (all the list) it returns pivot 6 and leaves lst = [7.2, 3.2, 1.3, 2.2, 5.9, 6.8, 7.6, 8.0] which is ok because every element in [7.2, 3.2, 1.3, 2.2, 5.9, 6.8] is smaller than every element in [7.6, 8.0]. With lst = [1.3, 7.6, 3.2, 8.0, 2.2, 5.9, 6.8, 7.2] it returns pivot = 0 and lst = [1.3, 7.6, 3.2, 8.0, 2.2, 5.9, 6.8, 7.2], which is ok because every element in [1.3] is smaller than every element in [7.6, 3.2, 8.0, 2.2, 5.9, 6.8, 7.2]. Observe that partition runs in time O(end-begin). Question 0. Understand why partition does what we claim. Question 1. Give a function quicksort(lst,begin,end) that sorts lst[begin..end], recursively. Think of a base case, and how to split the problem into two subproblems. Question 2. Think why we sort lst uses time O(n log n) with n=len(lst), IF we are so lucky that at every recursive call we have pivot = (begin+end)//2. (Hint: think mergesort). Question 3: Using the second of the examples provided, prove that the worst-case running time of quicksort is O(n^2). FACT: However, on randomly sorted lists, Quicksort is also O(n log n). You are not expected to see this right away; it is a complicated analysis. Conclusion: In the worst case quicksort is quadratic, but in average case it is O(n log n) - often with less comparisons than mergesort. Two good practices if you are going to use quicksort are: - Randomly permute lst before calling quicksort, to be quite sure you are in an "average" case. - Stop doing recursion for lists that are sufficiently small, and use insertion instead (for example, when len(lst) <= 5.