Programming and Algorithms II
Degree in Bioinformatics
Fall 2018

***************************************************
***  Lab 1:                                     ***
***  Running time of sorting algorithms         ***
***  Speeding up programs using dictionaries    ***
***************************************************

Part 1:

The goal of this lab is to experience the difference between quadratic
and subquadratic running times. We will use three sorting algorithms as
an example.

Build a program that

1. generates a list of random integers of a given length N
2. sorts the list using a sorting algorithm
3. prints the time used by the process.

Then run the program for different sorting algorithms and different N's.
Use at least:

1. selection sort (which you must program)
2. insertion sort (which you must program)
3. the sort method built into Python

To generate a list of random integers you can use the following Python
code:

    import random

    def randomList(n, d):
        "returns a list of n random integers in [0..d-1]"
        L = []
        for i in range(0, n):
            # randint(a, b) returns a random integer in [a..b]
            L.append(random.randint(0, d - 1))
        return L

To time a piece of code, you can use the time module:

    import time

    startTime = time.time()
    # do your thing here ...
    endTime = time.time()
    print(endTime - startTime)

There is also a timeit module. Give timeit.timeit the instruction you
want to time:

    import timeit

    t = timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)
    print(t)

Then make a plot with list size on the x axis and running time on the
y axis. Use list sizes large enough that e.g. selection sort takes on
the order of 2-3 minutes. You don't have to try every single list size
up to that point; try for example powers of 2. On my laptop, selection
sort of a list with about 16000 elements takes about 1 minute. Can you
guess how long it would take for one with 1 million elements?
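As one possible starting point, here is a minimal sketch of the two sorts you must program, plus a small timing harness built on time.time() as above. The function names (random_list, time_sort, etc.) are my own choices, not part of the assignment:

```python
import random
import time

def random_list(n, d):
    """Return a list of n random integers in [0, d-1]."""
    return [random.randint(0, d - 1) for _ in range(n)]

def selection_sort(lst):
    """Sort lst in place: repeatedly select the minimum of the unsorted tail."""
    for i in range(len(lst)):
        m = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[m]:
                m = j
        lst[i], lst[m] = lst[m], lst[i]

def insertion_sort(lst):
    """Sort lst in place: insert each element into the sorted prefix."""
    for i in range(1, len(lst)):
        x = lst[i]
        j = i - 1
        while j >= 0 and lst[j] > x:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = x

def time_sort(sort_fn, n, d=1000):
    """Return the seconds sort_fn takes on one random list of length n."""
    lst = random_list(n, d)
    start = time.time()
    sort_fn(lst)
    return time.time() - start

if __name__ == "__main__":
    # try doubling sizes; list.sort times Python's built-in sort
    for n in [500, 1000, 2000]:
        print(n, time_sort(selection_sort, n),
              time_sort(insertion_sort, n),
              time_sort(list.sort, n))
```

This is only a sketch: for the actual lab you should grow n until selection sort takes minutes, and collect the (n, time) pairs for your plot.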
Of course, it's nicer if you write a program that tries all these sizes
one after the other, rather than calling the base program by hand many
times.

Can you see the difference between the O(n^2) and O(n log n) sorts?
Take into account that if you choose sizes that are too small, the time
to start up the program may dominate the overall running time (like an
additive O(n) term).

--------

Part 2:

Program Solutions 1, 2, 3, 4 to the keeping-counts problem. If not all
four, program at least 1 and 4.

- Solution 1 keeps an unordered table and searches for elements
  linearly. You may need an ad-hoc function for linear search that
  returns a position or -1, and that "knows" that the list contains
  pairs (x, c) when searching for x.

- Solution 2 keeps an ordered table and searches for elements using
  binary search. You may need an ad-hoc binary-search function for
  lists of pairs (x, c) that compares only x, and a function that
  inserts the pair (x, 1) into a list of pairs, which is similar to a
  piece of insertion sort. (On your first try, you'll probably get an
  index-out-of-bounds error; remember that insertion sort moves "up" an
  element that is already part of the list, while here you are adding a
  new one.)

- Solution 3 sorts the list, then "compresses" all occurrences of every
  x in the sorted list into a single pair (x, c) in the new list.

- Solution 4 uses a dictionary. Note that in Python 3, to convert a
  dictionary into a list you write

      lst = list(d.items())

  (an explicit conversion to list is necessary).

Now run your four solutions for random lists of size n = 1, 2, 4, ...
up to where you can. Start with few distinct elements in every list
(d = 10, for example). On my laptop, with d = 10, the slowest solution
takes about 12 seconds for n = 1 million or so. Do the methods scale in
time as predicted by the theory?

Now, we also know that Solutions 3 and 4 are pretty much independent of
d = the number of distinct elements in the list = the size of the
resulting list.
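As an illustration of the two mandatory solutions, here is a hedged sketch of Solutions 1 and 4 (the function names are my own; each returns a list of (x, c) pairs):

```python
def counts_linear(lst):
    """Solution 1: unordered table of (x, c) pairs, linear search."""
    table = []
    for x in lst:
        pos = -1                       # linear search returning position or -1
        for i, (y, c) in enumerate(table):
            if y == x:
                pos = i
                break
        if pos == -1:
            table.append((x, 1))       # first occurrence of x
        else:
            y, c = table[pos]
            table[pos] = (y, c + 1)
    return table

def counts_dict(lst):
    """Solution 4: dictionary mapping element -> count."""
    d = {}
    for x in lst:
        d[x] = d.get(x, 0) + 1
    return list(d.items())             # explicit conversion, as noted above
```

Solutions 2 and 3 follow the descriptions above; only the search and insertion machinery changes, not the interface.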
But the running times of Solutions 1 and 2 do depend on d. Now fix, for
example, n = 500,000 and try d = 10, 20, 40, ... up to where you can,
and see whether you observe this dependence in the running time.

Throughout the lab: aim for clean code. Use functions a lot. Document
them.
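The d-sweep above can be driven by a small loop; here is a minimal sketch (names are my own, and the dictionary-based counter stands in for whichever of your four solutions you want to measure):

```python
import random
import time

def random_list(n, d):
    """Return a list of n random integers in [0, d-1]."""
    return [random.randint(0, d - 1) for _ in range(n)]

def counts_dict(lst):
    """Dictionary-based counting; swap in each of your solutions here."""
    d = {}
    for x in lst:
        d[x] = d.get(x, 0) + 1
    return list(d.items())

def sweep_d(count_fn, n, d_values):
    """Return [(d, seconds)] pairs for count_fn on random lists of fixed length n."""
    results = []
    for d in d_values:
        lst = random_list(n, d)
        start = time.time()
        count_fn(lst)
        results.append((d, time.time() - start))
    return results

if __name__ == "__main__":
    # small n for the demo; the lab suggests n = 500,000 and doubling d
    for d, t in sweep_d(counts_dict, 100_000, [10, 20, 40]):
        print(d, t)
```

With Solution 1 or 2 plugged in, the times should grow with d; with Solutions 3 and 4 they should stay roughly flat.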