Your task is to develop a series of functions in Python, some of which we will develop together in class as examples. You must develop each of the requested functions (that is, you can't change the structure of the application -- each of the requested functions must exist in the program you turn in, and each must work as requested). Here are the requirements for each:
euclidean_dist: Takes two tuples of floating-point values as input (equal length) and determines the euclidean distance between them. Simple enough :)
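As a rough illustration of the requirement above, here is a minimal sketch of euclidean_dist. Raising ValueError on mismatched lengths is an assumption; the task only states that the inputs have equal length.

```python
import math

def euclidean_dist(a, b):
    """Return the Euclidean distance between two equal-length tuples of floats."""
    # Assumption: mismatched lengths are treated as an error.
    if len(a) != len(b):
        raise ValueError("tuples must have the same length")
    # Square root of the sum of squared per-attribute differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_dist((0.0, 0.0), (3.0, 4.0)))  # 5.0
```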
assign_initial_centroids: This function will have 2 parameters, k and data, where k is the number of centroids and data is a list of input tuples (data points, all floating point values). You may implement this any way you wish, as long as it returns a list of tuples with the appropriate length (the same number of floating point attributes as in the data set). Experiment!
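Since the task leaves the strategy open, one possible approach is to pick k distinct data points at random as the starting centroids. The sketch below assumes that strategy and assumes that k must be between 1 and the number of data points; neither is mandated by the task.

```python
import random

def assign_initial_centroids(k, data):
    """Return k starting centroids, each a tuple with as many attributes as the data."""
    # Assumption: k outside 1..len(data) is treated as an error.
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of data points")
    # Assumption: use k distinct positions in the data set, chosen at random.
    return random.sample(data, k)

points = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)]
print(assign_initial_centroids(2, points))
```

Other strategies (for example, random values within each attribute's range) would satisfy the requirement equally well.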
assign_tuples_to_clusters: This function will take a list of tuples representing the centroids and a list of tuples representing the original dataset. It will assign each data point in the data set to a cluster corresponding to one of the centroids (the closest centroid, according to the euclidean distance function). In the case of a tie, choose the first centroid in the list. It should return a list of lists of tuples, where each "sub list" represents a single cluster. Order is not important, but it may be helpful for you to maintain the same order as the clusters came in for debugging purposes.
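The assignment step described above could be sketched as follows. A compact euclidean_dist is repeated here only so the block runs on its own. Ties go to the first centroid because list.index returns the first matching position.

```python
import math

def euclidean_dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_tuples_to_clusters(centroids, data):
    """Return one list of points per centroid, in the same order as the centroids."""
    clusters = [[] for _ in centroids]
    for point in data:
        distances = [euclidean_dist(point, c) for c in centroids]
        # index() returns the first minimum, so a tie picks the earlier centroid.
        nearest = distances.index(min(distances))
        clusters[nearest].append(point)
    return clusters

centroids = [(0.0, 0.0), (10.0, 10.0)]
data = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]  # (5.0, 5.0) is equidistant from both
print(assign_tuples_to_clusters(centroids, data))
```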
derive_centroids_for_clusters: This function should simply take a list of lists of tuples representing the clustered input data, as returned by assign_tuples_to_clusters. There should be k lists (clusters) in this input, so this function should find a centroid for each list (as a tuple) and return a list of the centroids. Each centroid should hold, for every attribute, the average of that attribute over all tuples in its cluster.
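A minimal sketch of the centroid computation: zip(*cluster) groups the values attribute by attribute, and each group is averaged. The task does not say what to do with an empty cluster, so raising ValueError here is an assumption.

```python
def derive_centroids_for_clusters(clusters):
    """Return one centroid tuple per cluster, averaging each attribute."""
    centroids = []
    for cluster in clusters:
        # Assumption: an empty cluster is treated as an error.
        if not cluster:
            raise ValueError("cannot derive a centroid for an empty cluster")
        # zip(*cluster) yields one column of values per attribute.
        centroids.append(tuple(sum(col) / len(cluster) for col in zip(*cluster)))
    return centroids

print(derive_centroids_for_clusters([[(1.0, 2.0), (3.0, 4.0)], [(10.0, 10.0)]]))
# [(2.0, 3.0), (10.0, 10.0)]
```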
kmeans_cluster: This function should take an integer (k -- the number of clusters/centroids) and a list of tuples (the initial dataset). It should first call assign_initial_centroids to determine the initial set of k centroids to work with and print the results. It should then enter a loop in which it computes new clusters for these centroids (and prints the results) and then computes new centroids for the new clusters (and prints the results). It should terminate when the computed centroids do not change from the prior iteration.
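The loop described above might look like the sketch below. Compact stand-ins for the four helper functions are included only so the block runs on its own. Several details are assumptions: the random choice of initial centroids, returning the final clusters, the wording of the printed labels, and the expectation that no cluster ever ends up empty.

```python
import math
import random

def euclidean_dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_initial_centroids(k, data):
    # Assumption: start from k data points chosen at random.
    return random.sample(data, k)

def assign_tuples_to_clusters(centroids, data):
    clusters = [[] for _ in centroids]
    for point in data:
        distances = [euclidean_dist(point, c) for c in centroids]
        clusters[distances.index(min(distances))].append(point)
    return clusters

def derive_centroids_for_clusters(clusters):
    # Assumption: no cluster is empty; this stand-in does not guard against it.
    return [tuple(sum(col) / len(c) for col in zip(*c)) for c in clusters]

def kmeans_cluster(k, data):
    centroids = assign_initial_centroids(k, data)
    print("initial centroids:", centroids)
    while True:
        clusters = assign_tuples_to_clusters(centroids, data)
        print("clusters:", clusters)
        new_centroids = derive_centroids_for_clusters(clusters)
        print("centroids:", new_centroids)
        # Stop once an iteration leaves the centroids unchanged.
        if new_centroids == centroids:
            return clusters  # assumption: return the final clusters to the caller
        centroids = new_centroids

sample = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_clusters = kmeans_cluster(2, sample)
```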
Once all of these functions are written, you should develop a script that reads in a file named [url removed, login to view] from the local folder, which holds the input tuples in CSV format (no spaces will be included, for simplicity). You should hardcode k and process this file using the k-means algorithm, printing the resulting clusters.
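For the file-reading part, a sketch along these lines may help. The file name was removed from the posting, so INPUT_FILE is a hypothetical placeholder, as are the helper names parse_tuples and load_tuples and the value chosen for K.

```python
import csv

K = 3  # hardcoded as the task requests; the value 3 is an arbitrary assumption
INPUT_FILE = "input.csv"  # hypothetical placeholder; the real name is not given

def parse_tuples(lines):
    """Turn an iterable of CSV lines into a list of float tuples, skipping blank lines."""
    return [tuple(float(v) for v in row) for row in csv.reader(lines) if row]

def load_tuples(path):
    """Read the CSV file at path and return its rows as float tuples."""
    with open(path, newline="") as f:
        return parse_tuples(f)

print(parse_tuples(["1.0,2.0", "3.5,4.5"]))  # [(1.0, 2.0), (3.5, 4.5)]

# In the finished script, the loaded data would then be clustered and printed:
# data = load_tuples(INPUT_FILE)
# print(kmeans_cluster(K, data))
```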
I am an embedded systems engineer and have learned Python. Creating functions is a fairly easy subject, so this project shouldn't be any problem. Feel free to contact me if you have any questions.