
INF 552 Homework 5

1. Multi-Class and Multi-Label Classification Using Support Vector Machines

(a) Download the Anuran Calls (MFCCs) Data Set from
    https://archive.ics.uci.edu/ml/datasets/Anuran+Calls+%28MFCCs%29. Choose 70% of
    the data randomly as the training set.

(b) Each instance has three labels: Families, Genus, and Species, and each of the
    labels has multiple classes. We wish to solve a multi-class and multi-label
    problem. One of the most important approaches to multi-class classification is
    to train a classifier for each label. We first try this approach (illustrative
    code sketches for items i-iv follow the footnotes below):

    i.   Research the exact match and Hamming score/loss methods for evaluating
         multi-label classification, and use them to evaluate the classifiers in
         this problem.
    ii.  Train an SVM for each of the labels, using Gaussian kernels and
         one-versus-all classifiers. Determine the weight of the SVM penalty and
         the width of the Gaussian kernel using 10-fold cross-validation.[1] You
         are welcome to try to solve the problem with both standardized[2] and raw
         attributes and report the results.
    iii. Repeat 1(b)ii with L1-penalized SVMs.[3] Remember to standardize[4] the
         attributes. Determine the weight of the SVM penalty using 10-fold
         cross-validation.
    iv.  Repeat 1(b)iii using SMOTE, or any other method you know, to remedy class
         imbalance. Report your conclusions about the classifiers you trained.
    v.   Extra practice: Study the Classifier Chain method and apply it to the
         above problem.
    vi.  Extra practice: Research how confusion matrices, precision, recall, ROC,
         and AUC are defined for multi-label classification, and compute them for
         the classifiers you trained above.

Footnotes:

[1] How to choose parameter ranges for SVMs? One can use wide ranges for the
    parameters and a fine grid (e.g., 1000 points) for cross-validation; however,
    this method may be computationally expensive. An alternative is to train the
    SVM with very large and very small parameters on the whole training data and
    find the extreme values for which the training accuracy does not fall below a
    threshold (e.g., 70%). Then one can select a fixed number of parameters (e.g.,
    20) between those points for cross-validation. For the penalty parameter, one
    usually considers increments in log(λ). For example, if the accuracy of the
    support vector machine does not fall below 70% for λ = 10^{-3} and λ = 10^{6},
    one chooses log10(λ) ∈ {-3, -2, ..., 4, 5, 6}. For the Gaussian kernel
    parameter, one usually chooses linear increments, e.g., σ ∈ {0.1, 0.2, ..., 2}.
    When both σ and λ are to be chosen using cross-validation, combinations of very
    small and very large λ's and σ's that keep the accuracy above the threshold
    (e.g., 70%) can be used to determine the ranges for σ and λ. Please note that
    these are very rough rules of thumb, not general procedures.
[2] It seems that the data are already normalized.
[3] The convention is to use the L1 penalty with a linear kernel.
[4] It seems that the data are already normalized.
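For item i, a minimal sketch of the two metrics, assuming Y_true and Y_pred are
(n_samples, 3) NumPy arrays holding the true and predicted Family/Genus/Species
labels. The names and the "1 - loss" convention for the Hamming score are
illustrative; the Sorower survey cited in footnote [5] discusses variants.

```python
import numpy as np

def exact_match(Y_true, Y_pred):
    """Fraction of samples whose entire (Family, Genus, Species) triplet is correct."""
    return np.mean(np.all(Y_true == Y_pred, axis=1))

def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label slots, over all samples, that are wrong."""
    return np.mean(Y_true != Y_pred)

def hamming_score(Y_true, Y_pred):
    """Complement of the Hamming loss: fraction of label slots predicted correctly."""
    return 1.0 - hamming_loss(Y_true, Y_pred)
```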
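For item ii, one possible scikit-learn setup: a one-versus-rest RBF SVM per label,
with a 10-fold grid search over the penalty C (log-spaced, per footnote [1]) and
the kernel width gamma (scikit-learn parameterizes the Gaussian kernel as
gamma = 1/(2σ²)). The file and column names follow the UCI release of the data set
(Frogs_MFCCs.csv with label columns Family, Genus, Species and an identifier
RecordID); adjust them to your copy, and widen the deliberately small grids as
footnote [1] suggests.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

LABELS = ["Family", "Genus", "Species"]

# 22 MFCC features per instance; drop the labels and the record identifier.
data = pd.read_csv("Frogs_MFCCs.csv")
X = data.drop(columns=LABELS + ["RecordID"]).values
Y = data[LABELS].values

# 70% of the data, chosen randomly, as the training set.
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, train_size=0.7, random_state=0)

# Coarse grids to keep the sketch cheap; footnote [1] describes how to widen them.
param_grid = {
    "ovr__estimator__C": np.logspace(-3, 6, 10),        # log-spaced penalty weights
    "ovr__estimator__gamma": np.linspace(0.1, 2.0, 5),  # linear increments in width
}

models = {}
for j, label in enumerate(LABELS):            # one classifier per label
    pipe = Pipeline([
        ("scale", StandardScaler()),          # drop this step to try raw attributes
        ("ovr", OneVsRestClassifier(SVC(kernel="rbf"))),
    ])
    search = GridSearchCV(pipe, param_grid, cv=10, n_jobs=-1)
    search.fit(X_tr, Y_tr[:, j])
    models[label] = search.best_estimator_

Y_pred = np.column_stack([models[l].predict(X_te) for l in LABELS])
```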
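For item iii, the same per-label loop with an L1-penalized linear SVM (footnote
[3]); only C is cross-validated, and the attributes are standardized inside the
pipeline. This reuses X_tr, Y_tr, and LABELS from the previous sketch; LinearSVC
handles multi-class targets one-vs-rest on its own.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

l1_models = {}
for j, label in enumerate(LABELS):
    pipe = Pipeline([
        ("scale", StandardScaler()),
        # The L1 penalty requires the primal formulation (dual=False).
        ("svc", LinearSVC(penalty="l1", dual=False, max_iter=10000)),
    ])
    search = GridSearchCV(pipe, {"svc__C": np.logspace(-3, 6, 10)},
                          cv=10, n_jobs=-1)
    search.fit(X_tr, Y_tr[:, j])
    l1_models[label] = search.best_estimator_
```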
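For item iv, one option is SMOTE from the third-party imbalanced-learn package;
its Pipeline variant applies the oversampling only to the training folds inside
cross-validation, so no synthetic points leak into the validation folds. Again a
sketch reusing the names above.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

smote_models = {}
for j, label in enumerate(LABELS):
    pipe = ImbPipeline([
        ("scale", StandardScaler()),
        ("smote", SMOTE(random_state=0)),  # lower k_neighbors if a class is very rare
        ("svc", LinearSVC(penalty="l1", dual=False, max_iter=10000)),
    ])
    search = GridSearchCV(pipe, {"svc__C": np.logspace(-3, 6, 10)},
                          cv=10, n_jobs=-1)
    search.fit(X_tr, Y_tr[:, j])
    smote_models[label] = search.best_estimator_
```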
2. K-Means Clustering on a Multi-Class and Multi-Label Data Set

Monte-Carlo simulation: perform the following procedure 50 times, and report the
average and the standard deviation of the 50 Hamming distances that you calculate
(code sketches for this problem appear at the end of the assignment).

(a) Use k-means clustering on the whole Anuran Calls (MFCCs) Data Set (do not
    split the data into train and test, as we are not performing supervised
    learning in this exercise). Choose k ∈ {1, 2, ..., 50} automatically, based on
    one of the methods provided in the slides (CH index, Gap Statistics, scree
    plots, or Silhouettes) or any other method you know.

(b) In each cluster, determine which family is the majority by reading the true
    labels. Repeat for genus and species.

(c) Now for each cluster you have a majority label triplet (family, genus,
    species). Calculate the average Hamming distance, Hamming score, and Hamming
    loss[5] between the true labels and the labels assigned by the clusters.

3. ISLR 10.7.2

4. Extra practice: the rest of the problems in ISLR 10.7.

[5] Research what these scores are. For example, see the paper "A Literature
    Survey on Algorithms for Multi-label Learning" by Mohammad Sorower.
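For 2(a), a sketch that selects k by maximizing the average silhouette width
(silhouettes are undefined at k = 1, so the scan starts at 2); any of the other
listed criteria would slot in the same way. X and Y are the full feature and label
matrices from the Problem 1 sketches, with no train/test split.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_kmeans(X, k_max=50, random_state=None):
    """Fit k-means for k = 2..k_max and keep the fit with the best silhouette.
    Scanning all 49 values of k on the full data set is slow; shrink k_max
    while prototyping."""
    best_score, best_km = -1.0, None
    for k in range(2, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        score = silhouette_score(X, km.labels_)
        if score > best_score:
            best_score, best_km = score, km
    return best_km
```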
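For 2(b)-(c) and the Monte-Carlo loop, a sketch under the same assumptions: each
cluster is assigned its majority (family, genus, species) triplet, the Hamming
distance of a sample counts how many of its three true labels differ from that
triplet, and the procedure is repeated 50 times with different seeds. Under the
same convention as in Problem 1, the Hamming loss is this distance divided by 3
and the Hamming score is its complement.

```python
from collections import Counter
import numpy as np

def avg_hamming_distance(cluster_labels, Y):
    """Average number of mismatched labels (out of 3) per sample, comparing each
    sample's true triplet against its cluster's majority triplet."""
    mismatches = 0
    for c in np.unique(cluster_labels):
        members = Y[cluster_labels == c]
        majority = np.array([Counter(members[:, j]).most_common(1)[0][0]
                             for j in range(Y.shape[1])])
        mismatches += np.sum(members != majority)
    return mismatches / len(Y)

# 50 Monte-Carlo runs; report the average and the standard deviation.
dists = [avg_hamming_distance(best_kmeans(X, random_state=run).labels_, Y)
         for run in range(50)]
print("mean Hamming distance:", np.mean(dists))
print("std  Hamming distance:", np.std(dists))
```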
