Walkthrough: Modelling, dataset, Python, Java, C/C++

Predictive Modelling, Application to Bank Telemarketing

This case is based around a real-world dataset about telemarketing calls made by a Portuguese bank. You can find more information about this dataset here: https://archive.ics.uci.edu/ml/datasets/bank+marketing

The bank is interested in a predictive model because it will allow them to call the right customers at the right times. From an analytics perspective, the primary distinguishing feature in this case is that solving a predictive problem is directly useful to the firm.

Please hand in both documents and outputs, preferably using Rmarkdown.

Basic Explanatory Analysis

1. Load the data contained in the file data_telebank.csv and name the variable dta_bank.
2. In one sentence, describe the variables in each column, paying special attention to:
   a. Type of variable (categorical/numerical) and its units (for the numerical ones only).
   b. For the numerical ones, study whether they have outliers. There is no single definition of an outlier, so we define an outlier as any observation with a value that is more than 4 times its standard deviation.

The variable at the center of our study is y, which indicates whether the household actually decided to join the bank. We will see how the predictive modeling techniques seen in class can improve the efficiency of marketing phone calls.

3. Create a corr-plot using the package corrplot. You will have to install it using the command install.packages().
4. Run the following command: lm(y~.,data=dta_bank)
   a. Write the structural equation that R is estimating.
   b. Comment on the results.
      i. Best time to perform telemarketing tasks?
      ii. Best income groups?
      iii. Potential concerns of omitted variable bias?

Predictive Modeling and Tuning

This is a predictive modeling exercise, and we have seen in class that we always divide the dataset into dta_bank_training, dta_bank_validating, dta_bank_test.

1. Explain (in sentences) why and how we always do that.
2. From the point of view of the firm, and given that we are running a predictive exercise, is there any variable that should not be included as X? If yes, please drop it.
3. Explain the problems of overfitting and underfitting.
4. Explain the meaning of the no free lunch theorem.
5. For the following 4 models, write their structural equations and comment:
   lm1 = lm(y~age+factor(month), data=)
   lm2 = lm(y~age+age^2+age^3+factor(month),data=)
   lm3 = lm(y~., data=)
   lm4 = lm(y~.^2, data=)
   a. Which one overfits more?
   b. Which one underfits more?
   c. Is the model that fits the training data best also the one with the best predictive power?
   d. Can we use a confusion matrix to analyze a problem of underfitting?
   e. Which data set should we use to run these regressions?

Improving the predictive power

1. Make a visualization to inspect the relationship between the Y and each of the X that you have included in the regressions above.
   a. Does it look linear?
2. Use the other predictive methods seen in class (like NB classifiers or KNN) to check if you can improve the performance.
3. Do they make it better? Worse?

Causal Questions

1. When we study causality we always focus on the parameters multiplying the X variables instead of the predictive capacity of the model. We then give a causal interpretation to the estimated coefficients.
   a. Explain when, in marketing, a causal analysis is preferable to a predictive analysis.
   b. In the context of a linear regression, explain the concept of a biased estimate.
2. Which of the variables could be interesting to analyze from a causal point of view? Give examples.
3. For those variables, what would be the potential omitted variables problem?

Source: http://www.6daixie.com/contents/13/4595.html
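
The three-way split named in the Predictive Modeling and Tuning section can be sketched as follows. The assignment itself is written for R; this is only a minimal Python sketch using the standard library. The function name split_rows, the 60/20/20 proportions, the seed, and the stand-in row ids are all assumptions made for illustration and do not come from the assignment.

```python
import random


def split_rows(rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle the rows once, then cut them into training, validation and test parts.

    The fractions and the seed are illustrative assumptions, not values
    prescribed by the assignment.
    """
    if train_frac <= 0 or valid_frac <= 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")

    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle makes the split reproducible

    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)

    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains after the first two cuts
    return training, validating, test


# Hypothetical stand-in for the rows of data_telebank.csv: 100 row ids.
dta_bank = list(range(100))
dta_bank_training, dta_bank_validating, dta_bank_test = split_rows(dta_bank)
print(len(dta_bank_training), len(dta_bank_validating), len(dta_bank_test))
```

Shuffling before cutting matters when the file is ordered, for example by call date, because a plain head/tail cut would then put different periods in each part. Fixing the seed keeps the three parts identical across runs, so results from different models stay comparable.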
