AD699: Data Mining for Business Analytics
Individual Assignment #2

You will submit two files via Blackboard:

(1) Your write-up. This should be a PDF that includes your written answers to any questions that ask for written answers, along with the other things asked for in the prompt.

(2) Your R Script. This is the script that you will use to write your assignment. If you use R Markdown, you’ll submit an .Rmd file rather than a .R file.

As always, remember to take advantage of your available resources: We’ll have four live Q&A sessions next week, in addition to unlimited opportunities to schedule a Zoom session on any other day or time. For this assignment in particular, the video library can be quite helpful. As the course slogan says, “Get After It!”

For each step, your write-up should clearly display your code and your results. For any step in the prompt that includes a question, the question should be answered in written sentences.

This model will be used to predict the AVG_SALARY, per year, of a National Basketball Association (NBA) player’s contract. This assignment will not require any specific domain knowledge from outside of the dataset description, dataset, and prompt.

Main Topics: Simple Linear Regression & Multiple Linear Regression

Tasks:

● Simple Linear Regression: For this assignment, we will use the dataset nba_contracts.csv, which can be found on our class Blackboard page. Start by downloading this dataset.

1. Read the dataset into your environment in R.

2. Create a new variable called ppg. This new variable, which stands for “points per game,” will be created by dividing points by games played.

3. Let’s explore the relationship between points per game and average salary. Using ggplot, create a scatterplot that depicts average salary on the y-axis and ppg on the x-axis. Add a best-fit line to this scatterplot. What does this plot suggest about the relationship between these variables? Does this make intuitive sense to you? Why or why not?

4. Now, find the correlation between these variables. Then, use cor.test() to see whether this correlation is significant.

What is this correlation? Is it a strong one? Is the correlation significant?

5. Using your assigned seed value, create a data partition. Assign approximately 60% of the records to your training set, and the other 40% to your validation set. Keep in mind that a seed value has no relationship to the data itself — it’s just an arbitrary number.

6. Using your training set, create a simple linear regression model, with AVG_SALARY as your outcome variable and ppg as your input variable. Use the summary() function to display the results of your model.

7. What are the minimum and maximum residual values in this model?

a. Find the player whose salary generated the highest residual value. What was his actual salary? What did the model predict that it would be? How is the residual calculated from the two numbers that you just found?

b. Find the player whose salary generated the lowest residual value. What was his actual salary? What did the model predict that it would be? How is the residual calculated from the two numbers that you just found?

c. It might be unfair to say that the person in 7a is overpaid, or that the person in 7b is underpaid. Why might it be unfair to say this? (Note: You do *not* need to be a basketball fan, or to know about the NBA, in order to answer this. However, you should look at the dataset and the data description, and give this just a bit of thought before answering.) You can answer this question in 2-3 sentences.

8. What is the regression equation generated by your model? Make up a hypothetical input value and explain what it would predict as an outcome. To show the predicted outcome value, you can either use a function in R, or just explain what the predicted outcome would be, based on the regression equation and some simple math.

9. Using the accuracy() function from the forecast package, assess the accuracy of your model against both the training set and the validation set. For this answer, focus on the differences between the training and validation sets. To assess the model, focus mainly on RMSE and MAE.

10. How does your model’s RMSE compare to the standard deviation of average salary in the dataset? What can such a comparison teach us about the model?
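The regression steps above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data frame is synthetic (made-up values standing in for nba_contracts.csv), the seed 699 and the input value of 15 points per game are hypothetical, and the hand-written rmse() and mae() helpers stand in for the accuracy() function from the forecast package that the prompt asks for.

```r
# Synthetic stand-in for nba_contracts.csv; every value here is made up.
set.seed(699)  # hypothetical seed; use your assigned value instead
n <- 200
nba <- data.frame(
  PTS = round(runif(n, 50, 2000)),
  GP  = sample(20:82, n, replace = TRUE)
)
nba$ppg <- nba$PTS / nba$GP  # points per game = points / games played
nba$AVG_SALARY <- 1e6 + 8e5 * nba$ppg + rnorm(n, sd = 3e6)

# Correlation between ppg and salary, and a test of its significance
r  <- cor(nba$ppg, nba$AVG_SALARY)
ct <- cor.test(nba$ppg, nba$AVG_SALARY)

# 60/40 partition into training and validation sets
train_idx <- sample(seq_len(n), size = round(0.6 * n))
train <- nba[train_idx, ]
valid <- nba[-train_idx, ]

# Simple linear regression fitted on the training set only
fit <- lm(AVG_SALARY ~ ppg, data = train)
summary(fit)

# Residual = actual value - predicted value
res        <- residuals(fit)
top        <- which.max(res)            # row with the highest residual
actual_top <- train$AVG_SALARY[top]
pred_top   <- fitted(fit)[top]

# Prediction for a hypothetical input, via predict() and via the equation
pred_15 <- predict(fit, data.frame(ppg = 15))
by_hand <- coef(fit)[1] + coef(fit)[2] * 15

# Hand-written error metrics (stand-ins for forecast::accuracy())
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

metrics <- rbind(
  training   = c(RMSE = rmse(train$AVG_SALARY, fitted(fit)),
                 MAE  = mae(train$AVG_SALARY, fitted(fit))),
  validation = c(RMSE = rmse(valid$AVG_SALARY, predict(fit, valid)),
                 MAE  = mae(valid$AVG_SALARY, predict(fit, valid)))
)
metrics

# Benchmark: the spread of salary around its own mean
sd_salary <- sd(nba$AVG_SALARY)
```

The standard deviation is roughly the typical error you would make by always predicting the mean salary, so an RMSE well below sd_salary suggests that ppg adds predictive value, while an RMSE close to it suggests that it adds little.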

● K-Nearest Neighbors:

The model that we’ll build will aim to predict whether a college will have a high graduation rate. To answer this question, we will use the College dataset from the ISLR package in R. A description of this dataset can be found on our class Blackboard page, in the same folder where you found this assignment prompt.

1. Bring this dataset into your R environment. Once you have brought the ISLR package into your environment, you can do this with: > data(College)

2. We are going to build a classification model with Grad.Rate as our response variable. Call the str() function on your dataset and show the results.

a. What type of variable is Grad.Rate?

b. If Grad.Rate is not currently a factor, convert it into a factor by binning it. Use the median to create two levels for this factor: any records at or above the median should be labeled “High Rate” and any records below the median should be labeled “Low Rate.”

3. Are there any NAs in this dataset? Show the code that you used to find this out. If there are any NA values in any particular column, replace them with the median value for that column.

4. Creating two new features:

a. Create a new variable called ‘selective.’ Selective should be found by taking Accept divided by Apps. (Accept/Apps)

b. Create another new variable called ‘yield.’ Yield should be found by taking Enroll divided by Accept. (Enroll/Accept)

5. Using your assigned seed value, partition your entire dataset into training (60%) and validation (40%) sets.

6. Make up a fake college (yes, really!)

a. Give your college a name (there’s no R code needed here, and you won’t use the name when you run k-nn…but give the school a name anyway, and just write it here).

b. Use the runif() function to give your college values for each of these numeric predictor attributes: Expend, S.F.Ratio, perc.alumni, selective, and yield. Use the min and max values from your training set as the lower and upper boundaries for runif().

7. Normalize your data using the preProcess() function from the caret package. Use Table 7.2 from the book as a guide for this.

8. Using the knn() function from the FNN package, and using a k-value of 7, generate a predicted classification for your college. For your input variables, use Expend, S.F.Ratio, perc.alumni, selective, and yield. What outcome category was it predicted to belong to? Also, who were your college’s 7 nearest neighbors? How many of them were High Rate, and how many were Low Rate? Be sure to show their outcome classes in your write-up.

9. Use your validation set to help you determine an optimal k-value. Use Table 7.3 from the textbook as a guide here.

10. Using either the base graphics package or ggplot, make a scatterplot with the various k values that you used in step 9 on your x-axis, and the accuracy metrics on the y-axis.

11. Re-run your knn() function with the optimal k-value that you found previously. What result did you obtain? Was it different from the result you saw when you first ran the k-nn function? Also, what were the outcome classes for each of your college’s k-nearest neighbors? Be sure to show their outcome classes in your write-up.
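The k-nearest neighbors steps above can be sketched in base R roughly as follows. Several things here are assumptions made only so the example runs on its own: the data frame is synthetic (made-up values standing in for the College dataset), the seed 699 is hypothetical, z-score scaling with training-set means and standard deviations stands in for preProcess() from caret, and the small knn_predict() helper is a hand-written stand-in for knn() from FNN, not that package's implementation.

```r
# Synthetic stand-in for the College data; only the needed columns, made-up values.
set.seed(699)  # hypothetical seed; use your assigned value instead
n <- 300
college <- data.frame(
  Apps        = round(runif(n, 500, 20000)),
  Expend      = runif(n, 3000, 30000),
  S.F.Ratio   = runif(n, 5, 30),
  perc.alumni = round(runif(n, 0, 60)),
  Grad.Rate   = round(runif(n, 20, 100))
)
college$Accept <- round(college$Apps * runif(n, 0.3, 0.95))
college$Enroll <- round(college$Accept * runif(n, 0.1, 0.7))

# Count NAs per column, then fill any numeric NAs with the column median
colSums(is.na(college))
for (col in names(college)) {
  if (is.numeric(college[[col]]) && anyNA(college[[col]])) {
    college[[col]][is.na(college[[col]])] <- median(college[[col]], na.rm = TRUE)
  }
}

# Bin Grad.Rate at the median: at or above is "High Rate", below is "Low Rate"
college$Grad.Rate <- factor(ifelse(college$Grad.Rate >= median(college$Grad.Rate),
                                   "High Rate", "Low Rate"))

# New features
college$selective <- college$Accept / college$Apps
college$yield     <- college$Enroll / college$Accept

# 60/40 partition
train_idx <- sample(seq_len(n), size = round(0.6 * n))
train <- college[train_idx, ]
valid <- college[-train_idx, ]

# Fake college: one uniform draw per predictor, bounded by the training min and max
predictors <- c("Expend", "S.F.Ratio", "perc.alumni", "selective", "yield")
fake <- as.data.frame(lapply(train[predictors], function(x) runif(1, min(x), max(x))))

# Normalize everything with the training set's means and standard deviations
mu      <- sapply(train[predictors], mean)
sigma   <- sapply(train[predictors], sd)
train_z <- scale(train[predictors], center = mu, scale = sigma)
valid_z <- scale(valid[predictors], center = mu, scale = sigma)
fake_z  <- scale(fake[predictors],  center = mu, scale = sigma)

# Hand-written k-NN: Euclidean distance, k closest rows, majority vote
knn_predict <- function(train_x, train_y, new_x, k) {
  d     <- sqrt(colSums((t(train_x) - as.numeric(new_x))^2))
  nn    <- order(d)[seq_len(k)]
  votes <- table(train_y[nn])
  list(class = names(votes)[which.max(votes)], neighbors = nn)
}

result <- knn_predict(train_z, train$Grad.Rate, fake_z, k = 7)
result$class                        # predicted class for the fake college
train$Grad.Rate[result$neighbors]   # outcome classes of its 7 nearest neighbors

# Validation accuracy for a range of k values
ks  <- 1:15
acc <- sapply(ks, function(k) {
  pred <- apply(valid_z, 1, function(row) {
    knn_predict(train_z, train$Grad.Rate, row, k)$class
  })
  mean(pred == as.character(valid$Grad.Rate))
})
best_k <- ks[which.max(acc)]
```

Calling plot(ks, acc) would give the scatterplot of accuracy against k, and rerunning knn_predict() with k = best_k shows whether the fake college's predicted class changes. Because the synthetic predictors are unrelated to the outcome, the accuracies here hover around chance; the real data should behave differently.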

"NAME" - the name of the player
"CONTRACT_START" - the year that the contract started
"CONTRACT_END" - the last year of the contract
"AVG_SALARY" - the average salary, per year, for the duration of the contract
"AGE" - the player's age
"GP" - number of games played in the previous season (the season before the contract was signed)
"W" - number of wins for the player's team in the previous season
"L" - number of losses for the player's team in the previous season
"MIN" - total number of minutes played by the player in the previous season
"PTS" - total number of points scored by the player in the previous season
"FGM" - number of field goals made by the player in the previous season. A field goal is a "regular" shot in basketball.
"FGA" - number of field goals attempted by the player in the previous season
"FG." - percentage of field goals made by the player in the previous season
"X3PM" - 3-point field goals made by the player in the previous season (these shots are taken from far away, and are worth one more point than a shot from a closer range)
"X3PA" - 3-point field goals attempted by the player in the previous season
"X3P." - percentage of 3-point field goals converted by the player in the previous season
"FTM" - free throws made in the previous season (a free throw is worth 1 point, and is not contested)
"FTA" - free throws attempted in the previous season
"FT." - free throw percentage in the previous season
"OREB" - offensive rebounds in the previous season (this occurs when a player on one's own team attempts a shot and misses, but the player grabs the ball, so his team retains possession)
"DREB" - defensive rebounds in the previous season (same as above, but when the shot comes from the other team)
"REB" - total rebounds in the previous season
"AST" - total assists in the previous season (passes to another player on one's own team, who subsequently scores)
"TOV" - total turnovers in the previous season (times when a player loses the ball and the other team recovers)
"STL" - total steals in the previous season (times when the player takes the ball away from the other team)
"BLK" - total blocked shots in the previous season (times when the player prevents the other team from scoring)
"PF" - total personal fouls made by the player in the previous season (when a player illegally prevents an opponent from taking some action)
"X…" - the player's "plus/minus" score from the previous season (total points scored by the player's team when the player is ON the court, minus the total points scored by the opponent when the player is ON the court)

** If any definitions here do not seem clear, please ask about this during office hours and/or during class.
