BU Machine Learning Dataset Worksheet

 

Aim

The purpose of assignment 4 is to apply descriptive analysis to compare two or more groups, using the dataset your group preprocessed in assignment 3. This assignment is related to the term project. By combining assignments 2, 3, and 4, you should be able to set up research questions and answer them. Please note that this assignment is group work. You are expected to (1) select two or more groups (e.g., taking aspirin vs. not taking aspirin), (2) determine the type of variables (categorical vs. numerical), (3) select appropriate statistical tests (e.g., Chi-squared test, Fisher’s exact test, Student’s t-test, One-way ANOVA, Mann–Whitney test, Kruskal–Wallis test, and so on), and (4) apply the tests you selected in step 3. More details are explained below.
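As a rough illustration of step 3, the choice of test can be written as a small decision rule. This is only a sketch based on common conventions, not a rule stated in the assignment: the function name `suggest_test` and all of its parameters are hypothetical, and real data may call for a different choice (for example, when groups are paired).

```python
def suggest_test(dependent_is_numerical, n_groups,
                 is_normal=True, small_expected_counts=False):
    """Suggest a test name for comparing groups (hypothetical helper).

    dependent_is_numerical: True if the dependent variable is numerical.
    n_groups: number of groups created from the categorical variable.
    is_normal: whether the numerical data look normally distributed.
    small_expected_counts: whether the contingency table has small
        expected counts (only used for a categorical dependent variable).
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")

    if dependent_is_numerical:
        if n_groups == 2:
            return "Student's t-test" if is_normal else "Mann-Whitney test"
        return "One-way ANOVA" if is_normal else "Kruskal-Wallis test"

    # Categorical dependent variable: the comparison uses a contingency table.
    if small_expected_counts:
        return "Fisher's exact test"
    return "Chi-squared test"


# Example: numerical outcome, three groups, data that do not look normal.
print(suggest_test(True, 3, is_normal=False))
```

Whatever rule you use, the written rationale should state which branch applied to your data and why.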

Dataset

You need to use the dataset you created for assignment 3. Note that the groups to compare can be created only from categorical variables such as “RACETHNX”. If your dataset includes only numerical variables, you may need to add a new variable. It is fine to modify your previous notebook and create a new dataset that differs from your previous submission; data analysis is an iterative process.
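A minimal sketch of creating comparison groups from a categorical variable with pandas is shown here. The values are synthetic, and the column name `OUTCOME` is a hypothetical stand-in for your dependent variable; only `RACETHNX` comes from the assignment text. Your week 12 example code may do this differently.

```python
import pandas as pd

# Synthetic stand-in for the assignment 3 dataset (all values are made up).
df = pd.DataFrame({
    "RACETHNX": [1, 2, 1, 3, 2, 1, 3, 2],  # coded categorical variable
    "OUTCOME": [120.0, 340.5, 90.0, 410.0, 275.0, 60.0, 505.5, 310.0],
})

# Mark the coded column as categorical so it is not treated as a number.
df["RACETHNX"] = df["RACETHNX"].astype("category")

# Build one Series of outcome values per group, keyed by the group code.
groups = {
    code: subset["OUTCOME"]
    for code, subset in df.groupby("RACETHNX", observed=True)
}

# Group sizes are worth reporting before running any test.
group_sizes = df["RACETHNX"].value_counts().sort_index()
print(group_sizes)
```

Each entry of `groups` can then be passed to the statistical test you select.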

What to Do for the Assignment 4

  1. Practice with the week 12 and 13 example code.
  2. Find a categorical independent variable that may influence your dependent variable.
    • Create groups from this categorical variable (refer to week 12 example code).
  3. Determine the type of your variables (categorical vs. numerical).
    • Please check the codebook carefully to make sure your decision is correct.
  4. Based on step 3, select one of the appropriate tests (e.g., Chi-squared test, Fisher’s exact test, Student’s t-test, One-way ANOVA, Mann–Whitney test, Kruskal–Wallis test, and so on).
    • Please provide the rationale for your choice.
    • If your dependent variable is a numerical variable, you need to examine the normality of data distribution.
    • If your dependent variable is a categorical variable, you need to examine null transactions.
  5. Apply the test you selected in step 4.
    • Please present the test statistics (e.g., chi-squared statistic, t-statistic, and p-value) and interpret the results accordingly.
    • You may find that there is no difference between groups. That’s totally fine for assignment 4. You just need to properly select a test and interpret the result. In the project, you should iterate this process until you find clinically or socioeconomically meaningful variable(s).
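Steps 4 and 5 can be sketched with SciPy as follows. This is an illustration under stated assumptions rather than the course's reference solution: the data are synthetic, the 0.05 significance level is assumed, the Shapiro–Wilk test is one of several possible normality checks, and falling back to Fisher's exact test when any expected count is below 5 is a common rule of thumb, not a requirement from the assignment.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed significance level

# --- Numerical dependent variable, two groups (synthetic values) ---
rng = np.random.default_rng(0)
group_a = rng.normal(loc=50.0, scale=10.0, size=40)
group_b = rng.normal(loc=55.0, scale=10.0, size=40)

# Shapiro-Wilk: a small p-value suggests the sample is not normal.
shapiro_p = [stats.shapiro(g)[1] for g in (group_a, group_b)]
looks_normal = all(p > ALPHA for p in shapiro_p)

if looks_normal:
    num_test = "Student's t-test"
    num_stat, num_p = stats.ttest_ind(group_a, group_b)
else:
    num_test = "Mann-Whitney test"
    num_stat, num_p = stats.mannwhitneyu(
        group_a, group_b, alternative="two-sided"
    )
print(f"{num_test}: statistic={num_stat:.3f}, p-value={num_p:.4f}")

# --- Categorical dependent variable (synthetic 2x2 counts) ---
# Rows are the two groups; columns are the two outcome categories.
table = np.array([[30, 10],
                  [18, 22]])
chi2, chi2_p, dof, expected = stats.chi2_contingency(table)

if (expected < 5).any():
    # Small expected counts: use the exact test on the 2x2 table instead.
    cat_test = "Fisher's exact test"
    cat_stat, cat_p = stats.fisher_exact(table)
else:
    cat_test = "Chi-squared test"
    cat_stat, cat_p = chi2, chi2_p
print(f"{cat_test}: statistic={cat_stat:.3f}, p-value={cat_p:.4f}")
```

In both cases, a p-value below the chosen significance level is usually read as evidence that the groups differ; a larger p-value means the data do not show a difference, which is an acceptable result for this assignment.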

How to write

For assignment 4, I will not put constraints on the format. However, you need to clearly explain, at a high level, what you have done for the descriptive analysis. Also, please provide the rationale for your decisions and supporting evidence (e.g., test statistics) (refer to What to Do for the Assignment 4). You can write the report up to this point in a Microsoft Word file. You also need to submit the Jupyter notebook you used and the data file (csv).