1) A social researcher in a particular city wishes to obtain information on the number of children in households that receive welfare support. A random sample of 400 households is selected from the city welfare rolls. A check on welfare recipient data provides the number of children in each household.

a) Identify the population of measurements that is of interest to the researcher.

The number of children in each household in the city that receives welfare support (that is, every household on the city welfare rolls).

b) Identify the sample.

The numbers of children in the 400 households randomly selected from the city welfare rolls.

c) What characteristics of the population are of interest to the researcher?

The average number of children per household receiving welfare support.

2) The eight measurements that follow are furnace temperatures recorded on successive batches in a semiconductor manufacturing process (units are °F):

953 955 948 951 949 954 950 959

a) Calculate the sample mean and sample standard deviation.
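A minimal R sketch for part (a), using the eight temperatures listed above. It computes the sample mean and sample standard deviation with the built-in mean() and sd(), and recomputes the standard deviation from the definition s = sqrt(sum((x_i - x_bar)^2) / (n - 1)) as a check.

```r
# Furnace temperatures (degrees F) from the problem statement
temps <- c(953, 955, 948, 951, 949, 954, 950, 959)

n <- length(temps)
x_bar <- mean(temps)  # sample mean: sum of values / n
s <- sd(temps)        # sample standard deviation (divisor n - 1)

# Same standard deviation computed directly from the definition
s_manual <- sqrt(sum((temps - x_bar)^2) / (n - 1))

print(x_bar)     # 952.375
print(s)         # about 3.62
print(s_manual)  # matches sd()
```

With these data the sum of squared deviations is 91.875, so the sample variance is 91.875 / 7 = 13.125 and s = sqrt(13.125), roughly 3.62 °F.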

b) Find the sample median of the data.
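For part (b), the sample size n = 8 is even, so the median is the average of the 4th and 5th ordered values. A short R sketch that does this by hand and compares it with median():

```r
temps <- c(953, 955, 948, 951, 949, 954, 950, 959)

sorted <- sort(temps)
print(sorted)  # 948 949 950 951 953 954 955 959

# Even n: average the two middle ordered values (4th and 5th here)
n <- length(sorted)
med_manual <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2

print(median(temps))  # 952
print(med_manual)     # 952
```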

c) How much could the largest temperature measurement increase without changing the sample median?
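For part (c), the median of these eight values depends only on the 4th and 5th ordered values (951 and 953), so the largest reading, 959, can be increased by any amount without changing the median. A quick R check of that reasoning; the replacement value 1e6 is an arbitrary illustrative choice:

```r
temps <- c(953, 955, 948, 951, 949, 954, 950, 959)

# Replace the largest reading with an arbitrarily large (illustrative) value
bigger <- temps
bigger[which.max(bigger)] <- 1e6

print(median(temps))   # 952
print(median(bigger))  # still 952: only the 4th and 5th ordered values matter
print(mean(bigger))    # the mean, by contrast, changes a lot
```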

3) A survey on knee injuries recorded the type of each injury (A = meniscal tear, B = MCL tear, C = ACL tear, D = patella dislocation, E = PCL tear). The data are given in injury.csv.

a) Construct a bar chart for this data.

# install.packages("ggplot2")  # run once if ggplot2 is not yet installed
library(ggplot2)  # load ggplot2

injury_data <- read.csv("injury.csv")  # import the injury data (column Injury_Type)

ggplot(injury_data, aes(x = Injury_Type)) +
  geom_bar(fill = "skyblue") +
  labs(title = "Bar Chart of Injury Types", x = "Injury Type", y = "Frequency")
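If ggplot2 is not available, a base-R alternative is to tabulate the injury types with table() and pass the counts to barplot(). The vector below is a made-up stand-in for injury_data$Injury_Type, not the contents of injury.csv.

```r
# Hypothetical injury codes standing in for injury_data$Injury_Type
injury_type <- c("A", "C", "A", "B", "E", "A", "C", "D", "B", "A")

counts <- table(injury_type)  # frequency of each injury type
print(counts)

barplot(counts,
        main = "Bar Chart of Injury Types",
        xlab = "Injury Type", ylab = "Frequency",
        col = "skyblue")
```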

b) Construct a pie chart for this data.

injury_table <- table(injury_data$Injury_Type)  # counts per injury type
pie(injury_table, main = "Pie Chart of Injury Types", col = rainbow(length(injury_table)))  # one colour per slice

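A pie chart is easier to read when each slice shows its share. This sketch labels slices with percentages from prop.table(); the injury vector is again hypothetical, not the data in injury.csv.

```r
# Hypothetical injury codes standing in for injury_data$Injury_Type
injury_type <- c("A", "C", "A", "B", "E", "A", "C", "D", "B", "A")
counts <- table(injury_type)

# Relative frequencies as percentages, used to label the slices
pct <- round(100 * prop.table(counts), 1)
labels <- paste0(names(counts), " (", pct, "%)")

pie(counts, labels = labels,
    main = "Pie Chart of Injury Types",
    col = rainbow(length(counts)))
```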
4) A small part for an automobile rear-view mirror was produced on two different punch presses. In order to describe the distribution of the weights of those parts, a random sample was selected, and each piece was weighed in grams, resulting in the data set weight.csv
# import data from csv
weight_data <- read.csv("/Users/weijie/Documents/cs/data analysis and stats/weights.csv")
weights <- weight_data$weight

a) What is the average weight?

print(mean(weights))
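As a check on mean(), the sample mean is the sum of the observations divided by n. The sketch uses a short hypothetical weight vector because the values in weights.csv are not reproduced here.

```r
# Hypothetical weights in grams (not the values in weights.csv)
weights <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.3)

x_bar <- sum(weights) / length(weights)  # definition of the sample mean
print(x_bar)
print(all.equal(x_bar, mean(weights)))   # TRUE
```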

b) Find the variance of the weights.

print(var(weights))

# round the variance to 2 decimal places
print(round(var(weights), 2))
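var() returns the sample variance, which divides the sum of squared deviations by n - 1 rather than n. A sketch confirming this on a hypothetical weight vector (not weights.csv):

```r
weights <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.3)  # hypothetical weights in grams

n <- length(weights)
s2 <- sum((weights - mean(weights))^2) / (n - 1)  # sample variance, divisor n - 1
print(round(s2, 2))
print(all.equal(s2, var(weights)))  # TRUE: var() also divides by n - 1
```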

c) Find the five-number summary of this dataset.

print(quantile(weights))  # min, Q1, median, Q3, max
print(summary(weights))   # the same quartiles, plus the mean
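R has two closely related five-number summaries: quantile() (default type 7, also used by summary() and IQR()) and fivenum(), which returns Tukey's hinges instead of quartiles. For some sample sizes the two differ. A sketch with a hypothetical six-value vector (not weights.csv):

```r
weights <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.3)  # hypothetical weights in grams

print(quantile(weights))  # min, Q1, median, Q3, max (type 7 quartiles)
print(fivenum(weights))   # min, lower hinge, median, upper hinge, max
```

Here the type 7 quartiles are 9.95 and 10.275, while the hinges are 9.9 and 10.3; the minimum, median and maximum agree. Since IQR() uses the type 7 quartiles, quantile() is the summary consistent with part (d).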

d) Compute the interquartile range of the weights.

print(IQR(weights))
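IQR() is simply Q3 - Q1 with the default type 7 quartiles. A sketch verifying that on a hypothetical weight vector (not weights.csv):

```r
weights <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.3)  # hypothetical weights in grams

q <- quantile(weights, c(0.25, 0.75))  # Q1 and Q3 (type 7, same as IQR())
iqr <- unname(q[2] - q[1])
print(iqr)                           # 0.325 for this vector
print(all.equal(iqr, IQR(weights)))  # TRUE
```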

e) Draw a boxplot of the data with x-axis and y-axis labelled.

boxplot(weights, main="Boxplot of Weights", ylab="Weight (grams)", xlab="Automobile Parts")
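By default boxplot() extends the whiskers to the most extreme points within 1.5 times the hinge spread (range = 1.5) and draws anything beyond that individually. boxplot.stats() returns the same numbers without plotting. The vector below is hypothetical, with one deliberately extreme value added:

```r
# Hypothetical weights in grams; the last value is deliberately extreme
weights <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 12.9)

bstats <- boxplot.stats(weights)
print(bstats$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bstats$out)    # points drawn individually as outliers
```

Here the hinges are 10.0 and 10.4, so the fences sit at 10.0 - 0.6 = 9.4 and 10.4 + 0.6 = 11.0, and 12.9 is flagged.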

f) Draw a histogram of the data with x-axis and y-axis labelled.

hist(weights, main="Histogram of Weights", xlab="Weight (grams)", ylab="Frequency", col="lightgreen")
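hist() picks its bins with Sturges' rule by default. Passing plot = FALSE returns the bin edges and counts instead of drawing, which is handy for checking the histogram numerically. Sketch on a hypothetical vector (not weights.csv):

```r
weights <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.3)  # hypothetical weights in grams

h <- hist(weights, plot = FALSE)  # default breaks = "Sturges"
print(h$breaks)  # bin edges
print(h$counts)  # number of observations in each bin
```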
5) In an experiment designed to study the behaviour of certain individual cells that had been exposed to beryllium, the interdivision times (IDTs) of cells were determined for a large number of cells both in exposed (treatment) and unexposed (control) conditions. The data are given in IDT.csv.
# import data from csv
idt_data <- read.csv("/Users/weijie/Documents/cs/data analysis and stats/IDT.csv")
idts <- idt_data$Interdivision_Time

a) Construct a histogram of this data.

hist(idts, main="Histogram of Interdivision Times", xlab="Interdivision Time", ylab="Frequency", col="lightblue")

b) Calculate log10(IDT) for each observation.

log_idts <- log10(idts)  # base-10 logarithm of each interdivision time
print(log_idts)

c) Construct a histogram of the transformed data.

hist(log_idts, main="Histogram of log10(Interdivision Times)", xlab="log10(Interdivision Time)", ylab="Frequency", col="lightblue")

d) What is the effect of the transformation?
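One way to describe the effect numerically is to compare the sample skewness before and after taking logs: a log transformation pulls in a long right tail, so right-skewed data usually become more symmetric. sample_skewness is a hypothetical helper written here (moment estimator, no package needed), and the data are simulated lognormal values, not IDT.csv; they are used only because their logarithms are exactly normal in distribution.

```r
# Hypothetical helper: sample skewness as the average cubed standardized deviation
sample_skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))  # divisor n, as in the moment estimator
  mean((x - m)^3) / s^3
}

# Simulated right-skewed data (NOT the IDT data) for illustration
set.seed(1)
x <- rlnorm(200, meanlog = 3, sdlog = 0.5)

print(sample_skewness(x))         # positive: long right tail
print(sample_skewness(log10(x)))  # much closer to 0 after the log10 transform

# With the real data, compare sample_skewness(idts) and sample_skewness(log_idts)
```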