Assignment1collatedfeedback1.docx

General:

· Check that you included everything that was asked for in the report (not just the workbook). If you left out computing or discussing something, e.g. the p-values, I couldn’t give you marks for it!

Data description – univariate

· Things I looked for were

· A brief introduction to the variables: what do the quantities mean?

· Use of descriptive statistics to describe the main features of the data, e.g. IQR, CV, mean, median, quartiles, standard deviation (together with the empirical rule or Chebyshev’s theorem)

· A little discussion about outliers that were interesting or were removed

· Histograms, polygons, boxplots and/or normal probability plots and discussion of the data distribution shape

· If there was no observation in a data set for a country it should not be treated as zero. Also be suspicious of zeros that do appear in the raw data set – they might have ended up there in place of no observation.

· Be careful with units – state what the units of the variables are and keep using those units for things like mean, standard deviation etc. E.g. cereal yield is in tonnes per capita, not percentage.

· Left-skewed (negatively skewed) data has its peak near the upper end of the range of values and a long lower tail.

· A data set does not have to fall into {left skewed, symmetric/normal, right skewed}. There are many other variations (without specific names, you could just call it asymmetric for instance).

· It wasn’t necessary to transform a data set for the univariate analysis if it was skewed, only if it was so skewed or outliers were so extreme that boxplots etc. were not useful. It could of course be useful for bivariate analysis if one or both variables seem to be lognormal, because then you could still see a linear relation in the scatterplot using the logged variable(s) and linear correlation and regression would be valid.
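To make the univariate points above concrete, here is a minimal sketch in Python (standard library only) rather than the Excel workbook. The data values are hypothetical and are not taken from any assignment data set; `None` stands for "no observation", and the 1.5 × IQR fences are the usual boxplot convention, assumed here as the outlier rule.

```python
import statistics

# Hypothetical raw values for one variable; None marks "no observation".
raw = [2.1, 3.4, None, 2.8, 3.9, 12.5, None, 3.1, 2.6, 3.3]

# Drop missing observations instead of treating them as zero.
values = [v for v in raw if v is not None]

mean = statistics.mean(values)
median = statistics.median(values)
sd = statistics.stdev(values)            # sample standard deviation
cv = sd / mean                           # coefficient of variation (unitless)
q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1

# 1.5 x IQR fences: an assumed, conventional rule for flagging possible outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# What goes wrong if missing observations are counted as zeros.
mean_with_zeros = statistics.mean(0.0 if v is None else v for v in raw)

print(f"mean={mean:.2f} median={median:.2f} sd={sd:.2f} CV={cv:.2f} IQR={iqr:.2f}")
print("possible outliers:", outliers)
print(f"mean if missing were treated as zero: {mean_with_zeros:.2f}")
```

Note that the mean, median, standard deviation and IQR are all in the units of the variable, while the CV is unitless. Quartile conventions differ between tools, so values from this sketch may not match Excel exactly.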

Data description – bivariate

· Things I looked for were

· A scatterplot and associated discussion (not a line chart or bar chart with series side by side)

· The correlation coefficient being stated and interpreted

· Discussion about why there might or might not be a relationship

· You shouldn’t just rely on the correlation coefficient – if the scatterplot indicates almost no relationship then you should say it is doubtful there is an actual relationship

· If the scatterplot indicates there is a relationship, but it is more likely to be non-linear, you should mention this.

· Don’t just say there might be a third variable and leave it at that. Some discussion is important to show you know what this actually means.
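The point about non-linear relationships can be illustrated with a small sketch. It assumes a hypothetical pair of variables where y grows roughly exponentially with x; the numbers are made up for illustration, and `pearson_r` is a helper written for this example, not a function from the workbook.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is roughly exponential in x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.7, 7.4, 20.1, 54.6, 148.4, 403.4, 1096.6, 2981.0]

r_raw = pearson_r(x, y)                          # linear correlation on the raw values
r_log = pearson_r(x, [math.log(v) for v in y])   # linear correlation after logging y

print(f"r (raw y)   = {r_raw:.3f}")
print(f"r (log of y) = {r_log:.3f}")
```

The raw correlation understates a relationship that is strong but curved, which is why the scatterplot should be read alongside the coefficient. After logging y the relationship is close to linear, so linear correlation and regression become reasonable tools.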

Confidence intervals

· Things I looked for were

· Statement of each confidence interval in a sentence, in the context of the variable

· Discussion of why the confidence intervals were valid (appealing to CLT)

· Some said the sample mean was a good estimate because it was inside the confidence interval. Of course, the sample mean is at the center of the interval! This does not guarantee anything about the accuracy of the estimate.

· If you have dropped countries from the data set, the confidence interval only estimates the mean of the population you have sampled from, which may not be the same population that the countries you dropped come from. E.g. if you drop many African countries from the data set due to lack of data, then you should be cautious about saying that the confidence interval estimate for the mean still applies to African countries (i.e. it is only valid if you are quite sure there is no systematic difference, in the context of your variable, between such countries and those included in your sample).
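As a rough illustration of the confidence interval points above, the sketch below computes a 95% interval for a mean. It assumes a large-sample normal approximation (the CLT argument) with a z critical value; your workbook template may have used a t critical value instead, which gives a slightly wider interval. The sample is hypothetical.

```python
import math
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Large-sample (CLT-based) confidence interval for the population mean."""
    n = len(values)
    sample_mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # z critical value, e.g. about 1.96 for 95% confidence
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

# Hypothetical sample of 40 observations (not real country data).
sample = [50 + (i * 37) % 23 for i in range(40)]

lower, upper = mean_confidence_interval(sample)
sample_mean = statistics.mean(sample)
midpoint = (lower + upper) / 2

print(f"sample mean = {sample_mean:.2f}")
print(f"95% CI      = ({lower:.2f}, {upper:.2f})")
print(f"midpoint    = {midpoint:.2f}")
```

The midpoint of the interval is the sample mean by construction, so the sample mean lying inside the interval tells you nothing about how accurate the estimate is.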

P-values

· Things I looked for were

· Computation of two-tail p-values in the workbook

· An interpretation of the two-tail (not one-tail) p-value for each variable in terms of how likely it is to observe a sample estimate at least as far from the assumed population parameter as yours, if the assumed parameter were correct (not in terms of rejection of the null hypothesis or not).

· Don’t guess the parameter value to be equal to the sample mean, otherwise the p-value is guaranteed to be 1 (and that is just cheating!)
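The sketch below shows one way a two-tail p-value for a mean could be computed and why guessing the sample mean as the parameter is pointless. It assumes a large-sample z test (normal approximation); the workbook template may differ, e.g. by using the t distribution. The sample and the hypothesised value of 58 are hypothetical.

```python
import math
import statistics

def two_tail_p_value(values, hypothesised_mean):
    """Two-tail p-value for a mean, using a large-sample normal approximation."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    z = (statistics.mean(values) - hypothesised_mean) / standard_error
    # Probability of a test statistic at least this far from zero, in either direction
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

# Hypothetical sample of 40 observations (not real country data).
sample = [50 + (i * 37) % 23 for i in range(40)]

p_guess = two_tail_p_value(sample, hypothesised_mean=58)
p_cheat = two_tail_p_value(sample, hypothesised_mean=statistics.mean(sample))

print(f"p-value for a hypothesised mean of 58:     {p_guess:.3f}")
print(f"p-value when guessing the sample mean:     {p_cheat:.3f}")
```

When the hypothesised value equals the sample mean the test statistic is zero, so the two-tail p-value is 1 regardless of the data.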

Professional appearance

· The report should have been easy to read, well-structured, and include charts with titles and axis labels etc.

· You shouldn’t put bar charts of all the countries for each variable into the report. If you want to highlight certain countries, e.g. the top 5 and bottom 5, put just those into a bar chart.

· You also shouldn’t paste in whole descriptive statistics tables or give values with all the decimal places Excel gives you by default. A good rule of thumb is two or three significant figures. E.g. 542,997 becomes 543,000 and 0.024897874 becomes 0.025 or 0.0249. Also don’t paste in the confidence interval or p-value calculation templates. Managers don’t have time or understanding for technical output. They won’t be impressed; they may just be confused or think you are showing off. They will be impressed if you can boil that technical material down in such a way that they can understand the key points enough to know what decisions need to be made.
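For anyone producing figures outside Excel, the rounding rule of thumb above can be sketched as a small helper. `round_sig` is a hypothetical function written for this illustration; it is not part of any library.

```python
import math

def round_sig(x, sig=3):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0
    # Position of the leading digit decides how many decimal places to keep.
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)

print(f"{round_sig(542997, 3):,}")   # 543,000
print(round_sig(0.024897874, 2))     # 0.025
print(round_sig(0.024897874, 3))     # 0.0249
```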