Individual Term Project – Data Analysis Project 30%

Each student will, individually, analyze the Individual Project Data Set W2022 Final.xlsx file.

The dataset is from Houston Dynamo FC and includes data from ticket sales and social media content for a match that was held in March 2019 between Houston Dynamo FC and the Vancouver Whitecaps. The Dynamo analytics team is looking for information regarding the relationship between social media content, ticket sales, and date of purchase, as well as any differences in ticket sales by date of transaction. You have been hired as a consultant to review the dataset and provide a summary report to the Houston Dynamo executive team**.

• Write the Report (include a creative title page for your report and an executive summary that outlines what the report will cover)

a. Data Summaries:

• In an opening section with a heading entitled “Cleaning the Data”, summarize any outliers within the variables associated with Column F (Facebook Content Impressions) and Column G (Facebook Engagement).

o Remove the rows that include outliers in Columns F and G, and

o Describe the outliers (by Date of Impression or Engagement) that were removed from these columns and how you identified them as outliers.

o Maximum 1 page, single spaced. 6 marks
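
The assignment does not prescribe a tool or a specific outlier rule. As one possible illustration, the sketch below flags outliers with the 1.5 × IQR rule, which is an assumption here and not a stated course requirement. The dates and impression values are made up and are not taken from the project data set; the function name iqr_outlier_bounds is hypothetical.

```python
import statistics

def iqr_outlier_bounds(values, k=1.5):
    """Return (lower, upper) fences using the k * IQR rule (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical (date, impressions) rows, NOT from the project file.
rows = [
    ("2019-02-01", 1200),
    ("2019-02-02", 1350),
    ("2019-02-03", 1280),
    ("2019-02-04", 1310),
    ("2019-02-05", 9800),
    ("2019-02-06", 1250),
    ("2019-02-07", 1400),
]

impressions = [value for _, value in rows]
low, high = iqr_outlier_bounds(impressions)

# Keep the date with each flagged value so the outlier can be described by date.
outliers = [(date, value) for date, value in rows if value < low or value > high]
kept = [(date, value) for date, value in rows if low <= value <= high]

print("Fences:", low, high)
print("Removed:", outliers)
```

Whatever rule is used, the report should state it explicitly, since the rubric asks how the outliers were identified.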

• Following the removal of outliers, create and present histogram visual summaries of columns E-J and column M;

o Approximately 3-7 pages depending on how the histograms are formatted. Make sure all values can be seen when formatting to fit on the page.

7 marks

• Following the cleaning of data, calculate mean and standard deviation values for columns E-J and column M, and frequency statistics for columns L and N.

o Present these data in two tables, one for means and standard deviations for each variable in columns E-J, and

o One for the frequency scores for columns L and N. 16 marks
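
For readers working outside a spreadsheet, the following is a minimal sketch of the two kinds of summary requested above: mean and standard deviation for a numeric column, and frequency counts for a categorical column. The values and category labels are hypothetical placeholders, and the sample (n - 1) standard deviation is assumed.

```python
import statistics
from collections import Counter

# Hypothetical numeric column (e.g. a price-like variable), NOT project data.
ticket_prices = [25.0, 30.0, 30.0, 45.0, 60.0]

# Hypothetical categorical column, NOT project data.
categories = ["Weekday", "Weekend", "Weekday", "Weekday", "Weekend"]

mean_price = statistics.mean(ticket_prices)
sd_price = statistics.stdev(ticket_prices)  # sample SD, divides by n - 1

print(f"Mean = {mean_price:.2f}, SD = {sd_price:.2f}")

# Frequency table: count and percentage for each category.
freq = Counter(categories)
total = sum(freq.values())
for label, count in freq.most_common():
    print(f"{label}: n = {count}, {count / total:.1%}")
```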

• Conclude this section by outlining three main difference test hypotheses and discuss if correlation analyses can be run.

o Three hypotheses related to difference tests you plan to assess (e.g., variables you will assess to see if there is a mean difference) and

o A conclusion on whether correlation analyses can be run with this data.

o Approximately 1-2 pages, single spaced.

5 marks

Part A Total: /34 marks

BEGIN HYPOTHESIS TESTING AND DATA ANALYSIS

b. Present Comparison Statistics:

• Start this section with an opening paragraph that identifies the columns of data you wish to test for THREE differences in mean (as an example, will you test differences in ticket price by certain date categories? You can justify trying to test three different date categories against one another).

o Use textbook terminology to outline why a difference test is appropriate with the variables you have chosen.

6 marks

• Provide a brief summary of your three difference test results. Marks will be allocated according to whether the appropriate test was run and the appropriate statistics are reported for each difference you have identified:

o Make sure to identify if the difference between mean scores is significant based on your p value range,

o If the null hypothesis is rejected, and

o Any meaningful variable connections to the mean difference test;

▪ Approximately 1-2 pages.

▪ Calculations will appear as appendices

9 marks
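
Which difference test is appropriate depends on the variables chosen. As one hedged example, the sketch below computes the test statistic and degrees of freedom for Welch's t-test, which compares the means of two independent groups without assuming equal variances. The choice of Welch's test, the function name welch_t, and the two groups of values are assumptions for illustration only. The p value would then come from a t table or statistical software.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)

    # Squared standard error contributed by each group.
    se2_a, se2_b = var_a / n_a, var_b / n_b

    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical ticket prices for two date categories, NOT project data.
early_purchases = [20, 22, 24, 26, 28]
late_purchases = [30, 32, 34, 36, 38]

t_stat, df = welch_t(early_purchases, late_purchases)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```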

• The results in this section will also be graded on how they are presented. Make sure the results are clear and formatted in a way that looks professional.

3 marks

Part B Total: /18 marks

c. Acknowledging Association Statistics:

• An organization may be interested in correlation analyses to determine if social media data is associated with ticket price purchases. Using scatterplots, provide a summary of the associations between Columns E-J and Column M; six total scatterplots.

6 marks

o From the visual representation of this data, what can we say about running correlation analyses on these variables?

2 marks
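
This part is graded on the visual reading of the scatterplots. For context only, the sketch below shows how a Pearson correlation coefficient would be computed if the scatterplot suggested a roughly linear relationship. The function name pearson_r and the paired values are hypothetical and do not come from the project data set.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations, NOT project data.
engagement = [1, 2, 3, 4, 5]
ticket_price = [2, 1, 4, 3, 5]

r = pearson_r(engagement, ticket_price)
print(f"r = {r:.2f}")
```

A coefficient like this is only meaningful when the scatterplot shows an approximately linear pattern without strong outliers, which is why the visual check comes first.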

Part C Total: /8 marks

d. Provide Conclusions and Recommendations – Approximately 2 pages: Outline 5 conclusions and 5 recommendations based on the results you believe would be meaningful to the Houston Dynamo.

Part D Total: /10 marks

Total: / 70 marks

Due Date: Wednesday, April 13, 2022 at 12:00 pm EST to the Sakai Drop Box

**Partnering with the Houston Dynamo and Wasserman Media Group, approximately five (5) exemplar Individual Projects (i.e., superior mark and professionalism of project presentation) will be selected by Dr. Kerwin and the TAs to have the opportunity to present their findings and recommendations to the Houston Dynamo data team. If chosen, Dr. Kerwin will ask for the students’ voluntary engagement in the presentation (note: this is not a requirement of the project, but rather an added incentive to conduct quality work) and ensure the students are prepared to make a presentation to these industry stakeholders.