
Waikato Environment for Knowledge Analysis (Weka) is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License. In previous labs, we learned how to use Weka to run several data mining algorithms on datasets stored in ARFF format.

In the midterm exam, you will connect Weka to a MySQL database and then apply a data mining algorithm you have used before to data stored as tables in the database.

1. The first step is to do your own research to install MySQL and MySQL Workbench on your laptop/PC (if you do not have them already). Test the installation by making sure you are able to create tables in the database and write/execute queries.

2. Find 1 dataset that you have not used before in any of the previous labs (you can find the data in ARFF format and then import it into a database). As you know, ARFF is a text-based format, and it is easy to reformat it into other text-based formats (e.g. *.csv) that can be imported into a database using MySQL Workbench. The dataset should be imported as 1 table. Name the database CSCI4823MidtermDB. Copy/paste a snapshot of the CREATE statement of your table into the provided answer sheet.

3. The next step is to connect Weka to the MySQL database. Take a screenshot of a successful connection to the database (similar to the one below). Paste your screenshot in the provided answer sheet.

[Figure: Successful connection to the database]

4. To test that your connection actually works, write a query to retrieve the data from a table in the database. Take a screenshot of the result (see below) and paste it in the provided answer sheet.

5. Now that you are able to connect Weka to the MySQL database and retrieve data from it, the next step is to apply a data mining algorithm you learned before to data retrieved from the database.

6. Do one of the following, either association rules or clustering (your response goes in the provided answer sheet):

(a). Apply any of the Association Rules Mining algorithms on the table, take a screenshot of the output (if the output does not fit in one screenshot, copy the output in the answer sheet). Discuss the output (minimum 1 paragraph of 5 lines).

(b). Apply any of the Clustering algorithms on one of the tables, take a screenshot of the output (if the output does not fit in one screenshot, copy the output in the answer sheet). Discuss the output (minimum 1 paragraph of 5 lines).

7. Upload the dataset to your Google Drive, then copy and paste the share link into the provided answer sheet.
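For step 2, the reformatting from ARFF to CSV can be done by hand in a text editor, or with a short script. The following is a minimal sketch in Python (standard library only), not a required part of the exam. It assumes a simple, dense ARFF file: one `@attribute` line per column, a `@data` line, and comma-separated rows. Sparse ARFF and other edge cases are not handled, and missing values written as `?` are passed through unchanged, so they may need cleaning before the import. The function name `arff_to_csv` and the toy weather data are hypothetical and only serve as an illustration.

```python
import csv
import io


def arff_to_csv(arff_text: str) -> str:
    """Convert a simple, dense ARFF document to CSV text (sketch only)."""
    header = []      # attribute names, in declaration order
    rows = []        # parsed data rows
    in_data = False  # becomes True once the @data line has been seen

    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name.
                rest = line[len("@attribute"):].strip()
                if rest[0] in "'\"":
                    # Quoted name: take everything up to the closing quote.
                    name = rest[1:rest.index(rest[0], 1)]
                else:
                    name = rest.split()[0]
                header.append(name)
            elif lower.startswith("@data"):
                in_data = True
        else:
            # Data rows are comma-separated; single quotes wrap values
            # that contain commas or spaces.
            reader = csv.reader([line], quotechar="'", skipinitialspace=True)
            rows.append(next(reader))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)  # first CSV line holds the column names
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical toy input, not one of the lab datasets.
sample = """% toy weather data (hypothetical)
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
"""

print(arff_to_csv(sample))
```

The resulting text has the column names on the first line, which is the layout a CSV import into a single table typically expects. Check the column types chosen during the import, since every value leaves this script as plain text.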

[Figure: Query to retrieve all data from customer table]

[Figure: Data retrieved after successful execution of the query above]
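For the discussion in step 6(a), it helps to be clear about what the numbers attached to each association rule mean. The sketch below computes support and confidence by hand on a tiny, made-up set of transactions. It is an illustration of the two standard definitions only, not Weka's implementation, and the function names and the toy data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical toy data: each set is one transaction (not from a real dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule under inspection: bread ==> milk
print(support(transactions, {"bread", "milk"}))       # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # about 0.67
```

Here the rule bread ==> milk holds in 2 of the 4 transactions (support 0.5) and in 2 of the 3 transactions that contain bread (confidence about 0.67). Reading the rules in your own output the same way, in terms of how many rows they cover and how often they hold, gives concrete material for the required paragraph.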