Data Mining and Text Analysis

There are three tasks in this assignment:

1. Consider how to evaluate (compare and contrast) tools and programming languages used in data mining and text analysis.

This can be organised in any way you choose. You are advised to use at least THREE and no more than FIVE criteria, each well defined and clearly justified as effective. Reading around the literature should inform how you develop your criteria; where you reuse existing criteria, justify this clearly and support it with citations.

You should present clear, well-defined criteria, followed by a short discussion justifying your selection.

This should be no more than ONE page in length.
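One possible way to turn three to five criteria into a side-by-side comparison is a weighted scoring matrix: each tool receives a score per criterion, and the weighted sum gives an overall ranking. The sketch below assumes hypothetical criteria, weights (summing to 1) and 1 to 5 scores for two unnamed placeholder tools; it only shows the arithmetic and is not an evaluation of any real tool. The weights would need the same justification as the criteria themselves.

```python
# HYPOTHETICAL criteria and weights (must sum to 1); replace with your own.
WEIGHTS = {"ease_of_use": 0.4, "algorithm_coverage": 0.3, "text_support": 0.3}

# HYPOTHETICAL 1-5 scores for two placeholder tools, not real evaluations.
SCORES = {
    "Tool A": {"ease_of_use": 4, "algorithm_coverage": 3, "text_support": 5},
    "Tool B": {"ease_of_use": 3, "algorithm_coverage": 5, "text_support": 2},
}


def weighted_score(scores, weights):
    """Weighted sum of one tool's criterion scores."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[c] * scores[c] for c in weights)


def rank_tools(all_scores, weights):
    """Return (tool, total) pairs sorted from highest to lowest total."""
    totals = {tool: weighted_score(s, weights) for tool, s in all_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for tool, total in rank_tools(SCORES, WEIGHTS):
    print(f"{tool}: {total:.2f}")
```

A weighted sum keeps the reasoning transparent, but a close ranking is sensitive to the chosen weights, so it is worth checking whether the order changes under slightly different weightings.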

2. Test and evaluate at least TWO and no more than THREE tools/programming languages, using either the given exercises or a set of your own devising. Again, reading around the literature can inform how you test and apply your evaluation process.
Exercise: Using an algorithm of your choice, construct and test a decision tree classifier for the No Claims attribute, with a brief explanation and justification of your choice of algorithm. Training and test data are attached.
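As a rough illustration of the exercise, the following sketch builds an ID3-style decision tree (splitting on information gain over categorical attributes) in plain Python and tests it on a held-out set. The attached training and test data are not reproduced here: the records, the attribute names `age_band` and `car_group`, and the `no_claims` target column are hypothetical placeholders. ID3 is only one possible choice; WEKA's J48 implements C4.5 (which uses gain ratio and handles numeric attributes), and scikit-learn's `DecisionTreeClassifier` uses an optimised version of CART that splits on Gini impurity by default, so those tools will not necessarily produce the same tree.

```python
import math
from collections import Counter

# HYPOTHETICAL toy records standing in for the attached training/test files;
# the attribute names and values are placeholders, not the real schema.
TRAIN = [
    {"age_band": "young", "car_group": "high", "no_claims": "no"},
    {"age_band": "young", "car_group": "low", "no_claims": "no"},
    {"age_band": "middle", "car_group": "high", "no_claims": "yes"},
    {"age_band": "middle", "car_group": "low", "no_claims": "yes"},
    {"age_band": "senior", "car_group": "high", "no_claims": "no"},
    {"age_band": "senior", "car_group": "low", "no_claims": "yes"},
]
TEST = [
    {"age_band": "young", "car_group": "low", "no_claims": "no"},
    {"age_band": "middle", "car_group": "high", "no_claims": "yes"},
    {"age_band": "senior", "car_group": "high", "no_claims": "no"},
    {"age_band": "senior", "car_group": "low", "no_claims": "no"},
]
TARGET = "no_claims"


def entropy(rows):
    """Shannon entropy (in bits) of the target labels in rows."""
    counts = Counter(r[TARGET] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attr):
    """Entropy reduction obtained by splitting rows on attr."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder


def majority(rows):
    """Most common target label, used when no attributes remain or a value is unseen."""
    return Counter(r[TARGET] for r in rows).most_common(1)[0][0]


def build_tree(rows, attrs):
    """Recursively build an ID3 tree; internal nodes are dicts, leaves are labels."""
    labels = {r[TARGET] for r in rows}
    if len(labels) == 1:
        return labels.pop()
    if not attrs:
        return majority(rows)
    best = max(attrs, key=lambda a: information_gain(rows, a))
    node = {"attr": best, "default": majority(rows), "branches": {}}
    remaining = [a for a in attrs if a != best]
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        node["branches"][value] = build_tree(subset, remaining)
    return node


def predict(tree, row):
    """Follow branches to a leaf, falling back to the node's majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


tree = build_tree(TRAIN, ["age_band", "car_group"])
correct = sum(predict(tree, r) == r[TARGET] for r in TEST)
print(f"Test accuracy: {correct / len(TEST):.2f}")  # prints: Test accuracy: 0.75
```

On these toy rows `age_band` has the higher information gain and becomes the root; the single misclassified test row shows why a separate test set is needed rather than reporting accuracy on the training data.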

You should briefly discuss how your tests and evaluation process work, and provide evidence in support of your criteria. You may use a table or diagram to aid this discussion.

You should present the results of your tests/evaluation process for each of the tools/programming languages you have selected, along with a brief discussion of what the results may demonstrate.

This should be no more than TWO pages in length.
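Accuracy alone can hide how a classifier behaves on each class, so test results are often reported as a confusion matrix together with per-class precision and recall (WEKA's classifier output includes both). The sketch below computes these in plain Python; the `actual` and `predicted` lists are hypothetical values, not results from the attached data, and `"yes"` is assumed to be the positive class.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, both in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def precision_recall(actual, predicted, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# HYPOTHETICAL test-set labels and classifier predictions, for illustration only.
actual = ["yes", "no", "yes", "yes", "no", "no"]
predicted = ["yes", "no", "no", "yes", "no", "no"]
labels = ["yes", "no"]

matrix = confusion_matrix(actual, predicted, labels)
# Correct predictions lie on the diagonal of the matrix.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
precision, recall = precision_recall(actual, predicted, "yes")
print("confusion matrix (rows = actual, cols = predicted):", matrix)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Reporting the same metrics for every tool tested makes the results directly comparable, even where each tool formats its own output differently.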

3. From your evaluation process, identify ONE of the tools/programming languages that you think is the most effective given your results. Consider this tool in the context of at least TWO and no more than THREE specific application areas, for example sentiment analysis on a Twitter feed. Consider whether the tool/programming language could help overcome the challenges identified in each area.

You will need to consider carefully the requirements of your application areas, particularly their current limitations. It is not expected at this stage that this will be complete or entirely accurate. Again, reading around the research will support this task, and it is acceptable to reuse the requirements of an existing application area.

This should be no more than ONE page in length.

You should provide an executive summary at the beginning of your report. This should contain at least:

      The intended aim/objective of the report overall;

      An overview of each section included indicating what tools/areas you have selected to use/evaluate/discuss;

      Any specific points of interest that need to be emphasised;

      The conclusion or final position of the report.

This should be no more than ONE page in length.

The referencing style should be IEEE. I tried using Python and WEKA, but feel free to choose whichever tools are best and easiest to use. If you need clarification, contact me. You can either use the attached training and test data or devise your own to create a model, as long as the outcomes are the same.