Auburn University Data Mining Algorithms Questions & Responses

 

In this week's article, we focus on deciding whether the results of two different data mining algorithms provide significantly different information. With that in mind, answer the following questions:

  1. When using different data algorithms, why is it fundamentally important to understand why they are being used?
  2. If there are significant differences in the data output, how can this happen and why is it important to note the differences?
  3. Who should determine which algorithm is “right” and the one to keep?  Why?

You should actively engage in the weekly discussions by providing peer-to-peer feedback. Respond to the posts of at least two classmates with a response, feedback, comment, or question of at least 100 words.

Article

  1. Tatti, N., & Vreeken, J. (2012). Comparing apples and oranges: measuring differences between exploratory data mining results. Data Mining and Knowledge Discovery, 25(2), 173–207.

Post from Daniel:

Different data mining algorithms may be more appropriate for different situations, so it is important to know how each algorithm can affect the data. This helps to determine which data structures are best suited (Tatti & Vreeken, 2012). It also gives the analyst a better understanding of what the data and the results mean.

It is important to note any significant differences in data output because a significant difference means that something of interest has occurred. This may happen if there are frequencies that can be derived from one data set and not from the other (Tatti & Vreeken, 2012). Such differences should be investigated and recorded.
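To make the point about frequencies concrete, here is a minimal Python sketch that lists the item pairs that are frequent in one data set but not in another. The toy transactions, the function names, and the 0.5 support threshold are all assumptions made up for illustration. It is not the method from the article, only a simple way to see where two data sets disagree.

```python
from collections import Counter
from itertools import combinations


def itemset_frequencies(transactions, size=2):
    """Return the relative frequency of every itemset of the given size."""
    counts = Counter()
    for transaction in transactions:
        # Sort so that ("a", "b") and ("b", "a") count as the same itemset.
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    total = len(transactions)
    return {itemset: count / total for itemset, count in counts.items()}


def frequent_only_in_first(first, second, min_support=0.5):
    """Itemsets that reach min_support in `first` but not in `second`."""
    freq_first = itemset_frequencies(first)
    freq_second = itemset_frequencies(second)
    return {
        itemset
        for itemset, support in freq_first.items()
        if support >= min_support and freq_second.get(itemset, 0.0) < min_support
    }


# Toy data, invented for illustration only.
dataset_a = [["bread", "milk"], ["bread", "milk", "eggs"], ["milk", "eggs"], ["bread", "milk"]]
dataset_b = [["bread", "eggs"], ["milk", "eggs"], ["bread", "eggs"], ["eggs"]]

print(sorted(frequent_only_in_first(dataset_a, dataset_b)))
```

Any pair that shows up in the output is a candidate for the kind of follow-up investigation described above.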

The correct algorithm to use depends on the amount of time the investigation takes. The algorithm that generates the least complex result will generally be the algorithm of choice unless there is evidence that the more complex result provides significantly more information that is not available in the other result (Tatti & Vreeken, 2012). In that case, it is up to the observer to determine whether the additional information is worth the extra time and effort.

Post from Prashanth:

The purpose of using an algorithm is to perform a repetitive process until the desired results are achieved. Analysts and engineers need to understand the purpose of each available algorithm, because different algorithms can produce different results when the datasets and expectations differ. Instead of manually running code and tweaking it each time to get the desired results, developers and analysts can finalize an algorithm once and then run it on the selected datasets to explore new opportunities and find hidden patterns and trends in the data that can provide significant insights (Tatti & Vreeken, 2012). In an organization, the analytics team should be involved in understanding the data algorithms, as its members are more familiar with the business domain and can help execute the algorithms with appropriate inputs and outputs.

Although algorithms can yield different results and perspectives from the data, the primary reasons for these differences could be the desired outcomes, the datasets, the algorithm models, and the methods used to perform analytics and mining on the data. When similar datasets are used as input for running algorithms, generally identical results should be expected. The cost of the resources needed to perform this job and the complexity of understanding the outcomes of these algorithms should also be considered well ahead of time (Tatti & Vreeken, 2012).
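One simple way to check how far two sets of results actually differ is to measure their overlap. The sketch below uses the Jaccard distance between two sets of mined patterns. This is a stand-in chosen for simplicity, not the measure proposed in the article, and the pattern sets and the function name are hypothetical.

```python
def jaccard_distance(result_a, result_b):
    """1 - |A ∩ B| / |A ∪ B|: 0.0 means identical results, 1.0 means disjoint."""
    a, b = set(result_a), set(result_b)
    if not a and not b:
        return 0.0  # two empty results are treated as identical
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical outputs of two pattern-mining algorithms run on the same dataset.
patterns_from_algo_1 = {("bread", "milk"), ("eggs", "milk"), ("bread", "eggs")}
patterns_from_algo_2 = {("bread", "milk"), ("eggs", "milk"), ("milk", "tea")}

distance = jaccard_distance(patterns_from_algo_1, patterns_from_algo_2)
print(f"Jaccard distance: {distance:.2f}")  # prints "Jaccard distance: 0.50"
```

A distance near 0 suggests the two algorithms are telling much the same story. A larger value is a signal to look at which patterns differ and why.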

An organization can have its data engineering team, comprising data scientists, analysts, and other data professionals, learn and adapt the data algorithms to know which ones are best for specific scenarios (Tatti & Vreeken, 2012). This is possible only when data exercises are run repeatedly with variations until the desired outcomes are achieved. Based on the results provided by the data team, business teams and analysts can then indicate which algorithm-based models provided the optimum results and are cost-efficient.