Auburn University Association Analysis and the Advanced Concepts Discussion

 

This week we discuss association analysis and the advanced concepts (in Chapter six). After reviewing the material, answer the following questions:

  1. What are the techniques in handling categorical attributes?
  2. How do continuous attributes differ from categorical attributes?
  3. What is a concept hierarchy?
  4. Note the major patterns of data and how they work.

Reply to at least two classmates’ responses by the end of the week.

post from sanjay:

  1. What are the techniques in handling categorical attributes?

In association analysis, categorical attributes are handled by transforming them into items. Categorical and symmetric binary attributes are converted into items before patterns are extracted from a given dataset. This transformation makes it possible to apply existing association rule mining algorithms, and it can be done by creating a new item for each distinct attribute-value pair (Tan et al., 2019).
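
As a minimal sketch of the attribute-value transformation described above, each record can be turned into a set of "attribute=value" items. The function name `to_items` and the sample records are made up for illustration and do not come from the textbook.

```python
def to_items(record):
    """Turn each attribute-value pair of a record into one 'attribute=value' item."""
    return {f"{attr}={value}" for attr, value in record.items()}


# Hypothetical records with two categorical attributes.
records = [
    {"gender": "female", "payment": "card"},
    {"gender": "male", "payment": "cash"},
]

# Each record becomes a transaction (a set of items) that an
# association rule mining algorithm could take as input.
transactions = [to_items(r) for r in records]
print(transactions)
```

Each distinct pair, such as payment=card, becomes its own item, so a record with k categorical attributes yields a transaction of k items.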

  2. How do continuous attributes differ from categorical attributes?

A continuous attribute is a numeric variable that can take an infinite number of values between any two values. Continuous attributes can be numeric or date/time values, such as the date and time a payment was received. Categorical attributes, on the other hand, take a finite number of categories or distinct groups, which may have no logical order, such as gender, payment method, or race (Trubin & Sallnas, 2014).

  3. What is a concept hierarchy?

A concept hierarchy is a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. A concept hierarchy reduces the data by collecting low-level concepts, such as numeric values for the attribute age, and replacing them with higher-level concepts such as young, middle-aged, or senior. Concept hierarchies help in understanding the patterns and models exhibited in an association analysis and support comparative analysis between the variables in a specific dataset (Cimiano et al., 2005).
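
The age example can be sketched as a simple mapping from numeric values to higher-level concepts. The cut-off ages (30 and 60) and the function name `age_group` are assumptions chosen for illustration; the cited sources do not fix specific boundaries.

```python
def age_group(age):
    """Map a numeric age to a higher-level concept (thresholds are assumed)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


# Hypothetical low-level values for the attribute age.
ages = [22, 45, 67, 29]

# Replace each low-level value with its higher-level concept.
generalized = [age_group(a) for a in ages]
print(generalized)
```

After the mapping, many distinct ages collapse into three values, which is the data reduction the hierarchy provides.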

  4. Note the major patterns of data and how they work.

There are four major patterns of data: associations, predictions, clusters, and sequential relationships. Association patterns provide guidance for modeling the associations that occur among objects in both the real world and the solution domains of computer applications. Prediction patterns help identify the nature of future occurrences of certain events based on what has happened in the past. Cluster patterns identify natural groupings of things based on their known characteristics, such as grouping customers by their demographics and past purchase behavior. Sequential relationship patterns discover time-ordered events (Tan et al., 2021).

post from rahul:

Categorical attributes can mask or hide essential information in a dataset, so it is important to understand the techniques used to deal with such variables. If they are not handled properly, it becomes difficult to identify the essential variables in a model. There are various ways to deal with them, but not all of them improve the results.

Converting to numbers is one way of dealing with categorical variables. This method is suitable when ML libraries cannot accept categorical variables as input, so the variables have to be converted to numerical ones. One such technique is a label encoder, which converts non-numerical labels to numerical labels with values between 0 and n_classes - 1. According to Anwar (2019), label encoding substitutes each group with a corresponding number and keeps that mapping consistent throughout the feature.
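
A minimal pure-Python sketch of label encoding is shown below. It is not the implementation of any particular ML library; the function name `label_encode`, the sample values, and the choice to number the classes in sorted order are assumptions made for illustration.

```python
def label_encode(values):
    """Replace each distinct label with an integer from 0 to n_classes - 1."""
    classes = sorted(set(values))                  # fixed, repeatable class order
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


# Hypothetical categorical feature.
payments = ["cash", "card", "check", "card"]

codes, classes = label_encode(payments)
print(classes)  # the distinct classes, in the order used for numbering
print(codes)    # one integer code per original value
```

The same label always receives the same number, which is the consistency that label encoding relies on.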

The other method of dealing with categorical variables is one-hot encoding, a commonly used way to handle categorical data. This method creates an additional feature for every group of the categorical feature and marks each observation with 1 if it belongs to that group and 0 otherwise.
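
One-hot encoding can be sketched in plain Python as follows. The function name `one_hot` and the sample data are hypothetical, and the column order (sorted category names) is an assumption of this sketch.

```python
def one_hot(values):
    """Create one 0/1 column per category; 1 marks the observation's own category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


# Hypothetical categorical feature with two groups.
payments = ["cash", "card", "cash"]

rows, categories = one_hot(payments)
print(categories)
for row in rows:
    print(row)
```

Every row contains exactly one 1, so no artificial ordering is implied between the groups, unlike with integer labels.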

The difference between categorical and continuous attributes is that categorical attributes have a finite number of categories or distinct groups, while continuous attributes can take an infinite number of values between any two values. Swamy & Reddy (2020) describe a concept hierarchy as an arrangement of mappings from a set of low-level concepts to higher-level, more general concepts.

In conclusion, the major patterns in data mining include associations, predictions, and clusters. Associations detect commonly co-occurring groupings of things, for instance bread and butter. Predictions describe the nature of future occurrences of events based on past occurrences. Clusters show natural groupings of things based on their current and evident characteristics.
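
To illustrate how an association such as bread and butter might be detected, the sketch below counts how often each pair of items appears together in a transaction. The baskets are invented data, and `pair_counts` is a hypothetical helper; a real association rule miner would go further than simple pair counting.

```python
from collections import Counter
from itertools import combinations


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical market baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

counts = pair_counts(baskets)
print(counts.most_common(1))
```

In this toy data, bread and butter co-occur in two of the three baskets, more often than any other pair.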