University of Cumberland W4 Data Classification Discussion
Data Classification:
Data classification is the process of separating and organizing data into relevant groups (“classes”) based on their shared characteristics, such as their level of sensitivity and the risks they present, and the compliance regulations that protect them. When done right, data classification makes using and protecting data easier and more efficient (Data classification, 2020).
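The grouping described above can be illustrated with a minimal Python sketch of rule-based classification by sensitivity. Everything specific here is an assumption for illustration and is not taken from the cited source: the class names (public, internal, confidential, restricted), the field-to-class rules, the default of "internal" for unknown fields, and the choice to label each record by its most sensitive field.

```python
# Hypothetical sensitivity classes, ordered from least to most sensitive.
SENSITIVITY_ORDER = ["public", "internal", "confidential", "restricted"]

# Hypothetical rules mapping a field name to its sensitivity class.
FIELD_CLASS = {
    "product_name": "public",
    "employee_id": "internal",
    "salary": "confidential",
    "ssn": "restricted",
    "credit_card_number": "restricted",
}

def classify_record(record: dict) -> str:
    """Label a record with the most sensitive class among its fields.
    Fields with no rule are treated as "internal" (an assumption)."""
    levels = [FIELD_CLASS.get(name, "internal") for name in record]
    if not levels:
        return "public"  # an empty record holds no data to protect
    return max(levels, key=SENSITIVITY_ORDER.index)

def group_by_class(records: list) -> dict:
    """Organize records into one group per sensitivity class."""
    groups = {label: [] for label in SENSITIVITY_ORDER}
    for record in records:
        groups[classify_record(record)].append(record)
    return groups

records = [
    {"product_name": "Widget"},
    {"employee_id": 7, "salary": 50000},
    {"employee_id": 8, "ssn": "000-00-0000"},
]
# Prints one line per class: public 1, internal 0, confidential 1, restricted 1
for label, members in group_by_class(records).items():
    print(label, len(members))
```

Labeling a record by its most sensitive field is a conservative choice: a record is protected at least as strictly as the most sensitive data it contains.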
Classification Framework:
Our textbook describes two purposes that a classification model can serve:
Descriptive Modeling: The model serves as an explanatory tool to distinguish between objects of different classes.
Predictive Modeling: The model can be used to predict the class label of unknown records (Tan, P.-N. et al., 2013).
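Both uses of one fitted model can be shown in a short Python sketch. To stay dependency-free it fits a one-level decision tree (a "decision stump") on a tiny hypothetical dataset; the attribute names, records, and the functions `fit_stump`, `describe`, and `predict` are illustrative assumptions, not the textbook's code. `describe` plays the descriptive role (stating what separates the classes) and `predict` plays the predictive role (labeling a record whose class is unknown).

```python
from collections import Counter

# Hypothetical labeled training records: binary attributes plus a class label.
TRAINING = [
    ({"warm_blooded": True,  "gives_birth": True},  "mammal"),
    ({"warm_blooded": False, "gives_birth": False}, "non-mammal"),
    ({"warm_blooded": True,  "gives_birth": False}, "non-mammal"),
    ({"warm_blooded": True,  "gives_birth": True},  "mammal"),
    ({"warm_blooded": False, "gives_birth": False}, "non-mammal"),
]

def fit_stump(data):
    """Choose the binary attribute whose split misclassifies the fewest
    training records; each branch predicts the majority class it receives."""
    overall = Counter(label for _, label in data).most_common(1)[0][0]
    best = None
    for attr in data[0][0]:
        rule, errors = {}, 0
        for value in (True, False):
            labels = [label for record, label in data if record[attr] == value]
            if labels:
                majority, count = Counter(labels).most_common(1)[0]
                rule[value] = majority
                errors += len(labels) - count
            else:
                rule[value] = overall  # empty branch: fall back to the overall majority
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best[0], best[1]

def describe(attr, rule):
    # Descriptive use: state what distinguishes the classes in the training data.
    return f"if {attr} then {rule[True]} else {rule[False]}"

def predict(attr, rule, record):
    # Predictive use: assign a class label to a record whose label is unknown.
    return rule[record[attr]]

attr, rule = fit_stump(TRAINING)
print(describe(attr, rule))  # if gives_birth then mammal else non-mammal
# A new, unlabeled record (e.g. an animal that was not in the training set).
print(predict(attr, rule, {"warm_blooded": True, "gives_birth": True}))  # mammal
```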
Decision Tree:
A decision tree has three types of nodes. A root node has no incoming edges and zero or more outgoing edges. An internal node has exactly one incoming edge and two or more outgoing edges. A leaf (terminal) node has exactly one incoming edge and no outgoing edges. In a decision tree, each leaf node is assigned a class label. The non-terminal nodes, which include the root and other internal nodes, contain attribute test conditions to separate records that have different characteristics (Tan, P.-N. et al., 2013). Decision trees help to visualize and understand data, because each path from the root to a leaf reads as a simple rule. They can also handle multidimensional data, often with good accuracy.
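The node structure described above can be sketched in Python as follows. The `Node` class, the `classify` and `node_kinds` helpers, and the small example tree are hypothetical illustrations, not code from the textbook. Non-terminal nodes hold a test condition whose outcome selects one outgoing edge, leaves hold a class label, and classification walks from the root to a leaf.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Node:
    # Leaf (terminal) nodes hold a class label and have no outgoing edges.
    label: Optional[str] = None
    # Root and internal nodes hold an attribute test condition whose outcome
    # selects one outgoing edge (child).
    test: Optional[Callable[[dict], str]] = None
    children: dict[str, Node] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return not self.children

def classify(root: Node, record: dict) -> str:
    """Start at the root and follow the edge chosen by each test until a leaf."""
    node = root
    while not node.is_leaf():
        node = node.children[node.test(record)]
    return node.label

def node_kinds(root: Node) -> Counter:
    """Count root, internal, and leaf nodes. Every non-root node is reached
    through exactly one incoming edge from its parent."""
    counts = Counter()
    stack = [(root, True)]
    while stack:
        node, is_root = stack.pop()
        if is_root:
            counts["root"] += 1
        elif node.is_leaf():
            counts["leaf"] += 1
        else:
            counts["internal"] += 1
        stack.extend((child, False) for child in node.children.values())
    return counts

# A small hypothetical tree: test body temperature first, then whether the animal gives birth.
tree = Node(
    test=lambda r: r["body_temperature"],
    children={
        "warm": Node(
            test=lambda r: "yes" if r["gives_birth"] else "no",
            children={"yes": Node(label="mammal"), "no": Node(label="non-mammal")},
        ),
        "cold": Node(label="non-mammal"),
    },
)

print(classify(tree, {"body_temperature": "warm", "gives_birth": True}))  # mammal
print(dict(node_kinds(tree)))
```

This tree has one root, one internal node (the "gives birth" test), and three leaves.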
Hyperparameters:
Hyperparameters are model parameters that are estimated without using actual, observed data. A hyperparameter is basically a “good guess” at what a model’s parameters might be, made before looking at your data. The term “hyperparameter” is used to distinguish these prior “guess” parameters from other parameters used in statistics, such as coefficients in regression analysis (Hyperparameter, 2020). In machine learning, the term usually refers to settings fixed before training rather than learned from the data, such as the maximum depth of a decision tree.
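The statistical definition of a hyperparameter can be made concrete with a small worked example. This is a minimal sketch that assumes a success probability with a Beta(alpha, beta) prior, a standard conjugate setup that the cited source does not itself specify; the values alpha = beta = 2 and the 7-out-of-10 data are purely illustrative. The hyperparameters alpha and beta are fixed before any data is seen, and the observed data then update the estimate through the posterior mean (alpha + successes) / (alpha + beta + trials).

```python
from fractions import Fraction

def posterior_mean(alpha: int, beta: int, successes: int, trials: int) -> Fraction:
    """Mean of the Beta(alpha + successes, beta + trials - successes) posterior
    for a success probability, given a Beta(alpha, beta) prior."""
    return Fraction(alpha + successes, alpha + beta + trials)

# Hyperparameters: fixed before any data is seen (the prior "good guess").
# alpha = beta = 2 is an illustrative choice centred on 0.5, not a recommendation.
ALPHA, BETA = 2, 2

prior_mean = Fraction(ALPHA, ALPHA + BETA)                     # 1/2
updated = posterior_mean(ALPHA, BETA, successes=7, trials=10)  # 9/14
print(prior_mean, updated)  # 1/2 9/14
```

The prior guess of 1/2 moves toward the observed rate of 7/10 but does not reach it, because the hyperparameters act like a few imaginary observations added before the real data.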
Model Selection and Evaluation:
Model selection is the process of choosing among different learning algorithms for modelling our data; for a classification problem, the choice could be between Logistic Regression, SVM, tree-based algorithms, etc. Model evaluation aims to check the generalization ability of our model, i.e., its ability to perform well on an unseen dataset (Patel, S. 2020).
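Selection and evaluation can be sketched in Python with a common hold-out setup, assumed here rather than taken from the cited source: candidates are fitted on a training set, the best one is chosen by accuracy on a separate validation set, and only the chosen model is scored once on a test set to estimate how well it generalizes. To keep the sketch dependency-free, the candidates are two simple stand-ins (a majority-class baseline and 1-nearest-neighbor) rather than logistic regression or SVM, and the 1-D dataset is hypothetical.

```python
from collections import Counter

# Toy 1-D dataset (hypothetical): (feature value, class label).
TRAIN = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (6.0, "b"), (6.5, "b")]
VALIDATION = [(1.2, "a"), (5.8, "b"), (7.0, "b")]  # used to choose a model
TEST = [(1.8, "a"), (6.2, "b")]                      # used once, to estimate generalization

def majority_classifier(train):
    # Baseline: always predict the most common training label.
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor_classifier(train):
    # 1-nearest-neighbor: copy the label of the closest training point.
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(model, data):
    # Fraction of records whose predicted label matches the true label.
    return sum(model(x) == y for x, y in data) / len(data)

def select_model(candidates, train, validation):
    """Model selection: fit every candidate on `train` and keep the one
    with the highest accuracy on the held-out `validation` set."""
    best = None
    for name, fit in candidates.items():
        model = fit(train)
        score = accuracy(model, validation)
        if best is None or score > best[2]:
            best = (name, model, score)
    return best

CANDIDATES = {"majority": majority_classifier, "1-NN": nearest_neighbor_classifier}
best_name, best_model, val_score = select_model(CANDIDATES, TRAIN, VALIDATION)

# Model evaluation: score the chosen model on data it never saw during selection.
test_score = accuracy(best_model, TEST)
print(best_name, val_score, test_score)  # 1-NN 1.0 1.0
```

Keeping the test set out of the selection step matters: if it were used to pick the model, its score would no longer be an unbiased estimate of performance on unseen data.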