SJSU Types of Classifiers & Nearest Neighbor & Naïve Bayes Classifiers Discussion

 

Please write a 150-word peer response for Answer 1 and another 150-word peer response for Answer 2.

Answer 1

Discuss TWO types of classifiers?

The development of sequencing techniques has led to exponential growth in the number of protein sequences in public databases. This sequence information has been successfully used to uncover structures, functions, and evolutionary relationships. Many computational methods have been developed to classify protein sequences and to predict diseases from their sequence information. Classifying biological sequences is one of the significant challenges in bioinformatics, as well as in genomics and proteomics. Because sequence data exist in huge volumes and are often ambiguous, and because laboratory experiments are costly, data mining methods are applied to disease prediction in place of laboratory experiments. Since a large number of diseases are linked to proteins and their sequences, protein sequence analysis has received considerable attention recently. Data mining techniques offer an efficient way to examine proteins and identify their characteristics, and they also support better drug design. In this research work, haemoglobin protein-based diseases are predicted by applying a Naïve Bayes classifier. The performance of the classifier is analyzed in terms of classification accuracy and execution time (Birtane & Korkmaz, 2014).

Discuss the difference between nearest neighbor and naïve Bayes classifiers?

In recent years, technologies such as computers and satellites have led to exponential growth in the data collected in many areas. Traditional data analysis techniques lack the power to process such large amounts of data efficiently. In this setting, data mining offers a way to extract knowledge from large amounts of data. The collection of biological data, such as protein and DNA sequences, has recently been increasing at an explosive rate thanks to improvements in existing methods and technology. Data mining techniques are therefore used to extract meaningful information from these huge collections of biological sequences (Stoltzfus, 2011).

References

Birtane, S., & Korkmaz, H. (2014). Rule-based fuzzy classifier for spinal deformities. Bio-Medical Materials And Engineering, 24(6), 3311-3319. doi: 10.3233/bme-141154

Stoltzfus, J. (2011). Logistic Regression: A Brief Primer. Academic Emergency Medicine, 18(10), 1099-1104. doi: 10.1111/j.1553-2712.2011.01185.x

—————————————————————————————————————————————-

Answer 2

Discuss TWO types of classifiers?

Rule based classifier:

Rule-based classifiers make classification decisions using a set of “if…else” rules. Because these rules are easy to understand, such classifiers are frequently used to build descriptive models. In each rule, the antecedent is the condition that follows “if,” and the consequent is the predicted class. The rules are usually not mutually exclusive, meaning that several rules may cover the same instance. They also may not be exhaustive, meaning that some instances may not be covered by any rule. Missing values in the test set are problematic for rule-based classifiers. This is because the rules in a rule set are ordered in a certain way, and even if a test instance is covered by several rules, they may assign different class labels depending on their position in the rule set. When a rule tests an attribute that is missing from the test instance, it is therefore hard to simply skip that rule and move on to the next one.
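
To make the role of rule ordering concrete, here is a minimal Python sketch of an ordered rule list. The attribute names (`gives_birth`, `can_fly`), the rules, and the `classify` helper are all made up for illustration; they are not taken from the textbook or from any library.

```python
from typing import Callable, List, Optional, Tuple

# A rule pairs an antecedent (a condition on a record) with a consequent (a class label).
Rule = Tuple[Callable[[dict], bool], str]

# Hypothetical rules over made-up attributes. They are not mutually exclusive:
# a record that gives birth AND can fly is covered by both rule 2 and rule 3.
RULES: List[Rule] = [
    (lambda r: r["gives_birth"] == "no" and r["can_fly"] == "yes", "bird"),
    (lambda r: r["gives_birth"] == "yes", "mammal"),
    (lambda r: r["can_fly"] == "yes", "bird"),
]

def classify(record: dict, rules: List[Rule], default: Optional[str] = None) -> Optional[str]:
    """Return the consequent of the first rule whose antecedent holds."""
    for antecedent, consequent in rules:
        if antecedent(record):
            return consequent
    return default  # no rule covers the record: the rule set is not exhaustive

bat = {"gives_birth": "yes", "can_fly": "yes"}
label_original_order = classify(bat, RULES)
# Swapping the last two rules changes the label for the same record.
label_swapped_order = classify(bat, [RULES[0], RULES[2], RULES[1]])

# A record that no rule covers falls through to the default.
label_uncovered = classify({"gives_birth": "no", "can_fly": "no"}, RULES)
```

With this toy rule set, a record that lacks the `can_fly` attribute and has `gives_birth` set to `"no"` raises a `KeyError` in the first rule, which is one simple way the missing-value problem shows up in practice.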

Logistic Regression classifier:

Logistic regression is a classification model that computes posterior probabilities directly, without making any assumptions about the class-conditional probabilities. As a result, it is quite general and can be applied in a variety of settings. When applied to multiclass classification, it is also known as multinomial logistic regression. It can handle irrelevant attributes by learning weights close to 0 for attributes that do not improve performance during training. It can also handle interacting attributes, since the model parameters are learned jointly, taking the effects of all attributes into account. Because the posterior probabilities can only be computed from a weighted sum of all the attributes, logistic regression cannot handle instances with missing values. A training instance with missing values can be removed from the training set. If a test instance has missing values, however, logistic regression cannot predict its class label.
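
As a rough illustration of the weighted-sum point, the sketch below computes the posterior for binary logistic regression as a sigmoid of a weighted sum. The weights, bias, and inputs are invented for the example (assume they came from some earlier training step); the `posterior` function is a hypothetical helper, not a library call.

```python
import math

def posterior(weights, bias, x):
    """P(y = 1 | x) = sigmoid(bias + sum_i w_i * x_i) for binary logistic regression."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned parameters. The second weight is close to 0,
# so the second attribute is effectively treated as irrelevant.
WEIGHTS = [2.0, 0.001]
BIAS = -1.0

p = posterior(WEIGHTS, BIAS, [1.0, 5.0])
# Changing only the near-zero-weight attribute barely moves the posterior.
p_changed = posterior(WEIGHTS, BIAS, [1.0, -5.0])

# A missing value (represented here as None) breaks the weighted sum entirely.
try:
    posterior(WEIGHTS, BIAS, [1.0, None])
    missing_value_failed = False
except TypeError:
    missing_value_failed = True
```

Every attribute enters the sum, so there is no natural way to leave one out at prediction time; in practice the missing value would have to be imputed before the model can be applied.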

Discuss the difference between nearest neighbor and naïve Bayes classifiers?

Nearest neighbor classification belongs to a broader family known as instance-based learning, which does not build a global model but instead makes predictions for a test instance directly from the training instances. Naïve Bayes classifiers, in contrast, are probabilistic classification models that provide posterior probability estimates to quantify the uncertainty in predictions. Naïve Bayes classifiers can handle missing values in the training set by ignoring the missing values of each attribute when computing conditional probability estimates. Nearest neighbor classifiers, by using suitable proximity measures that take several attributes into account together, can handle interacting attributes, i.e., attributes that have more predictive power when combined than when used alone. Because proximity computations normally require all attributes to be present, nearest neighbor classifiers have difficulty handling missing values in both the training and test sets. Naïve Bayes classifiers are robust to isolated noise points, because such points are typically averaged out when the conditional probabilities are estimated during training and so have little influence.
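
The contrast can be sketched on a tiny categorical data set. Everything below is illustrative: the data are made up, `None` marks a missing value, Hamming distance stands in for a proximity measure, and Laplace smoothing is an assumption added so that unseen values do not zero out the product. It is a sketch of the two ideas, not a reference implementation.

```python
from collections import Counter, defaultdict

# Toy training set (made up): (attributes, label). None marks a missing value.
TRAIN = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", None), "yes"),
    (("overcast", "hot"), "yes"),
]

def knn_predict(x, train, k=3):
    """Instance-based: majority vote among the k closest training instances."""
    def distance(a, b):
        if None in a or None in b:
            raise ValueError("proximity needs every attribute value")
        return sum(ai != bi for ai, bi in zip(a, b))  # Hamming distance
    neighbors = sorted(train, key=lambda pair: distance(x, pair[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def nb_fit(train):
    """Model-based: count classes and per-attribute values, skipping missing values."""
    class_counts = Counter(label for _, label in train)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    domains = defaultdict(set)           # attribute index -> observed values
    for attrs, label in train:
        for i, value in enumerate(attrs):
            if value is not None:
                value_counts[(i, label)][value] += 1
                domains[i].add(value)
    return class_counts, value_counts, domains

def nb_posterior(x, model):
    """Normalized P(class | x) under the naive independence assumption."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total
        for i, value in enumerate(x):
            if value is None:
                continue  # a missing test attribute is simply left out of the product
            seen = value_counts[(i, label)]
            score *= (seen[value] + 1) / (sum(seen.values()) + len(domains[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

query = ("sunny", "hot")
nb_result = nb_posterior(query, nb_fit(TRAIN))      # trains despite the None
complete = [pair for pair in TRAIN if None not in pair[0]]
knn_result = knn_predict(query, complete)           # needs complete instances
```

Naive Bayes trains on all five instances and returns a probability for each class, while the nearest neighbor sketch has to drop the incomplete instance first and returns only a label.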

References

Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2018). Introduction to Data Mining (2nd ed.). Pearson.

Stoltzfus, J. C. (2011). Logistic Regression: A Brief Primer. Academic Emergency Medicine, 18(10), 1099–1104. https://doi.org/10.1111/j.1553-2712.2011.01