Big Data Multiple Pattern Matching & Naive Algorithm Essay

Q1. Multiple pattern matching.

AA: In big data applications, one often needs to search for multiple items in a huge dataset at the same
time. Imagine that you wish to search the web for pages that contain the following four terms: “Mountaineers”,
“Morgantown”, “West”, “Virginia”. Consider the web to be our huge database (big data), with billions of symbols.
Explain how you would perform the required search using each of the following:

(A) The naïve search algorithm
(B) The suffix tree
(C) The suffix array

You can give an algorithm or pseudocode for performing the search.
Indicate the time and space complexity (Big O notation) of each approach (average case and worst case) in terms of the
size of the text, the size of the patterns, and the number of patterns.
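
As a starting point for part (A), a minimal Python sketch of the naïve search applied to several patterns is given below. The function names and the symbols n (text length), m (pattern length), and k (number of patterns) are assumptions introduced for illustration only. Under these assumptions, the worst case costs O(k · n · m) character comparisons, and the extra space beyond the text and the reported positions is O(1).

```python
def naive_find_all(text: str, pattern: str) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    hits = []
    # Try every alignment of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        # Compare character by character until a mismatch or a full match.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            hits.append(i)
    return hits


def naive_multi_search(text: str, patterns: list[str]) -> dict[str, list[int]]:
    """Run the naive scan once per pattern: k independent passes over the text."""
    return {p: naive_find_all(text, p) for p in patterns}
```

A page would then count as a match when every one of the four terms has a non-empty list of positions.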

Based on this, compare and contrast the three methods for searching for multiple patterns in Big Data collections.
That is, for each pair of methods, describe the advantage(s) and disadvantage(s) of one method relative to the other.
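
To make the contrast with an index-based method concrete, here is a hedged sketch of pattern lookup in a suffix array. The construction shown simply sorts all suffixes, which is easy to read but far slower than the specialised construction algorithms a real system would use; the function names are hypothetical. Once the array exists, each pattern of length m is located with two binary searches, i.e. O(m log n) character comparisons per pattern, at the cost of O(n) extra space for the index.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, in lexicographic order.

    Simple sort-based construction, for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_find_all(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start indices of pattern, using binary search on sa."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])
```

The index is built once and reused for all patterns, whereas the naïve scan rereads the whole text for each pattern.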

_________________________________

BB: Given that velocity is one of the core V’s of Big Data, describe how you can use each of the three approaches above
(naïve algorithm, suffix tree, and suffix array) to handle cases where the data arrives at high velocity. That is, the
dataset (T) is changing rapidly, for example with streaming data. Assume for now that changes are made simply by
appending data to the end of the existing database.
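
For the append-only setting, one possible way to adapt the naïve algorithm is sketched below: instead of rescanning all of T after each append, keep only the last (longest pattern length − 1) characters and scan that tail together with the new chunk. The class name `StreamingNaiveSearcher` and its fields are hypothetical, and the sketch assumes non-empty patterns and reports global start positions.

```python
class StreamingNaiveSearcher:
    """Naive multi-pattern search over text that only grows by appends."""

    def __init__(self, patterns):
        self.patterns = [p for p in patterns if p]  # ignore empty patterns
        # A match can straddle a chunk boundary by at most (longest - 1) chars.
        self.keep = max((len(p) for p in self.patterns), default=1) - 1
        self.tail = ""   # last `keep` characters seen so far
        self.seen = 0    # total number of characters consumed
        self.hits = {p: [] for p in self.patterns}

    def append(self, chunk: str) -> None:
        window = self.tail + chunk
        base = self.seen - len(self.tail)  # global index of window[0]
        for p in self.patterns:
            m = len(p)
            # Only alignments whose match ends inside the new chunk are new;
            # earlier ones were already reported by a previous call.
            first = max(0, len(self.tail) - m + 1)
            for i in range(first, len(window) - m + 1):
                if window[i:i + m] == p:
                    self.hits[p].append(base + i)
        self.seen += len(chunk)
        self.tail = window[max(0, len(window) - self.keep):]
```

Under these assumptions, the work per append depends on the chunk size and the pattern lengths rather than on the total size of T, which is one point to weigh against the cost of updating a suffix tree or suffix array when new data arrives.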

_____________________________________

CC: Explain how the time and space complexity will be affected by velocity for each of the three cases above. Which
approach do you think is most suitable as n (the size of the dataset T) approaches Big Data sizes, i.e., when n becomes
very large? Justify your answer.