MIS 650 Grand Canyon University Response for discussion

 

Discussion 1: Josie Barney

Data manipulation consumes around 80% of the effort in an analysis and is popularly termed “data munging,” a phrase attributed to Simple founder Josh Reich (Lander, 2017). Whatever the approach, the general idea is to change the data so that it is readable and organized.

In R, the functions available depend on the package you are working in. Three functions that can be used for data manipulation in R are (Naveen, 2021):

1. Package: dplyr

Function: filter()

Description: filter() is used to find rows that match given criteria. For instance, when a project needs a particular skill set, filtering on the employee job title limits the data to just the employees who have that job title.

2. Package: dplyr

Function: arrange()

Description: arrange() is used to sort rows by one or more variables in ascending or descending order. This can be used when you are running through an inventory report and need to know which products are low on stock. Sorting in ascending order of product inventory puts the product with the lowest stock first.

3. Package: dplyr

Function: summarise()

Description: summarise() is used to compute summary statistics (mean, median, mode, etc.) from a dataset. This can be used when you have to supply stats on the sales of each sales rep. Combined with group_by(), this function can give you the mean, median, mode, etc., of each rep's weekly, monthly, or yearly sales.
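
The three functions above can be sketched on small made-up tables. This is only an illustration: the data frames (employees, inventory, sales) and their column names are hypothetical and do not come from the cited sources. Base R has no built-in statistical mode function (mode() reports an object's storage mode), so the sketch computes only the mean and median.

```r
library(dplyr)

# Hypothetical employee table, invented for illustration
employees <- data.frame(
  name      = c("Ana", "Ben", "Cal", "Dee"),
  job_title = c("Analyst", "Developer", "Analyst", "Manager")
)

# filter(): keep only the rows whose job title matches the criterion
analysts <- employees %>% filter(job_title == "Analyst")

# Hypothetical inventory table
inventory <- data.frame(
  product  = c("Widget", "Gadget", "Gizmo"),
  in_stock = c(40, 5, 12)
)

# arrange(): ascending by default, so the lowest stock comes first
low_stock_first <- inventory %>% arrange(in_stock)

# desc() reverses the order for a descending sort
high_stock_first <- inventory %>% arrange(desc(in_stock))

# Hypothetical sales table with two sales per rep
sales <- data.frame(
  rep    = c("Ana", "Ana", "Ben", "Ben"),
  amount = c(100, 300, 250, 150)
)

# group_by() + summarise(): one row of statistics per sales rep
rep_stats <- sales %>%
  group_by(rep) %>%
  summarise(mean_sales = mean(amount), median_sales = median(amount))
```

Printing analysts, low_stock_first, or rep_stats at the console shows the resulting tables.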

Data manipulation in R or a similar program can be a great advantage if you understand how to run the functions needed to get the data into the shape you want. On the other hand, it might take more time if you have to troubleshoot your code. If it is a repetitive report that you have to adjust often, the time spent setting up the code will pay for itself. For example, I pull many reports weekly; by setting up code to manipulate the data into a form that is readable for my purposes, I save myself the four hours it would take me to do the same in Excel.

Doing your data manipulation in Excel can have advantages if the data is a one-time need or requires only simple adjustments. If the information is more complicated, the time needed to format, sort, add formulas, and delete unnecessary columns or rows can eat up time that could be used more efficiently. The process can also be far more complicated than if you just used some code in R. An example of the simple case is pulling a report with basic product figures for a teammate: it is simpler for me to format it in Excel, since I only need to sort it by product name.

References

Lander, J. P. (2017). R for everyone: Advanced analytics and graphics (2nd ed.). Addison-Wesley Professional.

Naveen. (2021, March 8). Data manipulation in R. Intellipaat Blog. https://intellipaat.com/blog/tutorial/r-programmin…

Upadhyay, I. (2020, December 10). Data manipulation: Definition, and purpose, examples. Jigsaw Academy. https://www.jigsawacademy.com/blogs/data-science/data-manipulation/#What-is-Data-Manipulation?

Discussion 2: Arcelia Rael

Three functions that can be used for data manipulation in R include filter(), select(), and mutate(). Coding Club (2019) notes that filter() and select() are most often used to reduce the size of a data frame as needed. Specifically, filter() matches and returns a subset of rows, while select() chooses which columns are returned.

In this example, we use the filter function to return only the rows whose “Name” column is equal to “Mountain-200 Black, 38”.

library(dplyr)  # provides %>%, filter(), select(), and mutate()
productSales %>%
  filter(Name == "Mountain-200 Black, 38")

In this example we use the select function to return only two columns from our data set: “Name” and “LineTotal”:

productSales %>%
  select(Name, LineTotal)

In our last example, we use the mutate function to add a new column to our sales aggregate data to identify price per unit based on quantity and revenue:

PPU <- salesAgg %>%
  mutate(PricePerUnit = Revenue / `Unit Volume`)
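
To try the three snippets without the course data, a minimal self-contained sketch follows. The productSales and salesAgg tables here are small hypothetical stand-ins with invented values; only the column names are taken from the examples above.

```r
library(dplyr)

# Hypothetical stand-in for productSales; the values are invented
productSales <- data.frame(
  Name      = c("Mountain-200 Black, 38", "Road-150 Red, 62",
                "Mountain-200 Black, 38"),
  LineTotal = c(2024.99, 3578.27, 4049.98),
  OrderQty  = c(1, 1, 2)
)

# filter(): rows for one product only
one_product <- productSales %>%
  filter(Name == "Mountain-200 Black, 38")

# select(): keep just two of the three columns
two_columns <- productSales %>%
  select(Name, LineTotal)

# Hypothetical stand-in for salesAgg; check.names = FALSE keeps the
# space in the `Unit Volume` column name instead of converting it
salesAgg <- data.frame(
  Revenue       = c(1000, 450),
  `Unit Volume` = c(10, 9),
  check.names   = FALSE
)

# mutate(): add a derived column; backticks quote the non-standard name
PPU <- salesAgg %>%
  mutate(PricePerUnit = Revenue / `Unit Volume`)
```

Because each function takes a data frame and returns a data frame, the calls can also be chained, for example filter() followed by select() in one pipeline.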

Preparing your data in an application like Microsoft Excel before using it in RStudio has the advantage of letting you complete some operations without writing code. For example, instead of using the rename() function to rename your columns, you can do this quickly in Excel. Additionally, if data should be sorted in a specific manner before analysis, we can order the data in Excel before upload instead of using the order() function in R.

Completing all data manipulation in R has the advantage that every step is recorded in a script or in the console history, so the work can be tracked and repeated. This can reduce mistakes associated with preparing and handling raw data outside of the final environment, including data type mismatches after upload, which may occur when a data type in Excel is not translated correctly into the new environment. Lastly, R is built to handle large data sets and is limited mainly by available memory, whereas Excel has built-in size limits. As noted by Microsoft (n.d.), Excel has a worksheet size limit of “1,048,576 rows by 16,384 columns” (Excel specifications and limits section). For data sets that approach or exceed these limits, it is more practical to complete data preparation and manipulation in R.
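
As a rough sketch of what the in-R alternative looks like, the following renames columns, repairs a column that arrived as text, and sorts the rows. The raw table and its column names are hypothetical, chosen only to imitate an import where a numeric column was read as character.

```r
library(dplyr)

# Hypothetical import result: quantities arrived as text, not numbers
raw <- data.frame(
  prod_nm = c("Gizmo", "Widget", "Gadget"),
  qty     = c("12", "40", "5"),
  stringsAsFactors = FALSE
)

clean <- raw %>%
  rename(product = prod_nm, quantity = qty) %>%  # rename(new = old)
  mutate(quantity = as.numeric(quantity)) %>%    # fix the type mismatch
  arrange(product)                               # sort by product name

# Base R equivalent of the sort, using order() on the original column
clean_base <- raw[order(raw$prod_nm), ]

# Inspect the column types to confirm the conversion
str(clean)
```

Keeping these steps in a script means the same cleanup can be rerun unchanged on the next export.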

References

Coding Club. (2019, April 4). Basic data manipulation. https://ourcodingclub.github.io/tutorials/data-manip-intro/

Microsoft. (n.d.). Excel specifications and limits. Retrieved August 21, 2021, from https://support.microsoft.com/en-us/office/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3

Discussion 3: David Lundholm

Data manipulation in R is used to modify data in order to organize it better and make it easier to read. In addition, it can help remove inaccuracies so that the data is more reliable for analysis and data visualization. Within the dplyr package in R, analysts can use the distinct(), arrange(), or summarise() functions to assist with further data modification. The distinct() function removes duplicate rows, which is very useful when you want the most accurate dataset returned. In our previous assignment, using this function would have helped with pulling the top 10 products for AdventureWorks while making sure duplicate data was omitted from the results. The arrange() function reorders the rows of your results to draw users to the pertinent information requested. The most beneficial of the three is the summarise() function, which computes statistical summaries (e.g., the mean, min, and max). For example, statistical information for reports or deeper analysis can be computed quickly, speeding up otherwise time-consuming calculations.

Since data manipulation is essentially organizing or modifying data, it can be conducted in either Excel or R. Most analysts probably perform their data manipulation in Excel simply because they are familiar with it and because it is by far the most common system used for data. However, Excel is not the most user-friendly tool when it comes to modifying data, and it can be very time consuming. R does almost all the work for analysts, and these simple functions are very easy to use. Unfortunately, R is not commonly used within most organizations and can be difficult to incorporate into a normal daily process for most analysts.
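
A minimal sketch of how these functions could combine for a top-products report is shown below. The orders table is hypothetical and invented for illustration; it is not the AdventureWorks data, and head(10) is just one way to keep the first ten rows.

```r
library(dplyr)

# Hypothetical order lines; the first two rows are exact duplicates
orders <- data.frame(
  product  = c("Helmet", "Helmet", "Gloves", "Bottle", "Gloves"),
  order_id = c(1, 1, 2, 3, 4),
  revenue  = c(35, 35, 20, 5, 20),
  stringsAsFactors = FALSE
)

top_products <- orders %>%
  distinct() %>%                            # drop fully duplicated rows
  group_by(product) %>%
  summarise(total_revenue = sum(revenue)) %>%  # one total per product
  arrange(desc(total_revenue)) %>%          # highest revenue first
  head(10)                                  # keep at most ten products
```

Without distinct(), the duplicated Helmet row would be counted twice and would inflate that product's total.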

References

Datanovia. (n.d.). Data manipulation in R. https://www.datanovia.com/en/courses/data-manipulation-in-r/