2023 – 2024 Google Data Analytics Professional Certificate Course 7: Data Analysis With R – Quiz Answers

Week 1: Programming and data analytics

R is a programming language that can help you in your data analysis process. In this part of the course, you’ll learn about R and RStudio, the environment you’ll use to work in R. You’ll explore the benefits of using R and RStudio as well as the components of RStudio that will help you get started.

Learning Objectives

  • Compare and contrast the R programming environment and the RStudio programming environment
  • Describe the RStudio programming environment including its components and benefits
  • Describe the R programming language and its programming environment
  • Describe programming languages and appropriate use including examples
  • Download and install R assets to a computer
  • Open R and execute a command
  • Differentiate between the R Console and R programming environments
  • Execute operations in R using mathematical operators such as +, -, *, and /
  • Download and use RStudio Desktop

Answers to week 1 quiz questions

L2 Programming languages

Question 1

Fill in the blank: Programming involves _ a computer to perform an action or set of actions.

  • updating
  • instructing
  • training
  • filtering

Programming means giving instructions to a computer to perform an action or set of actions.

Question 2

What are Python, JavaScript, SAS, Scala, and Julia?

  • Integrated development environments
  • Databases
  • Programming languages
  • Web applications

Python, JavaScript, SAS, Scala, and Julia are examples of programming languages.

Question 3

What are the benefits of using a programming language to work with your data? Select all that apply.

  • Clarify the steps of your analysis
  • Easily reproduce and share your work
  • Save time
  • Choose a business task for analysis

There are three main benefits of using a programming language to work with your data. You can easily reproduce and share your work, save time, and clarify the steps of your analysis.

L3 R programming language

Question 1

Open-source code is only available to people who pay a subscription fee.

  • True
  • False

Open-source code is freely available to anyone.

Question 2

The R programming language can be used for which of the following tasks? Select all that apply.

  • Data analysis
  • Visualization
  • Statistical analysis
  • Gaming

The R programming language can be used for statistical analysis, visualization, and data analysis.

Question 3

Which of the following terms best describes the R programming language?

  • Open-data
  • Data-centric
  • Closed-source
  • Open-ended

The term data-centric best describes the R programming language. R is designed to make data analysis easier, more efficient, and more powerful.

L4 Programming with RStudio

Question 1

What type of software application is RStudio?

  • Data visualization tool
  • Source editor
  • Database
  • Integrated development environment

RStudio is a type of software application known as an integrated development environment (IDE). An IDE brings together all the tools you may want to use in a single place.

Question 2

RStudio includes which of the following panes? Select all that apply.

  • Environment pane
  • Source editor pane
  • Command pane
  • R console pane

RStudio includes an R console pane for executing commands, a source editor pane for writing code, and an environment pane for managing loaded data. RStudio does not include a Command pane.

Question 3

If you write code directly in the R console, RStudio will automatically save your code when you close your current session.

  • True
  • False

If you write code directly in the R console, RStudio will not save it when you close your current session. To save your code, use the source editor.

Weekly challenge 1

Question 1

A data analyst uses words and symbols to give instructions to a computer. What are the words and symbols known as?

  • Syntax language
  • Function language
  • Programming language
  • Coded language

Programming languages are the words and symbols you use to write instructions for computers to follow.

Question 2

Many data analysts prefer to use a programming language for which of the following reasons? Select all that apply.

  • To choose a topic for analysis
  • To easily reproduce and share an analysis
  • To clarify the steps of an analysis
  • To save time

Many data analysts prefer to use a programming language in order to easily reproduce and share an analysis, save time, and clarify the steps of an analysis.

Question 3

Which of the following are benefits of open-source code? Select all that apply.

  • Anyone can fix bugs in the code
  • Anyone can create an add-on package for the code
  • Anyone can pay a fee for access to the code
  • Anyone can use the code for free

The benefits of open-source code include the following: anyone can use the code for free, fix bugs in the code, and create add-on packages for the code.

Question 4

Fill in the blank: The benefits of using _ for data analysis include the ability to quickly process lots of data and create high quality visualizations.

  • the R programming language
  • a dashboard
  • a spreadsheet
  • structured query language

The benefits of using the R programming language for data analysis include the ability to quickly process lots of data and create high quality visualizations.

Question 5

A data analyst needs to quickly create a series of scatterplots to visualize a very large dataset. What should they use for the analysis?

  • Structured query language
  • A slide presentation
  • A dashboard
  • R programming language

The analyst should use the R programming language to quickly create a series of scatterplots to visualize a very large dataset. R can quickly process lots of data and create high quality visualizations.

Question 6

RStudio’s integrated development environment lets you perform which of the following actions? Select all that apply.

  • Install R packages
  • Create data visualizations
  • Import data from spreadsheets
  • Stream online videos

RStudio’s integrated development environment lets you install R packages, import data from spreadsheets, and create data visualizations.

Question 7

In which two parts of RStudio can you execute code? Select all that apply.

  • The environment pane
  • The plots pane
  • The source editor pane
  • The R console pane

In RStudio, you can execute code in the R console pane and the source editor pane.

Question 8

Fill in the blank: In RStudio, the _ is where you can find all the data you currently have loaded, and can easily organize and save it.

  • environment pane
  • plots pane
  • R console pane
  • source editor pane

In RStudio, the environment pane is where you can find all the data you currently have loaded, and can easily organize and save it.

Week 2: Programming using RStudio

Using R can help you complete your analysis efficiently and effectively. In this part of the course, you’ll explore the fundamental concepts associated with R. You’ll learn about functions and variables for calculations and other programming. In addition, you’ll discover R packages, which are collections of R functions, code and sample data that you’ll use in RStudio.

Learning Objectives

  • Describe the contents and components of the tidyverse package for R
  • Describe the concept of packages in R programming language
  • Describe the use of pipes in R programming language
  • Describe the use of operators to complete calculations in the R programming language
  • Describe the fundamental concepts associated with programming in R including functions, variables, data types, pipes, and vectors
  • Install and load the tidyverse package
  • Use the browseVignettes("packagename") function to read through vignettes of a loaded package

Answers to week 2 quiz questions

L2 Programming concepts

Question 1

Why do analysts use comments in R programming? Select all that apply.

  • To make an R Script more readable
  • To explain their code
  • To act as functions
  • To provide names for variables

In R programming, comments are used to explain your code and to make an R Script more readable.

Question 2

What should you use to assign a value to a variable in R?

  • A vector
  • An operator
  • A comment
  • An argument

You should use an operator to assign a value to a variable in R. You should use operators such as <- after a variable to assign a value to it.

Question 3

Which of the following examples is the proper syntax for calling a function in R?

  • <- 20
  • print()
  • data_1
  • #first

An example of the syntax for a function in R is print(). If you add an argument in the parentheses for the print() function, the argument will appear in the console pane of RStudio.

Question 4

Which of the following examples can you use in R for date/time data? Select all that apply.

  • 2018-12-21 16:35:28 UTC
  • 2019-04-16
  • 06:11:13 UTC
  • 07/24-2018

The examples of types of date/time data that you can use in R are 06:11:13 UTC, 2019-04-16, and 2018-12-21 16:35:28 UTC. R recognizes the syntax of each of these formats as a date/time data type.

L3 Coding in R

Question 1

An analyst includes the following calculation in their R programming: midyear_sales <- (quarter_1_sales + quarter_2_sales) - overhead_costs Which variable will the total from this calculation be assigned to?

  • midyear_sales
  • quarter_1_sales
  • quarter_2_sales
  • overhead_costs

The total from this calculation will be assigned to the variable midyear_sales. The assignment operator <- follows the variable midyear_sales, so the value of the calculated total is assigned to this variable.

Question 2

An analyst is checking the value of the variable x using a logical operator, so they run the following code:

x > 35 & x < 65

Which values of x would return TRUE when the analyst runs the code? Select all that apply.

  • 35
  • 50
  • 60
  • 70

The values 50 and 60 will return TRUE when the analyst runs the code x > 35 & x < 65. In this code, the logical operator & tells R to return TRUE only when the value of the variable is both greater than 35 and less than 65.
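As a quick check, the four answer choices can be placed in a vector and tested at once. This is a minimal sketch in base R; the variable name in_range is only illustrative.

```r
# The four answer choices from the question, stored as a vector
x <- c(35, 50, 60, 70)

# & works element by element: both comparisons must be TRUE
in_range <- x > 35 & x < 65

# Show which values passed the check
print(x[in_range])
```

Note that 35 fails because the comparison uses > rather than >=.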

Question 3

Which of the following functions can analysts use to create conditional statements in their R programming? Select all that apply.

  • print()
  • else()
  • if()
  • c()

Analysts can use the if() and else() functions and other functions to create conditional statements in their R programming. Conditional statements declare that if a certain condition is true, then a certain event must take place.
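A conditional statement might look like the sketch below. The value of x and the messages are hypothetical; in practice else is written directly after the closing brace of the if block.

```r
# Hypothetical value to test
x <- 50

# Run the first block if the condition is TRUE, otherwise the second
if (x > 35 && x < 65) {
  result <- "x is in range"
} else {
  result <- "x is out of range"
}

print(result)
```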

L4 R Packages

Question 1

When using RStudio, what does the installed.packages() function do?

  • Presents a list of packages currently installed in an RStudio session
  • Selects the best packages to use based on an analyst’s current needs
  • Creates code for analysts to use to edit their packages
  • Installs all available packages for use in an RStudio session

The installed.packages() function shows a list of packages currently installed in an RStudio session. You can then locate the names of the packages and what’s needed to use functions from the package.

Question 2

In data analytics, what is CRAN?

  • A commonly used online archive with R packages and other R resources
  • A collection of packages that function together to make analysis in R more efficient
  • An R interface that has many of the same functions as RStudio
  • A function for finding packages to use for analysis in RStudio

CRAN is a commonly used online archive with R packages and other R resources. CRAN makes sure that the R resources it shares follow the required quality standards and are authentic and valid.

Question 3

What are ggplot2, tidyr, dplyr, and forcats all a part of?

  • A list of functions that clean data efficiently
  • A list of variables for use in programming in RStudio
  • A collection of core tidyverse packages
  • A collection of commonly used, CRAN-based data sets

The packages ggplot2, tidyr, dplyr, and forcats are part of a collection of eight core tidyverse packages. The other core packages are: tibble, readr, purrr, and stringr.

L5 Explore the tidyverse

Question 1

When working in R, for which part of the data analysis process do analysts use the tidyr package?

  • Data security
  • Data visualization
  • Data cleaning
  • Data calculations

Analysts use the tidyr package for data cleaning. It works with wide and long data to make sure every part of a data table or data frame is the right data type and in the right place.

Question 2

Which tidyverse package contains a set of functions, such as select(), that help with data manipulation?

  • forcats
  • ggplot2
  • readr
  • dplyr

The dplyr package is the tidyverse package which contains a set of functions, such as select(), that help with data manipulation. For example, select() selects only relevant variables based on their names.

Question 3

An analyst is organizing a dataset in RStudio using the following code:

arrange(filter(Storage_1, inventory >= 40), count)

Which of the following examples is a nested function in the code?

  • filter
  • arrange
  • inventory
  • count

In the analyst’s code, filter is the nested function. It is embedded in the argument of the broader arrange function.

Weekly challenge 2

Question 1

Which of the following is an example of a piece of R code that contains both a function and an argument?

  • print("peaches")
  • weekly_sales <- 7450
  • #filter
  • mass > 1000

The piece of code print("peaches") is an example of R code that contains a function and an argument. The function is print and the argument in parentheses ("peaches") follows the function.

Question 2

A data analyst is assigning a variable to a value in their company’s sales dataset for 2020. Which variable name uses the correct syntax?

  • _2020sales
  • sales_2020
  • -sales-2020
  • 2020_sales

The variable name with the correct syntax is sales_2020. A variable name in R may contain numbers and underscores, but it cannot begin with either.

Question 3

You want to create a vector with the values 12, 23, 51, in that exact order. After specifying the variable, what R code chunk allows you to create the vector?

  • v(12, 23, 51)
  • c(12, 23, 51)
  • c(51, 23, 12)
  • v(51, 23, 12)

The code chunk c(12, 23, 51) allows you to create a vector with the values 12, 23, 51. A vector is a group of data elements of the same type stored in a sequence in R. You can create a vector by putting the values you want inside the parentheses of the combine function, c().
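For illustration, here is the vector assigned to a variable and inspected. The name values is an assumption, since the question does not give the variable's name.

```r
# Combine the three values into a vector, keeping the given order
values <- c(12, 23, 51)

# Check the contents, the number of elements, and the first element
print(values)
print(length(values))
print(values[1])
```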

Question 4

An analyst comes across dates listed as strings in a dataset, for example December 10th, 2020. To convert the strings to a date/time data type, which function should the analyst use?

  • mdy()
  • now()
  • datetime()
  • lubridate()

To convert the strings to date/time data types, the analyst should use the function mdy(). The mdy() function and other variations of the ymd() function convert string dates and times into date/time data types that are compatible with R.
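A minimal sketch of the conversion, assuming the lubridate package is installed and that the session uses English month names:

```r
library(lubridate)

# A date stored as text, in month-day-year order
date_string <- "December 10th, 2020"

# mdy() parses the text into a Date value
parsed_date <- mdy(date_string)

print(parsed_date)
print(class(parsed_date))
```

The function name mirrors the order of the parts in the string; a string such as "2020-12-10" would call for ymd() instead.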

Question 5

A data analyst inputs the following code in RStudio:

sales_1 <- (3500.00 * 12)

Which of the following types of operators does the analyst use in the code? Select all that apply.

  • Assignment
  • Arithmetic
  • Logical
  • Relational

In the code sales_1 <- (3500.00 * 12), the analyst uses an assignment (<-) and an arithmetic (*) operator. The assignment operator assigns the calculated value in parentheses to the variable sales_1 and the arithmetic operator multiplies the values in parentheses to complete the calculation.
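Running the line from the question shows both operators at work:

```r
# * multiplies the two numbers; <- stores the result in sales_1
sales_1 <- (3500.00 * 12)

print(sales_1)
```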

Question 6

A data analyst is deciding on naming conventions for an analysis that they are beginning in R. Which of the following rules are widely accepted stylistic conventions that the analyst should use when naming variables? Select all that apply.

  • Use single letters, such as “x” to name all variables
  • Use an underscore to separate words within a variable name
  • Use all lowercase letters in variable names
  • Begin all variable names with an underscore

The analyst should use all lowercase letters in variable names and should separate words with underscores. These are widely accepted stylistic conventions that help keep code readable.

Question 7

Which of the following are included in R packages? Select all that apply.

  • Tests for checking your code
  • Sample datasets
  • Reusable R functions
  • Naming conventions for R variable names

R packages include reusable R functions, sample datasets, and tests for checking your code. R packages also include documentation about how to use the included functions.

Question 8

Packages installed in RStudio are called from CRAN. CRAN is an online archive with R packages and other R-related resources.

  • True
  • False

Packages installed in RStudio are called from CRAN. CRAN is an online archive with R packages and other R-related resources.

Question 9

When programming in R, what is a pipe used as an alternative for?

  • Variable
  • Vector
  • Nested function
  • Installed package

A pipe can be used as an alternative for a nested function. You can use both pipes and nested functions to complete multiple operations on data. However, a pipe is often the preferred method because it makes your code easier to read and understand.
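To compare the two styles side by side, the sketch below rewrites the earlier arrange(filter(...)) example with pipes. It assumes the dplyr package is installed; the Storage_1 data frame here is a made-up stand-in with hypothetical values.

```r
library(dplyr)

# Hypothetical stand-in for the Storage_1 data frame
Storage_1 <- data.frame(
  item = c("A", "B", "C", "D"),
  inventory = c(55, 20, 40, 75),
  count = c(3, 9, 7, 1)
)

# Nested version: read from the inside out
nested <- arrange(filter(Storage_1, inventory >= 40), count)

# Piped version: read from top to bottom
piped <- Storage_1 %>%
  filter(inventory >= 40) %>%
  arrange(count)

print(piped)
```

Both versions return the same rows in the same order; the piped form simply lists the steps in the order they happen.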

Week 3: Working with data in R

The R programming language was designed to work with data at all stages of the data analysis process. In this part of the course, you’ll examine how R can help you structure, organize, and clean your data using functions and other processes. You’ll learn about data frames and how to work with them in R. You’ll also revisit the issue of data bias and how R can help.

Learning Objectives

  • Discuss how R functions may be used to address issues of bias and relationship between data variables
  • Describe R functions that may be used to clean and organize data
  • Describe functions used to work with data frames including read_csv(), data(), and datapasta()
  • Demonstrate an understanding of the use of dataframes in R
  • Discuss the difference between tibbles and tribbles
  • Compare and contrast data cleaning with different tools

Answers to week 3 quiz questions

L2 Explore data and R

Question 1

Which of the following are best practices for creating data frames? Select all that apply.

  • Columns should be named
  • Data can be stored as many different types
  • Rows should be named
  • Each column should contain the same number of data items

When creating data frames, columns should be named, data can be stored as many different types, and each column should contain the same number of data items.

Question 2

Why are tibbles a useful variation of data frames?

  • Tibbles make changing the names of variables easier.
  • Tibbles can create row names
  • Tibbles make printing easier
  • Tibbles can change the data type of inputs

Tibbles make printing easier, which is especially useful when working with large datasets.

Question 3

Tidy data is a way of standardizing the organization of data within R.

  • True
  • False

Tidy data refers to the principles that make data structures meaningful and easy to understand. It’s a way of standardizing the organization of data within R.

Question 4

Which R function can be used to make changes to a data frame?

  • colnames()
  • mutate()
  • str()
  • head()

The mutate() function can be used to make changes to a data frame.

L3 Cleaning data

Question 1

A data analyst is cleaning their data in R. They want to be sure that their column names are unique and consistent to avoid any errors in their analysis. What R function can they use to do this automatically?

  • rename()
  • select()
  • rename_with()
  • clean_names()

The clean_names() function will automatically make sure that column names are unique and consistent.

Question 2

A data analyst is trying to sort the penguins bill_length_mm data in descending order. They input the following code:

penguins %>%

What code does the analyst add to organize the column bill_length_mm in descending order?

  • arrange(-bill_length_mm)
  • arrange(=bill_length_mm)
  • arrange(+bill_length_mm)
  • arrange(%>%bill_length_mm)

The analyst adds the code arrange(-bill_length_mm) to organize the column bill_length_mm in descending order. The minus sign in front of the column name sorts the data in descending order by bill length. Without the minus sign, this command will return the data in ascending order instead.
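A small sketch of the sort, assuming dplyr is installed. The penguins_sample data frame is a hypothetical three-row stand-in that borrows column names from the penguins dataset.

```r
library(dplyr)

# Hypothetical sample using the same column names as penguins
penguins_sample <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, 47.5, 49.0)
)

# The minus sign reverses the default ascending order
sorted <- penguins_sample %>%
  arrange(-bill_length_mm)

print(sorted)
```

dplyr also provides desc(), so arrange(desc(bill_length_mm)) gives the same ordering.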

Question 3

A data analyst is working with customer information from their company’s sales data. The first and last names are in separate columns, but they want to create one column with both names instead. Which of the following functions can they use?

  • separate()
  • unite()
  • arrange()
  • select()

The unite() function can be used to combine columns.

L4 R functions

Question 1

Which of the following functions can a data analyst use to get a statistical summary of their dataset? Select all that apply.

  • cor()
  • ggplot2()
  • sd()
  • mean()

The sd(), cor(), and mean() functions can provide a statistical summary of the dataset using standard deviation, correlation, and mean.

Question 2

A data analyst inputs the following command:

quartet %>% group_by(set) %>% summarize(mean(x), sd(x), mean(y), sd(y), cor(x, y)).

Which of the functions in this command can help them determine how strongly related their variables are?

  • mean(y)
  • sd(x)
  • cor(x,y)
  • sd(y)

The cor() function returns the correlation between two variables. This determines how strong the relationship between those two variables is.
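The sketch below runs a similar command on a tiny made-up data frame, assuming dplyr is installed. The quartet_sample data and the column names given to the summaries are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the quartet data: two sets of three points
quartet_sample <- data.frame(
  set = c(1, 1, 1, 2, 2, 2),
  x   = c(1, 2, 3, 1, 2, 3),
  y   = c(2, 4, 6, 6, 4, 2)
)

# One row of summary statistics per set
summary_table <- quartet_sample %>%
  group_by(set) %>%
  summarize(mean_x = mean(x), sd_x = sd(x), correlation = cor(x, y))

print(summary_table)
```

In this sample both sets share the same mean and standard deviation of x, yet the correlation is 1 for the first set and -1 for the second, which is the kind of difference cor() is meant to reveal.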

Question 3

Fill in the blank: The bias function compares the actual outcome of the data with the _ outcome to determine whether or not the model is biased.

  • probable
  • desired
  • predicted
  • final

The bias function compares the actual outcome of the data with the predicted outcome to determine whether or not the model is biased.

Weekly challenge 3

Question 1

A data analyst is creating a new data frame. Their dataset has dates, currency, and text strings. What characteristic of data frames is this an instance of?

  • Data stored can be many different types
  • Columns should contain the same number of items
  • Columns should be named
  • Variables should be named

A data frame is a collection of columns. Characteristics of data frames include: all columns should be named, data stored can be many different types, and all columns should contain the same number of items. The dataset in question has a variety of data types, which is related to the idea that data stored can be many different types.

Question 2

A data analyst is considering using tibbles instead of basic data frames. What are some of the limitations of tibbles? Select all that apply.

  • Tibbles can overload a console
  • Tibbles can never create row names
  • Tibbles won’t automatically change the names of variables
  • Tibbles can never change the input type of the data

Tibbles are useful when working with large datasets because they make printing easier. But tibbles can never change the input type of the data, create row names, or change the names of variables.

Question 3

A data analyst is working with a large data frame. It contains so many columns that they don’t all fit on the screen at once. The analyst wants a quick list of all of the column names to get a better idea of what is in their data. What function should they use?

  • colnames()
  • head()
  • str()
  • mutate()

The colnames() function will return a list of all the column names in a data frame for easy reference.
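For example, with the ToothGrowth dataset that ships with base R:

```r
# ToothGrowth is a built-in dataset, so no package is needed
column_names <- colnames(ToothGrowth)

print(column_names)
```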

Question 4

A data analyst is working with the ToothGrowth dataset in R. What code chunk will allow them to get a quick summary of the dataset?

  • glimpse(ToothGrowth)
  • min(ToothGrowth)
  • separate(ToothGrowth)
  • colnames(ToothGrowth)

The code chunk is glimpse(ToothGrowth). The glimpse() function provides the analyst with a quick summary of the data in the ToothGrowth dataset. This function shows what all of the column names are and how many rows there are.

Question 5

A data analyst is working with the penguins dataset. What code chunk does the analyst write to make sure all the column names are unique and consistent and contain only letters, numbers, and underscores?

  • drop_na(penguins)
  • clean_names(penguins)
  • rename(penguins)
  • select(penguins)

The code chunk is clean_names(penguins). The clean_names() function ensures that there are only letters, numbers, and underscores in the names used in the data frame.

Question 6

A data analyst is working with the penguins data. They write the following code:

penguins %>%

The variable species includes three penguin species: Adelie, Chinstrap, and Gentoo. What code chunk does the analyst add to create a data frame that only includes the Gentoo species?

  • filter(Gentoo == species)
  • filter(species <- "Gentoo")
  • filter(species == "Gentoo")
  • filter(species == "Adelie")

The code chunk is filter(species == "Gentoo"). The filter() function lets the data analyst specify which part of the data they want to view. Two equal signs in an argument mean “exactly equal to.” Using this operator instead of the assignment operator <- returns only the rows for Gentoo penguins.
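A minimal sketch of the filter, assuming dplyr is installed. The penguins_sample data frame and its values are hypothetical.

```r
library(dplyr)

# Hypothetical sample using the same column names as penguins
penguins_sample <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap", "Gentoo"),
  body_mass_g = c(3750, 5000, 3800, 5400)
)

# Keep only the rows where species is exactly "Gentoo"
gentoo_only <- penguins_sample %>%
  filter(species == "Gentoo")

print(gentoo_only)
```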

Question 7

A data analyst is working with the penguins dataset. They write the following code:

penguins %>%
    group_by(species) %>% 

What code chunk does the analyst add to find the mean value for the variable body_mass_g?

  • summarize(=body_mass_g)
  • summarize(max(body_mass_g))
  • summarize(mean(body_mass_g))
  • summarize(body_mass_g(mean))

The code chunk is summarize(mean(body_mass_g)). The summarize function gives high-level information about a dataset.

Question 8

A data analyst is working with a data frame named salary_data. They want to create a new column named wages that includes data from the rate column multiplied by 40. What code chunk lets the analyst create the wages column?

  • mutate(salary_data, rate = wages * 40)
  • mutate(wages = rate * 40)
  • mutate(salary_data, wages = rate * 40)
  • mutate(salary_data, wages = rate + 40)

The code chunk is mutate(salary_data, wages = rate * 40). The analyst can use the mutate() function to create a new column called wages that includes data from the rate column multiplied by 40. The mutate() function can create a new column without affecting any existing columns.
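The sketch below applies the same call to a made-up salary_data frame, assuming dplyr is installed. The employee names and rates are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for salary_data
salary_data <- data.frame(
  employee = c("A", "B"),
  rate = c(20, 32.5)
)

# Add a wages column; the rate column is left unchanged
salary_with_wages <- mutate(salary_data, wages = rate * 40)

print(salary_with_wages)
```

mutate() returns a new data frame, so the result has to be assigned to a variable if it is needed later.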

Question 9

A data analyst is working with a data frame named customers. It has separate columns for area code (area_code) and phone number (phone_num). The analyst wants to combine the two columns into a single column called phone_number, with the area code and phone number separated by a hyphen. What code chunk lets the analyst create the phone_number column?

  • unite(customers, area_code, phone_num, sep="-")
  • unite(customers, "phone_number", area_code, phone_num)
  • unite(customers, "phone_number", area_code, sep="-")
  • unite(customers, "phone_number", area_code, phone_num, sep="-")

The code chunk unite(customers, "phone_number", area_code, phone_num, sep="-") lets the analyst create the phone_number column. The unite() function lets the analyst combine the area code and phone number data into a single column. In the parentheses of the function, the analyst writes the name of the data frame, then the name of the new column in quotation marks, followed by the names of the two columns they want to combine. Finally, the argument sep="-" places a hyphen between the area code and phone number data in the phone_number column.
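Here is the same call on a small made-up customers data frame, assuming the tidyr package is installed. The names and numbers are hypothetical.

```r
library(tidyr)

# Hypothetical stand-in for the customers data frame
customers <- data.frame(
  name = c("Ana", "Ben"),
  area_code = c("212", "415"),
  phone_num = c("555-0100", "555-0199")
)

# Combine the two columns into phone_number, joined by a hyphen
customers_united <- unite(customers, "phone_number", area_code, phone_num, sep = "-")

print(customers_united)
```

By default unite() drops the two input columns and leaves the combined column in their place.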

Question 10

A data analyst wants to summarize their data with the sd(), cor(), and mean(). What kind of measures are these?

  • Statistical
  • Numerical
  • Summary
  • Standard

Standard deviation, correlation, mean, maximum, and minimum are statistical measures which can be used to summarize data.

Question 11

In R, which statistical measure demonstrates how strong the relationship is between two variables?

  • Standard deviation
  • Correlation
  • Average
  • Maximum

Correlation measures how strong the relationship between two variables is. This is represented by the cor() function.

Question 12

A data analyst is studying weather data. They write the following code chunk:

bias(actual_temp, predicted_temp)

What will this code chunk calculate?

  • The minimum difference between the actual and predicted values
  • The maximum difference between the actual and predicted values
  • The average difference between the actual and predicted values
  • The total average of the values

The bias() function can be used to calculate the average amount a predicted outcome and actual outcome differ in order to determine if the data model is biased.
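The idea of an average difference can be sketched in base R without loading any package. This is a hand-written approximation of the calculation described above, not the bias() function itself, and the temperature values are hypothetical.

```r
# Hypothetical temperatures
actual_temp    <- c(68, 70, 72, 71, 67)
predicted_temp <- c(67, 70, 71, 73, 69)

# Average of (actual - predicted); a value near zero suggests little bias
mean_difference <- mean(actual_temp - predicted_temp)

print(mean_difference)
```

A negative result here would mean the predictions run higher than the actual values on average.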

Week 4: More about visualizations, aesthetics, and annotations

R is a tool well-suited for creating detailed visualizations. In this part of the course, you’ll learn how to use R to generate and troubleshoot visualizations. You’ll also explore the features of R and RStudio that will help you with the aesthetics of your visualizations and for annotating and saving them.

Learning Objectives

  • Demonstrate an understanding of R functions for annotating and saving visualizations
  • Demonstrate an understanding of the aesthetics features available in R with reference to size, shape, color, and plots
  • Explain some common problems associated with visualizations in R
  • Demonstrate an understanding of the use of ggplot() to generate basic visualizations
  • Describe the options for generating visualizations in R
  • Demonstrate an understanding of RStudio functionality for saving visualizations
  • Create a plot in ggplot2
  • Explain the purpose and basic logic of the ggplot2 package

Answers to week 4 quiz questions

L2 Aesthetics in analysis

Question 1

In ggplot2, you can use the _ function to specify the data frame to use for your plot.

  • labs()
  • aes()
  • geom_point()
  • ggplot()

In ggplot2, you can use the ggplot() function to specify the data frame to use for your plot.

Question 2

In ggplot2, you use the plus sign (+) to add a layer to your plot.

  • True
  • False

In ggplot2, you use the plus sign (+) to add a layer to your plot.

Question 3

In ggplot2, what function do you use to map variables in your data to visual features of your plot?

  • The aes() function
  • The geom_bar() function
  • The ggplot() function
  • The geom_point() function

In ggplot2, you use the aes() function to map variables in your data to visual features of your plot. These features are known as aesthetics.

Question 4

What type of plot will the following code create?

ggplot(data = penguins) +
     geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

  • Bar chart
  • Scatterplot
  • Line diagram
  • Boxplot

The code will create a scatterplot. The function geom_point() uses points to create a scatterplot.
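To try this without the penguins dataset, the sketch below builds the same plot from a few made-up rows, assuming ggplot2 is installed. The penguins_sample values are hypothetical.

```r
library(ggplot2)

# Hypothetical sample using the same column names as penguins
penguins_sample <- data.frame(
  flipper_length_mm = c(181, 195, 210, 217),
  body_mass_g = c(3750, 3800, 4500, 5400)
)

# ggplot() names the data, + adds a layer, geom_point() draws the points
scatter <- ggplot(data = penguins_sample) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```

Typing scatter in the console, or calling print(scatter), draws the plot in the plots pane.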

L3 Aesthetics in analysis

Question 1

Which of the following aesthetics attributes can you map to the data in a scatterplot? Select all that apply.

  • Text
  • Color
  • Size
  • Shape

You can map the color, shape, and size aesthetics to the data in a scatterplot.

Question 2

Which of the following functions let you display smaller groups, or subsets, of your data?

  • ggplot()
  • geom_bar()
  • geom_point()
  • facet_wrap()

The facet_wrap() function lets you display smaller groups, or subsets, of your data.
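As a rough illustration, the sketch below splits a scatterplot into one panel per species, assuming ggplot2 is installed. The penguins_sample data frame is a hypothetical stand-in.

```r
library(ggplot2)

# Hypothetical sample using the same column names as penguins
penguins_sample <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  flipper_length_mm = c(181, 195, 210, 217),
  body_mass_g = c(3750, 3800, 4500, 5400)
)

# facet_wrap(~species) draws a separate panel for each species
faceted <- ggplot(data = penguins_sample) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  facet_wrap(~species)
```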

Question 3

You can use the color aesthetic to add color to the outline of each bar in a bar chart.

  • True
  • False

You can use the color aesthetic to add color to the outline of each bar in a bar chart.

Question 4

What is the role of the x argument in the following code?

ggplot(data = diamonds) +
     geom_bar(mapping = aes(x = cut))

  • A dataset
  • A function
  • A variable
  • An aesthetic

In this code, x is an aesthetic that refers to the x-axis of the plot. The x aesthetic maps the variable cut from the diamonds dataset to the x-axis of the plot.
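
To illustrate the bar chart aesthetics from the last two questions, here is a small sketch that uses the diamonds dataset shipped with ggplot2. The object names outlined and filled are hypothetical.

```r
library(ggplot2)  # the diamonds dataset is included with ggplot2

# color affects only the outline of each bar
outlined <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, color = cut))

# fill colors the inside of each bar
filled <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut))
```

In both plots, x = cut places one bar per cut category on the x-axis, and geom_bar() counts the rows in each category for the bar height.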

Question 5

A data analyst creates a scatterplot with a lot of data points. It is difficult for the analyst to distinguish the individual points on the plot because they overlap. What function could the analyst use to make the points easier to find?

  • geom_line()
  • geom_bar()
  • geom_jitter()
  • geom_point()

The analyst could use the geom_jitter() function to make the points easier to find. The geom_jitter() function adds a small amount of random noise to each point in the plot, which helps deal with the overlapping of points.
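
A minimal sketch of jittering, assuming the ggplot2 and palmerpenguins packages are installed (the object name jittered is illustrative):

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# geom_jitter() is used in place of geom_point(); each point is nudged
# by a small random amount so overlapping points become visible
jittered <- ggplot(data = penguins) +
  geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```

Because the noise is random, the exact position of each point changes slightly every time the plot is drawn.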

L4 Annotating and saving visualizations

Question 1

Which of the following are benefits of adding labels and annotations to your plot? Select all that apply.

  • Indicating the main purpose of your plot
  • Helping stakeholders quickly understand your plot
  • Highlighting important data in your plot
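  • Choosing a geom for your plot

Adding labels and annotations to your plot helps indicate the main purpose of the plot, highlight important data, and help stakeholders quickly understand the plot.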

Question 2

A data analyst is creating a plot for a presentation to stakeholders. The analyst wants to add a title, subtitle, and caption to the plot to help communicate important information. What function could the analyst use?

  • The geom_bar() function
  • The facet_wrap() function
  • The geom_point() function
  • The labs() function

The analyst could use the labs() function to add a title, subtitle, and caption to the plot.

Question 3

What function can you use to put a text label inside the grid of your plot to call out specific data points?

  • The annotate() function
  • The labs() function
  • The aes() function
  • The facet_wrap() function

You can use the annotate() function to put a text label inside the grid of your plot to call out specific data points.
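
The sketch below shows what such a call might look like, assuming the ggplot2 and palmerpenguins packages are installed. The label text, its position, and the styling values are placeholders chosen for illustration.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

annotated <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  # "text" is the kind of annotation; x and y place it inside the plot grid
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the largest",
           color = "purple", fontface = "bold", size = 4.5)
```

The color, fontface, and size arguments are optional and only change how the label looks.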

Question 4

A data analyst wants to add the title “Penguins” to a plot that visualizes the penguins dataset. What is the correct syntax for the argument of the labs() function?

  • labs(title <- "Penguins")
  • labs(title = "Penguins")
  • labs("Penguins")
  • labs("Penguins" = title)

The code labs(title = "Penguins") uses the correct syntax for the argument of the labs() function. In the parentheses of the labs() function, write the word title, then an equals sign, then the specific text of the title in quotation marks.
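
As a sketch of labs() in context, assuming the ggplot2 and palmerpenguins packages are installed; the subtitle and caption text are placeholders, and the object name labelled is illustrative.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

labelled <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  # Each argument is a name, an equals sign, and the text in quotation marks
  labs(title = "Penguins",
       subtitle = "Flipper length and body mass",
       caption = "Source: palmerpenguins package")
```

The title and subtitle appear above the plot, and the caption appears below it.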

Question 5

Which of the following functions can you use to save your plots in ggplot2?

  • The ggsave() function
  • The ggplot() function
  • The saveplot() function
  • The ggplotsave() function

You can use the ggsave() function to save your plots in ggplot2.
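
A hedged sketch of saving a plot, assuming the ggplot2 and palmerpenguins packages are installed and that the system can write JPEG files. The temporary folder, the object names, and the width and height values are choices made for this example only.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

# The file extension tells ggsave() which format to write
out_file <- file.path(tempdir(), "penguins.jpeg")
ggsave(out_file, plot = p, width = 6, height = 4)

file.exists(out_file)  # check that the file was written
```

If the plot argument is left out, ggsave() saves the last plot that was displayed.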

Weekly challenge 4

Question 1

Which of the following are benefits of using ggplot2? Select all that apply.

  • Automatically clean data before creating a plot
  • Easily add layers to your plot
  • Combine data manipulation and visualization
  • Customize the look and feel of your plot

The benefits of using ggplot2 include easily adding layers to your plot, customizing the look and feel of your plot, and combining data manipulation and visualization.

Question 2

In ggplot2, what symbol do you use to add layers to your plot?

  • The equal sign (=)
  • The ampersand symbol (&)
  • The pipe operator (%>%)
  • The plus sign (+)

In ggplot2, you use the plus sign (+) to add layers to your plot.

Question 3

A data analyst creates a plot using the following code chunk:

ggplot(data = penguins) + 
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

Which of the following represents a variable in the code chunk? Select all that apply.

  • body_mass_g
  • x
  • flipper_length_mm
  • y

The two variables in the code are flipper_length_mm and body_mass_g. The two variables are part of the penguins dataset. The aesthetic x maps the variable flipper_length_mm to the x-axis of the plot. The aesthetic y maps the variable body_mass_g to the y-axis of the plot.

Question 4

A data analyst uses the aes() function to define the connection between their data and the plots in their visualization. What argument is used to refer to matching up a specific variable in your data set with a specific aesthetic?

  • Faceting
  • Mapping
  • Jittering
  • Annotating

Mapping is an argument that matches up a specific variable in your data set with a specific aesthetic. You use the aes() function to define the mapping between your data and your plot.

Question 5

A data analyst is working with the penguins data. The analyst creates a scatterplot with the following code:

ggplot(data = penguins) +
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, alpha = species))

What does the alpha aesthetic do to the appearance of the points on the plot?

  • Makes some points on the plot more transparent
  • Makes the points on the plot more colorful
  • Makes the points on the plot smaller
  • Makes the points on the plot larger

The alpha aesthetic makes some points on a plot more transparent, or see-through, than others.
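
For comparison, the sketch below sets one fixed transparency for every point instead of mapping alpha to a variable. It assumes the ggplot2 and palmerpenguins packages are installed; the value 0.4 and the object name faded are arbitrary choices.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# alpha is written outside aes(), so all points get the same transparency
# (0 is fully transparent, 1 is fully opaque)
faded <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g),
             alpha = 0.4)
```

When alpha is mapped to a categorical variable such as species, as in the question, ggplot2 typically still draws the plot but warns that using alpha for a discrete variable is not advised.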

Question 6

You are working with the penguins dataset. You create a scatterplot with the following code chunk:

ggplot(data = penguins) + 
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

How do you change the second line of code to map the aesthetic size to the variable species?

  • geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, species = size))
  • geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, size = species))
  • geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, species + size))
  • geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, size + species))

You change the second line of code to geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, size = species)) to map the aesthetic size to the variable species. Inside the parentheses of the aes() function, add a comma after y = body_mass_g to add a new aesthetic attribute, then write size = species to map the aesthetic size to the variable species. The data points for each of the three penguin species will now appear in different sizes.

Question 7

Fill in the blank: The _ creates a scatterplot and then adds a small amount of random noise to each point in the plot to make the points easier to find.

  • geom_bar() function
  • geom_jitter() function
  • geom_smooth() function
  • geom_point() function

The geom_jitter() function creates a scatterplot and then adds a small amount of random noise to each point in the plot to make the points easier to find.

Question 8

You have created a plot based on data in the diamonds dataset. What code chunk can be added to your existing plot to create wrap around facets based on the variable color?

  • facet_wrap(~color)
  • facet_wrap(color)
  • facet_wrap(color~)
  • facet(~color)

The code chunk is facet_wrap(~color). Inside the parentheses of the facet_wrap() function, type a tilde symbol (~) followed by the name of the variable you want to facet.

Question 9

A data analyst uses the annotate() function to create a text label for a plot. Which attributes of the text can the analyst change by adding code to the argument of the annotate() function? Select all that apply.

  • Change the size of the text
  • Change the font style of the text
  • Change the color of the text
  • Change the text into a title for the plot

By adding code to the argument of the annotate() function, the analyst can change the font style, color, and size of the text.

Question 10

You are working with the penguins dataset. You create a scatterplot with the following lines of code:

ggplot(data = penguins) + 
    geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) + 

What code chunk do you add to the third line to save your plot as a jpeg file with “penguins” as the file name?

  • ggsave(penguins)
  • ggsave("penguins.jpeg")
  • ggsave(penguins.jpeg)
  • ggsave("jpeg.penguins")

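You add the code chunk ggsave("penguins.jpeg") to save your plot as a jpeg file with “penguins” as the file name. Inside the parentheses of the ggsave() function, type a quotation mark followed by the file name (penguins), then a period, then the type of file (jpeg), then a closing quotation mark. By default, ggsave() saves the last plot that was displayed, so it can also be run as its own statement after the plot; some newer versions of ggplot2 may report an error when ggsave() is joined to a plot with a plus sign.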

Week 5: Documentation and reports

When you’re ready to save and present your analysis, R has different options to consider. In this part of the course, you’ll explore R Markdown, a file format for making dynamic documents with R. You’ll find out how to format and export R Markdown, including how to incorporate R code chunks in your documents.

Learning Objectives

  • Demonstrate an understanding of how to export R Markdown notebooks
  • Demonstrate an understanding of how to incorporate R code chunks into R Markdown notebooks
  • Demonstrate an understanding of the basic formatting R Markdown to create structure and emphasize content
  • Describe the R Markdown notebooks and their use to document R programming code
  • Create and outline a structure for an R Markdown notebook
  • Access and use a customized R Markdown template included in an R package
  • Demonstrate an understanding of the uses of R Markdown templates

Answers to week 5 quiz questions

L2 Documentation and reports

Question 1

R Markdown allows you to create a record of the steps you took to complete your analysis directly in RStudio.

  • True
  • False

R Markdown is a file format for making dynamic documents with R. It allows you to create a record of your analysis and conclusions in a document while working in RStudio.

Question 2

Fill in the blank: Markdown is a _ for formatting plain text files.

  • file application
  • coding language
  • guide
  • syntax

Markdown is a syntax for formatting plain text files.

Question 3

A data analyst creates an interactive version of their R Markdown document to share with other users that allows them to execute code the analyst wrote. What did they create?

  • A markdown
  • An R notebook
  • A code chunk
  • An HTML report

They created an R notebook, which is an interactive R Markdown option. It lets users run code from the R Markdown document and displays charts and graphs to visualize that code. Markdown is a syntax for formatting plain-text files.

Question 4

A data analyst wants to convert their R Markdown file into another format. What are their options? Select all that apply.

  • HTML, PDF, and Word
  • Slide presentation
  • JPEG, PNG, and GIF
  • Dashboard

R Markdown files can be converted into HTML, PDF, and Word documents, slide presentations, or dashboards.

Question 5

A data analyst has finished editing their R Markdown file and wants to save it as an HTML report. What tool will they use?

  • Knit
  • Output
  • Save
  • Hashtags

The Knit button will produce a report, such as an HTML file, containing all text, code, and results from the R Markdown file.

L3 Creating R Markdown documents

Question 1

What information does a data analyst usually find in the header section of an R Markdown document? Select all that apply.

  • Title and author
  • Conclusions
  • File type
  • Date

The header section of an R Markdown document contains the title, author, date, and file type.

Question 2

While formatting their R Markdown document, a data analyst decides to make one of the headers smaller. What do they type into the document to do this?

  • Brackets
  • Parentheses
  • Hashtags
  • Backticks

Hashtags can be used to make headers smaller. The more hashtags, the smaller the text.

Question 3

Inline code can be inserted directly into a .rmd file.

  • True
  • False

Inline code can be inserted directly into a .rmd file. This allows you to refer to code directly as you explain it to readers.

Question 4

To create bullet points in their output document, a data analyst adds _ to their R Markdown document.

  • brackets
  • hashtags
  • asterisks
  • spaces

To create bullet points in their output document, a data analyst adds asterisks to their R Markdown document.

Question 5

A data analyst wants to embed a link in their R Markdown document. They write (click here!)(www.rstudio.com) but it doesn’t work. What should they write instead?

  • [click here!](www.rstudio.com)
  • <click here!>(www.rstudio.com)
  • "click here!"(www.rstudio.com)
  • click here!(www.rstudio.com)

The analyst should write [click here!](www.rstudio.com). The text to be linked should be bracketed. The parentheses around the URL itself are correct.
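
The formatting rules from this quiz can be seen together in the following sketch, which uses R to write a small R Markdown file to a temporary folder. The file name sketch.Rmd and the title, author, and date values are placeholders, and the https:// prefix on the link is an addition for this example.

```r
# Lines of a small, hypothetical R Markdown document
rmd_lines <- c(
  "---",                          # the header section starts here
  'title: "Penguin report"',
  'author: "A. Analyst"',
  'date: "2024-01-01"',
  "output: html_document",        # the file type of the finished report
  "---",                          # the header section ends here
  "",
  "# Largest header",             # more hashtags make a smaller header
  "## Smaller header",
  "### Even smaller header",
  "",
  "* First bullet point",         # asterisks create bullet points
  "* Second bullet point",
  "",
  "[click here!](https://www.rstudio.com)",  # [text](URL) creates a link
  "",
  "The cars data has `r nrow(cars)` rows."   # inline code sits in the sentence
)

rmd_path <- file.path(tempdir(), "sketch.Rmd")
writeLines(rmd_lines, rmd_path)

# Assumption: with the rmarkdown package and pandoc installed, this call
# does the same job as the Knit button in RStudio
# rmarkdown::render(rmd_path)
```

Opening the saved file in RStudio and knitting it should show how each piece of syntax appears in the finished report.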

L4 Code chunks

Question 1

A data analyst includes a section of code in their R Markdown file so they can add comments and allow stakeholders to run it. What is the term for this section of code?

  • Template
  • Markdown
  • YAML
  • Code chunk

Code added to an .rmd file is usually called a code chunk.

Question 2

Fill in the blank: A delimiter is a character that marks the beginning and end of _.

  • a data item
  • an HTML report
  • a command line
  • an .rmd file

A delimiter is a character that marks the beginning and end of a data item. It can mark a single line of code, or a whole section of code in an .rmd file.

Question 3

Data analysts put three backticks at the end of their code chunks to act as a delimiter.

  • True
  • False

Three backticks can be written directly in an .rmd file to indicate the end of a code chunk as a delimiter.

Question 4

A data analyst has to create a monthly report for their stakeholders. What can they create to help them save time generating these reports?

  • HTML report
  • .rmd file
  • Template
  • R notebook

Creating a template for your reports allows you to run one line of code to update your data without having to recreate the report from scratch. Templates can also help you customize the appearance of your final report.

Question 5

A data analyst wants to mark the beginning of their code chunk. What delimiter should they type in their .rmd file?

  • +++{r }
  • ```{r }
  • ==={r }
  • ***{r }

Three backticks followed by the letter r in braces (```{r }) indicate the beginning of a code chunk in an .rmd file.
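
To make the two delimiters concrete, this sketch uses R to build and print the lines of one code chunk. The chunk content summary(cars) is only an example, and the backticks are generated with strrep() so that they display cleanly here.

```r
fence <- strrep("`", 3)  # three backticks in a row

chunk <- c(
  paste0(fence, "{r}"),  # opening delimiter: three backticks, then r in braces
  "summary(cars)",       # the R code that runs when the chunk is executed
  fence                  # closing delimiter: three backticks
)

cat(chunk, sep = "\n")   # print the chunk as it would appear in the .rmd file
```

Everything between the opening and closing delimiters is treated as R code, and everything outside them is treated as regular text.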

Weekly challenge 5

Question 1

A data analyst wants to create a shareable report of their analysis with documentation of their process and notes explaining their code to stakeholders. What tool can they use to generate this?

  • Code chunks
  • Filters
  • Dashboards
  • R Markdown

R Markdown is a file format for making dynamic documents with R. R Markdown documents can be used to save, organize, and document code; create a record of your cleaning process; and generate reports with executable code for stakeholders.

Question 2

Fill in the blank: R Markdown notebooks can be converted into HTML, PDF, and Word documents, slide presentations, and _.

  • dashboards
  • spreadsheets
  • tables
  • YAML

R Markdown notebooks can be converted into HTML, PDF, and Word documents, slide presentations, and dashboards.

Question 3

A data analyst notices that their header is much smaller than they wanted it to be. What happened?

  • They have too few hashtags
  • They have too few asterisks
  • They have too many hashtags
  • They have too many asterisks

Hashtags can be used to change the font size of headers. The more hashtags you add, the smaller the header.

Question 4

A data analyst wants to include a line of code directly in their .rmd file in order to explain their process more clearly. What is this code called?

  • Inline code
  • YAML
  • Documented
  • Markdown

Inline code is code that can be inserted directly into a .rmd file.

Question 5

What symbol can be used to add bullet points in R Markdown?

  • Backticks
  • Asterisks
  • Brackets
  • Exclamation marks

Asterisks can be used to add bullet points to an .rmd file. Hyphens can also be used.

Question 6

A data analyst adds a section of executable code to their .rmd file so users can execute it and generate the correct output. What is this section of code called?

  • Data plot
  • YAML
  • Documentation
  • Code chunk

Code added to a .rmd file is usually referred to as a code chunk. Code chunks allow users to execute R code from within the .rmd file.

Question 7

A data analyst is inserting a line of code directly into their .rmd file. What will they use to mark the beginning and end of the code?

  • Hashtags
  • Delimiters
  • Asterisks
  • Markdown

A delimiter is a character that indicates the beginning or end of a data item. The analyst will use delimiters to mark the beginning and end of the code.

Question 8

If an analyst creates the same kind of document over and over or customizes the appearance of a final report, they can use _ to save them time.

  • a filter
  • a template
  • an .rmd file
  • a code chunk

A template can save time when creating the same kind of document over and over or when customizing the appearance of a final report.
