# 2023 – 2024 Google Data Analytics Professional Certificate Course 3: Prepare Data – Quiz Answers

## Week 1: Data types and structures

We all generate lots of data in our daily lives. In this part of the course, you’ll check out how we generate data and how analysts decide which data to collect for analysis. You’ll also learn about structured and unstructured data, data types, and data formats as you start thinking about how to prepare your data for exploration.

### Learning Objectives

• Discuss wide and long data formats with reference to organization and purpose
• Explain the relationship between data types, fields, and values
• Explain the difference between structured and unstructured data
• Explain factors that should be considered when making decisions about data collection
• Explain how data is generated as a part of our daily activities with reference to the types of data generated
• Discuss the difference between data and data types

## L2 Differentiate between data structures

### Question 1

Fill in the blank: The running time of a movie is an example of _ data.

• continuous
• discrete
• internal
• animated

Running times of movies are an example of continuous data, which is measured and can have almost any numeric value.

### Question 2

What are the characteristics of unstructured data? Select all that apply.

• May have an internal structure
• Is not organized
• Fits neatly into rows and columns
• Has a clearly identifiable structure

Unstructured data is not organized, although it may have an internal structure.

### Question 3

Structured data enables data to be grouped together to form relations. This makes it easier for analysts to do what with the data? Select all that apply.

• Store
• Find
• Analyze
• Search

Structured data that is grouped together to form relations enables analysts to more easily store, search, and analyze the data.

### Question 4

Which of the following is an example of unstructured data?

• Contact saved on a phone
• Rating of a local favorite restaurant
• Email message
• GPS location

An example of unstructured data is an email message. Other examples of unstructured data are video files and social media content.

## L3 Generating data

### Question 1

Which method of data-collection is most often used by scientists?

• Questionnaires
• Interviews
• Surveys
• Observations

Observation is the method of data-collection most often used by scientists.

### Question 2

Fill in the blank: Organizations such as the U.S. Centers for Disease Control and Prevention (CDC) often use data collected from other organizations. Data gathered by hospitals, then collected by the CDC, is an example of _.

• multiple-party data
• second-party data
• third-party data
• first-party data

Data gathered by hospitals, then collected by the CDC is an example of second-party data.

### Question 3

A data analyst is working for a company that’s about to launch a new product. The analyst needs to collect qualitative data from customers during the product launch. What is the quickest, most accurate, and most relevant method of data collection in this scenario?

• First-party data collection using an online survey
• Second-party data results for a similar product
• Third-party observations of customers shopping in stores
• First-party customer interviews in focus groups

In this scenario, the quickest, most accurate, and most relevant method of data collection is first-party data collection using an online survey.

## L4 Explore data types, fields, and values

### Question 1

You’re working as a data analyst and you use a formula to average data in a spreadsheet. You receive an error based on the data type. Which data types in cells may have caused the error? Select all that apply.

• Number
• String
• Duplicates
• Text

A text or string data type may have caused the error. The AVERAGE formula expects cells with a number data type. A text or string data type in a cell will cause the error.
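A minimal Python sketch can make the type mismatch concrete. The helper `average_cells` is hypothetical, and its strict type check is a simplification for illustration; spreadsheet applications such as Google Sheets or Excel have their own documented rules for how AVERAGE treats text cells.

```python
def average_cells(cells):
    """Return the mean of cell values, rejecting any non-numeric cell."""
    for value in cells:
        # bool is a subclass of int in Python, so exclude it explicitly;
        # a string such as "20" is text even though it looks like a number.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"expected a number, got {type(value).__name__}: {value!r}")
    if not cells:
        raise ValueError("cannot average an empty range")
    return sum(cells) / len(cells)


print(average_cells([10, 20, 30]))  # 20.0

try:
    average_cells([10, "20", 30])  # "20" is stored as text, not as a number
except TypeError as err:
    print(err)  # expected a number, got str: '20'
```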

### Question 2

The Boolean operator Not performs which of the following actions?

• Changes the value of “true” to “false” or “false” to “true”
• Changes the value of “true” to “null”
• Ignores any value that is “false”
• Converts the boolean values of “true” or “false” to their binary equivalents of “1” or “0”

The Boolean operator Not changes the value of “true” to “false” or “false” to “true.”

### Question 3

Fill in the blank: Internet search engines are an everyday example of how Boolean operators are used. The Boolean operator _ expands the number of results when used in a keyword search.

• And
• Not
• Or
• With

The Boolean operator Or expands the number of results when used in a keyword search.
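The way each operator changes a keyword search can be sketched in Python over a hypothetical list of page titles. This only illustrates the Boolean logic; it is not how any real search engine works, and the `matches` helper is an assumption made for the example.

```python
# Hypothetical page titles standing in for search results
pages = [
    "data analytics course",
    "data cleaning tips",
    "spreadsheet analytics guide",
    "cooking tips",
]

def matches(page, word):
    """True if the keyword appears as a whole word in the page title."""
    return word in page.split()

and_results = [p for p in pages if matches(p, "data") and matches(p, "analytics")]
or_results = [p for p in pages if matches(p, "data") or matches(p, "analytics")]
not_results = [p for p in pages if matches(p, "tips") and not matches(p, "data")]

print(and_results)  # ['data analytics course']
print(or_results)   # ['data analytics course', 'data cleaning tips', 'spreadsheet analytics guide']
print(not_results)  # ['cooking tips']

# NOT on its own simply flips a Boolean value
print(not True, not False)  # False True
```

Because OR keeps every page that matches either keyword, it always returns at least as many results as AND.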

### Question 4

Which of the following statements accurately describes a key difference between wide and long data?

• Every wide data subject has a single column that holds the values of subject attributes. Every long data subject has multiple columns.
• Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.
• Wide data subjects can have multiple rows that hold the values of subject attributes. Long data subjects can have data in multiple columns.
• Every wide data subject has multiple columns. Every long data subject has data in a single column.

Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.
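To see the two layouts side by side, here is a plain-Python sketch that reshapes hypothetical wide data (one row per country, one column per year) into long data (one row per country and year). The function name and column names are illustrative; pandas users would typically reach for `DataFrame.melt` for the same reshape.

```python
# Hypothetical wide data: one row per subject, one column per year
wide = [
    {"country": "A", "2020": 10, "2021": 12},
    {"country": "B", "2020": 7, "2021": 9},
]

def wide_to_long(rows, id_column, variable_name="year", value_name="value"):
    """Reshape wide rows into long rows of (id, variable, value)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every long row instead
            # Each attribute value gets its own row; the old column name
            # becomes the context stored in the variable column.
            long_rows.append({id_column: row[id_column], variable_name: column, value_name: value})
    return long_rows

long_data = wide_to_long(wide, "country")
for r in long_data:
    print(r)
# {'country': 'A', 'year': '2020', 'value': 10}
# {'country': 'A', 'year': '2021', 'value': 12}
# {'country': 'B', 'year': '2020', 'value': 7}
# {'country': 'B', 'year': '2021', 'value': 9}
```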

### Question 5

Data transformation enables you to do what with your data?

• Change the structure of the data
• Retrieve the data faster
• Inspect the data for accuracy
• Restore the data after it has been lost

Data transformation enables you to add, delete, or change the structure of your data.

## Weekly challenge 1

### Question 1

If you have a short time frame for data collection and need an answer immediately, you would have to use historical data.

• True
• False

True. If you have a short time frame for data collection and need an answer immediately, you would have to use historical data.

### Question 2

Which of the following is an example of continuous data?

• Box office returns
• Movie run time
• Movie budget
• Leading actors in movie

Movie run time is an example of continuous data.

### Question 3

Which of the following questions collects nominal qualitative data?

• Is this your first time dining at this restaurant?
• How many people do you usually dine with?
• On a scale of 1-10, how would you rate your service today?
• How many times have you dined at this restaurant?

“Is this your first time dining at this restaurant?” is a question that collects nominal qualitative data.

### Question 4

Which of the following is a benefit of internal data?

• Internal data is less vulnerable to biased collection.
• Internal data is more relevant to the problem.
• Internal data is more reliable and easier to collect.
• Internal data is less likely to need cleaning.

A benefit of internal data is that it’s more reliable and easier to collect than external data.

### Question 5

A social media post is an example of structured data.

• True
• False

A social media post is an example of unstructured data.

### Question 6

Fill in the blank: A Boolean data type can have _ possible values.

• two
• three
• infinite
• 10

A Boolean data type can have two possible values.

### Question 7

In long data, separate columns contain the values and the context for the values, respectively. What does each column contain in wide data?

• A specific constraint
• A unique format
• A specific data type
• A unique data variable

In wide data, each column contains a unique data variable. In long data, separate columns contain the values and the context for the values, respectively.

### Question 8

A data analyst is working in a spreadsheet application. They use Save As to change the file type from .XLS to .CSV. This is an example of a data transformation.

• True
• False

A data analyst using Save As to change a file type from .XLS to .CSV is an example of a data transformation.

## Week 2: Bias, credibility, privacy, ethics, and access

When data analysts work with data, they always check that the data is unbiased and credible. In this part of the course, you’ll learn how to identify different types of bias in data and how to ensure credibility in your data. You’ll also explore open data, as well as data ethics and data privacy, why they matter, and how they relate to each other.

### Learning Objectives

• Demonstrate an awareness of the accessibility issues associated with open data
• Demonstrate an understanding of the benefits of anonymizing data
• Explain the relationship between data ethics and data privacy
• Define data ethics and data privacy
• Explain the concept of open data with reference to the ongoing debate in data analytics
• Discuss characteristics of credible sources of data including reference to untidy data
• Identify different types of bias including confirmation, interpretation, and observer bias
• Discuss the difference between biased and unbiased data
• Explain what is involved in reviewing data to identify bias

## L2 Unbiased and objective data

### Question 1

Which of the following are examples of sampling bias? Select all that apply.

• A survey of high school students does not include homeschooled students.
• A national election poll only interviews people with college degrees.
• An online marketing analytics firm stores data in a spreadsheet.
• A clinical study includes three times more men than women.

These three scenarios are examples of sampling bias because none of the samples is representative of the population: the high school survey leaves out homeschooled students, the election poll only interviews people with college degrees, and the clinical study includes three times more men than women.

### Question 2

Two doctors look at the exact same image of a brain scan. The image is inconclusive, yet one doctor sees evidence of an abnormality in the brain. The other doctor sees a healthy brain. This is an example of sampling bias.

• True
• False

This is an example of observer bias, which is the tendency for different people to observe things differently.

## L3 Explore data credibility

### Question 1

Which of the following are usually good data sources? Select all that apply.

• Social media sites
• Governmental agency data
• Vetted public datasets
• Academic papers

Vetted public datasets, academic papers, and governmental agency data are usually good data sources.

### Question 2

To determine if a data source is cited, you should ask which of the following questions? Select all that apply.

• Who created this dataset?
• Has this dataset been properly cleaned?
• Is this dataset from a credible organization?
• Is the data relevant to the problem I’m trying to solve?

“Is this dataset from a credible organization?” and “Who created this dataset?” are questions that can help you determine if a data source is cited.

### Question 3

Which of the following are qualities of a bad data source? Select all that apply.

• The data source is out of date and irrelevant
• The data source solely relies on third-party information
• The data source is not cited or vetted
• The data source is not missing any important information

A bad data source is not cited or vetted, is out of date or irrelevant, or solely relies on third-party information.

### Question 4

A data analyst is analyzing sales data for the newest version of a product. They use third-party data about an older version of the product. For what reasons is this inappropriate for their analysis? Select all that apply.

• The data is not accurate
• The data is biased
• The data is not original
• The data is not current

Third-party data about an older version of the product is inappropriate because it is not original or current.

## L4 Understand data ethics and privacy

### Question 1

A data analyst uses fixed-length codes to represent data columns in order to remove personally identifying information from a dataset. What process does this scenario describe?

• Data collection
• Data sorting
• Data visualization
• Data anonymization

This scenario describes data anonymization, which is the process of protecting people’s private or sensitive data by eliminating identifying information.
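One way to produce fixed-length codes is a keyed hash. The Python sketch below assumes HMAC-SHA256 from the standard library, truncated to 12 hexadecimal characters; the key, field names, and code length are hypothetical. Strictly speaking this is pseudonymization: anyone holding the key can regenerate the code for a known value, so the key has to be protected, and truncated codes can, in rare cases, collide.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored securely, never in code
SECRET_KEY = b"replace-with-a-securely-stored-key"

def to_code(value, length=12):
    """Map an identifying value to a fixed-length hexadecimal code."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]  # every code has the same length

# Hypothetical records containing personally identifying information
records = [
    {"name": "Ana Lopez", "license_plate": "7ABC123", "visits": 3},
    {"name": "Sam Lee", "license_plate": "4XYZ987", "visits": 5},
]

# Replace identifying columns with codes; keep the non-identifying measure
anonymized = [
    {"person_id": to_code(r["name"]), "plate_id": to_code(r["license_plate"]), "visits": r["visits"]}
    for r in records
]
for row in anonymized:
    print(row)
```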

### Question 2

Data analysts never anonymize license plate numbers because that type of data can be easily seen whenever someone is out driving their car.

• True
• False

Data analysts often anonymize license plate numbers. Even though someone’s license plate is visible when they drive, it’s often important to keep that information private in a dataset.

### Question 3

Before completing a survey, an individual reads information about how and why the data they provide will be used. What is this concept called?

• Currency
• Openness
• Privacy
• Consent

This concept is called consent. Consent is the aspect of data ethics that presumes an individual’s right to know how and why their personal data will be used before agreeing to provide it.

## L5 Explaining open data

### Question 1

What aspect of data ethics promotes the free access, usage, and sharing of data?

• Consent
• Transaction transparency
• Privacy
• Openness

Openness is the aspect of data ethics that promotes the free access, usage, and sharing of data.

### Question 2

What are the main benefits of open data? Select all that apply.

• Open data restricts data access to certain groups of people.
• Open data increases the amount of data available for purchase.
• Open data makes good data more widely available.
• Open data combines data from different fields of knowledge.

The benefits of open data include making good data more widely available and combining data from different fields of knowledge.

### Question 3

Universal participation is a standard of open data. What are the key aspects of universal participation? Select all that apply.

• Certain groups of people must share their private data.
• Everyone must be able to use, re-use, and redistribute open data.
• No one can place restrictions on data to discriminate against a person or group.
• All corporations are allowed to sell open data.

The key aspects of universal participation are that everyone must be able to use, reuse, and redistribute open data. Also, no one can place restrictions on data to discriminate against a person or group.

## Weekly challenge 2

### Question 1

Fill in the blank: A preference in favor of or against a person, group of people, or thing is called _. It is an error in data analytics that can systematically skew results in a certain direction.

• data interoperability
• data collection
• data anonymization
• data bias

Data bias is a type of error that systematically skews results in a certain direction.

### Question 2

A university surveys its student-athletes about their experience in college sports. The survey only includes student-athletes with scholarships. What type of bias is this an example of?

• Interpretation bias
• Confirmation bias
• Sampling bias
• Observer bias

This is an example of sampling bias, which is when a sample isn’t representative of the population as a whole.

### Question 3

Which of the following are qualities of unreliable data? Select all that apply.

• Biased
• Vetted
• Inaccurate
• Incomplete

Unreliable data is inaccurate, incomplete, and biased.

### Question 4

In data ethics, consent gives an individual the right to know the answers to which of the following questions? Select all that apply.

• How will my data be used?
• Why am I being forced to share my data?
• Why is my data being collected?
• How long will my data be stored?

In data ethics, consent gives individuals the right to know why their data is being collected, how it will be used, and how long it will be stored.

### Question 5

An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This concept refers to which aspect of data ethics?

• Transaction transparency
• Ownership
• Consent
• Currency

This refers to transaction transparency, which is the idea that an individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data.

### Question 6

What is data privacy?

• Providing free access, usage, and sharing of data
• Applying well-founded standards of right and wrong that dictate how data is collected, shared, and used
• Searching for or interpreting supporting information
• Preserving a data subject’s information and activity for all data transactions

Data privacy refers to preserving a data subject’s information and activity for all data transactions.

### Question 7

Data anonymization applies to both text and images.

• True
• False

Data anonymization applies to all personally identifiable information, including text and images.

### Question 8

The government of a large city collects data on the quality of the city’s infrastructure. Any business, nonprofit organization, or citizen can access the government’s databases and re-use or redistribute the data. Is this an example of open data?

• Yes
• No

This is an example of open data. Everyone must be able to use, re-use, and redistribute open data.

## Week 3: Databases: Where data lives

When you’re analyzing data, you’ll access much of the data from a database. It’s where data lives. In this part of the course, you’ll learn all about databases, including how to access them and extract, filter, and sort the data they contain. You’ll also check out metadata to discover the different types and how analysts use them.

### Learning Objectives

• Demonstrate an understanding of how to use SQL functions to extract data from a database
• Demonstrate an understanding of how to use spreadsheet functionality to import and inspect a given set of data
• Explain the use of filters and sorting functionality in spreadsheets
• Demonstrate an understanding of the issues and steps involved in accessing data from multiple sources
• Discuss the importance of metadata and how it relates to the work of a data analyst
• Explain metadata as it relates to databases
• Describe databases with reference to their functions and components

## L2 Working with databases

### Question 1

Fill in the blank: Normalized databases help avoid _ data.

• messy
• abnormal
• inaccurate
• redundant

Normalized databases help avoid redundant data. Redundancies occur when the same piece of data is stored in more than one place in a database.

### Question 2

What does a database’s metadata tell a data analyst about its contents? Select all that apply.

• Where the data came from
• When the data was created
• What the data is all about
• Which type of analysis to perform on the data

A database’s metadata tells a data analyst when the data was created, where it came from, and what it’s all about.

### Question 3

What is the difference between a primary key and a foreign key?

• A primary key is an identifier that references a column in which each value is identical. A foreign key references a column in which each value is unique.
• A primary key is any column of data from a database. A foreign key is any column of data from a secondary database.
• A primary key is an identifier that references a column in which each value is unique. A foreign key is a field within a table that’s a primary key in the original table.
• A primary key is an identifier that references a column of relevant data within a database. A foreign key is an identifier that references a column of irrelevant data.

A primary key is an identifier that references a column in which each value is unique. A foreign key is a field within a table that is a primary key in another table, which is how the two tables are connected.
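The relationship between primary and foreign keys, and the way normalization avoids redundant data, can be sketched with SQLite through Python’s built-in `sqlite3` module. SQLite is a stand-in chosen so the example is self-contained; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each customer is stored once; customer_id is the primary key (unique per row).
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    )
""")

# orders.customer_id is a foreign key: it holds values of the customers primary key,
# so the customer's name is not repeated (redundant) in every order row.
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL NOT NULL
    )
""")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Sam")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)])

# The keys let the two tables be joined back together when needed
rows = conn.execute("""
    SELECT c.customer_name, SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
print(rows)  # [('Ana', 65.0), ('Sam', 15.0)]

try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")  # there is no customer 99
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```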

### Question 4

A data analyst at a PR firm needs to construct a database of celebrity clients. If their boss needs the data to be accessed as quickly as possible, the analyst should use a snowflake schema.

• True
• False

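False. A star schema, not a snowflake schema, prioritizes speed and simplicity, so it is the better choice when data needs to be accessed as quickly as possible.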

### Question 5

Fill in the blank: A relational database contains a series of _ that can be connected to form relationships.

• tables
• cells
• fields
• schemas

A relational database contains a series of tables that can be connected to form relationships.

## L3 Managing data with metadata

### Question 1

A large company has several data collections across its many departments. What kind of metadata indicates exactly how many collections the data lives in?

• Descriptive metadata
• Representative metadata
• Administrative metadata
• Structural metadata

Structural metadata indicates exactly how many collections the data lives in. It provides information about how a piece of data is organized and whether it’s part of one, or more than one, data collection.

### Question 2

Fill in the blank: Data _ ensures that a company’s data assets are properly managed.

• governance
• quality control
• maintenance
• organization

Data governance ensures that a company’s data assets are properly managed.

### Question 3

A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers?

• Representative
• Administrative
• Structural
• Descriptive

The ID numbers are descriptive, which means they can be used to identify the students at a later time.

### Question 4

A company needs to merge third-party data with its own data. The company can accomplish this with which of the following actions? Select all that apply.

• Use metadata to standardize the data.
• Use metadata to evaluate the third-party data’s quality and credibility.
• Replace the incoming data’s metadata with its own company metadata.
• Alter the company’s metadata to more closely reflect the incoming metadata.

The company can use metadata to standardize the data and evaluate the third-party data’s quality and credibility.

### Question 5

The date and time a database was created is an example of which kind of metadata?

• Unstructured
• Descriptive
• Administrative
• Structural

The date and time a database was created is an example of administrative metadata.

## L4 Accessing different data sources

### Question 1

A .CSV file saves data in a table format. What does .CSV stand for?

• Compatible scientific variables
• Comma-separated values
• Calculated spreadsheet values
• Cell-structured variables

.CSV stands for comma-separated values.

### Question 2

A data analyst wants to bring data from a .CSV file into a spreadsheet. This is an example of what process?

• Editing data
• Filing data
• Titling data
• Importing data

A data analyst bringing data from a .CSV file into a spreadsheet is an example of importing data.

### Question 3

A .CSV file makes it easier for data analysts to complete which tasks? Select all that apply.

• Distinguish values from one another
• Manage multiple tabs within a worksheet
• Examine a small subset of a large dataset
• Import data to a new spreadsheet

A .CSV file makes it easier for data analysts to examine a small part of a large dataset, import data to a new spreadsheet, and distinguish values from one another.
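Python’s standard `csv` module shows how the comma delimiter separates values. The CSV text below is hypothetical and is read from a string so the sketch runs without a file.

```python
import csv
import io

# Hypothetical CSV text; a real file would be opened with open(path, newline="")
csv_text = """city,year,sales
Milwaukee,2019,120
Madison,2019,95
"Green Bay, WI",2020,80
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# The comma delimiter (plus quoting) distinguishes one value from the next,
# so a quoted value that itself contains a comma stays a single field.
print(rows[2]["city"])  # Green Bay, WI

# Examine a small subset of a larger dataset: only the first two rows
for row in rows[:2]:
    print(row["city"], row["sales"])
# Milwaukee 120
# Madison 95
```

Note that every value is read as text; converting `sales` to a number is a separate step.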

## L5 Sorting and filtering

### Question 1

What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize?

• Filtering
• Reframing
• Sorting
• Prioritizing

Sorting is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize.

### Question 2

Filtering by a particular criterion is an effective way to narrow the scope of a query. However, filtering is time-intensive because it can only be done one variable at a time.

• True
• False

False. Filtering can be done by a single variable or by multiple variables, depending on the query’s criteria.

### Question 3

A data analyst is reviewing a national database of real estate sales. They are only interested in sales of condominiums. How can the analyst narrow their scope?

• Filter out condominium sales
• Sort by condominium sales
• Filter out non-condominium sales
• Sort by non-condominium sales

The analyst can narrow their scope by filtering out non-condominium sales. This will enable them to view only the data on condominium sales.

### Question 4

A data analyst works for a rental car company. They have a spreadsheet that lists car ID numbers and the dates cars were returned. How can they sort the spreadsheet to find the most recently returned cars?

• By return date, in ascending order
• By return date, in descending order
• By car numerical ID, in ascending order
• By car numerical ID, in descending order

To sort the spreadsheet to quickly find the most recently returned cars, they should sort by return date, in descending order.
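Both ideas from this lesson can be sketched in plain Python with hypothetical records: a filter that keeps only condominium sales, and a sort by return date in descending order so the most recent returns come first.

```python
from datetime import date

# Hypothetical real estate sales; keep condominiums by filtering out everything else
sales = [
    {"sale_id": 1, "property_type": "condominium"},
    {"sale_id": 2, "property_type": "single-family"},
    {"sale_id": 3, "property_type": "condominium"},
]
condo_sales = [s for s in sales if s["property_type"] == "condominium"]
print([s["sale_id"] for s in condo_sales])  # [1, 3]

# Hypothetical rental car returns
returns = [
    {"car_id": 101, "returned": date(2024, 3, 2)},
    {"car_id": 102, "returned": date(2024, 3, 9)},
    {"car_id": 103, "returned": date(2024, 2, 27)},
]

# Descending order puts the latest (most recent) return dates first
most_recent_first = sorted(returns, key=lambda r: r["returned"], reverse=True)
print([r["car_id"] for r in most_recent_first])  # [102, 101, 103]
```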

### Question 5

Fill in the blank: To keep a header row at the top of a spreadsheet, highlight the row and select _ from the View menu.

• Set
• Freeze
• Pin
• Lock

To keep a header row at the top of a spreadsheet, highlight the row and select Freeze from the View menu.

## L6 Working with large datasets in SQL

### Question 1

In MySQL, what is a proper way to write a SELECT clause starter?

• ‘SELECT’
• “SELECT”
• select
• SELECT

In MySQL, a proper way to write a SELECT clause starter is either SELECT or select.
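SQL keywords are not case-sensitive in MySQL, so `SELECT` and `select` behave the same. Because MySQL is not bundled with Python, the sketch below uses SQLite through the built-in `sqlite3` module as a stand-in, where keywords are also case-insensitive; the table is hypothetical. Quoting the keyword means it is no longer read as a keyword, so the statement fails to parse, which is why the quoted options are wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, run_time_minutes REAL)")
conn.execute("INSERT INTO movies VALUES ('Example Film', 121.5)")

upper = conn.execute("SELECT title FROM movies").fetchall()
lower = conn.execute("select title from movies").fetchall()
print(upper == lower)  # True: keywords are not case-sensitive

# A quoted 'SELECT' is a string literal, not a keyword, so this is a syntax error
try:
    conn.execute("'SELECT' title FROM movies")
except sqlite3.OperationalError as err:
    print("error:", err)
```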

### Question 2

Which case should be used when writing the column names in a database table?

• Camel case
• Sentence case
• Lowercase
• Snake case

Column names should be written in snake case, which separates each word with an underscore to make it more readable. Column names should never contain spaces.
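A small helper can convert human-readable column labels into snake case. The function `to_snake_case` is hypothetical and uses Python’s `re` module; it handles spaces, hyphens, and camelCase, but it is a sketch rather than a complete naming tool.

```python
import re

def to_snake_case(name):
    """Convert a column label such as 'Return Date' or 'carID' to snake_case."""
    # Insert an underscore between a lowercase letter or digit and an uppercase letter (camelCase)
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Replace each run of spaces, hyphens, or other non-alphanumeric characters with one underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()

for label in ["Return Date", "carID", "total-sales amount", "Movie Run Time"]:
    print(to_snake_case(label))
# return_date
# car_id
# total_sales_amount
# movie_run_time
```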

## Weekly challenge 3

### Question 1

Primary and foreign keys are two connected identifiers within separate tables. These tables exist in what kind of database?

• Primary
• Relational
• Normalized
• Metadata

Primary and foreign keys are two connected identifiers within separate tables in a relational database.

### Question 2

Metadata is data about data. What kinds of information can metadata offer about a particular dataset? Select all that apply.

• How to combine the data with another dataset
• Which analyses to perform on the data
• If the data is clean and reliable
• What kinds of data it contains

Metadata helps data analysts identify what kinds of data a dataset contains, whether it is clean and reliable, and how it can be combined with another dataset.

### Question 3

Think about data as a student at a high school. In this metaphor, which of the following are examples of metadata? Select all that apply.

• Classes the student is enrolled in
• Student’s ID number
• Grades the student earns
• Student’s enrollment date

The student’s ID number, enrollment date, and the classes the student is enrolled in are all examples of metadata.

### Question 4

Think about data as a refrigerator. Which kind of metadata is the refrigerator’s product number?

• Redundant
• Administrative
• Structural
• Descriptive

The refrigerator’s product number is descriptive metadata because it is information that can help identify the refrigerator at a later date.

### Question 5

What is the process that data analysts use to ensure the formal management of their company’s data assets?

• Data integrity
• Data governance
• Data mapping
• Data aggregation

Data governance is the process of ensuring the formal management of a company’s data assets.

### Question 6

Describe the key differences between a star and a snowflake schema. Select all that apply.

• A star schema enables very fast data processing.
• A snowflake schema enables very fast data processing.
• A snowflake schema has one or more fact tables referencing any number of dimension tables. A star schema is an extension of a snowflake schema, with more dimensions and subdimensions.
• A star schema has one or more fact tables referencing any number of dimension tables. A snowflake schema is an extension of a star schema, with more dimensions and subdimensions.

A star schema has one or more fact tables referencing any number of dimension tables. A snowflake schema is an extension of a star schema, with more dimensions and subdimensions. A star schema also enables very fast data processing.
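A star schema can be sketched in SQLite (via Python’s `sqlite3`) with one fact table that references two dimension tables. All table and column names are hypothetical, and the example shows the structure only; it does not measure processing speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables: descriptive attributes, one row per product or store.
# In a snowflake schema, category would move into its own subdimension table
# referenced by dim_product, adding another join.
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT)")
conn.execute("CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT)")

# Fact table: measurable events, each row referencing the dimensions by key
conn.execute("""
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id INTEGER REFERENCES dim_store(store_id),
        amount REAL
    )
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "Milwaukee"), (2, "Madison")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 900.0), (2, 2, 1, 250.0), (3, 1, 2, 950.0)])

# Every dimension is one join away from the fact table, which keeps queries simple
totals = conn.execute("""
    SELECT p.category, s.city, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    JOIN dim_store AS s ON f.store_id = s.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()
print(totals)
# [('Electronics', 'Madison', 950.0), ('Electronics', 'Milwaukee', 900.0), ('Furniture', 'Milwaukee', 250.0)]
```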

### Question 7

What are some key benefits of using external data? Select all that apply.

• External data is always reliable.
• External data is free to use.
• External data has broad reach.
• External data provides industry-level perspectives.

Some key benefits of using external data are that it has a broad reach and it provides industry-level perspectives.

### Question 8

A data analyst reviews a database of Wisconsin car sales to find the last five car models sold in Milwaukee in 2019. How can they sort and filter the data to return the last five cars at the top? Select all that apply.

• Filter out sales outside of Milwaukee
• Filter out sales not in 2019
• Sort by date in ascending order
• Sort by date in descending order

The analyst can filter out sales outside of Milwaukee and sales not in 2019, then sort by date in descending order so the most recent sales appear at the top.
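The same filter-then-sort logic can be written as a SQL query. The sketch runs on SQLite through Python’s `sqlite3` module with a hypothetical `car_sales` table and made-up model names; dates are stored as ISO `YYYY-MM-DD` text, an assumption that makes text comparison and sorting match chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_sales (model TEXT, city TEXT, sale_date TEXT)")  # dates as 'YYYY-MM-DD'
conn.executemany("INSERT INTO car_sales VALUES (?, ?, ?)", [
    ("Sedan A", "Milwaukee", "2019-01-15"),
    ("Truck B", "Madison", "2019-12-30"),
    ("SUV C", "Milwaukee", "2019-06-02"),
    ("Coupe D", "Milwaukee", "2020-01-04"),
    ("Van E", "Milwaukee", "2019-11-20"),
    ("Hatch F", "Milwaukee", "2019-09-09"),
    ("Wagon G", "Milwaukee", "2019-03-18"),
    ("Sedan H", "Milwaukee", "2019-12-01"),
])

last_five = conn.execute("""
    SELECT model, sale_date
    FROM car_sales
    WHERE city = 'Milwaukee'                               -- filter out sales outside Milwaukee
      AND sale_date BETWEEN '2019-01-01' AND '2019-12-31'  -- filter out sales not in 2019
    ORDER BY sale_date DESC                                 -- most recent sales first
    LIMIT 5
""").fetchall()
for model, sale_date in last_five:
    print(model, sale_date)
# Sedan H 2019-12-01
# Van E 2019-11-20
# Hatch F 2019-09-09
# SUV C 2019-06-02
# Wagon G 2019-03-18
```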

## Week 4: Organizing and protecting your data

Good organization skills are a big part of most types of work, and data analytics is no different. In this part of the course, you’ll learn the best practices for organizing data and keeping it secure. You’ll also learn how analysts use file naming conventions to help them keep their work organized.

### Learning Objectives

• Explain steps that can be taken to secure data
• Discuss the use of file-naming conventions by data analysts
• Describe best practices for organizing data

## L2 Effectively organize data

### Question 1

Data analysts use guidelines to describe a file’s version, content, and date created. What are these guidelines called?

• Naming references
• Naming verifications
• Naming attributes
• Naming conventions

Naming conventions are guidelines that describe the content, date, or version of a file.

### Question 2

Data analysts use foldering to achieve what goals? Select all that apply.

• To transfer files from one place to another
• To organize folders into subfolders
• To assign metadata about the folders
• To keep project-related files together

Data analysts use foldering to keep project-related files together and organize them into subfolders.

### Question 3

Fill in the blank: To separate current from past work and reduce clutter, data analysts create _. This involves moving files from completed projects to a separate location.

• structures
• archives
• copies
• backups

To separate current from past work and reduce clutter, data analysts create archives.
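Archiving can be as simple as moving a finished project folder into a separate archive folder. The Python sketch below uses `shutil.move` and `pathlib`; the folder layout and the `archive_project` helper are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

def archive_project(project_dir, archive_root):
    """Move a completed project folder into an archive folder and return its new path."""
    project_dir = Path(project_dir)
    archive_root = Path(archive_root)
    archive_root.mkdir(parents=True, exist_ok=True)
    destination = archive_root / project_dir.name
    if destination.exists():
        raise FileExistsError(f"{destination} is already archived")
    # shutil.move keeps the folder's contents; only the folder's location changes
    return Path(shutil.move(str(project_dir), str(destination)))

# Demo in a throwaway directory with a hypothetical layout
with tempfile.TemporaryDirectory() as workspace:
    root = Path(workspace)
    project = root / "projects" / "AirportCampaign_2019"
    project.mkdir(parents=True)
    (project / "AirportCampaign_53019_V01.csv").write_text("terminal,visitors\n3,120\n")

    archived = archive_project(project, root / "archive")
    print(project.exists())                                       # False
    print((archived / "AirportCampaign_53019_V01.csv").exists())  # True
```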

### Question 4

What is the process of structuring folders broadly at the top, then breaking down those folders into more specific topics?

• Developing metadata
• Creating a hierarchy
• Assigning naming conventions
• Producing a backup

The process of structuring folders broadly at the top, then breaking down those folders into more specific topics, is creating a hierarchy.

### Question 5

Successful file naming conventions include information that’s useful when trying to locate or update a file. Which of the following are effective file names? Select all that apply.

• AirportCampaign_53019_V01
• May2019_AirportCampaignData_V03
• Data_519
• May30-2019_AirportAdvertisingCampaignResults_Terminals3-5_InclCustSurveyResponses_PLUS_IdeasforJune

AirportCampaign_53019_V01 and May2019_AirportCampaignData_V03 are effective file names because they are an appropriate length and reference the project name, creation date, and version.
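A naming convention can also be checked automatically. The sketch below assumes a hypothetical convention of `ProjectName_YYYYMMDD_Vnn` plus a hypothetical length limit; the course examples use other date formats, and the YYYYMMDD form is chosen here only because it sorts chronologically.

```python
import re
from datetime import date

# Hypothetical convention for this sketch: ProjectName_YYYYMMDD_Vnn
NAME_PATTERN = re.compile(r"[A-Za-z0-9]+_\d{8}_V\d{2}")
MAX_LENGTH = 40  # hypothetical limit to keep names short enough to scan

def build_file_name(project, created, version):
    """Build a file name from a project name, creation date, and version number."""
    return f"{project}_{created:%Y%m%d}_V{version:02d}"

def follows_convention(name):
    """Check a file name (without extension) against the hypothetical convention."""
    return len(name) <= MAX_LENGTH and NAME_PATTERN.fullmatch(name) is not None

name = build_file_name("AirportCampaign", date(2019, 5, 30), 1)
print(name)                            # AirportCampaign_20190530_V01
print(follows_convention(name))        # True
print(follows_convention("Data_519"))  # False
```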

## L3 Securing data

### Question 1

Fill in the blank: Data security involves using _ to protect data from unauthorized access or corruption.

• metadata
• data validation
• foldering
• safety measures

Data security involves using safety measures to protect data from unauthorized access or corruption.

### Question 2

When using data security measures, analysts can choose between protecting an entire spreadsheet or protecting certain cells within the spreadsheet.

• True
• False

When using data security measures, analysts can choose between protecting an entire spreadsheet or protecting certain cells within the spreadsheet. Data security can be used to protect an entire spreadsheet, specific parts of a spreadsheet, or even just a single cell.

### Question 3

What tools can data analysts use to control who can access or edit a spreadsheet? Select all that apply.

• Filters
• Sharing permissions
• Tabs
• Encryption

Data analysts use encryption and sharing permissions to control who can access or edit a spreadsheet.

## Weekly challenge 4

### Question 1

Fill in the blank: Naming conventions are _ that describe a file’s content, creation date, or version.

• frequent suggestions
• common verifications
• general attributes
• consistent guidelines

Naming conventions are consistent guidelines that describe a file’s content, creation date, or version.

### Question 2

A data analytics team uses data about data to indicate consistent naming conventions for a project. What type of data is involved in this scenario?

• Metadata
• Long data
• Aggregated data
• Big data

Metadata is data about data. Metadata practices can help analytics teams create consistent naming conventions and storage practices for their files.

### Question 3

A data analyst creates a file that lists people who donated to their organization’s fund drive. An effective name for the file is: FundDriveDonors_Feb2022_V3.

• True
• False

FundDriveDonors_Feb2022_V3 is an effective file name because it is an appropriate length and references the project name, creation date, and version.

### Question 4

Foldering may be used by data analysts to organize folders into what?

• Databases
• Subfolders
• Versions
• Tables

Foldering may be used by data analysts to organize folders into subfolders.

### Question 5

Data analysts use archiving to separate current from past work. What does this process involve?

• Reviewing current data files to confirm they’ve been cleaned
• Moving files from completed projects to another location
• Reorganizing and renaming current files
• Using secure data-erase software to destroy old files

Archiving involves moving files from completed projects to a separate location.

### Question 6

Fill in the blank: Data analysts create _ to structure their folders.

• hierarchies
• ladders
• sequences
• scales

Data analysts create hierarchies to structure their folders.

### Question 7

A data analyst wants to ensure only people on their analytics team can access, edit, and download a spreadsheet. They can use which of the following tools? Select all that apply.

• Sharing permissions
• Encryption
• Templates
• Filtering

To control who can access or edit a spreadsheet, data analysts use encryption and sharing permissions.

### Question 8

To reduce clutter, a data analyst hides cells that contain long, complex formulas. To view the formulas again, the analyst will need to adjust the spreadsheet sharing or encryption settings.

• True
• False

Hidden cells can be easily unhidden using the unhide feature. Hiding does not protect data.

## Week 5: Optional: Engaging in the data community

Having a strong online presence can be a big help for job seekers of all kinds. In this part of the course, you’ll explore how to manage your online presence. You’ll also discover the benefits of networking with other data analytics professionals.

### Learning Objectives

• Explain the importance of networking with other data analysts including reference to mentorship and communication
• Apply best practices to manage a professional online presence
• Describe approaches to build an online presence as a data analyst

## Course challenge

Prepare for the course challenge by reviewing terms and definitions in the glossary. Then, demonstrate your knowledge of data collection, ethics and privacy, and bias during the quiz. You will also have an opportunity to apply your skill with spreadsheet and SQL functions, as well as filtering and sorting. Finally, secure and organize data with data analytics best practices.

### Learning Objectives

• Explain factors that should be considered when making decisions about data collection
• Explain the relationship between data ethics and data privacy
• Discuss characteristics of credible sources of data including reference to untidy data
• Identify different types of bias including confirmation, interpretation, and observer bias
• Demonstrate an understanding of how to use SQL functions to extract data from a database
• Demonstrate an understanding of how to use spreadsheet functionality to import and inspect a given set of data
• Describe best practices for organizing data

## Scenario 1, questions 1-5

### Question 1

You’ve been working at a data analytics consulting company for the past six months. Your team helps restaurants use their data to better understand customer preferences and identify opportunities to become more profitable.

To do this, your team analyzes customer feedback to improve restaurant performance. You use data to help restaurants make better staffing decisions and drive customer loyalty. Your analysis can even track the number of times a customer requests a new dish or ingredient in order to revise restaurant menus.

Currently, you’re working with a vegetarian sandwich restaurant called Garden. The owner wants to make food deliveries more efficient and profitable. To accomplish this goal, your team will use delivery data to better understand when orders leave Garden, when they get to the customer, and overall customer satisfaction with the orders.

Before project kickoff, you attend a discovery session with the vice president of customer experience at Garden. He shares information to help your team better understand the business and project objectives. As a follow-up, he sends you an email with datasets.

Click below to read the email:
C3 Scenario 1_Client Email.pdf

And click below to access the datasets:

Course 3 Final Challenge Data Sets – Customer survey data (1).csv

Course 3 Final Challenge Data Sets – Delivery times_distance (1).csv

Reviewing the data enables you to describe how you will use it to achieve your client’s goals. First, you notice that all of the data is first-party data. What does this mean?

• It’s subjective data that measures qualities and characteristics.
• It’s data that was collected by Garden employees using their own resources.
• It’s a type of data that’s categorized without a set order.
• It’s data that was collected from outside sources.

First-party data is data collected by an individual or group using their own resources.

### Question 2

Next, you review the customer satisfaction survey data:

CustomerSurveyData – Customer survey data.csv

The question in column E asks, “Was your order accurate? Please respond yes or no.” What kind of data is this?

• Clean data
• Ordinal data
• Second-party data
• Boolean data

This is Boolean data, which has only two possible values, such as yes or no.
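
Because yes/no answers have only two possible values, they map directly onto True/False. This Python sketch uses a small inline stand-in for the survey file; the column name OrderAccurate is an assumption and may not match the real header in column E.

```python
import csv
import io

# Stand-in for the survey file; the header "OrderAccurate" is an assumption,
# not necessarily the real name of column E.
SAMPLE = """CustomerID,OrderAccurate
101,Yes
102,no
103, YES
"""


def to_bool(answer: str) -> bool:
    """Map a yes/no survey answer to True/False, rejecting anything else."""
    normalized = answer.strip().lower()
    if normalized == "yes":
        return True
    if normalized == "no":
        return False
    raise ValueError(f"Not a yes/no answer: {answer!r}")


rows = csv.DictReader(io.StringIO(SAMPLE))
accurate = [to_bool(row["OrderAccurate"]) for row in rows]
print(accurate)  # [True, False, True]
print(f"{sum(accurate)} of {len(accurate)} orders reported accurate")
```

Raising an error on unexpected answers keeps the column strictly two-valued instead of silently treating typos as "no".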

### Question 3

Now, you review the data on delivery times and the distance of customers from the restaurant:

DeliveryTimes_DistanceData – Delivery times_distance.csv

The data in column E shows the duration of each delivery. What type of data is this? Select all that apply.

• Quantitative data
• Qualitative data
• Discrete data
• Continuous data

This is an example of discrete data, which is counted and has a limited number of values. It is also quantitative data, which is specific and measures numerical facts.

### Question 4

The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. This is an example of structured data.

• True
• False

This is an example of unstructured data, which is not organized in an easily identifiable manner.

### Question 5

Now that you’re familiar with the data, you want to build trust with the team at Garden.

What actions should you take when working with their data? Select all that apply.

• Keep the data safe by implementing data-security measures, such as password protection and user permissions.
• Organize the data using effective naming conventions.
• Share the client’s data with other delivery restaurants to compare performance.
• Post on social media that you’re working with Garden and would like feedback from any of your contacts who have ordered there before.

You can build trust by showing a client that you will organize their data effectively and keep it safe by implementing appropriate data-security measures.

## Scenario 2, questions 6-10

### Question 6

You’ve completed this program and are interviewing for a junior data scientist position at a company called Sewati Financial Services.

Click below to review the job description:

C3 Course Challenge Junior Data Scientist Job Description .pdf

So far, you’ve successfully completed the first interview with a recruiter. They arrange your second interview with the team at Sewati Financial Services.

Click below to read the email from the human resources director:

Course 3 Scenario 2_Second Interview Email.pdf

You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Kai Harvey, the senior manager of strategy. After welcoming you, he begins the behavioral interview.

Consider and respond to the following question. Select all that apply.

Our data analytics team often surveys clients to get their feedback. If you were on the team, how would you ensure the results do not favor a particular person, group of people, or thing?

• Instruct participants to share their name and contact information.
• Ensure the survey sample represents the population as a whole.
• Make sure the wording of the survey question does not encourage a specific response from participants.
• Give participants enough time to answer each survey question.

Neutral question wording, enough time to answer each question, and a sample that represents the population as a whole all help keep survey results unbiased.
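
One way to check whether a survey sample represents the population is to compare each group's share of the sample with its share of the population. The Python sketch below does that with made-up client segments; the segment names and counts are hypothetical.

```python
from collections import Counter


def share_gaps(population, sample):
    """For each group, sample share minus population share (positive = over-represented)."""
    pop_counts, samp_counts = Counter(population), Counter(sample)
    return {
        group: samp_counts[group] / len(sample) - pop_counts[group] / len(population)
        for group in pop_counts
    }


# Hypothetical client segments
population = ["retail"] * 60 + ["business"] * 30 + ["nonprofit"] * 10
sample = ["retail"] * 18 + ["business"] * 2

for group, gap in share_gaps(population, sample).items():
    print(f"{group:<10} {gap:+.0%}")
# retail     +30%
# business   -20%
# nonprofit  -10%
```

Large gaps, like the missing nonprofit clients here, suggest the results could favor one group over another.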

### Question 7

Consider and respond to the following question. Select all that apply.

Our data analytics team often uses both internal and external data. Describe the difference between the two.

• Internal data lives within a company’s own systems. External data lives outside the organization.
• External data is typically generated from within the company. Internal data is generated outside the organization.
• Internal data is typically generated from within the company. External data is generated outside the organization.
• External data lives within a company’s own systems. Internal data lives outside the organization.

Internal data lives within a company’s own systems and is typically generated from within the company. External data lives in and is generated outside the organization.

### Question 8

Consider and respond to the following question. Select all that apply.

Our analysts often work with the same spreadsheet, but for different purposes. How would you use filtering to help in this situation?

• Use filters to highlight the header row.
• Use filters to simplify a spreadsheet by showing you only the information you need.
• Use filters to sort the data in a meaningful order.
• Use filters to show only the data that meets specific criteria while hiding the rest.

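Filters show only the data that meets specific criteria while hiding the rest. This simplifies the spreadsheet and enables data analysts on the same team to use the same dataset for different purposes.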
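
Outside a spreadsheet, the same idea is a filter that returns matching rows without changing the underlying data. In this Python sketch the client rows and the helper apply_filter are hypothetical; two analysts get two different views of one shared dataset.

```python
# Hypothetical rows standing in for a shared spreadsheet
clients = [
    {"name": "Ana", "region": "North", "balance": 1200},
    {"name": "Ben", "region": "South", "balance": 300},
    {"name": "Chen", "region": "North", "balance": 450},
]


def apply_filter(rows, **criteria):
    """Return only rows whose fields equal every given criterion; other rows are hidden, not deleted."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]


# Analyst 1 only needs the North region
north = apply_filter(clients, region="North")
print([row["name"] for row in north])  # ['Ana', 'Chen']

# Analyst 2 only needs high balances
high_balance = [row for row in clients if row["balance"] > 1000]
print([row["name"] for row in high_balance])  # ['Ana']

print(len(clients))  # 3: the shared data is unchanged
```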

### Question 9

Next, your interviewer wants to better understand your knowledge of basic SQL commands. He asks: How would you write a query that retrieves only data about people with the last name Hassan from the Clients table in our database?

• SELECT DATA FROM Clients WHERE 'Hassan'
• SELECT Clients WHERE Last_Name = 'Hassan' FROM *
• SELECT * FROM Clients WHERE Last_Name = 'Hassan'
• SELECT All WHERE Last_Name 'Hassan' FROM Clients

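To write a query that retrieves only data about people with the last name Hassan from the Clients table, type SELECT * FROM Clients WHERE Last_Name = 'Hassan'. SELECT * returns every column, FROM names the table, and WHERE keeps only the rows that match the condition. Use straight single quotes around the text value; curly quotes cause a syntax error.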
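
To try the query without access to the real database, you can run it against a small in-memory SQLite table. In this Python sketch, only the Clients table name and the Last_Name column come from the question; the other columns and the rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical Clients table;
# only the table name and Last_Name come from the interview question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (Client_ID INTEGER, First_Name TEXT, Last_Name TEXT)")
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?, ?)",
    [(1, "Amina", "Hassan"), (2, "Liam", "Ortiz"), (3, "Omar", "Hassan")],
)

# SELECT * returns every column; WHERE keeps only rows whose Last_Name is 'Hassan'.
rows = conn.execute("SELECT * FROM Clients WHERE Last_Name = 'Hassan'").fetchall()
print(rows)  # [(1, 'Amina', 'Hassan'), (3, 'Omar', 'Hassan')]
conn.close()
```

In application code, it is safer to pass the name as a parameter (WHERE Last_Name = ?) than to paste it into the query string.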

### Question 10

For your final question, your interviewer explains that Sewati Financial Services cares about its clients’ trust, and this is an important responsibility for the data analytics team. They do this by:

• protecting clients from unauthorized access to their private data
• ensuring freedom from inappropriate use of client data
• giving consent to use someone’s data

He asks: Which data analytics practice does this describe?

• Encryption
• Data privacy
• Sharing permissions
• Bias

This describes data privacy, which involves protecting individuals' private data from unauthorized access and inappropriate use.