## Algorithms for DNA Sequencing Coursera Quiz Answers: All Weeks

### Week 1

#### Quiz 1: Module 1

Q1. Which of the following is not a suffix of CATATTAC?

- **CAT**
- TATTAC
- TAC
- C

Q2. What’s the longest prefix of CACACTGCACAC that is also a suffix?

- **CACAC**
- C
- CACACTG
- CAC

Q3. Which of the following is not a substring of GCTCAGCGGGGCA?

- **GCC**
- GCT
- GCA
- GCG
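
Q1 to Q3 can be checked mechanically. The sketch below is one way to do it; the helper names (`is_suffix`, `longest_border`) are illustrative and not taken from the course materials.

```python
def is_suffix(s, t):
    """Return True if s is a suffix of t."""
    return t.endswith(s)


def longest_border(t):
    """Return the longest proper prefix of t that is also a suffix of t."""
    for length in range(len(t) - 1, 0, -1):
        if t[:length] == t[-length:]:
            return t[:length]
    return ""


# Q1: choices that are NOT suffixes of CATATTAC
not_suffixes = [c for c in ["CAT", "TATTAC", "TAC", "C"]
                if not is_suffix(c, "CATATTAC")]
# Q2: longest prefix of CACACTGCACAC that is also a suffix
border = longest_border("CACACTGCACAC")
# Q3: choices that are NOT substrings of GCTCAGCGGGGCA
not_substrings = [c for c in ["GCC", "GCT", "GCA", "GCG"]
                  if c not in "GCTCAGCGGGGCA"]

print(not_suffixes, border, not_substrings)
```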

Q4. Starting around 2007, the cost of DNA sequencing started to decrease rapidly because more laboratories started to use:

- Sanger sequencing
- Double sequencing
- **Second-generation sequencing**
- DNA microarrays

Q5. Which of the following pieces of information is not included in a sequencing read in the FASTQ format:

- The sequence of base qualities corresponding to the bases
- A “name” for the read
- The sequence of bases that make up the read
- **Which chromosome the read originates from**

Q6. If read alignment is like “looking for a needle in a haystack,” then the “haystack” is the:

- Sequencing read
- Gene database
- **Reference genome**
- Sequencer

Q7. The Human Genome Project built the initial “draft” sequence of the human genome, starting from sequencing reads. The computational problem they had to solve was the:

- prime factorization problem
- **de novo shotgun assembly problem**
- gene finding problem
- read alignment problem

Q8. If the length of the pattern is x and the length of the text is y, the minimum possible number of character comparisons performed by the naive exact matching algorithm is:

- **y – x + 1**
- xy
- x + y
- x(y – x + 1)

Q9. If the length of the pattern is x and the length of the text is y, the maximum possible number of character comparisons performed by the naive exact matching algorithm is:

- x + y
- xy
- y – x + 1
- **x(y – x + 1)**

Q10. Say we have a function that generates a random DNA string, i.e. the kind of string we would get by rolling a 4-sided die (A/C/G/T) over and over. We use the function to generate a random pattern P of length 20 and a random text T of length 100. Now we run the naive exact matching algorithm to find matches of P within T. We expect the total number of character comparisons we perform to be closer to the…

- maximum possible
- **minimum possible**
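
The comparison counts in Q8 to Q10 can be explored with a small experiment. This is a minimal sketch, assuming a naive matcher that stops at the first mismatch in each alignment; `naive_with_counts` is a hypothetical name, and the fixed seed is only there to make the run repeatable.

```python
import random


def naive_with_counts(p, t):
    """Naive exact matching; returns (match offsets, character comparisons)."""
    occurrences = []
    comparisons = 0
    for i in range(len(t) - len(p) + 1):  # every alignment of p against t
        match = True
        for j in range(len(p)):
            comparisons += 1
            if t[i + j] != p[j]:
                match = False
                break  # stop at the first mismatch
        if match:
            occurrences.append(i)
    return occurrences, comparisons


rng = random.Random(0)  # fixed seed, an arbitrary choice
p = "".join(rng.choice("ACGT") for _ in range(20))
t = "".join(rng.choice("ACGT") for _ in range(100))
_, comparisons = naive_with_counts(p, t)

lo = len(t) - len(p) + 1              # minimum: y - x + 1
hi = len(p) * (len(t) - len(p) + 1)   # maximum: x(y - x + 1)
print(lo, comparisons, hi)
```

With random DNA, most alignments mismatch within the first character or two, so the count lands much nearer the lower bound than the upper one.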

### Week 2

#### Quiz 1: Module 2

Q1. Boyer-Moore: How many alignments are skipped by the bad character rule for this alignment?

Note: the number of skips is one less than the number of positions P shifts by. That is, if the pattern shifts by 2 positions, that’s 1 alignment skipped.

Also note: the question is asking only about the alignment shown. Do not consider any other alignments of P to T in your answer.

`T: GGCTATAATGCGTA`

`P: TAATAAA`

Answer: The bad character rule in Boyer-Moore skips one alignment, so the answer is 1.
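
The count can be reproduced with a short sketch. It assumes P is aligned at the left end of T (offset 0), since the original alignment layout is not preserved here; `bad_character_skips` is an illustrative name.

```python
def bad_character_skips(p, t, offset=0):
    """Alignments skipped by the bad character rule for p aligned at t[offset:]."""
    for j in range(len(p) - 1, -1, -1):  # compare right to left
        if p[j] != t[offset + j]:
            bad = t[offset + j]
            # rightmost occurrence of the mismatched text character left of j;
            # rfind returns -1 if absent, which shifts p fully past the mismatch
            k = p.rfind(bad, 0, j)
            shift = j - k
            return shift - 1
    return 0  # no mismatch, so the rule does not apply


print(bad_character_skips("TAATAAA", "GGCTATAATGCGTA"))
```

Here the mismatch is at the second-to-last pattern position against a `T`; the nearest `T` to the left in P is two positions away, so P shifts by 2 and one alignment is skipped.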

Q2. Boyer-Moore: How many alignments are skipped by the good suffix rule in this scenario?

`T: GGCTATAATGCGTA`

`P: TAATTAA`

Answer: The good suffix rule does not skip any alignments in this scenario. So the answer is 0.
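
A brute-force sketch of the good suffix rule gives the same result. It again assumes P is aligned at offset 0 of T, and it tries every shift rather than using the precomputed tables a real Boyer-Moore implementation would build. The `strong` flag is an added assumption to show the stricter variant of the rule; the full-match case is not handled.

```python
def good_suffix_skips(p, t, offset=0, strong=False):
    """Alignments skipped by the good suffix rule for p aligned at t[offset:]."""
    n = len(p)
    j = n - 1
    while j >= 0 and p[j] == t[offset + j]:  # scan right to left
        j -= 1
    if j < 0 or j == n - 1:
        return 0  # full match or empty matched suffix: nothing to skip here
    for shift in range(1, n):
        # after shifting, p[i - shift] lies under the matched suffix p[j+1:]
        ok = all(i - shift < 0 or p[i - shift] == p[i]
                 for i in range(j + 1, n))
        # strong rule: the character before the reused copy must differ
        if ok and strong and j - shift >= 0 and p[j - shift] == p[j]:
            ok = False
        if ok:
            return shift - 1
    return n - 1  # shifting p completely past the alignment is always valid


print(good_suffix_skips("TAATTAA", "GGCTATAATGCGTA"))
```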

Q3. Boyer-Moore, true or false: for given P and T, it’s possible that some characters from T will never be examined, i.e., won’t be involved in any character comparisons.

- False
- **True**

Q4. Consider a version of Boyer-Moore that uses only the bad character rule (no good suffix rule), and say our pattern P is a random string of 50% As and 50% Ts. In which scenario would you expect Boyer-Moore to skip the most alignments?

- The text T consists of 40% As, 40% Ts, 10% Cs and 10% Gs
- The text T consists of 25% As, 25% Ts, 25% Cs and 25% Gs
- **The text T consists of 10% As, 10% Ts, 40% Cs and 40% Gs**

Q5. The naive exact matching algorithm preprocesses:

- The text T
- **Neither**
- Both
- The pattern P

Q6. The Boyer-Moore algorithm preprocesses:

- **The pattern P**
- Neither
- The text T
- Both

Q7. In which of these scenarios is an offline matching algorithm not appropriate?

- A tool that evaluates a password by comparing it against a large database of bad (easy-to-guess) passwords
- **Your web browser’s “find” function, which allows you to find a particular word on the web page you are currently viewing**
- A tool that searches for words in an archive of every speech made in the U.S. Congress

Q8. Say we have a k-mer index containing all 5-mers from T. We query the index using the first 5-mer from P and the index returns a single index hit. What can we say about whether P occurs in T? Assume T is longer than P and that P is at least 6 bases long.

- It definitely does
- It definitely does not
- **We don’t know; not enough information**

Q9. Say we have a k-mer index containing all k-mers from T and we query it with 3 different k-mers from the pattern P. The first query returns 0 hits, the second returns 1 hit, and the third returns 3 hits. What can we say about whether P occurs in T?

- It definitely does
- **It definitely does not**
- We don’t know; not enough information
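
Q8 and Q9 both turn on the fact that an index hit only says a k-mer occurs in T; the rest of P still has to be verified. A minimal sketch, with hypothetical helper names and a made-up text:

```python
from collections import defaultdict


def build_index(t, k):
    """Map every k-mer of t to the list of offsets where it occurs."""
    index = defaultdict(list)
    for i in range(len(t) - k + 1):
        index[t[i:i + k]].append(i)
    return index


def query(p, t, index, k):
    """Look up the first k-mer of p, then verify each hit against all of p."""
    hits = index.get(p[:k], [])
    return [i for i in hits if t[i:i + len(p)] == p]


t = "GCTACGATCTAGAATCTA"  # example text, not from the quiz
index = build_index(t, 5)

print(index["TCTAG"])                # one index hit
print(query("TCTAGA", t, index, 5))  # hit verifies: P occurs
print(query("TCTAGT", t, index, 5))  # same hit, verification fails
```

Conversely, a k-mer of P with zero hits rules P out, because every k-mer of P would have to appear in T for P to occur there.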

Q10. Which of the following is not an “edit” allowed in edit distance:

- **Transposition**
- Deletion
- Substitution
- Insertion

### Week 3

#### Quiz 1: Module 3

Q1. The value in each edit-distance matrix element depends on its neighbors:

- Above, to the left, and to the right
- To the upper-left, to the left and to the lower-left
- To the left and to the lower-left
- **Above, to the left, and to the upper-left**
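
The dependency on the three neighbors is visible in the standard dynamic programming recurrence. A minimal sketch, with illustrative variable names:

```python
def edit_distance(x, y):
    """Edit distance between x and y using the full DP matrix."""
    rows, cols = len(x) + 1, len(y) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # i deletions turn x[:i] into the empty string
    for j in range(cols):
        d[0][j] = j  # j insertions turn the empty string into y[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            delta = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # cell above
                          d[i][j - 1] + 1,          # cell to the left
                          d[i - 1][j - 1] + delta)  # cell to the upper-left
    return d[-1][-1]


print(edit_distance("kitten", "sitting"))
```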

Q2. Say we have filled in the approximate matching matrix and identified the minimum value (say, 2) in the bottom row. Now we would like to know the shape of the corresponding 2-edit alignment, i.e. we would like to know where the insertions, deletions and substitutions are. We use a procedure called:

- Filling
- Binary search
- Pathing
- **Traceback**

Q3. Say the edit distance between DNA strings α and β is 407. What is the edit distance between α and βG (β concatenated with the base G)?

- **could be any of the other choices**
- 406
- 407
- 408
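
Appending one base can lower the edit distance by one, leave it unchanged, or raise it by one. The tiny examples below (chosen for illustration, not taken from the quiz) show each case, using a memoized recursive form of the same recurrence:

```python
from functools import lru_cache


def edit_distance(x, y):
    """Edit distance via memoized recursion; fine for short strings."""
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0:
            return j
        if j == 0:
            return i
        return min(d(i - 1, j) + 1,
                   d(i, j - 1) + 1,
                   d(i - 1, j - 1) + (x[i - 1] != y[j - 1]))
    return d(len(x), len(y))


# (alpha, beta) pairs; compare distance to beta with distance to beta + "G"
for alpha, beta in [("G", ""), ("G", "A"), ("A", "C")]:
    print(alpha, repr(beta),
          edit_distance(alpha, beta), edit_distance(alpha, beta + "G"))
```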

Q4. Say we are using dynamic programming to find approximate occurrences of P in T. About how many dynamic programming matrix elements do we have to fill in?

- **|P| |T|**
- |P| + |T|
- |T|^2 (squared)
- |P|^2 (squared)
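
For approximate matching, the matrix has one row per character of P and one column per character of T (plus one extra of each), so the work is about |P| |T|. A minimal sketch, assuming the usual setup where the first row is all zeros so that a match may start anywhere in T; the strings are example inputs:

```python
def approximate_match(p, t):
    """Return (fewest edits for p vs. any substring of t, cells in the matrix)."""
    rows, cols = len(p) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]  # first row stays 0
    for i in range(rows):
        d[i][0] = i
    for i in range(1, rows):
        for j in range(1, cols):
            delta = 0 if p[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + delta)
    return min(d[-1]), rows * cols  # best value is the bottom-row minimum


print(approximate_match("GCGTATGC", "TATTGGCTATACGGTT"))
```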

Q5. Local alignment is different from global alignment because:

- **It finds similarities between substrings rather than between entire strings**
- There is no dynamic programming algorithm for solving it
- It compares three strings instead of two
- Insertions and deletions incur no penalty

Q6. The first law of assembly says that if a prefix of read A is similar to a suffix of read B, then…

- **A and B might overlap in the genome**
- A and B must be from different genomes
- Read B might have a sequencing error at the end
- A and B should not be joined in the final assembly

Q7. The second law of assembly says that more coverage leads to…

- less accurate results
- **more and longer overlaps between reads**
- more sequencing errors

Q8. In an overlap graph, the nodes of the graph correspond to

- Bases
- Genomes
- Overlaps
- **Reads**
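
A small overlap graph makes the roles concrete: reads are nodes, and each suffix/prefix overlap is a directed edge. This is a sketch under simple assumptions (exact overlaps only, all pairs compared); the reads and the `min_length` value are example inputs.

```python
from itertools import permutations


def overlap(a, b, min_length=3):
    """Length of the longest suffix of a matching a prefix of b, or 0."""
    start = 0
    while True:
        start = a.find(b[:min_length], start)  # candidate suffix start in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def overlap_graph(reads, min_length=3):
    """Edges (a, b) -> overlap length for every ordered pair of reads."""
    graph = {}
    for a, b in permutations(reads, 2):
        olen = overlap(a, b, min_length)
        if olen > 0:
            graph[(a, b)] = olen
    return graph


reads = ["ACGGATGATC", "GATCAAGT", "TTCACGGA"]
print(overlap_graph(reads))
```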

Q9. The overlap graph is a useful structure because:

- It makes it faster to compare reads
- **A reconstruction of the genome corresponds to a path through the graph**
- It helps to ignore long overlaps

Q10. Which of the following is not a reason why an overlap might contain sequence differences (i.e. might not be an exact match):

- **Insufficient coverage**
- Polyploidy
- Sequencing error

### Week 4

#### Quiz 1: Module 4

Q1. The slow (sometimes called “brute force”) algorithm for finding the shortest common superstring of the strings in set S involves:

- Iteratively removing strings from S that don’t belong in the superstring
- **Trying all orderings of the strings in S**
- Concatenating the strings in S
- Finding the longest common substring of the strings in S
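
The brute-force approach can be sketched in a few lines: try every ordering, merge neighbors by their longest overlap, and keep the shortest result. The input strings are example data, and the compact `overlap` helper is one of several ways to write it.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def scs(strings):
    """Shortest common superstring by trying all orderings."""
    shortest = None
    for order in permutations(strings):
        sup = order[0]
        for prev, nxt in zip(order, order[1:]):
            sup += nxt[overlap(prev, nxt):]  # append the non-overlapping tail
        if shortest is None or len(sup) < len(shortest):
            shortest = sup
    return shortest


print(scs(["ACGGATGAGC", "GAGCGGA", "GAGCGAG"]))
```

With n input strings there are n! orderings, which is why this only works for very small inputs.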

Q2. Which of the following is not a true statement about the slow (brute force) shortest common superstring algorithm?

- It might collapse repetitive portions of the genome
- **The superstring returned might be longer than the shortest possible one**
- The amount of time it takes grows with the factorial of the number of input strings

Q3. Which of the following is not a true statement about the greedy shortest common superstring formulation of the assembly problem?

- **The amount of time it takes grows with the factorial of the number of input strings**
- It might collapse repetitive portions of the genome
- The superstring returned might be longer than the shortest possible one
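
A greedy sketch shows both properties: it runs in polynomial time, and it can return a superstring that is longer than the shortest one. The merge order and tie-breaking here are assumptions of this sketch, and the input is example data.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_scs(reads, k=1):
    """Repeatedly merge the pair with the longest overlap (at least k)."""
    reads = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen < k:
            break  # no remaining pair overlaps enough
        merged = reads[i] + reads[j][olen:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)]
        reads.append(merged)
    return "".join(reads)  # concatenate whatever could not be merged


print(greedy_scs(["ABCD", "CDBC", "BCDA"]))
```

For this input the greedy result has 9 characters, while `ABCDBCDA` (8 characters) also contains all three strings.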

Q4. True or false: an Eulerian walk is a way of moving through a graph such that each node is visited exactly once

- **False**
- True

Q5. If the genome is repetitive and we try to use the De Bruijn Graph/Eulerian Path method for assembling it, we might find that:

- **There is more than one Eulerian path**
- The genome “spelled out” along the Eulerian path is not a superstring of the reads
- The De Bruijn graph breaks into pieces

Q6. In a De Bruijn assembly graph for given k, there is one edge per

- read
- **k-mer**
- k-1-mer
- genome
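
Building the graph from a string shows the edge count directly. A minimal sketch, assuming the graph is built from every k-mer of a single example string (real assemblers build it from reads):

```python
def de_bruijn(text, k):
    """Return (nodes, edges): nodes are (k-1)-mers, one edge per k-mer."""
    nodes = set()
    edges = []
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        left, right = kmer[:-1], kmer[1:]  # the k-mer's two (k-1)-mers
        edges.append((left, right))
        nodes.update((left, right))
    return nodes, edges


nodes, edges = de_bruijn("ACGCGTCG", 3)
print(sorted(nodes))
print(edges)
```

An Eulerian walk through this graph uses each edge exactly once, while nodes such as `CG` may be visited several times.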

Q7. Which of the following does not help with the problem of assembling repetitive genomes:

- Paired-end reads
- Longer reads
- **Increasing the minimum required overlap length for the overlap graph**
