Command Line Tools for Genomic Data Science Quiz Answers

Command Line Tools for Genomic Data Science Week 01 Quiz Answers

Quiz 1: Module 1 Quiz

Q1. Which of the following Unix commands can be used to view the content of a file?

View
less

Q2. Which of the following commands can be used to compress the content of a file?

View
gzip

Q3. The file “months” lists each of the 12 months on a separate line and no further lines. What would be the result if the following command was run:

cat months | head -1000 | wc -l
View
12
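
The result can be reproduced with a scratch copy of the file. The sketch below assumes a file named months in a temporary directory (the directory and the exact month spellings are illustrative). Because the file has fewer than 1000 lines, head -1000 passes all of them through and wc -l counts 12.

```shell
# Build a 12-line "months" file in a scratch directory (contents are illustrative).
workdir=$(mktemp -d)
printf '%s\n' January February March April May June \
  July August September October November December > "$workdir/months"

# head -1000 keeps at most 1000 lines, so all 12 lines reach wc -l.
line_count=$(cat "$workdir/months" | head -1000 | wc -l)
echo "$line_count"
rm -r "$workdir"
```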

Q4. What is the effect of using the pipe operator ‘|’ in a sequence of commands:

View
Feed the output of the command on its left as input to the command on its right

Q5. If typing ‘pwd’ produces “/home/userA/Coursera/L1/”, which of the following commands will list the file content of the current directory?

View
ls .

Q6. Suppose your current working directory is “/home/Coursera/L1/”, and “peach”, “apple” and “pear” are subdirectories, each containing a single file named “genome”. What would be the current directory, as reported by running the ‘pwd’ command, after each of the four commands in the sequence below:

cd apple
rm *
cd ../..
mv apple pl

View
/home/Coursera/L1/apple, /home/Coursera/L1/apple, /home/Coursera, /home/Coursera

Q7. Consider the file “seasons” with the following columns separated by spaces ‘ ‘:

January 1 winter
…
December 12 winter

What would be the sequence of outputs for the following commands:
"cut -d ' ' -f1,3 seasons | sort -u | wc -l" and "cut -f1 seasons | sort | uniq -c | wc -l"?

View
12, 12
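
Run exactly as quoted, both pipelines count 12 lines on a space-separated file. The sketch below assumes a hypothetical seasons file; only the first and last rows come from the question, and the season assigned to each remaining month is an assumption. The first command keeps month and season, which are unique per month. The second has no -d option, so cut splits on TAB and passes each space-separated line through unchanged.

```shell
# Build a hypothetical space-separated "seasons" file: month, number, season.
workdir=$(mktemp -d)
cat > "$workdir/seasons" <<'EOF'
January 1 winter
February 2 winter
March 3 spring
April 4 spring
May 5 spring
June 6 summer
July 7 summer
August 8 summer
September 9 fall
October 10 fall
November 11 fall
December 12 winter
EOF

# Fields 1 and 3 (month, season): every month is distinct, so every pair is unique.
first=$(cut -d ' ' -f1,3 "$workdir/seasons" | sort -u | wc -l)

# No -d here, so cut splits on TAB; lines without a TAB are printed whole.
second=$(cut -f1 "$workdir/seasons" | sort | uniq -c | wc -l)

echo "$first $second"
rm -r "$workdir"
```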

Q8. Your current working directory is named “Plants”. Its subdirectory “apple” contains the files “apple.genome”, “apple.samples” and “apple.genes”. What would be the result of the command rmdir apple?

View
The command will have no effect, since the directory is not empty
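
A minimal sketch of this behaviour, assuming the layout from the question is recreated under a temporary directory (the scratch path is illustrative): rmdir refuses to delete a non-empty directory, reports an error, and leaves the files in place.

```shell
# Recreate the layout from the question in a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/Plants/apple"
touch "$workdir/Plants/apple/apple.genome" \
      "$workdir/Plants/apple/apple.samples" \
      "$workdir/Plants/apple/apple.genes"

# rmdir only removes empty directories; here it exits with a non-zero status.
if rmdir "$workdir/Plants/apple" 2>/dev/null; then
  rmdir_status="removed"
else
  rmdir_status="refused"
fi

# The directory and its three files are still there.
remaining=$(ls "$workdir/Plants/apple" | wc -l)
echo "$rmdir_status, files left: $remaining"
rm -r "$workdir"
```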

Q9. Suppose that you have two files, A and B, containing experiment data:

File A: File B:

geneA + geneB +
geneB + geneC +
geneC –

What would be the sequence of outputs for the commands:

(1) comm -3 A B | wc -l
(2) comm -1 -3 A B | wc -l
(3) comm -2 A B | wc -l

View
1,2,4

Q10. The current working directory contains four subdirectories named “apple”, “pear”, “peach” and “strawberry”, each with the following files: “genome”, “genes” and “samples”. Which of the following commands would extract the top line from all of the “genes” files?

View
head -1 */genes
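
As a rough illustration, the sketch below builds the four subdirectories in a temporary directory with made-up file contents and runs the command. The glob */genes expands to one path per subdirectory, and head prints the first line of each, preceded by a "==> name <==" banner when given several files.

```shell
# Create four subdirectories, each with genome, genes and samples (contents are made up).
workdir=$(mktemp -d)
for fruit in apple pear peach strawberry; do
  mkdir "$workdir/$fruit"
  for name in genome genes samples; do
    printf '%s %s line 1\n%s %s line 2\n' "$fruit" "$name" "$fruit" "$name" \
      > "$workdir/$fruit/$name"
  done
done

# */genes matches the genes file in every subdirectory of the current directory.
first_lines=$( (cd "$workdir" && head -1 */genes) )
echo "$first_lines"
rm -r "$workdir"
```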

Quiz 2: Module 1 Exam

Q1. How many chromosomes are there in the genome?

View
3

Q2. How many genes?

View
5453

Q3. How many transcript variants?

View
5456

Q4. How many genes have a single splice variant?

View
5450

Q5. How many genes have 2 or more splice variants?

View
3

Q6. How many genes are there on the ‘+’ strand?

View
2662

Q7. How many genes are there on the ‘-’ strand?

View
2791

Q8. How many genes are there on chromosome chr1?

View
1624

Q9. How many genes are there on chromosome chr2?

View
2058

Q10. How many genes are there on chromosome chr3?

View
1771

Q11. How many transcripts are there on chr1?

View
1625

Q12. How many transcripts are there on chr2?

View
2059

Q13. How many transcripts are there on chr3?

View
1772

Q14. How many genes are in common between condition A and condition B?

View
2410

Q15. How many genes are specific to condition A?

View
1205

Q16. How many genes are specific to condition B?

View
1243

Q17. How many genes are in common to all three conditions?

View
1608


Command Line Tools for Genomic Data Science Week 02 Quiz Answers

Quiz 1: Module 2 Quiz

Q1. Which of the following strings cannot denote a DNA sequence:

View
MASLLRG

Q2. How many lines does it take to specify:

i) one fasta sequence? and ii) one fastq sequence?

Select the best answer.

View
Fasta – a fasta header followed by any number of sequence lines; fastq – 4 lines
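
In practice this difference changes how records are counted. The sketch below uses two small made-up files (names and sequences are hypothetical): FASTA records are counted by their ">" header lines because the number of sequence lines varies, while FASTQ records are counted as total lines divided by 4.

```shell
workdir=$(mktemp -d)

# Hypothetical FASTA: two records, the first wrapped over two sequence lines.
cat > "$workdir/reads.fa" <<'EOF'
>seq1
ACGTACGT
ACGT
>seq2
GGCCTTAA
EOF

# Hypothetical FASTQ: two records, each with header, sequence, "+", qualities.
cat > "$workdir/reads.fq" <<'EOF'
@read1
ACGT
+
IIII
@read2
GGCC
+
HHHH
EOF

# FASTA: count header lines, since the line count per record is not fixed.
fasta_records=$(grep -c '^>' "$workdir/reads.fa")

# FASTQ: every record takes exactly 4 lines.
fastq_lines=$(wc -l < "$workdir/reads.fq")
fastq_records=$((fastq_lines / 4))

echo "$fasta_records $fastq_records"
rm -r "$workdir"
```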

Q3. Which of the following is incorrect:

View
BEDtools can be used to align sequences to the genome.

Q4. Which of the following is NOT an alignment operation:

View
Cut and paste

Q5. What is the minimum number of columns that are sufficient to specify a BED format?

View
3

Q6. Which of the following represents the most accurate conversion into BED of the GTF record:

chr1 CLASS exon 516 811 100 + . gene_id "genA"; transcript_id "genA.1";
chr1 CLASS exon 1001 1115 100 + . gene_id "genA"; transcript_id "genA.1";
chr1 CLASS exon 3010 3312 100 + . gene_id "genA"; transcript_id "genA.1"
View
chr1 515 3312 genA.1 100 + 515 3312 0 3 296,115,303 0,485,2494
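
The 12-column record with three blocks can be checked by doing the coordinate arithmetic. GTF coordinates are 1-based and inclusive, while BED starts are 0-based, so the start becomes 516 - 1 = 515, each block size is end - start + 1, and each block offset is measured from the transcript start. The sketch below is one way to do this with awk; the function name gtf_exons_to_bed12 is hypothetical, and it assumes the input holds the exon lines of a single transcript.

```shell
# Hypothetical helper: reads the exon lines of ONE transcript on stdin
# and prints a single tab-separated BED12 record.
gtf_exons_to_bed12() {
  sort -k4,4n | awk -v OFS='\t' '
    {
      n++
      start[n] = $4 - 1        # GTF is 1-based, BED start is 0-based
      size[n]  = $5 - $4 + 1   # GTF end is inclusive
      if (n == 1) { chrom = $1; score = $6; strand = $7 }
      tx_end = $5              # BED end equals the last 1-based GTF end
      if (match($0, /transcript_id "[^"]+"/)) name = substr($0, RSTART + 15, RLENGTH - 16)
    }
    END {
      for (i = 1; i <= n; i++) {
        sep = (i > 1) ? "," : ""
        sizes   = sizes sep size[i]
        offsets = offsets sep (start[i] - start[1])
      }
      print chrom, start[1], tx_end, name, score, strand, start[1], tx_end, 0, n, sizes, offsets
    }'
}

bed_line=$(gtf_exons_to_bed12 <<'EOF'
chr1 CLASS exon 516 811 100 + . gene_id "genA"; transcript_id "genA.1";
chr1 CLASS exon 1001 1115 100 + . gene_id "genA"; transcript_id "genA.1";
chr1 CLASS exon 3010 3312 100 + . gene_id "genA"; transcript_id "genA.1";
EOF
)
echo "$bed_line"
```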

Q7. Determine the number of genes, transcripts, exons per transcript, gene orientation (strand), and the length of 5’ most exon(s) from the GTF snippet below. Select the correct answer.

chr1 HAVANA gene 3205901 3671498 . - . gene_id "MUSG51951.5";
chr1 HAVANA transcript 3205901 3216344 . - . gene_id "MUSG51951.5"; transcript_id "MUST162897.1";
chr1 HAVANA exon 3213609 3216344 . - . gene_id "MUSG51951.5"; transcript_id "MUST162897.1";
chr1 HAVANA exon 3205901 3207317 . - . gene_id "MUSG51951.5"; transcript_id "MUST162897.1
chr1 HAVANA transcript 3206523 3215632 . - . gene_id "MUSG51951.5"; transcript_id "MUST159265.1";
chr1 HAVANA exon 3213439 3215632 . - . gene_id "MUSG51
View
Genes: 1; Transcripts: 2; Exons: 2,2; Strand: -; Length of 5’ exon(s): 2736, 2194.

Q8. Which of the following is FALSE for the following read alignments:

R1 83 chr12 9232390 255 50M = 9232180 0 ATGGCAGAGCCTAATATGTCTCCTAGAGAATGGGAGAGATGGGAAGTCAT HGHHHHHHHHHHHHHHHHHHHHHHHHHHIGIIIIHHHHHHHHHHHGHHFH NM:i:0 NH:i:1 HI:i:0
R2 97 chr12 9232391 255 28M278N22M = 9242529 0 TGGCAGAGCCTAATATGTCTCCCAAAACTGAGACAGAAGCTCGGGCAGAT D>DDDHHHHHHHHHHIHIHHHHHIHHHHIGFFGGGHHHHHHHHHHFB.F NM:i:4 NH:i:3 HI:i:0 XS:A:+ NS:i:2
R3 77 * 0 0 0 * * 0 0 CTGATATGAGGAAAGAGGATTGCTTAAGCCCAGGAGGTAGAGGCTGTACC @@@FFDFFHFFHHJJJJIJEGFGIGHHIHIIIIGCDE?D?FGGCBHDGGG
View
R2 has an exact match to the genome.

Q9. For the alignment below, which statements are FALSE? The binary encoding for 97 is 0000 0110 0001 (base 2). Select all answers that apply.

R2 97 chr12 9232391 255 28M278N22M = 9242529 0 TGGCAGAGCCTAATATGTCTCCCAAAACTGAGACAGAAGCTCGGGCAGAT D>DDDHHHHHHHHHHIHIHHHHHIHHHHIGFFGGGHHHHHHHHHHFB.F NM:i:4 XS:A:+ NS:i:2
View
The alignment passes quality checks.
The sequence of the read’s mate is reverse-complemented in its alignment.
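
To check statements like these against the FLAG column, it helps to list which bits of the flag are set. The sketch below is a small illustrative helper (the name decode_sam_flag is hypothetical) that tests each bit defined in the SAM specification against a decimal flag value.

```shell
# List which SAM FLAG bits are set in a decimal flag value.
# Bit meanings follow the SAM specification; the function name is illustrative.
decode_sam_flag() {
  flag=$1
  for entry in \
    "1:read paired" \
    "2:read mapped in proper pair" \
    "4:read unmapped" \
    "8:mate unmapped" \
    "16:read reverse strand" \
    "32:mate reverse strand" \
    "64:first in pair" \
    "128:second in pair" \
    "256:not primary alignment" \
    "512:read fails quality checks" \
    "1024:PCR or optical duplicate" \
    "2048:supplementary alignment"
  do
    bit=${entry%%:*}      # number before the colon
    label=${entry#*:}     # text after the colon
    if [ $(( flag & bit )) -ne 0 ]; then
      echo "$bit $label"
    fi
  done
}

decode_sam_flag 97
```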

Q10. Files ‘A.bed’ and ‘B.bed’ contain the following sets of intervals:

View
bedtools intersect -wao -a A.bed -b B.bed | sort -u | wc -l
bedtools intersect -wo -a A.bed -b B.bed | cut -f1-3 | sort -u | wc -l

Quiz 2: Module 2 Exam

Q1. How many alignments does the set contain?

View
221372

Q2. How many alignments show the read’s mate unmapped?

View
65521

Q3. How many alignments contain a deletion (D)?

View
2451

Q4. How many alignments show the read’s mate mapped to the same chromosome?

View
150913

Q5. How many alignments are spliced?

View
0
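
Counts such as the deletion and splicing questions above come down to pattern matches on the CIGAR string, which is column 6 of a SAM record. The sketch below uses three made-up records to show the idea; with a real BAM file the records would typically come from samtools view, and the read names and positions here are purely illustrative.

```shell
# Three hypothetical SAM records (tab-separated); column 6 is the CIGAR string.
sam_records=$(printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  r1 0 chr1 100 255 50M \
  r2 0 chr1 200 255 20M2D30M \
  r3 0 chr1 300 255 28M278N22M)

# Alignments whose CIGAR contains a deletion (D).
with_deletion=$(printf '%s\n' "$sam_records" | cut -f6 | grep -c 'D')

# Spliced alignments carry an N (skipped reference region) in the CIGAR.
spliced=$(printf '%s\n' "$sam_records" | cut -f6 | grep -c 'N')

echo "$with_deletion $spliced"
```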

Q6. How many alignments does the set contain?

View
7081

Q7. How many alignments show the read’s mate unmapped?

View
1983

Q8. How many alignments contain a deletion (D)?

View
31

Q9. How many alignments show the read’s mate mapped to the same chromosome?

View
4670


Q10. How many alignments are spliced?

View
0

Q11. How many sequences are in the genome file?

View
7

Q12. What is the length of the first sequence in the genome file?

View
29923332

Q13. What alignment tool was used?

View
stampy

Q14. What is the read identifier (name) for the first alignment?

View
GAII05_0002:1:113:7822:3886#0

Q15. What is the start position of this read’s mate on the genome? Give this as ‘chrom:pos’ if the read was mapped, or ‘*’ if unmapped.

View
Chr3:11700332

Q16. How many overlaps (each overlap is reported on one line) are reported?

View
3101

Q17. How many of these are 10 bases or longer?

View
2899

Q18. How many alignments overlap the annotations?

View
3101

Q19. Conversely, how many exons have reads mapped to them?

View
21

Q20. If you were to convert the transcript annotations in the file “athal_wu_0_A_annot.gtf” into BED format, how many BED records would be generated?

View
4


Command Line Tools for Genomic Data Science Week 03 Quiz Answers

Quiz 1: Module 3 Quiz

Q1. Which of the following statements is FALSE:

View
SNP refers to a Single Non-defined Polymorphism

Q2. Which of the following statements is FALSE:

View
The VCF format shows the changes in amino acid resulting from the nucleotide mutation, in column 3.

Q3. What program can be used to generate a list of candidate sites of variation in an exome data set:

View
samtools

Q4. In a comprehensive effort to study genome variation in a patient cohort, you sequence and call variants in the exome, whole genome shotgun and RNA-seq data from each patient. Which of the following is FALSE when comparing these three types of resources:

View
Exome sequencing comprehensively captures variants in the 3’ and 5’ UTRs of genes.

Q5. Which of the following options can be used to allow bowtie2 to generate partial alignments?

View
--local

Q6. Select the correct interpretation for the snippet of ‘mpileup’ output below.

Chr3 11700316 C 8 .$....... 8C@C;CB3
Chr3 11951491 G 16 AAAA,......aA..A C2@2BCBCCCAC2CC4

Both sites show potential variation;

View
the alternate letter for site 1 is $, and for site 2 is A;

site 1 has 8 supporting reads, and site 2 has 16

Q7. Given the set of variants described in the VCF excerpt below, which of the following is FALSE?

INFO=
INFO=
FORMAT=
FORMAT=
Chr3 11966312 . G A 15.9 . DP=5;MQ=15 GT:PL 1/1:43,9,0
Chr3 11972108 . TAAAA TAAA 32.8 . INDEL;IDV=7;IMF=0.636364;DP=11;MQ=22 GT:PL 0/1:66,0,2
Chr3 13792328 rs145271872 G T 5.5 . DP=1;MQ=40 GT
View
The alternate allele for variant 1 is A

Q8. What does the following code do:

bowtie2 -x species/species -U in.fastq | grep -v "^@" | cut -f3 | sort | uniq -c

View
Run bowtie2 with a set of single-end reads, reporting the best alignment only; then determine the number of matches on each genomic sequence
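
The counting half of this pipeline can be tried without running bowtie2. The sketch below substitutes a few made-up SAM lines for the aligner output (all read names, positions and reference names are hypothetical): grep -v "^@" drops the header lines, cut -f3 keeps the reference sequence name, and sort | uniq -c tallies records per reference. Unmapped reads carry "*" in column 3 and get their own row.

```shell
# Hypothetical SAM text standing in for bowtie2 output:
# two header lines followed by four records (first six columns only).
sam_output=$(
  printf '@HD\tVN:1.0\tSO:unsorted\n'
  printf '@SQ\tSN:Chr1\tLN:1000\n'
  printf 'read1\t0\tChr1\t10\t42\t4M\n'
  printf 'read2\t16\tChr1\t50\t42\t4M\n'
  printf 'read3\t0\tChr2\t7\t42\t4M\n'
  printf 'read4\t4\t*\t0\t0\t*\n'
)

# Drop headers, keep the reference name column, count records per reference.
per_reference=$(printf '%s\n' "$sam_output" | grep -v "^@" | cut -f3 | sort | uniq -c)
echo "$per_reference"
```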

Q9. What does the following snippet of code do NOT do:

samtools mpileup -O -f genome.fa in.bam | cut -f7
View
Report in the intermediate mpileup output the qualities of all read bases aligned at that position

Q10. What does the following code do NOT do:

bcftools call -v -c -O z -o out.vcf.gz in.vcf.gz
View
Report output in compressed VCF format

Quiz 2: Module 3 Exam

Q1. How many sequences were in the genome?

View
7

Q2. What was the name of the third sequence in the genome file? Give the name only, without the “>” sign.

View
Chr3

Q3. What was the name of the last sequence in the genome file? Give the name only, without the “>” sign.

View
mitochondria

Q4. How many index files did the operation create?

View
6

Q5. What is the 3-character extension for the index files created?

View
bt2

Q6. How many reads were in the original fastq file?

View
147354

Q7. How many matches (alignments) were reported for the original (full-match) setting? Exclude lines in the file containing unmapped reads.

View
137719

Q8. How many matches (alignments) were reported with the local-match setting? Exclude lines in the file containing unmapped reads.

View
141044

Q9. How many reads were mapped in the scenario in Question 7?

View
137719


Q10. How many reads were mapped in the scenario in Question 8?

View
141044

Q11. How many reads had multiple matches in the scenario in Question 7? You can find this in the bowtie2 summary; note that by default bowtie2 only reports the best match for each read.

View
43939

Q12. How many reads had multiple matches in the scenario in Question 8? Use the format above. You can find this in the bowtie2 summary; note that by default bowtie2 only reports the best match for each read.

View
56105

Q13. How many alignments contained insertions and/or deletions, in the scenario in Question 7?

View
2782

Q14. How many alignments contained insertions and/or deletions, in the scenario in Question 8?

View
2614

Q15. How many entries were reported for Chr3?

View
360295

Q16. How many entries have ‘A’ as the corresponding genome letter?

View
1150985

Q17. How many entries have exactly 20 supporting reads (read depth)?

View
1816

Q18. How many entries represent indels?

View
1972

Q19. How many entries are reported for position 175672 on Chr1?

View
2

Q20. How many variants are called on Chr3?

View
398

Q21. How many variants represent an A->T SNP? If useful, you can use ‘grep -P’ to allow tabular spaces in the search term.

View
392

Q22. How many entries are indels?

View
320

Q23. How many entries have precisely 20 supporting reads (read depth)?

View
2

Q24. What type of variant (i.e., SNP or INDEL) is called at position 11937923 on Chr3?

View
SNP


Command Line Tools for Genomic Data Science Week 04 Quiz Answers

Quiz 1: Module 4 Quiz

Q1. Which of the following is FALSE:

View
A human gene can express at most 12 splice variants.

Q2. Which of the following is FALSE about the organization of a eukaryotic gene:

View
The length of an intron cannot be a multiple of 3.

Q3. What programs could you use to align RNA-seq reads to: i) a reference genome, and ii) a transcript database?

View
tophat, bwa

Q4. Which of the following is FALSE:

View
RNA-seq can be used to quantify the expression levels of proteins.

Q5. What programs could be used to: i) assemble transcripts from RNA-seq reads, and ii) identify potentially novel transcripts and genes?

View
cufflinks, cuffcompare

Q6. Which of the following is FALSE about the gene annotations in the following GTF snippet:

chr1 MGF gene 3413609 3671498 . - . gene_id "MG051951";
chr1 MGF transcript 3413609 3416344 . - . gene_id "MG051951"; transcript_id "MT162897";
chr1 MGF exon 3413609 3416344 . - . gene_id "MG051951"; transcript_id "MT162897";
chr1 MGF transcript 3421702 3671498 . - . gene_id "MG051951"; transcript_id "MT070533";
chr1 MGF exon 3670552 3671498 . - . gene_id "MG051951"; transcript_id "MT070533";
chr1 MGF CDS 3670552 3671348 . - 0 gene_id "MG051951"; transcript_id "MT070533";
chr1 MGF exon 342170
View
Both exons of MT070533 contain both coding and non-coding sequences.

Q7. What does the following code NOT do:

BWT2IDX=/home/me/genomes/hg20/hg20
ANNOT=/home/me/genomes/hg20/myannot.gtf
ANNOTIDX=/home/me/genomes/hg20/myannot/myannot
mkdir -p /home/me/SRR100000
tophat2 -o /home/me/SRR100000 -p 10 --max-multihits 10 \
-r 26 --mate-std-dev 25 \
-a 6 \
-G $ANNOT --transcriptome-index $ANNOTIDX \
View
Report spliced reads with at most 6 mismatches in the anchor site

Q8. What does the following code NOT do:

TOPHATDIR=/home/florea/Tophat/
mkdir -p Test1
cd Test1
ln -s $TOPHATDIR/accepted_hits.bam .
cufflinks -L Test1 -p 8 -j 0.10 -F 0.05 accepted_hits.bam
View
Use the default reference transcript annotation to guide assembly

Q9. Which of the following is NOT described in the following summary file produced by tophat:

Left reads:
Input : 60586968
Mapped : 58163843 (96.0% of input)
of these: 6832240 (11.7%) have multiple alignments (359075 have >10)
Right reads:
Input : 60586968
Mapped : 56969290 (94.0% of input)
of these: 6668479 (11.7%) have multiple alignments (358573 have >10)
95.0% overall read mapping rate.
View
The reads were 100 bp long
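
The overall figure in the summary can be recomputed from the per-mate counts. The sketch below assumes the overall rate is simply all mapped reads divided by all input reads across both mates, which matches the 95.0% reported; the variable names are illustrative.

```shell
# Counts copied from the tophat summary above.
left_input=60586968;  left_mapped=58163843
right_input=60586968; right_mapped=56969290

# Assumed formula: overall rate = all mapped reads / all input reads, as a percentage.
overall_rate=$(awk -v lm="$left_mapped" -v rm="$right_mapped" \
                   -v li="$left_input" -v ri="$right_input" \
                   'BEGIN { printf "%.1f", 100 * (lm + rm) / (li + ri) }')
echo "$overall_rate% overall read mapping rate"
```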

Q10. Which of the following is NOT TRUE about the output below, obtained from a cuffdiff differential expression analysis:

XLOC_000002 XLOC_000002 AT1G01020 1:5927-8737 q1 q2 OK 1.13032 3.48406 1.62404 0.694576 0.5277 0.998846 no
XLOC_000004 XLOC_000004 AT1G01073 1:44676-44787 q1 q2 NOTEST 0 0 0 0 1 1 no
XLOC_000042 XLOC_000042 AT1G01580 1:209394-213041 q1 q2 OK 1.59512 0 -inf nan 5e-05 0.0096703 yes
View
There are too many alignments for testing for differential expression at locus XLOC_000004

Quiz 2: Module 4 Exam

Q1. How many alignments were produced for the ‘Day8’ RNA-seq data set?

View
63845

Q2. How many alignments were produced for the ‘Day16’ RNA-seq data set?

View
58398

Q3. How many reads were mapped in ‘Day8’ RNA-seq data set?

View
63489

Q4. How many reads were mapped in ‘Day16’ RNA-seq data set?

View
57951

Q5. How many reads were uniquely aligned in ‘Day8’ RNA-seq data set?

View
63133

Q6. How many reads were uniquely aligned in ‘Day16’ RNA-seq data set?

View
57504

Q7. How many spliced alignments were reported for ‘Day8’ RNA-seq data set?

View
8596

Q8. How many spliced alignments were reported for ‘Day16’ RNA-seq data set?

View
10695

Q9. How many reads were left unmapped from ‘Day8’ RNA-seq data set?

View
84


Q10. How many reads were left unmapped from ‘Day16’ RNA-seq data set?

View
34

Q11. How many genes were generated by cufflinks for Day8?

View
186

Q12. How many genes were generated by cufflinks for Day16?

View
80

Q13. How many transcripts were reported for Day8?

View
192

Q14. How many transcripts were reported for Day16?

View
92

Q15. How many single transcript genes were produced for Day8?

View
180

Q16. How many single transcript genes were produced for Day16?

View
69

Q17. How many single-exon transcripts were in the Day8 set?

View
119

Q18. How many single-exon transcripts were in the Day16 set?

View
24

Q19. How many multi-exon transcripts were in the Day8 set?

View
73

Q20. How many multi-exon transcripts were in the Day16 set?

View
68

Q21. How many cufflinks transcripts fully reconstruct annotation transcripts in Day8?

View
16

Q22. How many cufflinks transcripts fully reconstruct annotation transcripts in Day16?

View
36

Q23. How many splice variants does the gene AT4G20240 have in the Day8 sample?

View
2

Q24. How many splice variants does the gene AT4G20240 have in the Day16 sample?

View
0

Q25. How many cufflinks transcripts are partial reconstructions of reference transcripts (‘contained’)? (Day8)

View
133

Q26. How many cufflinks transcripts are partial reconstructions of reference transcripts (‘contained’)? (Day16)

View
21

Q27. How many cufflinks transcripts are novel splice variants of reference genes? (Day8)

View
14

Q28. How many cufflinks transcripts are novel splice variants of reference genes? (Day16)

View
22

Q29. How many cufflinks transcripts were formed in the introns of reference genes? (Day8)

View
4

Q30. How many cufflinks transcripts were formed in the introns of reference genes? (Day16)

View
1

Q31. How many genes (loci) were reported in the merged.gtf file?

View
129

Q32. How many transcripts?

View
200

Q33. How many genes total were included in the gene expression report from cuffdiff?

View
129

Q34. How many genes were detected as differentially expressed?

View
4

Q35. How many transcripts were differentially expressed between the two samples?

View
5

