Get All Weeks Command Line Tools for Genomic Data Science Quiz Answers
Command Line Tools for Genomic Data Science Week 01 Quiz Answers
Quiz 1: Module 1 Quiz
Q1. Which of the following Unix commands can be used to view the content of a file?
Q2. Which of the following commands can be used to compress the content of a file?
Q3. The file “months” lists each of the 12 months on a separate line and no further lines. What would be the result if the following command was run:
cat months | head -1000 | wc -l
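The pipeline in Q3 can be tried directly. The sketch below builds a stand-in “months” file in a scratch directory (its contents are an assumption based on the description in the question) and runs the same command. Asking head for 1000 lines from a shorter file simply passes every line through.

```shell
# Build a 12-line stand-in for the "months" file in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' January February March April May June \
    July August September October November December > "$workdir/months"

# head -1000 asks for up to 1000 lines; the file has only 12, so all pass through.
month_lines=$(cat "$workdir/months" | head -1000 | wc -l | tr -d ' ')
echo "$month_lines"   # prints 12

rm -r "$workdir"
```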
Q4. What is the effect of using the pipe operator ‘|’ in a sequence of commands:
Q5. If typing ‘pwd’ produces “/home/userA/Coursera/L1/”, which of the following commands will list the file content of the current directory?
Q6. Suppose your current working directory is “/home/Coursera/L1/”, and “peach”, “apple” and “pear” are subdirectories, each containing a single file named “genome”. What would be the current directory, as reported by running the ‘pwd’ command, after each of the four commands in the sequence below:
rm *
cd ../..
mv apple pl
Q7. Consider the file “seasons” with the following columns separated by spaces ‘ ‘:
January 1 winter
…
December 12 winter
What would be the sequence of outputs for the following commands: "cut -d ' ' -f1,3 seasons | sort -u | wc -l" and "cut -f1 seasons | sort | uniq -c | wc -l"?
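The two commands in Q7 differ mainly in how cut splits each line. A minimal sketch, using a hypothetical four-line excerpt rather than the full 12-line “seasons” file: with -d ' ' the line is split on spaces, while without -d cut looks for TAB characters, finds none, and prints the line unchanged.

```shell
workdir=$(mktemp -d)
# Hypothetical four-line excerpt; the real "seasons" file has 12 lines.
printf '%s\n' 'January 1 winter' 'February 2 winter' \
    'March 3 spring' 'April 4 spring' > "$workdir/seasons"

# Fields 1 and 3 split on a space: month plus season, one unique pair per month.
pairs=$(cut -d ' ' -f1,3 "$workdir/seasons" | sort -u | wc -l | tr -d ' ')

# No -d given: cut splits on TAB, finds none, and prints each line unchanged.
whole=$(cut -f1 "$workdir/seasons" | sort | uniq -c | wc -l | tr -d ' ')

echo "$pairs $whole"   # prints 4 4
rm -r "$workdir"
```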
Q8. Your current working directory is named “Plants”. Its subdirectory “apple” contains the files “apple.genome”, “apple.samples” and “apple.genes”. What would be the result of the command "rmdir apple"?
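The situation in Q8 is easy to reproduce in a scratch directory. The sketch below recreates the “apple” subdirectory with its three files (created empty here, which is an assumption) and records what rmdir does when the directory is not empty.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/apple"
touch "$workdir/apple/apple.genome" "$workdir/apple/apple.samples" "$workdir/apple/apple.genes"

# rmdir only removes empty directories, so this call fails and nothing is deleted.
if rmdir "$workdir/apple" 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi
echo "$rmdir_result"   # prints refused

remaining=$(ls "$workdir/apple" | wc -l | tr -d ' ')
echo "$remaining"      # prints 3

rm -r "$workdir"
```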
Q9. Suppose that you have two files, A and B, containing experiment data:
File A:     File B:
geneA +     geneB +
geneB +     geneC +
geneC -
What would be the sequence of outputs for the commands:
(1) comm -3 A B | wc -l
(2) comm -1 -3 A B | wc -l
(3) comm -2 A B | wc -l
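The three comm commands in Q9 can be checked against the two files as given. This sketch assumes the sign after geneC in file A is a plain ASCII minus and that the fields are separated by a single space. comm prints three columns (lines only in A, lines only in B, lines in both), and each numeric option suppresses one column.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'geneA +' 'geneB +' 'geneC -' > "$workdir/A"
printf '%s\n' 'geneB +' 'geneC +' > "$workdir/B"

# comm expects sorted input; LC_ALL=C pins the byte-wise order assumed here.
n1=$(LC_ALL=C comm -3 "$workdir/A" "$workdir/B" | wc -l | tr -d ' ')      # unique to A or to B
n2=$(LC_ALL=C comm -1 -3 "$workdir/A" "$workdir/B" | wc -l | tr -d ' ')   # unique to B only
n3=$(LC_ALL=C comm -2 "$workdir/A" "$workdir/B" | wc -l | tr -d ' ')      # unique to A plus common

echo "$n1 $n2 $n3"   # prints 3 1 3
rm -r "$workdir"
```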
Q10. The current working directory contains four subdirectories named “apple”, “pear”, “peach” and “strawberry”, each with the following files: “genome”, “genes” and “samples”. Which of the following commands would extract the top line from all of the “genes” files?
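One way to approach Q10 is a shell wildcard over the subdirectory names. The sketch below is an illustration with made-up file contents, not necessarily the quiz's intended option: the pattern */genes expands to one path per subdirectory, and head prints the first line of each.

```shell
workdir=$(mktemp -d)
for d in apple pear peach strawberry; do
    mkdir "$workdir/$d"
    # Hypothetical file contents, only so the example has something to print.
    printf '%s first gene\n%s second gene\n' "$d" "$d" > "$workdir/$d/genes"
    touch "$workdir/$d/genome" "$workdir/$d/samples"
done

# */genes expands to one path per subdirectory; head prints the first line of each,
# preceded by a "==> name <==" header because more than one file was given.
top_lines=$(cd "$workdir" && head -n 1 */genes | grep -c 'first gene')
echo "$top_lines"   # prints 4

rm -r "$workdir"
```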
Quiz 2: Module 1 Exam
Q1. How many chromosomes are there in the genome?
Q2. How many genes?
Q3. How many transcript variants?
Q4. How many genes have a single splice variant?
Q5. How many genes have 2 or more splice variants?
Q6. How many genes are there on the ‘+’ strand?
Q7. How many genes are there on the ‘-’ strand?
Q8. How many genes are there on chromosome chr1?
Q9. How many genes are there on chromosome chr2?
Q10. How many genes are there on chromosome chr3?
Q11. How many transcripts are there on chr1?
Q12. How many transcripts are there on chr2?
Q13. How many transcripts are there on chr3?
Q14. How many genes are in common between condition A and condition B?
Q15. How many genes are specific to condition A?
Q16. How many genes are specific to condition B?
Q17. How many genes are in common to all three conditions?
Command Line Tools for Genomic Data Science Week 02 Quiz Answers
Quiz 1: Module 2 Quiz
Q1. Which of the following strings cannot denote a DNA sequence:
Q2. How many lines does it take to specify:
i) one fasta sequence? and ii) one fastq sequence?
Select the best answer.
Q3. Which of the following is incorrect:
Q4. Which of the following is NOT an alignment operation:
Q5. What is the minimum number of columns that are sufficient to specify a BED format?
Q6. Which of the following represents the most accurate conversion into BED of the GTF record:
chr1 CLASS exon 516 811 100 + . gene_id "genA"; transcript_id "genA.1";
chr1 CLASS exon 1001 1115 100 + . gene_id "genA"; transcript_id "genA.1";
chr1 CLASS exon 3010 3312 100 + . gene_id "genA"; transcript_id "genA.1"
chr1 516 3312 genA + 516 3312 0 2 296,303 0,2494
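The coordinate shift behind Q6 can be sketched with awk. GTF intervals are 1-based and include both ends, while BED intervals are 0-based and exclude the end, so the start drops by one and the end stays the same. The function name gtf_exons_to_bed6 is hypothetical, and the sketch emits one six-column BED line per exon rather than the single multi-block line used in the quiz options. It assumes tab-separated input with gene_id followed by transcript_id in the attribute column.

```shell
# Convert GTF exon records (tab-separated) to six-column BED lines.
gtf_exons_to_bed6() {
    awk 'BEGIN { FS = OFS = "\t" }
         $3 == "exon" {
             split($9, attr, "\"")   # attr[2] = gene_id value, attr[4] = transcript_id value
             print $1, $4 - 1, $5, attr[4], $6, $7
         }' "$@"
}

bed_line=$(printf 'chr1\tCLASS\texon\t516\t811\t100\t+\t.\tgene_id "genA"; transcript_id "genA.1";\n' \
    | gtf_exons_to_bed6)
printf '%s\n' "$bed_line"   # prints chr1, 515, 811, genA.1, 100, + separated by tabs
```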
Q7. Determine the number of genes, transcripts, exons per transcript, gene orientation (strand), and the length of 5’ most exon(s) from the GTF snippet below. Select the correct answer.
chr1 HAVANA gene 3205901 3671498 . - . gene_id "MUSG51951.5";
chr1 HAVANA transcript 3205901 3216344 . - . gene_id "MUSG51951.5"; transcript_id "MUST162897.1";
chr1 HAVANA exon 3213609 3216344 . - . gene_id "MUSG51951.5"; transcript_id "MUST162897.1";
chr1 HAVANA exon 3205901 3207317 . - . gene_id "MUSG51951.5"; transcript_id "MUST162897.1
chr1 HAVANA transcript 3206523 3215632 . - . gene_id "MUSG51951.5"; transcript_id "MUST159265.1";
chr1 HAVANA exon 3213439 3215632 . - . gene_id "MUSG51
Q8. Which of the following is FALSE for the following read alignments:
R1 83 chr12 9232390 255 50M = 9232180 0 ATGGCAGAGCCTAATATGTCTCCTAGAGAATGGGAGAGATGGGAAGTCAT HGHHHHHHHHHHHHHHHHHHHHHHHHHHIGIIIIHHHHHHHHHHHGHHFH NM:i:0 NH:i:1 HI:i:0
R2 97 chr12 9232391 255 28M278N22M = 9242529 0 TGGCAGAGCCTAATATGTCTCCCAAAACTGAGACAGAAGCTCGGGCAGAT D>DDDHHHHHHHHHHIHIHHHHHIHHHHIGFFGGGHHHHHHHHHHFB.F NM:i:4 NH:i:3 HI:i:0 XS:A:+ NS:i:2
R3 77 * 0 0 0 * * 0 0 CTGATATGAGGAAAGAGGATTGCTTAAGCCCAGGAGGTAGAGGCTGTACC @@@FFDFFHFFHHJJJJIJEGFGIGHHIHIIIIGCDE?D?FGGCBHDGGG
Q9. For the alignment below, which statements are FALSE? The binary encoding for 97 is 0000 0110 0001. Select all answers that apply.
R2 97 chr12 9232391 255 28M278N22M = 9242529 0 TGGCAGAGCCTAATATGTCTCCCAAAACTGAGACAGAAGCTCGGGCAGAT D>DDDHHHHHHHHHHIHIHHHHHIHHHHIGFFGGGHHHHHHHHHHFB.F NM:i:4 XS:A:+ NS:i:2
The sequence of the read’s mate is reverse-complemented in its alignment.
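The flag value in Q9 can be unpacked bit by bit. The sketch below defines a hypothetical helper, decode_sam_flag, that tests each bit defined by the SAM format and prints an informal label for every bit that is set. The labels are shorthand chosen for this example, not official names.

```shell
# Print informal names for the SAM flag bits set in a decimal FLAG value.
decode_sam_flag() {
    flag=$1
    names=""
    for entry in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                 16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair \
                 256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${entry%%:*}    # numeric bit value before the colon
        name=${entry#*:}    # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            names="$names $name"
        fi
    done
    echo "${names# }"
}

flag_97=$(decode_sam_flag 97)
echo "$flag_97"   # prints: paired mate_reverse first_in_pair
```

Since 97 = 64 + 32 + 1, exactly three bits are set. The same helper applied to 83 (64 + 16 + 2 + 1) reports paired, proper_pair, reverse and first_in_pair.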
Q10. Files ‘A.bed’ and ‘B.bed’ contain the following sets of intervals:
bedtools intersect -wo -a A.bed -b B.bed | cut -f1-3 | sort -u | wc -l
Quiz 2: Module 2 Exam
Q1. How many alignments does the set contain?
Q2. How many alignments show the read’s mate unmapped?
Q3. How many alignments contain a deletion (D)?
Q4. How many alignments show the read’s mate mapped to the same chromosome?
Q5. How many alignments are spliced?
Q6. How many alignments does the set contain?
Q7. How many alignments show the read’s mate unmapped?
Q8. How many alignments contain a deletion (D)?
Q9. How many alignments show the read’s mate mapped to the same chromosome?
Q10. How many alignments are spliced?
Q11. How many sequences are in the genome file?
Q12. What is the length of the first sequence in the genome file?
Q13. What alignment tool was used?
Q14. What is the read identifier (name) for the first alignment?
Q15. What is the start position of this read’s mate on the genome? Give this as ‘chrom:pos’ if the read was mapped, or ‘*’ if unmapped.
Q16. How many overlaps (each overlap is reported on one line) are reported?
Q17. How many of these are 10 bases or longer?
Q18. How many alignments overlap the annotations?
Q19. Conversely, how many exons have reads mapped to them?
Q20. If you were to convert the transcript annotations in the file “athal_wu_0_A_annot.gtf” into BED format, how many BED records would be generated?
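Several of the exam questions above (alignment count, mate unmapped, deletions, mate on the same chromosome, spliced alignments) reduce to counting patterns in two SAM columns: the CIGAR string in column 6 and the mate reference name in column 7. The sketch below runs on a hypothetical three-record SAM file named example.sam. For a BAM file, the same pipelines could be fed from the output of samtools view instead.

```shell
workdir=$(mktemp -d)
# Hypothetical three-record SAM file (one header line plus alignments), tab-separated.
{
    printf '@HD\tVN:1.0\tSO:coordinate\n'
    printf 'r1\t99\tChr3\t100\t255\t50M\t=\t300\t250\t*\t*\n'
    printf 'r2\t73\tChr3\t500\t255\t20M2D30M\t*\t0\t0\t*\t*\n'
    printf 'r3\t97\tChr3\t900\t255\t28M278N22M\tChr1\t40\t0\t*\t*\n'
} > "$workdir/example.sam"

sam_body() { grep -v '^@' "$workdir/example.sam"; }   # drop header lines

total=$(sam_body | wc -l | tr -d ' ')                  # one line per alignment
mate_unmapped=$(sam_body | cut -f7 | grep -c '^\*$')   # mate reference is "*"
with_deletion=$(sam_body | cut -f6 | grep -c 'D')      # CIGAR contains a D operation
mate_same_chrom=$(sam_body | cut -f7 | grep -c '^=$')  # mate reference is "="
spliced=$(sam_body | cut -f6 | grep -c 'N')            # CIGAR contains an N (intron) operation

echo "$total $mate_unmapped $with_deletion $mate_same_chrom $spliced"   # prints 3 1 1 1 1
rm -r "$workdir"
```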
Command Line Tools for Genomic Data Science Week 03 Quiz Answers
Quiz 1: Module 3 Quiz
Q1. Which of the following statements is FALSE:
Q2. Which of the following statements is FALSE:
Q3. What program can be used to generate a list of candidate sites of variation in an exome data set:
Q4. In a comprehensive effort to study genome variation in a patient cohort, you sequence and call variants in the exome, whole genome shotgun and RNA-seq data from each patient. Which of the following is FALSE when comparing these three types of resources:
Q5. Which of the following options can be used to allow bowtie2 to generate partial alignments?
Q6. Select the correct interpretation for the snippet of ‘mpileup’ output below.
Chr3 11700316 C 8 .$....... 8C@C;CB3
Chr3 11951491 G 16 AAAA,......aA..A C2@2BCBCCCAC2CC4
Both sites show potential variation; site 1 has 8 supporting reads, and site 2 has 16
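The base column of an mpileup line can be read mechanically. A ‘.’ or ‘,’ marks a read base that matches the reference on the forward or reverse strand, a letter marks a mismatch, and ‘$’ marks the end of a read. The sketch below counts mismatching bases with a hypothetical helper, count_mismatches, applied to the two base strings from Q6 with the dots written out in full.

```shell
# Count read bases that disagree with the reference in an mpileup base string.
# Assumes the string has no "^" read-start markers and no indel annotations,
# both of which can contain letters that are not mismatches.
count_mismatches() {
    letters=$(printf '%s' "$1" | tr -cd 'ACGTNacgtn')
    echo "${#letters}"
}

site1=$(count_mismatches '.$.......')
site2=$(count_mismatches 'AAAA,......aA..A')
echo "$site1 $site2"   # prints 0 7
```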
Q7. Given the set of variants described in the VCF excerpt below, which of the following is FALSE?
INFO=
INFO=
FORMAT=
FORMAT=
Chr3 11966312 . G A 15.9 . DP=5;MQ=15 GT:PL 1/1:43,9,0
Chr3 11972108 . TAAAA TAAA 32.8 . INDEL;IDV=7;IMF=0.636364;DP=11;MQ=22 GT:PL 0/1:66,0,2
Chr3 13792328 rs145271872 G T 5.5 . DP=1;MQ=40 GT
Q8. What does the following code do:
bowtie2 -x species/species -U in.fastq | grep -v "^@" | cut -f3 | sort | uniq -c
Run bowtie2 with a set of single-end reads, reporting the top 5 alignments for a read;
then determine the number of reads mapped reverse complemented
Run bowtie2 with a set of single-end reads, allowing for local matches;
then determine the number of matches with unmapped mates
Run bowtie2 with a set of single-end reads, reporting the best alignment only;
then determine the number of matches on each genomic sequence
Run bowtie2 with a set of single-end reads, allowing for local matches;
then determine the number of exact matches on each genomic sequence
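The post-processing half of the Q8 pipeline can be tried without running bowtie2. The sketch below feeds a hypothetical stand-in for the aligner's SAM output through the same grep, cut, sort and uniq steps. Column 3 of a SAM record is the reference sequence name, and unmapped reads carry ‘*’ there. The final awk step is only there to put the tally on one line.

```shell
# Hypothetical stand-in for bowtie2 output: three header lines and four records.
sam_text=$(
    printf '@HD\tVN:1.0\n'
    printf '@SQ\tSN:Chr1\tLN:1000\n'
    printf '@SQ\tSN:Chr2\tLN:1000\n'
    printf 'r1\t0\tChr1\t10\t42\t50M\t*\t0\t0\t*\t*\n'
    printf 'r2\t16\tChr2\t20\t42\t50M\t*\t0\t0\t*\t*\n'
    printf 'r3\t0\tChr1\t30\t42\t50M\t*\t0\t0\t*\t*\n'
    printf 'r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n'
)

# Drop header lines, keep column 3 (reference name), then tally each distinct name.
per_reference=$(printf '%s\n' "$sam_text" | grep -v "^@" | cut -f3 | LC_ALL=C sort | uniq -c \
    | awk '{ printf "%s%s=%s", sep, $2, $1; sep = " " } END { print "" }')
echo "$per_reference"   # prints: *=1 Chr1=2 Chr2=1
```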
Q9. What does the following snippet of code do NOT do:
samtools mpileup -O -f genome.fa in.bam | cut -f7
Q10. What does the following code NOT do:
bcftools call -v -c -O z -o out.vcf.gz in.vcf.gz
Quiz 2: Module 3 Exam
Q1. How many sequences were in the genome?
7
Q2. What was the name of the third sequence in the genome file? Give the name only, without the “>” sign.
Q3. What was the name of the last sequence in the genome file? Give the name only, without the “>” sign.
Q4. How many index files did the operation create?
Q5. What is the 3-character extension for the index files created?
Q6. How many reads were in the original fastq file?
Q7. How many matches (alignments) were reported for the original (full-match) setting? Exclude lines in the file containing unmapped reads.
Q8. How many matches (alignments) were reported with the local-match setting? Exclude lines in the file containing unmapped reads.
Q9. How many reads were mapped in the scenario in Question 7?
Q10. How many reads were mapped in the scenario in Question 8?
Q11. How many reads had multiple matches in the scenario in Question 7? You can find this in the bowtie2 summary; note that by default bowtie2 only reports the best match for each read.
Q12. How many reads had multiple matches in the scenario in Question 8? Use the format above. You can find this in the bowtie2 summary; note that by default bowtie2 only reports the best match for each read.
Q13. How many alignments contained insertions and/or deletions, in the scenario in Question 7?
Q14. How many alignments contained insertions and/or deletions, in the scenario in Question 8?
Q15. How many entries were reported for Chr3?
Q16. How many entries have ‘A’ as the corresponding genome letter?
Q17. How many entries have exactly 20 supporting reads (read depth)?
Q18. How many entries represent indels?
Q19. How many entries are reported for position 175672 on Chr1?
Q20. How many variants are called on Chr3?
Q21. How many variants represent an A->T SNP? If useful, you can use ‘grep -P’ to allow tabular spaces in the search term.
Q22. How many entries are indels?
Q23. How many entries have precisely 20 supporting reads (read depth)?
Q24. What type of variant (i.e., SNP or INDEL) is called at position 11937923 on Chr3?
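Counting questions such as Q1 (sequences in the genome) and Q6 (reads in the fastq file) of this exam come down to counting records. A minimal sketch on two hypothetical files, genome.fa and reads.fastq, assuming each FASTQ record occupies exactly four lines:

```shell
workdir=$(mktemp -d)
# Hypothetical inputs: a two-sequence FASTA file and a two-read FASTQ file.
printf '>seq1\nACGT\nACGT\n>seq2\nGGCC\n' > "$workdir/genome.fa"
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n@III\n' > "$workdir/reads.fastq"

# Each FASTA record starts with a ">" header line.
n_seqs=$(grep -c '^>' "$workdir/genome.fa")

# Each FASTQ record is four lines; counting "@" lines is unsafe because
# quality strings may also begin with "@" (as the second read here does).
n_reads=$(( $(wc -l < "$workdir/reads.fastq") / 4 ))

echo "$n_seqs $n_reads"   # prints 2 2
rm -r "$workdir"
```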
Command Line Tools for Genomic Data Science Week 04 Quiz Answers
Quiz 1: Module 4 Quiz
Q1. Which of the following is FALSE:
Q2. Which of the following is FALSE about the organization of a eukaryotic gene:
Q3. What programs could you use to align RNA-seq reads to: i) a reference genome, and ii) a transcript database?
tophat, bwa
Q4. Which of the following is FALSE:
Spliced reads can be used to determine the introns in a gene.
Q5. What programs could be used to: i) assemble transcripts from RNA-seq reads, and ii) identify potentially novel transcripts and genes?
Q6. Which of the following is FALSE about the gene annotations in the following GTF snippet:
chr1 MGF gene 3413609 3671498 . - . gene_id "MG051951";
chr1 MGF transcript 3413609 3416344 . - . gene_id "MG051951"; transcript_id "MT162897";
chr1 MGF exon 3413609 3416344 . - . gene_id "MG051951"; transcript_id "MT162897";
chr1 MGF transcript 3421702 3671498 . - . gene_id "MG051951"; transcript_id "MT070533";
chr1 MGF exon 3670552 3671498 . - . gene_id "MG051951"; transcript_id "MT070533";
chr1 MGF CDS 3670552 3671348 . - 0 gene_id "MG051951"; transcript_id "MT070533";
chr1 MGF exon 342170
Q7. What does the following code NOT do:
BWT2IDX=/home/me/genomes/hg20/hg20
ANNOT=/home/me/genomes/hg20/myannot.gtf
ANNOTIDX=/home/me/genomes/hg20/myannot/myannot
mkdir -p /home/me/SRR100000
tophat2 -o /home/me/SRR100000 -p 10 --max-multihits 10 \
-r 26 --mate-std-dev 25 \
-a 6 \
-G $ANNOT --transcriptome-index $ANNOTIDX \
Q8. What does the following code NOT do:
TOPHATDIR=/home/florea/Tophat/
mkdir -p Test1
cd Test1
ln -s $TOPHATDIR/accepted_hits.bam .
cufflinks -L Test1 -p 8 -j 0.10 -F 0.05 accepted_hits.bam
Q9. Which of the following is NOT described in the following summary file produced by tophat:
Left reads:
Input : 60586968
Mapped : 58163843 (96.0% of input)
of these: 6832240 (11.7%) have multiple alignments (359075 have >10)
Right reads:
Input : 60586968
Mapped : 56969290 (94.0% of input)
of these: 6668479 (11.7%) have multiple alignments (358573 have >10)
95.0% overall read mapping rate.
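The overall rate in the tophat summary can be cross-checked from the per-mate figures. The sketch below assumes the overall rate is simply mapped left plus mapped right reads divided by total input reads. That assumption reproduces the reported 95.0%, though the source does not spell out the formula.

```shell
# Figures copied from the tophat summary.
left_input=60586968;  left_mapped=58163843
right_input=60586968; right_mapped=56969290

# Assumed formula: all mapped reads over all input reads, as a percentage.
overall_rate=$(awk -v lm="$left_mapped" -v rm="$right_mapped" \
                   -v li="$left_input" -v ri="$right_input" \
    'BEGIN { printf "%.1f", 100 * (lm + rm) / (li + ri) }')
echo "$overall_rate"   # prints 95.0
```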
Q10. Which of the following is NOT TRUE about the output below, obtained from a cuffdiff differential expression analysis:
XLOC_000002 XLOC_000002 AT1G01020 1:5927-8737 q1 q2 OK 1.13032 3.48406 1.62404 0.694576 0.5277 0.998846 no
XLOC_000004 XLOC_000004 AT1G01073 1:44676-44787 q1 q2 NOTEST 0 0 0 0 1 1 no
XLOC_000042 XLOC_000042 AT1G01580 1:209394-213041 q1 q2 OK 1.59512 0 -inf nan 5e-05 0.0096703 yes
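The fold-change column in the first row can be recomputed by hand. The sketch assumes the usual cuffdiff gene expression layout, in which the two numbers after the status field are the expression values for samples q1 and q2 and the next number is the base-2 logarithm of their ratio.

```shell
# value_1 and value_2 for AT1G01020, copied from the first row of the output.
value_1=1.13032
value_2=3.48406

# log2(value_2 / value_1), computed via natural logarithms.
log2_fc=$(awk -v a="$value_1" -v b="$value_2" \
    'BEGIN { printf "%.3f", log(b / a) / log(2) }')
echo "$log2_fc"   # prints 1.624
```

The result agrees with the 1.62404 shown in that row. In the third row the q2 value is 0, so the ratio is 0 and its logarithm is negative infinity, which is why the column reads -inf there.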
Quiz 2: Module 4 Exam
Q1. How many alignments were produced for the ‘Day8’ RNA-seq data set?
Q2. How many alignments were produced for the ‘Day16’ RNA-seq data set?
Q3. How many reads were mapped in ‘Day8’ RNA-seq data set?
Q4. How many reads were mapped in ‘Day16’ RNA-seq data set?
Q5. How many reads were uniquely aligned in ‘Day8’ RNA-seq data set?
Q6. How many reads were uniquely aligned in ‘Day16’ RNA-seq data set?
Q7. How many spliced alignments were reported for ‘Day8’ RNA-seq data set?
Q8. How many spliced alignments were reported for ‘Day16’ RNA-seq data set?
Q9. How many reads were left unmapped from ‘Day8’ RNA-seq data set?
Q10. How many reads were left unmapped from ‘Day16’ RNA-seq data set?
Q11. How many genes were generated by cufflinks for Day8?
Q12. How many genes were generated by cufflinks for Day16?
Q13. How many transcripts were reported for Day8?
Q14. How many transcripts were reported for Day16?
Q15. How many single transcript genes were produced for Day8?
Q16. How many single transcript genes were produced for Day16?
Q17. How many single-exon transcripts were in the Day8 set?
Q18. How many single-exon transcripts were in the Day16 set?
Q19. How many multi-exon transcripts were in the Day8 set?
Q20. How many multi-exon transcripts were in the Day16 set?
Q21. How many cufflinks transcripts fully reconstruct annotation transcripts in Day8?
Q22. How many cufflinks transcripts fully reconstruct annotation transcripts in Day16?
Q23. How many splice variants does the gene AT4G20240 have in the Day8 sample?
Q24. How many splice variants does the gene AT4G20240 have in the Day16 sample?
Q25. How many cufflinks transcripts are partial reconstructions of reference transcripts (‘contained’)? (Day8)
Q26. How many cufflinks transcripts are partial reconstructions of reference transcripts (‘contained’)? (Day16)
Q27. How many cufflinks transcripts are novel splice variants of reference genes? (Day8)
Q28. How many cufflinks transcripts are novel splice variants of reference genes? (Day16)
Q29. How many cufflinks transcripts were formed in the introns of reference genes? (Day8)
Q30. How many cufflinks transcripts were formed in the introns of reference genes? (Day16)
Q31. How many genes (loci) were reported in the merged.gtf file?
Q32. How many transcripts?
Q33. How many genes total were included in the gene expression report from cuffdiff?
Q34. How many genes were detected as differentially expressed?
Q35. How many transcripts were differentially expressed between the two samples?