All Weeks Python and Machine-Learning for Asset Management with Alternative Data Sets Quiz Answers
Week 1: Python and Machine-Learning for Asset Management with Alternative Data Sets
Q1. In what ways could managers potentially benefit from misrepresenting earnings?
- They misrepresent bad earnings so the stock does well
- They over-hype earnings during the announcement so that the stock goes up
- They sell their own holdings in the company before negative earnings announcements, which they under-hype
- They buy their own holdings of the company before positive earnings announcements, which they under-hype
Q2. If we wanted to measure foot traffic into stores on a busy street, what might be a challenge we would face?
- Identifying whether customers were going into a specific store or another one
- There isn’t a technology that could capture foot traffic at that level of granularity.
- It is illegal
- The size of the data would be too large to handle
Q3. What kind of normalization might someone want to do on foot traffic based data?
- Seasonality normalization for changes in overall shopping based on season
- Hourly normalization or aggregation to daily data
- Normalization for specific events such as sales events
- All of the above
Q4. Why is Euclidean distance not sufficiently precise for geolocational data manipulation?
- It does not take into account the curvature of the Earth
- The part where the difference is squared biases results
- Both A & B
- The numbers we obtain will be too large to process with regular computers
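As a quick aside on the fix: the haversine formula is the standard way to get great-circle distances from latitude/longitude pairs. A minimal sketch (not from the course notebooks; the coordinates are just an illustration):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# New York to London: roughly 5,570 km, which a flat-plane formula on
# raw degree coordinates cannot reproduce
print(haversine_km(40.7128, -74.0060, 51.5074, -0.1278))
```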
Q5. If we were converting a datetime index by using the date formatter “%m/%d/%Y”, what could end up happening to how the data is grouped?
- It could be grouped into hourly buckets
- It could be grouped into daily buckets
- It could be grouped into weekly buckets if we did another change on the index
- Both B+C
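To see why, note that formatting with “%m/%d/%Y” keeps only month/day/year and drops the time component, so rows sharing a calendar date collapse into daily buckets. A minimal pandas sketch (the data here is made up):

```python
import pandas as pd

ts = pd.to_datetime(["2024-01-01 09:15", "2024-01-01 17:40", "2024-01-02 08:05"])
df = pd.DataFrame({"rides": [3, 5, 2]}, index=ts)

# "%m/%d/%Y" discards hours/minutes, so the grouping is at the daily level
daily = df.groupby(df.index.strftime("%m/%d/%Y")).sum()
print(daily)  # 01/01/2024 -> 8, 01/02/2024 -> 2
```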
Q6. If we had average rides per hour for both the weekend and weekday, why might we normalize and divide by the total number of rides for each respective category?
- To increase the sample size
- To decrease the sample size
- To compare the two on a similar scale and deal with differences in the number of rides that might be attributed to a different overall number of weekend vs. weekday rides
- To deal with the difference in scale of each of the 7 days
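A small sketch of that normalization with hypothetical hourly counts: dividing by each category’s total puts both profiles on a comparable share-of-rides scale:

```python
import pandas as pd

# Hypothetical average rides per hour (only three hours shown for brevity)
weekday = pd.Series([100, 300, 600], index=[7, 12, 18])
weekend = pd.Series([20, 90, 140], index=[7, 12, 18])

# Each series now sums to 1, so the *shapes* of the two profiles
# are directly comparable despite very different total ride counts
weekday_share = weekday / weekday.sum()
weekend_share = weekend / weekend.sum()
print(weekday_share, weekend_share, sep="\n")
```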
Q7. What kind of seasonality did we see present in the Uber dataset?
- Hourly
- Weekday
- Idiosyncratic (Big Shocks/clustering)
- All of the above
Q8. Which of the following would not be considered consumption alternative data:
- Cell phone geolocational data
- Company earning announcements
- Satellite images
- Tweets
Q9. Which of the following are common issues with consumption data:
- Accuracy of measurement
- Privacy issues
- Scarcity of devices/methodologies for collection
- Both A+B
Q10. Why would including a table of major sporting events help our Uber analysis?
- We could normalize for expected extra drop offs in that time frame
- We could observe rides attributable to the event by looking around that area and time
- Both A+B
- Neither A nor B
Week 2: Python and Machine-Learning for Asset Management with Alternative Data Sets
Q1. If we are computing inverse document frequency like we did in class, and we have 5 documents in which a word appears 2, 0, 0, 0, 2 times respectively, what will the IDF term equal?
- ln(4)
- 1
- 5/2
- ln(5/2)
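A quick check of the arithmetic, assuming the plain ln(N/df) form of IDF (course variants sometimes add smoothing, so verify against the notebook):

```python
import math

counts = [2, 0, 0, 0, 2]               # occurrences of the word in each of 5 documents
n_docs = len(counts)                    # N = 5
doc_freq = sum(c > 0 for c in counts)   # the word appears in 2 documents

idf = math.log(n_docs / doc_freq)       # ln(5/2) ≈ 0.916
print(idf)
```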
Q2. What is a stop word?
- A word which when included messes up any text mining algorithms
- Rare words that we exclude as outliers
- Words that signify the ending of a sentence
- Words such as “the” which are very common in many documents hence not valuable to textual analysis as they do not provide differentiating information between documents
Q3. What is the cosine similarity of the two following vectors: [5, 3, 5, 0], [-5, 0, 3, 0]?
- -0.22327214
- 44.788391353117383
- 10
- 0
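The arithmetic can be verified directly with a small NumPy sketch:

```python
import numpy as np

a = np.array([5, 3, 5, 0])
b = np.array([-5, 0, 3, 0])

# cos(a, b) = (a . b) / (||a|| * ||b||) = -10 / 44.788... ≈ -0.2233
cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # -0.22327214...
```

Note that 44.788391353117383 (one of the distractors) is just the product of the two norms, i.e. the denominator on its own.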
Q4. In class, we do a log transform. If instead of a log transform we did an exponential transformation (the opposite effect of a log transform), what would the effect on the word values be?
- Very frequent words would still be worth more than less frequent words, but to a lesser degree.
- Very frequent words would still be worth more than less frequent words, and to a higher degree
- Common words would have no value
- The effect would be the same as the log transform
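A tiny numeric illustration with made-up counts: the log compresses the gap between frequent and rare words, while an exponential blows it up:

```python
import numpy as np

tf = np.array([1, 10, 100])   # raw term frequencies
print(np.log(1 + tf))          # ~[0.69, 2.40, 4.62]   -> gap compressed
print(np.exp(tf))              # ~[2.7, 2.2e4, 2.7e43] -> gap hugely amplified
```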
Q5. Which part of TF-IDF would deal with a word that is extremely common across documents and also very frequent?
- TF
- IDF
- Both
- Neither
Q6. Consider the case where we have a set of text, and also have the same set except every word is doubled. How will normalization impact the distance between these two documents vs. the raw word count?
- The distance will be doubled
- The distance will be halved
- It will not impact distance
- Distance will become 0
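To see this concretely, here is a minimal sketch with made-up counts: after dividing by each document’s total word count, the doubled document has an identical vector, so the distance drops to 0:

```python
import numpy as np

doc = np.array([4, 2, 6])   # raw word counts
doubled = 2 * doc           # same text with every word repeated twice

print(np.linalg.norm(doc - doubled))    # raw-count distance: nonzero (~7.48)

norm_a = doc / doc.sum()
norm_b = doubled / doubled.sum()
print(np.linalg.norm(norm_a - norm_b))  # after normalization: 0.0
```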
Q7. If we substituted every match of the pattern “[A-Z]” in the string “A B CC” with the character “X”, what would the result be?
- X B CC
- X X CC
- X X X
- X X XX
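Checking with Python’s re module directly:

```python
import re

# "[A-Z]" matches each uppercase letter individually, so both Cs are replaced
print(re.sub("[A-Z]", "X", "A B CC"))  # -> "X X XX"
```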
Q8. Which of the following is not a stop word?
- The
- We
- He
- Noun
Q9. Is it possible for there to be high cosine similarity between documents but also high distance?
- No, this can’t happen
- Yes, this is what should normally happen
- Yes, if word counts were not normalized
- Yes, because these measures have no relation to each other
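A concrete case with made-up vectors: a long and a short document with the same word proportions point in the same direction (cosine similarity ≈ 1) yet sit far apart in raw-count space:

```python
import numpy as np

short = np.array([1, 2, 3])
long_ = np.array([100, 200, 300])  # same proportions, 100x the word counts

cos = short @ long_ / (np.linalg.norm(short) * np.linalg.norm(long_))
print(cos)                            # ≈ 1.0 -> maximal similarity
print(np.linalg.norm(short - long_))  # ≈ 370  -> large Euclidean distance
```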
Week 3: Python and Machine-Learning for Asset Management with Alternative Data Sets
Q1. Would the regular expression pattern [A-Z]{2}\s[0-9]{5} find “ma 02446” and “MA02446”?
- Just the first one
- Just the second one
- It would find neither.
- It would find both.
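Testing the pattern confirms this: it requires two uppercase letters followed by a whitespace character and five digits:

```python
import re

pattern = r"[A-Z]{2}\s[0-9]{5}"
print(re.search(pattern, "ma 02446"))  # None: "ma" is lowercase
print(re.search(pattern, "MA02446"))   # None: no whitespace before the digits
print(re.search(pattern, "MA 02446"))  # matches
```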
Q2. What data is in the 13-F? (Multiple responses possible)
- It contains performance data for funds.
- It shows holdings that funds have of different securities
- It shows a breakdown for each tradeable security of what executives working at the company hold
- A & B
Q3. If you knew a table in an HTML page had the class infoTable, how would you find it with BeautifulSoup?
- soup.find(“table”, {“class”: “infoTable”})
- soup.find(“table”, “infoTable”)
- soup.find(“table”, “class”, “infoTable”)
- None of the above
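A minimal sketch of the attrs-dict form (the HTML snippet here is made up for illustration):

```python
from bs4 import BeautifulSoup

html = '<html><body><table class="infoTable"><tr><td>AAPL</td></tr></table></body></html>'
soup = BeautifulSoup(html, "html.parser")

# Pass the attributes to match as a dict: {"class": "infoTable"}
table = soup.find("table", {"class": "infoTable"})
print(table.td.text)  # AAPL
```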
Q4. What can be found in the 10-K?
- Company financials
- Description of risks/competition
- Litigation
- All of the above
Q5. What are issues with the 10-K?
- Strides have been made to standardize the template in terms of html elements/the actual format, but it still is not perfect and can require work to figure out how to text mine.
- Companies aren’t required to report the same things at a high level.
- Web scraping/pulling the data from online is not very easy.
- All of the above
Q6. Explain the significance of the “decaying” of cosine similarities between 10-Ks across years.
- It means the uniqueness of the document goes down as time passes
- It means that documents that are further away in years are more similar
- It means that documents closer in terms of year are more similar.
- None of the above
Q7. What would we expect the cosine similarity to be between the same company’s 10-Ks over time, as well as between its 10-K and a competitor’s 10-K?
- We would expect a large similarity between the company’s own 10-Ks, and a small to moderate similarity between the competitor’s 10-K and theirs.
- We would expect a small to moderate similarity between the company’s own 10-Ks, and a small to moderate similarity between the competitor’s 10-K and theirs.
- We would expect a large similarity between the company’s own 10-Ks, and a large similarity between the competitor’s 10-K and theirs.
- We would expect a small to moderate similarity between the company’s own 10-Ks, and a large similarity between the competitor’s 10-K and theirs.
Q8. Let’s say we were looking at messy data where users input the code for their state. What might be an issue with using [A-Z]{2} for the regular expression?
- The form might not specify that 2 letters must be inputted
- Users could input lower case letters to the form
- Users could have typos by adding numbers
- All of the above
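A sketch of those failure modes; the lowercasing workaround at the end is just an illustration, not the course’s prescribed fix:

```python
import re

strict = r"[A-Z]{2}"
for entry in ["MA", "ma", "M4", "Mass"]:
    print(entry, bool(re.fullmatch(strict, entry)))
# Only "MA" fully matches; "ma" fails on case, "M4" on the digit,
# and "Mass" on length. A more tolerant check might lowercase first:
print(bool(re.fullmatch(r"[a-z]{2}", "MA".lower())))  # True
```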
Q9. Why might we want to use a boolean representation of fund holdings of stocks?
- It can be more informative
- To reduce sample size
- Both A & B
- Neither A nor B
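As a sketch of what the boolean transformation looks like, using a hypothetical holdings matrix in pandas:

```python
import pandas as pd

# Hypothetical shares held: rows are funds, columns are tickers
holdings = pd.DataFrame({"AAPL": [1000, 0], "TSLA": [0, 250]},
                        index=["Fund1", "Fund2"])

# Boolean view: does the fund hold the stock at all?
held = holdings > 0
print(held)
```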
Q10. What might be issues with measuring country risk from the 10-K?
- The presence of the country might not necessarily mean it is a risk depending on the context of its inclusion.
- Companies don’t mention countries in their risk section
- There are too many countries to track.
- None of the above
Q11. If we compute term frequency like in class, what does a term with 24 occurrences become?
- ln(24)
- ln(25)
- 5
- 24
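Assuming the class transform is ln(1 + tf), which is consistent with the answer options, 24 occurrences map to ln(25):

```python
import math

tf = 24
print(math.log(1 + tf))  # ln(25) ≈ 3.219
```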
Week 4: Python and Machine-Learning for Asset Management with Alternative Data Sets
Q1. If we had a network with hundreds of stocks in which there were few connections between them, but the connections that did exist (say, 20 connections from stock A to stock B) were very strong, would our graph be sparse or dense, and would the high level of connections between given stocks skew the type of analysis we did?
- Sparse, no
- Sparse, yes
- Dense, no
- Dense, yes
Q2. Why don’t broad dictionaries of word-sentiment mappings work in the context of text around financial data?
- They don’t contain the words we need to analyze
- There are too many sentiment classifications
- There are many terms that have finance-specific meanings, like liability. In the context of regular text this would be a negative term, but in the context of finance it really is neutral.
- We can’t analyze text at just the word by word level
Q3. For sentiment analysis, which of the following words might be different for finance vs. general text documents?
- Asset
- Liability
- A+B
- None of the above
Q4. Which of the following is not considered a reason that analyzing media is difficult?
- Local Biases
- Differences in reporting styles of different writers.
- There aren’t enough sources
- Fake news
Q5. Using the equation for tone, what would the tone be of an article if there were 5 positive words and 3 negative words?
- 2/5
- 2/8
- -2/8
- 5/3
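Assuming the usual tone formula, tone = (positive - negative) / (positive + negative):

```python
pos, neg = 5, 3
tone = (pos - neg) / (pos + neg)
print(tone)  # 2/8 = 0.25
```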
Q6. What innovation did Jegadeesh and Wu bring to sentiment analysis?
- They created the dictionary of words mapped to sentiment for finance
- They were the first to measure sentiment.
- They created a methodology to weight the impact of word sentiment based on prior market reactions.
- They made improvements in regards to the natural language processing of sentiment.
Q7. What was the conclusion of the Fang and Peress paper?
- Companies with low media coverage outperform companies with high media coverage in a statistically significant manner.
- Companies with high media coverage outperform companies with low media coverage in a statistically significant manner.
- Companies with low media coverage outperform companies with high media coverage in a non-statistically-significant manner.
- Companies with high media coverage outperform companies with low media coverage in a non-statistically-significant manner.
Q8. Which of the following is media not usually used to predict?
- Trading volume
- Risk
- All of the above are predicted with media
- Future stock returns
Q9. What is the implication of Netflix having the most connections to it, but Amazon having the highest PageRank in the Twitter networks notebook?
- It means Amazon has more unique connections
- It means Netflix has more unique connections, but less total connections
- It means that while Netflix has more connections, the connections that Amazon has are more reputable/have higher scores so they give Amazon the higher PageRank
- It means that the summation of square root of connection strength is actually higher for Amazon
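A minimal networkx sketch of the phenomenon on a made-up graph: “netflix” receives more in-links, but “amazon” receives its single link from a highly ranked hub:

```python
import networkx as nx

g = nx.DiGraph()
# "amazon" gets one link, but from a hub that many nodes point to;
# "netflix" gets three links from low-importance nodes
g.add_edges_from([(f"fan{i}", "hub") for i in range(5)])
g.add_edge("hub", "amazon")
g.add_edges_from([(f"user{i}", "netflix") for i in range(3)])

pr = nx.pagerank(g)
print(g.in_degree("netflix"), g.in_degree("amazon"))  # 3 vs 1 in-links
print(pr["amazon"] > pr["netflix"])                   # True: link quality beats link count
```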
Q10. Will PageRank deal with a) the importance of nodes in connections and b) normalization of the number of connections between a given node A and B (in terms of smoothing the distribution)?
- Just A
- Just B
- Both A & B
- Neither A nor B