Python and Machine-Learning for Asset Management with Alternative Data Sets Quiz Answers

Week 1: Python and Machine-Learning for Asset Management with Alternative Data Sets

Q1. In what ways could managers potentially benefit from misrepresenting earnings?

They misrepresent bad earnings so the stock does well
They over-hype earnings during the announcement so that the stock goes up
They sell their own holdings in the company before negative earnings announcements, which they under-hype
They buy shares in the company before positive earnings announcements, which they under-hype

Q2. If we wanted to measure foot traffic into stores on a busy street, what might be a challenge we would face?

Identifying whether customers were going into a specific store or another one
There isn’t a technology that could capture foot traffic at that level of granularity.
It is illegal
The size of the data would be too large to handle

Q3. What kind of normalization might someone want to do on foot traffic based data?

Seasonality normalization for changes in overall shopping based on season
Hourly normalization or aggregation to daily data
Normalization for specific events such as sales events
All of the above

Q4. Why is Euclidean distance not sufficiently precise for geolocation data manipulation?

It does not take into account the curvature of the Earth
The part where the difference is squared biases results
Both A & B
The numbers we obtain will be too large to process with regular computers
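
To illustrate the curvature point, here is a minimal sketch that compares a naive Euclidean distance on raw latitude/longitude degrees with the haversine great-circle distance. The function names, the 6371 km mean Earth radius, and the sample coordinates are illustrative assumptions, not taken from the course notebooks.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def euclidean_degrees(lat1, lon1, lat2, lon2):
    """Naive straight-line distance in degrees; ignores the Earth's curvature."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# One degree of longitude covers less ground at 60 degrees north than at the
# equator, yet the Euclidean distance in degrees is identical in both cases.
equator_km = haversine_km(0.0, 0.0, 0.0, 1.0)
north_km = haversine_km(60.0, 0.0, 60.0, 1.0)
print(euclidean_degrees(0.0, 0.0, 0.0, 1.0), euclidean_degrees(60.0, 0.0, 60.0, 1.0))
print(round(equator_km, 1), round(north_km, 1))
```

The two Euclidean values are equal, while the haversine distance at 60 degrees north is roughly half the distance at the equator.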

Q5. If we were converting a datetime index by using the date formatter “%m/%d/%Y”, what would/could end up happening to how the data could be grouped?

It could be grouped into hourly buckets
It could be grouped into daily buckets
It could be grouped into weekly buckets if we did another change on the index
Both B+C
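
As a rough illustration of what the “%m/%d/%Y” formatter does to grouping, the sketch below uses only the standard library. The timestamps are made up, and the weekly key (ISO year and week number) is one assumed way of making "another change on the index".

```python
from collections import Counter
from datetime import datetime

# Hypothetical pickup timestamps (made-up values for illustration).
pickups = [
    datetime(2014, 4, 1, 0, 11),
    datetime(2014, 4, 1, 17, 45),
    datetime(2014, 4, 2, 8, 30),
    datetime(2014, 4, 8, 9, 5),
]

# "%m/%d/%Y" keeps only month, day and year, so the hour is lost
# and the finest grouping left is one bucket per day.
daily_counts = Counter(ts.strftime("%m/%d/%Y") for ts in pickups)

# A further change to the key (here the ISO year and week number) gives weekly buckets.
weekly_counts = Counter(tuple(ts.isocalendar())[:2] for ts in pickups)

print(daily_counts)
print(weekly_counts)
```

Hourly buckets are no longer recoverable from the formatted strings, because the two rides on 04/01/2014 collapse into the same key.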

Q6. If we had average rides per hour for both the weekend and weekday, why might we normalize and divide by the total number of rides for each respective category?

To increase the sample size
To decrease the sample size
To compare the two on a similar scale and deal with differences in the number of rides that might be attributed to a different overall number of weekend vs. weekday rides
To deal with the difference in scale of each of the 7 days
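
The normalization described in this question can be sketched as dividing each bucket by its category total, so that weekday and weekend profiles are expressed as shares. The ride counts and bucket names below are invented for illustration.

```python
# Hypothetical ride counts per time bucket (made-up numbers for illustration).
weekday_rides = {"morning": 500, "midday": 300, "evening": 700}
weekend_rides = {"morning": 60, "midday": 120, "evening": 220}


def to_shares(counts):
    """Divide each bucket by the category total so the shares sum to 1."""
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


weekday_share = to_shares(weekday_rides)
weekend_share = to_shares(weekend_rides)

# Raw counts are dominated by the larger weekday total; shares are comparable.
for bucket in weekday_rides:
    print(bucket, round(weekday_share[bucket], 3), round(weekend_share[bucket], 3))
```

In this toy data the weekend has fewer evening rides in absolute terms, but a larger share of its rides fall in the evening, which only becomes visible after normalizing.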

Q7. What kind of seasonality did we see present in the Uber dataset?

Hourly
Weekday
Idiosyncratic (Big Shocks/clustering)
All of the above

Q8. Which of the following would not be considered consumption alternative data:

Cell phone geolocational data
Company earning announcements
Satellite images
Tweets

Q9. Which of the following are common issues with consumption data:

Accuracy of measurement
Privacy issues
Scarcity of devices/methodologies for collection
Both A+B

Q10. Why would including a table of major sporting events help our Uber analysis?

We could normalize for expected extra drop-offs in that time frame
We could observe rides attributable to the event by looking around that area and time
Both A+B
Neither A nor B

Week 2: Python and Machine-Learning for Asset Management with Alternative Data Sets

Q1. If we are computing inverse document frequency like we did in class, and we have 5 documents, in which a word appears 2, 0, 0, 0, 2 times respectively, what will the IDF term equal?

ln(4)
1
5/2
ln(5/2)
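
The course's exact IDF formula is not reproduced on this page. The sketch below assumes the common unsmoothed definition idf = ln(N / df), where N is the number of documents and df is the number of documents containing the word; other variants add smoothing terms and would give different values.

```python
import math

# Occurrences of the word in each of the 5 documents, as given in the question.
counts_per_document = [2, 0, 0, 0, 2]

n_documents = len(counts_per_document)
# Document frequency counts documents containing the word, not total occurrences.
document_frequency = sum(1 for c in counts_per_document if c > 0)

# Assumed definition: idf = ln(N / df).
idf = math.log(n_documents / document_frequency)
print(n_documents, document_frequency, idf)
```

Under this assumption the word appears in 2 of 5 documents, so the term is ln(5/2). Note that the within-document counts (2 and 2) do not enter the IDF at all.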

Q2. What is a stop word?

A word which when included messes up any text mining algorithms
Rare words that we exclude as outliers
Words that signify the ending of a sentence
Words such as “the” which are very common in many documents hence not valuable to textual analysis as they do not provide differentiating information between documents

Q3. What is the cosine similarity of the two following vectors: [5, 3, 5, 0], [-5, 0, 3, 0]?

-0.22327214
44.788391353117383
10
0
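
The cosine similarity in this question can be checked with a few lines of standard-library Python. The helper name `cosine_similarity` is an illustrative choice.

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


u = [5, 3, 5, 0]
v = [-5, 0, 3, 0]
print(cosine_similarity(u, v))
```

Here the dot product is -10 and the product of the norms is sqrt(59) * sqrt(34) = sqrt(2006), about 44.79, so the ratio is about -0.2233.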

Q4. In class, we apply a log transform. If we instead applied an exponential transformation (the opposite effect of a log transform), what would the effect on the word values be?

Very frequent words would still be worth more than less frequent words, but to a lesser degree.
Very frequent words would still be worth more than less frequent words, and to a higher degree
Common words would have no value
The effect would be the same as the log transform

Q5. Which part of TF-IDF would deal with a word that is extremely common in documents and also is very frequent?

TF
IDF
Both
Neither

Q6. Consider the case where we have a set of text, and also have the same set except every word is doubled. How will normalization impact the distance between these two documents vs. the raw word count?

The distance will be doubled
The distance will be halved
It will not impact distance
Distance will become 0
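
A small sketch of the doubled-document case, assuming that "normalization" means dividing each word count by the document's total word count and that distance is Euclidean. The sample text is made up.

```python
import math
from collections import Counter


def euclidean(counts_a, counts_b):
    """Euclidean distance between two word-weight dictionaries."""
    words = set(counts_a) | set(counts_b)
    return math.sqrt(sum((counts_a.get(w, 0) - counts_b.get(w, 0)) ** 2 for w in words))


def normalize(counts):
    """Scale counts so they sum to 1 (relative frequencies)."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


text = "risk return risk asset".split()  # made-up sample text
original = Counter(text)
doubled = Counter(text + text)  # same text with every word count doubled

raw_distance = euclidean(original, doubled)
normalized_distance = euclidean(normalize(original), normalize(doubled))
print(raw_distance, normalized_distance)
```

With raw counts the two documents are a positive distance apart, while their relative frequencies are identical, so the normalized distance is zero.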

Q7. If we substituted each match of the pattern “[A-Z]” in the string “A B CC” with the character “X”, what would be the result?

X B CC
X X CC
X X X
X X XX
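
The substitution can be tried directly with Python's `re.sub`, assuming that is the function the question has in mind.

```python
import re

# re.sub replaces every non-overlapping match; [A-Z] matches one uppercase letter at a time.
result = re.sub(r"[A-Z]", "X", "A B CC")
print(result)  # X X XX
```

Because the character class matches a single letter, each "C" is replaced separately and the spaces are left untouched.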

Q8. Which of the following is not a stop word?

The
We
He
Noun

Q9. Is it possible for there to be high cosine similarity between documents but also high distance?

No, this can’t happen
Yes, this is what should normally happen
Yes, if word counts were not normalized
Yes, because these measures have no relation to each other
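
To see how cosine similarity and distance can disagree, the sketch below compares two made-up count vectors with the same proportions but different lengths. The helper names are illustrative assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))


def distance(u, v):
    return norm([a - b for a, b in zip(u, v)])


def unit(u):
    """Rescale a vector to unit Euclidean length."""
    n = norm(u)
    return [a / n for a in u]


short_doc = [1, 2, 0, 1]      # made-up raw word counts
long_doc = [10, 20, 0, 10]    # same proportions, ten times the length

print(cosine(short_doc, long_doc))                # direction is identical
print(distance(short_doc, long_doc))              # raw distance is large
print(distance(unit(short_doc), unit(long_doc)))  # vanishes after normalization
```

For unit-length vectors the two measures are tied together by the identity d² = 2(1 − cos), so a high cosine similarity with a high distance is only possible when the vectors have not been normalized.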

Week 3: Python and Machine-Learning for Asset Management with Alternative Data Sets

Q1. Would the regular expression pattern [A-Z]{2}\s[0-9]{5} find “ma 02446” and “MA02446”?

Just the first one
Just the second one
It would find neither.
It would find both.
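
The pattern can be tested against both strings with Python's `re` module. The third candidate, “MA 02446”, is added here only as a contrasting example that does satisfy the pattern.

```python
import re

pattern = re.compile(r"[A-Z]{2}\s[0-9]{5}")

candidates = ["ma 02446", "MA02446", "MA 02446"]
matches = {text: bool(pattern.search(text)) for text in candidates}
print(matches)
```

The first string fails because [A-Z] is case-sensitive without a flag such as `re.IGNORECASE`, and the second fails because `\s` requires exactly one whitespace character between the letters and the digits.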

Q2. What data is in the 13-F? Multiple responses possible

It contains performance data for funds.
It shows holdings that funds have of different securities
It shows a breakdown for each tradeable security of what executives working at the company hold
A & B

Q3. If you knew a table in an HTML page had the class infoTable, how would you find it with BeautifulSoup?

soup.find("table", {"class": "infoTable"})
soup.find("table", "infoTable")
soup.find("table", "class", "infoTable")
None of the above

Q4. What can be found in the 10-K?

Company financials
Description of risks/competition
Litigation
All of the above

Q5. What are issues with the 10-K?

Strides have been made to standardize the template (HTML elements and overall format), but it is still imperfect, and text mining it can take work.
Companies aren’t required to report the same things at a high level.
Web scraping/pulling the data from online is not very easy.
All of the above

Q6. Explain the significance of the “decaying” of cosine similarities between 10-Ks between years.

It means the uniqueness of the document goes down as time passes
It means that documents that are further away in years are more similar
It means that documents closer in terms of year are more similar.
None of the above

Q7. What would we expect the cosine similarity to be among a company’s own 10-Ks, compared with the similarity between its 10-K and a competitor’s 10-K?

We would expect a large similarity among the company’s own 10-Ks, and a small to moderate similarity between its 10-K and its competitors’ 10-Ks.
We would expect a small to moderate similarity among the company’s own 10-Ks, and a small to moderate similarity between its 10-K and its competitors’ 10-Ks.
We would expect a large similarity among the company’s own 10-Ks, and a large similarity between its 10-K and its competitors’ 10-Ks.
We would expect a small to moderate similarity among the company’s own 10-Ks, and a large similarity between its 10-K and its competitors’ 10-Ks.

Q8. Let’s say we were looking at messy data where users input the code for their state. What might be an issue with using [A-Z]{2} for the regular expression?

The form might not specify that 2 letters must be inputted
Users could input lower case letters to the form
Users could have typos by adding numbers
All of the above
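
One hedged way to cope with the messy inputs listed above is to clean the value first and then require the whole string to match. The function name `clean_state_code` and the sample inputs are hypothetical.

```python
import re

STATE_CODE = re.compile(r"[A-Z]{2}")


def clean_state_code(raw):
    """Return a two-letter uppercase code, or None if the input cannot be one."""
    candidate = raw.strip().upper()  # tolerate stray spaces and lower case
    # fullmatch rejects extra characters that search() would silently skip over.
    return candidate if STATE_CODE.fullmatch(candidate) else None


for raw in ["MA", "ma", " Ny ", "M4", "Mass"]:
    print(repr(raw), "->", clean_state_code(raw))
```

This only checks the shape of the input; confirming that the two letters are a real state code would need a lookup against a list of valid codes.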

Q9. Why might we want to use a boolean representation of fund holdings of stocks?

It can be more informative
To reduce sample size
Both A & B
Neither A nor B

Q10. What might be issues with measuring country risk from the 10-K?

The presence of the country might not necessarily mean it is a risk depending on the context of its inclusion.
Companies don’t mention countries in their risk section
There are too many countries to track.
None of the above

Q11. If we do term frequency like in class, what does a term with 24 occurrences become?

Ln(24)
Ln(25)
5
24
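
The course's exact term-frequency transform is not shown on this page. The sketch below computes two common log-scaled variants, ln(1 + count) and 1 + ln(count), both of which are assumptions; check the course notebook to see which one applies.

```python
import math

count = 24

# Variant 1 (assumed): ln(1 + count), which keeps a zero count at zero.
tf_log1p = math.log1p(count)  # equals ln(25)

# Variant 2 (assumed): 1 + ln(count), defined only for count > 0.
tf_one_plus_log = 1 + math.log(count)

print(tf_log1p, tf_one_plus_log)
```

Either way, the log dampens large counts so that a word appearing 24 times is not weighted 24 times as heavily as a word appearing once.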

Week 4: Python and Machine-Learning for Asset Management with Alternative Data Sets

Q1. Suppose we had a network of hundreds of stocks with few connections between them, but where the connections that do exist are very strong (say 20 connections from stock A to stock B). Would our graph be sparse or dense, and would the high number of connections between particular stocks skew the type of analysis we did?

Sparse, no
Sparse, yes
Dense, no
Dense, yes

Q2. Why don’t broad dictionaries of word-sentiment mappings work in the context of text around financial data?

They don’t contain the words we need to analyze
There are too many sentiment classifications
There are many terms which have finance specific meanings like liability. In the context of regular text, this would be a negative term, but in the context of finance it really is neutral.
We can’t analyze text at just the word by word level

Q3. For sentiment analysis, which of the following words might be treated differently in finance vs. general text documents?

Asset
Liability
A+B
None of the above

Q4. Which of the following is not considered a reason that analyzing media is difficult?

Local Biases
Differences in reporting styles of different writers.
There aren’t enough sources
Fake news

Q5. Using the equation for tone, what would the tone be of an article if there were 5 positive words and 3 negative words?

2/5
2/8
-2/8
5/3
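
The tone equation itself is not reproduced on this page. The sketch below assumes the commonly used definition tone = (positive − negative) / (positive + negative); the function name is illustrative.

```python
from fractions import Fraction


def tone(positive_words, negative_words):
    """Assumed definition: (positive - negative) / (positive + negative)."""
    total = positive_words + negative_words
    if total == 0:
        return Fraction(0)  # no sentiment words: treat tone as neutral
    return Fraction(positive_words - negative_words, total)


print(tone(5, 3))  # 1/4, i.e. 2/8
```

Under this definition tone lies between -1 (all negative) and +1 (all positive), and swapping the two counts flips the sign.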

Q6. What innovation did Jegadeesh and Wu bring to sentiment analysis?

They created the dictionary of words mapped to sentiment for finance
They were the first to measure sentiment.
They created a methodology to weight the impact of word sentiment based on prior market reactions.
They made improvements to the natural language processing of sentiment.

Q7. What was the conclusion of the Fang and Peress paper?

Companies with low media coverage outperform companies with high media coverage in a statistically significant manner.
Companies with high media coverage outperform companies with low media coverage in a statistically significant manner.
Companies with low media coverage outperform companies with high media coverage in a non-statistically significant manner.
Companies with high media coverage outperform companies with low media coverage in a non-statistically significant manner.

Q8. Which of the following is media not usually used to predict?

Trading volume
Risk
All of the above are predicted with media
Future stock returns

Q9. What is the implication of Netflix having the most connections to it, but Amazon having the highest PageRank in the Twitter networks notebook?

It means Amazon has more unique connections
It means Netflix has more unique connections, but fewer total connections
It means that while Netflix has more connections, the connections that Amazon has are more reputable/have higher scores so they give Amazon the higher PageRank
It means that the summation of square root of connection strength is actually higher for Amazon
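
To show how a node with fewer incoming links can still earn a higher PageRank, here is a minimal power-iteration sketch on a toy graph. The graph, the node names, the damping factor of 0.85, and the iteration count are all assumptions for illustration; this is not the notebook's Twitter data or its library call.

```python
from collections import Counter


def pagerank(links, damping=0.85, iterations=100):
    """Plain power-iteration PageRank on a dict of node -> list of outgoing links."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source in nodes:
            targets = links.get(source, [])
            if not targets:
                # Dangling node: spread its rank evenly over all nodes.
                for node in nodes:
                    new_rank[node] += damping * rank[source] / n
            else:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: "n" has three incoming links from low-ranked nodes,
# "a" has only two, but they come from "n" itself and from "h".
links = {"x1": ["n"], "x2": ["n"], "x3": ["n"], "n": ["a"], "a": ["h"], "h": ["a"]}

in_degree = Counter(t for targets in links.values() for t in targets)
rank = pagerank(links)
print(in_degree["n"], in_degree["a"])
print(round(rank["n"], 3), round(rank["a"], 3))
```

In this toy graph "n" has more incoming links, but "a" ends up with the higher score because the nodes linking to it carry more rank themselves.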

Q10. Will PageRank deal with a) the importance of nodes in connections and b) normalization of the number of connections between a given node A and B (in terms of smoothing the distribution)?

Just A
Just B
Both A&B
Neither A nor B

