We want to include data that changes frequently; otherwise, we are wasting space including redundant data. Of the options given, both “date/time” and “price of the stock” are good fits. The other data points are either static or change relatively infrequently.
Your task is to calculate the mean volume for each of the given symbols.
Given a DataFrame df containing a "Volume" column, the following code returns the mean of the values in that column:
df["Volume"].mean()
Your task is to plot the high prices for IBM.
First, we need to make sure that we read in the right CSV, which we can accomplish with:
df = pd.read_csv('data/IBM.csv')
Next, we can retrieve the “High” column and plot it:
See the breakdown here (for 2018 and 2019).
We can avoid having to explicitly call
dropna on line 22 by passing a certain value for the ‘how’ parameter of the
join call on line 19.
What is the value of that parameter?
If we look at the code we have written so far, we see some duplication.
Your task is to consolidate this code into one location: the utility function
We have consolidated both the DataFrame initialization code and the joining code.
Additionally, since we are using SPY as our baseline, we drop all rows from
df where SPY has not traded - that is, where the SPY column has
NaN values - with the following code:
df = df.dropna(subset=["SPY"])
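As a rough illustration of the join-then-drop pattern described above, here is a minimal sketch. The dates and prices are invented for the example, and the two small DataFrames stand in for data that would normally come from CSV files.

```python
import pandas as pd

# Hypothetical data: SPY did not trade on 2019-01-03, IBM did not trade on 2019-01-04.
dates = pd.date_range("2019-01-02", "2019-01-04")
df = pd.DataFrame(index=dates)
spy = pd.DataFrame({"SPY": [250.0, 252.0]},
                   index=pd.to_datetime(["2019-01-02", "2019-01-04"]))
ibm = pd.DataFrame({"IBM": [115.0, 112.0]},
                   index=pd.to_datetime(["2019-01-02", "2019-01-03"]))

# The default left join keeps all three dates and fills the gaps with NaN.
df = df.join(spy).join(ibm)

# Drop only the rows where the SPY baseline is missing; IBM gaps are kept.
df = df.dropna(subset=["SPY"])
print(df)
```

Only the date on which SPY has no value is removed; the remaining IBM gap stays in place as NaN.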
While both of these are technically correct, the second approach leverages vectorization, which is much faster than the iterative approach. Read more about vectorization here.
Your task is to write code that first slices
df along the rows bounded by
end_index and across
columns, and then passes the slice to the
We can create our slice using the following code, which we then pass to the
We need to move data from the last two rows and the last two columns of
nd2 to the first two rows and the first two columns of
nd1.
Let’s first look at how we can slice
nd2 to extract the data we want. We can slice the last two rows using negative indexing:
-2:. We can slice the last two columns as
2:4. Remember that NumPy indexing is upper-bound exclusive.
Now let’s see how we can transplant that data into
nd1. We can select the first two rows of
nd1 as
0:2, and we can select the first two columns as
0:2.
The complete data transfer can be accomplished with the following code:
nd1[0:2, 0:2] = nd2[-2:, 2:4]
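To make the slice assignment concrete, the following sketch runs it on two small arrays. The shapes and contents of nd1 and nd2 are assumptions chosen for the example; the original arrays are not shown in the text.

```python
import numpy as np

nd1 = np.zeros((4, 4), dtype=int)       # hypothetical destination array
nd2 = np.arange(16).reshape(4, 4)       # hypothetical source array holding 0..15

# Last two rows and last two columns of nd2 -> first two rows and columns of nd1.
nd1[0:2, 0:2] = nd2[-2:, 2:4]
print(nd1)
```

With a four-column source, 2:4 and -2: select the same two columns, so either form works here.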
We saw that the elements in an array created by
np.ones are floats by default. Our task here is to update the call to
np.ones and pass in a parameter that tells NumPy to give us integers instead of floats.
We can accomplish this change with the following code:
np.ones((5, 4), dtype=np.int_)
Our task is to implement the function
get_max_index, which takes a one-dimensional ndarray
a and returns the index of the maximum value.
We can retrieve the index of the maximum value in
a with the
argmax method:
def get_max_index(a):
    return a.argmax()
Assume we are using a rolling mean to track the movement of a stock. We are looking for an opportunity to find when the price has diverged significantly far from the rolling mean, as we can use this divergence to signal a buy or a sell.
Which statistic might we use to discover if the price has diverged significantly enough?
Standard deviation gives us a measure of divergence from the mean. Therefore, if the price moves more than some multiple of the rolling standard deviation away from the rolling mean, we may conclude that the price has moved significantly enough for us to consider buying or selling the stock.
Computing Bollinger bands consists of three main steps: first, computing the rolling mean; second, computing the rolling standard deviation; and third, computing the upper and lower bands.
Our goal is to implement the three functions below to accomplish these tasks.
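Given a pandas Series or DataFrame
values (the rolling method is not available on a plain ndarray) and a window size
window, we can calculate the rolling standard deviation as follows:
# OLD (no longer available in current pandas)
pd.rolling_std(values, window=window)
# NEW
values.rolling(window).std()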
Given a rolling mean
rm and a rolling standard deviation
rstd, we can calculate the Bollinger bands as follows:
rm + (2 * rstd), rm - (2 * rstd)
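The three steps can be put together as in the sketch below. The function names and the sample prices are assumptions made for illustration, since the original function stubs are not reproduced in the text.

```python
import pandas as pd

def get_rolling_mean(values, window):
    """Rolling mean over the given window size."""
    return values.rolling(window).mean()

def get_rolling_std(values, window):
    """Rolling (sample) standard deviation over the given window size."""
    return values.rolling(window).std()

def get_bollinger_bands(rm, rstd):
    """Upper and lower bands, two standard deviations from the rolling mean."""
    return rm + 2 * rstd, rm - 2 * rstd

prices = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0, 12.0])  # hypothetical prices
window = 3
rm = get_rolling_mean(prices, window)
rstd = get_rolling_std(prices, window)
upper, lower = get_bollinger_bands(rm, rstd)
print(pd.DataFrame({"price": prices, "upper": upper, "lower": lower}))
```

The first window - 1 rows are NaN because a full window of data is not yet available.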
Our task is to implement a function
compute_daily_returns that receives a DataFrame
df and returns a DataFrame consisting of the daily return values. The returned DataFrame must have the same number of rows as
df, and any rows with missing data must be filled with zeroes.
Given a DataFrame
df with
n rows, we can create a new DataFrame in which every value is shifted down
m rows with the following code:
df.shift(m)
Note that the result still has
n rows: the first
m rows at the top of the DataFrame will be filled with
NaN, and the last
m rows of the original data fall off the end.
Therefore, we can divide every price in our DataFrame
df by the price on the previous day like this:
df / df.shift(1)
We can complete the daily return calculation, and generate a DataFrame
daily_returns like so:
df / df.shift(1) - 1
The only thing we have to consider now is the first row in
daily_returns. Since the first row in the shifted DataFrame is filled with
NaN values, any subsequent mathematical operations on those values yield
NaN.
However, our task was to fill any rows with missing data with zeroes. We can fix
daily_returns like this:
# Deprecated (.ix was removed in pandas 1.0)
daily_returns.ix[0, :] = 0
# Use this instead
daily_returns.iloc[0, :] = 0
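Putting the pieces together, one possible implementation of compute_daily_returns looks like the sketch below. The sample prices are invented for the example.

```python
import pandas as pd

def compute_daily_returns(df):
    """Return daily returns with the same shape as df; the first row is zero."""
    daily_returns = df / df.shift(1) - 1   # today's price over yesterday's, minus one
    daily_returns.iloc[0, :] = 0           # the first row has no previous day
    return daily_returns

prices = pd.DataFrame({"SPY": [100.0, 110.0, 99.0]})  # hypothetical prices
returns = compute_daily_returns(prices)
print(returns)
```

Here the price rises 10% and then falls 10%, so the returns are 0, roughly 0.10, and roughly -0.10.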
Our task is to find the parameter that we need to pass to
fillna to fill forward missing values.
If we look at the following plot of stock price data, we can see several gaps.
Our task is to use the
fillna method to fill these gaps.
We can use forward filling for gaps that have a definitive start date, and backward filling for gaps that have no beginning or begin before our date range.
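A small sketch of the fill order described above: forward fill first, then backward fill for any gap at the very start. It uses the ffill and bfill methods, which recent pandas versions provide as equivalents of the forward and backward fill options of fillna; the price series is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with a gap at the start, in the middle, and at the end.
prices = pd.Series([np.nan, 10.0, np.nan, np.nan, 12.0, np.nan])

# Forward fill carries the last known price into later gaps;
# backward fill then handles the leading gap that has no earlier value.
filled = prices.ffill().bfill()
print(filled.tolist())
```

Filling forward first avoids peeking into the future for any gap that has a known earlier price.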
Suppose that we've taken all of the SPY pricing data from over the years, generated an array of daily returns, and created a histogram from those returns.
Which of the following shapes would the histogram most likely have?
A common practice in finance is to plot histograms of the daily returns of different stocks together to assess how the stocks relate to each other.
Below are the daily returns histograms for SPY and XYZ, as well as three statements describing the relationship between the two.
Which statement do you think is correct?
Note that "vol" refers to volatility and not volume.
We can see that the mean of SPY is slightly higher than the mean of XYZ, indicating that SPY outperforms XYZ.
Additionally, we can see that the XYZ curve is "flatter" than the SPY curve. This feature indicates that the daily returns of XYZ are more spread out than those of SPY, which are more centralized.
In summary, XYZ has both lower returns and higher volatility than SPY.
Given what we just learned about correlation and slope (beta), let's look at two scatterplots with their best-fit lines, and choose the most accurate statement.
The best-fit line in the SPY vs. ABC scatterplot has a higher beta because that line has a larger slope than the corresponding line in the SPY vs. XYZ plot.
Additionally, the SPY and ABC daily returns are more highly correlated, which can be determined visually from examining how "tightly" they hug the best-fit line in the SPY vs. ABC plot.
The Sharpe ratio allows us to consider our returns in the context of risk: the standard deviation, or volatility, of the returns. When we look at portfolio performance, we don't typically look at raw returns; instead, we adjust the returns received for the risk borne.
With this in mind, let's look at three comparisons of two stocks, ABC and XYZ, and decide which is better.
For the first comparison, ABC is better. ABC and XYZ have similar amounts of volatility, but ABC has double the return of XYZ.
For the second comparison, XYZ is better. Both ABC and XYZ have the same return, but ABC is much more volatile than XYZ.
For the third comparison, we can't tell which is better, given the information provided. ABC has a higher return than XYZ, but that return is offset by higher volatility.
We need a quantitative measure to compare ABC and XYZ in this third example, and the Sharpe ratio is that measure.
Consider the following three factors.
How would you combine these three factors into a simple equation to create a metric that provides a measure of risk-adjusted return?
Only the third choice meets the two criteria we described earlier; all else being equal, higher returns increase our metric, and lower risk increases our metric. Additionally, a higher rate of risk-free return decreases our metric.
Assume we have been trading a strategy for 60 days now. On average, our strategy returns one-tenth of one percent per day. Our daily risk-free rate is two one-hundredths of a percent. The standard deviation of our daily return is one-tenth of one percent.
What is the Sharpe ratio of this strategy?
In financial terminology, one one-hundredth of one percent is known as a basis point, or "bip". Instead of saying, for example, that our strategy returns one-tenth of one percent per day, we could say it returns 10 bps per day.
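Let's recall our formula for the Sharpe ratio, where $K$ is the number of samples per year:
$$S = \sqrt{K} \cdot \frac{\overline{R_p - R_f}}{\sigma_{R_p}}$$
Given that $R_p = 10$ bps on average, $R_f = 2$ bps on average, and $\sigma_{R_p} = 10$ bps, with a daily sample rate ($K = 252$):
$$S = \sqrt{252} \cdot \frac{10 - 2}{10} \approx 12.7$$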
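The arithmetic for this quiz can be checked with a few lines of Python. The function name and the default of 252 trading days per year are assumptions for this sketch; 252 is a common convention for daily sampling.

```python
import math

def sharpe_ratio(mean_return, risk_free_rate, std_return, samples_per_year=252):
    """Annualized Sharpe ratio from per-sample (here: daily) statistics."""
    return math.sqrt(samples_per_year) * (mean_return - risk_free_rate) / std_return

# 10 bps mean daily return, 2 bps daily risk-free rate, 10 bps daily volatility.
s = sharpe_ratio(0.0010, 0.0002, 0.0010)
print(round(s, 2))
```

Note that the 60 days of trading history does not enter the calculation; only the sampling frequency does.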
Which of the following functions would be hard for the minimizer to solve?
The first graph is hard because of the "flat" areas on either side of the parabola. A minimizer testing a point in the middle of this area wouldn't be able to find any gradient to follow, so it wouldn't know how to adjust the value it was currently testing.
The second graph is hard because it has several local minima that aren't necessarily the global minimum. A minimizer might "get stuck" in a local minimum, even though a more significant, global minimum exists.
The fourth graph is challenging both because of the "flat" area and the discontinuity between the two halves.
Let's assume that a point $p_i$ has an error $e_i$, which is the vertical distance between $p_i$ and the best-fit line currently under consideration. Given a number of such errors $e_1, \ldots, e_n$, which of the following expressions describes the metric we want to minimize?
We want to minimize the sum of the errors, but we want to ensure that errors above and below the line do not cancel out. To accomplish this, we need to make each error positive by either squaring it or taking its absolute value.
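A quick numeric sketch shows why the raw sum is a poor metric. The error values are invented, and the helper names are hypothetical.

```python
def sum_abs_error(errors):
    """Sum of absolute errors: every miss counts as a positive amount."""
    return sum(abs(e) for e in errors)

def sum_squared_error(errors):
    """Sum of squared errors: also positive, and penalizes large misses more."""
    return sum(e ** 2 for e in errors)

errors = [1.5, -1.5, 0.5, -0.5]   # hypothetical errors above and below the line

raw_sum = sum(errors)             # errors cancel out, hiding the misfit
abs_sum = sum_abs_error(errors)
sq_sum = sum_squared_error(errors)
print(raw_sum, abs_sum, sq_sum)
```

The raw sum is zero even though no point lies on the line, while both alternative metrics report a nonzero total.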
Let's assume we have a portfolio of four stocks, and we want to find the optimal allocations that maximize some performance metric. Which of the following metrics would be easiest to optimize for?
It would be easiest to write an optimizer for cumulative return. To do so, all we need to do is allocate 100% of our portfolio into whichever stock had the highest cumulative return.
Optimizing for minimum volatility or Sharpe ratio involves evaluating various combinations of stocks, which is more complicated than simply putting all our eggs in one basket.
Let's think about building a model to use in trading. Which of the following factors might be input values ($X$) to the model, and which might be output values ($Y$)?
Since we often use models to predict values in the future, both future price and future return make sense as output values. Our model might make these predictions by considering price momentum, current price, and Bollinger values as input.
We've identified that, in KNN, for a particular query $q$, we want to utilize the $k$ nearest data points to $q$ to come up with a prediction. What should we do with those neighboring data points to find that prediction?
Remember that we want to predict a $y$-value for the queried $x$-value. As a result, it doesn't make sense to take the average of the $x$-values of the nearest neighbors. Additionally, we don't want to take the largest $y$-value; otherwise, the other neighbors have no influence on the prediction. The correct approach here is to take the mean of their $y$-values.
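The idea can be sketched for one-dimensional data as follows. This is an illustrative brute-force version, not the course's implementation; the function name and training data are assumptions.

```python
import numpy as np

def knn_predict(x_train, y_train, query, k):
    """Predict y for a query x as the mean y of the k nearest training points."""
    distances = np.abs(x_train - query)      # 1-D distance to every training point
    nearest = np.argsort(distances)[:k]      # indices of the k closest points
    return y_train[nearest].mean()

# Hypothetical training data
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

pred = knn_predict(x_train, y_train, query=2.1, k=3)
print(pred)
```

With k equal to 1, a query at a training point returns that point's own y-value; with k equal to the size of the data set, every query returns the overall mean of the y-values.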
Let's consider the relationship between variables in two different scenarios.
The first scenario involves firing a cannon. The independent variable is the value of the angle that the cannon makes with the ground, and the dependent variable is the horizontal distance the cannonball travels.
The second scenario involves attracting bees to a food source. The independent variable is the richness of the food source, and the dependent variable is the number of bees attracted to that source.
Note that this scenario is slightly different than the first because it's not clear that the number of bees always increases as richness increases.
Given these two scenarios, which, if any, is better modeled using a parametric model, and which, if any, is better modeled using a non-parametric model?
In the first scenario, we can start with an estimate of the underlying behavior of the system in terms of a mathematical equation that expresses how it behaves. This equation is just the equation of trajectory, which we can find online. We can then learn the parameters of this equation such that it describes the relationship between our variables.
In the second scenario, we have no initial estimate for the underlying mathematical equation, so it's better to use a non-parametric model, which can model any "shape" of relationship.
Consider the following three models, each generated using a different value for $k$.
Our first task is to match the value of $k$ with the corresponding plot. Our second task is to decide whether we increase the chances of overfitting as we increase $k$. An overfit model matches the training set very well but fails to generalize to new examples.
Let's consider the case where $k = 1$. In this case, the model passes through every point directly, since near any training point $p$, the only point that has any influence is $p$ itself.
Now consider the case where $k = N$, the number of points in the data set. In this case, every point considers all of the neighbors. Thus, the generated model is a horizontal line passing through the mean of the $y$-values of all the points.
Of course, when $1 < k < N$, the graph lies between these two extremes. For such a $k$, the graph roughly follows the points without passing through them directly.
As a result, we see that increases in $k$ decrease the probability of overfitting.
Consider the following three polynomial models. The difference between each model is the degree of the polynomial, $d$.
Our first task is to match the value of $d$ with the corresponding plot. Our second task is to decide whether we increase the chances of overfitting as we increase $d$.
A polynomial of degree one matches the equation $y = m_1 x + b$, which is the equation of a line and corresponds to the third plot.
A polynomial of degree two matches the equation $y = m_2 x^2 + m_1 x + b$, which is the equation of a parabola and corresponds to the first plot.
A third-order polynomial matches the equation $y = m_3 x^3 + m_2 x^2 + m_1 x + b$, which corresponds to the second plot.
We see that as we increase $d$, our model begins to follow the points more closely. Indeed, it can be shown that for $N$ points with distinct $x$-values, a polynomial of degree at most $N - 1$ exists that passes through each point.
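The effect of the degree can be seen numerically with NumPy's polyfit. The four data points are invented; the claim being illustrated is the standard result that N points with distinct x-values can be matched exactly by a polynomial of degree N - 1.

```python
import numpy as np

# Hypothetical data: four points that do not lie on a line.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

line = np.polyfit(x, y, deg=1)             # degree one: best-fit line
exact = np.polyfit(x, y, deg=len(x) - 1)   # degree N - 1: passes through every point

# Largest vertical miss of each model on the training points.
line_error = np.abs(np.polyval(line, x) - y).max()
exact_error = np.abs(np.polyval(exact, x) - y).max()
print(line_error, exact_error)
```

The cubic has essentially zero in-sample error, which is exactly the behavior that signals a risk of overfitting.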
Notice that for each of these models, we can extrapolate beyond the data given. This ability to extrapolate is a property of parametric models that instance-based models lack.
Suppose we just built a model. Which error would you expect to be larger: in-sample or out-of-sample?
In general, the out-of-sample error is worse than the in-sample error.
Let's think about the relationship between RMS error and the correlation between the predicted and actual values of $Y$. Which of the following statements is true?
In most cases, correlation decreases as RMS error increases. However, it is possible to construct examples where correlation increases as RMS error increases.
Let's consider overfitting in KNN and how in-sample and out-of-sample error change as $k$ increases from 1 to the number of items in a data set.
Which of the following plots correctly represents the shape of the error curves that we would expect for both types of error as we increase $k$?
Remember that KNN models are least generalized when $k = 1$. In other words, when $k = 1$, the model predicts each training point in the data set perfectly but fails to predict testing points accurately. As a result, KNN models overfit when $k$ is small.
There are a few other factors worth considering when evaluating a learning algorithm. For each of the following factors, which of the two models has better performance?
Linear regression models require less space for persistence than KNN models. A polynomial model of degree four can be described with as few as five coefficients, while a KNN model must retain every single data point ever seen.
KNN models require less compute time to train than linear regression models. In fact, KNN models require zero time to train.
Linear regression models process queries more quickly than KNN models. The query time for a linear regression model is constant. The query time for a KNN model grows with the size of the data set, because each query must be compared against the stored data points.
Adding new data is quicker in KNN than in linear regression. Incorporating new data into a model requires retraining the model, but, as we just saw, the training time for a KNN model is zero.
How might we go about building an ensemble of learners?
We can create an ensemble by training several parameterized polynomials of differing degrees (A) or by training several KNN models using different subsets of data (B).
It doesn't make sense to train KNN models using randomized $y$-values, because we want to train any model we plan to use on the actual data in our training set. As a result, neither (C) nor (E) is correct.
Instead of using just polynomial learners or KNN learners, we can combine the two into a super ensemble (D) for even better results.
Which of these two models is least likely to overfit?
Aside: the screenshot says "most likely to overfit", but he selects the model that is least likely to overfit.
As we increase the number of models in our ensemble, which of the following strategies is more likely to overfit?
AdaBoost focuses primarily on improving the system for specific data points; in other words, it strives to fit. As a result, it is more susceptible to overfitting than is simple bagging.
Typically, symbols for ETFs have three or four letters, while mutual fund symbols usually have five. Hedge funds don't have symbols; instead, we refer to them by their full name.
So far, we've looked at two different incentive structures: expense ratios and the two and twenty rule. Which of the following actions might these compensation mechanisms incentivize?
The expense ratio, which is derived entirely from AUM, primarily incentivizes AUM accumulation. Additionally, since the "two" of two and twenty is based on AUM, that incentive structure overall slightly incentivizes AUM accumulation.
ETF managers and mutual fund managers are not compensated for making profits. ETFs, for example, are specifically designed to track an index, and they don't particularly care whether an index goes up or down. The two and twenty rule incentivizes profit, as the "twenty" component is earned through profit gains.
Funds that compensate according to expense ratios are not incentivized to take risks at all. Under the two and twenty model, however, risk-taking is incentivized since significant profit gains can be realized by undertaking considerable risk.
Additionally, fund managers under the two and twenty rule are insulated from risk by the 2% expense ratio that they receive no matter what. As a result, they experience the upside of risk and a minimized downside.
Consider the following order book. Do you think the price of this equity is likely to go up or down in the near future?
The price is likely to drop in the near future because there is more selling pressure than buying pressure.
Consider what would happen if we put in a market order to sell 200 shares. We would get 100 shares at $99.95, 50 shares at $99.90, and 50 shares at $99.85. Our single order would cause the price of the equity to drop by $0.10.
On the other hand, suppose we issue a market order to buy 200 shares. We would receive 200 of the 1000 shares available for sale at $100. The next market buy order would start with the remaining 800 shares for sale at $100. In other words, our buy order wouldn't affect the sale price at all.
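The two market orders can be traced with a small simulation. The order book below contains only the price levels named in the text (the full book is not reproduced), and fill_market_order is a hypothetical helper written for this sketch.

```python
def fill_market_order(levels, quantity):
    """Walk price levels (best price first) and return a list of (price, shares) fills."""
    fills = []
    remaining = quantity
    for price, available in levels:
        if remaining == 0:
            break
        take = min(available, remaining)
        fills.append((price, take))
        remaining -= take
    return fills

bids = [(99.95, 100), (99.90, 50), (99.85, 50)]  # buyers, highest price first
asks = [(100.00, 1000)]                           # sellers, lowest price first

sell_fills = fill_market_order(bids, 200)  # a market sell consumes bids
buy_fills = fill_market_order(asks, 200)   # a market buy consumes asks
print(sell_fills)
print(buy_fills)
```

The sell order eats through three bid levels, while the buy order is absorbed by a single ask level, which is the imbalance the answer relies on.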
Suppose we've been watching IBM, and we decide to short it when it reaches $100 because we think that it is going to go down. If we short 100 shares at $100 per share and submit an order to buy back the shares at $90 per share to close out our position, what is our net return?
Each time IBM drops $1 in price, we make $100 because we are shorting 100 shares. Altogether, the stock dropped $10, so we made $1000.
Suppose there is a company that consistently generates $1 per year. What is that company worth?
Assume that we are in a position to receive one of three different assets.
The first asset is a $1 bill: cold, hard cash. The second asset is a Tucker Balch bond; essentially, a promise certified by the professor that he will pay us $1 in one year. The third asset is a US Government bond, which also pays out $1 in one year, but is backed by the United States government.
Which of these assets would you rather receive? Rank the choices from 1 (best) to 3 (worst).
The most valuable asset among these three is the $1 delivered right now because you can spend it right now. The other two are promises for a reward at some point in the future. Among these two bonds, the one backed by the US government is likely more valuable than the one backed by the professor.
Consider a company that pays a dividend of $2 per year. Given a discount rate of 4%, what is the intrinsic value of this company?
The present value, $PV$, of a company is equal to the future value, $FV$, divided by the discount rate, $DR$. Given $FV = \$2$ and $DR = 0.04$,
$$PV = \frac{FV}{DR} = \frac{\$2}{0.04} = \$50$$
Consider a fictitious airline company.
This company owns 10 airplanes, each valued at $10 million. Additionally, it has a brand name worth another $10 million. Finally, it has an outstanding loan for $20 million. What is the book value of this company?
This company pays $1 million per year in dividends. Assuming a 5% discount rate, what is the intrinsic value of this company?
This company has one million shares of stock outstanding. Given a stock price of $75 per share, what is the market capitalization of this company?
To calculate book value, we take the value of the total assets and subtract the intangible assets, like the brand, and the liabilities. Given $110,000,000 in total assets, a $10,000,000 intangible asset, and a $20,000,000 liability, the book value for this company is $80,000,000.
To compute the intrinsic value, we divide the value of the dividends, $1,000,000, by the discount rate, 0.05, to get $20,000,000.
To compute the market capitalization, we take the product of the share price, $75, and the number of outstanding shares, 1,000,000, to get $75,000,000.
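The three valuations can be reproduced directly from the numbers in the question. The variable names are chosen for this sketch.

```python
# Figures from the airline example
total_assets = 10 * 10_000_000 + 10_000_000   # ten airplanes plus the brand name
intangible_assets = 10_000_000                # the brand name
liabilities = 20_000_000                      # the outstanding loan

book_value = total_assets - intangible_assets - liabilities

annual_dividends = 1_000_000
discount_rate = 0.05
intrinsic_value = annual_dividends / discount_rate

share_price = 75
shares_outstanding = 1_000_000
market_cap = share_price * shares_outstanding

print(book_value, intrinsic_value, market_cap)
```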
Since this company has a market capitalization of $75,000,000, we could buy all of the shares and, effectively, the company, for $75,000,000. Should we?
It might seem like a tricky question, given such a relatively low intrinsic value, but it's not. We should buy this company for $75,000,000 and then break it apart and sell the individual assets for $80,000,000 to get an immediate $5,000,000 profit.
Stock prices very rarely dip below book value for this exact reason; otherwise, predatory buyers swoop in and buy the whole company just to sell it for parts.
Consider a portfolio consisting of two stocks: Stock A and Stock B. 75% of the portfolio is in Stock A, and -25% of the portfolio is in Stock B; in other words, the portfolio has taken a short position in Stock B.
Assume that, today, the price of Stock A increases by 1%, while the price of Stock B decreases by 2%. What is the return on this portfolio?
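Remember the formula to calculate portfolio return, where $w_i$ is the weight of stock $i$ and $r_i$ is its return:
$$r_p = \sum_i w_i r_i$$
If we plug in the weights and returns for Stock A and Stock B, we can compute a portfolio return of $0.75 \times 1\% + (-0.25) \times (-2\%) = 1.25\%$.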
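The weighted sum can be checked in a couple of lines; the dictionary layout is just one convenient way to hold the numbers from the question.

```python
weights = {"A": 0.75, "B": -0.25}   # a negative weight is a short position
returns = {"A": 0.01, "B": -0.02}   # today's price changes

# Portfolio return is the weighted sum of the individual returns.
portfolio_return = sum(weights[s] * returns[s] for s in weights)
print(f"{portfolio_return:.4f}")
```

The short position in Stock B contributes a gain because both the weight and the return are negative.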
Consider the following two scatterplots. The plot on the left shows the daily returns of a fictional stock XYZ against the daily returns of the S&P 500. The plot on the right shows the daily returns of a fictional stock ABC against the daily returns of the S&P 500.
Given these two plots, which asset has a higher $\alpha$ and which has a higher $\beta$?
Recall that $\beta$ is the slope of the fit line, and $\alpha$ is the y-intercept of the fit line. We can tell from the plots that ABC has both a higher $\alpha$ and a higher $\beta$.
If we are in an upward market, do we want a portfolio with a larger or a smaller $\beta$? How about if we are in a downward market?
In upward markets, we want a portfolio with a larger $\beta$. For example, a portfolio with a $\beta$ greater than one rises even higher than the market, while a portfolio with a $\beta$ smaller than one won't be able to take full advantage of market performance.
In downward markets, we want the opposite: a smaller $\beta$. Indeed, a portfolio with a smaller $\beta$ falls less sharply in a downward market, while a portfolio with a larger $\beta$ crashes hard.
Let's consider another scenario now.
Instead of staying flat, suppose that the market went up 10%. What are the relative and absolute returns for both stocks, and what is our total return, both relative and absolute?
Let's also consider the scenario where the market goes down 10%. What are the relative and absolute returns for both stocks, and what is our total return, both relative and absolute?
Let's first look at the case where the market rises by 10%.
Consider stock A, which has a $\beta$ of 1.0 and an $\alpha$ of 0.01. A $\beta$ of 1.0 tells us that for every percentage point that the market moves, stock A moves one percent. An $\alpha$ of 0.01 tells us that stock A will move 1% above its movement with the market.
As a result, stock A moves 10% plus 1%, for a total relative return of 11%, and a total absolute return of $5.50, 11% on a $50 investment.
Let's consider stock B now, which has a $\beta$ of 2.0 and an $\alpha$ of -0.01. A $\beta$ of 2.0 tells us that for every percentage point the market moves, stock B moves two percent. An $\alpha$ of -0.01 tells us that stock B will move 1% below its movement with the market.
As a result, stock B moves 20% minus 1%, for a total relative return of 19%. However, since we shorted stock B, this return is actually -19%, and we have lost $9.50.
Let's compute the total return. Since we gained $5.50 on stock A and lost $9.50 on stock B, our total absolute return is -$4.
Calculating the relative return is a little tricky; that is, we can't just add 11% and -19% to get -8%. Instead, since we split our investment across stock A and stock B, our actual return is one-half of 11% plus one-half of -19%, or -4%.
Now, let's look at the case where the market falls 10%.
In this case, stock A falls with the market, but does 1% better, for a total relative loss of 9%, which equates to a $4.50 loss on a $50 investment.
Stock B falls twice as hard as the market, and does 1% worse on top of that, for a total relative loss of 21%. However, since we shorted stock B, this loss is actually a 21% gain, or a gain of $10.50 on a $50 investment.
Overall, this market scenario nets us 6%, or $6 on a $100 investment.
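Both scenarios can be recomputed with a short script. It assumes, as the worked numbers imply, a $50 long position in stock A and a $50 short position in stock B; the function names are hypothetical.

```python
def capm_return(beta, alpha, market_return):
    """Stock return as beta times the market return plus alpha."""
    return beta * market_return + alpha

positions = {
    "A": {"beta": 1.0, "alpha": 0.01, "dollars": 50.0},    # long $50
    "B": {"beta": 2.0, "alpha": -0.01, "dollars": -50.0},  # short $50
}

def portfolio_pnl(market_return):
    """Dollar profit or loss across both positions for a given market move."""
    return sum(
        p["dollars"] * capm_return(p["beta"], p["alpha"], market_return)
        for p in positions.values()
    )

pnl_up = portfolio_pnl(0.10)     # market rises 10%
pnl_down = portfolio_pnl(-0.10)  # market falls 10%
print(round(pnl_up, 2), round(pnl_down, 2))
```

Dividing each result by the $100 invested gives the relative returns of -4% and 6%.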
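Let's look at our two stocks again. Stock A has a $\beta$ of 1.0 and an $\alpha$ of 0.01. Stock B has a $\beta$ of 2.0 and an $\alpha$ of -0.01. What should the weights be for stock A and stock B so that we can minimize market risk?
We need to solve the following equation.
$$\beta_A w_A + \beta_B w_B = 0 \implies 1.0\,w_A + 2.0\,w_B = 0$$
We also know that the sum of the absolute values $|w_A|$ and $|w_B|$ should equal one.
$$|w_A| + |w_B| = 1$$
The first equation tells us that $|w_A| = 2|w_B|$. If we substitute $2|w_B|$ for $|w_A|$, we can solve for $|w_B|$.
$$2|w_B| + |w_B| = 1 \implies |w_B| = \tfrac{1}{3}$$
However, since we want to short B, $w_B$ is actually $-\tfrac{1}{3}$, not $\tfrac{1}{3}$. We can now solve for $w_A$.
$$w_A = -2\,w_B = \tfrac{2}{3}$$
If we plug these two weights back into our original equation, we can verify that we do get an overall $\beta$ of 0.
$$1.0 \cdot \tfrac{2}{3} + 2.0 \cdot \left(-\tfrac{1}{3}\right) = 0$$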
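The same calculation generalizes to any pair of positive betas. The sketch below assumes one long and one short position whose absolute weights sum to one; market_neutral_weights is a hypothetical helper, not part of the course code.

```python
def market_neutral_weights(beta_long, beta_short):
    """Weights (long, short) with zero net beta and absolute weights summing to one.

    Assumes both betas are positive, so the short weight is negative.
    """
    total = beta_long + beta_short
    w_long = beta_short / total
    w_short = -beta_long / total
    return w_long, w_short

w_a, w_b = market_neutral_weights(beta_long=1.0, beta_short=2.0)
portfolio_beta = 1.0 * w_a + 2.0 * w_b
print(w_a, w_b, portfolio_beta)
```

For stock A and stock B this gives two-thirds long A and one-third short B, with a portfolio beta of zero.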
Now that we know some differences between the types of data used for fundamental and technical analysis, let's look at the following four factors. Which of these are fundamental, and which are technical?
Remember that technical analysis considers only price and volume data, whereas fundamental analysis incorporates other types of data.
The moving average of price and the percent change in volume consider only price and volume, respectively, so they are both technical indicators.
P/E ratio considers both price and earnings, making it a fundamental factor. Intrinsic value, which is based on dividends, is a fundamental factor as well.
Let's consider how we might trade using Bollinger Bands. Consider the four events below, each of which involves the price of a stock crossing over a Bollinger Band. For each event, determine if the event demonstrates a buying opportunity, a selling opportunity, or no opportunity at all.
For the first event, we see the price crossing from the outside to the inside of the upper Bollinger Band. This event indicates that the price is moving back towards the moving average after a strong upward excursion. This is a sell signal.
For the second event, we see the price crossing from the inside to the outside of the lower Bollinger Band. This is not a signal, although it does indicate a significant excursion from the moving average.
For the third and fourth events, we see the price crossing from the outside to the inside of the lower Bollinger Band. These events indicate that the price is moving back towards the moving average after a strong downward excursion. Correspondingly, they are both buy signals.
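The rules from these four events can be written as a small classifier. This is an illustrative sketch: the function name, its arguments, and the band values in the usage lines are assumptions.

```python
def bollinger_signal(prev_price, price, prev_upper, upper, prev_lower, lower):
    """Classify one step of price movement relative to the Bollinger Bands."""
    if prev_price > prev_upper and price <= upper:
        return "sell"   # crossed from outside to inside the upper band
    if prev_price < prev_lower and price >= lower:
        return "buy"    # crossed from outside to inside the lower band
    return None         # any other movement, including leaving the bands, is no signal

# Hypothetical flat bands at 110 (upper) and 90 (lower)
first = bollinger_signal(112, 108, 110, 110, 90, 90)   # back inside the upper band
second = bollinger_signal(95, 88, 110, 110, 90, 90)    # leaving through the lower band
third = bollinger_signal(88, 92, 110, 110, 90, 90)     # back inside the lower band
print(first, second, third)
```

Only re-entry into the bands generates a signal; an excursion outward is treated as information but not as a trigger.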
OMSCS Notes is made with in NYC by Matt Schlenker.