Welcome to Part 13 of our Data Analysis with Python and Pandas, using Real Estate investing as an example. At this point, we've learned quite a bit about what Pandas has to offer us, and we'll come up here with a bit of a challenge! As we've covered so far, we can make relatively low-risk investments based on divergence between highly correlated state pairs and probably do just fine. We'll cover testing this strategy later on, but, for now, let's look into acquiring the other necessary data that comprises housing values: Interest rates. Now, there are many different types of mortgage rates both in the way interest is accrued as well as the time-frame for the loan. Opinions vary over the years, and depending on the current market situation, on whether you want a 10 year, 15 year, or 30 year mortgage. Then you have to consider if you want an adjustable rate, or maybe along the way you decide you want to re-finance your home.
At the end of the data, all of this data is finite, but ultimately will likely be a bit too noisy. For now, let's just keep it simple, and look into the 30 year conventional mortgage rate. Now, this data should be very negatively correlated with the House Price Index (HPI). Before even bothering with this code, I would automatically assume and expect that the correlation wont be as negatively strong as the higher-than-90% that we were getting with state HPI correlation, certainly less than -0.9, but also it should be greater than -0.5. The interest rate is of course important, but correlation to the overall HPI was so very strong because these were very similar statistics. The interest rate is of course related, but not as directly as other HPI values, or the US HPI.
First, let's grab the data. We will start by creating a new function:
def mortgage_30y():
df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key)
df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
print(df.head())
return df
mortgage_30y()
Output:
Value Date 1975-01-01 0.000000 1975-02-01 -3.393425 1975-03-01 -5.620361 1975-04-01 -6.468717 1975-05-01 -5.514316
A couple of points to make here. First, notice the new parameter added to the Quandl.get(), which was trim_start. This allows us to start the data at a certain date. We chose this Jan 1 1975 because that's when our House Price Index data starts. From here, we print the head of the data, and we are greeted with our first problem: This is 1st of the month, rather than end of month. This is going to cause trouble when we go to join this dataframe to our others. So, now what? We have learned how to resample, what if we just do a typical resample using "M," which means end of month. Maybe that'd move the data to a 31st since there's one value in the month.
def mortgage_30y():
df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key)
df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
df=df.resample('M')
print(df.head())
return df
mortgage_30y()
Output:
Value Date 1975-01-31 NaN 1975-02-28 NaN 1975-03-31 NaN 1975-04-30 NaN 1975-05-31 NaN
Well, that didn't work out so well. We're probably needing more than one data point for the calculation, so what do we do? We could attempt to adjust the date column or something, or we can do a bit of a hack. What if we just sample to daily? If we do that, the numbers will just keep repeating through the month. Then, we can resample to end of Monthly, and then all should work.
def mortgage_30y():
df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key)
df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
df=df.resample('1D')
df=df.resample('M')
print(df.head())
return df
mortgage_30y()
Output:
Value Date 1975-01-31 0.000000 1975-02-28 -3.393425 1975-03-31 -5.620361 1975-04-30 -6.468717 1975-05-31 -5.514316
We win! Next up, we can grab all of the data, join this new set to the dataframe, and now we're really cooking. Just in case you are just now joining us, or maybe you are lost along the way, here's the code up to this point:
import Quandl
import pandas as pd
import pickle
import matplotlib.pyplot as plt
from matplotlib import style
style.use('fivethirtyeight')
# Not necessary, I just do this so I do not show my API key.
api_key = open('quandlapikey.txt','r').read()
def state_list():
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
return fiddy_states[0][0][1:]
def grab_initial_state_data():
states = state_list()
main_df = pd.DataFrame()
for abbv in states:
query = "FMAC/HPI_"+str(abbv)
df = Quandl.get(query, authtoken=api_key)
print(query)
df[abbv] = (df[abbv]-df[abbv][0]) / df[abbv][0] * 100.0
print(df.head())
if main_df.empty:
main_df = df
else:
main_df = main_df.join(df)
pickle_out = open('fiddy_states3.pickle','wb')
pickle.dump(main_df, pickle_out)
pickle_out.close()
def HPI_Benchmark():
df = Quandl.get("FMAC/HPI_USA", authtoken=api_key)
df["United States"] = (df["United States"]-df["United States"][0]) / df["United States"][0] * 100.0
return df
def mortgage_30y():
df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key)
df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
df=df.resample('1D')
df=df.resample('M')
return df
Now we can do something like:
HPI_data = pd.read_pickle('fiddy_states3.pickle')
m30 = mortgage_30y()
HPI_Bench = HPI_Benchmark()
m30.columns=['M30']
HPI = HPI_Bench.join(m30)
print(HPI.head())
Output:
United States M30 Date 1975-01-31 0.000000 0.000000 1975-02-28 0.594738 -3.393425 1975-03-31 1.575473 -5.620361 1975-04-30 2.867177 -6.468717 1975-05-31 3.698896 -5.514316
Next, we can immediately run a quick correlation:
print(HPI.corr())
Output:
United States M30 United States 1.000000 -0.740009 M30 -0.740009 1.000000
That's fairly expected. -0.74 is pretty strongly negative. Obviously not as beautifully aligned as the various states were usually to eachother, but this is still obviously a useful metric. Next, we can check this metric against all of the states:
state_HPI_M30 = HPI_data.join(m30) print(state_HPI_M30.corr())
Output:
AL AK AZ AR CA CO CT \
AL 1.000000 0.944603 0.927361 0.994896 0.935970 0.979352 0.953724
AK 0.944603 1.000000 0.893904 0.965830 0.900621 0.949834 0.896395
AZ 0.927361 0.893904 1.000000 0.923786 0.973546 0.911422 0.917500
AR 0.994896 0.965830 0.923786 1.000000 0.935364 0.985934 0.948341
CA 0.935970 0.900621 0.973546 0.935364 1.000000 0.924982 0.956495
CO 0.979352 0.949834 0.911422 0.985934 0.924982 1.000000 0.917129
CT 0.953724 0.896395 0.917500 0.948341 0.956495 0.917129 1.000000
DE 0.980566 0.939196 0.942273 0.975830 0.970232 0.949517 0.981177
FL 0.918544 0.887891 0.994007 0.915989 0.987200 0.905126 0.926364
GA 0.973562 0.880261 0.939715 0.960708 0.943928 0.959500 0.948500
HI 0.946054 0.930520 0.902554 0.947022 0.937704 0.903461 0.938974
ID 0.982868 0.944004 0.959193 0.977372 0.944342 0.960975 0.923099
IL 0.984782 0.905512 0.947396 0.973761 0.963858 0.968552 0.955033
IN 0.981189 0.889734 0.881542 0.973259 0.901154 0.971416 0.919696
IA 0.985516 0.943740 0.894524 0.987919 0.914199 0.991455 0.913788
KS 0.990774 0.957236 0.910948 0.995230 0.926872 0.994866 0.936523
KY 0.994311 0.938125 0.900888 0.992903 0.923429 0.987097 0.941114
LA 0.967232 0.990506 0.909534 0.982454 0.911742 0.972703 0.907456
ME 0.972693 0.935850 0.923797 0.972573 0.965251 0.951917 0.989180
MD 0.964917 0.943384 0.960836 0.964943 0.983677 0.940805 0.969170
MA 0.966242 0.919842 0.921782 0.966962 0.962672 0.959294 0.986178
MI 0.891205 0.745697 0.848602 0.873314 0.861772 0.900040 0.843032
MN 0.971967 0.926352 0.952359 0.972338 0.970661 0.983120 0.945521
MS 0.996089 0.962494 0.927354 0.997443 0.932752 0.985298 0.945831
MO 0.992706 0.933201 0.938680 0.989672 0.955317 0.985194 0.961364
MT 0.977030 0.976840 0.916000 0.983822 0.923950 0.971516 0.917663
NE 0.988030 0.941229 0.896688 0.990868 0.912736 0.992179 0.920409
NV 0.858538 0.785404 0.965617 0.846968 0.948143 0.837757 0.866554
NH 0.953366 0.907236 0.932992 0.952882 0.969574 0.941555 0.990066
NJ 0.968837 0.934392 0.943698 0.967477 0.975258 0.944460 0.989845
NM 0.992118 0.967777 0.934744 0.993195 0.934720 0.968001 0.946073
NY 0.973984 0.940310 0.921126 0.973972 0.959543 0.949474 0.989576
NC 0.998383 0.934841 0.915403 0.991863 0.928632 0.977069 0.956074
ND 0.936510 0.973971 0.840705 0.957838 0.867096 0.942225 0.882938
OH 0.966598 0.855223 0.883396 0.954128 0.901842 0.957527 0.911510
OK 0.944903 0.984550 0.881332 0.967316 0.882199 0.960694 0.879854
OR 0.981180 0.948190 0.949089 0.978144 0.944542 0.971110 0.916942
PA 0.985357 0.946184 0.915914 0.983651 0.950621 0.956316 0.975324
RI 0.950261 0.897159 0.943350 0.945984 0.984298 0.926362 0.988351
SC 0.998603 0.945949 0.929591 0.994117 0.942524 0.980911 0.959591
SD 0.983878 0.966573 0.889405 0.990832 0.911188 0.984463 0.924295
TN 0.998285 0.946858 0.919056 0.995949 0.931616 0.983089 0.953009
TX 0.963876 0.983235 0.892276 0.981413 0.902571 0.970795 0.919415
UT 0.983987 0.951873 0.926676 0.982867 0.909573 0.974909 0.900908
VT 0.975210 0.952370 0.909242 0.977904 0.949225 0.951388 0.973716
VA 0.972236 0.956925 0.950839 0.975683 0.977028 0.954801 0.970366
WA 0.988253 0.948562 0.950262 0.982877 0.956434 0.968816 0.941987
WV 0.984364 0.964846 0.907797 0.990264 0.924300 0.979467 0.925198
WI 0.990190 0.930548 0.927619 0.985818 0.943768 0.987609 0.936340
WY 0.944600 0.983109 0.892255 0.960336 0.897551 0.950113 0.880035
M30 -0.762343 -0.678591 -0.614237 -0.747709 -0.680250 -0.747269 -0.726121
DE FL GA ... TN TX UT \
AL 0.980566 0.918544 0.973562 ... 0.998285 0.963876 0.983987
AK 0.939196 0.887891 0.880261 ... 0.946858 0.983235 0.951873
AZ 0.942273 0.994007 0.939715 ... 0.919056 0.892276 0.926676
AR 0.975830 0.915989 0.960708 ... 0.995949 0.981413 0.982867
CA 0.970232 0.987200 0.943928 ... 0.931616 0.902571 0.909573
CO 0.949517 0.905126 0.959500 ... 0.983089 0.970795 0.974909
CT 0.981177 0.926364 0.948500 ... 0.953009 0.919415 0.900908
DE 1.000000 0.947876 0.954346 ... 0.977213 0.943323 0.952441
FL 0.947876 1.000000 0.933753 ... 0.910359 0.881164 0.908197
GA 0.954346 0.933753 1.000000 ... 0.970564 0.920372 0.943421
HI 0.976226 0.909336 0.887794 ... 0.941823 0.916708 0.925630
ID 0.971421 0.947140 0.953024 ... 0.976012 0.943472 0.989533
IL 0.978133 0.948851 0.986683 ... 0.980145 0.925778 0.961563
IN 0.941916 0.873664 0.972737 ... 0.982888 0.928735 0.956452
IA 0.954993 0.888359 0.948792 ... 0.987924 0.959989 0.980798
KS 0.964387 0.903659 0.961825 ... 0.993486 0.978622 0.980113
KY 0.968469 0.895461 0.966719 ... 0.996549 0.961847 0.975918
LA 0.949931 0.899010 0.911625 ... 0.968690 0.989803 0.975590
ME 0.993413 0.932706 0.949576 ... 0.973697 0.946992 0.935993
MD 0.993728 0.968700 0.938240 ... 0.960881 0.935619 0.945962
MA 0.978758 0.931237 0.964604 ... 0.969053 0.943613 0.923883
MI 0.846668 0.846085 0.952179 ... 0.891484 0.806632 0.855976
MN 0.966800 0.955992 0.976933 ... 0.970940 0.944605 0.955689
MS 0.975673 0.917084 0.963318 ... 0.996444 0.977670 0.987812
MO 0.978316 0.936293 0.986001 ... 0.991835 0.958853 0.969655
MT 0.968166 0.909331 0.917504 ... 0.976586 0.967914 0.985605
NE 0.951875 0.888425 0.962706 ... 0.991270 0.966743 0.976138
NV 0.881209 0.971601 0.911678 ... 0.845672 0.791177 0.841324
NH 0.975576 0.943501 0.959112 ... 0.954165 0.930112 0.908947
NJ 0.995132 0.952767 0.950385 ... 0.967025 0.940268 0.935497
NM 0.980594 0.925001 0.949564 ... 0.989390 0.972216 0.986413
NY 0.993814 0.928749 0.947804 ... 0.974697 0.950417 0.937078
NC 0.977472 0.906887 0.976190 ... 0.998354 0.959839 0.976901
ND 0.926355 0.833816 0.849962 ... 0.944451 0.964373 0.942833
OH 0.927542 0.878248 0.980012 ... 0.966237 0.900707 0.935392
OK 0.917902 0.868255 0.893142 ... 0.947590 0.992422 0.951925
OR 0.969869 0.940983 0.945712 ... 0.977083 0.943652 0.991080
PA 0.994948 0.919264 0.946609 ... 0.984959 0.954439 0.956809
RI 0.984731 0.959567 0.951973 ... 0.947561 0.907964 0.906497
SC 0.983353 0.922779 0.976778 ... 0.997851 0.966682 0.979527
SD 0.963422 0.883479 0.931010 ... 0.987597 0.973825 0.979387
TN 0.977213 0.910359 0.970564 ... 1.000000 0.967678 0.982384
TX 0.943323 0.881164 0.920372 ... 0.967678 1.000000 0.956718
UT 0.952441 0.908197 0.943421 ... 0.982384 0.956718 1.000000
VT 0.992088 0.914969 0.929674 ... 0.976577 0.955538 0.947708
VA 0.994223 0.957210 0.939416 ... 0.970906 0.952162 0.953655
WA 0.985085 0.945027 0.956455 ... 0.983588 0.950234 0.984835
WV 0.968813 0.901690 0.931330 ... 0.985509 0.967845 0.983636
WI 0.970690 0.925943 0.974086 ... 0.988615 0.946572 0.977972
WY 0.938938 0.884962 0.869454 ... 0.945079 0.963628 0.965801
M30 -0.758073 -0.627997 -0.706512 ... -0.770422 -0.669410 -0.737147
VT VA WA WV WI WY M30
AL 0.975210 0.972236 0.988253 0.984364 0.990190 0.944600 -0.762343
AK 0.952370 0.956925 0.948562 0.964846 0.930548 0.983109 -0.678591
AZ 0.909242 0.950839 0.950262 0.907797 0.927619 0.892255 -0.614237
AR 0.977904 0.975683 0.982877 0.990264 0.985818 0.960336 -0.747709
CA 0.949225 0.977028 0.956434 0.924300 0.943768 0.897551 -0.680250
CO 0.951388 0.954801 0.968816 0.979467 0.987609 0.950113 -0.747269
CT 0.973716 0.970366 0.941987 0.925198 0.936340 0.880035 -0.726121
DE 0.992088 0.994223 0.985085 0.968813 0.970690 0.938938 -0.758073
FL 0.914969 0.957210 0.945027 0.901690 0.925943 0.884962 -0.627997
GA 0.929674 0.939416 0.956455 0.931330 0.974086 0.869454 -0.706512
HI 0.979103 0.976083 0.963950 0.952790 0.928536 0.935530 -0.755064
ID 0.955898 0.970393 0.994442 0.975239 0.977441 0.956742 -0.721927
IL 0.958711 0.968271 0.982702 0.962100 0.992079 0.911345 -0.753583
IN 0.937365 0.928187 0.955000 0.958981 0.982614 0.889497 -0.773100
IA 0.960204 0.955724 0.976571 0.990479 0.991509 0.955104 -0.785584
KS 0.967734 0.964949 0.977117 0.988007 0.989477 0.956913 -0.748138
KY 0.970702 0.962244 0.977386 0.985453 0.992035 0.938804 -0.785726
LA 0.958907 0.962746 0.967991 0.982913 0.957145 0.988894 -0.683956
ME 0.993570 0.990376 0.969212 0.963035 0.963999 0.929516 -0.769778
MD 0.983851 0.997558 0.981974 0.962220 0.960073 0.945807 -0.729642
MA 0.975046 0.975432 0.953441 0.947520 0.964247 0.904811 -0.758192
MI 0.817081 0.828781 0.862245 0.843538 0.918028 0.741663 -0.686146
MN 0.952722 0.969721 0.973082 0.961230 0.987026 0.927507 -0.723314
MS 0.974975 0.973635 0.986430 0.989047 0.986738 0.961005 -0.750756
MO 0.968741 0.972720 0.980907 0.974606 0.993691 0.930004 -0.747344
MT 0.974065 0.976197 0.985994 0.993622 0.972195 0.990517 -0.756735
NE 0.954657 0.949766 0.969023 0.981915 0.988942 0.938583 -0.761330
NV 0.828018 0.882206 0.882127 0.820529 0.874777 0.779155 -0.543798
NH 0.966338 0.972531 0.944892 0.930573 0.949941 0.892414 -0.722957
NJ 0.987844 0.992944 0.971273 0.956438 0.960854 0.928928 -0.743508
NM 0.977351 0.978702 0.988594 0.985877 0.976586 0.966689 -0.729704
NY 0.994142 0.989544 0.968541 0.962209 0.961359 0.929946 -0.770619
NC 0.973354 0.965901 0.981436 0.978326 0.987338 0.931717 -0.770820
ND 0.957772 0.944229 0.935840 0.972698 0.921882 0.977003 -0.763102
OH 0.912974 0.910193 0.939052 0.933308 0.974849 0.852217 -0.753133
OK 0.930105 0.933030 0.937180 0.959298 0.932422 0.969641 -0.621887
OR 0.959889 0.973285 0.995502 0.984262 0.984121 0.968156 -0.749370
PA 0.997231 0.989277 0.982052 0.978963 0.972162 0.945319 -0.779589
RI 0.970213 0.980550 0.953760 0.930845 0.950360 0.890562 -0.732558
SC 0.977946 0.975200 0.987828 0.982315 0.989425 0.943358 -0.754808
SD 0.976071 0.967219 0.976170 0.994328 0.979649 0.971496 -0.794906
TN 0.976577 0.970906 0.983588 0.985509 0.988615 0.945079 -0.770422
TX 0.955538 0.952162 0.950234 0.967845 0.946572 0.963628 -0.669410
UT 0.947708 0.953655 0.984835 0.983636 0.977972 0.965801 -0.737147
VT 1.000000 0.991347 0.975016 0.976666 0.961824 0.951637 -0.779342
VA 0.991347 1.000000 0.983402 0.973592 0.966393 0.956771 -0.745763
WA 0.975016 0.983402 1.000000 0.984210 0.984955 0.962198 -0.750646
WV 0.976666 0.973592 0.984210 1.000000 0.981398 0.977070 -0.770068
WI 0.961824 0.966393 0.984955 0.981398 1.000000 0.939200 -0.776679
WY 0.951637 0.956771 0.962198 0.977070 0.939200 1.000000 -0.702034
M30 -0.779342 -0.745763 -0.750646 -0.770068 -0.776679 -0.702034 1.000000
[51 rows x 51 columns]
The main column we're interested in here is just M30 compared to everything, so we could do this instead:
print(state_HPI_M30.corr()['M30'])
Output:
AL -0.762343 AK -0.678591 AZ -0.614237 AR -0.747709 CA -0.680250 CO -0.747269 CT -0.726121 DE -0.758073 FL -0.627997 GA -0.706512 HI -0.755064 ID -0.721927 IL -0.753583 IN -0.773100 IA -0.785584 KS -0.748138 KY -0.785726 LA -0.683956 ME -0.769778 MD -0.729642 MA -0.758192 MI -0.686146 MN -0.723314 MS -0.750756 MO -0.747344 MT -0.756735 NE -0.761330 NV -0.543798 NH -0.722957 NJ -0.743508 NM -0.729704 NY -0.770619 NC -0.770820 ND -0.763102 OH -0.753133 OK -0.621887 OR -0.749370 PA -0.779589 RI -0.732558 SC -0.754808 SD -0.794906 TN -0.770422 TX -0.669410 UT -0.737147 VT -0.779342 VA -0.745763 WA -0.750646 WV -0.770068 WI -0.776679 WY -0.702034 M30 1.000000 Name: M30, dtype: float64
Looks like Arizona (AZ) is the weakest negative correlation, at -0.614237. We can grab some more stats quickly with:
print(state_HPI_M30.corr()['M30'].describe())
Output:
count 51.000000 mean -0.699445 std 0.247709 min -0.794906 25% -0.762723 50% -0.748138 75% -0.722442 max 1.000000 Name: M30, dtype: float64
The mean here is just under -0.7, which is pretty consistent with our previous findings, and there's not too much spread here. It should be obvious logically, but the data definitely agrees here that the Mortgage rates play a significant role in housing prices. What I am finding interesting in all of this so far, is how little variance we're seeing. There are some states that diverge, but not many. Most of the states stay very sharply in line with some pretty simple rules. Our 3rd major factor, before diving in more deeply to the local areas, would be the overall economy. From here, we could start looking into demographics by state, as we dive into counties and even neighborhoods. I am wondering, however, if, given such dependable values so far, we could easily already create a formula for the HPI. If not a basic formula, I suspect we could use much of this data possibly in a random forest classifier and do well. For now, let's carry on to the general economy. We're hoping here to see correlation above 0.5. Let's cover that in the next tutorial.