Fri 11 March 2016

Boston Dataset

Linear Regressions on the famous Boston dataset

In [1]:
import pandas as pd 
import numpy as np
In [2]:
import seaborn as sns 
import matplotlib.pyplot as plt
In [3]:
%matplotlib inline
In [4]:
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older release.
from sklearn.datasets import load_boston
In [5]:
bostondt = load_boston()
In [6]:
Boston house prices dataset


**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980.   N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression problems.
In [7]:
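dataset = load_boston()
# Build a DataFrame from the feature matrix and append the house prices as 'target'
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
df['target'] = dataset.target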
In [8]:
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT  target
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98    24.0
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14    21.6
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03    34.7
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94    33.4
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33    36.2
In [9]:
Using various methods to compute the correlation
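As a rough illustration of how the three methods differ, the sketch below runs the same `DataFrame.corr` call on a small synthetic frame (not the Boston data) in which `y` rises monotonically but non-linearly with `x`. The column names and the data are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (not the Boston dataset): y grows monotonically
# but non-linearly with x, plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=200)
demo = pd.DataFrame({"x": x, "y": np.exp(x) + rng.normal(0, 1, size=200)})

# Same call as in this notebook, once per method.
methods = ("pearson", "spearman", "kendall")
corrs = {m: demo.corr(method=m).loc["x", "y"] for m in methods}
for method, value in corrs.items():
    print(f"{method:>8}: {value:.3f}")
```

Pearson measures linear association, while Spearman and Kendall work on ranks, so they tend to score a curved but monotonic relationship higher than Pearson does.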

In [10]:
             CRIM        ZN     INDUS      CHAS       NOX        RM       AGE  \
CRIM     1.000000 -0.200469  0.406583 -0.055892  0.420972 -0.219247  0.352734   
ZN      -0.200469  1.000000 -0.533828 -0.042697 -0.516604  0.311991 -0.569537   
INDUS    0.406583 -0.533828  1.000000  0.062938  0.763651 -0.391676  0.644779   
CHAS    -0.055892 -0.042697  0.062938  1.000000  0.091203  0.091251  0.086518   
NOX      0.420972 -0.516604  0.763651  0.091203  1.000000 -0.302188  0.731470   
RM      -0.219247  0.311991 -0.391676  0.091251 -0.302188  1.000000 -0.240265   
AGE      0.352734 -0.569537  0.644779  0.086518  0.731470 -0.240265  1.000000   
DIS     -0.379670  0.664408 -0.708027 -0.099176 -0.769230  0.205246 -0.747881   
RAD      0.625505 -0.311948  0.595129 -0.007368  0.611441 -0.209847  0.456022   
TAX      0.582764 -0.314563  0.720760 -0.035587  0.668023 -0.292048  0.506456   
PTRATIO  0.289946 -0.391679  0.383248 -0.121515  0.188933 -0.355501  0.261515   
B       -0.385064  0.175520 -0.356977  0.048788 -0.380051  0.128069 -0.273534   
LSTAT    0.455621 -0.412995  0.603800 -0.053929  0.590879 -0.613808  0.602339   
target  -0.388305  0.360445 -0.483725  0.175260 -0.427321  0.695360 -0.376955   

              DIS       RAD       TAX   PTRATIO         B     LSTAT    target  
CRIM    -0.379670  0.625505  0.582764  0.289946 -0.385064  0.455621 -0.388305  
ZN       0.664408 -0.311948 -0.314563 -0.391679  0.175520 -0.412995  0.360445  
INDUS   -0.708027  0.595129  0.720760  0.383248 -0.356977  0.603800 -0.483725  
CHAS    -0.099176 -0.007368 -0.035587 -0.121515  0.048788 -0.053929  0.175260  
NOX     -0.769230  0.611441  0.668023  0.188933 -0.380051  0.590879 -0.427321  
RM       0.205246 -0.209847 -0.292048 -0.355501  0.128069 -0.613808  0.695360  
AGE     -0.747881  0.456022  0.506456  0.261515 -0.273534  0.602339 -0.376955  
DIS      1.000000 -0.494588 -0.534432 -0.232471  0.291512 -0.496996  0.249929  
RAD     -0.494588  1.000000  0.910228  0.464741 -0.444413  0.488676 -0.381626  
TAX     -0.534432  0.910228  1.000000  0.460853 -0.441808  0.543993 -0.468536  
PTRATIO -0.232471  0.464741  0.460853  1.000000 -0.177383  0.374044 -0.507787  
B        0.291512 -0.444413 -0.441808 -0.177383  1.000000 -0.366087  0.333461  
LSTAT   -0.496996  0.488676  0.543993  0.374044 -0.366087  1.000000 -0.737663  
target   0.249929 -0.381626 -0.468536 -0.507787  0.333461 -0.737663  1.000000  
             CRIM        ZN     INDUS      CHAS       NOX        RM       AGE  \
CRIM     1.000000 -0.571660  0.735524  0.041537  0.821465 -0.309116  0.704140   
ZN      -0.571660  1.000000 -0.642811 -0.041937 -0.634828  0.361074 -0.544423   
INDUS    0.735524 -0.642811  1.000000  0.089841  0.791189 -0.415301  0.679487   
CHAS     0.041537 -0.041937  0.089841  1.000000  0.068426  0.058813  0.067792   
NOX      0.821465 -0.634828  0.791189  0.068426  1.000000 -0.310344  0.795153   
RM      -0.309116  0.361074 -0.415301  0.058813 -0.310344  1.000000 -0.278082   
AGE      0.704140 -0.544423  0.679487  0.067792  0.795153 -0.278082  1.000000   
DIS     -0.744986  0.614627 -0.757080 -0.080248 -0.880015  0.263168 -0.801610   
RAD      0.727807 -0.278767  0.455507  0.024579  0.586429 -0.107492  0.417983   
TAX      0.729045 -0.371394  0.664361 -0.044486  0.649527 -0.271898  0.526366   
PTRATIO  0.465283 -0.448475  0.433710 -0.136065  0.391309 -0.312923  0.355384   
B       -0.360555  0.163135 -0.285840 -0.039810 -0.296662  0.053660 -0.228022   
LSTAT    0.634760 -0.490074  0.638747 -0.050575  0.636828 -0.640832  0.657071   
target  -0.558891  0.438179 -0.578255  0.140612 -0.562609  0.633576 -0.547562   

              DIS       RAD       TAX   PTRATIO         B     LSTAT    target  
CRIM    -0.744986  0.727807  0.729045  0.465283 -0.360555  0.634760 -0.558891  
ZN       0.614627 -0.278767 -0.371394 -0.448475  0.163135 -0.490074  0.438179  
INDUS   -0.757080  0.455507  0.664361  0.433710 -0.285840  0.638747 -0.578255  
CHAS    -0.080248  0.024579 -0.044486 -0.136065 -0.039810 -0.050575  0.140612  
NOX     -0.880015  0.586429  0.649527  0.391309 -0.296662  0.636828 -0.562609  
RM       0.263168 -0.107492 -0.271898 -0.312923  0.053660 -0.640832  0.633576  
AGE     -0.801610  0.417983  0.526366  0.355384 -0.228022  0.657071 -0.547562  
DIS      1.000000 -0.495806 -0.574336 -0.322041  0.249595 -0.564262  0.445857  
RAD     -0.495806  1.000000  0.704876  0.318330 -0.282533  0.394322 -0.346776  
TAX     -0.574336  0.704876  1.000000  0.453345 -0.329843  0.534423 -0.562411  
PTRATIO -0.322041  0.318330  0.453345  1.000000 -0.072027  0.467259 -0.555905  
B        0.249595 -0.282533 -0.329843 -0.072027  1.000000 -0.210562  0.185664  
LSTAT   -0.564262  0.394322  0.534423  0.467259 -0.210562  1.000000 -0.852914  
target   0.445857 -0.346776 -0.562411 -0.555905  0.185664 -0.852914  1.000000  
             CRIM        ZN     INDUS      CHAS       NOX        RM       AGE  \
CRIM     1.000000 -0.462057  0.521014  0.033948  0.603361 -0.211718  0.497297   
ZN      -0.462057  1.000000 -0.535468 -0.039419 -0.511464  0.278134 -0.429389   
INDUS    0.521014 -0.535468  1.000000  0.075889  0.612030 -0.291318  0.489070   
CHAS     0.033948 -0.039419  0.075889  1.000000  0.056387  0.048080  0.055616   
NOX      0.603361 -0.511464  0.612030  0.056387  1.000000 -0.215633  0.589608   
RM      -0.211718  0.278134 -0.291318  0.048080 -0.215633  1.000000 -0.187611   
AGE      0.497297 -0.429389  0.489070  0.055616  0.589608 -0.187611  1.000000   
DIS     -0.539878  0.478524 -0.565137 -0.065619 -0.683930  0.179801 -0.609836   
RAD      0.563969 -0.234663  0.353967  0.021739  0.434828 -0.076569  0.306201   
TAX      0.544956 -0.289911  0.483228 -0.037655  0.453258 -0.190532  0.360311   
PTRATIO  0.312768 -0.361607  0.336612 -0.115694  0.278678 -0.223194  0.251857   
B       -0.264378  0.128177 -0.192017 -0.033277 -0.202430  0.032951 -0.154056   
LSTAT    0.454837 -0.386818  0.465980 -0.041344  0.452005 -0.468231  0.485359   
target  -0.403964  0.339989 -0.418430  0.115202 -0.394995  0.482829 -0.387758   

              DIS       RAD       TAX   PTRATIO         B     LSTAT    target  
CRIM    -0.539878  0.563969  0.544956  0.312768 -0.264378  0.454837 -0.403964  
ZN       0.478524 -0.234663 -0.289911 -0.361607  0.128177 -0.386818  0.339989  
INDUS   -0.565137  0.353967  0.483228  0.336612 -0.192017  0.465980 -0.418430  
CHAS    -0.065619  0.021739 -0.037655 -0.115694 -0.033277 -0.041344  0.115202  
NOX     -0.683930  0.434828  0.453258  0.278678 -0.202430  0.452005 -0.394995  
RM       0.179801 -0.076569 -0.190532 -0.223194  0.032951 -0.468231  0.482829  
AGE     -0.609836  0.306201  0.360311  0.251857 -0.154056  0.485359 -0.387758  
DIS      1.000000 -0.361892 -0.381988 -0.223486  0.168631 -0.409347  0.313115  
RAD     -0.361892  1.000000  0.558107  0.251913 -0.214364  0.287943 -0.248115  
TAX     -0.381988  0.558107  1.000000  0.287769 -0.241606  0.384191 -0.414650  
PTRATIO -0.223486  0.251913  0.287769  1.000000 -0.042152  0.330335 -0.398789  
B        0.168631 -0.214364 -0.241606 -0.042152  1.000000 -0.145430  0.126955  
LSTAT   -0.409347  0.287943  0.384191  0.330335 -0.145430  1.000000 -0.668656  
target   0.313115 -0.248115 -0.414650 -0.398789  0.126955 -0.668656  1.000000  
In [11]:
%timeit df.corr(method='pearson')
1 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [12]:
%timeit df.corr(method='spearman')
24.9 ms ± 7.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [13]:
%timeit df.corr(method='kendall')
75 ms ± 13.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [14]:
pearson = df.corr(method= 'pearson')
In [15]:
pearson = df.corr(method= 'pearson')
# Assuming the target is the last column, take its row and drop its correlation with itself
corr_with_target = pearson.iloc[-1][:-1]
# Attributes sorted from the most positively to the most negatively correlated
predictivity = corr_with_target.sort_values(ascending=False)

Since we may also be interested in strong negative correlations, it is better to sort the correlations by absolute value.
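One way to do that is sketched here, using a handful of the Pearson values from the matrix above (rounded to three decimals). The approach of reindexing by the sorted absolute values is just one option.

```python
import pandas as pd

# A few Pearson correlations with the target, copied (rounded) from the matrix above.
corr_with_target = pd.Series(
    {"RM": 0.695, "ZN": 0.360, "CHAS": 0.175, "PTRATIO": -0.508, "LSTAT": -0.738}
)

# Order the labels by absolute value, then use that order on the signed values.
order = corr_with_target.abs().sort_values(ascending=False).index
predictivity = corr_with_target.reindex(order)
print(predictivity)
```

The signs are kept, so a strong negative predictor such as LSTAT is ranked by strength without losing its direction.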

In [18]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
The info method used above helps us understand the nature of our dataset. In this case we can see that we have 506 entries and 14 columns, and that they are all of a numerical type (float).

In [19]:
             CRIM          ZN       INDUS        CHAS         NOX          RM         AGE         DIS         RAD         TAX     PTRATIO           B       LSTAT      target
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000
mean     3.613524   11.363636   11.136779    0.069170    0.554695    6.284634   68.574901    3.795043    9.549407  408.237154   18.455534  356.674032   12.653063   22.532806
std      8.601545   23.322453    6.860353    0.253994    0.115878    0.702617   28.148861    2.105710    8.707259  168.537116    2.164946   91.294864    7.141062    9.197104
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000    2.900000    1.129600    1.000000  187.000000   12.600000    0.320000    1.730000    5.000000
25%      0.082045    0.000000    5.190000    0.000000    0.449000    5.885500   45.025000    2.100175    4.000000  279.000000   17.400000  375.377500    6.950000   17.025000
50%      0.256510    0.000000    9.690000    0.000000    0.538000    6.208500   77.500000    3.207450    5.000000  330.000000   19.050000  391.440000   11.360000   21.200000
75%      3.677083   12.500000   18.100000    0.000000    0.624000    6.623500   94.075000    5.188425   24.000000  666.000000   20.200000  396.225000   16.955000   25.000000
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000  100.000000   12.126500   24.000000  711.000000   22.000000  396.900000   37.970000   50.000000

Taking a closer look at our columns

In [20]:
Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT', 'target'],
In [21]:
<seaborn.axisgrid.PairGrid at 0x1d38e3b4710>
<Figure size 720x360 with 0 Axes>

Let's find out what kind of distribution we are dealing with; the distribution can tell us what type of model we should proceed with.
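Alongside a plot, a single number such as skewness can summarise the shape of a distribution. The sketch below uses a hypothetical right-skewed sample rather than the real target column, and the log transform is shown only as one common option, not as a step this notebook takes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical right-skewed, price-like sample (not the real target column).
prices = pd.Series(rng.lognormal(mean=3.0, sigma=0.4, size=500), name="target")

skew_raw = prices.skew()          # > 0 means a longer right tail
skew_log = np.log(prices).skew()  # skewness after a log transform

print(f"skew raw: {skew_raw:.2f}, skew after log: {skew_log:.2f}")
```

A skewness near zero is consistent with a roughly symmetric distribution, which is friendlier to a plain linear model.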

In [58]:
<matplotlib.axes._subplots.AxesSubplot at 0x1d39951b9b0>
In [59]:
In [60]:
In [61]:
sns.heatmap(df.corr(), annot= True)
<matplotlib.axes._subplots.AxesSubplot at 0x1d3995b5198>

From the heatmap we can see which features pique our interest. RM (average number of rooms) has the strongest positive correlation with the target, about 0.70, so for now we focus on it; it already stood out as a contender when we were choosing a correlation method. Note that LSTAT is even more strongly correlated in absolute terms (about -0.74), which matters if we rank features by absolute value.

Now we come back to remind ourselves of the columns

Let's draw some more graphs. A scatter plot is a good choice because most people are familiar with this kind of graph, so it will help us here.

In [62]:
sns.scatterplot(y='target', x='RM', data=df)
<matplotlib.axes._subplots.AxesSubplot at 0x1d399967940>

From the graph above, it is difficult to build a model on this one feature: although there is a positive relationship, it is not strong enough on its own, and there is more work to do before we can convince ourselves.

Let's bring in the scipy library to check how well our one-feature model performs.

In [27]:
from scipy import stats
In [28]:
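# Note: linregress(x, y) takes the independent variable first, so this call regresses RM on target.
# r_value is the same either way, but slope and intercept describe RM as a function of target.
slope, intercept, r_value, p_value, std_err = stats.linregress(df['target'], df['RM'])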
In [29]:
print("R Value: " ,r_value)
print("RSquared Value: " ,r_value ** 2)
print("Intercept: " ,intercept)
R Value:  0.6953599470715394
RSquared Value:  0.483525455991334
Intercept:  5.087638671836054
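`stats.linregress(x, y)` treats its first argument as the independent variable. The call above passes `df['target']` first, so the reported intercept belongs to a line that predicts RM from the target. The sketch below uses synthetic data built from a known line (the names `rooms` and `price` are made up for the example) to show that the R value does not depend on the order, while the slope and intercept do.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic data built from a known line: price = 9 * rooms - 35 + noise.
rooms = rng.normal(6.3, 0.7, size=300)
price = 9.0 * rooms - 35.0 + rng.normal(0, 3, size=300)

fwd = stats.linregress(rooms, price)  # price as a function of rooms
rev = stats.linregress(price, rooms)  # rooms as a function of price

print("r (both orders):", round(fwd.rvalue, 4), round(rev.rvalue, 4))
print("slope/intercept, x=rooms:", round(fwd.slope, 2), round(fwd.intercept, 2))
print("slope/intercept, x=price:", round(rev.slope, 3), round(rev.intercept, 2))
```

To predict prices from room counts, the feature goes first and the target second.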

Again, these are not bad numbers considering that we are dealing with real data: an R value of about 0.7 is good. But the question is always whether we can do better to reassure ourselves. We could try to fit a curve or a line to this one feature, but as the scatter plot above suggests, neither option would fit well.

We do have one more trick up our sleeves to try to model this data without a separate training step: we can use a scaler to bring all our features onto a common scale and then compute a coefficient for each of them.

For this we are going to need the statsmodels library and scikit-learn's StandardScaler.
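Before using it, it helps to see what `StandardScaler` actually does. It does not squeeze values into a fixed range such as -1 to 1; it subtracts each column's mean and divides by its standard deviation. A minimal sketch on made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical features on very different scales.
raw = np.array([[1.0, 200.0],
                [2.0, 300.0],
                [3.0, 400.0],
                [10.0, 700.0]])

scaled = StandardScaler().fit_transform(raw)

# Same result by hand: subtract the column mean, divide by the column std (ddof=0).
by_hand = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(scaled.round(3))
print("column means:", scaled.mean(axis=0).round(6))
print("column stds: ", scaled.std(axis=0).round(6))
print("max |value|: ", np.abs(scaled).max().round(3))
```

Every column ends up with mean 0 and standard deviation 1, but individual values can still fall outside the -1 to 1 interval.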

In [30]:
In [71]:
X= df[['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX','PTRATIO', 'B', 'LSTAT']]
y= df['target']
In [32]:
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()

features = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
y = df['target']

# Standardise each feature to zero mean and unit variance (not to a -1..1 range).
# The chained assignment also overwrites the feature columns of df with the scaled values.
# .values replaces the deprecated .as_matrix().
X = df[features] = scale.fit_transform(df[features].values)

print(X)

# No constant column is added, so the model is fitted without an intercept.
est = sm.OLS(y, X).fit()
est.summary()

[[-0.41978194  0.28482986 -1.2879095  ... -1.45900038  0.44105193
  -1.0755623 ]
 [-0.41733926 -0.48772236 -0.59338101 ... -0.30309415  0.44105193
 [-0.41734159 -0.48772236 -0.59338101 ... -0.30309415  0.39642699
  -1.2087274 ]
 [-0.41344658 -0.48772236  0.11573841 ...  1.17646583  0.44105193
 [-0.40776407 -0.48772236  0.11573841 ...  1.17646583  0.4032249
 [-0.41500016 -0.48772236  0.11573841 ...  1.17646583  0.44105193
OLS Regression Results

Dep. Variable:               target   R-squared:                 0.106
Model:                          OLS   Adj. R-squared:            0.082
Method:               Least Squares   F-statistic:               4.477
Date:              Wed, 21 Nov 2018   Prob (F-statistic):     3.14e-07
Time:                      23:55:38   Log-Likelihood:          -2304.8
No. Observations:               506   AIC:                       4636.
Df Residuals:                   493   BIC:                       4691.
Df Model:                        13
Covariance Type:          nonrobust

          coef   std err         t     P>|t|    [0.025    0.975]
x1     -0.9281     1.388    -0.669     0.504    -3.654     1.798
x2      1.0816     1.571     0.688     0.492    -2.006     4.169
x3      0.1409     2.071     0.068     0.946    -3.928     4.210
x4      0.6817     1.074     0.635     0.526    -1.429     2.792
x5     -2.0567     2.173    -0.947     0.344    -6.325     2.212
x6      2.6742     1.441     1.855     0.064    -0.158     5.506
x7      0.0195     1.825     0.011     0.991    -3.567     3.605
x8     -3.1040     2.062    -1.506     0.133    -7.154     0.946
x9      2.6622     2.836     0.939     0.348    -2.909     8.234
x10    -2.0768     3.111    -0.668     0.505    -8.189     4.035
x11    -2.0606     1.390    -1.482     0.139    -4.792     0.671
x12     0.8493     1.204     0.706     0.481    -1.516     3.214
x13    -3.7436     1.778    -2.106     0.036    -7.236    -0.251

Omnibus:                    178.041   Durbin-Watson:             0.045
Prob(Omnibus):                0.000   Jarque-Bera (JB):        783.126
Skew:                         1.521   Prob(JB):              8.84e-171
Kurtosis:                     8.281   Cond. No.                   9.82

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

We can see in the coef column that the number of rooms (x6, RM) has the largest positive coefficient, 2.67, even with all features on a common scale. But a few other features also have coefficients of roughly 2.5 or more in absolute value (x9/RAD at 2.66, x8/DIS at -3.10 and x13/LSTAT at -3.74), which makes us rethink our focus on a one-feature prediction model.

The new contender happens to be RAD. RAD performed poorly in terms of correlation, but we now have new information: once all the features are on a common scale, the RAD feature is worth paying attention to.
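One caveat when reading the summary above: `sm.OLS(y, X)` was called without a constant column, so the model has no intercept, and in that case statsmodels reports an uncentred R-squared. With standardised (mean-zero) features and a target whose mean is far from zero, that figure can look very low even when the slopes are sensible. The sketch below reproduces the effect with plain NumPy on synthetic data; the helper `ols` and all the numbers are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
# Synthetic features, standardised exactly, and a target with a mean far from zero.
X = rng.normal(size=(n, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 22.5 + X @ np.array([2.5, -3.0, 1.0]) + rng.normal(0, 2, size=n)

def ols(design, target):
    """Least-squares fit; returns the coefficients and the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    ssr = ((target - design @ coef) ** 2).sum()
    return coef, ssr

coef_nc, ssr_nc = ols(X, y)                              # no constant column
coef_c, ssr_c = ols(np.column_stack([np.ones(n), X]), y)  # with a constant column

r2_uncentred = 1 - ssr_nc / (y ** 2).sum()            # convention without a constant
r2_centred = 1 - ssr_c / ((y - y.mean()) ** 2).sum()  # usual R^2 with a constant

print("slopes without constant:", coef_nc.round(3))
print("slopes with constant:   ", coef_c[1:].round(3))
print("uncentred R^2 (no constant):", round(r2_uncentred, 3))
print("centred R^2 (with constant):", round(r2_centred, 3))
```

Because the features are centred, the slopes agree in both fits; only the goodness-of-fit figure changes. In statsmodels, a constant is typically added with `sm.add_constant(X)`.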

This means it's time to bring back our favourite library, scikit-learn; it might have an algorithm that does a better job with all the features than our scaled estimator.

In [74]:
from sklearn.model_selection import train_test_split
In [75]:
X_train,X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
In [76]:
from sklearn.linear_model import LinearRegression
In [77]:
lm = LinearRegression()
In [78]:
lm.fit(X_train,y_train)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
In [79]:
print('Intercept: ',lm.intercept_)
print('Coef: ',lm.coef_)
Intercept:  22.225777226574095
Coef:  [-0.66646229  0.97929727  0.62472316  1.04873206 -2.31254853  2.02868973
  0.45424819 -2.6605586   2.26313467 -1.87315532 -1.90447001  0.64066261

The coefficients above are easier to read when paired with their feature names.

In [81]:
coeffecients = pd.DataFrame(lm.coef_,X.columns)
coeffecients.columns = ['Coeffecient']
CRIM -0.666462
ZN 0.979297
INDUS 0.624723
CHAS 1.048732
NOX -2.312549
RM 2.028690
AGE 0.454248
DIS -2.660559
RAD 2.263135
TAX -1.873155
PTRATIO -1.904470
B 0.640663
LSTAT -4.590607

Predictions using the fitted coefficients

In [83]:
predictions = lm.predict(X_test)
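For a linear regression, `predict` amounts to the intercept plus the dot product of each row of features with the coefficient vector. The sketch below checks that on synthetic data; the names `X_demo`, `y_demo` and `lm_demo` are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
# Synthetic features and target, used only to illustrate the arithmetic.
X_demo = rng.normal(size=(50, 3))
y_demo = 22.0 + X_demo @ np.array([2.0, -1.9, -4.6]) + rng.normal(0, 1, size=50)

lm_demo = LinearRegression().fit(X_demo, y_demo)

# predict() is the intercept plus the dot product of features and coefficients.
manual = lm_demo.intercept_ + X_demo @ lm_demo.coef_
print(np.allclose(manual, lm_demo.predict(X_demo)))
```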
In [84]:
In [85]:
array([38.76995104, 27.39271318, 16.26805601, 16.64592872, 30.5945708 ,
       31.37975753, 37.68282481,  7.57986744, 33.62371472,  6.94206736,
       30.00015138, 13.74184077, 16.41357803, 17.5975484 , 24.92452314,
       20.61277162,  6.84027833, 32.74459645, 28.14176473, 24.87051184,
       12.01460369, 19.89597528, 22.93223855, 24.84808083, 33.41944923,
       18.2663553 , 32.40616206, 19.07263109, 27.85446156, 33.36724349,
       20.31071184, 18.71427039, 36.3942392 , 43.97914411, 28.53636198,
       22.23810379, 15.23341286, 18.4441601 ,  2.99896469, 30.75373687,
       23.98495287, 17.65233987, 33.49269972, 13.72450288, 17.45026475,
       25.3864821 , 29.9370352 , 16.43822597, 27.0157306 , 23.23886475,
       31.8958797 , 36.8917952 , 22.96758436, 18.06656811, 30.34602124,
       -0.30828515, 19.8446382 , 16.6131071 , 23.63902347, 21.26225918,
       29.69766593,  3.14282554, 16.86387632, 19.76329036,  9.71050797,
       24.21870511, 24.27695942, 19.87071765, 17.16247142, 19.85216234,
       23.74078001, 21.56791537, 23.14099313, 20.54638573, 27.77053085,
       21.2590119 , 36.87579928,  8.05035628, 28.9146871 , 16.70037511,
       15.70980238, 19.14484394, 29.65683713, 16.86617546, 10.15073018,
       21.34814159, 21.81482232, 32.18098353, 22.24314075, 21.75449868,
       12.50117018, 10.64264803, 22.59103858, 32.00987194,  5.75604165,
       34.05952126,  7.04112579, 31.53788515,  9.02176123, 21.19511453,
       32.37147301, 21.32823602, 27.19438339, 24.91207186, 23.08174295,
       24.76969659, 24.77145042, 30.14032582, 36.63344929, 32.59298802,
       23.27852444, 35.5111093 , 24.17973314, 22.05040637, 29.57566524,
       26.94598149, 28.86934886, 30.98598123, 26.77898549, 28.83037557,
       16.05739187, 20.89220193, 21.91047939, 36.88601261, 25.01402328,
       23.53157107, 15.12274061,  5.50883218, 14.14631563, 23.87422049,
       26.85906918, 33.17708597, 24.22078613, 19.60743115, 24.54377589,
       26.24871922, 30.8997013 , 26.2619873 , 33.44890707, 23.05544279,
       12.12838356, 35.44082938, 31.79591619, 16.5997814 , 25.17956469,
       19.77417177, 20.07188943, 24.67905941, 26.64881616, 29.50609111,
       16.87246772, 16.25039628, 40.96167542, 36.18058639, 22.00214486,
       21.47973172, 23.48638653, 12.67663095, 20.83340172, 24.99555373,
       19.27796673, 29.13806185, 40.15324017, 22.1316772 , 26.14454982,
       23.02029457, 18.61562996, 30.48499643, 17.42381182, 10.92515821,
       18.66294924, 33.26084439, 34.96275041, 20.74820685,  1.70547647,
       18.03065088, 27.34915728, 18.06414053, 28.56520062, 24.41093319,
       27.53096541, 20.55435421, 22.62919622, 37.78233999, 26.87713512,
       37.38740447, 25.79142163, 14.81336505, 22.11034091, 17.09095927,
       25.08768209, 35.57385009,  8.21251303, 20.29558413, 19.03028948,
       26.45168363, 24.24592238, 18.52485619, 21.43469229, 35.01450733,
       20.96970996, 23.6978562 , 28.08966447])
In [86]:
<matplotlib.collections.PathCollection at 0x1d3996cb320>
In [87]:
<matplotlib.collections.PathCollection at 0x1d399712e80>
In [90]:
from sklearn import metrics
In [91]:
print('MAE:    ',metrics.mean_absolute_error(y_test,predictions))
print('MSE:    ',metrics.mean_squared_error(y_test,predictions))
print('RMSE:   ',np.sqrt(metrics.mean_squared_error(y_test,predictions)))
MAE:     3.9051448026275075
MSE:     29.416365467452838
RMSE:    5.423685598138302
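The three metrics are simple to compute by hand, which makes their meaning clearer. A small sketch with hypothetical true values and predictions (in $1000s, not taken from the test set above):

```python
import numpy as np

# Hypothetical true values and predictions (in $1000s), for illustration only.
y_true = np.array([24.0, 21.6, 34.7, 33.4, 36.2])
y_pred = np.array([30.0, 25.0, 30.6, 28.6, 27.9])

errors = y_true - y_pred
mae = np.abs(errors).mean()   # mean absolute error
mse = (errors ** 2).mean()    # mean squared error
rmse = np.sqrt(mse)           # root mean squared error, same units as the target

print(f"MAE:  {mae:.3f}")
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
```

MAE and the root of the MSE are both in the units of the target, so the values reported above mean the predictions are off by roughly 4 to 5.4 thousand dollars on average, with the squared-error figure penalising large misses more heavily.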