1. Title: Boston Housing Data

2. Sources:
   (a) Origin: This dataset was taken from the StatLib library which is
       maintained at Carnegie Mellon University.
   (b) Creator: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the
       demand for clean air', J. Environ. Economics & Management,
       vol. 5, 81-102, 1978.
   (c) Date: July 7, 1993

3. Past Usage:
   - Used in Belsley, Kuh & Welsch, 'Regression diagnostics ...', Wiley,
     1980. N.B. Various transformations are used in the table on
     pages 244-261.
   - Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning.
     In Proceedings of the Tenth International Conference on Machine
     Learning, 236-243, University of Massachusetts, Amherst. Morgan
     Kaufmann.

4. Relevant Information: Concerns housing values in suburbs of Boston.

5. Number of Instances: 506

6. Number of Attributes: 13 continuous attributes (including the "class"
   attribute "MEDV") and 1 binary-valued attribute.

7. Attribute Information:
    1. CRIM     per capita crime rate by town
    2. ZN       proportion of residential land zoned for lots over
                25,000 sq. ft.
    3. INDUS    proportion of non-retail business acres per town
    4. CHAS     Charles River dummy variable (= 1 if tract bounds river;
                0 otherwise)
    5. NOX      nitric oxides concentration (parts per 10 million)
    6. RM       average number of rooms per dwelling
    7. AGE      proportion of owner-occupied units built prior to 1940
    8. DIS      weighted distances to five Boston employment centres
    9. RAD      index of accessibility to radial highways
   10. TAX      full-value property-tax rate per $10,000
   11. PTRATIO  pupil-teacher ratio by town
   12. B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks
                by town
   13. LSTAT    % lower status of the population
   14. MEDV     median value of owner-occupied homes in $1000's

8. Missing Attribute Values: None.

> library(rpart)                          # needed for the regression tree below
> dd = read.table("housing.txt")
> nom = c("crim","zn","indus","chas","nox","rm","age","dis","rad","tax","pbrat","b","lstat","medv")
> names(dd) = nom
> attach(dd)                              # make the columns available by name
> plot(lstat, medv)
> l1 = lm(medv ~ lstat)                   # simple linear regression
> lines(lstat, l1$fitted.values, col="red")
> l2 = lm(log(medv) ~ log(lstat))         # linear regression on the log-log scale
> points(lstat, exp(l2$fitted.values), col="blue", pch=20)
> l4 = loess(medv ~ lstat, span=2/3)      # local polynomial regression
> points(lstat, l4$fitted, col="dark red", pch=20)
> l5 = smooth.spline(lstat, medv, df=5)   # smoothing spline with 5 degrees of freedom
> lines(l5$x, l5$y, col="cyan")
> l6 = rpart(medv ~ lstat)                # regression tree
> l6p = predict(l6)
> points(lstat, l6p, col="violet", pch=20)
> l6
n= 506

node), split, n, deviance, yval
      * denotes terminal node

 1) root 506 42716.3000 22.53281
   2) lstat>=9.725 294  7006.2830 17.34354
     4) lstat>=16.085 144  2699.2200 14.26181
       8) lstat>=19.9 75  1214.0820 12.32533 *
       9) lstat< 19.9 69   898.1933 16.36667 *
     5) lstat< 16.085 150  1626.6090 20.30200 *
   3) lstat< 9.725 212 16813.8200 29.72925
     6) lstat>=4.65 162  6924.4230 26.64630
      12) lstat>=5.495 134  5379.1940 25.84701 *
      13) lstat< 5.495 28  1049.9370 30.47143 *
     7) lstat< 4.65 50  3360.8940 39.71800
      14) lstat>=3.325 32  2109.5620 37.31562 *
      15) lstat< 3.325 18   738.3178 43.98889 *
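The session above overlays five fits of MEDV on LSTAT on one scatterplot. A minimal sketch of how the fits could be compared numerically is given below; it assumes the session above has been run (so dd, lstat, medv and the fitted objects l1, l2, l4, l5 and l6 are in the workspace), and the helper name rss is introduced here purely for illustration. It reports in-sample residual sums of squares only, which tends to favour the more flexible fits.

# Hypothetical helper: residual sum of squares against the observed medv
rss = function(pred) sum((medv - pred)^2)

# Compare the in-sample fit of each model on the original $1000 scale
c(linear = rss(l1$fitted.values),
  loglog = rss(exp(l2$fitted.values)),   # back-transform the log-log fit
  loess  = rss(l4$fitted),
  spline = rss(predict(l5, lstat)$y),    # evaluate the spline at each lstat value
  tree   = rss(predict(l6)))

Note that predict(l5, lstat)$y is used rather than l5$y, because smooth.spline returns fitted values only at the sorted unique x values, not one per observation.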