Machine Learning In Python (Multivariate Linear Regression)

In my previous post, I shared how to make a prediction from a single criterion using machine learning in Python. If you haven’t read that post, be sure to read it first to get a better understanding of what we’re trying to do. In this post, I’ll let the computer predict something based on multiple criteria. For example, I want to know the price of a house given its size (square footage), number of bedrooms, and age. That exact combination may not exist in any database, but I want the closest price prediction for it.

To do that, we start with a known dataset. I culled some data online for houses for sale in Seattle in 2022. I saved the data locally and gave it the following headers: area_sqft,bedrooms,age_yrs,price_$
Its dimension is 15 data rows x 4 columns.

I would like to know the price of a home in Seattle in 2022 that is 2300 square feet, has 4 bedrooms, and is 37 years old. That particular house isn’t in the dataset, but the known data we give the computer should be enough for it to make a prediction.

The setup is the same as for the univariate regression model covered in the previous post, so I’ll only call out the differences here.

When we’re fitting the model, this time we’ll supply three features for training: size, number of bedrooms, and age.

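from sklearn import linear_model

reg = linear_model.LinearRegression()
# first argument: the three feature columns; second argument: the price column
reg.fit(df[['area_sqft','bedrooms','age_yrs']], df[['price_$']])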

For prediction, we’ll supply the feature values in the same order used for fitting (area, bedrooms, age):

reg.predict([[2300, 4, 37]])
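
The double brackets are there because predict expects a 2-D input: one inner list per house, and one value per feature inside it. Here is a minimal sketch of that shape rule. It uses made-up toy numbers built from an exact linear rule rather than the Seattle data, and the names X_toy, y_toy and toy_reg are my own placeholders.

```python
import numpy as np
from sklearn import linear_model

# Toy data (hypothetical, not from this post): 5 rows, 3 features each
X_toy = np.array([
    [1, 2, 5],
    [2, 1, 3],
    [3, 4, 1],
    [4, 3, 2],
    [5, 6, 4],
])
# Target built from an exact rule: y = 2*x1 + 3*x2 + 5*x3 + 10
y_toy = 2 * X_toy[:, 0] + 3 * X_toy[:, 1] + 5 * X_toy[:, 2] + 10

toy_reg = linear_model.LinearRegression()
toy_reg.fit(X_toy, y_toy)

# Two query rows at once: the outer list holds rows, each inner list holds features
toy_pred = toy_reg.predict([[1, 1, 1], [2, 0, 3]])
print(toy_pred.shape)  # (2,)
print(toy_pred)        # approximately 20 and 29, matching the rule above
```

A single house is just the one-row case of the same thing, which is why the call above wraps one list inside another.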

The general formula for multivariate linear regression is:
price = (m1*x1) + (m2*x2) + (m3*x3) + b
where m1, m2, m3 are the coefficients and b is the intercept. price is the target variable (y),
and in this example x1 is area, x2 is bedrooms, and x3 is age.

Because price varies with size, number of bedrooms, and age, we call price the dependent variable and the others independent variables. In machine learning lingo, criteria or factors such as size, bedrooms, and age are known as features.

Once the fitting is done, we can retrieve the coefficient and intercept values as follows:

print(reg.coef_) # out: [[   661.73771436 -27853.36335893  -9233.18487143]]
print(reg.intercept_) # out: [803130.9565902]
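
To see that the formula and the fitted numbers agree, here is a small sketch that plugs the coefficients and intercept printed above into the formula by hand for the 2300 sqft, 4-bedroom, 37-year-old house. The variable names are my own; the numbers are copied from the output above.

```python
import numpy as np

# Values copied from the reg.coef_ and reg.intercept_ output reported in this post
coef = np.array([661.73771436, -27853.36335893, -9233.18487143])  # m1, m2, m3
intercept = 803130.9565902                                         # b

# x1 = area in sqft, x2 = bedrooms, x3 = age in years
house = np.array([2300, 4, 37])

# price = (m1*x1) + (m2*x2) + (m3*x3) + b
manual_price = float(np.dot(coef, house) + intercept)
print(f"{manual_price:,.0f}")  # 1,872,086
```

Note the signs: with this dataset, each extra square foot adds to the estimate, while each extra bedroom and each extra year of age subtract from it.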

Once we have it all put together, the sample output looks like this:

    area_sqft  bedrooms  age_yrs  price_$
0        3450         3       91  1999500
1        4397         4       70  3795950
2         980         2       68   949000
3        3310         5      101  1849900
4        1720         4       98   900000
5        3140         5      112  1140000
6        1540         3       69   599000
7        3572         5        1  3195950
8        2362         4        1  2500000
9         790         2      113   749950
10        480         1       26   429000
11       3978         4       81  1675000
12       2550         4      116  1850000
13       1800         4       76  1250000
14       2353         3      119  1245000
[[   661.73771436 -27853.36335893  -9233.18487143]]
[803130.9565902]
[[1872086.40593489]]
Price of a 2300 sqft, 4-bedroom, 37 years old house: $ 1,872,086
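
For anyone who wants to reproduce this, here is a rough end-to-end sketch. It is not my exact script: I read the data from a local file, whereas this version types the 15 rows shown above straight into a DataFrame so it runs on its own. It also passes the query as a one-row DataFrame with the same column names, which avoids the feature-name warning that recent scikit-learn versions emit when a model fitted on a DataFrame is given a plain list. If the rows match, the result should land close to the figure shown above.

```python
import pandas as pd
from sklearn import linear_model

# The 15 rows from the sample output above, typed in by hand (assumption:
# these are the same rows the model in this post was fitted on)
rows = [
    (3450, 3, 91, 1999500), (4397, 4, 70, 3795950), (980, 2, 68, 949000),
    (3310, 5, 101, 1849900), (1720, 4, 98, 900000), (3140, 5, 112, 1140000),
    (1540, 3, 69, 599000), (3572, 5, 1, 3195950), (2362, 4, 1, 2500000),
    (790, 2, 113, 749950), (480, 1, 26, 429000), (3978, 4, 81, 1675000),
    (2550, 4, 116, 1850000), (1800, 4, 76, 1250000), (2353, 3, 119, 1245000),
]
df = pd.DataFrame(rows, columns=["area_sqft", "bedrooms", "age_yrs", "price_$"])

features = ["area_sqft", "bedrooms", "age_yrs"]

# Fit on the three feature columns against the price column
reg = linear_model.LinearRegression()
reg.fit(df[features], df[["price_$"]])

# One-row DataFrame with matching column names for the house we care about
query = pd.DataFrame([[2300, 4, 37]], columns=features)
predicted = float(reg.predict(query)[0][0])

print(reg.coef_)
print(reg.intercept_)
print(f"Price of a 2300 sqft, 4-bedroom, 37 years old house: $ {predicted:,.0f}")
```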

Hope this was educational. Come back for more interesting topics, and check out my book below!



Interested in creating programmable, cool electronic gadgets? Give my newest book on Arduino a try: Hello Arduino!
