In my previous post, I shared how to make a prediction from a single criterion using machine learning in Python. If you haven’t read it, be sure to read it first here to get a better understanding of what we’re trying to do. In this post, I’ll let the computer predict something based on multiple criteria. For example, I want to know the price of a house given its size (square footage), number of bedrooms, and age. A house with exactly those values may not be listed in any database, but I want the closest price prediction for it.
To do that, we start with a known dataset. I culled some data online for houses for sale in Seattle in 2022, saved it locally, and gave it the following headers:
area_sqft,bedrooms,age_yrs,price_$
Its dimensions are 15 data rows x 4 columns (the rows are printed in the sample output further down).
I would like to know the price of a home in Seattle in 2022 that is 2300 square feet, has 4 bedrooms, and is 37 years old. That exact house isn’t in the dataset, but the known data should be enough for the computer to make a prediction for it.
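As a minimal sketch of the loading step, assuming pandas and a CSV file with the headers above: the file name houses.csv is hypothetical, so the snippet reads the same kind of text from an in-memory buffer instead, using three rows copied from the sample output further down in this post.

```python
import io

import pandas as pd

# Three rows copied from the sample output in this post; the real file has more.
csv_text = """area_sqft,bedrooms,age_yrs,price_$
3450,3,91,1999500
4397,4,70,3795950
980,2,68,949000
"""

# With a file on disk this would be pd.read_csv("houses.csv") (hypothetical name).
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # the four headers
```

Checking the shape and the column names right after loading is a cheap way to catch a missing header row or a wrong delimiter before any model is fitted.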
Just like with the univariate regression model (link to that post above), the setup is the same. I’ll call out the differences here.
When we’re fitting the model, this time we’ll be supplying 3 features for training: size, number of bedrooms, and age.
reg = linear_model.LinearRegression()
reg.fit(df[['area_sqft','bedrooms','age_yrs']], df[['price_$']])
For prediction, we’ll supply the following parameters:
reg.predict([[2300, 4, 37]])
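Put together, the fit and predict steps could look like the sketch below. It assumes pandas and scikit-learn are installed and builds the DataFrame from the rows shown in the sample output of this post rather than from a file; the names rows, features, and query are only illustrative.

```python
import pandas as pd
from sklearn import linear_model

# Rows copied from the sample output in this post.
rows = [
    (3450, 3, 91, 1999500), (4397, 4, 70, 3795950), (980, 2, 68, 949000),
    (3310, 5, 101, 1849900), (1720, 4, 98, 900000), (3140, 5, 112, 1140000),
    (1540, 3, 69, 599000), (3572, 5, 1, 3195950), (2362, 4, 1, 2500000),
    (790, 2, 113, 749950), (480, 1, 26, 429000), (3978, 4, 81, 1675000),
    (2550, 4, 116, 1850000), (1800, 4, 76, 1250000), (2353, 3, 119, 1245000),
]
df = pd.DataFrame(rows, columns=["area_sqft", "bedrooms", "age_yrs", "price_$"])

# Three feature columns go in as X, the price column as y.
features = ["area_sqft", "bedrooms", "age_yrs"]
reg = linear_model.LinearRegression()
reg.fit(df[features], df[["price_$"]])

# Passing the query as a DataFrame keeps the feature names attached, which
# avoids the feature-name warning recent scikit-learn versions emit when a
# model fitted on a DataFrame is given a plain list.
query = pd.DataFrame([[2300, 4, 37]], columns=features)
prediction = reg.predict(query)

print(f"Predicted price: $ {prediction[0][0]:,.0f}")
```

Because the target was passed as a one-column DataFrame, the prediction comes back as a 2-D array, hence the prediction[0][0] indexing.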
The multivariate linear regression general formula is:
price = (m1*x1) + (m2*x2) + (m3*x3) + b
where m1, m2, m3 are the coefficients and b is the intercept; price is the target variable (y),
and in this example, x1 is area, x2 is bedrooms, and x3 is age.
Because price varies by size, number of bedrooms, and age, we call price the dependent variable and the others independent variables. In machine learning lingo, criteria or factors such as size, bedrooms, and age are known as features.
Once the fitting is done, we can retrieve the coefficient and intercept values as follows:
print(reg.coef_) # out: [[ 661.73771436 -27853.36335893 -9233.18487143]]
print(reg.intercept_) # out: [803130.9565902]
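To see how these numbers tie back to the formula, here is a small sketch that plugs the printed coefficients and intercept into it by hand for the 2300 sqft, 4-bedroom, 37-year-old house. The values are copied from the printed output, so the result can differ from reg.predict in the last few decimal places.

```python
# Values copied from the printed reg.coef_ and reg.intercept_ above.
m1, m2, m3 = 661.73771436, -27853.36335893, -9233.18487143
b = 803130.9565902

# The house we want a price for: area, bedrooms, age.
x1, x2, x3 = 2300, 4, 37

# price = (m1*x1) + (m2*x2) + (m3*x3) + b
price = m1 * x1 + m2 * x2 + m3 * x3 + b

print(f"$ {price:,.0f}")  # $ 1,872,086
```

Each coefficient is the change in predicted price per unit of its feature with the other two held fixed, so in this fit every extra square foot adds about $662 and every extra year of age subtracts about $9,233.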
Once we have it all put together, the sample output looks like this:
area_sqft bedrooms age_yrs price_$
0 3450 3 91 1999500
1 4397 4 70 3795950
2 980 2 68 949000
3 3310 5 101 1849900
4 1720 4 98 900000
5 3140 5 112 1140000
6 1540 3 69 599000
7 3572 5 1 3195950
8 2362 4 1 2500000
9 790 2 113 749950
10 480 1 26 429000
11 3978 4 81 1675000
12 2550 4 116 1850000
13 1800 4 76 1250000
14 2353 3 119 1245000
[[ 661.73771436 -27853.36335893 -9233.18487143]]
[803130.9565902]
[[1872086.40593489]]
Price of a 2300 sqft, 4-bedroom, 37 years old house: $ 1,872,086
Hope this was educational. Come back for more interesting topics! And up your educational bar and fun by checking out my book below!
Interested in creating programmable, cool electronic gadgets? Give my newest book on Arduino a try: Hello Arduino!