What is a fair price for a HDB resale flat from a data science perspective?

Updated: Jun 2

(Photo credit: Wikipedia)

I have recently developed a machine learning model after crunching through 60,000+ HDB resale transactions from Jan 2017 to Nov 2019. The data was extracted from www.data.gov.sg.

Here is an example of what the data looks like:
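For readers who want to work with the data themselves, here is a minimal sketch of loading transaction records with Python's standard library. The column names are my assumption that they mirror the data.gov.sg resale CSV, and the rows are made-up placeholders, not real transactions. If only the lease commencement year is available, remaining lease can be approximated from the 99-year HDB lease; the sketch assumes each lease starts in January of its commencement year.

```python
import csv
import io

# Hypothetical sample: made-up values, NOT real transactions. Column names are
# assumed to mirror the data.gov.sg resale flat prices CSV.
SAMPLE_CSV = """month,town,flat_type,flat_model,lease_commence_date,resale_price
2019-11,YISHUN,4 ROOM,Model A,1988,350000
2019-11,QUEENSTOWN,3 ROOM,Improved,1972,330000
2018-06,WOODLANDS,5 ROOM,Improved,1997,420000
"""

LEASE_YEARS = 99  # HDB flats are sold on 99-year leases


def load_transactions(text):
    """Parse CSV text into dicts with a numeric price and approximate remaining lease."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        year, month = (int(part) for part in row["month"].split("-"))
        commence = int(row["lease_commence_date"])
        # Assumption: the lease starts on 1 Jan of the commencement year.
        elapsed_years = (year - commence) + (month - 1) / 12
        rows.append({
            "town": row["town"],
            "flat_type": row["flat_type"],
            "flat_model": row["flat_model"],
            "remaining_lease": round(LEASE_YEARS - elapsed_years, 2),
            "resale_price": float(row["resale_price"]),
        })
    return rows


for record in load_transactions(SAMPLE_CSV):
    print(record)
```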

In the midst of crunching the data, I also created box plots of HDB resale prices for each estate.

To help you better understand the data, I will use Yishun as an example. From the diagram, you can see the following figures for Yishun:

Average price: $355,983.35

Median price: $350,000

Price at 25th percentile: $312,000

Price at 75th percentile: $390,000
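These four figures are the standard summary statistics behind a box plot. A minimal sketch of computing them with Python's standard library is below; the prices are made up for illustration and are not the actual Yishun transactions. The 1.5 times IQR whisker rule is the common Tukey convention and is my assumption, since the plots do not state which rule they used.

```python
import statistics


def box_plot_summary(prices):
    """Return the statistics a box plot is drawn from, plus the mean."""
    # method="inclusive" interpolates linearly between data points, which
    # matches the default percentile method in NumPy and pandas.
    q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "mean": statistics.fmean(prices),
        "median": median,
        "q1": q1,
        "q3": q3,
        # Tukey convention: points beyond these fences are drawn as outliers.
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }


# Made-up prices for illustration only; not actual Yishun transactions.
print(box_plot_summary([300000, 315000, 340000, 348000, 365000, 385000, 410000]))
```

The median and quartiles are less sensitive than the average to a handful of unusually expensive or cheap flats, which is why a box plot is built around them.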

From the plots above, you can see that towns along the North-South MRT line (red line), such as Woodlands, Choa Chu Kang, Sembawang and Yishun, are the most affordable towns in which to buy a HDB resale flat. At the other end of the spectrum, the Central Area and Queenstown have the most expensive HDB resale flats.
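Comparing towns this way amounts to grouping transactions by town and ordering the towns by a typical price. A minimal sketch, assuming the median as that typical price (my choice, since it resists outliers) and using made-up (town, price) pairs rather than real resale data:

```python
import statistics
from collections import defaultdict


def rank_towns_by_median(transactions):
    """Group (town, price) pairs by town; sort towns from lowest to highest median price."""
    prices_by_town = defaultdict(list)
    for town, price in transactions:
        prices_by_town[town].append(price)
    medians = {town: statistics.median(p) for town, p in prices_by_town.items()}
    return sorted(medians.items(), key=lambda item: item[1])


# Made-up (town, price) pairs for illustration; not real resale data.
sample = [
    ("YISHUN", 320000), ("YISHUN", 355000), ("YISHUN", 380000),
    ("QUEENSTOWN", 520000), ("QUEENSTOWN", 610000),
    ("WOODLANDS", 330000), ("WOODLANDS", 360000),
]
print(rank_towns_by_median(sample))
```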

While crunching through the 60,000+ HDB resale transactions to build a viable machine learning model, I also used feature selection to determine which variables, such as town and flat type, contribute most to HDB resale prices. Based on that, I ranked the importance of these features, and the top four are as follows:

1) Flat Model

2) Remaining Lease

3) Town

4) Flat Type

In other words, flat model (Improved, New Generation, etc.) is the most important variable in determining HDB resale prices, followed by the flat's remaining lease, then town (Ang Mo Kio, Yishun, etc.) and lastly flat type (3 Room, 4 Room, etc.). This can help you understand which variables to focus on when buying or selling your HDB resale flat.
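The feature selection method behind this ranking is not specified here, so the sketch below swaps in one simple technique for illustration: the correlation ratio (eta squared), which measures the share of price variance explained by the group averages of a single categorical feature. It scores each feature in isolation and ignores interactions, so a model-based importance could rank features differently. Numeric features such as remaining lease would need to be binned first; the "lease_band" field and all records are hypothetical, made-up values.

```python
from collections import defaultdict


def correlation_ratio(categories, values):
    """Eta squared: share of the variance in `values` explained by group means."""
    overall_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    between = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    total = sum((v - overall_mean) ** 2 for v in values)
    return between / total if total else 0.0


def rank_features(records, features, target="resale_price"):
    """Rank categorical features by eta squared against the target, highest first."""
    prices = [r[target] for r in records]
    scores = {f: correlation_ratio([r[f] for r in records], prices) for f in features}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, made-up records; "lease_band" is remaining lease binned into decades.
sample_records = [
    {"flat_model": "Model A", "town": "YISHUN", "lease_band": "70-79", "resale_price": 300000},
    {"flat_model": "Model A", "town": "WOODLANDS", "lease_band": "60-69", "resale_price": 320000},
    {"flat_model": "Improved", "town": "YISHUN", "lease_band": "60-69", "resale_price": 500000},
    {"flat_model": "Improved", "town": "WOODLANDS", "lease_band": "70-79", "resale_price": 520000},
]
print(rank_features(sample_records, ["town", "lease_band", "flat_model"]))
```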

On a side note, I recently wrote an article titled "How have the new rules on CPF usage affected old HDB resale prices". In it, I did a deep dive into how the new CPF rules, which took effect in May last year, affected resale prices of HDB flats with varying remaining leases. It might be of interest to you.

Now, back to the machine learning model.

