
**Date of Analysis: 27 December 2019**

**Period of data: Dec 2016 to Dec 2019**

**Number of transactions analyzed: 3308**

(transaction data extracted from URA website)

District 9 is one of the prime districts within the CCR (Core Central Region) of Singapore. Most people will recognise it as the "very atas" part of Singapore. It comprises a few neighbourhoods such as Orchard and River Valley. Some of the private properties in this region are **8 Saint Thomas**, **Twentyone Angullia Park** and **OUE Twin Peaks**. Recent new properties in the area are **Martin Modern**, **Haus on Handy** and **The Iveria**.

How do the private properties in D9 generally fare? Here are the details for each of the properties in D9, presented using **box plots**.

*More box plots of other condominiums in this district (together with all the other districts) could be unlocked when you become a patron (**https://www.patreon.com/datascienceinvestor**)*

To help you better understand the data, I will use **The MARQ on Paterson Hill** as an example here. From the diagram, you can see that

Average price: $3778.9 psf

Median price: $3690 psf

Price at 25th percentile: $3328 psf

Price at 75th percentile: $4092 psf

I personally think that a box plot is a good way to present the data. In this case, you can easily see the average price, median price, price at 25th percentile and price at 75th percentile from the plots. You can also tell at a glance how wide the spread of prices is for any of the condominium projects. Pretty neat, I think.
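If you want to see where these box plot figures come from, here is a minimal sketch using only Python's standard library. The function name `box_plot_summary` and the sample prices are made up for illustration; they are not the actual URA data or the code behind my plots.

```python
import statistics

def box_plot_summary(psf_values):
    """Return the figures a box plot displays for a list of $psf prices."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, median, q3 = statistics.quantiles(psf_values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(psf_values),
        "median": median,
        "q1": q1,          # price at 25th percentile
        "q3": q3,          # price at 75th percentile
        "iqr": q3 - q1,    # height of the box, i.e. the spread of the middle 50%
    }

# Hypothetical $psf values for one project (not real URA data)
sample_psf = [3100, 3300, 3500, 3700, 3900, 4100, 4300]
summary = box_plot_summary(sample_psf)
print(summary)
```

The taller the box (the `iqr` value), the wider the spread of prices for that project.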

The metric used here is $psf (price per square foot) as it is a common indicator of property prices.

The most expensive condominium in D9 is **The MARQ on Paterson Hill** with an average price of $3778.9 psf while the most affordable condominium in D9 is **Peace Center/Mansions** with an average price of $595.6 psf. Peace Center/Mansions has gone through a few en bloc attempts in recent years, with the __fifth attempt__ happening in the early part of 2019.

Now, let's take a look at the various **scatter plots** to get a better insight into how property prices performed across 3308 transactions in the past 3 years.

First, a scatter plot of the $psf against date.

In a scatter plot, we can derive the r coefficient (the Pearson correlation coefficient), which measures the strength and direction of the linear relationship between 2 variables. Since we are using $psf and date as the variables, the r coefficient tells us how consistently the $psf moves with time. Note that r is not the gradient of the line of best fit: it ranges from -1 to 1 and describes how tightly the points follow the line, not how steep the line is. A positive r means that the $psf tends to rise with time, and the closer it is to 1, the more consistent that rise. The r coefficient in the scatter plot above is 0.21. This generally means that the $psf has been increasing over the past 3 years, although the relationship is a weak one. This performance is generally the same as __D23, an OCR district,__ and pales in comparison to __D13, an RCR district.__
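To make the difference between the r coefficient and the gradient of the line of best fit concrete, here is a small sketch in plain Python. The function name `pearson_r_and_slope` and the data points are hypothetical and only for illustration.

```python
import math

def pearson_r_and_slope(xs, ys):
    """Return (r, slope) for the least-squares line of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)   # unitless, always between -1 and 1
    slope = sxy / sxx                # in $psf per month here
    return r, slope

# Hypothetical data: months since Dec 2016 and $psf (not real URA figures)
months = [0, 12, 24, 36]
psf = [2000, 2100, 2050, 2200]

r, slope = pearson_r_and_slope(months, psf)
# Doubling every price doubles the slope but leaves r unchanged
r_doubled, slope_doubled = pearson_r_and_slope(months, [2 * p for p in psf])
print(f"r={r:.2f}, slope={slope:.2f} $psf/month")
print(f"r={r_doubled:.2f}, slope={slope_doubled:.2f} $psf/month")
```

In other words, r tells you how reliably prices follow the trend, while the slope tells you how many dollars psf the trend gains per month.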

From this line of best fit, you can also get a better sense of whether you are "over-paying" for your property purchase (e.g. if your property is above the line of best fit). Taking a quick glance at the scatter plot, your transaction will be on the high side if you paid more than $2300 psf in Oct 2018. Of course, there could be many factors such as location, tenure etc. that influence your buying price, so this is still a general rule of thumb.
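The "above or below the line" check can be sketched as follows. The slope and intercept here are hypothetical values, chosen only so that the line passes through roughly $2300 psf in Oct 2018 as read off the chart. They are not the actual fitted values, and `premium_over_trend` is just an illustrative name.

```python
def premium_over_trend(psf_paid, months_since_start, slope, intercept):
    """Return how far a price sits above (+) or below (-) the trend line."""
    trend_psf = intercept + slope * months_since_start
    return psf_paid - trend_psf

# Hypothetical line of best fit (assumed values, not the real fit)
HYPOTHETICAL_INTERCEPT = 2190.0   # $psf in Dec 2016
HYPOTHETICAL_SLOPE = 5.0          # $psf per month

# Oct 2018 is 22 months after Dec 2016
premium = premium_over_trend(2450, 22, HYPOTHETICAL_SLOPE, HYPOTHETICAL_INTERCEPT)
print(f"Premium over trend: {premium:+.0f} $psf")
```

A positive number means the transaction sits above the line of best fit, and a negative number means it sits below.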

So, which projects performed remarkably well over the past 3 years?

The plot above shows a myriad of lines of best fit from the various projects in D9.

You can see that there is a good mixture of properties with $psf increasing and properties with $psf actually decreasing over the past 3 years! Isn't it shocking to know that properties in D9 could actually drop in $psf?

2 of the top performing projects from the graph above are **Scotts Square** and **Aspen Heights**. The r coefficients for both of these projects are 0.74, which is much higher than the r coefficient of 0.21 for the general trend line for all transactions. This goes to show that the $psf of these two projects has risen far more consistently than that of the other projects!
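A per-project comparison like this boils down to grouping the transactions by project and computing r for each group. Here is a rough sketch, assuming each transaction is a `(project, months since Dec 2016, $psf)` tuple. The project names and figures are placeholders, not the real D9 data.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Pearson correlation; needs at least 2 points with some variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def r_by_project(transactions):
    """Group (project, month, psf) rows by project and compute r for each."""
    grouped = defaultdict(lambda: ([], []))
    for project, month, psf in transactions:
        grouped[project][0].append(month)
        grouped[project][1].append(psf)
    return {name: pearson_r(ms, ps) for name, (ms, ps) in grouped.items()}

# Hypothetical transactions (placeholder names and prices)
transactions = [
    ("Project A", 0, 3000), ("Project A", 12, 3100), ("Project A", 24, 3200),
    ("Project B", 0, 2500), ("Project B", 12, 2400), ("Project B", 24, 2450),
]
results = r_by_project(transactions)
ranking = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
for name, r_value in ranking:
    print(f"{name}: r={r_value:.2f}")
```

Projects at the top of the ranking have $psf rising most consistently, while a negative r flags a project whose $psf has been trending down.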

**Scotts Square** is a freehold mixed development which was launched in 2010. Having freehold status and being part of a mixed development with shopping amenities right at your doorstep in the heart of Singapore is indeed rare and prestigious, which may explain the performance of this condominium project. **Aspen Heights** is a 999-year leasehold condominium project which obtained its TOP (Temporary Occupation Permit) in 1998. With a reasonable price of around $1600 psf, it is an attractive property to invest in for long term returns in D9.

Surprisingly, some projects in D9 didn't perform very well. For example, **8 Saint Thomas** and **OUE Twin Peaks** actually saw their $psf decrease over the course of these 3 years. So everyone, please do not have the misconception that buying a property in the Core Central Region (CCR) is a sure win.

Next, how did freehold properties perform against leasehold properties during this 3-year period?

I have only included freehold transactions in this plot and you can see that the r coefficient of 0.25 is not too different from the r coefficient of 0.21 for the scatter plot with all transactions. This suggests that the freehold properties in D9 performed quite similarly to the leasehold properties in D9 in terms of $psf over the past 3 years.

Also, how about apartments of various sizes? How do they perform against each other?

Surprisingly, apartments smaller than 500 sqft performed the best in terms of $psf increment over the past 3 years. This suggests that a one bedder or studio apartment in D9 can be a worthy investment. Such results are a bit different from what we have seen in __D13 (RCR)__ and __D23 (OCR)__ where one bedder apartments usually do not perform as well.

What you have seen above are largely data insights that we have derived using various data science tools. But what if we could actually use these insights to build a machine learning model to predict the prices of the properties in D9 and understand if the price the seller is asking for is reasonable? How could we do that?

We could try various machine learning models to attempt to do so. Some examples of such machine learning models we could use are **random forest** and __linear regression__. They are methods which we could generally use to apply regression techniques to model the relationship between price and various other variables (in this case, project name, date of sale, size of flat etc). What we ultimately try to construct is a predictive model which allows us to have the highest confidence in prediction by reducing prediction errors as much as possible (think about **Mean Absolute Error** and **Root Mean Squared Error**).
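For those who are curious, the two error measures mentioned above can be written in a few lines of Python. This is just a sketch of the standard definitions with made-up prices, not the evaluation code I actually ran.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, ignoring their direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Like MAE, but squaring first makes large misses count for more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical sale prices and model predictions, in dollars
actual_prices = [1_000_000, 2_000_000, 3_000_000]
predicted_prices = [1_100_000, 1_900_000, 3_300_000]

mae = mean_absolute_error(actual_prices, predicted_prices)
rmse = root_mean_squared_error(actual_prices, predicted_prices)
print(f"MAE: ${mae:,.0f}, RMSE: ${rmse:,.0f}")
```

RMSE is never smaller than MAE, and a big gap between the two usually hints that a handful of predictions are off by a lot.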

If you are already feeling confused at this point, don't be, as this information is highly technical in nature. You may read up more about these topics if you want to. Otherwise, I believe the information above in the box plots and scatter plots is more than enough for you to better understand the property prices in D9. I will also attempt to explain or illustrate more of this in a separate post in the future.

Running all 3308 transactions through several machine learning models, I eventually arrived at a model which provides me with suitable evaluation results (MAE of 266978, RMSE of 653089 and R2 of 0.989).

I will now put this machine learning model into practice and use it to determine what a reasonable price should be for the following property.

Project: Twentyone Angullia Park

Area: 2260sqft

Floor level: Middle (I'm going to assume it is from level 11 to 15)

Running this through the machine learning model which I have created, the price I obtained is __$8,708,569__, which is just slightly lower than the asking price of $8,887,880. This might suggest that the asking price is reasonable. But of course, more investigation will be needed to look at other factors beyond these parameters.
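One simple way to judge "slightly lower" is to compare the gap against the model's typical error. The sketch below uses the figures quoted in this post and assumes that the MAE is expressed in dollars of total price. The variable names are mine.

```python
predicted_price = 8_708_569   # model output quoted above
asking_price = 8_887_880      # seller's asking price quoted above
area_sqft = 2260
model_mae = 266_978           # MAE quoted earlier, assumed to be in dollars

gap = asking_price - predicted_price
gap_pct = gap / asking_price * 100
# If the gap is smaller than the model's average miss, the model cannot
# really tell the asking price apart from its own prediction
within_typical_error = gap <= model_mae

print(f"Gap: ${gap:,} ({gap_pct:.1f}% of asking price)")
print(f"Predicted: {predicted_price / area_sqft:,.0f} $psf, "
      f"asking: {asking_price / area_sqft:,.0f} $psf")
print("Within the model's MAE" if within_typical_error else "Outside the model's MAE")
```

Here the gap of $179,311 (about 2% of the asking price) is smaller than the MAE, which supports reading the asking price as in line with the model.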

Of course, the above example is just a glimpse of what is achievable as you could actually use it to determine a lot more property prices in the region. In the future, I will also consider uploading this machine learning model online so you could actually use it to determine/predict property prices based on this model. But that's a story for another day.

Now, with these data in mind, go be a data science investor!

*Psst.. If you like what you read, please scroll down and subscribe for regular updates!*
