Sentiment analysis is increasingly used in many aspects of companies’ decision-making. Through sentiment analysis, companies can better understand how their customers feel about a new product launch, which aspect of customer service irritates customers the most, how the brand image has grown over the past year, and so on. Such insights are of high value to businesses, as they allow them to clearly hear and understand the voice of the customer. In the past, when technology was not as prevalent in our lives, sentiment data could not be obtained easily. The easiest method back then was to send someone out onto the street to ask the public to fill in long-winded customer surveys. As people have become more and more connected on social media in recent years, sentiment analysis has become much easier, because people willingly show their emotions in their posts and comments. This textual data now gives companies, and even governments (sentiment analysis was used to understand the impact on presidential elections as early as 2012), a much better idea of what people think about them.
So how exactly does sentiment analysis work?
There are generally two ways in which sentiment is analysed or derived. The first is the bag-of-words method. In the bag-of-words model, individual known words and their frequency of occurrence are extracted and used to assign a score. The order and structure of the words are ignored. For example, in the sentence "I love to eat vegetables", 5 different words are extracted, and each is given a score based on a preassigned sentiment score for that word. In this case, the word "love" has a high positive sentiment score, while "vegetables" is likely a neutral word with a sentiment score of zero. Adding up the scores then gives you a quick read on how positive or negative the sentence is. The main shortcoming of this method is precisely that the order and structure of the words are ignored, which can distort the sentiment score in certain cases. For example, in the sentence "This bumpy bus ride is the best bus ride I have ever taken!", we can all tell that there is sarcasm involved and would probably assign a negative sentiment score to it. The bag-of-words model, though, will likely assign it a positive sentiment score because the word "best" appears in the sentence.
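To make the bag-of-words idea concrete, here is a minimal Python sketch. The lexicon and its scores are made up for illustration (they are not taken from any real sentiment model, and not from the model I used for this post); real lexicons contain thousands of words.

```python
import re

# Hypothetical toy lexicon: word -> preassigned sentiment score.
# Values are illustrative assumptions only.
LEXICON = {"love": 3, "best": 3, "great": 2, "bumpy": -1, "worst": -3, "hate": -3}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words_score(text, lexicon=LEXICON):
    """Sum the lexicon score of every token; unknown words count as 0.

    Word order and sentence structure are ignored entirely.
    """
    return sum(lexicon.get(token, 0) for token in tokenize(text))

# "love" is the only scored word, so the sentence comes out positive.
print(bag_of_words_score("I love to eat vegetables"))

# The sarcastic sentence still comes out positive, because "best"
# outweighs "bumpy" and the model cannot see the sarcasm.
print(bag_of_words_score("This bumpy bus ride is the best bus ride I have ever taken!"))
```

With this toy lexicon the first sentence scores 3 and the sarcastic one scores 2, which is exactly the weakness described above: the score is positive even though a human reader would call the sentence negative.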
The second method is a more advanced form of Natural Language Processing (NLP). It uses more sophisticated machine learning techniques, or even deep learning, to extract the relationships between words, so that their order and structure are taken into account and a more accurate sentiment score can be assigned. This area is still the subject of a lot of research by NLP experts and will be worth watching in the years ahead.
In my "Stock in focus" series, I usually include sentiment analysis in my analysis of a stock to understand whether sentiment around the stock correlates strongly with its share price. So far, I have done so for Apple, Tesla, Nvidia, Facebook and Visa. You may be interested to know that public sentiment has quite a strong correlation with the stock price for Nvidia.
With the market going through one of the steepest drops in history amid the ultra black swan events (COVID-19 together with the oil crisis), I am now curious to know what sentiment is like in the current market, so I ran a simple exercise (as illustrated below).
To do this, I first pull out tweets from Twitter using the Twitter Developer API. For more information on how to get your Twitter Developer API key, you may refer here.
Next, I pull out 3000 tweets a day containing the phrase "stock market" from Twitter, at random times of the day, over the course of 5 days. Sentiment analysis (using a bag-of-words model) is then used to determine the sentiment score for each of these tweets. (There are various sentiment analysis models available for this kind of analysis. Some great examples are KNIME and Sentdex.)
For example, the tweet below will be assigned a positive sentiment score based on the content.
On the other hand, the tweet below will be assigned a negative sentiment score based on the content.
Here's a snapshot of the data extracted from Twitter together with the sentiment score I assigned next to it with a sentiment analysis model.
Averaging out the sentiment score across the 3000 tweets I collected each day, I then worked out the average sentiment score of the public pertaining to the phrase "stock market" for each of these 5 days.
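For readers who want to reproduce the averaging step, here is a minimal Python sketch. The day labels and scores in `scored_tweets` are placeholder values I made up for illustration, not my actual data, and `daily_summary` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Placeholder (day, sentiment score) pairs; NOT the real tweet data.
scored_tweets = [
    ("day-1", -0.6), ("day-1", -0.2), ("day-1", 0.1),
    ("day-2", -0.3), ("day-2", 0.4),
]

def daily_summary(rows):
    """Group scores by day, then compute the average score and the
    number of positive and negative tweets for each day."""
    by_day = defaultdict(list)
    for day, score in rows:
        by_day[day].append(score)

    summary = {}
    for day, scores in sorted(by_day.items()):
        summary[day] = {
            "average": mean(scores),
            "positive": sum(s > 0 for s in scores),
            "negative": sum(s < 0 for s in scores),
        }
    return summary

for day, stats in daily_summary(scored_tweets).items():
    print(day, stats)
```

Tweets with a score of exactly zero are treated as neutral here and counted in neither bucket; that is a design choice you may want to change.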
Here is a table showing the results.
No surprises here: the number of tweets with a negative sentiment score outweighed the number with a positive sentiment score over the past 5 days. Negative sentiment peaked when I extracted the tweets on the morning of 17 Mar (which corresponds to the night of 16 Mar in the US). 16 Mar is also, infamously, the day US markets plunged 12% (the worst drop since 1987). Sentiment generally improved over the next few days, as seen in the rise in the average sentiment score.
One interesting discovery from the analysis of recent tweets is that some of the tweets which are assigned a negative sentiment score are in reaction to Donald Trump's way of handling the COVID-19 situation.
For instance, there is some reaction from the public on Donald Trump describing the COVID-19 virus as "Chinese Virus".
Some people also draw a correlation: whenever Donald Trump mentions "Chinese Virus", the stock market sees quite a substantial drop. Hell hath no fury like the public scorned, ah?
Let's see how the presidential election unfolds with COVID-19 now throwing a spanner in the works. Maybe this is worth a post in the future?
Of course, this is a relatively simple exercise. A sample of just 3000 tweets per day over 5 days might not be enough to be statistically meaningful. Also, pulling the tweets at different times of the day might affect the sentiment, depending on what news was released at that point in time (e.g. sentiment might not be too positive right after a country announces a lockdown). The objective of the exercise is to briefly illustrate the use of sentiment analysis on something relevant to fellow investors like us who like to make data-driven decisions.
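One rough way to gauge how much to trust a daily average is to put an approximate 95% confidence interval around it. The Python sketch below uses the standard error of the mean and assumes the tweets are independent samples, which is a strong assumption for tweets reacting to the same news. The function name `mean_with_interval` and the sample scores are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

def mean_with_interval(scores, z=1.96):
    """Return (average, lower, upper) for an approximate 95% confidence
    interval, using the normal approximation and assuming independent samples."""
    n = len(scores)
    avg = mean(scores)
    stderr = stdev(scores) / sqrt(n)  # standard error of the mean
    return avg, avg - z * stderr, avg + z * stderr

# Placeholder scores, not real tweet data.
avg, low, high = mean_with_interval([1, -1, 1, -1])
print(avg, low, high)
```

If the intervals for two days overlap heavily, the difference between their average sentiment scores may just be noise.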
Now, with these data in mind, go be a data science investor! #datascienceinvestor
Psst.. If you like what you read, please scroll down and subscribe for regular updates!