Extract stock sentiments from financial news headlines in FinViz website using Python
FinViz is definitely one of my favourite go-to websites for information on the stock market. From fundamental ratios and technical indicators to news headlines and insider trading data, it is a perfect stock screener. Furthermore, it has updated information on the performance of each sector, industry and any major stock index.
An example of the news headlines section for Amazon (with ticker ‘AMZN’) from the FinViz website is given below. Feel free to visit it and scroll down to this section to see it for yourself! This section is updated live, for every single stock.
Instead of having to go through each headline for every stock you are interested in, we can use Python to parse this website data and perform sentiment analysis (i.e. assign a sentiment score) for each headline before averaging it over a period of time.
The idea is that the averaged value may give valuable information for the overall sentiment of a stock for a given day (or week if you decide to average over a week’s news). What makes it easier to parse the website is that you simply have to add the stock ticker at the end of this url ‘https://finviz.com/quote.ashx?t=’ to parse it (see the url in the image above). Let’s get right down to it!
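As a minimal sketch of the URL pattern described above, the helper below appends a ticker to the base URL. The function name `build_quote_url` and the constant name are my own illustrative choices, not part of FinViz or any library.

```python
from urllib.parse import quote

# Base URL taken from the article; the ticker is appended as the 't' parameter.
FINVIZ_QUOTE_URL = 'https://finviz.com/quote.ashx?t='


def build_quote_url(ticker):
    """Return the FinViz quote page URL for a ticker (hypothetical helper)."""
    # Normalise whitespace and case, then escape anything unsafe for a URL.
    return FINVIZ_QUOTE_URL + quote(ticker.strip().upper())


for symbol in ['AMZN', 'tsla', ' goog ']:
    print(build_quote_url(symbol))
```

Normalising the ticker in one place means the rest of the script can accept loosely formatted input.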
Update: I have written another article on applying everything in this article to build a Stock Sentiment Dashboard Web App on Flask and deploying it online. Feel free to check it out after you have read this article.
First, we import the libraries that we need. ‘BeautifulSoup’ is needed to parse data from FinViz while ‘urllib.request’ is needed to get the data. ‘Pandas’ is used to store the data in DataFrames while ‘Matplotlib’ is used to plot the sentiment on a chart. Finally, the ‘nltk.sentiment.vader’ module is used to perform sentiment analysis on the news headlines!
# Import libraries
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
# NLTK VADER for sentiment analysis
# (if the lexicon is missing, run nltk.download('vader_lexicon') once beforehand)
from nltk.sentiment.vader import SentimentIntensityAnalyzer
Let’s take a closer look at the news headlines for Amazon (AMZN) and its corresponding html code below. You can also visit the FinViz page and view the html code in your browser.
Notice from the above code that all the news is stored into a table with id=“news-table”. I have included two rows of data from the table, bounded by <tr> </tr> tags. The code for one of the rows is boxed up. Note the date and time data between the first <td></td> tags in the box, and the news headline text in the <a></a> tags. We are going to extract the date, time and news headline for each row and perform sentiment analysis on the news headline.
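To make the table structure concrete, here is a small sketch that parses a hand-written HTML snippet shaped like the description above. The markup and headlines are hypothetical and simplified; the real FinViz page carries more attributes and may change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on the structure described above.
sample_html = """
<table id="news-table">
  <tr><td>Jan-01-20 10:00AM</td><td><a href="#">Example headline one</a></td></tr>
  <tr><td>09:30AM</td><td><a href="#">Example headline two</a></td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
news_table = soup.find(id='news-table')

rows = []
for tr in news_table.find_all('tr'):
    # The first <td> of each row holds the date/time text.
    timestamp = tr.td.get_text(strip=True)
    # The first <a> of each row holds the headline text.
    headline = tr.a.get_text(strip=True)
    rows.append((timestamp, headline))

for timestamp, headline in rows:
    print(timestamp, '|', headline)
```

Note that the second row has only a time and no date, which is the case the parsing code later has to handle.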
The code below stores the entire ‘news-table’ from the FinViz website into a Python dictionary, news_tables, for these stocks: Amazon (AMZN), Tesla (TSLA) and Google (GOOG) (or rather Alphabet, the company that owns Google). You can include as many tickers as you want in the tickers list.
finwiz_url = 'https://finviz.com/quote.ashx?t='
news_tables = {}
tickers = ['AMZN', 'TSLA', 'GOOG']

for ticker in tickers:
    url = finwiz_url + ticker
    # Send a browser-like User-Agent header with the request
    req = Request(url=url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0'})
    response = urlopen(req)
    # Read the contents of the file into 'html'
    html = BeautifulSoup(response, 'html.parser')
    # Find 'news-table' in the Soup and load it into 'news_table'
    news_table = html.find(id='news-table')
    # Add the table to our dictionary
    news_tables[ticker] = news_table
To get a sense of what is stored in the news_tables dictionary for ‘AMZN’, feel free to run the code below. It iterates through each <tr></tr> tag (for the first 4 rows) to obtain the headline between the <a></a> tags and the date and time between the <td></td> tags before printing them out. This step is optional and is for your own learning.
# Read one single day of headlines for 'AMZN'
amzn = news_tables['AMZN']
# Get all the table rows tagged in HTML with <tr> into 'amzn_tr'
amzn_tr = amzn.findAll('tr')

for i, table_row in enumerate(amzn_tr):
    # Read the text of the element 'a' into 'a_text'
    a_text = table_row.a.text
    # Read the text of the element 'td' into 'td_text'
    td_text = table_row.td.text
    # Print the contents of 'a_text' and 'td_text'
    print(a_text)
    print(td_text)
    # Exit after printing 4 rows of data
    if i == 3:
        break
You should get something like this below (with more updated headlines of course).
The following code is similar to the one above, but this time it parses the date, time and headlines into a Python list called parsed_news instead of printing them out. The if/else statement is necessary because, if you look at the news headlines above, only the first news item of each day has the ‘date’ label. The rest of that day's news only has the ‘time’ label, so we reuse the most recent date for those rows.
parsed_news = []

# Iterate through the news
for file_name, news_table in news_tables.items():
    # Iterate through all tr tags in 'news_table'
    for x in news_table.findAll('tr'):
        # Read the headline text from the 'a' tag only
        text = x.a.get_text()
        # Split the text in the td tag into a list
        date_scrape = x.td.text.split()
        # If the length of 'date_scrape' is 1, load 'time' as the only element
        if len(date_scrape) == 1:
            time = date_scrape[0]
        # Else load 'date' as the 1st element and 'time' as the second
        else:
            date = date_scrape[0]
            time = date_scrape[1]
        # Extract the ticker from the file name, get the string up to the 1st '_'
        ticker = file_name.split('_')[0]
        # Append ticker, date, time and headline as a list to the 'parsed_news' list
        parsed_news.append([ticker, date, time, text])

parsed_news[:5]  # print first 5 rows of news
Part of your list from the above code will look like this. Notice that it is actually a list of lists, with each list containing the ticker symbol, date, time and corresponding news headline.
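The date handling in the if/else branch is easier to see in isolation. The sketch below applies the same carry-forward idea to a few made-up cell strings; the function name `split_timestamps` and the sample values are hypothetical.

```python
def split_timestamps(cells):
    """Pair each cell with a date, carrying the last seen date forward."""
    current_date = None
    parsed = []
    for cell in cells:
        parts = cell.split()
        if len(parts) == 1:
            # Only a time is present, so reuse the most recent date.
            time = parts[0]
        else:
            # Both date and time are present, so update the current date.
            current_date, time = parts[0], parts[1]
        parsed.append((current_date, time))
    return parsed


# Made-up cells in the order they would appear in the table (newest first).
sample_cells = ['Jan-02-20 09:15AM', '08:40AM', 'Jan-01-20 06:05PM', '04:30PM']
result = split_timestamps(sample_cells)
for date, time in result:
    print(date, time)
```

This works only because rows for the same day are adjacent and the dated row comes first within each day.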
It is now time to perform sentiment analysis with nltk.sentiment.vader, finally! We store the ticker, date, time and headlines in a Pandas DataFrame, perform sentiment analysis on the headlines, and then add columns to the DataFrame to store the sentiment scores for each headline.
# Instantiate the sentiment intensity analyzer
vader = SentimentIntensityAnalyzer()

# Set column names
columns = ['ticker', 'date', 'time', 'headline']

# Convert the parsed_news list into a DataFrame called 'parsed_and_scored_news'
parsed_and_scored_news = pd.DataFrame(parsed_news, columns=columns)

# Iterate through the headlines and get the polarity scores using vader
scores = parsed_and_scored_news['headline'].apply(vader.polarity_scores).tolist()

# Convert the 'scores' list of dicts into a DataFrame
scores_df = pd.DataFrame(scores)

# Join the DataFrames of the news and the list of dicts
parsed_and_scored_news = parsed_and_scored_news.join(scores_df, rsuffix='_right')

# Convert the date column from string to datetime
parsed_and_scored_news['date'] = pd.to_datetime(parsed_and_scored_news.date).dt.date

parsed_and_scored_news.head()
The first 5 rows of the DataFrame from the code above should look something like this. The ‘compound’ column gives the sentiment scores. For positive scores, the higher the value, the more positive the sentiment is. Similarly for negative scores, the more negative the value, the more negative the sentiment is. The scores range from -1 to 1.
Feel free to refer to this article for more information about the nltk.sentiment.vader library and more information on sentiment analysis.
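If you want labels rather than raw numbers, one option is to bucket the compound score. The sketch below assumes the commonly cited VADER convention of ±0.05 as the cut-off for neutral; the function name `label_sentiment` is my own, and the threshold is a parameter you may want to tune.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse label (hypothetical helper)."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Made-up scores for illustration only.
for score in [0.6, 0.0, -0.3]:
    print(score, label_sentiment(score))
```

With a DataFrame like the one above, this could be applied to the ‘compound’ column with `.apply(label_sentiment)`.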
The following code takes the average of the sentiment scores for all news headlines collected during each date and plots it on a bar chart. You can average the scores for each week too, to obtain the overall sentiment for a week.
plt.rcParams['figure.figsize'] = [10, 6]

# Group by ticker and date columns from parsed_and_scored_news and calculate the mean
# (numeric_only=True skips the text columns 'time' and 'headline')
mean_scores = parsed_and_scored_news.groupby(['ticker', 'date']).mean(numeric_only=True)

# Unstack the innermost index level ('date') into columns
mean_scores = mean_scores.unstack()

# Get the cross-section of compound in the 'columns' axis
mean_scores = mean_scores.xs('compound', axis="columns").transpose()

# Plot a bar chart with pandas
mean_scores.plot(kind='bar')
The above code gives rise to the following chart. Notice that on some days without news headlines for any particular stock, there would be no sentiment score.
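For the weekly average mentioned earlier, one possible approach is to group by ticker and by calendar week. The sketch below uses a small made-up DataFrame named `scored`; with the real data you would first convert the ‘date’ column back to datetime with pd.to_datetime, since pd.Grouper needs a datetime column rather than plain date objects.

```python
import pandas as pd

# Made-up scores for illustration; real values come from the scored headlines.
scored = pd.DataFrame({
    'ticker': ['AMZN', 'AMZN', 'AMZN', 'TSLA'],
    'date': ['2020-01-06', '2020-01-08', '2020-01-14', '2020-01-07'],
    'compound': [0.5, -0.1, 0.3, 0.2],
})

# pd.Grouper needs a datetime64 column to bucket rows by week.
scored['date'] = pd.to_datetime(scored['date'])

# freq='W' buckets rows into weeks ending on Sunday.
weekly = scored.groupby(['ticker', pd.Grouper(key='date', freq='W')])['compound'].mean()

print(weekly)
```

Each row of the result is labelled with the Sunday that closes its week, so the two AMZN headlines from 6 and 8 January are averaged together under 12 January.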
I hope you find this useful. All code is available in this Python Notebook in my GitHub repository. Of course, it is now up to you to decide what to do with the sentiment scores obtained! You can try doing machine learning with it if you want!
If you enjoyed this article, you may also wish to read my other articles on my follow-up stock sentiment projects.
First, I built a Stock Sentiment Heat Map Dashboard for a portfolio of stocks in this article.
I also applied everything here to build a Stock Sentiment Dashboard Web App on Flask and deployed it online.