Yahoo Finance Web Scraping




How can we download fundamentals data with Python?


In this post we will explore how to download fundamentals data with Python. We’ll be extracting fundamentals data from Yahoo Finance using the yahoo_fin package. For more on yahoo_fin, including installation instructions, check out its full documentation here or my YouTube video tutorials here.

Getting started

Now, let’s import the stock_info module from yahoo_fin. This will provide us with the functionality we need to scrape fundamentals data from Yahoo Finance. We’ll also import the pandas package as we’ll be using that later to work with data frames.


Next, we’ll dive into getting common company metrics, starting with P/E ratios.

How to get P/E (Price-to-Earnings) Ratios

There are a couple of ways to get the current P/E ratio for a company. First, we can use the get_quote_table method, which extracts the data found on the summary page of a stock (e.g. https://finance.yahoo.com/quote/AAPL?p=AAPL).

Next, let’s pull the P/E ratio from the dictionary that is returned.
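Here is a minimal sketch of that lookup. It assumes get_quote_table returns a dictionary keyed by the labels on the summary page, with the P/E stored under "PE Ratio (TTM)". The exact key, and whether the value comes back as a float or a string, may vary between yahoo_fin versions. The helper names pe_from_quote and fetch_quote_pe are hypothetical, and the sample dictionary is made up. The fetch step needs yahoo_fin and a network connection, so it lives in its own function.

```python
import math


def pe_from_quote(quote: dict, key: str = "PE Ratio (TTM)") -> float:
    """Return the P/E ratio from a get_quote_table-style dictionary.

    Assumption: the ratio is stored under "PE Ratio (TTM)" and may be a number,
    a numeric string, or missing / "N/A" (e.g. for loss-making companies).
    """
    value = quote.get(key)
    try:
        return float(value)
    except (TypeError, ValueError):
        # Missing keys and placeholders such as "N/A" become NaN
        return math.nan


def fetch_quote_pe(ticker: str) -> float:
    """Download the summary-page table for `ticker` and return its P/E."""
    from yahoo_fin import stock_info as si  # needs yahoo_fin and network access

    return pe_from_quote(si.get_quote_table(ticker))


# Offline check with a hand-made dictionary (illustrative values, not real data)
sample_quote = {"Quote Price": 150.0, "PE Ratio (TTM)": 28.5}
print(pe_from_quote(sample_quote))  # 28.5
```

Returning NaN instead of raising means a loop over many tickers keeps running when one company has no P/E.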

A company’s P/E ratio can also be extracted from the get_stats_valuation method. Running this method returns a data frame of the “Valuation Measures” on the statistics tab for a stock.

Next, let’s extract the P/E ratio.
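One way to do this is sketched below. It assumes the data frame returned by get_stats_valuation holds the metric names in its first column and the most recent values (as strings) in its second. Renaming those two columns to Attribute and Recent gives a predictable shape to filter on. The helpers tidy_valuation and valuation_metric are hypothetical, and the sample table stands in for si.get_stats_valuation("aapl") with invented numbers.

```python
import pandas as pd


def tidy_valuation(val: pd.DataFrame) -> pd.DataFrame:
    """Keep the label column and the most recent value column, with fixed names."""
    tidy = val.iloc[:, :2].copy()
    tidy.columns = ["Attribute", "Recent"]
    return tidy


def valuation_metric(val: pd.DataFrame, label: str) -> float:
    """Return the first value whose Attribute contains `label`, as a float."""
    tidy = tidy_valuation(val)
    match = tidy[tidy["Attribute"].str.contains(label, regex=False, na=False)]
    if match.empty:
        raise KeyError(f"no valuation row matching {label!r}")
    return float(match["Recent"].iloc[0])


# Illustrative stand-in for si.get_stats_valuation("aapl"): the labels mimic
# Yahoo's "Valuation Measures" table, the numbers are made up
val = pd.DataFrame({
    "Unnamed: 0": ["Market Cap (intraday)", "Trailing P/E", "Price/Sales (ttm)"],
    "As of Date: current": ["2.5T", "28.50", "7.10"],
})
print(valuation_metric(val, "Trailing P/E"))  # 28.5
```

The same helper also covers the Price/Sales row: valuation_metric(val, "Price/Sales").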

How to get P/S (Price-to-Sales) Ratios

Another popular metric is the P/S ratio. We can get the P/S ratio, along with several other metrics, using the same get_stats_valuation method. Let’s reuse the data frame we pulled above, currently stored as val.

Then, we can get the Price/Sales ratio by filtering val for the “Price/Sales” row.

Getting fundamentals stats for many stocks at once

Now, let’s get the Price-to-Earnings and Price-to-Sales ratios for each stock in the Dow. The same approach works for a custom list of tickers.
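A minimal sketch of the download-and-combine step follows. It assumes each valuation table has the metric label in its first column and the most recent value in its second. The per-ticker tables are stacked with pd.concat, using the tickers as an outer index level that is then turned into a column. The functions combine_tables and fetch_valuations are hypothetical helpers. The fetch function needs yahoo_fin and network access (e.g. fetch_valuations(si.tickers_dow())). The offline sample uses made-up tickers and values.

```python
import pandas as pd


def combine_tables(tables: dict) -> pd.DataFrame:
    """Stack per-ticker valuation tables into one long data frame.

    `tables` maps ticker -> data frame whose first two columns are the
    metric label and its most recent value.
    """
    tidied = {}
    for ticker, table in tables.items():
        t = table.iloc[:, :2].copy()
        t.columns = ["Attribute", "Recent"]
        tidied[ticker] = t
    # Dict keys become the outer index level; reset_index turns it into a column
    combined = pd.concat(tidied, names=["Ticker", "Row"]).reset_index()
    return combined.drop(columns="Row")


def fetch_valuations(tickers) -> pd.DataFrame:
    """Download the valuation table for every ticker (needs yahoo_fin + network)."""
    from yahoo_fin import stock_info as si

    return combine_tables({t: si.get_stats_valuation(t) for t in tickers})


# Offline example with hypothetical tickers and invented values
sample = {
    "AAA": pd.DataFrame({"label": ["Trailing P/E", "Price/Sales (ttm)"], "value": ["20.1", "3.4"]}),
    "BBB": pd.DataFrame({"label": ["Trailing P/E", "Price/Sales (ttm)"], "value": ["15.0", "1.2"]}),
}
combined_stats = combine_tables(sample)
print(combined_stats)
```

Keeping the result in "long" form (one row per ticker and metric) means any single ratio can be pulled out later with one filter.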

Price-to-Earnings ratio for each Dow stock

Once the valuation tables are combined, the P/E ratio for each stock can be obtained in a single line.
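As a sketch, assume a long data frame with Ticker, Attribute and Recent columns. Filtering on the "Trailing P/E" label and converting the strings to numbers then gives one value per ticker. The helper metric_by_ticker is hypothetical and the sample rows are invented. Entries such as "N/A" become NaN.

```python
import pandas as pd


def metric_by_ticker(combined: pd.DataFrame, label: str) -> pd.Series:
    """Return one float per ticker for rows whose Attribute contains `label`.

    Assumes Ticker, Attribute and Recent columns; unparsable values become NaN.
    """
    rows = combined[combined["Attribute"].str.contains(label, regex=False, na=False)]
    return pd.to_numeric(rows.set_index("Ticker")["Recent"], errors="coerce")


# Invented sample in the assumed long format
combined_stats = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "BBB", "BBB"],
    "Attribute": ["Trailing P/E", "Price/Sales (ttm)", "Trailing P/E", "Price/Sales (ttm)"],
    "Recent": ["20.1", "3.4", "N/A", "1.2"],
})
pe_ratios = metric_by_ticker(combined_stats, "Trailing P/E")
print(pe_ratios)
```

Passing "Price/Sales", "Price/Book", "PEG" or "Forward P/E" as the label gives the other ratios in this section. A bare "P/E" would match both the trailing and forward rows.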

Getting the Price-to-Sales ratio for each Dow stock

Using the same combined data frame, we can get the Price/Sales ratio for each stock.

How to get Price / Book ratio

Similarly, we can get the Price-to-Book ratio for every stock in our list.

How to get PEG ratio

Next, let’s get the PEG ratio (price/earnings-to-growth).

How to get forward P/E ratios

Forward P/E ratios can be pulled from the same combined data frame.
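The individual filters above can also be gathered into one table with a row per ticker and a column per ratio. The sketch below assumes the same long Ticker/Attribute/Recent layout, and that the substrings in RATIO_LABELS identify Yahoo's row labels. Both the mapping and the ratio_table helper are hypothetical, and the sample values are made up. Tickers missing a ratio get NaN.

```python
import pandas as pd

# Substrings assumed to identify rows of Yahoo's "Valuation Measures" table
RATIO_LABELS = {
    "pe": "Trailing P/E",
    "forward_pe": "Forward P/E",
    "peg": "PEG",
    "ps": "Price/Sales",
    "pb": "Price/Book",
}


def ratio_table(combined: pd.DataFrame, labels: dict = RATIO_LABELS) -> pd.DataFrame:
    """Build a ticker-by-ratio table from a long (Ticker, Attribute, Recent) frame."""
    columns = {}
    for name, label in labels.items():
        rows = combined[combined["Attribute"].str.contains(label, regex=False, na=False)]
        rows = rows.drop_duplicates("Ticker")  # keep the first match per ticker
        columns[name] = pd.to_numeric(rows.set_index("Ticker")["Recent"], errors="coerce")
    # Series are aligned on the ticker index; absent ratios become NaN
    return pd.DataFrame(columns)


# Invented sample data
combined_stats = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Attribute": ["Trailing P/E", "Forward P/E", "PEG Ratio (5 yr expected)",
                  "Trailing P/E", "Price/Book (mrq)"],
    "Recent": ["20.1", "18.0", "2.5", "15.0", "4.2"],
})
table = ratio_table(combined_stats)
print(table)
```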

Getting additional stats from multiple stocks

In addition to the “Valuation Measures” table on the statistics tab, we can also scrape the remaining data points on that page using the get_stats method. Calling this method lets us extract metrics like Return on Equity (ROE), Return on Assets, profit margin, etc. For Apple, this is the page at https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL.

Similar to above, we can get this information for each stock in the Dow.

How to get Return on Equity (ROE)

Using the resulting data frame, combined_extra_stats, let’s get Return on Equity for each stock in our list.

How to get Return on Assets

A simple tweak gives us Return on Assets for each stock.

How to get profit margin

To get profit margin, we just need to adjust our filter.
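Values on the statistics page usually come back as percentage strings, so a conversion step helps before comparing companies. The sketch below assumes combined_extra_stats has ticker, Attribute and Value columns, with values like "25.00%". The helper percent_metric is hypothetical and the sample data is invented. It returns fractions (0.25 for "25.00%") and NaN for anything it cannot parse.

```python
import pandas as pd


def percent_metric(combined: pd.DataFrame, label: str) -> pd.Series:
    """Return a percentage metric (e.g. "Return on Equity") as fractions per ticker.

    Assumes ticker, Attribute and Value columns with strings like "25.00%".
    """
    rows = combined[combined["Attribute"].str.contains(label, regex=False, na=False)]
    rows = rows.drop_duplicates("ticker")
    text = (
        rows.set_index("ticker")["Value"]
        .astype(str)
        .str.replace(",", "", regex=False)  # "1,234.5%" -> "1234.5%"
        .str.rstrip("%")
    )
    # "25.00" -> 25.0 -> 0.25; placeholders such as "N/A" -> NaN
    return pd.to_numeric(text, errors="coerce") / 100


# Invented sample in the assumed layout
combined_extra_stats = pd.DataFrame({
    "ticker": ["AAA"] * 3 + ["BBB"] * 3,
    "Attribute": ["Return on Equity (ttm)", "Return on Assets (ttm)", "Profit Margin"] * 2,
    "Value": ["25.00%", "10.50%", "20.00%", "N/A", "-2.00%", "5.00%"],
})
roe = percent_metric(combined_extra_stats, "Return on Equity")
roa = percent_metric(combined_extra_stats, "Return on Assets")
profit_margin = percent_metric(combined_extra_stats, "Profit Margin")
print(roe)
```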

How to get balance sheets

We can extract balance sheets from Yahoo Finance using the get_balance_sheet method. Using the data frame that is returned, we can get several attributes about the stock’s financials, including total cash on hand, assets, liabilities, stockholders’ equity, etc.

How to get total cash on hand

We can see the “Total Cash” row in the balance sheet by filtering for “cash”. This will give us the total cash value for the last several years.

How to get stockholders’ equity

Next, we can also get Total Stockholders’ Equity.

How to get a company’s total assets

Now, let’s get Total Assets.
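The three lookups above select rows from the same data frame. The sketch below does this in one call. It assumes the balance sheet has line items as its index and period end dates as its columns, and that the row labels are "cash", "totalStockholderEquity" and "totalAssets". Check sheet.index on your own output, since labels may change between yahoo_fin versions. The helper balance_sheet_rows is hypothetical, and the two-year sheet is made up (assets equal liabilities plus equity in each column).

```python
import pandas as pd

# Row labels assumed to match yahoo_fin's balance sheet index
BALANCE_ROWS = ["cash", "totalStockholderEquity", "totalAssets"]


def balance_sheet_rows(sheet: pd.DataFrame, rows=BALANCE_ROWS) -> pd.DataFrame:
    """Select line items from a sheet whose rows are items and columns are dates."""
    missing = [r for r in rows if r not in sheet.index]
    if missing:
        raise KeyError(f"rows not found in balance sheet: {missing}")
    return sheet.loc[rows]


# Invented sheet shaped like the assumed get_balance_sheet output
sheet = pd.DataFrame(
    {"2020-12-31": [10.0, 20.0, 100.0, 80.0], "2019-12-31": [12.0, 18.0, 95.0, 77.0]},
    index=["cash", "totalStockholderEquity", "totalAssets", "totalLiab"],
)
print(balance_sheet_rows(sheet))
```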

How to get balance sheets for many stocks at once


Like with the company statistics tables we pulled earlier, we can also download the balance sheet for all the stocks in the Dow (or again, a custom list of your choice).

From here, we could then look at values from the balance sheets across multiple companies at once. For example, the code below combines the balance sheets from each stock in the Dow. Since each individual balance sheet may have different column headers (from different dates), we’ll just get the most recent column of data from the balance sheet for each stock.

Now we have a data frame containing the balance sheet information for each stock in our list. For example, we can filter it for the Total Assets of each Dow stock.
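A minimal sketch of the combine-and-filter step is below. It assumes each statement has line items as its index and period end dates as columns, newest first. Only the first column is kept, since the dates differ between companies, and it is renamed to Recent. The helper combine_recent is hypothetical, and the tickers and figures are invented. The same function works for income statements and cash flow statements.

```python
import pandas as pd


def combine_recent(statements: dict) -> pd.DataFrame:
    """Stack the most recent column of each ticker's statement into a long frame.

    `statements` maps ticker -> data frame with line items as the index and
    period end dates as columns (newest first, by assumption).
    """
    recent = {}
    for ticker, statement in statements.items():
        latest = statement.iloc[:, :1].copy()
        latest.columns = ["Recent"]  # dates differ across companies, so drop them
        recent[ticker] = latest
    return pd.concat(recent, names=["Ticker", "Breakdown"]).reset_index()


# Invented balance sheets for two hypothetical tickers
balance_sheets = {
    "AAA": pd.DataFrame({"2020-12-31": [100.0, 20.0], "2019-12-31": [95.0, 18.0]},
                        index=["totalAssets", "totalStockholderEquity"]),
    "BBB": pd.DataFrame({"2021-03-31": [50.0, 30.0]},
                        index=["totalAssets", "totalStockholderEquity"]),
}
combined_sheets = combine_recent(balance_sheets)
total_assets = combined_sheets[combined_sheets["Breakdown"] == "totalAssets"]
print(total_assets)
```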

How to get income statements

Next, let’s examine income statements. Income statements can be downloaded from Yahoo Finance using the get_income_statement method. See an example income statement here: https://finance.yahoo.com/quote/NFLX/financials?p=NFLX.


Using the income statement, we can examine specific values, such as total revenue, gross profit, total expenses, etc.


Looking at a company’s total revenue

To get the total revenue, we just need to apply a filter as before.

Getting a company’s gross profit

Similarly, we can get the gross profit:
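Once both rows are available, they can be combined. The sketch below divides gross profit by total revenue for every period, which gives the gross margin. Computing this ratio goes slightly beyond the original walkthrough. It assumes the yahoo_fin-style row labels "totalRevenue" and "grossProfit" and dates as columns. The helper gross_margin is hypothetical and the figures are made up.

```python
import pandas as pd


def gross_margin(income: pd.DataFrame) -> pd.Series:
    """Gross profit divided by total revenue for every period in the statement.

    Assumes row labels "totalRevenue" and "grossProfit" and dates as columns.
    """
    revenue = income.loc["totalRevenue"].astype(float)
    gross = income.loc["grossProfit"].astype(float)
    return gross / revenue


# Invented figures shaped like the assumed get_income_statement output
income = pd.DataFrame(
    {"2020-12-31": [200.0, 80.0], "2019-12-31": [160.0, 56.0]},
    index=["totalRevenue", "grossProfit"],
)
print(gross_margin(income))
```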

Getting the income statement from each Dow stock

Next, let’s pull the income statement for each Dow stock.

Now, we can look at metrics in the income statement across multiple companies at once. First, we just need to combine the income statements together, similar to how we combined the balance sheets above.

Now that we have a combined view of the income statements across stocks, we can examine specific values in the income statements, such as Total Revenue, for example.
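With the statements in a long Ticker/Breakdown/Recent layout, comparing companies on a single line item is a filter plus a sort. The sketch below ranks tickers by a chosen item, such as "totalRevenue". The helper rank_by_item, the row label and the sample figures are assumptions for illustration only.

```python
import pandas as pd


def rank_by_item(combined: pd.DataFrame, item: str, top: int = 5) -> pd.Series:
    """Return the `top` tickers with the largest value for one line item.

    Assumes a long frame with Ticker, Breakdown and Recent columns.
    """
    rows = combined[combined["Breakdown"] == item]
    values = pd.to_numeric(rows.set_index("Ticker")["Recent"], errors="coerce")
    return values.nlargest(top)


# Invented combined income statements for three hypothetical tickers
combined_income = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "Breakdown": ["totalRevenue", "grossProfit"] * 3,
    "Recent": [200.0, 80.0, 350.0, 100.0, 90.0, 30.0],
})
print(rank_by_item(combined_income, "totalRevenue", top=2))
```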

How to extract cash flow statements

In this section, we’ll extract cash flow statements. We can do that using the get_cash_flow method.

We can preview the first few rows of the cash flow statement with the data frame’s head method.

Now let’s get the cash flow statements of each Dow stock.

Again, we combine the datasets above, using similar code as before.

Now, we can examine information in the cash flow statements across all the stocks in our list.

Getting dividends paid across companies

One example to look at in a cash flow statement is the amount of dividends paid, which we can see across the companies in our list by using the filter below.
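A sketch of that filter follows. It assumes the combined cash flows use the long Ticker/Breakdown/Recent layout and the row label "dividendsPaid". It also assumes that, as is usual on cash flow statements, payouts are recorded as negative numbers (cash going out). The result is flipped to positive amounts and sorted largest first. The helper dividends_paid and the sample data are hypothetical.

```python
import pandas as pd


def dividends_paid(combined_flows: pd.DataFrame, item: str = "dividendsPaid") -> pd.Series:
    """Dividends paid per ticker as positive amounts, largest first.

    Assumes the row label "dividendsPaid" with payouts stored as negative
    numbers. Tickers without that row are absent from the result.
    """
    rows = combined_flows[combined_flows["Breakdown"] == item]
    values = pd.to_numeric(rows.set_index("Ticker")["Recent"], errors="coerce")
    return values.abs().sort_values(ascending=False)


# Invented combined cash flow data for three hypothetical tickers
combined_flows = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "BBB", "CCC"],
    "Breakdown": ["dividendsPaid", "netIncome", "dividendsPaid", "dividendsPaid"],
    "Recent": [-14.0, 50.0, -3.5, -7.0],
})
print(dividends_paid(combined_flows))
```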

Getting stock issuance information

Here’s another example – this time, we’ll look at debt-related numbers across the cash flow statements.

Conclusion

That’s it for this post! Learn more about web scraping by checking out this online course on Udemy that I co-created with 365 Data Science! You’ll learn all about scraping data from different sources, downloading files programmatically, working with APIs, scraping JavaScript-rendered content, and more! Check it out here!


What is yahoo_fin?

Yahoo_fin is a Python 3 package designed to scrape historical stock price data, as well as to provide current information on market caps, dividend yields, and which stocks comprise the major exchanges. Additional functionality includes scraping income statements, balance sheets, cash flows, holder information, and analyst data. The package includes the ability to scrape live (real-time) stock prices, capture cryptocurrency data, and get the most actively traded stocks on a current trading day. Yahoo_fin also contains a module for retrieving option prices and expiration dates.

The latest version of yahoo_fin can also scrape earnings calendar history and has an additional module for scraping financial news RSS feeds.

If you like yahoo_fin and / or this blog, consider making a contribution here to support the project.



To see in-depth examples, check out my video series on YouTube.


Two intro videos in the series are below.

Installation & Getting historical / real-time stock prices

Easily scraping ticker lists


Updates

Update: March 2021

yahoo_fin 0.8.8 was released in March 2021. This release contains a patch for the tickers_dow method.

Update: Feb. 2021

yahoo_fin 0.8.7 was released in Feb. 2021. This version adds a collection of new features.

  • Added several functions to extract historical earnings calendar data. These functions allow you to get the dates of previous earnings releases along with the expected and actual EPS. You can also get the next earnings date for a stock ticker or search for stocks that have earnings in a particular date range or on an individual date. See get_earnings_history, get_earnings_for_date, and get_earnings_in_date_range in the stock_info module.
  • A new (experimental) module was added to scrape financial news data. Currently, this module can download the Yahoo Finance RSS news feeds for an input ticker.
  • Added functions to get currencies and futures data (get_currencies and get_futures, respectively)
  • Added a function, get_quote_data, which returns a collection of useful data on a stock ticker. It returns a dictionary containing over 70 elements, including current real-time price, company name, book value, P/E, current market state (Open / Closed), pre-market price (if applicable), post-market price (if applicable), and more.
  • You can also now get the current pre-market and post-market (after hours) prices directly (when available). See get_premarket_price and get_postmarket_price in the stock_info module.
  • New ticker lists can be pulled for several other indices: NIFTY 50, NIFTYBANK, FTSE 100, FTSE 250, and Ibovespa.
  • A couple of issues around specific tickers were fixed. For example, get_quote_table would throw an error for some tickers (e.g. “CL=F”) because their pages are structured differently. This issue has been resolved.

    Update: July 11, 2020

    Version 0.8.6 of yahoo_fin made the following changes:

  • By popular demand, quarterly data (in addition to previously available yearly data) can now be downloaded for balance sheets, cash flow statements, and income statements!
  • Earnings information can be pulled with the get_earnings function.
  • A bug affecting ticker extraction was fixed (thanks to Alvaro Ritter!)
  • The code extracting fundamentals data was rewritten to have greater stability.
  • The get_financials function was added, which allows you to more efficiently extract balance sheets, cash flows, and income statements for the same ticker all at once
  • The get_dividends and get_splits functions were added to easily download historical dividends and splits information. Thank you to Daniel Catlin for this contribution!


    Update: April 24, 2020

    This update to yahoo_fin occurred on April 24, 2020 (version 0.8.5). This version updated the get_stats function, as well as added the get_stats_valuation function. Follow the guidance in the installation section below to upgrade yahoo_fin to the latest version.

    Update: December 15, 2019

    An update to this package was pushed on December 15, 2019. This update fixes the issues caused by a recent change in Yahoo Finance’s website. If you have a previously installed version of yahoo_fin, please follow the guidance below to upgrade your installation using pip.

    Recommended Python Version

    A few methods in yahoo_fin require a package called requests_html as a dependency. Since requests_html requires Python 3.6+, you’ll need Python 3.6+ when installing yahoo_fin.

    yahoo_fin Installation

    Yahoo_fin can be installed using pip: pip install yahoo_fin

    If you have a previously installed version, you can upgrade like this: pip install yahoo_fin --upgrade

    Requirements

    Yahoo_fin requires the following packages to be installed:

    With the exception of requests_html, these dependencies come pre-installed with Anaconda. requests_html requires Python 3.6+ and is needed for several of the functions in yahoo_fin, as described above. To install requests_html, you can use pip: pip install requests_html

    Methods

    The yahoo_fin package has three modules. These are called stock_info, options, and news. stock_info has the below primary methods.

    The methods for options are listed below:

    The news module currently contains one method, get_yf_rss.

    stock_info module

    Any method from yahoo_fin’s stock_info module can be imported by running the following line, with get_analysts_info replaced with the method of choice: from yahoo_fin.stock_info import get_analysts_info

    Alternatively, all methods can be imported at once: from yahoo_fin.stock_info import *

    get_analysts_info(ticker)

    Scrapes data from the Analysts page for the input ticker from Yahoo Finance (e.g. https://finance.yahoo.com/quote/NFLX/analysts?p=NFLX). This includes information on earnings estimates, EPS trends / revisions, etc.

    Returns a dictionary containing the tables visible on the ‘Analysts’ page.

    Possible parameters

    get_balance_sheet(ticker, yearly = True)

    Scrapes the balance sheet for the input ticker from Yahoo Finance (e.g. https://finance.yahoo.com/quote/NFLX/balance-sheet?p=NFLX).

    Possible parameters

    get_cash_flow(ticker, yearly = True)

    Scrapes the cash flow statement for the input ticker from Yahoo Finance (e.g. https://finance.yahoo.com/quote/NFLX/cash-flow?p=NFLX).

    Possible parameters

    get_currencies()

    Scrapes the currencies table from Yahoo Finance: https://finance.yahoo.com/currencies

    get_data(ticker, start_date = None, end_date = None, index_as_date = True, interval = "1d")

    Downloads historical price data of a stock into a pandas data frame. Offers the functionality to pull daily, weekly, or monthly data.

    Possible parameters

    If you want to filter by a date range, you can add values for the start_date and / or end_date parameters.

    To get weekly or monthly historical price data, set the interval parameter to "1wk" or "1mo", respectively.

    get_day_gainers()

    Scrapes the top 100 (at most) stocks with the largest gains (on the given trading day) from Yahoo Finance (see https://finance.yahoo.com/gainers).

    get_day_losers()

    Scrapes the top 100 (at most) worst performing stocks (on the given trading day) from Yahoo Finance (see https://finance.yahoo.com/losers).

    get_day_most_active()

    Scrapes the top 100 most active stocks (on the given trading day) from Yahoo Finance (see https://finance.yahoo.com/most-active).

    get_dividends(ticker, start_date = None, end_date = None, index_as_date = True)

    Downloads historical dividend data of a stock into a pandas data frame.

    Possible parameters

    get_earnings(ticker)

    Scrapes earnings information from Yahoo Finance’s financials page for a given ticker (see https://finance.yahoo.com/quote/NFLX/financials?p=NFLX). Returns a dictionary with quarterly actual vs. estimated earnings per share, quarterly revenue / earnings data, and yearly revenue / earnings data.

    Possible parameters

    get_earnings_for_date(date)

    Returns a list of dictionaries. Each dictionary contains a ticker, its corresponding EPS estimate, and the time of the earnings release.
    Possible parameters

    get_earnings_history(ticker)

    Scrapes earnings history information from Yahoo Finance’s financials page for a given ticker. Returns a list of dictionaries with quarterly actual vs. estimated earnings per share along with dates of previous earnings releases. Currently, this method can pull back data for over 20 years.

    Possible parameters

    get_earnings_in_date_range(start_date, end_date)

    Returns a list of dictionaries. Each dictionary contains a ticker, its corresponding EPS estimate, and the time of the earnings release. The data is returned based upon what earnings occur in the input date range. The date range is inclusive of the start_date and end_date inputs.
    Possible parameters

    get_financials(ticker, yearly = True, quarterly = True)

    Efficient method to scrape balance sheets, cash flow statements, and income statements in a single call from Yahoo Finance’s financials page for a given ticker (see https://finance.yahoo.com/quote/NFLX/financials?p=NFLX).

    If you’re looking to get all of this information for a given ticker, or set of tickers, this function will be 3x faster than running get_balance_sheet, get_cash_flow, and get_income_statement separately. Yearly, quarterly, or both time-periods can be pulled.

    Returns a dictionary with the following keys:

    If yearly = True:

  • yearly_income_statement
  • yearly_balance_sheet
  • yearly_cash_flow

    If quarterly = True:

  • quarterly_income_statement
  • quarterly_balance_sheet
  • quarterly_cash_flow

    If yearly and quarterly are both set to True, all six key-value pairs are returned.

    Possible parameters

    get_futures()

    Returns the table of futures prices from Yahoo Finance here: https://finance.yahoo.com/commodities

    get_holders(ticker)

    Scrapes data from the Holders tab on Yahoo Finance for an input ticker (e.g. https://finance.yahoo.com/quote/NFLX/holders?p=NFLX).

    Possible parameters

    get_income_statement(ticker, yearly = True)

    Scrapes the income statement for the input ticker, which includes information such as total revenue and gross profit (e.g. https://finance.yahoo.com/quote/NFLX/financials?p=NFLX).

    Possible parameters

    get_live_price(ticker)

    Scrapes the live quote price for the input ticker.

    Possible parameters

    get_market_status()

    Returns a status specifying whether the market is currently pre-market (“PRE”), open (“OPEN”), post-market (“POST”), or closed (“CLOSED”).

    get_next_earnings_date(ticker)

    Returns the next upcoming earnings date for a given ticker.

    Possible parameters

    get_premarket_price(ticker)

    Returns the premarket price for a given ticker if available / applicable.

    Possible parameters

    get_postmarket_price(ticker)

    Returns the postmarket price for a given ticker if available / applicable.

    Possible parameters

    get_quote_data(ticker)

    Scrapes a collection of over 70 data points for an input ticker from Yahoo Finance (e.g. https://query1.finance.yahoo.com/v7/finance/quote?symbols=NFLX), including current real-time price, company name, book value, 50-day average, 200-day average, pre-market price / post-market price (if available), shares outstanding, and more. The results are returned as a dictionary.

    Possible parameters

    get_quote_table(ticker, dict_result = True)

    Scrapes the primary table found on the quote page of an input ticker from Yahoo Finance (e.g. https://finance.yahoo.com/quote/AAPL?p=AAPL)

    Possible parameters

    The following fields with their corresponding values are returned:

    get_splits(ticker, start_date = None, end_date = None, index_as_date = True)

    Downloads historical stock splits data of a stock into a pandas data frame.

    Possible parameters

    get_stats(ticker)

    Scrapes data off the statistics page for the input ticker, which includes information on moving averages, return on equity, shares outstanding, etc. (e.g. https://finance.yahoo.com/quote/NFLX/key-statistics?p=NFLX).

    Possible parameters

    get_stats_valuation(ticker)

    Scrapes the “Valuation Measures” data off the statistics page for the input ticker, which includes information on Price / Sales, P/E, and market cap (e.g. https://finance.yahoo.com/quote/NFLX/key-statistics?p=NFLX).

    Possible parameters

    get_top_crypto()

    Scrapes data for the top 100 cryptocurrencies by market cap (see https://finance.yahoo.com/cryptocurrencies).

    Possible parameters

    get_undervalued_large_caps()

    Returns the table of the top 100 undervalued large caps from Yahoo Finance here: https://finance.yahoo.com/screener/predefined/undervalued_large_caps?offset=0&count=100

    tickers_dow(include_company_data = False)

    If no parameters are passed, returns a list of tickers currently in the Dow Jones Industrial Average. The tickers are scraped from Wikipedia (see https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average). If you set include_company_data = True, it will return the full table on this webpage.

    tickers_ftse100(include_company_data = False)

    If no parameters are passed, returns a list of tickers currently listed on the FTSE 100 index. Otherwise, setting include_company_data = True will return a table with ticker, sector, and company name. The tickers are scraped from here: https://en.wikipedia.org/wiki/FTSE_100_Index.

    tickers_ftse250(include_company_data = False)


    If no parameters are passed, returns a list of tickers currently listed on the FTSE 250 index. Otherwise, setting include_company_data = True will return a table with ticker and company name. The tickers are scraped from here: https://en.wikipedia.org/wiki/FTSE_250_Index.

    tickers_nasdaq(include_company_data = False)

    Returns a list of tickers currently listed on the NASDAQ. If you specify include_company_data = True, it will return a table containing the tickers, their corresponding company names, and several other attributes. This method, along with tickers_other, works by scraping text files from ftp://ftp.nasdaqtrader.com/SymbolDirectory/.

    tickers_nasdaq scrapes the nasdaqlisted.txt file from the link above, while tickers_other scrapes the otherlisted.txt file.

    tickers_nifty50(include_company_data = False)

    Returns a list of tickers currently listed on the NIFTY50. This method scrapes the tickers from here: https://en.wikipedia.org/wiki/NIFTY_50. If include_company_data is set to True, a table containing the tickers and company names is returned.

    tickers_niftybank()

    Returns a list of tickers currently listed on the NIFTYBANK. No parameters need to be passed.

    tickers_other(include_company_data = False)

    See above description for tickers_nasdaq.

    tickers_sp500(include_company_data = False)

    Returns a list of tickers currently listed in the S&P 500. The data for this is scraped from Wikipedia.

    If include_company_data is set to True, the tickers, company names, and sector information are returned as a data frame.


    options module

    We can import any method from the options module like this: from yahoo_fin.options import get_options_chain

    Just replace get_options_chain with any other method. We can also import all methods at once: from yahoo_fin.options import *

    get_calls(ticker, date = None)

    Scrapes call options data for the input ticker from Yahoo Finance (e.g. https://finance.yahoo.com/quote/NFLX/options?p=NFLX).

    Returns a pandas data frame containing the call options data for the given ticker and expiration date.

    Possible parameters

    get_expiration_dates(ticker)

    Scrapes expiration dates for the input ticker from Yahoo Finance (e.g. https://finance.yahoo.com/quote/NFLX/options?p=NFLX).

    Returns a list of expiration dates for the input ticker. This list is based on the drop-down selection box on the options data webpage for the input ticker.

    Possible parameters

    get_options_chain(ticker, date)

    Scrapes calls and puts tables for the input ticker from Yahoo Finance (e.g. https://finance.yahoo.com/quote/NFLX/options?p=NFLX).

    Returns a dictionary with two data frames. The keys of the dictionary are labeled calls (which maps to the calls data table) and puts (which maps to the puts data table).

    Possible parameters

    get_puts(ticker, date = None)

    Scrapes put options data for the input ticker from Yahoo Finance (e.g. https://finance.yahoo.com/quote/NFLX/options?p=NFLX).

    Returns a pandas data frame containing the put options data for the given ticker and expiration date.

    Possible parameters

    yahoo_fin news module

    Currently the news module contains a single function, get_yf_rss, which retrieves the Yahoo Finance news RSS feeds for an input ticker.


    To learn more about Python and / or open source coding, check out a new online Python course I co-created with 365 Data Science! You’ll learn all about web scraping, how to use APIs in Python, how to scrape JavaScript pages, and how to deal with other modern challenges like logging into websites! Check it out on Udemy here!