Why Is This Tag Empty When Parsed With Beautiful Soup?
Solution 1:
There is no <tbody> tag in the HTML.
If you inspect the page in a browser (e.g. with Chrome developer tools), it looks like there is a <tbody> tag, but the browser added that tag to the DOM while parsing the table; it is not in the HTML source that Beautiful Soup receives.
Try omitting both tags in your search chain. I am certain the first one isn't there and (although the HTML is hard to read) I'm pretty sure the second isn't there either.
Update: Here is the HTML, beginning with the table you are interested in:
<TABLE class="yfnc_tabledata1" width="100%" cellpadding="0" cellspacing="0" border="0">
<TR>
<TD>
<TABLE width="100%" cellpadding="2" ...>
<TR class="yfnc_modtitle1" style="border-top:none;">
<td colspan="2" style="border-top:2px solid #000;">
<small><span class="yfi-module-title">Period Ending</span></small>
</td>
<th scope="col" style="border-top:2px ...">27/09/2014</th>
<th scope="col" style="border-top:2px ...">28/06/2014</th>
...
So there are no <tbody> tags.
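As a minimal sketch of the point above: the markup below is a hypothetical, trimmed-down version of the snippet (not the real page), parsed with Python's built-in "html.parser" backend, which is an assumption since the question does not say which parser was used. It shows that a search for tbody comes back empty, while searching for the cells directly works.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the snippet above (not the live page)
html = """
<TABLE class="yfnc_tabledata1">
<TR><TD>
<TABLE>
<TR><td>Period Ending</td><th>27/09/2014</th><th>28/06/2014</th></TR>
</TABLE>
</TD></TR>
</TABLE>
"""

# Assumption: the built-in "html.parser" backend
soup = BeautifulSoup(html, "html.parser")
outer = soup.find("table", class_="yfnc_tabledata1")

# The source contains no <tbody>, so this search finds nothing
tbody = outer.find("tbody")
print(tbody)

# Skip the tbody step and look for the cells directly
headers = [th.get_text(strip=True) for th in outer.find_all("th")]
print(headers)
```

Note that some parser backends (html5lib in particular) build the tree the way a browser would, so the result of the tbody search can depend on which parser you pass to BeautifulSoup.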
Solution 2:
Let's be specific and practical.
The idea is to find the Total Revenue label and get the next cell's text using .next_sibling:
import re
table = soup.find("table", class_="yfnc_tabledata1")
total_revenue_label = table.find(text=re.compile(r'Total Revenue'))
# Climb two levels up from the text node, then read the next sibling's text
print(total_revenue_label.parent.parent.next_sibling.get_text(strip=True))
Demo:
>>> import re
>>> import requests
>>> import bs4
>>>
>>> page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")
>>> soup = bs4.BeautifulSoup(page.content)
>>>
>>> table = soup.find("table", class_="yfnc_tabledata1")
>>> total_revenue_label = table.find(text=re.compile(r'Total Revenue'))
>>> total_revenue_label.parent.parent.next_sibling.get_text(strip=True)
42,123,000
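The same idea can be tried offline. This is only a sketch: the markup is hypothetical sample HTML (the real page's nesting may differ), and it swaps .parent.parent.next_sibling for find_parent and find_next_sibling, which skip the whitespace text nodes that can sit between cells. The string= argument assumes a reasonably recent Beautiful Soup 4 release; older versions use text= as in the answer above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure may differ
html = """
<table class="yfnc_tabledata1">
  <tr>
    <td><strong>Total Revenue</strong></td>
    <td align="right"><strong>42,123,000</strong></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="yfnc_tabledata1")

# Find the text node that holds the label
label = table.find(string=re.compile(r"Total Revenue"))

# Walk up to the enclosing cell, then across to the next <td>,
# ignoring any whitespace-only text nodes in between
label_cell = label.find_parent("td")
value_cell = label_cell.find_next_sibling("td")

value = value_cell.get_text(strip=True)
print(value)
```

Using find_next_sibling("td") rather than .next_sibling is a design choice for robustness: with indented HTML, .next_sibling may return a newline string instead of a tag, and calling get_text on it would fail.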
Solution 3:
To answer your general question:
I suggest the book "Mining the Social Web", second edition, especially chapter 5, "Mining Web Pages".
Source code for the book is available on GitHub.
Solution 4:
I think there are probably better ways of getting the data you want. It has been provided for free for a number of years by a number of institutions, e.g. is the information you want in here somewhere?