python - AttributeError when scraping web data with Python


I am trying to access the data in the table at this URL. I am using the code below, but I get AttributeError: 'NoneType' object has no attribute 'find' on the line data = iter(soup.find("table", {"class": "xtTblCon"}).find("div", {"id": "MATURITYY%"}).find_all_next("li")). The code is:

from bs4 import BeautifulSoup
import requests

r = requests.get(
    "http://appsso.eurostat.ec.europa.eu/nui/submitViewTableAction.do")
soup = BeautifulSoup(r.content, "html.parser")

data = iter(soup.find("table", {"class": "xtTblCon"}).find("div", {"id": "MATURITYY%"}).find_all_next("li"))

Edit: sorry, this is the original URL. I had to "customize" the view by clicking the "Time" toolbar and checking all years back to 2007. Is there a way to collect all of that data?

Thanks


2 Answers:

Answer 0 (score: 1)

The column names are duplicated, so I use an OrderedDict to keep their order while dropping the dupes; the rows are grouped into sublists, and the maturity labels go into one list with one entry per row:

from collections import OrderedDict

# the table container actually has the class "xTable"
data = soup.find("table", {"class": "xTable"})

# fromkeys keeps the first occurrence of each title and preserves order,
# removing the duplicated column names
headers = OrderedDict.fromkeys(
    s["title"] for s in soup.find("div", {"class": "xtRowCon"}).find_all("span"))

# one sublist of stripped cell texts per table row
rows = [[ele.text.strip() for ele in tag.find_all("td")]
        for tag in data.find_all("tr")]

# one maturity label per row
maturity = [ele.find("span", {"class": "label_MATURITY"}).text.strip()
            for ele in soup.find("div", {"class": "xtTblCon"}).find_all("li")]

print(headers.keys())
print(rows)
print(maturity)

Output:

['2015M05D27', '2015M05D28', '2015M05D29', '2015M06D01', '2015M06D02', '2015M06D03', '2015M06D04', '2015M06D05', '2015M06D08', '2015M06D09']

[[u'-0.24', u'-0.26', u'-0.25', u'-0.25', u'-0.25', u'-0.22', u'-0.25', u'-0.24', u'-0.23', u'-0.22'], [u'-0.22', u'-0.23', u'-0.23', u'-0.23', u'-0.20', u'-0.18', u'-0.20', u'-0.19', u'-0.18', u'-0.16'], [u'-0.15', u'-0.15', u'-0.16', u'-0.16', u'-0.11', u'-0.07', u'-0.09', u'-0.07', u'-0.07', u'-0.04'], [u'-0.04', u'-0.05', u'-0.06', u'-0.07', u'0.01', u'0.08', u'0.06', u'0.08', u'0.09', u'0.13'], [u'0.08', u'0.07', u'0.06', u'0.05', u'0.15', u'0.24', u'0.23', u'0.25', u'0.26', u'0.31'], [u'0.21', u'0.20', u'0.18', u'0.17', u'0.29', u'0.41', u'0.40', u'0.42', u'0.43', u'0.49'], [u'0.34', u'0.32', u'0.30', u'0.29', u'0.43', u'0.56', u'0.56', u'0.58', u'0.59', u'0.66'], [u'0.46', u'0.43', u'0.42', u'0.40', u'0.55', u'0.70', u'0.70', u'0.72', u'0.74', u'0.81'], [u'0.57', u'0.54', u'0.52', u'0.50', u'0.66', u'0.82', u'0.82', u'0.85', u'0.87', u'0.94'], [u'0.66', u'0.63', u'0.61', u'0.59', u'0.75', u'0.93', u'0.93', u'0.95', u'0.98', u'1.06'], [u'0.74', u'0.71', u'0.69', u'0.67', u'0.84', u'1.02', u'1.02', u'1.05', u'1.07', u'1.16'], [u'0.81', u'0.78', u'0.76', u'0.74', u'0.91', u'1.11', u'1.10', u'1.13', u'1.16', u'1.24'], [u'0.88', u'0.84', u'0.82', u'0.80', u'0.97', u'1.18', u'1.17', u'1.20', u'1.23', u'1.32'], [u'0.93', u'0.90', u'0.88', u'0.85', u'1.03', u'1.24', u'1.23', u'1.26', u'1.29', u'1.38'], [u'0.98', u'0.95', u'0.92', u'0.90', u'1.08', u'1.29', u'1.29', u'1.32', u'1.35', u'1.44'], [u'1.02', u'0.99', u'0.97', u'0.94', u'1.12', u'1.34', u'1.33', u'1.36', u'1.40', u'1.49'], [u'1.06', u'1.03', u'1.00', u'0.98', u'1.16', u'1.39', u'1.37', u'1.41', u'1.44', u'1.53'], [u'1.10', u'1.06', u'1.04', u'1.01', u'1.19', u'1.42', u'1.41', u'1.44', u'1.48', u'1.57'], [u'1.13', u'1.09', u'1.07', u'1.04', u'1.22', u'1.46', u'1.45', u'1.48', u'1.51', u'1.61'], [u'1.16', u'1.12', u'1.09', u'1.06', u'1.25', u'1.49', u'1.48', u'1.51', u'1.54', u'1.64'], [u'1.18', u'1.15', u'1.12', u'1.09', u'1.27', u'1.52', u'1.50', u'1.53', u'1.57', u'1.67'], [u'1.20', u'1.17', u'1.14', 
u'1.11', u'1.30', u'1.54', u'1.53', u'1.56', u'1.60', u'1.69'], [u'1.22', u'1.19', u'1.16', u'1.13', u'1.32', u'1.56', u'1.55', u'1.58', u'1.62', u'1.72'], [u'1.24', u'1.21', u'1.18', u'1.15', u'1.34', u'1.59', u'1.57', u'1.60', u'1.64', u'1.74'], [u'1.26', u'1.23', u'1.20', u'1.17', u'1.35', u'1.61', u'1.59', u'1.62', u'1.66', u'1.76'], [u'1.28', u'1.24', u'1.21', u'1.18', u'1.37', u'1.62', u'1.61', u'1.64', u'1.68', u'1.78'], [u'1.29', u'1.26', u'1.23', u'1.20', u'1.38', u'1.64', u'1.62', u'1.66', u'1.70', u'1.80'], [u'1.31', u'1.27', u'1.24', u'1.21', u'1.40', u'1.66', u'1.64', u'1.67', u'1.71', u'1.81'], [u'1.32', u'1.28', u'1.26', u'1.22', u'1.41', u'1.67', u'1.65', u'1.69', u'1.73', u'1.83'], [u'1.33', u'1.30', u'1.27', u'1.23', u'1.42', u'1.68', u'1.67', u'1.70', u'1.74', u'1.84']]

[u'Maturity: 1 year', u'Maturity: 2 years', u'Maturity: 3 years', u'Maturity: 4 years', u'Maturity: 5 years', u'Maturity: 6 years', u'Maturity: 7 years', u'Maturity: 8 years', u'Maturity: 9 years', u'Maturity: 10 years', u'Maturity: 11 years', u'Maturity: 12 years', u'Maturity: 13 years', u'Maturity: 14 years', u'Maturity: 15 years', u'Maturity: 16 years', u'Maturity: 17 years', u'Maturity: 18 years', u'Maturity: 19 years', u'Maturity: 20 years', u'Maturity: 21 years', u'Maturity: 22 years', u'Maturity: 23 years', u'Maturity: 24 years', u'Maturity: 25 years', u'Maturity: 26 years', u'Maturity: 27 years', u'Maturity: 28 years', u'Maturity: 29 years', u'Maturity: 30 years']
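The OrderedDict.fromkeys trick used for the headers above can be seen in isolation; a minimal sketch with made-up duplicate titles:

```python
from collections import OrderedDict

# fromkeys keeps only the first occurrence of each key, in insertion order,
# which is how the duplicated column titles are deduplicated above
titles = ["2015M05D27", "2015M05D28", "2015M05D27", "2015M05D29"]
unique = list(OrderedDict.fromkeys(titles))
print(unique)  # ['2015M05D27', '2015M05D28', '2015M05D29']
```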

If you want to group the rows by maturity, you can build a dict with OrderedDict to keep the order:

print(OrderedDict(zip(maturity, rows)))

OrderedDict([(u'Maturity: 1 year', [u'-0.24', u'-0.26', u'-0.25', 
u'-0.25', u'-0.25', u'-0.22', u'-0.25', u'-0.24', u'-0.23', 
u'-0.22']), (u'Maturity: 2 years', [u'-0.22', u'-0.23', u'-0.23', 
u'-0.23', u'-0.20', u'-0.18', u'-0.20', u'-0.19', u'-0.18',    u'-0.16']), (u'Maturity: 3 years', [u'-0.15', u'-0.15', u'-0.16', 
u'-0.16', u'-0.11', ..........................
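To actually "collect all this data", one option is writing the grouped rows out as CSV with the standard library. This is a sketch with a tiny hypothetical sample standing in for the scraped headers, maturity, and rows lists:

```python
import csv
import io

# small made-up sample in the same shape as the scraped lists above
headers = ["2015M05D27", "2015M05D28"]
maturity = ["Maturity: 1 year", "Maturity: 2 years"]
rows = [["-0.24", "-0.26"], ["-0.22", "-0.23"]]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["maturity"] + headers)      # header row: label + dates
for label, row in zip(maturity, rows):       # one CSV row per maturity
    writer.writerow([label] + row)

print(buf.getvalue())
```

Replacing the StringIO buffer with open("rates.csv", "w", newline="") writes the same data to a file.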

Answer 1 (score: -1)

First, there is no table with the class xtTblCon; that element is actually a div, so change 'table' to 'div'. Second, there is no div with the id MATURITYY%.
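In other words, find() returns None when nothing matches, and chaining another .find() onto that None is what raises the AttributeError. BeautifulSoup shares this None-on-no-match convention with the standard library's ElementTree, so the guard pattern can be sketched without any third-party dependency:

```python
import xml.etree.ElementTree as ET

# find() returns None when nothing matches -- chaining .find() or .text
# onto that None raises AttributeError, just like in the question.
root = ET.fromstring("<page><div>MATURITY</div></page>")

table = root.find("table")   # there is no <table> element -> None
assert table is None         # table.find(...) here would crash

# guard each step: only descend when the previous lookup succeeded
div = root.find("div")
label = div.text if div is not None else None
print(label)
```

Applying the same check after each soup.find(...) call (or printing the intermediate result) pinpoints which selector is wrong.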
