Python/BeautifulSoup: How do I stop treating a list of elements like a single element?
I'm new to Python, and to BeautifulSoup. I'm trying to parse some datasets from a state vendor, which they export in badly-formed HTML. I've successfully gotten one file parsed and exported to CSV, so I was feeling pretty good. On another HTML file, however, the vendor decided to split the data out into multiple tables, separated by wide <td colspan> "header" groups. I have attached a redacted screenshot (at bottom) in case anyone has a better idea about what to do than what I am trying:
I am trying to use BeautifulSoup's decompose() to discard those colspan tags completely, thereby joining the data into one big table that I can parse. In order to find those colspans and decompose them, I'm using some code suggested in this post, and getting an error, and I am too ignorant to figure out how to get this working.
with open("file.html") as fp:
    soup = BeautifulSoup(fp, 'html5lib', from_encoding="us-ascii")#, parse_only=SoupStrainer('td'))
    table = soup.find_all('table')
    for tdcol in table.select('td[colspan]'):
        tdcol.parent.decompose()
    print(table)
is returning AttributeError: ResultSet object has no attribute 'select'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
My question: would someone be kind enough to explain to me what this error means, and what I need to do to address it? I've read the documentation. I've read Stack Overflow posts. I know I'm just not understanding something fundamental. I would really appreciate some helpful guidance.
Solution 1:[1]
[keithpjolley's answer works for me]
table is a list (or list-like, not sure off the top of my head) when you assign it with find_all. On the next line you treat it as if it were a single element, with table.select. I'd use something like for table in soup.find_all('tr'): for tdcol in table.select(...): – keithpjolley
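A minimal sketch of that suggestion, for anyone who wants to see it run. The HTML string and the helper name strip_colspan_rows are made up for illustration (the real vendor file isn't shown), it loops over the tables rather than the rows, and it uses the built-in 'html.parser' instead of 'html5lib' only so that it has no extra dependency. The point is that find_all() returns a ResultSet, which behaves like a list of Tag objects, so you loop over it and call select() on each item.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the vendor file: data rows split up by
# wide colspan "header" rows, spread across two tables.
HTML = """
<table>
  <tr><td>a1</td><td>a2</td></tr>
  <tr><td colspan="2">Group header</td></tr>
  <tr><td>b1</td><td>b2</td></tr>
</table>
<table>
  <tr><td colspan="2">Another header</td></tr>
  <tr><td>c1</td><td>c2</td></tr>
</table>
"""


def strip_colspan_rows(html):
    """Drop every row holding a <td colspan=...>; return the remaining rows as lists of cell text."""
    # 'html.parser' is an assumption made for this sketch; the question uses 'html5lib'.
    soup = BeautifulSoup(html, "html.parser")

    # find_all() gives back a ResultSet (list-like), so iterate over it...
    for table in soup.find_all("table"):
        # ...and call select() on each individual Tag, which does support it.
        for tdcol in table.select("td[colspan]"):
            # The parent of the <td> is its <tr>; decompose() removes the whole row.
            tdcol.parent.decompose()

    # Collect the text of whatever rows are left, across all tables.
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in soup.find_all("tr")
    ]


rows = strip_colspan_rows(HTML)
print(rows)
```

If there is only ever one table in the file, swapping find_all('table') for find('table') should also work, since find() returns a single Tag rather than a ResultSet.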
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
