Retrieve Multiple Strings Out of HTML TD Elements, Delimit the Strings, and Join Them in a Loop Over TD Elements?
So I'm trying to learn how to use Beautiful Soup to get data from a website that joins several key strings into one block of text.
<html>
<head>
<center>
<font face="arial" size="5">
<table border="0" cellpadding="0" cellspacing="0" width="100%" bgcolor="#000066">
<tr>
<td align="left" valign="top" bgcolor="#000066">
<a href="/"><img height="50" width="540" src="/leftbar-quote.gif" border="0" usemap="#leftbar10b39c7"></a>
<map name="leftbar10b39c7"><area href="/outside/multi.htm" coords="328,5,390,36" shape="rect">
<area href="/index.htm" coords="254,5,322,37" shape="rect">
<area href="#" coords="185,5,251,35" shape="rect" onclick="history.back(); return false;">
<area href="/cgi-bin/quoteForm.cgi?type=q&sEmail=&part=Engine&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&name=AutoPartex.net&int=-1&uIMS=&userSearch=exact&seqNum=600000000000000000456918622&ref=&userid=1000&email=&userClaim=&userLang=&userZip=&selleruserid=1000" coords="400,5,460,36" shape="rect">
<area href="/buyerfaq.htm" coords="470,5,530,36" shape="rect">
</map>
</td>
<td valign=top><div align="right"><img height="50" width="36" src="/result-rs.gif"></div></td>
</tr>
<tr>
<td COLSPAN=2><table WIDTH="100%"><tr>
<td width="10" valign="top"><img height="30" width="10" src="/trans4.gif"></td>
<td width="90%">
<b>
<div style='font-size:18pt; font-style: italic; color: white;'><b>Results sorted by <u>PRICE</u></b> <span class="small"><b>(Click on heading to re-sort)</b></span><br /></div><font color='#FFFFFF' face='Arial,Helvetica,Geneva,Swiss,SunSans-Regular' size='2'>Click back to modify your previous choice.<br>Most prices do not include extended warranties or shipping.<br>Not all displayed parts are interchangeable. Please verify with the recycler that the part fits your auto.<br /></font></b></td><td valign=bottom align=center><table bgcolor="#e4e4e4"width=350 cellpadding=3 border=1 cellspacing=0><tr><td align=center><form method="post" action="/cgi-bin/search.cgi" style="display: inline"><input type= hidden name=userDate value="2005"><input type= hidden name=userModel value="Ford Focus"><input type= hidden name=userLocation value="USA"><input type= hidden name=userPreference value="price"><input type= hidden name=userZip value=""><input type="hidden" name="userPage" value="1"><input type="hidden" name="userInterchange" value="None"><input type="hidden" name="userDate2" value="Ending Year"><input type="hidden" name="userSearch" value="int"><input type="hidden" NAME="userClaim" VALUE="">
<input type="hidden" NAME="userClaimer" VALUE="">
<input type="hidden" NAME="userLang" VALUE="">
<input type="hidden" NAME="userLat" VALUE="">
<input type="hidden" NAME="userLong" VALUE="">
<input type="hidden" NAME="userCSA" VALUE="">
<input type="hidden" NAME="userMCO" VALUE="">
<input type="hidden" NAME="userAdjuster" VALUE="">
<input type="hidden" NAME="userItem" VALUE="">
<input type="hidden" NAME="hpsDate" VALUE="">
<input type="hidden" NAME="hpsGroup" VALUE="">
<input type="hidden" NAME="reqId" VALUE="">
<input type="hidden" NAME="thirdMapType" VALUE="">
<input type="hidden" NAME="vendUrl" VALUE="">
<input type="hidden" NAME="iCN" VALUE="">
<input type='hidden' name='limitYears' value=''>
<input type='hidden' name='userIntSelect' value='711575'>
<input type='hidden' name='userVIN' value=''>
<input type='hidden' name='vinSearch' value='0'>
<input type='hidden' name='userVINModelID' value=''>
<input type="hidden" name="uID" value=""><input type="hidden" name="uPass" value=""><table bgcolor="#e4e4e4" width=350 cellpadding=3 border=1 cellspacing=0><tr><td colspan=2 align=center>2005 Ford Focus<br>Engine<br></td></tr><tr>
<td align=center>
<font style="font-size: 10pt">Non-Interchange search for year:<br></font>
<font style="font-size: 10pt"><b>2005</b><br><br></font>
<br>
<br><font style="font-size: 8pt"><a style="color:blue" href="/cgi-bin/search.cgi?userDate=2005&userModel=Ford%20Focus&userPart=Engine&origPart=&userPreference=price&userZip=&userLat=&userLong=&userVIN=&dbPart=300.1&userIntSelect=711575&userClaimer=&userClaim=&uID=&uPass=&userLocation=USA&userSearch=int">Click Here</a> to see All Interchange Choices </font>
</td>
</table></table></form>
</td></tr></table></td></tr></table><table width="100%" border="1" cellspacing="0" cellpadding="4">
<tr align=center>
<td><a href='/cgi-bin/search.cgi?userSearch=exact&userPID=1000&userLocation=USA&userIMS=&userInterchange=%5B%7C%7Br&userSide=&userDate=2005&userDate2=2005&dbModel=27.20&userModel=Ford%20Focus&dbPart=300.1&userPart=Engine&sessionID=600000000000000000456918622&userPreference=year&userIntSelect=711575&userUID=0&userBroker=&userPage=1&iKey='>Year</a><br>Part<br>Model</td>
<td>Description</td>
<td><a href='/cgi-bin/search.cgi?userSearch=exact&userPID=1000&userLocation=USA&userIMS=&userInterchange=%5B%7C%7Br&userSide=&userDate=2005&userDate2=2005&dbModel=27.20&userModel=Ford%20Focus&dbPart=300.1&userPart=Engine&sessionID=600000000000000000456918622&userPreference=miles&userIntSelect=711575&userUID=0&userBroker=&userPage=1&iKey='>Miles</a></td>
<td><a href='/cgi-bin/search.cgi?userSearch=exact&userPID=1000&userLocation=USA&userIMS=&userInterchange=%5B%7C%7Br&userSide=&userDate=2005&userDate2=2005&dbModel=27.20&userModel=Ford%20Focus&dbPart=300.1&userPart=Engine&sessionID=600000000000000000456918622&userPreference=grade&userIntSelect=711575&userUID=0&userBroker=&userPage=1&iKey='>Part <br> Grade</a></td> <td>Stock#</td>
<td>US<br>Price</td>
<td>Dealer Info</td></tr><tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td><a href=""><img width="100" hspace="3" align="middle" onclick="return popupImg('seller=2013&partGUID=2013-1-282435&vehicleGUID=2013-1-V18432&display=2005%20Ford%20Focus%20Engine%20Assembly-Stock%23%2010286')" src="http://wsimgoh.autopartex.net/2013/2015/10286/2013_18432_05_thumb.jpg"></img></a>ZX4,2.0,EFI,FATO,FWDRUNSGREAT</td><td align=right> </td><td align=center> </td><td>10286</td><td align=center>$350550</td><td><A HREF="http://www.LaPointAuto.com" target="_top">LaPoint Discount MIDW</A> USA-OH(Holland) <A HREF="/cgi-bin/quoteForm.cgi?type=g&[email protected]&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=10286&price=350550&desc=ZX4%2C2.0%2CEFI%2CFATO%2CFWDRUNSGREAT&name=LaPoint%20Discount%20MIDW&url=http://www.LaPointAuto.com&int=-1&broker=0&recycler=0&selleruserid=2013&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</A> 419-865-2329 / 800-845-0270 <A HREF="/cgi-bin/quoteForm.cgi?type=i&[email protected]&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=10286&price=350550&desc=ZX4%2C2.0%2CEFI%2CFATO%2CFWDRUNSGREAT&name=LaPoint%20Discount%20MIDW&url=http://www.LaPointAuto.com&int=-1&broker=0&recycler=0&selleruserid=2013&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</A><br><a target=_blank href="http://appcgi.autopartex.net/cgi-bin/applet.cgi?sid=2013&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=350550&pst=10286&pgr=&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick='window.open(this.href,this.target,getPrm()); return false'><img src='/images/LiveChat_space.gif' border=0></a></b></td></tr><tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td>TESTED,2.3L,5MT,08/04,FWD,+CORE</td><td align=right> 
</td><td align=center> </td><td>E94764</td><td align=center>$1500</td><td><A HREF="http://www.ParadiseAutoParts.com" target="_top">Paradise Auto Parts-ELITE</A> USA-MD(Elkton) <A HREF="/cgi-bin/quoteForm.cgi?type=g&[email protected]&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=E94764&price=1500&desc=TESTED%2C2.3L%2C5MT%2C08%2F04%2CFWD%2C%2BCORE&name=Paradise%20Auto%20Parts-ELITE&url=http://www.ParadiseAutoParts.com&int=-1&broker=0&recycler=0&selleruserid=2843&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</A> 888-811-5051/410-620-5051 <A HREF="/cgi-bin/quoteForm.cgi?type=i&[email protected]&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=E94764&price=1500&desc=TESTED%2C2.3L%2C5MT%2C08%2F04%2CFWD%2C%2BCORE&name=Paradise%20Auto%20Parts-ELITE&url=http://www.ParadiseAutoParts.com&int=-1&broker=0&recycler=0&selleruserid=2843&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</A><br><a target=_blank href="http://appcgi.autopartex.net/cgi-bin/applet.cgi?sid=2843&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=1500&pst=E94764&pgr=&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick='window.open(this.href,this.target,getPrm()); return false'><img src='/images/LiveChat_space.gif' border=0></a></b></td></tr><tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td>175-175</td><td align=right>38,916</td><td align=center>A</td><td>FC6555</td><td align=center>$1250</td><td><A HREF="http://www.DonsSportcar.com" target="_top">Don's Sportcar</A> USA-CO(Pueblo) <A HREF="/cgi-bin/quoteForm.cgi?type=g&[email 
protected]&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=FC6555&price=1250&desc=175-175&name=Don's%20Sportcar&url=http://www.DonsSportcar.com&int=-1&broker=0&recycler=0&selleruserid=3776&miles=38.916&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</A> 800-332-3649 <A HREF="/cgi-bin/quoteForm.cgi?type=i&[email protected]&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=FC6555&price=1250&desc=175-175&name=Don's%20Sportcar&url=http://www.DonsSportcar.com&int=-1&broker=0&recycler=0&selleruserid=3776&miles=38.916&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</A><br><a target=_blank href="http://appcgi.autopartex.net/cgi-bin/applet.cgi?sid=3776&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=1250&pst=FC6555&pgr=A&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick='window.open(this.href,this.target,getPrm()); return false'><img src='/images/LiveChat_space.gif' border=0></a></b></td></tr>
</table>
</div>
</body> </html>
This is the HTML text and structure. Here's what I actually need help with in terms of approach:
With no CSS classes or IDs on the elements, I'm not able to locate them using the usual examples I've found with XPath or tools like Selenium. I need the text in each cell to be separated into individual strings.
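Even without class or id hooks, BeautifulSoup can match a tag on any of its attributes. A minimal sketch, assuming Python 3, the built-in "html.parser" (so lxml isn't required), and that the real page keeps the attribute values seen in the sample: the results table is the only one with both border="1" and cellpadding="4" (the search box also uses border=1, but with cellpadding=3). The HTML is a trimmed stand-in for the page, and find_results_table is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the page in the question (assumption: the real page
# keeps these same attribute values on its tables).
HTML = """
<table border="0" cellpadding="0" cellspacing="0" width="100%" bgcolor="#000066">
<tr><td>banner</td></tr>
</table>
<table bgcolor="#e4e4e4" width=350 cellpadding=3 border=1 cellspacing=0>
<tr><td>2005 Ford Focus<br>Engine<br></td></tr>
</table>
<table width="100%" border="1" cellspacing="0" cellpadding="4">
<tr align=center><td>Year<br>Part<br>Model</td><td>Description</td></tr>
<tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td>175-175</td></tr>
</table>
"""


def find_results_table(soup):
    """Return the parts table, matched on attribute values instead of classes.

    The search box also has border=1, so cellpadding="4" is what singles
    out the results table in this sample.
    """
    return soup.find("table", attrs={"border": "1", "cellpadding": "4"})


soup = BeautifulSoup(HTML, "html.parser")
table = find_results_table(soup)
# The first row holds the column headings; the rest are parts listings.
header, *data_rows = table.find_all("tr")
print(len(data_rows))
```

Matching on layout attributes is fragile if the site changes its markup, but on a page with no classes it is often the only stable handle.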
Using BeautifulSoup, I tried a few methods to get the text.
After trying something like this, I get the following error:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("./test.html"), "lxml")
trs = soup.find_all('tr')
for tr in trs:
    tds = tr.find_all("td")
    try:
        result = str(tds[0].get_text())
    except:
        adjust = ' '
        continue
    result = result.split(" ")
    result = str.replace('2005Engine', "2005Engine", "2005 ") + str.replace('AssemblyFord', "AssemblyFord", "Engine Assembly ") + str.repl$
    strresult = ''.join(result)
trs = soup.find_all('tr')
for tr in trs:
    tds = tr.find_all("td")
    tds[0] = strresult
    tds.get_text()
    print(tds)
Error message:
Traceback (most recent call last):
  File "carpartbs5.find.td.py", line 33, in <module>
    tds.get_text()
  File "/usr/local/lib/python2.7/dist-packages/bs4/element.py", line 1807, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'get_text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
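The traceback comes from calling get_text() on the list itself: find_all() returns a ResultSet, which is a list of Tag objects, while get_text() is a method of each individual Tag. A minimal sketch, assuming Python 3 and "html.parser" with a one-row stand-in for the page, that calls it once per element and builds a new list of strings (instead of overwriting tds[0] with a string and leaving the other items as Tags):

```python
from bs4 import BeautifulSoup

HTML = "<table><tr><td>10286</td><td align=center>$350550</td></tr></table>"

soup = BeautifulSoup(HTML, "html.parser")
tds = soup.find("tr").find_all("td")  # a ResultSet, i.e. a list of Tag objects

# get_text() exists on each Tag, not on the list that holds them,
# which is why tds.get_text() raises AttributeError. Call it per element:
texts = [td.get_text(strip=True) for td in tds]
print(texts)  # ['10286', '$350550']
```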
Here is the flip side:
When I just print tds, the first td is replaced with my string as intended; however, whenever I try to return only the text with BeautifulSoup's get_text() method, it throws that error. The error seems to say that I'm calling a method on an object that doesn't support it.
So I'm not really clear on lists versus strings. Toward the end I tried converting my list to an actual string, and that doesn't work either. I assume the reason it can't get the text is that I'm using a list. If so, is there a better way with BeautifulSoup to achieve the goal of:
- Getting the individual text out of each of these positions in every row
- Joining it into a single, nicely comma-delimited string?
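For those two goals, one approach is to flatten each row into a list of strings first and join only at the end. A sketch assuming Python 3 and "html.parser"; row_strings is a hypothetical helper, and the HTML is one data row copied from the sample page. It keeps an empty string for each blank (&nbsp;) cell so the later columns stay in position.

```python
from bs4 import BeautifulSoup

# One data row from the question; the two &nbsp; cells are the empty Miles/Grade.
HTML = """<table><tr><td>2005<br>Engine Assembly<br>Ford Focus</td>
<td>TESTED,2.3L,5MT,08/04,FWD,+CORE</td><td align=right>&nbsp;</td>
<td align=center>&nbsp;</td><td>E94764</td><td align=center>$1500</td></tr></table>"""


def row_strings(tr):
    """Flatten a row into one list of strings, keeping empty cells.

    stripped_strings yields each text node inside a cell with whitespace
    (including the non-breaking space) trimmed and blank nodes skipped,
    so the <br>-separated parts come out as separate strings. A cell with
    no text contributes "" so later columns keep their positions.
    """
    pieces = []
    for td in tr.find_all("td"):
        pieces.extend(list(td.stripped_strings) or [""])
    return pieces


soup = BeautifulSoup(HTML, "html.parser")
pieces = row_strings(soup.find("tr"))
print(", ".join(pieces))
```

Note that fields such as the description already contain commas (TESTED,2.3L,...), so a string joined with ", " cannot be split back into columns reliably.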
Hopefully this helps; I don't have enough reputation points to post pictures or upload files. The sample output after my code is what my program printed when I DON'T try to call a BeautifulSoup method on the tds variable.
My code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("./test.html"), "lxml")
trs = soup.find_all('tr')
for tr in trs:
    tds = tr.find_all("td")
    try:
        result = str(tds[0].get_text())
    except:
        adjust = ' '
        continue
    result = result.split(" ")
    result = str.replace('2005Engine', "2005Engine", "2005 ") + str.replace('AssemblyFord', "AssemblyFord", "Engine Assembly ") + str.repl$
    strresult = ''.join(result)
trs = soup.find_all('tr')
for tr in trs:
    tds = tr.find_all("td")
    tds[0] = strresult
    print(tds)
What Was Returned - A Sample:
['2005 Engine Assembly Ford Focus ', <td>139K</td>, <td align="right">\xa0</td>, <td align="center">\xa0</td>, <td>0232</td>, <td align="center">$800</td>, <td><a href="http://someurl.com" target="_top">Chads Part </a> USA-FL(Jacksonville) <a href="/cgi-bin/quoteForm.cgi?type=g&[email protected]&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=0232&price=800&desc=139K&name=Chads%20Parts&url=http://someurl.com&int=-1&broker=0&recycler=0&selleruserid=3566&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</a> 1-510-569-4845 <a href="/cgi-bin/quoteForm.cgi?type=i&[email protected]&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=0232&price=800&desc=139K&name=Chads%20Parts=rs&url=http://someurl.com&int=-1&broker=0&=0&selleruserid=3566&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</a><br/><a href="http://someurl.com/cgi-bin/applet.cgi?sid=3566&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=800&pst=0232&pgr=&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick="window.open(this.href,this.target,getPrm()); return false" target="_blank"><img border="0" src="/images/LiveChat_space.gif"/></a></td>]
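The str.replace() calls in the question's code are working around the fact that plain get_text() concatenates the <br>-separated text nodes with nothing between them, which is where "2005Engine" and "AssemblyFord" come from. get_text() also accepts a separator, and strip=True trims each text node first. A small sketch, assuming Python 3 and "html.parser":

```python
from bs4 import BeautifulSoup

td = BeautifulSoup("<td>2005<br>Engine Assembly<br>Ford Focus</td>",
                   "html.parser").td

# Plain get_text() glues the text nodes together.
glued = td.get_text()

# With a separator, it goes between the text nodes instead, so no
# str.replace() patching is needed.
spaced = td.get_text(" ", strip=True)

# Splitting on a separator gives a list (assumes "|" never appears in the text).
parts = td.get_text("|", strip=True).split("|")

print(glued)   # 2005Engine AssemblyFord Focus
print(spaced)  # 2005 Engine Assembly Ford Focus
print(parts)   # ['2005', 'Engine Assembly', 'Ford Focus']
```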
Just to reinforce:
I only want to get the text out of these elements and join it, comma-delimited, into one string that I can keep working with as I prepare to write a CSV file:
Year, Part, Car Make, Car Model, Description, Miles, Part Grade, Stock #, Price, Dealer Name, Country, State, City, Phone
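Since the end goal is a CSV file, it may be simpler to keep each row as a list and let Python's standard csv module handle the delimiting, rather than building the comma-separated string by hand; fields that contain commas, such as "38,916", are then quoted automatically. A sketch assuming Python 3 (on Python 2.7 the output file is opened in binary mode, "wb", instead); the record values are taken from the sample page, and io.StringIO stands in for a real file such as a hypothetical parts.csv.

```python
import csv
import io

HEADER = ["Year", "Part", "Car Make", "Car Model", "Description", "Miles",
          "Part Grade", "Stock #", "Price", "Dealer Name", "Country",
          "State", "City", "Phone"]

# Values as they might come out of one parsed row of the sample page.
records = [
    ["2005", "Engine Assembly", "Ford", "Focus", "175-175", "38,916", "A",
     "FC6555", "$1250", "Don's Sportcar", "USA", "CO", "Pueblo", "800-332-3649"],
]

# io.StringIO stands in for open("parts.csv", "w", newline="") here.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(HEADER)
writer.writerows(records)
print(buf.getvalue())
```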
- The first and last cells are the hardest: I can't figure out how to get their strings out, into a list, and back into a string in the same order as above.
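For the first and last cells specifically, one option is to parse them separately: the first cell splits cleanly on its <br> tags, and in the dealer cell the name is the first link while the location and phone number are loose text nodes sitting directly inside the <td>. A sketch under several assumptions: Python 3 with "html.parser"; the make is a single word (so "Ford Focus" becomes "Ford" and "Focus"); locations always look like USA-CO(Pueblo); and any other loose text in the dealer cell is the phone number. parse_first_cell, parse_dealer_cell, and LOCATION_RE are hypothetical names, and the row is trimmed from the sample with the quote-form URLs shortened.

```python
import re

from bs4 import BeautifulSoup, NavigableString

# One data row from the question, with the long quote-form URLs shortened.
ROW = """<table><tr><td>2005<br>Engine Assembly<br>Ford Focus</td>
<td>175-175</td><td align=right>38,916</td><td align=center>A</td>
<td>FC6555</td><td align=center>$1250</td>
<td><A HREF="http://www.DonsSportcar.com" target="_top">Don's Sportcar</A> USA-CO(Pueblo) <A HREF="/cgi-bin/quoteForm.cgi?type=g">Request_Quote</A> 800-332-3649 <A HREF="/cgi-bin/quoteForm.cgi?type=i">Request_Insurance_Quote</A><br><a target=_blank href="#"><img src='/images/LiveChat_space.gif' border=0></a></td></tr></table>"""

# Assumed location format: COUNTRY-STATE(City), e.g. USA-CO(Pueblo).
LOCATION_RE = re.compile(r"([A-Z]+)-([A-Z]+)\(([^)]*)\)")


def parse_first_cell(td):
    """Split "2005<br>Engine Assembly<br>Ford Focus" into four fields.

    Assumes exactly three <br>-separated parts and a one-word make.
    """
    year, part, make_model = td.stripped_strings
    make, _, model = make_model.partition(" ")
    return [year, part, make, model]


def parse_dealer_cell(td):
    """Return [dealer name, country, state, city, phone] from the last cell."""
    link = td.find("a")
    name = link.get_text(strip=True) if link else ""
    country = state = city = ""
    phones = []
    # Only look at text nodes that are direct children of the <td>;
    # link texts such as Request_Quote live inside <a> tags and are skipped.
    for node in td.children:
        if not isinstance(node, NavigableString):
            continue
        text = node.strip()
        if not text:
            continue
        match = LOCATION_RE.search(text)
        if match and not country:
            country, state, city = match.groups()
        else:
            phones.append(text)
    return [name, country, state, city, " ".join(phones)]


soup = BeautifulSoup(ROW, "html.parser")
tds = soup.find("tr").find_all("td")
middle = [td.get_text(strip=True) for td in tds[1:-1]]
record = parse_first_cell(tds[0]) + middle + parse_dealer_cell(tds[-1])
print(", ".join(record))
```

The resulting list has one entry per column in the order listed above, so it can be handed straight to a CSV writer instead of being joined into a string.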
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow