"No connection adapters were found" when using Python's goose3 library to strip simple HTML code
I am trying to use the Python goose3 library to strip simple HTML code from a simple Python string.
Is there an easier and better way to do this, ideally one that does not involve Beautiful Soup, which for some reason does not work on AWS or other cloud platforms? When I try this, I get the following error:
No connection adapters were found
Here is my code:
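from goose3 import Goose

def main():
    try:
        g = Goose()
        # Named html_string rather than str so the built-in str() stays usable below
        html_string = '<table>xxxxxxx full text below</table>'
        # This is the call that produces the error shown above
        article = g.extract(html_string)
        output = article.cleaned_text
        print(output)
    except Exception as e:
        print(str(e))

main()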
Here is the string that I am trying to remove the HTML from:
<table border="\"0\"" width="\"100%\"">
<tbody>
<tr>
<td>
<table style="height: 1081px;">
<tbody>
<tr style="height: 54px;">
<td style="width: 88.4531px; height: 54px;">Funding Opportunity ID:</td>
<td style="width: 1862.23px; height: 54px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;">Opportunity Number:</td>
<td style="width: 1862.23px; height: 36px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;">Opportunity Title:</td>
<td style="width: 1862.23px; height: 36px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;">Opportunity Category:</td>
<td style="width: 1862.23px; height: 36px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 54px;">
<td style="width: 88.4531px; height: 54px;">Opportunity Category Explanation:</td>
<td style="width: 1862.23px; height: 54px;"> </td>
</tr>
<tr style="height: 54px;">
<td style="width: 88.4531px; height: 54px;" valign="\"top\"">Funding Instrument Type:</td>
<td style="width: 1862.23px; height: 54px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 54px;">
<td style="width: 88.4531px; height: 54px;" valign="\"top\"">Category of Funding Activity:</td>
<td style="width: 1862.23px; height: 54px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;" valign="\"top\"">Category Explanation:</td>
<td style="width: 1862.23px; height: 36px;"> </td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;" valign="\"top\"">CFDA Number(s):</td>
<td style="width: 1862.23px; height: 36px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;" valign="\"top\"">Eligible Applicants:</td>
<td style="width: 1862.23px; height: 36px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)</td>
</tr>
<tr style="height: 72px;">
<td style="width: 88.4531px; height: 72px;" valign="\"top\"">Additional Information on Eligibility:</td>
<td style="width: 1862.23px; height: 72px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;" valign="\"top\"">Agency Code:</td>
<td style="width: 1862.23px; height: 36px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;" valign="\"top\"">Agency Name:</td>
<td style="width: 1862.23px; height: 36px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;">Posted Date:</td>
<td style="width: 1862.23px; height: 36px;">Mar 21, 2022</td>
</tr>
<tr style="height: 18px;">
<td style="width: 88.4531px; height: 18px;">Close Date:</td>
<td style="width: 1862.23px; height: 18px;">May 16, 2022</td>
</tr>
<tr style="height: 54px;">
<td style="width: 88.4531px; height: 54px;">Last Updated Date:</td>
<td style="width: 1862.23px; height: 54px;">Mar 20, 2022</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;">Award Ceiling:</td>
<td style="width: 1862.23px; height: 36px;">$150,000</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;">Award Floor:</td>
<td style="width: 1862.23px; height: 36px;">$50,000</td>
</tr>
<tr style="height: 72px;">
<td style="width: 88.4531px; height: 72px;">Estimated Total Program Funding:</td>
<td style="width: 1862.23px; height: 72px;"> </td>
</tr>
<tr style="height: 54px;">
<td style="width: 88.4531px; height: 54px;">Expected Number of Awards:</td>
<td style="width: 1862.23px; height: 54px;">5</td>
</tr>
<tr style="height: 145px;">
<td style="width: 88.4531px; height: 145px;">Description:</td>
<td style="width: 1862.23px; height: 145px;">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</td>
</tr>
<tr style="height: 18px;">
<td style="width: 88.4531px; height: 18px;">Version:</td>
<td style="width: 1862.23px; height: 18px;">2</td>
</tr>
<tr style="height: 36px;">
<td style="width: 88.4531px; height: 36px;">Modification Comments:</td>
<td style="width: 1862.23px; height: 36px;"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</p>
<p> </p>
Any helpful suggestions are welcome.
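One possible direction, offered as a sketch rather than a definitive answer: the message "No connection adapters were found" is what the requests library raises for a string that does not start with a scheme such as http:// or https://, which is consistent with `g.extract(...)` treating its first positional argument as a URL. goose3's `extract` also accepts a `raw_html` keyword argument, although goose3 is built for article extraction and may not return much for a bare table. For plain tag stripping, the standard library's `html.parser` may be enough and needs no third-party package. The names `TextExtractor` and `strip_html` below are hypothetical helpers invented for this illustration, and the sample string is a shortened stand-in for the table above.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for text between tags; skip whitespace-only pieces
        text = data.strip()
        if text:
            self._chunks.append(text)

    def get_text(self, separator="\n"):
        return separator.join(self._chunks)


def strip_html(html_string, separator="\n"):
    """Return the visible text of html_string, one text node per separator."""
    parser = TextExtractor()
    parser.feed(html_string)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text(separator)


if __name__ == "__main__":
    # Shortened stand-in for the table in the question
    sample = (
        '<table><tbody><tr>'
        '<td style="width: 88.4531px;">Posted Date:</td>'
        '<td style="width: 1862.23px;">Mar 21, 2022</td>'
        '</tr></tbody></table><p> </p>'
    )
    print(strip_html(sample))
```

Each table cell becomes one line of output, so a label and its value end up on consecutive lines; passing a different `separator` (for example a space) joins them on a single line instead.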
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
