Scrapy: Extract Dynamic Table Data Directly from the Data Source
Using Scrapy, I want to extract the data shown in a dynamic table on a webpage. Because the table is populated dynamically, an XPath query for the tbody tag on Scrapy's response returns no data:
In [1]: response.xpath('//table/tbody').getall()
Out[1]: ['<tbody></tbody>']
On the other hand, an XPath query for the table tag shows that the element already contains all the data, in structured form, inside its data-entities attribute:
In [2]: response.xpath('//table').getall()
Out[2]: ['<table class="table icms-dt rs_preserve" cellspacing="0" width="100%" id="publikation" data-webpack-module="datatables" data-entity-type="publikation" data-entities="{"emptyColumns":["privatKategorie","_thumbnail"],"data":[{"name":"<a href=\\"\\/_rte\\/publikation\\/35897\\">Nutzungsbedingungen<\\/a>","name-sort":"nutzungsbedingungen","herausgeber":"Informatikdienst","herausgeber-sort":"informatikdienst","datum":"16.12.2010","datum-sort":"2010-12-16","kategorieId":"publikation","kategorieId-sort":"publikation","privatKategorie":"","privatKategorie-sort":"","_thumbnail":"","_downloadBtn
I want to extract the table data in a structured way, e.g. by row and column. Is there a way to do this, with BeautifulSoup for instance? Any ideas and help are highly appreciated.
The table can be examined with scrapy shell as follows:
scrapy shell "rapperswil-jona.ch/publikationen"
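One possible approach, sketched here under some assumptions: since the rows appear to be stored as JSON in the table's data-entities attribute, they can be read from that attribute instead of from tbody. In the Scrapy shell the raw string should be obtainable with `response.xpath('//table[@id="publikation"]/@data-entities').get()`. The sketch below assumes the JSON has the shape visible in the truncated output above (an "emptyColumns" list and a "data" list of row objects); the helper names `cell_to_text` and `rows_from_entities` and the abbreviated `SAMPLE` string are hypothetical and only for illustration. It uses only the standard library, so BeautifulSoup is not required for this step.

```python
import json
from html.parser import HTMLParser


class _CellText(HTMLParser):
    """Collects the visible text and the first link target of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Remember only the first <a href="..."> found in the cell.
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")

    def handle_data(self, data):
        self.parts.append(data)


def cell_to_text(fragment):
    """Return (text, href) for one cell value; href is None if there is no link."""
    parser = _CellText()
    parser.feed(fragment)
    parser.close()  # flushes any buffered plain text
    return "".join(parser.parts).strip(), parser.href


def rows_from_entities(raw):
    """Turn the data-entities JSON string into a list of row dicts.

    Assumption: the payload looks like {"emptyColumns": [...], "data": [{...}, ...]}.
    Columns listed as empty and the "*-sort" helper keys are skipped.
    """
    payload = json.loads(raw)
    skip = set(payload.get("emptyColumns", []))
    rows = []
    for entry in payload.get("data", []):
        row = {}
        for key, value in entry.items():
            if key in skip or key.endswith("-sort"):
                continue
            text, href = cell_to_text(str(value))
            row[key] = text
            if href:
                row[key + "_href"] = href
        rows.append(row)
    return rows


# Abbreviated, hypothetical sample built from the visible part of the output.
# In a spider or the Scrapy shell, the string would come from:
#   response.xpath('//table[@id="publikation"]/@data-entities').get()
SAMPLE = json.dumps({
    "emptyColumns": ["privatKategorie", "_thumbnail"],
    "data": [{
        "name": '<a href="/_rte/publikation/35897">Nutzungsbedingungen</a>',
        "name-sort": "nutzungsbedingungen",
        "herausgeber": "Informatikdienst",
        "herausgeber-sort": "informatikdienst",
        "datum": "16.12.2010",
        "datum-sort": "2010-12-16",
        "privatKategorie": "",
        "_thumbnail": "",
    }],
})

for row in rows_from_entities(SAMPLE):
    print(row)
```

Each resulting dict is one table row keyed by column name, so it can be yielded from a spider's parse method as an item. Relative links such as the one in the name column can be made absolute with `response.urljoin(...)`.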
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
