Scrape Data from React Grid using Selenium
I am trying to scrape the following map from maphub.net: https://maphub.net/Cen4infoRes/russian-ukraine-monitor
On the right-hand side of the map, a React grid/list loads. I want to click a particular item in the list, at which point a number of sub-items appear. The following is an extract of the HTML for the React grid.
<div aria-label="grid" aria-readonly="true" class="ReactVirtualized__Grid ReactVirtualized__List" role="grid" tabindex="0" style="box-sizing: border-box; direction: ltr; height: 1510px; position: relative; width: 299px; will-change: transform; overflow: hidden;">
<div class="ReactVirtualized__Grid__innerScrollContainer" role="rowgroup" style="width: auto; height: 496px; max-width: 299px; max-height: 496px; overflow: hidden; position: relative;">
<div class="panel-itemtree-node node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 0px; width: 100%;"><span class="icon icon-triangle-right dropdown"></span><span class="icon icon-visibility visibility-group"></span><span class="node-title">Civilian Casualty</span></div>
<div class="panel-itemtree-node node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 31px; width: 100%;"><span class="icon icon-triangle-right dropdown"></span><span class="icon icon-visibility visibility-group"></span><span class="node-title">Russian Firing Positions</span></div>
</div>
</div>
After clicking an item, a number of sub-items appear in the HTML, as in this extract:
<div aria-label="grid" aria-readonly="true" class="ReactVirtualized__Grid ReactVirtualized__List" role="grid" tabindex="0" style="box-sizing: border-box; direction: ltr; height: 4680px; position: relative; width: 298px; will-change: transform; overflow: hidden;">
<div class="ReactVirtualized__Grid__innerScrollContainer" role="rowgroup" style="width: auto; height: 3441px; max-width: 298px; max-height: 3441px; overflow: hidden; position: relative;">
<div class="panel-itemtree-node node-active node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 0px; width: 100%;"><span class="icon icon-triangle-down dropdown"></span><span class="icon icon-visibility visibility-group"></span><span class="node-title">Civilian Casualty</span></div>
<div class="panel-itemtree-node node-children node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 31px; width: 100%;"><span class="item-color" style="background-color: rgb(204, 27, 21);"></span><span class="icon icon-visibility visibility-feature"></span><span class="icon icon-map-pin feature-icon"></span><span class="node-title">27/03/2022 One deceased civilian on the street.</span></div>
<div class="panel-itemtree-node node-children node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 62px; width: 100%;"><span class="item-color" style="background-color: rgb(204, 27, 21);"></span><span class="icon icon-visibility visibility-feature"></span><span class="icon icon-map-pin feature-icon"></span><span class="node-title">24/03/2022 Image 1: Casualty at the humanitarian aid station in Kharkiv, near which people were standing, struck by rockets of the Hurricane system with cluster elements.</span></div>
<div class="panel-itemtree-node node-children node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 93px; width: 100%;"><span class="item-color" style="background-color: rgb(204, 27, 21);"></span><span class="icon icon-visibility visibility-feature"></span><span class="icon icon-map-pin feature-icon"></span><span class="node-title">24/03/2022 Civilian casualty after hit on the queue that stood for humanitarian aid</span></div>
<div class="panel-itemtree-node node-children node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 124px; width: 100%;"><span class="item-color" style="background-color: rgb(204, 27, 21);"></span><span class="icon icon-visibility visibility-feature"></span><span class="icon icon-map-pin feature-icon"></span><span class="node-title">24/03/2022 Body visable in the street</span></div>
<div class="panel-itemtree-node node-children node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 155px; width: 100%;"><span class="item-color" style="background-color: rgb(204, 27, 21);"></span><span class="icon icon-visibility visibility-feature"></span><span class="icon icon-map-pin feature-icon"></span><span class="node-title">24/03/2022 6 killed, 15 injured by Russian long range weapons after queing to receive humanitarian aid</span></div>
<div class="panel-itemtree-node node-children node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 186px; width: 100%;"><span class="item-color" style="background-color: rgb(204, 27, 21);"></span><span class="icon icon-visibility visibility-feature"></span><span class="icon icon-map-pin feature-icon"></span><span class="node-title">24/03/2022 Outcome of missile attack on humanitarian aid center.</span></div>
<div class="panel-itemtree-node node-children node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 217px; width: 100%;"><span class="item-color" style="background-color: rgb(204, 27, 21);"></span><span class="icon icon-visibility visibility-feature"></span><span class="icon icon-map-pin feature-icon"></span><span class="node-title">23/03/2022 Dead civilian and destruction in Mariupol</span></div>
<div class="panel-itemtree-node node-hoverable" style="height: 31px; left: 0px; position: absolute; top: 2976px; width: 100%;"><span class="icon icon-triangle-right dropdown"></span><span class="icon icon-visibility visibility-group"></span><span class="node-title">Russian Firing Positions</span></div>
</div>
</div>
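As a rough consistency check on the two extracts, the inline styles can be used to work out how many rows the list should contain. This is only a sketch: it assumes every row is 31px tall and positioned at `top = index * 31`, which is what the styles above show. The helper name `px` is hypothetical, and the style strings are abridged copies of the ones in the extracts.

```python
import re

ROW_HEIGHT_PX = 31  # every row in the extracts is styled "height: 31px"


def px(style: str, prop: str) -> int:
    """Return the pixel value of one CSS property in an inline style string."""
    # The lookbehind stops "height" from matching inside "max-height".
    match = re.search(rf"(?<![\w-]){re.escape(prop)}:\s*(\d+)px", style)
    if match is None:
        raise ValueError(f"{prop!r} not found in {style!r}")
    return int(match.group(1))


# Inline styles taken (abridged) from the two extracts.
inner_collapsed = "width: auto; height: 496px; max-width: 299px; max-height: 496px;"
inner_expanded = "width: auto; height: 3441px; max-width: 298px; max-height: 3441px;"
first_group = "height: 31px; left: 0px; position: absolute; top: 0px; width: 100%;"
next_group = "height: 31px; left: 0px; position: absolute; top: 2976px; width: 100%;"

rows_collapsed = px(inner_collapsed, "height") // ROW_HEIGHT_PX
rows_expanded = px(inner_expanded, "height") // ROW_HEIGHT_PX
# Rows strictly between the two group headers are the sub-items.
sub_items = (px(next_group, "top") - px(first_group, "top")) // ROW_HEIGHT_PX - 1

print(rows_collapsed)  # 16
print(rows_expanded)   # 111
print(sub_items)       # 95
```

If these assumptions hold, a query for `node-title` after expanding "Civilian Casualty" should return 111 elements (16 top-level rows plus 95 sub-items) when every row is present in the DOM. A shorter result would suggest the click did not register or the rows had not rendered yet.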
I am using Google Colab. So far I have written a script that finds the elements with the class name node-title and returns the items in the list. I then try to click the first item and fetch a new list of elements with the same class name, but it does not return all the additional item names I am expecting.
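from selenium import webdriver
from selenium.webdriver.common.by import By

# Headless Chrome flags for running inside Google Colab
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome(options=chrome_options)
wd.get("https://maphub.net/Cen4infoRes/russian-ukraine-monitor")
print(wd.title)

# Top-level items in the right-hand list
links = wd.find_elements(by=By.CLASS_NAME, value="node-title")
links[0].click()  # click is a method; without the parentheses nothing is clicked

# Query again after the click to pick up the sub-items
links_2 = wd.find_elements(by=By.CLASS_NAME, value="node-title")
for elms in links_2:
    print(elms.text)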
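One possible way to make the click-then-requery step more robust is to poll until the list has actually grown before reading the titles. The sketch below is not a confirmed fix for this site. It assumes that clicking a `node-title` element is what expands the group and that the sub-items show up as extra `node-title` elements. The function name `click_and_wait_for_rows` and its parameters are hypothetical. The string `"class name"` is the value of Selenium's `By.CLASS_NAME`, used here so the helper has no import of its own.

```python
import time

# ("class name", ...) is the same locator as (By.CLASS_NAME, ...) in Selenium.
NODE_TITLE = ("class name", "node-title")


def click_and_wait_for_rows(driver, index=0, timeout_s=10.0, poll_s=0.25):
    """Click the row at `index`, then wait until more rows exist than before.

    Returns the text of every row found after the list has grown.
    Raises TimeoutError if no new rows appear within `timeout_s` seconds.
    """
    rows = driver.find_elements(*NODE_TITLE)
    before = len(rows)
    rows[index].click()  # note the parentheses: this actually performs the click

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Look the rows up again on every pass: the list is re-rendered
        # after the click, so references from before may be stale.
        rows = driver.find_elements(*NODE_TITLE)
        if len(rows) > before:
            return [row.text for row in rows]
        time.sleep(poll_s)
    raise TimeoutError(f"still {before} rows after {timeout_s} s")
```

With the `wd` driver from the script above, `click_and_wait_for_rows(wd)` would expand the first group and return the titles once they are present. Selenium's own `WebDriverWait` could be used for the polling loop instead; the hand-written loop is only there to keep the sketch self-contained.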
After clicking one of the items, additional HTML relating to the corresponding map pin is loaded elsewhere on the page, and that is what I would like to scrape.
Hope this makes sense; apologies for the long post.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
