Python & Selenium: How to get Elements in DevTools with CDP (Chrome DevTools Protocol)
I'd like to get the full source shown in the Elements panel of Chrome DevTools.
I tried the following code, but the value it returns does not match what the Elements panel shows.
body = driver.execute_cdp_cmd("DOM.getOuterHTML", {"backendNodeId": 1})
print(body)
Is it possible to get the full source with CDP, and if so, how?
I know there are other ways to scrape the source code, but I'd like to know how to get the source as it appears in the Elements panel of DevTools (F12).
Solution 1:[1]
EDIT: See the CDP solution at the end.
I'm assuming that by "F12 source code" you mean the current DOM, after it has been manipulated by JS or anything else, as opposed to the original source code.
Consider the following HTML page:
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Hi</title>
<script>
document.addEventListener("DOMContentLoaded", function(){
setTimeout(function(){
document.getElementById("test").innerHTML+=" World!"
}, 3000)
});
</script>
</head>
<body>
<h1 id="test">Hello</h1>
</body>
</html>
Three seconds after page load, the h1 will contain "Hello World!".
That is exactly what we see when running the following code:
from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
driver.get("http://localhost:8000/") # replace with your page
sleep(6) # probably replace with smarter logic
html = driver.execute_script("return document.documentElement.outerHTML")
print(html)
That outputs:
<html lang="en"><head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Hi</title>
<script>
document.addEventListener("DOMContentLoaded", function(){
setTimeout(function(){
document.getElementById("test").innerHTML+=" World!"
}, 3000)
});
</script>
</head>
<body>
<h1 id="test">Hello World!</h1>
</body></html>
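The fixed sleep(6) above can be replaced by polling the live DOM until the expected text shows up. The following is a minimal sketch, not part of the original answer: the helper name wait_until_text and its parameters are made up for illustration, and it relies only on driver.execute_script. Selenium's own WebDriverWait combined with expected_conditions.text_to_be_present_in_element covers the same case.

```python
import time


def wait_until_text(driver, element_id, expected, timeout=10.0, poll=0.25):
    """Poll the live DOM until the element's text contains `expected`.

    Hypothetical helper (not a Selenium API); `driver` is any object with
    an execute_script(script, *args) method, such as webdriver.Chrome().
    """
    script = (
        "var el = document.getElementById(arguments[0]);"
        "return el ? el.textContent : null;"
    )
    deadline = time.monotonic() + timeout
    while True:
        # Read the element's current text from the page on every iteration.
        text = driver.execute_script(script, element_id)
        if text is not None and expected in text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"#{element_id} did not contain {expected!r} within {timeout}s"
            )
        time.sleep(poll)
```

With the page above, wait_until_text(driver, "test", "World!") would take the place of sleep(6) before reading document.documentElement.outerHTML.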
EDIT, using CDP instead:
The behavior you're describing is odd, but okay, let's find a different solution.
Support for CDP in Selenium 4's Python bindings seems limited so far. As of now (May 2022), there appears to be no driver.getDevTools() in Python, only in Java and JS (Node).
Anyway, I'm not even sure that would have helped us.
Raw CDP will suffice for now:
from selenium import webdriver
from time import sleep
# webdriver.remote.webdriver.import_cdp()
driver = webdriver.Chrome()
driver.get("http://localhost:8000/")
sleep(6)
doc = driver.execute_cdp_cmd(cmd="DOM.getDocument", cmd_args={})
doc_root_node_id = doc["root"]["nodeId"]
result = driver.execute_cdp_cmd(cmd="DOM.getOuterHTML", cmd_args={"nodeId": doc_root_node_id})
print(result["outerHTML"])
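This prints the following. Unlike the execute_script version, the output starts with the doctype, because the CDP call serializes the document node rather than only the html element: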
<!DOCTYPE html><html lang="en"><head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Hi</title>
<script>
document.addEventListener("DOMContentLoaded", function(){
setTimeout(function(){
document.getElementById("test").innerHTML+=" World!"
}, 3000)
});
</script>
</head>
<body>
<h1 id="test">Hello World!</h1>
</body></html>
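The same two CDP calls can be wrapped in a small helper that optionally narrows the result to one element through DOM.querySelector. This is a sketch under assumptions rather than the answer's own code: the function name cdp_outer_html and its selector parameter are invented here, and it assumes a Chromium-based driver that exposes execute_cdp_cmd.

```python
def cdp_outer_html(driver, selector=None):
    """Return the current DOM as markup using raw CDP commands.

    Hypothetical helper; `driver` must provide execute_cdp_cmd(cmd, cmd_args),
    as Selenium's Chrome driver does.
    """
    # DOM.getDocument returns the root (document) node of the live DOM.
    doc = driver.execute_cdp_cmd("DOM.getDocument", {})
    node_id = doc["root"]["nodeId"]

    if selector is not None:
        # DOM.querySelector resolves a CSS selector relative to a node.
        # Assumption: a nodeId of 0 in the reply means nothing matched.
        found = driver.execute_cdp_cmd(
            "DOM.querySelector", {"nodeId": node_id, "selector": selector}
        )
        node_id = found["nodeId"]
        if node_id == 0:
            raise LookupError(f"no element matches {selector!r}")

    # DOM.getOuterHTML serializes the chosen node, including its children.
    result = driver.execute_cdp_cmd("DOM.getOuterHTML", {"nodeId": node_id})
    return result["outerHTML"]
```

Called as cdp_outer_html(driver), this should reproduce the output above, while cdp_outer_html(driver, "#test") would return only the h1 element.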
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow