Python web crawler: How do you get the content from an inner div tag?
I'm a beginner at writing web crawlers in Python, and I'm currently interning at a company. When I run the code below:
import bs4
import requests
from requests.auth import HTTPBasicAuth
URL="https://rb-alm-04-p.de.bosch.com/ccm/web/projects/CN_Projects#action=com.ibm.team.workitem.viewWorkItem&id=1428161&tab=com.ibm.team.workitem.tab.history"
account = '******'   # the account and password are redacted because
password = '******'  # they are confidential business information
r_header = {"User-Agent":'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0'}
# This call returns <Response [200]>
response = requests.get(URL, auth=HTTPBasicAuth(account, password), headers=r_header)
# The parser name is an argument to BeautifulSoup, not to decode()
content = bs4.BeautifulSoup(response.content.decode("utf-8"), "html.parser")
The content I got from the code:
<!DOCTYPE html>
<!--
Licensed Materials - Property of IBM
(c) Copyright IBM Corporation 2005, 2021. All Rights Reserved.
Note to U.S. Government Users Restricted Rights:
Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
--><html lang="en-us">
<head>
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
<meta content="IE=10" http-equiv="X-UA-Compatible"/>
<title></title>
<link href="/ccm/web/_style/?include=A~&etag=Hx5xslMVjuM_en_US&_proxyURL=%2Fccm&ss=U6uTL" rel="stylesheet" type="text/css"/>
<link href="/ccm/web/net.jazz.ajax/jazz.ico" rel="shortcut icon"/>
<style type="text/css">
#net-jazz-ajax-NoScriptMessage {
width: 100%;
color: #D0D0D0;
font-size: 2em;
text-align: center;
position: absolute;
top: 1%;
z-index: 999;
}
</style>
</head>
<body class="claro">
<noscript><div id="net-jazz-ajax-NoScriptMessage">Javascript is either disabled or not available in your Browser</div></noscript>
<div id="net-jazz-ajax-InitialLoadMessage">Loading...</div>
<div id="net-jazz-ajax-WorkbenchRoot"></div>
<script type="text/javascript">
djConfig = {
isDebug: false,
layout: "",
usePlainJson: true,
baseUrl: "/ccm/web/dojo/",
locale: "en-us",
localizationComplete: true
};
/*null*/
net = {jazz: {ajax: {}}};
net.jazz.ajax._contextRoot = "/ccm";
net.jazz.ajax._webuiPrefix = "/web/";
</script>
<script src="/ccm/web/_js/?include=A~&etag=Hx5xslMVjuM_en_US&_proxyURL=%2Fccm&ss=U6uTL&locale=en-us" type="text/javascript"></script>
<script type="text/javascript">
require("dojo/main").getObject('jazz.core.loader', true)._serverStartup="U6uTL";
require("dojo/main").getObject('jazz.core.loader',true)._loaded=["A"];
</script>
<script type="text/javascript">
/* <![CDATA[ */
require(["dojo/ready", "dojo/parser", "dijit/registry", "dijit/Dialog"], function(ready, parser, registry){
ready(function(){
net.jazz.ajax.ui.PlatformUI.createAndRunWorkbench("net.jazz.web.app.authrequired");
});
});
/* ]]> */
</script>
</body>
</html>
It looks like the request succeeded, but many of the nested tags (div tags) and their contents are missing from the response. How can I get the content of those nested tags?
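To make the gap concrete, here is a minimal sketch that inspects a trimmed copy of the response above. It uses only the standard library's `html.parser` so it runs without extra packages; `DivTextCollector` and `SAMPLE_HTML` are hypothetical names introduced for illustration. The empty `net-jazz-ajax-WorkbenchRoot` div, next to a "Loading..." placeholder and several scripts, suggests (this is an assumption, not something the response confirms) that the browser fills the page in with JavaScript after it loads.

```python
from html.parser import HTMLParser

# Trimmed copy of the <body> returned by requests.get() above.
SAMPLE_HTML = """
<body class="claro">
<div id="net-jazz-ajax-InitialLoadMessage">Loading...</div>
<div id="net-jazz-ajax-WorkbenchRoot"></div>
</body>
"""


class DivTextCollector(HTMLParser):
    """Hypothetical helper: maps each div id to the text found inside it."""

    def __init__(self):
        super().__init__()
        self.text_by_id = {}  # div id -> accumulated text
        self._open_ids = []   # ids of the divs that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            div_id = dict(attrs).get("id")
            self._open_ids.append(div_id)
            if div_id is not None:
                self.text_by_id.setdefault(div_id, "")

    def handle_endtag(self, tag):
        if tag == "div" and self._open_ids:
            self._open_ids.pop()

    def handle_data(self, data):
        # Credit the text to every enclosing div that has an id.
        for div_id in self._open_ids:
            if div_id is not None:
                self.text_by_id[div_id] += data


collector = DivTextCollector()
collector.feed(SAMPLE_HTML)
for div_id, text in collector.text_by_id.items():
    print(f"{div_id}: {text.strip()!r}")
```

In this sample the root div has no text at all, so there is nothing for BeautifulSoup to find inside it in the raw response.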
[The picture in this link is the data I want][1]
[1]: https://i.stack.imgur.com/RP27v.png
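A related detail that may matter here: everything after the `#` in the URL is a fragment, and fragments are not sent to the server as part of an HTTP request. The work item id and the history tab in my URL live entirely in that fragment, so the server only ever sees the project path. The sketch below uses the standard library's `urllib.parse` to split the same URL; `fragment_params` is just an illustrative name.

```python
from urllib.parse import parse_qs, urlsplit

URL = (
    "https://rb-alm-04-p.de.bosch.com/ccm/web/projects/CN_Projects"
    "#action=com.ibm.team.workitem.viewWorkItem&id=1428161"
    "&tab=com.ibm.team.workitem.tab.history"
)

parts = urlsplit(URL)

# The path (plus any query string) is what the request actually asks for.
print("path:    ", parts.path)
print("query:   ", repr(parts.query))

# The fragment stays on the client; here it happens to look like a query string.
fragment_params = parse_qs(parts.fragment)
print("fragment:", fragment_params)
```

If that reading is right, the page script is what interprets `action`, `id`, and `tab` in the browser, which would fit with the work item details being absent from the plain `requests` response.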
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
