Converting XML to HTML using Python

I have pages like this:

<?xml version="1.0" encoding="utf-8"?>\r\n<HTMLReturn xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://gccwebapps/PROWWS/">\r\n  <Result>OK</Result>\r\n  <ErrorMessageNewLine>\n</ErrorMessageNewLine>\r\n  <ErrorMessage />\r\n  <ID />\r\n  <HTML>&lt;div id=\'DivPROWContainer\' class=\'PROWContainer\'&gt;\n&lt;div id=\'DivTableGCCDocsHolder\' class=\'TableGCCDocsHolder\'&gt;\n&lt;table id=\'TableDisplayTable\' class=\'DisplayTable DisplayGCCDocsTable HtmlDataTable\'&gt;\n&lt;tbody&gt;\n&lt;tr class=\'DisplayTableHeaderRow HtmlDataTableHeaderRow DisplayTableTopRow\'&gt;\n&lt;th colspan=\'5\'&gt;Documents available for the planning Application&lt;/th&gt;\n&lt;/tr&gt;\n&lt;tr class=\'DisplayTableHeaderRow HtmlDataTableHeaderRow\'&gt;\n&lt;th&gt;Application Number&lt;/th&gt;\n&lt;th&gt;Plan number&lt;/th&gt;\n&lt;th&gt;Document type&lt;/th&gt;\n&lt;th&gt;Description&lt;/th&gt;\n&lt;th&gt;Date Entered&lt;/th&gt;\n&lt;/tr&gt;\n&lt;tr class=\'DisplayTableDataRow HtmlDataTableRow ResultRowAlternative\'&gt;\n&lt;td&gt;&lt;a id=\'AFormLink_APP_NO\' class=\'FormHyperLink\' href=\'https://ww3.gloucestershire.gov.uk/PROW/PROWWS.asmx/GetFileGCCContents?Filename=images%2f22_0001_NONMAT_DEC_LET.PDF\' data-DisableMeWhenSomethingChanged=\'1\' target=\'_blank\' rel=\'noopener noreferrer\'&gt;22/0001/NONMAT\n&lt;/a&gt;&lt;/td&gt;\n&lt;td&gt;&lt;/td&gt;\n&lt;td&gt;Text&lt;/td&gt;\n&lt;td&gt;&lt;a id=\'AFormLink_DESCRIPTION\' class=\'FormHyperLink\' href=\'https://ww3.gloucestershire.gov.uk/PROW/PROWWS.asmx/GetFileGCCContents?Filename=images%2f22_0001_NONMAT_DEC_LET.PDF\' data-DisableMeWhenSomethingChanged=\'1\' target=\'_blank\' rel=\'noopener noreferrer\'&gt;Decision Letter\n&lt;/a&gt;&lt;/td&gt;\n&lt;td&gt;26/01/2022&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr class=\'DisplayTableDataRow HtmlDataTableRow ResultRowAlternative\'&gt;\n&lt;td&gt;&lt;a id=\'AFormLink_APP_NO\' class=\'FormHyperLink\' 
href=\'https://ww3.gloucestershire.gov.uk/PROW/PROWWS.asmx/GetFileGCCContents?Filename=images%2f22_0001_NONMAT_APP_FORM_RED.PDF\' data-DisableMeWhenSomethingChanged=\'1\' target=\'_blank\' rel=\'noopener noreferrer\'&gt;22/0001/NONMAT\n&lt;/a&gt;&lt;/td&gt;\n&lt;td&gt;&lt;/td&gt;\n&lt;td&gt;Plan&lt;/td&gt;\n&lt;td&gt;&lt;a id=\'AFormLink_DESCRIPTION\' class=\'FormHyperLink\' href=\'https://ww3.gloucestershire.gov.uk/PROW/PROWWS.asmx/GetFileGCCContents?Filename=images%2f22_0001_NONMAT_APP_FORM_RED.PDF\' data-DisableMeWhenSomethingChanged=\'1\' target=\'_blank\' rel=\'noopener noreferrer\'&gt;Application Form 9Redacted)\n&lt;/a&gt;&lt;/td&gt;\n&lt;td&gt;10/01/2022&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr class=\'DisplayTableDataRow HtmlDataTableRow ResultRowAlternative\'&gt;\n&lt;td&gt;&lt;a id=\'AFormLink_APP_NO\' class=\'FormHyperLink\' href=\'https://ww3.gloucestershire.gov.uk/PROW/PROWWS.asmx/GetFileGCCContents?Filename=images%2f22_0001_NONMAT_LAND_PLAN_P20_2956_05D.PDF\' data-DisableMeWhenSomethingChanged=\'1\' target=\'_blank\' rel=\'noopener noreferrer\'&gt;22/0001/NONMAT\n&lt;/a&gt;&lt;/td&gt;\n&lt;td&gt;P20_2956_05D&lt;/td&gt;\n&lt;td&gt;Text&lt;/td&gt;\n&lt;td&gt;&lt;a id=\'AFormLink_DESCRIPTION\' class=\'FormHyperLink\' href=\'https://ww3.gloucestershire.gov.uk/PROW/PROWWS.asmx/GetFileGCCContents?Filename=images%2f22_0001_NONMAT_LAND_PLAN_P20_2956_05D.PDF\' data-DisableMeWhenSomethingChanged=\'1\' target=\'_blank\' rel=\'noopener noreferrer\'&gt;Landscape MasterPlan 04.01.22\n&lt;/a&gt;&lt;/td&gt;\n&lt;td&gt;10/01/2022&lt;/td&gt;\n&lt;/tr&gt;\n&lt;tr class=\'DisplayTableDataRow HtmlDataTableRow ResultRowAlternative\'&gt;\n&lt;td&gt;&lt;a id=\'AFormLink_APP_NO\' class=\'FormHyperLink\' href=\'https://ww3.gloucestershire.gov.uk/PROW/PROWWS.asmx/GetFileGCCContents?Filename=images%2f22_0001_NONMAT_ELEC_SERV_190123_SC_XX_XX_DR_E_600.PDF\' data-DisableMeWhenSomethingChanged=\'1\' target=\'_blank\' rel=\'noopener 
noreferrer\'&gt;22/0001/NONMAT\n&lt;/a&gt;&lt;/td&gt;\n&lt;td&gt;190123_SC_XX_XX_DR_E_600&lt;/td&gt;\n&lt;td&gt;Plan&lt;/td&gt;\n&lt;td&gt;&lt;a id=\'AFormLink_DESCRIPTION\' class=\'FormHyperLink\' href=\'https://ww3.gloucestershire.gov.uk/PROW/PROWWS.asmx/GetFileGCCContents?Filename=images%2f22_0001_NONMAT_ELEC_SERV_190123_SC_XX_XX_DR_E_600.PDF\' data-DisableMeWhenSomethingChanged=\'1\' target=\'_blank\' rel=\'noopener noreferrer\'&gt;Electrical Services Site Wide\n&lt;/a&gt;&lt;/td&gt;\n&lt;td&gt;10/01/2022&lt;/td&gt;\n&lt;/tr&gt;\n&lt;/tbody&gt;\n\n&lt;/table&gt;\n&lt;/div&gt;\n&lt;div class=\'PROWDefaultFooter\'&gt;\n&lt;div class=\'PROWFooter1\'&gt;© 2014-21 Gloucestershire County Council, Shire Hall, Westgate Street, Gloucester GL1 2TG.\n&lt;/div&gt;\n&lt;div class=\'PROWFooter2\'&gt;&lt;STRONG&gt;Telephone:&lt;/STRONG&gt;+44(0)1452 425000 - &lt;STRONG&gt; Out of hours:&lt;/STRONG&gt; +44(0)845 6677788\n&lt;/div&gt;\n&lt;div class=\'PROWFooter2\'&gt;\n&lt;a id=\'AGCCLink\' class=\'GCCFooterLink\' href=\'http://www.gloucestershire.gov.uk\' data-DisableMeWhenSomethingChanged=\'1\'&gt;www.gloucestershire.gov.uk\n&lt;/a&gt;\n&lt;/div&gt;\n&lt;/div&gt;\n&lt;/div&gt;\n</HTML>\r\n  <Script>gcc_docs_startScreenSetup();</Script>\r\n</HTMLReturn>

I need to find elements in it using XPath (without namespaces). I tried the variants below, but none of them works: each parse gives a very short result (a length of 5 or 6), and the XPath queries return empty lists.

import lxml.html as html
res = html.fromstring(sec_response.body)
len(res)
5
res.xpath('//div')
[]

import xml.etree.ElementTree as ET
xhtml = ET.fromstring(sec_response.text)
len(xhtml)
6
xhtml.xpath('//div')
*** AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'xpath'

from lxml import etree
xslt_root = etree.XML(sec_response.body)
len(xslt_root)
6
xslt_root.xpath('//div')
[]

sec_response.selector.remove_namespaces()
sec_response.xpath('//td')
[]
sec_response.xpath('//tr')
[]

Please show me how to transform it so that XPath can be used on it. I need to search for //tr, //td, or //a elements and actually find them.
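A note on why the queries above come back empty: in the sample, the markup inside the HTML element is entity-escaped (&lt;div ... &gt;), so the XML parser sees it as one text node rather than as child elements. A minimal sketch of one way around this, assuming lxml is available: parse the XML envelope, read the text of the HTML element, then parse that text as HTML. SAMPLE is a shortened stand-in for the real response, and extract_inner_html and the example.invalid URL are hypothetical names used only for illustration.

```python
from lxml import etree, html

# Shortened stand-in for the response body shown above (assumption: the real
# payload has the same shape, with the markup escaped inside <HTML>).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<HTMLReturn xmlns="http://gccwebapps/PROWWS/">
  <Result>OK</Result>
  <HTML>&lt;div id='DivPROWContainer'&gt;
&lt;table&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a href='https://example.invalid/doc.pdf'&gt;Decision Letter
&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
</HTML>
</HTMLReturn>"""


def extract_inner_html(xml_bytes):
    """Hypothetical helper: parse the XML envelope, then parse the
    unescaped text of its <HTML> element as an HTML tree."""
    envelope = etree.fromstring(xml_bytes)
    # local-name() matches the element whatever its namespace is, so no
    # namespace map is needed. string() returns the unescaped text.
    payload = envelope.xpath("string(//*[local-name()='HTML'])")
    return html.fromstring(payload)


# Querying the envelope directly finds nothing: the div is only text there.
assert etree.fromstring(SAMPLE).xpath("//div") == []

doc = extract_inner_html(SAMPLE)
rows = doc.xpath("//tr")
links = doc.xpath("//a/@href")
print(len(rows), links)
```

With the Scrapy response from the question, the call would presumably be extract_inner_html(sec_response.body), after which //tr, //td and //a can be queried on the returned tree.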



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
