How to get an HTML5 element by XPath with libxml2 in C++

I want to select div elements with an XPath query using libxml2 in C++, but the query finds nothing even though the HTML contains many div tags. With a longer expression such as /html/body/div[1]/div/div, the program even crashes.

htmlParserCtxtPtr parse_ctx = htmlCreateMemoryParserCtxt(resp.text.c_str(), resp.text.size());
if (!parse_ctx) {
  std::cout << "Error!" << std::endl;
  return;
}

xmlXPathContextPtr xml_ctx = xmlXPathNewContext(parse_ctx->myDoc);
if (!xml_ctx) {
  std::cout << "Error!" << std::endl;
  return;
}

xmlXPathObjectPtr xpath_obj = xmlXPathEvalExpression((xmlChar *)"//div", xml_ctx);
if (!xpath_obj) {
  std::cout << "Error!" << std::endl;
  return;
}

xmlNodeSetPtr nodes = xpath_obj->nodesetval;
std::cout << nodes->nodeNr << std::endl;   // result is 0.

I also tried using the htmlParseElement function instead of the XML functions, but it reports errors on HTML5 tags (unknown tag error).



Solution 1:[1]

htmlCreateMemoryParserCtxt only creates a parser context and doesn't parse the document yet, so parse_ctx->myDoc will be NULL. Try htmlReadMemory, which parses the document and returns an xmlDocPtr.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 nwellnhof