Jsoup: print all text nodes in order
I want to parse this with Jsoup (this is a simplification; in practice I would be parsing entire web pages):
<html><body><p>A<strong>B</strong>C<strong>D</strong>E</p></body></html>
to obtain all text nodes in the order they appear, that is:
A B C D E
I have tried two approaches:
Elements elements = doc.children().select("*");
for (Element el : elements)
    System.out.println(el.ownText());
which returns:
A C E B D
That is, the text inside the "strong" tags ends up at the end.
I have also tried a recursive version:
myfunction(doc.children());

private void myfunction(Elements elements) {
    for (Element el : elements) {
        // Print every non-blank text child of this element first...
        List<Node> nodos = el.childNodes();
        for (Node nodo : nodos) {
            if (nodo instanceof TextNode && !((TextNode) nodo).isBlank()) {
                System.out.println(((TextNode) nodo).text());
            }
        }
        // ...and only then descend into its child elements.
        myfunction(el.children());
    }
}
But the result is the same as before.
How can this be accomplished? I feel I am making something simple difficult.
Solution 1:[1]
How about:
private static void myfunction(Node element) {
    for (Node n : element.childNodes()) {
        if (n instanceof TextNode && !((TextNode) n).isBlank()) {
            System.out.println(((TextNode) n).text());
        } else {
            myfunction(n);
        }
    }
}
Demo:
String html = "<html><body><p>A<strong>B</strong>C<strong>D</strong>E</p></body></html>";
Document doc = Jsoup.parse(html);
myfunction(doc.body());
Output:
A
B
C
D
E
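Update: with pattern matching for instanceof, the (TextNode) n cast can be avoided. In Java 15 this is still a preview feature (see JEP 375: Pattern Matching for instanceof (Second Preview)), so it has to be compiled and run with --enable-preview; it became a standard feature in Java 16 (JEP 394).
private static void myfunction(Node element) {
    for (Node n : element.childNodes()) {
        if (n instanceof TextNode tNode && !tNode.isBlank()) {
            System.out.println(tNode.text());
        } else {
            myfunction(n);
        }
    }
}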
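Why this keeps document order while the two attempts in the question do not: the attempts print all of an element's own text before looking at its child elements, whereas the version above descends into a child at the moment it meets it. The following is a toy sketch of that difference, not jsoup code; the class N and the methods ownTextFirst and documentOrder are hypothetical names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TraversalOrderDemo {
    // Hypothetical stand-in for a DOM node: a non-null text means "text node",
    // a null text means "element" whose children are in kids.
    static final class N {
        final String text;
        final List<N> kids;
        N(String text, N... kids) {
            this.text = text;
            this.kids = Arrays.asList(kids);
        }
    }

    static N t(String s) { return new N(s); }

    // Models <p>A<strong>B</strong>C<strong>D</strong>E</p>
    static N sample() {
        return new N(null, t("A"), new N(null, t("B")), t("C"), new N(null, t("D")), t("E"));
    }

    // Like the question's attempts: all text children first, child elements afterwards.
    static void ownTextFirst(N el, List<String> out) {
        for (N k : el.kids) if (k.text != null) out.add(k.text);
        for (N k : el.kids) if (k.text == null) ownTextFirst(k, out);
    }

    // Like Solution 1: handle each child, text or element, in the order it appears.
    static void documentOrder(N el, List<String> out) {
        for (N k : el.kids) {
            if (k.text != null) out.add(k.text);
            else documentOrder(k, out);
        }
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>();
        ownTextFirst(sample(), a);
        System.out.println(String.join(" ", a)); // A C E B D

        List<String> b = new ArrayList<>();
        documentOrder(sample(), b);
        System.out.println(String.join(" ", b)); // A B C D E
    }
}
```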
Solution 2:[2]
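The text() method will do the trick if one combined string is enough. It returns the normalized text of the element and all of its descendants in document order; because "strong" is an inline element, no separator is inserted between the pieces, so this example should print ABCDE:
public static void main(String[] args) {
    Document doc = Jsoup.parse("<html><body><p>A<strong>B</strong>C<strong>D</strong>E</p></body></html>");
    String texts = doc.body().text();
    System.out.println(texts);
}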
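When the pieces are needed as separate entries rather than one string, another option is to let jsoup walk the tree itself. The sketch below assumes jsoup is on the classpath and uses Node.traverse with a NodeVisitor; the class name TextNodeCollector and the method collectText are made up for this example, and this is only one possible way to do it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

public class TextNodeCollector {
    // Collects the text of every non-blank text node under root, in document order.
    static List<String> collectText(Node root) {
        final List<String> out = new ArrayList<>();
        root.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                // head() is called when the depth-first traversal first reaches a node.
                if (node instanceof TextNode && !((TextNode) node).isBlank()) {
                    out.add(((TextNode) node).text());
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node.
            }
        });
        return out;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<html><body><p>A<strong>B</strong>C<strong>D</strong>E</p></body></html>");
        System.out.println(String.join(" ", collectText(doc.body())));
    }
}
```

Returning a list instead of printing inside the traversal leaves it to the caller to join, filter, or post-process the pieces.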
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | johnII |
