Segmenting a bs4.element.Tag
Is it possible to segment a bs4.element.Tag into several bs4.element.Tag instances?
Consider the following application:
1- The original bs4.element.Tag contains a paragraph.
2- We want to segment the paragraph in the original bs4.element.Tag into sentences and get a bs4.element.Tag corresponding to each sentence.
Example:
paragraphs = soup.find_all('p') gives all the paragraphs in an HTML file.
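Here is a minimal illustration of that call. It uses a small made-up HTML string in place of a real file and shows that each item returned by find_all is a bs4.element.Tag:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up HTML standing in for a real file.
html = "<body><p>First paragraph.</p><p>Second paragraph.</p></body>"
soup = BeautifulSoup(html, "html.parser")

paragraphs = soup.find_all("p")
for p in paragraphs:
    # Each item in the result is a bs4.element.Tag.
    print(isinstance(p, Tag), p.get_text())
```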
Suppose a paragraph (which is also a bs4.element.Tag instance) is the following:
<p><i><a href="/wiki/Le_Bassin_Aux_Nymph%C3%A9as" title="Le Bassin Aux Nymphéas">Le Bassin Aux Nymphéas</a></i>, 1919. Monet's late series of water lily paintings are among his best-known works.</p>
I would like to split this bs4.element.Tag instance into two bs4.element.Tag instances, one for each sentence:
The first bs4.element.Tag should correspond to the first sentence:
<i><a href="/wiki/Le_Bassin_Aux_Nymph%C3%A9as" title="Le Bassin Aux Nymphéas">Le Bassin Aux Nymphéas</a></i>, 1919.
The second bs4.element.Tag should correspond to the second sentence:
Monet's late series of water lily paintings are among his best-known works.
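Beautiful Soup itself does not split text into sentences, so one possible approach is to walk the paragraph's children and start a new Tag whenever a sentence boundary appears. The sketch below makes several assumptions that are not part of the question. The helper name split_into_sentence_tags is hypothetical. Sentence boundaries are detected with a naive regex (".", "!" or "?" followed by whitespace), and they are only searched for in the paragraph's direct text children. Each sentence is wrapped in a new <span>, because a bare string such as the second sentence needs a container to become a Tag.

```python
import copy
import re

from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumed, naive boundary rule: a sentence ends at ".", "!" or "?"
# followed by whitespace. Abbreviations like "Dr." will be mis-split.
BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_into_sentence_tags(paragraph, soup, wrapper="span"):
    """Hypothetical helper: return one new `wrapper` Tag per sentence.

    Only the paragraph's direct text children are searched for boundaries;
    nested tags such as <i> are copied whole into the current sentence.
    The whitespace between sentences is dropped.
    """
    sentences = []
    current = soup.new_tag(wrapper)
    for child in paragraph.children:
        if isinstance(child, NavigableString):
            pieces = BOUNDARY.split(str(child))
            for i, piece in enumerate(pieces):
                if i > 0 and current.contents:
                    # A boundary was crossed: close the sentence built so far.
                    sentences.append(current)
                    current = soup.new_tag(wrapper)
                if piece:
                    current.append(NavigableString(piece))
        else:
            # copy.copy() gives a detached copy, so the original is untouched.
            current.append(copy.copy(child))
    if current.contents:
        sentences.append(current)
    return sentences


html = (
    '<p><i><a href="/wiki/Le_Bassin_Aux_Nymph%C3%A9as" '
    'title="Le Bassin Aux Nymphéas">Le Bassin Aux Nymphéas</a></i>, 1919. '
    "Monet's late series of water lily paintings are among his best-known works.</p>"
)
soup = BeautifulSoup(html, "html.parser")
paragraph = soup.find("p")

for sentence in split_into_sentence_tags(paragraph, soup):
    print(sentence)
```

For real-world text, a proper sentence tokenizer would be a more reliable replacement for the regex. A boundary that falls inside a nested tag, or one that spans two adjacent text nodes, is not detected by this sketch.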
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
