PHP: splitting a sentence
I'm trying to split a string of sentences on "." so that each sentence becomes one element of an array, like this:
$Text = "Hello, Mr. James. How are you today.";
$split = explode(".", $Text);
$Text contains 2 sentences, so I should end up with only 2 elements in the array. The problem is that $Text can sometimes contain words like "Mr.", or any other word that has a "." in the middle of a sentence. Those sentences are then split in the middle and the pieces land in separate elements, like this:
Array ( [0] => Hello, Mr [1] => James [2] => How are you today [3] => )
Solution 1:[1]
You can avoid a lot of exception handling and general misery if you can ensure that every sentence in the text is followed by 2 consecutive spaces. This can be difficult with some digitized strings, because multi-spacing is sometimes condensed to a single space.
This is what I mean:
$Text = "Hello, Mr. James.  How are you today.";
$split = explode("  ", $Text);
var_export($split);
// array ( 0 => 'Hello, Mr. James.', 1 => 'How are you today.', )
Exploding on each pair of consecutive spaces gives a reliable result, because the single space after "Mr." is never treated as a sentence boundary.
If you want good output, you'll need to use good input.
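If the input cannot be guaranteed to contain exactly two spaces between sentences, a slightly more tolerant variant is possible. The following is only a sketch: the helper name splitOnDoubleSpace() is hypothetical, and it assumes every sentence ends in ".", "?" or "!" followed by at least two whitespace characters.

```php
<?php
// Hypothetical helper (not part of the original answer).
// Splits only where ., ? or ! is followed by two or more whitespace characters.
function splitOnDoubleSpace(string $text): array
{
    // (?<=[.?!]) requires sentence-ending punctuation just before the gap;
    // \s{2,} consumes the gap itself, so it is dropped from the output.
    return preg_split('~(?<=[.?!])\s{2,}~', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

$sentences = splitOnDoubleSpace("Hello, Mr. James.  How are you today.  Fine, thanks!");
var_export($sentences);
```

The single space after "Mr." does not satisfy \s{2,}, so that abbreviation stays inside its sentence without needing an exception list.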
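If you want to blacklist a few predictable substrings that should not be used to split the string, you can use the (*SKIP)(*FAIL) backtracking control verbs: the pattern first matches the unwanted substring, then discards that match and carries on searching after it.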
Code:
$text = "Hello, Mr. James. How are you today.";
var_export(
preg_split('~(?:Mrs?|Miss|Ms|Prof|Rev|Col|Dr)[.?!:](*SKIP)(*F)|[.?!:]+\K\s+~', $text, 0, PREG_SPLIT_NO_EMPTY)
);
Output:
array (
0 => 'Hello, Mr. James.',
1 => 'How are you today.',
)
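The same idea can be wrapped in a function that takes the abbreviation list as a parameter. This is a minimal sketch rather than the answer's own code: the name splitSentences() and the sample abbreviation list are assumptions, the list is assumed to be non-empty, and the arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical wrapper around the (*SKIP)(*FAIL) technique shown above.
// Assumes $abbreviations is a non-empty list of abbreviations without the trailing dot.
function splitSentences(string $text, array $abbreviations): array
{
    // Escape each abbreviation so it is matched literally inside the pattern.
    $alternation = implode('|', array_map(fn(string $a): string => preg_quote($a, '~'), $abbreviations));

    // Left branch: an abbreviation plus its dot is matched, then skipped and failed.
    // Right branch: sentence-ending punctuation is kept (\K) and the whitespace after it is the split point.
    $pattern = '~\b(?:' . $alternation . ')\.(*SKIP)(*FAIL)|[.?!]+\K\s+~';

    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

$result = splitSentences("Dr. Lee met Mrs. Smith. Was it late? Yes!", ['Mr', 'Mrs', 'Ms', 'Dr', 'Prof']);
var_export($result);
```

Any abbreviation missing from the list is still treated as a sentence end, so the list has to cover the abbreviations that actually occur in the input.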
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
