PHP splitting a sentence

I'm trying to split a string of sentences on "." so that each sentence becomes an element of an array, like below:

$Text = "Hello, Mr. James. How are you today.";
$split = explode(".", $Text);

As you can see, $Text contains 2 sentences, so I should only have 2 elements in the array. The issue I'm having is that $Text can sometimes contain words like "Mr." or other abbreviations that include a "." in the middle of a sentence. This results in sentences being split in the middle and the pieces being placed separately in the array, like below:

Array ( [0] => Hello, Mr [1] => James [2] => How are you today [3] => )


Solution 1:[1]

You can avoid a lot of exception handling and general misery if you can ensure that every English sentence is properly spaced at its end, that is, followed by 2 consecutive spaces. This can be difficult with some digitized strings, because multiple spaces are sometimes condensed to a single space.

This is what I mean:

$Text = "Hello, Mr. James.  How are you today.";
$split = explode("  ", $Text);
var_export($split);
// array ( 0 => 'Hello, Mr. James.', 1 => 'How are you today.', )

Exploding on each pair of consecutive spaces will give you a reliable result, provided the input is spaced consistently. If you want good output, you'll need to use good input.
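If the input sometimes separates sentences with more than two spaces, or with a mix of spaces and newlines, a slightly more tolerant variant of the same idea is to split on any run of two or more whitespace characters. This is only a sketch under that assumption; the function name `splitOnDoubleSpace` is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer).
// Assumes sentences are separated by at least two whitespace characters,
// while words inside a sentence are separated by exactly one.
function splitOnDoubleSpace(string $text): array
{
    // \s{2,} matches a run of two or more whitespace characters.
    // PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading/trailing runs.
    return preg_split('~\s{2,}~', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$text = "Hello, Mr. James.  How are you today.";
var_export(splitOnDoubleSpace($text));
// Two elements: 'Hello, Mr. James.' and 'How are you today.'
```

Like the explode() version, this still fails if the double spacing has been condensed to single spaces before the string reaches your code.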


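If you want to blacklist a few predictable substrings that should not be used to split the string, then you can use (*SKIP)(*FAIL) for that (abbreviated to (*SKIP)(*F) in the pattern). The first alternative matches a listed abbreviation together with its trailing punctuation and then discards that match, so a split can only happen at the second alternative: whitespace that follows sentence-ending punctuation.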

Code:

$text = "Hello, Mr. James. How are you today.";

var_export(
    preg_split('~(?:Mrs?|Miss|Ms|Prof|Rev|Col|Dr)[.?!:](*SKIP)(*F)|[.?!:]+\K\s+~', $text, 0, PREG_SPLIT_NO_EMPTY)
);

Output:

array (
  0 => 'Hello, Mr. James.',
  1 => 'How are you today.',
)
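If the list of abbreviations needs to change over time, the same (*SKIP)(*FAIL) pattern can be built from an array instead of being hard-coded. The following is a minimal sketch, not the original author's code: the function name `splitSentences`, its default abbreviation list, and the added `\b` word boundary are all assumptions made for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): builds the
// (*SKIP)(*FAIL) pattern from a caller-supplied abbreviation list.
function splitSentences(
    string $text,
    array $abbreviations = ['Mr', 'Mrs', 'Ms', 'Miss', 'Dr', 'Prof', 'Rev', 'Col']
): array {
    // Escape each abbreviation so it is matched literally inside the pattern.
    $escaped = array_map(function ($abbr) {
        return preg_quote($abbr, '~');
    }, $abbreviations);

    // First alternative: abbreviation + dot is matched, then discarded.
    // \b (an added assumption) avoids matching the tail of a longer word.
    // Second alternative: keep the punctuation (\K) and split on the
    // whitespace that follows it.
    $pattern = '~\b(?:' . implode('|', $escaped) . ')\.(*SKIP)(*FAIL)|[.?!:]+\K\s+~';

    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

var_export(splitSentences("Hello, Mr. James. How are you today? Dr. Smith is in."));
// Three elements: 'Hello, Mr. James.', 'How are you today?', 'Dr. Smith is in.'
```

A custom list can be passed as the second argument, for example `splitSentences($text, ['St', 'Jr'])`. Any abbreviation missing from the list will still cause a split, so this approach only works for abbreviations you can predict.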

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1