Regular expression: extract the first word character after a dot

I am trying to extract the first word character after each dot with this regex:

\..(\w)

But it does not work when the dot is followed by newlines or by more than one space.

homEwork:

  it was a bright cold day in April, and the clocks were striking thirteen.



  the hallway smelt of boiled cabbage and old rag mats. at one end of it a coloured poster, too large for indoor display, had been tacked to the wall. 



  winston turned a switch and the voice sank somewhat, though the words were still distinguishable.  his hair was very fair, his face naturally sanguine.



  it was the police patrol, snooping into people's windows. the patrols did not matter, however. only the Thought Police mattered.
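To see why the original pattern misses most matches, here is a small reproduction. The text is a hypothetical, shortened excerpt of the sample above, and the sketch assumes the standard re module with no flags.

```python
import re

# Hypothetical, shortened excerpt of the sample text above.
text = "striking thirteen.\n\n\n\n  the hallway. at one end. \n\n  winston turned"

# The original pattern: a dot, then exactly ONE arbitrary character, then a word character.
# Without re.DOTALL, "." does not match "\n", and it only consumes a single character,
# so a dot followed by a newline, or by two or more non-word characters, is skipped.
matches = re.findall(r"\..(\w)", text)
print(matches)  # ['a']
```

Only "hallway. at" matches, because that is the one dot followed by exactly one space and then a letter.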

enter image description here



Solution 1:[1]

You can use

(\.[\W\d_]*)([^\W\d_])

If you only need to match ASCII letters, you can use

(\.[^a-zA-Z]*)([a-zA-Z])

Details:

  • \. - a dot, then
  • [\W\d_]* / [^a-zA-Z]* - zero or more non-letters (captured, together with the dot, into Group 1), and then
  • [^\W\d_] / [a-zA-Z] - a letter, captured into Group 2.
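The two variants only behave differently on non-ASCII text. Here is a minimal sketch on a made-up string (not from the question) that uses re.finditer to collect Group 2 from each match:

```python
import re

# Hypothetical sample: a digit after the first dot, a non-ASCII letter after the second.
text = "one. 2 two. été three."

unicode_rx = r"(\.[\W\d_]*)([^\W\d_])"  # any Unicode letter
ascii_rx = r"(\.[^a-zA-Z]*)([a-zA-Z])"  # ASCII letters only

unicode_letters = [m.group(2) for m in re.finditer(unicode_rx, text)]
ascii_letters = [m.group(2) for m in re.finditer(ascii_rx, text)]

print(unicode_letters)  # ['t', 'é']
print(ascii_letters)    # ['t', 't']  (é is treated as a "non-letter" and skipped)
```

Both patterns skip the digit 2. Only the Unicode-aware one stops at "é".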

See the regex demo.

If you want to uppercase the letter in Python, use re.sub with a replacement function:

re.sub(r'(\.[^a-zA-Z]*)([a-zA-Z])', lambda x: f'{x.group(1)}{x.group(2).upper()}', text)

Here is a complete Python example:

import re
rx = r"(\.[^a-zA-Z]*)([a-zA-Z])"
text = "homEwork:\n\n  it was a bright cold day in April, and the clocks were striking thirteen.\n\n\n\n  the hallway smelt of boiled cabbage and old rag mats. at one end of it a coloured poster, too large for indoor display, had been tacked to the wall. \n\n\n\n  winston turned a switch and the voice sank somewhat, though the words were still distinguishable.  his hair was very fair, his face naturally sanguine.\n\n\n\n  it was the police patrol, snooping into people's windows. the patrols did not matter, however. only the Thought Police mattered."
print(re.sub(rx, lambda x: f'{x.group(1)}{x.group(2).upper()}', text))

Output:

homEwork:

  it was a bright cold day in April, and the clocks were striking thirteen.



  The hallway smelt of boiled cabbage and old rag mats. At one end of it a coloured poster, too large for indoor display, had been tacked to the wall. 



  Winston turned a switch and the voice sank somewhat, though the words were still distinguishable.  His hair was very fair, his face naturally sanguine.



  It was the police patrol, snooping into people's windows. The patrols did not matter, however. Only the Thought Police mattered.
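The lambda passed to re.sub can also be a named function, which may be easier to read or test. Here is a sketch of that approach. AFTER_DOT_RX, upper_after_dot and capitalize_after_dots are made-up names, not part of the original answer.

```python
import re

# Same ASCII pattern as above, compiled once (hypothetical name).
AFTER_DOT_RX = re.compile(r"(\.[^a-zA-Z]*)([a-zA-Z])")


def upper_after_dot(match: re.Match) -> str:
    # Keep group 1 (the dot plus any skipped non-letters) unchanged
    # and uppercase only the letter in group 2.
    return match.group(1) + match.group(2).upper()


def capitalize_after_dots(text: str) -> str:
    # re.sub calls upper_after_dot once per match and splices its return value into the result.
    return AFTER_DOT_RX.sub(upper_after_dot, text)


result = capitalize_after_dots("one. two.\n\n  three")
print(repr(result))  # 'one. Two.\n\n  Three'
```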

Solution 2:[2]

You can use \.\s*(\w+) to capture the whole word after each dot (use \.\s*(\w) if you only need its first character):

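>>> re.findall(r'\.\s*(\w+)', text)
['the', 'at', 'winston', 'his', 'it', 'the', 'only']

  • \.: a literal dot
  • \s*: zero or more whitespace characters (spaces, tabs, newlines)
  • (\w+): one or more word characters ([a-zA-Z0-9_] for ASCII text; with Python 3 str patterns, Unicode letters and digits also match). The parentheses create a capture group, so findall returns only the captured word.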
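Because \s* only skips whitespace, this pattern finds nothing at a dot followed by punctuation such as a quote. Solution 1's [\W\d_]* skips that punctuation. Here is a small comparison on a made-up excerpt, using plain re with no flags:

```python
import re

# Hypothetical excerpt: the third dot is followed by a quote rather than a letter.
text = "thirteen.\n\n\n\n  the hallway. at one end. 'quoted' word."

first_chars = re.findall(r"\.\s*(\w)", text)   # first character only
words = re.findall(r"\.\s*(\w+)", text)        # whole word
# Solution 1's pattern skips any non-letter, not just whitespace.
letters = [m.group(2) for m in re.finditer(r"(\.[\W\d_]*)([^\W\d_])", text)]

print(first_chars)  # ['t', 'a']
print(words)        # ['the', 'at']
print(letters)      # ['t', 'a', 'q']
```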

Solution 3:[3]

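You can also do this with string methods, plus one re.sub call that removes the spaces after each dot. Note that this only finds the character after the first dot: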

import re

word = """  it was a bright cold day in April, and the clocks were striking thirteen.



  the hallway smelt of boiled cabbage and old rag mats. at one end of it a coloured poster, too large for indoor display, had been tacked to the wall. 



  winston turned a switch and the voice sank somewhat, though the words were still distinguishable.  his hair was very fair, his face naturally sanguine.



  it was the police patrol, snooping into people's windows. the patrols did not matter, however. only the Thought Police mattered."""
# remove new lines
word = word.replace('\n', '')
# remove the spaces that follow each dot
word = re.sub(r'\. +', '.', word)
# position of the first dot (-1 if there is none)
pos = word.find('.')
# print the character right after that dot
if pos != -1 and pos + 1 < len(word):
    print(word[pos + 1])  # prints: t
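Here is a variation that uses only string methods and covers every dot, not just the first. It is not part of the original answer. It assumes "first character" means the first non-whitespace character, so unlike \w it may also return punctuation. first_chars_after_dots is a hypothetical name.

```python
def first_chars_after_dots(text: str) -> list:
    """Return the first non-whitespace character after every dot (hypothetical helper)."""
    result = []
    # Every piece after the first one produced by split(".") starts right after some dot.
    for piece in text.split(".")[1:]:
        piece = piece.lstrip()  # drop the spaces and newlines that follow the dot
        if piece:               # a dot at the very end leaves an empty piece
            result.append(piece[0])
    return result


sample = "thirteen.\n\n  the hallway. at one end. \n\n  winston turned."
print(first_chars_after_dots(sample))  # ['t', 'a', 'w']
```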



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew
Solution 2 Corralien
Solution 3