How can a program learn to map pronouns correctly?

How can a program learn to map pronouns correctly to something else in the text?

For example, in the text "Lisa beats Jenny. She is cruel.", I would like "She" to map to "Lisa".

Is there a known name for such an algorithm? If yes, what is it?



Solution 1:[1]

What you're looking for is called coreference resolution (also known as anaphora resolution or pronoun resolution) [1,2], but it's an open research problem rather than a single algorithm.

See the image below for what the CoreNLP online demo does with the sentence "Lisa beats Jenny. She is cruel". Keep in mind that it won't always have the result you want/expect, though.

[Image: result of CoreNLP on coreference resolution]
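
To give a feel for why this is a research problem rather than a solved algorithm, here is a minimal sketch of two naive heuristics. It is not what CoreNLP does; the function names, the "every capitalized word is a name" rule, and the two strategies are assumptions made up for illustration. One strategy prefers the subject of the previous sentence, the other prefers the most recently mentioned name, and they disagree on the example sentence.

```python
import re

PRONOUNS = {"he", "she", "it", "they"}

def split_sentences(text):
    # Naive split on ., ! or ?; a real system would use a proper sentence segmenter.
    return [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]

def candidate_names(sentence):
    # Toy assumption: every capitalized word is a person name.
    return [w for w in sentence.split() if w[0].isupper()]

def resolve(text, strategy):
    """Map each sentence-initial pronoun to a name from the previous sentence.

    strategy "subject" picks the first name (a stand-in for the grammatical subject);
    any other value picks the last name (the most recent mention).
    """
    sentences = split_sentences(text)
    links = {}
    for prev, cur in zip(sentences, sentences[1:]):
        first = cur.split()[0]
        names = candidate_names(prev)
        if first.lower() in PRONOUNS and names:
            links[first] = names[0] if strategy == "subject" else names[-1]
    return links

text = "Lisa beats Jenny. She is cruel."
print(resolve(text, "subject"))  # {'She': 'Lisa'}
print(resolve(text, "recency"))  # {'She': 'Jenny'}
```

Neither heuristic is right in general, which is why real systems combine many signals and still make mistakes.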

Solution 2:[2]

I believe the information you are looking for can be found at the link below, which covers using CNNs (convolutional neural networks) for NLP (natural language processing):

http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

It's also worth noting that CNNs were originally designed for vision, i.e. image parsing, and in most cases a DNN (deep neural network) is needed for a requirement this complex.

DNN/NLP reading can be found here: https://arxiv.org/pdf/1703.03091.pdf

TL;DR

There is no single algorithm, but rather a set of algorithms that can be combined to infer the information above. Look into Microsoft's white papers on language research.

Solution 3:[3]

Parsing such a sentence requires a great deal of common-sense knowledge. You need to know that beating someone can be considered cruel behavior. As far as I know, no one has managed to really handle this in unconstrained speech.

IMO, machine learning techniques would fail because they work without understanding, merely reproducing learnt patterns. Consider that "Lisa beats Jenny. She is cruel." and "Lisa beats Jenny. She is blond." are structurally identical, yet you can't generalize from one to the other.
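
To make the "structurally identical" point concrete, here is a small sketch that reduces both sentences to a sequence of word classes. The lexicon is a hand-written toy (an assumption for this example, not a real part-of-speech tagger). Any resolver that only sees this sequence has to give the same answer for both sentences.

```python
# Hand-written toy lexicon (an assumption for this sketch, not a real tagger).
TOY_TAGS = {
    "lisa": "NAME", "jenny": "NAME",
    "beats": "VERB", "is": "VERB",
    "she": "PRON",
    "cruel": "ADJ", "blond": "ADJ",
}

def tag_sequence(text):
    # Separate the periods from the words, then look each word up in the lexicon.
    words = text.replace(".", " .").split()
    return [TOY_TAGS.get(w.lower(), w) for w in words]

a = tag_sequence("Lisa beats Jenny. She is cruel.")
b = tag_sequence("Lisa beats Jenny. She is blond.")
print(a)       # ['NAME', 'VERB', 'NAME', '.', 'PRON', 'VERB', 'ADJ', '.']
print(a == b)  # True
```

Telling the two apart needs the meaning of "cruel" versus "blond", which is exactly the world knowledge mentioned above.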

Some systems, such as Google Translate, work by reusing fragments they have already seen, i.e. short word sequences. But in your case the patterns can straddle several sentences, and their probability of recurring is too small.

Solution 4:[4]

You can use the neuralcoref library from Hugging Face.

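import spacy
import spacy.cli
import neuralcoref

# Download and load the small English model (neuralcoref was built for spaCy 2.x).
spacy.cli.download("en_core_web_sm")
nlp = spacy.load("en_core_web_sm")

# Add the coreference component; a higher greedyness produces more coreference links.
neuralcoref.add_to_pipe(nlp, greedyness=0.55)

doc = nlp("Lisa beats Jenny. She is cruel.")
print("coref:", doc._.coref_clusters)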

output:

coref: [Lisa: [Lisa, She]]

However, it won't necessarily map each pronoun to the correct noun; ambiguous cases are difficult to resolve. You can tweak the parameters, though.
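
Once you have a cluster like the one printed above, the usual next step is to substitute the pronoun with the main mention. The sketch below shows the idea on plain Python data. The `resolve_text` helper and the `(main_index, mention_indices)` representation are hypothetical and are not neuralcoref's API (neuralcoref itself exposes a resolved string as `doc._.coref_resolved`). It only handles single-token mentions.

```python
def resolve_text(tokens, clusters):
    """Replace every non-main mention with its cluster's main mention.

    tokens   : list of word strings
    clusters : list of (main_index, [mention_indices]) pairs,
               single-token mentions only (a simplifying assumption)
    """
    resolved = list(tokens)
    for main_index, mention_indices in clusters:
        for i in mention_indices:
            if i != main_index:
                resolved[i] = tokens[main_index]
    return " ".join(resolved)

tokens = ["Lisa", "beats", "Jenny", ".", "She", "is", "cruel", "."]
clusters = [(0, [0, 4])]  # plain-data version of [Lisa: [Lisa, She]]
print(resolve_text(tokens, clusters))  # Lisa beats Jenny . Lisa is cruel .
```

The output keeps a space before each period because the tokens are simply joined with spaces. If the cluster is wrong, the substitution is wrong too, so it is worth checking the clusters on your own data first.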

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Chris
Solution 3
Solution 4