Splitting a bilingual text into two parts using Python

input text: "gut, wird gemacht right, will do (inf)"
output text: gut, wird gemacht right , will do (inf)

input text: gut, mache ich right, will do (inf) or I’ll do that
output text: gut , mache ich right, will will do (inf) or I’ll do that

input text: "wie mans macht, ists verkehrt whatever you do is wrong"
output text: wie mans macht, ists verkehrt whatever you do is wrong



Solution 1:[1]

First off, please try to solve the problem yourself. As @Julien points out, no one will write code for you.

To answer your question, you need an algorithm that can detect which language a piece of text is written in and report how certain it is. For example, counting letter frequencies has a surprisingly good hit rate, or you might want to compare the words against a dictionary database.
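As a rough illustration of the word-lookup idea, here is a minimal sketch. The two word sets and the function name `guess_language` are made up for this example and stand in for a real dictionary database; the "confidence" is simply the share of recognized words that belong to the winning language.

```python
import re

# Hypothetical mini word lists (assumption); a real solution needs a proper lexicon.
GERMAN_WORDS = {"gut", "wird", "gemacht", "mache", "ich", "wie", "macht",
                "verkehrt", "und", "nicht"}
ENGLISH_WORDS = {"right", "will", "do", "or", "that", "whatever", "you",
                 "is", "wrong", "the"}


def guess_language(text):
    """Return (language, confidence) based on how many words are recognized."""
    # Lowercase the text and keep only runs of letters (including German umlauts).
    words = re.findall(r"[a-zäöüß]+", text.lower())
    german_hits = sum(word in GERMAN_WORDS for word in words)
    english_hits = sum(word in ENGLISH_WORDS for word in words)
    known = german_hits + english_hits
    if known == 0:
        # No word was found in either list, so there is nothing to base a guess on.
        return "unknown", 0.0
    if german_hits >= english_hits:
        return "de", german_hits / known
    return "en", english_hits / known


print(guess_language("gut, wird gemacht"))
print(guess_language("right, will do"))
```

A confidence near 0.5 suggests the text mixes both languages, which is exactly the case where a split needs to be found.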

The next step is to choose an algorithm to find the most likely split. You could, for instance, evaluate each word individually, or try splitting the text at several positions and keep the position that scores best.
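The "try each split position" approach could be sketched as follows. This assumes the German part always comes first, as in the question's examples, and uses small made-up word sets (`GERMAN`, `ENGLISH`) in place of a real language detector; `best_split` and `word_key` are names invented for this example.

```python
import re

# Hypothetical mini word lists (assumption); replace with a real lexicon or detector.
GERMAN = {"gut", "wird", "gemacht", "mache", "ich", "wie", "mans", "macht",
          "ists", "verkehrt"}
ENGLISH = {"right", "will", "do", "or", "that", "whatever", "you", "is", "wrong"}


def word_key(token):
    """Lowercase a token and strip punctuation so 'gut,' matches 'gut'."""
    return re.sub(r"[^a-zäöüß]", "", token.lower())


def best_split(text):
    """Try every split position and keep the one with the best score.

    The score of a position is the number of German words to its left
    plus the number of English words to its right.
    """
    tokens = text.split()
    best_pos, best_score = 0, -1
    for pos in range(len(tokens) + 1):
        left = sum(word_key(t) in GERMAN for t in tokens[:pos])
        right = sum(word_key(t) in ENGLISH for t in tokens[pos:])
        if left + right > best_score:  # ties keep the earliest position
            best_pos, best_score = pos, left + right
    return " ".join(tokens[:best_pos]), " ".join(tokens[best_pos:])


print(best_split("gut, wird gemacht right, will do (inf)"))
print(best_split("wie mans macht, ists verkehrt whatever you do is wrong"))
```

Words that appear in neither list do not affect the score, so an unknown word right at the boundary ends up on the English side because ties go to the earliest position. With real data you would want a larger vocabulary or a statistical score to resolve such cases.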

Once you have that set up, it's just a matter of trying different things until you get the accuracy you need.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nathan