Split a long text into two or more parts, each with a maximum length, in Python

Let's suppose I have a long text that I want to process with an API that allows a maximum number of characters (N). I would like to split that text into two or more sub-texts, each shorter than N characters, splitting on a separator. I know I could simply split on the separator, but I would like to keep the number of output sub-texts as small as possible.

For example, suppose my text is:

"Lorem ipsum dolor sit amet, odio salutandi id nam, ferri nostro te duo. Eum ex odio habeo qualisque, ne eos natum graeco. Autem voluptatum ex mea. Nulla putent reformidans cu pro, posse recusabo reprehendunt pro no. An sit ludus oblique. Consulatu cotidieque ex sea, nam no duis prompta expetendis.

Est ne tempor quaestio complectitur, modo error vim et. Option voluptaria efficiantur te eam, ea appareat evertitur qui, te vix pertinax recteque. Mea eu diceret ceteros. Expetenda torquatos assueverit est ex, te reque voluptatibus signiferumque has."

which is 550 characters long. Let's suppose that N is 250. I would expect the text to be split in this way:

Part 1: "Lorem ipsum dolor sit amet, odio salutandi id nam, ferri nostro te duo. Eum ex odio habeo qualisque, ne eos natum graeco. Autem voluptatum ex mea. Nulla putent reformidans cu pro, posse recusabo reprehendunt pro no. An sit ludus oblique" (237 characters)
Part 2: "Consulatu cotidieque ex sea, nam no duis prompta expetendis.

Est ne tempor quaestio complectitur, modo error vim et. Option voluptaria efficiantur te eam, ea appareat evertitur qui, te vix pertinax recteque. Mea eu diceret ceteros." (232 characters)

Part 3: the remaining.

Any idea on how to do this in Python?

Thank you for any help. Francesca

python substring

Solution 1:^[1]


n = 250
text = """Lorem ipsum dolor sit amet, odio salutandi id nam, ferri nostro te duo. Eum ex odio habeo qualisque, ne eos natum graeco. Autem voluptatum ex mea. Nulla putent reformidans cu pro, posse recusabo reprehendunt pro no. An sit ludus oblique. Consulatu cotidieque ex sea, nam no duis prompta expetendis.

Est ne tempor quaestio complectitur, modo error vim et. Option voluptaria efficiantur te eam, ea appareat evertitur qui, te vix pertinax recteque. Mea eu diceret ceteros. Expetenda torquatos assueverit est ex, te reque voluptatibus signiferumque has."""

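if len(text) > n:
  print(text[:n])   # the first n characters (indices 0 to n-1)
  print(text[n:])   # everything from index n to the end
else:
  print(text)

So you can have a variable n with the maximum length (250 in your example). The code checks whether the text is longer than n characters. If it is, it prints everything from character 0 up to, but not including, index n; because the end index of a slice is exclusive, that is exactly the first n characters, with no need to subtract 1. It then prints the second part, from index n to the end. Note that this cuts the text into only two parts and does not take a separator into account.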
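Python slices exclude the end index, which is easy to check on a short string. A minimal sketch (the sample string and n = 4 are made up for illustration):

```python
# text[:n] holds exactly n characters; text[n:] starts at the very next one.
text = "abcdefghij"
n = 4

head = text[:n]   # characters at indices 0 .. n-1
tail = text[n:]   # characters at indices n .. end

print(head)                  # abcd
print(tail)                  # efghij
print(head + tail == text)   # True: nothing is lost or duplicated
```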

Solution 2:^[2]

You can create a function that returns chunks of the desired length.

def split(N, text):
    # Step by N so that consecutive chunks do not overlap.
    chunks = [text[i:i+N] for i in range(0, len(text), N)]
    return chunks

This returns the chunks as a list, e.g.:

text = "Lorem.................." # the complete Lorem ipsum text
chunks = split(250, text)
print(len(chunks[0]), len(chunks[1]), len(chunks[2]))

And the output lengths will be

250 250 50
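A quick sanity check for fixed-width chunking is that no piece exceeds the limit and that the pieces join back into the original text. The sketch below uses a hypothetical helper named fixed_chunks and a dummy 550-character string as a stand-in for the real text:

```python
def fixed_chunks(text, size):
    """Cut text into consecutive, non-overlapping pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

sample = "x" * 550                  # stand-in for a 550-character text
pieces = fixed_chunks(sample, 250)

print([len(p) for p in pieces])     # [250, 250, 50]
print("".join(pieces) == sample)    # True
```

Keep in mind that this kind of chunking cuts wherever the limit falls, so a piece can end in the middle of a sentence or a word.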

Solution 3:^[3]

This is a possible solution:

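def split_txt(txt, sep, n):
    # Every segment plus its separator must fit into n characters.
    if any(len(s) + len(sep) > n for s in txt.split(sep)):
        raise ValueError('The text cannot be split')
    result = []
    start = 0
    while start + n <= len(txt):
        # Take a window of n characters and cut it back to the last separator.
        result.append(txt[start:start + n].rsplit(sep, 1)[0] + sep)
        start += len(result[-1])
    # Whatever is left is shorter than n and becomes the last part.
    if start < len(txt):
        result.append(txt[start:])
    return result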
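Separator-aware splitting can also be written by splitting on the separator first and then packing whole segments into a chunk until the next one would no longer fit. This is a sketch of that alternative formulation, not the answer's code; the name pack_segments and the sample string are hypothetical:

```python
def pack_segments(text, sep, limit):
    """Greedily pack sep-terminated segments into chunks of at most `limit` characters."""
    # Re-attach the separator to every segment except a trailing remainder.
    parts = text.split(sep)
    segments = [p + sep for p in parts[:-1]]
    if parts[-1]:
        segments.append(parts[-1])

    chunks = []
    current = ""
    for seg in segments:
        if len(seg) > limit:
            raise ValueError("a single segment is longer than the limit")
        if len(current) + len(seg) > limit:
            # The next segment does not fit: close the current chunk.
            chunks.append(current)
            current = seg
        else:
            current += seg
    if current:
        chunks.append(current)
    return chunks

sample = "aa.bbb.c.dddd.ee"
chunks = pack_segments(sample, ".", 7)
print(chunks)                        # ['aa.bbb.', 'c.dddd.', 'ee']
print("".join(chunks) == sample)     # True
```

Because each chunk is filled as far as possible before a new one is started, the number of chunks stays as small as the separator positions allow.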

Solution 4:^[4]

You might consider building a subclass of the standard library's TextWrapper class, using the insights from the other answers. The base class lets you specify rules for handling a text: maximum number of columns (width), maximum number of lines, handling of hyphens, and so on.

The textwrap module provides some convenience functions, as well as TextWrapper, the class that does all the work. If you’re just wrapping or filling one or two text strings, the convenience functions should be good enough; otherwise, you should use an instance of TextWrapper for efficiency. [emphasis mine]

The base class itself does not address the specifics of the OP's problem, but it is worth a look for anyone landing on this page.

The material in this section of the documentation may also give some inspiration: https://docs.python.org/3/library/text.html#stringservices
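To give an idea of what the textwrap module does out of the box, here is a minimal sketch using the textwrap.wrap convenience function. The sample string and the width of 40 are made up for illustration. Note that it breaks at whitespace rather than at a chosen separator such as a full stop, and that by default it drops the whitespace at each break and converts newlines to spaces:

```python
import textwrap

sample = "Lorem ipsum dolor sit amet. Eum ex odio habeo qualisque. Autem voluptatum ex mea."

# Each returned line is at most `width` characters long and ends at a word boundary.
parts = textwrap.wrap(sample, width=40)

for part in parts:
    print(len(part), repr(part))
```

Since the breaks fall between words, getting breaks only at sentence boundaries would need extra logic on top, for example in a TextWrapper subclass as suggested above.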

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Henrik
Solution 2	Ahmad Anis
Solution 3	Riccardo Bucco
Solution 4	LoneWanderer
