'Create a compress function in Python?

I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.

Ex:

>>> compress("ddaaaff")
'd2a3f2'


 def compress(s):
     count=0

     for i in range(0,len(s)):
         if s[i] == s[i-1]:
             count += 1
         c = s.count(s[i])

     return str(s[i]) + str(c)


Solution 1:[1]

Here is a short python implementation of a compression function:

def compress(string):

    res = ""

    count = 1

    #Add in first character
    res += string[0]

    #Iterate through loop, skipping last one
    for i in range(len(string)-1):
        if(string[i] == string[i+1]):
            count+=1
        else:
            if(count > 1):
                #Ignore if no repeats
                res += str(count)
            res += string[i+1]
            count = 1
    #print last one
    if(count > 1):
        res += str(count)
    return res

Here are a few examples:

>>> compress("ddaaaff")
'd2a3f2'
>>> compress("daaaafffyy")
'da4f3y2'
>>> compress("mississippi")
'mis2is2ip2i'

Solution 2:[2]

Short version with generators:

from itertools import groupby
import re
def compress(string):
    return re.sub(r'(?<![0-9])[1](?![0-9])', '', ''.join('%s%s' % (char, sum(1 for _ in group)) for char, group in groupby(string)))

(1) Grouping by chars with groupby(string)

(2) Counting length of group with sum(1 for _ in group) (because no len on group is possible)

(3) Joining into proper format

(4) Removing 1 chars for single items when there is a no digit before and after 1

Solution 3:[3]

There are several reasons why this doesn't work. You really need to try debugging this yourself first. Put in a few print statements to trace the execution. For instance:

def compress(s):
    count=0

    for i in range(0, len(s)):
        print "Checking character", i, s[i]
        if s[i] == s[i-1]:
            count += 1
        c = s.count(s[i])
        print "Found", s[i], c, "times"

    return str(s[i]) + str(c)

print compress("ddaaaff")

Here's the output:

Checking character 0 d
Found d 2 times
Checking character 1 d
Found d 2 times
Checking character 2 a
Found a 3 times
Checking character 3 a
Found a 3 times
Checking character 4 a
Found a 3 times
Checking character 5 f
Found f 2 times
Checking character 6 f
Found f 2 times
f2

Process finished with exit code 0

(1) You throw away the results of all but the last letter's search. (2) You count all occurrences, not merely the consecutive ones. (3) You cast a string to a string -- redundant.

Try working through this example with pencil and paper. Write down the steps you use, as a human being, to parse the string. Work on translating those to Python.

Solution 4:[4]

x="mississippi"
res = ""
count = 0
while (len(x) > 0):
    count = 1
    res= ""
    for j in range(1, len(x)):
        if x[0]==x[j]:
            count= count + 1
        else:
            res = res + x[j]
    print(x[0], count, end=" ")
    x=res

Solution 5:[5]

Just another simplest way to perform this:

def compress(str1):
    output = ''
    initial = str1[0]
    output = output + initial
    count = 1
    for item in str1[1:]:
        if item == initial:
            count = count + 1
        else:
            if count == 1:
                count = ''
            output = output + str(count)
            count = 1
            initial = item
            output = output + item
    print (output)

Which gives the output as required, examples:

>> compress("aaaaaaaccddddeehhyiiiuuo")
a7c2d4e2h2yi3u2o

>> compress("lllhhjuuuirrdtt")
l3h2ju3ir2dt

>> compress("mississippi")
mis2is2ip2i

Solution 6:[6]

from collections import Counter
def string_compression(string):
    counter = Counter(string)
    result = ''
    for k, v in counter.items():
        result = result + k + str(v)
    print(result)

Solution 7:[7]

input = "mississippi"
count = 1
for i in range(1, len(input) + 1):
    if i == len(input):
        print(input[i - 1] + str(count), end="")
        break
    else:
        if input[i - 1] == input[i]:
            count += 1
    else:
            print(input[i - 1] + str(count), end="")
            count = 1

Output : m1i1s2i1s2i1p2i1

Solution 8:[8]

s=input("Enter the string:")
temp={}
result=" "
for x in s:
    if x in temp:
        temp[x]=temp[x]+1
    else:
        temp[x]=1
for key,value in temp.items():
    result+=str(key)+str(value)

print(result)

Solution 9:[9]

Here is something I wrote.

def stringCompression(str1):
  counter=0
  prevChar = str1[0]
  str2=""
  charChanged = False
  loopCounter = 0

  for char in str1:
      if(char==prevChar):
          counter+=1
          charChanged = False
      else:
          str2 += prevChar + str(counter)
          counter=1
          prevChar = char
          if(loopCounter == len(str1) - 1):
              str2 += prevChar + str(counter)
          charChanged = True
      loopCounter+=1
  if(not charChanged):
      str2+= prevChar + str(counter)

  return str2

Not the best code I guess. But works well.

a -> a1

aaabbbccc -> a3b3c3

Solution 10:[10]

This is a solution to the problem. But keep in mind that this method only effectively works if there's a lot of repetition, specifically if consecutive characters are repetitive. Otherwise, it will only worsen the situation.

e.g.,
AABCD --> A2B1C1D1
BcDG ---> B1c1D1G1

def compress_string(s):
    result = [""] * len(s)
    visited = None

    index = 0
    count = 1

    for c in s:
        if c == visited:
            count += 1
            result[index] = f"{c}{count}"
        else:
            count = 1
            index += 1
            result[index] = f"{c}{count}"
            visited = c

    return "".join(result)

Solution 11:[11]

You can simply achieve that by:

gstr="aaabbccccdddee"
last=gstr[0]
count=0
rstr=""
for i in gstr:
    if i==last:
        count=count+1
    elif i!=last:
        rstr=rstr+last+str(count)
        count=1
        last=i
rstr=rstr+last+str(count)
print ("Required string for given string {} after conversion is {}.".format(gstr,rstr))

Solution 12:[12]

Here is a short python implementation of a compression function:

#d=compress('xxcccdex')
#print(d)

def compress(word):
    list1=[]
    for i in range(len(word)):
        list1.append(word[i].lower())
    num=0
    dict1={}
    for i in range(len(list1)):
        if(list1[i] in list(dict1.keys())):
            dict1[list1[i]]=dict1[list1[i]]+1
        else:
            dict1[list1[i]]=1

    s=list(dict1.keys())
    v=list(dict1.values())
    word=''
    for i in range(len(s)):
        word=word+s[i]+str(v[i])
    return word

Solution 13:[13]

Below logic will work irrespective of

  1. Data structure
  2. Group By OR Set or any sort of compression logic
  3. Capital or non-capital characters
  4. Character repeat if not sequential

    def fstrComp_1(stng):
    sRes = ""
    cont = 1        
    for i in range(len(stng)):
    
     if not stng[i] in sRes:
        stng = stng.lower()
        n = stng.count(stng[i])
        if  n > 1: 
            cont = n
            sRes += stng[i] + str(cont)
        else:
            sRes += stng[i]
    
        print(sRes)
    
    fstrComp_1("aB*b?cC&")
    

Solution 14:[14]

I wanted to do it by partitioning the string. So aabbcc would become: ['aa', 'bb', 'cc']

This is how I did it:

def compression(string):

    # Creating a partitioned list
    alist = list(string)
    master = []
    n = len(alist)

    for i in range(n):
        if alist[i] == alist[i-1]:
            master[-1] += alist[i]
        else:
            master += alist[i]


    # Adding the partitions together in a new string
    newString = "" 
    for i in master:
        newString += i[0] + str(len(i))

    # If the newString is longer than the old string, return old string (you've not 
    # compressed it in length)
    if len(newString) > n:
        return string
    return newString



string = 'aabbcc'
print(compression(string))

Solution 15:[15]

string = 'aabccccd' output = '2a3b4c4d'

new_string = " "
count = 1
for i in range(len(string)-1):
    if string[i] == string[i+1]:
        count = count + 1
    else:         
        new_string =  new_string + str(count) + string[i]
        count = 1 
new_string = new_string + str(count) + string[i+1]    
print(new_string)

Solution 16:[16]

For a coding interview, where it was about the algorithm, and not about my knowledge of Python, its internal representation of data structures, or the time complexity of operations such as string concatenation:

def compress(message: str) -> str:
    output = ""
    length = 0
    previous: str = None
    for char in message:
        if previous is None or char == previous:
            length += 1
        else:
            output += previous
            if length > 1:
                output += str(length)
            length = 1
        previous = char
    if previous is not None:
        output += previous
        if length > 1:
            output += str(length)
    return output

For code I'd actually use in production, not reinventing any wheels, being more testable, using iterators until the last step for space efficiency, and using join() instead of string concatenation for time efficiency:

from itertools import groupby
from typing import Iterator


def compressed_groups(message: str) -> Iterator[str]:
    for char, group in groupby(message):
        length = sum(1 for _ in group)
        yield char + (str(length) if length > 1 else "")


def compress(message: str) -> str:
    return "".join(compressed_groups(message))

Taking things a step further, for even more testability:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


def segments(message: str) -> Iterator[Segment]:
    for char, group in groupby(message):
        yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return "".join(str(s) for s in segments(message))

Going all-out and providing a Value Object CompressedString:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


class CompressedString(str):

    @classmethod
    def compress(cls, message: str) -> "CompressedString":
        return cls("".join(str(s) for s in cls._segments(message)))

    @staticmethod
    def _segments(message: str) -> Iterator[Segment]:
        for char, group in groupby(message):
            yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return CompressedString.compress(message)

Solution 17:[17]

def compress(val):
    print(len(val))
    end=0
    count=1
    result=""
    for i in range(0,len(val)-1):
        #print(val[i],val[i+1])
        if val[i]==val[i+1]:
            count=count+1
            #print(count,val[i])
        elif val[i]!=val[i+1]:
            #print(end,i)
            result=result+val[end]+str(count)
            end=i+1
            count=1
    result=result+val[-1]+str(count)
    return result
res=compress("I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.")
print(len(res))

Solution 18:[18]

Use python's standard library re.

def compress(string):
    import re
    p=r'(\w+?)\1+' # non greedy, group1 1
    sub_str=string
    for m in re.finditer(p,string):
        num=m[0].count(m[1])
        sub_str=re.sub(m[0],f'{m[1]}{num}',sub_str)
    return sub_str
string='aaaaaaaabbbbbbbbbcccccccckkkkkkkkkkkppp'
string2='ababcdcd'
string3='abcdabcd' 
string4='ababcdabcdefabcdcd' 

print(compress(string))
print(compress(string2))
print(compress(string3))
print(compress(string4))

Resut:

a8b9c8k11p3                                                                     
ab2cd2                                                                          
abcd2
ab2cdabcdefabcd2 

Solution 19:[19]

Using generators:

input = "aaaddddffwwqqaattttttteeeeeee"

from itertools import groupby
print(''.join(([char+str(len(list(group))) for char, group in groupby(input)])))

Solution 20:[20]

def compress(string):

    # taking out unique characters from the string

    unique_chars = []
    for c in string:
        if not c in unique_chars:
            unique_chars.append(c)

    # Now count the characters

    res = ""

    for i in range(len(unique_chars)):
        count = string.count(unique_chars[i])
        res += unique_chars[i]+str(count)

    return res


string = 'aabccccd'
compress(string)

Solution 21:[21]

from collections import Counter

def char_count(input_str):
    my_dict = Counter(input_str)
    print(my_dict)
    output_str = ""
    for i in input_str:
        if i not in output_str:
            output_str += i
            output_str += str(my_dict[i])
    return output_str

result = char_count("zddaaaffccc")
print(result)

Solution 22:[22]

This is the modification of Patrick Yu's code. It code fails for the below test cases.

SAMPLE INPUT:
c
aaaaaaaaaabcdefgh

EXPECTED OUTPUT:
c1
a10b1c1d1e1f1g1h1

OUPUT OF Patrick's Code:
c
a10bcdefgh

Below is the modified code:


def Compress(S):
    Ans = S[0]
    count = 1
    for i in range(len(S)-1):
        if S[i] == S[i+1]:
            count += 1
        else:
            if count >= 1:
                Ans += str(count)
            Ans += S[i+1]
            count = 1
    if count>=1:
        Ans += str(count)
    return Ans

Just the condition must be changed from greater(">") to greater than equal to(">=") when comparing the count with 1.