Does Python intern strings?

In Java, explicitly declared Strings (string literals) are interned by the JVM, so that subsequent declarations of the same String result in two references to the same String instance, rather than two separate (but identical) Strings.

For example:

public String baz() {
    String a = "astring";
    return a;
}

public String bar() {
    String b = "astring";
    return b;
}

public void main() {
    String a = baz();
    String b = bar();
    assert(a == b); // passes
}

My question is, does CPython (or any other Python runtime) do the same thing for strings? For example, if I have some class:

class example():
    def __init__(self):
        self._inst = 'instance'

And create 10 instances of this class, will each one of them have an instance variable referring to the same string in memory, or will I end up with 10 separate strings?



Solution 1:[1]

A fairly easy way to tell is to use id(). However, as @MartijnPieters mentions, the result is runtime-dependent.

class example():

    def __init__(self):
        self._inst = 'instance'

# Python 3: range() and print() replace xrange() and the print statement
for i in range(10):
    print(id(example()._inst))
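
A slightly more direct check compares the objects with `is` while keeping all instances alive, so that reused addresses cannot confuse the result. This is only a sketch: the names `Example`, `instances`, `all_same`, and `distinct_ids` are illustrative, and the outcome assumes CPython, where the literal is stored once as a constant of the compiled `__init__` function.

```python
class Example:
    def __init__(self):
        # The literal lives once in the compiled function's constants.
        self._inst = 'instance'

# Keep every instance alive so object identities stay comparable.
instances = [Example() for _ in range(10)]
first = instances[0]._inst

# True if every instance attribute refers to the very same str object.
all_same = all(e._inst is first for e in instances)

# Number of distinct string objects referenced by the ten instances.
distinct_ids = len({id(e._inst) for e in instances})

print(all_same, distinct_ids)
```

On CPython this reports a single shared string object, but other runtimes are free to behave differently.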

Solution 2:[2]

  • All strings of length 0 and length 1 are interned.
  • Strings are interned at compile time ('wtf' will be interned, but ''.join(['w', 't', 'f']) will not be, because it is built at runtime).
  • Strings that contain anything other than ASCII letters, digits, or underscores are not interned. This is why 'wtf!' was not interned: it contains !.

https://www.codementor.io/satwikkansal/do-you-really-think-you-know-strings-in-python-fnxh8mtha

The article above explains string interning in Python (specifically CPython) and spells out the exceptions to these rules.
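
The compile-time versus runtime distinction can be checked with a short sketch. It assumes CPython; the variable names are illustrative, and the identity results are implementation details rather than language guarantees.

```python
import sys

literal = 'wtf'                     # constant created when the code is compiled
built = ''.join(['w', 't', 'f'])    # equal text, assembled at runtime

# The two strings always compare equal by value.
same_value = literal == built

# Identity is a separate question: the joined string is a new object.
same_object = literal is built

# sys.intern() maps the runtime string back to the interned copy.
same_after_intern = sys.intern(built) is literal

print(same_value, same_object, same_after_intern)
```

Since identity depends on the interpreter, `==` remains the right way to compare string contents; `is` is only useful here for inspecting interning.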

Solution 3:[3]

Some strings are interned in Python. When Python code is compiled, identifiers such as variable names, function names, and class names are interned.

String literals that follow the identifier rules (they start with an underscore or a letter and contain only underscores, letters, and digits) are interned as well:

a="hello"
b="hello"

Since strings are immutable, Python can safely let both names refer to the same object here, and

a is b ===> True

But if we had

a="hello world"
b="hello world"

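then, since "hello world" does not meet the identifier rules (it contains a space), it is not automatically interned. When the two assignments are entered as separate statements in the interactive interpreter, a and b are typically distinct objects. (Within a single compiled script the compiler may still merge identical constants, so the result depends on the CPython version and on how the code is run.)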

a is b  ===> False

You can intern such strings explicitly with sys.intern(). Use it when the same string values recur many times in your program.

import sys

a=sys.intern("hello world")
b=sys.intern("hello world")

Now a is b ===> True
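
To illustrate the "lot of string repetition" case, here is a minimal sketch that builds the same text several times at runtime (as parsing a file or network input might) and counts the distinct objects before and after interning. The names `words`, `raw`, and `interned` are illustrative, and the counts assume CPython.

```python
import sys

# Build the "same" text at runtime several times.
words = ["hello", " ", "world"]
raw = ["".join(words) for _ in range(5)]

# Each join call allocates a fresh str; the list keeps them all alive.
distinct_before = len({id(s) for s in raw})

# sys.intern() returns one canonical object for equal strings.
interned = [sys.intern(s) for s in raw]
distinct_after = len({id(s) for s in interned})

print(distinct_before, distinct_after)
```

Interning trades a dictionary lookup per string for less memory and faster identity-based comparisons, so it tends to pay off only when the same values recur often.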

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 woozyking
Solution 2 Prithvi Raj Vuppalapati
Solution 3 Yilmaz