Pandas internals - "Index labels must be unique"

The pandas Internals documentation (v1.2.4) states:

In pandas there are a few objects implemented which can serve as valid containers for the axis labels:

  • Index: the generic “ordered set” object, an ndarray of object dtype assuming nothing about its contents. The labels must be hashable (and likely immutable) and unique. Populates a dict of label to location in Cython to do O(1) lookups.

Clearly, DataFrame indexes do not need to be unique:

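import pandas as pd

# Use a list for the values: a set is unordered, so it cannot be aligned with the index.
df = pd.DataFrame([10, 20, 30], index=['a','b','b'])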
df.index
# Index(['a', 'b', 'b'], dtype='object')

Why does the documentation quoted above state that labels in an index must be unique?



Solution 1:[1]

In pandas there are a few objects implemented which can serve as valid containers for the axis labels

The keyword here is "valid". pandas allows you to create non-unique indexes, but some operations will raise errors when they encounter one. Just because pandas allows something doesn't make it "valid".
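As a rough illustration of what "allowed but not valid" can look like in practice, the sketch below builds the same kind of index as in the question and tries a few operations on it. The column name "value" and the variable names are only for illustration, and reindex() is just one example of an operation that rejects duplicate labels; this is not meant as a complete list.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "b"])

# The index is accepted, but pandas itself reports that it is not unique.
is_unique = df.index.is_unique

# A unique label maps to one position; a duplicated label maps to
# several positions rather than a single integer.
loc_a = df.index.get_loc("a")
loc_b = df.index.get_loc("b")

# Label-based selection of a duplicated label returns every matching row.
rows_b = df.loc["b"]

# reindex() needs an unambiguous label-to-position mapping, so it raises.
try:
    df.reindex(["a", "b"])
    reindex_error = None
except ValueError as exc:
    reindex_error = exc

print(is_unique, loc_a, loc_b, len(rows_b), type(reindex_error).__name__)
```

So the duplicated index can be constructed and queried, but the one-label-to-one-location assumption from the documentation no longer holds, and operations that depend on it may fail.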

The set_index() function has a verify_integrity keyword. Passing verify_integrity=True makes the call raise an error when the new index would contain duplicates, that is, when it wouldn't be "valid".
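A minimal sketch of that keyword, assuming a small frame with a hypothetical "key" column that contains a repeated label:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "b"], "value": [10, 20, 30]})

# Default behaviour: duplicates in the new index are accepted silently.
lenient = df.set_index("key")

# With verify_integrity=True the same call raises instead.
try:
    df.set_index("key", verify_integrity=True)
    strict_error = None
except ValueError as exc:
    strict_error = exc

print(lenient.index.is_unique, type(strict_error).__name__)
```

The check is opt-in, which is consistent with the point above: pandas tolerates duplicate labels unless you ask it to enforce uniqueness.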

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1