'How can I subclass a Pandas DataFrame?

Subclassing Pandas classes seems a common need, but I could not find references on the subject. (It seems that Pandas developers are still working on it: Easier subclassing #60.)

There are some SO questions on the subject, but I am hoping that someone here can provide a more systematic account on the current best way to subclass pandas.DataFrame that satisfies two general requirements:

  1. calling standard DataFrame methods on instances of MyDF should produce instances of MyDF
  2. calling standard DataFrame methods on instances of MyDF should leave all attributes still attached to the output

(And are there any significant differences for subclassing pandas.Series?)

Code for subclassing pd.DataFrame:

import numpy as np
import pandas as pd

class MyDF(pd.DataFrame):
    # how to subclass pandas DataFrame?
    pass

mydf = MyDF(np.random.randn(3,4), columns=['A','B','C','D'])
print(type(mydf))  # <class '__main__.MyDF'>

# Requirement 1: Instances of MyDF, when calling standard methods of DataFrame,
# should produce instances of MyDF.
mydf_sub = mydf[['A','C']]
print(type(mydf_sub))  # <class 'pandas.core.frame.DataFrame'>

# Requirement 2: Attributes attached to instances of MyDF, when calling standard
# methods of DataFrame, should still attach to the output.
mydf.myattr = 1
mydf_cp1 = MyDF(mydf)
mydf_cp2 = mydf.copy()
print(hasattr(mydf_cp1, 'myattr'))  # False
print(hasattr(mydf_cp2, 'myattr'))  # False


Solution 1:[1]

For Requirement 1, just define _constructor:

import pandas as pd
import numpy as np

class MyDF(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDF


mydf = MyDF(np.random.randn(3,4), columns=['A','B','C','D'])
print type(mydf)

mydf_sub = mydf[['A','C']]
print type(mydf_sub)

I think there is no simple solution for Requirement 2. I think you need define __init__, copy, or do something in _constructor, for example:

import pandas as pd
import numpy as np

class MyDF(pd.DataFrame):
    _attributes_ = "myattr1,myattr2"

    def __init__(self, *args, **kw):
        super(MyDF, self).__init__(*args, **kw)
        if len(args) == 1 and isinstance(args[0], MyDF):
            args[0]._copy_attrs(self)

    def _copy_attrs(self, df):
        for attr in self._attributes_.split(","):
            df.__dict__[attr] = getattr(self, attr, None)

    @property
    def _constructor(self):
        def f(*args, **kw):
            df = MyDF(*args, **kw)
            self._copy_attrs(df)
            return df
        return f

mydf = MyDF(np.random.randn(3,4), columns=['A','B','C','D'])
print type(mydf)

mydf_sub = mydf[['A','C']]
print type(mydf_sub)

mydf.myattr1 = 1
mydf_cp1 = MyDF(mydf)
mydf_cp2 = mydf.copy()
print mydf_cp1.myattr1, mydf_cp2.myattr1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Peter Mortensen