Best way to populate class properties with external data

What is the most Pythonic (and hopefully most efficient) way of populating an object's properties with data coming from other objects, say pandas dataframes? Currently I'm passing the data as arguments to the constructor:

class Item:
  def __init__(self, row, mapping):
    self._foo = row["foo"]
    self._bar = self.mapping_method(row["bar"], mapping)
  
  def mapping_method(self, value, mapping):
    return mapping.loc[value, "some_column"]

for index, row in df.iterrows():
  i = Item(row, mapping)

But I have the feeling passing a whole dataframe around and around isn't the best way to do it. Can this be improved somehow?



Solution 1:[1]

You may want to give your class two public constructors. The basic way of constructing the class would be to call it directly with data items you intend to store. Then you'd add a classmethod that would translate data from a specific kind of source (like a row of a dataframe) and build an instance using that data.

This design would allow you to use the same class even if you later need to process data that comes to you in a completely different format (like a database entry, or a json response from a web query). You'd just add a new classmethod to interpret the new data format and use the result to create your objects. The core of the class doesn't care what form the data came in with, it just needs to use that data.

For your example class, I might do:

class Item:
  def __init__(self, foo, bar):
    self._foo = foo
    self._bar = bar

  @classmethod
  def from_df_row(cls, row, mapping):
    return cls(row["foo"], mapping.loc[row["bar"], "some_column"])

for index, row in df.iterrows():
  i = Item.from_df_row(row, mapping)

I chose to fold the mapping_method into the from_df_row classmethod, but it may make sense to keep it as part of __init__ if it's something fundamental to the meaning of _foo and _bar and the class (rather than something specific to the data source you're currently using).
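
To illustrate how a second data source would slot in, here is a minimal sketch of an extra alternative constructor. The from_json name, the JSON payload with "foo" and "bar" keys, and the use of a plain dict as the mapping are all assumptions made for illustration; none of them come from the question.

```python
import json


class Item:
    def __init__(self, foo, bar):
        # The core constructor only stores ready-to-use values.
        self._foo = foo
        self._bar = bar

    @classmethod
    def from_json(cls, text, mapping):
        # Hypothetical second source: a JSON object with "foo" and "bar" keys.
        # `mapping` is assumed to be a plain dict here, not a dataframe.
        data = json.loads(text)
        return cls(data["foo"], mapping[data["bar"]])


item = Item.from_json('{"foo": 1, "bar": "a"}', {"a": "alpha"})
print(item._foo, item._bar)
```

Note that __init__ is untouched: only the translation step knows that the data arrived as JSON.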

Solution 2:[2]

The basic question is where the list of properties to use comes from.

Are the properties preset by the object, or should the object take all columns of the dataframe row as properties?

In the second case, something like this:

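class Item:
  # Row columns whose values are keys into `mapping`, paired with the mapping column to read.
  MAPPED_COLUMNS = {'bar': 'some_column'}

  def __init__(self, row, mapping):
    self.map_to_properties(row, mapping)

  def map_to_properties(self, row, mapping):
    for col in row.index:
      if col in Item.MAPPED_COLUMNS:
        # Look up the row's value (not the column name) in the mapping.
        setattr(self, col, mapping.loc[row[col], Item.MAPPED_COLUMNS[col]])
      else:
        setattr(self, col, row[col])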
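
A self-contained run of this approach on two tiny, made-up frames might look like the sketch below. The sample data is invented for illustration, and the sketch assumes the intent is to look up each row's value (row[col]) in the mapping, using "some_column" as in the question.

```python
import pandas as pd


class Item:
    # Row columns whose values are keys into `mapping`, and the mapping column to read.
    MAPPED_COLUMNS = {"bar": "some_column"}

    def __init__(self, row, mapping):
        for col in row.index:
            if col in self.MAPPED_COLUMNS:
                # Translate the row's value through the mapping frame.
                value = mapping.loc[row[col], self.MAPPED_COLUMNS[col]]
            else:
                # Copy every other column over unchanged.
                value = row[col]
            setattr(self, col, value)


# Made-up sample data, purely for illustration.
df = pd.DataFrame({"foo": [1, 2], "bar": ["a", "b"]})
mapping = pd.DataFrame({"some_column": ["alpha", "beta"]}, index=["a", "b"])

items = [Item(row, mapping) for _, row in df.iterrows()]
print(items[0].foo, items[0].bar)
```

Every column of the row becomes an attribute, so adding a column to the dataframe adds an attribute to each object without touching the class.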

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Blckknght
Solution 2