How to write a minimally working pyproject.toml file that can install packages?

Pip supports the pyproject.toml file, but so far most practical usage of the new schema requires a third-party tool that auto-generates these files (e.g., Poetry or Flit). Unlike setup.py, which is already human-writeable, pyproject.toml is not (yet).

From setuptools docs,

[build-system]
requires = [
  "setuptools >= 40.9.0",
  "wheel",
]
build-backend = "setuptools.build_meta"

However, this file does not include package dependencies (as outlined in PEP 621). Pip does support installing packages using pyproject.toml, but no PEP specifies how to write package dependencies in pyproject.toml for the official build system, setuptools.

How do I write package dependencies in pyproject.toml?
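For reference, a minimal sketch (not from the original question): since setuptools 61.0, dependencies can be declared directly in a PEP 621 `[project]` table, which setuptools reads natively. The package name and dependency list below are illustrative:

```toml
[build-system]
requires = ["setuptools >= 61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"        # hypothetical project name
version = "0.1.0"
dependencies = [
    "requests >= 2.0",
    "pandas",
]
```

With this file, `pip install .` will build the project and install the listed dependencies.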


Solution 1:[1]

The data = data + 1 assignment creates a new object, which you can see if you look at id(data) after the call. Without that, it's just modifying the same object in place (as in test1 below):

In [11]: def test1(data):
    ...:     print(id(data))
    ...:     data['C'] = data['A'] + data['B']
    ...:     return data['C']

In [12]: data, id(data)
Out[12]:
(   A  B
 0  1  1
 1  2  2
 2  3  3
 3  4  4, 1413911641896)

In [13]: test1(data)
1413911641896
Out[13]:
0    2
1    4
2    6
3    8
Name: C, dtype: int64

In [14]: def test2(data):
    ...:     data = data + 1
    ...:     print(id(data))
    ...:     data['D'] = data['A'] + data['B']
    ...:     return data['D']

In [15]: test2(data)
1413912402128
Out[15]:
0     4
1     6
2     8
3    10
Name: D, dtype: int64

Solution 2:[2]

It seems that in the second case, when you add the line

data = data + 1

you're creating a new instance of the DataFrame and modifying that instead of the original. It's well known that you can't modify a DataFrame inside a function without making a copy of it first, or else you modify the initial df as well.
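A minimal sketch of this point (function names are illustrative): mutating the parameter directly is visible to the caller, while rebinding it with `data = data + 1` makes all further changes local:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

def modifies_caller(data):
    # mutates the very object the caller passed in
    data['C'] = data['A'] + data['B']

def modifies_copy(data):
    # rebinding the name creates a new DataFrame; the caller's df is untouched
    data = data + 1
    data['D'] = data['A'] + data['B']

modifies_caller(df)
modifies_copy(df)
print('C' in df.columns)  # True: the mutation escaped the function
print('D' in df.columns)  # False: the rebound copy stayed local
```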

Solution 3:[3]

I did not know that pandas does not make a duplicate DataFrame when assigning the df to another variable. Instead, it just binds a new name to the original df. I suppose it makes sense that pandas does this by default to save memory, unless we specifically request a separate copy of the original df.

When I first asked the question, I already had this thesis. However, I did not know how to "force" a copy instead of a view. After some further trials with new search keywords, I finally ran into the right documentation here.

Anyway, to cut to the chase, here is my conclusion:

  1. Pandas does not automatically create a localised version of a df when the df is passed to a function as input (like def test1(df):).
  2. If we want a localised df within the function, we must explicitly make a copy via the .copy() method (deep=True is the default).
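To illustrate point 2 with a small sketch (not part of the original answer): a deep copy gets its own data, so edits to the copy never reach the original:

```python
import pandas as pd

original = pd.DataFrame({'A': [1, 2, 3]})
local = original.copy()       # deep=True is the default
local['A'] = local['A'] * 10  # only the copy changes

print(original['A'].tolist())  # [1, 2, 3]
print(local['A'].tolist())     # [10, 20, 30]
```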

A new set of codes that work is as follows:

import pandas as pd

df0 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7]})

def test1(dfSource):
    df1 = dfSource.copy()  # deep copy; df0 stays untouched
    df1['C'] = dfSource['A'] + dfSource['B']
    return df1

print("\ndf0 before test1()\n"," id(df0):",id(df0),"\n",df0)
print("\nreturn from test1(df0):\n",test1(df0))
print("\ndf0 after test1()\n"," id(df0):",id(df0),"\n",df0)

Output:

df0 before test1()
  id(df0): 140102308422224 
    A  B
0  1  4
1  2  5
2  3  6
3  4  7

return from test1(df0):
    A  B   C
0  1  4   5
1  2  5   7
2  3  6   9
3  4  7  11

df0 after test1()
  id(df0): 140102308422224 
    A  B
0  1  4
1  2  5
2  3  6
3  4  7

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Randy
Solution 2: mouad Et-tali
Solution 3: S Suen