'Apply formula on a column conditional to another column
import requests
from geopy.geocoders import Nominatim
import streamlit as st
import numpy as np
import pandas as pd
import random, string
print('ville :')
codePostal= str(input())
print('Adresse:')
adresse= str(input())
def randomword(length):
letters = string.ascii_lowercase
return ''.join(random.choice(letters) for i in range(length))
geolocator = Nominatim(user_agent=randomword(10))
location = geolocator.geocode(adresse+' ,France ,'+codePostal)
lat=str(location.latitude)
lon=str(location.longitude)
dist='1000'
url= 'http://api.cquest.org/dvf?lat='+lat+'&lon='+lon+'&dist='+dist
request= requests.get(url)
dataR = request.json()
datacomr1=list(dict.values(dataR))
area=datacomr1[4]
df = pd.DataFrame.from_dict(pd.json_normalize(area), orient='columns')
df.columns = df.columns.str.replace('properties.', '')
df.columns = df.columns.str.replace('.', '_')
df['nature_mutation'] = df['nature_mutation'].str.replace("'",'')
df = df[df['surface_relle_bati'] > 0]
df['surface_terrain'] = df['surface_terrain'].fillna(0)
At this point I want to calculate the price per square meter depending on wether or not the real estate transaction (valeur fonciere) contain a piece of land (surface terrain).
To be sure that my code is good I made a value count of the transactions that appear without land:
print(df.surface_terrain.value_counts())
out:
0.0 4703
1116.0 181
5751.0 90
395.0 75
1333.0 41
...
998.0 1
717.0 1
1855.0 1
502.0 1
453.0 1
Name: surface_terrain, Length: 141, dtype: int64
So I have 4703 transaction with zero land.
Now I apply my formula in two columns :
-one without taking into account the land df['price_m22']
-another taking it into consideration df['price_m2']
df['price_m22'] = df['valeur_fonciere']/df['surface_relle_bati']
df.loc[df['surface_terrain'] > 0, 'price_m2'] = df['valeur_fonciere']/(df['surface_relle_bati']+ np.log(df.surface_terrain))
df.loc[df['surface_terrain'] == 0, 'price_m2'] = df['valeur_fonciere']/df['surface_relle_bati']
df['difference'] = df['price_m22'] - df['price_m2']
So at this point I should have a difference of 0.00
for 4703 rows but when I value_counts() the column difference I get :
0.000000e+00 4775
6.922453e+06 175
3.216753e+05 18
1.992464e+04 7
1.093293e+06 6
...
4.414213e+04 1
6.632227e+04 1
7.221947e+05 1
5.838055e+04 1
3.467473e+01 1
Name: difference, Length: 1235, dtype: int64
Any idea how to fix it?
Solution 1:[1]
There are two ways in which price_m22
ends up being equal to price_m2
:
surface_terrain
is1
: in this casenp.log(df.surface_terrain) == 0
.valeur_fonciere
is0
, in this caseprice_m22 == price_m2 == 0
.
So it is likely a combination of the two scenarios. Check the value_counts
for both, they will give you a number matching the difference you are seeing.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |