Issue converting object to int in Python for OLS regression
I'm trying to run a multiple linear regression in Python. One of my columns "member_total" is an object and I can't figure out how to convert it into an int. Right now, when I run the OLS model, this variable is interpreted as being categorical and thus I receive tons of coefficients for it.
I suspect the issue is because "member_total" is an object, but I can't figure out how to convert it.
I've tried:
member_total = int(sub.member_total)
and get this error:
TypeError: cannot convert the series to <class 'int'>
I've also tried:
sub = sub.astype(int)
and get this error:
ValueError: invalid literal for int() with base 10: '27,908'
Solution 1:[1]
I realized you were just using the replace method incorrectly. When you want to modify a column, you have to identify the DataFrame that it's in as well. Yours appears to be called sub, with the column in question being member_total. So, the correct way to use replace would be:
sub['member_total'] = sub['member_total'].replace(',', '', regex=True)
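or, in place (newer pandas versions with Copy-on-Write may not propagate this change back to sub, so the assignment form is the safer of the two):
sub['member_total'].replace(',', '', regex=True, inplace=True)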
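One detail worth checking: Series.replace matches whole cell values unless regex=True is passed, whereas Series.str.replace works on substrings. Here is a minimal sketch with made-up values (the sample data is an assumption, not taken from the question):

```python
import pandas as pd

# Hypothetical sample values standing in for the member_total column
s = pd.Series(['27,908', '1,204', '315'])

# Whole-value matching: no cell is exactly ',', so nothing changes
whole = s.replace(',', '')

# Substring matching, either with regex=True or through the .str accessor
via_regex = s.replace(',', '', regex=True)
via_str = s.str.replace(',', '', regex=False)

# Once the commas are gone, the strings convert cleanly
as_int = via_str.astype(int)
```

If the column still contains commas after a plain replace, this whole-value behaviour is the likely reason.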
To make everything you're trying to do one line:
sub['member_total'] = sub['member_total'].replace(',', '', regex=True).astype(int)
If this STILL fails, a more robust method would be:
sub['member_total'] = sub['member_total'].replace(r'\D', '', regex=True).astype(int)
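Keep in mind that \D strips every non-digit character, so a minus sign or a decimal point would be removed silently as well. If you would rather see which values fail to parse, one possible alternative is pd.to_numeric with errors='coerce'. This is only a sketch: the DataFrame contents below are invented, and only the names sub and member_total come from the question.

```python
import pandas as pd

# Hypothetical frame; only the names sub and member_total come from the question
sub = pd.DataFrame({'member_total': ['27,908', '1,204', 'n/a', '315']})

# Remove thousands separators only, leaving any other odd characters visible
cleaned = sub['member_total'].str.replace(',', '', regex=False)

# errors='coerce' turns entries that still cannot be parsed into NaN
numeric = pd.to_numeric(cleaned, errors='coerce')

# Inspect the original strings that failed to convert
bad_rows = sub.loc[numeric.isna(), 'member_total']

# Store the numeric column; unparseable entries remain NaN
sub['member_total'] = numeric
```

Once bad_rows is empty (or those rows have been fixed or dropped), .astype(int) can be applied to get an integer column.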
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | BeRT2me |