'NaN in cor() metric
I have data from compustat on different economic metrics for US companies as well as lobbying expenditure for these companies. I merge these datasets with left_join() function.
I have created two additional variables: "lobby_revt_pct" (percentage of lobbying expenditure of revenue) and "lobby_gp_pct" (percentage of lobbying expenditure of gross profit). Revenue and gross profit are in millions. I checked these variables manually, and it checks out (i.e. the lobby expenditure of e.g. revenue in % is correct).
I wish to see these variables in a correlation matrix. The other variables work, but the two aforementioned variables return NaN. Why? And is there a fix?
join_df <- left_join(compustat, lobby_expenditure, by = c("gvkey", "year"))
#new variable: lobbying expenditure as a percentage share
#of respective year revenue
join_df$lobby_revt_pct <- (join_df$expend/(join_df$revt*1000000))*100
join_df$lobby_gp_pct <- (join_df$expend/(join_df$gp*1000000))*100
#cor
cor(join_df, use = "pairwise.complete.obs"[sapply(join_df, is.numeric)])
Output:
gvkey year naics at gp
gvkey 1.000000000 0.145972467 -0.016623718 -0.05087810 -0.085729359
year 0.145972467 1.000000000 -0.027012382 0.03681725 0.051685347
naics -0.016623718 -0.027012382 1.000000000 0.04463169 0.004457598
at -0.050878102 0.036817255 0.044631687 1.00000000 0.594233227
gp -0.085729359 0.051685347 0.004457598 0.59423323 1.000000000
lct -0.056661336 0.051043277 -0.019868730 0.88155209 0.771837888
revt -0.084994470 0.048538321 -0.010151030 0.43787682 0.827172011
tie -0.115622243 0.003035108 -0.002584713 0.76935365 0.753452440
txpd 0.002665236 0.006650114 -0.001791597 0.02586675 0.049040643
xdp NA NA NA NA NA
xsga -0.098379359 0.047337986 0.006583750 0.43630634 0.920886086
expend -0.098455101 0.080394812 0.033842575 0.33230920 0.496191500
lobby_revt_pct NaN NaN NaN NaN NaN
lobby_gp_pct NaN NaN NaN NaN NaN
lct revt tie txpd xdp xsga
gvkey -0.05666134 -0.08499447 -0.115622243 0.002665236 NA -0.09837936
year 0.05104328 0.04853832 0.003035108 0.006650114 NA 0.04733799
naics -0.01986873 -0.01015103 -0.002584713 -0.001791597 NA 0.00658375
at 0.88155209 0.43787682 0.769353646 0.025866750 NA 0.43630634
gp 0.77183789 0.82717201 0.753452440 0.049040643 NA 0.92088609
lct 1.00000000 0.81128522 NA 0.053310015 NA 0.75583507
revt 0.81128522 1.00000000 0.891790266 0.057496412 NA 0.77383781
tie NA 0.89179027 1.000000000 0.586165935 NA 0.67626876
txpd 0.05331001 0.05749641 0.586165935 1.000000000 NA 0.02382979
xdp NA NA NA NA NA NA
xsga 0.75583507 0.77383781 0.676268756 0.023829788 NA 1.00000000
expend 0.35901711 0.38344403 0.199782257 0.244935543 NA 0.46833965
lobby_revt_pct NaN NaN -0.006623683 NaN NA NaN
lobby_gp_pct NaN NaN -0.010564012 NaN NA NaN
expend lobby_revt_pct lobby_gp_pct
gvkey -0.09845510 NaN NaN
year 0.08039481 NaN NaN
naics 0.03384257 NaN NaN
at 0.33230920 NaN NaN
gp 0.49619150 NaN NaN
lct 0.35901711 NaN NaN
revt 0.38344403 NaN NaN
tie 0.19978226 -0.006623683 -0.01056401
txpd 0.24493554 NaN NaN
xdp NA NA NA
xsga 0.46833965 NaN NaN
expend 1.00000000 NaN NaN
lobby_revt_pct NaN NaN NaN
lobby_gp_pct NaN NaN NaN
Warning message:
In if (is.na(na.method)) stop("invalid 'use' argument") :
the condition has length > 1 and only the first element will be used
edit: @quinten : I used dput(join_df). Not sure if I did it correctly? I removed the first part due to space limitation. Most of it is NA anyway.
, row.names = c(NA, -214620L
), spec = structure(list(cols = list(gvkey = structure(list(), class = c("collector_number",
"collector")), year = structure(list(), class = c("collector_number",
"collector")), naics = structure(list(), class = c("collector_number",
"collector")), at = structure(list(), class = c("collector_number",
"collector")), gp = structure(list(), class = c("collector_number",
"collector")), lct = structure(list(), class = c("collector_number",
"collector")), revt = structure(list(), class = c("collector_number",
"collector")), tie = structure(list(), class = c("collector_number",
"collector")), txpd = structure(list(), class = c("collector_number",
"collector")), xdp = structure(list(), class = c("collector_number",
"collector")), xsga = structure(list(), class = c("collector_number",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x559516ae34c0>, class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
