Table of "mean of combined columns" and join with original table in R

I have a file (tekoopl) with 300,000 rows and 16 columns. Column 2 is PRICE and column 13 is DISTRICT. The DISTRICT column holds different districts in London, for example 2000 rows with district Westend, 4000 rows with district London, and so on. The rows are interchangeable. I am stuck on the following steps:

a. Generate a table with the average PRICE per DISTRICT.
b. Join this with the original table (tekoopl), so that the final table is not aggregated but has the same number of rows as the original table.
c. Create a new column (PRICE-DISTRICTPRICE) with the difference between the PRICE and the DISTRICT average.

So far I only manage to calculate the overall mean of the PRICE column. What I need is the mean PRICE over all rows of district a, all rows of district b, and so on, and the number of rows in the new table (dfl07) must equal the number of rows in tekoopl, because I have to join them afterwards. That is the part I cannot get to work. :-( Can someone help me?

library(dplyr)

tekoopl <- read.csv("datafiles/ppd_london_15161718.csv",
                    stringsAsFactors = FALSE)
str(tekoopl)
dfl07 <- select(tekoopl, PRICE, DISTRICT)
# This only gives the overall mean of PRICE, not the mean per DISTRICT
GEM <- round(mean(tekoopl$PRICE, na.rm = TRUE))

(I am currently using the packages dplyr and tidyr)


Solution 1:[1]

To generate a separate table with the average price per district:

dfl07 %>% dplyr::group_by(DISTRICT) %>% summarise(mean_price = mean(PRICE, na.rm = TRUE))

If you want it in your original table (that is, a column with the average price per district, which I think is what you want to achieve), then:

dfl07 %>% dplyr::group_by(DISTRICT) %>% mutate(mean_price = mean(PRICE, na.rm = TRUE))
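
The difference between the two calls is easiest to see on a small example. The sketch below uses a made-up data frame called toy (its name and values are invented for illustration, not taken from the real file) and assumes dplyr is installed. summarise() should collapse the data to one row per district, while mutate() should keep every row and repeat the district mean on each of them.

```r
library(dplyr)

# Hypothetical toy data standing in for tekoopl (values invented for illustration)
toy <- data.frame(
  PRICE    = c(100, 200, 300, 500),
  DISTRICT = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# summarise() collapses each group to a single row: one mean per district
per_district <- toy %>%
  group_by(DISTRICT) %>%
  summarise(mean_price = mean(PRICE, na.rm = TRUE))

# mutate() keeps all original rows and repeats the group mean on each row
per_row <- toy %>%
  group_by(DISTRICT) %>%
  mutate(mean_price = mean(PRICE, na.rm = TRUE)) %>%
  ungroup()  # drop the grouping so later steps are not grouped by accident

nrow(per_district)  # number of districts
nrow(per_row)       # same as nrow(toy)
```

The trailing ungroup() is optional here; it is included only so that later operations on the result do not silently run per district.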

To create the mean price per district and the difference between each price and its district mean (using the original tekoopl data frame):

tekoopl %>% dplyr::group_by(DISTRICT) %>% mutate(mean_price = mean(PRICE, na.rm = TRUE), price_diff = PRICE - mean_price)
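
If you prefer to follow steps a to c from the question literally (aggregate first, then join, then subtract), something along these lines should give the same result. This is only a sketch on made-up data: the data frame toy and the column names DISTRICT_PRICE and PRICE_DIFF are illustrative choices, not names from the original file.

```r
library(dplyr)

# Hypothetical toy data standing in for tekoopl, including one missing price
toy <- data.frame(
  PRICE    = c(100, 200, 300, 500, NA),
  DISTRICT = c("A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Step a: one row per district with the average price (NA prices ignored)
district_means <- toy %>%
  group_by(DISTRICT) %>%
  summarise(DISTRICT_PRICE = mean(PRICE, na.rm = TRUE))

# Steps b and c: join the averages back onto every original row,
# then compute the difference between each price and its district average
joined <- toy %>%
  left_join(district_means, by = "DISTRICT") %>%
  mutate(PRICE_DIFF = PRICE - DISTRICT_PRICE)

nrow(joined) == nrow(toy)  # the join keeps the original number of rows
```

Because district_means has exactly one row per DISTRICT, the left_join() cannot duplicate rows, so the result has as many rows as the input. Rows with a missing PRICE still receive the district average but get NA as their difference.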

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1