'Combine datasets
I'm supposed to combine all four datasets from https://www.drroyspencer.com/latest-global-temperatures/ into one dataframe. I've tried to merge it without luck. Any help?
What I've tried so far.
# Lower troposphere data.
lower_trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", sep = " ")
# Mid-Troposphere data.
mid_trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tmt/uahncdc_mt_6.0.txt", sep = " ")
# Tropopause data.
trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/ttp/uahncdc_tp_6.0.txt", sep = " ")
# Lower Stratosphere data.
lower_strat <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tls/uahncdc_ls_6.0.txt" , sep = " ")
#
lower_trop1 <- lower_trop[-c(518)]
mid_trop1 <- Mid_trop[-c(518)]
trop1 <- trop[-c(518)]
lower_strat1 <- lower_strat[-c(518)]
#
lower_trop <- as.data.frame(apply(lower_trop1, 2, as.numeric))
mid_trop <- as.data.frame(apply(mid_trop1, 2, as.numeric))
trop_paus <- as.data.frame((apply(trop1, 2, as.numeric)))
lower_strat <- as.data.frame(apply(lower_strat1, 2, as.numeric))
all<- rbind(lower_trop,mid_trop,trop_paus,lower_strat)
Solution 1:[1]
Don't ignore warnings. Your
freadcalls are warninglower_trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", sep = " ") # [100%] Downloaded 89265 bytes... # Warning in fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", : # Stopped early on line 520. Expected 29 fields but found 0. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<Trend 0.14 0.18 0.12 0.16 0.19 0.14 0.11 0.16 0.10 0.12 0.16 0.11 0.18 0.21 0.17 0.10 0.15 0.09 0.25 0.23 0.27 0.01 0.09 -0.02 0.18 0.18 0.18>>This is because those files have trailing material that are not "proper CSV" (as far as R cares, that is):
Year Mo Globe Land Ocean NH Land Ocean SH Land Ocean Trpcs Land Ocean NoExt Land Ocean SoExt Land Ocean NoPol Land Ocean SoPol Land Ocean USA48 USA49 AUST 1978 12 -0.48 -0.51 -0.47 -0.44 -0.46 -0.42 -0.52 -0.62 -0.50 -0.60 -0.62 -0.59 -0.37 -0.44 -0.30 -0.46 -0.55 -0.45 -0.39 -0.68 -0.06 -0.45 -0.38 -0.49 -1.29 -1.15 -1.29 1979 1 -0.47 -0.64 -0.41 -0.64 -0.86 -0.50 -0.31 -0.13 -0.34 -0.47 -0.54 -0.45 -0.73 -0.93 -0.55 -0.23 0.12 -0.29 -0.46 -0.95 0.10 -0.16 -0.15 -0.16 -3.22 -2.42 0.92 ...snip... 2021 11 0.08 0.22 0.03 0.11 0.32 -0.02 0.06 -0.00 0.07 0.14 0.35 0.08 0.09 0.31 -0.10 0.02 -0.24 0.06 -0.43 -0.19 -0.70 -0.72 -0.67 -0.75 0.50 -0.06 -0.29 2021 12 0.21 0.17 0.22 0.27 0.19 0.32 0.15 0.12 0.15 0.03 0.30 -0.04 0.40 0.19 0.59 0.20 -0.07 0.25 0.01 -0.25 0.31 -0.42 -0.24 -0.51 1.63 1.42 -0.06 Year Mo Globe Land Ocean NH Land Ocean SH Land Ocean Trpcs Land Ocean NoExt Land Ocean SoExt Land Ocean NoPol Land Ocean SoPol Land Ocean USA48 USA49 AUST Trend 0.14 0.18 0.12 0.16 0.19 0.14 0.11 0.16 0.10 0.12 0.16 0.11 0.18 0.21 0.17 0.10 0.15 0.09 0.25 0.23 0.27 0.01 0.09 -0.02 0.18 0.18 0.18 NOTE: Version 6.0 as of April 2015 NOTE: New Reference for annual cycle 1991-2020 NOTE: Version 5.6 as of Jun 2013 NOTE: 3/13 More accurate grid weighting for USA48 NOTE: 3/13 Alaska added in USA49, Australia added GL 90S-90N, NH 0-90N, SH 90S-0, TRPCS 20S-20N NoExt 20N-90N, SoExt 90S-20S, NoPol 60N-90N, SoPol 90S-60SThere is no easy way to just "ignore" those lines, so I suggest you be very clear about it: state that that warning is expected and not a concern here. Your future self (and colleagues) will thank you. You can use this to silence just that warning (and allow
freadto continue normal operations):lower_trop <- withCallingHandlers( fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", sep = " "), warning = function(w) { if (grepl("Stopped early on line", as.character(w))) invokeRestart("muffleWarning") })R dislikes non-unique column names. I don't know anything about the data, but it appears that
LandandOceanare grouped by the preceding columnsGlobe,NH,SH, etc. Let's rename them to ensure they are unique and unambiguous (not just numbers):lands <- which(names(lower_trop1) == "Land") names(lower_trop1)[lands] <- paste0(names(lower_trop1)[lands-1], "_", names(lower_trop1)[lands]) names(lower_trop1)[lands+1] <- paste0(names(lower_trop1)[lands-1], "_", names(lower_trop1)[lands+1]) head(lower_trop1) # Year Mo Globe Globe_Land Globe_Ocean NH NH_Land NH_Ocean SH SH_Land SH_Ocean Trpcs Trpcs_Land # <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> # 1: 1978 12 -0.48 -0.51 -0.47 -0.44 -0.46 -0.42 -0.52 -0.62 -0.50 -0.60 -0.62 # 2: 1979 1 -0.47 -0.64 -0.41 -0.64 -0.86 -0.50 -0.31 -0.13 -0.34 -0.47 -0.54 # 3: 1979 2 -0.43 -0.56 -0.39 -0.47 -0.57 -0.41 -0.39 -0.53 -0.37 -0.36 -0.25 # 4: 1979 3 -0.38 -0.51 -0.33 -0.46 -0.51 -0.44 -0.30 -0.53 -0.26 -0.36 -0.43 # 5: 1979 4 -0.40 -0.57 -0.34 -0.47 -0.62 -0.37 -0.34 -0.46 -0.31 -0.35 -0.37 # 6: 1979 5 -0.40 -0.56 -0.33 -0.52 -0.54 -0.52 -0.27 -0.62 -0.19 -0.46 -0.55 # 16 variables not shown: [Trpcs_Ocean <char>, NoExt <char>, NoExt_Land <char>, NoExt_Ocean <char>, SoExt <char>, SoExt_Land <char>, SoExt_Ocean <char>, NoPol <char>, NoPol_Land <char>, NoPol_Ocean <char>, ...](Almost) Never use
applyon adata.frame. Okay, almost never, there are legitimate times to do it. However, by default,applyis going to return amatrixwhen you really need it to remain adata.frame(or at leastlist-like). Uselapplyinstead:lower_trop1[] <- lapply(lower_trop1, as.numeric)Two tricks to know about this and how it works with frames:
A
data.frameis just a glorifiedlistwith some added conditions: all elements should be named uniquely; all elements (typically) must be the same length; it may have row-names, some tools use/honor/keep them, some tools (dplyr) ignore or actively remove them. Oh, and theclassis slightly different. But most things that work onlists work just as well ondata.frames, column-wise.Even though we're working on columns of a frame, in one sense we're just working on elements of a
list, and we want it to staylist-like, so we should be usinglapply.Unfortunately,
lapplyis going to return alistnot adata.frame, so the reassignment intolower_trop1[] <- ...says to replace the contents oflower_trop1without changing its class. There is likely more that can be said here, but know thatframe[] <- lapply(frame, ...)preserves thedata.frameproperties we need while allowing very simplelapply-operations.
From here, rbind should work as intended.
FYI, you can combine your middle two steps with:
lower_trop1 <- lower_trop[-518,][, lapply(.SD, as.numeric)]
though you still need to do the renaming for unique columns names.
Solution 2:[2]
I guess you want some sort of join (so joining the columns from the 4 datasets). Here some information on the different join options in R: https://dplyr.tidyverse.org/reference/mutate-joins.html
Now in your case, you have the Year and the Mo columns that appear in your different datasets. So you could join the datasets by these columns. Here an example of a left join (just for the first two dataframes):
First, make sure that you have unique column names in each single dataframe (you do currently not have unique names, so this step is necessary):
names(lower_trop) <- make.unique(names(lower_trop), sep="_")
names(mid_trop) <- make.unique(names(lower_trop), sep="_")
Then you join these with a left join (so you keep all the rows from the left dataset):
new_df <- left_join(lower_trop, mid_trop, by = c("Year", "Mo"), suffix = c(".lower_trop", ".mid_trop"))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Ane |
