How can I take all the values (Id, Date and Calories) from ONLY the first and last date of a date range, in date formatting?
First off, Stack Overflow keeps saying there are answers already, but I've been looking for 2.5 hours now and haven't found one that fits.
I'm attempting to view values from a dataframe with 940 rows. I would like to view the calories associated with the user IDs from the first and last dates of the trial.
Id ActivityDay Calories
1 1503960366 2016-04-12 1985
2 1624580081 2016-04-12 1432
3 1644430081 2016-04-12 3199
4 1844505072 2016-04-12 2030
5 1927972279 2016-04-12 2220
6 2022484408 2016-04-12 2390
7 2026352035 2016-04-12 1459
8 2320127002 2016-04-12 2124
9 2347167796 2016-04-12 2344
10 2873212765 2016-04-12 1982
11 3372868164 2016-04-12 1788
12 3977333714 2016-04-12 1450
13 4020332650 2016-04-12 3654
14 4057192912 2016-04-12 2286
15 4319703577 2016-04-12 2115
16 4388161847 2016-04-12 2955
17 4445114986 2016-04-12 2113
18 4558609924 2016-04-12 1909
19 4702921684 2016-04-12 2947
20 5553957443 2016-04-12 2026
21 5577150313 2016-04-12 3405
22 6117666160 2016-04-12 1496
23 6290855005 2016-04-12 2560
24 6775888955 2016-04-12 1841
25 6962181067 2016-04-12 1994
26 7007744171 2016-04-12 2937
27 7086361926 2016-04-12 2772
28 8053475328 2016-04-12 3186
29 8253242879 2016-04-12 2044
30 8378563200 2016-04-12 3635
31 8583815059 2016-04-12 2650
32 8792009665 2016-04-12 2044
33 8877689391 2016-04-12 3921
34 1503960366 2016-04-13 1797
35 1624580081 2016-04-13 1411
36 1644430081 2016-04-13 2902
37 1844505072 2016-04-13 1860
38 1927972279 2016-04-13 2151
39 2022484408 2016-04-13 2601
40 2026352035 2016-04-13 1521
41 2320127002 2016-04-13 2003
42 2347167796 2016-04-13 2038
43 2873212765 2016-04-13 2004
44 3372868164 2016-04-13 2093
45 3977333714 2016-04-13 1495
46 4020332650 2016-04-13 1981
47 4057192912 2016-04-13 2306
48 4319703577 2016-04-13 2135
49 4388161847 2016-04-13 3092
50 4445114986 2016-04-13 2095
51 4558609924 2016-04-13 1722
52 4702921684 2016-04-13 2898
This is the sample data, omitting the other nearly 900 rows. I want to keep only the dates 2016-04-12 AND 2016-05-12, the first and last days of the range over which the data was collected. I'd like to see the IDs of the users and their calories from those 2 dates only.
I've tried about 50 different code snippets... here is where I'm at right now:
Daily_Calories %>%
  group_by(Id, Calories) %>%
  arrange(ActivityDay) %>%
  as.data.frame()
I have not saved all the codes I've tried, as I'm new and RStudio gets messy and unorganized quickly...and then I get a bit lost.
I've also tried:
Daily_Calories %>%
  group_by(Id, Calories) %>%
  group_by(min(ActivityDay), max(ActivityDay)) %>%
  arrange(ActivityDay) %>%
  as.data.frame()
and got this:
Id ActivityDay Calories min(ActivityDay) max(ActivityDay)
1 1503960366 2016-04-12 1985 2016-04-12 2016-05-12
2 1624580081 2016-04-12 1432 2016-04-12 2016-05-12
3 1644430081 2016-04-12 3199 2016-04-12 2016-05-12
4 1844505072 2016-04-12 2030 2016-04-12 2016-05-12
5 1927972279 2016-04-12 2220 2016-04-12 2016-05-12
6 2022484408 2016-04-12 2390 2016-04-12 2016-05-12
7 2026352035 2016-04-12 1459 2016-04-12 2016-05-12
8 2320127002 2016-04-12 2124 2016-04-12 2016-05-12
9 2347167796 2016-04-12 2344 2016-04-12 2016-05-12
10 2873212765 2016-04-12 1982 2016-04-12 2016-05-12
11 3372868164 2016-04-12 1788 2016-04-12 2016-05-12
12 3977333714 2016-04-12 1450 2016-04-12 2016-05-12
and then tried this:
Daily_Calories %>%
  group_by(Id, Calories) %>%
  arrange(ActivityDay) %>%
  summarise(min(ActivityDay), max(ActivityDay)) %>%
  as.data.frame()
and got this:
Id Calories min(ActivityDay) max(ActivityDay)
1 1503960366 0 2016-05-12 2016-05-12
2 1503960366 1728 2016-04-17 2016-04-17
3 1503960366 1740 2016-05-08 2016-05-08
4 1503960366 1745 2016-04-15 2016-04-15
5 1503960366 1775 2016-04-21 2016-04-21
6 1503960366 1776 2016-04-14 2016-04-14
7 1503960366 1783 2016-05-11 2016-05-11
8 1503960366 1786 2016-04-20 2016-04-20
9 1503960366 1788 2016-04-24 2016-04-24
I'm not looking for the minimum and maximum calories, just the "minimum" and "maximum" dates, meaning 2016-04-12 and 2016-05-12. All three of the attempts above printed with 700+ rows omitted from the results, which tells me they are wrong: there are 33 users and 2 dates, so there should be 66 rows of results.
I hope this is explained well enough, I'm trying to be better with my questions. I appreciate the time and help.
Almost forgot, I wasn't wanting to create a new dataframe, just see the results. That's why my code starts with just the dataframe. Does it make a difference? I'd prefer the results in the console for viewing. Cheers!
Solution 1:[1]
If I understand you correctly, you want to keep only the observations in the data frame where ActivityDay is either 2016-04-12 or 2016-05-12. If you instead want every row in the range between those two dates, the second subset below covers that case.
Try:
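# df is your data frame (Daily_Calories in the question)
keeps <- as.Date(c("2016-04-12", "2016-05-12"))
# Keep only the rows that fall on those two dates
df[as.Date(df$ActivityDay) %in% keeps, ]
# Keep every row in the range between them (inclusive)
df[as.Date(df$ActivityDay) %in% seq(min(keeps), max(keeps), by = 1), ]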
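This will show the values for the dates that you want. Because the result is not assigned to anything, it prints to the console rather than creating a new data frame.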
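Since the question uses dplyr pipes, the same idea can probably be written with filter(). This is only a minimal sketch, not a tested solution for the full 940-row data: the small Daily_Calories data frame below is a hypothetical stand-in with three users, the 2016-05-12 calorie values are placeholders, and it assumes ActivityDay is stored as a Date. It takes the first and last dates from the data with range() instead of hard-coding them.

```r
library(dplyr)

# Hypothetical stand-in for Daily_Calories: three users, three days.
# The 2016-05-12 calorie values are placeholders for illustration only.
Daily_Calories <- data.frame(
  Id = rep(c(1503960366, 1624580081, 1644430081), times = 3),
  ActivityDay = as.Date(rep(c("2016-04-12", "2016-04-13", "2016-05-12"), each = 3)),
  Calories = c(1985, 1432, 3199, 1797, 1411, 2902, 1800, 1500, 3000)
)

# range() returns the earliest and latest date in the whole column.
# No group_by() is used, because the first and last dates are global here.
first_last <- Daily_Calories %>%
  filter(ActivityDay %in% range(ActivityDay)) %>%
  arrange(ActivityDay, Id)

print(first_last)
```

Dropping the `first_last <-` assignment prints the result straight to the console without storing anything. Grouping by Id and Calories, as in the attempts in the question, is not needed for this; if what you want is each user's own first and last recorded day, adding group_by(Id) before filter() should do that instead.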
I was unclear as to what your final data would look like - if I misunderstood, let me know and I will modify my answer. Good luck!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
