Identify and count spells (Distinctive events within each group) The Next CEO of Stack...
Is there a rule of thumb for determining the amount one should accept for a settlement offer?
How can the PCs determine if an item is a phylactery?
Can this transistor (2N2222) take 6 V on emitter-base? Am I reading the datasheet incorrectly?
Arrows in tikz Markov chain diagram overlap
Gauss' Posthumous Publications?
Is the offspring between a demon and a celestial possible? If so what is it called and is it in a book somewhere?
Compensation for working overtime on Saturdays
Is it possible to create a QR code using text?
How dangerous is XSS
What steps are necessary to read a Modern SSD in Medieval Europe?
Find a path from s to t using as few red nodes as possible
Is the 21st century's idea of "freedom of speech" based on precedent?
Is a distribution that is normal, but highly skewed, considered Gaussian?
Direct Implications Between USA and UK in Event of No-Deal Brexit
How badly should I try to prevent a user from XSSing themselves?
Car headlights in a world without electricity
Gödel's incompleteness theorems - what are the religious implications?
What does this strange code stamp on my passport mean?
Is this a new Fibonacci Identity?
Is it okay to majorly distort historical facts while writing a fiction story?
What is the difference between 'contrib' and 'non-free' packages repositories?
My boss doesn't want me to have a side project
Is it OK to decorate a log book cover?
Early programmable calculators with RS-232
Identify and count spells (Distinctive events within each group)
The Next CEO of Stack OverflowR - list to data frameCount number of rows within each groupCounting unique / distinct values by group in a data frameR: find relative weight within each group and within the entire dataframeR: how to calculate summary for each group and all the data?count the number of distinct variables in a groupusing tidyverse; counting after and before change in value, within groups, generating new variables for each unique shiftDistinct in r within groups of datahow to get count and distinct count with group by in dataframe RNest a dataframe by group, but include extra rows within each groupChange value by group based in reference within group
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
add a comment |
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
4
For someone who is not familiar with how thespell
is computed, can you share a formula or description?
– nsinghs
9 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
8 hours ago
add a comment |
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
r dataframe dplyr time-series tidyverse
edited 4 hours ago
Thomas Speidel
asked 9 hours ago
Thomas SpeidelThomas Speidel
364216
364216
4
For someone who is not familiar with how thespell
is computed, can you share a formula or description?
– nsinghs
9 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
8 hours ago
add a comment |
4
For someone who is not familiar with how thespell
is computed, can you share a formula or description?
– nsinghs
9 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
8 hours ago
4
4
For someone who is not familiar with how the
spell
is computed, can you share a formula or description?– nsinghs
9 hours ago
For someone who is not familiar with how the
spell
is computed, can you share a formula or description?– nsinghs
9 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
8 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
8 hours ago
add a comment |
6 Answers
6
active
oldest
votes
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
7 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
7 hours ago
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
A somehow different possibility (not involving cumsum()
) could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
Here it, first, groups by "group" and then gets the run-length-ID of "is.5". Second, it groups by "group" and "is.5" and ranks the values on the run-length-ID. Finally, it assigns 0 to rows where "is.5" == 0.
add a comment |
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55463310%2fidentify-and-count-spells-distinctive-events-within-each-group%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
7 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
7 hours ago
add a comment |
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
7 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
7 hours ago
add a comment |
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
edited 7 hours ago
answered 9 hours ago
markusmarkus
15k11336
15k11336
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
7 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
7 hours ago
add a comment |
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
7 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
7 hours ago
1
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
7 hours ago
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
7 hours ago
1
1
@M-M Added some explanation. Thanks for the comment.
– markus
7 hours ago
@M-M Added some explanation. Thanks for the comment.
– markus
7 hours ago
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
answered 9 hours ago
MrFlickMrFlick
124k11141173
124k11141173
add a comment |
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
edited 9 hours ago
answered 9 hours ago
Hector HaffendenHector Haffenden
579216
579216
add a comment |
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
answered 7 hours ago
M-MM-M
7,17962146
7,17962146
add a comment |
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
edited 3 hours ago
answered 3 hours ago
akrunakrun
418k13207282
418k13207282
add a comment |
add a comment |
A somehow different possibility (not involving cumsum()
) could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
Here it, first, groups by "group" and then gets the run-length-ID of "is.5". Second, it groups by "group" and "is.5" and ranks the values on the run-length-ID. Finally, it assigns 0 to rows where "is.5" == 0.
add a comment |
A somehow different possibility (not involving cumsum()
) could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
Here it, first, groups by "group" and then gets the run-length-ID of "is.5". Second, it groups by "group" and "is.5" and ranks the values on the run-length-ID. Finally, it assigns 0 to rows where "is.5" == 0.
add a comment |
A somehow different possibility (not involving cumsum()
) could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
Here it, first, groups by "group" and then gets the run-length-ID of "is.5". Second, it groups by "group" and "is.5" and ranks the values on the run-length-ID. Finally, it assigns 0 to rows where "is.5" == 0.
A somehow different possibility (not involving cumsum()
) could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
Here it, first, groups by "group" and then gets the run-length-ID of "is.5". Second, it groups by "group" and "is.5" and ranks the values on the run-length-ID. Finally, it assigns 0 to rows where "is.5" == 0.
edited 11 mins ago
answered 8 hours ago
tmfmnktmfmnk
3,6211516
3,6211516
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55463310%2fidentify-and-count-spells-distinctive-events-within-each-group%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
4
For someone who is not familiar with how the
spell
is computed, can you share a formula or description?– nsinghs
9 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
8 hours ago