Use a value from the previous row in an R data.table calculation

Question

I want to create a new column in a data.table, calculated from the current value of one column and the previous value of another. Is it possible to access previous rows?

For example:

> DT <- data.table(A=1:5, B=1:5*10, C=1:5*100)
> DT
   A  B   C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
> DT[, D := C + BPreviousRow] # What is the correct code here?

The correct result should be:

> DT
   A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540

score 67

r data.table

Author: Corone, 2013-02-04


7 answers

Several people have answered the specific question. See the code below for a general function I use in situations like this, which may be helpful. Instead of only getting the previous row, you can go as many rows into the "past" or the "future" as you want.

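rowShift <- function(x, shiftLen = 1L) {
  # Shifted row indices: negative shiftLen looks back, positive looks ahead
  r <- (1L + shiftLen):(length(x) + shiftLen)
  # Indices before the first row become NA (indices past the end give NA anyway)
  r[r<1] <- NA
  return(x[r])
}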

# Create column D by adding column C and the value from the previous row of column B:
DT[, D := C + rowShift(B,-1)]

# Get the Old Faithful eruption length from two events ago, and three events in the future:
as.data.table(faithful)[1:5,list(eruptLengthCurrent=eruptions,
                                 eruptLengthTwoPrior=rowShift(eruptions,-2), 
                                 eruptLengthThreeFuture=rowShift(eruptions,3))]
##   eruptLengthCurrent eruptLengthTwoPrior eruptLengthThreeFuture
##1:              3.600                  NA                  2.283
##2:              1.800                  NA                  4.533
##3:              3.333               3.600                     NA
##4:              2.283               1.800                     NA
##5:              4.533               3.333                     NA
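A quick way to see how the sign of shiftLen behaves is to call rowShift on a plain numeric vector; data.table is not needed for that. The sketch below repeats the function from this answer so that it runs on its own in base R; the vector x is just example data.

```r
# Same logic as rowShift above: negative shiftLen looks back, positive looks ahead.
rowShift <- function(x, shiftLen = 1L) {
  r <- (1L + shiftLen):(length(x) + shiftLen)  # shifted row indices
  r[r < 1] <- NA                               # indices before the start become NA
  x[r]                                         # indices past the end also give NA
}

x <- c(10, 20, 30, 40, 50)
print(rowShift(x, -1))  # previous row: NA 10 20 30 40
print(rowShift(x, 2))   # two rows ahead: 30 40 50 NA NA
```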

score 19

Author: dnlbrky,
2014-08-01 16:24:39

Using dplyr you can do:

library(dplyr)  # provides mutate() and dplyr::lag()
mutate(DT, D = lag(B) + C)

Which gives:

#   A  B   C   D
#1: 1 10 100  NA
#2: 2 20 200 210
#3: 3 30 300 320
#4: 4 40 400 430
#5: 5 50 500 540
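The lag() used here is dplyr::lag(). If dplyr is not attached, lag() resolves to stats::lag(), which shifts the time base of a series rather than the values of a plain vector, so C + lag(B) silently returns the wrong numbers. The base R sketch below shows the difference; lag_values() is a hypothetical helper written only for illustration, not part of any package.

```r
B <- c(10, 20, 30, 40, 50)
C <- c(100, 200, 300, 400, 500)

# stats::lag() only changes the "tsp" time attribute; as.vector() drops it,
# which shows that the values themselves were not shifted.
wrong <- C + as.vector(stats::lag(B, -1))
print(wrong)  # 110 220 330 440 550, i.e. just C + B

# Hypothetical value-shifting lag: pad with n NAs, then trim to the original length.
lag_values <- function(x, n = 1L) c(rep(NA, n), x)[seq_along(x)]
right <- C + lag_values(B)
print(right)  # NA 210 320 430 540
```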

score 14

Author: Steven Beaupré,
2015-04-27 02:48:34

Building on @Steve Lianoglou's comment above, why not just:

DT[, D:= C + c(NA, B[.I - 1]) ]
#    A  B   C   D
# 1: 1 10 100  NA
# 2: 2 20 200 210
# 3: 3 30 300 320
# 4: 4 40 400 430
# 5: 5 50 500 540

This avoids using seq_len, head, or any other function.
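The trick in B[.I - 1] is that, on an ungrouped table, .I is 1, 2, ..., n, so .I - 1 starts at 0, and R silently drops index 0 when subsetting. The base R sketch below uses seq_along() as a stand-in for .I (an assumption made so it runs without data.table) to show the lengths involved.

```r
B <- c(10, 20, 30, 40, 50)
idx <- seq_along(B) - 1   # stand-in for .I - 1: 0 1 2 3 4

prev <- B[idx]            # index 0 is dropped, so only 4 values come back
print(length(prev))       # 4
print(c(NA, prev))        # NA 10 20 30 40, same length as B again
```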

score 12

Author: Gary Weissman,
2014-05-05 13:29:33

Following Arun's solution, you can get a similar result without referring to .N:

> DT[, D := C + c(NA, head(B, -1))][]
   A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540
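The same head() idiom extends to looking k rows back: pad with k NAs and drop the last k values. A base R sketch on plain vectors, with k = 2 chosen only as an example:

```r
B <- c(10, 20, 30, 40, 50)
C <- c(100, 200, 300, 400, 500)
k <- 2L                                  # how many rows to look back

D <- C + c(rep(NA, k), head(B, -k))      # head(B, -k) drops the last k elements
print(D)                                 # NA NA 310 420 530
```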

score 9

Author: Ryogi,
2013-02-04 16:07:11

I added a padding argument, changed a few names, and called it shift: https://github.com/geneorama/geneorama/blob/master/R/shift.R
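The linked function is not reproduced here. As a sketch of what a padding argument means, the base R function below fills the emptied positions with a chosen value instead of NA. Its name shift_pad and its signature are hypothetical and are not those of the linked code; as with rowShift, a negative n looks back.

```r
# Hypothetical helper: shift x by n positions, filling the gaps with `pad`.
# n < 0 takes values from earlier rows (lag), n > 0 from later rows (lead).
shift_pad <- function(x, n = -1L, pad = NA) {
  len <- length(x)
  k <- min(abs(n), len)                  # never pad more than the vector length
  if (k == 0) return(x)
  if (n < 0) {
    c(rep(pad, k), x)[seq_len(len)]      # pad at the start, drop the tail
  } else {
    c(x, rep(pad, k))[k + seq_len(len)]  # drop the head, pad at the end
  }
}

x <- c(10, 20, 30, 40, 50)
print(shift_pad(x, -1, pad = 0))  # 0 10 20 30 40
print(shift_pad(x, 2))            # 30 40 50 NA NA
```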

score 1

Author: geneorama,
2014-11-03 22:03:58

Here is my intuitive solution:

Your data frame:

df <- data.frame(A = 1:5, B = seq(10, 50, 10), C = seq(100, 500, 100))

Now create the new column:

df$D <- c(NA, head(df$B, 4) + tail(df$C, 4))

Here 4 is the number of rows minus 1. If you have, say, 1000 rows, replace 4 with 999; nrow(df) gives the number of rows of a data frame (for a plain vector, use length()). Similarly, to take values from even earlier rows, subtract 2, 3, ... from nrow and put the same number of NAs at the beginning. I hope this helps.
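Instead of hard-coding 4, the row count can come from nrow(), so the same line works for a data frame of any size. A base R sketch, where k (an illustrative variable) is the number of rows to look back and k = 1 reproduces the answer above:

```r
df <- data.frame(A = 1:5, B = seq(10, 50, 10), C = seq(100, 500, 100))

n <- nrow(df)   # number of rows, instead of hard-coding 4
k <- 1L         # look-back distance; use 2, 3, ... for earlier rows

df$D <- c(rep(NA, k), head(df$B, n - k) + tail(df$C, n - k))
print(df)
```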

score 0

Author: Abdullah Adil Mahmud,
2018-07-05 10:51:14

score 85 · Accepted Answer

With shift() implemented in v1.9.6, this is quite straightforward.

DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]

From NEWS:

New function shift() implements fast lead/lag of vector, list, data.frames or data.tables. It takes a type argument which can be either "lag" (default) or "lead". It enables very convenient usage along with := or set(). For example: DT[, (cols) := shift(.SD, 1L), by=id]. Please have a look at ?shift for more info.
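The by=id example in the NEWS entry computes the lag separately within each group. For comparison without data.table, a rough base R analogue uses ave(), which applies a function within each group and returns the results in the original row order. This is a different technique from shift(), and the id and value vectors are made-up example data.

```r
# Made-up example: two groups, rows already ordered within each group.
id <- c("a", "a", "a", "b", "b")
value <- c(1, 2, 3, 10, 20)

# Lag by one row within each id; the first row of each group gets NA.
prev_in_group <- ave(value, id, FUN = function(v) c(NA, head(v, -1)))
print(prev_in_group)  # NA 1 2 NA 10
```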

See the edit history for previous answers.