Wektorowe sprawdzanie wartości w ramce danych Pandy
Mam dwa pandy dataframes jeden o nazwie 'orders' i drugi o nazwie 'daily_prices'. daily_prices jest następujące:
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.50 155.74 75.89
2011-02-02 341.29 612.00 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.40 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82.00
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
Kolejność jest następująca:
direction size ticker prices
2011-01-10 Buy 1500 AAPL 339.44
2011-01-13 Sell 1500 AAPL 342.64
2011-01-13 Buy 4000 IBM 143.92
2011-01-26 Buy 1000 GOOG 616.50
2011-02-02 Sell 4000 XOM 79.46
2011-02-10 Buy 4000 XOM 79.68
2011-03-03 Sell 1000 GOOG 609.56
2011-03-03 Sell 2200 IBM 158.73
2011-06-03 Sell 3300 IBM 160.97
2011-05-03 Buy 1500 IBM 167.84
2011-06-10 Buy 1200 AAPL 323.03
2011-08-01 Buy 55 GOOG 606.77
2011-08-01 Sell 55 GOOG 606.77
2011-12-20 Sell 1200 AAPL 392.46
Indeks obu ramek danych to datetime.Data. kolumna "ceny" w ramce danych "zamówienia" została dodana za pomocą rozumienia listy, aby zapętlić wszystkie zamówienia i wyszukać konkretny symbol dla konkretnej daty w ramce danych "daily_prices", a następnie dodać tę listę jako kolumnę do ramki danych "zamówienia". Ja bym jak to zrobić za pomocą operacji tablicy, a nie coś, co pętli. czy da się to zrobić? próbowałem użyć:
Daily_prices.ix [daty, tickery]
Ale to zwraca macierz iloczynu kartezjańskiego obu list. chcę, aby zwrócił wektor kolumnowy tylko ceny określonego tickera dla określonej daty.
2 answers
Użyj naszego przyjaciela lookup
, zaprojektowanego właśnie do tego celu:
In [17]: prices
Out[17]:
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.50 155.74 75.89
2011-02-02 341.29 612.00 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.40 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82.00
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
In [18]: orders
Out[18]:
Date direction size ticker prices
0 2011-01-10 00:00:00 Buy 1500 AAPL 339.44
1 2011-01-13 00:00:00 Sell 1500 AAPL 342.64
2 2011-01-13 00:00:00 Buy 4000 IBM 143.92
3 2011-01-26 00:00:00 Buy 1000 GOOG 616.50
4 2011-02-02 00:00:00 Sell 4000 XOM 79.46
5 2011-02-10 00:00:00 Buy 4000 XOM 79.68
6 2011-03-03 00:00:00 Sell 1000 GOOG 609.56
7 2011-03-03 00:00:00 Sell 2200 IBM 158.73
8 2011-06-03 00:00:00 Sell 3300 IBM 160.97
9 2011-05-03 00:00:00 Buy 1500 IBM 167.84
10 2011-06-10 00:00:00 Buy 1200 AAPL 323.03
11 2011-08-01 00:00:00 Buy 55 GOOG 606.77
12 2011-08-01 00:00:00 Sell 55 GOOG 606.77
13 2011-12-20 00:00:00 Sell 1200 AAPL 392.46
In [19]: prices.lookup(orders.Date, orders.ticker)
Out[19]:
array([ 339.44, 342.64, 143.92, 616.5 , 79.46, 79.68, 609.56,
158.73, 160.97, 167.84, 323.03, 606.77, 606.77, 392.46])
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2012-12-15 15:47:22
Poniżej dla wygody każdego, aby odtworzyć wyniki.
columns=['AAPL','GOOG','IBM','XOM']
index = ['2011-01-10','2011-01-13','2011-01-26','2011-02-02','2011-02-10','2011-03-03','2011-05-03','2011-06-03','2011-06-10','2011-08-01','2011-12-20']
prices = pd.DataFrame(columns=columns, index=index)
prices.iloc[0]=[339.44,614.21,142.78,71.57]
prices.iloc[1]=[342.64,616.69,143.92,73.08]
prices.iloc[2]=[340.82,616.50,155.74,75.89]
prices.iloc[3]=[341.29,612.00,157.93,79.46]
prices.iloc[4]=[351.42,616.44,159.32,79.68]
prices.iloc[5]=[356.40,609.56,158.73,82.19]
prices.iloc[6]=[345.14,533.89,167.84,82.00]
prices.iloc[7]=[340.42,523.08,160.97,78.19]
prices.iloc[8]=[323.03,509.51,159.14,76.84]
prices.iloc[9]=[393.26,606.77,176.28,76.67]
prices.iloc[10]=[392.46,630.37,184.14,79.97]
columns=['Date','direction','size','ticker','prices']
orders = pd.DataFrame(columns=columns)
orders.loc[0] = ['2011-01-10','Buy',1500,'AAPL',339.44]
orders.loc[1] = ['2011-01-13','Sell',1500,'AAPL',342.64]
orders.loc[2] = ['2011-01-13','Buy',4000,'IBM',143.92]
orders.loc[3] = ['2011-01-26','Buy',1000,'GOOG',616.50]
orders.loc[4] = ['2011-02-02','Sell',4000,'XOM',79.46]
orders.loc[5] = ['2011-02-10','Buy',4000,'XOM',79.68]
orders.loc[6] = ['2011-03-03','Sell',1000,'GOOG',609.56]
orders.loc[7] = ['2011-03-03','Sell',2200,'IBM',158.73]
orders.loc[8] = ['2011-06-03','Sell',3300,'IBM',160.97]
orders.loc[9] = ['2011-05-03','Buy',1500,'IBM',167.84]
orders.loc[10] = ['2011-06-10','Buy',1200,'AAPL',323.03]
orders.loc[11] = ['2011-08-01','Buy',55,'GOOG',606.77]
orders.loc[12] = ['2011-08-01','Sell',55,'GOOG',606.77]
orders.loc[13] = ['2011-12-20','Sell',1200,'AAPL',392.46]
lookupValues = prices.lookup(orders.Date, orders.ticker)
Następnie wynik:
>>> prices
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.5 155.74 75.89
2011-02-02 341.29 612 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.4 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
>>> orders
Date direction size ticker prices
0 2011-01-10 Buy 1500 AAPL 339.44
1 2011-01-13 Sell 1500 AAPL 342.64
2 2011-01-13 Buy 4000 IBM 143.92
3 2011-01-26 Buy 1000 GOOG 616.50
4 2011-02-02 Sell 4000 XOM 79.46
5 2011-02-10 Buy 4000 XOM 79.68
6 2011-03-03 Sell 1000 GOOG 609.56
7 2011-03-03 Sell 2200 IBM 158.73
8 2011-06-03 Sell 3300 IBM 160.97
9 2011-05-03 Buy 1500 IBM 167.84
10 2011-06-10 Buy 1200 AAPL 323.03
11 2011-08-01 Buy 55 GOOG 606.77
12 2011-08-01 Sell 55 GOOG 606.77
13 2011-12-20 Sell 1200 AAPL 392.46
>>> lookupValues
array([339.44, 342.64, 143.92, 616.5 , 79.46, 79.68, 609.56, 158.73,
160.97, 167.84, 323.03, 606.77, 606.77, 392.46])
>>>