Conditional operations for rows in pandas dataframe
By : obaid
Date : March 29 2020, 07:55 AM
You can use .loc with a boolean mask for conditional assignment. Note the difference below: assigning to a new column fills the non-matching rows with NaN, while assigning to an existing column leaves them unchanged. code :
df.loc[df.prediction < 0, 'return'] = 0
print(df)
    returns  prediction  return
0  1.005705    0.999999     NaN
1  0.005952    1.000000     NaN
2  0.000000    0.999891     0.0
3  0.020000    1.000000     0.0
4  0.000000    1.000000     NaN
5  0.005000    1.000000     NaN
6  0.000000    0.999984     0.0
7  0.005813    0.999871     0.0
df.loc[df.prediction < 0, 'returns'] = 0
print(df)
    returns  prediction
0  1.005705    0.999999
1  0.005952    1.000000
2  0.000000    0.999891
3  0.000000    1.000000
4  0.000000    1.000000
5  0.005000    1.000000
6  0.000000    0.999984
7  0.000000    0.999871
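A minimal, self-contained sketch of the same .loc pattern. The data and column values here are made up for illustration, not taken from the question: code :

```python
import pandas as pd

# Toy frame; the values are illustrative only.
df = pd.DataFrame({'returns':    [0.5, -0.2, 0.1, -0.7],
                   'prediction': [1.0, -0.3, 0.9, -0.1]})

# Zero out 'returns' wherever the mask is True; other rows are untouched.
df.loc[df.prediction < 0, 'returns'] = 0

print(df['returns'].tolist())  # [0.5, 0.0, 0.1, 0.0]
```

Because `df.prediction < 0` is a boolean Series, `.loc[mask, col]` selects exactly those rows and assigns in place without a Python-level loop.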

create a new pandas dataframe by taking values from a different dataframe and performing some mathematical operations on
By : James
Date : March 29 2020, 07:55 AM
The general idea is to stack your values so you can apply NumPy's fast, vectorized functions. code :
import numpy as np

# stack the dataframe: one row per (timestamp, sector) pair
df2 = df.stack().reset_index(level=1)
df2.columns = ['sec', 'value']
# extract the sector number from labels like 'sec01'
df2['sec_no'] = df2['sec'].str.slice(3).astype(int)
# apply numpy's vectorized functions
df2['x'] = df2['value'] * np.cos(np.radians(1.40625 * df2['sec_no']))
df2['y'] = df2['value'] * np.sin(np.radians(1.40625 * df2['sec_no']))
                     sec  value  sec_no         x         y
19700101 05:54:17  sec01   8.50       1  8.497440  0.208600
19700101 05:54:17  sec02   8.62       2  8.609617  0.422963
19700101 05:54:17  sec03   8.53       3  8.506888  0.627506
19700101 05:54:17  sec04   8.45       4  8.409311  0.828245
19700101 05:54:17  sec05   8.50       5  8.436076  1.040491
df2[['sec', 'x', 'y']].pivot(columns='sec')
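The whole stack → vectorize → pivot round trip can be sketched on a tiny hypothetical frame (the 'sec01'-style labels and values below are stand-ins, not the question's data): code :

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: one row per timestamp, one column per sector.
df = pd.DataFrame({'sec01': [8.50], 'sec02': [8.62]}, index=['t0'])

# Long form: one row per (timestamp, sector) pair.
df2 = df.stack().reset_index(level=1)
df2.columns = ['sec', 'value']
df2['sec_no'] = df2['sec'].str.slice(3).astype(int)

# Vectorized trig over the whole column at once.
df2['x'] = df2['value'] * np.cos(np.radians(1.40625 * df2['sec_no']))

# Back to wide form: one 'x' column per sector.
wide = df2.pivot(columns='sec', values='x')
print(wide.round(4))
```

The vectorized calls operate on entire columns in C, which is what makes this fast compared with applying the trig row by row.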

Pandas dataframe speed in python: dataframe operations, numba, cython
By : Prateek Vora
Date : March 29 2020, 07:55 AM
I have a financial dataset with ~2 million rows. I would like to import it as a pandas dataframe and add columns by applying row-wise functions to some of the existing column values, without resorting to techniques like parallelization or Hadoop for Python. , How about simply: code :
df.loc[:, 'px'] = (alpha * beta) / df.loc[:, 'time'] * df.loc[:, 'vol']
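A runnable sketch of that one-liner; `alpha`, `beta`, and the column values are hypothetical stand-ins for the question's data: code :

```python
import pandas as pd

# Hypothetical constants and frame standing in for the real dataset.
alpha, beta = 2.0, 3.0
df = pd.DataFrame({'time': [1.0, 2.0, 4.0],
                   'vol':  [10.0, 10.0, 10.0]})

# One vectorized expression computes the whole column at once --
# no Python-level loop over the millions of rows.
df['px'] = (alpha * beta) / df['time'] * df['vol']

print(df['px'].tolist())  # [60.0, 30.0, 15.0]
```

Column arithmetic like this dispatches to NumPy under the hood, which is usually fast enough that numba or cython is unnecessary for simple row-wise formulas.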

Vectorized Operations on two Pandas DataFrame to create a new DataFrame
By : Ko Ko Naing
Date : October 14 2020, 09:25 AM
I have orders.csv as a dataframe called orders_df. , Vectorized solution: code :
# column position of each order's symbol
j = np.array([df_trades.columns.get_loc(c) for c in orders_df.Symbol])
# one row index per order
i = np.arange(len(df_trades))
# +shares for BUY, -shares for SELL
o = np.where(orders_df.Order.values == 'BUY', 1, -1)
v = orders_df.Shares.values * o
# fancy indexing writes each signed share count into its (row, column) slot
t = df_trades.values
t[i, j] = v
df_trades.loc[:, 'CASH'] = \
    df_trades.drop('CASH', axis=1, errors='ignore').mul(prices_df).sum(1)
df_trades
          AAPL  IBM  GOOG  XOM  SPY     CASH
Date
20110110   100    0     0    0    0  15000.0
20110113   200    0     0    0    0  50000.0
20110113     0  100     0    0    0  30000.0
20110126     0    0   200    0    0  20000.0
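The core trick is NumPy fancy indexing: paired row and column index arrays write one value per order in a single assignment. A miniature, hypothetical stand-in for df_trades / orders_df: code :

```python
import numpy as np
import pandas as pd

# Hypothetical miniatures of the question's frames.
df_trades = pd.DataFrame(0.0, index=[0, 1], columns=['AAPL', 'IBM'])
orders_df = pd.DataFrame({'Symbol': ['AAPL', 'IBM'],
                          'Order':  ['BUY', 'SELL'],
                          'Shares': [100, 50]})

# Column position of each order's symbol, and one row index per order.
j = np.array([df_trades.columns.get_loc(c) for c in orders_df.Symbol])
i = np.arange(len(df_trades))

# +shares for BUY, -shares for SELL.
o = np.where(orders_df.Order.values == 'BUY', 1, -1)
v = orders_df.Shares.values * o

# t[i, j] = v writes v[k] into position (i[k], j[k]) -- one slot per order.
t = df_trades.values
t[i, j] = v

print(t)
```

Note that whether writing through `.values` also updates the DataFrame in place depends on the pandas version and copy-on-write settings; working on the array `t` itself is the safe way to reason about it.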

Calculate conditional probability using groupby and shift operations in Pandas dataframe
By : user2816481
Date : March 29 2020, 07:55 AM
I have a dataframe of patients and their visits, where the presence of a disease in the left and/or right eye is labeled with {0,1} values (0 = not present, 1 = present). , IIUC: since R takes only 0/1 values, its group mean is exactly the conditional relative frequency. code :
df.groupby('L').R.mean()
L
0    0.000000
1    0.384615
Name: R, dtype: float64
df.groupby(['Patient','L']).R.mean()
Patient  L
P_1      1    0.2
P_2      1    0.5
P_3      0    0.0
         1    0.5
Name: R, dtype: float64
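A small self-contained sketch of why the group mean equals P(R = 1 | L). The visit data below is invented for illustration, not the question's dataset: code :

```python
import pandas as pd

# Toy visit data; 1 = disease present, 0 = absent, as in the question.
df = pd.DataFrame({'Patient': ['P_1', 'P_1', 'P_2', 'P_2'],
                   'L': [1, 1, 0, 1],
                   'R': [1, 0, 1, 1]})

# Within each L group, mean(R) = (# of R == 1) / (group size),
# i.e. the conditional relative frequency P(R = 1 | L).
p = df.groupby('L').R.mean()
print(p)
```

For L == 1 the group has R values [1, 0, 1], so the estimate is 2/3; for L == 0 it is [1], giving 1.0.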

