DataFrame: Data Cleaning with Python
Remove outliers from specific columns
cols = ['col_1', 'col_2']  # one or more numeric column names
Q1 = df[cols].quantile(0.25)
Q3 = df[cols].quantile(0.75)
IQR = Q3 - Q1
df = df[~((df[cols] < (Q1 - 1.5 * IQR)) | (df[cols] > (Q3 + 1.5 * IQR))).any(axis=1)]
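As a rough illustration of the IQR filter above, here is a minimal sketch on made-up data. The sample values, the extra `label` column, and the names `outlier_mask` and `df_clean` are assumptions for the example only. A row is dropped if any of the selected columns falls outside 1.5 * IQR of its quartiles.

```python
import pandas as pd

# Hypothetical sample data; the value 100 in col_1 is the intended outlier.
df = pd.DataFrame({
    "col_1": [10, 11, 12, 13, 14, 100],
    "col_2": [1.0, 1.1, 0.9, 1.2, 1.0, 1.1],
    "label": list("abcdef"),
})

cols = ["col_1", "col_2"]  # numeric columns to check

# Per-column quartiles and interquartile range.
Q1 = df[cols].quantile(0.25)
Q3 = df[cols].quantile(0.75)
IQR = Q3 - Q1

# True for rows where at least one checked column is outside the fences.
outlier_mask = (
    (df[cols] < (Q1 - 1.5 * IQR)) | (df[cols] > (Q3 + 1.5 * IQR))
).any(axis=1)

# Keep only the rows that are not flagged.
df_clean = df[~outlier_mask]
print(df_clean)
```

Building the mask as a separate variable makes it easy to inspect which rows would be removed before actually filtering.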
Create an empty DataFrame with the same columns as an existing DataFrame
df_empty = df[0:0]
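A small sketch of the same idea, assuming a toy DataFrame with hypothetical columns `a` and `b`. It uses `iloc[0:0]` plus `.copy()` so the result is independent of the original; this variant is a suggestion, not necessarily the original author's preference. The zero-row slice keeps both the column names and their dtypes.

```python
import pandas as pd

# Hypothetical source frame.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Zero-row positional slice: same columns and dtypes, no data.
df_empty = df.iloc[0:0].copy()

print(df_empty.columns.tolist())
print(df_empty.dtypes)
```

By comparison, `pd.DataFrame(columns=df.columns)` also gives the same column names but does not carry over the original dtypes.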
Duplicate rows based on the value in another column (such as a frequency or quantity)
df.loc[df.index.repeat(df.Quantity)]
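To make the repeat pattern concrete, here is a minimal sketch with invented data. The `item` column, the sample quantities, and the name `expanded` are assumptions for illustration. It assumes the index is unique and `Quantity` holds non-negative integers; a quantity of 0 drops the row entirely.

```python
import pandas as pd

# Hypothetical data: each row should appear Quantity times.
df = pd.DataFrame({
    "item": ["apple", "pear", "fig"],
    "Quantity": [2, 0, 3],
})

# Repeat each index label Quantity times, then select those labels.
expanded = df.loc[df.index.repeat(df["Quantity"])]

# Optional: replace the repeated index labels with a fresh 0..n-1 index.
expanded = expanded.reset_index(drop=True)
print(expanded)
```

Without `reset_index(drop=True)` the result keeps the repeated original labels, which can be useful for tracing each copy back to its source row.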
MultiIndex DataFrame
Assign new values to slice from MultiIndex DataFrame
data.loc[('<index 1 value>','<index 2 value>'), '<col>'] = <new value>
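A filled-in version of the template above might look like the following sketch. The level names, index values, and `value` column are all hypothetical. Passing the full index tuple together with the column name in a single `.loc` call assigns in place and avoids chained indexing.

```python
import pandas as pd

# Hypothetical two-level index.
index = pd.MultiIndex.from_tuples(
    [("A", "x"), ("A", "y"), ("B", "x")],
    names=["level_1", "level_2"],
)
data = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# Assign a new value to the row ("A", "y") in column "value".
data.loc[("A", "y"), "value"] = 20
print(data)
```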