Create a column based on two conditions with pandas

I am using pandas for some practice analysis. I want to create a new column whose value is the sum of two rows. The original dataset looks like this...

    Admit      Gender   Dept    Freq
0   Admitted    Male    A   512
1   Rejected    Male    A   313
2   Admitted    Female  A   89
3   Rejected    Female  A   19
4   Admitted    Male    B   353
5   Rejected    Male    B   207
6   Admitted    Female  B   17
7   Rejected    Female  B   8
8   Admitted    Male    C   120
9   Rejected    Male    C   205
10  Admitted    Female  C   202
11  Rejected    Female  C   391
12  Admitted    Male    D   138
13  Rejected    Male    D   279
14  Admitted    Female  D   131
15  Rejected    Female  D   244
16  Admitted    Male    E   53
17  Rejected    Male    E   138
18  Admitted    Female  E   94
19  Rejected    Female  E   299
20  Admitted    Male    F   22
21  Rejected    Male    F   351
22  Admitted    Female  F   24
23  Rejected    Female  F   317

I want to create the new column using the following data frame...

    Dept    Gender  Freq
0   A   Female  108
1   A   Male    825
2   B   Female  25
3   B   Male    560
4   C   Female  593
5   C   Male    325
6   D   Female  375
7   D   Male    417
8   E   Female  393
9   E   Male    191
10  F   Female  341
11  F   Male    373

I want to create a new column in the first data frame using the Freq column from the second data frame. For example, I need to insert the value 108 wherever Dept and Gender match in both data frames (here Dept A, Gender Female). The new data frame would look like this...

    Admit      Gender   Dept    Freq   Total
0   Admitted    Male    A   512        825
1   Rejected    Male    A   313        825
2   Admitted    Female  A   89         108
3   Rejected    Female  A   19         108
4   Admitted    Male    B   353        560
5   Rejected    Male    B   207        560
6   Admitted    Female  B   17         25
7   Rejected    Female  B   8          25 

I have tried the following code...

for i in data.iterrows():
    for j in total_freq.iterrows():
        if i[1].Gender == total_freq.Gender & i[1].Dept == total_freq.Dept:
            data['Total'] = total_freq.Freq

I get the following error... TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]

Any help creating the column with the correct values?


Asked by Gilbert on 17.05.2017


Answers (2)


You can use transform:

df['Total'] = df.groupby(['Dept', 'Gender']).Freq.transform('sum')

You get

    Admit   Gender  Dept    Freq    Total
0   Admitted    Male    A   512 825
1   Rejected    Male    A   313 825
2   Admitted    Female  A   89  108
3   Rejected    Female  A   19  108
4   Admitted    Male    B   353 560
5   Rejected    Male    B   207 560
6   Admitted    Female  B   17  25
7   Rejected    Female  B   8   25
8   Admitted    Male    C   120 325
9   Rejected    Male    C   205 325
10  Admitted    Female  C   202 593
11  Rejected    Female  C   391 593
12  Admitted    Male    D   138 417
13  Rejected    Male    D   279 417
14  Admitted    Female  D   131 375
15  Rejected    Female  D   244 375
16  Admitted    Male    E   53  191
17  Rejected    Male    E   138 191
18  Admitted    Female  E   94  393
19  Rejected    Female  E   299 393
20  Admitted    Male    F   22  373
21  Rejected    Male    F   351 373
22  Admitted    Female  F   24  341
23  Rejected    Female  F   317 341

Answered by Vaishali on 17.05.2017
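
For readers who want to try the transform approach end to end, here is a minimal, self-contained sketch. It assumes only the Dept A and B rows from the question, typed in by hand; the variable name `df` follows the answer above, and the data construction is an illustration rather than part of the original answer.

```python
import pandas as pd

# Subset of the admissions data from the question (Dept A and B only).
df = pd.DataFrame({
    "Admit":  ["Admitted", "Rejected", "Admitted", "Rejected",
               "Admitted", "Rejected", "Admitted", "Rejected"],
    "Gender": ["Male", "Male", "Female", "Female",
               "Male", "Male", "Female", "Female"],
    "Dept":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Freq":   [512, 313, 89, 19, 353, 207, 17, 8],
})

# transform("sum") returns a Series aligned with df's index, so every row
# receives the sum of Freq over its own (Dept, Gender) group.
df["Total"] = df.groupby(["Dept", "Gender"])["Freq"].transform("sum")

print(df)
```

Because transform keeps the original row order and length, this route needs neither the second data frame nor a merge step.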

You can use pandas.DataFrame.merge() with a left join to bring the totals from the second data frame into the first. First, rename Freq to Total in the totals data frame (df1 here), so that it does not clash with the Freq column in the original data (df).

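# df is the original data; df1 holds the per-Dept/Gender totals
df1 = df1.rename(columns={'Freq':'Total'})
# pass the whole df1 so the key columns Gender and Dept are available to merge on
df_totals = pd.merge(df, df1, how='left', on=['Gender', 'Dept'])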

Answered by SimplySnee on 17.05.2017
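
The merge approach can be sketched in a runnable form as follows. This is an illustration under stated assumptions: the frames are rebuilt by hand from the Dept A and B rows in the question, the names `data` and `total_freq` are taken from the asker's loop, and `totals` and `merged` are hypothetical names introduced here.

```python
import pandas as pd

# First data frame from the question (Dept A and B rows only).
data = pd.DataFrame({
    "Admit":  ["Admitted", "Rejected", "Admitted", "Rejected",
               "Admitted", "Rejected", "Admitted", "Rejected"],
    "Gender": ["Male", "Male", "Female", "Female",
               "Male", "Male", "Female", "Female"],
    "Dept":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Freq":   [512, 313, 89, 19, 353, 207, 17, 8],
})

# Second data frame from the question: one row per (Dept, Gender) pair.
total_freq = pd.DataFrame({
    "Dept":   ["A", "A", "B", "B"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Freq":   [108, 825, 25, 560],
})

# Rename first so the totals do not collide with data's own Freq column.
totals = total_freq.rename(columns={"Freq": "Total"})

# Left join on both keys; validate raises if totals has duplicate key pairs.
merged = data.merge(totals, how="left", on=["Dept", "Gender"],
                    validate="many_to_one")

print(merged)
```

A left join keeps every row of `data` in its original order, and any (Dept, Gender) pair missing from the totals would show up as NaN in Total rather than being dropped.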