Gabungan Luar Kiri untuk catatan yang tidak sama dari dua bingkai data dalam skala percikan

Saya memiliki dua bingkai data. Bingkai Data satu

+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+
|DataPartition|TimeStamp                |OrganizationID|SourceID|_auditorId|sr:AuditorEnumerationId|sr:AuditorOpinionCode|sr:AuditorOpinionId|sr:IsPlayingAuditorRole|sr:IsPlayingCSRAuditorRole|sr:IsPlayingTaxAdvisorRole|FFAction|!||
+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+
|Japan        |2018-05-03T09:52:48+00:00|4295876589    |195     |null      |null                   |null                 |null               |null                   |null                      |null                      |O|!|       |
|Japan        |2018-05-03T08:10:19+00:00|4295876589    |196     |null      |null                   |null                 |null               |null                   |null                      |null                      |D|!|       |
|Japan        |2018-05-03T09:52:48+00:00|4295876589    |194     |null      |null                   |null                 |null               |null                   |null                      |null                      |O|!|       |
+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+

Bingkai Data Dua adalah

    DataPartition   TimeStamp   OrganizationID  SourceID    _auditorId  sr:AuditorEnumerationId sr:AuditorOpinionCode   sr:AuditorOpinionId sr:IsPlayingAuditorRole sr:IsPlayingCSRAuditorRole  sr:IsPlayingTaxAdvisorRole  FFAction|!|
Japan   2018-05-03T08:06:06+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T08:06:06+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T09:48:33+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:48:33+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T07:27:10+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:27:10+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:27:10+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:35:42+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:35:42+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:35:42+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T09:34:46+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:34:46+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T08:10:19+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T08:10:19+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T07:28:16+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:28:16+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:28:16+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-02T09:05:04+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-02T09:05:04+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-02T09:05:04+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:31:28+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:31:28+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:31:28+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:22:58+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:22:58+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:22:58+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T09:45:22+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:45:22+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T07:11:26+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:11:26+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:11:26+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:00:45+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:00:45+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:00:45+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:36:47+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:36:47+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:36:47+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:01:52+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:01:52+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:01:52+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-02T10:28:22+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-02T10:28:22+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-02T10:28:22+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T09:52:48+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:52:48+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T09:41:09+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:41:09+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-02T10:30:32+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-02T10:30:32+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-02T10:30:32+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T06:56:32+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T06:56:32+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T06:56:32+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T07:05:04+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:05:04+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:05:04+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|
Japan   2018-05-03T09:38:59+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T09:38:59+00:00   4295876589  195 16157   1002485247  UWE 3010547 true    false   false   O|!|
Japan   2018-05-03T07:08:14+00:00   4295876589  194 2719    3023331 AOP 3010542 true    false   true    O|!|
Japan   2018-05-03T07:08:14+00:00   4295876589  195 5937    3026578 NOP 3010543 true    false   true    O|!|
Japan   2018-05-03T07:08:14+00:00   4295876589  196 3252    3024053 ONC 3020538 true    false   true    O|!|

Sekarang saya ingin menambahkan semua kolom bingkai data, satu dua bingkai data kecuali untuk catatan yang tiga kolomnya TimeStamp ,OrganizationID and SourceID berbeda. Jadi dalam hal ini catatan bingkai data satu tidak akan ditambahkan ke bingkai data dua. Karena TimeStamp |OrganizationID|SourceID kolom cocok di kedua bingkai data.

Hanya 1 baris yang harus ditambahkan yang memiliki SourceId 196 .

Apakah left_outer join akan berfungsi dalam kasus ini? Ketika saya melakukan itu saya mendapatkan kolom duplikat.

Jadi singkatnya Catatan yang cocok berdasarkan tiga kolom dari Bingkai data 1 tidak akan ditambahkan selain semua catatan akan ditambahkan ke bingkai data


person Atharv Thakur    schedule 10.05.2018    source sumber
comment
seperti apa kerangka data akhir Anda?   -  person koiralo    schedule 10.05.2018
comment
Apakah Anda benar-benar berusaha melakukan apa yang benar?   -  person Andrey Tyukin    schedule 10.05.2018


Jawaban (1)


Anda dapat mencoba leftanti bergabung lalu uniondf2,

df1.join(df2, Seq("TimeStamp" ,"OrganizationID", "SourceID"), "leftanti").union(df2)
person Binzi Cao    schedule 10.05.2018
comment
Tidak, saya mendapat catatan duplikat jika saya menggunakan ini - person Atharv Thakur; 10.05.2018