Pisahkan nilai kolom dalam file CSV menggunakan pembatas ganda dan duplikat

Saya memiliki file CSV yang berantakan. Saya mencoba menggunakan regex untuk mengekstrak nama depan dan nama belakang dari nilai di kolom di file csv. Nama depan dan belakang akan memiliki kolomnya masing-masing.

File CSV (dengan kombinasi pembatas berbeda):

ID,Description,Number
JDo,John Doe - Temp - Client Client Ops,SomeValue
JDo,John  Doe - Temp - Client Client Ops,SomeValue
JDo,John  Doe  - Temp - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops ,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops  ,SomeValue
JDo,John Doe-Temp-Client Client Ops,SomeValue
JDo,John  Doe - Temp-Client Client Ops,SomeValue
JDo,John  Doe  - Temp-Client Client Ops,SomeValue
JDo,John  Doe-Temp -  Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops,SomeValue
JDo,John  Doe-Temp  -  Client  Client  Ops ,SomeValue
JDo,John  Doe-Temp-Client  Client  Ops  ,SomeValue
JDo,John.Doe - Temp - Client Client Ops,SomeValue
JDo,John .Doe - Temp - Client Client Ops,SomeValue
JDo,John. Doe - Temp - Client Client Ops,SomeValue
JDo,John . Doe - Temp - Client Client Ops,SomeValue
JDo,John.Doe - Temp - Client Client Ops  ,SomeValue
JDo,John .Doe - Temp - Client Client Ops  ,SomeValue
JDo,John. Doe - Temp - Client Client Ops  ,SomeValue
JDo,John . Doe - Temp - Client Client Ops  ,SomeValue
JDo,John.Doe-Temp-Client Client Ops,SomeValue
JDo,John .Doe-Temp-Client Client Ops,SomeValue
JDo,John. Doe-Temp-Client Client Ops,SomeValue
JDo,John . Doe-Temp-Client Client Ops,SomeValue
JDo,John.Doe  - Temp  - Client Client Ops,SomeValue
JDo,John .Doe -  Temp -  Client Client Ops,SomeValue
JDo,John. Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John . Doe - Temp - Client Client Ops,SomeValue
JDo,John?Doe - Temp - Client Client Ops,SomeValue
JDo,John ?Doe - Temp - Client Client Ops,SomeValue
JDo,John? Doe - Temp - Client Client Ops,SomeValue
JDo,John ? Doe - Temp - Client Client Ops,SomeValue
JDo,John?Doe - Temp - Client Client Ops  ,SomeValue
JDo,John ?Doe - Temp - Client Client Ops  ,SomeValue
JDo,John? Doe - Temp - Client Client Ops  ,SomeValue
JDo,John ? Doe - Temp - Client Client Ops  ,SomeValue
JDo,John?Doe-Temp-Client Client Ops,SomeValue
JDo,John ?Doe-Temp-Client Client Ops,SomeValue
JDo,John? Doe-Temp-Client Client Ops,SomeValue
JDo,John ? Doe-Temp-Client Client Ops,SomeValue
JDo,John?Doe  - Temp  - Client Client Ops,SomeValue
JDo,John ?Doe -  Temp -  Client Client Ops,SomeValue
JDo,John? Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John ? Doe - Temp - Client Client Ops,SomeValue
JDo,"John,Doe - Temp - Client Client Ops",SomeValue
JDo,"John ,Doe - Temp - Client Client Ops",SomeValue
JDo,"John, Doe - Temp - Client Client Ops",SomeValue
JDo,"John , Doe - Temp - Client Client Ops",SomeValue
JDo,"  John,Doe - Temp - Client Client Ops  ",SomeValue
JDo,"  John ,Doe - Temp - Client Client Ops  ",SomeValue
JDo,"  John, Doe - Temp - Client Client Ops  ",SomeValue
JDo,"  John , Doe - Temp - Client Client Ops  ",SomeValue
JDo,"John,Doe-Temp-Client Client Ops",SomeValue
JDo,"John ,Doe-Temp-Client Client Ops",SomeValue
JDo,"John, Doe-Temp-Client Client Ops",SomeValue
JDo,"John , Doe-Temp-Client Client Ops",SomeValue
JDo,"John,Doe  - Temp  - Client Client Ops",SomeValue
JDo,"John ,Doe -  Temp -  Client Client Ops",SomeValue
JDo,"John, Doe  -  Temp  -  Client Client Ops",SomeValue
JDo,"John , Doe - Temp - Client Client Ops",SomeValue
JDo,John-Doe - Temp - Client Client Ops,SomeValue
JDo,John -Doe - Temp - Client Client Ops,SomeValue
JDo,John- Doe - Temp - Client Client Ops,SomeValue
JDo,John - Doe - Temp - Client Client Ops,SomeValue
JDo,John-Doe - Temp - Client Client Ops  ,SomeValue
JDo,John -Doe - Temp - Client Client Ops  ,SomeValue
JDo,John- Doe - Temp - Client Client Ops  ,SomeValue
JDo,John - Doe - Temp - Client Client Ops  ,SomeValue
JDo,John-Doe-Temp-Client Client Ops,SomeValue
JDo,John -Doe-Temp-Client Client Ops,SomeValue
JDo,John- Doe-Temp-Client Client Ops,SomeValue
JDo,John - Doe-Temp-Client Client Ops,SomeValue
JDo,John-Doe  - Temp  - Client Client Ops,SomeValue
JDo,John -Doe -  Temp -  Client Client Ops,SomeValue
JDo,John- Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John - Doe - Temp - Client Client Ops,SomeValue

Untuk menambahkan kolom nama depan dan belakang, saya menggunakan kode berikut:

Function FixRxClaimReportAddFirstLastNameColumn {
  Param ($csvFile)

  Write-Host "Adding columns 'First Name' and 'Last Name' to $csvFile"
  Import-Csv $csvFile |
    Select-Object *, @{n='First Name'; e={if ($_.Description) {
        $columnFirstNameValue = $($_.Description -replace '\s+', ' ').split(" ")[0]
        if ($columnFirstNameValue -notlike "*,*" -and $columnFirstNameValue -notmatch '\?' -and $columnFirstNameValue -notlike "*.*" -and $columnFirstNameValue -notlike "*-*") {
          $columnFirstNameValue.Trim()
        } else {
          $columnFirstNameValue2 = $($_.Description -replace '\s+', ' ') -split {$_ -eq "-" -or $_ -eq "- " -or $_ -eq " -" -or $_ -eq " - " -or $_ -eq "," -or $_ -eq ", " -or $_ -eq " ," -or $_ -eq " , " -or $_ -eq "." -or $_ -eq ". " -or $_ -eq " ." -or $_ -eq " . " -or $_ -eq "?" -or $_ -eq "? " -or $_ -eq " ?" -or $_ -eq " ? "}
          $columnFirstNameValue2[0].Trim()
        }
      }}}, @{n='Last Name'; e={if ($_.Description) {
        $columnLastNameValue = $($_.Description -replace '\s+', ' ').split(" ")[1]
        if ($columnLastNameValue -notlike "*,*" -and $columnLastNameValue -notmatch '\?' -and $columnLastNameValue -notlike "*.*" -and $columnLastNameValue -notlike "*-*") {
          $columnLastNameValue.Trim()
        } else {
          $columnLastNameValue2 = $($_.Description -replace '\s+', ' ') -split {$_ -eq "-" -or $_ -eq "- " -or $_ -eq " -" -or $_ -eq " - " -or $_ -eq "," -or $_ -eq ", " -or $_ -eq " ," -or $_ -eq " , " -or $_ -eq "." -or $_ -eq ". " -or $_ -eq " ." -or $_ -eq " . " -or $_ -eq "?" -or $_ -eq "? " -or $_ -eq " ?" -or $_ -eq " ? "}
          $columnLastNameValue2[1].Trim()
        }
      }}} | Export-Csv "$csvFile-Results.csv" -NoTypeInformation -Force
  Write-Host "Complete."
  Write-Host ""
}

FixRxClaimReportAddFirstLastNameColumn 'C:\Scripts\Tests\Test1.csv'

Saat saya menjalankan kode ini, semua nilai nama depan harus John, dan semua nilai nama belakang harus Doe. Namun, nilainya sangat berbeda untuk semua orang.


person Fiddle Freak    schedule 21.01.2016    source sumber


Jawaban (1)


Anda berpikir terlalu rumit. Hapus informasi tambahan dari akhir bidang Description untuk mendapatkan nama saja, lalu potong nama dan bagi menjadi nama depan dan belakang sebelum menambahkan ini sebagai properti baru ke objek masukan.

Coba ini:

Import-Csv 'C:\path\to\input.csv' | ForEach-Object {
  $rawname = $_.Description -replace '-[^-]*-[^-]*$'
  $firstname, $lastname = $rawname.Trim() -split ' *[ \?\.,-] *'
  $_ | Add-Member -Type NoteProperty -Name FirstName -Value $firstname
  $_ | Add-Member -Type NoteProperty -Name LastName -Value $lastname
  $_
} | Export-Csv 'C:\path\to\output.csv' -NoType
person Ansgar Wiechers    schedule 21.01.2016
comment
Terima kasih Ansgar. Kamu selalu sangat membantu :) - person Fiddle Freak; 21.01.2016
comment
Anda mungkin bisa lolos tanpa $rawname sama sekali: $FirstName,$LastName,$null = $_ -split '[\s\?.,-]' | ? { $_ } - person xXhRQ8sD2L7Z; 22.01.2016
comment
@ ST8Z6FR57ABE6A8RE9UF Lebih mudah dibaca dan dipahami jika Anda tidak melakukannya dalam satu baris. Selain itu, pemisahan seperti itu memiliki kelemahan yaitu Anda tidak dapat menangani beberapa nama depan atau nama yang mengandung tanda hubung. - person Ansgar Wiechers; 22.01.2016