Here is a (somewhat involved) solution (to a somewhat involved problem):
Data:
df <- data.frame(
  id = 1:2,
  amenities = c('{"Wireless Internet","Wheelchair accessible",Kitchen,Elevator,"Buzzer/wireless intercom",Heating,Washer,Dryer,Essentials,Shampoo,Hangers,"Laptop friendly workspace"}',
                '{TV,"Cable TV",Internet,"Wireless Internet","Air conditioning",Kitchen,"Smoking allowed","Pets allowed","Buzzer/wireless intercom",Heating,"Family/kid friendly","Smoke detector","Carbon monoxide}'))
Prepare the data:
amenities_clean <- gsub('[{}"]', '', df$amenities) # remove unwanted stuff
amenities_split <- strsplit(amenities_clean, ",") # split rows into individual amenities
amenities_unique <- unique(unlist(strsplit(amenities_clean, ","))) # get a list of unique amenities
df[amenities_unique] <- NA # set up the columns for each amenity
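To make the preparation step easier to follow, here is a minimal sketch on a shortened, hypothetical input (the two toy strings are not the original data). It shows that amenities_split is a list with one character vector per row, which matters when indexing it later: [[j]] returns the vector itself, while [j] returns a one-element list.

```r
# Toy input (hypothetical, shortened from the data above)
amenities <- c('{"Wireless Internet",Kitchen}', '{TV,Internet,Kitchen}')

amenities_clean  <- gsub('[{}"]', '', amenities)                 # drop braces and quotes
amenities_split  <- strsplit(amenities_clean, ",", fixed = TRUE) # one character vector per row
amenities_unique <- unique(unlist(amenities_split))              # all distinct amenities

str(amenities_split)    # a list of 2 character vectors
amenities_split[[1]]    # the character vector for row 1
amenities_split[1]      # a list of length 1 wrapping that vector
amenities_unique
```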
Now for the core of the analysis, using str_detect from the stringr package:
# record presence/absence of individual amenities in each new column:
library(stringr)
for(i in 1:ncol(df[amenities_unique])){
  for(j in 1:nrow(df)){
    df[amenities_unique][j,i] <-
      ifelse(str_detect(amenities_split[j], names(df[amenities_unique][i])), 1, 0)
  }
}
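This raises a warning, which comes from passing amenities_split[j] (a one-element list rather than a character vector) to str_detect. The warning itself can be ignored, and the result is largely correct. One caveat: str_detect matches substrings, so row 1 is flagged for Internet even though it only lists "Wireless Internet":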
df
id
1 1
2 2
amenities
1 {"Wireless Internet","Wheelchair accessible",Kitchen,Elevator,"Buzzer/wireless intercom",Heating,Washer,Dryer,Essentials,Shampoo,Hangers,"Laptop friendly workspace"}
2 {TV,"Cable TV",Internet,"Wireless Internet","Air conditioning",Kitchen,"Smoking allowed","Pets allowed","Buzzer/wireless intercom",Heating,"Family/kid friendly","Smoke detector","Carbon monoxide}
  Wireless Internet Wheelchair accessible Kitchen Elevator Buzzer/wireless intercom Heating Washer Dryer
1                 1                     1       1        1                        1       1      1     1
2                 1                     0       1        0                        1       1      0     0
  Essentials Shampoo Hangers Laptop friendly workspace TV Cable TV Internet Air conditioning Smoking allowed
1          1       1       1                         1  0        0        1                0               0
2          0       0       0                         0  1        1        1                1               1
  Pets allowed Family/kid friendly Smoke detector Carbon monoxide
1            0                   0              0               0
2            1                   1              1               1
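In the output above, row 1 shows Internet = 1 although its amenities string contains only "Wireless Internet", because the pattern is matched as a substring. If exact matches are wanted, one possible variant (a sketch on shortened, hypothetical toy data, not the original author's code) replaces str_detect with %in% and indexes the list with [[ ]], which also avoids the warning:

```r
# Toy data (hypothetical, shortened from the example above)
df <- data.frame(
  id = 1:2,
  amenities = c('{"Wireless Internet",Kitchen}', '{TV,Internet,Kitchen}'),
  stringsAsFactors = FALSE
)

amenities_split  <- strsplit(gsub('[{}"]', '', df$amenities), ",", fixed = TRUE)
amenities_unique <- unique(unlist(amenities_split))

# One indicator column per amenity; %in% tests whole strings, not substrings
for (amenity in amenities_unique) {
  df[[amenity]] <- vapply(
    amenities_split,
    function(x) as.integer(amenity %in% x),
    integer(1)
  )
}

df[["Internet"]]           # row 1 is 0 here: "Wireless Internet" is not "Internet"
df[["Wireless Internet"]]
```

The single loop over amenities with vapply over rows does the same job as the nested loops, and vapply guarantees an integer result per row.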
EDIT:
Alternatively, and perhaps more economically, instead of the nested for loops you can use an apply function like this (based on the amenities_split and amenities_unique vectors from the preparation step of the first solution):
cbind(df, t(sapply(amenities_split, function(x)
table(factor(x, levels = amenities_unique)))))
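As a self-contained illustration of this approach, the sketch below runs it on shortened, hypothetical toy data and stores the intermediate matrix under an assumed name, indicators. Note that table(factor(...)) counts exact matches, so "Internet" is not counted for a row that only lists "Wireless Internet"; it would also return a count above 1 if an amenity were listed twice in the same row.

```r
# Toy data (hypothetical, shortened from the example above)
df <- data.frame(
  id = 1:2,
  amenities = c('{"Wireless Internet",Kitchen}', '{TV,Internet,Kitchen}'),
  stringsAsFactors = FALSE
)

amenities_split  <- strsplit(gsub('[{}"]', '', df$amenities), ",", fixed = TRUE)
amenities_unique <- unique(unlist(amenities_split))

# For each row, count every known amenity; sapply returns amenities x rows,
# so transpose to get one row per observation
indicators <- t(sapply(amenities_split, function(x)
  table(factor(x, levels = amenities_unique))))

result <- cbind(df, indicators)  # column names with spaces are kept as-is
result
```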
Chris Ruehlemann, 26.04.2020
You can do:
amenities_clean <- gsub('[{}"]', '', df$amenities) # remove unwanted stuff
amenities_unique <- unique(unlist(strsplit(amenities_clean, ","))) # get a list of unique amenities
df[amenities_unique] <- NA # set up the columns for each amenity
- Chris Ruehlemann, 19.04.2020