Pengoptimalan Kueri Fungsi Jendela
Pada proyek baru-baru ini, saya perlu membuat fitur untuk membandingkan waktu penanganan rata-rata seorang karyawan dengan waktu penanganan rata-rata keseluruhan kasus dalam antrian.
Setelah sedikit berpikir dan meneliti, saya mendapat pertanyaan berikut:
SELECT t.*, AVG(handle_time_secs) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(DATE(assigned_at)) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS queue_aht_30days, AVG(handle_time_secs) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(DATE(assigned_at)) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS emp_queue_aht_30days FROM project.dataset.table t
Namun, saat saya mencoba menjalankannya di BigQuery, saya mendapat pengecualian berikut:
Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 158% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 98% other/unattributed: 2%
Manajer saya menyarankan agar saya mengurangi jumlah baris yang perlu diulang oleh fungsi jendela dengan terlebih dahulu menggabungkan data berdasarkan hari.
Kueri berikut pertama-tama memperoleh tanggal stempel waktu saat kasus pertama kali ditugaskan ke karyawan dan kemudian menghitung total dan rata-rata waktu penanganan kasus per hari.
WITH features_daily_emp_queue AS ( SELECT queue_name, emp_id, DATE(assigned_at) AS assigned_at_date, avg(handle_time_secs) emp_queue_1day_aht, sum(handle_time_secs) emp_queue_1day_total_ht, COUNT(1) emp_queue_1day_cases FROM project.dataset.table GROUP BY queue_name, emp_id, assigned_at_date )
Lalu, daripada menggunakan fungsi `AVG`, kita dapat menghitung rata-ratanya dengan menjumlahkan total waktu penanganan setiap hari dan membaginya dengan jumlah jumlah kasus yang ditugaskan setiap hari.
SELECT (SUM(emp_queue_1day_total_ht) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) / (SUM(emp_queue_1day_cases) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS queue_aht_30days FROM features_daily_emp_queue
Kita bisa mengulangi prosesnya; menambahkan ID karyawan ke partisi.
SELECT (SUM(emp_queue_1day_total_ht) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) / (SUM(emp_queue_1day_touches) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS emp_queue_aht_30days FROM features_daily_emp_queue
Terakhir, kami menggabungkan semuanya dan menggabungkan fitur-fiturnya kembali ke tabel asli menggunakan ID karyawan dan tanggal yang ditetapkan.
WITH features_daily_emp_queue AS ( SELECT queue_name, emp_id, DATE(assigned_at) AS assigned_at_date, avg(handle_time_secs) emp_queue_1day_aht, sum(handle_time_secs) emp_queue_1day_total_ht, COUNT(1) emp_queue_1day_cases FROM project.dataset.table GROUP BY queue_name, emp_id, assigned_at_date ), features_daily_emp_queue_window AS ( SELECT queue_name, emp_id, assigned_at_date, (SUM(emp_queue_1day_total_ht) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) / SUM(emp_queue_1day_cases) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS queue_aht_30days, (SUM(emp_queue_1day_total_ht) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) / SUM(emp_queue_1day_cases) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS emp_queue_aht_30days ) SELECT t.*, queue_aht_30days, emp_queue_aht_30days FROM project.dataset.table t LEFT JOIN features_daily_emp_queue_window f ON t.emp_id = f.emp_id AND DATE(t.assigned_at) = f.assigned_at_date
Berbeda dengan kueri awal, kueri sebelumnya dijalankan hingga selesai.