Pengoptimalan Kueri Fungsi Jendela

Pada proyek baru-baru ini, saya perlu membuat fitur untuk membandingkan waktu penanganan rata-rata seorang karyawan dengan waktu penanganan rata-rata keseluruhan kasus dalam antrian.

Setelah sedikit berpikir dan meneliti, saya mendapat pertanyaan berikut:

SELECT
 t.*,
 AVG(handle_time_secs) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(DATE(assigned_at)) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS queue_aht_30days,
 AVG(handle_time_secs) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(DATE(assigned_at)) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS emp_queue_aht_30days
FROM project.dataset.table t

Namun, saat saya mencoba menjalankannya di BigQuery, saya mendapat pengecualian berikut:

Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 158% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 98% other/unattributed: 2%

Manajer saya menyarankan agar saya mengurangi jumlah baris yang perlu diulang oleh fungsi jendela dengan terlebih dahulu menggabungkan data berdasarkan hari.

Kueri berikut pertama-tama memperoleh tanggal stempel waktu saat kasus pertama kali ditugaskan ke karyawan dan kemudian menghitung total dan rata-rata waktu penanganan kasus per hari.

WITH features_daily_emp_queue AS (
 SELECT
    queue_name,
    emp_id,
    DATE(assigned_at) AS assigned_at_date,
    avg(handle_time_secs) emp_queue_1day_aht,
    sum(handle_time_secs) emp_queue_1day_total_ht,
    COUNT(1) emp_queue_1day_cases
 FROM project.dataset.table
 GROUP BY
    queue_name,
    emp_id,
    assigned_at_date
)

Lalu, daripada menggunakan fungsi `AVG`, kita dapat menghitung rata-ratanya dengan menjumlahkan total waktu penanganan setiap hari dan membaginya dengan jumlah jumlah kasus yang ditugaskan setiap hari.

SELECT
 (SUM(emp_queue_1day_total_ht) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) / (SUM(emp_queue_1day_cases) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS queue_aht_30days
FROM features_daily_emp_queue

Kita bisa mengulangi prosesnya; menambahkan ID karyawan ke partisi.

SELECT
   (SUM(emp_queue_1day_total_ht) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) /
              (SUM(emp_queue_1day_touches) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS emp_queue_aht_30days
FROM features_daily_emp_queue

Terakhir, kami menggabungkan semuanya dan menggabungkan fitur-fiturnya kembali ke tabel asli menggunakan ID karyawan dan tanggal yang ditetapkan.

WITH features_daily_emp_queue AS (
       SELECT
           queue_name,
           emp_id,
           DATE(assigned_at) AS assigned_at_date,
           avg(handle_time_secs) emp_queue_1day_aht,
           sum(handle_time_secs) emp_queue_1day_total_ht,
           COUNT(1) emp_queue_1day_cases
       FROM project.dataset.table
       GROUP BY
           queue_name,
           emp_id,
           assigned_at_date
   ),
   features_daily_emp_queue_window AS (
       SELECT
           queue_name,
           emp_id,
           assigned_at_date,
           (SUM(emp_queue_1day_total_ht) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) /
              SUM(emp_queue_1day_cases) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS queue_aht_30days,
               (SUM(emp_queue_1day_total_ht) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) /
              SUM(emp_queue_1day_cases) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS emp_queue_aht_30days
   )
SELECT
   t.*,
   queue_aht_30days,
   emp_queue_aht_30days
FROM project.dataset.table t
   LEFT JOIN features_daily_emp_queue_window f
    ON t.emp_id = f.emp_id
     AND DATE(t.assigned_at) = f.assigned_at_date

Berbeda dengan kueri awal, kueri sebelumnya dijalankan hingga selesai.