1661. Average Time of Process per Machine

题目链接：https://leetcode.cn/problems/average-time-of-process-per-machine/description/?envType=study-plan-v2&envId=sql-free-50

Example 1:

Input:
Activity table:
+————+————+—————+———–+
| machine_id | process_id | activity_type | timestamp |
+————+————+—————+———–+
| 0 | 0 | start | 0.712 |
| 0 | 0 | end | 1.520 |
| 0 | 1 | start | 3.140 |
| 0 | 1 | end | 4.120 |
| 1 | 0 | start | 0.550 |
| 1 | 0 | end | 1.550 |
| 1 | 1 | start | 0.430 |
| 1 | 1 | end | 1.420 |
| 2 | 0 | start | 4.100 |
| 2 | 0 | end | 4.512 |
| 2 | 1 | start | 2.500 |
| 2 | 1 | end | 5.000 |
+————+————+—————+———–+
Output:
+————+—————–+
| machine_id | processing_time |
+————+—————–+
| 0 | 0.894 |
| 1 | 0.995 |
| 2 | 1.456 |
+————+—————–+
Explanation:
There are 3 machines running 2 processes each.
“
Machine 0’s average time is ((1.520 - 0.712) + (4.120 - 3.140)) / 2 = 0.894
Machine 1’s average time is ((1.550 - 0.550) + (1.420 - 0.430)) / 2 = 0.995
Machine 2’s average time is ((4.512 - 4.100) + (5.000 - 2.500)) / 2 = 1.456
“

Write your MySQL query statement below 1

SELECT machine_id,
ROUND(SUM(IF(activity_type=”start”,-timestamp,timestamp)) / COUNT(DISTINCT process_id) , 3) AS processing_time
FROM Activity
GROUP BY machine_id;

使用 IF(activity_type=”start”, -timestamp, timestamp) 将 start 时间取负值，而 end 时间保持正值。
通过 SUM() 累加所有 timestamp，由于 start 和 end 是成对出现的，最终得到的是每个 process_id 的时间差总和。
COUNT(DISTINCT process_id) 计算每台机器中唯一的 process_id 数量。
SUM / COUNT 求得每台机器的平均处理时间，最终用 ROUND 保留 3 位小数。
优点：
简洁清晰，逻辑简单，直接按列运算，不需要表连接。
不依赖数据的行数或顺序，具有较好的可读性。
缺点：
依赖于 IF 判断，将正负时间差叠加在一起，这种方式在数据量非常大时可能会有一定的性能开销。
假如存在非成对的 start 和 end 记录（例如，缺少 start 或 end），结果可能会不准确。

Write your MySQL query statement below 2

select machine_id,
round(
(sum(case activity_type when ‘end’ then timestamp else 0 end)
+ sum(case activity_type when ‘start’ then -timestamp else 0 end))
/(count(*)/2),3) processing_time
from Activity
group by machine_id

解释：
使用 CASE 条件判断，将 end 的 timestamp 累加，将 start 的 timestamp 取负值。
SUM() 计算所有 timestamp 的加和，得到每台机器的所有时间差。
COUNT() / 2 假设每个 process_id 有两条记录（start 和 end），因此通过总行数除以 2 来得到进程数。
最后计算平均时间差并用 ROUND 保留 3 位小数。
优点：
使用 COUNT() / 2 来推断进程数量，避免了 COUNT(DISTINCT process_id) 的开销。
表达式对数据规律性（成对出现）进行了假设，适用于严格成对的情况。
缺点：
假如数据存在不完整的 start 或 end 记录，COUNT(*) / 2 的假设将导致错误的计算结果。
逻辑稍显复杂，尤其在理解 CASE 和 COUNT 的作用时可能不直观。

Write your MySQL query statement below 3

SELECT
a.machine_id,
ROUND(
SUM(b.timestamp - a.timestamp) / COUNT(a.machine_id = b.machine_id) , 3)as processing_time
FROM
Activity a
LEFT JOIN
Activity b
ON
a.machine_id = b.machine_id
AND
a.activity_type = ‘start’
AND
b.activity_type = ‘end’
AND
a.process_id = b.process_id
WHERE
b.machine_id is NOT NULL
GROUP BY
a.machine_id

解释：
将表 Activity 自连接，通过 LEFT JOIN 连接 start 和 end 的记录。
ON 条件：匹配相同的 machine_id 和 process_id，并且 a 为 start，b 为 end。
SUM(b.timestamp - a.timestamp) 累加每个 process_id 的时间差。
COUNT(a.machine_id = b.machine_id) 计算有效的匹配对数量（即 start 和 end 配对的数量）。
计算每台机器的平均处理时间并用 ROUND 保留 3 位小数。
优点：
严格基于成对的 start 和 end 记录，不会受不完整记录的影响。
数据更精确，适合处理复杂或不完全成对的数据场景。
缺点：
使用 JOIN 操作，性能开销较大，特别是在数据量很大的情况下。
写法相对复杂，可读性较低。

machine_id	process_id	activity_type	timestamp	machine_id	process_id	activity_type	timestamp
0	0	start	0.712	0	0	end	1.52
0	0	end	1.52	null	null	null	null
0	1	start	3.14	0	1	end	4.12
0	1	end	4.12	null	null	null	null
1	0	start	0.55	1	0	end	1.55
1	0	end	1.55	null	null	null	null
1	1	start	0.43	1	1	end	1.42
1	1	end	1.42	null	null	null	null
2	0	start	4.1	2	0	end	4.512
2	0	end	4.512	null	null	null	null
2	1	start	2.5	2	1	end	5
2	1	end	5	null	null	null	null

machine_id	process_id	activity_type	timestamp	machine_id	process_id	activity_type	timestamp
0	0	start	0.712	0	0	end	1.52
0	1	start	3.14	0	1	end	4.12
1	0	start	0.55	1	0	end	1.55
1	1	start	0.43	1	1	end	1.42
2	0	start	4.1	2	0	end	4.512
2	1	start	2.5	2	1	end	5

leetcode算法题

#SQL

1661. Average Time of Process per Machine

https://pisces34.github.io/2024/11/18/leetcode/1661/

发布于

2024年11月18日

许可协议

1446. 连续字符上一篇

66. 加一下一篇