'Create a user defined function that works with GROUP BY in mysql

I'm trying to create an aggregated function MEDIAN() in MySQL like MIN(), MAX(), AVG() which takes the input the column name or string that has concatenated values of the desired column.

I'm having trouble understanding the limitations of MySQL custom functions & would be really helpful if some can help me find out how this is done.

Example:

MySQL table has 2 columns (ID, num)

+----+-----+
| id | num |
+----+-----+
|  1 |   5 |
|  1 |   6 |
|  1 |   7 |
|  2 |   1 |
|  2 |   3 |
|  2 |   5 |
+----+-----+

SELECT id, MEDIAN(num) as median
FROM table
GROUP BY id;

OR

SELECT id, MEDIAN(GROUP_CONCAT(num SEPARATOR ',') as median
FROM table
GROUP BY id;

Expected Output is

+----+--------+
| id | median |
+----+--------+
|  1 |      6 |
|  2 |      3 |
+----+--------+


Solution 1:[1]

User defined aggregate stored functions were added in MariaDB-10.3.3

MySQL can do aggregate functions however not in SQL. They need a UDF (shared library implemenation)

Solution 2:[2]

EDIT: I am aware that this answer does not directly address the question, since the question is "how to create an aggregate median function in mySQL" and my answer specifically says how to do it without a UDF.

However, the accepted answer says that it is not possible in mySQL, so I gave a solution that would address the aggregate median ability without having to use a UDF, in case someone might want to calculate the aggregate medians anyway.


It is possible to do without a UDF, and I know of two ways to do it. The first uses two selects and a join, the first select to get the values and rankings, and the second select to get the counts, then joins them. The second uses json functions to get everything in one select. They are both a little lengthy, but they work and are reasonably fast.

SOLUTION #1 (two selects and a join, one to get counts, one to get rankings)

SELECT  x.group_field, 
        avg(
            if( 
                x.rank - y.vol/2 BETWEEN 0 AND 1, 
                value_field, 
                null
            )
        ) as median
FROM (
    SELECT  group_field, value_field, 
            @r:= IF(@current=group_field, @r+1, 1) as rank, 
            @current:=group_field
    FROM (
        SELECT group_field, value_field
        FROM table_name
        ORDER BY group_field, value_field
    ) z, (SELECT @r:=0, @current:='') v
) x, (
    SELECT group_field, count(*) as vol 
    FROM table_name
    GROUP BY group_field
) y WHERE x.group_field = y.group_field
GROUP BY x.group_field;

SOLUTION #2 (uses a json object to store the counts and avoids the join)

SELECT group_field, 
    avg(
        if(
            rank - json_extract(@vols, path)/2 BETWEEN 0 AND 1,
            value_field,
            null
        )
    ) as median
FROM (
    SELECT group_field, value_field, path, 
        @rnk := if(@curr = group_field, @rnk+1, 1) as rank,
        @vols := json_set(
            @vols, 
            path, 
            coalesce(json_extract(@vols, path), 0) + 1
        ) as vols,
        @curr := group_field
    FROM (
        SELECT p.group_field, p.value_field, concat('$.', p.group_field) as path
        FROM table_name
        JOIN (SELECT @curr:='', @rnk:=1, @vols:=json_object()) v
        ORDER BY group_field, value_field DESC
    ) z
) y GROUP BY group_field;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2