'Handling dynamic (user supplied) column names

When writing applications that manage data, it is often useful to allow the end user to create or remove classes of data that are best represented as columns. For example, I'm working on a dictionary building application; a user might decide they want to add, say, an "alternate spelling" field or something to data, which could be very easily represented as another column.

Usually, I just name the column based on whatever the user called it ("alternate_spelling" in this case); however, a user-defined string that isn't explicitly sanitized as a database identifier bothers me. Since column names can't be bound like values, I'm trying to figure out how to sanitize the column names.

So my question is: what should I be doing? Can I get away with just quoting things? There's lots of questions asking how to bind column names in SQL, and many responses saying one should never need to, but never explaining the correct approach to handling variable columns. I'm working in Python specifically, but I think this question is more general.



Solution 1:[1]

It depands on which database you are using...

According to PostgreSQL: "SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable"

(Keep also in mind: maximum length allowed for the name)

Solution 2:[2]

I was looking for something like this. I still wouldn't trust it with user-supplied names - I'd look those up from the database catalog instead, but I think it is robust enough to check data that is provided from your backend.

i.e. Just because something comes from your internal data tables or yaml config files doesn't 100% mean that an attacker couldn't have hacked into those sources, so why not add another layer right before composing sql queries?

This is for postgresql but mostly should work on something else. No, it doesn't cover ALL possible characters for naming columns and tables, only those used in my databases.


class SecurityException(Exception):
    """concerns security"""

class UnsafeSqlException(SecurityException):
    """ sql fragments looks unsafe """


def is_safe_sql_name(sql : str, error_on_empty : bool = False, raise_on_false : bool = True) -> bool :
    """check that something looks like an object name"""
    patre = re.compile("^[a-z][a-z0-9_]{0,254}$",re.IGNORECASE)

    if not isinstance(sql, str):
        raise TypeError(f"sql should be a string {sql=}")

    if not sql:
        if error_on_empty:
            raise ValueError(f"empty sql {sql=}")
        return False

    res = bool(patre.match(sql))
    if not res and raise_on_false:
        raise UnsafeSqlException(f"{sql=}")
    return res

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jan M.
Solution 2 JL Peyret