Is it possible to have a union of URL and FilePath in Pydantic?

I wonder if this would somehow be possible:

class Picture(BaseModel):
    src: Union[FilePath, stricturl(allowed_schemes=["https"])]

I have this test, which is failing, I think because Pydantic tries to apply FilePath to the URL. I could reverse the order, but that makes no difference, because then file paths would be parsed as URLs. I know this is tricky; I just wonder why a URL (https protocol etc.) is recognized as an internal file path.

    def test_parse_component_chapter_optional_picture_src_accepts_url():
        err_msg = ("Error in /chapter: URL scheme not permitted:"
                   " \"picture->src: ... \"")
        chapter, err = parse_component_chapter({
            "picture": {
                "src": "http://www.robingruenke.com",
                "height": "250px"
            }
        })
        assert err == err_msg and chapter is None

Maybe there is a solution for this?



Solution 1:[1]

The issue with Unions is unexpected coercion: the value is forced into the first type that can handle it. From the Pydantic docs on Unions:

class User(BaseModel):
    id: Union[int, str, UUID]
    name: str

... pydantic will attempt to 'match' any of the types defined under Union and will use the first one that matches. In the above example the id of user_03 was defined as a uuid.UUID class (which is defined under the attribute's Union annotation) but as the uuid.UUID can be marshalled into an int it chose to match against the int type and disregarded the other types.
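The left-to-right matching described in that quote can be imitated with the standard library alone. The sketch below is not Pydantic's implementation; first_match is a hypothetical helper that tries each type in order and keeps the first one that accepts the value, which is enough to show why a UUID ends up as an int.

```python
from uuid import UUID


def first_match(value, candidates):
    """Hypothetical helper: coerce value with the first type that accepts it."""
    for typ in candidates:
        if isinstance(value, typ):
            return value  # already the right type, nothing to coerce
        try:
            return typ(value)
        except (TypeError, ValueError, AttributeError):
            continue  # this type rejected the value, try the next one
    raise ValueError(f"no candidate type accepted {value!r}")


user_id = UUID("cf57432e-809e-4353-adbd-9d5c0d733868")

# int is listed first and UUID defines __int__, so int "wins" and the UUID type is lost.
as_first = first_match(user_id, [int, str, UUID])

# Listing UUID first keeps the value as a UUID.
as_uuid = first_match(user_id, [UUID, int, str])

print(type(as_first).__name__, type(as_uuid).__name__)
```

The order of the candidates decides the outcome, which is the same reason reordering FilePath and the URL type only moves the problem around.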

It makes sense for you to expect that a "http://..." URL wouldn't match FilePath, but Pydantic's FilePath is just:

like Path, but the path must exist and be a file

and Path is just the Python standard library type pathlib.Path, and Pydantic just:

simply uses the type itself for validation by passing the value to Path(v)

Checking pathlib.Path behavior directly, it does accept an HTTP URL:

In [28]: from pathlib import Path

In [29]: Path("http://www.robingruenke.com")
Out[29]: PosixPath('http:/www.robingruenke.com')

The actual error raised by FilePath is only that the path must exist. Your test expects a single error about the wrong URL scheme ("URL scheme not permitted"), but Pydantic in fact raises 2 validation errors:

In [26]: class Picture(BaseModel):
    ...:     src: Union[FilePath, stricturl(allowed_schemes=["https"])]
    ...: 

In [27]: pic = Picture(src="http://www.robingruenke.com")
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Input In [27], in <cell line: 1>()
----> 1 pic = Picture(src="http://www.robingruenke.com")

File ~/path/to/venv/lib/python3.9/site-packages/pydantic/main.py:331, in pydantic.main.BaseModel.__init__()

ValidationError: 2 validation errors for Picture
src
  file or directory at path "http:/www.robingruenke.com" does not exist (type=value_error.path.not_exists; path=http:/www.robingruenke.com)
src
  URL scheme not permitted (type=value_error.url.scheme; allowed_schemes={'https'})

1 error from FilePath, that the local file doesn't exist.
1 error from stricturl, that the scheme "http" is not allowed.

That would cause your assert err == err_msg to fail, because you actually have to check for 2 sets of validation errors, and the 1st error isn't even about the wrong URL scheme.

A workaround I would suggest, instead of a Union of different types, is to implement your own validator for the src field. It seems the requirement is that src should be either a locally existing file path or an HTTPS URL. You can use the urllib.parse standard library module for parsing URLs.
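As a quick illustration of why urlparse helps here, the sketch below parses one URL and one local path and prints the parts. The /images/logo.png path is made up for the example.

```python
from urllib.parse import urlparse

url = urlparse("https://www.robingruenke.com/images/logo.png")
local = urlparse("/local/directory/test.png")

# A URL has a non-empty scheme and host; its path is only the part after the host.
print(url.scheme, url.netloc, url.path)

# A plain file path has no scheme and no host; the whole string is the path.
print(repr(local.scheme), repr(local.netloc), local.path)
```

Because the path of a URL refers to the remote server, an existence check with pathlib only makes sense when the scheme is empty.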

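from pathlib import Path
from typing import Any
from urllib.parse import urlparse

import pytest  # used by the tests below
from pydantic import BaseModel, ValidationError, validator  # Pydantic v1 API

class Picture(BaseModel):
    src: str

    @validator("src", pre=True)
    def validate_src(cls, value: Any) -> str:
        url_parts = urlparse(str(value))

        if url_parts.scheme:
            # Anything with a scheme is treated as a URL: only HTTPS is allowed.
            is_ok = url_parts.scheme == "https"
        else:
            # No scheme: treat the value as a local path, which must exist.
            is_ok = Path(url_parts.path).exists()

        if not is_ok:
            raise ValueError("src must be either existing local file path or HTTPS URL")

        return value
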
def test_src_is_local_path_exists():
    pic = Picture(src="/local/directory/test.png")
    assert pic.src

def test_src_is_local_path_does_not_exist():
    with pytest.raises(ValidationError) as exc:
        Picture(src="/file/that/does/not/exist")

    assert exc.value.errors()[0]["msg"] == "src must be either existing local file path or HTTPS URL"

def test_src_is_url_is_https():
    pic = Picture(src="https://www.robingruenke.com")
    assert pic.src

def test_src_is_url_is_not_https():
    with pytest.raises(ValidationError) as exc:
        Picture(src="http://www.robingruenke.com")

    assert exc.value.errors()[0]["msg"] == "src must be either existing local file path or HTTPS URL"
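The first test above only passes if /local/directory/test.png exists on the machine running it. One way around that, sketched here with the standard library only, is to create a temporary file. src_is_acceptable is a hypothetical stand-alone function that mirrors the rule the validator is meant to enforce (HTTPS URL or existing local path); it is an illustration, not part of Pydantic, and it assumes POSIX-style paths, since a Windows drive letter such as C: may be read as a scheme by urlparse.

```python
import os
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def src_is_acceptable(value: str) -> bool:
    """Hypothetical helper: True for an HTTPS URL or an existing local path."""
    parts = urlparse(value)
    if parts.scheme:
        return parts.scheme == "https"
    return Path(parts.path).exists()


# Create a real file so the "existing path" case does not depend on the machine.
with tempfile.TemporaryDirectory() as tmp_dir:
    existing = os.path.join(tmp_dir, "test.png")
    Path(existing).touch()
    missing = os.path.join(tmp_dir, "missing.png")

    results = {
        "existing file": src_is_acceptable(existing),
        "missing file": src_is_acceptable(missing),
        "https url": src_is_acceptable("https://www.robingruenke.com"),
        "http url": src_is_acceptable("http://www.robingruenke.com"),
    }

print(results)
```

The same temporary-file approach could be used inside the Picture tests, for example through pytest's tmp_path fixture.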

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Gino Mempin