Ubuntu systemd-resolve not using correct DNS server for certain domain


I've been setting up a Nomad cluster together with Consul and ran into a problem with the Consul DNS service running on 127.0.0.1:8600.
I'm using systemd-resolved and iptables to forward Consul requests as described in the official documentation, so my resolved.conf files look like this:

[Resolve]
DNS=127.0.0.1
Domains=~consul

[Resolve]
DNS=111.152.1.1 111.152.1.5
#FallbackDNS=
Domains=example.com
#LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#DNSOverTLS=no
#Cache=no-negative
#DNSStubListener=yes
#ReadEtcHosts=yes
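
To make the combined effect of the two files concrete, here is a minimal Python sketch that merges their [Resolve] settings. It assumes that list-valued settings such as DNS= and Domains= accumulate across files and that an empty assignment resets the list, which is consistent with the Global section of the systemd-resolve --status output further down. The function name and the order in which the fragments are read are assumptions made for illustration; this is not how systemd itself is implemented.

```python
def merge_resolve_fragments(fragments):
    """Merge DNS= and Domains= from several resolved.conf-style fragments.

    Simplified model: values are appended in the order the fragments are
    given, and an empty assignment (e.g. "DNS=") clears the list.
    """
    merged = {"DNS": [], "Domains": []}
    for text in fragments:
        for raw in text.splitlines():
            line = raw.strip()
            # Skip blanks, comments, and section headers such as [Resolve].
            if not line or line.startswith(("#", ";", "[")):
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            if key not in merged:
                continue
            values = value.split()
            if values:
                merged[key].extend(values)
            else:
                merged[key] = []
    return merged


# The two fragments from the question, uncommented lines only.
CONSUL_FRAGMENT = """[Resolve]
DNS=127.0.0.1
Domains=~consul
"""

MAIN_FRAGMENT = """[Resolve]
DNS=111.152.1.1 111.152.1.5
Domains=example.com
"""

effective = merge_resolve_fragments([CONSUL_FRAGMENT, MAIN_FRAGMENT])
print(effective)
```

Under these assumptions all three servers and both domains end up in one global configuration, so ~consul is not tied specifically to 127.0.0.1.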

After restarting resolved and networkd, the active DNS server is set to localhost, and queries such as host active.vault.service.dc1.consul are sent correctly to the Consul DNS service:

Looking up RR for active.vault.service.dc1.consul IN A.
Switching to DNS server 111.152.1.5 for interface ens192.
Switching to system DNS server 127.0.0.1.
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/resolve1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=5 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Cache miss for active.vault.service.dc1.consul IN A
Transaction 47973 for <active.vault.service.dc1.consul IN A> scope dns on */*.
Using feature level UDP+EDNS0 for transaction 47973.
Using DNS server 127.0.0.1 for transaction 47973.
Sending query packet with id 47973.
Processing query...
Processing incoming packet on transaction 47973 (rcode=SUCCESS).
Verified we get a response at feature level UDP+EDNS0 from DNS server 127.0.0.1.
Transaction 47973 for <active.vault.service.dc1.consul IN A> on scope dns on */* now complete with <success> from network (unsigned).
Sending response packet with id 7470 on interface 1/AF_INET.
Freeing transaction 47973.

Running host example.com switches the current DNS server to 111.152.1.1 and also works correctly:

Looking up RR for example.com IN A.
Cache miss for example.com IN A
Transaction 4122 for <example.com IN A> scope dns on */*.
Using feature level UDP+EDNS0 for transaction 4122.
Using DNS server 127.0.0.1 for transaction 4122.
Sending query packet with id 4122.
Cache miss for example.com IN A
Transaction 7037 for <example.com IN A> scope dns on ens192/*.
Using feature level UDP+EDNS0 for transaction 7037.
Using DNS server 111.152.1.5 for transaction 7037.
Sending query packet with id 7037.
Processing query...
Processing incoming packet on transaction 4122 (rcode=REFUSED).
Server returned REFUSED, switching servers, and retrying.
Retrying transaction 4122.
Switching to system DNS server 111.152.1.1.
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/resolve1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=6 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Cache miss for example.com IN A
Transaction 4122 for <example.com IN A> scope dns on */*.
Using feature level UDP+EDNS0 for transaction 4122.
Using DNS server 111.152.1.1 for transaction 4122.
Sending query packet with id 4122.
Processing incoming packet on transaction 7037 (rcode=SUCCESS).
Verified we get a response at feature level UDP+EDNS0 from DNS server 111.152.1.5.
Added positive unauthenticated cache entry for example.com IN A 1688s on ens192/INET/111.152.1.5
Transaction 7037 for <example.com IN A> on scope dns on ens192/* now complete with <success> from network (unsigned).
Freeing transaction 4122.

However, when I run host active.vault.service.dc1.consul again, it is no longer resolved correctly:

Looking up RR for active.vault.service.dc1.consul IN A.
Cache miss for active.vault.service.dc1.consul IN A
Transaction 21878 for <active.vault.service.dc1.consul IN A> scope dns on */*.
Using feature level UDP+EDNS0 for transaction 21878.
Using DNS server 111.152.1.1 for transaction 21878.
Sending query packet with id 21878.
Processing query...
Processing incoming packet on transaction 21878 (rcode=NXDOMAIN).
Server returned error NXDOMAIN in EDNS0 mode, retrying transaction with reduced feature level UDP (DVE-2018-0001 mitigation)
Retrying transaction 21878.
Cache miss for active.vault.service.dc1.consul IN A
Transaction 21878 for <active.vault.service.dc1.consul IN A> scope dns on */*.
Using feature level UDP for transaction 21878.
Sending query packet with id 21878.
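
The three traces above can be reproduced with a toy model. The sketch below assumes that the global scope keeps a single "current server" pointer over one shared server list, that a REFUSED reply moves the pointer to the next server, and that the pointer then stays where it is. The class and the two fake server functions are hypothetical and only mimic the replies seen in the logs; they are not systemd's actual code.

```python
class GlobalScope:
    """Toy model of a DNS scope with one shared 'current server' pointer."""

    def __init__(self, servers, responders):
        self.servers = list(servers)
        self.responders = responders  # maps server address -> fake reply function
        self.current = 0

    def query(self, name):
        # Try each server at most once, starting from the current one.
        for _ in range(len(self.servers)):
            server = self.servers[self.current]
            rcode = self.responders[server](name)
            if rcode == "REFUSED":
                # Switch servers and retry; the switch persists afterwards.
                self.current = (self.current + 1) % len(self.servers)
                continue
            return server, rcode
        return self.servers[self.current], "REFUSED"


def consul_dns(name):
    # Hypothetical stand-in for Consul: answers only .consul names.
    return "SUCCESS" if name.endswith(".consul") else "REFUSED"


def upstream_dns(name):
    # Hypothetical stand-in for the upstream resolvers: .consul is unknown.
    return "NXDOMAIN" if name.endswith(".consul") else "SUCCESS"


scope = GlobalScope(
    ["127.0.0.1", "111.152.1.1", "111.152.1.5"],
    {
        "127.0.0.1": consul_dns,
        "111.152.1.1": upstream_dns,
        "111.152.1.5": upstream_dns,
    },
)

first = scope.query("active.vault.service.dc1.consul")
second = scope.query("example.com")
third = scope.query("active.vault.service.dc1.consul")
print(first, second, third)
```

In this model the second lookup leaves the pointer on 111.152.1.1, so the third lookup never reaches 127.0.0.1, which mirrors the order of events in the logs.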

The output of systemd-resolve --status looks as follows:

Global
       LLMNR setting: no                  
MulticastDNS setting: no                  
  DNSOverTLS setting: no                  
      DNSSEC setting: no                  
    DNSSEC supported: no                  
  Current DNS Server: 111.152.1.1         
         DNS Servers: 127.0.0.1           
                      111.152.1.1         
                      111.152.1.5         
          DNS Domain: ~consul             
                      example.com     
          DNSSEC NTA: 10.in-addr.arpa     
                      16.172.in-addr.arpa 
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa 
                      18.172.in-addr.arpa 
                      19.172.in-addr.arpa 
                      20.172.in-addr.arpa 
                      21.172.in-addr.arpa 
                      22.172.in-addr.arpa 
                      23.172.in-addr.arpa 
                      24.172.in-addr.arpa 
                      25.172.in-addr.arpa 
                      26.172.in-addr.arpa 
                      27.172.in-addr.arpa 
                      28.172.in-addr.arpa 
                      29.172.in-addr.arpa 
                      30.172.in-addr.arpa 
                      31.172.in-addr.arpa 
                      corp                
                      d.f.ip6.arpa        
                      home                
                      internal            
                      intranet            
                      lan                 
                      local               
                      private             
                      test                

Link 3 (docker0)
      Current Scopes: none
DefaultRoute setting: no  
       LLMNR setting: yes 
MulticastDNS setting: no  
  DNSOverTLS setting: no  
      DNSSEC setting: no  
    DNSSEC supported: no  

Link 2 (ens192)
      Current Scopes: DNS          
DefaultRoute setting: yes          
       LLMNR setting: yes          
MulticastDNS setting: no           
  DNSOverTLS setting: no           
      DNSSEC setting: no           
    DNSSEC supported: no           
  Current DNS Server: 111.152.1.5  
         DNS Servers: 111.152.1.5  
                      111.152.1.1  
          DNS Domain: example.com

So my question is: why are requests for the .consul domain not always sent to the local DNS server?



Solution 1:[1]

I'm not sure which systemd version you are running, but there was a bug in systemd's DNS resolver/cache (systemd-resolved) that resulted in errors like:

Transaction 7037 for <example.com IN A> on scope dns on ens192/* now complete with <success> from network (unsigned).

The bug was fixed in a commit on Feb 14, 2021, which was released in systemd v248.
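
To check whether a host already runs a release that contains the fix, the version can be compared against 248. The sketch below assumes that the first line of systemctl --version has the form "systemd <number> (...)". The helper names and the sample version strings are illustrative only, and distributions may backport fixes, so a lower number does not prove the bug is present.

```python
import re
import subprocess


def parse_systemd_version(version_output):
    """Extract the major version from text like 'systemd 245 (245.4-...)'."""
    match = re.match(r"systemd (\d+)", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized version output: {version_output!r}")
    return int(match.group(1))


def has_fix(version_output, fixed_in=248):
    """Return True if the reported version is at least the fixed release."""
    return parse_systemd_version(version_output) >= fixed_in


def installed_systemd_version_output():
    """Run 'systemctl --version' and return its output (needs systemd)."""
    result = subprocess.run(
        ["systemctl", "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Illustrative sample strings, not taken from the question.
print(has_fix("systemd 245 (245.4-4ubuntu3)"))
print(has_fix("systemd 249 (249.11-0ubuntu3)"))
```

On a real machine, has_fix(installed_systemd_version_output()) performs the same check against the installed version.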

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Adam Romanek