LLT-7053: Raw DNS forwarder #1709

Open
tomasz-grz wants to merge 1 commit into main from LLT-7053_raw_forwarder

Conversation

@tomasz-grz
Contributor

@tomasz-grz tomasz-grz commented Mar 12, 2026

Problem

Non-.nord DNS queries are currently forwarded through hickory-server's ForwardAuthority zone. We want to remove the hickory dependencies, as they add unnecessary overhead, require additional maintenance, and have been a source of bugs.

Solution

Add a RawForwarder that sends DNS queries directly to upstream resolvers as UDP packets. The forwarder will be integrated in a subsequent PR; a short usage sketch follows the list below.

  • DNS message IDs are rewritten to internal IDs to support multiple concurrent queries
  • each upstream is tried in order, with a per-query timeout
  • a single socket bound to the tunnel interface is reused for all queries
  • raw bytes are forwarded and returned unchanged
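
A minimal usage sketch, assuming the API exercised by the unit tests in this PR (new, set_upstreams, set_timeout, query); the query return type and the placeholder upstream address are assumptions for illustration, not taken from the code:

use std::time::Duration;

// Sketch only: exact signatures may change when the forwarder is integrated.
async fn resolve(raw_query: &[u8]) -> Result<Vec<u8>, ForwardError> {
    // Binds the single UDP socket that is reused for all queries.
    let forwarder = RawForwarder::new().await?;

    // Upstreams are tried in order; each attempt gets its own timeout window.
    // 192.0.2.53 is a placeholder (TEST-NET-1) address.
    forwarder
        .set_upstreams(vec!["192.0.2.53:53".parse().expect("valid address")])
        .await;
    forwarder.set_timeout(Duration::from_secs(2)).await;

    // The query goes out as-is apart from the internal transaction-ID rewrite;
    // the raw response bytes are returned unchanged.
    forwarder.query(raw_query).await
}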

☑️ Definition of Done checklist

  • Commit history is clean (requirements)
  • README.md is updated
  • Functionality is covered by unit or integration tests

@tomasz-grz tomasz-grz self-assigned this Mar 12, 2026
@tomasz-grz tomasz-grz force-pushed the LLT-7053_raw_forwarder branch from 0ef4b6a to 214691c Compare March 12, 2026 15:34
@tomasz-grz tomasz-grz changed the title LLT-7053: Raw forwarder LLT-7053: Raw DNS forwarder Mar 19, 2026
@tomasz-grz tomasz-grz force-pushed the LLT-7053_raw_forwarder branch from 214691c to 699976a Compare March 24, 2026 16:43
@tomasz-grz tomasz-grz changed the base branch from main to dns_nord_zone March 24, 2026 17:05
@tomasz-grz tomasz-grz force-pushed the LLT-7053_raw_forwarder branch from 7bd6827 to 377c2bc Compare March 24, 2026 17:27
@tomasz-grz tomasz-grz force-pushed the LLT-7053_raw_forwarder branch from 377c2bc to 955a15f Compare March 25, 2026 09:55
@tomasz-grz tomasz-grz force-pushed the LLT-7053_raw_forwarder branch from 955a15f to 170b52a Compare March 25, 2026 14:55
@tomasz-grz tomasz-grz force-pushed the LLT-7053_raw_forwarder branch from 0beecb0 to 87e30d5 Compare March 26, 2026 14:56
@tomasz-grz tomasz-grz force-pushed the LLT-7053_raw_forwarder branch from 87e30d5 to 7a69b3b Compare March 26, 2026 15:42
@tomasz-grz tomasz-grz force-pushed the LLT-7053_raw_forwarder branch from 7a69b3b to 9aa76d9 Compare March 31, 2026 14:57
@tomasz-grz tomasz-grz force-pushed the LLT-7053_raw_forwarder branch from ed2af6b to 44cb4d5 Compare April 1, 2026 12:21
Base automatically changed from dns_nord_zone to main April 2, 2026 08:54
Add component that forwards raw DNS queries to upstream resolvers over UDP socket
@tomasz-grz tomasz-grz force-pushed the LLT-7053_raw_forwarder branch from 0621e15 to e4660a6 Compare May 4, 2026 13:34
@tomasz-grz tomasz-grz marked this pull request as ready for review May 4, 2026 13:35
@tomasz-grz tomasz-grz requested a review from a team as a code owner May 4, 2026 13:35
Comment on lines +38 to +62
/// Errors returned when forwarding a DNS query
#[derive(Error, Debug)]
pub enum ForwardError {
    /// Failed upstream socket bind operation
    #[error("Failed socket bind operation: {0}")]
    SocketBind(#[from] io::Error),
    /// Failed to send a DNS query to the upstream resolver
    #[error("Failed to send DNS query: {0}")]
    Send(io::Error),
    /// No upstreams configured
    #[error("No upstream resolvers configured")]
    NoUpstreams,
    /// The upstream resolvers did not respond within the configured timeout
    #[error("DNS query timed out")]
    Timeout,
    /// The forwarder channel was closed
    #[error("Forwarder channel closed")]
    ChannelClosed,
    /// Too many concurrent requests
    #[error("Too many concurrent requests in flight")]
    TooManyRequests,
    /// The DNS packet is too short
    #[error("DNS packet too short")]
    PacketTooShort,
}
Contributor

nit: SocketBind and Send read like operation names, while the rest of the variants (NoUpstreams, Timeout, ChannelClosed, TooManyRequests, PacketTooShort) describe an error state. Consider renaming them for consistency.

Contributor Author

Good idea

Comment on lines +301 to +312
let first_upstream = {
    let locked = upstreams.lock().await;
    locked.first().cloned()
};

let upstream_addr = match first_upstream {
    Some(addr) => addr,
    None => {
        send_channel_response!(msg.respond_to, Err(ForwardError::NoUpstreams));
        return;
    }
};
Contributor

nit: Consider this:

Suggested change

let first_upstream = {
    let locked = upstreams.lock().await;
    locked.first().cloned()
};
let upstream_addr = match first_upstream {
    Some(addr) => addr,
    None => {
        send_channel_response!(msg.respond_to, Err(ForwardError::NoUpstreams));
        return;
    }
};

let Some(upstream_addr) = upstreams.lock().await.first().cloned() else {
    send_channel_response!(msg.respond_to, Err(ForwardError::NoUpstreams));
    return;
};

Contributor Author

Would this still hold the lock on upstreams while doing send_channel_response?

}

/// Handle new query
async fn handle_new_query(
Contributor

nit: Consider extracting the body into a Result-returning helper (e.g. prepare_query) and dispatching the result once at the end. The current shape repeats the match { Ok(v) => v, Err(e) => { send_channel_response!(...); return; } } pattern four times; collapsing them with ? would significantly shorten the function. But it is readable as it is, so this is just a nitpick; I'm not sure it would remain as readable after the change.
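
A standalone sketch of the suggested shape (the helper name prepare_query, the PreparedQuery type, and the simplified error reporting are illustrative, not the PR's actual code):

#[derive(Debug)]
enum PrepareError {
    PacketTooShort,
    TooManyRequests,
}

struct PreparedQuery {
    internal_id: u16,
    bytes: Vec<u8>,
}

// All fallible preparation steps collapse into `?`-style early exits here...
fn prepare_query(packet: &[u8], free_id: Option<u16>) -> Result<PreparedQuery, PrepareError> {
    if packet.len() < 12 {
        return Err(PrepareError::PacketTooShort);
    }
    let internal_id = free_id.ok_or(PrepareError::TooManyRequests)?;
    Ok(PreparedQuery {
        internal_id,
        bytes: packet.to_vec(),
    })
}

// ...so the caller dispatches the outcome once, instead of repeating the
// match { Ok => .., Err => { respond; return; } } block for every step.
fn handle_new_query(packet: &[u8], free_id: Option<u16>) {
    match prepare_query(packet, free_id) {
        Ok(q) => println!("forwarding query {} ({} bytes)", q.internal_id, q.bytes.len()),
        Err(e) => eprintln!("responding with error: {e:?}"),
    }
}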

Comment on lines +377 to +381
let expired_ids: Vec<(u16, bool)> = pending
    .iter()
    .filter(|(_, entry)| entry.deadline <= now || entry.respond_to.is_closed())
    .map(|(&id, entry)| (id, entry.respond_to.is_closed()))
    .collect();
Contributor

nit: is_closed() is called twice per entry — once in filter, once in map. A single-pass filter_map is possible:

let expired_ids: Vec<(u16, bool)> = pending
    .iter()
    .filter_map(|(&id, entry)| {
        let closed = entry.respond_to.is_closed();
        (entry.deadline <= now || closed).then_some((id, closed))
    })
    .collect();

Comment on lines +224 to +231
let is_known_upstream = {
    let locked = upstreams.lock().await;
    locked.iter().any(|u| u.ip() == src.ip())
};
if !is_known_upstream {
    telio_log_warn!("Received DNS response from unknown source: {src}, ignoring");
    continue;
}
Contributor

nit: this branch has no test

Contributor Author

@tomasz-grz tomasz-grz May 6, 2026

It's not so straightforward, since it only checks the IP address, and in unit tests all the injected responses would come from localhost 🤔 It's probably better tested in nat-lab.

Unless we change the check to also include the source port, which would make it stricter. But I don't know how often servers might use ephemeral ports for their responses (for example, for load balancing); I probably need to check this more.
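
For reference, a sketch of the stricter variant discussed above, matching the full ip:port instead of only the IP (hypothetical, not part of this PR):

use std::net::SocketAddr;

// Hypothetical stricter check: accept a response only when its source matches
// a configured upstream's exact ip:port, which also rejects responses sent
// from an upstream's ephemeral port.
fn is_known_upstream(upstreams: &[SocketAddr], src: SocketAddr) -> bool {
    upstreams.iter().any(|u| *u == src)
}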

Comment on lines +125 to +134
fn allocate_id(pending: &HashMap<u16, PendingQuery>, next_id: &mut u16) -> Option<u16> {
    for _ in 0..DNS_ID_SPACE {
        let candidate = *next_id;
        *next_id = next_id.wrapping_add(1);
        if !pending.contains_key(&candidate) {
            return Some(candidate);
        }
    }
    None
}
Contributor

allocate_id walks pending linearly and bails out after a full sweep — correctness depends on IDs being released back into the pool once a query is delivered or times out. That contract is not directly tested anywhere; a regression here (e.g. an entry forgotten in pending on some error path) would show up as a slow leak, not a unit test failure.

Easy smoke test: drive the forwarder through more queries than DNS_ID_SPACE and assert no TooManyRequests. The looping echo stub used by spawn_multi_stub is a good base — spawn_stub exits after one packet so it cannot serve enough queries here.

#[tokio::test]
async fn ids_are_released_after_response() {
    let (addr, _h) = spawn_multi_stub(DNS_ID_SPACE as usize + 1000).await;
    let forwarder = RawForwarder::new().await.unwrap();
    forwarder.set_upstreams(vec![addr]).await;

    for i in 0..(DNS_ID_SPACE as u32 + 1000) {
        let req = make_dns_packet(i as u16, b"x");
        forwarder.query(&req).await.expect("id pool exhausted — leak in `pending`");
    }
}

Comment on lines +369 to +425
/// Handle expired queries
async fn handle_timeouts(
    socket: &UdpSocket,
    upstreams: &Arc<Mutex<Vec<SocketAddr>>>,
    timeout: &Arc<Mutex<Duration>>,
    pending: &mut HashMap<u16, PendingQuery>,
) {
    let now = Instant::now();
    let expired_ids: Vec<(u16, bool)> = pending
        .iter()
        .filter(|(_, entry)| entry.deadline <= now || entry.respond_to.is_closed())
        .map(|(&id, entry)| (id, entry.respond_to.is_closed()))
        .collect();

    if expired_ids.is_empty() {
        return;
    }

    let current_upstreams = {
        let locked = upstreams.lock().await;
        locked.clone()
    };

    for (internal_id, is_closed) in expired_ids {
        let mut entry = match pending.remove(&internal_id) {
            Some(e) => e,
            None => continue,
        };

        if is_closed {
            telio_log_warn!("Caller dropped for: {internal_id}");
            continue;
        }

        let next_index = entry.upstream_index + 1;
        match current_upstreams.get(next_index) {
            Some(&next_upstream) => {
                telio_log_debug!(
                    "Upstream timed out for request: {internal_id}, trying next: {next_upstream}"
                );
                entry.upstream_index = next_index;
                entry.deadline = Instant::now() + *timeout.lock().await;

                if let Err(e) = socket.send_to(&entry.query_bytes, next_upstream).await {
                    send_channel_response!(entry.respond_to, Err(ForwardError::Send(e)));
                    continue;
                }

                pending.insert(internal_id, entry);
            }
            None => {
                telio_log_warn!("All upstreams exhausted for request: {internal_id}");
                send_channel_response!(entry.respond_to, Err(ForwardError::Timeout));
            }
        }
    }
}
Contributor

Bug: upstream_index is captured at submit time but resolved here against current_upstreams, which may have been replaced via set_upstreams while this query was pending. After a swap, entry.upstream_index + 1 can point at an unrelated resolver — possibly the same one we just timed out on.

Suggestion: store a snapshot of the upstream list (or the next SocketAddr directly) inside PendingQuery so retry behavior is independent of mutations to the shared list.

Regression test that fails on the current implementation with count_a == 2 (the retry hits the same blackhole that just timed out):

// Demonstrates that `PendingQuery::upstream_index` is unstable across
// `set_upstreams`: the index is captured at submit time but resolved later
// against the *current* upstream list, so a reorder/replace mid-flight makes
// the retry land on the wrong resolver — possibly the same one we just
// timed out on.
//
// Sequence:
//   1. upstreams = [addr_a]; submit one query → goes to addr_a (index 0).
//   2. mid-flight, swap upstreams to [echo_addr, addr_a].
//   3. timeout fires; retry uses next_index = upstream_index + 1 = 1.
//   4. current_upstreams[1] is addr_a → blackhole gets a SECOND packet.
//
// The unambiguous bug signal is `count_a == 2`. Whether the query should
// ultimately succeed via echo or fail with Timeout depends on the eventual
// fix design (snapshot at submit time vs. "use latest list, skip already-tried").
#[tokio::test]
async fn retry_targets_wrong_upstream_after_list_reorder() {
    use std::sync::atomic::{AtomicUsize, Ordering};

    // Counting blackhole: records every datagram but never sends a reply.
    let socket_a = UdpSocket::bind("127.0.0.1:0").await.unwrap();
    let addr_a = socket_a.local_addr().unwrap();
    let count_a = Arc::new(AtomicUsize::new(0));
    let count_a_task = count_a.clone();
    let _stub_a = tokio::spawn(async move {
        let mut buf = vec![0u8; 4096];
        loop {
            if socket_a.recv_from(&mut buf).await.is_ok() {
                count_a_task.fetch_add(1, Ordering::SeqCst);
            }
        }
    });

    let (echo_addr, _echo) = spawn_stub(StubBehavior::Echo).await;

    let forwarder = RawForwarder::new().await.unwrap();
    forwarder.set_upstreams(vec![addr_a]).await;
    forwarder.set_timeout(Duration::from_millis(100)).await;

    let f = forwarder.clone();
    let query_handle = tokio::spawn(async move {
        let request = make_dns_packet(TEST_PACKET_ID, TEST_DNS_PAYLOAD);
        f.query(&request).await
    });

    tokio::time::sleep(Duration::from_millis(20)).await;
    assert_eq!(count_a.load(Ordering::SeqCst), 1, "initial query did not reach addr_a yet");

    // Reorder upstreams BEFORE timeout fires:
    //   index 0 -> echo_addr
    //   index 1 -> addr_a   <-- this is what handle_timeouts will pick
    forwarder.set_upstreams(vec![echo_addr, addr_a]).await;

    let _result = query_handle.await.unwrap();

    assert_eq!(
        count_a.load(Ordering::SeqCst),
        1,
        "addr_a was retried after upstream list reorder — got {} packets, expected 1",
        count_a.load(Ordering::SeqCst)
    );
}
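
One possible shape for the snapshot fix suggested above (field names and the response-channel type are assumptions, not taken from the PR):

use std::net::SocketAddr;
use std::time::Instant;
use tokio::sync::oneshot;

// Placeholder for the forwarder's actual response type.
type ForwardResult = Result<Vec<u8>, std::io::Error>;

// Hypothetical PendingQuery carrying its own upstream snapshot: retries in
// handle_timeouts index into `upstreams_snapshot` captured at submit time,
// so a concurrent set_upstreams cannot redirect a retry to the resolver that
// just timed out.
struct PendingQuery {
    query_bytes: Vec<u8>,
    respond_to: oneshot::Sender<ForwardResult>,
    deadline: Instant,
    upstreams_snapshot: Vec<SocketAddr>,
    upstream_index: usize,
}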

Comment on lines +97 to +122
/// Extract the 16-bit transaction ID of a DNS packet
fn get_dns_id(packet: &[u8]) -> Result<u16, ForwardError> {
    if packet.len() < DNS_HEADER_OFFSET {
        return Err(ForwardError::PacketTooShort);
    }

    // This is ok because the size is checked above
    #[allow(clippy::indexing_slicing)]
    Ok(u16::from_be_bytes([packet[0], packet[1]]))
}

/// Overwrite the 16-bit transaction ID of a DNS packet
fn set_dns_id(packet: &mut [u8], id: u16) -> Result<(), ForwardError> {
    if packet.len() < DNS_HEADER_OFFSET {
        return Err(ForwardError::PacketTooShort);
    }
    let bytes = id.to_be_bytes();

    // This is ok because the size is checked above
    #[allow(clippy::indexing_slicing)]
    {
        packet[0] = bytes[0];
        packet[1] = bytes[1];
    }
    Ok(())
}
Contributor

Larger refactor option (only worth it if more header inspection is on the horizon — get_flags, is_response, rcode, qdcount, etc.): split validation from access. A validate_dns_header gate validates once and returns a typed reference; getters/setters take the already-validated &[u8; 12], do not check length, do not return Result, do not need #[allow].

fn validate_dns_header(packet: &[u8]) -> Result<&[u8; DNS_HEADER_OFFSET], ForwardError> {
    packet.first_chunk().ok_or(ForwardError::PacketTooShort)
}

fn validate_dns_header_mut(
    packet: &mut [u8],
) -> Result<&mut [u8; DNS_HEADER_OFFSET], ForwardError> {
    packet.first_chunk_mut().ok_or(ForwardError::PacketTooShort)
}

fn get_dns_id(header: &[u8; DNS_HEADER_OFFSET]) -> u16 {
    u16::from_be_bytes([header[0], header[1]])
}

fn set_dns_id(header: &mut [u8; DNS_HEADER_OFFSET], id: u16) {
    let bytes = id.to_be_bytes();
    header[0] = bytes[0];
    header[1] = bytes[1];
}

Indexing [0]/[1] on &[u8; 12] is compile-time safe (the type guarantees the length), so clippy stays quiet without #[allow]. Call sites validate once on entry to handle_new_query / handle_response and then pass the typed header around.

Trade-off: two layers instead of one for the current ID-only use case. If ID is all you will ever read from the header, this is over-engineered. If more header fields show up later, this scales without duplicating length checks across every getter.
