Fun with regex: matching the HTTPS DNS record type defined by RFC 9460

Reading an RFC in full, understanding it and turning it into a standards-compliant application can be very difficult. You might as well become a lawyer.

We start off with section 2.1 of RFC 9460 which defines the following rules for us to apply:

  • the record data must be formatted as “SvcPriority TargetName SvcParams”, where
    • SvcPriority is a number between 0 and 65535
    • TargetName is a domain-name, i.e. a sequence of labels joined by dots; each label may contain letters, digits and hyphens and be up to 63 characters long, and the whole name may be up to 255 characters long
      • Note: the domain-name syntax is not defined in RFC 9460 itself but in RFC 1035
    • SvcParams is a whitespace-separated list of zero or more SvcParam entries, each either a bare key such as 'mykey' or a key-value pair such as 'mykey="myvalue"'; the regex in this post requires the double quotes around the value (RFC 9460 also allows unquoted values, which are not covered here)

SvcPriority match

We first match the numbers 0 to 65535 using

\b(0|[1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])\b

The pattern is wrapped in word boundaries (\b) so that it matches whole numbers only and not parts of larger numbers.
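To convince ourselves that the alternation really covers 0 to 65535 and nothing else, here is a minimal Python sketch. The helper name is_svc_priority is made up for this post; the pattern itself is the one shown above, and fullmatch is used so that the whole string has to be the number.

```python
import re

# Same pattern as above, split over two lines for readability.
SVC_PRIORITY = re.compile(
    r"\b(0|[1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}"
    r"|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])\b"
)


def is_svc_priority(text: str) -> bool:
    """Hypothetical helper: True if text is a decimal number from 0 to 65535."""
    return SVC_PRIORITY.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["0", "1", "65535", "65536", "007", "-1"]:
        print(candidate, is_svc_priority(candidate))
```

Note that numbers with leading zeros such as 007 are rejected by this pattern.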

TargetName match

To adhere fully to the rules for TargetName stated above, we could use a regex like

\b((?=.{1,255}$)(?!-)(?!.*--)[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?(?:\.[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?)*\.?)\b

But now there is a problem! RFC 5890, which covers internationalized domain names, allows a double hyphen in a name, but only directly after “xn” and only at the beginning. The Punycode algorithm behind this is a whole topic of its own and is not covered here. What matters is that the filter should somehow allow “xn--”.

((?=.{1,255}$)(?:xn\-\-)?(?!-)(?!.*--)[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?(?:\.[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?)*\.?)

Now there is only one more problem: the TargetName can also be just a “.” (dot). So we could come up with something like:

\b\h(\.|\b((?=.{1,255}$)(?:xn\-\-)?(?!-)(?!.*--)[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?(?:\.[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?)*\.?)\b)(?:\.)?
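A quick way to try the TargetName rules is a small Python sketch. Two things here are assumptions of this sketch and not part of the pattern above: Python's re module has no \h, so only the name itself is tested (without the leading whitespace), and fullmatch replaces the surrounding \b anchors. The helper name is_target_name is made up.

```python
import re

# Either a single dot, or a host name that may start with "xn--".
TARGET_NAME = re.compile(
    r"(?:\.|(?=.{1,255}$)(?:xn--)?(?!-)(?!.*--)"
    r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
    r"(?:\.[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?)*\.?)"
)


def is_target_name(text: str) -> bool:
    """Hypothetical helper: True if text passes the TargetName pattern."""
    return TARGET_NAME.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        ".",                      # root, allowed
        "example.com.",           # ordinary name with trailing dot
        "xn--bcher-kva.example",  # name starting with "xn--"
        "-bad.example",           # label starts with a hyphen
        "foo--bar.example",       # double hyphen without "xn"
        "a" * 64 + ".example",    # label longer than 63 characters
    ]
    for name in samples:
        print(name, is_target_name(name))
```

As in the pattern above, only lowercase letters are accepted and “xn--” is only recognised at the very beginning of the name.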

SvcParam match

The RFC also states specific rules for the SvcParam values. To match them, we can take some of the RFC's definitions almost directly into the regex:

(\h(([a-z0-9-]{1,63})((\=\")[\x21\x23-\x27\x2A-\x3A\x3C-\x7E]{1,255}+(\"))?)?)*?

The character class allows all non-special characters plus the escape character (the backslash). Keys can be up to 63 bytes long and values up to 255 bytes.
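The SvcParam pattern can be tried out in Python as well, with a few adaptations that are assumptions of this sketch: \h becomes [ \t] because Python's re module does not know \h, the possessive {1,255}+ becomes a plain {1,255} (possessive quantifiers need Python 3.11 or later, and the result is the same here because the closing quote is not part of the character class), and the nesting is slightly simplified so that every entry needs a key. The helper name is_svc_params is made up.

```python
import re

# One SvcParam: whitespace, a key, and optionally ="value" in double quotes.
SVC_PARAM = (
    r'[ \t]([a-z0-9-]{1,63})'
    r'(?:="[\x21\x23-\x27\x2A-\x3A\x3C-\x7E]{1,255}")?'
)
# Zero or more SvcParam entries in a row.
SVC_PARAMS = re.compile(f"(?:{SVC_PARAM})*")


def is_svc_params(text: str) -> bool:
    """Hypothetical helper: True if text is a valid run of SvcParam entries."""
    return SVC_PARAMS.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        ' alpn="h2,h3"',         # key-value pair
        ' no-default-alpn',      # bare key
        ' alpn="h2" port="443"', # two entries
        ' alpn=h2',              # value without quotes, rejected here
        ' key="a b"',            # space inside the value, rejected
    ]
    for params in samples:
        print(repr(params), is_svc_params(params))
```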

Further constraints

These are the basic constraints for the HTTPS record type, but there is more. For example, the keys in SvcParams must be unique. That is awkward to express in a single regex, even with backreferences, and is far easier to check in code once the record has matched.
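Key uniqueness is a good example of a check that is simpler outside the regex. The following Python sketch assumes the SvcParams string has already passed the SvcParam pattern, so values contain no whitespace and splitting on whitespace is safe. The function name duplicate_keys is made up for this post.

```python
from collections import Counter


def duplicate_keys(svc_params: str) -> list[str]:
    """Hypothetical helper: return the keys that occur more than once.

    Assumes svc_params already matched the SvcParam pattern, i.e. entries
    are separated by whitespace and look like key or key="value".
    """
    # Everything before the first "=" is the key; bare keys have no "=".
    keys = [entry.split("=", 1)[0] for entry in svc_params.split()]
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


if __name__ == "__main__":
    print(duplicate_keys('alpn="h2" port="443"'))
    print(duplicate_keys('alpn="h2" port="443" alpn="h3"'))
```

A record would then only be accepted if the regex matches and duplicate_keys returns an empty list.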

There may be more constraints in the RFC, which is huge like every RFC, so you should take this post with a grain of salt and see it as a tongue-in-cheek approach to regex matching.

Wrapping up

Bundled together, the full pattern is:

^\b(0|[1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])\b\h(\.|\b((?=.{1,255}$)(?:xn\-\-)?(?!-)(?!.*--)[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?(?:\.[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?)*\.?)\b)(?:\.)?(?!=)(\h(([a-z0-9-]{1,63})((\=\")[\x21\x23-\x27\x2A-\x3A\x3C-\x7E]{1,255}+(\"))?)?)*?$

That is quite complex. If you have followed the regex this far, I am happy for you.
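For anyone who wants to play with the bundled pattern outside a regex tester, here is a rough Python port. It is a sketch under assumptions: \h is replaced by [ \t] and the possessive {1,255}+ by {1,255}, because Python's re module lacks \h and only supports possessive quantifiers from version 3.11 on. The names HTTPS_RDATA and is_https_rdata are made up.

```python
import re

HTTPS_RDATA = re.compile(
    # SvcPriority: 0 to 65535
    r"^\b(0|[1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}"
    r"|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])\b"
    # TargetName: "." or a host name that may start with "xn--"
    r"[ \t](\.|\b((?=.{1,255}$)(?:xn--)?(?!-)(?!.*--)"
    r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?"
    r"(?:\.[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?)*\.?)\b)(?:\.)?(?!=)"
    # SvcParams: zero or more keys or key="value" pairs
    r'([ \t](([a-z0-9-]{1,63})'
    r'((=")[\x21\x23-\x27\x2A-\x3A\x3C-\x7E]{1,255}("))?)?)*?$'
)


def is_https_rdata(text: str) -> bool:
    """Hypothetical helper: True if text passes the bundled pattern."""
    return HTTPS_RDATA.match(text) is not None


if __name__ == "__main__":
    samples = [
        '1 . alpn="h2,h3"',
        '0 example.com.',
        '16 foo.example.org alpn="h2" port="8443"',
        '65536 example.com.',
        '1 example.com. alpn=h2',
    ]
    for rdata in samples:
        print(repr(rdata), is_https_rdata(rdata))
```

One side effect worth knowing: the lookaheads in the TargetName part scan to the end of the line, so the 255-character limit and the double-hyphen ban also apply to whatever SvcParams follow a host name.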

Judging by the matching test cases, we seem to have covered everything.

The direct link on regex101.com is here.
