Volume Number: 19 (2003)
Issue Number: 5
Column Tag: Network Management
A Rendezvous by any other name still works as easily
by John C. Welch
Zeroconf is the standard aggregation behind Apple's "Rendezvous" network configuration scheme. I say aggregation because, as we will see, there are a number of parts that make up Zeroconf--all of them making their way through the Internet Engineering Task Force (IETF) standards process. Zeroconf is also, hopefully, the end of a multi-year quest to finally give TCP/IP the same ease of use that Mac users have had for years with the AppleTalk protocol family. What is ironic about Zeroconf is that it, like AppleTalk, originated at Apple. To be more specific, Zeroconf was created by Stuart Cheshire, a "Wizard without Portfolio" at Apple. I was at the Apple Worldwide Developers Conference (WWDC) in 2001 when Stuart first brought up the idea, and he did it in the most understated way you can imagine--starting off with "So, I've had this idea, and I was wondering if anyone would be interested in it..." When he finished, we were amazed and astounded at having just heard a solution to ninety percent of our problems with TCP/IP configuration almost handed to us. It's not often that you get to be in a place where brilliance happens, but this was one of those times.
Now, at this point there are many people working on Zeroconf, but the seminal ideas and most of the methods behind Zeroconf came from Stuart. So, the next time you are using the Rendezvous features in iChat or the upcoming Rendezvous capabilities we saw demoed in iTunes by Steve Jobs at MacWorld Expo of 2002 in New York, say a bit of thanks to Stuart for making TCP/IP finally work the way it should for the end user.
There are three major elements of Zeroconf:
Address assignment/configuration--or making sure that each device can get an IP address that is unique and usable without needing manual entry or external services
Name assignment and pairing host names with their respective IP addresses--again, without needing manual configuration or an external server
Service discovery, which allows the end user to easily locate and use network services such as printers, file servers, iTunes play lists, iChat participants, etc.
Now, you don't have to use all Zeroconf features at once. For example, if you have a cable/DSL router acting as a DHCP server, you can still continue to use it in a Zeroconf environment. One of the requirements for Zeroconf is that it must minimize its effect on existing networks--in other words, do no harm. It should also minimize its effect on existing applications. This is a bit more difficult to manage and, in some cases, it's unavoidable due to the way some applications work. Finally, while it isn't any more inherently secure than any other TCP/IP v4 protocol or set of protocols, it is also designed not to be any less secure, and it manages to achieve this as well. Luckily, all of this is easy, since nothing in Zeroconf is very new. Another thing to remember is that simplicity and ease of use (not scalability) are the primary goals of Zeroconf. If Zeroconf ends up being able to scale globally, great! But if there has to be a choice between ease of use and scalability, then scalability should lose.
Another point to remember is that while not all of the individual protocols in Zeroconf are limited to a local link (a local link being any set of TCP/IP end nodes you can reach without a router), the broadcasts and Link Local (LL) mechanisms of Zeroconf must not reach across subnets. As we will see, if those parts of Zeroconf were to be routed across multiple links, well..."It would be bad." Finally, Zeroconf protocols must not make assumptions about the state of a network as, in a pure Zeroconf environment, all aspects of the network including a node's TCP/IP address can change.
As I said earlier, there are three basic parts to Zeroconf, which create the foundation for a functional, easy to use network: address assignment, name assignment, and service discovery.
These three parts apply to almost any network setup scheme from traditional TCP/IP to AppleTalk. You have to establish a unique low-level identifier, relate that to a higher-level identifier, and then find ways to use the things your network provides for you. None of this is new in principle, but what is new is the way Zeroconf adapts existing standards and/or ideas to get the work done in an easy fashion.
TCP/IP Address Assignment
So, now we have a brand new device using Zeroconf protocols to get things done--say, a spiffy new 1GHz PowerBook G4 for example. (Yes, I have a rich imagination.) The first thing we have to do is assign it an IP address. Normally this is done manually, or by talking to a DHCP or PPP server. But in a Zeroconf environment we don't have external servers and we don't want to have to deal with manual addressing. So, how do we get IP addresses assigned correctly and easily? Simple--we use Link-Local (LL) addressing.
LL addressing is not new. It has been in use on the Mac OS since Mac OS 8.5 and on the Windows side of the world since Windows 98. However, it was only used if a DHCP server couldn't be found. It's simply a way to assign TCP/IP addresses without needing an external or manual configuration mechanism. Zeroconf uses a specific subnet for its address assignment--the 169.254/16 subnet. This is registered with the Internet Assigned Numbers Authority (IANA) specifically for TCP/IPv4 LL address assignment. (TCP/IPv6 already handles this. Details can be found in RFC 2462.)
Unlike in previous implementations, under Zeroconf LL addressing is always available, even if other addressing methods are used. This may seem redundant, but think about it for a second: if you always have LL addressing available you can still function even if the DHCP server goes down or you are temporarily in a differently-addressed network. If LL addressing wasn't always available, then you would have to have a "Zeroconf" configuration set up so you could function in those situations, which would detract somewhat from ease of use. The good thing about Zeroconf's LL is that it doesn't break if you use other configuration methods concurrently, nor will it break other configuration methods by its concurrent use with them.
Since LL addressing is critical to Zeroconf we should define it a bit. In essence, LL machines are any machines that can see one another without the use of any device that decrements the TTL count or modifies the IP header or payload. The TTL, or "Time To Live," is a hop limiter used by routers. Each router that forwards a packet decrements its TTL, so a packet with a TTL of 10 can only go through that many routers before the TTL reaches 0 and the packet is dropped. LL addressing is not routable, so if the TTL count on a LL packet gets changed, something is wrong. In addition, things like Network Address Translation (NAT) or other setups that modify the packet information are not used on an LL network (indeed, they can cause major problems), so any network behind a device performing NAT or some similar function should not be part of a LL network. Again, LL addresses are not routable, nor are they allowed for use on the public Internet, so you can't use a LL network to get onto the Internet unless you have a LL-aware device on the network acting as a router. LL is used most often with Ethernet (most networks are Ethernet, so this follows), but it can be used with almost any IEEE 802 media that supports at least 1 Mb/second bandwidth and some form of the Address Resolution Protocol (ARP).
LL addressing is designed for a fairly small number of machines even though the theoretical upper limit is 65024 hosts on a given LL network. Due to the way in which LL works, and other limitations, the practical upper-end of a LL network is around 1300 machines. One of the reasons for this limitation is that TCP/IP addresses in a LL network are not static. They are quite dynamic and can change regularly, even if you are in the middle of a network operation. Therefore, LL implementations must expect situations where conflicting addresses are assigned and handle them gracefully in a way that allows them to reassign addresses "on-the-fly." This has to occur during the entire time the device is using a LL network, not just when the device initially joins the network. Obviously, LL devices should not be configured manually or via DHCP, as these methods are not designed to function in such a dynamic environment. If you have a LL address on a given interface, that doesn't have to be the only address assigned to that interface. In fact, having a manual, or DHCP address in addition to the LL address is considered a good thing. Having both means that if your Mac needs to talk to a strange printer just long enough to print an envelope you don't need to reconfigure your network setup to do so. As long as both devices support LL addressing you should be able to communicate easily. A host isn't required to have any specific number of addresses in a TCP/IP v4 environment, but if that host is going to function in a TCP/IP v6 environment, it must be able to have multiple addresses on a single interface. (I do mention IP v6 quite a bit here, mostly because it already has these capabilities and because as we all move to IP v6, we are going to be in a mixed-mode environment for many years to come.)
So, how does LL address selection work? It follows a fairly straightforward procedure that allows it to ensure that the address it is using isn't already in use by another device:
Select an address in the 169.254/16 range, via a pseudo-random generator. Every address except the first and last 256 is available; those are reserved by the IETF and IANA for future use. However the random address is picked, the starting point, or 'seed,' should be based on a unique characteristic such as the interface's MAC address. If a seed such as the system clock time is used, you could have a group of machines all starting at once, constantly trying to grab the same addresses. If the developer of the LL implementation desires, they can cache the address that is eventually used as a starting point for the next time address selection takes place.
Once an address is selected the host must test to see if the address is free, usually via ARP. The host broadcasts an ARP request on the link to see if it can get the desired address. This ARP packet has the following specs:
Sender hardware address field filled with the MAC address of the interface that is being configured
The sender IP address is zero-filled to avoid polluting ARP caches, in case the desired address is being used by another machine. This makes the packet an "ARP probe."
The target hardware address is zero-filled and ignored
The target IP address is set to the desired address. This way, if the address isn't being used the packet will never be answered by the target IP address.
When the host is ready, it waits for a random time between zero and two seconds, and sends four ARP probes at two-second intervals. (This helps avoid a surge of probe traffic if multiple machines come on at once.)
If the host receives an ARP packet where the sender IP address contains the desired address, the host assumes the address is in use and has to start again with a different address. (Remember that the sender address in the ARP probe is blank.) The host should maintain a counter that tracks the number of times it probes for an address already in use. If the counter gets too high--say, over ten--the host has to slow down the rate at which it probes for new addresses.
If two seconds pass after the last probe has been sent without a reply that indicates the desired address is in use, the host can claim the desired address.
The host then has to send two ARP announcements two seconds apart that have both the sender and target IP address set to the newly claimed address to help prevent stale ARP caches.
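To make the first two steps concrete, here is a minimal Python sketch of picking a candidate address and building the ARP probe for it. The function names (pick_ll_address, build_arp_probe) and the attempt parameter are my own inventions for illustration, and the sketch only builds the 28-byte ARP body; actually putting it on the wire (and the waiting, counting, and announcing steps) is left out.

```python
import random
import struct

# Usable range: 169.254/16 minus the first and last 256 reserved addresses.
LL_FIRST = (169 << 24) | (254 << 16) | (1 << 8)           # 169.254.1.0
LL_LAST = (169 << 24) | (254 << 16) | (254 << 8) | 255    # 169.254.254.255


def pick_ll_address(mac: bytes, attempt: int = 0) -> str:
    """Pick a pseudo-random candidate, seeded from the interface MAC address.

    `attempt` is a hypothetical retry counter so that a host whose first
    choice is taken moves on to a different candidate.
    """
    rng = random.Random(int.from_bytes(mac, "big") + attempt)
    value = rng.randint(LL_FIRST, LL_LAST)
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def build_arp_probe(mac: bytes, candidate: str) -> bytes:
    """Build the ARP request body used as a probe for `candidate`."""
    target_ip = bytes(int(part) for part in candidate.split("."))
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,          # hardware type: Ethernet
        0x0800,     # protocol type: IPv4
        6, 4,       # hardware and protocol address lengths
        1,          # opcode: ARP request
        mac,        # sender hardware address: the interface being configured
        bytes(4),   # sender IP: zero-filled, which makes this a probe
        bytes(6),   # target hardware address: zero-filled and ignored
        target_ip,  # target IP: the address we hope to claim
    )
```

Seeding from the MAC address means the same machine tends to try the same address first each time, while two machines powered on together start from different points. The usable range works out to 65024 addresses, which is where the theoretical host limit mentioned earlier comes from.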
Now, just because the host has claimed an address on an interface doesn't mean it is finished with the process. It has to help with collision detection for as long as it is on the link. If it receives an ARP packet or probe where the sender IP address is its own IP address and the MAC address is different from the host's configured interface, there are two possible actions:
The host can elect to give up the address and reconfigure itself.
The host can defend its address. To do this, it records the time the conflicting ARP packet was received, and broadcasts an ARP announcement using the IP address it is defending as the sender IP, and the MAC address of the interface using that IP address as the sender hardware address for that packet.
This does not mean that the host can always keep that address. If it receives multiple conflicting ARP packets within ten seconds (on Ethernet), then the host has to give up that address and reconfigure itself. In this way, you avoid having multiple hosts in an infinite address defense loop. Obviously, changing IP addresses can do bad things to applications, transfers, etc., but this should be rare if all hosts are following the rules. In all of these cases the ARP transmissions are broadcast, not unicast.
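The defend-or-yield rule can be sketched as a tiny bit of state. This is only an illustration for a host that chooses to defend: the AddressDefender class and its method name are hypothetical, and the ten-second window is the Ethernet figure quoted above.

```python
class AddressDefender:
    """Tracks conflicting ARP packets for one claimed LL address."""

    DEFEND_WINDOW = 10.0  # seconds; the Ethernet value given in the text

    def __init__(self):
        self.last_conflict = None  # time of the previous conflicting packet

    def on_conflict(self, now: float) -> str:
        """Decide what to do about a conflicting ARP packet seen at `now`.

        Returns "defend" (broadcast an ARP announcement) or "reconfigure"
        (give up the address and start address selection again).
        """
        previous = self.last_conflict
        self.last_conflict = now
        if previous is not None and now - previous < self.DEFEND_WINDOW:
            # A second conflict inside the window: yield rather than loop.
            return "reconfigure"
        return "defend"
```

For example, a conflict at t=0 is defended, a second one at t=4 forces a reconfigure, while conflicts fifteen seconds apart are each defended.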
Interfaces with Multiple Addresses
In our modern networking world a given interface can, and often does, have more than one IP address. Under Zeroconf this is quite common. If the destination address is in the LL subnet, and the host is set up with a LL address, then obviously you just use that LL-configured address to send the packet out. If the host doesn't have a LL address, it can use ARP to locate the host with the LL address and send the packet, but using its own routable address in the packet. The host can't send a packet with a LL destination address to a router. If both the destination and the host each have LL and routable addresses, the address used as the destination is up to the host, and should be based on which address is likely to be the most stable.
It's important to note that LL packets are not routable. The TTL value of all LL packets must be set to 255 by the sender--that is the only valid value for the destination to use. (Routers decrement the TTL every time they pass a packet, so a TTL of less than 255 indicates that routing happened.) Routers should not answer any LL ARP packets unless they are set up with a LL interface. This restriction applies to multicast packets on the LL segment as well. It is perfectly correct to assume that any packets in the LL subnet are local and directly reachable. Subnetting of a LL segment is to be avoided, as it will undermine the ARP collision detection mechanism.
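As a rough sketch of the TTL rule, a receiver could filter incoming packets like this. The function name accept_ll_packet is made up for the example, and it assumes the source address, destination address, and TTL have already been pulled out of the IP header.

```python
import ipaddress

LL_NET = ipaddress.ip_network("169.254.0.0/16")


def accept_ll_packet(src: str, dst: str, ttl: int) -> bool:
    """Apply the TTL-255 rule to packets that involve a LL address."""
    involves_ll = any(ipaddress.ip_address(a) in LL_NET for a in (src, dst))
    if not involves_ll:
        return True  # not LL traffic; this rule has nothing to say about it
    # A TTL below 255 means some router decremented it, so the packet
    # cannot have originated on the local link.
    return ttl == 255
```

A packet between 169.254.3.7 and 169.254.9.1 with a TTL of 255 passes; the same packet arriving with a TTL of 254 is rejected.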
Another idiosyncrasy that can crop up involves multi-homed hosts. Remember that out of the box, any Mac can have two Ethernet interfaces--one wired, one wireless. Xserves can have two wired Gigabit Ethernet interfaces. If you take PCI slots and multiport Ethernet cards into account, a G4 tower can have over 12 interfaces. Even a PowerBook G4 can potentially have multiple interfaces, thanks to the PC Card slot. In our Zeroconf LL scenario, multi-homed hosts have three choices:
Only use one interface for LL traffic and defend the address on that interface alone
Use interface identifiers and share the LL address across all active interfaces. The LL address is created once but defended on all active interfaces.
Use each interface's IP address to identify the interface, along with additional properties to support the ID process. Each LL interface has its own unique LL address and has to defend that address, even against other interfaces on the same host. One thing the host can do when bringing up a new interface is to internally probe the other interfaces and make sure not to pick addresses that are assigned to those other interfaces, so the host doesn't have to ARP for them. (Yes, ARP gets used as a verb too; it stands for Address Resolution Protocol.)
Another problem can crop up when you join two separate networks together via a bridge or other non-router method, such as the connection of two previously separate hubs. In a case like this, as address conflicts are detected, they get resolved just like any other conflict. In any event, hosts should not be sending out "just in case" ARPs, as that can become a serious bandwidth waster.
Since joining a network is far easier with LL addressing under Zeroconf, there is the potential here for "bad" hosts to join the network and conduct attacks. ARP is not a secure protocol at all, and should never be assumed to be such. Anyone packet-sniffing on a Zeroconf network can easily get a list of MAC addresses that can be used for nefarious purposes, such as sending out ARP replies that claim every address on the LL segment, thereby creating a denial of service (DoS) for that segment.
However, LL isn't new and its increased use in Zeroconf does not make it any less secure than it already was.
Miscellaneous Application Issues
One side-effect of Zeroconf and LL is that network address usage gets a lot more complex. A network application can no longer assume that the IP address at the start of a transaction will be the same at the end. Connection-dependent protocols, such as FTP, will have problems with this. Security implementations that rely on IP address, such as Kerberos, will need to be updated to handle this or just not be used in a LL network. Another problem is that on a multi-homed host, if each interface is connected to a different physical subnet, they could all have the same IP address, yet not conflict with one another. In any event, many applications will have to be rewritten to handle this gracefully--and no, a dialog saying, "Your address changed, so I failed" doesn't count.
Host Name Assignment
Now that we have our IP address assigned we need to come up with a name for our machine. Yes, you can work with IP addresses directly, but who wants to do that when you can give your machine a human-readable name--maybe even one that is descriptive. There are a lot of ways to do this, including the one we use every day that has been tested on a global scale for decades now: the Domain Name System, or DNS.
While DNS does this nicely, the traditional DNS implementation (a.k.a. - Unicast DNS) is difficult for home networks, ad hoc networks or small networks. For one, it needs a dedicated server, which an ad hoc network won't tend to have. For another, even with various tools it's really hard to set up for the uninitiated, and even a minor mistake can cause you problems. Finally, having to set up a server ahead of time isn't "Zeroconf". But, DNS is well known and stable so we want to use it, but we don't want to configure it. The answer is Multicast DNS, or mDNS.
mDNS is a serverless LL implementation of DNS's naming function. It allows for automatic naming of devices, without a server. It handles naming and conflict resolution for a Zeroconf segment without the users having to know anything about DNS. It is enabled in Mac OS X 10.2 and later (iChat being the most obvious use here) and may show up in Windows XP in a future service pack.
Like everything else in Zeroconf, the idea is to make existing protocols and methodologies work in a more friendly fashion rather than recreating the wheel via some over-engineered, overly complicated implementation that would get ignored. It is designed to create no more traffic than exists with ARP broadcasts today, as overburdening a network would again cause this not to be used. As with LL addressing, the mDNS domain--.local--is designed not to be routable along with any subdomains of .local, as they would have no meaning outside of the link on which they live.
In an mDNS system, all mDNS requests go to the multicast address 224.0.0.251, which every mDNS machine in the Zeroconf segment listens on. Also, any requests for a name in the 254.169.in-addr.arpa domain on the segment would go to the mDNS address. As a domain, .local acts like any other search domain on a machine. Unqualified queries go to .local first (on Zeroconf devices). For example, if you were looking for 'rocking4' it would first be searched for as 'rocking4.local'. This doesn't preclude the use of "normal" DNS names or using traditional DNS servers. For example, my Mac at work has an MIT DNS name and a Rendezvous name. Both machine names are the same, but the MIT name ends in .mit.edu and is a global DNS name, whereas the Rendezvous name ends in .local. In my case, I use mit.edu as my initial search domain, then .local, but both coexist peacefully. So mDNS doesn't require any more work on the part of the user than traditional DNS.
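Here is a small Python sketch of how a resolver might decide which names belong to mDNS and how a search list expands an unqualified name. The helper names (uses_mdns, candidates, reverse_name) are hypothetical; only the .local and 254.169.in-addr.arpa zones come from the text.

```python
import ipaddress

# Zones that are answered by mDNS rather than by a unicast DNS server.
MDNS_ZONES = ("local", "254.169.in-addr.arpa")


def uses_mdns(name: str) -> bool:
    """True if `name` falls inside one of the link-local zones."""
    n = name.rstrip(".").lower()
    return any(n == zone or n.endswith("." + zone) for zone in MDNS_ZONES)


def candidates(name: str, search_domains=("local",)) -> list:
    """Expand an unqualified name using the search list, in order."""
    if "." in name:
        return [name]  # already qualified; leave it alone
    return [f"{name}.{domain}" for domain in search_domains]


def reverse_name(addr: str) -> str:
    """Reverse-lookup name for an IPv4 address."""
    return ipaddress.ip_address(addr).reverse_pointer
```

With the default search list, candidates("rocking4") yields rocking4.local. With a search list of ("mit.edu", "local"), as on my work machine, the mit.edu name is tried first and the .local name second, and only the second is an mDNS lookup.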
However, if you are a home user, you don't need to concern yourself. If you have ever set up machine names for AppleTalk networks, that's the effort needed for mDNS names. Once you've set up your machine name in the sharing control panel, you connect up to the LL segment. You get your LL IP address, then Rendezvous sends out mDNS queries to see if any other machine on the segment is using your host name. If you are the first, or the only one using that name, you're done and you have a host name. If there is already a machine using that name, then just like AppleTalk, you have to change your hostname. That's about all that the work-at-home/ad hoc user ever needs to do.
One of the concerns with this is name conflicts. As it turns out, Apple's long-time experience with automatic name configuration shows that this isn't much of a problem. There are human reasons for unique computer names that help smooth this out before the machine ever gets on the network. Another problem that can arise is if a routed packet shows up acting like a LL packet. Well, just as with LL addressing, if the TTL value of the mDNS packet is less than 255, it is considered an invalid packet and is ignored. This avoids acceptance of erroneously routed packets or packets from an outside host trying to spoof the LL segment.
Within an mDNS network there are three basic kinds of DNS requests that are going to be seen. The first is a standard DNS request that would be used by any device needing DNS service from a normal unicast DNS setup. This kind of "one-shot" request/reply is normally used by devices that are not aware of mDNS (a.k.a. - "mDNS-stupid"). While an mDNS-stupid client can send a standard request to the 224.0.0.251 address, there are some problems with it. For one, normally with a unicast DNS request you take the first valid answer you get back since you are only asking one server at a time. In an mDNS network, you are asking every mDNS machine at once, and you may get back hundreds of replies. If you only take the first one it has a good chance of being wrong. One-shot requests can also cause problems if the thing you are looking for is not on the .local domain, as then you could get false positives or incorrect "thing not found" error messages.
The next kind of request is a "one request out, multiple reply back" request, and is used by mDNS-aware clients. This is a client that understands that there is a difference between mDNS and unicast DNS and acts appropriately. In this case, the client understands that it is going to get multiple responses and looks at all of the responses to find the right one. It will retransmit the request until it finds what it is looking for or puts up a "thing not found" error. To increase efficiency and decrease network traffic, the client can put more than one question into the question section of the request.
The final kind of request is a "continuous" request, as used by something like a network or printer browser. This is akin to leaving the Chooser, or the 'Connect to Server' window open. This is the kind of thing you'd do while testing a printer that should be on the network, but isn't, in order to see when it comes back on the network. Obviously, this kind of request can create problems if it runs for too long or if multiple clients are generating them, which is why an IP browser window/app under Zeroconf must not generate overly large amounts of traffic. One of the ways in which mDNS helps prevent this is via Duplicate Suppression.
When you have an ad hoc network with multiple nodes providing various services, one thing you want to avoid is to have every single device having to respond every single time to every single request. One of the ways in which to avoid this is to use caching in the mDNS requests. When an mDNS requester sends a request with answers it already knows about, it populates the answer section of the DNS messages with those cached records. Those records have a TTL value that will remain valid for at least this request and the next two following requests. This way, you avoid both extra traffic and bad information in the cache. As the request hits the various mDNS responders on the segment, they all examine the request. If the cache contains the answer they would have given, and the cached TTL is at least half of the value of the real TTL, they don't respond and we avoid unnecessary traffic. If the TTL of the cached info (Yes, TTL is being used differently here. This is not the TTL used to check for erroneous routing, but a different TTL used to prevent stale caches.) is less than half of the real TTL, the mDNS responder replies, so as to update the cache and prevent premature expiration.
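The responder's side of duplicate suppression boils down to one comparison. A minimal sketch, with should_respond as a hypothetical name and TTLs in seconds:

```python
def should_respond(real_ttl, known_answer_ttl=None):
    """Decide whether a responder needs to answer a request.

    `real_ttl` is the TTL the responder would put on its own record;
    `known_answer_ttl` is the TTL on the matching record the requester
    included in the answer section, or None if it included none.
    """
    if known_answer_ttl is None:
        return True  # the requester doesn't know our answer yet
    # Stay quiet while the cached copy still has at least half its life
    # left; otherwise reply so the cache is refreshed before it expires.
    return known_answer_ttl < real_ttl / 2
```

So for a record with a real TTL of 120 seconds, a requester showing a cached TTL of 60 or more is left alone, while one showing 59 gets a fresh reply.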
One thing that must be avoided is an mDNS requester caching resource records observed on other mDNS requests. This is because the answer section of those requests is not authoritative; that is, the information is believed to be correct, but not certified to be correct. Again, Zeroconf is a very dynamic environment, which means that a device in the answer section could have dropped off the network a millisecond after updating the cache. Obviously an mDNS responder needs to allow for larger-sized requests and replies in accordance with the "Extension Mechanisms for DNS (EDNS0)", IETF RFC 2671. By correctly implementing caching and duplicate suppression, a Zeroconf network gets the same kind of functionality from mDNS that AppleTalk got from ZIP/NBP, with less traffic.
So, we have an mDNS request that has come in--now we need to handle responding to it. First off, failures aren't something for responses. If no response is received, failure is assumed. This avoids extraneous traffic from three hundred hosts all saying "not me" and one host saying "here you go." There's not much point in doing all of that duplicate suppression work on the request side just to blow it away on the response side. On an Ethernet-style network, the mDNS responder should delay its responses by a random period between zero and ten milliseconds. This is to avoid collisions on the network. If, however, the mDNS responder has verified that it is the only device that can respond to a given request, the delay is not needed.
Unlike unicast DNS, which uses UDP port 53, mDNS requests and responses use UDP port 5353. One thing to remember here is that the mDNS responder can't make assumptions about the uniqueness of a name, since a Zeroconf network may constantly be in flux. All mDNS peers must continually monitor the network to ensure that name collisions don't happen. If a request's source port isn't UDP 5353, the client may not be aware that it is in a multicast environment and is just blindly sending a standard query. In this case, the mDNS responder must send a normal mDNS response to UDP port 5353 and an identical response to the requester's source UDP port and IP address. If a request has more than one question, the mDNS responder must respond to all questions to which it can give a positive response. If the mDNS responder has unique records for the .local domain, it has to also have an mDNS requester so that it can verify the uniqueness of its records.
In a Zeroconf network--especially one chock full of Mac OS X users--there are a lot of events happening. Machines start; they wake up; they change how they are connected to the network. When any of these events occur, the mDNS responder has a few required tasks to perform:
For any resource records that have to be unique on the LL segment, it has to send an mDNS query to see if any of them are in use (i.e. - by another device using that name for a host name). Since mDNS queries can have multiple responses, this can be done with a single packet. If any conflicts/collisions are found, the device already using that name wins. The queries are sent multiple times, with the second query happening one second after the first. Two seconds later the third query goes out, and so on, for seven seconds.
The mDNS responder should then send out a gratuitous mDNS response, with the Answer section filled with any resource records that could be of use to other hosts on the link. This could include PTR records used by DNS Service Discovery. This allows hosts that have open browser windows to be immediately updated, so they don't have to send a request of their own, thereby reducing network traffic. Up to ten of these gratuitous responses can be sent, but the time interval between them must double with every response sent.
In any event, the only time announcements are to be sent out gratuitously is when a host has a change event in its network connection. There are to be no periodic "Hey, I'm here and here's what I have." announcements.
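The timing of the two startup tasks above can be sketched as a pair of schedules. I am reading "one second, then two seconds, and so on, for seven seconds" as a doubling gap, which is an interpretation rather than something the text spells out, and the one-second first gap for the gratuitous responses is purely an assumption. The function names are hypothetical.

```python
def probe_times(first_gap=1.0, window=7.0):
    """Offsets (seconds) at which the uniqueness queries are sent.

    Assumes the gap doubles after each query and stops at `window`.
    """
    times, t, gap = [0.0], 0.0, first_gap
    while t + gap <= window:
        t += gap
        times.append(t)
        gap *= 2
    return times


def announcement_times(first_gap=1.0, count=10):
    """Offsets (seconds) for up to `count` gratuitous responses.

    The interval doubles with every response sent; `first_gap` is an
    assumed starting interval, not a value taken from the text.
    """
    times, t, gap = [0.0], 0.0, first_gap
    for _ in range(count - 1):
        t += gap
        times.append(t)
        gap *= 2
    return times
```

Under those assumptions the queries go out at 0, 1, 3, and 7 seconds, and the doubling quickly spreads the gratuitous responses out so they cost almost nothing in bandwidth.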
With mDNS, a conflict happens when you have multiple resource records that have the same name/type/class containing inconsistent rdata. If two hosts have identical rdata, it is not inconsistent rdata. However, if a host wants to have a given name and runs into a segment with that name in use by a different IP address, this is an example of inconsistent rdata. Whenever an mDNS responder receives a response with a conflict in a resource record, the responder must cease using that record and may have to reconfigure itself to avoid the conflict. If the host in question is a device with a human user, the responder can pop a notification to the human so that the human can make the change. Once the change is made, conflict testing must be repeated. With mDNS, the first host using a name is the one that wins all conflicts. With regard to conflict testing, it is again useful to note that .local and 254.169.in-addr.arpa have only local significance. This is in contrast with "normal" DNS, which is concerned with global uniqueness. Since mDNS only works at the LL level, the same name can easily be in use on different LL segments, and that does not count as a conflict.
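The conflict test itself is simple enough to write down. In this sketch the Record tuple and the conflicts function are hypothetical stand-ins for whatever a real responder keeps internally; the host names and addresses are made up.

```python
from collections import namedtuple

# A stripped-down resource record: just the fields the conflict rule needs.
Record = namedtuple("Record", "name rtype rclass rdata")


def conflicts(mine: Record, seen: Record) -> bool:
    """True if `seen` has the same name/type/class as `mine` but
    inconsistent rdata. Identical rdata is not a conflict."""
    same_key = (mine.name.lower(), mine.rtype, mine.rclass) == (
        seen.name.lower(), seen.rtype, seen.rclass)
    return same_key and mine.rdata != seen.rdata


mine = Record("rocking4.local", "A", "IN", "169.254.3.7")
other = Record("rocking4.local", "A", "IN", "169.254.80.12")
print(conflicts(mine, other))  # True: same name, different address
print(conflicts(mine, mine))   # False: identical rdata
```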
mDNS Record Differences
mDNS doesn't use NS records, as all mDNS domains are delegated to 224.0.0.251. That address is the "name server" for mDNS. Since it is a multicast address, it identifies a group of hosts working together to maintain the .local zone. For any segment, mDNS works as a single zone run as a distributed name server process. Since an mDNS zone is a loose collection of CPUs working together, there is no delegation in an mDNS zone. If a host is responding for a given record, there is no guarantee that it will respond for children of that record, or even other resources in that record. Another record not used by mDNS zones is the Start Of Authority, or SOA record. Since there is no "administrator" in an mDNS zone there is no need for a SOA email address. Along that line, since there is only loose coordination between hosts in an mDNS zone, there is no way in which to implement an increasing serial number. Another mDNS difference is that there are no zone transfers for any mDNS zone.
Clearly, mDNS is going to be useful in a Service Discovery application, which we'll touch upon in detail in the DNS Service Discovery (DNS-SD) section. As we saw earlier, the information caching in mDNS uses TTL values to determine validity. Because a Zeroconf network is not assumed to be stable, these TTL values should be measured in seconds, so as to avoid stale information. If you are using DNS-SD, this can be excepted for fairly stable devices like laser printers, etc., in order to avoid excessive network traffic.
With regard to failover to mDNS for names outside of .local, this should be optional and disabled by default in order to avoid security issues that would be caused by local resolution of global names. Lookup options for unqualified names in .local are controlled by the existence of .local in the DNS search domain list on an mDNS client. If a resource is working in a .local domain, but wants to avoid mDNS, it can do so by not using any names ending in .local.
If a host is multi-homed, it needs to defend its fully qualified domain name (FQDN) on any and all active interfaces that are responding to mDNS requests. In the event of a conflict on any interface the host should configure a new host name. When answering an mDNS request, a multi-homed host with LL addresses should take care to check that addresses in an mDNS reply are valid for the interface that is responding. If a host has multiple interfaces and it detects multiple hosts with the same name, but on different LL segments, it should understand that this is valid. Obviously, mDNS clients need to listen for and examine all mDNS messages for useful information. The information can be cached or not, as desired, but if caching is used the TTL aging issues must be handled correctly.
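To make the TTL aging point concrete, here is a minimal sketch of a client-side cache that stops returning a record once its TTL has run out. The class name MdnsCache, its methods, and the injectable clock are all hypothetical names chosen for illustration; they are not taken from Apple's Rendezvous code or from the mDNS drafts.

```python
import time


class MdnsCache:
    """Toy record cache keyed by (name, record type). Illustration only."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so expiry can be exercised without sleeping.
        self._clock = clock
        self._records = {}  # (name, rtype) -> (value, expires_at)

    def store(self, name, rtype, value, ttl_seconds):
        # Remember when the record stops being valid, not just its TTL.
        self._records[(name, rtype)] = (value, self._clock() + ttl_seconds)

    def lookup(self, name, rtype):
        entry = self._records.get((name, rtype))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The TTL has aged out: drop the stale record and report a miss.
            del self._records[(name, rtype)]
            return None
        return value
```

A caller would store each answer with the TTL that arrived with it, and treat a None result from lookup as the signal to send a fresh mDNS query.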
mDNS Message Format Specifics
The query ID should be set to zero on transmission and ignored on reception. This does not apply to unicast DNS queries, which have to honor the query ID.
The query/response bit (QR) must be set to zero for queries, and one for responses.
Under current IETF standards, the Opcode must be set to zero.
In a query, the authoritative answer (AA) bit must be set to zero on transmission and ignored by the receiver. In a response, this bit must be set to one.
The recursion desired (RD) bit must be set to zero on transmission and ignored on reception.
The recursion available (RA) bit must be set to zero on transmission and ignored on reception.
The zero (Z) bit must be set to zero on transmission and ignored on reception.
The authentic data (AD) bit must be set to zero on transmission and ignored on reception for queries. It can be set to one on responses, but shouldn't be trusted unless the source is trusted, there is a secure path to the source, or DNS transaction security is used.
The checking disabled (CD) bit is set according to resolver policy for queries. For responses, the bit must be set to zero on transmission and ignored on reception.
The response code (RCODE) field must be zero on transmission, and any non-zero values must be ignored.
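The field values above map directly onto the standard 12-byte DNS header, so they can be sketched in a few lines of Python. The function names here are hypothetical, and setting the CD bit to zero in queries is an assumption, since the text leaves that bit to resolver policy.

```python
import struct

# Bit positions within the 16-bit DNS flags word.
QR_BIT = 1 << 15   # query (0) or response (1)
AA_BIT = 1 << 10   # authoritative answer


def build_mdns_query_header(question_count=1):
    """Pack a DNS header for an mDNS query: ID and every flag are zero."""
    query_id = 0   # zero on transmission, ignored on reception
    flags = 0      # QR, Opcode, AA, RD, RA, Z, AD, RCODE all zero; CD=0 is an assumption
    return struct.pack("!HHHHHH", query_id, flags, question_count, 0, 0, 0)


def build_mdns_response_header(answer_count=1):
    """Pack a DNS header for an mDNS response: QR=1 and AA=1, the rest zero."""
    query_id = 0
    flags = QR_BIT | AA_BIT
    return struct.pack("!HHHHHH", query_id, flags, 0, answer_count, 0, 0)


def decode_flags(header):
    """Split the flags word of a packed header back into named fields."""
    _query_id, flags = struct.unpack("!HH", header[:4])
    return {
        "QR": (flags >> 15) & 1,
        "Opcode": (flags >> 11) & 0xF,
        "AA": (flags >> 10) & 1,
        "RD": (flags >> 8) & 1,
        "RA": (flags >> 7) & 1,
        "Z": (flags >> 6) & 1,
        "AD": (flags >> 5) & 1,
        "CD": (flags >> 4) & 1,
        "RCODE": flags & 0xF,
    }
```

The header is only the first part of a message; the question and answer sections that follow it are left out of this sketch.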
IP v6 Considerations
IP v6 and IP v4 ignore each other on the same host as though they were physically separate, so it is possible to have two .local zones on the same host. If the host has both IP v4 and IP v6 in use (dual-stacked), it should register with both v4 and v6 .local zones so it can talk to v4 and v6 hosts.
If authenticity of information is critical then DNSSEC needs to be used, especially if DNS queries outside of .local are sent to the all-DNS multicast address, as with network outages that kill a device's connection to the Internet. If DNSSEC is not used, you could have a rogue host masquerading as a local host. While a proper FQDN has a trailing dot on the end, most people omit this. (You probably don't look for www.apple.com.) Because of this omission, a rogue host could masquerade as a host in a domain by answering requests for names that lack the trailing dot. To avoid this, a host must not append ".local" to a relative domain name with two or more labels. It is acceptable to append it to a relative domain name with a single label, since the user should not expect a single label domain name to work as-is.
IANA has allocated the IP v4 LL multicast address 224.0.0.251 for mDNS use.
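The rule for when ".local" may be appended is easy to get backwards, so here is a minimal sketch of it. The function name qualify_for_mdns is hypothetical, and the sketch assumes that multi-label relative names are simply handed back unchanged for ordinary DNS resolution.

```python
def qualify_for_mdns(name):
    """Append ".local." only to a relative name with a single label."""
    if name.endswith("."):
        # A trailing dot marks a fully qualified name; leave it alone.
        return name
    labels = name.split(".")
    if len(labels) == 1:
        # A single label cannot work as-is, so completing it in .local is safe.
        return name + ".local."
    # Two or more labels: never append .local, or a rogue host could answer
    # for something like www.apple.com.local.
    return name
```

With this rule, "printer" becomes "printer.local." while "www.apple.com" is left for the normal DNS resolver.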
DNS Service Discovery
So, now we have our IP address and our host name. What's next? We have to find things to do, a.k.a. services. Since we have this "really nice mDNS thing" running, our next step should be to use that to find things. One of the best service discovery (SD) protocols is still AppleTalk's ZIP/NBP combination. There have been numerous attempts, like NLS and SLP, to replicate this on TCP/IP networks, but they tend to fail, mostly because they are too complex in implementation and use and are so over-engineered as to make a simple implementation quite difficult. (SLP is indeed a robust, flexible SD protocol, but I have yet to see an implementation that was even close to being as simple as setting up AppleTalk's SD protocols, mostly because outside of zones, there really wasn't any setup work for ZIP/NBP.)
There are four basic properties of a good SD protocol (according to the DNS-SD folks, and yours truly) :
It has the ability to query for a certain type of service in a specified domain, and get a list of the named instances of those services. In other words, "I want to see some printers in domain X. Okay, here's a list of printers for X. Cool, thanks." Keep the queries and the responses simple.
Once it is given the name of an instance of the desired service (i.e. - Printer "4th Floor Color Xerox laser with duplexer"), resolve that name to the information the client needs to use that instance for that service. So once the human picks the printer, the SD protocol has to be able to get the information needed so that printing can commence.
The names need to be persistent (i.e. - a printer that is there today, should be there tomorrow, even if the information for that printer has changed.) The names need to be abstracted enough so that they are not tied to network addressing, etc.
Finally, the protocol needs to be simple, so that any device that can use the carrier protocol, like TCP/IP for Zeroconf, can easily implement the SD protocol as well. Giving in to the desire to have an infinite number of options will make implementing the protocol so complex that it won't be used.
Service Instance Naming
We should figure out how to name the service instances so they can be found easily. Well, DNS provides us with a useful way to do this--the SRV record. If you think about it, most services of a given type--say, printers--only tend to differ by name (i.e. - a list of lpr printers will all have the type "_printer._tcp.domain.com"). So finding LPR printers in a domain is pretty simple. However, unlike services such as web servers, you don't want to be load-balanced. Getting rotated to a random printer because the one you wanted was busy would get annoying rather quickly! To avoid getting rotated to a random instance of a service, an additional level of indirection gets used--the PTR records.
The client requests PTR records (which are pointers from one name in a DNS space to another name in that space). The result of a PTR query is a list of service names in the form "Service Instance Name = instance.service.domain". The instance in this result is a single Unicode UTF-8 encoded text string used for the DNS label. Note that the service name is not the same as a host name. The user doesn't manually enter the DNS information for the service instance, but selects them from a list, via the Chooser or the Print Center. Once the selection is made, the service can be used and possibly saved in a config/prefs file for future use (i.e. - default printers). DNS labels are limited to a 63 octet length, and UTF-8 encoding can use up to 6 octets per character depending on the amount of data needed to display a given character. So a simple character set--like the standard Roman alphabet--would be able to use 63 characters for the name, whereas something like Egyptian hieroglyphics would be limited in size to something like ten characters.
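Because the 63-octet limit is counted in encoded bytes rather than in characters, it is worth seeing how a check might look. This is a small sketch with hypothetical helper names; it measures the UTF-8 length of an instance name and, if needed, trims it without cutting a multi-byte character in half.

```python
MAX_LABEL_OCTETS = 63  # DNS limit on a single label


def label_octets(instance_name):
    """Number of octets the name occupies once encoded as UTF-8."""
    return len(instance_name.encode("utf-8"))


def fits_in_label(instance_name):
    return label_octets(instance_name) <= MAX_LABEL_OCTETS


def truncate_to_label(instance_name):
    """Trim to at most 63 octets, dropping any partial trailing character."""
    encoded = instance_name.encode("utf-8")[:MAX_LABEL_OCTETS]
    # errors="ignore" discards an incomplete multi-byte sequence at the cut.
    return encoded.decode("utf-8", errors="ignore")
```

A plain Roman-alphabet name such as "4th Floor Color Xerox laser with duplexer" fits comfortably, while a name made of multi-byte characters runs out of room much sooner.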
Resolving the Service Name
Now that we have a name for the service, we have to resolve that name to a device in order to communicate with it. Once the name has been selected, the client sends a DNS request for the SRV record for that name. This returns the TCP/IP address for the service instance, applicable port number for the instance, and the target host for that service. On a Zeroconf network, there may not be a well-defined host name since you may not have any DNS servers. In such a case, the target name in the SRV record may be the same name as that SRV record, which contains an attached address record with the IP address for that service instance. If there are multiple hits from a request, priority and weight fields must be evaluated by the client to select the most appropriate instance. This would allow some load-balancing of instances--say, to manage printers in a pre-press shop, or for pulling down network updates to clients from various file servers. Not all instances are created equal either, and some require more than just a TCP/IP address and a port as a reference. LPR print servers may have multiple queue names; a file server may have multiple share points. In these cases, TXT records that have the same name as the SRV record and contain this additional data are used.
When doing queries you may need to be more selective. For example, in a large pre-press environment, you may not want to get a list of all printers--there could be hundreds. So, you can create more selective queries by adding restrictions to the request. A request for "_postscript._ipp._tcp.domain.com" would only give you a list of PostScript printers using the IPP network printing protocol, as only those types of printers would have the appropriate PTR records pointing to their names.
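As a rough illustration of how a client might evaluate the priority and weight fields, the sketch below picks the lowest priority value first and then makes a weighted random choice among the records that share it. The SrvRecord tuple and choose_srv function are hypothetical names, and the weighting is a simplification of the selection procedure in the SRV specification (a zero-weight record is only chosen here when every candidate has zero weight).

```python
import random
from collections import namedtuple

# Hypothetical container for the fields carried in an SRV record.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def choose_srv(records, rng=random):
    """Pick one SRV record: lowest priority wins, weight breaks ties."""
    if not records:
        return None
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    total = sum(r.weight for r in candidates)
    if total == 0:
        # No weights to compare, so any candidate is as good as another.
        return rng.choice(candidates)
    # Weighted pick: a record's chance is proportional to its weight.
    pick = rng.random() * total
    running = 0
    for record in candidates:
        running += record.weight
        if pick < running:
            return record
    return candidates[-1]
```

Passing a seeded random.Random instance as rng makes the choice repeatable, which is handy when testing.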
Populating DNS with SD Information
For DNS-SD to work, you need to have the required records filled with the needed information for the services provided. There are many options here, most of which can be used now for conventional DNS functions:
You can always manually enter the information into the DNS server's configuration files. This is tedious, but not onerous in a stable environment.
A network monitoring tool can, upon detecting new devices, create a zone file that will be read by a standard DNS server that will automatically update the server's records. If you have a large number of changes happening at once, or regularly, this is a good option.
Devices can use Dynamic DNS (DyDNS) to automatically register SRV/PTR records with the DNS server. If you are already using Dynamic DNS this can save you some time.
A device manager can also use the DyDNS update mechanism for devices it is managing. So an HP JetDirect print manager could automatically update DNS records via DyDNS as new printers showed up.
Zeroconf devices will answer mDNS requests for SRV/PTR records in .local.
There is no specific requirement between DNS-SD and mDNS but they do work well together, especially in a Zeroconf environment. In such an environment, PTR record lookups of the form "service.local" use multicast and return a list of instances of the form "instance.service.local". In this type of implementation, the TTL for the DNS records should be short enough to avoid bad IP address translations but long enough so that once a name has been displayed on a given host, that host doesn't have to continuously re-verify the existence of that instance.
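The whole browse-then-resolve sequence can be sketched against an in-memory table standing in for the network. Everything in the ZONE table below is made up for illustration (the instance names, host names, addresses, and the "queue=" TXT key), and the query function is only a dictionary lookup, not a real mDNS call.

```python
# Hypothetical records; on a real network these would arrive as mDNS answers.
ZONE = {
    ("_printer._tcp.local.", "PTR"): [
        "4th Floor Color Xerox._printer._tcp.local.",
        "Lobby Laser._printer._tcp.local.",
    ],
    # SRV data is (priority, weight, port, target).
    ("4th Floor Color Xerox._printer._tcp.local.", "SRV"): [(0, 0, 515, "xerox4.local.")],
    ("4th Floor Color Xerox._printer._tcp.local.", "TXT"): ["queue=duplex"],
    ("xerox4.local.", "A"): ["169.254.10.20"],
    # Here the SRV target is the SRV name itself, with the address attached.
    ("Lobby Laser._printer._tcp.local.", "SRV"): [(0, 0, 515, "Lobby Laser._printer._tcp.local.")],
    ("Lobby Laser._printer._tcp.local.", "A"): ["169.254.33.7"],
}


def query(name, rtype):
    """Stand-in for sending a query and collecting the answers."""
    return ZONE.get((name, rtype), [])


def browse(service_type):
    """Step 1: a PTR lookup lists the named instances of a service."""
    return query(service_type, "PTR")


def resolve(instance_name):
    """Step 2: SRV gives host and port, TXT the extras, A the address."""
    srv = query(instance_name, "SRV")
    if not srv:
        return None
    _priority, _weight, port, target = srv[0]
    return {
        "host": target,
        "port": port,
        "addresses": query(target, "A"),
        "txt": query(instance_name, "TXT"),
    }
```

A user interface would show the names returned by browse("_printer._tcp.local.") and call resolve only on the one the user picks.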
Comparisons with Other SD Protocols
Since there are several other SD protocols available for use, why not use one of them in Zeroconf? Well, as it turns out, DNS has a lot of advantages. First, it's one of the oldest, most robust and best-understood SD protocols. It scales globally (even more widely if you count links to various Shuttle flights, the Space Station Freedom, and some of the Mars missions), and fits in with every network troubleshooting tool on the market today. Since Zeroconf is using DNS for name setup anyway, using DNS for service discovery is a logical choice.
Another advantage to DNS is ubiquity. Almost every large network has its own DNS server, and many mid-size networks do as well. So, there is a very large DNS support population available. DNS already has a dynamic registration protocol, so the automation work has been done. DNS also has its own security implementation, so using DNS for service discovery avoids that work as well. In short, DNS today has everything that SD wants, and very few downsides for SD usage. The only thing missing is multicast modes, but mDNS is moving toward becoming a standard, and is in use already, so that problem will take care of itself in short order. There is no reason to reinvent the wheel yet again and create an overbuilt, hard to use SD protocol when DNS is here, and easy.
You now have a better idea of what is going on behind Rendezvous in Mac OS X 10.2 and later. Note that Rendezvous and Zeroconf are not theoretical "Soon we'll all eat pills for meals and sleep in capsules" flights of fancy. They are here now, doing work now--and thanks to Apple's release of the Rendezvous source code--developers have an example of an implementation of these things to work from. True, iChat is the most visible example of this, and most folks may not realize from that humble application the brilliance of Zeroconf and its implementation. But then, they probably haven't used the Rendezvous implementations of iChat. I was amazed at how well it works and how easy it makes things, and I am a network administrator. Zeroconf is what TCP/IP should be for users, and again--we have Apple, and Stuart Cheshire to thank for it.
Bibliography and References
All of my source materials come from Apple, the IETF and conversations with various people involved with Zeroconf. Some pertinent URLs are:
John Welch <email@example.com> is a Consultant with MIT's IS department, and the Chief Know-It-All for TackyShirt. He has over fifteen years of experience at making computers work. John specializes in figuring out ways in which to make the Mac do what nobody thinks it can, showing that the Mac is the superior administrative platform, and teaching others how to use it in interesting, if sometimes frightening ways. He also does things that don't involve computertry on occasion--or at least, that's the rumor.