If you've ever had problems with startup items not being able to resolve hostnames in DNS at boot time in Mac OS X 10.4, the problem might be solved by either a software configuration change or a network switch change. Both methods have pros and cons. Be sure to read carefully before deploying.
The Problem:
For those system admins dealing with this same bug, the Apple Bug # that I filed is 4440840 - in case you wanted to refer to it and help dogpile the issue with Apple. Be sure to file a bug with Apple and refer to this number in your bug report. Please note that the bug report title includes "spanning tree protocol" (STP) but this bug occurs on network switches even when STP is NOT enabled. The desired behavior is that 10.4 should just wait for DNS to be fully functional before starting other processes.
On one of our Xserves (G4/1.33 GHz) running 10.4.4 we were seeing issues with startup items - both Apple's default startup items (NetworkTime) and a third-party startup item (KeyServer) - failing to resolve hostnames. The system.log file in /var/log was showing errors like this from the /System/Library/StartupItems/NetworkTime/NetworkTime script (which starts up the ntpdate process):
MyServer.edu:~ root# grep ntp /var/log/system.log
Feb 9 13:18:43 MyServer.edu ntpdate[105]: getnetnum: "clockhost.myuniversity.edu" invalid host number, line ignored
Feb 9 13:18:43 MyServer.edu ntpdate[105]: no servers can be used, exiting
Feb 9 13:18:46 MyServer.edu ntpd_initres[216]: couldn't resolve `clockhost.myuniversity.edu', giving up on it
We had originally set the Date & Time system preferences pane to sync with 'clockhost.myuniversity.edu' and rebooted the server. We found these errors after noticing that the date and time didn't update at reboot.
In the Date & Time system preferences pane we changed the hostname to the IP address of the server as listed in DNS, and everything worked just fine. What the ... DNS lookups aren't working?! DNS lookups WOULD eventually work when we remotely connected to the server via ssh (and directly at the server) because enough time had passed that DNS resolutions were fully working.
Shouldn't the startup items not run until the network is fully up? We thought so, but it wasn't the case (and still might be this way for all versions of Mac OS X 10.4?).
The "/System/Library/StartupItems/NetworkTime" startup script called the "CheckForNetwork" subroutine in the /etc/rc.common script, checked that the "NETWORKUP" variable was not "-NO-", and if so, continued on and called 'ntpdate', which then still failed to resolve the hostname of the configured NTP server.
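The log check used above can be wrapped in a small function so it is easy to repeat after each test reboot. This is only a sketch: the function name count_dns_failures is made up for illustration, and the pattern simply matches the two message forms shown in the log excerpt, so other wordings would be missed.

```shell
#!/bin/sh
# Hypothetical helper: count ntpdate/ntpd name-resolution failures in a log file.
# Defaults to /var/log/system.log, the log inspected in this article.
count_dns_failures() {
    log="${1:-/var/log/system.log}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    # grep -c exits 1 when the count is 0, so ignore its status and keep the number.
    grep -E -c 'ntp.*(invalid host number|resolve)' "$log" || true
}
```

Running `count_dns_failures` as root right after a reboot prints the number of matching lines; a non-zero count suggests the lookups are still failing at startup.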
The Cause:
Mac OS X 10.4 doesn't wait long enough for the network interface to be completely up and running when the network is configured for a static IP address. At first we thought it was an issue directly related to the fact that we had spanning tree protocol enabled on the switch, but we still saw the same issue when spanning tree was disabled entirely. The longer the initial ethernet negotiation takes, the more likely there will be problems related to DNS lookups failing.
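To get a rough feel for how long lookups keep failing after the link comes up, a polling loop like the one below could be run temporarily from a test startup item. It is a diagnostic sketch, not part of the fix: wait_for_dns and RESOLVE_CMD are hypothetical names, and using `host` as the lookup command is an assumption (any command that exits 0 on a successful lookup would do, and it may not follow exactly the same resolver path as ntpdate).

```shell
#!/bin/sh
# Hypothetical diagnostic: poll until a hostname resolves and report how many
# attempts failed first. RESOLVE_CMD is an assumption; override it as needed.
RESOLVE_CMD="${RESOLVE_CMD:-host}"

wait_for_dns() {
    name="$1"             # hostname to look up
    max_tries="${2:-30}"  # give up after this many attempts
    delay="${3:-1}"       # seconds to sleep between attempts
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if $RESOLVE_CMD "$name" > /dev/null 2>&1; then
            echo "resolved $name after $tries failed attempt(s)"
            return 0
        fi
        tries=$((tries + 1))
        sleep "$delay"
    done
    echo "gave up on $name after $max_tries attempt(s)"
    return 1
}
```

For example, `wait_for_dns clockhost.myuniversity.edu 60 1` would try once per second for up to a minute; with a one-second delay, the number of failed attempts roughly approximates the seconds DNS was unavailable.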
The Solutions:
There are multiple solutions to this problem: hardware or software configuration changes. Each method has its own pros and cons. Let's go through each method.
- Hardware: Network switch configuration change
- On most high-end network switches, a port can be put into a mode that shortens the time it takes for the Ethernet negotiation to complete.
Depending on the brand of the network switch, there are different terms for this mode. It's tied to Spanning Tree Protocol, and ... "causes the switch port to skip the standard STP start-up sequence and put the port directly into the 'Forwarding' state." (quote from HP's docs linked to on this page).
- Cisco: PortFast
- HP: mode fast
- Dell: fast link
- 3Com: Fast Start
- Note that if you don't have Spanning Tree Protocol (STP) enabled, setting the network port to default to a speed and duplex mode that the server is also capable of MIGHT solve the issue (e.g., 100 Mb/Full Duplex). Spanning Tree Protocol is a "good thing" to enable, though, as it provides a lot of protection for your networks! It could prevent issues such as users in labs directly connecting one network port to another, which would cause a nasty network loop and possibly take down networks.
- Pros:
- Doesn't require modifications to the client.
- Cons:
- Network loops are still possible. Both Cisco and HP specifically warn against this. You will lose some protection that STP provides.
- Per HP's Advanced Traffic Management Guide (Chapter 5, Section 5-30), STP fast mode should only be enabled on "edge nodes", and "changing the Mode to Fast on ports connected to hubs, switches, or routers may cause loops in your network that STP may not be able to immediately detect, in all cases."
- If the network switch gets replaced with another switch, someone needs to remember to enable STP port fast mode on your server ports.
- Hardware: Connect the Mac to a simple switch/hub that connects to the network port
- Connecting a Mac to a small hub and then connecting the hub to the network port also fixes the DNS lookup failures, but it's not a very elegant solution.
- Pros:
- Cons:
- One more point of failure.
- Software: Force Mac OS X to wait for a fully functional network interface
- There are two modifications to make to the system configurations to ensure that DNS resolutions work correctly:
- Change the network preferences from "Manual" (Static IP) to "INFORM" (Essentially faking DHCP mode but still using static IPs for everything - main IP, subnet, router/gateway, and DNS server addresses). Note that you do not need a DHCP response on the network. This method does not use DHCP whatsoever. Nada. Nope. Never.
- In the static IP configured network preferences file located here (in Mac OS X 10.4):
/Library/Preferences/SystemConfiguration/preferences.plist
- Change "Manual":
<string>Manual</string>
- To "INFORM" (all capital letters!):
<string>INFORM</string>
- Save the changes to the file. (You did make a backup copy of the file FIRST, right? Of course you did, you smart system admin you!)
- Add "ipconfig waitall" and "scutil" to startup scripts that run startup items which require DNS. Note that "ipconfig waitall" will be ignored if the INFORM setting is NOT specified in the network preferences (Per Apple).
- For example, the StartService routine in the Apple-supplied /System/Library/StartupItems/NetworkTime/NetworkTime system startup script looks like this:
StartService ()
{
    if [ "${TIMESYNC:=-YES-}" = "-YES-" ] &&
       ! GetPID ntpd > /dev/null; then
        CheckForNetwork
        if [ -f /var/run/NetworkTime.StartupItem -o "${NETWORKUP}" = "-NO-" ]; then exit; fi
        touch /var/run/NetworkTime.StartupItem
        echo "Starting network time synchronization"
        # Synchronize our clock to the network's time,
        # then fire off ntpd to keep the clock in sync.
        ntpdate -bvs
        ntpd -f /var/run/ntp.drift -p /var/run/ntpd.pid
    fi
}
- Add the ipconfig waitall and scutil commands at the start of the script (before "ntpdate -bvs" in this case below):
StartService ()
{
    if [ "${TIMESYNC:=-YES-}" = "-YES-" ] &&
       ! GetPID ntpd > /dev/null; then
        CheckForNetwork
        if [ -f /var/run/NetworkTime.StartupItem -o "${NETWORKUP}" = "-NO-" ]; then exit; fi
        touch /var/run/NetworkTime.StartupItem
        echo "Starting network time synchronization"
        # The next two lines wait for the network interfaces to come up
        # and then for DNS services to become available (5 second timeout):
        /usr/sbin/ipconfig waitall
        /usr/sbin/scutil -w State:/Network/Global/DNS -t 5
        # Synchronize our clock to the network's time,
        # then fire off ntpd to keep the clock in sync.
        ntpdate -bvs
        ntpd -f /var/run/ntp.drift -p /var/run/ntpd.pid
    fi
}
- Reboot the Mac, and inspect the /var/log/system.log file and notice that there are no longer DNS resolution errors.
- Pros:
- Solves the issue regardless of whether Spanning Tree Protocol is enabled.
- Doesn't require changes on the network switch. If the network switch gets replaced with a new one, the modifications to the client should still function.
- If STP is enabled on the switch, maintains the full benefits that Spanning Tree Protocol provides - no possible network loops.
- Cons:
- Requires a modification to the network preferences that is not available through the Network Preferences GUI.
- Requires adding ipconfig and scutil to ALL startup scripts.
- Not widely tested. It's strongly recommended that system admins test this in their own environments before doing a massive deployment. The software solution worked on our one server when connected to an HP ProCurve 2800 series switch, with STP both enabled and disabled.
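The preferences edit in the software solution could be scripted along these lines. Treat it as a sketch under assumptions: set_inform is a hypothetical name, the substitution rewrites every `<string>Manual</string>` line in the file (not only the one for the interface you care about), and it has not been tested against a real preferences.plist. Review the diff it prints before rebooting.

```shell
#!/bin/sh
# Hypothetical helper: back up a preferences file, then replace the
# "Manual" configuration string with "INFORM" (all capital letters).
set_inform() {
    plist="$1"
    backup="$plist.bak"
    # Keep a copy first, preserving ownership and permissions where possible.
    cp -p "$plist" "$backup" || return 1
    # Read from the backup and write the edited text over the original file.
    sed 's|<string>Manual</string>|<string>INFORM</string>|' "$backup" > "$plist" || return 1
    # Show what changed; diff exits 1 when the files differ, which is expected here.
    diff "$backup" "$plist" || true
}
```

On the server this would be run as root, for example `set_inform /Library/Preferences/SystemConfiguration/preferences.plist`, with the untouched original left next to it as preferences.plist.bak.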
Notes and Random thoughts:
- Before applying the software solution, we also had issues with the Sassafras KeyServer process starting up and quitting immediately. Along with adding INFORM to the network preferences, only ipconfig waitall needed to be added to the KeyServer startup script (as scutil had already been run in the NetworkTime script, which ran before the KeyServer startup item).
- Only enable STP port fast mode on the port that connects directly to a computer and not another switch/router/etc. Otherwise, BAD stuff will happen, according to HP (and probably Cisco).
- The software solution may not work on Macs configured to autologin as a user unless INFORM is specified in the network prefs and ipconfig waitall is added to ALL startup scripts that are executed.
- HP's Advanced Traffic Management Guide (Chapter 5, section 5-30) offered a lot of insight into the many details of STP. According to HP, the IEEE 802.1D spec indicates that the total STP negotiation time should be 30 seconds, but network clients might have issues because they are configured to automatically try to access the network whenever the end node detects a network connection.
- It might be possible to write a startup script similar to the one documented at afp548.com that does the ipconfig waitall and scutil, and then have other startup scripts require that script. This has not been tested, and may not actually work.
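The core of such a shared script might look like the sketch below. Like the idea itself, it is untested on a real server. WaitForDNS is a hypothetical name, and IPCONFIG and SCUTIL are variables only so the function can be exercised away from the server; they default to the same paths used in the NetworkTime example.

```shell
#!/bin/sh
# Hypothetical body for a shared "wait for network and DNS" startup item.
IPCONFIG="${IPCONFIG:-/usr/sbin/ipconfig}"
SCUTIL="${SCUTIL:-/usr/sbin/scutil}"

WaitForDNS() {
    # Wait for the network interfaces (relies on the INFORM setting, per Apple).
    "$IPCONFIG" waitall || return 1
    # Then wait up to 5 seconds for the global DNS key to appear.
    "$SCUTIL" -w State:/Network/Global/DNS -t 5 || return 1
    return 0
}
```

A startup item built around this would call WaitForDNS from its StartService routine, and dependent items would name it in the Requires list of their StartupParameters.plist. Whether the startup ordering then behaves as hoped is exactly the part that remains untested.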