Stjepan Groš

Problem solving on Zimbra server

One day one of the servers I maintain crashed because of a power outage. After the power came back the server started up properly, but Zimbra didn't start. When I tried to start it from the command line, I got the following error message:

[zimbra@gw ~]$ zmcontrol status
Unable to determine enabled services from ldap.
Unable to determine enabled services. Cache is out of date or doesn't exist.
[zimbra@gw ~]$ zmcontrol start
Host mail.sistemnet.local
Unable to determine enabled services from ldap.
Unable to determine enabled services. Cache is out of date or doesn't exist.
[zimbra@gw ~]$ zmcontrol stop
Host mail.sistemnet.local
        Stopping stats...Done.
        Stopping mta...Done.
        Stopping spell...Done.
        Stopping snmp...Done.
        Stopping archiving...Done.
        Stopping antivirus...Done.
        Stopping antispam...Done.
        Stopping imapproxy...Done.
        Stopping memcached...Done.
        Stopping mailbox...Done.
        Stopping logger...Done.
        Stopping ldap...Done.
[zimbra@gw ~]$ zmcontrol start
Host mail.sistemnet.local
        Starting ldap...Done.
Unable to determine enabled services from ldap.
Unable to determine enabled services. Cache is out of date or doesn't exist.

Of course, in such situations the first thing you try to do is use some google-fu on the error message. But the google-fu returned the following possible causes:

The problem is that if I tried each fix in turn, there was a possibility that I would mess something up and end up in a worse situation than the one I was already in. So I had to know the exact cause before trying to fix anything. Of course, some simple checks were possible, e.g. checking whether DNS works.
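As an illustration of such a simple check, the sketch below tests whether a hostname resolves through the system resolver. It assumes a Linux system with the getent utility installed; the function name resolves is made up for this example and is not part of Zimbra.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given hostname
# resolves through the system resolver (/etc/hosts, DNS, ...).
# Assumes the getent utility is available.
resolves() {
    getent hosts "$1" >/dev/null
}

# Example (run on the server itself):
#   resolves mail.sistemnet.local && echo "resolves" || echo "does NOT resolve"
```

A passing check only rules out the most basic name resolution failure; it says nothing about the other possible causes.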

So I decided to find out the exact cause of the error message. The first thing is to look at the log files in the hope that you'll find a clue there about what happened. Obviously, someone could argue that this should be the first step, and they might be right; sometimes you'll have more luck with logs and sometimes with Google, but there will also be situations where you'll have to use both.

Zimbra's log files are in the /var/log directory (zimbra.log and maillog) and in /opt/zimbra/log. The log files in /var/log contain entries related mainly to mail processing and delivery, while those in /opt/zimbra/log are much more detailed. So I changed to that directory and, after trying to restart Zimbra again, checked which file had changed last (using the -ltr switches to the ls command). It turned out to be the file startup.log (what an obvious name :)). But no luck with that file, because its entries were no more helpful than the ones printed in the terminal when restarting Zimbra.
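The "which file changed last" step can be sketched as a small shell function. This is only an illustration of the ls -tr trick described above; the function name newest_entry is made up for the example.

```shell
# Print the most recently modified entry in a directory
# (default: the current directory).
# ls -t sorts newest first and -r reverses that order,
# so the newest entry ends up on the last line.
newest_entry() {
    ls -tr "${1:-.}" | tail -n 1
}

# Example (run right after a failed "zmcontrol start"):
#   newest_entry /opt/zimbra/log
```

Parsing ls output is fragile for unusual file names, but for a quick interactive look at a log directory it is usually good enough.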

# tail startup.log
Unable to determine enabled services from ldap.
Unable to determine enabled services. Cache is out of date or doesn't exist.
Host mail.sistemnet.local
        Starting ldap...Done.
Unable to determine enabled services from ldap.
Unable to determine enabled services. Cache is out of date or doesn't exist.
Host mail.sistemnet.local
        Starting ldap...Done.
Unable to determine enabled services from ldap.
Unable to determine enabled services. Cache is out of date or doesn't exist.

The next debugging step is to monitor the process while it is running, trying to find out where its behavior deviates from normal. This is usually accomplished by running the strace utility on the executable that reports the error, but in this case the executable is a Perl script (confirmed by the file command), so strace is of little use here. But seeing the source is much more helpful than tracing a binary, and to see Perl source you just have to open it in a text editor. :)
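Besides the file command, a quick way to tell a script from a compiled binary is to look at its first line. The following is a minimal sketch of that idea, not what the file command does internally; the function name script_interpreter is made up for the example.

```shell
# Print the interpreter named on the "#!" line of a script.
# Returns a non-zero status if the file does not start with "#!",
# which usually means it is a compiled binary or a plain data file.
script_interpreter() {
    local first_line
    # tr strips NUL bytes so that binary files do not trigger shell warnings
    first_line=$(head -n 1 "$1" 2>/dev/null | tr -d '\0')
    case "$first_line" in
        '#!'*) printf '%s\n' "$first_line" | cut -c 3- ;;
        *)     return 1 ;;
    esac
}

# Example (as the zimbra user, assuming zmcontrol is on the PATH):
#   script_interpreter "$(command -v zmcontrol)"
```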

I searched there for the string cache.
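The same search can be done from the shell instead of a text editor. Here is a small sketch using grep; the function name find_in_script is made up for the example, and the location of the zmcontrol script is looked up rather than assumed.

```shell
# List every line of a file that mentions the given string,
# prefixed with its line number (-n) and ignoring case (-i).
find_in_script() {
    grep -n -i -- "$1" "$2"
}

# Example (as the zimbra user, assuming zmcontrol is on the PATH):
#   find_in_script cache "$(command -v zmcontrol)"
```

The line numbers make it easy to jump to each match in the editor afterwards.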