The mail server was upgraded the evening of 05-16-05 … a patch was applied to the Qmail program for a particular solution one customer needed. At the same time, all the other mail software was upgraded. After this had been running for a day or two, I noticed intermittent password errors while POPping mail. My mail client is Eudora, and while it would generate an error, the next check 10 minutes later would go through without issue.
After looking into this error, I noticed the same errors on other email accounts, but no one had emailed or mentioned anything. It also turned out that these errors had been occurring before the upgrade of the server software, so a portion of the software was rolled back in the hope that the new release was merely making them more frequent. That was a little less than a week ago.
On 05/24/05 a reseller called and explained the issue in a message. Since I was already aware of the problem, it was time to hunker down and fix it. The error does not appear on all servers running this exact same software. Changing releases of the failing software corrected nothing. Making sure everything was a matched release (MySQL client and server) did not help either. Certain runtime options within the software were changed. Some logging was turned off. Some monitoring was disabled. The errors seemed to show up under heavy disk activity and CPU load, even though the CPU was nowhere near its capacity. The conclusion was that the server has outgrown the IDE drive that spins happily inside it. That conclusion came 4 hours after diagnosis began.
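One way to put a number on "intermittent" is to log in over and over with a known-good password and count how often the server says no. Here is a minimal Python sketch of that idea using the standard poplib module. It is not what was actually run on this server; the helper names probe and pop3_login are made up for illustration, and the host and account in the usage comment are placeholders.

```python
import poplib
import time


def probe(login_once, attempts, delay=0.0):
    """Call login_once() `attempts` times and return (failures, attempts)."""
    failures = 0
    for _ in range(attempts):
        try:
            login_once()
        except poplib.error_proto:
            # The server answered -ERR, for example an intermittent password error.
            failures += 1
        time.sleep(delay)
    return failures, attempts


def pop3_login(host, user, password, port=110):
    """One plain POP3 login and logout; raises poplib.error_proto on -ERR."""
    conn = poplib.POP3(host, port, timeout=30)
    try:
        conn.user(user)
        conn.pass_(password)
        conn.quit()
    finally:
        # close() is safe to call even after quit() has already closed the socket.
        conn.close()


# Hypothetical usage (placeholder host and credentials, not the real server):
# failed, total = probe(
#     lambda: pop3_login("mail.example.com", "someone@example.com", "secret"),
#     attempts=100,
#     delay=1.0,
# )
# print(f"{failed} of {total} logins were rejected")
```

Running it once while the disk is idle and once while the disk is busy would show whether the failure count really follows load.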
eBay was hit up for a new SCSI controller and an 18.4 GB drive in the span of about a half hour, with $108.60 put on the credit card. eBay is great! Then another thought came to mind: upgrade the operating system. The mail server has been running FreeBSD 4.10 for a good stable while, but now we’re going to see if FreeBSD 4.11 solves this problem.
While it may be premature, as of this writing (05/25/05 9:10am) vchkpw seems to be working okay, but pop3d-ssl is still having issues. My client was failing regularly every 10 minutes (I check mail every 10 minutes) … An upgrade of the Courier POP SSL software and this problem went away … There have been far fewer errors since these changes were completed, but every once in a while (as before) this error creeps up … The authors of the software say it’s a known problem with no resolution in sight … The problem appears to be with MySQL … But there is no diagnostic information logged anywhere to determine the cause of the failure … It just seems that some people have this problem creep up …
Since the error is reproduced quite easily when you hammer the disk, it is still assumed to be load related and still thought to be the result of a consumer-grade HDD, so we’re going ahead with plans to replace the drive with a server-grade SCSI HDD … Those deliveries will be completed by June 3rd, so the mail server HDD upgrade will be done shortly after, once this hardware can be installed … If the errors don’t disappear as a result, I am going to set the computer on fire and laugh and laugh … WooHoo!