We are now running SuSE-8.0 on three of our PCs. The only problem after installation was that sendmail was not working (connection refused on port 25). I changed the arguments in /etc/rc.d/sendmail to "-bd" and now it is running without problems. I also recommend using ext2 or ext3 instead of reiserfs. ReiserFS is probably not stable and can cause problems after power failures (see crashes).
My first impression is that SuSE is more comfortable to install. I missed the description for each software package. Also, tuning X to more than 85Hz was not possible with Xconfigurator, so I had to edit the Modelines in /etc/X11/XF86Config-4 by hand.
Very impressive! Faster than bochs. It handles vmware disks (except
2GB-split files and SCSI images).
WinXP can be installed
but it does not boot after installation. I checked the partition entry
and got some strange CHS values (64 heads, 63 sectors), which probably
confuse the BIOS. I tried different -hdachs options, and the
behavior of the boot process changes, but I found no way to run it.
I had the same problem with WinNT4.
If you know what is going wrong, tell me.
The Knoppix console locks up at irregular intervals for an unknown reason.
I used my spinpack package for speed tests and found a slowdown factor of 6 to 10
for numerical applications (like a 300MHz machine on a 2600MHz PC).
WinXP-Prof-DE runs. After installation it does not work in normal mode and shows error messages that the license cannot be verified. Use the "Abgesicherter Modus" (safe mode, press F8 very early) and install SP2 from a CD image (also with some errors, but it works). After that XP runs slowly but normally. Use images of CDs instead of /dev/cdrom, which is terribly slow (5 hours instead of one for the installation).
I have administered the following servers (excerpt):
Bought in the '90 (?), 640MB RAM, 66MHz processors, running AIX 4.3. The support was not the best. When we bought a new disk for this machine we did not get the right screws and adapters, but the disk worked well for years lying on the dusty bottom of the case (good disks). Updating to a new AIX version was always an adventure. "Never change a running ...". In 1997 the graphics card failed after a power failure. The cheapest replacement was a Gt4xi graphics card at the price of a new PC. In 2001 we threw the machine away.
Bought 1998, two equivalent machines with 1.5GB memory, 164UX boards 67MHz, DEC Alpha 21164A CPUs 533MHz, 9GB SCSI disks, running Linux. Very nice machines and the cheapest available for this configuration. The only problem was that the IDE adapter is really slow, I do not know why. An additional PCI-IDE card (no-name CMD-PCI646U2) could be used only in slow modes because of missing drivers (?). But using IDE disks via SCSI-IDE adapters on the SCSI bus was no problem. The inside of the tower gets very warm, but now the machines are in an air-conditioned room and no problems are expected. With older Linux kernels 2.2.x there seemed to be a problem with applications using more than 1GB RAM, but since we switched to Linux 2.4.x the machines are 100% stable. After we moved to another building we had problems with auto-sensing of the 10Mb/100Mb full-duplex/half-duplex network. Only the 10Mb/half-duplex mode worked well. Luckily we do not need a high-speed network at the moment. Soft-RAID0 was able to increase the speed from 19MB/s (single disk) to 28MB/s (two disks). All in all, it was a good deal.
Bought 1999, 8 MIPS R10000 250MHz, 8GB shared memory, running IRIX64 v6.4, very fast and stable. If I remember right we had to replace one defective board during the last two years (within the warranty period). Using MP pragmas for parallel processing worked badly for complex programs. Sometimes the program was 3 times slower on 8 processors than on one processor. I could not find out why. With pthreads I got a speedup of factor 8 with the same algorithm. So I do not trust the very expensive parallel C compilers and use the more primitive standard libraries for multiprocessing.
I admin three Alpha systems; two of them are big machines (one has 128GB memory and 32 EV7 processors, the other 24GB and 16 CPUs) and very fast! Unfortunately the system is not very stable; there are two to four crashes a year. The hardware support is OK, but the software support is bad. You get updates regularly, but don't try to ask HP questions about misbehavior of the Tru64 system. The hotline does not think about forwarding your question/report to the programmers; they only ask for money for tuning support (@HP: I don't want to buy tuning support, I want to have the bugs I found in your system fixed without paying additional money for it!). Probably they cannot reproduce our problems on their test machines, and the effort for analyzing a problem on a 128GB machine is high, but simply playing the ball back is not the right thing to do with your customers, is it? So I don't ask anymore and try to solve the problems by myself. That is not easy without a tracing tool like the Linux strace program and without kernel sources. Here is a list of things which cause problems on Tru64-V5.1; maybe it is useful for you to know:
vmunix: chk_bf_quota: user quota underflow for user XXXX on fileset /
Also the df command shows completely wrong data for the AdvFS file domains.
My feeling is that these problems are connected. Sometimes the AdvFS system
locks up if big files (58GB and more) are removed via the rm command.
Bought two machines 1997 (one with only 64MB RAM). They are quite OK. Easy to open and to look inside. After one year we changed from Solaris to Linux because it is easier to manage if you already have a lot of experience with Linux but less with Solaris. One bad point is the SunTurboGX graphics card. It could only be used with 1152x900x76colors at 72kHz/76Hz. With a 20-inch high-tech SUN monitor you easily get a headache with less than 80Hz. It is really a bad hardware combination. Another point is the CPU fan. Two of them have died within the last two years. We also had one disk crash, and you need special SUN disks (expensive). The machines are now used only as number crunchers and for guests. The funny thing is that for every machine you get a lot of software you never need, but only one boot disk for about 16 machines. Since September 2002 we have been testing SuSE 7.3 for sparc, without problems so far.
Bought 1998 with two 300MHz PII CPUs, 512MB RAM and SCSI. Big mistake! Every PC magazine at this time was claiming that one needs SCSI for burning CDs, but this was probably only true for WinXX. SCSI was not necessary for the CD writer to avoid buffer underruns. After using IDE CD writers under Linux on other, older PCs without any trouble, I prefer the cheaper IDE versions today. SCSI itself was not a real problem, but the CPUs chosen were bad. The CPUs become so hot that the board beeps if both CPUs are 100% used. After a few weeks with lots of crashes the contact to the heat sinks got lost (the heat sink was bent by the heat). With the new heat sinks we got, the problem was not completely solved. The seller could not really solve the problem and switched the BIOS heat warnings off, but with moderate success. Nowadays I know that this CPU version was the hottest one. The voltage was increased to 5V to get the CPUs running at 300MHz. Surprisingly the CPUs have not burned out after 3 years of use. On hot days with both CPUs in use the board still does its quiet beeping. After such experiences we went back to Celerons with moderate clocks and 128MB RAM, which are silent, and we are still waiting until PCs with stable boards and more than 1GB RAM are broadly available to build small PC clusters as a better solution. Not buying the newest product seems a good tactic nowadays. If you need more performance, tune your code!
In January 2002 we had trouble with two machines running SuSE 6.4 with reiserfs-2.x on root. Both machines showed inconsistencies on the reiserfs filesystem (all actions took a lot of time) after about 19 months. A new installation was necessary.
Remark: If any entry appears twice in /etc/hosts, sendmail fails to start. It took us more than two hours to find and fix this problem.
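A quick check for this situation can be sketched in shell. This is only a minimal sketch under assumptions: the function name find_duplicate_hosts is hypothetical, and it assumes that "twice" means the same host name or alias being listed more than once in the file.

```shell
# Hypothetical helper (not from the original notes): print every host name
# or alias that is listed more than once in a hosts-format file.
# Usage: find_duplicate_hosts [file]    (default file: /etc/hosts)
find_duplicate_hosts() {
    awk '
        { sub(/#.*/, "") }                 # strip comments, fields are re-split
        NF >= 2 {
            for (i = 2; i <= NF; i++)      # field 1 is the IP address
                count[tolower($i)]++
        }
        END {
            for (name in count)
                if (count[name] > 1) print name
        }
    ' "${1:-/etc/hosts}" | sort
}
```

On systems with IPv6 entries, names such as localhost may legitimately appear for both 127.0.0.1 and ::1 and will be reported too, so the output is a list of candidates to inspect, not a list of errors.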
On September 23rd, 2002 we had a power failure. After that a PC running SuSE 7.3 with reiserfs-3.x.0k-pre9 showed non-reproducible errors during numerical calculations (wrong results, unexpected aborts). We ran a reiserfs check, but it claimed that everything was OK. After a reboot and further tests the gcc compiler also gave non-reproducible "internal compiler errors", which appeared more and more frequently and finally led to a kernel panic. We installed SuSE 8.0 with ext2 and again strange things happened. So we opened the PC and made a visual check of the hardware. Only the CPU fan (2 years old) was not in its best state; in some cases it did not start to rotate again after being stopped by hand. Did the fan fail to start after the power failure? This would cause too high temperatures and would explain the strange behaviour. Indeed the PC ran without problems after we checked the CPU fan, so we are now sure to have located the problem.
Have you ever tried to measure the speed of your network? A simple command to do that is:
time dd if=/dev/zero bs=1024k count=1000 | rsh remotehost "cat >/dev/null"
The result is 89s for a 100Mbps ethernet card (1000MB/89s=11.2MB/s=90Mbps). Pretty accurate! For a 1000Mbps card the test failed because rsh took 100% of CPU time. In a second test I started the above command 6 times in parallel and got 55MB/s=440Mbps using an 8-CPU machine, which shows that a better speed test is needed here. Do you have a simple one?
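The arithmetic behind these figures can be written as a small shell helper. This is just a sketch: the function name throughput is hypothetical, and like the text above it treats the 1000 blocks of 1024k as 1000MB and uses 1MB/s = 8Mbps.

```shell
# Hypothetical helper (not from the original notes): convert transferred
# megabytes and elapsed seconds into MB/s and Mbps (1 MB/s = 8 Mbps).
throughput() {
    awk -v mb="$1" -v s="$2" 'BEGIN {
        rate = mb / s
        printf "%.1f MB/s = %.0f Mbps\n", rate, rate * 8
    }'
}

throughput 1000 89    # prints: 11.2 MB/s = 90 Mbps
```

Feeding it the elapsed time reported by the time command gives the same numbers as quoted for the 100Mbps card.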
We had only one compromised system: an old 486 PC used as printer and floppy disk server (for the SUNs without floppy disk). This PC was abused to upload and download files. We noticed it because the PC crashed after the 500MB disk was full. Other problems were connected with sendmail and relaying. It was used three times to send spam all over the world. We noticed it because the machines could not do anything else and there was a lot of disk activity. Sorry to all victims of spam. Now we have configured all our machines to not relay email. Since our machines are configured more securely and we use SSH logins, we rarely notice port scans and other attacks.
If you use Windows on your client PC and want to log in to a Unix box, you can use exceed + ssh (commercial), the cygwin package or Xming to work with graphical applications.
Using pine for imap-server via SSL:
pine -f {IMAP-Server/imap/ssl/user=userid}inbox # OR
.pinerc inbox-path=\
{sunny.urz.uni-magdeburg.de/imap/ssl/user="username"}inbox
# inbox-path={imap.web.de/novalidate-cert/user="username"}inbox
instead of novalidate-cert do:
- download OvGUMssl.pem to /etc/ssl/certs
- openssl x509 -noout -fingerprint -in OvGUMssl.pem # better use SHA1
# MD5 Fingerprint=72:A0:34:4C:64:18:57:6A:80:9A:89:72:48:92:7F:83
- openssl x509 -noout -hash -in OvGUMssl.pem # 6cc6a28b
- ln -s OvGUMssl.pem $(openssl x509 -noout -hash -in OvGUMssl.pem).0
- openssl verify -CApath /etc/ssl/certs OvGUMssl.pem
- same with dfn-cert.pem
- ToDo: check CRL = certificate revocation list
# check the connection: netstat -atn # to hostip:993 ESTABLISHED (imap via SSL)
This happens sometimes on our compute servers, mostly if users do not estimate the memory needs of their programs. Most operating systems slow down but keep working. Tru64-5.1B does the worst thing: it kills arbitrary processes (including old daemons running as root), which often results in a crash. IRIX64-6.5 killed the user process in all test situations and everything else continued well. ulimit -v can help but is not practical for MPI processes with asymmetric memory consumption on shared memory machines.
Discovered a firmware bug in the IPMI software of the BMC of a DELL PowerEdge 1950 server which causes a Linux system crash under high load. See the end of that webpage for more information.
This happens if a long-running process (days or weeks) eats all the memory. The OOM killer does not kill this process because long-running processes are considered important. As a solution, create /etc/skel/.ssh/rc with "ps -o pid --no-heading | xargs renice 10 >/dev/null 2>&1" and copy it to existing user homes. You could also use /etc/ssh/sshrc (xauth add .. must be added to the sshrc files, because xauth is not called by sshd if an rc file exists, and X11 tunneling will otherwise fail). The higher nice value makes killing of system processes less likely. Also set /proc/sys/vm/overcommit_memory to 2 and /proc/sys/vm/overcommit_ratio to 90 or higher. Also disable swap; it makes no sense for HPC and only creates a long slowdown before the OOM happens. Programs which allocate all of the memory and more are bad.
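The kernel settings named above could be applied at runtime as in the following sketch. The function name apply_oom_settings is hypothetical; the values are the ones from the text, and the function only does something useful when called as root.

```shell
# Hypothetical helper (not from the original notes). Must be run as root.
# It is only defined here; call apply_oom_settings to change the system.
apply_oom_settings() {
    sysctl -w vm.overcommit_memory=2 || return 1   # same as /proc/sys/vm/overcommit_memory
    sysctl -w vm.overcommit_ratio=90 || return 1   # same as /proc/sys/vm/overcommit_ratio
    swapoff -a                                     # disable all active swap areas
}
```

Settings made with sysctl -w are lost at reboot, so they would also have to go into the boot-time configuration of the distribution in use.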
The problem is to power off the Altix 330 server after shutdown in case of
high room temperatures or power failures (to save UPS power for others).
shutdown -p -h does not work; the machine stays on and still consumes power.
The only way is to connect to the service processor from the service net
by telnet and power it off.
This can be done automatically by:
(echo "pwr down";sleep 9;echo -e "\x1dquit";sleep 1) | telnet 10.0.0.1
The same technique can be used for the alpha servers above.
In January 2010 one of our machines carried out ssh attacks on other servers
around the world. It was the 141.44.40.29-linux machine.
netstat -atn | wc -l showed about 2500 ssh connections.
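A slightly more specific count than wc -l can be sketched as follows. It assumes the column layout of the Linux net-tools netstat (foreign address in the fifth column); the function name count_outgoing_ssh is hypothetical.

```shell
# Hypothetical helper (not from the original notes): read "netstat -atn"
# output on stdin and count TCP connections whose remote port is 22.
# Usage: netstat -atn | count_outgoing_ssh
count_outgoing_ssh() {
    awk '$1 ~ /^tcp/ && $5 ~ /:22$/ { n++ } END { print n + 0 }'
}
```

Incoming sessions to the local sshd are not counted, because there port 22 appears in the local address column.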
The ps auxw output looked like this (user name changed):
matze 18773 0.5 0.0 1736 308 ? S Jan17 10:42 ./dt_ssh5 200 2 17.79.182.153 2
root 18492 0.1 0.1 8252 2396 ? Ss 11:10 0:00 sshd: root@pts/0
root 18890 0.0 0.0 4120 1844 pts/0 Ss 11:11 0:00 -bash
matze 25993 0.0 0.0 1736 468 ? S 11:14 0:00 ./dt_ssh5 200 2 17.79.182.153 2
matze 26023 0.0 0.0 1736 468 ? S 11:14 0:00 ./dt_ssh5 200 2 17.79.182.153 2
matze 26024 0.0 0.0 1736 468 ? S 11:14 0:00 ./dt_ssh5 200 2 17.79.182.153 2
The user had chosen a password that was too simple (183000 google hits for it). The binary was lying in
/tmp and showed these properties:
ls -l
-rwxr-xr-x 1 1001 users 1379632 2010-01-17 05:05 dt_ssh5
file tmp/dt_ssh5
tmp/dt_ssh5: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.4,\
 bad note description size 0x83e58955, bad note name size 0xe8000001, bad note name size 0xc2815a00,\
 bad note description size 0xc7397500, bad note name size 0x89589c00, bad note name size 0xc831589c,\
 statically linked, stripped
md5sum tmp/dt_ssh5
f0b5fc67c41d567c1f306e88363f139a tmp/dt_ssh5
strings -9 dt_ssh5 showed strings belonging to the ssh and openssl
libraries.
Two successful logins in /var/log/messages (name changed):
Dec 21 21:55:41 fermion sshd[23143]: Accepted keyboard-interactive/pam for matze from 58.247.222.163 port 40039 ssh2
Jan 17 05:05:10 fermion sshd[18758]: Accepted keyboard-interactive/pam for matze from 217.79.182.153 port 45300 ssh2
Jan 17 05:05:10 fermion sshd[18761]: subsystem request for sftp
Jan 17 05:05:10 fermion sshd[18761]: channel 0: rcvd big packet 131030, maxpack 32768
Jan 17 05:05:10 fermion sshd[18761]: channel 0: rcvd big packet 112867, maxpack 32768
Jan 17 05:05:10 fermion sshd[18761]: channel 0: rcvd big packet 112838, maxpack 32768
Jan 17 05:05:10 fermion sshd[18761]: channel 0: rcvd big packet 112809, maxpack 32768